BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//ISISLab - ECPv6.3.3//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:ISISLab
X-ORIGINAL-URL:https://www.isislab.it
X-WR-CALDESC:Events for ISISLab
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:Europe/Rome
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:20260329T010000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:20261025T010000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=Europe/Rome:20260227T030000
DTEND;TZID=Europe/Rome:20260227T160000
DTSTAMP:20260501T183724
CREATED:20260224T174935Z
LAST-MODIFIED:20260224T175017Z
UID:50787-1772161200-1772208000@www.isislab.it
SUMMARY:Seminar: "Enabling Portable Collective Communication on Heterogeneous GPU Systems" by Salvatore Sirica and "Vettorizzazione Esplicita del Prodotto Matriciale su RISC-V con RVV" by Sergio Guastaferro
DESCRIPTION:Speaker 1: Salvatore Sirica\nTitle: "Enabling Portable Collective Communication on Heterogeneous GPU Systems"\n\nAbstract: Many distributed HPC applications and data-parallel AI training pipelines depend on MPI-style communication patterns. As systems scale to many GPUs and nodes\, collective operations such as Allreduce become a primary performance bottleneck\, directly limiting scalability. In practice\, the highest-performing GPU collectives are typically provided by vendor-specific libraries such as NVIDIA NCCL and AMD RCCL\, which exploit hardware- and topology-aware optimizations but reduce portability across heterogeneous clusters.\n\nIntel oneAPI Collective Communications Library (oneCCL) is uniquely positioned to mitigate this issue thanks to its SYCL-based interface. However\, its implementation is tightly coupled to Intel Level Zero\, which effectively prevents execution on non-Intel GPUs.\n\nThis work enables portable GPU collectives within oneCCL by introducing NCCL and RCCL as pluggable backends while preserving the existing oneCCL user-facing API. The integration reuses oneCCL’s bootstrap mechanisms to initialize vendor communicators and dispatches collectives on native GPU streams through SYCL interoperability. Micro-benchmarks on the UNISA-HPC cluster with NVIDIA A100 GPUs show that oneCCL+NCCL achieves performance close to native NCCL for key collectives across a broad range of message sizes and data types.\n\nSpeaker 2: Sergio Guastaferro\nTitle: "Vettorizzazione Esplicita del Prodotto Matriciale su RISC-V con RVV"\n\nAbstract: Matrix multiplication is a fundamental operation in high-performance computing\, underlying artificial intelligence frameworks\, scientific simulations\, and data analysis. With the emergence of RISC-V as an open and modular architecture\, it becomes crucial to evaluate how effectively its vector extension (RVV) accelerates numerically intensive operations such as matrix multiplication.\n\nThis thesis systematically analyzes the performance of matrix multiplication on RISC-V processors with vector support\, comparing the automatic vectorization offered by compilers with an explicit vectorization implemented using RVV intrinsics. The proposed implementation is based on optimized microkernels with an outer-product formulation\, progressive tiling\, and dynamic handling of the vector length at runtime. Particular attention is devoted to exploring the LMUL parameter\, which governs vector parallelism in relation to the hardware configuration (VLEN).\n\nThe experimental results\, obtained on the 8-core Spacemit X60 processor\, show that explicit vectorization significantly outperforms the automatic approach\, reaching speedups of up to 12.5× over an unoptimized baseline. The analysis also highlights a nonlinear impact of LMUL on performance\, with an optimal value (LMUL=2 for VLEN=256) that maximizes memory bandwidth utilization and reduces cache misses. The work provides practical guidance for developing high-performance code on RISC-V and highlights the current limitations of compilers in automatically optimizing vector code on this architecture.
URL:https://www.isislab.it/event/seminario-enabling-portable-collective-communication-on-heterogeneous-gpu-systems-di-salvatore-sirica-e-vettorizzazione-esplicita-del-prodotto-matriciale-su-risc-v-con-rvv-di-sergio-guastafe/
ATTACH;FMTTYPE=image/png:https://www.isislab.it/wp-content/uploads/2026/02/seminar-27-02-2026-.png
END:VEVENT
END:VCALENDAR