Welcome to CSE 5304 / ECE 6095 - High-Performance Parallel Computing with GPUs

Electrical and Computer Engineering
University of Connecticut

Spring 2026

Mondays 11am - 2pm, ITE 330

Course Syllabus

Instructor

Omer Khan
Office: ITEB 447
Email: khan@uconn.edu

Announcements

The exam will be held in class on April 27, 2026, and covers all lectures.

Lectures

Lecture 1: Why Study Parallel CPU and GPU Processors? - PDF

Amdahl's Law; Parallel Taxonomy; Single-threaded Performance Challenge; Throughput view of Performance: Data + Vector Parallelism, Latency hiding, Specialization; Parallel Processor Challenges

Lecture 2: CPU and GPU Parallel Programming - PDF

Shared Memory; Single Program Multiple Data (SPMD); CPU and GPU Parallel Programming Principles; Parallel Primitive: Map

Lecture 3: CPU-GPU Memory System - PDF

Memory Hierarchy: Bandwidth, Latency, Coalescing, Banking/Partitioning, Synchronization, Reuse; Parallel Primitive: Stencil

Lecture 4: Parallelism in Memory System - PDF

FLOPS and Roofline Models; Tiling/Blocking and Data Reuse; Overlap Compute and Memory: Compilers, Multithreading, Bulk/Vector data access, Software Pipelining, Software Specialization, Asynchronous data transfers; Block Scheduling; Overlap GPU and CPU

Lecture 5: Parallelism with Efficiency I - PDF

Work and Step Efficiency Metrics; Parallel Primitives: Matrix Multiply, Reduce, Scan Algorithms

Lecture 6: Parallelism with Efficiency II - PDF

Parallel Primitives: Work-efficient Scan, Sort, Merge, Histogram, Dynamic Algorithms

Lecture 7: Synchronization and Memory Consistency Models - PDF

Synchronization and Memory Model; Sequential Consistency: Producer-Consumer communication, Locks/Semaphores, Challenges with caches and out-of-order execution, Coherence; Total-Store Order (TSO) Consistency Model: Compiler and hardware Memory Fences, Data-Race Free (DRF) programs; Weak Memory Consistency Models

Lecture 8: Memory Consistency and Coherence - PDF

GPU Weak Consistency PTX Model: Compiler and hardware memory fences, Axioms and litmus tests, Memory ordering and visibility; Cache Coherence Problem; Hardware Cache Coherence: Snooping and Directory Protocols, Implementations and Optimizations

Lecture 9: Exploiting Parallelism with Specialization - PDF

Data Supply Challenge; Complex Compute and Memory Instructions: Asynchronous memory operations, Matrix-Multiply Accumulate (MMA/Tensor Core); Throughput Optimizations: Software pipelining, Asynchronous data transfer, Caches, Offloading memory accesses, Decoupled compute and memory, Register reallocations, Cooperative cache hierarchy

Lecture 10: Specialization to Accelerate Compute and Communication - PDF

Ray Tracing Acceleration in GPUs, Tensor Cores beyond Matrix Multiply; Message Passing Model: Explicit memory transfer for efficiency; Accelerating Communication: Hardware send and receive instructions, Overlap compute and data

Lecture 11: Caches - PDF

Data Movement Challenges: Energy and Latency, Data parallel primitives and communication; Shared vs. Private Cache Hierarchy Tradeoffs, Non-Uniform Cache Access (NUCA); Cache Interference; Caches and Coherence

Lecture 12: Interconnection Networks - PDF

Topology; Flow Control; Routing; Addressing; Switch Architecture; Performance: Latency and Bandwidth

Programming Assignments (due at the deadline specified on HuskyCT)

All code materials for the programming assignments are provided in the GitHub repository: https://github.com/cag-uconn/cse5304
Lab 1a: SIMD Mandelbrot PDF (due via HuskyCT Feb 2, 2026, 8:59am EST)
Lab 1b: Massively Parallel Mandelbrot PDF (due via HuskyCT Feb 9, 2026, 8:59am EST)
Lab 2: Wave Simulation PDF (due via HuskyCT Feb 23, 2026, 8:59am EST)
Lab 3: Matrix Multiply - Tiling and Reuse PDF (due via HuskyCT March 2, 2026, 8:59am EST)
Lab 4: Matrix Multiply - Improved Scheduling PDF (due via HuskyCT March 9, 2026, 8:59am EST)
Lab 5: Run-Length Compression PDF (due via HuskyCT March 30, 2026, 8:59am EST)
Lab 6: Matrix Multiply - Tensor Cores PDF (due via HuskyCT April 6, 2026, 8:59am EST)
PROJECT: Matrix Multiply with Hopper Architecture PDF (due via HuskyCT May 4, 2026, 8:59am EST)