Merged
33 commits
4765a5b
Fixed the issues of compilation
coketaste Sep 14, 2025
9c4b00e
Fixed the compatible issue and kernel structure
coketaste Sep 14, 2025
99e56e3
Fixed the module5
coketaste Sep 14, 2025
8c1b107
Fixed module5 issue
coketaste Sep 14, 2025
7d9a09d
Fixed the Makefile in moduel5
coketaste Sep 14, 2025
b02f2ee
Fixed the errors in module 6
coketaste Sep 15, 2025
e39cbc8
Fixed the warnings in examples and update Makefiles of module 7 and 9
coketaste Sep 19, 2025
e82b138
Fixed the logic of GPU detection
coketaste Sep 19, 2025
bf3bb54
Fixed the module8
coketaste Sep 19, 2025
13c79ac
Fixed the build of cuda examples
coketaste Sep 20, 2025
7f48be6
Updated the rocm to 7
coketaste Sep 20, 2025
1cc5575
Updated Dockerfile of rocm
coketaste Sep 20, 2025
b7e87e1
Updated the docs
coketaste Sep 20, 2025
c2bda49
Fixed and Debug rocm 7 docker image
coketaste Sep 21, 2025
7db63a8
Fixed the issues in module1 of ROCm
coketaste Sep 21, 2025
463fb4b
Updated the performance comparison hip
coketaste Sep 21, 2025
8838a78
Fixed the issues of examples of module 2 to 9
coketaste Sep 21, 2025
33ab78c
Fixed the examples of module2
coketaste Sep 21, 2025
37434c8
Fixed the error in texture memory hip
coketaste Sep 21, 2025
48ba308
Fixed the examples of module3
coketaste Sep 21, 2025
357d727
Fixed the module3
coketaste Sep 21, 2025
bbe90cb
Fixed the module 6 and 7
coketaste Sep 21, 2025
8aa45af
Fixed the errors in module 7
coketaste Sep 21, 2025
bd98398
Fixed the examples of module 7 and 8
coketaste Sep 21, 2025
7c62a78
Fixed the error in deep learning hip
coketaste Sep 21, 2025
a06a5a3
Fixed the module9
coketaste Sep 21, 2025
4bdf569
Fixed the make clean
coketaste Sep 21, 2025
cddfd42
Fixed the issues of including Thrust and MIOpen
coketaste Sep 21, 2025
fde7cef
Updated deep learning hip
coketaste Sep 21, 2025
14f7663
Fixed the module8 and module9 Makefile
coketaste Sep 21, 2025
7543909
Fixed the module 9
coketaste Sep 21, 2025
ea59485
Updated Makefile of module9
coketaste Sep 21, 2025
e3b6f95
Updated the content of each modules
coketaste Sep 21, 2025
6 changes: 3 additions & 3 deletions CONTRIBUTING.md
@@ -41,7 +41,7 @@ docker-compose up -d cuda-dev # For NVIDIA GPUs
docker-compose up -d rocm-dev # For AMD GPUs

# Option 2: Native development
# Install CUDA Toolkit 12.9.1+ or ROCm 6.4.3+
# Install CUDA Toolkit 12.9.1+ or ROCm latest
# See modules/module1/README.md for detailed setup instructions

# Build all examples
@@ -241,8 +241,8 @@ When reporting bugs, please include:
### Environment Information
- **Operating System**: (Ubuntu 22.04, Windows 11, etc.)
- **GPU**: (RTX 4090, RX 7900 XTX, etc.)
- **Driver Version**: (NVIDIA 535.x, ROCm 6.4.3, etc.)
- **CUDA/HIP Version**: (12.9.1, 6.4.3, etc.)
- **Driver Version**: (NVIDIA 535.x, ROCm latest, etc.)
- **CUDA/HIP Version**: (12.9.1, 7.0, etc.)
- **Docker**: (if using containerized development)

### Bug Description
43 changes: 30 additions & 13 deletions README.md
@@ -2,9 +2,9 @@

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![CUDA](https://img.shields.io/badge/CUDA-12.9.1-76B900?logo=nvidia)](https://developer.nvidia.com/cuda-toolkit)
[![ROCm](https://img.shields.io/badge/ROCm-6.4.3-red?logo=amd)](https://rocmdocs.amd.com/)
[![ROCm](https://img.shields.io/badge/ROCm-7.0-red?logo=amd)](https://rocmdocs.amd.com/)
[![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?logo=docker)](https://www.docker.com/)
[![Examples](https://img.shields.io/badge/Examples-70%2B-green)](modules/)
[![Examples](https://img.shields.io/badge/Examples-71-green)](modules/)
[![CI](https://img.shields.io/badge/CI-GitHub%20Actions-2088FF?logo=github-actions)](https://github.com/features/actions)

**A comprehensive, hands-on educational project for mastering GPU programming with CUDA and HIP**
@@ -35,7 +35,7 @@
**GPU Programming 101** is a complete educational resource for learning modern GPU programming. This project provides:

- **9 comprehensive modules** covering beginner to expert topics
- **70+ working code examples** in both CUDA and HIP
- **71 working code examples** in both CUDA and HIP
- **Cross-platform support** for NVIDIA and AMD GPUs
- **Production-ready development environment** with Docker
- **Professional tooling** including profilers, debuggers, and CI/CD
@@ -197,10 +197,11 @@ This architectural knowledge is essential for writing efficient GPU code and is
|---------|-------------|
| 🎯 **Complete Curriculum** | 9 progressive modules from basics to advanced topics |
| 💻 **Cross-Platform** | Full CUDA and HIP support for NVIDIA and AMD GPUs |
| 🐳 **Docker Ready** | Complete containerized development environment |
| 🔧 **Production Quality** | Professional build systems, testing, and profiling |
| 🐳 **Docker Ready** | Complete containerized development environment with CUDA 12.9.1 & ROCm 7.0 |
| 🔧 **Production Quality** | Professional build systems, auto-detection, testing, and profiling |
| 📊 **Performance Focus** | Optimization techniques and benchmarking throughout |
| 🌐 **Community Driven** | Open source with comprehensive contribution guidelines |
| 🧪 **Advanced Libraries** | Support for Thrust, MIOpen, and production ML frameworks |

## 🚀 Quick Start

@@ -217,14 +218,14 @@ cd gpu-programming-101

# Inside container: verify GPU access and start learning
/workspace/test-gpu.sh
cd modules/module1 && make && ./01_vector_addition_cuda
cd modules/module1 && make && ./build/01_vector_addition_cuda
```

### Option 2: Native Installation
For direct system installation:

```bash
# Prerequisites: CUDA 11.0+ or ROCm 5.0+, GCC 7+, Make
# Prerequisites: CUDA 12.0+ or ROCm 7.0+, GCC 9+, Make

# Clone and build
git clone https://github.com/AIComputing101/gpu-programming-101.git
@@ -265,7 +266,7 @@ Our comprehensive curriculum progresses from fundamental concepts to production-
| [**Module 8**](modules/module8/) | 🚀 Expert | 10-12h | **Domain Applications** | ML, Scientific Computing | 4 |
| [**Module 9**](modules/module9/) | 🚀 Expert | 6-8h | **Production Deployment** | Libraries, Integration, Scaling | 4 |

**📈 Progressive Learning Path: 70+ Examples • 50+ Hours • Beginner to Expert**
**📈 Progressive Learning Path: 71 Examples • 50+ Hours • Beginner to Expert**

### Learning Progression

@@ -313,7 +314,7 @@ Module 5: Performance Tuning
### Software Requirements

#### Operating System Support
- **Linux** (Recommended): Ubuntu 22.04 LTS, RHEL 8/9, SLES 15 SP5
- **Linux** (Recommended): Ubuntu 22.04/24.04 LTS, RHEL 8/9, SLES 15 SP5
- **Windows**: Windows 10/11 with WSL2 recommended for optimal compatibility
- **macOS**: macOS 12+ (Metal Performance Shaders for basic GPU compute)

@@ -322,7 +323,7 @@ Module 5: Performance Tuning
- **Driver Requirements**:
- Linux: 550.54.14+ for CUDA 12.4+
- Windows: 551.61+ for CUDA 12.4+
- **ROCm Platform**: 6.0+ (Docker uses ROCm 6.4.3)
- **ROCm Platform**: 7.0+ (Docker uses ROCm 7.0)
- **Driver Requirements**: Latest AMDGPU-PRO or open-source AMDGPU drivers
- **Kernel Support**: Linux kernel 5.4+ recommended
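
The minimum versions listed in this hunk (driver 550.54.14+, ROCm 7.0+, kernel 5.4+) can be checked mechanically. The following is a minimal sketch, not part of the repository: `sketch_version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): compare dotted version strings.
# Assumes `sort -V` is available (GNU coreutils).

# sketch_version_ge A B: succeeds when version A is greater than or equal to B.
sketch_version_ge() {
    # The smaller of the two versions sorts first; A >= B when that one is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example: compare the running kernel against the 5.4 recommendation.
kernel="$(uname -r)"
kernel="${kernel%%-*}"   # drop distro suffix such as "-generic"
if sketch_version_ge "$kernel" "5.4"; then
    echo "kernel ${kernel}: meets the 5.4+ recommendation"
else
    echo "kernel ${kernel}: older than 5.4"
fi
```

The same helper could be pointed at a driver or ROCm version string obtained by whatever means the local setup provides; how to query those versions is left out here on purpose.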

@@ -338,6 +339,8 @@ Module 5: Performance Tuning
- **Profiling**: Nsight Compute, Nsight Systems (NVIDIA), rocprof (AMD)
- **Debugging**: cuda-gdb, rocgdb, compute-sanitizer
- **Libraries**: cuBLAS, cuFFT, rocBLAS, rocFFT (for advanced modules)
- **ML Libraries**: Thrust (NVIDIA), MIOpen (AMD) for deep learning applications
- **System Management**: NVML (NVIDIA), ROCm SMI (AMD) for hardware monitoring

### Performance Expectations by Hardware Tier

@@ -381,28 +384,42 @@ Experience the full development environment with zero setup:
- 📦 Isolated and reproducible builds
- 🧹 Easy cleanup when done

**Container Specifications:**
- **CUDA**: NVIDIA CUDA 12.9.1 on Ubuntu 22.04
- **ROCm**: AMD ROCm 7.0 on Ubuntu 24.04
- **Libraries**: Production-ready toolchains with debugging support

**[📖 Complete Docker Guide →](docker/README.md)**

## 🔧 Build System

Our advanced build system features automatic GPU vendor detection and optimized configurations:

### Project-Wide Commands
```bash
make all # Build all modules
make all # Build all modules with auto-detection
make test # Run comprehensive tests
make clean # Clean all artifacts
make check-system # Verify GPU setup
make check-system # Verify GPU setup and dependencies
make status # Show module completion status
```

### Module-Specific Commands
```bash
cd modules/module1/examples
make # Build all examples in module
make # Build all examples with vendor auto-detection
make test # Run module tests
make profile # Performance profiling
make debug # Debug builds with extra checks
```

### Advanced Build Features
- **Automatic GPU Detection**: Detects NVIDIA/AMD hardware and builds accordingly
- **Production Optimization**: `-O3`, fast math, architecture-specific optimizations
- **Debug Support**: Full debugging symbols and validation checks
- **Library Management**: Automatic detection of optional dependencies (NVML, MIOpen)
- **Cross-Platform**: Single Makefile supports both CUDA and HIP builds
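
The README text describes automatic vendor detection but the diff does not show how it is implemented. As an illustration only, one plausible approach is to probe for vendor tools and the AMD device node. Every name below (`sketch_detect_gpu_vendor`, `sketch_pick_compiler`, `KFD_DEVICE`) is hypothetical and is not taken from the project's Makefiles.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of GPU vendor detection; NOT the repository's actual logic.

# AMD compute device node; overridable so the logic can be exercised without a GPU.
KFD_DEVICE="${KFD_DEVICE:-/dev/kfd}"

# Print "nvidia", "amd", or "none" based on which tools or devices are visible.
sketch_detect_gpu_vendor() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia"
    elif command -v rocminfo >/dev/null 2>&1 || [ -e "$KFD_DEVICE" ]; then
        echo "amd"
    else
        echo "none"
    fi
}

# Map the detected vendor to the compiler a build would invoke.
sketch_pick_compiler() {
    case "$1" in
        nvidia) echo "nvcc" ;;
        amd)    echo "hipcc" ;;
        *)      echo "none" ;;
    esac
}

vendor="$(sketch_detect_gpu_vendor)"
echo "vendor=${vendor} compiler=$(sketch_pick_compiler "$vendor")"
```

Checking NVIDIA first is an arbitrary choice in this sketch; a machine with both toolchains installed would need an explicit override.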

## Performance Expectations

| Module Level | Typical GPU Speedup | Memory Efficiency | Code Quality |
24 changes: 12 additions & 12 deletions docker/README.md
@@ -5,9 +5,9 @@ This directory contains Docker configurations for comprehensive GPU programming
## 🚀 Latest Versions (2025)

- **CUDA**: 12.9.1 (Latest stable release)
- **ROCm**: 6.4.3 (Latest stable release)
- **ROCm**: 7.0 (Latest stable release)
- **Ubuntu**: 22.04 LTS
- **Nsight Tools**: 2025.1.1 (with fallback to 2024.6.1)
- **Nsight Tools**: 2025.1.1

## 🚀 Quick Start

@@ -58,10 +58,10 @@ docker/

### CUDA Development Container
**Image**: `gpu-programming-101:cuda`
**Base**: `nvidia/cuda:12.4-devel-ubuntu22.04`
**Base**: `nvidia/cuda:12.9.1-devel-ubuntu22.04`

**Features**:
- CUDA 12.4 with development tools
- CUDA 12.9.1 with development tools
- NVIDIA Nsight Systems & Compute profilers
- Python 3 with scientific libraries
- GPU monitoring and debugging tools
@@ -73,17 +73,17 @@ docker/

### ROCm Development Container
**Image**: `gpu-programming-101:rocm`
**Base**: `rocm/dev-ubuntu-22.04:6.0`
**Base**: `rocm/dev-ubuntu-22.04:7.0-complete`

**Features**:
- ROCm 6.0 with HIP development environment
- ROCm 7.0 with HIP development environment
- Cross-platform GPU programming (AMD/NVIDIA)
- ROCm profiling tools (rocprof, roctracer)
- Python 3 with scientific libraries

**GPU Requirements**:
- AMD GPU with ROCm support (RX 580+, MI series)
- AMD drivers with ROCm 6.0+
- AMD drivers with ROCm 7.0+

## 🔧 Container Usage

@@ -251,7 +251,7 @@ NVIDIA_VISIBLE_DEVICES=all
ROCM_PATH=/opt/rocm
HIP_PATH=/opt/rocm/hip
HIP_PLATFORM=amd
HSA_OVERRIDE_GFX_VERSION=10.3.0
HSA_OVERRIDE_GFX_VERSION=11.0.0
```

## πŸ›‘οΈ Security Considerations
Expand Down Expand Up @@ -282,10 +282,10 @@ nvidia-smi # For NVIDIA
rocm-smi # For AMD

# Verify Docker GPU support
docker run --rm --gpus all nvidia/cuda:12.4-base nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.9.1-base nvidia-smi

# Check container runtime
docker run --rm --device=/dev/kfd rocm/dev-ubuntu-22.04 rocminfo
docker run --rm --device=/dev/kfd rocm/dev-ubuntu-22.04:7.0 rocminfo
```
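
Before running the ROCm container check above, it can help to confirm that the AMD device nodes exist on the host. The sketch below is an assumption-laden illustration: `sketch_rocm_gpu_args` and the `DEV_ROOT` override are hypothetical names, and the flags simply mirror those passed in the `docker/scripts/run.sh` hunk of this pull request.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper; not part of the repository's scripts.

# Emit docker flags only for AMD device nodes that actually exist.
# DEV_ROOT is an illustrative override so the logic can be tried without a GPU.
sketch_rocm_gpu_args() {
    local dev_root="${DEV_ROOT:-/dev}"
    local args=""
    local node
    for node in kfd dri; do
        if [ -e "$dev_root/$node" ]; then
            args="$args --device=$dev_root/$node"
        fi
    done
    if [ -n "$args" ]; then
        # Same extra flags as the run.sh change in this PR.
        args="$args --security-opt seccomp=unconfined --group-add video"
    fi
    echo "${args# }"   # strip the single leading space
}

gpu_args="$(sketch_rocm_gpu_args)"
if [ -n "$gpu_args" ]; then
    echo "AMD device nodes found; docker flags: ${gpu_args}"
else
    echo "No AMD device nodes found; the container would start without GPU access"
fi
```

An empty result suggests a driver or permissions problem on the host rather than a container issue.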

**"Container build fails"**
@@ -297,8 +297,8 @@ docker system prune -a
sudo apt update && sudo apt upgrade docker-ce docker-compose

# Check base image availability
docker pull nvidia/cuda:12.4-devel-ubuntu22.04
docker pull rocm/dev-ubuntu-22.04:6.0
docker pull nvidia/cuda:12.9.1-devel-ubuntu22.04
docker pull rocm/dev-ubuntu-22.04:7.0-complete
```

**"Permission denied errors"**
4 changes: 2 additions & 2 deletions docker/docker-compose.yml
@@ -1,6 +1,6 @@
# GPU Programming 101 - Docker Compose Configuration
# Supports both NVIDIA CUDA and AMD ROCm platforms
# Updated for CUDA 12.9.1 and ROCm 6.4.3 (2025)
# Updated for CUDA 12.9.1 and ROCm 7.0 (2025)

services:
# NVIDIA CUDA Development Environment
@@ -83,7 +83,7 @@ services:
environment:
- HIP_VISIBLE_DEVICES=0
- HSA_OVERRIDE_GFX_VERSION=11.0.0
- ROCM_VERSION=6.4.3
- ROCM_VERSION=7.0

# Development tools container (CPU-only for general development)
dev-tools:
83 changes: 6 additions & 77 deletions docker/rocm/Dockerfile
@@ -1,73 +1,14 @@
# GPU Programming 101 - ROCm Development Container
# Based on AMD's official ROCm 6.4.3 development image (latest stable as of 2025)
# Based on AMD's official ROCm development image - used as-is for maximum compatibility

FROM rocm/dev-ubuntu-22.04:6.4.3
FROM rocm/dev-ubuntu-24.04:7.0-complete

# Metadata
LABEL maintainer="GPU Programming 101"
LABEL description="ROCm/HIP development environment for GPU programming course"
LABEL version="2.0"
LABEL rocm.version="6.4.3"
LABEL ubuntu.version="22.04"

# Avoid interactive prompts during package installation
ARG DEBIAN_FRONTEND=noninteractive

# Install essential development tools for GPU programming
RUN apt-get update && apt-get install -y \
# Core development tools
build-essential \
cmake \
git \
wget \
curl \
vim \
nano \
htop \
tree \
# Minimal Python for basic scripting (not data science)
python3 \
python3-pip \
python3-dev \
# Additional utilities
pkg-config \
software-properties-common \
# Debugging and profiling tools
gdb \
valgrind \
strace \
# Network tools
net-tools \
iputils-ping \
&& rm -rf /var/lib/apt/lists/*

# Install core ROCm development packages (keep minimal)
RUN apt-get update && apt-get install -y \
# Core ROCm packages for GPU programming
hip-dev \
hip-samples \
hipblas-dev \
# ROCm profiling tools (essential for performance work)
rocprofiler-dev \
roctracer-dev \
&& rm -rf /var/lib/apt/lists/*

# Install minimal Python packages for basic development (no heavy data science libs)
RUN pip3 install --no-cache-dir \
numpy \
matplotlib

# Set up ROCm environment variables
ENV ROCM_PATH=/opt/rocm
ENV HIP_PATH=/opt/rocm/hip
ENV PATH=${ROCM_PATH}/bin:${HIP_PATH}/bin:${PATH}
ENV LD_LIBRARY_PATH=${ROCM_PATH}/lib:${HIP_PATH}/lib:${LD_LIBRARY_PATH}
ENV HIP_PLATFORM=amd
ENV HSA_OVERRIDE_GFX_VERSION=11.0.0
ENV ROCM_VERSION=6.4.3

# Verify HIP compiler installation (skip rocminfo as no GPU during build)
RUN hipcc --version
LABEL rocm.version="latest"
LABEL ubuntu.version="24.04"

# Create development workspace
WORKDIR /workspace
@@ -76,7 +17,7 @@ RUN mkdir -p /workspace/{projects,samples,output}
# Copy course materials (will be mounted as volume in practice)
COPY . /workspace/gpu-programming-101/

# Set up convenient aliases and environment
# Set up convenient aliases and environment for the course
RUN echo 'alias ll="ls -alF"' >> /root/.bashrc && \
echo 'alias la="ls -A"' >> /root/.bashrc && \
echo 'alias l="ls -CF"' >> /root/.bashrc && \
@@ -159,17 +100,5 @@ echo "=== All tests completed ==="\n' > /workspace/test-gpu.sh

RUN chmod +x /workspace/test-gpu.sh

# Install HIP samples for learning and reference
RUN cd /workspace && \
if [ -d "/opt/rocm/hip/samples" ]; then \
cp -r /opt/rocm/hip/samples ./hip-samples; \
else \
git clone https://github.com/ROCm-Developer-Tools/HIP-Examples.git hip-examples; \
fi

# Default command
CMD ["/bin/bash"]

# Health check to verify HIP compiler access (will only work when GPU is available)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD hipcc --version > /dev/null 2>&1 || exit 1
CMD ["/bin/bash"]
2 changes: 1 addition & 1 deletion docker/scripts/build.sh
@@ -212,7 +212,7 @@ main() {
if [ "$pull" = true ]; then
log "Pulling base images..."
docker pull nvidia/cuda:12.4-devel-ubuntu22.04 || warning "Failed to pull CUDA base image"
docker pull rocm/dev-ubuntu-22.04:6.0 || warning "Failed to pull ROCm base image"
docker pull rocm/dev-ubuntu-24.04:latest || warning "Failed to pull ROCm base image"
fi

local success_count=0
4 changes: 1 addition & 3 deletions docker/scripts/run.sh
@@ -221,7 +221,7 @@ run_rocm() {
# Set up GPU access for AMD
local detected_gpu=$(detect_gpu)
if [ "$detected_gpu" = "amd" ] && [ "$no_gpu_requested" = false ]; then
gpu_args="--device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined"
gpu_args="--device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video"
log "Enabling AMD GPU access"
elif [ "$no_gpu_requested" = true ]; then
log "GPU access explicitly disabled with --no-gpu"
@@ -247,8 +247,6 @@ run_rocm() {
-v "$PROJECT_ROOT:/workspace/gpu-programming-101:rw"
-v "gpu101-rocm-home:/root"
-w "/workspace/gpu-programming-101"
-e HIP_VISIBLE_DEVICES=0
-e HSA_OVERRIDE_GFX_VERSION=10.3.0
)

# Add port mapping