AMD ROCm Documentation
Complete guide for setting up and optimizing ManifoldScript on AMD GPUs with ROCm.
Prerequisites
- AMD GPU with RDNA or CDNA architecture
- ROCm 5.7 or later
- Linux distribution (Ubuntu 22.04, RHEL 9, or SUSE 15)
- AMDGPU-Pro driver or open-source driver
- At least 8GB of GPU memory
Installation
```bash
# Add AMD repository
sudo apt-get update
sudo apt-get install wget
wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60200-1_all.deb
sudo apt-get install ./amdgpu-install_6.0.60200-1_all.deb

# Install ROCm
sudo amdgpu-install --usecase=rocm --no-dkms

# Install ManifoldScript with ROCm support
curl -fsSL https://get.manifoldscript.dev/rocm | bash

# Verify installation
manifoldscript --version
manifoldscript --check-rocm
```
ROCm Architecture
ManifoldScript ROCm Architecture
```mermaid
graph TD
A[ManifoldScript Source] --> B[ROCm Frontend]
B --> C[HIP Code Generation]
C --> D[HSACO Assembly]
D --> E[ISA Binary]
E --> F[GPU Execution]
G[Memory Manager] --> H[ROCm Memory]
H --> I[Fine-Grained Memory]
I --> J[Compute Units]
K[Grid Manager] --> L[Work Groups]
L --> M[Wavefronts]
M --> N[Work Items]
N --> F
O[Multi-GPU] --> P[XGMI Bridge]
P --> Q[Infinity Fabric]
Q --> F
classDef frontend fill:#e1f5fe
classDef compilation fill:#f3e5f5
classDef execution fill:#e8f5e9
classDef memory fill:#fff3e0
class B,C,D,E compilation
class F,J execution
class G,H,I memory
class K,L,M,N,O,P,Q frontend
```
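The execution hierarchy in the diagram (grid → work groups → wavefronts → work items) determines how much parallel work the GPU schedules. As an illustrative sketch (not part of the ManifoldScript API), the decomposition can be computed directly; wavefront size is 64 (wave64) on CDNA and 32 (wave32) on RDNA:

```python
# Illustrative sketch: how a 1-D grid decomposes into work-groups,
# wavefronts, and work-items on AMD GPUs.
# Wavefront size: 64 on CDNA (wave64), 32 on RDNA (wave32).

def decompose_grid(global_size, workgroup_size, wavefront_size):
    """Return (num_workgroups, wavefronts_per_group, total_wavefronts)."""
    num_workgroups = -(-global_size // workgroup_size)          # ceiling division
    wavefronts_per_group = -(-workgroup_size // wavefront_size)
    return num_workgroups, wavefronts_per_group, num_workgroups * wavefronts_per_group

# 1M work items in 256-item work-groups on wave64 hardware
print(decompose_grid(1_048_576, 256, 64))  # (4096, 4, 16384)
```

The same launch on wave32 (RDNA) hardware doubles the wavefront count per work-group, which is why occupancy tuning is architecture-dependent.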
Multi-GPU with XGMI
Multi-GPU Configuration with XGMI
graph TB
subgraph "Host System"
A[ManifoldScript Runtime] --> B[ROCm Runtime]
B --> C[GPU 0]
B --> D[GPU 1]
B --> E[GPU 2]
end
subgraph "GPU 0"
C --> F[HIP Context]
F --> G[Memory Pool]
G --> H[Compute Queue]
end
subgraph "GPU 1"
D --> I[HIP Context]
I --> J[Memory Pool]
J --> K[Compute Queue]
end
subgraph "GPU 2"
E --> L[HIP Context]
L --> M[Memory Pool]
M --> N[Compute Queue]
end
H -->|XGMI| K
K -->|XGMI| N
N -->|XGMI| H
P[XGMI Hub] -->|High Speed| Q[Infinity Fabric]
Q -->|Low Latency| H
Q -->|Low Latency| K
Q -->|Low Latency| N
classDef host fill:#e3f2fd
classDef gpu fill:#ffebee
classDef link fill:#f3e5f5
class A,B host
class C,D,E,F,G,H,I,J,K,L,M,N gpu
class P,Q link
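When work is distributed across the three GPUs above, the runtime must partition data between their memory pools. A hypothetical sketch of contiguous row sharding (the function name is illustrative, not a ManifoldScript API):

```python
# Hypothetical sketch: shard a tensor's rows contiguously across GPUs.
# Earlier GPUs absorb the remainder so shard sizes differ by at most 1.

def shard_rows(num_rows, num_gpus):
    """Return a list of (gpu_id, start_row, end_row) half-open ranges."""
    base, rem = divmod(num_rows, num_gpus)
    shards, start = [], 0
    for gpu in range(num_gpus):
        size = base + (1 if gpu < rem else 0)
        shards.append((gpu, start, start + size))
        start += size
    return shards

print(shard_rows(1000, 3))  # [(0, 0, 334), (1, 334, 667), (2, 667, 1000)]
```

With XGMI links between the compute queues, halo exchanges between adjacent shards can bypass host memory entirely, which is the main benefit of the topology shown.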
Performance Optimization
Memory Optimization
- Use the fine-grained memory system
- Enable coarse-grained memory for large transfers
- Optimize LDS (Local Data Share) usage
- Use texture memory for 2D access patterns
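LDS usage directly limits how many work-groups a Compute Unit can run at once. A minimal sketch of that budget calculation, assuming the typical 64 KiB of LDS per CU on RDNA/CDNA (the GEMM tile shape is illustrative):

```python
# Sketch: LDS (Local Data Share) per work-group caps concurrent
# work-groups per Compute Unit. 64 KiB per CU is typical on RDNA/CDNA.

LDS_PER_CU = 64 * 1024  # bytes

def workgroups_per_cu(tile_rows, tile_cols, bytes_per_elem=4, tiles_per_group=2):
    """Work-groups per CU given LDS tiles (e.g. an A tile and a B tile for GEMM)."""
    lds_per_group = tile_rows * tile_cols * bytes_per_elem * tiles_per_group
    return LDS_PER_CU // lds_per_group

# A pair of 64x64 fp32 tiles uses 32 KiB of LDS -> 2 work-groups per CU
print(workgroups_per_cu(64, 64))  # 2
```

Shrinking the tile raises concurrency but lowers data reuse, so the right tile size is a balance between occupancy and arithmetic intensity.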
Kernel Optimization
- Maximize wavefront occupancy
- Use the SALU for scalar (wavefront-uniform) operations to keep the VALU free for vector work
- Enable MFMA for matrix operations
- Optimize register usage
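Occupancy is most often capped by vector register (VGPR) pressure. A sketch of the back-of-the-envelope calculation, with illustrative RDNA-class figures (1024 VGPRs and up to 16 wavefronts per SIMD; check your target ISA for exact limits):

```python
# Sketch: wavefront occupancy per SIMD as limited by VGPR usage.
# Limits below are illustrative of RDNA-class hardware.

VGPRS_PER_SIMD = 1024
MAX_WAVES_PER_SIMD = 16

def occupancy(vgprs_per_wave):
    """Resident wavefronts per SIMD for a kernel using vgprs_per_wave registers."""
    by_registers = VGPRS_PER_SIMD // vgprs_per_wave
    return min(by_registers, MAX_WAVES_PER_SIMD)

print(occupancy(64))   # 16 -> full occupancy
print(occupancy(256))  # 4  -> register-limited
```

Compilers typically report per-kernel VGPR counts, so this estimate tells you whether trimming registers (e.g. by splitting a kernel) would actually buy more resident wavefronts.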
Code Example
```lisp
(manifold rocm_ops
  :requirements (:rocm :hip)
  :types tensor - rocm_tensor
  :action (gemm_acceleration
    :parameters (?A ?B ?C - tensor)
    :pre (:rocm (?A :tensor) (?B :tensor))
    :eff (:rocm (?C :tensor)
                (rocm_gemm ?A ?B ?C)
                (rocm_mfma_optimize ?C))))
```

```bash
# Compile with ROCm optimization
manifoldscript compile --target=rocm --arch=gfx1100 rocm_ops.ms
```