Documentation

Getting Started with ManifoldScript

A comprehensive guide to installing, configuring, and running your first ManifoldScript programs across different GPU platforms.

Prerequisites

System Requirements

  • Operating System: Linux (Ubuntu 20.04+), macOS (12.0+), or Windows 10+ (via WSL2)
  • GPU: NVIDIA GPU with CUDA 11.0+, Apple Silicon with Metal 3, or AMD GPU with ROCm 5.0+
  • Memory: Minimum 8GB RAM, 4GB GPU memory
  • Compiler: GCC 9.0+, Clang 12.0+, or MSVC 2019+

Installation

NVIDIA CUDA Installation

Linux (Ubuntu/Debian)

bash
# Install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda-repo-ubuntu2004-12-0-local_12.0.0-525.60.13-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-0-local_12.0.0-525.60.13-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

# Install ManifoldScript
curl -fsSL https://get.manifoldscript.dev/cuda | bash

# Verify installation
manifoldscript --version
manifoldscript --check-gpu

Apple Metal Installation

macOS (Apple Silicon)

bash
# Install Xcode command line tools
xcode-select --install

# Install ManifoldScript
curl -fsSL https://get.manifoldscript.dev/metal | bash

# Verify installation
manifoldscript --version
manifoldscript --check-gpu

AMD ROCm Installation

Linux (Ubuntu/Debian)

bash
# Install ROCm
sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo usermod -a -G render,video $LOGNAME
wget https://repo.radeon.com/amdgpu-install/5.0/ubuntu/focal/amdgpu-install_5.0.50000-1_all.deb
sudo apt install ./amdgpu-install_5.0.50000-1_all.deb
sudo apt update
sudo apt install rocm-dev

# Install ManifoldScript
curl -fsSL https://get.manifoldscript.dev/rocm | bash

# Verify installation
manifoldscript --version
manifoldscript --check-gpu

Your First Program

1. Create a Simple Tensor Program

Let's create a simple matrix multiplication program:

manifoldscript
# Create a file called hello.ms
# Simple matrix multiplication
tensor A[1024, 1024] = random(1024, 1024);
tensor B[1024, 1024] = random(1024, 1024);
tensor C[1024, 1024] = A @ B;

# Print result
debug("Matrix multiplication completed");
debug("Result shape: " + shape(C));
debug("Result sum: " + sum(C));

2. Compile and Run

bash
# Compile for your GPU platform
manifoldscript compile hello.ms

# Run the compiled program
./hello

# Expected output:
# Matrix multiplication completed
# Result shape: [1024, 1024]
# Result sum: ~2.7e8 (the exact value varies run to run with the random inputs)
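Because the inputs are random, the printed sum is not a fixed number. A quick NumPy analogue of hello.ms (a sketch, not ManifoldScript itself) shows the magnitude to expect: each entry of C sums 1024 products of uniform [0, 1) values, so the total is roughly 256 × 1024² ≈ 2.7e8.

```python
import numpy as np

# NumPy analogue of hello.ms: multiply two random 1024x1024 matrices.
rng = np.random.default_rng(0)
A = rng.random((1024, 1024))
B = rng.random((1024, 1024))
C = A @ B

print(C.shape)   # (1024, 1024)
total = C.sum()  # about 2.68e8; exact value depends on the random draw
print(total)
```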

Understanding ManifoldScript

Tensor Declaration

manifoldscript
# Declare tensors with explicit shapes
tensor A[100, 200] = zeros(100, 200);   # Zero matrix
tensor B[100, 200] = ones(100, 200);    # Ones matrix
tensor C[100, 200] = random(100, 200);  # Random values
tensor D[100, 100] = identity(100);     # Identity matrix (always square)

# Infer shapes from expressions
tensor E = A + B;   # Shape: [100, 200]
tensor F = A @ C';  # Shape: [100, 100]
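For readers coming from NumPy, the declarations above map roughly onto array constructors, with `'` playing the role of `.T` (this is an illustrative sketch, not part of ManifoldScript):

```python
import numpy as np

# Rough NumPy equivalents of the tensor declarations above.
A = np.zeros((100, 200))                       # zeros(100, 200)
B = np.ones((100, 200))                        # ones(100, 200)
C = np.random.default_rng(0).random((100, 200))  # random(100, 200)
D = np.eye(100)                                # identity(100), always square

# Shape inference follows the usual broadcasting/matmul rules.
E = A + B     # shape (100, 200)
F = A @ C.T   # shape (100, 100); C.T corresponds to C'
print(E.shape, F.shape)
```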

Operations

manifoldscript
# Element-wise operations
tensor G = A + B;  # Addition
tensor H = A - B;  # Subtraction
tensor I = A * B;  # Element-wise multiplication
tensor J = A / B;  # Element-wise division

# Matrix operations
tensor K = A @ B';        # Matrix multiplication, shape [100, 100]
tensor L = transpose(A);  # Transpose
tensor M = inverse(K);    # Matrix inverse (square tensors only)

# Reduction operations
tensor sum_A = sum(A);    # Sum of all elements
tensor max_A = max(A);    # Maximum value
tensor mean_A = mean(A);  # Mean value
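A few of these operations, sketched in NumPy terms (an assumption about the semantics, not ManifoldScript output): element-wise operators act entry by entry, `@` contracts inner dimensions, and `inverse` only makes sense for square matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 200))
B = rng.random((100, 200)) + 0.5   # shifted away from zero so division is safe

G = A + B            # element-wise addition
I = A * B            # element-wise (Hadamard) product
J = A / B            # element-wise division

K = A @ B.T          # matrix product: (100, 200) @ (200, 100) -> (100, 100)
M = np.linalg.inv(K) # inverse requires a square matrix

sum_A = A.sum()      # scalar reduction
mean_A = A.mean()    # mean of uniform [0, 1) values is about 0.5
```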

GPU Optimization

Memory Management

manifoldscript
# Optimize memory usage
tensor A[1000, 1000] = random(1000, 1000);
tensor B[1000, 1000] = random(1000, 1000);

# Use temporary tensors efficiently
temp T1 = A @ B;  # Temporary tensor
T1 = T1 + 1.0;    # Reuse temporary

# Explicit memory management
free(A);  # Free memory when done
free(B);

Performance Tuning

manifoldscript
# Use optimized operations
pragma optimize_level = 3;
pragma use_shared_memory = true;
pragma max_threads_per_block = 1024;

# Block operations for better performance
block_size = 32;
tensor C[1024, 1024] = blocked_matmul(A, B, block_size);
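The idea behind blocked (tiled) multiplication is to work on small tiles that fit in fast shared memory, accumulating partial products per tile. A minimal NumPy sketch of the technique (the function name mirrors the DSL call above, but the implementation here is illustrative):

```python
import numpy as np

def blocked_matmul(A, B, block):
    """Tiled matrix multiply: each (block x block) tile of C is built
    from matching tiles of A and B, the pattern a GPU uses so tiles
    can stay resident in shared memory."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                C[i:i+block, j:j+block] += (
                    A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
                )
    return C

rng = np.random.default_rng(0)
A = rng.random((128, 128))
B = rng.random((128, 128))
C = blocked_matmul(A, B, 32)   # identical result to A @ B, up to rounding
```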

Common Patterns

Neural Network Layer

manifoldscript
# Simple neural network layer
function dense_layer(input[batch, in_features],
                     weights[in_features, out_features],
                     bias[out_features]) {
    return input @ weights + bias;
}

# Usage
tensor X[64, 784] = random(64, 784);    # Input batch
tensor W[784, 256] = random(784, 256);  # Weights
tensor b[256] = zeros(256);             # Bias
tensor output = dense_layer(X, W, b);
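The same layer in NumPy terms (a sketch of the semantics, assuming the bias broadcasts across the batch dimension as in most tensor libraries):

```python
import numpy as np

def dense_layer(x, w, b):
    # y = xW + b; the (256,) bias broadcasts over the batch of 64 rows
    return x @ w + b

rng = np.random.default_rng(0)
X = rng.random((64, 784))   # input batch
W = rng.random((784, 256))  # weights
b = np.zeros(256)           # bias
out = dense_layer(X, W, b)
print(out.shape)   # (64, 256)
```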

Convolution Operation

manifoldscript
# Convolution operation
function conv2d(input[H, W, C_in],
                filters[K, K, C_in, C_out],
                stride = 1,
                padding = 0) {
    H_out = (H + 2 * padding - K) / stride + 1;
    W_out = (W + 2 * padding - K) / stride + 1;

    tensor output[H_out, W_out, C_out];

    # Naive convolution implementation
    for h = 0 to H_out-1:
        for w = 0 to W_out-1:
            for c_out = 0 to C_out-1:
                sum = 0.0;
                for k1 = 0 to K-1:
                    for k2 = 0 to K-1:
                        for c_in = 0 to C_in-1:
                            sum += input[h*stride+k1, w*stride+k2, c_in] *
                                   filters[k1, k2, c_in, c_out];
                output[h, w, c_out] = sum;

    return output;
}
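The output-size formula and the loop nest above can be checked with a direct NumPy translation (illustrative only; a tiny all-ones filter makes the result easy to verify by hand, since each output entry is just the sum of a 3x3 window):

```python
import numpy as np

def conv2d(inp, filters, stride=1, padding=0):
    """Naive HWC convolution mirroring the loop nest above."""
    H, W, C_in = inp.shape
    K, K2, C_in2, C_out = filters.shape
    assert K == K2 and C_in == C_in2
    if padding:
        inp = np.pad(inp, ((padding, padding), (padding, padding), (0, 0)))
    H_out = (H + 2 * padding - K) // stride + 1
    W_out = (W + 2 * padding - K) // stride + 1
    out = np.zeros((H_out, W_out, C_out))
    for h in range(H_out):
        for w in range(W_out):
            patch = inp[h*stride:h*stride+K, w*stride:w*stride+K, :]
            for c in range(C_out):
                out[h, w, c] = np.sum(patch * filters[:, :, :, c])
    return out

# 4x4 single-channel input, 3x3 filter of ones, stride 1, no padding:
# output size is (4 - 3) / 1 + 1 = 2 in each spatial dimension.
x = np.arange(16, dtype=float).reshape(4, 4, 1)
f = np.ones((3, 3, 1, 1))
y = conv2d(x, f)
print(y.shape)      # (2, 2, 1)
print(y[0, 0, 0])   # sum of the top-left 3x3 window: 0+1+2+4+5+6+8+9+10 = 45.0
```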

Debugging and Testing

Debugging Tools

bash
# Enable debug mode
manifoldscript compile --debug hello.ms

# Profile execution
manifoldscript compile --profile hello.ms

# Check memory usage
manifoldscript compile --memory-profile hello.ms

# Generate intermediate representations
manifoldscript compile --emit-ir --emit-asm hello.ms

Testing Your Code

manifoldscript
# Test tensor operations
test "matrix multiplication" {
    tensor A[2, 2] = [[1, 2], [3, 4]];
    tensor B[2, 2] = [[5, 6], [7, 8]];
    tensor C = A @ B;

    expected = [[19, 22], [43, 50]];
    assert(allclose(C, expected, 1e-6));
}

# Performance test
test "performance benchmark" {
    tensor A[1024, 1024] = random(1024, 1024);
    tensor B[1024, 1024] = random(1024, 1024);

    start_time = time();
    tensor C = A @ B;
    end_time = time();

    assert(end_time - start_time < 100.0);  # Less than 100 ms
}
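The expected values in the first test can be verified by hand or with a quick NumPy check (for instance, the top-left entry is 1·5 + 2·7 = 19):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = A @ B
print(C)   # [[19 22]
           #  [43 50]]
```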

Next Steps