~/leocamacho.co

GPU

Made Jul 15, 2026 · modified Jul 15, 2026 · 1 min read

Why GPUs go Vroom:

Thousands of simple cores, grouped into Streaming Multiprocessors (NVIDIA) or Compute Units (AMD), designed for parallel tasks.
SIMT Architecture (Single Instruction, Multiple Threads): Threads are grouped into warps (NVIDIA, 32 threads) or wavefronts (AMD), and every thread in a group executes the same instruction on different data simultaneously.
High Bandwidth Memory (HBM): DRAM dies stacked in the same package as the GPU die and connected over a very wide interface, for massive data throughput.
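Tensor Cores: Specialized hardware units hardwired to perform small matrix multiply-accumulate (MMA) operations, D = A×B + C. In NVIDIA's Volta generation each Tensor Core handles a 4x4 tile per clock cycle; later generations do more work per clock.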
Latency Hiding: Each multiprocessor keeps many warps resident with their registers on chip, so the scheduler can switch to a ready warp at almost no cost and keep the math cores busy while other warps wait on memory fetches.
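
To make the Tensor Cores and Latency Hiding points above concrete, here is a rough Python sketch. It is plain CPU code that only mirrors the arithmetic, not how the hardware is programmed. `mma_4x4` spells out a 4x4 multiply-accumulate, and `warps_to_hide_latency` is a hypothetical back-of-envelope estimate. Both function names and the cycle counts are assumptions made up for illustration, not measurements from any real GPU.

```python
import math


def mma_4x4(a, b, c):
    """Return D = A @ B + C for 4x4 matrices given as lists of lists."""
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = c[i][j]  # start from the accumulator tile
            for k in range(n):
                acc += a[i][k] * b[k][j]  # one multiply-add
            d[i][j] = acc
    return d


def warps_to_hide_latency(memory_latency_cycles, compute_cycles_per_warp):
    """Hypothetical estimate of resident warps needed to keep the cores busy.

    Simplified model: each warp computes for a while, then stalls on memory.
    While one warp waits, the others must supply enough math to fill the gap.
    """
    return math.ceil(memory_latency_cycles / compute_cycles_per_warp) + 1


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]

d = mma_4x4(identity, ones, ones)  # I @ ones + ones: every entry is 2.0
fma_count = 4 * 4 * 4              # 64 multiply-adds per 4x4 tile

# Assumed, illustrative numbers: 400-cycle fetch, 20 cycles of math per warp.
needed = warps_to_hide_latency(400, 20)  # 21 warps under this toy model
```

The triple loop does 64 multiply-adds one at a time; the point of a Tensor Core is that the whole tile is computed as a single hardware operation. The warp estimate is deliberately crude, but it shows why a GPU wants far more threads in flight than it has math units.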

