When optimising GPU workloads, the first step is profiling: measuring where the chip spends most of its time. These notes are about optimising high-performance machine learning.
There are two ways to profile:
- TensorBoard: gives a high-level view of the whole program (e.g. "the GPU is waiting for the CPU to load images from the hard drive").
- Nsight Compute (NVIDIA): gives a microscopic, per-kernel view (e.g. checking whether a kernel uses its registers efficiently).
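The high-level question TensorBoard answers ("which stage dominates each step?") can be illustrated with a minimal sketch. This is not TensorBoard itself: `StageTimer`, `load_batch` and `train_step` are hypothetical names, `time.sleep` stands in for disk I/O, and a Python loop stands in for GPU work. It only measures host wall-clock time per stage. A real profiler also records device-side activity, and because GPU kernels launch asynchronously, timing real GPU code this way would need an explicit synchronisation before reading the clock.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: accumulates wall-clock time per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time for this stage, even if the body raised.
            self.totals[name] += time.perf_counter() - start

    def report(self):
        """Return each stage's share of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        return {name: t / total for name, t in self.totals.items()}


def load_batch():
    time.sleep(0.02)  # stand-in for reading and decoding images on the CPU
    return list(range(1000))


def train_step(batch):
    return sum(x * x for x in batch)  # stand-in for the GPU forward/backward pass


timer = StageTimer()
for _ in range(5):
    with timer.stage("data_loading"):
        batch = load_batch()
    with timer.stage("compute"):
        train_step(batch)

# Print stages from most to least expensive.
for name, frac in sorted(timer.report().items(), key=lambda kv: -kv[1]):
    print(f"{name}: {frac:.0%}")
```

In this toy setup the data-loading stage dominates, which is the "GPU starved by the input pipeline" pattern the high-level view is meant to expose. In that situation it usually pays to fix the input pipeline before reaching for a kernel-level tool such as Nsight Compute.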