When optimising code that runs on a GPU, you start with a profiling step to see where the chip is spending most of its time. These notes are about optimising High Performance Machine Learning workloads.

Two ways of profiling:

  • TensorBoard: Gives a high-level view of the whole training run (e.g. “Waiting for CPU to load images from hard drive”)
  • Nsight Compute (NVIDIA): Gives a microscopic, per-kernel view (e.g. checking whether you’re using registers efficiently)
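
To make the high-level view concrete, here is a minimal sketch in plain Python of the kind of question it answers: how much of each step goes to loading data and how much to compute. It is a hand-rolled timer, not TensorBoard or Nsight Compute, and `load_batch`, `train_step`, `timed`, and `profile` are hypothetical names invented for illustration. The sleep simply simulates a slow disk read.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase name.
totals = defaultdict(float)


@contextmanager
def timed(phase):
    """Add the wall-clock time spent inside the block to totals[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[phase] += time.perf_counter() - start


def load_batch():
    # Hypothetical stand-in for the CPU reading images from disk.
    time.sleep(0.02)
    return list(range(1000))


def train_step(batch):
    # Hypothetical stand-in for the work a GPU kernel would do.
    return sum(x * x for x in batch)


def profile(steps=5):
    """Run a few fake training steps and return seconds spent per phase."""
    totals.clear()
    for _ in range(steps):
        with timed("data_loading"):
            batch = load_batch()
        with timed("compute"):
            train_step(batch)
    return dict(totals)


if __name__ == "__main__":
    result = profile()
    total = sum(result.values())
    for phase, seconds in result.items():
        print(f"{phase}: {seconds:.3f}s ({100 * seconds / total:.0f}%)")
```

If data loading dominates, as it does in this simulated run, the bottleneck is the input pipeline and looking at register usage inside kernels would be premature. A real profiler gives the same kind of breakdown with far more detail and without manual instrumentation.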