If you're building AI models and working with PyTorch, GPU kernels, or performance tuning… you probably know the pain of profiling.
- Running a trace on a remote GPU…
- Downloading 200MB+ files…
- Opening Nsight or Perfetto locally…
- Still not knowing which line of code triggered the slowdown.
- Then branching your code to sprinkle profiling markers, re-running everything, and waiting 10–30 minutes for each iteration.
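That marker-sprinkling step usually looks something like this with PyTorch's built-in profiler: every region you want attributed in the trace means another hand edit and another re-run. (A minimal CPU-only sketch; the model and region names are illustrative.)

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = torch.nn.Linear(256, 256)
x = torch.randn(32, 256)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    # Hand-inserted marker: this region shows up as a named row in the trace.
    with record_function("my_forward_pass"):
        y = model(x)

# Inspect where the time went, attributed to the labelled region.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

On CUDA, the equivalent edit is wrapping regions with `torch.cuda.nvtx.range_push(...)` / `range_pop()` so Nsight can pick up the labels.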
The whole workflow feels like wading through molasses.
We've felt this firsthand at nCompass, and it slowed us down more than it should. So we built the tool we always wished existed: ncprof.
ncprof brings GPU + AI performance profiling directly into VSCode/Cursor:
- Add TorchRecord / NVTX markers without editing your source code
- View traces inside the IDE with an integrated Perfetto viewer
- Jump from any trace event → the exact source line
- View traces where they're stored, no more shuttling files around
If you've ever thought, "Why is this GPU op so slow?" or "Where is this memory allocation coming from?", ncprof makes the answer visible.
Install in VSCode / Cursor: search for nCompass in the Extensions Marketplace
SDK & examples, Pip
Docs
Discussion
We would love to hear what you think as you try it out!