🧭 Public Launch Post

LinkedIn Post

If you’re building AI models and working with PyTorch, GPU kernels, or performance tuning… you probably know the pain of profiling.

  • Running a trace on a remote GPU…
  • Downloading 200MB+ files…
  • Opening Nsight or Perfetto locally…
  • Still not knowing which line of code triggered the slowdown.
  • Then branching your code to sprinkle profiling markers, re-running everything, and waiting 10–30 minutes for each iteration.
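For anyone who hasn’t lived the marker step: “sprinkling profiling markers” means hand-wrapping every region you suspect, roughly like this pure-Python sketch (the `marker` helper is illustrative only, standing in for `torch.profiler.record_function` or NVTX range push/pop):

```python
import time
from contextlib import contextmanager

@contextmanager
def marker(name, spans):
    # Stand-in for a profiling range marker (e.g. torch.profiler.record_function
    # or an NVTX range): records how long the wrapped region took.
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, time.perf_counter() - start))

spans = []
with marker("forward_pass", spans):
    sum(i * i for i in range(100_000))  # stand-in for real model/GPU work

for name, seconds in spans:
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Multiply that wrapping by every suspect region, plus a re-run per hypothesis, and the 10–30 minute iteration loop adds up fast.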

The whole workflow feels like wading through molasses.

We’ve felt this firsthand at nCompass, and it slowed us down more than it should have. So we built the tool we always wished existed: 𝗻𝗰𝗽𝗿𝗼𝗳.

๐—ป๐—ฐ๐—ฝ๐—ฟ๐—ผ๐—ณ ๐—ฏ๐—ฟ๐—ถ๐—ป๐—ด๐˜€ ๐—š๐—ฃ๐—จ + ๐—”๐—œ ๐—ฝ๐—ฒ๐—ฟ๐—ณ๐—ผ๐—ฟ๐—บ๐—ฎ๐—ป๐—ฐ๐—ฒ ๐—ฝ๐—ฟ๐—ผ๐—ณ๐—ถ๐—น๐—ถ๐—ป๐—ด ๐—ฑ๐—ถ๐—ฟ๐—ฒ๐—ฐ๐˜๐—น๐˜† ๐—ถ๐—ป๐˜๐—ผ ๐—ฉ๐—ฆ๐—–๐—ผ๐—ฑ๐—ฒ/๐—–๐˜‚๐—ฟ๐˜€๐—ผ๐—ฟ:
:sparkles: Add TorchRecord / NVTX markers ๐˜„๐—ถ๐˜๐—ต๐—ผ๐˜‚๐˜ ๐—ฒ๐—ฑ๐—ถ๐˜๐—ถ๐—ป๐—ด ๐˜†๐—ผ๐˜‚๐—ฟ ๐˜€๐—ผ๐˜‚๐—ฟ๐—ฐ๐—ฒ ๐—ฐ๐—ผ๐—ฑ๐—ฒ
:sparkles: View traces ๐—ถ๐—ป๐˜€๐—ถ๐—ฑ๐—ฒ ๐˜๐—ต๐—ฒ ๐—œ๐——๐—˜ with an integrated Perfetto viewer
:sparkles: Jump from ๐—ฎ๐—ป๐˜† ๐˜๐—ฟ๐—ฎ๐—ฐ๐—ฒ ๐—ฒ๐˜ƒ๐—ฒ๐—ป๐˜ โ†’ ๐˜๐—ต๐—ฒ ๐—ฒ๐˜…๐—ฎ๐—ฐ๐˜ ๐˜€๐—ผ๐˜‚๐—ฟ๐—ฐ๐—ฒ ๐—น๐—ถ๐—ป๐—ฒ
:sparkles: View traces where theyโ€™re stored โ€” no more shuttling files around

If you’ve ever thought, “Why is this GPU op so slow?” or “Where is this memory allocation coming from?”, ncprof makes the answer visible.

💡 𝗜𝗻𝘀𝘁𝗮𝗹𝗹 𝗶𝗻 𝗩𝗦𝗖𝗼𝗱𝗲 / 𝗖𝘂𝗿𝘀𝗼𝗿: Search for nCompass in the Extensions Marketplace
🧰 𝗦𝗗𝗞 & 𝗲𝘅𝗮𝗺𝗽𝗹𝗲𝘀: available via pip
📘 𝗗𝗼𝗰𝘀
💬 𝗗𝗶𝘀𝗰𝘂𝘀𝘀𝗶𝗼𝗻

We’d love to hear what you think as you try it out!