If you're building AI models and working with PyTorch, GPU kernels, or performance tuning… you probably know the pain of profiling.
- Running a trace on a remote GPU…
- Downloading 200MB+ files…
- Opening Nsight or Perfetto locally…
- Still not knowing which line of code triggered the slowdown.
- Then branching your code to sprinkle profiling markers, re-running everything, and waiting 10–30 minutes for each iteration.
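That marker-sprinkling step usually looks something like this with PyTorch's built-in profiler: every region you want attributed in the trace means another hand edit and another re-run. (A minimal CPU-only sketch; the model and region names are illustrative.)

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = torch.nn.Linear(256, 256)
x = torch.randn(32, 256)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    # Hand-inserted marker: this region shows up as a named row in the trace.
    with record_function("my_forward_pass"):
        y = model(x)

# Inspect where the time went, attributed to the labelled region.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

On CUDA, the equivalent edit is wrapping regions with `torch.cuda.nvtx.range_push(...)` / `range_pop()` so Nsight can pick up the labels.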
The whole workflow feels like wading through molasses.
We've felt this firsthand at nCompass, and it slowed us down more than it should. So we built the tool we always wished existed: ncprof.
ncprof brings GPU + AI performance profiling directly into VSCode/Cursor:
- Add TorchRecord / NVTX markers without editing your source code
- View traces inside the IDE with an integrated Perfetto viewer
- Jump from any trace event → the exact source line
- View traces where they're stored, no more shuttling files around
If you've ever thought, "Why is this GPU op so slow?" or "Where is this memory allocation coming from?", ncprof makes the answer visible.
Install in VSCode / Cursor: search for nCompass in the Extensions Marketplace
SDK & examples, Pip
Docs
Discussion
We would love to hear what you think as you try it out!