mstar.utils.profiler
Utilities for NVTX range annotations for profiling with nsys.
Functions
| mark | Emit an instant NVTX marker without CUDA synchronization. |
| nvtx_range | Convenience context manager for range_push/range_pop. |
| range_pop | Pop the current NVTX range, optionally syncing before the marker. |
| range_push | Push an NVTX range, optionally syncing before the marker. |
- mstar.utils.profiler.mark(name)
Emit an instant NVTX marker without CUDA synchronization.
- Parameters:
name (str)
- Return type:
None
- mstar.utils.profiler.nvtx_range(name, *, synchronize=False)
Convenience context manager for range_push/range_pop.
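As a rough illustration of what a context manager over range_push/range_pop can look like, here is a minimal sketch. It is not the library's implementation: `nvtx_range_sketch`, the `events` list, and the two stub functions are hypothetical stand-ins that only record calls, so the example runs without a GPU or the mstar package. The try/finally is an assumption about a sensible design, chosen so that ranges stay balanced even if the wrapped code raises.

```python
from contextlib import contextmanager

events = []  # hypothetical stand-in for the profiler timeline


def range_push(name, *, synchronize=False):
    # Stub: the real function opens an NVTX range; here we only record the call.
    events.append(("push", name, synchronize))


def range_pop(*, synchronize=False):
    # Stub: the real function closes the innermost open range.
    events.append(("pop", synchronize))


@contextmanager
def nvtx_range_sketch(name, *, synchronize=False):
    range_push(name, synchronize=synchronize)
    try:
        yield
    finally:
        # Pop even when the body raises, so pushes and pops stay paired.
        range_pop(synchronize=synchronize)


# Nested ranges close in reverse order of opening.
with nvtx_range_sketch("forward"):
    with nvtx_range_sketch("attention"):
        pass
```

With the real module, the equivalent call would presumably be `with nvtx_range("forward"): ...`, using the signature documented above.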
- mstar.utils.profiler.range_pop(*, synchronize=False)
Pop the current NVTX range, optionally syncing before the marker.
Same semantics as range_push: the default is synchronize=False.
- Parameters:
synchronize (bool)
- Return type:
None
- mstar.utils.profiler.range_push(name, *, synchronize=False)
Push an NVTX range, optionally syncing before the marker.
The default is synchronize=False, so adding NVTX markers doesn't serialize execution. Set synchronize=True only when the caller specifically wants the range to extend over the GPU work it wraps (e.g. an ad-hoc benchmark of one kernel), and remember that each synchronize=True call goes through torch.cuda.synchronize(), which waits for all queued work on the current CUDA device, not just the wrapped kernel.
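The "sync before the marker" ordering can be sketched as follows. This is an illustrative stand-in, not the module's source: `range_push_sketch` is a hypothetical name, and the `sync` and `emit` parameters are injected callables assumed here to play the roles of torch.cuda.synchronize() and the underlying NVTX push call, so the example runs on a machine without CUDA.

```python
def range_push_sketch(name, *, synchronize=False, sync, emit):
    # Optionally wait for outstanding GPU work first, so the marker
    # lands after everything queued before it has finished.
    if synchronize:
        sync()
    emit(name)


log = []


def fake_sync():
    # Stand-in for torch.cuda.synchronize(); records that a sync happened.
    log.append("sync")


def fake_emit(name):
    # Stand-in for the NVTX range push; records the range name.
    log.append("push:" + name)


# Default: marker only, no synchronization, so the host keeps running ahead.
range_push_sketch("step", sync=fake_sync, emit=fake_emit)

# Ad-hoc benchmark: a device-wide sync happens before the marker is emitted.
range_push_sketch("bench_kernel", synchronize=True, sync=fake_sync, emit=fake_emit)
```

In this sketch the default call never touches `sync`, which mirrors why synchronize=False keeps the markers cheap; the second call pays for a full synchronization each time it runs.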