mstar.utils.profiler

Utilities for NVTX range annotations for profiling with nsys.

Functions

mark(name)

Emit an instant NVTX marker without CUDA synchronization.

nvtx_range(name, *[, synchronize])

Convenience context manager for range_push/range_pop.

range_pop(*[, synchronize])

Pop the current NVTX range, optionally syncing before the marker.

range_push(name, *[, synchronize])

Push an NVTX range, optionally syncing before the marker.

mstar.utils.profiler.mark(name)

Emit an instant NVTX marker without CUDA synchronization.

Parameters:

name (str)

Return type:

None

mstar.utils.profiler.nvtx_range(name, *, synchronize=False)

Convenience context manager for range_push/range_pop.

Parameters:

name (str)

synchronize (bool)

Return type:

Iterator[None]
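
The source does not show how nvtx_range is implemented. As a minimal sketch of the pattern, assuming it simply pairs range_push with range_pop inside a try/finally, it might look like the following. The range_push and range_pop functions here are hypothetical stand-ins that record calls in a list, so the example runs without a GPU; they are not the module's real functions.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional, Tuple

# Hypothetical stand-ins for range_push/range_pop: they log each call
# instead of emitting NVTX events, so this sketch runs without CUDA.
events: List[Tuple[str, Optional[str], bool]] = []


def range_push(name: str, *, synchronize: bool = False) -> None:
    events.append(("push", name, synchronize))


def range_pop(*, synchronize: bool = False) -> None:
    events.append(("pop", None, synchronize))


@contextmanager
def nvtx_range(name: str, *, synchronize: bool = False) -> Iterator[None]:
    # Assumption: the same synchronize flag is forwarded to both calls.
    range_push(name, synchronize=synchronize)
    try:
        yield
    finally:
        # Always pop, even if the body raises, so ranges stay balanced.
        range_pop(synchronize=synchronize)


# Nested ranges close in reverse order of opening.
with nvtx_range("forward"):
    with nvtx_range("attention", synchronize=True):
        pass
```

The try/finally is the main reason to prefer a context manager over manual push/pop calls: an exception in the wrapped code cannot leave a range open.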

mstar.utils.profiler.range_pop(*, synchronize=False)

Pop the current NVTX range, optionally syncing before the marker.

Same semantics as range_push: the default is synchronize=False.

Parameters:

synchronize (bool)

Return type:

None

mstar.utils.profiler.range_push(name, *, synchronize=False)

Push an NVTX range, optionally syncing before the marker.

The default is synchronize=False, so adding NVTX markers does not serialize execution. Set synchronize=True only when the caller specifically wants the range to extend over the GPU work it wraps (e.g. an ad-hoc benchmark of one kernel). Remember that each synchronize=True call goes through torch.cuda.synchronize(), which waits for all queued work on the current device, not just the wrapped kernel.

Parameters:

name (str)

synchronize (bool)

Return type:

None
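
To make the "sync before the marker" behaviour concrete, here is a rough sketch of how a push/pop pair with this signature could be written on top of torch.cuda.nvtx. This is an assumption about the approach, not the module's actual source. The no-op fallback when PyTorch or CUDA is unavailable is also an addition for illustration, so the snippet can run on a machine without a GPU.

```python
try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:
    # Assumption for this sketch: silently do nothing without PyTorch.
    torch = None
    _HAS_CUDA = False


def range_push(name: str, *, synchronize: bool = False) -> None:
    if not _HAS_CUDA:
        return
    if synchronize:
        # Wait for previously queued GPU work before the range opens.
        torch.cuda.synchronize()
    torch.cuda.nvtx.range_push(name)


def range_pop(*, synchronize: bool = False) -> None:
    if not _HAS_CUDA:
        return
    if synchronize:
        # Wait for the wrapped GPU work so the range covers it.
        torch.cuda.synchronize()
    torch.cuda.nvtx.range_pop()


# Ad-hoc timing of one region: synchronize on both ends.
range_push("matmul_benchmark", synchronize=True)
# ... launch the kernel under test here ...
range_pop(synchronize=True)
```

With synchronize=False on both calls, the range only brackets the CPU-side launch, which is usually what you want when annotating a full training or inference run.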