mstar.utils.profiler

Utilities for annotating code with NVTX ranges and instant markers when profiling with Nsight Systems (nsys).

Functions

`mark`(name)	Emit an instant NVTX marker without CUDA synchronization.
`nvtx_range`(name, *[, synchronize])	Convenience context manager for range_push/range_pop.
`range_pop`(*[, synchronize])	Pop the current NVTX range, optionally syncing before the marker.
`range_push`(name, *[, synchronize])	Push an NVTX range, optionally syncing before the marker.

mstar.utils.profiler.mark(name)

Emit an instant NVTX marker without CUDA synchronization.

Parameters: name (str)
Return type: None
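A minimal sketch of what `mark` could look like, assuming it is a thin wrapper over PyTorch's `torch.cuda.nvtx.mark`. The actual mstar source may differ. `TRACE` is a hypothetical in-memory log added only so the sketch runs and can be inspected without a GPU or nsys. When PyTorch or CUDA is unavailable, the sketch records the event but skips the NVTX call.

```python
from typing import List

try:
    import torch
    _CUDA = torch.cuda.is_available()
except ImportError:  # no PyTorch: the sketch degrades to a trace-only no-op
    torch = None
    _CUDA = False

# Hypothetical trace, only for inspecting the sketch without nsys.
TRACE: List[str] = []


def mark(name: str) -> None:
    """Emit an instant NVTX marker. Never calls torch.cuda.synchronize()."""
    TRACE.append(f"mark:{name}")
    if _CUDA:
        # Zero-duration event; the host does not wait for the GPU.
        torch.cuda.nvtx.mark(name)


# Usage: tag each training step with an instant marker in the timeline.
for step in range(3):
    mark(f"step {step}")
```

Because no synchronization happens, a marker like this is cheap enough to leave in a hot loop. It records when the host reached that point, not when the GPU finished earlier work.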

mstar.utils.profiler.nvtx_range(name, *, synchronize=False)

Convenience context manager that calls range_push on entry and range_pop on exit.

Parameters:

name (str)
synchronize (bool)

Return type:

Iterator[None]
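A minimal sketch of how such a context manager could be built on push/pop. It assumes `torch.cuda.nvtx.range_push`/`range_pop` as the backend and forwards `synchronize` to both ends. The helpers `_push`/`_pop` and the `TRACE` list are hypothetical stand-ins, not mstar's code. Popping inside a `finally` clause is a design choice of this sketch: it keeps the NVTX stack balanced even if the body raises. The documentation here does not say whether mstar's version does the same.

```python
from contextlib import contextmanager
from typing import Iterator, List

try:
    import torch
    _CUDA = torch.cuda.is_available()
except ImportError:  # no PyTorch: record events only
    torch = None
    _CUDA = False

TRACE: List[str] = []  # hypothetical trace for inspecting call order


def _push(name: str, synchronize: bool) -> None:
    if synchronize and _CUDA:
        torch.cuda.synchronize()  # wait for queued GPU work before the marker
    TRACE.append(f"push:{name}")
    if _CUDA:
        torch.cuda.nvtx.range_push(name)


def _pop(synchronize: bool) -> None:
    if synchronize and _CUDA:
        torch.cuda.synchronize()
    TRACE.append("pop")
    if _CUDA:
        torch.cuda.nvtx.range_pop()


@contextmanager
def nvtx_range(name: str, *, synchronize: bool = False) -> Iterator[None]:
    _push(name, synchronize)
    try:
        yield
    finally:
        # Pop even if the body raises, so later ranges are not mis-nested.
        _pop(synchronize)


# Usage: nested ranges, then a range whose body fails.
with nvtx_range("forward"):
    with nvtx_range("attention"):
        pass  # GPU work would be launched here

try:
    with nvtx_range("backward"):
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the "backward" range was still popped by the finally clause
```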

mstar.utils.profiler.range_pop(*, synchronize=False)

Pop the current NVTX range, optionally syncing before the marker.

The synchronize flag has the same semantics as in range_push and likewise defaults to False.

Parameters: synchronize (bool)
Return type: None

mstar.utils.profiler.range_push(name, *, synchronize=False)

Push an NVTX range, optionally syncing before the marker.

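Default is synchronize=False, so adding NVTX markers doesn't serialize CPU and GPU execution. The marker is emitted as soon as the host reaches the call, which means GPU work launched inside the range may still be running when the range closes. Set synchronize=True only when the caller specifically wants the range to extend over the GPU work it wraps (e.g. an ad-hoc benchmark of one kernel). Each synchronize=True call runs torch.cuda.synchronize(), which blocks the host until all queued work on the current CUDA device has finished, not just the wrapped kernel.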

Parameters:

name (str)
synchronize (bool)

Return type:

None
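A minimal sketch of how range_push and range_pop could honour `synchronize`, assuming they wrap `torch.cuda.nvtx.range_push`/`range_pop` and call `torch.cuda.synchronize()` before the marker, as described for range_push. The `_maybe_sync` helper and the `TRACE` list are hypothetical. `"sync"` is recorded wherever a device-wide synchronization would happen, so the ordering can be checked on a machine without a GPU.

```python
from typing import List

try:
    import torch
    _CUDA = torch.cuda.is_available()
except ImportError:  # no PyTorch: record events only
    torch = None
    _CUDA = False

TRACE: List[str] = []  # hypothetical trace; "sync" marks a device-wide sync point


def _maybe_sync(synchronize: bool) -> None:
    if synchronize:
        TRACE.append("sync")
        if _CUDA:
            # Blocks the host until all queued work on the current device
            # finishes, not only the kernel being wrapped.
            torch.cuda.synchronize()


def range_push(name: str, *, synchronize: bool = False) -> None:
    _maybe_sync(synchronize)  # the sync happens before the marker
    TRACE.append(f"push:{name}")
    if _CUDA:
        torch.cuda.nvtx.range_push(name)


def range_pop(*, synchronize: bool = False) -> None:
    _maybe_sync(synchronize)
    TRACE.append("pop")
    if _CUDA:
        torch.cuda.nvtx.range_pop()


# Default: markers are emitted on the host without waiting for the GPU.
range_push("step")
range_pop()

# Ad-hoc benchmark of one kernel: sync at both ends so the range brackets the GPU work.
range_push("matmul", synchronize=True)
if _CUDA:
    a = torch.randn(1024, 1024, device="cuda")
    b = a @ a  # launched asynchronously; the sync in range_pop waits for it
range_pop(synchronize=True)
```

Syncing on push drains work queued before the range, so it is not counted. Syncing on pop waits for the wrapped kernel, so the range covers it. Both syncs stall the host, which is why this sketch, like the documented default, leaves them off.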
