nemo-lens

Early development: This library is under active development. Expect breaking changes between releases.

Shared OpenTelemetry instrumentation library for the NVIDIA NeMo ecosystem (Megatron-LM, NeMo-RL, NeMo-Gym).

Provides unified tracing, metrics, and log bridging across distributed training jobs. Instrumentation is cheap when disabled: a group-gated call (managed_span or @trace_fn) costs a single frozenset lookup when its span group is off. In that case managed_span yields None while its body still runs, and @trace_fn simply calls the wrapped function. span_cm is always on and is not gated. Only opentelemetry-api (a no-op implementation) is required at import time; the full SDK is loaded only on exporting ranks.
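To make the disabled-path cost concrete, here is a minimal sketch of a group-gated context manager. It is not nemo-lens's implementation: `ENABLED_GROUPS`, `sketch_managed_span`, `_SketchSpan`, and `RECORDED` are hypothetical stand-ins, and the "span" is a plain object rather than an OpenTelemetry span.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class _SketchSpan:
    """Hypothetical stand-in for an OpenTelemetry span."""

    def __init__(self, name: str, attributes: dict) -> None:
        self.name = name
        self.attributes = dict(attributes)

    def set_attribute(self, key: str, value: object) -> None:
        self.attributes[key] = value


# Hypothetical: the enabled span groups, resolved once at setup time.
ENABLED_GROUPS: frozenset = frozenset({"job", "checkpoint", "evaluate"})
RECORDED: list = []  # stand-in for an exporter


@contextmanager
def sketch_managed_span(group: str, name: str, **attrs) -> Iterator[Optional[_SketchSpan]]:
    # The only work on the disabled path: one frozenset membership test.
    if group not in ENABLED_GROUPS:
        yield None  # the caller's body still runs; it just gets no span
        return
    span = _SketchSpan(name, attrs)
    try:
        yield span
    finally:
        RECORDED.append(span)  # stand-in for ending and exporting the span


# 'step' is not in the default set: span is None, but the body still runs.
with sketch_managed_span("step", "train.step", iteration=0) as span:
    body_ran = True
    if span:
        span.set_attribute("loss", 0.5)

# 'checkpoint' is enabled, so a span is created and recorded.
with sketch_managed_span("checkpoint", "ckpt.save", iteration=0) as span:
    if span:
        span.set_attribute("bytes", 1024)
```

The `if span:` guard in the caller is what lets the same code run whether or not the group is enabled.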

Install

pip install nemo-lens           # API only — no-op at runtime, no SDK overhead
pip install 'nemo-lens[sdk]'    # adds SDK + OTLP exporters, required on exporting ranks
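On an exporting rank, one way to confirm the `[sdk]` extra is present is to ask the import system whether `opentelemetry.sdk` can be found. A standard-library sketch; `otel_sdk_available` is a hypothetical helper, not part of nemo-lens, and it assumes the extra installs the `opentelemetry-sdk` package.

```python
import importlib.util


def otel_sdk_available() -> bool:
    """Return True if the OpenTelemetry SDK is importable in this environment."""
    try:
        # find_spec imports the parent package first; that raises if
        # opentelemetry itself is not installed at all.
        return importlib.util.find_spec("opentelemetry.sdk") is not None
    except ModuleNotFoundError:
        return False


print("opentelemetry-sdk importable:", otel_sdk_available())
```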

Quickstart

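import os

from nemo.lens import NemoLensConfig, setup_telemetry, managed_span

# Rank info as set by torchrun (RANK, WORLD_SIZE); substitute your launcher's values.
rank = int(os.environ.get('RANK', 0))
world_size = int(os.environ.get('WORLD_SIZE', 1))

config = NemoLensConfig.from_env()
handle = setup_telemetry(config, rank=rank, world_size=world_size)

try:
    for i in range(steps):  # `steps` and `train_step` come from your training loop
        with managed_span('step', 'train.step', iteration=i) as span:
            loss = train_step()
            if span:  # None when the 'step' group is disabled
                span.set_attribute('loss', loss)
finally:
    handle.shutdown()  # always shut down, even if training raises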

Enable with environment variables:

NEMO_LENS_ENABLED=1
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
NEMO_LENS_SPAN_GROUPS=per_step   # includes the 'step' group used above; the default set {job,checkpoint,evaluate} does not
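The README does not spell out how NEMO_LENS_SPAN_GROUPS is parsed. Assuming a comma-separated list in which preset names expand to several groups, resolving it into the frozenset that gating checks against might look like this. `parse_span_groups` and `PRESETS` are hypothetical, and the members of `per_step` other than 'step' are a guess.

```python
import os
from typing import Mapping

# The default groups when NEMO_LENS_SPAN_GROUPS is unset (from the comment above).
DEFAULT_GROUPS = frozenset({"job", "checkpoint", "evaluate"})

# Hypothetical preset table: the README only says 'per_step' includes 'step';
# keeping the defaults alongside it is an assumption.
PRESETS: Mapping[str, frozenset] = {
    "per_step": DEFAULT_GROUPS | {"step"},
}


def parse_span_groups(env: Mapping[str, str] = os.environ) -> frozenset:
    """Resolve NEMO_LENS_SPAN_GROUPS (comma-separated format assumed) to a frozenset."""
    raw = env.get("NEMO_LENS_SPAN_GROUPS", "").strip()
    if not raw:
        return DEFAULT_GROUPS
    groups: set = set()
    for token in (t.strip() for t in raw.split(",")):
        if token:
            # Expand a preset name; otherwise treat the token as a literal group.
            groups |= PRESETS.get(token, {token})
    return frozenset(groups)
```

Resolving to a frozenset once, at setup, is what keeps each later gate check to a single membership test.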

Three instrumentation primitives

| Primitive | Use when |
|---|---|
| `managed_span(group, name, **attrs)` | Context manager; group-gated, yields `None` when disabled |
| `@trace_fn(group, name)` | Decorator; same gating, no re-indentation |
| `span_cm(name, tracer=...)` | Always-on context manager; use for top-level spans |
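The decorator form applies the same gate around a function call, so instrumenting a function needs no re-indentation. A minimal sketch, assuming the check happens on each call; `sketch_trace_fn`, `ENABLED_GROUPS`, and `TRACED_CALLS` are hypothetical, and appending a name stands in for opening a real span.

```python
import functools
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)

# Hypothetical: enabled groups resolved at setup time.
ENABLED_GROUPS = frozenset({"job", "checkpoint", "evaluate"})
TRACED_CALLS: list = []


def sketch_trace_fn(group: str, name: str) -> Callable[[F], F]:
    """Group-gated decorator sketch; not the nemo-lens implementation."""

    def decorator(fn: F) -> F:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if group not in ENABLED_GROUPS:
                return fn(*args, **kwargs)  # disabled: plain call, no span
            TRACED_CALLS.append(name)  # stand-in for opening a span around the call
            return fn(*args, **kwargs)

        return wrapper  # type: ignore[return-value]

    return decorator


@sketch_trace_fn("checkpoint", "ckpt.save")
def save_checkpoint(step: int) -> str:
    return f"saved-{step}"


@sketch_trace_fn("step", "train.step")
def train_step(step: int) -> int:
    return step + 1
```

`functools.wraps` keeps the wrapped function's name and docstring, so stack traces and introspection look the same with or without instrumentation.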

Distributed training

By default only one rank exports: the single_rank strategy, which selects the last rank. Change this with:

NEMO_LENS_EXPORT_STRATEGY=all_ranks            # every rank
NEMO_LENS_EXPORT_STRATEGY=sampled              # fraction via NEMO_LENS_EXPORT_SAMPLE_RATE
NEMO_LENS_EXPORT_STRATEGY=first_rank_per_node  # one rank per node (LOCAL_RANK=0)
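The rank-selection rules above reduce to a small predicate. This sketch is not the library's code: `should_export` is hypothetical, and for `sampled` it assumes an independent, seeded per-rank draw compared against NEMO_LENS_EXPORT_SAMPLE_RATE, since the README does not say how sampling is done.

```python
import random


def should_export(
    strategy: str,
    rank: int,
    world_size: int,
    local_rank: int = 0,
    sample_rate: float = 0.0,
    seed: int = 0,
) -> bool:
    """Decide whether this rank exports telemetry (illustrative sketch)."""
    if strategy == "single_rank":
        return rank == world_size - 1  # default: the last rank only
    if strategy == "all_ranks":
        return True
    if strategy == "first_rank_per_node":
        return local_rank == 0  # LOCAL_RANK=0 on each node
    if strategy == "sampled":
        # Assumption: a deterministic per-rank draw, so restarts pick the same ranks.
        return random.Random(seed * world_size + rank).random() < sample_rate
    raise ValueError(f"unknown export strategy: {strategy!r}")
```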

Custom strategies (your own rank-selection logic) are supported via register_export_strategy — see docs/user-guide/custom-strategies.md.

Local observability stack

docker compose -f docker-compose.otel.yml up -d
# Jaeger   → http://localhost:16686
# Grafana  → http://localhost:3000
# Kibana   → http://localhost:5601
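Before setting NEMO_LENS_ENABLED=1 against the local stack, it can help to confirm that something is listening at OTEL_EXPORTER_OTLP_ENDPOINT. A standard-library sketch; `otlp_endpoint_reachable` is hypothetical, it only checks that the TCP port accepts connections (not that it speaks OTLP), and it assumes the compose stack exposes a collector at the endpoint used earlier (http://localhost:4317).

```python
import socket
from urllib.parse import urlsplit


def otlp_endpoint_reachable(endpoint: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the endpoint's host:port succeeds."""
    parts = urlsplit(endpoint)
    host = parts.hostname or "localhost"
    port = parts.port or 4317  # fall back to the OTLP/gRPC default port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `otlp_endpoint_reachable(os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"])` after `docker compose ... up -d` should return True once the collector is up.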

Development

git clone <repo-url> && cd lens
uv venv && uv pip install -e . --group dev
pre-commit install
pytest

Docs

Full documentation: cd docs && make serve (requires pip install --group docs -e .).
