feat(customizer): add RL backend in customizer plugin#479
Conversation
Signed-off-by: anubhutiv <anubhutiv@nvidia.com>
cc4e23d to
bae7dad
Compare
📝 WalkthroughWalkthroughAdds NeMo-RL build targets, plugin submission wiring, compiler/service schemas, dataset and prompt preparation, training runtime support, and file I/O/model-entity task entrypoints. ChangesNeMo-RL support stack
Sequence Diagram(s)sequenceDiagram
participant TrainingRunner
participant DistributedContext
participant NemoRLBackend
participant RayClusterBootstrap
TrainingRunner->>DistributedContext: from_env()
TrainingRunner->>NemoRLBackend: compile_config()
TrainingRunner->>NemoRLBackend: execute_training()
NemoRLBackend->>RayClusterBootstrap: run_with_driver()
TrainingRunner->>NemoRLBackend: find_best_checkpoint()
TrainingRunner->>NemoRLBackend: process_checkpoint()
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
There was a problem hiding this comment.
Actionable comments posted: 3
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docker/Dockerfile.nmp-rl-base (1)
12-45: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winDon’t publish
nmp-rl-baseas root.
docker-bake.hcltags this stage directly, so this is a shipped image, not just an internal builder. Add a non-rootUSERhere and let child images switch back to root only for the install/chown steps that need it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docker/Dockerfile.nmp-rl-base` around lines 12 - 45, The nmp-rl-base stage is being published directly, so it should not remain running as root. In the Dockerfile.nmp-rl-base stage, add a non-root USER before the image is finalized, and only switch back to root in downstream child images for the install or chown steps that require it; keep the existing VIRTUAL_ENV, chmod, and vllm removal logic in nmp-rl-base, but ensure the final shipped stage uses a non-root runtime user.Source: Linters/SAST tools
🟠 Major comments (36)
services/rl/src/nmp/rl/tasks/model_entity/run.py-154-163 (1)
154-163: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winLet transient fileset lookup failures reach the retry wrapper.
This
except Exceptionconverts API timeouts/500s fromfilesets.retrieve()intoModelEntityCreationError, so@retryoncreate_model_entity()never retries them. Only translate permanent lookup failures here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/model_entity/run.py` around lines 154 - 163, The broad exception handling around fileset validation in create_model_entity() is wrapping transient SDK/API failures into ModelEntityCreationError, preventing the retry decorator from retrying them. Update the try/except around self.sdk.files.filesets.retrieve() so only permanent “not found” or access-denied cases are translated to ModelEntityCreationError, and let timeouts/5xx/other transient exceptions propagate out of the fileset validation block.services/rl/src/nmp/rl/tasks/file_io/run.py-141-181 (1)
141-181: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRecreate progress state on each retry.
DownloadStats/UploadStatsandCompositeCallbackare built once, then reused across tenacity retries. After a partial attempt,BaseSingleFileCallback.close()has already incremented counts/bytes, so the retry double-counts progress and can leave final totals above the real transfer. Move stats/callback construction inside the retried path or reset them per attempt.Also applies to: 218-251
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/file_io/run.py` around lines 141 - 181, The download progress state is reused across retries, so a partial attempt can leave DownloadStats and the callback chain carrying stale counts into the next try. Update the retried flow in run.py around the download path that builds CompositeCallback, TqdmPerFileDownloadCallback, and FileDownloadProgressCallback so these objects are created fresh per attempt or explicitly reset before each retry. Apply the same per-attempt reset pattern to the matching upload path referenced by the comment so progress totals stay accurate after tenacity retries.services/rl/src/nmp/rl/tasks/model_entity/run.py-424-428 (1)
424-428: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winDon’t log the raw model-entity config.
config.model_dump_json()will dump inlinedeployment_config.additional_envsverbatim. That can leak deployment secrets into task logs.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/model_entity/run.py` around lines 424 - 428, The startup logging in run() is dumping the full model-entity config, which can expose deployment secrets via deployment_config.additional_envs. Update the logger.info call that prints config.model_dump_json(indent=2) so it logs a redacted or summarized config instead, and ensure any sensitive nested fields are excluded or masked before logging.services/rl/src/nmp/rl/tasks/model_entity/run.py-148-169 (1)
148-169: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winUse the configured workspace for full-model outputs.
ModelEntityTaskConfigcarries a target workspace, but this path hardcodesself.job_ctx.workspaceand passes that intomodels.create()/models.update(). Cross-workspace submissions will create or update the wrong model entity.Also applies to: 226-260
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/model_entity/run.py` around lines 148 - 169, The full-model creation/update path is using the job context workspace instead of the configured target workspace, so cross-workspace runs write to the wrong place. Update the model entity flow in model_entity/run.py to use ModelEntityTaskConfig.workspace for the full-model output path, including the call chain through _create_or_update_full_entity and any models.create/models.update logic, while keeping the fileset workspace fallback separate.services/rl/src/nmp/rl/tasks/model_entity/run.py-236-242 (1)
236-242: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winHonor
config.trust_remote_code.The full-entity request always copies
base_me.trust_remote_code, so the task config flag never affects full or merged model entities. That can persist the wrong trust policy.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/model_entity/run.py` around lines 236 - 242, The full-entity request body is always using base_me.trust_remote_code, so the task-level trust policy from config is ignored. Update the request construction in the model entity run flow to use config.trust_remote_code for full/merged model entity requests, and make sure the request_body assembly in this section consistently honors the config flag instead of copying the base model value.services/rl/src/nmp/rl/schemas.py-26-39 (1)
26-39: 🗄️ Data Integrity & Integration | 🟠 MajorForbid extra fields in the nested training schemas.
ParallelismParamsand_TrainingBasestill accept unknown keys, so typos liketraining.learningrateorparallelism.num_nodeare silently dropped and the job falls back to defaults. Addmodel_config = ConfigDict(extra="forbid")to both models.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/schemas.py` around lines 26 - 39, The nested training schemas currently accept unknown keys, so typos in submitted config fields are silently ignored and defaults are used instead. Update both ParallelismParams and _TrainingBase in schemas.py to forbid extra fields by adding model_config = ConfigDict(extra="forbid"), ensuring invalid keys under training and parallelism are rejected during validation.services/rl/src/nmp/rl/entities/values.py-19-21 (1)
19-21: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winRemove unsupported
PPOfrom the enum.
TrainingStepConfig.training.training_typeaccepts this value, but the backend only compilesDPOandGRPOand raises on anything else. That turns validation into a runtime failure in the training container.Suggested fix
class TrainingType(str, Enum): """RL training algorithm. DPO is wired; the rest are headroom.""" DPO = "dpo" GRPO = "grpo" - PPO = "ppo"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/entities/values.py` around lines 19 - 21, The TrainingType enum in values.py still exposes PPO even though the backend only supports DPO and GRPO, so remove the unsupported PPO member from the enum. Update the TrainingType definition and any related validation or mappings that reference PPO so TrainingStepConfig.training.training_type only accepts the supported values and cannot reach a runtime failure in the training container.services/rl/src/nmp/rl/app/jobs/compiler.py-263-263 (1)
263-263: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winDon't log the full submitted spec at INFO.
Line 263 serializes the entire request, and the same
integrationsobject is later mined for secret envs. That creates a normal-path secret leak into job logs.Suggested fix
- logger.info("Compiling NeMo-RL DPO job to PlatformJobSpec: %s", job_spec.model_dump_json(indent=2)) + logger.info( + "Compiling NeMo-RL DPO job to PlatformJobSpec", + extra={"model": job_spec.model, "output_model": job_spec.output.name}, + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/app/jobs/compiler.py` at line 263, The INFO log in compiler.py is serializing the full submitted job spec, which can leak secret values from the later-processed integrations payload. Update the logging in the job compilation flow around the compiler entry point to avoid dumping job_spec.model_dump_json(...) at INFO; instead log only non-sensitive high-level context (for example job identifiers or summary fields) and keep any detailed spec output out of normal-path logs.services/rl/src/nmp/rl/tasks/training/errors/parser.py-76-79 (1)
76-79: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winParse non-
*Errorexception names too.This regex never extracts
TimeoutExpired, so subprocess timeouts cannot hit theTrainingTimeoutErrorrule inerror_rules.yamland fall back to a generic runtime failure. Match generic exception class names and strip any module prefix.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/errors/parser.py` around lines 76 - 79, The exception parser in _EXCEPTION_RE only matches names ending in Error or Exception, so it misses generic exception types like TimeoutExpired. Update the regex in parser.py to capture any exception class name, and make sure the parsing logic strips any module prefix before applying the training error rules so TrainingTimeoutError can match subprocess timeout failures.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/grpo_config.py-44-47 (1)
44-47: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winDo not advertise GRPO from a stub.
NemoRLBackend.compile_config()routes GRPO jobs here, but this path still prints the full config/job context and then raisesNotImplementedError. Every GRPO submission fails here. Either implement it or reject GRPO before backend selection.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/grpo_config.py` around lines 44 - 47, GRPO is still routed into a stub in NemoRLBackend.compile_config() via the GRPO config path, which prints the full training_config/job_ctx and then fails. Remove the stub behavior in the GRPO config compiler and either implement the actual GRPO config compilation in the relevant grpo_config handling or make backend selection reject GRPO before it reaches this path; also eliminate the debug prints from the stub path.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/backend.py-126-135 (1)
126-135: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winRestore the process env after each run.
These writes mutate global state, and
MLFLOW_URIis only set when present, never cleared. A reused worker can send a later job to the previous job’s tracking backend or run with stale settings. Build a per-run env or save/restore these keys infinally.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/backend.py` around lines 126 - 135, The env setup in the Nemo RL backend mutates global process state and leaves stale values behind between runs; update the run setup around the existing environment writes in backend.py to save and restore the affected keys, or construct a per-run env passed into the execution path. Make sure the logic around BASE_LOG_DIR, GPUS_PER_NODE, and especially MLFLOW_URI in the backend entrypoint is restored in a finally block so a reused worker cannot inherit prior run settings.services/rl/src/nmp/rl/tasks/training/errors/error_rules.yaml-415-418 (1)
415-418: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winNarrow the generic OOM matcher.
contains: "out of memory"will also match host/Ray/system OOMs, so those failures get classified asCudaErrorand users get GPU-specific remediation. Keep the CUDA-specific rules and drop or remap this catch-all.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/errors/error_rules.yaml` around lines 415 - 418, The generic OOM matcher in error_rules.yaml is too broad because the contains: "out of memory" rule can catch host, Ray, or system OOMs and misclassify them as CudaError. Narrow this rule by removing it or changing it to a CUDA-specific matcher, and keep the existing CUDA-focused rules so only true GPU OOMs map to CudaError; the relevant symbols are the OOM keyword rule under the training error rules and the CudaError mapping.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/backend.py-157-165 (1)
157-165: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winPut the original signal handlers back.
This installs process-wide SIGINT/SIGTERM handlers and never restores them. After
run_with_driver()returns, later signals in the same process still hit this stale closure and can terminate the wrong driver or raiseSystemExitunexpectedly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/backend.py` around lines 157 - 165, The cleanup signal handler in bootstrap.run_with_driver setup is replacing the process-wide SIGINT/SIGTERM handlers and never restoring them, leaving a stale closure behind after the driver exits. Update the cleanup logic around the cleanup function and the signal.signal registrations to save the original handlers before installing them, then restore those original handlers after run_with_driver(str(driver_path), driver_args) completes (preferably in a finally block), so later signals do not call the old closure.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/no_override_requirements.txt-1-8 (1)
1-8: 🗄️ Data Integrity & Integration | 🟠 MajorThese markers don't pin base-image versions.
sys_platform == 'never'drops the requirement entirely, so it doesn't protectray,starlette,cryptography,mlflow, orprometheus-clientfrom being replaced if another dependency pulls them in.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/no_override_requirements.txt` around lines 1 - 8, The no_override_requirements markers are not preserving base-image packages because sys_platform == 'never' removes the requirement instead of constraining it. Update the entries in no_override_requirements.txt so each preserved package (prometheus-client, starlette, cryptography, mlflow, ray) is explicitly pinned or otherwise constrained in a way that prevents pip from upgrading/replacing the base-image version, using the same package names already listed in the file.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/ray_bootstrap.py-246-248 (1)
246-248: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winNamespace or clear
ENDEDper run.A stale
log_dir / "ENDED"from a previous retry makes new head/worker startup return immediately. Use a task/attempt-scoped marker path, or clear it with startup coordination before any rank checks it.Also applies to: 357-360, 392-395
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/ray_bootstrap.py` around lines 246 - 248, The ENDED marker in ended_file is currently shared across retries because it always returns log_dir/"ENDED", so a stale file can short-circuit new runs. Update the marker handling in ray_bootstrap.py to make the path task/attempt-scoped (for example via the existing run/task identifiers used by the bootstrap flow) or ensure the marker is cleared during startup coordination before any rank checks it; apply the same fix to the related startup/check paths referenced by ended_file and the retry coordination logic.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/grpo_driver.py-55-104 (1)
55-104: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy liftDo not ship an executable driver that only raises
NotImplementedError.If this module is reachable via backend selection or
python -m, GRPO jobs fail immediately. Gate it out of runtime wiring or implement the driver before exposing it.I can help generate the implementation plan or open an issue for the missing GRPO driver.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/grpo_driver.py` around lines 55 - 104, The GRPO driver module is currently a stub and will fail at runtime because `load_config`, `get_environment`, `run_grpo_training`, and `main` all raise `NotImplementedError`. Either fully implement these entry points in `grpo_driver.py` or remove/gate the module from any backend selection and `python -m` wiring so it cannot be executed until complete. Make sure the runtime path that imports or dispatches to `main` only exposes a working driver, and use the existing function names to locate the missing implementation.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_driver.py-55-64 (1)
55-64: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winDo not dump full config or job context to stdout.
Hydra overrides, resolved configs, and
NMPJobContextcan contain internal URLs, identifiers, or credentials. Log redacted fields only, preferably at debug level.Also applies to: 93-94
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_driver.py` around lines 55 - 64, The startup logging in dpo_driver should not print raw Hydra overrides or the fully resolved config to stdout. Update the logic around parse_hydra_overrides, OmegaConf.to_container, and the final pprint output so only redacted or minimal fields are logged, ideally through a debug logger instead of print. Also review the related job-context logging referenced in the same flow and ensure NMPJobContext values are sanitized before being emitted.services/rl/src/nmp/rl/tasks/training/distributed.py-91-99 (1)
91-99: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winSerialize stale-barrier cleanup before workers continue.
Workers can create or wait on current-run markers before rank 0 executes
shutil.rmtree, so coordinator cleanup can delete live markers and deadlock barriers. Add a startup-ready marker, or use a per-attempt barrier directory instead of deleting a shared path.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/distributed.py` around lines 91 - 99, The stale-barrier cleanup in the distributed startup path can race with workers creating or waiting on current-run markers, so `shutil.rmtree` may delete live state. Update the initialization flow in `distributed.py` around `barrier_dir` handling in the `world_size > 1` block so cleanup is serialized before any worker proceeds, either by adding a startup-ready marker handshake or by switching `barrier_dir` to a per-attempt path instead of deleting a shared directory.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/ray_bootstrap.py-450-450 (1)
450-450: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winPut deadlines on Ray CLI subprocesses.
These calls can block indefinitely, making retry, worker-wait, and cleanup timeouts ineffective. Add
timeout=and handlesubprocess.TimeoutExpired; make cleanup unable to keep the process alive after its timeout.Also applies to: 485-485, 529-536, 676-680, 697-703, 707-710
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/ray_bootstrap.py` at line 450, The Ray CLI subprocess calls in ray_bootstrap.py can block forever and bypass the existing retry/cleanup timeouts. Update the subprocess.run invocations in the relevant bootstrap/cleanup paths (including the call sites around result handling and the Ray cleanup helpers) to pass an explicit timeout, and add subprocess.TimeoutExpired handling so the code can abort and continue cleanup safely. Make sure the cleanup logic cannot leave the child process running after its timeout by terminating/killing it and waiting for it to exit, using the existing bootstrap and cleanup functions as the entry points to patch.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/preference_datasets/helpsteer3.py-69-71 (1)
69-71: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winDo not silently drop split overrides.
train_split/val_splitare public args, but the HuggingFace fallback ignores them. Pass them through if the base dataset supports them, or reject them whentrain_data_pathis absent.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/preference_datasets/helpsteer3.py` around lines 69 - 71, The HuggingFace fallback in HelpSteer3Dataset currently ignores the public train_split and val_split args when train_data_path is not provided. Update the fallback path in the dataset initializer to either pass these split overrides through to the base dataset constructor if it supports them, or explicitly validate and reject unsupported overrides before calling super().__init__().services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/preference_datasets/tulu3.py-71-73 (1)
71-73: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winDo not silently drop split overrides.
train_split/val_splitare public args, but the HuggingFace fallback ignores them. Pass them through if the base dataset supports them, or reject them whentrain_data_pathis absent.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/preference_datasets/tulu3.py` around lines 71 - 73, The HuggingFace fallback in the Tulu3 preference dataset constructor is ignoring the public train_split and val_split arguments by calling the base initializer without them. Update the fallback path around the super().__init__ call in the Tulu3 dataset class to either forward train_split and val_split to the base dataset if supported, or explicitly reject those overrides when train_data_path is not provided instead of silently dropping them.services/rl/src/nmp/rl/tasks/training/templates/llama-3.1-instruct.jinja-20-24 (1)
20-24: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winSerialize tool metadata as JSON instead of concatenating it.
tnameortdesccontaining quotes/newlines produces invalid JSON in the prompt.Proposed fix
- {%- set tparams = t.function.parameters | tojson %} {{- "Use the function '" + tname + "' to '" + tdesc + "':\n" }} - {{- '{"name": "' + tname + '", "description": "' + tdesc + '", "parameters": ' + tparams + '}\n\n' }} + {{- ({"name": tname, "description": tdesc, "parameters": t.function.parameters} | tojson) + "\n\n" }}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/templates/llama-3.1-instruct.jinja` around lines 20 - 24, The tool metadata rendering in the llama-3.1-instruct template is building JSON by string concatenation, which breaks when tname or tdesc contains quotes/newlines. Update the template logic around the tname, tdesc, and tparams block to serialize the tool payload as proper JSON instead of manually assembling strings, and keep the existing function metadata fields intact in the rendered prompt.services/rl/src/nmp/rl/tasks/training/templates/llama-3.1-instruct.jinja-10-37 (1)
10-37: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winGuard undefined tool variables.
In Jinja, an undefined value is not
none, so this branch can emit the tool preamble even whentools/tool_choicewere never provided.Proposed fix
-{%- if tools is not none and tool_choice is not none %} +{%- if tools is defined and tools is not none and tool_choice is defined and tool_choice is not none %}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/templates/llama-3.1-instruct.jinja` around lines 10 - 37, The tool-preamble branch in llama-3.1-instruct.jinja only checks for none, so it can still render when tools or tool_choice are undefined. Update the conditional guarding the system/user preamble to explicitly handle undefined values before entering the tools loop, using the existing tools and tool_choice symbols, so the template only emits function-calling instructions when both inputs are actually provided.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py-357-372 (1)
357-372: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winDetect dataset class per split before wiring NeMo-RL loaders.
Validation only proves each row matches the broad DPO union. Here the loader class is inferred from
prepared.train_fileonce and then reused for validation too, so mixed-format train shards or a validation file in another supported DPO format compile cleanly and then hit the wrong NeMo-RL loader at runtime. Detect the schema for each prepared file and reject mismatches, or emit split-specificdataset_namevalues if NeMo-RL supports that.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py` around lines 357 - 372, The dataset loader is currently inferred once from prepared.train_file and reused for validation, which can assign the wrong NeMo-RL class to a split. Update the DPO config logic around detect_dpo_schema_name and _dataset_spec to determine dataset_name separately for each prepared file (train and validation), and verify the split schemas are compatible before building the loader config. If NeMo-RL allows it, pass split-specific dataset_name values; otherwise reject mismatched formats early with a clear error.services/rl/src/nmp/rl/tasks/training/datasets/preparation.py-366-373 (1)
366-373: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winStream merges instead of loading each shard at once.
inp.read()pulls the whole dataset shard into memory before writing it back out. Large training files can OOM the worker during config preparation. Copy incrementally and only track whether the last line ended with\n.Suggested fix
with open(output_file, "w", encoding="utf-8") as out: for f in files: with open(f, "r", encoding="utf-8") as inp: - content = inp.read() - out.write(content) - # Ensure newline between files - if content and not content.endswith("\n"): + last_line: str | None = None + for last_line in inp: + out.write(last_line) + if last_line and not last_line.endswith("\n"): out.write("\n")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/datasets/preparation.py` around lines 366 - 373, The merge logic in preparation.py currently uses the dataset shard read path in the write loop, which loads each file fully into memory before appending it to the output. Update the file-copying flow in the shard merge section to stream data incrementally instead of calling inp.read(), and use the existing output handling to track whether the last chunk ended with a newline so the separator behavior stays correct. Keep the change localized to the merge loop that iterates over files and writes to output_file.services/rl/src/nmp/rl/tasks/training/datasets/preparation.py-406-408 (1)
406-408: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winGuard tiny datasets before auto-splitting.
A 1-row file produces
val_size == 1, an empty train split, andcompile_dpo_config()later fails on the generated empty train file. Clamp validation to at mostlen(lines) - 1and fail early when the file is too small to split.Suggested fix
with open(train_file, "r", encoding="utf-8") as f: lines = [line for line in f if line.strip()] + + if len(lines) < 2: + raise DatasetFormatError("Need at least 2 samples to create a validation split") # Shuffle for reproducibility (important for multi-node!) # Uses global seed if not explicitly provided random.seed(seed) random.shuffle(lines) - val_size = max(1, int(len(lines) * val_ratio)) + val_size = min(max(1, int(len(lines) * val_ratio)), len(lines) - 1)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/datasets/preparation.py` around lines 406 - 408, Guard the auto-split logic in preparation.py so tiny datasets do not produce an empty train file. In the code that sets val_size and builds val_lines/train_lines, clamp the validation count to at most len(lines) - 1 and add an early failure path when len(lines) is too small to split safely. Use the existing split logic around val_size, val_lines, and train_lines to keep the train output non-empty before compile_dpo_config() consumes it.services/rl/src/nmp/rl/tasks/training/datasets/schemas.py-292-305 (1)
292-305: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winReject empty
tool_calls.
tool_calls=[]passes this validator, butservices/rl/src/nmp/rl/tasks/training/templates/nemotron-3.1.jinjaLines 28-41 treat any non-Nonevalue as a tool-call block and never close it when the list is empty. That yields malformed prompts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/datasets/schemas.py` around lines 292 - 305, The SFTChatMessage validator in check_has_content_or_thinking_or_tool_calls currently accepts tool_calls=[] because it only checks for None; update it to reject empty tool call lists so only non-empty tool_calls are allowed. Keep the existing content/thinking rules intact, and make sure the validation logic in SFTChatMessage treats tool_calls the same way the nemotron-3.1.jinja template does when deciding whether a tool-call block should exist.services/rl/src/nmp/rl/tasks/training/datasets/schemas.py-366-369 (1)
366-369: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winRequire at least one chat message.
messages=[]validates here, butservices/rl/src/nmp/rl/tasks/training/templates/nemotron-3.1.jinjaLines 1-6 andservices/rl/src/nmp/rl/tasks/training/templates/nemotron-3.3.jinjaLines 2-7 indexmessages[0]immediately. Empty rows pass validation and then crash during template rendering.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/datasets/schemas.py` around lines 366 - 369, Require at least one chat message in the schema used by the Nemotron training dataset. Update the messages field in the relevant model/class in schemas.py so empty lists are rejected during validation, matching the assumptions in the nemotron-3.1.jinja and nemotron-3.3.jinja templates that access messages[0]. Keep the fix localized to the schema definition for the conversation object so invalid rows fail early instead of crashing during template rendering.services/rl/src/nmp/rl/tasks/training/datasets/schemas.py-268-271 (1)
268-271: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
thinkingis accepted without any renderer.This schema allows messages with
thinkingand nocontent, butnemotron-3.1.jinjaLines 42-46,nemotron-3.3.jinjaLines 12-17, andphi-4.jinjaLines 6-10 all rendermessage['content']only. Valid rows will render incorrectly or fail.Also applies to: 292-305
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/datasets/schemas.py` around lines 268 - 271, The message schema currently allows `thinking` without any rendering support, so update the dataset/message handling to either reject rows where `thinking` is present but `content` is missing or explicitly map `thinking` into the rendered prompt path. Make the fix in the message schema/validation flow and in the Jinja renderers (`nemotron-3.1.jinja`, `nemotron-3.3.jinja`, `phi-4.jinja`) so they use the same source field consistently rather than only `message['content']`; use the `thinking` and message schema definitions in `schemas.py` to locate the affected logic.services/rl/src/nmp/rl/tasks/training/datasets/schemas.py-351-351 (1)
351-351: 🎯 Functional Correctness | 🟠 MajorFix the chat schema name/tag mismatch.
get_sft_dataset_discriminator()returnsSFTChatDatasetItemSchemaformessages, but the class and union tag areSFTPChatDatasetItemSchema, so chat rows won’t resolve to that branch. Update the class/tag names to match.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/datasets/schemas.py` at line 351, The chat dataset discriminator and schema class name are mismatched, so `get_sft_dataset_discriminator()` cannot resolve `messages` rows to the intended branch. Update the `SFTPChatDatasetItemSchema` class and its union/tag usage in `schemas.py` so the name and discriminator key match exactly with `SFTChatDatasetItemSchema`, and make sure any references in the dataset schema union use the corrected identifier consistently.services/rl/src/nmp/rl/tasks/training/templates/nemotron-3.3.jinja-17-20 (1)
17-20: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winUse real newlines here.
Line 17 and Line 20 emit
\\n\\n, while Line 8 uses\n\n. That writes literal backslashes into the prompt and changes the expected chat format.Suggested fix
- {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n' + content | trim + '<|eot_id|>' }} + {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + content | trim + '<|eot_id|>' }} @@ - {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }} + {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/templates/nemotron-3.3.jinja` around lines 17 - 20, In the Nemotron 3.3 Jinja template, the message rendering block is emitting literal backslash characters because the assistant/user header template uses escaped newline text instead of actual newlines. Update the prompt construction around the message loop and the add_generation_prompt branch so it matches the existing newline style used earlier in the template, and ensure the rendered chat format from the template stays consistent across the header/content separators.services/rl/src/nmp/rl/tasks/training/utils.py-69-77 (1)
69-77: 🩺 Stability & Availability | 🟠 MajorGuard against zero GPUs.
torch.cuda.device_count()can return0when CUDA is unavailable, so this branch can emit--nproc_per_node 0. Treat<= 0as missing and fall back to1.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/utils.py` around lines 69 - 77, The GPU count fallback in the training utils can currently pass through a zero value from torch.cuda.device_count(), which leads to an invalid nproc_per_node setting. Update the logic in the GPU detection block around gpus_per_node so that any value less than or equal to 0 is treated as unavailable, and fall back to "1" just like the exception path. Keep the fix localized to the existing os.environ.get("GPUS_PER_NODE") and torch.cuda.device_count() handling.services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/nemo_rl_logger.py-42-49 (1)
42-49: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winValidate
steps_per_epochandlog_intervalin__init__.Lines 100 and 105 divide/modulo by constructor inputs with no guard. A zero or negative value crashes the first metrics update. Reject invalid values before training starts.
Suggested fix
def __init__( self, steps_per_epoch: int, job_ctx: NMPJobContext | None = None, log_interval: int = 10, max_steps: int | None = None, num_epochs: int | None = None, ): @@ + if steps_per_epoch <= 0: + raise ValueError("steps_per_epoch must be > 0") + if log_interval <= 0: + raise ValueError("log_interval must be > 0") + self._job_ctx = job_ctx or NMPJobContext.from_env()Also applies to: 97-105
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/nemo_rl_logger.py` around lines 42 - 49, Validate the constructor inputs in NemoRLLogger.__init__: reject zero or negative values for steps_per_epoch and log_interval before storing them, since methods like _log_metrics and the step/update logic later perform division and modulo using these fields. Add an early guard in __init__ that raises a clear error for invalid values so training fails fast instead of crashing on the first metrics update.services/rl/src/nmp/rl/tasks/training/templates/llama-3.2-instruct.jinja-19-25 (1)
19-25: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winSerialize the tool schema with one
tojsoncall.Line 24 hand-builds JSON with raw
tnameandtdesc. Any quote or newline in those fields makes the tool definition invalid and corrupts the prompt format. Serialize a dict instead of concatenating JSON fragments.Suggested fix
{%- for t in tools %} {%- set tname = t.function.name %} {%- set tdesc = t.function.description %} - {%- set tparams = t.function.parameters | tojson %} {{- "Use the function '" + tname + "' to '" + tdesc + "':\n" }} - {{- '{"name": "' + tname + '", "description": "' + tdesc + '", "parameters": ' + tparams + '}\n\n' }} + {{- {"name": tname, "description": tdesc, "parameters": t.function.parameters} | tojson + '\n\n' }} {%- endfor %}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/templates/llama-3.2-instruct.jinja` around lines 19 - 25, The tool schema rendering in the llama-3.2-instruct.jinja template is manually concatenating JSON fragments with raw tname and tdesc, which can break on quotes or newlines; update the loop over tools to serialize a single dictionary with one tojson call instead of building the JSON string by hand, while keeping the existing t.function.name, t.function.description, and t.function.parameters values in the output.services/rl/src/nmp/rl/tasks/training/templates/llama-3.3-instruct.jinja-19-25 (1)
19-25: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winSerialize the tool schema with one
tojsoncall.Line 24 hand-builds JSON with raw
tnameandtdesc. Any quote or newline in those fields makes the tool definition invalid and corrupts the prompt format. Serialize a dict instead of concatenating JSON fragments.Suggested fix
{%- for t in tools %} {%- set tname = t.function.name %} {%- set tdesc = t.function.description %} - {%- set tparams = t.function.parameters | tojson %} {{- "Use the function '" + tname + "' to '" + tdesc + "':\n" }} - {{- '{"name": "' + tname + '", "description": "' + tdesc + '", "parameters": ' + tparams + '}\n\n' }} + {{- {"name": tname, "description": tdesc, "parameters": t.function.parameters} | tojson + '\n\n' }} {%- endfor %}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/templates/llama-3.3-instruct.jinja` around lines 19 - 25, The tool schema serialization in the llama-3.3-instruct.jinja template is being hand-built with raw tname and tdesc, which can break JSON when those values contain quotes or newlines. Update the loop over tools to construct a single dictionary-like payload for each tool and serialize that whole object with one tojson call, using the existing t.function.name, t.function.description, and t.function.parameters fields. Keep the surrounding prompt text generation in the same template, but remove the concatenated JSON fragment so the tool definition stays valid and robust.services/rl/src/nmp/rl/tasks/training/errors/converter.py-89-102 (1)
89-102: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winKeep
create_error_details()non-throwing.Line 90 calls
get_error_converter()outside the fallback guard. If rule loading fails here, the original training exception escapes the error path andTrainingRunner.run()falls back to its stale default result instead of structured error details. Wrap converter acquisition in the sametryand use theInternalErrorfallback from that branch too.Suggested fix
def create_error_details(exception: Exception) -> ErrorDetails: @@ - converter = get_error_converter() try: + converter = get_error_converter() converter.raise_converted_or_default(exception) except CustomizerTrainingError as converted: return converted.to_error_details() except Exception as e: # noqa: BLE001 - intentional last-resort guard to guarantee dict return # Unexpected exception type - wrap in InternalError to ensure we always return a dict logger.warning(f"Unexpected exception type from converter: {type(e).__name__}: {e}") exc = InternalError( message=f"An internal error occurred. ({type(exception).__name__}: {exception})", - detail=str(exception), + detail=f"{type(exception).__name__}: {exception}", ) return exc.to_error_details()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/errors/converter.py` around lines 89 - 102, Keep create_error_details() non-throwing by moving get_error_converter() inside the same try block as converter.raise_converted_or_default() in converter.py. If converter acquisition fails, handle it in the existing except Exception branch and return InternalError.to_error_details() so TrainingRunner.run() always gets structured error details instead of leaking the original exception. Use the existing create_error_details(), get_error_converter(), and InternalError flow as the fix point.
🟡 Minor comments (2)
plugins/nemo-rl/src/nemo_rl_plugin/schema.py-36-41 (1)
36-41: 🗄️ Data Integrity & Integration | 🟡 MinorCap
OutputRequest.nameat 255 chars
OutputResponsealready limits bothnameandfilesetto 255, so an oversized explicit output name should be rejected here instead of during transform.Suggested fix
class OutputRequest(BaseModel): """Submitter-facing output preferences. ``name`` is auto-derived if omitted.""" model_config = ConfigDict(extra="forbid") - name: str | None = None + name: str | None = Field(default=None, max_length=255)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/nemo-rl/src/nemo_rl_plugin/schema.py` around lines 36 - 41, OutputRequest.name is not bounded, so an explicit oversized output name can slip through until later transform logic. Add a 255-character max constraint to the name field in OutputRequest (the model in schema.py) so validation fails immediately, matching the limits already enforced by OutputResponse for name and fileset. Keep the change localized to the OutputRequest model definition.services/rl/src/nmp/rl/tasks/training/runner.py-81-81 (1)
81-81: 📐 Maintainability & Code Quality | 🟡 MinorUse
Selffor__enter__. Inservices/rl/src/nmp/rl/tasks/training/runner.py:81, replace the string return type withSelfand import it fromtyping.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/rl/src/nmp/rl/tasks/training/runner.py` at line 81, The __enter__ method in TrainingRunner is still using a string annotation instead of the modern Self type. Update the TrainingRunner.__enter__ signature to return Self, and add Self to the typing import list in the same module so the annotation is resolved correctly.Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 118f7c09-f7be-40f2-bcb5-727036cd8217
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (75)
docker-bake.hcldocker/Dockerfile.nmp-rl-basedocker/Dockerfile.nmp-rl-tasksdocker/Dockerfile.nmp-rl-trainingdocker/rl/Dockerfile.platform-workspacedocker/rl/pyproject.workspace.tomlpackages/nmp_customization_common/src/nmp/customization_common/contributor/jobs.pyplugins/nemo-rl/README.mdplugins/nemo-rl/pyproject.tomlplugins/nemo-rl/src/nemo_rl_plugin/__init__.pyplugins/nemo-rl/src/nemo_rl_plugin/cli/__init__.pyplugins/nemo-rl/src/nemo_rl_plugin/cli/inputs.pyplugins/nemo-rl/src/nemo_rl_plugin/cli/main.pyplugins/nemo-rl/src/nemo_rl_plugin/config.pyplugins/nemo-rl/src/nemo_rl_plugin/contributor.pyplugins/nemo-rl/src/nemo_rl_plugin/jobs/jobs.pyplugins/nemo-rl/src/nemo_rl_plugin/schema.pyplugins/nemo-rl/src/nemo_rl_plugin/sdk/__init__.pyplugins/nemo-rl/src/nemo_rl_plugin/sdk/resources.pyplugins/nemo-rl/src/nemo_rl_plugin/transform.pypyproject.tomlservices/rl/README.mdservices/rl/pyproject.tomlservices/rl/src/nmp/rl/app/constants.pyservices/rl/src/nmp/rl/app/jobs/compiler.pyservices/rl/src/nmp/rl/app/jobs/training/schemas.pyservices/rl/src/nmp/rl/compile.pyservices/rl/src/nmp/rl/config.pyservices/rl/src/nmp/rl/entities/values.pyservices/rl/src/nmp/rl/images.pyservices/rl/src/nmp/rl/schemas.pyservices/rl/src/nmp/rl/tasks/file_io/__init__.pyservices/rl/src/nmp/rl/tasks/file_io/__main__.pyservices/rl/src/nmp/rl/tasks/file_io/callbacks.pyservices/rl/src/nmp/rl/tasks/file_io/run.pyservices/rl/src/nmp/rl/tasks/model_entity/__init__.pyservices/rl/src/nmp/rl/tasks/model_entity/__main__.pyservices/rl/src/nmp/rl/tasks/model_entity/run.pyservices/rl/src/nmp/rl/tasks/training/__main__.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/backend.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/callbacks.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/checkpoints.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_driver.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/grpo_config.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/grpo_driver.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/nemo_rl_logger.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/no_override_requirements.txtservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/preference_datasets/__init__.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/preference_datasets/helpsteer3.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/preference_datasets/tulu3.pyservices/rl/src/nmp/rl/tasks/training/backends/nemo_rl/ray_bootstrap.pyservices/rl/src/nmp/rl/tasks/training/chat_templates.pyservices/rl/src/nmp/rl/tasks/training/datasets/preparation.pyservices/rl/src/nmp/rl/tasks/training/datasets/schemas.pyservices/rl/src/nmp/rl/tasks/training/datasets/validation.pyservices/rl/src/nmp/rl/tasks/training/distributed.pyservices/rl/src/nmp/rl/tasks/training/errors/converter.pyservices/rl/src/nmp/rl/tasks/training/errors/error_rules.yamlservices/rl/src/nmp/rl/tasks/training/errors/exceptions.pyservices/rl/src/nmp/rl/tasks/training/errors/parser.pyservices/rl/src/nmp/rl/tasks/training/integrations.pyservices/rl/src/nmp/rl/tasks/training/protocol.pyservices/rl/src/nmp/rl/tasks/training/runner.pyservices/rl/src/nmp/rl/tasks/training/templates/llama-3.1-instruct.jinjaservices/rl/src/nmp/rl/tasks/training/templates/llama-3.2-instruct.jinjaservices/rl/src/nmp/rl/tasks/training/templates/llama-3.3-instruct.jinjaservices/rl/src/nmp/rl/tasks/training/templates/nemotron-3.1.jinjaservices/rl/src/nmp/rl/tasks/training/templates/nemotron-3.3.jinjaservices/rl/src/nmp/rl/tasks/training/templates/nemotron-super-3.3.jinjaservices/rl/src/nmp/rl/tasks/training/templates/phi-4.jinjaservices/rl/src/nmp/rl/tasks/training/utils.pyservices/rl/tests/test_compiler.pyservices/rl/tests/test_dpo_config.pyservices/rl/tests/test_schemas.py
| try: | ||
| from transformers import AutoTokenizer | ||
|
|
||
| tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True) | ||
| template = getattr(tokenizer, "chat_template", None) | ||
| if template: | ||
| logger.debug(f"Found chat template in tokenizer for {model_path}") | ||
| return template | ||
| except Exception as e: | ||
| logger.warning(f"Could not load tokenizer to get chat template: {e}") |
There was a problem hiding this comment.
🔒 Security & Privacy | 🔴 Critical
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the target file and relevant call sites
git ls-files 'services/rl/src/nmp/rl/tasks/training/chat_templates.py' \
'services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py'
echo '--- chat_templates.py (selected lines) ---'
nl -ba services/rl/src/nmp/rl/tasks/training/chat_templates.py | sed -n '1,220p'
echo '--- dpo_config.py (selected lines) ---'
nl -ba services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py | sed -n '1,220p'Repository: NVIDIA-NeMo/nemo-platform
Length of output: 371
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the target file and relevant call sites
git ls-files 'services/rl/src/nmp/rl/tasks/training/chat_templates.py' \
'services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py'
echo '--- chat_templates.py (selected lines) ---'
nl -ba services/rl/src/nmp/rl/tasks/training/chat_templates.py | sed -n '1,220p'
echo '--- dpo_config.py (selected lines) ---'
nl -ba services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py | sed -n '1,220p'Repository: NVIDIA-NeMo/nemo-platform
Length of output: 371
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- chat_templates.py ---'
sed -n '1,220p' services/rl/src/nmp/rl/tasks/training/chat_templates.py | cat -n
echo '--- dpo_config.py ---'
sed -n '1,220p' services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py | cat -nRepository: NVIDIA-NeMo/nemo-platform
Length of output: 20073
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- chat_templates.py ---'
sed -n '1,220p' services/rl/src/nmp/rl/tasks/training/chat_templates.py | cat -n
echo '--- dpo_config.py ---'
sed -n '1,220p' services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py | cat -nRepository: NVIDIA-NeMo/nemo-platform
Length of output: 20073
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- chat_templates.py ---'
sed -n '1,220p' services/rl/src/nmp/rl/tasks/training/chat_templates.py | cat -n
echo '--- dpo_config.py ---'
sed -n '1,220p' services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py | cat -nRepository: NVIDIA-NeMo/nemo-platform
Length of output: 20073
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- chat_templates.py ---'
sed -n '1,220p' services/rl/src/nmp/rl/tasks/training/chat_templates.py | cat -n
echo '--- dpo_config.py ---'
sed -n '1,220p' services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py | cat -nRepository: NVIDIA-NeMo/nemo-platform
Length of output: 20073
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- chat_templates.py ---'
sed -n '1,220p' services/rl/src/nmp/rl/tasks/training/chat_templates.py | cat -n
echo '--- dpo_config.py ---'
sed -n '1,220p' services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.py | cat -nRepository: NVIDIA-NeMo/nemo-platform
Length of output: 20073
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- chat_templates.py matches ---'
rg -n -C 4 'from_pretrained|chat_template|model_path|trust_remote_code' services/rl/src/nmp/rl/tasks/training/chat_templates.py
echo '--- dpo_config.py matches ---'
rg -n -C 4 'resolve_chat_template|model\.path|customizer_config|chat_template' services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.pyRepository: NVIDIA-NeMo/nemo-platform
Length of output: 14263
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- chat_templates.py matches ---'
rg -n -C 4 'from_pretrained|chat_template|model_path|trust_remote_code' services/rl/src/nmp/rl/tasks/training/chat_templates.py
echo '--- dpo_config.py matches ---'
rg -n -C 4 'resolve_chat_template|model\.path|customizer_config|chat_template' services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.pyRepository: NVIDIA-NeMo/nemo-platform
Length of output: 14263
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- chat_templates.py matches ---'
rg -n -C 4 'from_pretrained|chat_template|model_path|trust_remote_code' services/rl/src/nmp/rl/tasks/training/chat_templates.py
echo '--- dpo_config.py matches ---'
rg -n -C 4 'resolve_chat_template|model\.path|customizer_config|chat_template' services/rl/src/nmp/rl/tasks/training/backends/nemo_rl/dpo_config.pyRepository: NVIDIA-NeMo/nemo-platform
Length of output: 14263
Remove trust_remote_code=True from tokenizer fallback. This can execute arbitrary code from model_path while only resolving a chat template; keep it off unless the source is strictly allowlisted.
🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 109-109: Loading a HuggingFace model, tokenizer, dataset or pipeline with trust_remote_code=True executes arbitrary code from the remote repository. Only enable it for fully trusted sources; prefer artifacts that do not require it.
Context: trust_remote_code=True
Note: [CWE-94] Improper Control of Generation of Code ('Code Injection').
(transformers-trust-remote-code-python)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/rl/src/nmp/rl/tasks/training/chat_templates.py` around lines 107 -
116, The tokenizer fallback in chat template resolution is using
trust_remote_code=True, which should be removed from
AutoTokenizer.from_pretrained in the template lookup path. Update the fallback
in the chat template helper to load the tokenizer without remote code execution
and keep it limited to extracting the chat_template attribute; if special
handling is needed, gate it behind a strict allowlist in the same function.
Source: Linters/SAST tools
| finally: | ||
| # === Phase 5: Write result (coordinator only) === | ||
| self._write_result(result) | ||
| return result |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win
Don't return from finally.
Line 153 suppresses any active exception, including BaseException and failures inside the error-conversion path. That can turn real job failures into the default "No result" response. Keep finally for _write_result() only and return after it.
Suggested fix
- finally:
- # === Phase 5: Write result (coordinator only) ===
- self._write_result(result)
- return result
+ finally:
+ # === Phase 5: Write result (coordinator only) ===
+ self._write_result(result)
+
+ return result📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| finally: | |
| # === Phase 5: Write result (coordinator only) === | |
| self._write_result(result) | |
| return result | |
| finally: | |
| # === Phase 5: Write result (coordinator only) === | |
| self._write_result(result) | |
| return result |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/rl/src/nmp/rl/tasks/training/runner.py` around lines 150 - 153, The
`finally` block in `training.runner` should not return, because
`Runner`/`_write_result()` is currently suppressing active exceptions and can
hide real failures as a default result. Keep the `finally` section in
`runner.py` limited to calling `_write_result(result)`, then move the `return
result` out of `finally` so normal completion still returns the result while
exceptions propagate correctly.
| if self._dist_ctx.is_coordinator: | ||
| if isinstance(self._backend, SupportsPreprocessing): | ||
| self._progress.report_running("conversions") | ||
| self._backend.run_preprocessing(self._config) | ||
| logger.info("Pre-training conversions complete") | ||
| # Always release workers, even if no conversions are needed | ||
| self._dist_ctx.signal(BARRIER_PREPROCESSING_COMPLETE) | ||
| else: | ||
| self._dist_ctx.wait_for_coordinator(BARRIER_PREPROCESSING_COMPLETE) |
There was a problem hiding this comment.
🩺 Stability & Availability | 🔴 Critical | 🏗️ Heavy lift
Coordinator-only failures strand worker ranks.
These phases only publish success markers. If preprocessing, config compilation, or the YAML write fails on the coordinator, every worker stays in wait_for_coordinator() until timeout. The barrier protocol needs a failure marker/shared error payload so workers can exit immediately and surface the same error.
Also applies to: 218-235
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@services/rl/src/nmp/rl/tasks/training/runner.py` around lines 197 - 205, The
coordinator-only phases in runner.py only signal success, so failures in
run_preprocessing, config compilation, or the YAML write can leave worker ranks
stuck in wait_for_coordinator(). Update the barrier flow around
BARRIER_PREPROCESSING_COMPLETE and the later coordinator steps to publish a
failure marker or shared error payload when an exception occurs. Make the
coordinator catch and propagate the same error state, and have workers check
that state after wait_for_coordinator() so they can exit immediately instead of
timing out.
Linear ticket - https://linear.app/nvidia/issue/AALGO-292/add-dpo-support-for-customizer-plugin
Summary by CodeRabbit
New Features
Bug Fixes