Skip to content

Prevent artifact storage path escapes from reserved dataset names#778

Open
fallintoplace wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
fallintoplace:fix/dataset-name-artifact-escape
Open

Prevent artifact storage path escapes from reserved dataset names#778
fallintoplace wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
fallintoplace:fix/dataset-name-artifact-escape

Conversation

@fallintoplace

Copy link
Copy Markdown

Summary

  • reject . and .. for artifact directory names
  • resolve artifact subpaths against the artifact root and fail if they escape it
  • add storage and interface regression coverage for reserved dataset names

Why

ArtifactStorage previously joined artifact_path and dataset_name directly. In resume flows, dataset_name=".." could resolve outside the artifact root, and dataset_name="." could collapse the dataset directory onto the artifact root.

Testing

  • uv run --package data-designer-engine pytest packages/data-designer-engine/tests/engine/storage/test_artifact_storage.py
  • uv run --package data-designer pytest packages/data-designer/tests/interface/test_data_designer.py -k reserved_dataset_name
  • uv run ruff check packages/data-designer-engine/src/data_designer/engine/storage/artifact_storage.py packages/data-designer-engine/tests/engine/storage/test_artifact_storage.py packages/data-designer/tests/interface/test_data_designer.py
  • uv run ruff format --check packages/data-designer-engine/src/data_designer/engine/storage/artifact_storage.py packages/data-designer-engine/tests/engine/storage/test_artifact_storage.py packages/data-designer/tests/interface/test_data_designer.py

@fallintoplace fallintoplace requested a review from a team as a code owner June 27, 2026 20:53
@github-actions

Copy link
Copy Markdown
Contributor

Linked Issue Check

This PR does not reference an issue. External contributions must link to
a triaged issue before the PR can be merged.

Add one of the following to your PR description:

  • Fixes #<issue-number>
  • Closes #<issue-number>
  • Resolves #<issue-number>

If no issue exists yet, open one
and a maintainer will triage it.

See CONTRIBUTING.md
for details.

@github-actions

Copy link
Copy Markdown
Contributor

Thank you for your submission! We ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text:


I have read the DCO document and I hereby sign the DCO.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the DCO Assistant Lite bot.

@greptile-apps

greptile-apps Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR closes a path-traversal hole in ArtifactStorage where passing dataset_name=".." in a resume flow could cause the engine to read or write outside the designated artifact root. The fix adds two complementary defenses: an explicit reject of "." and ".." in the model validator, and a _resolve_artifact_subpath helper that uses Path.resolve() plus relative_to() as a belt-and-suspenders check on every derived storage path.

  • Validation guardvalidate_folder_names now rejects "." and ".." for all five configurable folder name fields before any path construction occurs.
  • Runtime path guard_resolve_artifact_subpath resolves the candidate path against resolved_artifact_path (a new cached property) and raises ArtifactStorageError if the result escapes the artifact root, protecting against any future bypass of the validator.
  • Test coverage — new parametrized unit tests cover all five fields for both reserved names, a cache-injection test exercises the runtime guard independently, and an integration test confirms DataDesigner.create() surfaces the error end-to-end.

Confidence Score: 5/5

The change adds purely additive validation with no modifications to existing logic paths; all new error paths raise immediately without side effects.

Both the validator-level and runtime-level guards are correct, the Path.resolve() + relative_to() pattern is the standard way to prevent directory traversal in Python, and the test suite covers the direct validation path, the cache-injection bypass scenario, and the full integration path through DataDesigner.create().

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/storage/artifact_storage.py Adds path-traversal defenses: a '.' / '..' reject in the model validator, a new resolved_artifact_path cached_property, and a _resolve_artifact_subpath helper that uses Path.resolve() + relative_to() as a belt-and-suspenders check on every derived path.
packages/data-designer-engine/tests/engine/storage/test_artifact_storage.py Adds parametrized coverage for all five folder-name fields rejecting '.' and '..', plus a test that directly injects '..' into the resolved_dataset_name cache to verify the belt-and-suspenders check in _resolve_artifact_subpath.
packages/data-designer/tests/interface/test_data_designer.py Adds an integration-level regression test verifying that DataDesigner.create() propagates the ArtifactStorageError when dataset_name='..' is passed.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["ArtifactStorage init"] --> B["validate_artifact_path\nmust be existing dir"]
    B --> C["validate_folder_names"]
    C --> D["Check each folder name"]
    D -->|"empty"| E["ArtifactStorageError"]
    D -->|"duplicate"| F["ArtifactStorageError"]
    D -->|"invalid chars"| G["ArtifactStorageError"]
    D -->|"dot or dotdot"| H["ArtifactStorageError"]
    D -->|"passes"| I["Init MediaStorage via base_dataset_path"]
    I --> J["_resolve_artifact_subpath"]
    J --> K["resolved_artifact_path.joinpath parts .resolve"]
    K --> L{"relative_to check"}
    L -->|"fails"| M["ArtifactStorageError escapes root"]
    L -->|"passes"| N["Return resolved Path"]
    N --> O["ArtifactStorage ready"]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["ArtifactStorage init"] --> B["validate_artifact_path\nmust be existing dir"]
    B --> C["validate_folder_names"]
    C --> D["Check each folder name"]
    D -->|"empty"| E["ArtifactStorageError"]
    D -->|"duplicate"| F["ArtifactStorageError"]
    D -->|"invalid chars"| G["ArtifactStorageError"]
    D -->|"dot or dotdot"| H["ArtifactStorageError"]
    D -->|"passes"| I["Init MediaStorage via base_dataset_path"]
    I --> J["_resolve_artifact_subpath"]
    J --> K["resolved_artifact_path.joinpath parts .resolve"]
    K --> L{"relative_to check"}
    L -->|"fails"| M["ArtifactStorageError escapes root"]
    L -->|"passes"| N["Return resolved Path"]
    N --> O["ArtifactStorage ready"]
Loading

Reviews (1): Last reviewed commit: "Prevent artifact path escapes from datas..." | Re-trigger Greptile

@fallintoplace

Copy link
Copy Markdown
Author

I have read the DCO document and I hereby sign the DCO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant