Prevent artifact storage path escapes from reserved dataset names#778
Prevent artifact storage path escapes from reserved dataset names#778fallintoplace wants to merge 1 commit into
Conversation
Linked Issue CheckThis PR does not reference an issue. External contributions must link to Add one of the following to your PR description:
If no issue exists yet, open one See CONTRIBUTING.md |
|
Thank you for your submission! We ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text: I have read the DCO document and I hereby sign the DCO. You can retrigger this bot by commenting recheck in this Pull Request. Posted by the DCO Assistant Lite bot. |
Greptile SummaryThis PR closes a path-traversal hole in
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/storage/artifact_storage.py | Adds path-traversal defenses: a '.' / '..' reject in the model validator, a new resolved_artifact_path cached_property, and a _resolve_artifact_subpath helper that uses Path.resolve() + relative_to() as a belt-and-suspenders check on every derived path. |
| packages/data-designer-engine/tests/engine/storage/test_artifact_storage.py | Adds parametrized coverage for all five folder-name fields rejecting '.' and '..', plus a test that directly injects '..' into the resolved_dataset_name cache to verify the belt-and-suspenders check in _resolve_artifact_subpath. |
| packages/data-designer/tests/interface/test_data_designer.py | Adds an integration-level regression test verifying that DataDesigner.create() propagates the ArtifactStorageError when dataset_name='..' is passed. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["ArtifactStorage init"] --> B["validate_artifact_path\nmust be existing dir"]
B --> C["validate_folder_names"]
C --> D["Check each folder name"]
D -->|"empty"| E["ArtifactStorageError"]
D -->|"duplicate"| F["ArtifactStorageError"]
D -->|"invalid chars"| G["ArtifactStorageError"]
D -->|"dot or dotdot"| H["ArtifactStorageError"]
D -->|"passes"| I["Init MediaStorage via base_dataset_path"]
I --> J["_resolve_artifact_subpath"]
J --> K["resolved_artifact_path.joinpath parts .resolve"]
K --> L{"relative_to check"}
L -->|"fails"| M["ArtifactStorageError escapes root"]
L -->|"passes"| N["Return resolved Path"]
N --> O["ArtifactStorage ready"]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A["ArtifactStorage init"] --> B["validate_artifact_path\nmust be existing dir"]
B --> C["validate_folder_names"]
C --> D["Check each folder name"]
D -->|"empty"| E["ArtifactStorageError"]
D -->|"duplicate"| F["ArtifactStorageError"]
D -->|"invalid chars"| G["ArtifactStorageError"]
D -->|"dot or dotdot"| H["ArtifactStorageError"]
D -->|"passes"| I["Init MediaStorage via base_dataset_path"]
I --> J["_resolve_artifact_subpath"]
J --> K["resolved_artifact_path.joinpath parts .resolve"]
K --> L{"relative_to check"}
L -->|"fails"| M["ArtifactStorageError escapes root"]
L -->|"passes"| N["Return resolved Path"]
N --> O["ArtifactStorage ready"]
Reviews (1): Last reviewed commit: "Prevent artifact path escapes from datas..." | Re-trigger Greptile
|
I have read the DCO document and I hereby sign the DCO. |
Summary
.and..for artifact directory namesWhy
ArtifactStoragepreviously joinedartifact_pathanddataset_namedirectly. In resume flows,dataset_name=".."could resolve outside the artifact root, anddataset_name="."could collapse the dataset directory onto the artifact root.Testing
uv run --package data-designer-engine pytest packages/data-designer-engine/tests/engine/storage/test_artifact_storage.pyuv run --package data-designer pytest packages/data-designer/tests/interface/test_data_designer.py -k reserved_dataset_nameuv run ruff check packages/data-designer-engine/src/data_designer/engine/storage/artifact_storage.py packages/data-designer-engine/tests/engine/storage/test_artifact_storage.py packages/data-designer/tests/interface/test_data_designer.pyuv run ruff format --check packages/data-designer-engine/src/data_designer/engine/storage/artifact_storage.py packages/data-designer-engine/tests/engine/storage/test_artifact_storage.py packages/data-designer/tests/interface/test_data_designer.py