cwe: add XXE search aliases and enforce prohibited CWE exclusions#939
cwe: add XXE search aliases and enforce prohibited CWE exclusions#939Bornunique911 wants to merge 7 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThree independent changes extend vulnerability knowledge-graph infrastructure: the CWE parser adds an ChangesCWE PROHIBITED Status Filtering
Text Search Alias Expansion
Database Health Check Exception Handling
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
application/tests/cwe_parser_test.py (1)
151-261: ⚡ Quick winRestore coverage for the existing fallback cases (
614,502).This test still seeds
secure_cookie_creanddeserialization_cre, but no assertion verifies CWE614and502mappings. That weakens regression protection for fallback mapping behavior.Suggested patch
self.assertEqual( imported_cwes["384"].links[0].document.todict(), session_management_cre.todict(), ) + self.assertEqual( + imported_cwes["614"].links[0].document.todict(), + secure_cookie_cre.todict(), + ) + self.assertEqual( + imported_cwes["502"].links[0].document.todict(), + deserialization_cre.todict(), + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/tests/cwe_parser_test.py` around lines 151 - 261, The test method test_register_CWE_applies_fallback_family_mappings creates secure_cookie_cre and deserialization_cre objects and adds them to the collection, but the test is missing assertions to verify that imported_cwes for CWE entries 614 and 502 correctly map to these CREs respectively. Add two additional assertEqual statements following the same pattern as the existing assertions (checking imported_cwes["614"].links[0].document.todict() against secure_cookie_cre.todict() and imported_cwes["502"].links[0].document.todict() against deserialization_cre.todict()) to restore full coverage of the fallback mapping test cases.application/utils/external_project_parsers/parsers/cwe.py (1)
27-27: Use an immutable container for class-level statuses.At line 27,
allowed_statusesis a mutablesetshared across all instances. Preferfrozensetto prevent accidental runtime mutation of parser behavior.Suggested patch
- allowed_statuses = {"Stable", "Incomplete", "Draft"} + allowed_statuses = frozenset({"Stable", "Incomplete", "Draft"})🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/utils/external_project_parsers/parsers/cwe.py` at line 27, The `allowed_statuses` variable is defined as a mutable set which could be accidentally modified at runtime, compromising the parser's behavior. Replace the set literal syntax with a frozenset constructor by changing `allowed_statuses = {"Stable", "Incomplete", "Draft"}` to use `frozenset()` instead, wrapping the same string values as arguments to create an immutable version of this class-level attribute.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@application/database/db.py`:
- Around line 2183-2213: The code fails to handle whitespace-only input where
text.strip() produces an empty string that gets added to the expanded list,
causing LIKE '%%' queries that return excessive results. After the normalized =
text.strip() assignment, add a guard condition to check if normalized is empty
and skip the alias expansion logic (the part where expanded is populated and
aliases are added) if it is empty, preventing empty search terms from being
processed downstream.
---
Nitpick comments:
In `@application/tests/cwe_parser_test.py`:
- Around line 151-261: The test method
test_register_CWE_applies_fallback_family_mappings creates secure_cookie_cre and
deserialization_cre objects and adds them to the collection, but the test is
missing assertions to verify that imported_cwes for CWE entries 614 and 502
correctly map to these CREs respectively. Add two additional assertEqual
statements following the same pattern as the existing assertions (checking
imported_cwes["614"].links[0].document.todict() against
secure_cookie_cre.todict() and imported_cwes["502"].links[0].document.todict()
against deserialization_cre.todict()) to restore full coverage of the fallback
mapping test cases.
In `@application/utils/external_project_parsers/parsers/cwe.py`:
- Line 27: The `allowed_statuses` variable is defined as a mutable set which
could be accidentally modified at runtime, compromising the parser's behavior.
Replace the set literal syntax with a frozenset constructor by changing
`allowed_statuses = {"Stable", "Incomplete", "Draft"}` to use `frozenset()`
instead, wrapping the same string values as arguments to create an immutable
version of this class-level attribute.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 4af16295-7846-4009-a132-ad5d83e6eaca
📒 Files selected for processing (5)
application/database/db.pyapplication/tests/cwe_parser_test.pyapplication/tests/db_test.pyapplication/utils/external_project_parsers/parsers/cwe.pyscripts/update-cwe.sh
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
This commit builds on the earlier CWE mapping work (PR OWASP#823) and addresses two gaps found during local verification after the previous merge. Problem: Searching for terms like 'XXE' did not reliably surface the expected top-level CWE result, even when the corresponding CWE data existed locally. This created a mismatch between the intended curated CWE import behavior and actual API/UI discoverability. Solution: - Expand text-search aliases to include common security shorthand terms (e.g., XXE), ensuring related CWE and CRE records surface correctly. - Strengthen CWE import filtering to consistently exclude prohibited entries. Scope: - CWE import filtering logic - Search alias expansion for XXE and related terms - Unit tests covering prohibited CWE exclusion and XXE search behavior - Script messaging update for the CWE refresh workflow Testing: ./venv/bin/python -m pytest application/tests/cwe_parser_test.py -k 'register_CWE or prohibited' -q ./venv/bin/python -m pytest application/tests/db_test.py -k 'text_search' -q Follows-up: OWASP#823 References: OWASP#472
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Bornunique911 <69379200+Bornunique911@users.noreply.github.com>
f267dfa to
24b112d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@application/database/db.py`:
- Around line 2340-2345: The exception handler in the health_check function is
catching the overly broad Exception class on line 2340, which masks programming
errors and makes debugging difficult. Replace the Exception catch with
SQLAlchemyError to only catch database-layer failures while allowing unexpected
errors to propagate. Add an import statement at the top of the file for
SQLAlchemyError from the sqlalchemy package to make this change work properly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 92d40f16-a00d-494c-b315-dc13a4e4aa6a
📒 Files selected for processing (5)
application/database/db.pyapplication/tests/cwe_parser_test.pyapplication/tests/db_test.pyapplication/utils/external_project_parsers/parsers/cwe.pyscripts/update-cwe.sh
✅ Files skipped from review due to trivial changes (1)
- scripts/update-cwe.sh
🚧 Files skipped from review as they are similar to previous changes (3)
- application/utils/external_project_parsers/parsers/cwe.py
- application/tests/cwe_parser_test.py
- application/tests/db_test.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@application/database/db.py`:
- Around line 2340-2345: The exception handler in the health_check function is
catching the overly broad Exception class on line 2340, which masks programming
errors and makes debugging difficult. Replace the Exception catch with
SQLAlchemyError to only catch database-layer failures while allowing unexpected
errors to propagate. Add an import statement at the top of the file for
SQLAlchemyError from the sqlalchemy package to make this change work properly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 92d40f16-a00d-494c-b315-dc13a4e4aa6a
📒 Files selected for processing (5)
application/database/db.pyapplication/tests/cwe_parser_test.pyapplication/tests/db_test.pyapplication/utils/external_project_parsers/parsers/cwe.pyscripts/update-cwe.sh
✅ Files skipped from review due to trivial changes (1)
- scripts/update-cwe.sh
🚧 Files skipped from review as they are similar to previous changes (3)
- application/utils/external_project_parsers/parsers/cwe.py
- application/tests/cwe_parser_test.py
- application/tests/db_test.py
🛑 Comments failed to post (1)
application/database/db.py (1)
2340-2345:
⚠️ Potential issue | 🟡 Minor🧩 Analysis chain
🏁 Script executed:
#!/bin/bash # Verify current exception imports/catches in db.py before applying the change. rg -n "from sqlalchemy\\.exc import|sqlalchemy\\.exc" application/database/db.py rg -n "except\\s+OperationalError|except\\s+Exception|except\\s+SQLAlchemyError" application/database/db.pyRepository: OWASP/OpenCRE
Length of output: 277
🏁 Script executed:
sed -n '2330,2350p' application/database/db.pyRepository: OWASP/OpenCRE
Length of output: 805
🏁 Script executed:
rg -n "SQLAlchemyError" application/database/db.pyRepository: OWASP/OpenCRE
Length of output: 39
🏁 Script executed:
rg -B5 -A3 "session\.query.*scalar" application/database/db.py | head -50Repository: OWASP/OpenCRE
Length of output: 546
🏁 Script executed:
rg -n "except.*SQLAlchemyError|except.*IntegrityError" application/database/db.pyRepository: OWASP/OpenCRE
Length of output: 147
🏁 Script executed:
rg -n "^import sqlalchemy|^from sqlalchemy import" application/database/db.py | head -20Repository: OWASP/OpenCRE
Length of output: 169
🏁 Script executed:
python3 << 'PY' # Check if SQLAlchemyError is a valid exception class in sqlalchemy try: from sqlalchemy.exc import SQLAlchemyError print("SQLAlchemyError is available in sqlalchemy.exc") except ImportError: print("SQLAlchemyError NOT available in sqlalchemy.exc") # Check the exception hierarchy from sqlalchemy.exc import OperationalError print(f"OperationalError MRO: {[c.__name__ for c in OperationalError.__mro__]}") PYRepository: OWASP/OpenCRE
Length of output: 246
🏁 Script executed:
fd -e "requirements" -e "setup.py" -e "pyproject.toml" -e "setup.cfg" | head -10Repository: OWASP/OpenCRE
Length of output: 39
🏁 Script executed:
rg -i "sqlalchemy" --type txt --type toml | grep -i "sqlalchemy" | head -10Repository: OWASP/OpenCRE
Length of output: 181
🏁 Script executed:
cat requirements.txt | grep -i sqlalchemyRepository: OWASP/OpenCRE
Length of output: 113
🏁 Script executed:
web_search SQLAlchemyError exception class availability in SQLAlchemyRepository: OWASP/OpenCRE
Length of output: 162
🌐 Web query:
SQLAlchemyError exception class in sqlalchemy.exc💡 Result:
SQLAlchemyError is the base exception class for all exceptions raised by the SQLAlchemy library [1][2]. It is defined in the sqlalchemy.exc module and inherits from the built-in Exception class (and historically the HasDescriptionCode mixin) [1][3][2]. Because SQLAlchemyError serves as the root of the library's exception hierarchy, catching it allows you to handle any exception specifically generated by SQLAlchemy [1][4]. Common subclasses include DBAPIError, which wraps database-level exceptions raised by the underlying DB-API driver, and StatementError, which occurs during the execution of a SQL statement [1][5][4]. Key characteristics include: - Base Class: All SQLAlchemy-specific exceptions inherit from SQLAlchemyError [1][2]. - Generic Nature: It acts as a generic error class for the library [1][3]. - Integration: It supports custom messaging and integration with SQLAlchemy's error code system [2][6].
Citations:
- 1: https://docs.sqlalchemy.org/en/14/core/exceptions.html
- 2: https://github.com/sqlalchemy/sqlalchemy/blob/d6046f5fcf3732df1a8a08274501efcffc6253f2/lib/sqlalchemy/exc.py
- 3: https://dokk.org/documentation/sqlalchemy/rel_2_0_23/core/exceptions/
- 4: http://docs.sqlalchemy.org/en/latest/core/exceptions.html
- 5: https://docs.sqlalchemy.org/en/13/core/exceptions.html?highlight=exceptions
- 6: https://github.com/sqlalchemy/sqlalchemy/blob/0138954e/lib/sqlalchemy/exc.py
🏁 Script executed:
cat requirements.txtRepository: OWASP/OpenCRE
Length of output: 1413
🏁 Script executed:
rg -B3 -A3 "pragma: no cover.*defensive|defensive.*never fail" application/database/db.pyRepository: OWASP/OpenCRE
Length of output: 324
Add import for
SQLAlchemyErrorand narrow exception handler inhealth_check.Line 2340 catches
Exception, which can mask programming faults and make debugging harder. SinceOperationalErroris already caught at line 2334, the fallback handler should catch only SQLAlchemy exceptions (SQLAlchemyError). This allows unexpected errors to propagate properly while still handling database-layer failures defensively.Suggested fix
-from sqlalchemy.exc import OperationalError, IntegrityError +from sqlalchemy.exc import OperationalError, IntegrityError, SQLAlchemyError- except Exception: # pragma: no cover - defensive, never fail open + except SQLAlchemyError: # pragma: no cover - defensive, never fail open return { "ok": False, "db_reachable": False, "reason": "database health query failed", }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.except SQLAlchemyError: # pragma: no cover - defensive, never fail open return { "ok": False, "db_reachable": False, "reason": "database health query failed", }🧰 Tools
🪛 Ruff (0.15.17)
[warning] 2340-2340: Do not catch blind exception:
Exception(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/database/db.py` around lines 2340 - 2345, The exception handler in the health_check function is catching the overly broad Exception class on line 2340, which masks programming errors and makes debugging difficult. Replace the Exception catch with SQLAlchemyError to only catch database-layer failures while allowing unexpected errors to propagate. Add an import statement at the top of the file for SQLAlchemyError from the sqlalchemy package to make this change work properly.Source: Linters/SAST tools
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
application/database/db.py (1)
2165-2198:⚠️ Potential issue | 🟠 Major | ⚡ Quick winWhitespace-only searches can still raise before fuzzy search runs.
_expand_text_search_aliasesreturns[]for blank input, buttext_searchstill runs structured parsing on rawtext. With whitespace input,node_searchcan match and driveget_nodes(...)with empty fields, which can raiseValueErrorin__get_nodes_query__. Short-circuit on empty normalized input before structured parsing.💡 Suggested fix
def text_search(self, text: str) -> List[Optional[cre_defs.Document]]: @@ - search_terms = self._expand_text_search_aliases(text) + search_terms = self._expand_text_search_aliases(text) + if not search_terms: + return [] + text = text.strip()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/database/db.py` around lines 2165 - 2198, Add an early return check after the call to _expand_text_search_aliases to handle whitespace-only input. If the normalized search_terms result is empty, return an empty result set before proceeding to the structured text search regex patterns (cre_id_search, cre_naked_id_search, cre_name_search, and node_search). This prevents the node_search regex from matching against raw whitespace text and subsequently calling get_nodes with empty fields, which would trigger a ValueError in __get_nodes_query__.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@application/database/db.py`:
- Around line 2165-2198: Add an early return check after the call to
_expand_text_search_aliases to handle whitespace-only input. If the normalized
search_terms result is empty, return an empty result set before proceeding to
the structured text search regex patterns (cre_id_search, cre_naked_id_search,
cre_name_search, and node_search). This prevents the node_search regex from
matching against raw whitespace text and subsequently calling get_nodes with
empty fields, which would trigger a ValueError in __get_nodes_query__.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 00e2e867-7147-45dd-9c3c-c1ea254930c4
📒 Files selected for processing (1)
application/database/db.py
This PR builds on the earlier CWE mapping work (PR #823 ) and addresses two gaps found during local verification after the previous merge.
Problem:
Searching for terms like 'XXE' did not reliably surface the expected top-level CWE result, even when the corresponding CWE data existed locally. This created a mismatch between the intended curated CWE import behavior and actual API/UI discoverability.
Solution:
Scope:
Testing:
Follows-up: #823
References: #472