Python: proper catch-type modelling (follow-up once #22017 lands)#21937
Draft
yoff wants to merge 7 commits into
Draft
Python: proper catch-type modelling (follow-up once #22017 lands)#21937yoff wants to merge 7 commits into
yoff wants to merge 7 commits into
Conversation
83c5f33 to
647b976
Compare
0b3f28f to
db94cd9
Compare
647b976 to
c34dc45
Compare
db94cd9 to
e9d6d8f
Compare
f79f239 to
b783ed6
Compare
e9d6d8f to
5b9803e
Compare
b783ed6 to
158fb5b
Compare
5b9803e to
7a9af9a
Compare
158fb5b to
9c1d64b
Compare
7a9af9a to
5b23b1f
Compare
9c1d64b to
cada7e9
Compare
5b23b1f to
3c9b0f7
Compare
cada7e9 to
7d5fd6b
Compare
3c9b0f7 to
85ba647
Compare
7d5fd6b to
8e79ca5
Compare
67aedba to
fa159d2
Compare
8e79ca5 to
3e5256d
Compare
Preparatory refactor for the shared-CFG dataflow migration. Switches 'import python' to 'import python as Py' inside Flow.qll, and qualifies every AST-class reference (Expr, Bytes, Dict, AssignExpr, Compare, Module, Scope, Call, Attribute, SsaVariable, AugAssign, etc.) with the Py:: prefix. Flow.qll's own CFG types (ControlFlowNode, BasicBlock, CallNode, NameNode, DefinitionNode, CompareNode, ...) keep their unqualified names — they remain the public CFG API exported from this file. This is a semantic noop: the qualification was applied mechanically by script and no name resolution changes. Verified by: - All 361 lib/ + src/ queries compile clean. - All 186 ControlFlow + PointsTo + dataflow library-tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
… AstSig Adds three new defaulted signature predicates to the shared CFG library: - getWhileElse / getForeachElse: `else` block of a while/for loop, if any (used by Python's `while-else` / `for-else` constructs). - getCatchType: type expression of a catch clause, if any (used by Python's `except SomeExpr:` where the catch type is a runtime expression that needs CFG evaluation). Each predicate defaults to `none()`, so behaviour is unchanged for any language that doesn't override it (verified by re-running java/ql/test/library-tests/controlflow/). The Make0 succession rules are extended: - WhileStmt/ForeachStmt: route the loop-exit edge through the else block before reaching the after-position. - CatchClause: route the matching-evaluation through the type expression (if present) before reaching the after-value position. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Preparatory refactor for the shared-CFG dataflow migration. Adds the new Python CFG library additively, without changing any production behaviour. Library additions: - semmle.python.controlflow.internal.AstNodeImpl — mediates between the Python AST and the shared codeql.controlflow.ControlFlowGraph signature. Wraps Python's Stmt/Expr/Scope/Pattern and adds two synthetic kinds of node (BlockStmt for body slots, intermediate nodes for multi-operand boolean expressions). - semmle.python.controlflow.internal.Cfg — public facade re-exposing the same API surface as semmle/python/Flow.qll (ControlFlowNode, CallNode, BasicBlock, NameNode, DefinitionNode, CompareNode, ...), backed by the shared CFG. - lib/printCfgNew.ql — debug/visualisation query for the new CFG. - consistency-queries/CfgConsistency.ql — consistency query running the shared CFG's standard checks against Python. Shared library: - shared.controlflow.ControlFlowGraph — adds two defaulted getWhileElse / getForeachElse predicates to AstSig so Python can model while-else / for-else (no behavioural change for other languages). Test additions: - ControlFlow/bindings/* — annotation-driven SSA-binding tests for the new CFG (annassign, compound, comprehension, decorated, except_handler, imports, match_pattern, parameters, simple, type_params, walrus_starred, with_stmt, dead_under_no_raise). - ControlFlow/store-load/* — basic store/load coverage. - ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing OldCfg evaluation-order self-validation suite, run against the new CFG via NewCfgImpl.qll. - Minor extensions to existing test_if.py / test_boolean.py + cosmetic .expected churn on a handful of OldCfg tests. No dataflow, SSA, or production query is migrated yet — that lands in follow-up PRs. The new CFG library has zero callers in lib/ and src/. Verified by: - All lib + src + consistency-queries compile clean (367 queries). - All 56 ControlFlow library-tests pass. - All 474 dataflow + PointsTo library-tests + consistency tests pass. - syntax_error/CONSISTENCY/CfgConsistency passes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Preparatory refactor for the shared-CFG dataflow migration. Adds the new Python SSA adapter additively, without changing any production behaviour. Library additions: - semmle.python.dataflow.new.internal.SsaImpl — Python SSA implementation built on the new (shared) CFG. Mirrors the Java SSA adapter (java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll): an InputSig is defined in terms of positional (BasicBlock, int) variable references, and the shared codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then instantiated. SourceVariable is the AST-level Py::Variable. Variable references are looked up via the new CFG facade's NameNode.defines/uses/deletes predicates (added in the preceding PR), which themselves are one-line bridges to AST-level Name.defines/uses/deletes. Implicit-entry definitions are inserted for non-local/global/builtin reads, captured variables, and (when needed) parameters. Test additions: - library-tests/dataflow-new-ssa/ — exercises the new SSA over a representative test corpus and checks expected def/use chains. - library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and legacy ESSA over the same corpus and diffs the results, so any semantic divergence shows up as a test failure. Production impact: None. The new SSA adapter has zero callers in lib/ and src/ — the legacy ESSA SSA (semmle/python/essa/*) remains the default. The dataflow library is not migrated yet; that lands in a follow-up PR. Verified by: - All 367 lib + src + consistency-queries compile clean. - All 641 ControlFlow + PointsTo + dataflow + essa + consistency library-tests pass. - Both new dataflow-new-ssa[/vs-legacy] test packs pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…with The new CFG previously only emitted exception edges for explicit `raise` and `assert` statements. As a result, code that became reachable only via the exception path of an arbitrary expression (e.g., the body of an `except` handler following a try-body whose `call()` could raise) was classified as dead, breaking analyses like StackTraceExposure, FileNotAlwaysClosed, ExceptionInfo, UseOfExit, and CatchingBaseException. This commit adds a `mayThrow` predicate over expressions that are known sources of implicit exceptions in Python (calls, attribute access, subscripts, arithmetic/comparison operators, imports, await/yield/yield from) plus `from m import *` at the statement level, and routes them through the shared CFG's `beginAbruptCompletion(_, _, ExceptionSuccessor, always=false)` hook. The set of exception sources is restricted to nodes that are syntactically inside a `try`/`with` statement in the same scope. This mirrors Java's `ControlFlowGraph::mayThrow`, which only emits exception edges where local handling can observe them — outside such contexts, the edges add CFG complexity (weakening BarrierGuard precision and breaking SSA continuity around augmented assignments and subscript stores) without analysis benefit, since exceptions just propagate to the function exit anyway. Net effect on the test suite: ~100 alerts restored across the exception- related query tests (StackTraceExposure +29, ExceptionInfo +17, FileNotAlwaysClosed +52, UseOfExit +1, CatchingBaseException restored) with no precision regressions. Affected `.expected` files and the regression-guard `dead_under_no_raise.py` are updated accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3e5256d to
92e0331
Compare
fa159d2 to
408ba62
Compare
408ba62 to
93cae5f
Compare
acb4a58 to
5081d81
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Proper modelling on catch once #22017 lands.