Python: proper catch-type modelling (follow-up once #22017 lands) #21937

Draft
yoff wants to merge 7 commits into yoff/python-shared-cfg-dataflow-flip from yoff/python-cfg-modelling-exceptions

Conversation

@yoff yoff commented Jun 4, 2026

Contributor

Proper modelling of catch types, to follow once #22017 lands.

@github-actions github-actions Bot added the Python label Jun 4, 2026
@yoff yoff force-pushed the yoff/python-cfg-modelling-exceptions branch from 83c5f33 to 647b976 Compare June 4, 2026 07:47
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch from 0b3f28f to db94cd9 Compare June 4, 2026 08:09
@yoff yoff force-pushed the yoff/python-cfg-modelling-exceptions branch from 647b976 to c34dc45 Compare June 4, 2026 08:09
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch from db94cd9 to e9d6d8f Compare June 4, 2026 22:13
@yoff yoff force-pushed the yoff/python-cfg-modelling-exceptions branch 2 times, most recently from f79f239 to b783ed6 Compare June 5, 2026 08:12
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch from e9d6d8f to 5b9803e Compare June 5, 2026 08:12
@yoff yoff force-pushed the yoff/python-cfg-modelling-exceptions branch from b783ed6 to 158fb5b Compare June 18, 2026 13:17
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch from 5b9803e to 7a9af9a Compare June 18, 2026 13:17
@yoff yoff force-pushed the yoff/python-cfg-modelling-exceptions branch from 158fb5b to 9c1d64b Compare June 18, 2026 15:12
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch from 7a9af9a to 5b23b1f Compare June 18, 2026 15:12
@yoff yoff force-pushed the yoff/python-cfg-modelling-exceptions branch from 9c1d64b to cada7e9 Compare June 18, 2026 15:18
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch from 5b23b1f to 3c9b0f7 Compare June 18, 2026 15:18
@yoff yoff force-pushed the yoff/python-cfg-modelling-exceptions branch from cada7e9 to 7d5fd6b Compare June 22, 2026 11:56
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch from 3c9b0f7 to 85ba647 Compare June 22, 2026 11:56
@yoff yoff force-pushed the yoff/python-cfg-modelling-exceptions branch from 7d5fd6b to 8e79ca5 Compare June 22, 2026 12:28
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch 2 times, most recently from 67aedba to fa159d2 Compare June 22, 2026 12:45
@yoff yoff force-pushed the yoff/python-cfg-modelling-exceptions branch from 8e79ca5 to 3e5256d Compare June 22, 2026 12:45
Copilot AI and others added 7 commits June 22, 2026 13:37
Preparatory refactor for the shared-CFG dataflow migration. Switches
'import python' to 'import python as Py' inside Flow.qll, and qualifies
every AST-class reference (Expr, Bytes, Dict, AssignExpr, Compare,
Module, Scope, Call, Attribute, SsaVariable, AugAssign, etc.) with the
Py:: prefix.

Flow.qll's own CFG types (ControlFlowNode, BasicBlock, CallNode,
NameNode, DefinitionNode, CompareNode, ...) keep their unqualified
names — they remain the public CFG API exported from this file.

This is a semantic no-op: the qualification was applied mechanically by
script and no name resolution changes. Verified by:
- All 361 lib/ + src/ queries compile clean.
- All 186 ControlFlow + PointsTo + dataflow library-tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

… AstSig

Adds three new defaulted signature predicates to the shared CFG library:

- getWhileElse / getForeachElse: `else` block of a while/for loop, if
  any (used by Python's `while-else` / `for-else` constructs).
- getCatchType: type expression of a catch clause, if any (used by
  Python's `except SomeExpr:` where the catch type is a runtime
  expression that needs CFG evaluation).

Each predicate defaults to `none()`, so behaviour is unchanged for any
language that doesn't override it (verified by re-running
java/ql/test/library-tests/controlflow/).

The Make0 succession rules are extended:
- WhileStmt/ForeachStmt: route the loop-exit edge through the else
  block before reaching the after-position.
- CatchClause: route the matching-evaluation through the type
  expression (if present) before reaching the after-value position.
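
For readers less familiar with the Python constructs these predicates target, here is a small illustration in plain Python. It is not QL and not code from this PR; the helper names `find_first_negative`, `countdown`, and `classify` are hypothetical. It shows that a loop's `else` block runs only when the loop exits without `break`, and that the expression after `except` is an ordinary expression evaluated at run time while matching.

```python
def find_first_negative(values):
    """Return the first negative value, or None if the loop exits normally."""
    for v in values:
        if v < 0:
            result = v
            break  # leaving via break skips the else block
    else:
        # Reached only through the normal loop-exit edge (no break).
        result = None
    return result


def countdown(n):
    """while-else: the else block runs once the condition becomes false."""
    steps = []
    while n > 0:
        steps.append(n)
        n -= 1
    else:
        steps.append("done")
    return steps


def classify(exc, catch_type_log):
    """Raise exc and report which handler matched.

    The first handler's catch type is a call, so it is evaluated each time
    an exception reaches that handler.
    """
    def catch_type():
        catch_type_log.append("evaluated")
        return (KeyError, IndexError)

    try:
        raise exc
    except catch_type():
        return "lookup"
    except Exception:
        return "other"
```

Calling `classify(ValueError(), log)` still records one evaluation in `log`, because the catch-type expression has to be evaluated before the handler can be rejected. That is the evaluation step a control flow graph has to account for.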

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python CFG library additively, without changing any production
behaviour.

Library additions:

- semmle.python.controlflow.internal.AstNodeImpl — mediates between
  the Python AST and the shared codeql.controlflow.ControlFlowGraph
  signature. Wraps Python's Stmt/Expr/Scope/Pattern and adds two
  synthetic kinds of node (BlockStmt for body slots, intermediate
  nodes for multi-operand boolean expressions).

- semmle.python.controlflow.internal.Cfg — public facade
  re-exposing the same API surface as semmle/python/Flow.qll
  (ControlFlowNode, CallNode, BasicBlock, NameNode, DefinitionNode,
  CompareNode, ...), backed by the shared CFG.

- lib/printCfgNew.ql — debug/visualisation query for the new CFG.

- consistency-queries/CfgConsistency.ql — consistency query running
  the shared CFG's standard checks against Python.

Shared library:

- shared.controlflow.ControlFlowGraph — adds two defaulted
  getWhileElse / getForeachElse predicates to AstSig so Python can
  model while-else / for-else (no behavioural change for other
  languages).

Test additions:

- ControlFlow/bindings/* — annotation-driven SSA-binding tests for
  the new CFG (annassign, compound, comprehension, decorated,
  except_handler, imports, match_pattern, parameters, simple,
  type_params, walrus_starred, with_stmt, dead_under_no_raise).

- ControlFlow/store-load/* — basic store/load coverage.

- ControlFlow/evaluation-order/NewCfg*.ql — mirrors of the existing
  OldCfg evaluation-order self-validation suite, run against the
  new CFG via NewCfgImpl.qll.

- Minor extensions to existing test_if.py / test_boolean.py +
  cosmetic .expected churn on a handful of OldCfg tests.

No dataflow, SSA, or production query is migrated yet — that lands in
follow-up PRs. The new CFG library has zero callers in lib/ and src/.

Verified by:
- All lib + src + consistency-queries compile clean (367 queries).
- All 56 ControlFlow library-tests pass.
- All 474 dataflow + PointsTo library-tests + consistency tests pass.
- syntax_error/CONSISTENCY/CfgConsistency passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Preparatory refactor for the shared-CFG dataflow migration. Adds the
new Python SSA adapter additively, without changing any production
behaviour.

Library additions:

- semmle.python.dataflow.new.internal.SsaImpl — Python SSA
  implementation built on the new (shared) CFG. Mirrors the Java SSA
  adapter (java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll):
  an InputSig is defined in terms of positional (BasicBlock, int)
  variable references, and the shared
  codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then
  instantiated.

  SourceVariable is the AST-level Py::Variable. Variable references
  are looked up via the new CFG facade's NameNode.defines/uses/deletes
  predicates (added in the preceding PR), which themselves are
  one-line bridges to AST-level Name.defines/uses/deletes.

  Implicit-entry definitions are inserted for non-local/global/builtin
  reads, captured variables, and (when needed) parameters.
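
The idea of positional `(BasicBlock, int)` variable references can be pictured with a toy model in Python. This is only a sketch under simplifying assumptions, not the QL adapter: the `Ref` class and `reaching_write_in_block` function are hypothetical, blocks are identified by plain strings, and only lookups within a single block are handled (the shared SSA library also handles phi nodes across blocks).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A variable reference at position `index` inside basic block `block`."""
    block: str   # hypothetical basic-block identifier
    index: int   # position of the reference inside the block
    var: str     # name of the source variable
    kind: str    # "write" or "read"


def reaching_write_in_block(refs, read):
    """Return the closest earlier write to read.var in the same block, or None."""
    candidates = [
        r for r in refs
        if r.kind == "write"
        and r.var == read.var
        and r.block == read.block
        and r.index < read.index
    ]
    return max(candidates, key=lambda r: r.index, default=None)


# References for the straight-line code: x = 1; y = x; x = 2; print(x)
REFS = [
    Ref("bb0", 0, "x", "write"),
    Ref("bb0", 1, "x", "read"),
    Ref("bb0", 2, "y", "write"),
    Ref("bb0", 3, "x", "write"),
    Ref("bb0", 4, "x", "read"),
]
```

With these references, the read at index 1 resolves to the write at index 0 and the read at index 4 resolves to the write at index 3. A read with no earlier write in its block returns `None`, which is where a real implementation would consult predecessors or an implicit-entry definition.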

Test additions:

- library-tests/dataflow-new-ssa/ — exercises the new SSA over a
  representative test corpus and checks expected def/use chains.

- library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and
  legacy ESSA over the same corpus and diffs the results, so any
  semantic divergence shows up as a test failure.

Production impact:

None. The new SSA adapter has zero callers in lib/ and src/ — the
legacy ESSA SSA (semmle/python/essa/*) remains the default. The
dataflow library is not migrated yet; that lands in a follow-up PR.

Verified by:
- All 367 lib + src + consistency-queries compile clean.
- All 641 ControlFlow + PointsTo + dataflow + essa + consistency
  library-tests pass.
- Both new dataflow-new-ssa[/vs-legacy] test packs pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll)
and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade
(semmle.python.controlflow.internal.Cfg) and the new SSA adapter
(semmle.python.dataflow.new.internal.SsaImpl), both introduced
additively in the preceding PRs in this stack.

This is the trunk-flip equivalent of the original draft PR #21894 (kept
around as documentation), rebased on top of the four preparatory PRs:

  P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919).
  P2: Qualify Flow.qll's AST references with Py:: prefix (#21920).
  P3: Add new shared-CFG-backed control flow graph (#21921).
  P4: Add new shared-SSA-backed SSA adapter (#21923).

The Python dataflow library (semmle/python/dataflow/new/) now imports
the new CFG facade and SSA adapter. All CFG-typed predicates
(ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are
qualified with the Cfg:: prefix; SSA references switch from
EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable.

GuardNode is redesigned to use the new CFG's outcome-node model
(isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock +
flipped indirection. Only BarrierGuard<...> is preserved as public
API.

Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib,
...) are updated to take CFG nodes from the new facade.

A handful of dataflow consistency tweaks for the new CFG:
- Augmented-assignment targets are treated as both load and store.
- 'from X import *' produces uncertain SSA writes for unknown names.
- CFG nodes are canonicalised so dataflow does not see equivalent
  pre/post-order pairs as distinct nodes.
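
The first tweak, treating augmented-assignment targets as both load and store, matches what CPython does at run time. The following sketch uses the standard `dis` module to show this; the helper name `name_ops` is hypothetical, and the check is an illustration only, not part of this PR.

```python
import dis


def name_ops(source):
    """Return (opname, argval) pairs for name loads and stores in module-level code."""
    code = compile(source, "<example>", "exec")
    return [
        (ins.opname, ins.argval)
        for ins in dis.get_instructions(code)
        if ins.opname in ("LOAD_NAME", "STORE_NAME")
    ]


# The target of an augmented assignment is read first and then written.
ops = name_ops("x += 1")
```

Here `ops` contains both a `LOAD_NAME` and a `STORE_NAME` entry for `x`, whereas a plain `x = 1` yields only the store.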

Two AST tweaks for the new CFG:
- AstNodeImpl: omit PEP 695 type-parameter names from
  FunctionDefExpr / ClassDefExpr children.
- ImportResolution: drop the legacy essa import.

Test churn (~175 files): reblessed library- and query-test .expected
files reflect slightly different CFG granularity, different toString
output, and a handful of true alert deltas in security queries.

Verification: all 367 lib + src + consistency-queries compile clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…with

The new CFG previously only emitted exception edges for explicit `raise`
and `assert` statements. As a result, code that became reachable only
via the exception path of an arbitrary expression (e.g., the body of an
`except` handler following a try-body whose `call()` could raise) was
classified as dead, breaking analyses like StackTraceExposure,
FileNotAlwaysClosed, ExceptionInfo, UseOfExit, and CatchingBaseException.

This commit adds a `mayThrow` predicate over expressions that are known
sources of implicit exceptions in Python (calls, attribute access,
subscripts, arithmetic/comparison operators, imports, await/yield/yield
from) plus `from m import *` at the statement level, and routes them
through the shared CFG's `beginAbruptCompletion(_, _, ExceptionSuccessor,
always=false)` hook.

The set of exception sources is restricted to nodes that are
syntactically inside a `try`/`with` statement in the same scope.
This mirrors Java's `ControlFlowGraph::mayThrow`, which only emits
exception edges where local handling can observe them — outside such
contexts, the edges add CFG complexity (weakening BarrierGuard
precision and breaking SSA continuity around augmented assignments and
subscript stores) without analysis benefit, since exceptions just
propagate to the function exit anyway.

Net effect on the test suite: ~100 alerts restored across the exception-
related query tests (StackTraceExposure +29, ExceptionInfo +17,
FileNotAlwaysClosed +52, UseOfExit +1, CatchingBaseException restored)
with no precision regressions. Affected `.expected` files and the
regression-guard `dead_under_no_raise.py` are updated accordingly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@yoff yoff force-pushed the yoff/python-cfg-modelling-exceptions branch from 3e5256d to 92e0331 Compare June 22, 2026 13:47
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch from fa159d2 to 408ba62 Compare June 22, 2026 13:47
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch from 408ba62 to 93cae5f Compare June 24, 2026 08:04
@yoff yoff changed the title from "Python: model exception edges for raise-prone expressions inside try/with" to "Python: proper catch-type modelling (follow-up once #22017 lands)" Jun 24, 2026
@yoff yoff force-pushed the yoff/python-shared-cfg-dataflow-flip branch 2 times, most recently from acb4a58 to 5081d81 Compare June 24, 2026 08:46

3 participants