perf: prune explicit zeros at matrix assembly (cut build peak memory) by FBumann · Pull Request #816 · PyPSA/linopy

FBumann · 2026-07-02T14:33:47Z

Intent placeholder — @FBumann to replace with your own words.

Important

Draft — depends on #815. Merge #815 first, then mark this ready.
This targets master, so until #815 merges, the diff below also includes #815's commit (the eliminate_zeros() in _stack). Once #815 is in, GitHub collapses the diff to just this PR's source-level prune.

Follow-up that completes the zero-drop story on the memory axis.

Note

The following was generated by AI.

Why a second PR

#815 eliminates zeros from the stacked constraint matrix with eliminate_zeros(). That call runs after scipy.sparse.vstack has already materialised the full block with all its explicit zeros, so the peak allocation (exactly what the CodSpeed memory instrument tracks) is unchanged; only the resting matrix size and the solver-ingest cost drop. That's why the memory job shows no movement even though the handoff is measurably faster.
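A minimal SciPy sketch of why a post-hoc eliminate_zeros() leaves the peak untouched. It does not reproduce linopy's _stack; the block shape and the broadcast_block helper are invented for illustration. scipy.sparse.vstack copies every stored entry of every block, explicit zeros included, and only afterwards can eliminate_zeros() shrink the result.

```python
import numpy as np
import scipy.sparse as sp


def broadcast_block(n_rows: int, n_cols: int) -> sp.csr_matrix:
    """Hypothetical constraint block: every (row, col) pair is stored,
    but only the diagonal carries a non-zero coefficient."""
    indptr = np.arange(0, (n_rows + 1) * n_cols, n_cols)
    indices = np.tile(np.arange(n_cols), n_rows)
    row_of_entry = np.repeat(np.arange(n_rows), n_cols)
    data = (indices == row_of_entry).astype(float)
    # Passing (data, indices, indptr) stores the arrays as given, zeros included.
    return sp.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))


blocks = [broadcast_block(3, 4), broadcast_block(3, 4)]

# vstack copies every stored entry of every block, explicit zeros included,
# so the stacked matrix already holds all of them at this point.
A = sp.vstack(blocks, format="csr")
stored_at_peak = A.nnz

# eliminate_zeros() only runs afterwards: it shrinks the resting matrix,
# not the allocation that vstack already made.
A.eliminate_zeros()
stored_after = A.nnz

print(stored_at_peak, stored_after)  # 24 6
```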

Change

Prune at the source: fold coeffs != 0 into the valid-entry mask in Constraint._matrix_export_data, so broadcast zeros never enter cols/data/the CSR at all. Snapshot capture and the matrix build both go through this path, so they stay consistent. #815's eliminate_zeros() stays as a cheap safety net for the one case a pre-filter can't catch: duplicate variable terms that cancel to zero after sum_duplicates.
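A sketch of the source-level prune, assuming label arrays that use -1 for missing entries. export_entries and build_csr are hypothetical helpers standing in for Constraint._matrix_export_data and the CSR build, not linopy's actual code. The second half shows the case the pre-filter cannot catch: duplicate terms whose coefficients only cancel once the COO-to-CSR conversion sums them.

```python
import numpy as np
import scipy.sparse as sp

MISSING = -1  # assumption: -1 marks a missing constraint or variable label


def export_entries(con_labels, var_labels, coeffs, prune_zeros=True):
    """Hypothetical stand-in for Constraint._matrix_export_data.

    con_labels: (n_rows,) constraint label per row
    var_labels: (n_rows, n_terms) variable label per term
    coeffs:     (n_rows, n_terms) coefficient per term
    Returns flat (rows, cols, data) arrays for the matrix build.
    """
    rows = np.broadcast_to(con_labels[:, None], var_labels.shape)
    valid = (rows != MISSING) & (var_labels != MISSING)
    if prune_zeros:
        # Folding the zero test into the mask means broadcast zeros never make
        # it into the rows/cols/data arrays, and hence never into the CSR.
        valid &= coeffs != 0
    return rows[valid], var_labels[valid], coeffs[valid]


def build_csr(entries, shape):
    rows, cols, data = entries
    # COO -> CSR sums duplicate (row, col) pairs but keeps stored zeros.
    return sp.coo_matrix((data, (rows, cols)), shape=shape).tocsr()


# Broadcast case: 2 constraints x 3 terms, half of the coefficients are zero.
con_labels = np.array([0, 1])
var_labels = np.array([[0, 1, 2], [0, 1, 2]])
coeffs = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])

kept = build_csr(export_entries(con_labels, var_labels, coeffs, prune_zeros=False), (2, 3))
pruned = build_csr(export_entries(con_labels, var_labels, coeffs), (2, 3))
print(kept.nnz, pruned.nnz)  # 6 3

# Cancelling duplicates, x0 + x1 - x0: no single coefficient is zero, so the
# pre-filter keeps all three terms and the zero only appears after summing.
dup = build_csr(
    export_entries(np.array([0]), np.array([[0, 1, 0]]), np.array([[1.0, 1.0, -1.0]])),
    (1, 2),
)
stored_before = dup.nnz  # (0, 0) is stored as an explicit 0.0
dup.eliminate_zeros()    # the post-hoc safety net kept from #815
print(stored_before, dup.nnz)  # 2 1
```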

Two ripples (both handled)

Dual alignment. matrices.dual derived its active rows from stored nnz (np.diff(csr.indptr)). Once zeros are pruned, an all-zero-coefficient row (e.g. 0·x ≤ 5) keeps its row in A but stores no entry, so it would silently lose its dual slot → misaligned dual vector. Fixed by deriving active rows from row activity via a new ConstraintBase.active_row_mask (coefficient-independent; also skips a redundant CSR rebuild).
Warm-start semantics. A zero-coefficient term no longer changes the sparsity pattern, so the persistent-solver path no longer forces a SPARSITY rebuild when one is added — correct, since the matrix is unchanged. test_shape_mismatch_triggers_sparsity_rebuild updated to use a non-zero coefficient (still exercises the real path), and test_zero_coefficient_term_needs_no_rebuild added for the new behaviour.
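Both ripples can be reproduced on toy matrices. This is a sketch under assumptions: pruned_csr and same_pattern are hypothetical helpers, -1 is assumed to mark a masked constraint label, and the label-based row selection only mimics what ConstraintBase.active_row_mask is described as doing (coefficient-independent row activity).

```python
import numpy as np
import scipy.sparse as sp

MISSING = -1  # assumption: -1 marks a masked-out constraint label


def pruned_csr(rows, cols, data, shape):
    # Source-level prune: drop zero coefficients before building the matrix.
    keep = data != 0
    return sp.coo_matrix((data[keep], (rows[keep], cols[keep])), shape=shape).tocsr()


# Dual alignment: x0 <= 1, 0*x0 <= 5, 2*x0 <= 4 (constraint labels 0, 1, 2).
con_labels = np.array([0, 1, 2])
A = pruned_csr(np.array([0, 1, 2]), np.array([0, 0, 0]), np.array([1.0, 0.0, 2.0]), (3, 1))
solver_dual = np.array([0.5, 0.0, -1.5])  # the solver returns one dual per row of A

nnz_rows = np.flatnonzero(np.diff(A.indptr) > 0)     # old: rows with stored entries
active_rows = np.flatnonzero(con_labels != MISSING)  # new: rows with an active label
print(nnz_rows, active_rows)  # [0 2] [0 1 2]
# Three duals cannot be matched to two nnz-derived slots; the label-based rows line up.
dual_by_label = dict(zip(con_labels[active_rows].tolist(), solver_dual.tolist()))


def same_pattern(old, new):
    """Hypothetical check: a SPARSITY rebuild is needed only if the stored pattern changes."""
    return (
        old.shape == new.shape
        and np.array_equal(old.indptr, new.indptr)
        and np.array_equal(old.indices, new.indices)
    )


# Warm start: add a term in x1 to a single constraint that so far only uses x0.
rows, cols = np.array([0, 0]), np.array([0, 1])
base = pruned_csr(rows[:1], cols[:1], np.array([1.0]), (1, 2))       # x0
plus_zero = pruned_csr(rows, cols, np.array([1.0, 0.0]), (1, 2))     # x0 + 0*x1
plus_nonzero = pruned_csr(rows, cols, np.array([1.0, 3.0]), (1, 2))  # x0 + 3*x1
print(same_pattern(base, plus_zero), same_pattern(base, plus_nonzero))  # True False
```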

Impact

Peak allocation building m.matrices.A, sparse_network(250)

| variant | build peak | final nnz |
|---|---|---|
| master (keep zeros) | 50.1 MB | 1,506,000 |
| #815 (`_stack` eliminate_zeros) | 49.9 MB | 18,000 |
| this PR (prune at source) | 36.3 MB | 18,000 |

~27% lower peak. What remains is the dense-coeffs flatten, which can't be avoided without expression-level sparsity (out of scope here). The solver result is identical, and addMConstr/addRows stay ~2× faster thanks to the smaller nnz.
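The headline percentages follow directly from the figures in the sparse_network(250) table; a quick check:

```python
# Figures copied from the sparse_network(250) table.
peak_master_mb, peak_pr_mb = 50.1, 36.3
nnz_master, nnz_pruned = 1_506_000, 18_000

peak_reduction = 1 - peak_pr_mb / peak_master_mb  # relative drop in build peak
zero_share = 1 - nnz_pruned / nnz_master          # share of master's entries that were zeros

print(f"{peak_reduction:.1%} lower build peak")                   # 27.5% lower build peak
print(f"{zero_share:.1%} of master's stored entries were zeros")  # 98.8% of master's stored entries were zeros
```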

Full test suite green (3770 passed, 45 skipped).

🤖 Generated with Claude Code

Expressions that broadcast against a dense coordinate store one coefficient per pair, most of them structurally zero. Those explicit zeros were carried all the way into `matrices.A` and thus into every solver handoff. `highspy.Highs.addRows` (and the other direct backends' matrix loaders) scale with *stored* nnz, so the handoff spent most of its work describing zeros, e.g. sparse_network(250) stored 1.5M entries for 18k structural nonzeros (98.8% zeros). Prune them once, centrally, in `_stack` via `eliminate_zeros()`, so `A` and `indicator_A` (and hence HiGHS, gurobi, xpress, copt, mosek and the LP/MPS writers) all hand the solver only structural nonzeros. A zero coefficient never changes a constraint, so this is mathematically identical.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Follow-up to #815. That eliminates zeros from the stacked constraint matrix after scipy.sparse.vstack has already materialised the full block with all its explicit zeros, so the peak allocation (what the CodSpeed memory instrument tracks) is unchanged; only the resting size and solver-ingest cost drop.

Prune at the source instead: fold `coeffs != 0` into the valid-entry mask in Constraint._matrix_export_data, so broadcast zeros never enter cols/data/the CSR. Snapshot capture and the matrix build share this path, so they stay consistent.

Two ripples handled:

- matrices.dual derived active rows from stored nnz (np.diff(indptr)); an all-zero-coefficient row would lose its dual slot once pruned. Derive active rows from row activity via a new ConstraintBase.active_row_mask (also avoids rebuilding the CSR just to count rows).
- A zero-coefficient term no longer changes the sparsity pattern, so the persistent warm-start path no longer forces a SPARSITY rebuild for it. Test updated to use a non-zero coefficient; a no-rebuild case added.

sparse_network(250): build peak 50 -> 36 MB (~27%), identical solver result.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codspeed-hq · 2026-07-02T14:41:22Z

Merging this PR will improve performance by 43.25%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 6 improved benchmarks
✅ 167 untouched benchmarks
⏩ 173 skipped benchmarks¹

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Memory | `test_to_solver[highs-sparse_network-n=250]` | 64.8 MB | 34.8 MB | +86.16% |
| ⚡ | Memory | `test_to_solver[highs-kvl_cycles-severity=50]` | 245.6 MB | 160.6 MB | +52.99% |
| ⚡ | Memory | `test_to_solver[highs-kvl_cycles-severity=100]` | 217 MB | 155 MB | +39.95% |
| ⚡ | Memory | `test_to_solver[gurobi-sparse_network-n=250]` | 48 MB | 35.1 MB | +37.07% |
| ⚡ | Memory | `test_to_solver[gurobi-kvl_cycles-severity=100]` | 198.8 MB | 155.4 MB | +27.96% |
| ⚡ | Memory | `test_to_solver[gurobi-kvl_cycles-severity=50]` | 198.8 MB | 160.9 MB | +23.58% |


Comparing perf/prune-zeros-at-source (97ab692) with master (e861678)

¹ 173 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

FBumann and others added 2 commits July 2, 2026 15:39

FBumann changed the base branch from perf/drop-explicit-zeros to master July 2, 2026 14:35

FBumann marked this pull request as draft July 2, 2026 14:35

ci: trigger CI

97ab692

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

FBumann wants to merge 3 commits into master from perf/prune-zeros-at-source
