perf: prune explicit zeros at matrix assembly (cut build peak memory)#816
Draft
FBumann wants to merge 3 commits into
Draft
perf: prune explicit zeros at matrix assembly (cut build peak memory)#816FBumann wants to merge 3 commits into
FBumann wants to merge 3 commits into
Conversation
Expressions that broadcast against a dense coordinate store one coefficient per pair, most of them structurally zero. Those explicit zeros were carried all the way into `matrices.A` and thus into every solver handoff. `highspy.Highs.addRows` (and the other direct backends' matrix loaders) scale with *stored* nnz, so the handoff spent most of its work describing zeros — e.g. sparse_network(250) stored 1.5M entries for 18k structural nonzeros (98.8% zeros). Prune them once, centrally, in `_stack` via `eliminate_zeros()`, so `A` and `indicator_A` — and hence HiGHS, gurobi, xpress, copt, mosek and the LP/MPS writers — all hand the solver only structural nonzeros. A zero coefficient never changes a constraint, so this is mathematically identical. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to #815. That eliminates zeros from the stacked constraint matrix after scipy.vstack has already materialised the full dense-with-zeros block, so the peak allocation (what the CodSpeed memory instrument tracks) is unchanged — only the resting size and solver-ingest cost drop. Prune at the source instead: fold `coeffs != 0` into the valid-entry mask in Constraint._matrix_export_data, so broadcast zeros never enter cols/data/the CSR. Snapshot capture and the matrix build share this path, so they stay consistent. Two ripples handled: - matrices.dual derived active rows from stored nnz (np.diff(indptr)); an all-zero-coefficient row would lose its dual slot once pruned. Derive active rows from row activity via a new ConstraintBase.active_row_mask (also avoids rebuilding the CSR just to count rows). - A zero-coefficient term no longer changes the sparsity pattern, so the persistent warm-start path no longer forces a SPARSITY rebuild for it. Test updated to use a non-zero coefficient; a no-rebuild case added. sparse_network(250): build peak 50 -> 36 MB (~27%), identical solver result. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merging this PR will improve performance by 43.25%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Memory | test_to_solver[highs-sparse_network-n=250] |
64.8 MB | 34.8 MB | +86.16% |
| ⚡ | Memory | test_to_solver[highs-kvl_cycles-severity=50] |
245.6 MB | 160.6 MB | +52.99% |
| ⚡ | Memory | test_to_solver[highs-kvl_cycles-severity=100] |
217 MB | 155 MB | +39.95% |
| ⚡ | Memory | test_to_solver[gurobi-sparse_network-n=250] |
48 MB | 35.1 MB | +37.07% |
| ⚡ | Memory | test_to_solver[gurobi-kvl_cycles-severity=100] |
198.8 MB | 155.4 MB | +27.96% |
| ⚡ | Memory | test_to_solver[gurobi-kvl_cycles-severity=50] |
198.8 MB | 160.9 MB | +23.58% |
Tip
Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.
Comparing perf/prune-zeros-at-source (97ab692) with master (e861678)
Footnotes
-
173 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Intent placeholder — @FBumann to replace with your own words.
Important
Draft — depends on #815. Merge #815 first, then mark this ready.
This targets
master, so until #815 merges the diff below also includes #815's commit (the_stackeliminate_zeros()). Once #815 is in, GitHub collapses the diff to just this PR's source-level prune.Follow-up that completes the zero-drop story on the memory axis.
Note
The following was generated by AI.
Why a second PR
#815 eliminates zeros from the stacked constraint matrix with
eliminate_zeros(). That runs afterscipy.vstackhas already materialised the full dense-with-zeros block, so the peak allocation — exactly what the CodSpeed memory instrument tracks — is unchanged; only the resting matrix size and the solver-ingest cost drop. That's why the memory job shows no movement even though the handoff is measurably faster.Change
Prune at the source: fold
coeffs != 0into the valid-entry mask inConstraint._matrix_export_data, so broadcast zeros never entercols/data/the CSR at all. Snapshot capture and the matrix build both go through this path, so they stay consistent. #815'seliminate_zeros()stays as cheap safety for the one case a pre-filter can't catch — duplicate variable terms that cancel to zero aftersum_duplicates.Two ripples (both handled)
matrices.dualderived its active rows from stored nnz (np.diff(csr.indptr)). Once zeros are pruned, an all-zero-coefficient row (e.g.0·x ≤ 5) keeps its row inAbut stores no entry, so it would silently lose its dual slot → misaligned dual vector. Fixed by deriving active rows from row activity via a newConstraintBase.active_row_mask(coefficient-independent; also skips a redundant CSR rebuild).SPARSITYrebuild when one is added — correct, since the matrix is unchanged.test_shape_mismatch_triggers_sparsity_rebuildupdated to use a non-zero coefficient (still exercises the real path), andtest_zero_coefficient_term_needs_no_rebuildadded for the new behaviour.Impact
Peak allocation building
m.matrices.A— sparse_network(250)_stackeliminate_zeros)~27% lower peak. The residual is the unavoidable dense-coeffs flatten — removing that needs expression-level sparsity, out of scope here. Solver result is identical;
addMConstr/addRowsstay ~2× faster from the smaller nnz.Full test suite green (3770 passed, 45 skipped).
🤖 Generated with Claude Code