add similar library to render diffs from regression test results by Aurashk · Pull Request #1296 · EnergySystemsModellingLab/MUSE2

Aurashk · 2026-05-19T12:17:53Z

Description

Adds the similar diffing library to render the output from failing regression tests in a nicer way

Fixes #1186

Type of change

Bug fix (non-breaking change to fix an issue)
New feature (non-breaking change to add functionality)
Refactoring (non-breaking, non-functional change to improve maintainability)
Optimization (non-breaking change to speed up the code)
Breaking change (whatever its nature)
Documentation (improve or add documentation)

Key checklist

All tests pass: $ cargo test
The documentation builds and looks OK: $ cargo doc
Update release notes for the latest release if this PR adds a new feature or fixes a bug
present in the previous release

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

Copilot

Pull request overview

Adds the similar crate to improve regression-test failure diagnostics by rendering structured diffs instead of line-by-line mismatch lists.

Changes:

Adds similar as a dev-dependency.
Updates regression comparison to detect mismatches first, then render a diff for differing CSV outputs.
Introduces render_diff to show inserted/deleted lines with source line numbers.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

File	Description
`Cargo.toml`	Adds `similar` to dev-dependencies for regression test diagnostics.
`Cargo.lock`	Locks the new `similar` dependency.
`tests/regression.rs`	Replaces per-line mismatch reporting with diff rendering for failed regression output comparisons.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+fn render_diff(lines1: &[String], lines2: &[String]) -> String {
+    let text1 = lines1.join("\n");
+    let text2 = lines2.join("\n");
+    let diff = TextDiff::from_lines(&text1, &text2);


codecov · 2026-05-19T12:24:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.73%. Comparing base (0af4399) to head (3d5b4a7).
⚠️ Report is 13 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1296      +/-   ##
==========================================
+ Coverage   89.46%   89.73%   +0.27%     
==========================================
  Files          57       57              
  Lines        8361     8418      +57     
  Branches     8361     8418      +57     
==========================================
+ Hits         7480     7554      +74     
+ Misses        581      564      -17     
  Partials      300      300

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

alexdewar

Copilot's right. I tried this locally and when I changed one line of a commodity_flows.csv file, it showed a diff including every line where the text doesn't match exactly, even where the differences in floating-point values were below the threshold.

I think the right way to do this is to use the capture_diff_slices function from the crate. If it returns an empty Vec, the files are the same, otherwise they are different.

Unfortunately to do the comparisons correctly, you'll need to derive a custom type representing a field in the CSV file (an enum), which can be either a non-NaN floating-point number or a plain string. This type needs to derive Eq and Hash. Hash is, unfortunately, not defined for f64, because NaNs break everything. But! There is a NotNan type in the ordered_float crate, which does implement Hash, so you could use this. You'll define Eq to do the comparison logic we've currently got in compare_lines. Each CSV file will be represented as a Vec<Vec<MyCustomFieldType>> and you can pass these to capture_diff_slices.

Does this make sense?

For bonus points, you could use the colored crate to colourise the diff (we already have it as an indirect dependency), but we can probably get by without this too.

alexdewar · 2026-05-19T13:41:40Z

Actually, there are a couple of things I missed.

You'll also need to make sure that similar floats have the same hash, which is annoying. Probably the easiest way to do this is to round to the nearest number divisible by the tolerance. You'll also need to store string representations of the lines somewhere so you can print them out for the diff.

All in all, this is a little fiddly.

…in floating precision add tests that check diffs rendered are as expected

alexdewar

This mostly seems to work -- and is definitely an improvement on what we had before! -- but there's something that's not working quite right.

What you're doing is:

Compare files using the old comparison code
If different, feed into similar's algorithm to get diff
Print diff, excluding lines that similar thinks are the same but are different according to compare_line

What we should be able to do is:

Feed files into similar's algorithm directly
Discard any DiffOps::Equals
If there are any diff ops remaining, print them all out

We shouldn't need the old code anymore -- if similar thinks lines are different, but the old code thinks they're the same or vice versa, we're not doing the comparison quite right. There are a bunch of output files with spurious replacements in them that we end up ignoring, so something's not working.

I'm not sure what to do, because your new code looks ok to me. It's just not giving the same results as the old. We don't want to let the perfect be the enemy of the good though and it is an improvement.

Maybe it would work better if we represented decimal numbers in base-10 instead (e.g. using this crate; though when I tried it, I was still getting some seemingly spurious mismatches).

If you'd like to fiddle with this more to see if we can yeet the old code, then that would be great, but otherwise I'm happy to merge this and open an issue about this problem. What are you feeling?

alexdewar · 2026-05-20T13:03:17Z

+                for idx in 0..paired_len {
+                    let old_idx = old_start + idx;
+                    let new_idx = new_start + idx;
+                    if !compare_line(&lines1[old_idx], &lines2[new_idx]) {


This extra comparison shouldn't be necessary 😞. The diff algo thinks they're different and if compare_line says otherwise, then it's not doing the same comparison that we're getting with the Eq implementation for CSVLine, which is a problem.

alexdewar · 2026-05-20T13:30:19Z

+    let scaled = value / FLOAT_CMP_TOLERANCE;
+    let quantised = if scaled.is_finite() {
+        scaled.round() * FLOAT_CMP_TOLERANCE


@jamesturner246 As the resident floating-point guru, does this seem like an ok way to round a number to the nearest FLOAT_CMP_TOLERANCE? Numbers need to come out exactly the same if they are within FLOAT_CMP_TOLERANCE of one another, because they need to have the same hash.

I know it's not be the answer you want, but I'd go with the approx equals routes.

Absolute tolerance: abs(a-b) <= abs_tol.
Relative tolerance: abs(a-b) <= rel_tol * max(abs(a), abs(b)).

The issue with this code is that

there's the risk of overflow if FLOAT_CMP_TOLERANCE is small

given scaled will be large, the rounding will be courser, distorting the calculation

It seems rust's round has weird nonstandard behaviour: it rounds away from zero, rather than rounding to nearest, ties to even LSB required by IEEE-754

@Aurashk It seems like this approach to quantisation isn't going to work because... floating-point numbers 😞.

Are you happy to have another go at this using the rust_decimal crate instead? That should give us proper, consistent rounding, so that we don't end up with two CsvLines within the tolerance giving different hashes (due to FP rounding errors), which is what we have currently. You may have to tweak the tolerance slightly or regenerate some test files to get things to pass because it is slightly different from the current tolerance checking we're doing, but that's ok.

NB: You can use different rounding strategies with this crate (see here). Dunno if this is something we'll want to change or whether the default MidpointNearestEven is fine.

Btw I have verified that we do see these rounding errors with our actual data, e.g. the missing_commodity test fails with:

-L608: 2040,10,BIOPRD,autumn.peak,25.852906323049822 CsvLine([Float(2040.0), Float(10.0), Text("BIOPRD"), Text("autumn.peak"), Float(25.852906323000003)]) +L608: 2040,10,BIOPRD,autumn.peak,25.852906323050078 CsvLine([Float(2040.0), Float(10.0), Text("BIOPRD"), Text("autumn.peak"), Float(25.8529063231)])

I tried another thing too which seems to work and is much simpler, see if you are happy with this. If not I can have a go with the decimal crate.

filter through each line with existing floating point tolerance first - if they meet the tolerance make the lines exactly equal

feed everything to the diffing library
I appreciate it feels a bit wasteful going through twice, but there seems to be no palpable performance change. If you want to try it just put this in your regression.rs

regression.txt

As @AdrianDAlessandro suggested, this would be a lot faster if we only do this extra work in the case that we already know the files are different, because currently it's comparing every line of one file with every line in another using our probably quite slow comparison method, so when comparing two files that are 1000 lines long, you'd get 1m comparisons, rather than 1000.

This still isn't quite correct though. If you have two identical lines in your CSV file and one of them is deleted, it won't appear in the diff, because it will find a match somewhere else in the file. For it to be totally right, the diffing algorithm has to be the single source of truth. That said, I suspect it will probably work fine in practice, at least most of the time.

You probably wouldn't need to change much to use the decimal type I mentioned though. It would mostly be a case of swapping out the NotNan for a Decimal and deleting the old comparison code.

Sorry for the slow response, managed to have another look at this now.

I fiddled around with the rust_decimal crate for a bit, but it still seems to have similar problems to what we were facing before. My understanding now (could be wrong) is that however you convert/round/truncate the floating points you will always lose some information, so it's never going to be as good as comparing floating points directly. Even if you tweak the tolerance, the diff will still diverge from that tolerance in practice. I feel like we would need to use fixed point calculations throughout for that to work properly.

I applied the suggestion to only do floating point replacements if there is a mismatch between files and managed to improve on the other approach a bit by binning the 'equivalent' lines in maps before replacing instead of doing a brute force check-and-replace of each line. This seems to work okay. I did a stress test by turning down the tolerance to 1e-14 and it's still rendering all the diffs in a few seconds. Let me know what you think.

Note that, because of the binning, if there are two lines which are exact duplicates within the tolerance, it is possible for the reported diff to contain a slightly different line (within tolerance) to what is actually outputted to the file, although I'm not sure this is going to matter much in practice, I added a warning so it doesn't confuse people.

Nw! I'm off in a mo though so won't have a chance to look at this again.

The thing about using a base-10 type is that while it does still have some of the other problems associated with binary floating-point numbers, you should be able to represent the rounded version exactly.

E.g. in a file, you have 1.23456789, but we only care about the first two DPs. If you represent it with the Decimal type you should be able to represent the original number exactly and also the rounded version (1.23). So every time you want to compare a number that rounds to 1.23, you can do so safely, because any number that rounds to 1.23 will always be represented in the same way -- byte for byte. This also means the hashes of any two of these values will also be equal, which is another property we need. You can't round to two decimal places exactly using a regular binary floating-point number in the same way.

There might be something I'm missing... anyway, shall we park this one until I get back?

I agree with what you are saying - once you are in fixed precision land everything will work. It just doesn't seem to fit the problem when you try it though.

The rounding from floating point causes problems, for example:

0.100000005 0.1000000050601934845

rounded to 10 dp, come out to different numbers

0.1000000050 0.1000000051

even though they are only 6.0193475e-11 apart. We could round everything to (for instance) 10dp, make the claim that the regression tests enforce a similarity of 9dp, and turn down the dp until the regression tests pass, but I'm not really a fan of that. All our other tolerances are defined using floating point differences. I feel like if you're using fixed precision anywhere it should ideally be used everywhere.

In reference to what James says above too, it would be even better to use a relative floating point tolerance as well as an absolute tolerance, as 1e-10 could be insignificant between two values that are orders of magnitude apart

No problem, no rush to get back to me!

add more robust testing and more cases for csv diff

add similar library to render diffs from regression test results

38eb020

Copilot AI review requested due to automatic review settings May 19, 2026 12:17

Copilot started reviewing on behalf of Aurashk May 19, 2026 12:18 View session

Aurashk requested review from AdrianDAlessandro, alexdewar, mjademitchell and tsmbland May 19, 2026 12:18

Copilot AI reviewed May 19, 2026

View reviewed changes

Comment thread tests/regression.rs Outdated

fn render_diff(lines1: &[String], lines2: &[String]) -> String {

let text1 = lines1.join("\n");

let text2 = lines2.join("\n");

let diff = TextDiff::from_lines(&text1, &text2);

alexdewar requested changes May 19, 2026

View reviewed changes

Aurashk added 3 commits May 19, 2026 16:01

Make diffing of regression tests less spammy when most lines are with…

50ad023

…in floating precision add tests that check diffs rendered are as expected

add test for quantisation edge case

797fdf8

add colours to diffing render

3d5b4a7

Aurashk requested a review from alexdewar May 20, 2026 10:47

alexdewar reviewed May 20, 2026

View reviewed changes

add binning method for floating comparisons

c9da3d5

add more robust testing and more cases for csv diff

Aurashk requested a review from alexdewar May 28, 2026 13:41

move testing helpers to test module

3bc9bf4

Conversation

Aurashk commented May 19, 2026

Description

Type of change

Key checklist

Further checks

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

codecov Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

alexdewar left a comment

Choose a reason for hiding this comment

Uh oh!

alexdewar commented May 19, 2026

Uh oh!

alexdewar left a comment

Choose a reason for hiding this comment

Uh oh!

alexdewar May 20, 2026

Choose a reason for hiding this comment

Uh oh!

alexdewar May 20, 2026

Choose a reason for hiding this comment

Uh oh!

jamesturner246 May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexdewar May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Aurashk May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexdewar May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Aurashk May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alexdewar May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Aurashk May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov Bot commented May 19, 2026 •

edited

Loading

jamesturner246 May 20, 2026 •

edited

Loading

Aurashk May 21, 2026 •

edited

Loading

Aurashk May 28, 2026 •

edited

Loading

Aurashk May 28, 2026 •

edited

Loading