Skip to content

Fix Algorithm deepCopy subtype preservation#453

Open
Abhi190702 wants to merge 1 commit into
cbomkit:mainfrom
Abhi190702:codex/fix-algorithm-deepcopy-subtypes
Open

Fix Algorithm deepCopy subtype preservation#453
Abhi190702 wants to merge 1 commit into
cbomkit:mainfrom
Abhi190702:codex/fix-algorithm-deepcopy-subtypes

Conversation

@Abhi190702
Copy link
Copy Markdown

@Abhi190702 Abhi190702 commented May 24, 2026

Fixes #127

What changed

  • Added a metadata-only copy hook to Algorithm.deepCopy() so recursive child copying stays centralized while concrete algorithm subtype identity is preserved.
  • Added type-specific copy constructors and shallowCopy() overrides across Algorithm subclasses, including indirect subclasses such as Ascon128.
  • Updated RSA to use the shared Algorithm.deepCopy() path instead of duplicating the child-copy loop. This also intentionally preserves origin; the previous custom RSA path used the 3-argument Algorithm constructor and silently reset origin to the default.
  • Defensively copy the children map in the multi-argument asKind constructor so recasting cannot mutate the source map. Child INode instances remain shared unless callers use deepCopy().
  • Added regression coverage for representative subtype preservation, recursive child-copy isolation, metadata preservation, enricher recasting behavior, and all direct Algorithm subclasses discovered on the classpath.

Why

Algorithm.deepCopy() previously instantiated new Algorithm(this), so calling deepCopy() on AES and other algorithm subclasses returned a plain Algorithm. That broke downstream instanceof logic, including enricher behavior and CBOM output decisions.

Validation

  • mvn -pl mapper,enricher spotless:apply
  • mvn -pl mapper -am -Dexec.skip=true -Dtest=AlgorithmDeepCopyTest -Dsurefire.failIfNoSpecifiedTests=false test (154 AlgorithmDeepCopyTest cases, including 151 direct algorithm subclasses)
  • mvn -pl mapper,enricher -am -Dexec.skip=true test (mapper/common/engine/enricher pass; mapper 197 tests with 1 existing skip, enricher 28 tests)

Note: -Dexec.skip=true is needed on Windows because mapper's compile phase otherwise tries to execute download-cipher-suites.sh directly and fails with %1 is not a valid Win32 application.

@Abhi190702 Abhi190702 changed the title [codex] Fix Algorithm deepCopy subtype preservation Fix Algorithm deepCopy subtype preservation May 24, 2026
Copy link
Copy Markdown
Contributor

@n1ckl0sk0rtge n1ckl0sk0rtge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Thanks for tackling #127 — the fix correctly preserves subtype identity, all 174 algorithm subclasses + 5 intermediate classes are covered, RSA is migrated to the shared path, and 3-level inheritance (Ascon128 → Ascon → Algorithm) works. No architectural, security, performance, or consumer-breaking concerns. Requesting changes on a couple of points before merge.

Coverage check — verified against the diff:

  • Architecture: clean, all changes confined to mapper model + 1 enricher test
  • All 155 transitive Algorithm subclasses + 23 indirect descendants are touched
  • Intermediate classes (Ascon, BLAKE, Elephant, Grain, IES, Isap, DSA, EllipticCurveAlgorithm, PBES1/2, PKCS12PBE) correctly use protected copy ctors
  • No scope creep; no behavior change for consumers besides the (intended) defensive children-map copy in the multi-arg "asKind" constructor

Must fix

1. DCO check failing — the commit needs a Signed-off-by trailer. Please amend with git commit --amend --signoff (or git rebase --signoff main) and force-push.


Should fix

2. AlgorithmDeepCopyTest — RSA and Ascon128 cases don't actually pin down the new behavior

Both tests only check isInstanceOf(...) + hasChildOfType(...).isPresent(). The AES case goes further and asserts .isNotSameAs(...) on the child KeyLength, which is what proves the recursive child.deepCopy() loop ran. RSA is the most interesting case here because it lost its custom deepCopy() override — that's exactly the regression case where you want a child-independence assertion. Same for Ascon128 (the 3-level case).

Suggested additions:

// In deepCopyPreservesExistingSpecializedSubclass()
assertThat(copy.hasChildOfType(KeyLength.class).get())
        .isNotSameAs(rsa.hasChildOfType(KeyLength.class).get());

// In deepCopyPreservesIndirectAlgorithmSubclass()
assertThat(copy.hasChildOfType(TagLength.class).get())
        .isNotSameAs(ascon128.hasChildOfType(TagLength.class).get());

3. No field-equality assertion on the copied metadata

Nothing in the new tests verifies that name, kind, detectionLocation, and especially origin are actually copied. The origin field matters because RSA's old copy path went through super(rsa.name, rsa.kind, rsa.detectionLocation) (the 3-arg ctor) — that path did NOT propagate origin. The new path uses super(rsa) (protected copy ctor) which does. This is a behavior improvement, but it's only locked in if a test asserts it. A couple of lines per test would do it.

4. shallowCopy() hook has a silent-failure footgun for future subclasses

mapper/src/main/java/com/ibm/mapper/model/Algorithm.java — the base shallowCopy() returns new Algorithm(this). Every current subclass overrides it correctly. But if a future subclass author forgets to override, deepCopy() silently returns a plain Algorithm — reintroducing #127 for that family with no compile error and no test failure unless someone remembers to add an assertion.

Two mitigations worth considering:

  • JavaDoc on shallowCopy() stating: "Subclasses MUST override this to return their concrete type; otherwise deepCopy() will lose subtype identity."
  • A single reflection-driven parametrized test that iterates all Algorithm subclasses on the classpath and asserts instance.deepCopy().getClass() == instance.getClass(). ~30 lines, covers every existing and future subclass.

Suggestions

5. Method naming — shallowCopy() is a bit misleading
It doesn't even copy children shallowly — it returns a same-type instance with an empty children map. newEmptyOfSameType() or copyMetadataOnly() would describe the contract more honestly. Not a blocker; a JavaDoc clarification is enough if renaming across 174 files is too invasive.

6. Mention RSA's origin-field fix in the PR description
The removal of RSA's custom deepCopy() override is a subtle behavior improvement (previously the origin field was dropped via the 3-arg super call). Worth a line in the PR body so downstream consumers aren't surprised.

7. Document the partial defensive-copy semantics
The multi-arg "asKind" ctor's new HashMap<>(algorithm.getChildren()) creates a new map but shares INode references. The AESEnricherTest exercises removal-by-key isolation, which works, but mutation of a child's nested state would still be visible on both sides. Worth a one-line comment next to the assignment: // Children INodes themselves are aliased; deep isolation requires deepCopy().


What I like

  • Linked-issue scope is exact; PR addresses every acceptance criterion in #127.
  • The shallowCopy() template-method approach eliminates the children-loop duplication that the issue's literal suggestion would have spread across 174 files. (Worth noting the alternative — overriding deepCopy() per subclass — would also fix the bug and is simpler conceptually; the DRY win here is real but the trade-off is the footgun in finding #4.)
  • Intermediate-class visibility (protected on Ascon/BLAKE/IES/etc., private on leaves) is correct throughout.
  • The AESEnricherTest addition validates the real-world enricher pipeline scenario the issue reporter called out.

@Abhi190702 Abhi190702 force-pushed the codex/fix-algorithm-deepcopy-subtypes branch from 77c5695 to eeceece Compare May 29, 2026 17:32
@Abhi190702
Copy link
Copy Markdown
Author

Thanks for the detailed review!

I addressed the requested follow-ups:

Strengthened the RSA and Ascon128 deepCopy regression tests to assert that copied child nodes are separate instances from the originals.
Added metadata preservation assertions for name, kind, detection location, and origin.
Added JavaDoc documenting the shallowCopy() contract for future subclasses.
Added a comment clarifying that the recast constructor performs a map-level defensive copy while child INode instances remain shared.
I also amended the commit with --signoff to fix the DCO failure.

Validation:

mvn -pl mapper,enricher spotless:apply
mvn -pl mapper,enricher -am -Dexec.skip=true -Dtest=AlgorithmDeepCopyTest,AESEnricherTest -Dsurefire.failIfNoSpecifiedTests=false test
mvn -pl mapper,enricher -am -Dexec.skip=true test

Copy link
Copy Markdown
Contributor

@n1ckl0sk0rtge n1ckl0sk0rtge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update — the fix is correct and complete, with great test coverage for direct, indirect, and previously-customized subclasses. Two items I'd like addressed before merge:

1. Add a reflection-based parametrized test covering every direct Algorithm subclass

shallowCopy() has a silent-fallback failure mode: any future subclass that forgets to override it will degrade deepCopy() back to plain Algorithm, recreating exactly the bug this PR fixes. The current AlgorithmDeepCopyTest covers three representative cases (AES, RSA, Ascon128), but doesn't prevent drift as new algorithms are added.

Please add a parametrized test in mapper/src/test/java/com/ibm/mapper/model/AlgorithmDeepCopyTest.java that iterates every subclass of Algorithm under com.ibm.mapper.model.algorithms (via reflection / classpath scan), instantiates each, and asserts instance.deepCopy().getClass() == instance.getClass(). This was also flagged in the earlier review round.

2. Clarify the comment on Algorithm.java:37

// Child INode instances are intentionally shared here; use deepCopy() for deep isolation.
this.children = new HashMap<>(algorithm.getChildren());

The comment is technically correct but doesn't explain what changed. The real fix here is that the map container is now defensively copied (was previously aliased via algorithm.getChildren()), while child INode references remain shared. That subtlety matters because some reorganization paths previously relied on the aliasing.

Suggested rewording:

// The children map is defensively copied so callers (e.g. asKind reorganization)
// cannot mutate the source. Child INode instances remain shared — use deepCopy()
// for full isolation.

Nice quiet bonus in RSA.java:96: switching super(rsa.name, rsa.kind, rsa.detectionLocation)super(rsa) also preserves the origin field that the old path silently dropped. Worth mentioning in the PR body so future readers know it's intentional.

Signed-off-by: ABHIJEET RANJAN <abhijeet.r1907@gmail.com>
@Abhi190702 Abhi190702 force-pushed the codex/fix-algorithm-deepcopy-subtypes branch from eeceece to 609b8a0 Compare June 1, 2026 14:00
@Abhi190702 Abhi190702 marked this pull request as ready for review June 1, 2026 14:01
@Abhi190702 Abhi190702 requested a review from a team as a code owner June 1, 2026 14:01
@Abhi190702
Copy link
Copy Markdown
Author

Thanks again for the detailed review.

I addressed both requested follow-ups from the latest review.

1. Reflection-based coverage for direct Algorithm subclasses

I added a parameterized test in AlgorithmDeepCopyTest that scans compiled classes under com.ibm.mapper.model.algorithms, filters direct subclasses of Algorithm, instantiates them, and verifies that:

instance.deepCopy().getClass() == instance.getClass()

This is intended to guard against the silent shallowCopy() fallback issue you pointed out. If a future direct Algorithm subclass forgets to override shallowCopy(), this test should now fail instead of allowing deepCopy() to silently return a plain Algorithm.

The focused test now covers 154 cases total, including 151 direct Algorithm subclass reflection cases.

2. Clarified defensive-copy semantics in Algorithm

I updated the comment next to the new HashMap<>(algorithm.getChildren()) assignment to make the behavior explicit:

  • the children map container is defensively copied
  • callers such as asKind reorganization cannot mutate the source map
  • child INode instances remain shared
  • callers should use deepCopy() when full isolation is required

I also updated the PR body to explicitly mention the RSA origin preservation behavior. The previous RSA custom copy path went through the 3-argument Algorithm constructor and could drop origin; the new shared path uses super(rsa), so preserving origin is intentional.

Validation

I reran:

  • mvn -pl mapper,enricher spotless:apply
  • mvn -pl mapper -am -Dexec.skip=true -Dtest=AlgorithmDeepCopyTest -Dsurefire.failIfNoSpecifiedTests=false test
    • 154 AlgorithmDeepCopyTest cases passed
    • includes 151 direct Algorithm subclass reflection cases
  • mvn -pl mapper,enricher -am -Dexec.skip=true test
    • mapper/common/engine/enricher passed
    • mapper: 197 tests, 1 existing skip
    • enricher: 28 tests

I amended the existing signed-off commit and force-pushed the updated branch.
If there is something More , I would be happy to Look into it !!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

deepCopy does not create an instance of the specific class

2 participants