Skip to content

ext4: fix extent tree splits for large fragmented files#412

Open
apinonformoso wants to merge 3 commits into
diskfs:masterfrom
digitalocean:fix/ext4-large-file-extent-splits
Open

ext4: fix extent tree splits for large fragmented files#412
apinonformoso wants to merge 3 commits into
diskfs:masterfrom
digitalocean:fix/ext4-large-file-extent-splits

Conversation

@apinonformoso

@apinonformoso apinonformoso commented Jun 24, 2026

Copy link
Copy Markdown

Summary

Fixes bugs in ext4 extent tree growth when leaf or internal nodes split during extendExtentTree. These failures showed up when writing large, fragmented files whose extent trees grow beyond the inode-root limits, for example, copying large binaries from container images such as node:22-slim.

Previously:

  • When a non-root leaf node split, the new leaf nodes were written via writeNodeToDisk without allocated disk blocks, causing:
could not convert extents into tree: block number not found for node
  • When leaf splits increased the number of children under the inode-root internal node past its 4-entry limit, serialization could panic (slice bounds out of range) because the over-capacity root was never split.

This change:

  • Allocates disk blocks and uses writeNodeToBlock when splitting non-root leaf nodes (consistent with the existing root-split path)
  • Splits the inode-root internal node via splitInternalNodeChildren when child count exceeds max
  • Applies the same explicit block allocation to non-root internal node splits
  • Adds regression tests for non-root leaf split, inode-root internal overflow, and an ~80 MiB chunked write scenario

Root cause

writeNodeToDisk resolves a node's disk block by looking it up in parent.children. Nodes created during a split are not yet registered there, so the lookup fails. Root splits already avoided this by allocating blocks explicitly; non-root splits did not.

When a leaf split replaced one child with two under the inode-root internal node, the root could exceed its 4-child limit without being split, producing an invalid tree.

Test plan

  • go test ./filesystem/ext4/ -run '^TestExtendExtentTree' -count=1
  • Verified pre-fix code fails at fileBlock 340 with block number not found for node

@deitch deitch left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I didn't understand

Comment thread filesystem/ext4/extent.go Outdated
} else {
// Write new leaf nodes to the disk using parent reference
err := writeNodeToDisk(firstLeaf, fs, parent)
// Non-root split: allocate disk blocks for the new leaf nodes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section of code looks identical to the root case above. How is it different?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, they are identical now.

The fix was to make the non-root path do the same allocate + writeNodeToBlock as the root path, which ended as two duplicate blocks since I kept the original if/else structure, but it doesn't make sense to leave it like that.

For context, the else branch used writeNodeToDisk, assuming the parent could resolve where to write. That caused a lookup failure with block number not found for node because the new split nodes aren't in parent.children yet (the parent is updated later in extendLeafNode).

I'll collapse this into a single block.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed da4a0be

@apinonformoso apinonformoso requested a review from deitch June 25, 2026 13:51

@deitch deitch left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has less repeated code, which is better. thank you.

I see a logic concern. If I understood it correctly, by removing parent from splitLeafNode(), it no longer knows if this is root or not. Does this not mean that it always creates 2 new nodes? Let's say I am starting with the root, and it grows too big. At that point, it should create two new nodes in blocks (A and B), and the root points to A and B. Now an internal node gets too big, e.g. A. ext4 normally does not create two new nodes, C and D. Instead, it creates one new node (C), and root points to A, C, B. Instead, it looks like we add C and D, the root points to C, D, B and A is lost entirely.

The only time we do need more than 1 node (other than when outgrowing root) is when the parent of any level is full and we need internal nodes. E.g. root has 4 slots, and if we are already at A, B, C, D and need another slot, then (A E B C D), but 5 slots do not fit, so we create E and then internal I1 I2 and then repoint root -> I1 and root ->I1, etc. But that, I believe, is handled correctly.

@apinonformoso

Copy link
Copy Markdown
Author

Good catch, you're right that orphaning the original block isn't correct.

To clarify the mechanics: splitLeafNode always allocated two new blocks and redistributed the extents into them, so the file's data extents are preserved (they get copied into the two halves before the old pointer is replaced), no data loss. But the original block is dropped from the tree and never deallocated, so it leaks, and we allocate two blocks where one would do.

The original node still carries its block in node.diskBlock, so I can reuse it for one half and allocate just one sibling for the other, matching the standard ext4 behavior you described ([A, C, B] instead of [C, D, B]). The inode-root case (node.diskBlock == 0) still allocates two, since there's no block to reuse. This also means I don't need to bring parent back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants