
backend,disk: make partition ops work on busy block devices #411

Merged: deitch merged 4 commits into diskfs:master from eriknordmark:block-device-partition-ops on Jun 18, 2026

@eriknordmark eriknordmark commented Jun 17, 2026

Contributor

This PR makes two fixes needed to run partition tools (e2fsck/resize2fs) against the
partitions of a live block device. The file-based tests cannot reach either
path, so both were found by running a downstream resizer against a real
/dev/nbd0.

  • file.OpenFromPathWithExclusive lets a caller open a device read-write
    without O_EXCL while still recording the path (OpenFromPath keeps its
    O_EXCL default). Holding the whole disk O_EXCL makes the kernel reject
    O_EXCL opens of its child partitions ("device is in use"), so a tool that
    shells out to e2fsck/resize2fs cannot hold the disk exclusively.

  • ReReadPartitionTable falls back to per-partition BLKPG
    reconciliation when BLKRRPART fails (EBUSY during the udev re-probe that a
    table write triggers, or a mounted sibling partition). BLKPG adds, removes,
    or recreates only the changed entries via ioctls, with no whole-disk exclusive
    re-read and no dependency on external tools (partx).
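
The per-partition reconciliation in the second bullet can be pictured as a diff between the kernel's current view of the partitions and the table that was just written. The sketch below is illustrative only and is not the go-diskfs implementation: the `Part` and `Op` types and the `reconcile` function are hypothetical names, and it assumes a changed entry is recreated as a delete followed by an add. A real implementation would then issue the Linux BLKPG ioctl for each planned operation.

```go
package main

import (
	"fmt"
	"sort"
)

// Part is a hypothetical description of one partition, in bytes.
type Part struct {
	Start, Length int64
}

// Op is a hypothetical planned action for one partition number.
type Op struct {
	Action string // "add" or "delete"
	Number int
	Part   Part
}

// reconcile compares the kernel's current view with the table just written
// and returns operations only for entries that differ. Unchanged entries
// (for example a mounted sibling) produce no operation at all.
func reconcile(kernel, table map[int]Part) []Op {
	// Collect every partition number seen on either side, in sorted order.
	seen := map[int]struct{}{}
	for n := range kernel {
		seen[n] = struct{}{}
	}
	for n := range table {
		seen[n] = struct{}{}
	}
	nums := make([]int, 0, len(seen))
	for n := range seen {
		nums = append(nums, n)
	}
	sort.Ints(nums)

	var ops []Op
	for _, n := range nums {
		old, inKernel := kernel[n]
		want, inTable := table[n]
		switch {
		case inKernel && inTable && old == want:
			// Identical on both sides: leave the partition untouched.
		case inKernel && inTable:
			// Changed: recreate as a delete followed by an add.
			ops = append(ops, Op{"delete", n, old}, Op{"add", n, want})
		case inKernel:
			ops = append(ops, Op{"delete", n, old})
		default:
			ops = append(ops, Op{"add", n, want})
		}
	}
	return ops
}

func main() {
	// Partition 1 is unchanged; partition 2 is shrunk from 64 MiB to 32 MiB.
	kernel := map[int]Part{1: {1 << 20, 64 << 20}, 2: {65 << 20, 64 << 20}}
	table := map[int]Part{1: {1 << 20, 64 << 20}, 2: {65 << 20, 32 << 20}}
	for _, op := range reconcile(kernel, table) {
		fmt.Printf("%s partition %d: start=%d length=%d\n",
			op.Action, op.Number, op.Part.Start, op.Part.Length)
	}
}
```

In this example only partition 2 yields operations (a delete, then an add with the new length), which mirrors the point that a busy sibling never needs to be touched.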

On a regular file O_EXCL is a no-op and there are no partition nodes to
re-read, which is why the existing file-based tests never exercise either
path. Two root-gated loop-device tests cover the fixes:

  • disk/disk_blockdev_linux_test.go exercises the BLKPG fallback: it mounts
    one partition to force BLKRRPART EBUSY, shrinks another, and verifies the
    kernel picks up the change while the mounted sibling is left untouched.

  • disk/disk_openexclusive_linux_test.go exercises the non-exclusive open: a
    parent held O_EXCL blocks an O_EXCL open of its child partition (the
    e2fsck/resize2fs case) with EBUSY, while a non-exclusive parent leaves
    the child openable.

@eriknordmark eriknordmark marked this pull request as ready for review June 17, 2026 17:39
@eriknordmark

Contributor Author

@deitch could you take a look when you get a chance? This splits the two block-device fixes (non-exclusive whole-disk open + BLKPG re-read fallback) out of my downstream resizer work into go-diskfs, now with a root-gated loop-device regression test for the BLKPG path. Thanks!

@eriknordmark eriknordmark force-pushed the block-device-partition-ops branch from 45a14f8 to da6fccc on June 17, 2026 18:00
eriknordmark and others added 2 commits June 17, 2026 20:56
Two fixes needed to drive e2fsck/resize2fs on the partitions of a live
block device (surfaced running partitionresizer against a real /dev/nbd0,
which the file-based tests cannot exercise):

- file.OpenFromPathWithExclusive lets a caller open a device read-write
  WITHOUT O_EXCL while still recording the path. OpenFromPath keeps its
  O_EXCL default. Holding the whole disk O_EXCL makes the kernel reject
  O_EXCL opens of its child partitions ("device is in use"), so a tool
  that shells out to e2fsck/resize2fs cannot hold the disk exclusively.

- ReReadPartitionTable falls back to per-partition BLKPG reconciliation
  when BLKRRPART fails (EBUSY during the udev re-probe a table write
  triggers, or a mounted sibling partition). BLKPG adds/removes/recreates
  only the changed entries via ioctls, with no whole-disk exclusive
  re-read and no dependency on external tools (partx). Build-verified;
  the partx-based predecessor was validated end-to-end on /dev/nbd0,
  but the BLKPG path still needs the same on-device validation.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Exercise the BLKPG re-read fallback on a real block device, which the
file-based tests structurally cannot reach. The test builds a GPT image,
exposes it via losetup, and mounts one partition so a whole-disk BLKRRPART
is guaranteed to fail EBUSY; it asserts that precondition directly, then
rewrites the table to shrink a different partition and confirms the kernel
picked up the change through per-partition BLKPG while the mounted sibling
was left untouched. The device is opened non-exclusively via
OpenFromPathWithExclusive, the same way a consumer that shells out to
e2fsck/resize2fs must. Requires root; skipped otherwise.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
eriknordmark and others added 2 commits June 18, 2026 15:31
When BLKRRPART fails with EBUSY -- a sibling partition is mounted, i.e.
we are repartitioning the disk we booted from -- and the BLKPG
per-partition fallback also cannot reconcile the in-use partition, the
on-disk table that Partition() already wrote is nonetheless committed.
Wrap that case in the new exported ErrReReadDeferred sentinel so a caller
can errors.Is it and reboot to apply the table on the next boot, instead
of treating a committed-but-not-yet-live table as a hard failure.
Non-EBUSY re-read failures keep returning the existing hard error.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The O_EXCL whole-disk open default makes the kernel refuse the O_EXCL
open that resize2fs/e2fsck do on a child partition, so a consumer that
holds the disk open while running those tools is blocked. The
file.OpenFromPathWithExclusive non-exclusive open fixes this, but only
the BLKPG re-read fallback had block-device coverage.

Add a root-gated loop-device regression test that proves both
directions in one run: a parent held O_EXCL blocks the child's O_EXCL
open with EBUSY, while a non-exclusive parent leaves it openable. A
regular file cannot reach this (O_EXCL is a no-op and there are no
child partition nodes), so the test is skipped without root.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
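
Root gating of this kind usually reduces to a small precondition check at the top of the test. The sketch below is a hypothetical illustration, not the test code from this PR: `skipReason` is an invented helper, and in a real `_test.go` file its result would be passed to `t.Skip`.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// skipReason returns why a loop-device test cannot run in this
// environment, or an empty string when it can.
func skipReason(goos string, euid int) string {
	switch {
	case goos != "linux":
		return "loop devices require Linux"
	case euid != 0:
		return "requires root to set up a loop device"
	}
	return ""
}

func main() {
	// In a test function this would read:
	//   if r := skipReason(runtime.GOOS, os.Geteuid()); r != "" { t.Skip(r) }
	if r := skipReason(runtime.GOOS, os.Geteuid()); r != "" {
		fmt.Println("skip:", r)
		return
	}
	fmt.Println("would run the loop-device test")
}
```

Keeping the check in a pure function makes the skip logic itself easy to verify without root.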
eriknordmark added a commit to eriknordmark/partitionresizer that referenced this pull request Jun 18, 2026
When a partition-table commit reaches the disk but the kernel cannot be
made to re-read it live -- the boot disk is busy because the partition
being changed is mounted, as when repartitioning the disk we booted from
-- go-diskfs now reports disk.ErrReReadDeferred. The table is already on
disk, so translate that sentinel at each commit site into the exported
ErrRebootToApply, letting a caller reboot to apply the table on the next
boot instead of treating a committed-but-not-yet-live table as a failure.

Pin go-diskfs to the fork commit that adds ErrReReadDeferred via a replace
directive; the replace is temporary and will be dropped once
diskfs/go-diskfs#411 merges.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@deitch

deitch commented Jun 18, 2026

Collaborator

You have been pushing commits. Is this one ready to be reviewed?

@eriknordmark

Contributor Author

> You have been pushing commits. Is this one ready to be reviewed?

Yes. No more commits planned.

eriknordmark added a commit to eriknordmark/eve that referenced this pull request Jun 18, 2026
Vendor the go-diskfs and partitionresizer changes the offline repartition
relies on: a non-exclusive block-device open (so the resizer can shell out
to e2fsck/resize2fs on child partitions), per-partition BLKPG re-read when
BLKRRPART is busy, and the ErrReReadDeferred / ErrRebootToApply sentinels
that let a busy boot disk be repartitioned and applied on the next boot.

Both deps are pinned to fork commits via replace, pending the upstream PRs
diskfs/go-diskfs#411 and diskfs/partitionresizer#15.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@deitch

deitch commented Jun 18, 2026

Collaborator

Overall, it looks good. Happy to have the extra test and handle the case. I am a little concerned that this is the first test that requires root to run, but it handles it with the skip correctly. So I am good with it.

@deitch deitch merged commit 2bdff12 into diskfs:master Jun 18, 2026
20 checks passed
eriknordmark added a commit to eriknordmark/partitionresizer that referenced this pull request Jun 19, 2026
Bump github.com/diskfs/go-diskfs to the 2026-06-18 master tip, which
now carries the ErrReReadDeferred change from diskfs/go-diskfs#411, and
remove the temporary `replace => github.com/eriknordmark/go-diskfs`
directive that was added to pin the pre-merge fork commit. The replace
was always meant to be dropped once #411 landed in master; it had
inadvertently stayed in tree. go mod tidy also prunes the stale go.sum
entries left from the fork pin.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
eriknordmark added a commit to eriknordmark/partitionresizer that referenced this pull request Jun 19, 2026
Bump github.com/diskfs/go-diskfs to the 2026-06-18 master tip, which now
carries the ErrReReadDeferred change from diskfs/go-diskfs#411, and remove
the temporary `replace => github.com/eriknordmark/go-diskfs` directive. go
mod tidy also prunes the stale go.sum entries left from the fork pin.

The replace was only ever meant to pin the pre-merge fork commit and be
dropped once #411 landed in master. It was added in diskfs#15, where Claude broke
its own rule: a temporary fork-pinning replace must be dropped before the
PR merges, never carried into main. diskfs#15 merged with it still in tree,
leaving upstream main depending on a personal fork; this commit removes it.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>