e2e: detect and try to fix potential post reboot image blob corruption.#700

Merged
askervin merged 1 commit into
containers:mainfrom
klihub:fixes/e2e/post-reboot-image-blob-corruption
Jun 24, 2026

Conversation

@klihub

@klihub klihub commented Jun 23, 2026

Collaborator

Under certain conditions, a BTRFS bug appears to be triggered where some image blob files become unreadable, with reads failing with EOPNOTSUPP. One known such condition is running on BTRFS with a (recent) self-compiled kernel from the Torvalds git tree.

This is regularly triggered by balloons/n4c16/test30-numa-disabled test case. That failure in turn causes all the remaining test cases running on the same VM (IOW with the same emulated HW topology) to unconditionally fail as, among other things, all container creations fail afterwards. Since this is the last balloons test case for that topology, it causes all topology-aware tests on the same topology to fail.

To work around this, add a function (a rather big hammer) that tries to detect and fix up the runtime when it gets into such a condition. Currently this is only implemented for containerd. CRI-O is a TODO.
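
The detection half of such a check could look roughly like the sketch below. It is not the code from this PR: the function name `find_unreadable_blobs` is hypothetical, and the directory mentioned in the usage note is containerd's default content store location, which is an assumption and not taken from the patch. The idea is simply to read every blob once and report those that fail.

```shell
#!/usr/bin/env bash

# Hypothetical helper, not the function added by this PR.
# Print every entry in directory $1 that cannot be read in full.
# Returns 0 if all entries were readable, 1 otherwise.
find_unreadable_blobs() {
    local dir="$1" blob status=0
    for blob in "$dir"/*; do
        # An unmatched glob stays literal; skip it (empty directory).
        [ -e "$blob" ] || [ -L "$blob" ] || continue
        # Reading the whole file surfaces read errors such as EOPNOTSUPP.
        if ! cat -- "$blob" > /dev/null 2>&1; then
            echo "$blob"
            status=1
        fi
    done
    return "$status"
}
```

On a containerd host with default paths this might be called as `find_unreadable_blobs /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256`. How the reported blobs get repaired is a separate step that this sketch does not cover.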

@klihub klihub requested a review from askervin June 23, 2026 17:15
@klihub klihub force-pushed the fixes/e2e/post-reboot-image-blob-corruption branch 3 times, most recently from 44ca6ec to 10c2ecb Compare June 23, 2026 17:19
@klihub klihub marked this pull request as draft June 24, 2026 05:57
@klihub klihub force-pushed the fixes/e2e/post-reboot-image-blob-corruption branch from 10c2ecb to fdaebba Compare June 24, 2026 07:33
@klihub klihub marked this pull request as ready for review June 24, 2026 07:33
@klihub klihub force-pushed the fixes/e2e/post-reboot-image-blob-corruption branch 2 times, most recently from cc4dcd7 to f367949 Compare June 24, 2026 12:15

@askervin askervin left a comment

Collaborator

Looks nice. Two comments...

Comment thread test/e2e/lib/vm.bash Outdated
@klihub klihub force-pushed the fixes/e2e/post-reboot-image-blob-corruption branch from f367949 to f7a8e18 Compare June 24, 2026 14:01
Under certain conditions rebooting renders some of the image blobs
unreadable, with blob reads failing with EOPNOTSUPP. One condition
known to trigger this bug is running with BTRFS and rebooting to or
from a kernel compiled from Torvalds' 'vanilla' git tree. One test
that regularly triggers this bug is balloons/n4c16/test30-numa-disabled.

Add vm-post-reboot-runtime-check which tries to detect and apply a fix
for this bug. Patch test30 to do a runtime check after both reboots
(transitions between stock and self-compiled kernels).

Currently only implemented for containerd, with cri-o-specific bits
marked and erroring out with a TODO.

Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
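
The per-runtime split described in the commit message can be pictured with a sketch like the following. The function name `runtime_blob_check` and the messages are illustrative assumptions, not the contents of `vm-post-reboot-runtime-check`; only the behavior (containerd handled, CRI-O erroring out with a TODO) comes from the text above.

```shell
# Hypothetical dispatcher; names and messages are illustrative only.
runtime_blob_check() {
    local runtime="$1"
    case "$runtime" in
        containerd)
            # The containerd-specific detection and fix-up would go here.
            echo "checking containerd image blobs"
            ;;
        cri-o|crio)
            # Not implemented: fail loudly instead of silently skipping.
            echo "TODO: post-reboot check is not implemented for $runtime" >&2
            return 1
            ;;
        *)
            echo "unknown runtime: $runtime" >&2
            return 1
            ;;
    esac
}
```

Failing explicitly for the unimplemented runtime keeps a test run from reporting a clean check that never actually happened.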
@klihub klihub force-pushed the fixes/e2e/post-reboot-image-blob-corruption branch from f7a8e18 to 693b66f Compare June 24, 2026 14:02
@klihub klihub requested a review from askervin June 24, 2026 14:02

@askervin askervin left a comment

Collaborator

LGTM

@askervin askervin merged commit 6bd6620 into containers:main Jun 24, 2026
9 checks passed
@klihub klihub deleted the fixes/e2e/post-reboot-image-blob-corruption branch June 25, 2026 04:36
2 participants