Skip to content

Fix : Don't fail entire reconcile when individual OCI tags are malformed#32

Merged
anton-paulovich merged 3 commits into
masterfrom
fix/error-handling-per-image
Jun 15, 2026
Merged

Fix : Don't fail entire reconcile when individual OCI tags are malformed#32
anton-paulovich merged 3 commits into
masterfrom
fix/error-handling-per-image

Conversation

@anton-paulovich

@anton-paulovich anton-paulovich commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

When cloud-profile-sync fetched image versions from the OCI registry, any tag missing the architecture annotation caused the entire reconcile to fail with:

2026-06-15T08:19:43.496Z	ERROR	Reconciler error	{"controller": "managedcloudprofile", "controllerGroup": "cloudprofilesync.cobaltcore.dev", "controllerKind": "ManagedCloudProfile", "ManagedCloudProfile": {"name":"ironcore-metal"}, "namespace": "", "name": "ironcore-metal", "reconcileID": "6043acb1-ff5e-4a35-aa12-72b18b7a8356", "error": "failed to create or patch CloudProfile: updating machine images failed: failed to retrieve image versions from OCI registry: architecture annotation not found in descriptor. tag: 1877.19.2"}

This happened because GetVersions returned a non-nil error even though it had already collected all valid images — the bad tag was skipped in the results slice but its error was still joined and returned.
The caller in ImageUpdater.Update treated any non-nil error as fatal, aborting the reconcile and leaving the CloudProfile unchanged indefinitely

The same problem applied to fetch and decode failures for individual tags.

Fix

  • FetchReference failures, JSON decode failures, and missing architecture annotations are now logged at verbosity level 1 and skipped — the goroutine still sends a result on the channel (required to unblock the collector) but the collector discards it
  • Only sema.Acquire errors (context cancellation / shutdown) are still propagated as hard errors
  • Added a logger field to OCI / NewOCI so skip events can be traced to the specific tag

Summary by CodeRabbit

  • Bug Fixes

    • Improved OCI artifact handling: tags/manifests with incomplete metadata are now skipped instead of failing the overall operation.
    • Updated error behavior so skipped artifacts don’t block results; if all tags are skipped, the operation reports that outcome.
    • Enhanced visibility with improved logging during OCI processing and reconciliation.
  • Tests

    • Extended OCI source tests to verify tags lacking required metadata are skipped and remaining images are still returned.
    • Updated controller tests to accommodate logger propagation.

@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@anton-paulovich, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes and 45 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d6a8f43d-f938-45a9-a2b5-bbde3ad6a55c

📥 Commits

Reviewing files that changed from the base of the PR and between 067686f and 8453600.

📒 Files selected for processing (1)
  • cloudprofilesync/source.go
📝 Walkthrough

Walkthrough

Adds logr.Logger injection to the OCI source struct and NewOCI constructor. GetVersions now wraps per-tag manifest processing errors with context, logs them at verbosity level 1, and skips those tags instead of propagating errors; only context-cancellation errors are returned. The OCISourceFactory interface and controller call site are updated to thread the logger through.

Changes

OCI Source Logger Injection and Skip-on-Error

Layer / File(s) Summary
OCI struct, constructor, and skip-on-error behavior
cloudprofilesync/source.go
Adds log logr.Logger field to OCI struct and NewOCI parameter, wires it into the constructor, wraps per-tag manifest fetch/decode failures and missing annotation errors with context, logs skipped tags once at verbosity 1, appends only successful results, and returns error when all tags are skipped.
Source tests: logr.Discard() and new skip test
cloudprofilesync/source_test.go
Passes logr.Discard() to all existing NewOCI calls and adds a new test that pushes one annotated and one unannotated OCI index, asserting GetVersions returns only the annotated version.
OCISourceFactory interface and controller wiring
controllers/managedcloudprofile_controller.go, controllers/managedcloudprofile_controller_test.go
Extends OCISourceFactory.Create interface and DefaultOCISourceFactory.Create to accept log logr.Logger, forwards it to cloudprofilesync.NewOCI, passes the reconciler logger at the call site, and updates test doubles to match the new signature.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • cobaltcore-dev/cloud-profile-sync#15: Both PRs modify cloudprofilesync/source.go's GetVersions flow—main PR changes per-tag error skipping/aggregation and logger plumbing, while retrieved PR changes what GetVersions emits by removing CreatedAt parsing from SourceImage.

Suggested reviewers

  • adziauho

Poem

🐇 A logger hops in, sleek and new,
No more errors when manifests go askew!
With V(1) I note the missing tag,
Then skip it along without a snag.
Empty versions? Left behind in the hutch —
The rabbit says: log softly, skip much! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: preventing reconcile failure when individual OCI tags are malformed, which is the core objective of the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/error-handling-per-image

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cloudprofilesync/source.go (1)

137-157: ⚠️ Potential issue | 🔴 Critical

Error handling is inconsistent and causes errors to be silently lost.

The three error paths at lines 139–141, 149–151, and 155–157 send empty Result[SourceImage]{} with err=nil, while the semaphore error at line 133 correctly sends Result[SourceImage]{err: err}.

The aggregation loop (lines 188–202) collects errors into a slice only when result.err != nil. Since the manifest fetch, JSON decode, and annotation errors all send nil-error results, these errors are never accumulated and are returned to the caller as successful results with no errors, masking the failures. The filtering at line 195 skips empty results silently without recording why they were skipped.

For consistency with the semaphore error handling and to preserve error information, these error paths should send Result[SourceImage]{err: err} so errors are properly accumulated and returned to the caller.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cloudprofilesync/source.go` around lines 137 - 157, Replace the three error
handling paths in the FetchReference, JSON decode, and architecture annotation
cases with Result[SourceImage]{err: err} instead of empty Result[SourceImage]{}
to ensure errors are properly accumulated and returned to the caller. Currently
sending empty results with nil errors causes actual errors to be silently lost
during aggregation, making all three error cases appear as successful
operations. For the architecture annotation case where there is no error
variable yet, you will need to create an appropriate error that can be included
in the Result to maintain consistency with the other error paths and the
semaphore error handling pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cloudprofilesync/source_test.go`:
- Line 209: Remove the extra blank line at line 209 in the source_test.go file
to fix the gofmt formatting failure. The Go formatter (gofmt) requires proper
spacing conventions, and this extra blank line violates those standards. Delete
the unnecessary blank line at the specified location to ensure the code passes
gofmt validation.

---

Outside diff comments:
In `@cloudprofilesync/source.go`:
- Around line 137-157: Replace the three error handling paths in the
FetchReference, JSON decode, and architecture annotation cases with
Result[SourceImage]{err: err} instead of empty Result[SourceImage]{} to ensure
errors are properly accumulated and returned to the caller. Currently sending
empty results with nil errors causes actual errors to be silently lost during
aggregation, making all three error cases appear as successful operations. For
the architecture annotation case where there is no error variable yet, you will
need to create an appropriate error that can be included in the Result to
maintain consistency with the other error paths and the semaphore error handling
pattern.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: db34c152-54fd-4c5a-a667-59158b260436

📥 Commits

Reviewing files that changed from the base of the PR and between 44d1784 and 90658e4.

📒 Files selected for processing (4)
  • cloudprofilesync/source.go
  • cloudprofilesync/source_test.go
  • controllers/managedcloudprofile_controller.go
  • controllers/managedcloudprofile_controller_test.go

Comment thread cloudprofilesync/source_test.go Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents ManagedCloudProfile reconciliation from failing when individual OCI tags are malformed or otherwise unreadable, by skipping those tags (with debug-level logging) rather than returning a fatal error from GetVersions.

Changes:

  • Extend OCI source construction to accept a logr.Logger (plumbed via OCISourceFactory) so skip events can be attributed and traced.
  • Update OCI.GetVersions to log-and-skip per-tag fetch/decode/missing-architecture issues, while still propagating hard semaphore/context errors.
  • Add a unit test ensuring tags missing the architecture annotation are skipped and valid tags are still returned.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
controllers/managedcloudprofile_controller.go Updates OCISourceFactory and passes the reconciler logger into NewOCI so skip events are logged with reconcile context.
controllers/managedcloudprofile_controller_test.go Adapts test doubles to the updated OCISourceFactory.Create signature (adds logr.Logger param).
cloudprofilesync/source.go Adds logger to OCI and changes GetVersions to skip malformed/unreadable tags instead of failing the whole call.
cloudprofilesync/source_test.go Updates NewOCI calls and adds a test for skipping tags without the architecture annotation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cloudprofilesync/source.go
Comment thread cloudprofilesync/source.go
Comment thread cloudprofilesync/source.go Outdated
Comment thread cloudprofilesync/source.go
Co-authored-by: Dmitri Fedotov <13087245+defo89@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread cloudprofilesync/source.go
Comment thread cloudprofilesync/source.go Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cloudprofilesync/source.go`:
- Around line 187-193: The error classification logic at the conditional `if
ctx.Err() != nil` checks the context state at aggregation time rather than
determining if the error itself is context-related, creating a race condition
where non-context errors could be misclassified if the context is cancelled
during result processing. Fix this by using `errors.Is()` to check if
`result.err` is wrapping `context.Canceled` or `context.DeadlineExceeded`
instead of checking the current state of `ctx.Err()`. This ensures errors are
classified based on their actual type rather than timing.
- Around line 200-203: The early return condition checking if len(images) == 0
&& len(tags) > 0 discards context cancellation errors and returns a misleading
"possible registry issue" message instead. Before returning the generic error,
check if there are actual hard errors in the errs slice and propagate those
first. Restructure the logic so that if errors exist (particularly context
cancellation errors), those are returned to the caller rather than being masked
by the tags/images check. This ensures context cancellation errors are properly
surfaced for handling instead of being hidden behind a generic registry error
message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 79878b40-7896-41ce-86bc-68737d11583a

📥 Commits

Reviewing files that changed from the base of the PR and between 90658e4 and 067686f.

📒 Files selected for processing (4)
  • cloudprofilesync/source.go
  • cloudprofilesync/source_test.go
  • controllers/managedcloudprofile_controller.go
  • controllers/managedcloudprofile_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • controllers/managedcloudprofile_controller.go
  • controllers/managedcloudprofile_controller_test.go
  • cloudprofilesync/source_test.go

Comment thread cloudprofilesync/source.go
Comment thread cloudprofilesync/source.go Outdated
@anton-paulovich anton-paulovich force-pushed the fix/error-handling-per-image branch from 067686f to 8453600 Compare June 15, 2026 10:43
@github-actions

Copy link
Copy Markdown

Merging this branch will increase overall coverage

Impacted Packages Coverage Δ 🤖
github.com/cobaltcore-dev/cloud-profile-sync/cloudprofilesync 88.54% (+1.87%) 👍
github.com/cobaltcore-dev/cloud-profile-sync/controllers 58.54% (ø)

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/cobaltcore-dev/cloud-profile-sync/cloudprofilesync/source.go 87.84% (+4.26%) 74 (+7) 65 (+9) 9 (-2) 👍
github.com/cobaltcore-dev/cloud-profile-sync/controllers/managedcloudprofile_controller.go 58.54% (ø) 205 120 85

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/cobaltcore-dev/cloud-profile-sync/cloudprofilesync/source_test.go
  • github.com/cobaltcore-dev/cloud-profile-sync/controllers/managedcloudprofile_controller_test.go

@anton-paulovich anton-paulovich merged commit 01b6df2 into master Jun 15, 2026
7 checks passed
@anton-paulovich anton-paulovich deleted the fix/error-handling-per-image branch June 15, 2026 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants