Fix : Don't fail entire reconcile when individual OCI tags are malformed#32
Conversation
|
Warning Review limit reached
More reviews will be available in 48 minutes and 45 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds ChangesOCI Source Logger Injection and Skip-on-Error
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cloudprofilesync/source.go (1)
137-157:⚠️ Potential issue | 🔴 CriticalError handling is inconsistent and causes errors to be silently lost.
The three error paths at lines 139–141, 149–151, and 155–157 send empty
Result[SourceImage]{}witherr=nil, while the semaphore error at line 133 correctly sendsResult[SourceImage]{err: err}.The aggregation loop (lines 188–202) collects errors into a slice only when
result.err != nil. Since the manifest fetch, JSON decode, and annotation errors all send nil-error results, these errors are never accumulated and are returned to the caller as successful results with no errors, masking the failures. The filtering at line 195 skips empty results silently without recording why they were skipped.For consistency with the semaphore error handling and to preserve error information, these error paths should send
Result[SourceImage]{err: err}so errors are properly accumulated and returned to the caller.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cloudprofilesync/source.go` around lines 137 - 157, Replace the three error handling paths in the FetchReference, JSON decode, and architecture annotation cases with Result[SourceImage]{err: err} instead of empty Result[SourceImage]{} to ensure errors are properly accumulated and returned to the caller. Currently sending empty results with nil errors causes actual errors to be silently lost during aggregation, making all three error cases appear as successful operations. For the architecture annotation case where there is no error variable yet, you will need to create an appropriate error that can be included in the Result to maintain consistency with the other error paths and the semaphore error handling pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cloudprofilesync/source_test.go`:
- Line 209: Remove the extra blank line at line 209 in the source_test.go file
to fix the gofmt formatting failure. The Go formatter (gofmt) requires proper
spacing conventions, and this extra blank line violates those standards. Delete
the unnecessary blank line at the specified location to ensure the code passes
gofmt validation.
---
Outside diff comments:
In `@cloudprofilesync/source.go`:
- Around line 137-157: Replace the three error handling paths in the
FetchReference, JSON decode, and architecture annotation cases with
Result[SourceImage]{err: err} instead of empty Result[SourceImage]{} to ensure
errors are properly accumulated and returned to the caller. Currently sending
empty results with nil errors causes actual errors to be silently lost during
aggregation, making all three error cases appear as successful operations. For
the architecture annotation case where there is no error variable yet, you will
need to create an appropriate error that can be included in the Result to
maintain consistency with the other error paths and the semaphore error handling
pattern.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: db34c152-54fd-4c5a-a667-59158b260436
📒 Files selected for processing (4)
cloudprofilesync/source.gocloudprofilesync/source_test.gocontrollers/managedcloudprofile_controller.gocontrollers/managedcloudprofile_controller_test.go
90658e4 to
e78867e
Compare
There was a problem hiding this comment.
Pull request overview
This PR prevents ManagedCloudProfile reconciliation from failing when individual OCI tags are malformed or otherwise unreadable, by skipping those tags (with debug-level logging) rather than returning a fatal error from GetVersions.
Changes:
- Extend OCI source construction to accept a
logr.Logger(plumbed viaOCISourceFactory) so skip events can be attributed and traced. - Update
OCI.GetVersionsto log-and-skip per-tag fetch/decode/missing-architecture issues, while still propagating hard semaphore/context errors. - Add a unit test ensuring tags missing the
architectureannotation are skipped and valid tags are still returned.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| controllers/managedcloudprofile_controller.go | Updates OCISourceFactory and passes the reconciler logger into NewOCI so skip events are logged with reconcile context. |
| controllers/managedcloudprofile_controller_test.go | Adapts test doubles to the updated OCISourceFactory.Create signature (adds logr.Logger param). |
| cloudprofilesync/source.go | Adds logger to OCI and changes GetVersions to skip malformed/unreadable tags instead of failing the whole call. |
| cloudprofilesync/source_test.go | Updates NewOCI calls and adds a test for skipping tags without the architecture annotation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Dmitri Fedotov <13087245+defo89@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cloudprofilesync/source.go`:
- Around line 187-193: The error classification logic at the conditional `if
ctx.Err() != nil` checks the context state at aggregation time rather than
determining if the error itself is context-related, creating a race condition
where non-context errors could be misclassified if the context is cancelled
during result processing. Fix this by using `errors.Is()` to check if
`result.err` is wrapping `context.Canceled` or `context.DeadlineExceeded`
instead of checking the current state of `ctx.Err()`. This ensures errors are
classified based on their actual type rather than timing.
- Around line 200-203: The early return condition checking if len(images) == 0
&& len(tags) > 0 discards context cancellation errors and returns a misleading
"possible registry issue" message instead. Before returning the generic error,
check if there are actual hard errors in the errs slice and propagate those
first. Restructure the logic so that if errors exist (particularly context
cancellation errors), those are returned to the caller rather than being masked
by the tags/images check. This ensures context cancellation errors are properly
surfaced for handling instead of being hidden behind a generic registry error
message.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 79878b40-7896-41ce-86bc-68737d11583a
📒 Files selected for processing (4)
cloudprofilesync/source.gocloudprofilesync/source_test.gocontrollers/managedcloudprofile_controller.gocontrollers/managedcloudprofile_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- controllers/managedcloudprofile_controller.go
- controllers/managedcloudprofile_controller_test.go
- cloudprofilesync/source_test.go
067686f to
8453600
Compare
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
When
cloud-profile-syncfetched image versions from the OCI registry, any tag missing thearchitectureannotation caused the entire reconcile to fail with:This happened because
GetVersionsreturned a non-nil error even though it had already collected all valid images — the bad tag was skipped in the results slice but its error was still joined and returned.The caller in
ImageUpdater.Updatetreated any non-nil error as fatal, aborting the reconcile and leaving the CloudProfile unchanged indefinitelyThe same problem applied to fetch and decode failures for individual tags.
Fix
FetchReferencefailures, JSON decode failures, and missingarchitectureannotations are now logged at verbosity level 1 and skipped — the goroutine still sends a result on the channel (required to unblock the collector) but the collector discards itsema.Acquireerrors (context cancellation / shutdown) are still propagated as hard errorsOCI/NewOCIso skip events can be traced to the specific tagSummary by CodeRabbit
Bug Fixes
Tests