Skip to content

ROSAENG-2057: add pull-secret update, audit, and validate subcommands#921

Open
nephomaniac wants to merge 1 commit into
openshift:masterfrom
nephomaniac:ROSAENG-2057-pull-secret-update
Open

ROSAENG-2057: add pull-secret update, audit, and validate subcommands#921
nephomaniac wants to merge 1 commit into
openshift:masterfrom
nephomaniac:ROSAENG-2057-pull-secret-update

Conversation

@nephomaniac

@nephomaniac nephomaniac commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds osdctl cluster pull-secret subcommand group for managing cluster pull secrets:

  • pull-secret update — refresh a cluster's pull secret from OCM without ownership transfer (replaces replace-pull-secret)
  • pull-secret audit — account-wide pull secret audit across all clusters owned by an account
  • pull-secret validate — wraps existing validate-pull-secret-ext under the new command group

References: ROSAENG-2057

Related: openshift/ops-sop#4035 (SOP update, on hold until this PR merges)

Key design decisions

  • Controller pattern: all business logic in pkg/controller/pullsecret.go, following PR ROSAENG-2066: Add osdctl account aws-creds command #913
  • Zero diff on transferowner.go: cleanly separated code to keep review scope focused
  • Never deletes auths or secrets — only adds/updates via merge
  • Dry-run first: live runs automatically perform a dry-run comparison before making changes
  • Three-way comparison (Classic): OCM ↔ Hive ↔ Target to detect drift at each layer
  • Two-way comparison (HCP): OCM ↔ Target (ManifestWork path, no Hive)
  • SyncSet name isolation: uses pull-secret-update (distinct from transfer-owner's pull-secret-replacement) with existing SyncSet detection and user prompt
  • Deprecated commands preserved: replace-pull-secret and validate-pull-secret-ext still work with deprecation messages

Flow: HCP pull secret update

flowchart TD
    A[osdctl pull-secret update] --> B[Fetch access token from OCM]
    B --> C[Connect to service cluster]
    C --> D[Compare OCM auths vs target PS]
    D -->|All match| E[No-op — nothing to update]
    D -->|Diffs found| F[Get ManifestWork on service cluster]
    F --> G[Merge OCM auths into ManifestWork PS]
    G --> H[Rename secret to trigger HCCO sync]
    H --> I[Update ManifestWork]
    I --> J[Wait for HCCO to reconcile to target]
    J --> K[Verify target PS matches OCM]
    K --> L[Send internal service log]

    style A fill:#2563eb,color:#fff
    style E fill:#22c55e,color:#fff
    style L fill:#22c55e,color:#fff
Loading

Flow: Classic pull secret update

flowchart TD
    A[osdctl pull-secret update] --> B[Fetch access token from OCM]
    B --> C[Connect to hive + target clusters]
    C --> D[Three-way compare: OCM ↔ Hive ↔ Target]
    D -->|All match| E[No-op — nothing to update]
    D -->|Diffs found| F[Merge OCM auths into hive secret]
    F --> G{Existing SyncSet?}
    G -->|Yes| H[Prompt user: delete and replace?]
    G -->|No| I[Create SyncSet pull-secret-update]
    H -->|Delete| I
    I --> J[Poll ClusterSync for sync completion]
    J --> K[Delete SyncSet cleanup]
    K --> L[Verify target PS matches OCM]
    L --> M[Restart telemeter-client + ocm-agent pods]
    M --> N[Send internal service log]

    style A fill:#2563eb,color:#fff
    style E fill:#22c55e,color:#fff
    style N fill:#22c55e,color:#fff
Loading

Change scope

Category Files Lines Added Lines Removed
New Go files 4 (pullsecret.go, pullsecretstatus.go, replacepullsecret.go, pullsecretop.go) 2,359 0
New controller 1 (pkg/controller/pullsecret.go) 1,172 0
New tests 1 (pullsecret_test.go) 594 0
Edited Go files 4 (cmd.go, validatepullsecret.go, validatepullsecretext.go, post.go) 16 2
New docs 4 233 0
Edited docs 2 146 41
Total 17 files 3,691 43

Usage

# Dry-run to preview without making changes (HCP)
osdctl cluster pull-secret update -C <cluster-id> --reason "OHSS-1234" --dry-run

# Live update (HCP — uses ManifestWork)
osdctl cluster pull-secret update -C <cluster-id> --reason "OHSS-1234"

# Live update (Classic — uses Hive SyncSet, needs --hive-ocm-url for cross-env hive)
osdctl cluster pull-secret update -C <cluster-id> --reason "OHSS-1234" --hive-ocm-url prod

# Audit all clusters for an account
osdctl cluster pull-secret audit -C <cluster-id> --reason "OHSS-1234"

# Audit with per-cluster validation
osdctl cluster pull-secret audit -C <cluster-id> --reason "OHSS-1234" --validate

# Audit by account ID directly
osdctl cluster pull-secret audit -A <account-id> --reason "OHSS-1234"

# Validate a single cluster (wraps validate-pull-secret-ext)
osdctl cluster pull-secret validate -C <cluster-id> --reason "OHSS-1234"

Example Output

osdctl cluster pull-secret update --dry-run (HCP)
============================================================
[Dry Run] Step 1: OCM data and cluster connectivity
  Resolve the cluster, owner account, and OCM access token.
  Establish connections to the infrastructure and target clusters.
============================================================
  Cluster:  example-cluster (abc123def456ghi789)
  Type:     HCP
  Owner:    example-user (account: 1A2B3C4D5E6F7G8H9I0J)
  Reason:   OHSS-1234
  Mode:     DRY-RUN (no changes will be made)
  [NOTE] This account owns 2 clusters sharing the same access token.
           This command only updates the pull secret on the cluster above.
           Use 'osdctl cluster pull-secret audit -C abc123def456ghi789' to review all clusters.
Is this the correct cluster? Continue? (y/N): y
[Dry Run] [OK] management cluster: hs-mc-example01
[Dry Run] [OK] service cluster: hs-sc-example01
[Dry Run] [OK] retrieved 4 auth entries from OCM access token
[Dry Run] [OK] connected to infrastructure cluster hs-sc-example01 with elevation
[Dry Run] [OK] connected to target cluster example-cluster with elevation
============================================================
[Dry Run] Step 2: Update pull secret via ManifestWork (HCP)
  HCP clusters have a multi-layer pull secret architecture:
    1. HostedCluster.spec.pullSecret (management cluster) — source of truth
    2. original-pull-secret (kube-system on hosted cluster) — HCCO syncs from #1
    3. additional-pull-secret (kube-system) — optional customer-added registries
  This tool operates at level 1 by updating the ManifestWork on the service cluster.
  HCCO then reconciles the change to the hosted cluster. Customer-added registries
  in additional-pull-secret (level 3) are not affected by this operation.
  ManifestWork: hs-mc-example01/abc123def456ghi789 on service cluster hs-sc-example01
============================================================
[Dry Run] Would: get and update ManifestWork hs-mc-example01/abc123def456ghi789 on service cluster hs-sc-example01
[Dry Run] [OK] hs-sc-example01: auth can-i get manifestworks in hs-mc-example01
[Dry Run] [OK] hs-sc-example01: auth can-i update manifestworks in hs-mc-example01
============================================================
[Dry Run] Step 3: Verify pull secret on target cluster
  The pull secret on the target cluster is compared against both the OCM
  access token auths and registry credential auths to verify all entries match.
  Required registries are also checked to ensure the cluster can pull images.
============================================================
[Dry Run] [OK] example-cluster: auth can-i get secrets in openshift-config
[Dry Run] Comparing ACCESS TOKEN against openshift-config/pull-secret on example-cluster...
  [OK] cloud.openshift.com                      token=match, email=match (user@example.com)
  [OK] quay.io                                  token=match, email=match (user@example.com)
  [OK] registry.connect.redhat.com              token=match, email=match (user@example.com)
  [OK] registry.redhat.io                       token=match, email=match (user@example.com)
  Verified 4/4 auth entries match
  [OK] All required registries present in cluster pull secret
[Dry Run] [OK] all 4 ACCESS TOKEN auth entries match
[Dry Run] Comparing REGISTRY CREDENTIAL against openshift-config/pull-secret on example-cluster...
  [OK] registry.connect.redhat.com              token=match, email=match (user@example.com)
  [OK] registry.redhat.io                       token=match, email=match (user@example.com)
  [OK] quay.io                                  token=match, email=match (user@example.com)
  Verified 3/3 auth entries match
  [OK] All required registries present in cluster pull secret
[Dry Run] [OK] all 3 REGISTRY CREDENTIAL auth entries match
[Dry Run] [OK] All ACCESS TOKEN and REGISTRY CREDENTIAL auths match — pull secret is up to date
============================================================
[Dry Run] Step 4: Send internal service log
  An internal (non-customer-visible) service log is sent to record that the
  pull secret was updated, including the owner username and reason.
  This step only runs if the pull secret update was performed.
============================================================
[Dry Run] Would: send internal service log for example-cluster
============================================================
[Dry Run] Result
============================================================
[Dry Run] [OK] All ACCESS TOKEN and REGISTRY CREDENTIAL auths match — a live run would be a no-op.
osdctl cluster pull-secret update --hive-ocm-url prod (Classic — tampered email detected and fixed)
============================================================
Step 1: OCM data and cluster connectivity
  Resolve the cluster, owner account, and OCM access token.
  Establish connections to the infrastructure and target clusters.
============================================================
  Cluster:  example-cluster (abc123def456ghi789)
  Type:     OSD/ROSA Classic
  Owner:    example-user (account: 1A2B3C4D5E6F7G8H9I0J)
  Reason:   OHSS-1234
  [NOTE] This account owns 2 clusters sharing the same access token.
           This command only updates the pull secret on the cluster above.
           Use 'osdctl cluster pull-secret audit -C abc123def456ghi789' to review all clusters.
Is this the correct cluster? Continue? (y/N): y
[OK] hive OCM connection established (https://api.openshift.com)
[OK] hive cluster: hive-example-01
[OK] retrieved 4 auth entries from OCM access token
[OK] connected to infrastructure cluster hive-example-01 with elevation
[OK] connected to target cluster example-cluster with elevation
Running pre-flight checks...
[OK] Pre-flight checks passed.
============================================================
Step 2: Compare pull secret across OCM, Hive, and target cluster
  Before making changes, compare the pull secret across all three sources:
    OCM    — access token auths (source of truth)
    Hive   — secret in hive namespace (used by SyncSet)
    Target — openshift-config/pull-secret (what the cluster uses)
  Hive will always be brought in sync with OCM.
  Target will be synced via SyncSet only if it differs from the updated hive secret.
============================================================
  ACCESS TOKEN AUTHS             OCM↔HIVE    OCM↔TARGET    HIVE↔TARGET  
------------------------------+----------+------------+--------------
  cloud.openshift.com            match       DIFFERS       DIFFERS      
  quay.io                        match       match         match        
  registry.connect.redhat.com    match       match         match        
  registry.redhat.io             match       match         match        
  [!] Target cluster needs update from OCM
  REGISTRY CREDENTIAL AUTHS      OCM↔HIVE    OCM↔TARGET    HIVE↔TARGET  
------------------------------+----------+------------+--------------
  registry.connect.redhat.com    match       match         match        
  registry.redhat.io             match       match         match        
  quay.io                        match       match         match        
  [OK] All sources in sync
Proceed with pull secret update? Continue? (y/N): y
============================================================
Step 3: Update pull secret via Hive SyncSet (Classic)
  Classic clusters store the pull secret in a Hive namespace on the hive cluster.
  The secret is updated (or created if missing), then a SyncSet syncs it to the target cluster.
  After sync completes, the SyncSet is cleaned up.
  Note: The hive secret is never deleted. If missing, it can be restored from the
  target cluster's pull secret or rebuilt from OCM auths.
============================================================
Resolving Hive namespace for cluster abc123def456ghi789 on hive-example-01...
[OK] found ClusterDeployment uhc-production-abc123def456ghi789/example-cluster on hive-example-01
[OK] secret uhc-production-abc123def456ghi789/pull exists on hive-example-01
Updating pull secret in uhc-production-abc123def456ghi789/pull on hive-example-01
SyncSet pull-secret-update in namespace uhc-production-abc123def456ghi789 has been created.
.
Sync completed...
[OK] pull secret updated via Hive SyncSet
============================================================
Step 4: Pod rollouts (Classic only)
  After the pull secret is synced, telemeter-client and ocm-agent pods are restarted
  so they pick up the new credentials. HCP clusters do not require pod rollouts.
============================================================
Pod telemeter-client-5cf984fffd-lf2bg in namespace openshift-monitoring has been deleted.
Pods in namespace openshift-monitoring with selector 'app.kubernetes.io/name=telemeter-client' have been deleted.
============================================================
Step 5: Verify pull secret on target cluster
  The pull secret on the target cluster is compared against both the OCM
  access token auths and registry credential auths to verify all entries match.
  Required registries are also checked to ensure the cluster can pull images.
============================================================
Comparing ACCESS TOKEN against openshift-config/pull-secret on example-cluster...
  [OK] registry.redhat.io                       token=match, email=match (user@example.com)
  [OK] cloud.openshift.com                      token=match, email=match (user@example.com)
  [OK] quay.io                                  token=match, email=match (user@example.com)
  [OK] registry.connect.redhat.com              token=match, email=match (user@example.com)
  Verified 4/4 auth entries match
  [OK] All required registries present in cluster pull secret
[OK] all 4 ACCESS TOKEN auth entries match
Comparing REGISTRY CREDENTIAL against openshift-config/pull-secret on example-cluster...
  [OK] registry.connect.redhat.com              token=match, email=match (user@example.com)
  [OK] registry.redhat.io                       token=match, email=match (user@example.com)
  [OK] quay.io                                  token=match, email=match (user@example.com)
  Verified 3/3 auth entries match
  [OK] All required registries present in cluster pull secret
[OK] all 3 REGISTRY CREDENTIAL auth entries match
[OK] All ACCESS TOKEN and REGISTRY CREDENTIAL auths match — pull secret is up to date
Pod ocm-agent-855b9948b7-c6zhq in namespace openshift-ocm-agent-operator has been deleted.
Pod ocm-agent-855b9948b7-llcmd in namespace openshift-ocm-agent-operator has been deleted.
Pods in namespace openshift-ocm-agent-operator with selector 'app=ocm-agent' have been deleted.
============================================================
Step 6: Send internal service log
  An internal (non-customer-visible) service log is sent to record that the
  pull secret was updated, including the owner username and reason.
  This step only runs if the pull secret update was performed.
============================================================
Name                ID                                 State               Version             Cloud Provider      Region
example-cluster     abc123def456ghi789   ready               4.22.0              aws                 us-west-2
{
  "severity":"Info",
  "service_name":"SREManualAction",
  "summary":"INTERNAL ONLY, DO NOT SHARE WITH CUSTOMER",
  "description":"Pull secret replaced for cluster owner 'example-user'. Reason: OHSS-1234",
  "internal_only":true,
  "event_stream_id":"",
  "doc_references":null
}
Continue? (y/N): y
ID                                     Status
783a5cae-51a9-4087-9da7-22c1efa2d8f6   Message has been successfully sent to 783a5cae-51a9-4087-9da7-22c1efa2d8f6
[OK] internal service log sent
============================================================
Result
============================================================
[OK] Pull secret updated successfully. All ACCESS TOKEN and REGISTRY CREDENTIAL auths now match.
osdctl cluster pull-secret update --dry-run --hive-ocm-url prod (Classic)
============================================================
[Dry Run] Step 1: OCM data and cluster connectivity
  Resolve the cluster, owner account, and OCM access token.
  Establish connections to the infrastructure and target clusters.
============================================================
  Cluster:  example-cluster (abc123def456ghi789)
  Type:     OSD/ROSA Classic
  Owner:    example-user (account: 1A2B3C4D5E6F7G8H9I0J)
  Reason:   OHSS-1234
  Mode:     DRY-RUN (no changes will be made)
  [NOTE] This account owns 2 clusters sharing the same access token.
           This command only updates the pull secret on the cluster above.
           Use 'osdctl cluster pull-secret audit -C abc123def456ghi789' to review all clusters.
Is this the correct cluster? Continue? (y/N): y
[Dry Run] [OK] hive OCM connection established (https://api.openshift.com)
[Dry Run] [OK] hive cluster: hive-example-01
[Dry Run] [OK] retrieved 4 auth entries from OCM access token
[Dry Run] [OK] connected to infrastructure cluster hive-example-01 with elevation
[Dry Run] [OK] connected to target cluster example-cluster with elevation
============================================================
[Dry Run] Step 2: Update pull secret via Hive SyncSet (Classic)
  Classic clusters store the pull secret in a Hive namespace on the hive cluster.
  The secret is updated (or created if missing), then a SyncSet syncs it to the target cluster.
  After sync completes, the SyncSet is cleaned up.
  Note: The hive secret is never deleted. If missing, it can be restored from the
  target cluster's pull secret or rebuilt from OCM auths.
============================================================
[Dry Run] Resolving Hive namespace for cluster abc123def456ghi789 on hive-example-01...
[Dry Run] [OK] found ClusterDeployment uhc-production-abc123def456ghi789/example-cluster on hive-example-01
[Dry Run] [OK] secret uhc-production-abc123def456ghi789/pull exists on hive-example-01
[Dry Run] Would: update or create secret uhc-production-abc123def456ghi789/pull on hive-example-01 with merged pull secret data
[Dry Run] [OK] hive-example-01: auth can-i get secrets in uhc-production-abc123def456ghi789
[Dry Run] [OK] hive-example-01: auth can-i update secrets in uhc-production-abc123def456ghi789
[Dry Run] [OK] hive-example-01: auth can-i create secrets in uhc-production-abc123def456ghi789
[Dry Run] Would: create SyncSet uhc-production-abc123def456ghi789/pull-secret-replacement to sync to example-cluster
[Dry Run] [OK] hive-example-01: auth can-i create syncsets in uhc-production-abc123def456ghi789
[Dry Run] Would: poll ClusterSync uhc-production-abc123def456ghi789/example-cluster then delete SyncSet
[Dry Run] [OK] hive-example-01: auth can-i get clustersync in uhc-production-abc123def456ghi789
[Dry Run] [OK] hive-example-01: auth can-i delete syncsets in uhc-production-abc123def456ghi789
============================================================
[Dry Run] Step 3: Pod rollouts (Classic only)
  After the pull secret is synced, telemeter-client and ocm-agent pods are restarted
  so they pick up the new credentials. HCP clusters do not require pod rollouts.
============================================================
[Dry Run] Would: roll out pods openshift-monitoring/telemeter-client on example-cluster
[Dry Run] [OK] example-cluster: auth can-i delete pods in openshift-monitoring
[Dry Run] Would: roll out pods openshift-ocm-agent-operator/ocm-agent on example-cluster
[Dry Run] [OK] example-cluster: auth can-i delete pods in openshift-ocm-agent-operator
============================================================
[Dry Run] Step 4: Verify pull secret on target cluster
  The pull secret on the target cluster is compared against both the OCM
  access token auths and registry credential auths to verify all entries match.
  Required registries are also checked to ensure the cluster can pull images.
============================================================
[Dry Run] [OK] example-cluster: auth can-i get secrets in openshift-config
[Dry Run] Comparing ACCESS TOKEN against openshift-config/pull-secret on example-cluster...
  [OK] cloud.openshift.com                      token=match, email=match (user@example.com)
  [OK] quay.io                                  token=match, email=match (user@example.com)
  [OK] registry.connect.redhat.com              token=match, email=match (user@example.com)
  [OK] registry.redhat.io                       token=match, email=match (user@example.com)
  Verified 4/4 auth entries match
  [OK] All required registries present in cluster pull secret
[Dry Run] [OK] all 4 ACCESS TOKEN auth entries match
[Dry Run] Comparing REGISTRY CREDENTIAL against openshift-config/pull-secret on example-cluster...
  [OK] registry.connect.redhat.com              token=match, email=match (user@example.com)
  [OK] registry.redhat.io                       token=match, email=match (user@example.com)
  [OK] quay.io                                  token=match, email=match (user@example.com)
  Verified 3/3 auth entries match
  [OK] All required registries present in cluster pull secret
[Dry Run] [OK] all 3 REGISTRY CREDENTIAL auth entries match
[Dry Run] [OK] All ACCESS TOKEN and REGISTRY CREDENTIAL auths match — pull secret is up to date
============================================================
[Dry Run] Step 5: Send internal service log
  An internal (non-customer-visible) service log is sent to record that the
  pull secret was updated, including the owner username and reason.
  This step only runs if the pull secret update was performed.
============================================================
[Dry Run] Would: send internal service log for example-cluster
============================================================
[Dry Run] Result
============================================================
[Dry Run] [OK] All ACCESS TOKEN and REGISTRY CREDENTIAL auths match — a live run would be a no-op.
osdctl cluster pull-secret audit --validate
============================================================
 Owner:    example-user (account: 1A2B3C4D5E6F7G8H9I0J)
 Email:    user@example.com
 Registry credentials last updated: 2024-05-08 20:33:09 UTC
 Clusters: 2
============================================================
Cluster:  example-cluster (abc123def456ghi789)
Created:  2026-06-13 23:06    Status:  Active
  ACCESS TOKEN AUTHS             TOKEN    EMAIL    STATUS  
------------------------------+-------+-------+---------
  cloud.openshift.com            match    match    [OK]    
  quay.io                        match    match    [OK]    
  registry.connect.redhat.com    match    match    [OK]    
  registry.redhat.io             match    match    [OK]    
  REGISTRY CREDENTIAL AUTHS      TOKEN    EMAIL    STATUS  
------------------------------+-------+-------+---------
  registry.connect.redhat.com    match    match    [OK]    
  registry.redhat.io             match    match    [OK]    
  quay.io                        match    match    [OK]    
============================================================
Cluster:  example-cluster (abc123def456ghi789)
Created:  2026-06-14 16:55    Status:  Active
  ACCESS TOKEN AUTHS             TOKEN    EMAIL    STATUS  
------------------------------+-------+-------+---------
  registry.connect.redhat.com    match    match    [OK]    
  registry.redhat.io             match    match    [OK]    
  cloud.openshift.com            match    match    [OK]    
  quay.io                        match    match    [OK]    
  REGISTRY CREDENTIAL AUTHS      TOKEN    EMAIL    STATUS  
------------------------------+-------+-------+---------
  registry.connect.redhat.com    match    match    [OK]    
  registry.redhat.io             match    match    [OK]    
  quay.io                        match    match    [OK]    
2 cluster(s) found, 2 validated.

Test plan

  • make lint — 0 issues
  • make format — no changes
  • make generate-docs — docs up to date
  • go build — clean
  • Unit tests: 25 tests in pkg/controller/pullsecret_test.go (CompareThreeWay, MergePullSecretAuths, extractPullSecretAuth, updateManifestWorkPayloads, PullSecretOp, syncStatus)
  • Integration tests: 33 expect tests covering help, input validation, HCP dry-run, Classic dry-run, audit, and mutating scenarios (tamper+fix, dry-run mismatch detection, no-op, post-mutating audit)
  • Tested against staging HCP and Classic clusters
  • SOP update PR to follow

Jira

ROSAENG-2057

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features
    • Added osdctl cluster pull-secret group with audit (account-wide status, optional --validate), update (refresh pull secrets from owner OCM data; supports --dry-run and --force), and validate (enhanced auth verification).
  • Deprecations
    • validate-pull-secret-ext is deprecated; use pull-secret validate instead.
    • Kept deprecated replace-pull-secret alias for compatibility.
  • Documentation
    • Added command reference pages for pull-secret and subcommands; removed validate-pull-secret-ext from listings.
  • Bug Fixes
    • Improved help guidance with startup tips pointing to pull-secret audit and pull-secret validate.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 15, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 15, 2026

Copy link
Copy Markdown

@nephomaniac: This pull request references ROSAENG-2057 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Adds osdctl cluster pull-secret subcommand group for managing cluster pull secrets:

  • pull-secret update — refresh a cluster's pull secret from OCM without ownership transfer (replaces replace-pull-secret)
  • pull-secret audit — account-wide pull secret audit across all clusters owned by an account
  • pull-secret validate — wraps existing validate-pull-secret-ext under the new command group

Key design decisions

  • Controller pattern: all business logic in pkg/controller/pullsecret.go, following PR ROSAENG-2066: Add osdctl account aws-creds command #913
  • Zero diff on transferowner.go: cleanly separated code to keep review scope focused
  • Never deletes auths or secrets — only adds/updates via merge
  • Dry-run first: live runs automatically perform a dry-run comparison before making changes
  • Three-way comparison (Classic): OCM ↔ Hive ↔ Target to detect drift at each layer
  • Two-way comparison (HCP): OCM ↔ Target (ManifestWork path, no Hive)
  • SyncSet name isolation: uses pull-secret-update (distinct from transfer-owner's pull-secret-replacement) with existing SyncSet detection and user prompt
  • Deprecated commands preserved: replace-pull-secret and validate-pull-secret-ext still work with deprecation messages

Example: dry-run (HCP)

$ osdctl cluster pull-secret update -C <cluster-id> --reason "[OHSS-1234](https://redhat.atlassian.net/browse/OHSS-1234)" --dry-run

============================================================
[Dry Run] Step 1: OCM data and cluster connectivity
============================================================

 Cluster:  my-hcp-cluster (1abc2def3ghi4jkl)
 Type:     HCP
 Owner:    cluster-owner (account: ACCOUNT_ID)
 Reason:   [OHSS-1234](https://redhat.atlassian.net/browse/OHSS-1234)
 Mode:     DRY-RUN (no changes will be made)

 [NOTE] This account owns 2 clusters sharing the same access token.
        Use 'osdctl cluster pull-secret audit' to review all clusters.

[Dry Run] [OK] retrieved 4 auth entries from OCM access token
[Dry Run] [OK] connected to infrastructure and target clusters

============================================================
[Dry Run] Step 2: Update pull secret via ManifestWork (HCP)
 HCP clusters have a multi-layer pull secret architecture:
   1. HostedCluster.spec.pullSecret — source of truth
   2. original-pull-secret — HCCO syncs from #1
   3. additional-pull-secret — customer-added registries (not affected)
============================================================
[Dry Run] Would: get and update ManifestWork on service cluster
[Dry Run] [OK] auth can-i get/update manifestworks

============================================================
[Dry Run] Step 3: Verify pull secret on target cluster
============================================================
[Dry Run] Comparing ACCESS TOKEN auths...
 [OK] cloud.openshift.com        token=match, email=match
 [OK] quay.io                    token=match, email=match
 [OK] registry.connect.redhat.com token=match, email=match
 [OK] registry.redhat.io         token=match, email=match
 Verified 4/4 auth entries match

[Dry Run] [OK] All ACCESS TOKEN and REGISTRY CREDENTIAL auths match — a live run would be a no-op.

Example: audit with --validate

$ osdctl cluster pull-secret audit -C <cluster-id> --reason "[OHSS-1234](https://redhat.atlassian.net/browse/OHSS-1234)" --validate

============================================================
Owner:    cluster-owner (account: ACCOUNT_ID)
Email:    owner@example.com
Clusters: 2
============================================================

Cluster:  my-hcp-cluster (1abc...)
 ACCESS TOKEN AUTHS             TOKEN    EMAIL    STATUS
 cloud.openshift.com            match    match    [OK]
 quay.io                        match    match    [OK]
 registry.connect.redhat.com    match    match    [OK]
 registry.redhat.io             match    match    [OK]

Cluster:  my-classic-cluster (5mno...)
 ACCESS TOKEN AUTHS             TOKEN    EMAIL    STATUS
 cloud.openshift.com            match    match    [OK]
 quay.io                        match    match    [OK]
 registry.connect.redhat.com    match    match    [OK]
 registry.redhat.io             match    match    [OK]

2 cluster(s) found, 2 validated.

Example: live update fixing tampered email (Classic, three-way comparison)

$ osdctl cluster pull-secret update -C <cluster-id> --reason "[OHSS-1234](https://redhat.atlassian.net/browse/OHSS-1234)" --hive-ocm-url prod

Step 2: Compare pull secret across OCM, Hive, and target cluster
 ACCESS TOKEN AUTHS             OCM↔HIVE    OCM↔TARGET    HIVE↔TARGET
 cloud.openshift.com            match       DIFFERS       DIFFERS
 quay.io                        match       match         match
 registry.connect.redhat.com    match       match         match
 registry.redhat.io             match       match         match
 [!] Target cluster needs update from OCM

Step 3: Update pull secret via Hive SyncSet (Classic)
 SyncSet pull-secret-update created.
 Sync completed...
 [OK] pull secret updated via Hive SyncSet

Step 5: Verify pull secret on target cluster
 Verified 4/4 auth entries match
 [OK] All ACCESS TOKEN and REGISTRY CREDENTIAL auths now match.

[OK] Pull secret updated successfully.

Test plan

  • make lint — 0 issues
  • make format — no changes
  • make generate-docs — docs up to date
  • go build — clean
  • Unit tests: 25 tests in pkg/controller/pullsecret_test.go (CompareThreeWay, MergePullSecretAuths, extractPullSecretAuth, updateManifestWorkPayloads, PullSecretOp, syncStatus)
  • Integration tests: 33 expect tests covering help, input validation, HCP dry-run, Classic dry-run, audit, and mutating scenarios (tamper+fix, dry-run mismatch detection, no-op, post-mutating audit)
  • Tested against staging HCP and Classic clusters
  • SOP update PR to follow

Jira

ROSAENG-2057

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@nephomaniac

Copy link
Copy Markdown
Contributor Author

/hold
On hold during review and pending SOP update PR.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 15, 2026
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Introduces osdctl cluster pull-secret command group with three subcommands: audit (account-wide pull-secret staleness check), update (replacing the deprecated replace-pull-secret, supporting both Classic/Hive and HCP update flows), and validate (wrapping the existing extended validation). Backs these commands with a new pkg/controller/pullsecret.go utility library and a PullSecretOp operation context, deprecates validate-pull-secret-ext, and updates all documentation.

Changes

Pull-Secret Command Group

Layer / File(s) Summary
Pull-secret controller types and constants
pkg/controller/pullsecret.go (lines 1–77)
Exports RequiredPullSecretAuths allowlist, SyncSetName constant, and data structures: ClusterSummary, AuthCheckResult, PullSecretVerifyResult, ThreeWayAuthState, ThreeWayComparison, SimpleAuth, HiveNamespaceInfo, PreflightResult.
OCM access-token auth fetch and validation
pkg/controller/pullsecret.go (lines 78–230)
Fetches OCM access-token auths with optional impersonation, exports FetchOwnerAccessToken and ValidateRequiredAuths to compute missing required registries, implements CompareAccessTokenAuthsToCluster to verify cluster pull-secret against expected auths with token/email match checking and verification result rendering.
Registry credential verification
pkg/controller/pullsecret.go (lines 231–326)
Implements CompareRegistryCredentialAuthsToCluster to fetch account registry credentials, resolve registry IDs to names, decode cluster-stored base64 username:token entries, compare against account email, and render results.
Three-way auth comparison and sync determination
pkg/controller/pullsecret.go (lines 328–503)
Exports AccessTokenToSimple to normalize auth maps, implements CompareThreeWay to parse Hive/target .dockerconfigjson, compute per-registry presence and match state, derive sync-needed flags, and RenderThreeWayComparison to format sync-status tables with optional Hive column omission.
Hive discovery and pre-flight checks
pkg/controller/pullsecret.go (lines 505–606)
Exports FindHiveNamespace (label-based or cluster-ID scan), PreflightCheck (validates pull-secret existence, .dockerconfigjson readability, and pod-listing permissions for Classic clusters).
Account topology and credential timestamp helpers
pkg/controller/pullsecret.go (lines 608–683)
Exports CountOwnerClusters, ListOwnerSubscriptions (paginated listing into ClusterSummary), GetLatestCredentialUpdate (maximum registry-credential UpdatedAt timestamp).
Pull-secret auth merge and Hive SyncSet update
pkg/controller/pullsecret.go (lines 684–806)
Exports MergePullSecretAuths (merge-with-preservation semantics), UpdateHivePullSecretSSS (create-or-update Hive pull secret and trigger hive→target sync); implements internal syncPullSecretViaHive (create SyncSet, detect/interactively delete interfering SyncSets, poll ClusterSync status, cleanup).
HCP ManifestWork update and pod restart
pkg/controller/pullsecret.go (lines 808–1109)
Exports UpdateHCPPullSecretViaManifestWork (edit service-cluster ManifestWork with embedded Secret/HostedCluster rewrites, retry-on-conflict, condition polling), RestartPodsBySelector (delete pods by label selector), internal updateManifestWorkPayloads (rewrite payloads with unique secret names) and randomHexSuffix (unique resource naming).
PullSecretOp operation context
pkg/controller/pullsecretop.go
Introduces PullSecretOp struct with dry-run, logger, output, success/pull-secret state, failure tracking; provides colored output helpers (Section, OK, Fail, Warn, Would, Info) with dry-run variants; implements CheckCanI (RBAC via SelfSubjectAccessReview with nil-client handling), CheckSecretExists (secret probing with warnings); wraps FetchOwnerAccessToken and FindHiveNamespace with structured output; exports ResolveExistingPullSecret (Hive-first then target fallback).
Pull-secret controller unit tests
pkg/controller/pullsecret_test.go
Validates required-auth key presence, covers CompareThreeWay all-in-sync/drift/nil-hive/extra-registries/invalid-JSON, tests MergePullSecretAuths add-new/overwrite/preserve semantics and edge cases, verifies auth extraction and ManifestWork payload rewriting, tests PullSecretOp output formatting/dry-run-prefixes/failure accumulation, and syncStatus mapping.
Pull-secret update command setup
cmd/cluster/replacepullsecret.go (lines 1–168)
Registers update command with required flags (--cluster-id, --reason), optional flags (--dry-run, --force, --hive-ocm-url), validates inputs and enforces reason format via ticket-ID regex; wires deprecated replace-pull-secret alias.
Update command pre-flight and cluster resolution
cmd/cluster/replacepullsecret.go (lines 170–413)
Connects to OCM, resolves target cluster and HCP vs Classic type, fetches owner/subscription/account and access-token auths, resolves infra/management cluster context, connects to both clusters, runs live-only pre-flight RBAC/connectivity checks with --force-enforced typed YES.
Update command three-way comparison
cmd/cluster/replacepullsecret.go (lines 415–538)
Reads Hive/target pull-secret .dockerconfigjson, performs three-way comparison against OCM access-token and registry-credential auths, renders results, exits early when in-sync (unless --force typed YES), prompts for confirmation when changes needed.
Update command execution and verification
cmd/cluster/replacepullsecret.go (lines 540–761)
Dispatches HCP ManifestWork or Classic Hive SyncSet update with optional auth-merge decision and pod restart; verifies post-update state against both OCM auth sources; sets result flags and counts diffs.
Update command finalization and logging
cmd/cluster/replacepullsecret.go (lines 763–886), cmd/servicelog/post.go (lines 55–58)
Implements live no-op early-exit with --force typed YES handling; posts aggregated service log entry on success; prints formatted result summary with dry-run/success/no-op states and pre-flight failure details. Also adds SetDryRun method to PostCmdOptions.
Pull-secret audit command setup
cmd/cluster/pullsecretstatus.go (lines 1–134)
Registers audit command with required flags (--cluster-id or --account-id mutually exclusive, --reason), optional --validate; implements PreRunE validation of mutual exclusivity and allowed ID character format; constructs configured logrus logger.
Audit command execution and data gathering
cmd/cluster/pullsecretstatus.go (lines 135–280)
Connects to OCM, resolves account directly or via cluster subscription, fetches latest credential-update timestamp and lists owner subscriptions, conditionally fetches owner access-token auths with user fallback prompt for registry-only validation, iterates clusters with elevation reason to collect per-cluster verification results or connection errors.
Audit command output rendering
cmd/cluster/pullsecretstatus.go (lines 282–420)
Renders account header with credential timestamp, per-cluster banners and staleness status (comparing CreatedAt to credential update), optional per-cluster verification tables with match/mismatch/missing labels, summary totals, and reminder message when --validate not set.
Audit command validation tests
cmd/cluster/pullsecretstatus_test.go (lines 1–86)
Table-driven PreRunE tests covering required-argument enforcement, mutual exclusivity, invalid-character rejection, and valid identifier acceptance.
Command registration and deprecations
cmd/cluster/cmd.go, cmd/cluster/pullsecret.go, cmd/cluster/validatepullsecretext.go, cmd/cluster/validatepullsecret.go, go.mod
Registers newCmdPullSecret root with audit/update/validate subcommands; wires newCmdReplacePullSecretDeprecated alias; registers newCmdValidatePullSecretExtDeprecated with redirect message; adds stderr tips to validate-pull-secret; promotes go.uber.org/zap to direct dependency.
Documentation
docs/README.md, docs/osdctl_cluster.md, docs/osdctl_cluster_pull-secret.md, docs/osdctl_cluster_pull-secret_audit.md, docs/osdctl_cluster_pull-secret_update.md, docs/osdctl_cluster_pull-secret_validate.md
Adds documentation pages for pull-secret root and audit/update/validate subcommands with synopsis, examples, flag descriptions; removes validate-pull-secret-ext documentation; updates osdctl cluster SEE ALSO.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant UpdateCmd as pull-secret update
  participant OCM
  participant HiveAPI as Hive/Infra Cluster
  participant TargetCluster as Target Cluster

  User->>UpdateCmd: --cluster-id --reason [--dry-run/--force]
  UpdateCmd->>UpdateCmd: validate flags, resolve cluster/owner
  UpdateCmd->>OCM: FetchAccessTokenOp
  UpdateCmd->>HiveAPI: CheckCanI (RBAC pre-flight)
  UpdateCmd->>TargetCluster: CheckCanI (RBAC pre-flight)
  UpdateCmd->>TargetCluster: PreflightCheck (openshift-config/pull-secret)
  
  alt live mode
    UpdateCmd->>HiveAPI: read Hive pull-secret
    UpdateCmd->>TargetCluster: read target pull-secret
    UpdateCmd->>UpdateCmd: CompareThreeWay, render results
    UpdateCmd->>User: prompt if differences detected
  end

  alt Classic cluster
    UpdateCmd->>HiveAPI: UpdateHivePullSecretSSS
    HiveAPI-->>UpdateCmd: SyncSet created, polled
    UpdateCmd->>TargetCluster: RestartPodsBySelector
  else HCP cluster
    UpdateCmd->>HiveAPI: UpdateHCPPullSecretViaManifestWork
  end

  UpdateCmd->>TargetCluster: post-update verification
  UpdateCmd->>OCM: PostCmdOptions.SetDryRun().Run() (service log)
  UpdateCmd-->>User: result summary
Loading
sequenceDiagram
  participant User
  participant AuditCmd as pull-secret audit
  participant OCM
  participant TargetCluster as each cluster

  User->>AuditCmd: --account-id [--cluster-id] --reason [--validate]
  AuditCmd->>OCM: resolve account
  AuditCmd->>OCM: GetLatestCredentialUpdate
  AuditCmd->>OCM: ListOwnerSubscriptions
  
  alt --validate enabled
    AuditCmd->>OCM: FetchOwnerAccessToken
    AuditCmd->>User: prompt if token unavailable (registry-only fallback)
  end

  loop each owned cluster
    AuditCmd->>TargetCluster: connect with elevation reason
    AuditCmd->>TargetCluster: CompareAccessTokenAuthsToCluster
    AuditCmd->>TargetCluster: CompareRegistryCredentialAuthsToCluster
  end

  AuditCmd-->>User: account header, per-cluster staleness, verify tables, summary counts
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
No-Sensitive-Data-In-Logs ❌ Error PII (email addresses) exposed in logs: pkg/controller/pullsecret.go lines 212, 181, 309 log user email addresses to stdout; cmd/cluster/pullsecretstatus.go line 286 logs account email in audit output. Remove email addresses from logged output. Use masked values (e.g., "user@*.com" or just registry count) instead of actual email addresses in verification results and audit output.
Docstring Coverage ⚠️ Warning Docstring coverage is 40.85% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main addition of three new subcommands (update, audit, validate) under the pull-secret command group.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names in the PR use standard Go testing.T and are static, descriptive strings with no dynamic values. The PR adds 30 test functions across 2 files (pkg/controller/pullsecret_test.go and cm...
Test Structure And Quality ✅ Passed Tests follow quality requirements: 29 unit tests with single responsibility (ValidateRequiredAuths, CompareThreeWay, MergePullSecretAuths, etc.), no cluster resources, meaningful assertion messages...
Microshift Test Compatibility ✅ Passed This PR adds only standard Go unit tests (using testing.T), not Ginkgo e2e tests. The new test files are pkg/controller/pullsecret_test.go and cmd/cluster/pullsecretstatus_test.go, both using Go's...
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR does not add Ginkgo e2e tests. All new tests are standard Go unit tests using the testing package (29 tests in pullsecret_test.go, 1 in pullsecretstatus_test.go). No SNO compatibility check needed.
Topology-Aware Scheduling Compatibility ✅ Passed PR adds CLI tool for cluster pull-secret management with no deployment manifests, workload definitions, or scheduling constraints that would break non-HA topologies.
Ote Binary Stdout Contract ✅ Passed PR introduces no OTE binary stdout contract violations. All output uses IOStreams (via o.Out), os.Stderr, or io.Writer parameters. No main(), init(), or TestMain() functions. Logger correctly confi...
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR adds standard Go unit tests (not Ginkgo e2e tests). The check for IPv6/disconnected network compatibility applies only to Ginkgo e2e tests, which this PR does not contain.
No-Weak-Crypto ✅ Passed No weak cryptography (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB) found. String equality comparisons are used only for verification/reporting output, not security gates. Standard encoding/base64 used...
Container-Privileges ✅ Passed This PR contains only Go source code (.go) and documentation (.md) files. No Kubernetes manifests, container specifications, or Dockerfiles were added or modified. The pull-secret controller code h...
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from Makdaam and rolandmkunkel June 15, 2026 23:06
@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nephomaniac
Once this PR has been reviewed and has the lgtm label, please assign nikokolas3270 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nephomaniac

Copy link
Copy Markdown
Contributor Author

SOP update PR: https://github.com/openshift/ops-sop/pull/4035 (on hold until this PR merges)

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 19

🧹 Nitpick comments (1)
pkg/controller/pullsecret.go (1)

386-420: ⚡ Quick win

Sort comparison rows before rendering or asserting sync status.

CompareThreeWay appends results from Go map iteration, so the table order changes between runs. Sort result.Auths by Registry before returning to keep CLI output and expect tests deterministic.

Proposed fix
+	sort.Slice(result.Auths, func(i, j int) bool {
+		return result.Auths[i].Registry < result.Auths[j].Registry
+	})
+
 	return result, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/pullsecret.go` around lines 386 - 420, The result slice
`result.Auths` is populated by iterating over the `ocmAuths` map, which has
randomized iteration order in Go, causing non-deterministic output order across
runs. After the for loop completes and all `ThreeWayAuthState` entries have been
appended to `result.Auths`, sort the slice by the `Registry` field before
returning to ensure deterministic order in CLI output and test assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/cluster/pullsecretstatus.go`:
- Around line 247-263: The code logs verification failures for
CompareAccessTokenAuthsToCluster and CompareRegistryCredentialAuthsToCluster but
does not set cr.err when these errors occur, causing the final audit report to
omit the failure details. Set cr.err when verifyErr is not nil in both the
access token verification block (around line 247-255) and the registry
credential verification block (around line 291-300), ensuring that cluster
verification failures are properly surfaced in the audit report instead of
silently continuing with only a warning log.
- Around line 291-303: The code currently skips calling renderPSStatus when
validation succeeds (in the else if hasCheck branch where cr.err is nil), which
causes stale status information to be omitted from the output and prevents
staleCount from being properly tracked. Refactor the conditional logic so that
renderPSStatus is called unconditionally for every cluster before checking
validation results. Move the renderPSStatus call outside the conditional
branches so it executes before the hasCheck condition is evaluated, then proceed
with rendering the validation check tables (renderCheckTable calls for
cr.accessTokenResult and cr.regCredResult) only when validation succeeded.
- Around line 87-91: The PreRunE validation function only checks that at least
one of --cluster-id or --account-id is provided, but does not enforce that they
are mutually exclusive. Replace the current validation logic to use
cmd.Flags().Changed() to detect which flags were explicitly set by the user, and
return an error if both flags are provided simultaneously or if neither flag is
provided. This ensures that exactly one flag is required and prevents the silent
fallback behavior where --account-id is used when both flags are set.

In `@cmd/cluster/replacepullsecret.go`:
- Around line 698-725: The code properly tracks verification results in
`atAllMatch` and `rcAllMatch` boolean flags—they remain false when verification
fails or mismatches are found. However, the `op.PullSecretUpdated` result branch
(around lines 810-815) unconditionally prints "All ACCESS TOKEN and REGISTRY
CREDENTIAL auths now match" without checking these flags, causing false success
reports even when errors or mismatches occurred. Fix this by wrapping the
success message print statement so it only executes when both `atAllMatch` and
`rcAllMatch` are true. This ensures verification errors or mismatches prevent
the success message from being printed.
- Around line 603-615: The code prompts the user to choose whether to use the
target cluster pull secret as the base for Hive secret restoration via
ResolveExistingPullSecret and ConfirmPrompt, but the user's choice and
existingData are discarded without being applied to the actual Hive update.
Store the result of the ConfirmPrompt call (whether the user selected YES or NO)
and use it to determine whether to merge the existingData with the OCM pull
secret or use OCM auths only when performing the Hive secret update operation.
This ensures the selected restoration base is actually plumbed through and
applied to prevent dropping customer/operator-added auths.
- Around line 508-510: When GetRegistryCredentials fails or returns no
credentials in the else block (lines 508-510), the code only logs a warning but
does not update the allInSync flag. This causes the command to incorrectly
report "nothing to update" if allInSync was previously set to true from other
comparisons. Fix this by explicitly setting allInSync to false whenever registry
credential comparison cannot be completed, ensuring that incomplete verification
is not treated as being in sync.
- Line 293: The error returned from the FetchAccessTokenOp function call is
being ignored with the underscore placeholder, but pullSecret is subsequently
used for Hive/ManifestWork updates. Capture the error return value instead of
ignoring it, check if it is not nil, and either return the error or mark the
operation as failed before proceeding. This ensures pullSecret is only used when
the FetchAccessTokenOp operation actually succeeds, rather than relying on
implicit side effects.
- Around line 776-786: The PostCmdOptions.Run() method in servicelog can prompt
for user confirmation and return nil even when the user declines, causing the
code to incorrectly report success. To fix this, add a non-interactive flag
(such as SkipPrompts or similar) to PostCmdOptions in cmd/servicelog/post.go
that prevents any interactive prompts, then set this flag to true when creating
the postCmd in replacepullsecret.go so the internal service log posting fails
explicitly rather than silently declining. Additionally, correct the fallback
command string to use the cluster ID flag as "-C" instead of "-i", and ensure
that any duplicate-service-log prompt in the Post command also respects the
skip-prompts flag.

In `@docs/osdctl_cluster_pull-secret_validate.md`:
- Around line 25-34: The documentation examples are using the deprecated command
name `osdctl cluster validate-pull-secret-ext` instead of the new command name.
Replace all three occurrences of `osdctl cluster validate-pull-secret-ext` with
`osdctl cluster pull-secret validate` in the examples section, keeping all the
command-line flags and arguments (--cluster-id, --reason, --skip-access-token,
--skip-registry-creds, --skip-service-logs) unchanged.

In `@pkg/controller/pullsecret_test.go`:
- Around line 26-64: The tests are using a helper function validateRequiredKeys
that duplicates the logic of the production function ValidateRequiredAuths,
which means the tests would not catch if ValidateRequiredAuths broke. Remove the
validateRequiredKeys helper function and refactor all three test functions
(TestValidateRequiredAuths_AllPresent, TestValidateRequiredAuths_SomeMissing,
and TestValidateRequiredAuths_Empty) to call ValidateRequiredAuths directly
instead. Build the appropriate input type (map[string]*amv1.AccessTokenAuth)
that ValidateRequiredAuths expects, populating it with the necessary test data,
and verify the returned missing keys match the expected values.

In `@pkg/controller/pullsecret.go`:
- Around line 732-738: The BuildPullSecretFromSources function does not accept
an account email parameter, and registry-credential entries are created with
empty Email fields, causing email mismatches during later verification. Add an
account email parameter to the BuildPullSecretFromSources function signature and
pass this account email to populate the Email field when handling registry
credentials throughout the function, including where registry credentials are
processed and added to the pull secret.
- Around line 879-891: The error handling at
pkg/controller/pullsecret.go#L879-L891 treats all Get() errors the same way
(attempting secret creation), but this conflates NotFound with
permission/timeout/API errors. Use apierrors.IsNotFound(err) to distinguish:
only create the secret if the error is NotFound; return the original error for
other failures like RBAC denials. Apply the same pattern at
pkg/controller/pullsecret.go#L550-L554 to report read failures separately from
genuinely missing secrets. At pkg/controller/pullsecretop.go#L160-L163, check
apierrors.IsNotFound(err) before logging "not found" to avoid misreporting RBAC
or transient errors. At pkg/controller/pullsecretop.go#L206-L225, only fall back
to alternative secret sources (Hive to target, or target to OCM-only) when
apierrors.IsNotFound(err) is true; for other errors, return and fail rather than
silently proceeding.
- Around line 858-866: The code attempts to assign values into existing.Auths
without checking if the map is nil, which will panic when valid JSON like {} or
{"auths":null} is unmarshaled. Before the for loop that ranges over
incoming.Auths and assigns values to existing.Auths, add a check to initialize
existing.Auths if it is nil. Ensure that both existing.Auths and incoming.Auths
are non-nil maps before attempting to merge them in the loop that assigns
incoming.Auths entries into existing.Auths.
- Around line 1112-1135: The HostedCluster.spec.pullSecret.name is being
rewritten unconditionally without first verifying that the Secret it references
actually exists and was updated. Refactor the logic to use two passes: first,
before the main loop, extract the current pullSecret.name from the HostedCluster
manifest; then, within the loop where Secret manifests are processed, only
update the Secret that matches the actual pullSecret.name from the HostedCluster
(not just any Secret matching secretNamePrefix); if the referenced Secret is not
found after scanning all manifests, return an error before updating the
HostedCluster reference; only after successfully finding and updating the exact
referenced Secret should you update the HostedCluster.Spec.PullSecret.Name to
newSecretName.
- Around line 608-613: The ListOwnerSubscriptions function currently only
fetches a single page of 100 results without implementing pagination, which
means accounts with more than 100 subscriptions will silently return incomplete
data. Modify the function to loop through all available pages by implementing
pagination using the .Page() method, continuing to fetch results until
response.Size() returns fewer items than the page size (100) or until all
response.Total() items have been collected. Follow the same pagination pattern
already implemented elsewhere in the codebase (e.g., pkg/utils/ocm.go lines
154–159) to accumulate subscriptions from each page into the result slice before
returning.
- Around line 743-760: When parsing the existingSecret using json.Unmarshal in
this code block, the current logic silently ignores unmarshal errors with the
condition err == nil, causing malformed secrets to be dropped entirely rather
than preserved. Change this behavior to return an error when json.Unmarshal
fails instead of silently skipping the merge operation, ensuring that malformed
existing secrets are properly handled and do not result in loss of
customer-added registries. Replace the err == nil condition to detect and return
the error when the existing secret cannot be parsed.
- Around line 985-1036: The SyncSet created by kubeCli.Create is not cleaned up
if errors occur in subsequent operations (like the kubeCli.Get call, the sync
polling loop, or timeout). Register a deferred cleanup operation immediately
after the successful kubeCli.Create call that deletes the syncSet, ensuring it
is removed even if later error paths return early. Then remove or refactor the
delete operation at the end of the function (the kubeCli.Delete call around line
1030) so it is not duplicated. This ensures the created SyncSet is always
cleaned up regardless of which error path is taken, preventing it from blocking
future runs.
- Around line 595-599: The accountID parameter is directly interpolated into OCM
search filter strings via fmt.Sprintf without validation, creating an injection
vulnerability when the value originates from CLI input. Add allow-list
validation of the accountID format (e.g., UUID pattern matching the expected OCM
account ID format) at the trust boundary before any fmt.Sprintf calls. This
validation must be applied at three locations: the search filter construction in
CountOwnerClusters (pkg/controller/pullsecret.go:595), the search filter
construction in ListOwnerSubscriptions (pkg/controller/pullsecret.go:609), and
the search filter construction in GetRegistryCredentials
(pkg/utils/utils.go:297). Consider centralizing the validation in a single
helper function and calling it before each fmt.Sprintf that uses accountID, or
validate it at the entry point where the CLI value is received.

In `@pkg/controller/pullsecretop.go`:
- Around line 131-150: The CheckCanI function only sets op.AllOK = false when
op.DryRun is true, but live operations rely on op.AllOK to gate mutations. Move
the op.AllOK = false assignments outside of the if op.DryRun conditional blocks
so they apply in both dry-run and live modes. Specifically, after the error
check from the SelfSubjectAccessReview creation, set op.AllOK = false
unconditionally when err is not nil (before the dry-run output), and when the
access is denied (when !allowed is true), set op.AllOK = false unconditionally
before or after the dry-run formatting. The output formatting can remain
dry-run-specific; only the flag setting should apply universally.

---

Nitpick comments:
In `@pkg/controller/pullsecret.go`:
- Around line 386-420: The result slice `result.Auths` is populated by iterating
over the `ocmAuths` map, which has randomized iteration order in Go, causing
non-deterministic output order across runs. After the for loop completes and all
`ThreeWayAuthState` entries have been appended to `result.Auths`, sort the slice
by the `Registry` field before returning to ensure deterministic order in CLI
output and test assertions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 665be379-01d8-4645-9398-edf9c17e87d5

📥 Commits

Reviewing files that changed from the base of the PR and between 2cc345e and 67b45f6.

📒 Files selected for processing (17)
  • cmd/cluster/cmd.go
  • cmd/cluster/pullsecret.go
  • cmd/cluster/pullsecretstatus.go
  • cmd/cluster/replacepullsecret.go
  • cmd/cluster/validatepullsecret.go
  • cmd/cluster/validatepullsecretext.go
  • cmd/servicelog/post.go
  • docs/README.md
  • docs/osdctl_cluster.md
  • docs/osdctl_cluster_pull-secret.md
  • docs/osdctl_cluster_pull-secret_audit.md
  • docs/osdctl_cluster_pull-secret_update.md
  • docs/osdctl_cluster_pull-secret_validate.md
  • go.mod
  • pkg/controller/pullsecret.go
  • pkg/controller/pullsecret_test.go
  • pkg/controller/pullsecretop.go

Comment thread cmd/cluster/pullsecretstatus.go
Comment thread cmd/cluster/pullsecretstatus.go
Comment thread cmd/cluster/pullsecretstatus.go
Comment thread cmd/cluster/replacepullsecret.go Outdated
Comment thread cmd/cluster/replacepullsecret.go
Comment thread pkg/controller/pullsecret.go
Comment thread pkg/controller/pullsecret.go
Comment thread pkg/controller/pullsecret.go Outdated
Comment thread pkg/controller/pullsecret.go Outdated
Comment thread pkg/controller/pullsecretop.go

@clcollins clcollins left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost entirely claude-identified issues, but nothing major, if you don't mind taking a look.

Comment thread pkg/controller/pullsecret.go Outdated
return fmt.Errorf("cannot update pull-secret within ManifestWork: %w", err)
}

fmt.Fprintf(out, "ManifestWork updated. Waiting 60 seconds for secret to sync on hosted cluster...\n")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this just go check/watch rather than just wait 60s? It might get synced sooner and save folks some time just waiting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Fixed — replaced 60s sleep with a poll loop that checks ManifestWork Applied condition. Prompts user every 60s to continue waiting or abort.

Comment thread pkg/controller/pullsecret.go Outdated
//
// If existingSecret is non-nil, registries already present in the cluster that
// are not in either OCM source are preserved.
func BuildPullSecretFromSources(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude thinks this is dead code?

It is defined but never called from any command. The update path only uses FetchAccessTokenOp (access token auths only). This is consistent with transferowner.go which lso only writes access token auths, so it's not a bug — but it's dead code that should either be used or removed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Fixed — BuildPullSecretFromSources and associated types removed.

Comment thread pkg/controller/pullsecret.go Outdated
if err := json.Unmarshal(manifest.Raw, secret); err != nil {
return err
}
if strings.Contains(secret.Name, secretNamePrefix) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says it's a prefix but strings.Contains only matches the existence in the string. Should use strings.HasPrefix?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Fixed — changed to strings.HasPrefix.

Comment thread pkg/controller/pullsecret.go
Comment thread pkg/controller/pullsecret.go Outdated
time.Sleep(syncPollInterval)
}
if !isSynced {
return fmt.Errorf("SyncSet %s/%s failed to sync after %d attempts", hiveNamespace, SyncSetName, checkSyncMaxAttempts)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude says the SyncSet crated a line 985 is never cleaned up if there's a timeout - the delete only executes in the if isSynced, so this would leave a stale SyncSet.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Fixed — SyncSet cleanup now runs unconditionally before the isSynced check.

Comment thread cmd/cluster/replacepullsecret.go Outdated
op.CheckCanI(ctx, masterKubeClientSet, infraName, "update", "secrets", "", resolvedHiveNS)
op.CheckCanI(ctx, masterKubeClientSet, infraName, "create", "secrets", "", resolvedHiveNS)

op.Would("create SyncSet %s/pull-secret-replacement to sync to %s", resolvedHiveNS, cluster.Name())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude again: The syncset name here is the old one, and should be pull-secret-update not pull-secret-replacement.

And bonus points if it's not hard-coded, for future proofing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Fixed — uses controller.SyncSetName constant.

@nephomaniac nephomaniac force-pushed the ROSAENG-2057-pull-secret-update branch 2 times, most recently from 2af9003 to d0d96e8 Compare June 16, 2026 01:25

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
docs/osdctl_cluster_pull-secret_validate.md (1)

25-34: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Examples use the deprecated command name instead of the new command.

Lines 27, 30, and 33 show osdctl cluster validate-pull-secret-ext, but since this documentation page describes the new osdctl cluster pull-secret validate command, the examples should use the new command name instead.

🔧 Proposed fix for the examples
  # Compare OCM Access-Token, OCM Registry-Credentials, and OCM Account Email against cluster's pull secret
-  osdctl cluster validate-pull-secret-ext --cluster-id ${CLUSTER_ID} --reason "OSD-XYZ"
+  osdctl cluster pull-secret validate --cluster-id ${CLUSTER_ID} --reason "OSD-XYZ"

  # Exclude Access-Token, and Registry-Credential checks...
-  osdctl cluster validate-pull-secret-ext --cluster-id ${CLUSTER_ID} --reason "OSD-XYZ" --skip-access-token --skip-registry-creds
+  osdctl cluster pull-secret validate --cluster-id ${CLUSTER_ID} --reason "OSD-XYZ" --skip-access-token --skip-registry-creds

  # Skip sending service logs (useful for testing)
-  osdctl cluster validate-pull-secret-ext --cluster-id ${CLUSTER_ID} --reason "OSD-XYZ" --skip-service-logs
+  osdctl cluster pull-secret validate --cluster-id ${CLUSTER_ID} --reason "OSD-XYZ" --skip-service-logs
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/osdctl_cluster_pull-secret_validate.md` around lines 25 - 34, The
documentation examples use the deprecated command name osdctl cluster
validate-pull-secret-ext instead of the new command osdctl cluster pull-secret
validate. In all three examples shown (the basic example with cluster-id and
reason, the example with skip-access-token and skip-registry-creds flags, and
the example with skip-service-logs flag), replace the deprecated command name
osdctl cluster validate-pull-secret-ext with the new command name osdctl cluster
pull-secret validate while keeping all the flags and options unchanged.
🧹 Nitpick comments (1)
pkg/controller/pullsecret.go (1)

426-428: 💤 Low value

Duplicate docstring line.

Line 427 repeats the docstring from line 426.

Suggested fix
 // RenderThreeWayComparison prints the three-way comparison in a readable format.
-// RenderThreeWayComparison prints the three-way comparison in a readable format.
 // sourceLabel identifies the OCM source (e.g. "ACCESS TOKEN AUTHS", "REGISTRY CREDENTIAL AUTHS").
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/pullsecret.go` around lines 426 - 428, The docstring for the
RenderThreeWayComparison function contains a duplicate comment line. Remove the
second occurrence of the "RenderThreeWayComparison prints the three-way
comparison in a readable format." comment line, keeping only one instance of
this description before the sourceLabel documentation line.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@docs/osdctl_cluster_pull-secret_validate.md`:
- Around line 25-34: The documentation examples use the deprecated command name
osdctl cluster validate-pull-secret-ext instead of the new command osdctl
cluster pull-secret validate. In all three examples shown (the basic example
with cluster-id and reason, the example with skip-access-token and
skip-registry-creds flags, and the example with skip-service-logs flag), replace
the deprecated command name osdctl cluster validate-pull-secret-ext with the new
command name osdctl cluster pull-secret validate while keeping all the flags and
options unchanged.

---

Nitpick comments:
In `@pkg/controller/pullsecret.go`:
- Around line 426-428: The docstring for the RenderThreeWayComparison function
contains a duplicate comment line. Remove the second occurrence of the
"RenderThreeWayComparison prints the three-way comparison in a readable format."
comment line, keeping only one instance of this description before the
sourceLabel documentation line.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4af9a563-d3e6-45ed-81fd-02764838d0e3

📥 Commits

Reviewing files that changed from the base of the PR and between 67b45f6 and 2af9003.

📒 Files selected for processing (17)
  • cmd/cluster/cmd.go
  • cmd/cluster/pullsecret.go
  • cmd/cluster/pullsecretstatus.go
  • cmd/cluster/replacepullsecret.go
  • cmd/cluster/validatepullsecret.go
  • cmd/cluster/validatepullsecretext.go
  • cmd/servicelog/post.go
  • docs/README.md
  • docs/osdctl_cluster.md
  • docs/osdctl_cluster_pull-secret.md
  • docs/osdctl_cluster_pull-secret_audit.md
  • docs/osdctl_cluster_pull-secret_update.md
  • docs/osdctl_cluster_pull-secret_validate.md
  • go.mod
  • pkg/controller/pullsecret.go
  • pkg/controller/pullsecret_test.go
  • pkg/controller/pullsecretop.go
✅ Files skipped from review due to trivial changes (2)
  • cmd/cluster/validatepullsecret.go
  • docs/osdctl_cluster.md
🚧 Files skipped from review as they are similar to previous changes (9)
  • cmd/servicelog/post.go
  • cmd/cluster/validatepullsecretext.go
  • cmd/cluster/cmd.go
  • pkg/controller/pullsecret_test.go
  • pkg/controller/pullsecretop.go
  • go.mod
  • cmd/cluster/pullsecretstatus.go
  • cmd/cluster/pullsecret.go
  • cmd/cluster/replacepullsecret.go

@nephomaniac nephomaniac force-pushed the ROSAENG-2057-pull-secret-update branch 8 times, most recently from 9075e2f to f9ca732 Compare June 16, 2026 04:02

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (4)
pkg/controller/pullsecret.go (1)

897-900: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SyncSet not cleaned up if ClusterSync.Get fails.

The return at line 899 exits before reaching the cleanup at line 922, leaving an orphaned pull-secret-update SyncSet that can block future runs and continue applying unintended state. Use defer immediately after successful creation to ensure cleanup on all exit paths.

Proposed fix
 if err := kubeCli.Create(ctx, syncSet); err != nil {
     return fmt.Errorf("failed to create SyncSet: %w", err)
 }
+defer func() {
+    if delErr := kubeCli.Delete(ctx, syncSet); delErr != nil {
+        fmt.Fprintf(out, "\n%s failed to delete SyncSet %s/%s: %v\n", psColorWarn("[WARN]"), hiveNamespace, SyncSetName, delErr)
+    }
+}()
 // Use the server-side creation timestamp for comparison...
 syncSetCreatedAt := syncSet.CreationTimestamp.Time

Then remove the manual delete at lines 921-924 since defer handles it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/pullsecret.go` around lines 897 - 900, The current code
returns early from the kubeCli.Get call at line 899 before reaching the cleanup
code at lines 921-924, which leaves an orphaned pull-secret-update SyncSet that
blocks future runs. Move the SyncSet cleanup logic (the delete operations
currently at lines 921-924) into a defer statement that is placed immediately
after the pull-secret-update SyncSet is successfully created, ensuring cleanup
happens on all exit paths including error returns. Then remove the manual delete
code at lines 921-924 since the defer will now handle it.
cmd/cluster/replacepullsecret.go (1)

796-801: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Incorrect fallback command syntax for manual service log posting.

The fallback command uses -i %s but -i is a boolean flag for internal-only mode. The cluster ID should be passed with -C:

-			fmt.Fprintf(out, "To send manually: osdctl servicelog post -i %s -p MESSAGE=\"Pull secret replaced for cluster owner %q.\"\n", o.clusterID, ownerUsername)
+			fmt.Fprintf(out, "To send manually: osdctl servicelog post -C %s -i -p MESSAGE=\"Pull secret replaced for cluster owner %q. Reason: %s\"\n", o.clusterID, ownerUsername, o.reason)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/cluster/replacepullsecret.go` around lines 796 - 801, The fallback
command string in the fmt.Fprintf call within the error handling block (when
postCmd.Run() fails) incorrectly uses the `-i` flag followed by a cluster ID
format specifier. The `-i` flag is a boolean flag for internal-only mode, not
for passing the cluster ID. Replace the `-i %s` flag with `-C %s` in the
fallback command string to correctly pass the cluster ID argument to the osdctl
servicelog post command.
pkg/controller/pullsecretop.go (1)

113-119: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

CheckCanI failure paths can bypass operation gating.

When clientset is nil or SAR creation errors (Line 113–119, Line 132–138), the method returns false but does not mark op.AllOK false or append a failure. Live flows that gate on AllOK can continue farther than intended.

Suggested fix
 	if clientset == nil {
+		op.AllOK = false
+		op.Failures = append(op.Failures, fmt.Sprintf("auth can-i %s %s in %s failed: client not available", verb, resource, nsLabel(namespace)))
 		if op.DryRun {
 			fmt.Fprintf(op.Out, "%s %s %s: auth can-i %s %s in %s (client not available)\n",
 				opColorDryRun("[Dry Run]"), opColorWarn("[SKIP]"), systemLabel, verb, resource, nsLabel(namespace))
 		}
 		return false
 	}
@@
 	result, err := clientset.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, review, metav1.CreateOptions{})
 	if err != nil {
+		op.AllOK = false
+		op.Failures = append(op.Failures, fmt.Sprintf("auth can-i %s %s in %s failed: %v", verb, resource, nsLabel(namespace), err))
 		if op.DryRun {
 			fmt.Fprintf(op.Out, "%s %s %s: auth can-i %s %s in %s (%v)\n",
 				opColorDryRun("[Dry Run]"), opColorWarn("[SKIP]"), systemLabel, verb, resource, nsLabel(namespace), err)
 		}
 		return false
 	}

Also applies to: 132-138

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/pullsecretop.go` around lines 113 - 119, The CheckCanI method
has two failure paths that bypass operation gating by returning false without
updating the operation state. When clientset is nil (lines 113-119) and when SAR
creation errors occur (lines 132-138), the method should mark op.AllOK as false
and append a failure record in addition to returning false. This ensures that
operations gating on the AllOK flag will not proceed farther than intended when
these failure conditions occur.
cmd/cluster/pullsecretstatus.go (1)

88-94: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use explicit Cobra flag-state checks for mutual exclusivity.

Line 89–94 validates using resolved option values; that can mis-detect explicit user input when values are pre-populated upstream. Use cmd.Flags().Changed(...) and enforce exactly one flag explicitly set.

Suggested fix
 		PreRunE: func(cmd *cobra.Command, args []string) error {
-			if ops.clusterID == "" && ops.accountID == "" {
-				return fmt.Errorf("one of --cluster-id or --account-id is required")
-			}
-			if ops.clusterID != "" && ops.accountID != "" {
-				return fmt.Errorf("--cluster-id and --account-id are mutually exclusive")
+			clusterSet := cmd.Flags().Changed("cluster-id")
+			accountSet := cmd.Flags().Changed("account-id")
+			if clusterSet == accountSet {
+				return fmt.Errorf("exactly one of --cluster-id or --account-id is required")
 			}

Based on learnings, explicit flag usage in cmd/**/*.go should be determined with Cobra’s Flags().Changed(...), not derived values.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/cluster/pullsecretstatus.go` around lines 88 - 94, The PreRunE validation
function is checking resolved option values (ops.clusterID and ops.accountID) to
enforce mutual exclusivity, but this approach can misdetect explicit user input
when values are pre-populated upstream. Replace these value-based checks with
Cobra's Flags().Changed() method to explicitly verify which flags were actually
set by the user. Specifically, use cmd.Flags().Changed("cluster-id") and
cmd.Flags().Changed("account-id") to ensure exactly one of these two flags was
explicitly provided, rather than relying on whether the resulting ops fields are
empty or populated.

Source: Learnings

🧹 Nitpick comments (2)
cmd/cluster/pullsecretstatus_test.go (1)

10-13: ⚡ Quick win

Use a fresh command per subtest to avoid cross-case flag state leakage.

cmd is shared across all table cases, so Cobra flag state can carry between runs. This weakens coverage for explicit-flag behavior and can mask/introduce false results if validation shifts to Flags().Changed(...) (the safer pattern for inherited/autopopulated values). Create newCmdPullSecretAudit(...) inside each t.Run and set only flags intended for that case.

Based on learnings: when implementing Cobra command logic, explicit flag usage should be detected via Cobra’s Changed(...) API instead of resolved values that may be auto-populated.

Also applies to: 58-68

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/cluster/pullsecretstatus_test.go` around lines 10 - 13, The `cmd`
variable is created once outside the test cases and reused across multiple
`t.Run` subtests, allowing Cobra flag state to leak between test runs and
weakening test coverage. Move the `newCmdPullSecretAudit(streams, nil)` call
inside each `t.Run` block so that every subtest receives a fresh command
instance with clean flag state. Apply this same fix to all affected test cases
(also noted at lines 58-68), creating a new command for each subtest rather than
sharing a single instance across all table cases.

Source: Learnings

cmd/cluster/replacepullsecret.go (1)

835-847: 💤 Low value

Redundant YES confirmation prompt when using --force on no-op scenario.

When --force is used and the pull secret is already up-to-date, the user is prompted to type YES twice:

  1. First at lines 769-773 (no-op check before service log)
  2. Again at lines 843-847 (result section)

This creates a confusing UX. Since the first prompt already bypassed the no-op check with --force, the second prompt is unnecessary.

Proposed fix
 		} else if op.PullSecretUpToDate && !o.dryrun {
 			noopLabel := color.New(color.FgBlue, color.Bold).SprintFunc()
 			fmt.Fprintf(out, "%s%s All %s and %s auths match — nothing to update.\n",
 				prefix, colorOK("[OK]"), noopLabel("ACCESS TOKEN"), noopLabel("REGISTRY CREDENTIAL"))
-			if !o.force {
-				return nil
-			}
-			op.Warn("--force specified — proceeding with update despite no changes needed")
-			fmt.Fprintf(out, "Type YES to confirm: ")
-			var response string
-			if _, scanErr := fmt.Scanln(&response); scanErr != nil || response != "YES" {
-				return fmt.Errorf("operation aborted by user")
-			}
 		} else if op.AuthDiffCount > 0 && o.dryrun {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/cluster/replacepullsecret.go` around lines 835 - 847, The code is
prompting the user to type YES twice when using --force with an up-to-date pull
secret: once at the initial no-op check (lines 769-773) and again in the result
section (lines 843-847). Since the user has already confirmed with --force at
the first location, the second confirmation prompt in the else-if block where
op.PullSecretUpToDate is true should be removed to avoid confusing the user with
a redundant prompt.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/cluster/pullsecretstatus.go`:
- Around line 291-325: The summary line reports the validated count using
len(checkResults), which includes all attempted validations even those that
failed with errors or incomplete checks. Instead, track the actual count of
successful validations by adding a validatedCount variable (initialized to 0
alongside staleCount) and incrementing it only when hasCheck is true and cr.err
is nil. Then update the final fprintf output to use validatedCount instead of
len(checkResults) to accurately report only the clusters that passed validation.
- Line 114: The MarkFlagRequired call on the cmd object is discarding its error
return value using a blank identifier, which violates Go's error-handling
requirements. Replace the blank identifier pattern with proper error handling by
checking if the error is not nil and either logging the error or returning it
appropriately to ensure that any failure to mark the "reason" flag as required
is properly handled rather than silently ignored.

In `@pkg/controller/pullsecret_test.go`:
- Around line 20-30: Stop ignoring errors from fixture-construction calls in the
test file. In the function containing json.Marshal (around line 21) and the
buildAuthMap function containing Build (around line 29), capture the error
returns instead of discarding them with blank identifiers. For json.Marshal,
assign the error to a variable and check it with a test assertion like
require.NoError or testing.Fatal. Similarly, for the Build() method call,
capture and validate the error return. Apply the same fix pattern throughout the
file wherever json.Marshal and Build errors are currently ignored.

In `@pkg/controller/pullsecret.go`:
- Around line 1030-1040: The code retrieves oldPullSecret from secret.Data at
the ".dockerconfigjson" key without checking if it exists; when the key is
missing, oldPullSecret is nil and causes MergePullSecretAuths to fail when
trying to parse it as JSON. Before calling MergePullSecretAuths, check if the
".dockerconfigjson" key is present in secret.Data, and if it's missing or nil,
initialize oldPullSecret with a proper empty JSON auths structure (such as a
JSON object with an empty "auths" field) before passing it to
MergePullSecretAuths.
- Around line 562-566: The current code in the secret retrieval logic treats all
errors from the Secrets Get call as NotFound errors, which masks permission
denials and transient API failures. Instead of treating any error as "not
found", use apierrors.IsNotFound(err) to specifically check if the error is a
NotFound error. If the error is not a NotFound error (such as RBAC permission
denial or API transient failures), return the error to surface it to the
operator. Only proceed with the "secret not found" warning message when the
error is specifically a NotFound error.

In `@pkg/controller/pullsecretop.go`:
- Around line 162-165: The current error handling treats all errors from
clientset.CoreV1().Secrets(namespace).Get() as "not found" conditions, masking
actual Kubernetes API failures like permission or timeout errors. Replace the
generic error check with apierrors.IsNotFound(err) to distinguish actual
NotFound errors from other failures. When a NotFound error occurs, keep the
existing op.Warn() call; when a different error occurs, use op.Fail() instead to
properly report the actual API failure. Apply this pattern consistently at all
locations in the code where similar Get() calls are made on Kubernetes
resources, ensuring permission and network failures are logged distinctly from
legitimate missing resources.

---

Duplicate comments:
In `@cmd/cluster/pullsecretstatus.go`:
- Around line 88-94: The PreRunE validation function is checking resolved option
values (ops.clusterID and ops.accountID) to enforce mutual exclusivity, but this
approach can misdetect explicit user input when values are pre-populated
upstream. Replace these value-based checks with Cobra's Flags().Changed() method
to explicitly verify which flags were actually set by the user. Specifically,
use cmd.Flags().Changed("cluster-id") and cmd.Flags().Changed("account-id") to
ensure exactly one of these two flags was explicitly provided, rather than
relying on whether the resulting ops fields are empty or populated.

In `@cmd/cluster/replacepullsecret.go`:
- Around line 796-801: The fallback command string in the fmt.Fprintf call
within the error handling block (when postCmd.Run() fails) incorrectly uses the
`-i` flag followed by a cluster ID format specifier. The `-i` flag is a boolean
flag for internal-only mode, not for passing the cluster ID. Replace the `-i %s`
flag with `-C %s` in the fallback command string to correctly pass the cluster
ID argument to the osdctl servicelog post command.

In `@pkg/controller/pullsecret.go`:
- Around line 897-900: The current code returns early from the kubeCli.Get call
at line 899 before reaching the cleanup code at lines 921-924, which leaves an
orphaned pull-secret-update SyncSet that blocks future runs. Move the SyncSet
cleanup logic (the delete operations currently at lines 921-924) into a defer
statement that is placed immediately after the pull-secret-update SyncSet is
successfully created, ensuring cleanup happens on all exit paths including error
returns. Then remove the manual delete code at lines 921-924 since the defer
will now handle it.

In `@pkg/controller/pullsecretop.go`:
- Around line 113-119: The CheckCanI method has two failure paths that bypass
operation gating by returning false without updating the operation state. When
clientset is nil (lines 113-119) and when SAR creation errors occur (lines
132-138), the method should mark op.AllOK as false and append a failure record
in addition to returning false. This ensures that operations gating on the AllOK
flag will not proceed farther than intended when these failure conditions occur.

---

Nitpick comments:
In `@cmd/cluster/pullsecretstatus_test.go`:
- Around line 10-13: The `cmd` variable is created once outside the test cases
and reused across multiple `t.Run` subtests, allowing Cobra flag state to leak
between test runs and weakening test coverage. Move the
`newCmdPullSecretAudit(streams, nil)` call inside each `t.Run` block so that
every subtest receives a fresh command instance with clean flag state. Apply
this same fix to all affected test cases (also noted at lines 58-68), creating a
new command for each subtest rather than sharing a single instance across all
table cases.

In `@cmd/cluster/replacepullsecret.go`:
- Around line 835-847: The code is prompting the user to type YES twice when
using --force with an up-to-date pull secret: once at the initial no-op check
(lines 769-773) and again in the result section (lines 843-847). Since the user
has already confirmed with --force at the first location, the second
confirmation prompt in the else-if block where op.PullSecretUpToDate is true
should be removed to avoid confusing the user with a redundant prompt.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e2818a22-4925-41df-a161-08b810760fe8

📥 Commits

Reviewing files that changed from the base of the PR and between 652e329 and f9ca732.

📒 Files selected for processing (18)
  • cmd/cluster/cmd.go
  • cmd/cluster/pullsecret.go
  • cmd/cluster/pullsecretstatus.go
  • cmd/cluster/pullsecretstatus_test.go
  • cmd/cluster/replacepullsecret.go
  • cmd/cluster/validatepullsecret.go
  • cmd/cluster/validatepullsecretext.go
  • cmd/servicelog/post.go
  • docs/README.md
  • docs/osdctl_cluster.md
  • docs/osdctl_cluster_pull-secret.md
  • docs/osdctl_cluster_pull-secret_audit.md
  • docs/osdctl_cluster_pull-secret_update.md
  • docs/osdctl_cluster_pull-secret_validate.md
  • go.mod
  • pkg/controller/pullsecret.go
  • pkg/controller/pullsecret_test.go
  • pkg/controller/pullsecretop.go
✅ Files skipped from review due to trivial changes (1)
  • docs/osdctl_cluster.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • cmd/servicelog/post.go
  • go.mod
  • cmd/cluster/validatepullsecret.go
  • cmd/cluster/cmd.go

Comment thread cmd/cluster/pullsecretstatus.go Outdated
Comment thread cmd/cluster/pullsecretstatus.go
Comment thread pkg/controller/pullsecret_test.go Outdated
Comment thread pkg/controller/pullsecret.go
Comment thread pkg/controller/pullsecret.go
Comment thread pkg/controller/pullsecretop.go
@nephomaniac nephomaniac requested a review from clcollins June 16, 2026 04:56
@nephomaniac nephomaniac force-pushed the ROSAENG-2057-pull-secret-update branch from f9ca732 to 75fda6c Compare June 16, 2026 05:19

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
cmd/cluster/replacepullsecret.go (1)

801-801: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Malformed fallback servicelog command: -i is a boolean flag.

The fallback command passes o.clusterID as an argument to -i, but -i (--internal) is a boolean flag that doesn't accept arguments. The cluster ID should be passed with -C. This generates an invalid CLI invocation like osdctl servicelog post -i abc123 -p ....

Proposed fix
-			fmt.Fprintf(out, "To send manually: osdctl servicelog post -i %s -p MESSAGE=\"Pull secret replaced for cluster owner %q.\"\n", o.clusterID, ownerUsername)
+			fmt.Fprintf(out, "To send manually: osdctl servicelog post -C %s -i -p MESSAGE=\"Pull secret replaced for cluster owner '%s'. Reason: %s\"\n", o.clusterID, ownerUsername, o.reason)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/cluster/replacepullsecret.go` at line 801, The fmt.Fprintf statement
constructs an invalid osdctl servicelog command by using the boolean flag `-i`
to pass the cluster ID, but `-i` does not accept arguments. Replace `-i %s` with
`-C %s` in the format string to correctly pass o.clusterID to the servicelog
command, as `-C` is the proper flag for specifying the cluster ID.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/cluster/replacepullsecret.go`:
- Around line 838-850: The code contains a duplicate --force confirmation
prompt. When op.PullSecretUpToDate is true and the user confirms with YES at the
first prompt (around line 776), the function should return to prevent reaching
this second confirmation block. Remove or comment out the redundant confirmation
check in the else if branch (lines 846-850 in the diff showing the fmt.Fprintf
prompt and fmt.Scanln logic) since the user has already confirmed their intent
to proceed with --force at the earlier location. This ensures the user only
confirms once for a single --force override operation.

---

Duplicate comments:
In `@cmd/cluster/replacepullsecret.go`:
- Line 801: The fmt.Fprintf statement constructs an invalid osdctl servicelog
command by using the boolean flag `-i` to pass the cluster ID, but `-i` does not
accept arguments. Replace `-i %s` with `-C %s` in the format string to correctly
pass o.clusterID to the servicelog command, as `-C` is the proper flag for
specifying the cluster ID.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4d3b3abc-6ce2-4c5e-aa7e-d702beeed84d

📥 Commits

Reviewing files that changed from the base of the PR and between f9ca732 and 75fda6c.

📒 Files selected for processing (18)
  • cmd/cluster/cmd.go
  • cmd/cluster/pullsecret.go
  • cmd/cluster/pullsecretstatus.go
  • cmd/cluster/pullsecretstatus_test.go
  • cmd/cluster/replacepullsecret.go
  • cmd/cluster/validatepullsecret.go
  • cmd/cluster/validatepullsecretext.go
  • cmd/servicelog/post.go
  • docs/README.md
  • docs/osdctl_cluster.md
  • docs/osdctl_cluster_pull-secret.md
  • docs/osdctl_cluster_pull-secret_audit.md
  • docs/osdctl_cluster_pull-secret_update.md
  • docs/osdctl_cluster_pull-secret_validate.md
  • go.mod
  • pkg/controller/pullsecret.go
  • pkg/controller/pullsecret_test.go
  • pkg/controller/pullsecretop.go
✅ Files skipped from review due to trivial changes (1)
  • docs/osdctl_cluster.md
🚧 Files skipped from review as they are similar to previous changes (10)
  • cmd/cluster/validatepullsecretext.go
  • go.mod
  • cmd/cluster/pullsecret.go
  • cmd/cluster/validatepullsecret.go
  • cmd/cluster/pullsecretstatus_test.go
  • cmd/cluster/cmd.go
  • pkg/controller/pullsecret_test.go
  • pkg/controller/pullsecretop.go
  • pkg/controller/pullsecret.go
  • cmd/cluster/pullsecretstatus.go

Comment thread cmd/cluster/replacepullsecret.go Outdated
@nephomaniac nephomaniac force-pushed the ROSAENG-2057-pull-secret-update branch 3 times, most recently from b9b5f07 to 1a446e7 Compare June 16, 2026 20:01
@nephomaniac nephomaniac force-pushed the ROSAENG-2057-pull-secret-update branch from 1a446e7 to b4dd9f9 Compare June 16, 2026 21:50
@openshift-ci

openshift-ci Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

@nephomaniac: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@nephomaniac

Copy link
Copy Markdown
Contributor Author

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants