Skip to content

ROSAENG-2066: Add osdctl account aws-creds command#913

Open
nephomaniac wants to merge 1 commit into
openshift:masterfrom
nephomaniac:ROSAENG-2066-aws-creds-diagnostic
Open

ROSAENG-2066: Add osdctl account aws-creds command#913
nephomaniac wants to merge 1 commit into
openshift:masterfrom
nephomaniac:ROSAENG-2066-aws-creds-diagnostic

Conversation

@nephomaniac

@nephomaniac nephomaniac commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR is a continuation of the work originally started in #601 (July 2024) to port the aws-creds-rotate.sh bash script into osdctl. Since then, #828 added multi-environment OCM/backplane support (--hive-ocm-url), and #878 extracted SimulatePrincipalPolicy, --dry-run, --cluster-id, and SyncSet-based credential syncing into pkg/controller/rotatesecret.go. This PR refactors the original #601 work to incorporate the updates from #828 and #878, adding a diagnostic-first aws-creds command group with additional robustness and UX improvements.

Note that #878 already addresses the core requirements of ROSAENG-2066 and ROSAENG-2067. Reviewers should assess whether the enhancements below justify the added scope.

References: ROSAENG-2066, ROSAENG-2067

Built on: #601 (original), #828 (multi-env OCM), #878 (SREP-726, SimulatePrincipalPolicy)

Related: openshift/ops-sop#4023 (SOP update, on hold until this PR merges)

What this PR adds beyond #878

Adds osdctl account aws-creds with two subcommands:

  • snapshot — read-only diagnostic report of AWS IAM credentials, Hive secrets, CredentialRequests, and IAM permission simulation
  • rotate — credential rotation for osdManagedAdmin and/or osdCcsAdmin with pre-flight checks, interactive key management, sequential CR secret recreation with CCO health verification, and post-rotation validation

The existing rotate-secret command remains functional with a runtime warning pointing users to aws-creds.

Capability rotate-secret (post-#878) aws-creds (this PR)
Accept cluster ID Yes (--cluster-id) Yes (-C)
SimulatePrincipalPolicy preflight Yes Yes
Dry-run mode Yes Yes
Read-only diagnostic snapshot No Yes
Backplane AWS credentials (no --aws-profile) No — requires profile Yes
Interactive key deletion (max 2 keys) No — errors out Yes
CCS-only rotation (independent of managed-admin) No Yes
CR secret staleness detection (key ID comparison) No Yes
Sequential CR secret recreation with CCO verification No — bulk delete Yes
Post-rotation validation No Yes
Sensitive data redaction in logs No Yes
IO stream plumbing for testability No Yes
Context propagation for cancellation No Yes

Change Scope

Category Files Lines Added Lines Removed
New Go files 5 (aws-creds.go, aws-creds-rotate.go, aws-creds-snapshot.go, awscreds.go, awscreds_test.go) 3,373 0
Edited Go files 4 (cmd.go, rotate-secret.go, rotatesecret.go, rotatesecret_test.go) 1,215 226
New docs (md) 3 (aws-creds.md, aws-creds_rotate.md, aws-creds_snapshot.md) 180 0
Edited docs (md) 3 (README.md, osdctl_account.md, rotate-secret.md) 127 32
Total 15 files 4,895 258

Usage

# Read-only snapshot (safe, no changes)
osdctl account aws-creds snapshot -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --hive-ocm-url prod

# CR secrets only (fastest, no AWS client needed)
osdctl account aws-creds snapshot -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --hive-ocm-url prod --cr-secrets

# Dry-run rotation
osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --hive-ocm-url prod --managed-admin --dry-run

# Rotate osdManagedAdmin
osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --managed-admin

# Rotate CCS admin only
osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --ccs-admin

# Rotate both
osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --managed-admin --ccs-admin

# Refresh CR secrets only (no key rotation)
osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --refresh-secrets

Example Output

osdctl account aws-creds snapshot
$ osdctl account aws-creds snapshot -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --hive-ocm-url prod

INFO Creating OCM connection
INFO Cluster resolved                  id=1abc2def3ghi4jkl5mno6pqr7stu8vwx name=my-ccs-cluster

==========================================================================
 Cluster: my-ccs-cluster (1abc2def3ghi4jkl5mno6pqr7stu8vwx)
 External ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
 CCS/BYOC: Yes
 Reason: OHSS-12345
==========================================================================
Continue? (y/N): y
INFO Connecting to Hive cluster
INFO Account claim resolved           claim=osd-creds-mgmt-abc123
INFO Setting up AWS client
INFO Using backplane for AWS credentials (no --aws-profile or --aws-use-env specified)
INFO Connecting to managed cluster via backplane

IAM Permission Check (SimulatePrincipalPolicy)
  Uses SimulatePrincipalPolicy to detect SCP or IAM policy restrictions
  that would block rotation or CCO credential provisioning.
  AWS caller (used by this tool): arn:aws:sts::123456789012:assumed-role/ManagedOpenShift-Support-abc123/RH-SRE-sre-user

  SimulatePrincipalPolicy for user: osdManagedAdmin-abc123

  Rotation tooling (IAM key management):
  ACTION                         RESULT
  ------------------------------ ----------
  iam:CreateAccessKey            Allow
  iam:DeleteAccessKey            Allow
  iam:GetUser                    Allow
  iam:ListAccessKeys             Allow
  ...                            (10 actions, all Allow)

  CredentialRequest actions (required by cluster operators):
  ACTION                                     RESULT     CREDENTIAL REQUEST(S)
  ------------------------------------------ ---------- ----------------------------------------
  ec2:RunInstances                           Allow      openshift-machine-api-aws
  s3:GetObject                               Allow      openshift-image-registry, velero-iam-credentials
  kms:Encrypt                                Allow      aws-ebs-csi-driver-operator, openshift-machine-api-aws
  route53:ChangeResourceRecordSets           Allow      openshift-ingress, cloud-ingress-operator-credentials-aws
  ...                                                   (90+ actions, all Allow)

  SimulatePrincipalPolicy for user: osdCcsAdmin (parent user, manages osdManagedAdmin)
  [OK] osdCcsAdmin has required rotation and CR permissions.

  Additional OSD CCS required services (SCP coverage):
  [OK] SCP allows all required OSD CCS services.

  [OK] No SCP or IAM policy restrictions detected.

==========================================================================
 AWS Account: 123456789012    Account CR: osd-creds-mgmt-abc123
 AWS Caller: arn:aws:sts::123456789012:assumed-role/ManagedOpenShift-Support-abc123/RH-SRE-sre-user
==========================================================================

Credential Request Secrets (managed cluster)
  Secrets referenced by AWS CredentialRequests. After rotation, these
  secrets are to be deleted so CCO recreates them with the new credentials.
  Cluster root credential (kube-system/aws-creds) matches Hive (AKIA...XXXX).

  CREDENTIAL REQUEST                       NAMESPACE                    SECRET                   AGE      KEY STATUS
  ---------------------------------------- ---------------------------- ------------------------ -------- --------------
  aws-ebs-csi-driver-operator              openshift-cluster-csi-driv.. ebs-cloud-credentials    1h       CURRENT
  cloud-credential-operator-iam-ro         openshift-cloud-credential.. cloud-credential-opera.. 1h       CURRENT
  openshift-image-registry                 openshift-image-registry     installer-cloud-creden.. 1h       CURRENT
  openshift-ingress                        openshift-ingress-operator   cloud-credentials        1h       CURRENT
  openshift-machine-api-aws                openshift-machine-api        aws-cloud-credentials    1h       CURRENT
  ...                                                                                                     (9 CRs total)

AWS Credentials
  IAM access keys and their corresponding Hive secrets which reference them.

  USER                     KEY ID           AGE    LAST USED  STATUS   REF'd by HIVE SECRET(S)            SYNC
  ------------------------ ---------------- ------ ---------- -------- ---------------------------------- ------
  osdManagedAdmin-abc123   AKIA...XXXX      1h     33m ago    Active   osd-creds-mgmt-abc123-secret, aws  OK
  osdManagedAdmin-abc123   AKIA...YYYY      2h     2h ago     Active   (not referenced by this cluster)   --
  osdCcsAdmin              AKIA...ZZZZ      59m    N/A        Active   byoc                               OK
  osdCcsAdmin              AKIA...WWWW      2h     N/A        Active   (not referenced by this cluster)   --

==========================================================================

 Summary:
 [OK] osdManagedAdmin-abc123 has adequate permissions to fulfill CRs.
 [OK] osdCcsAdmin has adequate permissions.
 [OK] SCP allows all required OSD CCS services.
 [OK] Cluster root credential matches Hive — CR secrets are current.
 [INFO] osdManagedAdmin-abc123 has 2 access keys (max 2). Review which key should be deleted before rotation.
 [INFO] osdCcsAdmin has 2 access keys (max 2). Review which key should be deleted before rotation.
 No issues found during pre-rotation checks.
==========================================================================
osdctl account aws-creds rotate --dry-run
$ osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" \
    --hive-ocm-url prod --managed-admin --ccs-admin --dry-run

  ... (snapshot output as above) ...

INFO Dry-run mode — no changes will be made

[Dry Run] Would create a new IAM access key for user: osdManagedAdmin-abc123
[Dry Run] [OK] AWS: ListAccessKeys for osdManagedAdmin-abc123
[Dry Run] Would list access keys and report old keys to remove
[Dry Run] Would update secret aws-account-operator/osd-creds-mgmt-abc123-secret with new credentials
[Dry Run] [OK] Hive: GET secret aws-account-operator/osd-creds-mgmt-abc123-secret
[Dry Run] [OK] Hive: auth can-i update secrets in aws-account-operator
[Dry Run] Would update secret uhc-production-abc123/aws with new credentials
[Dry Run] [OK] Hive: auth can-i update secrets in uhc-production-abc123
[Dry Run] Would create SyncSet uhc-production-abc123/aws-sync to sync credentials to cluster
[Dry Run] [OK] Hive: LIST ClusterDeployments in uhc-production-abc123
[Dry Run] [OK] Hive: auth can-i create syncsets in uhc-production-abc123
[Dry Run] [OK] Managed cluster: LIST CredentialRequests (all namespaces)
[Dry Run] Would delete secret openshift-cluster-csi-drivers/ebs-cloud-credentials (CredentialRequest aws-ebs-csi-driver-operator)
[Dry Run] [OK] Managed cluster: GET secret openshift-cluster-csi-drivers/ebs-cloud-credentials
[Dry Run] [OK] Managed cluster: auth can-i delete secrets in openshift-cluster-csi-drivers
  ... (9 credential secrets verified) ...
[Dry Run] Would delete 9 credential secret(s) total

[Dry Run] Would create a new IAM access key for user: osdCcsAdmin
[Dry Run] [OK] AWS: ListAccessKeys for osdCcsAdmin
[Dry Run] Would update secret uhc-production-abc123/byoc with new osdCcsAdmin credentials
[Dry Run] [OK] Hive: auth can-i update secrets in uhc-production-abc123

[Dry Run] OK All pre-flight checks passed. No changes were made.
INFO Credential rotation completed successfully
osdctl account aws-creds rotate --managed-admin --ccs-admin
$ osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" \
    --hive-ocm-url prod --managed-admin --ccs-admin

  ... (snapshot output) ...

Proceed with credential rotation?
Continue? (y/N): y
INFO Starting credential rotation

==========================================================================
 Phase 1: Rotating osdManagedAdmin-abc123 credentials
==========================================================================

[WARN] IAM user osdManagedAdmin-abc123 already has the maximum number of access keys (2).
A key must be deleted before a new one can be created.
Review the keys below and select which one to delete:

  IAM User: osdManagedAdmin-abc123
  #    KEY ID                   AGE    LAST USED  STATUS   REF'd by HIVE SECRET(S)            SYNC
  ---- ------------------------ ------ ---------- -------- ---------------------------------- ------
  1    AKIA...XXXX              1m     N/A        Active   osd-creds-mgmt-abc123-secret, aws  OK
  2 *  AKIA...YYYY              1h     1h ago     Active   (not referenced by this cluster)   --

  * Key #2 is not referenced by this cluster and is the recommended candidate for deletion.

Enter the number of the key to delete (or 'q' to quit): 2
[OK] Deleted access key AKIA...YYYY
Creating new access key...

Access keys for IAM user osdManagedAdmin-abc123:
  - AKIA...XXXX (old - should be removed)
  - AKIA...ZZZZ (new - just created)

INFO Updating AAO account secret on Hive
INFO Updating cluster namespace secret on Hive
AWS creds updated on hive.
INFO Syncing credentials to cluster via SyncSet
Sync completed...
Successfully rotated access keys for osdManagedAdmin-abc123
INFO Deleting credential secrets so CCO recreates them
Deleting credential secrets one at a time, verifying CCO recreates each before continuing...

  [1/9] CredentialRequest: aws-ebs-csi-driver-operator
    Deleted secret openshift-cluster-csi-drivers/ebs-cloud-credentials
    Waiting for CCO to recreate secret (first secret — validating CCO health)...
    [OK] Secret openshift-cluster-csi-drivers/ebs-cloud-credentials recreated by CCO

  [2/9] CredentialRequest: cloud-credential-operator-iam-ro
    [OK] Secret recreated by CCO
  ... (9/9 all recreated successfully) ...

Deleted and verified 9 credential secret(s).

==========================================================================
 Phase 2: Rotating osdCcsAdmin credentials
==========================================================================

Updating Hive secret uhc-production-abc123/byoc with new osdCcsAdmin credentials...
[OK] Updated Hive secret uhc-production-abc123/byoc
Successfully rotated credentials for osdCcsAdmin

Post-rotation: verifying cluster health...

AWS Credentials
  USER                     KEY ID           AGE    LAST USED  STATUS   REF'd by HIVE SECRET(S)            SYNC
  ------------------------ ---------------- ------ ---------- -------- ---------------------------------- ------
  osdManagedAdmin-abc123   AKIA...XXXX      1h     N/A        Active   (not referenced by this cluster)   --
  osdManagedAdmin-abc123   AKIA...ZZZZ      0m     N/A        Active   osd-creds-mgmt-abc123-secret, aws  OK
  osdCcsAdmin              AKIA...AAAA      0m     N/A        Active   byoc                               OK
  osdCcsAdmin              AKIA...BBBB      2h     2h ago     Active   (not referenced by this cluster)   --

  [OK] Hive secret aws-account-operator/osd-creds-mgmt-abc123-secret contains the new key
  [OK] Hive secret uhc-production-abc123/aws contains the new key
  [OK] Hive secret uhc-production-abc123/byoc contains the new key
  [OK] Post-rotation checks passed.

  Suggested follow-up steps:
  - Remove old access key(s) from the AWS account
  - Verify CCO is healthy:
      oc logs -n openshift-cloud-credential-operator deploy/cloud-credential-operator --tail=50
  - Verify credential secrets are recreated:
      oc get credentialsrequests -A -o wide
INFO Credential rotation completed successfully

Test plan

  • make lint — 0 issues
  • make format — clean
  • go build — compiles
  • Unit tests — 40 controller tests (26 awscreds + 14 rotatesecret), all passing
  • 13 integration tests (expect script) covering snapshot, rotate, dry-run, CCS-only, cancellation, validation
  • Tested on staging CCS cluster with --hive-ocm-url prod

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added osdctl account aws-creds with rotate (interactive/full or refresh-only, dry-run, force) and snapshot (read-only diagnostics, optional CR-only output).
  • Deprecation
    • osdctl account rotate-secret marked deprecated; use osdctl account aws-creds rotate.
  • Documentation
    • New and updated docs for aws-creds, aws-creds rotate, and aws-creds snapshot with examples and options.
  • Tests
    • Expanded unit tests covering diagnostics, rotation flows, dry-run, CCS cases, and rendering.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 6, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 6, 2026

Copy link
Copy Markdown

@nephomaniac: This pull request references ROSAENG-2066 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Adds osdctl account aws-creds with two subcommands:

  • snapshot — read-only diagnostic report of AWS IAM credentials, Hive secrets, CredentialRequests, and IAM permission simulation
  • rotate — credential rotation for osdManagedAdmin and/or osdCcsAdmin with pre-flight checks, interactive key management, sequential CR secret recreation with CCO health verification, and post-rotation validation

The existing rotate-secret command remains functional with a runtime warning pointing users to aws-creds.

Resolves: ROSAENG-2066, ROSAENG-2067

Extends prior work from: SREP-726 (SimulatePrincipalPolicy), ROSAENG-2119 (multi-env OCM)

Change Scope

Category Files Lines Added Lines Removed
New Go files 5 (aws-creds.go, aws-creds-rotate.go, aws-creds-snapshot.go, awscreds.go, awscreds_test.go) 3,296 0
Edited Go files 4 (cmd.go, rotate-secret.go, rotatesecret.go, rotatesecret_test.go) 1,024 210
New docs (md) 3 (aws-creds.md, aws-creds_rotate.md, aws-creds_snapshot.md) 180 0
Edited docs (md) 3 (README.md, osdctl_account.md, rotate-secret.md) 132 27
Total 15 files 4,791 253

Key Features

  • Cluster-ID-first UX: accepts -C <cluster-id> instead of requiring account CR name
  • Backplane credentials by default: no --aws-profile needed, uses OCM/backplane for AWS access
  • Diagnostic-first: default snapshot mode shows full read-only report before any mutation
  • Two-part IAM permission simulation: checks both rotation tooling actions AND all CredentialRequest actions via SimulatePrincipalPolicy, with per-CR attribution
  • CCS/BYOC support: separate osdCcsAdmin checks including SCP coverage validation against OSD CCS minimum requirements
  • Independent CCS-only path: --ccs-admin without --managed-admin skips all managed-admin artifacts and permissions
  • Interactive key management: when max keys (2) reached, shows key table with Hive secret references and recommends which to delete
  • Sequential CR secret recreation: deletes one secret at a time, verifies CCO recreates (UID-based) before proceeding, with timeout/pause/manual-completion flow
  • Pre-flight and post-rotation validation: verifies client connections, secret existence, RBAC (auth can-i), and post-rotation key/secret sync
  • State tracking on failure: reports exactly which steps completed before a failure, with re-run safety guidance
  • --refresh-secrets: delete/recreate CR secrets without rotating keys
  • --cr-secrets on snapshot: fast CR-only view using Hive secret timestamp for staleness (no AWS client needed)
  • Reason validation: extracts Jira tickets and PD incidents from --reason, cross-references with cluster
  • Colored output: green/yellow/red/blue status indicators, structured logging to stderr via logrus
  • --hive-ocm-url: multi-environment OCM support for staging/integration testing

Usage

# Read-only credential status report
osdctl account aws-creds snapshot -C $CLUSTER_ID --reason "$TICKET"

# Quick CR secret status (no AWS client needed)
osdctl account aws-creds snapshot -C $CLUSTER_ID --reason "$TICKET" --cr-secrets

# Rotate osdManagedAdmin credentials
osdctl account aws-creds rotate -C $CLUSTER_ID --reason "$TICKET" --managed-admin

# Rotate both users
osdctl account aws-creds rotate -C $CLUSTER_ID --reason "$TICKET" --managed-admin --ccs-admin

# Rotate CCS admin only (no managed-admin dependency)
osdctl account aws-creds rotate -C $CLUSTER_ID --reason "$TICKET" --ccs-admin

# Refresh CR secrets only (no key rotation)
osdctl account aws-creds rotate -C $CLUSTER_ID --reason "$TICKET" --refresh-secrets

# Dry-run
osdctl account aws-creds rotate -C $CLUSTER_ID --reason "$TICKET" --managed-admin --dry-run

Test plan

  • make lint — 0 issues
  • make format — clean
  • go build — compiles
  • Unit tests — 40 controller tests (26 awscreds + 14 rotatesecret), all passing
  • snapshot --cr-secrets — lightweight path, no AWS client
  • snapshot — full diagnostic report
  • rotate --managed-admin --dry-run — pre-flight checks only
  • rotate --ccs-admin --dry-run — CCS-only preflight (no managed-admin checks)
  • rotate --managed-admin --ccs-admin --dry-run — both preflight checks
  • rotate --refresh-secrets — CR secret deletion/recreation only
  • rotate --managed-admin — single user rotation
  • rotate --ccs-admin — CCS-only rotation (no managed-admin dependency)
  • rotate --managed-admin --ccs-admin — both users with Phase 1/2 banners
  • User cancellation at rotation and refresh prompts — graceful exit
  • Validation: rotate with no flags → error
  • Validation: --refresh-secrets --managed-admin → error
  • Tested on staging CCS cluster with --hive-ocm-url prod

Follow-up items

  • Update credential rotation SOPs to reference aws-creds instead of rotate-secret
  • Investigate SimulatePrincipalPolicy target principal (currently simulates against osdManagedAdmin, rotation tooling runs as assumed role)
  • Add CCO log health check to post-rotation validation

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds an aws-creds CLI with snapshot and rotate subcommands, a DiagnosticReport library, rotation controller refactors for interactive key creation/retry and credential-secret deletion polling, extensive tests, and documentation updates.

Changes

AWS Credential Diagnostics and Rotation

Layer / File(s) Summary
CLI root and shared options
cmd/account/aws-creds.go
Top-level aws-creds command, cluster/account resolution, reason parsing, Hive/AWS client construction, and mapping to controller.AWSCredsInput.
Rotate command & orchestration
cmd/account/aws-creds-rotate.go
Implements aws-creds rotate: flags validation, pre-rotation diagnostic snapshot, strict-YES gating on permission failures, dry-run/confirm flows, and delegating to controller.RotateSecret or credential-request refresh logic.
Snapshot command
cmd/account/aws-creds-snapshot.go
Implements aws-creds snapshot: runs DiagnoseCredentials or DiagnoseCRSecrets and renders reports or CredentialRequest-only tables.
Command registration & deprecation
cmd/account/cmd.go, cmd/account/rotate-secret.go
Registers aws-creds subcommand and updates rotate-secret help to reference aws-creds (plus stderr logging tweak).
Documentation
docs/*
Adds aws-creds docs and examples, updates README and rotate-secret docs with SEE ALSO references and usage examples.
Diagnostic library
pkg/controller/awscreds.go
Adds DiagnosticReport model and functions: Hive secret checks, IAM access-key diagnostics, CredentialRequest secret diagnostics, SimulatePrincipalPolicy permission simulation, and human-readable report rendering.
Diagnostic tests
pkg/controller/awscreds_test.go
Tests covering healthy/failing diagnostics, CR-secret staleness, permission simulation behavior, and report rendering.
Rotation controller updates
pkg/controller/rotatesecret.go
Refactors RotateSecret API (diagnostic input, flags), adds createAccessKeyWithRetry (fallbacks/limit handling), preflight/post-rotation checks, interactive selection helpers, and rewritten credential-secret deletion with dry-run RBAC checks and polling.
Rotation controller tests
pkg/controller/rotatesecret_test.go
Updated and new tests: caller-identity mocking, relaxed AWS mock expectations, CCS-only flows, credential-secret deletion/polling timeouts, and post-rotation validation cases.
AWS provider client & mocks
pkg/provider/aws/*
Adds GetAccessKeyLastUsed to AWS Client interface and implementation and updates generated mock to include the method.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • clcollins

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
No-Sensitive-Data-In-Logs ❌ Error Line 284 in rotatesecret.go outputs the full untruncated AWS access key ID in a manual cleanup instruction printed to stdout: `fmt.Fprintln(input.Out, " Manual cleanup required: aws iam delete-acce... Truncate the newKeyID in the manual cleanup message using truncateKeyID(newKeyID) on line 284, consistent with the same pattern used on lines 270, 282, and 283 in the same function.
Docstring Coverage ⚠️ Warning Docstring coverage is 39.66% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (13 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and specifically summarizes the main change: adding the 'osdctl account aws-creds' command group with snapshot and rotate subcommands for AWS credential management.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed The PR uses standard Go testing framework (not Ginkgo). All 61 test functions in awscreds_test.go and rotatesecret_test.go use stable, deterministic names with no dynamic content (UUIDs, timestamps...
Test Structure And Quality ✅ Passed The PR's test files (awscreds_test.go, rotatesecret_test.go) demonstrate good test structure: proper use of defer cleanup for mocks (defer ctrl.Finish()), helper functions marked with t.Helper(), g...
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. All test files use Go's standard testing package (not Ginkgo framework), making the MicroShift compatibility check not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests were added. PR adds only standard Go unit tests (pkg/controller/*_test.go) using testing/testify/gomock, not Ginkgo framework. Custom check is inapplicable to this CLI tool repo...
Topology-Aware Scheduling Compatibility ✅ Passed PR adds CLI tooling and controller logic for AWS credential diagnostics/rotation—no deployment manifests, pod specs, or scheduling constraints are introduced.
Ote Binary Stdout Contract ✅ Passed This PR adds CLI commands to osdctl, not an OTE test binary. No process-level stdout writes detected; all output correctly uses IOStreams via fmt.Fprintf or os.Stderr.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests were added. PR adds standard Go unit tests (TestDiagnoseCredentials_, TestRotateSecret_) using testify/assert framework, not Ginkgo e2e patterns (It/Describe/Context/When).
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or insecure secret comparisons found in the PR code.
Container-Privileges ✅ Passed No Kubernetes manifests, container specifications, or Dockerfiles present in this PR; changes are entirely Go source code and documentation for a CLI tool.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from dustman9000 and sam-nguyen7 June 6, 2026 02:11
@openshift-ci

openshift-ci Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nephomaniac
Once this PR has been reviewed and has the lgtm label, please assign petrkotas for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nephomaniac

Copy link
Copy Markdown
Contributor Author

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 6, 2026
@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/account/aws-creds-rotate.go`:
- Around line 60-63: The code currently discards Cobra's cancellation context by
creating context.TODO() in callers; update the signatures of runRotate and
runRefreshSecrets to accept a context.Context parameter (e.g., func
runRotate(ctx context.Context, ops *... ) error), replace any use of
context.TODO() with the passed ctx (cmd.Context()) at each call site, and thread
that ctx through into DeleteCredentialSecrets and any AWS/Kubernetes or polling
functions so they all honor cancellation/timeouts; ensure callers in the Cobra
Run function call runRotate(cmd.Context(), ops) and propagate ctx through
intermediate helpers.

In `@cmd/account/aws-creds-snapshot.go`:
- Around line 45-48: The snapshot flow currently uses context.TODO() and needs a
cancelable timed context: obtain cmd.Context(), wrap it with context.WithTimeout
(choose a reasonable timeout such as 30s) and defer cancel; pass that ctx into
runSnapshot instead of using TODO, and propagate it into all backend calls by
updating signatures and call sites for runSnapshot, identifyCluster, any
resolve* functions and Diagnose* functions so they accept context.Context and
use it for OCM/Hive/managed-cluster calls; ensure every function that performs
network/backend operations takes and forwards the ctx to enable cancellation and
timeouts.

In `@cmd/account/aws-creds.go`:
- Around line 661-668: After fetching the Account CR with GetAWSAccount (where
rc.account and rc.accountID are set), add the same ManualSTSMode guard used
earlier (the check around ManualSTSMode/DiganoseCRSecrets path) to reject
STS-mode accounts for the --cr-secrets flow: inspect account.Spec.Mode (the
ManualSTSMode enum/constant) and return an error if it equals the ManualSTSMode
value so that DiagnoseCRSecrets is not run for STS clusters and the misleading
missing-claim-secret warning/report is avoided.

In `@pkg/controller/awscreds.go`:
- Around line 327-331: The remediation command in the appended Finding is
outdated; update the Guidance string in the report.Findings append so it uses
the new aws-creds rotate subcommand. Replace the current guidance
fmt.Sprintf("... aws-creds %s --reason <ticket> --rotate-managed-admin",
report.ClusterID) with the rotate form, e.g. fmt.Sprintf("Credentials are out of
sync. Rotate to fix:\n       osdctl account aws-creds rotate --name %s --reason
<ticket> --rotate-managed-admin", report.ClusterID), referencing the same
report.ClusterID in the fmt.Sprintf call used in the report.Findings append.
- Around line 983-1021: The current success message uses optimistic defaults
(rotationPermsOK and credReqPermsOK initialized true) and can report [OK] when
no permissions were evaluated; update the success gating to require actual
evaluation: either (A) use report.AllPermissionsOK instead of rotationPermsOK &&
credReqPermsOK, or (B) add explicit evaluation flags (e.g., rotationEvaluated
and credReqEvaluated set to true inside the switch when encountering "rotation"
and "credreq") and change the success condition to rotationPermsOK &&
credReqPermsOK && rotationEvaluated && credReqEvaluated; also ensure the failure
branches still report missing categories when evaluated false.
- Around line 372-420: The code currently uses Secret.CreationTimestamp to mark
a CredentialRequest secret stale; instead, find the active managed admin key ID
from report.Keys (the same place you already derive the Hive key) and compare
that key ID to the CR secret's aws_access_key_id value. Replace the age-based
logic around managedAdminKeyAge and the block that sets crs.Age/NeedsRecreation
with: extract the active key ID into a variable (e.g. managedAdminKeyID) from
report.Keys (the entry with HiveMatch && UserName != "osdCcsAdmin"), read
secret.Data["aws_access_key_id"] (convert to string), and set
crs.NeedsRecreation = true when they differ (set Exists/Err fields unchanged);
apply the same replacement for the other occurrence mentioned (lines ~728-779)
so both paths compare access key IDs instead of creation times.

In `@pkg/controller/rotatesecret.go`:
- Around line 88-96: The controller struct exposes Out but lacks an In reader,
so createAccessKeyWithRetry() uses fmt.Scanln (process stdin) and ignores
injected IOStreams.In; add an In io.Reader field to the controller (alongside
Report, Log, Out) and update createAccessKeyWithRetry() to read prompts from
that In reader (e.g., using fmt.Fscanln or bufio.Reader against the
controller.In) instead of calling fmt.Scanln so automated tests and injected CLI
IOStreams work correctly.
- Around line 131-135: The managed-admin rotation path is re-running
VerifyRotationPermissions via resolveAdminUsername and blocks the operator's
strict-YES override; fix by threading the precomputed permission simulation
result (or an explicit "operator confirmed despite denied simulation" boolean)
into RotateSecret and resolveAdminUsername so you don't re-run the simulation.
Update the RotateSecret signature to accept the precomputed report/confirmation
flag and change callers (including where adminUsername is resolved) to pass that
value, then alter resolveAdminUsername (or add a wrapper) to skip calling
VerifyRotationPermissions when the confirmation flag/report indicates the
operator already overrode the denial. Ensure references to resolveAdminUsername,
RotateSecret, adminUsername, input.AwsClient and accountIDSuffixLabel are used
to locate and update the logic.
- Around line 235-243: The deferred cleanup currently ignores the return value
of input.AwsClient.DeleteAccessKey (inside the deferred closure that checks
rotationCommitted and uses adminUsername/newKeyID), which can leave an orphaned
key without surfacing the cleanup failure; update that defer to capture the
DeleteAccessKey error, log it at error level with context (including
adminUsername and newKeyID) and propagate it so the caller/operator is notified
— e.g., record the cleanup error into the function's named error return (or
wrap/append it to the primary error) rather than discarding it, ensuring
DeleteAccessKey's error is never ignored.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a88be861-94fb-4999-b132-7907e53cd407

📥 Commits

Reviewing files that changed from the base of the PR and between 486780d and 3ec12df.

📒 Files selected for processing (15)
  • cmd/account/aws-creds-rotate.go
  • cmd/account/aws-creds-snapshot.go
  • cmd/account/aws-creds.go
  • cmd/account/cmd.go
  • cmd/account/rotate-secret.go
  • docs/README.md
  • docs/osdctl_account.md
  • docs/osdctl_account_aws-creds.md
  • docs/osdctl_account_aws-creds_rotate.md
  • docs/osdctl_account_aws-creds_snapshot.md
  • docs/osdctl_account_rotate-secret.md
  • pkg/controller/awscreds.go
  • pkg/controller/awscreds_test.go
  • pkg/controller/rotatesecret.go
  • pkg/controller/rotatesecret_test.go

Comment thread cmd/account/aws-creds-rotate.go
Comment thread cmd/account/aws-creds-snapshot.go
Comment thread cmd/account/aws-creds.go
Comment thread pkg/controller/awscreds.go
Comment thread pkg/controller/awscreds.go Outdated
Comment thread pkg/controller/awscreds.go
Comment thread pkg/controller/rotatesecret.go
Comment thread pkg/controller/rotatesecret.go Outdated
Comment thread pkg/controller/rotatesecret.go Outdated
@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch from 3ec12df to 0d45d32 Compare June 6, 2026 03:05
@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai resolve

1 similar comment
@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai resolve

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

1 similar comment
@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch from 0d45d32 to 809f194 Compare June 6, 2026 03:39

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controller/rotatesecret.go`:
- Around line 792-796: The confirmation prompt currently calls
utils.ConfirmPrompt() which reads from os.Stdin and ignores the injected
RotateSecretInput.In; change ConfirmPrompt (or add a new helper like
ConfirmPromptWithIO) to accept an io.Reader (and optionally an io.Writer) and
read the response from that reader (use bufio.Reader, TrimSpace, check first
rune for 'y'/'Y' and treat other input/EOF as false). Then update the call site
in RotateSecret (the fmt.Fprintf prompt and the ConfirmPrompt call) to pass
RotateSecretInput.In (and RotateSecretInput.Out) so tests/automation can inject
input; ensure the function returns false on EOF or non-affirmative input.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ee68c51a-fcba-4cfd-b1b3-ef00378cd2e6

📥 Commits

Reviewing files that changed from the base of the PR and between 3ec12df and 809f194.

📒 Files selected for processing (15)
  • cmd/account/aws-creds-rotate.go
  • cmd/account/aws-creds-snapshot.go
  • cmd/account/aws-creds.go
  • cmd/account/cmd.go
  • cmd/account/rotate-secret.go
  • docs/README.md
  • docs/osdctl_account.md
  • docs/osdctl_account_aws-creds.md
  • docs/osdctl_account_aws-creds_rotate.md
  • docs/osdctl_account_aws-creds_snapshot.md
  • docs/osdctl_account_rotate-secret.md
  • pkg/controller/awscreds.go
  • pkg/controller/awscreds_test.go
  • pkg/controller/rotatesecret.go
  • pkg/controller/rotatesecret_test.go
✅ Files skipped from review due to trivial changes (2)
  • docs/osdctl_account.md
  • docs/osdctl_account_rotate-secret.md
🚧 Files skipped from review as they are similar to previous changes (6)
  • cmd/account/rotate-secret.go
  • cmd/account/aws-creds-snapshot.go
  • pkg/controller/awscreds_test.go
  • cmd/account/aws-creds-rotate.go
  • cmd/account/aws-creds.go
  • pkg/controller/awscreds.go

Comment thread pkg/controller/rotatesecret.go Outdated
@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch from 809f194 to c899f9c Compare June 6, 2026 04:00
@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai resolve

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai resolve

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch from c899f9c to fe26067 Compare June 6, 2026 04:26
@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@nephomaniac

Copy link
Copy Markdown
Contributor Author

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 6, 2026

@bergmannf bergmannf left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good direction - I think the main issue for me that we don't want to keep an outdated command around and just remove it completely.

Also seems there is some duplication between the two controllers which should be removed.

Comment thread cmd/account/aws-creds-rotate.go Outdated
without rotating AWS keys or modifying Hive secrets. This is useful when
CCO needs to re-provision secrets with existing credentials.

AWS credentials are obtained via backplane by default. Use --aws-profile

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AWS credentials are obtained via backplane by default. Use --aws-profile
AWS credentials are obtained via backplane by default. Use --aws-profile

I think the local profiles aren't really used anymore with rh-aws-saml-login.

So the command also has to work without profile and without backplane (which would be the call after logging in via rh-aws-saml-login).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Addressed — added --aws-use-env flag that reads AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY/AWS_SESSION_TOKEN from environment variables directly (bypasses backplane and the proxy-wrapped default credential chain). Help text now includes the full kinitrh-aws-saml-login --output envosdctl --aws-use-env workflow.

Comment on lines +205 to +208
if !utils.ConfirmPrompt() {
o.log.Info("Refresh cancelled by user")
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !utils.ConfirmPrompt() {
o.log.Info("Refresh cancelled by user")
return nil
}
if !utils.ConfirmPrompt() {
o.log.Info("Refresh cancelled by user")
return nil
}

Should we really give this option at all instead of stopping immediately?

Any issue will likely result in one or multiple operators on the cluster breaking after the rotation - maybe we should instead call out opening a proactive case / responding in-case with the problems first?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Addressed — permission failures now hard-stop rotation by default. --force unlocks a strict YES confirmation prompt. Useful when SCP restrictions report false positives, during staging testing with non-production IAM configs, or when investigating known permission bugs.

Comment thread cmd/account/aws-creds.go Outdated
}

var (
jiraTicketPattern = regexp.MustCompile(`[A-Z]+-\d+`)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
jiraTicketPattern = regexp.MustCompile(`[A-Z]+-\d+`)
jiraTicketPattern = regexp.MustCompile(`[A-Z]+-\d+`)

Not a big issue, but this might report some false positives by accident.

"Rotating cluster D-1" would match this one but not be a ticket - I think we could just match for OHSS-*.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Narrowed to known project prefixes: (OHSS|SREP|ROSAENG|OSD|SDE|CSSRE|MNGT)-\d+.

Comment thread cmd/account/rotate-secret.go Outdated
Comment on lines +36 to +41
NOTE: Consider using 'osdctl account aws-creds' instead. The newer command:
- Accepts a cluster ID (no need to look up the account CR name)
- Provides a diagnostic snapshot before rotation
- Handles account claim resolution automatically
- Supports interactive key management and CR secret health verification
- Validates IAM permissions via SimulatePrincipalPolicy

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NOTE: Consider using 'osdctl account aws-creds' instead. The newer command:
- Accepts a cluster ID (no need to look up the account CR name)
- Provides a diagnostic snapshot before rotation
- Handles account claim resolution automatically
- Supports interactive key management and CR secret health verification
- Validates IAM permissions via SimulatePrincipalPolicy
NOTE: Consider using 'osdctl account aws-creds' instead. The newer command:
- Accepts a cluster ID (no need to look up the account CR name)
- Provides a diagnostic snapshot before rotation
- Handles account claim resolution automatically
- Supports interactive key management and CR secret health verification
- Validates IAM permissions via SimulatePrincipalPolicy

The new command is a complete superset - I think it would be better to just remove this one and ensure that only the new command is referenced in any SOPs that might use it.

Keeping these old commands tends to be something that is forgotten to be removed and there seems to be no reason to keep a worse version around.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Now uses Cobra's Deprecated field — hidden from --help, prints deprecation warning when invoked, points to aws-creds rotate. Hard removal in a follow-up PR after SOP migration.

@@ -0,0 +1,1171 @@
package controller

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package controller
package controller

Thanks for sticking with the controller having (most) logic - it's something we try to move osdctl to, so we don't have all logic in the cobra commands.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! All diagnostic and rotation logic stays in pkg/controller/. The CLI layer in cmd/account/ only handles flag parsing, cluster resolution, and client setup.

Comment thread pkg/controller/awscreds.go Outdated

// isAWSCredReqForDiag is a broader filter for diagnostics — includes all
// CredentialRequests with AWSProviderSpec regardless of name prefix.
func isAWSCredReqForDiag(cr *ccov1.CredentialsRequest) bool {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func isAWSCredReqForDiag(cr *ccov1.CredentialsRequest) bool {
func isAWSCredReqForDiag(cr *ccov1.CredentialsRequest) bool {

I think the function name can be 'isAWSCredentialRequestForDiagnostic' - the current one sounds a bit C-like.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Renamed to isAWSCredentialRequestForDiagnostic.

Comment thread pkg/controller/awscreds.go Outdated
Comment on lines +499 to +510
rotationActions := []string{
"iam:CreateAccessKey",
"iam:CreateUser",
"iam:DeleteAccessKey",
"iam:DeleteUser",
"iam:DeleteUserPolicy",
"iam:GetUser",
"iam:GetUserPolicy",
"iam:ListAccessKeys",
"iam:PutUserPolicy",
"iam:TagUser",
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rotationActions := []string{
"iam:CreateAccessKey",
"iam:CreateUser",
"iam:DeleteAccessKey",
"iam:DeleteUser",
"iam:DeleteUserPolicy",
"iam:GetUser",
"iam:GetUserPolicy",
"iam:ListAccessKeys",
"iam:PutUserPolicy",
"iam:TagUser",
}
rotationActions := []string{
"iam:CreateAccessKey",
"iam:CreateUser",
"iam:DeleteAccessKey",
"iam:DeleteUser",
"iam:DeleteUserPolicy",
"iam:GetUser",
"iam:GetUserPolicy",
"iam:ListAccessKeys",
"iam:PutUserPolicy",
"iam:TagUser",
}

Duplicating this list between this controller and the older rotate-secret one.

Let's extract this to a shared const so we can't accidentally forgot to update one occurrence if ever needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Extracted to RotationRequiredActions package-level var in rotatesecret.go. Referenced by both diagnosePermissions and VerifyRotationPermissions.

}

// simulateActions runs SimulatePrincipalPolicy for a set of IAM actions and records allow/deny results in the report.
func simulateActions(awsClient awsprovider.Client, policySourceArn string, actions []string, category string, actionToCRs map[string][]string, report *DiagnosticReport) error {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func simulateActions(awsClient awsprovider.Client, policySourceArn string, actions []string, category string, actionToCRs map[string][]string, report *DiagnosticReport) error {
func simulateActions(awsClient awsprovider.Client, policySourceArn string, actions []string, category string, actionToCRs map[string][]string, report *DiagnosticReport) error {

This one should probably replace the function in the other controller (or vice versa):

func VerifyRotationPermissions(out io.Writer, awsClient awsprovider.Client, accountID string, username string) error {

No need to have two functions doing the same thing really.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VerifyRotationPermissions now delegates to simulateActions internally. Single implementation, no duplication.

Comment thread pkg/controller/awscreds.go Outdated
fmt.Fprintf(out, " CR secret staleness detection will be skipped.\n\n")
} else {
if !accountSecret.CreationTimestamp.IsZero() {
lastRotationAge = time.Since(accountSecret.CreationTimestamp.Time)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lastRotationAge = time.Since(accountSecret.CreationTimestamp.Time)
lastRotationAge = time.Since(accountSecret.CreationTimestamp.Time)

The secrets are/were update with updateSecret which would not update the CreationTimestamp so I think this will not work as intended.

An update operation will not update creationTimestamp.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Good catch. Staleness detection now reads kube-system/aws-creds on the managed cluster and compares its aws_access_key_id against the Hive account secret. If they match, root credentials are synced and CR secrets are CURRENT. No timestamp comparison. Added 20 unit tests covering all edge cases (missing root secret, read errors, both keys empty, rendering correctness).


// renderKeySelectionTable prints an interactive numbered table of IAM keys with
// Hive secret references, highlighting the recommended deletion candidate.
func renderKeySelectionTable(keys []KeyStatus, report *DiagnosticReport, out io.Writer) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a useful piece of information to decide which key to remove is generally seeing when it was last used ( https://docs.aws.amazon.com/cli/latest/reference/iam/get-access-key-last-used.html)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Added GetAccessKeyLastUsed to the AWS client interface. LAST USED column shown in both the snapshot credentials table and the interactive key deletion table. The recommended deletion candidate (* marker) considers Hive references and last-used time.

@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch from fe26067 to 05f812f Compare June 10, 2026 20:36
@nephomaniac

Copy link
Copy Markdown
Contributor Author

Response to review feedback

Good direction - I think the main issue for me that we don't want to keep an outdated command around and just remove it completely.

Agreed. PR has been updated to move rotate-secret from a logged warning to additionally using Cobra's Deprecated field — it's now hidden (i.e. from --help), prints a deprecation warning when invoked, and points users to aws-creds rotate. We can follow up with the removal in a separate PR after the SOP migration is complete and users have had time to switch.

Also seems there is some duplication between the two controllers which should be removed.

Addressed. The role-chain logic is consolidated into roleChainToSupport(), called by all three credential paths (--aws-profile, --aws-use-env, and the backplane fallback). VerifyRotationPermissions now delegates to simulateActions rather than duplicating it. The shared RotationRequiredActions constant is used by both the diagnostic and rotation code.

Detailed items addressed:

  1. Cobra deprecationrotate-secret has Deprecated field set. Hidden from help, prints warning when used.
  2. Staleness detection — Staleness detection now compares the aws_access_key_id in kube-system/aws-creds on the managed cluster against the Hive account secret. If they match, the cluster's root credentials are synced and CR secrets are marked CURRENT. If they differ (or the root secret is missing/unreadable), CR secrets are marked NEEDS REFRESH. The previous approach compared per-operator IAM user keys against the root key, which could never match since CCO creates separate IAM users for each CredentialRequest. Added 20 unit tests covering root secret edge cases, rendering correctness, and error paths.
  3. AWS credential paths — Three paths supported: backplane (default), --aws-profile (named profile), --aws-use-env (env vars from rh-aws-saml-login). Backplane falls back to default credential chain on failure. Help text includes the full kinitrh-aws-saml-loginosdctl --aws-use-env workflow.
  4. --force flag — Permission failures (detected via SimulatePrincipalPolicy) hard-stop rotation by default. --force unlocks a YES confirmation prompt to proceed despite failures. This is useful when SCP restrictions report false positives (e.g., a policy denies an action the rotation doesn't actually need), during testing against staging clusters with non-production IAM configurations, or when investigating known permission bugs where the operator can confirm the errors are benign.
  5. Key last-used time — Added GetAccessKeyLastUsed to the AWS client interface. LAST USED column shown in both the snapshot credentials table and the interactive key deletion table.
  6. Shared rotation actionsRotationRequiredActions is a package-level var in rotatesecret.go, referenced by both diagnosePermissions and VerifyRotationPermissions.
  7. Unified permission simulationVerifyRotationPermissions now calls simulateActions internally.
  8. Secret error handling helper — Deferred to follow-up PR.
  9. Renamed isAWSCredReqForDiagisAWSCredentialRequestForDiagnostic.
  10. Narrowed Jira regex to known project prefixes: (OHSS|SREP|ROSAENG|OSD|SDE|CSSRE|MNGT)-\d+.
  11. GetAccessKeyLastUsed added to Client interface and mock regenerated.

@nephomaniac

Copy link
Copy Markdown
Contributor Author

Thanks @bergmannf for the thorough review — the staleness detection fix and --aws-use-env support in particular came directly from your feedback. Requesting re-review when you have a chance.

@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/account/aws-creds-rotate.go`:
- Around line 128-137: When handling o.refreshSecrets after resolveForCRSecrets
and DiagnoseCRSecrets, check the returned report for a missing/unreadable
cluster root credential (e.g., report.ClusterRootKeyID or equivalent field) and
fail closed instead of proceeding: if the root key id is empty or indicates
unreadable/missing state, return an error (or print a clear message) and do not
call controller.RenderCredRequestTable or
runRefreshSecrets/DeleteCredentialSecrets; apply the same guard to the other
refresh-secrets branch that mirrors this logic (the block using
resolveForCRSecrets, DiagnoseCRSecrets, controller.RenderCredRequestTable, and
runRefreshSecrets/DeleteCredentialSecrets).

In `@pkg/controller/awscreds.go`:
- Around line 543-550: extractCredReqActions can swallow ManagedClient.List
errors, causing diagnosePermissions to skip CredentialRequest simulation; change
extractCredReqActions (and similarly extractRotationActions) to return
(map[string][]CredentialRequest, error) so listing errors are bubbled up, then
in diagnosePermissions call them, check for non-nil error and either return that
error or set report.AllPermissionsOK = false and return an error; ensure
simulateActions is only skipped when there is a valid action map, and include
references to extractCredReqActions, extractRotationActions,
diagnosePermissions, simulateActions, and ManagedClient.List when making the
change.

In `@pkg/provider/aws/client.go`:
- Line 66: The GetAccessKeyLastUsed method currently discards context and calls
the SDK with context.TODO(); change the method signature on the client interface
and AwsClient implementation to accept a context.Context (i.e.,
GetAccessKeyLastUsed(ctx context.Context, input *iam.GetAccessKeyLastUsedInput)
(*iam.GetAccessKeyLastUsedOutput, error)), update AwsClient.GetAccessKeyLastUsed
to pass that ctx through to c.iamClient.GetAccessKeyLastUsed(ctx, input), and
update any callers (e.g., pkg/controller/awscreds.go) to supply and propagate
the caller's context when invoking AwsClient.GetAccessKeyLastUsed so
cancellations/timeouts are honored.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fb7e59fb-0c66-46e3-ab01-02c2cd7eaaff

📥 Commits

Reviewing files that changed from the base of the PR and between fe26067 and 05f812f.

📒 Files selected for processing (17)
  • cmd/account/aws-creds-rotate.go
  • cmd/account/aws-creds-snapshot.go
  • cmd/account/aws-creds.go
  • cmd/account/cmd.go
  • cmd/account/rotate-secret.go
  • docs/README.md
  • docs/osdctl_account.md
  • docs/osdctl_account_aws-creds.md
  • docs/osdctl_account_aws-creds_rotate.md
  • docs/osdctl_account_aws-creds_snapshot.md
  • docs/osdctl_account_rotate-secret.md
  • pkg/controller/awscreds.go
  • pkg/controller/awscreds_test.go
  • pkg/controller/rotatesecret.go
  • pkg/controller/rotatesecret_test.go
  • pkg/provider/aws/client.go
  • pkg/provider/aws/mock/client.go
✅ Files skipped from review due to trivial changes (1)
  • docs/osdctl_account_rotate-secret.md
🚧 Files skipped from review as they are similar to previous changes (5)
  • cmd/account/aws-creds-snapshot.go
  • docs/osdctl_account.md
  • pkg/controller/rotatesecret_test.go
  • cmd/account/aws-creds.go
  • pkg/controller/rotatesecret.go

Comment thread cmd/account/aws-creds-rotate.go
Comment thread pkg/controller/awscreds.go Outdated
Comment thread pkg/provider/aws/client.go Outdated
@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch 5 times, most recently from ee34e8d to d69a726 Compare June 10, 2026 22:12
@clcollins

clcollins commented Jun 10, 2026

Copy link
Copy Markdown
Member

cmd/account/rotate-secret.go:166 — The RotateSecretInput constructed here does not set In, so it defaults to nil. It also does not set UpdateManagedAdminCreds, so it defaults to false.

RotateSecret in pkg/controller/rotatesecret.go gates the managed-admin rotation block (line 254) on input.UpdateManagedAdminCreds. With this field false, the old rotate-secret command skips managed-admin key rotation entirely.

Separately, input.In being nil reaches confirmFrom(input.In) at rotatesecret.go:820 (via preflightCheckArtifacts) when --ccs is passed on a BYOC account and the byoc secret is not found. It also reaches fmt.Fscanln(in, &choice) at rotatesecret.go:435 (via rotateCcsAdminCredentialscreateAccessKeyWithRetry) when --ccs is passed and the IAM user has 2 access keys. fmt.Fscanln(nil, ...) panics with a nil pointer dereference.

^ Claude


Chris, the human: I think that's right, if I follow along through that path (the nil pointer bits)

@clcollins

clcollins commented Jun 10, 2026

Copy link
Copy Markdown
Member

pkg/controller/rotatesecret.go:973rotateCcsAdminCredentials calls createAccessKeyWithRetry to create a new osdCcsAdmin access key. If updateSecret at line 994 fails, the newly created key remains in AWS IAM with no cleanup.

The managed-admin path (lines 272-286) has a defer that deletes the new key if rotationCommitted is false. rotateCcsAdminCredentials has no equivalent defer. The caller at line 346 has no reference to the CCS key ID and cannot delete it on error.

The full key ID is not printed — reportAccessKeys at line 983 uses truncateKeyID, which shows only the first 4 and last 4 characters. An SRE would need to list keys for osdCcsAdmin in the AWS IAM console to find the orphaned key.

^ Also Claude

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch from d69a726 to 7850b58 Compare June 11, 2026 00:04
@nephomaniac

Copy link
Copy Markdown
Contributor Author

All three fixed in 7850b58 — thanks Chris (and Claude) for the thorough review.

  1. UpdateManagedAdminCreds: true added to rotate-secret's RotateSecretInput. Without this the deprecated command silently skipped managed-admin rotation.
  2. In: os.Stdin added to prevent nil pointer dereference when fmt.Fscanln or confirmFrom is called during CCS rotation or preflight checks.
  3. CCS key rollback defer added to rotateCcsAdminCredentials. If updateSecret fails after key creation, the new key is deleted with the full key ID logged. Matches the existing managed-admin pattern.

@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

@nephomaniac: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants