Skip to content

CNTRLPLANE-3701: simplify logging, ensure proper fields in reoncilers#8802

Open
stevekuznetsov wants to merge 2 commits into
openshift:mainfrom
stevekuznetsov:skuznets/controller-logs
Open

CNTRLPLANE-3701: simplify logging, ensure proper fields in reoncilers#8802
stevekuznetsov wants to merge 2 commits into
openshift:mainfrom
stevekuznetsov:skuznets/controller-logs

Conversation

@stevekuznetsov

@stevekuznetsov stevekuznetsov commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

We want to ensure that contextual, structural logging is used everywhere. When it's not, crucial information like what reconciler is spitting out the log or what key is being reconciled is missing, leaving users with no easy way to write structured queries over our outputs. Removed the pkg/log export as it's always the wrong thing to use.

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Refactor
    • Standardized CLI/infra/OADP command logging to use scoped NewLogger() instances (including consistent timestamp formatting).
    • Controllers and admission/webhook handlers now derive logging from request/reconcile context for more consistent, scoped diagnostics.
    • Added deterministic controller naming constants for stable controller-runtime log identifiers.
    • Updated shared command wiring for platform/infra helpers (including Azure credential setup) and removed the old global CLI logger export.
  • Tests
    • Updated unit tests to match the new logger wiring and constructor signatures.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This pull request introduces cmd/util.NewLogger(), a new zap-backed logr.Logger constructor with RFC3339 timestamp encoding, and removes the previously exported cmd/log.Log singleton. Every CLI command across cmd/ and product-cli/cmd/ is migrated to use cmdutil.NewLogger() for logging and cmdutil.GetClient() for Kubernetes client acquisition. AWS credential type CredentialsSecretData, Azure credential type AzureCreds, and helper functions (GenerateSSHKeys, ParseAWSTags, SetupAzureCredentials, GenerateRoleAssignmentName) are moved to reference cmd/util exclusively.

On the controller side, stored logr.Logger fields are removed from HostedControlPlaneReconciler, GCPPrivateServiceConnectReconciler, GCPPrivateServiceObserver, ManagedCAObserver, ConfigMapWebhookHandler, PodWebhookHandler, and SnapshotReconciler; all logging now derives from ctrl.LoggerFrom(ctx). PrivateServiceObserver gains a typed clientset field and a shared informer watch. HostedControlPlaneReconciler.SetupWithManager gains an explicit logger logr.Logger parameter. Exported ControllerName constants and .Named() registration are added to all controller registration paths.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

@openshift-ci openshift-ci Bot requested review from clebs and devguyio June 22, 2026 19:17
@openshift-ci openshift-ci Bot added area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/platform/azure PR/issue for Azure (AzurePlatform) platform area/platform/gcp PR/issue for GCP (GCPPlatform) platform area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/platform/none PR/issue for None (NonePlatform) platform - user-supplied infrastructure area/platform/openstack PR/issue for OpenStack (OpenStackPlatform) platform area/platform/powervs PR/issue for PowerVS (PowerVSPlatform) platform and removed do-not-merge/needs-area labels Jun 22, 2026
@stevekuznetsov stevekuznetsov force-pushed the skuznets/controller-logs branch 2 times, most recently from 6b7e682 to 3992c0f Compare June 22, 2026 19:24

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go (1)

65-75: ⚠️ Potential issue | 🟡 Minor

Use the informer lister in Reconcile instead of direct API calls.

SetupWithManager creates a shared informer and watches it (lines 102-128), but Reconcile bypasses this by making direct API calls via r.clientset.CoreV1().Services(req.Namespace).Get(). This defeats the caching benefit of the informer.

Store the informer's lister during setup and use services.Lister().Services(req.Namespace).Get(req.Name) in Reconcile to read from the cache.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go`
around lines 65 - 75, The PrivateServiceObserver struct makes direct API calls
in the Reconcile method using r.clientset.CoreV1().Services(req.Namespace).Get()
instead of leveraging the informer lister that is set up in SetupWithManager. To
fix this, add a services lister field to the PrivateServiceObserver struct to
store the informer's lister, initialize this lister in SetupWithManager when the
shared informer is created, and then update the Reconcile method to use
services.Lister().Services(req.Namespace).Get() instead of the direct client API
call to benefit from caching.
🧹 Nitpick comments (10)
product-cli/cmd/cluster/aws/destroy.go (1)

32-41: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Use the pre-initialized logger from opts.Log instead of allocating a new one.

Line 39 creates a new logger instance for every error that occurs. The opts parameter carries a pre-initialized logr.Logger (set by the caller in cluster.go) that should be reused for consistency and efficiency. This follows the established pattern elsewhere (e.g., cmd/cluster/core/destroy.go).

♻️ Proposed fix
 		if err = hypershiftaws.DestroyCluster(cmd.Context(), opts); err != nil {
-			cmdutil.NewLogger().Error(err, "Failed to destroy cluster")
+			opts.Log.Error(err, "Failed to destroy cluster")
 			return err
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@product-cli/cmd/cluster/aws/destroy.go` around lines 32 - 41, The RunE
function in the destroy command is creating a new logger instance with
cmdutil.NewLogger() on each error instead of reusing the pre-initialized logger
available in opts.Log. Replace the cmdutil.NewLogger().Error() call with
opts.Log.Error() to use the existing logger that was set by the caller, ensuring
consistency and avoiding unnecessary allocations following the established
pattern in similar files like cmd/cluster/core/destroy.go.
product-cli/cmd/cluster/agent/destroy.go (1)

18-22: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Use the pre-initialized logger from opts.Log instead of allocating a new one.

Line 20 creates a new logger instance for every error that occurs. The opts parameter carries a pre-initialized logr.Logger (set by the caller in cluster.go) that should be reused for consistency and efficiency. This follows the established pattern elsewhere (e.g., cmd/cluster/core/destroy.go).

♻️ Proposed fix
 	cmd.RunE = func(cmd *cobra.Command, args []string) error {
 		if err := agent.DestroyCluster(cmd.Context(), opts); err != nil {
-			cmdutil.NewLogger().Error(err, "Failed to destroy cluster")
+			opts.Log.Error(err, "Failed to destroy cluster")
 			return err
 		}

 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@product-cli/cmd/cluster/agent/destroy.go` around lines 18 - 22, In the RunE
function of the destroy command, replace the inefficient `cmdutil.NewLogger()`
call in the error handling block with the pre-initialized logger from
`opts.Log`. Instead of creating a new logger instance each time an error occurs
when `agent.DestroyCluster` fails, use the logger that was already initialized
and passed in through the options parameter. This maintains consistency with the
established pattern used elsewhere in the codebase and avoids unnecessary logger
allocations.
product-cli/cmd/cluster/openstack/destroy.go (1)

16-39: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Use the pre-initialized logger from opts.Log instead of allocating a new one.

Line 23 creates a new logger instance and stores it in a local variable. The opts parameter carries a pre-initialized logr.Logger (set by the caller in cluster.go) that should be reused for consistency and efficiency. This follows the established pattern elsewhere (e.g., cmd/cluster/core/destroy.go).

♻️ Proposed fix
 	cmd.Run = func(cmd *cobra.Command, args []string) {
 		ctx, cancel := context.WithCancel(context.Background())
 		defer cancel()

 		sigs := make(chan os.Signal, 1)
 		signal.Notify(sigs, syscall.SIGINT)
 		go func() {
 			<-sigs
 			cancel()
 		}()

 		if err := openstack.DestroyCluster(ctx, opts); err != nil {
-			logger.Error(err, "Failed to destroy cluster")
+			opts.Log.Error(err, "Failed to destroy cluster")
 			os.Exit(1)
 		}
 	}

-	logger := cmdutil.NewLogger()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@product-cli/cmd/cluster/openstack/destroy.go` around lines 16 - 39, In the
NewDestroyCommand function, replace the local logger initialization (logger :=
cmdutil.NewLogger()) with direct use of the pre-initialized logger available in
the opts parameter. Update the logger.Error call within the Run function to use
opts.Log instead of the local logger variable. This ensures consistency with the
established pattern in the codebase and reuses the logger that was already
initialized by the caller.
product-cli/cmd/cluster/kubevirt/destroy.go (1)

18-24: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Use the pre-initialized logger from opts.Log instead of allocating a new one.

Line 20 creates a new logger instance for every error that occurs. The opts parameter carries a pre-initialized logr.Logger (set by the caller in cluster.go) that should be reused for consistency and efficiency. This follows the established pattern elsewhere (e.g., cmd/cluster/core/destroy.go).

♻️ Proposed fix
 	cmd.RunE = func(cmd *cobra.Command, args []string) error {
 		if err := none.DestroyCluster(cmd.Context(), opts); err != nil {
-			cmdutil.NewLogger().Error(err, "Failed to destroy cluster")
+			opts.Log.Error(err, "Failed to destroy cluster")
 			return err
 		}
 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@product-cli/cmd/cluster/kubevirt/destroy.go` around lines 18 - 24, The RunE
function in the destroy command is creating a new logger instance with
cmdutil.NewLogger() for the error log, which is inefficient and inconsistent
with the pattern used elsewhere. Instead, replace the
cmdutil.NewLogger().Error() call with opts.Log.Error() to reuse the
pre-initialized logger that is already available in the opts parameter (which is
passed by the caller and carries the context-aware logger configuration). This
ensures consistency, efficiency, and proper log context propagation throughout
the cluster destruction operation.
cmd/cluster/azure/destroy.go (1)

49-49: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Redundant logger creation.

opts.Log is already initialized in cluster.go line 51. Creating a separate logger here for error reporting in cmd.Run means this command uses two different logger instances. Consider using opts.Log consistently:

-	logger := util.NewLogger()
 	cmd.Run = func(cmd *cobra.Command, args []string) {
 		ctx, cancel := context.WithCancel(context.Background())
 		defer cancel()
 
 		sigs := make(chan os.Signal, 1)
 		signal.Notify(sigs, syscall.SIGINT)
 		go func() {
 			<-sigs
 			cancel()
 		}()
 
 		if err := DestroyCluster(ctx, opts); err != nil {
-			logger.Error(err, "Failed to destroy cluster")
+			opts.Log.Error(err, "Failed to destroy cluster")
 			os.Exit(1)
 		}
 	}

This pattern appears across all destroy commands in this layer.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/cluster/azure/destroy.go` at line 49, The code in the destroy command is
creating a redundant logger instance using util.NewLogger() when opts.Log is
already initialized and available. Replace the logger assignment on line 49 to
use opts.Log instead of creating a new logger, and update all references to the
logger variable throughout the cmd.Run function to consistently use opts.Log.
This ensures the destroy command uses a single, consistent logger instance
rather than maintaining two separate logger instances.
cmd/infra/azure/rbac.go (1)

219-219: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Reuse one logger instance inside assignRole.

Lines 219–275 repeatedly call cmdutil.NewLogger() in the same request path. Create one local logger once and reuse it across branches to avoid repeated setup and keep log context consistent.

Proposed fix
 func (r *RBACManager) assignRole(ctx context.Context, client roleAssignmentClient, infraID, component, assigneeID, role, scope string) error {
+	logger := cmdutil.NewLogger().WithValues("infraID", infraID, "component", component)
 	// Generate the role assignment name
 	roleAssignmentName := cmdutil.GenerateRoleAssignmentName(infraID, component, scope)
@@
-				cmdutil.NewLogger().Info("Skipping role assignment creation, matching assignment already exists.", "role", role, "assigneeID", assigneeID, "scope", scope)
+				logger.Info("Skipping role assignment creation, matching assignment already exists.", "role", role, "assigneeID", assigneeID, "scope", scope)
@@
-			cmdutil.NewLogger().Info("Skipping role assignment creation, role assignment already exists.", "role", role, "assigneeID", assigneeID, "scope", scope)
+			logger.Info("Skipping role assignment creation, role assignment already exists.", "role", role, "assigneeID", assigneeID, "scope", scope)
@@
-		cmdutil.NewLogger().Info("Deleting stale role assignment with mismatched principal",
+		logger.Info("Deleting stale role assignment with mismatched principal",
@@
-				cmdutil.NewLogger().Info("Get not permitted; will attempt create and rely on 409 for idempotency.", "role", role, "assigneeID", assigneeID, "scope", scope)
+				logger.Info("Get not permitted; will attempt create and rely on 409 for idempotency.", "role", role, "assigneeID", assigneeID, "scope", scope)
@@
-			cmdutil.NewLogger().Info("Failed role assignment creation, role assignment already exists.", "role", role, "assigneeID", assigneeID, "scope", scope)
+			logger.Info("Failed role assignment creation, role assignment already exists.", "role", role, "assigneeID", assigneeID, "scope", scope)
@@
-	cmdutil.NewLogger().Info("successfully created role assignment", "role", role, "assigneeID", assigneeID, "scope", scope)
+	logger.Info("successfully created role assignment", "role", role, "assigneeID", assigneeID, "scope", scope)

Also applies to: 235-243, 256-257, 270-275

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/infra/azure/rbac.go` at line 219, The assignRole function repeatedly
calls cmdutil.NewLogger() across multiple branches (lines 219-275), which
creates unnecessary logger instances and can be inefficient. Create a single
logger instance once at the beginning of the assignRole function and reuse it
throughout all the branches and conditional blocks instead of calling
cmdutil.NewLogger() each time. Replace all occurrences of
cmdutil.NewLogger().Info() and similar calls with the reused local logger
instance to maintain consistent log context and avoid repeated setup overhead.
cmd/infra/powervs/create.go (1)

230-237: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Bind logger name after flag parsing.

On Line 230, opts.InfraID is still at its default when NewCreateCommand() runs, so .WithName(opts.InfraID) can end up empty even when --infra-id is provided. Initialize the named logger inside RunE after flags are applied.

Proposed fix
-	logger := cmdutil.NewLogger().WithName(opts.InfraID)
 	cmd.RunE = func(cmd *cobra.Command, args []string) error {
+		logger := cmdutil.NewLogger().WithName(opts.InfraID)
 		if err := opts.Run(cmd.Context(), logger); err != nil {
 			logger.Error(err, "Failed to create infrastructure")
 			return err
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/infra/powervs/create.go` around lines 230 - 237, The logger is being
initialized with opts.InfraID before the command flags are parsed, causing
opts.InfraID to be at its default value. Move the logger initialization line
`logger := cmdutil.NewLogger().WithName(opts.InfraID)` from outside the RunE
function into the beginning of the RunE anonymous function, after flag parsing
has occurred. This ensures opts.InfraID contains the actual value from the
--infra-id flag when the named logger is created.
hypershift-operator/controllers/nodepool/nodepool_controller.go (1)

156-156: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Extract hardcoded controller name to a constant for consistency.

The main controller uses ControllerName constant at Line 134, but the secretjanitor controller uses a hardcoded string. For consistency and maintainability, extract this to a package-level constant.

♻️ Proposed refactor
 const ControllerName = "nodepool"
+const SecretJanitorControllerName = "secretjanitor"

Then update the usage:

 	if err := ctrl.NewControllerManagedBy(mgr).
-		Named("secretjanitor").
+		Named(SecretJanitorControllerName).
 		For(&corev1.Secret{}, builder.WithPredicates(supportutil.PredicatesForHostedClusterAnnotationScoping(mgr.GetClient()))).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@hypershift-operator/controllers/nodepool/nodepool_controller.go` at line 156,
The Named() method call uses a hardcoded string "secretjanitor" instead of a
constant, inconsistent with how the main controller uses the ControllerName
constant. Extract the hardcoded "secretjanitor" string to a package-level
constant (similar to ControllerName) and replace the hardcoded string in the
Named() call with this new constant to maintain consistency across the
controller definitions.
control-plane-operator/main.go (1)

536-536: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Use the exported controller-name constant here instead of a duplicated literal.

This avoids drift if the controller name changes in one place but not the other.

Suggested change
-		}).SetupWithManager(mgr, upsert.New(enableCIDebugOutput).CreateOrUpdate, hcp, mgr.GetLogger().WithName("hostedcontrolplane")); err != nil {
+		}).SetupWithManager(
+			mgr,
+			upsert.New(enableCIDebugOutput).CreateOrUpdate,
+			hcp,
+			mgr.GetLogger().WithName(hostedcontrolplane.ControllerName),
+		); err != nil {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@control-plane-operator/main.go` at line 536, Replace the hardcoded string
literal "hostedcontrolplane" in the mgr.GetLogger().WithName() call on line 536
of the SetupWithManager method invocation with an exported controller-name
constant. This ensures that if the controller name needs to be updated in the
future, it only needs to be changed in one place, preventing drift between
different usages of the same controller name throughout the codebase.
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (1)

605-605: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Use structured fields for the release-image mismatch log.

Line 605 still builds one long message string, which makes filtering harder in structured log backends.

Suggested change
-		log.Info("releaseImage is " + util.HCPControlPlaneReleaseImage(hostedControlPlane) + ", but this operator is configured for " + r.OperateOnReleaseImage + ", skipping reconciliation")
+		log.Info(
+			"Skipping reconciliation due to release image mismatch",
+			"hcpReleaseImage", util.HCPControlPlaneReleaseImage(hostedControlPlane),
+			"operatorReleaseImage", r.OperateOnReleaseImage,
+		)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
at line 605, The log.Info statement at line 605 builds a single concatenated
string message using the "+" operator, which prevents structured log backends
from efficiently filtering and processing the log data. Replace the string
concatenation with structured logging fields by passing the release image values
and operator configuration as separate key-value pairs to the log.Info call
instead of concatenating them into a single message string. The values
util.HCPControlPlaneReleaseImage(hostedControlPlane) and r.OperateOnReleaseImage
should be passed as structured fields alongside a descriptive log message.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/infra/powervs/destroy.go`:
- Line 120: The logger initialization with WithName(opts.InfraID) on line 120
occurs before Cobra parses the command-line flags, so opts.InfraID contains its
default value instead of the runtime --infra-id flag value. Move the logger
creation statement to after the Cobra flag parsing completes (typically after
opts.Complete() or similar flag binding method is called) so that opts.InfraID
contains the actual value provided by the user at runtime.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`:
- Line 1559: The logr.Error function does not perform printf-style formatting on
the message parameter, so the `%w` verb in the message string at line 1559 will
be logged literally instead of substituting the actual error value. Remove the
`%w` printf verb from the message string in the log.Error call and any other
occurrences at lines 1766 and 1799 in the same file. The error value should be
passed as a separate parameter to logr.Error if needed for structured logging,
not embedded in the message string itself.

---

Outside diff comments:
In
`@control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go`:
- Around line 65-75: The PrivateServiceObserver struct makes direct API calls in
the Reconcile method using r.clientset.CoreV1().Services(req.Namespace).Get()
instead of leveraging the informer lister that is set up in SetupWithManager. To
fix this, add a services lister field to the PrivateServiceObserver struct to
store the informer's lister, initialize this lister in SetupWithManager when the
shared informer is created, and then update the Reconcile method to use
services.Lister().Services(req.Namespace).Get() instead of the direct client API
call to benefit from caching.

---

Nitpick comments:
In `@cmd/cluster/azure/destroy.go`:
- Line 49: The code in the destroy command is creating a redundant logger
instance using util.NewLogger() when opts.Log is already initialized and
available. Replace the logger assignment on line 49 to use opts.Log instead of
creating a new logger, and update all references to the logger variable
throughout the cmd.Run function to consistently use opts.Log. This ensures the
destroy command uses a single, consistent logger instance rather than
maintaining two separate logger instances.

In `@cmd/infra/azure/rbac.go`:
- Line 219: The assignRole function repeatedly calls cmdutil.NewLogger() across
multiple branches (lines 219-275), which creates unnecessary logger instances
and can be inefficient. Create a single logger instance once at the beginning of
the assignRole function and reuse it throughout all the branches and conditional
blocks instead of calling cmdutil.NewLogger() each time. Replace all occurrences
of cmdutil.NewLogger().Info() and similar calls with the reused local logger
instance to maintain consistent log context and avoid repeated setup overhead.

In `@cmd/infra/powervs/create.go`:
- Around line 230-237: The logger is being initialized with opts.InfraID before
the command flags are parsed, causing opts.InfraID to be at its default value.
Move the logger initialization line `logger :=
cmdutil.NewLogger().WithName(opts.InfraID)` from outside the RunE function into
the beginning of the RunE anonymous function, after flag parsing has occurred.
This ensures opts.InfraID contains the actual value from the --infra-id flag
when the named logger is created.

In
`@control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`:
- Line 605: The log.Info statement at line 605 builds a single concatenated
string message using the "+" operator, which prevents structured log backends
from efficiently filtering and processing the log data. Replace the string
concatenation with structured logging fields by passing the release image values
and operator configuration as separate key-value pairs to the log.Info call
instead of concatenating them into a single message string. The values
util.HCPControlPlaneReleaseImage(hostedControlPlane) and r.OperateOnReleaseImage
should be passed as structured fields alongside a descriptive log message.

In `@control-plane-operator/main.go`:
- Line 536: Replace the hardcoded string literal "hostedcontrolplane" in the
mgr.GetLogger().WithName() call on line 536 of the SetupWithManager method
invocation with an exported controller-name constant. This ensures that if the
controller name needs to be updated in the future, it only needs to be changed
in one place, preventing drift between different usages of the same controller
name throughout the codebase.

In `@hypershift-operator/controllers/nodepool/nodepool_controller.go`:
- Line 156: The Named() method call uses a hardcoded string "secretjanitor"
instead of a constant, inconsistent with how the main controller uses the
ControllerName constant. Extract the hardcoded "secretjanitor" string to a
package-level constant (similar to ControllerName) and replace the hardcoded
string in the Named() call with this new constant to maintain consistency across
the controller definitions.

In `@product-cli/cmd/cluster/agent/destroy.go`:
- Around line 18-22: In the RunE function of the destroy command, replace the
inefficient `cmdutil.NewLogger()` call in the error handling block with the
pre-initialized logger from `opts.Log`. Instead of creating a new logger
instance each time an error occurs when `agent.DestroyCluster` fails, use the
logger that was already initialized and passed in through the options parameter.
This maintains consistency with the established pattern used elsewhere in the
codebase and avoids unnecessary logger allocations.

In `@product-cli/cmd/cluster/aws/destroy.go`:
- Around line 32-41: The RunE function in the destroy command is creating a new
logger instance with cmdutil.NewLogger() on each error instead of reusing the
pre-initialized logger available in opts.Log. Replace the
cmdutil.NewLogger().Error() call with opts.Log.Error() to use the existing
logger that was set by the caller, ensuring consistency and avoiding unnecessary
allocations following the established pattern in similar files like
cmd/cluster/core/destroy.go.

In `@product-cli/cmd/cluster/kubevirt/destroy.go`:
- Around line 18-24: The RunE function in the destroy command is creating a new
logger instance with cmdutil.NewLogger() for the error log, which is inefficient
and inconsistent with the pattern used elsewhere. Instead, replace the
cmdutil.NewLogger().Error() call with opts.Log.Error() to reuse the
pre-initialized logger that is already available in the opts parameter (which is
passed by the caller and carries the context-aware logger configuration). This
ensures consistency, efficiency, and proper log context propagation throughout
the cluster destruction operation.

In `@product-cli/cmd/cluster/openstack/destroy.go`:
- Around line 16-39: In the NewDestroyCommand function, replace the local logger
initialization (logger := cmdutil.NewLogger()) with direct use of the
pre-initialized logger available in the opts parameter. Update the logger.Error
call within the Run function to use opts.Log instead of the local logger
variable. This ensures consistency with the established pattern in the codebase
and reuses the logger that was already initialized by the caller.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3788e3e5-969f-465f-b6e9-9f38631bdf8a

📥 Commits

Reviewing files that changed from the base of the PR and between d3162ab and 28e9ae9.

📒 Files selected for processing (76)
  • cmd/bastion/aws/create.go
  • cmd/bastion/aws/destroy.go
  • cmd/cluster/agent/destroy.go
  • cmd/cluster/aws/destroy.go
  • cmd/cluster/azure/destroy.go
  • cmd/cluster/azure/destroy_test.go
  • cmd/cluster/cluster.go
  • cmd/cluster/core/create.go
  • cmd/cluster/core/destroy_test.go
  • cmd/cluster/core/dump.go
  • cmd/cluster/gcp/destroy.go
  • cmd/cluster/kubevirt/destroy.go
  • cmd/cluster/none/destroy.go
  • cmd/cluster/openstack/destroy.go
  • cmd/cluster/powervs/destroy.go
  • cmd/consolelogs/aws/getlogs.go
  • cmd/infra/aws/create.go
  • cmd/infra/aws/create_cli_role.go
  • cmd/infra/aws/create_iam.go
  • cmd/infra/aws/destroy.go
  • cmd/infra/aws/destroy_iam.go
  • cmd/infra/azure/create.go
  • cmd/infra/azure/create_iam.go
  • cmd/infra/azure/destroy.go
  • cmd/infra/azure/destroy_iam.go
  • cmd/infra/azure/rbac.go
  • cmd/infra/gcp/create_iam.go
  • cmd/infra/gcp/create_infra.go
  • cmd/infra/gcp/destroy_iam.go
  • cmd/infra/gcp/destroy_infra.go
  • cmd/infra/powervs/create.go
  • cmd/infra/powervs/destroy.go
  • cmd/log/log.go
  • cmd/nodepool/core/create.go
  • cmd/nodepool/powervs/create.go
  • cmd/oadp/backup.go
  • cmd/oadp/restore.go
  • cmd/oadp/schedule.go
  • cmd/oadp/schedule_test.go
  • cmd/util/azure_test.go
  • cmd/util/log.go
  • control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go
  • control-plane-operator/controllers/azureprivatelinkservice/controller.go
  • control-plane-operator/controllers/gcpprivateserviceconnect/observer.go
  • control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/cmca/configmap_observer.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/cmca/setup.go
  • control-plane-operator/main.go
  • hypershift-operator/controllers/auditlogpersistence/configmap_webhook.go
  • hypershift-operator/controllers/auditlogpersistence/configmap_webhook_test.go
  • hypershift-operator/controllers/auditlogpersistence/pod_webhook.go
  • hypershift-operator/controllers/auditlogpersistence/pod_webhook_test.go
  • hypershift-operator/controllers/auditlogpersistence/snapshot_controller.go
  • hypershift-operator/controllers/auditlogpersistence/snapshot_controller_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/platform/aws/controller.go
  • hypershift-operator/controllers/platform/azure/controller.go
  • hypershift-operator/controllers/platform/gcp/privateserviceconnect_controller.go
  • hypershift-operator/controllers/platform/gcp/privateserviceconnect_controller_test.go
  • hypershift-operator/controllers/sharedingress/sharedingress_controller.go
  • hypershift-operator/controllers/supportedversion/reconciler.go
  • hypershift-operator/controllers/uwmtelemetry/uwm_telemetry.go
  • hypershift-operator/main.go
  • product-cli/cmd/cluster/agent/destroy.go
  • product-cli/cmd/cluster/aws/destroy.go
  • product-cli/cmd/cluster/cluster.go
  • product-cli/cmd/cluster/kubevirt/destroy.go
  • product-cli/cmd/cluster/openstack/destroy.go
  • product-cli/cmd/iam/azure/create.go
  • product-cli/cmd/iam/azure/destroy.go
  • product-cli/cmd/infra/azure/create.go
  • product-cli/cmd/infra/azure/destroy.go
  • product-cli/cmd/nodepool/destroy.go
💤 Files with no reviewable changes (4)
  • cmd/log/log.go
  • hypershift-operator/controllers/auditlogpersistence/snapshot_controller_test.go
  • hypershift-operator/main.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go

Comment thread cmd/infra/powervs/destroy.go Outdated
@codecov

codecov Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 25.30612% with 183 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.13%. Comparing base (c4fd7dc) to head (104ae07).
⚠️ Report is 37 commits behind head on main.

Files with missing lines Patch % Lines
cmd/infra/azure/create.go 0.00% 21 Missing ⚠️
...ostedcontrolplane/hostedcontrolplane_controller.go 59.61% 20 Missing and 1 partial ⚠️
cmd/infra/azure/create_iam.go 0.00% 20 Missing ⚠️
cmd/infra/azure/destroy_iam.go 0.00% 15 Missing ⚠️
...ntrollers/auditlogpersistence/configmap_webhook.go 13.33% 13 Missing ⚠️
...tor/controllers/auditlogpersistence/pod_webhook.go 13.33% 13 Missing ⚠️
...s/platform/gcp/privateserviceconnect_controller.go 11.11% 8 Missing ⚠️
cmd/infra/azure/destroy.go 0.00% 7 Missing ⚠️
...ollers/awsprivatelink/awsprivatelink_controller.go 0.00% 5 Missing ⚠️
...trollers/sharedingress/sharedingress_controller.go 0.00% 5 Missing ⚠️
... and 48 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8802      +/-   ##
==========================================
+ Coverage   42.11%   42.13%   +0.01%     
==========================================
  Files         767      768       +1     
  Lines       95069    95095      +26     
==========================================
+ Hits        40039    40067      +28     
+ Misses      52216    52214       -2     
  Partials     2814     2814              
Files with missing lines Coverage Δ
cmd/cluster/core/create.go 61.70% <100.00%> (ø)
cmd/infra/azure/rbac.go 47.18% <100.00%> (ø)
cmd/infra/gcp/destroy_iam.go 43.85% <100.00%> (ø)
cmd/oadp/schedule.go 68.19% <100.00%> (ø)
cmd/util/log.go 100.00% <100.00%> (ø)
...rollers/auditlogpersistence/snapshot_controller.go 62.39% <100.00%> (+0.25%) ⬆️
product-cli/cmd/cluster/cluster.go 100.00% <100.00%> (ø)
cmd/bastion/aws/create.go 0.00% <0.00%> (ø)
cmd/bastion/aws/destroy.go 0.00% <0.00%> (ø)
cmd/cluster/agent/destroy.go 0.00% <0.00%> (ø)
... and 55 more

... and 1 file with indirect coverage changes

Flag Coverage Δ
cmd-support 35.47% <15.78%> (+<0.01%) ⬆️
cpo-hostedcontrolplane 44.55% <59.61%> (+0.02%) ⬆️
cpo-other 44.26% <28.57%> (+<0.01%) ⬆️
hypershift-operator 51.90% <11.32%> (-0.02%) ⬇️
other 31.69% <25.00%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@hypershift-operator/controllers/platform/gcp/privateserviceconnect_controller.go`:
- Line 77: The logger.Info call that logs "Initialized GCP platform information"
includes the raw projectID field, which is customer-identifying data that should
not be exposed in shared logs. Remove the projectID field from the logger.Info
call to avoid leaking tenant context, or alternatively redact the identifier
with a masked or hashed value instead of logging the raw project ID.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2cae6c22-dd7d-49c2-a615-11b65355bb5a

📥 Commits

Reviewing files that changed from the base of the PR and between 28e9ae9 and 3992c0f.

📒 Files selected for processing (76)
  • cmd/bastion/aws/create.go
  • cmd/bastion/aws/destroy.go
  • cmd/cluster/agent/destroy.go
  • cmd/cluster/aws/destroy.go
  • cmd/cluster/azure/destroy.go
  • cmd/cluster/azure/destroy_test.go
  • cmd/cluster/cluster.go
  • cmd/cluster/core/create.go
  • cmd/cluster/core/destroy_test.go
  • cmd/cluster/core/dump.go
  • cmd/cluster/gcp/destroy.go
  • cmd/cluster/kubevirt/destroy.go
  • cmd/cluster/none/destroy.go
  • cmd/cluster/openstack/destroy.go
  • cmd/cluster/powervs/destroy.go
  • cmd/consolelogs/aws/getlogs.go
  • cmd/infra/aws/create.go
  • cmd/infra/aws/create_cli_role.go
  • cmd/infra/aws/create_iam.go
  • cmd/infra/aws/destroy.go
  • cmd/infra/aws/destroy_iam.go
  • cmd/infra/azure/create.go
  • cmd/infra/azure/create_iam.go
  • cmd/infra/azure/destroy.go
  • cmd/infra/azure/destroy_iam.go
  • cmd/infra/azure/rbac.go
  • cmd/infra/gcp/create_iam.go
  • cmd/infra/gcp/create_infra.go
  • cmd/infra/gcp/destroy_iam.go
  • cmd/infra/gcp/destroy_infra.go
  • cmd/infra/powervs/create.go
  • cmd/infra/powervs/destroy.go
  • cmd/log/log.go
  • cmd/nodepool/core/create.go
  • cmd/nodepool/powervs/create.go
  • cmd/oadp/backup.go
  • cmd/oadp/restore.go
  • cmd/oadp/schedule.go
  • cmd/oadp/schedule_test.go
  • cmd/util/azure_test.go
  • cmd/util/log.go
  • control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go
  • control-plane-operator/controllers/azureprivatelinkservice/controller.go
  • control-plane-operator/controllers/gcpprivateserviceconnect/observer.go
  • control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/cmca/configmap_observer.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/cmca/setup.go
  • control-plane-operator/main.go
  • hypershift-operator/controllers/auditlogpersistence/configmap_webhook.go
  • hypershift-operator/controllers/auditlogpersistence/configmap_webhook_test.go
  • hypershift-operator/controllers/auditlogpersistence/pod_webhook.go
  • hypershift-operator/controllers/auditlogpersistence/pod_webhook_test.go
  • hypershift-operator/controllers/auditlogpersistence/snapshot_controller.go
  • hypershift-operator/controllers/auditlogpersistence/snapshot_controller_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • hypershift-operator/controllers/platform/aws/controller.go
  • hypershift-operator/controllers/platform/azure/controller.go
  • hypershift-operator/controllers/platform/gcp/privateserviceconnect_controller.go
  • hypershift-operator/controllers/platform/gcp/privateserviceconnect_controller_test.go
  • hypershift-operator/controllers/sharedingress/sharedingress_controller.go
  • hypershift-operator/controllers/supportedversion/reconciler.go
  • hypershift-operator/controllers/uwmtelemetry/uwm_telemetry.go
  • hypershift-operator/main.go
  • product-cli/cmd/cluster/agent/destroy.go
  • product-cli/cmd/cluster/aws/destroy.go
  • product-cli/cmd/cluster/cluster.go
  • product-cli/cmd/cluster/kubevirt/destroy.go
  • product-cli/cmd/cluster/openstack/destroy.go
  • product-cli/cmd/iam/azure/create.go
  • product-cli/cmd/iam/azure/destroy.go
  • product-cli/cmd/infra/azure/create.go
  • product-cli/cmd/infra/azure/destroy.go
  • product-cli/cmd/nodepool/destroy.go
💤 Files with no reviewable changes (3)
  • cmd/log/log.go
  • hypershift-operator/controllers/auditlogpersistence/snapshot_controller_test.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller_test.go
✅ Files skipped from review due to trivial changes (15)
  • product-cli/cmd/cluster/openstack/destroy.go
  • cmd/util/log.go
  • cmd/cluster/none/destroy.go
  • cmd/infra/gcp/create_iam.go
  • cmd/infra/gcp/destroy_iam.go
  • cmd/cluster/kubevirt/destroy.go
  • control-plane-operator/main.go
  • hypershift-operator/controllers/sharedingress/sharedingress_controller.go
  • cmd/cluster/agent/destroy.go
  • cmd/cluster/core/destroy_test.go
  • hypershift-operator/controllers/auditlogpersistence/snapshot_controller.go
  • cmd/infra/powervs/create.go
  • cmd/oadp/schedule_test.go
  • product-cli/cmd/nodepool/destroy.go
  • cmd/cluster/azure/destroy.go
🚧 Files skipped from review as they are similar to previous changes (56)
  • product-cli/cmd/infra/azure/create.go
  • product-cli/cmd/cluster/kubevirt/destroy.go
  • hypershift-operator/controllers/platform/aws/controller.go
  • product-cli/cmd/iam/azure/destroy.go
  • product-cli/cmd/infra/azure/destroy.go
  • product-cli/cmd/cluster/aws/destroy.go
  • cmd/bastion/aws/create.go
  • cmd/cluster/openstack/destroy.go
  • cmd/cluster/gcp/destroy.go
  • control-plane-operator/controllers/gcpprivateserviceconnect/psc_endpoint_controller.go
  • hypershift-operator/controllers/auditlogpersistence/configmap_webhook_test.go
  • cmd/cluster/cluster.go
  • cmd/nodepool/powervs/create.go
  • cmd/infra/gcp/destroy_infra.go
  • cmd/infra/gcp/create_infra.go
  • hypershift-operator/controllers/supportedversion/reconciler.go
  • product-cli/cmd/iam/azure/create.go
  • cmd/oadp/schedule.go
  • control-plane-operator/controllers/azureprivatelinkservice/controller.go
  • hypershift-operator/controllers/uwmtelemetry/uwm_telemetry.go
  • cmd/infra/aws/destroy_iam.go
  • hypershift-operator/controllers/platform/azure/controller.go
  • hypershift-operator/controllers/nodepool/nodepool_controller.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/cmca/setup.go
  • hypershift-operator/controllers/auditlogpersistence/pod_webhook_test.go
  • cmd/infra/aws/create_cli_role.go
  • cmd/oadp/backup.go
  • hypershift-operator/controllers/platform/gcp/privateserviceconnect_controller_test.go
  • cmd/infra/azure/create_iam.go
  • cmd/util/azure_test.go
  • cmd/cluster/core/dump.go
  • cmd/bastion/aws/destroy.go
  • cmd/oadp/restore.go
  • hypershift-operator/main.go
  • cmd/cluster/azure/destroy_test.go
  • cmd/cluster/powervs/destroy.go
  • product-cli/cmd/cluster/cluster.go
  • cmd/infra/azure/destroy_iam.go
  • cmd/cluster/core/create.go
  • cmd/infra/azure/destroy.go
  • cmd/consolelogs/aws/getlogs.go
  • cmd/infra/aws/create_iam.go
  • cmd/infra/aws/destroy.go
  • control-plane-operator/controllers/gcpprivateserviceconnect/observer.go
  • hypershift-operator/controllers/auditlogpersistence/configmap_webhook.go
  • cmd/cluster/aws/destroy.go
  • cmd/infra/powervs/destroy.go
  • cmd/nodepool/core/create.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/cmca/configmap_observer.go
  • hypershift-operator/controllers/auditlogpersistence/pod_webhook.go
  • cmd/infra/aws/create.go
  • cmd/infra/azure/create.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • cmd/infra/azure/rbac.go
  • control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go

We want to ensure that contextual, structural logging is used
everywhere. When it's not, crucial information like what reconciler is
spitting out the log or what key is being reconciled is missing, leaving
users with no easy way to write structured queries over our outputs.
Removed the `pkg/log` export as it's always the wrong thing to use.

Signed-off-by: Steve Kuznetsov <stekuznetsov@microsoft.com>
@stevekuznetsov stevekuznetsov force-pushed the skuznets/controller-logs branch from 3992c0f to 9c3df5c Compare June 22, 2026 19:57
Signed-off-by: Steve Kuznetsov <stekuznetsov@microsoft.com>
@csrwng csrwng changed the title *: simplify logging, ensure proper fields in reoncilers CNTRLPLANE-3701: simplify logging, ensure proper fields in reoncilers Jun 24, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 24, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 24, 2026

Copy link
Copy Markdown

@stevekuznetsov: This pull request references CNTRLPLANE-3701 which is a valid jira issue.

Details

In response to this:

We want to ensure that contextual, structural logging is used everywhere. When it's not, crucial information like what reconciler is spitting out the log or what key is being reconciled is missing, leaving users with no easy way to write structured queries over our outputs. Removed the pkg/log export as it's always the wrong thing to use.

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Refactor
  • Standardized CLI/infra/OADP command logging to use scoped NewLogger() instances (including consistent timestamp formatting).
  • Controllers and admission/webhook handlers now derive logging from request/reconcile context for more consistent, scoped diagnostics.
  • Added deterministic controller naming constants for stable controller-runtime log identifiers.
  • Updated shared command wiring for platform/infra helpers (including Azure credential setup) and removed the old global CLI logger export.
  • Tests
  • Updated unit tests to match the new logger wiring and constructor signatures.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@csrwng

csrwng commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

/override codecov/patch

This is simply changing how logging is invoked, no new logic is introduced.

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@csrwng: Overrode contexts on behalf of csrwng: codecov/patch

Details

In response to this:

/override codecov/patch

This is simply changing how logging is invoked, no new logic is introduced.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@csrwng csrwng left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Clean and consistent migration to contextual structured logging.

Key wins:

  • Fixes the r.Log concurrency bug (shared mutable state across reconcile calls)
  • Deterministic ControllerName constants for stable log identifiers
  • Removes cmd/log.Log singleton that controllers shouldn't have been using
  • Incidentally fixes incorrect %w format verbs in logr.Error() calls

Mechanical changes are all consistent — verified a representative sample across controllers, webhooks, and CLI layer.

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2026
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-v2-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2026
@cwbotbot

cwbotbot commented Jun 24, 2026

Copy link
Copy Markdown

Test Results

e2e-aks

e2e-aws

@openshift-ci

openshift-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

@stevekuznetsov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-4-22 104ae07 link true /test e2e-aws-4-22
ci/prow/e2e-v2-gke 104ae07 link true /test e2e-v2-gke

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Jun 24, 2026

Copy link
Copy Markdown

All those are TestCreateClusterHABreakGlassCredentials and TestCreateClusterProxy (different tests). No output from TestCreateCluster (the one without suffix) after line 2417. The test was stuck waiting for EnsureMetricsForwarderWorking for the remaining ~65 minutes until timeout.

Now I have all the evidence I need. Let me produce the final report:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Container test exited with code 127, reason Error
Process did not finish before 2h0m0s timeout
"e2e-aws-4-22" pod "e2e-aws-4-22-hypershift-aws-run-e2e-nested" failed after 2h30m36s (failed containers: test): ContainerFailed one or more containers exited

TestCreateCluster/Main/EnsureMetricsForwarderWorking:
  util_metrics_proxy.go:117: kube-apiserver target via metrics-forwarder not found in Prometheus active targets (will retry)

Summary

The e2e-aws-4-22 job failed because the TestCreateCluster test exceeded the 2-hour process timeout. All 14 other non-skipped tests passed (including the closely-related e2e-aws variant on the same PR). TestCreateCluster ran its many sequential subtests (break-glass-credentials, EnsureHostedClusterImmutability, EnsureGlobalPullSecret, EnsureKubeAPIDNSNameCustomCert, EnsureDefaultSecurityGroupTags, EnsureMetricsForwarderWorking) and was stuck in the EnsureMetricsForwarderWorking subtest polling for the kube-apiserver metrics-forwarder target to appear as healthy in the guest cluster's Prometheus. The test's internal 10-minute poll loop was still running (with only 1 logged retry visible) when the 2h timeout killed the process. This is unrelated to the PR's logging changes — the PR modifies no test code, no metrics-proxy code, and no metrics-forwarder code. The e2e-aws variant of the same PR passed successfully.

Root Cause

The root cause is a test timing flakeTestCreateCluster has too many expensive sequential subtests to reliably complete within the 2h process timeout in the e2e-aws-4-22 variant.

Detailed chain of events:

  1. The test binary started at 02:28:15Z with -test.parallel=20 and a 2h process timeout (deadline: 04:28:15Z).
  2. All 15 non-skipped tests launched concurrently. TestCreateCluster is one of the longest due to its many sequential subtests.
  3. TestCreateCluster progressed through: ValidateHostedClusterMain/break-glass-credentials (involving independent_signers, customer-break-glass/CSR_flow/revocation, sre-break-glass/CSR_flow/revocation — each taking ~130-168s) → EnsureHostedClusterImmutabilityEnsureKubeAPIDNSNameCustomCertEnsureDefaultSecurityGroupTagsEnsureGlobalPullSecret (with DaemonSet readiness waits and pull secret propagation across nodes) → EnsureMetricsForwarderWorking.
  4. EnsureMetricsForwarderWorking started approximately around 03:20Z (based on log interleaving). It waited for endpoint-resolver and metrics-proxy deployments to become ready, then entered a PollUntilContextTimeout loop (15s interval, 10min timeout) checking Prometheus for the metrics-forwarder target.
  5. The Prometheus target was never found — the test logged "kube-apiserver target via metrics-forwarder not found in Prometheus active targets (will retry)" and continued polling.
  6. At 04:28:15Z the 2h process timeout hit, killing all tests including TestCreateCluster which never completed. The process exited with code 127 after the 30m grace period.

Why the metrics-forwarder target was not found: The metrics forwarding data path (guest Prometheus → PodMonitor → metrics-forwarder HAProxy → Route → metrics-proxy → kube-apiserver) involves multiple components across the management and guest clusters. This particular hosted cluster's metrics path likely had a transient infrastructure issue (e.g., slow route propagation, HAProxy pod scheduling delay, or Prometheus scrape config propagation lag) that prevented the target from becoming healthy within the ~65 minutes before the test was killed.

Why this is unrelated to PR #8802: The PR modifies 76 files, all related to logging changes (simplify logging, ensure proper fields in reconcilers). It does not modify any test code under test/e2e/, any metrics-proxy/metrics-forwarder code, any HyperShift framework code, or any deployment manifests. The e2e-aws variant of the same PR passed all tests successfully, confirming the failure is not caused by the PR's changes.

Recommendations
  1. Retest the job — This is a timing flake unrelated to the PR. A /retest should resolve it.
  2. Consider adding a per-test timeout to TestCreateCluster — The test currently has no explicit Go test timeout. Adding t.SetDeadline() or splitting EnsureMetricsForwarderWorking into a separate test would prevent it from blocking the entire suite.
  3. The EnsureMetricsForwarderWorking polling loop silently retries all errors (return false, nil) for up to 10 minutes. If the underlying issue is a permanent misconfiguration (e.g., missing PodMonitor, broken route), the test will always consume the full 10 minutes before failing. Adding fail-fast checks for unrecoverable conditions would improve feedback time.
Evidence
Evidence Detail
Exit code 127 (process killed by timeout)
Timeout 2h process timeout hit at 04:28:15Z (started 02:28:15Z)
Stuck test TestCreateCluster/Main/EnsureMetricsForwarderWorking
Last log from stuck test util_metrics_proxy.go:117: kube-apiserver target via metrics-forwarder not found in Prometheus active targets (will retry)
Tests passed 14/15 non-skipped tests completed successfully
Tests skipped 9 tests (Azure, Cilium, OLM, ExternalOIDC, etc.)
e2e-aws variant (same PR) PASSED — confirms PR changes are not the cause
e2e-aws-upgrade-hypershift-operator (same PR) PASSED
PR file changes 76 files — all logging refactors, no test/metrics code changed
Longest passing test TestCreateClusterProxy at 4079s (68 min)
Failed step e2e-aws-4-22-hypershift-aws-run-e2e-nested (test phase)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/platform/aws PR/issue for AWS (AWSPlatform) platform area/platform/azure PR/issue for Azure (AzurePlatform) platform area/platform/gcp PR/issue for GCP (GCPPlatform) platform area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/platform/none PR/issue for None (NonePlatform) platform - user-supplied infrastructure area/platform/openstack PR/issue for OpenStack (OpenStackPlatform) platform area/platform/powervs PR/issue for PowerVS (PowerVSPlatform) platform jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants