
feat: datumctl compute plugin — deploy and manage workloads from the CLI#113

Draft
scotwells wants to merge 24 commits into feat/federated-deployment-scheduling from feat/datumctl-compute-plugin


@scotwells
Contributor

Summary

Adds the datumctl compute plugin so developers can deploy and manage containerized workloads on Datum Cloud directly from the CLI.

Commands shipped:

  • deploy — push a container image as a workload with flags or a manifest file; waits for rollout
  • destroy — tear down a workload with a confirmation prompt
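  • workloads — list workloads with health, placement, and image columns; workloads describe shows per-city ready counts, degradation annotations, and the container spec (replaces status)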
  • instances — list all running instances across cities, with describe for full detail
  • scale — adjust minimum replica count across all placements
  • rollout — watch live rollout progress
  • restart — trigger a rolling restart of a workload or a specific city
  • quota — inspect per-city instance usage and surface quota-exceeded messages

Revision history is no longer tracked by the CLI: the per-workload ConfigMap ledger and the rollout history and rollout undo subcommands were removed because deployment revisions are becoming a platform concept.

Dependencies

What's not included

  • logs — telemetry service not yet implemented
  • Tests — next step is adding envtest-based integration tests for each command
  • cities / instance-types resource listing commands

Related

Closes #98. Design proposal in #111.

scotwells added a commit that referenced this pull request May 29, 2026
…cheduling base

After rebasing onto feat/federated-deployment-scheduling, go.mod had picked up
the wrong versions of two deps via conflict resolution:

- go.datum.net/network-services-operator was left at v0.1.0 (from #113's old
  go.mod side) instead of v0.21.10-... required by HEAD's LocationBinding usage
- go.miloapis.com/service-catalog v0.0.0-20260527221104 transitively requires
  milo v0.26.1, which has a broken downstreamclient (Apply method missing,
  ClusterName type mismatch). Add a replace directive to pin milo to v0.25.2
  (the version used by the federated-scheduling base) so downstreamclient
  compiles cleanly. service-catalog is updated to the latest available version.

Also apply gofmt alignment fixes to instance_controller.go that surfaced during the rebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@scotwells scotwells changed the base branch from main to feat/federated-deployment-scheduling May 29, 2026 03:30
@scotwells scotwells force-pushed the feat/datumctl-compute-plugin branch from a63c87a to c1186cb May 29, 2026 03:30
scotwells and others added 23 commits May 28, 2026 22:33
Adds the datumctl-compute plugin binary with commands for deploying and
managing containerized workloads on Datum Cloud via the developer CLI.

Commands:
- deploy     — create or update a workload from flags or a manifest file
- destroy    — delete a workload and clean up its revision history
- status     — show health, placement summary, and recent revision info
- instances  — list and describe running instances across cities
- scale      — adjust minimum replica count across placements
- rollout    — watch live progress, view history, and roll back revisions
- restart    — trigger a rolling restart of a workload or specific city
- quota      — inspect per-city instance usage and quota headroom

Closes #98. Depends on datum-cloud/datumctl#198.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Within a project's virtual control plane, all resources live in the
"default" namespace — the project slug is only used to route to the
right control plane URL. Updated all commands to use
util.ResourceNamespace ("default") instead of the project name as the
k8s namespace.

Also corrects the instance type default from "d1-standard-2" to
"datumcloud/d1-standard-2" to match the format the admission webhook
requires.

Discovered while testing against the staging environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
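As a minimal sketch of the instance type fix, a bare name such as "d1-standard-2" can be normalized to the qualified form the webhook expects. The helper `qualifyInstanceType` and the constant `defaultInstanceTypeProvider` are hypothetical; the sketch assumes "datumcloud" is the only default provider and that any name already containing "/" is qualified.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultInstanceTypeProvider is an assumed prefix taken from the
// "datumcloud/d1-standard-2" example; it is not a confirmed constant.
const defaultInstanceTypeProvider = "datumcloud"

// qualifyInstanceType is a hypothetical helper: names that already look like
// "provider/type" are returned unchanged, bare names get the default prefix.
func qualifyInstanceType(name string) string {
	name = strings.TrimSpace(name)
	if name == "" || strings.Contains(name, "/") {
		return name
	}
	return defaultInstanceTypeProvider + "/" + name
}

func main() {
	fmt.Println(qualifyInstanceType("d1-standard-2"))            // datumcloud/d1-standard-2
	fmt.Println(qualifyInstanceType("datumcloud/d1-standard-2")) // datumcloud/d1-standard-2
}
```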
The datumctl module requirement was upgrading controller-runtime to
v0.23.3, which broke compatibility with multicluster-runtime and milo.
Eliminated the dependency by:

- Inlining the --plugin-manifest protocol in main.go
- Reading DATUM_API_HOST and DATUM_CREDENTIALS_HELPER from env directly
  in util/client.go instead of via plugin.Context()/plugin.Token()
- Reading DATUM_ORG from env in root.go instead of via plugin.NewRootCmd
- Dropping the now-unreachable internal/cmd/compute/client.go

Also updates CI workflows to use go-version-file instead of a pinned
Go 1.24.0, and bumps golangci-lint to v2.12.2, which supports Go 1.25.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
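Reading the plugin settings straight from the environment can be sketched as follows. The `pluginEnv` struct and `loadPluginEnv` function are hypothetical, and treating DATUM_API_HOST and DATUM_CREDENTIALS_HELPER as required while DATUM_ORG is optional is an assumption. Passing `getenv` as a parameter lets production code supply `os.Getenv` and tests supply a stub.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pluginEnv holds the variables the commit reads directly from the
// environment. The struct and field names are illustrative only.
type pluginEnv struct {
	APIHost           string // DATUM_API_HOST
	CredentialsHelper string // DATUM_CREDENTIALS_HELPER
	Org               string // DATUM_ORG (treated as optional in this sketch)
}

// loadPluginEnv is a hypothetical loader; which variables are required is an
// assumption, not documented plugin behaviour.
func loadPluginEnv(getenv func(string) string) (pluginEnv, error) {
	env := pluginEnv{
		APIHost:           strings.TrimSpace(getenv("DATUM_API_HOST")),
		CredentialsHelper: strings.TrimSpace(getenv("DATUM_CREDENTIALS_HELPER")),
		Org:               strings.TrimSpace(getenv("DATUM_ORG")),
	}
	var missing []string
	if env.APIHost == "" {
		missing = append(missing, "DATUM_API_HOST")
	}
	if env.CredentialsHelper == "" {
		missing = append(missing, "DATUM_CREDENTIALS_HELPER")
	}
	if len(missing) > 0 {
		return pluginEnv{}, errors.New("missing required environment variables: " + strings.Join(missing, ", "))
	}
	return env, nil
}

func main() {
	// A stub map stands in for os.Getenv so the example runs anywhere.
	stub := map[string]string{
		"DATUM_API_HOST":           "api.example.test",
		"DATUM_CREDENTIALS_HELPER": "example-helper",
	}
	env, err := loadPluginEnv(func(k string) string { return stub[k] })
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("api host: %s, org: %q\n", env.APIHost, env.Org)
}
```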
Upgrades controller-runtime from v0.21.0 to v0.23.3 and multicluster-runtime
from v0.21.0-alpha.8 to v0.23.3, which unblocks adding go.datum.net/datumctl
as a direct dependency.

The CLI plugin (datumctl-compute) now uses the official datumctl plugin SDK:
- plugin.ServeManifest() for the --plugin-manifest protocol
- plugin.NewRootCmd() for pre-wired org/project/output flags
- plugin.Context() and plugin.Token() for credential access

Controller breaking changes addressed: ClusterName distinct type, Watches
callback signature, NewWebhookManagedBy generic API. A local milo provider
fork is added at internal/provider/milo since the upstream package hasn't
been updated for the ClusterName type change.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses 63 lint findings across errcheck, goconst, gocyclo, gofmt,
prealloc, staticcheck, and unparam linters:

- gofmt/goimports: reformat cmd/main.go, deploy.go, util/client.go, webhook
- errcheck: assign discarded fmt.Fprint* and Flush returns to _
- staticcheck: update webhook to generic admission.Defaulter[T]/Validator[T]
  with WithDefaulter/WithValidator; fix SA4010 unused append in quota.go;
  remove redundant .ObjectMeta selectors in restart.go
- unparam: rename four never-used function parameters to _
- gocyclo: extract helpers from watch.Rollout and quota.runQuota to reduce
  cyclomatic complexity below threshold
- goconst: extract repeated string literals to named constants across
  controllers, validation, and tests
- prealloc: preallocate slices with known capacity in validation and tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- errcheck: fix unchecked fmt.Fprint* returns in deploy, quota, rollout, scale
- prealloc: preallocate allErrs in workload_validation.go and stateful test
- gofmt: reformat destroy.go, instances.go, rollout.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- golangci.yml: exclude errcheck for internal/cmd/* — ignoring write
  errors on stdout/stderr is idiomatic in CLI tools
- prealloc: preallocate allErrs in validateScaleSettingMetrics
- gofmt: reformat status.go, instance_controller_test.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wire ValidArgsFunction on every command that accepts a workload name
(deploy, destroy, restart, rollout, rollout history, rollout undo,
scale, status) and register flag completion for instances --workload.

All completions call a shared CompleteWorkloadNames helper in
internal/cmd/compute/util that fetches live workload names from the
API and always returns ShellCompDirectiveNoFileComp so the shell
never falls back to filename completion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove ValidArgsFunction from deploy and replace with
  util.CompleteWorkloadNamesAndFlags, which wraps CompleteWorkloadNames
  with plugin.WithFlagCompletion from the datumctl SDK.
- Add plugin.WithFlagCompletion to the datumctl plugin SDK so any plugin
  can get the same behaviour by wrapping its own ValidArgsFunction.
- Bump go.datum.net/datumctl to b44de1c (adds WithFlagCompletion).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the hardcoded datum-control-plane ClusterIssuer from the
csi-webhook-cert component. DNS names stay since they are fixed by the
service name and namespace. Each consuming overlay now supplies the issuer
via a strategic merge patch, allowing different environments to use
different cert issuers without forking the component.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The cert issuer name is environment-specific configuration that belongs
in the infra repo, not the compute overlay. The infra repo's base manager
patch already owns the full webhook-server-tls volume definition including
the issuer. Consumers deploying outside infra must patch the issuer in their
own overlay.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a printer.go with PrintJSON and PrintYAML helpers that commands can
use to emit API resources as structured output. Extend completion.go with
CompleteInstanceNames, CompleteCityCodes, and CompleteOutputFormats so all
-o/--output, --city, and instance-name completions are driven from a
single shared source.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The instances and quota commands now accept -o/--output with tab-completion.
json/yaml emit the underlying API resource (InstanceList) for instances and
structured quota rows for quota. wide adds an INSTANCE TYPE column for
instances. --no-headers suppresses the header row for table and wide. City
completion is wired to CompleteCityCodes, and instance describe gains
tab-completion via CompleteInstanceNames.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add datumctl compute workloads (list) and workloads describe <name>
commands. The list command shows NAME/HEALTH/READY/PLACEMENTS/IMAGE/AGE
columns with --health and --city filters, -o table|wide|json|yaml, and a
footer summary. The describe command replaces status with a unified
config+health view: header block, per-placement per-city ready counts with
inline degradation annotations, and a container spec block. Remove the
now-redundant status command from root.go and delete its package.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
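The --health and --city filters on workloads can be sketched as a simple predicate pass. `workloadSummary`, its fields, the health values, and case-insensitive matching are all assumptions for illustration; an empty filter value matches everything.

```go
package main

import (
	"fmt"
	"strings"
)

// workloadSummary is a hypothetical listing projection, not the API schema.
type workloadSummary struct {
	Name   string
	Health string   // e.g. "Healthy" or "Degraded" (illustrative values)
	Cities []string // city codes where the workload has placements
}

func hasCity(cities []string, city string) bool {
	for _, c := range cities {
		if strings.EqualFold(c, city) {
			return true
		}
	}
	return false
}

// filterWorkloads applies --health and --city style filters.
func filterWorkloads(ws []workloadSummary, health, city string) []workloadSummary {
	out := make([]workloadSummary, 0, len(ws))
	for _, w := range ws {
		if health != "" && !strings.EqualFold(w.Health, health) {
			continue
		}
		if city != "" && !hasCity(w.Cities, city) {
			continue
		}
		out = append(out, w)
	}
	return out
}

func main() {
	ws := []workloadSummary{
		{Name: "web", Health: "Healthy", Cities: []string{"DFW", "LHR"}},
		{Name: "worker", Health: "Degraded", Cities: []string{"DFW"}},
	}
	for _, w := range filterWorkloads(ws, "degraded", "dfw") {
		fmt.Println(w.Name) // worker
	}
}
```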
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix duplicate TYPE/INSTANCE TYPE columns in instances -o wide (W3):
  populate TYPE from runtimeKind (sandbox/vm), INSTANCE TYPE from instType
- Fix footer bucketing in instances list (W4): compute Running/Pending/Failed
  from actual status strings instead of hardcoding Failed=0
- Skip revision ConfigMap Gets in workloads list table mode (W5): only
  fetch per-workload revision when -o wide is requested, avoiding N
  round-trips on every list invocation
- Compute health footer tallies after filters are applied (W9): previously
  counted all workloads then printed a filtered subset, making the summary
  misleading when --health or --city filters were active
- Fix gofmt import ordering in workloads.go (B1)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Before creating a workload, the deploy command now checks whether the
required network(s) exist. If a network is missing, the user is offered
the option to create a minimal auto-IPAM network in-place rather than
hitting an opaque NetworkNotFound error post-submission.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… API

- Add EnsureComputeEntitlement to gate all compute commands on an active
  service entitlement; prompts TTY users to request access and surfaces
  approval status
- Rewrite quota command to query AllowanceBucket resources from the
  project VCP (milo-system namespace) instead of deriving usage from
  instance quota conditions
- Add NewPlatformClient targeting the platform API server for
  ResourceRegistration lookups
- Extract ListServiceQuota into util so other service plugins can reuse
  the quota display logic with their own resource type prefix and
  display metadata overrides

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace hand-rolled HTTP entitlement code with a proper client-go
implementation using go.miloapis.com/service-catalog types. Uses
client.WithWatch to stream events from the API server and unblocks
as soon as the Ready condition appears — no polling interval.

Also adds ASCII progress bar to quota table output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
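An ASCII usage bar for the quota table can be produced with integer arithmetic. `renderBar` and its "[###---]" format are hypothetical. Usage above the limit is clamped to a full bar, and a non-positive limit renders an empty bar, which is an assumption about how unlimited or unset quotas would be shown.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBar draws a fixed-width bar where '#' marks the used fraction of
// limit, rounded down, and '-' marks the remainder.
func renderBar(used, limit int64, width int) string {
	if width <= 0 {
		return ""
	}
	filled := 0
	if limit > 0 {
		filled = int(used * int64(width) / limit)
	}
	if filled > width {
		filled = width
	}
	if filled < 0 {
		filled = 0
	}
	return "[" + strings.Repeat("#", filled) + strings.Repeat("-", width-filled) + "]"
}

func main() {
	fmt.Printf("%-10s %s %d/%d\n", "instances", renderBar(3, 10, 20), 3, 10)
}
```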
The compute CLI client now serializes network-services-operator types
(Network, NetworkBinding, SubnetClaim), so deploy can preflight and
create networks on the user's behalf.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deployment revisions are becoming a platform concept rather than a
client concern. Remove the ConfigMap-backed revision ledger the CLI
maintained per workload, along with the 'rollout history' and 'rollout
undo' subcommands and the revision column in 'workloads'. 'rollout'
remains as a live-progress watch.

This also removes the only code path that serialized core/v1 ConfigMaps
from the CLI, so the missing-corev1-scheme warning on deploy no longer
occurs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… resolution

The first conflict resolution in the aa9dc15 commit accidentally truncated
workload_webhook.go, dropping the ValidateCreate method, its kubebuilder
marker, and producing a syntactically invalid Default function body
(extra brace + wrong return signature). Restore the file to match
5486adf's content (the authoritative post-lint-migration version).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The platform now stamps city-code, workload-name, workload-deployment-name,
and placement-name directly onto Instances at creation time. The CLI can
therefore resolve CITY/WORKLOAD/placement directly from those labels without
performing cross-object joins.

The prior approach keyed the WorkloadDeployment map on UID and looked up
instances via WorkloadDeploymentUIDLabel. That UID is the edge/Karmada WD UID,
which differs from the project-cluster WD UID, causing the join to fail across
federation planes and producing "unknown"/"orphaned" output.

The new label-first path reads CityCodeLabel, WorkloadNameLabel,
PlacementNameLabel, and WorkloadDeploymentNameLabel (name is identical across
all planes) before falling back to the WD Get/List join. A wdNameFromInstanceName
helper strips the trailing ordinal suffix from the Instance name as a last-resort
fallback for instances created before the labels existed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

Define the UX, DX, and AX for deploying and managing compute workloads
