feat: support to deploy oblogservice and oceanbase.ai #933

Open
Junkrat77 wants to merge 21 commits into oceanbase:master from Junkrat77:feature/shared-storage-support

@Junkrat77

Summary

Solution Description

Junkrat77 added 21 commits June 11, 2026 11:43
This commit adds complete support for OceanBase shared-storage deployment
mode and LogService cluster management via a three-layer CRD architecture.

LogService (new):
- Three-layer CRD: OBLogServiceCluster → OBLogServiceZone → OBLogServiceNode
- Declarative topology with automatic cascade create/delete
- Auto bootstrap via ls_ctrl after all nodes ready
- Node failure recovery (Pod lost/failed → auto rebuild)
- Configurable resource requests, affinity, tolerations per zone
- Deletion protection via ignore-deletion annotation
- Immutability checks for objectStoreUrl and storage in ValidateUpdate

Shared Storage OBCluster (extended):
- New spec fields: deploymentMode, sharedStorageInfo, logServiceRef
- Bootstrap with LOGSERVICE_ACCESS_POINT + SHARED_STORAGE_INFO
- Observer starts with -m shared_storage, enable_logservice=True
- No redoLog PVC in shared storage mode
- Webhook validates SS-specific required fields and immutability

Security & robustness:
- Shell injection fix: user-supplied BucketURL/Region/Zone passed via env vars
- RBAC markers for Zone/Node CRDs
- DeepCopy pointer aliasing fix
- Nil pointer guards in UpdateStatus and CreatePod
- Finalizer safety: transition to Deleting instead of skipping cleanup
- AlreadyExists handling for idempotent zone/node creation
- Deterministic node deletion ordering (newest first)
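The "newest first" deletion ordering can be illustrated with a minimal sketch. The `logNode` type, the `newestFirst` and `sampleNodes` helpers, and the name-based tie-break are all hypothetical stand-ins chosen for illustration; the PR's actual types and tie-breaking rule are not shown in the commit message.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// logNode is a hypothetical stand-in for an OBLogServiceNode object.
type logNode struct {
	Name      string
	CreatedAt time.Time
}

// newestFirst returns up to n nodes to delete, newest first.
// Ties on creation time are broken by name (an assumption) so that
// repeated reconciles always pick the same nodes.
func newestFirst(nodes []logNode, n int) []logNode {
	sorted := append([]logNode(nil), nodes...) // do not mutate the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		if !sorted[i].CreatedAt.Equal(sorted[j].CreatedAt) {
			return sorted[i].CreatedAt.After(sorted[j].CreatedAt)
		}
		return sorted[i].Name > sorted[j].Name
	})
	if n < 0 {
		n = 0
	}
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

// sampleNodes builds three nodes created one minute apart, out of order.
func sampleNodes() []logNode {
	base := time.Date(2026, 6, 11, 11, 0, 0, 0, time.UTC)
	return []logNode{
		{Name: "node-a", CreatedAt: base},
		{Name: "node-b", CreatedAt: base.Add(2 * time.Minute)},
		{Name: "node-c", CreatedAt: base.Add(1 * time.Minute)},
	}
}

func main() {
	for _, n := range newestFirst(sampleNodes(), 2) {
		fmt.Println(n.Name)
	}
}
```

Sorting a copy keeps the input untouched, and the explicit tie-break is what makes the result independent of list order.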

Build & deployment:
- SS image build scripts (oblogservice + observer-ss Dockerfiles)
- Non-root USER in Dockerfile.oblogservice
- Pinned base image for observer-ss
- Example manifests with placeholder credentials and endpoints

Remove access_id/access_key from operator logs by:
- Not logging the full bootstrap SQL in BootstrapSharedStorage
- Not including ls_ctrl Job output in error messages or success logs

Rename image, namespace, and resource names from ss to ai across
build scripts and example manifests.

…en formatting.

Remove local-only RPM image build helpers from version control and revert
accidental column-alignment churn in unrelated resource manager task name files.

- Merge LogService DeepCopy methods into api/types/deepcopy.go with proper
  handling of Resource, Affinity, Tolerations, and StorageSpec fields
- Delete redundant api/types/zz_generated_deepcopy.go
- Add Affinity/Tolerations fields to OBLogServiceNodeSpec and propagate
  from Zone topology through to Pod spec
- Set container resource Limits equal to Requests (Guaranteed QoS)
- Regenerate CRD manifests via controller-gen
- Fix obcluster_webhook RedoLogStorage indent and obcluster_test pointer type
- Fix observer_task.go switch-case indentation (gofmt)

Propagate user-defined parameters through the Cluster→Zone→Node
hierarchy and inject them into the oblogservice startup command
via the -g flag.

…OBCluster

- Add required resource field (cpu/memory) to OBLogServiceCluster and
  OBLogServiceZone specs with webhook validation and propagation
- Remove hardcoded resource defaults in Pod creation, use spec values
- Auto-calculate log_disk_size from storeStorage PVC size (95%)
- Auto-calculate memory_limit from resource.memory (90%) via webhook
  Defaulter, matching OBCluster's Default() pattern
- Add mutating webhook for OBLogServiceCluster to fill memory_limit
- Add LogServiceReservedParameters to filter log_disk_size/percentage
- Extract LogService mount paths and volume names to constants
- Validate memory_limit does not exceed resource.memory in webhook
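The percentage-based defaults described above can be sketched as plain arithmetic. This is only an illustration under stated assumptions: the helper names, the use of byte counts as `int64`, the round-down behaviour, and the `M` (MiB) suffix on the rendered value are all hypothetical; the commit message only gives the 90% and 95% ratios and the "must not exceed resource.memory" rule.

```go
package main

import "fmt"

const mib = int64(1) << 20

// percentOf returns pct percent of total, rounded down.
func percentOf(total, pct int64) int64 {
	return total * pct / 100
}

// defaultMemoryLimit derives memory_limit as 90% of the container memory
// (given in bytes) and renders it in MiB. The "M" suffix is an assumption.
func defaultMemoryLimit(resourceMemory int64) string {
	return fmt.Sprintf("%dM", percentOf(resourceMemory, 90)/mib)
}

// defaultLogDiskSize derives log_disk_size as 95% of the store PVC size.
func defaultLogDiskSize(pvcSize int64) string {
	return fmt.Sprintf("%dM", percentOf(pvcSize, 95)/mib)
}

// validateMemoryLimit rejects a memory_limit larger than resource.memory.
func validateMemoryLimit(memoryLimit, resourceMemory int64) error {
	if memoryLimit > resourceMemory {
		return fmt.Errorf("memory_limit (%d bytes) exceeds resource.memory (%d bytes)",
			memoryLimit, resourceMemory)
	}
	return nil
}

func main() {
	fmt.Println(defaultMemoryLimit(10 << 30))  // 10 GiB of container memory
	fmt.Println(defaultLogDiskSize(100 << 30)) // 100 GiB store PVC
	fmt.Println(validateMemoryLimit(11<<30, 10<<30))
}
```

With 10 GiB of memory this yields 9216 MiB, and a 100 GiB PVC yields 97280 MiB of log disk.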

Single-quote user-supplied parameter values in the oblogservice -g flag,
aligned with OBCluster's observer startup pattern. Restructure command
construction to use a parameter slice joined by commas for clarity.
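A minimal sketch of this quoting scheme follows. The function names are hypothetical, and two details are assumptions rather than facts from the PR: keys are sorted to make the output stable, and embedded single quotes are escaped with the usual POSIX shell `'\''` idiom.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// shellSingleQuote wraps s in single quotes for a POSIX shell.
// An embedded single quote is closed, escaped, and reopened: ' -> '\''
func shellSingleQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// buildParamFlag renders parameters as name='value' pairs joined by commas,
// i.e. the argument that would follow the -g flag.
func buildParamFlag(params map[string]string) string {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+shellSingleQuote(params[k]))
	}
	return strings.Join(parts, ",")
}

func main() {
	params := map[string]string{
		"memory_limit": "9216M",
		"log_level":    "INFO; rm -rf /", // hostile value stays inert inside quotes
	}
	fmt.Println("-g", buildParamFlag(params))
}
```

Building a slice and joining it once avoids trailing-comma bookkeeping and keeps each parameter's quoting in one place.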

Add CNI detection and static IP support for LogService nodes. Nodes on
Calico/KubeOvn recover in-place with pinned IP; others are marked
unrecoverable and replaced by the zone manager.
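The recovery decision described here can be sketched as a single switch. The CNI identifier strings (`"calico"`, `"kube-ovn"`) and the `recoveryAction` type are assumptions for illustration; how the operator actually detects and names the CNI is not shown in the commit message.

```go
package main

import "fmt"

// recoveryAction describes how a failed LogService node is handled.
type recoveryAction string

const (
	recoverInPlace recoveryAction = "recover-in-place" // rebuild the Pod with its pinned IP
	replaceNode    recoveryAction = "replace"          // mark unrecoverable; zone manager replaces it
)

// decideRecovery maps the detected CNI to a recovery action.
// Only CNIs that can pin a Pod IP allow in-place recovery.
func decideRecovery(cni string) recoveryAction {
	switch cni {
	case "calico", "kube-ovn": // identifier strings are hypothetical
		return recoverInPlace
	default:
		return replaceNode
	}
}

func main() {
	for _, cni := range []string{"calico", "kube-ovn", "flannel"} {
		fmt.Printf("%s -> %s\n", cni, decideRecovery(cni))
	}
}
```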

Cluster Controller now detects replica changes in spec topology and
propagates them to child Zone specs, matching the OBCluster pattern.

Zone now registers new nodes via ls_ctrl add ln after pod is running,
and unregisters nodes via ls_ctrl delete ln before deletion. This makes
LogService replica scaling actually effective at the cluster level.

…n tracking

Use annotation oceanbase.oceanbase.com/logservice-node-registered to track
which nodes have been registered via ls_ctrl add ln. Skip already-registered
nodes to avoid non-idempotent add ln failures. Use client.Patch (MergeFrom)
for annotation updates to avoid resourceVersion conflicts. Mark bootstrap
nodes as registered after successful cluster bootstrap.
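The skip-if-registered logic can be sketched without any Kubernetes dependencies. The annotation key is taken from the commit message; the `"true"` value, the `logNode` type, and the helper names are assumptions. In the real operator the annotation write would go through `client.Patch` with `MergeFrom`, which is not reproduced here.

```go
package main

import "fmt"

// Annotation key from the commit message; the "true" value is an assumption.
const registeredAnnotation = "oceanbase.oceanbase.com/logservice-node-registered"

// logNode is a hypothetical stand-in for an OBLogServiceNode object.
type logNode struct {
	Name        string
	Annotations map[string]string
}

// isRegistered reports whether the node was already added via ls_ctrl add ln.
func isRegistered(n logNode) bool {
	return n.Annotations[registeredAnnotation] == "true"
}

// pendingRegistration returns the names of nodes that still need add ln.
func pendingRegistration(nodes []logNode) []string {
	var pending []string
	for _, n := range nodes {
		if !isRegistered(n) {
			pending = append(pending, n.Name)
		}
	}
	return pending
}

// markRegistered records a successful registration on the node.
func markRegistered(n *logNode) {
	if n.Annotations == nil {
		n.Annotations = map[string]string{}
	}
	n.Annotations[registeredAnnotation] = "true"
}

func main() {
	nodes := []logNode{
		{Name: "node-1", Annotations: map[string]string{registeredAnnotation: "true"}},
		{Name: "node-2"},
	}
	fmt.Println(pendingRegistration(nodes)) // only node-2 needs add ln
	markRegistered(&nodes[1])
	fmt.Println(len(pendingRegistration(nodes)))
}
```

Because the marker lives on the object itself, a restarted operator does not repeat the non-idempotent registration call.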

LogService does not support dynamic zone topology changes. Add webhook
ValidateUpdate check to reject zone additions and removals, and remove
the now-unreachable AddZone/DeleteZone reconciliation code paths.
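A check of this kind amounts to comparing the old and new sets of zone names. The sketch below is an assumption about its shape: `validateZonesUnchanged` is a hypothetical helper operating on plain string slices, not the PR's actual `ValidateUpdate` signature or types.

```go
package main

import (
	"fmt"
	"sort"
)

// validateZonesUnchanged returns an error if any zone was added or removed.
// Reordering the same zones is allowed.
func validateZonesUnchanged(oldZones, newZones []string) error {
	oldSet := make(map[string]bool, len(oldZones))
	for _, z := range oldZones {
		oldSet[z] = true
	}
	newSet := make(map[string]bool, len(newZones))
	var added, removed []string
	for _, z := range newZones {
		newSet[z] = true
		if !oldSet[z] {
			added = append(added, z)
		}
	}
	for _, z := range oldZones {
		if !newSet[z] {
			removed = append(removed, z)
		}
	}
	if len(added) == 0 && len(removed) == 0 {
		return nil
	}
	sort.Strings(added)
	sort.Strings(removed)
	return fmt.Errorf("zone topology cannot change: added %v, removed %v", added, removed)
}

func main() {
	old := []string{"zone1", "zone2", "zone3"}
	fmt.Println(validateZonesUnchanged(old, []string{"zone3", "zone1", "zone2"}))
	fmt.Println(validateZonesUnchanged(old, []string{"zone1", "zone2", "zone4"}))
}
```

Rejecting the change at admission time is what makes the AddZone/DeleteZone reconcile paths unreachable, so removing them is safe.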

Fix revive and staticcheck findings reported by CI on new code:
- remove redundant v1alpha1 import aliases
- flatten promoted embedded field selectors (ObjectMeta, VolumeSource, OBClusterExtra)
- add package doc comments for new packages
- add default cases to switches, folding empty Pause branch into default
- merge identical CNI switch branches
- lowercase error string per ST1005

Region has no practical meaning for LogService at this stage. Remove the
region field from LogServiceZoneTopology and OBLogServiceNodeSpec so
users no longer need to fill it in. Bootstrap now uses a hardcoded
default "region1" via the LogServiceDefaultRegion constant.