This guide describes how to automatically migrate an existing CockroachDB cluster from the Public operator (v1alpha1 CrdbCluster) to the CockroachDB operator (v1beta1 CrdbCluster with CrdbNodes).
The CockroachDB operator is in Preview.
The CockroachDB operator's migration controller automatically handles the transition, including certificate migration, node-by-node replacement, and RBAC setup.
This migration can be completed without affecting cluster availability. The process preserves existing disks so that data does not need to be replicated into empty volumes. The controller migrates nodes one at a time, so the maximum cluster capacity will be reduced by one node periodically throughout the migration.
Compatibility
Before starting migration, verify your cluster configuration is supported.
| Feature | Supported | Notes |
|---|---|---|
| Self-signer certs (operator built-in) | Yes | Regenerated with join service DNS SANs; originals preserved |
| cert-manager certs | Yes | Certificate CR updated with join service DNS SANs; issuer references preserved |
| External certs (user-provided secrets) | Yes | Secret references preserved as ExternalCertificates |
| Custom NodeTLSSecret / ClientTLSSecret | Yes | CrdbNode pods mount user's secrets directly; cert regeneration skipped (see prerequisites) |
| Insecure clusters (TLS disabled) | Yes | Detected from --insecure flag; cert migration phase skipped entirely |
| WAL failover (dedicated PVC) | No | WAL failover detection only runs for Helm migrations. Not supported for operator path |
| Dedicated logs PVC (logsdir / logs-dir) | No | LogsStore detection only runs for Helm migrations. Not supported for operator path |
| PCR (virtualized/standby) | No | PCR detection only runs for Helm migrations (init job pattern). Not supported for operator path |
| Custom service account | Yes | Preserved with create: false |
| Custom start flags / additionalArgs | Partial | Known operator-managed flags excluded; additionalArgs converted to startFlags.upsert |
| cache / max-sql-memory | Yes | Converted to start flags |
| Multi-region | Yes | Requires regionCode and cloudProvider annotations before starting |
| Ingress | Partial | v1alpha1 ingress config preserved as annotation on v1beta1. Ingress resources not modified. Manual adoption required |
| ServiceMonitor / PodMonitor | No | Not handled by migration. Must be recreated manually to match new pod labels |
| NetworkPolicy | No | Must be recreated manually if present |
| Log ConfigMap key format | Yes | logging.yaml key automatically renamed to logs.yaml (reversed on rollback) |
Before you begin
Before starting the migration process, verify the following:
- Your CockroachDB cluster is managed by the Public operator (v1alpha1 CrdbCluster).
- All StatefulSet pods are Running and Ready with no pending rolling updates or scale operations.
- `kubectl` and `helm` are installed and configured with access to the target cluster.
- For multi-region clusters, cloud region and provider labels are applied to Kubernetes nodes.
- You have the `regionCode` and `cloudProvider` values for your cluster.
- You have reviewed the Compatibility table above and confirmed your configuration is supported.
- The Public operator is accessible and running (required for rollback capability).
If you plan to create new v1beta1 clusters while the Public operator is running, you must patch the Public operator's webhooks to use matchPolicy: Exact. Refer to the Coexistence section in Step 5.
If you use custom NodeTLSSecret or ClientTLSSecret, your certificates must include join service DNS SANs ({cluster-name}-join, {cluster-name}-join.{namespace}, {cluster-name}-join.{namespace}.svc.cluster.local). The migration controller cannot regenerate certificates when custom secrets are provided because the CA private key is not available.
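Before starting, you can inspect your custom certificates for the required SANs. The sketch below is a minimal helper, not part of the migration tooling; it assumes your secret stores the certificate under the `tls.crt` key (adjust the extraction command if yours differs) and that `openssl` 1.1.1+ is available.

```shell
# Extract the node cert from your custom secret first, e.g.:
#   kubectl get secret <node-tls-secret> -n $NAMESPACE \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d > node.crt
# Then check it for the three join service DNS SANs:
check_join_sans() {
  # $1 = PEM cert file, $2 = cluster name, $3 = namespace
  local sans dns
  sans=$(openssl x509 -in "$1" -noout -ext subjectAltName) || return 1
  for dns in "$2-join" "$2-join.$3" "$2-join.$3.svc.cluster.local"; do
    if ! printf '%s\n' "$sans" | grep -q "DNS:${dns}"; then
      echo "missing SAN: ${dns}"
      return 1
    fi
  done
  echo "all join SANs present"
}
# Usage: check_join_sans node.crt $CRDBCLUSTER $NAMESPACE
```

If any SAN is missing, reissue the certificates with the join service names before labeling the cluster for migration.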
Overview
What the Migration Controller Does Automatically
- Detects cert-manager, self-signer, or custom TLS secrets. Self-signer and cert-manager certs are regenerated with join service DNS SANs automatically. When custom `NodeTLSSecret`/`ClientTLSSecret` are set, skips cert regeneration and mounts the user's secrets directly.
- Creates RBAC resources with namespace-qualified names for cluster-scoped resources.
- Migrates ConfigMap key format (`logging.yaml` to `logs.yaml`).
- Converts v1alpha1 CrdbCluster spec to v1beta1 via conversion webhook.
- Deletes old PDB and creates a new one during finalization (no protection gap because the new PDB is created immediately after old PDB deletion).
- Updates service selectors and restores them on rollback.
- Migrates nodes one at a time, verifying cluster health between each node.
- Preserves Ingress configuration from v1alpha1 CR as annotation on v1beta1 CrdbCluster.
- Removes the `crdb.io/skip-reconcile` label as the last step of rollback so the public operator resumes control cleanly.
What Requires Manual Action
- `crdb.io/skip-reconcile=true` must be applied to the v1alpha1 CrdbCluster before starting migration.
- Cloud region and provider annotations must be applied to the CrdbCluster before starting migration.
- Cloud region and provider node labels must be applied to K8s nodes for multi-region clusters.
- The StatefulSet must be manually deleted after finalization to complete migration.
- Old public operator resources must be cleaned up after migration.
- ServiceMonitor / PodMonitor resources must be recreated with updated pod labels.
Migration Phases
Init -> CertMigration -> PodMigration -> Finalization -> (user deletes STS) -> Complete
| Phase | Duration (approx.) | What happens |
|---|---|---|
| Init | Seconds | Validates prerequisites (including skip-reconcile), creates v1beta1 CrdbCluster via conversion webhook, records original replica count |
| CertMigration | Seconds to minutes | Detects cert type, regenerates certs with join service SANs, creates CA ConfigMap, renames ConfigMap key (logging.yaml → logs.yaml), labels existing pods |
| PodMigration | ~5-15 min per node | Creates CrdbNode, waits for health, scales down STS by one. Repeats for each node (highest index first) |
| Finalization | Seconds | Sets cluster spec (regions, TLS, resources), deletes old PDB, creates new PDB, sets Mode=MutableOnly |
| Complete | Seconds (after user deletes STS) | Records completion, updates migration label to complete |
Step 1. Export environment variables
# CRDBCLUSTER is the name of your v1alpha1 CrdbCluster CR.
export CRDBCLUSTER=cockroachdb
# NAMESPACE is the namespace where the CrdbCluster is installed.
export NAMESPACE=default
# CLOUD_PROVIDER is the cloud vendor where the K8s cluster is running.
# Supported: gcp, aws, azure
export CLOUD_PROVIDER=gcp
# REGION corresponds to the cloud provider's identifier for this region.
# It must match the "topology.kubernetes.io/region" label on K8s nodes.
export REGION=us-central1
Step 2. Apply cloud region and provider labels (multi-region only)
CockroachDB uses K8s node labels for locality-based pod placement. These labels must exist before the operator starts scheduling pods.
# GCP
kubectl label node <node-name> \
topology.kubernetes.io/region=us-central1 \
topology.kubernetes.io/zone=us-central1-a \
cockroach.io/cloud=gcp
# AWS
kubectl label node <node-name> \
topology.kubernetes.io/region=us-east-1 \
topology.kubernetes.io/zone=us-east-1a \
cockroach.io/cloud=aws
# Azure
kubectl label node <node-name> \
topology.kubernetes.io/region=eastus \
topology.kubernetes.io/zone=eastus-1 \
cockroach.io/cloud=azure
Step 3. Pause the Public operator
The public operator must stop reconciling the cluster before installing the CockroachDB Operator. Without this, both controllers will fight over RBAC, PDB, and service selectors.
kubectl label crdbcluster $CRDBCLUSTER crdb.io/skip-reconcile="true" -n $NAMESPACE
Wait until the public operator finishes any in-progress reconciliation before proceeding. Verify the StatefulSet is not currently rolling:
kubectl rollout status sts $CRDBCLUSTER -n $NAMESPACE
# Expected: statefulset rolling update complete
# No new events should appear after applying the label
kubectl describe crdbcluster $CRDBCLUSTER -n $NAMESPACE | grep -A5 Events
If the skip-reconcile label is not applied, migration will fail with:
missing required label: crdb.io/skip-reconcile=true
Important: The `crdb.io/skip-reconcile=true` label must remain on the v1alpha1 CrdbCluster for as long as the public operator is running. The public operator reconciles any CrdbCluster without this label and will recreate the StatefulSet and associated resources if it regains control. Do not remove this label until after the public operator has been fully uninstalled (Step 12). Removing it while the public operator is still running will cause it to recreate the StatefulSet, potentially conflicting with the migrated CrdbNodes.
Step 4. Annotate the CrdbCluster with region and cloud provider
Annotate the CR with region and cloud provider so they are preserved during the v1alpha1 to v1beta1 conversion:
kubectl annotate crdbcluster $CRDBCLUSTER \
crdb.cockroachlabs.com/cloudProvider=$CLOUD_PROVIDER \
crdb.cockroachlabs.com/regionCode=$REGION \
--overwrite -n $NAMESPACE
The fallback order for region resolution is: (1) regions annotation (for multi-region),
(2) regionCode and cloudProvider annotations, (3) operator environment configuration,
(4) default us-east1. For single-region clusters, the default may be acceptable.
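As a sanity check before starting migration, you can confirm both annotations landed on the CR. The helper below is a sketch that assumes `jq` is available; pipe the CR JSON through it.

```shell
# Usage:
#   kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE -o json | check_region_annotations
check_region_annotations() {
  # Prints regionCode then cloudProvider, or "MISSING" if an annotation is absent.
  jq -r '.metadata.annotations["crdb.cockroachlabs.com/regionCode"] // "MISSING",
         .metadata.annotations["crdb.cockroachlabs.com/cloudProvider"] // "MISSING"'
}
# Expected output: your $REGION and $CLOUD_PROVIDER values, one per line.
# "MISSING" means the annotate command in this step has not been applied.
```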
Step 5. Install the CockroachDB operator with migration enabled
Install the operator with migration enabled. The migration controller only acts on
resources that have the crdb.io/migrate label, and the cluster controller skips clusters
with Mode=Disabled (which the conversion webhook sets for all v1alpha1 clusters). The
CockroachDB Operator will not interfere with clusters that have not been explicitly marked
for migration.
helm upgrade --install crdb-operator ./cockroachdb-parent/charts/operator \
--set migration.enabled=true
Setting watchNamespaces is optional. It restricts the CockroachDB Operator to a subset of
namespaces, which can be useful for reducing blast radius in large environments.
helm upgrade --install crdb-operator ./cockroachdb-parent/charts/operator \
--set migration.enabled=true \
--set watchNamespaces=$NAMESPACE
`migration.enabled=true` on the operator chart enables the migration controller and registers the conversion webhook (translates between v1alpha1 and v1beta1). This is different from `migration.enabled=true` on the cockroachdb chart (used later in Step 12), which relaxes pre-upgrade validation so the chart can adopt an existing migrated CrdbCluster. Both must be set at their respective steps.
Verify the operator is running and the conversion webhook is active:
kubectl get pods -l app=cockroachdb-operator -n $NAMESPACE
kubectl get crd crdbclusters.crdb.cockroachlabs.com
# Verify both API versions are accessible (conversion webhook is active)
kubectl get crdbclusters.v1alpha1.crdb.cockroachlabs.com $CRDBCLUSTER -n $NAMESPACE
kubectl get crdbclusters.v1beta1.crdb.cockroachlabs.com $CRDBCLUSTER -n $NAMESPACE
Both commands should return the same CrdbCluster resource, confirming the conversion webhook is translating between API versions.
The operator's ClusterRole and ClusterRoleBinding must use unique names to avoid overwriting the public operator's RBAC. The Helm chart handles this automatically.
If the CockroachDB Operator is deployed in the same namespace as the public operator, ensure
the appLabel differs between the two so their Deployment selectors and Services do not
conflict. The CockroachDB Operator chart defaults to appLabel=cockroachdb-operator, which is
distinct from the public operator's app=cockroach-operator. Do not change this default
when coexisting with the public operator. If you previously customized appLabel, verify
it does not collide with the public operator's pod selector.
When migrating clusters across multiple namespaces, migrate one namespace at a time. Update
watchNamespaces (or use a comma-separated list) to include additional namespaces only after
the previous migration is complete.
Coexistence with the Public Operator
The CockroachDB Operator and the public operator can run side by side during migration.
Installing the CockroachDB Operator with migration.enabled=true replaces the CRD with a
dual-version CRD (v1alpha1 + v1beta1) and registers a conversion webhook. Existing clusters
continue to work normally.
What works
| Scenario | Behavior |
|---|---|
| Existing v1alpha1 clusters (not being migrated) | Continue to be reconciled by the public operator. The CockroachDB Operator ignores them (Mode=Disabled). |
| New v1alpha1 clusters created with both operators present | Reconciled by the public operator. The CockroachDB Operator's v1beta1 webhooks do not intercept v1alpha1 requests. |
| New v1beta1 clusters created with both operators present | The CockroachDB Operator's mutating webhook automatically injects crdb.io/skip-reconcile=true. The public operator sees this label and skips reconciliation. |
| Updating existing v1alpha1 clusters | Works normally. Only the public operator's webhooks validate the request. |
What does not work
| Scenario | Reason |
|---|---|
| Creating v1beta1 clusters without patching public operator webhooks | The public operator's webhooks use matchPolicy: Equivalent (the Kubernetes default), so they intercept v1beta1 requests after converting them to v1alpha1. The converted object fails the public operator's validation. |
| Both operators reconciling the same cluster | Only one operator should own a cluster at a time. Use crdb.io/skip-reconcile=true to hand off. |
| Running both operators with overlapping namespace scopes | Untested. Use watchNamespaces to separate them if needed. |
How webhook isolation works
Both operators' webhooks must use matchPolicy: Exact to prevent cross-version interception.
The CockroachDB Operator's webhooks ship with this setting. The public operator's webhooks
use the Kubernetes default (Equivalent), which causes them to intercept v1beta1 requests
after Kubernetes converts them to v1alpha1. To create v1beta1 clusters while the public
operator is present, update the public operator's webhooks to use matchPolicy: Exact.
Option 1: Edit the public operator manifest before installing. Download the manifest,
add matchPolicy: Exact to both webhook entries, then apply:
curl -sL https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.18.3/install/operator.yaml \
| sed '/rules:/i\ matchPolicy: Exact' \
| kubectl apply -f -
Option 2: Patch the live webhook configs. This does not persist across public operator redeploys:
kubectl patch validatingwebhookconfiguration cockroach-operator-validating-webhook-configuration \
--type=json -p='[{"op":"add","path":"/webhooks/0/matchPolicy","value":"Exact"}]'
kubectl patch mutatingwebhookconfiguration cockroach-operator-mutating-webhook-configuration \
--type=json -p='[{"op":"add","path":"/webhooks/0/matchPolicy","value":"Exact"}]'
With both sides using matchPolicy: Exact, a v1alpha1 request triggers only the public
operator's webhooks, and a v1beta1 request triggers only the CockroachDB Operator's webhooks.
This patching is only needed if you plan to create new v1beta1 clusters while the public operator is still running. Migration of existing v1alpha1 clusters does not require it because the migration controller works with the v1alpha1 API directly.
How conversion works
v1alpha1 clusters without skip-reconcile are converted to v1beta1 with Mode=Disabled
for storage. The CockroachDB Operator ignores clusters in this mode.
v1beta1 cluster protection: The CockroachDB Operator's mutating webhook injects
crdb.io/skip-reconcile=true on every v1beta1 CrdbCluster. This label is stored on the
object and carried through to the v1alpha1 view via ObjectMeta, so the public operator
sees it and skips reconciliation.
In summary, you can:
- Keep existing v1alpha1 clusters running with the public operator
- Create new v1alpha1 clusters while the CockroachDB Operator is installed
- Create new v1beta1 clusters while the public operator is still running
- Migrate clusters one by one at your own pace
Step 6. Start the migration
Initiate the migration by labeling the v1alpha1 CrdbCluster.
kubectl label crdbcluster $CRDBCLUSTER crdb.io/migrate=start -n $NAMESPACE
Step 7. Monitor the migration
DB Console
The DB Console is accessible via the public service throughout migration.
# Port-forward to access DB Console
kubectl port-forward svc/${CRDBCLUSTER}-public 8080 -n $NAMESPACE
# Then open https://localhost:8080 in your browser
Key things to monitor in the DB Console:
- Node List: Watch for nodes joining and leaving as migration progresses. Total node count stays constant (new CrdbNode created before old STS pod is removed).
- Replication Status: Under-replicated and unavailable ranges should stay at zero. The migration controller waits for under-replicated ranges to clear before proceeding to the next node.
- Ranges Dashboard: Under-replicated ranges counter should not increase during migration.
CockroachDB Health Checks
The controller performs these checks automatically between each node migration, but you can verify manually:
# Check under-replicated ranges (should be 0)
# During migration, STS pods use container name "db"; after migration, CrdbNode pods use "cockroachdb"
kubectl exec $CRDBCLUSTER-0 -n $NAMESPACE -c db -- \
/cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
-e "SELECT sum((metrics->>'ranges.underreplicated')::INT8) FROM crdb_internal.kv_store_status;"
# Check all nodes are live (count should match total nodes)
kubectl exec $CRDBCLUSTER-0 -n $NAMESPACE -c db -- \
/cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
-e "SELECT count(DISTINCT node_id) FROM crdb_internal.kv_store_status;"
# For insecure clusters, replace --certs-dir with --insecure
Watch migration status
The migration status is recorded on the v1beta1 CrdbCluster (created via conversion webhook).
kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE \
-o jsonpath='{.status.migration.phase} {.status.migration.message}' -w
Watch events
kubectl get events -n $NAMESPACE \
--field-selector involvedObject.name=$CRDBCLUSTER \
--sort-by='.lastTimestamp'
Watch CrdbNode creation during PodMigration
kubectl get crdbnode -n $NAMESPACE \
-l crdb.cockroachlabs.com/cluster=$CRDBCLUSTER -w
Phase-Specific Checks
| Phase | What to watch |
|---|---|
| Init | kubectl describe crdbcluster $CRDBCLUSTER for MigrationStarted event |
| CertMigration | For self-signer: kubectl get secret $CRDBCLUSTER-node-secret -n $NAMESPACE. For cert-manager: check the secret name from the Certificate CR (kubectl get certificate $CRDBCLUSTER-node -n $NAMESPACE -o jsonpath='{.spec.secretName}'). For custom NodeTLSSecret: your existing secret is used directly. TLS clusters only |
| PodMigration | kubectl get crdbnode -n $NAMESPACE -w for one node per pod |
| Finalization | kubectl get crdbcluster $CRDBCLUSTER -o jsonpath='{.status.migration.message}' for "Finalization complete" |
| Complete | After StatefulSet deletion; spec.mode: MutableOnly and crdb.io/migrate label changes to complete |
Auto-pause behavior
During PodMigration, if a newly created CrdbNode does not pass health checks within
10 minutes, the controller automatically pauses migration by setting the phase to
PhaseStopped. Health checks run every 10 seconds and verify:
- CrdbNode has `PodReady=True` condition
- Pod is in `Running` phase with `Ready=True` condition
- SQL health check passes (zero under-replicated ranges and correct live node count)
- Pod is registered in the headless service endpoints
When auto-paused, investigate the stuck node and resume with `kubectl label crdbcluster $CRDBCLUSTER crdb.io/migrate=start -n $NAMESPACE --overwrite`.
Prometheus Metrics
crdb_operator_migration_phase{cluster, namespace, phase}
crdb_operator_migration_pods_total{cluster, namespace}
crdb_operator_migration_pods_migrated{cluster, namespace}
crdb_operator_migration_duration_seconds{cluster, namespace, phase}
crdb_operator_migration_errors_total{cluster, namespace, phase, error_type}
crdb_operator_migration_rollbacks_total{cluster, namespace, reason}
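If you do not have a Prometheus server scraping the operator, one way to eyeball these series is to filter a raw `/metrics` scrape. The helper below is a sketch; the port-forward target and metrics port in the usage comment are assumptions, so adjust them to your deployment.

```shell
# Usage (deployment name and port are assumptions; adjust to your install):
#   kubectl port-forward deploy/crdb-operator 8080:8080 -n $NAMESPACE &
#   curl -s localhost:8080/metrics | migration_metrics
migration_metrics() {
  # Keep only the migration series listed above.
  grep '^crdb_operator_migration'
}
```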
PDB Behavior During Migration
- Init through PodMigration: The original public operator PDB remains active. Pod disruption protection is maintained throughout.
- Finalization: The controller deletes the old PDB (`{name}` or `{name}-budget`) and immediately creates the new PDB (`{name}-pdb`). There is no gap in protection.
- Rollback: All PDB variants are deleted (`{name}`, `{name}-budget`, `{name}-pdb`). The original PDB is recreated by the public operator once `skip-reconcile` is removed.
Step 8. Delete the StatefulSet to complete migration
After Finalization completes, the controller stops processing and waits for the user to
delete the StatefulSet. The StatefulSet is intentionally left intact. Once deleted, the
controller sets spec.mode=MutableOnly and records Phase=Complete.
# Verify Finalization is done (Mode is still Disabled at this point)
kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE \
-o jsonpath='{.status.migration.message}'
# Expected: Finalization complete. Delete the StatefulSet to mark migration complete.
# Delete the StatefulSet to trigger Phase=Complete
kubectl delete sts $CRDBCLUSTER -n $NAMESPACE
# Verify completion (may take a few seconds)
# Mode switches to MutableOnly and label changes to "complete" after STS deletion.
kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE \
-o jsonpath='{.spec.mode} {.status.migration.phase}'
# Expected: MutableOnly Complete
All STS pods are already at replicas=0 by the time Finalization runs. Deleting the StatefulSet object does not evict any running pods.
Step 9. Verify cluster health post-migration
# All CrdbNodes should be healthy
kubectl get crdbnode -n $NAMESPACE -l crdb.cockroachlabs.com/cluster=$CRDBCLUSTER
# All pods running
kubectl get pods -n $NAMESPACE -l crdb.cockroachlabs.com/cluster=$CRDBCLUSTER
# Verify data integrity from any CrdbNode pod
# Note: post-migration, the container name is "cockroachdb" (not "db" as in the original STS)
kubectl exec $CRDBCLUSTER-0 -n $NAMESPACE -c cockroachdb -- \
/cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
-e "SELECT count(*) FROM [SHOW RANGES];"
# Verify under-replicated ranges are zero
kubectl exec $CRDBCLUSTER-0 -n $NAMESPACE -c cockroachdb -- \
/cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
-e "SELECT sum((metrics->>'ranges.underreplicated')::INT8) FROM crdb_internal.kv_store_status;"
Verify preserved configurations
# WAL failover (if configured)
kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE \
-o jsonpath='{.spec.template.spec.walFailoverSpec}'
# Logs PVC (if configured)
kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE \
-o jsonpath='{.spec.template.spec.logsStore}'
# ConfigMap key format (should be logs.yaml, not logging.yaml)
kubectl get configmap <log-config-name> -n $NAMESPACE -o yaml | grep -E "logging.yaml|logs.yaml"
# TLS certificates (secret names depend on cert type)
# Self-signer: {name}-node-secret, {name}-client-secret (regenerated by migration)
kubectl get secret $CRDBCLUSTER-node-secret -n $NAMESPACE
kubectl get secret $CRDBCLUSTER-client-secret -n $NAMESPACE
# cert-manager: secrets are managed by cert-manager (names from Certificate CR)
kubectl get certificate $CRDBCLUSTER-node -n $NAMESPACE -o jsonpath='{.spec.secretName}'
# Custom NodeTLSSecret/ClientTLSSecret: your original secrets are used directly
# CA ConfigMap (created for all TLS cert types)
kubectl get configmap $CRDBCLUSTER-ca-crt -n $NAMESPACE
Verify v1alpha1 → v1beta1 field conversion
Key fields converted during migration:
| v1alpha1 field | v1beta1 field |
|---|---|
| `spec.image.name` | `spec.template.spec.image` |
| `spec.grpcPort` / `spec.sqlPort` / `spec.httpPort` | `spec.template.spec.grpcPort` / `sqlPort` / `httpPort` |
| `spec.cache` | `spec.template.spec.startFlags.upsert` (`--cache=<value>`) |
| `spec.maxSQLMemory` | `spec.template.spec.startFlags.upsert` (`--max-sql-memory=<value>`) |
| `spec.additionalArgs` | `spec.template.spec.startFlags.upsert` |
| `spec.nodeTLSSecret` | CrdbNode mounts this secret directly (cert regeneration skipped) |
| `spec.clientTLSSecret` | CrdbNode mounts this secret directly (cert regeneration skipped) |
| `spec.logConfigMap` | `spec.template.spec.loggingConfigMapName` (key renamed `logging.yaml` → `logs.yaml`) |
| `spec.terminationGracePeriodSecs` | Used directly in pod spec |
| `spec.resources` | `spec.template.spec.podTemplate.spec.containers[0].resources` |
Step 10. Configure LocalityMappings
The migration controller preserves the --locality flag tier keys (e.g. region, zone)
as localityLabels on the CrdbNodeSpec. localityLabels is deprecated in favor of
localityMappings, which maps K8s node labels to CockroachDB locality tiers. The default
mapping covers standard K8s topology labels (topology.kubernetes.io/region → region,
topology.kubernetes.io/zone → zone).
If your cluster uses custom K8s node labels for locality, update localityMappings to
match. Each entry maps a K8s node label key to a CockroachDB locality tier name.
kubectl patch crdbcluster $CRDBCLUSTER -n $NAMESPACE --type merge -p '
spec:
template:
spec:
localityMappings:
- nodeLabel: "my-custom-label/region"
localityLabel: "region"
- nodeLabel: "my-custom-label/zone"
localityLabel: "zone"
'
If you are using the standard topology.kubernetes.io/* labels, the kubebuilder defaults
are correct and no action is needed.
Step 11. Clean up Public operator resources
After verifying the migrated cluster is healthy, clean up old public operator resources.
Migration-created RBAC (stale)
# Migration-created ClusterRole/ClusterRoleBinding use "{namespace}-{name}" naming
kubectl delete clusterrole ${NAMESPACE}-${CRDBCLUSTER} --ignore-not-found
kubectl delete clusterrolebinding ${NAMESPACE}-${CRDBCLUSTER} --ignore-not-found
Public operator RBAC
# Public operator's ClusterRole and ClusterRoleBinding
kubectl delete clusterrole cockroachdb-operator --ignore-not-found
kubectl delete clusterrolebinding cockroachdb-operator --ignore-not-found
Public operator deployment and CRDs
Only do this after all clusters managed by the public operator have been migrated.
# Delete the public operator Deployment
kubectl delete deployment cockroach-operator-manager -n cockroach-operator-system --ignore-not-found
# Delete public operator CRDs (this removes the v1alpha1 API entirely)
kubectl delete -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.18.3/install/crds.yaml --ignore-not-found
Public operator webhook configurations
# The public operator registers its own validation/mutation webhooks
kubectl delete validatingwebhookconfiguration cockroach-operator-validating-webhook-configuration --ignore-not-found
kubectl delete mutatingwebhookconfiguration cockroach-operator-mutating-webhook-configuration --ignore-not-found
Public operator ServiceAccount and RBAC in its namespace
kubectl delete serviceaccount cockroach-operator-sa -n cockroach-operator-system --ignore-not-found
kubectl delete role cockroach-operator-role -n cockroach-operator-system --ignore-not-found
kubectl delete rolebinding cockroach-operator-rolebinding -n cockroach-operator-system --ignore-not-found
Step 12. Adopt into the CockroachDB Helm chart (optional)
Before adopting, verify the operator has fully reconciled the migrated cluster. Do not
proceed until generation and observedGeneration match and all pods are running.
kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE \
-o jsonpath='{.metadata.generation} {.status.observedGeneration}'
# Both values should match
kubectl get pods -n $NAMESPACE -l crdb.cockroachlabs.com/cluster=$CRDBCLUSTER
# All pods should be Running and Ready
If you want helm upgrade to manage the migrated CrdbCluster going forward, you need to
annotate existing resources for Helm ownership and provide a values.yaml that matches
the migrated spec.
Prepare values.yaml
Inspect the migrated CrdbCluster and map its spec to Helm values.
kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE -o yaml
Option A: Generate values.yaml automatically
./bin/migration-helper export-values \
--crdb-cluster $CRDBCLUSTER \
--namespace $NAMESPACE \
--output-dir ./manifests
Review the generated file:
cat ./manifests/values.yaml
If the generated file contains `cockroachdb.crdbCluster.rbac.serviceAccount.name`, keep that value as is. It preserves the service account used by the migrated cluster.
Option B: Build values.yaml manually
Key values to set:
| Helm value | Source from migrated CrdbCluster |
|---|---|
| `cockroachdb.crdbCluster.image.name` | `spec.template.spec.image` |
| `cockroachdb.crdbCluster.regions` | `spec.regions` (code, nodes, namespace, cloudProvider) |
| `cockroachdb.tls.enabled` | `spec.tlsEnabled` |
| `cockroachdb.crdbCluster.rbac.serviceAccount.name` | `spec.template.spec.serviceAccountName` (leave empty to use default fullname) |
| `cockroachdb.crdbCluster.resources` | `spec.template.spec.podTemplate.spec.containers[0].resources` |
| `cockroachdb.crdbCluster.rollingRestartDelay` | `spec.rollingRestartDelay` |
| `cockroachdb.crdbCluster.clusterSettings` | `spec.clusterSettings` |
The v1alpha1 CrdbCluster fields (cache, max-sql-memory, additionalArgs) were converted to
spec.template.spec.startFlags.upsert during migration. Map these to the appropriate Helm
values or pass them via cockroachdb.crdbCluster.startFlags.
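As an illustration, a minimal values.yaml for a single-region secure cluster might look like the following. This is a sketch only: every value is a placeholder to be replaced with the actual fields from your migrated CrdbCluster, the image tag and service account name are hypothetical, and the exact nesting of `upsert` under the Helm `startFlags` value is an assumption to verify against your chart version.

```yaml
# Illustrative sketch; values are placeholders, not defaults.
cockroachdb:
  tls:
    enabled: true                            # from spec.tlsEnabled
  crdbCluster:
    image:
      name: cockroachdb/cockroach:v25.1.0    # hypothetical tag; use spec.template.spec.image
    regions:                                 # from spec.regions
      - code: us-central1
        cloudProvider: gcp
        nodes: 3
        namespace: default
    rbac:
      serviceAccount:
        name: cockroachdb-sa                 # from spec.template.spec.serviceAccountName; omit to use default
    startFlags:                              # carried over from cache / max-sql-memory / additionalArgs
      upsert:
        - "--cache=25%"
        - "--max-sql-memory=25%"
```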
Field Mapping Reference
| Source (v1alpha1) | Migrated v1beta1 shape | Helm adoption value |
|---|---|---|
| `metadata.name` | `CrdbCluster.metadata.name` | `k8s.fullnameOverride` |
| `spec.image.name` | `spec.template.spec.image` | `cockroachdb.crdbCluster.image.name` |
| `spec.dataStore` | `spec.template.spec.dataStore` | `cockroachdb.crdbCluster.dataStore` |
| `spec.logConfigMap` | `spec.template.spec.loggingConfigMapName` (key renamed `logging.yaml` → `logs.yaml`) | `cockroachdb.crdbCluster.loggingConfigMapName` |
| `spec.resources` | `spec.template.spec.podTemplate.spec.containers[0].resources` | `cockroachdb.crdbCluster.podTemplate.spec` |
| `spec.cache` | `spec.template.spec.startFlags.upsert` (`--cache=<value>`) | `cockroachdb.crdbCluster.startFlags` |
| `spec.maxSQLMemory` | `spec.template.spec.startFlags.upsert` (`--max-sql-memory=<value>`) | `cockroachdb.crdbCluster.startFlags` |
| `spec.additionalArgs` | `spec.template.spec.startFlags.upsert` | `cockroachdb.crdbCluster.startFlags` |
| Service account | `spec.template.spec.podTemplate.spec.serviceAccountName` | `cockroachdb.crdbCluster.rbac.serviceAccount.name` with `create=false` |
| Pod annotations | `spec.template.spec.podTemplate.metadata.annotations` | `cockroachdb.crdbCluster.podTemplate.metadata.annotations` |
| Priority class | `spec.template.spec.podTemplate.spec.priorityClassName` | `cockroachdb.crdbCluster.podTemplate.spec.priorityClassName` |
| Ingress intent | Preserved as annotation on CrdbCluster | `cockroachdb.crdbCluster.service.ingress` |
Annotate resources for Helm ownership
export RELEASE_NAME="<your-helm-release-name>"
# Namespace-scoped resources
for RESOURCE in \
"crdbcluster/${CRDBCLUSTER}" \
"serviceaccount/${CRDBCLUSTER}-sa" \
"service/${CRDBCLUSTER}-public" \
"role/${CRDBCLUSTER}" \
"rolebinding/${CRDBCLUSTER}"; do
kubectl annotate "${RESOURCE}" \
meta.helm.sh/release-name="${RELEASE_NAME}" \
meta.helm.sh/release-namespace="${NAMESPACE}" \
-n "${NAMESPACE}" --overwrite
kubectl label "${RESOURCE}" \
app.kubernetes.io/managed-by=Helm \
-n "${NAMESPACE}" --overwrite
done
# Ingress resources (only if your cluster has Ingress resources)
for RESOURCE in \
"ingress/ui-${CRDBCLUSTER}" \
"ingress/sql-${CRDBCLUSTER}"; do
kubectl get "${RESOURCE}" -n "${NAMESPACE}" 2>/dev/null && \
kubectl annotate "${RESOURCE}" \
meta.helm.sh/release-name="${RELEASE_NAME}" \
meta.helm.sh/release-namespace="${NAMESPACE}" \
-n "${NAMESPACE}" --overwrite && \
kubectl label "${RESOURCE}" \
app.kubernetes.io/managed-by=Helm \
-n "${NAMESPACE}" --overwrite
done
Do not annotate the headless service (service/${CRDBCLUSTER}). The headless service is
operator-managed and not templated by the CockroachDB Helm chart.
Do not annotate the migration-created ClusterRole/ClusterRoleBinding. The CockroachDB Helm
chart creates its own cluster-scoped RBAC with different names
({fullname}-{namespace}-node-reader). Delete the stale migration-created ones after
Helm adoption:
# Migration-created ClusterRole/ClusterRoleBinding use "{namespace}-{name}" naming
kubectl delete clusterrole ${NAMESPACE}-${CRDBCLUSTER} --ignore-not-found
kubectl delete clusterrolebinding ${NAMESPACE}-${CRDBCLUSTER} --ignore-not-found
Run Helm install
There is no existing Helm release for the migrated cluster, so use --install to create one.
helm upgrade --install ${RELEASE_NAME} ./cockroachdb-parent/charts/cockroachdb \
--namespace ${NAMESPACE} \
--set migration.enabled=true \
--values your-values.yaml
Verify the install succeeded:
kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE \
-o jsonpath='{.metadata.labels.app\.kubernetes\.io/managed-by}'
# Expected: Helm
Use migration.enabled=true on the cockroachdb chart only for migration adoption. It
relaxes pre-upgrade validation so the chart can adopt an existing migrated CrdbCluster
without blocking on CRD checks. This is separate from migration.enabled=true on the
operator chart (set in Step 5), which enables the migration controller itself.
After adoption is verified, consider performing a rolling restart to confirm chart-managed
configuration is fully applied. Post-migration there is no StatefulSet, so trigger a rolling
restart via helm upgrade with an updated timestamp:
helm upgrade ${RELEASE_NAME} ./cockroachdb-parent/charts/cockroachdb \
--namespace ${NAMESPACE} \
--set migration.enabled=true \
--set cockroachdb.crdbCluster.timestamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--values your-values.yaml
Verify generation matching to confirm the operator has reconciled:
kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE \
-o jsonpath='{.metadata.generation} {.status.observedGeneration}'
# Both values should match
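Rather than eyeballing the two numbers, the generation check can be wrapped in a small poll loop. A sketch; the interval and retry count are arbitrary illustration values, not operator defaults:

```shell
# Poll until metadata.generation matches status.observedGeneration,
# i.e. the operator has reconciled the latest spec.
wait_for_reconcile() {
  name="$1"; ns="$2"; tries="${3:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    pair=$(kubectl get crdbcluster "$name" -n "$ns" \
      -o jsonpath='{.metadata.generation} {.status.observedGeneration}')
    gen="${pair%% *}"; observed="${pair##* }"
    if [ -n "$gen" ] && [ "$gen" = "$observed" ]; then
      echo "reconciled at generation $gen"
      return 0
    fi
    sleep 10
    i=$((i + 1))
  done
  echo "timed out waiting for reconcile" >&2
  return 1
}
```

Example: `wait_for_reconcile "$CRDBCLUSTER" "$NAMESPACE"`.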
Update Certificate Mode After Adoption
After migration, all certificate references are stored as externalCertificates on the
CrdbCluster spec, regardless of the original cert type. The export-values tool outputs
these as cockroachdb.tls.externalCertificates. After Helm adoption, update your
values.yaml to use the cert mode that matches your intended ongoing management:
- Self-signer: Switch to cockroachdb.tls.selfSigner.enabled: true and remove the externalCertificates block. The operator's self-signer will manage cert rotation going forward. The migration-created {name}-node-secret and {name}-client-secret secrets become stale once the self-signer creates its own; delete them after verifying the new certs are working.
- cert-manager: Switch to cockroachdb.tls.certManager.enabled: true with the appropriate issuer reference. The migration preserved the Certificate CR and its spec.secretName, so cert-manager continues to manage rotation. Remove the externalCertificates block from values.
- User-provided / External: Keep cockroachdb.tls.externalCertificates as exported. You are responsible for cert rotation.
After updating cert mode, run helm upgrade again and verify pods are healthy. Clean up
any stale secrets that are no longer referenced:
# Check which secrets are actually mounted by pods
kubectl get pod $CRDBCLUSTER-0 -n $NAMESPACE -o jsonpath='{.spec.volumes[*].secret.secretName}'
# Delete stale migration-created secrets if they are no longer mounted
kubectl delete secret $CRDBCLUSTER-node-secret -n $NAMESPACE --ignore-not-found
kubectl delete secret $CRDBCLUSTER-client-secret -n $NAMESPACE --ignore-not-found
Multi-Cluster Migration Strategy
When you have multiple clusters across different namespaces, you can migrate them one at a time while the public operator continues managing the remaining v1alpha1 clusters.
Deployment pattern
Deploy one namespace-scoped CockroachDB Operator per namespace using watchNamespaces.
This provides clean isolation where each operator manages only the clusters in its namespace.
ns-prod-1: public operator cluster-1 + CockroachDB Operator (scoped, migration=true)
ns-prod-2: public operator cluster-2 (no CockroachDB Operator yet)
ns-staging: CockroachDB Operator (scoped, migration=false) + fresh v1beta1 cluster
The migration flag rule
At least one CockroachDB Operator must have migration.enabled=true as long as any v1alpha1
cluster exists anywhere in the Kubernetes cluster. The CRD conversion webhook is a
cluster-wide resource. The public operator reads and writes v1alpha1 CrdbClusters, and since
the storage version is v1beta1, every v1alpha1 interaction goes through the webhook. Only
operators started with migration.enabled=true register the /convert endpoint.
It does not matter which operator has the flag: the conversion endpoint is stateless and handles requests for any namespace, but at least one migration-enabled operator must be running and reachable.
Operators that only manage fresh v1beta1 clusters do not need the migration flag. When an
operator starts without migration enabled, it checks the CRD's storedVersions field. If
v1alpha1 is present and an existing conversion webhook is already configured (by another
operator), it preserves the webhook and starts normally.
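To see where the conversion endpoint currently lives, you can read the webhook's client config from the CRD. A sketch using standard apiextensions.k8s.io fields (not operator-specific paths):

```shell
# Print the conversion strategy and the service the CRD's conversion
# webhook points at (namespace/name).
show_conversion_webhook() {
  kubectl get crd crdbclusters.crdb.cockroachlabs.com \
    -o jsonpath='{.spec.conversion.strategy} {.spec.conversion.webhook.clientConfig.service.namespace}/{.spec.conversion.webhook.clientConfig.service.name}'
  echo
}
```

If the printed service belongs to an operator that is no longer running, v1alpha1 requests will fail until another migration-enabled operator takes over.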
Creating new v1beta1 clusters during migration
You can create new v1beta1 clusters at any point during migration. Install a namespace-scoped
CockroachDB Operator in a separate namespace and create v1beta1 CrdbCluster resources directly.
The mutating webhook automatically injects crdb.io/skip-reconcile=true so the public
operator ignores these clusters.
The new operator does not need migration.enabled=true as long as another operator already
has it enabled.
Gradual migration across namespaces
Phase 1 - Migrate the first namespace:
- Install a CockroachDB Operator in ns-1 with migration.enabled=true and watchNamespaces=ns-1.
- Apply skip-reconcile and cloud annotations on the cluster in ns-1.
- Label the cluster with crdb.io/migrate=start.
- Monitor the migration and delete the StatefulSet when it reaches Finalization.
- The cluster in ns-2 continues running under the public operator, unaffected.
Phase 2 - Deploy fresh v1beta1 clusters in other namespaces (optional):
- Install a CockroachDB Operator in ns-3 with migration.enabled=false and watchNamespaces=ns-3. This works because the ns-1 operator already has the conversion webhook configured.
- Create v1beta1 clusters directly using the CockroachDB Helm chart.
Phase 3 - Migrate the remaining namespace:
- Either expand ns-1's watchNamespaces to include ns-2, or install a new CockroachDB Operator in ns-2 with migration.enabled=true.
- Apply skip-reconcile and cloud annotations on the cluster in ns-2.
- Migrate the cluster.
Phase 4 - Finalize (all v1alpha1 clusters are migrated):
- Remove the public operator (no v1alpha1 clusters remain).
- Patch storedVersions to remove v1alpha1 (see Step 13).
- The migration flag can optionally be removed (see Step 14).
What happens if things go wrong
| Scenario | What happens | What to do |
|---|---|---|
| The only migration-enabled operator goes down | v1alpha1 reads/writes fail (webhook unreachable). v1beta1 clusters are unaffected. | Restart the operator or enable migration on another operator. |
| Migration flag removed before storedVersions is patched | Operator preserves the existing webhook but does not register /convert. v1alpha1 interactions through the stale webhook return 404. v1beta1 clusters work fine. | Re-enable migration, or patch storedVersions if all clusters are migrated. |
| storedVersions patched while v1alpha1 clusters still exist | The API server may not serve those objects correctly. | Do not patch storedVersions until all v1alpha1 clusters are migrated. |
| Public operator removed while v1alpha1 clusters remain | Those clusters become unmanaged. | Reinstall the public operator or migrate them first. |
| skip-reconcile removed on a migrated cluster while public operator runs | Public operator recreates StatefulSet, conflicting with CrdbNodes. | Re-apply skip-reconcile immediately. Delete the recreated StatefulSet. |
Step 13. Patch storedVersions
After all v1alpha1 clusters across the entire Kubernetes cluster have been migrated and the
public operator has been removed, patch the CRD's storedVersions to remove v1alpha1.
Do not do this until every v1alpha1 cluster has completed migration and the public operator is stopped.
# Verify all clusters are migrated
kubectl get crdbclusters --all-namespaces \
-o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: phase={.status.migration.phase}{"\n"}{end}'
# Patch storedVersions
kubectl patch crd crdbclusters.crdb.cockroachlabs.com \
--subresource=status \
--type=json \
-p='[{"op":"replace","path":"/status/storedVersions","value":["v1beta1"]}]'
# Verify
kubectl get crd crdbclusters.crdb.cockroachlabs.com \
-o jsonpath='{.status.storedVersions}'
# Expected: ["v1beta1"]
Step 14. Disable migration mode (optional)
After storedVersions is patched, disabling migration mode is optional. When the operator
restarts without the flag, it sees that storedVersions no longer contains v1alpha1 and
sets v1alpha1 served=false on the CRD. The conversion webhook is removed.
If you leave migration.enabled=true after patching storedVersions, the operator continues
to register the conversion webhook and migration controller, but they have no effect.
helm upgrade crdb-operator ./cockroachdb-parent/charts/operator \
--reuse-values \
--set migration.enabled=false
After this:
- v1alpha1 is no longer served by the API server.
- The conversion webhook is removed from the CRD.
- All existing v1beta1 clusters continue working normally.
Do not disable migration mode while the public operator is still running or before
storedVersions is patched. Disabling it removes the conversion webhook, which causes the API server to drop v1beta1-only fields (certificates, regions, podTemplate) from the stored CrdbCluster spec during v1alpha1 round-trips.
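A small guard can enforce this ordering before you flip the flag. A sketch (safe_to_disable_migration is a hypothetical helper name) that refuses to proceed while v1alpha1 is still in storedVersions:

```shell
# Return 0 only if v1alpha1 is gone from the CRD's storedVersions,
# i.e. it is safe to disable migration mode.
safe_to_disable_migration() {
  stored=$(kubectl get crd crdbclusters.crdb.cockroachlabs.com \
    -o jsonpath='{.status.storedVersions}')
  case "$stored" in
    *v1alpha1*)
      echo "unsafe: storedVersions still contains v1alpha1 ($stored)" >&2
      return 1 ;;
    *)
      echo "safe: storedVersions=$stored"
      return 0 ;;
  esac
}
```

Example: `safe_to_disable_migration && helm upgrade crdb-operator ./cockroachdb-parent/charts/operator --reuse-values --set migration.enabled=false`.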
Removing the CockroachDB Operator
If you uninstall the CockroachDB Operator without completing Steps 13-14 first, the CRD
still has the conversion webhook pointing at a dead endpoint. Any v1alpha1 API calls
(including kubectl get crdbclusters if the API server chooses v1alpha1 as the storage
version) will fail with webhook connection errors. To recover:
- Patch storedVersions to remove v1alpha1 (Step 13).
- Remove the conversion webhook from the CRD manually:
kubectl patch crd crdbclusters.crdb.cockroachlabs.com \
  --type=json \
  -p='[{"op":"remove","path":"/spec/conversion/webhook"},{"op":"replace","path":"/spec/conversion/strategy","value":"None"}]'
- Optionally remove v1alpha1 from the CRD's spec.versions if no v1alpha1 clusters remain.
Always complete Steps 13-14 before uninstalling the operator.
Conversion annotations
The conversion webhook writes annotations on v1alpha1 objects to preserve v1beta1 state
across round-trips. ConvertFrom always refreshes the hub spec annotation from the current
v1beta1 spec, so the annotation is always up to date when ConvertTo reads it. This
ensures v1beta1-only fields like ServiceAccountName and StartFlags survive any v1alpha1
write-back. The migration controller strips these annotations on Phase=Complete as cleanup,
but they are harmless and will be re-created if a v1alpha1 read occurs while the conversion
webhook is still active. Disabling migration mode prevents any further round-trips.
Certificate Detection
The controller automatically detects your certificate method. No manual configuration is needed.
- cert-manager: If cert-manager Certificate CRs exist for the cluster ({name}-node), the controller updates the Certificate CR with join service DNS SANs and preserves the issuer references. cert-manager then regenerates the secrets automatically. CrdbNode pods mount from the cert-manager-managed secret (the name from spec.secretName in the Certificate CR), not {name}-node-secret. This keeps cert-manager's automatic rotation working after migration.
- Self-signer (operator built-in): The controller loads the existing CA from cluster secrets, regenerates node and client certificates with join service DNS SANs, and writes them to the destination secrets ({name}-node-secret, {name}-client-secret). CrdbNode pods mount from {name}-node-secret. The regenerated certificates have a 1-year TTL and are stored as ExternalCertificates in the v1beta1 spec. The operator does not auto-rotate ExternalCertificates, so after migration you should switch to cockroachdb.tls.selfSigner.enabled: true via Helm adoption (Step 12) to enable automatic rotation. If you delay this step, the certs will expire silently after one year.
- External certs: If neither cert-manager CRs nor self-signer secrets are found, the controller preserves existing secret references as ExternalCertificates in the v1beta1 spec.
- Custom secret names: If the v1alpha1 spec has NodeTLSSecret or ClientTLSSecret set, the controller skips cert regeneration entirely because the CA private key is not available. CrdbNode pods mount the user's existing secrets directly. The user must update their certificates to include join service DNS SANs before starting the migration (see prerequisites).
- Insecure clusters: Detected from the --insecure start flag. The entire cert migration phase is skipped. An InsecureClusterMigration warning event is emitted.
Note: Regardless of the original cert type, after migration all certificate secret names are stored as ExternalCertificates on the CrdbNode spec. This is the internal representation used by the operator to mount the correct secrets into pods.
Controlling Migration (Stop / Resume / Rollback)
Stop (Pause)
Pauses migration at the current phase. No resources are deleted. CrdbNodes already created remain running and serving traffic.
kubectl label crdbcluster $CRDBCLUSTER crdb.io/migrate=stop --overwrite -n $NAMESPACE
Resume
There is no separate "resume" label value. Use start again to resume from the paused phase.
The controller detects the PhaseStopped state internally and resumes at the correct phase
based on how many nodes have already been migrated.
kubectl label crdbcluster $CRDBCLUSTER crdb.io/migrate=start --overwrite -n $NAMESPACE
Rollback
Rollback is safe at any phase before Complete. Removing the label triggers automatic rollback.
The controller automatically:
1. Deletes all CrdbNodes and waits for their pods to terminate.
2. Restores StatefulSet to original replica count.
3. Removes migration labels (crdb.cockroachlabs.com/cluster, svc) from pods.
4. Deletes CA ConfigMap ({name}-ca-crt) and migration-created cert secrets
({name}-node-secret, {name}-client-secret).
5. Reverses ConfigMap key migration (logs.yaml back to logging.yaml).
6. Deletes all PDB variants ({name}, {name}-budget, {name}-pdb).
7. Restores original service selectors (app.kubernetes.io/component, app.kubernetes.io/instance).
8. Deletes namespace-qualified ClusterRole and ClusterRoleBinding ({namespace}-{name}).
9. Sets CrdbCluster Mode=Disabled.
10. Removes crdb.io/skip-reconcile label last so the public operator resumes control.
All cleanup steps are non-fatal. If a resource is already deleted or inaccessible, the controller logs a warning and continues with the remaining cleanup.
kubectl label crdbcluster $CRDBCLUSTER crdb.io/migrate- -n $NAMESPACE
What happens to data during rollback? Data written during migration is safe. CockroachDB replicates data across nodes. When CrdbNodes are deleted, their data is re-replicated to the remaining STS pods (which are scaled back up). No data is lost as long as the replication factor is maintained.
During rollback, two sets of pods may briefly coexist. The controller scales the STS back up before deleting CrdbNodes, ensuring the cluster always has enough replicas.
What if the public operator was already uninstalled? Rollback still works. It restores the StatefulSet and removes migration artifacts. However, without the public operator running, nothing will reconcile the StatefulSet after rollback. You would need to reinstall the public operator or manage the StatefulSet manually.
What happens to PVCs during rollback? When CrdbNodes are deleted, their PVCs are also deleted. When the StatefulSet scales back up, Kubernetes creates new PVCs. Unlike forward migration (which reuses existing PVCs), rollback creates fresh storage. Data safety is maintained through CockroachDB's built-in replication. Data is re-replicated from the nodes that remain healthy throughout the process.
Important: For operator migrations, the public operator must still be running (or at least its CRDs and webhook must be available) for rollback to fully succeed. The controller removes
skip-reconcileas the last rollback step, allowing the public operator to resume reconciliation. If the public operator has been completely uninstalled (including CRDs), the StatefulSet is restored but nothing will reconcile it. You would need to reinstall the public operator or manage the cluster manually.
Rollback Validation
After rollback, the controller enters a validation phase (PhaseRollbackComplete) and
verifies:
- All CrdbNodes are deleted.
- StatefulSet is restored to original replica count.
- All pods are ready.
- Headless service has no v1beta1 ownerReferences.
Only after all checks pass does it remove the skip-reconcile label and clear migration status.
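The controller's checks are authoritative, but they can be approximated by hand while you wait. A sketch, assuming the resource names used elsewhere in this guide (check_rollback is a hypothetical helper name):

```shell
# Approximate the rollback validation checks manually: no CrdbNodes
# left, and the StatefulSet back at the expected ready replica count.
check_rollback() {
  name="$1"; ns="$2"; want_replicas="$3"
  nodes=$(kubectl get crdbnode -n "$ns" --no-headers 2>/dev/null | wc -l)
  nodes=$((nodes))  # strip any whitespace padding from wc
  ready=$(kubectl get statefulset "$name" -n "$ns" \
    -o jsonpath='{.status.readyReplicas}')
  if [ "$nodes" -eq 0 ] && [ "$ready" = "$want_replicas" ]; then
    echo "rollback checks passed"
  else
    echo "pending: crdbnodes=$nodes readyReplicas=$ready" >&2
    return 1
  fi
}
```

Example: `check_rollback "$CRDBCLUSTER" "$NAMESPACE" 3` for a three-node cluster.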
Rollback safety by phase
| Phase | Rollback safe? | Notes |
|---|---|---|
| Init | Yes | Only a CrdbCluster has been created; no pods affected |
| CertMigration | Yes | Only cert secrets created; no pods affected |
| PodMigration | Yes | Controller deletes CrdbNodes and scales STS back up |
| Finalization | Conditional | Safe if STS still exists. If STS was already deleted, rollback sets PhaseFailed |
| Complete | No | STS has been deleted. Manual recovery required |
Label Values Reference
| Label Value | Who Sets It | Meaning |
|---|---|---|
| start | User | Begin or resume migration |
| stop | User | Pause migration |
| rollback | User | Request rollback (same as removing label) |
| in-progress | Controller | Migration is actively running |
| stopped | Controller | Migration is paused |
| complete | Controller | Migration finished successfully |
| failed | Controller | Migration hit an unrecoverable error |
| (removed) | User | Triggers rollback |
Rollback After Complete
Once Phase=Complete, automated rollback is not possible. The StatefulSet has been deleted.
Manual recovery steps:
1. Delete all CrdbNodes
2. Manually recreate the StatefulSet with original spec
3. Scale up and verify cluster health
4. Remove the migration label
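Steps 1 and 4 can be scripted. A sketch (manual_recovery_cleanup is a hypothetical helper name); step 2 requires your original StatefulSet manifest and is intentionally not shown:

```shell
# Delete every CrdbNode in the namespace, then remove the migration
# label. Recreating the StatefulSet and verifying cluster health
# remain manual steps.
manual_recovery_cleanup() {
  name="$1"; ns="$2"
  kubectl delete crdbnode --all -n "$ns" --ignore-not-found
  kubectl label crdbcluster "$name" crdb.io/migrate- -n "$ns"
}
```

Run this only after you have the original StatefulSet spec ready to reapply.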
Troubleshooting
Migration Fails with "missing required label: crdb.io/skip-reconcile=true"
The skip-reconcile label was not applied before starting migration. Apply it and retry:
kubectl label crdbcluster $CRDBCLUSTER crdb.io/skip-reconcile="true" -n $NAMESPACE
kubectl label crdbcluster $CRDBCLUSTER crdb.io/migrate=start --overwrite -n $NAMESPACE
Migration Stuck at PodMigration
If a node fails health checks for more than 10 minutes, migration auto-pauses to PhaseStopped.
# Check why migration stopped
kubectl get crdbcluster $CRDBCLUSTER -n $NAMESPACE \
-o jsonpath='{.status.migration.lastError}'
# Check pod status
kubectl describe pod $CRDBCLUSTER-<index> -n $NAMESPACE
# Check CrdbNode status
kubectl get crdbnode $CRDBCLUSTER-<index> -n $NAMESPACE -o yaml
# Resume after fixing
kubectl label crdbcluster $CRDBCLUSTER crdb.io/migrate=start --overwrite -n $NAMESPACE
CrdbCluster Not Found (Operator Migration)
For operator migrations, the v1beta1 CrdbCluster is created via the conversion webhook.
The conversion webhook is only registered when the operator is started with
--enable-migration (or migration.enabled=true in Helm). If the v1beta1 CrdbCluster
is missing, ensure:
- The CockroachDB Operator was installed with migration.enabled=true.
- The webhook service is running and reachable.
- The v1alpha1 CrdbCluster has not been deleted.
# Check if conversion webhook is configured
kubectl get crd crdbclusters.crdb.cockroachlabs.com -o yaml | grep -A5 conversion
Locality Labels Missing
If CrdbNode pods remain Pending, check for locality label warnings:
kubectl get events -n $NAMESPACE --field-selector reason=LocalityLabelsRequired
Apply the required labels to K8s nodes (see Step 2).
Under-replicated Ranges
The controller retries health checks every 10 seconds. If under-replicated ranges persist for more than 10 minutes, migration auto-pauses.
# Use -c db during migration, -c cockroachdb after migration
kubectl exec $CRDBCLUSTER-0 -n $NAMESPACE -c cockroachdb -- \
/cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
-e "SELECT sum((metrics->>'ranges.underreplicated')::INT8) FROM crdb_internal.kv_store_status;"
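Rather than re-running the query by hand, it can be polled until ranges are fully replicated. A sketch wrapping a slight variant of the same SQL (a coalesce is added so an empty sum reads as 0; the poll interval and retry count are arbitrary):

```shell
# Poll the under-replicated range count until it reaches zero. Uses
# the post-migration container name (-c cockroachdb); use -c db if
# the migration is still in progress.
wait_for_full_replication() {
  name="$1"; ns="$2"; tries="${3:-60}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    n=$(kubectl exec "$name-0" -n "$ns" -c cockroachdb -- \
      /cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
      --format=csv \
      -e "SELECT coalesce(sum((metrics->>'ranges.underreplicated')::INT8), 0) FROM crdb_internal.kv_store_status;" \
      | tail -n 1)
    [ "$n" = "0" ] && { echo "fully replicated"; return 0; }
    sleep 10
    i=$((i + 1))
  done
  echo "still under-replicated after $tries checks" >&2
  return 1
}
```

Example: `wait_for_full_replication "$CRDBCLUSTER" "$NAMESPACE"` before proceeding with the next node or namespace.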
RBAC Conflicts Between Public and CockroachDB Operator
If both operators are fighting over RBAC:
1. Verify skip-reconcile is set on the v1alpha1 CrdbCluster.
2. Verify the CockroachDB Operator uses watchNamespaces to limit its scope.
3. Check that ClusterRole names do not collide (CockroachDB Operator uses {namespace}-{name}).
# Check namespace-scoped RBAC
kubectl get role $CRDBCLUSTER -n $NAMESPACE
kubectl get rolebinding $CRDBCLUSTER -n $NAMESPACE
# Check cluster-scoped RBAC (namespace-qualified names)
kubectl get clusterrole ${NAMESPACE}-${CRDBCLUSTER}
kubectl get clusterrolebinding ${NAMESPACE}-${CRDBCLUSTER}
ConfigMap Key Format
The public operator uses logging.yaml as the key in log-config ConfigMaps. The CockroachDB
Operator uses logs.yaml. The migration controller renames this key automatically. On
rollback, the key is renamed back. Verify:
kubectl get configmap <log-config-name> -n $NAMESPACE -o yaml | grep -E "logging.yaml|logs.yaml"