This guide describes how to automatically migrate an existing CockroachDB cluster from a Helm StatefulSet deployment to the CockroachDB operator.
The CockroachDB operator is in Preview.
The CockroachDB operator's migration controller automatically handles node migration, certificate regeneration, and resource management, ensuring a seamless transition with minimal manual intervention.
This migration can be completed without affecting cluster availability. The process preserves existing disks so that data does not need to be replicated into empty volumes. The controller migrates nodes one at a time (highest index first), so maximum cluster capacity may be temporarily reduced by one node at points during the migration.
Compatibility
Before starting migration, verify your cluster configuration is supported.
| Feature | Supported | Notes |
|---|---|---|
| Self-signer certs (Helm built-in) | Yes | Regenerated with join service DNS SANs; originals preserved |
| cert-manager certs | Yes | Certificate CR updated with join service DNS SANs; issuer references preserved |
| User-provided certs (tls.certs.provided) | Yes | CrdbNode pods mount user's secrets directly; cert regeneration skipped (see prerequisites) |
| Insecure clusters (TLS disabled) | Yes | Detected from --insecure flag; cert migration phase skipped entirely |
| WAL failover (dedicated PVC) | Yes | Detected from failover* VolumeClaimTemplate and --wal-failover flag |
| Dedicated logs PVC (logsdir / logs-dir) | Yes | VolumeClaimTemplate and mount path preserved as LogsStore |
| PCR (virtualized/standby) | Yes | Detected from {name}-init Job args (--virtualized, --virtualized-empty) |
| Custom service account | Yes | Preserved with create: false |
| Custom start flags | Partial | Known operator-managed flags (--join, --listen-addr, --certs-dir, etc.) are excluded; all others preserved in startFlags.upsert |
| Multi-region | Yes | Requires migration-regions annotation with all regions before starting |
| Ingress (UI/SQL) | Partial | Existing ingress continues working during migration (service names preserved). Config saved as annotation for reference. Ingress resources themselves are not created or modified. Manual Helm adoption required |
| ServiceMonitor / PodMonitor | No | Not handled by migration. Must be recreated manually after migration to match new pod labels |
| NetworkPolicy | No | Must be recreated manually if present |
| Log-config Secret | Yes | Automatically converted to ConfigMap (operator uses ConfigMaps, Helm uses Secrets) |
Before you begin
Before starting the migration process, verify the following:
- Your CockroachDB cluster is managed by a Helm StatefulSet, not the Public operator.
- All StatefulSet pods are Running and Ready with no pending rolling updates, scale operations, or evictions.
- kubectl and helm are installed and configured with access to the target cluster.
- For multi-region clusters, cloud region and provider labels are applied to Kubernetes nodes.
- For multi-region clusters, the migration-regions annotation is prepared with the full topology.
- You have reviewed the Compatibility table above and confirmed your configuration is supported.
If you use user-provided certificates (tls.certs.provided), your certificates must include join service DNS SANs ({cluster-name}-join, {cluster-name}-join.{namespace}, {cluster-name}-join.{namespace}.svc.cluster.local). The controller cannot regenerate certificates when the CA private key is not available.
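You can check whether a certificate already carries these SANs before starting. Below is a minimal sketch using openssl; the cluster name, namespace, and the local file name node.crt are example values, and it assumes you have already extracted the node certificate from its secret:

```shell
# Sketch: verify a node certificate includes the required join service DNS SANs.
# CLUSTER and NS are example values; substitute your own cluster name and namespace.
CLUSTER="crdb-test-cockroachdb"; NS="default"

check_join_sans() {
  local crt="$1" text san
  text=$(openssl x509 -in "$crt" -noout -text) || return 1
  for san in "$CLUSTER-join" "$CLUSTER-join.$NS" "$CLUSTER-join.$NS.svc.cluster.local"; do
    echo "$text" | grep -q "DNS:$san" || { echo "MISSING: $san"; return 1; }
  done
  echo "all join SANs present"
}

# Usage, once node.crt is extracted from your certificate secret:
#   check_join_sans node.crt
```

If any SAN is reported missing, reissue the certificates before labeling the StatefulSet for migration.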
If you use tls.certs.provided with tls.certs.tlsSecret: false, secrets must be re-keyed to use Kubernetes TLS convention (tls.crt, tls.key, ca.crt) instead of CockroachDB-native key names (node.crt, node.key). The operator expects TLS secrets in the standard Kubernetes format. This is not needed for self-signer or cert-manager modes.
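A sketch of the re-keying follows. The secret names crdb-certs and crdb-certs-tls are placeholders; --dry-run=client lets you inspect the generated secret before creating it:

```shell
# Sketch: re-key a CockroachDB-native cert secret into Kubernetes TLS convention.
# "crdb-certs" is a placeholder secret name; adjust to your deployment.
kubectl get secret crdb-certs -n $NAMESPACE -o jsonpath='{.data.node\.crt}' | base64 -d > tls.crt
kubectl get secret crdb-certs -n $NAMESPACE -o jsonpath='{.data.node\.key}' | base64 -d > tls.key
kubectl get secret crdb-certs -n $NAMESPACE -o jsonpath='{.data.ca\.crt}'  | base64 -d > ca.crt

# Preview the re-keyed secret; drop --dry-run/-o yaml to actually create it.
kubectl create secret generic crdb-certs-tls -n $NAMESPACE \
  --from-file=tls.crt --from-file=tls.key --from-file=ca.crt \
  --dry-run=client -o yaml
```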
Overview
What the Migration Controller Does Automatically
- Detects cert-manager, self-signer, or user-provided certificates. Self-signer and cert-manager certs are regenerated with join service DNS SANs automatically. User-provided certs are mounted directly (regeneration skipped).
- Creates RBAC resources (ServiceAccount, Role, RoleBinding, ClusterRole, ClusterRoleBinding) with namespace-qualified names for cluster-scoped resources.
- Deletes old PDB and creates a new one during finalization (no protection gap because the new PDB is created immediately after old PDB deletion).
- Updates service selectors during migration and restores them on rollback.
- Migrates nodes one at a time (highest index first), verifying cluster health between each node.
- Detects and preserves Ingress configuration as an annotation on the CrdbCluster.
- Migrates logsdir/logs-dir VolumeClaimTemplate to CrdbNode LogsStore.
- Converts log-config Secret to ConfigMap (Helm uses Secrets, operator uses ConfigMaps).
- Detects Physical Cluster Replication (PCR) configuration from init jobs.
- Detects WAL failover configuration from VolumeClaimTemplates and the --wal-failover flag.
What Requires Manual Action
- Cloud region and provider node labels must be applied before deploying the operator.
- The StatefulSet must be manually deleted after finalization to complete migration.
- Helm adoption annotations must be applied to all resources post-migration.
- Old bare-named ClusterRole/ClusterRoleBinding must be deleted post-migration.
- Ingress resources are not annotated for Helm adoption (manual step required).
- ServiceMonitor / PodMonitor resources must be recreated with updated pod labels.
Migration Phases
Init -> CertMigration -> PodMigration -> Finalization -> (user deletes STS) -> Complete
| Phase | Duration (approx.) | What happens |
|---|---|---|
| Init | Seconds | Validates prerequisites, creates v1beta1 CrdbCluster, records original replica count |
| CertMigration | Seconds to minutes | Detects cert type, regenerates certs with join service SANs, creates CA ConfigMap, labels existing pods |
| PodMigration | ~5-15 min per node | Creates CrdbNode, waits for health, scales down STS by one. Repeats for each node (highest index first) |
| Finalization | Seconds | Sets cluster spec (regions, TLS, resources), deletes old PDB, creates new PDB, sets Mode=MutableOnly |
| Complete | Seconds (after user deletes STS) | Records completion, updates migration label to complete |
Step 1. Export environment variables
Export the necessary environment variables to identify your existing deployment.
Set STS_NAME to the CockroachDB StatefulSet deployed via Helm chart:
export STS_NAME="crdb-test-cockroachdb"
Set NAMESPACE to the namespace where the StatefulSet is installed:
export NAMESPACE="default"
Set RELEASE_NAME to the Helm release name:
export RELEASE_NAME=$(kubectl get sts $STS_NAME -n $NAMESPACE \
-o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}')
Step 2. Apply cloud region and provider labels (multi-region only)
CockroachDB uses K8s node labels for locality-based pod placement. These labels must exist on the nodes before the operator starts scheduling pods.
# AWS
kubectl label node <node-name> \
topology.kubernetes.io/region=us-east-1 \
topology.kubernetes.io/zone=us-east-1a \
cockroach.io/cloud=aws
# GCP
kubectl label node <node-name> \
topology.kubernetes.io/region=us-central1 \
topology.kubernetes.io/zone=us-central1-a \
cockroach.io/cloud=gcp
# Azure
kubectl label node <node-name> \
topology.kubernetes.io/region=eastus \
topology.kubernetes.io/zone=eastus-1 \
cockroach.io/cloud=azure
If the required labels are missing, CrdbNode pods will remain Pending with scheduling errors.
For multi-region clusters, annotate the StatefulSet with the full topology before starting migration:
kubectl annotate sts $STS_NAME \
'crdb.cockroachlabs.com/migration-regions=[{"code":"us-east-1","nodes":3,"namespace":"ns-east"},{"code":"eu-west-1","nodes":3,"namespace":"ns-west"}]' \
-n $NAMESPACE
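Before applying the annotation, you can sanity-check the topology JSON. A small sketch assuming python3 is available:

```shell
# Sanity-check the migration-regions JSON before annotating the StatefulSet.
REGIONS='[{"code":"us-east-1","nodes":3,"namespace":"ns-east"},{"code":"eu-west-1","nodes":3,"namespace":"ns-west"}]'
echo "$REGIONS" | python3 -c 'import json,sys; r=json.load(sys.stdin); print("regions:", [x["code"] for x in r], "total nodes:", sum(x["nodes"] for x in r))'
```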
If the annotation is missing, the controller defaults to the operator's configured cloud region. For single-region clusters this default is usually correct and the annotation can be omitted.
The fallback order is: (1) migration-regions annotation, (2) regionCode and cloudProvider annotations, (3) operator environment configuration, (4) default us-east1.
Step 3. Verify the StatefulSet is stable
All replicas must be ready before migration begins.
kubectl get sts $STS_NAME -n $NAMESPACE
# READY column must equal DESIRED
kubectl get pods -l app.kubernetes.io/name=cockroachdb -n $NAMESPACE
# All pods must be Running/Ready
Confirm there are no ongoing rolling updates, scale operations, or pending evictions.
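The readiness check above can be scripted; a minimal sketch comparing desired and ready replicas:

```shell
# Sketch: compare desired vs. ready replicas before starting migration.
DESIRED=$(kubectl get sts $STS_NAME -n $NAMESPACE -o jsonpath='{.spec.replicas}')
READY=$(kubectl get sts $STS_NAME -n $NAMESPACE -o jsonpath='{.status.readyReplicas}')
if [ "$READY" = "$DESIRED" ]; then
  echo "StatefulSet stable: $READY/$DESIRED replicas ready"
else
  echo "Not ready: $READY/$DESIRED; wait before starting migration"
fi
```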
Step 4. Install the CockroachDB operator with migration enabled
Install the operator with migration enabled. This registers the migration controller
which watches for the crdb.io/migrate label.
helm upgrade --install crdb-operator ./cockroachdb-parent/charts/operator \
--set migration.enabled=true
migration.enabled=true on the operator chart enables the migration controller and registers the conversion webhook. This is different from migration.enabled=true on the cockroachdb chart (used later in Step 13), which relaxes pre-upgrade validation so the chart can adopt an existing migrated CrdbCluster. Both must be set at their respective steps.
Verify the operator is running and CRDs are available:
kubectl get pods -l app=cockroachdb-operator -n $NAMESPACE
kubectl get crd crdbclusters.crdb.cockroachlabs.com
# Verify both API versions are accessible (conversion webhook is active)
kubectl api-resources | grep crdbclusters
If a public operator is also running in the cluster, scope the CockroachDB Operator to the target
namespace. Without watchNamespaces, the CockroachDB Operator's cluster controller will attempt to
reconcile all CrdbClusters cluster-wide, interfering with clusters managed by the public
operator in other namespaces.
helm upgrade --install crdb-operator ./cockroachdb-parent/charts/operator \
--set migration.enabled=true \
--set watchNamespaces=$NAMESPACE
When migrating clusters across multiple namespaces, migrate one namespace at a time. Update
watchNamespaces (or use a comma-separated list) to include additional namespaces only after
the previous migration is complete.
Step 5. Start the migration
Initiate the migration by labeling the StatefulSet.
kubectl label sts $STS_NAME crdb.io/migrate=start -n $NAMESPACE
The operator detects this label and begins the migration automatically.
Step 6. Monitor the migration
DB Console
The DB Console is accessible via the public service throughout migration.
# Port-forward to access DB Console
kubectl port-forward svc/${STS_NAME}-public 8080 -n $NAMESPACE
# Then open https://localhost:8080 in your browser
Key things to monitor in the DB Console:
- Node List: Watch for nodes joining and leaving as migration progresses. Total node count stays constant (new CrdbNode created before old STS pod is removed).
- Replication Status: Under-replicated and unavailable ranges should stay at zero. The migration controller waits for under-replicated ranges to clear before proceeding to the next node.
- Ranges Dashboard: Under-replicated ranges counter should not increase during migration.
CockroachDB Health Checks
The controller performs these checks automatically between each node migration, but you can verify manually:
# Check under-replicated ranges (should be 0)
# During migration, STS pods use container name "db"; after migration, CrdbNode pods use "cockroachdb"
kubectl exec $STS_NAME-0 -n $NAMESPACE -c db -- \
/cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
-e "SELECT sum((metrics->>'ranges.underreplicated')::INT8) FROM crdb_internal.kv_store_status;"
# Check all nodes are live (count should match total nodes)
kubectl exec $STS_NAME-0 -n $NAMESPACE -c db -- \
/cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
-e "SELECT count(DISTINCT node_id) FROM crdb_internal.kv_store_status;"
# For insecure clusters, replace --certs-dir with --insecure
Watch migration status
kubectl get crdbcluster $STS_NAME -n $NAMESPACE \
-o jsonpath='{.status.migration.phase} {.status.migration.message}' -w
Watch events
kubectl get events -n $NAMESPACE \
--field-selector involvedObject.name=$STS_NAME \
--sort-by='.lastTimestamp'
Watch CrdbNode creation during PodMigration
kubectl get crdbnode -n $NAMESPACE \
-l crdb.cockroachlabs.com/cluster=$STS_NAME -w
Phase-Specific Checks
| Phase | What to watch |
|---|---|
| Init | kubectl describe crdbcluster $STS_NAME for MigrationStarted event |
| CertMigration | For self-signer: kubectl get secret $STS_NAME-node-secret -n $NAMESPACE. For cert-manager: check the secret name from the Certificate CR (kubectl get certificate $STS_NAME-node -n $NAMESPACE -o jsonpath='{.spec.secretName}'). For user-provided certs: your existing secret is used directly. TLS clusters only |
| PodMigration | kubectl get crdbnode -n $NAMESPACE -w for one node per pod |
| Finalization | kubectl get crdbcluster $STS_NAME -o jsonpath='{.status.migration.message}' for "Finalization complete" |
| Complete | After StatefulSet deletion; spec.mode: MutableOnly and crdb.io/migrate label changes to complete |
Auto-pause behavior
During PodMigration, if a newly created CrdbNode does not pass health checks within
10 minutes, the controller automatically pauses migration by setting the phase to
PhaseStopped. Health checks run every 10 seconds and verify:
- CrdbNode has PodReady=True condition
- Pod is in Running phase with Ready=True condition
- SQL health check passes (zero under-replicated ranges and correct live node count)
- Pod is registered in the headless service endpoints
When auto-paused, investigate the stuck node and resume with kubectl label sts $STS_NAME crdb.io/migrate=start --overwrite.
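To dig into a paused node, you can read the same signals the controller checks. A sketch; the node index and the jsonpath fields on the CrdbNode status are assumptions based on the conditions listed above, so adjust to what your cluster actually reports:

```shell
# Sketch: inspect a stuck CrdbNode's conditions (index 2 is an example).
kubectl get crdbnode $STS_NAME-2 -n $NAMESPACE \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'

# Confirm the pod is registered in the headless service endpoints.
kubectl get endpoints $STS_NAME -n $NAMESPACE \
  -o jsonpath='{.subsets[*].addresses[*].targetRef.name}'
```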
Prometheus Metrics
crdb_operator_migration_phase{cluster, namespace, phase}
crdb_operator_migration_pods_total{cluster, namespace}
crdb_operator_migration_pods_migrated{cluster, namespace}
crdb_operator_migration_duration_seconds{cluster, namespace, phase}
crdb_operator_migration_errors_total{cluster, namespace, phase, error_type}
crdb_operator_migration_rollbacks_total{cluster, namespace, reason}
PDB Behavior During Migration
- Init through PodMigration: The original Helm PDB remains active. Pod disruption protection is maintained throughout.
- Finalization: The controller deletes the old PDB ({name} or {name}-budget) and immediately creates the new PDB ({name}-pdb). There is no gap in protection.
- Rollback: All PDB variants are deleted ({name}, {name}-budget, {name}-pdb). The original PDB is not recreated by the migration controller; it is recreated by the Helm chart on the next helm upgrade or by the Helm release's existing resources.
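To observe the handoff described above while Finalization runs:

```shell
# Watch the PDB swap: the old PDB disappears and the new {name}-pdb appears immediately after.
kubectl get pdb -n $NAMESPACE -w
```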
Step 7. Delete the StatefulSet to complete migration
After Finalization completes, the controller stops processing and waits for the user to
delete the StatefulSet. The StatefulSet is intentionally left intact. Once deleted, the
controller sets spec.mode=MutableOnly and records Phase=Complete.
# Verify Finalization is done (Mode is still Disabled at this point)
kubectl get crdbcluster $STS_NAME -n $NAMESPACE \
-o jsonpath='{.status.migration.message}'
# Expected: Finalization complete. Delete the StatefulSet to mark migration complete.
# Delete the StatefulSet to trigger Phase=Complete
kubectl delete sts $STS_NAME -n $NAMESPACE
# Verify completion (may take a few seconds)
# Mode switches to MutableOnly and label changes to "complete" after STS deletion.
kubectl get crdbcluster $STS_NAME -n $NAMESPACE \
-o jsonpath='{.spec.mode} {.status.migration.phase}'
# Expected: MutableOnly Complete
All STS pods are already at replicas=0 by the time Finalization runs (scaled down one by one during PodMigration). Deleting the StatefulSet object does not evict any running pods.
Step 8. Verify cluster health post-migration
# All CrdbNodes should be healthy
kubectl get crdbnode -n $NAMESPACE -l crdb.cockroachlabs.com/cluster=$STS_NAME
# All pods running
kubectl get pods -n $NAMESPACE -l crdb.cockroachlabs.com/cluster=$STS_NAME
# Verify data integrity from any CrdbNode pod
# Note: post-migration, the container name is "cockroachdb" (not "db" as in the original STS)
kubectl exec $STS_NAME-0 -n $NAMESPACE -c cockroachdb -- \
/cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
-e "SELECT count(*) FROM [SHOW RANGES];"
# Verify under-replicated ranges are zero
kubectl exec $STS_NAME-0 -n $NAMESPACE -c cockroachdb -- \
/cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
-e "SELECT sum((metrics->>'ranges.underreplicated')::INT8) FROM crdb_internal.kv_store_status;"
Verify preserved configurations
# WAL failover (if configured)
kubectl get crdbcluster $STS_NAME -n $NAMESPACE \
-o jsonpath='{.spec.template.spec.walFailoverSpec}'
# Logs PVC (if configured)
kubectl get crdbcluster $STS_NAME -n $NAMESPACE \
-o jsonpath='{.spec.template.spec.logsStore}'
# Log-config ConfigMap (converted from Helm Secret)
kubectl get configmap ${STS_NAME}-log-config -n $NAMESPACE
# TLS certificates (secret names depend on cert type)
# Self-signer: {name}-node-secret, {name}-client-secret (regenerated by migration)
kubectl get secret $STS_NAME-node-secret -n $NAMESPACE
kubectl get secret $STS_NAME-client-secret -n $NAMESPACE
# cert-manager: secrets are managed by cert-manager (names from Certificate CR)
kubectl get certificate $STS_NAME-node -n $NAMESPACE -o jsonpath='{.spec.secretName}'
# User-provided: your original secrets are used directly
# CA ConfigMap (created for all TLS cert types)
kubectl get configmap $STS_NAME-ca-crt -n $NAMESPACE
# PCR (if configured)
kubectl get crdbcluster $STS_NAME -n $NAMESPACE \
-o jsonpath='{.spec.template.spec.virtualCluster}'
Step 9. Prepare values.yaml for the CockroachDB Helm chart
Before running helm upgrade, create a values.yaml that matches the migrated CrdbCluster
spec. The CockroachDB Helm chart generates a CrdbCluster from these values, so they must align
with what migration produced.
# Inspect the migrated CrdbCluster spec to extract the values you need
kubectl get crdbcluster $STS_NAME -n $NAMESPACE -o yaml
Option A: Generate values.yaml automatically
./bin/migration-helper export-values \
--crdb-cluster $STS_NAME \
--namespace $NAMESPACE \
--output-dir ./manifests
Review the generated file:
cat ./manifests/values.yaml
If the generated file contains cockroachdb.crdbCluster.rbac.serviceAccount.name, keep
that value as is. It preserves the service account used by the migrated cluster.
Option B: Build values.yaml manually
Key values to set (map each to your migrated spec):
| Helm value | Source from migrated CrdbCluster |
|---|---|
| cockroachdb.crdbCluster.image.name | spec.template.spec.image |
| cockroachdb.crdbCluster.regions | spec.regions (code, nodes, namespace, cloudProvider) |
| cockroachdb.tls.enabled | spec.tlsEnabled |
| cockroachdb.crdbCluster.rbac.serviceAccount.name | spec.template.spec.serviceAccountName (leave empty to use default fullname) |
| cockroachdb.crdbCluster.resources | spec.template.spec.podTemplate.spec.containers[0].resources |
| cockroachdb.crdbCluster.rollingRestartDelay | spec.rollingRestartDelay |
| cockroachdb.crdbCluster.clusterSettings | spec.clusterSettings |
If the source Helm release used custom values (affinity, tolerations, nodeSelector, etc.), carry those forward into the new values.yaml as well.
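To recover those custom values from the source release rather than reconstructing them by hand:

```shell
# Export the user-supplied values from the original Helm release as a starting point.
helm get values $RELEASE_NAME -n $NAMESPACE -o yaml > original-values.yaml
```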
Field Mapping Reference
| Source (StatefulSet) | Migrated v1beta1 shape | Helm adoption value |
|---|---|---|
| StatefulSet name | CrdbCluster.metadata.name | k8s.fullnameOverride |
| Container image | spec.template.spec.image | cockroachdb.crdbCluster.image.name |
| Data PVC template | spec.template.spec.dataStore | cockroachdb.crdbCluster.dataStore |
| --wal-failover=path=... | spec.template.spec.walFailoverSpec | cockroachdb.crdbCluster.walFailoverSpec |
| Dedicated logsdir PVC | spec.template.spec.logsStore | cockroachdb.crdbCluster.log.logsStore |
| Helm log Secret | Converted to ConfigMap; spec.template.spec.loggingConfigMapName | cockroachdb.crdbCluster.loggingConfigMapName |
| Service account | spec.template.spec.podTemplate.spec.serviceAccountName | cockroachdb.crdbCluster.rbac.serviceAccount.name with create=false |
| --locality tier keys | spec.template.spec.localityLabels | Exported into values; patch localityMappings for custom labels |
| Ingress intent | Preserved as annotation on CrdbCluster | cockroachdb.crdbCluster.service.ingress |
| PCR config | spec.template.spec.virtualCluster | cockroachdb.crdbCluster.virtualCluster |
Step 10. Annotate resources for Helm adoption
The migration controller creates resources but does not annotate them for Helm ownership.
For helm upgrade to manage these resources, they must carry Helm ownership annotations.
# Namespace-scoped resources
for RESOURCE in \
"crdbcluster/${STS_NAME}" \
"serviceaccount/${STS_NAME}" \
"service/${STS_NAME}-public" \
"role/${STS_NAME}" \
"rolebinding/${STS_NAME}"; do
kubectl annotate "${RESOURCE}" \
meta.helm.sh/release-name="${RELEASE_NAME}" \
meta.helm.sh/release-namespace="${NAMESPACE}" \
-n "${NAMESPACE}" --overwrite
kubectl label "${RESOURCE}" \
app.kubernetes.io/managed-by=Helm \
-n "${NAMESPACE}" --overwrite
done
# Ingress resources (only if your cluster has Ingress resources)
for RESOURCE in \
"ingress/ui-${RELEASE_NAME}" \
"ingress/sql-${RELEASE_NAME}"; do
kubectl get "${RESOURCE}" -n "${NAMESPACE}" 2>/dev/null && \
kubectl annotate "${RESOURCE}" \
meta.helm.sh/release-name="${RELEASE_NAME}" \
meta.helm.sh/release-namespace="${NAMESPACE}" \
-n "${NAMESPACE}" --overwrite && \
kubectl label "${RESOURCE}" \
app.kubernetes.io/managed-by=Helm \
-n "${NAMESPACE}" --overwrite
done
If skipped, helm upgrade will fail with "resource already exists" errors.
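To confirm the annotations and labels landed before running helm upgrade, a quick check that prints the release name and managed-by label for each adopted resource:

```shell
# Verify Helm ownership metadata on the adopted resources.
for RESOURCE in "crdbcluster/${STS_NAME}" "serviceaccount/${STS_NAME}" "service/${STS_NAME}-public"; do
  echo -n "${RESOURCE}: "
  kubectl get "${RESOURCE}" -n "${NAMESPACE}" \
    -o jsonpath='{.metadata.annotations.meta\.helm\.sh/release-name}{" "}{.metadata.labels.app\.kubernetes\.io/managed-by}{"\n"}'
done
# Each line should end with the release name followed by "Helm"
```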
Do not annotate the migration-created ClusterRole/ClusterRoleBinding for Helm adoption.
The CockroachDB Helm chart creates its own cluster-scoped RBAC with different names
({fullname}-{namespace}-node-reader). The migration-created ones are stale and should
be deleted (see next step).
Step 11. Delete stale resources
The migration controller creates some resources that do not match what the CockroachDB Helm
chart produces. These must be deleted before or after helm upgrade.
# Migration-created ClusterRole and ClusterRoleBinding use "{namespace}-{name}" naming.
# The CockroachDB Helm chart creates its own with "{fullname}-{namespace}-node-reader" naming.
# Delete the migration versions so they don't accumulate as orphans.
kubectl delete clusterrole ${NAMESPACE}-${STS_NAME} --ignore-not-found
kubectl delete clusterrolebinding ${NAMESPACE}-${STS_NAME} --ignore-not-found
# Old ClusterRole and ClusterRoleBinding from the source Helm release (bare cluster name)
kubectl delete clusterrole ${STS_NAME} --ignore-not-found
kubectl delete clusterrolebinding ${STS_NAME} --ignore-not-found
# Old PDB from source Helm chart (operator creates its own "{name}-pdb" format)
kubectl delete pdb ${STS_NAME} -n ${NAMESPACE} --ignore-not-found
kubectl delete pdb ${STS_NAME}-budget -n ${NAMESPACE} --ignore-not-found
Step 12. Configure LocalityMappings
The migration controller preserves the --locality flag tier keys (e.g. region, zone)
as localityLabels on the CrdbNodeSpec. localityLabels is deprecated in favor of
localityMappings, which maps K8s node labels to CockroachDB locality tiers. The default
mapping covers standard K8s topology labels (topology.kubernetes.io/region → region,
topology.kubernetes.io/zone → zone).
If your cluster uses custom K8s node labels for locality, update localityMappings to
match. Each entry maps a K8s node label key to a CockroachDB locality tier name.
kubectl patch crdbcluster $STS_NAME -n $NAMESPACE --type merge -p '
spec:
template:
spec:
localityMappings:
- nodeLabel: "my-custom-label/region"
localityLabel: "region"
- nodeLabel: "my-custom-label/zone"
localityLabel: "zone"
'
If you are using the standard topology.kubernetes.io/* labels, the kubebuilder defaults
are correct and no action is needed.
Step 13. Run Helm upgrade with the CockroachDB chart
Before upgrading, verify the operator has fully reconciled the migrated cluster. Do not
proceed until generation and observedGeneration match and all pods are running.
kubectl get crdbcluster $STS_NAME -n $NAMESPACE \
-o jsonpath='{.metadata.generation} {.status.observedGeneration}'
# Both values should match
kubectl get pods -n $NAMESPACE -l crdb.cockroachlabs.com/cluster=$STS_NAME
# All pods should be Running and Ready
helm upgrade ${RELEASE_NAME} ./cockroachdb-parent/charts/cockroachdb \
--namespace ${NAMESPACE} \
--set migration.enabled=true \
--values ./manifests/values.yaml
Verify the upgrade succeeded and no resources were duplicated:
# CrdbCluster should be managed by Helm now
kubectl get crdbcluster $STS_NAME -n $NAMESPACE -o jsonpath='{.metadata.labels.app\.kubernetes\.io/managed-by}'
# Expected: Helm
# ClusterRole should use the new naming convention
kubectl get clusterrole | grep $STS_NAME
# Expected: {fullname}-{namespace}-node-reader (created by Helm chart)
Use migration.enabled=true on the cockroachdb chart only for migration adoption. It
relaxes pre-upgrade validation so the chart can adopt an existing migrated CrdbCluster
without blocking on CRD checks. This is separate from migration.enabled=true on the
operator chart (set in Step 4), which enables the migration controller itself.
After adoption is verified, consider performing a rolling restart to confirm chart-managed
configuration is fully applied. Post-migration there is no StatefulSet, so trigger a rolling
restart via helm upgrade with an updated timestamp:
helm upgrade ${RELEASE_NAME} ./cockroachdb-parent/charts/cockroachdb \
--namespace ${NAMESPACE} \
--set migration.enabled=true \
--set cockroachdb.crdbCluster.timestamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--values ./manifests/values.yaml
Verify generation matching to confirm the operator has reconciled:
kubectl get crdbcluster $STS_NAME -n $NAMESPACE \
-o jsonpath='{.metadata.generation} {.status.observedGeneration}'
# Both values should match
Update Certificate Mode After Adoption
After migration, all certificate references are stored as externalCertificates on the
CrdbCluster spec, regardless of the original cert type. The export-values tool outputs
these as cockroachdb.tls.externalCertificates. After Helm adoption, update your
values.yaml to use the cert mode that matches your intended ongoing management:
- Self-signer: Switch to cockroachdb.tls.selfSigner.enabled: true and remove the externalCertificates block. The operator's self-signer will manage cert rotation going forward. The migration-created {name}-node-secret and {name}-client-secret secrets become stale once the self-signer creates its own; delete them after verifying the new certs are working.
- cert-manager: Switch to cockroachdb.tls.certManager.enabled: true with the appropriate issuer reference. The migration preserved the Certificate CR and its spec.secretName, so cert-manager continues to manage rotation. Remove the externalCertificates block from values.
- User-provided / External: Keep cockroachdb.tls.externalCertificates as exported. You are responsible for cert rotation.
After updating cert mode, run helm upgrade again and verify pods are healthy. Clean up
any stale secrets that are no longer referenced:
# Check which secrets are actually mounted by pods
kubectl get pod $STS_NAME-0 -n $NAMESPACE -o jsonpath='{.spec.volumes[*].secret.secretName}'
# Delete stale migration-created secrets if they are no longer mounted
kubectl delete secret $STS_NAME-node-secret -n $NAMESPACE --ignore-not-found
kubectl delete secret $STS_NAME-client-secret -n $NAMESPACE --ignore-not-found
Certificate Detection
The controller automatically detects your certificate method. No manual configuration is needed.
- cert-manager: If cert-manager Certificate CRs exist for the cluster ({name}-node), the controller updates the Certificate CR with join service DNS SANs and preserves the issuer references. cert-manager then regenerates the secrets automatically. CrdbNode pods mount from the cert-manager-managed secret (the name from spec.secretName in the Certificate CR), not {name}-node-secret. This keeps cert-manager's automatic rotation working after migration.
- Self-signer (Helm built-in): The controller loads the existing CA from cluster secrets, regenerates node and client certificates with join service DNS SANs, and writes them to the destination secrets ({name}-node-secret, {name}-client-secret). CrdbNode pods mount from {name}-node-secret. The regenerated certificates have a 1-year TTL and are stored as ExternalCertificates in the v1beta1 spec. The operator does not auto-rotate ExternalCertificates, so after migration you should switch to cockroachdb.tls.selfSigner.enabled: true via Helm adoption (Steps 9-13) to enable automatic rotation. If you delay this step, the certs will expire silently after one year.
- Provided certs (tls.certs.provided): The CA private key is not available, so the controller skips cert regeneration. CrdbNode pods mount the user's existing secrets directly (the secret name parsed from the StatefulSet volumes). The user must update their certificates to include join service DNS SANs before starting the migration (see prerequisites).
- Insecure clusters: Detected from the --insecure start flag. The entire cert migration phase is skipped. An InsecureClusterMigration warning event is emitted.
Note: Regardless of the original cert type, after migration all certificate secret names are stored as ExternalCertificates on the CrdbNode spec. This is the internal representation the operator uses to mount the correct secrets into pods.
Controlling Migration (Stop / Resume / Rollback)
Stop (Pause)
Pauses migration at the current phase. No resources are deleted. CrdbNodes already created remain running and serving traffic.
kubectl label sts $STS_NAME crdb.io/migrate=stop --overwrite -n $NAMESPACE
Resume
There is no separate "resume" label value. Use start again to resume from the paused phase.
The controller detects the PhaseStopped state internally and resumes at the correct phase
based on how many nodes have already been migrated.
kubectl label sts $STS_NAME crdb.io/migrate=start --overwrite -n $NAMESPACE
Rollback
Rollback is safe at any phase before Complete. Removing the label triggers automatic rollback.
The controller automatically:
1. Deletes all CrdbNodes and waits for their pods to terminate.
2. Restores StatefulSet to original replica count.
3. Removes migration labels (crdb.cockroachlabs.com/cluster, svc) from pods.
4. Deletes CA ConfigMap ({name}-ca-crt).
5. Deletes the ConfigMap created from the log-config Secret.
6. Deletes all PDB variants ({name}, {name}-budget, {name}-pdb).
7. Restores original service selectors (app.kubernetes.io/instance, app.kubernetes.io/name).
8. Deletes namespace-qualified ClusterRole and ClusterRoleBinding ({namespace}-{name}).
9. Removes the migration label from the StatefulSet.
All cleanup steps are non-fatal. If a resource is already deleted or inaccessible, the controller logs a warning and continues with the remaining cleanup.
kubectl label sts $STS_NAME crdb.io/migrate- -n $NAMESPACE
What happens to data during rollback? Data written during migration is safe. CockroachDB replicates data across nodes. When CrdbNodes are deleted, their data is re-replicated to the remaining STS pods (which are scaled back up). No data is lost as long as the replication factor is maintained.
During rollback, two sets of pods may briefly coexist. The controller scales the STS back up before deleting CrdbNodes, ensuring the cluster always has enough replicas.
What happens to PVCs during rollback? When CrdbNodes are deleted, their PVCs are also deleted. When the StatefulSet scales back up, Kubernetes creates new PVCs. Unlike forward migration (which reuses existing PVCs), rollback creates fresh storage. Data safety is maintained through CockroachDB's built-in replication. Data is re-replicated from the nodes that remain healthy throughout the process.
Rollback safety by phase
| Phase | Rollback safe? | Notes |
|---|---|---|
| Init | Yes | Only a CrdbCluster has been created; no pods affected |
| CertMigration | Yes | Only cert secrets created; no pods affected |
| PodMigration | Yes | Controller deletes CrdbNodes and scales STS back up |
| Finalization | Conditional | Safe if STS still exists. If STS was already deleted, rollback sets PhaseFailed |
| Complete | No | STS has been deleted. Manual recovery required |
Label Values Reference
| Label Value | Who Sets It | Meaning |
|---|---|---|
| start | User | Begin or resume migration |
| stop | User | Pause migration |
| rollback | User | Request rollback (same as removing the label) |
| in-progress | Controller | Migration is actively running |
| stopped | Controller | Migration is paused |
| complete | Controller | Migration finished successfully |
| failed | Controller | Migration hit an unrecoverable error |
| (removed) | User | Triggers rollback |
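To see which value is currently set on the StatefulSet:

```shell
# Read the current migration label value (dots in the label key must be escaped in jsonpath).
kubectl get sts $STS_NAME -n $NAMESPACE \
  -o jsonpath='{.metadata.labels.crdb\.io/migrate}'
```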
Rollback After Complete
Once Phase=Complete, automated rollback is not possible. The StatefulSet has been deleted.
Manual recovery steps:
1. Delete all CrdbNodes
2. Manually recreate the StatefulSet with original spec
3. Scale up and verify cluster health
4. Remove the migration label
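A sketch of those steps; the manifest path and replica count are placeholders for your pre-migration StatefulSet backup and its original size:

```shell
# 1. Delete all CrdbNodes for the cluster.
kubectl delete crdbnode -n $NAMESPACE -l crdb.cockroachlabs.com/cluster=$STS_NAME

# 2. Recreate the StatefulSet from a pre-migration backup of its manifest (placeholder path).
kubectl apply -f original-statefulset.yaml -n $NAMESPACE

# 3. Scale to the original replica count (3 is a placeholder) and verify health.
kubectl scale sts $STS_NAME -n $NAMESPACE --replicas=3
kubectl rollout status sts $STS_NAME -n $NAMESPACE

# 4. Remove the migration label.
kubectl label sts $STS_NAME crdb.io/migrate- -n $NAMESPACE
```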
Troubleshooting
Migration Stuck at PodMigration
If a node fails health checks for more than 10 minutes, migration auto-pauses to PhaseStopped.
# Check why migration stopped
kubectl get crdbcluster $STS_NAME -n $NAMESPACE \
-o jsonpath='{.status.migration.lastError}'
# Check pod status
kubectl describe pod $STS_NAME-<index> -n $NAMESPACE
# Check CrdbNode status
kubectl get crdbnode $STS_NAME-<index> -n $NAMESPACE -o yaml
# Resume after fixing the issue
kubectl label sts $STS_NAME crdb.io/migrate=start --overwrite -n $NAMESPACE
Locality Labels Missing
If CrdbNode pods remain Pending, check for locality label warnings:
kubectl get events -n $NAMESPACE \
--field-selector reason=LocalityLabelsRequired
Apply the required labels to K8s nodes (see Step 2).
Insecure Cluster Migration
The controller detects insecure clusters (--insecure flag) and skips certificate migration.
An InsecureClusterMigration warning event is emitted. No action needed unless you want to
enable TLS post-migration.
Under-replicated Ranges
If health checks report under-replicated ranges, wait for replication to catch up. The controller retries every 10 seconds. If ranges remain under-replicated for 10 minutes, migration pauses.
# Check from any running CockroachDB pod (use -c db during migration, -c cockroachdb after)
kubectl exec $STS_NAME-0 -n $NAMESPACE -c cockroachdb -- \
/cockroach/cockroach sql --certs-dir=/cockroach/cockroach-certs \
-e "SELECT sum((metrics->>'ranges.underreplicated')::INT8) FROM crdb_internal.kv_store_status;"
Status Conflict Errors
Status update conflicts are normal during migration (multiple controllers may update the same object). The controller retries immediately on conflict. If you see repeated conflict errors in logs, verify that no other controller is competing for the CrdbCluster status.
Verifying RBAC
# Check namespace-scoped RBAC
kubectl get role $STS_NAME -n $NAMESPACE
kubectl get rolebinding $STS_NAME -n $NAMESPACE
# Check cluster-scoped RBAC (namespace-qualified names)
kubectl get clusterrole ${NAMESPACE}-${STS_NAME}
kubectl get clusterrolebinding ${NAMESPACE}-${STS_NAME}
# Verify ServiceAccount
kubectl get sa $STS_NAME -n $NAMESPACE