Kubernetes — Advanced: Operators & Beyond

The Operator Pattern

An Operator is a controller that encodes human operational knowledge about a stateful application into Kubernetes-native automation. It watches custom resources, compares desired vs actual state, and reconciles.

The Control Loop (Reconciliation)

Watch → Detect drift → Reconcile → Repeat

Every built-in Kubernetes controller (Deployment, ReplicaSet) runs this loop. Operators extend it with your own resources and logic.

User applies CR → Operator watches → Compares desired state vs actual → Takes action → Updates status
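The loop is level-based. It acts on the current difference between desired and actual state, not on the event that triggered it, which is why it is safe to run again. Below is a minimal in-memory sketch in plain Go. It assumes a toy model where each child object is just a name with a replica count. `state` and `reconcile` are hypothetical names, not controller-runtime APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// state maps a hypothetical child object name to its replica count.
type state map[string]int

// reconcile drives actual toward desired and returns the actions it took.
// It only looks at current state, never at the triggering event, so a second
// call on already-converged state does nothing (the loop is idempotent).
func reconcile(desired, actual state) []string {
	var actions []string
	for name, want := range desired {
		have, exists := actual[name]
		switch {
		case !exists:
			actual[name] = want
			actions = append(actions, "create "+name)
		case have != want:
			actual[name] = want
			actions = append(actions, fmt.Sprintf("scale %s %d->%d", name, have, want))
		}
	}
	for name := range actual {
		if _, wanted := desired[name]; !wanted {
			delete(actual, name) // deleting during range is safe in Go
			actions = append(actions, "delete "+name)
		}
	}
	sort.Strings(actions) // map order is random; sort for stable output
	return actions
}

func main() {
	desired := state{"db-0": 3}
	actual := state{"db-0": 1, "orphan": 1}
	fmt.Println(reconcile(desired, actual)) // [delete orphan scale db-0 1->3]
	fmt.Println(reconcile(desired, actual)) // []
}
```

The second call returns nothing because the first call already converged the state. A real operator relies on that property when it is requeued or receives duplicate events.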

Custom Resource Definitions (CRDs)

CRDs extend the Kubernetes API with your own resource types.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.mycompany.io
spec:
  group: mycompany.io
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
                enum: [postgres, mysql]
              replicas:
                type: integer
                minimum: 1
              storageGB:
                type: integer
          status:
            type: object
            properties:
              phase:
                type: string
              readyReplicas:
                type: integer
    subresources:
      status: {}        # enables /status subresource
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames: [db]

Once the CRD is installed, you use it like any built-in resource:

kubectl get databases
kubectl describe database my-postgres

Custom Resource (CR) instance

apiVersion: mycompany.io/v1
kind: Database
metadata:
  name: my-postgres
  namespace: production
spec:
  engine: postgres
  replicas: 3
  storageGB: 100
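When this CR is applied, the API server checks `spec` against the CRD's `openAPIV3Schema` before storing it. A spec with `engine: mongodb` or `replicas: 0` is rejected. The Go sketch below re-implements just those two constraints by hand to show what the schema buys you. `DatabaseSpec` and `validate` are hypothetical, and the error messages are illustrative, not the API server's exact wording. It needs Go 1.20+ for `errors.Join`.

```go
package main

import (
	"errors"
	"fmt"
)

// DatabaseSpec mirrors the spec section of the Database CRD schema.
type DatabaseSpec struct {
	Engine    string
	Replicas  int
	StorageGB int
}

// validate mirrors the schema's enum on engine and minimum on replicas.
// In a cluster the API server enforces these; no operator code is involved.
func validate(s DatabaseSpec) error {
	var errs []error
	if s.Engine != "postgres" && s.Engine != "mysql" {
		errs = append(errs, fmt.Errorf("spec.engine: unsupported value %q", s.Engine))
	}
	if s.Replicas < 1 {
		errs = append(errs, fmt.Errorf("spec.replicas: must be >= 1, got %d", s.Replicas))
	}
	return errors.Join(errs...) // nil when there are no errors
}

func main() {
	fmt.Println(validate(DatabaseSpec{Engine: "postgres", Replicas: 3, StorageGB: 100})) // <nil>
	fmt.Println(validate(DatabaseSpec{Engine: "mongodb", Replicas: 0}))
}
```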

Building an Operator

Option 1: kubebuilder (recommended, Go)

# Bootstrap
kubebuilder init --domain mycompany.io --repo github.com/mycompany/db-operator
kubebuilder create api --group mycompany --version v1 --kind Database
# (the API group becomes <group>.<domain>, i.e. mycompany.mycompany.io here)

# Generates:
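# api/v1/database_types.go                      Go types for the CRD (spec/status structs)
# internal/controller/database_controller.go    reconcile loop (controllers/ in older project layouts)
# config/crd/                                   CRD manifests (regenerated by `make manifests`)
# config/rbac/                                  RBAC for the operator's ServiceAccount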
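In kubebuilder you normally do not hand-write the CRD YAML. You annotate the Go types in `api/v1/database_types.go` with `+kubebuilder` marker comments, and `make manifests` (controller-gen) generates the schema. The sketch below shows types that would produce roughly the CRD above. It omits `metav1.TypeMeta`/`metav1.ObjectMeta`, which the real scaffold embeds, so that it builds with the standard library alone. The `json` tags are what map Go fields to the CRD's field names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DatabaseSpec defines the desired state of Database.
type DatabaseSpec struct {
	// +kubebuilder:validation:Enum=postgres;mysql
	Engine string `json:"engine"`

	// +kubebuilder:validation:Minimum=1
	Replicas int32 `json:"replicas"`

	// +optional
	StorageGB int32 `json:"storageGB,omitempty"`
}

// DatabaseStatus defines the observed state of Database.
type DatabaseStatus struct {
	Phase         string `json:"phase,omitempty"`
	ReadyReplicas int32  `json:"readyReplicas,omitempty"`
}

// Database is the root type. The real scaffold also embeds metav1.TypeMeta
// and metav1.ObjectMeta here; they are left out of this stdlib-only sketch.
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:shortName=db
type Database struct {
	Spec   DatabaseSpec   `json:"spec,omitempty"`
	Status DatabaseStatus `json:"status,omitempty"`
}

// specJSON shows the wire form of a spec, i.e. the field names the CRD uses.
func specJSON(s DatabaseSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(specJSON(DatabaseSpec{Engine: "postgres", Replicas: 3, StorageGB: 100}))
	// {"engine":"postgres","replicas":3,"storageGB":100}
}
```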

Option 2: Operator SDK (supports Go, Ansible, Helm)

operator-sdk init --domain mycompany.io --repo github.com/mycompany/db-operator
operator-sdk create api --group mycompany --version v1 --kind Database --resource --controller

The Reconcile loop (Go)

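import (
    "context"
    "time"

    appsv1 "k8s.io/api/apps/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    mycompanyv1 "github.com/mycompany/db-operator/api/v1"
)

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    // 1. Fetch the CR; NotFound means it was deleted, so there is nothing to do
    db := &mycompanyv1.Database{}
    if err := r.Get(ctx, req.NamespacedName, db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Compute desired state (buildStatefulSet is your own helper) and make
    //    the Database its controller so the StatefulSet is GC'd along with it
    desired := buildStatefulSet(db)
    if err := ctrl.SetControllerReference(db, desired, r.Scheme); err != nil {
        return ctrl.Result{}, err
    }

    // 3. Fetch actual state
    actual := &appsv1.StatefulSet{}
    err := r.Get(ctx, client.ObjectKeyFromObject(desired), actual)
    switch {
    case apierrors.IsNotFound(err):
        // 4a. Create if missing
        logger.Info("creating StatefulSet", "name", desired.Name)
        if err := r.Create(ctx, desired); err != nil {
            return ctrl.Result{}, err
        }
        actual = desired
    case err != nil:
        // Any other error: return it so the request is retried with backoff
        return ctrl.Result{}, err
    case *actual.Spec.Replicas != *desired.Spec.Replicas:
        // 4b. Update only the mutable fields that drifted; copying the whole Spec
        //     clobbers API-server defaults and fails if an immutable field differs
        actual.Spec.Replicas = desired.Spec.Replicas
        if err := r.Update(ctx, actual); err != nil {
            return ctrl.Result{}, err
        }
    }

    // 5. Report observed (not assumed) state through the /status subresource
    //    (assumes DatabaseStatus.ReadyReplicas is an int32)
    db.Status.ReadyReplicas = actual.Status.ReadyReplicas
    db.Status.Phase = "Pending"
    if db.Status.ReadyReplicas == *desired.Spec.Replicas {
        db.Status.Phase = "Running"
    }
    if err := r.Status().Update(ctx, db); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}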

Key points:

Idempotent — reconcile can be called any number of times
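Return ctrl.Result{} with a nil error when done (the object is reconciled again on its next watch event), ctrl.Result{RequeueAfter: ...} to re-check after a delay, or a non-nil error to retry with exponential backoff
IgnoreNotFound: the CR is already gone, so return nil; owned objects are garbage-collected through owner references, and external resources need a finalizer so they are cleaned up before deletion completes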
ctrl.SetControllerReference — tie child resources to parent (owner references → GC)

Owner References (cascading delete)

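// Call before Create; it returns an error, e.g. if the child already has a different controller
if err := ctrl.SetControllerReference(db, statefulSet, r.Scheme); err != nil {
    return ctrl.Result{}, err
}
// When the Database CR is deleted, the StatefulSet is automatically GC'd.
// Owner and child must be in the same namespace (or the owner must be cluster-scoped).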
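The garbage collector's behaviour can be modelled in a few lines. In this sketch each object has at most one owner, identified by UID. Any object whose owner no longer exists is deleted. The scan repeats so that grandchildren (the StatefulSet's Pods) go too. This is a simplified model of background cascading deletion, assuming single-owner objects. `object` and `collectGarbage` are hypothetical names, and the real collector also handles multiple owners and foreground deletion.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a hypothetical stand-in for an API object; OwnerUID plays the role
// of metadata.ownerReferences (empty means no owner).
type object struct {
	Name     string
	OwnerUID string
}

// collectGarbage deletes every object whose owner UID no longer exists and
// repeats until nothing changes, so whole ownership chains are removed.
// objs is keyed by UID. Returns the deleted names, sorted.
func collectGarbage(objs map[string]object) []string {
	var deleted []string
	for changed := true; changed; {
		changed = false
		for uid, o := range objs {
			if o.OwnerUID == "" {
				continue
			}
			if _, ownerExists := objs[o.OwnerUID]; !ownerExists {
				delete(objs, uid)
				deleted = append(deleted, o.Name)
				changed = true
			}
		}
	}
	sort.Strings(deleted)
	return deleted
}

func main() {
	objs := map[string]object{
		"u1": {Name: "database/my-postgres"},
		"u2": {Name: "statefulset/my-postgres", OwnerUID: "u1"},
		"u3": {Name: "pod/my-postgres-0", OwnerUID: "u2"},
	}
	delete(objs, "u1")                // the user deletes the Database CR
	fmt.Println(collectGarbage(objs)) // [pod/my-postgres-0 statefulset/my-postgres]
}
```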

Well-Known Operators

Operator	What it manages
cert-manager	TLS certs via `Certificate` CR; integrates Let's Encrypt, Vault
prometheus-operator	`ServiceMonitor`, `PrometheusRule` CRs — no manual scrape config editing
postgres-operator (Zalando)	HA Postgres clusters with failover, backups, users
Strimzi	Apache Kafka clusters on k8s
ArgoCD	GitOps — `Application` CRs sync Git repos to cluster state
Flux	GitOps — `HelmRelease`, `Kustomization` CRs
Velero	Cluster backup/restore
KEDA	Event-driven autoscaling — scale on queue depth, cron, custom metrics
Crossplane	Infrastructure as CRs — provision cloud resources (RDS, S3) from k8s

Horizontal Pod Autoscaler (HPA)

Scales the replica count of a Deployment, StatefulSet, or any other resource with a scale subresource, based on CPU, memory, custom, or external metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 200Mi
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # scale down only to the highest recommendation of the last 5 min
    scaleUp:
      stabilizationWindowSeconds: 30

kubectl get hpa
kubectl describe hpa my-app-hpa

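Requires metrics-server in the cluster for CPU/memory metrics; custom and external metrics need a metrics adapter (e.g. prometheus-adapter). CPU averageUtilization is a percentage of the containers' CPU requests, so the target pods must set requests.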
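The Kubernetes docs give the core HPA rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The rule is evaluated per metric, the largest proposal wins, and the result is clamped to min/max. The sketch below applies that rule to the two metrics in the example above. It is simplified and assumes the default 0.1 tolerance. It ignores stabilization windows, `behavior` policies, and unready or missing pods, which the real controller accounts for. `metric` and `desiredReplicas` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance is the default of kube-controller-manager's
// --horizontal-pod-autoscaler-tolerance: ratios within 10% of 1.0 are ignored.
const tolerance = 0.1

// metric is one spec.metrics entry: current per-pod average vs its target,
// e.g. 140 vs 70 (CPU utilization, %) or 150 vs 200 (memory, Mi).
type metric struct{ current, target float64 }

// desiredReplicas applies ceil(current * currentValue / targetValue) per metric,
// keeps the largest proposal, then clamps the result to [minR, maxR].
func desiredReplicas(current int, metrics []metric, minR, maxR int) int {
	desired := 0
	for _, m := range metrics {
		ratio := m.current / m.target
		proposal := current
		if math.Abs(ratio-1.0) > tolerance {
			proposal = int(math.Ceil(float64(current) * ratio))
		}
		if proposal > desired {
			desired = proposal
		}
	}
	if desired < minR {
		desired = minR
	}
	if desired > maxR {
		desired = maxR
	}
	return desired
}

func main() {
	// 4 pods at 140% CPU (target 70%) and 150Mi memory (target 200Mi):
	// CPU proposes 8, memory proposes 3, so the HPA picks 8.
	fmt.Println(desiredReplicas(4, []metric{{140, 70}, {150, 200}}, 2, 20)) // 8
}
```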

KEDA (Kubernetes Event-Driven Autoscaling)

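Builds on HPA: KEDA exposes event sources (queue depth, Kafka lag, cron, custom metrics) as external metrics to an HPA it manages, and handles scaling between 0 and 1 replicas itself, so workloads can scale to zero.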

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 0      # scale to zero!
  maxReplicaCount: 50
  triggers:
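  - type: rabbitmq
    metadata:
      hostFromEnv: RABBITMQ_URL   # env var (name is an example) on the target container holding the AMQP URL
      queueName: jobs
      mode: QueueLength           # replaces the deprecated queueLength field
      value: "5"                  # target ~5 messages per replica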
  - type: cron
    metadata:
      timezone: Europe/London
      start: "0 8 * * 1-5"   # scale up Mon-Fri 8am
      end: "0 18 * * 1-5"
      desiredReplicas: "5"
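The replica math behind this ScaledObject is roughly the following. The queue trigger proposes ceil(messages / 5) and the cron trigger proposes 5 inside its window. The largest proposal wins, and when no trigger is active KEDA drops the workload to `minReplicaCount`. This sketch simplifies that behaviour. It assumes the default activation threshold of 0 and ignores HPA tolerance and stabilization. All names are hypothetical.

```go
package main

import "fmt"

// replicasForQueue is ceil(messages / perReplica): the AverageValue math the
// HPA applies to the queue-length metric that KEDA exposes.
func replicasForQueue(messages, perReplica int) int {
	return (messages + perReplica - 1) / perReplica
}

// desiredReplicas combines a queue trigger and a cron trigger.
func desiredReplicas(messages, perReplica int, cronActive bool, cronReplicas, minR, maxR int) int {
	want := 0
	if messages > 0 {
		want = replicasForQueue(messages, perReplica)
	}
	if cronActive && cronReplicas > want {
		want = cronReplicas // with several triggers, the largest proposal wins
	}
	if want == 0 {
		return minR // no trigger active: scale to minReplicaCount (0 here)
	}
	if want > maxR {
		want = maxR
	}
	if want < minR {
		want = minR
	}
	return want
}

func main() {
	fmt.Println(desiredReplicas(0, 5, false, 5, 0, 50))    // 0: idle, scaled to zero
	fmt.Println(desiredReplicas(12, 5, false, 5, 0, 50))   // 3
	fmt.Println(desiredReplicas(12, 5, true, 5, 0, 50))    // 5: cron window sets a floor
	fmt.Println(desiredReplicas(1000, 5, false, 5, 0, 50)) // 50: capped by maxReplicaCount
}
```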

Vertical Pod Autoscaler (VPA)

Recommends (or auto-applies) right-sized CPU/memory requests.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"     # Off | Initial | Recreate | Auto
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi

Off — recommendations only (read with kubectl describe vpa).
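Auto: evicts pods and recreates them with the new requests (currently equivalent to Recreate). Be careful with single-replica or slow-to-restart workloads, which are often stateful, and protect them with a PDB.
Don't let VPA (in Auto/Recreate mode) and HPA act on the same CPU or memory metric, because they fight each other. Either run VPA in Off mode for recommendations only, or have HPA scale on custom/external metrics while VPA manages requests.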
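`minAllowed`/`maxAllowed` in `containerPolicies` bound whatever VPA recommends, so a container is never sized below or above that window. A minimal sketch of that capping follows. It assumes CPU in millicores and memory in MiB (2Gi = 2048Mi). `resources` and `boundRecommendation` are hypothetical names, and the real recommender computes the raw value from usage histograms before capping.

```go
package main

import "fmt"

// resources is a hypothetical request pair: CPU in millicores, memory in MiB.
type resources struct {
	cpuMilli int64
	memMi    int64
}

func clamp(v, lo, hi int64) int64 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// boundRecommendation caps a raw recommendation to the containerPolicies window.
func boundRecommendation(raw, minAllowed, maxAllowed resources) resources {
	return resources{
		cpuMilli: clamp(raw.cpuMilli, minAllowed.cpuMilli, maxAllowed.cpuMilli),
		memMi:    clamp(raw.memMi, minAllowed.memMi, maxAllowed.memMi),
	}
}

func main() {
	minAllowed := resources{cpuMilli: 100, memMi: 50}    // cpu: 100m, memory: 50Mi
	maxAllowed := resources{cpuMilli: 2000, memMi: 2048} // cpu: 2, memory: 2Gi
	fmt.Println(boundRecommendation(resources{cpuMilli: 25, memMi: 3000}, minAllowed, maxAllowed))  // {100 2048}
	fmt.Println(boundRecommendation(resources{cpuMilli: 350, memMi: 300}, minAllowed, maxAllowed)) // {350 300}
}
```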

Pod Disruption Budgets (PDB)

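Limits how many of an application's pods can be down at once due to voluntary disruptions that go through the Eviction API (kubectl drain, cluster-autoscaler scale-down). It does not cover involuntary failures, direct pod deletes, or Deployment/StatefulSet rolling updates, which follow their own maxUnavailable/maxSurge settings.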

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2         # or maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app

kubectl get pdb
kubectl drain <node> --ignore-daemonsets   # respects PDBs — blocks if it would violate
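The PDB's `status.disruptionsAllowed` is what drain checks. It is currentHealthy minus desiredHealthy, floored at zero. desiredHealthy is either `minAvailable` or expectedPods − `maxUnavailable`. The sketch below covers integer budgets only (no percentages) and uses a hypothetical `-1` sentinel for whichever field is unset.

```go
package main

import "fmt"

const unset = -1 // marks whichever of minAvailable / maxUnavailable is not used

// disruptionsAllowed returns how many more pods the Eviction API may remove now.
func disruptionsAllowed(expectedPods, currentHealthy, minAvailable, maxUnavailable int) int {
	desiredHealthy := minAvailable
	if maxUnavailable != unset {
		desiredHealthy = expectedPods - maxUnavailable
	}
	if allowed := currentHealthy - desiredHealthy; allowed > 0 {
		return allowed
	}
	return 0
}

func main() {
	// my-app-pdb above: minAvailable 2, three replicas
	fmt.Println(disruptionsAllowed(3, 3, 2, unset)) // 1: drain may evict one pod
	fmt.Println(disruptionsAllowed(3, 2, 2, unset)) // 0: evictions refused until a pod is ready again
}
```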

Priority Classes

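Sets pod priority: higher-priority pods are scheduled first, can preempt lower-priority pods when no node has room, and rank later for eviction when a node runs low on resources.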

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production workloads"

# In pod spec
priorityClassName: high-priority

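Higher value = scheduled earlier, preempted and evicted later. Built-in classes sit above one billion (system-cluster-critical = 2000000000, system-node-critical = 2000001000); user-defined PriorityClasses are capped at 1000000000.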
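Under node memory pressure, the Kubernetes docs describe the kubelet's eviction ranking in three steps. Pods whose usage exceeds their requests go first. Within that, lower priority goes first. Finally, pods with a larger overshoot above their request go first. The sketch below encodes that ordering for memory only. `pod` and `evictionOrder` are hypothetical names, and the real kubelet works per eviction signal and has more special cases.

```go
package main

import (
	"fmt"
	"sort"
)

type pod struct {
	name      string
	priority  int32 // value from the pod's PriorityClass
	usageMi   int64 // current memory usage
	requestMi int64 // memory request
}

// evictionOrder returns pod names in the order they would be evicted.
func evictionOrder(pods []pod) []string {
	ranked := append([]pod(nil), pods...)
	sort.SliceStable(ranked, func(i, j int) bool {
		a, b := ranked[i], ranked[j]
		aOver, bOver := a.usageMi > a.requestMi, b.usageMi > b.requestMi
		if aOver != bOver {
			return aOver // pods above their request go first
		}
		if a.priority != b.priority {
			return a.priority < b.priority // then lower priority first
		}
		return a.usageMi-a.requestMi > b.usageMi-b.requestMi // then bigger overshoot
	})
	names := make([]string, len(ranked))
	for i, p := range ranked {
		names[i] = p.name
	}
	return names
}

func main() {
	fmt.Println(evictionOrder([]pod{
		{"db", 1000000, 400, 1024}, // high priority, under its request
		{"web", 1000000, 900, 512}, // high priority, over its request
		{"batch", 0, 600, 256},     // default priority, over its request
	})) // [batch web db]
}
```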

Admission Webhooks

Intercept API requests after authentication and authorization, before the object is persisted. Mutating webhooks run before validating ones. Two types:

Type	Can modify?	Can reject?	Use for
MutatingAdmissionWebhook	Yes	Yes	Inject sidecars, set defaults
ValidatingAdmissionWebhook	No	Yes	Policy enforcement

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: sidecar-injector
webhooks:
- name: inject.mycompany.io
  clientConfig:
    service:
      name: sidecar-injector-svc
      namespace: kube-system
      path: /mutate
    caBundle: <base64-ca-cert>
  rules:
  - operations: ["CREATE"]
    apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]
  namespaceSelector:
    matchLabels:
      injection: enabled
  failurePolicy: Fail    # Fail | Ignore
  admissionReviewVersions: ["v1"]
  sideEffects: None

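The webhook receives an AdmissionReview JSON object and must return an AdmissionReview whose response echoes the request's uid and sets allowed: true/false. A mutating webhook may also return a base64-encoded JSON patch (patchType: JSONPatch).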
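A minimal Go sketch of the mutating side follows. It decodes the AdmissionReview, allows the request, echoes the `uid`, and appends a sidecar with an RFC 6902 JSON patch. It defines a hand-written subset of the `admission.k8s.io/v1` types instead of importing `k8s.io/api/admission/v1`. The sidecar name `proxy` and the image `example.com/proxy:1.0` are hypothetical. `main` only prints a sample response. Serving the handler requires HTTPS with a certificate the API server trusts, since the `caBundle` must match.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Hand-written subset of admission.k8s.io/v1 AdmissionReview.
type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

type admissionRequest struct {
	UID string `json:"uid"` // the submitted pod arrives in request.object (unused here)
}

type admissionResponse struct {
	UID       string `json:"uid"`
	Allowed   bool   `json:"allowed"`
	PatchType string `json:"patchType,omitempty"`
	Patch     []byte `json:"patch,omitempty"` // encoding/json base64-encodes []byte
}

type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value any    `json:"value,omitempty"`
}

// respond allows the request and appends a hypothetical "proxy" sidecar.
func respond(in admissionReview) admissionReview {
	patch, _ := json.Marshal([]patchOp{{ // cannot fail for these plain types
		Op:    "add",
		Path:  "/spec/containers/-",
		Value: map[string]string{"name": "proxy", "image": "example.com/proxy:1.0"},
	}})
	return admissionReview{
		APIVersion: in.APIVersion,
		Kind:       in.Kind,
		Response: &admissionResponse{
			UID:       in.Request.UID, // must echo the request's uid
			Allowed:   true,
			PatchType: "JSONPatch",
			Patch:     patch,
		},
	}
}

// serveMutate would be registered with http.HandleFunc("/mutate", serveMutate)
// and served via http.ListenAndServeTLS.
func serveMutate(w http.ResponseWriter, r *http.Request) {
	var in admissionReview
	if err := json.NewDecoder(r.Body).Decode(&in); err != nil || in.Request == nil {
		http.Error(w, "malformed AdmissionReview", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(respond(in))
}

func main() {
	out := respond(admissionReview{APIVersion: "admission.k8s.io/v1", Kind: "AdmissionReview",
		Request: &admissionRequest{UID: "abc-123"}})
	b, _ := json.Marshal(out)
	fmt.Println(string(b))
}
```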

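cert-manager can issue and auto-rotate the webhook's TLS serving certificate, and its CA injector fills in caBundle for you (annotate the webhook configuration with cert-manager.io/inject-ca-from: <namespace>/<certificate-name>). Recommended.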

Service Mesh (Istio / Linkerd)

A service mesh adds a sidecar proxy to every pod (Envoy for Istio, linkerd-proxy for Linkerd). The control plane manages proxy config; you get:

Feature	How
mTLS between pods	Automatic cert rotation per service identity
Traffic splitting	`VirtualService` weight routing (canary, A/B)
Retry / timeout / circuit breaker	Per-route policy, no code changes
Observability	Automatic metrics, traces, access logs per request
Rate limiting	`EnvoyFilter` for local limits, or an external rate-limit service (e.g. envoyproxy/ratelimit) for global limits

Istio — key CRDs

# VirtualService — traffic routing
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts: [my-app]
  http:
  - match:
    - headers:
        x-canary: { exact: "true" }
    route:
    - destination: { host: my-app, subset: v2 }
  - route:
    - destination: { host: my-app, subset: v1 }
      weight: 90
    - destination: { host: my-app, subset: v2 }
      weight: 10

# DestinationRule — defines subsets
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-app
spec:
  host: my-app
  subsets:
  - name: v1
    labels: { version: v1 }
  - name: v2
    labels: { version: v2 }
  trafficPolicy:
    connectionPool:
      tcp: { maxConnections: 100 }
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s

GitOps with ArgoCD

ArgoCD watches a Git repo and syncs it to the cluster. Drift is detected and can be auto-corrected.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/mycompany/my-app
    targetRevision: main
    path: deploy/k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true       # delete resources removed from Git
      selfHeal: true    # revert manual changes to cluster
    syncOptions:
    - CreateNamespace=true

argocd app list
argocd app sync my-app
argocd app diff my-app
argocd app history my-app
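Conceptually, a sync compares the manifests rendered from Git with the live objects. Objects missing from the cluster are created and drifted ones are updated. With `prune: true`, live objects that this Application tracks (via a tracking label or annotation) but that are no longer in Git are deleted. Objects Argo CD does not manage are never touched. The sketch below is a toy model that assumes objects can be compared by a spec hash. Argo CD actually diffs normalized manifests, and all names here are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// liveObject is a hypothetical view of a cluster object: a hash of its spec and
// the Argo CD application tracking it ("" if not managed by Argo CD).
type liveObject struct {
	specHash string
	app      string
}

type syncPlan struct {
	create, update, prune []string
}

// planSync diffs Git (name -> spec hash) against live objects for one app.
func planSync(appName string, git map[string]string, live map[string]liveObject, prune bool) syncPlan {
	var p syncPlan
	for name, want := range git {
		obj, ok := live[name]
		switch {
		case !ok:
			p.create = append(p.create, name)
		case obj.specHash != want:
			p.update = append(p.update, name) // drift: Git changed or live was edited
		}
	}
	if prune {
		for name, obj := range live {
			if _, inGit := git[name]; !inGit && obj.app == appName {
				p.prune = append(p.prune, name)
			}
		}
	}
	sort.Strings(p.create)
	sort.Strings(p.update)
	sort.Strings(p.prune)
	return p
}

func main() {
	git := map[string]string{"deploy/my-app": "h2", "svc/my-app": "h1"}
	live := map[string]liveObject{
		"deploy/my-app": {"h1", "my-app"}, // live spec differs from Git
		"cm/old-config": {"h9", "my-app"}, // removed from Git
		"cm/unmanaged":  {"h5", ""},       // not tracked by Argo CD
	}
	fmt.Printf("%+v\n", planSync("my-app", git, live, true))
	// {create:[svc/my-app] update:[deploy/my-app] prune:[cm/old-config]}
}
```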

Multi-tenancy Patterns

Pattern	Isolation level	Tool
Namespace per team	Soft — shared API server	NetworkPolicy + RBAC + ResourceQuota
vCluster	Medium — virtual cluster per tenant	vcluster (Loft Labs)
Separate clusters	Hard — full isolation	Cluster API, EKS, GKE

Hierarchical Namespaces (HNC)

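# Create a child namespace (needs the HNC controller + kubectl-hns plugin).
# Roles/RoleBindings propagate from the parent by default; other kinds (e.g. NetworkPolicy) only if configured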
kubectl hns create staging --namespace production

Cluster API (CAPI)

Manage cluster lifecycle (create, upgrade, delete) using Kubernetes CRs — clusters as code.

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: prod-cluster
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
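  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: prod-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSCluster
    name: prod-aws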

Infrastructure providers: AWS, GCP, Azure, vSphere, OpenStack.

Useful Advanced kubectl

# Force-replace (delete + create — breaks connections)
kubectl replace --force -f manifest.yaml

# Server-side apply (tracks field ownership)
kubectl apply --server-side -f manifest.yaml

# Strategic merge patch
kubectl patch deployment my-app -p '{"spec":{"replicas":5}}'

# JSON patch
kubectl patch pod my-pod --type='json' \
  -p='[{"op":"replace","path":"/spec/containers/0/image","value":"nginx:1.26"}]'

# Get with go-template
kubectl get pods -o go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}'

# Wait for condition
kubectl wait --for=condition=Ready pod -l app=my-app --timeout=120s
kubectl wait --for=condition=complete job/my-job --timeout=300s

# Debug with ephemeral container (k8s 1.23+)
kubectl debug -it my-pod --image=busybox --target=app

# Debug a copy of a running pod (original untouched), adding a busybox container
kubectl debug my-pod -it --image=busybox --share-processes --copy-to=my-pod-debug

# Check RBAC
kubectl auth can-i create deployments --as=system:serviceaccount:default:my-sa -n production
kubectl auth whoami
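`-o go-template` templates use the syntax of Go's `text/template` package, applied to the object's JSON form. You can therefore prototype a template locally with `text/template` against a map shaped like `kubectl get pods -o json` output. In this sketch the pod list is trimmed to the one field the template reads, and `render` is a hypothetical helper.

```go
package main

import (
	"os"
	"strings"
	"text/template"
)

// render executes a kubectl-style go-template against arbitrary data.
func render(tmpl string, data any) (string, error) {
	t, err := template.New("kubectl").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Shape of `kubectl get pods -o json`, trimmed to metadata.name.
	podList := map[string]any{
		"items": []any{
			map[string]any{"metadata": map[string]any{"name": "my-app-1"}},
			map[string]any{"metadata": map[string]any{"name": "my-app-2"}},
		},
	}
	out, err := render(`{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}`, podList)
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out) // my-app-1 and my-app-2, one per line
}
```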

Security Hardening

Pod Security Standards (enforced by the built-in Pod Security Admission controller; PodSecurityPolicy was removed in k8s 1.25)

# Enforce restricted standard on a namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

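Three levels, from least to most restrictive: privileged → baseline → restricted. The label mode decides what happens on a violation: enforce rejects the pod, audit records it in the audit log, warn returns a warning to the client.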

Secure pod spec

spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - mountPath: /tmp
      name: tmp-dir     # writable scratch space, since the root filesystem is read-only
  volumes:
  - name: tmp-dir
    emptyDir: {}
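Four of the restricted-profile rules map directly onto the fields in this spec: runAsNonRoot true, allowPrivilegeEscalation false, capabilities.drop containing ALL, and a RuntimeDefault or Localhost seccomp profile. The Go sketch below checks just those four against a container's effective settings, meaning pod-level values merged into the container. It covers only a subset of the profile, which also includes every baseline check and volume-type limits. `securityContext` and `restrictedViolations` are hypothetical names, not the admission controller's code.

```go
package main

import "fmt"

// securityContext is a hypothetical flattened view of one container's
// effective settings (pod-level values merged with container-level ones).
type securityContext struct {
	RunAsNonRoot             *bool
	AllowPrivilegeEscalation *bool
	DropCapabilities         []string
	SeccompProfileType       string
}

func boolPtr(b bool) *bool { return &b }

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// restrictedViolations reports which of the four checked rules are broken.
func restrictedViolations(sc securityContext) []string {
	var v []string
	if sc.RunAsNonRoot == nil || !*sc.RunAsNonRoot {
		v = append(v, "runAsNonRoot must be true")
	}
	if sc.AllowPrivilegeEscalation == nil || *sc.AllowPrivilegeEscalation {
		v = append(v, "allowPrivilegeEscalation must be false")
	}
	if !contains(sc.DropCapabilities, "ALL") {
		v = append(v, `capabilities.drop must include "ALL"`)
	}
	if sc.SeccompProfileType != "RuntimeDefault" && sc.SeccompProfileType != "Localhost" {
		v = append(v, "seccompProfile.type must be RuntimeDefault or Localhost")
	}
	return v
}

func main() {
	// The "app" container from the secure pod spec above
	app := securityContext{
		RunAsNonRoot:             boolPtr(true), // inherited from the pod level
		AllowPrivilegeEscalation: boolPtr(false),
		DropCapabilities:         []string{"ALL"},
		SeccompProfileType:       "RuntimeDefault", // inherited from the pod level
	}
	fmt.Println(len(restrictedViolations(app)))         // 0
	fmt.Println(restrictedViolations(securityContext{})) // all four violations
}
```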

Image scanning

# Trivy (most common)
trivy image nginx:1.25
trivy k8s --report summary cluster    # scan whole cluster

etcd — What's Under the Hood

All cluster state lives in etcd (distributed key-value store). Nodes, pods, secrets, CRDs — everything.

# Direct etcd read (from control plane node)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  get /registry/pods/default --prefix --keys-only

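# Backup (online; needs the same TLS flags as above)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  snapshot save backup.db

# Restore (offline, into a fresh data dir; etcd 3.5+ moves this to etcdutl)
etcdutl snapshot restore backup.db --data-dir /var/lib/etcd-restore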

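Secrets are stored in etcd only base64-encoded by default. Enable encryption at rest (an EncryptionConfiguration file passed to kube-apiserver via --encryption-provider-config) unless secret values never enter the cluster as Secret objects.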

Summary — Complexity Ladder

Pods + Deployments + Services          ← baseline cheat sheet
    ↓
CRDs + Operators                       ← extend the API; encode operational knowledge
    ↓
HPA / VPA / KEDA + PDB                ← autoscaling + resilience
    ↓
Admission Webhooks                     ← policy enforcement + defaults injection
    ↓
Service Mesh (Istio/Linkerd)           ← L7 observability + mTLS + traffic control
    ↓
GitOps (ArgoCD/Flux)                   ← cluster state as Git truth
    ↓
Multi-cluster / Cluster API            ← fleet management