
Installation

Follow this guide to install the Node Readiness Controller in your Kubernetes cluster.

Prerequisites

If you plan to use the install-full.yaml option (which includes secure metrics and the validating admission webhook), you must first have cert-manager installed in your cluster.
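If cert-manager is not already present, it can be installed from its official release manifest. A minimal sketch follows; the helper names are ours, and the version tag in the usage line is an example rather than a pinned recommendation:

```shell
#!/bin/sh
# Sketch: install cert-manager before applying install-full.yaml.
# The release tag you pass in should be one you have validated.

cert_manager_url() {
  # Build the manifest URL for a given cert-manager release tag.
  echo "https://github.com/cert-manager/cert-manager/releases/download/$1/cert-manager.yaml"
}

install_cert_manager() {
  kubectl apply -f "$(cert_manager_url "$1")"
  # Wait for the webhook to become Available before installing the
  # controller's full manifest, which depends on it for certificates.
  kubectl wait --for=condition=Available --timeout=120s \
    -n cert-manager deployment/cert-manager-webhook
}

# Usage: install_cert_manager v1.14.4
```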

Deployment Options

Option 1: Standard Deployment (Release Manifests)

1. Install the CRDs

First, apply the crds.yaml manifest:

# Replace with the desired version
VERSION=v0.1.1
kubectl apply -f https://github.com/kubernetes-sigs/node-readiness-controller/releases/download/${VERSION}/crds.yaml
kubectl wait --for condition=established --timeout=30s crd/nodereadinessrules.readiness.node.x-k8s.io

2. Install the Controller

Choose one of the two following manifests based on your requirements:

| Manifest | Contents | Prerequisites |
|---|---|---|
| install.yaml | Core Controller | None |
| install-full.yaml | Core Controller + Metrics (Secure) + Validation Webhook | cert-manager |

Standard Installation (Minimal): The simplest way to deploy the controller with no external dependencies.

kubectl apply -f https://github.com/kubernetes-sigs/node-readiness-controller/releases/download/${VERSION}/install.yaml

Full Installation (Production Ready): Includes secure metrics (TLS-protected) and validating webhooks for rule conflict prevention. Requires cert-manager to be installed in your cluster.

kubectl apply -f https://github.com/kubernetes-sigs/node-readiness-controller/releases/download/${VERSION}/install-full.yaml

This will deploy the controller into the nrr-system namespace on any available node in your cluster.

Controller priority

The controller is deployed with system-cluster-critical priority to prevent eviction during node resource pressure.

If it gets evicted during resource pressure, nodes can’t transition to Ready state, blocking all workload scheduling cluster-wide.

This is the same priority class used by other critical cluster components (e.g., CoreDNS).
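In a pod spec, this priority class appears as a single field. The excerpt below is an illustrative sketch, not the controller's actual manifest; the pod name and image tag are examples:

```yaml
# Sketch: where system-cluster-critical appears in a pod spec.
apiVersion: v1
kind: Pod
metadata:
  name: nrr-controller-manager   # illustrative name
  namespace: nrr-system
spec:
  priorityClassName: system-cluster-critical
  containers:
    - name: manager
      image: registry.k8s.io/node-readiness-controller/node-readiness-controller:v0.1.1
```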

Images

The official releases use multi-arch images (AMD64, ARM64) and are available at registry.k8s.io/node-readiness-controller/node-readiness-controller.

REPO="registry.k8s.io/node-readiness-controller/node-readiness-controller"
TAG=$(skopeo list-tags docker://$REPO | jq -r '.Tags[-1]')
docker pull $REPO:$TAG
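For reproducible deployments you may prefer to pin the image by digest rather than by a mutable tag. A sketch, reusing REPO and TAG from above; the helper function is ours:

```shell
#!/bin/sh
# Sketch: pin the controller image by digest instead of tag.

image_by_digest() {
  # $1 = repository, $2 = digest (sha256:...)
  echo "$1@$2"
}

# Resolve the digest for the tag, then pull the pinned reference:
# DIGEST=$(skopeo inspect --format '{{.Digest}}' "docker://${REPO}:${TAG}")
# docker pull "$(image_by_digest "$REPO" "$DIGEST")"
```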

Option 2: Advanced Deployment (Kustomize)

If you need deeper customization, you can use Kustomize directly from the source.

# 1. Install CRDs
kubectl apply -k config/crd

# 2. Deploy Controller with default configuration
kubectl apply -k config/default

You can enable optional components (Metrics, TLS, Webhook) by creating a kustomization.yaml that includes the relevant components from the config/ directory. For reference on how these components can be combined, see the deploy-with-metrics, deploy-with-tls, deploy-with-webhook, and deploy-full targets in the project's Makefile.
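A kustomization.yaml combining the base with optional components might look like the sketch below. The component paths are assumptions; verify them against the deploy-* targets in the Makefile before use:

```yaml
# Sketch: enable optional components on top of the default base.
# Component paths are assumptions -- check the Makefile targets.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../default
components:
  - ../components/metrics   # path is an assumption
  - ../components/webhook   # path is an assumption
```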

Option 3: Deploy as a Static Pod (Control Plane)

Running the controller as a Static Pod on control-plane nodes is useful for self-managed clusters (e.g., kubeadm) where you want the controller to be available alongside core components like the API server.

See examples/static-pod/node-readiness-controller.yaml for a detailed example of deploying the controller as a static pod in a kind cluster.

Deployment Steps

  1. Prepare the Manifest: Refer to the example manifest in examples/static-pod/node-readiness-controller.yaml. This manifest handles the kubeconfig with an initContainer and sets the necessary flags for leader election.

  2. Deploy to Nodes: Use a tool such as Ansible or Terraform to copy the manifest into the /etc/kubernetes/manifests/ directory on each control-plane node. The kubelet will automatically detect and start the pod.

  3. Install CRDs:

    kubectl apply -k config/crd
    

    This is typically handled via a bootstrap script or post-install job in a kubeadm setup.
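Step 2 above can be automated with configuration management. A minimal Ansible sketch; the control_plane inventory group name is an assumption:

```yaml
# Sketch: push the static pod manifest to every control-plane node.
# The "control_plane" group name is an assumption about your inventory.
- name: Deploy node-readiness-controller static pod
  hosts: control_plane
  become: true
  tasks:
    - name: Copy static pod manifest
      ansible.builtin.copy:
        src: examples/static-pod/node-readiness-controller.yaml
        dest: /etc/kubernetes/manifests/node-readiness-controller.yaml
        owner: root
        group: root
        mode: "0600"
```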


Verification

After installation, verify that the controller is running successfully.

Note: Replace ${NAMESPACE} with the namespace where the controller is deployed (typically nrr-system for standard deployments, or kube-system for static pods).

  1. Check Pod Status:

    kubectl get pods -n ${NAMESPACE} -l component=node-readiness-controller
    

    You should see the controller pods in Running status.

  2. Check Logs:

    kubectl logs -n ${NAMESPACE} -l component=node-readiness-controller
    

    Look for “Starting EventSource” or “Starting Controller” messages indicating the manager is active.

  3. Verify CRDs:

    kubectl get crd nodereadinessrules.readiness.node.x-k8s.io
    
  4. Verify High Availability: In an HA cluster, verify that one instance has acquired the leader lease:

    # The lease namespace should match the controller's namespace (configured via --leader-election-namespace)
    kubectl get lease -n ${NAMESPACE} ba65f13e.readiness.node.x-k8s.io
    
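The pod-status check in step 1 can be scripted for CI or bootstrap pipelines. A small sketch; the helper function is ours, and it consumes the standard kubectl jsonpath output shown in the comment:

```shell
#!/bin/sh
# Sketch: fail fast if any controller pod is not Running.
# is_running takes a space-separated list of pod phases, as produced by:
#   kubectl get pods -n "${NAMESPACE}" -l component=node-readiness-controller \
#     -o jsonpath='{.items[*].status.phase}'

is_running() {
  for phase in $1; do
    [ "$phase" = "Running" ] || return 1
  done
}

# Example:
# PHASES=$(kubectl get pods -n "${NAMESPACE}" \
#   -l component=node-readiness-controller \
#   -o jsonpath='{.items[*].status.phase}')
# is_running "$PHASES" && echo "controller is up"
```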

Uninstallation

IMPORTANT: Follow this order to avoid “stuck” resources.

The controller uses a finalizer (readiness.node.x-k8s.io/cleanup-taints) on NodeReadinessRule resources to ensure taints are safely removed from nodes before a rule is deleted.

You must delete all rule objects before deleting the controller.

  1. Delete all Rules:

    kubectl delete nodereadinessrules --all
    

    Wait for this command to complete. This ensures the running controller removes its taints from your nodes.

  2. Uninstall Controller:

    # If installed via release manifest
    kubectl delete -f https://github.com/kubernetes-sigs/node-readiness-controller/releases/download/${VERSION}/install.yaml
    
    # Or if using the full manifest
    kubectl delete -f https://github.com/kubernetes-sigs/node-readiness-controller/releases/download/${VERSION}/install-full.yaml
    
    # OR if using Kustomize
    kubectl delete -k config/default
    
    # OR if using Static Pods
    # Remove the manifest from /etc/kubernetes/manifests/ on all control-plane nodes
    
  3. Uninstall CRDs (Optional):

    kubectl delete -k config/crd
    

Recovering from Stuck Resources

If you accidentally deleted the controller before the rules, the NodeReadinessRule objects will get stuck in a Terminating state, because the controller is needed to clean up the taints and finalizers.

To force-delete them (you will then need to manually remove any remaining managed taints from your nodes):

# Patch the finalizer to remove it
kubectl patch nodereadinessrule <rule-name> -p '{"metadata":{"finalizers":[]}}' --type=merge
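If several rules are stuck, the same patch can be applied to all of them. A sketch; the helper functions are ours, and the patch body is the same empty-finalizers merge patch used in the single-rule command above:

```shell
#!/bin/sh
# Sketch: clear finalizers on every stuck NodeReadinessRule.

finalizer_patch() {
  # Merge patch that empties metadata.finalizers.
  echo '{"metadata":{"finalizers":[]}}'
}

clear_all_finalizers() {
  for rule in $(kubectl get nodereadinessrules -o name); do
    kubectl patch "$rule" --type=merge -p "$(finalizer_patch)"
  done
}

# Usage: clear_all_finalizers
```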

Troubleshooting Deployment

RBAC Permissions

If the controller logs show “Forbidden” errors, verify the ClusterRole bindings:

kubectl describe clusterrole nrr-manager-role

It requires nodes (update/patch) and nodereadinessrules (all) access.

Debug Logging

To enable verbose logging for deeper investigation:

kubectl patch deployment -n nrr-system nrr-controller-manager \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"manager","args":["--zap-log-level=debug"]}]}}}}'