
Step 6: Advanced

Configure advanced scheduling options and node placement tolerations for specialized deployment requirements.

Quick Toleration Presets

Pre-configured tolerations for common node scheduling scenarios:

Control Plane Access

  • Taint: control-plane
  • Effect: NoSchedule
  • Use Case: Allow scheduling on control plane nodes
  • When to Use: System services, monitoring tools

Master Node Access (Legacy)

  • Taint: master
  • Effect: NoSchedule
  • Use Case: Legacy master node scheduling
  • When to Use: Older Kubernetes clusters

GPU Node Access

  • Taint: gpu
  • Effect: NoSchedule
  • Use Case: Schedule on GPU-enabled nodes
  • When to Use: Machine learning, graphics processing

Spot Instance Access

  • Taint: spot
  • Effect: NoSchedule
  • Use Case: Schedule on spot/preemptible instances
  • When to Use: Cost-optimized, fault-tolerant workloads

High Memory Nodes

  • Taint: memory-pressure
  • Effect: NoSchedule
  • Use Case: Schedule on high-memory nodes
  • When to Use: Memory-intensive applications

Dedicated Workload

  • Taint: dedicated
  • Effect: NoSchedule
  • Use Case: Schedule on dedicated nodes
  • When to Use: Isolated, high-performance workloads

Custom Tolerations

Create specific tolerations for your deployment requirements:

Toleration Configuration

  • Taint Key: Node taint identifier (e.g., node-role.kubernetes.io/control-plane)
  • Taint Value: Optional taint value (e.g., true, dedicated)
  • Operator: Matching logic (Exists ignores value, Equal requires exact match)
  • Effect: Scheduling behavior (NoSchedule, NoExecute, PreferNoSchedule)
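Assuming these fields map directly onto a standard Kubernetes toleration, the resulting entry under `spec.tolerations` in a pod manifest would look like this (the key shown is the standard control-plane taint; the choice of operator and effect is illustrative):

```yaml
# Illustrative toleration as it appears in a pod spec
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists       # Exists ignores the value field entirely
    effect: NoSchedule     # tolerate NoSchedule taints with this key
```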

Understanding Tolerations

How Tolerations Work

  • Tolerations allow pods to be scheduled on nodes with matching taints
  • Exists operator ignores the value, Equal requires exact match
  • NoSchedule prevents new pods, NoExecute evicts existing pods
  • PreferNoSchedule tries to avoid but allows if necessary
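The matching rules above can be sketched as a small function. This is a simplified model for illustration only, not the scheduler's actual implementation; it ignores `tolerationSeconds` and other edge cases:

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified check of whether a single toleration matches a single taint."""
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key combined with Exists tolerates taints with any key.
        key_ok = toleration.get("key") in (None, taint["key"])
    else:
        # Equal requires both key and value to match exactly.
        key_ok = (toleration.get("key") == taint["key"]
                  and toleration.get("value") == taint.get("value"))
    # An empty effect matches all effects; otherwise it must match exactly.
    effect_ok = toleration.get("effect") in (None, taint["effect"])
    return key_ok and effect_ok

# Exists ignores the taint's value:
print(tolerates({"key": "gpu", "operator": "Exists", "effect": "NoSchedule"},
                {"key": "gpu", "value": "a100", "effect": "NoSchedule"}))   # True
# Equal fails on a value mismatch:
print(tolerates({"key": "gpu", "operator": "Equal", "value": "a100", "effect": "NoSchedule"},
                {"key": "gpu", "value": "h100", "effect": "NoSchedule"}))   # False
```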

Taint Effects Explained

NoSchedule

  • Prevents new pods from being scheduled on the node
  • Existing pods continue running
  • Most common effect for resource isolation

NoExecute

  • Prevents new pods AND evicts existing pods
  • Use for immediate node isolation
  • Can cause service disruption

PreferNoSchedule

  • Soft constraint - tries to avoid scheduling
  • Will schedule if no other options available
  • Good for preferences rather than requirements
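For reference, a taint with each effect can be applied with `kubectl taint` (the node name and key/value are placeholders):

```
# Hard constraint: no new pods without a matching toleration
kubectl taint nodes node-1 dedicated=myapp:NoSchedule

# Hard constraint plus eviction of non-tolerating pods already on the node
kubectl taint nodes node-1 dedicated=myapp:NoExecute

# Soft constraint: the scheduler avoids the node when it can
kubectl taint nodes node-1 dedicated=myapp:PreferNoSchedule

# Remove a taint by appending a trailing dash
kubectl taint nodes node-1 dedicated=myapp:NoSchedule-
```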

Common Use Cases

GPU Workloads

# Allow scheduling on GPU nodes
taint_key: nvidia.com/gpu
operator: Exists
effect: NoSchedule

High Memory Applications

# node.kubernetes.io/memory-pressure is a condition taint set by the kubelet;
# tolerating it keeps pods schedulable on nodes reporting memory pressure
taint_key: node.kubernetes.io/memory-pressure
operator: Exists
effect: NoSchedule

Cost Optimization

# Use spot instances (the taint key varies by provider and cluster setup)
taint_key: node.kubernetes.io/spot
value: "true"
operator: Equal
effect: NoSchedule

Dedicated Infrastructure

# Dedicated application nodes
taint_key: dedicated
value: myapp
operator: Equal
effect: NoSchedule
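The GPU snippet above corresponds to the following block in a full pod manifest (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker                        # illustrative name
spec:
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest   # illustrative image
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists                    # match any value on the taint
      effect: NoSchedule
```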

Node Selection Strategies

Resource-Based Selection

  • CPU-Intensive: Target high-CPU nodes
  • Memory-Intensive: Target high-memory nodes
  • Storage-Intensive: Target nodes with fast storage
  • Network-Intensive: Target nodes with high bandwidth

Infrastructure-Based Selection

  • Bare Metal: For performance-critical applications
  • Virtual Machines: For standard workloads
  • Spot Instances: For cost-sensitive, fault-tolerant workloads
  • Reserved Instances: For predictable, long-running workloads

Geographic Selection

  • Region-Specific: Comply with data residency requirements
  • Zone-Specific: Control availability zone placement
  • Edge Locations: Minimize latency for users
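Note that a toleration only permits scheduling on tainted nodes; it does not steer pods toward them. For zone-specific placement, a toleration is typically paired with a node selector. The zone label below is the standard well-known Kubernetes label; the zone and taint values are illustrative:

```yaml
# Pin the pod to a zone while tolerating that zone's dedicated taint
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1a    # illustrative zone
  tolerations:
    - key: dedicated
      value: us-east-workloads                 # illustrative taint value
      operator: Equal
      effect: NoSchedule
```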

Toleration Best Practices

When to Use Tolerations

  • Your application requires specific hardware (GPU, high memory)
  • You need guaranteed node resources for critical workloads
  • Cost optimization with spot/preemptible instances
  • Compliance requirements for data isolation

Best Practices

  • Use Specific Tolerations: Only when necessary for your workload
  • Avoid Control Plane: Regular applications shouldn't use control plane tolerations
  • Consider Resource Requirements: Target nodes that match your resource needs
  • Test in Development: Validate tolerations work before production deployment

Performance Considerations

  • Specialized nodes may have different performance characteristics
  • Monitor resource usage on tainted nodes
  • Plan for node availability when using dedicated infrastructure
  • Consider failover strategies for specialized workloads

Troubleshooting Tolerations

Common Issues

  • Pod Stuck Pending: No nodes match toleration requirements
  • Unexpected Scheduling: Tolerations too broad or permissive
  • Resource Conflicts: Multiple applications competing for specialized nodes
  • Node Unavailability: Tainted nodes offline or at capacity

Diagnostic Steps

  1. Check pod events for scheduling failures
  2. Verify node taints match toleration configuration
  3. Confirm target nodes have available resources
  4. Review cluster autoscaling configuration
  5. Validate toleration syntax and values
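The diagnostic steps above map onto kubectl commands roughly as follows (pod, node, and file names are placeholders):

```
# 1. Check pod events for scheduling failures (look for FailedScheduling)
kubectl describe pod <pod-name>

# 2. Verify node taints match the toleration configuration
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'

# 3. Confirm target nodes have available resources
kubectl describe node <node-name>

# 5. Validate toleration syntax server-side without creating anything
kubectl apply --dry-run=server -f deployment.yaml
```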

Next: Proceed to Step 7 for final validation and deployment.