
Step 6: Advanced

Configure advanced scheduling options and node placement tolerations for specialized deployment requirements.

Quick Toleration Presets

Pre-configured tolerations for common node scheduling scenarios:

Control Plane Access

  • Taint: control-plane
  • Effect: NoSchedule
  • Use Case: Allow scheduling on control plane nodes
  • When to Use: System services, monitoring tools

Master Node Access (Legacy)

  • Taint: master
  • Effect: NoSchedule
  • Use Case: Legacy master node scheduling
  • When to Use: Older Kubernetes clusters

GPU Node Access

  • Taint: gpu
  • Effect: NoSchedule
  • Use Case: Schedule on GPU-enabled nodes
  • When to Use: Machine learning, graphics processing

Spot Instance Access

  • Taint: spot
  • Effect: NoSchedule
  • Use Case: Schedule on spot/preemptible instances
  • When to Use: Cost-optimized, fault-tolerant workloads

High Memory Nodes

  • Taint: memory-pressure
  • Effect: NoSchedule
  • Use Case: Schedule on high-memory nodes
  • When to Use: Memory-intensive applications

Dedicated Workload

  • Taint: dedicated
  • Effect: NoSchedule
  • Use Case: Schedule on dedicated nodes
  • When to Use: Isolated, high-performance workloads

Custom Tolerations

Create specific tolerations for your deployment requirements:

Toleration Configuration

  • Taint Key: Node taint identifier (e.g., node-role.kubernetes.io/control-plane)
  • Taint Value: Optional taint value (e.g., true, dedicated)
  • Operator: Matching logic (Exists ignores value, Equal requires exact match)
  • Effect: Scheduling behavior (NoSchedule, NoExecute, PreferNoSchedule)
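Assuming these fields map directly onto a standard Kubernetes toleration, the resulting entry under `spec.tolerations` in a pod manifest would look like this (the key shown is the standard control-plane taint; the choice of operator and effect is illustrative):

```yaml
# Illustrative toleration as it appears in a pod spec
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists       # Exists ignores the value field entirely
    effect: NoSchedule     # tolerate NoSchedule taints with this key
```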

Understanding Tolerations

How Tolerations Work

  • Tolerations allow pods to be scheduled on nodes with matching taints
  • Exists operator ignores the value, Equal requires exact match
  • NoSchedule prevents new pods, NoExecute evicts existing pods
  • PreferNoSchedule tries to avoid but allows if necessary
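The matching rules above can be sketched as a small function. This is a simplified model for illustration only, not the scheduler's actual implementation; it ignores `tolerationSeconds` and other edge cases:

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified check of whether a single toleration matches a single taint."""
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # An empty key combined with Exists tolerates taints with any key.
        key_ok = toleration.get("key") in (None, taint["key"])
    else:
        # Equal requires both key and value to match exactly.
        key_ok = (toleration.get("key") == taint["key"]
                  and toleration.get("value") == taint.get("value"))
    # An empty effect matches all effects; otherwise it must match exactly.
    effect_ok = toleration.get("effect") in (None, taint["effect"])
    return key_ok and effect_ok

# Exists ignores the taint's value:
print(tolerates({"key": "gpu", "operator": "Exists", "effect": "NoSchedule"},
                {"key": "gpu", "value": "a100", "effect": "NoSchedule"}))   # True
# Equal fails on a value mismatch:
print(tolerates({"key": "gpu", "operator": "Equal", "value": "a100", "effect": "NoSchedule"},
                {"key": "gpu", "value": "h100", "effect": "NoSchedule"}))   # False
```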

Taint Effects Explained

NoSchedule

  • Prevents new pods from being scheduled on the node
  • Existing pods continue running
  • Most common effect for resource isolation

NoExecute

  • Prevents new pods AND evicts existing pods
  • Use for immediate node isolation
  • Can cause service disruption

PreferNoSchedule

  • Soft constraint - tries to avoid scheduling
  • Will schedule if no other options available
  • Good for preferences rather than requirements
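For reference, a taint with each effect can be applied with `kubectl taint` (the node name and key/value are placeholders):

```
# Hard constraint: no new pods without a matching toleration
kubectl taint nodes node-1 dedicated=myapp:NoSchedule

# Hard constraint plus eviction of non-tolerating pods already on the node
kubectl taint nodes node-1 dedicated=myapp:NoExecute

# Soft constraint: the scheduler avoids the node when it can
kubectl taint nodes node-1 dedicated=myapp:PreferNoSchedule

# Remove a taint by appending a trailing dash
kubectl taint nodes node-1 dedicated=myapp:NoSchedule-
```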

Common Use Cases

GPU Workloads

# Allow scheduling on GPU nodes
taint_key: nvidia.com/gpu
operator: Exists
effect: NoSchedule

High Memory Applications

# node.kubernetes.io/memory-pressure is a condition taint set by the kubelet;
# tolerating it keeps pods schedulable on nodes reporting memory pressure
taint_key: node.kubernetes.io/memory-pressure
operator: Exists
effect: NoSchedule

Cost Optimization

# Use spot instances (the taint key varies by provider and cluster setup)
taint_key: node.kubernetes.io/spot
value: "true"
operator: Equal
effect: NoSchedule

Dedicated Infrastructure

# Dedicated application nodes
taint_key: dedicated
value: myapp
operator: Equal
effect: NoSchedule
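The GPU snippet above corresponds to the following block in a full pod manifest (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker                        # illustrative name
spec:
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest   # illustrative image
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists                    # match any value on the taint
      effect: NoSchedule
```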

Node Selection Strategies

Resource-Based Selection

  • CPU-Intensive: Target high-CPU nodes
  • Memory-Intensive: Target high-memory nodes
  • Storage-Intensive: Target nodes with fast storage
  • Network-Intensive: Target nodes with high bandwidth

Infrastructure-Based Selection

  • Bare Metal: For performance-critical applications
  • Virtual Machines: For standard workloads
  • Spot Instances: For cost-sensitive, fault-tolerant workloads
  • Reserved Instances: For predictable, long-running workloads

Geographic Selection

  • Region-Specific: Comply with data residency requirements
  • Zone-Specific: Control availability zone placement
  • Edge Locations: Minimize latency for users
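Note that a toleration only permits scheduling on tainted nodes; it does not steer pods toward them. For zone-specific placement, a toleration is typically paired with a node selector. The zone label below is the standard well-known Kubernetes label; the zone and taint values are illustrative:

```yaml
# Pin the pod to a zone while tolerating that zone's dedicated taint
spec:
  nodeSelector:
    topology.kubernetes.io/zone: us-east-1a    # illustrative zone
  tolerations:
    - key: dedicated
      value: us-east-workloads                 # illustrative taint value
      operator: Equal
      effect: NoSchedule
```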

Toleration Best Practices

When to Use Tolerations

  • Your application requires specific hardware (GPU, high memory)
  • You need guaranteed node resources for critical workloads
  • Cost optimization with spot/preemptible instances
  • Compliance requirements for data isolation

Best Practices

  • Use Specific Tolerations: Only when necessary for your workload
  • Avoid Control Plane: Regular applications shouldn't use control plane tolerations
  • Consider Resource Requirements: Target nodes that match your resource needs
  • Test in Development: Validate tolerations work before production deployment

Performance Considerations

  • Specialized nodes may have different performance characteristics
  • Monitor resource usage on tainted nodes
  • Plan for node availability when using dedicated infrastructure
  • Consider failover strategies for specialized workloads

Troubleshooting Tolerations

Common Issues

  • Pod Stuck Pending: No nodes match toleration requirements
  • Unexpected Scheduling: Tolerations too broad or permissive
  • Resource Conflicts: Multiple applications competing for specialized nodes
  • Node Unavailability: Tainted nodes offline or at capacity

Diagnostic Steps

  1. Check pod events for scheduling failures
  2. Verify node taints match toleration configuration
  3. Confirm target nodes have available resources
  4. Review cluster autoscaling configuration
  5. Validate toleration syntax and values
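The diagnostic steps above map onto kubectl commands roughly as follows (pod, node, and file names are placeholders):

```
# 1. Check pod events for scheduling failures (look for FailedScheduling)
kubectl describe pod <pod-name>

# 2. Verify node taints match the toleration configuration
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'

# 3. Confirm target nodes have available resources
kubectl describe node <node-name>

# 5. Validate toleration syntax server-side without creating anything
kubectl apply --dry-run=server -f deployment.yaml
```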

Next: Proceed to Step 7 for final validation and deployment.