Step 6: Advanced
Configure advanced scheduling options and node placement tolerations for specialized deployment requirements.
Quick Toleration Presets
Pre-configured tolerations for common node scheduling scenarios:
Control Plane Access
- Taint: control-plane
- Effect: NoSchedule
- Use Case: Allow scheduling on control plane nodes
- When to Use: System services, monitoring tools
Master Node Access (Legacy)
- Taint: master
- Effect: NoSchedule
- Use Case: Legacy master node scheduling
- When to Use: Older Kubernetes clusters
GPU Node Access
- Taint: gpu
- Effect: NoSchedule
- Use Case: Schedule on GPU-enabled nodes
- When to Use: Machine learning, graphics processing
Spot Instance Access
- Taint: spot
- Effect: NoSchedule
- Use Case: Schedule on spot/preemptible instances
- When to Use: Cost-optimized, fault-tolerant workloads
High Memory Nodes
- Taint: memory-pressure
- Effect: NoSchedule
- Use Case: Schedule on high-memory nodes
- When to Use: Memory-intensive applications
Dedicated Workload
- Taint: dedicated
- Effect: NoSchedule
- Use Case: Schedule on dedicated nodes
- When to Use: Isolated, high-performance workloads
Custom Tolerations
Create specific tolerations for your deployment requirements:
Toleration Configuration
- Taint Key: Node taint identifier (e.g., node-role.kubernetes.io/control-plane)
- Taint Value: Optional taint value (e.g., true, dedicated)
- Operator: Matching logic (Exists ignores the value, Equal requires an exact match)
- Effect: Scheduling behavior (NoSchedule, NoExecute, PreferNoSchedule)
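These fields map directly onto the tolerations array of a standard Kubernetes pod spec. A minimal sketch, assuming a hypothetical monitoring pod that uses the Control Plane Access preset (pod and image names are illustrative):

# Sketch: pod tolerating the control-plane taint
apiVersion: v1
kind: Pod
metadata:
  name: system-monitor              # illustrative pod name
spec:
  containers:
    - name: monitor
      image: example/monitor:latest # placeholder image
  tolerations:
    - key: node-role.kubernetes.io/control-plane  # Taint Key
      operator: Exists                            # Operator: value is ignored, so none is set
      effect: NoSchedule                          # Effect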
Understanding Tolerations
How Tolerations Work
- Tolerations allow pods to be scheduled on nodes with matching taints
- The Exists operator ignores the taint value, while Equal requires an exact match (both are shown in the sketch below)
- NoSchedule prevents new pods from being scheduled; NoExecute also evicts pods already running
- PreferNoSchedule is a soft constraint: the scheduler tries to avoid the node but uses it if necessary
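For example, against a node tainted dedicated=myapp:NoSchedule (the Dedicated Workload case above), either of these tolerations matches:

tolerations:
  - key: dedicated
    operator: Exists       # matches the dedicated taint regardless of its value
    effect: NoSchedule
  - key: dedicated
    operator: Equal
    value: myapp           # matches only when the taint value is exactly "myapp"
    effect: NoSchedule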
Taint Effects Explained
NoSchedule
- Prevents new pods from being scheduled on the node
- Existing pods continue running
- Most common effect for resource isolation
NoExecute
- Prevents new pods from being scheduled AND evicts running pods that do not tolerate the taint
- Use for immediate node isolation
- Can cause service disruption
PreferNoSchedule
- Soft constraint: the scheduler tries to avoid scheduling pods on the node
- Pods are still scheduled there if no other nodes are available
- Good for preferences rather than requirements
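The taints themselves are set on the node side, typically by a cluster administrator. If you have kubectl access, the standard commands for each effect look like this (the node name node-1 is illustrative):

# Apply a taint (choose one effect per taint)
kubectl taint nodes node-1 dedicated=myapp:NoSchedule
kubectl taint nodes node-1 dedicated=myapp:PreferNoSchedule
kubectl taint nodes node-1 dedicated=myapp:NoExecute
# Remove a taint by appending a trailing hyphen
kubectl taint nodes node-1 dedicated=myapp:NoSchedule-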
Common Use Cases
GPU Workloads
# Allow scheduling on GPU nodes
taint_key: nvidia.com/gpu
operator: Exists
effect: NoSchedule
High Memory Applications
# Schedule on high-memory nodes
taint_key: node.kubernetes.io/memory-pressure
operator: Exists
effect: NoSchedule
Cost Optimization
# Use spot instances
taint_key: node.kubernetes.io/spot
value: "true"
operator: Equal
effect: NoSchedule
Dedicated Infrastructure
# Dedicated application nodes
taint_key: dedicated
value: myapp
operator: Equal
effect: NoSchedule
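In a raw Kubernetes manifest, the Cost Optimization snippet above corresponds to a toleration like this (the taint key is taken from the example; actual spot-node taint keys vary by cloud provider):

tolerations:
  - key: node.kubernetes.io/spot
    operator: Equal
    value: "true"          # quoted so YAML treats it as a string, not a boolean
    effect: NoSchedule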
Node Selection Strategies
Resource-Based Selection
- CPU-Intensive: Target high-CPU nodes
- Memory-Intensive: Target high-memory nodes
- Storage-Intensive: Target nodes with fast storage
- Network-Intensive: Target nodes with high bandwidth
Infrastructure-Based Selection
- Bare Metal: For performance-critical applications
- Virtual Machines: For standard workloads
- Spot Instances: For cost-sensitive, fault-tolerant workloads
- Reserved Instances: For predictable, long-running workloads
Geographic Selection
- Region-Specific: Comply with data residency requirements
- Zone-Specific: Control availability zone placement
- Edge Locations: Minimize latency for users
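Note that tolerations only permit pods onto tainted nodes; to actively steer pods toward specific hardware or locations, pair them with a node selector or node affinity. A minimal sketch using well-known node labels (the zone and instance-type values are illustrative):

# Pod spec fragment combining attraction (nodeSelector) with permission (tolerations)
nodeSelector:
  topology.kubernetes.io/zone: us-east-1a        # zone-specific placement
  node.kubernetes.io/instance-type: r5.2xlarge   # memory-optimized instance type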
Toleration Best Practices
When to Use Tolerations
- Your application requires specific hardware (GPU, high memory)
- You need guaranteed node resources for critical workloads
- Cost optimization with spot/preemptible instances
- Compliance requirements for data isolation
Best Practices
- Use Specific Tolerations: Only when necessary for your workload
- Avoid Control Plane: Regular applications shouldn't use control plane tolerations
- Consider Resource Requirements: Target nodes that match your resource needs
- Test in Development: Validate tolerations work before production deployment
Performance Considerations
- Specialized nodes may have different performance characteristics
- Monitor resource usage on tainted nodes
- Plan for node availability when using dedicated infrastructure
- Consider failover strategies for specialized workloads
Troubleshooting Tolerations
Common Issues
- Pod Stuck Pending: No nodes match toleration requirements
- Unexpected Scheduling: Tolerations too broad or permissive
- Resource Conflicts: Multiple applications competing for specialized nodes
- Node Unavailability: Tainted nodes offline or at capacity
Diagnostic Steps
- Check pod events for scheduling failures
- Verify node taints match toleration configuration
- Confirm target nodes have available resources
- Review cluster autoscaling configuration
- Validate toleration syntax and values
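If you have kubectl access, these standard commands cover the checks above (angle-bracketed names are placeholders):

# Inspect events for a pending pod (look for FailedScheduling messages)
kubectl describe pod <pod-name>
# List all pending pods across namespaces
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
# Show the taints on a specific node
kubectl describe node <node-name> | grep -i taints
# List taint keys across all nodes
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'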
Next: Proceed to Step 7 for final validation and deployment.