
Troubleshooting

Common issues and solutions for container deployment and management on the CLI App Platform.

Deployment Issues

Image Pull Errors

Symptoms: Pod stuck in "Pending", or container waiting with reason "ErrImagePull" or "ImagePullBackOff"

Common Causes:

  • Incorrect image name or tag
  • Image doesn't exist in the specified registry
  • Private registry authentication issues
  • Network connectivity problems

Solutions:

  1. Verify Image Name: Check image name and tag spelling
  2. Test Image Access: Verify image exists and is accessible
  3. Check Registry Credentials: Ensure authentication is properly configured
  4. Network Connectivity: Verify cluster can reach the registry

Example Debug Commands:

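# Test image pull locally (this uses your own credentials, not the cluster's)
docker pull your-image:tag

# Check the tag exists in the registry without pulling it
# (docker search looks up repository names and cannot confirm a specific tag)
docker manifest inspect your-image:tag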
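Typos in the registry host, repository path, or tag are the most common cause of pull errors. The POSIX shell sketch below splits an image reference into those three parts so each can be checked on its own. `parse_image_ref` is a hypothetical helper; it follows Docker's convention that the first path component is a registry only if it contains "." or ":" or is "localhost", and it does not add the implicit `library/` prefix Docker Hub uses for official images.

```sh
#!/bin/sh
# Hypothetical helper: split an image reference into registry, repository
# and tag so each part can be checked for typos separately.
parse_image_ref() (
  ref=$1
  first=${ref%%/*}
  registry=docker.io   # Docker's default when no registry host is given
  repo=$ref
  case $ref in
    */*)
      # The first component is a registry host only if it looks like one.
      case $first in
        *.*|*:*|localhost) registry=$first; repo=${ref#*/} ;;
      esac ;;
  esac
  last=${repo##*/}
  case $repo in
    *@*) tag=${repo#*@}; repo=${repo%%@*} ;;   # digest reference, reported as the tag
    *)
      case $last in
        *:*) tag=${last##*:}; repo=${repo%:*} ;;
        *) tag=latest ;;                      # Docker assumes :latest
      esac ;;
  esac
  printf 'registry=%s repo=%s tag=%s\n' "$registry" "$repo" "$tag"
)
```

For example, `parse_image_ref registry.example.com:5000/team/app:1.2.3` (a placeholder reference) prints `registry=registry.example.com:5000 repo=team/app tag=1.2.3`.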

Resource Constraint Errors

Symptoms: Pod stuck in "Pending" state with scheduling failures

Common Causes:

  • Insufficient CPU or memory in the cluster
  • Resource requests exceed node capacity
  • Node taints preventing scheduling
  • Storage provisioning failures

Solutions:

  1. Check Cluster Capacity: Review available resources in the dashboard
  2. Reduce Resource Requests: Lower CPU/memory requests to fit the available capacity
  3. Review Node Taints: Ensure the pod's tolerations match the taints on the nodes it should run on
  4. Scale Cluster: Add more nodes if consistently resource-constrained

Diagnostic Steps:

  1. Check pod events in the Overview section
  2. Review cluster resource utilization
  3. Verify node availability and capacity
  4. Check for node taints and scheduling constraints
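Scheduling fails when a pod's request is larger than the capacity still unrequested on every eligible node. If you have `kubectl` access to the underlying cluster (an assumption; the dashboard may be your only view), `kubectl describe node <node>` lists the node's Allocatable CPU and the CPU already requested under "Allocated resources". The comparison can be sketched as below; the helper names are hypothetical, only CPU quantities in cores or millicores ("m") are handled, and capacity is only one factor, since taints and affinity rules also apply.

```sh
#!/bin/sh
# Convert a Kubernetes CPU quantity ("500m", "2", "0.5") to millicores.
cpu_to_millicores() (
  q=$1
  case $q in
    *m) echo "${q%m}" ;;
    *)  awk -v v="$q" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;
  esac
)

# Succeed if a CPU request fits in what is left on a node.
# Arguments: pod request, node allocatable CPU, CPU already requested on the node.
cpu_request_fits() (
  req=$(cpu_to_millicores "$1")
  alloc=$(cpu_to_millicores "$2")
  used=$(cpu_to_millicores "$3")
  [ $((alloc - used)) -ge "$req" ]
)
```

For example, `cpu_request_fits 750m 2 1500m || echo "does not fit"` prints "does not fit", because only 500m remain on that node.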

Configuration Errors

Symptoms: Container fails to start or crashes immediately

Common Causes:

  • Invalid environment variables
  • Missing required configuration
  • Incorrect volume mount paths
  • Application configuration errors

Solutions:

  1. Review Container Logs: Check startup logs for error messages
  2. Validate Configuration: Verify all environment variables and mounts
  3. Test Locally: Run the image with the same configuration in a local container runtime
  4. Minimal Configuration: Start with a minimal configuration and add settings incrementally
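Missing or empty environment variables are easier to diagnose when the container fails fast with a clear message instead of crashing later. A minimal sketch for the top of an entrypoint script; `check_required_env` is a hypothetical helper, and `APP_PORT` and `APP_DB_URL` are hypothetical variable names.

```sh
#!/bin/sh
# Report every required variable that is unset or empty.
# Exits non-zero if any is missing, so an entrypoint can stop early.
check_required_env() (
  status=0
  for name in "$@"; do
    # Reject anything that is not a valid variable name before using eval.
    case $name in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "invalid variable name: $name" >&2; status=1; continue ;;
    esac
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      status=1
    fi
  done
  exit "$status"
)

# Example entrypoint usage (hypothetical variable names):
# check_required_env APP_PORT APP_DB_URL || exit 1
# exec "$@"
```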

Runtime Issues

Container Crashes and Restarts

Symptoms: High restart count, container status cycling (for example, "CrashLoopBackOff")

Common Causes:

  • Application bugs or exceptions
  • Memory limits exceeded (OOMKilled)
  • Health check failures
  • External dependency failures

Solutions:

  1. Analyze Logs: Review container logs for error patterns
  2. Check Resource Usage: Monitor CPU and memory consumption
  3. Review Health Checks: Ensure health check endpoints are responsive
  4. Test Dependencies: Verify external services are accessible
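When a health check or dependency is slow to come up, probing it a few times separates a transient startup delay from a real failure. A small retry helper sketched in POSIX shell; `retry` is a hypothetical function, and `http://localhost:8080/health` is the same placeholder endpoint used under Application Debugging.

```sh
#!/bin/sh
# Run a command up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Succeeds as soon as the command succeeds; fails after the last attempt.
retry() (
  attempts=$1; delay=$2; shift 2
  i=1
  while :; do
    "$@" && exit 0
    [ "$i" -ge "$attempts" ] && exit 1
    i=$((i + 1))
    sleep "$delay"
  done
)

# Example: -f makes curl fail on HTTP error statuses, -sS hides the
# progress meter but still prints errors.
# retry 10 3 curl -fsS http://localhost:8080/health
```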

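Memory-Specific Issues:

  • Symptoms: Container terminated with reason "OOMKilled" (exit code 137), shown in pod events or the container's last state rather than in the application's own logs
  • Solutions: Increase memory limits or optimize application memory usage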
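From inside a running container you can compare memory usage against the limit and check whether the kernel has OOM-killed any process in the container. A sketch assuming the node uses cgroup v2, where these values typically appear as `memory.current`, `memory.max` and `memory.events` under `/sys/fs/cgroup` (cgroup v1 uses different files, such as `memory.limit_in_bytes`); the helper names are hypothetical.

```sh
#!/bin/sh
# Print the oom_kill counter from a cgroup v2 memory.events file
# (0 if the line is absent).
oom_kill_count() (
  awk '$1 == "oom_kill" { n = $2 } END { print n + 0 }' \
    "${1:-/sys/fs/cgroup/memory.events}"
)

# Print current memory usage as a percentage of the cgroup v2 limit.
mem_usage_percent() (
  dir=${1:-/sys/fs/cgroup}
  current=$(cat "$dir/memory.current")
  limit=$(cat "$dir/memory.max")
  if [ "$limit" = max ]; then
    echo "no memory limit set"
    exit 0
  fi
  echo $((current * 100 / limit))
)
```

A non-zero `oom_kill_count` means a process in the container was killed for exceeding the limit, even if the container itself kept running.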

Performance Issues

Symptoms: Slow response times, high resource utilization

Common Causes:

  • Insufficient resource allocation
  • Inefficient application code
  • Database or external service bottlenecks
  • Network latency issues

Solutions:

  1. Monitor Metrics: Use performance metrics to identify bottlenecks
  2. Scale Resources: Increase CPU/memory allocation as needed
  3. Profile Application: Use application profiling tools
  4. Optimize Queries: Review and optimize database queries
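Averages hide the slow requests users actually notice, so a high percentile is often more informative. A sketch that times repeated requests with curl's `-w '%{time_total}'` write-out variable and computes a nearest-rank percentile; the function names are hypothetical, and the URL is the same placeholder health endpoint used under Application Debugging.

```sh
#!/bin/sh
# Time N requests to URL and print each total time in seconds, one per line.
time_requests() (
  url=$1; n=${2:-20}; i=0
  while [ "$i" -lt "$n" ]; do
    curl -o /dev/null -s -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done
)

# Read numbers on stdin and print the P-th percentile (nearest-rank method).
percentile() (
  p=$1
  sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      r = int(p / 100 * NR)
      if (r < p / 100 * NR) r++   # round the rank up
      if (r < 1) r = 1
      print v[r]
    }'
)

# Example: time_requests http://localhost:8080/health 50 | percentile 95
```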

Network Connectivity Issues

Symptoms: Unable to access application, connection timeouts

Common Causes:

  • Incorrect port configuration
  • Network policy restrictions
  • Load balancer configuration issues
  • DNS resolution problems

Solutions:

  1. Verify Port Configuration: Check that exposed ports match the port the application listens on
  2. Test Internal Connectivity: Use the console to test from within the cluster
  3. Check Network Policies: Review network security policies
  4. Validate DNS: Ensure service discovery is working correctly
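A frequent cause of timeouts is an application listening only on 127.0.0.1, which is reachable from inside the container but not through the service. When netstat or ss are not installed in the image, the kernel's `/proc/net/tcp` table holds the same information. A sketch that lists IPv4 sockets in the LISTEN state (state code `0A`); `listening_ports` is a hypothetical helper, and IPv6 sockets, which live in `/proc/net/tcp6`, are not handled.

```sh
#!/bin/sh
# List IPv4 TCP sockets in LISTEN state (st == 0A) as address:port.
# /proc/net/tcp stores the address as little-endian hex and the port as hex.
listening_ports() (
  awk '
    function hex(s,   i, n) {
      n = 0
      s = toupper(s)
      for (i = 1; i <= length(s); i++)
        n = n * 16 + index("0123456789ABCDEF", substr(s, i, 1)) - 1
      return n
    }
    NR > 1 && $4 == "0A" {
      split($2, a, ":")
      ip = a[1]
      printf "%d.%d.%d.%d:%d\n", hex(substr(ip, 7, 2)), hex(substr(ip, 5, 2)),
        hex(substr(ip, 3, 2)), hex(substr(ip, 1, 2)), hex(a[2])
    }' "${1:-/proc/net/tcp}"
)
```

An entry such as `127.0.0.1:8080` means the application accepts connections only from inside its own container; it usually needs to listen on `0.0.0.0:8080` to be reachable through the service.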

Storage Issues

Volume Mount Failures

Symptoms: Container fails to start with mount errors

Common Causes:

  • Invalid mount paths
  • Volume doesn't exist
  • Permission issues
  • Storage class problems

Solutions:

  1. Check Mount Paths: Verify mount paths are valid and don't conflict
  2. Verify Volumes: Ensure referenced volumes exist
  3. Review Permissions: Check file system permissions
  4. Storage Class: Verify storage class is available and accessible
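Mount path mistakes can be caught before deployment by checking the list of paths a container will use. A sketch that treats relative paths and exact duplicates as errors (Kubernetes, for example, rejects duplicate mount paths within a container, and a relative path is almost always a typo) and reports nested paths only as warnings, because a nested mount hides that part of the parent volume. `check_mount_paths` is a hypothetical helper.

```sh
#!/bin/sh
# Read container mount paths (one per line) on stdin. Relative paths and
# duplicates are reported as errors (non-zero exit); nested paths are
# reported as warnings only.
check_mount_paths() (
  awk '
    NF == 0 { next }
    { p = $0; sub(/\/+$/, "", p); if (p == "") p = "/" }
    p !~ /^\// { print "error: not absolute: " $0; bad = 1; next }
    (p in seen) { print "error: duplicate: " p; bad = 1; next }
    { seen[p] = 1; paths[++n] = p }
    END {
      for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++)
          if (i != j && index(paths[j], paths[i] "/") == 1)
            print "warning: " paths[j] " is nested inside " paths[i]
      exit bad
    }'
)

# Example: printf '/data\n/data/cache\n/config\n' | check_mount_paths
```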

Persistent Volume Issues

Symptoms: Data loss, volume not available

Common Causes:

  • Volume not properly configured as persistent
  • Storage backend failures
  • Incorrect volume size
  • Backup/restore issues

Solutions:

  1. Verify PVC Configuration: Check persistent volume claim settings
  2. Monitor Storage Usage: Check available storage space
  3. Test Backup/Restore: Verify backup procedures work correctly
  4. Check Storage Backend: Ensure storage infrastructure is healthy
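To catch a volume before it fills up, the POSIX `df -P` output can be filtered on its Capacity column. A sketch; `full_mounts` is a hypothetical helper, `/data` stands in for your volume's mount path, and mount points containing spaces are not handled.

```sh
#!/bin/sh
# Read `df -P` output on stdin and print "mountpoint usage%" for every
# filesystem at or above THRESHOLD percent full (default 80).
full_mounts() (
  awk -v t="${1:-80}" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)
    if (use + 0 >= t + 0) print $6 " " $5
  }'
)

# Example: df -P /data | full_mounts 90
```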

Security Issues

Secret Access Problems

Symptoms: Application can't access required secrets

Common Causes:

  • Secret doesn't exist
  • Incorrect secret name or key
  • Permission issues
  • Secret not mounted correctly

Solutions:

  1. Verify Secret Exists: Check that the secret has been created and is accessible
  2. Check Mount Configuration: Verify the secret's mount path and key names
  3. Review Permissions: Ensure the service account has access
  4. Test Secret Access: Use the console to verify the secret is available
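When a secret is mounted as a volume, Kubernetes-style platforms typically create one file per key under the mount path. A sketch for checking the expected key files from the console without printing their values; `check_secret_keys` is a hypothetical helper, and `/etc/secrets/db`, `username` and `password` are hypothetical names.

```sh
#!/bin/sh
# Check that each expected secret key exists under DIR as a readable,
# non-empty file. Prints only key names and status, never the values.
check_secret_keys() (
  dir=$1; shift
  status=0
  for key in "$@"; do
    if [ ! -e "$dir/$key" ]; then
      echo "missing: $key" >&2; status=1
    elif [ ! -r "$dir/$key" ]; then
      echo "not readable: $key" >&2; status=1
    elif [ ! -s "$dir/$key" ]; then
      echo "empty: $key" >&2; status=1
    else
      echo "ok: $key"
    fi
  done
  exit "$status"
)

# Example (hypothetical mount path and key names):
# check_secret_keys /etc/secrets/db username password
```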

Permission Denied Errors

Symptoms: Application fails with permission errors

Common Causes:

  • Security context restrictions
  • File system permissions
  • Service account permissions
  • Network policy restrictions

Solutions:

  1. Review Security Context: Check security context configuration
  2. Check File Permissions: Verify file and directory permissions
  3. Service Account: Ensure service account has required permissions
  4. Network Policies: Review network access policies
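Many permission errors come from the container running as a non-root user or from a read-only filesystem, not from the file mode bits alone. Actually creating a file answers the question directly. A sketch; `check_writable` is a hypothetical helper and `/app/data` is a placeholder path.

```sh
#!/bin/sh
# Report which user the process runs as and whether it can create files in
# each DIR. Creating a real file also detects read-only mounts, which the
# permission bits shown by `ls -l` do not reveal.
check_writable() (
  echo "running as: $(id)"
  status=0
  for dir in "$@"; do
    probe="$dir/.write-test.$$"
    if ( : > "$probe" ) 2>/dev/null; then
      rm -f "$probe"
      echo "writable: $dir"
    else
      echo "NOT writable: $dir ($(ls -ld "$dir" 2>/dev/null || echo 'does not exist'))"
      status=1
    fi
  done
  exit "$status"
)

# Example (placeholder path): check_writable /tmp /app/data
```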

Monitoring and Logging Issues

Missing Logs

Symptoms: No logs appearing in dashboard

Common Causes:

  • Application not writing to stdout/stderr
  • Log level configuration issues
  • Logging system problems
  • Container not running

Solutions:

  1. Check Application Logging: Ensure app writes to stdout/stderr
  2. Verify Log Levels: Check log level configuration
  3. Container Status: Ensure container is running
  4. Test Log Output: Use the console to verify the application is generating logs
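If the application can only log to a file, one common workaround is to replace that file with a symlink to the container's stdout; the official nginx image exposes its access log this way. A sketch, assuming the symlink is created before the application opens its log file; `link_log_to_stdout` is a hypothetical helper, and `/var/log/app.log` is the placeholder path used under Application Debugging.

```sh
#!/bin/sh
# Replace a log file with a symlink to /dev/stdout so lines the application
# writes to it reach the platform's log collector.
link_log_to_stdout() (
  log_file=$1
  mkdir -p "$(dirname "$log_file")"
  ln -sf /dev/stdout "$log_file"
)

# Example, in an entrypoint before starting the application:
# link_log_to_stdout /var/log/app.log
```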

Metrics Not Available

Symptoms: Performance metrics not showing

Common Causes:

  • Metrics collection not enabled
  • Application not exposing metrics
  • Network connectivity issues
  • Monitoring system problems

Solutions:

  1. Enable Metrics Collection: Ensure metrics are enabled
  2. Check Metrics Endpoints: Verify application exposes metrics
  3. Network Connectivity: Test connectivity to metrics endpoints
  4. Refresh Dashboard: Use the refresh button to update metrics
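If your application exposes metrics in the Prometheus text format (an assumption; this guide does not state which format the platform collects), counting metric families and samples shows quickly whether the endpoint returns anything useful. A sketch; `summarize_metrics` is a hypothetical helper, and `/metrics` on port 8080 is a conventional placeholder, not a platform requirement.

```sh
#!/bin/sh
# Summarise Prometheus text-format metrics read on stdin: the number of
# metric families (from "# TYPE" lines) and of sample lines.
# Exits non-zero when there are no samples at all.
summarize_metrics() (
  awk '
    /^# TYPE / { families++; next }
    /^#/ || NF == 0 { next }
    { samples++ }
    END {
      printf "families=%d samples=%d\n", families, samples
      exit (samples == 0)
    }'
)

# Example (placeholder endpoint):
# curl -fsS http://localhost:8080/metrics | summarize_metrics
```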

Platform-Specific Issues

Dashboard Access Problems

Symptoms: Cannot access container management interface

Common Causes:

  • Authentication issues
  • Browser compatibility
  • Network connectivity
  • Platform maintenance

Solutions:

  1. Refresh Session: Log out and log back in
  2. Clear Browser Cache: Clear browser cache and cookies
  3. Try Different Browser: Test with different web browser
  4. Check Platform Status: Verify platform is operational

Console Connection Issues

Symptoms: Cannot connect to container console

Common Causes:

  • Container not running
  • Shell not available in container
  • Network connectivity issues
  • Browser lacks WebSocket support, or a proxy blocks WebSocket connections

Solutions:

  1. Check Container Status: Ensure container is running
  2. Verify Shell: Ensure bash or sh is available in the container
  3. Browser Compatibility: Check that the browser supports WebSockets
  4. Network Configuration: Review firewall and proxy settings
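If the console cannot start a shell, the image may not contain one; distroless and scratch-based images usually do not. One way to check from your own machine is to export the image's filesystem with `docker create` and `docker export` and look for common shells. A sketch; `shells_in_rootfs` is a hypothetical helper, `your-image:tag` is the placeholder used earlier, and the trailing `true` is only a placeholder command so `docker create` accepts images that define no CMD (the container is never started).

```sh
#!/bin/sh
# List common shells present in an unpacked image filesystem at ROOT.
# Exits non-zero if none are found.
shells_in_rootfs() (
  root=$1
  status=1
  for sh in bin/bash bin/sh bin/ash usr/bin/bash usr/bin/sh; do
    # -L also catches symlinks (e.g. /bin/sh -> busybox) whose absolute
    # targets do not resolve on the host.
    if [ -x "$root/$sh" ] || [ -L "$root/$sh" ]; then
      echo "/$sh"
      status=0
    fi
  done
  exit "$status"
)

# Example:
# id=$(docker create your-image:tag true)
# mkdir -p /tmp/rootfs && docker export "$id" | tar -xf - -C /tmp/rootfs
# shells_in_rootfs /tmp/rootfs
# docker rm "$id"
```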

Diagnostic Tools and Commands

Container Inspection

# Check container processes
ps aux

# Check disk usage
df -h

# Check memory usage
free -h

# Check network configuration
ip addr show

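# Test outbound connectivity (ICMP is often blocked, so a failed ping is not conclusive)
ping -c 3 google.com

# Test DNS resolution of a service name
nslookup service-name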

Application Debugging

# Check application configuration
env | grep APP_

# Test application endpoints
curl http://localhost:8080/health

# Check logs the application writes to a file (file logs do not reach the dashboard)
tail -f /var/log/app.log

# Check file permissions
ls -la /app/

Resource Monitoring

# Monitor CPU usage
top

# Monitor memory usage
watch -n 1 free -h

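# Monitor disk I/O (iostat is part of the sysstat package)
iostat 1

# Show per-interface network traffic counters
# (ip -s link works where netstat is not installed)
netstat -i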
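Inside a container, `top` and `free` often report the whole node rather than the container's own usage and limits. On cgroup v2, the container's cumulative CPU time is the `usage_usec` line of `/sys/fs/cgroup/cpu.stat`; sampling it twice gives CPU usage in the same millicore units used for resource requests. A sketch; the helper names are hypothetical and the interval must be a whole number of seconds.

```sh
#!/bin/sh
# Print cumulative CPU time (microseconds) from a cgroup v2 cpu.stat file.
cpu_usage_usec() (
  awk '$1 == "usage_usec" { print $2 }' "${1:-/sys/fs/cgroup/cpu.stat}"
)

# Convert two usage_usec samples taken INTERVAL seconds apart to millicores
# (1000 millicores = one fully used CPU core).
millicores_between() (
  echo $(( ($2 - $1) / ($3 * 1000) ))
)

# Sample the cgroup twice and print the average millicores used.
cpu_millicores() (
  file=${1:-/sys/fs/cgroup/cpu.stat}
  interval=${2:-1}
  start=$(cpu_usage_usec "$file")
  sleep "$interval"
  end=$(cpu_usage_usec "$file")
  millicores_between "$start" "$end" "$interval"
)
```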

When to Contact Support

Contact platform support when:

  • Platform Issues: Dashboard or platform functionality problems
  • Persistent Problems: Issues that persist despite troubleshooting
  • Security Concerns: Suspected security issues or breaches
  • Performance Degradation: Unexplained performance problems
  • Data Loss: Issues with data persistence or backups

Information to Provide

When contacting support, include:

  1. Container Details: Name, ID, and deployment configuration
  2. Error Messages: Exact error messages and logs
  3. Timeline: When the issue started and any recent changes
  4. Steps Taken: Troubleshooting steps already attempted
  5. Environment: Details about your deployment environment

Emergency Procedures

For critical issues:

  1. Immediate Response: Stop affected containers if necessary
  2. Isolate Impact: Prevent issue from spreading
  3. Document Everything: Capture logs and system state
  4. Contact Support: Escalate to platform support immediately
  5. Communication: Notify stakeholders of impact and status
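When capturing system state for support, avoid sharing secrets by accident. A sketch that writes a timestamped snapshot of common diagnostics and masks environment variables whose names look sensitive; the helper names and the name patterns are assumptions, multi-line variable values are not handled, and the files should still be reviewed before they are shared.

```sh
#!/bin/sh
# Print the environment with values masked for variable names that look
# sensitive. The name patterns are an assumption; extend them as needed.
redact_env() (
  env | awk -F= '{
    if (toupper($1) ~ /SECRET|PASSWORD|PASSWD|TOKEN|KEY|CREDENTIAL/) print $1 "=****"
    else print
  }'
)

# Write a timestamped snapshot of common diagnostics under DIR (default: .)
# and print the snapshot directory's path.
capture_snapshot() (
  dir=${1:-.}/snapshot-$(date +%Y%m%d-%H%M%S)
  mkdir -p "$dir"
  redact_env > "$dir/env.txt"
  for cmd in "ps aux" "df -h" "free -h"; do
    name=${cmd%% *}
    # $cmd is left unquoted on purpose so it splits into command and arguments.
    $cmd > "$dir/$name.txt" 2>&1 || echo "(command failed: $cmd)" >> "$dir/$name.txt"
  done
  echo "$dir"
)
```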