Troubleshooting
Common issues and solutions for container deployment and management on the CLI App Platform.
Deployment Issues
Image Pull Errors
Symptoms: Pod stuck in "Pending" with container status "ErrImagePull" or "ImagePullBackOff"
Common Causes:
- Incorrect image name or tag
- Image doesn't exist in the specified registry
- Private registry authentication issues
- Network connectivity problems
Solutions:
- Verify Image Name: Check image name and tag spelling
- Test Image Access: Verify image exists and is accessible
- Check Registry Credentials: Ensure authentication is properly configured
- Network Connectivity: Verify cluster can reach the registry
Example Debug Commands:
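# Test image pull locally (for a private registry, run docker login <registry> first)
docker pull your-image:tag
# Check that the tag exists in the registry without pulling the layers
# (docker search only searches Docker Hub by name and does not check tags)
docker manifest inspect your-image:tag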
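Many pull failures come down to a typo in the image reference itself. The following is a minimal sketch of a pre-deploy check, written as a hypothetical bash function `check_image_ref`. It applies a simplified version of the registry naming rules (lowercase repository path, tag of at most 128 characters) rather than the full OCI grammar, so a pass means "probably well-formed", not "the image exists".

```shell
#!/usr/bin/env bash
# check_image_ref REF: hypothetical helper that flags common typos in an image
# reference. Simplified rules, not the complete OCI distribution grammar.
check_image_ref() {
  local ref="$1"
  # References pinned by digest are accepted as-is.
  if [[ "$ref" == *@sha256:* ]]; then
    echo "ok (pinned by digest): $ref"
    return 0
  fi
  local last="${ref##*/}"   # last path component, e.g. "app:1.2"
  local repo tag
  if [[ "$last" == *:* ]]; then
    tag="${last##*:}"
    repo="${ref%:*}"
  else
    echo "warning: no tag given, the registry will resolve ':latest'" >&2
    tag="latest"
    repo="$ref"
  fi
  # Drop a leading registry host (contains '.' or ':' or is 'localhost').
  local first="${repo%%/*}" path="$repo"
  if [[ "$repo" == */* && ( "$first" == *.* || "$first" == *:* || "$first" == localhost ) ]]; then
    path="${repo#*/}"
  fi
  local path_re='^[a-z0-9]+([._-][a-z0-9]+)*(/[a-z0-9]+([._-][a-z0-9]+)*)*$'
  local tag_re='^[A-Za-z0-9_][A-Za-z0-9._-]{0,127}$'
  if [[ ! "$path" =~ $path_re ]]; then
    echo "error: repository '$path' must use lowercase letters, digits and separators" >&2
    return 1
  fi
  if [[ ! "$tag" =~ $tag_re ]]; then
    echo "error: invalid tag '$tag'" >&2
    return 1
  fi
  echo "ok: $repo:$tag"
}
```

Run it against the exact string from your deployment configuration, for example `check_image_ref registry.example.com/team/app:v1.2` (the registry host is a placeholder). An uppercase repository name is rejected, which catches a frequent copy-paste mistake.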
Resource Constraint Errors
Symptoms: Pod stuck in "Pending" state with scheduling failures
Common Causes:
- Insufficient CPU or memory in the cluster
- Resource requests exceed node capacity
- Node taints preventing scheduling
- Storage provisioning failures
Solutions:
- Check Cluster Capacity: Review available resources in the dashboard
- Reduce Resource Requests: Lower CPU/memory requirements
- Review Node Taints: Ensure tolerations match node taints
- Scale Cluster: Add more nodes if consistently resource-constrained
Diagnostic Steps:
- Check pod events in the Overview section
- Review cluster resource utilization
- Verify node availability and capacity
- Check for node taints and scheduling constraints
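To decide between reducing requests and scaling the cluster, compare the pod's requests with what a node still has free. The sketch below is a hypothetical bash helper that does that arithmetic. It assumes Kubernetes-style quantities, handles only CPU in cores or millicores and memory with the binary suffixes Ki/Mi/Gi (or plain bytes, integers only), and expects you to read the node's free capacity from the dashboard yourself.

```shell
#!/usr/bin/env bash
# to_millicores QTY: "250m" -> 250, "1.5" -> 1500.
to_millicores() {
  local q="$1"
  if [[ "$q" == *m ]]; then
    echo "${q%m}"
  else
    # awk handles fractional cores; %.0f rounds instead of truncating.
    awk -v v="$q" 'BEGIN { printf "%.0f\n", v * 1000 }'
  fi
}

# to_mib QTY: "2Gi" -> 2048, "512Mi" -> 512, "1048576" (bytes) -> 1. Integers only.
to_mib() {
  local q="$1"
  case "$q" in
    *Gi) echo $(( ${q%Gi} * 1024 )) ;;
    *Mi) echo "${q%Mi}" ;;
    *Ki) echo $(( ${q%Ki} / 1024 )) ;;
    *[!0-9]*|"") echo "unsupported quantity: '$q'" >&2; return 1 ;;
    *) echo $(( q / 1048576 )) ;;
  esac
}

# fits REQ_CPU REQ_MEM FREE_CPU FREE_MEM: exit 0 if the request fits the free capacity.
fits() {
  local rcpu rmem fcpu fmem
  rcpu=$(to_millicores "$1") && rmem=$(to_mib "$2") &&
    fcpu=$(to_millicores "$3") && fmem=$(to_mib "$4") || return 2
  if (( rcpu <= fcpu && rmem <= fmem )); then
    echo "fits: need ${rcpu}m/${rmem}Mi, free ${fcpu}m/${fmem}Mi"
    return 0
  fi
  echo "does not fit: need ${rcpu}m/${rmem}Mi, free ${fcpu}m/${fmem}Mi"
  return 1
}
```

For example, `fits 500m 768Mi 350m 4Gi` prints `does not fit: need 500m/768Mi, free 350m/4096Mi`: memory is plentiful but CPU is short, so lowering the CPU request may be enough.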
Configuration Errors
Symptoms: Container fails to start or crashes immediately
Common Causes:
- Invalid environment variables
- Missing required configuration
- Incorrect volume mount paths
- Application configuration errors
Solutions:
- Review Container Logs: Check startup logs for error messages
- Validate Configuration: Verify all environment variables and mounts
- Test Locally: Reproduce the configuration with a local container runtime
- Minimal Configuration: Start with a minimal configuration and add settings incrementally
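When validating configuration, it helps if the container fails fast with a readable message instead of crashing somewhere deep in startup. A minimal sketch, assuming your image has bash and an entrypoint script you control; `require_env` and the `APP_*` variable names are hypothetical.

```shell
#!/usr/bin/env bash
# require_env NAME...: report every required variable that is unset or empty.
# Returns 1 if any are missing, so an entrypoint can stop before starting the app.
require_env() {
  local var missing=0
  for var in "$@"; do
    # ${!var} is bash indirect expansion: the value of the variable named in $var.
    if [[ -z "${!var:-}" ]]; then
      echo "config error: $var is unset or empty" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example entrypoint usage (variable names are placeholders):
# require_env APP_DATABASE_URL APP_PORT || exit 1
# exec /app/server
```

Because the messages go to stderr, they show up in the container logs. To reproduce locally with the same variables, `docker run --rm --env-file app.env your-image:tag` can be used, where `app.env` is a placeholder file of `NAME=value` lines.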
Runtime Issues
Container Crashes and Restarts
Symptoms: High restart count; container status cycling between running and "CrashLoopBackOff"
Common Causes:
- Application bugs or exceptions
- Memory limits exceeded (OOMKilled)
- Health check failures
- External dependency failures
Solutions:
- Analyze Logs: Review container logs for error patterns
- Check Resource Usage: Monitor CPU and memory consumption
- Review Health Checks: Ensure health check endpoints are responsive
- Test Dependencies: Verify external services are accessible
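Memory-Specific Issues:
- Symptoms: Container terminated with reason "OOMKilled" (exit code 137) in its status or in pod events; the application itself usually logs nothing, because the kernel stops it with SIGKILL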
- Solutions: Increase memory limits or optimize application memory usage
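To see how close a running container is to its memory limit before it gets OOMKilled, read the cgroup counters from the container console. A minimal sketch, assuming the node uses cgroup v2 with the container's cgroup mounted at `/sys/fs/cgroup`; on cgroup v1 the equivalent files are `/sys/fs/cgroup/memory/memory.usage_in_bytes` and `/sys/fs/cgroup/memory/memory.limit_in_bytes`, which can be passed as arguments. `mem_usage_pct` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# mem_usage_pct [CURRENT_FILE] [MAX_FILE]: print memory usage as a percentage
# of the container's limit. Defaults are the cgroup v2 file locations.
mem_usage_pct() {
  local cur_file="${1:-/sys/fs/cgroup/memory.current}"
  local max_file="${2:-/sys/fs/cgroup/memory.max}"
  if [[ ! -r "$cur_file" || ! -r "$max_file" ]]; then
    echo "memory counters not found (not cgroup v2?)" >&2
    return 2
  fi
  local cur max
  cur=$(<"$cur_file")
  max=$(<"$max_file")
  # cgroup v2 writes the literal string "max" when no limit is set.
  if [[ "$max" == "max" ]]; then
    echo "no memory limit set"
    return 0
  fi
  echo $(( cur * 100 / max ))
}
```

`memory.current` includes page cache, which the kernel can reclaim before killing the process, so treat a high reading as an upper bound; a value that keeps climbing under steady load points to a leak or an undersized limit.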
Performance Issues
Symptoms: Slow response times, high resource utilization
Common Causes:
- Insufficient resource allocation
- Inefficient application code
- Database or external service bottlenecks
- Network latency issues
Solutions:
- Monitor Metrics: Use performance metrics to identify bottlenecks
- Scale Resources: Increase CPU/memory allocation as needed
- Profile Application: Use application profiling tools
- Optimize Queries: Review and optimize database queries
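Before scaling resources, it is worth putting a number on "slow". The sketch below times repeated requests using curl's `--write-out` variable `time_total` and reports the median and 95th percentile with the nearest-rank method. `sample_latency` and `percentiles` are hypothetical helper names, and the URL in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# sample_latency URL [N]: print the total time in seconds of N sequential requests.
sample_latency() {
  local url="$1" n="${2:-20}" i
  for (( i = 0; i < n; i++ )); do
    curl -o /dev/null -s -w '%{time_total}\n' "$url"
  done
}

# percentiles: read one number per line on stdin and print p50 and p95
# using the nearest-rank method (rank = ceil(p/100 * count)).
percentiles() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      r50 = int((NR * 50 + 99) / 100)
      r95 = int((NR * 95 + 99) / 100)
      printf "p50=%s p95=%s\n", v[r50], v[r95]
    }'
}
```

Usage: `sample_latency http://localhost:8080/health 50 | percentiles`. A high p95 with a normal p50 often points at intermittent causes such as a slow dependency, while a uniformly high p50 suggests the service itself is under-provisioned.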
Network Connectivity Issues
Symptoms: Unable to access application, connection timeouts
Common Causes:
- Incorrect port configuration
- Network policy restrictions
- Load balancer configuration issues
- DNS resolution problems
Solutions:
- Verify Port Configuration: Check that the exposed port matches the port the application listens on
- Test Internal Connectivity: Use the console to test from within the cluster
- Check Network Policies: Review network security policies
- Validate DNS: Ensure service discovery is working correctly
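Testing DNS and the TCP port separately tells you which layer is failing. A minimal sketch to run in the container console, assuming the image has bash (for its `/dev/tcp` redirection), `getent`, and coreutils `timeout`; minimal images may lack all three. `check_tcp` and `diagnose_endpoint` are hypothetical names.

```shell
#!/usr/bin/env bash
# check_tcp HOST PORT [TIMEOUT_SECONDS]: succeed if a TCP connection opens.
check_tcp() {
  local host="$1" port="$2" secs="${3:-3}"
  # Host and port are passed as arguments, not spliced into the command string.
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "tcp ok: $host:$port"
  else
    echo "tcp failed: $host:$port (refused, filtered by a network policy, or timed out)"
    return 1
  fi
}

# diagnose_endpoint HOST PORT: check DNS first, then the TCP port.
diagnose_endpoint() {
  local host="$1" port="$2" addr
  addr=$(getent hosts "$host" | awk 'NR == 1 { print $1 }')
  if [[ -z "$addr" ]]; then
    echo "dns failed: cannot resolve $host"
    return 1
  fi
  echo "dns ok: $host -> $addr"
  check_tcp "$host" "$port"
}
```

Usage: `diagnose_endpoint service-name 8080`. If DNS succeeds but TCP fails, look at port configuration and network policies rather than service discovery.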
Storage Issues
Volume Mount Failures
Symptoms: Container fails to start with mount errors
Common Causes:
- Invalid mount paths
- Volume doesn't exist
- Permission issues
- Storage class problems
Solutions:
- Check Mount Paths: Verify mount paths are valid and don't conflict
- Verify Volumes: Ensure referenced volumes exist
- Review Permissions: Check file system permissions
- Storage Class: Verify storage class is available and accessible
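When the container does start but the application cannot use its volume, confirm that the mount landed where the application expects it and that the application's user can write to it. A minimal sketch for Linux, assuming `/proc/mounts` is readable; `check_mount` is a hypothetical helper, and paths containing spaces are not handled because `/proc/mounts` escapes them.

```shell
#!/usr/bin/env bash
# check_mount PATH: report whether PATH exists, is a mount point, and is
# writable by the current user. Paths containing spaces are not handled.
check_mount() {
  local p="${1%/}"
  [[ -n "$p" ]] || p="/"
  if [[ ! -d "$p" ]]; then
    echo "missing: $p"
    return 1
  fi
  # Field 2 of /proc/mounts is the mount point.
  if awk -v p="$p" '$2 == p { found = 1 } END { exit !found }' /proc/mounts; then
    echo "mounted: $p"
  else
    echo "not a mount point: $p (the volume may not be attached here)"
  fi
  if [[ -w "$p" ]]; then
    echo "writable by uid $(id -u)"
  else
    echo "read-only for uid $(id -u)"
  fi
}
```

A path that exists but is "not a mount point" usually means the application is writing into the container's own filesystem, which is lost on restart.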
Persistent Volume Issues
Symptoms: Data loss, volume not available
Common Causes:
- Volume not properly configured as persistent
- Storage backend failures
- Incorrect volume size
- Backup/restore issues
Solutions:
- Verify PVC Configuration: Check persistent volume claim settings
- Monitor Storage Usage: Check available storage space
- Test Backup/Restore: Verify backup procedures work correctly
- Check Storage Backend: Ensure storage infrastructure is healthy
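For monitoring storage usage, a threshold check is easier to act on than raw `df` output. A minimal sketch using POSIX `df -P`, run from the container console against the volume's mount path; `check_usage`, the 85% default, and the `/data` path in the usage line are assumptions to adjust.

```shell
#!/usr/bin/env bash
# check_usage PATH [THRESHOLD_PERCENT]: warn when the filesystem that holds
# PATH is at or above the threshold. Exit 1 on warning, 2 if df fails.
check_usage() {
  local p="$1" limit="${2:-85}" used
  # -P forces one line per filesystem; field 5 is the capacity, e.g. "42%".
  used=$(df -P "$p" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [[ ! "$used" =~ ^[0-9]+$ ]]; then
    echo "cannot read usage for $p" >&2
    return 2
  fi
  if (( used >= limit )); then
    echo "WARNING: $p is ${used}% full (threshold ${limit}%)"
    return 1
  fi
  echo "ok: $p is ${used}% full"
}
```

Usage: `check_usage /data 90`. The distinct exit codes let a scheduled job tell "nearly full" apart from "volume not mounted at all".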
Security Issues
Secret Access Problems
Symptoms: Application can't access required secrets
Common Causes:
- Secret doesn't exist
- Incorrect secret name or key
- Permission issues
- Secret not mounted correctly
Solutions:
- Verify Secret Exists: Check that the secret is created and accessible
- Check Mount Configuration: Verify the secret mount configuration
- Review Permissions: If the application reads secrets through the platform API, ensure its service account has access
- Test Secret Access: Use the console to verify the secret is available
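When testing secret access from the console, check the mounted files without printing their values to the terminal or logs. This assumes the secret is mounted as a volume, where each key appears as a file named after the key (the Kubernetes convention). The `/etc/secrets` directory and `api-key` key in the usage line are placeholders, and `check_secret` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# check_secret DIR KEY: verify that DIR/KEY exists, is readable and non-empty,
# and report its size only, never its contents.
check_secret() {
  local dir="$1" key="$2"
  local f="$dir/$key"
  if [[ ! -e "$f" ]]; then
    # List visible key names; hidden entries such as ..data are skipped by ls.
    echo "missing: $f (keys present: $(ls "$dir" 2>/dev/null | tr '\n' ' '))"
    return 1
  fi
  if [[ ! -r "$f" ]]; then
    echo "not readable by uid $(id -u): $f"
    return 1
  fi
  if [[ ! -s "$f" ]]; then
    echo "empty: $f"
    return 1
  fi
  echo "ok: $f ($(wc -c < "$f" | tr -d ' ') bytes)"
}
```

Usage: `check_secret /etc/secrets api-key`. Listing the keys that are present makes a name mismatch such as `api_key` versus `api-key` obvious.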
Permission Denied Errors
Symptoms: Application fails with permission errors
Common Causes:
- Security context restrictions
- File system permissions
- Service account permissions
- Network policy restrictions
Solutions:
- Review Security Context: Check security context configuration
- Check File Permissions: Verify file and directory permissions
- Service Account: Ensure service account has required permissions
- Network Policies: Review network access policies
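Many file-permission errors come down to the process running as a different user than the one that owns the file. This sketch prints both side by side; it assumes GNU or BusyBox `stat` (for `-c`), and `explain_access` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# explain_access PATH: show who the process runs as, who owns PATH, and
# which kinds of access the current user actually has.
explain_access() {
  local p="$1" r=no w=no x=no
  if [[ ! -e "$p" ]]; then
    echo "missing: $p"
    return 1
  fi
  echo "process: $(id)"
  echo "path: $p owner=$(stat -c '%u:%g' "$p") mode=$(stat -c '%a' "$p")"
  [[ -r "$p" ]] && r=yes
  [[ -w "$p" ]] && w=yes
  [[ -x "$p" ]] && x=yes
  echo "access for uid $(id -u): read=$r write=$w execute=$x"
}
```

If the owner differs from the process uid and the mode grants "other" users no access, aligning the two, for example through the `runAsUser` and `fsGroup` fields of the security context, is usually safer than loosening the mode.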
Monitoring and Logging Issues
Missing Logs
Symptoms: No logs appearing in the dashboard
Common Causes:
- Application not writing to stdout/stderr
- Log level configuration issues
- Logging system problems
- Container not running
Solutions:
- Check Application Logging: Ensure app writes to stdout/stderr
- Verify Log Levels: Check log level configuration
- Container Status: Ensure container is running
- Test Log Output: Use the console to verify that log output is generated
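To test log output, write a marker line to the stream the platform collects. This assumes the runtime captures the stdout of PID 1 in the container, as Docker and Kubernetes do, so writing to `/proc/1/fd/1` from a console session should surface in the logs view; `emit_log_marker` is a hypothetical helper, and the write may be refused if the console runs as a different user than PID 1.

```shell
#!/usr/bin/env bash
# emit_log_marker [TARGET]: write a unique line to TARGET (by default the
# stdout of PID 1) and echo it so you can search for it in the dashboard.
emit_log_marker() {
  local target="${1:-/proc/1/fd/1}"
  local marker
  marker="log-test-$(date +%s)-$$"
  if ! echo "$marker" > "$target"; then
    echo "cannot write to $target (different user than PID 1?)" >&2
    return 1
  fi
  echo "$marker"
}
```

If the marker appears but application output does not, the application is probably writing to a file. A common remedy, used by the official nginx image, is to symlink the log file to a standard stream when building the image, e.g. `ln -sf /dev/stdout /var/log/app.log`.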
Metrics Not Available
Symptoms: Performance metrics not showing
Common Causes:
- Metrics collection not enabled
- Application not exposing metrics
- Network connectivity issues
- Monitoring system problems
Solutions:
- Enable Metrics Collection: Ensure metrics collection is enabled for the container
- Check Metrics Endpoints: Verify application exposes metrics
- Network Connectivity: Test connectivity to metrics endpoints
- Refresh Dashboard: Use the refresh button to update metrics
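To check a metrics endpoint from inside the container, fetch it and count the samples it returns. This sketch assumes the application exposes metrics in the Prometheus text format, which this guide does not specify; the `http://localhost:8080/metrics` URL in the usage line and the helper names are placeholders.

```shell
#!/usr/bin/env bash
# count_samples: read Prometheus text-format metrics on stdin and count the
# sample lines (everything except comments and blank lines).
count_samples() {
  awk '!/^#/ && NF > 0 { n++ } END { print n + 0 }'
}

# check_metrics URL: fail if the endpoint is unreachable, returns an HTTP
# error, or returns no samples.
check_metrics() {
  local url="$1" body n
  if ! body=$(curl -fsS --max-time 5 "$url"); then
    echo "metrics endpoint unreachable or returned an error: $url"
    return 1
  fi
  n=$(printf '%s\n' "$body" | count_samples)
  if (( n == 0 )); then
    echo "endpoint responded but exposed no samples: $url"
    return 1
  fi
  echo "ok: $n samples from $url"
}
```

Usage: `check_metrics http://localhost:8080/metrics`. An endpoint that answers with zero samples points at the application's instrumentation rather than at the network.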
Platform-Specific Issues
Dashboard Access Problems
Symptoms: Cannot access container management interface
Common Causes:
- Authentication issues
- Browser compatibility
- Network connectivity
- Platform maintenance
Solutions:
- Refresh Session: Log out and log back in
- Clear Browser Cache: Clear browser cache and cookies
- Try Different Browser: Test with a different web browser
- Check Platform Status: Verify platform is operational
Console Connection Issues
Symptoms: Cannot connect to container console
Common Causes:
- Container not running
- Shell not available in container
- Network connectivity issues
- Browser or proxy not supporting WebSocket connections
Solutions:
- Check Container Status: Ensure container is running
- Verify Shell: Ensure bash/sh is available in container
- Browser Compatibility: Check WebSocket support
- Network Configuration: Review firewall and proxy settings
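If the console cannot start a shell, check whether the image ships one at all; scratch-based and distroless images typically do not. The sketch below inspects the image's filesystem on a machine with Docker, using `docker create` and `docker export` so nothing in the image runs. `your-image:tag` is a placeholder and `list_shells` / `image_shells` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# list_shells: read a file listing (one path per line, as printed by tar -t)
# and print common shell binaries found under bin/ or usr/bin/.
list_shells() {
  if ! grep -E '^(\./)?(usr/)?bin/(bash|sh|ash|dash|zsh)$'; then
    echo "no shell found in image" >&2
    return 1
  fi
}

# image_shells IMAGE: list shells inside IMAGE without running it.
image_shells() {
  local cid status
  cid=$(docker create "$1") || return 2
  docker export "$cid" | tar -tf - | list_shells
  status=$?
  docker rm "$cid" >/dev/null
  return "$status"
}
```

Usage: `image_shells your-image:tag`. Inspecting the exported filesystem works even when the image has no shell, which is exactly the case where trying to run `sh` inside it would fail.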
Diagnostic Tools and Commands
Container Inspection
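# Check container processes
ps aux
# Check disk usage
df -h
# Check memory usage (free reads /proc/meminfo, which usually shows node memory, not the container limit)
free -h
# Check the container's own memory usage and limit (cgroup v2)
cat /sys/fs/cgroup/memory.current /sys/fs/cgroup/memory.max
# Check network configuration
ip addr show
# Test network connectivity (ping uses ICMP, which may be blocked; Service IPs usually do not answer ping)
ping google.com
nslookup service-name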
Application Debugging
# Check application configuration
env | grep APP_
# Test application endpoints
curl http://localhost:8080/health
# Check application logs
tail -f /var/log/app.log
# Check file permissions
ls -la /app/
Resource Monitoring
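# Monitor CPU usage (the header totals usually reflect the whole node)
top
# Monitor memory usage (free shows node memory; the cgroup v2 counter shows the container's usage)
watch -n 1 free -h
watch -n 1 cat /sys/fs/cgroup/memory.current
# Monitor disk I/O (iostat is part of the sysstat package)
iostat 1
# Monitor network traffic (netstat needs net-tools; ip -s link is the iproute2 equivalent)
netstat -i
ip -s link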
When to Contact Support
Contact platform support when:
- Platform Issues: Dashboard or platform functionality problems
- Persistent Problems: Issues that persist despite troubleshooting
- Security Concerns: Suspected security issues or breaches
- Performance Degradation: Unexplained performance problems
- Data Loss: Issues with data persistence or backups
Information to Provide
When contacting support, include:
- Container Details: Name, ID, and deployment configuration
- Error Messages: Exact error messages and logs
- Timeline: When the issue started and any recent changes
- Steps Taken: Troubleshooting steps already attempted
- Environment: Details about your deployment environment
Emergency Procedures
For critical issues:
- Immediate Response: Stop affected containers if necessary
- Isolate Impact: Prevent issue from spreading
- Document Everything: Capture logs and system state
- Contact Support: Escalate to platform support immediately
- Communication: Notify stakeholders of impact and status
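To document the system state and gather the details support will ask for, a small script can snapshot the container into one file to attach to a support request. A minimal sketch, run from the container console; the command list is an assumption to adapt, tools that are not installed are skipped, and `env` is deliberately left out because environment variables often hold secrets. `collect_diagnostics` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# collect_diagnostics [OUTFILE]: run a fixed list of read-only commands and
# save their output, each under a "=== command" header, to OUTFILE.
collect_diagnostics() {
  local out="${1:-diagnostics-$(date -u +%Y%m%dT%H%M%SZ).txt}"
  local cmds=(
    "date -u" "uname -a" "id" "df -h" "ps aux" "ip addr show"
    "cat /sys/fs/cgroup/memory.current /sys/fs/cgroup/memory.max"
  )
  local cmd
  {
    for cmd in "${cmds[@]}"; do
      echo "=== $cmd"
      # Only run commands whose program exists in this image.
      if command -v "${cmd%% *}" >/dev/null 2>&1; then
        eval "$cmd" 2>&1
      else
        echo "(not installed)"
      fi
    done
  } > "$out"
  echo "$out"
}
```

The function prints the path of the file it wrote, so `collect_diagnostics` on its own produces a timestamped file you can download and attach alongside the exact error messages and timeline.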