Aria Automation fails and cannot redeploy pods

Symptoms:

When logging in to Service Broker, the service catalog is empty and the UI fails with an “internal error”.

Logging in via SSH and getting the latest events in the prelude namespace reveals that Kubernetes is trying to redeploy the containers but cannot find the images in the local Docker image library.

kubectl get events -n prelude

The deployment events show that the images needed to redeploy the containers are not present in the local Docker image library and have a pull policy of “Never”.
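The events from the command above can be narrowed to the affected objects. This is a minimal sketch, assuming the default column layout of recent `kubectl get events` output (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE) and that kubelet reports the missing image with the reason `ErrImageNeverPull`. The function name `never_pull_objects` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Aria Automation): print the objects whose
# events report ErrImageNeverPull, i.e. the image is missing from the local
# image library and imagePullPolicy is Never.
# Assumes the default `kubectl get events` columns:
#   LAST SEEN  TYPE  REASON  OBJECT  MESSAGE
never_pull_objects() {
  # Skip the header row, match the REASON column, print OBJECT, de-duplicate.
  awk 'NR > 1 && $3 == "ErrImageNeverPull" { print $4 }' | sort -u
}

# Usage on the Aria Automation node:
#   kubectl get events -n prelude | never_pull_objects
```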

service docker status

If we look at the latest Docker events, we can see that downloading the missing Docker images fails.
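To see exactly which images are missing, the images referenced by the pods in the prelude namespace can be compared with the images in the local Docker library. This is a minimal sketch; the helper `missing_images` and the temporary file paths are hypothetical, and image references pinned by digest rather than by tag will not match the `docker images` format used here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print image references listed in file $1 (required)
# that do not appear in file $2 (present locally). One image per line.
missing_images() {
  local required=$1 present=$2
  # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
  grep -Fxv -f "$present" "$required" | sort -u
}

# Usage on the Aria Automation node (sketch):
#   kubectl get pods -n prelude \
#     -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' \
#     | sort -u > /tmp/required.txt
#   docker images --format '{{.Repository}}:{{.Tag}}' | sort -u > /tmp/present.txt
#   missing_images /tmp/required.txt /tmp/present.txt
```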

journalctl -xu kubelet.service 

The kubelet service logs show that kubelet deletes the Docker images because of disk pressure on the Aria Automation node.
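When the kubelet log is long, it can be filtered for the relevant entries. This is a rough sketch: the helper `kubelet_disk_lines` is hypothetical, and the search patterns are assumptions, because the exact kubelet messages differ between Kubernetes versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep kubelet log lines that mention image garbage
# collection, image removal, eviction or disk pressure.
# The patterns are assumptions; exact kubelet wording varies by version.
kubelet_disk_lines() {
  grep -Ei 'image.?garbage|garbage collect|remov(e|ing) image|free bytes|disk ?pressure|evict' || true
}

# Usage on the Aria Automation node:
#   journalctl -xu kubelet.service --no-pager | kubelet_disk_lines
```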

Listing the available disk space on /data, where /var/lib/docker is mounted, shows that 80% of the disk capacity is used, even though the node itself is not reporting disk pressure.

df -h /var/lib/docker
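The same check can be scripted so that it reports usage as a number and compares it to a threshold. This is a minimal sketch; `disk_use_pct` and `check_image_fs` are hypothetical helpers, and the default threshold of 85% is kubelet's documented default `imageGCHighThresholdPercent`, which may not be the value configured on the Aria Automation node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Use% of the filesystem holding PATH as a
# bare integer. df -P (POSIX format) keeps each filesystem on one line.
disk_use_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical check: warn (and return 1) when usage is at or above the
# threshold. 85 is kubelet's documented default imageGCHighThresholdPercent;
# the value actually configured on the node is an assumption to verify.
check_image_fs() {
  local path=${1:-/var/lib/docker} threshold=${2:-85} pct
  pct=$(disk_use_pct "$path")
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN: $path is ${pct}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: $path is ${pct}% full (threshold ${threshold}%)"
}

# Usage on the Aria Automation node:
#   check_image_fs /var/lib/docker
```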

When a Kubernetes node is under disk pressure, kubelet first tries to reclaim space by deleting unused container images, and only evicts pods if that does not free enough space.

https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#reclaim-node-resources

When Aria Automation is deployed, its images are downloaded to the local image library. If kubelet later deletes these images because of disk pressure, they cannot be downloaded again, and the deployment of new containers fails.

Solution:

Extend the disk that is running out of space in vCenter, then run:

vra-cli disk-mgr resize
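To confirm that the resize took effect, the size of the filesystem backing /data can be compared before and after running the command. This is a minimal sketch; `fs_size_kb` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total size in KiB of the filesystem holding PATH.
# -P gives one line per filesystem, -k forces 1024-byte blocks.
fs_size_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $2 }'
}

# Usage on the Aria Automation node (sketch):
#   before=$(fs_size_kb /data)
#   vra-cli disk-mgr resize
#   after=$(fs_size_kb /data)
#   echo "grew by $(( (after - before) / 1024 )) MiB"
```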

Redeploy Aria Automation by logging in via SSH as root and running:

/opt/scripts/deploy.sh
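After deploy.sh finishes, the pods in the prelude namespace can be checked for any that are still not running. This is a minimal sketch, assuming the default `kubectl get pods --no-headers` columns (NAME, READY, STATUS, RESTARTS, AGE); `unhealthy_pods` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods that are not yet healthy after deploy.sh.
# Reads `kubectl get pods --no-headers` output (NAME READY STATUS RESTARTS AGE).
# A pod counts as healthy if STATUS is Completed, or STATUS is Running and
# every container is ready (READY such as 2/2).
unhealthy_pods() {
  awk '{
    split($2, r, "/")
    if ($3 == "Completed") next
    if ($3 == "Running" && r[1] == r[2]) next
    print $1, $2, $3
  }'
}

# Usage on the Aria Automation node:
#   kubectl get pods -n prelude --no-headers | unhealthy_pods
```

An empty result means every pod is either running with all containers ready or has completed.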