If you’ve used Kubernetes (k8s), you’ve probably bumped into the dreaded CrashLoopBackOff. A CrashLoopBackOff can result from several kinds of k8s misconfiguration (being unable to connect to persistent volumes, init-container misconfiguration, etc.). We aren’t going to cover how to configure k8s properly in this article; instead, we will focus on the harder problem of debugging your code or, even worse, someone else’s code 😱
Here is the output from kubectl describe pod for a CrashLoopBackOff:
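To get this output for your own pod, something like the following should work. It is a minimal sketch: `show_crashloop` is a hypothetical helper name, and the pod name and namespace you pass in are placeholders for your own.

```shell
# Hypothetical helper: gather the usual evidence for a restarting pod.
# $1 = pod name, $2 = namespace (optional); both are placeholders.
show_crashloop() {
  pod="$1"
  ns="${2:-default}"
  # STATUS column shows CrashLoopBackOff along with the restart count
  kubectl get pod "$pod" -n "$ns"
  # State, Last State, Exit Code and Events for each container
  kubectl describe pod "$pod" -n "$ns"
  # Logs from the previous (crashed) container instance
  kubectl logs "$pod" -n "$ns" --previous
}
```

For example, `show_crashloop my-pod my-namespace`. The `--previous` flag asks for the logs of the last crashed instance rather than the one that is currently starting.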
Two common problems when starting a container are OCI runtime create failed (which means you are referencing a binary or script that doesn’t exist on the container) and container “Completed” or “Error” which both mean that the code executing on the container failed to run a service and stay running.
Here’s an example of an OCI runtime error, trying to execute: “hello crashloop”:
K8s gives you the exit status of the process in the container when you look at a pod using kubectl or k9s. Exit statuses from 1 to 125 are set by the program itself, and the man page for a unix command usually provides more details about its exit codes. Statuses above 128 mean the process was terminated by a signal: the code is 128 plus the signal number. Exit code 137 (128 + 9, where 9 is SIGKILL) most often means that the container hit the memory limit for your pod and was killed.
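The arithmetic behind these codes can be sketched in a few lines of shell. `explain_exit_code` is a hypothetical helper, and the wording of each message is our own rough summary of the usual shell conventions, not something k8s prints.

```shell
# Hypothetical helper: give a rough meaning for a container exit code.
explain_exit_code() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "exited normally"
  elif [ "$code" -le 125 ]; then
    echo "application error $code (check the program's documentation)"
  elif [ "$code" -eq 126 ]; then
    echo "command found but not executable"
  elif [ "$code" -eq 127 ]; then
    echo "command not found"
  elif [ "$code" -gt 128 ]; then
    # Codes above 128 encode a signal: 137 - 128 = 9 (SIGKILL), 143 - 128 = 15 (SIGTERM)
    echo "killed by signal $((code - 128))"
  else
    echo "unclassified"
  fi
}
```

For example, `explain_exit_code 137` prints `killed by signal 9`, which is SIGKILL.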
Here is the output from kubectl describe pod, showing the container exit code:
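If you only want the exit code rather than the full describe output, a jsonpath query against the pod status is one option. This is a sketch: `last_exit_codes` is a hypothetical helper name, and it assumes the container has already terminated at least once so that `lastState.terminated` is populated.

```shell
# Hypothetical helper: print name, reason and exit code of each container's
# last termination. $1 = pod name, $2 = namespace (optional).
last_exit_codes() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

Each output line holds the container name, the termination reason (for example OOMKilled or Error), and the exit code, separated by tabs.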
Not all containers are created equal.
Docker allows you to define an Entrypoint and a Cmd, which you can mix and match in a Dockerfile. Entrypoint is the executable, and Cmd holds the arguments passed to the Entrypoint. The Dockerfile schema is quite lenient and allows users to set Cmd without Entrypoint, in which case the first element of Cmd is the executable to run.
Note: k8s uses a different naming convention for Docker Entrypoint and Cmd. The Kubernetes command field corresponds to Docker Entrypoint, and the Kubernetes args field corresponds to Docker Cmd.
Description | The command run by the container | Arguments passed to the command |
---|---|---|
Docker field name | Entrypoint | Cmd |
Kubernetes field name | command | args |
There are a few tricks to understanding how the container you’re working with starts up. In order to get the startup command when you’re dealing with someone else’s container, we need to know the intended Docker Entrypoint and Cmd of the Docker image. If you have the Dockerfile that created the Docker image, then you likely already know the Entrypoint and Cmd, unless you aren’t defining them and inheriting from a base image that has them set.
When you’re dealing with an off-the-shelf container, using someone else’s container without its Dockerfile, or inheriting from a base image whose Dockerfile you don’t have, you can use the following steps to get the values you need. First, pull the image locally using docker pull, then inspect the image to get the Entrypoint and Cmd:
- docker pull <image>
- docker inspect <image>
Here we use jq to filter the JSON response from docker inspect:
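A minimal sketch of that filter, assuming jq is installed on your workstation. `image_startup` is a hypothetical helper name and the image reference is whatever you pass in.

```shell
# Hypothetical helper: print the Entrypoint and Cmd baked into an image.
# $1 = image reference (placeholder for your own).
image_startup() {
  # docker inspect returns a JSON array; the startup settings live under .Config
  docker pull "$1" >/dev/null &&
    docker inspect "$1" | jq '.[0].Config | {Entrypoint, Cmd}'
}
```

A null Entrypoint in the result means the image relies on Cmd alone, so the first element of Cmd is the executable.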
The Dreaded CrashLoopBackOff
Now that you have all that background, let’s get to debugging the CrashLoopBackOff.
In order to understand what’s happening, it’s important to be able to inspect the container inside of k8s so the application has all the environment variables and dependent services. Updating the deployment and setting the container Entrypoint or k8s command temporarily to tail -f /dev/null or sleep infinity will give you an opportunity to debug why the service doesn’t stay running.
Here’s how to configure k8s to override the container Entrypoint:
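One way to apply the override without editing the manifest by hand is a JSON patch. This is a sketch under assumptions: `hold_open` is a hypothetical helper name, the container to debug is assumed to be the first one in the pod spec, and `sleep infinity` is assumed to be supported by the image (use `tail -f /dev/null` otherwise).

```shell
# Hypothetical helper: replace the first container's command with a no-op
# so the pod stays up for debugging.
# $1 = deployment name, $2 = namespace (optional).
hold_open() {
  # "add" on an existing field replaces it; args is cleared so that no
  # leftover arguments are passed to sleep
  kubectl patch deployment "$1" -n "${2:-default}" --type=json -p '[
    {"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "infinity"]},
    {"op": "add", "path": "/spec/template/spec/containers/0/args", "value": []}
  ]'
}
```

When you are done, `kubectl rollout undo deployment/<name>` should bring back the previous command. If the deployment defines liveness probes, they may still restart the container while you debug.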
Here’s the configuration in Release:
You can now use kubectl or k9s to exec into the container and take a look around. Using the Entrypoint and Cmd you discovered earlier, you can execute the intended startup command and see how the application is failing.
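With kubectl, the exec step might look like the sketch below. `pod_shell` is a hypothetical helper name; it starts `sh` rather than bash on the assumption that minimal images may not ship bash.

```shell
# Hypothetical helper: open an interactive shell in a running pod.
# $1 = pod name, $2 = namespace (optional), $3 = container name (optional).
pod_shell() {
  if [ -n "${3:-}" ]; then
    kubectl exec -it "$1" -n "${2:-default}" -c "$3" -- sh
  else
    kubectl exec -it "$1" -n "${2:-default}" -- sh
  fi
}
```

Once inside, run the Entrypoint and Cmd you found earlier by hand and watch how the process fails.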
Depending on the container you’re running, it may be missing many of the tools needed to debug your problem, such as curl, lsof, and vim. If it’s someone else’s code, you probably don’t know which Linux distribution was used to create the image. We typically try all of the common package managers until we find the right one. Most containers these days use Alpine Linux (apk package manager) or a Debian- or Ubuntu-based image (apt-get package manager). In some cases we’ve seen CentOS and Fedora, which both use the yum package manager.
One of the following commands should work depending on the operating system:
- apk
- apt-get
- yum
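Rather than trying each one by hand, you can probe for them in a loop. A minimal sketch, with `detect_pkg_manager` as a hypothetical helper name:

```shell
# Hypothetical helper: print the first package manager found on PATH.
detect_pkg_manager() {
  for pm in apk apt-get yum; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  echo "no known package manager found" >&2
  return 1
}
```

It only checks the three package managers listed above, so images built on other distributions will fall through to the error branch.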
Dockerfile maintainers often remove the cache from the package manager to shrink the size of the image, so you may also need to run one of the following:
- apk update
- apt-get update
- yum makecache
Now you need to add the necessary tools to help with debugging. Depending on the package manager you found, use one of the following commands to add useful debugging tools:
- apt-get install -y curl vim procps inetutils-tools net-tools lsof
- apk add curl vim procps net-tools lsof
- yum install curl vim procps lsof
At this point, it’s up to you to figure out the problem. You can edit files using vim to tweak the container until you understand what’s going on. If you lose track of the files you’ve touched on the container, you can always kill the pod, and the container will restart without your changes. Always remember to write down the steps taken to get the container working. You’ll want to use your notes to alter the Dockerfile or add commands to the container startup scripts.
Debugging Your Containers
We have created a simple script to get all of the debugging tools, as long as you are working with a container that has curl pre-installed:
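That script is not reproduced here. As a stand-in, the sketch below prints the refresh-and-install command for a given package manager, using the package lists from the commands above. `debug_tools_cmd` is a hypothetical name, and adding `-y` to the yum line is our assumption so that it can run unattended.

```shell
# Hypothetical sketch: print the refresh + install command for a package manager.
# $1 = apk, apt-get or yum.
debug_tools_cmd() {
  case "$1" in
    apk)     echo "apk update && apk add curl vim procps net-tools lsof" ;;
    apt-get) echo "apt-get update && apt-get install -y curl vim procps inetutils-tools net-tools lsof" ;;
    yum)     echo "yum makecache && yum install -y curl vim procps lsof" ;;
    *)       echo "unsupported package manager: $1" >&2; return 1 ;;
  esac
}
```

Inside the container you could run the result with `sh -c "$(debug_tools_cmd apk)"`. Printing the command first keeps it easy to review before anything is installed.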
Conclusion
In this article, we’ve learnt how to spot and investigate the CrashLoopBackOff errors in containers. We walked you through how to inspect and investigate the container image itself. We’ve listed and shown some tools that we use to spot problems and investigate issues. We got several useful and basic tools installed on the image, hopefully regardless of base image. With these steps in mind and all the tools ready at your disposal, go forth and fix all the things!
Let’s say we have deployed our .NET application into a pod that runs in Kubernetes (or a Docker container), and our users report some sort of slowness in that application. How do we find the problem? Let’s find out, step by step!
Log in to a terminal in your container: If you use Kubernetes, make sure your kubeconfig file is ready and deployed in ~/.kube/config. If you use WSL with Docker Desktop, the ~/.kube/config file is shared between Windows and WSL.
List the pods with the below command:
Then find the relevant container from the list and log in to it with:
If you don’t run Kubernetes but just Docker, use:
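The three steps above could look like the sketch below. The helper names are hypothetical, the pod and container names are placeholders, and `/bin/bash` is assumed to exist in the image (fall back to `sh` if it does not).

```shell
# Hypothetical helpers for getting a terminal in the application container.

# List pods in a namespace ($1, optional)
list_pods() { kubectl get pods -n "${1:-default}"; }

# Open bash in a pod: $1 = pod name, $2 = namespace (optional)
pod_bash() { kubectl exec -it "$1" -n "${2:-default}" -- /bin/bash; }

# Docker only: $1 = container name or ID (see docker ps)
container_bash() { docker exec -it "$1" /bin/bash; }
```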
Download the .NET Core SDK: We need diagnostic tools, and installing them requires the .NET SDK, since production containers typically don’t ship with these tools (unless you use a sidecar). Get the relevant SDK from https://dotnet.microsoft.com/download/dotnet-core.
You could use a command like the one below in a Linux shell (this is for .NET Core 3.1; use a different link for .NET 5):
Extract and install the .NET Core SDK:
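A minimal sketch of the download and extract steps. `install_dotnet_sdk` is a hypothetical helper, the download URL is deliberately left as a parameter (copy the Linux tar.gz link for your version from the page above), and `$HOME/dotnet` as the target directory is an assumption. It also assumes curl and tar are available in the container.

```shell
# Hypothetical helper: download an SDK tarball and unpack it.
# $1 = download URL of the Linux .tar.gz, $2 = target directory (optional).
install_dotnet_sdk() {
  url="$1"
  dest="${2:-$HOME/dotnet}"
  mkdir -p "$dest" &&
    curl -fsSL "$url" -o /tmp/dotnet-sdk.tar.gz &&
    tar -xzf /tmp/dotnet-sdk.tar.gz -C "$dest" &&
    export DOTNET_ROOT="$dest" PATH="$PATH:$dest"
}
```

Call it directly in your shell session, not in a subshell, so that the exported `DOTNET_ROOT` and `PATH` remain set afterwards.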
Switch to the dotnet folder:
Although we added the dotnet directory to the path, the dotnet runtime binary will likely take priority, so we need to switch to the directory into which we extracted the archive:
Install the tools:
Add the tools directory to the path:
In order to access the tools, we add the tools directory to the path:
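These steps might be sketched as follows. `install_diag_tools` is a hypothetical helper; it assumes the SDK was unpacked to `$HOME/dotnet` and that the tools wanted are dotnet-counters and dotnet-trace. Instead of changing directory, it calls the SDK's `dotnet` by explicit path, which avoids picking up the runtime-only binary in the same way.

```shell
# Hypothetical helper: install the diagnostic tools using the unpacked SDK.
# $1 = directory the SDK was extracted to (optional).
install_diag_tools() {
  sdk_dir="${1:-$HOME/dotnet}"
  "$sdk_dir/dotnet" tool install --global dotnet-counters || return 1
  "$sdk_dir/dotnet" tool install --global dotnet-trace || return 1
  # Global tools are installed under ~/.dotnet/tools by default
  export PATH="$PATH:$HOME/.dotnet/tools"
}
```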
Observe the root process:
The default process runs with process id 1 in the container so:
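Assuming the tool in use here is dotnet-counters, monitoring process 1 could look like this sketch (`watch_counters` is a hypothetical helper name):

```shell
# Hypothetical helper: live-monitor runtime counters of a .NET process.
# $1 = process ID (defaults to 1, the container's main process).
watch_counters() {
  dotnet-counters monitor --process-id "${1:-1}"
}
```

If the application is not running as process 1, `dotnet-counters ps` lists the .NET processes that can be monitored.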
You see something like below (arrows added by me):
Here we observe several performance counters. We can look at the CPU percentage to find out whether this is a CPU-bound problem. If the number sits at around 12, 15, or 25 without fluctuating much, be careful: this likely means you are using 100% of a single core, and because the machine has many cores, the tool only shows a fraction such as 12 (100% / 8 cores = 12.5%). Also, as of now there is no built-in performance counter showing total private memory consumption including the unmanaged parts.
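To know which number to watch for on your machine, divide 100 by the core count. A small sketch, assuming the CPU counter is normalized across all cores as described above; `single_core_pct` is a hypothetical helper name.

```shell
# Hypothetical helper: the CPU % reading that corresponds to one saturated core.
# $1 = number of cores visible to the process.
single_core_pct() {
  awk -v n="$1" 'BEGIN { printf "%.1f\n", 100 / n }'
}
```

For example, `single_core_pct 8` prints `12.5` and `single_core_pct 4` prints `25.0`. Inside the container, `single_core_pct "$(nproc)"` uses the detected core count.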
Collect a perfview trace:
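Assuming dotnet-trace is the collection tool, a sketch of this step follows. `collect_trace` is a hypothetical helper name, and the output path is chosen to match the /root/trace.nettrace location used later in this walkthrough.

```shell
# Hypothetical helper: record a trace of a .NET process until Ctrl+C.
# $1 = process ID (defaults to 1), $2 = output file (optional).
collect_trace() {
  dotnet-trace collect --process-id "${1:-1}" --output "${2:-/root/trace.nettrace}"
}
```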
Wait about 60 seconds, then stop by pressing Ctrl+C. In my case, it creates a file:
This is a trace file that you should open with a tool like PerfView. This kind of trace includes wall-clock thread time (to find the slowest functions, including I/O), all exceptions, and other memory statistics.
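Download the trace file: once you have collected the trace, copy the file from the container to your local machine with kubectl cp <pod>:path_to_source local_path_including_the_file_name, where path_to_source could be /root/trace.nettrace and local_path_including_the_file_name could be trace.nettrace. For Docker only, use docker cp <container>:path_to_source local_path_including_the_file_name instead.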
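Both variants of the copy step, as a sketch. The helper names are hypothetical, and the pod, namespace, and container names are placeholders. Note that `kubectl cp` relies on `tar` being present in the container.

```shell
# Hypothetical helpers: copy the trace file out of the container.

# Kubernetes: $1 = pod name, $2 = path in container (optional), $3 = namespace (optional)
fetch_trace_k8s() {
  kubectl cp "${3:-default}/$1:${2:-/root/trace.nettrace}" ./trace.nettrace
}

# Docker only: $1 = container name or ID, $2 = path in container (optional)
fetch_trace_docker() {
  docker cp "$1:${2:-/root/trace.nettrace}" ./trace.nettrace
}
```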
Then, when you open the trace file with PerfView (use the latest PerfView, not trace view), you can see the wall-clock CPU analysis:
Or the exceptions: