OpenShift Debugging 101

I’ve been working with the radanalytics.io big data examples for OpenShift recently, and every once in a while I would be on a slow network and plagued by inconsistent deploys when trying to get the entire example running.  I finally reached out for help and got some great debugging advice, so I wanted to share the basics of figuring out what is going on when a deployment just isn’t finishing and you aren’t getting much information about it.

Here was my scenario.  I was running through the Value at Risk example, and sometimes the Oshinko Web UI wouldn’t deploy properly.  Other times I would get past that, but then the Spark containers wouldn’t deploy.  Looking in the deployment logs didn’t really help, as I would only see something like:

--> Scaling sparky-m-1 to 1
--> Waiting up to 10m0s for pods in rc sparky-m-1 to become ready

The trick was figuring out what was actually happening in that deployment.  The first step was to run oc get pods to find the non-deployment pods:

[mhicks@localhost bigdata]$ oc get pods
NAME                READY     STATUS              RESTARTS   AGE
oshinko-1-73jg7     1/1       Running             0          17m
sparky-m-1-deploy   1/1       Running             0          7m
sparky-m-1-r1vvj    0/1       ContainerCreating   0          7m
sparky-w-1-2qg0m    0/1       ContainerCreating   0          7m
sparky-w-1-deploy   1/1       Running             0          7m

The two pods of interest in this example are sparky-m-1-r1vvj and sparky-w-1-2qg0m, the ones stuck in ContainerCreating.
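If you have a lot of pods, you can let awk do the scanning for you.  A quick sketch, with the sample listing from above inlined via a here-doc so the pipeline stands alone (in practice you would pipe oc get pods straight into awk):

```shell
# Print the name and status of any pod whose READY column (field 2)
# shows it isn't fully up yet; NR > 1 skips the header row.
cat <<'EOF' | awk 'NR > 1 && $2 != "1/1" {print $1, $3}'
NAME READY STATUS RESTARTS AGE
oshinko-1-73jg7 1/1 Running 0 17m
sparky-m-1-deploy 1/1 Running 0 7m
sparky-m-1-r1vvj 0/1 ContainerCreating 0 7m
sparky-w-1-2qg0m 0/1 ContainerCreating 0 7m
sparky-w-1-deploy 1/1 Running 0 7m
EOF
```

That immediately narrows things down to the two ContainerCreating pods.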

Next, I needed to figure out what was happening in those pods.  You can get this from the oc describe pod <pod> command if you look in the Events section (clipped output below):

[mhicks@localhost bigdata]$ oc describe pod sparky-m-1-r1vvj
Name: sparky-m-1-r1vvj
Namespace: myproject
...
Events:
  FirstSeen  LastSeen  Count  From                      SubObjectPath              Type    Reason     Message
  ---------  --------  -----  ----                      -------------              ----    ------     -------
  8m         8m        1      {default-scheduler }                                 Normal  Scheduled  Successfully assigned sparky-m-1-r1vvj to 192.168.10.222
  8m         8m        1      {kubelet 192.168.10.222}  spec.containers{sparky-m}  Normal  Pulling    pulling image "willb/var-spark-worker"

Interesting…  Check out that last event: pulling image "willb/var-spark-worker".  That means the pod is still doing a docker pull of its image.
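You can also grep for the pull event directly instead of eyeballing the whole describe output.  A sketch, with the events from above inlined via a here-doc so the command stands alone; against a live cluster you would run something like oc describe pod sparky-m-1-r1vvj | grep -i 'pulling image':

```shell
# Pick out just the image-pull event from the pod's event log.
cat <<'EOF' | grep -i 'pulling image'
8m 8m 1 {default-scheduler } Normal Scheduled Successfully assigned sparky-m-1-r1vvj to 192.168.10.222
8m 8m 1 {kubelet 192.168.10.222} spec.containers{sparky-m} Normal Pulling pulling image "willb/var-spark-worker"
EOF
```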

The last step is to check the progress of that docker pull.  That’s simple enough: run docker pull <image> on the same image, and Docker will report how far along each layer is.

[mhicks@localhost bigdata]$ docker pull willb/var-spark-worker
Using default tag: latest
Trying to pull repository docker.io/willb/var-spark-worker ... 
sha256:70a5248e91444b96c66d0555df23c41938a7ae68e16941ee47f8ce3ed49a965a: Pulling from docker.io/willb/var-spark-worker
8d30e94188e7: Already exists 
b4cef18dbaf6: Already exists 
67005339c478: Downloading [=============================> ] 110.8 MB/187.5 MB
4c505a838158: Download complete 
28001ba6816a: Download complete 
6f9875b2f6b6: Downloading [==========> ] 38.91 MB/187.5 MB
a0ccab00fadc: Download complete 
9d123a390bac: Downloading [===============> ] 13.72 MB/44.25 MB
4869f3d7d89e: Waiting 
a25f81ddacf4: Waiting 
ff249abd99d4: Waiting

And there you have it.  Now you not only know what is holding up your deployment, you can also track the pull’s progress and know exactly when it’s done.
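To tie the steps together, here’s a small helper sketch (the function name is my own invention) that picks the not-ready, non-deploy pod names out of oc get pods output, so you know exactly which pods to oc describe.  Against a live cluster you would use it as oc get pods --no-headers | stuck_pods:

```shell
# stuck_pods: read `oc get pods --no-headers` output on stdin and print
# the names of pods that aren't ready yet, skipping the transient
# *-deploy pods that are just running the deployment itself.
stuck_pods() {
  # field 2 is the READY column (e.g. 0/1); field 1 is the pod name
  awk '$2 != "1/1" && $1 !~ /-deploy$/ {print $1}'
}
```

From there it’s one oc describe per name to see which image pull each pod is waiting on.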

Hope this helps!