OpenShift Debugging 101

I’ve been working with the radanalytics.io big data examples for OpenShift recently, and every once in a while I would be on a slow network where the deploys needed to get the entire example running behaved inconsistently.  I finally reached out for help and got some great debugging advice, so I wanted to share some of the basics on how to tell what is going on when a deployment just isn’t finishing and you aren’t getting much information about it.

Here was my scenario.  I was running through the Value at Risk example and sometimes the Oshinko Web UI wouldn’t deploy properly.  Sometimes I would get past that but then the Spark containers wouldn’t deploy.  Looking in the deployment logs didn’t really help as I would only see something like:

--> Scaling sparky-m-1 to 1
--> Waiting up to 10m0s for pods in rc sparky-m-1 to become ready

The trick was that I had to figure out what was actually happening in that deployment.  The first step was to run oc get pods to find the non-deployment pod:

[mhicks@localhost bigdata]$ oc get pods
NAME READY STATUS RESTARTS AGE
oshinko-1-73jg7 1/1 Running 0 17m
sparky-m-1-deploy 1/1 Running 0 7m
sparky-m-1-r1vvj 0/1 ContainerCreating 0 7m
sparky-w-1-2qg0m 0/1 ContainerCreating 0 7m
sparky-w-1-deploy 1/1 Running 0 7m

The two pods to look at in this example are sparky-m-1-r1vvj and sparky-w-1-2qg0m, the ones stuck in ContainerCreating.
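
On a project with more pods, a small filter saves reading the whole table.  This is only a minimal sketch: it assumes the column layout shown above (STATUS is the third column), and stuck_pods is a name made up for this example, not an oc subcommand.

```shell
# stuck_pods: hypothetical helper, not part of oc.
# Reads `oc get pods` output on stdin (header line first) and prints the
# name of every pod whose STATUS column is anything other than "Running".
stuck_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Usage (needs a logged-in oc client):
#   oc get pods | stuck_pods
```

Note that this also lists pods in states such as Completed or Error, so treat the result as a list of candidates to describe, not a diagnosis.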

Next, I needed to figure out what was happening in those pods.  You can get this from the oc describe pod <pod> command if you look in the events section (clipped output below):

[mhicks@localhost bigdata]$ oc describe pod sparky-m-1-r1vvj
Name: sparky-m-1-r1vvj
Namespace: myproject
...
Events:
 FirstSeen LastSeen Count From SubObjectPath Type Reason Message
 --------- -------- ----- ---- ------------- -------- ------ -------
 8m 8m 1 {default-scheduler } Normal Scheduled Successfully assigned sparky-m-1-r1vvj to 192.168.10.222
 8m 8m 1 {kubelet 192.168.10.222} spec.containers{sparky-m} Normal Pulling pulling image "willb/var-spark-worker"

Interesting…  Check out that last event with pulling image “willb/var-spark-worker”.  That means that it’s still doing a docker pull.
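
If the events list is long, the image name can be pulled out with a one-line filter.  This is a rough sketch that assumes the event message is worded like the one above (pulling image "<name>"); other versions may word it differently, and pulling_images is a made-up helper name.

```shell
# pulling_images: hypothetical helper, not part of oc.
# Reads `oc describe pod <name>` output on stdin and prints each image
# named in a "pulling image" event, once.
pulling_images() {
  sed -n 's/.*[Pp]ulling image "\([^"]*\)".*/\1/p' | sort -u
}

# Usage (needs a logged-in oc client):
#   oc describe pod sparky-m-1-r1vvj | pulling_images
```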

The last step is to check the progress of that docker pull.  That’s simple enough: run docker pull <image> yourself on the same image.

[mhicks@localhost bigdata]$ docker pull willb/var-spark-worker
Using default tag: latest
Trying to pull repository docker.io/willb/var-spark-worker ... 
sha256:70a5248e91444b96c66d0555df23c41938a7ae68e16941ee47f8ce3ed49a965a: Pulling from docker.io/willb/var-spark-worker
8d30e94188e7: Already exists 
b4cef18dbaf6: Already exists 
67005339c478: Downloading [=============================> ] 110.8 MB/187.5 MB
4c505a838158: Download complete 
28001ba6816a: Download complete 
6f9875b2f6b6: Downloading [==========> ] 38.91 MB/187.5 MB
a0ccab00fadc: Download complete 
9d123a390bac: Downloading [===============> ] 13.72 MB/44.25 MB
4869f3d7d89e: Waiting 
a25f81ddacf4: Waiting 
ff249abd99d4: Waiting

And there you have it.  Now you not only know what is holding up your deployment, you can also track its progress and see when it’s done.

Hope this helps!

Fedora 20 on a Thinkpad X1 Carbon (20A7)

Time to try out some new hardware.  My experience so far with the Thinkpad X1 Carbon has been great and will get even better over time.  Most of the things that I’m going to cover in this blog have already been fixed in various projects and I expect that many of them will land in Fedora 21.  However, until that time, I want to make sure that Fedora 20 users can have a great experience with the Thinkpad X1 Carbon (model 20A7), assuming they are willing to tweak a bit.

Step 1 – Disable UEFI Boot for installation

To do an easy install just disable the UEFI Boot in your BIOS and hook up your installation source (USB, PXE over the net, etc).  Very simple to get going.

Step 2 – Fix Suspend / Resume and USB3

Resuming from suspend is going to fail because of a problem with the firmware and the USB3 driver.  You have a couple of options.  The first is to disable USB3 in the BIOS and move on.  The second is to update your BIOS, which is trickier.  Do not update your BIOS using my instructions unless you know exactly what you are doing.  You can brick (i.e. ruin) your machine if you do it wrong.

Option A (Easy) – Disable USB3 in the BIOS

As for disabling USB3, there is evidently a USB3 driver problem that keeps the machine from un-suspending.  I’m going to investigate updating the BIOS to see if it fixes this, but an easy fix for right now is to disable USB3, after which suspend / resume works great.

Option B (Danger) – Update your BIOS to version 1.13+ (AT YOUR OWN RISK)

I’ll be honest, I even debated whether to put these instructions in here.  At the end of the day though, I figure I might as well pass along what worked for me.  Seriously though, if you mess up a BIOS update, you can ruin your machine, so if you don’t know what you are doing, just turn off USB3.  However, if you want to update the firmware, this is what I did.

Step 1 – Download the geteltorito.pl script.  You can download the one I used here.

Step 2 – Get a USB drive that can be erased and plug it in.  Figure out which device that drive is.  I usually just run ‘fdisk -l’ to figure it out.  Keep in mind that if you see /dev/sdb1 in fdisk, your device is actually going to be /dev/sdb (with no number at the end).

Step 3 – Download the BIOS ISO image from here.

MAKE SURE YOU GET THE BIOS FOR YOUR MODEL NUMBER LAPTOP.  For example, I downloaded the driver ‘BIOS Update (Bootable CD) for Windows 8.1 (64-bit), 8 (64-bit), 7 (32-bit, 64-bit) – ThinkPad X1 Carbon (Machine types: 20A7, 20A8)‘.  The filename was gruj08us.iso.

Step 4 – Convert the downloaded ISO to a bootable image, named bios-update.iso

perl geteltorito.pl -o bios-update.iso gruj08us.iso

Step 5 – Copy that bootable image to your USB drive.  I’m using /dev/sdx below which you need to replace with your USB device.  Double check that you have the device name right for your USB drive and run:

sudo dd if=bios-update.iso of=/dev/sdx bs=512K
sudo sync

Step 6 – Reboot and press F12 to get the boot menu and boot from the USB.  Follow the instructions to update your BIOS.
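
The device naming rule from the USB drive step (a partition such as /dev/sdb1 lives on the disk /dev/sdb) is easy to get wrong when typing the dd command, so here is a tiny sketch of it.  The helper name disk_for_partition is made up for this example, and it only understands sdX-style names.

```shell
# disk_for_partition: hypothetical helper for this example.
# Strips a trailing partition number from an sdX-style device name:
#   /dev/sdb1 -> /dev/sdb
# It does NOT handle names such as /dev/nvme0n1p1 or /dev/mmcblk0p1.
disk_for_partition() {
  printf '%s\n' "$1" | sed 's/[0-9][0-9]*$//'
}

# Usage:
#   disk_for_partition /dev/sdb1
```

It does not check that the result really is your USB drive, so still double check the device before running dd.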

Step 3 – Add MattOnCloud Repository

I’ve created a yum repository with an RPM that contains the various fixes and repositories used in this blog.  I’m keeping the source on GitHub and pull requests are definitely appreciated.  To install my repository, run:

sudo rpm -Uvh https://files-oncloud.rhcloud.com/yum/RPMS/x86_64/oncloud-repo-0.4-1.fc20.x86_64.rpm

To apply the fixes, then run:

sudo yum install thinkpad-fixes

Step 4 – Update GNOME

Since the Thinkpad X1 Carbon has a very high resolution screen, you are going to want to get GNOME 3.12 HiDPI support.  If you don’t, a lot of the windows and text are going to be crazy small.  My RPM provides a repository for a backported version of GNOME 3.12.  So after installing, you just need to run:

sudo yum update

Go get a coffee since that is going to be a lot of packages.  After it’s done, logout and login or reboot your machine.

Once you have GNOME reloaded, you are probably going to want to tweak your applications to scale their resolution correctly.  I followed the instructions in this article:

https://wiki.archlinux.org/index.php/HiDPI

Step 5 – Update Synaptics

The trackpad support for the Carbon is a little shaky in Fedora 20 by default as well.  The good news is that the 1.7.6 release backports some of these fixes.  Luckily you can get this release early by just installing from Fedora’s Koji RPM server:

sudo yum install http://kojipkgs.fedoraproject.org//packages/xorg-x11-drv-synaptics/1.7.6/2.fc20/x86_64/xorg-x11-drv-synaptics-1.7.6-2.fc20.x86_64.rpm

I found a great configuration from Major on his blog as well.  I started with that configuration and have made several tweaks – I think the setup is getting pretty solid.  I also added syndaemon to disable the touchpad for a second after typing.  I’ve found this lets me keep the touchpad fairly sensitive but avoid random taps when I’m typing email, etc.  I’ve added my configuration to the fixes RPM.  After you boot, you should run the following if you like the configuration and don’t want the settings to be updated via the settings widget:

gsettings set org.gnome.settings-daemon.plugins.mouse active false

I’ve also added a non-tap version of my synaptics settings that I’m currently using.  Curious on people’s feedback as to whether they like the tap settings or click settings better and I’ll make that the default.  You can find the non-tap setup here.

Step 6 – Screen Brightness / Keyboard Backlight

Good news is that adaptive keyboard support is coming soon for Linux.  I’ll update once that is in a kernel that we can get at.  The bad news is that after a suspend, the adaptive keyboard is blank and doesn’t work.  We depend on that for backlight and brightness so we need a workaround.  Luckily the thinkpad-fixes package provides one.  It ships with two scripts in /usr/bin to adjust backlight and brightness.  You can run them with:

# Brightness options (dim to bright)
sudo brightness dim
sudo brightness normal
sudo brightness bright

# Backlight options (dim to bright)
sudo backlight 0
sudo backlight 1
sudo backlight 2

A co-worker pointed out that you can also use the brightness slider in the top menu bar drop down (right below the volume).  That is a much easier way to set the brightness if you aren’t in a terminal.  I’ll leave the script for now but might end up removing it.

Step 7 – Fedy

I highly recommend running Fedy to set up the other miscellaneous features such as codecs and font rendering – http://satya164.github.io/fedy/.  Lately I’ve been using the Numix theme and the Infinality fonts and like them quite a bit.  You can install the Numix themes from Fedy and also the improved font rendering with Infinality.  I set the osx style fonts with:

$ sudo /etc/fonts/infinality/infctl.sh setstyle
Select a style:
1) debug       3) linux          5) osx2         7) win98
2) infinality  4) osx          6) win7         8) winxp
#? 4
conf.d -> styles.conf.avail/osx

To switch to the Numix theme, you’ll want to add the GNOME extension for User Themes by going to the following location – https://extensions.gnome.org/extension/19/user-themes/.  Then install the GNOME Tweak tool via Fedy and launch it and select Numix in all the theme options.

Lastly, I highly recommend the Dash to Dock extension as well.  I think it’s one of the best extensions out there – https://extensions.gnome.org/extension/307/dash-to-dock/

Hope this blog helps a new Fedora user out there get up and running!

An application built from… cartridges?

Background

What are these things called cartridges in OpenShift and why are they so important?  Well, let’s take a step back and look at a typical application.  While some might argue the specifics, most applications are still multi-tier applications and utilize multiple technologies with some separation between them.  A classic case is a web application and a database.  While some of the databases might be experimenting with NoSQL backends, the general pattern largely holds.  And maybe you throw a caching tier in there or something more exotic, but at the end of the day, very few applications I’ve seen get very far with just a web application runtime and nothing else.

Composition

So if you’re still with me and not yet posting to the comments about that initial claim being ridiculous, let’s talk about how that process often plays out.  When building an application, many developers think from a technology standpoint.  They might think of Ruby and want to use Mongo for storage.  Despite claims that the most effective route is to focus on the use cases first (e.g. I’m building a coupon generating web application that has to store a large amount of redundant data), at the end of the day, the technology decision is often a major factor.  I often operate this way myself – half of the time, an idea I’m pursuing is driven as much by getting to try out some new technology as it is by a successful and fast implementation.  Engineers like to learn and new technology is a great vehicle for that.

But while the learning curve around new technology has some benefits, it also has many disadvantages.  It’s hard for me to argue that learning to wire up a MySQL database to a Ruby application server versus a Java application server has any practical benefit.  It’s just something I need to do.  I need a database driver for the language I’m using, authentication details and endpoints.  It’s the same in theory but just different enough in every language and runtime to be a major pain.  And databases are well known.  The newer the technology gets, the more time is often wasted on the mundane aspects of integration.  But don’t give up on being a developer yet because this is what cartridges in OpenShift eliminate.

The cartridge model in OpenShift is all about enabling choice in technology and language while also reducing the effort around the integration portions that can be automated.  If you have an application that consists of a JBoss cartridge and a MySQL cartridge, the two are automatically wired together.  You don’t need to know or care about what MySQL driver is being used in JBoss or how the data source is set up.  You can just get down to writing code and queries.  This is beneficial in both development and production.  In development, this gives engineers the ability to trial a lot of different software to find the best solution to their problem.  They can spend more time on the analysis and not the administrivia of learning the setup environment of each technology.  But that same approach and power also extends to production.  Cartridges don’t only automate things like wiring up different components, they also can implement functionality like scaling.  For example, the JBoss cartridge has auto-scaling built in so that when the application is getting more load than it can handle, it will spin up new instances automatically.  And for those who might be wondering, clustering is automatically set up as well – new instances automatically join the cluster.  The goal of the cartridge model is to capture these capabilities in a standardized, easily consumable format that brings benefits throughout the entire lifecycle of application development.

The Technology

OpenShift cartridges have an amazing amount of functionality but there are two capabilities that are my favorite:

  • Providing a first class way to interact with each other, even across multiple machines
  • Giving the cartridges the ability to influence their deployment topology (i.e. can they run embedded with other cartridges or do they scale differently)

Publish / Subscribe

Let’s talk about the interaction model first.  By interaction model, I simply mean having multiple cartridges communicate with each other.  That sounds incredibly simple but it’s also amazingly powerful, especially as you consider building applications from many cartridges.  The concept is that a cartridge like MySQL can publish information about itself that other cartridges might want to know.  For example, when a new MySQL instance is created, you probably need to know the username, password and JDBC URL – all of that information can be published.  That process is described with the cartridge in a file that we call a manifest.  Here is an example of how MySQL actually publishes its connection information in its manifest:

Publishes:
  publish-db-connection-info:
    Type: ENV:NET_TCP:db:connection-info

That entry will invoke a script called publish-db-connection-info that will publish a collection of environment variables of type ENV:NET_TCP:db:connection-info.  You can think of the type as an arbitrary string that can be used by consumers to filter out what they may or may not support.  This published information can then be consumed by any other cartridge that subscribes to a matching type.  For example, in the JBoss cartridge, you’ll see the following section in its manifest:

Subscribes:
  set-env:
    Type: ENV:*
    Required: false

This instructs the JBoss cartridge to listen to all environment variables set by publishing events whose type starts with the string ENV.  More restrictive matching can also be done in cases where you might have a cartridge that is only compatible with a certain class of published information (e.g. subscribing to ENV:NET_TCP:db:connection-info instead of ENV:*).  Either way, if the publish and subscribe strings match, the JBoss cartridge has access to the published MySQL information.  With that information, the JBoss cartridge is then able to automatically wire up a datasource definition in standalone.xml by using those values:

<datasource jndi-name="java:jboss/datasources/MysqlDS" 
...
<connection-url>
  jdbc:mysql://${env.OPENSHIFT_MYSQL_DB_HOST}:${env.OPENSHIFT_MYSQL_DB_PORT}/${env.OPENSHIFT_APP_NAME}
</connection-url>
<driver>mysql</driver>
<security>
  <user-name>${env.OPENSHIFT_MYSQL_DB_USERNAME}</user-name>
  <password>${env.OPENSHIFT_MYSQL_DB_PASSWORD}</password>
</security>
...
</datasource>

While this is just a simple example, hopefully the beauty of it to a developer is apparent.  Just the act of adding a MySQL cartridge to your JBoss application will automatically wire up your application to it.  Adding Mongo would do the same thing, as would Postgres, etc, etc.  And this isn’t limited to databases either.  It also works with monitoring cartridges, metrics cartridges, caching, and many others – the possibilities are limitless.
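
The two mechanics described above, matching a subscribed type against a published one and then consuming the published variables, can be sketched in a few lines of shell.  This is an illustration only: shell glob matching stands in for whatever OpenShift does internally, and the function names type_matches and mysql_jdbc_url are made up for this example.  The variable names are the ones used in the datasource definition above.

```shell
# type_matches PATTERN TYPE
# Hypothetical helper: succeeds when the published TYPE matches the
# subscribed PATTERN.  PATTERN is treated as a shell glob, so ENV:* matches
# any type that starts with "ENV:".  This is an assumption made for the
# example, not OpenShift's actual matching code.
type_matches() {
  case "$2" in
    $1) return 0 ;;
    *)  return 1 ;;
  esac
}

# mysql_jdbc_url
# Hypothetical helper: builds the connection URL from the same environment
# variables that the datasource definition references.
mysql_jdbc_url() {
  printf 'jdbc:mysql://%s:%s/%s\n' \
    "$OPENSHIFT_MYSQL_DB_HOST" "$OPENSHIFT_MYSQL_DB_PORT" "$OPENSHIFT_APP_NAME"
}

# Usage:
#   type_matches 'ENV:*' 'ENV:NET_TCP:db:connection-info' && mysql_jdbc_url
```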

Deployment Topology

The second capability isn’t about the development process as much as it is about production.  We all know that different application technologies scale differently.  You might have a Ruby application whose throughput is determined by the number of Passenger instances that are running.  If it starts slowing down, you need to add more.  However, if this same application depends on a database, you probably need to scale the data tier independently.  You don’t want to add another MySQL instance every time you add a new Passenger instance.  Not only is that unnecessary and expensive, it most likely wouldn’t even work.  When scaling your web tier, you need to think about session affinity, connection persistence, stateless / stateful behavior and similar concepts.  However, when scaling MySQL, you need to think about your master / slave model, how many to add of each and what type of query patterns you are using.  In OpenShift, since these are different cartridges, each cartridge can approach scaling in a unique manner.

From the cartridge standpoint, the Ruby cartridge is going to respond to scaling events very differently than MySQL.  While this requires real work and thought from the cartridge authors, it captures the complexity in a model that is easily leveraged by developers.  Developers are able to specify how they want scaling to occur (e.g. automatically or manually) and also put limits around how many of their resources they want each cartridge to be able to consume.  They might want their Ruby tier to always start with pre-allocated resources (called gears in OpenShift) but still limit the maximum number of resources it could consume.  Using the OpenShift command line tools, that would be as simple as:

rhc scale-cartridge ruby -a myapp --min 5 --max 10

In my application, that would always start the Ruby cartridge with 5 gears and never consume more than 10.  The best part though is that the cartridges themselves can also influence what sort of scaling is possible so that you aren’t blindly adding resources to a cartridge that can’t use them.  The default Ruby cartridge supports scaling but the default MySQL cartridge can only run standalone.  The MySQL cartridge is able to express this limitation by setting the scaling options to a single gear in the manifest:

Scaling:
  Min: 1
  Max: 1

The end result is that when you are creating a scaled application, the Ruby runtimes and MySQL runtimes will get created on separate gears to give the maximum amount of resources to each tier, but the MySQL cartridge and Ruby cartridges will implement their own unique scaling approach.
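
To make the effect of those limits concrete, here is a minimal sketch of how a requested gear count relates to a min / max pair.  It only illustrates the arithmetic of the limits; clamp_gears is a made-up name and this is not how OpenShift itself is implemented.

```shell
# clamp_gears REQUESTED MIN MAX
# Hypothetical helper: prints the gear count that results when REQUESTED
# is forced into the range MIN..MAX.
clamp_gears() {
  if [ "$1" -lt "$2" ]; then
    echo "$2"
  elif [ "$1" -gt "$3" ]; then
    echo "$3"
  else
    echo "$1"
  fi
}

# Usage:
#   clamp_gears 12 5 10   # Ruby tier limits from the rhc example
#   clamp_gears 4 1 1     # MySQL manifest limits (Min: 1, Max: 1)
```

With the Ruby limits, anything below 5 becomes 5 and anything above 10 becomes 10; with the MySQL limits, every request ends up as a single gear.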

At the end of the day, this is really about separation of concerns.  Cartridges in OpenShift are used to describe lifecycle characteristics of the technology they represent as well as integration options with other cartridges.  Since the OpenShift cartridge format is completely open, it’s easy for commercial vendors as well as open source users to create cartridges.  For developers, that means they get to access a broad choice of technologies, both commercial and community.  But in addition to choice, the most value comes from allowing developers to spend more time doing the thing they do best – coding.

Fedora 19… on a Macbook Air (2013 Model)!

Update 02-04-2014 – this blog has been updated with Fedora 20.  Fedora 20 works much better so I would highly recommend following the instructions there.

Warning – This information is now out of date and has been replaced by the blog entry on Fedora 20.

Update 07-29-2013 – the new kernel supports the touchpad out of the box.  Getting better every day!

This is a slight deviation from my traditional posts but I’m a techie at heart and when a Linux guy gets a new Macbook, he’s gotta try putting on Fedora.  Anderson Silva was my primary inspiration since he got this working on a mid-2012 model Macbook Air (http://anderson.the-silvas.com/?p=605).  However, when deploying on a 2013 model, I hit a couple of bumps in the road.  The good news is that I was able to fix everything but it took me a while to track down all the fixes.  Warning – if you aren’t interested in building some packages, it’s probably just better to wait a couple of months.  However, if you are impatient like myself, read on!

Step 1 – Building RPMs

You are going to need to build some RPMs here so you’ll need some development tools installed.

sudo yum install @development-tools rpmdevtools
rpmdev-setuptree

Step 2 – Wireless

The new Macbooks ship with a BCM4360 wireless chipset which isn’t supported today in Fedora or available via RPMFusion.  However, the RPMFusion guys are working on it and you can build your own RPM with the latest driver to get this working.  You can track the progress at https://bugzilla.rpmfusion.org/show_bug.cgi?id=2721.

# Download and build the source RPMs
cd ~/rpmbuild/SRPMS
wget http://dl.dropboxusercontent.com/u/25699833/rpmfusion/bug2721/broadcom-wl-6xx-6.30.223.30-1.fc20.src.rpm
wget http://dl.dropboxusercontent.com/u/25699833/rpmfusion/bug2721/wl-6xx-kmod-6.30.223.30-1.fc20.src.rpm
rpm -Uvh *.src.rpm
cd ~/rpmbuild/SPECS
rpmbuild -ba wl-6xx-kmod.spec
rpmbuild -ba broadcom-wl-6xx.spec

# Install those RPMs (the paths below are relative to ~/rpmbuild)
cd ~/rpmbuild
sudo yum install RPMS/`uname -i`/akmod-wl-6xx* RPMS/`uname -i`/kmod-wl-6xx* RPMS/noarch/broadcom-wl-6xx*

Step 3 – Touchpad support (e.g. two finger scroll, two finger click)

Update: You used to have to build a custom kernel but now that 3.10 is out, this works out of the box!

Issues

1. Light sensor / backlight – after a reboot, I can adjust screen brightness with the correct steps / increments using the hot keys. However, after suspend / resume, I can still use the hot keys but it’s either max / min brightness. Not the end of the world, but a bit of a pain.  I’ve opened the following bug to track – https://bugzilla.redhat.com/show_bug.cgi?id=989555

2. Internal speakers.  I can’t seem to get the internal speakers to work.  Headphones work fine but I went ahead and opened a bug to track – https://bugzilla.redhat.com/show_bug.cgi?id=989582

3. 15-30 second hangs.  This seems to be somewhat CPU / IO related but every once in a while, my machine will hang for 15 or 30 seconds.  Nothing more than an annoyance but I’m going to try adding libata.force=1:noncq to my kernel boot parameters and see if that helps based on this article (https://bbs.archlinux.org/viewtopic.php?pid=1295212#p1295212).

# Edit the default grub file
sudo vim /etc/default/grub

# Add 'libata.force=1:noncq' to the end of the GRUB_CMDLINE_LINUX parameter

# Regenerate the grub configurations
sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
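
If you would rather not edit the file by hand, the edit step can be scripted.  This is a minimal sketch under a few assumptions: the GRUB_CMDLINE_LINUX value sits on one line in double quotes, the parameter contains no / or & characters, and GNU sed is available (it is on Fedora).  The function name add_kernel_param is made up for this example.

```shell
# add_kernel_param PARAM FILE
# Hypothetical helper: appends PARAM to the GRUB_CMDLINE_LINUX="..." line
# in FILE unless that line already contains it.
add_kernel_param() {
  if grep '^GRUB_CMDLINE_LINUX=' "$2" | grep -qF -- "$1"; then
    return 0   # already present, nothing to do
  fi
  # Insert PARAM just before the closing quote of the existing value.
  sed -i "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"\$/\1 $1\"/" "$2"
}

# Usage (from a root shell, then regenerate the grub configuration):
#   add_kernel_param 'libata.force=1:noncq' /etc/default/grub
```

Running it twice leaves the file unchanged the second time, so it is safe to keep in a setup script.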

Please comment on the bugs if you are experiencing the same issues or if you have fixes!