The Evolution of PaaS

Having spent three years now working on OpenShift, the lesson I’ve learned from working in the cloud space is that if you aren’t evolving, you are doing something wrong.  PaaS isn’t a static solution but a constantly progressing set of technologies to enable a better approach to building and running applications.  But at Red Hat, the open source way is a critical aspect of how we work.  To us, that means finding the best technology and best communities out there and working with them instead of against them.  That is why we created OpenShift Origin with the mission of being able to experiment along with other communities to find the best solutions for our users.  With that mission in mind, we are always looking for those adjacent communities and determining if they are a good fit.  Looking ahead to next year, I see a couple of exciting community developments on the horizon, one centered around Linux Containers and the other centered around OpenStack.
First, let’s talk about Linux Containers.  Personally, I think Linux Containers is one of the most exciting developments in the Linux kernel today.  The combination of kernel namespace capabilities, coupled with Linux Control groups and a strong security model is changing how users think about isolating applications running on the same machine.  But much like PaaS, Linux Containers isn’t a static solution.  There are a lot of options that can be utilized to strike the right balance of isolation versus overhead.  On OpenShift, we’ve been using our own variant of Linux Containers since day one.  But there is a lot of community activity around containers and a few months ago, we noticed Docker.  Docker introduced an innovative approach to container isolation and packaging which had the potential to both simplify our cartridges in OpenShift and increase user application portability.  But a lot of stuff looks interesting on the surface.  We wanted to really dive in (i.e. start hacking) to see if this would be a good fit.  We had a great experience working with the leads behind Docker and were able to close many of the initial gaps we hit to make Docker run seamlessly on the platforms critical to us, Fedora and RHEL, to enable us to start utilizing it on OpenShift.  During that same time period, Docker was accepted as a Nova driver to the OpenStack project.  With that foundation, we are getting ever closer to having a consistent portable container layer across the operating system (e.g. RHEL), IaaS (e.g. OpenStack) and PaaS (e.g. OpenShift).  Better still is that we are able to take our experience with containers and work with hundreds of other community members to come up with the best approach going forward.
But as excited as we are about the evolution of containers and better portability of applications, we also know the operational experience is equally critical.  And more and more, that operational experience is centered around OpenStack.  While we can run OpenShift very well on OpenStack and are even enabling better integration through projects like Heat and Neutron, we’ve had the feeling that there is a more fundamental set of capabilities in our platform today that could be native to OpenStack itself.  And in doing that, we could drastically improve the operational experience.
But I think it helps to talk through some of those operational challenges.  An example of this is the visibility of containers in OpenStack.  Almost every PaaS on the market right now uses some form of Linux containers.  Arguably, it’s what makes a PaaS so efficient – this highly elastic mesh of containers that form your applications.  However, if that PaaS doesn’t natively integrate with OpenStack, your operations team isn’t going to see those containers.  They are just going to see the virtual machines in OpenStack and not have deeper visibility.  But if that PaaS was natively integrated into OpenStack, things get interesting.  The containers themselves could be managed in OpenStack, opening up full visibility to the operations team.  They wouldn’t just see a virtual machine that is working really hard, they would see exactly why.  It would enable them to start using projects like Ceilometer to monitor those resources, Heat to deploy them, etc.  In other words they could start leveraging more of OpenStack to do their jobs better.  But where do you draw that line?  Should OpenStack know about the applications themselves or just containers?  In looking for those answers, we wanted to embrace the OpenStack community to help us draw that line, just like we did with Linux Containers.
OpenStack has a tremendous community and various areas where we could have started  – Nova, Neutron, Heat, Ceilometer, Keystone, etc.  At the end of the day, we were going to need to interact with all of them.  That led to the Solum project.  You might have seen the announcement today around the new community project.  We will be working with a group of like minded companies and individuals to figure out the approach that makes the most sense for OpenStack.  While OpenStack is a fast moving space, we have a lot of experience with it and believe that there is tremendous potential to align our PaaS approach with this project.  Being Red Hat, we love community driven innovation, and we’re excited to jump in and help move this effort forward.
I think 2014 is going to be the most exciting year for PaaS to date.  There is great traction in the market, developer expectations are starting to solidify and we’re seeing more and more traction in production.  I believe the next advancements will come as much in the operational experience as with the developer experience.  And I’m excited to find healthy and vibrant communities looking to solve the same problem.  The end result will be that OpenShift users benefit from greater portability as well as deeper integration with OpenStack.  This has been one of those moments that just crystallizes why I love working in open source.
If you are interested in finding out more, follow our progress at OpenShift Origin or get involved with the Solum project directly.  And I’m sure there will be a lot of activity at the OpenStack Design Summit, so If you are going to be there, come find us at one of our talks and we can hack on this together!

The New Cloud Business Model – Fake Support

I’ve noticed a growing tend over the last year with companies that are providing exciting new enterprise software, the promise of support and no chance of being able to deliver on it.  And unfortunately, for consumers trying to sort through all the new offerings out there, it can sometimes be difficult to separate all the marketing glitz and glamour from the reality.  With OpenShift, Red Hat is able to stand behind the software that it distributes – they have deep expertise in every layer of the stack.  Given that, it frustrates me when I see others claim the same model without the expertise – that approach is just taking advantage of customers who don’t do their homework before buying.

Let’s think about what would happen if more industries took this same approach – the medical profession for example.  Imagine what the conversation might be after your yearly check-up.

Doctor: Well, I’ve got some good news and some bad news.  The good news is that you still look okay.  The bad news is that the is something going on under the surface that you are going to want to figure out.

You: Okay… what exactly do you mean by ‘under the surface’?  Also, when you say that ‘I’ will need to figure this out, what do you mean?

Doctor: I mean something is going on underneath your skin.  What happens under there is basically a mystery to us – it’s not something we support.  That said, whatever is going on probably needs to be fixed so you’ll want to find someone that can do that.  We could try but we really don’t have any better odds than you in fixing the problem…

If a conversation like this is so unacceptable in other disciplines, why do we so readily accept it in software?  Let’s take Platform as a Service (PaaS) for example.  PaaS is platform positioned to be the core application foundation in your company.  It is tightly integrated with both the operating system (OS) and your application platforms.  Those that say otherwise are either dreaming or trying to deceive you.  That tight integration is what lets the PaaS platform do things so that you don’t have to.  But many of the PaaS vendors in the market have limited experience across the OS and the application stacks.  In almost all cases, the PaaS providers are going to have to rely on a separate company for the operating system distribution.  In many cases, they are going to have to do the same for the application stacks.

What are these companies going to do when their customers hit issues in area outside of the core PaaS software?  Most of these guys aren’t active in the open source versions of the software so I doubt they are going to do the fixes themselves.  Don’t let them give you the ‘power of open source software’ unless they are involved enough to influence those changes.  Maybe they will proceed with the same awkward conversation as the above example…

Now, maybe these providers have the ability to support all the things they promise.  Maybe they have all the connections in the open source projects to maintain stable distributions themselves.  This is what Red Hat does but I don’t see too many others doing the same.  At a minimum, you should check because you might end up buying a product from a company whose business models is based on you not making that call for help…

Building an Open Source PaaS Deployment Model

To some, cloud is an excuse to introduce “black box” processes that lock users into their services.  But they can’t really come right out and say that.  Instead they distract from their approach with fanciful names and tell us that the cloud is full of magic and wonder that we don’t need to understand.  This type of innovation is exciting to some, but to me, combining innovation with a lock-in approach is depressing.  In the past, we’ve seen it at the operating system level and the hypervisor level.  We’ve also seen open source disrupt lock-in at both levels and we are going to see the same thing happen in the cloud.

When we started designing and building OpenShift, we wanted to provide more than just a good experience to end users that, in turn, locked them in to our service.  One of the early design decisions we made on OpenShift was to utilize standards as much as we could and to make interactions transparent at all levels.  We did want the user experience to be magical but also completely accessible to those wanted to dig in.  To demonstrate this, let’s walk through the deployment process in OpenShift – arguably the most magical part of the entire offering…

As we were designing a PaaS service, focused on developers, our first goal was to make the deployment process as natural as possible for developers.  For most developers, their day to day process goes something like code, code, code, commit.  For those questioning this process already let me speak on behalf of the developer in question by saying

Tests?! Of course I’ve already written the tests!  They were in the third ‘code’!

Anyway, we wanted to plug into that process and to do that we chose git.  The reason for selecting git over more centralized source code management tools like subversion was that the distributed nature of git allowed the user to have full control over their data.  The user always had access to their entire historical repository and as developers, we thought that was a critical requirement.  Given that, we standardized on git as the main link between our users’ code and OpenShift.

Now let’s look at what that development process might look like in practice.  First, you start off with the code, code, commit part:

vi <file of your choice>
# make earth shattering changes
git commit -a -m "My earth shattering comment"

The next part of the process for those familiar with git is the publish process.  You run a ‘push’ command to move your code from your local repository to your distributed clones.  So when you run:

git push

Your code is transferred to OpenShift and automatically deployed to your environment.  Regardless of whether code needs to be compiled, tests need to be run, dependencies need to be downloaded, a specific packaging spec needs to be built – it all happens on the server side with this one command.  To do this we utilize a git hook to kick off the deployment process.  Wait – I know what you are thinking…

What?!  Just a git hook?!  This is the cloud baby!  Shouldn’t this be custom compiling my code into a Zeus Hammer to perform a magical Cloud Nuclear transfer?!!

If you ask us, a git hook works just fine because it’s what you would probably do yourself.  We simply map the post-receive git hook to a script.  That script invokes a series of scripts (called hooks) representing various steps in the deployment process.  Some of the hooks are provided by the cartridge that your application is using and some of the scripts are provided by the application itself.  This approach let’s the cartridge provide base functionality that can be further customized by the application.

First let’s talk about the cartridge hooks.  Having cartridge specific hooks is important because each cartridge needs to do different things in their deployment process.  For example, when a Java cartridge detects a deployment, we want to do a Maven build, but when a Ruby cartridge detects a deployment, it should execute Bundler.  The cool part is that each individual cartridge can override anything it needs to in the default process.

Let’s look at how the Ruby cartridge implements this.  We can look at the ruby-1.9 cartridge’s overridden script to see how it calls bundler.  When you use the Java cartridge, it leverages Maven in the build process using the same technique.  You can implement the pieces that are right for your cartridge where it makes sense and still utilize the generic process everywhere else.  In isolation, each individual script is really quite simple.  In aggregate though, all those extensions can become extremely powerful and do much of the heavy lifting on behalf of the users.

But, what if you want to change the default behavior for a specific application?  No problem!  You have a collection of action hooks that are provided with each application instance in their repository.  You could put your own code in pre_build, build, deploy, post_deploy or wherever else it makes sense.These are found in your application in ~/.openshift/action_hooks.  They are invoked just like the cartridge hooks as part of the deployment process.  For example, you can see how the script is called before we run the build.  What you choose to do with these hooks is your decision.  Put some code in them and they will get called at each step in the deployment process.  This let’s you not only leverage the power of a customized cartridge, but also let’s you tweak and tune so things are just right for your application.

At the end of the day, harnessing the power of the cloud doesn’t need to lock you into a vendor.  At OpenShift, we believe that transparency, standards and extensibility will make a process that lasts the test of time.  I hope this has provided some visibility to how the OpenShift deployment model works and also has given you some insight into navigating the codebase.  And if this has peaked your interested and you find yourself digging through more and more code, please reach out and get involved.

Are LXC containers enough?

First off, let me state that I think the LXC project is great.  In previous blog posts, I’ve talked about segmenting existing virtual machines to securely run multiple workloads and achieve better flexibility, cost, etc.  This concept is often referred to as ‘Linux Containers’ and creating these containers with the LXC project is a very popular approach.  LXC  aggregates a collection of other technologies such as Linux Control Groups, Kernel Namespaces, Bind Mounts and others to accomplish this in an easy way.  Good stuff.  The question however, is whether LXC alone is enough to give you confidence in your approach to utilizing Linux containers.

In the words of Dan Berrange:

Repeat after me “LXC is not yet secure. [. . .]”

In other words, no it’s not enough.  The main problem right now is that LXC doesn’t have any inherent protection against exploits that allow a user to become root.  In the world of Linux, traditionally if you have root you can do anything.  When using containers, that means that if one container can find a way to become root on the machine, it can do whatever it wants with all the other containers on the box.  I think the official term for that situation in IT is a ‘cluster’.  While the concept of capabilities is being introduced into the kernel to help segment the abilities that root actually has, that is a long ways out from being a realistic defense, especially on the production systems in deployment today.

How realistic are these exploits, though?  To many, the concept of a kernel or security exploit is something they would rather believe just doesn’t actually happen.  Maybe they prefer to think that it’s limited to the realm of academic discussions.  Or maybe they just believe it’s not going to happen to them.

Unfortunately, the reality is quite different.  While I agree that finding an exploit requires an amazing amount knowledge and creativity, using an exploit for malicious purposes isn’t that challenging.  For example, let’s look at the excellent article written by Jason A. Donenfeld about a kernel exploit that is able to achieve root access.  Jason explains how this exploit works in amazing detail here –  Believe me, discovering that and writing that article was a LOT of work.  But now, let’s look at how easy it is to use that exploit on unpatched kernels:

  • Download the provided C program (e.g. wget
  • Compile it (gcc mempodipper.c -o mempodipper)
  • Run it and get root access (./mempodipper)

Pretty scary huh?  Three steps and I could get root on your machine.  I can hear the sighs of relief already though, as people start thinking:

I don’t have to worry about this since I don’t let people run arbitrary code run on my machines…

Let’s discuss that train of thought for a minute.  First, let’s approach this from the perspective of a Platform as a Service (PaaS).  A PaaS essentially allows users to run their own code on machines shared by many.  That means experimenting with an exploit like this in a PaaS environment isn’t very difficult at all.  And remember, if any user can get root on that system, they own all the applications on it.

Not consuming or hosting a PaaS?  Well, I’ve spent many years in IT shops and the traditional IT deployments for large companies don’t look all too different.  Granted, the code is usually coming from employees and contractors, but you still probably don’t want to risk root exposures by anyone that is able to deploy a change into your environment.

Well if LXC doesn’t protect against this and my traditional environments are susceptible as well, is there any hope at all?!?!  Thankfully, there is.

The solution is using SELinux in combination with whatever container technologies you are using.  With an SELinux policy, you are essentially able to control the operations of any running process, regardless of what user they happen to be.  SELinux provides a layer of protection against the root layer where most other security mechanisms fail.  When a user is running in a SELinux context on a system and tries an exploit like the one above, you have an extra line of defense.  It’s easy for you to establish a confined environment that limits riskier operations like syscalls to setuid and restricts memory access which, in turn, would stop this exploit and others.  Most importantly, you can get consistent protection across any process, no matter what user they are running as.

You can think of SELinux as a whitelisting approach instead of blacklisting.  The traditional model of security (often referred to as Discretionary Access Control or DAC) requires protecting against anything a user should not be able to do.  Given the complexity of systems today, that’s becoming unrealistic for mere mortals.  The SELinux model of security (often referred to as Mandatory Access Control or MAC) requires enabling everything a user should be able to do.

While it’s not a silver bullet, it’s an elegant mitigation in many areas.  Many types of IT hosting are becoming increasingly standardized and you can put in place fairly simple policies that specify what users should be able to do.  For web applications, you are going to allow binding to HTTP / HTTPS ports.  You are going to probably allow JDBC connections.  You can describe the allowed behaviors of many of your applications in a fairly concise way.  Thinking of security this way mitigates many of the exploits that take a creative path like the one above (setuid access, /proc file descriptor access, and memory manipulation).  Unless you have a pretty special web application, it’s safe to say it shouldn’t be doing that stuff 🙂

Interested in learning more?  The place I recommend to start is with the Fedora documentation.  Fedora and RHEL have some of the best SELinux policies and support in the industry.  The documentation covers everything from learning SELinux to debugging it.  Most importantly though, don’t get fooled into thinking all Linux distributions are the same.  While SELinux support is in the kernel, what really matters is the ecosystem of policies that exist.  In Fedora or RHEL, you get whitelists ready-made for a slew of well known systems like Apache.  In many other distros, you’d spend your time having to recreate that work for the standard systems and never have any time to focus on your application policies.  Probably not your best use of time and would be a daunting first experience with SELinux to say the least.

My last disclaimer is that even as powerful as SELinux is, I wouldn’t recommend on putting all your eggs in one basket when it comes to security.  Combine SELinux with other security measures and maintain traditional operational best practices to minimize your exposure (e.g. apply security updates, audit, etc).  In other words, use it as an enhancement to what you do today, not a replacement.

Well, if you’ve made it this far, I’ll assume you are a convert: Welcome to the world of SELinux and sleeping a little better at night!

Will Linux be Relevant in the Cloud?

Those that know me probably know where this is going.  However, for those of you that do not know me, I’ll state my stance up front:

I do not understand that logic behind the argument that the operating system will become less relevant in the cloud.  That is a fallacy.

I realize that this is a popular messaging approach for some vendors that have a minimal stake or understanding of the operating system.  However, please don’t get pulled into that marketing machine.  Let’s try and look at this from a more practical standpoint.  I often hear this reasoning brought up in the following context:

  • You don’t care what operating system you are running in the cloud.  You only have to care about your application.

I spend my days building a Platform as a Service (PaaS) offering (aka OpenShift) so I’m particularly sensitive to this argument.  While I agree that our goal on OpenShift is to make the developer experience as simple as possible, everything beyond the initial registration experience today is going to take you to interacting with the operating system at some level.  Beyond your personal machine setup, technologies like SSH are heavily used in PaaS offerings.  In addition to being the backbone of the mundane functions like supporting authentication and providing the underlying protocol for git transfers, it’s also often used directly by developers to support use cases like debugging.  When your applications are running on remote machines, being able to port forward, attach local debuggers and poke and prod from your laptop is critical.  Technologies in Linux like SSH make that possible.

Okay, so maybe SSH is important, but what other aspects of the operating system should you have to care about?  I guess that is where the disconnect is to me.  A PaaS, or any cloud service, should support and allow you to leverage common tools and standards to the greatest extent possible.  Why?  Because a lot of people already know them and it makes those users more productive.  Why on earth would your users want to go re-implement everything to your standard?  If you love rsync and want to use rsync over SSH, it should just work.  If you want to schedule something on your PaaS application, you should be able to use cron.  If you want to shell out and script something from your PaaS instance, you should be able to run a Bash / Perl script and have all the standard tools just work.

Now, don’t get me wrong, I don’t think you should be forced to use this stuff but it should be there as an option.  Why?  Because the tools that have worked in Linux for decades still work extremely well.  Maybe better tools will be written in Ruby or Python for your use case and I would encourage you to use them if that is the case.  Experimentation is critical, but it’s usually most productive if you are building on a stable base.  In the cloud, just like in the data center, that base is Linux.

So far, I’ve really only focused on the end user experience and hopefully it’s apparent that even causal cloud users are still going to interact with the operating system regularly.  Now if the end users of cloud services are still going to be exposed to the operating system, imagine the people that are building those services!  At the end of the day, your competitive edge will be knowing the operating system so that you don’t waste time rebuilding things that already exist.  On OpenShift for example, we use bleeding edge operating system functionality such as Linux control groups and filesystem polyinstantiation to help provide workload management and segment users.  We could have built something to do that, but if there is already a robust solution already in the operating system, why build something new?  We use SELinux for security because trying to build a rock solid security layer outside of the kernel is practically impossible.  We use quota for managing filesystem allocations, we use tc for traffic control, PAM for authentication support and the list goes on and on.  Using the functionality that exists in Linux allows us to focus on our goal of making the developer experience in the cloud easier.  We get to focus on challenges that the operating system does not solve like automatically scaling your applications.  Our understanding of Linux allows us to not waste time reinventing the wheel.

I’m not completely unreasonable.  I do agree that the cloud will affect how you use Linux to some extent.  The hardware layer is being abstracted to a large degree.  That means will probably spend more time using networking technologies like SSH than you will messing with SAN configurations.  The toolset you use from day to day will shift slightly, but it will be a slight shift, not a replacement.  But at the end of the day, the operating system will still be a critical tool in your toolbox.  And in the cloud, that operating system is Linux.

What PaaS means to virtualization

Over the next few years, we expect to see a tremendous focus on technologies to help isolate applications and workloads within a virtual machine. This is a bit of a controversial topic because in the minds of many, that is exactly what virtualization is geared to do. Now don’t get me wrong, I’m a huge fan of virtualization, but I think it’s merely one weapon in the challenge of increasing efficiency and utilization – not the silver bullet. I also believe that the next battle of efficiency and utilization is going to occur in the Platform as a Service (PaaS) space. To achieve the highest levels of efficiency yourself is a challenging undertaking, but consuming these new tools via a PaaS is easier. In this post I’m going to try and demystify what is going on under the hood of the best PaaS offerings out there.

First let’s take a look at the past to make sure we all start on the same page. In the olden days (i.e. pre Y2K), the most common situation was to have a single piece of hardware, that ran a single operating system and a single application. Given the commodity hardware on the market, this worked really well. In most companies, when you wanted a new application deployed, you bought a new piece of hardware and installed that application on it. Pretty simple.

In the following years though, things started changing. First, more companies started depending on IT for a competitive edge. This drove most investment into IT, but also more scrutiny on how IT was spending its money. Improving the utilization of your machines became a pretty hot topic. At the same time, hardware was improving at a ridiculous pace. The model of installing a single application per physical piece of hardware was getting wasteful. More often that not, this approach wasn’t even using 10% of the hardware capabilities. Whether you were a Silicon Valley startup or a Fortune 100 company, you didn’t want 90% of your investment just sitting there.

Then, virtualization entered into the equation. Virtualization allowed a complete abstraction between the operating system and the actual hardware that it was running on. This allowed a single piece of hardware to run lots of operating system instances. Without changing any of the applications, companies could now take racks of machines and consolidate them all onto a single piece of hardware. Hardware that once sat lonely and idle, barely waking up to process their requests was now sweating under the load.

These days though, virtualization is commonplace. Virtualized infrastructures are no longer a differentiator, they are an expectation. And with Infrastructure as a Service providers (IaaS), you can get virtual machines in seconds, with impressive pricing. Now that everyone has access to this technology and anyone can achieve that first big jump in utilization, what is going to separate you from the pack?

What is going to make you stand out is your understanding of how to squeeze every drop of performance out of each virtual machine. If you are still running a single virtual machine for every application in your environment, you might be achieving great utilization (i.e. your machines are sweating like crazy) but you are probably wasting a lot more than you should. This approach was required at first because the tooling to properly segment lots of different applications on a single machine just wasn’t there. That’s not the case anymore. With the advances in Linux with technologies like Security Enhanced Linux (SELinux), Kernel Namespaces and Linux Control Groups, it’s time to re-evaluate how we are doing stuff.

So how does all this relate to Platform as a Service (PaaS)? Just like IaaS offerings such as Amazon AWS made virtualization available to everyone, the vast majority of users are going to harness these new levels of efficiency through Platform as a Service (PaaS) offerings. PaaS offerings, like Red Hat’s OpenShift PaaS, are built on top of virtualization and exist to make both developers and operations more productive.

While I can’t speak for all the PaaS offerings out there, I can say that OpenShift exists to tweak and tune everything possible to help make each virtual machine as effective as possible. At the same time, we strive to make all those gory details invisible to the developers. Since we are built on an open source technology stack, the ability to do it yourself is there and I would encourage you to check out OpenShift Origin and get engaged if you are interested in the technology that is going to make virtualization even more powerful.