How Intel® Clear Containers protects against root kernel exploits like Dirty COW
21 Mar, 2017
By Eric Adams and John Andersen, Intel Corporation.
The Dirty COW exploit (CVE-2016-5195) is a race condition in the Linux* kernel that allows a local, unprivileged user to gain root access on a vulnerable system, and it can even be exploited from within a Docker* container. The vulnerability existed in the Linux kernel for nine years before it was discovered.
Concerns like this prevent many companies from running containers in a public cloud, because sensitive workloads like financial transactions or health records could be exposed to hackers. In this article, we demonstrate the Dirty COW exploit against Docker on an unpatched system using the standard Docker runc runtime, and then show how Intel® Clear Containers use Intel® VT to block this exploit and other kernel exploits.
How the exploit works
Because of a nine-year-old bug in how the kernel handles copy-on-write (COW) memory, it is possible to create a race condition. One thread repeatedly tries to write to a private mapping of a read-only file, which forces the kernel to create a modified private copy of the page. Meanwhile, a second thread calls madvise() to tell the kernel that this memory is not needed in the immediate future, so the kernel discards the private copy. By running these two threads simultaneously in a loop, the kernel is eventually tricked into writing the modification to the original in-memory copy of a file that should be read only. You can see some great videos that show exactly how this exploit works at https://www.youtube.com/watch?v=kEsshExn7aE.
A user named scumjr posted a proof of concept of the Dirty COW exploit working from within a Docker container at https://github.com/scumjr/dirtycow-vdso. Linux has a virtual dynamic shared object (vDSO) that allows user space programs to call common kernel functions like clock_gettime() without the overhead of a system call. Because the kernel maps the same vDSO pages into every process on the system, including processes outside the container, the Dirty COW race condition can be used to modify this shared memory object for the whole host. The exploit writes a reverse TCP shell payload into unused space in the vDSO, so the next time some root process on the host calls clock_gettime(), the payload runs and opens a shell with full root access on the host. The vDSO is then restored to its original contents while the root shell stays open.
This particular exploit allows a process in a Docker container to gain root access to the host system! This is obviously a very serious flaw that should concern all public cloud companies. Even Amazon AWS* was affected by this old kernel vulnerability. You can see a video of an unpatched Amazon AWS host system being compromised at https://www.youtube.com/watch?v=BwUfHJXgYg0.
Scumjr’s original code has been modified to work across different versions of Linux, and to write through /proc/self/mem on Ubuntu* instead of using ptrace(), because newer Docker versions apply a default seccomp profile that blocks the ptrace() system call inside containers.
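The vDSO that the exploit patches is visible in every process's memory map as an entry named [vdso]. As an illustration only, the hypothetical vdso_range helper below (not part of scumjr's code) reads a /proc/<pid>/maps listing on standard input and prints the address range of that mapping.

```shell
# Hypothetical helper (not part of scumjr's exploit): print the address
# range of the vDSO mapping from a /proc/<pid>/maps listing on stdin.
vdso_range() {
  # In each maps line, field 1 is the "start-end" address range and the
  # last field is the mapping name, which is "[vdso]" for the vDSO.
  awk '$NF == "[vdso]" { print $1 }'
}

# Usage: show where the vDSO is mapped in the current shell process.
#   vdso_range < /proc/$$/maps
```

The virtual address differs from process to process because of address space layout randomization, but in a runc container it is backed by the same kernel-provided pages as on the host.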
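To see the seccomp restriction mentioned above, you can read the Seccomp field of /proc/<pid>/status, which is 0 when seccomp is disabled, 1 in strict mode, and 2 when a filter such as Docker's default profile is installed. The seccomp_mode helper below is a hypothetical sketch that translates that field; it is not part of the exploit code.

```shell
# Hypothetical helper: translate the "Seccomp:" field of /proc/<pid>/status
# (read on stdin) into a readable mode name.
seccomp_mode() {
  # Match the exact "Seccomp:" key so that related fields on newer kernels,
  # such as "Seccomp_filters:", are ignored.
  awk '$1 == "Seccomp:" {
    if ($2 == 0) print "disabled"
    else if ($2 == 1) print "strict"
    else if ($2 == 2) print "filter"
    else print "unknown"
  }'
}

# Usage: read the status of a process inside a runc container.
#   sudo docker run --rm --runtime=runc ubuntu cat /proc/self/status | seccomp_mode
```

With Docker's default seccomp profile in place, the runc container should report filter mode.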
How Intel® Clear Containers help protect against Dirty COW
A typical container runtime uses cgroups and namespaces to isolate processes from each other, but all containers still share the host kernel. This is why kernel vulnerabilities are security risks for any container runtime like Docker. Intel® Clear Containers use an alternative Docker runtime called Clear Container OCI Runtime (cor) to quickly launch a very lightweight virtual machine that uses Intel® Clear Linux as its guest OS. The VM isolates containers using Intel® VT, which provides a much stronger boundary than the kernel alone. Each VM gets its own memory region, so compromised files can’t affect the host or other containers on the system. This effectively helps prevent a guest OS from breaking outside of its walled garden, and securing containers this way is absolutely necessary in multi-tenant environments.
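Namespaces give each container its own view of process IDs, mounts, networking, and so on, but they are a feature of the one shared kernel. One way to see this is to list the namespaces of a process: each entry in /proc/<pid>/ns is a symbolic link such as pid:[4026531836], and two processes share a namespace when the numbers match. The list_namespaces helper below is a hypothetical sketch for the host side.

```shell
# Hypothetical helper: list the namespaces of a process (default: this shell).
# Each /proc/<pid>/ns/<type> entry is a symlink whose target names the
# namespace type and inode number, for example "pid:[4026531836]".
list_namespaces() {
  pid=${1:-$$}
  for ns in /proc/"$pid"/ns/*; do
    # ${ns##*/} strips the directory part, leaving the namespace type.
    printf '%s %s\n' "${ns##*/}" "$(readlink "$ns")"
  done
}

# Usage on the host:
#   list_namespaces
```

Comparing the host output with sudo docker run --rm --runtime=runc ubuntu ls -l /proc/self/ns shows different namespace numbers, yet uname -r prints the same kernel release in both places, which is why a kernel bug like Dirty COW can cross the container boundary.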
The first figure below shows an example of the Dirty COW exploit using the standard Docker runtime. The second figure shows how Intel® VT effectively helps prevent escapes like Dirty COW from happening.
Figure 1: Docker using runc
Figure 2: Docker using Intel® Clear Containers
The page tables that Intel® VT maintains for each virtual machine instance (extended page tables, or EPT) isolate guest OS memory from the host OS. This type of segregation helps prevent undiscovered kernel exploits from allowing container-to-container escapes and, more importantly, container-to-host escapes like the Amazon example above.
The VM used for Intel® Clear Containers is optimized to keep its memory footprint as small as possible. Features like DAX (direct access), which avoids keeping an extra copy of file data inside the VM, are used to offset some of the resource penalties of using a VM. Other features like qemu-lite remove PC-centric features, such as BIOS support, that are not needed for running and protecting containers. You can read more about these optimizations at https://clearlinux.org/documentation/clear-containers.html.
Security experts reading through the optimization features described in the link above might wonder whether container-to-container escapes are still possible through kernel samepage merging (KSM). KSM works by identifying memory pages that are marked as mergeable and have identical contents, discarding the redundant copies, and having each process point to a single shared page. The good news is that Intel tested this scenario and found that Intel® VT isolates containers through the extended page tables in such a way that a modified vDSO object in one Clear Container does not affect other Clear Containers. Modifying the vDSO object with the Dirty COW exploit inside one Intel® Clear Container causes that container to point to the modified vDSO, while other Intel® Clear Container instances still reference the original, unmodified, read-only vDSO object. This effectively helps prevent container-to-container escapes.
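To check whether KSM is actually active on the host while you repeat this kind of test, you can read the counters the kernel exposes under /sys/kernel/mm/ksm. The ksm_status helper below is a hypothetical sketch, not part of Intel's test procedure. In the run file, 0 means KSM is stopped, 1 means it is running, and 2 means it is stopped and all merged pages are being unmerged.

```shell
# Hypothetical helper: summarize kernel samepage merging (KSM) on the host.
# The sysfs directory defaults to /sys/kernel/mm/ksm but can be passed in
# explicitly, for example to point at test data.
ksm_status() {
  dir=${1:-/sys/kernel/mm/ksm}
  run=$(cat "$dir/run")
  case "$run" in
    0) state="stopped" ;;
    1) state="running" ;;
    2) state="stopped, unmerging all pages" ;;
    *) state="unknown" ;;
  esac
  # pages_shared counts the merged pages in use; pages_sharing counts the
  # additional mappings that point at those merged pages.
  echo "ksm $state, pages_shared=$(cat "$dir/pages_shared"), pages_sharing=$(cat "$dir/pages_sharing")"
}

# Usage on the host:
#   ksm_status
```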
Demo of Dirty COW on Intel® Clear Containers
This demo runs on an Ubuntu 16.04.1 system that has not been patched with the Dirty COW fix. The following instructions are derived from https://github.com/01org/cc-oci-runtime/wiki/Installation but are modified to point to the older installation packages. This is necessary to see the exploit in action and, more importantly, to see how Intel® Clear Containers effectively isolate the exploit from other active containers and from the host itself.
Set up Docker and Intel® Clear Containers for Ubuntu* 16.04.1
First, set up an Ubuntu 16.04.1 system using the 64-bit version, and do not install any updates, so that you don't accidentally install a patched kernel. You can download this version at http://old-releases.ubuntu.com/releases/xenial/. If an updated kernel does get installed, choose the earlier kernel from Advanced options in the GRUB boot menu when booting the system. The following instructions install Intel® Clear Containers along with the version of Docker that was available when the exploit first came out in October 2016.
- Install the older version 2.0 Clear Container runtime:
$ sudo sh -c "echo 'deb http://download.opensuse.org/repositories/home:/clearlinux:/preview:/clear-containers-2.0/xUbuntu_16.04/ /' >> /etc/apt/sources.list.d/cc-oci-runtime.list"
$ curl -fSL http://download.opensuse.org/repositories/home:/clearlinux:/preview:/clear-containers-2.0/xUbuntu_16.04/Release.key | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install cc-oci-runtime
- Install Docker 1.12.1:
$ sudo apt-get install apt-transport-https ca-certificates
$ curl -fsSL https://yum.dockerproject.org/gpg | sudo apt-key add -
$ sudo add-apt-repository "deb https://apt.dockerproject.org/repo/ ubuntu-$(lsb_release -cs) main"
$ sudo apt-get update
$ sudo apt-get install docker-engine=1.12.1-0~xenial
- Configure Docker to use Clear Containers by default:
$ sudo mkdir -p /etc/systemd/system/docker.service.d/
$ sudo nano /etc/systemd/system/docker.service.d/clr-containers.conf
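Add the following lines to the file. The empty ExecStart= line clears the command inherited from docker.service, which systemd requires before a drop-in can set a new ExecStart= for a regular service:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -D --add-runtime cor=/usr/bin/cc-oci-runtime --default-runtime=cor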
- Downgrade the Clear Container guest kernel to a version affected by Dirty COW:
$ sudo apt-get install rpm2cpio
$ cd /usr/share/clear-containers
$ sudo wget 'https://download.clearlinux.org/releases/10000/clear/x86_64/os/Packages/clear-containers-image-9810-4.x86_64.rpm'
$ sudo rpm2cpio clear-containers-image-9810-4.x86_64.rpm | sudo cpio -idmv
$ sudo rm -f clear-containers.img
$ sudo mv ./usr/share/clear-containers/clear-* .
$ sudo wget 'https://download.clearlinux.org/releases/10000/clear/x86_64/os/Packages/linux-container-4.5-49.x86_64.rpm'
$ sudo rpm2cpio linux-container-4.5-49.x86_64.rpm | sudo cpio -idmv
$ sudo rm -f linux-container-4.5-49.x86_64.rpm
$ sudo cp ./usr/share/clear-containers/vmlinux-4.5-49.container ./
$ sudo rm -rf ./usr
$ sudo rm -f vmlinux.container
$ sudo ln -s vmlinux-4.5-49.container vmlinux.container
- Restart the Docker systemd service:
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
- Test that the Docker runc runtime and Clear Container cor runtime both work:
$ sudo docker run --rm -ti --runtime=runc ubuntu
$ sudo docker run --rm -ti --runtime=cor ubuntu
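Before moving on, you may want to confirm that each setup step took effect. The sketch below is a set of hypothetical helper functions, not part of the installation guide: default_runtime parses docker info output (assuming your Docker version prints a Default Runtime: line), guest_kernel_target reads the guest kernel symlink created in the downgrade step, and kernel_isolation compares the uname -r output of the host and of a container.

```shell
# Hypothetical post-setup checks; the function names are illustrative and
# not part of the Clear Containers installation guide.

# Print the default runtime from "docker info" output read on stdin.
# Assumes this Docker version prints a "Default Runtime:" line.
default_runtime() {
  awk -F': ' '$1 ~ /^ *Default Runtime$/ { print $2 }'
}

# Print the file that the guest kernel symlink points to.
guest_kernel_target() {
  dir=${1:-/usr/share/clear-containers}
  readlink "$dir/vmlinux.container"
}

# Compare the "uname -r" output of the host ($1) and of a container ($2).
kernel_isolation() {
  if [ "$1" = "$2" ]; then
    echo "shared host kernel"
  else
    echo "separate guest kernel"
  fi
}

# Usage on the demo system:
#   sudo docker info 2>/dev/null | default_runtime
#   guest_kernel_target
#   kernel_isolation "$(uname -r)" "$(sudo docker run --rm --runtime=runc ubuntu uname -r)"
#   kernel_isolation "$(uname -r)" "$(sudo docker run --rm --runtime=cor ubuntu uname -r)"
```

If the drop-in was picked up, default_runtime should print cor, and guest_kernel_target should print vmlinux-4.5-49.container. The runc container should report a shared host kernel because it runs directly on the host kernel, while the cor container should report a separate guest kernel because it boots its own kernel inside a VM.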
Run Dirty COW exploit
Next, build the exploit and container image from the demo repo (https://github.com/clearcontainers/cc-dirtycow-demo). Clone or extract these files to a known location, and follow the instructions below to build and run the container image. This container runs the setup script, which modifies the cc-oci-runtime guest OS kernel and replaces it with an older kernel that is vulnerable to Dirty COW. We do this to show how Intel® VT effectively blocks Dirty COW from affecting the host or other running containers even with an affected kernel in the guest OS.
From the directory that contains the demo files, build the exploit and a new container image called proc, which contains the exploit and is added to your local Docker image library:
# apt install build-essential nasm
# docker build -t proc .
Verify that you are running a kernel older than 4.4.0-45.66 to see the exploit on Ubuntu 16.04.1 LTS. Installing updates with apt-get will likely upgrade the kernel to a patched version. If that happens, reboot to the GRUB boot menu and select Advanced options to choose an older kernel to boot. If no older kernel is available, you will need to downgrade to a previous kernel. Use the following command to check the kernel version.
# uname -r
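Note that uname -r reports only the kernel ABI number (a release ending in a flavor such as -generic), not the full package version such as 4.4.0-45.66. On Ubuntu, /proc/version_signature includes the full version. The sketch below uses a hypothetical helper, is_pre_fix_kernel, that compares that version against 4.4.0-45.66 using GNU sort -V as an approximation of Debian version ordering.

```shell
# Hypothetical helper: succeed (exit status 0) if an Ubuntu kernel package
# version is older than 4.4.0-45.66, the patched version named above.
# GNU "sort -V" approximates Debian version ordering, which is adequate
# for version strings of this form.
FIXED_VERSION="4.4.0-45.66"

is_pre_fix_kernel() {
  version=$1
  if [ "$version" = "$FIXED_VERSION" ]; then
    return 1
  fi
  oldest=$(printf '%s\n%s\n' "$version" "$FIXED_VERSION" | sort -V | head -n 1)
  [ "$oldest" = "$version" ]
}

# Usage: /proc/version_signature holds a line of the form
# "Ubuntu <version>-<flavor> <upstream version>"; field 2 without the
# flavor suffix is the package version.
#   v=$(awk '{ print $2 }' /proc/version_signature | sed 's/-[a-z]*$//')
#   if is_pre_fix_kernel "$v"; then echo "vulnerable: $v"; else echo "patched: $v"; fi
```

To keep later apt-get runs from replacing the vulnerable kernel, you could also put the kernel packages on hold, for example with sudo apt-mark hold linux-generic linux-image-generic "linux-image-$(uname -r)", assuming the standard generic kernel flavor. Run sudo apt-mark unhold with the same package names when you are done with the demo.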
Finally, test the exploit from both the standard Docker runc runtime and from the Intel® Clear Container runtime.
# docker run --rm -ti --runtime=runc proc
# docker run --rm -ti --runtime=cor proc
The easiest way to verify the exploit is to create a file in /root from within the container using the echo command shown below. When using the standard Docker runc runtime, the file is created on the host system in /root with root permissions. That is the exploit! No file or directory owned by root should be writable in this way, especially from within a running container! When using the Intel® Clear Container runtime, only the guest OS's /root is modified; the host system and other Clear Containers running on the system are not affected. This type of isolation and protection is very important for public cloud companies, and it is how Intel® Clear Containers technology adds another security layer for more secure containers.
# echo DirtyCOW > /root/dirtycow.txt
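After running the echo command inside each container, you can check the host side with the hypothetical check_host_escape helper below, which simply reports whether the marker file exists. Because /root on the host is readable only by root, the check has to run with root privileges; otherwise it cannot see the file even when it exists.

```shell
# Hypothetical helper: report whether the Dirty COW marker file reached the
# host. The path defaults to the file created in this demo.
check_host_escape() {
  marker=${1:-/root/dirtycow.txt}
  if [ -e "$marker" ]; then
    echo "host modified: $marker exists"
  else
    echo "host unaffected: $marker not found"
  fi
}

# Usage on the host (requires bash for "declare -f"):
#   sudo bash -c "$(declare -f check_host_escape); check_host_escape"
```

Based on the results described above, the check should report a modified host after the runc test and an unaffected host after the cor test. Deleting the marker with sudo rm -f /root/dirtycow.txt between runs keeps each test independent.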
Figure 3: Docker runc container escape with Dirty COW
Figure 4: Docker with the Clear Containers runtime blocking Dirty COW
The Dirty COW vulnerability was in the kernel for nine years and, as demonstrated above, is a serious security concern for both public and private clouds. Technologies like Intel® Clear Containers add an extra layer of security backed by Intel® VT to help protect against existing and future, as-yet-undiscovered kernel exploits. They do this without sacrificing the speed and low memory utilization that make containers such an exciting technology for running cloud workloads.
- Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
- Intel, the Intel logo, Intel® Clear Containers, Intel® Clear Linux, and Intel® VT are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
- *Other names and brands may be claimed as the property of others.