A Stack Clash is a vulnerability in the memory management of several operating systems, including Linux. It can be exploited by attackers to corrupt memory of a privileged process in order to execute arbitrary code.
Bugs Gone Wild
Zero-Days are simply bugs gone wild. If a bug is not considered to be a security issue, it may not be patched, leaving an attacker with a well-documented security hole that is waiting to be exploited. This is exactly what happened with the recent CVE-2017-1000253, a high-severity vulnerability. It was patched in the Linux kernel in April 2015, but the fix was not backported to many long-term distributions because it was not considered a security threat. As a result, many enterprise distributions (such as Red Hat and CentOS) were vulnerable for a long time (over two years!) to, somewhat paradoxically, a “known” Zero-Day. Red Hat applied the patch only on August 1st, and CentOS on September 13th.
Discovering and patching bugs is obviously not enough. If a bug is not labeled a security issue, it may not get patched, leaving kernels, operating systems or applications vulnerable. Attackers, instead of hunting for Zero-Days on their own, can track these bugs in the kernel or in popular distributions to find a known Zero-Day and exploit it.
A Quick Recap
Stack Clash is an attack which allows an attacker to control the execution flow of another process. When the other process has higher privileges, the attack can be used to perform privilege escalation: the privileged process can be manipulated to execute arbitrary code.
Stack Clash is not a new attack. It was first discussed in 2005 in Large Memory Management Vulnerabilities, and in 2010 in Exploiting Large Memory Management Vulnerabilities in Xorg Server Running on Linux. These two articles demonstrated how Stack Clash can be used to perform privilege escalation in userspace. There are also examples (a, b, c) of stack clashes in the kernel. These attacks are relatively old, and were discussed before Linux added a security mechanism called a guard-page to protect against such endeavors.
Until recently, these attacks were perceived to be more hypothetical than practical. A few months ago (June 2017), Qualys released an advisory which included over a dozen CVEs along with proof-of-concept code. They demonstrated how this attack can be used to get root from an unprivileged (yet authenticated) user on multiple operating systems and architectures. Several vulnerabilities in the kernel, along with others in common userland software such as sudo and glibc, were patched as a result. The newer advisory (released in September) introduced CVE-2017-1000253: another kernel vulnerability and a similar exploit, for a patched kernel (with a larger guard-page of ~1MB).
These vulnerabilities rely heavily on how the kernel maps the memory of a process. This mechanism is probabilistic, meaning that an exploit may require thousands of tries before it succeeds. This translates into exploits that take several hours to succeed.
While it has not been demonstrated that this attack can be exploited remotely, it wouldn’t be wise to dismiss the possibility. After all, until recently it was believed that this attack is not really a “real world” issue, which caused the bug behind CVE-2017-1000253 not to be labeled as a security issue.
So wot is aw this Rabbit and Pork I ‘ear abaht clashin’?
There are numerous examples of exploiting a stack clash, and each consists of grueling minute details. These may get exhausting, rendering the reader unable to see the forest for the trees. In this post, I try to keep things as “human readable” as possible, without missing the overall picture.
When a process is executed, it is the job of the kernel to load the data into memory, initialize the stack, and perform many other operations. The way virtual memory is mapped when a process is executed depends on the OS, on how the kernel is configured, and on the executable itself. Naively, we can picture the virtual memory of a process as being made up of the stack, the binary (more accurately, segments that are loaded into memory such as .text), memory maps and the heap. The stack, situated below the kernel protected memory, grows downward, while the heap grows upward. When the stack or the heap keeps growing until one runs into the other, we say that there is a Stack Clash.
In 32-bit processors, the entire virtual address space of a process is 4GB. This is further divided into user space and kernel, so the user’s memory is limited to 3GB. The distance between the stack and the heap, in most 32-bit architectures, is 2GB at most. This is not a “huge sea to cross”, and Qualys indeed successfully demonstrated multiple vulnerabilities that allow for a stack clash on 32-bit architectures.
In 64-bit processors, the user space virtual memory size is 128TB(!!). Still, Qualys were able to demonstrate one successful stack clash which resulted in privilege escalation. This required two vulnerabilities: CVE-2017-1000366 and CVE-2017-1000379.
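To put the two address-space sizes side by side, here is a small arithmetic sketch. It assumes that the 3GB and 128TB figures above are binary units (GiB and TiB); the constant names are mine and carry no special meaning.

```python
# Back-of-the-envelope comparison of user-space sizes.
# Assumption: the "3GB" and "128TB" quoted in the text are binary units.
GIB = 2**30
TIB = 2**40

USER_SPACE_32 = 3 * GIB     # 32-bit: 4 GiB in total, 3 GiB left for user space
USER_SPACE_64 = 128 * TIB   # 64-bit: 2**47 bytes of user space

ratio = USER_SPACE_64 / USER_SPACE_32
print(f"64-bit user space is about {ratio:,.0f} times larger")
```

The distance an attacker has to cover grows by more than four orders of magnitude, which is why a 64-bit clash needed additional vulnerabilities to become practical.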
To actually clash the stack with the heap or a memory map, an attacker needs to make one or more of these grow. This can be done, for example, by making a process perform memory maps, by using large argv[] and envp[] arrays (which are allocated on the stack), or by performing recursive function calls.
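To get a feeling for how much room there is between the stack and its closest neighbour, the layout can be read from text in the format of /proc/&lt;pid&gt;/maps. The following is a minimal sketch: the sample addresses and the /usr/bin/example path are made up for illustration, and the function names are my own.

```python
# Hypothetical sample in the format of /proc/<pid>/maps (addresses are made up).
SAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:01 1234 /usr/bin/example
01c3b000-01c5c000 rw-p 00000000 00:00 0 [heap]
7f3a10000000-7f3a10021000 rw-p 00000000 00:00 0
7ffc1a2b3000-7ffc1a2d4000 rw-p 00000000 00:00 0 [stack]
"""

def parse_maps(text):
    """Return a list of (start, end, perms, name) tuples, one per mapping."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        name = " ".join(fields[5:])  # empty for anonymous mappings
        regions.append((start, end, fields[1], name))
    return regions

def gap_below_stack(regions):
    """Bytes between the bottom of [stack] and the nearest mapping below it."""
    stack_start = next(s for s, _, _, name in regions if name == "[stack]")
    ends_below = [end for _, end, _, _ in regions if end <= stack_start]
    return stack_start - max(ends_below)

gap = gap_below_stack(parse_maps(SAMPLE_MAPS))
print(f"{gap / 2**30:.1f} GiB free below the stack")
```

On a Linux machine, the same functions can be fed the contents of /proc/self/maps to inspect a live process.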
Clash Protection
When the stack pointer reaches the bottom of the stack, the kernel can expand the stack. This is done implicitly by the kernel, using a page-fault that is raised once the stack tries to grow below the bottom address. While the stack (and/or the mapped memory) keeps growing, there is a danger of the two clashing. When they clash, there won’t be a page-fault, because the memory below the stack will already be mapped.
To protect against a clash, there is the guard-page. A guard-page is located right below the stack. This page cannot be written to, and the process is terminated with a SIGSEGV once it is accessed. The following figure depicts the virtual memory layout including the guard-page.
In theory, the guard-page should(!) protect against a clash. In practice, there are methods to bypass this protection.
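Why a single guard-page is not airtight can be shown with a toy model. This is only a sketch under my own assumptions: the addresses are invented, the guard is exactly one 4KB page, and another mapping sits directly below it. It does not reproduce any kernel code.

```python
PAGE = 4096

# Hypothetical toy layout (all addresses are made up for illustration).
STACK_HIGH = 0x7FFF_0000_0000
STACK_LOW = STACK_HIGH - 32 * PAGE   # current bottom of the stack
GUARD_SIZE = PAGE                    # a single guard page
GUARD_LOW = STACK_LOW - GUARD_SIZE
MAP_LOW = GUARD_LOW - 256 * PAGE     # another mapping right below the guard

def region_of(addr):
    """Name the toy region that an address falls into."""
    if STACK_LOW <= addr < STACK_HIGH:
        return "stack"
    if GUARD_LOW <= addr < STACK_LOW:
        return "guard"      # touching this would end in SIGSEGV
    if MAP_LOW <= addr < GUARD_LOW:
        return "mapping"    # lands silently in other mapped memory
    return "unmapped"

sp = STACK_LOW  # stack pointer sitting at the very bottom of the stack

# A small frame: the first write below sp hits the guard page.
small = region_of(sp - 256)

# A frame larger than the guard: if only its far end is written,
# the access skips over the guard entirely.
large = region_of(sp - (GUARD_SIZE + 256))

print(small, large)
```

The guard only helps when every page on the way down is actually touched; one large allocation that is written at its far end never touches the guard at all.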
Guard-Page Circumvention
In their advisory, Qualys discusses a 4-stage attack: the Clash, the Run, the Jump and the Smash. All of these are steps that basically try to get the stack pointer past the guard-page and into another mapped memory region. I like to think of this attack from the end-goal perspective: guard-page circumvention. Generally speaking, a guard-page circumvention occurs in one of the following cases:
- Once the binary is loaded, part of it is already mapped to memory belonging to the stack:
  - All that is left for an attacker to do is to Smash
- The stack and the mapped memory are contiguous once the binary is loaded:
  - An attacker needs to Run the stack pointer to the bottom and Jump over the guard-page
  - Once done, the attacker can Smash away
- The stack does not clash with mapped memory upon load:
  - The attacker needs to perform a Clash (i.e., allocating memory or expanding the stack) before moving on with the next steps
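The three cases can be written down as a small decision function. This is my own simplification of the list above, not something taken from the Qualys advisory; the function name and parameters are hypothetical.

```python
def required_stages(stack_low, mapping_high, guard_size):
    """Return the stages an attacker still needs for a toy layout.

    stack_low    -- lowest address currently belonging to the stack
    mapping_high -- end (exclusive) of the highest mapping below the stack
    guard_size   -- size of the guard region directly below the stack
    """
    if mapping_high > stack_low:
        # Case 1: the mapping already overlaps memory belonging to the stack.
        return ["Smash"]
    if mapping_high >= stack_low - guard_size:
        # Case 2: contiguous, only the guard separates the two regions.
        return ["Run", "Jump", "Smash"]
    # Case 3: a gap remains and has to be closed first.
    return ["Clash", "Run", "Jump", "Smash"]

PAGE = 4096
print(required_stages(stack_low=0x7000_0000, mapping_high=0x7000_0000 - PAGE,
                      guard_size=PAGE))
```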
CVE-2017-1000253
This vulnerability is caused by the way the kernel allocates memory on 64-bit architectures when it loads a PIE binary. Because the kernel did not take large binaries into account, some segments of the binary could end up being loaded above the mapped memory. On 64-bit architectures, there is a gap between the stack and the memory map, which is guaranteed to be 128MB. Binaries with data segments that are larger than 128MB could effectively clash with the stack. This vulnerability, however, is not limited to binaries with large data segments. The PIE can also be called with a large amount (>1.5GB) of argument strings, which will cause it to be mapped right below the stack, triggering the same vulnerability.
Qualys demonstrated how this clash could be exploited to escalate privileges. An oversimplified explanation of the exploit (for more details, please read their advisory) is this: First, you need a suitable SUID binary (ping did the trick). The idea is to smash the binary’s .dynamic section, thus loading our own malicious code with higher privileges. To achieve this, the exploit uses the LD_DEBUG environment variable. This variable causes ld.so to allocate memory on the stack for each unknown option in LD_DEBUG. The .dynamic section can then be smashed with an offset to the pathname of the shared library we would like to load. The following figure illustrates the virtual memory of an exploited ping binary after triggering CVE-2017-1000253.
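Since the bug only concerns PIE binaries, a first triage step is to check whether a given executable is position independent. The sketch below reads the e_type field of the ELF header. Note the assumption: ET_DYN is used for PIE executables and for shared objects alike, so a match means "possibly PIE", nothing more. The function names are mine.

```python
import struct

ET_EXEC = 2  # executable linked for a fixed address
ET_DYN = 3   # position-independent executable or shared object

def elf_type(header: bytes) -> int:
    """Return the e_type field from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] (EI_DATA) selects byte order: 1 = little, 2 = big endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_type is the 16-bit field that follows the 16-byte e_ident array.
    return struct.unpack_from(fmt, header, 16)[0]

def may_be_pie(header: bytes) -> bool:
    """True if the ELF header describes an ET_DYN object."""
    return elf_type(header) == ET_DYN

# Synthetic 64-bit little-endian header, built only for demonstration.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<H", ET_DYN)
print(may_be_pie(sample))
```

For a real file, pass the first 18 bytes read in binary mode, for example from the ping binary used in the demo.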
DIY Container Demo
Seeing this attack in action inside a container takes a few steps. This demo relies on the POC code that performs the exploit on CentOS.
First, you will need to set up a CentOS machine with a vulnerable kernel:
- Install CentOS 7
- Install Docker
- Edit the /etc/yum.repos.d/CentOS-Vault.repo file:
  - Change each entry’s path to include 7.3.1611:
    baseurl=http://vault.centos.org/7.3.1611/os/$basearch/
    baseurl=http://vault.centos.org/7.3.1611/updates/$basearch/
    …
  - Enable all the entries:
    enabled=1
- Update the repo list
yum repolist
- Install a vulnerable kernel
yum install kernel-3.10.0-514.26.1.el7.x86_64
- Check the kernel ID (in the following example, it is 0):
# awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
0 : CentOS Linux (3.10.0-514.26.1.el7.x86_64) 7 (Core)
1 : CentOS Linux (3.10.0-514.26.1.el7.x86_64) 7 (Core) with debugging
2 : CentOS Linux 7 (Core), with Linux 3.10.0-229.20.1.el7.x86_64
- Set the default entry for grub:
grub2-set-default 0
- Make sure it was changed:
# grub2-editenv list
saved_entry=0
- Reboot
shutdown -r now
- Make sure the vulnerable kernel was loaded
# uname -a
Linux devSDces71b305-vm0 3.10.0-514.26.1.el7.x86_64 #1 SMP Thu Jun 29 16:05:25 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Cool, we have a vulnerable kernel. Now we can run a container as a non-root user, and see it execute privileged operations inside the container. For this demo, I made a small change to the loaded library given in the POC code: it writes HACKED to the /root/hacket.txt file.
I packed the exploit code, along with a vulnerable ping binary (for which I turned on the SUID flag, making it run as root), into an image. All we have to do now is run a container as a non-root user, and see that the exploit successfully writes data to the /root folder inside the container (this may take a few hours):
As you probably noticed, our privileged code has a limited set of capabilities. This happens because Docker, by default, limits the bounding set of capabilities the process may have. If we decode this capability set, we will notice that these are indeed the capabilities given to the process by Docker.
# capsh --decode=00000000a80425fb
0x00000000a80425fb=cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_mknod,cap_audit_write,cap_setfcap
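The capability mask is a plain bit field, so it can also be decoded by hand. The sketch below does so using the capability numbering from the kernel’s linux/capability.h as I know it, listed up to cap_audit_read (bit 37); newer kernels define further bits that this table does not cover.

```python
# Capability names indexed by bit number (assumed to match linux/capability.h).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read",
]

def decode_caps(mask):
    """Return the names of all capabilities whose bit is set in mask."""
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]

caps = decode_caps(0x00000000A80425FB)
print(",".join(caps))
```

Notably absent from the result are heavyweight capabilities such as cap_sys_admin and cap_sys_module, which is what keeps root inside the container considerably weaker than root on the host.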
Clashing in a Container – Final Thoughts
Stack Clash is another indication that containers are often more secure than virtual machines. Containers, as we all already know (and love), are much “thinner” than virtual machines. They are made to perform a specific task. Therefore, the attacker’s attack surface (after gaining access to an unprivileged user in a container) is much smaller: There are fewer SUID files or privileged processes running. Privilege escalation to root does not necessarily mean a compromise of the host. To achieve that, the attacker will need to run a kernel exploit in order to escape to the underlying host.
Another advantage is that containers can also run in a user namespace. If this is the case, even a successful exploit that makes the user root inside the container will not make it root on the host.
Special attention should be given to internal threats. Developers or operations personnel may have the ability to run unprivileged containers, or even to access them (in order to debug, for example). This will allow such a person to exploit this kind of vulnerability and gain privileged access to containers. To prevent this from happening, a container running as a non-root user should not include any SUID / SGID files or privileged processes. Without these, the Stack Clash cannot result in privilege escalation. Also, a container should not have any impactful access to privileged resources. For example, don’t rely on setting root as the owner of a file to protect it against non-root users running inside a container with such a file mounted.
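How a user namespace turns container root into an ordinary host user can be seen from the format of /proc/&lt;pid&gt;/uid_map, where each line holds a starting ID inside the namespace, the matching starting ID outside, and a range length. The sketch below translates a container UID to a host UID; the sample mapping (0 to 100000, length 65536) is a made-up example and the function names are mine.

```python
# Hypothetical uid_map content: container UIDs 0..65535 map to host 100000...
SAMPLE_UID_MAP = "         0     100000      65536\n"

def parse_uid_map(text):
    """Parse uid_map lines into (inside_start, outside_start, length) tuples."""
    return [
        tuple(int(field) for field in line.split())
        for line in text.splitlines()
        if line.strip()
    ]

def host_uid(container_uid, mappings):
    """Translate a UID inside the namespace to the UID seen by the host."""
    for inside, outside, length in mappings:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None  # this UID has no mapping in the namespace

mappings = parse_uid_map(SAMPLE_UID_MAP)
print(host_uid(0, mappings))
```

With such a mapping, UID 0 in the container is an unprivileged UID on the host, so gaining root through the exploit stays contained.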
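Checking an image for leftover SUID or SGID files is easy to automate. The following is a minimal sketch that walks a directory tree and reports regular files with either bit set; the function name is mine, and a real audit would also want to look at file capabilities, which this does not cover.

```python
import os
import stat

def find_setid_files(root):
    """Yield (path, flags) for regular files under root with SUID or SGID set."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable, skip it
            if not stat.S_ISREG(mode):
                continue
            flags = []
            if mode & stat.S_ISUID:
                flags.append("SUID")
            if mode & stat.S_ISGID:
                flags.append("SGID")
            if flags:
                yield path, flags
```

Running it over the root filesystem of an image, for example with `for path, flags in find_setid_files("/"): print(path, flags)`, gives the list of binaries worth removing or stripping of their bits.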
And most importantly, never rely on patches. Make your containers as thin as possible, and don’t forget to monitor them in runtime: a non-privileged user suddenly changing to root is an event worth noting, and the affected container is possibly worth terminating.
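As a rough illustration of such runtime monitoring, the effective UID of a process can be read from the Uid: line of /proc/&lt;pid&gt;/status, which lists the real, effective, saved and filesystem UIDs. The sketch below compares two snapshots. The sample snapshots are invented, the function names are mine, and a legitimately executed SUID binary would trigger the same signal, so a real monitor needs more context than this.

```python
def uids_from_status(status_text):
    """Return (real, effective, saved, fs) UIDs from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            real, effective, saved, fs = (int(v) for v in line.split()[1:5])
            return real, effective, saved, fs
    raise ValueError("no Uid: line found")

def escalated_to_root(before, after):
    """True if the effective UID was non-zero before and is 0 afterwards."""
    return uids_from_status(before)[1] != 0 and uids_from_status(after)[1] == 0

# Hypothetical snapshots of the same process, taken at two points in time.
BEFORE = "Name:\tping\nUid:\t1000\t1000\t1000\t1000\n"
AFTER = "Name:\tping\nUid:\t1000\t0\t0\t0\n"

print(escalated_to_root(BEFORE, AFTER))
```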