Copy Fail Bug Hands Linux Root to Anyone With a Shell

A nine-year-old logic flaw sitting quietly in the Linux kernel just became one of the most uncomfortable security stories of the year. On April 29, 2026, security firm Theori went public with CVE-2026-31431, a local privilege escalation vulnerability that affects essentially every Linux distribution shipped since 2017. Two days later Microsoft Defender Security Research published its own deep dive and gave the bug a memorable handle: Copy Fail. Within a week, working exploit code was circulating, CISA had pulled the alarm cord, and federal agencies had a deadline of May 15 to patch or get off the network.

A 732-byte path from container to host

The mechanics of Copy Fail are both elegant and unnerving. The flaw lives in the algif_aead module of AF_ALG, the userspace cryptographic API that the Linux kernel exposes through a socket interface. By chaining an AF_ALG socket with the splice() system call, an unprivileged local user can perform a controlled four-byte write into the kernel page cache. That cache holds in-memory copies of executable files, including setuid root binaries like /usr/bin/su. Replace a few critical bytes there, wait for any privileged process to run the corrupted version, and a regular user account suddenly owns the box.

Researchers demonstrated the entire exploit in roughly ten lines of Python. The total payload size that achieves root has been measured at 732 bytes. There are no buffer overflow theatrics here, no exotic memory corruption craft, just a logic bug in how the kernel hands off cryptographic operations. The exploit blends into normal system activity because it uses standard system calls. It leaves no disk artifacts because the modification happens entirely in memory. The page cache simply gets quietly poisoned, and the next privileged execution does the rest.

The CVSS score undersells the operational reality

The National Vulnerability Database lists Copy Fail at CVSS 7.8, which puts it firmly in High territory but not Critical. The number is technically defensible. The bug requires local code execution as a starting point, so it is not a remote zero-click catastrophe. An attacker needs a foothold first, whether that foothold comes from an SSH session, a compromised CI/CD job, a malicious notebook in a data science environment, or a container running untrusted code. Without that initial access, Copy Fail does nothing.
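
For readers who want to see where a 7.8 comes from, the CVSS v3.1 base score formula can be worked through by hand. The sketch below assumes the vector AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H, a common shape for local privilege escalation bugs. That vector is an assumption for illustration; the coverage above quotes only the score, not the vector. The weights and the rounding rule are taken from the CVSS v3.1 specification.

```python
import math

# CVSS v3.1 metric weights for the ASSUMED vector
# AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H (the vector is not quoted in the article).
AV_LOCAL = 0.55          # Attack Vector: Local
AC_LOW = 0.77            # Attack Complexity: Low
PR_LOW_UNCHANGED = 0.62  # Privileges Required: Low, Scope: Unchanged
UI_NONE = 0.85           # User Interaction: None
CIA_HIGH = 0.56          # Confidentiality / Integrity / Availability: High


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 Roundup rule."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector whose Scope is Unchanged."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)      # impact sub-score
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_LOCAL, AC_LOW, PR_LOW_UNCHANGED, UI_NONE, CIA_HIGH, CIA_HIGH, CIA_HIGH
)
print(score)  # 7.8
```

The Local attack vector weight (0.55 rather than 0.85 for Network) is what keeps the result under the 9.0 Critical threshold, which is the gap between score and operational risk that this section describes.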

The trouble is that local code execution is not nearly the barrier it sounds like in modern infrastructure. Cloud Linux workloads run untrusted code as a feature, not a bug. Continuous integration pipelines execute pull request scripts dozens of times an hour. Multi-tenant Kubernetes clusters share a single host kernel across customer workloads. In all of those settings, the gap between local code execution and root on the host is exactly the gap Copy Fail closes. Microsoft was explicit in its advisory, calling the flaw particularly dangerous in cloud, CI/CD, and Kubernetes environments where untrusted execution is routine. A compromised container becomes a compromised node. A compromised node becomes a compromised tenant. A compromised tenant becomes a multi-customer incident.

Almost every distribution made the list

The blast radius reads like a directory of the Linux ecosystem: Ubuntu, Red Hat Enterprise Linux, SUSE, Amazon Linux, Debian, Fedora, Arch Linux, and basically anything downstream of those projects. Microsoft specifically called out Ubuntu 24.04 LTS, Amazon Linux 2023, RHEL 10.1, and SUSE 16 in its writeup. If a server is running a kernel that descends from anything released in the last nine years and has not been patched in the past two weeks, it should be treated as exposed.

Patch availability has been uneven. Arch Linux, Fedora, and Amazon Linux had updates ready at the moment of disclosure, partly because Theori notified the Linux kernel security team five weeks before going public. Other vendors moved more slowly. SUSE, Red Hat, and Ubuntu published mitigation guidance by April 30, with full kernel updates rolling out through the following week. The lag between disclosure and downstream availability is exactly the window attackers exploit, which is why CISA added Copy Fail to its Known Exploited Vulnerabilities catalog on May 1 with a federal compliance deadline of May 15.

The mitigations that buy time when patches lag

For systems that cannot reboot into a new kernel immediately, defenders have a few interim options. The most direct is to block AF_ALG socket creation entirely, which removes the attack surface at the cost of breaking any legitimate userspace consumer of the kernel crypto API. Another approach is to disable the affected algif_aead module specifically, which is narrower but requires verifying that no production workload depends on it. Mandatory access control frameworks like SELinux and AppArmor can theoretically blunt the exploit, but only when configured tightly enough that unconfined processes cannot reach the AF_ALG socket family. Default profiles on most distributions do not meet that bar, so the protection is largely theoretical without explicit policy work.
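
Before and after applying either mitigation, it helps to confirm what a given host actually exposes. The following is a minimal sketch, not a vendor-supplied check: the function names are hypothetical, it only reads /proc/modules and attempts to open (and immediately close) an AF_ALG socket, and it performs no cryptographic operation. Two caveats are assumptions worth verifying per distribution: a module that is not currently loaded may still be loaded on demand, and a kernel with the code built in will not list it in /proc/modules at all.

```python
import socket


def loaded_modules(proc_modules_text: str) -> set:
    """Return module names from text in /proc/modules format.

    Each line starts with the module name, followed by size, use count, etc.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def algif_aead_loaded(proc_modules_text: str) -> bool:
    """True if algif_aead appears as a loaded module in the given text."""
    return "algif_aead" in loaded_modules(proc_modules_text)


def read_proc_modules(path: str = "/proc/modules") -> str:
    """Read the module list; return an empty string if it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


def af_alg_socket_allowed() -> bool:
    """Try to create an AF_ALG socket as the current user.

    False means creation was refused or the family is not supported here,
    which is the outcome a blocking mitigation is meant to produce.
    """
    family = getattr(socket, "AF_ALG", None)  # only defined on Linux builds
    if family is None:
        return False
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError:
        return False
    sock.close()
    return True


# Example with sample text in /proc/modules format (values are illustrative).
sample = (
    "algif_aead 16384 0 - Live 0x0000000000000000\n"
    "af_alg 32768 1 algif_aead, Live 0x0000000000000000\n"
)
print(algif_aead_loaded(sample))  # True
```

On a live host, `algif_aead_loaded(read_proc_modules())` and `af_alg_socket_allowed()` give a quick before-and-after comparison. A `False` from both is a useful signal but not proof of safety, for the reasons noted above.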

The harder operational lesson involves containers. A common reflex when a kernel CVE drops is to update application container images and call it done. Copy Fail makes that reflex actively dangerous. The vulnerable kernel lives on the host, not in the container, and a container with current userspace can still be the launching pad for a host compromise. Microsoft's guidance is unambiguous on this point. Treat any container remote code execution as a potential host compromise, recycle nodes aggressively after any indicator of compromise, and patch the underlying host kernel rather than just the workload images. In Kubernetes terms, that means worker node operating system updates, not Helm chart bumps.

What this episode says about cloud security assumptions

Two threads from this incident deserve longer reflection. The first is how long the bug sat undiscovered. The flawed code was committed to the kernel in 2017, which means it survived nine years of code review, fuzzing campaigns, academic scrutiny, and offensive research before Theori finally pulled it apart. That is not a Linux-specific failure. Every large kernel codebase carries similar latent bugs. It is a reminder that mature, widely audited software still hides serious flaws, and that confidence in a codebase should grow slowly with age, never automatically.

The second thread is the architectural implication for shared cloud environments. The container security model leans heavily on the idea that namespaces, cgroups, seccomp profiles, and unprivileged user mappings provide meaningful isolation between workloads on a shared host. Copy Fail demonstrates that a single kernel logic flaw can render most of that scaffolding moot. When the kernel itself is the trust boundary, any kernel bug is a tenant boundary bug. Defense in depth helps, and hardware-backed isolation through technologies like confidential computing helps more, but the underlying point stands. Multi-tenant Linux is only as secure as the kernel running underneath it, and the kernel is a moving target with a long memory of old commits.

CISA's two-week patch deadline will pass before most organizations finish their full audit. Microsoft Defender XDR has shipped detection rules for the exploit signatures, Kaspersky added its own detections by May 5, and the proof of concept code is already being adapted into other languages by anyone who wants to weaponize it. The patch is straightforward; the harder work is the inventory. Every Linux host, every container node, every CI runner, and every developer workstation running a kernel from 2017 or later needs to be checked, updated, and verified. Copy Fail is not the most sophisticated bug ever published, but it may be the most consequential reminder this year that old code never really stops being a security problem.
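
The inventory step can be sketched as a simple triage over collected kernel release strings. Everything concrete below is hypothetical: the `Host` record, the host names, and the release strings are placeholders, and the set of patched releases would have to come from each vendor's advisory. The sketch uses exact matching rather than version comparison, on the assumption that distributions backport fixes and version ordering alone is therefore unreliable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str            # hypothetical inventory identifier
    role: str            # e.g. "k8s-node", "ci-runner", "workstation"
    kernel_release: str  # as reported by `uname -r` on that machine


def triage(hosts, patched_releases):
    """Split hosts into those on a known-patched kernel and everything else.

    patched_releases is an operator-supplied set of exact release strings
    taken from vendor advisories; anything not in it is flagged for review.
    """
    report = {"patched": [], "needs_update": []}
    for host in hosts:
        if host.kernel_release in patched_releases:
            report["patched"].append(host.name)
        else:
            report["needs_update"].append(host.name)
    return report


# Placeholder data for illustration only; these are not real fixed versions.
PATCHED = {"6.8.0-example-fixed", "6.1.0-example-fixed"}
fleet = [
    Host("node-a", "k8s-node", "6.8.0-example-fixed"),
    Host("runner-b", "ci-runner", "6.8.0-example-old"),
    Host("laptop-c", "workstation", "6.1.0-example-fixed"),
]

result = triage(fleet, PATCHED)
print(result)
```

When collecting release strings, note that a value gathered from inside a container (for example with `os.uname().release`) reflects the host's kernel, since containers share it. That is why the inventory has to be keyed on nodes rather than on images.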