Linux Capabilities 101: Fundamentals and Enumeration Guide

Introduction

Linux capabilities split the traditional root privilege into fine-grained units that can be independently granted to processes. Instead of a binary UID 0 versus non-root model, capabilities allow a process to hold just enough power to perform a specific task (for example, binding to a low port, changing the system clock, or loading kernel modules) while remaining otherwise unprivileged.

Understanding capabilities is essential for modern privilege-escalation research, secure system hardening, and container security. Attackers routinely hunt for binaries that retain elevated capabilities, and defenders must know how to enumerate and mitigate them.

Real-world relevance includes:

Container runtimes that drop capabilities to reduce the attack surface.
System-wide policies (e.g., systemd unit files) that grant only the required capabilities.
Bug bounty programs that reward discovery of mis-configured setuid binaries with retained capabilities.

Prerequisites

Basic command-line proficiency (ls, chmod, id, etc.).
Understanding of traditional Unix file permissions, UID/GID, and the setuid/setgid bits.
Familiarity with privilege-escalation concepts such as sudo, setuid binaries, and kernel exploits.

Core Concepts

Linux capabilities are defined in /usr/include/linux/capability.h. Each capability is identified by a bit number, and a capability set is represented as a bitmask of those bits. Each capability gates a specific group of privileged operations. Some of the most common capabilities include:

CAP_NET_BIND_SERVICE - Bind to ports < 1024.
CAP_SYS_TIME - Set system clock.
CAP_SYS_MODULE - Load/unload kernel modules.
CAP_DAC_OVERRIDE - Bypass file read/write permission checks.
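
Because a capability set is just a bitmask, a mask can be turned back into names with a small lookup. The following is a minimal sketch in POSIX sh; the helper name decode_caps is made up for illustration, and the name table assumes the bit numbering of linux/capability.h on a recent kernel (bits 0 through 40).

```shell
#!/bin/sh
# Capability names in bit order (bit 0 = chown ... bit 40 = checkpoint_restore).
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps MASK: print the names of all bits set in MASK (e.g. 0x2000).
# Hypothetical helper; runs in a subshell so its variables stay local.
decode_caps() (
    mask=$(( $1 ))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}cap_$name"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
)

decode_caps 0x2000      # prints cap_net_raw
decode_caps 0x2010402   # the four capabilities listed above
```

The second call prints cap_dac_override,cap_net_bind_service,cap_sys_module,cap_sys_time, which are bits 1, 10, 16, and 25.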

Each process carries three core capability sets (the bounding and ambient sets, which constrain and extend them, come up later):

Permitted - The superset of capabilities a process may assume during its lifetime.
Effective - The subset of the permitted set that is currently active. System calls check the effective set.
Inheritable - Capabilities that can be passed across an execve() if the target binary's file capabilities allow it.

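When a process executes a file, the kernel computes the new capability sets as follows (simplified from capabilities(7); the special cases for UID 0 are omitted):

new_ambient = file_is_privileged ? 0 : old_ambient
new_permitted = (old_inheritable & file_inheritable) | (file_permitted & old_bounding) | new_ambient
new_effective = file_effective ? new_permitted : new_ambient
new_inheritable = old_inheritable

Here file_is_privileged means the file is setuid, setgid, or carries file capabilities. Note that the inheritable set of the process is preserved; it is not replaced by the file's inheritable set.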
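
The execve() transformation can be modelled with plain bitmask arithmetic. The sketch below follows the rules documented in capabilities(7) and ignores the UID 0 special cases; exec_caps and its argument order are hypothetical, chosen only for this illustration.

```shell
#!/bin/sh
# exec_caps P_INH P_BND P_AMB F_PERM F_INH F_EFF
# Model of the capability sets after execve(). P_* are the caller's sets,
# F_* are the file's sets, F_EFF is the file effective flag (0 or 1).
exec_caps() (
    p_inh=$(( $1 )); p_bnd=$(( $2 )); p_amb=$(( $3 ))
    f_perm=$(( $4 )); f_inh=$(( $5 )); f_eff=$6

    # Simplification: a file with any capability bits counts as privileged,
    # which clears the ambient set (setuid/setgid files are not modelled).
    if [ "$f_perm" -ne 0 ] || [ "$f_inh" -ne 0 ]; then
        new_amb=0
    else
        new_amb=$p_amb
    fi

    new_perm=$(( (p_inh & f_inh) | (f_perm & p_bnd) | new_amb ))
    if [ "$f_eff" -eq 1 ]; then new_eff=$new_perm; else new_eff=$new_amb; fi
    new_inh=$p_inh   # the process inheritable set is preserved

    printf 'permitted=0x%x effective=0x%x inheritable=0x%x ambient=0x%x\n' \
        "$new_perm" "$new_eff" "$new_inh" "$new_amb"
)

CAP_NET_RAW=0x2000        # bit 13
FULL_BND=0x1ffffffffff    # bits 0..40 set

# Unprivileged caller executes a file tagged cap_net_raw+ep.
exec_caps 0 $FULL_BND 0 $CAP_NET_RAW 0 1
# Caller holding cap_net_raw as an ambient capability executes an untagged file.
exec_caps $CAP_NET_RAW $FULL_BND $CAP_NET_RAW 0 0 0
```

The first call prints permitted=0x2000 effective=0x2000 inheritable=0x0 ambient=0x0; the second prints 0x2000 for all four sets, showing how ambient capabilities survive an exec of an ordinary binary.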

Understanding this flow is crucial when hunting for binaries that unintentionally grant privileged capabilities.

What Linux capabilities are and why they exist

The classic Unix model grants all privileges to any process with UID 0. This all-or-nothing approach makes it difficult to limit the impact of a compromised service. Capabilities were introduced in Linux 2.2 and refined in the 2.6 series (file capabilities arrived in 2.6.24) to provide a least-privilege mechanism without rewriting every daemon to drop root after startup.

Benefits include:

Reduced attack surface - a web server can bind to port 80 using CAP_NET_BIND_SERVICE while lacking the ability to change passwords.
Better compatibility with containers - namespaces isolate resources, but capabilities still need fine-grained control.
Granular auditability - each capability can be logged in auditd, making forensic analysis clearer.

From a security standpoint, capabilities support a defence-in-depth strategy: even if an attacker gains code execution as a non-root user, they still need the right capability to perform high-impact actions.

The three capability sets: Effective, Permitted, Inheritable

Let's dive deeper into each set with concrete examples.

Permitted Set

The permitted set is the limiting superset of the capabilities a process may make effective. After execve() it is derived from three sources:

File capabilities of the binary being executed (masked by the bounding set).
Inheritable capabilities of the calling process that the file also lists as inheritable.
Ambient capabilities (covered later).

If a capability is not present in the permitted set, the process cannot add it to its effective set: the capset() call fails with EPERM.

Effective Set

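The effective set is the active subset used for permission checks. A process can drop capabilities from its effective set at runtime (e.g., via capset() or libcap's cap_set_proc()) without losing the ability to re-enable them later, as long as they remain in the permitted set. Note that prctl(PR_CAPBSET_DROP) is a different operation: it removes a capability from the bounding set, which limits what future execve() calls can grant.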
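
The gap between the permitted and effective sets can be read straight from the Cap* lines of /proc/[pid]/status. The following sketch computes which capabilities a process holds in reserve (permitted but not currently effective). The helper names field and dormant and the sample values are made up for illustration.

```shell
#!/bin/sh
# field FILE NAME: print the hex value of one Cap* line from a status-format file.
field() { awk -v k="$2:" '$1 == k { print $2 }' "$1"; }

# dormant FILE: capabilities that are permitted but not effective, i.e. the
# ones the process could switch back on with capset().
dormant() (
    prm=$(( 0x$(field "$1" CapPrm) ))
    eff=$(( 0x$(field "$1" CapEff) ))
    printf '0x%x\n' $(( prm & ~eff ))
)

# Made-up status excerpt: cap_net_bind_service (0x400) and cap_net_raw (0x2000)
# are permitted, but only cap_net_bind_service is effective right now.
sample=$(mktemp)
cat > "$sample" <<'EOF'
CapInh: 0000000000000000
CapPrm: 0000000000002400
CapEff: 0000000000000400
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
EOF
dormant "$sample"    # prints 0x2000 (cap_net_raw is held in reserve)
rm -f "$sample"
```

On a live Linux system the same function can be pointed at /proc/self/status or /proc/<pid>/status.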

Inheritable Set

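The inheritable set is the older mechanism for passing capabilities across execve(). It lets a parent hand selected capabilities to a child, but only when the executed binary's file inheritable set names the same capabilities; for ordinary binaries without file capabilities it has no effect. Modern practice prefers ambient capabilities, but some setups (for example pam_cap) still rely on inheritable.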

Example scenario:

# Assume a binary /usr/local/bin/pinger has file capabilities:
# cap_net_raw+ep
# The parent process runs as a regular user but has CAP_NET_RAW in its inheritable set.
# When it execs /usr/local/bin/pinger, the resulting process gets:
# Permitted: CAP_NET_RAW (from file) | (parent_inheritable & file_inheritable)
# Effective: CAP_NET_RAW (because the file sets the effective flag)
# Inheritable: unchanged from the parent (execve() preserves the process inheritable set)

This illustrates why a mis-configured file capability can inadvertently grant a privilege that the original binary never needed.

Viewing capabilities on files with getcap, capsh, and getfattr

Linux provides several utilities to inspect capabilities.

getcap

$ getcap /usr/bin/ping
/usr/bin/ping = cap_net_raw+ep

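getcap prints the file capability sets attached to a file (not the bounding set, which is a per-process attribute). Recent libcap releases print the same information as /usr/bin/ping cap_net_raw=ep. The suffix letters mean: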

e - Effective.
p - Permitted.
i - Inheritable.

capsh

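capsh is a wrapper shell from libcap that can display the current process's capability sets and modify them before running a command.

$ capsh --print
Current: =
Bounding set =cap_chown,cap_dac_override,...
Ambient set =
...

The output above is abbreviated and is what an unprivileged shell typically shows; the exact lines vary between libcap versions.

Running capsh --drop=cap_net_raw -- -c 'id' removes a capability from the bounding set before starting the command. This requires CAP_SETPCAP, so in practice it is run as root.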

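getfattr (not lsattr)

lsattr does not help here: it lists inode flags (the chattr attributes), not extended attributes. File capabilities are stored in the security.capability extended attribute, which getfattr can show directly:

$ getfattr -n security.capability -e hex /usr/bin/ping
# file: usr/bin/ping
security.capability=0x0100000200200000000000000000000000000000

The value is a packed little-endian struct vfs_cap_data (a magic/version word that also carries the effective flag, followed by the permitted and inheritable masks), not a plain bitmask. capsh --decode=0x… decodes plain masks such as the CapEff value in /proc/[pid]/status; for files, getcap is the simplest decoder.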
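
When only the raw xattr is available, it can be unpacked by hand. The sketch below assumes the value was dumped as hex (for example with getfattr -e hex) and that it uses the version 2 or version 3 layout of struct vfs_cap_data: a 32-bit magic word followed by permitted/inheritable pairs, all little-endian. The helper names le32, word, and decode_xattr are made up for illustration.

```shell
#!/bin/sh
# le32 HEX8: byte-swap one little-endian 32-bit word given as 8 hex digits.
le32() { printf '%s\n' "$1" | sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/'; }

# word BLOB N: extract the N-th (0-based) 32-bit word of BLOB in host order.
word() { le32 "$(printf '%s' "$1" | cut -c$(( $2 * 8 + 1 ))-$(( $2 * 8 + 8 )))"; }

# decode_xattr 0xHEX: print revision, effective flag, and the two 64-bit masks.
decode_xattr() (
    blob=${1#0x}
    w0=$(word "$blob" 0)   # magic: revision in the top byte, effective flag in bit 0
    w1=$(word "$blob" 1)   # permitted, low 32 bits
    w2=$(word "$blob" 2)   # inheritable, low 32 bits
    w3=$(word "$blob" 3)   # permitted, high 32 bits
    w4=$(word "$blob" 4)   # inheritable, high 32 bits
    magic=$(( 0x$w0 ))
    perm=$(( 0x$w1 | (0x$w3 << 32) ))
    inh=$(( 0x$w2 | (0x$w4 << 32) ))
    printf 'revision=%d effective=%d permitted=0x%x inheritable=0x%x\n' \
        $(( (magic >> 24) & 0xff )) $(( magic & 1 )) "$perm" "$inh"
)

# Typical value for a binary tagged cap_net_raw+ep.
decode_xattr 0x0100000200200000000000000000000000000000
```

This prints revision=2 effective=1 permitted=0x2000 inheritable=0x0; bit 13 (0x2000) is cap_net_raw. A version 3 value carries one extra trailing word (the namespace root UID), which this sketch simply ignores.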

Capabilities in user namespaces and ambient capabilities overview

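Namespaces isolate resources such as PIDs, mount points, and user IDs. There is no separate capability namespace; instead, every capability is interpreted relative to a user namespace, so a process can hold a capability inside its own user namespace without holding it on the host. Linux 4.14 added namespaced file capabilities (the version 3 security.capability xattr, which records the root UID of the owning user namespace). Container runtimes rely heavily on this scoping.

When a process creates a new user namespace (unshare -U), it receives a full set of capabilities inside that namespace and none in the parent namespace. Those capabilities only govern resources owned by the new namespace, and they are recalculated at the next execve() as usual, so they are lost unless the process is mapped to UID 0 in the namespace or the executed file carries capabilities. Capabilities cannot be granted by writing to /proc/[pid]/status; that file is read-only and merely reports CapInh, CapPrm, CapEff, CapBnd, and CapAmb.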

Ambient capabilities

Ambient capabilities, added in Linux 4.3, bridge the gap between file capabilities and the inheritable set. A capability in the ambient set is added to the permitted and effective sets of the program started by execve() when the following conditions hold:

The capability is in both the permitted and inheritable sets of the calling process (only then can it be raised into the ambient set).
The process has raised it with prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, ...).
The executed file is not setuid, setgid, or tagged with file capabilities; executing such a privileged file clears the ambient set.

Ambient capabilities are especially useful for privileged containers that need a stable capability set across multiple execs without relying on file capabilities.

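# Example: run a command as nobody with CAP_NET_RAW in the ambient set.
# capsh starts as root and keeps CAP_SETPCAP, CAP_SETUID, and CAP_SETGID long
# enough to switch user; --addamb requires the capability to be permitted and inheritable.
$ sudo capsh --caps="cap_net_raw+eip cap_setpcap,cap_setuid,cap_setgid+ep" \
    --keep=1 --user=nobody --addamb=cap_net_raw \
    -- -c "grep ^Cap /proc/self/status"
# CapInh, CapPrm, CapEff, and CapAmb should all show bit 13 (0x2000, cap_net_raw)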

Commonly mis-configured binaries that retain elevated capabilities

Attackers love binaries that have been granted more capabilities than they need. Below is a non-exhaustive list of typical culprits:

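ping - Often given CAP_NET_RAW to send ICMP echo requests. The kernel strips the security.capability xattr when a file is modified, so overwriting a writable copy does not keep the capability; the realistic risks are bugs in the binary itself or a writable directory that lets an attacker swap in a different program under the same name.
mount - Usually setuid root rather than capability-tagged, but sometimes given CAP_SYS_ADMIN for mounting filesystems. CAP_SYS_ADMIN is close to full root, so misuse can lead to complete escalation.
nsenter - Sometimes granted CAP_SYS_ADMIN to enter namespaces. A user who can run it can enter host namespaces and escape a container.
setcap itself - If left setuid or tagged with capabilities (notably CAP_SETFCAP), it can be abused to add capabilities to arbitrary binaries.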
systemctl - When run as root, it inherits many capabilities; exposing it to untrusted users is risky.

Key patterns to watch for:

Capabilities that do not match the binary’s purpose (e.g., CAP_SYS_MODULE on cat).
Executable files located in writable directories (e.g., /tmp, user home directories) with elevated capabilities.
Setuid binaries that also have file capabilities - the two mechanisms stack, potentially granting a superset of privileges.

Remediation typically involves removing unnecessary capabilities with setcap -r or moving the binary to a read-only location.
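
The patterns above lend themselves to a quick first-pass triage of getcap output. The following is a rough sketch only: the list of "risky" capabilities is an illustrative assumption, the triage helper is hypothetical, and the sample paths are invented. It accepts both the older "path = caps+ep" and the newer "path caps=ep" output formats because it only searches each line for capability names.

```shell
#!/bin/sh
# Illustrative list of high-impact capabilities; tune it for your environment.
RISKY='cap_sys_admin|cap_sys_module|cap_sys_ptrace|cap_dac_override|cap_dac_read_search|cap_setuid|cap_setgid|cap_setfcap'

# triage: read getcap-style lines on stdin and label each one.
triage() {
    while IFS= read -r line; do
        if printf '%s\n' "$line" | grep -Eq "$RISKY"; then
            printf 'RISKY   %s\n' "$line"
        else
            printf 'review  %s\n' "$line"
        fi
    done
}

# Invented sample input standing in for real getcap output.
triage <<'EOF'
/usr/bin/ping cap_net_raw=ep
/usr/local/bin/python3.11 cap_setuid=ep
/opt/agent/monitor cap_sys_module,cap_sys_time=ep
EOF
```

On a real host the input would come from sudo getcap -r / 2>/dev/null | triage. Lines marked "review" still deserve a look; the label only means none of the listed capabilities appeared.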

Practical Examples

Example 1: Enumerating all files with capabilities on a system

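# List every file that carries file capabilities (getcap prints nothing for untagged files)
$ sudo find / -type f -exec getcap {} + 2>/dev/null
# Alternative: dump the raw security.capability xattr for deeper analysis
$ sudo find / -type f -exec getfattr --absolute-names -n security.capability -e hex {} + 2>/dev/null

These commands quickly surface binaries that could be abused. The original pipelines filtered the output with grep -v, which discards exactly the lines of interest, so no filter is applied here.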

Example 2: Dropping a capability at runtime

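#!/usr/bin/env bash
# Demonstrate dropping CAP_NET_RAW from the bounding set before executing ping.
# Run as root: capsh --drop requires CAP_SETPCAP.
set -e
# Verify CAP_NET_RAW is available (capsh prints capability names in lowercase)
capsh --print | grep -q cap_net_raw || { echo "No CAP_NET_RAW"; exit 1; }
# Drop it from the bounding set, then run ping in the restricted shell
capsh --drop=cap_net_raw -- -c "ping -c 1 127.0.0.1" || echo "Ping failed as expected after drop"

Running the script as root shows that ping fails once the capability is removed, provided ping relies on a raw socket. On systems where net.ipv4.ping_group_range permits unprivileged ICMP sockets, ping can still succeed without CAP_NET_RAW.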

Example 3: Granting a single capability to a container

# Build a minimal container that needs raw sockets
cat > Dockerfile <<'EOF'
FROM alpine:latest
RUN apk add --no-cache iproute2
# Capabilities are granted at runtime with --cap-add, not in the image
EOF
# Build, then run with every capability dropped except NET_RAW
docker build -t cap-demo .
docker run --rm --cap-drop=ALL --cap-add=NET_RAW cap-demo ping -c 1 8.8.8.8

This demonstrates that the container can use raw sockets while holding no other capability. Note that --cap-add adjusts the capability sets of the container's root process; it does not place capabilities in the ambient set of a non-root container user.

Tools & Commands

getcap - Display file capabilities.
setcap - Assign or remove capabilities from a file.
capsh - Interactive shell for inspecting and manipulating process capabilities.
getfattr / setfattr - Directly read/write the security.capability extended attribute.
prctl (via C or python-prctl) - Change ambient capabilities programmatically.
auditd - Log changes to process capability sets by auditing the capset syscall (use -a always,exit -F arch=b64 -S capset -k capset).

Sample command to list all binaries with CAP_SYS_ADMIN:

$ sudo getcap -r / 2>/dev/null | grep "cap_sys_admin"

Defense & Mitigation

Principle of Least Capability: Grant only the capabilities required for a service. Use systemd’s CapabilityBoundingSet and AmbientCapabilities directives.
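Remove unnecessary file capabilities: run sudo setcap -r /usr/bin/ping when the binary does not need them, for example where net.ipv4.ping_group_range already allows unprivileged ICMP sockets or where CAP_NET_RAW is supplied another way (such as an ambient capability from a wrapper or systemd unit).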
Secure binary locations: Keep privileged binaries on read-only partitions (e.g., /usr/sbin) and ensure they are not user-writable.
Audit and monitor: Deploy auditd rules to alert on the capset syscall and on the setxattr family of syscalls (which is how setcap writes security.capability), and monitor changes to /etc/security/capability.conf if pam_cap is in use.
Container hardening: Use Docker’s --cap-drop=ALL and then explicitly add only needed capabilities.
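Enable no_new_privs for untrusted executables so that execve() cannot grant new privileges through setuid bits or file capabilities. Note that no_new_privs does not strip ambient capabilities; clear those explicitly with prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL, 0, 0, 0).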
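
As a concrete illustration of the systemd directives mentioned above, the sketch below writes a unit file to a scratch directory for review rather than installing it. The service name, binary path, and user account are hypothetical; CapabilityBoundingSet, AmbientCapabilities, and NoNewPrivileges are the standard systemd.exec options.

```shell
#!/bin/sh
# Write a hypothetical unit to a scratch directory (nothing is installed).
unit_dir=$(mktemp -d)
cat > "$unit_dir/lowport-demo.service" <<'EOF'
[Unit]
Description=Demo service that binds a low port without running as root

[Service]
# Hypothetical binary and service account
ExecStart=/usr/local/bin/lowport-demo
User=lowport
# Allow exactly one capability and hand it to the non-root process
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
NoNewPrivileges=yes

[Install]
WantedBy=multi-user.target
EOF

# Show the hardening-relevant lines for a quick review.
grep -E '^(CapabilityBoundingSet|AmbientCapabilities|NoNewPrivileges)=' \
    "$unit_dir/lowport-demo.service"
```

The bounding set caps what any process in the unit can ever acquire, while the ambient set is what actually delivers CAP_NET_BIND_SERVICE to a process started as a non-root user.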

Common Mistakes

Assuming setuid root automatically provides all needed privileges - it often grants more than required.
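Forgetting to clear the ambient set before spawning a child process, leading to privilege leakage.
Granting capabilities to an interpreter (e.g., Python): file capabilities on a script are ignored, while capabilities on the interpreter are handed to every script that any user runs with it.
Leaving writable directories in $PATH that contain capability-enabled binaries - attackers can replace them; the replacement loses the file capability, but anything that invokes the binary by name now runs attacker code.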
Neglecting to audit capabilities after system upgrades; new packages may introduce binaries with default capabilities.

Real-World Impact

In 2023, a high-profile bug bounty disclosed that a custom monitoring agent shipped with CAP_SYS_TIME and CAP_SYS_MODULE. An attacker with local user access replaced the binary with a trojan, gaining the ability to load kernel modules and set the system clock, ultimately achieving full root compromise.

My experience in red-team engagements shows that the “low-hanging fruit” often lies in mis-configured capabilities rather than classic buffer overflows. Organizations that rely on containers but forget to drop CAP_SYS_ADMIN expose themselves to breakout attacks similar to the infamous “Dirty COW” scenario, albeit via a different attack surface.

Trends indicate a growing emphasis on capability-aware security frameworks (e.g., OCI runtime specifications). As supply-chain attacks rise, ensuring that third-party binaries do not carry unnecessary capabilities becomes a critical part of the software bill of materials (SBOM) verification process.

Practice Exercises

Exercise 1 - Capability Discovery
On a fresh Ubuntu VM, run a script that enumerates all files with capabilities, then classify each based on its expected function. Identify any outliers and propose a remediation plan.
Exercise 2 - Capability Hardening
Create a systemd service that requires CAP_NET_BIND_SERVICE. Use CapabilityBoundingSet and AmbientCapabilities to limit the service to only that capability. Verify with systemctl show and by decoding the CapEff line of the service's /proc/<pid>/status with capsh --decode.
Exercise 3 - Ambient Capability Leak
Write a small C program that sets an ambient capability, execs /bin/bash, and then checks whether the new shell retains the capability. Document the steps and explain why no_new_privs does not stop the leak, while clearing the ambient set with PR_CAP_AMBIENT_CLEAR_ALL before the exec does.
Exercise 4 - Container Escape Simulation
Launch a Docker container with --cap-add=SYS_ADMIN and attempt to mount a host filesystem inside the container. Then repeat the experiment with --cap-drop=ALL and observe the failure.

Summary

Linux capabilities break the monolithic root model into fine-grained privileges.
Three sets (Effective, Permitted, Inheritable) dictate what a process can do now, later, and across execs.
Tools like getcap, capsh, and getfattr let you enumerate and audit capabilities.
Ambient capabilities and user-namespace scoping extend the model to containers and unprivileged services.
Mis-configured binaries with unnecessary capabilities are a common escalation path; regular audits and least-capability policies are essential defenses.
