A network packet arriving on a Linux machine does not travel in a straight line from the network interface to an application. It moves through a layered processing pipeline inside the kernel, and at precisely defined points along that journey the kernel pauses and asks registered code what should happen next. Should the packet continue? Should it be silently discarded? Should its source address be rewritten before forwarding? That mechanism is netfilter. Written by Rusty Russell beginning in 1998, it has shipped with the Linux kernel since version 2.4 as a replacement for the older, far less flexible ipchains and ipfwadm systems.

Nearly everything the Linux kernel does with network packets beyond basic routing, including stateful firewalling, Network Address Translation, connection tracking, and traffic classification, runs through netfilter. Tools like iptables, nftables, and ipvs are not independent firewalling systems. They are userspace interfaces to the same underlying kernel framework. Understanding netfilter's architecture means understanding the foundation beneath all of them.

Five Hook Points Where Registered Code Intercepts Every Packet

The structural core of netfilter is a set of hook points embedded at fixed locations in the kernel's IPv4 and IPv6 networking code. Each hook is a numbered position in the packet processing path where the kernel invokes every registered callback function in priority order before allowing the packet to continue. For IPv4, there are five hooks, defined in the kernel's nf_inet_hooks enumeration:

enum nf_inet_hooks {
    NF_INET_PRE_ROUTING,   /* all incoming packets, before routing decision  */
    NF_INET_LOCAL_IN,      /* packets destined for this machine              */
    NF_INET_FORWARD,       /* packets being routed through to another host   */
    NF_INET_LOCAL_OUT,     /* packets generated by local processes           */
    NF_INET_POST_ROUTING,  /* all outgoing packets, after routing decision   */
};

The path any individual packet takes through these five points depends entirely on its destination. An incoming packet destined for a local service traverses NF_INET_PRE_ROUTING and then NF_INET_LOCAL_IN. A packet that the kernel is forwarding between interfaces passes through NF_INET_PRE_ROUTING, then NF_INET_FORWARD, then NF_INET_POST_ROUTING. A packet generated by a local process enters at NF_INET_LOCAL_OUT and exits through NF_INET_POST_ROUTING. No single packet traverses all five hooks, which is a critical fact when reasoning about where a filtering rule will actually take effect.
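One way to observe this path empirically is nftables' trace infrastructure: mark matching packets for tracing, and the kernel emits a trace event naming each chain and hook the packet actually crosses. A sketch, assuming root privileges and the inet filter table configured later in this article:

```shell
# Mark inbound SSH packets for tracing
nft add rule inet filter input tcp dport 22 meta nftrace set 1

# Print trace events; each event names the hook, chain, and rule traversed
nft monitor trace
```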

Since Linux 4.2, an additional ingress hook fires on each network device individually, even before NF_INET_PRE_ROUTING and before IP fragmentation reassembly. This makes it possible to drop unwanted traffic at the absolute earliest point in processing, before the kernel has spent any resources on IP header parsing or connection tracking lookup.
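The ingress hook is exposed to administrators through the netdev family in nftables. A sketch, assuming an interface named eth0:

```shell
# Attach a chain to the per-device ingress hook, which fires before prerouting
nft add table netdev early
nft add chain netdev early ingress '{ type filter hook ingress device eth0 priority 0; policy accept; }'

# Drop traffic from one source before any IP-layer work is spent on it
nft add rule netdev early ingress ip saddr 203.0.113.99 drop
```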

How Kernel Modules Register Callbacks Using nf_hook_ops

Any kernel code that wants to process packets at a hook point registers a callback function using the nf_hook_ops structure. That structure declares which protocol family and which hook the callback applies to, the function pointer itself, and a numerical priority that determines its position relative to other registered callbacks at the same hook:

#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/ip.h>
#include <net/net_namespace.h>

static unsigned int my_hook_fn(void *priv,
                                struct sk_buff *skb,
                                const struct nf_hook_state *state)
{
    struct iphdr *iph = ip_hdr(skb);

    /* Drop all packets from 192.0.2.1 */
    if (iph->saddr == htonl(0xC0000201))
        return NF_DROP;

    return NF_ACCEPT;
}

static struct nf_hook_ops my_hook_ops = {
    .hook     = my_hook_fn,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,
    .priority = NF_IP_PRI_FILTER,
};

static int __init my_module_init(void)
{
    /* Attach the hook in the initial network namespace */
    return nf_register_net_hook(&init_net, &my_hook_ops);
}

static void __exit my_module_exit(void)
{
    nf_unregister_net_hook(&init_net, &my_hook_ops);
}

module_init(my_module_init);
module_exit(my_module_exit);
MODULE_LICENSE("GPL");

The sk_buff pointer, universally abbreviated as skb, is the kernel's single in-flight representation of a packet. It carries the raw packet data, metadata about the originating network interface, any connection tracking state already associated with the flow, and pointers into the parsed headers. Every netfilter callback receives an skb and returns one of several verdict constants: NF_ACCEPT passes the packet to the next registered callback or to the next stage of processing; NF_DROP silently discards it; NF_STOLEN tells the framework that the callback has taken ownership of the packet; NF_QUEUE hands it to a userspace process via the nfqueue mechanism; and NF_REPEAT asks the framework to re-invoke the current callback on the same packet.
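The NF_QUEUE verdict has a direct administrative counterpart: the NFQUEUE target diverts matching packets to a userspace program, typically written against libnetfilter_queue, which then issues the final verdict. A sketch, with an arbitrary queue number and port:

```shell
# Hand inbound traffic on port 8080 to userspace queue 0; --queue-bypass
# accepts packets instead of dropping them if no program is listening
iptables -A INPUT -p tcp --dport 8080 -j NFQUEUE --queue-num 0 --queue-bypass
```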

When multiple callbacks are registered at the same hook, priority governs execution order. Lower numerical values run first. The connection tracking subsystem, for example, registers at NF_IP_PRI_CONNTRACK (-200), which is numerically lower than NF_IP_PRI_FILTER (0), ensuring that flow state is established before filtering rules evaluate it.

Connection Tracking as a Stateful Foundation for Everything Above It

Packet filtering that cannot distinguish a new connection attempt from an established reply is not stateful filtering at all. The connection tracking subsystem, implemented in the nf_conntrack kernel module, is what gives netfilter its memory. It registers callbacks at NF_INET_PRE_ROUTING and NF_INET_LOCAL_OUT that examine every packet and associate it with a tracked flow, creating a new nf_conn entry in the conntrack table if none exists, or updating an existing one.

Each tracked connection is represented by a pair of nf_conntrack_tuple structures, one for each direction of the flow, encoding the source and destination addresses, ports, and protocol. The tuple for an incoming TCP connection and the tuple for its replies are stored together in a single nf_conn entry and indexed in a hash table for fast lookup:

# View the live conntrack table from userspace
conntrack -L

# Example output for an established TCP connection:
# tcp  6  86394  ESTABLISHED
#   src=192.168.1.10  dst=93.184.216.34  sport=54320  dport=443
#   src=93.184.216.34 dst=192.168.1.10   sport=443    dport=54320
#   [ASSURED] mark=0 use=1

# Count entries currently in the table
conntrack -C

# Delete all tracked connections for a specific host
conntrack -D -s 192.168.1.50

Connection state is exposed to filtering rules through the ctstate match in iptables and the ct state expression in nftables. A rule that accepts packets in the ESTABLISHED or RELATED state, and drops everything in the INVALID state, is the minimal stateful firewall that most administrators deploy. Without connection tracking underneath it, that distinction would be impossible to make.

The conntrack table has a maximum size controlled by the nf_conntrack_max kernel parameter. On systems with high connection rates, this limit requires tuning. When the table is full, packets that would create new entries are dropped and the kernel logs "nf_conntrack: table full, dropping packet":

# View current conntrack limits and table usage
sysctl net.netfilter.nf_conntrack_max
sysctl net.netfilter.nf_conntrack_count

# Raise the limit permanently
echo "net.netfilter.nf_conntrack_max = 262144" >> /etc/sysctl.d/99-conntrack.conf
sysctl -p /etc/sysctl.d/99-conntrack.conf
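The entry limit also interacts with the size of the hash table that indexes those entries: raising nf_conntrack_max far beyond the bucket count degrades lookups into long list traversals. The bucket count is visible alongside the other parameters:

```shell
# Number of hash buckets backing the conntrack table; the average chain
# length is roughly nf_conntrack_count divided by this value
sysctl net.netfilter.nf_conntrack_buckets
```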

The iptables Layer and How It Maps to Hooks Through Tables and Chains

iptables is the traditional administrative interface to netfilter. It organizes rules into tables, and each table contains chains, which are ordered lists of rules evaluated top to bottom. The mapping between chains and hook points is fixed: the PREROUTING chain corresponds to NF_INET_PRE_ROUTING, INPUT to NF_INET_LOCAL_IN, FORWARD to NF_INET_FORWARD, OUTPUT to NF_INET_LOCAL_OUT, and POSTROUTING to NF_INET_POST_ROUTING.

Tables organize rules by function rather than by hook position. The filter table handles accept or drop decisions and is registered at the INPUT, FORWARD, and OUTPUT hooks. The nat table handles address and port translation and operates at PREROUTING, INPUT, OUTPUT, and POSTROUTING. The mangle table allows arbitrary packet modification at all five hooks. The raw table, evaluated before connection tracking, allows specific flows to bypass the conntrack subsystem entirely using the NOTRACK target.
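A host handling high-rate stateless traffic such as DNS is a typical candidate for that bypass; a sketch:

```shell
# Skip conntrack for DNS: no table entries are created for these flows
iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK
iptables -t raw -A OUTPUT     -p udp --sport 53 -j NOTRACK
```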

A practical baseline ruleset using iptables for a server that accepts SSH and HTTPS traffic, tracks established connections, and drops everything else looks like this:

# Flush existing rules
iptables -F
iptables -X

# Default policies: drop all input and forwarded traffic, allow output
iptables -P INPUT   DROP
iptables -P FORWARD DROP
iptables -P OUTPUT  ACCEPT

# Accept traffic on the loopback interface
iptables -A INPUT -i lo -j ACCEPT

# Accept established and related connections via conntrack
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Drop packets that connection tracking classifies as INVALID
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP

# Accept SSH and HTTPS on specific ports
iptables -A INPUT -p tcp --dport 22   -m conntrack --ctstate NEW -j ACCEPT
iptables -A INPUT -p tcp --dport 443  -m conntrack --ctstate NEW -j ACCEPT

# Save rules across reboots (Debian/Ubuntu, with the iptables-persistent package)
iptables-save > /etc/iptables/rules.v4

Each -A line appends a rule to the named chain. The conntrack --ctstate match relies directly on the state that nf_conntrack has already established at the PRE_ROUTING hook before the INPUT chain evaluates the packet.
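The resulting chain, including per-rule packet counters and positions, can be inspected with:

```shell
# -v adds packet/byte counters, -n skips DNS lookups on addresses
iptables -L INPUT -v -n --line-numbers
```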

nftables as the Modern Successor with a Unified Rule Language

nftables, merged into the mainline kernel in version 3.13, was designed to address the structural limitations of iptables: separate tools for IPv4, IPv6, ARP, and bridge filtering, redundant rule evaluation, no native arithmetic on packet fields, and a match extension system that required adding new kernel modules to cover new use cases. nftables replaces all of that with a single framework and a flexible expression language that can match and modify arbitrary offsets in a packet header without any additional kernel code.

In nftables, the administrator creates tables and chains explicitly, declares which hook and priority each chain attaches to, and writes rules using expressions rather than match-target pairs. An equivalent stateful firewall ruleset in nftables is more compact and applies uniformly to both IPv4 and IPv6:

# /etc/nftables.conf (apply with: nft -f /etc/nftables.conf)

table inet filter {

    chain input {
        type filter hook input priority filter; policy drop;

        iif "lo" accept
        ct state established,related accept
        ct state invalid drop

        tcp dport { 22, 443 } ct state new accept
    }

    chain forward {
        type filter hook forward priority filter; policy drop;
    }

    chain output {
        type filter hook output priority filter; policy accept;
    }
}

The inet address family handles both IPv4 and IPv6 through a single table. The ct state expression consults the same nf_conntrack infrastructure that iptables uses. Underneath the different syntax, both tools register their callbacks through the identical nf_hook_ops mechanism, and both operate on the same sk_buff structures passing through the same five hook points.
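For migration, the iptables-translate utility shipped with the iptables-nft packages prints the nftables equivalent of a legacy rule without applying it; the exact output varies by version:

```shell
iptables-translate -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -j ACCEPT
# prints an equivalent "nft add rule ..." command
```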

NAT Implementation and Why It Depends on Connection Tracking

Network Address Translation in netfilter is not a standalone feature. It is a specialized application of connection tracking, and it cannot function without the nf_conntrack subsystem loaded. When a NAT rule matches the first packet of a new connection, the nat table callback modifies the packet's address or port fields and records that transformation in the nf_conn entry for that flow. Every subsequent packet belonging to the same connection is translated the same way automatically, without consulting the rule table again. The NAT hooks bracket filtering: destination NAT runs in PREROUTING at priority NF_IP_PRI_NAT_DST (-100), before the filter rules see the packet, while source NAT runs in POSTROUTING at NF_IP_PRI_NAT_SRC (100), after them:

# Source NAT: rewrite the source address of all outgoing packets to the public interface IP
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# Destination NAT: redirect incoming traffic on port 80 to an internal host
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 \
    -j DNAT --to-destination 10.0.0.5:8080

# nftables equivalents; the NAT table and base chains must be created first
nft add table ip nat
nft add chain ip nat prerouting  '{ type nat hook prerouting  priority dstnat; }'
nft add chain ip nat postrouting '{ type nat hook postrouting priority srcnat; }'
nft add rule ip nat postrouting oif "eth0" masquerade
nft add rule ip nat prerouting  iif "eth0" tcp dport 80 dnat to 10.0.0.5:8080

The reply direction of a NATed connection is handled automatically. When a packet arrives in the reverse direction belonging to a tracked and NATed flow, the connection tracking lookup finds the existing nf_conn entry, reads the stored translation, and applies the inverse transformation. The administrator writes one rule for the outgoing direction; the return path requires no additional configuration.
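The stored translation is visible in the conntrack table: for a NATed flow the two tuples of an entry no longer mirror each other, and conntrack can filter on exactly that condition:

```shell
# List only entries whose reply tuple was rewritten by source NAT
conntrack -L --src-nat

# Likewise for destination NAT (e.g. the port-80 redirect above)
conntrack -L --dst-nat
```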

What Kernel Namespaces and nftables Sets Add to the Framework at Scale

On systems running container workloads or multi-tenant network environments, the interaction between netfilter and Linux network namespaces becomes architecturally significant. Each network namespace has its own independent copy of every hook point, its own conntrack table, and its own nftables or iptables ruleset. Rules installed in the host namespace do not apply inside a container namespace, and vice versa. This isolation means a container cannot inspect or modify the conntrack state of the host or of sibling containers.
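This isolation is easy to verify from a root shell using a throwaway namespace (the name here is arbitrary):

```shell
# A fresh namespace starts with no rules and an empty conntrack table,
# regardless of what is configured in the host namespace
ip netns add nf-demo
ip netns exec nf-demo nft list ruleset
ip netns exec nf-demo conntrack -L
ip netns del nf-demo
```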

For environments with large and frequently changing sets of addresses or ports, nftables provides native set and map types that avoid the performance degradation caused by long linear rule chains. A set containing ten thousand blocked IP addresses is evaluated in constant time through a hash or red-black tree structure maintained inside the kernel:

# Create a named set of IPv4 addresses to block, with automatic timeout
nft add set inet filter blocklist { type ipv4_addr\; flags dynamic,timeout\; timeout 1h\; }

# Add an address to the set
nft add element inet filter blocklist { 203.0.113.7 }

# Reference the set in a drop rule
nft add rule inet filter input ip saddr @blocklist drop

# Add elements dynamically from a meter (rate limiting per source)
nft add rule inet filter input \
    meter ratelimit { ip saddr limit rate 100/second } accept

The same set infrastructure underlies the verdict maps that nftables uses to route packets to different chains based on header values, enabling rule structures that would require hundreds of iptables rules to express in just a few compact statements.
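A verdict map sketch, dispatching by destination port to per-service chains; it assumes the inet filter table shown earlier, and the chain names are illustrative:

```shell
# Regular (non-base) chains to hold per-service policy
nft add chain inet filter ssh_policy
nft add chain inet filter web_policy

# Map destination ports to verdicts, then dispatch with a single rule
nft add map inet filter services '{ type inet_service : verdict; }'
nft add element inet filter services '{ 22 : jump ssh_policy, 443 : jump web_policy }'
nft add rule inet filter input tcp dport vmap @services
```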

netfilter's longevity in the Linux kernel, now spanning more than two decades and four generations of userspace interface, reflects something about the solidity of its core design. The five hook points, the priority-ordered callback chain, and the separation between the filtering mechanism and the policy expressed through it have proven flexible enough to accommodate stateful NAT, connection tracking helpers for protocols like FTP and SIP, traffic classification for quality of service, transparent proxying, and the entire nftables expression language, all without architectural revision. The packet processing framework that Rusty Russell embedded in the 2.4 kernel is still, at its structural core, the same framework that processes every packet on a modern Linux router, container host, or cloud instance today.