Sunday March 17 2019

Docker + nftables

Normally, when you install Docker it takes care of mucking about with the firewall rules for you, using iptables under the hood. Unfortunately, at this time Docker does not have any native support for nftables. This leaves us with a couple of options: go back to the now-legacy iptables utilities, or tell Docker not to touch the firewall at all and manage the rules ourselves.

As you can probably imagine from the title of this article, I do not plan on going back to iptables, so let’s get into making this work with nftables.

Modifying the service

On Void Linux it’s pretty straightforward. Once docker is installed, but before you symlink the service directory, simply add a conf file:

/etc/sv/docker/conf

OPTS="--iptables=false"
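
With the conf file in place, enabling the service is the usual runit symlink:

# ln -s /etc/sv/docker /var/service/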

If you’re on a systemd-based distribution such as Ubuntu or Fedora, you can copy the service file from /lib/systemd/system/docker.service to /etc/systemd/system/docker.service and modify the line that starts with ExecStart like so:

ExecStart=/usr/bin/dockerd --iptables=false -H fd://

Then reload the daemons:

# systemctl daemon-reload

You can then enable and start the service from there.
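
For example:

# systemctl enable docker
# systemctl start docker

Alternatively, if you’d rather not edit the service file at all, you can tell the daemon the same thing through /etc/docker/daemon.json:

{
	"iptables": false
}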

Basic /etc/nftables.conf

nftables can be a bit tricky to get started with, so an example should help:

/etc/nftables.conf

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
	chain input {
		type filter hook input priority 0;

		# Allow all input on loopback
		iif lo accept

		# Accept stateful traffic
		ct state established,related accept

		# Accept SSH
		tcp dport 22 accept

		# Accept HTTP and HTTPs
		tcp dport { 80, 443 } accept

		# Allow some icmp traffic for ipv6
		ip6 nexthdr icmpv6 icmpv6 type {
			nd-neighbor-solicit, echo-request,
			nd-router-advert, nd-neighbor-advert
		} accept

		counter drop
	}
	chain forward {
		type filter hook forward priority 0;
		# Note that by default docker sets a drop policy on the
		# forward chain. This is done for security reasons, and I
		# highly recommend you do the same. You will however have
		# to explicitly define what traffic is to be accepted here,
		# e.g. whether the networks can communicate with the world,
		# other networks, etc. (see the sketch after this config)
	}
	chain output {
		type filter hook output priority 0;
	}
}

table ip nat {
	chain prerouting {
		type nat hook prerouting priority 0;
	}
	chain postrouting {
		type nat hook postrouting priority 100;
		# You may need to change 'eth0' to your primary interface
		oif eth0 masquerade persistent
	}
}
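
If you do lock down the forward chain, here’s a minimal sketch of what it might look like, assuming the default docker0 bridge and eth0 as the outbound interface:

	chain forward {
		type filter hook forward priority 0;

		# Let return traffic back in for connections the
		# containers opened
		ct state established,related accept

		# Containers on the default bridge may reach the world.
		# iifname matches by name, so the rule loads even if
		# docker hasn't created the bridge yet
		iifname "docker0" oif eth0 accept

		counter drop
	}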

Once you have the configuration in place it’s as simple as enabling the nftables service for your distribution. You can reload the rules at any time with nft -f /etc/nftables.conf.
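
You can verify what’s actually loaded with:

# nft list ruleset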

Forwarding ports

This is all fine and dandy if you’re not trying to host a service inside of your Docker container. Most people use Docker to deploy their services, though, so it only makes sense if I also show an example of forwarding a port on the host machine to an internal Docker container.

Strictly speaking we don’t have to, but in the long run it’s easier to create our own network, as Docker does not let us specify a container’s IP on the default network.

It’s pretty straightforward to create this new network:

$ docker network create \
	-o com.docker.network.bridge.name=user0 \
	--subnet=172.20.0.0/16 \
	user

As you can see above we’re passing in an additional option, -o, to set the bridge name that we’re going to be using for this network. Although not explicitly required, this allows us to easily know which bridge we just created, as Docker would otherwise generate a random name for us.

We can also inspect and see a little bit more information about the network:

$ docker network inspect user
[
    {
        "Name": "user",
        "Id": "290a98a6f57739f29d758964924c95214219426c02c5b58e0a1049627b8da535",
        "Created": "2019-03-17T14:15:12.635615052-04:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.20.0.0/16",
                    "Gateway": "172.20.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {},
        "Options": {
            "com.docker.network.bridge.name": "user0"
        },
        "Labels": {}
    }
]

We can also see that the new bridge we specified exists:

$ brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.0242a6395da2       no
user0           8000.0242b7d38495       no
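
If you don’t have bridge-utils installed, iproute2 can show the same thing:

$ ip -br link show type bridge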

We can then run a container for our service:

$ docker run \
	--network user \
	--rm \
	-d \
	--ip 172.20.1.80 \
	-v /var/lib/couchdb/data:/opt/couchdb/data \
	couchdb:latest
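
If you want to confirm the container actually picked up the address, docker inspect can pull it out of the network settings (docker ps -lq here just grabs the ID of the most recently created container):

$ docker inspect -f '{{ .NetworkSettings.Networks.user.IPAddress }}' $(docker ps -lq)
172.20.1.80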

We can then test that our service, CouchDB in this case, is responding as expected:

mitch@void.rygel.us /u/mitch/my-vehicle/couchdb $ curl 172.20.1.80:5984
{"couchdb":"Welcome","version":"2.3.1","git_sha":"c298091a4","uuid":"4e10b1c62f372d787c503842e54e230b","features":["pluggable-storage-engines","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

Finally, we adjust our nftables configuration to forward along the proper port:

	chain prerouting {
		type nat hook prerouting priority 0;
		iif eth0 tcp dport 5984 dnat to 172.20.1.80
	}

or if you’d prefer a different port:

iif eth0 tcp dport 24000 dnat to 172.20.1.80:5984
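
One thing to keep in mind: since these rules match iif eth0 they only apply to traffic arriving on that interface, so test from another machine rather than from the host itself. And if you gave your forward chain a drop policy as recommended above, the DNATed traffic has to be accepted there too. Something like the following, substituting your server’s address:

$ curl http://203.0.113.10:24000/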

Final thoughts

The bridge user0 and the network name user can be changed to anything else you’d like, and the same goes for the --subnet when creating the network. Keep in mind, though, that you should stick to RFC 1918 networks for your --subnet value unless you know what you’re doing.

This does, of course, make Docker a bit harder to use. Many would likely see more benefit in sticking with the legacy iptables tools and holding off on their switch to nftables until Docker supports it.

As mentioned in the configuration example above, Docker by default drops forwarded traffic between its networks, which is good from a security standpoint. I recommend that you do this as well by setting a drop policy on your forward chain and explicitly allowing only the traffic you want.