Expose network namespace created by Docker

Disclaimer: this is probably *not* the best way of doing this, but it's pretty good for educational purposes.

During a debug session I wanted to connect an application to a service that ran in a Docker container. This was for test purposes only, so hackish and fast were the keywords.

First of all, I'm not a Docker expert, but I have a pretty good understanding of Linux internals, namespaces and how things work on a Linux system. So I started with the tools I had.

Create a container to work with

This is not the container I wanted to debug, but to have something to demonstrate the concept:

FROM ubuntu

MAINTAINER Marcus Folkesson <marcus.folkesson@gmail.com>

RUN apt-get update && apt-get install -y socat

CMD ["/bin/sh"]

Create and start the container

docker build -t netns .
docker run -ti netns /bin/bash

Inside the container, create a TCP-server with socat:

root@c8a2438ad58e:/# socat - TCP-L:1234

Let's play around

I usually list network namespaces with ip netns list, but the command gave me no output even while the Docker container was running.

That was unexpected. I wondered where ip actually looks for namespaces. To find out, I used strace and looked for the openat system call:

$ strace -e openat  ip netns list
...
openat(AT_FDCWD, "/var/run/netns", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 5
...

OK, ip netns list probably looks for files representing network namespaces in /var/run/netns.

Let's try to create an entry (dockerns) at that location:
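As a rough illustration of what that lookup amounts to, the sketch below lists the entries of a directory, which is essentially where ip netns list gets its names from. The function name list_named_netns and its optional directory argument are made up for this example, and unlike the real tool it does not try to resolve the entries it finds.

```sh
# list_named_netns [DIR]: print one name per entry found in DIR
# (defaults to /var/run/netns, the path seen in the strace output).
# Hypothetical helper; it only lists names and does not validate them.
list_named_netns() (
    dir=${1:-/var/run/netns}
    # No directory simply means there are no named namespaces.
    [ -d "$dir" ] || exit 0
    for entry in "$dir"/*; do
        # An unmatched glob stays literal, so skip entries that do not exist.
        [ -e "$entry" ] || continue
        basename "$entry"
    done
)
```

Calling list_named_netns without arguments looks in /var/run/netns; passing another directory is mostly useful for experimenting without root.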

$ sudo touch /var/run/netns/dockerns
$ ip netns  list
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
dockerns

Good. It tries to dereference the namespace!

All namespaces for a given PID are exposed in procfs. For example, here are the namespaces that my bash session belongs to:

$ ls -al /proc/self/ns/
total 0
dr-x--x--x 2 marcus marcus 0 15 dec 12.35 .
dr-xr-xr-x 9 marcus marcus 0 15 dec 12.35 ..
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 mnt -> 'mnt:[4026531841]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 net -> 'net:[4026531840]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 time -> 'time:[4026531834]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 user -> 'user:[4026531837]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 uts -> 'uts:[4026531838]'

Now we only need to do the same for the container.

First, find the ID of the running container with docker ps:

$ docker ps
CONTAINER ID   IMAGE     COMMAND       CREATED              STATUS          PORTS     NAMES
c8a2438ad58e   netns     "/bin/bash"   About a minute ago   Up 59 seconds             dazzling_lewi

Inspect the running container to determine the PID:

$ docker inspect -f '{{.State.Pid}}'  c8a2438ad58e
36897

36897, there we have it.
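The two lookups above can be chained into one step. This is only a sketch: container_pid is a hypothetical helper name, and it assumes that filtering docker ps on the image name (the ancestor filter) is enough to single out the container you care about.

```sh
# container_pid IMAGE: print the PID of the first listed running container
# that was started from IMAGE.
# Hypothetical helper around docker ps and docker inspect.
container_pid() (
    # -q prints only container IDs; keep the first one listed.
    id=$(docker ps -q --filter "ancestor=$1" | head -n 1)
    if [ -z "$id" ]; then
        echo "no running container for image $1" >&2
        exit 1
    fi
    docker inspect -f '{{.State.Pid}}' "$id"
)
```

With the container from this post running, container_pid netns should print the same PID as the manual docker inspect call.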

Let's see which namespaces it has:

$ sudo ls -al /proc/36897/ns
total 0
dr-x--x--x 2 root root 0 15 dec 09.59 .
dr-xr-xr-x 9 root root 0 15 dec 09.59 ..
lrwxrwxrwx 1 root root 0 15 dec 10.01 cgroup -> 'cgroup:[4026533596]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 ipc -> 'ipc:[4026533536]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 mnt -> 'mnt:[4026533533]'
lrwxrwxrwx 1 root root 0 15 dec 09.59 net -> 'net:[4026533538]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 pid -> 'pid:[4026533537]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 pid_for_children -> 'pid:[4026533537]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 uts -> 'uts:[4026533534]'

As we can see, the container has different IDs for most namespaces (user and time are shared with the host).

Now that we have the network namespace for the container, let's bind mount it to /var/run/netns/dockerns:
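Comparing two listings by eye is error-prone, so here is a small sketch that prints which namespace types two processes share. The helper name compare_ns and its optional third argument (an alternative procfs root, handy for trying it on a fake directory tree) are assumptions of this example. Reading /proc/PID/ns for a process owned by another user normally requires root.

```sh
# compare_ns PID_A PID_B [PROC_ROOT]: for every namespace link of PID_A,
# report whether PID_B points at the same namespace.
# Hypothetical helper; PROC_ROOT defaults to /proc.
compare_ns() (
    proc=${3:-/proc}
    for link in "$proc/$1/ns"/*; do
        type=$(basename "$link")
        # readlink yields e.g. 'net:[4026531840]'; equal text means same namespace.
        a=$(readlink "$link") || continue
        b=$(readlink "$proc/$2/ns/$type") || continue
        if [ "$a" = "$b" ]; then
            echo "$type shared"
        else
            echo "$type different"
        fi
    done
)
```

Run as root with a host PID and the container PID 36897, this should report user, time and time_for_children as shared and the rest as different, matching the two listings.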

$ sudo mount -o bind /proc/36897/ns/net /var/run/netns/dockerns

And run ip netns list again:

$ ip netns list
dockerns (id: 0)

Nice.

Start socat in the dockerns network namespace and connect to localhost:1234:

$ sudo ip netns exec dockerns socat - TCP:localhost:1234
hello
Hello from the outside world
Hello from inside docker

It works! We are now connected to the service running in the container!
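One thing to keep in mind is that a bind mount of a namespace file keeps that namespace alive, so it is worth undoing the mount when the experiment is over. The sketch below wraps the manual steps and their cleanup. The names run, expose_netns, unexpose_netns and the DRY_RUN variable are all invented for this example.

```sh
# Hypothetical helpers wrapping the manual steps.
# Set DRY_RUN=1 to print the commands instead of running them with sudo.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

# expose_netns NAME PID: make PID's network namespace visible to ip netns.
expose_netns() {
    run mkdir -p /var/run/netns &&
    run touch "/var/run/netns/$1" &&
    run mount -o bind "/proc/$2/ns/net" "/var/run/netns/$1"
}

# unexpose_netns NAME: undo expose_netns by unmounting and removing the entry.
unexpose_netns() {
    run umount "/var/run/netns/$1" &&
    run rm "/var/run/netns/$1"
}
```

For the session above that would be expose_netns dockerns 36897, followed later by unexpose_netns dockerns. As far as I can tell from the ip-netns man page, ip netns delete dockerns performs the same unmount and removal.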

Conclusion

It's fun to play around, but there is room for improvement.

For example, a better way to list namespaces is to use lsns, as this tool looks for namespaces in more places, including /run/docker/netns/.

Also, a more "correct" way is probably to create a virtual Ethernet (veth) device pair and move one end of it into the container's namespace.

veth example

To do that, we first need to determine the PID for the container:

$ lsns -t net
        NS TYPE NPROCS     PID USER       NETNSID NSFS                           COMMAND
...
4026533538 net       3   36897 root             0 /run/docker/netns/06a083424158 /bin/bash

Give the namespace the public name dockerns (this creates /var/run/netns/dockerns as we did earlier, but with ip netns attach):

$ sudo ip netns attach dockerns 36897

Create the virtual interfaces, move one end into the network namespace, assign addresses and add a route:

$ sudo ip link add veth0 type veth peer name veth1
$ sudo ip link set veth1 netns dockerns
$ sudo ip address add 192.168.3.1/24 dev veth0
$ sudo ip netns exec dockerns ip a add 192.168.3.2/24 dev veth1
$ sudo ip link set up veth0
$ sudo ip netns exec dockerns ip l set up veth1
$ sudo ip route add 10.0.42.0/24 via 192.168.3.2

That is pretty much it.
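To check the result and to clean up afterwards, something like the following sketch could be used. The helper names veth_check and veth_teardown are hypothetical, and the interface names and addresses are simply the ones from the commands above.

```sh
# Hypothetical helpers for the veth setup.

# veth_check: show the host side of the pair, then ping the namespace side.
veth_check() {
    ip -brief address show dev veth0 &&
    ping -c 3 -W 1 192.168.3.2
}

# veth_teardown: deleting one end of a veth pair removes its peer as well,
# so only veth0 has to be deleted before dropping the namespace name.
veth_teardown() {
    sudo ip link del veth0
    sudo ip netns delete dockerns
}
```

The route through 192.168.3.2 goes away together with veth0, so it needs no separate cleanup step.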

Loopback with two (physical) ethernet interfaces

Imagine that you have an embedded device with two physical Ethernet ports. You want to verify the functionality of both ports during manufacturing, so you connect an Ethernet cable between the ports, set up IP addresses, and now what?

As Linux (actually the default network namespace) is aware of both adapters and their IP/MAC addresses, the system sees no reason to send any traffic out. Instead, Linux will loop all traffic between the interfaces internally.

To avoid that and actually force traffic out on the cable, we have to make the adapters unaware of each other. This is done by putting them into different network namespaces!

/media/loopback.png

Hands on

To do this, all you need is support for network namespaces in the kernel (CONFIG_NET_NS=y) and the iproute2 [1] package, both of which are probably included in every standard Linux distribution nowadays.

We will create two network namespaces, let's call them netns_eth0 and netns_eth1:

ip netns add netns_eth0
ip netns add netns_eth1

Move each adapter to its new home:

ip link set eth0 netns netns_eth0
ip link set eth1 netns netns_eth1

Assign IP addresses:

ip netns exec netns_eth0 ip addr add dev eth0 192.168.0.1/24
ip netns exec netns_eth1 ip addr add dev eth1 192.168.0.2/24

Bring up the interfaces:

ip netns exec netns_eth0 ip link set eth0 up
ip netns exec netns_eth1 ip link set eth1 up

Now we can ping each interface and know for sure that the traffic is actually on the cable:

ip netns exec netns_eth0 ping 192.168.0.2
ip netns exec netns_eth1 ping 192.168.0.1
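On a production line you would probably want a bounded test with a clear verdict rather than an endless ping. Here is a minimal sketch of that idea, assuming the namespaces and addresses set up above. The function names loopback_test and loopback_cleanup are hypothetical, and both need to run as root, like the commands they wrap.

```sh
# loopback_test: ping across the cable in both directions and report the result.
# Hypothetical helper; uses 3 packets with a 1 second reply timeout each.
loopback_test() {
    if ip netns exec netns_eth0 ping -c 3 -W 1 192.168.0.2 >/dev/null &&
       ip netns exec netns_eth1 ping -c 3 -W 1 192.168.0.1 >/dev/null; then
        echo "PASS"
    else
        echo "FAIL"
        return 1
    fi
}

# loopback_cleanup: deleting a namespace hands its physical interfaces back
# to the default namespace. Addresses and link state are not restored.
loopback_cleanup() {
    ip netns delete netns_eth0
    ip netns delete netns_eth1
}
```

The exit status of loopback_test can be used directly by a test harness, and loopback_cleanup leaves the device with both adapters back in the default namespace.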