Expose network namespace created by Docker

Disclaimer: this is probably *not* the best way to do this, but it's pretty good for educational purposes.

During a debug session I wanted to connect an application to a service that ran in a Docker container. This was for test purposes only, so hackish and fast are the keywords.

First of all, I'm not a Docker expert, but I have a pretty good understanding of Linux internals, namespaces and how things work on a Linux system. So I started with the tools I had.

Create a container to work with

This is not the container I wanted to debug, just something to demonstrate the concept with:

FROM ubuntu

MAINTAINER Marcus Folkesson <marcus.folkesson@gmail.com>

RUN apt-get update && apt-get install -y socat

CMD ["/bin/sh"]

Create and start the container

docker build -t netns .
docker run -ti netns /bin/bash

Inside the container, create a TCP-server with socat:

root@c8a2438ad58e:/# socat - TCP-L:1234

Let's play around

I usually list network namespaces with ip netns list, but the command gave me no output even though the Docker container was running.

That was unexpected. I wondered where ip actually looks for namespaces. To find out, I used strace and looked for the openat system call:

$ strace -e openat  ip netns list
...
openat(AT_FDCWD, "/var/run/netns", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 5
...

OK, ip netns list probably looks for files representing network namespaces in /var/run/netns.

Let's try to create an entry (dockerns) at that location:
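
What ip does at this point can be approximated in a few lines of shell. This is only a sketch of the directory lookup seen in the strace output, not a reimplementation of ip. The function name list_named_netns and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# List the names found in a netns directory (default: /var/run/netns).
# Hypothetical helper; it only mirrors the directory lookup seen in strace.
list_named_netns() {
    dir="${1:-/var/run/netns}"
    # No directory means no named namespaces, which is not an error here.
    [ -d "$dir" ] || return 0
    for entry in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}

list_named_netns
```

Unlike ip, this does not check whether an entry really refers to a network namespace. It only prints the file names.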

$ sudo touch /var/run/netns/dockerns
$ ip netns  list
Error: Peer netns reference is invalid.
Error: Peer netns reference is invalid.
dockerns

Good. It tries to dereference the namespace!

All namespaces of a given PID are exposed in procfs. For example, here are the namespaces that my bash session belongs to:

$ ls -al /proc/self/ns/
total 0
dr-x--x--x 2 marcus marcus 0 15 dec 12.35 .
dr-xr-xr-x 9 marcus marcus 0 15 dec 12.35 ..
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 mnt -> 'mnt:[4026531841]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 net -> 'net:[4026531840]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 time -> 'time:[4026531834]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 user -> 'user:[4026531837]'
lrwxrwxrwx 1 marcus marcus 0 15 dec 12.35 uts -> 'uts:[4026531838]'
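
Each entry is a symbolic link, so a single namespace ID can also be read with readlink. A minimal sketch, assuming a Linux system with procfs mounted at /proc; ns_id is a hypothetical helper name.

```shell
#!/bin/sh
# Print the namespace identifier of a process, in the form "net:[<inode>]".
# Usage: ns_id <pid|self> <type>   (type: net, mnt, pid, uts, ...)
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# Network namespace of the current shell session.
ns_id self net
```

Two processes are in the same namespace of a given type exactly when these identifiers match.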

Now we only need to do the same for the container.

First, find the ID of the running container with docker ps:

$ docker ps
CONTAINER ID   IMAGE     COMMAND       CREATED              STATUS          PORTS     NAMES
c8a2438ad58e   netns     "/bin/bash"   About a minute ago   Up 59 seconds             dazzling_lewi

Inspect the running container to determine the PID:

$ docker inspect -f '{{.State.Pid}}'  c8a2438ad58e
36897

36897, there we have it.

Let's see which namespaces it has:

$ sudo ls -al /proc/36897/ns
total 0
dr-x--x--x 2 root root 0 15 dec 09.59 .
dr-xr-xr-x 9 root root 0 15 dec 09.59 ..
lrwxrwxrwx 1 root root 0 15 dec 10.01 cgroup -> 'cgroup:[4026533596]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 ipc -> 'ipc:[4026533536]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 mnt -> 'mnt:[4026533533]'
lrwxrwxrwx 1 root root 0 15 dec 09.59 net -> 'net:[4026533538]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 pid -> 'pid:[4026533537]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 pid_for_children -> 'pid:[4026533537]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 15 dec 10.01 uts -> 'uts:[4026533534]'

As we can see, the container has different IDs for most namespaces (user and time are shared with the host).

Now that we have the network namespace of the container, let's bind mount it to /var/run/netns/dockerns:
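
Comparing two listings by eye gets tedious, so here is a small sketch that does the comparison. compare_ns is a made-up helper name, and it assumes both PIDs are readable under /proc (reading the namespaces of a process owned by another user needs root, as with the sudo ls above).

```shell
#!/bin/sh
# Print which namespace types two processes share and which differ.
# Usage: compare_ns <pid-a> <pid-b>   (hypothetical helper)
compare_ns() {
    for link in /proc/"$1"/ns/*; do
        type=$(basename "$link")
        a=$(readlink "$link")
        b=$(readlink "/proc/$2/ns/$type")
        if [ "$a" = "$b" ]; then
            echo "$type shared"
        else
            echo "$type differs"
        fi
    done
}
```

Called from a root shell as compare_ns $$ 36897, this should report the user and time entries as shared and the rest as differing, matching the two listings above.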

$ sudo mount -o bind /proc/36897/ns/net /var/run/netns/dockerns

And run ip netns list again:

$ ip netns list
dockerns (id: 0)

Nice.
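
The touch and mount steps can be wrapped in a small function. This is a sketch, not a polished tool: expose_netns and the SUDO variable are names invented here, and the mkdir is an extra precaution for systems where /var/run/netns does not exist yet.

```shell
#!/bin/sh
# Expose the network namespace of <pid> as /var/run/netns/<name>.
# Usage: expose_netns <pid> <name>   (hypothetical helper)
# Set SUDO=echo to print the privileged commands instead of running them.
expose_netns() {
    pid="$1"
    name="$2"
    run="${SUDO-sudo}"
    # The directory may be missing if no named namespace was ever created.
    $run mkdir -p /var/run/netns
    # The mount target must exist before something can be bind mounted on it.
    $run touch "/var/run/netns/$name"
    $run mount -o bind "/proc/$pid/ns/net" "/var/run/netns/$name"
}
```

With the PID from docker inspect this would be called as expose_netns 36897 dockerns. To clean up afterwards, sudo ip netns delete dockerns unmounts the entry and removes the file again.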

Start socat in the dockerns network namespace and connect to localhost:1234:

$ sudo ip netns exec dockerns socat - TCP:localhost:1234
hello
Hello from the outside world
Hello from inside docker

It works! We are now connected to the service running in the container!

Conclusion

It's fun to play around, but there is room for improvement.

For example, a better way to list namespaces is to use lsns, as this tool looks for namespaces in more places, including /run/docker/netns/.

Also, a more "correct" way is probably to create a virtual ethernet device and attach it to the same namespace.

veth example

To do that, we first need to determine the PID for the container:

$ lsns -t net
        NS TYPE NPROCS   PID USER NETNSID NSFS                           COMMAND
...
4026533538 net       3 36897 root       0 /run/docker/netns/06a083424158 /bin/bash

Make the namespace available under the name dockerns. This creates /var/run/netns/dockerns as we did earlier, but with ip netns attach:

$ sudo ip netns attach dockerns 36897

Create the virtual interfaces, move one end into the network namespace, assign addresses and add a route:

$ sudo ip link add veth0 type veth peer name veth1
$ sudo ip link set veth1 netns dockerns
$ sudo ip address add 192.168.3.1/24 dev veth0
$ sudo ip netns exec dockerns ip a add 192.168.3.2/24 dev veth1
$ sudo ip link set up veth0
$ sudo ip netns exec dockerns ip l set up veth1
$ sudo ip route add 10.0.42.0/24 via 192.168.3.2

That is pretty much it.
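
To undo the setup afterwards, something like the following should do. It is a sketch that assumes the interface name from the example above (veth0); veth_down and the SUDO variable are made-up names.

```shell
#!/bin/sh
# Remove the veth pair and the named namespace entry created above.
# Usage: veth_down <netns-name>   (hypothetical helper)
# Set SUDO=echo to print the privileged commands instead of running them.
veth_down() {
    ns="$1"
    run="${SUDO-sudo}"
    # Deleting one end of a veth pair removes its peer as well.
    $run ip link del veth0
    # Unmounts and removes /var/run/netns/<ns>; the container keeps its namespace.
    $run ip netns delete "$ns"
}
```

Before tearing things down, ping -c 1 192.168.3.2 from the host is a quick way to check that the link works. If the socat listener in the container is running, socat - TCP:192.168.3.2:1234 should then reach it without ip netns exec.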

TIL - docker scratch image

TIL, Today I Learned, is more of an "I just figured this out: here are my notes, you may find them useful too" than a full blog post.

The scratch image is the smallest possible image for Docker. It does not contain any libraries or other executables. It is simply a new, fresh and empty setup of namespaces.

The FROM scratch line is even a no-op [1] in the Dockerfile, which means it does not create an extra layer in your image. Example of a Dockerfile:

FROM scratch
ADD hello /
CMD ["/hello"]

As the Docker image does not contain any libraries, the hello application in the example above has to be statically linked.

One use I see is to set up a Docker image based on a completely custom-made root filesystem, e.g. the output from Buildroot.
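
As an illustration, a hello binary of that kind could be produced like this. It is a sketch that assumes gcc and a static libc are installed on the build host. The C program and both function names are invented for the example.

```shell
#!/bin/sh
# Build a statically linked hello binary for use with FROM scratch.
# Assumes gcc and a static libc are installed; both function names are made up.

# Write a minimal C program to <dir>/hello.c.
write_hello_source() {
    cat > "$1/hello.c" <<'EOF'
#include <stdio.h>

int main(void)
{
    puts("Hello from scratch");
    return 0;
}
EOF
}

# Compile <dir>/hello.c into the static binary <dir>/hello.
build_static_hello() {
    gcc -static -o "$1/hello" "$1/hello.c"
}
```

With the Dockerfile above in the current directory, write_hello_source . followed by build_static_hello . and docker build -t hello . should give a runnable image. On a glibc system, ldd hello reports "not a dynamic executable" for a static binary, which is a quick sanity check before building the image.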

See Docker documentation [2] for further reading.