libsegfault.so

libsegfault.so

The dynamic linker [1] in a Linux system uses several environment variables to customize its behavior. The most commonly used is probably LD_LIBRARY_PATH, a list of directories where it searches for libraries at execution time. Another variable I use quite often is LD_TRACE_LOADED_OBJECTS, which makes the program list its dynamic dependencies, just like ldd(1).

For example, consider the following output:

$ LD_TRACE_LOADED_OBJECTS=1 /bin/bash
    linux-vdso.so.1 (0x00007ffece29e000)
    libreadline.so.7 => /usr/lib/libreadline.so.7 (0x00007fc9b82d1000)
    libdl.so.2 => /usr/lib/libdl.so.2 (0x00007fc9b80cd000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007fc9b7d15000)
    libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007fc9b7add000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fc9b851f000)
    libtinfo.so.6 => /usr/lib/libtinfo.so.6 (0x00007fc9b78b1000)

LD_PRELOAD

LD_PRELOAD is a list of additional shared objects that should be loaded before all other dynamic dependencies. When the loader resolves symbols, it walks sequentially through the list of dynamic shared objects and takes the first match. This makes it possible to override functions in other shared objects and change the behavior of the application completely.

Consider the following example:

$ LD_PRELOAD=/usr/lib/libSegFault.so LD_TRACE_LOADED_OBJECTS=1 /bin/bash
    linux-vdso.so.1 (0x00007ffc73f61000)
    /usr/lib/libSegFault.so (0x00007f131c234000)
    libreadline.so.7 => /usr/lib/libreadline.so.7 (0x00007f131bfe6000)
    libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f131bde2000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f131ba2a000)
    libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007f131b7f2000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f131c439000)
    libtinfo.so.6 => /usr/lib/libtinfo.so.6 (0x00007f131b5c6000)

Here we have preloaded libSegFault, and it is listed in second place. In first place we have linux-vdso.so.1, which is a Virtual Dynamic Shared Object provided by the Linux kernel. The VDSO deserves its own blog post; it is a cool feature that maps kernel code into a process's context as a .text segment in a virtual library.

libSegFault.so

libSegFault.so is part of glibc [2] and comes with your toolchain. The library is for debugging purposes and is activated by preloading it at runtime. It does not actually override any functions; instead it registers signal handlers in a constructor (yes, you can execute code before main) for the specified signals. By default, only SIGSEGV (see signal(7)) is registered. The registered handlers print a backtrace for the application when the signal is delivered. See the implementation in [3].

Set the environment variable SEGFAULT_SIGNALS to explicitly select the signals you want to register handlers for.

/media/libsegfault.png

This is a useful feature for debugging. The best part is that you don't have to recompile your code.
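Usage looks like this (./myapp is a placeholder; glibc accepts lowercase signal names such as segv, abrt, fpe, ill, bus, and the special value all):

```shell
# Register handlers for both SIGSEGV and SIGABRT
SEGFAULT_SIGNALS="segv abrt" \
LD_PRELOAD=/usr/lib/libSegFault.so ./myapp
```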

libSegFault in action

Our application

Consider the following real-life application, taken directly from the local nuclear power plant:

void handle_uranium(char *rod)
{
    *rod = 0xAB;
}

void start_reactor()
{
    char *rod = 0x00;
    handle_uranium(rod);
}

int main()
{
    start_reactor();
}

The symptom

We are seeing a segmentation fault when operating on a particular uranium rod, but we don't know why.

Use libSegFault

Start the application with libSegFault preloaded and examine the dump:

$ LD_PRELOAD=/usr/lib/libSegFault.so ./powerplant
*** Segmentation fault
Register dump:

 RAX: 0000000000000000   RBX: 0000000000000000   RCX: 0000000000000000
 RDX: 00007ffdf6aba5a8   RSI: 00007ffdf6aba598   RDI: 0000000000000000
 RBP: 00007ffdf6aba480   R8 : 000055d2ad5e16b0   R9 : 00007f98534729d0
 R10: 0000000000000008   R11: 0000000000000246   R12: 000055d2ad5e14f0
 R13: 00007ffdf6aba590   R14: 0000000000000000   R15: 0000000000000000
 RSP: 00007ffdf6aba480

 RIP: 000055d2ad5e1606   EFLAGS: 00010206

 CS: 0033   FS: 0000   GS: 0000

 Trap: 0000000e   Error: 00000006   OldMask: 00000000   CR2: 00000000

 FPUCW: 0000037f   FPUSW: 00000000   TAG: 00000000
 RIP: 00000000   RDP: 00000000

 ST(0) 0000 0000000000000000   ST(1) 0000 0000000000000000
 ST(2) 0000 0000000000000000   ST(3) 0000 0000000000000000
 ST(4) 0000 0000000000000000   ST(5) 0000 0000000000000000
 ST(6) 0000 0000000000000000   ST(7) 0000 0000000000000000
 mxcsr: 1f80
 XMM0:  00000000000000000000000000000000 XMM1:  00000000000000000000000000000000
 XMM2:  00000000000000000000000000000000 XMM3:  00000000000000000000000000000000
 XMM4:  00000000000000000000000000000000 XMM5:  00000000000000000000000000000000
 XMM6:  00000000000000000000000000000000 XMM7:  00000000000000000000000000000000
 XMM8:  00000000000000000000000000000000 XMM9:  00000000000000000000000000000000
 XMM10: 00000000000000000000000000000000 XMM11: 00000000000000000000000000000000
 XMM12: 00000000000000000000000000000000 XMM13: 00000000000000000000000000000000
 XMM14: 00000000000000000000000000000000 XMM15: 00000000000000000000000000000000

Backtrace:
./powerplant(+0x606)[0x55d2ad5e1606]
./powerplant(+0x628)[0x55d2ad5e1628]
./powerplant(+0x639)[0x55d2ad5e1639]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f9852ec6f6a]
./powerplant(+0x51a)[0x55d2ad5e151a]

Memory map:

55d2ad5e1000-55d2ad5e2000 r-xp 00000000 00:13 14897704                   /home/marcus/tmp/segfault/powerplant
55d2ad7e1000-55d2ad7e2000 r--p 00000000 00:13 14897704                   /home/marcus/tmp/segfault/powerplant
55d2ad7e2000-55d2ad7e3000 rw-p 00001000 00:13 14897704                   /home/marcus/tmp/segfault/powerplant
55d2ada9c000-55d2adabd000 rw-p 00000000 00:00 0                          [heap]
7f9852c8f000-7f9852ca5000 r-xp 00000000 00:13 13977863                   /usr/lib/libgcc_s.so.1
7f9852ca5000-7f9852ea4000 ---p 00016000 00:13 13977863                   /usr/lib/libgcc_s.so.1
7f9852ea4000-7f9852ea5000 r--p 00015000 00:13 13977863                   /usr/lib/libgcc_s.so.1
7f9852ea5000-7f9852ea6000 rw-p 00016000 00:13 13977863                   /usr/lib/libgcc_s.so.1
7f9852ea6000-7f9853054000 r-xp 00000000 00:13 13975885                   /usr/lib/libc-2.26.so
7f9853054000-7f9853254000 ---p 001ae000 00:13 13975885                   /usr/lib/libc-2.26.so
7f9853254000-7f9853258000 r--p 001ae000 00:13 13975885                   /usr/lib/libc-2.26.so
7f9853258000-7f985325a000 rw-p 001b2000 00:13 13975885                   /usr/lib/libc-2.26.so
7f985325a000-7f985325e000 rw-p 00000000 00:00 0
7f985325e000-7f9853262000 r-xp 00000000 00:13 13975827                   /usr/lib/libSegFault.so
7f9853262000-7f9853461000 ---p 00004000 00:13 13975827                   /usr/lib/libSegFault.so
7f9853461000-7f9853462000 r--p 00003000 00:13 13975827                   /usr/lib/libSegFault.so
7f9853462000-7f9853463000 rw-p 00004000 00:13 13975827                   /usr/lib/libSegFault.so
7f9853463000-7f9853488000 r-xp 00000000 00:13 13975886                   /usr/lib/ld-2.26.so
7f9853649000-7f985364c000 rw-p 00000000 00:00 0
7f9853685000-7f9853687000 rw-p 00000000 00:00 0
7f9853687000-7f9853688000 r--p 00024000 00:13 13975886                   /usr/lib/ld-2.26.so
7f9853688000-7f9853689000 rw-p 00025000 00:13 13975886                   /usr/lib/ld-2.26.so
7f9853689000-7f985368a000 rw-p 00000000 00:00 0
7ffdf6a9b000-7ffdf6abc000 rw-p 00000000 00:00 0                          [stack]
7ffdf6bc7000-7ffdf6bc9000 r--p 00000000 00:00 0                          [vvar]
7ffdf6bc9000-7ffdf6bcb000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

At first glance, the information may feel overwhelming, but let's go through the most important lines.

The backtrace lists the call chain at the moment the signal was delivered to the application. The first entry is the top of the stack:

Backtrace:
./powerplant(+0x606)[0x55d2ad5e1606]
./powerplant(+0x628)[0x55d2ad5e1628]
./powerplant(+0x639)[0x55d2ad5e1639]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f9852ec6f6a]
./powerplant(+0x51a)[0x55d2ad5e151a]

Here we can see that the last executed instruction is at address 0x55d2ad5e1606. The tricky part is that this address is not an offset within the application binary, but a virtual address in the process's address space. In other words, we need to translate the address to an offset within the application's .text segment. If we look at the memory map, we see three entries for the powerplant application:

55d2ad5e1000-55d2ad5e2000 r-xp 00000000 00:13 14897704                   /home/marcus/tmp/segfault/powerplant
55d2ad7e1000-55d2ad7e2000 r--p 00000000 00:13 14897704                   /home/marcus/tmp/segfault/powerplant
55d2ad7e2000-55d2ad7e3000 rw-p 00001000 00:13 14897704                   /home/marcus/tmp/segfault/powerplant

Why three? Most ELF files (applications or libraries) have at least three memory-mapped sections:

- .text, the executable code
- .rodata, read-only data
- .data, read/write data

With the help of the permissions, it is possible to figure out which mapping corresponds to which section.

The last mapping has rw- permissions and is probably our .data section, as it allows both reading and writing. The middle mapping has r-- and is read-only - probably our .rodata section. The first mapping has r-x, which is read-only and executable. This must be our .text section!

Now we can take the address from our backtrace and subtract the start address of our .text mapping: 0x55d2ad5e1606 - 0x55d2ad5e1000 = 0x606
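The same arithmetic can be done directly in the shell:

```shell
addr=0x55d2ad5e1606   # faulting instruction from the backtrace
base=0x55d2ad5e1000   # start of the r-xp (.text) mapping
printf '0x%x\n' $((addr - base))   # → 0x606
```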

Use addr2line to get the corresponding line in our source code:

$ addr2line -e ./powerplant -a 0x606
    0x0000000000000606
    /home/marcus/tmp/segfault/main.c:3

If we go back to the source code, we see that line 3 in main.c is:

*rod = 0xAB;

Here we have it. Nothing more to say.

Conclusion

libSegFault.so has been a great help for a long time. The biggest benefit is that you don't have to recompile your application to use it. However, addr2line cannot give you the line number if the application is not compiled with debug symbols, but it is often not that hard to figure out the context from a disassembly of your application.

BEER2RST - my first attempt with golang

BEER2RST - my first attempt with golang

I'm using Beersmith [1], a (non-free...) software package, to create my beer recipes. It is written in Java and runs well on Linux. One of the biggest benefits of using Beersmith is that my brewing house [2] takes exported recipes as input and sets up mash schemes automatically - really comfortable.

I brew beer several times a month and always take notes on each brewing session, both methods and results. The goal is to get reproducible results each time, or to make small improvements. Instead of having all these notes in separate locations, it would be nice to collect all of my brewing sessions as blog posts.

With that said, this blog will probably evolve to contain non-technical posts as well in the near future.

In these posts, I also want to publish my recipes somehow. Beersmith is able to export recipes in a non-overcomplicated XML format that is quite straightforward to parse. All posts I write are in reStructuredText [3] format, so I need to create a tool that reads the XML and exports the recipes in reStructuredText.

First glance at Golang

/media/golang.png

To be honest, I'm not really a fan of high-level programming languages as I can't take them seriously. But I guess it is time to learn something new. What I have tested of Go so far is rather impressive. For example, I cannot imagine simpler XML parsing (I'm used to libxml2).

I also like the gofmt [4] tool that keeps the code properly formatted. Every language should have such a tool. It is also easy to cross-compile the application for different architectures by specifying $GOOS and $GOARCH. I have only tested with ARM and it just works.

Imports

Most languages have a way to declare what functionality should be imported and made available to your file. Go uses import, with the following syntax:

import (
    "encoding/xml"
    "flag"
    "fmt"
    "io/ioutil"
    "os"
    "strings"
)

What makes it really interesting is when you do this:

import (
    "github.com/marcusfolkesson/tablewriter"
)

That is, you point to a repository. You only need to download the repository to your $GOPATH location and then it is usable:

go get github.com/marcusfolkesson/tablewriter

Note:
I needed to create tables to print the recipes properly. I found tablewriter [5], which prints ASCII tables. Unfortunately, it does not create tables in reStructuredText format, so I had to fork the project and implement support for that. Hopefully the changes will make it back to the original project; there is a pending pull request for that.

Unmarshal XML

I really liked how easy it was to unmarshal XML. Consider the following snippet from an exported recipe:

<HOP>
 <NAME>Northern Brewer</NAME>
 <VERSION>1</VERSION>
 <ORIGIN>Germany</ORIGIN>
 <ALPHA>8.5000000</ALPHA>
 <AMOUNT>0.0038459</AMOUNT>
 <USE>Boil</USE>
 <TIME>60.0000000</TIME>
 <NOTES>Also called Hallertauer Northern Brewers
Used for: Bittering and finishing both ales and lagers of all kinds
Aroma: Fine, dry, clean bittering hop.  Unique flavor.
Substitutes: Hallertauer Mittelfrueh, Hallertauer
Examples: Anchor Steam, Old Peculiar, </NOTES>
 <TYPE>Both</TYPE>
 <FORM>Pellet</FORM>
 <BETA>4.0000000</BETA>
 <HSI>35.0000000</HSI>
 <DISPLAY_AMOUNT>3.85 g</DISPLAY_AMOUNT>
 <INVENTORY>0.00 g</INVENTORY>
 <DISPLAY_TIME>60.0 min</DISPLAY_TIME>
</HOP>

The first step is to create a structure that will hold the values:

type Hop struct {
    Name   string  `xml:"NAME"`
    Origin string  `xml:"ORIGIN"`
    Alpha  float64 `xml:"ALPHA"`
    Amount float64 `xml:"AMOUNT"`
    Use    string  `xml:"USE"`
    Time   float64 `xml:"TIME"`
    Notes  string  `xml:"NOTES"`
}

There is no need to create fields for all tags, just the ones you are interested in.

Later on, read the file and unmarshal the XML:

content, err := ioutil.ReadFile("beer.xml")
if err != nil {
        panic(err)
}

err = xml.Unmarshal(content, &hops)
if err != nil {
        panic(err)
}

The structure will now be populated with values from the XML file. Magical, isn't it?

Conclusion

I have only tested Golang for approximately one working day, and I like it. I used to use Python for all kinds of fast prototyping, but I think I will consider Golang next time.

What I really like in comparison with Python:

  • The fact that it compiles to a single binary is really nice, especially when you cross compile to a different architecture.
  • No need for an interpreter
  • Static types! Dynamically typed languages make my brain hurt

The result can be found on my GitHub [6] account.

Lund Linux Conference 2018

Lund Linux Conference 2018

It is just two weeks from now until the Lund Linux Conference (LLC) [1] begins! LLC is a two-day conference with the same layout as the bigger Linux conferences - just smaller, but just as nice.

There will be talks about PCIe, the serial device bus, security in cars and a few more topics. My highlight this year is hearing about XDP (eXpress Data Path) [2], which enables really fast packet processing with standard Linux. XDP has made great progress over the last six months and is a technically cool feature.

Here we are back in 2017:

/media/lund-linuxcon-2018.jpg

ath10k QCA6584 and Wireless network stack

ath10k QCA6584 and Wireless network stack

ATH10K is the mac80211 wireless driver for the Qualcomm Atheros QCA988x family of chips, and I'm currently working [1] with the QCA6584 chip, an automotive-graded radio chip with PHY support for the abgn+ac modes. The connection interface to the chip is SDIO, which is barely supported for now, but my friend and kernel hacker Erik Strömdahl [2] has got his hands dirty and is currently working on it. There has been some progress: the chip is now able to scan, connect, send and receive data. There are still some issues with the link speed, but that is coming.

He is also the reason for why I got interested in the network part of the kernel which is quite... big.

The wireless networking subsystem alone is quite big, and the first thing you meet when you start to dig is a bunch of terms thrown in your face. I will try to briefly describe a few of the terms that are fundamental for wireless communication.

In this post we will discuss the right side of this figure:

/media/wireless-stack.png

IEEE 802.11

We will see 802.11 a lot, so the first thing to know is where these numbers come from. IEEE 802.11 is a set of specifications for implementing wireless networking over several frequency bands. The specifications cover layer 1 (Physical) and layer 2 (Data link) of the OSI model [3].

The Linux kernel MAC subsystem registers ieee80211-compliant hardware devices with

int ieee80211_register_hw(struct ieee80211_hw *hw)

found in .../net/mac80211/main.c

The Management Layer (MLME)

One more thing that we need to cover is the management layer, since all other layers somehow depend on it.

There are three components in the 802.11 management architecture:

- The Physical Layer Management Entity (PLME)
- The System Management Entity (SME)
- The MAC Layer Management Entity (MLME)

The management layer assists you in several ways. For instance, it handles things such as scanning, authentication, beacons, associations and much more.

Scanning

Scanning is simply looking for other 802.11-compliant devices in the air. There are two types of scanning: passive and active.

Passive scanning

When performing a passive scan, the radio listens passively for beacons, without transmitting packets, as it moves from channel to channel and records all devices it receives beacons from. Higher frequency bands in the IEEE 802.11a standard do not allow transmitting anything unless you have heard an Access Point (AP) beacon. Passive scanning is therefore the only way to become aware of the surroundings.

Active scanning

Active scanning, on the other hand, transmits Probe Request (IEEE80211_STYPE_PROBE_REQ) management packets. This type of scanning also walks from channel to channel, sending these probe request management packets on each channel.

These requests are handled by ieee80211_send_probe_req() in .../net/mac80211/util.c:

void ieee80211_send_probe_req(struct ieee80211_sub_if_data *sdata,
                  const u8 *src, const u8 *dst,
                  const u8 *ssid, size_t ssid_len,
                  const u8 *ie, size_t ie_len,
                  u32 ratemask, bool directed, u32 tx_flags,
                  struct ieee80211_channel *channel, bool scan)

Authentication

The authentication procedure sends a management frame of an authentication type (IEEE80211_STYPE_AUTH). There is not just one type of authentication but plenty of them. The ieee80211 specification only mandates one authentication type: Open System authentication (WLAN_AUTH_OPEN). Another common authentication type is Shared Key authentication (WLAN_AUTH_SHARED_KEY).

These management frames are handled by ieee80211_send_auth() in .../net/mac80211/util.c:

void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata,
             u16 transaction, u16 auth_alg, u16 status,
             const u8 *extra, size_t extra_len, const u8 *da,
             const u8 *bssid, const u8 *key, u8 key_len, u8 key_idx,
             u32 tx_flags)

Open system authentication

This is the simplest type of authentication: all clients that request authentication will be authenticated. No security is involved at all.

Shared key authentication

In this type of authentication, the client and the AP use a shared key, also known as a Wired Equivalent Privacy (WEP) key.

Association

The association is started when the station sends management frames of the type IEEE80211_STYPE_ASSOC_REQ. In the kernel code this is handled by ieee80211_send_assoc() in .../net/mac80211/mlme.c:

static void ieee80211_send_assoc(struct ieee80211_sub_if_data *sdata)

Reassociation

When the station is roaming, i.e. moving between APs within an ESS (Extended Service Set), it sends a reassociation request to the new AP with frames of the type IEEE80211_STYPE_REASSOC_REQ. Association and reassociation have so much in common that both are handled by ieee80211_send_assoc().

MAC (Medium Access Control)

All ieee80211 devices need to implement the management layer (MLME), but the implementation can live in device hardware or in software. The devices are divided accordingly into full MAC devices (hardware implementation) and soft MAC devices (software implementation). Most devices today are soft MAC devices.

The MAC layer can be further broken down into two pieces: upper MAC and lower MAC. The upper part of the MAC handles the management aspects (all that we covered in the MLME section above), and the lower part handles time-critical operations such as ACKing received packets.

Linux only handles the upper part of the MAC; the lower part is operated in device hardware. What we can see in the figure is that the MAC layer separates data packets from configuration/management packets. The data packets are forwarded to the network device and travel the same path through the network layer as data packets from all other types of network devices.

The Linux wireless subsystem consists of two major parts: mac80211, which we have covered here, and cfg80211, the other major part.

CFG80211

cfg80211 is a configuration management service for mac80211-compliant devices. Both full MAC and soft MAC devices need to implement operations compatible with the cfg80211 configuration interface in order to let userspace applications configure the device.

The configuration may be done with one of two interfaces, wext and nl80211.

Wireless Extension, WEXT (Legacy)

This is the legacy and ugly way to configure wireless devices. It is supported only for backward compatibility reasons. Users of this configuration interface are the wireless-tools (iwconfig, iwlist).

nl80211

nl80211, on the other hand, is a new netlink interface intended to replace the Wireless Extension (wext) interface. Users of this interface are typically iw and wpa_supplicant.

Conclusion

The whole network stack of the Linux kernel is really complex and optimized for high throughput with low latencies. In this post we only covered what the wireless support adds to the stack, which is mainly the mac80211 layer, handling all device management, and the cfg80211 layer, configuring the MAC layer. Packets to wireless devices are divided into data packets and configuration/management packets. The data packets follow the same path as for all network devices, while the management packets go to the cfg80211 layer.

Linux driver for PhoenixRC adapter

Linux driver for PhoenixRC adapter

Update: Michael Larabel on Phoronix has written a post [3] about this driver. Go ahead and read it as well!

A few years ago I used to build multirotors, mostly quadcopters and tricopters. It is a fun hobby; both building and flying are incredibly satisfying. The first multirotors I built were nicely made with CNC-cut parts. They looked really nice and robust. However, with more than 100 crashes under the belt, the last ones were made out of sticks and a food box. Easy to repair and just as fun to fly.

This hobby requires practice, and even if the most fun way to practice is by flying, it is really time consuming. A more time-efficient way to practice is with a simulator, so I bought PhoenixRC [1], a flight simulator. It comes with a USB adapter that lets you connect and fly with your own RC controller. I did not run the simulator much, though. PhoenixRC is a Windows application and there was no Linux driver for the adapter. The only instance of Windows I had was on a separate disk lying on the shelf, and switching disks on your laptop each time you want to fly is simply not going to happen.

On New Year's Eve 2017, my wife became ill and I got some time on my own. Far down in a box I found the adapter and plugged it into my Linux computer. Still no driver.

Reverse engineering the adapter

The reverse engineering was quite simple. It turns out that the adapter has only one configuration, one interface and one endpoint of the in-interrupt type. This simply means that it only has a unidirectional communication path, initiated by sending an interrupt URB (USB Request Block) to the device. If you are not familiar with what configurations, interfaces and endpoints are in terms of USB, please consult the USB specification.

The data in the URB was only 8 bytes. After some testing with my RC controller, I got the following mapping between the data and the channels on the controller:

data[0] = channel 1
data[1] = ? (Possibly a switch)
data[2] = channel 2
data[3] = channel 3
data[4] = channel 4
data[5] = channel 5
data[6] = channel 6
data[7] = channel 7

So I created a device driver that registered an input device with the following events:

Channel   Event
1         ABS_X
2         ABS_Y
3         ABS_RX
4         ABS_RY
5         ABS_RUDDER
6         ABS_THROTTLE
7         ABS_MISC

Using a simulator

Heli-X [2] is an excellent cross-platform flight simulator that runs perfectly on Linux. I have now spent several hours with a Goblin 700 helicopter and it is just as fun as I remembered.

/media/phoenixrc.jpg

Available in Linux kernel 4.17

Of course, all code is submitted to the Linux kernel and should be merged in v4.17.

Patch changelogs with git-notes

Patch changelogs with git-notes

Git notes [1] is a neat feature that has been around since v1.6.6. A note is a kind of metadata that belongs to a certain commit but is stored separately (as a different git object) from the commit itself. The fact that it is a separate git object is important: it keeps the commit hash intact, and as a bonus, notes have their own diff log.

Basic usage

git-notes supports the add, append, copy, edit, list, prune, remove and show subcommands. These subcommands are rather self-explanatory, so I will not describe them any further.

add and edit fit almost all my needs, so I will stick to them.

First, create a git repository:

mkdir gitnotes
cd gitnotes/
git init
echo testfile > test
git add test
git commit --signoff -m "Initial commit"
echo add-more-stuff >> test
git commit --signoff -a -m "Feature X"

We now have two commits:

commit bce22657cb5fd8353d43537e6b64422a8ce1107d (HEAD -> master)
Author: Marcus Folkesson <marcus.folkesson@gmail.com>
Date:   Sun Feb 11 21:56:17 2018 +0100

    Feature X

    Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>

commit 93726c3f6bf5c95faf969f040966b15d58d6fd0b
Author: Marcus Folkesson <marcus.folkesson@gmail.com>
Date:   Sun Feb 11 21:55:21 2018 +0100

    Initial commit

    Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>

Add a note

Adding a note is simple:

git notes add -m "v2: update locking handling for Feature X"

We now have a note appended to our commit, as shown by git log HEAD -1:

commit bce22657cb5fd8353d43537e6b64422a8ce1107d (HEAD -> master)
Author: Marcus Folkesson <marcus.folkesson@gmail.com>
Date:   Sun Feb 11 21:56:17 2018 +0100

    Feature X

    Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>

Notes:
    v2: update locking handling for Feature X

Note that the SHA, bce22657cb5fd8353d43537e6b64422a8ce1107d, has not changed.
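This is easy to verify, since the note lives under refs/notes/commits rather than in the commit object itself:

```shell
git rev-parse HEAD    # same SHA as before the note was added
git notes show HEAD   # prints the note text
```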

Edit a note

Editing a note is as simple as adding one:

git notes edit HEAD

When to use notes?

The first field of application is obvious: keeping notes about individual commits. The kind of notes I usually keep are comments made during log reviews.

For the last year I have been using notes for a more practical purpose: patch changelog history, which is an incredibly powerful but undocumented (try to google it yourself) use of notes.

Patch format and changelogs

When talking about patches, the --- marker line serves the essential purpose of telling patch handling tools where the commit message ends. One good use for the additional comments after the --- marker is patch changelogs, which describe what has changed between the v1 and v2 versions of the patch.

Rebase and amend

When working with patches, it is not unusual that it takes a few iterations until the patchset is in a ready-to-merge state. The way to rework patches is with an interactive rebase and amending.

A note is linked to a certain git object, and this is a problem with rebase and amend since they overwrite that git object. With that said: if you amend your commit, the notes will disappear... unless... you configure git to rewrite the notes upon amend and rebase.

From git-notes(1):

notes.rewrite.<command>

When rewriting commits with <command> (currently amend or rebase) and this variable is set to true, Git automatically copies your notes from the original to the rewritten commit. Defaults to true, but see notes.rewriteRef below.

notes.rewriteRef

When copying notes during a rewrite, specifies the (fully qualified) ref whose notes should be copied. The ref may be a glob, in which case notes in all matching refs will be copied. You may also specify this configuration several times.

Does not have a default value; you must configure this variable to enable note rewriting. Set it to refs/notes/commits to enable rewriting for the default commit notes.

This setting can be overridden with the GIT_NOTES_REWRITE_REF environment variable, which must be a colon separated list of refs or globs.

The manpage simply tells you to add these lines to your ~/.gitconfig or .git/config:

[notes "rewrite"]
    amend = true
    rebase = true
[notes]
    rewriteRef = refs/notes/commits

This makes amend and rebase rewrite the notes.
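With that configuration in place, a note survives an amend (a quick sketch):

```shell
git notes add -m "v2: update locking handling"
git commit --amend --no-edit
git notes show HEAD   # the note followed the rewritten commit
```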

Generate patches

The best part is that git format-patch --notes will generate patches with appended notes after the --- marker.

For example: git format-patch --notes -1 -v2 will generate v2-0001-Feature-X.patch:

From bce22657cb5fd8353d43537e6b64422a8ce1107d Mon Sep 17 00:00:00 2001
From: Marcus Folkesson <marcus.folkesson@gmail.com>
Date: Mon, 19 Feb 2018 21:42:45 +0100
Subject: [PATCH v2] Feature X

Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
---

Notes:
    v2: update locking handling for Feature X

 test | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test b/test
index 2691857..066c37d 100644
--- a/test
+++ b/test
@@ -1 +1,2 @@
 testfile
+add-more-stuff
--
2.15.1

This makes the patch workflow really smooth, instead of manually tracking and updating your patch changelogs.

My typical workflow

/media/git-patch-workflow.png

Other tricks

Note logs

Even notes have a log (everything under refs/ has one), and it can be useful to see how your notes have evolved over time:

git log -p notes/commits

Push and fetch notes

The notes themselves are stored as git objects, referenced from .git/refs/notes. A thing to keep in mind is that refs/notes is not pushed by default, but that is not a problem - my opinion is that notes should be local-only anyway.

If you really want to push your notes, you can do that with

git push <remote> refs/notes/*

and fetch them with

git fetch origin refs/notes/*:refs/notes/*

Buildroot: out-of-tree builds

Buildroot: out-of-tree builds

Like the Linux kernel, Buildroot supports out-of-tree builds in a very similar way. This can be very useful when you have multiple configurations in the same Buildroot repository that you want to build without interference.

Consider the following example:

cd buildroot/
mkdir -p ../outputs/device{1,2}

make O=../outputs/device1 menuconfig
make O=../outputs/device1

make O=../outputs/device2 menuconfig
make O=../outputs/device2

Each output directory has its own .config, so you may change the build configurations independently.

The big benefit compared with git worktree [1] or other multiple-instances-of-the-same-repository approaches is that the dl directory is shared (without needing to specify BR2_DL_DIR [2]) among all build directories.

The feature is documented in the Buildroot manual [3].

get_maintainers and git send-email


Many people, myself included, prefer email as a communication channel, especially for patches. GitHub, Gerrit and all the other "nice" and "user-friendly" tools that try to "help" you manage your submissions simply do not fit my workflow.

As you may already know, all patches to the Linux kernel are submitted by email. scripts/get_maintainer.pl (see [1] for more info about the process) is a handy tool that takes a patch as input and gives back a bunch of email addresses. These email addresses are usually passed to git send-email [2] for submission.

I have used various scripts to make the output from get_maintainer.pl fit git send-email, but I was not completely satisfied until I found the --to-cmd and --cc-cmd parameters of git send-email:

--to-cmd=<command>
 Specify a command to execute once per patch file which should generate patch file specific "To:" entries. Output of this command must be single email address per line. Default is the value of sendemail.tocmd configuration value.
--cc-cmd=<command>
 Specify a command to execute once per patch file which should generate patch file specific "Cc:" entries. Output of this command must be single email address per line. Default is the value of sendemail.ccCmd configuration value.

I'm very pleased with these parameters. All I have to do is put these extra lines into my ~/.gitconfig (or use git config):

[sendemail.linux]
    tocmd ="`pwd`/scripts/get_maintainer.pl --nogit --nogit-fallback --norolestats --nol"
    cccmd ="`pwd`/scripts/get_maintainer.pl --nogit --nogit-fallback --norolestats --nom"

To submit a patch, I just type:

git send-email --identity=linux ./0001-my-fancy-patch.patch

and let --to and --cc be populated automatically.

OOM-killer


When the system is running out of memory, the Out-Of-Memory (OOM) killer picks a process to kill based on its current memory footprint. In case of OOM, the kernel calculates a badness score between 0 (never kill) and 1000 for each process in the system. The process with the highest score gets killed. A score of 0 is reserved for unkillable tasks such as the global init process (see [1]) or kernel threads (processes with the PF_KTHREAD flag set).

/media/oomkiller.jpg

The current score of a given process is exposed in procfs, see /proc/[pid]/oom_score, and may be adjusted by setting /proc/[pid]/oom_score_adj. The value of oom_score_adj is added to the score before it is used to determine which task to kill. The value may be set between OOM_SCORE_ADJ_MIN (-1000) and OOM_SCORE_ADJ_MAX (+1000). This is useful if you want to guarantee that a process is never selected by the OOM killer.
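You can poke at these procfs files directly. A quick sketch (the score values vary between systems; raising oom_score_adj is allowed unprivileged, while lowering it requires privileges):

```shell
#!/bin/sh
# Inspect the current shell's badness score and make it a preferred victim.
cat /proc/$$/oom_score              # current score, varies between systems
echo 500 > /proc/$$/oom_score_adj   # raising the adjustment needs no privileges
cat /proc/$$/oom_score_adj          # reads back 500
```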

The calculation is simple nowadays: if a task is using all of its allowed memory, the badness score is calculated to 1000. If it is using half of its allowed memory, the badness score is calculated to 500, and so on. By setting oom_score_adj to -1000, the badness score sums up to <= 0 and the task will never be killed by the OOM killer.

There is one more thing that affects the calculation: if the process is running with the capability CAP_SYS_ADMIN, it gets a 3% discount. But that is all.
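The arithmetic described above can be sketched in shell (a simplified model of the kernel's oom_badness() with hypothetical page counts; unlike the real function, it does not return 0 early for unkillable or OOM_SCORE_ADJ_MIN tasks):

```shell
# Simplified badness model.
# Arguments: rss swap pagetables privileged(0/1) oom_score_adj totalpages
badness() {
    rss=$1; swap=$2; pagetables=$3; privileged=$4; adj=$5; totalpages=$6
    points=$(( rss + swap + pagetables ))
    # A CAP_SYS_ADMIN task gets a 3% discount.
    if [ "$privileged" -eq 1 ]; then
        points=$(( points - (points * 3) / 100 ))
    fi
    # Normalize oom_score_adj against the total page count and add it.
    points=$(( points + adj * (totalpages / 1000) ))
    # 0 is reserved for unkillable tasks; clamp eligible tasks to at least 1.
    if [ "$points" -gt 0 ]; then echo "$points"; else echo 1; fi
}

badness 1000 0 0 0 0 2000      # prints 1000
badness 1000 0 0 1 0 2000      # 3% discount: prints 970
badness 1000 0 0 0 -1000 2000  # adj drives the score to the floor: prints 1
```

The real function (walked through below) works in pages and takes the task's memory counters from its mm_struct instead of arguments.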

The old implementation

Before v2.6.36, the calculation of the badness score tried to be smarter: besides looking at the total memory usage (task->mm->total_vm), it also considered:

- Whether the process creates a lot of children
- Whether the process has been running for a long time, or has used a lot of CPU time
- Whether the process has a low nice value
- Whether the process is privileged (CAP_SYS_ADMIN or CAP_SYS_RESOURCE set)
- Whether the process is making direct hardware access

At first glance, all these criteria look valid, but if you think about it a bit, there are a lot of pitfalls here that make the selection not so fair. For example, a process that creates a lot of children and consumes some memory could be a leaky webserver. Another process that fits that description is the session manager of your desktop environment, which naturally creates a lot of child processes.

The new implementation

This heuristic selection has evolved over time. Instead of looking at mm->total_vm for each task, the task's RSS (resident set size, [2]) and swap space are used. RSS and swap space give a better indication of the amount of memory we will be able to free if we choose this task. The drawback of using mm->total_vm is that it includes overcommitted memory (see [3] for more information), i.e. pages that the process has claimed but that have not been physically allocated.

The process is now only counted as privileged if CAP_SYS_ADMIN is set, not CAP_SYS_RESOURCE as before.

The code

The whole implementation of the OOM killer is located in mm/oom_kill.c. The function oom_badness() is called for each task in the system and returns the calculated badness score.

Let's go through the function.

unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
              const nodemask_t *nodemask, unsigned long totalpages)
{
    long points;
    long adj;

    if (oom_unkillable_task(p, memcg, nodemask))
        return 0;

Looking for unkillable tasks such as the global init process.

    p = find_lock_task_mm(p);
    if (!p)
        return 0;

    adj = (long)p->signal->oom_score_adj;
    if (adj == OOM_SCORE_ADJ_MIN ||
            test_bit(MMF_OOM_SKIP, &p->mm->flags) ||
            in_vfork(p)) {
        task_unlock(p);
        return 0;
    }

If /proc/[pid]/oom_score_adj is set to OOM_SCORE_ADJ_MIN (-1000), do not even consider this task.

    points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
        atomic_long_read(&p->mm->nr_ptes) + mm_nr_pmds(p->mm);
    task_unlock(p);

Calculate a score based on RSS, page tables and used swap space.

    if (has_capability_noaudit(p, CAP_SYS_ADMIN))
        points -= (points * 3) / 100;

If it is a root process, give it a 3% discount. We are not mean people, after all.

    adj *= totalpages / 1000;
    points += adj;

Normalize and add the oom_score_adj value.

    return points > 0 ? points : 1;

Finally, never return 0 for an eligible task, as 0 is reserved for unkillable tasks.

}

Conclusion

The OOM logic is quite straightforward and seems to have been stable for a long time (v2.6.36 was released in October 2010). The reason I was looking at the code was that I did not think the behavior I saw when experimenting corresponded to what was written in the man page for oom_score. It turned out that the man page had not been updated when the new calculation was introduced back in 2010.

I have updated the man page, and it is available in v4.14 of the Linux man-pages project [4].

commit 5753354a3af20c8b361ec3d53caf68f7217edf48
Author: Marcus Folkesson <marcus.folkesson@gmail.com>
Date:   Fri Nov 17 13:09:44 2017 +0100

    proc.5: Update description of /proc/<pid>/oom_score

    After Linux 2.6.36, the heuristic calculation of oom_score
    has changed to only consider used memory and CAP_SYS_ADMIN.

    See kernel commit a63d83f427fbce97a6cea0db2e64b0eb8435cd10.

    Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
    Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>

diff --git a/man5/proc.5 b/man5/proc.5
index 82d4a0646..4e44b8fba 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -1395,7 +1395,9 @@ Since Linux 2.6.36, use of this file is deprecated in favor of
 .IR /proc/[pid]/oom_score_adj .
 .TP
 .IR /proc/[pid]/oom_score " (since Linux 2.6.11)"
-.\" See mm/oom_kill.c::badness() in the 2.6.25 sources
+.\" See mm/oom_kill.c::badness() in pre 2.6.36 sources
+.\" See mm/oom_kill.c::oom_badness() after 2.6.36
+.\" commit a63d83f427fbce97a6cea0db2e64b0eb8435cd10
 This file displays the current score that the kernel gives to
 this process for the purpose of selecting a process
 for the OOM-killer.
@@ -1403,7 +1405,16 @@ A higher score means that the process is more likely to be
 selected by the OOM-killer.
 The basis for this score is the amount of memory used by the process,
 with increases (+) or decreases (\-) for factors including:
-.\" See mm/oom_kill.c::badness() in the 2.6.25 sources
+.\" See mm/oom_kill.c::badness() in pre 2.6.36 sources
+.\" See mm/oom_kill.c::oom_badness() after 2.6.36
+.\" commit a63d83f427fbce97a6cea0db2e64b0eb8435cd10
+.RS
+.IP * 2
+whether the process is privileged (\-);
+.\" More precisely, if it has CAP_SYS_ADMIN or (pre 2.6.36) CAP_SYS_RESOURCE
+.RE
+.IP
+Before kernel 2.6.36 the following factors were also used in the calculation of oom_score:
 .RS
 .IP * 2
 whether the process creates a lot of children using
@@ -1413,10 +1424,7 @@ whether the process creates a lot of children using
 whether the process has been running a long time,
 or has used a lot of CPU time (\-);
 .IP *
-whether the process has a low nice value (i.e., > 0) (+);
-.IP *
-whether the process is privileged (\-); and
-.\" More precisely, if it has CAP_SYS_ADMIN or CAP_SYS_RESOURCE
+whether the process has a low nice value (i.e., > 0) (+); and
 .IP *
 whether the process is making direct hardware access (\-).
 .\" More precisely, if it has CAP_SYS_RAWIO

Embedded Linux course in Linköping


I teach our Embedded Linux course on a regular basis, this time in Linköping. It's a fun course with interesting labs where you write your own Linux device driver for a custom board.

The course itself is quite moderate, but with talented participants we easily slip over into more interesting topics like memory management, the MTD subsystem, how ftrace works internally and how to use it, the different execution contexts, an introduction to perf, and much more.

This time was no exception.

/media/embedded-linux-course.jpg