MMAP memory between kernel and userspace
Allocating memory in kernel space and then letting userspace map it into its virtual address space sounds like an easy task, and sure it is.
There are just a few things that are good to know about page mapping.
The MMU (Memory Management Unit) contains page tables with entries for mapping between virtual and physical addresses. Pages are the smallest units that the MMU deals with. The size of a page is given by the PAGE_SIZE macro in asm/page.h and is typically 4k on most (32-bit) architectures.
There are a few more useful macros in asm/page.h:
- PAGE_SHIFT: The number of bits to left-shift 1 by to get PAGE_SIZE.
- PAGE_SIZE: Size of a page, defined as (1 << PAGE_SHIFT).
- PAGE_ALIGN(len): Rounds len up to the next PAGE_SIZE boundary (see the sketch after this list).
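To make the relationship concrete, here is a minimal sketch, assuming a typical 4k page configuration (PAGE_SHIFT == 12); pages_needed() is just an illustrative helper:

#include <asm/page.h>

/* With 4k pages: PAGE_SIZE == (1 << 12) == 4096, and
 * PAGE_ALIGN(100) == 4096, i.e. one whole page. */
static size_t pages_needed(size_t len)
{
    return PAGE_ALIGN(len) >> PAGE_SHIFT;
}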
How does mmap(2) work?
Every page table entry has a bit that tells whether the entry is valid in supervisor mode (kernel mode) only, and of course, all memory allocated in kernel space has this bit set. What the mmap(2) system call does is simply create a new page table entry with a different virtual address that points to the same physical memory page. The difference is that the supervisor bit is not set for this entry.
This lets userspace access the memory as if it were a part of the application itself - and now it is. The kernel is not involved in those accesses at all, so there is no penalty for accessing those pages.
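From the userspace side it looks like any other mapping. Here is a minimal sketch, assuming the driver shown later in this post is registered as a character device at /dev/scan (the node name is illustrative):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/scan", O_RDWR);
    if (fd < 0)
        return 1;

    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED)
        return 1;

    buf[0] = 0x42; /* a plain store - no system call involved */

    munmap(buf, 4096);
    close(fd);
    return 0;
}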
Magic? Kind of.
The magic is called remap_pfn_range().
What remap_pfn_range() essentially does is update the process's page table with these new entries.
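Its prototype is found in linux/mm.h:

int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
                    unsigned long pfn, unsigned long size, pgprot_t prot);

It takes the target VMA, the userspace virtual address to start at, the page frame number of the physical memory, the size of the area, and the protection flags for the new page table entries.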
Example, please
Allocate memory
As we already know, the smallest unit that the MMU handles is PAGE_SIZE, and mmap(2) only works with full pages. Even if you want to share only 100 bytes, a whole page frame will be remapped and must therefore be allocated in the kernel. The allocated memory must also be page aligned.
__get_free_pages()
One way to allocate pages is with __get_free_pages():
unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order)
gfp_mask is commonly set to GFP_KERNEL in process/kernel context and GFP_ATOMIC in interrupt context. order gives the number of pages to allocate as a power of two: 2^order pages.
For example:
unsigned long vbuf = __get_free_pages(GFP_KERNEL, get_order(size));
Note that get_order() converts a byte count to the smallest order whose 2^order pages cover it, and that the return value is an address, not a pointer.
Allocated memory is freed with free_pages(), passing back the same order.
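Pairing with the allocation above, that would be:

free_pages(vbuf, get_order(size));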
vmalloc()
A more common (and preferred) way to allocate virtually contiguous memory is to use vmalloc(). vmalloc() will always allocate a whole set of pages, no matter what. This is exactly what we want!
Read about vmalloc() in the kmalloc(9) man page:
Allocates size bytes, and returns a pointer to the allocated memory. size becomes page aligned by vmalloc(), so the smallest allocated amount is 4kB. The allocated pages are mapped to the virtual memory space behind the 1:1 mapped physical memory in the kernel space. Behind every vmalloc'ed area there is at least one unmapped page. So writing behind the end of a vmalloc'ed area will not result in a system crash, but in a segmentation violation in the kernel space. Because memory fragmentation isn't a big problem for vmalloc(), vmalloc() should be used for huge amounts of memory.
Allocated memory is freed with vfree().
alloc_page()
If you need only one page, alloc_page() will give you that. If this is the case, instead of using remap_pfn_range(), vm_insert_page() will do the work for you. Notice that vm_insert_page() only works on order-0 (single-page) allocations, so if you want to map N pages, you will have to call vm_insert_page() N times, as in the sketch below.
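A minimal sketch of that loop, assuming pages[] has already been filled with n results from alloc_page() (the helper name is illustrative):

#include <linux/mm.h>

static int map_pages(struct vm_area_struct *vma, struct page **pages, int n)
{
    unsigned long addr = vma->vm_start;
    int i, ret;

    for (i = 0; i < n; i++) {
        /* one page table entry per order-0 page */
        ret = vm_insert_page(vma, addr, pages[i]);
        if (ret)
            return ret;
        addr += PAGE_SIZE;
    }
    return 0;
}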
Now some code
Allocation
priv->a_size = ATTRIBUTE_N * ATTRIBUTE_SIZE;

/* page align */
priv->a_size = PAGE_ALIGN(priv->a_size);
priv->a_area = vmalloc(priv->a_size);
if (!priv->a_area)
    return -ENOMEM;
file_operations.mmap
static int scan_mmap(struct file *file, struct vm_area_struct *vma)
{
    struct mmap_priv *priv = file->private_data;
    unsigned long start = vma->vm_start;
    size_t size = vma->vm_end - vma->vm_start;
    void *vaddr = priv->a_area;
    unsigned long pfn;

    if (size > priv->a_size)
        return -EINVAL;

    /* vmalloc'ed memory is virtually but not physically
     * contiguous, so remap it one page at a time */
    while (size > 0) {
        pfn = vmalloc_to_pfn(vaddr);
        if (remap_pfn_range(vma, start, pfn, PAGE_SIZE, PAGE_SHARED))
            return -EAGAIN;
        start += PAGE_SIZE;
        vaddr += PAGE_SIZE;
        size -= PAGE_SIZE;
    }

    vma->vm_flags |= VM_RESERVED; /* avoid swapping out this VMA */
    return 0;
}
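For completeness, a minimal sketch of hooking the handler into the driver's file_operations; the surrounding character device registration is assumed and the names are illustrative:

static const struct file_operations scan_fops = {
    .owner = THIS_MODULE,
    .mmap  = scan_mmap,
};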