16 March, 2017

A faster memory allocator - rpmalloc

Repository: https://github.com/rampantpixels/rpmalloc

We just released our new memory allocator, rpmalloc, to the public domain. It is designed to be faster than most popular memory allocators, such as tcmalloc, hoard and ptmalloc3. We also believe the implementation is easier to read and modify than these allocators, as it is a single source file of roughly 1300 lines of C code. And since it is in the public domain, you can use it without any restrictions.

Performance

Contained in the repository is a benchmark utility that performs interleaved allocations (aligned to 8 or 16 bytes, as well as unaligned) and deallocations (both in-thread and cross-thread) across multiple threads. It measures the number of memory operations performed per CPU second, as well as memory overhead, by comparing the virtual memory mapped against the number of bytes requested in allocation calls. The number of threads, the cross-thread deallocation rate and the allocation size limits are configured through command line arguments.

Below is an example performance comparison chart of rpmalloc and other popular allocator implementations.

This example chart shows benchmark results from running the benchmark suite on an 8-core Windows 10 machine (crt is the Microsoft standard C runtime malloc implementation).

Implementation details

The allocator is based on 64KiB alignment, where all runs of memory pages are mapped to 64KiB boundaries. On Windows this is automatically guaranteed by the VirtualAlloc allocation granularity, and on mmap systems it is achieved by atomically incrementing the address where pages are mapped. By aligning to 64KiB boundaries, the free operation can locate the header of the memory block by simply masking out the low 16 bits of the address, without having to do a table lookup (as tcmalloc does).

Memory blocks are divided into three categories: small blocks of [16, 2032] bytes, medium blocks of (2032, 32720] bytes, and large blocks of (32720, 2097120] bytes. The three categories are further divided into size classes.

Small blocks have a size class granularity of 16 bytes, giving 127 buckets. Medium blocks have a granularity of 512 bytes, giving 60 buckets, and large blocks a granularity of 64KiB, giving 32 buckets. All allocations are rounded up to these size class boundaries (an allocation of 34 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested.

Spans for small and medium blocks are cached in four levels to avoid calls to map/unmap memory pages:

1. A per-thread single active span for each size class.
2. A per-thread list of partially free spans for each size class.
3. A per-thread list of free spans for each number of pages in the span configuration.
4. A global list of free spans for each number of pages in the span configuration.

Each cache level can be configured to control memory usage versus performance.

Each span for a small or medium size class keeps track of how many blocks are allocated/free, as well as a list of which blocks are free for allocation. To avoid locks, each span is completely owned by the allocating thread, and all cross-thread deallocations are deferred to the owner thread.

Large blocks, or super spans, are cached in two levels. The first level is a per thread list of free super spans. The second level is a global list of free super spans. Each cache level can be configured to control memory usage versus performance.

Support us

If you find this useful, please consider our Patreon - https://www.patreon.com/rampantpixels

09 August, 2016

Code coverage

A powerful tool that will help you write better quality code is code coverage measurement. This post explains the basics of how I use it in my foundation library project (https://github.com/rampantpixels/foundation_lib), written in C, but the principles apply to other languages and toolchains as well. All code I publish is released into the public domain and free to use for any purpose.

Code coverage measures how many times each line of code is executed while running your test suite. By looking at the lines that have not been executed, you can find potential errors in your code or learn which test cases need to be improved in order to cover all code paths. I use the free service at https://codecov.io to visualize the results.

As an example, look at https://goo.gl/TtNJl1 to see the coverage results of a module in the library. Lines that have been executed at least once are marked in green, lines that have never been executed are marked in red, and lines that do not generate any instructions are unmarked.

Coverage is pretty good, but in the function profile_end_block there is a while loop, marked with the comment "Walk sibling list backwards", that is never executed.



Is this dead code that can be removed (as in, will this case never happen due to logic in other parts of the code), or are the test cases lacking the proper tests? If the latter, we have no idea if the code actually works, since it has never been executed. Finding these untested parts of the code is the goal of code coverage analysis.

If you have an existing code base, it might seem a daunting task to start using code coverage, as you will most likely find many parts of the code that are not covered. But once you find and fix the first bug in one of these dark areas of the code, it will be worth it! It's gamification at its best: you will work hard to reach that elusive 100% coverage, and your code quality will rise quickly.

To generate the code coverage measurements I use the built-in functionality of the clang compiler, passing the "--coverage" argument to both compiler and linker. When the code is compiled, the compiler generates a .gcno metadata file for each object file, and once the final test case binary is executed it generates a corresponding .gcda coverage data file.



These files can then be parsed together with the corresponding source file, using the llvm-cov tool, to find the execution count per line per file. The tool can output a gcov text file with the wanted data (similar to, and compatible with, what the gcc toolchain uses).

I wrote two Python scripts to do the job of invoking llvm-cov, parsing these gcov files and uploading the results to codecov.io: one that reads a gcov file (https://goo.gl/bH4AUm), and one that invokes llvm-cov and builds a report for codecov.io (https://goo.gl/bH4AUm).

As a closing remark, note that code coverage can also be used for profile guided optimization, by feeding the coverage data from your final executable back to the compiler. But that is a topic for another post.

31 July, 2014

Heroines


A good friend of mine is creating a touchpad game and chose to change the main character from a passive anonymous guy to an active girl. Awesome!

I love the design, and it's a great example of how you can create an obviously female character without resorting to stereotypical design choices such as broken armor.

Take the time to check out the game in its current state at gnbw.com

(Reblogged from my tumblr: maniccoder.tumblr.com)

07 June, 2014

Rampant Kittens

Just promoting myself a bit here. We finally managed to get our little block drop game Rampant Kittens published on all our major platforms. Try it out!


iPhone/iPad: iTunes App Store
Android: Google Play
MacOS X: Mac App Store
Windows: Our website

10 May, 2014

Rovio Game Jam

I participated in the 12-hour Rovio Game Jam yesterday together with a friend; the theme of the jam was "equality". So we created a game called Agenda, about distributing wealth and resources between cities. The end result was good enough to earn us second place in the competition, and we'll polish it up and release it on Android/iOS tablets at some point.

26 April, 2014

SublimeText & clang on Windows

After getting the Intel compiler to play nice with SublimeText and SCons on Windows, the next target was getting them to work with clang. The LLVM environment on Windows requires the MinGW base package for headers and libraries, and I decided to use the MinGW-w64 prebuilt mingw-builds distribution from http://mingw-w64.sourceforge.net/download.php#mingw-builds

However, the headers do not play nice with clang and -Wall, producing a bunch of typedef and macro redefinition warnings. I ended up adding

env.Append(CFLAGS=['-Wno-ignored-attributes', '-Wno-typedef-redefinition', '-Wno-undef',
                   '-Wno-unknown-pragmas', '-Wno-redundant-decls', '-Wno-shadow'])

to my SConscript environment setup in order to get it to build, only to be greeted with a bunch of alignment cast compilation warnings, and then multiple definition errors when linking:

lib\win64\debug/libfoundation.a(log.o): In function `GetCurrentFiber':
D:\dev\mingw\include/winnt.h:8139: multiple definition of `GetCurrentFiber'
lib\win64\debug/libfoundation.a(main.o):D:\dev\mingw\include/winnt.h:8139: first defined here
lib\win64\debug/libfoundation.a(log.o): In function `GetFiberData':
D:\dev\mingw\include/winnt.h:8140: multiple definition of `GetFiberData'

More to follow

22 April, 2014

Rampant Kittens

My block-drop game Rampant Kittens is now available on Google Play. Give it a shot! https://play.google.com/store/apps/details?id=com.rampantpixels.p1



I'm slowly cleaning up, refactoring and releasing the code for the game as well, available on GitHub. To begin with, the platform abstraction foundation: https://github.com/rampantpixels/foundation_lib