Manic Coder

Testing error handling in C

2017-04-23T13:26:00.001+02:00

Writing tests is an useful tool in any programmers toolbox. For C programmers it tends to be slightly more cumbersome compared to other more high level languages. Especially if you are like me and fall in the "handmade" category (see the Handmade Manifesto) and don't just want to use any and every library and framework you can get your hands on.

Even so, I tend to write test suites and measure code coverage (see my other article) to make sure I get as close as possible to execute all potential code paths in my projects. In my foundation codebase I recently hit the wall where I was basically testing everything I could except for one thing. Code handling standard library or system call failures.

In some cases you can provoke a failure in order to test your error handling by sending in invalid values. But this might not always be possible, like if your function makes the standard library call with parameters independent of the arguments of the function you are testing. No matter what your test passes to the function, the standard library call will succeed. It will only fail due to external factors which are impossible or at least very difficult for your test to control. One simple example, using getcwd on POSIX-compliant system to get the path of the current working directory for the process. In my codebase the call looks something like

char*
function_we_want_to_test(int arg) {
 //Some code goes here
 char buffer[MAX_PATH_LEN];
 if (!getcwd(buffer, sizeof(buffer)) {
  //Error handling code goes here
  return 0;
 }
 //Success code path...
}

In this snippet, buffer is a local variable which is independent from whatever arguments the enclosing function has. No matter what my tests sends to the function, the call will most likely succeed. In order to make it fail, the test has to either make sure the test process does not have read access to a part of the working directory path, or has been removed. This could be hard to reproduce, or at least take a lot of custom wrapping code to get right without affecting other tests or be guaranteed to work on other systems.

However, those two cases might very well occur in real use of the code, and I would like to make sure the error handling is working as expected. Ideally the test should be able to provoke the error in a way that can be used for any standard library or system call, and not rely on writing lots of code just to test a single point of failure.

To do this, I came up with a simple solution. I create a mock library which is linked last in the list of libraries when linking the final test executable. The mock library adds a mock/proxy implementation of the standard library function, and use available platform support calls (like dlsym or GetProcAddress) to locate the real entry point of the standard library implementation of the function and depending on a toggle either call the real implementation, or produce the wanted error.

The test code then only has to call a function in the mock library to toggle the wanted error, execute the test, and then toggle off the mock implementation causing the real implementation to be executed in the next call. This can easily be reproduced for other standard library calls in an identical fashion (but with separate mock functions). For getcwd the mock wrappers could look like this:

static bool getcwd_is_mocked;
static char* getcwd_ret;
static int getcwd_err;

void
getcwd_mock(char* ret, int err) {
    getcwd_is_mocked = true;
    getcwd_ret = ret;
    getcwd_err = err;
}

void
getcwd_unmock(void) {
    getcwd_is_mocked = false;
}

char*
getcwd(char* arg0, size_t arg1) {
    if (getcwd_is_mocked) {
        errno = getcwd_err;
        return getcwd_ret;
    }
    static void* real_getcwd = 0;
    if (!real_getcwd)
        real_getcwd = dlsym(RTLD_NEXT, "getcwd");
    return (*(char* (*)(char*, size_t))real_getcwd)(arg0, arg1);
}

The test code could then be:

getcwd_mock(0, EACCESS);
char* ret = function_we_want_to_test(arg);
getcwd_unmock();
// verify that ret is 0
// verify that errno is EACCESS

A couple of caveats:

The test is inherently aware of the implementation to test, and actually testing not the function as in given input results in the expected output, but rather that external conditions affect the expected implementation in the correct way. The function is no longer a black box, and the implementation cannot change without invalidating the test.

Depending on the implementation, it can be hard to verify that the expected error path was entered. Maybe the success path can return the same value? But at least code coverage should show that the error path was executed.

Link order will be important. By placing the mock library last it should resolve any references to the mocked entry points from the standard library.

A faster memory allocator - rpmalloc

2017-03-16T09:24:00.003+01:00

Repository: https://github.com/rampantpixels/rpmalloc

We just released our new memory allocator, rpmalloc, to the public domain. It is designed to be faster than most popular memory allocators like tcmalloc, hoard, ptmalloc3 and others. We also believe the implementation to be easier to read and modify compared to these allocators, as it is a single source file of ~1300 lines of C code. And it is in the public domain - use it without any restrictions.

Performance

Contained in the repository is a benchmark utility that performs interleaved allocations (both aligned to 8 or 16 bytes, and unaligned) and deallocations (both in-thread and cross-thread) in multiple threads. It measures number of memory operations performed per CPU second, as well as memory overhead by comparing the virtual memory mapped with the number of bytes requested in allocation calls. The setup of number of thread, cross-thread deallocation rate and allocation size limits is configured by command line arguments.

Below is an example performance comparison chart of rpmalloc and other popular allocator implementations.

This example chart shows benchmarks results when running the benchmark suite on a 8-core Windows 10 machine (crt is the Microsoft standard c runtime malloc implementation).

Implementation details

The allocator is based on 64k alignment, where all runs of memory pages are mapped to 64KiB boundaries. On Windows this is automatically guaranteed by the VirtualAlloc granularity, and on mmap systems it is achieved by atomically incrementing the address where pages are mapped to. By aligning to 64KiB boundaries the free operation can locate the header of the memory block without having to do a table lookup (as tcmalloc does) by simply masking out the low 16 bits of the address.

Memory blocks are divided into three categories. Small blocks are [16, 2032] bytes, medium blocks (2032, 32720] bytes, and large blocks (32720, 2097120] bytes. The three categories are further divided in size classes.

Small blocks have a size class granularity of 16 bytes each in 127 buckets. Medium blocks have a granularity of 512 bytes, 60 buckets. Large blocks have a 64KiB granularity, 32 buckets. All allocations are fitted to these size class boundaries (an allocation of 34 bytes will allocate a block of 48 bytes). Each small and medium size class has an associated span (meaning a contiguous set of memory pages) configuration describing how many pages the size class will allocate each time the cache is empty and a new allocation is requested.

Spans for small and medium blocks are cached in four levels to avoid calls to map/unmap memory pages. The first level is a per thread single active span for each size class. The second level is a per thread list of partially free spans for each size class. The third level is a per thread list of free spans for each number of pages in the span configuration. The fourth level is a global list of free spans for each number of pages in the span configuration. Each cache level can be configured to control memory usage versus performance.

Each span for a small and medium size class keeps track of how many blocks are allocated/free, as well as a list of which blocks that are free for allocation. To avoid locks, each span is completely owned by the allocating thread, and all cross-thread deallocations will be deferred to the owner thread.

Large blocks, or super spans, are cached in two levels. The first level is a per thread list of free super spans. The second level is a global list of free super spans. Each cache level can be configured to control memory usage versus performance.

Support us

If you find this useful, please consider our Patreon - https://www.patreon.com/rampantpixels

Code coverage

2016-08-09T14:47:00.000+02:00

A powerful tool that will help you write better quality code is measuring code coverage. This post will explain the basics how I use this in my foundation library project (https://github.com/rampantpixels/foundation_lib) written in C, but the principles apply to other languages and toolchains as well. All code I publish is released under public domain and free to use for any purpose.

Code coverage measures how many times each line of code is executed while running your test suite. By looking at the lines that have not been executed you can find potential errors in your code or learn which test cases that needs to be improved in order to test all code paths. I use the free service at https://codecov.io to visualize the results.

For an example, look at https://goo.gl/TtNJl1 to see the coverage results of a module in the library. The lines that have been executed at least once are marked in green, the lines that have never been executed are marked in red, and the lines that are not generating any instructions are unmarked.

Coverage is pretty good, but in the function profile_end_block there is a while loop marked with the comment "Walk sibling list backwards" that is never executed.

Is this dead code that can be removed (as in, will this case never happen due to logic in other parts of the code) or are the test cases lacking the proper tests? If the latter, we have no idea if the code actually works or not, since it has not been executed. Finding these untested parts of the code is that goal of code coverage analysis.

If you have an already existing code base it might seem a daunting task to start using code coverage as you will most likely find many parts of the code which is not covered. But once you find an fix the first bug in one of these dark areas of the code it will be worth it! It's gamification at its best, you will work hard to reach that elusive 100% coverage, and your code quality will rise quickly.

In order to generate the code coverage measurements I use the built-in functionality of the clang compiler by passing the "--coverage" argument to compiler and linker. When the code is compiled a .gcno metadata file is generated by the compiler for each object file, and once the final test case binary is executed it will generate a corresponding .gcda coverage data file.

These files can then be parsed together with the corresponding source file to find the execution count per line per file using the llvm-cov tool. This tool can output a gcov text file with the wanted data (similar/compatible to what the gcc toolchain also uses).

I wrote two python scripts to do the job invoking llvm-cov, parsing these gcov files and upload the results to codecov.io. Read a gcov file at https://goo.gl/bH4AUm, invoke llvm-cov and build a report for codecov.io at https://goo.gl/bH4AUm.

As a closing remark also note that code coverage can be used with profile guided optimization by feeding the coverage data from your final executable back to the compiler. But this is a topic for another post.

Heroines

2014-07-31T16:46:00.004+02:00

A good friend of mine is creating a touchpad game and chose to change the main character from a passive anonymous guy to an active girl. Awesome!

I love the design and its a great example on how you can create an obviously female character without resorting to using stereotypic design choices such as broken armor.

Take the time to check out the game in it’s current state at gnbw.com

(Reblogged from my tumblr: maniccoder.tumblr.com)

Rampant Kittens

2014-06-07T21:03:00.005+02:00

Just promoting myself a bit here. We finally managed to get our little block drop game Rampant Kittens published on all our major platforms. Try it out!

iPhone/iPad: iTunes App Store
Android: Google Play
MacOS X: Mac App Store
Windows: Our website

Rovio Game Jam

2014-05-10T20:30:00.001+02:00

I participated in the 12 hour Rovio Game Jam yesterday together with a friend, and the theme of the jam was "eqality". So we created a game called Agenda about distributing wealth and resources between cities. The end result was good enough to earn us a second place in the competition, and we'll polish it up and release on Android/iOS tablets at some point.

SublimeText & clang on Windows

2014-04-26T22:05:00.000+02:00

After getting Intel compiler to play nice with SublimeText and SCons on Windows, the next target was getting it to use clang. The LLVM environment on Windows requires the MinGW base package for headers and libs, and I decided to use the MinGW-x64 prebuilt mingw-builds distribution from http://mingw-w64.sourceforge.net/download.php#mingw-builds

However, the headers do not play nice with clang and -Wall, with a bunch of typedef and macro redefinitions. I ended up adding

env.Append( CFLAGS=['-Wno-ignored-attributes', '-Wno-typedef-redefinition', '-Wno-undef',
'-Wno-unknown-pragmas', '-Wno-redundant-decls', '-Wno-shadow'] )

to my SConscript environment setup in order to get it to build, only to be greeted with a bunch of alignmen cast compilation warnings and then multiple definition errors when linking

lib\win64\debug/libfoundation.a(log.o): In function `GetCurrentFiber':
D:\dev\mingw\include/winnt.h:8139: multiple definition of `GetCurrentFiber'
lib\win64\debug/libfoundation.a(main.o):D:\dev\mingw\include/winnt.h:8139: first defined here
lib\win64\debug/libfoundation.a(log.o): In function `GetFiberData':
D:\dev\mingw\include/winnt.h:8140: multiple definition of `GetFiberData'

More to follow

Rampant Kittens

2014-04-22T22:06:00.001+02:00

My block-drop game Rampant Kittens is now available on Google Play. Give it a shot! https://play.google.com/store/apps/details?id=com.rampantpixels.p1

I'm slowly cleaning up, refactoring and releasing the code for the game as well, available on github. To begin with the platform abstraction foundation: https://github.com/rampantpixels/foundation_lib

SCons & SublimeText - Intel compiler

2014-04-13T20:02:00.000+02:00

In order to get SCons to play nice with the Intel compiler suite version 14 (or later) on Windows it seems the intelc.py tool definition in SCons needs to be updated a bit. The layout of keys in the registry have changed and the directory layout as well.

The registry keys are now chained, so first it needs to look in

HKLM/Software/Wow6432Node/Intel/Suites/<version>/Defaults/C++/<abi>

to find the UUID of the instance to use, then look in

HKLM/Software/Wow6432Node/Intel/Suites/<version>/<uuid>/C++/<abi>

for the various values to use for locating compiler binaries. I modified the abi strings used in the SCons intelc.py tools match the EM64T_NATIVE and IA32 strings in the 14.0 compiler release, and rewrote the get_all_compiler_versions and get_intel_registry_value functions to use this chained default -> uuid registry lookup. You can get the final modified intelc.py tool from http://www.rampantpixels.com/pub/intelc.py.

This together with the default and mslib tools (basically passing tools=['default','intelc','mslib'] to the Environment construction) allows my SCons construction environment to compile my foundation library with the Intel compiler suite within SublimeText 3. Great success!

SublimeText & SCons - part one

2014-04-11T12:58:00.003+02:00

First step in my goal to make a decent development environment on Windows using SublimeText 3, SCons and clang (or failing that, Intel compiler suite) is to make SublimeText able to use SCons as a build system.

As a sandbox I will use my public domain foundation library available at https://github.com/rampantpixels/foundation_lib (you can find the SCons build scripts in there for reference).

To begin with, I create a SublimeText project and add the main source directory as a folder in the project. I save the project in the build subdirectory as I hate cluttering the main directory with project files. I also add a project local build system defintion for scons:

{
 "folders":
 [
  {
   "follow_symlinks": true,
   "path": "..\\foundation"
  }
 ],
 "build_systems":
 [
  {
   "name": "SCons",
   "shell_cmd": "scons",
   "working_dir": "${project_path:${folder:${file_path}}}\\..",
  }
 ]
}

Trying it out by selecting SCons as the build system and starting a build results in

scons: Reading SConscript files ...
Building on win32 (x86_64) for  ()
Building RELEASE configuration
scons: done reading SConscript files.
scons: Building targets ...
cl /Fobuild\scons\release-\tools\bin2hex\main.obj /c tools\bin2hex\main.c /nologo /DBUILD_RELEASE=1 /I. /Itools
scons: *** [build\scons\release-\tools\bin2hex\main.obj] error : (2, 'CreateProcess', 'The system cannot find the file specified.')

So now I need to modify my SCons build scripts to use clang and/or Intel toolchain. I decided to go with Intel since I already had that installed on my system, and adding intelc as tools to the SCons environment creation should do the trick. However, the Intel tool setup and version detection is really outdated in SCons, so next step will be to modify this to support version 14.

Abandon Visual Studio

2014-04-10T20:01:00.003+02:00

I've got a new pet project - abandon Visual Studio for Windows development and instead setup a decent environment using SublimeText, SCons and clang and/or Intel compiler suite. I will post any progress and useful scripts/configs/tools here.

Agile - do you speak it? A late night rant.

2011-07-05T23:42:00.001+02:00

Agile development methodology have made quite an impact in game development companies, but my experience have been that the adoption is severely flawed. Management usually sees SCRUM, kanban and other agile methods as a silver bullet that will solve all their planning problems and save them investing the time and effort understanding what it is all about.

In my mind, agile development has a lot in common with theoretical physics. Consider the Heisenberg uncertainty principle. It states that there is a relationship between the accuracy with which you can measure the velocity and the position of a particle. If you want to know exactly where a particle is, you cannot know how fast it is moving (and vice versa). Agile development realizes the same principle applies to project planning and management. You can not know both the scope and the time of a project! However, the corresponding Planck constant remains to be determined :)

In agile practice, this usually means that you set a fixed deadline (measure the time property), and let the scope vary. By ordering all tasks by priority, and making sure all tasks are done before moving on to the next task (with some definition of "done"), you make sure that by the time you reach the deadline, you have as many features complete in your software as you possibly can within the given time frame.

And this is where most game companies fail. They think that they can still wave their magic wand, set a fixed deadline and a fixed scope, and their "agile" method will somehow fix everything and guarantee the production meets the deadlines. And in the end it results in a massive crunch period since the guesstimates at the project start were way off.

It's time to wake up and smell the coffee. You cannot cherry pick all the good bits and ignore the (potential) drawbacks. If you plan on using an agile development method, please make sure you invest the time to fully understand the implications of that choice. Blindly trusting the agile fairy to fix your production and resorting to crunching for months and months because you failed to understand the basic principles is not going to solve anything.

Concurrency and lock-free basics

2011-06-08T12:55:00.000+02:00

Concurrency is the property where multiple computations are executed simultaneously. In the domain of modern game programming, that usually means having multiple threads running on multiple cores, potentially located on multiple physically separated processors. One way to ensure progression in the executing threads is to avoid critical sections and locks on shared resources, and design your code to use lock-free algorithms.

The definition of the term "lock-free" has changed over time and is now used to denote the property of an algorithm where the suspension of one thread will not block the potential progress of other threads. It is important to note though that lock-free does NOT imply wait-free. A thread can still stall in a lock-free algorithm, but only if at least one other thread has made some progress. In comparison, using a lock on a shared resource will stall all other threads trying to access that resource if the thread holding the lock is suspended during that critical section.

Lock-free algorithms are usually implemented using atomic read-modify-write primitives, such as CAS (compare-and-swap). An atomic instruction is instantaneous, or at least appears that way to the rest of the system, and is usually defined with a succeed-or-fail interface where a failure to perform the operation has no visible side-effects on the system. In the case of a CAS, a failure occurs when another thread has changed the value, i.e made progress. A quick pseudo-C-code example of updating a shared resource using an atomic CAS:

var my_shared_resource;

var new_value;
do {
  old_value = my_shared_resource;
  new_value = calculate_something( old_value );
} while( !atomic_cas( &my_shared_resource, old_value, new_value ) );

Basically the thread busy-loops while waiting for the atomic operation to succeed. The CAS operation only succeeds and stores the new_value if the current value of the resource equals the old_value passed in. It's not wait-free, but if the thread stalls (loops), it means that some other thread has succeeded in the operation it's performing on the shared resource.

A potential pitfall of the CAS operation is known as the ABA problem. If some other thread performs some operation that stores the value B and then restores the value A, the thread performing the CAS will think the value is not modified and the CAS will succeed. If the value is some bit pattern that has some other instrinsic meaning (like a memory address, or a wrapping counter), you could be in serious problem. The solution in these cases is usually to use a double-length variable, for example using a 64-bit value for a 32-bit counter, and using the remaining bits as an "operation counter" which is increased every time an operation is performed. This will solve the ABA problem since the restoration of value A will have a different bit pattern due to the increasing counter in the top 32 bits.

So, now that we have a basic atomic operation, what can we do with it? As an example, let's compare the implementation of a LIFO/stack. A quick lock-based algorithm would look like:

var top
var mutex

function push( item ) {
  lock( mutex )
  item->next = top
  top = item
  unlock( mutex )
}

function pop() {
  lock( mutex )
  var item = top
  top = item->next
  unlock( mutex )
  return item
}

In comparison, a lock-free implementation would look like

var top

function push( item ) {
  var old_top
  do {
    old_top = top
    item->next = old_top
  } while( !atomic_cas( &top, old_top, item ) )
}

function pop() {
  var item, next_top
  do {
    item = top
    next_top = item->next
  } while( !atomic_cas( &top, item, next_top ) )
  return item
}

Designing an API (part 2)

2011-06-05T22:53:00.001+02:00

This is the second part of my API design post, and part three in my series of posts about building a profiling library. The initial post covered the base code for measuring elapsed time, and the second the design of the API. I'll go through some of the details about the implementation, and if you want the full code it's available at github (released to the public domain). The code for the final implementation will be pushed to this repository tomorrow. This post is very code-oriented, but I promise to have more pretty pictures next time!

To recap, these are the API design guidelines I think are important to keep in mind during the implementation:

Orthogonal. A function should not have any side effects, and there should be only one way to perform an operation in the system.
Contained. Following from the rant in my previous post, we should avoid third party dependencies and prefer to use primitive or well-defined data types.
Specialized. A function in an API should perform a single task. Don't create functions that do completely different unrelated tasks depending on the contents of the variables passed to the function.

The first thing that I think about is memory management. Most games tend to have their own memory management and debugging systems, which in my book immediately makes raw new/delete/malloc/free calls invalid in a library. If you need to make allocations, you should provide means in the API to set callbacks (or similar constructs) for memory management to let the library use the same implementation as the game using the library.

In this small library however, I decided to take the easy route and simply require a pre-allocated memory buffer and limit all in-library memory requirements to that buffer. The drawback is of course that the library can only handle a fixed amount of data, but since it's also continuously writing the data to an output stream the required buffer size is proportional to the rate at which the data is being written. I find that an 8k buffer is plenty of space for this purpose. The internal data structures are laid out as follows:

typedef struct _profile_block_data   profile_block_data;
typedef struct _profile_block        profile_block;

struct _profile_block_data
{
 uint32_t           id;
 uint32_t           parentid;
 uint32_t           processor;
 uint32_t           thread;
 uint64_t           start;
 uint64_t           end;
 char               name[20];
}; //sizeof( profile_block_data ) == 52

struct _profile_block
{
 profile_block_data data;
 uint32_t           previous;
 uint32_t           sibling;
 uint32_t           child;
}; //sizeof( profile_block ) == 64

The blocks can be arranged in a tree structure using the child, sibling and previous pointers (stored as an index which could be 16 bits, but stored as 32 bits to allow atomic swap instructions). The array of blocks is initialized in the library entry point:

void profile_initialize( char* identifier, void* buffer, profilesize_t size )
{
 profile_block* block = buffer;
 profilesize_t num_blocks = size / sizeof( profile_block );
 profilesize_t i;
 
 for( i = 0; i < ( num_blocks - 1 ); ++i, ++block )
  block->child = ( i + 1 );
 block->child = 0;

 //[...]
}

The library keeps a single block as the root of all currently queued blocks, and the operation of adding a block with no parent becomes a simple task of swapping the child pointer (index) of this root block. To make it thread safe we need to use an atomic compare-and-swap instruction:

static void _profile_put_root_block( profile_block* block )
{
 uint32_t self = (uint32_t)((uintptr_t)( block - _profile_root ));
 do
 {
  block->sibling = _profile_root.child;
 } while( !_atomic_cas_32( &_profile_root.child, self, block->sibling ) );
}

Each block can also be a subblock inside another block. To push such a block we implement a thread-local variable storing the current active block, and can perform a swap of pointers without worrying about thread safety:

static void _profile_put_simple_block( profile_block* block )
{
 //Add to current block, or if no current add to array
 profile_block* masterblock = _thread_data();
 if( masterblock )
 {
  uint32_t self = (uint32_t)((uintptr_t)( block - _profile_root ));
  block->previous = masterblock;
  block->sibling = masterblock->child;
  if( masterblock->child )
   masterblock->child->previous = self;
  masterblock->child = self;
 }
 else
 {
  _profile_put_root_block( block );
 }
}

Once we want to flush the queued blocks we simply iterate the list of queued blocks in the root block, and process each tree of blocks. Again, we can get away without a lock by doing a atomic compare-and-swap on the single child pointer of the root block:

do
{
 block = _profile_root.child;
} while( !_atomic_cas_ptr( &_profile_root.child, 0, block ) );

A recursive function can then traverse the block tree and rearrange the child/sibling pointers in order to create a single list of blocks arranged through the child pointer. Since only closed blocks are added to the root list, the traversal can be done without worrying about other threads modifying the data:

static profile_block* _process_profile_block( profile_block* block )
{
 profile_block* leaf = block;

 //[...] process block and write to output stream 

 if( block->child )
  leaf = _process_profile_block( _profile_blocks[ block->child ] );
 if( block->sibling && !block->child )
 {
  block->child = block->sibling;
  block->sibling = 0;
  leaf = _process_profile_block( _profile_blocks + block->child );
 }
 if( block->sibling )
 {
  profile_block* subleaf = _process_profile_block( _profile_blocks + block->sibling );
  subleaf->child = block->child;
  block->child = block->sibling;
 }
 return leaf;
}

In the end, we get a stream of profiling block data detailing the time spent in selected parts of the code. The implementation goals are maintained. The code is lock-free and has very low overhead, and no third-party dependencies through the use of a single pre-allocated buffer for all storage. Each function is specialized and has no side effects through the use of thread-local storage and atomic swap instructions.

In the next post I'll present a simple UI for analyzing the data.

The hacking business model

2011-05-21T10:02:00.001+02:00

How do you run a company? With so many experienced developers moving from the well-established AAA production companies to an indie life, I think it is important to stop and think about what the reasons are for not enjoying working at said companies in the first place. If you are successful with your indie startup, chances are you will end up making the same mistakes in your own organisation.

I strongly believe in empowering the employees, and I think that is the key to any long-term successful company, both small and big. One of the founders of the MySQL database, Michael Widenius, co-wrote a manifesto of sorts with rules for running a company based on egalitarian and sustainable principles. I think it's a great starting place for anyone looking to start their own company. It was originally posted on Zak Greant's blog, and later used for founding Monty Program AB.

Of course, some details might not be ideal for all companies, but I think the principles that the company exists to create a meaningful workplace and to generate employment and bonuses for the employees (and not to get sold) is definitely something more startups should be built upon. I've taken the liberty of removing some minor details, for the full manifesto check the link above.

Purpose

Create a sustainable business model that can be adopted and adapted by others.
Create a fair and democratic company that is owned by the workers.
Have long-term, trustworthy and meaningful relationships with our staff and customers.

Principles

Egalitarian: The belief that all people should be treated equally. This includes equality, non-discrimination and inclusivity.
Sustainable: We have a long-term view on our business. We watch our profits & spend wisely, we take care of each other, we support the things we depend on.
Transparent: We communicate in an honest and genuine way. Any information or process that can be made open, will be made open.
Fun: Create a workplace where people can have fun and want to work.
Agile: Be flexible, receptive & adaptive, especially when dealing with staff and customers.

Methods

Concrete tools for helping us live according to our principles, including:

Consensus-based decision making.
Corporate transparency - any information or process that can be made open, should be made open.
Licensing that helps benefit our company, our staff, our customers, our partners and society at large.
Profit-sharing with staff, contributors and worthy causes.
Don't try to change people. Focus on getting the best from their strengths. Develop ways to work around their weaknesses.
Prefer to work with people who share our values.
Work against patents and other legislation that harms individual rights.

Default Employee Rules

Some employee roles may have different requirements — for example, someone working on customer support may need to have regular hours. Of course, any differences need to be noted explicitly in the employee's contract in a section that clearly details any differences from the standard agreement.

75 working hours per two weeks. Ideally, employees should work schedules that are kind to them and to others.
Equal free (vacation) time; Everyone has 35 free days a year + Saturday & Sundays. (This is basicly Finnish 5 week vacation + 10 (avg) weekday holidays). Note that this means that one should use a free day if one wants not work any Monday-Friday even if it's a public holiday in your country! Up to 25 free days will roll over to next year (ie, the vacation part not the public holiday part); If the employee quits or is let go all saved vacation money will be paid out. One earns 25/47 vacation days / week of active work (This is used for the first and last year of employment).
Vacation money ("Lomalta paluu raha"). When you go on vacation that is more than 3 weeks (or have used more than 3 weeks of your vacation for the year) you get an extra 1/2 month of salary. If you don't keep your vacation the vacation money is payed at the end of the year under 'get a life' bonus plan.
The employee will get a shared copyright to all code and documentation he/she produces according to the spirit of the Sun's SCA license agreement.This doesn't include confidential code/documentation or code/documentation that we do for customers that require full ownership to the produced code/documentation.
80 / 20 rule; 80% of the time the employee should work on scheduled tasks. 20% of the time they can work on any tasks of their choice, as long as they will generate revenue, make employees more efficient, or enhance company recognition in a 2 year window. The 20% tasks need to be approved to ensure they follow the above guidelines.
The employee needs to be transparent with everything they are doing. Transparency makes the employee responsible to their peers, and makes them accountable by their own statements. It's also the best vehicle to create trust.
The employee needs to do a weekly report that includes everything they have done during the week, their plans for the upcoming week, and also any ideas they may have (half-baked or ready-to-use), that they would like to discuss with others. The weekly reports should show clear progress in the employee's major tasks.
The employee must speak up if the company is doing something they think is harmful for itself, its employees or customers. It's crucial for the success of the company that the employees are 'on board' with the vision of the company. Politics are strictly forbidden!
The salary should be competitive in the area where the employee is located. The bonus plan is not dependent on where employee is living.
If an employee has been of significant help in getting / delivering a customer order, they are entitled to a bonus for this work. Everyone involved in a sale will share 5% in the 'price - cost of sales' portion of the sale (in proportion to their involvement) and everyone involved in the delivery will share another 5%. However, any such bonuses will be deducted from their part of the year-end bonus (provided the company was profitable).
Each employee will be assigned a VIP number of 1-10 (10 being the highest) to describe their importance to the company. This number is used to calculate the end of year bonuses and 'sell-shares'. The VIP number is agreed to when an employee joins the company and will be reviewed yearly by their manager. The employee will get directions on what they can do to increase their importance to the company (and thus their VIP number).
The employee is assumed to be cost efficient. This means they should prefer to use:
- Cost efficient communication tools like Skype and VoIP.
- Economy traveling tickets.
- Medium priced hotels, rental cars, and restaurants.
- When traveling they should strive to stay over at their fellow employees' places and/or share rooms with their fellow employees. (This item can be overridden with a 'good cause' by their manager)
- If the employee wants better hotel, food, traveling arrangements, working equipment, etc this can be arranged, but the difference should be reduced from their salary, contract money or bonus.
When hired, the employee will be considered as 'on-trial' for the first 3 months. After the trial period, the Company and employee will decide if things are working out and either hire the employee or contract him under similar terms as if they were employed full-time.
People that are not working up to expectations will receive a warning. If the situation is not corrected within one month, they will be moved to 'on-trial' status. After 3 months, the Company and the employee will revisit the employee's performance and will decide whether further employment is in their best interests.

The Rules of the Company

The Company is primarily created to generate bonuses for the employees (not to get sold).
The Company should make it as fun as possible to work for the Company.
The Company will strive to be small and efficient. If growing too big, it will split into separate business units or companies.
Strive for long relationships with employees and customers. (The Company is a Family)
The Company needs to respect the individuality of it's employees. If the employee has reasonable 'extra' demands they need to be seriously considered. (For example when it comes to work on weekends, room sharing, not wanting to travel, etc).
The Company will employ people based on their merits. They will not be discriminated based on their gender, race, religion, location, marital status or whom they have married.
The Company will not require people to work on weekends. The Company has the right to ask the employee to work on weekends, but the employee has the right to refuse without any consequences.
The Company will actively encourage its employees to take out their vacation and not save it for later. This is especially important for employees that are "burning out".
For time-critical, highly-paid, highly-profitable projects that require double working hours, the Company will pay three times the salary and/or offer paid vacation days.
The Company will, whenever it's possible, largely let the employee choose their own work, instead of being told what to do. By letting the employee set their own goals, he is more likely to meet them. When working on a chosen project the employee needs to work with the team lead, but after the project is done, the employee should decide on what to do next.
The Company needs to be long term cost efficient in its daily operation. This should be considered when choosing software, phone usage, equipment, etc...
The Company will not censor employees' opinions or try to hinder employees from expressing them. The Company should provide appropriate forums for the employees to discuss anything work related.
The Company will actively help and sponsor open source projects (see bonus plan).
The Company will strive for having as much of its plans and information publicly available. All rules of the Company will be made public on its web site.
The Company will respect the privacy of the employees. It will not read, without explicit permission from either the author or the receivers, any employee emails that are not sent to an email list and it will not read any logs from any private (query) IRC sessions.

Bonus

At end of year the profit will be distributed as follows:

45% will be saved for expansion
5% will be donated to open source projects. The project(s) will be chosen by the employees of the company by voting.
5% will be used to support charities. The projects will be chosen by the employees of the company by voting.
20% will be used to pay off existing debts.
20-45% (depending on debts paid) will be given out (as bonus, dividend or some other form) to employees and investors based on their VIP number and the number of working hours.

The bonus for each employee is calculated as follows:

employee_profit_hours = employee_working_hours*VIP
bonus= profit/(SUM(all employee_profit_hours))*employee_profit_hours

An investor which has invested 100,000 Euros is treated as an employee of VIP level 5 that has 37.5 hours a week for 47 weeks. And the investor should expect to get at least 8 % as 'bonus' from the bonus pool. If he gets less, the missing money will be added to his loan.

At end of year the Company will distribute 100,000 'semi-shares' to its employees and active investors, in proportion to ones 'employee_profit_hour'. This share does not have any other rights than if the company would get sold then the money paid for the company (minus all costs, investments, loans etc) will be distributed among all semi-share holders.

At end of the year the Company should strive to pay off (part of) its investors if it doesn't expect to need the money within a 2-year time frame.

Decision-making processes

The Company is led by a CEO. His actions are governed by:

An advisory board that is a team of external chosen experts.
A governance team, consisting of members from the employees and the advisory board

The purpose of the advisory board is to give good advice to the CEO and the employees of the company.
The purpose of the governance team is to give directives (that must be followed) to the CEO and suggest changes to the rules and decisions processes. The governance team has the right to fire and reinstate people (including the CEO). The Company will be led in an open and democratic fashion:

All (not customer classified information) information will be public inside of the company. This includes salaries, bonuses, shares, birthdays, etc...
Decisions will be done in a democratic fashion and all employees should have a chance to have their say in things that matter to them.
In cases of disagreement things should be resolved by voting with the 3 vote rule; 'Yes, No and Never'. A decision should normally not be taken if there is a single 'Never' vote. In exceptional 'life and death' cases, after thorough discussions have been had and after all other options are exhausted, a decision can be taken even if there are 'a few' 'Never' votes, if 50 % of the company will vote yes to the proposal.
Company should work according to the motto: "Do good decisions fast but be prepared to quickly change course if there is a way to do it." This implies that the following should hold for 'controversial decisions':
- If requested, the decision makers need to clearly define the basis for a decision and provide means for proving/disproving that the decision is in the Company's best interests. If a decision is proved wrong, it needs to be reverted and the decision makers need to analyze why it went wrong and take steps to ensure that it won't happen again.
The Company should learn from its mistakes and its successes. It should strive to repeat its successes and avoid its mistakes in the future.

Difficult decisions / Changing the rules

The rules of the Company can only be changed if at least 75% of the Company employees are not opposed to the changes. The owners of 51% of the voting shares in the Company (initially only the founders) have the definite power to say 'NO' to any made decision that doesn't have 75% of the employees behind it. They can also propose new decisions that will be be taken if 51% of the employees stand behind it. This rule can only be changed with 85% of the voting shares.

Designing an API

2011-04-06T14:44:00.002+02:00

How do you design an API? My experience is that just like with most other issues concerning code formatting and standards, every programmer has their own set of preferences. Which of course means this post is entirely based on my personal views, and I will obviously assume you agree with the choices I make.

This is part two in my series of posts about building a profiling library. The previous post covered the base code for measuring elapsed time. Just like the code for that post, the code for this and the next parts will be available through github, released to the public domain:

https://github.com/rampantpixels/profile_lib

Let's get down to business! An API should be:

Consistent. All functions, parameters and types should follow a well defined pattern in order to make it easy to remember how function names are constructed and how to pass the expected parameters.
Orthogonal. A function should not have any side effects, and there should be only one way to perform an operation in the system.
Compact. The API needs to be compact, meaning the user can use it without using a manual. Note though that "compact" does not mean "small". If you follow the first rule, a consistent naming scheme makes the API easier to use and remember.
Contained. Following from the rant in my previous post, we should also avoid third party dependencies and prefer to use primitive or well-defined data types.
Specialized. A function in an API should perform a single task. Don't create functions that do completely different unrelated tasks depending on the contents of the variables passed in.

And now we apply these principles to the problem of designing a profiling API. The minimal set of functionality we need is to be able to define an execution block for which we want to measure the elapsed time. We need to be able to identify the block, as well as have blocks within blocks. For blocks that execute over a prolonged period of time we could also use a method to check if the thread has migrated to another core. By requiring that a call to begin a block is matched by a call to end, we can hide all the hierarchy details in the implementation.

 void         profile_begin_block( const char* identifier );  
 void         profile_update_block();
 void         profile_end_block();

Everything else we add might violate one or more of the rules, so we need to tread carefully. A reasonable feature of this system would be to be able to group blocks together in a frame (where the meaning of a "frame" would be open for interpretation by the user). Since the end of a frame automatically marks the beginning of the next frame, we only need one function (compact). The function will have no side effects and only fulfill the primary purpose of marking a group of blocks as belonging to a frame (orthogonal). We name the function according to the previous block functions and use a standard integer type to identify the frame number (consistent, contained).

 void         profile_end_frame( uint64_t counter );

Going forward, we also want a function to enable/disable profiling as well as functionality to insert generic log messages into the profiling data stream. It could also be useful to add notifications of mutex locks/unlocks as they control the flow and execution of threads. However, we're now close to breaking the orthogonality rule as we could potentially achieve these goals by using the generic log message function with messages of certain pre-defined formats. But this is where the "specialized" rule kicks in. Having a generic log message function performing both the task of inserting a log message into the profile data as well as inserting notifications about mutex locks would make the API harder to remember and more bug-prone.

 void         profile_enable( int enable );
 void         profile_log( const char* message );
 void         profile_trylock( const char* name );
 void         profile_lock( const char* name );
 void         profile_unlock( const char* name );

Finally we need functions to initialize and shutdown the profiling backend, and a function to declare how to output the gathered date. In order to minimize the dependencies of the library and allow the user to output in application-specific ways, we'll use a callback.

 typedef void (*profile_write_fn)( void*, uint64_t );
 void         profile_initialize( char* identifier );
 void         profile_shutdown();
 void         profile_output( profile_write_fn writer );

As you might have guessed I've placed a few restrictions/demands on the user of the API. In order to stick with primitive data types and minimize the amount of memory management going on in the implementation of the library, the strings passed to the functions must be valid until the data has been flushed to the output stream.

That concludes this small exercise in API design. In the next part I'll go through the implementation of the profiling library, dealing with the thread synchronization issues and the steps taken to minimize the overhead of the implementation. To be effective, the profiling code itself can only spend an order of magnitude less time than the code being profiled.

Time to know your code

2011-03-20T22:48:00.001+01:00

Ever wanted to know which parts of your codebase that keeps you from reaching the nirvana of 60 fps smoothness? Tired of not being able to pinpoint and replicate the use-case which causes your game to display strange frame time patterns on your artists' machines? Look no further.

This is the first post in a series of four where I will go through the steps of building a simple but useful profiling library for multithread/multicore games. It's not a profiler on instruction level like VTune and similar tools, but rather a library for measuring and capturing profiling data on a block/function level. It will also provide the functionality to record and analyze performance remotely through a network or by capturing the data stream to a file. The reasoning for reinventing yet another wheel is that I wanted something unobtrusive that was easy to use and had no dependencies.

In general, I want to ask fellow programmers out there who make their code available to the public to please make the effort to cut down on the dependencies. Using other libraries is all well and good, but it makes using your code so much more painful, especially when these libraries in turn have other dependencies. I've seen countless examples of useful bits and pieces of code on the net with horrible dependency chains (Boost being my pet peeve) which makes them more or less useless in my book. After all, if you released the code you probably want people to use it, so why make it more painful than it has to be?

Ok, rant over. My goals for this exercise:

Simple and easy to use API
Written in C90 for portability and ease of use in other languages
No external dependencies except the standard C runtime and OS provided libraries

This first part of the series will focus on solving the core issues of any profiling code, which is the problem of measuring elapsed time. All code for this post (and eventually the entire profiling library) will be released to the public domain through a github repository available at https://github.com/rampantpixels/timer_lib

Since we're working on block/function level, I decided that the high frequency counters available through QueryPerformanceCounter on Windows and clock_gettime on Linux would do. However, there's a slight problem with these functions on some faulty hardware/software that causes the value returned to jump several seconds when the thread migrates between CPUs/cores. To solve this, we can either force the thread to a certain core before calling the function (with SetThreadAffinityMask on Windows), or we can detect this case by comparing the elapsed time with a low resolution reference value which is guaranteed to give consistent results even during core migration. More about this later.

The code is written for Windows and Linux at the moment, but I will update it with MacOS X implementation as soon as I can (basically when I can get my MacBook back, it's currently on holiday at another company). If someone would like to provide implementations for other platforms, just drop me a line and I'll give you write access to the repository.

Time to pin down the API!

 void         timer_lib_initialize();  
 void         timer_lib_shutdown();  
 void         timer_initialize( timer* t );  
 void         timer_reset( timer* t );  
 deltatime_t  timer_elapsed( timer* t, int reset );  
 tick_t       timer_elapsed_ticks( timer* t, int reset );  
 tick_t       timer_ticks_per_second( timer* t );  
 tick_t       timer_current();  
 tick_t       timer_current_ticks_per_second();  
 tick_t       timer_system();

The main entry points for the library are the initialize and shutdown functions, responsible for setting up any OS specific resources for the timing. Measuring elapsed time is mainly done through a simple timer struct

 typedef struct  
 {  
     tick_t       clock;  
     tick_t       ref;  
     tick_t       freq;  
     deltatime_t  oofreq;  
 } timer;

It keeps the last measured timestamp, a reference value (used to detect the anomaly discussed earlier) and the frequency of the measured ticks. The timer_intialize and timer_reset will store the current timestamp which results in the reported elapsed time being reset to zero. To avoid having to deal with memory allocations, the library requires the user to allocate/free the timer objects, which follows from the mantra of keeping it simple and reducing dependencies to a minimum. Since most engines and frameworks usually have their own memory allocation wrappers, I prefer to make simple libraries like this stay away from that issue instead of having the user pass in function pointers to custom allocators.

There are two methods to query the time elapsed. timer_elapsed which returns elapsed time in seconds, and timer_elapsed_ticks which returns raw "ticks" elapsed. The timer_ticks_per_second can be used to query how many of these "ticks" there are per second.

Finally there are a couple of convenience functions, timer_current which returns a process-wide timestamp (basically an non-resettable global timer) and timer_system which returns a UNIX timestamp (number of milliseconds elapsed since the epoch).

Regarding implementation it's all pretty straightforward, except for the slight problem I mentioned earlier. Since the reported elapsed tick count can jump unexpectedly when the thread migrates between cores, I've chosen to use timeGetTime as a reference value and falling back on it in case the performance counter is messed up. This of course introduces yet another problem: The timeGetTime returns a 32 bit value which wraps around. If that happens, I just assume the performance counter is behaving during that specific call. This is of course Windows-specific, but so far I've not been able to detect a case where clock_gettime bails out.

 tick_t timer_elapsed_ticks( timer* time, int reset )  
 {  
     tick_t dt = 0;  
   
 #if defined( _WIN32 ) || defined( _WIN64 )  
   
     tick_t diff, refdiff;  
     deltatime_t timerdiff;  
     tick_t curclock = time->clock;  
     tick_t ref   = time->ref;  
   
     QueryPerformanceCounter( (LARGE_INTEGER*)&curclock );  
     ref = timeGetTime();  
   
     diff  = curclock - time->clock;  
     refdiff = ref - time->ref;  
   
     if( refdiff < 0 )  
         refdiff = (tick_t)( 1000.0 * diff * time->oofreq ); //Catch looping of the millisecond counter  
   
     timerdiff = (deltatime_t)( ( diff * time->oofreq ) - ( refdiff * 0.001 ) );  
     if( ( diff < 0 ) || ( timerdiff > 0.1 ) || ( timerdiff < -0.1 ) )  
         diff = (tick_t)( ( refdiff * 0.001 ) * time->freq ); //Performance counter messed up, transform reference to counter frequency  
   
     dt = diff;  
   
     if( reset )  
         time->ref = ref;  
   
 #else  
   
     tick_t curclock;  
     struct timespec ts = { .tv_sec = 0, .tv_nsec = 0 };  
     int err = clock_gettime( CLOCK_REALTIME, &ts );  
     if( err < 0 )  
         return;  
   
     curclock = ( (tick_t)ts.tv_sec * 1000000ULL ) + ( ts.tv_nsec / 1000ULL );  
   
     dt = curclock - time->clock;  
   
 #endif  
   
     if( reset )  
         time->clock = curclock;  
   
     return dt;  
 }

This concludes the first part of my profiling library implementation. The full code is available in the github repo mentioned earlier. Any comments, rants or suggestions are welcome! The next part will be a discussion about the design and requirements of the actual profiling API.