8-bit Assembly was Fun Ramblings of a not-so-old Programmer

Releasing a New Project

I am releasing another project under the Apache License. Instead of just dumping all the code in one go, I will try to release it in manageable chunks accompanied by a post explaining the ideas behind them. This will also make it easier to keep a strong build infrastructure, such as continuous integration, automatically generated documentation, and code coverage reports. Finally it gives me a chance to reorganize some of the code. It was not particularly horrible, but it is always easier the second time.

Motivation

TL;DR; The author wants to learn how to program on GPUs.

The problem this library tries to solve is how to measure the delay between two market data feeds. The problem is most common in the US equity markets, but I have good reason to believe that it is interesting in US options, and it might be interesting in other markets too. The problem appears because many market participants use the consolidated feeds, which are often significantly slower than the direct feeds from the exchanges. Very few trading opportunities are created by measuring the exact delay between the consolidated and direct feeds; it is enough to know that the consolidated feeds are slower to open trading opportunities. But measuring the delay can help one determine how long or big those opportunities are. Furthermore, exchanges release new feeds all the time, and comparing their latency is critical to the businesses that depend on timely market data.

In addition, the performance of the software that captures, normalizes and distributes these market feeds must be tested in every release. Regressions can be expensive. Benchmarks and simulated measurements are often not enough, because the characteristics of a production feed are hard to reproduce in a lab, so we must have a mechanism to measure the performance in production.

All this would not be so challenging if the message streams were identical in both the direct and consolidated feeds. But they are not: the direct feeds often contain more information. For example, a direct feed often describes the interest for all price levels, while the consolidated feeds often only indicate the interest at the best available price, i.e., the highest price for the buy interest, and the lowest price for the sell interest.

And to further confuse matters, there is rarely a reliable identifier that can be used to correlate messages on the consolidated feed against the messages in the direct feed.

The approach that we attempt in this library is to treat the market feeds as basic timeseries, and then perform time-delay estimation using the cross-correlation of the paired timeseries.

Cross-correlation (and time-delay estimation) are expensive operations if performed naively. A naive cross-correlation is an $O(N^2)$ algorithm; fortunately, one can use the Fast Fourier Transform (FFT) to implement the algorithm in $O(N \log N)$. In addition, both FFTs and the time-delay estimation algorithms can be implemented efficiently on modern GPUs, to further speed up the computation.
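To make the estimator concrete, here is one standard textbook formulation. It is only a sketch: the notation is an assumption and is not taken from the library's code. Assume both feeds have been sampled into real-valued series $a$ and $b$ of length $N$, and treat them as periodic:

```latex
\begin{align*}
(a \star b)[k] &= \sum_{n=0}^{N-1} a[n] \, b[(n + k) \bmod N],
  \qquad k = 0, \ldots, N-1 \\
\hat{k} &= \arg\max_{k} \, (a \star b)[k] \\
a \star b &= \mathcal{F}^{-1}\!\left\{ \overline{\mathcal{F}\{a\}} \cdot \mathcal{F}\{b\} \right\}
\end{align*}
```

Evaluating the first line directly takes $N$ multiply-adds for each of the $N$ lags, which is where the quadratic cost comes from. The last line (the cross-correlation theorem) replaces that with three FFTs and one pointwise product. If $b$ is simply $a$ delayed by $d$ samples, the peak $\hat{k}$ lands at $d$.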

Though the primary motivation is the analysis of market data feeds, I believe this technique is applicable any time the same timeseries is measured in two different ways. For example, CPU utilization, page hit rates, server crash counts, etc.

First Commit

That said, we need to start coding something. So I created a repository on github.com (with some basic defaults), and then submitted the autoconf and automake boilerplate.

More than One Compiler and Then The End

At this point I have been able to configure Ubuntu 12.04 to compile my small C++11 library, and I have also been able to configure Travis to automatically compile the library from the github source. Travis allows you to configure more than one compiler, which sounds fantastic from my perspective. Unfortunately, while they allow you to test your code against multiple versions of Python (ref), they do not (yet) allow you to easily configure multiple C++ compiler versions.

This is not too difficult to work around in practice: you simply need to override the CXX and CC environment variables to your liking. In my case I modified the compiler section in the .travis.yml file to look like this:

compiler:
  - clang
  - gcc

Then I modified the before_script section to include:

before_script:
  - uname -a
....
..
  - if [ "x$CC" == "xgcc" ]; then CXX=g++-4.9; CC=gcc-4.9; fi
  - if [ "x$CC" == "xclang" ]; then CXX=clang++-3.6; CC=clang-3.6; fi
  - export CC
  - export CXX  

The full configuration file is here
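The two `if` lines above could also be folded into a single `case` statement, which keeps the mapping from Travis's generic compiler names to the versioned binaries in one place. This is only a sketch: the function name `select_compilers` is hypothetical and not part of the Skye repository, and the version numbers are simply the ones used in this post.

```shell
# Hypothetical helper: map the generic compiler name that Travis puts in
# CC to the versioned binaries installed from the PPAs.
select_compilers() {
  case "$1" in
    gcc)   CC=gcc-4.9;   CXX=g++-4.9 ;;
    clang) CC=clang-3.6; CXX=clang++-3.6 ;;
    *)     echo "unknown compiler: $1" >&2; return 1 ;;
  esac
  export CC CXX
}

# Example: pretend Travis selected gcc for this build.
select_compilers gcc && echo "CC=$CC CXX=$CXX"
```

Unlike the `if` version, an unexpected compiler name is reported as an error rather than silently ignored.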

Future Changes

The solution described above for testing multiple compilers is not very scalable. It seems the state of the art is to use the new container-based build infrastructure in Travis, and build a matrix of configurations as described here, or if you want an even more sophisticated example look at this one.

I will probably need such an approach when I start testing builds with and without code coverage, with and without optimizations, and with different memory checking tools (ASAN, TSAN, etc.). But for the time being I am satisfied that I can continue to code and have something running the tests for me.

Taking Stock

I started this series of posts to investigate whether it was possible to set up C++11 builds using any of the hosted continuous integration solutions out there. Though I did not show my attempts at using other CI frameworks, all the ones I tried use Ubuntu 12.04 as their base platform, so the first step was to install the necessary tools for C++11 on said platform. Once that problem was resolved, using Travis CI, which appears to be the most popular product, proved relatively easy. On a scale of 1-10, where 1 is booting Android and 10 is configuring sendmail using the original .cf file, I rate this a 2.

The biggest feature I miss from Travis CI is some kind of report to show what specific tests broke or were fixed in each change. There are no plans to implement such a feature (ref). This feature is so important that I may look into using a completely different continuous integration solution, such as Circle CI.

Building with Travis CI

Travis CI is one of the hosted continuous integration frameworks that offer C++ support (ref). Their instructions looked promising, and they are easy to follow. Unfortunately their default setup does not work for C++11 libraries, and this is where I learned that I needed to set up a C++11 development environment on Ubuntu 12.04 first.

The Configuration File

Travis follows the instructions in a simple .travis.yml file in the top level directory of your project. The instructions on the website are comprehensive enough, but I think it is easier to follow if we describe the contents of our Travis file section by section.

Configure the Language and Compilers

First we tell Travis to use C++ and to compile on Linux.

language: cpp

os:
  - linux

Then we tell it what compilers to use. This is a nice feature: testing C++ with more than one compiler is a good way to avoid portability problems, both to other platforms and to future updates of the compiler. We will initially configure just one compiler to make testing easier, but we will want to set up additional compilers later:

compiler:
  - clang

Setting Up the Development Environment

We are going to use the before_install section to install the development environment. The documentation explicitly recommends installing Ubuntu packages there.

The set of apt repositories and packages was described in the previous post; here we simply reproduce those commands in the .travis.yml format:

before_install:
  - sudo apt-get -qq -y install python-software-properties
  - sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
  - sudo add-apt-repository -y ppa:dns/gnu
  - sudo add-apt-repository -y ppa:boost-latest/ppa
  - sudo add-apt-repository -y "deb http://llvm.org/apt/precise/ llvm-toolchain-precise main"
  - sudo add-apt-repository -y "deb http://llvm.org/apt/precise/ llvm-toolchain-precise-3.6 main"
  - wget -O - http://llvm.org/apt/llvm-snapshot.gpg.key|sudo apt-key add -
  - sudo apt-get -qq update
  - sudo apt-get -qq -y install clang-3.6
  - sudo apt-get -qq -y install g++-4.9
  - sudo apt-get -qq -y install boost1.55
  - sudo apt-get -qq -y install autoconf automake autoconf-archive make
  - sudo apt-get -qq -y install git

Logging the Configuration

As this program will run automatically, it is always useful to log the critical dependency versions to make debugging easier. We do this just before running the configuration script:

before_script:
  - uname -a
  - g++ --version || echo "no g++ found"
  - clang++ --version || echo "no clang++ found"
  - g++-4.9 --version || echo "no g++-4.9 found"
  - clang++-3.6 --version || echo "no clang++-3.6 found"
  - make --version || echo "no make found"
  - automake --version || echo "no automake found"
  - autoconf --version || echo "no autoconf found"
  - dpkg -s autoconf-archive || echo "no autoconf-archive found"
  - dpkg -s libboost-test1.55-dev || echo "no libboost-test1.55-dev found"
  - echo $CXX
  - echo $CC
  - CC=clang-3.6
  - export CC
  - CXX=clang++-3.6
  - export CXX  
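The repeated `tool --version || echo "no tool found"` pattern above could be factored into a small shell function if the list of tools grows. The sketch below is an illustration only; `log_version` is a hypothetical name and does not appear in the actual configuration file.

```shell
# Hypothetical helper: print the first line of "<tool> --version", or a
# fallback message when the tool is not installed.
log_version() {
  if command -v "$1" >/dev/null 2>&1; then
    "$1" --version | head -n 1
  else
    echo "no $1 found"
  fi
}

# Log the tools this build depends on, one line each.
for tool in g++-4.9 clang++-3.6 make automake autoconf; do
  log_version "$tool"
done
```

Keeping only the first line of each `--version` output makes the build log shorter, at the cost of dropping the copyright and license text.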

Compiling the Code

With all these preambles behind us, we can now compile the code:

script:
  - ./bootstrap
  - echo CC=${CC?} CXX=${CXX?}
  - buildir=$(basename ${CC?})
  - mkdir ${buildir?} && cd ${buildir?}
  - ../configure --with-boost-libdir=/usr/lib/x86_64-linux-gnu
  - make
  - make check

At the end of this process you should have a file that looks like this one. Notice that this link points to a specific version; I am planning to update the file, but not these instructions.
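The `${CC?}` form used in the script section is standard POSIX parameter expansion: if the variable is unset, the shell reports an error instead of silently expanding to an empty string, so a misconfigured build fails early. The sketch below shows that behavior together with the way the build directory name is derived from the compiler; `build_dir_for` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: derive the build directory name from the compiler,
# e.g. /usr/bin/clang-3.6 -> clang-3.6.
build_dir_for() {
  basename "${1?usage: build_dir_for COMPILER}"
}

build_dir_for /usr/bin/clang-3.6   # prints clang-3.6
build_dir_for gcc-4.9              # prints gcc-4.9

# ${VAR?} fails when VAR is unset; run it in a subshell so that the
# failure does not terminate this script.
unset NO_SUCH_COMPILER
if ( : "${NO_SUCH_COMPILER?}" ) 2>/dev/null; then
  echo "variable is set"
else
  echo "variable is unset, the expansion failed"
fi
```

Using one directory per compiler keeps the object files from the clang and gcc builds apart, so neither build can accidentally reuse the other's output.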

Sign-in To Travis

I have been using travis-ci.org (notice the TLD) for these builds. You may be using the version with commercial support at travis-ci.com. I used github to create a travis-ci.org account, and then enabled builds for the github.com/coryan/Skye project.

Travis scanned my github account and discovered my repositories. From there, I selected the settings for the Skye project, and enabled “Build only if .travis.yml is present”, and “Build pushes”, leaving all other options to their defaults (disabled). With apologies to any visually impaired readers, this screenshot may help.

What is Next

So far we successfully built with clang, but we want to build with g++ too. That comes in the next installment of this series.

To Build Your Program From Scratch First you must Build the Universe

Given that most hosted continuous integration solutions are based on Ubuntu 12.04, I started by testing my small C++11 library in that environment. Of course not: I started by trying to build directly in one of the hosted solutions and failed miserably. This was entirely my fault, of course. Once the coffee kicked in and I started thinking more clearly, I prepped a virtual machine to test my library there.

I will not go into the details of building virtual machines and running them; I am sure you can find information on the web about it. I run Fedora 21 on my workstation, and for these purposes I find virt-manager, a point-and-click interface, perfectly acceptable.

Create your Baseline VM

First you must download the Ubuntu 12.04 install CD. I easily found this online at:

http://releases.ubuntu.com/12.04/

Because I am planning to use this VM just to verify my builds, and not as a primary development platform, I used the server ISO:

http://releases.ubuntu.com/12.04/ubuntu-12.04.5-server-amd64.iso

Once you download the ISO, move it to wherever you keep the images and ISOs for your virtual machines. Then create the VM. I chose a fairly small machine: 1 CPU, 2 GiB of RAM, 32 GiB of disk space. One can change those later if needed, so it is better to start small.

Then simply boot the VM and let the installer do its job. You probably want to enable SSH so you do not need to log in through the console. The last time I used a Debian-based system (such as Ubuntu) was some years ago. I recall the packaging system getting wedged routinely, but now it is 2015, so the packaging system gets wedged sometimes. Sigh. In my case, the default installation, selecting only SSH server as an option, left the server unable to update some packages. A quick web search found this series of incantations to fix it:

sudo apt-get clean
sudo find /var/lib/apt/lists -type f | xargs sudo rm
sudo apt-get update
sudo apt-get dist-upgrade

Install the Development Tools

Because I am likely to restart the rest of the process numerous times, I took a snapshot and cloned this VM at this point. The biggest question I had was how to get recent versions of the development tools installed. Ubuntu 12.04 was released in 2012, when the support for C++11 was fairly immature. Luckily, an army of volunteers has created backports of all sorts of packages to the platform. You just need to find their packages. More web searches and you discover the rich collection of Personal Package Archives (PPA) for Ubuntu.

In my case these included:

  • ppa:ubuntu-toolchain-r/test: the GNU toolchain, including g++ and gcc.
  • ppa:dns/gnu: a host of GNU tools, including autoconf and automake.
  • ppa:boost-latest/ppa: recent (though not necessarily the latest) version of the boost libraries.

The clang and llvm packages can be downloaded from http://llvm.org/apt/, and they list what repositories are needed for each version of Ubuntu.

I find the whole idea of downloading a pre-built binary from an unknown party and running it mildly terrifying. I would much rather use packages built by a well-known source, or build them from source as a second choice. But neither approach is realistic here. The binaries for Ubuntu 12.04 are simply too old for my purposes. Upgrading the hosted VMs where I would like to run the builds is not possible (we will revisit this later as we set up containers). And building from source would take too long and would be wasteful on the hosted environment. I could create my own PPA, but that solves the problem for me and nobody else. And ultimately I am running these packages on a throwaway virtual machine.

Having found all the packages we need, we simply configure our sacrificial VM:

# Install a tool to easily add PPAs and other sources:
sudo apt-get -qq -y install python-software-properties
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
sudo add-apt-repository -y ppa:dns/gnu
sudo add-apt-repository -y ppa:boost-latest/ppa
sudo add-apt-repository -y "deb http://llvm.org/apt/precise/ llvm-toolchain-precise main"
sudo add-apt-repository -y "deb http://llvm.org/apt/precise/ llvm-toolchain-precise-3.6 main"
# ... add the public key used by llvm.org ...
wget -O - http://llvm.org/apt/llvm-snapshot.gpg.key|sudo apt-key add -

Once the package sources are configured, we are ready to download the list of packages and their dependencies:

sudo apt-get -qq update
sudo apt-get -qq -y install clang-3.6
sudo apt-get -qq -y install g++-4.9
sudo apt-get -qq -y install boost1.55
sudo apt-get -qq -y install autoconf automake autoconf-archive make
sudo apt-get -qq -y install git

After 30 years of coding, I am paranoid: I want to know what really got installed.

$ g++-4.9 --version
g++-4.9 (Ubuntu 4.9.2-0ubuntu1~12.04) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ clang++-3.6 --version
Ubuntu clang version 3.6.2-svn240577-1~exp1 (branches/release_36) (based on LLVM 3.6.2)
Target: x86_64-pc-linux-gnu
Thread model: posix
$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.23.1
Copyright 2012 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for x86_64-pc-linux-gnu
$ automake --version
automake (GNU automake) 1.14
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl-2.0.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Tom Tromey <tromey@redhat.com>
       and Alexandre Duret-Lutz <adl@gnu.org>.
$ autoconf --version
autoconf (GNU Autoconf) 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+/Autoconf: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>, <http://gnu.org/licenses/exceptions.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David J. MacKenzie and Akim Demaille.
$ dpkg -s autoconf-archive | grep ^Version
Version: 20130406-0gnu1~12.04

Okay, seems like we are ready to go.
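The manual inspection above could also be turned into an automated check that fails when a tool is older than expected. The sketch below is one way to do it, assuming GNU `sort` with its `-V` (version sort) option is available; `version_ge` is a hypothetical helper name, and the version strings are copied from the output above.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Version strings copied from the output above.
version_ge 4.9.2 4.9 && echo "g++ is recent enough"
version_ge 3.6.2 3.6 && echo "clang++ is recent enough"
version_ge 3.81 4.0 || echo "make is older than 4.0"
```

Version sort matters here because a plain string comparison would rank 4.10 below 4.9.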

Download the Source and Compile

Now that the development tools are here, download the source code for our C++11 library and compile it:

git clone https://github.com/coryan/Skye
cd Skye
./bootstrap
mkdir clang ; cd clang
CXX=clang++-3.6 CC=clang-3.6 ../configure --with-boost-libdir=/usr/lib/x86_64-linux-gnu
make check

Success! Let’s try with gcc:

cd ..
mkdir gcc ; cd gcc
CXX=g++-4.9 CC=gcc-4.9 ../configure --with-boost-libdir=/usr/lib/x86_64-linux-gnu
make check

Success again!

What is next?

Now that we have reproducible builds for this library on Ubuntu 12.04, we can attempt a build using some of the existing hosted continuous integration environments. Stay tuned for the next post.

Is Hosted Continuous Integration Suitable for C++?

For reasons that will hopefully become apparent in future posts, I became interested in using a hosted service to perform continuous integration of my C++ projects. There are many offerings out there; in fact, the sheer number can become bewildering (Travis-CI, Circle-CI, drone.io, just to start). After trying a couple of them it became apparent that most, if not all, of them provide virtual machines based on Ubuntu 12.04 (aka Precise Pangolin).

A reasonable platform choice for most purposes, but an unfortunate one for me. Most of my code uses C++11, which was poorly supported in that version of Ubuntu. I also tend to use recent versions of the boost libraries, and the GNU auto configuration tools.

I will probably be discussing soon whether the choice of automake is a poor one. But the choice of libraries and compilers I will defend, not on any technical basis, but simply because my hobby projects are supposed to be fun. That usually involves not limiting myself to well-proven and stable platforms, as I often argue professionally.

With this in mind, the next posts will describe my failures (and successes hopefully) trying to use hosted environments for a small C++11 project. I will be writing them as I try different solutions, so do not expect polished and well reasoned conclusions soon. Instead, join me in a journey of toil, suffering, failures, successes, and ultimately discovery (gulp, I hope).