22 Aug 2015
I am releasing another project under the
Apache License.
Instead of just dumping all the code in one go, I will try to release
it in manageable chunks accompanied by a post explaining the ideas
behind them.
This will also make it easier to keep a strong build infrastructure,
such as continuous integration, automatically generated documentation,
and code coverage reports.
Finally, it gives me a chance to reorganize some of the code. It was
not particularly horrible, but it is always easier the second time.
Motivation
TL;DR: The author wants to learn how to program on GPUs.
The problem this library tries to solve is how to measure the delay
between two market data feeds.
The problem is most common in the US equity markets, but I have good
reason to believe that it is relevant in US options, and it might be
interesting in other markets too.
The problem appears because many market participants use the
consolidated feeds,
which are often significantly slower than the direct feeds from the
exchanges.
Measuring the exact delay between the consolidated and direct feeds
creates very few trading opportunities by itself; knowing that the
consolidated feeds are slower is enough to open trading
opportunities.
But measuring the delay can help one determine how long-lived or how
large those opportunities are. Furthermore, exchanges release new
feeds all the time, and comparing their latency is critical to the
businesses that depend on timely market data.
In addition, the performance of the software that captures, normalizes
and distributes these market feeds must be tested in every release.
Regressions can be expensive. Benchmarks and simulated measurements
are often not enough: the characteristics of a production feed are
hard to reproduce in a lab, so we must have a mechanism to measure the
performance in production.
All this would not be so challenging if the message streams were
identical in both the direct and consolidated feeds. But they are
not: the direct feeds often contain more information.
For example, a direct feed often describes the interest for all price
levels, while the consolidated feeds often only indicate the interest
at the best available price, i.e., the highest price for the buy
interest, and the lowest price for the sell interest.
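To make the difference concrete, here is a minimal C++11 sketch of how a consolidated-style top of book could be derived from a direct-feed depth of book. The `depth_of_book` and `top_of_book` types and the `best_prices()` function are hypothetical illustrations, not part of the Skye library, and prices are assumed to be integer ticks.

```cpp
#include <cassert>
#include <functional>
#include <map>

// Hypothetical depth-of-book: price (in ticks) -> total quantity per side.
// Not the Skye library's API; just enough structure for the illustration.
struct depth_of_book {
  std::map<long, long, std::greater<long>> buy;  // highest price first
  std::map<long, long> sell;                     // lowest price first
};

// Hypothetical top-of-book summary, similar to what a consolidated feed
// typically reports.  Zero means "no interest on that side".
struct top_of_book {
  long bid_px = 0;
  long bid_qty = 0;
  long offer_px = 0;
  long offer_qty = 0;
};

// Keep only the best level on each side: the highest buy price and the
// lowest sell price, with their quantities.
top_of_book best_prices(depth_of_book const& book) {
  top_of_book tob;
  if (!book.buy.empty()) {
    tob.bid_px = book.buy.begin()->first;
    tob.bid_qty = book.buy.begin()->second;
  }
  if (!book.sell.empty()) {
    tob.offer_px = book.sell.begin()->first;
    tob.offer_qty = book.sell.begin()->second;
  }
  return tob;
}
```

An update at a deeper level (say, a new buy order one tick below the best bid) changes `depth_of_book` but not the result of `best_prices()`, which is one way a message on the direct feed can have no counterpart on the consolidated feed.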
And to further confuse matters, there is rarely a reliable identifier
that can be used to correlate messages on the consolidated feed
against the messages in the direct feed.
The approach that we attempt in this library is to treat the market
feeds as basic timeseries, and then perform time-delay estimation
using the cross-correlation of the paired timeseries.
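A minimal sketch of the idea, assuming each feed has already been reduced to a list of message timestamps (integer microseconds, say): bucket the timestamps into fixed-width intervals to obtain two timeseries, then pick the lag that maximizes their cross-correlation. The function names `to_timeseries()` and `estimate_delay_naive()` are hypothetical, and counting messages per bucket is only one possible choice of timeseries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: count the messages whose timestamps fall in each
// bucket [start + i * width, start + (i + 1) * width), for i < nbuckets.
std::vector<double> to_timeseries(std::vector<std::int64_t> const& timestamps,
                                  std::int64_t start, std::int64_t width,
                                  std::size_t nbuckets) {
  std::vector<double> series(nbuckets, 0.0);
  for (std::int64_t t : timestamps) {
    if (t < start) continue;  // before the observation window
    std::size_t idx = static_cast<std::size_t>((t - start) / width);
    if (idx < nbuckets) series[idx] += 1.0;
  }
  return series;
}

// Hypothetical helper: return the lag k in [-max_lag, max_lag] that
// maximizes r[k] = sum_t x[t] * y[t + k].  A positive k means y lags x.
// Cost is O(N * max_lag), i.e. O(N^2) when max_lag is close to N.
int estimate_delay_naive(std::vector<double> const& x,
                         std::vector<double> const& y, int max_lag) {
  int const nx = static_cast<int>(x.size());
  int const ny = static_cast<int>(y.size());
  int best_lag = 0;
  double best = std::numeric_limits<double>::lowest();
  for (int k = -max_lag; k <= max_lag; ++k) {
    double r = 0.0;
    for (int t = 0; t < nx; ++t) {
      int const j = t + k;
      if (j >= 0 && j < ny) r += x[t] * y[j];
    }
    if (r > best) {  // ties keep the smallest lag
      best = r;
      best_lag = k;
    }
  }
  return best_lag;
}
```

For instance, shifting every timestamp by 30 microseconds and using 10-microsecond buckets shifts the timeseries by exactly 3 buckets, and that is the lag the search should find.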
Cross-correlation (and time-delay estimation) are expensive operations
if performed naively.
A naive cross-correlation is an $O(N^2)$ algorithm; fortunately, one can
use the Fast Fourier Transform (FFT) to implement it in
$O(N \log N)$.
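The FFT version relies on the correlation theorem: for real sequences $x$ and $y$ with transforms $X$ and $Y$, the cross-correlation $r[k] = \sum_t x[t]\,y[t+k]$ is the inverse transform of $\overline{X} \cdot Y$. The sketch below assumes a simple recursive radix-2 FFT written for clarity rather than speed (a real implementation would use a tuned FFT library, or a GPU), and zero-pads both series so the circular correlation computed by the FFT does not wrap around. `fft()` and `estimate_delay_fft()` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <limits>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Recursive radix-2 Cooley-Tukey FFT, written for clarity, not speed.
// a.size() must be a power of two.  With invert == true it computes the
// inverse transform without the 1/n scaling.
void fft(cvec& a, bool invert) {
  std::size_t const n = a.size();
  if (n <= 1) return;
  cvec even(n / 2), odd(n / 2);
  for (std::size_t i = 0; i < n / 2; ++i) {
    even[i] = a[2 * i];
    odd[i] = a[2 * i + 1];
  }
  fft(even, invert);
  fft(odd, invert);
  double const pi = std::acos(-1.0);
  double const angle = (invert ? 2.0 : -2.0) * pi / static_cast<double>(n);
  for (std::size_t k = 0; k < n / 2; ++k) {
    std::complex<double> const w =
        std::polar(1.0, angle * static_cast<double>(k));
    a[k] = even[k] + w * odd[k];
    a[k + n / 2] = even[k] - w * odd[k];
  }
}

// Hypothetical helper: return the lag k in [-max_lag, max_lag] maximizing
// r[k] = sum_t x[t] * y[t + k], with r computed for all lags at once as
// IFFT(conj(FFT(x)) * FFT(y)).  Requires 0 <= max_lag < min(|x|, |y|).
int estimate_delay_fft(std::vector<double> const& x,
                       std::vector<double> const& y, int max_lag) {
  assert(max_lag >= 0);
  assert(static_cast<std::size_t>(max_lag) < x.size());
  assert(static_cast<std::size_t>(max_lag) < y.size());
  // Zero-pad to a power of two >= |x| + |y| so there is no wrap-around.
  std::size_t n = 1;
  while (n < x.size() + y.size()) n <<= 1;
  cvec fx(n), fy(n);  // value-initialized to zero
  for (std::size_t i = 0; i < x.size(); ++i) fx[i] = x[i];
  for (std::size_t i = 0; i < y.size(); ++i) fy[i] = y[i];
  fft(fx, false);
  fft(fy, false);
  for (std::size_t i = 0; i < n; ++i) fx[i] = std::conj(fx[i]) * fy[i];
  fft(fx, true);  // fx[k] == n * r[k]; negative lags land at fx[n + k]
  int best_lag = 0;
  double best = std::numeric_limits<double>::lowest();
  for (int k = -max_lag; k <= max_lag; ++k) {
    std::size_t const idx = k >= 0 ? static_cast<std::size_t>(k)
                                   : n - static_cast<std::size_t>(-k);
    double const v = fx[idx].real();  // the 1/n scale does not move the max
    if (v > best) {
      best = v;
      best_lag = k;
    }
  }
  return best_lag;
}
```

Zero-padding to at least `x.size() + y.size()` is what turns the FFT's circular correlation into the ordinary (linear) one for every lag the search examines; without it, lags near the ends of the series would mix in samples from the opposite end.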
In addition, both FFTs and the time-delay estimation algorithms can be
implemented efficiently on modern GPUs, to further speed up the
computation.
Though the primary motivation is the analysis of market data feeds, I
believe this technique is applicable any time the same timeseries is
measured in two different ways. For example, CPU utilization, page
hit rates, server crash counts, etc.
First Commit
That said, we need to start coding something. So I created a
repository on github.com (with some basic defaults), and then submitted
the autoconf and automake boilerplate.
22 Aug 2015
At this point I have been able to configure Ubuntu 12.04 to compile my
small C++11 library, and I have also been able to configure Travis to
automatically compile the library from the github source.
Travis allows you to configure more than one compiler, which sounds
fantastic from my perspective. Unfortunately, while they allow you to
test your code against multiple versions of Python
(ref), they do not (yet)
allow you to easily configure multiple C++ compiler versions.
This is not too difficult in practice: you simply need to override the
CXX and CC environment variables to your liking. In my
case I modified the compiler section in the .travis.yml
file to look like this:
compiler:
- clang
- gcc
Then I modified the before_script
section to include:
before_script:
- uname -a
....
..
- if [ "x$CC" == "xgcc" ]; then CXX=g++-4.9; CC=gcc-4.9; fi
- if [ "x$CC" == "xclang" ]; then CXX=clang++-3.6; CC=clang-3.6; fi
- export CC
- export CXX
The full configuration file is here.
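Since the compiler is selected by overriding environment variables, it is easy to end up testing the wrong compiler without noticing. One possible safeguard, assuming you are willing to add a small check to the test suite, is to report the compiler that actually built the code using the predefined `__clang__`, `__GNUC__`, `__GNUC_MINOR__`, and `__cplusplus` macros. The `compiler_description()` function is hypothetical, not part of Skye.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: describe the compiler that built this translation
// unit.  Clang also defines __GNUC__ for compatibility, so __clang__ must
// be tested first.
std::string compiler_description() {
  std::ostringstream os;
#if defined(__clang__)
  os << "clang " << __clang_major__ << "." << __clang_minor__;
#elif defined(__GNUC__)
  os << "gcc " << __GNUC__ << "." << __GNUC_MINOR__;
#else
  os << "unknown compiler";
#endif
  // __cplusplus is 201103 for C++11 (larger for newer standards).
  os << " __cplusplus=" << __cplusplus;
  return os.str();
}
```

Printing this string from one of the tests run by `make check` would make each Travis log record which compiler and language level were actually exercised, rather than which ones the configuration intended.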
Future Changes
The solution described above for testing multiple compilers is not
very scalable. It seems the state of the art is to use the new
container-based build infrastructure in Travis, and build a matrix of
configurations as described here, or, if you want an even more
sophisticated example, look at this one.
I will probably need such an approach when I start testing builds with
and without code coverage, with and without optimizations, and with
different sanitizers (ASAN, TSAN, etc.). But for the time
being I am satisfied that I can continue to code and have something
running the tests for me.
Taking Stock
I started this series of posts to investigate whether it was possible to
set up C++11 builds using any of the hosted continuous integration
solutions out there. Though I did not show my attempts at using other
CI frameworks, all the ones I tried use Ubuntu 12.04 as their base
platform, so the first step was to install the necessary tools for
C++11 on said platform.
Once that problem was resolved, using Travis CI, which appears to be
the most popular product, proved relatively easy.
On a scale of 1-10 where 1 is booting Android and 10 is configuring
sendmail using the original .cf
file, I rate this a 2.
The biggest feature I miss from Travis CI is some kind of report to
show which specific tests broke or were fixed in each change. There are
no plans to implement such a feature (ref).
This feature is so important that I may look into using a completely
different continuous integration solution, such as Circle CI.
22 Aug 2015
Travis CI is one of the hosted continuous
integration frameworks that offer C++ support
(ref).
Their instructions looked promising, and they are easy to follow.
Unfortunately, their default setup does not work for C++11 libraries,
and this is where I learned that I needed to set up a C++11 development
environment on Ubuntu 12.04 first.
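What does a "C++11 library" need that the stock 12.04 toolchain lacks? Ubuntu 12.04 ships GCC 4.6, and, as far as I know, template aliases, delegating constructors, non-static data member initializers, and `override` only arrived in GCC 4.7. A small smoke test that exercises them is a quick way to check whether a CI environment is usable before porting a whole library; the sketch below uses hypothetical names (`series`, `base`, `derived`, `cxx11_smoke_test()`).

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Template alias.
template <typename T>
using series = std::vector<T>;

struct base {
  virtual ~base() = default;
  virtual int value() const { return 0; }
};

struct derived : base {
  int v = 7;                     // non-static data member initializer
  int scale;
  derived() : derived(6) {}      // delegating constructor
  explicit derived(int s) : scale(s) {}
  int value() const override { return v * scale; }  // explicit override
};

// Also exercises brace initialization, auto, range-for, lambdas, and
// std::unique_ptr.  Returns 2 * (1 + 2 + 3 + 4) + 7 * 6.
int cxx11_smoke_test() {
  series<int> xs{1, 2, 3, 4};
  int sum = 0;
  for (auto x : xs) sum += x;
  auto twice = [](int x) { return 2 * x; };
  std::unique_ptr<base> p(new derived);
  return twice(sum) + p->value();
}
```

Built with a C++11-capable compiler (for example `g++-4.9 -std=c++11`), `cxx11_smoke_test()` returns 62; with the default 12.04 compiler I would expect compilation to fail even with `-std=c++0x`.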
The Configuration File
Travis follows the instructions in a simple .travis.yml file in the
top-level directory of your project. The instructions on the website
are comprehensive enough, but I think they are easier to follow if we
describe the contents of our Travis file section by section.
First we tell Travis to use C++ and to compile on Linux.
language: cpp
os:
- linux
Then we tell it what compilers to use. This is a nice feature:
testing C++ with more than one compiler is a good way to avoid
portability problems, both to other platforms and to future updates of
the compiler.
We will initially configure just one compiler to make testing
easier, but we will want to set up additional compilers later:
compiler:
- clang
Setting Up the Development Environment
We are going to use the before_install section to install the
development environment. The documentation explicitly recommends
installing Ubuntu packages there.
The set of apt repositories and packages was described in the
previous post; here we simply reproduce those commands in the
.travis.yml format:
before_install:
- sudo apt-get -qq -y install python-software-properties
- sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
- sudo add-apt-repository -y ppa:dns/gnu
- sudo add-apt-repository -y ppa:boost-latest/ppa
- sudo add-apt-repository -y "deb http://llvm.org/apt/precise/ llvm-toolchain-precise main"
- sudo add-apt-repository -y "deb http://llvm.org/apt/precise/ llvm-toolchain-precise-3.6 main"
- wget -O - http://llvm.org/apt/llvm-snapshot.gpg.key|sudo apt-key add -
- sudo apt-get -qq update
- sudo apt-get -qq -y install clang-3.6
- sudo apt-get -qq -y install g++-4.9
- sudo apt-get -qq -y install boost1.55
- sudo apt-get -qq -y install autoconf automake autoconf-archive make
- sudo apt-get -qq -y install git
Logging the Configuration
As this program will run automatically, it is always useful to log
the critical dependency versions to make debugging easier. We do this
just before running the configuration script:
before_script:
- uname -a
- g++ --version || echo "no g++ found"
- clang++ --version || echo "no clang++ found"
- g++-4.9 --version || echo "no g++-4.9 found"
- clang++-3.6 --version || echo "no clang++-3.6 found"
- make --version || echo "no make found"
- automake --version || echo "no automake found"
- autoconf --version || echo "no autoconf found"
- dpkg -s autoconf-archive || echo "no autoconf-archive found"
- dpkg -s libboost-test1.55-dev || echo "no libboost-test1.55-dev found"
- echo $CXX
- echo $CC
- CC=clang-3.6
- export CC
- CXX=clang++-3.6
- export CXX
Compiling the Code
With all these preambles behind us, we can now compile the code:
script:
- ./bootstrap
- echo CC=${CC?} CXX=${CXX?}
- buildir=$(basename ${CC?})
- mkdir ${buildir?} && cd ${buildir?}
- ../configure --with-boost-libdir=/usr/lib/x86_64-linux-gnu
- make
- make check
At the end of this process you should have a file that looks like
this
one.
Notice that this link points to a specific version; I am planning to
update the file, but not these instructions.
Sign In to Travis
I have been using travis-ci.org (notice the TLD) for these builds.
You may be using the version with commercial support at
travis-ci.com. I used github to create a travis-ci.org account,
and then enabled builds for the github.com/coryan/Skye project.
Travis scanned my github account and discovered my repositories. From
there, I selected the settings for the Skye project, and enabled
“Build only if .travis.yml is present” and “Build pushes”, leaving
all other options at their defaults (disabled). With apologies to any
visually impaired readers, this screenshot may help:
[screenshot: Travis CI settings for the Skye project]
What is Next
So far we successfully built with clang, but we want to build with g++
too. That comes in the next installment of this series.
22 Aug 2015
Given that most hosted continuous integration solutions are based on
Ubuntu 12.04, I started by testing my small C++11 library in that
environment.
Of course not: I started by trying to build directly in one of the
hosted solutions and failed miserably.
This was entirely my fault, of course. Once the coffee kicked in and I
started thinking more clearly, I prepped a virtual machine to test my
library there.
I will not go into the details of building and running virtual
machines; I am sure you can find information on the web about it.
I run Fedora 21 on my workstation, and for these purposes I find
virt-manager, a point-and-click interface, perfectly acceptable.
Create your Baseline VM
First, download the Ubuntu 12.04 install CD; I easily found it
online at:
http://releases.ubuntu.com/12.04/
Because I am planning to use this VM just to verify my builds, and not
as a primary development platform, I used the server ISO:
http://releases.ubuntu.com/12.04/ubuntu-12.04.5-server-amd64.iso
Once you download the ISO, move it to wherever you keep the images and
ISOs for your virtual machines. Then create the VM; I chose a fairly
small machine: 1 CPU, 2 GiB of RAM, 32 GiB of disk space. One can
change those if needed, so it is better to start small.
Then simply boot the VM and let the installer do its job; you probably
want to enable SSH so you do not need to log in through the console.
The last time I used a Debian-based system (such as Ubuntu) was around
- and I recall the packaging system getting wedged routinely; now
it is 2015, so the packaging system only gets wedged sometimes. Sigh. In
my case, the default installation (selecting only the SSH server
option) left the server unable to update some packages. A quick web
search found this series of incantations to fix it:
sudo apt-get clean
sudo find /var/lib/apt/lists -type f | xargs sudo rm
sudo apt-get update
sudo apt-get dist-upgrade
Because I am likely to restart the rest of the process numerous times,
I took a snapshot and cloned this VM at this point. The biggest question
I had was how to get recent versions of the development tools
installed. Ubuntu 12.04 was released in 2012, when the support for
C++11 was fairly immature. Luckily, an army of volunteers has
created backports of all sorts of packages to the platform. You
just need to find their packages. A few more web searches reveal
the rich collection of Personal Package Archives (PPA) for Ubuntu.
In my case these included:
- ppa:ubuntu-toolchain-r/test: the GNU toolchain, including g++ and gcc.
- ppa:dns/gnu: a host of GNU tools, including autoconf and automake.
- ppa:boost-latest/ppa: recent (though not necessarily the latest) versions of the boost libraries.
The clang and llvm packages can be downloaded from
http://llvm.org/apt/, and they list what
repositories are needed for each version of Ubuntu.
I find the whole idea of downloading a pre-built binary
from an unknown party and running it mildly terrifying.
I would much rather use the packages built by a
well-known source, or build them from source as a second choice.
But neither
approach is realistic for these purposes.
The binaries for Ubuntu 12.04 are simply too old for my purposes.
Upgrading the hosted VMs where I would like to run the builds is not
possible (we will revisit this later as we setup containers).
And building from source would take too long and be wasteful on the
hosted environment. I could create my own PPA, but that solves the
problem for me and nobody else. And ultimately I am running these
packages on a throwaway virtual machine.
Having found all the packages we need, we can simply configure our
sacrificial VM:
# Install a tool to easily add PPAs and other sources:
sudo apt-get -qq -y install python-software-properties
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
sudo add-apt-repository -y ppa:dns/gnu
sudo add-apt-repository -y ppa:boost-latest/ppa
sudo add-apt-repository -y "deb http://llvm.org/apt/precise/ llvm-toolchain-precise main"
sudo add-apt-repository -y "deb http://llvm.org/apt/precise/ llvm-toolchain-precise-3.6 main"
# ... add the public key used by llvm.org ...
wget -O - http://llvm.org/apt/llvm-snapshot.gpg.key|sudo apt-key add -
Once the package sources are configured, we are ready to download the list
of packages and their dependencies:
sudo apt-get -qq update
sudo apt-get -qq -y install clang-3.6
sudo apt-get -qq -y install g++-4.9
sudo apt-get -qq -y install boost1.55
sudo apt-get -qq -y install autoconf automake autoconf-archive make
sudo apt-get -qq -y install git
After 30 years of coding I am paranoid; I want to know what really
got installed:
$ g++-4.9 --version
g++-4.9 (Ubuntu 4.9.2-0ubuntu1~12.04) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ clang++-3.6 --version
Ubuntu clang version 3.6.2-svn240577-1~exp1 (branches/release_36) (based on LLVM 3.6.2)
Target: x86_64-pc-linux-gnu
Thread model: posix
$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.23.1
Copyright 2012 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
$ make --version
GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
This program built for x86_64-pc-linux-gnu
$ automake --version
automake (GNU automake) 1.14
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl-2.0.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Tom Tromey <tromey@redhat.com>
and Alexandre Duret-Lutz <adl@gnu.org>.
$ autoconf --version
autoconf (GNU Autoconf) 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+/Autoconf: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>, <http://gnu.org/licenses/exceptions.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by David J. MacKenzie and Akim Demaille.
$ dpkg -s autoconf-archive | grep ^Version
Version: 20130406-0gnu1~12.04
Okay, seems like we are ready to go.
Download the Source and Compile
Now that the development tools are here, download the source code for
our C++11 library and compile it:
git clone https://github.com/coryan/Skye
cd Skye
./bootstrap
mkdir clang ; cd clang
CXX=clang++-3.6 CC=clang-3.6 ../configure --with-boost-libdir=/usr/lib/x86_64-linux-gnu
make check
Success! Let’s try with gcc:
cd ..
mkdir gcc ; cd gcc
CXX=g++-4.9 CC=gcc-4.9 ../configure --with-boost-libdir=/usr/lib/x86_64-linux-gnu
make check
Success again!
What is next?
Now that we have reproducible builds for this library on Ubuntu
12.04 we can attempt a build using some of the existing hosted
continuous integration environments. Stay tuned for the next post.
19 Aug 2015
For reasons that will hopefully become apparent in future posts,
I became interested in using a hosted service to perform continuous
integration of my C++ projects. There are many offerings out there,
in fact the sheer number can become bewildering (Travis-CI, Circle-CI,
drone.io, just to start).
After trying a
couple of them it became apparent that most, if not all, of them
provide virtual machines based on Ubuntu 12.04 (aka Precise Pangolin).
A reasonable platform choice for most purposes, but an unfortunate one
for me. Most of my code uses C++11, which was poorly supported in
that version of Ubuntu. I also tend to use recent versions of the
boost libraries, and the GNU auto configuration tools.
I will probably discuss soon whether automake is a poor choice. But I
will defend the choice of libraries and compilers, not on any technical
basis, but simply because my hobby projects are supposed to be fun.
That usually involves not limiting myself to the well-proven, stable
platforms that I often argue for professionally.
With this in mind, the next posts will describe my failures (and,
hopefully, successes) trying to use hosted environments for a small
C++11 project. I will be writing them as I try different solutions,
so do not expect polished and well-reasoned conclusions soon. Instead,
join me in a journey of toil, suffering, failures, successes, and
ultimately discovery (gulp, I hope).