# Noel Welsh

Collected thoughts on startups, software, and machine learning.

# Why I Don't Like Akka Actors 04 Mar 2013

We recently rewrote Myna’s back-end service. The architecture changed dramatically, and is now both faster and easier to extend. One of the significant architectural changes was removing all Akka actors. After heavily using them in the first version of the back-end, I have come to prefer other methods of managing concurrency. Since Akka’s actors are so prominent within the Scala community I thought it might be of interest to describe why we made this change.

## Actors are Coarse Abstractions

Actors are presented as a universal primitive for concurrency. That is, in the orthodox actor world view they are all you need for any concurrent program. There is an appealing conceptual simplicity to this approach, and the idea of finding an uber-abstraction has been successful in other contexts. For example, Scala’s unification of Java’s primitive and object types, or Python representing all values as mutable dictionaries, are generally considered positive points by their respective language communities.

Problems arise when in the quest for simplicity important distinctions are hidden. In programming languages this usually comes up when discussing performance. The distinction between primitive and object types really matters if you care about speed. Scala gets away with this for the most part through clever compilation in both the Scala compiler and Hotspot, but writing high performance code can still be something of a dark art [1].

Concurrent programming involves at least three distinct concerns: concurrency, mutual exclusion, and synchronisation. With actors the first two always come combined, and you’re left to hand-roll your own synchronisation through custom message protocols. It’s an unhappy state of affairs if you want to do something as simple as separating control of concurrency and mutual exclusion. This is not an esoteric concern – it is exactly what a ConcurrentHashMap provides, for example. If you’re really seeking performance then you probably want to use lock-free algorithms. Again, these don’t fit into the actor model. Basically the actor model is forcing us to give up a lot of tools so we can fit within its rigid conception of a concurrent program.
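As a rough illustration of treating the three concerns separately, here is a minimal sketch. It is written in Python purely for brevity and is not Myna’s code; `record`, `hits`, and the page names are made up for the example. A thread pool controls concurrency, a lock provides mutual exclusion around one shared dictionary, and waiting on each future’s result is the synchronisation step.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

hits = {}            # shared state (hypothetical page-hit counter)
hits_lock = Lock()   # mutual exclusion: guards `hits` and nothing else


def record(page):
    """Count one hit; the lock is held only around the shared update."""
    with hits_lock:
        hits[page] = hits.get(page, 0) + 1
    return page


pages = ["/home", "/pricing", "/home", "/docs", "/home"] * 200

# Concurrency: the pool decides how many tasks may run at once.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(record, page) for page in pages]
    # Synchronisation: result() blocks until that particular task is done.
    done = [f.result() for f in futures]

print(hits["/home"], hits["/pricing"], hits["/docs"])  # 600 200 200
```

Each of the three pieces can be changed without touching the others: a bigger pool, a finer-grained lock, or a different way of waiting for results.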

## Actors do not Compose

Composition is a desirable property of abstractions. Functions compose. If I create some functions (say, plus and minus) you can create another function (say multiply) that uses my functions. In particular I don’t have to anticipate your usage ahead of time to allow you to use my functions.

Actors don’t compose. By default actors hard-code the receiver of any messages they send. If I create an actor A that sends a message to actor B, and you want to change the receiver to actor C you are basically out of luck. If you’re lucky I anticipated this in advance and made it configurable, but more likely you have to change the source. Lack of composition makes it difficult to create big systems out of small ones.
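A toy sketch of the difference, in Python for brevity. These are plain functions standing in for actors, not Akka code, and every name here is made up for the illustration.

```python
def plus(a, b):
    return a + b


def minus(a, b):
    return a - b


def multiply(a, b):
    """Built purely from plus and minus; b is assumed to be an integer."""
    total = 0
    for _ in range(abs(b)):
        total = plus(total, a)
    return total if b >= 0 else minus(0, total)


inbox_b, inbox_c = [], []   # stand-ins for the mailboxes of actors B and C


def hard_wired_doubler(x):
    # The receiver is fixed in the source: sending to C means editing this.
    inbox_b.append(multiply(x, 2))


def configurable_doubler(x, send):
    # Reusable with any receiver, but only because the author planned for it.
    send(multiply(x, 2))


print(multiply(3, 4))                         # 12
hard_wired_doubler(5)                         # always lands in inbox_b
configurable_doubler(5, inbox_c.append)       # caller picks the receiver
print(inbox_b, inbox_c)                       # [10] [10]
```

`multiply` needed no cooperation from the author of `plus` and `minus`, whereas redirecting `hard_wired_doubler` is only possible by changing its source.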

## Akka’s Actors are not Usefully Typed

Akka’s actors give you static typing within a single actor, but the communication between actors – the complex bit that is most likely to go wrong – is not typed in any useful manner. I could live with the above two issues, but this one really gets me.

The type system is the reason we use Scala. Types allow us to guarantee certain properties of our programs. If you’ve never used a modern statically typed programming language you might be surprised just how far you can push this. We try to push it reasonably far, so we can guarantee that, for example, Myna’s API generates useful error messages (this is important because the API is the UI for many users). In return for this awesome power we put up with a bit of extra complexity compared to a dynamically typed language.

Akka supports a number of features, such as become and transparent distribution, that make statically typing messages difficult. We still have some inconvenience over dynamically typed languages but we lose the benefits of static typing. This is the wrong tradeoff for me.

Other languages, like Concurrent ML and Haskell, have demonstrated it’s possible to have great concurrent and distributed programming abstractions in a statically typed language. I expect the same in Scala.

## So What Does Myna Use?

So given the above, what does Myna use? We use Akka’s Futures, which I think are fantastic. We use plain-old locks for some simple cases where we want mutual exclusion, and we use a few of the utilities in java.util.concurrent. It’s quite simple and it’s quite fast: 2.5ms average response time, and well over 650 requests/s on a single core machine.

1. Python basically does nothing about this issue, which is why it’s so slow. In fact the decision to make everything a mutable dictionary is one big reason it’s so hard to optimise Python. PyPy, a JIT compiler for Python, has consumed 10 years and several million currency units and is still not widely deployed.

# Pre-register for the Streaming Algorithms Course 04 Mar 2013

I have given a number of talks on streaming algorithms and had requests for more depth on the material. I would like to expand my talks into a course, but in a true data-driven way I first want to gauge interest. Hence I’m asking interested people to pre-register now (no monetary commitment required) so I know if it’s worthwhile going ahead.

Go ahead and pre-register now or read on for more on the course.

## Why?

I believe streaming algorithms are a wildly underappreciated technology for data analysis.

The defining characteristic of a streaming algorithm is that it only processes a data point once. You can run them on data stored on disk, but more commonly you just fire data at the algorithm as it arrives.
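To make the "process each data point once" idea concrete, here is a minimal sketch of a running mean and variance using Welford’s online method. The class name and the sample values are assumptions for illustration; the state is three numbers no matter how much data passes through.

```python
class RunningStats:
    """Mean and variance of a stream, seeing each value exactly once."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two points have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for response_ms in [2, 4, 4, 4, 5, 5, 7, 9]:   # made-up response times
    stats.update(response_ms)                   # fire data at it as it arrives

print(stats.n, stats.mean, stats.variance)
```

Nothing is stored after `update` returns, so the same code works whether the values come from a file on disk or from live traffic.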

Streaming algorithms are real-time by definition. Real-time is a very nice property to have. Obviously, it lets you know right away what’s going on in your system, which can be important in certain situations. An underappreciated benefit is you don’t have any task switching. Just like long compile times lead to distracted developers, if queries take a long time to run you’ll end up reading your email for half an hour before you realise it.

Streaming algorithms also tend to be ridiculously scalable. For example, we’ll look at algorithms that use only 4K to count the number of distinct items in a set with 10^9 elements. Scalability is great even if you don’t have a tidal wave of data, because when things fit into a single machine they are so much easier to develop and maintain.
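One well-known algorithm of this kind is HyperLogLog. The sketch below is an illustration under assumptions rather than the course material: it uses 4096 one-byte registers (4K of state), SHA-1 as a convenient 64-bit hash, and the standard small-range correction. With this many registers the typical relative error is roughly 1.04/√4096, or about 1.6%.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter with 2**p one-byte registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                      # 4096 registers when p = 12
        self.registers = bytearray(self.m)   # one byte each, so 4 KB in total

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = h >> (64 - self.p)                   # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1 bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Few items seen: fall back to linear counting on empty registers.
            return m * math.log(m / zeros)
        return raw


hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i % 20_000}")   # 20,000 distinct users, each seen 5 times

print(round(hll.count()))           # close to 20000; exact value depends on the hash
```

The memory used is fixed by the number of registers, not by the number of items, which is why the same 4K works for a stream of a billion elements.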

Finally, streaming algorithms are also easy to implement (and you can often find implementations online). They are the kind of thing you can knock up in an afternoon. Then, because of their awesome scalability, just wrap a HTTP front-end around it and you have yourself an analytics machine. You’ll spend two days instead of two months building a system, which is really awesome.

## Course Content

The course will run in two parts. The first part will cover methods for processing streams of numbers. These are mainly used in system monitoring scenarios. We’ll start by looking at various ways of calculating moving averages, useful for calculating hits per second over a time window and so on. Then we’ll move onto quantiles, which you’ll typically want to use for calculating 99% response time etc. Finally we will look at methods for constructing histograms from streaming data, useful if you want to get a closer look at your response time distribution, for example.
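As a taste of the first topic, here is a minimal sketch of one kind of moving average, the exponentially weighted moving average. It is only one of several possible approaches, and the class name, `alpha` value, and sample numbers are assumptions for illustration.

```python
class Ewma:
    """Exponentially weighted moving average held in a single number."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha   # larger alpha gives more weight to recent data
        self.value = None    # no estimate until the first observation

    def update(self, x):
        if self.value is None:
            self.value = float(x)
        else:
            # Move the estimate a fraction alpha of the way towards x.
            self.value += self.alpha * (x - self.value)
        return self.value


ewma = Ewma(alpha=0.5)
smoothed = [ewma.update(hits) for hits in [10, 20, 30]]   # made-up hits per second
print(smoothed)  # [10.0, 15.0, 22.5]
```

A smaller `alpha` smooths more aggressively, which behaves like averaging over a longer time window.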

The second part of the course will focus on methods for sets and multi-sets. These are better suited to answering business questions. We’ll start with ways to find the most frequent items in a stream. You can use this to find who your most active users are, or to, say, filter out attackers by IP address. We will then look at methods to calculate the number of distinct items in a set. This is super useful for general analytics. Say you’re running a website. You can answer a lot of questions if you can count the number of people who visit different parts of your site and count the number of people who arrive from different sources. This is exactly what these methods do. Even better, they also support set algebra, so you can find how many users are in the intersection, union, and set difference of the various sets you’re counting. As these methods are super scalable you can count thousands of different sets on a single box, which can get you a very powerful and flexible system.
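For the most-frequent-items problem, one classic approach is the Misra-Gries summary. The sketch below assumes that algorithm purely for illustration, and the IP addresses are made up. With `k - 1` counters, any item that makes up more than 1/k of the stream is guaranteed to survive, and each reported count is too low by at most n/k.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter, dropping any at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


requests = (
    ["10.0.0.1"] * 60                        # one very busy address
    + ["10.0.0.2"] * 25
    + [f"10.0.1.{i}" for i in range(15)]     # a tail of one-off visitors
)

candidates = misra_gries(requests, k=4)
print(candidates)   # "10.0.0.1" is always present; its count is an underestimate
```

The counts are lower bounds, so a second pass (or a separate exact counter for the few candidates) is needed if you want exact frequencies.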

The course will be very practical. Although we’ll be focused mostly on algorithms, and thus be programming language agnostic, we’ll talk implementation details and I’ll give example code (probably in Scala, Python, or Javascript). You can use whatever language you want. So long as you can perform bit manipulation you’ll be alright.

## Course Options

I’d like to offer two options for the course: in person, and online. The former will run in London over a day and cost less than £500 per person. The online course will be self paced and cost less than £100.

Once again, if you’re interested tell me by signing up and I’ll let you know if and when the course goes ahead.

# Fitness for Busy People I 06 Jan 2013

This post is the first in a series about attaining or maintaining a good level of fitness without using a lot of time and money. It comes from my own experience. After 20 years of general gym rattery, my regular workouts were brought to a halt by the arrival of two children and the decision to found a company. Suddenly, finding the time to get to the gym was difficult, and it was hard to justify the expense given my increasing outgoings and meagre income. I wanted to maintain (or even improve) my fitness, and so far I’ve been able to. If you’re in a similar situation, where you don’t have the time to get to a gym and can’t equip your home with a lot of workout equipment, I think my methods will work for you.

Nothing here is new. All I’ve done is collect ideas from a variety of sources, and mix them together into something that works for me. There is no magic to working out, and there never will be.

I don’t want to make this series too large by covering material already freely available elsewhere, so where possible I link to other sites. At the end of the series I’ll give some resources for more in-depth reading.

## Goals

My primary goal is to teach you how to work out effectively on limited time and money. I will mostly focus on movements that use your own body weight, perhaps augmented with a few inexpensive accessories. If you can access free weights they will enhance your workouts but I’m not going to discuss them much.

In my definition of fitness there are three main attributes: strength, mobility, and endurance.

It is common to equate fitness solely with endurance: if you can run 10 miles to my 5, you’re fitter than me. I don’t follow this belief. In fact I believe strength is the primary attribute that should be developed.

Strong people look better naked. Muscle also makes weight maintenance easier as muscle uses more energy than fat. Vanity aside, strength training has numerous benefits. Many of the effects of aging, such as muscle atrophy and decreasing bone density, can be countered by strength training and it’s this that will keep you out of a nursing home. As strength is associated with longer life you can look forward to more great years as you get stronger.

Attaining the endurance required for day-to-day life and most team sports is relatively easy, and this level of endurance also brings all the health benefits associated with a “healthy heart”. Participating in pure endurance sports like marathons or long-distance cycling requires time consuming training, harms other aspects of fitness, and can be a costly endeavour. For these reasons I’m not covering them here.

Mobility, the third attribute, is the ability to move your joints through a full range of motion. Developing mobility is essential for avoiding injury – if you can’t put your body in the correct positions you will place load where it shouldn’t be placed and eventually develop some kind of problem.

## Overview

Here are the essentials of what I suggest:

• Bodyweight movements will be the mainstay of your workouts. These require no special equipment except possibly an inexpensive pull-up bar or gymnastics rings. They are time efficient as they work large amounts of muscle mass at once. They can be scaled down to suit the most basic beginner or up to tax the Olympic contender.
• At a minimum, you can fit most of your training into short bursts done during the day, supplemented with two to four twenty minute sessions for more intense activity.
• Eat a sensible diet.
• Play with movement. Enjoy using your body. If you have kids, throw them into the air (just catch them on the way down, ok?)
• Enjoy the journey. It will last a lifetime.

## Movements

Movement is all that exercise really is. Here we’ll look at the basic movement patterns, some specific movements, and some useful equipment.

### Basic Movements

I divide movements into three pairs [1]:

• Vertical upper body pull and push.
• Horizontal upper body pull and push.
• Lower body pull and push.

A push primarily uses the muscles on the front of your body. A pull primarily uses the muscles on the back of your body.

A good movement is one that uses a lot of muscle mass and moves the muscles through a full range of motion. Here are some of the movements I recommend:

• The pull-up and the handstand push-up are the classic bodyweight vertical pull and push movements respectively.
• You probably know the push-up, which is a fundamental horizontal push. The planche is another horizontal push that will give you a greater challenge. The front lever is the corresponding pull, with the body row providing a basic entry point if the front lever progressions are too hard.
• The squat and deadlift are the paradigmatic lower body push and pull respectively. Bodyweight movements tend to be less effective for lower body strength development than their barbell cousins, but there are still some great options. One-legged squats are a good push exercise, as is jumping. Sprinting is an excellent lower body pulling exercise, as are the shrimp squat and the bodyweight hamstring curl (requires a bit of equipment).

It’s also worthwhile developing a good handstand and l-seat. Past a certain level they won’t develop strength but they are fun skills and will lead into more advanced movements.

### Scaling

To progress we need ways to increase (or decrease!) the difficulty of movements. Some of the links above show progressions. Where progressions aren’t given there are some general principles you can use.

Increasing leverage is one method for making a bodyweight movement harder. For example, the increase in difficulty from a tucked to a full front lever is accomplished by increasing the leverage applied by the body against the active muscles. You can apply the same trick to many movements. For example, you can make push-ups easier by raising your hands relative to your body (e.g. put them on stairs) and make them harder by raising your feet relative to your hands.

Removing a limb doesn’t require drastic surgery. Simply using one hand or leg where you’d use two will make an exercise much harder. Consider, for example, the difference between a one-handed and a normal pull-up! You can find intermediate points by, say, raising or lowering one hand relative to the other. For example, if you do push-ups with one hand on a box you will make the exercise harder. The same applies to pull-ups, where you can use a towel or belt to bring one hand lower than the other, providing an intermediate step between a normal pull-up and a one-handed pull-up.

You can also combine these ideas. For example, you can progress to full one arm push-ups by starting with one handed push-ups on three or four steps and working your way to the ground as your strength increases.

### Equipment

There are a few inexpensive pieces of equipment that will greatly assist your workouts. If you prefer, you can make most equipment yourself; see Ross Training for instructions.

Gymnastics rings are one of the most versatile and challenging pieces of equipment for training your upper body. Push-ups, pull-ups, dips and the like can all be done on the rings for added difficulty. Moves like 360 pulls can only be done on the rings. A partial list of moves can be found here.

The only disadvantage of rings is that they require a lot of space to hang. Most people won’t be able to hang them indoors, but a sturdy tree branch or piece of play equipment will do the job.

A pull-up bar is the next best thing to rings, and has the advantage you can use it indoors during bad weather. The best type are those that clip under a doorway like the Iron Gym model.

Dumbbells are easy to store, and you can get them second-hand quite cheaply. In addition to all the classic movements you’ll find on a site like ExRx.net you can put the weights into a strong backpack to increase the difficulty of push-ups, pull-ups, and so on.

Children can hardly be considered inexpensive, but if you got ‘em you might as well use ‘em. Doing push-ups with children on your back is entertaining for both you and them. Children also get you a free pass into any playground, where you can use the equipment to the appreciation of all present.

## Tune In Next Time!

That’s it for now. Next time we’ll look at combining movements into effective workouts.

1. This classification doesn’t work for all movements, but it’s good enough.

I believe LinkedIn’s business model is ready for disruption. The value it provides is limited to a small set of users, and is created by artificial restrictions on use of user contributed data. There are viable alternatives such as Vizify and G+ that offer more value to the majority of users.

For most of my career I have made little use of LinkedIn. Like most people, I created a profile because I felt I ought to. Adding contacts or joining forums hasn’t led to any benefit for me. Networking is virtually non-existent on LinkedIn, probably because, by reducing people to their CVs, LinkedIn has created such a formal environment that no-one wants to engage in socialising.

In the last few months I’ve been involved in business development and sales at Myna, which has given me a different appreciation of LinkedIn. A small number of people use LinkedIn as their professional profile, in the way that Facebook is a personal profile, and so prefer to communicate over LinkedIn. More useful for me, if I have identified a company as a potential customer I can use LinkedIn to quickly find the most appropriate person to approach within that company.

LinkedIn won’t allow me to contact people outside of my network without paying. LinkedIn will graciously allow me to send an InMail for a small fee, or alternatively I can buy a Premium account which gives me InMail and a few other things [1]. In this way LinkedIn is a kind of freemium offering, where monetisation is based on restricting access to information that users have freely provided.

Now we’ve overviewed LinkedIn’s model, here are my reasons for believing that it is a prime target for disruption.

Firstly, as explained above I don’t believe it offers much value to the majority of its users. It would be easy for an alternative service to draw them away, and there are some great alternatives. One I particularly like is Vizify [2], which gives you a professional profile which also looks great and allows you to express some humanity.

If you want to network, G+ is the professional social network. I don’t think this was Google’s intention, but G+’s communities, introduced a few weeks ago, are booming. For example, the Machine learning community is getting about a post a day with only a few thousand members.

There isn’t currently a great alternative for doing the by-company searches that LinkedIn allows, but remember that LinkedIn relies on its users to generate that data for them. If people stop using LinkedIn then the value of the data will diminish, and it would be relatively simple for Vizify, G+, or some other company to provide this service. In the short term, once you have a name from LinkedIn it is easy to find an email address or Twitter account through the usual means.

In summary, LinkedIn’s freemium model is under threat from services that offer more value to the users who provide the data that LinkedIn monetises. Without the continued buy-in from these users I think LinkedIn is dead. I see LinkedIn making some attempts to turn itself into a more social network, but so far these do not seem successful to me.

1. Recruitment is where LinkedIn really makes its dough. If you’re a recruiter it is de rigueur to sign up for some kind of premium account. You pay several thousand per person per year for the top end products.

2. Vizify is a customer of Myna, but I think their offering is great regardless.

# Streaming Algorithms, Scala eXchange Edition. Or, Stop Analysing and Start Acting 22 Nov 2012

On Monday I delivered a talk on streaming algorithms at Scala eXchange 2012. Skillsmatter are super-fast at getting video online, so you can view it already! My slides are also online.

Compared to previous talks I spent much more time on motivation. Analytics are only part of the build-measure-learn loop (or, if you prefer, the scientific method) and I wanted to put them in context, and, to be honest, motivate people to look beyond analytics. The focus of the big data community still seems to be on collecting data and performing elementary analyses on it. This misses the point that if your data doesn’t lead to action there is no value in collecting it. Furthermore, optimising your speed through the build-measure-learn loop can be a huge win. The best way to speed up a process is to automate it and, as we’re showing with Myna, this is entirely viable. I truly believe that if data scientists are to realise their true value they need to reposition themselves from a stage in the feedback loop to a critical component overseeing and optimising the entire loop.