Day 26: Bin your data and more with accumarray

When it comes to making a custom vector or matrix containing exactly the data you want, accumarray offers a fast and readable solution.

That’s it. That’s the post! Basically, accumarray throws the elements of a second array (colored squares) into the positions specified by the first array (white squares). By default, elements that land in the same position are summed. It is safest to pass both arrays as columns.

Here’s how it looks in MATLAB:

accumarray( [1;1;1;2;2;2], [10;1;35;11;3;15] )
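
To make the result concrete, here is the same call with the values written as a column, together with the output it produces. The variable names subs, vals and out are only for illustration.

```matlab
subs = [1;1;1;2;2;2];        % positions: which output element each value goes to
vals = [10;1;35;11;3;15];    % values to accumulate
out  = accumarray(subs, vals)
% out =
%     46     (10 + 1 + 35)
%     29     (11 + 3 + 15)
```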

Binning your data is easy with accumarray

Here is the most complicated thing about using accumarray: creating a vector that specifies the positions of your data. Intuitively, when you bin data, you want to place the first three numbers (x1,x2,x3) into the first position, so they are accumulated into positions [1;1;1]. The next three numbers (x4,x5,x6) are placed into the second position, so they are accumulated into positions [2;2;2]. But how do you avoid typing out [1;1;1;2;2;2]?

First, we’ll generate some random data:

rng(5);
mydata = rand(30,1); % column vector, to match the column of positions
N_to_bin = 3; % We want to put three numbers into each bin

Here’s a quick solution:

makevector = @(mydata,N_to_bin) reshape( repmat( [1:numel( mydata )/N_to_bin], N_to_bin, 1), numel(mydata), 1 )

This function takes your data and the number of items you plan to put into each bin, and generates the positions for the binning. The number of elements in mydata must be a multiple of N_to_bin! In other words, numel(mydata)/N_to_bin must be an integer.
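
If the reshape/repmat combination is hard to read, the same position vector can probably be built more directly with ceil. This is an alternative sketch rather than part of the recipe above; positions_ceil and positions_rep are names made up for the comparison.

```matlab
mydata   = rand(30,1);
N_to_bin = 3;

% Fail early if the data cannot be split into full bins
assert(mod(numel(mydata), N_to_bin) == 0, 'numel(mydata) must be a multiple of N_to_bin');

% Element k goes to bin ceil(k / N_to_bin): 1,1,1,2,2,2,...
positions_ceil = ceil((1:numel(mydata))' / N_to_bin);

% The reshape/repmat construction, written out for comparison
positions_rep = reshape(repmat(1:numel(mydata)/N_to_bin, N_to_bin, 1), numel(mydata), 1);

isequal(positions_ceil, positions_rep)   % true
```

Without the assert, the ceil version also tolerates a length that is not a multiple of N_to_bin: the last bin simply receives fewer elements.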

The easy part is the accumarray:

accumarray( makevector(mydata,N_to_bin), mydata );

Done! You can see exactly where your data is getting accumulated.
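
One way to convince yourself that the values end up where you expect is to compare against a plain reshape-and-sum. The sketch below repeats the setup so it runs on its own; check is a name introduced here.

```matlab
rng(5);
mydata   = rand(30,1);
N_to_bin = 3;
makevector = @(mydata,N_to_bin) reshape(repmat(1:numel(mydata)/N_to_bin, N_to_bin, 1), numel(mydata), 1);

% One sum per bin: 10-by-1 for 30 values in bins of 3
binned = accumarray(makevector(mydata, N_to_bin), mydata);

% Independent check: put each bin in its own column and sum down the columns
check = sum(reshape(mydata, N_to_bin, []), 1)';

max(abs(binned - check))   % zero, or floating-point rounding at most
```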

To really make hay using accumarray, you should venture into the functions that you can apply to each ‘bin’ of data before storing it! Pass a function handle as the fourth argument:

accumarray( makevector(mydata,N_to_bin), mydata, [], @mean );
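
The function handle is called once per bin with the values that landed in that bin. Here is a small sketch with hand-picked numbers, so the results are easy to verify by eye; the variable names are illustrative only.

```matlab
subs = [1;1;1;2;2;2];
vals = [10;1;35;11;3;15];

bin_mean  = accumarray(subs, vals, [], @mean)                   % [15.3333; 9.6667]
bin_max   = accumarray(subs, vals, [], @max)                    % [35; 15]
bin_range = accumarray(subs, vals, [], @(x) max(x) - min(x))    % [34; 12]
```

Whatever function you pass has to return a single value for each bin.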

The great thing is that you can use any anonymous function or even a function in a .m file, as long as it returns a single value per bin. You can even use it to average each bin while ignoring NaN values. Here’s an example:

mydata = rand(30,1);
mydata( randi(30,1,5) ) = nan;
output = accumarray( makevector(mydata,N_to_bin), mydata, [], @nanmean )
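
nanmean ships with the Statistics and Machine Learning Toolbox. If that toolbox is not available, an anonymous function around mean with the 'omitnan' flag (assuming a reasonably recent MATLAB release) should do the same job. A sketch with made-up numbers:

```matlab
subs = [1;1;1;2;2;2;3;3;3];
vals = [1;NaN;3;4;5;NaN;NaN;NaN;NaN];

% Average each bin while skipping NaN entries
out = accumarray(subs, vals, [], @(x) mean(x, 'omitnan'))
% out = [2; 4.5; NaN]   (a bin holding only NaNs stays NaN)
```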

[The third argument sets the size of the output. We leave it empty so that accumarray works out the size from the positions.]
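
To see what the size argument does, compare an empty size with an explicit one. This is a minimal sketch with made-up numbers; positions that receive no data are filled with zeros.

```matlab
subs = [1;1;3];
vals = [5;6;7];

a = accumarray(subs, vals)           % [11; 0; 7]        size taken from max(subs)
b = accumarray(subs, vals, [5 1])    % [11; 0; 7; 0; 0]  forced to 5-by-1
```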

2-D matrices are no problem for accumarray

And it works for 2-D matrices as well!
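
For a 2-D output, the first input becomes a two-column matrix in which each row is a (row, column) target. A tiny sketch with hand-typed subscripts, just to show the shape of the inputs:

```matlab
% Each row of subs is a (row, column) position in the output matrix
subs = [1 1;
        1 1;
        2 1;
        2 2;
        1 2;
        2 2];
vals = [10; 1; 35; 11; 3; 15];

A = accumarray(subs, vals)
% A =
%     11     3
%     35    26
```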

I’ll leave it up to you to generate the first input to these. Needless to say, there are tons of ways to imagine using accumarray, which should save you dozens of lines of code and hours of frustration. Enjoy!

Neuroscientist and data scientist at Columbia University. On Twitter: @NeuroJoJo