Day 20: Rank your data and become a unique expert in MATLAB

When working with sets of values, we often want to sort the data, find its unique values, and rank it regardless of absolute value. MATLAB offers several ways to do this, but the documentation isn’t always the easiest to understand. Here I’ll diagram how the unique function can be used effectively. I’ll use colored boxes to represent numbers, since it’s more fun and gives us some more practice with arrayfun.

Let’s start with a sequence of random numbers.
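The original listing is not reproduced here, so this is a minimal sketch. It assumes the sequence is 20 integers between 1 and 10, drawn with randi after seeding with rng(5), the seed mentioned later in this post. The randi call and its arguments are my assumption.

```matlab
% Assumption: 20 random integers from 1 to 10, seeded for reproducibility.
rng(5);                       % seed mentioned later in this post
numbers = randi(10, 1, 20);   % 1-by-20 row vector of integers in 1..10
disp(numbers);
```

Seeding first means everyone who runs the snippet gets the same sequence, which makes the rest of the walkthrough easier to follow.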

We can use what we’ve learned in the days so far to generate a useful figure:

The first line of the plotting code creates the axes with the parameters that we want. The second is an arrayfun operation that plots a square at each x-axis location given by 'i', with y fixed at zero. The last argument to arrayfun is 1:20, so plot places a 'ksquare' (black-edged square) marker at (1,0), (2,0), (3,0), and so on up to (20,0). The face color of each square is set by colors( numbers(i), : ). Since numbers(1) is equal to 3, the face color of the first square will be colors(3,:), the third color in our 'jet' color palette.
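The description above can be sketched roughly as follows. The names numbers and colors come from the text. The axes limits, the marker size, and the way numbers is generated are my guesses, not the original code.

```matlab
% Assumed setup: regenerate the numbers and build a 10-color 'jet' palette.
rng(5);
numbers = randi(10, 1, 20);
colors  = jet(10);            % colors(k,:) is the RGB triple for value k

% Axes limits are guesses chosen to fit 20 boxes; 'NextPlot','add' acts like hold on.
axes('XLim', [0 21], 'YLim', [-1 1], 'NextPlot', 'add');
% One black-edged square per position i, filled with the color for numbers(i).
arrayfun(@(i) plot(i, 0, 'ksquare', ...
    'MarkerSize', 20, ...
    'MarkerFaceColor', colors(numbers(i), :)), 1:20);
```

Setting 'NextPlot' to 'add' when the axes are created keeps each new square from erasing the previous one.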

You should see something that looks like this:

Let’s go one step further and add the numbers to the boxes. Piece of cake with arrayfun. I’ll leave it up to you to code that part. It shouldn’t be too hard for you if you are comfortable using arrayfun, text, and sprintf.
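If you get stuck, here is one possible solution, offered as a sketch rather than the intended answer. It repeats the assumed setup from earlier so that it runs on its own. The alignment options are my choice.

```matlab
% Assumed setup, repeated so this snippet runs on its own.
rng(5);
numbers = randi(10, 1, 20);
colors  = jet(10);
axes('XLim', [0 21], 'YLim', [-1 1], 'NextPlot', 'add');
arrayfun(@(i) plot(i, 0, 'ksquare', 'MarkerSize', 20, ...
    'MarkerFaceColor', colors(numbers(i), :)), 1:20);

% Label each box with its value, centered on the square.
arrayfun(@(i) text(i, 0, sprintf('%d', numbers(i)), ...
    'HorizontalAlignment', 'center', ...
    'VerticalAlignment', 'middle'), 1:20);
```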

I’ll use these boxes to show how unique and sort can work for you.

Use unique to see unique values and where they are

First of all, you may already be familiar with unique:
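A minimal sketch of the basic call, assuming numbers was generated with randi(10, 1, 20) after rng(5). The variable name unique_vals is hypothetical.

```matlab
% Assumed generation of the sequence.
rng(5);
numbers = randi(10, 1, 20);

% With one input and one output, unique returns the distinct values
% sorted in ascending order.
unique_vals = unique(numbers);
disp(unique_vals);
```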

Here’s what happens, in graphic form:

You shouldn’t be surprised that we’re missing 4. If you used rng(5) to get the same random numbers as I have, you should also be missing 4.

From the output of the unique function, notice these two things:

  1. The numbers have become sorted: [1,2,3,5,6,7,8,9,10]
  2. We don’t know where the unique values are located in our original array.

We’ll do two things to get around these issues:

  1. Keep the numbers in their original order by calling unique( numbers, 'stable' )
  2. Get the locations of the unique values by requesting a second output argument

Here’s a graphic representation of the result:

Each entry of first_loc is the index in numbers at which the corresponding unique value first appears.
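The two steps above can be sketched as follows. The name first_loc comes from the text. The name unique_vals and the way numbers is generated are my assumptions.

```matlab
% Assumed generation of the sequence.
rng(5);
numbers = randi(10, 1, 20);

% 'stable' keeps the values in order of first appearance instead of sorting.
% The second output gives the index of each value's first occurrence.
[unique_vals, first_loc] = unique(numbers, 'stable');

% Show each unique value next to the position where it first appears.
disp([unique_vals(:), first_loc(:)]);
```

Indexing back with numbers(first_loc) reproduces unique_vals, which is a quick way to check that the locations are right.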

Now here is why you might want to do this

Sometimes your data spans a wide range of values, so local differences are hard to see unless you know exactly where to look and zoom in. Ranking your data and displaying the ranks gets you around this issue.

Unique is the way to get there. Create this data, which contains fifteen numbers:

First five entries: Integers from 1 to 5
Second five entries: Decimals between 200 and 201
Third five entries: Integers from 1 to 5
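One way to build data of this shape is sketched below. The post does not show how the values were chosen, so the random draws, the seed, and the variable name data are all my assumptions.

```matlab
% Hypothetical construction of the fifteen-number data set.
rng(5);
first_part  = randi(5, 1, 5);      % five integers from 1 to 5
middle_part = 200 + rand(1, 5);    % five decimals between 200 and 201
last_part   = randi(5, 1, 5);      % five integers from 1 to 5
data = [first_part, middle_part, last_part];
disp(data);
```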

The point of this fictional data is that it contains small differences that will be hard to see when you plot the data. Ranking it will allow you to overcome this issue.

This time, make sure you do not add the 'stable' flag that we used above: ranking relies on unique returning its values in sorted order.
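The ranking step can be sketched using the third output of unique, which gives each element's position in the sorted list of unique values. I am assuming this is the approach meant here, since the original listing is not shown. The data vector is a hand-written stand-in so the result is easy to check, and the names sorted_vals and ranks are hypothetical.

```matlab
% Hand-written stand-in for the fifteen-number data set.
data = [1 3 2 5 4, 200.1 200.7 200.3 200.9 200.5, 2 4 1 5 3];

% Without 'stable', sorted_vals is ascending, so the third output is a
% dense rank: equal values share a rank and no ranks are skipped.
[sorted_vals, ~, ranks] = unique(data);
ranks = ranks(:).';    % force a row vector to match data
disp(ranks);           % 1 3 2 5 4 6 9 7 10 8 2 4 1 5 3
```

Note that tied values share a rank here, which differs from ranking schemes that average the ranks of ties.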

Now we’ll plot the original data and the ranked data. Check out the difference!
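A minimal side-by-side plot might look like this. The stand-in data, the subplot layout, and the titles are my choices, not the original figure.

```matlab
% Hand-written stand-in data and its ranks (see the assumptions above).
data = [1 3 2 5 4, 200.1 200.7 200.3 200.9 200.5, 2 4 1 5 3];
[~, ~, ranks] = unique(data);

figure;
subplot(1, 2, 1);
plot(data, 'o-');      % small differences are squashed by the 200s
title('Original data');

subplot(1, 2, 2);
plot(ranks, 'o-');     % every step between values is now visible
title('Ranked data');
```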

It’s the same data, but by ranking it you can see details you wouldn’t have seen otherwise. I hope this helps you. And if you are moved by this, maybe it will get you thinking about using non-parametric statistics!

Neuroscientist and data scientist at Columbia University. On Twitter: @NeuroJoJo