# A few MATLAB table tricks I had to learn the hard way

--

Tables are very versatile which means there isn’t a single document explaining the uses you might have for them, which is why I created this tutorial.

# In this tutorial, I will outline best practices for making a table, filtering a table, and displaying table contents.

In MATLAB, nearly every entry point for a thorough exploratory data analysis or machine learning application is a table. Sure, you can do these things by cobbling together matrices, arrays, cells, structures — but your code can get extremely long and indecipherable. Bad for replicability! Here are some tips for getting started with tables.

These are not strategies that MATLAB documentation will outline for you — they are very specific pathways through the table variable type which I have had to discover through many hours of painstaking trial and error. MATLAB tables are one of the most opaque concepts that the average MATLAB user will encounter: replete with non-intuitive conventions, sparsely documented properties or functions, and cryptic error messages. I hope this tutorial will help speed up your learning process.

# Four quick ways to make a table:

Here are four of the most common ways to generate a table: table( var1,var2,var3,etc…), array2table( matrix), readtable(‘file.csv’), and dot notation.

## 1) table( var1,var2,var3,etc… )

Let’s create some data in the form of x, y, and score.

Here’s how you would make it into a table using table():

If you do not specify a ‘VariableNames’ input to the table function, the columns will be named what you passed to the function, “x”, “y” and “score”.

## 2) array2table( matrix )

One of the most annoying things that you’ll encounter with tables (and a mistake you’ll surely make even as a TablePro™) is trying to just put a matrix into the table command above. MATLAB will allow it, but this is almost never what you want.

If you are trying to convert a matrix into a table, use array2table() instead.

This will make each column in your matrix into its own column in the table. Much better!

This is pretty self-explanatory and it’s compatible with Microsoft Excel (as long as you have Microsoft Excel also installed). I strongly recommend getting all your data in a cleaned-up CSV, which you can do by opening your Excel Spreadsheet in Google and exporting it. This is much easier than using an XLS file.

4) You can build a table using dot notation, with some caveats.

Generate an empty table:

`mytable = table()`

Now try to add data to it:

`mytable.Stuff = [1:3]'`

Cool! But once you have added your first column using dot notation, you have defined your table’s size and there’s no real going back!

Here’s what I mean: try adding a column of zeros.

`mytable.Zeros = 0`

No. This won’t work! Here is what would work:

`mytable.Zeros = repmat(0,size(mytable,1),1)`

What about a column of text?

# You will always, always want to enclose text in cells when adding it to a table.

`mytable.Country = {'Canada';'US';'Mexico'}`

Or

`mytable.TextColumn = repmat( {'To fill later'}, size(mytable,1), 1 )`

(For advanced users) You can use rowfun to make a new table from an existing table, but if your newly formed table is a column of text, you need to enclose every element in brackets:

`mytable.TextColumn = rowfun(@(text1,text2) ...   {sprintf('%s_%s',text1,text2)}, my_other_table )`

Bonus Tip: If you have a table and you want to get, set, or change its column headings, you have to do it like so:

`mytable.Properties.VariableNames = {'Col1','Col2','etc'}`

Another Bonus Tip: It’s no secret that every row in a matrix must have the same number of elements — the same is true for tables — so you cannot make a table like this:

`table( [[1:10];[1:5]] )`

This will give you a concatenation error. However, this will work and you should use it:

`table( [{1:10};{1:5}] )`

# Filtering a table based on various criteria

One of the most common operations you want to do in a table is filter it. Run the sample code below to get a table that contains some text, labels, integer data, and time series data.

I will go through and show how you would filter the table based on any of the columns.

CellNumber

Notice the code below — it doesn’t work — and you will get back a totally cryptic error talking about three or more subscripts or something like that:

`filteredtable = mytable( mytable.CellNumber==1 )`

You must always include a comma and semi-colon after your filtering condition.

`%You need to do: filteredtable = mytable( mytable.CellNumber==1, : )`

You might also want to try and filter a table based on whether the CellNumber is within a set that you’re interested in — here’s how using the ismember function:

`filteredtable = mytable( ismember(mytable.CellNumber,[1,3,4]),:)`

This is something that you would do if you wanted to isolate the effect of certain variables in your table.

Means

Similar story here. Let’s say you want rows with means greater than 0.5. You’ll use mytable( mytable.Means>0.5, : )

Now what about that NaN in the Means column? MATLAB does not make it straightforward to delete rows containing NaN. But no worries. Here’s how you’d do it:

`filteredtable = mytable( ~isnan(mytable.Means),: )`

Basically, you’re just finding the values where the Means column is not a NaN. This is not the same as replacing all NaN values in your table, which is a more complicated task and I recommend using a custom script or function.

Label

Let’s say you want to filter by text in a column. Seems easy, right? Again, the intuitive solution doesn’t work —

`filteredtable = mytable( mytable.Label=='Open',: )`

This is because you cannot filter on text columns. You can only filter on categorical variables. So enclose mytable.Label inside a categorical( ) type conversion.

`filteredtable = mytable( categorical(mytable.Label)=='Open',: )`

Or better yet, convert the whole column to categorical:

`mytable.Label = categorical( mytable.Label )filteredtable = mytable( mytable.Label=='Open',: )`

If you want to revert from categorical back to the cell string array you had before, you can use the cellstr() function:

`mytable.Label = cellstr( mytable.Label )`

You’d want to do this if you are planning to use contains — which only works with cell arrays and not categorical.

Date

Suppose you want to make a filtered table that contains only those rows of data that were collected in “Jan” (short for January). How to do it?

Here’s a slight exception to the rule above. If you use the MATLAB function contains, you can filter a cell array of character vectors. But this is slightly slower and it also works on patterns of text, so you should use it carefully.

Here’s how I would use it:

`filteredtable = mytable( contains(mytable.Date,'Jan'),: )`

Using contains, you can find all rows that match (even partially) any of a set of different items. So if you wanted everything in your table from Jan or Feb, you’d just do:

`filteredtable = mytable( contains(mytable.Date,{'Jan','Feb'}),: )`

Tseries

This is where the power of tables really comes through. You can filter your table based on data inside of a time series! Data that you cannot even see from the table. Here are a couple of ways you might choose to do that.

A combination of columns

There is no straightforward way to do this. You can try chaining together a bunch of conditional statements but this can get ugly with more than a couple columns. What I recommend instead is combining your columns into a new column using rowfun(), which will take data from multiple columns in a single row and allow you to work with them at once.

Two important things to notice here.

First, you always need to take the first element of “text” data when you wish to use it, thus lbl{1} and date{1}. Text data in your table is, after all, a cell string. (Unless you’ve turned it into a categorical — but we won’t go there right now).

Second, a more subtle point is that when you use rowfun, every variable you are placing into rowfun must match exactly with the table you are inputting. So for example,

`rowfun(@(cellnumber) sprintf('%i',cellnumber), mytable ) `

This won’t work, because mytable has multiple columns and the rowfun — as it is set up to only take ‘cellnumber’, will throw an error. You must index your table’s columns for rowfun, unless you plan to put every column into your rowfun, like so:

`rowfun(@(cellnum,means,lbl,date,tseries) somefxn(), mytable )`

But this is rarely what you’ll want, so you almost always need to specify certain columns of:

`rowfun(@(lbl,date) somefxn(lbl,date), mytable(:,{'Label','Date'} )`

Also notice that the variables going into rowfun are not verbatim the names of the columns — they could be — but they don’t have to be. However, they do need to be in order. So for example,

`rowfun(@(date,lbl) somefxn(lbl,date), mytable(:,{'Label','Date'} )`

Would give you a result that won’t be an error but it may not be what you intend, since you are sending in the Label column and calling it “date” in rowfun. Definitely takes some practice, but once you have run the code to create the CombinedLabel column, you can search your table based on a variety of strings, for example:

`mytable( contains(mytable.CombinedLabel,'Open'), : )mytable( contains(mytable.CombinedLabel,'Jan1'), : )mytable( contains(mytable.CombinedLabel,'Cell2'), : )`

There are also uses for this if you want to combine two tables using a common key, an advanced (and valuable) use for tables.

Bonus Tip: Find groups in your data using findgroups() function

While not specifically a table-related function, findgroups works really well with tables, especially when you want to do exploratory data analysis.

1. Imagesc (heatmap for time series)

There are several massive advantages to having your data in a table. Here’s one of line of code that makes me the most grateful I learned tables:

`figure(); imagesc( cell2mat( mytable.Tseries ) )`

Why? Because it allows you to view subsets of your data, filtered or transformed in any way that you choose.

Here’s one way I’d use this:

No towering for loops, no arbitrary variables clogging up the workspace, and all in a couple of lines — thanks to tables.

2. Plot() (lines)

If you’re a neuroscientist, you’ve surely admired those cool Joy Division looking plots. With tables, it’s easy to make your own using the cell number and time series columns.

3. Heatmap() (heatmap for data other than time series)

This is a pretty finnicky function but it has some concrete uses. Use the command below to get a heatmap that shows the average value of table elements that correspond to those x,y values.

`figure('color','w'); heatmap( mytable, 'Label', 'CellNumber', 'ColorVariable', 'Means' )`

Another use of heatmap is to place values at specific x,y locations.

Suppose you want to examine some spatial dependency of a phenomenon (for example, the average activity of neurons at a specific location). Here’s some code and what your figure might look like.

Let’s generate some artificial data for that.

Now try to get this figure using the tip from above:

Here’s how you would do it:

`heatmap( mytable2, 'x','y','colorvariable','value' )`

Bonus tip: There is no way to remove the text labels, unfortunately. Here’s a workaround that gets you a close approximation of what you see above:

To read about more advanced uses of tables, check out my post on varfun, another one about rowfun, as well as a post on how to join multiple tables. Stay tuned for more posts from me about how to fully harness the potential of tables!