Day 21: varfun is the most fun way to use tables in MATLAB

Jozsef Meszaros
May 23, 2020

The best way to tell a story about your data is to group it into relevant chunks. With these chunks, you can test hypotheses and look for patterns. Having an approach that allows you to dynamically change input variables and groupings is essential to doing this. Here I’ll show you how to develop that skill using varfun.

Do everything you want with varfun

This fun function is more powerful than any of the others so far. Like the others, it can apply a function to many variables (columns) in one batch. It can also split the rows of your table into groups that share the same values of one or more grouping variables, apply the function to each group, and collect the results in a single summary table. You could do all of this with hand-written loops, but I can say from experience: each use of varfun saves you at least ten lines of code, five dummy variables, and a handful of boolean operators. It’s also much easier to debug and see what’s going on.
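Here is a minimal sketch that makes the comparison concrete. It uses a small hypothetical table; the names T, team, and score are invented for illustration and are not part of census1994. The loop and the single varfun call should produce the same per-group averages:

```matlab
% Hypothetical example table (not part of census1994)
T = table(categorical({'a';'b';'a';'b';'a'}), [1;2;3;4;8], ...
    'VariableNames', {'team', 'score'});

% Manual split-apply-combine: loop over each group and average its rows
teams = unique(T.team);
manualMeans = zeros(numel(teams), 1);
for k = 1:numel(teams)
    inTeam = T.team == teams(k);          % logical mask for this group
    manualMeans(k) = mean(T.score(inTeam));
end

% varfun performs the same split-apply-combine in a single call
S = varfun(@(x) mean(x), T, 'GroupingVariables', 'team', 'InputVariables', 'score');
% S has one row per team, with the variables team, GroupCount, and Fun_score
```

The loop needs its own bookkeeping: a list of groups, a preallocated result, and a mask for each group. varfun handles all of that for you and also reports how many rows fell into each group in GroupCount.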

We’ll start with some preloaded MATLAB data.

load census1994

Let’s check out the variables in the table adultdata.

adultdata.Properties.VariableNames % (this is case-sensitive)

You should see a cell array of variable names, including age, race, occupation, and capital_gain, which we’ll use below.

Let’s try a few combinations.

“What is the average age of a census participant of each race?”

varfun( @(x) mean(x), adultdata, 'GroupingVariables', {'race'}, 'InputVariables', {'age'} )
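A note on the function handle: the wrapper @(x) mean(x) computes the same values as passing @mean directly, but varfun names the result column differently. The sketch below uses a hypothetical two-variable table; T, A, and B are illustrative names only.

```matlab
% Hypothetical mini-table that mimics two adultdata variables
T = table(categorical({'x';'y';'x'}), [20;50;40], 'VariableNames', {'race', 'age'});

% Anonymous wrapper, as in the command above: the result column is named Fun_age
A = varfun(@(x) mean(x), T, 'GroupingVariables', 'race', 'InputVariables', 'age');

% Direct handle to mean: same values, but the result column is named mean_age
B = varfun(@mean, T, 'GroupingVariables', 'race', 'InputVariables', 'age');
```

The wrapper is most useful when you need to pass extra arguments. For example, @(x) mean(x, 'omitnan') skips missing values.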

“What is the average capital gain (income from investments) of a census participant of each race?”

varfun( @(x) mean(x), adultdata, 'GroupingVariables', {'race'}, 'InputVariables', {'capital_gain'} )

“What is the average capital gain (income from investments) of census participants in different race-occupation groups?” This could hint at whether participants of different races who work in the same occupation differ in investment income, although group averages alone can’t tell us why.

varfun( @(x) mean(x), adultdata, 'GroupingVariables', {'race','occupation'}, 'InputVariables', {'capital_gain'} )

The output is very large because varfun returns one row for every race-occupation combination that appears in the data, and there are a lot of different occupations.
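Every row of a grouped varfun result also carries a GroupCount column, which holds the number of rows that went into that group. One way to tame a large summary is to drop groups with too few participants for their average to mean much. The sketch below shows this on hypothetical data. The threshold of 2 is an arbitrary illustration, not a recommendation.

```matlab
% Hypothetical data: occupation q has only one participant
T = table(categorical({'p';'p';'q';'r';'r';'r'}), [0;100;50;0;0;300], ...
    'VariableNames', {'occupation', 'capital_gain'});

S = varfun(@(x) mean(x), T, 'GroupingVariables', 'occupation', ...
    'InputVariables', 'capital_gain');

% Keep only groups with at least minCount rows (arbitrary example threshold)
minCount = 2;
bigGroups = S(S.GroupCount >= minCount, :);
```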

How can we leverage this information?

Use varfun to output a table

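As we saw with rowfun, you can control the format of the output with the 'OutputFormat' name-value argument. For table input, varfun already returns a table by default, so spelling it out mainly makes your intent explicit. Here’s how that would look: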

myoutput = varfun( @(x) mean(x), adultdata, 'GroupingVariables', {'race','occupation'}, 'InputVariables', {'capital_gain'}, 'OutputFormat', 'table' )

Now you can query your summary table, myoutput:

myoutput( myoutput.occupation == 'Adm-clerical', : )  % Remember to use the colon because you want to retrieve all of the columns from your table

You can also sort the rows of your table by a column of interest. Here we use the column ‘Fun_capital_gain’, which holds the averages that varfun calculated for us!

mysummary = myoutput( myoutput.occupation=='Adm-clerical', : )
sortrows( mysummary, 'Fun_capital_gain' )
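By default, sortrows puts the smallest averages first. To see the largest first, pass a direction as the third argument. Here is a sketch of the same filter-then-sort steps on a small hypothetical stand-in for adultdata. The names raw, groupMeans, clerical, and ranked are illustrative only.

```matlab
% Hypothetical raw data standing in for adultdata
raw = table( ...
    categorical({'Adm-clerical';'Adm-clerical';'Adm-clerical';'Sales'}), ...
    categorical({'A';'B';'B';'A'}), ...
    [100;0;600;50], ...
    'VariableNames', {'occupation', 'race', 'capital_gain'});

% Average capital gain per race-occupation group
groupMeans = varfun(@(x) mean(x), raw, 'GroupingVariables', {'race', 'occupation'}, ...
    'InputVariables', {'capital_gain'});

% Filter to one occupation, then sort largest-first
clerical = groupMeans(groupMeans.occupation == 'Adm-clerical', :);
ranked = sortrows(clerical, 'Fun_capital_gain', 'descend');
```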

Hope you found that informative.
