Day 29: Join your tables to make a sorted boxplot

Joining tables is an art and the benefits and consequences of joining are not always be clear. Here, I’ll describe a simple case that works well even for a beginner learning to use tables.

Joining for enhanced boxplots

Here, I’ll use the generic join to spruce up a boxplot and accentuate a pattern in the data. First, load up some data from MATLAB’s standard library.

load census1994

adultdata_subset = adultdata( find(adultdata.capital_gain>0),:);

I created a subset of the data that only contains entries where individuals reported any capital gain at all. Next, graph it sideways with some data limits imposed and clipping the outliers.

figure('color','w'); boxplot( adultdata_subset.capital_gain, adultdata_subset.race, 'datalim', [0,30000], 'extrememode', 'clip' )
ylabel('Capital gain ($)')

There’s clearly a disparity in the capital gain amount based on race, but we can really make this result appear if we sorted the boxes.

To improve the presentation of this data, let’s try to show the boxplot with the lowest medians near the top and increasing medians as you go down.

This requires five easy steps:

  1. Create a separate table newtable using varfun that will contain the median value for each race.
  2. Sort newtable by the median value, and store the sorting order in a separate column.
  3. Create a ranking of the sorting order by using the sort-twice trick.
  4. Innerjoin the original data table with newtable.
  5. Plot and add the sorted column of ‘race’ as x-tick labels.

Here’s the full process:

The join function basically distributes the values from ranking_table into adultdata_subset to matching entries of the “race” variable.

By appending the “rank” column, we are able to, in essence, create a new grouping for our data and MATLAB’s boxplot function leverages that to create an ordered display.

Often tables are joined which have different columns, unmatched rows, and other idiosyncrasies. Getting good at joining tables can take weeks to months and there are dozens of ways to use the additional innerjoin and outerjoin functions.



Neuroscientist/cell biologist/data scientist at Columbia University

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Jozsef Meszaros

Neuroscientist/cell biologist/data scientist at Columbia University