# Day 29: Join your tables to make a sorted boxplot

Joining tables is an art and the benefits and consequences of joining are not always be clear. Here, I’ll describe a simple case that works well even for a beginner learning to use tables.

# Joining for enhanced boxplots

Here, I’ll use the generic *join *to spruce up a boxplot and accentuate a pattern in the data. First, load up some data from MATLAB’s standard library.

`load census1994`

adultdata_subset = adultdata( find(adultdata.capital_gain>0),:);

I created a subset of the data that only contains entries where individuals reported any capital gain at all. Next, graph it sideways with some data limits imposed and clipping the outliers.

`figure('color','w'); boxplot( adultdata_subset.capital_gain, adultdata_subset.race, 'datalim', [0,30000], 'extrememode', 'clip' )`

ylabel('Capital gain ($)')

camroll(-90)

There’s clearly a disparity in the capital gain amount based on race, but we can really make this result appear if we sorted the boxes.

## Ordering a boxplot

To improve the presentation of this data, let’s try to show the boxplot with the lowest medians near the top and increasing medians as you go down.

This requires five easy steps:

- Create a separate table
**newtable**using*varfun*that will contain the median value for each race. - Sort
**newtable**by the median value, and store the sorting order in a separate column. - Create a ranking of the sorting order by using the sort-twice trick.
- Innerjoin the original data table with
**newtable**. - Plot and add the sorted column of ‘race’ as x-tick labels.

Here’s the full process:

The *join *function basically distributes the values from **ranking_table **into **adultdata_subset** to matching entries of the “race” variable.

By appending the “rank” column, we are able to, in essence, create a new grouping for our data and MATLAB’s *boxplot *function leverages that to create an ordered display.

## When things get complicated

Often tables are joined which have different columns, unmatched rows, and other idiosyncrasies. Getting good at joining tables can take weeks to months and there are dozens of ways to use the additional *innerjoin *and *outerjoin *functions.