Day 24: Swap out MATLAB functions with your own NaN-friendly ones

Jozsef Meszaros
May 25, 2020


One of the problems with some of MATLAB’s built-in statistics functions is that they break when any of the input values are NaN. In this post I’ll show you how to build on MATLAB’s extremely useful NaN-friendly versions of sum, mean, and std (nansum, nanmean, and nanstd). This skill will allow you to replace NaN-unfriendly functions like corrcoef and zscore with your own custom functions that work in a variety of situations.
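If you don’t have nansum, nanmean, and nanstd on your path (they ship with the Statistics and Machine Learning Toolbox), recent MATLAB releases also accept an 'omitnan' flag on the plain sum, mean, and std. Here is a minimal sketch of the difference; the vector v is made up purely for illustration.

```matlab
v = [2 4 NaN 6];           % toy data with one missing value

m_plain = mean(v);             % NaN: a single NaN propagates through the plain call
m_omit  = mean(v, 'omitnan');  % 4: average of the three non-NaN values
s_omit  = sum(v, 'omitnan');   % 12
sd_omit = std(v, 'omitnan');   % 2: sample standard deviation of [2 4 6]
```

The rest of this post uses the nan* functions, but the 'omitnan' flag can be swapped in anywhere they appear.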

Make your own corrcoef robust to NaN values

For me, the most typical use of the correlation coefficient function is to find the correlation between two data sets: x and y. Here are my two gripes about MATLAB’s built-in corrcoef function:

  1. I want robustness to NaN values. If you try to calculate the correlation coefficient and there’s a NaN value in either your x or y, forget about it.
  2. I want one value. The correlation coefficient. I don’t want a matrix that tells me that x is perfectly correlated with itself, or that y is correlated with itself.

Let’s set up a simple data set and show off the problem.

rng(5);
x = [1:10]+rand(1,10);
y = [1:10]+rand(1,10);
x(end) = nan;
corrcoef( x,y )

Yuck. A bunch of NaNs.

Here’s the solution using a one-line anonymous function that takes both your x and y as inputs. Enter it once in the Command Window or include it in your script.

mycorrcoef = @(x,y) (1./(numel(x)-1)) .* nansum( (x-nanmean(x))/nanstd(x) .* (y-nanmean(y))/nanstd(y) );

Might look unwieldy, but it gets the job done. Note that this is only an approximation of the “real correlation coefficient”: the normalizing factor numel(x)-1 still counts the NaN entries, and the mean and standard deviation of y include values whose x partner is NaN. If you have many NaN values, I would recommend interpolating or figuring out what’s going on, rather than presenting this correlation coefficient in a statistical table.

Now enter mycorrcoef( x, y ) and you should get a value of 0.7949, which is what we would expect since we made our data to be roughly correlated.
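If you need the exact coefficient rather than an approximation, one option is to drop every (x, y) pair in which either value is NaN before computing anything. The sketch below is one way to do that with anonymous functions; the names ok, pearson, and mycorrcoef2 are my own placeholders, not built-ins. It then checks the result against corrcoef’s own 'Rows','complete' option, which discards incomplete rows.

```matlab
% Hypothetical helpers (names are illustrative, not MATLAB built-ins)
ok      = @(x,y) ~isnan(x) & ~isnan(y);                 % mask of complete pairs
pearson = @(a,b) sum( (a-mean(a)) .* (b-mean(b)) ) ...
                 ./ ( (numel(a)-1) .* std(a) .* std(b) ); % plain Pearson r
mycorrcoef2 = @(x,y) pearson( x(ok(x,y)), y(ok(x,y)) );  % r over complete pairs only

% Same data as in the example above
rng(5);
x = (1:10) + rand(1,10);
y = (1:10) + rand(1,10);
x(end) = NaN;

r = mycorrcoef2(x, y);                          % single value, NaN pair excluded
R = corrcoef(x(:), y(:), 'Rows', 'complete');   % built-in, still returns a 2-by-2 matrix
same = abs(r - R(1,2)) < 1e-10;                 % true: the two approaches agree
```

The built-in option solves the NaN gripe on its own; the anonymous function is only worth keeping if you also want a single number back instead of a matrix.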

Do the same thing for zscore

The built-in MATLAB function zscore suffers the same fate. Any NaN values will completely destroy your result. Here’s how to get around it:

myzscore = @(x) (x-nanmean(x))./nanstd(x);

Try it with the x from the data above. You’ll see that z-scores are calculated for all of the values except the last one, which went in as NaN and came out as NaN! No disruption to your workflow.
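To see the behavior on numbers you can check by hand, here is a small sketch that uses the 'omitnan' flag of the plain mean and std in place of nanmean and nanstd. The name myzscore2 and the input vector are made up for illustration, and the function is written with vectors in mind.

```matlab
% Hypothetical variant of the z-score helper, using the 'omitnan' flag
myzscore2 = @(x) (x - mean(x, 'omitnan')) ./ std(x, 'omitnan');

z = myzscore2([2 4 NaN 6]);   % [-1 0 NaN 1]: mean 4 and std 2 come from [2 4 6]
```

The NaN stays where it was, and every other value is scored against the mean and standard deviation of the non-NaN entries.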

One caveat: The anonymous functions shown here are intended to assist you in quickly getting a sense of correlations and outliers. If you are going to perform more advanced statistical analyses or you have a lot of NaN values, I highly recommend that you figure out where your NaNs are coming from and think of a strategy to safely exclude those observations.

Hope this will inspire you to use more NaN-friendly operations in your everyday work, and also quickly improve some of the other built-in MATLAB functions.

This story is a part of my series titled ‘30 Days of MATLAB tips I wish I had known doing graduate school in neuroscience’. Follow me here Neurojojo or on Twitter to stay updated with more tips.
