“Out of memory” in MATLAB? Easily cut memory use in half with this tip

I’ll demonstrate how simply changing the precision of your numeric data can cut your memory usage by half or even more.

Photo by Possessed Photography on Unsplash

Have you ever run into a memory issue in MATLAB and gone looking online?

Google Search result for what to do if you run out of memory.

Turns out #7 (the precision trick this post is about) is the simplest solution in many cases and can cut your memory use by close to 90%. This method comes with the added benefit that if you save your variables, they will also take up less hard drive space and load much more quickly.

Let’s dive right in to understand how we can reduce memory usage (the concepts apply to other programming environments, but I’ll focus on MATLAB):

Critical commands discussed in this post:

whos single double int16 format

When it comes to numerical data, you can specify the type using single or double but people almost never do! What does that mean for your data? Let’s take a look with some random examples:

mydata = rand(1000,1000); % Do not omit this semi-colon!
whos mydata

You’ll see a table that says your variable mydata has a value of Bytes equal to: 8 000 000. That’s 8 megabytes. Stored in your memory.

Next to bytes, it will say “double”. MATLAB will, by default, create a very wasteful double precision variable. Try this:

mydata = single( rand(1000,1000) ); % Do not omit this semi-colon!
whos mydata
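If you would rather read the numbers programmatically than from the table, whos can return its output as a struct. Here is a minimal sketch; the variable names a_double, a_single and b_single are only for illustration:

```matlab
% Compare the memory footprint of double and single copies of the same data.
a_double = rand(1000,1000);          % default type: double, 8 bytes per element
a_single = single(a_double);         % converted copy: 4 bytes per element

info_double = whos('a_double');      % with an output argument, whos returns a struct
info_single = whos('a_single');

fprintf('double: %d bytes\n', info_double.bytes);
fprintf('single: %d bytes\n', info_single.bytes);

% rand can also create single data directly, so the 8 MB double
% temporary is never allocated in the first place.
b_single = rand(1000,1000,'single');
disp(class(b_single))
```

The single copy should report 4 000 000 bytes, half of the double.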

What do we lose by going from double to single?

format long % Show all digits, otherwise both results display as 0.5000
mydata_double = rand(1000,1000);
mydata_single = single( mydata_double );
mean(mydata_double,'all') % No semi-colons here
mean(mydata_single,'all')

You’ll see that the result for the double precision variable is “0.500049032047244”, whereas the result for the single precision variable is “0.5000491”. Single precision keeps about 7 significant decimal digits, versus about 15 to 16 for double. If a difference in the seventh digit (think 5,000,490 versus 5,000,491) would have a substantial effect on your results, then do not convert to single.
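A rough way to put a number on that loss is eps, which returns the spacing between 1 and the next larger representable value for a given type. The sketch below (variable names are illustrative) also measures the worst relative error introduced by the conversion itself:

```matlab
% Spacing of representable numbers near 1 for each type.
eps_double = eps('double');          % 2^-52, about 2.2e-16
eps_single = eps('single');          % 2^-23, about 1.2e-7

x_double = rand(1000,1000) + 1;      % values between 1 and 2, so we never divide by tiny numbers
x_single = single(x_double);

% Relative error of each element after the trip through single.
rel_err = abs(double(x_single) - x_double) ./ x_double;
worst   = max(rel_err,[],'all');

fprintf('eps(double) = %g, eps(single) = %g\n', eps_double, eps_single);
fprintf('worst relative error after conversion: %g\n', worst);
```

Because the conversion rounds to the nearest single, the worst relative error should come out at no more than half of eps('single').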

Can we do even better with integers?

Yes, with caveats. If you are happy with a 50% saving, you can stop reading here. If you’re willing to go the extra mile, read on, but only switch to the integer formats int8 or int16 after you have tested your results with single and then with the integer type, and confirmed they’re comparable. Working with integers is complicated but can be worth it in some cases.

  • Caveat 1: Data should either already be integers in double precision format, or round-able to integers without much loss of information (e.g. 1.001, 2.0045, etc), or you should be prepared to transform your data back and forth.

To see how this caveat can hurt you if you aren’t careful — try this:

mydata = rand(10,10);
mydata = int8( mydata );
mydata

❌ You’ll get a bunch of 0’s and 1’s, because your data has been rounded.

How can we avoid this? You can multiply each number by 10, 100, or 1000 before converting, and then divide by the same factor later if necessary.

% Scale up the data and scale it back down
to_integer = @(x) x*1000;
to_decimal = @(x) double(x)/1000;
% Convert to integer (values up to 1000 fit comfortably in int16)
mydata = int16( to_integer( rand(10,10) ) );
% And back to double!
mydata = to_decimal( mydata );
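To see what the round trip costs, you can compare the recovered values with the originals. In this sketch (the names scale, original, stored and recovered are mine), the worst-case error should be at most half a step, that is 0.5/scale, because int16 rounds to the nearest integer:

```matlab
scale     = 1000;                          % illustrative scale factor
original  = rand(10,10);                   % doubles between 0 and 1
stored    = int16(original * scale);       % rounded to the nearest integer, 2 bytes each
recovered = double(stored) / scale;        % back to double

max_err = max(abs(recovered - original),[],'all');
fprintf('largest round-trip error: %g (bound: %g)\n', max_err, 0.5/scale);
```

A larger scale factor shrinks the error but also shrinks the range of values you can store, which is where the next caveat comes in.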

  • Caveat 2: Your integers have a fixed minimum and maximum value that you must not exceed (any value beyond the range is clipped to the nearest limit, without warning).

If you are converting using int16, the range is -2¹⁵ to 2¹⁵-1, so you can store whole numbers between -32768 and +32767. With a scale factor of 1000 that corresponds to decimals between -32.768 and +32.767, and with a scale factor of 10000 (the most precision in your decimals) to values between -3.2768 and +3.2767. See how that works?
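Rather than memorizing the limits, you can ask MATLAB for them with intmin and intmax. The sketch below also shows what happens when a value does not fit: it is clipped (saturated) to the nearest limit instead of wrapping around or raising an error. The variable names are illustrative.

```matlab
lo = intmin('int16');        % -32768
hi = intmax('int16');        %  32767
fprintf('int16 range: %d to %d\n', lo, hi);

% Out-of-range values saturate at the limits.
too_big   = int16(40000);    % becomes 32767
too_small = int16(-40000);   % becomes -32768
disp([too_big, too_small])

% With a scale factor of 1000, the largest storable decimal is therefore:
largest_decimal = double(hi) / 1000;
disp(largest_decimal)
```

Checking min and max of your scaled data against these limits before converting is the simplest guard against silent clipping.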

What is the maximum value of an int8 matrix? Scroll to the bottom for the answer.

  • Caveat 3: You’ll need to convert to double or single if you want to do certain kinds of processing on your matrix. For example, you’ll be able to take the mean of an integer matrix directly, but functions such as std (standard deviation) and inv (matrix inverse) expect single or double input.
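A sketch of the convert-before-processing pattern, using a small made-up matrix. As far as I know, mean accepts integer input and returns a double by default, while inv needs single or double:

```matlab
A = int16([4 7; 2 6]);            % small invertible example matrix (made up)

m = mean(A,'all');                % works on integer input; the result is a double

% Convert to double first, then invert.
A_inv = inv(double(A));
disp(m)
disp(A_inv * double(A))           % approximately the 2x2 identity
```

You only pay the memory cost of the double copy while the computation runs, so the stored data can stay in its compact integer form.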

Let’s check out the savings from each type of data

We’ll try with the assumption that our scaled data fits within the int8 range and that we can tolerate losing some decimals (you should always check this works for your data).

to_integer = @(x) ( x*100 );
to_decimal = @(x) ( double(x) /100 );
mydata = rand(1000,1000); % 8 megabyte double
mydata_to_integer = to_integer(mydata); % Still 8 megabytes (double)
mydata_to_integer = single( mydata_to_integer ); % Now 4 megabytes
% Check that the min and max are between -2^15 and 2^15-1 (the int16 range)
[ min(mydata_to_integer,[],'all'),max(mydata_to_integer,[],'all') ]

mydata_to_integer = int16( mydata_to_integer ); % Now 2 megabytes
% Since the max is 100, which fits in int8, we can go down to int8
mydata_to_integer = int8( mydata_to_integer ); % 1 megabyte!
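A compact way to see all four footprints side by side. This sketch assumes data between 0 and 1 scaled by 100, as above; the struct s and the array sizes are hypothetical names introduced here:

```matlab
raw = rand(1000,1000) * 100;          % scaled data, values between 0 and 100

s.as_double = raw;                    % 8 bytes per element
s.as_single = single(raw);            % 4 bytes per element
s.as_int16  = int16(raw);             % 2 bytes per element
s.as_int8   = int8(raw);              % 1 byte per element (values up to 100 fit in int8)

names = fieldnames(s);
sizes = zeros(1, numel(names));
for k = 1:numel(names)
    v = s.(names{k});                 % pull out one version at a time
    info = whos('v');                 % struct output of whos, as a size probe
    sizes(k) = info.bytes;
    fprintf('%-10s %8d bytes\n', names{k}, sizes(k));
end
```

Each step down should halve the byte count, from 8 000 000 for double to 1 000 000 for int8.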

Now let’s try to recover our original data and see how accurate it is by computing the mean:

format long
mean( mydata, 'all' ) % Original data
to_decimal( mean( mydata_to_integer,'all' ) ) % Converted back data

0.500068458752177 versus 0.500070540000000. Let’s compute the standard deviation as well.

std(mydata,[],'all')
std( to_decimal( double( mydata_to_integer ) ),[],'all' )

0.288641641801215 versus 0.288675051151625 — we can live with it.

Conclusion

To save memory (RAM and hard drive storage), you should use single(yourvariable). This is the easiest and most straightforward way to cut your memory use in half. If you want to cut your memory use by close to 90%, you should carefully experiment with int16 or int8 (which stores values between -128 and +127, the answer to the question above).
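To check the hard-drive part of that claim on your own machine, you can save the same data in two precisions and compare the file sizes. This sketch writes two temporary MAT-files and deletes them afterwards; the file and variable names are illustrative, and the exact sizes will depend on how well MATLAB's MAT-file compression handles your data:

```matlab
x_double = rand(1000,1000);
x_single = single(x_double);

f_double = [tempname '.mat'];          % temporary file paths
f_single = [tempname '.mat'];
save(f_double, 'x_double');
save(f_single, 'x_single');

d_double = dir(f_double);              % dir reports the file size in bytes
d_single = dir(f_single);
fprintf('double file: %d bytes\n', d_double.bytes);
fprintf('single file: %d bytes\n', d_single.bytes);

delete(f_double);                      % clean up the temporary files
delete(f_single);
```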

Jozsef Meszaros

Neuroscientist and data scientist at Columbia University. On Twitter: @NeuroJoJo