“Out of memory” in MATLAB? Easily cut memory use in half with this tip
I’ll demonstrate how simply changing the precision of your numeric data can cut your memory usage by half, or even more.
Have you ever run into an “out of memory” issue in MATLAB and gone looking online for solutions?
It turns out the simplest solution in many cases can cut your memory use by up to 90%. This method comes with the added benefit that if you save your variables, they will also take up less hard drive space and load much more quickly.
Let’s dive right in to understand how we can reduce memory usage (the concepts apply to other programming environments, but I’ll focus on MATLAB):
Critical commands discussed in this post:
whos
single
double
int16
format
When it comes to numerical data, you can specify the type using single or double, but people almost never do! What does that mean for your data? Let’s take a look with some random examples:
mydata = rand(1000,1000); % Do not omit this semi-colon!
whos mydata
You’ll see a table that says your variable mydata takes up 8 000 000 bytes. That’s 8 megabytes. Stored in your memory.
Next to the byte count, it will say “double”. By default, MATLAB creates a very wasteful double-precision variable. Try this:
mydata = single( rand(1000,1000) ); % Do not omit this semi-colon!
whos mydata
This time the Bytes column reads 4 000 000: the same matrix now takes up half the memory.
What do we lose by going from double to single?
mydata_double = rand(1000,1000);
mydata_single = single( mydata_double );
mean(mydata_double,'all') % No semi-colons here
mean(mydata_single,'all')
You’ll see that the result for the double precision variable is “0.500049032047244”, whereas the result for the single precision variable is “0.5000491”. The two only start to differ around the seventh significant digit. If a difference on that scale (think 5,000,490 versus 5,000,491) will have a substantial effect on your data, then do not convert to single.
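Where does that tiny difference come from? Each type has a fixed precision, and a quick way to see it, independent of any particular dataset, is to ask eps for the gap between 1 and the next representable number:
eps('double') % about 2.2e-16, roughly 15-16 significant digits
eps('single') % about 1.2e-07, roughly 7 significant digits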
Can we do even better with integers?
Yes. With caveats. If you are happy with 50%, you can stop reading here. However, if you’re willing to go the extra mile, continue. You should only do this if you test your results with single and then with the integer formats int8 or int16, and you confirm they’re comparable. Working with integers is complicated but can be worth it in some cases.
- Caveat 1: Data should either already be integers in double precision format, or round-able to integers without much loss of information (e.g. 1.001, 2.0045, etc), or you should be prepared to transform your data back and forth.
To see how this caveat can hurt you if you aren’t careful — try this:
mydata = rand(10,10);
mydata = int8( mydata );
mydata
❌ You’ll get a bunch of 0’s and 1’s, because your data has been rounded.
How can we avoid this? You can multiply each number by 10, 100, or 1000 before converting, and then divide by the same factor later to get your decimals back.
% Scale up the data and scale it back down
to_integer = @(x) x*1000;
to_decimal = @(x) double(x)/1000;
% Convert to integer
mydata = int16( to_integer( rand(10,10) ) );
% And back to double!
mydata = to_decimal( mydata );
- Caveat 2: Your integers must have a set minimum and maximum value that you won’t exceed (and if you do exceed it, the value will be silently clamped to that limit).
If you are converting using int16, that means 16 bits, one of which is the sign bit, so you can store integer values between -32768 and +32767. Depending on how you scale, that could mean working with values between -3.2768 and +3.2767 to keep the most precision in your decimals, or between -32.768 and +32.767, and so on. See how that works?
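You don’t need to memorize these limits. intmin and intmax report them for any integer type, and a quick throwaway test (not tied to your data) also shows the clamping behavior from Caveat 2:
[ intmin('int16'), intmax('int16') ] % -32768   32767
int16(100000) % anything beyond the limit silently saturates to 32767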
What is the maximum value of an int8 matrix? Scroll to the bottom for the answer.
- Caveat 3: You’ll need to convert back to double or single for certain kinds of processing on your matrix. For example, you’ll be able to take the mean or standard deviation of your matrix, but you won’t be able to pass it to inv to compute the matrix inverse.
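Here’s a minimal illustration of that caveat, using a small magic square as a stand-in for your own data:
A = int16( magic(3) );
mean( A, 'all' ) % works: mean accepts integer types
% inv( A ) would throw an error here: inv only accepts single or double
inv( double(A) ) % so convert first, then invert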
Let’s check out the savings from each type of data
We’ll work with the assumption that our data, once scaled up, stays well within the int8 range of -128 to +127, and that we can tolerate losing some decimals (you should always check this works for your data).
to_integer = @(x) ( x*100 );
to_decimal = @(x) ( double(x)/100 );
mydata = rand(1000,1000); % 8 megabyte double
mydata_to_integer = to_integer(mydata); % Still 8 megabytes (double)
mydata_to_integer = single( mydata_to_integer ); % Now 4 megabytes
% Check that the min and max fit within the int16 range of -32768 to 32767
[ min(mydata_to_integer,[],'all'), max(mydata_to_integer,[],'all') ]
mydata_to_integer = int16( mydata_to_integer ); % Now 2 megabytes
% Since the values never exceed 100, we can go all the way down to int8
mydata_to_integer = int8( mydata_to_integer ); % 1 megabyte!
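If you’d rather verify those byte counts than take my comments on faith, whos lists both variables side by side; for a 1000-by-1000 matrix you should see something like this:
whos mydata mydata_to_integer
% mydata              1000x1000   8000000 bytes   double
% mydata_to_integer   1000x1000   1000000 bytes   int8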
Now let’s try to recover our original data and see how accurate it is by computing the mean:
format long
mean( mydata, 'all' ) % Original data
to_decimal( mean( mydata_to_integer,'all' ) ) % Converted back data
0.500068458752177 versus 0.500070540000000. Let’s compute the standard deviation as well.
std(mydata,[],'all')
std( to_decimal( double( mydata_to_integer ) ),[],'all' )
0.288641641801215 versus 0.288675051151625 — we can live with it.
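One more sanity check worth doing, as long as you still have the original matrix in memory: the worst-case element-wise error. Since we scaled by 100 and rounded, no element should be off by more than half a scaling step, i.e. 0.005 (plus a negligible contribution from the single conversion):
% Largest absolute difference between original and recovered values
max( abs( mydata - to_decimal(mydata_to_integer) ), [], 'all' )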
Conclusion
To save memory (RAM and hard drive storage), you should use single(yourvariable). This is the easiest and most straightforward way to cut your memory use in half. If you want to cut your memory use by up to 90%, you should carefully experiment with int16 or int8 (which stores values between -128 and +127, the answer to the question above).
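And if you’d rather not take my word for that range, MATLAB will confirm it directly:
[ intmin('int8'), intmax('int8') ] % -128   127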