This NumPy operation gives a memory error (here X and Y are 2D arrays with shapes (5000, 3072) and (500, 3072), respectively):
dists[:,:] = np.sqrt(np.sum(np.square(np.subtract(X, Y[:,np.newaxis])), axis=2))
I think the numpy array broadcasting is taking up a lot of memory. Is there any way to optimise the memory usage for these array operations?
Edit:
(This is a problem from Assignment 1 of cs231n.) I found another solution that computes the same thing without a memory error:
dists[:,:] = np.sqrt((Y**2).sum(axis=1)[:, np.newaxis] + (X**2).sum(axis=1) - 2 * Y.dot(X.T))
Can you help me understand why my solution is inefficient memory-wise?
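For reference, a back-of-the-envelope accounting (assuming float64 and the shapes above) shows where the difference comes from:

# Size of the broadcast temporary X - Y[:, np.newaxis], shape (500, 5000, 3072),
# in float64 (np.square then allocates roughly another temporary of the same size):
print(500 * 5000 * 3072 * 8 / 1e9)   # ~61.4 GB
# Largest temporary in the expanded formula, shape (500, 5000):
print(500 * 5000 * 8 / 1e9)          # ~0.02 GB

The expanded formula never materialises anything larger than the (500, 5000) result itself, which is why it fits comfortably in memory.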
Related
Let a be a NumPy array of shape (n, m, k) and a_msk an array of shape (n, m) that masks elements of a through multiplication.
To my knowledge, I have to create a new axis in a_msk to make it compatible with a for multiplication:
b = a * a_msk[:,:,np.newaxis]
Unfortunately, my Google Colab runtime is running out of memory at this very operation given the large size of the arrays.
My question is whether I can achieve the same thing without creating that new axis for the mask array.
As @hpaulj commented, adding an axis to make the two arrays "compatible" for broadcasting is the most straightforward way to do your multiplication.
Alternatively, you can move the last axis of your array a to the front, which also makes the two arrays compatible (though I wonder whether this would solve your memory issue):
a = np.moveaxis(a, -1, 0)
Then you can simply multiply:
b = a * a_msk
However, to get your result you have to move the axis back:
b = np.moveaxis(b, 0, -1)
Example: both solutions return the same answer:
import numpy as np
a = np.arange(24).reshape(2, 3, 4)
a_msk = np.arange(6).reshape(2, 3)
print(f'newaxis solution:\n {a * a_msk[..., np.newaxis]}')
print()
print(f'moveaxis solution:\n {np.moveaxis((np.moveaxis(a, -1, 0) * a_msk), 0, -1)}')
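If it is the full-size result b that exhausts memory rather than the mask itself (the newaxis view of the mask costs essentially nothing), another option, assuming a may be overwritten and its dtype can hold the product, is an in-place multiply so that no second (n, m, k) array is allocated:

import numpy as np

a = np.arange(24, dtype=float).reshape(2, 3, 4)
a_msk = np.arange(6).reshape(2, 3)

# Write the masked result straight back into a's own buffer instead of
# allocating a new array for it.
np.multiply(a, a_msk[:, :, np.newaxis], out=a)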
When trying to get code to work using different frameworks and sources, I've stumbled across this multiple times:
Python NumPy arrays A and B that are content-wise the same, but one has A.shape == (x, y) and the other B.shape == (x, y, 1). From dealing with this several times, I know that I can resolve such mismatches with squeeze:
A == numpy.squeeze(B)
But currently I have to redesign a lot of code that errors out due to "inconsistent" arrays in that regard (some images with len(img.shape) == 2, i.e. (1024, 1024), and some with len(img.shape) == 3, i.e. (1024, 1024, 1)).
Now I have to pick one, and I'm leaning towards (1024, 1024, 1), but since this code should be memory-efficient I'm wondering:
Do arrays with single-dimensional entries consume more memory than squeezed arrays? Or is there any other reason why I should avoid single-dimensional entries?
Do arrays with single-dimensional entries consume more memory than squeezed arrays?
They take the same amount of memory.
NumPy arrays have a property called nbytes that gives the number of bytes used by the array's data. You can use it to verify this easily:
>>> import numpy as np
>>> arr = np.ones((1024, 1024, 1))
>>> arr.nbytes
8388608
>>> arr.squeeze().nbytes
8388608
The reason it takes the same amount of memory is actually simple: NumPy arrays aren't real multi-dimensional arrays; they are one-dimensional buffers that use strides to "emulate" multidimensionality. Each stride gives the byte offset needed to step along a particular dimension:
>>> arr.strides
(8192, 8, 8)
>>> arr.squeeze().strides
(8192, 8)
So by removing the length-one dimension you removed a stride that is never actually stepped along.
Or is there any other reason why I should avoid single-dimensional entries?
It depends. In some cases you create these length-one axes yourself to take advantage of broadcasting; in other cases they are just annoying.
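For example (a toy illustration, not from the original post), a trailing length-one axis is exactly what lets a single-channel image broadcast against per-channel values:

>>> import numpy as np
>>> img = np.ones((1024, 1024, 1))        # single-channel image, channel axis kept
>>> gains = np.array([0.5, 1.0, 2.0])     # hypothetical per-channel gains
>>> (img * gains).shape                   # the length-one axis broadcasts to 3
(1024, 1024, 3)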
Note that there is in fact a small memory difference, because NumPy has to store one shape integer and one stride integer per dimension:
>>> import sys
>>> sys.getsizeof(arr)
8388736
>>> sys.getsizeof(arr.squeeze().copy()) # remove one dimension
8388720
>>> sys.getsizeof(arr[:, None].copy()) # add one dimension
8388752
However, 16 bytes per dimension is negligible compared to the roughly 8 MB the array data takes, and to the ~100 bytes a view uses (squeeze returns a view, which is why I had to copy it above).
I have a matrix X with dimensions (140000, 28).
Calculating np.transpose(X) * X in Python gives a memory error.
How can I do the multiplication?
Name: X, Type: float64, Size: (140000L, 28L)... I think it is an ndarray.
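No answer is quoted here, but assuming the intent is the matrix product of X.T with X rather than an elementwise product (which is what * means for ndarrays), np.dot is the cheap way to compute it, since the result is only (28, 28):

import numpy as np

X = np.zeros((140000, 28))       # placeholder with the question's shape (~31 MB of float64)

gram = np.dot(X.T, X)            # matrix product, not elementwise: result is (28, 28)
print(gram.shape, gram.nbytes)   # (28, 28) 6272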
The code is too complicated to paste here, but I have a numpy array shaped (800, 800, 1300), or 1300 matrices shaped (800, 800). This is 5GB.
I pass this array into a function, whereby the function
1. multiplies each "matrix" in the above array by a float in a (1300,)-shaped array,
2. sums the array into one "matrix", shaped (800, 800),
3. and takes the inverse of the matrix.
This program uses 20.2 GB of RAM! Is that possible? I cannot see any memory leaks; I am simply taking NumPy arrays and passing them through a function. I then save the resulting arrays.
I'll try to post the code.
import math
import matplotlib.pyplot as plt
import numpy as np
import scipy
import scipy.io
import os
data_file1 = "filename1.npy"
data_file2 = "filename2.npy"
data_file3 = "filename3.npy"
data1 = np.load(data_file1)
data2 = np.load(data_file2)
data3 = np.load(data_file3)
data_total = np.concatenate((data1, data2, data3)) # This array is shape (800,800,1300), around 6 GB.
array1 = np.arange(1300) + 1
vector = np.arange(800) + 1
def function_matrix(data_total, vector):
    Multi_matrix = array1[:, None, None] * data_total  # step 1, multiplies each (800, 800) matrix
    Sum_matrix = np.sum(Multi_matrix, axis=0)           # step 2, sum matrix
    mTCm = np.array([np.dot(vector.T, np.linalg.solve(Sum_matrix, vector))])
    return mTCm
draw_pointsA = np.asarray([[function_matrix(data_total[i], vector[j]) for i in np.arange(0,100)] for j in np.arange(0,100)])
filename = "save_datapoints.npy"
np.save(filename, draw_pointsA)
EDIT 2:
See below. It is actually 12 GB of RAM; the virtual size of the process is 20.1 GB.
This doesn't answer your question, but proposes a way to avoid the problem from the start.
Step 1 is sequential: you only need one matrix loaded at a time.
Change your code to process each matrix independently (see the sketch below).
By Step 2 your memory requirement is down to 800 * 800 * sizeof(datum), which is a few megabytes, and you can certainly afford to keep that in memory.
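A minimal sketch of that idea (assuming the stack has the (800, 800, 1300) shape from the question and weights is the (1300,) array of floats; the names here are illustrative):

import numpy as np

def weighted_sum(data_total, weights):
    # Accumulate one (800, 800) slice at a time instead of materialising the
    # full (800, 800, 1300) weighted stack (step 1) in memory.
    total = np.zeros(data_total.shape[:2], dtype=data_total.dtype)
    for k in range(data_total.shape[-1]):
        total += weights[k] * data_total[..., k]
    return total

Only the (800, 800) running sum is then needed for the np.linalg.solve step.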
It sounds like this could be a type issue, i.e. you converted the values in the matrices to a different type. Perhaps you stored the original matrix as int16 or single-precision floats, and after multiplying it by a float it is stored as a matrix of double-precision values (which take twice as much memory).
You can use the dtype argument to set the value type for the matrix.
Another possible reason is that additional temporary matrices are created along the way. That is obviously impossible to tell without seeing the code.
A possible solution to your memory problem is to use HDF5 files and write the matrices to disk; then you can load them one at a time. This is easy with h5py, as the datasets can be compressed and sliced with NumPy-style syntax.
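A minimal h5py sketch of that approach (file and dataset names are made up for illustration):

import h5py

# Write the stack to disk once, compressed, then read it back one slice at a time.
with h5py.File("matrices.h5", "w") as f:
    dset = f.create_dataset("stack", shape=(800, 800, 1300), dtype="f8",
                            compression="gzip")
    # fill it slice by slice, e.g. dset[:, :, k] = one_matrix

with h5py.File("matrices.h5", "r") as f:
    first = f["stack"][:, :, 0]   # loads only a single (800, 800) matrix into RAM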
I am using NumPy to handle some large data matrices (around 50 GB in size). The machine where I am running this code has 128 GB of RAM, so simple linear operations of this magnitude shouldn't be a problem memory-wise.
However, I am witnessing huge memory growth (to more than 100 GB) when running the following code in Python:
import numpy as np
# memory allocations (everything works fine)
a = np.zeros((1192953, 192, 32), dtype='f8')
b = np.zeros((1192953, 192), dtype='f8')
c = np.zeros((192, 32), dtype='f8')
a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] # memory explodes here
Please note that the initial memory allocations are done without any problems. However, when I try to perform the subtraction with broadcasting, the memory grows to more than 100 GB. I always thought that broadcasting would avoid extra memory allocations, but now I am not sure if this is always the case.
As such, can someone explain why this memory growth is happening, and how the code above could be rewritten using more memory-efficient constructs?
I am running the code in Python 2.7 within IPython Notebook.
@rth's suggestion to do the operation in smaller batches is a good one. You could also try using the function np.subtract and give it the destination array to avoid creating an additional temporary array. I also think you don't need to index c as c[np.newaxis, :, :]: broadcasting aligns trailing axes automatically, so the 2-d c works as-is.
So instead of
a[:] = b[:, :, np.newaxis] - c[np.newaxis, :, :] # memory explodes here
try
np.subtract(b[:, :, np.newaxis], c, a)
The third argument of np.subtract is the destination array.
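(Equivalently, the destination can be passed by keyword, np.subtract(b[:, :, np.newaxis], c, out=a), which makes the intent a bit more explicit.)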
Well, your array a already takes 1192953 * 192 * 32 * 8 bytes / 1e9 ≈ 58 GB of memory.
The broadcasting does not make additional memory allocations for the initial arrays, but the result of
b[:, :, np.newaxis] - c[np.newaxis, :, :]
is still saved in a temporary array. Therefore, at this line you have allocated at least two arrays with the shape of a, for a total memory use of more than 116 GB.
You can avoid this issue by operating on a smaller subset of your array at a time:
CHUNK_SIZE = 100000
for start in range(0, b.shape[0], CHUNK_SIZE):
    sl = slice(start, start + CHUNK_SIZE)
    a[sl] = b[sl, :, np.newaxis] - c[np.newaxis, :, :]
This will be marginally slower but uses much less memory.
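The two suggestions combine naturally as well (a sketch along the same lines, not benchmarked): write each chunk of the subtraction straight into its slice of a, so no full-size temporary is ever created:

CHUNK_SIZE = 100000
for start in range(0, b.shape[0], CHUNK_SIZE):
    sl = slice(start, start + CHUNK_SIZE)
    np.subtract(b[sl, :, np.newaxis], c, out=a[sl])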