optimize python code for memory efficiency


I have the following Python code:
import numpy as np
sizes = 2000
array1 = np.empty((sizes, sizes, sizes, 3), dtype=np.float32)
for i in range(sizes):
    array1[i, :, :, 0] = 1.5*i
    array1[:, i, :, 1] = 2.5*i
    array1[:, :, i, 2] = 3.5*i
array2 = array1.reshape(sizes*sizes*sizes, 3)
#do something with array2
array3 = array2.reshape(sizes*sizes*sizes, 3)
I would like to make this code more memory efficient, but I have no idea how. Is there a more memory-efficient way to use "numpy.reshape"?

I think your code is already memory efficient.
When possible, np.reshape returns a view of the original array. That is so in this case and therefore np.reshape is already as memory efficient as can be.
Here is how you can tell np.reshape is returning a view:
import numpy as np
# Let's make array1 smaller; it won't change our conclusions
sizes = 5
array1 = np.arange(sizes*sizes*sizes*3).reshape((sizes, sizes, sizes, 3))
for i in range(sizes):
    array1[i, :, :, 0] = 1.5*i
    array1[:, i, :, 1] = 2.5*i
    array1[:, :, i, 2] = 3.5*i
array2 = array1.reshape(sizes*sizes*sizes, 3)
Note the value of array2 at a certain location:
assert array2[0,0] == 0
Change the corresponding value in array1:
array1[0,0,0,0] = 100
Note that the value of array2 changes.
assert array2[0,0] == 100
Since array2 changes due to a modification of array1, you can conclude that array2 is a view of array1. Views share the underlying data. Since there is no copy being made, the reshape is memory efficient.
array2 is already of shape (sizes*sizes*sizes, 3), so this reshape does nothing.
array3 = array2.reshape(sizes*sizes*sizes, 3)
Finally, the assert below shows array3 was also affected by the modification made to array1. So that proves conclusively that array3 is also a view of array1.
assert array3[0,0] == 100
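Besides mutating one array and watching the other, NumPy also offers direct checks. The following is a minimal sketch, assuming array1 is created so that it owns its data (here with np.zeros, unlike the arange-based example above). The names is_view, base_is_array1, copied and copy_shares are only illustrative:

```python
import numpy as np

sizes = 5
array1 = np.zeros((sizes, sizes, sizes, 3), dtype=np.float32)
array2 = array1.reshape(sizes * sizes * sizes, 3)

# True when the two arrays overlap in memory, i.e. no copy was made.
is_view = np.shares_memory(array1, array2)
# A view keeps a reference to the array that owns the data.
base_is_array1 = array2.base is array1

# Counter-example: reshaping a transposed (non-contiguous) array forces a copy.
copied = array1.T.reshape(-1, 3)
copy_shares = np.shares_memory(array1, copied)

print(is_view, base_is_array1, copy_shares)
```

The counter-example is worth keeping in mind: reshape only avoids a copy when the memory layout allows it, which is the case for the contiguous array in the question.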

So really your problem depends on what you are doing with the array. You are currently storing a large amount of redundant information: the whole array is determined by 3*sizes distinct values, yet it stores 3*sizes**3 of them. You could keep only a tiny fraction of the currently stored information and not lose anything.
For instance, if we define the following three one-dimensional arrays:
a = np.linspace(0, (sizes-1)*1.5, sizes).astype(np.float32)
b = np.linspace(0, (sizes-1)*2.5, sizes).astype(np.float32)
c = np.linspace(0, (sizes-1)*3.5, sizes).astype(np.float32)
we can recreate any minor entry (i.e. an entry along the fastest-varying, last axis) of your array1:
In [235]: array1[4][3][19] == np.array([a[4],b[3],c[19]])
Out[235]: array([ True, True, True], dtype=bool)
How useful this is depends on what you are doing with the array, as it will be less performant to rebuild array1 from a, b and c. However, if you are nearing the limits of what your machine can handle, sacrificing performance for memory efficiency may be a necessary step. Moving a, b and c around will also have a much lower overhead than moving array1 around.
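One way to use the three small arrays is to rebuild only the part of array1 that is needed at a given moment. This is a rough sketch under that assumption; the helper name slab and the small value of sizes are hypothetical and chosen only so the example runs quickly:

```python
import numpy as np

sizes = 20  # small value for illustration only
idx = np.arange(sizes, dtype=np.float32)
a, b, c = 1.5 * idx, 2.5 * idx, 3.5 * idx

def slab(i):
    """Rebuild array1[i] (shape: sizes x sizes x 3) on demand."""
    out = np.empty((sizes, sizes, 3), dtype=np.float32)
    out[..., 0] = a[i]        # constant over the whole slab
    out[..., 1] = b[:, None]  # varies along the first axis of the slab
    out[..., 2] = c[None, :]  # varies along the second axis of the slab
    return out

part = slab(4)
print(part.shape, part[3, 19])
```

Each slab needs only sizes*sizes*3 values at a time, so peak memory scales with sizes**2 rather than sizes**3, at the cost of recomputing slabs as they are needed.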

Related

Faster definition of "matrix multiplication" in Python

I need to define matrix multiplication from scratch, as instead of multiplying each constant together, each constant is actually another array and any two arrays need to be "convolved" together (I don't think it's necessary to define what a convolution is here).
I have made a picture that hopefully explains what I'm trying to say better:
The code I have to do this with is this:
for row in range(arr1.shape[2]):
    for column in range(arr2.shape[3]):
        for index in range(arr2.shape[2]):  # Could also be "arr1.shape[3]"
            out[:, :, row, column] += convolve(
                arr2[:, :, : , column][:, :, index],
                arr1[:, :, row, : ][:, :, index]
            )
However, this method has proved to be very slow for me, so I was wondering if there is a faster way to do this.

If the intermediate fits in memory the following should be reasonably efficient
import numpy as np
from scipy.signal import fftconvolve,convolve
# example
rng = np.random.default_rng()
A = rng.random((5,6,2,3))
B = rng.random((4,3,3,4))
# custom matmul
Ae,Be = A[...,None],B[:,:,None]
shsh = np.maximum(Ae.shape[2:],Be.shape[2:])
Ae = np.broadcast_to(Ae,(*Ae.shape[:2],*shsh))
Be = np.broadcast_to(Be,(*Be.shape[:2],*shsh))
C = fftconvolve(Ae,Be,axes=(0,1),mode='valid').sum(3)
# original loop for reference
out = np.zeros_like(C)
for row in range(A.shape[2]):
    for column in range(B.shape[3]):
        for index in range(B.shape[2]):  # Could also be "A.shape[3]"
            out[:, :, row, column] += convolve(
                B[:, :, : , column][:, :, index],
                A[:, :, row, : ][:, :, index],
                mode='valid'
            )
print(np.allclose(C,out))
# True
By doing the convolution in bulk we reduce the total number of FFTs we have to do.
If need be, this could be further optimized for both speed and memory by doing the sum reduction in Fourier space using einsum. This would require doing the FFT convolution by hand, though.
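The Fourier-space reduction mentioned above could look roughly like the following. This is only a sketch, not the answerer's code: it assumes the same example shapes as above, assumes A is at least as large as B along both convolved axes, and uses plain NumPy FFTs instead of fftconvolve:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 6, 2, 3))
B = rng.random((4, 3, 3, 4))

# Size of the full linear convolution along the two spatial axes.
full = (A.shape[0] + B.shape[0] - 1, A.shape[1] + B.shape[1] - 1)

# Zero-padded real FFTs over axes 0 and 1.
FA = np.fft.rfft2(A, s=full, axes=(0, 1))
FB = np.fft.rfft2(B, s=full, axes=(0, 1))

# Pointwise product per frequency, summed over the shared index i.
FC = np.einsum('xyri,xyic->xyrc', FA, FB)

C_full = np.fft.irfft2(FC, s=full, axes=(0, 1))
# Keep the 'valid' region (assumes A is at least as large as B on both axes).
C = C_full[B.shape[0] - 1:A.shape[0], B.shape[1] - 1:A.shape[1]]
print(C.shape)
```

Because the sum over the shared index happens before the inverse transform, only one inverse FFT per (row, column) pair is needed and the large broadcast intermediate is never materialized.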

Alternative to loop for boolean / nonzero indexing of numpy array

I need to select only the non-zero 3d portions of a 3d binary array (or alternatively the true values of a boolean array). Currently I am able to do so with a series of 'for' loops that use np.any. This works, but it seems awkward and slow, so I am investigating a more direct way to accomplish the task.
I am rather new to numpy, so the approaches that I have tried include a) using np.nonzero, which returns indices that I am at a loss to know what to do with for my purposes, b) boolean array indexing, and c) boolean masks. I can generally understand each of those approaches for simple 2d arrays, but am struggling to understand the differences between the approaches, and cannot get them to return the right values for a 3d array.
Here is my current function that returns a 3D array with nonzero values:
def real_size(arr3):
    true_0 = []
    true_1 = []
    true_2 = []
    print(f'The input array shape is: {arr3.shape}')
    for zero_ in range (0, arr3.shape[0]):
        if arr3[zero_].any()==True:
            true_0.append(zero_)
    for one_ in range (0, arr3.shape[1]):
        if arr3[:,one_,:].any()==True:
            true_1.append(one_)
    for two_ in range (0, arr3.shape[2]):
        if arr3[:,:,two_].any()==True:
            true_2.append(two_)
    arr4 = arr3[min(true_0):max(true_0) + 1, min(true_1):max(true_1) + 1, min(true_2):max(true_2) + 1]
    print(f'The nonzero area is: {arr4.shape}')
    return arr4
# Then use it on a small test array:
test_array = np.zeros([2, 3, 4], dtype = int)
test_array[0:2, 0:2, 0:2] = 1
#The function call works and prints out as expected:
non_zero = real_size(test_array)
>> The input array shape is: (2, 3, 4)
>> The nonzero area is: (2, 2, 2)
# So, the array is correct, but likely not the best way to get there:
non_zero
>> array([[[1, 1],
           [1, 1]],
          [[1, 1],
           [1, 1]]])
The code works appropriately, but I am using this on much larger and more complex arrays, and don't think this is an appropriate approach. Any thoughts on a more direct method to make this work would be greatly appreciated. I am also concerned about errors and the results if the input array has for example two separate non-zero 3d areas within the original array.
To clarify the problem, I need to return one or more 3D portions as one or more 3d arrays beginning with an original larger array. The returned arrays should not include extraneous zeros (or false values) in any given exterior plane in three dimensional space. Just getting the indices of the nonzero values (or vice versa) doesn't by itself solve the problem.

Assuming you want to eliminate all rows, columns, etc. that contain only zeros, you could do the following:
nz = (test_array != 0)
non_zero = test_array[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
An alternative solution using np.nonzero:
i = [np.unique(_) for _ in np.nonzero(test_array)]
non_zero = test_array[i[0]][:, i[1]][:, :, i[2]]
This can also be generalized to arbitrary dimensions, but requires a bit more work (only showing the first approach here):
def real_size(arr):
    nz = (arr != 0)
    result = arr
    axes = np.arange(arr.ndim)
    for axis in range(arr.ndim):
        zeros = nz.any(axis=tuple(np.delete(axes, axis)))
        result = result[(slice(None),)*axis + (zeros,)]
    return result
non_zero = real_size(test_array)
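Note that the boolean-mask indexing above drops every all-zero plane, including ones that lie between non-zero regions. If the goal is instead the bounding box that the question's original loop computes (min to max along each axis), a sketch along these lines may be closer. The function name bounding_box is hypothetical:

```python
import numpy as np

def bounding_box(arr):
    """Return the smallest axis-aligned sub-array containing all non-zeros."""
    nz = arr != 0
    slices = []
    for axis in range(arr.ndim):
        # Collapse every other axis to find which positions hold a non-zero.
        other = tuple(ax for ax in range(arr.ndim) if ax != axis)
        hits = np.flatnonzero(nz.any(axis=other))
        if hits.size == 0:
            # Nothing non-zero anywhere: return an empty array.
            return arr[(slice(0, 0),) * arr.ndim]
        slices.append(slice(hits[0], hits[-1] + 1))
    return arr[tuple(slices)]

test_array = np.zeros([2, 3, 4], dtype=int)
test_array[0:2, 0:2, 0:2] = 1
print(bounding_box(test_array).shape)
```

Because only basic slices are used, the result is a view of the input rather than a copy. For several separate non-zero regions, one option is to label connected components first, for example with scipy.ndimage.label followed by scipy.ndimage.find_objects, and slice each region separately.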

Vectorizing operations using 3d numpy arrays

I'm working on an algorithm to match two kinds of objects (let's say balls and buckets). Each object is modeled as a 4-element numpy array (a 4D vector), and each kind is grouped within another array. My method is based on calculating all possible differences between each pair (ball, bucket) and applying a similarity function on that difference.
I'm trying to avoid for loops since speed is really relevant for what I'm doing, so I'm creating those differences by reshaping one of the initial arrays, broadcasting numpy operations and creating a 3D numpy array (diff_map). I'm not finding any good tutorial about this, so I'd like to know if there is a more "proper way" to do that. I'd also like to see any good references about this kind of operation (multidimensional reshape and broadcast) if possible.
My code:
import numpy as np
balls = np.random.rand(3,4)
buckets = np.random.rand(6,4)
buckets = buckets.reshape(len(buckets), 1, 4)
buckets
array([[[ 0.38382622, 0.27114067, 0.63856317, 0.51360638]],
[[ 0.08709269, 0.21659216, 0.31148519, 0.99143705]],
[[ 0.03659845, 0.78305241, 0.87699971, 0.78447545]],
[[ 0.11652137, 0.49490129, 0.76382286, 0.90313785]],
[[ 0.62681395, 0.10125169, 0.61131263, 0.15643676]],
[[ 0.97072113, 0.56535597, 0.39471204, 0.24798229]]])
diff_map = buckets-balls
diff_map.shape
(6, 3, 4)
For Loop
As requested, this is the for loop I'm trying to avoid:
diff_map_for = np.zeros((len(buckets), len(balls), 4))
for i in range(len(buckets)):
    for j in range(len(balls)):
        diff_map_for[i, j] = buckets[i]-balls[j]
Just to be sure, let's compare the two results:
np.all(diff_map == diff_map_for)
True

Does this work for you?
import numpy as np
balls = np.random.rand(3,4)
buckets = np.random.rand(6,4)
diff_map = buckets[:, np.newaxis, :] - balls[np.newaxis, :, :]
print(diff_map.shape)
# output: (6, 3, 4)
# ... compared to for loop
diff_map_for = np.zeros((len(buckets), len(balls), 4))
for i in range(len(buckets)):
    for j in range(len(balls)):
        diff_map_for[i, j] = buckets[i] - balls[j]
print(np.sum(diff_map - diff_map_for))
# output: 0.0
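Once diff_map exists, the similarity step can usually be vectorized as well by reducing over the last axis. The question does not say which similarity function is used, so the Euclidean distance below is purely an assumption for illustration, as are the names dist and best_bucket:

```python
import numpy as np

rng = np.random.default_rng(0)
balls = rng.random((3, 4))
buckets = rng.random((6, 4))

# Shape (6, 1, 4) minus shape (1, 3, 4) broadcasts to (6, 3, 4).
diff_map = buckets[:, np.newaxis, :] - balls[np.newaxis, :, :]

# Assumed similarity: Euclidean distance, reduced over the feature axis.
dist = np.linalg.norm(diff_map, axis=-1)  # shape (6, 3)

# Index of the closest bucket for each ball.
best_bucket = dist.argmin(axis=0)  # shape (3,)
print(dist.shape, best_bucket)
```

Any function that reduces over the last axis can be swapped in for np.linalg.norm without changing the broadcasting pattern.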

numpy padding matrix of different row size

I have a numpy array with rows of different sizes
a = np.array([[1,2,3,4,5],[1,2,3],[1]], dtype=object)  # dtype=object is required for ragged rows in NumPy >= 1.24
and I would like to convert it into a dense matrix (fixed n x m size, no variable-length rows). So far I have tried something like this
size = (len(a),5)
result = np.zeros(size)
result[[0],[len(a[0])]]=a[0]
But I receive an error telling me
shape mismatch: value array of shape (5,) could not be broadcast to
indexing result of shape (1,)
I also tried padding with np.pad, but according to the documentation of numpy.pad it seems I need to specify the previous size of the rows in pad_width (which is variable, and I got errors when trying -1, 0, and the biggest row size).
I know I can do it by padding lists row by row, as shown here, but I need to do this with a much bigger array of data.
I would be glad if someone could help me with this question.

There's really no way to pad a jagged array such that it loses its jaggedness without iterating over the rows of the array. You'll even have to iterate over the array twice: once to find the maximum length you need to pad to, and once more to actually do the padding.
The code proposal you've linked to will get the job done, but it's not very efficient, because it adds zeroes in a python for-loop that iterates over the elements of the rows, whereas that appending could have been precalculated, thereby pushing more of that code to C.
The code below precomputes an array of the required minimal dimensions, filled with zeroes and then simply adds the row from the jagged array M in place, which is far more efficient.
import random
import numpy as np
M = [[random.random() for n in range(random.randint(0,m))] for m in range(10000)] # play-data
def pad_to_dense(M):
    """Appends the minimal required amount of zeroes at the end of each
    array in the jagged array `M`, such that `M` loses its jaggedness."""
    maxlen = max(len(r) for r in M)
    Z = np.zeros((len(M), maxlen))
    for enu, row in enumerate(M):
        Z[enu, :len(row)] += row
    return Z
To give you some idea for speed:
from timeit import timeit
n = [10, 100, 1000, 10000]
s = [timeit(stmt='Z = pad_to_dense(M)', setup='from __main__ import pad_to_dense; import numpy as np; from random import random, randint; M = [[random() for n in range(randint(0,m))] for m in range({})]'.format(ni), number=1) for ni in n]
print('\n'.join(map(str,s)))
# 7.838103920221329e-05
# 0.0005027339793741703
# 0.01208890089765191
# 0.8269036808051169
If you want to prepend zeroes to the arrays, rather than append, that's a simple enough change to the code, which I'll leave to you.
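A variant that replaces the per-row assignment with a single boolean-mask assignment is sketched below. It still iterates once over the rows to collect their lengths, and it assumes the input has at least one row. The function name pad_to_dense_masked is hypothetical:

```python
import numpy as np

def pad_to_dense_masked(rows):
    """Right-pad the jagged list `rows` with zeros using one masked assignment."""
    lens = np.array([len(r) for r in rows])
    maxlen = lens.max()
    Z = np.zeros((len(rows), maxlen))
    # mask[i, j] is True where row i actually has a j-th element.
    mask = np.arange(maxlen) < lens[:, None]
    # True positions are filled in row-major order, matching the concatenation.
    Z[mask] = np.concatenate([np.asarray(r, dtype=float) for r in rows])
    return Z

print(pad_to_dense_masked([[1, 2, 3, 4, 5], [1, 2, 3], [1]]))
```

Whether this is faster than the loop above depends on the number and length of the rows, so it is worth timing both on representative data.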

You can do something like this with numpy.pad
import numpy as np
a = np.array([[1,2,3,4,5],[1,2,3],[1]], dtype=object)  # dtype=object is required for ragged rows in NumPy >= 1.24
l = np.array([len(a[i]) for i in range(len(a))])
width = l.max()
b=[]
for i in range(len(a)):
    if len(a[i]) != width:
        x = np.pad(a[i], (0,width-len(a[i])), 'constant',constant_values = 0)
    else:
        x = a[i]
    b.append(x)
b = np.array(b)
print(b)
The above code outputs something like this:
b = [[1, 2, 3, 4, 5],
[1, 2, 3, 0, 0],
[1, 0, 0, 0, 0]]
You can recover the original, jagged version of the data as follows:
a = []
for i in range(len(b)):
    a.append(b[i][0:l[i]])
a = np.array(a, dtype=object)  # rows have different lengths, so an object array is needed
print(a)
which gives the following output:
a = array([array([1, 2, 3, 4, 5]), array([1, 2, 3]), array([1])], dtype=object)
Hopefully this helps someone who struggled like me to solve the issue.
Thank you.

Is there a difference in the way we access elements of a list comprehension and the elements of a numpy array

I am working on a genetic algorithm code. I am fairly new to python.
My code snippet is as follows:
import numpy as np
pop_size = 10 # Population size
noi = 2 # Number of Iterations
M = 2 # Number of Phases in the Data
alpha = [np.random.randint(0, 64, size = pop_size)]* M
phi = [np.random.randint(0, 64, size = pop_size)]* M
reduced_tensor = [np.zeros((pop_size,3,3))]* M
for n_i in range(noi):
    alpha_en = [(2*np.pi*alpha/63.00) for alpha in alpha]
    phi_en = [(phi/63.00) for phi in phi]
    for i in range(M):
        for j in range(pop_size):
            reduced_tensor[i][j] = [[1, 0, 0],
                                    [0, phi_en[i][j], 0],
                                    [0, 0, 0]]
Here I have a list of numpy arrays. The variable 'alpha' is a list containing two numpy arrays. How do I use list comprehension in this case? I want to create a similar list 'alpha_en' which operates on every element of alpha. How do I do that? I know my current code is wrong, it was just trial and error.
What does 'for alpha in alpha' mean (line 11)? This line doesn't give any error, but also doesn't give the desired output. It changes the dimension and value of alpha.
The variable 'reduced_tensor' is a list of an array of 3x3 matrix, i.e., four dimensions in total. How do I differentiate between the indexing of a list comprehension and a numpy array? I want to perform various operations on a list of matrices, in this case, assign the values of phi_en to one of the elements of the matrix reduced_tensor (as shown in the code). How should I do it efficiently? I think my current code is wrong, if not just confusing.

There is some questionable programming in these two lines:
alpha = [np.random.randint(0, 64, size = pop_size)]* M
...
alpha_en = [(2*np.pi*alpha/63.00) for alpha in alpha]
The first makes an array, and then makes a list with M pointers to the same thing. Note that these are not M independent copies of the random array. If I were to change one element of alpha, I'd change them all. I don't see the point of this type of construction.
The [... for alpha in alpha] works because the 2 uses of alpha are different. At least in newer Pythons the i in [i*3 for i in range(3)] does not 'leak out' of the comprehension. That said, I would not approve of that variable naming. At the very least it is confusing to readers.
The arrays in alpha_en are separate. Values are derived from the array in alpha, but they are new.
for a in alphas:
    a *= 2
would modify each array in alphas; however, because of how alphas is constructed, this ends up multiplying the same array many times.
reduced_tensor = [np.zeros((pop_size,3,3))]* M
has the same problem; it's a list of M references to the same 3d array.
reduced_tensor[i][j]
references the i-th element of that list, and the j-th 'row' of that array. I like to use
reduced_tensor[i][j,:,:]
to make it clearer to me and my reader the expected dimensions of the result.
The iteration over M does nothing for you; it just repeats the same assignment M times.
At the root of your problems is that use of list replication.
In [30]: x=[np.arange(3)]*3
In [31]: x
Out[31]: [array([0, 1, 2]), array([0, 1, 2]), array([0, 1, 2])]
In [32]: [id(i) for i in x]
Out[32]: [3036895536, 3036895536, 3036895536]
In [33]: x[0] *= 10
In [34]: x
Out[34]: [array([ 0, 10, 20]), array([ 0, 10, 20]), array([ 0, 10, 20])]
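One way to avoid the shared references is to build each array inside a list comprehension, or to use a single array with an extra leading axis. The following is only a sketch of that idea: it reuses the names from the question, swaps np.random.randint for the newer Generator API, and stores reduced_tensor as one 4D array instead of a list:

```python
import numpy as np

pop_size = 10  # Population size
M = 2          # Number of phases in the data
rng = np.random.default_rng(0)

# The comprehension calls rng.integers M times, so each array is independent.
alpha = [rng.integers(0, 64, size=pop_size) for _ in range(M)]
phi = [rng.integers(0, 64, size=pop_size) for _ in range(M)]

# Distinct loop-variable names avoid shadowing the lists themselves.
alpha_en = [2 * np.pi * a / 63.0 for a in alpha]
phi_en = [p / 63.0 for p in phi]

# One 4D array: axis 0 is the phase, axis 1 the population member.
reduced_tensor = np.zeros((M, pop_size, 3, 3))
reduced_tensor[:, :, 0, 0] = 1
reduced_tensor[:, :, 1, 1] = np.stack(phi_en)  # shape (M, pop_size)

print(alpha[0] is alpha[1], reduced_tensor.shape)
```

With this layout the two nested loops over M and pop_size are replaced by two sliced assignments, and changing alpha[0] no longer affects alpha[1].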
