Fast Circular buffer in python than the one using deque?

Fast Circular buffer in python than the one using deque? - python

I am implementing circular buffer in python using collections.deque to use it for some calculations. This is my original code:
clip=moviepy.editor.VideoFileClip('file.mp4')
clip_size= clip.size[::-1]
Depth=30
dc=5
TempKern = # some array of size Depth
RingBuffer=deque(np.zeros(clip_size, dtype=float),maxlen=NewDepth)
modified_clip = clip.fl_image(new_filtered_output)
modified_clip.write_videofile('output.mp4'))
def new_filtered_output(image):
global RingBuffer
inter_frame=somefunction(image)# inter_frame and image shape is same as clip_size
RingBuffer.append(inter_frame)
# Apply kernel
Output = dc + np.sum([np.asarray(RingBuffer)[j]*TempKern[j] for j in range(Depth)],axis=0)
return Output
Is this the fastest way possible? I have heard that numpy roll is an option. But I don't know how to make it behave like the above code?

I noticed you changed the code above, but your original code was:
def one():
TempKern=np.array([1,2,3,4,5])
depth=len(TempKern)
buf=deque(np.zeros((2,3)),maxlen=5)
for i in range(10):
buf.append([[i,i+1,i+2],[i+3,i+4,i+5]])
total= + np.sum([np.asarray(buf)[j]*TempKern[j] for j in range(depth)],axis=0)
print('total')
print(total)
return total
You can simplify things greatly and make it run quite a bit faster if you first flatten the arrays for the computation.
def two():
buf = np.zeros((5,6), dtype=np.int32)
for idx, i in enumerate(range(5, 10)):
buf[idx] = np.array([[i,i+1,i+2,i+3,i+4,i+5]], dtype=np.int32)
return (buf.T * np.array([1, 2, 3, 4, 5])).sum(axis=1).reshape((2,3))
The second implementation returns the same values and runs about 4x faster on my machine
one()
>> [[115 130 145]
[160 175 190]] ~ 100µs / loop
two()
>> array([[115, 130, 145],
[160, 175, 190]]) ~~ 26µs / loop
You can further simplify and parameterize this as such:
def three(n, array_shape):
buf = np.zeros((n,array_shape[0]*array_shape[1]), dtype=np.int32)
addit = np.arange(1, n+1, dtype=np.int32)
for idx, i in enumerate(range(n, 2*n)):
buf[idx] = np.arange(i, i+n+1)
return (buf.T * addit).sum(axis=1).reshape(array_shape)
three(5, (2,3))
>> array([[115, 130, 145],
[160, 175, 190]]) ~ 17µs / loop
Note that the second and third version returns a numpy array. You can cast it to a list by using .tolist() if need be.
Based on your feedback - edit below:
def four(array_shape):
n = array_shape[0] * array_shape[1] - 1
buf = []
addit = np.arange(1, n+1, dtype=np.int32)
for idx, i in enumerate(range(n, 2*n)):
buf.append(np.arange(i, i+n+1))
buf = np.asarray(buf)
summed = (buf.T * addit).sum(axis=1)
return summed.reshape(array_shape)

You can have the ring buffer as a numpy array, by doubling the size and slicing:
clipsize = clip.size[::-1]
depth = 30
ringbuffer = np.zeros((2*depth,) + clipsize)
framecounter = 0
def new_filtered_output(image):
global ringbuffer, framecounter
inter_frame = somefunction(image)
idx = framecounter % depth
ringbuffer[idx] = ringbuffer[idx + depth] = inter_frame
buffer = ringbuffer[idx + 1 : idx + 1 + depth]
framecounter += 1
# Apply kernel
output = dc + np.sum([buffer[j]*kernel[j] for j in range(depth)], axis=0)
return output
Now you don't have convert the deque into a numpy array every frame (and every loop iteration..).
As mentioned in the comments, you can apply the kernel more effeciently:
output = dc + np.einsum('ijk,i->jk', buffer, kernel)
Or:
output = dc + np.tensordot(kernel, buffer, axes=1)

Related

Using map() on a function with multiple inputs to get rid of for loops

Context:
I have a function to upsample multiple arrays that I want to write as efficiently as possible (because I have to run it 370000 times).
This function takes multiple inputs and is composed of 2 for loops. To upsample my arrays, I loop over this function with a parameter k, and I would like to get rid of this loop (which sits outside of the function). I tried using a mix of map() and list-comprehension to minimize my computing time but I can't make it work.
Question:
How to get my map() part of the code working (see last section of code) ? Is there a better way than map() to get rid of for loops ?
Summary:
Function interpolate_and_get_results: 2 for loops. Takes 3D, 2D arrays and int as inputs
This function is ran inside a for loop (parameter k) that I want to get rid of
I wrote some example code, the part with map() does not work because I can't think of a way to pass the k parameter as a list, but also an input.
Thank you !
ps: code to parallelize the interpolation function that I do not use for this example
import numpy as np
import time
#%% --- SETUP OF THE PROBLEM --- %%#
temperatures = np.random.rand(10,4,7)*100
precipitation = np.random.rand(10,4,7)
snow = np.random.rand(10,4,7)
# Flatten the arrays to make them iterable with map()
temperatures = temperatures.reshape(10,4*7)
precipitation = precipitation.reshape(10,4*7)
snow = snow.reshape(10,4*7)
# Array of altitudes to "adjust" the temperatures
alt = np.random.rand(4,7)*1000
# Flatten the array
alt = alt.reshape(4*7)
# Weight Matrix
w = np.random.rand(4*7, 1000, 1000)
#%% Function
def interpolate_and_get_results(temp, prec, Eprec, w, i, k):
# Do some calculations
factor1 = ((temperatures[i,k]-272.15) + (-alt[k] * -6/1000))
factor2 = precipitation[i,k]
factor3 = snow[i,k]
# Iterate through every cell of the upsampled arrays
for i in range(w.shape[1]):
for j in range(w.shape[2]):
val = w[k, i, j]
temp[i, j] += factor1 * val
prec[i, j] += factor2 * val
Eprec[i, j] += factor3 * val
#%% --- Function call without loop simplification --- ##%
# Prepare a template array
dummy = np.zeros((w.shape[1], w.shape[2]))
# Initialize the global arrays to be filled
tempYEAR2 = np.zeros((9, dummy.shape[0], dummy.shape[1]))
precYEAR2 = np.zeros((9, dummy.shape[0], dummy.shape[1]))
EprecYEAR2 = np.zeros((9, dummy.shape[0], dummy.shape[1]))
ts = time.time()
for i in range(temperatures.shape[0]):
# Create empty host arrays
temp = dummy.copy()
prec = dummy.copy()
Eprec = dummy.copy()
for k in range(w.shape[0]):
interpolate_and_get_results(temp, prec, Eprec, w, i, k)
print('Time: ', (time.time()-ts))
#%% --- With Map (DOES NOT WORK) --- %%#
del k
dummy = np.zeros((w.shape[1], w.shape[2]))
# Initialize the global arrays to be filled
tempYEAR2 = np.zeros((9, dummy.shape[0], dummy.shape[1]))
precYEAR2 = np.zeros((9, dummy.shape[0], dummy.shape[1]))
EprecYEAR2 = np.zeros((9, dummy.shape[0], dummy.shape[1]))
# Create a list k to be iterated through with the map() function
k = [k for k in range(0, temperatures.shape[1])]
for i in range(temperatures.shape[0]):
# Create empty host arrays
temp = dummy.copy()
prec = dummy.copy()
Eprec = dummy.copy()
# Call the interpolate function with map() iterating through k
map(interpolate_and_get_results(temp, prec, Eprec, w, i, k), k)
Code from #Jérôme Richard using numba added at the request of user #ken (takes 48.81s to run on my pc):
import numpy as np
import multiprocessing as mp
import time
#%% ------ Create data ------ ###
temperatures = np.random.rand(10,4,7)*100
precipitation = np.random.rand(10,4,7)
snow = np.random.rand(10,4,7)
# Array of altitudes to "adjust" the temperatures
alt = np.random.rand(4,7)*1000
#%% ------ IDW Interpolation ------ ###
# We create a weight matrix that we use to upsample our temperatures, precipitations and snow matrices
# This part is not that important, it works well as it is
MX,MY = np.shape(temperatures[0])
N = 300
T = np.zeros([N*MX+1, N*MY+1])
# create NxM inverse distance weight matrices based on Gaussian interpolation
x = np.arange(0,N*MX+1)
y = np.arange(0,N*MY+1)
X,Y = np.meshgrid(x,y)
k = 0
w = np.zeros([MX*MY,N*MX+1,N*MY+1])
for mx in range(MX):
for my in range(MY):
# Gaussian
add_point = np.exp(-((mx*N-X.T)**2+(my*N-Y.T)**2)/N**2)
w[k,:,:] += add_point
k += 1
sum_weights = np.sum(w, axis=0)
for k in range(MX*MY):
w[k,:,:] /= sum_weights
#%% --- Function --- %%#
# Code from Jérôme Richard: https://stackoverflow.com/questions/72399050/parallelize-three-nested-loops/72399494?noredirect=1#comment127919686_72399494
import numba as nb
# get_results + interpolator
#nb.njit('void(float64[:,::1], float64[:,::1], float64[:,::1], float64[:,:,::1], int_, int_, int_, int_)', parallel=True)
def interpolate_and_get_results(temp, prec, Eprec, w, i, k, mx, my):
factor1 = ((temperatures[i,mx,my]-272.15) + (-alt[mx, my] * -6/1000))
factor2 = precipitation[i,mx,my]
factor3 = snow[i,mx,my]
# Filling the
for i in nb.prange(w.shape[1]):
for j in range(w.shape[2]):
val = w[k, i, j]
temp[i, j] += factor1 * val
prec[i, j] += factor2 * val
Eprec[i, j] += factor3 * val
#%% --- Main Loop --- %%#
ts = time.time()
if __name__ == '__main__':
dummy = np.zeros((w.shape[1], w.shape[2]))
# Initialize the permanent arrays to be filled
tempYEAR = np.zeros((9, dummy.shape[0], dummy.shape[1]))
precYEAR = np.zeros((9, dummy.shape[0], dummy.shape[1]))
EprecYEAR = np.zeros((9, dummy.shape[0], dummy.shape[1]))
smbYEAR = np.zeros((9, dummy.shape[0], dummy.shape[1]))
# Initialize semi-permanent array
smb = np.zeros((dummy.shape[0], dummy.shape[1]))
# Loop over the "time" axis
for i in range(0, temperatures.shape[0]):
# Create empty semi-permanent arrays
temp = dummy.copy()
prec = dummy.copy()
Eprec = dummy.copy()
# Loop over the different weights
for k in range(w.shape[0]):
# Loops over the cells of the array to be upsampled
for mx in range(MX):
for my in range(MY):
interpolate_and_get_results(temp, prec, Eprec, w, i, k, mx, my)
# At each timestep, update the semi-permanent array using the results from the interpolate function
smb[np.logical_and(temp <= 0, prec > 0)] += prec[np.logical_and(temp <= 0, prec > 0)]
# Fill the permanent arrays (equivalent of storing the results at the end of every year)
# and reinitialize the semi-permanent array every 5th timestep
if i%5 == 0:
# Permanent
tempYEAR[int(i/5)] = temp
precYEAR[int(i/5)] = prec
EprecYEAR[int(i/5)] = Eprec
smbYEAR[int(i/5)] = smb
# Semi-permanent
smb = np.zeros((dummy.shape[0], dummy.shape[1]))
print("Time spent:", time.time()-ts)

Note: This answer is not about how to use map, it's about "a better way".
You are doing a lot of redundant calculations. Believe it or not, this code outputs the same result.
# No change in the initialization section above.
ts = time.time()
if __name__ == '__main__':
dummy = np.zeros((w.shape[1], w.shape[2]))
# Initialize the permanent arrays to be filled
tempYEAR = np.zeros((9, dummy.shape[0], dummy.shape[1]))
precYEAR = np.zeros((9, dummy.shape[0], dummy.shape[1]))
EprecYEAR = np.zeros((9, dummy.shape[0], dummy.shape[1]))
smbYEAR = np.zeros((9, dummy.shape[0], dummy.shape[1]))
smb = np.zeros((dummy.shape[0], dummy.shape[1]))
temperatures_inter = temperatures - 272.15
w_inter = w.sum(axis=0)
alt_inter = (alt * (-6 / 1000)).sum()
for i in range(0, temperatures_inter.shape[0]):
temp_i = (temperatures_inter[i].sum() - alt_inter) * w_inter
prec_i = precipitation[i].sum() * w_inter
Eprec_i = snow[i].sum() * w_inter
condition = np.logical_and(temp_i <= 0, prec_i > 0)
smb[condition] += prec_i[condition]
if i % 5 == 0:
tempYEAR[i // 5] = temp_i
precYEAR[i // 5] = prec_i
EprecYEAR[i // 5] = Eprec_i
smbYEAR[i // 5] = smb
smb = np.zeros((dummy.shape[0], dummy.shape[1]))
print("Time spent:", time.time() - ts)
I verified the results by comparing them to the output of the code that uses numba. The difference is about 0.0000001, which is probably caused by rounding error.
print((tempYEAR_from_yours - tempYEAR_from_mine).sum()) # -8.429287845501676e-08
print((precYEAR_from_yours - precYEAR_from_mine).sum()) # 2.595697878859937e-09
print((EprecYEAR_from_yours - EprecYEAR_from_mine).sum()) # -7.430216442116944e-09
print((smbYEAR_from_yours - smbYEAR_from_mine).sum()) # -6.875431779462815e-09
On my PC, this code took 0.36 seconds. It does not use numba and is not even parallelized. It just eliminated redundancy.

Python numpy: Simplify operation on multiple matrices

I have 3 numpy matrices:
One contains pixels positions in X (x_pos), another pixel positions in Y (y_pos) and a last one containing pixel values (p_value)
I would like to use these 3 matrices to build a results image
With loops I have this result:
#Resulting image
res = np.zeros((128,128,3), dtype = np.uint8)
for i in range(x_pos.shape[0]):
for j in range(x_pos.shape[1]):
# Get coordinates
x = x_pos[i][j]
y = y_pos[i][j]
res[y,x] = p_value[i][j]
With large matrices (2048*2048) this code already takes a lot of time. Is it possible to optimize this code without using a nested loop?
I specify that the positions in the pos_x and pos_y matrices do not necessarily follow each other, there may be holes or duplicate values

It should be possible using np.meshgrid
i = np.arange(0, x.shape[0])
j = np.arange(0, x.shape[1])
i_1, j_1 = np.meshgrid(i, j, indexing='ij')
res[y_1.ravel(),x_1.ravel()] = p_value[i_1.ravel(),j_1.ravel()]

First use consistent numpy 2d array indexing:
x = x_pos[i,j]
y = y_pos[i,j]
res[y,x] = p_value[i,j]
Now instead of scalar i,j use arrays
i = np.arange(n); j = np.arange(m)
You didn't provida [mcve] so I won't try to demonstrate that th

Thanks to #hpaulj and #ai2ys answer the problem is solved.
Here is a comparison of the results in terms of execution speed:
import numpy as np
import cv2
import time
m_size = 4096
m_x = np.random.randint(0,m_size,(m_size,m_size), dtype = np.uint16)
m_y = np.random.randint(0,m_size,(m_size,m_size), dtype = np.uint16)
p_value = np.ones((m_size,m_size), dtype = np.uint8)
#Meshgrid method:
out = np.zeros((m_size,m_size),dtype=np.uint8)
start = time.time()
i = np.arange(0, m_x.shape[0])
j = np.arange(0, m_x.shape[1])
i_1, j_1 = np.meshgrid(i, j, indexing='ij')
out[m_x.ravel(),m_y.ravel()] = p_value[i_1.ravel(),j_1.ravel()]
end = time.time()
print("Meshgrid: {} s".format(end - start))
#No for loop method:
out = np.zeros((m_size,m_size),dtype=np.uint8)
start = time.time()
i = np.arange(m_x.shape[0])
j = np.arange(m_y.shape[1])
x = m_x[i,j]
y = m_y[i,j]
out[x,y] = p_value[i,j]
end = time.time()
print("No loop: {} s".format(end - start))
#For loop method:
out = np.zeros((m_size,m_size),dtype=np.uint8)
start = time.time()
for i in range(m_x.shape[0]):
for j in range(m_y.shape[1]):
x = m_x[i,j]
y = m_y[i,j]
out[x,y] = p_value[i,j]
end = time.time()
print("Nested loop: {} s".format(end - start))
#Output:
Meshgrid: 0.4837045669555664 s
No loop: 0.3600656986236572 s
Nested loop: 13.10097336769104 s

PyTorch: Vectorizing patch selection from a batch of images

Suppose I have a batch of images as a tensor, for example:
images = torch.zeros(64, 3, 1024, 1024)
Now, I want to select a patch from each of those images. All the patches are of the same size, but have different starting positions for each image in the batch.
size_x = 100
size_y = 100
start_x = torch.zeros(64)
start_y = torch.zeros(64)
I can achieve the desired result like that:
result = []
for i in range(arr.shape[0]):
result.append(arr[i, :, start_x[i]:start_x[i]+size_x, start_y[i]:start_y[i]+size_y])
result = torch.stack(result, dim=0)
The question is -- is it possible to do the same thing faster, without a loop? Perhaps there is some form of advanced indexing, or a PyTorch function that can do this?

You can use torch.take to get rid of a for loop. But first, an array of indices should be created with this function
def convert_inds(img_a,img_b,patch_a,patch_b,start_x,start_y):
all_patches = np.zeros((len(start_x),3,patch_a,patch_b))
patch_src = np.zeros((patch_a,patch_b))
inds_src = np.arange(patch_b)
patch_src[:] = inds_src
for ind,info in enumerate(zip(start_x,start_y)):
x,y = info
if x + patch_a + 1 > img_a: return False
if y + patch_b + 1 > img_b: return False
start_ind = img_b * x + y
end_ind = img_b * (x + patch_a -1) + y
col_src = np.linspace(start_ind,end_ind,patch_b)[:,None]
all_patches[ind,:] = patch_src + col_src
return all_patches.astype(np.int)
As you can see, this function essentially creates the indices for each patch you want to slice. With this function, the problem can be easily solved by
size_x = 100
size_y = 100
start_x = torch.zeros(64)
start_y = torch.zeros(64)
images = torch.zeros(64, 3, 1024, 1024)
selected_inds = convert_inds(1024,1024,100,100,start_x,start_y)
selected_inds = torch.tensor(selected_inds)
res = torch.take(images,selected_inds)
UPDATE
OP's observation is correct, the approach above is not faster than a naive approach. In order to avoid building indices every time, here is another solution based on unfold
First, build a tensor of all the possible patches
# create all possible patches
all_patches = images.unfold(2,size_x,1).unfold(3,size_y,1)
Then, slice the desired patches from all_patches
img_ind = torch.arange(images.shape[0])
selected_patches = all_patches[img_ind,:,start_x,start_y,:,:]

How to run scipy's ndimage.generic_filter() faster using parallel programming?

scipy generic_filter can be very slow when it is applied to a large N dimensional array. I am wondering if it can be parallelized using multiple cores since the filtering of each moving window is independent process.

I did a function that makes the generic_filter in parallel using the multiprocessing library. It divides the big array in blocks in the rows direction and makes each block in parallel.
def generic_filter_parallel(in_array,num_arrays,function_generic_filter,size):
"""
Parameters
----------
in_array : 2D numpy array
num_arrays : int
number of arrays into which the big array will be spllited
function_generic_filter : function
function to use in the generic filter
size : tupple
size of the filter
Returns
-------
filter_array : 2D numpy array
array filtered
"""
#get size for each dimension
window_size_rows = size[0]
window_size_cols = size[1]
##split arrays
array_split = np.array_split(in_array, num_arrays)
# generate a list of nrows as it can be different depending on how you split the array
nrows = []
for i_array in array_split:
nrows.append(i_array.shape[0])
# add borders in the upper and lower part of each spliited array
list_split_arrays = []
for n_array in range(num_arrays):
if n_array == 0:
aux = np.vstack((array_split[n_array], array_split[n_array + 1][0:window_size_rows, :]))
list_split_arrays.append(aux)
elif n_array < num_arrays - 1:
aux = np.vstack((array_split[n_array - 1][-window_size_rows::, :], array_split[n_array], array_split[n_array + 1][0:window_size_rows, :]))
list_split_arrays.append(aux)
else:
aux = np.vstack((array_split[n_array - 1][-window_size_rows::, :], array_split[n_array]))
list_split_arrays.append(aux)
#process in parallel each splitted array individually
pool = Pool(processes=num_arrays)
res_pool = pool.imap(partial(ndimage.generic_filter, function=function_generic_filter, size=(window_size_rows, window_size_cols)), list_split_arrays)
pool.close()
#collect the results
list_filters_parallel = []
for i in res_pool:
list_filters_parallel.append(i)
#compose the final image with each individual splitted array done in parallel
filter_array = np.zeros(in_array.shape)
ini_row = 0
fin_row = 0
for n_array in range(num_arrays):
if n_array == 0:
ini_row = 0
fin_row = nrows[n_array]
else:
ini_row = ini_row + nrows[n_array - 1]
fin_row = fin_row + nrows[n_array]
if n_array == 0:
aux = list_filters_parallel[n_array][ini_row:fin_row, :]
filter_array[ini_row:fin_row, :] = aux
elif n_array < num_arrays - 1:
aux = list_filters_parallel[n_array][window_size_rows:-window_size_rows, :]
filter_array[ini_row:fin_row, :] = aux
else:
aux = list_filters_parallel[n_array][window_size_rows::, :]
filter_array[ini_row:fin_row, :] = aux
return filter_array

Numpy convert list of 3D variable size volumes to 4D array

I'm working on a neural network where I am augmenting data via rotation and varying the size of each input volume.
Let me back up, the input to the network is a 3D volume. I generate variable size 3D volumes, and then pad each volume with zero's such that the input volume is constant. Check here for an issue I was having with padding (now resolved).
I generate a variable size 3D volume, append it to a list, and then convert the list into a numpy array. At this point, padding hasn't occured so converting it into a 4D tuple makes no sense...
input_augmented_matrix = []
label_augmented_matrix = []
for i in range(n_volumes):
if i % 50 == 0:
print ("Augmenting step #" + str(i))
slice_index = randint(0,n_input)
z_max = randint(5,n_input)
z_rand = randint(3,5)
z_min = z_max - z_rand
x_max = randint(75, n_input_x)
x_rand = randint(60, 75)
x_min = x_max - x_rand
y_max = randint(75, n_input_y)
y_rand = randint(60, 75)
y_min = y_max - y_rand
random_rotation = randint(1,4) * 90
for j in range(2):
temp_volume = np.empty((z_rand, x_rand, y_rand))
k = 0
for z in range(z_min, z_max):
l = 0
for x in range(x_min, x_max):
m = 0
for y in range(y_min, y_max):
if j == 0:
#input volume
try:
temp_volume[k][l][m] = input_matrix[z][x][y]
except:
pdb.set_trace()
else:
#ground truth volume
temp_volume[k][l][m] = label_matrix[z][x][y]
m = m + 1
l = l + 1
k = k + 1
temp_volume = np.asarray(temp_volume)
temp_volume = np.rot90(temp_volume,random_rotation)
if j == 0:
input_augmented_matrix.append(temp_volume)
else:
label_augmented_matrix.append(temp_volume)
input_augmented_matrix = np.asarray(input_augmented_matrix)
label_augmented_matrix = np.asarray(label_augmented_matrix)
The dimensions of input_augmented_matrix at this point is (N,)
Then I pad with the following code...
for i in range(n_volumes):
print("Padding volume #" + str(i))
input_augmented_matrix[i] = np.lib.pad(input_augmented_matrix[i], ((0,n_input_z - int(input_augmented_matrix[i][:,0,0].shape[0])),
(0,n_input_x - int(input_augmented_matrix[i][0,:,0].shape[0])),
(0,n_input_y - int(input_augmented_matrix[i][0,0,:].shape[0]))),
'constant', constant_values=0)
label_augmented_matrix[i] = np.lib.pad(label_augmented_matrix[i], ((0,n_input_z - int(label_augmented_matrix[i][:,0,0].shape[0])),
(0,n_input_x - int(label_augmented_matrix[i][0,:,0].shape[0])),
(0,n_input_y - int(label_augmented_matrix[i][0,0,:].shape[0]))),
'constant', constant_values=0)
At this point, the dimensions are still (N,) even though every element of the list is constant. For example input_augmented_matrix[0] = input_augmented_matrix[1]
Currently I just loop through and create a new array, but it takes too long and I would prefer some sort of method that automates this. I do it with the following code...
input_4d = np.empty((n_volumes, n_input_z, n_input_x, n_input_y))
label_4d = np.empty((n_volumes, n_input_z, n_input_x, n_input_y))
for i in range(n_volumes):
print("Converting to 4D tuple #" + str(i))
for j in range(n_input_z):
for k in range(n_input_x):
for l in range(n_input_y):
input_4d[i][j][k][l] = input_augmented_matrix[i][j][k][l]
label_4d[i][j][k][l] = label_augmented_matrix[i][j][k][l]
Is there a cleaner and faster way to do this?

As I understood this part
k = 0
for z in range(z_min, z_max):
l = 0
for x in range(x_min, x_max):
m = 0
for y in range(y_min, y_max):
if j == 0:
#input volume
try:
temp_volume[k][l][m] = input_matrix[z][x][y]
except:
pdb.set_trace()
else:
#ground truth volume
temp_volume[k][l][m] = label_matrix[z][x][y]
m = m + 1
l = l + 1
k = k + 1
You just want to do this
temp_input = input_matrix[z_min:z_max, x_min:x_max, y_min:y_max]
temp_label = label_matrix[z_min:z_max, x_min:x_max, y_min:y_max]
and then
temp_input = np.rot90(temp_input, random_rotation)
temp_label = np.rot90(temp_label, random_rotation)
input_augmented_matrix.append(temp_input)
label_augmented_matrix.append(temp_label)
Here
input_augmented_matrix[i] = np.lib.pad(
input_augmented_matrix[i],
((0,n_input_z - int(input_augmented_matrix[i][:,0,0].shape[0])),
(0,n_input_x - int(input_augmented_matrix[i][0,:,0].shape[0])),
(0,n_input_y - int(input_augmented_matrix[i][0,0,:].shape[0]))),
'constant', constant_values=0)
Better to do this, because shape property gives you size of array by all dimensions
ia_shape = input_augmented_matrix[i].shape
input_augmented_matrix[i] = np.lib.pad(
input_augmented_matrix[i],
((0, n_input_z - ia_shape[0]),
(0, n_input_x - ia_shape[1])),
(0, n_input_y - ia_shape[2]))),
'constant',
constant_values=0)
I guess now you're ready to refactor the last part of your code with magic indexing of NumPy.
My common suggestions:
use functions for repeated parts of code to avoid such indents like in your cascade of loops;
if you need so lot of nested loops, think about recursion, if you can't deal without them;
explore abilities of NumPy in official documentation: they're really exciting ;) For example, indexing is helpful for this task;
use PyLint and Flake8 packages to inspect quality of your code.
Do you want to write neural network by yourself, or you just want to solve some patterns recognition task? SciPy library may contain what you need and it's based on NumPy.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Fast Circular buffer in python than the one using deque? - python

Related

Using map() on a function with multiple inputs to get rid of for loops

Python numpy: Simplify operation on multiple matrices

PyTorch: Vectorizing patch selection from a batch of images

How to run scipy's ndimage.generic_filter() faster using parallel programming?

Numpy convert list of 3D variable size volumes to 4D array

Categories

Resources