I have a 5D array with shape (80, 180, 144, 160, 11) (80 3D-images of size 180*144*160 each with 11 channels) and a set of indices referring to this array with shape (n, 4) (that is n indices referring to which image and which 3D-pixel I am interested in).
Now to the question, I want to extract "blocks" with shape (18, 18, 20) centered around every index and preserving all channels. This will yield an ndarray of shape (n, 18, 18, 20, 11). Also, if an index is too close to the border of the 3D-image as to not fit the entire block then I want to 0-pad the image.
I have managed to do this myself with a for-loop over every index but the performance is rather poor unfortunately (~10 s for n=100). I need to do this for ns in the range of 10 000 - 1 000 000 so my solution is not really an option.
My attempt where the images are given in images and the indices in block_indices:
block_shape = (18, 18, 20)
blocks = np.empty((0,) + block_shape + (11,))
for index in block_indices:
    block = np.pad(images[index[0]], ((block_shape[0], block_shape[0]),
                                      (block_shape[1], block_shape[1]),
                                      (block_shape[2], block_shape[2]),
                                      (0, 0)))[index[1]+int(block_shape[0]/2):index[1]+int(3*block_shape[0]/2),
                                               index[2]+int(block_shape[1]/2):index[2]+int(3*block_shape[1]/2),
                                               index[3]+int(block_shape[2]/2):index[3]+int(3*block_shape[2]/2),
                                               ...]
    blocks = np.append(blocks, block[np.newaxis, ...], axis=0)
I was thinking that this can probably be done really quickly with slicing and fancy array indexing but I have tried to no avail. Do you have any suggestions how this can be done more quickly? Thanks in advance!
PS: The numbers presented can vary a bit but should give you a rough idea of the scale.
For anyone looking to do the same thing in the future
I have managed to come up with another solution which is a lot faster and scales better. It involves use of a "shifting" block matrix, np.tile, flattening and some reshaping. One caveat is that the indices of the blocks need to be given in a 1D array of length n where each index corresponds to the index in a flattened array of 3D-images. One can quite easily convert between these different representations however.
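The conversion between the (n, 4) representation and flat indices mentioned above could look roughly like this. It is only a sketch: the stack shape is taken from the question, block_indices_4d is a made-up example array, and np.ravel_multi_index / np.unravel_index do the actual work.

```python
import numpy as np

# Shape of the image stack without the channel axis (from the question).
stack_shape = (80, 180, 144, 160)

# Hypothetical (n, 4) indices: one (image, i, j, k) row per block centre.
block_indices_4d = np.array([[0, 90, 72, 80],
                             [3, 10, 20, 30]])

# (n, 4) -> flat indices into images.reshape(-1, 11).
flat_indices = np.ravel_multi_index(tuple(block_indices_4d.T), stack_shape)

# Flat indices -> (n, 4) again.
round_trip = np.stack(np.unravel_index(flat_indices, stack_shape), axis=1)
print(np.array_equal(round_trip, block_indices_4d))  # True
```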
For brevity I will only explain the main concepts of the method and then post a working code example, here goes.
Main concepts:
First we flatten our images array so that it gets shape (80*180*144*160,11).
Now we need to come to the realisation that the blocks we are after can be accessed from the flattened array according to a predictable pattern which is only shifted along depending on the location of the block.
These elements can be taken out with np.take so long as we know the indices.
Lastly the result of np.take can be reshaped into an array of blocks.
Working code example:
# Make a 3D-image which enumerates all pixels.
image_pixel_enumeration = np.arange(180*144*160).reshape(180, 144, 160)
# Get the index pattern of a block.
block_shifts = image_pixel_enumeration[:block_shape[0], :block_shape[1], :block_shape[2]].flatten() \
- image_pixel_enumeration[int(block_shape[0]/2), int(block_shape[1]/2), int(block_shape[2]/2)]
# Tile an array with the pattern, one for each block.
block_shifts = np.tile(block_shifts, (len(block_indices), 1))
# Tile an array with the block center indices add to them the pattern.
validation_data_indices = np.tile(block_indices, (np.prod(block_shape), 1)).transpose() + block_shifts
# Take out elements (x_test is the 5D image array, called images in the question).
validation_data = np.take(x_test.reshape((-1, 11)), validation_data_indices.flatten(), 0, mode='clip')
# Reshape into blocks.
validation_data = validation_data.reshape((-1,) + block_shape + (11,))
This method takes (on my machine) approximately 0.1 s, 0.2 s and 1.4 s for 10, 100 and 1 000 indices respectively whilst the old method took approximately 1 s, 16 s and 900 s for the same number of indices. A massive improvement!
PS. Note that this solution does not solve the issue of blocks extending beyond the original image and can potentially pick pixels from the wrong images or wrong slices in these cases.
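For the border issue mentioned in the PS, one possible approach (not the method above, just a sketch) is to build per-axis coordinate arrays with broadcasting, clip them into range for the lookup, and then zero out every position that fell outside the image. The function name extract_blocks and the tiny demo arrays are made up for illustration. For even block lengths the centring matches the loop in the question.

```python
import numpy as np

def extract_blocks(images, block_indices, block_shape):
    """Gather zero-padded blocks around (image, i, j, k) indices.

    images: (num_images, X, Y, Z, channels); block_indices: (n, 4) ints.
    """
    block_indices = np.asarray(block_indices)
    sizes = images.shape[1:4]
    coords, valid = [], []
    for axis in range(3):
        # Offsets relative to the centre, e.g. -9..8 for a block length of 18.
        offsets = np.arange(block_shape[axis]) - block_shape[axis] // 2
        c = block_indices[:, axis + 1, None] + offsets  # shape (n, block_len)
        valid.append((c >= 0) & (c < sizes[axis]))
        coords.append(np.clip(c, 0, sizes[axis] - 1))
    img = block_indices[:, 0, None, None, None]
    # The four index arrays broadcast to (n, bx, by, bz); channels are kept.
    blocks = images[img,
                    coords[0][:, :, None, None],
                    coords[1][:, None, :, None],
                    coords[2][:, None, None, :]]
    inside = (valid[0][:, :, None, None]
              & valid[1][:, None, :, None]
              & valid[2][:, None, None, :])
    blocks[~inside] = 0  # zero out positions that fell outside the image
    return blocks

# Tiny made-up example.
demo_images = np.arange(2 * 6 * 5 * 7 * 3, dtype=float).reshape(2, 6, 5, 7, 3)
demo_indices = np.array([[0, 0, 0, 0], [1, 5, 4, 6], [1, 3, 2, 3]])
demo_blocks = extract_blocks(demo_images, demo_indices, (4, 2, 4))
print(demo_blocks.shape)  # (3, 4, 2, 4, 3)
```

With n in the hundreds of thousands the result itself gets very large, so it is probably necessary to call this on chunks of block_indices rather than on all of them at once.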
I am using Scikit-Image imread function for reading images for a PyTorch data loader.
I get errors from the function ToTensor(), saying that the strides of the NumPy array are negative.
I read about it and using somearray.copy() solves it.
Yet, I'd like to solve it from the root. How can I force Scikit-Image to read the image into a contiguous array with regular strides?
I looked for solutions for this case and they are mostly about creating a new copy of the data, which I want to avoid.
Those are the properties of the array:
print(f'shape: {img.shape}')
print(f'dtype: {img.dtype}')
print(f'strides: {img.strides}')
The output:
shape: (4032, 3024, 3)
dtype: uint8
strides: (3, -12096, 1)
When I run img.base I get the values of the data, though its dimensions are (3024, 4032, 3).
I don't know a lot about image file formats, but can make some deductions from the data you provided
shape: (4032, 3024, 3)
dtype: uint8
strides: (3, -12096, 1)
img.base (3024, 4032, 3)
img is a view of its base. The negative strides[1] means that dimension has been reversed, e.g. with a ::-1 indexing. The fact that the largest stride is in the middle, means the first two dimensions have been swapped (transpose(1,0,2)). I expect img.base.strides is (12096,3,1). 12096 is 3*4032.
jpg is a compressed format, but I assume the base is close in layout to the file, and this view is needed to conform to our normal numpy expectations for an array.
img.copy() will have the same shape, but strides will be (9072,3,1).
If plt.imread produces an array with that shape and strides, it may well have returned that copy rather than the view. It's not necessarily being any more "efficient".
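The deduction above can be checked with a stand-in array. This is only a sketch: base here is a zero-filled placeholder with the reported base shape, not the actual image data, and np.ascontiguousarray is shown as one way to get positive strides (it does allocate a new buffer).

```python
import numpy as np

# Stand-in for img.base: a C-contiguous array with the reported base shape.
base = np.zeros((3024, 4032, 3), dtype=np.uint8)
print(base.strides)  # (12096, 3, 1)

# Swap the first two axes, then reverse the (new) second axis.
view = base.transpose(1, 0, 2)[:, ::-1, :]
print(view.shape, view.strides)  # (4032, 3024, 3) (3, -12096, 1)

# A contiguous copy has ordinary positive strides, at the cost of new memory.
contiguous = np.ascontiguousarray(view)
print(contiguous.strides)  # (9072, 3, 1)
print(np.may_share_memory(view, base), np.may_share_memory(contiguous, base))  # True False
```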
Think about how we print a 2d array - 1st dimension, rows, going down, 2nd, columns, going across, left to right. But think about a common xy plot - x goes left to right, and y goes from bottom up. Or look at what np.meshgrid says about indexing, 'ij' versus 'xy'.
Having the size 3 dimension last is just another convention. That's the color 'channel', 3 for RGB, 4 adds a transparency value, and 1 for b/w. Sometimes arrays have that dimension first.
I know this question has been asked before (I did a pretty thorough search), and recognize Python intentionally doesn't really want you to do this. And, if you create a readable NumPy array that references locations in memory (where your NumPy smaller matrix values are), then the matrix is no longer a contiguous array. Which may cause issues if you were to do certain things with it (Numba or Cython I suppose).
Nonetheless, looking for a smart answer, where we can still use this non-contiguous array in calculations, to not increase the memory footprint of a larger NumPy array. Yes, it's easiest to just resize the data (which will copy it), but that defeats the goal of minimizing memory in RAM. Here is a sample of what I'm doing on a very basic level:
So step 1) here I'm going to generate some random numbers and do it for 12 assets and 1000 simulations and save into the variable a:
import numpy as np
a = np.random.randn(12,1000)
Okay, let's look at its initial shape:
a.shape
(12,1000)
So now all I want to do is make these EXACT numbers available for say 20 iterations (vectorized, not using loops). But I DO NOT want to just make the matrix BIGGER. So my goal here is to have instead of a (12,1000) shape, a (12*20,1000) shape, with simple replication via pointers (or Python's version of them) and not just copy the (12,1000) matrix into more memory. The same numbers are used in this example 20 times (all at once) when passed into another function. They never get overwritten either (read-only is fine, with views). I could explain the reason why but it's pretty complex math; all you need to know is that the function needs the original random numbers replicated exactly. The brainless mem copy routine would be something like:
b = np.resize(a, (12*20,1000))
Which does what I want on the surface, with the new shape:
b.shape
(240, 1000)
As I can check that they are equal with a couple commands, first, the start of the array vs. the 2nd copy:
np.allclose(b[0:11,:],b[12:23,:])
True
And the end of the array vs. the 1st one:
np.allclose(b[0:11,:],b[228:239,:])
True
So great, that's what I want - a repeat of these random numbers through the whole array. BUT I don't want my memory to blow up (I am using HUGE arrays, that can't fit into most PC's memory - I am a quant developer with a ton of RAM, but end users don't have as much RAM as I do). So let us examine the size in memory of a and b:
a.nbytes
96000
b.nbytes
1920000
Which makes perfect sense since the memory of a has been multiplied by 20 to store all the repeated values, i.e.:
b.nbytes/a.nbytes
20.0
So of course, 20x the memory usage. So what I'm trying to get at here is quite simple (well, in other languages). It is to construct b so that the only overhead is just pointers to the 20 replications of a, so that the memory space is merely a + the pointer(s). Of course, the math has to work using this setup. I have seen some tricks using strides although I am not sure they will work here. I don't want to use loops either (the idea is in 1 run it's done, with 20 slightly different inputs). So if ANYONE has figured out a way to do this without using a ton of memory (compared to the base case, here the variable a, versus the replicated array, here the variable b), I would like to know your approach.
Any help is greatly appreciated!
First, your use of resize actually does (summarizing the code)
a = np.concatenate((a,) * 20).reshape(new_shape)
I'm a little confused: you mention 20 repeats, but also talk about "20 slightly different inputs". Is that this array, or some other inputs? Also, what's the point of using a (240, 1000) shape instead of (20,12,1000)?
With broadcasting a (1,12,1000) can behave the same as (20,12,1000).
A small sample array:
In [646]: arr = np.arange(12).reshape(3,4)
In [647]: arr.shape, arr.strides
Out[647]: ((3, 4), (32, 8))
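We can "resize" with repeat (note that repeat duplicates each row in place, whereas np.resize tiles the whole array, so the row order differs):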
In [655]: arr1 =arr.repeat(5,0)
In [656]: arr1.shape, arr1.strides
Out[656]: ((15, 4), (32, 8))
Or repeat on a new leading axis:
In [657]: arr1 =arr[None,:,:].repeat(5,0)
In [658]: arr1.shape, arr1.strides
Out[658]: ((5, 3, 4), (96, 32, 8))
Or we can use broadcasting to make an equivalent array:
In [660]: arr2 = np.broadcast_to(arr,(5,3,4))
In [661]: arr2.shape, arr2.strides
Out[661]: ((5, 3, 4), (0, 32, 8))
It has the same shape as arr1, but the leading strides is 0.
In [662]: np.allclose(arr2, arr1)
Out[662]: True
arr1 is a copy of the original, but arr2 is a view (of the original arange used to make arr):
In [665]: arr1.base
In [666]: arr2.base
Out[666]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In other words, arr2 doesn't increase the memory footprint of arr.
Usually we don't have to use np.broadcast_to.
arr3 = arr[None,:,:]
is enough or even arr itself.
For example we can add a size 5 array to any of these:
In [670]: x = np.arange(5)[:,None,None]
In [671]: np.allclose(x+arr1, x+arr2)
Out[671]: True
In [672]: np.allclose(x+arr1, x+arr[None,:,:])
Out[672]: True
In [673]: np.allclose(x+arr1, x+arr)
Out[673]: True
The result will be the full size, same as arr1. Use of strides and broadcasting can reduce the size of initial arrays, compared to a 'repeat'. But the final result still holds 5*3*4 = 60 separately stored values.
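Applied to the (12, 1000) array from the question, the idea might look like the sketch below. The names b and b_flat are made up. Note that forcing the (240, 1000) shape with reshape cannot be done as a view, so it copies.

```python
import numpy as np

a = np.random.randn(12, 1000)

# Read-only view that behaves like 20 stacked copies of a.
b = np.broadcast_to(a, (20, 12, 1000))
print(b.strides)                   # (0, 8000, 8)
print(np.may_share_memory(a, b))   # True
print(b.flags.writeable)           # False

# Collapsing the first two axes needs real memory again.
b_flat = b.reshape(240, 1000)
print(np.may_share_memory(a, b_flat))  # False
```

So the memory saving only holds as long as the consuming function can work with the (20, 12, 1000) layout, or with a itself through broadcasting.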
I have searched the Internet to try and find a solution and have tried to make my own but can't seem to figure it out.
I need to take a 1D NumPy array and turn every consecutive run of 1024 values into a 32x32 array, continuing until the whole initial array has been consumed. To avoid errors, the last sub-array should simply be filled up with as many zeros as necessary.
Any help or guidance would be appreciated!
You don't really need to do much. First pad the array to the nearest multiple of 1024:
arr = np.random.rand(1024 * 5 - 100)
pad = -arr.size % 1024
if pad:
    arr = np.concatenate((arr, np.zeros(pad, dtype=arr.dtype)))
Then reshape into an array of shape (N, 32, 32):
imgs = arr.reshape(-1, 32, 32)
Now you have a stack of images. Indexing imgs or iterating over it will give you the individual (32, 32) images.
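The same two steps can be wrapped in a small helper. This is a sketch using np.pad instead of np.concatenate; the name to_tiles and the test input are made up.

```python
import numpy as np

def to_tiles(values, side=32):
    """Zero-pad a 1D array to a multiple of side*side and cut it into tiles."""
    values = np.asarray(values).ravel()
    pad = -values.size % (side * side)
    padded = np.pad(values, (0, pad), mode='constant')  # pads with zeros
    return padded.reshape(-1, side, side)

# 5020 non-zero values -> 100 zeros are appended to reach 5 * 1024.
tiles = to_tiles(np.arange(1, 1024 * 5 - 100 + 1))
print(tiles.shape)          # (5, 32, 32)
print(tiles[-1, -1, -4:])   # [0 0 0 0]
```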
I have the following situation. I have an array of size (3, 128, n) (where n is large). (This array represents a picture). I have a superresolution deep learning model that takes as input a (3, 128, 128) picture and gives it back in better quality. I want to apply my model to the whole picture.
My existing solution
My first solution to this problem is to split my array into arrays of size (3, 128, 128). I then have a list of square images, and I can apply my model to each of these squares and then concatenate all the results to get a new (3, 128, n) image. The problem with this method is that the model does not perform as well on the edges of the image.
My desired solution
To get around this problem, I have thought of an alternative solution. Instead of considering non-overlapping square images, I can consider all square images that can be extracted from my original image. I can pass all those images to my model. Then to reconstruct a point of coordinates (a, b, c), I will consider all reconstructed square pictures that contain c, and take an average of them. I want this average to give more weight to the squares where c is near the center.
To be more specific :
I start with a 3*128*n array (let's call it A). I pad on the left and on the right which gives me a new array (let's call it A_pad) of size 3*128*(n+2*127)
For i in range(0,n+127), let A_i = A_pad[:, :, i:i+128], A_i is of size (3*128*128) and can be fed to my model which creates a new array B_i of the same size.
Now I want a new array B of the same size as A that is defined like this : For each (x, y , z), B[x, y, z] is the mean of the 128 B_i[x, y, z+127-i] such that z <= i < z+128 with the weight 1 + min(z + 127 -i, i-z). That corresponds to taking the mean of all the windows that contain z with a weight proportional to the distance to the closest edge.
My question is based on the computation of B. Given what I've described, I could write multiple for loops that would yield the correct results, but I'm afraid it would be slow. I'm looking for a solution using numpy that is as fast as possible.
This is an example implementation that follows the steps you outlined in the section "My desired solution". It makes extensive use of np.lib.stride_tricks.as_strided which at first glance might not seem obvious at all; I added detailed comments to each usage for clarification. Also note that in your description you use z to denote the column position within images while in comments I use the term n-position in order to comply with the shape specification via n.
Regarding efficiency it's not obvious whether this is a winner or not. Computation happens all in numpy but the expression sliding_128 * weights builds a large array (128x the size of the original image) before reducing it along the frame dimension. This definitely comes at its cost, memory might even be an issue. A loop might come in handy at this position.
Lines which contain a comment prefixed with # [TEST] were added for testing purposes. Concretely this means we're overwriting the weights for the final sum of frames with 1 / 128 in order to eventually recover the original image (since no ML model transformation is applied either).
import numpy as np
n = 640 # For example.
image = np.random.randint(0, 256, size=(3, 128, n))
print('image.shape: ', image.shape) # (3, 128, 640)
padded = np.pad(image, ((0, 0), (0, 0), (127, 127)), mode='edge')
print('padded.shape: ', padded.shape) # (3, 128, 894)
sliding = np.lib.stride_tricks.as_strided(
    padded,
    # Frames stored along first dimension; sliding across last dimension of `padded`.
    shape=(padded.shape[-1]-128+1, 3, 128, 128),
    # First dimension: Moving one frame ahead -> move across last dimension of `padded`.
    # Remaining three dimensions: Move as within `padded`.
    strides=(padded.strides[-1:] + padded.strides)
)
print('sliding.shape: ', sliding.shape) # (767, 3, 128, 128)
# Now at this part we would feed the frames `sliding` to the ML model,
# where the first dimension is the batch size.
# Assume the output is assigned to `sliding` again.
# Since we're not using an ML model here, we create a copy instead
# in order to update the strides of `sliding` with its actual shape (as defined above).
sliding = sliding.copy()
sliding_128 = np.lib.stride_tricks.as_strided(
    # Reverse last dimension since we want the last column from the first frame.
    # Need to copy again because `[::-1]` creates a view with negative stride,
    # but we want actual reversal to work with the strides below.
    # (There's perhaps a smart way of adjusting the strides below in order to not make a copy here.)
    sliding[:, :, :, ::-1].copy(),
    # Second dimension corresponds to the 128 consecutive frames.
    # Previous last dimension is dropped since we're selecting the
    # column that corresponds to the current n-position.
    shape=(128, n, 3, 128),
    # First dimension (frame position): Move one frame and one column ahead
    # (actually want to move one column less in `sliding` but since we reverted order of columns
    # we need to move one ahead now) -> move across first dimension of `sliding` + last dimension of `sliding`.
    # Second dimension (n-position): Moving one frame ahead -> move across first dimension of `sliding`.
    # Remaining two dimensions: Move within frames (channel and row dimensions).
    strides=((sliding.strides[0] + sliding.strides[-1],) + sliding.strides[:1] + sliding.strides[1:3])
)
print('sliding_128.shape: ', sliding_128.shape) # (128, 640, 3, 128)
# Weights are independent of the n-position -> we can precompute.
weights = 1 + np.concatenate([np.arange(64), np.arange(64)[::-1]])
weights = np.ones(shape=128) # [TEST] Assign weights for testing -> want to obtain the original image back.
weights = weights.astype(float) / weights.sum() # Normalize?
weights = weights[:, None, None, None] # Prepare for broadcasting.
weighted_image = np.moveaxis(np.sum(sliding_128 * weights, axis=0), 0, 2)
print('weighted_image.shape: ', weighted_image.shape) # (3, 128, 640)
assert np.array_equal(image, weighted_image.astype(int)) # [TEST]
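The remark above that a loop might come in handy could be sketched as follows. Instead of building the 128x larger product array, it loops over the 128 column positions inside a frame and accumulates one weighted slice at a time. The function name weighted_window_average is made up, frames is assumed to be the model output with shape (n + window - 1, C, H, window), and the demo uses a small window with an identity "model" so the original image should come back.

```python
import numpy as np

def weighted_window_average(frames, n, window=128):
    """Blend frames of shape (n + window - 1, C, H, window) into (C, H, n)."""
    # Weight 1 + distance to the closest window edge, normalised to sum to 1.
    weights = 1.0 + np.minimum(np.arange(window), np.arange(window)[::-1])
    weights /= weights.sum()
    acc = np.zeros((n,) + frames.shape[1:3])
    for j in range(window):
        # Column j of frame i shows image column z = i + j - (window - 1),
        # so frames start .. start + n - 1 cover z = 0 .. n - 1.
        start = window - 1 - j
        acc += weights[j] * frames[start:start + n, :, :, j]
    return np.moveaxis(acc, 0, 2)

# Made-up small example; no model is applied to the frames.
window, n = 8, 20
image = np.random.randint(0, 256, size=(3, 8, n))
padded = np.pad(image, ((0, 0), (0, 0), (window - 1, window - 1)), mode='edge')
frames = np.stack([padded[:, :, i:i + window] for i in range(n + window - 1)])
blended = weighted_window_average(frames, n, window)
print(np.allclose(blended, image))  # True
```

The temporary arrays here are only the size of one output image, so peak memory should stay close to that of the frames themselves.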
I am trying to vectorize an operation using numpy, which I use in a python script that I have profiled, and found this operation to be the bottleneck and so needs to be optimized since I will run it many times.
The operation is on a data set of two parts. First, a large set (n) of 1D vectors of different lengths (with maximum length, Lmax) whose elements are integers from 1 to maxvalue. The set of vectors is arranged in a 2D array, data, of size (num_samples,Lmax) with trailing elements in each row zeroed. The second part is a set of scalar floats, one associated with each vector, that I have computed and which depend on its length and the integer-value at each position. The set of scalars is made into a 1D array, Y, of size num_samples.
The desired operation is to form the average of Y over the n samples, as a function of (value,position along length,length).
This entire operation can be vectorized in matlab with use of the accumarray function: by using 3 2D arrays of the same size as data, whose elements are the corresponding value, position, and length indices of the desired final array:
sz_Y = num_samples;
sz_len = Lmax
sz_pos = Lmax
sz_val = maxvalue
ind_len = repmat( 1:sz_len ,1 ,sz_samples);
ind_pos = repmat( 1:sz_pos ,sz_samples,1 );
ind_val = data
ind_Y = repmat((1:sz_Y)',1 ,Lmax );
copiedY=Y(ind_Y);
mask = data>0;
finalarr=accumarray({ind_val(mask),ind_pos(mask),ind_len(mask)},copiedY(mask), [sz_val sz_pos sz_len])/sz_val;
I was hoping to emulate this implementation with np.bincount. However, np.bincount differs from accumarray in two relevant ways:
both arguments must be of same 1D size, and
there is no option to choose the shape of the output array.
In the above usage of accumarray, the list of indices, {ind_val(mask),ind_pos(mask),ind_len(mask)}, is a 1D cell array of 1x3 arrays used as index tuples, while in np.bincount it must be 1D scalars as far as I understand. I expect np.ravel may be useful but am not sure how to use it here to do what I want. I am coming to python from matlab and some things do not translate directly, e.g. the colon operator which ravels in opposite order to ravel. So my question is how might I use np.bincount or any other numpy method to achieve an efficient python implementation of this operation.
EDIT: To avoid wasting time: for these multi-D index problems with complicated index manipulation, is the recommended route to just use Cython to implement the loops explicitly?
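On the point about the colon operator and ravel: NumPy flattens in row-major (C) order by default, while MATLAB's m(:) is column-major. A small sketch with a made-up 2x3 array shows both orders, and that the same order argument exists for linear indices.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

c_order = m.ravel()            # row-major, the NumPy default
f_order = m.ravel(order='F')   # column-major, like MATLAB's m(:)
print(c_order)  # [1 2 3 4 5 6]
print(f_order)  # [1 4 2 5 3 6]

# The same choice applies when turning index tuples into linear indices.
lin_c = np.ravel_multi_index((0, 1), m.shape)
lin_f = np.ravel_multi_index((0, 1), m.shape, order='F')
print(lin_c, lin_f)  # 1 2
```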
EDIT2: Alternative Python implementation I just came up with.
Here is a heavy ram solution:
First precalculate:
Using index units for length (i.e., length 1 = 0), make a 4D bool array, size (num_samples,Lmax+1,Lmax+1,maxvalue+1), holding where the conditions are satisfied for each value in Y.
ALLcond=np.zeros((num_samples,Lmax+1,Lmax+1,maxvalue+1),dtype='bool')
for l in range(Lmax+1):
    for i in range(Lmax+1):
        for v in range(maxvalue+1):
            ALLcond[:,l,i,v]=(data[:,i]==v) & (Lvec==l)
Where Lvec=[len(row) for row in data]. Then get the indices for these using np.where and initialize a 4D float array into which you will assign the values of Y:
[ind_Y,ind_len,ind_pos,ind_val]=np.where(ALLcond)
Yval=np.zeros(np.shape(ALLcond),dtype='float')
Now in the loop in which I have to perform the operation, I compute it with the two lines:
Yval[ind_Y,ind_len,ind_pos,ind_val]=Y[ind_Y]
Y_avg=sum(Yval)/num_samples
This gives a factor of 4 or so speed up over the direct loop implementation. I was expecting more. Perhaps, this is a more tangible implementation for Python heads to digest. Any faster suggestions are welcome :)
One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])
mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape
This is assuming data has shape (n, Lmax), Lvec is a NumPy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:
shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:,None] # fill-value independent; defined before its first use below
len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)
lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape
Note broadcast_to was added in Numpy 1.10.0.
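To check the bincount approach against plain loops, a tiny end-to-end run might look like this. The data, Y and the way Lvec is derived (counting non-zero entries per row, which relies on values starting at 1 and trailing zeros) are assumptions made up for the example.

```python
import numpy as np

# Made-up toy input: 3 samples, Lmax = 3, values in 1..2, trailing zeros.
data = np.array([[1, 2, 0],
                 [2, 2, 1],
                 [1, 0, 0]])
Y = np.array([0.5, 1.5, 3.0])
n, Lmax = data.shape
maxvalue = data.max()
Lvec = (data > 0).sum(axis=1)  # assumed: row length = number of non-zero entries

shape = (Lmax + 1, Lmax + 1, maxvalue + 1)
posvec = np.arange(1, Lmax + 1)
mask = posvec <= Lvec[:, None]
lin_idx = np.ravel_multi_index(
    (np.repeat(Lvec, Lvec), np.broadcast_to(posvec, data.shape)[mask], data[mask]),
    shape)
Y_avg = np.bincount(lin_idx, weights=np.repeat(Y, Lvec),
                    minlength=int(np.prod(shape))).reshape(shape) / n

# Reference: explicit loops over samples and positions.
reference = np.zeros(shape)
for s in range(n):
    for p in range(Lvec[s]):
        reference[Lvec[s], p + 1, data[s, p]] += Y[s] / n

print(np.allclose(Y_avg, reference))  # True
```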