Let a be a numpy array of shape (n,m,k) and a_msk an array of shape (n,m) that masks elements of a through multiplication.
As far as I know, I have to add a new axis to a_msk to make it compatible with a for multiplication.
b = a * a_msk[:,:,np.newaxis]
Unfortunately, my Google Colab runtime is running out of memory at this very operation given the large size of the arrays.
My question is whether I can achieve the same thing without creating that new axis for the mask array.
As @hpaulj commented, adding an axis to make the two arrays "compatible" for broadcasting is the most straightforward way to do your multiplication.
Alternatively, you can move the last axis of your array a to the front which would also make the two arrays compatible (I wonder though whether this would solve your memory issue):
a = np.moveaxis(a, -1, 0)
Then you can simply multiply:
b = a * a_msk
However, to get your result you have to move the axis back:
b = np.moveaxis(b, 0, -1)
Example: both solutions return the same answer:
import numpy as np
a = np.arange(24).reshape(2, 3, 4)
a_msk = np.arange(6).reshape(2, 3)
print(f'newaxis solution:\n {a * a_msk[..., np.newaxis]}')
print()
print(f'moveaxis solution:\n {np.moveaxis((np.moveaxis(a, -1, 0) * a_msk), 0, -1)}')
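On the memory question: indexing with np.newaxis returns a view of a_msk, so the extra axis itself allocates nothing. The large allocation is the result array b. If a is not needed afterwards, multiplying in place avoids it. A minimal sketch, assuming both arrays are float64 (the dtypes and the small shapes here are illustrative assumptions):

```python
import numpy as np

# Small stand-ins for the large arrays (shapes and dtypes are assumptions)
a = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
a_msk = (np.arange(6).reshape(2, 3) % 2).astype(np.float64)

# Adding an axis creates a view, not a copy
expanded = a_msk[:, :, np.newaxis]
shares = np.shares_memory(a_msk, expanded)

# Reference result, computed the usual way (allocates a new array)
expected = a * expanded

# In-place multiplication writes into a and allocates no result array
a *= expanded
```

The in-place form overwrites a, so it only fits when the unmasked values are no longer required.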
I need to select only the non-zero 3d portions of a 3d binary array (or alternatively the true values of a boolean array). Currently I am able to do so with a series of 'for' loops that use np.any. This works but seems awkward and slow, so I am investigating a more direct way to accomplish the task.
I am rather new to numpy, so the approaches that I have tried include a) using np.nonzero, which returns indices that I am at a loss to understand what to do with for my purposes, b) boolean array indexing, and c) boolean masks. I can generally understand each of those approaches for simple 2d arrays, but am struggling to understand the differences between the approaches, and cannot get them to return the right values for a 3d array.
Here is my current function that returns a 3D array with nonzero values:
def real_size(arr3):
    true_0 = []
    true_1 = []
    true_2 = []
    print(f'The input array shape is: {arr3.shape}')
    for zero_ in range(arr3.shape[0]):
        if arr3[zero_].any():
            true_0.append(zero_)
    for one_ in range(arr3.shape[1]):
        if arr3[:, one_, :].any():
            true_1.append(one_)
    for two_ in range(arr3.shape[2]):
        if arr3[:, :, two_].any():
            true_2.append(two_)
    arr4 = arr3[min(true_0):max(true_0) + 1, min(true_1):max(true_1) + 1, min(true_2):max(true_2) + 1]
    print(f'The nonzero area is: {arr4.shape}')
    return arr4
# Then use it on a small test array:
test_array = np.zeros([2, 3, 4], dtype = int)
test_array[0:2, 0:2, 0:2] = 1
#The function call works and prints out as expected:
non_zero = real_size(test_array)
>> The input array shape is: (2, 3, 4)
>> The nonzero area is: (2, 2, 2)
# So, the array is correct, but likely not the best way to get there:
non_zero
>> array([[[1, 1],
           [1, 1]],
          [[1, 1],
           [1, 1]]])
The code works appropriately, but I am using this on much larger and more complex arrays, and don't think this is an appropriate approach. Any thoughts on a more direct method to make this work would be greatly appreciated. I am also concerned about errors and the results if the input array has for example two separate non-zero 3d areas within the original array.
To clarify the problem, I need to return one or more 3D portions as one or more 3d arrays beginning with an original larger array. The returned arrays should not include extraneous zeros (or false values) in any given exterior plane in three dimensional space. Just getting the indices of the nonzero values (or vice versa) doesn't by itself solve the problem.
Assuming you want to eliminate all rows, columns, etc. that contain only zeros, you could do the following:
nz = (test_array != 0)
non_zero = test_array[nz.any(axis=(1, 2))][:, nz.any(axis=(0, 2))][:, :, nz.any(axis=(0, 1))]
An alternative solution using np.nonzero:
i = [np.unique(_) for _ in np.nonzero(test_array)]
non_zero = test_array[i[0]][:, i[1]][:, :, i[2]]
This can also be generalized to arbitrary dimensions, but requires a bit more work (only showing the first approach here):
def real_size(arr):
    nz = (arr != 0)
    result = arr
    axes = np.arange(arr.ndim)
    for axis in range(arr.ndim):
        # True for every position along `axis` that holds a nonzero value
        zeros = nz.any(axis=tuple(np.delete(axes, axis)))
        result = result[(slice(None),)*axis + (zeros,)]
    return result
non_zero = real_size(test_array)
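Note that the snippets above drop every all-zero plane, including ones that lie between nonzero regions, whereas the function in the question keeps everything between the first and last nonzero index along each axis. If that bounding-box behaviour is what is wanted, it could be sketched as follows (crop_to_bounding_box is a hypothetical name, and returning an empty array for an all-zero input is an assumption):

```python
import numpy as np

def crop_to_bounding_box(arr):
    """Return the smallest contiguous block of arr containing all nonzeros."""
    idx = np.nonzero(arr)
    if idx[0].size == 0:
        # Assumption: an all-zero input yields an empty array of the same ndim
        return arr[(slice(0, 0),) * arr.ndim]
    # One slice per axis, from the first to the last nonzero index
    box = tuple(slice(i.min(), i.max() + 1) for i in idx)
    return arr[box]

# Two isolated nonzero entries with all-zero planes between them
test_array = np.zeros((4, 5, 6), dtype=int)
test_array[0, 0, 0] = 1
test_array[2, 3, 4] = 1
cropped = crop_to_bounding_box(test_array)
```

Here the interior zero planes are kept, so the result spans both nonzero entries as one block. Splitting an array into several separate nonzero regions is a different problem (connected-component labelling) that this sketch does not address.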
Currently, I have a 4d array, say,
arr = np.arange(48).reshape((2,2,3,4))
I want to apply a function that takes a 2d array as input to each 2d array sliced from arr. I have searched and read this question, which is exactly what I want.
The function I'm using is im2col_sliding_broadcasting() which I get from here. It takes a 2d array and list of 2 elements as input and returns a 2d array. In my case: it takes 3x4 2d array and a list [2, 2] and returns 4x6 2d array.
I considered using apply_along_axis() but as said it only accepts 1d function as parameter. I can't apply im2col function this way.
I want an output that has the shape as 2x2x4x6. Surely I can achieve this with for loop, but I heard that it's too time expensive:
import numpy as np
def im2col_sliding_broadcasting(A, BSZ, stepsize=1):
    # source: https://stackoverflow.com/a/30110497/10666066
    # Parameters
    M, N = A.shape
    col_extent = N - BSZ[1] + 1
    row_extent = M - BSZ[0] + 1
    # Get Starting block indices
    start_idx = np.arange(BSZ[0])[:, None]*N + np.arange(BSZ[1])
    # Get offsetted indices across the height and width of input array
    offset_idx = np.arange(row_extent)[:, None]*N + np.arange(col_extent)
    # Get all actual indices & index into input array for final output
    return np.take(A, start_idx.ravel()[:, None] + offset_idx.ravel()[::stepsize])

arr = np.arange(48).reshape((2,2,3,4))
output = np.empty([2,2,4,6])
for i in range(2):
    for j in range(2):
        temp = im2col_sliding_broadcasting(arr[i, j], [2,2])
        output[i, j] = temp
My arr is in fact a 10000x3x64x64 array, so my question is: is there a more efficient way to do this?
We can leverage scikit-image's view_as_windows, which is based on np.lib.stride_tricks.as_strided, to get sliding windows. More info on use of as_strided based view_as_windows.
from skimage.util.shape import view_as_windows
W1,W2 = 2,2 # window size
# Create sliding windows along the last two axes
w = view_as_windows(arr,(1,1,W1,W2))[...,0,0,:,:]
# Merge the window axes (the last two axes) and
# merge the axes along which those windows were created (3rd and 4th axes)
outshp = arr.shape[:-2] + (W1*W2,) + ((arr.shape[-2]-W1+1)*(arr.shape[-1]-W2+1),)
out = w.transpose(0,1,4,5,2,3).reshape(outshp)
The last step forces a copy. So, skip it if possible.
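If scikit-image is not available, the same windows can likely be built with NumPy alone. The sketch below assumes NumPy 1.20 or later, where np.lib.stride_tricks.sliding_window_view exists, and reuses the small 2x2x3x4 example array from the question:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(48).reshape((2, 2, 3, 4))
W1, W2 = 2, 2  # window size

# Windows over the last two axes:
# w[i, j, p, q, r, c] == arr[i, j, p + r, q + c], shape (2, 2, 2, 3, 2, 2)
w = sliding_window_view(arr, (W1, W2), axis=(-2, -1))

# Put the window axes before the position axes, then merge each pair
n_rows = arr.shape[-2] - W1 + 1
n_cols = arr.shape[-1] - W2 + 1
outshp = arr.shape[:-2] + (W1 * W2, n_rows * n_cols)
out = w.transpose(0, 1, 4, 5, 2, 3).reshape(outshp)
```

As with view_as_windows, w is a view and only the final reshape copies data.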
Problem:
Let's say I have a 2D array from which I want to randomly sample (using Monte-Carlo) smaller 2D sub-arrays as shown by the black patches in the figure below. I am looking for an efficient method of doing this.
Prospective (but partial) solution:
I came across one function that partially achieves what I am trying to do after several hours of search, but it lacks the ability to sample a patch at a random location. At least I don't think it can sample from random locations based on its arguments, although it does have one random_state argument that I do not understand.
sklearn.feature_extraction.image.extract_patches_2d(image, patch_size, max_patches=None, random_state=None)
Question:
Select random patch coordinates (2D sub-array) and use them to slice a patch from the bigger array as shown in figure above. The randomly sampled patches are allowed to overlap.
Here is a sampler that creates a sample cut from an array of any dimensionality. It uses functions to control where to start the cut and for how wide the cut should be along any axis.
Here is an explanation of the parameters:
arr - the input numpy array.
loc_sampler_fn - this is the function you want to use to set the corner of the box. If you want the corner of the box to be sampled uniformly from anywhere along the axis, use np.random.uniform. If you want the corner to be closer to the center of the array, use np.random.normal. However, we need to tell the function what range to sample over. This brings us to the next parameter.
loc_dim_param - this passes the size of each axis to loc_sampler_fn. If we are using np.random.uniform for the location sampler, we want to sample from the entire range of the axis. np.random.uniform has two parameters: low and high, so by passing the length of the axis to high it samples uniformly over the entire axis. In other words, if the axis has length 120 we want np.random.uniform(low=0, high=120), so we would set loc_dim_param='high'.
loc_params - this passes any additional parameters to loc_sampler_fn. Keeping with the example, we need to pass low=0 to np.random.uniform, so we pass the dictionary loc_params={'low':0}.
From here, it is basically identical for the shape of the box. If you want the box height and width to be uniformly sampled from 3 to 10, pass in shape_sampler_fn=np.random.uniform, with shape_dim_param=None since we are not using the size of the axis for anything, and shape_params={'low':3, 'high':11}.
def box_sampler(arr,
                loc_sampler_fn,
                loc_dim_param,
                loc_params,
                shape_sampler_fn,
                shape_dim_param,
                shape_params):
    '''
    Extracts a sample cut from `arr`.

    Parameters:
    -----------
    loc_sampler_fn : function
        The function to determine where the minimum coordinate
        for each axis should be placed.
    loc_dim_param : string or None
        The parameter in `loc_sampler_fn` that should use the axes
        dimension size
    loc_params : dict
        Parameters to pass to `loc_sampler_fn`.
    shape_sampler_fn : function
        The function to determine the width of the sample cut
        along each axis.
    shape_dim_param : string or None
        The parameter in `shape_sampler_fn` that should use the
        axes dimension size.
    shape_params : dict
        Parameters to pass to `shape_sampler_fn`.

    Returns:
    --------
    (slices, x) : A tuple of the slices used to cut the sample as well as
    the sampled subsection with the same dimensionality of arr.
        slice :: list of slice objects
        x :: array object with the same ndims as arr
    '''
    slices = []
    for dim in arr.shape:
        if loc_dim_param:
            loc_params.update({loc_dim_param: dim})
        if shape_dim_param:
            shape_params.update({shape_dim_param: dim})
        start = int(loc_sampler_fn(**loc_params))
        stop = start + int(shape_sampler_fn(**shape_params))
        slices.append(slice(start, stop))
    # Index with a tuple: recent NumPy versions reject a list of slices
    return slices, arr[tuple(slices)]
Example for a uniform cut on a 2D array with widths between 3 and 9:
a = np.random.randint(0, 1+1, size=(100,150))
box_sampler(a,
np.random.uniform, 'high', {'low':0},
np.random.uniform, None, {'low':3, 'high':10})
# returns:
([slice(49, 55, None), slice(86, 89, None)],
array([[0, 0, 1],
[0, 1, 1],
[0, 0, 0],
[0, 0, 1],
[1, 1, 1],
[1, 1, 0]]))
Examples for taking 2x2x2 chunks from a 10x20x30 3D array:
a = np.random.randint(0,2,size=(10,20,30))
box_sampler(a, np.random.uniform, 'high', {'low':0},
np.random.uniform, None, {'low':2, 'high':2})
# returns:
([slice(7, 9, None), slice(9, 11, None), slice(19, 21, None)],
array([[[0, 1],
[1, 0]],
[[0, 1],
[1, 1]]]))
Update based on the comments.
For your specific purpose, it looks like you want a rectangular sample where the starting corner is uniformly sampled from anywhere in the array, and the width of the sample along each axis is uniformly sampled, but can be limited.
Here is a function that generates these samples. min_width and max_width can accept iterables of integers (such as a tuple) or a single integer.
def uniform_box_sampler(arr, min_width, max_width):
    '''
    Extracts a sample cut from `arr`.

    Parameters:
    -----------
    arr : array
        The numpy array to sample a box from
    min_width : int or tuple
        The minimum width of the box along a given axis.
        If a tuple of integers is supplied, it must have the
        same length as the number of dimensions of `arr`
    max_width : int or tuple
        The maximum width of the box along a given axis.
        If a tuple of integers is supplied, it must have the
        same length as the number of dimensions of `arr`

    Returns:
    --------
    (slices, x) : A tuple of the slices used to cut the sample as well as
    the sampled subsection with the same dimensionality of arr.
        slice :: list of slice objects
        x :: array object with the same ndims as arr
    '''
    if isinstance(min_width, (tuple, list)):
        assert len(min_width)==arr.ndim, 'Dimensions of `min_width` and `arr` must match'
    else:
        min_width = (min_width,)*arr.ndim
    if isinstance(max_width, (tuple, list)):
        assert len(max_width)==arr.ndim, 'Dimensions of `max_width` and `arr` must match'
    else:
        max_width = (max_width,)*arr.ndim

    slices = []
    for dim, mn, mx in zip(arr.shape, min_width, max_width):
        start = int(np.random.uniform(0,dim))
        stop = start + int(np.random.uniform(mn, mx+1))
        slices.append(slice(start, stop))
    # Index with a tuple: recent NumPy versions reject a list of slices
    return slices, arr[tuple(slices)]
Example of generating a box cut that starts uniformly anywhere in the array, the height is a random uniform draw from 1 to 4 and the width is a random uniform draw from 2 to 6 (just to show). In this case, the size of the box was 3 by 4, starting at the 66th row and 19th column.
x = np.random.randint(0,2,size=(100,100))
uniform_box_sampler(x, (1,2), (4,6))
# returns:
([slice(65, 68, None), slice(18, 22, None)],
array([[1, 0, 0, 0],
[0, 0, 1, 1],
[0, 1, 1, 0]]))
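One caveat with sampling the start from anywhere in the array: a box that starts near an edge is silently truncated by slicing, so it can come out narrower than min_width. If every box must lie fully inside the array, the width can be drawn first and the start restricted accordingly. A minimal sketch, where bounded_box_sampler and its rng argument are hypothetical names and integer widths shared by all axes are an assumption:

```python
import numpy as np

def bounded_box_sampler(arr, min_width, max_width, rng=None):
    """Sample a box that always lies fully inside arr.

    Assumes integer widths with min_width <= max_width and
    min_width <= every axis length.
    """
    rng = np.random.default_rng() if rng is None else rng
    slices = []
    for dim in arr.shape:
        # Draw the width first (upper bound of integers() is exclusive)
        width = int(rng.integers(min_width, min(max_width, dim) + 1))
        # Only allow starts that leave room for the whole box
        start = int(rng.integers(0, dim - width + 1))
        slices.append(slice(start, start + width))
    slices = tuple(slices)
    return slices, arr[slices]

x = np.random.default_rng(0).integers(0, 2, size=(100, 100))
slices, patch = bounded_box_sampler(x, 2, 6, rng=np.random.default_rng(1))
```

The trade-off is that start positions are no longer uniform over the whole axis, since the last few indices cannot host a full box.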
So it seems like your issue with sklearn.feature_extraction.image.extract_patches_2d is that it forces you to specify a single patch size, whereas you are looking for different patches of random size.
One thing to note here is that your result can't be a NumPy array (unlike the result of the sklearn function) because arrays have to have uniform-length rows/columns. So your output needs to be some other data structure that contains differently-shaped arrays.
Here's a workaround:
from itertools import product
def random_patches_2d(arr, n_patches):
    # All possible row and column slices from `arr` given its shape
    row, col = arr.shape
    row_comb = [(i, j) for i, j in product(range(row), range(row)) if i < j]
    col_comb = [(i, j) for i, j in product(range(col), range(col)) if i < j]
    # Pick randomly from the possible slices. The distribution will be
    # random uniform from the given slices. We can't call
    # np.random.choice on the lists of tuples directly because it only
    # samples from a 1d array, so we sample indices into them instead.
    a = np.random.choice(np.arange(len(row_comb)), size=n_patches)
    b = np.random.choice(np.arange(len(col_comb)), size=n_patches)
    for i, j in zip(a, b):
        # `i` indexes the row slices and `j` the column slices
        yield arr[row_comb[i][0]:row_comb[i][1],
                  col_comb[j][0]:col_comb[j][1]]
Example:
np.random.seed(99)
arr = np.arange(49).reshape(7, 7)
res = list(random_patches_2d(arr, 5))
print(res[0])
print()
print(res[3])
Each element of res is a rectangular sub-array of arr; the patch shapes differ from one another and depend on the seed.
Condensed:
def random_patches_2d(arr, n_patches):
    row, col = arr.shape
    row_comb = [(i, j) for i, j in product(range(row), range(row)) if i < j]
    col_comb = [(i, j) for i, j in product(range(col), range(col)) if i < j]
    a = np.random.choice(np.arange(len(row_comb)), size=n_patches)
    b = np.random.choice(np.arange(len(col_comb)), size=n_patches)
    for i, j in zip(a, b):
        yield arr[row_comb[i][0]:row_comb[i][1],
                  col_comb[j][0]:col_comb[j][1]]
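Building the full list of (start, stop) pairs grows quadratically with the array side, which may matter for large images. A possible alternative is to draw two distinct cut points per axis directly. This is a sketch rather than a drop-in replacement: random_patch_2d and its rng argument are hypothetical names, and unlike the lists above it also allows a patch to reach the last row or column.

```python
import numpy as np

def random_patch_2d(arr, rng):
    """Return one random rectangular patch of a 2D array."""
    bounds = []
    for n in arr.shape:
        # Two distinct cut points in 0..n, sorted, give a non-empty slice
        lo, hi = np.sort(rng.choice(n + 1, size=2, replace=False))
        bounds.append(slice(int(lo), int(hi)))
    return arr[tuple(bounds)]

rng = np.random.default_rng(0)
arr = np.arange(49).reshape(7, 7)
patches = [random_patch_2d(arr, rng) for _ in range(5)]
```

Every (start, stop) pair with start < stop is equally likely along each axis, which matches the uniform-over-slices distribution described above.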
Addressing your comment: you could successively add 1 patch and check the area after each.
# `size` is just row x col
area = arr.size
patch_area = 0
patches = []
while patch_area <= area:  # or while patch_area <= 0.1 * area:
    # random_patches_2d is a generator, so take its single patch with next()
    patch = next(random_patches_2d(arr, n_patches=1))
    patches.append(patch)
    patch_area += patch.size
I would like to use numpy to convert a 2D array of x,y coordinates into a flat array of the differences between each coordinate pair and the previous one. Note that the first pair of x/y coordinates should be kept in the output array as a reference to rebuild the coordinates later.
The aim of this process is to reduce the size of the array to increase the speed of sharing on the network.
For instance:
input = [[-8081441,5685214], [-8081446,5685216], [-8081442,5685219], [-8081440,5685211], [-8081441,5685214]]
output = [-8081441, 5685214, 5, -2, -4, -3, -2, 8, 1, -3]
def parseCoords(coords):
    # keep the first x,y coordinates
    parsed = [int(coords[0][0]), int(coords[0][1])]
    for i in range(1, len(coords)):  # use xrange on Python 2
        parsed.extend([int(coords[i-1][0]) - int(coords[i][0]), int(coords[i-1][1]) - int(coords[i][1])])
    return parsed
parsedCoords = parseCoords(input)
Is it possible, for increased performance, to use numpy arrays to do the same thing as this function?
First off, for performance, let's convert the list input to an array if it's not an array already, like so -
arr = np.asarray(input).astype(int)
Now, we would have one approach with np.diff -
np.hstack((arr[0], (-np.diff(arr, axis=0)).ravel()))
Another approach with slicing to replicate the differentiation -
np.hstack((arr[0], (arr[:-1,:] - arr[1:,:]).ravel()))
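Since the question mentions rebuilding the coordinates later, here is a sketch of the reverse step. It assumes the encoding above, where each stored difference is previous minus current, so the original points are recovered by subtracting a running sum from the first pair (the variable names are illustrative):

```python
import numpy as np

coords = np.array([[-8081441, 5685214], [-8081446, 5685216],
                   [-8081442, 5685219], [-8081440, 5685211],
                   [-8081441, 5685214]])

# Encode: first pair, then (previous - current) for each following pair
encoded = np.hstack((coords[0], (-np.diff(coords, axis=0)).ravel()))

# Decode: undo the differences with a cumulative sum
first = encoded[:2]
deltas = encoded[2:].reshape(-1, 2)
decoded = np.vstack((first, first - np.cumsum(deltas, axis=0)))
```

The round trip is exact for integer coordinates; with floats, small rounding differences may accumulate in the cumulative sum.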