Say I have an array myarr such that myarr.shape = (2,64,64,2). Now if I define myarr2 = myarr[[0,1,0,0,1],...], then the following is true
myarr2.shape #(5,64,64,2)
myarr2[0,...] == myarr[0,...] # = True
myarr2[1,...] == myarr[1,...] # = True
myarr2[2,...] == myarr[0,...] # = True
...
Can this be generalized so the slices are arrays? That is, is there a way to make the following hypothetical code work?
myarr2 = myarr[...,[20,30,40]:[30,40,50],[15,25,35]:[25,35,45],:]
myarr2[0,] == myarr[...,20:30,15:25,:] # = True
myarr2[1,] == myarr[...,30:40,25:35,:] # = True
myarr2[2,] == myarr[...,40:50,35:45,:] # = True
You can feed the coordinates of the subarrays to a loop which cuts the subarrays out of myarray. I don't know how you store the indices of the subarrays, so I put them into a nested list idx_list:
idx_list = [[[20,30,40],[30,40,50]],[[15,25,35],[25,35,45]]] # assuming 2D cutouts
idx_array = np.array([k for i in idx_list for j in i for k in j]) # unpack into a flat array
idx_array = idx_array.reshape(-1, 3).T # 3 = number of cutouts; each row becomes (row_start, row_stop, col_start, col_stop)
myarray2 = np.array([myarray[a:b, c:d] for a, b, c, d in idx_array]) # cut and combine
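For completeness, a minimal end-to-end sketch with the question's (2,64,64,2) shape, assuming the cutouts apply to the two middle axes (the variable names are mine):
import numpy as np

myarray = np.arange(2*64*64*2).reshape(2, 64, 64, 2)
idx_array = np.array([[20, 30, 15, 25],   # (row_start, row_stop, col_start, col_stop)
                      [30, 40, 25, 35],
                      [40, 50, 35, 45]])
myarray2 = np.array([myarray[:, a:b, c:d, :] for a, b, c, d in idx_array])
print(myarray2.shape)  # (3, 2, 10, 10, 2)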
Let's simplify the problem a bit: first by removing the two outer dimensions, which don't affect the core indexing issue, and then by reducing the size so we can see and understand the results.
The setup
In [540]: arr = np.arange(7*7).reshape(7,7)
In [541]: arr
Out[541]:
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34],
[35, 36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47, 48]])
In [542]: idx = np.array([[0,2,4,6],[1,3,5,7]])
Now a straightforward iteration approach:
In [543]: alist = []
...: for i in range(idx.shape[1]-1):
...: j,k = idx[:,i]
...: sub = arr[j:j+2, k:k+2]
...: alist.append(sub)
...:
In [544]: np.array(alist)
Out[544]:
array([[[ 1, 2],
[ 8, 9]],
[[17, 18],
[24, 25]],
[[33, 34],
[40, 41]]])
In [545]: _.shape
Out[545]: (3, 2, 2)
I simplified the iteration from:
...: for i in range(idx.shape[1]-1):
...: sub = arr[idx[0,i]:idx[0,i+1],idx[1,i]:idx[1,i+1]]
...: alist.append(sub)
to highlight the fact that we are generating ranges of a consistent size, and make the next transformation more obvious.
So I start with a (7,7) array, and create 3 (2,2) slices.
As I demonstrated in Slicing a different range at each index of a multidimensional numpy array, we can use linspace to expand a set of slices, or ranges.
In [567]: ranges = np.linspace(idx[:,:3],idx[:,:3]+1,2).astype(int)
In [568]: ranges
Out[568]:
array([[[0, 2, 4],
[1, 3, 5]],
[[1, 3, 5],
[2, 4, 6]]])
So ranges[0] expands on the idx[0] slices, etc. But if I simply index with these I get 'diagonal' values from Out[544]:
In [569]: arr[ranges[0], ranges[1]]
Out[569]:
array([[ 1, 17, 33],
[ 9, 25, 41]])
to get blocks I have to add a dimension to the first indices:
In [570]: arr[ranges[0,:,None], ranges[1]]
Out[570]:
array([[[ 1, 17, 33],
[ 2, 18, 34]],
[[ 8, 24, 40],
[ 9, 25, 41]]])
these are the same values as in Out[544], but they need to be transposed:
In [571]: _.transpose(2,0,1)
Out[571]:
array([[[ 1, 2],
[ 8, 9]],
[[17, 18],
[24, 25]],
[[33, 34],
[40, 41]]])
The code's a bit clunky and needs to be generalized, but it gives the general idea of how one can substitute a single indexing operation for the iterative one, provided the slices are regular enough. For this small example it probably isn't faster, but it should come out ahead as the problem size gets larger.
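As one possible generalization, here is a hedged sketch (extract_blocks is my own name; it assumes equal-sized square blocks and builds the index arrays with broadcasting rather than linspace):
def extract_blocks(arr, starts, size):
    # starts: (2, n) array of (row, col) upper-left corners; size: block edge length
    rows = starts[0][:, None] + np.arange(size)     # (n, size) row indices
    cols = starts[1][:, None] + np.arange(size)     # (n, size) column indices
    # pair every row index with every column index within each block
    return arr[rows[:, :, None], cols[:, None, :]]  # (n, size, size)

extract_blocks(arr, idx[:, :3], 2)  # same (3, 2, 2) result as Out[571]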
Related
Here a is a 1D array of integer indices. To give some context, a holds the atom indices of the 1st molecule. The return value holds the atom indices for n identical molecules, each of which contains step atoms. This function basically applies the same atom selection to many molecules.
def f(a, step, n):
    a.shape = 1, -1                        # note: mutates the caller's array in place
    got = np.repeat(a, n, axis=0)          # one row of indices per molecule
    blocks = np.arange(0, n*step, step)    # per-molecule offsets: 0, step, 2*step, ...
    blocks.shape = n, -1
    return got + blocks
For example
In [52]: f(np.array([1,3,4]), 10, 4)
Out[52]:
array([[ 1, 3, 4],
[11, 13, 14],
[21, 23, 24],
[31, 33, 34]])
It looks like broadcasting should be enough:
def f(a, step, n):
    # broadcast the (n, 1) column of offsets against the row of atom indices
    return a + np.arange(0, n*step, step)[:, None]
f(np.array([1,3,4]), 10, 4)
output:
array([[ 1, 3, 4],
[11, 13, 14],
[21, 23, 24],
[31, 33, 34]])
I want to calculate the sum of all the elements directly surrounding some element in a matrix:
[ [1, 2, 3],
[4, 5, 6],
[7, 8, 9] ]
so that sum_neighbours(matrix[0][0]) == 11 and sum_neighbours(matrix[1][1]) == 40.
The problem is just that I'm a beginner and I don't know how to make sum_neighbours work out how many neighbours a certain element has.
I figured that I could write an if-elif-else statement giving the specific number of neighbours for each position in the matrix, but surely there must be a more efficient way to do this?
Otherwise it'll only be able to calculate the sum of the neighbours for matrices that are 3 x 3.
A nice approach is to use numpy and a convolution:
import numpy as np
from scipy.signal import convolve2d
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
kernel = [[1, 1, 1],   # top neighbours
          [1, 0, 1],   # centre row (the 0 excludes the element itself)
          [1, 1, 1]]   # bottom neighbours
convolve2d(a, kernel, mode='same')
output:
array([[11, 19, 13],
[23, 40, 27],
[17, 31, 19]])
Alternatively:
convolve2d(a, np.ones((3,3)), mode='same')-a
# this sums the neighbours + the center
# so we need to subtract the initial array
Example on a larger array, ignoring one diagonal neighbour. This is just to show you how easy it is to perform similar operations when using convolutions. (Note that convolve2d flips the kernel, so the 0 in the top-left of the kernel below actually drops the bottom-right neighbour.)
a = np.arange(5*6).reshape((5,6))
# array([[ 0, 1, 2, 3, 4, 5],
# [ 6, 7, 8, 9, 10, 11],
# [12, 13, 14, 15, 16, 17],
# [18, 19, 20, 21, 22, 23],
# [24, 25, 26, 27, 28, 29]])
convolve2d(a, [[0,1,1],[1,0,1],[1,1,1]], mode='same')
array([[ 7, 15, 19, 23, 27, 25],
[ 20, 42, 49, 56, 63, 52],
[ 44, 84, 91, 98, 105, 82],
[ 68, 126, 133, 140, 147, 112],
[ 62, 107, 112, 117, 122, 73]])
If you would like to achieve this without any imports (the underlying assumption is that you have already checked that you have a well-formed list of lists/matrix, i.e. all the rows have the same length):
# you pass the matrix and the (i, j) coordinates of the element of interest;
# select() extracts the "matrix" around (i, j), flooring to 0 and capping to
# the number of elements in the list, to handle elements on the edge of the matrix
def select(m, i, j):
    def s(x, y): return x[max(0, y-1):min(len(x), y+1) + 1]
    return [s(x, j) for x in s(m, i)]

def sum_around(m, i, j, excluded=True):
    # sum all the elements within each list and compute the grand total,
    # then subtract the element at (i, j) if excluded is True
    # (the default behaviour, and what you want here)
    return sum(sum(x) for x in select(m, i, j)) - (m[i][j] if excluded else 0)
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(sum_around(m, 0, 0)) # prints 11
print(sum_around(m, 1, 1)) # prints 40
I guess you can add an extra row and column of zeros around the boundary.
Then you can easily add the neighbouring elements without any special boundary conditions.
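A minimal sketch of that padding idea, assuming numpy is acceptable here (sum_neighbours matches the question's name; the rest is mine):
import numpy as np

def sum_neighbours(m, i, j):
    p = np.pad(np.asarray(m), 1)       # border of zeros; (i, j) shifts to (i+1, j+1)
    block = p[i:i+3, j:j+3]            # 3x3 window centred on the original (i, j)
    return block.sum() - p[i+1, j+1]   # drop the centre element itself

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(sum_neighbours(m, 0, 0))  # 11
print(sum_neighbours(m, 1, 1))  # 40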
I have a matrix with several different values for each row:
arr1 = np.array([[1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18],[19,20,21,22,23,24,25,26,27]])
arr2 = np.array([["A"],["B"],["C"]])
This produces the following matrices:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24, 25, 26, 27]])
array([['A'],
['B'],
['C']])
A represents the first 3 columns, B represents the next 3 columns, and C represents the last 3 columns. So the result I'd like here is:
array([[1,2,3],
[13,14,15],
[25,26,27]])
I was thinking about converting arr2 to a mask array, but I'm not even sure how to do this. If it were a 1d array I could do something like this:
arr[[0, 1, 2]]
but for a 2d array I'm not even sure how to mask like this. I tried this and got errors:
arr[[0,1,2],[3,4,5],[6,7,8]]
What's the best way to do this?
Thanks.
You could use string.ascii_uppercase to find each letter's index in the alphabet, and reshape arr1 into chunks of 3 columns:
from string import ascii_uppercase
reshaped = np.reshape(arr1, (len(arr1), -1, 3))
reshaped[np.arange(len(arr1)), np.vectorize(ascii_uppercase.index)(arr2).ravel()]
Or just directly map A to 0 and so on...
reshaped = np.reshape(arr1, (len(arr1), -1, 3))
reshaped[np.arange(len(arr1)), np.vectorize(['A', 'B', 'C'].index)(arr2).ravel()]
Both Output:
array([[ 1, 2, 3],
[13, 14, 15],
[25, 26, 27]])
If the shape of arr1 is fixed at (3, 9) as shown above, then it can be done with a single line of code as below:
arr2 = np.array([arr1[0][0:3],arr1[1][3:6],arr1[2][6:9]])
The output will be as follows:
[[ 1 2 3]
[13 14 15]
[25 26 27]]
You can use 'advanced indexing', which indexes the target array with coordinate arrays.
rows = np.array([[0,0,0],[1,1,1],[2,2,2]])
cols = np.array([[0,1,2],[3,4,5],[6,7,8]])
arr1[rows, cols]
>>> array([[ 1, 2, 3],
[13, 14, 15],
[25, 26, 27]])
and you can wrap this in a function like
def diagonal(arr, step):
    rows = np.array([[x]*step for x in range(step)])
    cols = np.array([[y for y in range(x, x+step)] for x in range(0, step**2, step)])
    return arr[rows, cols]
diagonal(arr1, 3)
>>> array([[ 1, 2, 3],
[13, 14, 15],
[25, 26, 27]])
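For what it's worth, the same coordinate arrays can be built with broadcasting instead of list comprehensions (a sketch under the same assumptions as diagonal above):
step = 3
rows = np.arange(step)[:, None]                           # (3, 1): broadcasts across columns
cols = np.arange(step) + step * np.arange(step)[:, None]  # (3, 3) block of column indices
arr1[rows, cols]
# array([[ 1,  2,  3],
#        [13, 14, 15],
#        [25, 26, 27]])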
reference: https://numpy.org/devdocs/user/basics.indexing.html
I'd like to produce a function like split(arr, i, j) which divides array arr along axes i and j.
But I do not know how to do it. With the following method using array_split, I can only divide an N-dimensional array into N-1 dimensional pieces, which does not give the two-dimensional blocks I am seeking.
import numpy as np
arr = np.arange(36).reshape(4,9)
dim = arr.ndim
ax = np.arange(dim)
arritr = [np.array_split(arr, arr.shape[ax[i]], ax[i]) for i in range(dim)]
print(arritr[0])
print(arritr[1])
How can I achieve this?
I believe you would like to slice the array by axis (row, column). Here is the doc: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
arr[1,:] # returns all values in row 1
arr[:,1] # returns all values in column 1
I'm guessing a bit here, but it sounds like you want to divide the array into 4 blocks.
In [120]: arr = np.arange(36).reshape(6,6)
In [122]: [arr[:3,:4], arr[:3,4:], arr[3:, :4], arr[3:,4:]]
Out[122]:
[array([[ 0, 1, 2, 3],
[ 6, 7, 8, 9],
[12, 13, 14, 15]]),
array([[ 4, 5],
[10, 11],
[16, 17]]),
array([[18, 19, 20, 21],
[24, 25, 26, 27],
[30, 31, 32, 33]]),
array([[22, 23],
[28, 29],
[34, 35]])]
Don't worry about efficiency. array_split does the same sort of slicing. Check its code to verify that.
If you want more slices, you could add more arr[i1:i2, j1:j2], for any mix of indices.
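One way to wrap this into the split(arr, i, j) the question asks for (a minimal sketch; it splits at row i and column j and returns the four corner blocks in row-major order):
def split(arr, i, j):
    # top-left, top-right, bottom-left, bottom-right
    return [arr[:i, :j], arr[:i, j:], arr[i:, :j], arr[i:, j:]]

split(np.arange(36).reshape(6, 6), 3, 4)  # the same four blocks as Out[122]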
Are you looking for something like matlab's mat2cell? Then you could do:
import numpy as np
def ndsplit(a, splits):
    assert len(splits) <= a.ndim
    splits = [np.r_[0, s, m] for s, m in zip(splits, a.shape)]
    return np.frompyfunc(
        lambda *x: a[tuple(slice(s[i], s[i+1]) for s, i in zip(splits, x))],
        len(splits), 1,
    )(*np.indices(tuple(len(s) - 1 for s in splits)))
# demo
a = np.arange(56).reshape(7, 8)
print(ndsplit(a, [(2, 4), (1, 5, 6)]))
# [[array([[0],
# [8]])
# array([[ 1, 2, 3, 4],
# [ 9, 10, 11, 12]])
# array([[ 5],
# [13]]) array([[ 6, 7],
# [14, 15]])]
# [array([[16],
# [24]])
# array([[17, 18, 19, 20],
# [25, 26, 27, 28]])
# array([[21],
# [29]]) array([[22, 23],
# [30, 31]])]
# [array([[32],
# [40],
# [48]])
# array([[33, 34, 35, 36],
# [41, 42, 43, 44],
# [49, 50, 51, 52]])
# array([[37],
# [45],
# [53]])
# array([[38, 39],
# [46, 47],
# [54, 55]])]]
For computer vision training purposes, random cropping is often used as a data augmentation technique. At each iteration, a batch of random crops is generated and fed to the network being trained. This needs to be efficient, as it is done at each training iteration.
If the data has too many dimensions, random dimension selection might also be needed. Random frames can be selected in a video for example. The data can even have 4 dimensions (3 in space + time), or more.
How can one write an efficient generator of random views of lower dimension?
A very naïve version, which gets 2D views from 3D data and only one at a time, could be:
import numpy as np
import numpy.random as nr
def views():
    # suppose `data` comes from elsewhere;
    # `shape` is data.shape, i.e. (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        selector = np.zeros(shape, dtype=bool)
        if drop_dim == 0:
            selector[drop_dim_keep, :, :] = 1
        elif drop_dim == 1:
            selector[:, drop_dim_keep, :] = 1
        else:
            selector[:, :, drop_dim_keep] = 1
        yield np.squeeze(data[selector])
A more elegant solution probably exists, where at least:
there is no ugly if/else on the randomly chosen dimension
views can take a batch_size integer argument and generate several views at once without a loop
the dimension of input/output data is not specified (e.g. can do 3D -> 2D as well as 4D -> 2D)
I tweaked your function to clarify what it's doing:
def views():
    # suppose `data` comes from elsewhere;
    # `shape` is data.shape, i.e. (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        dropshape = list(shape[:])
        dropshape[drop_dim] -= 1
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        print(drop_dim, drop_dim_keep)
        selector = np.ones(shape, dtype=bool)
        if drop_dim == 0:
            selector[drop_dim_keep, :, :] = 0
        elif drop_dim == 1:
            selector[:, drop_dim_keep, :] = 0
        else:
            selector[:, :, drop_dim_keep] = 0
        yield data[selector].reshape(dropshape)
A small sample run:
In [534]: data = np.arange(24).reshape(shape)
In [535]: data
Out[535]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [536]: v = views()
In [537]: next(v)
2 1
Out[537]:
array([[[ 0, 2, 3],
[ 4, 6, 7],
[ 8, 10, 11]],
[[12, 14, 15],
[16, 18, 19],
[20, 22, 23]]])
In [538]: next(v)
0 0
Out[538]:
array([[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
So it's picking one of the dimensions, and for that dimension dropping one 'column'.
The main efficiency issue is whether it's returning a view or a copy. In this case it has to return a copy.
You are using a boolean mask to select the return, exactly the same as what np.delete does in this case.
In [544]: np.delete(data,1,2).shape
Out[544]: (2, 3, 3)
In [545]: np.delete(data,0,0).shape
Out[545]: (1, 3, 4)
So you could replace much of your internals with delete, letting it take care of generalizing the dimensions. Look at its code to see how it handles those details (it isn't short and sweet!).
def rand_delete():
    # suppose `data` comes from elsewhere;
    # `shape` is data.shape, i.e. (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        print(drop_dim, drop_dim_keep)
        yield np.delete(data, drop_dim_keep, drop_dim)
In [547]: v1=rand_delete()
In [548]: next(v1)
0 1
Out[548]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
In [549]: next(v1)
2 0
Out[549]:
array([[[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]],
[[13, 14, 15],
[17, 18, 19],
[21, 22, 23]]])
Replace the delete with take:
def rand_take():
    while True:
        take_dim = nr.randint(0, 3)
        take_keep = nr.randint(0, shape[take_dim])
        print(take_dim, take_keep)
        yield np.take(data, take_keep, axis=take_dim)
In [580]: t = rand_take()
In [581]: next(t)
0 0
Out[581]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [582]: next(t)
2 3
Out[582]:
array([[ 3, 7, 11],
[15, 19, 23]])
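The question also asked for a batch_size argument; here is a hedged sketch along the same lines (rand_take_batch is my own name; it returns a list because the views can have different shapes):
def rand_take_batch(batch_size):
    # draw `batch_size` independent (axis, index) pairs per step
    while True:
        axes = nr.randint(0, data.ndim, size=batch_size)
        yield [np.take(data, nr.randint(0, data.shape[ax]), axis=ax)
               for ax in axes]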
np.take returns a copy, but the equivalent slicing does not
In [601]: data.__array_interface__['data']
Out[601]: (182632568, False)
In [602]: np.take(data,0,1).__array_interface__['data']
Out[602]: (180099120, False)
In [603]: data[:,0,:].__array_interface__['data']
Out[603]: (182632568, False)
A slicing tuple can be generated with expressions like
In [604]: idx = [slice(None)]*data.ndim
In [605]: idx[1] = 0
In [606]: data[tuple(idx)]
Out[606]:
array([[ 0, 1, 2, 3],
[12, 13, 14, 15]])
Various numpy functions that take an axis parameter construct an indexing tuple like this (for example, one or more of the apply... functions).
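For illustration, a small sketch wrapping that pattern (take_view is my own name, not a numpy function); unlike np.take it returns a view:
def take_view(a, k, axis):
    # build a full-slice index tuple, then pin `axis` to the scalar k
    idx = [slice(None)] * a.ndim
    idx[axis] = k
    return a[tuple(idx)]

take_view(data, 0, 1)  # same values as np.take(data, 0, axis=1), but a view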