How can I create an numpy array from two different numpy arrays? - python

I want to create a bumpy array from two different bumpy arrays. For example:
Say I have 2 arrays a and b.
a = np.array([1,3,4])
b = np.array([[1,5,51,52],[2,6,61,62],[3,7,71,72],[4,8,81,82],[5,9,91,92]])
I want it to loop through each indices in array a and find it in array b and then save the row of b into c. Like below:
c = np.array([[1,5,51,52],
[3,7,71,72],
[4,8,81,82]])
I have tried doing:
c=np.zeros(shape=(len(b),4))
for i in b:
c[i]=a[b[i][:]]
but get this error "arrays used as indices must be of integer (or boolean) type"

Approach #1
If a is sorted, we can use np.searchsorted, like so -
idx = np.searchsorted(a,b[:,0])
idx[idx==a.size] = 0
out = b[a[idx] == b[:,0]]
Sample run -
In [160]: a
Out[160]: array([1, 3, 4])
In [161]: b
Out[161]:
array([[ 1, 5, 51, 52],
[ 2, 6, 61, 62],
[ 3, 7, 71, 72],
[ 4, 8, 81, 82],
[ 5, 9, 91, 92]])
In [162]: out
Out[162]:
array([[ 1, 5, 51, 52],
[ 3, 7, 71, 72],
[ 4, 8, 81, 82]])
If a is not sorted, we need to use sorter argument with searchsorted.
Approach #2
We can also use np.in1d -
b[np.in1d(b[:,0],a)]

Related

Numpy indexing by range of arrays

Say I have an array myarr such that myarr.shape = (2,64,64,2). Now if I define myarr2 = myarr[[0,1,0,0,1],...], then the following is true
myarr2.shape #(5,64,64,2)
myarr2[0,...] == myarr[0,...] # = True
myarr2[1,...] == myarr[1,...] # = True
myarr2[2,...] == myarr[0,...] # = True
...
Can this be generalized so the slices are arrays? That is, is there a way to make the following hypothetical code work?
myarr2 = myarr[...,[20,30,40]:[30,40,50],[15,25,35]:[25,35,45],..]
myarr2[0,] == myarr[...,20:30,15:25,...] # = True
myarr2[1,] == myarr[...,30:40,25:35,...] # = True
myarr2[2,] == myarr[...,40:50,35:45,...] # = True
you may feed the coordinates of subarrays to the cycle which cuts subarrays from myarray. I don't know hoe you store the indices of subarrays so I put them into nested list idx_list:
idx_list = [[[20,30,40],[30,40,50]],[[15,25,35]:[25,35,45]]] # assuming 2D cutouts
idx_array = np.array([k for i in idx_list for j in i for k in j]) # unpack
idx_array = idx_array .reshape(-1,2).T # reshape
myarray2 = np.array([myarray[a:b,c:d] for a,b,c,d in i2]) # cut and combine
Let's simplify the problem a bit; first by removing the two outer dimensions that don't affect the core indexing issue; and by reducing the size so we can see and understand the results.
The setup
In [540]: arr = np.arange(7*7).reshape(7,7)
In [541]: arr
Out[541]:
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12, 13],
[14, 15, 16, 17, 18, 19, 20],
[21, 22, 23, 24, 25, 26, 27],
[28, 29, 30, 31, 32, 33, 34],
[35, 36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47, 48]])
In [542]: idx =np.array([[0,2,4,6],[1,3,5,7]])
Now a straightforward iteration approach:
In [543]: alist = []
...: for i in range(idx.shape[1]-1):
...: j,k = idx[:,i]
...: sub = arr[j:j+2, k:k+2]
...: alist.append(sub)
...:
In [544]: np.array(alist)
Out[544]:
array([[[ 1, 2],
[ 8, 9]],
[[17, 18],
[24, 25]],
[[33, 34],
[40, 41]]])
In [545]: _.shape
Out[545]: (3, 2, 2)
I simplified the iteration from:
...: for i in range(idx.shape[1]-1):
...: sub = arr[idx[0,i]:idx[0,i+1],idx[1,i]:idx[1,i+1]]
...: alist.append(sub)
to highlight the fact that we are generating ranges of a consistent size, and make the next transformation more obvious.
So I start with a (7,7) array, and create 3 (2,2) slices.
As I demonstrated in Slicing a different range at each index of a multidimensional numpy array, we can use linspace to expand a set of slices, or ranges.
In [567]: ranges = np.linspace(idx[:,:3],idx[:,:3]+1,2).astype(int)
In [568]: ranges
Out[568]:
array([[[0, 2, 4],
[1, 3, 5]],
[[1, 3, 5],
[2, 4, 6]]])
So ranges[0] expands on the idx[0] slices, etc. But if I simply index with these I get 'diagonal' values from Out[554]:
In [569]: arr[ranges[0], ranges[1]]
Out[569]:
array([[ 1, 17, 33],
[ 9, 25, 41]])
to get blocks I have to add a dimension to the first indices:
In [570]: arr[ranges[0,:,None], ranges[1]]
Out[570]:
array([[[ 1, 17, 33],
[ 2, 18, 34]],
[[ 8, 24, 40],
[ 9, 25, 41]]])
these are the same values as in Out[554], but need to be transposed:
In [571]: _.transpose(2,0,1)
Out[571]:
array([[[ 1, 2],
[ 8, 9]],
[[17, 18],
[24, 25]],
[[33, 34],
[40, 41]]])
The code's a bit clunky and needs to get generalized, but gives the general idea of how one can substitute one indexing for the iterative one, provide the slices are regular enough. For this small example it probably isn't faster, but it probably will come ahead as the problem size gets larger.

Numpy Interweave oddly shaped arrays

Alright, here the given data;
There are three numpy arrays of the shapes:
(i, 4, 2), (i, 4, 3), (i, 4, 2)
the i is shared among them but is variable.
The dtype is float32 for everything.
The goal is to interweave them in a particular order. Let's look at the data at index 0 for these arrays:
[[-208. -16.]
[-192. -16.]
[-192. 0.]
[-208. 0.]]
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
[[ 0.49609375 0.984375 ]
[ 0.25390625 0.984375 ]
[ 0.25390625 0.015625 ]
[ 0.49609375 0.015625 ]]
In this case, the concatened target array would look something like this:
[-208, -16, 1, 1, 1, 0.496, 0.984, -192, -16, 1, 1, 1, ...]
And then continue on with index 1.
I don't know how to achieve this, as the concatenate function just keeps telling me that the shapes don't match. The shape of the target array does not matter much, just that the memoryview of it must be in the given order for upload to a gpu shader.
Edit: I could achieve this with a few python for loops, but the performance impact would be a problem in this program.
Use np.dstack and flatten with np.ravel() -
np.dstack((a,b,c)).ravel()
Now, np.dstack is basically stacking along the third axis. So, alternatively we can use np.concatenate too along that axis, like so -
np.concatenate((a,b,c),axis=2).ravel()
Sample run -
1) Setup Input arrays :
In [613]: np.random.seed(1234)
...: n = 3
...: m = 2
...: a = np.random.randint(0,9,(n,m,2))
...: b = np.random.randint(11,99,(n,m,2))
...: c = np.random.randint(101,999,(n,m,2))
...:
2) Check input values :
In [614]: a
Out[614]:
array([[[3, 6],
[5, 4]],
[[8, 1],
[7, 6]],
[[8, 0],
[5, 0]]])
In [615]: b
Out[615]:
array([[[84, 58],
[61, 87]],
[[48, 45],
[49, 78]],
[[22, 11],
[86, 91]]])
In [616]: c
Out[616]:
array([[[104, 359],
[376, 560]],
[[472, 720],
[566, 115]],
[[344, 556],
[929, 591]]])
3) Output :
In [617]: np.dstack((a,b,c)).ravel()
Out[617]:
array([ 3, 6, 84, 58, 104, 359, 5, 4, 61, 87, 376, 560, 8,
1, 48, 45, 472, 720, 7, 6, 49, 78, 566, 115, 8, 0,
22, 11, 344, 556, 5, 0, 86, 91, 929, 591])
What I would do is:
np.hstack([a, b, c]).flatten()
assuming a, b, c are the three arrays

Finding max in numpy skipping some rows and column

I want to find the max row and column index in a numpy matrix. But it not be in the a set of rows or columns. Thus, it should skip those rows and columns while computing the max.
Example:
# finding max in numpy matrix
[row,col] = np.where(mat == mat.max())
But it should skip rows removed_rows=[] and columns columns_rows=[]
I don't want to create a new sub matrix for the computation.
Let a be the input array, rows_rem and cols_rem be the rows and column indices to be skipped respectively. We would have an approach using masking, like so -
m,n = a.shape
d0,d1 = np.ogrid[:m,:n]
a_masked = a*~(np.in1d(d0,rows_rem)[:,None] | np.in1d(d1,cols_rem))
max_row, max_col = np.where(a_masked == a_masked.max())
Sample run -
In [204]: # Inputs
...: a = np.random.randint(11,99,(4,5))
...: rows_rem = [1,3]
...: cols_rem = [1,2,4]
...:
In [205]: a
Out[205]:
array([[36, 51, 72, 18, 31],
[78, 42, 12, 71, 72],
[38, 46, 42, 67, 12],
[87, 56, 76, 14, 21]])
In [206]: a_masked
Out[206]:
array([[64, 0, 0, 90, 0],
[ 0, 0, 0, 0, 0],
[17, 0, 0, 40, 0],
[ 0, 0, 0, 0, 0]])
In [207]: max_row, max_col
Out[207]: (array([0]), array([3]))
Please note that if there's more than one element with the same max value, we would have all of those in the output. So, if you want any or the first of those, we can use argmax, like so -
max_row, max_col = np.unravel_index(a_masked.argmax(),a.shape)
remove_rows = [2,3]
remove_cols = [0,1]
a = np.random.randint(11,99,(4,5))
>>> a
array([[60, 86, 89, 66, 20],
[77, 86, 78, 90, 44],
[68, 57, 83, 48, 25],
[30, 81, 42, 11, 63]])
>>>
Get the row and column indices that you are interested in by filtering out the indices you want removed:
r, c = a.shape
r = [x for x in range(r) if x not in remove_rows]
c = [x for x in range(c) if x not in remove_cols]
>>> r,c
([0, 1], [2, 3, 4])
>>>
Now r and c can be used for integer indexing, numpy.ix_ helps with this.
>>> a[np.ix_(r,c)]
array([[89, 66, 20],
[78, 90, 44]])
>>>
Tack on ndarray.max() to get the max value:
>>> a[np.ix_(r,c)].max()
90
>>>
Finally, use numpy.where to find where it is in the original array:
>>> row, col = np.where(a == a[np.ix_(r,c)].max())
>>> row, col
(array([1]), array([3]))
>>>
This method will also work if removing non-sequential rows or columns.
For example:
remove_rows = [0,3]
remove_cols = [1,4]

Multiply each row of one array with each element of another array in numpy

I have two arrays A and B in numpy. A holds cartesian coordinates, each row is one point in 3D space and has the shape (r, 3). B has the shape (r, n) and holds integers.
What I would like to do is multiply each element of B with each row in A, so that the resulting array has the shape (r, n, 3). So for example:
# r = 3
A = np.array([1,1,1, 2,2,2, 3,3,3]).reshape(3,3)
# n = 2
B = np.array([10, 20, 30, 40, 50, 60]).reshape(3,2)
# Result with shape (3, 2, 3):
# [[[10,10,10], [20,20,20]],
# [[60,60,60], [80,80,80]]
# [[150,150,150], [180,180,180]]]
I'm pretty sure this can be done with np.einsum, but I've been trying this for quite a while now and can't get it to work.
Use broadcasting -
A[:,None,:]*B[:,:,None]
Since np.einsum also supports broadcasting, you can use that as well (thanks to #ajcr for suggesting this concise version) -
np.einsum('ij,ik->ikj',A,B)
Sample run -
In [22]: A
Out[22]:
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
In [23]: B
Out[23]:
array([[10, 20],
[30, 40],
[50, 60]])
In [24]: A[:,None,:]*B[:,:,None]
Out[24]:
array([[[ 10, 10, 10],
[ 20, 20, 20]],
[[ 60, 60, 60],
[ 80, 80, 80]],
[[150, 150, 150],
[180, 180, 180]]])
In [25]: np.einsum('ijk,ij->ijk',A[:,None,:],B)
Out[25]:
array([[[ 10, 10, 10],
[ 20, 20, 20]],
[[ 60, 60, 60],
[ 80, 80, 80]],
[[150, 150, 150],
[180, 180, 180]]])

New array of smaller size excluding one value from each column

In Python 2.7 using numpy or by any means if I had an array of any size and wanted to excluded certain values and output the new array how would I do that? Here is What I would like
[(1,2,3),
(4,5,6), then exclude [4,2,9] to make the array[(1,5,3),
(7,8,9)] (7,8,6)]
I would always be excluding data the same length as the row length and always only one entry per column. [(1,5,3)] would be another example of data I would want to excluded. So every time I loop the function it reduces the array row size by one. I would imagine I have to use a masked array or convert my mask to a masked array and subtract the two then maybe condense the output but I have no idea how. Thanks for your time.
You can do it very efficiently if you transform your 2-D array in an unraveled 1-D array. Then you repeat the array with the elements to be excluded, called e in order to do an element-wise comparison:
import numpy as np
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
e = [1, 5, 3]
ar = a.T.ravel()
er = np.repeat(e, a.shape[0])
ans = ar[er != ar].reshape(a.shape[1], a.shape[0]-1).T
But it will work if each element in e only matches one row of a.
EDIT:
as suggested by #Jaime, you can avoid the ravel() and get the same result doing directly:
ans = a.T[(a != e).T].reshape(a.shape[1], a.shape[0]-1).T
To exclude vector e from matrix a:
import numpy as np
a = np.array([(1,2,3), (4,5,6), (7,8,9)])
e = [4,2,9]
print np.array([ [ i for i in a.transpose()[j] if i != e[j] ]
for j in range(len(e)) ]).transpose()
This would take some work to generalize, but here's something that can handle 2-d cases of the kind you describe. If passed unexpected input, this won't notice and will generate strange results, but it's at least a starting point:
def columnwise_compress(a, values):
a_shape = a.shape
a_trans_flat = a.transpose().reshape(-1)
compressed = a_trans_flat[~numpy.in1d(a_trans_flat, values)]
return compressed.reshape(a_shape[:-1] + ((a_shape[0] - 1),)).transpose()
Tested:
>>> columnwise_compress(numpy.arange(9).reshape(3, 3) + 1, [4, 2, 9])
array([[1, 5, 3],
[7, 8, 6]])
>>> columnwise_compress(numpy.arange(9).reshape(3, 3) + 1, [1, 5, 3])
array([[4, 2, 6],
[7, 8, 9]])
The difficulty is that you're asking for "compression" of a kind that numpy.compress doesn't do (removing different values for each column or row) and you're asking for compression along columns instead of rows. Compressing along rows is easier because it moves along the natural order of the values in memory; you might consider working with transposed arrays for that reason. If you want to do that, things become a bit simpler:
>>> a = numpy. array([[1, 4, 7],
... [2, 5, 8],
... [3, 6, 9]])
>>> a[~numpy.in1d(a, [4, 2, 9]).reshape(3, 3)].reshape(3, 2)
array([[1, 7],
[5, 8],
[3, 6]])
You'll still need to handle shape parameters intelligently if you do it this way, but it will still be simpler. Also, this assumes there are no duplicates in the original array; if there are, this could generate wrong results. Saullo's excellent answer partially avoids the problem, but any value-based approach isn't guaranteed to work unless you're certain that there aren't duplicate values in the columns.
In the spirit of #SaulloCastro's answer, but handling multiple occurrences of items, you can remove the first occurrence on each column doing the following:
def delete_skew_row(a, b) :
rows, cols = a.shape
row_to_remove = np.argmax(a == b, axis=0)
items_to_remove = np.ravel_multi_index((row_to_remove,
np.arange(cols)),
a.shape, order='F')
ret = np.delete(a.T, items_to_remove)
return np.ascontiguousarray(ret.reshape(cols,rows-1).T)
rows, cols = 5, 10
a = np.random.randint(100, size=(rows, cols))
b = np.random.randint(rows, size=(cols,))
b = a[b, np.arange(cols)]
>>> a
array([[50, 46, 85, 82, 27, 41, 45, 27, 17, 26],
[92, 35, 14, 34, 48, 27, 63, 58, 14, 18],
[90, 91, 39, 19, 90, 29, 67, 52, 68, 69],
[10, 99, 33, 58, 46, 71, 43, 23, 58, 49],
[92, 81, 64, 77, 61, 99, 40, 49, 49, 87]])
>>> b
array([92, 81, 14, 82, 46, 29, 67, 58, 14, 69])
>>> delete_skew_row(a, b)
array([[50, 46, 85, 34, 27, 41, 45, 27, 17, 26],
[90, 35, 39, 19, 48, 27, 63, 52, 68, 18],
[10, 91, 33, 58, 90, 71, 43, 23, 58, 49],
[92, 99, 64, 77, 61, 99, 40, 49, 49, 87]])

Categories

Resources