Why does np.argwhere's result shape not match its input? - python

Suppose I pass a 1D array:
>>> np.arange(0,20)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19])
>>> np.arange(0,20).shape
(20,)
into argwhere:
>>> np.argwhere(np.arange(0,20)<10)
array([[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])
>>> np.argwhere(np.arange(0,20)<10).shape
(10, 1)
Why has the result changed into a 2D array? What's the benefit of this?

argwhere returns the coordinates where the condition is True. In general, a coordinate is a tuple of indices, one per dimension, therefore the output is 2D: one row per match.
>>> np.argwhere(np.arange(0,20).reshape(2,2,5)<10)
array([[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[0, 0, 3],
[0, 0, 4],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
[0, 1, 3],
[0, 1, 4]])
For consistency, this also applies to the case of 1D input.
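A quick sketch (added here for illustration, not part of the original answer) of why the 2-D result is still convenient for 1-D input: each row is a complete coordinate, so it can be turned into a tuple and used to index the array, and flattening the result recovers the plain index array:
>>> import numpy as np
>>> a = np.arange(0, 20)
>>> coords = np.argwhere(a < 10)   # shape (10, 1): one coordinate per row
>>> a[tuple(coords[3])]            # each row indexes the array directly
3
>>> coords.ravel()                 # squeeze the last axis to get flat indices
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])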

numpy.argwhere finds the indices of elements that fulfill the condition. It just happens that in your example some values equal their own index (the index is the same as the value).
In particular, since your input is one-dimensional, the output has one column (one index per matching element) and one row per match.
I hope this is clear; if not, take this example of a two-dimensional input array from the NumPy documentation:
>>> x = np.arange(6).reshape(2,3)
>>> x
array([[0, 1, 2],
[3, 4, 5]])
>>> np.argwhere(x>1)
array([[0, 2],
[1, 0],
[1, 1],
[1, 2]])

argwhere is simply the transpose of where (actually np.nonzero):
In [17]: np.where(np.arange(0,20)<10)
Out[17]: (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),)
In [18]: np.transpose(_)
Out[18]:
array([[0],
[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8],
[9]])
where produces a tuple of arrays, one array per dimension (here a 1-element tuple). transpose turns that tuple into an array (e.g. of shape (1,10)) and then transposes it. So its number of columns is the ndim of the input condition, and its number of rows is the number of matches.
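As a quick check (added for illustration, not from the original answer), argwhere really is just the transpose of the nonzero/where tuple, whatever the dimensionality:
>>> import numpy as np
>>> cond = np.arange(12).reshape(3, 4) % 3 == 0
>>> np.array_equal(np.argwhere(cond), np.transpose(np.nonzero(cond)))
True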
argwhere can be useful for visualizing the matches, but it is not as useful in programs as where itself. The where tuple can be used to index the condition array directly, while the argwhere array is usually used iteratively. For example:
In [19]: x = np.arange(10).reshape(2,5)
In [20]: x %2
Out[20]:
array([[0, 1, 0, 1, 0],
[1, 0, 1, 0, 1]])
In [21]: np.where(x%2)
Out[21]: (array([0, 0, 1, 1, 1]), array([1, 3, 0, 2, 4]))
In [22]: np.argwhere(x%2)
Out[22]:
array([[0, 1],
[0, 3],
[1, 0],
[1, 2],
[1, 4]])
In [23]: x[np.where(x%2)]
Out[23]: array([1, 3, 5, 7, 9])
In [24]: for i in np.argwhere(x%2):
...: print(x[tuple(i)])
...:
1
3
5
7
9
In [25]: [x[tuple(i)] for i in np.argwhere(x%2)]
Out[25]: [1, 3, 5, 7, 9]

Related

Select subset of rows of numpy array based on a selection of rows in another array

I have 2 numpy arrays and I want to select a subset of rows in one of them based on a condition on the other:
arr1 = np.array([[1, 2, 1, 5], [3, 4, 1, 6], [2, 2, 2, 7]])
arr2 = np.array([[2, 2, 1], [2, 3, 0], [2, 1, 1]])
I want to select only those rows in arr1 where the 3rd element of the corresponding row in arr2 is 1. In this case, arr2 will look like so:
np.array([[2, 2, 1], [2, 1, 1]]) and arr1 will become: np.array([[1, 2, 1, 5], [2, 2, 2, 7]]). Both can be assumed to have the same number of rows, but they can have a different number of columns. How can I achieve this?
In [500]: arr2
Out[500]:
array([[2, 2, 1],
[2, 3, 0],
[2, 1, 1]])
3rd element of each row:
In [502]: arr2[:,2]
Out[502]: array([1, 0, 1])
In [503]: arr2[:,2]==1
Out[503]: array([ True, False, True])
apply this boolean mask to select rows of arr1:
In [504]: arr1[arr2[:,2]==1]
Out[504]:
array([[1, 2, 1, 5],
[2, 2, 2, 7]])

How to label the entries of a matrix as their indices but still keep their original numerical values?

I have a matrix_1 full of numerical values, and what I'd like to do is transform it into a matrix_2 with the values of matrix_1 sorted, and then replace these sorted values in matrix_2 with their original indices from matrix_1.
I don't want to use any loops as the matrices are rather large.
for example : matrix_1=[[2,3,4,1],[6,5,9,7]]
I want to end up with matrix_2=[[(1,4),(1,1),(1,2),(1,3)],
[(2,2),(2,1),(2,4),(2,3)]]
I've tried using np.ndenumerate on the original matrix, but it returns array([<numpy.ndenumerate object at 0x1a1a9fce90>], dtype=object).
I've now also tried np.argsort(), but it doesn't seem to work, possibly because all of my entries are floats...
You must come from R or another language where indexing starts at 1. In Python, indices start at 0, so you have to explicitly add 1 to the indices to make them start at 1.
Use argsort and then reshape
m1 = matrix_1.argsort(1) + 1
i = (np.repeat(np.arange(m1.shape[0]), m1.shape[1]) + 1).reshape(m1.shape)
np.concatenate([m1[:, None],i[:, None]], axis=1).swapaxes(2,1)
which outputs
array([[[4, 1],
[1, 1],
[2, 1],
[3, 1]],
[[2, 2],
[1, 2],
[4, 2],
[3, 2]]])
using np.argsort should do the trick:
matrix_1=np.array([[2,3,4,1],[6,5,9,7]])
matrix_1
array([[2, 3, 4, 1],
[6, 5, 9, 7]])
x = np.argsort(matrix_1,axis=1)
x
array([[3, 0, 1, 2],
[1, 0, 3, 2]], dtype=int64)
A matrix consisting of floats shouldn't pose a problem.
You can then create the list as:
[[(i+1, v+1) for v in y] for i, y in enumerate(x.tolist())]
[[(1, 4), (1, 1), (1, 2), (1, 3)], [(2, 2), (2, 1), (2, 4), (2, 3)]]
argsort applied to the flattened array (arr1 here is the question's matrix_1 as an ndarray):
In [110]: np.argsort(arr1.ravel())
Out[110]: array([3, 0, 1, 2, 5, 4, 7, 6])
Turn that into 2d indices:
In [111]: np.unravel_index(_,(2,4))
Out[111]: (array([0, 0, 0, 0, 1, 1, 1, 1]), array([3, 0, 1, 2, 1, 0, 3, 2]))
Combine the arrays into one, and reshape:
In [112]: np.transpose(_)
Out[112]:
array([[0, 3],
[0, 0],
[0, 1],
[0, 2],
[1, 1],
[1, 0],
[1, 3],
[1, 2]])
In [113]: _+1 # tweak values to match yours
Out[113]:
array([[1, 4],
[1, 1],
[1, 2],
[1, 3],
[2, 2],
[2, 1],
[2, 4],
[2, 3]])
In [114]: _.reshape(2,4,2)
Out[114]:
array([[[1, 4],
[1, 1],
[1, 2],
[1, 3]],
[[2, 2],
[2, 1],
[2, 4],
[2, 3]]])

How to repeat a numpy array along a new dimension with padding?

Given two arrays, an input array and a repeat array, I would like to receive an array in which each input element is repeated along a new dimension a specified number of times for its row and padded with zeros to the end.
to_repeat = np.array([1, 2, 3, 4, 5, 6])
repeats = np.array([1, 2, 2, 3, 3, 1])
# I want final array to look like the following:
#[[1, 0, 0],
# [2, 2, 0],
# [3, 3, 0],
# [4, 4, 4],
# [5, 5, 5],
# [6, 0, 0]]
The issue is that I'm operating with large datasets (10M or so) so a list comprehension is too slow - what is a fast way to achieve this?
Here's one with masking based on this idea -
m = repeats[:,None] > np.arange(repeats.max())
out = np.zeros(m.shape,dtype=to_repeat.dtype)
out[m] = np.repeat(to_repeat,repeats)
Sample output -
In [44]: out
Out[44]:
array([[1, 0, 0],
[2, 2, 0],
[3, 3, 0],
[4, 4, 4],
[5, 5, 5],
[6, 0, 0]])
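For reference, here is what the two intermediate pieces look like for the sample inputs (a sketch added for illustration, not part of the original answer): the mask marks the first repeats[i] slots of each row, and np.repeat lays the values out flat in exactly that order, so the boolean assignment scatters them row by row and leaves the False positions as zero padding.
>>> m = repeats[:,None] > np.arange(repeats.max())
>>> m.astype(int)
array([[1, 0, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 1],
       [1, 1, 1],
       [1, 0, 0]])
>>> np.repeat(to_repeat, repeats)
array([1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6])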
Or with broadcasted-multiplication -
In [67]: m*to_repeat[:,None]
Out[67]:
array([[1, 0, 0],
[2, 2, 0],
[3, 3, 0],
[4, 4, 4],
[5, 5, 5],
[6, 0, 0]])
For large datasets/sizes, we can leverage multiple cores and be more memory-efficient by using the numexpr module for that broadcasting -
In [64]: import numexpr as ne
# Re-using mask `m` from previous method
In [65]: ne.evaluate('m*R',{'m':m,'R':to_repeat[:,None]})
Out[65]:
array([[1, 0, 0],
[2, 2, 0],
[3, 3, 0],
[4, 4, 4],
[5, 5, 5],
[6, 0, 0]])

Merging arrays of varying size in Python

Is there an easy way to merge, let's say, n spectra (i.e. arrays of shape (y_n, 2)) with varying lengths y_n into an array (or list) of shape (y_n_max, 2*x), filling a spectrum up with zeros if it is shorter than y_n_max?
Basically I want to have all spectra next to each other.
For example
a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
into
c = [[1,2,6,7],[2,3,8,9],[4,5,0,0]]
Either Array or List would be fine. I guess it comes down to filling up arrays with zeros?
If you're dealing with native Python lists, then you can do:
from itertools import zip_longest
c = [a + b for a, b in zip_longest(a, b, fillvalue=[0, 0])]
You could also do this with extend and zip, without itertools, provided a will always be longer than b. If b could be longer than a, then you could add a bit of logic as well (a sketch of that extra logic follows the example).
a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
b.extend([[0,0]]*(len(a)-len(b)))
[x + y for x, y in zip(a, b)]
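A possible version of that extra logic (my sketch, not from the original answer): pad whichever list is shorter, then zip and concatenate the row pairs.
a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
diff = len(a) - len(b)
if diff > 0:
    b = b + [[0, 0]] * diff      # a is longer: pad b
elif diff < 0:
    a = a + [[0, 0]] * (-diff)   # b is longer: pad a
c = [x + y for x, y in zip(a, b)]
# [[1, 2, 6, 7], [2, 3, 8, 9], [4, 5, 0, 0]]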
Trying to generalize the other solutions to multiple lists:
In [114]: a
Out[114]: [[1, 2], [2, 3], [4, 5]]
In [115]: b
Out[115]: [[6, 7], [8, 9]]
In [116]: c
Out[116]: [[3, 4]]
In [117]: d
Out[117]: [[1, 2], [2, 3], [4, 5], [6, 7], [8, 9]]
In [118]: ll=[a,d,c,b]
zip_longest pads
In [120]: [l for l in itertools.zip_longest(*ll,fillvalue=[0,0])]
Out[120]:
[([1, 2], [1, 2], [3, 4], [6, 7]),
([2, 3], [2, 3], [0, 0], [8, 9]),
([4, 5], [4, 5], [0, 0], [0, 0]),
([0, 0], [6, 7], [0, 0], [0, 0]),
([0, 0], [8, 9], [0, 0], [0, 0])]
itertools.chain flattens the inner lists (or use chain.from_iterable(l)):
In [121]: [list(itertools.chain(*l)) for l in _]
Out[121]:
[[1, 2, 1, 2, 3, 4, 6, 7],
[2, 3, 2, 3, 0, 0, 8, 9],
[4, 5, 4, 5, 0, 0, 0, 0],
[0, 0, 6, 7, 0, 0, 0, 0],
[0, 0, 8, 9, 0, 0, 0, 0]]
More ideas at Convert Python sequence to NumPy array, filling missing values
Adapting @Divakar's solution to this case:
def divakars_pad(ll):
    lens = np.array([len(item) for item in ll])
    mask = lens[:,None] > np.arange(lens.max())
    out = np.zeros((mask.shape+(2,)), int)
    out[mask,:] = np.concatenate(ll)
    out = out.transpose(1,0,2).reshape(5,-1)
    return out
In [142]: divakars_pad(ll)
Out[142]:
array([[1, 2, 1, 2, 3, 4, 6, 7],
[2, 3, 2, 3, 0, 0, 8, 9],
[4, 5, 4, 5, 0, 0, 0, 0],
[0, 0, 6, 7, 0, 0, 0, 0],
[0, 0, 8, 9, 0, 0, 0, 0]])
For this small size the itertools solution is faster, even with an added conversion to array.
With an array as target we don't need the chain flattener; reshape takes care of that:
In [157]: np.array(list(itertools.zip_longest(*ll,fillvalue=[0,0]))).reshape(-1, len(ll)*2)
Out[157]:
array([[1, 2, 1, 2, 3, 4, 6, 7],
[2, 3, 2, 3, 0, 0, 8, 9],
[4, 5, 4, 5, 0, 0, 0, 0],
[0, 0, 6, 7, 0, 0, 0, 0],
[0, 0, 8, 9, 0, 0, 0, 0]])
Use the zip built-in function and the chain.from_iterable function from itertools. This has the benefit of being more type agnostic than the other posted solution -- it only requires that your spectra are iterables.
from itertools import chain
a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
c = list(list(chain.from_iterable(zs)) for zs in zip(a,b))
If you want more than 2 spectra, you can change the zip call to zip(a,b,...)
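One caveat worth adding (my note, not part of the original answer): plain zip stops at the shortest spectrum, so the padded row [4, 5, 0, 0] would be dropped here; combining chain with zip_longest and a zero fill value keeps the padding:
from itertools import chain, zip_longest
a = [[1,2],[2,3],[4,5]]
b = [[6,7],[8,9]]
c = [list(chain.from_iterable(zs)) for zs in zip_longest(a, b, fillvalue=[0, 0])]
# [[1, 2, 6, 7], [2, 3, 8, 9], [4, 5, 0, 0]]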

Is there a way to loop through the return value of np.where?

Is there a way to loop through this tuple, where the left array gives positions in an array and the right array gives the values I would like to insert at those positions:
(array([ 0, 4, 6, ..., 9992, 9996, 9997]), array([3, 3, 3, ..., 3, 3, 3]))
The output above is generated from the following piece of code:
np.where(h2 == h2[i,:].max())[1]
I would like the result to be like this:
array[0] = 3
array[4] = 3
...
array[9997] = 3
Just use simple indexing:
indices, values = my_tuple
array[indices] = values
If you don't have the final array yet, you can create it using a function like np.zeros, np.ones, etc., sized to hold the maximum index (i.e. maximum index + 1 elements).
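A minimal self-contained sketch of that idea (the array name, length, and sample indices below are illustrative, not taken from the question):
import numpy as np
my_tuple = (np.array([0, 4, 6, 9997]), np.array([3, 3, 3, 3]))
indices, values = my_tuple
array = np.zeros(indices.max() + 1, dtype=values.dtype)   # or np.ones, etc.
array[indices] = values
array[0], array[4], array[9997]        # -> (3, 3, 3)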
I think you want the transpose of the where tuple:
In [204]: x=np.arange(1,13).reshape(3,4)
In [205]: x
Out[205]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
In [206]: idx=np.where(x)
In [207]: idx
Out[207]:
(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], dtype=int32),
array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int32))
In [208]: ij=np.transpose(idx)
In [209]: ij
Out[209]:
array([[0, 0],
[0, 1],
[0, 2],
[0, 3],
[1, 0],
[1, 1],
[1, 2],
[1, 3],
[2, 0],
[2, 1],
[2, 2],
[2, 3]], dtype=int32)
In fact there's a function that does just that:
np.argwhere(x)
Iterating on ij, I can print:
In [213]: for i,j in ij:
...: print('array[{}]={}'.format(i,j))
...:
array[0]=0
array[0]=1
array[0]=2
zip(*) is a list version of transpose:
for i,j in zip(*idx):
print(i,j)
