numpy, "cleaning" an index and value array - python

I have an index array, and an associated value array.
delta = np.array([0,3,4,1,1,4,4,5,7,10], dtype = int)
theta = np.random.normal(size = (12, 5))
I want to "clean" the index array so that indices with no presence are dropped and higher indices move down to take their place. In this case, the result should be:
delta == np.array([0,2,3,1,1,3,3,4,5,6], dtype = int)
theta == theta[np.array([0,1,3,4,5,7,10, 2,6,8,9,11], dtype = int)]
and the associated entries are moved up in the theta array such that their indices match the new indices in the delta vector. How do I go about this?

Let's ask np.unique for all of its optional return values.
In [799]: np.unique(delta, return_index=True, return_inverse=True, return_counts=True)
Out[799]:
(array([ 0, 1, 3, 4, 5, 7, 10]),
array([0, 3, 1, 2, 7, 8, 9]),
array([0, 2, 3, 1, 1, 3, 3, 4, 5, 6]),
array([1, 2, 1, 3, 1, 1, 1]))
Looks like the 'inverse' is what you want.
Review np.unique docs for more details.
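Putting that together for the question's arrays — a sketch that renumbers delta via the inverse, and uses np.setdiff1d to push the unreferenced theta rows to the back (variable names other than delta/theta are made up here):

```python
import numpy as np

delta = np.array([0, 3, 4, 1, 1, 4, 4, 5, 7, 10])
theta = np.random.normal(size=(12, 5))

# the sorted unique values are the theta rows actually referenced;
# the inverse is the renumbered ("cleaned") delta
used, new_delta = np.unique(delta, return_inverse=True)

# referenced rows first (in their new order), unreferenced rows after them
unused = np.setdiff1d(np.arange(theta.shape[0]), used)
order = np.concatenate([used, unused])
new_theta = theta[order]
# new_delta → [0, 2, 3, 1, 1, 3, 3, 4, 5, 6]
# order     → [0, 1, 3, 4, 5, 7, 10, 2, 6, 8, 9, 11]
```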

Related

Find runs and lengths of consecutive values in an array

I'd like to find equal values in an array, and their indices, if they occur consecutively more than 2 times.
[0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4]
so in this example I would find that the value 2 occurred 4 times, starting at position 8. Is there any built-in function to do that?
I found a way with collections.Counter
collections.Counter(a)
# Counter({2: 5, 1: 4, 0: 3, 3: 2, 4: 1})
but this is not what I am looking for.
Of course I can write a loop and compare two values and then count them, but may be there is a more elegant solution?
Find consecutive runs and length of runs with condition
import numpy as np
arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4])
res = np.ones_like(arr)
np.bitwise_xor(arr[:-1], arr[1:], out=res[1:]) # set equal, consecutive elements to 0
# use this for np.floats instead
# arr = np.array([0, 3, 0, 1, 0, 1, 2, 1, 2.4, 2.4, 2.4, 2, 1, 3, 4, 4, 4, 5])
# res = np.hstack([True, ~np.isclose(arr[:-1], arr[1:])])
idxs = np.flatnonzero(res) # get indices of non zero elements
values = arr[idxs]
counts = np.diff(idxs, append=len(arr)) # differences between consecutive start indices are the run lengths
cond = counts > 2
values[cond], counts[cond], idxs[cond]
Output
(array([2]), array([4]), array([8]))
# (array([2.4, 4. ]), array([3, 3]), array([ 8, 14]))
_, i, c = np.unique(np.r_[[0], ~np.isclose(arr[:-1], arr[1:])].cumsum(),
                    return_index=1,
                    return_counts=1)
for index, count in zip(i, c):
    if count > 1:
        print([arr[index], count, index])
Out[]: [2, 4, 8]
A little more compact way of doing it that works for all input types.
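The two answers above can be folded into one small helper that handles both ints and floats; a sketch (the function name and min_len parameter are made up):

```python
import numpy as np

def find_runs(arr, min_len=3):
    # a sketch combining the two answers above
    arr = np.asarray(arr)
    # True at every position where a new run starts (floats handled via isclose)
    starts = np.flatnonzero(np.r_[True, ~np.isclose(arr[:-1], arr[1:])])
    lengths = np.diff(starts, append=len(arr))
    keep = lengths >= min_len
    return arr[starts[keep]], lengths[keep], starts[keep]

vals, lens, starts = find_runs([0, 3, 0, 1, 0, 1, 2, 1, 2, 2, 2, 2, 1, 3, 4])
# (array([2]), array([4]), array([8]))
```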

Best way to find indexes of first occurences of integers in each row of numpy array?

If I have an array such as:
a = np.array([[1, 1, 2, 2, 1, 3, 4],
              [8, 7, 7, 7, 4, 8, 8]])
what would be the best way to get as output:
array([[0, 2, 5, 6], [0, 1, 4]])
or
array([[0, 2, 5, 6], [4, 1, 0]])
These are the indices of the first occurrence of each integer in each row. The order of the indices is not important.
Currently I am using:
res = []
for row in a:
    unique, unique_indexes = np.unique(row, return_index=True)
    res.append(unique_indexes)
But I wonder if there is a (num)pythonic way to avoid the for loop.
You can transform the array in such a way that you process the entire thing in one batch. Let's start with an example very similar to the one in your question:
a = np.array([[1, 1, 2, 2, 1, 3, 4], [8, 7, 7, 7, 5, 8, 8]])
Now get the indices:
_, ix = np.unique(a, return_index=True)
# ix = array([ 0, 2, 5, 6, 11, 8, 7])
Notice that the indices for the first row are already correct. Elements from subsequent rows are offset by a multiple of the row length. In general, the offset is
offset = ix // a.shape[-1]
# offset = array([0, 0, 0, 0, 1, 1, 1])
ix %= a.shape[-1]
# ix = array([0, 2, 5, 6, 4, 1, 0])
You can call np.split on the new ix at every location where offset changes value:
ix = np.split(ix, np.flatnonzero(np.diff(offset)) + 1)
So why is this example valid, but the one in the question is not? The key is that np.unique uses a sort-based approach (which makes it run in O(n log n) rather than the O(n) of collections.Counter). For the order of the indices to come out right, every value in a row must therefore be strictly greater than every value in the previous row. Notice that in your example, 4 appears in both rows. You can ensure the property with a simple shift based on the max and min values in each row:
mn = a.min(axis=1)
mx = a.max(axis=1)
diff = np.r_[0, (mx - mn + 1)[:-1].cumsum(0)] - mn
# diff = array([-1, -1])
b = a + diff[:, None]
# b = array([[0, 0, 1, 1, 0, 2, 3],
#            [7, 6, 6, 6, 4, 7, 7]])
Notice that the cumulative sum is shifted by one position (the leading zero), so each row is offset by the total span of the rows before it. If you deal with large integers and/or very large arrays, you will have to be more careful about how you compute diff to avoid overflow.
Now you can use b in place of a in the call to np.unique.
TL;DR
Here is a general no-loop approach, applicable to any axis, not just the last:
def global_unq(a, axis=-1):
    n = a.shape[axis]
    a = np.moveaxis(np.asanyarray(a), axis, -1).reshape(-1, n)
    mn = a.min(-1)
    mx = a.max(-1)
    diff = np.r_[0, (mx - mn + 1)[:-1].cumsum(0)] - mn
    _, ix = np.unique(a + diff[:, None], return_index=True)
    return np.split(ix % n, np.flatnonzero(np.diff(ix // n)) + 1)
You could put it into a list comprehension, but your loop is fairly clean already:
[list(np.unique(e, return_index=True)[-1]) for e in a]
# [[0, 2, 5, 6], [4, 1, 0]]
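As a sanity check, here is the general function run end-to-end on the question's original array (restated in full so the snippet runs standalone):

```python
import numpy as np

def global_unq(a, axis=-1):
    n = a.shape[axis]
    a = np.moveaxis(np.asanyarray(a), axis, -1).reshape(-1, n)
    mn = a.min(-1)
    mx = a.max(-1)
    diff = np.r_[0, (mx - mn + 1)[:-1].cumsum(0)] - mn
    _, ix = np.unique(a + diff[:, None], return_index=True)
    return np.split(ix % n, np.flatnonzero(np.diff(ix // n)) + 1)

a = np.array([[1, 1, 2, 2, 1, 3, 4],
              [8, 7, 7, 7, 4, 8, 8]])
out = global_unq(a)
# [array([0, 2, 5, 6]), array([4, 1, 0])]
```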

Numpy: for each element in one dimension, find coordinates of maximum of sub-array

I've seen variations of this question asked a few times, but so far haven't seen any answers that get to the heart of this general case. I have an n-dimensional array of shape [a, b, c, ...]. For some dimension x, I want to look at each sub-array and find the coordinates of the maximum.
For example, say b = 2, and that's the dimension I'm interested in. I want the coordinates of the maximum of [:, 0, :, ...] and [:, 1, :, ...] in the form a_max = [a_max_b0, a_max_b1], c_max = [c_max_b0, c_max_b1], etc.
I've tried to do this by reshaping my input matrix to a 2d array [b, a*c*d*...], using argmax along axis 0, and unraveling the indices, but the output coordinates don't wind up giving the maxima in my dataset. In this case, n = 3 and I'm interested in axis 1.
shape = gains_3d.shape
idx = gains_3d.reshape(shape[1], -1)
idx = idx.argmax(axis = 1)
a1, a2 = np.unravel_index(idx, [shape[0], shape[2]])
Obviously I could use a loop, but that's not very pythonic.
For a concrete example, I randomly generated a 4x2x3 array. I'm interested in axis 1, so the output should be two arrays of length 2.
testarray = np.array([[[0.17028444, 0.38504759, 0.64852725],
                       [0.8344524 , 0.54964746, 0.86628204]],
                      [[0.77089997, 0.25876277, 0.45092835],
                       [0.6119848 , 0.10096425, 0.627054  ]],
                      [[0.8466859 , 0.82011746, 0.51123959],
                       [0.26681694, 0.12952723, 0.94956865]],
                      [[0.28123628, 0.30465068, 0.29498136],
                       [0.6624998 , 0.42748154, 0.83362323]]])
testarray[:,0,:] is
array([[0.17028444, 0.38504759, 0.64852725],
       [0.77089997, 0.25876277, 0.45092835],
       [0.8466859 , 0.82011746, 0.51123959],
       [0.28123628, 0.30465068, 0.29498136]])
so the first element of the first output array will be 2, and the first element of the other will be 0, pointing to 0.8466859. The second elements of the two output arrays will be 2 and 2, pointing to 0.94956865 of testarray[:,1,:].
Let's first try to get a clear idea of what you are trying to do:
Sample 3d array:
In [136]: arr = np.random.randint(0,10,(2,3,4))
In [137]: arr
Out[137]:
array([[[1, 7, 6, 2],
        [1, 5, 7, 1],
        [2, 2, 5, *6*]],

       [[*9*, 1, 2, 9],
        [2, *9*, 3, 9],
        [0, 2, 0, 6]]])
After fiddling around a bit I came up with this iteration, showing the coordinates and the max value for each index along the middle dimension (the maxima are marked with asterisks above):
In [151]: [(i, np.unravel_index(np.argmax(arr[:,i,:]), (2,4)), np.max(arr[:,i,:])) for i in range(3)]
Out[151]: [(0, (1, 0), 9), (1, (1, 1), 9), (2, (0, 3), 6)]
I can move the unravel outside the iteration:
In [153]: np.unravel_index([np.argmax(arr[:,i,:]) for i in range(3)],(2,4))
Out[153]: (array([1, 1, 0]), array([0, 1, 3]))
Your reshape approach does avoid this loop:
In [154]: arr1 = arr.transpose(1,0,2) # move our axis first
In [155]: arr1 = arr1.reshape(3,-1)
In [156]: arr1
Out[156]:
array([[1, 7, 6, 2, 9, 1, 2, 9],
[1, 5, 7, 1, 2, 9, 3, 9],
[2, 2, 5, 6, 0, 2, 0, 6]])
In [158]: np.argmax(arr1,axis=1)
Out[158]: array([4, 5, 3])
In [159]: np.unravel_index(_,(2,4))
Out[159]: (array([1, 1, 0]), array([0, 1, 3]))
max and argmax take only one axis value, whereas you want the equivalent of taking the max over all axes but one. Reductions like np.max do accept an axis tuple, but argmax does not, so the transpose and reshape may be the only way.
In [163]: np.max(arr1,axis=1)
Out[163]: array([9, 9, 6])
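The transpose-and-reshape recipe can be wrapped into a small helper that works for any axis (the function name here is made up):

```python
import numpy as np

def argmax_other_axes(a, axis):
    # coordinates of the max of each sub-array taken along `axis`
    a = np.moveaxis(a, axis, 0)        # bring the axis of interest to the front
    flat = a.reshape(a.shape[0], -1)   # flatten the remaining axes
    idx = flat.argmax(axis=1)
    return np.unravel_index(idx, a.shape[1:])

arr = np.arange(24).reshape(2, 3, 4)   # values increase monotonically
argmax_other_axes(arr, 1)
# (array([1, 1, 1]), array([3, 3, 3]))
```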

Appending missing values according to index

Say I have a tensor
values = torch.tensor([5, 3, 2, 8])
and a corresponding index to values
index = torch.tensor([0, 2, 4, 5])
and I want to insert, at the missing indices (1 and 3), a fixed value (100), such that I get
values = torch.tensor([5, 100, 3, 100, 2, 8])
Is there a vectorized way to do it in PyTorch (or numpy)?
You can fill it with 100 first and then fill it with the original values.
in pytorch
import torch
result = torch.empty(6, dtype = torch.int32).fill_(100)
values = torch.tensor([5, 3, 2, 8], dtype = torch.int32)
index = torch.tensor([0, 2, 4, 5])
result[index] = values
print(result)
in numpy
import numpy as np
result = np.full((6,), 100)
index = np.array([0, 2, 4, 5])
values = np.array([5, 3, 2, 8])
result[index] = values
print(result)
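If the output length is not known up front, it can be inferred from the largest index rather than hard-coding 6 (a small variation on the numpy version above):

```python
import numpy as np

index = np.array([0, 2, 4, 5])
values = np.array([5, 3, 2, 8])

n = index.max() + 1          # infer the output length from the largest index
result = np.full(n, 100)
result[index] = values
# result → array([  5, 100,   3, 100,   2,   8])
```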

Rolling array around

Let's say I have
arr = np.arange(6)
arr
array([0, 1, 2, 3, 4, 5])
and I decide that I want to treat an array "like a circle": When I run out of material at the end, I want to start at index 0 again. That is, I want a convenient way of selecting x elements, starting at index i.
Now, if x == 6, I can simply do
i = 3
np.hstack((arr[i:], arr[:i]))
Out[9]: array([3, 4, 5, 0, 1, 2])
But is there a convenient way of doing this in general, even if x > 6, without having to manually break the array apart and think through the logic?
For example:
print(roll_array_arround(arr)[2:17])
should return:
array([2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0])
See mode='wrap' in ndarray.take:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.take.html
Taking your hypothetical function:
print(roll_array_arround(arr)[2:17])
If it is implied that you are after a true slice of the original array, that is not going to happen: a wrapped-around array cannot be expressed as a strided view of the original. So any function that maps an ndarray to an ndarray this way will necessarily involve a copy of your data.
That is, efficiency-wise, you shouldn't expect to find a solution that differs significantly in performance from the expression below.
print(arr.take(np.arange(2,17), mode='wrap'))
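If you want the slicing syntax from the question, take can be hidden behind a tiny wrapper class (WrapView is a made-up name, and it only handles simple start:stop slices):

```python
import numpy as np

class WrapView:
    def __init__(self, arr):
        self.arr = np.asarray(arr)

    def __getitem__(self, s):
        # build the wrapped indices explicitly; take copies the data
        return self.arr.take(np.arange(s.start, s.stop), mode='wrap')

arr = np.arange(6)
WrapView(arr)[2:17]
# array([2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4])
```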
The modulus operation seems like the best fit here:
def rolling_array(n, x, i):
    # n is the length of the array (the rolling period)
    # x is the number of elements to select
    # i is the starting index
    return np.mod(np.arange(i, i + x), n)
Sample runs -
In [61]: rolling_array(n=6, x=6, i=3)
Out[61]: array([3, 4, 5, 0, 1, 2])
In [62]: rolling_array(n=6, x=17, i=2)
Out[62]: array([2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0])
Another option you could look into is itertools.cycle:
from itertools import cycle
list_to_rotate = np.array([1,2,3,4,5])
rotatable_list = cycle(list_to_rotate)
You need to roll your array.
>>> x = np.arange(10)
>>> np.roll(x, 2)
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
See numpy documentation for more details.
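Note that np.roll shifts elements to the right, so reproducing the hstack example from the question takes a negative shift:

```python
import numpy as np

arr = np.arange(6)
np.roll(arr, -3)   # same result as np.hstack((arr[3:], arr[:3]))
# array([3, 4, 5, 0, 1, 2])
```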
