Given any N-tuple of slices (a.k.a. an N-D slice) in NumPy, how can it be converted to the corresponding indices of an N-D array, represented as a tuple of 1-D arrays (indices along each axis)? E.g. if we had np.nd_slice_to_indexes, the following code:
import numpy as np
print(np.nd_slice_to_indexes(np.s_[1 : 3]))
print(np.nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2]))
should print
(array([1, 2]),)
(array([1, 1, 1, 2, 2, 2]), array([5, 7, 9, 5, 7, 9]))
It is common for NumPy to represent indices of an N-D array as an N-tuple of 1-D arrays of the same length, where the k-th array in the tuple holds the indices along the k-th dimension (the j-th elements of all N arrays together address one array element). E.g. np.nonzero returns such an N-tuple:
print(np.nonzero([[0, 1, 1], [1, 1, 0]])) # Non-zero elements in 2D array.
# (array([0, 0, 1, 1], dtype=int64), array([1, 2, 0, 1], dtype=int64))
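Such a tuple can be fed straight back into an array for advanced indexing; for example, a[np.nonzero(a)] selects exactly the non-zero elements:
import numpy as np
a = np.array([[0, 1, 1], [1, 1, 0]])
idx = np.nonzero(a)
print(a[idx])  # [1 1 1 1] - the non-zero elements themselves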
The same behavior should be achieved as in the pure-Python function below, but in a more efficient (performant) way:
import numpy as np

def nd_slice_to_indexes(nd_slice):
    assert type(nd_slice) in [tuple, slice], type(nd_slice)
    if type(nd_slice) is not tuple:
        nd_slice = (nd_slice,)

    # Recursively enumerate every index combination covered by the slices.
    def iter_slices(slices):
        if len(slices) == 0:
            yield ()
        else:
            for i in range(slices[0].start, slices[0].stop, slices[0].step or 1):
                for r in iter_slices(slices[1:]):
                    yield (i,) + r

    # Stack the combinations and split into one 1-D index array per dimension.
    *res, = np.vstack(list(iter_slices(nd_slice))).T
    return tuple(res)
print(nd_slice_to_indexes(np.s_[1 : 3]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2, 8 : 14 : 3]))
# (array([1, 2]),)
# (array([1, 1, 1, 2, 2, 2]), array([5, 7, 9, 5, 7, 9]))
# (array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]), array([5, 5, 7, 7, 9, 9, 5, 5, 7, 7, 9, 9]), array([ 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11]))
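As a sanity check (a sketch, assuming some 2-D array a), indexing with the returned tuple should yield the same elements as the original slice, flattened in C order:
a = np.arange(100).reshape(10, 10)
idx = nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2])
assert np.array_equal(a[idx], a[1 : 3, 5 : 11 : 2].ravel())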
Thanks to a suggestion from @hpaulj, the task is solved efficiently using np.mgrid.
import numpy as np

def nd_slice_to_indexes(nd_slice):
    # Wrap a bare slice into a 1-tuple so np.mgrid builds a dense grid with
    # one sub-array per dimension; ravel each sub-array into a 1-D index.
    grid = np.mgrid[{tuple: nd_slice, slice: (nd_slice,)}[type(nd_slice)]]
    return tuple(grid[i].ravel() for i in range(grid.shape[0]))
print(nd_slice_to_indexes(np.s_[1 : 3]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2]))
print(nd_slice_to_indexes(np.s_[1 : 3, 5 : 11 : 2, 8 : 14 : 3]))
# (array([1, 2]),)
# (array([1, 1, 1, 2, 2, 2]), array([5, 7, 9, 5, 7, 9]))
# (array([1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]), array([5, 5, 7, 7, 9, 9, 5, 5, 7, 7, 9, 9]), array([ 8, 11, 8, 11, 8, 11, 8, 11, 8, 11, 8, 11]))
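For reference, an equivalent formulation (under the same assumption that every slice has an explicit start and stop) can be built from np.meshgrid with indexing='ij', which enumerates the grid in the same C order as np.mgrid; the name nd_slice_to_indexes_meshgrid is just illustrative:
import numpy as np

def nd_slice_to_indexes_meshgrid(nd_slice):
    if not isinstance(nd_slice, tuple):
        nd_slice = (nd_slice,)
    axes = [np.arange(s.start, s.stop, s.step or 1) for s in nd_slice]
    return tuple(g.ravel() for g in np.meshgrid(*axes, indexing='ij'))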
I want to create a NumPy array by stacking several copies of another array as rows. I did it as shown below. Is there a more NumPy-idiomatic way of doing this?
>>> a = np.arange(0,10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = tuple( a for _ in range(3) )
>>> b
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
>>> c = np.vstack( b )
>>> c
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
I found a way to do it. Sharing it here.
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[None,:]
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> np.repeat( a[None,:], 3, axis=0 )
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
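If a read-only view is enough, np.broadcast_to avoids copying the data at all (assuming the same a as above); note that the result is not writable, so copy it first if you need a mutable array:
>>> np.broadcast_to(a, (3, len(a)))
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])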
I have a very large 2D NumPy array of m x n elements. For each row, I need to remove exactly one element. So, for example, from a 4x6 matrix I might need to delete [0, 1], [1, 4], [2, 3], and [3, 3] (I have this set of coordinates stored in a list). In the end, the matrix will shrink in width by 1.
Is there a standard way to do this using a mask? Ideally, I need this to be as performant as possible.
Here is a method that uses ravel_multi_index() to compute flat (1-D) indices, delete()s those elements, and reshapes back to a 2-D array:
import numpy as np

n = 12
a = np.repeat(np.arange(10)[None, :], n, axis=0)  # n rows of 0..9
index = np.random.randint(0, 10, n)               # one column to drop per row
# Convert the (row, column) pairs into indices into the flattened array.
ravel_index = np.ravel_multi_index((np.arange(n), index), a.shape)
np.delete(a, ravel_index).reshape(n, -1)
the index:
array([4, 6, 9, 0, 3, 5, 3, 8, 9, 8, 4, 4])
the result:
array([[0, 1, 2, 3, 4, 5, 6, 7, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 9],
[0, 1, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8],
[0, 1, 2, 4, 5, 6, 7, 8, 9]])
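Since the question explicitly asks about masks, here is a boolean-mask variant (a sketch reusing the a, n and index defined above) that avoids the flat-index round trip:
mask = np.ones(a.shape, dtype=bool)  # keep everything by default
mask[np.arange(n), index] = False    # drop one element per row
result = a[mask].reshape(n, -1)      # boolean indexing flattens in row-major order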
I am trying to get the indices of the unique elements of a NumPy array (a long vector of 3,628,621 elements).
However, I must be doing something wrong, because when I try to select the unique elements, I still find duplicates:
Vector
Out[165]: array([712450, 714390, 718560, ..., 384390, 992041, 94852])
Loc = np.where(np.unique(Vector)) # Find indices of unique elements
Vector_New = Vector[Loc] # Create new vector with all unique elements
np.where(Vector_New == 173020) # See how often/where '173020' exists
Out[166]: (array([ 7098, 11581], dtype=int64),)
So the integer 173020 still appears twice in the new vector, although I expected all elements to be unique. The new vector is 11,594 elements long.
Thanks for the help!
Regards,
Timen
Your np.where(np.unique(Vector)) does not do what you want: it returns the positions of the non-zero values in the sorted array of unique values, not indices into Vector, which is why duplicates survive. np.unique itself has several optional parameters that give you the needed information. Its call signature is:
np.unique(ar, return_index=False, return_inverse=False, return_counts=False)
read the docs.
In [50]: keys
Out[50]:
array([1, 3, 5, 2, 0, 7, 4, 7, 7, 2, 7, 5, 5, 3, 6, 2, 3, 5, 5, 5, 6, 9, 6,
5, 2, 1, 6, 6, 5, 9, 9, 6, 5, 5, 9, 9, 6, 3, 7, 0, 5, 1, 7, 6, 2, 4,
1, 0, 6, 5, 4, 8, 8, 4, 2, 1, 8, 3, 1, 9, 8, 4, 4, 2, 4, 7, 2, 6, 8,
6, 5, 2, 4, 9, 1, 5, 3, 1, 5, 6, 2, 2, 8, 4, 0, 4, 9, 0, 8, 1, 5, 3,
1, 3, 7, 1, 5, 8, 5, 8])
In [51]: np.unique(keys, return_counts=True, return_index=True)
Out[51]:
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([ 4, 0, 3, 1, 6, 2, 14, 5, 51, 21], dtype=int32),
array([ 5, 11, 11, 8, 10, 18, 12, 8, 9, 8]))
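Applied to the question's Vector, a minimal fix might look like this; the np.where call is unnecessary, since np.unique already returns the unique values, and return_index recovers their positions:
Vector_New, Loc = np.unique(Vector, return_index=True)
# Vector_New holds each value exactly once (sorted); Loc gives the index
# of the first occurrence of each unique value in the original Vector.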
Say I have a Numpy vector,
A = zeros(100)
and I divide it into subvectors by a list of breakpoints which index into A, for instance,
breaks = linspace(0, 100, 11, dtype=int)
So the i-th subvector lies between the indices breaks[i] (inclusive) and breaks[i+1] (exclusive).
The breaks are not necessarily equispaced, this is only an example.
However, they will always be strictly increasing.
Now I want to operate on these subvectors. For instance, if I want to set all elements of the i-th subvector to i, I might do:
for i in range(len(breaks) - 1):
    A[breaks[i] : breaks[i+1]] = i
Or I might want to compute the subvector means:
b = empty(len(breaks) - 1)
for i in range(len(breaks) - 1):
    b[i] = A[breaks[i] : breaks[i+1]].mean()
And so on.
How can I avoid using for loops and instead vectorize these operations?
You can use a simple np.cumsum:
import numpy as np
# Form zeros array of same size as input array and
# place ones at positions where intervals change
A1 = np.zeros_like(A)
A1[breaks[1:-1]] = 1
# Perform cumsum along it to create a staircase like array, as the final output
out = A1.cumsum()
Sample run -
In [115]: A
Out[115]: array([3, 8, 0, 4, 6, 4, 8, 0, 2, 7, 4, 9, 3, 7, 3, 8, 6, 7, 1, 6])
In [116]: breaks
Out[116]: array([ 0, 4, 9, 11, 18, 20])
In [142]: out
Out[142]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4])
If you want the mean values of those subvectors of A, you can use np.bincount:
mean_vals = np.bincount(out, weights=A)/np.bincount(out)
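For the sample A and out above, this gives (values computed from the sample data):
np.bincount(out)                                # array([4, 5, 2, 7, 2]) - subvector lengths
np.bincount(out, weights=A) / np.bincount(out)  # array([3.75, 4., 5.5, 6.14285714, 3.5])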
If you are looking to extend this functionality with a custom function, you might want to look into numpy_groupies, a Python/NumPy equivalent of MATLAB's accumarray.
There really isn't a single answer to your question, but there are several techniques that you can use as building blocks. Here is another one you may find helpful:
All numpy ufuncs have a .reduceat method, which you can use to your advantage for some of your calculations:
>>> a = np.arange(100)
>>> breaks = np.linspace(0, 100, 11, dtype=np.intp)
>>> counts = np.diff(breaks)
>>> counts
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
>>> sums = np.add.reduceat(a, breaks[:-1], dtype=np.float64)  # np.float is removed in modern NumPy
>>> sums
array([ 45., 145., 245., 345., 445., 545., 645., 745., 845., 945.])
>>> sums / counts # i.e. the mean
array([ 4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5])
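The same trick works with other binary ufuncs; for instance, per-bin maxima with the same a and breaks:
>>> np.maximum.reduceat(a, breaks[:-1])
array([ 9, 19, 29, 39, 49, 59, 69, 79, 89, 99])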
You could use np.repeat:
In [35]: np.repeat(np.arange(0, len(breaks)-1), np.diff(breaks))
Out[35]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9])
To compute arbitrary binned statistics you could use scipy.stats.binned_statistic:
import numpy as np
import scipy.stats as stats
breaks = np.linspace(0, 100, 11, dtype=int)
A = np.random.random(100)
means, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic='mean', bins=breaks)
stats.binned_statistic can compute means, medians, counts, and sums; or,
to compute an arbitrary statistic for each bin, you can pass a callable to the statistic parameter:
def func(values):
    return values.mean()
funcmeans, bin_edges, binnumber = stats.binned_statistic(
    x=np.arange(len(A)), values=A, statistic=func, bins=breaks)
assert np.allclose(means, funcmeans)
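The returned binnumber is a 1-based per-element bin label, so binnumber - 1 is comparable to the label arrays produced by the cumsum and np.repeat approaches above:
labels = binnumber - 1  # 0-based group id for every element of A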
I have a numpy array, for example
a = np.arange(10)
how can I move the first n elements to the end of the array?
I found the np.roll function, but it seems to do the opposite: it shifts the last n elements to the beginning.
Why not just roll with a negative number?
>>> import numpy as np
>>> a = np.arange(10)
>>> np.roll(a,2)
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
>>> np.roll(a,-2)
array([2, 3, 4, 5, 6, 7, 8, 9, 0, 1])
You can use a negative shift:
a = np.arange(10)
print(np.roll(a, 3))
print(np.roll(a, -3))
which prints
[7 8 9 0 1 2 3 4 5 6]
[3 4 5 6 7 8 9 0 1 2]
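Equivalently, without np.roll, slicing and concatenation give the same result (here with n = 3):
>>> n = 3
>>> np.concatenate((a[n:], a[:n]))
array([3, 4, 5, 6, 7, 8, 9, 0, 1, 2])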