Using NumPy to replace values in place along an arbitrary axis - python

How can I replace array values in place if I don't know the axis beforehand?
For example, if I want to do something like
arr[:, 5]
but I don't know the axis beforehand and want to keep it general, I can use take:
arr.take(5, axis=1)
and it will work.
However, if I want to do something like
arr[:, 5] = 10
but I don't know the axis beforehand, how can I do it? I obviously can't do arr.take(5, axis=1) = 10, and I can't find a function that does it.
The closest function I found is np.put(), but I don't think it can be done with that either.

You could swap the desired axis to the first position and then do the assignment. swapaxes returns a view, so the assignment will do what you want.
For example,
In [87]: np.random.seed(123)
In [88]: a = np.random.randint(1, 9, size=(5, 8))
In [89]: a
Out[89]:
array([[7, 6, 7, 3, 5, 3, 7, 2],
       [4, 3, 4, 2, 7, 2, 1, 2],
       [7, 8, 2, 1, 7, 1, 8, 2],
       [4, 7, 6, 5, 1, 1, 5, 2],
       [8, 4, 3, 5, 8, 3, 5, 8]])
In [90]: ax = 1
In [91]: k = 5
In [92]: val = 99
In [93]: a.swapaxes(0, ax)[k] = val
In [94]: a
Out[94]:
array([[ 7,  6,  7,  3,  5, 99,  7,  2],
       [ 4,  3,  4,  2,  7, 99,  1,  2],
       [ 7,  8,  2,  1,  7, 99,  8,  2],
       [ 4,  7,  6,  5,  1, 99,  5,  2],
       [ 8,  4,  3,  5,  8, 99,  5,  8]])
In [95]: ax = 0
In [96]: k = 2
In [97]: val = -1
In [98]: a.swapaxes(0, ax)[k] = val
In [99]: a
Out[99]:
array([[ 7,  6,  7,  3,  5, 99,  7,  2],
       [ 4,  3,  4,  2,  7, 99,  1,  2],
       [-1, -1, -1, -1, -1, -1, -1, -1],
       [ 4,  7,  6,  5,  1, 99,  5,  2],
       [ 8,  4,  3,  5,  8, 99,  5,  8]])
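To reuse this without repeating the swapaxes call everywhere, you can wrap it in a small helper; this is only a sketch of the idea above, and the name assign_along_axis is mine:
import numpy as np

def assign_along_axis(arr, index, val, axis):
    # swapaxes returns a view, so assigning into it writes back into arr.
    arr.swapaxes(0, axis)[index] = val

a = np.zeros((5, 8))
assign_along_axis(a, 5, 99, axis=1)   # equivalent to a[:, 5] = 99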

I don't think there is a NumPy function for this, but it is not too hard to construct your own:
def replace(arr, indices, val, axis):
    s = [slice(None)] * arr.ndim
    s[axis] = indices
    arr[tuple(s)] = val   # convert to a tuple: NumPy expects a tuple of indices here
For example:
import numpy as np

def replace(arr, indices, val, axis):
    # Build a full-slice index for every axis, then narrow the chosen axis.
    s = [slice(None)] * arr.ndim
    s[axis] = indices
    arr[tuple(s)] = val

arr = np.zeros((3, 6, 2))
indices = 5
axis = 1
val = 10
replace(arr, indices, val, axis)
print(np.take(arr, indices, axis))
prints
[[ 10.  10.]
 [ 10.  10.]
 [ 10.  10.]]
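As a side note (not part of the original answer): newer NumPy versions (1.15+) also provide np.put_along_axis, which writes in place along a chosen axis. It requires the index array to have the same number of dimensions as arr, so singleton axes are needed; a minimal sketch using the same arr shape as above:
import numpy as np

arr = np.zeros((3, 6, 2))
# The index array must have arr.ndim dimensions; shape (1, 1, 1) broadcasts
# across the non-selected axes, so every [i, 5, k] element is set to 10.
np.put_along_axis(arr, np.array(5).reshape(1, 1, 1), 10, axis=1)
print(np.take(arr, 5, axis=1))   # a (3, 2) block of 10s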

Related

Numpy operation to expand array into sequential slices of given length?

my_function must expand a 1D numpy array into a 2D numpy array, where the 2nd axis contains the slice of the given length starting at each index, up to the end of the array. Example:
import numpy as np
a = np.arange(10)
print (my_function(a, length=3))
Expected output
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])
I can achieve this using a for loop, but I was wondering if there is a numpy vectorization technique for this.
def my_function(a, length):
    b = np.zeros((len(a) - (length - 1), length))
    for i in range(len(b)):
        b[i] = a[i:i + length]
    return b
If you're careful with the math and heed the warning in the docs, you can use np.lib.stride_tricks.as_strided(). You need to calculate the correct dimensions for your array so you don't overflow. Also note that as_strided() shares memory, so you will have multiple references to the same memory in the final output. (You can, of course, copy this into a new array.)
>>> import numpy as np
>>> def my_function(a, length):
...     stride = a.strides[0]
...     l = len(a) - length + 1
...     return np.lib.stride_tricks.as_strided(a, (l, length), (stride, stride))
...
>>> np.array(my_function(np.arange(10), 3))
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])
>>> np.array(my_function(np.arange(15), 7))
array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 1,  2,  3,  4,  5,  6,  7],
       [ 2,  3,  4,  5,  6,  7,  8],
       [ 3,  4,  5,  6,  7,  8,  9],
       [ 4,  5,  6,  7,  8,  9, 10],
       [ 5,  6,  7,  8,  9, 10, 11],
       [ 6,  7,  8,  9, 10, 11, 12],
       [ 7,  8,  9, 10, 11, 12, 13],
       [ 8,  9, 10, 11, 12, 13, 14]])
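As an aside (my note, not the answer's): on NumPy 1.20+ the same sliding-window view is available as np.lib.stride_tricks.sliding_window_view, which performs the shape and stride bookkeeping for you:
>>> import numpy as np
>>> np.lib.stride_tricks.sliding_window_view(np.arange(10), 3)
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])
Like as_strided, it shares memory with the input (and is read-only by default), so copy the result if you need to modify it.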
How about this function?
import numpy as np

def my_function(a, length):
    # Stack the slice starting at each offset 0..length-1 (each trimmed to the
    # number of complete windows), then transpose so each row is one window.
    n = len(a) - length + 1
    result = []
    for i in range(length):
        result.append(a[i:i + n])
    return np.vstack(result).T

a = np.arange(10)
length = 3
my_function(a, length)

Computing staggered/shifted means on Numpy arrays

Given a Numpy array of shape 6 x 10, how would you compute the shifted means of the diagonals? A matrix like this
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
Should result in an array of the following means
[np.mean([0]), np.mean([1, 0]), np.mean([2, 1, 0]), np.mean([3, 2, 1, 0]), ..., np.mean([9, 8]), np.mean([9])]
Update: A much simpler method than the one I first posted:
>>> id_ = np.add.outer(*map(np.arange, A.shape))
>>> result = np.bincount(id_.ravel(), A.ravel()) / np.bincount(id_.ravel())
Update ends.
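For context (this quick demo is mine, not part of the original answer): np.add.outer(np.arange(m), np.arange(n)) labels every element with its anti-diagonal index i + j, and the two bincount calls then give the per-label sum and count, whose ratio is the mean:
>>> import numpy as np
>>> A = np.repeat(np.arange(10)[None, :], 6, axis=0)
>>> id_ = np.add.outer(*map(np.arange, A.shape))   # id_[i, j] == i + j
>>> np.bincount(id_.ravel(), A.ravel()) / np.bincount(id_.ravel())
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3.5, 4.5, 5.5, 6.5, 7. , 7.5, 8. ,
       8.5, 9. ])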
Here is a method using as_strided:
>>> A = np.repeat(np.arange(10)[None, :], 6, axis=0)
>>> A
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>>
>>> sh_i, sh_j = A.shape
>>> st_i, st_j = A.strides
>>>
>>> assert A.flags.c_contiguous
>>> assert sh_i <= sh_j
>>>
>>> A_full = np.lib.stride_tricks.as_strided(A.ravel()[(sh_i-1) * sh_j:], (sh_j-sh_i+1, sh_i), (st_j, st_j-st_i))
>>> A_part = np.lib.stride_tricks.as_strided(A.ravel()[sh_i * sh_j - sh_i+1:], (sh_i-2, sh_i+1), (st_j, st_j-st_i))
>>> split = np.array((np.zeros((sh_i-2,), int), np.arange(sh_i-1, 1, -1), np.full((sh_i-2,), sh_i+1))).T
>>> full_means = A_full.mean(axis=1)
>>> part_means = A_part.cumsum(axis=1)[np.arange(sh_i-2)[:, None], split[:, 1:]-1].astype(float)
>>> part_means[:, 1] -= part_means[:, 0]
>>> part_means /= np.diff(split, axis=1)
>>> result = np.concatenate([A[0, 0, None], part_means[:, 1], full_means, part_means[:, 0], A[-1, -1, None]])
>>>
>>> result
array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3.5, 4.5, 5.5, 6.5, 7. , 7.5, 8. ,
       8.5, 9. ])
The easiest way to understand what's going on here is to inspect the strided views A_full and A_part. A_full contains the full-length diagonals, while A_part contains the bottom-right reduced-length diagonals (except the very corner) concatenated with the top-left reduced-length diagonals (except the very corner).
>>> A_full
array([[0, 1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5, 6],
       [2, 3, 4, 5, 6, 7],
       [3, 4, 5, 6, 7, 8],
       [4, 5, 6, 7, 8, 9]])
>>> A_part
array([[5, 6, 7, 8, 9, 0, 1],
       [6, 7, 8, 9, 0, 1, 2],
       [7, 8, 9, 0, 1, 2, 3],
       [8, 9, 0, 1, 2, 3, 4]])
split contains the positions where the bottom-right part ends and the top-left part begins.
>>> from pprint import pprint
>>>
>>> split
array([[0, 5, 7],
       [0, 4, 7],
       [0, 3, 7],
       [0, 2, 7]])
>>> pprint([np.split(Ap, sp) for Ap, sp in zip(A_part, split[:, 1, None])])
[[array([5, 6, 7, 8, 9]), array([0, 1])],
 [array([6, 7, 8, 9]), array([0, 1, 2])],
 [array([7, 8, 9]), array([0, 1, 2, 3])],
 [array([8, 9]), array([0, 1, 2, 3, 4])]]
The rest of the code uses these bits to piece together the desired vector of means.
At the risk of being castigated for using a Python for-loop alongside NumPy code, you could take advantage of np.eye to serve as a sliding mask along the diagonals without sacrificing a ton of runtime.
>>> from functools import partial
>>> import numpy as np
>>> def diagonal_means(a):
...     m, n = a.shape
...     a_ = a[::-1].copy()
...     eyemask = partial(np.eye, *a.shape, dtype=np.bool_)
...     for k in range(1 - m, n):
...         yield a_[eyemask(k=k)].mean()
Example:
>>> a = np.arange(56).reshape(7, 8); a
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55]])
# Means of: [0], [1, 8], [2, 9, 16], ...[55]
>>> np.array(list(diagonal_means(a)))
array([ 0. ,  4.5,  9. , 13.5, 18. , 22.5, 27. , 28. , 32.5, 37. , 41.5,
       46. , 50.5, 55. ])
Logically, you could reverse each mask generated with eyemask, but it's probably more efficient to reverse a copy of a first.
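A further compact option (my sketch, not from either answer) is to flip the array left-right and read off ordinary diagonals with ndarray.diagonal, which walks the same anti-diagonals; using the a from the example above:
>>> m, n = a.shape
>>> np.array([np.fliplr(a).diagonal(k).mean() for k in range(n - 1, -m, -1)])
array([ 0. ,  4.5,  9. , 13.5, 18. , 22.5, 27. , 28. , 32.5, 37. , 41.5,
       46. , 50.5, 55. ])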

Get index of largest element for each submatrix in a Numpy 2D array

I have a 2D Numpy ndarray, x, that I need to split into square subregions of size s. For each subregion, I want to get the greatest element (which I do) and its position within that subregion (which I can't figure out).
Here is a minimal example:
>>> x = np.random.randint(0, 10, (6,8))
>>> x
array([[9, 4, 8, 9, 5, 7, 3, 3],
       [3, 1, 8, 0, 7, 7, 5, 1],
       [7, 7, 3, 6, 0, 2, 1, 0],
       [7, 3, 9, 8, 1, 6, 7, 7],
       [1, 6, 0, 7, 5, 1, 2, 0],
       [8, 7, 9, 5, 8, 3, 6, 0]])
>>> h, w = x.shape
>>> s = 2
>>> f = x.reshape(h//s, s, w//s, s)
>>> mx = np.max(f, axis=(1, 3))
>>> mx
array([[9, 9, 7, 5],
       [7, 9, 6, 7],
       [8, 9, 8, 6]])
For example, the 8 in the lower left corner of mx is the greatest element from subregion [[1,6], [8, 7]] in the lower left corner of x.
What I want is to get an array similar to mx, that keeps the indices of the largest elements, like this:
[[0, 1, 1, 2],
 [0, 2, 3, 2],
 [2, 2, 2, 2]]
where, for example, the 2 in the lower left corner is the index of 8 in the linear representation of [[1, 6], [8, 7]].
I could do np.argmax(f[i, :, j, :]) and iterate over i and j, but looping like that is enormously slower for large amounts of computation. To give you an idea, I'm trying to use (only) Numpy for max pooling. Basically, I'm asking if there is a faster alternative to what I'm using.
Here's one approach -
# Get shape of output array
m,n = np.array(x.shape)//s
# Reshape and permute axes to bring the block as rows
x1 = x.reshape(h//s, s, w//s, s).swapaxes(1,2).reshape(-1,s**2)
# Use argmax along each row and reshape to output shape
out = x1.argmax(1).reshape(m,n)
Sample input, output -
In [362]: x
Out[362]:
array([[9, 4, 8, 9, 5, 7, 3, 3],
       [3, 1, 8, 0, 7, 7, 5, 1],
       [7, 7, 3, 6, 0, 2, 1, 0],
       [7, 3, 9, 8, 1, 6, 7, 7],
       [1, 6, 0, 7, 5, 1, 2, 0],
       [8, 7, 9, 5, 8, 3, 6, 0]])
In [363]: out
Out[363]:
array([[0, 1, 1, 2],
       [0, 2, 3, 2],
       [2, 2, 2, 2]])
Alternatively, to simplify things, we could use scikit-image, which does the heavy work of reshaping and permuting axes for us -
In [372]: from skimage.util import view_as_blocks as viewB
In [373]: viewB(x, (s,s)).reshape(-1,s**2).argmax(1).reshape(m,n)
Out[373]:
array([[0, 1, 1, 2],
       [0, 2, 3, 2],
       [2, 2, 2, 2]])
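If you also need the (row, column) offset inside each s x s block rather than the flat index, np.unravel_index converts the result (a small addition of mine, not part of the original answer):
row_off, col_off = np.unravel_index(out, (s, s))
# row_off -> [[0, 0, 0, 1], [0, 1, 1, 1], [1, 1, 1, 1]]
# col_off -> [[0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]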

Extract rows from python array in python

I have a numpy array X with shape (768, 8).
The last value of each row can be either 0 or 1. I only want the rows whose last value is 1, and I call this T.
I did:
T = [x for x in X if x[7]==1]
This is correct; however, T is now a list, not a numpy array (in fact, I cannot print T.shape).
What should I do instead to keep this a numpy array?
NumPy's boolean indexing gets the job done in a fully vectorized manner. This approach is generally more efficient (and arguably more elegant) than using list comprehensions and type conversions.
T = X[X[:, -1] == 1]
Demo:
In [232]: first_columns = np.random.randint(0, 10, size=(10, 7))
In [233]: last_column = np.random.randint(0, 2, size=(10, 1))
In [234]: X = np.hstack((first_columns, last_column))
In [235]: X
Out[235]:
array([[4, 3, 3, 2, 6, 2, 2, 0],
       [2, 7, 9, 4, 7, 1, 8, 0],
       [9, 8, 2, 1, 2, 0, 5, 1],
       [4, 4, 4, 9, 6, 4, 9, 1],
       [9, 8, 7, 6, 4, 4, 9, 0],
       [8, 3, 3, 2, 9, 5, 5, 1],
       [7, 1, 4, 5, 2, 4, 7, 0],
       [8, 0, 0, 1, 5, 2, 6, 0],
       [7, 9, 9, 3, 9, 3, 9, 1],
       [3, 1, 8, 7, 3, 2, 9, 0]])
In [236]: mask = X[:, -1] == 1
In [237]: mask
Out[237]: array([False, False, True, True, False, True, False, False, True, False], dtype=bool)
In [238]: T = X[mask]
In [239]: T
Out[239]:
array([[9, 8, 2, 1, 2, 0, 5, 1],
       [4, 4, 4, 9, 6, 4, 9, 1],
       [8, 3, 3, 2, 9, 5, 5, 1],
       [7, 9, 9, 3, 9, 3, 9, 1]])
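If you also want to know which row numbers were kept (an extra note of mine, not part of the original answer), apply np.flatnonzero to the same mask:
np.flatnonzero(mask)   # -> array([2, 3, 5, 8]) for the demo above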
By calling
T = [x for x in X if x[7] == 1]
you are making T a list. To convert any list to a NumPy array, just use:
T = numpy.array([x for x in X if x[7] == 1])
Here is what happens:
In [1]: import numpy as np
In [2]: a = [1,2,3,4]
In [3]: a.T
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-9f69ed463660> in <module>()
----> 1 a.T
AttributeError: 'list' object has no attribute 'T'
In [4]: a = np.array(a)
In [5]: a.T
Out[5]: array([1, 2, 3, 4])

Python generating repmat using each column individually

I have an array of shape 3x3 which looks something like:
import numpy as np
A = np.array(([1,2,3],[11,12,5],[4,9,1]))
>>> A
array([[ 1,  2,  3],
       [11, 12,  5],
       [ 4,  9,  1]])
I want to repmat one column at a time for 3 times so that I can achieve the following:
B
array([[ 1,  1,  1,  2,  2,  2,  3,  3,  3],
       [11, 11, 11, 12, 12, 12,  5,  5,  5],
       [ 4,  4,  4,  9,  9,  9,  1,  1,  1]])
I could loop over each column and repmat it, but I am looking for a smarter way to do it, as my real-life array has size 5000x300.
This is the job of numpy.repeat. Quoting an example from the docs:
>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, 3, axis=1)
array([[1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4]])
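Applied to the A from the question (added here for completeness), repeating each column 3 times along axis=1 gives exactly the desired B:
>>> np.repeat(A, 3, axis=1)
array([[ 1,  1,  1,  2,  2,  2,  3,  3,  3],
       [11, 11, 11, 12, 12, 12,  5,  5,  5],
       [ 4,  4,  4,  9,  9,  9,  1,  1,  1]])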
