Indexing ndarray by ndarray


I have a first ndarray, foo, in which I want to select several elements.
foo = np.array([[0, 10, 30], [20, 40, 60], [30, 50, 70]])
To be precise, I have another ndarray, bar, in which I store, for each column of foo, the rows I want from that column.
bar = np.array([[1, 2, 0], [0, 0, 1]])
What I want as the result is:
array([[20, 50, 30], [0, 10, 60]])
Is there a vectorized way to do it?
When I try foo[bar], it increases the size of the array (each index selects a whole row, so the result has shape (2, 3, 3)).
That is not what I'm looking for.

In [17]: foo[bar, np.arange(3)]
Out[17]:
array([[20, 50, 30],
[ 0, 10, 60]])
The 1-dimensional array np.arange(3) is broadcast to the same shape as bar,
so that it is equivalent to
In [35]: X, Y = np.broadcast_arrays(bar, np.arange(3)); Y
Out[35]:
array([[0, 1, 2],
[0, 1, 2]])
X is the same as bar since broadcasting does not change the shape of bar.
Then NumPy integer array indexing rules say that the (i,j) element of foo[X, Y] equals
foo[X, Y][i, j] = foo[X[i,j], Y[i,j]]
So for example,
foo[bar, np.arange(3)][0, 1] = foo[bar[0, 1], Y[0, 1]]
= foo[2, 1]
= 50
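The rule foo[X, Y][i, j] = foo[X[i, j], Y[i, j]] can be checked directly with an explicit double loop. A minimal sketch reusing foo and bar from the question; the loop is only there for illustration and is much slower than the vectorized form.

```python
import numpy as np

foo = np.array([[0, 10, 30], [20, 40, 60], [30, 50, 70]])
bar = np.array([[1, 2, 0], [0, 0, 1]])

# Vectorized fancy indexing: rows come from bar, columns from np.arange
cols = np.arange(foo.shape[1])
result = foo[bar, cols]

# Spell out the rule foo[X, Y][i, j] == foo[X[i, j], Y[i, j]]
X, Y = np.broadcast_arrays(bar, cols)
looped = np.empty(X.shape, dtype=foo.dtype)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        looped[i, j] = foo[X[i, j], Y[i, j]]

print(result)
print(np.array_equal(result, looped))
```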

You need to also specify the column to go with each row index.
Try this:
import numpy as np
foo = np.array([[0, 10, 30], [20, 40, 60], [30, 50, 70]])
bar = np.array([[1, 2, 0], [0, 0, 1]])
foo[bar, range(len(foo))]
Output:
array([[20, 50, 30],
[ 0, 10, 60]])
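range(len(foo)) happens to work here because foo is square: len(foo) counts rows, while the second index has to run over columns. A sketch of two forms that also handle a non-square foo, assuming a 2-D foo and a bar whose values are valid row numbers (the 3x4 data below is hypothetical): use foo.shape[1] for the column range, or np.take_along_axis (NumPy 1.15 and later), which picks foo[bar[i, j], j] along axis 0.

```python
import numpy as np

# Hypothetical non-square data: 3 rows, 4 columns
foo = np.array([[0, 10, 30, 5],
                [20, 40, 60, 15],
                [30, 50, 70, 25]])
# bar[i, j] = which row to take from column j
bar = np.array([[1, 2, 0, 2],
                [0, 0, 1, 1]])

# Column indices must span the number of columns, not len(foo)
picked = foo[bar, np.arange(foo.shape[1])]

# Same selection: take_along_axis reads foo[bar[i, j], j] along axis 0
picked2 = np.take_along_axis(foo, bar, axis=0)

print(picked)
print(np.array_equal(picked, picked2))
```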

Related

Apply np.vectorize along one axis

Say I have two arrays arr1 and arr2:
import numpy as np
arr1 = np.array([0, 1, 2])
arr2 = np.array([
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
])
And say I have a function that does something to the elements of this array:
def func(arr):
    new_arr = arr.copy()
    new_arr[0] = new_arr[0] * 2
    new_arr[1] = new_arr[1] * 10
    new_arr[2] = new_arr[2] * 100
    return new_arr
Now I want to vectorize this, so that it works for both arr1 and arr2:
func(arr1)
# desired result: [0, 10, 200]
func(arr2)
# desired result:
# [0, 10, 200],
# [6, 40, 500],
# [12, 70, 800],
np.vectorize doesn't work because it breaks my array argument down into individual elements. I want it to call the function once per row, iterating only over the first axis.
np.apply_along_axis almost works, except that it won't treat a 1-D array argument as a single row.
What's the best way to do this?

You can just directly multiply the arrays. It works thanks to numpy broadcasting:
factor = np.array([2, 10, 100])
arr1 * factor
array([ 0, 10, 200])
arr2 * factor
array([[ 0, 10, 200],
[ 6, 40, 500],
[ 12, 70, 800]])
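If the per-position operation were something other than a plain multiplication, the same idea still applies: indexing with an Ellipsis (...) addresses the last axis regardless of how many dimensions the array has. A sketch; func_nd is a hypothetical name, and it assumes the last axis has length 3.

```python
import numpy as np

def func_nd(arr):
    # Work on a copy so the caller's array is untouched
    new_arr = np.array(arr, copy=True)
    # arr[..., k] selects position k along the last axis, whatever the ndim
    new_arr[..., 0] = new_arr[..., 0] * 2
    new_arr[..., 1] = new_arr[..., 1] * 10
    new_arr[..., 2] = new_arr[..., 2] * 100
    return new_arr

arr1 = np.array([0, 1, 2])
arr2 = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(func_nd(arr1))
print(func_nd(arr2))
```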

If you take time to read the np.vectorize docs, you'll eventually encounter the signature option:
In [27]: f= np.vectorize(func, signature='(n)->(n)')
In [28]: f(arr1)
Out[28]: array([ 0, 10, 200])
In [29]: f(arr2)
Out[29]:
array([[ 0, 10, 200],
[ 6, 40, 500],
[ 12, 70, 800]])
Reading a bit further, you'll encounter the caveat about performance: np.vectorize is provided primarily for convenience, and its implementation is essentially a for loop.
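For comparison, a sketch of the signature approach next to np.apply_along_axis with axis=-1 (the last axis). Using -1 lets the same call accept 1-D input, because the last axis of a 1-D array is its only axis. Both still call func once per row in Python, so for a pure elementwise scaling the broadcast multiplication remains the faster option.

```python
import numpy as np

def func(arr):
    new_arr = arr.copy()
    new_arr[0] = new_arr[0] * 2
    new_arr[1] = new_arr[1] * 10
    new_arr[2] = new_arr[2] * 100
    return new_arr

arr1 = np.array([0, 1, 2])
arr2 = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

# '(n)->(n)': func consumes and returns one 1-D core vector
f = np.vectorize(func, signature='(n)->(n)')

def g(a):
    # Feed each 1-D slice along the last axis to func
    return np.apply_along_axis(func, -1, a)

factor = np.array([2, 10, 100])
for a in (arr1, arr2):
    # Both approaches should agree with plain broadcasting
    assert np.array_equal(f(a), a * factor)
    assert np.array_equal(g(a), a * factor)
print(f(arr2))
```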

Just do this:
import numpy as np
a = np.array([0, 1, 2])
b = np.array([
[0, 1, 2],
[3, 4, 5],
[6, 7, 8],
])
c = np.array([2, 10, 100])
print(a*c)
print(b*c)
Output:
[ 0 10 200]
[[ 0 10 200]
[ 6 40 500]
[ 12 70 800]]

indexing in python numpy module

So, I'm new to Python and learning about the NumPy module.
Here is my array:
c = np.array([[[ 0, 1, 2],
[ 10, 12, 13]],
[[100, 101, 102],
[110, 112, 113]]])
In the above array, if I try to access it with
c[:1,0:]
it produces the expected output:
# expected, because it prints from the start up to row 1,0, excluding row 1,0
array([[[ 0, 1, 2],
[10, 12, 13]]])
But when I try to access it with
c[:1,1:]
it produces this output:
array([[[10, 12, 13]]])
Why?

This is a 3D array. You can check it with
print(c.shape)
that yields
(2, 2, 3)
Is a 3D array really what you want?
If so: when you index it with two indices instead of three, the third is implicitly :. So c[1, 1] is equivalent to c[1, 1, :], which is equivalent to c[1, 1, 0:3].
Your query c[:1,1:] therefore selects the same elements as c[0, 1, 0:3], namely [10, 12, 13]: that is the correct result. Because both indices are slices, which keep their axes, the result has shape (1, 1, 3) rather than (3,).
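Comparing shapes makes the slice-versus-integer distinction concrete. A quick sketch reusing c from the question:

```python
import numpy as np

c = np.array([[[0, 1, 2],
               [10, 12, 13]],
              [[100, 101, 102],
               [110, 112, 113]]])

print(c.shape)           # (2, 2, 3)
print(c[:1, 0:].shape)   # (1, 2, 3): 0: on axis 1 keeps both rows
print(c[:1, 1:].shape)   # (1, 1, 3): 1: on axis 1 keeps only row 1 onward
print(c[0, 1].shape)     # (3,): integer indices drop axes 0 and 1

# Same values, different number of dimensions
print(np.array_equal(c[:1, 1:][0, 0], c[0, 1, 0:3]))
```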
Now, based on your comment, I guess you want to reshape, filter, and reshape again:
c.reshape(4, -1)[:3,:].reshape(1, 3, -1)
yields
array([[[ 0, 1, 2],
[ 10, 12, 13],
[100, 101, 102]]])

How to threshold based on the average value of a row?

I have a 2d array. I want to set all the values in each row that are greater than the mean value of that row to 0.
Some code that does this naively is:
new_arr = arr.copy()
for i, row in enumerate(arr):
    avg = np.mean(row)
    for j, pixel in enumerate(row):
        if pixel > avg:
            new_arr[i,j] = 0
        else:
            new_arr[i,j] = 1
This is pretty slow. Is there a way to do this using NumPy indexing?
If it were the average value of the whole matrix, I could simply do:
mask = arr > np.mean(arr)
arr[mask] = 0
arr[np.logical_not(mask)] = 1
Is there some way to do this with the per-row average, using a one-dimensional array of averages or something similar?
EDIT:
The proposed solution:
avg = np.mean(arr, axis=0)
mask = avg >= arr
new_arr = np.zeros(arr.shape)
new_arr[mask] = 1
was actually using the columnwise average, which might be useful to some people as well. It was equivalent to:
new_arr = arr.copy()
for i, row in enumerate(arr.T):
    avg = np.mean(row)
    for j, pixel in enumerate(row):
        if pixel > avg:
            new_arr[j,i] = 0
        else:
            new_arr[j,i] = 1

Setup
a = np.arange(25).reshape((5,5))
You can use keepdims with mean:
a[a > a.mean(1, keepdims=True)] = 0
array([[ 0, 1, 2, 0, 0],
[ 5, 6, 7, 0, 0],
[10, 11, 12, 0, 0],
[15, 16, 17, 0, 0],
[20, 21, 22, 0, 0]])
Using keepdims=True gives the following result for the mean:
array([[ 2.],
[ 7.],
[12.],
[17.],
[22.]])
The benefit to this is stated in the docs:
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
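Applied to the question's original goal (0 above the row mean, 1 otherwise), the same keepdims trick replaces the double loop. A sketch with np.where, checked against the naive loop from the question on hypothetical random, non-square data:

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(0, 100, size=(4, 6))  # hypothetical test data

# Naive version from the question: per-row mean, loop over pixels
new_arr = arr.copy()
for i, row in enumerate(arr):
    avg = np.mean(row)
    for j, pixel in enumerate(row):
        new_arr[i, j] = 0 if pixel > avg else 1

# Vectorized: the (4, 1) row means broadcast against the (4, 6) array
row_means = arr.mean(axis=1, keepdims=True)
vectorized = np.where(arr > row_means, 0, 1)

print(row_means.shape)
print(np.array_equal(new_arr, vectorized))
```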

You can use np.mean(a, axis=1) to get the mean of each row, broadcast that to the shape of a, and set all values where a > broadcasted_mean_array to 0:
Example:
>>> a = np.arange(25).reshape((5, 5))
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
>>> a[a > np.broadcast_to(np.mean(a, axis=1)[:, None], a.shape)] = 0
>>> a
array([[ 0, 1, 2, 0, 0],
[ 5, 6, 7, 0, 0],
[10, 11, 12, 0, 0],
[15, 16, 17, 0, 0],
[20, 21, 22, 0, 0]])

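Use the axis keyword for your mean:
avg = np.mean(arr, axis=0)
Note that axis=0 gives the mean of each column; for the mean of each row, use np.mean(arr, axis=1, keepdims=True) so that the result broadcasts against arr.
Then use this to create your mask and assign the values you want:
mask = avg >= arr
new_arr = np.zeros(arr.shape)
new_arr[mask] = 1
Of course, you can also create the new array directly from the mask instead of using the two-step approach.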
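A sketch of the one-step version for both averaging directions, on hypothetical data; casting the boolean mask to int maps True to 1 and False to 0:

```python
import numpy as np

arr = np.array([[3, 8, 1],
                [9, 2, 7],
                [4, 6, 5]])  # hypothetical data

# Column-wise: mean over axis 0 has shape (3,), one value per column
col_result = (np.mean(arr, axis=0) >= arr).astype(int)

# Row-wise: keepdims gives shape (3, 1), one value per row
row_result = (np.mean(arr, axis=1, keepdims=True) >= arr).astype(int)

print(col_result)
print(row_result)
```

The two results differ only at [2, 2], where 5 is at or below its row mean (5.0) but above its column mean (13/3).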

numpy get values in array of arrays of arrays for array of indices

I have a NumPy array of arrays of arrays (a 3-D array):
arr1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[10,20,30],[40,50,60],[70,80,90]])
arr3 = np.array([[15,25,35],[45,55,65],[75,85,95]])
list_arr = np.array([arr1,arr2,arr3])
and indices array:
indices_array = np.array([1,0,2])
I want to get the array at index 1 of the first array of arrays, the array at index 0 of the second, and the array at index 2 of the third.
Expected output:
#[[ 4 5 6]
#[10 20 30]
#[75 85 95]]
I am looking for a NumPy way to do it. As I have large arrays, I prefer not to use list comprehensions.

Basically, for each position along the first axis you are selecting one element along the second axis (given by indices_array), keeping all the elements along the third axis. So you can do:
list_arr[np.arange(list_arr.shape[0]),indices_array,:]
Sample run -
In [16]: list_arr
Out[16]:
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, 20, 30],
[40, 50, 60],
[70, 80, 90]],
[[15, 25, 35],
[45, 55, 65],
[75, 85, 95]]])
In [17]: indices_array
Out[17]: array([1, 0, 2])
In [18]: list_arr[np.arange(list_arr.shape[0]),indices_array,:]
Out[18]:
array([[ 4, 5, 6],
[10, 20, 30],
[75, 85, 95]])
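The trailing : is optional, since trailing axes that are not indexed are taken whole, and the result can be checked against a list-comprehension reference. A sketch using the question's arrays:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
arr3 = np.array([[15, 25, 35], [45, 55, 65], [75, 85, 95]])
list_arr = np.array([arr1, arr2, arr3])  # shape (3, 3, 3)
indices_array = np.array([1, 0, 2])

# Pair block k with row indices_array[k]; the last axis is taken whole
picked = list_arr[np.arange(list_arr.shape[0]), indices_array]

# Loop-based reference (slower for large arrays)
reference = np.array([list_arr[k][i] for k, i in enumerate(indices_array)])

print(picked)
print(np.array_equal(picked, reference))
```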

Just access each block by pairing positions with the desired indices (0-1, 1-0, 2-2) as follows:
desired_array = np.array([list_arr[x][y] for x, y in enumerate([1, 0, 2])])

Can I produce the result of np.outer using np.dot?

I am trying to improve my understanding of numpy functions. I understand the behaviour of numpy.dot. I'd like to understand the behaviour of numpy.outer in terms of numpy.dot.
Based on this Wikipedia article https://en.wikipedia.org/wiki/Outer_product I'd expect array_equal to return True in the following code. However, it does not.
X = np.matrix([
[1,5],
[5,9],
[4,1]
])
r1 = np.outer(X,X)
r2 = np.dot(X, X.T)
np.array_equal(r1, r2)
How can I assign r2 so that np.array_equal returns True? Also, why does numpy's implementation of np.outer not match the definition of outer multiplication on Wikipedia?
Using numpy 1.9.2

In [303]: X=np.array([[1,5],[5,9],[4,1]])
In [304]: X
Out[304]:
array([[1, 5],
[5, 9],
[4, 1]])
In [305]: np.inner(X,X)
Out[305]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
In [306]: np.dot(X,X.T)
Out[306]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
The Wiki outer link mostly talks about vectors, 1d arrays. Your X is 2d.
In [310]: x=np.arange(3)
In [311]: np.outer(x,x)
Out[311]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
In [312]: np.inner(x,x)
Out[312]: 5
In [313]: np.dot(x,x) # same as inner
Out[313]: 5
In [314]: x[:,None]*x[None,:] # same as outer
Out[314]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
Notice that the Wiki outer does not involve summation. Inner does: in this example, 5 is the sum of the 3 diagonal values of the outer.
dot also involves summation: all the products are followed by summation along a specific axis.
Some of the wiki outer equations use explicit indices. The einsum function can implement these calculations.
In [325]: np.einsum('ij,kj->ik',X,X)
Out[325]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
In [326]: np.einsum('ij,jk->ik',X,X.T)
Out[326]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
In [327]: np.einsum('i,j->ij',x,x)
Out[327]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
In [328]: np.einsum('i,i->',x,x)
Out[328]: 5
As mentioned in the comment, np.outer uses ravel, e.g.
return a.ravel()[:, newaxis]*b.ravel()[newaxis,:]
This is the same broadcast multiplication that I demonstrated earlier for x.
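Following that, an r2 that matches r1 can be built with np.dot by flattening X first and turning the flat vector into a column and a row. The (6, 1) by (1, 6) product sums over an axis of length 1, so each entry is a single product, which is exactly the outer product. A sketch using a plain ndarray rather than np.matrix:

```python
import numpy as np

X = np.array([[1, 5],
              [5, 9],
              [4, 1]])

r1 = np.outer(X, X)  # X is flattened to 6 elements first, so r1 is (6, 6)

v = X.ravel()        # [1, 5, 5, 9, 4, 1]
# (6, 1) dot (1, 6): the summed axis has length 1, so no real summation
r2 = np.dot(v[:, None], v[None, :])

print(r1.shape)
print(np.array_equal(r1, r2))
```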

numpy.outer is really meant for 1-d vectors: any input that is not already 1-d is flattened first, so it does not compute a matrix product of 2-d inputs. But for 1-d vectors, there is a relation.
If
import numpy as np
A = np.array([1.0,2.0,3.0])
then this
np.matrix(A).T.dot(np.matrix(A))
gives the same result as this
np.outer(A,A)
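The same check works with plain ndarrays instead of np.matrix: reshaping A into a column and a row gives dot the same (3, 1) by (1, 3) shapes. For 2-d inputs, the ufunc method np.multiply.outer keeps all input axes instead of flattening, and reshaping its result reproduces np.outer. A sketch:

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])

# (3, 1) dot (1, 3) -> (3, 3), same as the np.matrix version
via_dot = np.dot(A.reshape(-1, 1), A.reshape(1, -1))
print(np.array_equal(via_dot, np.outer(A, A)))

X = np.array([[1, 5], [5, 9], [4, 1]])
# full[i, j, k, l] = X[i, j] * X[k, l], shape (3, 2, 3, 2)
full = np.multiply.outer(X, X)
print(full.shape)
# Merging the (i, j) and (k, l) axis pairs recovers np.outer's (6, 6) result
print(np.array_equal(full.reshape(X.size, X.size), np.outer(X, X)))
```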

Another (clunky) version, similar to a[:,None] * a[None,:]:
a.reshape(a.size, 1) * a.reshape(1, a.size)
