selecting certain indices in Numpy ndarray using another array


I'm trying to mark the value and indices of max values in a 3D array, getting the max in the third axis.
Now this would have been obvious in a lower dimension:
argmaxes=np.argmax(array)
maximums=array[argmaxes]
but NumPy doesn't interpret the second line the same way for arrays with more than one dimension.
Let's say my 3D array has shape (8,8,250). argmaxes=np.argmax(array,axis=-1) would return an (8,8) array of indices between 0 and 249. My expected output is an (8,8) array containing the maximum along the 3rd dimension. I can achieve this with maxes=np.max(array,axis=-1), but that repeats the same calculation (I need both the values and the indices for later calculations).
I can also just do a crude nested loop:
for i in range(8):
    for j in range(8):
        maxes[i,j]=array[i,j,argmaxes[i,j]]
But is there a nicer way to do this?

You can use advanced indexing. This is a simpler case when shape is (8,8,3):
arr = np.random.randint(99, size=(8,8,3))
x, y = np.indices(arr.shape[:-1])
arr[x, y, np.argmax(arr, axis=-1)]
Sample run:
>>> x
array([[0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7, 7, 7]])
>>> y
array([[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7]])
>>> np.argmax(arr,axis=-1)
array([[2, 1, 1, 2, 0, 0, 0, 1],
[2, 2, 2, 1, 0, 0, 1, 0],
[1, 2, 0, 1, 1, 1, 2, 0],
[1, 0, 0, 0, 2, 1, 1, 0],
[2, 0, 1, 2, 2, 2, 1, 0],
[2, 2, 0, 1, 1, 0, 2, 2],
[1, 1, 0, 1, 1, 2, 1, 0],
[2, 1, 1, 1, 0, 0, 2, 1]], dtype=int64)
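As an alternative to building the index grids by hand, NumPy 1.15 and later provide np.take_along_axis, which pairs naturally with argmax. The sketch below is not part of the original answer; the seeded random array is only an assumption to make the example reproducible, and the shape matches the one in the question.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded only for reproducibility
arr = rng.integers(99, size=(8, 8, 250))

argmaxes = np.argmax(arr, axis=-1)      # shape (8, 8)

# take_along_axis expects an index array with the same number of
# dimensions as arr, so add a trailing axis and drop it afterwards.
maxes = np.take_along_axis(arr, argmaxes[..., np.newaxis], axis=-1)[..., 0]

# Cross-check against the advanced-indexing form and against np.max.
x, y = np.indices(arr.shape[:-1])
same_as_indexing = np.array_equal(maxes, arr[x, y, argmaxes])
same_as_max = np.array_equal(maxes, arr.max(axis=-1))
print(maxes.shape, same_as_indexing, same_as_max)
```

Both forms reuse the argmax result, so the maximum is not searched for a second time.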

Related

numpy equivalent of tf.math.segment_sum

What is the equivalent of tf.math.segment_sum in numpy?
I would like to port some TensorFlow code to NumPy. The code uses segment_sum to group elements according to a segment_ids array and sum each group. Given a data array and a segment_ids array, how do I perform the same segment sum in NumPy?

You can create something pretty close to tf.math.segment_sum with the method numpy.add.at, which is the at method of the add ufunc:
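import numpy as np

def segment_sum(data, segment_ids):
    data = np.asarray(data)
    # one output slot per segment id; trailing dimensions follow data
    s = np.zeros((np.max(segment_ids)+1,) + data.shape[1:], dtype=data.dtype)
    # unbuffered in-place add, so repeated ids accumulate
    np.add.at(s, segment_ids, data)
    return s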
For example,
In [53]: c = np.array([[1, 2, 3, 4], [4, 3, 2, 1], [5, 6, 7, 8]])
In [54]: ids = [0, 0, 1]
In [55]: segment_sum(c, ids)
Out[55]:
array([[5, 5, 5, 5],
[5, 6, 7, 8]])
In [56]: x = [10, 20, 20, 30, 10, 0, 1, 2]
In [57]: xids = [1, 1, 0, 0, 2, 2, 2, 3]
In [58]: segment_sum(x, xids)
Out[58]: array([50, 30, 11, 2])
In [59]: w = np.arange(72).reshape(6, 2, 6) % 5
In [60]: w
Out[60]:
array([[[0, 1, 2, 3, 4, 0],
[1, 2, 3, 4, 0, 1]],
[[2, 3, 4, 0, 1, 2],
[3, 4, 0, 1, 2, 3]],
[[4, 0, 1, 2, 3, 4],
[0, 1, 2, 3, 4, 0]],
[[1, 2, 3, 4, 0, 1],
[2, 3, 4, 0, 1, 2]],
[[3, 4, 0, 1, 2, 3],
[4, 0, 1, 2, 3, 4]],
[[0, 1, 2, 3, 4, 0],
[1, 2, 3, 4, 0, 1]]])
In [61]: wids = [0, 0, 1, 2, 2, 2]
In [62]: segment_sum(w, wids)
Out[62]:
array([[[2, 4, 6, 3, 5, 2],
[4, 6, 3, 5, 2, 4]],
[[4, 0, 1, 2, 3, 4],
[0, 1, 2, 3, 4, 0]],
[[4, 7, 5, 8, 6, 4],
[7, 5, 8, 6, 4, 7]]])
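If the segment ids are sorted, which is what tf.math.segment_sum expects, np.add.reduceat is another option. The following is a sketch under two assumptions that go beyond the answer above: the ids are sorted, and every id from 0 to the maximum appears at least once (a missing id would simply be absent from the output rather than produce a row of zeros). The name segment_sum_sorted is made up for this example.

```python
import numpy as np

def segment_sum_sorted(data, segment_ids):
    """Segment sum for sorted ids with no gaps (hypothetical helper)."""
    data = np.asarray(data)
    ids = np.asarray(segment_ids)
    # Index of the first element of each run of equal ids.
    starts = np.flatnonzero(np.r_[True, ids[1:] != ids[:-1]])
    # Sum each slice data[starts[i]:starts[i+1]] along the first axis.
    return np.add.reduceat(data, starts, axis=0)

c = np.array([[1, 2, 3, 4], [4, 3, 2, 1], [5, 6, 7, 8]])
print(segment_sum_sorted(c, [0, 0, 1]))

x_sorted = [20, 30, 10, 20, 10, 0, 1, 2]
xids_sorted = [0, 0, 1, 1, 2, 2, 2, 3]
print(segment_sum_sorted(x_sorted, xids_sorted))
```

For unsorted ids, or ids with gaps, the np.add.at version is the safer choice.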

Replacing specific values of a 2d numpy array, but only at the edges

To illustrate my point, let's take this 2d numpy array:
array([[1, 1, 5, 1, 1, 5, 4, 1],
[1, 5, 6, 1, 5, 4, 1, 1],
[5, 1, 5, 6, 1, 1, 1, 1]])
I want to replace the value 1 with some other value, let's say 0, but only at the edges. This is the desired result:
array([[0, 0, 5, 1, 1, 5, 4, 0],
[0, 5, 6, 1, 5, 4, 0, 0],
[5, 1, 5, 6, 0, 0, 0, 0]])
Note that the 1's surrounded by other values are not changed.
I could implement this by iterating over every row and element, but I feel like that would be very inefficient. Normally I would use the np.where function to replace a specific value, but I don't think you can add positional conditions?

m = row!=1
w1 = m.argmax()-1
w2 = m.size - m[::-1].argmax()
For a single row, these three lines give the index of the last leading 1 (w1) and the index of the first trailing 1 (w2). The idea has been taken from trailing zeroes.
Try:
arr = np.array([[1, 1, 5, 1, 1, 5, 4, 1],
[1, 5, 6, 1, 5, 4, 1, 1],
[5, 1, 5, 6, 1, 1, 1, 1]])
for row in arr:
    m = row!=1
    w1 = m.argmax()-1
    w2 = m.size - m[::-1].argmax()
    # print(w1, w2)
    row[0:w1+1] = 0
    row[w2:] = 0
    # print(row)
arr:
array([[0, 0, 5, 1, 1, 5, 4, 0],
[0, 5, 6, 1, 5, 4, 0, 0],
[5, 1, 5, 6, 0, 0, 0, 0]])
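The same result can be obtained without a Python loop by using a cumulative sum as a "has a non-1 value been seen yet" flag. This is a sketch, not the original answer's method; the names not_one, leading and trailing are chosen for illustration only.

```python
import numpy as np

arr = np.array([[1, 1, 5, 1, 1, 5, 4, 1],
                [1, 5, 6, 1, 5, 4, 1, 1],
                [5, 1, 5, 6, 1, 1, 1, 1]])

not_one = arr != 1
# True while no non-1 value has been seen, scanning each row left to right.
leading = np.cumsum(not_one, axis=1) == 0
# Same scan from the right, flipped back to the original orientation.
trailing = np.cumsum(not_one[:, ::-1], axis=1)[:, ::-1] == 0

result = arr.copy()
result[leading | trailing] = 0
print(result)
```

One difference from the loop: a row consisting only of 1s is zeroed entirely here, whereas the loop leaves such a row unchanged.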

How would I display the values of a selected column in Python array?

How would I find the values from a certain column in an array? For example I have:
[1, 1, 2, 4, 1, 7, 1, 7, 6, 9]
[1, 2, 5, 3, 9, 1, 1, 1, 9, 1]
[7, 4, 5, 1, 8, 1, 2, 0, 0, 4]
[1, 4, 1, 1, 1, 1, 1, 1, 8, 5]
[9, 0, 0, 0, 0, 0, 1, 1, 9, 8]
[7, 4, 2, 1, 8, 2, 2, 2, 9, 7]
[7, 4, 2, 1, 7, 1, 1, 1, 0, 5]
[3, 4, 5, 3, 4, 5, 9, 1, 0, 9]
[0, 0, 5, 1, 1, 1, 9, 7, 7, 7]
If I wanted to list all of the values of column 5, how would I do this? I have figured out how to do this for the rows, but for the columns it is tricky, since they are all part of a separate list. I have not been able to find anything about this and I am very new to Python so I don't really know what I don't know.

Use l[i][4] to get the value in the 5th column of row i.
l = [
[1, 1, 2, 4, 1, 7, 1, 7, 6, 9],
[1, 2, 5, 3, 9, 1, 1, 1, 9, 1],
[7, 4, 5, 1, 8, 1, 2, 0, 0, 4],
[1, 4, 1, 1, 1, 1, 1, 1, 8, 5],
[9, 0, 0, 0, 0, 0, 1, 1, 9, 8],
[7, 4, 2, 1, 8, 2, 2, 2, 9, 7],
[7, 4, 2, 1, 7, 1, 1, 1, 0, 5],
[3, 4, 5, 3, 4, 5, 9, 1, 0, 9],
[0, 0, 5, 1, 1, 1, 9, 7, 7, 7]
]
for i in l:
    print(i[4])
# or simply use
[i[4] for i in l]  # as pointed out by @COLDSPEED
# the above code will create a list with values from 5th column

For a two dimensional array, you can use array[row][column].
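If NumPy is available, converting the list of lists to an array lets you slice a whole column at once. The sketch below uses the data from the question; the variable names rows, column_5 and arr are arbitrary.

```python
import numpy as np

rows = [
    [1, 1, 2, 4, 1, 7, 1, 7, 6, 9],
    [1, 2, 5, 3, 9, 1, 1, 1, 9, 1],
    [7, 4, 5, 1, 8, 1, 2, 0, 0, 4],
    [1, 4, 1, 1, 1, 1, 1, 1, 8, 5],
    [9, 0, 0, 0, 0, 0, 1, 1, 9, 8],
    [7, 4, 2, 1, 8, 2, 2, 2, 9, 7],
    [7, 4, 2, 1, 7, 1, 1, 1, 0, 5],
    [3, 4, 5, 3, 4, 5, 9, 1, 0, 9],
    [0, 0, 5, 1, 1, 1, 9, 7, 7, 7],
]

# Plain Python: pick index 4 (the 5th column) from every row.
column_5 = [row[4] for row in rows]
print(column_5)   # [1, 9, 8, 1, 0, 8, 7, 4, 1]

# NumPy: ":" selects all rows, 4 selects the 5th column.
arr = np.array(rows)
print(arr[:, 4])
```

Note that arr[row, column] and arr[row][column] give the same element for a NumPy array, but only the comma form supports slicing a column like this.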

Efficiently define an implicit Numpy array

A and B are Numpy arrays of common shape [n1,n2,n3]. The values of B are all integers in [0,n3). I want A to "invert" B in the sense that each value of A satisfies A[i,j,B[i,j,k]]=k for all i,j,k in the appropriate ranges. While it's obvious how to do this with for loops, I suspect that there is a clever one-liner using fancy indexing. Does anyone see it?

Here are two methods.
The first method is a one-liner: A = B.argsort(axis=-1)
Here's an example. B has shape (3, 5, 7) and for each fixed i and j, B[i,j,:] is a permutation of range(B.shape[2]).
In [386]: B
Out[386]:
array([[[1, 5, 4, 6, 2, 3, 0],
[6, 5, 3, 4, 2, 1, 0],
[4, 5, 0, 3, 1, 2, 6],
[0, 5, 6, 3, 2, 1, 4],
[4, 1, 5, 2, 6, 3, 0]],
[[2, 6, 0, 1, 5, 4, 3],
[3, 2, 4, 0, 1, 5, 6],
[3, 4, 6, 5, 1, 2, 0],
[4, 6, 3, 0, 2, 5, 1],
[0, 3, 1, 6, 4, 5, 2]],
[[0, 3, 6, 2, 1, 5, 4],
[3, 1, 2, 4, 6, 0, 5],
[1, 3, 5, 6, 4, 0, 2],
[4, 1, 6, 0, 2, 3, 5],
[6, 4, 5, 1, 0, 3, 2]]])
In [387]: A = B.argsort(axis=-1)
In [388]: A
Out[388]:
array([[[6, 0, 4, 5, 2, 1, 3],
[6, 5, 4, 2, 3, 1, 0],
[2, 4, 5, 3, 0, 1, 6],
[0, 5, 4, 3, 6, 1, 2],
[6, 1, 3, 5, 0, 2, 4]],
[[2, 3, 0, 6, 5, 4, 1],
[3, 4, 1, 0, 2, 5, 6],
[6, 4, 5, 0, 1, 3, 2],
[3, 6, 4, 2, 0, 5, 1],
[0, 2, 6, 1, 4, 5, 3]],
[[0, 4, 3, 1, 6, 5, 2],
[5, 1, 2, 0, 3, 6, 4],
[5, 0, 6, 1, 4, 2, 3],
[3, 1, 4, 5, 0, 6, 2],
[4, 3, 6, 5, 1, 2, 0]]])
Verify the desired property by sampling a few values.
In [389]: A[0, 0, B[0, 0, 0]]
Out[389]: 0
In [390]: A[0, 0, B[0, 0, 1]]
Out[390]: 1
In [391]: A[0, 0, B[0, 0, :]]
Out[391]: array([0, 1, 2, 3, 4, 5, 6])
In [392]: A[2, 3, B[2, 3, :]]
Out[392]: array([0, 1, 2, 3, 4, 5, 6])
The second method has a lower time complexity than using argsort, but it is a three-liner rather than a one-liner. I'll use the same B as above.
Create A, but with no values assigned yet.
In [393]: A = np.empty_like(B)
Create index arrays for each dimension of B.
In [394]: i, j, k = np.ogrid[[slice(n) for n in B.shape]] # or np.ix_(*[range(n) for n in B.shape])
This is the cool part. Do the assignment exactly as you wrote it in the question.
In [395]: A[i, j, B[i, j, k]] = k
Verify that we have the same A as above.
In [396]: A
Out[396]:
array([[[6, 0, 4, 5, 2, 1, 3],
[6, 5, 4, 2, 3, 1, 0],
[2, 4, 5, 3, 0, 1, 6],
[0, 5, 4, 3, 6, 1, 2],
[6, 1, 3, 5, 0, 2, 4]],
[[2, 3, 0, 6, 5, 4, 1],
[3, 4, 1, 0, 2, 5, 6],
[6, 4, 5, 0, 1, 3, 2],
[3, 6, 4, 2, 0, 5, 1],
[0, 2, 6, 1, 4, 5, 3]],
[[0, 4, 3, 1, 6, 5, 2],
[5, 1, 2, 0, 3, 6, 4],
[5, 0, 6, 1, 4, 2, 3],
[3, 1, 4, 5, 0, 6, 2],
[4, 3, 6, 5, 1, 2, 0]]])
After poking around some more on SO, I see that both these methods appear in answers to the question "How to invert a permutation array in numpy". The only thing really new here is doing the inversion along one axis of a three-dimensional array.
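A third way to write the assignment, assuming NumPy 1.15 or later, is np.put_along_axis, which avoids constructing the i and j index arrays. This is a sketch that is not in the original answer; B is generated here by argsorting random numbers, which is just one convenient way to get a permutation along the last axis.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
# Each B[i, j, :] is a permutation of range(7).
B = np.argsort(rng.random((3, 5, 7)), axis=-1)

A = np.empty_like(B)
# Writes A[i, j, B[i, j, k]] = k; the values broadcast along the last axis.
np.put_along_axis(A, B, np.arange(B.shape[-1]), axis=-1)

# Check the defining property and agreement with the argsort method.
k = np.arange(B.shape[-1])
property_holds = np.array_equal(np.take_along_axis(A, B, axis=-1),
                                np.broadcast_to(k, B.shape))
matches_argsort = np.array_equal(A, B.argsort(axis=-1))
print(property_holds, matches_argsort)
```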

Calculating and plotting count ratios with Pandas

I have multidimensional data in a pandas data frame with one variable indicating class. For example, here is my attempt at a poor man's heatmap using a scatter plot:
import pandas as pd
import random
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import get_cmap
nrows=1000
df=pd.DataFrame([[random.random(), random.random()]+[random.randint(0, 1)] for _ in range(nrows)],
columns=list("ABC"))
bins=np.linspace(0, 1, 20)
df["Abin"]=[bins[i-1] for i in np.digitize(df.A, bins)]
df["Bbin"]=[bins[i-1] for i in np.digitize(df.B, bins)]
g=df.loc[:,["Abin", "Bbin"]+["C"]].groupby(["Abin", "Bbin"])
data=g.agg(["sum", "count"])
data.reset_index(inplace=True)
data["classratio"]=data[("C", "sum")]/data[("C","count")]
plt.scatter(data.Abin, data.Bbin, c=data.classratio, cmap=get_cmap("RdYlGn_r"), marker="s")
I'd like to plot class densities over binned features. So far I have used np.digitize for binning and a somewhat complicated hand-made density calculation in Python to plot a heatmap.
Surely, this can be done more compactly with Pandas (pivot?)? Do you know a neat way to bin the two features (for example 10 bins on the interval 0...1) and then plot a class density heatmap where color indicates the ratio of 1's to total rows within this 2D-bin?

Yep, it can be done in a very concise way using the built-in cut function:
In [65]:
nrows=1000
df=pd.DataFrame([[random.random(), random.random()]+[random.randint(0, 1)] for _ in range(nrows)],
columns=list("ABC"))
In [66]:
#This does the trick.
pd.crosstab(np.array(pd.cut(df.A, 20)), np.array(pd.cut(df.B, 20))).values
Out[66]:
array([[2, 2, 2, 2, 7, 2, 3, 5, 1, 4, 2, 2, 1, 3, 2, 1, 7, 2, 4, 2],
[1, 2, 4, 2, 0, 3, 3, 3, 1, 1, 2, 1, 4, 3, 2, 1, 1, 2, 2, 1],
[0, 4, 1, 3, 1, 3, 2, 5, 2, 3, 1, 1, 1, 4, 2, 3, 6, 5, 2, 2],
[5, 2, 3, 2, 2, 1, 3, 2, 4, 0, 3, 2, 0, 4, 3, 2, 1, 3, 1, 3],
[2, 2, 4, 1, 3, 2, 2, 4, 1, 4, 3, 5, 5, 2, 3, 3, 0, 2, 4, 0],
[2, 3, 3, 5, 2, 0, 5, 3, 2, 3, 1, 2, 5, 4, 4, 3, 4, 3, 6, 4],
[3, 2, 2, 4, 3, 3, 2, 0, 0, 4, 3, 2, 2, 5, 4, 0, 1, 2, 2, 3],
[0, 0, 4, 4, 3, 2, 4, 6, 4, 2, 0, 5, 2, 2, 1, 3, 4, 4, 3, 2],
[3, 2, 2, 3, 4, 2, 1, 3, 1, 3, 4, 2, 4, 3, 2, 3, 2, 3, 4, 4],
[0, 1, 1, 4, 1, 4, 3, 0, 1, 1, 1, 2, 6, 4, 3, 5, 3, 3, 1, 4],
[2, 2, 4, 1, 3, 4, 1, 2, 1, 3, 3, 3, 1, 2, 1, 5, 2, 1, 4, 3],
[0, 0, 0, 4, 2, 0, 2, 3, 2, 2, 2, 4, 4, 2, 3, 2, 1, 2, 1, 0],
[3, 3, 0, 3, 1, 5, 1, 1, 2, 5, 6, 5, 0, 0, 3, 2, 1, 5, 7, 2],
[3, 3, 2, 1, 2, 2, 2, 2, 4, 0, 1, 3, 3, 1, 5, 6, 1, 3, 2, 2],
[3, 0, 3, 4, 3, 2, 1, 4, 2, 3, 4, 0, 5, 3, 2, 2, 4, 3, 0, 2],
[0, 3, 2, 2, 1, 5, 1, 4, 3, 1, 2, 2, 3, 5, 1, 2, 2, 2, 1, 2],
[1, 3, 2, 1, 1, 4, 4, 3, 2, 2, 5, 5, 1, 0, 1, 0, 4, 3, 3, 2],
[2, 2, 2, 1, 1, 3, 1, 6, 5, 2, 5, 2, 3, 4, 2, 2, 1, 1, 4, 0],
[3, 3, 4, 7, 0, 2, 6, 4, 1, 3, 4, 4, 1, 4, 1, 1, 2, 1, 3, 2],
[3, 6, 3, 4, 1, 3, 1, 3, 3, 1, 6, 2, 2, 2, 1, 1, 4, 4, 0, 4]])
In [67]:
abins=np.linspace(df.A.min(), df.A.max(), 21)
bbins=np.linspace(df.B.min(), df.B.max(), 21)
# ratio of C==1 counts to total counts in each 2D bin
Z=pd.crosstab(np.array(pd.cut(df.loc[df.C==1, 'A'], abins)),
              np.array(pd.cut(df.loc[df.C==1, 'B'], bbins))).div(
  pd.crosstab(np.array(pd.cut(df.A, abins)),
              np.array(pd.cut(df.B, bbins)))).values
Z = np.ma.masked_where(np.isinf(Z),Z)
x=np.linspace(df.A.min(), df.A.max(), 20)
y=np.linspace(df.B.min(), df.B.max(), 20)
X,Y=np.meshgrid(x, y)
plt.contourf(X, Y, Z, vmin=0, vmax=1)
plt.colorbar()
plt.pcolormesh(X, Y, Z, vmin=0, vmax=1)
plt.colorbar()
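Since C only takes the values 0 and 1, the ratio of 1s to total rows in a bin is simply the mean of C in that bin, so the same table can be sketched with a groupby on the two binned columns. This is an illustrative variant rather than the answer's code; the 10 bins on [0, 1] follow the question, and the names edges, a_bin, b_bin and ratio are made up here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for reproducibility
nrows = 1000
df = pd.DataFrame({"A": rng.random(nrows),
                   "B": rng.random(nrows),
                   "C": rng.integers(0, 2, nrows)})

edges = np.linspace(0, 1, 11)                       # 10 equal-width bins
a_bin = pd.cut(df["A"], edges, include_lowest=True)
b_bin = pd.cut(df["B"], edges, include_lowest=True)

# Mean of the 0/1 class label per (A bin, B bin) = share of 1s in that bin.
# observed=False keeps empty bins as NaN so the table is always 10 x 10.
ratio = df.groupby([a_bin, b_bin], observed=False)["C"].mean().unstack()
print(ratio.shape)
```

The resulting table has A bins as rows and B bins as columns, so something like plt.pcolormesh(edges, edges, ratio.to_numpy().T, vmin=0, vmax=1) should draw it with A on the horizontal axis.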
