Efficiently define an implicit Numpy array

A and B are NumPy arrays of common shape [n1,n2,n3]. The values of B are all integers in [0,n3). I want A to "invert" B along the last axis, in the sense that A[i,j,B[i,j,k]]=k for all i,j,k in the appropriate ranges. While it's obvious how to do this with for loops, I suspect that there is a clever one-liner using fancy indexing. Does anyone see it?

Here are two methods.
The first method is a one-liner: A = B.argsort(axis=-1)
Here's an example. B has shape (3, 5, 7) and for each fixed i and j, B[i,j,:] is a permutation of range(B.shape[2]).
In [386]: B
Out[386]:
array([[[1, 5, 4, 6, 2, 3, 0],
[6, 5, 3, 4, 2, 1, 0],
[4, 5, 0, 3, 1, 2, 6],
[0, 5, 6, 3, 2, 1, 4],
[4, 1, 5, 2, 6, 3, 0]],
[[2, 6, 0, 1, 5, 4, 3],
[3, 2, 4, 0, 1, 5, 6],
[3, 4, 6, 5, 1, 2, 0],
[4, 6, 3, 0, 2, 5, 1],
[0, 3, 1, 6, 4, 5, 2]],
[[0, 3, 6, 2, 1, 5, 4],
[3, 1, 2, 4, 6, 0, 5],
[1, 3, 5, 6, 4, 0, 2],
[4, 1, 6, 0, 2, 3, 5],
[6, 4, 5, 1, 0, 3, 2]]])
In [387]: A = B.argsort(axis=-1)
In [388]: A
Out[388]:
array([[[6, 0, 4, 5, 2, 1, 3],
[6, 5, 4, 2, 3, 1, 0],
[2, 4, 5, 3, 0, 1, 6],
[0, 5, 4, 3, 6, 1, 2],
[6, 1, 3, 5, 0, 2, 4]],
[[2, 3, 0, 6, 5, 4, 1],
[3, 4, 1, 0, 2, 5, 6],
[6, 4, 5, 0, 1, 3, 2],
[3, 6, 4, 2, 0, 5, 1],
[0, 2, 6, 1, 4, 5, 3]],
[[0, 4, 3, 1, 6, 5, 2],
[5, 1, 2, 0, 3, 6, 4],
[5, 0, 6, 1, 4, 2, 3],
[3, 1, 4, 5, 0, 6, 2],
[4, 3, 6, 5, 1, 2, 0]]])
Verify the desired property by sampling a few values.
In [389]: A[0, 0, B[0, 0, 0]]
Out[389]: 0
In [390]: A[0, 0, B[0, 0, 1]]
Out[390]: 1
In [391]: A[0, 0, B[0, 0, :]]
Out[391]: array([0, 1, 2, 3, 4, 5, 6])
In [392]: A[2, 3, B[2, 3, :]]
Out[392]: array([0, 1, 2, 3, 4, 5, 6])
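The spot checks above can be extended to every (i, j) pair at once. The following is a minimal sketch, assuming each B[i,j,:] is a permutation of range(n3); the randomly generated B and the name n3 are illustrative and not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative input: argsort of random floats gives an independent
# random permutation of range(n3) for every (i, j).
n3 = 7
B = rng.random((3, 5, n3)).argsort(axis=-1)

A = B.argsort(axis=-1)

# take_along_axis evaluates A[i, j, B[i, j, k]] for all i, j, k in one call.
check = np.take_along_axis(A, B, axis=-1)
all_ok = np.array_equal(check, np.broadcast_to(np.arange(n3), B.shape))
print(all_ok)
```

If B contained repeated values along the last axis, no exact inverse would exist and this check would be expected to fail.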
The second method has a lower time complexity than argsort (linear rather than O(n log n) in the length of the last axis), but it is a three-liner rather than a one-liner. I'll use the same B as above.
Create A, but with no values assigned yet.
In [393]: A = np.empty_like(B)
Create index arrays for each dimension of B.
In [394]: i, j, k = np.ogrid[[slice(n) for n in B.shape]] # or np.ix_(*[range(n) for n in B.shape])
This is the cool part. Do the assignment exactly as you wrote it in the question.
In [395]: A[i, j, B[i, j, k]] = k
Verify that we have the same A as above.
In [396]: A
Out[396]:
array([[[6, 0, 4, 5, 2, 1, 3],
[6, 5, 4, 2, 3, 1, 0],
[2, 4, 5, 3, 0, 1, 6],
[0, 5, 4, 3, 6, 1, 2],
[6, 1, 3, 5, 0, 2, 4]],
[[2, 3, 0, 6, 5, 4, 1],
[3, 4, 1, 0, 2, 5, 6],
[6, 4, 5, 0, 1, 3, 2],
[3, 6, 4, 2, 0, 5, 1],
[0, 2, 6, 1, 4, 5, 3]],
[[0, 4, 3, 1, 6, 5, 2],
[5, 1, 2, 0, 3, 6, 4],
[5, 0, 6, 1, 4, 2, 3],
[3, 1, 4, 5, 0, 6, 2],
[4, 3, 6, 5, 1, 2, 0]]])
After poking around some more on Stack Overflow, I see that both of these methods appear in answers to the question "How to invert a permutation array in numpy". The only thing really new here is doing the inversion along one axis of a three-dimensional array.
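The scatter assignment of the second method can probably also be written without explicit index arrays by using np.put_along_axis. This is a sketch under the same permutation assumption; the helper name invert_last_axis is hypothetical and not from the original answer.

```python
import numpy as np

def invert_last_axis(B):
    """Return A with A[..., B[..., k]] = k along the last axis (hypothetical helper)."""
    A = np.empty_like(B)
    # The 1-D values array is broadcast against the shape of B.
    np.put_along_axis(A, B, np.arange(B.shape[-1]), axis=-1)
    return A

rng = np.random.default_rng(1)
B = rng.random((3, 5, 7)).argsort(axis=-1)   # random permutations along the last axis

A = invert_last_axis(B)
same = np.array_equal(A, B.argsort(axis=-1))  # agrees with the argsort method
print(same)
```

Like the ogrid version, this does one pass over the data rather than a sort, and it works for any number of leading dimensions.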

Related

numpy equivalent of tf.math.segment_sum

What is the equivalent of tf.math.segment_sum in numpy?
I would like to port some TensorFlow code to NumPy. The code uses segment_sum to group elements according to a segment_ids array and sum each group. Given an array and a segment_ids array, what is the equivalent code in NumPy?

You can create something pretty close to tf.math.segment_sum with the method numpy.add.at, which is the at method of the add ufunc:
import numpy as np
def segment_sum(data, segment_ids):
    data = np.asarray(data)
    s = np.zeros((np.max(segment_ids)+1,) + data.shape[1:], dtype=data.dtype)
    np.add.at(s, segment_ids, data)
    return s
For example,
In [53]: c = np.array([[1, 2, 3, 4], [4, 3, 2, 1], [5, 6, 7, 8]])
In [54]: ids = [0, 0, 1]
In [55]: segment_sum(c, ids)
Out[55]:
array([[5, 5, 5, 5],
[5, 6, 7, 8]])
In [56]: x = [10, 20, 20, 30, 10, 0, 1, 2]
In [57]: xids = [1, 1, 0, 0, 2, 2, 2, 3]
In [58]: segment_sum(x, xids)
Out[58]: array([50, 30, 11, 2])
In [59]: w = np.arange(72).reshape(6, 2, 6) % 5
In [60]: w
Out[60]:
array([[[0, 1, 2, 3, 4, 0],
[1, 2, 3, 4, 0, 1]],
[[2, 3, 4, 0, 1, 2],
[3, 4, 0, 1, 2, 3]],
[[4, 0, 1, 2, 3, 4],
[0, 1, 2, 3, 4, 0]],
[[1, 2, 3, 4, 0, 1],
[2, 3, 4, 0, 1, 2]],
[[3, 4, 0, 1, 2, 3],
[4, 0, 1, 2, 3, 4]],
[[0, 1, 2, 3, 4, 0],
[1, 2, 3, 4, 0, 1]]])
In [61]: wids = [0, 0, 1, 2, 2, 2]
In [62]: segment_sum(w, wids)
Out[62]:
array([[[2, 4, 6, 3, 5, 2],
[4, 6, 3, 5, 2, 4]],
[[4, 0, 1, 2, 3, 4],
[0, 1, 2, 3, 4, 0]],
[[4, 7, 5, 8, 6, 4],
[7, 5, 8, 6, 4, 7]]])
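For one-dimensional data, np.bincount with the weights argument is another way to get the same sums. The sketch below reuses x and xids from the example above; note that bincount returns float64 when weights are given, so the dtype differs from the np.add.at version.

```python
import numpy as np

x = np.array([10, 20, 20, 30, 10, 0, 1, 2])
xids = np.array([1, 1, 0, 0, 2, 2, 2, 3])

# bincount adds up the weights that fall into each integer bin (segment id).
sums = np.bincount(xids, weights=x)
print(sums)  # the values 50, 30, 11, 2 as float64
```

This only covers the 1-D case; for higher-dimensional data the np.add.at approach above remains the more general option.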

selecting certain indices in Numpy ndarray using another array

I'm trying to mark the value and indices of max values in a 3D array, getting the max in the third axis.
Now this would have been obvious in a lower dimension:
argmaxes=np.argmax(array)
maximums=array[argmaxes]
but NumPy doesn't understand the second syntax properly for higher than 1D.
Let's say my 3D array has shape (8,8,250). argmaxes=np.argmax(array,axis=-1) would return an (8,8) array with values between 0 and 249. My expected output is an (8,8) array containing the maximum value along the 3rd dimension. I can achieve this with maxes=np.max(array,axis=-1), but that repeats the same calculation (I need both the values and the indices for later calculations).
I can also just do a crude nested loop:
for i in range(8):
    for j in range(8):
        maxes[i,j]=array[i,j,argmaxes[i,j]]
But is there a nicer way to do this?

You can use advanced indexing. This is a simpler case when shape is (8,8,3):
arr = np.random.randint(99, size=(8,8,3))
x, y = np.indices(arr.shape[:-1])
arr[x, y, np.argmax(arr,axis=-1)]
Sample run:
>>> x
array([[0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7, 7, 7]])
>>> y
array([[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7]])
>>> np.argmax(arr,axis=-1)
array([[2, 1, 1, 2, 0, 0, 0, 1],
[2, 2, 2, 1, 0, 0, 1, 0],
[1, 2, 0, 1, 1, 1, 2, 0],
[1, 0, 0, 0, 2, 1, 1, 0],
[2, 0, 1, 2, 2, 2, 1, 0],
[2, 2, 0, 1, 1, 0, 2, 2],
[1, 1, 0, 1, 1, 2, 1, 0],
[2, 1, 1, 1, 0, 0, 2, 1]], dtype=int64)
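The same lookup can be done without building x and y by using np.take_along_axis. This is a minimal sketch on randomly generated data (the seed and shape are illustrative), and it compares the result against np.max as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(99, size=(8, 8, 3))

argmaxes = np.argmax(arr, axis=-1)  # shape (8, 8)

# The index array must have the same number of dimensions as arr, so add a
# trailing axis for the lookup and drop it again afterwards.
maxes = np.take_along_axis(arr, argmaxes[..., None], axis=-1)[..., 0]

print(np.array_equal(maxes, arr.max(axis=-1)))
```

This way the maximum is located only once, and both the indices and the values are available afterwards.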

How to return indices from sorting a 2d numpy array row-by-row?

Input: A 2D numpy array
Output: An array of indices that will sort the array row by row (or column by column)
E.g.: Say the function is get_sorted_indices(array, axis=0)
a = np.array([[1,2,3,4,5]
,[2,3,4,5,6]
,[1,2,3,4,5]
,[2,3,4,6,6]
,[2,3,4,5,6]])
ind = get_sorted_indices(a, axis=0)
Then we will get
>>> ind
[0, 2, 1, 4, 3]
>>> a[ind] # should equal np.sort(a, axis=0)
array([[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[2, 3, 4, 5, 6],
[2, 3, 4, 6, 6]])
>>> np.sort(a, axis=0)
array([[1, 2, 3, 4, 5],
[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[2, 3, 4, 5, 6],
[2, 3, 4, 6, 6]])
I've looked at argsort but I don't understand its output and reading the documentation doesn't help:
>>> a.argsort()
array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
>>> a.argsort(axis=0)
array([[0, 0, 0, 0, 0],
[2, 2, 2, 2, 2],
[1, 1, 1, 1, 1],
[3, 3, 3, 4, 3],
[4, 4, 4, 3, 4]])
I can do this manually but I think I'm misunderstanding argsort or I'm missing something from numpy.
Is there a standard way to do this, or do I have no choice but to do it manually?
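One reading of the question is that whole rows should be ordered lexicographically, which argsort does not do because it sorts each row or column independently. Under that assumption, a sketch using np.lexsort could look like this; get_sorted_indices is the function name proposed in the question, and the axis handling here is an assumption.

```python
import numpy as np

def get_sorted_indices(array, axis=0):
    """Indices that order whole rows (axis=0) or whole columns (axis=1) lexicographically."""
    rows = array if axis == 0 else array.T
    # lexsort treats the LAST key as the primary one, so pass the columns reversed.
    return np.lexsort(rows.T[::-1])

a = np.array([[1, 2, 3, 4, 5],
              [2, 3, 4, 5, 6],
              [1, 2, 3, 4, 5],
              [2, 3, 4, 6, 6],
              [2, 3, 4, 5, 6]])

ind = get_sorted_indices(a, axis=0)
print(ind)     # [0 2 1 4 3]
print(a[ind])  # rows in lexicographic order
```

np.lexsort is a stable sort, so equal rows keep their original relative order, which is why index 0 comes before 2 and 1 before 4.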

Advanced List Coding using multiple lists

So we are given two lists.
groups = [[0,1],[2],[3,4,5],[6,7,8,9]]
A = [[[0, 1, 6, 7, 8, 9], [0, 1, 6, 7, 8, 9]], [[2]], [[3, 4, 5, 6, 7, 8, 9], [3, 4, 5, 6, 7, 8, 9], [3, 4, 5, 6, 7, 8, 9]], [[0, 1, 3, 4, 5, 6, 8, 9], [0, 1, 3, 4, 5, 7, 8, 9], [0, 1, 3, 4, 5, 6, 7, 8, 9], [0, 1, 3, 4, 5, 6, 7, 8, 9]]]
How do we replace the elements in A with the index of the group they belong to in groups? That is, replace the 0 and 1 in A with 0, the 2 in A with 1, the 3, 4 and 5 with 2, and so on.
Output:
A = [[[0, 0, 3, 3, 3, 3], [0, 0, 3, 3, 3, 3]], [[1]], [[2, 2, 2, 3, 3, 3, 3], [2, 2, 2, 3, 3, 3, 3], [2, 2, 2, 3, 3, 3, 3]], [[0, 0, 2, 2, 2, 3, 3, 3], [0, 0, 2, 2, 2, 3, 3, 3], [0, 0, 2, 2, 2, 3, 3, 3, 3], [0, 0, 2, 2, 2, 3, 3, 3, 3]]]

Create a dictionary that maps each number to the index of its group, then replace every number in A with that index:
groups = [[0,1],[2],[3,4,5],[6,7,8,9]]
A = [[[0, 1, 6, 7, 8, 9], [0, 1, 6, 7, 8, 9]], [[2]], [[3, 4, 5, 6, 7, 8, 9], [3, 4, 5, 6, 7, 8, 9], [3, 4, 5, 6, 7, 8, 9]], [[0, 1, 3, 4, 5, 6, 8, 9], [0, 1, 3, 4, 5, 7, 8, 9], [0, 1, 3, 4, 5, 6, 7, 8, 9], [0, 1, 3, 4, 5, 6, 7, 8, 9]]]
from collections import defaultdict
dic = defaultdict(int)
for i in range(len(groups)):
    for j in groups[i]:
        dic[j]=i
for i in A:
    for j in i:
        for l in range(len(j)):
            j[l] = dic[j[l]]
output
[[[0, 0, 3, 3, 3, 3], [0, 0, 3, 3, 3, 3]],
[[1]],
[[2, 2, 2, 3, 3, 3, 3], [2, 2, 2, 3, 3, 3, 3], [2, 2, 2, 3, 3, 3, 3]],
[[0, 0, 2, 2, 2, 3, 3, 3],
[0, 0, 2, 2, 2, 3, 3, 3],
[0, 0, 2, 2, 2, 3, 3, 3, 3],
[0, 0, 2, 2, 2, 3, 3, 3, 3]]]

Even though there is no attempt from your side, here you go :
def f(l,i):
    for k in l:
        if i in k:
            return l.index(k)
output_ = [[[f(groups,n) for n in a0] for a0 in a] for a in A]
Output :
[[[0, 0, 3, 3, 3, 3], [0, 0, 3, 3, 3, 3]], [[1]], [[2, 2, 2, 3, 3, 3, 3], [2, 2, 2, 3, 3, 3, 3], [2, 2, 2, 3, 3, 3, 3]], [[0, 0, 2, 2, 2, 3, 3, 3], [0, 0, 2, 2, 2, 3, 3, 3], [0, 0, 2, 2, 2, 3, 3, 3, 3], [0, 0, 2, 2, 2, 3, 3, 3, 3]]]

try this:
def replace_items(i, inner_list, *lists):
    for l in lists:
        for item in l:
            if item in inner_list:
                index= l.index(item)
                l[index] = i
for i,inner_list in enumerate(groups):
    for lists in A:
        replace_items(i, inner_list, *lists)
print(A)

If you convert your groups into a dictionary, it will be easy to process the 3 level list using a list comprehension:
groupDict = { v:i for i,g in enumerate(groups) for v in g }
A = [ [ [ groupDict[z] for z in yz ] for yz in xyz] for xyz in A ]
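If the nesting depth of A is not fixed at three levels, the same dictionary lookup can be applied recursively. The following is a sketch on a shortened, illustrative input; the names group_of and map_nested are hypothetical and not from the answers above.

```python
groups = [[0, 1], [2], [3, 4, 5], [6, 7, 8, 9]]
A = [[[0, 1, 6], [7, 8, 9]], [[2]], [[3, 4, 5, 6]]]  # shortened illustrative input

# Map each value to the index of the group that contains it.
group_of = {v: i for i, g in enumerate(groups) for v in g}

def map_nested(obj):
    """Replace every leaf value by its group index, at any nesting depth."""
    if isinstance(obj, list):
        return [map_nested(x) for x in obj]
    return group_of[obj]

result = map_nested(A)
print(result)  # [[[0, 0, 3], [3, 3, 3]], [[1]], [[2, 2, 2, 3]]]
```

Unlike the in-place loops, this builds a new list and leaves A unchanged.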

comparison of two numpy arrays

I have two 2D NumPy arrays (say A and B, sometimes of equal size and sometimes not). I need to compare the first column of both arrays and find the indices of the elements that occur in both.
The code shown below gives the right result when the arrays differ in size and not all elements of A are present in B.
C=np.squeeze(A[np.array(np.where(np.in1d(A[:,0],B[:,1]))).T],axis=None)
But it is incorrect whenever all elements of A are present in B.
Can anyone suggest a solution?

If A and B are the following:
A=np.random.randint(0,5,(10,8))
B=np.random.randint(3,7,(10,8))
>>> A
array([[4, 4, 2, 1, 4, 3, 1, 2],
[1, 1, 1, 2, 0, 3, 0, 4],
[4, 3, 1, 1, 2, 1, 1, 3],
[3, 4, 3, 0, 3, 4, 2, 0],
[4, 1, 3, 0, 1, 4, 1, 2],
[1, 1, 1, 2, 2, 2, 0, 2],
[4, 3, 4, 2, 3, 2, 3, 2],
[4, 1, 4, 0, 3, 1, 2, 3],
[3, 2, 3, 2, 4, 4, 4, 2],
[0, 1, 4, 0, 2, 2, 1, 4]])
>>> B
array([[4, 3, 5, 6, 4, 6, 3, 5],
[6, 3, 4, 4, 4, 6, 5, 4],
[5, 4, 5, 5, 5, 6, 3, 3],
[3, 5, 6, 5, 5, 5, 3, 6],
[5, 6, 5, 3, 5, 5, 5, 3],
[3, 3, 5, 3, 5, 6, 6, 3],
[6, 6, 6, 4, 6, 3, 4, 6],
[4, 4, 3, 5, 6, 6, 3, 3],
[5, 3, 4, 5, 3, 5, 5, 6],
[4, 3, 3, 6, 6, 4, 3, 4]])
You could use intersect1d to find the values that are in both
np.intersect1d(A,B)
array([3, 4])
And then argwhere to find the indices of the values in, for example, column 0 of A:
[np.argwhere(x==A[:,0]) for x in np.intersect1d(A,B)]
returns
[array([[3],
[8]]), array([[0],
[2],
[4],
[6],
[7]])]
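If only the positions in A's first column are needed, rather than one index array per shared value, np.isin gives them in a single call. This sketch copies the first columns of the A and B printed above; the names A_col, B_col and idx_in_A are illustrative.

```python
import numpy as np

A_col = np.array([4, 1, 4, 3, 4, 1, 4, 4, 3, 0])  # A[:, 0] from the example above
B_col = np.array([4, 6, 5, 3, 5, 3, 6, 4, 5, 4])  # B[:, 0] from the example above

# Boolean mask over A_col: True where the value also occurs somewhere in B_col.
mask = np.isin(A_col, B_col)
idx_in_A = np.flatnonzero(mask)
print(idx_in_A)  # [0 2 3 4 6 7 8]
```

Because the mask is computed element by element over A_col, this does not depend on A and B having the same number of rows.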

import numpy as np
A=np.array([[4, 4, 2, 1, 4, 3, 1, 2],
[1, 1, 1, 2, 0, 3, 0, 4],
[4, 3, 1, 1, 2, 1, 1, 3],
[3, 4, 3, 0, 3, 4, 2, 0],
[4, 1, 3, 0, 1, 4, 1, 2],
[1, 1, 1, 2, 2, 2, 0, 2],
[4, 3, 4, 2, 3, 2, 3, 2],
[4, 1, 4, 0, 3, 1, 2, 3],
[3, 2, 3, 2, 4, 4, 4, 2],
[0, 1, 4, 0, 2, 2, 1, 4]])
B=np.array([[4, 3, 5, 6, 4, 6, 3, 5],
[6, 3, 4, 4, 4, 6, 5, 4],
[5, 4, 5, 5, 5, 6, 3, 3],
[3, 5, 6, 5, 5, 5, 3, 6],
[5, 6, 5, 3, 5, 5, 5, 3],
[3, 3, 5, 3, 5, 6, 6, 3],
[6, 6, 6, 4, 6, 3, 4, 6],
[4, 4, 3, 5, 6, 6, 3, 3],
[5, 3, 4, 5, 3, 5, 5, 6],
[4, 3, 3, 6, 6, 4, 3, 4]])
matched = A.T[0][A.T[0] == B.T[0]]
>> [4,3,4]
