I have a 1d array of ids, for example:
a = [1, 3, 4, 7, 9]
Then another 2d array:
b = [[1, 4, 7, 9], [3, 7, 9, 1]]
I would like a third array with the same shape as b, where each item is the index in a of the corresponding item of b, that is:
c = [[0, 2, 3, 4], [1, 3, 4, 0]]
What's a vectorized way to do that using numpy?
This may look like an odd tool for the job, but you can use np.interp to do that:
import numpy as np
a = [1, 3, 4, 7, 9]
sorting = np.argsort(a) # np.interp needs increasing x-coordinates
positions = np.arange(0, len(a))
xp = np.array(a)[sorting]
fp = positions[sorting]
b = [[1, 4, 7, 9], [3, 7, 9, 1]]
# Round first: a bare astype(int) truncates, which is risky because floats are tricky.
# astype(int) alone may be slightly faster for small len(a), but it is not recommended.
c = np.rint(np.interp(b, xp, fp)).astype(int)
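This works as long as every value in b is present in a (a missing value gets an interpolated, meaningless index) and the indices are exactly representable as floats. np.interp computes in float64, so indices are exact up to 2**53; the 16,777,217 figure is where float32 stops representing every integer. The cost is O(len(a)*log(len(a))) for the sort plus O(b.size*log(len(a))) for the lookups.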
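An integer-only variant of the same sorted-lookup idea can be sketched with np.searchsorted. This is an alternative to the np.interp answer above rather than part of it, and it assumes every value in b occurs in a; the names sorter and pos are illustrative.

```python
import numpy as np

a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])

# Indices that would sort a; needed when a is not already sorted.
sorter = np.argsort(a)
# Position of each element of b within the sorted view of a (same shape as b).
pos = np.searchsorted(a, b, sorter=sorter)
# Map the sorted positions back to positions in the original a.
# Assumption: every value of b is present in a; otherwise the result is
# wrong, or pos can be out of range for sorter.
c = sorter[pos]
print(c)
```

Because no floats are involved, there is no rounding step and no precision limit on len(a).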
Effectively, this solution is a one-liner. The only catch is that you need to reshape the array before you do the one-liner, and then reshape it back again:
import numpy as np
a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])
original_shape = b.shape
c = np.where(b.reshape(b.size, 1) == a)[1]
c = c.reshape(original_shape)
This results in:
[[0 2 3 4]
 [1 3 4 0]]
Broadcasting to the rescue!
>>> ((np.arange(1, len(a) + 1)[:, None, None]) * (a[:, None, None] == b)).sum(axis=0) - 1
array([[0, 2, 3, 4],
       [1, 3, 4, 0]])
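A minimal sketch of a related broadcasting variant: compare b against a along a new trailing axis, then use argmax to pick the matching position. It assumes a and b are NumPy arrays and that every element of b appears in a (a missing value would silently map to index 0); the name matches is illustrative.

```python
import numpy as np

a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])

# (2, 4, 1) == (5,) broadcasts to a (2, 4, 5) boolean array.
matches = b[..., None] == a
# argmax returns the position of the first True along the last axis.
c = matches.argmax(axis=-1)
print(c)
```

Like the expression above, this builds an intermediate array of b.size * len(a) elements, so it suits small a best.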
I have a list of lists of indices for a NumPy array, but I do not quite get the result I want when using them.
import numpy as np
n = 3
a = np.array([[8, 1, 6],
              [3, 5, 7],
              [4, 9, 2]])
np.random.seed(7)
idx = np.random.choice(np.arange(n), size=(n, n-1))
# array([[0, 1],
#        [2, 0],
#        [1, 2]])
In this case I want:
element 0 and 1 of row 0
element 2 and 0 of row 1
element 1 and 2 of row 2
My list has n sublists and all of those lists have the same length.
I want each sublist to be used only once, for its own row, and not for every row.
# Wanted result
# b = array([[8, 1],
#            [7, 3],
#            [9, 2]])
I can achieve this but it seems rather cumbersome with a lot of repeating and reshaping.
# Possibility 1
b = a[:, idx]
# array([[[8, 1], | [[3, 5], | [[4, 9],
# [6, 8], | [7, 3], | [2, 4],
# [1, 6]], | [5, 7]], | [9, 2]])
b = b[np.arange(n), np.arange(n), :]
# Possibility 2
b = a[np.repeat(range(n), n-1), idx.ravel()]
# array([8, 1, 7, 3, 9, 2])
b = b.reshape(n, n-1)
Are there easier ways?
You can use np.take_along_axis here:
np.take_along_axis(a, idx, 1)
array([[8, 1],
[7, 3],
[9, 2]])
Or using broadcasting:
a[np.arange(a.shape[0])[:,None], idx]
array([[8, 1],
[7, 3],
[9, 2]])
Note that you're using integer array indexing here, so you need to specify both the rows and the positions within each row (given by idx) that you want to index.
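As a small sketch of why the two forms above agree: a column of row numbers with shape (n, 1) broadcasts against idx with shape (n, n-1), so each row of idx is paired with its own row of a. The idx values are written out literally here instead of being drawn at random, and the names rows, b_fancy and b_take are illustrative.

```python
import numpy as np

a = np.array([[8, 1, 6],
              [3, 5, 7],
              [4, 9, 2]])
idx = np.array([[0, 1],
                [2, 0],
                [1, 2]])

# Column vector of row numbers, shape (3, 1); broadcasts against idx (3, 2).
rows = np.arange(a.shape[0])[:, None]
b_fancy = a[rows, idx]

# take_along_axis does the same row-wise pairing along axis 1.
b_take = np.take_along_axis(a, idx, axis=1)

print(b_fancy)
print(np.array_equal(b_fancy, b_take))
```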
I have a 2D NumPy array that contains duplicate values.
I am searching the array like this:
In [104]: import numpy as np
In [105]: array = np.array
In [106]: a = array([[1, 2, 3],
...: [1, 2, 3],
...: [2, 5, 6],
...: [3, 8, 9],
...: [4, 8, 9],
...: [4, 2, 3],
...: [5, 2, 3]])
In [107]: num_list = [1, 4, 5]
In [108]: for i in num_list :
...:     print(a[np.where(a[:,0] == i)])
...:
[[1 2 3]
[1 2 3]]
[[4 8 9]
[4 2 3]]
[[5 2 3]]
The input is a list of numbers to match against the values in column 0.
The end result I want is the matching rows in any format (array, list or tuple), for example:
array([[1, 2, 3],
[1, 2, 3],
[4, 8, 9],
[4, 2, 3],
[5, 2, 3]])
My code works fine but doesn't seem Pythonic. Is there a better search strategy for multiple values,
something like a[np.where(a[:,0] == l)], where a single lookup gets all the values?
My real array is large.
Approach #1 : Using np.in1d -
a[np.in1d(a[:,0], num_list)]
Approach #2 : Using np.searchsorted -
num_arr = np.sort(num_list) # Sort num_list and get as array
# Get indices of occurrences of first column in num_list
idx = np.searchsorted(num_arr, a[:,0])
# Take care of out of bounds cases
idx[idx==len(num_arr)] = 0
out = a[a[:,0] == num_arr[idx]]
You can do
a[numpy.in1d(a[:, 0], num_list), :]
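On recent NumPy versions, np.in1d is deprecated (as of NumPy 2.0) in favour of np.isin, which is called the same way here. A minimal sketch of the same lookup with np.isin, using the question's data; the names mask and out are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [1, 2, 3],
              [2, 5, 6],
              [3, 8, 9],
              [4, 8, 9],
              [4, 2, 3],
              [5, 2, 3]])
num_list = [1, 4, 5]

# Boolean mask: True where the first-column value is in num_list.
mask = np.isin(a[:, 0], num_list)
# Select all matching rows in a single indexing step.
out = a[mask]
print(out)
```

The rows come back in their original order, so duplicates in column 0 are all kept.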
Suppose I have a numpy array as below
a = np.asarray([[1,2,3],[1,4,3],[2,5,4],[2,7,5]])
array([[1, 2, 3],
[1, 4, 3],
[2, 5, 4],
[2, 7, 5]])
How can I flatten columns 2 and 3 for each unique element in column 1, like below:
array([[1, 2, 3, 4, 3],
       [2, 5, 4, 7, 5]])
Thank you for your help.
Another option using list comprehension:
np.array([np.insert(a[a[:,0] == k, 1:].flatten(), 0, k) for k in np.unique(a[:,0])])
# array([[1, 2, 3, 4, 3],
# [2, 5, 4, 7, 5]])
import numpy as np
a = np.asarray([[1,2,3],[1,4,3],[2,5,4],[2,7,5]])
d = {}
for row in a:
d[row[0]] = np.concatenate( (d.get(row[0], []), row[1:]) )
r = np.array([np.concatenate(([key], d[key])) for key in d])
print(r)
This prints:
[[ 1. 2. 3. 4. 3.]
[ 2. 5. 4. 7. 5.]]
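The float output above comes from concatenating onto the empty list default, which NumPy treats as a float64 array. A sketch of the same dictionary idea that keeps integers, assuming every key has the same number of rows so the result is rectangular; defaultdict and the name groups are choices made here, not part of the answer.

```python
from collections import defaultdict

import numpy as np

a = np.asarray([[1, 2, 3], [1, 4, 3], [2, 5, 4], [2, 7, 5]])

# Collect the remaining columns of each row under its first-column key.
groups = defaultdict(list)
for key, *rest in a.tolist():
    groups[key].extend(rest)

# Rebuild one row per key: the key followed by its flattened values.
# Assumption: all keys have equally many values, so the array is rectangular.
r = np.array([[key, *vals] for key, vals in groups.items()])
print(r)
```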
As posted in the comments, each unique element in column 0 has a fixed number of rows, which I take to mean the same number of rows for every element, so we can use a vectorized approach. We sort the rows on column 0 and look for the first shift along it, which marks a group change and so gives the number of rows per unique element in column 0. Let's call it L. Finally, we slice the sorted array to select columns 1 and 2 and group L rows together by reshaping. Thus, the implementation would be -
sa = a[a[:,0].argsort()]
L = np.unique(sa[:,0],return_index=True)[1][1]
out = np.column_stack((sa[::L,0],sa[:,1:].reshape(-1,2*L)))
For a further performance boost, we can use np.diff to calculate L, like so -
L = np.where(np.diff(sa[:,0])>0)[0][0]+1
Sample run -
In [103]: a
Out[103]:
array([[1, 2, 3],
[3, 7, 8],
[1, 4, 3],
[2, 5, 4],
[3, 8, 2],
[2, 7, 5]])
In [104]: sa = a[a[:,0].argsort()]
...: L = np.unique(sa[:,0],return_index=True)[1][1]
...: out = np.column_stack((sa[::L,0],sa[:,1:].reshape(-1,2*L)))
...:
In [105]: out
Out[105]:
array([[1, 2, 3, 4, 3],
[2, 5, 4, 7, 5],
[3, 7, 8, 8, 2]])
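A hedged variant of the same sort-and-reshape idea, sketched with two additions that are not in the answer above: kind="stable" so rows within a group keep their original order (argsort's default sort is not guaranteed to be stable), and an explicit check that all groups have the same size. The names order, keys and counts are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [3, 7, 8],
              [1, 4, 3],
              [2, 5, 4],
              [3, 8, 2],
              [2, 7, 5]])

# Stable sort on column 0 keeps the original order inside each group.
order = np.argsort(a[:, 0], kind="stable")
sa = a[order]

# Unique keys and how many rows each one has.
keys, counts = np.unique(sa[:, 0], return_counts=True)
if not (counts == counts[0]).all():
    raise ValueError("groups must all have the same number of rows")

# One output row per key: the key, then its rows flattened side by side.
out = np.column_stack((keys, sa[:, 1:].reshape(len(keys), -1)))
print(out)
```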
I have a numpy array, say:
>>> a=np.array([[0,1,2],[4,3,6],[9,5,7],[8,9,8]])
>>> a
array([[0, 1, 2],
[4, 3, 6],
[9, 5, 7],
[8, 9, 8]])
I want to replace the second and third column elements with the minimum of them (row by row), except if one of these 2 elements is < 3.
The resulting array should be:
array([[0, 1, 2],# nothing changes since 1 and 2 are <3
[4, 3, 3], #min(3,6)=3 => 6 changed to 3
[9, 5, 5], #min(5,7)=5 => 7 changed to 5
[8, 8, 8]]) #min(9,8)=8 => 9 changed to 8
I know I can use clip, for instance a[:,1:3].clip(2,6,a[:,1:3]), but
1) clip will be applied to all elements, including those <3.
2) I don't know how to set the min and max values of clip to the minimum values of the 2 related elements of each row.
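Just use the >= operator to first select the rows you are interested in:
b = a[:, 1:3] # select the columns (a view into a)
matching = numpy.all(b >= 3, axis=1) # find rows where both elements are >= 3
Now you can replace the content with the row minimum. Assign through the mask rather than storing b[matching, :] in a new variable, because boolean indexing returns a copy and changes to it would not reach a:
# find each row's minimum and keep it as a column vector for broadcasting
b[matching, :] = b[matching, :].min(1, keepdims=True)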
We first define a row_mask that selects the rows where neither of the two elements is <3, and then take the minimum along axis 1 for the rows in row_mask.
The newaxis part is required to broadcast the 1-dim array of minimums to the 2-dim target of the assignment.
a=np.array([[0,1,2],[4,3,6],[9,5,7],[8,9,8]])
row_mask = (a[:,1:]>=3).all(axis=1) # test columns 1 and 2, not column 0
a[row_mask, 1:] = a[row_mask, 1:].min(axis=1)[...,np.newaxis]
a
=>
array([[0, 1, 2],
[4, 3, 3],
[9, 5, 5],
[8, 8, 8]])
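Here's a one-liner. Note that it selects rows by testing whether the row sum is > 3, which matches the <3 rule for this sample but not in general: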
a[np.where(np.sum(a,axis=1)>3),1:3]=np.min(a[np.where(np.sum(a,axis=1)>3),1:3],axis=2).reshape(1,3,1)
Here's a breakdown:
>>> b = np.where(np.sum(a,axis=1)>3) # finds rows where, in a, row sums are > 3
(array([1, 2, 3]),)
>>> c = a[b,1:3] # the part of a that needs to change
array([[[3, 6],
        [5, 7],
        [9, 8]]])
>>> d = np.min(c,axis=2) # the minimum values in each row (cols 1 and 2)
array([[3, 5, 8]])
>>> e = d.reshape(1,3,1) # adjust shape for broadcast to a
array([[[3],
[5],
[8]]])
>>> a[np.where(np.sum(a,axis=1)>3),1:3] = e # set the values in a
>>> a
array([[0, 1, 2],
[4, 3, 3],
[9, 5, 5],
[8, 8, 8]])
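For comparison, a minimal sketch that builds the mask directly from the two columns, as the question states the rule, and applies it with np.where. This is an alternative under that reading of the question, not a restatement of the one-liner above; the names cols, mask and row_min are illustrative.

```python
import numpy as np

a = np.array([[0, 1, 2], [4, 3, 6], [9, 5, 7], [8, 9, 8]])

cols = a[:, 1:3]
# True for rows where neither of the two elements is < 3.
mask = (cols >= 3).all(axis=1)
# Per-row minimum, kept as a column so it broadcasts across both columns.
row_min = cols.min(axis=1, keepdims=True)
# Use the minimum where the mask holds, the original values elsewhere.
a[:, 1:3] = np.where(mask[:, None], row_min, cols)
print(a)
```

A row such as [10, 1, 2] would be left unchanged by this mask, whereas a row-sum test would select it.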