This question already has answers here:
numpy replace groups of elements with integers incrementally
(2 answers)
Closed 4 years ago.
Hi, I have a numpy array like:
a = np.array([1,1,1,3,3,6,6,6,6,6,6])
I want to convert it to a continuous array like below
b = np.array([0,0,0,1,1,2,2,2,2,2,2])
I have code for this using a for loop:
def fun1(a):
    b = a.copy()
    for i in range(1, a.shape[0]):
        if a[i] != a[i-1]:
            b[i] = b[i-1] + 1
        else:
            b[i] = b[i-1]
    b = b - b.min()
    return b
Is there a way to vectorize this using numpy? I can use numba to make it faster, but I was wondering if there is a way to do it with just numpy.
You can do the following with np.unique, using the return_inverse argument. According to the docs:
return_inverse : bool, optional
If True, also return the indices of the unique array (for the specified axis, if provided) that can be used to reconstruct ar.
a = np.array([1,1,1,3,3,6,6,6,6,6,6])
_, b = np.unique(a, return_inverse=True)
>>> b
array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2])
Edit: If the array starts out unsorted and you want such a continuous array as shown in your output, you can do the same as above but pass a sorted array to np.unique:
a = np.array([1,3,1,3,3,6,6,3,6,6,6])
_, b = np.unique(np.sort(a), return_inverse=True)
>>> b
array([0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2])
This question already has answers here:
How to return all the minimum indices in numpy
(5 answers)
Closed 1 year ago.
I have an array, for example:
[1 2 3 4
2 3 4 0
5 4 0 6]
And I want to find the indices of all the values that are closest to the value 3.9 (in my example, 4).
I tried using:
import numpy as np
def find_nearest(array, value):
    idx = (np.abs(array - value)).argmin()
    return idx
and
np.where(array == array.min())
but none of the options gives me the correct answer.
I expect to get:
(0,3),(1,2),(2,1)
In my original code, I iterate over an array with the shape 3648x5472, so Python for loops would be too slow.
Hope to get some help here, thank you.
You can use:
import numpy as np
a = np.array([[1, 2, 3, 4],
[2, 3, 4, 0],
[5, 4, 0, 6]])
v = 3.9
b = abs(a-v)
xs, ys = np.where(b == np.min(b))
output:
>>> xs
array([0, 1, 2])
>>> ys
array([3, 2, 1])
Alternative output:
>>> np.c_[np.where(b == np.min(b))]
array([[0, 3],
[1, 2],
[2, 1]])
# or
>>> np.argwhere(b==np.min(b))
You are pretty close. Once you have found the first closest value, you can find the indices of all equal values with `np.argwhere`. Because argmin returns an index into the flattened array, read the value through array.flat:
closest = array.flat[np.abs(array - value).argmin()]
np.argwhere(array == closest)
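The two approaches in this thread behave differently when two different values are equally close to the target: matching on the closest value returns only one of them, while matching on the minimum distance returns both. A small sketch comparing them on the example array; the function names are hypothetical:

```python
import numpy as np

def nearest_by_value(array, value):
    # argmin returns a flat index, so read the element through array.flat
    closest = array.flat[np.abs(array - value).argmin()]
    # (row, col) of every element equal to that single closest value
    return np.argwhere(array == closest)

def nearest_by_distance(array, value):
    dist = np.abs(array - value)
    # (row, col) of every element whose distance equals the minimum distance
    return np.argwhere(dist == dist.min())

arr = np.array([[1, 2, 3, 4],
                [2, 3, 4, 0],
                [5, 4, 0, 6]])

print(nearest_by_value(arr, 3.9).tolist())     # [[0, 3], [1, 2], [2, 1]]
print(nearest_by_value(arr, 3.5).tolist())     # [[0, 2], [1, 1]]  (only the 3s)
print(nearest_by_distance(arr, 3.5).tolist())  # [[0, 2], [0, 3], [1, 1], [1, 2], [2, 1]]
```

Both versions touch the array only with vectorized operations, so they should scale to a 3648x5472 array without Python loops.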
This question already has answers here:
Finding indices of matches of one array in another array
(4 answers)
Closed 3 years ago.
I have a numpy array A which contains unique IDs that can be in any order - e.g. A = [1, 3, 2]. I have a second numpy array B, which is a record of when the ID is used - e.g. B = [3, 3, 1, 3, 2, 1, 2, 3, 1, 1, 2, 3, 3, 1]. Array B is always much longer than array A.
For each entry of B, I need the index at which that ID occurs in A. So in the example above my returned result would be: result = [1, 1, 0, 1, 2, 0, 2, 1, 0, 0, 2, 1, 1, 0].
I've already written a simple solution that gets the correct result using a for loop to append the result to a new list and using numpy.where, but I can't figure out the correct syntax to vectorize this.
import numpy as np
A = np.array([1, 3, 2])
B = np.array([3, 3, 1, 3, 2, 1, 2, 3, 1, 1, 2, 3, 3, 1])
IdIndxs = []
for ID in B:
    IdIndxs.append(np.where(A == ID)[0][0])
IdIndxs = np.array(IdIndxs)
Can someone come up with a simple vectorized solution that runs quickly? The for loop becomes very slow on a typical problem, where A has 10K-100K elements and B is usually 5-10x larger than A.
I'm sure the solution is simple, but I just can't see it today.
You can use this:
import numpy as np
# test data
A = np.array([1, 3, 2])
B = np.array([3, 3, 1, 3, 2, 1, 2, 3, 1, 1, 2, 3, 3, 1])
# get indexes
sorted_keys = np.argsort(A)
indexes = sorted_keys[np.searchsorted(A, B, sorter=sorted_keys)]
Output:
[1 1 0 1 2 0 2 1 0 0 2 1 1 0]
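The searchsorted approach assumes every ID in B occurs in A. Otherwise it silently returns the index of a neighbouring ID, or fails with an IndexError when a value is larger than every ID in A. A minimal sketch that guards against this, assuming -1 is an acceptable marker for missing IDs; the function name index_in is hypothetical:

```python
import numpy as np

def index_in(A, B):
    # Permutation that sorts A (A holds unique IDs in arbitrary order)
    order = np.argsort(A)
    # Insertion position of each B value within the sorted view of A
    pos = np.searchsorted(A, B, sorter=order)
    # Values above max(A) would land at len(A); clip them to a valid position
    pos = np.clip(pos, 0, len(A) - 1)
    idx = order[pos]
    # Any B value that is not actually present in A is marked with -1
    idx[A[idx] != B] = -1
    return idx

A = np.array([1, 3, 2])
B = np.array([3, 3, 1, 3, 2, 1, 2, 3, 1, 1, 2, 3, 3, 1])
print(index_in(A, B).tolist())                    # [1, 1, 0, 1, 2, 0, 2, 1, 0, 0, 2, 1, 1, 0]
print(index_in(A, np.array([4, 2, 0])).tolist())  # [-1, 2, -1]
```

The extra check is one fancy-indexing pass over B, so the cost stays dominated by the O(len(B) log len(A)) search.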
The numpy-indexed library (disclaimer: I am its author) was designed to provide these types of vectorized operations where numpy for some reason does not. Frankly, given how useful this vectorized equivalent of list.index is, it definitely ought to be in numpy; but numpy is a slow-moving project that takes backwards compatibility very seriously, and I don't think we will see this until numpy 2.0. Until then, this library is pip and conda installable with the same ease.
import numpy_indexed as npi
idx = npi.indices(A, B)
Reworking your logic using a generator expression and numpy.fromiter, in the hope of boosting performance:
IdIndxs = np.fromiter((np.where(A == i)[0][0] for i in B), dtype=np.intp, count=len(B))
About performance
I've done a quick test comparing fromiter with your solution, and I do not see such a boost in performance. Even with a B array of millions of elements, the timings are of the same order.
This question already has answers here:
Transform a set of numbers in numpy so that each number gets converted into a number of other numbers which are less than it
(4 answers)
Closed 4 years ago.
I would like to sort a numpy array and find out where each element went.
numpy.argsort will tell me for each index in the sorted array, which index in the unsorted array goes there. I'm looking for something like the inverse: For each index in the unsorted array, where does it go in the sorted array.
a = np.array([1, 4, 2, 3])
# a sorted is [1,2,3,4]
# the 1 goes to index 0
# the 4 goes to index 3
# the 2 goes to index 1
# the 3 goes to index 2
# desired output
[0, 3, 1, 2]
# for comparison, argsort output
[0, 2, 3, 1]
A simple solution uses numpy.searchsorted
np.searchsorted(np.sort(a), a)
# produces [0, 3, 1, 2]
I'm unhappy with this solution, because it seems very inefficient. It sorts and searches in two separate steps.
This fancy indexing (np.argsort(a)[np.argsort(a)]) does not invert the permutation in general; for example, it fails for arrays with duplicates. Look at:
a = np.array([1, 4, 2, 3, 5])
print(np.argsort(a)[np.argsort(a)])
print(np.searchsorted(np.sort(a),a))
a = np.array([1, 4, 2, 3, 5, 2])
print(np.argsort(a)[np.argsort(a)])
print(np.searchsorted(np.sort(a),a))
You can just use argsort twice on the list.
At first the fact that this works seems a bit confusing, but if you think about it for a while it starts to make sense.
a = np.array([1, 4, 2, 3])
argSorted = np.argsort(a) # [0, 2, 3, 1]
invArgSorted = np.argsort(argSorted) # [0, 3, 1, 2]
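One way to see why: p = np.argsort(a) is a permutation, and argsort of a permutation is its inverse, because sorting the values 0..n-1 stored in p puts at position j the index i with p[i] == j. That inverse maps each original index to its position in the sorted array. A quick check on the example from the question:

```python
import numpy as np

a = np.array([1, 4, 2, 3])
p = np.argsort(a)   # p[i]: index in a of the i-th smallest element
q = np.argsort(p)   # q[j]: the i with p[i] == j, i.e. the inverse permutation

print(p[q].tolist())           # [0, 1, 2, 3]  (identity)
print(q[p].tolist())           # [0, 1, 2, 3]  (identity)
print(np.sort(a)[q].tolist())  # [1, 4, 2, 3]  (a[j] sits at position q[j] of the sorted array)
```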
You just need to invert the permutation that sorts the array. As shown in the linked question, you can do that like this:
import numpy as np
def sorted_position(array):
    a = np.argsort(array)
    a[a.copy()] = np.arange(len(a))
    return a
print(sorted_position([0.1, 0.2, 0.0, 0.5, 0.8, 0.4, 0.7, 0.3, 0.9, 0.6]))
# [1 2 0 5 8 4 7 3 9 6]
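The scatter in sorted_position costs only O(n) on top of a single sort, whereas calling argsort twice performs two sorts. When the input contains duplicates, it may be worth requesting a stable sort so that equal values are ranked in their original order. A minimal sketch under that assumption, writing into a fresh array instead of copying; the name sorted_position_stable is hypothetical:

```python
import numpy as np

def sorted_position_stable(values):
    # Single stable sort: equal values keep their original relative order
    order = np.argsort(values, kind="stable")
    # Scatter 0..n-1 back to the original indices: ranks[order[i]] = i
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    return ranks

print(sorted_position_stable([0.1, 0.2, 0.0, 0.5, 0.8, 0.4, 0.7, 0.3, 0.9, 0.6]).tolist())
# [1, 2, 0, 5, 8, 4, 7, 3, 9, 6]
print(sorted_position_stable([1, 4, 2, 3, 5, 2]).tolist())
# [0, 4, 1, 3, 5, 2]
```

Unlike np.searchsorted(np.sort(a), a), which gives both 2s the same position 1, this returns a proper permutation.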
This question already has answers here:
Construct two dimensional numpy array from indices and values of a one dimensional array
(3 answers)
Closed 4 years ago.
I am trying to convert a numpy array
np.array([1,3,2])
to
np.array([[1,0,0],[0,0,1],[0,1,0]])
Any idea of how to do this efficiently?
Thanks!
Create a bool array, and then fill it:
import numpy as np
a = np.array([1, 2, 3, 0, 3, 2, 1])
b = np.zeros((len(a), a.max() + 1), bool)
b[np.arange(len(a)), a] = 1
It is also possible to just select the right values from np.eye or the identity matrix:
a = np.array([1,3,2])
b = np.eye(a.max(), dtype=int)[a - 1]
This would probably be the most straightforward.
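The np.eye indexing assumes the values are exactly 1..max(a). If the labels are arbitrary (gaps, negatives, strings), one option is to first map them to 0..k-1 with np.unique(..., return_inverse=True) and then index the identity matrix. A minimal sketch; one_hot is a hypothetical name, and the columns follow the sorted order of the unique labels:

```python
import numpy as np

def one_hot(labels):
    # codes[i] is the position of labels[i] among the sorted unique labels
    uniques, codes = np.unique(labels, return_inverse=True)
    # Row k of the k x k identity matrix is the one-hot vector for code k
    return np.eye(len(uniques), dtype=int)[codes]

print(one_hot(np.array([1, 3, 2])).tolist())
# [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(one_hot(np.array([10, 30, 20, 30])).tolist())
# [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
```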
You can compare to [1, 2, 3] like so:
>>> a = np.array([1,3,2])
>>> np.equal.outer(a, np.arange(1, 4)).view(np.int8)
array([[1, 0, 0],
[0, 0, 1],
[0, 1, 0]], dtype=int8)
or, equivalently but slightly faster:
>>> (a[:, None] == np.arange(1, 4)).view(np.int8)
Try pandas' get_dummies function:
import pandas as pd
import numpy as np
arr = np.array([1 ,3, 2])
df = pd.get_dummies(arr)
If what you need is a numpy array object, do:
arr2 = df.to_numpy(dtype=int)  # dtype=int turns the boolean dummies into 0/1
This question already has an answer here:
Matching an array to a row in Numpy
(1 answer)
Closed 5 years ago.
I want to find the indices of the rows of an array like x = np.array([[1, 1, 1], [2, 2, 2]]) that are equal to y = np.array([1, 1, 1]). So I did this:
In: np.where(x == y)
Out: (array([0, 0, 0]), array([0, 1, 2]))
This is correct element-wise, but I expect to get only index 0, because row 0 of x is equal to y.
You need to use (x == y).all(axis=1) to reduce the comparison result over axis=1 first, i.e. to keep only the rows in which all elements are equal:
np.where((x == y).all(axis=1))[0]
# array([0])
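The same reduction works with np.isclose instead of == when the rows hold floats that may differ by rounding error, and np.flatnonzero returns the matching row indices directly. A small sketch with made-up example data:

```python
import numpy as np

x = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0],
              [1.0 + 1e-12, 1.0, 1.0]])
y = np.array([1.0, 1.0, 1.0])

# Exact comparison: the last row differs from y by 1e-12 and is rejected
exact_rows = np.flatnonzero((x == y).all(axis=1))
# Tolerant comparison (default rtol=1e-05, atol=1e-08) also accepts the last row
close_rows = np.flatnonzero(np.isclose(x, y).all(axis=1))

print(exact_rows.tolist())  # [0]
print(close_rows.tolist())  # [0, 2]
```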