I have a NumPy 2D array that contains duplicate values.
I am searching the array like this:
In [104]: import numpy as np
In [105]: array = np.array
In [106]: a = array([[1, 2, 3],
...: [1, 2, 3],
...: [2, 5, 6],
...: [3, 8, 9],
...: [4, 8, 9],
...: [4, 2, 3],
...: [5, 2, 3]])
In [107]: num_list = [1, 4, 5]
In [108]: for i in num_list:
...:     print(a[np.where(a[:,0] == i)])
...:
[[1 2 3]
[1 2 3]]
[[4 8 9]
[4 2 3]]
[[5 2 3]]
The input is a list of numbers to match against the column 0 values.
The end result I want is the matching rows, in any format (array, list, or tuple), for example:
array([[1, 2, 3],
[1, 2, 3],
[4, 8, 9],
[4, 2, 3],
[5, 2, 3]])
My code works fine but doesn't seem Pythonic. Is there a better search strategy for multiple values,
something like a[np.where(a[:,0] == l)], where a single lookup returns all the matching rows?
My real array is large.
Approach #1 : Using np.in1d -
a[np.in1d(a[:,0], num_list)]
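As a quick check of Approach #1 with the sample data from the question (my own sketch; on newer NumPy versions, np.isin is the documented successor to np.in1d):

import numpy as np

a = np.array([[1, 2, 3], [1, 2, 3], [2, 5, 6], [3, 8, 9],
              [4, 8, 9], [4, 2, 3], [5, 2, 3]])
num_list = [1, 4, 5]

mask = np.in1d(a[:, 0], num_list)   # one boolean mask over all rows, a single lookup
print(a[mask])
# [[1 2 3]
#  [1 2 3]
#  [4 8 9]
#  [4 2 3]
#  [5 2 3]]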
Approach #2 : Using np.searchsorted -
num_arr = np.sort(num_list) # Sort num_list and get as array
# Get indices of occurrences of first column in num_list
idx = np.searchsorted(num_arr, a[:,0])
# Take care of out of bounds cases
idx[idx==len(num_arr)] = 0
out = a[a[:,0] == num_arr[idx]]
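And a minimal run of Approach #2 on the same data, for reference (again my own sketch):

import numpy as np

a = np.array([[1, 2, 3], [1, 2, 3], [2, 5, 6], [3, 8, 9],
              [4, 8, 9], [4, 2, 3], [5, 2, 3]])
num_list = [1, 4, 5]

num_arr = np.sort(num_list)               # sorted search values
idx = np.searchsorted(num_arr, a[:, 0])   # insertion positions of column 0 in num_arr
idx[idx == len(num_arr)] = 0              # clamp out-of-bounds positions
out = a[a[:, 0] == num_arr[idx]]          # keep rows whose column 0 actually matches
print(out)                                # same five rows as with np.in1d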
You can do
a[numpy.in1d(a[:, 0], num_list), :]
Let the 2-dimensional array be as below:
In [1]: a = [[1, 2], [3, 4], [5, 6], [1, 2], [7, 8]]
a = np.array(a)
a, type(a)
Out [1]: (array([[1, 2],
[3, 4],
[5, 6],
[1, 2],
[7, 8]]),
numpy.ndarray)
I have tried to do this procedure:
In [2]: a = a[a != [1, 2]]
a = np.reshape(a, (int(a.size/2), 2))  # needed because the boolean indexing on the line above flattens the result to 1-D ([3, 4, 5, 6, 7, 8]), while the initial array is 2-dimensional
a
Out[2]: array([[3, 4],
[5, 6],
[7, 8]])
My question is, is there any function in NumPy that can directly do that?
Updated Question
Here's the semi-full source code that I've been working on:
import numpy as np
import pandas as pd
from sklearn import datasets
data = datasets.load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = pd.DataFrame(data.target)
bucket = df[df['Target'] == 0]
bucket = bucket.iloc[:,[0,1]].values
lp, rp = leftestRightest(bucket)
bucket = np.array([x for x in bucket if list(x) != lp])
bucket = np.array([x for x in bucket if list(x) != rp])
Notes:
leftestRightest(arg) is a function that returns two one-dimensional NumPy arrays of size 2 (lp and rp). For instance, lp = [1, 3] and rp = [2, 4]; the parameter is a 2-dimensional NumPy array. A mask-based alternative to the two list comprehensions is sketched right after these notes.
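For reference, a hedged sketch of removing the lp and rp rows with a boolean mask instead of two list comprehensions (the bucket, lp, and rp values below are placeholders, since leftestRightest isn't shown):

import numpy as np

bucket = np.array([[1.0, 3.0], [2.0, 4.0], [5.0, 6.0]])
lp, rp = np.array([1.0, 3.0]), np.array([2.0, 4.0])   # placeholder outputs of leftestRightest

# True for rows equal to lp or rp, matching over the whole row
drop = (bucket == lp).all(axis=1) | (bucket == rp).all(axis=1)
bucket = bucket[~drop]
print(bucket)   # [[5. 6.]]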
There may be a more elegant approach, but here is what I have come up with:
np.array([x for x in a if list(x) != [1,2]])
Output
array([[3, 4],
       [5, 6],
       [7, 8]])
Note that I wouldn't recommend list comprehensions for large arrays, since they can be very time-consuming.
Your approach is correct, but the mask needs to be single-dimensional:
a[(a != [1, 2]).all(-1)]
Output:
array([[3, 4],
[5, 6],
[7, 8]])
Alternatively, you can collect the elements and infer the dimension with -1:
a[a != [1, 2]].reshape(-1, 2)
The boolean condition creates a 2D array of True/False values. You have to apply an AND operation across the columns to make sure the match is not a partial one. Consider a row [5, 2] in your array above: the script you wrote would add the 5 and drop the 2 in the resulting 1D array. It can be done as follows:
a[np.all(a != [1, 2],axis=1)]
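To make the partial-match point concrete, a small sketch of my own with a hypothetical [5, 2] row added to the data:

import numpy as np

a = np.array([[1, 2], [3, 4], [5, 2]])

mask = a != [1, 2]
# 2D mask: [[False, False],
#           [ True,  True],
#           [ True, False]]   <- [5, 2] matches [1, 2] only partially

print(a[mask])                    # [3 4 5]  -> flattened, the lone 5 leaks through
print(a[np.all(mask, axis=1)])    # [[3 4]]  -> only fully non-matching rows survive

Worth noting (my own observation): np.all(a != [1, 2], axis=1) also drops partially matching rows such as [5, 2]; if the intent is to drop only rows exactly equal to [1, 2], then a[~(a == [1, 2]).all(axis=1)] keeps them.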
I want to compare two rows in a 2-dimensional NumPy array. The array is as follows:
a = [[1, 3, 5],
[4, 8, 1]]
I want to compare [1, 3, 5] with [4, 8, 1] element-wise and put the greater values into one group.
The result I want is like this:
a1 = [4, 8, 5]
a2 = [1, 3, 1]
How could this be written in Python?
You can use np.sort on axis 0 (column-wise). Reverse the order using [::-1] to get them in descending order
>>> np.sort(a, axis = 0)[::-1]
array([[4, 8, 5],
[1, 3, 1]])
Since you only have 2 rows you can do this:
a = np.array([[1, 3, 5],
[4, 8, 1]])
idx = np.greater(*a)                       # True where row 0 > row 1
a1 = a[(~idx).astype(int), np.arange(3)]   # greater values:  [4, 8, 5]
a2 = a[idx.astype(int), np.arange(3)]      # smaller values:  [1, 3, 1]
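For completeness, a sketch of another option not taken from the answers above: with exactly two rows, np.maximum and np.minimum give the two groups directly.

import numpy as np

a = np.array([[1, 3, 5],
              [4, 8, 1]])

a1 = np.maximum(a[0], a[1])   # element-wise greater values -> [4 8 5]
a2 = np.minimum(a[0], a[1])   # element-wise smaller values -> [1 3 1]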
I have a list of lists of indices for a NumPy array, but I don't quite get the wanted result when using them.
n = 3
a = np.array([[8, 1, 6],
[3, 5, 7],
[4, 9, 2]])
np.random.seed(7)
idx = np.random.choice(np.arange(n), size=(n, n-1))
# array([[0, 1],
# [2, 0],
# [1, 2]])
In this case I want:
element 0 and 1 of row 0
element 2 and 0 of row 1
element 1 and 2 of row 2
My list has n sublists, and all of those sublists have the same length.
I want each sublist to be applied only to its own row, not to every row.
# Wanted result
# b = array([[8, 1],
# [7, 3],
# [9, 2]])
I can achieve this but it seems rather cumbersome with a lot of repeating and reshaping.
# Possibility 1
b = a[:, idx]
# array([[[8, 1], | [[3, 5], | [[4, 9],
# [6, 8], | [7, 3], | [2, 4],
# [1, 6]], | [5, 7]], | [9, 2]])
b = b[np.arange(n), np.arange(n), :]
# Possibility 2
b = a[np.repeat(range(n), n-1), idx.ravel()]
# array([8, 1, 7, 3, 9, 2])
b = b.reshape(n, n-1)
Are there easier ways?
You can use np.take_along_axis here:
np.take_along_axis(a, idx, 1)
array([[8, 1],
[7, 3],
[9, 2]])
Or using broadcasting:
a[np.arange(a.shape[0])[:,None], idx]
array([[8, 1],
[7, 3],
[9, 2]])
Note that you're using integer array indexing here; you need to specify the axis and the rows over which you want to index using idx.
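A self-contained version of both variants with the data from the question, for easy copy-pasting (my own sketch):

import numpy as np

a = np.array([[8, 1, 6],
              [3, 5, 7],
              [4, 9, 2]])
idx = np.array([[0, 1],
                [2, 0],
                [1, 2]])

print(np.take_along_axis(a, idx, axis=1))        # pick idx along axis 1, row by row
print(a[np.arange(a.shape[0])[:, None], idx])    # same result via broadcast row indices
# both print:
# [[8 1]
#  [7 3]
#  [9 2]]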
Suppose I have a numpy array as below
a = np.asarray([[1,2,3],[1,4,3],[2,5,4],[2,7,5]])
array([[1, 2, 3],
[1, 4, 3],
[2, 5, 4],
[2, 7, 5]])
How can I flatten columns 2 and 3 for each unique element in column 1, like below?
array([[1, 2, 3, 4, 3],
[2, 5, 4, 7, 5]])
Thank you for your help.
Another option using list comprehension:
np.array([np.insert(a[a[:,0] == k, 1:].flatten(), 0, k) for k in np.unique(a[:,0])])
# array([[1, 2, 3, 4, 3],
# [2, 5, 4, 7, 5]])
import numpy as np
a = np.asarray([[1,2,3],[1,4,3],[2,5,4],[2,7,5]])
d = {}
for row in a:
d[row[0]] = np.concatenate( (d.get(row[0], []), row[1:]) )
r = np.array([np.concatenate(([key], d[key])) for key in d])
print(r)
This prints:
[[ 1. 2. 3. 4. 3.]
[ 2. 5. 4. 7. 5.]]
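A small tweak of my own (not part of the original answer): seeding the dictionary values with an empty array of the input dtype keeps the output integral instead of float.

import numpy as np

a = np.asarray([[1, 2, 3], [1, 4, 3], [2, 5, 4], [2, 7, 5]])
d = {}
for row in a:
    # start each group with an empty array of a's dtype rather than a Python list
    d[row[0]] = np.concatenate((d.get(row[0], np.array([], dtype=a.dtype)), row[1:]))
r = np.array([np.concatenate(([key], d[key])) for key in d])
print(r)
# [[1 2 3 4 3]
#  [2 5 4 7 5]]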
Since, as posted in the comments, we know that each unique element in column 0 has a fixed number of rows (by which I take it to mean the same number of rows), we can use a vectorized approach. We sort the rows by column 0 and look for shifts along it; a shift signifies a group change and thus gives us the exact number of rows per unique element in column 0. Let's call it L. Finally, we slice the sorted array to select columns 1 and 2 and group L rows together by reshaping. Thus, the implementation would be -
sa = a[a[:,0].argsort()]
L = np.unique(sa[:,0],return_index=True)[1][1]
out = np.column_stack((sa[::L,0],sa[:,1:].reshape(-1,2*L)))
For a further performance boost, we can use np.diff to calculate L, like so -
L = np.where(np.diff(sa[:,0])>0)[0][0]+1
Sample run -
In [103]: a
Out[103]:
array([[1, 2, 3],
[3, 7, 8],
[1, 4, 3],
[2, 5, 4],
[3, 8, 2],
[2, 7, 5]])
In [104]: sa = a[a[:,0].argsort()]
...: L = np.unique(sa[:,0],return_index=True)[1][1]
...: out = np.column_stack((sa[::L,0],sa[:,1:].reshape(-1,2*L)))
...:
In [105]: out
Out[105]:
array([[1, 2, 3, 4, 3],
[2, 5, 4, 7, 5],
[3, 7, 8, 8, 2]])
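As a quick sanity check of the np.diff variant on the same sorted data (my own addition):

L_alt = np.where(np.diff(sa[:,0]) > 0)[0][0] + 1   # first row index where column 0 changes, plus one
print(L_alt == L)                                  # True on this sample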
I am trying to find a way to create a new array from a multidimensional array by taking only the rows whose first-column element is unique. For example, if I have the array
[[1,2,3],
[1,2,3],
[5,2,3]]
After the operation I would like to get this output
[[1,2,3],
[5,2,3]]
Obviously the second and third columns do not need to be unique.
Thanks
Since you are looking to keep the first row for each unique value in the first column, you can use np.unique with its optional return_index argument, which gives you the index of the first occurrence of each unique element of A[:,0] (thus fulfilling the first-row criterion), where A is the input array. Thus, we would have a vectorized solution, like so -
_,idx = np.unique(A[:,0],return_index=True)
out = A[idx]
Sample run -
In [16]: A
Out[16]:
array([[1, 2, 3],
[5, 2, 3],
[1, 4, 3]])
In [17]: _,idx = np.unique(A[:,0],return_index=True)
...: out = A[idx]
...:
In [18]: out
Out[18]:
array([[1, 2, 3],
[5, 2, 3]])
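A side note of my own: np.unique returns the unique values in sorted order, so idx is ordered by value rather than by position. If the original row order should be preserved, sorting the indices first does it:

import numpy as np

A = np.array([[5, 2, 3],
              [1, 2, 3],
              [1, 4, 3]])

_, idx = np.unique(A[:, 0], return_index=True)
out = A[np.sort(idx)]     # first occurrences, kept in their original order
print(out)
# [[5 2 3]
#  [1 2 3]]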
main = [[1, 2, 3], [1, 3, 4], [2, 4, 5], [3, 6, 5]]
used = []
new = [[sub, used.append(sub[0])][0] for sub in main if sub[0] not in used]
print(new)
# Output: [[1, 2, 3], [2, 4, 5], [3, 6, 5]]
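The [sub, used.append(sub[0])][0] expression is only a trick to call used.append inside the comprehension while keeping sub. A more readable equivalent of my own, with the same first-occurrence behaviour:

main = [[1, 2, 3], [1, 3, 4], [2, 4, 5], [3, 6, 5]]

seen = set()
new = []
for sub in main:
    if sub[0] not in seen:   # keep only the first sublist per leading value
        seen.add(sub[0])
        new.append(sub)

print(new)
# [[1, 2, 3], [2, 4, 5], [3, 6, 5]]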