Python: how to delete duplicates in an Nx3 numpy array

I have an Nx3 numpy array, let's say:
a=[[1,1,1],[1,2,3],...,[2,1,3],[2,2,2]]
In my case, the order of the elements within each row (each "sub 3D array") doesn't matter, so I consider rows like these to be duplicates:
[1,2,3] == [2,1,3] == [3,1,2] == ...
I would like to delete these duplicates and so get:
a_new = [[1,1,1],[1,2,3],...,[2,2,2]]
The problem is that I have no idea how to do this job.
Any help is welcome, and thanks in advance :)

Sort each row so that permutations of the same values become identical, then keep the unique rows with np.unique:
import numpy as np
a=np.array([[1,1,1],[1,2,3],[2,1,3],[2,2,2]])
np.unique(np.sort(a, axis=1), axis=0)
array([[1, 1, 1],
[1, 2, 3],
[2, 2, 2]])
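Note that this returns the sorted canonical rows: a row such as [3, 2, 1] comes back as [1, 2, 3], and the rows themselves are reordered. If you would rather keep the original rows in their original order, a minimal sketch is to use return_index, which gives the position of the first occurrence of each unique canonical row. The helper name dedupe_unordered_rows is hypothetical, not part of NumPy.

```python
import numpy as np

def dedupe_unordered_rows(a):
    """Drop rows that are permutations of an earlier row, keeping first occurrences."""
    a = np.asarray(a)
    # Canonical form: sort within each row so permutations compare equal.
    canonical = np.sort(a, axis=1)
    # return_index gives the index of the first occurrence of each unique canonical row.
    _, first_idx = np.unique(canonical, axis=0, return_index=True)
    # Sorting those indices restores the original row order.
    return a[np.sort(first_idx)]

a = np.array([[3, 2, 1], [1, 1, 1], [1, 2, 3], [2, 2, 2], [2, 1, 3]])
print(dedupe_unordered_rows(a))  # rows [3, 2, 1], [1, 1, 1], [2, 2, 2]
```

The rows are still compared as unordered triples, but each surviving row keeps its original element order.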

Related

Remove first occurrence of elements in a numpy array

I have a numpy array:
a = np.array([-1,2,3,-1,5,-2,2,9])
I want to keep only the values that occur more than once (at least twice), so the result should be:
a = np.array([-1,2,-1,2])
Is there a way to do this using only NumPy?
I have a solution using a dictionary and dictionary filtering, but it is rather slow, and I was wondering whether there is a faster, NumPy-only solution.
Thanks!
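import numpy as np
a = np.array([-1, 2, 3, -1, 5, -2, 2, 9])
# Distinct values and how often each one occurs
values, counts = np.unique(a, return_counts=True)
# Values that occur at least twice
values_filtered = values[counts >= 2]
# Keep every element of a whose value is in values_filtered, in original order
result = a[np.isin(a, values_filtered)]
print(result)  # [-1  2 -1  2]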
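A variant of the same np.unique idea skips the np.isin lookup: with return_inverse, each element gets the index of its value in the unique array, so indexing counts with that inverse gives a per-element occurrence count directly. This is a sketch, not a benchmarked claim that it is faster.

```python
import numpy as np

a = np.array([-1, 2, 3, -1, 5, -2, 2, 9])

# inverse[i] is the position of a[i] in the sorted unique values,
# so counts[inverse] is how often the value a[i] occurs in a.
_, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)

# Keep every occurrence of values that appear at least twice, in original order.
result = a[counts[inverse] >= 2]
print(result)  # [-1  2 -1  2]
```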
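import numpy as np
arr = np.array([1, 2, 3, 4, 4, 4, 1])
# Boolean mask: True where the element occurs more than once (O(n^2) comparisons)
filter_arr = [np.count_nonzero(arr == i) > 1 for i in arr]
newarr = arr[filter_arr]
print(newarr)             # [1 4 4 4 1]: every occurrence of a repeated value
print(np.unique(newarr))  # [1 4]: the distinct repeated values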
Thanks a lot!
All answers solved the problem, but the solution from Matvey_coder3 was the fastest.
KR

How to calculate the difference between the rows of a numpy array with shape[0] > 2

Please let me know if this is a duplicate question.
From
in_arr1 = np.array([[2,0], [-6,0], [3,0]])
How can I get:
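diffInElements = [[5,0]]?
(that is, 2 - (-6) - 3 = 5 in the first column)
I tried np.diff(in_arr1, axis=0), but it returns the differences between consecutive rows, [[-8, 0], [9, 0]], which is not what I want.
Is there a NumPy function I can use?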
Cheers,
You can negate all rows except the first, sum them, and then add the first row (here a is in_arr1):
diff = (-a[1:]).sum(axis=0) + a[0]
Output:
>>> diff
array([5, 0])
You want to subtract the remaining rows from the first row. The straightforward answer does just that:
>>> arr = np.array([[2, 1], [-6, 3], [3, -4]])
>>> arr[0, :] - arr[1:, :].sum(0)
array([5, 2])
There is also, however, a more advanced option making use of the somewhat obscure reduce() method of numpy ufuncs:
>>> np.subtract.reduce(arr, axis=0)
array([5, 2])
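To make the behaviour of reduce() concrete on the question's own array: np.subtract.reduce folds left along the axis (row0 - row1 - row2 - ...), and np.subtract.accumulate shows the running result after each step. Passing keepdims=True to reduce keeps the reduced axis, which yields the 2-D shape [[5, 0]] the question asks for. A small sketch:

```python
import numpy as np

in_arr1 = np.array([[2, 0], [-6, 0], [3, 0]])

# Left fold along axis 0: ((row0 - row1) - row2)
via_reduce = np.subtract.reduce(in_arr1, axis=0)

# Equivalent explicit form: first row minus the sum of the remaining rows.
via_sum = in_arr1[0] - in_arr1[1:].sum(axis=0)

# Running result after each subtraction: [2, 0], then [8, 0], then [5, 0].
steps = np.subtract.accumulate(in_arr1, axis=0)

# keepdims=True keeps the reduced axis, giving shape (1, 2).
as_2d = np.subtract.reduce(in_arr1, axis=0, keepdims=True)

print(via_reduce)  # [5 0]
print(via_sum)     # [5 0]
print(as_2d)       # [[5 0]]
```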

Concatenate 2d list in python

I'm trying to create a 2d list with shape [n,784] (the same shape as the MNIST image batches) from multiple [1,784] lists.
mylist.append(element) doesn't give me what I'm looking for, where mylist is the 2d [n,784] list and element is a [1,784] list: the result has shape [n,1,784].
I've also tried mylist[index].append(element), and I got a [784] 1d list instead.
Any idea how to solve my problem?
Thanks a lot
import numpy as np
myarray = np.array(mylist)
newarray = np.concatenate((myarray, element))
And if you want to turn it back into a list:
newlist = newarray.tolist()
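If you want to stay with plain lists, a minimal sketch of the list-only fix: since each element is a list containing one row, list.extend adds that inner row, whereas append adds the whole [1, n] wrapper and produces the extra dimension. The sample data and n_features = 4 (standing in for 784) are assumptions made to keep the example small.

```python
import numpy as np

n_features = 4  # hypothetical stand-in for 784

# Hypothetical data: each element is a [1, n_features] list (one row wrapped in a list).
elements = [[[float(i)] * n_features] for i in range(3)]

mylist = []
for element in elements:
    # extend() adds the inner row of element, so mylist stays 2-D;
    # append() would add the [1, n] wrapper and give shape [n, 1, n_features].
    mylist.extend(element)

arr = np.array(mylist)
print(arr.shape)  # (3, 4)
```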
a = [[1,1],[2,2]]
b = np.concatenate([a, a], axis=1).tolist()
The output will be:
[[1, 1, 1, 1], [2, 2, 2, 2]]

Appending a new row to a numpy array

I am trying to append a new row to an existing numpy array in a loop. I have tried append, concatenate and vstack, but none of them gives me the result I want.
I have tried the following:
for _ in col_change:
if (item + 2 < len(col_change)):
arr=[col_change[item], col_change[item + 1], col_change[item + 2]]
array=np.concatenate((array,arr),axis=0)
item+=1
I have also tried it in the most basic form, and it still gives me an empty array:
array=np.array([])
newrow = [1, 2, 3]
newrow1 = [4, 5, 6]
np.concatenate((array,newrow), axis=0)
np.concatenate((array,newrow1), axis=0)
print(array)
I want the output to be [[1,2,3],[4,5,6],...]
The usual way to build an array incrementally is not to start with an array at all, but with a list:
alist = []
alist.append([1, 2, 3])
alist.append([4, 5, 6])
arr = np.array(alist)
This is essentially the same as
arr = np.array([ [1,2,3], [4,5,6] ])
the most common way of making a small (or large) sample array.
Even if you have good reason to use some version of concatenate (hstack, vstack, etc.), it is better to collect the components in a list and perform the concatenate once: each call to concatenate copies the entire array, so concatenating inside a loop gets slower as the array grows.
If you want [[1,2,3],[4,5,6]], here is an alternative without append: create the values with np.arange and then reshape them:
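Applied to the loop in the question, and assuming it is meant to collect every run of three consecutive values of col_change (the sample values below are hypothetical), the list-then-convert pattern looks roughly like this. NumPy 1.20+ also provides sliding_window_view, which produces the same windows without a Python loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

col_change = [10, 20, 30, 40, 50]  # hypothetical sample data

# Collect each row as a plain list inside the loop (list.append is cheap)...
rows = []
for i in range(len(col_change) - 2):
    rows.append([col_change[i], col_change[i + 1], col_change[i + 2]])

# ...and convert to a 2-D array once at the end.
arr = np.array(rows)
print(arr)

# Loop-free alternative: a read-only view of every length-3 window.
windows = sliding_window_view(np.asarray(col_change), 3)
print((windows == arr).all())  # True
```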
>>> import numpy as np
>>> np.arange(1,7).reshape(2, 3)
array([[1, 2, 3],
[4, 5, 6]])
Or preallocate an array of the final size and fill it in manually (or in a loop):
>>> array = np.empty((2, 3), int)
>>> array[0] = [1,2,3]
>>> array[1] = [4,5,6]
>>> array
array([[1, 2, 3],
[4, 5, 6]])
A note on your examples:
In the second one you discard the result: np.concatenate returns a new array instead of modifying array in place, so write array = np.concatenate((array, newrow), axis=0) (and likewise for newrow1). Then the array is no longer empty, although you get a flat 1-D array [1. 2. 3. 4. 5. 6.] rather than the 2-D array you want. The first example seems badly indented, and without knowing the variables it is hard to debug.
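The flat result comes from starting with np.array([]), which is 1-D. A sketch of the fixed version of the second example, assuming you want to keep the concatenate-in-a-loop structure: start from a 2-D array with zero rows and three columns, wrap each new row in a list so it has shape (1, 3), and reassign the result every time. It still copies the array on each iteration, so the list-based approach above scales better.

```python
import numpy as np

# 2-D array with zero rows but the right number of columns.
array = np.empty((0, 3), dtype=int)

for newrow in ([1, 2, 3], [4, 5, 6]):
    # concatenate returns a new array, so the result must be reassigned;
    # [newrow] has shape (1, 3), so it is stacked as a new row along axis 0.
    array = np.concatenate((array, [newrow]), axis=0)

print(array)
```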

Remove duplicates for 2d array Python

I have a 2d array. When I add more values to it, some of them are duplicates. How can I remove these? My array, named a, looks like this:
[[u'82', Button], [u'67', Button], [u'23', Button], [u'19', Button], [u'23', Button]]
I have tried
import numpy as np
def unique(a):
    order = np.lexsort(a.T)
    a = a[order]
    diff = np.diff(a, axis=0)
    ui = np.ones(len(a), 'bool')
    ui[1:] = (diff != 0).any(axis=1)
    return a[ui]
And
[list(t) for t in set(tuple(element) for element in a)]
And
from pandas import *
import numpy as np
a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
DataFrame(a).drop_duplicates().values
But none of them work. How can I delete duplicates from the 2d array?
The problem is that you are trying to do everything in one step. Break it up (EdChum was almost there):
import pandas as pd
df = pd.DataFrame(data=a)
# subset takes column labels of df, e.g. [0] to compare only the first column
df = df.drop_duplicates(subset=[0])
a = df.values.tolist()
Per EdChum's comment, this only works once the DataFrame (df) has been created; subset refers to columns of df, so it must be given column labels (such as [0]) rather than the original data.
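Because the rows hold GUI objects rather than numbers, the NumPy approaches in the question do not really fit. A plain-Python sketch, assuming the first column (the u'23'-style string) is what identifies a duplicate, keeps the first row for each key and preserves order. Button here is a hypothetical stand-in class, and dedupe_by_key is a hypothetical helper name.

```python
class Button:
    """Hypothetical stand-in for the Button objects in the question."""

def dedupe_by_key(rows):
    """Keep the first row for each key (assumed to be row[0]), preserving order."""
    seen = set()
    result = []
    for row in rows:
        key = row[0]  # assumption: the first column identifies a row
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

b = Button()
a = [[u'82', b], [u'67', b], [u'23', b], [u'19', b], [u'23', b]]
print([row[0] for row in dedupe_by_key(a)])  # ['82', '67', '23', '19']
```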
