Remove duplicates for 2d array Python

I have a 2D array. When I add more values to the array, some of them are duplicates. How can I remove these? My array, named a, looks like this:
[[u'82', Button], [u'67', Button], [u'23', Button], [u'19', Button], [u'23', Button]]
I have tried
import numpy as np
def unique(a):
    order = np.lexsort(a.T)
    a = a[order]
    diff = np.diff(a, axis=0)
    ui = np.ones(len(a), 'bool')
    ui[1:] = (diff != 0).any(axis=1)
    return a[ui]
And
[list(t) for t in set(tuple(element) for element in a)]
And
from pandas import *
import numpy as np
a = np.array([[1, 1], [2, 3], [1, 1], [5, 4], [2, 3]])
DataFrame(a).drop_duplicates().values
But none of them work. How can I delete duplicates from the 2D array?

The problem is that you are trying to do everything in one step. You need to break it up (EdChum was almost there):
import pandas as pd
df = pd.DataFrame(data=a)
# subset takes column labels; column 0 holds the string values
df = df.drop_duplicates(subset=[0])
Per EdChum's comment, this will not work unless the DataFrame (df) has been created first; otherwise there is nothing to reference as the subset.
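
A plain-Python alternative, offered as a sketch rather than a definitive fix: one possible reason the attempts in the question fail is that every Button is a distinct object, so two rows with the same string never compare equal as whole rows. The question does not confirm this, so treat it as an assumption. Under that assumption, deduplicating on the first element alone keeps the first row seen for each string. The helper name dedupe_by_first and the "btn_*" placeholder strings standing in for Button objects are hypothetical.

```python
def dedupe_by_first(rows):
    """Keep the first row seen for each distinct first element, preserving order."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = row[0]  # compare only the string, not the Button object
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Placeholder strings stand in for the Button objects from the question.
a = [["82", "btn_a"], ["67", "btn_b"], ["23", "btn_c"], ["19", "btn_d"], ["23", "btn_e"]]
result = dedupe_by_first(a)
print(result)
```

If the last occurrence should win instead, the same loop can overwrite entries in a dict keyed on row[0].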

Related

Lists into Array (Python)

I have 3 lists and I need them to be 1 np.array() with 3 rows. The append method has not been working because it creates 3 separate arrays.
A = array([[1, 4, 1],
           [4, 1, 9],
           [1, 9, 1]])
[0.01665703 0.06662812 0.01665703]
[0.00049017 0.00012254 0.00110289]
[0.00012333 0.00110994 0.00012333]
ideal output (dtype: numpy.ndarray):
[[0.01665703 0.06662812 0.01665703]
[0.00049017 0.00012254 0.00110289]
[0.00012333 0.00110994 0.00012333]]
Attempted Code:
em = []
for list in A:
    result = list / np.exp(list).sum(axis=0)
    em.append(result)
Attempted code's output:
[array([0.01665703, 0.06662812, 0.01665703]),
array([0.00049017, 0.00012254, 0.00110289]),
array([0.00012333, 0.00110994, 0.00012333])]

import numpy as np
list_1 = [0.01665703, 0.06662812, 0.01665703]
list_2 = [0.00049017, 0.00012254, 0.00110289]
list_3 = [0.00012333, 0.00110994, 0.00012333]
combined = np.array([
    list_1,
    list_2,
    list_3
])

Is this what you want?
This gives you the ideal output you specified.
Probably not the best way to do it, but it works:
import numpy as np
myList_1 = [0.01665703, 0.06662812, 0.01665703]
myList_2 = [0.00049017, 0.00012254, 0.00110289]
myList_3 = [0.00012333, 0.00110994, 0.00012333]
print(np.array([myList_1, myList_2, myList_3]))
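
Applied to the loop from the question, the same idea might look like the sketch below: collect the per-row results in a list, then convert the list once. The names rows, stacked and vectorized are illustrative, and the keepdims variant is an alternative that does not appear in the answers above.

```python
import numpy as np

A = np.array([[1, 4, 1],
              [4, 1, 9],
              [1, 9, 1]])

# Collect the per-row results in a plain list, as in the question...
rows = []
for row in A:
    rows.append(row / np.exp(row).sum())

# ...then convert once: a list of equal-length 1-D arrays becomes a 2-D array.
stacked = np.array(rows)

# Loop-free alternative: divide each row by the sum of its own exponentials.
vectorized = A / np.exp(A).sum(axis=1, keepdims=True)

print(type(stacked), stacked.shape)
print(stacked)
```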

Python: Counting Zeros in multiple array columns and store them efficiently

I create an array:
import numpy as np
arr = [[0, 2, 3], [0, 1, 0], [0, 0, 1]]
arr = np.array(arr)
Now I count every zero per column and store it in a variable:
a = np.count_nonzero(arr[:,0]==0)
b = np.count_nonzero(arr[:,1]==0)
c = np.count_nonzero(arr[:,2]==0)
This code works fine. But in my case I have many more columns, with over 70000 values in each. That would mean many more lines of code and a very messy variable explorer in Spyder.
My questions:
Is there a possibility to make this code more efficient and save the values only in one type of data, e.g. a dictionary, dataframe or tuple?
Can I use a loop for creating the dic, dataframe or tuple?
Thank you

You can construct a boolean array arr == 0 and then sum it along axis 0, which gives the number of zeros in each column.
>>> (arr == 0).sum(0)
array([3, 1, 1])

Use an ordered dict from the collections module:
from collections import OrderedDict
import numpy as np
from pprint import pprint as pp
import string
arr = np.array([[0, 2, 3], [0, 1, 0], [0, 0, 1]])
letters = string.ascii_letters
od = OrderedDict()
# iterate over columns (arr.shape[1]), not rows, so non-square arrays work too
for i in range(arr.shape[1]):
    od[letters[i]] = np.count_nonzero(arr[:, i]==0)
pp(od)
Returning:
OrderedDict([('a', 3), ('b', 1), ('c', 1)])
Example usage:
print(f"First number of zeros: {od.get('a')}")
Will give you:
First number of zeros: 3

To count zeros, you can count the non-zeros along each column and subtract the result from the length of each column:
arr.shape[0] - np.count_nonzero(arr, axis=0)
produces [3,1,1].
This solution is very fast because no extra large objects are created.
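
To keep all counts in a single object, as the question asks, the vectorized count can be turned into a dictionary in one step. This is a minimal sketch; the "col_0"-style keys and the name zeros_by_column are assumptions chosen for illustration.

```python
import numpy as np

arr = np.array([[0, 2, 3], [0, 1, 0], [0, 0, 1]])

# One vectorized pass: number of zeros in each column.
zero_counts = (arr == 0).sum(axis=0)

# Store the counts in one dictionary instead of one variable per column.
zeros_by_column = {f"col_{i}": int(n) for i, n in enumerate(zero_counts)}
print(zeros_by_column)
```

Because the keys are generated from the column index, this is not limited to the 52 names available in string.ascii_letters.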

Python: how to delete duplicates in Nx3 numpy array

I have an Nx3 numpy array, let's say:
a=[[1,1,1],[1,2,3],...,[2,1,3],[2,2,2]]
In my case, I don't care about the order of the elements within each 3-element row, so I consider rows that are permutations of each other to be duplicates:
[1,2,3] == [2,1,3] == [3,1,2] = ...
I would like to delete these duplicates and so get:
a_new = [[1,1,1],[1,2,3],...,[2,2,2]]
The problem is that I have no idea how to do this.
Any help is welcome, and thanks in advance :)

Use sort and unique:
import numpy as np
a=np.array([[1,1,1],[1,2,3],[2,1,3],[2,2,2]])
np.unique(np.sort(a, axis=1), axis=0)
array([[1, 1, 1],
[1, 2, 3],
[2, 2, 2]])
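
Note that the result above contains the sorted rows, not the rows as they appeared in a. If the original rows should be kept, one possible variation (a sketch, not part of the answer) uses return_index to find the first occurrence of each sorted row. It assumes a NumPy version in which np.unique accepts the axis argument; the names first_idx and a_new are illustrative.

```python
import numpy as np

a = np.array([[1, 1, 1], [1, 2, 3], [2, 1, 3], [2, 2, 2]])

# Sort within each row so that permutations become identical rows,
# then ask np.unique for the index of each first occurrence.
_, first_idx = np.unique(np.sort(a, axis=1), axis=0, return_index=True)

# Index the original array, keeping rows in their original order.
a_new = a[np.sort(first_idx)]
print(a_new)
```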

Appending a new row to a numpy array

I am trying to append a new row to an existing numpy array in a loop. I have tried append, concatenate and vstack, but none of them gives me the result I want.
I have tried the following:
for _ in col_change:
if (item + 2 < len(col_change)):
arr=[col_change[item], col_change[item + 1], col_change[item + 2]]
array=np.concatenate((array,arr),axis=0)
item+=1
I have also tried it in the most basic format and it still gives me an empty array.
array=np.array([])
newrow = [1, 2, 3]
newrow1 = [4, 5, 6]
np.concatenate((array,newrow), axis=0)
np.concatenate((array,newrow1), axis=0)
print(array)
I want the output to be [[1,2,3][4,5,6]...]

The correct way to build an array incrementally is to not start with an array:
alist = []
alist.append([1, 2, 3])
alist.append([4, 5, 6])
arr = np.array(alist)
This is essentially the same as
arr = np.array([ [1,2,3], [4,5,6] ])
the most common way of making a small (or large) sample array.
Even if you have good reason to use some version of concatenate (hstack, vstack, etc.), it is better to collect the components in a list and perform the concatenation once.
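
Applied to the loop in the question, which appears to build rows of three consecutive items, the list-first approach might look like this sketch. The contents of col_change are hypothetical, since the question does not show them, and the name windows is illustrative.

```python
import numpy as np

# Hypothetical data; the question does not show what col_change contains.
col_change = [1, 2, 3, 4, 5]

# Collect each run of three consecutive items in a plain Python list...
windows = []
for item in range(len(col_change) - 2):
    windows.append([col_change[item], col_change[item + 1], col_change[item + 2]])

# ...and build the 2-D array once, after the loop.
array = np.array(windows)
print(array)
```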

If you want [[1,2,3],[4,5,6]], here is an alternative without append: use np.arange and then reshape the result:
>>> import numpy as np
>>> np.arange(1,7).reshape(2, 3)
array([[1, 2, 3],
[4, 5, 6]])
Or create a big array and fill it manually (or in a loop):
>>> array = np.empty((2, 3), int)
>>> array[0] = [1,2,3]
>>> array[1] = [4,5,6]
>>> array
array([[1, 2, 3],
[4, 5, 6]])
A note on your examples:
In the second one you forgot to save the result: np.concatenate returns a new array instead of modifying array in place. Make it array = np.concatenate((array,newrow1), axis=0) and it works (not exactly like you want, since the result is a flat 1-D array rather than a 2-D one, but the array is not empty anymore). The first example seems badly indented, and without knowing the variables and/or the problem there, it's hard to debug.

How to replace one column by a value in a numpy array?

I have an array like this
import numpy as np
a = np.zeros((2,2), dtype=int)  # np.int was removed in NumPy 1.24; use the builtin int
I want to replace the first column by the value 1. I did the following:
a[:][0] = [1, 1] # not working
a[:][0] = [[1], [1]] # not working
By contrast, replacing a row worked:
a[0][:] = [1, 1] # working
I have a big array, so I cannot replace value by value.

You can replace the first column as follows:
>>> a = np.zeros((2,2), dtype=int)
>>> a[:, 0] = 1
>>> a
array([[1, 0],
       [1, 0]])
Here a[:, 0] means "select all rows from column 0". The value 1 is broadcast across this selected column, producing the desired array (it's not necessary to use a list [1, 1], although you can).
Your syntax a[:][0] means "select all the rows from the array a and then select the first row". Similarly, a[0][:] means "select the first row of a and then select this entire row again". This is why you could replace the rows successfully, but not the columns - it's necessary to make a selection for axis 1, not just axis 0.
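
The difference between the two spellings can be checked directly. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.zeros((2, 2), dtype=int)
# a[:] is a view of the whole array, so a[:][0] is just row 0.
a[:][0] = [1, 1]
print(a)  # the first ROW was replaced

b = np.zeros((2, 2), dtype=int)
# A single index expression with a comma selects along both axes:
# all rows, column 0.
b[:, 0] = 1
print(b)  # the first COLUMN was replaced
```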

You can do something like this:
import numpy as np
a = np.zeros((2,2), dtype=int)
a[:,0] = np.ones((1,2), dtype=int)
Please refer to Accessing np matrix columns

Select the intended column using a proper indexing and just assign the value to it using =. Numpy will take care of the rest for you.
>>> a[::,0] = 1
>>> a
array([[1, 0],
[1, 0]])
Read more about numpy indexing.
