numpy.unique gives non-unique output? - python

I am trying to get the indices of unique elements of a numpy array (long vector of 3628621 elements).
However, I must be doing something wrong, because when I try to select the unique elements I still find duplicates:
Vector
Out[165]: array([712450, 714390, 718560, ..., 384390, 992041, 94852])
Loc = np.where(np.unique(Vector)) # Find indices of unique elements
Vector_New = Vector[Loc] # Create new vector with all unique elements
np.where(Vector_New == 173020) # See how often/where '173020' exists
Out[166]: (array([ 7098, 11581], dtype=int64),)
So the integer 173020 still appears twice in the new vector, although I expected all elements to be unique. The new vector is 11594 elements long.
Thanks for the help!
Regards,
Timen

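np.unique has several optional parameters that return the information you need. Its call signature is:
np.unique(ar, return_index=False, return_inverse=False, return_counts=False)
With return_index=True it also returns the index of the first occurrence of each unique value, and with return_counts=True the number of times each value occurs. See the documentation for details.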
In [50]: keys
Out[50]:
array([1, 3, 5, 2, 0, 7, 4, 7, 7, 2, 7, 5, 5, 3, 6, 2, 3, 5, 5, 5, 6, 9, 6,
5, 2, 1, 6, 6, 5, 9, 9, 6, 5, 5, 9, 9, 6, 3, 7, 0, 5, 1, 7, 6, 2, 4,
1, 0, 6, 5, 4, 8, 8, 4, 2, 1, 8, 3, 1, 9, 8, 4, 4, 2, 4, 7, 2, 6, 8,
6, 5, 2, 4, 9, 1, 5, 3, 1, 5, 6, 2, 2, 8, 4, 0, 4, 9, 0, 8, 1, 5, 3,
1, 3, 7, 1, 5, 8, 5, 8])
In [51]: np.unique(keys, return_counts=True, return_index=True)
Out[51]:
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([ 4, 0, 3, 1, 6, 2, 14, 5, 51, 21], dtype=int32),
array([ 5, 11, 11, 8, 10, 18, 12, 8, 9, 8]))
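Applied to the question, a minimal sketch could look like the following. The short `vector` here is a made-up stand-in for the 3628621-element array. As far as I can tell, the original attempt fails because `np.where(np.unique(Vector))` returns the positions of the nonzero entries inside the array of unique values, not positions in `Vector`, so `Vector[Loc]` just picks roughly the first 11594 elements of the original vector, duplicates included.

```python
import numpy as np

# Hypothetical small stand-in for the question's large Vector.
vector = np.array([712450, 173020, 718560, 173020, 94852, 712450])

# np.unique returns the sorted unique values; with return_index=True it also
# returns the index of the first occurrence of each value in the input.
values, first_idx = np.unique(vector, return_index=True)

# Sorting those indices keeps the unique elements in their original order.
unique_in_order = vector[np.sort(first_idx)]

print(values)           # [ 94852 173020 712450 718560]
print(first_idx)        # [4 1 0 2]
print(unique_in_order)  # [712450 173020 718560  94852]
```

If only the unique values are needed and their order does not matter, `np.unique(vector)` on its own is enough.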

Related

Python: How to efficiently create all possible 2 element swaps of an array?

I try to generate all possible 2-element swaps of a given array.
For example:
candidate = [ 5, 9, 1, 8, 3, 7, 10, 6, 4, 2]
result = [[ 9, 5, 1, 8, 3, 7, 10, 6, 4, 2]
[ 1, 9, 5, 8, 3, 7, 10, 6, 4, 2]
[ 8, 9, 1, 5, 3, 7, 10, 6, 4, 2]
[ 3, 9, 1, 8, 5, 7, 10, 6, 4, 2]
[ 7, 9, 1, 8, 3, 5, 10, 6, 4, 2]
[10, 9, 1, 8, 3, 7, 5, 6, 4, 2]
[ 6, 9, 1, 8, 3, 7, 10, 5, 4, 2]
[ 4, 9, 1, 8, 3, 7, 10, 6, 5, 2]
[ 2, 9, 1, 8, 3, 7, 10, 6, 4, 5]
[ 5, 1, 9, 8, 3, 7, 10, 6, 4, 2]
[ 5, 8, 1, 9, 3, 7, 10, 6, 4, 2]
[ 5, 3, 1, 8, 9, 7, 10, 6, 4, 2]
[ 5, 7, 1, 8, 3, 9, 10, 6, 4, 2]
[ 5, 10, 1, 8, 3, 7, 9, 6, 4, 2]
[ 5, 6, 1, 8, 3, 7, 10, 9, 4, 2]
[ 5, 4, 1, 8, 3, 7, 10, 6, 9, 2]
[ 5, 2, 1, 8, 3, 7, 10, 6, 4, 9]
[ 5, 9, 8, 1, 3, 7, 10, 6, 4, 2]
[ 5, 9, 3, 8, 1, 7, 10, 6, 4, 2]
[ 5, 9, 7, 8, 3, 1, 10, 6, 4, 2]
[ 5, 9, 10, 8, 3, 7, 1, 6, 4, 2]
[ 5, 9, 6, 8, 3, 7, 10, 1, 4, 2]
[ 5, 9, 4, 8, 3, 7, 10, 6, 1, 2]
[ 5, 9, 2, 8, 3, 7, 10, 6, 4, 1]
[ 5, 9, 1, 3, 8, 7, 10, 6, 4, 2]
[ 5, 9, 1, 7, 3, 8, 10, 6, 4, 2]
[ 5, 9, 1, 10, 3, 7, 8, 6, 4, 2]
[ 5, 9, 1, 6, 3, 7, 10, 8, 4, 2]
[ 5, 9, 1, 4, 3, 7, 10, 6, 8, 2]
[ 5, 9, 1, 2, 3, 7, 10, 6, 4, 8]
[ 5, 9, 1, 8, 7, 3, 10, 6, 4, 2]
[ 5, 9, 1, 8, 10, 7, 3, 6, 4, 2]
[ 5, 9, 1, 8, 6, 7, 10, 3, 4, 2]
[ 5, 9, 1, 8, 4, 7, 10, 6, 3, 2]
[ 5, 9, 1, 8, 2, 7, 10, 6, 4, 3]
[ 5, 9, 1, 8, 3, 10, 7, 6, 4, 2]
[ 5, 9, 1, 8, 3, 6, 10, 7, 4, 2]
[ 5, 9, 1, 8, 3, 4, 10, 6, 7, 2]
[ 5, 9, 1, 8, 3, 2, 10, 6, 4, 7]
[ 5, 9, 1, 8, 3, 7, 6, 10, 4, 2]
[ 5, 9, 1, 8, 3, 7, 4, 6, 10, 2]
[ 5, 9, 1, 8, 3, 7, 2, 6, 4, 10]
[ 5, 9, 1, 8, 3, 7, 10, 4, 6, 2]
[ 5, 9, 1, 8, 3, 7, 10, 2, 4, 6]
[ 5, 9, 1, 8, 3, 7, 10, 6, 2, 4]]
I currently achieve this by using two nested for loops:
neighborhood = []
for node1 in range(candidate.size - 1):
    for node2 in range(node1 + 1, candidate.size):
        neighbor = np.copy(candidate)
        neighbor[node1] = candidate[node2]
        neighbor[node2] = candidate[node1]
        neighborhood.append(neighbor)
The larger the array gets, the more inefficient and slower it becomes. Is there a more efficient way here that can also process arrays with three-digit contents?
Thank you!
You can use a generator if you need to use those arrays one by one (this way you don't have to keep them all in memory, so you need very little space):
from itertools import combinations
def gen(lst):
    for i, j in combinations(range(len(lst)), 2):
        # rebuild the list with the elements at positions i and j exchanged
        yield lst[:i] + [lst[j]] + lst[i + 1:j] + [lst[i]] + lst[j + 1:]
And then you can use it in this way:
for lst in gen(candidate):
    # do something with your list with two swapped elements
    pass
This saves a lot of memory, but it will probably still be slow overall.
Here is a solution using NumPy. This is not space efficient (because it stores all possible lists with swapped elements in memory), but it might be much faster because of NumPy optimizations. Give it a try!
import numpy as np
from itertools import combinations
from math import comb
# one copy of candidate per pair of positions
arr = np.tile(candidate, (comb(len(candidate), 2), 1))
# all index pairs (i, j) with i < j, shape (n_pairs, 2)
indices = np.array(list(combinations(range(len(candidate)), 2)))
rows = np.arange(arr.shape[0])[:, None]
arr[rows, indices] = arr[rows, np.flip(indices, axis=-1)]
Example (with candidate = [0, 1, 2, 3]):
>>> arr
array([[1, 0, 2, 3],
[2, 1, 0, 3],
[3, 1, 2, 0],
[0, 2, 1, 3],
[0, 3, 2, 1],
[0, 1, 3, 2]])
Note that math.comb (which gives you the total number of possible lists with 2 swapped elements) is only available in Python 3.8 or later. Please have a look at this question to know how to replace math.comb in case you're using an older Python version.
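As a possible variant that avoids both `math.comb` and `itertools`, the index pairs can come from `np.triu_indices`. This is only a sketch under the assumption that `candidate` is a 1-D NumPy array; the names `swaps` and `rows` are made up for illustration.

```python
import numpy as np

candidate = np.array([0, 1, 2, 3])
n = candidate.size

# The strict upper triangle of an n x n matrix enumerates every pair i < j,
# in the same order as itertools.combinations(range(n), 2).
i, j = np.triu_indices(n, k=1)

# One copy of candidate per pair; i.size equals n * (n - 1) // 2.
swaps = np.tile(candidate, (i.size, 1))
rows = np.arange(i.size)

# Exchange the two selected positions in each row.
swaps[rows, i] = candidate[j]
swaps[rows, j] = candidate[i]

print(swaps)
```

For `candidate = [0, 1, 2, 3]` this gives the same six rows as the example above. It still stores every swapped copy in memory, so the space trade-off is unchanged.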
To swap only two items in any given list, I'd recommend using range with itertools.combinations. It is probably good to use a generator with the yield statement too, though if you are getting all results at once, it probably doesn't matter much.
from itertools import combinations
def swap2(l):
    for pair in combinations(range(len(l)), 2):
        l2 = l[:]
        l2[pair[0]], l2[pair[1]] = l2[pair[1]], l2[pair[0]]
        yield l2
if __name__ == "__main__":
    candidate = [5, 9, 1, 8, 3, 7, 10, 6, 4, 2]
    result = [l for l in swap2(candidate)]

How to duplicate a NumPy array to form a new array with several rows of the original array? [duplicate]

I want to create a NumPy array by duplicating another array by a few rows. I did it as shown below. Is there a NumPyier way of doing this?
>>> a = np.arange(0,10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = tuple( a for _ in range(3) )
>>> b
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
>>> c = np.vstack( b )
>>> c
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
I found a way to do it. Sharing it here.
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[None,:]
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> np.repeat( a[None,:], 3, axis=0 )
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
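Two other options that might fit here are `np.tile` and `np.broadcast_to`. A short sketch (the names `c_tile` and `c_view` are just illustrative):

```python
import numpy as np

a = np.arange(10)

# np.tile stacks 3 copies of the 1-D array as rows and returns a new,
# writable array.
c_tile = np.tile(a, (3, 1))

# np.broadcast_to returns a read-only view of the same shape without copying
# the data; call .copy() on it if a writable array is needed.
c_view = np.broadcast_to(a, (3, a.size))

print(c_tile.shape)            # (3, 10)
print(c_view.flags.writeable)  # False
```

Which one is preferable depends on whether the rows need to be modified afterwards.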

Making a 10x10 grid from a list of arrays

I'm struggling to arrange my array as a 10x10 grid; the output I keep getting isn't what I'm looking for. I was hoping someone could help me out.
import numpy as np
x = 1
y = 1
scale = 10
nn = []
for x in range(1,scale+1):
    mm = []
    for y in range(1,scale+1):
        xy = [x,y]
        mm.append(xy)
        #print(xy)
        y=+1
    nn.append(mm)
    x=+1
nn
grid_array = np.array(nn)
grid=np.meshgrid(grid_array)
But the output I get isn't displaying 10x10
[array([ 1, 1, 1, 2, 1, 3, 1, 4, 1, 5, 1, 6, 1, 7, 1, 8, 1,
9, 1, 10, 2, 1, 2, 2, 2, 3, 2, 4, 2, 5, 2, 6, 2, 7,
2, 8, 2, 9, 2, 10, 3, 1, 3, 2, 3, 3, 3, 4, 3, 5, 3,
6, 3, 7, 3, 8, 3, 9, 3, 10, 4, 1, 4, 2, 4, 3, 4, 4,
4, 5, 4, 6, 4, 7, 4, 8, 4, 9, 4, 10, 5, 1, 5, 2, 5,
3, 5, 4, 5, 5, 5, 6, 5, 7, 5, 8, 5, 9, 5, 10, 6, 1,
6, 2, 6, 3, 6, 4, 6, 5, 6, 6, 6, 7, 6, 8, 6, 9, 6,
10, 7, 1, 7, 2, 7, 3, 7, 4, 7, 5, 7, 6, 7, 7, 7, 8,
7, 9, 7, 10, 8, 1, 8, 2, 8, 3, 8, 4, 8, 5, 8, 6, 8,
7, 8, 8, 8, 9, 8, 10, 9, 1, 9, 2, 9, 3, 9, 4, 9, 5,
9, 6, 9, 7, 9, 8, 9, 9, 9, 10, 10, 1, 10, 2, 10, 3, 10,
4, 10, 5, 10, 6, 10, 7, 10, 8, 10, 9, 10, 10])]
Edited.
This is what I have thus far, thanks for the help guys.
import numpy as np
scale = 10
array = np.empty(shape=(scale, scale, 2)).astype(int)
for x in range(1,scale+1):
    for y in range(1,scale+1):
        #print([x,y])
        array[x-1,y-1] = [x,y]
print(array)
You can use NumPy's reshape for that, where arr is your flat array:
np.reshape(arr, (-1, 10))
See also:
Convert a 1D array to a 2D array in numpy
It's pretty far from clear what you want to achieve, but if you simply want to know how to generate a 10x10 numpy array using two for loops, here is what you can do (not the most Pythonic way to do it, though):
import numpy as np
scale = 10
array = np.empty(shape=(scale, scale))
for x in range(scale):
    for y in range(scale):
        array[x,y] = 42 # replace with whatever dynamically assigned value you want there
print(array)
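If the goal is the 10x10 grid of [x, y] pairs from the question's edit, the loops can probably be replaced by `np.meshgrid` called with two coordinate vectors. Called with a single array, as in the question, `np.meshgrid` treats it as one flattened 1-D vector, which would explain the flat output. A sketch, with `coords`, `xx`, `yy` and `grid` as illustrative names:

```python
import numpy as np

scale = 10
coords = np.arange(1, scale + 1)

# indexing="ij" makes the first axis follow x and the second axis follow y,
# matching array[x-1, y-1] = [x, y] in the loop version.
xx, yy = np.meshgrid(coords, coords, indexing="ij")

# Stack the two coordinate planes along a new last axis: shape (10, 10, 2).
grid = np.stack([xx, yy], axis=-1)

print(grid.shape)  # (10, 10, 2)
print(grid[2, 7])  # [3 8]
```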

Fastest numpy way to remove a list of cells from a 2d array

I have a very large 2D numpy array of m x n elements. For each row, I need to remove exactly one element. So for example from a 4x6 matrix I might need to delete [0, 1], [1, 4], [2, 3], and [3, 3] - I have this set of coordinates stored in a list. In the end, the matrix will ultimately shrink in width by 1.
Is there a standard way to do this using a mask? Ideally, I need this to be as performant as possible.
Here is a method that uses ravel_multi_index() to compute the flat (one-dimensional) indices, then delete() to remove those elements, and finally reshapes the result back to a two-dimensional array:
import numpy as np
n = 12
a = np.repeat(np.arange(10)[None, :], n, axis=0)
index = np.random.randint(0, 10, n)
ravel_index = np.ravel_multi_index((np.arange(n), index), a.shape)
np.delete(a, ravel_index).reshape(n, -1)
the index:
array([4, 6, 9, 0, 3, 5, 3, 8, 9, 8, 4, 4])
the result:
array([[0, 1, 2, 3, 4, 5, 6, 7, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 9],
[1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 9],
[0, 1, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 5, 6, 7, 8, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8],
[0, 1, 2, 4, 5, 6, 7, 8, 9]])
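Since the question asks about a mask, here is a sketch of the boolean-mask version, using the 4x6 example and the coordinates [0, 1], [1, 4], [2, 3], [3, 3] from the question. The array contents and the names `cols`, `mask` and `result` are assumptions for illustration, and I have not benchmarked it against the `np.delete` approach.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)
cols = np.array([1, 4, 3, 3])  # column to drop in row 0, 1, 2, 3

# Start with everything kept, then mark one cell per row for removal.
mask = np.ones(a.shape, dtype=bool)
mask[np.arange(a.shape[0]), cols] = False

# Boolean indexing returns the kept elements in row-major order; because
# every row loses exactly one element, the result reshapes cleanly.
result = a[mask].reshape(a.shape[0], -1)

print(result)
# [[ 0  2  3  4  5]
#  [ 6  7  8  9 11]
#  [12 13 14 16 17]
#  [18 19 20 22 23]]
```

The reshape only works because exactly one element is removed from every row; with an uneven number of removals per row the result would not be rectangular.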

simple way to select numpy subarray using boolean conditional vector in python 3

How can you select only the columns of a 2-d numpy array that correspond to a conditional boolean vector?
Say you have a 10x10 matrix, generated by, say:
a = np.random.randint(0,10,(10,10))
a =
array([[4, 9, 1, 9, 5, 2, 1, 7, 6, 5],
[5, 4, 2, 4, 8, 1, 5, 5, 7, 5],
[3, 8, 7, 4, 3, 4, 8, 8, 8, 3],
[5, 4, 4, 4, 9, 6, 7, 1, 6, 8],
[8, 3, 2, 1, 7, 5, 8, 8, 4, 9],
[9, 5, 6, 8, 6, 8, 1, 4, 4, 5],
[5, 4, 3, 2, 8, 3, 2, 2, 8, 6],
[2, 5, 4, 5, 9, 7, 9, 2, 5, 6],
[4, 5, 9, 7, 3, 1, 5, 7, 4, 8],
[6, 1, 3, 8, 8, 3, 2, 6, 6, 7]])
and you want to select only the columns marked by a vector of True/False (or 0/1) values, like, say:
b = np.random.randint(0,2,10)
b =
array([0, 1, 1, 0, 1, 0, 1, 1, 1, 1])
I spent some time trying to find the simple syntax to return only specified columns of a numpy array in Python 3 and finally have it figured out. There are a number of other threads which show more complicated ways to do this, so I thought I would put the simple solution here. This will be very obvious to more experienced Python users, but for a beginner like me, it would have been useful.
The simplest way is:
new_matrix = a[:,b==1]
which yields:
new_matrix =
array([[9, 1, 5, 1, 7, 6, 5],
[4, 2, 8, 5, 5, 7, 5],
[8, 7, 3, 8, 8, 8, 3],
[4, 4, 9, 7, 1, 6, 8],
[3, 2, 7, 8, 8, 4, 9],
[5, 6, 6, 1, 4, 4, 5],
[4, 3, 8, 2, 2, 8, 6],
[5, 4, 9, 9, 2, 5, 6],
[5, 9, 3, 5, 7, 4, 8],
[1, 3, 8, 2, 6, 6, 7]])
This would have saved me a lot of time.
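One thing to watch out for, shown in the sketch below with a small made-up array: the comparison `b == 1` (or a cast to bool) is what turns the 0/1 vector into a mask. Indexing with the integer vector `b` directly is integer (fancy) indexing and picks columns 0 and 1 repeatedly instead.

```python
import numpy as np

a = np.arange(20).reshape(2, 10)
b = np.array([0, 1, 1, 0, 1, 0, 1, 1, 1, 1])

# Boolean mask: keeps the 7 columns where b is 1.
cols = a[:, b == 1]
same = a[:, b.astype(bool)]

# Integer indexing: b is read as a list of column numbers (only 0 and 1),
# so the result has 10 columns, not 7.
fancy = a[:, b]

print(cols[0])      # [1 2 4 6 7 8 9]
print(fancy.shape)  # (2, 10)
```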
