How can I use the unique(a, 'rows') from MATLAB in Python?

I'm translating some stuff from MATLAB to the Python language.
NumPy has unique(a), but the MATLAB program also passes the 'rows' option, so it gives a somewhat different result.
Is there a similar command in Python or should I make some algorithm that does the same thing?

Assuming your 2D array is stored in the usual C order (that is, each row is counted as an array or list within the main array; in other words, row-major order), or that you transpose the array beforehand otherwise, you could do something like...
>>> import numpy as np
>>> a = np.array([[1, 2, 3], [2, 3, 4], [1, 2, 3], [3, 4, 5]])
>>> a
array([[1, 2, 3],
       [2, 3, 4],
       [1, 2, 3],
       [3, 4, 5]])
>>> np.array([np.array(x) for x in set(tuple(x) for x in a)]) # or "list(x) for x in set[...]"
array([[3, 4, 5],
       [2, 3, 4],
       [1, 2, 3]])
Of course, this doesn't really work if you need the unique rows in their original order.
By the way, to emulate something like unique(a, 'columns'), you'd just transpose the original array, do the step shown above, and then transpose back.
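If your NumPy is 1.13 or newer, np.unique itself accepts an axis argument, which should cover MATLAB's unique(a, 'rows') directly. A minimal sketch (the array is made up so that the original order differs from the sorted order); return_index gives the first occurrence of each unique row, and sorting those indices recovers the original order:

```python
import numpy as np

a = np.array([[3, 4, 5], [1, 2, 3], [3, 4, 5], [2, 3, 4]])

# Unique rows, sorted lexicographically (like unique(a, 'rows'))
unique_rows = np.unique(a, axis=0)

# Indices of the first occurrence of each unique row
_, first_idx = np.unique(a, axis=0, return_index=True)

# Sorting the indices keeps the unique rows in their original order
rows_in_original_order = a[np.sort(first_idx)]

# Unique columns: axis=1 instead of transposing twice
unique_cols = np.unique(a, axis=1)
```

Here unique_rows is [[1, 2, 3], [2, 3, 4], [3, 4, 5]], while rows_in_original_order is [[3, 4, 5], [1, 2, 3], [2, 3, 4]].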

You can try:
import numpy

# your_arr: the 2-D NumPy array to deduplicate
ii = 0
wrk_arr = your_arr
idx = numpy.arange(0, len(wrk_arr))
while ii <= len(wrk_arr) - 1:
    i_list = numpy.arange(0, len(wrk_arr))
    candidate = wrk_arr[ii, :]
    # 1 for every row equal to the candidate row, 0 otherwise
    idup = numpy.all(candidate == wrk_arr, axis=1).astype(int)
    idup[ii] = 0  # keep the candidate row itself
    # zero the indices of the duplicates; unique() merges them into index 0
    i_list = numpy.unique(i_list * (1 - idup))
    idx = numpy.unique(idx * (1 - idup))
    wrk_arr = wrk_arr[i_list, :]
    ii += 1
The result is wrk_arr, which holds the unique rows of your_arr. The relation is:
your_arr[idx, :] == wrk_arr
It works like MATLAB in the sense that the returned array (wrk_arr) keeps the order of the original array (your_arr). The idx array differs from MATLAB since it contains the indices of the first appearance, whereas MATLAB (at least in older releases) returns the LAST appearance.
In my experience it ran as fast as MATLAB on a 10000 x 4 matrix.
And a transpose will do the trick for the column case.
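If you need the "last appearance" indices that older MATLAB returned, one possible sketch (assuming NumPy 1.13+ for the axis argument) is to run np.unique on the row-reversed array, where the first occurrence corresponds to the last one in the original, and then map the indices back:

```python
import numpy as np

your_arr = np.array([[1, 2, 3], [2, 3, 4], [1, 2, 3], [3, 4, 5]])

# np.unique reports FIRST occurrences; on the reversed rows these are
# the LAST occurrences in the original array
rev = your_arr[::-1]
rows, rev_idx = np.unique(rev, axis=0, return_index=True)

# Convert indices in the reversed array back to original row numbers
last_idx = len(your_arr) - 1 - rev_idx
```

For this input last_idx is [2, 1, 3], and your_arr[last_idx] reproduces the sorted unique rows.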

Related

Need a "sortorder" function [duplicate]

I have a numerical list:
myList = [1, 2, 3, 100, 5]
If I sort this list, I get [1, 2, 3, 5, 100].
What I want is the indices of the elements of the original list in sorted order, i.e. [0, 1, 2, 4, 3], à la MATLAB's sort function, which returns both values and indices.
If you are using numpy, you have the argsort() function available:
>>> import numpy
>>> numpy.argsort(myList)
array([0, 1, 2, 4, 3])
http://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html
This returns the indices that would sort the array or list.
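To mirror MATLAB's two outputs ([values, indices] = sort(...)), a small sketch: take the argsort, then index with it to get the sorted values. kind="stable" keeps tied elements in their original order; remember that NumPy indices are 0-based while MATLAB's are 1-based.

```python
import numpy as np

my_list = [1, 2, 3, 100, 5]
arr = np.asarray(my_list)

indices = np.argsort(arr, kind="stable")  # positions in arr, in sorted order
values = arr[indices]                     # the sorted values themselves

matlab_style_indices = indices + 1        # shift to MATLAB's 1-based convention
```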
Something like the following:
>>> myList = [1, 2, 3, 100, 5]
>>> [i[0] for i in sorted(enumerate(myList), key=lambda x:x[1])]
[0, 1, 2, 4, 3]
enumerate(myList) gives you a list containing tuples of (index, value):
[(0, 1), (1, 2), (2, 3), (3, 100), (4, 5)]
You sort the list by passing it to sorted and specifying a function to extract the sort key (the second element of each tuple; that's what the lambda is for). Finally, the original index of each sorted element is extracted using the [i[0] for i in ...] list comprehension.
>>> myList = [1, 2, 3, 100, 5]
>>> sorted(range(len(myList)), key=myList.__getitem__)
[0, 1, 2, 4, 3]
I did a quick performance check on these with perfplot (a project of mine) and found that it's hard to recommend anything other than
np.argsort(x)
[Benchmark plot: runtime vs. len(x), log scale]
Code to reproduce the plot:
import perfplot
import numpy as np

def sorted_enumerate(seq):
    return [i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]

def sorted_enumerate_key(seq):
    return [x for x, y in sorted(enumerate(seq), key=lambda x: x[1])]

def sorted_range(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)

b = perfplot.bench(
    setup=np.random.rand,
    kernels=[sorted_enumerate, sorted_enumerate_key, sorted_range, np.argsort],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(x)",
)
b.save("out.png")
The answers with enumerate are nice, but I personally don't like the lambda used to sort by the value. The following just reverses the index and the value, and sorts that. So it'll first sort by value, then by index.
sorted((e,i) for i,e in enumerate(myList))
Updated answer with enumerate and itemgetter:
sorted(enumerate(a), key=lambda x: x[1])
# [(0, 1), (1, 2), (2, 3), (4, 5), (3, 100)]
enumerate pairs each value with its index: the first element of each tuple is the index and the second is the value (then sort using the second element of the tuple, x[1], where x is the tuple).
Or using itemgetter from the operator module:
from operator import itemgetter
sorted(enumerate(a), key=itemgetter(1))
Essentially you need an argsort; which implementation you need depends on whether you want to use external libraries (e.g. NumPy) or stay pure Python without dependencies.
The question you need to ask yourself is: do you want the
indices that would sort the array/list, or the
indices that the elements would have in the sorted array/list?
Unfortunately the example in the question doesn't make it clear what is desired because both will give the same result:
>>> arr = np.array([1, 2, 3, 100, 5])
>>> np.argsort(np.argsort(arr))
array([0, 1, 2, 4, 3], dtype=int64)
>>> np.argsort(arr)
array([0, 1, 2, 4, 3], dtype=int64)
Choosing the argsort implementation
If you have NumPy at your disposal you can simply use the function numpy.argsort or method numpy.ndarray.argsort.
An implementation without NumPy was mentioned in some other answers already, so I'll just recap the fastest solution according to the benchmark answer here:
def argsort(l):
    return sorted(range(len(l)), key=l.__getitem__)
Getting the indices that would sort the array/list
To get the indices that would sort the array/list you can simply call argsort on the array or list. I'm using the NumPy versions here but the Python implementation should give the same results
>>> arr = np.array([3, 1, 2, 4])
>>> np.argsort(arr)
array([1, 2, 0, 3], dtype=int64)
The result contains the indices that are needed to get the sorted array.
Since the sorted array would be [1, 2, 3, 4] the argsorted array contains the indices of these elements in the original.
The smallest value is 1 and it is at index 1 in the original so the first element of the result is 1.
The 2 is at index 2 in the original so the second element of the result is 2.
The 3 is at index 0 in the original so the third element of the result is 0.
The largest value is 4, and it is at index 3 in the original, so the last element of the result is 3.
Getting the indices that the elements would have in the sorted array/list
In this case you would need to apply argsort twice:
>>> arr = np.array([3, 1, 2, 4])
>>> np.argsort(np.argsort(arr))
array([2, 0, 1, 3], dtype=int64)
In this case:
the first element of the original is 3, which is the third-smallest value, so it would have index 2 in the sorted array/list; so the first element is 2.
the second element of the original is 1, which is the smallest value so it would have index 0 in the sorted array/list so the second element is 0.
the third element of the original is 2, which is the second-smallest value so it would have index 1 in the sorted array/list so the third element is 1.
the fourth element of the original is 4 which is the largest value so it would have index 3 in the sorted array/list so the last element is 3.
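Applying argsort twice costs two sorts. A possible alternative, sketched here under the assumption that you already have the sort order, is to invert that permutation with a single scatter assignment. ranks_pure is a hypothetical pure-Python counterpart built on the argsort helper from this answer (redefined so the block stands alone):

```python
import numpy as np

arr = np.array([3, 1, 2, 4])
order = np.argsort(arr)                 # indices that would sort arr
np_ranks = np.empty_like(order)
np_ranks[order] = np.arange(len(arr))   # invert the permutation: O(n) after the sort

def argsort(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)

def ranks_pure(seq):
    """Hypothetical helper: the index each element would have in sorted(seq)."""
    result = [0] * len(seq)
    for rank, i in enumerate(argsort(seq)):
        result[i] = rank
    return result
```

Both give [2, 0, 1, 3] for [3, 1, 2, 4], the same as np.argsort(np.argsort(arr)).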
If you do not want to use NumPy,
sorted(range(len(seq)), key=seq.__getitem__)
is the fastest, as demonstrated here.
The other answers are WRONG.
Running argsort once is not the solution.
For example, the following code:
import numpy as np
x = [3,1,2]
np.argsort(x)
yields array([1, 2, 0], dtype=int64) which is not what we want.
The answer should be to run argsort twice:
import numpy as np
x = [3,1,2]
np.argsort(np.argsort(x))
gives array([2, 0, 1], dtype=int64) as expected.
The easiest way is to use NumPy for that purpose:
import numpy
s = numpy.array([2, 3, 1, 4, 5])
sort_index = numpy.argsort(s)
print(sort_index)
But if you want your code to use only basic Python:
s = [2, 3, 1, 4, 5]
li = []
for i in range(len(s)):
    li.append([s[i], i])
li.sort()
sort_index = []
for x in li:
    sort_index.append(x[1])
print(sort_index)
We will create another array of indices from 0 to n-1,
then zip it with the original array and sort on the basis of the original values:
ar = [1, 2, 3, 4, 5]
new_ar = list(zip(ar, range(len(ar))))
new_ar.sort()
sort_index = [i for _, i in new_ar]  # the original indices, in sorted order
s = [2, 3, 1, 4, 5]
print([sorted(s, reverse=False).index(val) for val in s])
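Note that this gives each element's position in the sorted list (its rank), not the argsort, and it re-sorts the list for every element, so it gets slow for long lists. For a list with duplicate elements, tied values share the same (lowest) rank, e.g.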
s = [2, 2, 1, 4, 5]
print([sorted(s, reverse=False).index(val) for val in s])
returns
[1, 1, 0, 3, 4]
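If that tie behaviour is what you want, a sketch that sorts only once, assuming the values are hashable (min_ranks is a hypothetical name): record the first position of each value in the sorted list, then look each element up.

```python
def min_ranks(seq):
    """Hypothetical helper: rank of each element, ties get the lowest rank."""
    first_pos = {}
    for pos, value in enumerate(sorted(seq)):
        first_pos.setdefault(value, pos)  # keep only the first (lowest) position
    return [first_pos[value] for value in seq]

s = [2, 2, 1, 4, 5]
print(min_ranks(s))  # [1, 1, 0, 3, 4]
```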
import numpy as np
For the indices:
S = [11, 2, 44, 55, 66, 0, 10, 3, 33]
r = np.argsort(S)
# array([5, 1, 7, 6, 0, 8, 2, 3, 4])
argsort returns the indices of S in sorted order.
For the values:
np.sort(S)
# array([ 0,  2,  3, 10, 11, 33, 44, 55, 66])
Try this; it worked for me.
First, convert your list:
myList = [1, 2, 3, 100, 5]
by adding an index to each item:
myList = [[0, 1], [1, 2], [2, 3], [3, 100], [4, 5]]
Next:
sorted(myList, key=lambda k:k[1])
result:
[[0, 1], [1, 2], [2, 3], [4, 5], [3, 100]]
A variant on RustyRob's answer (which is already the most performant pure Python solution) that may be superior when the collection you're sorting either:
Isn't a sequence (e.g. it's a set, and there's a legitimate reason to want the indices corresponding to how far an iterator must be advanced to reach the item), or
Is a sequence without O(1) indexing (among Python's included batteries, collections.deque is a notable example of this)
Case #1 is unlikely to be useful, but case #2 is more likely to be meaningful. In either case, you have two choices:
Convert to a list/tuple and use the converted version, or
Use a trick to assign keys based on iteration order
This answer provides the solution to #2. Note that it's not guaranteed to work by the language standard; the language says each key will be computed once, but not the order they will be computed in. On every version of CPython, the reference interpreter, to date, it's precomputed in order from beginning to end, so this works, but be aware it's not guaranteed. In any event, the code is:
sizediterable = ...
sorted_indices = sorted(range(len(sizediterable)), key=lambda _, it=iter(sizediterable): next(it))
All that does is provide a key function that ignores the value it's given (an index) and instead provides the next item from an iterator preconstructed from the original container (cached as a defaulted argument to allow it to function as a one-liner). As a result, for something like a large collections.deque, where using its .__getitem__ involves O(n) work (and therefore computing all the keys would involve O(n²) work), each step of sequential iteration remains O(1), so generating all the keys remains just O(n).
If you need something guaranteed to work by the language standard, using built-in types, Roman's solution will have the same algorithmic efficiency as this solution (as neither of them rely on the algorithmic efficiency of indexing the original container).
To be clear, for the suggested use case with collections.deque, the deque would have to be quite large for this to matter; deques have a fairly large constant divisor for indexing, so only truly huge ones would have an issue. Of course, by the same token, the cost of sorting is pretty minimal if the inputs are small/cheap to compare, so if your inputs are large enough that efficient sorting matters, they're large enough for efficient indexing to matter too.
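To make the two choices concrete, a small sketch with a collections.deque (the values are made up). The iterator version relies on CPython computing the keys front to back, as explained above, so treat it as implementation-dependent:

```python
from collections import deque

d = deque([30, 10, 20, 40])

# Choice 1: copy into a list so indexing is O(1), then argsort as usual
as_list = list(d)
idx_copy = sorted(range(len(as_list)), key=as_list.__getitem__)

# Choice 2: the iterator trick; the key ignores the index and pulls the
# next item from the deque (relies on keys being computed in order)
idx_iter = sorted(range(len(d)), key=lambda _, it=iter(d): next(it))
```

Both give [1, 2, 0, 3] here; the copy costs O(n) extra memory but is guaranteed by the language.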

How to mirror the left side of an 1d numpy array with variable length?

With a given 1d numpy array I would like to mirror the left part on the right hand side of the array. So given [1,2,3,4] the output should be [1,2,2,1]. This should work for different lengths of arrays, always mirroring at the midpoint, i.e. [1,2,3,4,5] should yield [1,2,3,2,1].
This feels as if there is a nice one-liner in numpy but with different lengths it trips up and so I could only come up with this helper function:
def mirror_left(arr):
    """ Mirrors the left/up part of an array. """
    if len(arr) % 2 == 0:
        arr[len(arr)//2:] = arr[len(arr)//2 - 1::-1]
    else:
        arr[len(arr)//2:] = arr[len(arr)//2::-1]
    return arr
a = np.array([1, 2, 3, 4, 5])
b = np.array([1, 2, 3, 4, 5, 6])
print(mirror_left(a))
print(mirror_left(b))
Which gives the correct output: [1 2 3 2 1] and [1 2 3 3 2 1] but looks contrived.
Is there a better way for mirroring / copying the left half onto the right half of an array?
Be careful with your assignment: arr[len(arr)//2:] = arr[len(arr)//2 - 1::-1] modifies your array in place. Since you return arr, I assume you want a modified copy rather than changing the original array?
You can use this:
def merge_left(arr):
    return np.concatenate([arr[:len(arr)//2 + len(arr)%2], arr[len(arr)//2 - 1::-1]])
Output:
>>> merge_left(a)
array([1, 2, 3, 2, 1])
>>> merge_left(b)
array([1, 2, 3, 3, 2, 1])
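One thing to watch with arr[len(arr)//2 - 1::-1]: for a one-element array the start index becomes -1, so the slice covers the whole array and merge_left returns two elements. A sketch that slices from the front instead sidesteps that (both function names are hypothetical); the second variant modifies in place like the question's helper, and its two slices never overlap:

```python
import numpy as np

def mirror_left_copy(arr):
    n = len(arr)
    # left half (including the middle element for odd n) + reversed left half
    return np.concatenate([arr[:(n + 1) // 2], arr[:n // 2][::-1]])

def mirror_left_inplace(arr):
    n = len(arr)
    # overwrite the right half with the reversed left half
    arr[n - n // 2:] = arr[:n // 2][::-1]
    return arr

print(mirror_left_copy(np.array([1, 2, 3, 4, 5])))     # [1 2 3 2 1]
print(mirror_left_inplace(np.array([1, 2, 3, 4, 5, 6])))  # [1 2 3 3 2 1]
```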

Check if any row in a numpy array is part of another array [duplicate]

This question already has an answer here: Check if two 3D numpy arrays contain overlapping 2D arrays
I'm using numpy for the first time. I am trying to achieve the following:
There are 2 arrays:
a = np.array([[1, 3], [2, 5], [1, 2], [2, 1], [1,6]])
b = np.array([[3, 5], [1, 2]])
I need to check if ANY pair (or a row in other words) in array b is present in array a, in the same order (as in, [1, 2] is not to be considered same as [2, 1])
The above example should return True since both a and b contain [1, 2]
I've tried:
for [x, y] in b
if [x, y] in a
and:
if (a == b).all(1).any() # --> This throws "AttributeError: 'bool' object has no attribute 'all'"
but failed.
Thanks in advance
Let's do it the NumPy way (loops are not advised with NumPy). Add a dimension using None so that NumPy broadcasts correctly, then use all and any along the right axes:
(a==b[:,None]).all(-1).any()
Output for sample input in question:
True
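To see why this works, it helps to look at the intermediate shapes. A sketch with the question's arrays: b[:, None] has shape (2, 1, 2), which broadcasts against a's (5, 2) to (2, 5, 2). Reducing the intermediate along different axes also tells you which rows matched:

```python
import numpy as np

a = np.array([[1, 3], [2, 5], [1, 2], [2, 1], [1, 6]])
b = np.array([[3, 5], [1, 2]])

eq = a == b[:, None]           # shape (2, 5, 2): every row of b vs every row of a
row_match = eq.all(axis=-1)    # shape (2, 5): True where whole rows are equal
found = row_match.any()        # True if any row of b occurs in a

which_b = row_match.any(axis=1)                 # per row of b
where_a = np.nonzero(row_match.any(axis=0))[0]  # matching row numbers in a
```

Here which_b is [False, True] and where_a is [2]. Note that the (len(b), len(a), ncols) temporary can get large for big inputs.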
This solution uses np.ravel_multi_index to avoid broadcasting, which helps when the arrays are big because it never builds the len(b) x len(a) comparison array. It requires non-negative integer entries.
d = np.maximum(a.max(0), b.max(0)) + 1
np.in1d(np.ravel_multi_index(a.T, d), np.ravel_multi_index(b.T, d)).any()
# True
This solution can also give the positions of the rows in a that match:
np.nonzero(np.in1d(np.ravel_multi_index(a.T, d), np.ravel_multi_index(b.T, d)))[0]
# array([2], dtype=int64)
Note: I learned this trick a long time ago from @Divakar, so credit should go to him.
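The same trick, spelled out step by step. This sketch swaps np.in1d for np.isin, which is the recommended spelling in current NumPy (np.in1d is deprecated as of NumPy 2.0). Each row is treated as coordinates in a grid of shape dims and encoded as one flat integer, so comparing rows becomes comparing integers:

```python
import numpy as np

a = np.array([[1, 3], [2, 5], [1, 2], [2, 1], [1, 6]])
b = np.array([[3, 5], [1, 2]])

# Grid just large enough to hold every coordinate (entries must be non-negative ints)
dims = np.maximum(a.max(axis=0), b.max(axis=0)) + 1

# Encode each row as a single flat index into that grid
a_codes = np.ravel_multi_index(a.T, dims)
b_codes = np.ravel_multi_index(b.T, dims)

mask = np.isin(a_codes, b_codes)   # which rows of a appear in b
found = mask.any()
positions = np.nonzero(mask)[0]
```

With dims = [4, 7], the row [1, 2] encodes to 1*7 + 2 = 9 in both arrays, giving positions [2].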
Try:
a = np.array([[1, 3], [2, 5], [1, 2], [2, 1], [1,6]])
b = np.array([[3, 5], [1, 2]])
check = any(map(lambda x: x in b, a))
Explanation:
lambda is a keyword for creating a function. In this case:
lambda x: x in b
it represents a function that takes an x and returns whether x is in array b (for a NumPy array, x in b is evaluated as (b == x).any(), so it is True as soon as any single element matches).
map is a built-in function that takes a function as its first argument and an iterable as its second argument.
It applies the function to every item in the iterable and returns an iterable of the results.
In this case:
map(lambda x: x in b, a)
returns an iterable of True and False values, depending on the result of applying the function to the elements of a.
Finally, any is another built-in function that takes an iterable of Trues and Falses and returns True if any item in the iterable is True.
EDIT:
You can also do it using a generator expression (as someone wrote in the comments):
a = np.array([[1, 3], [2, 5], [1, 2], [2, 1], [1,6]])
b = np.array([[3, 5], [1, 2]])
check = any(x in b for x in a)
It is exactly the same and even more legible.
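A caveat worth checking before relying on x in b: since ndarray's in operator is (b == x).any(), a row can be reported as present when only one of its elements matches. A sketch contrasting it with a whole-row check; converting rows to tuples is one pure-Python way to compare full rows:

```python
import numpy as np

b = np.array([[3, 5], [1, 2]])
x = np.array([3, 2])            # not a row of b

naive = x in b                  # True: 3 matches b[0, 0] and 2 matches b[1, 1]
exact = bool((b == x).all(axis=1).any())  # False: needs a whole-row match

a = np.array([[1, 3], [2, 5], [1, 2], [2, 1], [1, 6]])
b_rows = {tuple(row) for row in b.tolist()}               # rows as hashable tuples
check = any(tuple(row) in b_rows for row in a.tolist())  # True because of [1, 2]
```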

Adding elements to numpy array

Using NumPy:
X = numpy.zeros(shape=[1, 4], dtype=int)
How can I add a list, such as [1,2,3,4]? I tried numpy.add(X,[1,2,3,4]) and np.hstack((1,2,3,4)) but none of them work!
I know how to use that in standard Python list using append method but I want to use numpy for performance.
NumPy arrays have a fixed size once they are created (functions such as np.append return a new array instead of growing the existing one). So after calling zeros((1, 4), ...), you already have a 1x4 array full of zeros. To set its elements to values other than zero, you need to use the assignment operator:
X[0] = [1, 2, 3, 4] # does what you are trying to achieve in your question
X[0, :] = [1, 2, 3, 4] # equivalent to the above
X[:] = [1, 2, 3, 4] # same
X[0, 1] = 2 # set the individual element at [0, 1] to 2
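Since the question mentions append for performance, a rough sketch of the two usual options, assuming rows arrive one at a time. np.append copies the whole array on every call, so when building an array incrementally it is usually cheaper to collect rows in a Python list and convert once at the end:

```python
import numpy as np

X = np.zeros(shape=[1, 4], dtype=int)
X[0] = [1, 2, 3, 4]                        # fill the existing row

Y = np.append(X, [[5, 6, 7, 8]], axis=0)   # NEW (2, 4) array; X is unchanged

rows = []
for i in range(3):
    rows.append([i, i + 1, i + 2, i + 3])  # cheap list append
Z = np.array(rows)                         # one conversion at the end: shape (3, 4)
```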

Retrieve column slices from a NumPy array using variables as the indexer

Say I have an array and I want a function to select some of its columns based on a predefined argument a:
extracted_columns = array[:,a].
If I have e.g. a = np.arange(10), I'll get the first ten columns.
What if I want to define a so that all the columns are selected without knowing the size of the array?
I'd like to set a = : so that the function does
extracted_columns = array[:,:]
but it seems : can't be passed as an argument. I also tried a = None, but this gives me a 3-dimensional array whose second dimension is 1.
Is there a nice way of doing it ?
Thanks,
Pass a slice object to your function.
MCVE:
import numpy as np
x = np.arange(9).reshape(3, 3)
print(x)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
a = slice(None)
print(x[:, a])
[[0 1 2]
 [3 4 5]
 [6 7 8]]
For your case, you'd define a function along these lines:
def foo(array, a):
    return array[:, a]
And call it like this:
arr_slice = foo(array, slice(None))
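The same idea generalizes: anything you could write between the brackets can be passed as an object. A sketch with a hypothetical select_columns helper whose default is slice(None); np.s_ is NumPy's helper for building slice objects with bracket syntax. Basic slices return views, while lists or arrays of indices (fancy indexing) return copies:

```python
import numpy as np

def select_columns(array, cols=slice(None)):
    """Hypothetical helper: cols may be a slice, a list, or an index array."""
    return array[:, cols]

x = np.arange(12).reshape(3, 4)

all_cols = select_columns(x)                 # same as x[:, :]
first_two = select_columns(x, slice(0, 2))   # same as x[:, 0:2] (a view)
picked = select_columns(x, [0, 3])           # fancy indexing: columns 0 and 3 (a copy)
every_other = select_columns(x, np.s_[::2])  # np.s_[::2] is slice(None, None, 2)
```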
