Python Numpy's argsort duplicate issue [duplicate]


Say you have a NumPy vector [0,3,1,1,1] and you run argsort:
you get [0,2,3,4,1], but all the ones are equal, so any ordering of indices 2, 3 and 4 would be just as valid.
What I want is an efficient way to shuffle the indices of identical values.
Any idea how to do that without a while loop with two indices over the sorted vector?
numpy.array([0,3,1,1,1]).argsort()

Use lexsort:
np.lexsort((b,a)) means "sort by a, then by b" (the last key is the primary one), so a random b breaks ties in a at random.
>>> a
array([0, 3, 1, 1, 1])
>>> b=np.random.random(a.size)
>>> b
array([ 0.00673736, 0.90089115, 0.31407214, 0.24299867, 0.7223546 ])
>>> np.lexsort((b,a))
array([0, 3, 2, 4, 1])
>>> a.argsort()
array([0, 2, 3, 4, 1])
>>> a[[0, 3, 2, 4, 1]]
array([0, 1, 1, 1, 3])
>>> a[[0, 2, 3, 4, 1]]
array([0, 1, 1, 1, 3])
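The session above can be wrapped in a small helper. This is only a sketch: the function name random_tiebreak_argsort and the optional rng parameter are made up for illustration, and it assumes a 1-D input.

```python
import numpy as np

def random_tiebreak_argsort(a, rng=None):
    """Argsort a 1-D array, ordering equal values randomly (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    # Random secondary key; np.lexsort treats the LAST key as the primary one.
    tiebreak = rng.random(a.size)
    return np.lexsort((tiebreak, a))

a = np.array([0, 3, 1, 1, 1])
idx = random_tiebreak_argsort(a, rng=np.random.default_rng(0))
print(a[idx])  # values come out sorted; the order of indices 2, 3, 4 depends on the rng
```

Unlike adding noise to the values, this never changes the values being compared, so it should also be safe for floats.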

This is a bit of a hack, but if your array contains only integers you could add random values and argsort the result. np.random.rand returns values in [0, 1), and distinct integers differ by at least 1, so in this case you're guaranteed to maintain the order for non-identical elements.
>>> import numpy as np
>>> arr = np.array([0,3,1,1,1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 3, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 2, 3, 1])
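Here we see index 0 is always first in the argsort result and index 1 is always last, while the indices of the equal values (2, 3 and 4) come out in a random order.
In general the random values have to stay below the smallest nonzero gap between sorted values, e.g. d = np.diff(np.sort(arr)) followed by d[d > 0].min() (not the largest gap, which would let distinct values swap places), but you might run into precision issues at some point.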
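For non-integer data, order between distinct values is only preserved if the noise stays below the smallest positive gap between them. A rough sketch of that idea follows; jitter_argsort is a hypothetical name, and floating-point rounding can still produce ties when gaps are tiny relative to the values.

```python
import numpy as np

def jitter_argsort(arr, rng=None):
    """Argsort with random tie order by adding bounded noise (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr, dtype=float)
    gaps = np.diff(np.sort(arr))
    positive = gaps[gaps > 0]
    # Noise must be smaller than the smallest gap between distinct values.
    # If every value is equal, any scale works, so fall back to 1.0.
    scale = positive.min() if positive.size else 1.0
    return np.argsort(arr + scale * rng.random(arr.shape))

values = np.array([0.5, 2.0, 0.75, 0.75, 0.75])
order = jitter_argsort(values, rng=np.random.default_rng(1))
print(values[order])  # sorted values; indices 2, 3, 4 appear in a random order
```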

Related

Change dtype of non-square numpy ndarray

a = np.array([1,2,3])
b = np.array([1,2,3,4])
c = np.array([a, b])
c holds two np.ndarrays of different sizes. When I call c.astype(np.int8), I get ValueError: setting an array element with a sequence. How can I change the dtype of c?

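To specify the type of your array at creation, simply pass dtype=xxx.
Ex:
c = np.array([a,b], dtype=object)
If you want to convert a and b from int64 to int8, use astype, which returns a converted copy:
a = a.astype(np.int8)
b = b.astype(np.int8)
(Assigning to a.dtype does not convert the values; it reinterprets the underlying bytes, so a three-element int64 array would turn into 24 int8 values.)
Or you can build converted copies of a and b:
c = np.array(a, dtype=np.int8)
d = np.array(b, dtype=np.int8)
Finally, if you don't have access to a and b but only to c, here is how you can do the same:
for i, arr in enumerate(c):
    c[i] = arr.astype(np.int8)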
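One caveat is worth illustrating: converting values and reinterpreting bytes are different operations. The sketch below contrasts astype with view, which reinterprets memory the same way assigning to .dtype does; the variable names are only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)

converted = a.astype(np.int8)     # value conversion: new array, same values
reinterpreted = a.view(np.int8)   # byte reinterpretation: same memory, read as int8

print(converted)           # [1 2 3]
print(reinterpreted.size)  # 24, i.e. 3 elements * 8 bytes each
```

For a dtype change that keeps the numbers intact, astype is usually the one you want.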

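Assuming arr is a NumPy array of dtype object containing NumPy arrays, you could do the following (dtype=object is needed because recent NumPy versions refuse to build a ragged array implicitly):
arr8 = np.array([i.astype('int8') for i in arr], dtype=object)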
Demo:
>>> import numpy as np
>>> from numpy import array
>>> arr = array([array([0]), array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3]),
... array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4, 5]),
... array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2, 3, 4, 5, 6, 7])],
... dtype=object)
>>> arr
array([array([0]), array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3]),
array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4, 5]),
array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2, 3, 4, 5, 6, 7])],
dtype=object)
>>> np.array([i.astype('int8') for i in arr], dtype=object)
array([array([0], dtype=int8), array([0, 1], dtype=int8),
array([0, 1, 2], dtype=int8), array([0, 1, 2, 3], dtype=int8),
array([0, 1, 2, 3, 4], dtype=int8),
array([0, 1, 2, 3, 4, 5], dtype=int8),
array([0, 1, 2, 3, 4, 5, 6], dtype=int8),
array([0, 1, 2, 3, 4, 5, 6, 7], dtype=int8)], dtype=object)

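Maybe you could do something like this (assuming the arrays are stored in a pandas column df.desired_column with a default integer index):
arr = list()
for row in range(len(df.desired_column)):
    arr.append(np.array(df.desired_column.loc[row], dtype=np.int8))
arr = np.array(arr)
This way every element of arr will be a NumPy array with the desired dtype, in this example np.int8. If the rows have different lengths, pass dtype=object to the final np.array call.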

Split an integer into bins

Given a single integer and a number of bins, how can I split the integer into parts that are as equal as possible?
The outputs should sum to the input integer, e.g.
[in]: x = 20 , num_bins = 3
[out]: (7, 7, 6)
Another example:
[in]: x = 20 , num_bins = 6
[out]: (4, 4, 3, 3, 3, 3)
I've tried this:
x = 20
num_bins = 3
y = [x // num_bins] * num_bins
for i in range(x % num_bins):
    y[i] += 1
It works, but there must be a simpler or better way, maybe using bisect or numpy?
Using numpy from https://stackoverflow.com/a/48899071/610569 , I could do this too:
list(map(len, np.array_split(range(x), num_bins)))
But that's a little convoluted: it creates a range just to have a pretend list to split, and then takes the length of each piece.

The built-in divmod function could be useful for this.
def near_split(x, num_bins):
    quotient, remainder = divmod(x, num_bins)
    return [quotient + 1] * remainder + [quotient] * (num_bins - remainder)
Demo
In [11]: near_split(20, 3)
Out[11]: [7, 7, 6]
In [12]: near_split(20, 6)
Out[12]: [4, 4, 3, 3, 3, 3]

Update: simplified using integer arithmetic.
Here's a one-liner, where n is the integer to split and k is the number of bins:
np.arange(n+k-1, n-1, -1) // k
Little demo:
>>> for k in range(4, 10, 3):
...     for n in range(10, 17):
...         np.arange(n+k-1, n-1, -1) // k
...
array([3, 3, 2, 2])
array([3, 3, 3, 2])
array([3, 3, 3, 3])
array([4, 3, 3, 3])
array([4, 4, 3, 3])
array([4, 4, 4, 3])
array([4, 4, 4, 4])
array([2, 2, 2, 1, 1, 1, 1])
array([2, 2, 2, 2, 1, 1, 1])
array([2, 2, 2, 2, 2, 1, 1])
array([2, 2, 2, 2, 2, 2, 1])
array([2, 2, 2, 2, 2, 2, 2])
array([3, 2, 2, 2, 2, 2, 2])
array([3, 3, 2, 2, 2, 2, 2])
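A quick way to convince yourself the one-liner is right is to compare it with a plain divmod formulation over a range of inputs. In this sketch, near_split_np is a made-up name for the one-liner, and the tested ranges are arbitrary.

```python
import numpy as np

def near_split(x, num_bins):
    # Plain-Python reference: the first `remainder` bins get one extra unit.
    quotient, remainder = divmod(x, num_bins)
    return [quotient + 1] * remainder + [quotient] * (num_bins - remainder)

def near_split_np(n, k):
    # The one-liner: floor-divide the k integers n+k-1, ..., n by k.
    return np.arange(n + k - 1, n - 1, -1) // k

for n in range(0, 50):
    for k in range(1, 12):
        parts = near_split_np(n, k)
        assert parts.tolist() == near_split(n, k)
        assert parts.sum() == n                    # nothing lost or gained
        assert parts.max() - parts.min() <= 1      # bins differ by at most one

print(near_split_np(20, 3))  # [7 7 6]
```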

Address of last value in 1d NumPy array

I have a 1d array with zeros scattered throughout. I would like to create a second array that records, for each position, which zero was seen most recently (that is, the running count of zeros so far), like so:
>>> a = np.array([1, 0, 3, 2, 0, 3, 5, 8, 0, 7, 12])
>>> foo(a)
[0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
Is there a built-in NumPy function or broadcasting trick to do this without using a for loop or other iterator?

>>> (a == 0).cumsum()
array([0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3])
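The comparison a == 0 yields a boolean mask, and cumsum treats True as 1, so the result is a running count of zeros. The sketch below wraps that in a function and, as an assumption about what the title might also mean, shows a variant that returns the actual index of the most recent zero; both function names are hypothetical.

```python
import numpy as np

def zeros_seen_so_far(a):
    """Running count of zeros up to and including each position."""
    return np.cumsum(np.asarray(a) == 0)

def index_of_last_zero(a):
    """Index of the most recent zero at each position, -1 before the first zero."""
    a = np.asarray(a)
    # Keep the position where a is zero, -1 elsewhere, then carry the max forward.
    marked = np.where(a == 0, np.arange(a.size), -1)
    return np.maximum.accumulate(marked)

a = np.array([1, 0, 3, 2, 0, 3, 5, 8, 0, 7, 12])
print(zeros_seen_so_far(a))   # [0 1 1 1 2 2 2 2 3 3 3]
print(index_of_last_zero(a))  # [-1  1  1  1  4  4  4  4  8  8  8]
```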

Transform an array of arrays into an array of numbers

I have an array of values and an array of repeat counts:
>>> x=np.arange(5)
>>> x
array([0, 1, 2, 3, 4])
>>> n=np.random.randint(1,3,5)
>>> n
array([2, 1, 1, 2, 2])
And I do
>>> y=np.array([np.repeat(x[i],n[i]) for i in range(5)])
>>> y
array([array([0, 0]), array([1]), array([2]), array([3, 3]), array([4, 4])], dtype=object)
But I want my result to be array([0, 0, 1, 2, 3, 3, 4, 4]).
How can I do it?

I think this is simpler than you're making it; np.repeat accepts an array of per-element counts:
>>> x = np.arange(5)
>>> y = np.array([2, 1, 1, 2, 2])
>>> np.repeat(x,y)
array([0, 0, 1, 2, 3, 3, 4, 4])
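If you are already holding the list of small arrays, np.concatenate flattens it to the same result. The sketch below uses fixed counts instead of random ones so the output is reproducible; the names nested and flat are illustrative.

```python
import numpy as np

x = np.arange(5)
n = np.array([2, 1, 1, 2, 2])  # repeat counts, fixed here for reproducibility

# The per-element approach from the question, kept as a plain list of arrays.
nested = [np.repeat(x[i], n[i]) for i in range(len(x))]

# Flatten the pieces into one 1-D array.
flat = np.concatenate(nested)
print(flat)  # [0 0 1 2 3 3 4 4]

# Same result as the direct call with an array of counts.
assert np.array_equal(flat, np.repeat(x, n))
```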
