Say you have a NumPy vector [0,3,1,1,1] and you run argsort on it:
numpy.array([0,3,1,1,1]).argsort()
You will get [0,2,3,4,1], but all the ones are equal, so their indices could come back in any order.
What I want is an efficient way to shuffle the indices of identical values.
Any idea how to do that without a while loop with two indices on the sorted vector?
Use lexsort with a random secondary key:
np.lexsort((b, a)) sorts by a first and breaks ties using b (the last key in the tuple is the primary one).
>>> a
array([0, 3, 1, 1, 1])
>>> b=np.random.random(a.size)
>>> b
array([ 0.00673736, 0.90089115, 0.31407214, 0.24299867, 0.7223546 ])
>>> np.lexsort((b,a))
array([0, 3, 2, 4, 1])
>>> a.argsort()
array([0, 2, 3, 4, 1])
>>> a[[0, 3, 2, 4, 1]]
array([0, 1, 1, 1, 3])
>>> a[[0, 2, 3, 4, 1]]
array([0, 1, 1, 1, 3])
This is a bit of a hack, but if your array contains integers only, you could add random values and argsort the result. np.random.rand gives you results in [0, 1), and distinct integers differ by at least 1, so in this case you're guaranteed to maintain the order for non-identical elements.
>>> import numpy as np
>>> arr = np.array([0,3,1,1,1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 3, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 2, 3, 1])
Here we see index 0 is always first in the argsort result and index 1 is last, but the rest of the results are in a random order.
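In general you could generate random values bounded by the smallest nonzero gap between sorted values, e.g. d = np.diff(np.sort(arr)); d[d > 0].min(), so that distinct values cannot swap places, but you might run into floating-point precision issues at some point.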
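One way to sidestep the precision concern entirely is a different technique from the one in this answer: shuffle the positions first, then apply a stable argsort, so ties keep their shuffled order. The following is a minimal sketch under that assumption; argsort_shuffled_ties and seed are hypothetical names.

```python
import numpy as np

def argsort_shuffled_ties(a, seed=None):
    """Argsort a 1-D array with ties in random order, without adding noise."""
    a = np.asarray(a)
    rng = np.random.default_rng(seed)
    # Random visiting order of the original positions.
    perm = rng.permutation(a.size)
    # A stable sort keeps equal values in the (random) order they arrive in.
    order = np.argsort(a[perm], kind="stable")
    # Map positions in the shuffled array back to original indices.
    return perm[order]

vals = np.array([0.5, 2.25, 1.0, 1.0, 1.0])
idx2 = argsort_shuffled_ties(vals, seed=1)
print(idx2)        # index 0 first, index 1 last
print(vals[idx2])  # values in ascending order
```

No arithmetic is performed on the values, so this does not depend on the spacing between them.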
a = np.array([1,2,3])
b = np.array([1,2,3,4])
c = np.array([a, b])
c holds two np.ndarrays of different sizes. When I try to call c.astype(np.int8), I get ValueError: setting an array element with a sequence. How can I change the dtype of c?
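To specify the type of your array at creation, pass dtype=xxx. Because a and b have different lengths, c has to be an object array:
c = np.array([a, b], dtype=object)
If you want to convert from int64 to int8, use astype, which returns a converted copy. (Assigning a.dtype = np.int8 does not convert the values; it reinterprets the underlying bytes and changes the array's length.)
a = a.astype(np.int8)
b = b.astype(np.int8)
Or you can build converted copies of a and b under new names:
a8 = np.array(a, dtype=np.int8)
b8 = np.array(b, dtype=np.int8)
Finally, if you don't have access to a and b but only to c, here is how you can do the same:
c = np.array([arr.astype(np.int8) for arr in c], dtype=object)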
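The difference between converting values and reinterpreting bytes is easy to check. In this small sketch (the variable names are illustrative), view(np.int8) performs the same reinterpretation as assigning np.int8 to .dtype, while astype produces a new array with the same number of elements.

```python
import numpy as np

a64 = np.array([1, 2, 3], dtype=np.int64)

# astype converts each value: still 3 elements, now stored as int8.
converted = a64.astype(np.int8)

# view reinterprets the same 24 bytes as 24 separate int8 values.
reinterpreted = a64.view(np.int8)

print(converted, converted.dtype)  # [1 2 3] int8
print(reinterpreted.shape)         # (24,)
```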
Assuming arr is a numpy array of dtype object containing numpy arrays, you could do:
arr8 = np.array([i.astype('int8') for i in arr], dtype=object)
Demo:
>>> from numpy import array
>>> arr = array([array([0]), array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3]),
...              array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4, 5]),
...              array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2, 3, 4, 5, 6, 7])],
...             dtype=object)
>>> arr
array([array([0]), array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3]),
       array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4, 5]),
       array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2, 3, 4, 5, 6, 7])],
      dtype=object)
>>> np.array([i.astype('int8') for i in arr], dtype=object)
array([array([0], dtype=int8), array([0, 1], dtype=int8),
       array([0, 1, 2], dtype=int8), array([0, 1, 2, 3], dtype=int8),
       array([0, 1, 2, 3, 4], dtype=int8),
       array([0, 1, 2, 3, 4, 5], dtype=int8),
       array([0, 1, 2, 3, 4, 5, 6], dtype=int8),
       array([0, 1, 2, 3, 4, 5, 6, 7], dtype=int8)], dtype=object)
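One caveat with building the result through np.array(..., dtype=object): if all the inner arrays happen to have the same length, NumPy produces a 2-D object array rather than a 1-D array of arrays. A possible way around that, sketched here with a hypothetical helper named cast_each, is to preallocate the object array and fill it element by element.

```python
import numpy as np

def cast_each(arrays, dtype):
    """Return a 1-D object array whose elements are `arrays` cast to `dtype`."""
    out = np.empty(len(arrays), dtype=object)
    for i, sub in enumerate(arrays):
        # Assigning through an integer index stores the whole array as one element.
        out[i] = np.asarray(sub).astype(dtype)
    return out

ragged = cast_each([np.arange(2), np.arange(4)], np.int8)
same_len = cast_each([np.arange(3), np.arange(3)], np.int8)
print(ragged.shape, same_len.shape)  # (2,) (2,)
print(ragged[1].dtype)               # int8
```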
Maybe you could do something like this (here df.desired_column stands for a pandas column whose cells each hold a sequence):
arr = list()
for row in range(len(df.desired_column)):
    arr.append(np.array(df.desired_column.loc[row], dtype=np.int8))
arr = np.array(arr, dtype=object)
This way every element of arr will be a NumPy array with the desired dtype, in this example np.int8.
Given a single integer and a number of bins, how can I split the integer into parts that are as equal as possible?
The sum of the outputs should equal the input integer, e.g.
[in]: x = 20 , num_bins = 3
[out]: (7, 7, 6)
Another example:
[in]: x = 20 , num_bins = 6
[out]: (4, 4, 3, 3, 3, 3)
I've tried this:
x = 20
num_bins = 3
y = [x // num_bins] * num_bins
for i in range(x % num_bins):
    y[i] += 1
It works but there must be a simpler/better way, maybe using bisect or numpy?
Using numpy from https://stackoverflow.com/a/48899071/610569 , I could do this too:
list(map(len, np.array_split(range(x), num_bins)))
But that's a little convoluted: it builds a range to act as a pretend list, splits it, and then takes the length of each piece.
The built-in divmod function could be useful for this.
def near_split(x, num_bins):
    quotient, remainder = divmod(x, num_bins)
    return [quotient + 1] * remainder + [quotient] * (num_bins - remainder)
Demo
In [11]: near_split(20, 3)
Out[11]: [7, 7, 6]
In [12]: near_split(20, 6)
Out[12]: [4, 4, 3, 3, 3, 3]
Update: simplified using integer arithmetic.
Here's a one-liner, where n is the integer to split and k is the number of bins:
np.arange(n+k-1, n-1, -1) // k
Little demo:
>>> for k in range(4, 10, 3):
...     for n in range(10, 17):
...         np.arange(n+k-1, n-1, -1) // k
...
array([3, 3, 2, 2])
array([3, 3, 3, 2])
array([3, 3, 3, 3])
array([4, 3, 3, 3])
array([4, 4, 3, 3])
array([4, 4, 4, 3])
array([4, 4, 4, 4])
array([2, 2, 2, 1, 1, 1, 1])
array([2, 2, 2, 2, 1, 1, 1])
array([2, 2, 2, 2, 2, 1, 1])
array([2, 2, 2, 2, 2, 2, 1])
array([2, 2, 2, 2, 2, 2, 2])
array([3, 2, 2, 2, 2, 2, 2])
array([3, 3, 2, 2, 2, 2, 2])
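As a sanity check, the one-liner can be compared against a divmod-based split over a range of inputs. This is only a sketch; the function names near_split_divmod and near_split_arange are chosen here for illustration.

```python
import numpy as np

def near_split_divmod(n, k):
    # `r` bins get one extra unit, the remaining `k - r` bins get `q`.
    q, r = divmod(n, k)
    return [q + 1] * r + [q] * (k - r)

def near_split_arange(n, k):
    # Floor-divide the k consecutive integers n+k-1, ..., n by k.
    return (np.arange(n + k - 1, n - 1, -1) // k).tolist()

for n in range(0, 50):
    for k in range(1, 12):
        parts = near_split_arange(n, k)
        assert parts == near_split_divmod(n, k)
        # Parts add up to n and differ from each other by at most 1.
        assert sum(parts) == n and max(parts) - min(parts) <= 1

print(near_split_arange(20, 3))  # [7, 7, 6]
print(near_split_arange(20, 6))  # [4, 4, 3, 3, 3, 3]
```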
I have a 1d array with zeros scattered throughout. I would like to create a second array that gives, for each element, the ordinal of the last zero seen so far (that is, a running count of zeros), like so:
>>> a = np.array([1, 0, 3, 2, 0, 3, 5, 8, 0, 7, 12])
>>> foo(a)
[0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
Is there a built-in NumPy function or broadcasting trick to do this without using a for loop or other iterator?
>>> (a == 0).cumsum()
array([0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3])
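The comparison a == 0 yields a boolean array, and cumsum treats True as 1, so the running sum counts the zeros seen up to each position. If the actual index of the most recent zero is wanted instead, one possible variant (not part of the answer above) uses np.maximum.accumulate, with -1 as an assumed placeholder for "no zero yet".

```python
import numpy as np

a = np.array([1, 0, 3, 2, 0, 3, 5, 8, 0, 7, 12])

# Running count of zeros: True counts as 1 in the cumulative sum.
zeros_so_far = (a == 0).cumsum()
print(zeros_so_far.tolist())     # [0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]

# Index of the most recent zero: mark zero positions with their index,
# everything else with -1, then carry the running maximum forward.
marks = np.where(a == 0, np.arange(a.size), -1)
last_zero_index = np.maximum.accumulate(marks)
print(last_zero_index.tolist())  # [-1, 1, 1, 1, 4, 4, 4, 4, 8, 8, 8]
```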
I have an array of values and an array of repeat counts:
>>> x=np.arange(5)
>>> x
array([0, 1, 2, 3, 4])
>>> n=np.random.randint(1,3,5)
>>> n
array([2, 1, 1, 2, 2])
And I do
>>> y=np.array([np.repeat(x[i],n[i]) for i in range(5)], dtype=object)
>>> y
array([array([0, 0]), array([1]), array([2]), array([3, 3]), array([4, 4])], dtype=object)
But I want my result to be array([0, 0, 1, 2, 3, 3, 4, 4]).
How can I do it?
I think this is simpler than you're making it. np.repeat accepts an array of per-element counts (see the np.repeat docs):
>>> x = np.arange(5)
>>> y = np.array([2, 1, 1, 2, 2])
>>> np.repeat(x,y)
array([0, 0, 1, 2, 3, 3, 4, 4])
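If the per-element pieces already exist as a list of arrays, np.concatenate flattens them into the same result. A small sketch, with pieces and flat as illustrative names:

```python
import numpy as np

x = np.arange(5)
n = np.array([2, 1, 1, 2, 2])

# The per-element pieces, kept as a plain list of 1-D arrays.
pieces = [np.repeat(x[i], n[i]) for i in range(len(x))]

# Joining the pieces end to end gives the flat array.
flat = np.concatenate(pieces)
print(flat.tolist())                          # [0, 0, 1, 2, 3, 3, 4, 4]
print(np.array_equal(flat, np.repeat(x, n)))  # True
```

Calling np.repeat(x, n) directly avoids the Python-level loop, so it is the better choice when the pieces are not needed separately.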