Transform an array of arrays into an array of numbers - Python

I have an array of values and an array of repeat counts:
>>> x=np.arange(5)
>>> x
array([0, 1, 2, 3, 4])
>>> n=np.random.randint(1,3,5)
>>> n
array([2, 1, 1, 2, 2])
And I do
>>> y=np.array([np.repeat(x[i],n[i]) for i in range(5)])
>>> y
array([array([0, 0]), array([1]), array([2]), array([3, 3]), array([4, 4])], dtype=object)
But I want my result to be array([0, 0, 1, 2, 3, 3, 4, 4]).
How can I do it?

I think this is simpler than you're making it; np.repeat accepts an array of repeat counts directly:
>>> x = np.arange(5)
>>> y = np.array([2, 1, 1, 2, 2])
>>> np.repeat(x,y)
array([0, 0, 1, 2, 3, 3, 4, 4])
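If you already hold the list of sub-arrays from the question, it can also be flattened after the fact. A minimal sketch, assuming every element is a 1-D array; np.concatenate is just one option here, and the names direct, pieces and flat are only for illustration:

```python
import numpy as np

x = np.arange(5)
n = np.array([2, 1, 1, 2, 2])

# Direct route: repeat each x[i] exactly n[i] times.
direct = np.repeat(x, n)

# After-the-fact route: build the sub-arrays as in the question,
# then join them end to end into one flat array.
pieces = [np.repeat(x[i], n[i]) for i in range(len(x))]
flat = np.concatenate(pieces)
```

Both routes give the same flat array; the direct call avoids the Python-level loop.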

Related

Numpy: copying numpy array at specific indexes

I have arrays like
arr1['a'] = np.array([1, 1, 1])
arr1['b'] = np.array([1, 1, 1])
arr1['c'] = np.array([1, 1, 1])
b_index = [0, 2, 5]
arr2['a'] = np.array([2, 2, 2, 2, 2, 2])
arr2['b'] = np.array([2, 2, 2, 2, 2, 2])
arr2['c'] = np.array([2, 2, 2, 2, 2, 2])
arr2['f'] = np.array([2, 2, 2, 2, 2, 2])
b_index is a list of indexes.
I want to copy the values from arr1 into arr2 at the positions listed in b_index,
so the result should be something like:
arr2['a'] = np.array([1, 2, 1, 2, 2, 1])
arr2['b'] = np.array([1, 2, 1, 2, 2, 1])
arr2['c'] = np.array([1, 2, 1, 2, 2, 1])
arr2['f'] = np.array([2, 2, 2, 2, 2, 2])
I can obviously do this with loops, but I'm not sure that is the right way to do it.
We are talking about roughly 100 columns ('a', 'b', 'c', ...) and around 1 million rows.
One solution, which might not be optimal, is to use advanced array indexing:
In [1]: arr = np.ones((5, 3))
In [2]: arr2 = np.full((5, 5), 2)
In [3]: arr2[:, [1, 2, 4]] = arr
In [4]: arr2
Out[4]:
array([[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1]])
Does that help?
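Applied to the dictionaries in the question, the same indexed assignment can be done one column at a time. A rough sketch, assuming arr1 and arr2 are plain dicts of 1-D arrays and that len(b_index) matches the length of each arr1 column:

```python
import numpy as np

arr1 = {k: np.array([1, 1, 1]) for k in "abc"}
arr2 = {k: np.array([2, 2, 2, 2, 2, 2]) for k in "abcf"}
b_index = [0, 2, 5]

# Only columns present in both dicts are touched; 'f' is left alone.
for key in arr1.keys() & arr2.keys():
    # Advanced indexing on the left-hand side writes in place.
    arr2[key][b_index] = arr1[key]
```

The Python loop runs only once per column (about 100 iterations), while the copy over the rows is vectorized inside NumPy.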

Change dtype of non-square numpy ndarray

a = np.array([1,2,3])
b = np.array([1,2,3,4])
c = np.array([a, b])
c holds two np.ndarrays of different sizes. When I call c.astype(np.int8), I get ValueError: setting an array element with a sequence. How can I change the dtype of c?
To specify the type of your array at creation, simply use dtype=xxx.
Ex:
c = np.array([a,b], dtype=object)
If you want to change the type from int64 to int8, use astype, which returns a converted copy (assigning to a.dtype would reinterpret the raw bytes and change the array's length instead of converting the values):
a = a.astype(np.int8)
b = b.astype(np.int8)
Or you can copy a and b:
c = np.array(a, dtype=np.int8)
d = np.array(b, dtype=np.int8)
Finally, if you don't have access to a and b but only to c, here is how you can do the same:
for i, arr in enumerate(c):
    c[i] = arr.astype(np.int8)
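Assuming arr is a numpy array of dtype object containing numpy arrays of different lengths, you could do (recent NumPy versions need dtype=object to build a ragged array):
arr8 = np.array([i.astype('int8') for i in arr], dtype=object)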
Demo:
arr = array([array([0]), array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3]),
... array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4, 5]),
... array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2, 3, 4, 5, 6, 7])],
... dtype=object)
print(arr)
array([array([0]), array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3]),
array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4, 5]),
array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2, 3, 4, 5, 6, 7])],
dtype=object)
print(np.array([i.astype('int8') for i in arr], dtype=object))
array([array([0], dtype=int8), array([0, 1], dtype=int8),
array([0, 1, 2], dtype=int8), array([0, 1, 2, 3], dtype=int8),
array([0, 1, 2, 3, 4], dtype=int8),
array([0, 1, 2, 3, 4, 5], dtype=int8),
array([0, 1, 2, 3, 4, 5, 6], dtype=int8),
array([0, 1, 2, 3, 4, 5, 6, 7], dtype=int8)], dtype=object)
Maybe you could do something like this (assuming the arrays live in a column of a pandas DataFrame, here df.desired_column):
arr = []
for row in range(len(df.desired_column)):
    arr.append(np.array(df.desired_column.loc[row], dtype=np.int8))
arr = np.array(arr, dtype=object)
This way every element of arr will be a numpy array with the desired dtype. In this example, np.int8.
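To see the difference between converting values and reinterpreting bytes, here is a small sketch. The explicit int64 dtype and the variable names are only for illustration:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)

# astype converts each value and returns a new array; the shape stays (3,).
converted = a.astype(np.int8)

# view re-reads the same 24 bytes as int8, so the shape becomes (24,).
# This is what a raw dtype change amounts to, and is rarely what you want.
reinterpreted = a.view(np.int8)
```

For changing an integer width, astype is the call that keeps the values intact.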

Python Numpy's argsort duplicate issue [duplicate]

Say you have a numpy vector [0,3,1,1,1] and you run argsort:
you will get [0,2,3,4,1], but all the ones are the same!
What I want is an efficient way to shuffle the indices of identical values.
Any idea how to do that without a while loop with two indices on the sorted vector?
numpy.array([0,3,1,1,1]).argsort()
Use lexsort:
np.lexsort((b,a)) means sort by a, then break ties by b (the last key is the primary one).
>>> a
array([0, 3, 1, 1, 1])
>>> b=np.random.random(a.size)
>>> b
array([ 0.00673736, 0.90089115, 0.31407214, 0.24299867, 0.7223546 ])
>>> np.lexsort((b,a))
array([0, 3, 2, 4, 1])
>>> a.argsort()
array([0, 2, 3, 4, 1])
>>> a[[0, 3, 2, 4, 1]]
array([0, 1, 1, 1, 3])
>>> a[[0, 2, 3, 4, 1]]
array([0, 1, 1, 1, 3])
This is a bit of a hack, but if your array contains integers only you could add random values and argsort the result. np.random.rand gives you results in [0, 1) so in this case you're guaranteed to maintain the order for non-identical elements.
>>> import numpy as np
>>> arr = np.array([0,3,1,1,1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 3, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 2, 3, 1])
Here we see index 0 is always first in the argsort result and index 1 is last, but the rest of the results are in a random order.
In general you could generate random values bounded by the smallest positive gap between sorted values, e.g. d = np.diff(np.sort(arr)); d[d > 0].min(), but you might run into precision issues at some point.
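The lexsort idea can be wrapped in a small helper. This is only a sketch: the name argsort_random_ties and the rng parameter are made up for illustration. Because the random key is kept separate from the values, it avoids the precision concern of adding noise.

```python
import numpy as np

def argsort_random_ties(values, rng=None):
    """Return indices that sort `values`, with tied values in random order."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    # One random tie-break key per element.
    tiebreak = rng.random(values.shape[0])
    # lexsort treats the LAST key as primary: sort by values, then tiebreak.
    return np.lexsort((tiebreak, values))

arr = np.array([0, 3, 1, 1, 1])
order = argsort_random_ties(arr, rng=np.random.default_rng(0))
```

Here order always starts with index 0 and ends with index 1, while indices 2, 3 and 4 appear in between in a random order.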

python sparse csr matrix: how to serialize it

I have a csr_matrix, which is constructed as follows:
from scipy.sparse import csr_matrix
import numpy as np
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csr_matrix((data, (row, col)), shape=(3, 3))
Now, to serialize it (and for some other purposes), I want to get the row, col and data information back from matrix "a".
Is there an easy way to achieve this?
Edit: a.data will give me the data, but how do I get the row and col information?
coo format has the values that you want:
In [3]: row = np.array([0, 0, 1, 2, 2, 2])
In [4]: col = np.array([0, 2, 2, 0, 1, 2])
In [5]: data = np.array([1, 2, 3, 4, 5, 6])
In [6]: a = sparse.csr_matrix((data,(row,col)), shape=(3,3))
In [7]: a.data
Out[7]: array([1, 2, 3, 4, 5, 6])
In [8]: a.indices # csr stores the coordinates in indices and indptr
Out[8]: array([0, 2, 2, 0, 1, 2])
In [9]: a.indptr
Out[9]: array([0, 2, 3, 6])
In [10]: ac=a.tocoo()
In [11]: ac.data
Out[11]: array([1, 2, 3, 4, 5, 6])
In [12]: ac.col
Out[12]: array([0, 2, 2, 0, 1, 2])
In [13]: ac.row
Out[13]: array([0, 0, 1, 2, 2, 2])
These values are compatible with the ones you input, but aren't guaranteed to be the same (for example, duplicate entries are summed and the ordering may differ).
In [14]: a.nonzero()
Out[14]: (array([0, 0, 1, 2, 2, 2]), array([0, 2, 2, 0, 1, 2]))
In [17]: a[a.nonzero()].A
Out[17]: array([[1, 2, 3, 4, 5, 6]])
nonzero also returns the coordinates, via the same coo conversion, but it first filters out any explicitly stored zeros.
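For serialization you can also keep just the three CSR arrays and recover the row indices only when needed. A sketch in plain NumPy, using the indptr and indices values shown above; expanding indptr with np.repeat is one way to do it, not necessarily what scipy does internally:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6])
indices = np.array([0, 2, 2, 0, 1, 2])  # column index of each stored value
indptr = np.array([0, 2, 3, 6])         # row i owns data[indptr[i]:indptr[i + 1]]

n_rows = len(indptr) - 1
# Each row index is repeated once per value stored in that row.
row = np.repeat(np.arange(n_rows), np.diff(indptr))
col = indices
```

The triplet (data, (row, col)) can then be passed back to csr_matrix, together with the original shape, to rebuild the matrix.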
