I have arrays like
arr1['a'] = np.array([1, 1, 1])
arr1['b'] = np.array([1, 1, 1])
arr1['c'] = np.array([1, 1, 1])
b_index = [0, 2, 5]
arr2['a'] = np.array([2, 2, 2, 2, 2, 2])
arr2['b'] = np.array([2, 2, 2, 2, 2, 2])
arr2['c'] = np.array([2, 2, 2, 2, 2, 2])
arr2['f'] = np.array([2, 2, 2, 2, 2, 2])
b_index is the list of target indexes.
I want to copy the values from arr1 into arr2 at the positions listed in b_index,
so the result should look like this:
arr2['a'] = np.array([1, 2, 1, 2, 2, 1])
arr2['b'] = np.array([1, 2, 1, 2, 2, 1])
arr2['c'] = np.array([1, 2, 1, 2, 2, 1])
arr2['f'] = np.array([2, 2, 2, 2, 2, 2])
I can obviously do this with loops, but I'm not sure that is the right way to do it.
We are talking about roughly 100 columns ('a', 'b', 'c', ...) and around 1 million rows.
One solution, which might not be optimal, is to use advanced array indexing:
In [1]: arr = np.ones((5, 3))
In [2]: arr2 = np.full((5, 5), 2)
In [3]: arr2[:, [1, 2, 4]] = arr
In [4]: arr2
Out[4]:
array([[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1],
[2, 1, 1, 2, 1]])
Does that help?
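Applied to the dictionary-of-arrays layout from the question, the same idea could look like the sketch below. The names arr1, arr2 and b_index come from the question; matching columns by shared key is an assumption about how the data is organised.

```python
import numpy as np

# Data from the question: three short source columns, four longer target columns.
arr1 = {k: np.array([1, 1, 1]) for k in "abc"}
arr2 = {k: np.array([2, 2, 2, 2, 2, 2]) for k in "abcf"}
b_index = np.array([0, 2, 5])

# One vectorized assignment per column; only keys present in arr1 are touched.
for key, values in arr1.items():
    arr2[key][b_index] = values

print(arr2["a"])  # [1 2 1 2 2 1]
print(arr2["f"])  # [2 2 2 2 2 2]
```

The Python loop only runs once per column (about 100 iterations here), while the copy over the rows inside each iteration is done by NumPy.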
a = np.array([1,2,3])
b = np.array([1,2,3,4])
c = np.array([a, b])
c holds two np.ndarrays of different sizes. When I call c.astype(np.int8), I get "ValueError: setting an array element with a sequence." How can I change the dtype of c?
To specify the type of an array when you create it, pass dtype=xxx.
For example:
c = np.array([a,b], dtype=object)
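If you want to convert the values from int64 to int8, use astype, which returns a converted copy. (Assigning to a.dtype would only reinterpret the existing bytes, so each int64 would turn into eight int8 values.)
a = a.astype(np.int8)
b = b.astype(np.int8)
Or you can build converted copies of a and b:
a8 = np.array(a, dtype=np.int8)
b8 = np.array(b, dtype=np.int8)
Finally, if you don't have access to a and b but only to c, here is how you can do the same:
for i in range(len(c)):
    c[i] = c[i].astype(np.int8)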
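The difference between converting values and reinterpreting bytes is easy to check. The sketch below uses the arrays from the question and assumes int64 inputs (stated explicitly so the result does not depend on the platform default); the names converted, reinterpreted and c8 are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
b = np.array([1, 2, 3, 4], dtype=np.int64)

# astype converts the values and returns a new array.
converted = a.astype(np.int8)
print(converted)            # [1 2 3]

# view reinterprets the same bytes: each int64 becomes eight int8 values.
reinterpreted = a.view(np.int8)
print(reinterpreted.shape)  # (24,)

# A ragged collection needs dtype=object; convert each inner array in turn.
c = np.empty(2, dtype=object)
c[0], c[1] = a, b
c8 = np.empty(len(c), dtype=object)
for i, inner in enumerate(c):
    c8[i] = inner.astype(np.int8)

print([x.dtype for x in c8])  # [dtype('int8'), dtype('int8')]
```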
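Assuming arr is a NumPy array of dtype object containing NumPy arrays of different lengths, you could do the following (recent NumPy versions require dtype=object when building a ragged array):
arr8 = np.array([i.astype('int8') for i in arr], dtype=object)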
Demo:
arr = array([array([0]), array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3]),
... array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4, 5]),
... array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2, 3, 4, 5, 6, 7])],
... dtype=object)
print(arr)
array([array([0]), array([0, 1]), array([0, 1, 2]), array([0, 1, 2, 3]),
array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4, 5]),
array([0, 1, 2, 3, 4, 5, 6]), array([0, 1, 2, 3, 4, 5, 6, 7])],
dtype=object)
print(np.array([i.astype('int8') for i in arr], dtype=object))
array([array([0], dtype=int8), array([0, 1], dtype=int8),
array([0, 1, 2], dtype=int8), array([0, 1, 2, 3], dtype=int8),
array([0, 1, 2, 3, 4], dtype=int8),
array([0, 1, 2, 3, 4, 5], dtype=int8),
array([0, 1, 2, 3, 4, 5, 6], dtype=int8),
array([0, 1, 2, 3, 4, 5, 6, 7], dtype=int8)], dtype=object)
Maybe you could do something like this:
arr = []
for row in range(len(df.desired_column)):
    # iloc is positional, so this works whatever the DataFrame's index is
    arr.append(np.array(df.desired_column.iloc[row], dtype=np.int8))
arr = np.array(arr, dtype=object)  # dtype=object is needed when the rows differ in length
This way every element of arr will be a NumPy array with the desired dtype, np.int8 in this example.
Say you have a NumPy vector [0,3,1,1,1] and you run argsort:
you get [0,2,3,4,1], but all the ones are identical, so the order of indices 2, 3 and 4 carries no information.
What I want is an efficient way to shuffle indices of identical values.
Any idea how to do that without a while loop with two indices on the sorted vector?
numpy.array([0,3,1,1,1]).argsort()
Use lexsort:
np.lexsort((b, a)) means "sort by a, then break ties by b" (the last key is the primary one).
>>> a
array([0, 3, 1, 1, 1])
>>> b=np.random.random(a.size)
>>> b
array([ 0.00673736, 0.90089115, 0.31407214, 0.24299867, 0.7223546 ])
>>> np.lexsort((b,a))
array([0, 3, 2, 4, 1])
>>> a.argsort()
array([0, 2, 3, 4, 1])
>>> a[[0, 3, 2, 4, 1]]
array([0, 1, 1, 1, 3])
>>> a[[0, 2, 3, 4, 1]]
array([0, 1, 1, 1, 3])
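The lexsort approach can be wrapped in a small helper. This is a minimal sketch: argsort_random_ties is a hypothetical name, and using np.random.default_rng in place of np.random.random is a choice made here so the randomness can be seeded.

```python
import numpy as np

def argsort_random_ties(values, rng=None):
    """Argsort `values`, ordering equal elements randomly (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    tiebreak = rng.random(values.size)
    # lexsort treats the LAST key as primary: sort by values, then by tiebreak.
    return np.lexsort((tiebreak, values))

a = np.array([0, 3, 1, 1, 1])
order = argsort_random_ties(a, rng=np.random.default_rng(0))
print(a[order])  # [0 1 1 1 3]
```

Because the random key is only consulted for equal values, this works for floats as well as integers.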
This is a bit of a hack, but if your array contains integers only you could add random values and argsort the result. np.random.rand gives you results in [0, 1) so in this case you're guaranteed to maintain the order for non-identical elements.
>>> import numpy as np
>>> arr = np.array([0,3,1,1,1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 3, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 2, 3, 1])
Here we see index 0 is always first in the argsort result and index 1 is last, but the rest of the results are in a random order.
In general you could generate random values bounded by the smallest nonzero gap between sorted elements (d = np.diff(np.sort(arr)); d[d > 0].min()), but you might run into precision issues at some point.
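A different technique that sidesteps the precision concern entirely, and is not the one described in the answers above, is to shuffle the positions first and then apply a stable sort. Equal values then keep their shuffled relative order. The function name below is hypothetical.

```python
import numpy as np

def argsort_shuffled_ties(values, rng=None):
    """Shuffle-then-stable-sort variant (hypothetical helper, no noise added)."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    perm = rng.permutation(values.size)               # random visiting order
    stable = np.argsort(values[perm], kind="stable")  # ties keep shuffled order
    return perm[stable]                               # map back to original indices

arr = np.array([0, 3, 1, 1, 1])
order = argsort_shuffled_ties(arr, rng=np.random.default_rng(1))
print(arr[order])  # [0 1 1 1 3]
```

Since no values are modified, this works for any dtype that argsort accepts.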
I have a csr_matrix, which is constructed as follows:
from scipy.sparse import csr_matrix
import numpy as np
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csr_matrix((data, (row, col)), shape=(3, 3))
Now, to serialize it (and for some other purposes), I want to get the row, col and data information back out of the matrix "a".
Is there an easy way to do this?
Edit: a.data gives me the data, but how do I get the row and col information?
coo format has the values that you want:
In [3]: row = np.array([0, 0, 1, 2, 2, 2])
In [4]: col = np.array([0, 2, 2, 0, 1, 2])
In [5]: data = np.array([1, 2, 3, 4, 5, 6])
In [6]: a = sparse.csr_matrix((data,(row,col)), shape=(3,3))
In [7]: a.data
Out[7]: array([1, 2, 3, 4, 5, 6])
In [8]: a.indices # csr stores column indices in indices and row boundaries in indptr
Out[8]: array([0, 2, 2, 0, 1, 2])
In [9]: a.indptr
Out[9]: array([0, 2, 3, 6])
In [10]: ac=a.tocoo()
In [11]: ac.data
Out[11]: array([1, 2, 3, 4, 5, 6])
In [12]: ac.col
Out[12]: array([0, 2, 2, 0, 1, 2])
In [13]: ac.row
Out[13]: array([0, 0, 1, 2, 2, 2])
These values are compatible with the ones you input, but aren't guaranteed to be the same.
In [14]: a.nonzero()
Out[14]: (array([0, 0, 1, 2, 2, 2]), array([0, 2, 2, 0, 1, 2]))
In [17]: a[a.nonzero()].A
Out[17]: array([[1, 2, 3, 4, 5, 6]])
nonzero also returns the coordinates, using the same coo conversion, but first it cleans up the data (removing explicitly stored zeros, etc.).
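For the serialization use case, a round trip through the coo triplets might look like the sketch below. The triplets dictionary and the restored name are illustrative assumptions, not part of SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
a = csr_matrix((data, (row, col)), shape=(3, 3))

# Convert to COO to read the (row, col, data) triplets back out.
coo = a.tocoo()
triplets = {"row": coo.row, "col": coo.col, "data": coo.data, "shape": a.shape}

# Rebuild from the triplets and confirm nothing was lost.
restored = csr_matrix(
    (triplets["data"], (triplets["row"], triplets["col"])),
    shape=triplets["shape"],
)
print(np.array_equal(restored.toarray(), a.toarray()))  # True
```

If the goal is simply to write the matrix to disk, scipy.sparse.save_npz and scipy.sparse.load_npz may be enough without extracting the triplets by hand.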