Generate sparse vector - python

I have two lists: one with indexes (unique, obviously) and one with values. Is it possible to convert them into a NumPy array of fixed size efficiently?
indexes = [1,2,6,7]
values = [0.2,0.5,0.6,0.2]
size = 10
What I want to output is:
print(magic_func(indexes,values,size))
array([0, 0.2, 0.5, 0, 0, 0, 0.6, 0.2, 0, 0])

It's easy in two lines, if you want:
In [1]: import numpy as np
In [2]: arr = np.zeros(size)
In [3]: arr[indexes] = values
In [4]: arr
Out[4]: array([ 0. , 0.2, 0.5, 0. , 0. , 0. , 0.6, 0.2, 0. , 0. ])
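The two lines above can be wrapped into the magic_func the question asks for. This is only a minimal sketch: the dtype parameter is an addition for illustration, and the function assumes the indexes are unique and all smaller than size.

```python
import numpy as np

def magic_func(indexes, values, size, dtype=float):
    """Return a dense 1-D array of length `size` with `values` placed at `indexes`."""
    arr = np.zeros(size, dtype=dtype)   # start from all zeros
    arr[indexes] = values               # integer-array indexing fills the given slots
    return arr

indexes = [1, 2, 6, 7]
values = [0.2, 0.5, 0.6, 0.2]
result = magic_func(indexes, values, 10)
print(result)
```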

Related

Numpy covariance command returning matrix with more dimensions than input

I have an arbitrary row vector "u" and an arbitrary matrix "e" as follows:
u = np.resize(np.array([8,3]),[1,2])
e = np.resize(np.array([[2,2,5,5],[1, 6, 7, 4]]),[4,2])
np.cov(u,e)
array([[ 12.5, 0. , 0. , -12.5, 7.5],
[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ],
[-12.5, 0. , 0. , 12.5, -7.5],
[ 7.5, 0. , 0. , -7.5, 4.5]])
The matrix that this returns is 5x5. This is confusing to me because the largest dimension of the inputs is only 4.
Thus, this may be less of a numpy question and more of a math question...not sure...
Please refer to the official NumPy documentation (https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.cov.html) and check whether your usage of numpy.cov is consistent with what you are trying to achieve.
Look at the signature and the description of its first two parameters:
numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)
m : array_like
A 1-D or 2-D array containing multiple variables and observations.
Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.
y : array_like, optional
An additional set of variables and observations. y has the same form as that of m.
Note how m and y are combined, as shown in the last example on that page:
>>> x = [-2.1, -1, 4.3]
>>> y = [3, 1.1, 0.12]
>>> X = np.stack((x, y), axis=0)
>>> print(np.cov(X))
[[ 11.71 -4.286 ]
[ -4.286 2.14413333]]
>>> print(np.cov(x, y))
[[ 11.71 -4.286 ]
[ -4.286 2.14413333]]
>>> print(np.cov(x))
11.71
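Applied to the arrays from the question, this row-wise stacking would explain the 5x5 result: u contributes one row (one variable) and e contributes four, so np.cov sees five variables with two observations each. A small check of that reading, as a sketch (the names stacked, cov_two_args and cov_stacked are introduced here for illustration):

```python
import numpy as np

u = np.resize(np.array([8, 3]), [1, 2])                        # shape (1, 2)
e = np.resize(np.array([[2, 2, 5, 5], [1, 6, 7, 4]]), [4, 2])  # shape (4, 2)

# Stack the rows by hand: 5 rows = 5 variables, 2 observations each.
stacked = np.vstack((u, e))

cov_two_args = np.cov(u, e)     # what the question computed
cov_stacked = np.cov(stacked)   # same data passed as a single argument

print(cov_two_args.shape)
print(np.allclose(cov_two_args, cov_stacked))
```

If each column was meant to be a variable instead, the inputs would need to be arranged accordingly before calling np.cov.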

How to nullify all entries except for argmax?

Assuming I have a matrix / array / list like a=[1,2,3,4,5] and I want to nullify all entries except for the max so it would be a=[0,0,0,0,5].
I'm using b = [val if idx == np.argmax(a) else 0 for idx,val in enumerate(a)] but is there a better (and faster) way (especially for more than 1-dim arrays...)
You can use NumPy for an in-place solution. Note that the method below keeps every occurrence of the max value and sets all other entries to 0.
import numpy as np
a = np.array([1,2,3,4,5])
a[np.where(a != a.max())] = 0
# array([0, 0, 0, 0, 5])
For unique maxima, see @cᴏʟᴅsᴘᴇᴇᴅ's solution.
Rather than masking, you can create an array of zeros and set the right index appropriately.
1-D (optimised) Solution
(Setup) Convert a to a 1D array: a = np.array([1,2,3,4,5]).
To replace just one instance of the max
b = np.zeros_like(a)
i = np.argmax(a)
b[i] = a[i]
To replace all instances of the max
b = np.zeros_like(a)
m = a == a.max()
b[m] = a[m]
N-D solution
np.random.seed(0)
a = np.random.randn(5, 5)
b = np.zeros_like(a)
m = a == a.max(1, keepdims=True)
b[m] = a[m]
b
array([[0. , 0. , 0. , 2.2408932 , 0. ],
[0. , 0.95008842, 0. , 0. , 0. ],
[0. , 1.45427351, 0. , 0. , 0. ],
[0. , 1.49407907, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 2.26975462]])
Works for all instances of max per row.
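If only the single global maximum of an N-D array should survive (rather than one maximum per row), np.argmax on the flattened array can be combined with np.unravel_index. This is a sketch that is not taken from the answers above; with ties, it keeps only the first occurrence.

```python
import numpy as np

np.random.seed(0)
a = np.random.randn(5, 5)

b = np.zeros_like(a)
# argmax works on the flattened array; unravel_index maps it back to N-D coordinates.
idx = np.unravel_index(np.argmax(a), a.shape)
b[idx] = a[idx]
```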

Building NumPy array using values from another array

Consider the following code:
import numpy as np
index_info = np.matrix([[1, 1], [1, 2]])
value = np.matrix([[0.5, 0.5]])
initial = np.zeros((3, 3))
How can I produce a matrix, final, which has the structure of initial with the elements specified by value at the locations specified by index_info WITHOUT a for loop? In this toy example, see below.
final = np.matrix([[0, 0, 0], [0, 0.5, 0.5], [0, 0, 0]])
With a for loop, you can easily loop through all of the indices in index_info and the entries of value, and use them to populate initial and form final. But is there a way to do so with vectorization (no for loop)?
Convert index_info to a tuple and use it to assign:
>>> initial[(*index_info,)]=value
>>> initial
array([[0. , 0. , 0. ],
[0. , 0.5, 0.5],
[0. , 0. , 0. ]])
Please note that use of the matrix class is discouraged. Use ndarray instead.
You can do this with NumPy's array indexing:
>>> initial = np.zeros((3, 3))
>>> row = np.array([1, 1])
>>> col = np.array([1, 2])
>>> final = np.zeros_like(initial)
>>> final[row, col] = [0.5, 0.5]
>>> final
array([[0. , 0. , 0. ],
[0. , 0.5, 0.5],
[0. , 0. , 0. ]])
This is similar to @PaulPanzer's answer, where he unpacks row and col from index_info in one step. In other words:
row, col = (*index_info,)
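Following the advice to prefer ndarray over the matrix class, the same assignment might look as follows. This sketch assumes the first row of index_info holds the row indices and the second row holds the column indices; if index_info instead stored one (row, column) pair per row, tuple(index_info.T) would be needed. In this toy example both readings give the same result.

```python
import numpy as np

index_info = np.array([[1, 1], [1, 2]])   # assumed layout: [row indices, column indices]
value = np.array([0.5, 0.5])
final = np.zeros((3, 3))

# tuple(index_info) yields (array([1, 1]), array([1, 2])), i.e. one index array per axis.
final[tuple(index_info)] = value
print(final)
```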

avoid unnecessary matrix multiplications with sparse dummies

import numpy as np
import pandas as pd
catVar = np.array(list('abcbca')) #categorical independent variable
groupIDs = np.array([10,10,20,20,30,30]) #groups(/strata)
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0]) #'probabilities'
dummies = pd.get_dummies(catVar)
_,idx,tags = np.unique(groupIDs, return_index=1, return_inverse=1)
np.add.reduceat((p * dummies.T).T, idx)[tags]
[[ 0.5 0.5 0. ]
[ 0.5 0.5 0. ]
[ 0. 0.75 0.25]
[ 0. 0.75 0.25]
[ 0. 0. 1. ]
[ 0. 0. 1. ]]
In the last two lines of code, I create a new table holding, per group and per column, the sum of products between p and the dummies. Because my data set is about 0.5m x 4k, this calculation takes quite some time, which I am trying to reduce. My question is whether it would be possible to get the same result when I define my dummies as a sparse matrix,
from scipy import sparse
dumSp = sparse.csc_matrix(dummies)
and whether the output of the above calculation could go directly to a sparse matrix as well.
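One possible direction, offered only as a sketch and not benchmarked on data of that size: since each row has exactly one non-zero dummy, the group sums can be built directly as a sparse matrix from (group, category, p) triplets, without forming the dense dummy table. It relies on scipy.sparse.coo_matrix summing duplicate entries on conversion to CSR; the names cat_codes, group_sums and per_row are introduced here for illustration.

```python
import numpy as np
from scipy import sparse

catVar = np.array(list('abcbca'))                 # categorical independent variable
groupIDs = np.array([10, 10, 20, 20, 30, 30])     # groups (strata)
p = np.array([0.5, 0.5, 0.25, 0.75, 1, 0])        # 'probabilities'

# Integer codes for categories and groups.
cats, cat_codes = np.unique(catVar, return_inverse=True)
groups, tags = np.unique(groupIDs, return_inverse=True)

# Entry (group, category) accumulates p over all rows with that pair;
# duplicates are summed when the COO matrix is converted to CSR.
group_sums = sparse.coo_matrix(
    (p, (tags, cat_codes)), shape=(len(groups), len(cats))
).tocsr()

# Broadcast the per-group sums back to one row per observation (still sparse).
per_row = group_sums[tags]
print(per_row.toarray())
```

Unlike np.add.reduceat, this does not require the rows of each group to be contiguous.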

Numpy: placing values into an 1-of-n array based on indices in another array

Suppose we had two arrays: some values, e.g. array([1.2, 1.4, 1.6]), and some indices, say array([0, 2, 1]). The expected output is the values placed into a bigger array, "addressed" by the indices, so we would get
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
Is there a way to do this without loops, in a nice, fast way?
With
from numpy import zeros, array, r_
a = zeros((3,3))
b = array([0, 2, 1])
vals = array([1.2, 1.4, 1.6])
You just need to index it (with the help of arange or r_):
>>> a[r_[:len(b)], b] = vals
>>> a
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
How do we modify this for higher dimensions? For example, a is a 5x4x3 array and b and vals are 5x4 arrays.
How would the statement a[r_[:len(b)], b] = vals change then?
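One way this could be extended, as a sketch under the assumption that b holds, for each position (i, j), the index along the last axis of a: build broadcastable index grids for the leading axes with np.ogrid and index all three axes at once. The random data is only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.zeros((5, 4, 3))
b = rng.integers(0, 3, size=(5, 4))   # index into the last axis for each (i, j)
vals = rng.random((5, 4)) + 1.0       # strictly positive illustrative values

# Open grids of shape (5, 1) and (1, 4); they broadcast against b's shape (5, 4).
i, j = np.ogrid[:5, :4]
a[i, j, b] = vals
```

np.put_along_axis(a, b[..., None], vals[..., None], axis=-1) should express the same assignment without building the grids explicitly.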
