Python: sample from a scipy sparse matrix

I have a scipy sparse matrix, for example:
import numpy as np
from scipy import sparse
X = sparse.csr_matrix(np.random.randint(0, 10, (100, 10)))
I need to add k rows to this matrix. Each column of these new rows should be obtained by sampling from the same column of the original matrix.
For example, the desired result should be something like:
Z = np.concatenate([X, X_sampled], axis=0)
where X_sampled[:, i] = np.random.choice(X[:, i], k)
How can I do that without moving to a dense matrix?
EDIT: An example with a dense array:
import numpy as np
k = 20
X = np.random.randint(0, 10, (100, 10))
X2 = np.zeros(shape=(k, X.shape[1]), dtype=X.dtype)
# fill each new column by sampling (with replacement) from the same original column
for col_id in range(X.shape[1]):
    X2[:, col_id] = np.random.choice(X[:, col_id], k)
res = np.concatenate([X, X2])
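A minimal sketch of one way to do this without densifying X (not from the original thread; the helper name `append_column_samples` and its `rng` argument are hypothetical). The idea: column j of an n-row matrix is the multiset of its nnz_j stored values plus n - nnz_j implicit zeros. Drawing a uniform position in [0, n) and mapping positions below nnz_j to the stored values (all other positions to 0) gives the same distribution as np.random.choice(X[:, j], k) with replacement. It assumes X has no duplicate entries, which sum_duplicates enforces on a copy.

```python
import numpy as np
from scipy import sparse


def append_column_samples(X, k, rng=None):
    """Hypothetical helper: append k rows to sparse X, where entry (r, j) of each
    new row is drawn uniformly, with replacement, from column j of X."""
    rng = np.random.default_rng(rng)
    Xc = sparse.csc_matrix(X, copy=True)  # column-compressed copy of X
    Xc.sum_duplicates()                   # at most one stored entry per (row, col)
    n_rows, n_cols = Xc.shape
    nnz_per_col = np.diff(Xc.indptr)      # number of stored entries in each column

    # One uniform position per new cell; positions < nnz pick a stored value,
    # positions >= nnz stand for one of the column's implicit zeros.
    pos = rng.integers(0, n_rows, size=(k, n_cols))
    hit = pos < nnz_per_col               # broadcasts over the k new rows
    out_rows, out_cols = np.nonzero(hit)
    values = Xc.data[Xc.indptr[out_cols] + pos[out_rows, out_cols]]

    # Only the nonzero samples are stored in the new block.
    X_sampled = sparse.csr_matrix((values, (out_rows, out_cols)), shape=(k, n_cols))
    return sparse.vstack([X, X_sampled], format="csr")
```

Usage: `Z = append_column_samples(X, 20, rng=0)` returns a 120 × 10 CSR matrix. The only dense allocation is the k × n_cols array of sampled positions, which is the size of the new block itself, not of X.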

Related

Vectors and Matrices from the NumPy Module

In Python, how do I write a program that creates two 4×4 matrices A and B whose elements are random numbers, and then creates a block matrix C that looks like
$$C = \begin{bmatrix} A & B \\ B & A \end{bmatrix}$$
Then find the diagonal of C. The diagonal elements are to be presented in a 4×2 matrix.
import numpy as np
matrix_A = np.random.randint(10, size=(4, 4))
matrix_B = np.random.randint(10, size=(4, 4))
matrix_C = np.array([[matrix_A, matrix_B], [matrix_B, matrix_A]])
d= matrix_C.diagonal()
D=d.reshape(2,4)
print(f'This is matrix C:\n{matrix_C}')
print(f'These are the diagonals of Matrix C:\n{D}')
The construction
matrix_C = np.array([[matrix_A, matrix_B], [matrix_B, matrix_A]])
does not concatenate the matrices; it creates a 4th-order tensor (it puts matrices inside a matrix). You can check that with:
print(matrix_C.shape) # (2, 2, 4, 4)
To lay out the blocks, call np.block; the rest of your code then works as intended:
matrix_C = np.block([[matrix_A, matrix_B], [matrix_B, matrix_A]])
print(matrix_C.shape) # (8, 8)
d = matrix_C.diagonal()
D = d.reshape(2, 4)  # same as np.array([matrix_A.diagonal(), matrix_A.diagonal()])
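A quick self-contained check of the np.block fix. It uses np.random.default_rng instead of np.random.randint, and the final transpose is an assumption about what "a 4×2 matrix" should mean (one column per diagonal block); reshape(2, 4) as above gives one row per block instead.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(10, size=(4, 4))
B = rng.integers(10, size=(4, 4))

# np.block lays the four 4x4 blocks out as a single 8x8 array.
C = np.block([[A, B], [B, A]])

# C's main diagonal passes through the top-left A and then the bottom-right A,
# so it is A's diagonal repeated twice.
d = C.diagonal()
assert np.array_equal(d, np.concatenate([A.diagonal(), A.diagonal()]))

# Assumed reading of "4x2": one column per diagonal block.
D = d.reshape(2, 4).T
print(D.shape)  # (4, 2)
```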

2D indexing of scipy sparse matrix

import numpy as np
import scipy.sparse
x = np.random.randint(0, 1000, (1000, 100))
# there is probably a better way to do this
d = np.random.random((1000,1000))
d[d < 0.99] = 0
y = scipy.sparse.csr_matrix(d)
I would like to create a new matrix z containing the values of y at the column indices given in x,
i.e. z[0, 0] should contain y[0, x[0, 0]], z[0, 1] should contain y[0, x[0, 1]], and in general z[i, j] = y[i, x[i, j]].
%time for i in range(1000): y[i, x[i]].todense()
~247 ms
%time for i in range(1000): np.take(y[i].todense(), x[i])
~150 ms
Both of the above work, but I am looking for a faster method; this is currently the bottleneck in my code.
Please assume that representing the whole scipy.sparse matrix as dense isn't feasible.
Edit:
%time z = np.vstack([q.todense()[0, p] for q, p in zip(y, x)])
takes ~110 ms
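The answer seems to be to use an appropriately shaped broadcasting index, as outlined here: How to generate multi-dimensional 2D numpy index using a sub-index for one dimension (that answer deserves more upvotes!). The row index of shape (1000, 1) broadcasts against x of shape (1000, 100), so entry (i, j) of the result is y[i, x[i, j]]: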
%time res = y[np.arange(0, 1000).reshape((-1, 1)), x].todense()
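A self-contained sketch of the broadcasting index, with sizes taken from the question. As an assumption, scipy.sparse.random stands in for the dense-then-threshold construction (it builds a random sparse matrix directly). The issparse check at the end is defensive: on current SciPy this indexing returns a sparse matrix, but the code does not rely on that.

```python
import numpy as np
import scipy.sparse

rng = np.random.default_rng(0)
n_rows, n_cols, n_idx = 1000, 1000, 100
x = rng.integers(0, n_cols, (n_rows, n_idx))  # column indices to gather, per row
y = scipy.sparse.random(n_rows, n_cols, density=0.01, format="csr")

# rows has shape (n_rows, 1) and broadcasts against x of shape (n_rows, n_idx),
# so element (i, j) of the result is y[i, x[i, j]].
rows = np.arange(n_rows)[:, None]
z = y[rows, x]
z = z.toarray() if scipy.sparse.issparse(z) else np.asarray(z)
```

The result has the shape of x, and only the gathered entries are materialized as a dense array; y itself stays sparse.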

Multiply a 3-D numpy array by a 2-D array

I have two arrays, m0 with shape (10, 3, 3) and m1 with shape (10, 3). With a loop, what I want would be done this way:
import numpy as np
m0 = np.zeros((10, 3, 3))
m1 = np.zeros((10, 3))
a = np.zeros((10, 3))
for i in range(10):
    a += m1 @ m0[i]
The question is: can I achieve the same result using built-in numpy operations?
I think you have two options:
import numpy as np
np.sum(m1 @ m0, axis=0)
or use numpy.einsum:
np.einsum('ij,kjl->il', m1, m0)
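A sketch comparing the loop with both one-liners on random data. It also shows a third option that follows from linearity (not mentioned in the original answer): since every term is m1 @ m0[i], the sum equals m1 @ m0.sum(axis=0), which avoids the (10, 10, 3) intermediate that m1 @ m0 builds.

```python
import numpy as np

rng = np.random.default_rng(0)
m0 = rng.random((10, 3, 3))
m1 = rng.random((10, 3))

# Reference result: the loop from the question.
a_loop = np.zeros((10, 3))
for i in range(m0.shape[0]):
    a_loop += m1 @ m0[i]

# m1 @ m0 multiplies m1 by each of the 10 (3, 3) matrices -> shape (10, 10, 3).
a_matmul = np.sum(m1 @ m0, axis=0)
# Sum over k (the stack axis) and j (the contracted axis).
a_einsum = np.einsum('ij,kjl->il', m1, m0)
# Matrix multiplication is linear, so the stack can be summed first.
a_linear = m1 @ m0.sum(axis=0)
```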

Initialize a numpy sparse matrix efficiently

I have a list of m rows whose values are arrays of column indices, each bounded by a large number n.
E.g.:
Y = [[1,34,203,2032],...,[2984]]
Now I want an efficient way to initialize a sparse matrix X of shape (m, n) whose values correspond to Y (X[i, j] = 1 if j is in Y[i], 0 otherwise).
Your data are already close to CSR format, so I suggest using that:
import numpy as np
from scipy import sparse
from itertools import chain
# create an example
m, n = 20, 10
X = np.random.random((m, n)) < 0.1
Y = [list(np.where(y)[0]) for y in X]
# construct the sparse matrix
indptr = np.fromiter(chain((0,), map(len, Y)), int, len(Y) + 1).cumsum()
indices = np.fromiter(chain.from_iterable(Y), int, indptr[-1])
data = np.ones_like(indices)
S = sparse.csr_matrix((data, indices, indptr), (m, n))
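# or let scipy infer the shape; the column count then becomes max(indices) + 1,
# which is smaller than n if the last columns happen to be empty
S_inferred = sparse.csr_matrix((data, indices, indptr))
# check the explicitly shaped matrix against the dense original
assert np.all(S.toarray() == X)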
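The same CSR construction, wrapped in a hypothetical helper `indicator_csr` (the name is illustrative, not from the answer). It always passes the shape explicitly so empty trailing columns are kept. Storing the ones as int8 to save memory is an assumption about what the values are used for; it also assumes no column index repeats within a row, since CSR would then sum the duplicates to 2.

```python
import numpy as np
from scipy import sparse


def indicator_csr(Y, n):
    """Hypothetical helper: CSR matrix with X[i, j] = 1 iff j is in Y[i]."""
    lengths = np.fromiter((len(r) for r in Y), dtype=np.intp, count=len(Y))
    # indptr[i]:indptr[i + 1] is the slice of `indices` that belongs to row i.
    indptr = np.zeros(len(Y) + 1, dtype=np.intp)
    np.cumsum(lengths, out=indptr[1:])
    # Flatten all column indices in row order.
    indices = np.fromiter((j for r in Y for j in r), dtype=np.intp, count=indptr[-1])
    data = np.ones(indptr[-1], dtype=np.int8)
    return sparse.csr_matrix((data, indices, indptr), shape=(len(Y), n))
```

For example, `indicator_csr([[1, 3], [], [0, 3, 4]], 5).toarray()` gives rows [0, 1, 0, 1, 0], [0, 0, 0, 0, 0] and [1, 0, 0, 1, 1].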

broadcasting a function on a 2-dimensional numpy array

I would like to speed up my code by evaluating a function once on a whole numpy array instead of calling a function from this Python library inside a for loop. Suppose I have the following function:
import numpy as np
import galsim
M200 = 1e14
conc = 6.9
def func(M200, conc):
    halo_z = 0.2
    halo_pos_arcsec = [1200., 3769.7]
    halo_pos = galsim.PositionD(x=halo_pos_arcsec[0], y=halo_pos_arcsec[1])
    nfw = galsim.NFWHalo(mass=M200, conc=conc, redshift=halo_z, halo_pos=halo_pos,
                         omega_m=0.3, omega_lam=0.7)
    # collect the shear at every source position, then combine them
    model_g1 = np.empty(len(shear_z))
    model_g2 = np.empty(len(shear_z))
    for i in range(len(shear_z)):
        shear_pos = galsim.PositionD(x=pos_arcsec[i, 0], y=pos_arcsec[i, 1])
        model_g1[i], model_g2[i] = nfw.getShear(pos=shear_pos, z_s=shear_z[i])
    l = np.sum(model_g1 - model_g2) / np.sqrt(np.pi)
    return l
Here pos_arcsec is a two-dimensional array of shape (24000, 2), and shear_z is a 1-D array with 24000 elements.
The main problem is that I want to evaluate this function on a grid where M200 = np.arange(13., 16., 0.01) and conc = np.arange(3, 10, 0.01). I don't know how to broadcast the function so that it is evaluated over this two-dimensional grid of M200 and conc values, and the code takes a long time to run. I am looking for the best approaches to speed up these calculations.
This should work when pos is an array of shape (n, 2):
import numpy as np
def f(pos, z):
r=np.sqrt(pos[...,0]**2+pos[...,1]**2)
return np.log(r)*(z+1)
Example:
z = np.arange(10)
pos = np.arange(20).reshape(10,2)
f(pos,z)
# array([ 0. , 2.56494936, 5.5703581 , 8.88530251,
# 12.44183436, 16.1944881 , 20.11171117, 24.17053133,
# 28.35353608, 32.64709419])
Use numpy.linalg.norm
If you have an array:
import numpy as np
import numpy.linalg as la
a = np.array([[3, 4], [5, 12], [7, 24]])
then you can compute the magnitude of each row vector, sqrt(x^2 + y^2), with:
b = la.norm(a, axis=1)
>>> print(b)
[ 5. 13. 25.]
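The two answers combine naturally: the radius inside f can be computed with np.linalg.norm over the last axis, which is the same row-wise sqrt(x**2 + y**2). This sketch reuses the example inputs from the f answer above; np.hypot is shown only as a cross-check.

```python
import numpy as np


def f(pos, z):
    # Row-wise Euclidean length of each (x, y) pair, i.e. sqrt(x**2 + y**2).
    r = np.linalg.norm(pos, axis=-1)
    return np.log(r) * (z + 1)


z = np.arange(10)
pos = np.arange(20).reshape(10, 2)
out = f(pos, z)

# Same radii as the explicit formula, computed via np.hypot.
assert np.allclose(np.linalg.norm(pos, axis=-1), np.hypot(pos[..., 0], pos[..., 1]))
```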
