mongodb to python sparse matrix, how to make it faster? - python

I have n documents in MongoDB, each containing a scipy sparse vector stored as a pickle object and initially created with scipy.sparse.lil. The vectors are all the same size, say p x 1.
What I need to do is put all these vectors back into a sparse n x p matrix in Python. I am using mongoengine and have therefore defined a property to load each pickled vector:
class MyClass(Document):
    vector_text = StringField()

    @property
    def vector(self):
        return cPickle.loads(self.vector_text)
Here's what I'm doing now, with n = 4700 and p = 67:
items = MyClass.objects()
M = items[0].vector
for item in items[1:]:
    to_add = item.vector
    M = scipy.sparse.hstack((M, to_add))
The loading part (i.e. calling the property n times) takes about 1.3 s, and the stacking part about 2.7 s. Since n is going to increase seriously in the future (possibly to more than a few hundred thousand), I sense that this is not optimal :)
Any idea how to speed the whole thing up? If you know how to speed up only the "loading" or only the "stacking", I'm happy to hear it. For instance, maybe the solution is to store the entire matrix in MongoDB? Thanks!
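For reference, a minimal sketch of the "store the entire matrix" idea mentioned above, assuming one pre-assembled scipy.sparse matrix M is pickled into a single document; MatrixDoc and matrix_text are hypothetical names mirroring vector_text:
import cPickle
from mongoengine import Document, StringField

class MatrixDoc(Document):
    matrix_text = StringField()

    @property
    def matrix(self):
        return cPickle.loads(self.matrix_text)

# saving, assuming M is the assembled n x p sparse matrix:
# MatrixDoc(matrix_text=cPickle.dumps(M.tocsr())).save()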

First, what you describe would require vstack, not hstack. In any case, your choice of sparse format is part of your performance problem. Try the following:
import scipy.sparse as sps

n, p = 4700, 67
csr_vecs = [sps.rand(1, p, density=0.5, format='csr') for j in xrange(n)]
lil_vecs = [vec.tolil() for vec in csr_vecs]

%timeit sps.vstack(csr_vecs, format='csr')
1 loops, best of 3: 722 ms per loop
%timeit sps.vstack(lil_vecs, format='lil')
1 loops, best of 3: 1.34 s per loop
So there's already a 2x improvement simply from switching to CSR. Furthermore, the stacking functions of scipy.sparse do not seem to be very optimized, and definitely not for sparse vectors. The following two functions stack a list of CSR or LIL vectors, returning a CSR sparse matrix:
import itertools as it
import numpy as np
import scipy.sparse as sps

def csr_stack(vectors):
    data = np.concatenate([vec.data for vec in vectors])
    indices = np.concatenate([vec.indices for vec in vectors])
    indptr = np.cumsum([0] + [vec.nnz for vec in vectors])
    return sps.csr_matrix((data, indices, indptr),
                          shape=(len(vectors), vectors[0].shape[1]))

def lil_stack(vectors):
    indptr = np.cumsum([0] + [vec.nnz for vec in vectors])
    data = np.fromiter(it.chain(*(vec.data[0] for vec in vectors)),
                       dtype=vectors[0].dtype, count=indptr[-1])
    indices = np.fromiter(it.chain(*(vec.rows[0] for vec in vectors)),
                          dtype=np.intp, count=indptr[-1])
    return sps.csr_matrix((data, indices, indptr),
                          shape=(len(vectors), vectors[0].shape[1]))
It works:
>>> np.allclose(sps.vstack(csr_vecs).A, csr_stack(csr_vecs).A)
True
>>> np.allclose(csr_stack(csr_vecs).A, lil_stack(lil_vecs).A)
True
And is substantially faster:
%timeit csr_stack(csr_vecs)
100 loops, best of 3: 11.7 ms per loop
%timeit lil_stack(lil_vecs)
10 loops, best of 3: 37.6 ms per loop
%timeit lil_stack(lil_vecs).tolil()
10 loops, best of 3: 53.6 ms per loop
So, by switching to CSR, you can improve performance by over 100x. If you stick with LIL vectors, your improvement will be around 30x: more if you can live with CSR for the combined matrix, less if you insist on LIL for that too.
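Tying this back to the original loading loop, a minimal usage sketch, assuming each document's vector property still returns a 1 x p LIL vector and csr_stack is defined as above:
items = MyClass.objects()
csr_vecs = [item.vector.tocsr() for item in items]  # convert each unpickled LIL vector to CSR
M = csr_stack(csr_vecs)                             # sparse n x p CSR matrix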

I think you should try using a ListField, which is essentially a Python-list representation of a BSON array, to store your vectors. In that case you won't need to unpickle them every time.
class MyClass(Document):
    vector = ListField()

items = MyClass.objects()
M = items[0].vector
The only problem I see with that solution is that you have to convert the Python lists to a scipy sparse vector type, but I believe that should still be faster.
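As a hedged sketch of that conversion step, assuming each item.vector comes back as a plain Python list of length p, one way to get to a sparse n x p matrix is:
import numpy as np
import scipy.sparse as sps

items = MyClass.objects()
dense = np.array([item.vector for item in items])  # shape (n, p)
M = sps.csr_matrix(dense)                          # sparse n x p matrix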

Related

Numpy matrix-wise dot product [duplicate]

I have two 3d arrays A and B with shape (N, 2, 2) that I would like to multiply along the N axis, taking the matrix product of each pair of 2x2 matrices. With a loop implementation, it looks like
C[i] = dot(A[i], B[i])
Is there a way I could do this without using a loop? I've looked into tensordot, but haven't been able to get it to work. I think I might want something like tensordot(a, b, axes=([1,2], [2,1])) but that's giving me an NxN matrix.
It seems you are doing matrix-multiplications for each slice along the first axis. For the same, you can use np.einsum like so -
np.einsum('ijk,ikl->ijl',A,B)
We can also use np.matmul -
np.matmul(A,B)
On Python 3.x, this matmul operation simplifies with the @ operator -
A @ B
Benchmarking
Approaches -
def einsum_based(A, B):
    return np.einsum('ijk,ikl->ijl', A, B)

def matmul_based(A, B):
    return np.matmul(A, B)

def forloop(A, B):
    N = A.shape[0]
    C = np.zeros((N, 2, 2))
    for i in range(N):
        C[i] = np.dot(A[i], B[i])
    return C
Timings -
In [44]: N = 10000
...: A = np.random.rand(N,2,2)
...: B = np.random.rand(N,2,2)
In [45]: %timeit einsum_based(A,B)
...: %timeit matmul_based(A,B)
...: %timeit forloop(A,B)
100 loops, best of 3: 3.08 ms per loop
100 loops, best of 3: 3.04 ms per loop
100 loops, best of 3: 10.9 ms per loop
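A quick sanity check (not from the original answer) that the three approaches agree, using small placeholder sizes:
import numpy as np

N = 10
A = np.random.rand(N, 2, 2)
B = np.random.rand(N, 2, 2)
ref = np.array([np.dot(A[i], B[i]) for i in range(N)])
assert np.allclose(np.einsum('ijk,ikl->ijl', A, B), ref)
assert np.allclose(np.matmul(A, B), ref)
assert np.allclose(A @ B, ref)  # Python 3.x only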
You just need to perform the operation on the first dimension of your tensors, which is labeled by 0:
c = tensordot(a, b, axes=(0,0))
This will work as you wish. Also you don't need a list of axes, because it's just along one dimension that you're performing the operation. With axes=([1,2],[2,1]) you're cross-multiplying the 2nd and 3rd dimensions. If you write it in index notation (Einstein summation convention) this corresponds to c[i,j] = a[i,k,l]*b[j,k,l], thus you're contracting the indices you want to keep.
EDIT: Ok, the problem is that the tensor product of two 3d objects is a 6d object. Since contractions involve pairs of indices, there's no way you'll get a 3d object from a tensordot operation. The trick is to split your calculation in two: first do the tensordot on the index that performs the matrix multiplication, then take a tensor diagonal in order to reduce your 4d object to 3d. In one command:
d = np.diagonal(np.tensordot(a, b, axes=(2,1)), axis1=0, axis2=2)
In tensor notation d[i,j,k] = c[i,j,i,k] = a[i,j,l]*b[i,l,k].
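A hedged check of that diagonal-of-tensordot trick (sizes are placeholders); note that np.diagonal moves the matched axis to the end, so the result comes back as (2, 2, N) and needs a transpose before comparing with the direct product:
import numpy as np

N = 10
a = np.random.rand(N, 2, 2)
b = np.random.rand(N, 2, 2)
c = np.tensordot(a, b, axes=(2, 1))    # shape (N, 2, N, 2)
d = np.diagonal(c, axis1=0, axis2=2)   # shape (2, 2, N)
assert np.allclose(np.transpose(d, (2, 0, 1)), np.matmul(a, b))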

Fastest way to use Numpy - multi-dimensional sums and products

I have these variables with the following dimensions:
A - (3,)
B - (4,)
X_r - (3,K,N,nS)
X_u - (4,K,N,nS)
k - (K,)
and I want to compute (A.dot(X_r[:,:,n,s])*B.dot(X_u[:,:,n,s])).dot(k) for every possible n and s, the way I am doing it now is the following:
np.array([[(A.dot(X_r[:,:,n,s])*B.dot(X_u[:,:,n,s])).dot(k) for n in xrange(N)] for s in xrange(nS)]) #nSxN
But this is super slow, and I was wondering if there is a better way of doing it.
However, there is another computation I am doing that I am sure can be optimized:
np.sum(np.array([(X_r[:,:,n,s]*B.dot(X_u[:,:,n,s])).dot(k) for n in xrange(N)]),axis=0)
In this one I am creating a numpy array just to sum it along one axis and then discard it. If this were a 1-D list I would use reduce to optimize it; what should I use for numpy arrays?
Using a few np.einsum calls -
# Calculation of A.dot(X_r[:,:,n,s])
p1 = np.einsum('i,ijkl->jkl',A,X_r)
# Calculation of B.dot(X_u[:,:,n,s])
p2 = np.einsum('i,ijkl->jkl',B,X_u)
# Include .dot(k) part to get the final output
out = np.einsum('ijk,i->kj',p1*p2,k)
About the second example, this solves it:
p1 = np.einsum('i,ijkl->jkl',B,X_u)#OUT_DIM - (k,N,nS)
sol = np.einsum('ijkl,j->il',X_r*p1[None,:,:,:],k)#OUT_DIM (3,nS)
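A hedged sanity check of the first einsum solution against the original loop, with small assumed shapes (the dimension names follow the question; the sizes are placeholders):
import numpy as np

K, N, nS = 5, 4, 3
A = np.random.rand(3)
B = np.random.rand(4)
k = np.random.rand(K)
X_r = np.random.rand(3, K, N, nS)
X_u = np.random.rand(4, K, N, nS)
ref = np.array([[(A.dot(X_r[:, :, n, s]) * B.dot(X_u[:, :, n, s])).dot(k)
                 for n in range(N)] for s in range(nS)])   # nS x N
p1 = np.einsum('i,ijkl->jkl', A, X_r)
p2 = np.einsum('i,ijkl->jkl', B, X_u)
out = np.einsum('ijk,i->kj', p1 * p2, k)
assert np.allclose(out, ref)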
You can use dot for multiplication of matrices in higher dimensions but the running indices must be the last two.
When we reorder your matrices
X_r_t = X_r.transpose(2,3,0,1)
X_u_t = X_u.transpose(2,3,0,1)
we obtain for your first expression
res1_imp = (A.dot(X_r_t)*B.dot(X_u_t)).dot(k).T # shape nS x N
and for the second expression
res2_imp = np.sum((X_r_t * B.dot(X_u_t)[:,:,None,:]).dot(k),axis=0)[-1]
Timings
Divakar's solution gives on my computer: 10000 loops, best of 3: 21.7 µs per loop
My solution gives: 10000 loops, best of 3: 101 µs per loop
Edit
My timings above included the computation of both expressions. When I include only the first expression (as Divakar does), I obtain 10000 loops, best of 3: 41 µs per loop, which is still slower but closer to his timings.

Numpy: 2d list min max is slow [duplicate]

I'm completely new to numpy and unable to find a solution.
I have a 2d list of floating point numbers in python like:
list1[0..8][0..2]
Where e.g.:
print(list1[0][0])
> 0.1122233784
Now I want to find min and max values:
b1 = numpy.array(list1)
list1MinX, list1MinY, list1MinZ = b1.min(axis=0)
list1MaxX, list1MaxY, list1MaxZ = b1.max(axis=0)
I need to do this about a million times in a loop.
It works correctly, but it's about 3x slower than my previous native python approach.
(1:15 min[numpy] vs 0:25 min[native])
What am I doing wrong?
I've read that the list conversion could be the problem, but I don't know how to do it better.
EDIT
As requested, here is some non-pseudo code, although in my script the list is created in another way.
import numpy
import random

def moonPositionNow():
    # assume we read from a file, line by line
    # nextChunk = readNextLine()
    # the file is built like this:
    # x-coord
    # y-coord
    # z-coord
    # x-coord
    # ...
    # but we don't have that data here, so as a **placeholder** we return a random number
    nextChunk = random.random()
    return nextChunk

for w in range(1000000):
    list1 = [[moonPositionNow() for i in range(3)] for j in range(9)]
    b1 = numpy.array(list1)
    list1MinX, list1MinY, list1MinZ = b1.min(axis=0)
    list1MaxX, list1MaxY, list1MaxZ = b1.max(axis=0)
    # Print out results
Although the list creation may be a bottleneck here, I guarantee it is not the problem in the original code.
EDIT 2:
Updated the example code to clarify that I don't need a numpy array of random numbers.
Since your data is available as a Python list it seems reasonable to me that a native implementation (which likely calls some optimized C code) could be faster than converting to numpy first and then calling optimized C code.
You basically loop over your data twice: once for converting the python objects to numpy arrays, and once for computing the maximum or minimum.
The native implementation (I assume it is something like calling min/max on the Python list) only needs to loop over the data once.
Furthermore, it seems that numpy's min/max functions are surprisingly slow: https://stackoverflow.com/a/12200671/3005167
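As a hedged illustration of that point, for a tiny 9x3 list it may be faster to stay in plain Python and take per-column min/max directly, avoiding the array conversion entirely (a sketch, not a benchmark):
import random

list1 = [[random.random() for i in range(3)] for j in range(9)]
list1MinX, list1MinY, list1MinZ = (min(row[c] for row in list1) for c in range(3))
list1MaxX, list1MaxY, list1MaxZ = (max(row[c] for row in list1) for c in range(3))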
The problem arises because you are passing a python list to a numpy function. The numpy function is significantly faster if you pass a numpy array as the argument.
#Create numpy numbers
nptest = np.random.uniform(size=(10000, 10))
#Create a native python list
listtest = list(nptest)
#Compare performance
%timeit np.min(nptest, axis=0)
%timeit np.min(listtest, axis=0)
Output
1000 loops, best of 3: 394 µs per loop
100 loops, best of 3: 20 ms per loop
EDIT: Added an example of how to evaluate a cost function over a grid.
The following evaluates a quadratic cost function over a grid and then takes the minimum along the first axis. In particular, np.meshgrid is your friend.
def cost_function(x, y):
    return x ** 2 + y ** 2

x = np.linspace(-1, 1)
y = np.linspace(-1, 1)

def eval_python(x, y):
    matrix = [cost_function(_x, _y) for _x in x for _y in y]
    return np.min(matrix, axis=0)

def eval_numpy(x, y):
    xx, yy = np.meshgrid(x, y)
    matrix = cost_function(xx, yy)
    return np.min(matrix, axis=0)
%timeit eval_python(x, y)
%timeit eval_numpy(x, y)
Output
100 loops, best of 3: 13.9 ms per loop
10000 loops, best of 3: 136 µs per loop
Finally, if you cannot cast your problem in this form, you can preallocate the memory and then fill in each element.
matrix = np.empty((num_x, num_y))
for i in range(num_x):
    for j in range(num_y):
        matrix[i, j] = cost_function(i, j)

How would you efficiently vectorize that kind of operation using numpy?

Input data
Produce n matrices of a given size (here, 3x2). I chose n=25, but I keep n as a variable to emphasize that what we have is a bunch of matrices.
import numpy as np
n = 25
data = np.random.rand(n, 3, 2)
This is just an example of the format: I can't change it. Or if I do, one must take into account the computational cost of the change.
Current implementation
What I want to achieve atomically is:
output = []
for datum in data:  # This outputs one (3x2) matrix after the other
    d0 = datum[0]
    dr = datum[1:]
    output.append(dr - d0)
or, in a faster fashion:
output = [dr-d0 for (dr, d0) in zip(datum[:,0], datum[:,1:])]
Problem
This is too slow and:
output = data[:,1:] - data[:,0]
does not work since the behavior of the subtraction operation is not well defined in that case. Plus, this kind of slicing is not very efficient.
Cython/Nuitka/PyPy and the likes are possible solutions, but I'd like to stick with raw Numpy for now, if possible. Maybe some kind of function that can be applied on elements of the outer loop of a numpy array very quickly without the overhead of python stuff...
The np.vectorize function doesn't work on:
def get_diff(mat):
    return mat[1:] - mat[0]
So I invoke ye, High Priests of Numpy, servants of Python to enlighten my poor soul!
EDIT:
XY Problem
(I didn't know it had a name)
What I actually want to do is to determine the content (read "volume") of a lot of simplices (read "tetrahedra"). The easiest and most efficient way to do it, AFAIK is to calculate:
np.linalg.det(mat[1:] - mat[0])
Then let me rephrase my question: how can I efficiently compute the content of any ensemble of simplices of dimension k using plain python and numpy?
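Not from the original thread, but as a hedged vectorized sketch of that rephrased question: for n simplices with k+1 vertices in k dimensions (an (n, k+1, k) array; names and shapes here are assumptions for illustration), the content is |det(V[1:] - V[0])| / k!, and np.linalg.det already works on stacked matrices:
import numpy as np
from math import factorial

def simplex_contents(verts):
    # verts: (n, k+1, k) array of vertex coordinates
    k = verts.shape[2]
    edges = verts[:, 1:, :] - verts[:, :1, :]         # (n, k, k), via broadcasting
    return np.abs(np.linalg.det(edges)) / factorial(k)

vols = simplex_contents(np.random.rand(25, 4, 3))     # 25 tetrahedra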
I suggest data[:,1:] - data[:,0,None]. The None creates a new axis (officially you're supposed to use np.newaxis, which makes it very clear what you're doing), and then the subtraction will behave the way you want it to.
Correcting what I think are errors in your list comprehension:
def loop(data):
    output = []
    for datum in data:  # This outputs one (3x2) matrix after the other
        d0 = datum[0]
        dr = datum[1:]
        output.append(dr - d0)
    return output

def listcomp(data):
    output = [dr - d0 for (d0, dr) in zip(data[:,0], data[:,1:])]
    return output

def sub(data):
    output = data[:,1:] - data[:,0,None]
    return output
we have
>>> import numpy as np
>>> n = 25
>>> data = np.random.rand(n, 3, 2)
>>> res_loop = loop(data)
>>> res_listcomp = listcomp(data)
>>> res_sub = sub(data)
>>> np.allclose(res_loop, res_listcomp)
True
>>> np.allclose(res_loop, res_sub)
True
>>>
>>> %timeit loop(data)
10000 loops, best of 3: 184 µs per loop
>>> %timeit listcomp(data)
10000 loops, best of 3: 158 µs per loop
>>> %timeit sub(data)
100000 loops, best of 3: 12.8 µs per loop

Fastest Way to generate 1,000,000+ random numbers in python

I am currently writing an app in Python that needs to generate a large amount of random numbers, FAST. Currently I have a scheme going that uses numpy to generate all of the numbers in a giant batch (about ~500,000 at a time). While this seems to be faster than Python's implementation, I still need it to go faster. Any ideas? I'm open to writing it in C and embedding it in the program, or doing whatever it takes.
Constraints on the random numbers:
A Set of 7 numbers that can all have different bounds:
eg: [0-X1, 0-X2, 0-X3, 0-X4, 0-X5, 0-X6, 0-X7]
Currently I am generating a list of 7 numbers with random values from [0-1) then multiplying by [X1..X7]
A Set of 13 numbers that all add up to 1
Currently just generating 13 numbers then dividing by their sum
Any ideas? Would pre-calculating these numbers and storing them in a file make this faster?
Thanks!
You can speed things up a bit from what mtrw posted above just by doing what you initially described (generating a bunch of random numbers and multiplying and dividing accordingly)...
Also, you probably already know this, but be sure to do the operations in-place (*=, /=, +=, etc) when working with large-ish numpy arrays. It makes a huge difference in memory usage with large arrays, and will give a considerable speed increase, too.
In [53]: def rand_row_doubles(row_limits, num):
....: ncols = len(row_limits)
....: x = np.random.random((num, ncols))
....: x *= row_limits
....: return x
....:
In [59]: %timeit rand_row_doubles(np.arange(7) + 1, 1000000)
10 loops, best of 3: 187 ms per loop
As compared to:
In [66]: %timeit ManyRandDoubles(np.arange(7) + 1, 1000000)
1 loops, best of 3: 222 ms per loop
It's not a huge difference, but if you're really worried about speed, it's something.
Just to show that it's correct:
In [68]: x.max(0)
Out[68]:
array([ 0.99999991, 1.99999971, 2.99999737, 3.99999569, 4.99999836,
5.99999114, 6.99999738])
In [69]: x.min(0)
Out[69]:
array([ 4.02099599e-07, 4.41729377e-07, 4.33480302e-08,
7.43497138e-06, 1.28446819e-05, 4.27614385e-07,
1.34106753e-05])
Likewise, for your "rows sum to one" part...
In [70]: def rand_rows_sum_to_one(nrows, ncols):
....: x = np.random.random((ncols, nrows))
....: y = x.sum(axis=0)
....: x /= y
....: return x.T
....:
In [71]: %timeit rand_rows_sum_to_one(1000000, 13)
1 loops, best of 3: 455 ms per loop
In [72]: x = rand_rows_sum_to_one(1000000, 13)
In [73]: x.sum(axis=1)
Out[73]: array([ 1., 1., 1., ..., 1., 1., 1.])
Honestly, even if you re-implement things in C, I'm not sure you'll be able to beat numpy by much on this one... I could be very wrong, though!
EDIT Created functions that return the full set of numbers, not just one row at a time.
EDIT 2 Make the functions more pythonic (and faster), add solution for second question
For the first set of numbers, you might consider numpy.random.randint or numpy.random.uniform, which take low and high parameters. Generating an array of 7 x 1,000,000 numbers in a specified range seems to take < 0.7 second on my 2 GHz machine:
def LimitedRandInts(XLim, N):
    rowlen = (1, N)
    return [np.random.randint(low=0, high=lim, size=rowlen) for lim in XLim]

def LimitedRandDoubles(XLim, N):
    rowlen = (1, N)
    return [np.random.uniform(low=0, high=lim, size=rowlen) for lim in XLim]
>>> import numpy as np
>>> N = 1000000 #number of randoms in each range
>>> xLim = [x*500 for x in range(1,8)] #convenient limit generation
>>> fLim = [x/7.0 for x in range(1,8)]
>>> aa = LimitedRandInts(xLim, N)
>>> ff = LimitedRandDoubles(fLim, N)
This returns integers in [0,xLim-1] or floats in [0,fLim). The integer version took ~0.3 seconds, the double ~0.66, on my 2 GHz single-core machine.
For the second set, I used @Joe Kingston's suggestion.
def SumToOneRands(NumToSum, N):
    aa = np.random.uniform(low=0, high=1.0, size=(NumToSum, N))  # 13 rows by 1000000 columns, for instance
    s = np.reciprocal(aa.sum(0))
    aa *= s
    return aa.T  # get back to column-major order, so aa[k] is the kth set of 13 numbers
>>> ll = SumToOneRands(13, N)
This takes ~1.6 seconds.
In all cases, result[k] gives you the kth set of data.
Try r = 1664525*r + 1013904223
from "an even quicker generator"
in "Numerical Recipes in C" 2nd edition, Press et al., isbn 0521431085, p. 284.
np.random is certainly "more random"; see
Linear congruential generator .
In python, use np.uint32 like this:
python -mtimeit -s '
import numpy as np
r = 1
r = np.array([r], np.uint32)[0] # 316 py -> 16 us np
# python longs can be arbitrarily long, so slow
' '
r = r*1664525 + 1013904223 # NR2 p. 284
'
To generate big blocks at a time:
# initialize --
np.random.seed( ... )
R = np.random.randint( 0, np.iinfo( np.uint32 ).max, size, dtype=np.uint32 )
...
R *= 1664525
R += 1013904223
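As a hedged follow-up tying this to the question's first constraint (the sizes and limits below are placeholders): scale the uint32 state to [0, 1) and multiply by the seven bounds.
import numpy as np

np.random.seed(12345)
size = 7 * 1000000
R = np.random.randint(0, np.iinfo(np.uint32).max, size, dtype=np.uint32)
R *= 1664525
R += 1013904223                                # one vectorized LCG step, wraps mod 2**32
limits = np.arange(1, 8) * 500.0               # placeholder bounds X1..X7
samples = (R.astype(np.float64) / 2**32).reshape(1000000, 7) * limits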
Making your code run in parallel certainly couldn't hurt. Try adapting it for SMP with Parallel Python
As others have already pointed out, numpy is a very good start, fast and easy to use.
If you need random numbers on a massive scale, consider aes-ecb or rc4. Both can be parallelised; you should reach performance of several GB/s.
achievable numbers posted here
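A rough sketch of that idea (assumptions: PyCryptodome is installed, and AES-ECB over a counter is acceptable as a fast, non-cryptographic-quality random stream for this purpose); the key and sizes are placeholders:
import numpy as np
from Crypto.Cipher import AES

key = b'0123456789abcdef'                                 # 16-byte placeholder key
cipher = AES.new(key, AES.MODE_ECB)
counter = np.arange(250000, dtype=np.uint64).tobytes()    # 2,000,000 bytes, a multiple of 16
raw = cipher.encrypt(counter)                             # encrypt the counter blocks
randoms = np.frombuffer(raw, dtype=np.uint32) / 2.0**32   # ~500,000 floats in [0, 1)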
If you have access to multiple cores, the computations can be done in parallel with dask.array:
import dask.array as da
x = da.random.random(size=(rows, cols)).compute()
# .compute is not necessary here, because calculations
# can continue in a lazy form and .compute is used
# on the final result
Here's some Python code you can use to generate one million random numbers, one per line:
import random
for i in range(1000000):
    print(random.randint(1, 1000000))
Just a quick example of numpy in action:
data = numpy.random.rand(1000000)
No need for a loop; you can pass in how many numbers you want to generate.
