python performance bottleneck with lil_matrix

I am currently working with sparse matrices in Python. I chose lil_matrix for my problem because, as explained in the documentation, lil_matrix is intended for constructing a sparse matrix. My sparse matrix has dimensions 2500x2500.
I have two pieces of code inside two loops (which iterate over the matrix elements) that have very different execution times, and I want to understand why. The first one is:
current = lil_matrix_A[i,j]
lil_matrix_A[i, j] = current + 1
lil_matrix_A[j, i] = current + 1
Basically just taking every element of the matrix and incrementing its value by one.
And the second one is as below
value = lil_matrix_A[i, j]
temp = (value * 10000) / (dictionary[listA[i]] * dictionary[listB[j]])
lil_matrix_A[i, j] = temp
lil_matrix_A[j, i] = temp
Basically taking the value, making the calculation of a formula and inserting this new value to the matrix.
The first piece of code runs in around 0.4 seconds and the second takes around 32 seconds.
I understand that the second one has an extra calculation in the middle, but the time difference, in my opinion, does not make sense. Dictionary and list indexing have O(1) complexity, so they should not be a problem. Any suggestions as to what is causing this difference in execution time?
Note: The number of elements in list and dictionary is also 2500.

Related

Mean till i-th element of an Array

I need to calculate the mean of an array (length n), but only up to the i-th element (i <= n), for every i. For example, an array filled with dice rolls:
x = [1, 4, 5, 3, 6, ...]
My current method is to use a loop with numpy.mean and slice the array at each step:
x_mean_ith[0] = x[0]
for i in range(1, n):
    x_mean_ith[i] = np.mean(x[:i + 1])  # mean of x[0] through x[i]
This method is too slow and I need it to be significantly faster. Currently this part of the code takes about 2 minutes when the array is on the order of n = 10^6.
Is there a smarter way to calculate this without it taking so much time? Memory usage is not important.

You could do it using an efficient (vectorized) cumulative sum:
x_mean_ith = np.cumsum(x) / np.arange(1,len(x)+1)
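A minimal sketch of why this works: the mean of the first i+1 elements is their cumulative sum divided by i+1, so one pass replaces n separate slices. The function names and the five sample rolls below are made up for illustration.

```python
import numpy as np

def running_mean_loop(x):
    """Reference version: recompute the mean of x[0..i] for every i."""
    x = np.asarray(x, dtype=float)
    out = np.empty(len(x))
    for i in range(len(x)):
        out[i] = np.mean(x[: i + 1])
    return out

def running_mean_cumsum(x):
    """Vectorized version: cumulative sum divided by the element count so far."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x) / np.arange(1, len(x) + 1)

rolls = [1, 4, 5, 3, 6]  # hypothetical dice rolls
print(running_mean_cumsum(rolls))
print(np.allclose(running_mean_cumsum(rolls), running_mean_loop(rolls)))
```

The loop version does O(n^2) work in total because each slice is summed from scratch, while the cumulative sum does O(n).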

Time Complexity of Spirally Traversing a 2D Matrix?

I am learning how to traverse a 2D matrix spirally, and I came across the following algorithm:
def spiralOrder(self, matrix):
    result = []
    while matrix:
        result.extend(matrix.pop(0))
        # list() is needed on Python 3, where zip returns an iterator
        matrix = list(zip(*matrix))[::-1]
    return result
I am currently having a hard time figuring out the time complexity of this algorithm with the zip function being in the while loop.
It would be greatly appreciated if anyone could help me figure out the time complexity with explanations.
Thank you!

The best achievable time complexity for this problem is O(MxN), where M is the number of rows and N is the number of columns in an MxN matrix, since every element has to be visited once. This is an awesome algorithm but it looks like it might be slower.
Looking at it more closely, with every iteration of the loop you are performing the following operations:
pop(0) # O(r) where r is the number of remaining rows, because list.pop(0) shifts every remaining item
extend() # O(k) where k is the number of elements added in that operation
*matrix # O(r) -> unpacking passes one argument per remaining row
list(zip()) # O(j) where j is the number of remaining elements -> python 2.7 constructs the list automatically and python 3 requires the list construction to run
[::-1] # O(c) where c is the number of rows after the transpose, because the slice copies the list in reverse order
Regardless of how many loop iterations there are, by the time this completes you will have called result.extend on every element (MxN elements) in the matrix. So the best case is O(MxN).
What adds to that is the cost of the repeated zips and list reversals. The loop runs at most 2*min(M, N) times, but the zip/reverse is done on (M-1) * N elements, then on (M-1) * (N-1) elements, and so on. Each of those steps is bounded by MxN, so the total is bounded by O(MxN * min(M, N)), which is O(n^3) for an n x n matrix and therefore slower than the O(MxN) optimum.
https://wiki.python.org/moin/TimeComplexity
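To see how much work the repeated zip calls add, here is a small sketch that runs the same pop/zip/reverse loop while counting the elements that zip has to rebuild. The function name, the counters, and the grid sizes are made up for illustration; this measures element copies, not wall-clock time.

```python
def spiral_with_cost(matrix):
    """Run the pop/zip/reverse spiral and count the elements rebuilt by zip."""
    matrix = [list(row) for row in matrix]  # work on a copy of the input
    result = []
    iterations = 0
    rebuilt = 0
    while matrix:
        result.extend(matrix.pop(0))
        # Every element still in the matrix is copied into new tuples by zip.
        rebuilt += sum(len(row) for row in matrix)
        matrix = list(zip(*matrix))[::-1]
        iterations += 1
    return result, iterations, rebuilt

for n in (10, 20, 40):
    grid = [[r * n + c for c in range(n)] for r in range(n)]
    order, iterations, rebuilt = spiral_with_cost(grid)
    # Columns: n, elements visited, loop iterations, elements rebuilt by zip.
    print(n, len(order), iterations, rebuilt)
```

Doubling n multiplies the visited count by about four, while the rebuilt count grows noticeably faster, which is the extra cost on top of the O(MxN) traversal itself.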

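No matter how you traverse a 2D matrix, you have to visit every element, so the time complexity is at least proportional to the product of the dimensions (quadratic for a square matrix).
An m×n matrix therefore takes O(mn) time to traverse when each element is touched a constant number of times, regardless of whether the order is spiral or row-major.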
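As an illustration of an O(mn) spiral, here is a sketch that walks the matrix by index and shrinks four boundaries instead of rebuilding the matrix. This is a different technique from the pop/zip version in the question; the function and variable names are made up for this example.

```python
def spiral_order_indices(matrix):
    """Clockwise spiral by shrinking four boundaries; each element is read once."""
    result = []
    if not matrix or not matrix[0]:
        return result
    top, bottom = 0, len(matrix) - 1
    left, right = 0, len(matrix[0]) - 1
    while top <= bottom and left <= right:
        for c in range(left, right + 1):          # left to right along the top row
            result.append(matrix[top][c])
        top += 1
        for r in range(top, bottom + 1):          # downwards along the right column
            result.append(matrix[r][right])
        right -= 1
        if top <= bottom:
            for c in range(right, left - 1, -1):  # right to left along the bottom row
                result.append(matrix[bottom][c])
            bottom -= 1
        if left <= right:
            for r in range(bottom, top - 1, -1):  # upwards along the left column
                result.append(matrix[r][left])
            left += 1
    return result

print(spiral_order_indices([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

The input matrix is left unmodified, and no intermediate copies are created.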

Coding an iterated sum of sums in python

For alpha and k fixed integers with i < k also fixed, I am trying to encode a sum of the form
where all the x and y variables are known beforehand. (this is essentially the alpha coordinate of a big iterated matrix-vector multiplication)
For a normal sum over one index, I usually create a 1D array A, set A[i] equal to the i-th entry of the sum, and then use sum(A). In the above instance, however, the entries of the innermost sum depend on the indices of the previous sum, which in turn depend on the indices of the sum before that, all the way back out to the first sum, which prevents me from using this tack in a straightforward manner.
I tried making a 2D array B of appropriate length and width and setting row 0 to be the entries of the innermost sum, then row 1 to be the entries of the next sum times sum(np.transpose(B),0), and so on. But the value of the first sum (of row 0) needs to vary with each entry in row 1, since that sum still has indices dependent on our position in row 1, and so on all the way up to sum k-i.
A sum that allows for a 'variable' filled in by each position of the array it is summing through would thus do the trick, but I can't find anything along these lines in numpy, and my attempts to hack one together have so far failed. My intuition says there is a solution that involves summing along the axes of a (k-i)-dimensional array, but I haven't been able to make this precise yet. Any assistance is greatly appreciated.

One simple attempt to hard-code something like this would be:
for j0 in range(0, n0):
    for j1 in range(0, n1):
        ....
Edit: (a vectorized version)
You could do something like this: (I didn't test it)
temp = np.ones(n[k-i])
for j in range(0, k-i):
    temp = x[:n[k-i-1-j], :n[k-i-j]].T @ (y[:n[k-i-j]] * temp)
result = x[alpha, :n[0]] @ (y[:n[0]] * temp)
The basic idea is that you try to press it into a matrix-vector form. (note that this is python3 syntax)
Edit: You should note that you need to change the "k-1" to where the innermost sum is (I just did it for all sums up to index k-i)

This is 95% identical to @sehigle's answer, but includes a generic N vector:
import numpy as np

def nested_sum(XX, Y, N, alpha):
    # Start from the innermost sum and fold outwards, one level per iteration.
    intermediate = np.ones(N[-1], dtype=XX.dtype)
    for n1, n2 in zip(N[-2::-1], N[:0:-1]):
        intermediate = np.sum(XX[:n1, :n2] * Y[:n2] * intermediate, axis=1)
    return np.sum(XX[alpha, :N[0]] * Y[:N[0]] * intermediate)
Similarly, I have no knowledge of the expression, so I'm not sure how to build appropriate tests. But it runs :\
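One way to build a test is to compare against hard-coded loops on a small case. The sketch below assumes the sum has the form suggested by the code above: each level multiplies XX[previous index, current index] by Y[current index], with the outermost row fixed at alpha. That form is inferred from the answers rather than taken from the original expression, and the three-level depth, the sizes in N, and the random inputs are arbitrary choices.

```python
import numpy as np

def nested_sum(XX, Y, N, alpha):
    """Fold the nested sums from the innermost level outwards."""
    intermediate = np.ones(N[-1], dtype=XX.dtype)
    for n1, n2 in zip(N[-2::-1], N[:0:-1]):
        intermediate = np.sum(XX[:n1, :n2] * Y[:n2] * intermediate, axis=1)
    return np.sum(XX[alpha, :N[0]] * Y[:N[0]] * intermediate)

def brute_force_three_levels(XX, Y, N, alpha):
    """Hard-coded triple loop for the assumed form of the sum (three levels only)."""
    total = 0.0
    for j0 in range(N[0]):
        for j1 in range(N[1]):
            for j2 in range(N[2]):
                total += (XX[alpha, j0] * Y[j0]
                          * XX[j0, j1] * Y[j1]
                          * XX[j1, j2] * Y[j2])
    return total

rng = np.random.default_rng(0)
XX = rng.random((5, 5))
Y = rng.random(5)
N = [3, 4, 2]   # upper limits of the three sums (arbitrary)
alpha = 1

print(np.isclose(nested_sum(XX, Y, N, alpha),
                 brute_force_three_levels(XX, Y, N, alpha)))
```

If the real expression indexes XX the other way round, the slices in nested_sum would need to be transposed accordingly.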

Is MATLAB's bsxfun the best? Python's numpy.einsum?

I have a very large multiply and sum operation that I need to implement as efficiently as possible. The best method I've found so far is bsxfun in MATLAB, where I formulate the problem as:
L = 10000;
x = rand(4,1,L+1);
A_k = rand(4,4,L);
tic
for k = 2:L
    i = 2:k;
    x(:,1,k+1) = x(:,1,k+1)+sum(sum(bsxfun(@times,A_k(:,:,2:k),x(:,1,k+1-i)),2),3);
end
toc
Note that L will be larger in practice. Is there a faster method? It's strange that I need to first add the singleton dimension to x and then sum over it, but I can't get it to work otherwise.
It's still much faster than any other method I've tried, but not enough for our application. I've heard rumors that the Python function numpy.einsum may be more efficient, but I wanted to ask here first before I consider porting my code.
I'm using MATLAB R2017b.

I believe both of your summations can be removed, but I only removed the easier one for the time being. The summation over the second dimension is trivial, since it only affects the A_k array:
B_k = sum(A_k,2);
for k = 2:L
    i = 2:k;
    x(:,1,k+1) = x(:,1,k+1) + sum(bsxfun(@times,B_k(:,1,2:k),x(:,1,k+1-i)),3);
end
With this single change the runtime is reduced from ~8 seconds to ~2.5 seconds on my laptop.
The second summation could also be removed, by transforming times+sum into a matrix-vector product. It needs some singleton fiddling to get the dimensions right, but if you define an auxiliary array that is B_k with the second dimension reversed, you can generate the remaining sum as ~x*C_k with this auxiliary array C_k, give or take a few calls to reshape.
So after a closer look I realized that my original assessment was overly optimistic: you have multiplications in both dimensions in your remaining term, so it's not a simple matrix product. Anyway, we can rewrite that term to be the diagonal of a matrix product. This implies that we're computing a bunch of unnecessary matrix elements, but this still seems to be slightly faster than the bsxfun approach, and we can get rid of your pesky singleton dimension too:
L = 10000;
x = rand(4,L+1);
A_k = rand(4,4,L);
B_k = squeeze(sum(A_k,2)).';
tic
for k = 2:L
    ii = 1:k-1;
    x(:,k+1) = x(:,k+1) + diag(x(:,ii)*B_k(k+1-ii,:));
end
toc
This runs in ~2.2 seconds on my laptop, somewhat faster than the ~2.5 seconds obtained previously.
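For anyone weighing the numpy.einsum route mentioned in the question, here is a minimal NumPy sketch of the same recurrence, with the sum over the second dimension hoisted out of the loop as described in this answer. The function names, the translation to 0-based indices, and the small test sizes are assumptions of this sketch, and it makes no claim about speed relative to MATLAB.

```python
import numpy as np

def recurrence_naive(x, A):
    """Explicit loops over the same sum, using 0-based indices."""
    x = x.copy()
    rows, L = A.shape[0], A.shape[2]
    for k in range(2, L + 1):
        for a in range(rows):
            total = 0.0
            for i in range(2, k + 1):
                # Sum of row a of slice i-1 of A, times an earlier value of x.
                total += A[a, :, i - 1].sum() * x[a, k - i]
            x[a, k] += total
    return x

def recurrence_einsum(x, A):
    """Same recurrence with the inner sums expressed through einsum."""
    x = x.copy()
    L = A.shape[2]
    B = A.sum(axis=1)  # sum over the second dimension once, shape (rows, L)
    for k in range(2, L + 1):
        # Pair B[:, 1], ..., B[:, k-1] with x[:, k-2], ..., x[:, 0] and sum per row.
        x[:, k] += np.einsum("am,am->a", B[:, 1:k], x[:, k - 2::-1])
    return x

rng = np.random.default_rng(0)
L = 6
x0 = rng.random((4, L + 1))
A = rng.random((4, 4, L))
print(np.allclose(recurrence_naive(x0, A), recurrence_einsum(x0, A)))
```

The loop over k remains because each new column depends on columns computed earlier, so einsum only replaces the multiply-and-sum inside each step.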

Since you're using a new version of MATLAB, you might try broadcasting / implicit expansion instead of bsxfun:
x(:,1,k+1) = x(:,1,k+1)+sum(sum(A_k(:,:,2:k).*x(:,1,k-1:-1:1),3),2);
I also changed the order of summation and removed the i variable for further improvement. On my machine, and with Matlab R2017b, this was about 25% faster for L = 10000.

Iterating over elements, finding minima per each element

First time posting, so I apologize for any confusion.
I have two numpy arrays which are time stamps for a signal.
chan1,chan2 looks like:
911.05, 7.7
1055.6, 455.0
1513.4, 1368.15
4604.6, 3004.4
4970.35, 3344.25
13998.25, 4029.9
15008.7, 6310.15
15757.35, 7309.75
16244.2, 8696.1
16554.65, 9940.0
..., ...
and so on (up to 65000 elements per channel, per file).
Edit: The lists are already sorted, but the issue is that they are not always equally spaced. Gaps can show up, which would misalign them, so chan1[3] could be closest to chan2[23] instead of chan2[2], chan2[3] or chan2[4], as it would be if the spacing were equal. End edit.
For each element in chan1, I am interested in finding the closest neighbor in chan2, which is done with:
res[i] = np.min(np.abs(chan2-chan1[i]))
and to keep track of whether the difference is positive or negative:
index = np.where(np.abs(chan2-chan1[i]) == res[i])[0][0]
if chan2[index]-chan1[i] < 0.0: res[i] = res[i]*(-1.0)
Lastly, I create a histogram of all the differences, in a range I am interested in.
My concern is that I do this in a for loop. I usually try to avoid for loops when I can by utilizing numpy arrays, as each operation can be performed on the entire array. However, in this case I am unable to find a solution or a built-in function (which I understand would run significantly faster than anything I can write).
The routine takes about 0.03 seconds per file. There are a few more things happening outside of the function but not a significant number, mostly plotting after everything is done, and a loop to read in files.
I was wondering if anyone has seen a similar problem, or is familiar enough with the Python libraries to suggest a solution (maybe a built-in function?) to obtain the data I am interested in. I have to go over hundreds of thousands of files, and currently my data analysis is about 10 times slower than data acquisition. We are also in the middle of upgrading our instruments to where we will be able to obtain data 10-100 times faster, so the analysis speed is going to become a serious issue.
I would prefer not to use a cluster to brute force the problem, and I am not too familiar with parallel processing, although I would not mind dabbling in it. It would take me a while to write it in C, and I am not sure if I would be able to make it faster.
Thank you in advance for your help.
def gen_hist(chan1, chan2):
    # time_range and interval are defined elsewhere in the script
    res = np.arange(1, len(chan1) + 1, 1) * 0.0
    for i in range(len(chan1)):
        res[i] = np.min(np.abs(chan2 - chan1[i]))
        index = np.where(np.abs(chan2 - chan1[i]) == res[i])[0][0]
        if chan2[index] - chan1[i] < 0.0:
            res[i] = res[i] * (-1.0)
    return np.histogram(res, bins=np.arange(time_range[0] - interval,
                                            time_range[-1] + interval,
                                            interval))[0]
After all the files are cycled through I obtain a plot of the data:
Example of the histogram

Your question is a little vague, but I'm assuming that, given two sorted arrays, you're trying to return an array containing the differences between each element of the first array and the closest value in the second array.
Your algorithm will have a worst case of O(n^2) (np.where() and np.min() are O(n)). I would tackle this by using two iterators instead of one. You store the previous (r_p) and current (r_c) value of the right array and the current (l_c) value of the left array. For each value of the left array, increment the right array until r_c > l_c. Then append min(abs(r_p - l_c), abs(r_c - l_c)) to your result.
In code:
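l = [ ... ]  # left array (sorted)
r = [ ... ]  # right array (sorted)
i = 0
j = 0
result = []
r_p = r_c = r[0]
while i < len(l):
    l_c = l[i]
    # advance the right array until r_c >= l_c, without running past its end
    while r_c < l_c and j + 1 < len(r):
        j += 1
        r_c = r[j]
        r_p = r[j-1]
    result.append(min(abs(r_c - l_c), abs(r_p - l_c)))
    i += 1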
This runs in O(n). If you need additional speed out of it, try writing it in C or running it in Cython.
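If staying in NumPy is preferred over C or Cython, a vectorized alternative is a binary search with np.searchsorted, which is O(n log n) rather than O(n) but has no Python-level loop. This is a different technique from the two-pointer walk above. The function name is made up, and the sketch assumes chan2 is sorted and that the sign convention is chan2[nearest] - chan1, as in the question's code.

```python
import numpy as np

def signed_nearest_diff(chan1, chan2):
    """For each chan1 value, return chan2[nearest] - chan1 (chan2 must be sorted)."""
    chan1 = np.asarray(chan1, dtype=float)
    chan2 = np.asarray(chan2, dtype=float)
    # Index of the first chan2 element that is >= each chan1 value.
    idx = np.searchsorted(chan2, chan1)
    # Clip so that both candidate neighbours are valid indices at the ends.
    right = np.clip(idx, 0, len(chan2) - 1)
    left = np.clip(idx - 1, 0, len(chan2) - 1)
    d_right = chan2[right] - chan1
    d_left = chan2[left] - chan1
    # Keep the closer neighbour; on a tie the left (earlier) one wins.
    return np.where(np.abs(d_left) <= np.abs(d_right), d_left, d_right)

chan1 = np.array([911.05, 1055.6, 1513.4])
chan2 = np.array([7.7, 455.0, 1368.15, 3004.4])
print(signed_nearest_diff(chan1, chan2))
```

The returned array can be passed straight to np.histogram in place of res.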
