I recently came across this for loop code in MATLAB that confused me because the inverse loop do the same thing faster. why this happens?
clear all
a = rand(1000,1000);
b = rand(1000,1000);
for i=1:1000
for j=1:1000
c(i,j) = a(i,j) + b(i,j);
end
end
and the same code with inverse loop:
clear all
a = rand(1000,1000);
b = rand(1000,1000);
for i=1000:-1:1
for j=1000:-1:1
c(i,j) = a(i,j) + b(i,j);
end
end
i do the same in python with range(1000,1,-1) and found the same result(the inverse loop is still faster).
Since you did not preallocate your output variable c when you go in the reverse order c is initially preallocated to a 1000 x 1000 matrix after the first for loop iteration. When you count up c increases in size each loop which requires reallocation of memory on each iteration and thus is slower. Matlab will show this as a warning if you have them turned on.
The inverse loop is faster, because the first iteration (c(1000,1000)=..) creates an array of size 1000x1000 while first piece of code continuously increases the size of the variable c.
To avoid such problems, preallocated the variables you write in loops. Insert a c=zeros(1000,1000) and both versions run fast. Your matlab editor shows you warnings (yellow lines) which indicate potential performance problems and other problems with your code. Read these messages!
Related
I am currently working with sparse matrix in python. I choose to use lil_matrix for my problem because as explained in the documentation lil_matrix are intended to be used for constructing a sparse matrix. My sparse matrix has dimensions 2500x2500
I have two piece of code inside two loops (which iterate in the matrix elements) which are having different execution time and I want to understand why. The first one is
current = lil_matrix_A[i,j]
lil_matrix_A[i, j] = current + 1
lil_matrix_A[j, i] = current + 1
Basically just taking every element of the matrix and incrementing its value by one.
And the second one is as below
value = lil_matrix_A[i, j]
temp = (value * 10000) / (dictionary[listA[i]] * dictionary[listB[j]])
lil_matrix_A[i, j] = temp
lil_matrix_A[j, i] = temp
Basically taking the value, making the calculation of a formula and inserting this new value to the matrix.
The first code is executed for around 0.4 seconds and the second piece of code is executed for around 32 seconds.
I understand that the second one has an extra calculation in the middle, but the time difference, in my opinion, does not make sense. The dictionary and list indexing have O(1) complexity so it is not supposed to be a problem. Is there any suggestion what it is causing this difference in execution time?
Note: The number of elements in list and dictionary is also 2500.
Problem
The code performs geostatistic interpolation by applying kriging. For small data size, it works great. However, when the data is large, the computational time increases drastically.
Constants
c1, c2, c3, c4 are constants
matrx is a dataset of size 6000 x 6000
Variables
data is a 6000 x 3 array
gdata is a 10000 x 2 array
Code
An extract of the code where I am having the problem is below:
prediction = []
for i, dummy_val in range(10000):
semivariance = []
for j in range(len(data[:, 2])):
distance = np.sqrt((gdata[i, 0]-data[j, 0])**2 + (gdata[i, 1]-data[j, 1])**2)
semivariance.append((c1 + c2*(1-np.exp(-(distance/c3)**c4))))
semivariance.append(1)
iweights = np.linalg.lstsq(matrx, semivariance, rcond=None)
weights = iweights[:-3][0][:-1]
prediction.append(np.sum(data[:, 2]*weights))
When I debug the code, I realize that the problem comes from the
iweights = np.linalg.lstsq(matrx, semivariance, rcond=None)
which runs very slow for the large matrx and semivariance array that I am using.
Is there a pythonic way to help improve the computational speed or a way I could rewrite the entire block of code to improve the speed?
Are you using the MKL library binding to numpy? If not you can try it and check if it affect the performance of your code.
Also for the prediction and semivariance lists you are creating empty lists for both of them, then append a value for them on each iteration. As per my understanding, the number of iterations is fixed in your code, so will the code be faster if you created the lists with full size from the start to avoid dynamic creation of new list with every append? I don't know if the interpreter is smart enough to detect that the size of the list is fixed to create it and avoid the reallocation withou your help.
I have a very large multiply and sum operation that I need to implement as efficiently as possible. The best method I've found so far is bsxfun in MATLAB, where I formulate the problem as:
L = 10000;
x = rand(4,1,L+1);
A_k = rand(4,4,L);
tic
for k = 2:L
i = 2:k;
x(:,1,k+1) = x(:,1,k+1)+sum(sum(bsxfun(#times,A_k(:,:,2:k),x(:,1,k+1-i)),2),3);
end
toc
Note that L will be larger in practice. Is there a faster method? It's strange that I need to first add the singleton dimension to x and then sum over it, but I can't get it to work otherwise.
It's still much faster than any other method I've tried, but not enough for our application. I've heard rumors that the Python function numpy.einsum may be more efficient, but I wanted to ask here first before I consider porting my code.
I'm using MATLAB R2017b.
I believe both of your summations can be removed, but I only removed the easier one for the time being. The summation over the second dimension is trivial, since it only affects the A_k array:
B_k = sum(A_k,2);
for k = 2:L
i = 2:k;
x(:,1,k+1) = x(:,1,k+1) + sum(bsxfun(#times,B_k(:,1,2:k),x(:,1,k+1-i)),3);
end
With this single change the runtime is reduced from ~8 seconds to ~2.5 seconds on my laptop.
The second summation could also be removed, by transforming times+sum into a matrix-vector product. It needs some singleton fiddling to get the dimensions right, but if you define an auxiliary array that is B_k with the second dimension reversed, you can generate the remaining sum as ~x*C_k with this auxiliary array C_k, give or take a few calls to reshape.
So after a closer look I realized that my original assessment was overly optimistic: you have multiplications in both dimensions in your remaining term, so it's not a simple matrix product. Anyway, we can rewrite that term to be the diagonal of a matrix product. This implies that we're computing a bunch of unnecessary matrix elements, but this still seems to be slightly faster than the bsxfun approach, and we can get rid of your pesky singleton dimension too:
L = 10000;
x = rand(4,L+1);
A_k = rand(4,4,L);
B_k = squeeze(sum(A_k,2)).';
tic
for k = 2:L
ii = 1:k-1;
x(:,k+1) = x(:,k+1) + diag(x(:,ii)*B_k(k+1-ii,:));
end
toc
This runs in ~2.2 seconds on my laptop, somewhat faster than the ~2.5 seconds obtained previously.
Since you're using an new version of Matlab you might try broadcasting / implicit expansion instead of bsxfun:
x(:,1,k+1) = x(:,1,k+1)+sum(sum(A_k(:,:,2:k).*x(:,1,k-1:-1:1),3),2);
I also changed the order of summation and removed the i variable for further improvement. On my machine, and with Matlab R2017b, this was about 25% faster for L = 10000.
My timelines are stored in simple numpy Arrays, and they are long (>10 Million entrys)
I have to detect machine shutdowns, that show in jumps in the time vector . After that shutdown I want do delete the next 10 values (The sensors do give bad results for a while after being switched on) and continue.
I came up with the following code:
Keep_data=np.empty_like(Timestamp_new,dtype=np.bool)
Keep_data[0]=False
Keep_data[1:]=Timestamp_new[1:]>(Timestamp_new[:-1]+min_shutdown_length)
for item in np.nonzero(np.logical_not(Keep_data))[0]:
Keep_data[item:min(item+10,len(Keep_data)]=False
Timestampnew=Timestampnew[Keep_data]
Can anyone suggest a more effective code, without a pure python Loop?
Thank you.
Basically you are trying to spread/grow or in image-processing terms dilate the False regions. For the same, we have a built-in as scipy's binary_dilation. Now, you are trying to make it grow starting from each such False element in input array Keep_data towards higher indices. So, we need to use a different offset (or as scipy calls it : origin) than the default one as 0, which otherwise would have dilated across both ends for each element.
Thus, to sum up, an implementation with it to get rid of the loopy portion of the code, we would have an implementation like so -
N = 10 # Interval length
dilated_mask = binary_dilation(~Keep_data, structure=np.ones(N),origin=-int(N/2))
Keep_data[dilated_mask] = False
An alternative approach that would be closer to the one posted as the loopy code in the question, but vectorized with NumPy's broadcasting feature, would look something like this -
N = 10 # Interval length
idx = np.nonzero(np.logical_not(Keep_data[:-N]))[0]
Keep_datac[(idx + np.arange(N)[:,None]).ravel()] = False
rest = np.nonzero(np.logical_not(Keep_data[-N:]))[0]
if len(rest)>0:
Keep_datac[-N+rest[0]:] = False
First time posting, so I apologize for any confusion.
I have two numpy arrays which are time stamps for a signal.
chan1,chan2 looks like:
911.05, 7.7
1055.6, 455.0
1513.4, 1368.15
4604.6, 3004.4
4970.35, 3344.25
13998.25, 4029.9
15008.7, 6310.15
15757.35, 7309.75
16244.2, 8696.1
16554.65, 9940.0
..., ...
and so on, (up to 65000 elements per chan. pre file)
Edit : The lists are already sorted but the issue is that they are not always equal in spacing. There are gaps that could show up, which would misalign them, so chan1[3] could be closer to chan2[23] instead of, if the spacing was qual chan2[2 or 3 or 4] : End edit
For each elements in chan1, I am interested in finding the closest neighbor in chan2, which is done with:
$ np.min(np.abs(chan2-chan1[i]))
and to keep track of positive or neg. difference:
$ index=np.where( np.abs( chan2-chan1[i]) == res[i])[0][0]
$ if chan2[index]-chan1[i] <0.0 : res[i]=res[i]*(-1.0)
Lastly, I create a histogram of all the differences, in a range I am interested in.
My concern is that I do this in the for loop. I usually try to avoid for loops when I can by utilizing the numpy arrays, as each operation can be performed on the entire array. However, in this case I am unable to find a solution or a build in function (which I understand run significantly faster than anything I can make).
The routine takes about 0.03 seconds per file. There are a few more things happening outside of the function but not a significant number, mostly plotting after everything is done, and a loop to read in files.
I was wondering if anyone has seen a similar problem, or is familiar enough with the python libraries to suggest a solution (maybe a build in function?) to obtain the data I am interested in? I have to go over hundred of thousands of files, and currently my data analysis is about 10 slower than data acquisition. We are also in the middle of upgrading our instruments to where we will be able to obtain data 10-100 times faster, and so the analysis speed is going to become an serious issue.
I would prefer not to use a cluster to brute force the problem, and not too familiar with parallel processing, although I would not mind dabbling in it. It would take me a while to write it in C, and I am not sure if I would be able to make it faster.
Thank you in advance for your help.
def gen_hist(chan1,chan2):
res=np.arange(1,len(chan1)+1,1)*0.0
for i in range(len(chan1)):
res[i]=np.min(np.abs(chan2-chan1[i]))
index=np.where( np.abs( chan2-chan1[i]) == res[i])[0][0]
if chan2[index]-chan1[i] <0.0 : res[i]=res[i]*(-1.0)
return np.histogram(res,bins=np.arange(time_range[0]-interval,\
time_range[-1]+interval,\
interval))[0]
After all the files are cycled through I obtain a plot of the data:
Example of the histogram
Your question is a little vague, but I'm assuming that, given two sorted arrays, you're trying to return an array containing the differences between each element of the first array and the closest value in the second array.
Your algorithm will have a worst case of O(n^2) (np.where() and np.min() are O(n)). I would tackle this by using two iterators instead of one. You store the previous (r_p) and current (r_c) value of the right array and the current (l_c) value of the left array. For each value of the left array, increment the right array until r_c > l_c. Then append min(abs(r_p - l_c), abs(r_c - l_c)) to your result.
In code:
l = [ ... ]
r = [ ... ]
i = 0
j = 0
result = []
r_p = r_c = r[0]
while i < len(l):
l_c = l[i]
while r_c < l and j < len(r):
j += 1
r_c = r[j]
r_p = r[j-1]
result.append(min(abs(r_c - l_c), abs(r_p - l_c)))
i += 1
This runs in O(n). If you need additional speed out of it, try writing it in C or running it in Cython.