Python Nested Loops to get a subcovariance matrix optimizations - python

I am trying to optimize a function that returns a subcovariance matrix from the full covariance matrix, given the desired members. The full covariance matrix could contain 500+ items, and I could be looking at a variable number of members each time, but likely 20 or fewer per call. This gets called 10,000+ times.
My code works but I was wondering how to optimize it.
def subcovar(covar,elements):
    numelements = elements.shape[0]
    subcovar = np.zeros((numelements,numelements))
    for i in range(0,numelements):
        for j in range(0,numelements):
            subcovar[i,j]= covar[elements[i],elements[j]]
    return subcovar
Thanks
Paul

Why not use matrix slicing? [official Python docs]
Here's an SO thread on building a sub-matrix by extracting arbitrary (i.e. non-sequential) rows and columns:
Slicing of a numpy 2d array, or how do I extract an mxm submatrix from an nxn array (n>m)

Based on Mike's response I went back, looked at slicing and found the following:
Old way time = 0.016201255250881966 sec, choosing a 20x20 from a 500x500
New way time = 0.0016199544633708396 sec, choosing a 20x20 from a 500x500
10x faster
Old way time = 0.09903732167528723 sec, choosing a 50x50 from a 500x500
New way time = 0.00229222701258387 sec, choosing a 50x50 from a 500x500
43x faster
Old way time = 2.669313751708479 sec, choosing a 250x250 from a 500x500
New way time = 0.003080821529599664 sec, choosing a 250x250 from a 500x500
866x faster
Old way:
import time
import numpy as np
def subcovarold(covar,elements):
    start2 = time.perf_counter() # time.clock() was removed in Python 3.8
    # the arguments are overwritten with benchmark data
    covar = np.arange(250000).reshape((500, 500))
    elements = np.arange(0,500,25) # these are the elements to choose
    numelements = elements.shape[0]
    subcovar= np.zeros((numelements,numelements))
    for i in range(0,numelements):
        for j in range(0,numelements):
            subcovar[i,j]= covar[elements[i],elements[j]]
    end2 = time.perf_counter()
    print ("time ", end2 - start2)
    return subcovar
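New way:
def subcovarnew(covar,elements):
    start2 = time.perf_counter() # time.clock() was removed in Python 3.8
    # the arguments are overwritten with benchmark data
    covar = np.arange(250000).reshape((500, 500))
    elements = np.arange(0,500,2)
    msize = elements.shape[0]
    ii = elements.reshape(msize,1) # column of row indices
    jj = elements.reshape(1,msize) # row of column indices
    subcovar = covar[ii,jj] # the two index arrays broadcast to msize x msize
    end2 = time.perf_counter()
    print ("time ", end2 - start2)
    return subcovar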
Thanks
Paul
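For reference, the reshape-and-index approach above can also be written with np.ix_, which builds the same pair of broadcastable index arrays. The sketch below is a minimal illustration, not the poster's code: the name subcovar_ix and the small 5x5 test matrix are made up for the example.

```python
import numpy as np

def subcovar_ix(covar, elements):
    # np.ix_ turns the 1-D index array into a column and a row of indices,
    # so the result is covar[elements[i], elements[j]] for all i, j.
    return covar[np.ix_(elements, elements)]

# Small illustrative data (hypothetical, not from the original post)
covar = np.arange(25.0).reshape(5, 5)
elements = np.array([0, 2, 4])
sub = subcovar_ix(covar, elements)

# Reference result from the original nested-loop version
expected = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        expected[i, j] = covar[elements[i], elements[j]]
```

Both versions should return the same array; the indexed one avoids the Python-level double loop.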

Related

Speeding up a large number of small matrix multiplications

I have a large number of 2x2 matrices that I need multiplied together.
I can initiate the general shape of the problem as
import numpy as np
import time
A_dim = 6*6
B_dim = 2**8
C_dim = B_dim
A = np.random.rand(A_dim,A_dim,2,2)
B = np.random.rand(B_dim,2,2)
C = np.random.rand(C_dim,2,2)
tic = time.perf_counter()
X = A[None,None,:,:,:,:] @ B[:,None,None,None,:,:] @ A[None,None,:,:,:,:] @ C[None,:,None,None,:,:]
toc = time.perf_counter()
print(f"matrix multiplication took {toc - tic:0.4f} seconds")
where I get
matrix multiplication took 14.4403 seconds
This seems to be a vectorized implementation, but is there anything I can do to speed this up? My native Numpy library runs this on only one core, so possibly if I figured out how to get OpenBLAS to work, this would be faster? The problem is that each matrix operation takes a very small amount of time to complete. Is there a better way to construct such a multidimensional array?
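One thing that may be worth trying is writing the same product with np.einsum, which contracts the 2x2 indices directly instead of dispatching one tiny matrix product per stacked matrix. The sketch below only checks that the two formulations agree on deliberately small, made-up dimensions; whether it is actually faster for the sizes in the question is an assumption that would need to be measured.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small hypothetical sizes so the check runs instantly
A_dim, B_dim, C_dim = 3, 4, 5
A = rng.random((A_dim, A_dim, 2, 2))
B = rng.random((B_dim, 2, 2))
C = rng.random((C_dim, 2, 2))

# Broadcasted matmul chain, same structure as in the question
X_matmul = (A[None, None] @ B[:, None, None, None]
            @ A[None, None] @ C[None, :, None, None])

# Same product via einsum: first M[b,i,j] = A[i,j] @ B[b] @ A[i,j],
# then X[b,c,i,j] = M[b,i,j] @ C[c]
M = np.einsum('ijpq,bqr,ijrt->bijpt', A, B, A)
X_einsum = np.einsum('bijpt,cts->bcijps', M, C)
```

Note that with the sizes in the question the result has 256 x 256 x 36 x 36 x 2 x 2 elements, so a large part of the run time may simply be writing out that array.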

How to make a graph between order of the matrix and the time taken to multiply the two matrices?

import numpy as np
from time import time
import matplotlib.pyplot as plt
np.random.seed(27)
mysetup = "from math import sqrt"
begin=time()
i=int(input("Number of rows in first matrix"))
k=int(input("Number of column in first and rows in second matrix"))
j=int(input("Number of columns in second matrix"))
A = np.random.randint(1,10,size = (i,k))
B = np.random.randint(1,10,size = (k,j))
def multiply_matrix(A,B):
    global C
    if A.shape[1]==B.shape[0]:
        C=np.zeros((A.shape[0],B.shape[1]),dtype=int)
        for row in range(i):
            for col in range(j):
                for elt in range(0,len(B)):
                    C[row,col] += A[row,elt]*B[elt,col]
        return C
    else:
        return "Cannot multiply A and B"
print(f"Matrix A:\n {A}\n")
print(f"Matrix B:\n {B}\n")
D=print(multiply_matrix(A, B))
end=time()
t=print(end-begin)
x=[0,100,10]
y=[100,100,1000]
plt.plot(x,y)
plt.xlabel('Time taken for the program to run')
plt.ylabel('Order of the matrix multiplication')
plt.show()
In the program, I have generated random elements for the matrices to be multiplied. Basically, I am trying to measure the time it takes to multiply two matrices. The values i, j and k are the orders of the matrices. Since we cannot multiply matrices where the number of columns of the first is not equal to the number of rows of the second, I use the same variable 'k' for both.
Initially I considered incrementing the order of the matrix using a for loop but wasn't able to do so. I want the graph to display the time it took to multiply the matrices on the x axis and the order of the resultant matrix on the y axis.
There is a problem in the logic I applied, but I am not able to work out how to solve it as I am a beginner in programming.
I was expecting to get a Y axis with a scale ranging from 0 to 100 in steps of 10 and an x axis with a scale of 100 to 1000 in steps of 100.
The thousandth entry on the x axis will correspond to the time it took to multiply two matrices with 1000 rows and columns.
Suppose the time it took to compute this was 200 seconds. Then the graph should show the point (1000, 200).
Some problematic points I'd like to address:
1. You're starting the timer before the user enters the input, and the time the user takes can differ from run to run. We want to be as precise as possible, so we should only measure how long the multiply_matrix function takes to run.
2. Because you're taking user input, each run gives you one result, and one result is only a single point, not a full graph. So we need to get rid of the user input and generate our own data.
3. Further to point 2, we are not interested in giving each matrix order "one shot". When we want to test how long it takes to multiply two matrices of order 300 (for example), we should do it N times and take the average to be more precise, not to mention that we are generating random numbers, and some randomly generated matrices may be easier to compute than others. Averaging over N tests is not 100% accurate, but it does help.
4. You don't need to make C a global variable; it can be a local variable of multiply_matrix, which we return anyway. This is also not how globals are meant to be used: even with global C, C is not defined at module level until the function has been called.
5. This is not a must, but it can improve your program a little: use time.perf_counter(), which uses the clock with the highest available resolution to measure a short duration. (If float precision is a concern, time.perf_counter_ns() returns integer nanoseconds.)
6. You need to swap the axes, because we want to see how the time is affected by the order of the matrices, not the opposite! (So the X axis is now the order and the Y axis is the average time it took to multiply them.)
Those fixes translate to this code:
1. Timing only the multiply_matrix call:
import time # instead of "from time import time", so that time.perf_counter is available
begin = time.perf_counter()
C = multiply_matrix(A, B)
end = time.perf_counter()
2+3. Generating our own data, looping from order 1 up to (but not including) maximum_order, taking 50 tests for each order:
maximum_order = 50
tests_number_for_each_order = 50
def generate_matrices_to_graph():
    matrix_orders = [] # our X
    multiply_average_time = [] # our Y
    for order in range(1, maximum_order):
        print(order)
        times_for_each_order = []
        for _ in range(tests_number_for_each_order):
            # generating random square matrices of size order.
            A = np.random.randint(1, 10, size=(order, order))
            B = np.random.randint(1, 10, size=(order, order))
            # getting the time it took to compute
            begin = time.perf_counter()
            multiply_matrix(A, B)
            end = time.perf_counter()
            # adding it to the times list
            times_for_each_order.append(end - begin)
        # adding the data about the order and the average time it took to compute
        matrix_orders.append(order)
        multiply_average_time.append(sum(times_for_each_order) / tests_number_for_each_order) # average
    return matrix_orders, multiply_average_time
Minor changes to multiply_matrix as we don't need i, j, k from the user:
def multiply_matrix(A, B):
    matrix_order = A.shape[1]
    C = np.zeros((matrix_order, matrix_order), dtype=int)
    for row in range(matrix_order):
        for col in range(matrix_order):
            for elt in range(0, len(B)):
                C[row, col] += A[row, elt] * B[elt, col]
    return C
and finally call generate_matrices_to_graph:
# calling generate_matrices_to_graph and plotting the orders (X) against the average times (Y)
plt.plot(*generate_matrices_to_graph())
plt.xlabel('Matrix order')
plt.ylabel('Time [in seconds]')
plt.show()
Some outputs:
We can see that when tests_number_for_each_order is small, the graph becomes noisy and loses crispness.
[Figure: going from order 1-40 with 1 test for each order]
[Figure: going from order 1-40 with 30 tests for each order]
[Figure: going from order 1-40 with 80 tests for each order]
I love this kind of question:
import numpy as np
from time import time
import matplotlib.pyplot as plt
np.random.seed(27)
dim = []
times = []
for i in range(1,10001,10):
    A = np.random.randint(1,10,size=(1,i))
    B = np.random.randint(1,10,size=(i,1))
    begin = time()
    C = A @ B # matrix product; A*B would broadcast element-wise to an (i,i) array instead
    times.append(time()-begin)
    dim.append(i)
plt.plot(times,dim)
This is a simplified test in which I tested matrices with one "outer" dimension fixed to 1: (1,1)(1,1), (1,11)(11,1), (1,21)(21,1) and so on...
But you can add a second loop to also change the "outer" dimension of the matrices and see how this affects the computation time.
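As a complement to the answers above, the repeat-and-average idea can also be sketched with the standard timeit module, which takes care of the repetition. This is only an illustration under assumptions: the helper name time_matmul and the chosen orders are hypothetical, and it times NumPy's built-in @ rather than the hand-written multiply_matrix.

```python
import timeit
import numpy as np

def time_matmul(orders, repeats=5):
    # Returns the tested orders and the average time of one A @ B per order.
    rng = np.random.default_rng(27)
    average_times = []
    for order in orders:
        A = rng.integers(1, 10, size=(order, order))
        B = rng.integers(1, 10, size=(order, order))
        # timeit.repeat returns one timing per repeat; number=1 means
        # each timing covers a single multiplication
        runs = timeit.repeat(lambda: A @ B, repeat=repeats, number=1)
        average_times.append(sum(runs) / len(runs))
    return list(orders), average_times

orders, times = time_matmul(range(10, 60, 10))
```

The two returned lists can then be passed to plt.plot(orders, times), with the order on the X axis and the average time on the Y axis.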

How to improve the efficiency of array slicing and manipulation in Python (Numpy)?

I have a dataset of N arrays, each of which may have a fixed or variable length. I want to extract a window from the dataset and count the occurrences of a given number. To keep it simple, I assume each array contains 90 numbers. The model picks a random position in each array and extracts M consecutive numbers starting from that position as a window strip, so N strips are obtained in total. I assume every strip has the same size, so the N strips together compose a view (window). I have C++ code that does the job and works very well: for a 5x90 dataset, repeating the window sampling 1,000,000 times, it takes roughly 0.002 seconds per 10,000 iterations. I am trying to port the C++ to Python so that I can use the powerful data frame (Pandas) and other tools for related analysis. After some research, it seems that the NumPy array is a good representation of each data array, but since the arrays vary in size, I need to wrap them all in a list. My code is shown below (most of the unrelated code has been removed):
import numpy as np
import random
import time
class dataSource:
    _data = None
    _NCOLS = 0
    _NROWS = 0
    # NCOLS, NROWS is the dimension of a window (view) of the data
    # _data is a list of NCOLS Numpy array, each array carry 90 numbers
    def __init__(self, NCOLS, NROWS):
        # assuming each array has 90 numbers for this example, it could be varied in actual case
        src = [[1]*90]*NCOLS # just assume all numbers in the array to be 1 for this example
        self._data = [np.array(cc) for cc in src]
        self._NCOLS = NCOLS
        self._NROWS = NROWS
    def extractView(self, view):
        pos = [random.randint(0, len(cc)-1-self._NROWS) for cc in self._data]
        for col in range(len(self._data)):
            view[:, col] = self._data[col][pos[col]:pos[col]+self._NROWS]
class dataView:
    _NCOLS = 0
    _NROWS = 0
    _view = None
    def __init__(self, NCOLS, NROWS):
        self._NCOLS = NCOLS
        self._NROWS = NROWS
        self._view = np.zeros([self._NROWS, self._NCOLS], order='F')
    def __setitem__(self, key, value):
        self._view[key] = value
    def count(self, elem):
        return np.count_nonzero(self._view==elem)
if __name__ == '__main__':
    ds = dataSource(5, 3)
    dv = dataView(5, 3)
    batchCount, totalTime, startTime = 1, 0, time.process_time()
    for n in range(10000000):
        if ((n+1)%10000==0):
            dt = time.process_time() - startTime
            totalTime += dt
            print(totalTime / batchCount, ' ', dt)
            batchCount += 1
            startTime = time.process_time()
        ds.extractView(dv)
        cnt = dv.count(1) # this could then be used for other analysis, dv may be used by other functions
Running this code in Python 3.8.8 shows that it takes 0.23 seconds to finish every 10,000 iterations, so the Python code is about 100 times slower than the C++. In the actual case, the data source could have up to 50 arrays, each with 50 to 500 numbers, so the Python run times could be even longer. The C++ code takes about 5 hours to finish all iterations on the real data; based on the above estimate, Python would take about 500 hours for the same job. I understand that there may be limits in Python that make it hard to speed this up, but anything I could do to optimize my code and reduce the run time (even by 10%) would help a lot.
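One direction that might reduce the per-iteration overhead is to replace the per-column Python loop with a single fancy-indexing operation. The sketch below is an assumption-laden illustration, not a measured fix: it concatenates the variable-length arrays into one flat array with per-array offsets, and the names build_flat_source and extract_view as well as the small test arrays are hypothetical. It keeps the question's start range (0 to len - 1 - NROWS inclusive).

```python
import numpy as np

def build_flat_source(arrays):
    # Store all variable-length arrays back to back in one flat array.
    lengths = np.array([len(a) for a in arrays])
    offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    flat = np.concatenate(arrays)
    return flat, offsets, lengths

def extract_view(flat, offsets, lengths, nrows, rng):
    # One random start per array; the upper bound is exclusive, so starts
    # run from 0 to len - 1 - nrows, as with random.randint in the question.
    starts = rng.integers(0, lengths - nrows)
    # index[r, c] is the flat position of row r of the strip from array c
    index = (offsets + starts)[None, :] + np.arange(nrows)[:, None]
    return flat[index]

rng = np.random.default_rng(0)
# Hypothetical arrays of different lengths
arrays = [np.arange(10), np.arange(100, 120), np.arange(200, 207)]
flat, offsets, lengths = build_flat_source(arrays)
view = extract_view(flat, offsets, lengths, 3, rng)
count = np.count_nonzero(view == 1)
```

If the analysis allows it, drawing the start positions for many iterations at once would amortise the remaining per-call overhead further, but that depends on how the view is consumed downstream.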

speed up finite difference model

I have a complex finite difference model written in Python using the same general structure as the example code below. It has two for loops: one over the iterations and, within each iteration, one over each position along the x array. Currently the code takes too long to run (probably due to the for loops). Is there a simple technique to use numpy to remove the second for loop?
Below is a simple example of the general structure I have used.
import numpy as np
def f(x,dt, i):
    xn = (x[i-1]-x[i+1])/dt # a simple finite difference function
    return xn
x = np.linspace(1,10,10) #create initial conditions with x[0] and x[-1] boundaries
dt = 10 #time step
iterations = 100 # number of iterations
for j in range(iterations):
    for i in range(1,9): #length of x minus the boundaries
        x[i] = f(x, dt, i) #return new value for x[i]
Does anyone have any ideas or comments on how I could make this more efficient?
Thanks,
Robin
For starters, this small change to the structure improves efficiency by roughly 15%. I would not be surprised if this code can be optimized further, but that will most likely be algorithmic, inside the function, i.e. some way to simplify the array element operation. Using a generator may help, too.
import numpy as np
import time
time0 = time.time()
def fd(x, dt, n): # x is an array, n is the order of central diff
    for i in range(len(x)-(n+1)):
        x[i+1] = (x[i]-x[i+2])/dt # a simple finite difference function
    return x
x = np.linspace(1, 10, 10) # create initial conditions with x[0] and x[-1] boundaries
dt = 10 # time step
iterations = 1000000 # number of iterations
for __ in range(iterations):
    x = fd(x, dt, 1)
print(x)
print('time elapsed: ', time.time() - time0)
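On the question of removing the inner loop with numpy: a slice-based update is possible, but it is not a drop-in replacement. The loops above update x in place, so x[i] is computed from the already updated x[i-1]; a slice expression instead uses only the values from the previous iteration. The sketch below assumes that using the previous iteration's values is acceptable for the model; the function names are hypothetical.

```python
import numpy as np

def step_vectorized(x, dt):
    # The right-hand side is evaluated into a temporary array first,
    # so every interior point uses values from the previous iteration.
    x[1:-1] = (x[:-2] - x[2:]) / dt
    return x

def step_loop_old_values(x, dt):
    # Loop equivalent of the slice update, for comparison only.
    old = x.copy()
    for i in range(1, len(x) - 1):
        x[i] = (old[i - 1] - old[i + 1]) / dt
    return x

x_vec = np.linspace(1, 10, 10)
x_ref = np.linspace(1, 10, 10)
for _ in range(100):
    step_vectorized(x_vec, 10)
    step_loop_old_values(x_ref, 10)
```

If the scheme really depends on the in-place ordering, the slice version will give different numbers, and another approach would be needed.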

Trying to get Python to concatenate generated column vectors to form a two dimensional array. It's not working

I'm new to Python. I've done this particular task before in MATLAB, and I'm trying to get the hang of the syntax and particular behaviour of Python, as I'll be using this language much more in future.
The task: I am taking 43,200 single data points (integers, but written as decimals) and performing a fast Fourier transform on a "window" of 600 at a time, shifting this window by 60 data points each time. Hence, this transform will output 600 Fourier coefficients, 720 times, so I will end up with a 600 x 720 matrix (rows, columns).
These data points are initially contained within a list and turned into a column vector after being FFT'd. The issue comes when I try to build the matrix from a loop: take the first 600 points, FFT them, and dump them in an empty array. Take the next 600, do the same thing, but now put these two columns side by side to make two columns, then three, then four... etc. I've been trying for several hours now, but whatever I try I cannot get it to work: it consistently outputs my "final" matrix (the one that was meant to be the generated 600 x 720) as having the exact same dimensions as each generated "block".
My code (relevant sections):
for i in range(npoints):
    newdata.append(float(newy.readline())) #Read data from file
FFT_out = [] #Initialize empty FFT output array
window_size = 600 #Number of points in data "window"
window_skip = 60 #Number of points window moves across
j = 0 #FFT count variable
for i in range(0, npoints, window_skip):
    block = np.fft.fft(newdata[i:i+window_size]) #FFT Computation of "window"
    block = block[:, np.newaxis] #turn into column vector (n, 1)
if j == 0:
    FFT_out = block
    j = 1
else:
    np.hstack((FFT_out, block))
    j = j + 1
print("Shape of FFT matrix:")
print(np.shape(FFT_out))
print("Number of times FFT completed:")
print(j)
At this point, I'm willing to believe it's a fundamental flaw on my understanding of how Python does matrices or deals with arrays. I've tried reading about it, but I still cannot see where I'm going wrong. Any help would be greatly appreciated!
The first thing to note is that Python uses indentation to form blocks, so as posted you would only ever assign once to FFT_out and never actually call np.hstack.
Then, assuming that this was in fact only a cut-and-paste issue when posting your question, you should note that hstack returns a concatenation of its arguments without actually modifying them. To accumulate the concatenation, you should assign the result back to FFT_out:
FFT_out = np.hstack((FFT_out, block))
You should then be able to get a 600 x 720 matrix with:
for i in range(0, npoints, window_skip):
    block = np.fft.fft(newdata[i:i+window_size])
    block = block[:, np.newaxis] #turn into column vector (n, 1)
    if j == 0:
        FFT_out = block
        j = 1
    else:
        FFT_out = np.hstack((FFT_out, block))
    j = j + 1
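A common alternative to calling hstack inside the loop is to collect the columns in a list and stack them once at the end. The sketch below uses random placeholder data instead of the file from the question. One assumption to flag: with range(0, npoints, window_skip) the last few windows contain fewer than 600 points, so the sketch passes n=window_size to np.fft.fft, which zero-pads them; whether padding is the desired behaviour for those windows is a guess.

```python
import numpy as np

npoints = 43200
window_size = 600   # number of points in each data "window"
window_skip = 60    # number of points the window moves each step

# Placeholder data standing in for the values read from the file
rng = np.random.default_rng(0)
newdata = rng.integers(0, 100, size=npoints).astype(float)

columns = []
for start in range(0, npoints, window_skip):
    window = newdata[start:start + window_size]
    # n=window_size zero-pads the shorter windows at the end of the data
    columns.append(np.fft.fft(window, n=window_size))

# Stack the 1-D results side by side, one column per window
FFT_out = np.column_stack(columns)
```

Stacking once at the end also avoids copying the growing matrix on every iteration.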
