How to parallelize a computation? - python

I am trying to solve an ODE (ordinary differential equation) on a distance matrix, but I do not know how to parallelize my code.
from scipy.integrate import quad
from math import exp
import numpy as np
import matplotlib.pyplot as plt

# I have my distance matrix and I want to count how many points are
# at distance at most r from point i
def v(dist, r, i):
    return 1/N*(np.count_nonzero(np.select([dist[i,:]<r],[dist[i,:]]))+1)

# integral of rho from r to infinity
def rho_barre(rho, r):
    return quad(rho, r, np.inf)

# integral over r of a certain integrand
def grad_F(i, j, rho, v, v_r, dist):
    return quad(lambda r: ((v(dist, r, i)+v(dist, r, j))/2-v_r)*rho_barre(rho, max(r, dist[i,j])), 0, np.inf)

# parameters
delta_T = 0.1
rho = (lambda x: exp(-x))
v_r = 0

for t in range(1000):
    for i in range(N):
        for j in range(N):
            d_matrix[i,j] = d_matrix[i,j] + delta_T * grad_F(i, j, rho, v, v_r, d_matrix)
First, I get the error can't multiply sequence by non-int of type 'float', and I don't understand why. Second, I know that three nested loops are too much in Python, and I want to know how to make this faster.

It sounds like you have a few different questions. Let me see if I can answer more abstractly and you can piece it together. (As an aside, the error you are seeing is likely because scipy's quad returns a (value, error) tuple, so rho_barre and grad_F return tuples, and multiplying a tuple by a float raises exactly that message; take only the first element, [0], of the tuple.)
Parallel
One very easy way to work in parallel in Python is multiprocessing.
If you apply the same function many times, instead of:
res = [myfun(arg) for arg in args]
you can do:
import multiprocessing as mp
with mp.Pool() as pool:
    res = pool.map(myfun, args)
There are limitations: both myfun and args must be pickleable (which a lambda is not, so you will want to address that in your code).
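For instance, a minimal sketch of what that could look like for your (i, j) loop, assuming grad_F, rho, v, v_r, d_matrix, and N are defined at module level so workers can see them (the pair list and the helper name grad_F_pair are illustrative, not from your code):

import multiprocessing as mp
from itertools import product

def grad_F_pair(pair):
    # module-level function (not a lambda), so multiprocessing can pickle it
    i, j = pair
    return grad_F(i, j, rho, v, v_r, d_matrix)

pairs = list(product(range(N), range(N)))
with mp.Pool() as pool:
    grads = pool.map(grad_F_pair, pairs)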
Nested Loops
In general, Python loops are slow. When working with NumPy, it is better to "vectorize" if you can.
So instead of working on each [i,j] element of d_matrix, see if you can work on them all at the same time: compute a matrix grad_F (rather than a function) and add it. You will still need your time loop, but you may be able to update your d_matrix in a single, very fast, action.
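As a toy illustration of the idea (the random arrays here are stand-ins, not the real grad_F computation):

import numpy as np

N, delta_T = 100, 0.1
d_matrix = np.random.rand(N, N)
grad_matrix = np.random.rand(N, N)  # stand-in for a precomputed grad_F matrix

# elementwise looped update
d_loop = d_matrix.copy()
for i in range(N):
    for j in range(N):
        d_loop[i, j] += delta_T * grad_matrix[i, j]

# the same update as one fast array operation
d_vec = d_matrix + delta_T * grad_matrix
print(np.allclose(d_loop, d_vec))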
Other tips:
Can you precompute rho_barre? Maybe use scipy.integrate.cumtrapz to compute it (see the sketch after these tips).
Also, try to write fewer one-liners. Use named functions instead of lambdas; it will make understanding your code much easier!
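A minimal sketch of the precomputation idea, assuming rho = exp(-x) decays fast enough that a finite grid can stand in for [r, inf) (the grid bounds here are illustrative):

import numpy as np
from scipy.integrate import cumtrapz

r_grid = np.linspace(0.0, 50.0, 5001)
rho_vals = np.exp(-r_grid)

# cumulative integral of rho from 0 up to each grid point r
cum = cumtrapz(rho_vals, r_grid, initial=0.0)
# integral from r to "infinity" = total integral minus integral up to r
rho_barre_grid = cum[-1] - cum

def rho_barre(r):
    # cheap interpolated lookup instead of calling quad every time
    return np.interp(r, r_grid, rho_barre_grid)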

Related

Creating a numpy array from repeating a function n times with random behavior inside the function without for loops

I am doing optimizations in my code and would like to remove this for loop if possible. I have the following code:
import numpy as np

def f(x):
    return np.pi * x * np.cos((np.pi / 2) * x ** 2)

def monte_carlo(b, n):
    xrand = np.random.uniform(0, b, (1, n))
    integral = np.sum(f(xrand))
    return b/n * integral

data = np.array([monte_carlo(b, n) for _ in range(100)])
In this code, b is the upper limit (usually between 0 and 16) and n is the number of samples used (usually between 10E4 and 10E7). The monte_carlo function performs a basic Monte Carlo integration of f(x) between 0 and b.
What I'm trying to do is remove the for loop in data. I know it's not really a performance issue, but my assignment asks me to try performance optimizations. I'm using a comprehension right now, which is a little faster than a normal for loop, but I think there's more optimization that can be done; Python's for loops are rather slow, so I'm trying to get rid of them first.
I've looked at functions like numpy.repeat and numpy.tile, but they don't work, since they only run the function once and then keep repeating that result. I've also looked at numpy.fromfunction, but I can't seem to get that to work and don't think I can use it. The documentation on numpy.fromfunction says "Construct an array by executing a function over each coordinate". I don't think that's particularly useful here, since the arguments to the function stay the same; it's the random element inside the function that creates the difference in each iteration.
If anyone knows how I can solve this that would be very much appreciated, or has a different numpy function I didn't find during my google search that does what I need.
Thanks in advance!
You can basically do all 100 iterations (or however many you want) in a single shot. See the code below:
def f(x):
    return np.pi * x * np.cos((np.pi / 2) * x ** 2)

def monte_carlo(b, n, N):
    xrand = np.random.uniform(0, b, (N, n))
    integral = np.sum(f(xrand), axis=1)
    return b / n * integral

# data = np.array([monte_carlo(b, n) for _ in range(100)])
data = monte_carlo(b, n, 100).T
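A quick sanity check with hypothetical values for b and n (these particular numbers are not from the question):

import numpy as np

b, n = 2.0, 10**5
data = monte_carlo(b, n, 100)
# every entry estimates the same integral, so the spread should be small
print(data.mean(), data.std())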

FFT polynomial multiplication in Python using inbuilt Numpy.fft

I want to multiply two polynomials fast in Python, as my polynomials are rather large (> 100000 elements) and I have to multiply lots of them. Below, you will find my approach:
from numpy.random import seed, randint
from numpy import polymul, pad
from numpy.fft import fft, ifft
from timeit import default_timer as timer

length = 100

def test_mul(arr_a, arr_b): # inbuilt python multiplication
    c = polymul(arr_a, arr_b)
    return c

def sb_mul(arr_a, arr_b): # my schoolbook multiplication
    c = [0]*(len(arr_a) + len(arr_b) - 1)
    for i in range(len(arr_a)):
        for j in range(len(arr_b)):
            k = i + j
            c[k] = c[k] + arr_a[i]*arr_b[j]
    return c

def fft_test(arr_a, arr_b): # fft based polynomial multiplication
    arr_a1 = pad(arr_a, (0, length), 'constant')
    arr_b1 = pad(arr_b, (0, length), 'constant')
    a_f = fft(arr_a1)
    b_f = fft(arr_b1)
    c_f = [0]*(2*length)
    for i in range(len(a_f)):
        c_f[i] = a_f[i]*b_f[i]
    return c_f

if __name__ == '__main__':
    seed(int(timer()))
    random = 1
    if random == 1:
        x = randint(1, 1000, length)
        y = randint(1, 1000, length)
    else:
        x = [1]*length
        y = [1]*length

    start = timer()
    res = test_mul(x, y)
    end = timer()
    print("time for built in pol_mul", end - start)

    start = timer()
    res1 = sb_mul(x, y)
    end = timer()
    print("time for schoolbook mult", end - start)

    res2 = fft_test(x, y)
    print(res2)

    ######### check ############
    if len(res) != len(res1):
        print("ERROR")
    for i in range(len(res)):
        if res[i] != res1[i]:
            print("ERROR at pos ", i, "res[i]:", res[i], "res1[i]:", res1[i])
Now, here is my approach in detail:
1. First, I tried a naive schoolbook implementation with complexity O(n^2). But as you may expect, it turned out to be very slow.
2. Second, I came across polymul in the NumPy library. This function is a lot faster than the previous one, but I realized it also has O(n^2) complexity: if you increase the length by a factor of k, the time increases by a factor of k^2.
3. My third approach is an FFT-based multiplication using the inbuilt FFT functions. I followed the well-known approach also described here, but I am not able to get it to work.
Now my questions are:
Where am I going wrong in my FFT-based approach? Can you please tell me how I can fix it?
Is my observation that the polymul function has O(n^2) complexity correct?
Please let me know if you have any questions.
Thanks in advance.
Where am I going wrong in my FFT based approach? Can you please tell me how can I fix it?
The main problem is that in the FFT-based approach you should be taking the inverse transform after the multiplication, but that step is missing from your code. With the missing step added, your code should look like the following:
def fft_test(arr_a, arr_b): # fft based polynomial multiplication
    arr_a1 = pad(arr_a, (0, length), 'constant')
    arr_b1 = pad(arr_b, (0, length), 'constant')
    a_f = fft(arr_a1)
    b_f = fft(arr_b1)
    c_f = [0]*(2*length)
    for i in range(len(a_f)):
        c_f[i] = a_f[i]*b_f[i]
    return ifft(c_f)
Note that there may also be a few opportunities for improvement:
The zero padding can be handled directly by passing the required FFT length as the second argument (e.g. a_f = fft(arr_a, length))
The coefficient multiplication in your for loop may be directly handled by numpy.multiply.
If the polynomial coefficients are real-valued, then you can use numpy.fft.rfft and numpy.fft.irfft (instead of numpy.fft.fft and numpy.fft.ifft) for some extra performance boost.
So an implementation for real-valued inputs may look like:
from numpy.fft import rfft, irfft

def fftrealpolymul(arr_a, arr_b): # fft based real-valued polynomial multiplication
    L = len(arr_a) + len(arr_b)
    a_f = rfft(arr_a, L)
    b_f = rfft(arr_b, L)
    return irfft(a_f * b_f)
Is my observation that the polymul function has O(n^2) complexity correct?
That also seems to be the performance I am observing, and it matches the available code in my numpy installation (version 1.15.4; there doesn't seem to be any change in that part in the more recent 1.16.1 version).
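One way to check the scaling empirically (a sketch; the lengths are arbitrary, and doubling the length should roughly quadruple the time if polymul is O(n^2)):

from timeit import default_timer as timer
from numpy import polymul
from numpy.random import randint

for length in (1000, 2000, 4000):
    x = randint(1, 1000, length)
    y = randint(1, 1000, length)
    start = timer()
    polymul(x, y)
    print(length, timer() - start)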

Linear combination of function objects in python

Problem: I want to numerically integrate a function f(t,N) that may be written as a linear combination of N other known functions g_1(t), ..., g_N(t).
My Solution I: I know the functions g_i and also the coefficients, so my initial idea was to create a row vector of coefficients and a column vector containing the lambda functions g_i, and then use np.dot for the inner product to get the function object I want. Unfortunately, you cannot just add two function objects, nor multiply a function object by a scalar.
My Solution II: Of course I can do something like this (basically defining pointwise what I want):
def f(t, N, a, g):
    """
    a = numpy array of coefficients
    g = numpy array of lambda functions corresponding to functions g_i
    """
    res = 0
    for i in xrange(N):
        res += a[i] * g[i](t)
    return res
But the for loop is of course not very great, especially when:
I need to run this function at many many time steps t
I pass this function f into a numerical integration routine like scipy.integrate.quad.
Briefly:
In Cython, you could speed up indexing using memoryviews.
If these equations are linear, you could superimpose them using sympy, for example:
import sympy as sy
x,y = sy.symbols('x y')
g0 = x*0.33 + 6
g1 = x*0.72 + 1.3
g2 = x*11.2 - 6.5
gn = x*3.3 - 7.3
G = [g0,g1,g2,gn]
#this is superimposition
print sum(G).subs(x,15.1)
print sum(gi.subs(x,15.1) for gi in G)
'''
output:
228.305000000000
228.305000000000
'''
If this is not what you want, give some example input and output, so that I can try and don't go in blind...
With low RAM available, you could hand the final equation to numexpr and evaluate it with some input. Otherwise it's best to work on numpy arrays.
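To connect this back to the numerical integration in the question, one possible follow-up (a sketch; the coefficients in a are hypothetical) is to lambdify the superimposed expression into a fast numeric function and hand it to quad:

import sympy as sy
from scipy.integrate import quad

x = sy.symbols('x')
G = [x*0.33 + 6, x*0.72 + 1.3, x*11.2 - 6.5, x*3.3 - 7.3]
a = [1.0, 2.0, 0.5, -1.0]  # hypothetical coefficients

# build the linear combination symbolically, then compile it once
expr = sum(ai*gi for ai, gi in zip(a, G))
f = sy.lambdify(x, expr, 'numpy')

print quad(f, 0, 1)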

Finding zeros of equation using python

I'm trying to write code that will find n in this equation, with the rest as user-defined variables:

n = ((t.ppf(1-alpha, 2n-2) + t.ppf((1-beta)/2, 2n-2))**2 * sigma**2) / (2*(delta - abs(eps))**2)
from scipy.optimize import fsolve
from scipy.stats import t
def f(alpha, beta, sigma, delta, eps):
    n = ((t.ppf(1-alpha,2*n-2) + t.ppf((1-beta)/2,2*n-2))**2*sigma**2)/(2*(delta-abs(eps))**2)
I'd also like to be able to set up different scenarios of parameters and then have it output a table of the parameters and the results (e.g., input alpha1, alpha2, beta1, beta2, etc. and get out [alpha1, beta1, ..., n], [alpha1, beta2, ..., n]). I'm not quite sure what the best way to do that would be, if anyone can generally point me in the right direction.
By the looks of your equation, you are trying to find the number of observations (n) that satisfies the statistical test equation. If that is the case, then n is a natural number (0, 1, 2, etc.) and is easily iterable.
You could set up a solver yourself, where you have n as the iterable and the right-hand side of your equation as the "result":
for n in range(0, 1000):
    result = your_function(n, other_parameters)
Then, inside the loop, you simply need to check whether the equation is satisfied:
    if n >= result:
        print "result:", n
        break # This will exit the loop
When it comes to testing different user-given parameters, you can set up another loop which iterates over different values for alpha, beta, and so on; a sketch of both pieces together follows.
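A minimal sketch combining the solver with a parameter scan, assuming the formula from the question (the helper name required_n and the parameter values are hypothetical):

from itertools import product
from scipy.stats import t

def required_n(alpha, beta, sigma, delta, eps, n_max=1000):
    # smallest n satisfying n >= right-hand side of the question's equation;
    # start at n = 2 so the degrees of freedom 2n-2 are positive
    for n in range(2, n_max):
        result = ((t.ppf(1 - alpha, 2*n - 2) + t.ppf((1 - beta)/2, 2*n - 2))**2
                  * sigma**2) / (2*(delta - abs(eps))**2)
        if n >= result:
            return n
    return None

# scan a grid of scenarios and tabulate [alpha, beta, ..., n]
for alpha, beta in product([0.05, 0.01], [0.1, 0.2]):
    n = required_n(alpha, beta, sigma=1.0, delta=0.5, eps=0.0)
    print [alpha, beta, n]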

python numpy optimization n-dimensional projection

I am relatively new to Python and am interested in any ideas to optimize and speed up this function. I have to call it tens to hundreds of thousands of times for a numerical computation I am doing, and it takes a major fraction of the code's overall computation time.
I have written this in C, but I am interested to see any tricks to make it run faster in Python specifically.
This code calculates a stereographic projection of a bigD-length vector to a littleD-length vector, per http://en.wikipedia.org/wiki/Stereographic_projection. The variable a is a numpy array of length ~ 96.
import numpy as np

def nsphere(a):
    bigD = len(a)
    littleD = 3
    temp = a
    # normalize before calculating projection
    temp = temp/np.sqrt(np.dot(temp, temp))
    # calculate projection
    for i in xrange(bigD-littleD + 2, 2, -1):
        temp = temp[0:-1]/(1.0 - temp[-1])
    return temp

# USAGE:
q = np.random.rand(96)
b = nsphere(q)
print b
This should be faster:
def nsphere(a, littleD=3):
    a = a / np.sqrt(np.dot(a, a))
    z = a[littleD:].sum()
    return a[:littleD] / (1. - z)
Please do the math to double check that this is in fact the same as your iterative algorithm.
Obviously the main speedup here is going to come from the fact that this is an O(n) algorithm replacing your O(n**2) algorithm for computing the projection. But specifically for speeding things up in Python, you want to "vectorize your inner loop": try to avoid loops and anything else that is going to have high Python overhead in the most performance-critical parts of your code, and instead use Python and NumPy builtins, which are highly optimized. Hope that helps.
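Acting on the "please double check" note above, a quick numerical comparison (a sketch, assuming the original iterative version is renamed nsphere_loop and the closed-form one nsphere_fast, since both were called nsphere above):

import numpy as np

q = np.random.rand(96)
# if the closed form really matches the iterative algorithm,
# this should print True
print np.allclose(nsphere_loop(q), nsphere_fast(q))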
