I am trying to speed up the following code in Python:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
from scipy import integrate
import camb
from tqdm import tqdm
import os
# Read in a tabulated power spectrum: k, P_lin(k), P_nlin(k)
dir = os.getcwd()
data = np.loadtxt(dir+"/ps1-peacock.txt")
kh = data[:,0]
p_lin = data[:,1]
p_nlin = data[:,2]
p_linear = interpolate.interp1d(kh,p_lin)
# Integrand of P22
def upper_mu(x):
    return min(1.0, (kk**2 + np.exp(2*x))/(2*kk*np.exp(x)))

def lower_mu(x):
    return max(-1.0, -(kk**2 + np.exp(x))/(2*kk*np.exp(x)))

def mulow(x):
    return max(-1.0, (kh[-1]**2.0 - kk**2.0 - np.exp(x)**2.0)/(-2.0*kk*np.exp(x)))

def muhigh(x):
    return min(1.0, (kh[0]**2.0 - kk**2.0 - np.exp(x)**2.0)/(-2.0*kk*np.exp(x)))

def f22(mu, q, k):
    r = np.exp(q)/k
    F = (7.0*mu + (3.0 - 10.0*mu**2)*r)/(14.0*r*(r**2 - 2.0*mu*r + 1.0))
    psik = (k**2 + np.exp(2*q) - 2.0*k*mu*np.exp(q))**0.5
    if psik > kh[0] and psik < kh[-1]:
        return 1.0/2.0/np.pi**2.0*np.exp(3*q)*p_linear(np.exp(q))*p_linear(psik)*F**2
    else:
        return 0
P22 = np.zeros_like(kh)
error = np.zeros_like(kh)
for i in tqdm(range(0, np.shape(kh)[0])):
    kk = kh[i]
    P22[i], error[i] = integrate.dblquad(f22, np.log(kh[0]), np.log(kh[-1]),
                                         mulow, muhigh, args=(kh[i],),
                                         epsrel=1e-3, epsabs=50)[:2]
Here follows the integral, written out for clarity:
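As reconstructed from the integrand f22 and the dblquad limits above (take this as a transcription of the code, not an authoritative reference):

P_{22}(k) = \frac{1}{2\pi^{2}} \int_{\ln k_{\min}}^{\ln k_{\max}} dq \; e^{3q}\, P_{\mathrm{lin}}(e^{q}) \int_{\mu_{\mathrm{low}}(q)}^{\mu_{\mathrm{high}}(q)} d\mu \; F_{2}^{2}(q,\mu)\, P_{\mathrm{lin}}\!\left(\sqrt{k^{2} + e^{2q} - 2k\mu e^{q}}\right)

where

F_{2}(q,\mu) = \frac{7\mu + (3 - 10\mu^{2})\,r}{14\,r\,(r^{2} - 2\mu r + 1)}, \qquad r = \frac{e^{q}}{k}.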
I would like to use multiprocessing to improve the performance of dblquad(). Does anyone know how I can implement it in this specific case?
Multiprocessing won't help here: you cannot split the work of a single dblquad call between Python processes.
If you have several integrals to compute, then yes, you can split the integrals between processes. Whether this is worth it depends strongly on how much work each process gets.
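In your case, though, the loop over kh is exactly that situation: each dblquad call for a given k is an independent integral. Here is a minimal sketch of splitting that loop over a Pool, assuming kh, p_linear, and f22 are defined at module level (so each worker sees them after import) and that the limit functions are refactored to take k explicitly instead of reading the global kk:

from multiprocessing import Pool

def compute_P22(kk):
    # Integration limits, closed over this call's kk instead of a global.
    def mulow_k(x):
        return max(-1.0, (kh[-1]**2.0 - kk**2.0 - np.exp(x)**2.0)/(-2.0*kk*np.exp(x)))
    def muhigh_k(x):
        return min(1.0, (kh[0]**2.0 - kk**2.0 - np.exp(x)**2.0)/(-2.0*kk*np.exp(x)))
    # One independent double integral per k value; returns (value, abserr).
    return integrate.dblquad(f22, np.log(kh[0]), np.log(kh[-1]),
                             mulow_k, muhigh_k, args=(kk,),
                             epsrel=1e-3, epsabs=50)

if __name__ == '__main__':
    with Pool() as pool:
        results = pool.map(compute_P22, kh)
    P22, error = np.transpose(results)

On Windows, processes are spawned rather than forked, so the module-level setup (loading the table, building p_linear) is re-executed in each worker on import; keep only the Pool code under the __main__ guard.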
The following is the main body of my code:
import multiprocess
import random
from multiprocess import Pool
import multi_process_file
import time
import Function_file
import numpy as np
start = time.perf_counter()
if __name__ == '__main__':
    multiprocess.freeze_support()
    with multiprocess.Pool(processes=6, initializer=Initializer,
                           initargs=(Norm_lists, points_lists, function_lists,
                                     gradient_lists, base_func_lists)) as pool:
        results = pool.starmap(test_func, zip(list(total_index[:, 0]),
                                              list(np.zeros((len(total_index))).astype(np.int32))))
        print(results)
    end = time.perf_counter()
And the test_func is:
def test_func(e, j):
    import quadpy
    import numpy as np
    scheme = quadpy.t2.get_good_scheme(6)
    global Norm_list
    return np.sum(scheme.integrate(lambda x: function_lists[0](x, e)*gradient_lists[j](x, e), points_lists[e]))
function_lists contains the functions I need to use, and all of them call another function.
def schied_stand_phi_1(X, i):
    import numpy as np
    return schied_phi_1(X, i)/np.sqrt(Norm_lists[i][0])

def schied_phi_1(X, i):
    x = X[0]
    t = X[1]
    return (x+2)/(x+2)
And the error is:
name 'schied_phi_1' is not defined
I suppose it would work if I passed all the functions needed by the functions in function_lists to the initializer when I initialize the pool, but I want to know if there are other ways.
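For reference, a minimal sketch of that initializer approach (simplified to two arguments; your Initializer takes more lists, and the helper_funcs dict is hypothetical). Anything bound as a module-level global inside the initializer is visible to every function that later runs in that worker, which is what makes the schied_phi_1 lookup succeed:

def Initializer(norm_lists, helper_funcs):
    # Bind the data and every helper that worker-side functions look up
    # by name as globals in this worker process.
    global Norm_lists, schied_phi_1
    Norm_lists = norm_lists
    schied_phi_1 = helper_funcs['schied_phi_1']

# The pool would then be created with, for example:
# with multiprocess.Pool(processes=6, initializer=Initializer,
#                        initargs=(Norm_lists, {'schied_phi_1': schied_phi_1})) as pool:
#     ...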
I am using Windows 10 and JupyterLab with Python 3.9; any help would be appreciated.
How to do zero-skewness log transform in Python?
For example, in Stata it is implemented as lnskew0 (see https://www.stata.com/manuals13/rlnskew0.pdf).
I didn't find an implementation in Python. Is anyone aware of an implementation?
Otherwise, a first try would be:
from scipy.stats import skew
import numpy as np
from scipy.optimize import root_scalar
def lnskew0(x):
    def skew_ln(k):
        return skew(np.log(x - k))
    res = root_scalar(
        skew_ln,
        bracket=[-x.min(), x.max()*0.99999],
        method='bisect'
    )
    return np.log(x - res.root)
This works fine on NumPy arrays containing only positive numbers. How is Stata's lnskew0 implemented so that it works with negative numbers as well?
I gave it another try so that it works with negative numbers as well:
from scipy.stats import skew
import numpy as np
from scipy.optimize import root_scalar
def lnskew0(x):
    x0 = x + 1

    def skew_ln_pos(k):
        with np.errstate(all='ignore'):
            return skew(np.log(x0 - k))
    res_pos = root_scalar(
        skew_ln_pos,
        bracket=[-150, 150],
        method='bisect'
    )

    def skew_ln_neg(k):
        with np.errstate(all='ignore'):
            return skew(np.log(-x0 - k))
    res_neg = root_scalar(
        skew_ln_neg,
        bracket=[-150, 150],
        method='bisect'
    )

    res = (res_pos.root - 1, res_neg.root + 1)
    lnskew0_res = (
        np.log(x - res[0]),
        np.log(-x - res[1])
    )
    whichmin = np.nanargmin([abs(skew(r)) for r in lnskew0_res])
    return lnskew0_res[whichmin]
Note: it still has one issue: the bracket for root_scalar needs to be chosen manually.
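For what it's worth, a quick illustrative check on skewed data that includes negative values (assuming the fixed bracket above happens to cover the root; as noted, it may need adjusting for other data):

rng = np.random.default_rng(0)
x = rng.gamma(2.0, 2.0, size=1000) - 3.0   # right-skewed, partly negative
y = lnskew0(x)
print(skew(y))   # should come out near 0 if the root was bracketed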
I have PCA-related code in Python, converted from MATLAB, but the last line is not working.
How may I correct it?
MATLAB code:
[coeff,score,~,~,explained] = pca(train);
sm = 0;
no_components = 0;
for k = 1:size(explained,1)
    sm = sm + explained(k);
    if sm <= 99.4029
        no_components = no_components + 1;
    end
end
m = mean(train,1);
mat1 = score(:,1:no_components);
Python Code:
Reference: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.score
import numpy as np
import cv2
import os
from sklearn.decomposition import PCA
[x,y] = trainData.shape
pca = PCA(n_components=(x-1))
varPca = pca.fit(trainData)
explainedVariance = pca.explained_variance_ratio_*100
sm = 0
no_components = 0
for k in range(0, x-1):
    sm = sm + explainedVariance[k]
    if sm <= 99.4029:
        no_components = no_components + 1
print(no_components)
m = trainData.mean()
print(m)
mat1 = score(trainData[:,0:no_components])
Here the score function is not working.
How may I correct it?
score is a method of the class PCA(). As such it can only be called on a PCA() object.
In your case pca is an object of the class PCA(), so you can call pca.score().
However, calling score() by itself presupposes that there is a function score() defined somewhere, which is not the case. This is why you get a NameError.
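Note also that MATLAB's score output is the projected data, which scikit-learn exposes through transform(), not through the score() method (which returns an average log-likelihood). A minimal sketch of the equivalent of your last MATLAB line, reusing the pca object and no_components from your code:

scores = pca.transform(trainData)     # projected data, MATLAB's `score` matrix
mat1 = scores[:, 0:no_components]     # MATLAB: mat1 = score(:,1:no_components);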
I use lambdify to compile an expression which is a function of certain parameters. Each parameter has N points, so I need to evaluate the expression N times. The following is a simplified example of how this is done.
import numpy as np
from sympy.parsing.sympy_parser import parse_expr
from sympy.utilities.lambdify import lambdify, implemented_function
from sympy import S, Symbol
from sympy.utilities.autowrap import ufuncify
def CreateMagneticFieldsList(dataToSave, equationString, DSList):
    expression = S(equationString)
    numOfElements = len(dataToSave["MagneticFields"])
    # initialize the magnetic field output array
    magFieldsArray = np.empty(numOfElements)
    magFieldsArray[:] = np.nan
    lam_f = lambdify(tuple(DSList), expression, modules='numpy')
    try:
        for i in range(numOfElements):
            replacementList = np.zeros(len(DSList))
            for j in range(len(DSList)):
                replacementList[j] = dataToSave[DSList[j]][i]
            try:
                val = np.double(lam_f(*replacementList))
            except:
                val = np.nan
            magFieldsArray[i] = val
    except:
        print("Error while evaluating the magnetic field expression")
    return magFieldsArray
list={"MagneticFields":list(range(10000)), "Chx":list(range(10000))}
out=CreateMagneticFieldsList(list,"MagneticFields*5+Chx",["MagneticFields","Chx"])
print(out)
Is there a way to optimize this call further? Specifically, is there a way to make lambdify aware that I'm calculating for a list of points, so that the loop evaluation can be optimized?
Thanks to @asmeurer, who gave the idea of how to do it.
Since lambdify is compiled using numpy, one can simply pass the lists as array arguments! The following is a working example:
#!/usr/bin/python3
import numpy as np
from sympy.parsing.sympy_parser import parse_expr
from sympy.utilities.lambdify import lambdify, implemented_function
from sympy import S, Symbol
from sympy.utilities.autowrap import ufuncify
def CreateMagneticFieldsListOpt(dataToSave, equationString, DSList):
    expression = S(equationString)
    numOfElements = len(dataToSave["MagneticFields"])
    # initialize the magnetic field output array
    magFieldsArray = np.empty(numOfElements)
    magFieldsArray[:] = np.nan
    lam_f = lambdify(tuple(DSList), expression, modules='numpy')
    replacementList = [None]*len(DSList)
    for j in range(len(DSList)):
        replacementList[j] = np.array(dataToSave[DSList[j]])
    print(replacementList)
    magFieldsArray = np.double(lam_f(*replacementList))
    return magFieldsArray
list={"MagneticFields":[1,2,3,4,5],"ChX":[2,4,6,8,10]}
out=CreateMagneticFieldsListOpt(list,"MagneticFields*5+ChX",["MagneticFields","ChX"])
print(out)
When I call random.sample(arr, length), I get the error random_sample() takes at most 1 positional argument (2 given). I've tried importing numpy under a different name, which doesn't fix the problem. Any thoughts? Thanks.
import numpy.random
import random
import numpy as np
from numpy import *
points = [[1,1],[1.5,2],[3,4],[5,7],[3.5,5],[4.5,5], [3.5,4]]
def cluster(X, center):
    clusters = {}
    for x in X:
        z = min([(i[0], np.linalg.norm(x - center[i[0]])) for i in enumerate(center)], key=lambda t: t[1])
        try:
            clusters[z].append(x)
        except KeyError:
            clusters[z] = [x]
    return clusters
def update(oldcenter, clusters):
    d = []
    r = []
    newcenter = []
    for k in clusters:
        if k[0] == 0:
            d.append(clusters[(k[0], k[1])])
        else:
            r.append(clusters[(k[0], k[1])])
    c = np.mean(d, axis=0)
    u = np.mean(r, axis=0)
    newcenter.append(c)
    newcenter.append(u)
    return newcenter
def shouldStop(oldcenter, center, iterations):
    MAX_ITERATIONS = 0
    if iterations > MAX_ITERATIONS:
        return True
    u = np.array_equal(center, oldcenter)
    return u

def init_board(N):
    X = np.array([(random.uniform(1, 4), random.uniform(1, 4)) for i in range(4)])
    return X
def kmeans(X, k):
    clusters = {}
    iterations = 0
    oldcenter = ([[], []])
    center = random.sample(X, k)
    while not shouldStop(oldcenter, center, iterations):
        # Save old centroids for convergence test. Book keeping.
        oldcenter = center
        iterations += 1
        clusters = cluster(X, center)
        center = update(oldcenter, clusters)
    return (center, clusters)
X=init_board(4)
(center,clusters)=kmeans(X,2)
print "center:",center
#print "clusters:", clusters
When you use from numpy import * you import all items in the numpy namespace into your script's namespace. When you do this, any functions/variables/etc. that have the same name are overwritten by the items from the numpy namespace.
numpy has a subpackage called numpy.random, which therefore overwrote your random import, as the code below shows:
import random
# Here random is the Python stdlib random package.
from numpy import *
# As numpy has a random package, numpy.random,
# random is now the numpy.random package as it has been overwritten.
In this case you should instead use import numpy as np which will allow you to access both:
import random
# random now contains the stdlib random package
import numpy as np
# np.random now contains the numpy.random package
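To make the difference concrete: the stdlib random.sample(seq, k) draws k distinct items from a sequence, while numpy's random.sample (an alias of random_sample) accepts only an output shape and returns uniform floats, which is exactly why your two-argument call failed. A small sketch (arr is just illustrative):

import random
import numpy as np

arr = list(range(10))
print(random.sample(arr, 3))   # stdlib: 3 distinct items drawn from arr
print(np.random.sample(3))     # numpy: array of 3 uniform floats in [0, 1)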
Your points is a list, so you should convert it to an array:
points = np.asarray(points)
Also, you should use import random (the stdlib module) rather than relying on the wildcard numpy import.