I would like to numerically integrate a function using multiple CPUs in Python. I would like to do something like this:
from scipy.integrate import quad
import multiprocessing

def FanDDW(arguments):
    wtq, eigq_files, DDB_files, EIGR2D_files, FAN_files = arguments
    ...
    return tot_corr

# Numerical integration
def integration(frequency):
    # Parallelize the work over cpus
    pool = multiprocessing.Pool(processes=nb_cpus)
    total = pool.map(FanDDW, zip(wtq, eigq_files, DDB_files, EIGR2D_files, FAN_files))
    FanDDW_corr = sum(total)
    return quad(FanDDW, -Inf, Inf, args=(zip(wtq, eigq_files, DDB_files, EIGR2D_files, FAN_files)))[0]

vec_functionint = vectorize(integration)
vec_functionint(3, arange(1.0, 4.0, 0.5))
Also "frequency" is a global variable (external to FanDDW(arguments)). It is a vector containing the position where the function must be evaluated. I guess that quad should choose frequency in a clever way. How to pass it to FanDDW knowing that it should NOT be distributed among CPUs and that pool.map does exactly that (it is the reason why I did put it as a global variable and did not pass it to the definition as argument).
Thank you for any help.
Samuel.
All classical quadrature rules have the form ∫ f(x) dx ≈ sum_i w_i f(x_i), with fixed nodes x_i and weights w_i.
The computation of the f(x_i) is typically the most costly part, so if you want to use multiple CPUs, you'll have to think about how to design your f. The sum itself can be expressed as a scalar product <w, f(x_i)>, and when you use numpy.dot for it, threading is used on most architectures.
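For instance, with a fixed set of nodes and weights the two parts separate cleanly. A generic sketch (the composite trapezoidal rule and the test integrand are placeholders, not anything specific to the question):

import numpy as np

# nodes x_i and weights w_i of whatever quadrature rule you chose;
# here: a simple composite trapezoidal rule on [-1, 1]
x = np.linspace(-1.0, 1.0, 1001)
w = np.full_like(x, 2.0 / (len(x) - 1))
w[0] *= 0.5
w[-1] *= 0.5

fx = np.exp(-x**2)      # evaluate f at all nodes; this is the part worth parallelizing
approx = np.dot(w, fx)  # the weighted sum itself is a single (threaded) dot product
print(approx)           # ≈ 1.4936, the integral of exp(-x**2) over [-1, 1]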
quadpy (a project of mine) calls your integrand with all points at once, so in f you have the chance to get fancy with the computations.
import quadpy

def f(x):
    print(x.shape)  # (1, 50)
    return x[0] ** 2

scheme = quadpy.e1r2.gauss_hermite(50)
val = scheme.integrate(f)
print(val)  # 0.886226925452758
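For the original question, that means the parallelism can live inside f. A rough sketch reusing the quadpy API from the snippet above; expensive_point_eval is a hypothetical stand-in for the real per-point work (the FanDDW-style computation at a given frequency):

import multiprocessing

import numpy as np
import quadpy

def expensive_point_eval(xi):
    # hypothetical stand-in for the costly per-point work
    return xi ** 2

def f(x):
    # x has shape (1, n): all quadrature nodes arrive at once,
    # so the per-point evaluations can be farmed out to a pool of workers
    with multiprocessing.Pool() as pool:
        values = pool.map(expensive_point_eval, x[0])
    return np.array(values)

if __name__ == "__main__":
    scheme = quadpy.e1r2.gauss_hermite(50)
    val = scheme.integrate(f)
    print(val)  # ≈ 0.886226925452758 for this toy integrand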
I'm trying to solve a dynamic food web with JiTCODE. One aspect of the network is that populations which fall below a threshold are set to zero, so I get a non-differentiable equation. Is there a way to implement that in JiTCODE?
Another, similar problem is a Heaviside dependency within the network.
Example code:
import numpy as np
from jitcode import jitcode, y, t

def f():
    for i in range(N):
        if i < 5:
            #if y(N-1) > y(N-2):  # Heaviside; how to express this if-statement?
            #    yield (y(i)*y(N-2))**(0.02)
            #else:
            yield (y(i)*y(N-1))**(0.02)
        else:
            #if y(i) > thr:
            #    yield y(i)**(0.2)  # ?? how to set the population to 0 ??
            #else:
            yield y(i)**(0.3)

N = 10
thr = 0.0001
initial_value = np.zeros(N) + 1

ODE = jitcode(f)
ODE.set_integrator("vode", interpolate=True)
ODE.set_initial_value(initial_value, 0.0)
Python conditionals will be evaluated during the code generation and not during the simulation (which uses the generated code). Therefore you cannot use them here. Instead you need to use special conditional objects that provide a reasonably smooth approximation of a step function (or build such a thing yourself):
def f():
    for i in range(N):
        if i < 5:
            yield ( y(i)*conditional(y(N-1), y(N-2), y(N-2), y(N-1)) )**0.2
        else:
            yield y(i)**conditional(y(i), thr, 0.2, 0.3)
For example, you can expect conditional(y(i),thr,0.2,0.3) to be evaluated (at simulation time) as 0.2 if y(i)>thr and as 0.3 otherwise.
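If you would rather build such a step yourself (as mentioned above), one common choice is a scaled tanh. A small sketch, assuming symengine's tanh (which JiTCODE expressions are built on) and an arbitrarily chosen steepness k:

from symengine import tanh

def smooth_step(x, threshold, k=1000):
    # ~0 well below the threshold, ~1 well above it; larger k gives a sharper step
    return (1 + tanh(k*(x - threshold))) / 2

# e.g. a term that is switched off below thr:
# yield y(i)**0.2 * smooth_step(y(i), thr)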
how to set the population to 0 ??
You cannot do such a discontinuous jump within JiTCODE or within the framework of differential equations in general. Usually, you would simulate this with a sharp population decline, possibly introducing a delay (and thus JiTCDDE). If you really need the hard jump, you can either:
Detect threshold crossings after each integration step and reinitialise the integrator with the respective initial conditions (see the sketch after this list). If you just want to fully kill populations that went below a reproductive threshold, this seems to be a valid solution.
Implement a binary-switch dynamical variable.
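A minimal sketch of the first option, reusing the ODE object and thr from the question; the check-and-reset loop itself is just one possible way to do it:

import numpy as np

times = np.arange(0.1, 100.0, 0.1)   # arbitrary output grid
states = []
for time in times:
    state = ODE.integrate(time)
    if np.any(state < thr):
        # fully kill populations that fell below the reproductive threshold
        state = np.where(state < thr, 0.0, state)
        ODE.set_initial_value(state, time)
    states.append(state)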
Also see this GitHub issue.
I have the following first-order differential equation (example):
dn/dt=A*n; n(0)=28
When A is constant, it is perfectly solved with Python's odeint.
But I have an array of different values of A from a .txt file [not a function, just an array of values]:
A = [0.1,0.2,0.3,-0.4,0.7,...,0.0028]
I want A to take a new value from the array at each iteration (or at each moment of time t) while solving the ODE.
I mean:
First iteration (or t=0): A=0.1
Second iteration (or t=1): A=0.2, and so on through the array.
How can I do this using Python's odeint?
Yes, you can do that, but not directly in odeint, as it has no event mechanism, and what you propose needs an event-action mechanism.
But you can separate your problem into steps, inside each step use odeint with the now-constant parameter A[k], and then join the steps at the end.
import numpy as np
from scipy.integrate import odeint

n0 = 28.0                    # initial value n(0) = 28
T = [[0.0]]
N = [[n0]]
for k in range(len(A)):
    t = np.linspace(k, k+1, 11)
    n = odeint(lambda u, t: A[k]*u, [n0], t)   # A[k] is constant on this step
    n0 = n[-1, 0]                              # last state becomes the next initial value
    T.append(t[1:])
    N.append(n[1:])
T = np.concatenate(T)
N = np.concatenate(N)
If you are satisfied with less efficiency, both in the evaluation of the ODE and in the number of internal steps, you can also implement the parameter as a piecewise constant function.
from scipy.interpolate import interp1d

tA = np.arange(len(A))
A_func = interp1d(tA, A, kind="zero", fill_value="extrapolate")

T = np.linspace(0, len(A)+1, 10*len(A)+11)
N = odeint(lambda u, t: A_func(t)*u, [n0], T)
The internal step size controller works on the assumption that the ODE function is differentiable to 5th or higher order. The jumps are then seen, via the implicit numerical differentiation inherent in the step-error estimate, as highly oscillatory events requiring a very small step size. There is some mitigation inside the code that usually allows the solver to eventually step over such a jump, but it will require many more internal steps, and thus function evaluations, than the first variant above.
I would like to propagate uncertainty using Python. This is relatively easy for simple functions via the uncertainties package, but it is not that obvious how to achieve the same with a user-defined function. What follows is an example of what I am trying to do.
import mcerp as err
import numpy as np

def mult_func(x, xm, a):
    x[x == 0.] = 1e-20
    v = (1. - (xm/x)**a) * (x > xm)
    v[np.isnan(v)] = 0.
    return v

def intg(e, f, cut, s):
    t = mult_func(e, cut, s)
    res = np.trapz(t*f, e)
    return res

x = np.linspace(0, 1, 10000)
y = np.exp(x)

m = 0.
mm = 0.
N = 100000
for i in range(0, N):
    cut = np.random.normal(0.21, 0.02)
    stg = np.random.normal(1.1, 0.1)
    v = intg(x, y, cut, stg)
    m = m + v
    mm = mm + v*v

print("avg. %10.5E +/- %10.5E fixed %10.5E" % (m/N, np.sqrt((mm/N - (m/N)**2)), intg(x, y, 0.21, 1.1)))
What is done above is just random sampling of the two parameters and calculating the mean and the variance. I am not sure, however, how adequate this brute-force method is. I could use the law of large numbers to estimate how many trials N are needed so that, with probability P = 1 - 1/(N*k**2), the result lies within k standard deviations of the true mean.
In principle what I wrote could work. However, my assumption is that Python, being such a flexible language with many powerful packages, could do this task much more effectively. I was thinking about uncertainties, mcerp and pymc, but due to my limited experience with those packages I am not sure how to proceed.
EDIT:
My original example was not very informative, which is why I wrote a new example that actually works to illustrate my idea.
NumPy supports arrays of arbitrary numeric types. However, not all functions support arbitrary numeric types; in this case, both numpy.exp and numpy.trapz do not.
Note that the uncertainties module contains the unumpy package, so numpy.exp has a drop-in replacement: uncertainties.unumpy.exp. For trapz, we define a replacement ourselves.
import numpy as np
import uncertainties as un
import uncertainties.unumpy   # makes un.unumpy available

a = un.ufloat(0.3, 0.01)
b = un.ufloat(1.2, 0.071)

def sample_func(a: un.UFloat, b: un.UFloat) -> np.ndarray:
    x = np.linspace(0, a, 100)
    y = un.unumpy.exp(x)
    return utrapz(y, x)

def utrapz(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    Δx = x[1:] - x[:-1]
    avg_y = (y[1:] + y[:-1]) / 2
    return (Δx * avg_y)

print(sample_func(a, b))
OUT:
[0.00026601240063021264+/-nan 0.0005935120815465686+/-6.429403852670308e-06
0.0006973604419223405+/-3.888235103342809e-06 ...,
0.002095505706899622+/-6.503985178118233e-05
0.0021019968633076134+/-6.545802781649068e-05
0.0021084415802710295+/-6.587387316821736e-05]
Intro
There is a pattern that I use all the time in my Python code which analyzes
numerical data. All implementations seem overly redundant or very cumbersome or
just do not play nicely with NumPy functions. I'd like to find a better way to
abstract this pattern.
The Problem / Current State
A method of statistical error propagation is the bootstrap method. It works by
running the same analysis many times with slightly different inputs and looking
at the distribution of final results.
To compute the actual value of ams_phys, I have the following equation:
ams_phys = (amk_phys**2 - 0.5 * ampi_phys**2) / aB - amcr
All the values that go into that equation have a statistical error associated
with it. These values are also computed from other equations. For instance
amk_phys is computed from this equation, where both numbers also have
uncertainties:
amk_phys_dist = mk_phys / a_inv
The value of mk_phys is given as (494.2 ± 0.3) in a paper. What I now do is
parametric bootstrap and generate R samples from a Gaussian distribution
with mean 494.2 and standard deviation 0.3. This is what I store in
mk_phys_dist:
mk_phys_dist = bootstrap.make_dist(494.2, 0.3, R)
The same is done for a_inv, which is also quoted with an error in the
literature. The above equation is then converted into a list comprehension to
yield a new distribution:
amk_phys_dist = [mk_phys / a_inv
                 for a_inv, mk_phys in zip(a_inv_dist, mk_phys_dist)]
The first equation is then also converted into a list comprehension:
ams_phys_dist = [
    (amk_phys**2 - 0.5 * ampi_phys**2) / aB - amcr
    for ampi_phys, amk_phys, aB, amcr
    in zip(ampi_phys_dist, amk_phys_dist, aB_dist, amcr_dist)]
To get the end result in terms of (Value ± Error), I then take the average and
standard deviation of this distribution of numbers:
ams_phys_val, ams_phys_avg, ams_phys_err \
= bootstrap.average_and_std_arrays(ams_phys_dist)
The actual value is supposed to be computed from the actual input values, not
from the mean of this bootstrap distribution. Before, I had the code replicated
for that; now I keep the original value at the 0th position of the _dist
arrays. The arrays then contain 1 + R elements, and the
bootstrap.average_and_std_arrays function separates that element.
This kind of line occurs for every number that I might want to quote in my
writing. I got annoyed by all that typing and created a snippet for it:
$1_val, $1_avg, $1_err = bootstrap.average_and_std_arrays($1_dist)
The need for the snippet strongly told me that I need to do some refactoring.
Also the list comprehensions are always of the following pattern:
foo_dist = [ ... bar ...
             for bar in bar_dist]
It feels bad to write bar three times there.
The Class Approach
I have tried to make those _dist things a Boot class such that I would not
write ampi_dist and ampi_val but could just use ampi.val, without having
to explicitly call this average_and_std_arrays function and type a bunch of
names for it.
class Boot(object):
    def __init__(self, dist):
        self.dist = dist

    def __str__(self):
        return str(self.dist)

    @property
    def cen(self):
        return self.dist[0]

    @property
    def val(self):
        x = np.array(self.dist)
        return np.mean(x[1:], axis=0)

    @property
    def err(self):
        x = np.array(self.dist)
        return np.std(x[1:], axis=0)
However, this still does not solve the problem of the list comprehensions. I
fear that I still have to repeat myself there three times. I could make the
Boot object inherit from list, such that I could at least write it like
this (without the _dist):
bar = Boot([... foo ... for foo in foo])
Magic Approach
Ideally all those list comprehensions would be gone such that I could just
write
bar = ... foo ...
where the dots mean some non-trivial operation. Those can be simple arithmetic
as above, but that could also be a function call to something that does not
support being called with multiple values (the way NumPy functions do).
For instance the scipy.optimize.curve_fit function needs to be called a bunch of times:
popt_dist = [op.curve_fit(linear, mpi, diff)[0]
             for mpi, diff in zip(mpi_dist, diff_dist)]
One would have to write a wrapper for that because it does not automatically loop over lists of arrays.
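The kind of wrapper I imagine would look roughly like this (boot_map is a hypothetical helper of mine, not something from an existing library):

def boot_map(func, *boots):
    # apply `func` to each bootstrap sample of the given Boot objects
    return Boot([func(*samples)
                 for samples in zip(*(boot.dist for boot in boots))])

# which would turn the curve_fit case into something like
# popt = boot_map(lambda mpi, diff: op.curve_fit(linear, mpi, diff)[0], mpi, diff)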
Question
Do you see a way to abstract this process of running every transformation with
1 + R sets of data? I would like to get rid of those patterns and the huge
number of variables in each namespace (_dist, _val, _avg, ...), as this
makes passing them to functions rather tedious.
Still I need to have a lot of freedom in the ... foo ... part where I need to
call arbitrary functions.
I had a pretty compact way of computing the partition function of an Ising-like model using itertools, lambda functions, and large NumPy arrays. Given a network consisting of N nodes and Q "states"/node, I have two arrays, h-fields and J-couplings, of sizes (N,Q) and (N,N,Q,Q) respectively. J is upper-triangular, however. Using these arrays, I have been computing the partition function Z using the following method:
# Set up lambda functions and iteration tuples of the form (A_1, A_2, ..., A_n)
iters = itertools.product(range(Q), repeat=N)
hf = lambda s: h[range(N), s]
jf = lambda s: np.array([J[fi, fj, s[fi], s[fj]]
                         for fi, fj in itertools.combinations(range(N), 2)]).flatten()

# Initialize and populate partition function array
pf = np.zeros(tuple([Q for i in range(N)]))
for it in iters:
    hterms = np.exp(hf(it)).prod()
    jterms = np.exp(-jf(it)).prod()
    pf[it] = jterms * hterms

# Calculate the partition function
Z = pf.sum()
This method works quickly for small N and Q, say (N,Q) = (5,2). However, for larger systems (N,Q) = (18,3), this method cannot even create the pf array due to memory issues because it has Q^N nontrivial elements. Any ideas on how to either overcome this memory issue or how to alter the code to work on subarrays?
Edit: Made a small mistake in the definition of jf. It has been corrected.
You can avoid the large array just by initializing Z to 0 and incrementing it by jterms * hterms in each iteration. This still won't get you out of calculating and summing Q^N numbers, however. To do that, you probably need to figure out a way to simplify the partition function algebraically.
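In terms of the code from the question, that just means replacing the pf array with a running sum (a sketch reusing the question's h, J, hf, jf, N and Q):

import itertools
import numpy as np

Z = 0.0
for it in itertools.product(range(Q), repeat=N):
    hterms = np.exp(hf(it)).prod()
    jterms = np.exp(-jf(it)).prod()
    Z += jterms * hterms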
Not sure what you are trying to compute, but I tested your code with ChrisB's suggestion and jf will not work for Q=3.
Perhaps you shouldn't use a dense NumPy array to encode your function? You could try sparse arrays, or just straight Python with Numba compilation. This blog post shows Numba being used on the simple Ising model with good performance.
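As a rough sketch of the Numba route (my own illustration, not taken from the blog post), assuming h of shape (N, Q) and the upper-triangular J of shape (N, N, Q, Q) from the question; decoding a flat index in base Q replaces itertools.product inside the compiled loop:

import numpy as np
from numba import njit

@njit
def partition_function(h, J, N, Q):
    Z = 0.0
    s = np.zeros(N, dtype=np.int64)
    for idx in range(Q**N):
        # decode the flat index into a base-Q state vector s
        m = idx
        for i in range(N):
            s[i] = m % Q
            m //= Q
        # exponent matching the question's hterms * jterms:
        # sum of h fields minus the upper-triangular J couplings
        E = 0.0
        for i in range(N):
            E += h[i, s[i]]
            for j in range(i + 1, N):
                E -= J[i, j, s[i], s[j]]
        Z += np.exp(E)
    return Z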