Let's say I have a set of data points called signal and I want to integrate it twice with respect to time (i.e., if signal was acceleration, I'd like to integrate it twice w.r.t. time to get the position). I can integrate it once using simps but the output here is a scalar. How can you numerically integrate a (random) data set twice? I'd imagine it would look something like this, but obviously the inputs are not compatible after the first integration.
n_samples = 5000
t_range = np.arange(float(n_samples))
signal = np.random.normal(0.,1.,n_samples)
signal_integration = simps(signal, t_range)
signal_integration_double = simps(simps(signal, t_range), t_range)
Any help would be appreciated.
Sorry, I answered too fast. scipy.integrate.simps gives the value of the integral over the whole range you give it, similar to np.sum(signal).
What you want is the integral between the start and each data point, which is what cumsum does. A better method could be scipy.integrate.cumtrapz. You can apply either method twice to get the result you want.
See:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simps.html
https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.cumtrapz.html
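As a rough sketch of applying the cumulative integration twice: the test signal below (a constant acceleration of 2) is an assumption chosen so the result can be checked by hand, and the import fallback is there because newer SciPy releases expose cumtrapz under the name cumulative_trapezoid.

```python
import numpy as np

try:  # name used by newer SciPy releases
    from scipy.integrate import cumulative_trapezoid
except ImportError:  # older SciPy releases
    from scipy.integrate import cumtrapz as cumulative_trapezoid

t = np.linspace(0.0, 5.0, 501)
acceleration = np.full_like(t, 2.0)  # assumed test signal: a(t) = 2

# initial=0 keeps the output the same length as the input,
# so the result can be fed into a second integration.
velocity = cumulative_trapezoid(acceleration, t, initial=0)  # v(t) = 2 t
position = cumulative_trapezoid(velocity, t, initial=0)      # x(t) = t**2
```

Both integrations implicitly start from zero, so a non-zero initial velocity or position would have to be added separately.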
Original answer:
I think you want np.cumsum. Integration of discrete data is just a sum. You have to multiply the result by the step value to get the correct scale.
See https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.cumsum.html
By partial integration you get from y''=f to
y(t) = y(0) + y'(0)*t + integral from 0 to t of (t-s)*f(s) ds
As you seem to assume that y(0)=0 and also y'(0)=0, you can thus get the desired integral value in one integration (with t the upper limit of the integration, i.e. the last entry of t_range) as
simps((t-t_range)*signal, t_range)
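A small sketch of this single-integration formula, under two assumptions that are not in the answer: a constant test signal f = 2, and a hand-written trapezoidal rule standing in for simps so that the snippet needs only NumPy.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule; a stand-in for simps in this sketch."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

t_range = np.linspace(0.0, 3.0, 301)
signal = np.full_like(t_range, 2.0)  # assumed test signal: f(s) = 2
t_end = t_range[-1]                  # upper limit t of the integral

# y(t_end) = integral from 0 to t_end of (t_end - s) * f(s) ds,
# assuming y(0) = 0 and y'(0) = 0
position_end = trapezoid((t_end - t_range) * signal, t_range)
```

For this signal the exact double integral is t_end**2 = 9, which the formula reproduces. Note that it yields only the value at the end point, not the whole position curve.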
Recently I wanted to demonstrate generating a continuous random variable using the universality of the Uniform. For that, I wanted to use the combination of numpy and matplotlib. However, the generated random variable seems a little bit off to me - and I don't know whether it is caused by the way in which NumPy's random uniform and vectorized works or if I am doing something fundamentally wrong here.
Let U ~ Unif(0, 1) and X = F^-1(U). Then X is a real variable with a CDF F (please note that the F^-1 here denotes the quantile function, I also omit the second part of the universality because it will not be necessary).
Let's assume that the CDF of interest to me is:
F(x) = e^x / (1 + e^x)
then:
F^-1(u) = ln(u / (1 - u))
According to the universality of the uniform, to generate a real variable, it is enough to plug U ~ Unif(0, 1) into F^-1. Therefore, I've written a very simple code snippet for that:
U = np.random.uniform(0, 1, 1000000)
def logistic(u):
    x = np.log(u / (1 - u))
    return x
logistic_transform = np.vectorize(logistic)
X = logistic_transform(U)
However, the result seems a little bit off to me - although the histogram of a generated real variable X resembles a logistic distribution (which simplified CDF I've used) - the r.v. seems to be distributed in a very unequal way - and I can't wrap my head around exactly why it is so. I would be grateful for any suggestions on that. Below are the histograms of U and X.
You have a large sample size, so you can increase the number of bins in your histogram and still get a good number of samples per bin. If you are using matplotlib's hist function, try (for example) bins=400. I get this plot, which has the symmetry that I think you expected:
Also--and this is not relevant to the question--your function logistic will handle a NumPy array without wrapping it with vectorize, so you can save a few CPU cycles by writing X = logistic(U). And you can save a few lines of code by using scipy.special.logit instead of implementing it yourself.
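A minimal sketch of both suggestions, using np.histogram instead of a plot so that it runs without matplotlib. The seed, the sample size and the name logistic_quantile are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, for reproducibility only
U = rng.uniform(0.0, 1.0, 100_000)

def logistic_quantile(u):
    # Plain NumPy arithmetic already works element-wise on arrays,
    # so no np.vectorize wrapper is needed.
    return np.log(u / (1.0 - u))

X = logistic_quantile(U)

# Finer binning, as suggested above (bins=400 is just an example value)
counts, edges = np.histogram(X, bins=400)

# Rough symmetry check: about half of the samples should fall below zero
frac_below_zero = np.mean(X < 0)
```

The same counts and edges could be passed to a bar plot, or X handed to matplotlib's hist with bins=400.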
I'm computing the first and second derivatives of a signal and then plotting them. I chose the Savitzky-Golay filter as implemented in SciPy (signal module). I'm wondering if the output needs to be scaled. In the Matlab implementation of the same filter, it is specified that scaling is needed on the output of the filter:
savitzkyGolayFilt(X,N,DN,F) filters the signal X using a
Savitzky-Golay (polynomial) filter. The polynomial order, N, must be
less than the frame size, F, and F must be odd. DN specifies the
differentiation order (DN=0 is smoothing). For a DN higher than zero,
you'll have to scale the output by 1/T^DN to acquire the DNth smoothed
derivative of input X, where T is the sampling interval.
However, I didn't find anything similar in SciPy's documentation. Has anybody tried this and knows whether the output in Python is correct and needs no further scaling? The line of code I'm running for the first derivative is this one: first_deriv = signal.savgol_filter(spectra_signal, 7, 2, deriv=1, delta=3.1966). Here spectra_signal is my "y" variable and delta is the step of the "x" variable.
Also, I tried to compute the first derivative without the deriv option of savgol_filter, using np.diff on the smoothed signal instead (based on the formula derivative = dy/dx): first_deriv_alternative = np.diff(signal.savgol_filter(spectra_signal, 7, 2))/3.1966. The results are not the same.
Working code example:
import numpy as np
from scipy import signal
x =[405.369888, 408.561553, 411.753217, 414.944882, 418.136547, 421.328212, 424.519877, 427.711541, 430.903206]
y =[5.001440644264221191e-01,
4.990128874778747559e-01,
4.994551539421081543e-01,
5.002806782722473145e-01,
5.027571320533752441e-01,
5.053851008415222168e-01,
5.082427263259887695e-01,
5.122825503349304199e-01,
5.167465806007385254e-01]
#variation of x variable, constant step
sampling_step = x[1]-x[0]
#Method 1: using savgol_filter
deriv1_method1 = signal.savgol_filter(y,5,2,deriv=1, delta=sampling_step)
#Method 2: using np.diff to compute the derivative of the filtered original data
dy=np.diff(signal.savgol_filter(y, 5,2))
dx=np.diff(x)
deriv1_method2=dy/dx
#Method 3: filtering the first derivative of the original data
deriv1_method3=signal.savgol_filter((np.diff(y)/np.diff(x)), 5,2)
Under the hood, signal.savgol_filter uses signal.savgol_coeffs. If you look at the source code, it says that "The coefficient assigned to y[deriv] scales the result to take into account the order of the derivative and the sample spacing". The results are hence scaled before performing the fitting and the convolve1d. So by default, it seems that the results are already scaled, taking into account the order of the derivative.
I think that performing the derivative after computing Savitzky-Golay filter won't give you the same results because in this case, you are computing the derivative on the spectrum already filtered, while in the first case you are performing the derivative before performing the fitting and the scaling.
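One way to check the scaling is to filter a signal whose derivative is known. The sketch below assumes a test signal y = x**2 and an arbitrary spacing h; neither comes from the question's data.

```python
import numpy as np
from scipy.signal import savgol_filter

h = 3.2                          # assumed sample spacing
x = 400.0 + h * np.arange(9)
y = x**2                         # test signal with known derivative 2 * x

# With delta set to the sample spacing, the result is already dy/dx
d_scaled = savgol_filter(y, 5, 2, deriv=1, delta=h)

# With the default delta=1.0, the result is the derivative per sample
# and would still have to be divided by h
d_unscaled = savgol_filter(y, 5, 2, deriv=1)

# np.diff returns len(x) - 1 values that sit at the interval midpoints,
# so it cannot match the filter output element by element
d_diff = np.diff(y) / np.diff(x)
x_mid = 0.5 * (x[1:] + x[:-1])
```

Here d_scaled matches 2 * x, d_unscaled / h matches it as well, and d_diff matches 2 * x_mid. That suggests no extra scaling is needed as long as delta is passed.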
To check some of my results more easily I used an Excel sheet to make a few diagrams. However, I noticed something really awkward.
EDIT :
So let me present the problem in another way. I found something that represents what I don't understand in my code.
import numpy as np
from scipy.integrate import odeint
A = []
def F(y, z):
    global A
    a = y[0]
    b = y[1]
    A.append(a)
    return [a, b]
y0 = [1, 1]
z = np.linspace(0, 1, 101)
y = odeint(F, y0, z)
print(len(z), len(A))
The question is why the length of z and A are different (e.g. 101 and 55)?
As I understand it, during the solving a should be evaluated len(z) times, and so A should have that length. So it looks like the linspace is not doing anything for the solving of the equations. Or perhaps I haven't understood the usage of linspace in Python.
The solution via odeint uses an implicit linear multi-step method with adaptive internal time stepping. This is implemented via a PECE predictor-corrector scheme. The E there stands for "evaluation", which means that in each internal integration step, the ODE function is called twice. You might get fewer internal steps than the input time list has entries: the output array is interpolated from the internal time steps, so that you can have multiple output values per internal step. But the other extreme is also possible, that to reach the requested tolerances the internal step size is so small that one output time step requires multiple internal steps.
If the problem were more stiff, there would be even more calls, periodically for the numerical approximation of the Jacobian, and possibly multiple calls per step of the Newton-like corrector step or just multiple simple correction steps, which is then called PE(CE)d.
To compare with, look at the explicit RK4 method. There you have 4 evaluations of the ODE function per time step. The Dormand-Prince method of ode45 has 6+1 evaluations per time step, however there the internal time steps need not correspond to the time sample list passed to the method, the requested output samples are interpolated from the internal steps.
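To see this on the example from the question, one can count the calls and compare them with the step statistics that odeint reports through full_output. This is only a sketch: the counter dictionary is an illustrative replacement for the global list, and the exact counts depend on the SciPy version and the tolerances.

```python
import numpy as np
from scipy.integrate import odeint

calls = {"n": 0}                 # simple call counter

def F(y, z):
    calls["n"] += 1
    return [y[0], y[1]]

y0 = [1.0, 1.0]
z = np.linspace(0, 1, 101)
y, info = odeint(F, y0, z, full_output=True)

print("output points:        ", len(z))
print("calls to F:           ", calls["n"])
print("internal steps (nst): ", info["nst"][-1])  # cumulative step count
print("evaluations (nfe):    ", info["nfe"][-1])  # cumulative evaluation count
```

The number of calls follows the internal steps of the solver, not the length of z, while the returned array y still has one row per entry of z.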
I'm currently making the switch from MATLAB to Python for a project that involves solving differential equations.
In MATLAB if the t array that's passed only contains two elements, the solver outputs all the intermediate steps of the simulation. However, in Python you just get the start and end point. To get time points in between you have to explicitly specify the time points you want.
from scipy import integrate as sp_int
import numpy as np
def odeFun(t,y):
    k = np.ones((2))
    dy_dt = np.zeros(y.shape)
    dy_dt[0]= k[1]*y[1]-k[0]*y[0]
    dy_dt[1]=-dy_dt[0]
    return(dy_dt)
t = np.linspace(0,10,1000)
# odeint calls func(y, t) by default; tfirst=True matches the (t, y) signature above
yOut = sp_int.odeint(odeFun,[1,0],t,tfirst=True)
I've also looked into the following method:
solver = sp_int.ode(odeFun).set_integrator('vode', method='bdf')
solver.set_initial_value([1,0],0)
dt = 0.01
solver.integrate(solver.t+dt)
However, it still requires an explicit dt. From reading around I understand that Python's solvers (e.g. 'vode') calculate intermediate steps for the dt requested, and then interpolate that time point and output it. What I'd like though is to get all these intermediate steps directly without the interpolation. This is because they represent the minimum number of points required to fully describe the time series within the integration tolerances.
Is there an option available to do that?
I'm working in Python 3.
scipy.integrate.odeint
odeint has an option full_output that allows you to obtain a dictionary with information on the integration, including tcur which is:
vector with the value of t reached for each time step. (will always be at least as large as the input times).
(Note the second sentence: The actual steps are always as fine as your desired output. If you want to use the minimum number of necessary steps, you must ask for a coarse sampling.)
Now, this does not give you the values, but we can obtain those by integrating a second time using these very steps:
from scipy.integrate import odeint
import numpy as np
def f(y,t):
    return np.array([y[1]-y[0],y[0]-y[1]])
start,end = 0,10 # time range we want to integrate
y0 = [1,0] # initial conditions
# Function to add the initial time and the target time if needed:
def ensure_start_and_end(times):
    times = np.insert(times,0,start)
    if times[-1] < end:
        times = np.append(times,end)
    return times
# First run to establish the steps
first_times = np.linspace(start,end,100)
first_run = odeint(f,y0,first_times,full_output=True)
first_steps = np.unique(first_run[1]["tcur"])
# Second run to obtain the results at the steps
second_times = ensure_start_and_end(first_steps)
second_run = odeint(f,y0,second_times,full_output=True,h0=second_times[0])
second_steps = np.unique(second_run[1]["tcur"])
# ensuring that the second run actually uses (almost) the same steps.
np.testing.assert_allclose(first_steps,second_steps,rtol=1e-5)
# Your desired output
actual_steps = np.vstack((second_times, second_run[0].T)).T
scipy.integrate.ode
Having some experience with this module, I am not aware of any way to obtain the step size without digging deeply into the internals.
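A possible alternative that is not part of the answer above: scipy.integrate.solve_ivp returns the time points selected by the solver itself when no t_eval is passed. The sketch below assumes a SciPy version that ships solve_ivp and reuses the system from the question; the name ode_fun is illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

def ode_fun(t, y):
    k = np.ones(2)
    dy0 = k[1] * y[1] - k[0] * y[0]
    return [dy0, -dy0]

# Without t_eval, sol.t holds the points chosen by the solver
sol = solve_ivp(ode_fun, (0.0, 10.0), [1.0, 0.0], method="LSODA")

steps = sol.t          # time points of the solver steps
values = sol.y.T       # one row of state values per step
```

The number and spacing of the steps depend on the method and on rtol and atol, so they should be read as solver output rather than as a fixed sampling.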
Is it possible to perform "pseudoexperiments" using PyMC?
By pseudoexperiments, I mean generating random "observations" by sampling from the prior, and then, given each pseudoexperiment, drawing samples from the posterior. Afterwards, one would compare the trace for each parameter to the sample (obtained from the prior) used in sampling from the posterior.
A more concrete example: Suppose that I want to know the rate of process X. I count how many occurrences there are in a certain period of time. However, I know that process Y also sometimes occurs and will contaminate my count. The rate of process Y is known with some uncertainty. So, I build a model, include my observations, and sample from the posterior:
import pymc
class mymodel:
    rate_x = pymc.Uniform('rate_x', lower=0, upper=100)
    rate_y = pymc.Normal('rate_y', mu=150, tau=1./(15**2))
    total_rate = pymc.LinearCombination('total_rate', [1,1], [rate_x, rate_y])
    data = pymc.Poisson('data', mu=total_rate, value=193, observed=True)
Mod = pymc.Model(mymodel)
MCMC = pymc.MCMC(Mod)
MCMC.sample(100000, burn=5000, thin=5)
print(MCMC.stats()['rate_x']['quantiles'])
However, before I do my experiment (or before I "unblind" my analysis and look at my data), I would like to know how sensitive I expect to be -- what will be the uncertainty on my measurement of rate_x?
To answer this, I could sample from the prior
Mod.draw_from_prior()
but this only samples rate_x, rate_y, and calculates total_rate. But once the values of those are set by draw_from_prior(), I can draw a pseudoexperiment:
Mod.data.random()
This just returns a number, so I have to set the value of Mod.data to a random sample. Because Mod.data has the observed flag set, I have to also "force" it:
Mod.data.set_value(Mod.data.random(), force=True)
Now I can sample from the posterior again
MCMC.sample(100000, burn=500, thin=5)
print(MCMC.stats()['rate_x']['quantiles'])
All this works, so I suppose the simple answer to my question is "yes". But it feels very hacky. Is there a better or more natural way to accomplish this?
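For reference, the pseudo-data generation step on its own (drawing from the prior, then drawing a count) can be sketched with plain NumPy. This is not PyMC code and does not perform the posterior sampling; the seed, the number of pseudoexperiments and the clipping of negative rates are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed, for reproducibility only
n_pseudo = 10_000                 # number of pseudoexperiments (arbitrary)

# Draw the rates from the same priors as in the model above
rate_x = rng.uniform(0.0, 100.0, n_pseudo)
rate_y = rng.normal(150.0, 15.0, n_pseudo)   # sd = 15, i.e. tau = 1/15**2

# A Poisson mean must be non-negative; clipping is a safeguard added here
total_rate = np.clip(rate_x + rate_y, 0.0, None)

# One pseudo-observation per prior draw
pseudo_data = rng.poisson(total_rate)
```

Each entry of pseudo_data could then be set as the observed value before sampling from the posterior, and the resulting trace compared with the matching rate_x draw.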