How to perform local polynomial fitting in Python - python

I have 200k data points and I'm trying to obtain derivative of fitted polynomial. I divided my data set into smaller ones every 0.5 K, the data is Voltage vs Temperature. My code roughly looks like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
testset=pd.read_csv('150615H0.csv',sep='\t')
x=np.linspace(1,220,219)
ub=min(testset['T(K)'])
lb=min(testset['T(K)'])-1
q={i:testset[(testset['T(K)'] < ub+i) & (testset['T(K)'] > lb+i)] for i in x}
f={j:np.polyfit(q[j]['T(K)'],q[j]['Vol(V)'],4) for j in q}
fs={k:np.poly1d(f[k]) for k in f}
fsd={l:np.polyder(fs[l],1) for l in fs}
for kk in q:
plt.plot(q[kk]['T(K)'],fsd[kk](q[kk]['T(K)']),color='blue',linewidth=2,label='fit')
Unsurprinsingly, the derivative is discontinuous and I don't like it. Is there any other way to fit polynomial locally and get continuous derivative at the same time ?

Have a look at the Savitzky-Gollay filter for an efficient local polynomial fitting.
It is implemented, for instance, in scipy.signal.savgol_filter. The derivative of the fitted polynomial can be obtained with the deriv=1 argument.

Related

Weighted 1D interpolation of cloud data point

I have a cloud of data points (x,y) that I would like to interpolate and smooth.
Currently, I am using scipy :
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter
spl = interp1d(Cloud[:,1], Cloud[:,0]) # interpolation
x = np.linspace(Cloud[:,1].min(), Cloud[:,1].max(), 1000)
smoothed = savgol_filter(spl(x), 21, 1) #smoothing
This is working pretty well, except that I would like to give some weights to the data points given at interp1d. Any suggestion for another function that is handling this ?
Basically, I thought that I could just multiply the occurrence of each point of the cloud according to its weight, but that is not very optimized as it increases a lot the number of points to interpolate, and slows down the algorithm ..
The default interp1d uses linear interpolation, i.e., it simply computes a line between two points. A weighted interpolation does not make much sense mathematically in such scenario - there is only one way in euclidean space to make a straight line between two points.
Depending on your goal, you can look into other methods of interpolation, e.g., B-splines. Then you can use scipy's scipy.interpolate.splrep and set the w argument:
w - Strictly positive rank-1 array of weights the same length as x and y. The weights are used in computing the weighted least-squares spline fit. If the errors in the y values have standard-deviation given by the vector d, then w should be 1/d. Default is ones(len(x)).

Odd result in python scipy FFT

I have access scipy and want to create a FFT about simple Gaussian function which is exp(-t^2). And also it's well known that fourier transform of exp(−t^2) is √πexp(−π^2*k^2). But FFT of exp(-t^2) was not same as √πexp(−π^2*k^2).
I have tried the following code:
import scipy.fftpack as fft
from scipy import integrate
import numpy as np
import matplotlib.pyplot as plt
#FFT
N=int(1e+3)
T=0.01 #sample period
t = np.linspace(0,N*T, N)
h=np.exp(-t**2)
H_shift=2*np.abs(fft.fftshift(np.fft.fft(h)/N))
freq=fft.fftshift(fft.fftfreq(h.shape[0],t[1]-t[0]))
#Comparing FFT with fourier transform
def f(x):
return np.exp(-x**2)
def F(k):
return (np.pi**0.5)*np.exp((-np.pi**2)*(k**2))
plt.figure(num=1)
plt.plot(freq,F(freq),label=("Fourier Transform"))
plt.legend()
plt.figure(num=2)
plt.plot(freq,H_shift,label=("FFT"))
plt.legend()
plt.show()
#Checking Parseval's Theorm
S_h=integrate.simps(h**2,t)
#0.62665690150683084
S_H_s=integrate.simps(H_shift**2,freq)
#0.025215875346935791
S_F=integrate.simps(F(freq)**2,freq)
#1.2533141373154999
The graph I plotted is not same, also values of FFT do not follow Parseval's theorm. . It has to be S_H_s=S_h*2, but my result was not. I think that S_H_s which is result of FFT is wrong value Because of S_F=S_h*2.
Is there any problem in my code?? Help is greatly appreciated! Thanks in advance.
I suggest you plot your input signal h and verify that it looks like a Gaussian.
Spoiler alert: it doesn't, it is half a Gaussian!
By cutting it like this, you introduce a lot of high frequencies that you see in your plot.
To do this experiment correctly, follow this recipe to create your input signal:
t = np.linspace(-(N/2)*T,(N/2-1)*T, N)
h = np.exp(-t**2)
h = fft.ifftshift(h)
The ifftshift function serves to move the t=0 location to the leftmost array element. Note that t here is constructed carefully such that t=0 is exactly in the right place for this to work correctly, assuming an even-sized N. You can verify that fft.ifftshift(t)[0] is 0.0.

Estimating Posterior in Python?

I'm new to Bayesian stats and I'm trying to estimate the posterior of a poisson (likelihood) and gamma distribution (prior) in Python. The parameter I'm trying to estimate is the lambda variable in the poisson distribution. I think the posterior will take the form of a gamma distribution (conjugate prior?) but I don't want to leverage that. The only thing I'm given is the data (named "my_data"). Here's my code:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import scipy.stats
x=np.linspace(1,len(my_data),len(my_data))
lambda_estimate=np.mean(my_data)
prior= scipy.stats.gamma.pdf(x,alpha,beta) #the parameters dont matter for now
likelihood_temp = lambda yi, a: scipy.stats.poisson.pmf(yi, a)
likelihood = lambda y, a: np.log(np.prod([likelihood_temp(data, a) for data in my_data]))
posterior=likelihood(my_data,lambda_estimate) * prior
When I try to plot the posterior I get an empty plot. I plotted the prior and it looks fine, so I think the issue is the likelihood. I took the log because the data is fairly large and I didn't want things to get unstable. Can anyone point out the issues in my code? Any help would be appreciated.
In Bayesian statistics, one goal is to calculate the posterior distribution of the parameter (lambda) given the data and the prior over a range of possible values for lambda. In your code, you calculating the prior over the array x, but you are taking a single value for lambda to calculate the likelihood. The posterior and likelihood should be over x as well, something like:
posterior = [likelihood(my_data, lambda_i) for lambda_i in x] * prior
(assuming you are not taking the logs of the prior and likelihood)
You might want to take a look at the PyMC3 library.
I would recommend you to have a look at the conjugate_prior module.
You could just type:
from conjugate_prior import GammaPoisson
model = GammaPoisson(prior_a, prior_b)
model = model.update(...)
credible_interval = model.posterior(lower_bound, upper_bound)

Get statistical difference of correlation coefficient in python

To get the correlation between two arrays in python, I am using:
from scipy.stats import pearsonr
x, y = [1,2,3], [1,5,7]
cor, p = pearsonr(x, y)
However, as stated in the docs, the p-value returned from pearsonr() is only meaningful with datasets larger than 500. So how can I get a p-value that is reasonable for small datasets?
My temporary solution:
After reading up on linear regression, I have come up with my own small script, which basically uses Fischer transformation to get the z-score, from which the p-value is calculated:
import numpy as np
from scipy.stats import zprob
n = len(x)
z = np.log((1+cor)/(1-cor))*0.5*np.sqrt(n-3))
p = zprob(-z)
It works. However, I am not sure if it is more reasonable that p-value given by pearsonr(). Is there a python module which already has this functionality? I have not been able to find it in SciPy or Statsmodels.
Edit to clarify:
The dataset in my example is simplified. My real dataset is two arrays of 10-50 values.

Python Scipy Optimization curve_fit

I have two numpy arrays x and y and would like to fit a curve to the data. The fitting function is an exponential with a and t as fitting parameters, and another numpy array ex.
import numpy as np
import scipy
import scipy.optimize as op
k=1.38e-23
h=6.63e-34
c=3e8
def func(ex,a,t):
return a*np.exp(-h*c/(ex*1e-9*kb*t))
t0=300 #initial guess
print op.curve_fit(func,x,y,t0)
Your initial guess should contain two values like t0=(300, 1.) since you have two fitting parameters (a and t).
You need to define the points you want to fit, i.e. defining x and y before calling curve_fit().

Categories

Resources