Resistivity nonlinear fit in Python

I'm trying to fit the Einstein approximation of resistivity in a solid to a set of experimental data.
I have resistivity vs. temperature data (from 200 K down to 4 K).
import xlrd as xd
import matplotlib.pyplot as plt
import numpy as np
import pylab as pl
import scipy as sp
from scipy.optimize import curve_fit
#retrieve data from file
data = pl.loadtxt('salita.txt')
Temp = data[:, 1]
Res = data[:, 2]
#define fitting function
def einstein_func(T, ro0, AE, TE):
    nl = np.sinh(TE/(2*T))
    return ro0 + AE*nl*T
p0 = sp.array([1 , 1, 1])
coeffs, cov = curve_fit(einstein_func, Temp, Res, p0)
But I get these warnings
crio.py:14: RuntimeWarning: divide by zero encountered in divide
nl = np.sinh(TE/(2*T))
crio.py:14: RuntimeWarning: overflow encountered in sinh
nl = np.sinh(TE/(2*T))
crio.py:15: RuntimeWarning: divide by zero encountered in divide
return ro0 + AE*np.sinh(TE/(2*T))*T
crio.py:15: RuntimeWarning: overflow encountered in sinh
return ro0 + AE*np.sinh(TE/(2*T))*T
crio.py:15: RuntimeWarning: invalid value encountered in multiply
return ro0 + AE*np.sinh(TE/(2*T))*T
Traceback (most recent call last):
File "crio.py", line 19, in <module>
coeffs, cov = curve_fit(einstein_func, Temp, Res, p0)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/optimize/minpack.py", line 511, in curve_fit
raise RuntimeError(msg)
RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 800.
I don't understand why it keeps saying that there is a divide by zero in sinh, since I have strictly positive values. Varying my starting guess has no effect on it.
EDIT: My dataset is organized like this:
4.39531E+0 1.16083E-7
4.39555E+0 -5.92258E-8
4.39554E+0 -3.79045E-8
4.39525E+0 -2.13213E-8
4.39619E+0 -4.02736E-8
4.43130E+0 -1.42142E-8
4.45900E+0 -2.60594E-8
4.46129E+0 -9.00232E-8
4.46181E+0 1.42142E-7
4.46195E+0 -2.13213E-8
4.46225E+0 4.26426E-8
4.46864E+0 -2.60594E-8
4.47628E+0 1.37404E-7
4.47747E+0 9.47612E-9
4.48008E+0 2.84284E-8
4.48795E+0 1.35035E-7
4.49804E+0 1.39773E-7
4.51151E+0 -1.75308E-7
4.54916E+0 -1.63463E-7
4.59176E+0 -2.36902E-9
where the first column is temperature and the second one is resistivity. (Negative values are due to noise in the trial current: the sample is a PbIn alloy, which becomes superconducting below 6.7-6.9 K, and here we are at 4.5 K.)
The arguments I'm providing to sinh are NumPy arrays. With a linear function, ro0 + AE*T, my code works. I've tried scipy.optimize.minimize, but the result is the same.
Now I see that I have almost nine hundred values in my file. Could that be the problem?
I have edited my dataset removing some lines and now the only warning showing is
RuntimeWarning: overflow encountered in sinh
How can I work around it?

Here are a couple of observations that could help:
You could try the least-squares fit directly with leastsq, providing the Jacobian, which might help tame it.
I'm guessing you don't want the superconducting temperatures in your data set at all if you're fitting to an Einstein model (do you have a source for this eqn, btw?)
Do make sure your initial guesses are as good as they could possibly be (ro0=AE=TE=1 probably won't cut it).
Plot your data and make sure there aren't any weird artefacts
You seem to be indexing your data array in the wrong way in your code example: if the data is structured as you say, you want:
Temp = data[:, 0]
Res = data[:, 1]
(Python indexes start at 0).
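To make the indexing and the overflow warning concrete, here is a minimal sketch. It assumes a two-column layout like the one in the question; the two high-temperature rows and the 7 K cut-off are made up for illustration, so adjust them to your data. It shows that float64 sinh stops being finite once its argument passes roughly 710, which means that with T around 4.4 K the optimizer only has to try a TE of a few thousand for TE/(2*T) to overflow.

```python
import numpy as np

# Hypothetical stand-in for np.loadtxt('salita.txt'):
# column 0 = temperature (K), column 1 = resistivity.
data = np.array([
    [4.39531, 1.16083e-7],
    [4.59176, -2.36902e-9],
    [50.0, 1.0e-3],    # made-up row
    [200.0, 2.0e-2],   # made-up row
])
Temp = data[:, 0]
Res = data[:, 1]

# Drop the superconducting region before fitting. The 7 K threshold is an
# assumption based on the 6.7-6.9 K transition quoted in the question.
normal = Temp > 7.0
Temp_n, Res_n = Temp[normal], Res[normal]

# float64 sinh overflows once its argument exceeds roughly 710.
with np.errstate(over="ignore"):
    print(np.isfinite(np.sinh(710.0)))  # True
    print(np.isfinite(np.sinh(711.0)))  # False

# Largest TE that keeps TE/(2*T) below 710 for the coldest point.
TE_limit_all = 2 * 710.0 * Temp.min()
TE_limit_normal = 2 * 710.0 * Temp_n.min()
print(TE_limit_all, TE_limit_normal)
```

Removing the coldest points raises the TE value at which the overflow sets in, which may be why trimming the data set made some of the warnings disappear.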

Related

Exponential fit to data using Scipy: Runtime error fix?

I am trying to do an exponential fit to some experimental results. I have 6999 rows in each of my two columns. I have assigned the time to the x axis and the results to the y axis. I tried using scipy.optimize.curve_fit() with my data arrays, but it produces a runtime error. I would highly appreciate knowing how I can get rid of the error, or how I can code this in a different way if my parameters are what produces the error.
Here's my code:
import pandas as pd
import numpy as np
import scipy as sp
import scipy.optimize
df =pd.read_csv("output.txt") #reads dataframe into our program
print(df) #to make sure this is what we want
x = np.array(df["t.timestep"]) #gets the data points and defines the timestep as x axis
print(x) #just checking
y = np.array(df["t.cTNF"]) #gets the data points and defines them as y axis
print(y) #just checking
sp.optimize.curve_fit(lambda t,a,b: a*np.exp(b*t), x, y) #should fit the line perfectly
It produces the following error message:
File "C:\Users\kimst\anaconda3\lib\site-packages\scipy\optimize\minpack.py", line 789, in curve_fit
raise RuntimeError("Optimal parameters not found: " + errmsg)
RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 600.
Please help, I have a deadline and am completely stuck, thanks for reading :)

"RuntimeWarning: invalid value encountered in power" Using Scipy's ODR

I'm attempting to fit a function using Scipy's Orthogonal distance regression (odr) package and I keep getting the following error:
"RuntimeWarning: invalid value encountered in power"
This also happened when I used scipy's curve_fit function, but there I could always safely ignore the warning. Now it seems to be causing a numerical error that halts the fitting. I have based my code on the example I found here:
python scipy.odrpack.odr example (with sample input / output)?
Here is my code:
import numpy as np
import scipy.odr.odrpack as odrpack
def divergence(x,xDiv):
    return ( 1 - (x/xDiv) )**( -2.4 )
xValues = np.linspace(.25,.37,12)
yValues = np.array([ 6.94970607, 9.12475506, 10.65969954, 12.30241672,
14.44154148, 16.00261267, 19.98693664, 25.93076421,
30.89483997, 35.27106466, 50.81645983, 68.06009144])
xErrors = .0005*np.ones(len(xValues))
yErrors = np.array([ 0.31905094, 0.37956865, 0.24837562, 0.68320078, 1.25915789,
1.40241088, 0.33305157, 1.37165251, 0.32658393, 0.52253429,
1.04506858, 1.30633573])
wcModel = odrpack.Model(divergence)
mydata = odrpack.RealData(xValues, yValues, sx=xErrors, sy=yErrors)
myodr = odrpack.ODR(mydata, wcModel, beta0=[.8])
myoutput = myodr.run()
myoutput.pprint()
From looking at previous questions about this error I found here:
NumPy, RuntimeWarning: invalid value encountered in power
I suspected that the problem is that I'm raising a negative value to a fractional power. But what I'm raising to the power -2.4, namely (1-x/xDiv), isn't negative (at least around the initial guess of xDiv=.8). When I try to make my y-values of complex type, I get a new error:
"ValueError: y could not be made into a suitable array"
from the line with the command
myoutput = myodr.run().
The only examples I can find that use this odr package are fitting to polynomials so I suspect that might be the problem?

ValueError: A value in x_new is below the interpolation range

This is a scikit-learn error that I get when I do
my_estimator = LassoLarsCV(fit_intercept=False, normalize=False, positive=True, max_n_alphas=1e5)
Note that if I decrease max_n_alphas from 1e5 down to 1e4 I do not get this error any more.
Does anyone have an idea of what's going on?
The error happens when I call
my_estimator.fit(x, y)
I have 40k data points in 40 dimensions.
The full stack trace looks like this
File "/usr/lib64/python2.7/site-packages/sklearn/linear_model/least_angle.py", line 1113, in fit
axis=0)(all_alphas)
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/polyint.py", line 79, in __call__
y = self._evaluate(x)
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 498, in _evaluate
out_of_bounds = self._check_bounds(x_new)
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/interpolate.py", line 525, in _check_bounds
raise ValueError("A value in x_new is below the interpolation "
ValueError: A value in x_new is below the interpolation range.
There must be something particular to your data. LassoLarsCV() seems to be working correctly with this synthetic example of fairly well-behaved data:
import numpy
import sklearn.linear_model
# create 40000 x 40 sample data from linear model with a bit of noise
npoints = 40000
ndims = 40
numpy.random.seed(1)
X = numpy.random.random((npoints, ndims))
w = numpy.random.random(ndims)
y = X.dot(w) + numpy.random.random(npoints) * 0.1
clf = sklearn.linear_model.LassoLarsCV(fit_intercept=False, normalize=False, max_n_alphas=1e6)
clf.fit(X, y)
# coefficients are almost exactly recovered, this prints 0.00377
print(max(abs( clf.coef_ - w )))
# alphas actually used are 41 or ndims+1
print(clf.alphas_.shape)
This is in sklearn 0.16, I don't have positive=True option.
I'm not sure why you would want to use a very large max_n_alphas anyway. While I don't know why 1e+4 works and 1e+5 doesn't in your case, I suspect the paths you get from max_n_alphas=ndims+1 and max_n_alphas=1e+4 or whatever would be identical for well behaved data. Also the optimal alpha that is estimated by cross-validation in clf.alpha_ is going to be identical. Check out Lasso path using LARS example for what alpha is trying to do.
Also, from the LassoLars documentation
alphas_ : array, shape (n_alphas + 1,)
Maximum of covariances (in absolute value) at each iteration. n_alphas is either max_iter, n_features, or the number of nodes in the path with correlation greater than alpha, whichever is smaller.
so it makes sense that we end with alphas_ of size ndims+1 (ie n_features+1) above.
P.S. Tested with sklearn 0.17.1 and positive=True as well, also tested with some positive and negative coefficients, same result: alphas_ is ndims+1 or less.

Using Scipy's signal.welch command on data: ValueError and a dimension mismatch?

I'm trying to get my first power spectral density graph plotted using actual data instead of something that's purely theoretical and generated within Python. I'm having problems getting anything to work, however. Code is attached below, followed by the error I get in my console after line 19.
Don't know if it makes a difference, but I'm transitioning to Python from mostly working in MATLAB. I am not counting on having access to a license forever, so I really want to learn how to start doing everything in Python. But it's hard.
Code:
import numpy as np
from scipy import signal
import scipy.io
import matplotlib.pyplot as plt
#import data from a .mat file using the loadmat command
mat = scipy.io.loadmat('Mic_Data_Sums.mat')
# 1 x 1 array, sampling frequency of 22050 Hz
fs = mat['Fs']
# Attempted fix: change data type to 8-byte float?
# fs = fs.astype('f8')
# 13 x 1323000 array - 13 separate time series of data, 60 seconds each
data = mat['Mic_Data_Sums']
# Welch function - transpose 'data' and use the 2nd time series
f, Pxx_spec = signal.welch(data.T[1], fs, window = 'hanning', nperseg = fs,
noverlap = fs/2, scaling = 'spectrum')
Console:
/Users/******/anaconda/lib/python3.4/site-packages/scipy/signal/spectral.py:297: RuntimeWarning: divide by zero encountered in double_scalars
scale = 1.0 / win.sum()**2
Traceback (most recent call last):
File "plotPSDs.py", line 20, in <module>
noverlap = fs/2, scaling = 'spectrum')
File "/Users/******/anaconda/lib/python3.4/site-packages/scipy/signal/spectral.py", line 333, in welch
xft = fftpack.rfft(x_dt*win, nfft)
ValueError: operands could not be broadcast together with shapes (22050,) (0,22051)
Note how the ValueError tag gives me weird shape (dimension) results: I have no idea where the 22051 is coming from.
Edit: As a workaround solution, I commented out the line of fs = mat['Fs'] and simply replaced it with fs = 22050, which made the code execute successfully. However, the question still remains, why can't I simply reference the variable as it was stored in the .mat file?
[from the comments above] If you know fs is 1x1, try passing fs[0,0] to welch. The docstring for welch says fs should be a float, so it might behave unpredictably if you give it a two-dimensional array. – Warren Weckesser 22 hours ago
This worked well. The code I implemented is:
# 1 x 1 array, sampling frequency (22050 Hz)
fs = mat['Fs']
fs = fs[0,0]
then using the code from before,
f, Pxx_spec = signal.welch(data.T[1], fs, window = 'hanning', nperseg = fs,
noverlap = fs/2, scaling = 'spectrum')
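For anyone who wants to see the 1 x 1 array issue without the original file, here is a small self-contained sketch. It writes a made-up .mat file into memory (the sampling rate, the random data, and its samples-by-channels layout are all assumptions for illustration), loads it back, and pulls the scalar out before calling welch. It also uses integer division for noverlap and the window name 'hann', which is the spelling newer SciPy releases expect.

```python
import io

import numpy as np
import scipy.io
from scipy import signal

# Build a small in-memory stand-in for 'Mic_Data_Sums.mat' (made-up values).
rate = 1000
rng = np.random.default_rng(0)
buf = io.BytesIO()
scipy.io.savemat(buf, {"Fs": rate,
                       "Mic_Data_Sums": rng.standard_normal((5 * rate, 2))})
buf.seek(0)
mat = scipy.io.loadmat(buf)

# MATLAB scalars come back from loadmat as 2-D arrays.
print(mat["Fs"].shape)  # (1, 1)

fs = int(mat["Fs"][0, 0])       # plain Python int
x = mat["Mic_Data_Sums"].T[1]   # second time series, 5000 samples

f, Pxx_spec = signal.welch(x, fs, window="hann", nperseg=fs,
                           noverlap=fs // 2, scaling="spectrum")
print(f.shape, Pxx_spec.shape)
```

Converting with int(...) keeps nperseg and noverlap as ordinary integers, which is what welch expects for segment lengths.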

scipy.optimize.curve_fit() - array must not contain infs or NaNs

I am trying to fit some data to a curve in Python using scipy.optimize.curve_fit. I am running into the error ValueError: array must not contain infs or NaNs.
I don't believe either my x or y data contain infs or NaNs:
>>> x_array = np.asarray_chkfinite(x_array)
>>> y_array = np.asarray_chkfinite(y_array)
>>>
To give some idea of what my x_array and y_array look like at either end (x_array is counts and y_array is quantiles):
>>> type(x_array)
<type 'numpy.ndarray'>
>>> type(y_array)
<type 'numpy.ndarray'>
>>> x_array[:5]
array([0, 0, 0, 0, 0])
>>> x_array[-5:]
array([2919, 2965, 3154, 3218, 3461])
>>> y_array[:5]
array([ 0.9999582, 0.9999163, 0.9998745, 0.9998326, 0.9997908])
>>> y_array[-5:]
array([ 1.67399000e-04, 1.25549300e-04, 8.36995200e-05,
4.18497600e-05, -2.22044600e-16])
And my function:
>>> def func(x,alpha,beta,b):
...     return ((x/1)**(-alpha) * ((x+1*b)/(1+1*b))**(alpha-beta))
...
Which I am executing with:
>>> popt, pcov = curve_fit(func, x_array, y_array)
resulting in the error stack trace:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py", line 426, in curve_fit
res = leastsq(func, p0, args=args, full_output=1, **kw)
File "/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py", line 338, in leastsq
cov_x = inv(dot(transpose(R),R))
File "/usr/lib/python2.7/dist-packages/scipy/linalg/basic.py", line 285, in inv
a1 = asarray_chkfinite(a)
File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 590, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
I'm guessing the error might not be with respect to my arrays, but rather an array created by scipy in an intermediate step? I've had a bit of a dig through the relevant scipy source files, but things get hairy pretty quickly debugging the problem that way. Is there something obvious I'm doing wrong here? I've seen it casually mentioned in other questions that certain initial parameter guesses (of which I currently don't have any explicit ones) might result in these kinds of errors, but even if this is the case, it would be good to know a) why that is and b) how to avoid it.
Why it is failing
Your input arrays do not contain NaNs or infs. Rather, evaluating your objective function at some x points for some values of the parameters produces NaNs or infs. In other words, the array of values func(x,alpha,beta,b) contains NaNs or infs for some x, alpha, beta and b visited during the optimization.
scipy.optimize.curve_fit uses the Levenberg-Marquardt algorithm (its default when no bounds are given), also known as damped least squares. It is an iterative procedure: a new estimate of the optimal parameters is computed at each iteration. At some point during the optimization, the algorithm explores a region of parameter space where your function is not defined.
How to fix
1/Initial guess
The initial guess for the parameters is decisive for convergence. If the initial guess is far from the optimal solution, you are more likely to explore regions where the objective function is undefined. So if you have a better idea of what your optimal parameters are and feed the algorithm that initial guess, the error may be avoided.
2/Model
You could also modify your model so that it does not return NaNs. For parameter values where the original function func is not defined, you want the objective function to take huge values, in other words you want func(params) to be far from the Y values being fitted.
Concretely, at points where your objective function is not defined, you may return a big float, for instance AVG(Y)*10e5 with AVG the average (so that you are sure to be much bigger than the average of the Y values to be fitted).
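A minimal sketch of the "return a big float" idea follows. The helper name make_penalized and the sample y values are made up for illustration; the model is the one from the question with the redundant factors of 1 dropped. Treat it as one possible way to do it, not a recommended recipe: the penalty makes the function discontinuous, which a gradient-based optimizer may not like.

```python
import numpy as np

def func(x, alpha, beta, b):
    # Model from the question, without the multiplications by 1.
    return x ** (-alpha) * ((x + b) / (1 + b)) ** (alpha - beta)

def make_penalized(model, y):
    """Wrap `model` so that non-finite outputs become one large finite number."""
    penalty = abs(np.mean(y)) * 10e5  # the "big float" suggested above

    def wrapped(x, *params):
        # Silence the warnings here; the non-finite values are replaced below.
        with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
            values = model(np.asarray(x, dtype=float), *params)
        return np.where(np.isfinite(values), values, penalty)

    return wrapped

y = np.array([0.99, 0.5, 0.1])   # made-up y values
safe_func = make_penalized(func, y)

# x = 0 would give inf in the raw model; the wrapper returns the penalty instead.
out = safe_func(np.array([0.0, 1.0, 2.0]), 1.0, 1.0, 1.0)
print(out)
```

Because the wrapper takes *params, curve_fit cannot infer the number of parameters from its signature, so it would need an explicit p0, for example curve_fit(safe_func, x_array, y_array, p0=[1.0, 1.0, 1.0]).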
Link
You could have a look at this post: Fitting data to an equation in python vs gnuplot
Your function has a negative power (x^-alpha), which is the same as (1/x)^alpha. If x is ever 0, your function will return inf and the curve fit will break. I'm surprised a warning or error isn't thrown earlier to tell you about a divide by 0.
BTW why are you multiplying and dividing by 1?
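Here is a short sketch of that point. The arrays are made up but shaped like the ones in the question (counts that start at 0). It evaluates the model at the parameters curve_fit starts from when no p0 is given (all ones), shows the inf at x = 0, and then masks out the zero counts, which is one possible way around it.

```python
import numpy as np

def func(x, alpha, beta, b):
    # Model from the question, without the multiplications by 1.
    return x ** (-alpha) * ((x + b) / (1 + b)) ** (alpha - beta)

# Made-up data in the same spirit as the question: x starts with zeros.
x_array = np.array([0, 0, 1, 5, 20, 300], dtype=float)
y_array = np.array([0.99, 0.98, 0.8, 0.4, 0.1, 0.001])

# curve_fit uses 1 for every parameter when p0 is not given.
with np.errstate(divide="ignore"):
    first_eval = func(x_array, 1.0, 1.0, 1.0)
print(np.isinf(first_eval))  # True wherever x == 0

# Keep only the points where the model is defined.
keep = x_array > 0
x_fit, y_fit = x_array[keep], y_array[keep]
print(np.isfinite(func(x_fit, 1.0, 1.0, 1.0)).all())
```

Whether dropping the zero counts is acceptable depends on what the data mean; shifting x by a small offset would be another option, but that changes the model.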
I was able to reproduce this error in python2.7 like so:
import numpy as np
from sklearn.decomposition import FastICA
X = load_data.load("stuff") #this sets X to a 2d numpy array containing
#large positive and negative numbers.
ica = FastICA(whiten=False)
print(np.isnan(X).any()) #this prints False
print(np.isinf(X).any()) #this prints False
ica.fit(X) #this produces the error:
Which always produces the Error:
/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py:58: RuntimeWarning: invalid value encountered in sqrt
return np.dot(np.dot(u * (1. / np.sqrt(s)), u.T), W)
Traceback (most recent call last):
File "main.py", line 43, in <module>
ica()
File "main.py", line 18, in ica
ica.fit(X)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 523, in fit
self._fit(X, compute_sources=False)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 479, in _fit
compute_sources=compute_sources, return_n_iter=True)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 335, in fastica
W, n_iter = _ica_par(X1, **kwargs)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 108, in _ica_par
- g_wtx[:, np.newaxis] * W)
File "/usr/lib64/python2.7/site-packages/sklearn/decomposition/fastica_.py", line 55, in _sym_decorrelation
s, u = linalg.eigh(np.dot(W, W.T))
File "/usr/lib64/python2.7/site-packages/scipy/linalg/decomp.py", line 297, in eigh
a1 = asarray_chkfinite(a)
File "/usr/lib64/python2.7/site-packages/numpy/lib/function_base.py", line 613, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
Solution:
import numpy as np
from sklearn.decomposition import FastICA
X = load_data.load("stuff") #this sets X to a 2d numpy array containing
#large positive and negative numbers.
ica = FastICA(whiten=False)
#this is a column wise normalization function which flattens the
#two dimensional array from very large and very small numbers to
#reasonably sized numbers between roughly -1 and 1
X = (X - np.mean(X, axis=0)) / np.std(X, axis=0)
print(np.isnan(X).any()) #this prints False
print(np.isinf(X).any()) #this prints False
ica.fit(X) #this works correctly.
Why does that normalization step fix the error?
I found the eureka moment here: sklearn's PLSRegression: "ValueError: array must not contain infs or NaNs"
What I think is happening is that numpy is being fed gigantic numbers and very tiny numbers, and inside its tiny brain it's creating NaNs and infs. So it's a bug in sklearn. The workaround is to flatten the input data you give the algorithm so that there are no very large or very small numbers.
Bad sklearn! NO biscuit!
