Data fitting with lookup table in Python

I am currently trying to fit some data with python using scipy.optimize.leastsq. The data that I want to fit is of the form:
Mag(H,F,L) = F*sigmap(H) - sigman(H,L)
sigmap is a numerical integral that depends on H and takes quite a while to calculate. I do not want to evaluate the integral inside the fitting routine, because it would then be recomputed on every iteration and slow the fit down significantly. Instead, I want to look up precomputed values of the integral. The code I have used to implement this is:
integral = np.loadtxt(text file of form: H_Value Integral_Value)
lookupintegral = dict(integral)
sigmap = F*lookupintegral[H]
This is then included within the function which I am fitting to.
When I execute the code I get the error: TypeError: unhashable type: 'numpy.ndarray'
Does anyone have any ideas as to how to implement a fitting routine that looks up data rather than calculating it every time?

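That error means that H is a NumPy array, which is unhashable and therefore cannot be used as a dictionary key. H is presumably the whole array of data points handed to the fit function, so make sure the value used to index the dictionary is a single integer or float, for example by looking the values up one element at a time.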
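One way to get a lookup that also works when H arrives as a whole array is to interpolate the tabulated integral instead of indexing a dictionary. The following is a minimal sketch under stated assumptions: the table is generated in memory with np.tanh as a stand-in for the real integral (in practice the two columns would come from the text file), and sigman(H, L) = L*H is a purely hypothetical placeholder, since the question does not give its form.

```python
import numpy as np
from scipy.optimize import leastsq

# Stand-in for the precomputed table (assumption: the real columns would come
# from the text file of H values and integral values, sorted by H).
h_table = np.linspace(0.0, 5.0, 501)
integral_table = np.tanh(h_table)  # placeholder for the slow integral


def sigmap(H):
    # np.interp accepts scalars or arrays, so no dictionary key is needed.
    return np.interp(H, h_table, integral_table)


def sigman(H, L):
    # Hypothetical placeholder; the real sigman(H, L) is not given in the question.
    return L * H


def mag(H, F, L):
    return F * sigmap(H) - sigman(H, L)


def residuals(params, H, measured):
    F, L = params
    return measured - mag(H, F, L)


# Synthetic, noise-free data generated with F = 2.0 and L = 0.5.
H_data = np.linspace(0.1, 4.9, 40)
mag_data = mag(H_data, 2.0, 0.5)

fit_params, ier = leastsq(residuals, x0=[1.0, 1.0], args=(H_data, mag_data))
print(fit_params)
```

np.interp interpolates linearly and clamps to the end values outside the table range, so the table should be sorted by H and cover every H value in the data.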

Related

Recurrent problem using scipy.optimize.fmin

I am encountering some problems when translating the following code from MATLAB to Python:
Matlab code snippet:
x=M_test %M_test is a 1x3 array that holds the adjustment points for the function
y=R_test %R_test is also a 1x3 array
>> M_test=[0.513,7.521,13.781]
>> R_test=[2.39,3.77,6.86]
expo3= @(b,x) b(1).*(exp(-(b(2)./x).^b(3)));
NRCF_expo3= @(b) norm(y-expo3(b,x));
B0_expo3=[fcm28;1;1];
B_expo3=fminsearch(NRCF_expo3,B0_expo3);
Data_raw.fcm_expo3=(expo3(B_expo3,Data_raw.M));
The translated (python) code:
expo3=lambda x,M_test: x[0]*(1-exp(-1*(x[1]/M_test)**x[2]))
NRCF_expo3=lambda R_test,x,M_test: np.linalg.norm(R_test-expo3,ax=1)
B_expo3=scipy.optimize.fmin(func=NRCF_expo3,x0=[fcm28,1,1],args=(x,))
For clarity, the function 'expo3' should pass through the adjustment points defined by M_test.
'NRCF_expo3' is the function to be minimised, which is basically the error between R_test and the fitted exponential function.
When I run the code, I obtain the following error message:
B_expo3=scipy.optimize.fmin(func=NRCF_expo3,x0=[fcm28,1,1],args=(x,))
NameError: name 'x' is not defined
There are a lot of similar questions that I have perused.
If I delete 'args' from the optimization call, as the question "numpy/scipy analog of matlab's fminsearch" seems to indicate it is not necessary, I obtain the error:
line 327, in function_wrapper
return function(*(wrapper_args+args))
TypeError: <lambda>() missing 2 required positional arguments: 'x' and 'M_test'
I have tried many other modifications, following examples such as "Using scipy to minimize a function that also takes non variational parameters" or those found in open source examples, but nothing works for me.
I expect this is probably quite obvious, but I am very new to Python and I feel like I am looking for a needle in a haystack. What am I not seeing?
Any help would be really appreciated. I can also provide more code, if that is necessary.
I think you shouldn't use lambdas in your code; instead, make a single target function with your three parameters (see PEP8). There is a lot of missing information in your post, but from what I can infer, you want something like this:
import numpy as np
from scipy.optimize import fmin

# Define parameters
M_TEST = np.array([0.513, 7.521, 13.781])
R_TEST = np.array([2.39, 3.77, 6.86])
X0 = np.array([10, 1, 1])  # whatever your variable fcm28 is

def nrcf_exp3(x, m_test, r_test):
    # fmin calls this as nrcf_exp3(x, *args), so the parameter vector x comes first
    expo3 = x[0] * (1 - np.exp(-(x[1] / m_test) ** x[2]))
    return np.linalg.norm(r_test - expo3)

fmin(nrcf_exp3, X0, args=(M_TEST, R_TEST))

How to fit my given data (can't be done by polyvar and model function is not helpful?)

I'm trying to fit my data points to a curve - it does not matter if I use the (existing) "approximated analytical" model function or not. Obviously I tried to use it at first, but I get a Runtime error whenever I include a parameter (and it's a "heavy" function).
It's an interference pattern:
My data looks like this:
When I simply use the analytical function (where all parameters are known), I cannot get it right by guessing. This is how it looks next to my data:
Here's the questionable code snippet (with parameter to be estimated by curve_fit):
import numpy as np
from scipy.optimize import curve_fit
def f(x, a):
    return (((np.sin(a*np.pi*0.96*np.sin(x)*6.328*10**7))**2*(a*np.pi*0.96*np.sin(x)*6.328*10**7)**(-2))*10**11)*7**(-1)
params_, extras = curve_fit(f, x_data, y_data, maxfev=15000)
I then get the following error:

AdaDelta optimization algorithm using Python

I was studying the AdaDelta optimization algorithm so I tried to implement it in Python, but there is something wrong with my code, since I get the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'
I could not find anything about what is causing that error. According to the message, it comes from this line of code:
rms_grad = np.sqrt(self.e_grad + epsilon)
This line is similar to this equation:
$RMS[g]_t = \sqrt{E[g^2]_t + \epsilon}$
I got the core equations of the algorithm in this article: http://ruder.io/optimizing-gradient-descent/index.html#adadelta
Just one more detail: I'm initializing the E[g^2]t matrix like this:
self.e_grad = (1 - mu)*np.square(nabla)
Where nabla is the gradient. Similar to this equation:
$E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2$
(the first term is equal to zero in the first iteration, just like the line of code above)
So I want to know whether I'm initializing the E matrix the wrong way or taking the square root incorrectly. I tried to use the pow() function but it doesn't work. If anyone could help me with this I would be very grateful; I have been trying this for weeks.
Additional details requested by andersource:
Here is the entire source code on github: https://github.com/pedrovbeltran/neural-networks-and-deep-learning/blob/experimental/modified-networks/network2_with_adadelta.py .
I think the problem is that self.e_grad_w is an ndarray of shape (2,) which further contains two additional ndarrays with 2d shapes, instead of directly containing data. This seems to be initialized in e_grad_initializer, in which nabla_w has the same structure. I didn't track where this comes from all the way back, but I believe once you fix this issue the problem will be resolved.
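The situation described above can be reproduced in a few lines. This is a sketch with made-up layer shapes, not the code from the linked repository: an object-dtype array holding one 2-D array per layer makes np.sqrt fail, while taking the square root layer by layer works.

```python
import numpy as np

epsilon = 1e-8

# Assumed structure: an object array of shape (2,) whose elements are the
# per-layer 2-D arrays (the shapes here are made up for illustration).
e_grad = np.empty(2, dtype=object)
e_grad[0] = np.ones((3, 2))
e_grad[1] = np.full((1, 3), 4.0)

try:
    np.sqrt(e_grad + epsilon)
    failed = False
except (AttributeError, TypeError) as exc:
    # The exact exception type and message depend on the NumPy version.
    failed = True
    print(type(exc).__name__, exc)

# Working alternative: apply np.sqrt to each layer's array separately.
rms_grad = [np.sqrt(layer + epsilon) for layer in e_grad]
print([r.shape for r in rms_grad])
```

Keeping the per-layer quantities in a plain Python list (or applying the update inside a loop over layers) avoids the object array altogether.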

DBSCAN with custom metric

I have the following given:
a dataset in the range of thousands
a way of computing the similarity, but I cannot plot the datapoints themselves in Euclidean space
I know that DBSCAN should support a custom distance metric, but I don't know how to use it.
say I have a function
def similarity(x,y):
    return similarity ...
and I have a list of data that can be passed pairwise into that function, how do I specify this when using the DBSCAN implementation of scikit-learn?
Ideally what I want to do is to get a list of the clusters, but I can't figure out how to get started in the first place.
There is a lot of terminology that still confuses me:
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
How do I pass a feature array and what is it ? How do I fit this implementation to my needs ? How will I be able to get my "sublists" from this algorithm ?
A "feature array" is simply an array of the features of a datapoint in your dataset.
metric is the parameter you're looking for. It can be a string (the name of a builtin metric), or a callable. Your similarity function is a callable. This isn't well described in the documentation, but a metric has to do just that: take two datapoints as parameters and return a number. Note that DBSCAN treats that number as a distance (smaller means closer), so a similarity score has to be converted to a distance first.
import sklearn.cluster

def similarity(x, y):
    return ...

reduced_dataset = sklearn.cluster.DBSCAN(metric=similarity).fit(dataset)
In case someone is searching for the same thing for strings with a custom metric:
import numpy as np
from sklearn.cluster import DBSCAN

def metric(x, y):
    # x and y are one-element rows holding indices into string_seqs
    return yourDistFunc(string_seqs[int(x[0])], string_seqs[int(y[0])])

def clusterPockets():
    global string_seqs
    string_seqs = load_data()  # ["foo", "bar", ...]
    dat = np.arange(len(string_seqs)).reshape(-1, 1)
    clustered_dataset = DBSCAN(metric=metric).fit(X=dat, y=dat)
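When the objects cannot be placed in a feature space at all, another option is to compute all pairwise distances yourself and pass the matrix with metric="precomputed". The sketch below uses a toy Hamming distance on equal-length strings as a stand-in for your own function (hamming, words, eps and min_samples are illustrative choices, not part of the answers above), and then groups the items by the labels_ attribute to obtain the sublists.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def hamming(a, b):
    # Toy distance for equal-length strings: number of differing positions.
    return sum(ca != cb for ca, cb in zip(a, b))


words = ["aaaa", "aaab", "aaba", "zzzz", "zzzy", "zzyz"]

# Square matrix of pairwise distances (a similarity score would first have
# to be turned into a distance, where smaller means closer).
n = len(words)
dist = np.array([[hamming(words[i], words[j]) for j in range(n)]
                 for i in range(n)], dtype=float)

model = DBSCAN(eps=1.5, min_samples=2, metric="precomputed").fit(dist)

# labels_ holds one cluster index per item; -1 marks noise points.
clusters = {}
for word, label in zip(words, model.labels_):
    clusters.setdefault(int(label), []).append(word)
print(clusters)
```

The full matrix needs memory proportional to the square of the number of items, which is usually still manageable for a dataset in the range of thousands.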

Attribute Error from Minimizer object returned from scipy.optimize.minimize() function

Using the scipy.optimize.minimize() function, I got different results with different methods for the same objective function. To evaluate the goodness of fit, I usually look at the reduced chi-squared as a first criterion. After some time I found this useful guide, http://newville.github.io/lmfit-py/fitting.html#Minimizer, which specifies that the reduced chi-squared is set as an attribute of the Minimizer object returned from the minimize() function. But if I do
minobj = scipy.optimize.minimize(...)
minobj.redchi
I get
AttributeError: redchi
Meanwhile minobj.message and minobj.success are correctly displayed.
Any guess?
The documentation is a little misleading --- if you look at lmfit/minimizer.py, and do a string search for "redchi" in the entire file, it only appears once, and that is in the leastsq() method. So basically, it only calculates the reduced chi squared for least-squares fitting.
If you're feeling up to it, you could add redchi to the other methods in the appropriate places, fork the lmfit github repo, and commit your changes.
In addition to Ashwin's answer, you could always just use:
result = lmfit.minimize(...)
x2 = result.chisqr
nfree = result.nfree
red_x2 = x2/nfree
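If the fit was actually run with scipy.optimize.minimize() rather than lmfit, the returned result object carries no chi-squared statistics, but the same quantity can be computed by hand from the residuals. A minimal sketch, assuming a straight-line model, synthetic data and a known measurement uncertainty sigma (all of them illustrative, not taken from the question):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
sigma = 0.1                                  # assumed measurement uncertainty
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma, x.size)


def chi_squared(params):
    slope, intercept = params
    residuals = (y - (slope * x + intercept)) / sigma
    return np.sum(residuals ** 2)


result = minimize(chi_squared, x0=[1.0, 0.0], method="Nelder-Mead")

chisqr = chi_squared(result.x)
nfree = x.size - len(result.x)               # data points minus fitted parameters
red_chisqr = chisqr / nfree
print(result.x, red_chisqr)
```

Because the statistic is computed outside the optimizer, the same few lines work for any method passed to minimize().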
