Finding best weight value for smooth constrained least squares with Python?
I have a least squares problem to solve with no known estimates of the parameters. I impose the constraint that my desired solution be smooth (the model parameters vary slowly), so I minimize the difference between adjacent parameters (a traditional remedy used for this geological problem).
The constraints are implemented by arranging the constraining equations as rows in the original data equation d = Gm. The auxiliary parameter w is chosen by trial and error (w is called a Lagrange multiplier in some textbooks).
I have the following:
G = np.array([[1,0,1,0,0,6],
[1,0,0,1,0,6.708],
[1,0,0,0,1,8.485],
[0,1,1,0,0,7.616],
[0,1,0,1,0,7],
[0,1,0,0,1,7.616]])
d = np.array([[2.323],
[2.543],
[2.857],
[2.64],
[2.529],
[2.553]])
Now adding a constraint of an arbitrary w-weighted smoothness (w = 0.01):
w = 0.01
G = np.array([[1,0,1,0,0,6],
[1,0,0,1,0,6.708],
[1,0,0,0,1,8.485],
[0,1,1,0,0,7.616],
[0,1,0,1,0,7],
[0,1,0,0,1,7.616],
[w,-w,0,0,0,0],
[0,w,-w,0,0,0],
[0,0,w,-w,0,0],
[0,0,0,w,-w,0],
[0,0,0,0,w,-w]])
d = np.array([[2.323],
[2.543],
[2.857],
[2.64],
[2.529],
[2.553],
[0],
[0],
[0],
[0],
[0]])
However, choosing a proper value for w seems to be a key step in constraining a good solution for the model parameters.
So my question is: with Python, is there a way I can loop over many calculated solutions with different values for w and choose the value that was used to achieve the solution with the best quality?
In the solution below I'll refer to G_0 as G without the additional constraint rows, and similarly d_0 is d without the additional zeros. I'm also assuming you're reading G_0 and d_0 from somewhere, so I'm treating them as known.
import numpy as np

def create_W(n_rows, w):
    # Smoothness operator matching the W in the question: w on the diagonal,
    # -w on the superdiagonal, shape (n_rows, n_rows + 1) for the 6 parameters.
    return w * (np.eye(n_rows, n_rows + 1) - np.eye(n_rows, n_rows + 1, k=1))

def solution_quality_metric(m):
    # this needs to be implemented to determine what you mean by "best"
    raise NotImplementedError

n_rows = 5
d_w = np.zeros((n_rows, 1))  # matches the column shape of d_0

# choose a range of w values, for example:
w_min, w_max, dw = 0.0, 1.0, 0.01

best_metric = -np.inf
best_w = w_min
for w in np.arange(w_min, w_max, dw):
    W = create_W(n_rows, w)
    G = np.concatenate([G_0, W], axis=0)
    d = np.concatenate([d_0, d_w])
    m, *_ = np.linalg.lstsq(G, d, rcond=None)
    metric = solution_quality_metric(m)
    if metric > best_metric:
        best_metric = metric
        best_w = w
This code will obviously not work as is, since you didn't specify what you mean by "solution with the best quality". For that you'll need to implement the solution_quality_metric function yourself; one illustrative possibility is sketched below.
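For illustration only (this is my assumption about what "quality" could mean, not part of the original answer): one option is to balance the data misfit of the unconstrained system against the roughness of the model. G_0 and d_0 are the arrays defined above; lam is an arbitrary trade-off factor you would tune.

import numpy as np

def solution_quality_metric(m):
    # Hypothetical metric: trade data misfit against roughness. Pure misfit
    # alone always favors w = 0, so a roughness term (differences between
    # adjacent parameters) is added; lam is an assumed trade-off factor.
    lam = 1.0
    misfit = np.linalg.norm(G_0 @ m - d_0)
    roughness = np.linalg.norm(np.diff(m.ravel()))
    return -(misfit + lam * roughness)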
Related
Adding constraints to my fitting model using lmfit
I am trying to fit a complex conductivity model (the Drude-Smith-Anderson model) using lmfit.minimize. In that fitting, I want constraints on my parameters c and c1 such that 0<c<1, -1<c1<0 and 0<1+c1-c<1. So, I am using the following code:

#reference: Juluri B.K. "Fitting Complex Metal Dielectric Functions with Differential Evolution Method". http://juluribk.com/?p=1597.
#reference: https://lmfit.github.io/lmfit-py/fitting.html

#import libraries (numdifftools needs to be installed but doesn't need to be imported)
import matplotlib.pyplot as plt
import numpy as np
import lmfit as lmf
import math as mt

#define the complex conductivity model
def model(params, w):
    sigma0 = params["sigma0"].value
    tau = params["tau"].value
    c = params["c"].value
    d = params["d"].value
    c1 = params["c1"].value
    druidanderson = (sigma0/(1 - 1j*2*mt.pi*w*tau))*(1 + c1/(1 - 1j*2*mt.pi*w*tau)) - sigma0*c/(1 - 1j*2*mt.pi*w*d*tau)
    return druidanderson

#defining the complex residues (chi squared is sum of squares of residues)
def complex_residuals(params, w, exp_data):
    delta = model(params, w)
    residual = (abs((delta.real - exp_data.real) / exp_data.real)
                + abs((delta.imag - exp_data.imag) / exp_data.imag))
    return residual

# importing data from CSV file (give input in form of path\name.csv)
importpath = input("Path of CSV file: ")
frequency = np.genfromtxt(rf"{importpath}", delimiter=",", usecols=(0))  #path to be changed to the file from which data is taken
conductivity = (np.genfromtxt(rf"{importpath}", delimiter=",", usecols=(1))
                + 1j*np.genfromtxt(rf"{importpath}", delimiter=",", usecols=(2)))  #path to be changed to the file from which data is taken
frequency = frequency[np.logical_not(np.isnan(frequency))]
conductivity = conductivity[np.logical_not(np.isnan(conductivity))]
w_for_fit = frequency
eps_for_fit = conductivity

#defining the bounds and initial guesses for the fitting parameters
params = lmf.Parameters()
params.add("sigma0", value=float(input("Guess for \u03C3\u2080: ")), min=10, max=5000)  #bounds have to be changed manually
params.add("tau", value=float(input("Guess for \u03C4: ")), min=0.0001, max=10)  #bounds have to be changed manually
params.add("c1", value=float(input("Guess for c1: ")), min=-1, max=0)  #bounds have to be changed manually
params.add("constraint", value=float(input("Guess for constraint: ")), min=0, max=1)
params.add("c", expr="1+c1-constraint", min=0, max=1)  #bounds have to be changed manually
params.add("d", value=float(input("Guess for \u03C4_1/\u03C4: ")), min=100, max=100000)  #bounds have to be changed manually

# minimizing the chi square
minimizer_results = lmf.minimize(complex_residuals, params, args=(w_for_fit, eps_for_fit),
                                 method='differential_evolution', strategy='best1bin',
                                 popsize=50, tol=0.01, mutation=(0, 1), recombination=0.9,
                                 seed=None, callback=None, disp=True, polish=True,
                                 init='latinhypercube')
lmf.printfuncs.report_fit(minimizer_results, show_correl=False)

As a result of the fit, I get the following output:

sigma0:     3489.38961 (init = 1000)
tau:        1.2456e-04 (init = 0.01)
c1:        -0.99816132 (init = -1)
constraint: 0.98138820 (init = 1)
c:          0.00000000 == '1+c1-constraint'
d:          7333.82306 (init = 1000)

These values don't make any sense, as 1+c1-c = -0.97954952, which is not 0 and is thus invalid. How do I fix this issue?
Your code is not runnable. The use of input() is sort of stunning - please do not do that. Write code that is pleasant to read and separates I/O from logic. To make a floating point residual from a complex array, use complex_array.view(float). Guessing any parameter value to be at or very close to its limit (here, c) is a very bad idea, likely to make the fit harder.

More to your question: you defined c as "evaluate 1+c1-constraint and then apply the bounds min=0, max=1". That is literally, precisely, and exactly what your

params.add("c", expr="1+c1-constraint", min=0, max=1)

means: calculate c as 1+c1-constraint, and then apply the bounds [0, 1]. The code is doing exactly what you told it to do.

Unless you know what you are doing (I suspect maybe not ;)), I would strongly advise doing a fit with the default leastsq method before trying to use differential_evolution. It turns out that differential_evolution is not a very good global fitting method (shgo is generally better, though no "global" solver should be considered very reliable). But unless you know that you need such a method, you probably do not.

I would also strongly advise you to plot your data and some models evaluated with what you think are reasonable parameters.
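If it helps, here is a minimal sketch of the rearrangement this answer implies (my illustration, not code from the answer; it mirrors the inequality-constraint recipe in the lmfit documentation as I understand it): make the bounded auxiliary quantity the free parameter and derive c from it without extra bounds, so 0 < 1+c1-c < 1 holds by construction.

import lmfit as lmf

params = lmf.Parameters()
params.add("c1", value=-0.5, min=-1, max=0)
# "constraint" stands for the quantity 1 + c1 - c; keeping it free and
# bounded means 0 < 1 + c1 - c < 1 is satisfied automatically.
params.add("constraint", value=0.5, min=0, max=1)
# c is derived; giving it bounds as well is what silently clamped it to 0
# in the question. c's own range 0 < c < 1 must still be checked separately.
params.add("c", expr="1 + c1 - constraint")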
How to make scipy.optimize.curve_fit result in a better sine regression fit?
I have a problem where I am using scipy.optimize.curve_fit to do a regression fit to a sine/cosine function, but the fit does not seem as optimized as I want it to be. How can I change my code to make the fitting better? I have already tried changing how parameters are guessed for the dataset, and there is always seemingly a phase offset in my generated fit, or the fitting function is not fitting to the proper minima/maxima.

Here is the code I am using to generate the regression fit. The output (fitfunc) can be plotted to show the result.

import numpy as np
import scipy.optimize

def sin_regress(data_x, data_y):
    """Function regression fits data to SIN function; does not need guess of freq.

    Parameters
    ----------
    data_x : Data for X values, most likely a set of voltages.
    data_y : Data for Y values, most likely the resulting powers from voltages.

    Returns
    -------
    __ : Dictionary containing values for amplitude, angular frequency, phase,
         offset, frequency, period, fit function, max covariance, initial guess.
    """
    data_x = np.array(data_x)
    data_y = np.array(data_y)
    freqz = np.fft.rfftfreq(len(data_x), (data_x[1] - data_x[0]))  # uniform spacing
    freq_y = abs(np.fft.rfft(data_y))
    guess_freq = abs(freqz[np.argmax(freq_y[1:]) + 1])  # exclude offset peak
    guess_amp = np.std(data_y) * 2.**0.5
    guess_offset = np.mean(data_y)
    guess = np.array([guess_amp, 2.*np.pi*guess_freq, 0., guess_offset])

    def sinfunc(t, A, w, p, c):
        """Raw function to be used to fit data.

        Parameters
        ----------
        t : Voltage array
        A : Amplitude
        w : Angular frequency
        p : Phase
        c : Constant value

        Returns
        -------
        __ : Formed fit function with provided values.
        """
        return A * np.sin(w*t + p) + c

    popt, pcov = scipy.optimize.curve_fit(sinfunc, data_x, data_y, p0=guess)
    A, w, p, c = popt
    f = w/(2.*np.pi)
    fitfunc = lambda t: A * np.sin(w*t + p) + c
    return {"amp": A, "omega": w, "phase": p, "offset": c, "freq": f,
            "period": 1./f, "fitfunc": fitfunc, "maxcov": np.max(pcov),
            "rawres": (guess, popt, pcov)}

With my trial dataset being:

x = np.linspace(3.5, 9.5, int((9.5 - 3.5)/0.00625) + 1)  # num must be an integer
pow1 = [1.8262110863, 1.80944546009, 1.7970185646900003, 1.77120336754, 1.7458101235699999, 1.73597098224, 1.7122529922799998, 1.70015674142, 1.68968617429, 1.6989396515, 1.69760676076, 1.6946375613599998, 1.6895321899, 1.68145658386, 1.68581793183, 1.6920468775900002, 1.6865452951599997, 1.68570953338, 1.6922784791700003, 1.70958957412, 1.71683408637, 1.70360183933, 1.6919669752199997, 1.6669487117300001, 1.6351298032300001, 1.6061729066600001, 1.57344333403, 1.54723708217, 1.5277773737599998, 1.5122628414300001, 1.4962354965200002, 1.4873367459, 1.47567715522, 1.4696584634, 1.46159565032, 1.45320592315, 1.4487225244200002, 1.44572887186, 1.44089260198, 1.4367157657399998, 1.4349226211, 1.43614316806, 1.4381950627400002, 1.43947658627, 1.4483572314200002, 1.4504305909200002, 1.44436990692, 1.43367609757, 1.42637295252, 1.41197427963, 1.4067529511399999, 1.39714414185, 1.38309980493, 1.3730701362500004, 1.3693239836499997, 1.3729558979599998, 1.38291189477, 1.3988274622900003, 1.42112832324, 1.44217266068, 1.4578792438300001, 1.46478639274, 1.46676801398, 1.4646383458800003, 1.45918801344, 1.44561402809, 1.4212145146499997, 1.4012453921299999, 1.38070199226, 1.36215759642, 1.3540496661500003, 1.35470913884, 1.3481165993199997, 1.34059081754, 1.332964567, 1.33426054366, 1.34052562222, 1.3343255632100002, 1.3310385903, 1.33044179339, 1.32827462527, 1.3356201140500001, 1.3400144893900001, 1.3157198001600001, 1.27716313727, 1.2517667292400003, 1.2406836620500001, 1.2354036030700002, 1.23110776291, 1.22492582889,
1.22074838719, 1.21816502762, 1.21015135518, 1.20038737012, 1.1920263929700001, 1.18723010357, 1.19656731125, 1.2237068834899998, 1.2373841696199999, 1.2251076648299999, 1.1963014909299998, 1.16152861736, 1.13940556893, 1.12839812676, 1.12368066547, 1.1190219542100002, 1.11384679759, 1.10555781262, 1.0977575386300003, 1.0901734365399998, 1.0824275375699999, 1.07552931443, 1.0696565210100002, 1.06481394254, 1.0578173014299999, 1.05204230102, 1.0482530038799998, 1.04237087457, 1.0361766944300002, 1.0297906393, 1.0240842912299999, 1.01250548183, 0.9964340353700001, 0.9859450307400002, 0.98614987451, 0.9826424718800002, 0.9739505767299999, 0.9578738177999998, 0.9416973908799999, 0.92975112051, 0.9204409049900001, 0.91821299468, 0.9100360995600001, 0.89589154778, 0.8799530701000002, 0.8640439088, 0.8500274234399999, 0.8428500205999999, 0.8358678326, 0.8333072464999999, 0.83420148485, 0.8362578717, 0.83608947323, 0.83035464861, 0.82315039029, 0.81220152235, 0.80169300598, 0.7918658959, 0.7808782388700001, 0.77684747687, 0.7743299962, 0.76797978094, 0.7591097217, 0.7520710688500001, 0.7452609707, 0.73562753255, 0.7256206568399999, 0.71663518742, 0.70951165178, 0.7035884873, 0.6973768853, 0.6900439160299999, 0.68062538021, 0.67096725454, 0.66585371901, 0.6663177033900001, 0.67214877804, 0.6787934074299999, 0.68365489213, 0.68581510712, 0.6820892084400001, 0.67805153237, 0.67540688376, 0.6724865515, 0.6674502035, 0.6593852224500001, 0.6524835227400001, 0.64758563177, 0.6424489126599999, 0.63385426361, 0.6242639699699999, 0.6143974848999999, 0.60705328516, 0.60087306988, 0.5928024247700001, 0.5864009594799999, 0.5786877362899999, 0.57457744302, 0.57012636848, 0.56554310644, 0.5618750202299999, 0.55731189492, 0.55057384756, 0.5419996086800001, 0.52987726408, 0.51025575876, 0.48599474143000004, 0.46231124366000004, 0.44151899608999995, 0.42632008877, 0.42655368254, 0.42784393651999997, 0.42863940533999995, 0.42506971759, 0.41952014686999994, 0.41337420894, 0.40570705996, 0.39706149294, 0.38721395321, 0.3806321949, 0.37313342483999995, 0.36982676447, 0.36704194004, 0.36189430296, 0.3560628963, 0.34954350131, 0.34540695806, 0.34178605934, 0.33629549256, 0.3293877577, 0.32357672213, 0.31864117490000005, 0.31165906503, 0.30439039263000006, 0.29875160317, 0.29294459105000004, 0.28847285244, 0.28509162173, 0.28265949265, 0.28003828154, 0.27814630873999996, 0.27599048828, 0.27524025386, 0.27406833971, 0.27281988259, 0.27155314420999993, 0.26840999947000005, 0.2634181241, 0.25883622926000005, 0.25503165868, 0.25056988104, 0.24466620872, 0.23932761459000002, 0.23422685251999997, 0.22880456697, 0.22310130485000004, 0.21785542557999998, 0.21366651902000006, 0.20966530780999998, 0.20521315906, 0.20012157666000002, 0.19469597081, 0.18957032591999995, 0.18423432945, 0.17946309866000001, 0.17845044232, 0.17746098912000002, 0.17475331315, 0.17039776599, 0.16363173032999997, 0.15716942518, 0.15214176858, 0.14870803788, 0.14515563527000003, 0.14218680693, 0.13893215828, 0.13546723615, 0.13178983356, 0.12747471604, 0.12350983297, 0.12011202021999998, 0.11627787931000003, 0.11218377746, 0.10821276155, 0.10384311280999999, 0.09960625706000001, 0.09615194041000003, 0.09216061199, 0.08847719376999999, 0.08481545522999999, 0.08163922452000001, 0.07851820869000001, 0.07535195845, 0.07259346216999998, 0.06996658694999999, 0.06748611806, 0.06513859836, 0.06343437948, 0.06174502390000001, 0.059727113600000006, 0.05755100017, 0.054968070300000005, 0.052386214650000006, 0.05002439809, 0.04768410494, 0.04532047195999999, 
0.04319275697, 0.04105023728, 0.03894787384, 0.03695523698, 0.03513302983, 0.033548459399999994, 0.032170295249999994, 0.030958654539999998, 0.02983605681, 0.028375548879999997, 0.02671830267, 0.024898224419999997, 0.0230959196, 0.02139548979, 0.01983882955, 0.018419727860000002, 0.017108712149999997, 0.01590183706, 0.01467630964, 0.01340369235, 0.01204181727, 0.011048145310000002, 0.01072443434, 0.010401953859999999, 0.010151465580000001, 0.00990748117, 0.00972232492, 0.00956939523, 0.009442617850000001, 0.009344043619999999, 0.009241641279999999, 0.00915107487, 0.009064981109999998, 0.008985430320000001, 0.00890431702, 0.00883441469, 0.008775488880000001, 0.00873752015, 0.00871498109, 0.008710938120000001, 0.00872328188, 0.00874796935, 0.008778945909999999, 0.00882859436, 0.00889468812, 0.00898683656, 0.00910033268, 0.009214043629999998, 0.00934455143, 0.00949293034, 0.00965939522, 0.009844610069999999, 0.01005115305, 0.010290684330000001, 0.01054888746, 0.010822364050000002, 0.011132617979999999, 0.012252539939999998, 0.013524844710000001, 0.01492336044, 0.01639437616, 0.01790093876, 0.01949634904, 0.02112754055, 0.022849025059999997, 0.02457990408, 0.02637656436, 0.02816101762, 0.02999357634, 0.031735392870000004, 0.03370418208999999, 0.03591160409, 0.03868365509, 0.0413049248, 0.043746897629999996, 0.04622211263, 0.04871939798, 0.051123460649999994, 0.05370180068, 0.05625859775000001, 0.058868656510000006, 0.06136678167, 0.06394643029, 0.06623680155999997, 0.06885605955999999, 0.07171654804, 0.07483811078, 0.07798461489, 0.08075584557000001, 0.08390440047999999, 0.08690709601, 0.09012059232, 0.09292447923, 0.09569860054, 0.09869240932999998, 0.10204307363999998, 0.10579037859, 0.10944262493000001, 0.11339190256000002, 0.11739889503, 0.12165444219999999, 0.12640639566999998, 0.13103823193000003, 0.13545668928, 0.13980243177, 0.1445100493, 0.14892381914000002, 0.15358704212000002, 0.15754780411999997, 0.1620275896, 0.16721823448, 0.17344235602999997, 0.17972712208000002, 0.18671513038999998, 0.19370331449, 0.1997322407, 0.20632862788999998, 0.21168169468000003, 0.2186676522, 0.22613634413, 0.23308478213, 0.24056257561, 0.24694894328, 0.25289726401, 0.26043587782, 0.26523394455, 0.27115650357, 0.27472996084, 0.27757628917, 0.28195025433, 0.28717476642, 0.29255468867, 0.29700002103, 0.29903203287999996, 0.30043668141, 0.30362955273000003, 0.30861634997000004, 0.3146493582, 0.32141648759, 0.33050709371, 0.34155311010999995, 0.35347176329, 0.3641544984300001, 0.37273471389, 0.37810184317999995, 0.38245108175, 0.38773739072, 0.39195147307000006, 0.39284567233, 0.39723110233000003, 0.39968268453, 0.40089368072000003, 0.40181627844999995, 0.40374096608, 0.40828194296, 0.41598909193000005, 0.42570815513, 0.43468223779000004, 0.4419052070599999, 0.44814120359, 0.4541516141699999, 0.45904682936999996, 0.46598345094999993, 0.47421183044, 0.48259810056, 0.49064425346, 0.49772194929999997, 0.50355609034, 0.5097226337399999, 0.5242588261700001, 0.53191943219, 0.5427558587299999, 0.5558334377799999, 0.57145400528, 0.58596031492, 0.6017949058700001, 0.61620852018, 0.62886383358, 0.63983492811, 0.64928899126, 0.65807748798, 0.66440410952, 0.67291110232, 0.68452424766, 0.6952567679499999, 0.7045326279799999, 0.7168566913700001, 0.72438360596, 0.7334800323799999, 0.73850692728, 0.7444589784699999, 0.75250327593, 0.7652333354299999, 0.7794230629700001, 0.79152575915, 0.80011656054, 0.80971581904, 0.8176350188100001, 0.82681863275, 0.83466310596, 0.84169904395, 0.85246648611, 0.8612931078200001, 
0.8712971515300001, 0.88083937874, 0.89039777788, 0.89838717297, 0.90641512274, 0.9111584238600001, 0.9159304749999999, 0.9210217253499999, 0.92296264345, 0.9233887177, 0.9218466277399999, 0.9176133266600001, 0.91940151039, 0.9208485417400001, 0.9220888543199999, 0.9236718817800001, 0.9276074484799999, 0.93015244864, 0.9343631130099999, 0.93763016402, 0.9384009648400001, 0.93879867973, 0.93652442175, 0.93662918739, 0.9331820972899999, 0.93503584744, 0.9360406912399999, 0.93994795716, 0.9444487777899999, 0.95150762595, 0.9574753021500001, 0.9659650293199998, 0.9757605964, 0.9878513785299999, 0.99883880117, 1.01323052095, 1.0311493112499999, 1.04763474212, 1.0677277318200002, 1.086237323, 1.0988490621599998, 1.10287175775, 1.11006095748, 1.1203823058799998, 1.1266948453599999, 1.1295011150999998, 1.13468379124, 1.13839008058, 1.1417559206699999, 1.1386140845, 1.1368738695300002, 1.13791410398, 1.1443759989699998, 1.1533826011700001, 1.16127430094, 1.1771807669, 1.19318348288, 1.2014892452, 1.20715822998, 1.21764737132, 1.23158125907, 1.2387470993899998, 1.2441262208700001, 1.2562376475, 1.2682344256899998, 1.28293907518, 1.2903573374300001, 1.3040509126199997, 1.3260814219800001, 1.3595052134299999, 1.3870089263099998, 1.4040962907899999, 1.4190098465199998, 1.43005375357, 1.4343605702800002, 1.4355429141099998, 1.43638377355, 1.44962018073, 1.45147113789, 1.45921588453, 1.4661880139399999, 1.47414703793, 1.47941295628, 1.47950143284, 1.4748920184699998, 1.4692222329000004, 1.4631299473100001, 1.45757789614, 1.4527345168899999, 1.4434376802999997, 1.4390123479299999, 1.4387321330999998, 1.4376372501999999, 1.44922049319, 1.46122473234, 1.47480432313, 1.48463330822, 1.50740325124, 1.52143227566, 1.5388702456399996, 1.5586354228100001, 1.5670929624799999, 1.57654938893, 1.60239005482, 1.6187282200499997, 1.6195258763400002, 1.6341473226799998, 1.6455264836499999, 1.6550699218299996, 1.6682315829299998, 1.68167279482, 1.6900114477300001, 1.6978344170500002, 1.7018968392199998, 1.70642375358, 1.71237959385, 1.7205134225500003, 1.7311321537799997, 1.7430771546100001, 1.7517999091500003, 1.76491293742, 1.7833902824799999, 1.8081253623500004, 1.83075608662, 1.8524498577000004, 1.86711454623, 1.8814965784800002, 1.8857294108200002, 1.90378495898, 1.9156142957500002, 1.9241271088399998, 1.92694429655, 1.92836076148, 1.9246632612399999, 1.9177767372999999, 1.9240789057399996, 1.93491201195, 1.95508541182, 1.9667632837499998, 1.97663894849, 1.9838888513599997, 1.9862320351100002, 1.9850681678399997, 1.9724571903800001, 1.9569690057000002, 1.9450577939199998, 1.93385585952, 1.91272038928, 1.90263962687, 1.89419806376, 1.8846363638699999, 1.8752989218, 1.8721239020399998, 1.87465480067, 1.87635644139, 1.8883053875500004, 1.90622687322, 1.9326186524100002, 1.96217418184, 1.99341387155, 2.0052843606899997, 2.0198940101400003, 2.03224112041, 2.04585828934, 2.0482686606100002, 2.0761935844499995, 2.10636661393, 2.1218703845699998, 2.1265723770799996, 2.13344606897, 2.13480411595, 2.12395452534, 2.11298829408, 2.10366419185, 2.10279155509, 2.10582569592, 2.12401487691, 2.14351597204, 2.1603280826, 2.1732762280399998, 2.1829961701499996, 2.1825562873100006, 2.1829598615399997, 2.18269224434, 2.18542837733, 2.18136038877, 2.17195739983, 2.16672507523, 2.1595190200499994, 2.15408655871, 2.16100126623, 2.1646243915, 2.16989273172, 2.1760575368399997, 2.18993197141, 2.20082640578, 2.18953400264, 2.1673666182699995, 2.15301331645, 2.1344672799800004, 2.1212936853000004, 2.1081594070399996, 2.08825354625, 
2.0697085058700004, 2.045492469, 2.02153998684, 2.0038663723099996, 2.0038828566799998, 2.0085019585599997, 2.0192783851200002, 2.03833670679, 2.05771370034, 2.08050465897, 2.1006803439999997, 2.1263974552, 2.14748327701, 2.17287144288, 2.1941383974899997, 2.19820122981, 2.2003345112000003, 2.20800316408, 2.21184328157, 2.21310867227, 2.21112832057, 2.1998480658600004, 2.1906804089599996, 2.17670294702, 2.1515223983699996, 2.1337058932199997, 2.11742559909, 2.1017357932899996, 2.0798991511200002, 2.05328198125, 2.02510619803, 2.00362619651, 1.98193234731, 1.9618359005700001, 1.9612528146099997, 1.97096636996, 1.9761617414300001, 1.9782324642600002, 1.99263889104, 2.00500029816, 2.01506871685, 2.02912785846, 2.04221860157, 2.06368362263, 2.07491317421, 2.08832055797, 2.09538342956, 2.1084886843899997, 2.1158979036700005, 2.1260576895499996, 2.13639327622, 2.14181249535, 2.1392352295499997, 2.14448495648, 2.1421138235, 2.14009620617, 2.1384934521399996, 2.1319765571600002, 2.1216323962400003, 2.1065051490999998, 2.08999485498, 2.06996758792, 2.05396301646, 2.0366352808700006, 2.023489069, 1.9927697308899996, 1.9807445347400001, 1.97629449536, 1.9772154719699997, 1.9837454333899998, 1.9903514690000002, 1.9990068602399997, 2.0052703762999995, 2.0102515290099996, 2.01071088451, 2.00780344289, 2.00202451671, 1.99526703575, 1.9894158244, 1.9859053554, 1.9872483633099995, 1.99006639085, 2.00697930222, 2.0329301048299997, 2.05059264513, 2.0540770985099996, 2.04176762498, 2.0093012359700007, 1.9757453156100002, 1.94977980597, 1.94015615295, 1.93165724611, 1.9207719523600002, 1.90945249843, 1.89062300491, 1.87690150004, 1.8621346825699998, 1.84607821661, 1.828253313, 1.8169694254700002, 1.8075289169999997, 1.8040289362800004, 1.79267489253, 1.78023102445, 1.7778953016200003, 1.7787011610500003, 1.78226670819, 1.7830425676100004, 1.77486727406, 1.7675372149399997, 1.7575688744100002, 1.7498299871300003, 1.74518012353, 1.73248096246, 1.7160241253800002, 1.70317674164, 1.6978293584500002, 1.6946921121299998, 1.6961595927200002, 1.70211670251, 1.7104493398199998, 1.7203816647499999, 1.7274331496, 1.7311123100199999, 1.73665119714, 1.74750018228, 1.7625600270900001, 1.76829838689, 1.7683754962599998, 1.7604641870999997, 1.7378729159800002, 1.7182883638100002, 1.7072806677199999, 1.7037852573199999, 1.6963237919299996, 1.67904111493, 1.64849412058, 1.61509034869, 1.58860298353, 1.56708077499, 1.5563275906199998, 1.5508352464699997, 1.5448227655799998, 1.53880546048, 1.54041544105, 1.5403843473000003, 1.53577729621, 1.5273169831, 1.51722079097, 1.5010415320300001, 1.4873523904299997, 1.47098713536, 1.45343877476, 1.4333900233299999, 1.4214382256099998, 1.4199358231499999, 1.42357822576, 1.42446916333, 1.4169634987200002, 1.40651060735, 1.39602957147, 1.38608337936, 1.38502109414, 1.38722933647, 1.3877573052599999, 1.38915685615, 1.3879546490299999, 1.38030042971, 1.37484574183, 1.36882917891, 1.36771619056, 1.36598312403, 1.35475238104, 1.3352715984299999, 1.31243304213, 1.29205091175, 1.26981483599, 1.25096920963, 1.23261465755, 1.2107178005399999, 1.1896016271599998, 1.1758782668, 1.17342422369, 1.17358562993, 1.17110207509, 1.1674486178099999, 1.1603703751, 1.1565048865399998, 1.15140617524, 1.15148740571, 1.15832875386, 1.16650391071, 1.1712949266600001, 1.16865191865, 1.16596408644, 1.1661593208199998, 1.16419447693, 1.15754447647, 1.15312982771, 1.1506705697300001, 1.14375644814, 1.13705099847, 1.12589113437, 1.11212277402, 1.10001296849, 1.08946394429, 1.0747068729400002, 1.05980790705, 
1.0438431988799999, 1.02497712333, 1.00659505173, 0.98919173016, 0.9715707328300001, 0.95416868081, 0.9416231916500001, 0.92753217501, 0.91364512326, 0.90414607963, 0.8947884227199999, 0.8843405703999998, 0.8769049253500001, 0.8719632452999999, 0.86833484662, 0.8680955887799999, 0.86604049098, 0.86558996362, 0.86372701427, 0.85893691627, 0.85435131048, 0.84886228665, 0.8409088095199999, 0.82732292967, 0.8182398235399999, 0.81298593645, 0.8065804672500001, 0.7963832009099999, 0.7813524576499999, 0.7642633939500001, 0.74891606863, 0.73387495429, 0.72021307831, 0.70711249145, 0.6972523931, 0.68836254874, 0.6789805168, 0.66917573095, 0.65520369872, 0.6405349086200001, 0.6262600443299999, 0.6128265668199999, 0.6004827768800001, 0.58821246352, 0.5763513298499999, 0.56580466895, 0.55820613325, 0.5498382224900001, 0.5432313079700001, 0.5383656045, 0.53169802591];

Here are some additional values for the pow dataset (link to pastebin to not exceed the post length limit): https://pastebin.com/5GP8sj4N

The resulting fit from the trial dataset (x, pow1) is shown here (orange) with the original (pow1) data (blue). As mentioned, there is an issue with how the phase fits the minima and maxima. Unfortunately the application of getting this fit function correct has very little room for error. Please help out if you have an idea of how to make this fit the data better!

Edit: I tried what #Joe mentioned in the comments, first filtering the data. I utilized a Savitzky-Golay filter and received the following result: original data (blue), the filtered data (green), and the fit to the filtered data (orange). Again the same shift in minima and maxima is still present in the fit function to the filtered data.
Here are my results with more aggressive clipping bounds of 0.5 to 1.75 for each data set.

for pow1:
A = 9.6711505138648990E-01
c = 9.7613787086912507E-01
p = 4.0262076448344617E+00
w = 1.2654001570670070E+00

for pow2:
A = 9.4894637490866129E-01
c = 9.6733405789489280E-01
p = 4.0892433833755097E+00
w = 1.2578627414445132E+00

for pow3:
A = 9.8595630272060597E-01
c = 9.6749868212694512E-01
p = 4.0859456191316230E+00
w = 1.2598547148182329E+00

for pow4:
A = -9.4636707498392481E-01
c = 9.5047597808408602E-01
p = -4.2643913461857056E+02
w = 1.2761107231684055E+00
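For what it's worth, here is a guess at what that clipping could look like in code (my assumption about the preprocessing, not the answerer's actual script): discard samples whose power value falls outside [0.5, 1.75] before running the question's sin_regress.

import numpy as np

# Hypothetical preprocessing: keep only samples inside the clipping bounds,
# then fit with the sin_regress function from the question.
x_arr = np.asarray(x)
y_arr = np.asarray(pow1)
mask = (y_arr >= 0.5) & (y_arr <= 1.75)
result = sin_regress(x_arr[mask], y_arr[mask])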
I think I have this figured out - your data is not a mathematically perfect sine wave + noise, so the fitting software can only come close to modeling a sine function to this data. If you must have more accuracy, try splitting the model into different segments and using a piecewise fit. Here is a close-up of the problem area: [close-up figure not reproduced here]
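A minimal sketch of that piecewise idea (my illustration; the even segment boundaries are arbitrary and would better be chosen near the problem areas): split the x-range into chunks and run the question's sin_regress on each chunk independently.

import numpy as np

def piecewise_sin_fit(data_x, data_y, n_segments=3):
    # Fit each segment separately, reusing sin_regress from the question.
    segments = np.array_split(np.arange(len(data_x)), n_segments)
    return [sin_regress(np.asarray(data_x)[idx], np.asarray(data_y)[idx])
            for idx in segments]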
Is my problem suited for convex optimization, and if so, how to express it with cvxpy?
I have an array of scalars of m rows and n columns. I have a Variable(m) and a Variable(n) that I would like to find solutions for. The two variables represent values that need to be broadcast over the columns and rows respectively.

I was naively thinking of writing the variables as Variable((m, 1)) and Variable((1, n)), and adding them together as if they're ndarrays. However, that doesn't work, as broadcasting is not allowed.

import cvxpy as cp
import numpy as np

# Problem data.
m = 3
n = 4
np.random.seed(1)
data = np.random.randn(m, n)

# Construct the problem.
x = cp.Variable((m, 1))
y = cp.Variable((1, n))
objective = cp.Minimize(cp.sum(cp.abs(x + y + data)))
# or:
#objective = cp.Minimize(cp.sum_squares(x + y + data))
prob = cp.Problem(objective)
result = prob.solve()
print(x.value)
print(y.value)

This fails on the x + y expression:

ValueError: Cannot broadcast dimensions (3, 1) (1, 4)

Now I'm wondering two things:

Is my problem indeed solvable using convex optimization?
If yes, how can I express it in a way that cvxpy understands?

I'm very new to the concept of convex optimization, as well as cvxpy, and I hope I described my problem well enough.
I offered to show you how to represent this as a linear program, so here it goes. I'm using Pyomo, since I'm more familiar with that, but you could do something similar in PuLP.

To run this, you will need to first install Pyomo and a linear program solver like glpk. glpk should work for reasonable-sized problems, but if you are finding it's taking too long to solve, you could try a (much faster) commercial solver like CPLEX or Gurobi. You can install Pyomo via pip install pyomo or conda install -c conda-forge pyomo. You can install glpk from https://www.gnu.org/software/glpk/ or via conda install glpk. (I think PuLP comes with a version of glpk built-in, so that might save you a step.)

Here's the script. Note that this calculates absolute error as a linear expression by defining one variable for the positive component of the error and another for the negative part. Then it seeks to minimize the sum of both. In this case, the solver will always set one to zero since that's an easy way to reduce the error, and then the other will be equal to the absolute error.

import random
import pyomo.environ as po

random.seed(1)

# ~50% sparse data set, big enough to populate every row and column
m = 10  # number of rows
n = 10  # number of cols
data = {
    (r, c): random.random()
    for r in range(m)
    for c in range(n)
    if random.random() >= 0.5
}

# define a linear program to find vectors
# x in R^m, y in R^n, such that x[r] + y[c] is close to data[r, c]

# create an optimization model object
model = po.ConcreteModel()

# create indexes for the rows and columns
model.ROWS = po.Set(initialize=range(m))
model.COLS = po.Set(initialize=range(n))

# create indexes for the dataset
model.DATAPOINTS = po.Set(dimen=2, initialize=data.keys())

# data values
model.data = po.Param(model.DATAPOINTS, initialize=data)

# create the x and y vectors
model.X = po.Var(model.ROWS, within=po.NonNegativeReals)
model.Y = po.Var(model.COLS, within=po.NonNegativeReals)

# create dummy variables to represent errors
model.ErrUp = po.Var(model.DATAPOINTS, within=po.NonNegativeReals)
model.ErrDown = po.Var(model.DATAPOINTS, within=po.NonNegativeReals)

# Force the error variables to match the error
def Calculate_Error_rule(model, r, c):
    pred = model.X[r] + model.Y[c]
    err = model.ErrUp[r, c] - model.ErrDown[r, c]
    return (model.data[r, c] + err == pred)
model.Calculate_Error = po.Constraint(
    model.DATAPOINTS, rule=Calculate_Error_rule
)

# Minimize the total error
def ClosestMatch_rule(model):
    return sum(
        model.ErrUp[r, c] + model.ErrDown[r, c]
        for (r, c) in model.DATAPOINTS
    )
model.ClosestMatch = po.Objective(
    rule=ClosestMatch_rule, sense=po.minimize
)

# Solve the model

# get a solver object
opt = po.SolverFactory("glpk")

# solve the model
# turn off "tee" if you want less verbose output
results = opt.solve(model, tee=True)

# show solution status
print(results)

# show verbose description of the model
model.pprint()

# show X and Y values in the solution
for r in model.ROWS:
    print('X[{}]: {}'.format(r, po.value(model.X[r])))
for c in model.COLS:
    print('Y[{}]: {}'.format(c, po.value(model.Y[c])))

Just to complete the story, here's a solution that's closer to your original example. It uses cvxpy, but with the sparse data approach from my solution. I don't know the "official" way to do elementwise calculations with cvxpy, but it seems to work OK to just use the standard Python sum function with a lot of individual cp.abs(...) calculations.
This gives a solution that is very slightly worse than the linear program, but you may be able to fix that by adjusting the solution tolerance.

import cvxpy as cp
import random

random.seed(1)

# Problem data.
# ~50% sparse data set
m = 10  # number of rows
n = 10  # number of cols
data = {
    (i, j): random.random()
    for i in range(m)
    for j in range(n)
    if random.random() >= 0.5
}

# Construct the problem.
x = cp.Variable(m)
y = cp.Variable(n)
objective = cp.Minimize(
    sum(
        cp.abs(x[i] + y[j] + data[i, j])
        for (i, j) in data.keys()
    )
)
prob = cp.Problem(objective)
result = prob.solve()
print(x.value)
print(y.value)
I did not get the idea, but here is some hacky stuff based on the assumption: you want some cvxpy equivalent to numpy's broadcasting-rules behaviour on arrays of shape (m, 1) + (1, n).

So numpy-wise:

m = 3
n = 4
np.random.seed(1)
a = np.random.randn(m, 1)
b = np.random.randn(1, n)

a
array([[ 1.62434536],
       [-0.61175641],
       [-0.52817175]])

b
array([[-1.07296862,  0.86540763, -2.3015387 ,  1.74481176]])

a + b
array([[ 0.55137674,  2.48975299, -0.67719333,  3.36915713],
       [-1.68472504,  0.25365122, -2.91329511,  1.13305535],
       [-1.60114037,  0.33723588, -2.82971045,  1.21664001]])

Let's mimic this with np.kron, which has a cvxpy equivalent:

aLifted = np.kron(np.ones((1, n)), a)
bLifted = np.kron(np.ones((m, 1)), b)

aLifted
array([[ 1.62434536,  1.62434536,  1.62434536,  1.62434536],
       [-0.61175641, -0.61175641, -0.61175641, -0.61175641],
       [-0.52817175, -0.52817175, -0.52817175, -0.52817175]])

bLifted
array([[-1.07296862,  0.86540763, -2.3015387 ,  1.74481176],
       [-1.07296862,  0.86540763, -2.3015387 ,  1.74481176],
       [-1.07296862,  0.86540763, -2.3015387 ,  1.74481176]])

aLifted + bLifted
array([[ 0.55137674,  2.48975299, -0.67719333,  3.36915713],
       [-1.68472504,  0.25365122, -2.91329511,  1.13305535],
       [-1.60114037,  0.33723588, -2.82971045,  1.21664001]])

Let's check cvxpy semi-blindly (we only check dimensions; too lazy to set up a problem and fix variables to check the output :-D):

import cvxpy as cp
x = cp.Variable((m, 1))
y = cp.Variable((1, n))
cp.kron(np.ones((1, n)), x) + cp.kron(np.ones((m, 1)), y)
# Expression(AFFINE, UNKNOWN, (3, 4))
# looks good!

Now some caveats:

- I don't know how efficiently cvxpy can reason about this matrix form internally
- it's unclear whether this is more efficient than a simple list-comprehension-based form using cp.vstack and co (it probably is)
- this operation itself kills all sparsity (if both vectors are dense, your matrix is dense)
- cvxpy and more or less all convex-optimization solvers are based on some sparsity assumption; scaling this problem up to machine-learning dimensions will not make you happy
- there is probably a much more concise mathematical theory for your problem than using (sparsity-assuming), pretty general (DCP, as implemented in cvxpy, is a subset) convex optimization
Problems with scipy.optimize using matrix as input, bounds, constraints
I have used Python to perform optimization in the past; however, I am now trying to use a matrix as the input for the objective function, as well as set bounds on the individual element values and the sum of the value of each row in the matrix, and I am encountering problems.

Specifically, I would like to pass the objective function ObjFunc three parameters - w, p, ret - and then minimize the value of this function (technically I am trying to maximize the function by minimizing the value of -1*ObjFunc) by adjusting the value of w, subject to the bound that all elements of w should fall within the range [0, 1] and the constraint that the sum of each row in w should sum to 1.

I have included a simplified piece of example code below to demonstrate the issue I'm encountering. As you can see, I am using the minimize function from scipy.optimize. The problems begin in the first line of the objective function, x = np.dot(p, w), in which the optimization procedure attempts to flatten the matrix into a one-dimensional vector - a problem that does not occur when the function is called without performing optimization. The bounds = b and constraints = c are both producing errors as well.

I know that I am making an elementary mistake in how I am approaching this optimization and would appreciate any insight that can be offered.

import numpy as np
from scipy.optimize import minimize

def objFunc(w, p, ret):
    x = np.dot(p, w)
    y = np.multiply(x, ret)
    z = np.sum(y, axis=1)
    r = z.mean()
    s = z.std()
    ratio = r/s
    return -1 * ratio

# CREATE MATRICES
# returns, ret, of each of the three assets in the 5 periods
ret = np.matrix([[0.10, 0.05, -0.03], [0.05, 0.05, 0.50], [0.01, 0.05, -0.10],
                 [0.01, 0.05, 0.40], [1.00, 0.05, -0.20]])

# probability, p, of being in each state {X, Y, Z} in each of the 5 periods
p = np.matrix([[0, 0.5, 0.5], [0, 0.6, 0.4], [0.2, 0.4, 0.4],
               [0.3, 0.3, 0.4], [1, 0, 0]])

# initial equal weights, w
w = np.matrix([[0.33333, 0.33333, 0.33333], [0.33333, 0.33333, 0.33333],
               [0.33333, 0.33333, 0.33333]])

# OPTIMIZATION
b = [(0, 1)]
c = ({'type': 'eq', 'fun': lambda w_: np.sum(w, 1) - 1})
result = minimize(objFunc, w, (p, ret), method='SLSQP', bounds=b, constraints=c)
Digging into the code a bit: minimize calls optimize._minimize._minimize_slsqp. One of the first things it does is

x = asfarray(x0).flatten()

So you need to design your objFunc to work with the flattened version of w. It may be enough to reshape it at the start of that function.

I read the code from an IPython session, but you can also find it in your scipy directory:

/usr/local/lib/python3.5/dist-packages/scipy/optimize/_minimize.py
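To make that concrete, here is a minimal sketch (my illustration, not from the answer) that reshapes inside the objective and expands the bounds and row-sum constraints to the flattened layout; the shapes and numbers mirror the question's example:

import numpy as np
from scipy.optimize import minimize

def objFunc(w_flat, p, ret):
    w = w_flat.reshape(3, 3)  # restore the matrix that SLSQP flattened
    z = np.sum(np.multiply(p @ w, ret), axis=1)
    return -z.mean() / z.std()

ret = np.array([[0.10, 0.05, -0.03], [0.05, 0.05, 0.50], [0.01, 0.05, -0.10],
                [0.01, 0.05, 0.40], [1.00, 0.05, -0.20]])
p = np.array([[0, 0.5, 0.5], [0, 0.6, 0.4], [0.2, 0.4, 0.4],
              [0.3, 0.3, 0.4], [1, 0, 0]])
w0 = np.full(9, 1/3)  # flattened initial weights

b = [(0, 1)] * 9      # one bound pair per flattened element
c = [{'type': 'eq',
      'fun': lambda w_flat, i=i: w_flat.reshape(3, 3)[i].sum() - 1}
     for i in range(3)]  # each row of w sums to 1

result = minimize(objFunc, w0, (p, ret), method='SLSQP', bounds=b, constraints=c)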
Calculating medoid of a cluster (Python)
So I'm running a KNN in order to create clusters. From each cluster, I would like to obtain the medoid of the cluster.

I'm employing a fractional distance metric in order to calculate distances:

$\delta(x, y) = \left( \sum_{i=1}^{d} |x^i - y^i|^f \right)^{1/f}$

where d is the number of dimensions, the first data point's coordinates are x^i, the second data point's coordinates are y^i, and f is an arbitrary number between 0 and 1.

I would then calculate the medoid as:

$\text{medoid} = \arg\min_{x \in S} \sum_{y \in S} \delta(x, y)$

where S is the set of data points, and δ is the absolute value of the distance metric used above.

I've looked online to no avail trying to find implementations of medoid (even with other distance metrics), but most things were specifically k-means or k-medoids, which [I think] is relatively different from what I want. Essentially this boils down to me being unable to translate the math into effective programming. Any help or pointers in the right direction would be much appreciated!

Here's a short list of what I have so far:

I have figured out how to calculate the fractional distance metric (the first equation), so I think I'm good there.
I know numpy has an argmin() function (documented here).

Extra points for increased efficiency without loss of accuracy (I'm trying not to brute force by calculating every single fractional distance metric, because the number of point pairs might lead to factorial complexity...).
compute pairwise distance matrix
compute column or row sum
argmin to find medoid index

i.e. numpy.argmin(distMatrix.sum(axis=0)) or similar.
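A self-contained sketch of that recipe using the question's fractional metric (my illustration; f = 3/10 matches the value the asker mentions below):

import numpy as np

def fractional_dist_matrix(X, f=0.3):
    # Pairwise fractional distances (sum_i |x_i - y_i|^f)^(1/f),
    # computed by broadcasting over an (n, d) data array.
    diff = np.abs(X[:, None, :] - X[None, :, :])  # shape (n, n, d)
    return np.sum(diff ** f, axis=-1) ** (1.0 / f)

def find_medoid(X, f=0.3):
    # The medoid is the data point minimizing the summed distance to all others.
    D = fractional_dist_matrix(X, f)
    idx = np.argmin(D.sum(axis=0))
    return idx, X[idx]

# usage: idx, point = find_medoid(np.random.rand(100, 4))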
So I've accepted the answer here, but I thought I'd provide my implementation if anyone else is trying to do something similar:

(1) This is the distance function:

import numpy as np

def fractional(p_coord_array, q_coord_array):
    # f is an arbitrary value, but must be greater than zero and
    # less than one. In this case, I used 3/10. I took advantage
    # of the difference of cubes in this case, so that I wouldn't
    # encounter an overflow error.
    a = np.sum(np.array(p_coord_array, dtype=np.float64))
    b = np.sum(np.array(q_coord_array, dtype=np.float64))
    a2 = np.sum(np.power(p_coord_array, 2))
    ab = np.sum(p_coord_array) * np.sum(q_coord_array)
    b2 = np.sum(np.power(q_coord_array, 2))  # q here, not p (typo in the original post)
    diffab = a - b
    suma2abb2 = a2 + ab + b2
    temp_dist = abs(diffab * suma2abb2)
    temp_dist = np.power(temp_dist, 1./10)
    dist = np.power(temp_dist, 10./3)
    return dist

(2) The medoid function (if the length of the dataset was less than 6000 [if greater than that, I ran into overflow errors... I'm still working on that bit, to be perfectly honest...]):

def medoid(dataset):
    w = len(dataset)
    if len(dataset) < 6000:
        h = len(dataset)
        dist_matrix = [[0 for x in range(w)] for y in range(h)]
        list_combinations = [(counter_1, counter_2, data_1, data_2)
                             for counter_1, data_1 in enumerate(dataset)
                             for counter_2, data_2 in enumerate(dataset)
                             if counter_1 < counter_2]
        for counter_3, pair in enumerate(list_combinations):
            temp_dist = fractional(pair[2], pair[3])
            dist_matrix[pair[0]][pair[1]] = abs(temp_dist)
            dist_matrix[pair[1]][pair[0]] = abs(temp_dist)
        # the original post stopped here; the argmin step from the accepted
        # answer completes it and returns the medoid index
        return np.argmin(np.sum(dist_matrix, axis=0))

Any questions, feel free to comment!
If you don't mind using brute force, this might help:

import numpy as np

def calc_medoid(X, Y, f=2):
    n = len(X)
    m = len(Y)
    dist_mat = np.zeros((m, n))
    # compute distance matrix
    for j in range(n):
        center = X[j, :]
        for i in range(m):
            if i != j:
                dist_mat[i, j] = np.linalg.norm(Y[i, :] - center, ord=f)
    medoid_id = np.argmin(dist_mat.sum(axis=0))  # sum over y
    return medoid_id, X[medoid_id, :]
Here is an example of computing a medoid for a single cluster with Euclidean distance.

import numpy as np, pandas as pd, matplotlib.pyplot as plt

a, b, c, d = np.array([0, 1]), np.array([1, 3]), np.array([4, 2]), np.array([3, 1.5])
vCenroid = np.mean([a, b, c, d], axis=0)

def GetMedoid(vX):
    vMean = np.mean(vX, axis=0)                               # compute centroid
    return vX[np.argmin([sum((x - vMean)**2) for x in vX])]   # pick a point closest to centroid

vMedoid = GetMedoid([a, b, c, d])

print(f'centroid = {vCenroid}')
print(f'medoid = {vMedoid}')

df = pd.DataFrame([a, b, c, d], columns=['x', 'y'])
ax = df.plot.scatter('x', 'y', grid=True, title='Centroid in 2D plane', s=100);
plt.plot(vCenroid[0], vCenroid[1], 'ro', ms=10);  # plot centroid as red circle
plt.plot(vMedoid[0], vMedoid[1], 'rx', ms=20);    # plot medoid as red x marker

You can also use the following package to compute the medoid for one or more clusters:

!pip -q install scikit-learn-extra > log
from sklearn_extra.cluster import KMedoids
GetMedoid = lambda vX: KMedoids(n_clusters=1).fit(vX).cluster_centers_
GetMedoid([a, b, c, d])[0]
I would say that you just need to compute the median:

np.median(np.asarray(points), axis=0)

Your median is the point with the biggest centrality.

Note: if you are using distances other than Euclidean, this doesn't hold.