Strange result from Fast Fourier transform signal reconstruction - python

I have some data, shown in the figure below, and I am interested in finding some of its Fourier series coefficients.
import numpy as np
import matplotlib.pyplot as plt
r = np.array([119.80601628, 119.84629291, 119.85290735, 119.45778804,
115.64497439, 105.58519852, 100.72765819, 100.04327702,
100.08590518, 100.35824977, 101.58424993, 105.47976376,
112.27556007, 117.07679226, 118.99998888, 119.60458086,
119.78624424, 119.83022022, 119.36116943, 115.72323767,
106.58946834, 101.19479124, 100.11537349, 100.13313755,
100.41846106, 101.42255377, 104.33650237, 109.73625492,
115.14763728, 118.24665037, 119.35359999, 119.68061835])
z = np.array([-411.42980545, -384.98596279, -358.13032372, -330.89578468,
-303.39129113, -275.76248957, -248.24478443, -221.07069838,
-194.33260984, -168.05271807, -142.19357982, -116.62090103,
-91.15354178, -65.56745626, -39.65284757, -13.29632162,
13.54374939, 40.84929432, 68.50496394, 96.33720787,
124.08525182, 151.36802193, 177.98791952, 204.0805317 ,
229.85399128, 255.44727674, 281.02166554, 306.75399703,
332.74638285, 359.05528646, 385.74336711, 412.8189858 ])
plt.plot(z, r, label='data')
plt.legend()
Then I calculate the average sampling period, since the spacing of z is not constant:
l = []
for i in range(32-1):
    l.append(z[i]-z[i+1])
Ts = np.mean(l)
Then I calculate the fft:
from scipy.fftpack import fft
rf = fft(r)
Then, to reconstruct the signal:
fs = 1/Ts
amp = np.abs(rf)/r.shape[0]
n = r.shape[0]
s = 0
for i in range(n//2):
    phi = np.angle(rf[i], deg=False)
    a = amp[i]
    k = i*fs/n
    s += a*np.cos(2*np.pi*k *(z) +phi)
plt.plot(z, s, label='fft result')
plt.plot(z, r, label='data')
plt.legend()
However, the result is strange in terms of both amplitude and frequency.

The complex spectrum of a real signal is symmetric (strictly, conjugate-symmetric) over the range (-fs/2, ..., +fs/2), where fs is the sampling frequency.
You only used the right-hand, positive-frequency part of the spectrum. This means your reconstructed signal contains only half of the spectrum's contributions, so every component except DC comes out at half its amplitude.
Because the spectrum is symmetric, all you have to do is double the calculated absolute values. However, there is an important exception: the DC amplitude amp[0] must not be doubled.
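A minimal sketch of the corrected reconstruction, using numpy.fft.fft in place of scipy.fftpack.fft (both compute the same DFT here). Besides doubling every bin except DC, it also leaves the Nyquist bin undoubled when n is even, and it measures time from the first sample t0, because FFT phases are referenced to sample 0. Two further points that I believe also matter for the question, although the answer above does not mention them: Ts should be positive (z increases, so z[i]-z[i+1] is negative, whereas np.mean(np.diff(z)) gives the positive spacing), and since z is not exactly uniform, reconstructing at z itself can only be approximate. The helper name fft_cosine_reconstruction and the synthetic signal are hypothetical.

```python
import numpy as np


def fft_cosine_reconstruction(samples, t, t0, Ts):
    """Rebuild a real, uniformly sampled signal as a sum of cosines from its FFT.

    Assumes samples[m] was taken at time t0 + m * Ts, with Ts > 0.
    """
    samples = np.asarray(samples, dtype=float)
    t = np.asarray(t, dtype=float)
    n = samples.size
    spectrum = np.fft.fft(samples)
    amp = np.abs(spectrum) / n
    fs = 1.0 / Ts
    s = np.zeros_like(t)
    for i in range(n // 2 + 1):
        # DC (i == 0) and, for even n, the Nyquist bin (i == n/2) have no
        # separate negative-frequency twin, so they are the only bins not doubled.
        weight = 1.0 if (i == 0 or 2 * i == n) else 2.0
        freq = i * fs / n
        phi = np.angle(spectrum[i])
        # FFT phases refer to the first sample, so time is measured from t0.
        s += weight * amp[i] * np.cos(2 * np.pi * freq * (t - t0) + phi)
    return s


# Usage on synthetic, uniformly sampled data (values chosen for illustration).
Ts = 0.5
t = 2.0 + Ts * np.arange(32)
x = 3.0 + np.cos(2 * np.pi * 0.25 * t) + 0.5 * np.sin(2 * np.pi * 0.125 * t)
x_rebuilt = fft_cosine_reconstruction(x, t, t[0], Ts)
```

For the question's data the call would look roughly like fft_cosine_reconstruction(r, z, z[0], np.mean(np.diff(z))), subject to the caveat about the non-uniform z.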

Related

Cubic Spline Interpolation of Phase Space Plot

I am creating a phase-space plot of the first derivative of voltage against voltage:
I want to interpolate the plot so that it is smooth. So far, I have approached this by interpolating the voltage and the first derivative of the voltage separately against time, then generating the phase-space plot.
Python Code (toy data example)
import numpy as np
import scipy.interpolate
interp_factor = 100
n = 12
time = np.linspace(0, 10, n)
voltage = np.array([0, 1, 2, 10, 30, 70, 140, 150, 140, 80, 40, 10])
voltage_diff = np.diff(voltage)
voltage = voltage[:-1]
time = time[:-1]
interp_function_voltage = scipy.interpolate.interp1d(time, voltage, kind="cubic")
interp_function_voltage_diff = scipy.interpolate.interp1d(time, voltage_diff, kind="cubic")
new_sample_num = interp_factor * (n - 1) + 1
new_time = np.linspace(np.min(time), np.max(time), new_sample_num)
interp_voltage = interp_function_voltage(new_time)
interp_voltage_diff = interp_function_voltage_diff(new_time)
I would like to ask:
a) is the method as implemented reasonable?
b) is there a better method that interpolates directly in phase space, e.g. interpolating with voltage as x and voltage_diff as y? I do not think this makes sense, because the voltage values are not uniformly spaced and there may be repeated voltage values. I also tried the scipy parametric interpolation methods (e.g. scipy.interpolate.splprep), but these threw an input value error. I suspect (it would be nice to have this clarified) that this is because this is raw data rather than a well-behaved parametric function.
I guess more generally, I am wondering if it makes sense to somehow do the interpolation in the phase-space to make use of the direct relationship between voltage and voltage_diff for interpolating / smoothing.
Many thanks
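It is reasonable, but your difference will be biased: np.diff(voltage)[i] = v[i+1] - v[i] is the slope between samples i and i+1, yet it is plotted against v[i]. A better approximation for the derivative could be the central difference (v[i+1] - v[i-1])/(2*dt).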
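A short sketch comparing the two difference schemes on the question's toy data, assuming uniform time spacing dt. np.gradient is NumPy's built-in central-difference routine; dividing by dt turns the differences into derivative estimates, which the question's voltage_diff does not do.

```python
import numpy as np

time = np.linspace(0, 10, 12)
voltage = np.array([0, 1, 2, 10, 30, 70, 140, 150, 140, 80, 40, 10], dtype=float)
dt = time[1] - time[0]

# Forward difference, scaled to a derivative: each value is the slope between
# samples i and i+1 (centred at time[i] + dt/2), but it is stored at time[i].
forward = np.diff(voltage) / dt

# Central difference (v[i+1] - v[i-1]) / (2*dt), centred on time[i].
central_interior = (voltage[2:] - voltage[:-2]) / (2 * dt)

# np.gradient uses the same central difference at interior points and
# one-sided differences at the two ends, so the output keeps the input length.
central = np.gradient(voltage, dt)
```

Keeping the same length as voltage means no sample has to be dropped before plotting voltage against its derivative.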
Another approach is Fourier-transform smoothing:
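import numpy as np
import matplotlib.pyplot as plt
def smoother_phase_space(y, sps=1, T=1):
    # sps: output samples per input sample; T: sampling period of y
    Y = np.fft.rfft(y)
    # zero-pad the spectrum to upsample; irfft divides by the new length, hence * sps
    yu = np.fft.irfft(Y, len(y)*sps).real * sps
    # differentiate in the frequency domain by multiplying with 2j*pi*f
    dyu = np.fft.irfft(Y * (2j * np.pi * np.fft.rfftfreq(len(y))), len(y)*sps).real
    # repeat the first two points at the end to close the (periodic) loop
    k = np.arange(len(yu)+2) % len(yu)
    return yu[k], dyu[k] * sps / T
v, dv = smoother_phase_space(voltage, sps=1)
plt.plot(v, dv, '-ob')
v, dv = smoother_phase_space(voltage, sps=4)
plt.plot(v, dv, '-r')
plt.plot(v[::4], dv[::4], 'or')
v, dv = smoother_phase_space(voltage, sps=32)
plt.plot(v, dv, '-g')
plt.plot(v[::32], dv[::32], 'og')
try:  # the data computed in the original post
    plt.plot(interp_voltage, interp_voltage_diff, '--')
except NameError:
    pass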
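On part (b) of the question: interpolating voltage and voltage_diff separately against time, as the question does, already yields a parametric curve in phase space with time as the parameter. A minimal sketch of the same idea with scipy.interpolate.splprep on the question's toy data, passing time as the parameter u and s=0 for an interpolating (not smoothing) spline. I have not reproduced the ValueError from the question, so this only shows one call pattern that accepts this data.

```python
import numpy as np
from scipy.interpolate import splprep, splev

time = np.linspace(0, 10, 12)
voltage = np.array([0, 1, 2, 10, 30, 70, 140, 150, 140, 80, 40, 10], dtype=float)
voltage_diff = np.diff(voltage)
voltage = voltage[:-1]
time = time[:-1]

# Parametric cubic spline through the phase-space points, parametrised by time.
tck, u = splprep([voltage, voltage_diff], u=time, s=0, k=3)

# Evaluate the curve on a finer time grid; splev returns one array per coordinate.
new_time = np.linspace(time.min(), time.max(), 1001)
interp_voltage, interp_voltage_diff = splev(new_time, tck)
```

Because the parameter is time rather than voltage, repeated or non-uniformly spaced voltage values are not a problem for this construction.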

How to make scipy.optimize.curve_fit result in a better sine regression fit?

I have a problem where I am using scipy.optimize.curve_fit to do a regression fit to a sine/cosine function but the fit does not seem as optimized as I want it to be. How can I change my code to make the fitting better?
I have already tried changing the initial parameter guesses for the dataset, but there always seems to be a phase offset in the generated fit, or the fitted function does not line up with the actual minima/maxima.
Here is the code I am using to generate the regression fit. The output (fitfunc) can be plotted to show the result.
import numpy as np
import scipy.optimize
def sin_regress(data_x, data_y):
    """Function regression fits data to SIN function; does not need guess of freq.
    Parameters
    ----------
    data_x :
        Data for X values, most likely a set of voltages.
    data_y :
        Data for Y values, most likely the resulting powers from voltages.
    Returns
    -------
    __ :
        Dictionary containing values for amplitude, angular frequency, phase, offset, frequency, period, fit function, max covariance, initial guess.
    """
    data_x = np.array(data_x)
    data_y = np.array(data_y)
    freqz = np.fft.rfftfreq(len(data_x), (data_x[1] - data_x[0]))  # uniform spacing
    freq_y = abs(np.fft.rfft(data_y))
    guess_freq = abs(freqz[np.argmax(freq_y[1:])+1])  # exclude offset peak
    guess_amp = np.std(data_y) * 2.**0.5
    guess_offset = np.mean(data_y)
    guess = np.array([guess_amp, 2.*np.pi*guess_freq, 0., guess_offset])
    def sinfunc(t, A, w, p, c):
        """Raw function to be used to fit data.
        Parameters
        ----------
        t :
            Voltage array
        A :
            Amplitude
        w :
            Angular frequency
        p :
            Phase
        c :
            Constant value
        Returns
        -------
        __ :
            Formed fit function with provided values.
        """
        return A * np.sin(w*t + p) + c
    popt, pcov = scipy.optimize.curve_fit(sinfunc, data_x, data_y, p0=guess)
    A, w, p, c = popt
    f = w/(2.*np.pi)
    fitfunc = lambda t: A * np.sin(w*t + p) + c
    return {"amp": A, "omega": w, "phase": p, "offset": c, "freq": f, "period": 1./f, "fitfunc": fitfunc, "maxcov": np.max(pcov), "rawres": (guess,popt,pcov)}
With my trial dataset being:
x = np.linspace(3.5, 9.5, int(round((9.5 - 3.5) / 0.00625)) + 1)
pow1 = [1.8262110863, 1.80944546009, 1.7970185646900003, 1.77120336754, 1.7458101235699999, 1.73597098224, 1.7122529922799998, 1.70015674142, 1.68968617429, 1.6989396515, 1.69760676076, 1.6946375613599998, 1.6895321899, 1.68145658386, 1.68581793183, 1.6920468775900002, 1.6865452951599997, 1.68570953338, 1.6922784791700003, 1.70958957412, 1.71683408637, 1.70360183933, 1.6919669752199997, 1.6669487117300001, 1.6351298032300001, 1.6061729066600001, 1.57344333403, 1.54723708217, 1.5277773737599998, 1.5122628414300001, 1.4962354965200002, 1.4873367459, 1.47567715522, 1.4696584634, 1.46159565032, 1.45320592315, 1.4487225244200002, 1.44572887186, 1.44089260198, 1.4367157657399998, 1.4349226211, 1.43614316806, 1.4381950627400002, 1.43947658627, 1.4483572314200002, 1.4504305909200002, 1.44436990692, 1.43367609757, 1.42637295252, 1.41197427963, 1.4067529511399999, 1.39714414185, 1.38309980493, 1.3730701362500004, 1.3693239836499997, 1.3729558979599998, 1.38291189477, 1.3988274622900003, 1.42112832324, 1.44217266068, 1.4578792438300001, 1.46478639274, 1.46676801398, 1.4646383458800003, 1.45918801344, 1.44561402809, 1.4212145146499997, 1.4012453921299999, 1.38070199226, 1.36215759642, 1.3540496661500003, 1.35470913884, 1.3481165993199997, 1.34059081754, 1.332964567, 1.33426054366, 1.34052562222, 1.3343255632100002, 1.3310385903, 1.33044179339, 1.32827462527, 1.3356201140500001, 1.3400144893900001, 1.3157198001600001, 1.27716313727, 1.2517667292400003, 1.2406836620500001, 1.2354036030700002, 1.23110776291, 1.22492582889, 1.22074838719, 1.21816502762, 1.21015135518, 1.20038737012, 1.1920263929700001, 1.18723010357, 1.19656731125, 1.2237068834899998, 1.2373841696199999, 1.2251076648299999, 1.1963014909299998, 1.16152861736, 1.13940556893, 1.12839812676, 1.12368066547, 1.1190219542100002, 1.11384679759, 1.10555781262, 1.0977575386300003, 1.0901734365399998, 1.0824275375699999, 1.07552931443, 1.0696565210100002, 1.06481394254, 1.0578173014299999, 1.05204230102, 1.0482530038799998, 
1.04237087457, 1.0361766944300002, 1.0297906393, 1.0240842912299999, 1.01250548183, 0.9964340353700001, 0.9859450307400002, 0.98614987451, 0.9826424718800002, 0.9739505767299999, 0.9578738177999998, 0.9416973908799999, 0.92975112051, 0.9204409049900001, 0.91821299468, 0.9100360995600001, 0.89589154778, 0.8799530701000002, 0.8640439088, 0.8500274234399999, 0.8428500205999999, 0.8358678326, 0.8333072464999999, 0.83420148485, 0.8362578717, 0.83608947323, 0.83035464861, 0.82315039029, 0.81220152235, 0.80169300598, 0.7918658959, 0.7808782388700001, 0.77684747687, 0.7743299962, 0.76797978094, 0.7591097217, 0.7520710688500001, 0.7452609707, 0.73562753255, 0.7256206568399999, 0.71663518742, 0.70951165178, 0.7035884873, 0.6973768853, 0.6900439160299999, 0.68062538021, 0.67096725454, 0.66585371901, 0.6663177033900001, 0.67214877804, 0.6787934074299999, 0.68365489213, 0.68581510712, 0.6820892084400001, 0.67805153237, 0.67540688376, 0.6724865515, 0.6674502035, 0.6593852224500001, 0.6524835227400001, 0.64758563177, 0.6424489126599999, 0.63385426361, 0.6242639699699999, 0.6143974848999999, 0.60705328516, 0.60087306988, 0.5928024247700001, 0.5864009594799999, 0.5786877362899999, 0.57457744302, 0.57012636848, 0.56554310644, 0.5618750202299999, 0.55731189492, 0.55057384756, 0.5419996086800001, 0.52987726408, 0.51025575876, 0.48599474143000004, 0.46231124366000004, 0.44151899608999995, 0.42632008877, 0.42655368254, 0.42784393651999997, 0.42863940533999995, 0.42506971759, 0.41952014686999994, 0.41337420894, 0.40570705996, 0.39706149294, 0.38721395321, 0.3806321949, 0.37313342483999995, 0.36982676447, 0.36704194004, 0.36189430296, 0.3560628963, 0.34954350131, 0.34540695806, 0.34178605934, 0.33629549256, 0.3293877577, 0.32357672213, 0.31864117490000005, 0.31165906503, 0.30439039263000006, 0.29875160317, 0.29294459105000004, 0.28847285244, 0.28509162173, 0.28265949265, 0.28003828154, 0.27814630873999996, 0.27599048828, 0.27524025386, 0.27406833971, 0.27281988259, 0.27155314420999993, 
0.26840999947000005, 0.2634181241, 0.25883622926000005, 0.25503165868, 0.25056988104, 0.24466620872, 0.23932761459000002, 0.23422685251999997, 0.22880456697, 0.22310130485000004, 0.21785542557999998, 0.21366651902000006, 0.20966530780999998, 0.20521315906, 0.20012157666000002, 0.19469597081, 0.18957032591999995, 0.18423432945, 0.17946309866000001, 0.17845044232, 0.17746098912000002, 0.17475331315, 0.17039776599, 0.16363173032999997, 0.15716942518, 0.15214176858, 0.14870803788, 0.14515563527000003, 0.14218680693, 0.13893215828, 0.13546723615, 0.13178983356, 0.12747471604, 0.12350983297, 0.12011202021999998, 0.11627787931000003, 0.11218377746, 0.10821276155, 0.10384311280999999, 0.09960625706000001, 0.09615194041000003, 0.09216061199, 0.08847719376999999, 0.08481545522999999, 0.08163922452000001, 0.07851820869000001, 0.07535195845, 0.07259346216999998, 0.06996658694999999, 0.06748611806, 0.06513859836, 0.06343437948, 0.06174502390000001, 0.059727113600000006, 0.05755100017, 0.054968070300000005, 0.052386214650000006, 0.05002439809, 0.04768410494, 0.04532047195999999, 0.04319275697, 0.04105023728, 0.03894787384, 0.03695523698, 0.03513302983, 0.033548459399999994, 0.032170295249999994, 0.030958654539999998, 0.02983605681, 0.028375548879999997, 0.02671830267, 0.024898224419999997, 0.0230959196, 0.02139548979, 0.01983882955, 0.018419727860000002, 0.017108712149999997, 0.01590183706, 0.01467630964, 0.01340369235, 0.01204181727, 0.011048145310000002, 0.01072443434, 0.010401953859999999, 0.010151465580000001, 0.00990748117, 0.00972232492, 0.00956939523, 0.009442617850000001, 0.009344043619999999, 0.009241641279999999, 0.00915107487, 0.009064981109999998, 0.008985430320000001, 0.00890431702, 0.00883441469, 0.008775488880000001, 0.00873752015, 0.00871498109, 0.008710938120000001, 0.00872328188, 0.00874796935, 0.008778945909999999, 0.00882859436, 0.00889468812, 0.00898683656, 0.00910033268, 0.009214043629999998, 0.00934455143, 0.00949293034, 0.00965939522, 
0.009844610069999999, 0.01005115305, 0.010290684330000001, 0.01054888746, 0.010822364050000002, 0.011132617979999999, 0.012252539939999998, 0.013524844710000001, 0.01492336044, 0.01639437616, 0.01790093876, 0.01949634904, 0.02112754055, 0.022849025059999997, 0.02457990408, 0.02637656436, 0.02816101762, 0.02999357634, 0.031735392870000004, 0.03370418208999999, 0.03591160409, 0.03868365509, 0.0413049248, 0.043746897629999996, 0.04622211263, 0.04871939798, 0.051123460649999994, 0.05370180068, 0.05625859775000001, 0.058868656510000006, 0.06136678167, 0.06394643029, 0.06623680155999997, 0.06885605955999999, 0.07171654804, 0.07483811078, 0.07798461489, 0.08075584557000001, 0.08390440047999999, 0.08690709601, 0.09012059232, 0.09292447923, 0.09569860054, 0.09869240932999998, 0.10204307363999998, 0.10579037859, 0.10944262493000001, 0.11339190256000002, 0.11739889503, 0.12165444219999999, 0.12640639566999998, 0.13103823193000003, 0.13545668928, 0.13980243177, 0.1445100493, 0.14892381914000002, 0.15358704212000002, 0.15754780411999997, 0.1620275896, 0.16721823448, 0.17344235602999997, 0.17972712208000002, 0.18671513038999998, 0.19370331449, 0.1997322407, 0.20632862788999998, 0.21168169468000003, 0.2186676522, 0.22613634413, 0.23308478213, 0.24056257561, 0.24694894328, 0.25289726401, 0.26043587782, 0.26523394455, 0.27115650357, 0.27472996084, 0.27757628917, 0.28195025433, 0.28717476642, 0.29255468867, 0.29700002103, 0.29903203287999996, 0.30043668141, 0.30362955273000003, 0.30861634997000004, 0.3146493582, 0.32141648759, 0.33050709371, 0.34155311010999995, 0.35347176329, 0.3641544984300001, 0.37273471389, 0.37810184317999995, 0.38245108175, 0.38773739072, 0.39195147307000006, 0.39284567233, 0.39723110233000003, 0.39968268453, 0.40089368072000003, 0.40181627844999995, 0.40374096608, 0.40828194296, 0.41598909193000005, 0.42570815513, 0.43468223779000004, 0.4419052070599999, 0.44814120359, 0.4541516141699999, 0.45904682936999996, 0.46598345094999993, 0.47421183044, 0.48259810056, 
0.49064425346, 0.49772194929999997, 0.50355609034, 0.5097226337399999, 0.5242588261700001, 0.53191943219, 0.5427558587299999, 0.5558334377799999, 0.57145400528, 0.58596031492, 0.6017949058700001, 0.61620852018, 0.62886383358, 0.63983492811, 0.64928899126, 0.65807748798, 0.66440410952, 0.67291110232, 0.68452424766, 0.6952567679499999, 0.7045326279799999, 0.7168566913700001, 0.72438360596, 0.7334800323799999, 0.73850692728, 0.7444589784699999, 0.75250327593, 0.7652333354299999, 0.7794230629700001, 0.79152575915, 0.80011656054, 0.80971581904, 0.8176350188100001, 0.82681863275, 0.83466310596, 0.84169904395, 0.85246648611, 0.8612931078200001, 0.8712971515300001, 0.88083937874, 0.89039777788, 0.89838717297, 0.90641512274, 0.9111584238600001, 0.9159304749999999, 0.9210217253499999, 0.92296264345, 0.9233887177, 0.9218466277399999, 0.9176133266600001, 0.91940151039, 0.9208485417400001, 0.9220888543199999, 0.9236718817800001, 0.9276074484799999, 0.93015244864, 0.9343631130099999, 0.93763016402, 0.9384009648400001, 0.93879867973, 0.93652442175, 0.93662918739, 0.9331820972899999, 0.93503584744, 0.9360406912399999, 0.93994795716, 0.9444487777899999, 0.95150762595, 0.9574753021500001, 0.9659650293199998, 0.9757605964, 0.9878513785299999, 0.99883880117, 1.01323052095, 1.0311493112499999, 1.04763474212, 1.0677277318200002, 1.086237323, 1.0988490621599998, 1.10287175775, 1.11006095748, 1.1203823058799998, 1.1266948453599999, 1.1295011150999998, 1.13468379124, 1.13839008058, 1.1417559206699999, 1.1386140845, 1.1368738695300002, 1.13791410398, 1.1443759989699998, 1.1533826011700001, 1.16127430094, 1.1771807669, 1.19318348288, 1.2014892452, 1.20715822998, 1.21764737132, 1.23158125907, 1.2387470993899998, 1.2441262208700001, 1.2562376475, 1.2682344256899998, 1.28293907518, 1.2903573374300001, 1.3040509126199997, 1.3260814219800001, 1.3595052134299999, 1.3870089263099998, 1.4040962907899999, 1.4190098465199998, 1.43005375357, 1.4343605702800002, 1.4355429141099998, 1.43638377355, 
1.44962018073, 1.45147113789, 1.45921588453, 1.4661880139399999, 1.47414703793, 1.47941295628, 1.47950143284, 1.4748920184699998, 1.4692222329000004, 1.4631299473100001, 1.45757789614, 1.4527345168899999, 1.4434376802999997, 1.4390123479299999, 1.4387321330999998, 1.4376372501999999, 1.44922049319, 1.46122473234, 1.47480432313, 1.48463330822, 1.50740325124, 1.52143227566, 1.5388702456399996, 1.5586354228100001, 1.5670929624799999, 1.57654938893, 1.60239005482, 1.6187282200499997, 1.6195258763400002, 1.6341473226799998, 1.6455264836499999, 1.6550699218299996, 1.6682315829299998, 1.68167279482, 1.6900114477300001, 1.6978344170500002, 1.7018968392199998, 1.70642375358, 1.71237959385, 1.7205134225500003, 1.7311321537799997, 1.7430771546100001, 1.7517999091500003, 1.76491293742, 1.7833902824799999, 1.8081253623500004, 1.83075608662, 1.8524498577000004, 1.86711454623, 1.8814965784800002, 1.8857294108200002, 1.90378495898, 1.9156142957500002, 1.9241271088399998, 1.92694429655, 1.92836076148, 1.9246632612399999, 1.9177767372999999, 1.9240789057399996, 1.93491201195, 1.95508541182, 1.9667632837499998, 1.97663894849, 1.9838888513599997, 1.9862320351100002, 1.9850681678399997, 1.9724571903800001, 1.9569690057000002, 1.9450577939199998, 1.93385585952, 1.91272038928, 1.90263962687, 1.89419806376, 1.8846363638699999, 1.8752989218, 1.8721239020399998, 1.87465480067, 1.87635644139, 1.8883053875500004, 1.90622687322, 1.9326186524100002, 1.96217418184, 1.99341387155, 2.0052843606899997, 2.0198940101400003, 2.03224112041, 2.04585828934, 2.0482686606100002, 2.0761935844499995, 2.10636661393, 2.1218703845699998, 2.1265723770799996, 2.13344606897, 2.13480411595, 2.12395452534, 2.11298829408, 2.10366419185, 2.10279155509, 2.10582569592, 2.12401487691, 2.14351597204, 2.1603280826, 2.1732762280399998, 2.1829961701499996, 2.1825562873100006, 2.1829598615399997, 2.18269224434, 2.18542837733, 2.18136038877, 2.17195739983, 2.16672507523, 2.1595190200499994, 2.15408655871, 2.16100126623, 
2.1646243915, 2.16989273172, 2.1760575368399997, 2.18993197141, 2.20082640578, 2.18953400264, 2.1673666182699995, 2.15301331645, 2.1344672799800004, 2.1212936853000004, 2.1081594070399996, 2.08825354625, 2.0697085058700004, 2.045492469, 2.02153998684, 2.0038663723099996, 2.0038828566799998, 2.0085019585599997, 2.0192783851200002, 2.03833670679, 2.05771370034, 2.08050465897, 2.1006803439999997, 2.1263974552, 2.14748327701, 2.17287144288, 2.1941383974899997, 2.19820122981, 2.2003345112000003, 2.20800316408, 2.21184328157, 2.21310867227, 2.21112832057, 2.1998480658600004, 2.1906804089599996, 2.17670294702, 2.1515223983699996, 2.1337058932199997, 2.11742559909, 2.1017357932899996, 2.0798991511200002, 2.05328198125, 2.02510619803, 2.00362619651, 1.98193234731, 1.9618359005700001, 1.9612528146099997, 1.97096636996, 1.9761617414300001, 1.9782324642600002, 1.99263889104, 2.00500029816, 2.01506871685, 2.02912785846, 2.04221860157, 2.06368362263, 2.07491317421, 2.08832055797, 2.09538342956, 2.1084886843899997, 2.1158979036700005, 2.1260576895499996, 2.13639327622, 2.14181249535, 2.1392352295499997, 2.14448495648, 2.1421138235, 2.14009620617, 2.1384934521399996, 2.1319765571600002, 2.1216323962400003, 2.1065051490999998, 2.08999485498, 2.06996758792, 2.05396301646, 2.0366352808700006, 2.023489069, 1.9927697308899996, 1.9807445347400001, 1.97629449536, 1.9772154719699997, 1.9837454333899998, 1.9903514690000002, 1.9990068602399997, 2.0052703762999995, 2.0102515290099996, 2.01071088451, 2.00780344289, 2.00202451671, 1.99526703575, 1.9894158244, 1.9859053554, 1.9872483633099995, 1.99006639085, 2.00697930222, 2.0329301048299997, 2.05059264513, 2.0540770985099996, 2.04176762498, 2.0093012359700007, 1.9757453156100002, 1.94977980597, 1.94015615295, 1.93165724611, 1.9207719523600002, 1.90945249843, 1.89062300491, 1.87690150004, 1.8621346825699998, 1.84607821661, 1.828253313, 1.8169694254700002, 1.8075289169999997, 1.8040289362800004, 1.79267489253, 1.78023102445, 1.7778953016200003, 
1.7787011610500003, 1.78226670819, 1.7830425676100004, 1.77486727406, 1.7675372149399997, 1.7575688744100002, 1.7498299871300003, 1.74518012353, 1.73248096246, 1.7160241253800002, 1.70317674164, 1.6978293584500002, 1.6946921121299998, 1.6961595927200002, 1.70211670251, 1.7104493398199998, 1.7203816647499999, 1.7274331496, 1.7311123100199999, 1.73665119714, 1.74750018228, 1.7625600270900001, 1.76829838689, 1.7683754962599998, 1.7604641870999997, 1.7378729159800002, 1.7182883638100002, 1.7072806677199999, 1.7037852573199999, 1.6963237919299996, 1.67904111493, 1.64849412058, 1.61509034869, 1.58860298353, 1.56708077499, 1.5563275906199998, 1.5508352464699997, 1.5448227655799998, 1.53880546048, 1.54041544105, 1.5403843473000003, 1.53577729621, 1.5273169831, 1.51722079097, 1.5010415320300001, 1.4873523904299997, 1.47098713536, 1.45343877476, 1.4333900233299999, 1.4214382256099998, 1.4199358231499999, 1.42357822576, 1.42446916333, 1.4169634987200002, 1.40651060735, 1.39602957147, 1.38608337936, 1.38502109414, 1.38722933647, 1.3877573052599999, 1.38915685615, 1.3879546490299999, 1.38030042971, 1.37484574183, 1.36882917891, 1.36771619056, 1.36598312403, 1.35475238104, 1.3352715984299999, 1.31243304213, 1.29205091175, 1.26981483599, 1.25096920963, 1.23261465755, 1.2107178005399999, 1.1896016271599998, 1.1758782668, 1.17342422369, 1.17358562993, 1.17110207509, 1.1674486178099999, 1.1603703751, 1.1565048865399998, 1.15140617524, 1.15148740571, 1.15832875386, 1.16650391071, 1.1712949266600001, 1.16865191865, 1.16596408644, 1.1661593208199998, 1.16419447693, 1.15754447647, 1.15312982771, 1.1506705697300001, 1.14375644814, 1.13705099847, 1.12589113437, 1.11212277402, 1.10001296849, 1.08946394429, 1.0747068729400002, 1.05980790705, 1.0438431988799999, 1.02497712333, 1.00659505173, 0.98919173016, 0.9715707328300001, 0.95416868081, 0.9416231916500001, 0.92753217501, 0.91364512326, 0.90414607963, 0.8947884227199999, 0.8843405703999998, 0.8769049253500001, 0.8719632452999999, 
0.86833484662, 0.8680955887799999, 0.86604049098, 0.86558996362, 0.86372701427, 0.85893691627, 0.85435131048, 0.84886228665, 0.8409088095199999, 0.82732292967, 0.8182398235399999, 0.81298593645, 0.8065804672500001, 0.7963832009099999, 0.7813524576499999, 0.7642633939500001, 0.74891606863, 0.73387495429, 0.72021307831, 0.70711249145, 0.6972523931, 0.68836254874, 0.6789805168, 0.66917573095, 0.65520369872, 0.6405349086200001, 0.6262600443299999, 0.6128265668199999, 0.6004827768800001, 0.58821246352, 0.5763513298499999, 0.56580466895, 0.55820613325, 0.5498382224900001, 0.5432313079700001, 0.5383656045, 0.53169802591];
Here are some additional values for the pow dataset (on Pastebin, to stay within the post length limit):
https://pastebin.com/5GP8sj4N
The resulting fit I get from the trial dataset (x, pow1) is shown here in orange, together with the original pow1 data in blue.
As mentioned, the phase of the fit does not line up with the minima and maxima of the data. Unfortunately, the application that uses this fit function has very little room for error.
Please help out if you have an idea of how to make this fit the data better!
Edit:
I tried what @Joe mentioned in the comments: first filtering the data. I used a Savitzky-Golay filter and received the following result: original data (blue), filtered data (green), and the fit to the filtered data (orange). The same shift in the minima and maxima is still present in the fit to the filtered data.
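Away from the edges, scipy.signal.savgol_filter (with the default deriv=0) uses a symmetric window centred on each sample, so it smooths without delaying the signal; the filter itself should therefore not be what shifts the minima and maxima. A minimal sketch of the filter-then-fit workflow on synthetic data. The synthetic parameters, window_length, polyorder and p0 are assumptions chosen for illustration, not values from the question.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import savgol_filter


def sinfunc(t, A, w, p, c):
    # Same model as in the question.
    return A * np.sin(w * t + p) + c


# Synthetic stand-in for the question's data (assumed values, not pow1).
rng = np.random.default_rng(0)
x = np.linspace(3.5, 9.5, 961)
clean = sinfunc(x, 0.95, 1.26, 4.05, 0.97)
noisy = clean + rng.normal(scale=0.05, size=x.size)

# window_length (odd, in samples) and polyorder are tuning choices.
smoothed = savgol_filter(noisy, window_length=51, polyorder=3)

# Initial guess assumed to come from an earlier rough fit.
p0 = [1.0, 1.25, 4.0, 1.0]
popt, pcov = curve_fit(sinfunc, x, smoothed, p0=p0)
fitted = sinfunc(x, *popt)
```

On data that really is a sine plus noise, this recovers the underlying curve closely; a remaining systematic offset between fit and data points to the data's shape rather than to the filter.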
Here are my results with more aggressive clipping bounds of 0.5 to 1.75 for each data set.
for pow1:
A = 9.6711505138648990E-01
c = 9.7613787086912507E-01
p = 4.0262076448344617E+00
w = 1.2654001570670070E+00
for pow2:
A = 9.4894637490866129E-01
c = 9.6733405789489280E-01
p = 4.0892433833755097E+00
w = 1.2578627414445132E+00
for pow3:
A = 9.8595630272060597E-01
c = 9.6749868212694512E-01
p = 4.0859456191316230E+00
w = 1.2598547148182329E+00
for pow4:
A = -9.4636707498392481E-01
c = 9.5047597808408602E-01
p = -4.2643913461857056E+02
w = 1.2761107231684055E+00
I think I have this figured out: your data is not a mathematically perfect sine wave plus noise, so the fitting software can only come close to modeling this data with a sine function. If you need more accuracy, try splitting the data into segments and using a piecewise fit. Here is a close-up of the problem area:
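A minimal sketch of the piecewise idea, assuming the breakpoints are chosen by hand (for example around the problem areas) and that one common initial guess, such as the parameters of a global fit, is good enough for every segment. piecewise_sine_fit is a hypothetical helper, not a SciPy function, and the synthetic data is made up. Note that the separate fits are not forced to join up at the breakpoints.

```python
import numpy as np
from scipy.optimize import curve_fit


def sinfunc(t, A, w, p, c):
    return A * np.sin(w * t + p) + c


def piecewise_sine_fit(x, y, breakpoints, p0):
    """Fit sinfunc independently on each segment between consecutive breakpoints."""
    edges = np.concatenate(([-np.inf], np.asarray(breakpoints, dtype=float), [np.inf]))
    fitted = np.empty_like(y, dtype=float)
    params = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x < hi)
        popt, _ = curve_fit(sinfunc, x[mask], y[mask], p0=p0)
        fitted[mask] = sinfunc(x[mask], *popt)
        params.append(popt)
    return fitted, params


# Synthetic data whose amplitude and offset change halfway (illustration only).
rng = np.random.default_rng(1)
x = np.linspace(0, 6, 601)
y = np.where(x < 3, sinfunc(x, 1.0, 1.3, 0.5, 1.0), sinfunc(x, 0.8, 1.3, 0.5, 1.1))
y = y + rng.normal(scale=0.01, size=x.size)

p0 = [1.0, 1.3, 0.5, 1.0]
global_popt, _ = curve_fit(sinfunc, x, y, p0=p0)
global_fit = sinfunc(x, *global_popt)
piecewise_fit, params = piecewise_sine_fit(x, y, [3.0], p0)
```

Each segment gets its own amplitude, frequency, phase and offset, so it can follow local deviations from a single sine that a global fit has to average out.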

How to get value of area under multiple peaks

I have some data from a bioanalyzer which gives me time (x-axis) and absorbance values (y-axis). The time is sampled every 0.05 seconds from 32 s to 138 s, so you can imagine how many data points I have. I've created a graph using both plotly and matplotlib, just so that I have more libraries to work with, so a solution in either library is fine! What I'm trying to do is make my script find the area under each peak and return those values.
import numpy as np
import peakutils
def create_plot(sheet_name):
    # `book` is a workbook opened elsewhere in the script (e.g. with xlrd.open_workbook)
    sample = book.sheet_by_name(sheet_name)
    data = [[sample.cell_value(r, c) for r in range(sample.nrows)] for c in range(sample.ncols)]
    y = data[2][18:len(data[2]) - 2]
    x = np.arange(32, 138.05, 0.05)
    indices = peakutils.indexes(y, thres=0.35, min_dist=0.1)
    peaks = [y[i] for i in indices]
This snippet gets my Y values, X values and indices of the peaks. Now is there a way to get the area under each curve? Let's say that there are 15 indices.
Here's what the graph looks like:
An automated answer
Given a set of x and y values as well as a set of peaks (the x-coordinates of the peaks), here's how you can automatically find the area under each of the peaks. I'm assuming that x, y, and peaks are all Numpy arrays:
import numpy as np
# find the minima between each peak
ixpeak = x.searchsorted(peaks)
ixmin = np.array([np.argmin(i) for i in np.split(y, ixpeak)])
ixmin[1:] += ixpeak
mins = x[ixmin]
# split up the x and y values based on those minima
xsplit = np.split(x, ixmin[1:-1])
ysplit = np.split(y, ixmin[1:-1])
# find the areas under each peak
areas = [np.trapz(ys, xs) for xs,ys in zip(xsplit, ysplit)]
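Since the question already has peak indices (from peakutils.indexes) rather than peak x-coordinates, the searchsorted step can be skipped. A minimal sketch of the same split-at-the-minima idea working directly on indices; peak_areas is a hypothetical helper, and the trapezoidal rule is written out so it does not depend on np.trapz, which newer NumPy versions deprecate in favour of np.trapezoid. The two-Gaussian test data is made up.

```python
import numpy as np


def peak_areas(x, y, peak_idx):
    """Area under each peak, splitting the trace at the minimum between neighbouring peaks.

    peak_idx: sorted integer indices of the peaks (e.g. from peakutils.indexes).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    peak_idx = np.asarray(peak_idx, dtype=int)
    # Index of the lowest point between each pair of neighbouring peaks.
    cuts = [lo + np.argmin(y[lo:hi]) for lo, hi in zip(peak_idx[:-1], peak_idx[1:])]
    areas = []
    for xs, ys in zip(np.split(x, cuts), np.split(y, cuts)):
        # Trapezoidal rule: sum of (y[i] + y[i+1]) / 2 * (x[i+1] - x[i]).
        areas.append(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0)
    return np.array(areas)


# Two unit-area Gaussian peaks at x = 5 and x = 15 (illustration only).
x = np.linspace(0, 20, 4001)


def unit_gaussian(mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


y = unit_gaussian(5.0, 0.5) + unit_gaussian(15.0, 0.5)
idx = np.searchsorted(x, [5.0, 15.0])
areas = peak_areas(x, y, idx)
```

With the question's variables this would be called as peak_areas(x, y, indices), provided x and y have the same length.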
Output:
The example data has been set up so that the area under each peak is (more-or-less) guaranteed to be 1.0, so the results in the bottom plot are correct. The green X marks are the locations of the minimum between each two peaks. The part of the curve "belonging" to each peak is determined as the part of the curve in-between the minima adjacent to each peak.
Complete code
Here's the complete code I used to generate the example data:
import numpy as np
import scipy as sp
import scipy.stats
prec = 1e5
n = 10
N = 150
r = np.arange(0, N+1, N//n)
# generate some reasonable fake data
peaks = np.array([np.random.uniform(s, e) for s,e in zip(r[:-1], r[1:])])
x = np.linspace(0, N + n, num=int(prec))
y = np.max([sp.stats.norm.pdf(x, loc=p, scale=.4) for p in peaks], axis=0)
and the code I used to make the plots:
import matplotlib.pyplot as plt
# plotting stuff
plt.figure(figsize=(5,7))
plt.subplots_adjust(hspace=.33)
plt.subplot(211)
plt.plot(x, y, label='trace 0')
plt.plot(peaks, y[ixpeak], '+', c='red', ms=10, label='peaks')
plt.plot(mins, y[ixmin], 'x', c='green', ms=10, label='mins')
plt.xlabel('indep')
plt.ylabel('dep')
plt.title('Example data')
plt.ylim(-.1, 1.6)
plt.legend()
plt.subplot(212)
plt.bar(np.arange(len(areas)), areas)
plt.xlabel('Peak number')
plt.ylabel('Area under peak')
plt.title('Area under the peaks of trace 0')
plt.show()

Fit the gamma distribution only to a subset of the samples

I have the histogram of my input data (in black) given in the following graph:
I'm trying to fit the Gamma distribution not to the whole data, but just to the first curve of the histogram (the first mode). The green plot in the previous graph corresponds to fitting the Gamma distribution to all the samples, using the following Python code, which uses scipy.stats.gamma:
import matplotlib.pyplot as plt
import numpy as np
from geotiff.io import IO
from scipy.stats import gamma
img = IO.read(input_file)
data = img.flatten() + abs(np.min(img)) + 1
# calculate dB positive image
img_db = 10 * np.log10(img)
img_db_pos = img_db + abs(np.min(img_db))
data = img_db_pos.flatten() + 1
# data histogram (density=True replaces normed=True, which newer Matplotlib removed)
n, bins, patches = plt.hist(data, 1000, density=True)
# slice histogram here
# estimation of the parameters of the gamma distribution
fit_alpha, fit_loc, fit_beta = gamma.fit(data, floc=0)
x = np.linspace(0, 100)
y = gamma.pdf(x, fit_alpha, fit_loc, fit_beta)
print('(alpha, beta): (%f, %f)' % (fit_alpha, fit_beta))
# plot estimated model
plt.plot(x, y, linewidth=2, color='g')
plt.show()
How can I restrict the fitting only to the interesting subset of this data?
Update1 (slicing):
I sliced the input data by keeping only the values below the location of the histogram's maximum, but the results were not really convincing:
This was achieved by inserting the following code below the # slice histogram here comment in the previous code:
max_data = bins[np.argmax(n)]
data = data[data < max_data]
Update2 (scipy.optimize.minimize):
The code below shows how scipy.optimize.minimize() is used to minimize an energy function to find (alpha, beta):
import matplotlib.pyplot as plt
import numpy as np
from geotiff.io import IO
from scipy.stats import gamma
from scipy.optimize import minimize
def truncated_gamma(x, max_data, alpha, beta):
    gammapdf = gamma.pdf(x, alpha, loc=0, scale=beta)
    norm = gamma.cdf(max_data, alpha, loc=0, scale=beta)
    return np.where(x < max_data, gammapdf / norm, 0)
# read image
img = IO.read(input_file)
# calculate dB positive image
img_db = 10 * np.log10(img)
img_db_pos = img_db + abs(np.min(img_db))
data = img_db_pos.flatten() + 1
# data histogram
n, bins = np.histogram(data, 100, density=True)
# using minimize on a slice data below max of histogram
max_data = bins[np.argmax(n)]
data = data[data < max_data]
data = np.random.choice(data, 1000)
energy = lambda p: -np.sum(np.log(truncated_gamma(data, max_data, *p)))
initial_guess = [np.mean(data), 2.]
o = minimize(energy, initial_guess, method='SLSQP')
fit_alpha, fit_beta = o.x
# plot data histogram and model
x = np.linspace(0, 100)
y = gamma.pdf(x, fit_alpha, 0, fit_beta)
plt.hist(data, 30, density=True)
plt.plot(x, y, linewidth=2, color='g')
plt.show()
The algorithm above converged for a subset of data, and the output in o was:
x: array([ 16.66912781, 6.88105559])
But as can be seen on the screenshot below, the gamma plot doesn't fit the histogram:
You can use a general optimization tool such as scipy.optimize.minimize to fit a truncated version of the desired function, resulting in a nice fit:
First, the modified function:
def truncated_gamma(x, alpha, beta):
    gammapdf = gamma.pdf(x, alpha, loc=0, scale=beta)
    norm = gamma.cdf(max_data, alpha, loc=0, scale=beta)
    return np.where(x<max_data, gammapdf/norm, 0)
This selects values from the gamma distribution where x < max_data, and zero elsewhere. The np.where part is not actually important here, because the data is exclusively to the left of max_data anyway. The key is normalization, because varying alpha and beta will change the area to the left of the truncation point in the original gamma.
The rest is just optimization technicalities.
It's common practise to work with logarithms, so I used what's sometimes called "energy", or the logarithm of the inverse of the probability density.
energy = lambda p: -np.sum(np.log(truncated_gamma(data, *p)))
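Written out, with m = max_data, f the gamma pdf with shape α and scale β, and F its cdf, the energy above is the negative log-likelihood of the truncated model; the n log F term is what distinguishes it from an ordinary gamma fit:

```latex
f_T(x;\alpha,\beta) = \frac{f(x;\alpha,\beta)}{F(m;\alpha,\beta)}, \qquad 0 \le x < m

E(\alpha,\beta) = -\sum_{i=1}^{n} \log f_T(x_i;\alpha,\beta)
                = -\sum_{i=1}^{n} \log f(x_i;\alpha,\beta) + n \log F(m;\alpha,\beta)
```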
Minimize:
initial_guess = [np.mean(data), 2.]
o = minimize(energy, initial_guess, method='SLSQP')
fit_alpha, fit_beta = o.x
My output is (alpha, beta): (11.595208, 824.712481). Like the original, it is a maximum likelihood estimate.
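As a sanity check of the approach, the sketch below draws synthetic gamma data with known parameters, truncates it, and fits it with the same truncated_gamma/energy construction. The true values (alpha = 3, beta = 4), the cut at 20 and the random seed are hypothetical stand-ins for the question's image data. It also swaps SLSQP for Nelder-Mead (a derivative-free method) and returns inf for nonpositive parameters, so the optimizer never evaluates an invalid gamma.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

# Hypothetical ground truth, for illustration only.
alpha_true, beta_true, max_data = 3.0, 4.0, 20.0
rng = np.random.default_rng(0)
sample = rng.gamma(alpha_true, beta_true, size=20000)
data = sample[sample < max_data]  # keep only the part left of the cut

def truncated_gamma(x, alpha, beta):
    gammapdf = gamma.pdf(x, alpha, loc=0, scale=beta)
    norm = gamma.cdf(max_data, alpha, loc=0, scale=beta)
    return np.where(x < max_data, gammapdf / norm, 0)

def energy(p):
    # Negative log-likelihood; reject nonpositive shape/scale outright.
    if np.any(np.asarray(p) <= 0):
        return np.inf
    return -np.sum(np.log(truncated_gamma(data, *p)))

initial_guess = [np.mean(data), 2.0]
o = minimize(energy, initial_guess, method='Nelder-Mead',
             options={'maxiter': 2000, 'maxfev': 2000})
fit_alpha, fit_beta = o.x
print(fit_alpha, fit_beta)
```

If the estimates land close to the true values, the likelihood and normalization are set up correctly, and any remaining misfit on real data is about the model rather than the code.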
If you're not happy with the convergence rate, you may want to:
- select a sample from your rather big dataset:
  data = np.random.choice(data, 10000)
- try different algorithms using the method keyword argument.
Some optimization routines output a representation of the inverse Hessian, which is useful for uncertainty estimation. Enforcing nonnegativity of the parameters may also be a good idea.
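To act on the inverse-Hessian and nonnegativity suggestions, one option (a sketch, not the only way) is to optimize log(alpha) and log(beta), which keeps both parameters positive without bounds, and to use BFGS, whose result carries an approximate inverse Hessian in hess_inv. The synthetic data, seed and cut are hypothetical. The energy here is the mean rather than the sum (same minimizer, better-scaled finite-difference gradients). Because BFGS builds hess_inv from gradient updates, treat the resulting standard errors as rough.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

# Hypothetical synthetic data: gamma(shape=3, scale=4) truncated at 20.
alpha_true, beta_true, max_data = 3.0, 4.0, 20.0
rng = np.random.default_rng(1)
sample = rng.gamma(alpha_true, beta_true, size=20000)
data = sample[sample < max_data]
n = data.size

def mean_energy(logp):
    # Optimizing log-parameters keeps alpha and beta strictly positive.
    alpha, beta = np.exp(logp)
    loglik = (gamma.logpdf(data, alpha, scale=beta)
              - gamma.logcdf(max_data, alpha, scale=beta))
    return -np.mean(loglik)

res = minimize(mean_energy, np.log([np.mean(data), 2.0]), method='BFGS')
alpha_hat, beta_hat = np.exp(res.x)

# hess_inv approximates the inverse Hessian of the *mean* energy in log space;
# dividing by n gives an approximate covariance of (log alpha, log beta).
cov_log = res.hess_inv / n
se_log = np.sqrt(np.diag(cov_log))
# Delta method: se(alpha) ~= alpha * se(log alpha), likewise for beta.
se_alpha, se_beta = alpha_hat * se_log[0], beta_hat * se_log[1]
print(f"alpha = {alpha_hat:.3f} +/- {se_alpha:.3f}, beta = {beta_hat:.3f} +/- {se_beta:.3f}")
```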
A log-scaled plot without truncation shows the entire distribution:
Here's another possible approach, using a dataset I created manually in Excel to more or less match the plot given.
Raw Data
Outline
1. Import the data into a pandas DataFrame.
2. Mask the indices after the max response index.
3. Create a mirror image of the remaining data.
4. Append the mirror image, leaving a buffer of empty space (zeros) in between.
5. Fit the desired distribution to the modified data. Below I do a normal fit by the method of moments and adjust the amplitude and width.
Working Script
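import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import data to dataframe.
df = pd.read_csv('sample.csv', header=0, index_col=0)
# Mask indices after the index label at max Y (idxmax returns a label, argmax a position).
mask = df.index.values <= df.Y.idxmax()
df = df.loc[mask, :]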
scaled_y = 100*df.Y.values
# Create new df with mirror image of Y appended.
sep = 6
app_zeroes = np.append(scaled_y, np.zeros(sep, dtype=float))
mir_y = np.flipud(scaled_y)
new_y = np.append(app_zeroes, mir_y)
# Using Scipy-cookbook to fit a normal by method of moments.
idxs = np.arange(new_y.size)  # idxs = [0, 1, 2, ..., len(new_y)-1]
mid_idxs = idxs.mean()  # (len(new_y)-1)/2, the center of the mirrored data
# idxs-mid_idxs is symmetric about 0, e.g. [-53.5, -52.5, ..., 52.5, 53.5] for 108 points
scaling_param = np.sqrt(np.abs(np.sum((idxs-mid_idxs)**2*new_y)/np.sum(new_y)))
# adjust amplitude
fmax = new_y.max()*1.2 # adjusted function max to 120% max y.
# adjust width
scaling_param = scaling_param*.7  # shrink the width to 70%.
# Fit normal.
fit = lambda t: fmax*np.exp(-(t-mid_idxs)**2/(2*scaling_param**2))
# Plot results.
plt.plot(new_y, '.')
plt.plot(fit(idxs), '--')
plt.show()
Result
See the scipy-cookbook fitting data page for more on fitting a normal using the method of moments.
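Since sample.csv isn't included, here is a sketch of the same mask, mirror and method-of-moments steps on hypothetical synthetic data: a Gaussian response peaking at index 50 with width 10, followed by a slower decay that the mask discards. It uses plain NumPy arrays instead of a DataFrame (so np.argmax plays the role of the index lookup) and leaves out the hand-tuned 1.2 and 0.7 factors. Note that the gap of sep zeros between the two halves pushes the moment-based width above the true width of 10.

```python
import numpy as np

# Hypothetical response: Gaussian rise to a peak at index 50 (width 10),
# then a slower exponential decay that the mask below throws away.
idx = np.arange(80)
y = np.where(idx <= 50,
             np.exp(-(idx - 50) ** 2 / (2 * 10.0 ** 2)),
             np.exp(-(idx - 50) / 5.0))

# Keep everything up to and including the peak.
scaled_y = 100 * y[idx <= np.argmax(y)]

# Append sep zeros, then the mirror image of the rising half.
sep = 6
new_y = np.concatenate([scaled_y, np.zeros(sep), scaled_y[::-1]])

# Method of moments on the now-symmetric curve.
idxs = np.arange(new_y.size)
mid_idxs = idxs.mean()
scaling_param = np.sqrt(np.sum((idxs - mid_idxs) ** 2 * new_y) / np.sum(new_y))
fmax = new_y.max()

def fit(t):
    return fmax * np.exp(-(t - mid_idxs) ** 2 / (2 * scaling_param ** 2))

print(new_y.size, mid_idxs, scaling_param)
```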

Finding first derivative using DFT in Python

I want to find the first derivative of exp(sin(x)) on the interval [0, 2*pi] using a discrete Fourier transform. The basic idea is to first evaluate the DFT of exp(sin(x)) on the given interval, giving, say, v_k, and then compute the inverse DFT of i*k*v_k, which gives the desired derivative. In practice, because of how Fourier transforms are implemented in programming languages, you might need to reorder the output somewhere and/or multiply by different factors here and there.
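In NumPy's conventions (np.fft.fft has no prefactor, np.fft.ifft divides by N), with N equally spaced samples x_j = x_min + jL/N over one period of length L = x_max - x_min, the idea reads as follows, with the Nyquist coefficient (k = N/2 for even N) set to zero:

```latex
\hat v_k = \sum_{j=0}^{N-1} v_j \, e^{-2\pi i jk/N}, \qquad
v_j = \frac{1}{N} \sum_{k} \hat v_k \, e^{2\pi i jk/N}

v'(x_j) \approx \frac{1}{N} \sum_{k} \frac{2\pi i k}{L} \, \hat v_k \, e^{2\pi i jk/N},
\qquad k = 0, 1, \dots, \tfrac{N}{2}-1, \; -\tfrac{N}{2}+1, \dots, -1
```

So for L = 2π the factor is simply ik, and the wavenumbers must follow the FFT's output order (non-negative first, then negative), not 0, 1, ..., N-1.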
I first did it in Mathematica, which has an option FourierParameters that lets you specify a convention for the transform. First, I computed the Fourier series of a Gaussian to see which normalisation factors I have to multiply by, and then went on to find the derivative. Unfortunately, when I translated my Mathematica code into Python (where, again, I first computed the Fourier series of a Gaussian, which worked), I didn't get the same results. Here is my code:
import numpy as np
import matplotlib.pyplot as plt

N=1000
xmin=0
xmax=2.0*np.pi
step = (xmax-xmin)/(N)
xdata = np.linspace(xmin, xmax-step, N)
v = np.exp(np.sin(xdata))
derv = np.cos(xdata)*v
vhat = np.fft.fft(v)
kvals1 = np.arange(0, N/2.0, 1)
kvals2 = np.arange(-N/2.0, 0, 1)
what1 = np.zeros(kvals1.size+1)
what2 = np.empty(kvals2.size)
it = np.nditer(kvals1, flags=['f_index'])
while not it.finished:
    np.put(what1, it.index, 1j*(2.0*np.pi)/((xmax-xmin))*it[0]*vhat[[int(it[0])]])
    it.iternext()
it = np.nditer(kvals2, flags=['f_index'])
while not it.finished:
    np.put(what2, it.index, 1j*(2.0*np.pi)/((xmax-xmin))*it[0]*vhat[[int(it[0])]])
    it.iternext()
xdatafull = np.concatenate((xdata, [2.0*np.pi]))
what = np.concatenate((what1, what2))
w = np.real(np.fft.ifft(what))
fig = plt.figure()
ax = plt.gca()
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.spines['bottom'].set_position(('data',0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data',0))
plt.plot(xdata, derv, color='blue')
plt.plot(xdatafull, w, color='red')
plt.show()
I can post the Mathematica code if people want me to.
It turns out the problem is that np.zeros gives you an array of real zeros, not complex ones, so the assignments that follow discard the imaginary part of every value written into it, and the derivative computed from that array is wrong.
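A minimal illustration of that pitfall (the values are arbitrary): assigning complex numbers into a float array keeps only their real parts, whereas a complex array keeps them intact. NumPy typically emits a ComplexWarning for this cast; the sketch silences warnings only to keep the output clean.

```python
import warnings
import numpy as np

vals = np.array([1 + 2j, 3j, 4 + 0j])

real_buf = np.zeros(3)               # dtype float64
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the complex-to-real cast warning
    real_buf[:] = vals               # imaginary parts are dropped

complex_buf = 1j * np.zeros(3)       # dtype complex128, like np.zeros(3, dtype=complex)
complex_buf[:] = vals                # values stored unchanged

print(real_buf.dtype, real_buf)
print(complex_buf.dtype, complex_buf)
```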
So the solution is quite simply:
import numpy as np
N=100
xmin=0
xmax=2.0*np.pi
step = (xmax-xmin)/(N)
xdata = np.linspace(step, xmax, N)
v = np.exp(np.sin(xdata))
derv = np.cos(xdata)*v
vhat = np.fft.fft(v)
what = 1j*np.zeros(N)
what[0:N//2] = 1j*np.arange(0, N//2, 1)
what[N//2+1:] = 1j*np.arange(-N//2 + 1, 0, 1)
what = what*vhat
w = np.real(np.fft.ifft(what))
# Then plotting
Here np.zeros has been replaced by 1j*np.zeros, which produces a complex array (equivalent to np.zeros(N, dtype=complex)).
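An equivalent way to build the wavenumber array, sketched here as an alternative rather than the answer's exact code, is np.fft.fftfreq, which already returns frequencies in FFT order; scaling by 2π/L makes it work for any period L. The sketch checks the result against the exact derivative cos(x)*exp(sin(x)) on the question's [0, 2π) grid.

```python
import numpy as np

N = 100
xmin, xmax = 0.0, 2.0 * np.pi
L = xmax - xmin
x = xmin + np.arange(N) * L / N        # periodic grid, right endpoint excluded
v = np.exp(np.sin(x))
derv = np.cos(x) * v                   # exact derivative for comparison

vhat = np.fft.fft(v)
# fftfreq gives cycles per unit length in FFT order (0, positives, then negatives);
# multiplying by 2*pi turns them into angular wavenumbers.
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
k[N // 2] = 0.0                        # drop the Nyquist mode (N even)
w = np.real(np.fft.ifft(1j * k * vhat))

print(np.max(np.abs(w - derv)))
```

For L = 2π this k array matches the answer's hand-built one: 0 to N/2-1, a zero at the Nyquist slot, then -N/2+1 to -1.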
