I want to interpolate some data, and to plot the result on a log scale (pyplot.loglog). The problem is that the resulting interpolation looks very strange and shows discontinuities when plotted on a log scale. What is the best way to interpolate log scaled data?
import numpy
import scipy.interpolate
from matplotlib import pyplot

pyplot.loglog(x, y, '+')  # x, y: the data to interpolate (defined elsewhere)
s = scipy.interpolate.InterpolatedUnivariateSpline(x, y)
xs = numpy.logspace(numpy.log10(numpy.min(x)), numpy.log10(numpy.max(x)))
pyplot.loglog(xs, s(xs))  # This looks very strange because of the log scale!
Actually, I succeeded by interpolating the log of the data, but I was wondering whether there is a simpler way of achieving the same result?
pyplot.loglog(x, y, '+')
s = scipy.interpolate.InterpolatedUnivariateSpline(numpy.log10(x), numpy.log10(y))
xs = numpy.logspace(numpy.log10(numpy.min(x)), numpy.log10(numpy.max(x)))
pyplot.loglog(xs, numpy.power(10, s(numpy.log10(xs))))
It looks like taking the logarithm of the data first and then fitting is the usual way to do this. See Fitting a Power Law Distribution.
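For reference, the log10/power bookkeeping can be wrapped in a small helper so it lives in one place. A minimal sketch (the name loglog_interp is mine; it assumes all x and y values are strictly positive):

import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

def loglog_interp(x, y):
    # Fit a spline to (log10(x), log10(y)) and undo the log on evaluation.
    s = InterpolatedUnivariateSpline(np.log10(x), np.log10(y))
    return lambda xs: np.power(10.0, s(np.log10(xs)))

# Usage: f = loglog_interp(x, y); pyplot.loglog(xs, f(xs))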
Related
There is a for-loop in my code, and on every iteration it generates a new tpr (as X) and fpr (as Y), like this:
0.05263157894736842 0.1896551724137931
0.06578947368421052 0.19540229885057472
0.07894736842105263 0.22988505747126436
0.07894736842105263 0.25862068965517243
0.07894736842105263 0.28735632183908044
I want to collect all these points and get a full plot, but it doesn't work. My code is attached below:
for i in range(-30, 20):
    predicted = (np.sign(t + i*1e-4) + 1) / 2.
    vals, cm = re.get_CM_vals(y_test, predicted)
    tpr = re.TPR_CM(cm)
    fpr = re.FPR_CM(cm)
    #print(tpr, fpr)
    plt.plot(fpr, tpr, 'b.-', linewidth=1)
plt.show()
Besides, I want to draw right-angle (step-like) lines between the points, as in the picture. Is there a function for that in matplotlib?
Using your current code, I suggest adding the x values to one array and the y values to another. You could also use something like ArrayName = [[],[]], then append the x and y values to ArrayName[0] and ArrayName[1], respectively. Not only would this actually work, it would also be slightly faster: plt.plot and plt.scatter are faster when plotting all the points at once rather than one call per loop iteration.
If you don't want the points connected with lines, I still suggest using an array, since that would be faster. (It wouldn't be that much faster in this case, but it's a good habit to have.) A sketch of both ideas is below.
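Here is a sketch of what that looks like with the loop from the question (t, y_test, and the re helper module are from the asker's code, which I can't run here); matplotlib's drawstyle='steps-post' covers the right-angle-line part:

import numpy as np
import matplotlib.pyplot as plt

fprs, tprs = [], []  # collect all the points first
for i in range(-30, 20):
    predicted = (np.sign(t + i*1e-4) + 1) / 2.
    vals, cm = re.get_CM_vals(y_test, predicted)
    tprs.append(re.TPR_CM(cm))
    fprs.append(re.FPR_CM(cm))

# One plot call for all points; drawstyle='steps-post' connects them
# with right-angle ("staircase") segments instead of straight lines.
plt.plot(fprs, tprs, 'b.-', linewidth=1, drawstyle='steps-post')
plt.show()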
I am plotting both a distribution of test scores and a fitted curve to these test scores:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

h = sorted(data['Baseline'])  # sorted test scores
fit = stats.norm.pdf(h, np.mean(h), np.std(h))
plt.plot(h, fit, '-o')
plt.hist(h, density=True)  # use this to draw a histogram of your data
plt.show()
The plot of the pdf, however, does not look normal (see kink in curve near x=60). See output:
I'm not sure what is going on here... any help is appreciated. Is this because the normal curve is being drawn only between the supplied observations? I can provide the actual data if needed; there are only 60 observations.
Yes, you evaluate the normal pdf at the observations themselves. You would instead want to evaluate it on a dense, evenly spaced grid, like:
h = sorted(data['Baseline'])  # sorted (a list, so use min/max rather than .min/.max)
x = np.linspace(min(h), max(h), 151)  # dense, evenly spaced grid
fit = stats.norm.pdf(x, np.mean(h), np.std(h))
plt.plot(x, fit, '-')
plt.hist(h, density=True)
plt.show()
Note, however, that the data does not look normally distributed at all, so you may rather want to fit a different distribution or perform a kernel density estimate, as sketched below.
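A kernel density estimate takes only a couple of lines with scipy. A minimal sketch, assuming the same data variable as above:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

h = np.asarray(sorted(data['Baseline']))
x = np.linspace(h.min(), h.max(), 151)

# gaussian_kde smooths the empirical distribution without assuming normality.
kde = stats.gaussian_kde(h)

plt.hist(h, density=True, alpha=0.5)
plt.plot(x, kde(x), '-', label='KDE')
plt.plot(x, stats.norm.pdf(x, h.mean(), h.std()), '--', label='normal fit')
plt.legend()
plt.show()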
Let us assume I have data x with an error Sx which I want to plot with the errorbar method. Now I am wondering what happens if I rescale to a logarithmic scale: does it handle the error correctly? The error propagation should go
f(x) = log(x)
=> Sf = |Sx / x|
I could imagine that matplotlib just does
Sf = log Sx
which would be totally wrong. So, what is matplotlib actually doing?
Indeed, it puzzles me as well. Imagine that I have a file containing a set of data:

xi, yi(xi), sigma(yi) ; i = 1, 2, ..., N

where sigma(yi) is the one-standard-deviation error of yi(xi). Now, suppose I plot this data using matplotlib, with both the x scale and the y scale linear. Certainly, the error-bar caps on the y axis will sit at yi(xi) - sigma(yi) and at yi(xi) + sigma(yi), each a distance sigma(yi) from the data point.
The question is, if I set
ax.set_yscale("log")
then will I see the caps on the log10(y) axis at log10( yi(xi) - sigma(yi) ) and at log10( yi(xi) + sigma(yi) )?

However, those positions do not represent the propagated error: the error of log10(yi(xi)) is certainly not simply log10( sigma(yi) ); instead, error propagation gives
sigma( log10(yi) )= log10(e) * | sigma(yi)/yi |
So, does anyone know, will error propagation be done while plotting the data with yerrorbars in log y scale?
The way errorbar works is (more or less): at each point where you want an error bar drawn, it puts marks at y + err_p and y - err_n in data coordinates. The log scale is applied as part of the transformation from data space to screen space.
This is rather unambiguously the right thing for a plotting library to do. What you seem to want is to propagate error through some computation, which requires knowing what the computation is (so you can get all the partials) and is not the business mpl is in. Maybe take a look at sympy.
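If you do want the propagated error on log10(y), e.g. to plot log10(y) on a linear axis rather than y on a log axis, you have to compute it yourself before calling errorbar. A minimal sketch with made-up data, using the propagation formula from the question:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(3)
y = np.array([1.0, 10.0, 100.0])    # made-up values
yerr = np.array([0.2, 1.5, 20.0])   # made-up one-sigma errors

fig, (ax1, ax2) = plt.subplots(1, 2)

# What matplotlib does: caps at y - yerr and y + yerr in data
# coordinates; the log scale is applied afterwards.
ax1.errorbar(x, y, yerr=yerr, fmt='o')
ax1.set_yscale('log')

# Propagated error on log10(y): sigma(log10 y) = log10(e) * |sigma(y)/y|.
log_y = np.log10(y)
log_yerr = np.log10(np.e) * np.abs(yerr / y)
ax2.errorbar(x, log_y, yerr=log_yerr, fmt='o')

plt.show()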
I'm trying to get a nice upsampler in Python when I have non-uniformly spaced inputs. Any suggestions would be helpful. I've tried a number of interpolation functions. Here's an example:
from scipy.interpolate import InterpolatedUnivariateSpline
from numpy import linspace, arange, append
from matplotlib.pyplot import plot, show

F = [0, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 22050]
M = [0., 2.85, 2.49, 1.65, 1.55, 1.81, 1.35, 1.00, 1.13, 1.58, 1.21, 0.]

ff = linspace(F[0], F[1], 10)
for i in arange(2, len(F)):
    ff = append(ff, linspace(F[i-1], F[i], 10))

aa = InterpolatedUnivariateSpline(x=F, y=M, k=2)
mm = aa(ff)
plot(F, M, 'r-o')
plot(ff, mm, 'bo')
show()
This is the plot I get:
I need interpolated values that don't go below 0. Note that the blue dots go below zero. The red line represents the original F vs. M data. If I use k=1 (piecewise-linear interpolation) then I get good values, as shown here:
aa = InterpolatedUnivariateSpline(x=F, y=M, k=1)
mm = aa(ff)
plot(F, M, 'r-o')
plot(ff, mm, 'bo')
show()
The problem is that I need a "smooth" interpolation, not the piecewise-linear one. Does anyone know if the bbox argument in InterpolatedUnivariateSpline helps fix this? I can't find any documentation on what bbox does. Is there another, easier way to accomplish this?
Thanks in advance for any help.
Positivity-preserving interpolation is hard (if it wasn't, there wouldn't be a bunch of papers written about it). The splines of low degree (2, 3) usually do pretty well in this regard, but your data has that large gap in it, and it happens to be at the end of data range, making things worse.
One solution is to do interpolation in two steps: first upsample the data by piecewise linear interpolation, then interpolate new data with a smooth spline (I'll use cubic spline below, though quadratic also works).
The gap_size array records how large each gap is relative to the smallest one. In the subsequent loop, uniformly spaced points are placed in the large gaps (those at least twice the size of the smallest one). The result is F_new, a nearly uniform, finer grid that still includes the original points. The corresponding M values for it are generated by a piecewise-linear spline.
Subsequent cubic interpolation produces a smooth curve that stays positive.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline

F = [0, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 22050]
M = [0., 2.85, 2.49, 1.65, 1.55, 1.81, 1.35, 1.00, 1.13, 1.58, 1.21, 0.]

# Relative size of each gap, in units of the smallest gap.
gap_size = np.diff(F) // np.diff(F).min()

F_new = []
for i in range(len(F)-1):
    F_new.extend(np.linspace(F[i], F[i+1], gap_size[i], endpoint=False))
F_new.append(F[-1])

# Step 1: piecewise-linear upsampling onto the refined grid.
pl_spline = InterpolatedUnivariateSpline(F, M, k=1)
M_new = pl_spline(F_new)

# Step 2: smooth cubic spline through the upsampled data.
smooth_spline = InterpolatedUnivariateSpline(F_new, M_new, k=3)

ff = np.linspace(F[0], F[-1], 100)
plt.plot(F, M, 'ro')
plt.plot(ff, smooth_spline(ff), 'b')
plt.show()
Of course, no tricks can hide the fact that we don't know what happens between 5500 and 22050 (Hz, I presume); the nearly linear part there is just a placeholder.
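As an aside, a shape-preserving alternative worth trying on data like this is scipy's PCHIP interpolator: by construction it does not overshoot the data, so it stays non-negative wherever the data does, at the cost of a slightly less smooth (C1 rather than C2) curve. A sketch:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import PchipInterpolator

F = [0, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 22050]
M = [0., 2.85, 2.49, 1.65, 1.55, 1.81, 1.35, 1.00, 1.13, 1.58, 1.21, 0.]

# PCHIP preserves monotonicity between data points, so it cannot dip
# below the smaller of the two values bounding each interval.
pchip = PchipInterpolator(F, M)
ff = np.linspace(F[0], F[-1], 500)

plt.plot(F, M, 'ro')
plt.plot(ff, pchip(ff), 'b')
plt.show()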
I have plotted some experimental data in Python and need to find a cubic fit to it. The reason is that the cubic fit will be used to remove the background (in this case, resistance in a diode), leaving only the features of interest. Here is the code I am currently using to make the cubic fit in the first place, where Vnew and yone are arrays of the experimental data.
from numpy import array
from scipy.optimize import curve_fit
from matplotlib.pyplot import plot, legend

answer1 = input('Cubic Plot attempt?\n ')
if answer1 in ['y', 'Y', 'Yes']:
    def cubic(x, A):
        return A * x**3
    cubic_guess = array([40])
    popt, pcov = curve_fit(cubic, Vnew, yone, cubic_guess)
    plot(Vnew, cubic(Vnew, *popt), 'r-', label='Cubic Fit: curve_fit')
    #ylim(-0.05, 0.05)
    legend(loc='best')
    print('Cubic plotted')
else:
    print('No Cubic Removal done')
I have knowledge of curve smoothing but only in theory. I do not know how to implement it. I would really appreciate any assistance.
Here is the graph generated so far:
To make the fitted curve "wider", you're looking for extrapolation. Although in this case, you could just make Vnew cover a larger interval, in which case you'd put this before your plot command:
Vnew = numpy.linspace(-1,1, 256) # min and max are merely an example, based on your graph
plot(Vnew,cubic(Vnew,*popt),'r-',label='Cubic Fit: curve_fit')
"Blanking out" the feature you see, can be done with numpy's masked arrays but also just by removing those elements you don't want from both your original Vnew (which I'll call xone) and yone:
mask = (xone > 0.1) & (xone < 0.35) # values between these voltages (?) need to be removed
xone = xone[numpy.logical_not(mask)]
yone = yone[numpy.logical_not(mask)]
Then redo the curve fitting:
popt,_ = curve_fit(cubic, xone, yone, cubic_guess)
This will fit only the data that remains (which isn't many points in your dataset, from the looks of it, so beware!).
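Putting the masking and the refit together, here is a sketch of the whole background-removal step (xone and yone are as above; the feature window 0.1 to 0.35 is only an assumed example):

import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import plot, legend, show

def cubic(x, A):
    return A * x**3

# Drop the feature region so it doesn't bias the background fit.
mask = (xone > 0.1) & (xone < 0.35)   # assumed feature window
x_bg = xone[~mask]
y_bg = yone[~mask]

popt, _ = curve_fit(cubic, x_bg, y_bg, p0=[40])

# Evaluate the background on the full range and subtract it.
residual = yone - cubic(xone, *popt)

plot(xone, residual, 'g-', label='data minus cubic background')
legend(loc='best')
show()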