Curve fitting in Python

Hey,
I have a set of values for frequency and power spectrum, and I have to plot power spectrum versus frequency on a log scale. Once that is done, I need to draw the best-fit straight line through the data. I get the line on a linear scale, but when I try to superimpose it on the frequency/power-spectrum plot, the resulting plot does not show any line; instead, the data points of the first plot are merely shifted.
Also, the same line does not show up when plotted on a log scale using the loglog function.
Can somebody tell me what I should do to get the line on a log scale?
So I have a file with three columns: frequency, power spectrum, and power signal. Here is a piece of what I wrote to plot the data and the line:
#initialize all variables to 0
#open the data file
while 1:
    ln = datafile.readline()
    if ln:
        data = ln.split()
        x = float(n)
        y = float(data[0])
        z = float(data[1])
        xval.append(float(n))
        yval.append(y)
        zval.append(z)
        n += 1
        sum_z += z
        sum_y += y
        sum_y_squared += y*y
        sum_yz += y*z
    else:
        break
datafile.close()
# calculate slope and intercept using formulae
for num in xval:
    res = intercept + slope*num
    line.append(res)
#Plot data
pylab.figure(0)
matplotlib.pylab.loglog(yval,zval)
#Plot line
pylab.figure(0)
pylab.plotloglog(line)

Although the plot commands in your example are not correct, I assume it is similar to what you actually do.
The second plot command plots over a different x range:
loglog(yval,zval) # plot yval vs zval
loglog(line) # plots range(0,len(line)) vs line
Also, have you looked at the values of line? Do they make sense, and are they in the same range as yval and zval?
Additionally, you might want to use numpy.loadtxt to load your data file.
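One way to get a straight line on log-log axes is to fit in log space and plot the result against the same x values as the data. The following is only a minimal sketch: the freq and power arrays are synthetic stand-ins (an assumed power law), not the asker's file, and numpy.polyfit replaces the hand-written slope and intercept formulas.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the frequency and power columns (assumed power law).
freq = np.logspace(0, 3, 50)
power = 5.0 * freq ** -1.5

# Fit a straight line to log10(power) versus log10(freq).
slope, intercept = np.polyfit(np.log10(freq), np.log10(power), 1)

# Map the fitted line back to data units, using the same x values as the data.
fit = 10 ** (intercept + slope * np.log10(freq))

fig, ax = plt.subplots()
ax.loglog(freq, power, "o", label="data")
ax.loglog(freq, fit, "-", label="fit (slope = %.2f)" % slope)
ax.legend()
# plt.show()  # uncomment to display the figure interactively
```

Because both loglog calls receive explicit x values, the fitted line lands on top of the data instead of being drawn against range(0, len(line)).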

As I understand your problem, you want to plot two lines to the same diagram. Here is how it is done in general:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(line1_x, line1_y)
ax.plot(line2_x, line2_y)
ax.set_yscale("log")
So, first you put them both in the same Axes, so that they appear in the same diagram. To modify the scaling, you can use set_xscale and set_yscale respectively.
Apart from that, I cannot help but notice that your code for reading the file is horrible. As @Bernhard suggests in his answer, try using numpy.loadtxt. It could look like this:
import numpy
data = numpy.loadtxt("data.txt")
n = len(data)
x = numpy.arange(n)
sum_z = sum(data.T[1])
sum_y = sum(data.T[0])
sum_y_squared = sum(data.T[0]**2)
sum_yz = sum(data.T[0]*data.T[1])
This should give you the same results as your loop, only much more concisely. I strongly recommend you read the Tentative NumPy Tutorial, as it explains a lot of the really cool features of numpy arrays.
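The question leaves out the step that turns these sums into a slope and an intercept. As a sketch only, the standard closed-form least-squares expressions look like this; the y and z arrays are made-up values standing in for the first two columns of the file, and the numpy.polyfit call is just a cross-check.

```python
import numpy as np

# Made-up (y, z) columns standing in for the first two columns of the file.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(y)

sum_y = y.sum()
sum_z = z.sum()
sum_y_squared = (y ** 2).sum()
sum_yz = (y * z).sum()

# Closed-form least-squares line z = intercept + slope * y.
slope = (n * sum_yz - sum_y * sum_z) / (n * sum_y_squared - sum_y ** 2)
intercept = (sum_z - slope * sum_y) / n

# Cross-check against NumPy's own degree-1 polynomial fit.
ref_slope, ref_intercept = np.polyfit(y, z, 1)

# Evaluate the line at the y values used in the fit, not at the row index.
line = intercept + slope * y
```

Note that the line is evaluated at y here; evaluating it at the row counter, as the loop over xval in the question does, gives values that belong to a different x axis.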

Related

Plotting periods of a trig function in matplotlib

I am writing some simple scripts to plot a graph given a trigonometric function (in this example, a sine).
My issue is that I'd like to plot JUST two periods of the given trig function. To clarify, in trigonometry a Period is the length (on a graph) that ONE wave takes up. For sin and cos, one period is 2pi.
I'd like to take my existing code, and (preferably) using matplotlib, plot two periods of my given trig function, and line up a couple of points on the graph with a couple of points on my function.
If it's possible, I would like to be able to plot my function so that the start of the first period lines up with my first label, the highest point of the first period lines up with the second label, the point where my function crosses the x-axis with the third label, the lowest point with the fourth label, and the end of my first period/beginning of my second period with the fifth label. This pattern would then repeat for the second period. From here on, I'm going to refer to the x labels as the "Period Markings".
I've come up with three possible solutions for this:
1. I could set the borders of my graph (in this case x = -4 and x = 4) to be labeled as the first and ninth Period Markings respectively, then somehow constrain my function to stay within the graph.
2. I could somehow set a parameter in matplotlib to plot only 4pi (the length of two periods) units' worth of line, although in that case I don't think the Period Markings would match up with their desired points.
3. If matplotlib supports it, I could find the low points, x-intercepts, and high points of the graph, then assign my Period Markings to each one from left to right. This would have the advantage of removing the need to plot ONLY two periods, as the Period Markings would dictate the beginning and end of the two periods.
Below I've inserted a couple of things:
A copy of the plotting part of my code, containing a sample equation and some sample Period Markings
A screenshot of the graph of the given sample equation
A visual representation of where each Period Marking would line up with, ideally, as well as a line demarcating an estimation of two full periods.
The standard form of a sine function is y = a*sin(b*x - c) + d. The equation here is just sin(x), but you can see how the variables c and d play a role in determining the graph. Usually, the xlabels array would be filled in with values determined earlier in the script, as would all the variables at the top (func, a, b, c, d).
import math
import matplotlib.pyplot as plt
import numpy as np
func = 'sin'  # a string, so that the comparisons below work
a = 1
b = 1
c = 0
d = 0
xlabels = np.array(['-2pi', '-3pi/2', '-pi', '-pi/2',
                    '0','pi/2', 'pi', '3pi/2','2pi'])
xlabelcount = -4, -3, -2, -1, 0, 1, 2, 3, 4
x = np.arange(-4, 4, 0.01)
if func == 'sin':
    ypoints = a*np.sin(2*x-c)+d
if func == 'cos':
    ypoints = a*np.cos(2*x+c)+d
if b < 0:
    plt.gca().invert_yaxis()
plt.title('Wave Function')
plt.xlabel('Period (Not to Scale)')
plt.ylabel('Amplitude')
plt.grid(True, which='both')
plt.axhline(y=0, color='k')
plt.plot(x, ypoints)
plt.xticks(ticks=xlabelcount,labels=xlabels)
plt.show()
Plot of sin(x)
Preferred Period Marking placements
I hope this can provide a comprehensive understanding of the issue I face, and any help would be greatly appreciated. I feel that I've done a fair amount of Googling around, but nothing has yielded a good answer. I apologize in advance if I'm missing something really obvious.
Thanks,
dreadlearner
If I understand this correctly, you would like to add points on the curve at certain predefined locations on x-axis (period markings). If this is correct, the best way is to evaluate the value of the function at those particular "period markings" and plot this as a single point. Something like:
import matplotlib.pyplot as plt
import numpy as np
fn = "sin"
if fn == "sin":
    fn = np.sin
elif fn == "cos":
    fn = np.cos
# if required, the next three statements can be
# customized for each function by shifting them
# inside the if ... else blocks
x = np.linspace(-2*np.pi, 2*np.pi, 1000)
points = [i * np.pi/2 for i in range(-4, 5)]
labels = ["-2π", "-3π/2", "-π", "-π/2", "0", "π/2", "π", "3π/2", "2π"]
fig, ax = plt.subplots()
ax.plot(x, fn(x))
ax.set_xticks(points)
ax.set_xticklabels(labels)
# the next line is what you probably want
for pt in points:
    ax.plot(pt, fn(pt), "ok")
ax.hlines(0, x[0], x[-1], "r")
plt.show()
Looks like this:
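For the general form y = a*sin(b*x - c) + d mentioned in the question, the positions of the Period Markings can be computed instead of typed in. This is a rough sketch under the assumption that b > 0; the helper name period_markings is made up for illustration. It uses the facts that one period is 2π/b and that the wave starts its cycle at the phase shift x = c/b.

```python
import numpy as np

def period_markings(b, c, n_periods=2):
    """Quarter-period x positions for y = a*sin(b*x - c) + d, assuming b > 0."""
    period = 2 * np.pi / b
    start = c / b  # phase shift: the x where b*x - c == 0
    # Four quarter-steps per period, plus the closing endpoint.
    return start + (period / 4) * np.arange(4 * n_periods + 1)

# Plain sin(x): b = 1, c = 0 gives marks at 0, π/2, π, ..., 4π.
marks = period_markings(b=1, c=0)

# A shifted, compressed wave: sin(2x - π/2).
shifted = period_markings(b=2, c=np.pi / 2)
```

The returned array can be passed to ax.set_xticks, and plotting only over [marks[0], marks[-1]] restricts the curve to exactly two periods.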

Why does matplotlib.pyplot (plt) show my points but not the lines between them?

There is a for loop in my code, and at every step it generates a new tpr (as X) and fpr (as Y), like this:
0.05263157894736842 0.1896551724137931
0.06578947368421052 0.19540229885057472
0.07894736842105263 0.22988505747126436
0.07894736842105263 0.25862068965517243
0.07894736842105263 0.28735632183908044
I want to collect all these points and get a full plot, but it didn't work. My code is attached below:
for i in range (-30,20):
    predicted = (np.sign(t+i*1e-4)+1)/2.
    vals, cm = re.get_CM_vals(y_test, predicted)
    tpr = re.TPR_CM(cm)
    fpr = re.FPR_CM(cm)
    #print(tpr, fpr)
    plt.plot(fpr, tpr,'b.-',linewidth=1)
plt.show()
Besides, I want right-angle (staircase) lines between the points. Is there a function for that in matplotlib?
Using your current code, I suggest appending the x values to one list and the y values to another inside the loop, then calling plt.plot once after the loop. You could also use something like ArrayName = [[],[]] and append the x and y values to ArrayName[0] and ArrayName[1], respectively. Not only would this actually work (a line can only be drawn between points passed in the same plt.plot call), but it would also be slightly faster, since plt.plot and plt.scatter work faster when plotting all the points at once instead of one at a time in a for loop.
If you don't want to plot the points connected with lines, I still suggest using a list, since that would be faster. (It wouldn't be that much faster in this case, but it's a good habit to have.)
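A small sketch of that suggestion follows. The pairs are the five sample values printed in the question and stand in for the loop that computes tpr and fpr, since the asker's re module is not available here. The second call uses Axes.step, which draws the right-angle segments asked about.

```python
import matplotlib.pyplot as plt

# Sample (x, y) pairs copied from the output listed in the question.
pairs = [
    (0.05263157894736842, 0.1896551724137931),
    (0.06578947368421052, 0.19540229885057472),
    (0.07894736842105263, 0.22988505747126436),
    (0.07894736842105263, 0.25862068965517243),
    (0.07894736842105263, 0.28735632183908044),
]

xs, ys = [], []
for x, y in pairs:  # stands in for the loop that computes each new point
    xs.append(x)
    ys.append(y)

fig, ax = plt.subplots()
ax.plot(xs, ys, "b.-", linewidth=1)       # straight segments between points
ax.step(xs, ys, where="post", color="r")  # right-angle (staircase) segments
# plt.show()  # uncomment to display the figure interactively
```

The where argument ("pre", "post" or "mid") controls on which side of each point the vertical jump is drawn.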

How can I detect periodicity using auto-correlation automatically?

This is my code:
import matplotlib.pyplot as plt
import numpy as np
from pandas.plotting import autocorrelation_plot
y = np.sin(np.arange(1,6*3.14,0.1))
autocorrelation_plot(y)
plt.show()
And this is the output of the auto-correlation plot:
auto-correlation plot of y
I would like to figure out a way to classify automatically whether the function is periodic or not (without looking at the autocorrelation plot by eye). I read that this is related to the confidence interval, which is the line shown in the attached plot, but I am still unsure what to do with it to make a better decision. So is there an automated way of using autocorrelation to decide the periodicity of the data?
Though, this is my try for an automated way:
result = np.correlate(y, y, mode = "full")
# keep the non-negative lags; zero lag sits at the centre of the "full" output
ACF = result[result.size // 2:]
ACF = ACF/ACF[0]
acceptedVar = []
for i in range(len(ACF)):
    if ACF[i] > 0.05:
        acceptedVar = np.append(acceptedVar, ACF[i])
percent = len(acceptedVar)/len(ACF) * 100
I just used a threshold of 0.05 to detect the points for which the confidence interval is 95%. I don't know whether this is right or wrong statistically and logically. I then check whether percent is bigger than 95% to call the pattern periodic, but I'm not sure about that either.
Credit to: the first answer to How can I use numpy.correlate to do autocorrelation?
To start, with e.g. ax = autocorrelation_plot(y), you can use autocorr = ax.lines[5].get_data()[1] to get the values from the pandas autocorrelation function directly.
This may be a somewhat naïve solution, but say you are just looking for the first, most significant, periodicity, you could just grab the first index of the highest peak in the plot:
first_max = np.argmax(autocorr) + 1
Which gives you the lag for which autocorrelation is highest = the period of interest (in units of your data's sampling interval.)
Say you wanted the next most significant period:
second_max = np.argmax(autocorr[first_max:]) + first_max + 1
And so on and so forth...
To note: this wouldn't work as well if your data is not as regular and periodic as it seems to be from your autocorrelation plot.
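As a variation on the same idea, and only as a sketch, the function below looks for the first local maximum of the autocorrelation that rises above an approximate 95% band of 1.96/sqrt(N), the usual white-noise band. The function name and the choice of "first peak above the band" as the decision rule are assumptions for illustration, not an established test for periodicity.

```python
import numpy as np

def first_significant_peak(y):
    """Lag of the first ACF local maximum above the ~95% band, or None."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    full = np.correlate(y, y, mode="full")
    acf = full[full.size // 2:]      # lags 0, 1, 2, ...
    acf = acf / acf[0]               # normalise so that acf[0] == 1
    band = 1.96 / np.sqrt(len(y))    # approximate 95% band for white noise
    for lag in range(1, len(acf) - 1):
        is_peak = acf[lag - 1] < acf[lag] >= acf[lag + 1]
        if is_peak and acf[lag] > band:
            return lag
    return None

# The signal from the question: a sine sampled every 0.1 units.
y = np.sin(np.arange(1, 6 * 3.14, 0.1))
period_lag = first_significant_peak(y)
```

For this signal the returned lag should be close to 2π/0.1 samples, i.e. roughly 63; a return value of None would suggest that no clear periodicity stands out above the band.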

Interpolating 1D nonfunction data points

I am having difficulties finding an interpolation for my data points. The line should slightly resemble a negative inverse quadratic (ie like a backwards 'c').
Since this is not a function (x can have multiple values of y), I am not sure what interpolation to use.
I was thinking that perhaps I should flip the axis to create the interpolation points/line using something like UnivariateSpline and then flip it back when I am plotting it?
This is a graph of just the individual points:
Here is my code:
import datetime as dt
import matplotlib.pyplot as plt
from scipy import interpolate
from tables import open_file  # PyTables
file = open_file("010217.hdf5", mode = "a", title = 'Sondrestrom1')
all_data = file.get_node('/Data/Table Layout').read()
file.close()
time = all_data['ut1_unix'] #time in seconds since 1/1/1970
alt = all_data['gdalt'] #all altitude points
electronDens = all_data['nel'] #all electron density points
x = []
y = []
positions = []
for t in range(len(time)): #Looking at this specific time, find all the respective altitude and electron density points
    if time[t] == 982376726:
        x.append(electronDens[t])
        y.append(alt[t])
        positions.append(t)
#FINDING THE DATE
datetime1970 = dt.datetime(1970,1,1,0,0,0)
seconds = int(time[t])  # long() in Python 2
newDatetime = datetime1970 + dt.timedelta(0, seconds)
time1 = newDatetime.strftime('%Y-%m-%d %H:%M:%S')
title = "Electron Density vs. Altitude at "
title += time1
plt.plot(x,y,"o")
plt.title(title)
plt.xlabel('Electron Density (log_10[Ne])')
plt.ylabel('Altitude (km)')
plt.show()
As the graph heading says "Electron Density vs. Altitude", I suppose there's only one value per point on the vertical axis?
This means you are actually looking at a function that has been flipped, in order to make the x axis vertical because having altitude on the vertical axis is just more intuitive to humans.
Looking at your code, there seems to have been a measurement where both altitude and electron density were measured. Therefore, even if my theory above is wrong, you should still be able to interpolate everything in the time domain and create a spline from that.
... that's if you really want to have a curve that goes exactly through every point.
Seeing how much scatter there is in the data, you should probably go for a curve fit that doesn't exactly replicate every measurement:
scipy.interpolate.Rbf should work alright, and again, for this you should switch the axes, i.e. compute electron density as function of altitude. Just be sure to use smooth=0.01 or maybe a little more (0.0 will exactly go through every point and look a little silly on noisy data).
... actually it seems most of your problem is understanding your data better :)
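A minimal sketch of the Rbf suggestion, with the axes switched so that electron density is computed as a function of altitude. The profile below is invented purely for illustration (it is not the Sondrestrom data), and smooth=0.01 is simply the starting value mentioned above.

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(0)

# Made-up profile: altitude is the independent variable, density the dependent one.
alt = np.linspace(100.0, 500.0, 40)                  # km
dens = 10.5 + np.exp(-((alt - 300.0) / 80.0) ** 2)   # log10(Ne), invented shape
dens_noisy = dens + rng.normal(scale=0.05, size=alt.size)

# Density as a function of altitude; smooth > 0 relaxes exact interpolation.
rbf = Rbf(alt, dens_noisy, smooth=0.01)

alt_fine = np.linspace(alt.min(), alt.max(), 200)
dens_fit = rbf(alt_fine)

# To draw it with altitude on the vertical axis: plt.plot(dens_fit, alt_fine)
```

Plotting dens_fit on the horizontal axis and alt_fine on the vertical axis flips the curve back into the orientation of the original graph.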

retrieving data from a plot in python?

suppose I have
t= [0,7,10,17,23,29,31]
f_t= [4,3,11,19,12,9,17]
and I have plotted f_t vs t.
Now from plotting these 7 data points, I want to retrieve 100 data points and save them in a text file. What do I have to do?
Note that I am not asking about the fitting of the plot; I know between two points the plot is linear.
What I am asking is: if I create an array like t=np.arange(0,31,.1), what is the corresponding array of f_t that agrees with the previous plot? That is, for any t between t=0 and t=7, f_t should be determined by the straight line connecting (0,4) and (7,3), and so on.
You should use a linear regression, which gives you a straight-line formula from which you can sample as many points as you want.
If the line is more of a curve, then you should try a polynomial regression of higher degree.
For example:
import pylab
import numpy
py_x = [0,7,10,17,23,29,31]
py_y = [4,3,11,19,12,9,17]
x = numpy.asarray(py_x)
y = numpy.asarray(py_y)
poly = numpy.polyfit(x,y,1) # 1 is the degree here. If you want curves, put 2, 3 or 5...
poly now holds the polynomial coefficients, which you can use to calculate other points.
for z in range(100):
    print(numpy.polyval(poly,z)) # this returns the fitted f(z)
The function np.interp will do linear interpolation between your data points:
f2 = np.interp(np.arange(0,31,.1), t, f_t)
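Putting the np.interp suggestion together with the request for 100 points in a text file might look like the sketch below. The output file name and the use of the system temporary directory are assumptions for illustration.

```python
import os
import tempfile
import numpy as np

t = [0, 7, 10, 17, 23, 29, 31]
f_t = [4, 3, 11, 19, 12, 9, 17]

# 100 evenly spaced sample positions covering the original range of t.
t_new = np.linspace(t[0], t[-1], 100)

# Piecewise-linear interpolation between the seven original points.
f_new = np.interp(t_new, t, f_t)

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), "f_t_interp.txt")
np.savetxt(out_path, np.column_stack([t_new, f_new]), header="t f_t")
```

Each row of the file holds one (t, f_t) pair, and every value lies on the straight segment joining its two neighbouring original points.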
