I am using the following code to draw a curve from my two-column raw data (x = time, y = |float data|). The resulting graph has rough, jagged edges. Is it possible to get a smooth curve from this data? I am attaching the code, data, and curve.
from datetime import datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates
from matplotlib import style
# changing the default matplotlib style
matplotlib.style.use('ggplot')
#one of {'b', 'g', 'r', 'c', 'm', 'y', 'k', 'w'}
plt.rcParams['lines.linewidth']=1
plt.rcParams['axes.facecolor']='.3'
plt.rcParams['xtick.color']='b'
plt.rcParams['ytick.color']='r'
x,y= np.loadtxt('MaxMin.txt', dtype=str, unpack=True)
x = np.array([datetime.strptime(i, "%H:%M:%S.%f") for i in x])
y = y.astype(float)
# naming the x axis
plt.xlabel('<------Clock-Time(HH:MM:SS)------>')
# naming the y axis
plt.ylabel('Acceleration (m/sq.sec)')
# giving a title to my graph
plt.title('Sample graph!')
# plotting the points
plt.plot(x, y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
#Custom Format
loc = matplotlib.dates.MicrosecondLocator(1000000)
plt.gca().xaxis.set_major_locator(loc)
plt.gca().xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%H:%M:%S'))
# function to show the plot
plt.show()
I have searched similar threads, but the mathematical concepts they use went over my head, so I cannot work out what exactly has to be done for my data.
Generated Graph from RAW data
I am also giving the sample data file so that you can re-construct it at your end.
Get Data File
PS. I am also not able to change the line color in the graph from the default red, even after using
plt.rcParams['lines.color']='g'
Although that is a minor issue in this case.
The input data has malformed timestamps: the original author should have zero-padded the milliseconds when formatting them (%03d).
[...]
10:27:19.3 9.50560385141
10:27:19.32 9.48882194058
10:27:19.61 9.75936468731
10:27:19.91 9.96021690527
10:27:19.122 9.48972151383
10:27:19.151 9.49265161533
[...]
We need to fix that first:
x, y = np.loadtxt('MaxMin.txt', dtype=str, unpack=True)
# fix the zero-padding issue
x_fixed = []
for xx in x:
    xs = xx.split(".")
    xs[1] = "0"*(3-len(xs[1])) + xs[1]
    x_fixed.append(xs[0] + '.' + xs[1])
x = np.array([datetime.strptime(i, "%H:%M:%S.%f") for i in x_fixed])
y = y.astype(float)
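To see why the padding matters, here is a small self-contained sketch. The helper name pad_milliseconds and the three sample strings are only for illustration; the format string is the one used above. Without padding, %f reads "19.3" as 0.3 s rather than 3 ms.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"

def pad_milliseconds(stamp):
    """Left-pad the fractional part of 'HH:MM:SS.f' to three digits."""
    clock, _, fraction = stamp.partition(".")
    return clock + "." + fraction.zfill(3)

# a few timestamps in the same style as the data file
raw = ["10:27:19.3", "10:27:19.32", "10:27:19.122"]

fixed = [pad_milliseconds(s) for s in raw]
parsed = [datetime.strptime(s, FMT) for s in fixed]

# unpadded, "19.3" is parsed as 300000 microseconds instead of 3000
unpadded_us = datetime.strptime(raw[0], FMT).microsecond

print(fixed)
print([p.microsecond for p in parsed])
print(unpadded_us)
```

After the fix the microsecond values increase monotonically, which is what the plot needs.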
You can then use a smoothing kernel (e.g. moving average) to smooth the data:
window_len = 3
kernel = np.ones(window_len, dtype=float)/window_len
y_smooth = np.convolve(y, kernel, 'same')
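One thing to watch with mode 'same' is that np.convolve pads with zeros, so the first and last smoothed samples are pulled toward zero. A possible workaround, sketched here under the assumption of an odd window length (the function name moving_average is made up for this example), is to repeat the end values before convolving:

```python
import numpy as np

def moving_average(y, window_len=3):
    """Centred moving average that keeps len(y) by repeating the end values."""
    if window_len % 2 == 0:
        raise ValueError("window_len must be odd for a centred window")
    y = np.asarray(y, dtype=float)
    kernel = np.ones(window_len) / window_len
    half = window_len // 2
    # repeat the first and last sample instead of padding with zeros
    padded = np.pad(y, half, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

# a constant signal makes the edge effect easy to see
y = np.full(5, 9.5)
kernel = np.ones(3) / 3

zero_padded = np.convolve(y, kernel, "same")
edge_padded = moving_average(y, 3)

print(zero_padded)
print(edge_padded)
```

A larger window_len gives a smoother curve at the cost of flattening real peaks, so it is worth trying a few values against the raw plot.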
The scipy module has some ways of getting smooth curves through your points. Try adding this to the top:
from scipy import interpolate
Then add these lines just before your plt.show():
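# the spline needs numeric, strictly increasing x values, so convert the datetimes first
x_num = matplotlib.dates.date2num(x)
xnew = np.linspace(x_num.min(), x_num.max(), 100)
bspline = interpolate.make_interp_spline(x_num, y)
y_smoothed = bspline(xnew)
plt.plot(matplotlib.dates.num2date(xnew), y_smoothed)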
If you do a little search for scipy.interpolate.make_interp_spline, you can find more info on what that does. But essentially, the combination of that and np.linspace generates a bunch of fake data points to make up a smooth curve.
I'm trying to generate a fit for my data (The Data).
The sample, when plotted directly, looks as follows: Sample Data
I've been trying to generate a polynomial fit for this data, where T is the time in days and IC/IC100 is the corresponding measurement.
I've used two methods to generate the polynomial fit.
1. Using polyfit and poly1d
Here is my code for this approach:
import math
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
sns.set(style="darkgrid")
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 100
from scipy.stats import sem
from scipy import optimize
from scipy.optimize import curve_fit
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
IC_M = pd.read_csv("TvsC100_MES.csv")
IC_M.set_index('Group/#/', inplace=True)
IICM_1 = IC_M[0:5]
IICM_1
# DEGREE = 2
mymodel = np.poly1d(np.polyfit(IICM_1["IC/IC100"],IICM_1["T"], 2))
figure(figsize=(12, 8), dpi=100)
plt.plot(IICM_1["T"], IICM_1["IC/IC100"], marker = 'o', label = 'Original Plot', c = 'blue')
plt.plot(mymodel(IICM_1["IC/IC100"]),IICM_1["IC/IC100"], marker = 'x', label = 'New Y', color = 'red')
#plt.plot(mymodel(new_y),new_y, marker = 'x', label = 'New Y', color = 'red')
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.show()
When I plot the graph, I get this error in the result: one point is off, and it is not supposed to be like that. I haven't been able to fix this. The correlation does not behave this way in the values recorded during experimentation.
The Error
The second method I used was PolynomialFeatures with fit_transform and predict.
In this method, the coefficients appear to be generated for each point, so the fit follows each point rather than the line as a whole. For example, if the equation is Y(X) = AX^2 + BX + C, then A, B, and C should remain constant for all points, and that is the fit I am looking for. If I extend the values of Y, I should be able to find the next predicted value according to the sample data; unfortunately, this isn't the case.
Here is my code. Only the main part differs, after loading the data as before:
# Poly Creation
# degree is given as (min_degree, max_degree); with (2, 2) only the degree-2 term is kept
poly = PolynomialFeatures(degree=(2,2), include_bias= False)
# Actually transforming the data and applying the polynomial features to it
poly_features = poly.fit_transform(np.array(IICM_1["IC/IC100"]).reshape(-1,1)) # Y
#Creating an Instance of the Linear Regression Model
poly_reg_model = LinearRegression(fit_intercept = False, positive = True)
# Fitting is the procedure where we Train the Model based on X(input) & Y(response) to solve for the Coefficients during these Values
# y = A*(X) + C
poly_reg_model.fit(poly_features, np.array(IICM_1["T"]).reshape(-1,1)) #X
y_predicted = poly_reg_model.predict(poly_features)
figure(figsize=(12, 8), dpi=100)
# points + Curve
plt.plot(IICM_1["T"], IICM_1["IC/IC100"] ,marker = 'o', label = "Samp: C/3005-1", color = "blue")
plt.plot(y_predicted, IICM_1["IC/IC100"] ,marker = 'x', label = "Samp: Prediction", color = "red")
plt.legend()
plt.xlabel("T")
plt.ylabel("IC/IC100")
plt.show()
This is the output I get, which I believe is incorrect.
I need to fix this. Either my understanding of polynomials is incorrect, or I'm using this function incorrectly, or something else is wrong. How can I approach and fix this issue?
I tried changing the input order for the functions, thinking that it would then treat the points as a single line and not as individual lines, but the results were bad.
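For reference, a minimal sketch of the behaviour described above as the goal: np.polyfit returns a single coefficient vector [A, B, C] for the whole data set, and the resulting poly1d can be evaluated on a dense grid or beyond the sampled range. The numbers below are made-up stand-ins, not the values from TvsC100_MES.csv, and the variable names c and t are only for illustration.

```python
import numpy as np

# Hypothetical stand-in data (NOT the values from TvsC100_MES.csv):
# five points that lie exactly on t = 2*c**2 - 3*c + 1.
c = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # plays the role of IC/IC100
t = 2.0 * c**2 - 3.0 * c + 1.0              # plays the role of T

# polyfit returns ONE coefficient vector [A, B, C] for the whole data set
coeffs = np.polyfit(c, t, 2)
model = np.poly1d(coeffs)

# evaluate on a dense, sorted grid to draw the fitted curve ...
c_dense = np.linspace(c.min(), c.max(), 50)
t_dense = model(c_dense)

# ... and outside the sampled range to extrapolate
t_next = model(1.25)

print("A, B, C =", coeffs)
print("prediction at c = 1.25:", t_next)
```

Plotting (t_dense, c_dense) instead of the model evaluated only at the five sample points usually makes it easier to judge whether the fitted curve, rather than the plotting order, is the problem.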
I'm trying to plot data with different colors depending on their classification. The data is in an nx3 array, with the first column the x position, the second column the y position, and the third column an integer defining their categorical value. I can do this by running a for loop over the entire array and plotting each point individually, but I have found that doing so massively slows down everything.
So, this works.
data = np.loadtxt('data.csv', delimiter = ",")
colors = ['r', 'g', 'b']
fig = plt.figure()
for i in data:
    plt.scatter(i[0], i[1], color = colors[int(i[2] % 3)])
plt.show()
This does not work, but I want it to, as something along this line would avoid using a for loop.
data = np.loadtxt('data.csv', delimiter = ",")
colors = ['r', 'g', 'b']
fig = plt.figure()
plt.scatter(data[:,0], data[:,1], color = colors[int(data[:,2]) % 3])
plt.show()
Your code doesn't work because your x and y values are arrays taken from the data, while the color is not, so you have to define the color as an array too. Have a look at the Matplotlib gallery:
https://matplotlib.org/stable/gallery/shapes_and_collections/scatter.html They have this example there:
import numpy as np
import matplotlib.pyplot as plt
# Fixing random state for reproducibility
np.random.seed(19680801)
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = (30 * np.random.rand(N))**2 # 0 to 15 point radii
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()
Here, you have the same x and y. You probably won't need s. The color is an array. You can do something like this:
colors = ['r', 'g', 'b']
colors_list = [colors[int(i) % 3] for i in data[:,2]]
plt.scatter(data[:,0], data[:,1], c = colors_list)
Just note that since I don't have the data to test it, you may need to tweak the code just in case.
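The same lookup can also be done without the list comprehension by indexing a NumPy array of color names. This is only a sketch: the 4x3 array below is a made-up stand-in for the contents of data.csv, and palette and point_colors are names chosen for the example.

```python
import numpy as np

# Hypothetical stand-in for the n x 3 array loaded from data.csv:
# columns are x, y, and an integer category stored as a float.
data = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
    [2.0, 0.5, 2.0],
    [3.0, 1.5, 4.0],
])

palette = np.array(['r', 'g', 'b'])

# cast the category column to int, wrap it into 0..2, and use it to index
# the palette in one vectorised step (no Python-level loop)
categories = data[:, 2].astype(int) % 3
point_colors = palette[categories]

print(point_colors)
```

The result can then be passed to scatter as plt.scatter(data[:,0], data[:,1], c=point_colors.tolist()).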
Without seeing your array it's hard to know exactly what your data look like (my answer doesn't have a %3, but that's easy enough to insert depending on what data[:,2] looks like). This has a for loop, but only loops 3 times so will be fast.
for ind,col in enumerate(colors):
    plt.scatter(data[:,0][data[:,2]==ind], data[:,1][data[:,2]==ind], c = col)
I am trying to label the intersection of two lines in a plot I have made. The code/MWE is:
import matplotlib.pyplot as plt
import numpy as np
#ignore my gross code, first time ever using Python :-)
#parameters
d = 0.02
s = 0.50 #absurd, but dynamics robust to 1>s>0
A = 0.90
u = 0.90
#variables
kt = np.arange(0, 50, 1)
invest = (1 - np.exp(-d*kt))*kt
output = A*u*kt
saving = s*output
#plot
plt.plot(kt, invest, 'r', label='Investment')
plt.plot(kt, output, 'b', label='Output')
plt.plot(kt, saving, label='Saving')
plt.xlabel('$K_t$')
plt.ylabel('$Y_t$, $S_t$, $I_t$')
plt.legend(loc="upper left")
#Steady State; changes with parameters
Kbar = np.log(1-s*A*u)/-d
x, y = [Kbar, Kbar], [0, s*A*u*Kbar]
plt.plot(x, y, 'k--')
#custom axes (no top and right)
ax = plt.gca()
right_side = ax.spines["right"]
right_side.set_visible(False)
top_side = ax.spines["top"]
top_side.set_visible(False)
#ax.grid(True) #uncomment for gridlines
plt.xlim(xmin=0) #no margins; preference
plt.ylim(ymin=0)
plt.show()
which creates:
I am trying to create a little label at the bottom of the dotted black line that says "$K^*$". I want it to coincide with Kbar so that, like the black line, it moves along with the parameters. Any tips or suggestions here?
I don't quite understand what you mean by "under the black dotted line", but you can already use the coordinate data of the dotted line to annotate it. I put it above the intersection point, but if you want to put it near the x-axis, you can set y=0.
plt.text(max(x), max(y)+1.5, '$K^*$', transform=ax.transData)
# alternatively, add Kbar as an extra x-axis tick labelled K*
baseTicks=list(plt.xticks()[0]) #for better control, replace with a range or arange
ax.set_xticks(baseTicks+[np.log(1-A*u*s)/(-d)])
ax.set_xticklabels(baseTicks+['$K^*$'])
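As a quick sanity check on the coordinates used for the label, the sketch below recomputes Kbar with the parameter values from the question and confirms that investment and saving coincide there. The function names investment and saving are introduced only for this example; the formulas are the ones from the question's code.

```python
import math

# parameter values copied from the question
d, s, A, u = 0.02, 0.50, 0.90, 0.90

def investment(k):
    return (1 - math.exp(-d * k)) * k

def saving(k):
    return s * A * u * k

# Setting investment(k) == saving(k) for k > 0 gives
# 1 - exp(-d*k) = s*A*u, hence k = -ln(1 - s*A*u) / d.
k_bar = -math.log(1 - s * A * u) / d

print(k_bar, investment(k_bar), saving(k_bar))
```

Because the label position is computed from the same expression, it follows the dotted line whenever the parameters change. Note that the expression is only defined while s*A*u < 1.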
I am attempting to plot various points on a log log plot representing limits. I am using errorbar from matplotlib. However, the size of arrows varies from point to point. How can I generate limit arrows of constant size?
My code is as follows:
from math import pi
import numpy as np
import pylab as pl
x_1 = np.arange(0.,10.)
y_1 = np.arange(11.,20.)
x_1_avg = np.sum(x_1)/len(x_1)
y_1_avg = np.sum(y_1)/len(y_1)
x_2 = np.arange(11.,20.)
y_2 = np.arange(21.,30.)
x_2_avg = np.sum(x_2)/len(x_2)
y_2_avg = np.sum(y_2)/len(y_2)
pl.yscale('log')
pl.xscale('log')
pl.errorbar(x_1_avg, y_1_avg, yerr = 2, color = 'g', lolims=-y_1_avg)
pl.errorbar(x_2_avg, y_2_avg, yerr = 2, color = 'r', lolims=-y_2_avg)
pl.savefig('test.eps')
pl.show()
The fact that you're using a log scale means that the length of a line on the plot will change based on where it's plotted. Also, lolims is a boolean, so I don't think you want to pass it a numerical value. Anyhow, you can compensate for the length change by using a value for yerr that is proportional to the y coordinate of the errorbar.
pl.errorbar(x_1_avg, y_1_avg, yerr = y_1_avg * .5, color = 'g', lolims=True)
pl.errorbar(x_2_avg, y_2_avg, yerr = y_2_avg * .5, color = 'r', lolims=True)
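A small numeric check of why the proportional yerr works: on a log axis the drawn length of a bar from y to y + yerr is log10(y + yerr) - log10(y), which depends only on the ratio yerr / y. The helper drawn_length is a name made up for this sketch; 15 and 25 are the two average y values produced by the question's code.

```python
import math

def drawn_length(y, yerr):
    """Length of a bar from y to y + yerr on a log10 axis, in decades."""
    return math.log10(y + yerr) - math.log10(y)

y_values = [15.0, 25.0]   # the two average y values from the question

# fixed absolute error: the drawn length depends on y
fixed = [drawn_length(y, 2.0) for y in y_values]

# error proportional to y: the drawn length is log10(1.5) for every point
proportional = [drawn_length(y, 0.5 * y) for y in y_values]

print(fixed)
print(proportional)
```

Changing the factor 0.5 changes the common arrow length for all points at once.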
I am trying to make a contour plot of the following data using matplotlib in python. The data is of this form -
# x y height
77.23 22.34 56
77.53 22.87 63
77.37 22.54 72
77.29 22.44 88
The data actually consists of nearly 10,000 points, which I am reading from an input file. However, the set of distinct possible values of z is small (integers within 50-90), and I wish to have a contour line for every such distinct z.
Here is my code -
import matplotlib
import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import csv
import sys
# read data from file
data = csv.reader(open(sys.argv[1], newline=''), delimiter='|', quotechar='"')
x = []
y = []
z = []
for row in data:
    try:
        x.append(float(row[0]))
        y.append(float(row[1]))
        z.append(float(row[2]))
    except Exception as e:
        pass
        #print(e)
X, Y = np.meshgrid(x, y) # (I don't understand why is this required)
# creating a 2D array of z whose leading diagonal elements
# are the z values from the data set and the off-diagonal
# elements are 0, as I don't care about them.
z_2d = []
default = 0
for i, no in enumerate(z):
    z_temp = []
    for j in range(i): z_temp.append(default)
    z_temp.append(no)
    for j in range(i+1, len(x)): z_temp.append(default)
    z_2d.append(z_temp)
Z = z_2d
CS = plt.contour(X, Y, Z, list(set(z)))
plt.figure()
CB = plt.colorbar(CS, shrink=0.8, extend='both')
plt.show()
Here is the plot of a small sample of data -
Here is a closer look at one of the regions of the above plot (note the overlapping/intersecting lines) -
I don't understand why it doesn't look like a contour plot. The lines are intersecting, which shouldn't happen. What could possibly be wrong? Please help.
Try to use the following code. This might help you -- it's the same thing which was in the Cookbook:
import numpy as np
import matplotlib.pyplot as plt
# matplotlib.mlab.griddata was removed in Matplotlib 3.1; scipy's griddata is used in its place
from scipy.interpolate import griddata
# with this way you can load your csv-file really easy -- maybe you should change
# the last 'dtype' to 'int', because you said you have int for the last column
data = np.genfromtxt('output.csv', dtype=[('x',float),('y',float),('z',float)],
                     comments='"', delimiter='|')
# just an assigning for better look in the plot routines
x = data['x']
y = data['y']
z = data['z']
# just an arbitrary number for grid point
ngrid = 500
# create evenly spaced arrays for the grid axes
# you could use x.min()/x.max() for creating xi and y.min()/y.max() for yi
xi = np.linspace(-1,1,ngrid)
yi = np.linspace(-1,1,ngrid)
# interpolate the scattered z values onto the regular (xi, yi) grid
zi = griddata((x, y), z, (xi[None, :], yi[:, None]), method='linear')
# plot the contour and a scatter plot for checking if everything went right
plt.contour(xi,yi,zi,20,linewidths=1)
plt.scatter(x,y,c=z,s=20)
plt.xlim(-1,1)
plt.ylim(-1,1)
plt.show()
I created a sample output file with a Gaussian distribution in 2D. My result using the code above:
NOTE:
You may have noticed that the edges look cropped. This is because griddata only fills in the region enclosed by the outermost data points; everything outside that border is left empty (masked or NaN). If all your points lie on a line, there is no area to interpolate over and you will not get any contours. I mention this because of the four data points you posted, which look close to this case. Maybe your full data set doesn't have this problem =)
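For reference, the layout that contour expects can be sketched as follows: two 1-D axes of lengths nx and ny, and a 2-D array of z values with shape (ny, nx), one value per grid node. The five sample points and the plane used to fill the grid are made up for this illustration; with real scattered data the grid values come from the interpolation step.

```python
import numpy as np

# Hypothetical scattered samples (x, y, z); not the data from the question.
# They happen to lie on the plane z = 1 + x + 2*y.
x = np.array([0.0, 1.0, 0.0, 1.0, 0.5])
y = np.array([0.0, 0.0, 1.0, 1.0, 0.5])
z = np.array([1.0, 2.0, 3.0, 4.0, 2.5])

# regular axes covering the sampled region
nx, ny = 4, 3
xi = np.linspace(x.min(), x.max(), nx)
yi = np.linspace(y.min(), y.max(), ny)

# meshgrid expands the two 1-D axes into 2-D coordinate arrays
XI, YI = np.meshgrid(xi, yi)

# contour(xi, yi, zi) needs zi with one value per grid node, shape (ny, nx).
# The plane is evaluated directly here only to have something to fill the
# grid with; for real scattered data this is the job of the interpolation.
zi = 1.0 + XI + 2.0 * YI

print(XI.shape, YI.shape, zi.shape)
```

Calling meshgrid on the raw, unsorted sample coordinates instead produces an n x n grid whose nodes have no measured z values, which is why the contour lines in the question cross each other.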
UPDATE
I edited the code a bit. Your problem was probably that you didn't resolve the dependencies of your input-file correctly. With the following code the plot should work correctly.
import numpy as np
import matplotlib.pyplot as plt
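# matplotlib.mlab.griddata was removed in Matplotlib 3.1; scipy's griddata is used in its place
from scipy.interpolate import griddata
import csv
data = np.genfromtxt('example.csv', dtype=[('x',float),('y',float),('z',float)],
                     comments='"', delimiter=',')
sample_pts = 500
con_levels = 20
x = data['x']
xmin = x.min()
xmax = x.max()
y = data['y']
ymin = y.min()
ymax = y.max()
z = data['z']
xi = np.linspace(xmin,xmax,sample_pts)
yi = np.linspace(ymin,ymax,sample_pts)
# interpolate the scattered z values onto the regular (xi, yi) grid
zi = griddata((x, y), z, (xi[None, :], yi[:, None]), method='linear')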
plt.contour(xi,yi,zi,con_levels,linewidths=1)
plt.scatter(x,y,c=z,s=20)
plt.xlim(xmin,xmax)
plt.ylim(ymin,ymax)
plt.show()
With this code and your small sample I get the following plot:
Try to use my snippet and just change it a bit. For example, for the given sample csv file I had to change the delimiter from | to ,. The code I wrote for you is not really nice, but it's written in a straightforward way.
Sorry for the late response.