Plot smooth line with PyPlot - python

I've got the following simple script that plots a graph:
import matplotlib.pyplot as plt
import numpy as np
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
plt.plot(T,power)
plt.show()
As it is now, the line goes straight from point to point, which looks OK but could be better in my opinion. What I want is to smooth the line between the points. In Gnuplot I would have plotted with smooth csplines.
Is there an easy way to do this in PyPlot? I've found some tutorials, but they all seem rather complex.

You could use scipy.interpolate.spline to smooth out your data yourself:
from scipy.interpolate import spline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
power_smooth = spline(T, power, xnew)
plt.plot(xnew,power_smooth)
plt.show()
Note that spline was deprecated in SciPy 0.19.0 and has since been removed; use make_interp_spline (which returns a BSpline object) instead.
Switching from spline to BSpline isn't a straightforward copy/paste and requires a little tweaking:
from scipy.interpolate import make_interp_spline, BSpline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
spl = make_interp_spline(T, power, k=3) # type: BSpline
power_smooth = spl(xnew)
plt.plot(xnew, power_smooth)
plt.show()

For this example a spline works well, but if the function is not inherently smooth and you want a smoothed version, you can also try:
from scipy.ndimage import gaussian_filter1d
ysmoothed = gaussian_filter1d(y, sigma=2)
plt.plot(x, ysmoothed)
plt.show()
If you increase sigma, you get a more strongly smoothed function.
Proceed with caution with this one. It modifies the original values and may not be what you want.
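To see what that caution means in practice, here is a minimal sketch (my own illustration, reusing T and power from the question) that compares the two approaches at the original sample points. The interpolating spline passes through the data, while the Gaussian filter replaces each value with a weighted average of its neighbours. The names max_spline_error and max_filter_shift are made up for this example.

```python
import numpy as np
from scipy.interpolate import make_interp_spline
from scipy.ndimage import gaussian_filter1d

# Data from the question.
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53e3, 5.92e2, 2.04e2, 7.24e1, 2.72e1, 1.10e1, 4.70e0])

# Interpolating cubic spline, evaluated back at the original x values.
spl = make_interp_spline(T, power, k=3)
spline_at_data = spl(T)

# Gaussian filter applied directly to the y values.
filtered = gaussian_filter1d(power, sigma=2)

# Largest deviation from the original samples for each method.
max_spline_error = np.max(np.abs(spline_at_data - power))
max_filter_shift = np.max(np.abs(filtered - power))

print("spline deviation at data points:", max_spline_error)
print("filter deviation at data points:", max_filter_shift)
```

The spline deviation should be at the level of floating-point noise, whereas the filtered curve no longer goes through the measured points, which matters if the plot is meant to show the actual measurements.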

See the scipy.interpolate documentation for some examples.
The following example demonstrates its use, for linear and cubic spline interpolation:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
# Define x, y, and xnew to resample at.
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
xnew = np.linspace(0, 10, num=41, endpoint=True)
# Define interpolators.
f_linear = interp1d(x, y)
f_cubic = interp1d(x, y, kind='cubic')
# Plot.
plt.plot(x, y, 'o', label='data')
plt.plot(xnew, f_linear(xnew), '-', label='linear')
plt.plot(xnew, f_cubic(xnew), '--', label='cubic')
plt.legend(loc='best')
plt.show()
Slightly modified for increased readability.

One of the easiest implementations I found is the exponential moving average that TensorBoard uses:
from typing import List
def smooth(scalars: List[float], weight: float) -> List[float]:  # Weight between 0 and 1
    last = scalars[0]  # First value in the plot (first timestep)
    smoothed = list()
    for point in scalars:
        smoothed_val = last * weight + (1 - weight) * point  # Calculate smoothed value
        smoothed.append(smoothed_val)  # Save it
        last = smoothed_val  # Anchor the last smoothed value
    return smoothed
ax.plot(x_labels, smooth(train_data, .9), x_labels, train_data)

I presume you mean curve-fitting and not anti-aliasing from the context of your question. PyPlot doesn't have any built-in support for this, but you can easily implement some basic curve-fitting yourself, like the code seen here, or if you're using GuiQwt it has a curve fitting module. (You could probably also steal the code from SciPy to do this as well).

Here is a simple solution for dates:
from scipy.interpolate import make_interp_spline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from datetime import datetime
data = {
datetime(2016, 9, 26, 0, 0): 26060, datetime(2016, 9, 27, 0, 0): 23243,
datetime(2016, 9, 28, 0, 0): 22534, datetime(2016, 9, 29, 0, 0): 22841,
datetime(2016, 9, 30, 0, 0): 22441, datetime(2016, 10, 1, 0, 0): 23248
}
#create data
date_np = np.array(list(data.keys()))
value_np = np.array(list(data.values()))
date_num = dates.date2num(date_np)
# smooth
date_num_smooth = np.linspace(date_num.min(), date_num.max(), 100)
spl = make_interp_spline(date_num, value_np, k=3)
value_np_smooth = spl(date_num_smooth)
# plot
plt.plot(date_np, value_np)
plt.plot(dates.num2date(date_num_smooth), value_np_smooth)
plt.show()

It's worth your time looking at seaborn for plotting smoothed lines.
The seaborn lmplot function will plot data and regression model fits.
The following illustrates both polynomial and lowess fits:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
df = pd.DataFrame(data = {'T': T, 'power': power})
sns.lmplot(x='T', y='power', data=df, ci=None, order=4, truncate=False)
sns.lmplot(x='T', y='power', data=df, ci=None, lowess=True, truncate=False)
The order = 4 polynomial fit is overfitting this toy dataset. I don't show it here but order = 2 and order = 3 gave worse results.
The lowess = True fit is underfitting this tiny dataset but may give better results on larger datasets.
Check the seaborn regression tutorial for more examples.

Another way to go, which slightly modifies the function depending on the parameters you use:
from statsmodels.nonparametric.smoothers_lowess import lowess
def smoothing(x, y):
    lowess_frac = 0.15  # fraction of the data used for each local estimate (roughly the smoothing window)
    lowess_it = 0
    x_smooth = x
    y_smooth = lowess(y, x, is_sorted=False, frac=lowess_frac, it=lowess_it, return_sorted=False)
    return x_smooth, y_smooth
That was better suited than other answers for my specific application case.

Related

Weighted empirical distribution function (ECDF) in python

I am trying to generate a weighted empirical CDF in Python. I know statsmodels.distributions.empirical_distribution provides an ECDF function, but it is unweighted. Is there a library I can use, or how can I extend this to write a function that calculates the weighted ECDF (EWCDF), like ewcdf {spatstat} in R?
Seaborn library has ecdfplot function which implements a weighted version of ECDF. I looked into the code of how seaborn calculates it.
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
sample = np.arange(100)
weights = np.random.randint(10, size=100)
estimator = sns.distributions.ECDF('proportion', complementary=True)
stat, vals = estimator(sample, weights=weights)
plt.plot(vals, stat)
Seaborn provides ecdfplot, which can plot a weighted CDF; see the seaborn.ecdfplot documentation. Based on deepAgrawal's answer, I adapted it a little so that what is plotted is the CDF rather than 1-CDF.
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
sample = np.arange(15)
weights = np.random.randint(5, size=15)
df = pd.DataFrame(np.vstack((sample, weights)).T, columns = ['sample', 'weights'])
sns.ecdfplot(data = df, x = 'sample', weights = 'weights', stat = 'proportion', legend = True)
def ecdf(x):
    Sorted = np.sort(x)
    Length = len(x)
    ecdf = np.zeros(Length)
    for i in range(Length):
        ecdf[i] = sum(Sorted <= x[i])/Length
    return ecdf
x = np.array([1, 2, 5, 4, 3, 6, 7, 8, 9, 10])
ecdf(x)
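If you would rather not depend on seaborn, a weighted ECDF can also be written directly in NumPy. The sketch below is my own and is not taken from seaborn or spatstat; the function name weighted_ecdf is hypothetical. It sorts the sample, accumulates the matching weights, and normalises by the total weight, assuming non-negative weights with a positive sum.

```python
import numpy as np

def weighted_ecdf(sample, weights):
    """Return sorted sample values and the weighted ECDF evaluated at them."""
    sample = np.asarray(sample, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Sort the observations and carry their weights along.
    order = np.argsort(sample, kind="stable")
    sorted_sample = sample[order]
    cum_weights = np.cumsum(weights[order])
    # Normalise so that the last value is exactly 1.
    return sorted_sample, cum_weights / cum_weights[-1]

# Small worked example: the value 1 carries half of the total weight.
vals, cdf = weighted_ecdf([3, 1, 2], [1, 2, 1])
print(vals)
print(cdf)
```

With all weights equal this reduces to the usual unweighted ECDF. If the sample contains repeated values, each repeat gets its own step; take the last step for a given value if you need one CDF value per distinct observation. A step plot such as plt.step(vals, cdf, where="post") is a natural way to draw the result.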

polynomial fitting in a semilogy plot in python

I am trying to get a polynomial fit for my data. Currently, I am using polyfit from numpy to get the best fit in a loglog plot. But my goal is to get the data fit in a semilogy plot. My code looks as follows:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
import scipy.optimize as optimization
l = [ 0.006, 0.01, 0.014, 0.024, 0.0346, 0.049, 0.0535, 0.0736, 0.11 ]
f = [5.3375903383330048, 60.531976422513054, 89.111502526131474, 47.132498501584969, 17.447001214543118, 5.2583622688081455, 3.7779565652126865, 1.0621247249682186, 0.1922152085619766]
logx = np.log(l)
logy = np.log(f)
coeffs = np.polyfit(logx,logy,deg=3)
poly = np.poly1d(coeffs)
yfit = lambda x: np.exp(poly(np.log(x)))
plt.loglog(l,yfit(l), ':')
plt.loglog(l,f, 'o')
plt.show()
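I would appreciate it if you could suggest what changes I have to make to get a semilogy best-fit curve. Also, if there is any other package in Python that does this, please mention it too.
I think this is what you are after: evaluate the fitted polynomial on a finer grid so that the curve comes out smooth.
# log scale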
x2 = np.linspace(np.min(l), np.max(l), 1000)
y2log = poly(np.log(x2))
plt.loglog(x2,np.exp(y2log), ':')
plt.loglog(l,f, 'o')
plt.show()
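For a fit that is polynomial in semilogy coordinates rather than log-log coordinates, one option is to fit log(f) against l itself instead of against log(l). This is only a sketch of that idea under my own reading of the question; the degree 3 is carried over from the original code and the names x2 and y2 are arbitrary.

```python
import numpy as np

# Data from the question.
l = np.array([0.006, 0.01, 0.014, 0.024, 0.0346, 0.049, 0.0535, 0.0736, 0.11])
f = np.array([5.3375903383330048, 60.531976422513054, 89.111502526131474,
              47.132498501584969, 17.447001214543118, 5.2583622688081455,
              3.7779565652126865, 1.0621247249682186, 0.1922152085619766])

# Semilogy axes are linear in x and logarithmic in y,
# so fit a polynomial to log(f) as a function of l (not log(l)).
coeffs = np.polyfit(l, np.log(f), deg=3)
poly = np.poly1d(coeffs)

# Evaluate on a fine grid and transform back to the original y scale.
x2 = np.linspace(l.min(), l.max(), 1000)
y2 = np.exp(poly(x2))

print("fitted coefficients (highest power first):", coeffs)
```

The curve can then be drawn with plt.semilogy(x2, y2, ':') and the data with plt.semilogy(l, f, 'o'). Because the fit is done on log(f), the result stays positive everywhere, which a direct polynomial fit to f would not guarantee.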

Python - Plotting confidence error bars with Maxwell Distribution

I've never tried implementing error bars based on confidence intervals. Since this is what I want to do, I'm unsure how to proceed.
I have a large data array of about 1000 elements. Judging from its histogram, the data follow a Maxwell-Boltzmann distribution reasonably well.
Let's say my data is called x, which I fit as follows:
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
maxwell = stats.maxwell
## Scale Parameter
params = maxwell.fit(x, floc=0)
print(params)
## mean
mean = 2*params[1]*np.sqrt(2/np.pi)
print(mean)
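## Variance (as written this computes a**(3*pi - 8) / pi; the Maxwell variance itself is a**2 * (3*pi - 8) / pi)
sig = (params[1])**(3*np.pi-8)/np.pi
print(sig)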
>>> (0, 178.17597215151301)
>>> 284.327714571
>>> 512.637498406
I then plot it with:
fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(111)
xd = np.argsort(x)
ax.plot(x[xd], maxwell.pdf(x, *params)[xd])
ax.hist(x[xd], bins=75, histtype="stepfilled", linewidth=1.5, facecolor='none', alpha=0.55, edgecolor='black',
        density=True)
How on earth do you go about implementing confidence intervals with the curve fit?
I can use
conf = maxwell.interval(0.90,loc=mean,scale=sig)
>>> (588.40702793225228, 1717.3973740895271)
But I have no clue what to do with this.
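One possible approach, offered as a sketch rather than a definitive answer, is to bootstrap the fitted scale parameter and draw the Maxwell PDF for the lower and upper ends of the resulting interval. Since the real data are not shown, the example generates synthetic data with scale 178 (an assumption chosen to resemble the fit above). With loc fixed at 0 the maximum-likelihood scale has the closed form sqrt(sum(x**2) / (3n)), which is used here in place of maxwell.fit to keep the resampling loop fast. The helper name maxwell_scale_mle is hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumption: ~1000 Maxwell samples).
x = stats.maxwell.rvs(scale=178.0, size=1000, random_state=rng)

def maxwell_scale_mle(sample):
    """Closed-form MLE of the Maxwell scale when loc is fixed at 0."""
    return np.sqrt(np.sum(sample ** 2) / (3 * sample.size))

a_hat = maxwell_scale_mle(x)

# Mean and variance implied by the fitted scale.
mean_hat = 2 * a_hat * np.sqrt(2 / np.pi)
var_hat = a_hat ** 2 * (3 * np.pi - 8) / np.pi

# Bootstrap: refit the scale on resampled data sets.
n_boot = 2000
boot = np.empty(n_boot)
for b in range(n_boot):
    resample = rng.choice(x, size=x.size, replace=True)
    boot[b] = maxwell_scale_mle(resample)

# 90% percentile interval for the scale parameter.
a_low, a_high = np.percentile(boot, [5, 95])

# PDF curves at the interval ends, usable as a band around the fit.
grid = np.linspace(0, x.max(), 200)
pdf_low = stats.maxwell.pdf(grid, loc=0, scale=a_low)
pdf_high = stats.maxwell.pdf(grid, loc=0, scale=a_high)

print("scale:", a_hat, "90% interval:", (a_low, a_high))
```

The two curves can be passed to ax.fill_between(grid, pdf_low, pdf_high, alpha=0.3) to shade the uncertainty of the fitted curve. Note that maxwell.interval(0.90, ...) answers a different question: it gives the range containing 90% of the distribution's probability mass, not the uncertainty of the fit.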

identify graph uptrend or downtrend

I am attempting to read in data and plot them on to a graph using python (standard line graph). Can someone please advise on how I can classify whether certain points in a graph are uptrends or downtrends programmatically? Which would be the most optimal way to achieve this? Surely this is a solved problem and a mathematical equation exists to identify this?
Here is some sample data with some uptrends and downtrends:
x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]
y = [2,5,7,9,10,13,16,18,21,22,21,20,19,18,17,14,10,9,7,5,7,9,10,12,13,15,16,17,22,27]
Thanks in advance.
A simple way would be to look at the rate of change of y with respect to x, known as the derivative. This usually works better with continuous (smooth) functions, so you could apply it to your data by first interpolating it, for example with an n-th order polynomial as already suggested. A simple implementation would look something like this:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.misc import derivative  # deprecated in SciPy 1.10 and absent from recent releases; needs an older SciPy
x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,\
16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])
y = np.array([2,5,7,9,10,13,16,18,21,22,21,20,19,18,\
17,14,10,9,7,5,7,9,10,12,13,15,16,17,22,27])
# Simple interpolation of x and y
f = interp1d(x, y)
x_fake = np.arange(1.1, 30, 0.1)
# derivative of y with respect to x
df_dx = derivative(f, x_fake, dx=1e-6)
# Plot
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
ax1.errorbar(x, y, fmt="o", color="blue", label='Input data')
ax1.errorbar(x_fake, f(x_fake), label="Interpolated data", lw=2)
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax2.errorbar(x_fake, df_dx, lw=2)
ax2.errorbar(x_fake, np.array([0 for i in x_fake]), ls="--", lw=2)
ax2.set_xlabel("x")
ax2.set_ylabel("dy/dx")
leg = ax1.legend(loc=2, numpoints=1,scatterpoints=1)
leg.draw_frame(False)
You see that when the plot transitions from an 'upwards trend' (positive gradient) to a 'downwards trend' (negative gradient) the derivative (dy/dx) goes from positive to negative. The transition of this happens at dy/dx = 0, which is shown by the green dashed line. For the scipy routines you can look at:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html
http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
NumPy's diff/gradient should also work, and would not require the interpolation, but I showed the above so you could get the idea. For a complete mathematical description of differentiation/calculus, look at Wikipedia.
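As a sketch of the np.gradient route mentioned above (my own illustration on the sample data from the question), the slope can be estimated at every point without interpolation and its sign used as the trend label. The label strings "up", "down" and "flat" are arbitrary choices.

```python
import numpy as np

# Sample data from the question.
x = np.arange(1, 31, dtype=float)
y = np.array([2, 5, 7, 9, 10, 13, 16, 18, 21, 22, 21, 20, 19, 18, 17,
              14, 10, 9, 7, 5, 7, 9, 10, 12, 13, 15, 16, 17, 22, 27], dtype=float)

# Central differences in the interior, one-sided differences at the ends.
slope = np.gradient(y, x)

# Label each point by the sign of the local slope.
trend = np.where(slope > 0, "up", np.where(slope < 0, "down", "flat"))

for xi, si, ti in zip(x, slope, trend):
    print(f"x={xi:4.0f}  dy/dx={si:6.2f}  {ti}")
```

On this data the slope is exactly zero at x = 10 and x = 20, where the series turns around, and keeps one sign in between. For noisy data it usually helps to smooth y first, otherwise the sign flips on every small wiggle.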
I found this topic very important and interesting. I would like to extend the above-mentioned answer:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.misc import derivative  # deprecated in SciPy 1.10 and absent from recent releases; needs an older SciPy
x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,\
16,17,18,19,20,21,22,23,24,25,26,27,28,29,30])
y = np.array([2,5,7,9,10,13,16,18,21,22,21,20,19,18,\
17,14,10,9,7,5,7,9,10,12,13,15,16,17,22,27])
# Simple interpolation of x and y
f = interp1d(x, y, fill_value="extrapolate")
x_fake = np.arange(1.1, 30, 0.1)
# derivative of y with respect to x
df_dx = derivative(f, x_fake, dx=1e-6)
plt.plot(x,y, label = "Data")
plt.plot(x_fake,df_dx,label = "Trend")
plt.legend()
plt.show()
average = np.average(df_dx)
if average > 0:
    print("Uptrend", average)
elif average < 0:
    print("Downtrend", average)
elif average == 0:
    print("No trend!", average)
print("Max trend measure is:")
print(np.max(df_dx))
print("min trend measure is:")
print(np.min(df_dx))
print("Overall trend measure:")
print(((np.max(df_dx))-np.min(df_dx)-average)/((np.max(df_dx))-np.min(df_dx)))
extremum_list_y = []
extremum_list_x = []
for i in range(0, df_dx.shape[0]):
    # Keep the points where the derivative is (almost) zero
    if df_dx[i] < 0.001 and df_dx[i] > -0.001:
        extremum_list_x.append(x_fake[i])
        extremum_list_y.append(df_dx[i])
plt.scatter(extremum_list_x, extremum_list_y, label="Extremum", marker = "o", color = "green")
plt.plot(x,y, label = "Data")
plt.plot(x_fake, df_dx, label="Trend")
plt.legend()
plt.show()
So overall, the trend is upward for this graph!
This approach is also useful when you want to find the x values where the slope is zero, i.e. the extrema of the curve. Whether the local minima and maxima are caught depends on the sampling step and on the 0.001 threshold used above.
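A threshold on the derivative can miss a turning point if no sample happens to land close enough to zero. An alternative sketch, again my own and not part of the answer above, looks for sign changes between consecutive steps of the raw data. It assumes there are no flat steps (no two consecutive equal y values); plateaus would need extra handling.

```python
import numpy as np

# Sample data from the question.
x = np.arange(1, 31)
y = np.array([2, 5, 7, 9, 10, 13, 16, 18, 21, 22, 21, 20, 19, 18, 17,
              14, 10, 9, 7, 5, 7, 9, 10, 12, 13, 15, 16, 17, 22, 27], dtype=float)

# Sign of each step: +1 for a rise, -1 for a fall.
step_sign = np.sign(np.diff(y))

# A non-zero difference between consecutive signs marks a turning point.
sign_change = np.diff(step_sign)
turning_idx = np.where(sign_change != 0)[0] + 1

# Rise followed by fall is a local maximum; fall followed by rise is a local minimum.
maxima_x = x[turning_idx[sign_change[turning_idx - 1] < 0]]
minima_x = x[turning_idx[sign_change[turning_idx - 1] > 0]]

print("local maxima at x =", maxima_x)
print("local minima at x =", minima_x)
```

For this data the turning points come out at x = 10 (maximum) and x = 20 (minimum), which splits the series into an uptrend, a downtrend and a second uptrend.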

Writing variables as subscripts in math mode

I am trying to plot some data, using a for loop to plot distributions. Now I want to label those distributions according to the loop counter as the subscript in math notation. This is where I am with this at the moment.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab
mean = [10,12,16,22,25]
variance = [3,6,8,10,12]
x = np.linspace(0,40,1000)
for i in range(4):
    sigma = np.sqrt(variance[i])
    y = mlab.normpdf(x,mean[i],sigma)
    plt.plot(x,y,label=$v_i$) # where i is the variable i want to use to label. I should also be able to use elements from an array, say array[i] for the same.
    plt.xlabel("X")
    plt.ylabel("P(X)")
    plt.legend()
    plt.axvline(x=15, ymin=0, ymax=1,ls='--',c='black')
plt.show()
This doesn't work, and I can't keep the variable between the $ $ signs of the math notation, as it is interpreted as text. Is there a way to put the variable in the $ $ notation?
The original question has been edited, this answer has been updated to reflect this.
When working with LaTeX formatting in matplotlib you should use raw strings, denoted by r"", so that backslashes in LaTeX commands are not interpreted as Python escape sequences.
The code given below will iterate over range(4) and plot using i'th mean and variance (as you originally have done). It will also set the label for each plot using label=r'$v_{}$'.format(i+1). This string formatting simply replaces the {} with whatever is called inside format, in this case i+1. In this way you can automate the labels for your plots.
I have moved plt.axvline(...), plt.xlabel(...) and plt.ylabel(...) out of the for loop, as you only need to call them once. I've also moved plt.legend() out of the for loop for the same reason and removed its arguments. If you supply the keyword argument label to plt.plot(), you can label your plots individually as you plot them.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm  # matplotlib.mlab.normpdf was removed in Matplotlib 3.1
mean = [10,12,16,22,25]
variance = [3,6,8,10,12]
x = np.linspace(0,40,1000)
for i in range(4):
    sigma = np.sqrt(variance[i])
    y = norm.pdf(x, mean[i], sigma)
    plt.plot(x,y, label=r'$v_{}$'.format(i+1))
plt.xlabel("X")
plt.ylabel("P(X)")
plt.axvline(x=15, ymin=0, ymax=1,ls='--',c='black')
plt.legend()
plt.show()
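One caveat with '$v_{}$'.format(...): the braces are consumed by str.format, so the subscript reaches mathtext without braces, and only its first character is rendered as a subscript once the index has two or more digits. A small sketch of a workaround is to double the literal braces. The helper name subscript_label is made up for this example.

```python
def subscript_label(name, index):
    """Build a mathtext label such as $v_{12}$ with the whole index as subscript."""
    # '{{' and '}}' produce literal braces; the inner '{}' is the placeholder.
    return r"${}_{{{}}}$".format(name, index)

labels = [subscript_label("v", i + 1) for i in range(4)]
print(labels)

# Without the literal braces only the first digit would be subscripted.
print(r"$v_{}$".format(12))      # braces consumed by format
print(subscript_label("v", 12))  # braces kept for mathtext
```

The same helper works for elements of an array, e.g. subscript_label("v", array[i]), and the result can be passed straight to label= in plt.plot().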
So it turns out that you edited your question based on my answer. However, you're still not quite there. If you want to do it the way I think you want to code it, it should look like this:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm  # matplotlib.mlab.normpdf was removed in Matplotlib 3.1
mean = [10, 12, 16, 22, 25]
variance = [3, 6, 8, 10, 12]
x = np.linspace(0, 40, 1000)
for i in range(4):
    sigma = np.sqrt(variance[i])
    y = norm.pdf(x, mean[i], sigma)
    plt.plot(x, y, label = "$v_{" + str(i) + "}$")
plt.xlabel("X")
plt.ylabel("P(X)")
plt.legend()
plt.axvline(x = 15, ymin = 0, ymax = 1, ls = '--', c = 'black')
plt.show()
This code generates the following figure:
If you want the first plot to start with v_1 instead of v_0, all you need to change is str(i) to str(i+1). This way the subscripts are 1, 2, 3 and 4 instead of 0, 1, 2 and 3.
Hope this helps!
