I am trying to plot some data, using a for loop to plot distributions. Now I want to label those distributions according to the loop counter as the subscript in math notation. This is where I am with this at the moment.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab
mean = [10,12,16,22,25]
variance = [3,6,8,10,12]
x = np.linspace(0,40,1000)
for i in range(4):
    sigma = np.sqrt(variance[i])
    y = mlab.normpdf(x,mean[i],sigma)
    plt.plot(x,y,label=$v_i$) # where i is the variable i want to use to label. I should also be able to use elements from an array, say array[i] for the same.
    plt.xlabel("X")
    plt.ylabel("P(X)")
    plt.legend()
    plt.axvline(x=15, ymin=0, ymax=1,ls='--',c='black')
plt.show()
This doesn't work, and I can't keep the variable between the $ $ signs of the math notation, as it is interpreted as text. Is there a way to put the variable in the $ $ notation?
The original question has been edited, this answer has been updated to reflect this.
When working with LaTeX formatting in matplotlib you should use raw strings, denoted by r"", so that backslashes in the math text are not treated as Python escape sequences.
The code given below will iterate over range(4) and plot using i'th mean and variance (as you originally have done). It will also set the label for each plot using label=r'$v_{}$'.format(i+1). This string formatting simply replaces the {} with whatever is called inside format, in this case i+1. In this way you can automate the labels for your plots.
I have removed the plt.axvline(...), plt.xlabel(...) and plt.ylabel(...) out of the for loop as you only need to call it once. I've also removed the plt.legend() from the for loop for the same reason and have removed its arguments. If you supply the keyword argument label to plt.plot() then you can label your plots individually as you plot them.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab
mean = [10,12,16,22,25]
variance = [3,6,8,10,12]
x = np.linspace(0,40,1000)
for i in range(4):
    sigma = np.sqrt(variance[i])
    y = mlab.normpdf(x,mean[i],sigma)
    plt.plot(x,y, label=r'$v_{}$'.format(i+1))
plt.xlabel("X")
plt.ylabel("P(X)")
plt.axvline(x=15, ymin=0, ymax=1,ls='--',c='black')
plt.legend()
plt.show()
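A small sketch of the same idea for subscripts longer than one character, or for labels taken from an array. The helper name subscript_label and the names list are made up for illustration. Wrapping the subscript in braces makes the grouping explicit; inside str.format a literal brace is written doubled.

```python
def subscript_label(symbol, subscript):
    """Return a mathtext label such as '$v_{10}$'."""
    # '{{' and '}}' are literal braces; the two '{}' pairs are format fields.
    return r'${}_{{{}}}$'.format(symbol, subscript)

# Labels driven by the loop counter, starting at 1.
labels = [subscript_label('v', i + 1) for i in range(4)]
print(labels)

# Labels driven by the elements of an array (hypothetical names).
names = ['air', 'dryer']
print([subscript_label('T', name) for name in names])

# Without the braces the whole number is pasted in unbraced.
print(r'$v_{}$'.format(10))  # $v_10$
```

Passing label=subscript_label('v', i + 1) to plt.plot() inside the loop then works the same way as the format call above.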
So it turns out that you edited your question based on my answer. However, you're still not quite there. If you want to do it the way I think you want to code it, it should be like this:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.mlab as mlab
mean = [10, 12, 16, 22, 25]
variance = [3, 6, 8, 10, 12]
x = np.linspace(0, 40, 1000)
for i in range(4):
    sigma = np.sqrt(variance[i])
    y = mlab.normpdf(x, mean[i], sigma)
    plt.plot(x, y, label = "$v_{" + str(i) + "}$")
plt.xlabel("X")
plt.ylabel("P(X)")
plt.legend()
plt.axvline(x = 15, ymin = 0, ymax = 1, ls = '--', c = 'black')
plt.show()
This code generates the following figure:
In case you want the first plot start with v_1 instead of v_0 all you need to change is str(i+1). This way the subscripts are 1, 2, 3, and 4 instead of 0, 1, 2 and 3.
Hope this helps!
I am following the statsmodels documentation here:
https://www.statsmodels.org/stable/vector_ar.html
I get to the part at the middle of the page that says:
irf.plot(orth=False)
which produces the following graph for my data:
I need to modify the elements of the graph. E.g., I need to apply tight_layout and also decrease the y-tick sizes so that they don't get into the graphs to their left.
The documentation talks about passing "subplot plotting functions" in to the subplot argument of irf.plot(). But when I try something like:
irf.plot(subplot_params = {'fontsize': 8, 'figsize' : (100, 100), 'tight_layout': True})
only the fontsize parameter works. I also tried passing these parameters to the 'plot_params' argument, but to no avail.
So, my question is how can I access other parameters of this irf.plot, especially the figsize and ytick sizes? I also need to force it to print a grid, as well as all values on the x axis (1, 2, 3, 4, ..., 10)
Is there any way I can create a blank plot using the fig, ax = plt.subplots() way and then create the irf.plot on that figure?
Looks like the function returns a matplotlib.figure:
Try doing this:
fig = irf.plot(orth=False,..)
fig.tight_layout()
fig.set_figheight(100)
fig.set_figwidth(100)
If I run it with this example, it works:
import numpy as np
import pandas
import statsmodels.api as sm
from statsmodels.tsa.api import VAR
mdata = sm.datasets.macrodata.load_pandas().data
dates = mdata[['year', 'quarter']].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
from statsmodels.tsa.base.datetools import dates_from_str
quarterly = dates_from_str(quarterly)
mdata = mdata[['realgdp','realcons','realinv']]
mdata.index = pandas.DatetimeIndex(quarterly)
data = np.log(mdata).diff().dropna()
model = VAR(data)
results = model.fit(maxlags=15, ic='aic')
irf = results.irf(10)
fig = irf.plot(orth=False)
fig.tight_layout()
fig.set_figheight(30)
fig.set_figwidth(30)
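Since the return value is an ordinary matplotlib Figure, the remaining requests from the question (smaller tick labels, a grid, explicit x ticks) can probably be handled by looping over fig.axes. The sketch below is an assumption-laden illustration: tidy_figure is a made-up helper, and a plain plt.subplots() grid stands in for the figure returned by irf.plot().

```python
import matplotlib.pyplot as plt

def tidy_figure(fig, width=10, height=8, tick_size=8, xticks=None):
    """Resize a figure and restyle every axes it contains (hypothetical helper)."""
    fig.set_size_inches(width, height)
    for ax in fig.axes:
        ax.tick_params(axis='both', labelsize=tick_size)  # smaller tick labels
        ax.grid(True)                                     # force a grid
        if xticks is not None:
            ax.set_xticks(list(xticks))                   # show every requested tick
    fig.tight_layout()
    return fig

# Stand-in for `fig = irf.plot(orth=False)`.
fig, axes = plt.subplots(3, 3)
tidy_figure(fig, xticks=range(0, 11))
```

With the real object you would pass the figure returned by irf.plot() instead of the stand-in.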
I would like to plot a vector field with curved arrows in python, as can be done in vfplot (see below) or IDL.
You can get close in matplotlib, but using quiver() limits you to straight vectors (see below left) whereas streamplot() doesn't seem to permit meaningful control over arrow length or arrowhead position (see below right), even when changing integration_direction, density, and maxlength.
So, is there a python library that can do this? Or is there a way of getting matplotlib to do it?
If you look at the streamplot.py that is included in matplotlib, on lines 196 - 202 (ish, idk if this has changed between versions - I'm on matplotlib 2.1.2) we see the following:
... (to line 195)
# Add arrows half way along each trajectory.
s = np.cumsum(np.sqrt(np.diff(tx) ** 2 + np.diff(ty) ** 2))
n = np.searchsorted(s, s[-1] / 2.)
arrow_tail = (tx[n], ty[n])
arrow_head = (np.mean(tx[n:n + 2]), np.mean(ty[n:n + 2]))
... (after line 196)
changing that part to this will do the trick (changing assignment of n):
... (to line 195)
# Add arrows half way along each trajectory.
s = np.cumsum(np.sqrt(np.diff(tx) ** 2 + np.diff(ty) ** 2))
n = np.searchsorted(s, s[-1]) ### THIS IS THE EDITED LINE! ###
arrow_tail = (tx[n], ty[n])
arrow_head = (np.mean(tx[n:n + 2]), np.mean(ty[n:n + 2]))
... (after line 196)
If you modify this to put the arrow at the end, then you could generate the arrows more to your liking.
Additionally, from the docs at the top of the function, we see the following:
*linewidth* : numeric or 2d array
vary linewidth when given a 2d array with the same shape as velocities.
The linewidth can be a numpy.ndarray, and if you can pre-calculate the desired width of your arrows, you'll be able to modify the pencil width while drawing the arrows. It looks like this part has already been done for you.
So, in combination with shortening the arrows maxlength, increasing the density, and adding start_points, as well as tweaking the function to put the arrow at the end instead of the middle, you could get your desired graph.
With these modifications, and the following code, I was able to get a result much closer to what you wanted:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib.patches as pat
w = 3
Y, X = np.mgrid[-w:w:100j, -w:w:100j]
U = -1 - X**2 + Y
V = 1 + X - Y**2
speed = np.sqrt(U*U + V*V)
fig = plt.figure(figsize=(14, 18))
gs = gridspec.GridSpec(nrows=3, ncols=2, height_ratios=[1, 1, 2])
grains = 10
tmp = tuple([x]*grains for x in np.linspace(-2, 2, grains))
xs = []
for x in tmp:
    xs += x
ys = tuple(np.linspace(-2, 2, grains))*grains
seed_points = np.array([list(xs), list(ys)])
# Varying color along a streamline
ax1 = fig.add_subplot(gs[0, 1])
strm = ax1.streamplot(X, Y, U, V, color=U, linewidth=np.array(5*np.random.random_sample((100, 100))**2 + 1), cmap='winter', density=10,
minlength=0.001, maxlength = 0.07, arrowstyle='fancy',
integration_direction='forward', start_points = seed_points.T)
fig.colorbar(strm.lines)
ax1.set_title('Varying Color')
plt.tight_layout()
plt.show()
tl;dr: go copy the source code, and change it to put the arrows at the end of each path, instead of in the middle. Then use your streamplot instead of the matplotlib streamplot.
Edit: I got the linewidths to vary
Starting with David Culbreth's modification, I rewrote chunks of the streamplot function to achieve the desired behaviour. Slightly too numerous to specify them all here, but it includes a length-normalising method and disables the trajectory-overlap checking. I've appended two comparisons of the new curved quiver function with the original streamplot and quiver.
Here's a way to obtain the desired output in vanilla pyplot (i.e., without modifying the streamplot function or anything that fancy). As a reminder, the goal is to visualize a vector field with curved arrows whose length is proportional to the norm of the vector.
The trick is to:
1. make a streamplot with no arrows, traced backward from a given point
2. plot a quiver from that point, with the quiver small enough that only the arrowhead is visible
3. repeat 1. and 2. in a loop for every seed, scaling the length of the streamplot to be proportional to the norm of the vector.
import matplotlib.pyplot as plt
import numpy as np
w = 3
Y, X = np.mgrid[-w:w:8j, -w:w:8j]
U = -Y
V = X
norm = np.sqrt(U**2 + V**2)
norm_flat = norm.flatten()
start_points = np.array([X.flatten(),Y.flatten()]).T
plt.clf()
scale = .2/np.max(norm)
plt.subplot(121)
plt.title('scaling only the length')
for i in range(start_points.shape[0]):
    plt.streamplot(X,Y,U,V, color='k', start_points=np.array([start_points[i,:]]),minlength=.95*norm_flat[i]*scale, maxlength=1.0*norm_flat[i]*scale,
                   integration_direction='backward', density=10, arrowsize=0.0)
plt.quiver(X,Y,U/norm, V/norm,scale=30)
plt.axis('square')
plt.subplot(122)
plt.title('scaling length, arrowhead and linewidth')
for i in range(start_points.shape[0]):
    plt.streamplot(X,Y,U,V, color='k', start_points=np.array([start_points[i,:]]),minlength=.95*norm_flat[i]*scale, maxlength=1.0*norm_flat[i]*scale,
                   integration_direction='backward', density=10, arrowsize=0.0, linewidth=.5*norm_flat[i])
plt.quiver(X,Y,U/np.max(norm), V/np.max(norm),scale=30)
plt.axis('square')
Here's the result:
Just looking at the documentation on streamplot(), found here -- what if you used something like streamplot( ... ,minlength = n/2, maxlength = n) where n is the desired length -- you will need to play with those numbers a bit to get your desired graph
You can control the seed points using start_points, as shown in the example provided by @JohnKoch.
Here's an example of how I controlled the length with streamplot() -- it's pretty much a straight copy/paste/crop from the example from above.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib.patches as pat
w = 3
Y, X = np.mgrid[-w:w:100j, -w:w:100j]
U = -1 - X**2 + Y
V = 1 + X - Y**2
speed = np.sqrt(U*U + V*V)
fig = plt.figure(figsize=(14, 18))
gs = gridspec.GridSpec(nrows=3, ncols=2, height_ratios=[1, 1, 2])
grains = 10
tmp = tuple([x]*grains for x in np.linspace(-2, 2, grains))
xs = []
for x in tmp:
    xs += x
ys = tuple(np.linspace(-2, 2, grains))*grains
seed_points = np.array([list(xs), list(ys)])
arrowStyle = pat.ArrowStyle.Fancy()
# Varying color along a streamline
ax1 = fig.add_subplot(gs[0, 1])
strm = ax1.streamplot(X, Y, U, V, color=U, linewidth=1.5, cmap='winter', density=10,
minlength=0.001, maxlength = 0.1, arrowstyle='->',
integration_direction='forward', start_points = seed_points.T)
fig.colorbar(strm.lines)
ax1.set_title('Varying Color')
plt.tight_layout()
plt.show()
Edit: made it prettier, though still not quite what we were looking for.
I am trying to set the format to two decimal numbers in a matplotlib subplot environment. Unfortunately, I do not have any idea how to solve this task.
To prevent using scientific notation on the y-axis I used ScalarFormatter(useOffset=False) as you can see in my snippet below. I think my task should be solved by passing further options/arguments to the used formatter. However, I could not find any hint in matplotlib's documentation.
How can I set two decimal digits or none (both cases are needed)? I am not able to provide sample data, unfortunately.
-- SNIPPET --
f, axarr = plt.subplots(3, sharex=True)
data = conv_air
x = range(0, len(data))
axarr[0].scatter(x, data)
axarr[0].set_ylabel('$T_\mathrm{air,2,2}$', size=FONT_SIZE)
axarr[0].yaxis.set_major_locator(MaxNLocator(5))
axarr[0].yaxis.set_major_formatter(ScalarFormatter(useOffset=False))
axarr[0].tick_params(direction='out', labelsize=FONT_SIZE)
axarr[0].grid(which='major', alpha=0.5)
axarr[0].grid(which='minor', alpha=0.2)
data = conv_dryer
x = range(0, len(data))
axarr[1].scatter(x, data)
axarr[1].set_ylabel('$T_\mathrm{dryer,2,2}$', size=FONT_SIZE)
axarr[1].yaxis.set_major_locator(MaxNLocator(5))
axarr[1].yaxis.set_major_formatter(ScalarFormatter(useOffset=False))
axarr[1].tick_params(direction='out', labelsize=FONT_SIZE)
axarr[1].grid(which='major', alpha=0.5)
axarr[1].grid(which='minor', alpha=0.2)
data = conv_lambda
x = range(0, len(data))
axarr[2].scatter(x, data)
axarr[2].set_xlabel('Iterationsschritte', size=FONT_SIZE)
axarr[2].xaxis.set_major_locator(MaxNLocator(integer=True))
axarr[2].set_ylabel('$\lambda$', size=FONT_SIZE)
axarr[2].yaxis.set_major_formatter(ScalarFormatter(useOffset=False))
axarr[2].yaxis.set_major_locator(MaxNLocator(5))
axarr[2].tick_params(direction='out', labelsize=FONT_SIZE)
axarr[2].grid(which='major', alpha=0.5)
axarr[2].grid(which='minor', alpha=0.2)
See the relevant documentation in general and specifically
from matplotlib.ticker import FormatStrFormatter
fig, ax = plt.subplots()
ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
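To cover both cases asked about (two decimals or none), here is a minimal sketch that builds one formatter for each and calls them directly, so the resulting tick text can be inspected without drawing anything. The variable names are only illustrative.

```python
from matplotlib.ticker import FormatStrFormatter

two_decimals = FormatStrFormatter('%.2f')  # old-style % format, two decimals
no_decimals = FormatStrFormatter('%.0f')   # no decimals

# A formatter is called with the tick value and its position.
print(two_decimals(3.14159, 0))  # 3.14
print(no_decimals(3.0, 0))       # 3
```

Either object can be passed to axarr[k].yaxis.set_major_formatter(...) in place of ScalarFormatter(useOffset=False).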
If you are directly working with matplotlib's pyplot (plt) and if you are more familiar with the new-style format string, you can try this:
from matplotlib.ticker import StrMethodFormatter
plt.gca().yaxis.set_major_formatter(StrMethodFormatter('{x:,.0f}')) # No decimal places
plt.gca().yaxis.set_major_formatter(StrMethodFormatter('{x:,.2f}')) # 2 decimal places
From the documentation:
class matplotlib.ticker.StrMethodFormatter(fmt)
Use a new-style format string (as used by str.format()) to format the
tick.
The field used for the value must be labeled x and the field used for
the position must be labeled pos.
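As a quick check of what those format strings produce, the formatter can be called directly with a tick value and position. This is only a sketch; note that the comma in the format spec adds a thousands separator, which you may or may not want.

```python
from matplotlib.ticker import StrMethodFormatter

with_separator = StrMethodFormatter('{x:,.2f}')  # thousands separator, 2 decimals
plain = StrMethodFormatter('{x:.2f}')            # 2 decimals, no separator

print(with_separator(1234.5678, 0))  # 1,234.57
print(plain(1234.5678, 0))           # 1234.57
```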
The answer above is probably the correct way to do it, but didn't work for me.
The hacky way that solved it for me was the following:
ax = <whatever your plot is>
# get the current labels
labels = [item.get_text() for item in ax.get_xticklabels()]
# Beat them into submission and set them back again
ax.set_xticklabels([str(round(float(label), 2)) for label in labels])
# Show the plot, and go home to family
plt.show()
format labels using lambda function
3x the same plot with different y-labeling
Minimal example
import numpy as np
import matplotlib as mpl
import matplotlib.pylab as plt
from matplotlib.ticker import FormatStrFormatter
fig, axs = mpl.pylab.subplots(1, 3)
xs = np.arange(10)
ys = 1 + xs ** 2 * 1e-3
axs[0].set_title('default y-labeling')
axs[0].scatter(xs, ys)
axs[1].set_title('custom y-labeling')
axs[1].scatter(xs, ys)
axs[2].set_title('x, pos arguments')
axs[2].scatter(xs, ys)
fmt = lambda x, pos: '1+ {:.0f}e-3'.format((x-1)*1e3, pos)
axs[1].yaxis.set_major_formatter(mpl.ticker.FuncFormatter(fmt))
fmt = lambda x, pos: 'x={:f}\npos={:f}'.format(x, pos)
axs[2].yaxis.set_major_formatter(mpl.ticker.FuncFormatter(fmt))
You can also use 'real'-functions instead of lambdas, of course.
https://matplotlib.org/3.1.1/gallery/ticks_and_spines/tick-formatters.html
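For the remark about using a regular function instead of a lambda, a minimal sketch could look like the following. The function name offset_label is made up; it reproduces the '1+ ...e-3' labeling from the example above.

```python
from matplotlib.ticker import FuncFormatter

def offset_label(x, pos):
    """Format a tick value as an offset from 1 in units of 1e-3."""
    # pos is the tick index; it is required by the interface but unused here.
    return '1+ {:.0f}e-3'.format((x - 1) * 1e3)

formatter = FuncFormatter(offset_label)
print(formatter(1.004, 0))  # 1+ 4e-3
```

The formatter is then attached with axs[1].yaxis.set_major_formatter(formatter), exactly as with the lambda.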
In matplotlib 3.1, you can also use ticklabel_format. To prevent scientific notation without offsets:
plt.gca().ticklabel_format(axis='both', style='plain', useOffset=False)
I am trying to make a profile plot for two columns of a pandas.DataFrame. I would not expect this to be in pandas directly but it seems there is nothing in matplotlib either. I have searched around and cannot find it in any package other than rootpy. Before I take the time to write this myself I thought I would ask if there was a small package that contained profile histograms, perhaps where they are known by a different name.
If you don't know what I mean by "profile histogram" have a look at the ROOT implementation. http://root.cern.ch/root/html/TProfile.html
You can easily do it using scipy.stats.binned_statistic.
import scipy.stats
import numpy
import matplotlib.pyplot as plt
x = numpy.random.rand(10000)
y = x + scipy.stats.norm(0, 0.2).rvs(10000)
means_result = scipy.stats.binned_statistic(x, [y, y**2], bins=50, range=(0,1), statistic='mean')
means, means2 = means_result.statistic
standard_deviations = numpy.sqrt(means2 - means**2)
bin_edges = means_result.bin_edges
bin_centers = (bin_edges[:-1] + bin_edges[1:])/2.
plt.errorbar(x=bin_centers, y=means, yerr=standard_deviations, linestyle='none', marker='.')
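The standard deviation above comes from the identity Var(y) = E[y^2] - E[y]^2 evaluated per bin. If you would rather plot the standard error of the mean, you also need the number of entries per bin. The sketch below swaps in np.histogram with weights instead of binned_statistic (a different technique that gives the same per-bin sums); the function name profile is made up.

```python
import numpy as np

def profile(x, y, bins, value_range):
    """Per-bin mean, standard deviation and standard error of y, binned in x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    counts, edges = np.histogram(x, bins=bins, range=value_range)
    sum_y, _ = np.histogram(x, bins=bins, range=value_range, weights=y)
    sum_y2, _ = np.histogram(x, bins=bins, range=value_range, weights=y ** 2)
    # Empty bins give 0/0; let them become NaN instead of raising warnings.
    with np.errstate(divide='ignore', invalid='ignore'):
        mean = sum_y / counts
        # Clip tiny negative values caused by floating-point rounding.
        std = np.sqrt(np.maximum(sum_y2 / counts - mean ** 2, 0.0))
        sem = std / np.sqrt(counts)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, mean, std, sem

centers, mean, std, sem = profile([0.1, 0.2, 0.6, 0.7], [1, 3, 5, 5],
                                  bins=2, value_range=(0, 1))
```

Passing yerr=sem instead of yerr=std to plt.errorbar then draws the uncertainty on the mean rather than the spread in each bin.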
Use seaborn. Data as from @MaxNoe
import numpy as np
import seaborn as sns
# just some random numbers to get started
x = np.random.uniform(-2, 2, 10000)
y = np.random.normal(x**2, np.abs(x) + 1)
sns.regplot(x=x, y=y, x_bins=10, fit_reg=None)
You can do much more (error bands are from bootstrap, you can change the estimator on the y-axis, add regression, ...)
While @Keith's answer seems to fit what you mean, it is quite a lot of code. I think this can be done much more simply, so one gets the key concepts and can adjust and build on top of it.
Let me stress one thing: what ROOT calls a ProfileHistogram is not a special kind of plot. It is an errorbar plot, which can simply be done in matplotlib.
It is a special kind of computation, and that's not the task of a plotting library. This lies in the pandas realm, and pandas is great at stuff like this. It's symptomatic of ROOT, as the giant monolithic pile it is, to have an extra class for this.
So what you want to do is: discretize in some variable x and, for each bin, calculate something in another variable y.
This can easily be done using np.digitize together with the pandas groupby and aggregate methods.
Putting it all together:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# just some random numbers to get started
x = np.random.uniform(-2, 2, 10000)
y = np.random.normal(x**2, np.abs(x) + 1)
df = pd.DataFrame({'x': x, 'y': y})
# calculate which bin each row belongs to, based on `x`
# bins needs the bin edges, so this will give us 100 equally sized bins
bins = np.linspace(-2, 2, 101)
df['bin'] = np.digitize(x, bins=bins)
bin_centers = 0.5 * (bins[:-1] + bins[1:])
bin_width = bins[1] - bins[0]
# group by bin, so we can calculate stuff
binned = df.groupby('bin')
# calculate mean and standard error of the mean for y in each bin
result = binned['y'].agg(['mean', 'sem'])
result['x'] = bin_centers
result['xerr'] = bin_width / 2
# plot it
result.plot(
    x='x',
    y='mean',
    xerr='xerr',
    yerr='sem',
    linestyle='none',
    capsize=0,
    color='black',
)
plt.savefig('result.png', dpi=300)
Just like ROOT ;)
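One detail worth keeping in mind with this approach: np.digitize returns 0 for values below the first edge and len(bins) for values at or above the last edge, and the line result['x'] = bin_centers only lines up if every bin from 1 to 100 is populated and nothing falls outside. A small sketch of that behaviour, with five edges chosen purely for illustration:

```python
import numpy as np

bins = np.linspace(-2, 2, 5)  # edges: -2, -1, 0, 1, 2 -> four bins
values = np.array([-2.5, -2.0, 0.0, 1.99, 2.0])

idx = np.digitize(values, bins=bins)
print(idx.tolist())  # [0, 1, 3, 4, 5]

# Indices 1..len(bins)-1 are real bins; 0 and len(bins) mean "out of range".
in_range = (idx >= 1) & (idx <= len(bins) - 1)
print(values[in_range].tolist())  # [-2.0, 0.0, 1.99]
```

With real data it may be safer to drop the out-of-range rows first and to index bin_centers by the bin numbers that actually occur.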
I made a module myself for this functionality.
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import matplotlib.pyplot as plt
def Profile(x,y,nbins,xmin,xmax,ax):
    df = DataFrame({'x' : x , 'y' : y})
    binedges = xmin + ((xmax-xmin)/nbins) * np.arange(nbins+1)
    df['bin'] = np.digitize(df['x'],binedges)
    bincenters = xmin + ((xmax-xmin)/nbins)*np.arange(nbins) + ((xmax-xmin)/(2*nbins))
    ProfileFrame = DataFrame({'bincenters' : bincenters, 'N' : df['bin'].value_counts(sort=False)},index=range(1,nbins+1))
    bins = ProfileFrame.index.values
    for bin in bins:
        # .loc replaces the .ix indexer, which was removed in pandas 1.0
        ProfileFrame.loc[bin,'ymean'] = df.loc[df['bin']==bin,'y'].mean()
        ProfileFrame.loc[bin,'yStandDev'] = df.loc[df['bin']==bin,'y'].std()
        ProfileFrame.loc[bin,'yMeanError'] = ProfileFrame.loc[bin,'yStandDev'] / np.sqrt(ProfileFrame.loc[bin,'N'])
    # fmt='none' draws only the error bars (older matplotlib accepted fmt=None)
    ax.errorbar(ProfileFrame['bincenters'], ProfileFrame['ymean'], yerr=ProfileFrame['yMeanError'], xerr=(xmax-xmin)/(2*nbins), fmt='none')
    return ax
def Profile_Matrix(frame):
    #Much of this is stolen from https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py
    import pandas.core.common as com
    import pandas.tools.plotting as plots
    from pandas.compat import lrange
    from matplotlib.artist import setp
    range_padding=0.05
    df = frame._get_numeric_data()
    n = df.columns.size
    fig, axes = plots._subplots(nrows=n, ncols=n, squeeze=False)
    # no gaps between subplots
    fig.subplots_adjust(wspace=0, hspace=0)
    mask = com.notnull(df)
    boundaries_list = []
    for a in df.columns:
        values = df[a].values[mask[a].values]
        rmin_, rmax_ = np.min(values), np.max(values)
        rdelta_ext = (rmax_ - rmin_) * range_padding / 2.
        boundaries_list.append((rmin_ - rdelta_ext, rmax_+ rdelta_ext))
    for i, a in zip(lrange(n), df.columns):
        for j, b in zip(lrange(n), df.columns):
            common = (mask[a] & mask[b]).values
            nbins = 100
            (xmin,xmax) = boundaries_list[i]
            ax = axes[i, j]
            Profile(df[a][common],df[b][common],nbins,xmin,xmax,ax)
            ax.set_xlabel('')
            ax.set_ylabel('')
            plots._label_axis(ax, kind='x', label=b, position='bottom', rotate=True)
            plots._label_axis(ax, kind='y', label=a, position='left')
            if j!= 0:
                ax.yaxis.set_visible(False)
            if i != n-1:
                ax.xaxis.set_visible(False)
    for ax in axes.flat:
        setp(ax.get_xticklabels(), fontsize=8)
        setp(ax.get_yticklabels(), fontsize=8)
    return axes
To my knowledge, matplotlib still doesn't allow you to produce profile histograms directly.
You can instead take a look at Hippodraw, a package developed at SLAC, that can be used as a Python extension module.
Here is a profile histogram example:
http://www.slac.stanford.edu/grp/ek/hippodraw/datareps_root.html#datareps_profilehist
I've got the following simple script that plots a graph:
import matplotlib.pyplot as plt
import numpy as np
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
plt.plot(T,power)
plt.show()
As it is now, the line goes straight from point to point, which looks ok but could be better in my opinion. What I want is to smooth the line between the points. In Gnuplot I would have plotted with smooth csplines.
Is there an easy way to do this in PyPlot? I've found some tutorials, but they all seem rather complex.
You could use scipy.interpolate.spline to smooth out your data yourself:
from scipy.interpolate import spline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
power_smooth = spline(T, power, xnew)
plt.plot(xnew,power_smooth)
plt.show()
spline is deprecated in scipy 0.19.0, use BSpline class instead.
Switching from spline to BSpline isn't a straightforward copy/paste and requires a little tweaking:
from scipy.interpolate import make_interp_spline, BSpline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
spl = make_interp_spline(T, power, k=3) # type: BSpline
power_smooth = spl(xnew)
plt.plot(xnew, power_smooth)
plt.show()
Before:
After:
For this example spline works well, but if the function is not smooth inherently and you want to have smoothed version you can also try:
from scipy.ndimage import gaussian_filter1d
ysmoothed = gaussian_filter1d(y, sigma=2)
plt.plot(x, ysmoothed)
plt.show()
If you increase sigma you get a smoother function.
Proceed with caution with this one. It modifies the original values and may not be what you want.
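To make that caution concrete, here is a short sketch comparing the two approaches on the data from the question: the interpolating spline passes through the measured points, while the Gaussian filter replaces each value with a weighted average of its neighbours. The choice sigma=2 just mirrors the snippet above.

```python
import numpy as np
from scipy.interpolate import make_interp_spline
from scipy.ndimage import gaussian_filter1d

T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])

# Interpolation: evaluated at the original T, the spline returns the data.
spl = make_interp_spline(T, power, k=3)
print(np.allclose(spl(T), power))       # True

# Filtering: the values at the original T are changed.
filtered = gaussian_filter1d(power, sigma=2)
print(np.allclose(filtered, power))     # False
```

So the filter is better suited to noisy series where the individual values are not meant to be hit exactly.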
See the scipy.interpolate documentation for some examples.
The following example demonstrates its use, for linear and cubic spline interpolation:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
# Define x, y, and xnew to resample at.
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
xnew = np.linspace(0, 10, num=41, endpoint=True)
# Define interpolators.
f_linear = interp1d(x, y)
f_cubic = interp1d(x, y, kind='cubic')
# Plot.
plt.plot(x, y, 'o', label='data')
plt.plot(xnew, f_linear(xnew), '-', label='linear')
plt.plot(xnew, f_cubic(xnew), '--', label='cubic')
plt.legend(loc='best')
plt.show()
Slightly modified for increased readability.
One of the easiest implementations I found was to use the exponential moving average that TensorBoard uses:
from typing import List

def smooth(scalars: List[float], weight: float) -> List[float]:  # Weight between 0 and 1
    last = scalars[0]  # First value in the plot (first timestep)
    smoothed = list()
    for point in scalars:
        smoothed_val = last * weight + (1 - weight) * point  # Calculate smoothed value
        smoothed.append(smoothed_val)                        # Save it
        last = smoothed_val                                  # Anchor the last smoothed value
    return smoothed

ax.plot(x_labels, smooth(train_data, .9), x_labels, train_data)
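To see how the weight behaves, here is the same recurrence, s_t = weight * s_(t-1) + (1 - weight) * x_t, run on three numbers. The function name ema and the sample values are only illustrative.

```python
from typing import List

def ema(values: List[float], weight: float) -> List[float]:
    """Exponential moving average seeded with the first value."""
    smoothed = []
    last = values[0]
    for value in values:
        last = weight * last + (1 - weight) * value
        smoothed.append(last)
    return smoothed

print(ema([1.0, 2.0, 3.0], 0.5))  # [1.0, 1.5, 2.25]
print(ema([1.0, 2.0, 3.0], 0.0))  # [1.0, 2.0, 3.0]  (no smoothing)
print(ema([1.0, 2.0, 3.0], 1.0))  # [1.0, 1.0, 1.0]  (stuck at the first value)
```

A weight close to 1 therefore gives a very smooth but lagging curve, and a weight close to 0 follows the raw data.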
I presume you mean curve-fitting and not anti-aliasing from the context of your question. PyPlot doesn't have any built-in support for this, but you can easily implement some basic curve-fitting yourself, like the code seen here, or if you're using GuiQwt it has a curve fitting module. (You could probably also steal the code from SciPy to do this as well).
Here is a simple solution for dates:
from scipy.interpolate import make_interp_spline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from datetime import datetime
data = {
datetime(2016, 9, 26, 0, 0): 26060, datetime(2016, 9, 27, 0, 0): 23243,
datetime(2016, 9, 28, 0, 0): 22534, datetime(2016, 9, 29, 0, 0): 22841,
datetime(2016, 9, 30, 0, 0): 22441, datetime(2016, 10, 1, 0, 0): 23248
}
#create data
date_np = np.array(list(data.keys()))
value_np = np.array(list(data.values()))
date_num = dates.date2num(date_np)
# smooth
date_num_smooth = np.linspace(date_num.min(), date_num.max(), 100)
spl = make_interp_spline(date_num, value_np, k=3)
value_np_smooth = spl(date_num_smooth)
# print
plt.plot(date_np, value_np)
plt.plot(dates.num2date(date_num_smooth), value_np_smooth)
plt.show()
It's worth your time looking at seaborn for plotting smoothed lines.
The seaborn lmplot function will plot data and regression model fits.
The following illustrates both polynomial and lowess fits:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
df = pd.DataFrame(data = {'T': T, 'power': power})
sns.lmplot(x='T', y='power', data=df, ci=None, order=4, truncate=False)
sns.lmplot(x='T', y='power', data=df, ci=None, lowess=True, truncate=False)
The order = 4 polynomial fit is overfitting this toy dataset. I don't show it here but order = 2 and order = 3 gave worse results.
The lowess = True fit is underfitting this tiny dataset but may give better results on larger datasets.
Check the seaborn regression tutorial for more examples.
Another way to go, which slightly modifies the function depending on the parameters you use:
from statsmodels.nonparametric.smoothers_lowess import lowess
def smoothing(x, y):
    lowess_frac = 0.15  # size of data (%) for estimation =~ smoothing window
    lowess_it = 0
    x_smooth = x
    y_smooth = lowess(y, x, is_sorted=False, frac=lowess_frac, it=lowess_it, return_sorted=False)
    return x_smooth, y_smooth
That was better suited than other answers for my specific application case.