How to smooth date-based data in matplotlib? - python

I have two lists: the first with dates (datetime objects) and the second with values for those dates.
When I create a simple plot:
plt.plot_date(x=dates, y=dur, fmt='r-')
I get a very ugly image like this.
How can I smooth this line? I thought about interpolation, but I have not found a simple function for it. SciPy has tools for this, but they seem complicated, and I don't understand what I would need to add to my data.

You can make it smooth using np.polyfit (sp.polyfit was a SciPy re-export of the same NumPy function and has since been removed).
Code:
import numpy as np
import matplotlib.pyplot as plt
# sample data
x = np.arange(199)
r = np.random.rand(100)
y = np.convolve(r, r)  # length 100 + 100 - 1 = 199
# plot the sample data
plt.plot(x, y, color='grey')
# smooth the sample data with a degree-50 polynomial fit
p = np.polyfit(x, y, deg=50)
y_ = np.polyval(p, x)
# plot the smoothed data
plt.plot(x, y_, color='r', linewidth=2)
plt.show()
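If your x values are datetime objects, a simpler alternative is a moving average. Here is a minimal sketch (with fabricated dates and values standing in for your two lists) using a pandas rolling mean:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# fabricated data standing in for the asker's `dates` and `dur` lists
dates = pd.date_range('2024-01-01', periods=120, freq='D')
dur = np.random.rand(120).cumsum()
s = pd.Series(dur, index=dates)
smooth = s.rolling(window=7, center=True).mean()  # 7-day centered moving average
plt.plot(s.index, s.values, color='grey', alpha=0.4, label='raw')
plt.plot(smooth.index, smooth.values, color='r', linewidth=2, label='7-day mean')
plt.legend()
plt.show()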

Related

Check if seaborn scatterplot function is sampling data

I have plotted a seaborn scatter plot. My data consists of 5000 data points. Looking at the plot, I am definitely not seeing 5000 points, so I'm pretty sure the seaborn scatterplot function performs some kind of sampling. I want to know how many data points each point in the plot represents. In case it depends on the code, here it is:
g = sns.scatterplot(x=data['x'], y=data['y'],hue=data['P'], s=40, edgecolor='k', alpha=0.8, legend="full")
Nothing would really suggest to me that seaborn is sampling your data. However, you can check the data in your axes g to be sure. Query the children of the axes for a PathCollection (scatter plot) object:
g.get_children()
It's probably the first item in the list that is returned. From there you can use get_offsets to retrieve the data and check its shape.
g.get_children()[0].get_offsets().shape
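For example, a quick end-to-end check on fabricated data (a sketch; the random sample and styling are assumptions, not your data) could look like:
import numpy as np
import seaborn as sns
from matplotlib.collections import PathCollection
rng = np.random.default_rng(0)
g = sns.scatterplot(x=rng.normal(size=5000), y=rng.normal(size=5000), s=40, alpha=0.8)
# find the PathCollection holding the scatter points among the axes' children
pc = next(a for a in g.get_children() if isinstance(a, PathCollection))
print(pc.get_offsets().shape)  # (5000, 2): every point is present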
As far as I know, no sampling is performed. In the picture you posted, you can see that most of the data points simply overlap, and that might be the reason why you cannot see 5000 distinct points. Try with fewer points and you will see that all of them get plotted.
In order to check whether or not Seaborn's scatter removes points, here is a way to see 5000 different points. No points seem to be missing.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
x = np.linspace(1, 100, 100)
y = np.linspace(1, 50, 50)
X, Y = np.meshgrid(x, y)
Z = (X * Y) % 25
X = np.ravel(X)
Y = np.ravel(Y)
Z = np.ravel(Z)
sns.scatterplot(x=X, y=Y, s=15, hue=Z, palette=plt.cm.plasma, legend=False)
plt.show()

Better visualization of matplotlib plot

I want to include a plot in my thesis (the document will be a standard A4-page PDF) for which I have two time series, both continuous values expressed as percentages.
Both time series cover one year without Sundays, so about 310 data points each.
I tried to come up with something like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
ts = day_agg_plan_temp.set_index('Date')
ts = ts['2018-01-01': '2019-01-01']
plt.figure(figsize=(20,15))
ax1 = ts.label.plot(grid=True, label='Ground Truth', marker='.')
ax2 = ts.pred.plot(grid=True, label='Prediction', marker='.')
plt.legend()
plt.show()
resulting in this:
This is not really appealing, as there is too much going on, and I want to highlight the difference between the blue and orange lines at each data point.
So my question is: is there a better way to do this, other than shrinking the date range (which I really don't want, because this plot is already a snippet of the actual time series, which covers almost 3 years)?
Here is some code that generates data using fractional Brownian motion, calculates a trend using a Savitzky–Golay filter (but use whatever works best for your case), and plots it in a way that lets the reader see the original data and the trend clearly at the same time.
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter
# Generating some Random Data
def brownian(x0, n, dt, delta, out=None):
    x0 = np.asarray(x0)
    # each increment is normal with standard deviation delta * sqrt(dt)
    r = norm.rvs(size=x0.shape + (n,), scale=delta * np.sqrt(dt))
    if out is None:
        out = np.empty(r.shape)
    np.cumsum(r, axis=-1, out=out)
    out += np.expand_dims(x0, axis=-1)
    return out
delta = 2
T = 10.0
N = 500
dt = T/N
m = 2
x = np.empty((m,N+1))
x[:, 0] = 50
brownian(x[:,0], N, dt, delta, out=x[:,1:])
t = np.linspace(0.0, N*dt, N+1)
# Obtaining the trend using some arbitrary filter
y1 = savgol_filter(x[0], 51, 3)
y2 = savgol_filter(x[1], 51, 3)
# Plotting the raw data (transparent)
plt.plot(t, x[0], color="red", alpha=0.2)
plt.plot(t, x[1], color="blue", alpha=0.2)
# Plotting the trend data (opaque)
plt.plot(t, y1, color="red")
plt.plot(t, y2, color="blue")
# Calling the plot
plt.show()
The result is this:
My point is that by playing with the colors (or transparency) you can make some data appear as if in the background and other data (usually the most relevant) as if in the foreground. It's a UX technique (like blurring, darkening, or making the background paler).
You can also play with the line width (or style) if the vertical variability of the data is not enough to clearly separate the sets. In your case I don't think it will be necessary.
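Applied to two percentage series like yours, the same raw-transparent/trend-opaque idea is only a few lines with pandas (a sketch with fabricated data; the 14-day rolling window is an arbitrary choice):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
idx = pd.date_range('2018-01-01', periods=310, freq='D')
truth = pd.Series(50 + np.random.randn(310).cumsum(), index=idx)  # fabricated ground truth
pred = truth + 2 * np.random.randn(310)                           # fabricated prediction
for s, color, label in [(truth, 'C0', 'Ground Truth'), (pred, 'C1', 'Prediction')]:
    plt.plot(s.index, s.values, color=color, alpha=0.2)  # raw data, transparent
    plt.plot(s.index, s.rolling(14, center=True).mean(), color=color, label=label)  # trend, opaque
plt.legend()
plt.show()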

Plot straight line of best fit on log-log plot

I have some data that I've plotted on a log-log plot, and now I want to fit a straight line through these points. I have tried various methods and can't get what I'm after. Example code:
import numpy as np
import matplotlib.pyplot as plt
import random
x= np.linspace(1,100,10)
y = np.log10(x)+np.log10(np.random.uniform(0,10))
coefficients = np.polyfit(np.log10(x),np.log10(y),1)
polynomial=np.poly1d(coefficients)
y_fit = polynomial(y)
plt.plot(x,y,'o')
plt.plot(x,y_fit,'-')
plt.yscale('log')
plt.xscale('log')
This gives me an ideal 'straight' line in log-log space, offset by a random number, to which I then fit a 1-d polynomial. The output is:
So, ignoring the offset, which I can deal with, it is not quite what I require: it has basically plotted a straight line between each pair of points and joined them up, whereas I need a 'line of best fit' through the middle of them all so I can measure its gradient.
What is the best way to achieve this?
One problem is
y_fit = polynomial(y)
You must plug in the x values, not y, to get y_fit.
Also, you fitted log10(y) against log10(x), so to evaluate the fitted line you must plug in log10(x), and the result will be the base-10 log of the fitted y values.
Here's a modified version of your script, followed by the plot it generates.
import numpy as np
import matplotlib.pyplot as plt
import random
x = np.linspace(1,100,10)
y = np.log10(x) + np.log10(np.random.uniform(0,10))
coefficients = np.polyfit(np.log10(x), np.log10(y), 1)
polynomial = np.poly1d(coefficients)
log10_y_fit = polynomial(np.log10(x)) # <-- Changed
plt.plot(x, y, 'o-')
plt.plot(x, 10**log10_y_fit, '*-') # <-- Changed
plt.yscale('log')
plt.xscale('log')
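Since the fit is a degree-1 polynomial in log-log space, the gradient you want is simply the first fitted coefficient:
slope, intercept = coefficients
print('gradient on the log-log plot:', slope)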

How to plot several curves with an offset on the same graph

I read a waveform from an oscilloscope. The waveform is divided into 10 segments as a function of time. I want to plot the complete waveform, one segment above (or under) another, 'with a vertical offset', so to speak. Additionally, a color map is necessary to show the signal intensity. I've only been able to get the following plot:
As you can see, all the curves are superimposed, which is unacceptable. One could add an offset to the y data, but this is not how I would like to do it. Surely there is a much neater way of plotting my data? I've tried a few things to solve this issue using pylab, but I am not even sure how to proceed, or whether this is the right way to go.
Any help will be appreciated.
import readTrc #helps read binary data from an oscilloscope
import matplotlib.pyplot as plt
fName = r"...trc"
datX, datY, m = readTrc.readTrc(fName)
segments = m['SUBARRAY_COUNT'] #number of segments
x, y = [], []
for i in range(segments+1):
    x.append(datX[segments*i:segments*(i+1)])
    y.append(datY[segments*i:segments*(i+1)])
plt.plot(x,y)
plt.show()
A plot with a vertical offset sounds like a frequency trail.
Here's one approach that simply adjusts the y values:
Frequency Trail in MatPlotLib
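For completeness, here is a minimal sketch of that manual-offset idea (the segments are fabricated, and the 2.5 offset and viridis colormap are arbitrary choices):
import numpy as np
import matplotlib.pyplot as plt
segments, n = 10, 500
t = np.linspace(0, 1, n)
# fabricated decaying oscillations standing in for the oscilloscope segments
data = [np.sin(2 * np.pi * 5 * t) * np.exp(-3 * t) + 0.1 * np.random.randn(n)
        for _ in range(segments)]
cmap = plt.cm.viridis
for i, seg in enumerate(data):
    # shift each segment up by a fixed offset; the color encodes the segment index
    plt.plot(t, seg + 2.5 * i, color=cmap(i / (segments - 1)), lw=0.8)
plt.yticks([])
plt.show()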
The same plot has also been coined a joyplot/ridgeline plot. Seaborn has an implementation that creates a series of plots (FacetGrid), and then adjusts the offset between them for a similar effect.
https://seaborn.pydata.org/examples/kde_joyplot.html
An example using a line plot might look like:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
segments = 10
points_per_segment = 100
#your data preparation will vary
x = np.tile(np.arange(points_per_segment), segments)
z = np.floor(np.arange(points_per_segment * segments)/points_per_segment)
y = np.sin(x * (1 + z))
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
pal = sns.color_palette()
g = sns.FacetGrid(df, row="z", hue="z", aspect=15, height=.5, palette=pal)
g.map(plt.plot, 'x', 'y')
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Reduce the space between the subplots (a negative hspace makes them overlap)
g.fig.subplots_adjust(hspace=-.00)
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.show()

Plotting a Lognormal Distribution

I am trying to plot a lognormal distribution so I can compare it with a histogram of my sample data, using the code below, but my plot does not look right. Is there something in my code that I am not doing correctly?
The C array has a length of 17576
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import lognorm
data=np.loadtxt(F)
C=data[:,3]
x = np.ma.log(C)
avg = np.mean(x)
std = np.std(x)
dist=lognorm(std,loc=avg)
plt.plot(C,dist.pdf(C),'r')
plt.show()
It looks like your x data are not in sorted order. Try this:
ind = np.argsort(C)
xx = C[ind]
yy = dist.pdf(C)[ind]
plt.plot(xx, yy, 'r')
Plot just connects all the (x, y) pairs with straight lines, so you need to make sure you trace your function from left to right (or right to left). Alternatively, you can skip the lines and plot only the points:
plt.plot(C, dist.pdf(C), 'ro')
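If you want a smooth curve rather than markers, you can also evaluate the pdf on an evenly spaced grid instead of on the sample itself (a sketch with a fabricated sample; it keeps your loc=avg parametrization unchanged):
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm
# fabricated lognormal sample standing in for C
C = np.random.lognormal(mean=1.0, sigma=0.5, size=2000)
x = np.log(C)
dist = lognorm(np.std(x), loc=np.mean(x))  # the asker's parametrization, kept as-is
grid = np.linspace(C.min(), C.max(), 500)  # sorted, evenly spaced x values
plt.hist(C, bins=50, density=True, alpha=0.4)
plt.plot(grid, dist.pdf(grid), 'r')
plt.show()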
