i am trying to construct a function which gives me interpolated values of a piecewise linear function. I tried linear spline interpolation (which should be able to do exactly this?)- but without any luck. The problem is most visible on a log scale plot. Below there is the code of a small example i prepared:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os
from scipy import interpolate
#Original Data
pwl_data = np.array([[0,1e3, 1e5, 1e8], [-90,-90, -90, -130]])
#spine interpolation
pwl_spline = interpolate.splrep(pwl_data[0], pwl_data[1])
spline_x = np.linspace (0,1e8, 10000)
legend = []
plt.plot(pwl_data[0],pwl_data[1])
plt.plot(spline_x,interpolate.splev(spline_x,pwl_spline ),'*')
legend.append("Data")
legend.append("Interpolated Data")
plt.xscale('log')
plt.legend(legend)
plt.grid(True)
plt.grid(b=True, which='minor', linestyle='--')
plt.show()
What am I doing wrong?
The spline fitting have to be performed on the linearized data, i.e. using log(x) instead of x:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
#Original Data
pwl_data = np.array([[1, 1e3, 1e5, 1e8], [-90, -90, -90, -130]])
x = pwl_data[0]
y = pwl_data[1]
log_x = np.log(x)
#spine interpolation
pwl_spline = interpolate.splrep(log_x, y)
spline_log_x = np.linspace(0, 18, 30)
spline_y = interpolate.splev(spline_log_x, pwl_spline )
plt.plot(log_x, y, '-o')
plt.plot(spline_log_x, spline_y, '-*')
plt.xlabel('log(x)');
note: I remove the zero from the data. Also, spline fitting could be not the best if you want a piecewise linear function, you could have a look at this question for example: https://datascience.stackexchange.com/q/8457/53362
For plotting with matplotlib, consider matplotlibs step which internally performs a piecewise constant interpolation.
https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.step.html
you can invoke it simply via:
plt.step(x,y) given your inputs x and y.
In plotly the argument line_shape='hv' for the Scatter plot achieves similar results see https://plotly.com/python/line-charts/
Related
I want to create a regplot with a linear regression in Seaborn and scale both axes equally by log, such that the regression stays a straight line.
An example:
import matplotlib.pyplot as plt
import seaborn as sns
some_x=[0,1,2,3,4,5,6,7]
some_y=[3,5,4,7,7,9,9,10]
ax = sns.regplot(x=some_x, y=some_y, order=1)
plt.ylim(0, 12)
plt.xlim(0, 12)
plt.show()
What I get:
If I scale the x and y axis by log, I would expect the regression to stay a straight line. What I tried:
import matplotlib.pyplot as plt
import seaborn as sns
some_x=[0,1,2,3,4,5,6,7]
some_y=[3,5,4,7,7,9,9,10]
ax = sns.regplot(x=some_x, y=some_y, order=1)
ax.set_yscale('log')
ax.set_xscale('log')
plt.ylim(0, 12)
plt.xlim(0, 12)
plt.show()
How it looks:
The problem is that you are fitting to your data on a regular scale but later you are transforming the axes to log scale. So linear fit will no longer be linear on a log scale.
What you need instead is to transform your data to log scale (base 10) and then perform a linear regression. Your data is currently a list. It would be easy to transform your data to log scale if you convert your list to NumPy array because then you can make use of vectorised operation.
Caution: One of your x-entry is 0 for which log is not defined. You will encounter a warning there.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
some_x=np.array([0,1,2,3,4,5,6,7])
some_y=np.array([3,5,4,7,7,9,9,10])
ax = sns.regplot(x=np.log10(some_x), y=np.log10(some_y), order=1)
Solution using NumPy polyfit where you exclude x=0 data point from the fit
import matplotlib.pyplot as plt
import numpy as np
some_x=np.log10(np.array([0,1,2,3,4,5,6,7]))
some_y=np.log10(np.array([3,5,4,7,7,9,9,10]))
fit = np.poly1d(np.polyfit(some_x[1:], some_y[1:], 1))
plt.plot(some_x, some_y, 'ko')
plt.plot(some_x, fit(some_x), '-k')
Have some data that I've plotted on a log-log plot and now I want to fit a straight line through these points. I have tried various methods and can't get what I'm after. Example code:
import numpy as np
import matplotlib.pyplot as plt
import random
x= np.linspace(1,100,10)
y = np.log10(x)+np.log10(np.random.uniform(0,10))
coefficients = np.polyfit(np.log10(x),np.log10(y),1)
polynomial=np.poly1d(coefficients)
y_fit = polynomial(y)
plt.plot(x,y,'o')
plt.plot(x,y_fit,'-')
plt.yscale('log')
plt.xscale('log')
This gives me a ideal 'straight' line in log log offset by a random number to which I then fit a 1d poly. The output is:
So ignoring the offset, which I can deal with, it is not quite what I require as it has basically plotted a straight line between each point and then joined them up whereas I need a 'line of best fit' through the middle of them all so I can measure the gradient of it.
What is the best way to achieve this?
One problem is
y_fit = polynomial(y)
You must plug in the x values, not y, to get y_fit.
Also, you fit log10(y) with log10(x), so to evaluate the linear interpolator, you must plug in log10(x), and the result will be the base-10 log of the y values.
Here's a modified version of your script, followed by the plot it generates.
import numpy as np
import matplotlib.pyplot as plt
import random
x = np.linspace(1,100,10)
y = np.log10(x) + np.log10(np.random.uniform(0,10))
coefficients = np.polyfit(np.log10(x), np.log10(y), 1)
polynomial = np.poly1d(coefficients)
log10_y_fit = polynomial(np.log10(x)) # <-- Changed
plt.plot(x, y, 'o-')
plt.plot(x, 10**log10_y_fit, '*-') # <-- Changed
plt.yscale('log')
plt.xscale('log')
How would I calculate the confidence intervals for a LOWESS regression in Python? I would like to add these as a shaded region to the LOESS plot created with the following code (other packages than statsmodels are fine as well).
import numpy as np
import pylab as plt
import statsmodels.api as sm
x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.2
lowess = sm.nonparametric.lowess(y, x, frac=0.1)
plt.plot(x, y, '+')
plt.plot(lowess[:, 0], lowess[:, 1])
plt.show()
I've added an example plot with confidence interval below from the webblog Serious Stats (it is created using ggplot in R).
LOESS doesn't have an explicit concept for standard error. It just doesn't mean anything in this context. Since that's out, your stuck with the brute-force approach.
Bootstrap your data. Your going to fit a LOESS curve to the bootstrapped data. See the middle of this page to find a pretty picture of what your doing. http://statweb.stanford.edu/~susan/courses/s208/node20.html
Once you have your large number of different LOESS curves, you can find the top and bottom Xth percentile.
This is a very old question but it's one of the first that pops up on google search. You can do this using the loess() function from scikit-misc. Here's an example (I tried to keep your original variable names, but I bumped up the noise a bit to make it more visible)
import numpy as np
import pylab as plt
from skmisc.loess import loess
x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.4
l = loess(x,y)
l.fit()
pred = l.predict(x, stderror=True)
conf = pred.confidence()
lowess = pred.values
ll = conf.lower
ul = conf.upper
plt.plot(x, y, '+')
plt.plot(x, lowess)
plt.fill_between(x,ll,ul,alpha=.33)
plt.show()
result:
For a project of mine, I need to create intervals for time-series modeling, and to make the procedure more efficient I created tsmoothie: A python library for time-series smoothing and outlier detection in a vectorized way.
It provides different smoothing algorithms together with the possibility to computes intervals.
In the case of LowessSmoother:
import numpy as np
import matplotlib.pyplot as plt
from tsmoothie.smoother import *
from tsmoothie.utils_func import sim_randomwalk
# generate 10 randomwalks of length 200
np.random.seed(33)
data = sim_randomwalk(n_series=10, timesteps=200,
process_noise=10, measure_noise=30)
# operate smoothing
smoother = LowessSmoother(smooth_fraction=0.1, iterations=1)
smoother.smooth(data)
# generate intervals
low, up = smoother.get_intervals('prediction_interval', confidence=0.05)
# plot the first smoothed timeseries with intervals
plt.figure(figsize=(11,6))
plt.plot(smoother.smooth_data[0], linewidth=3, color='blue')
plt.plot(smoother.data[0], '.k')
plt.fill_between(range(len(smoother.data[0])), low[0], up[0], alpha=0.3)
I point out also that tsmoothie can carry out the smoothing of multiple time-series in a vectorized way. Hope this can help someone
Is there a simple way to get log transformed counts when plotting a two dimensional histogram in matplotlib? Unlike the pyplot.hist method, the pyplot.hist2d method does not seem to have a log parameter.
Currently I'm doing the following:
import numpy as np
import matplotlib as mpl
import matplotlib.pylab as plt
matrix, *opt = np.histogram2d(x, y)
img = plt.imshow(matrix, norm = mpl.colors.LogNorm(), cmap = mpl.cm.gray,
interpolation="None")
Which plots the expected histogram, but the axis labels show the indices of the bins and thus not the expected value.
It's kind of embarrassing, but the answer to my question is actually in the docstring of the corresponding code:
Notes
-----
Rendering the histogram with a logarithmic color scale is
accomplished by passing a :class:`colors.LogNorm` instance to
the *norm* keyword argument. Likewise, power-law normalization
(similar in effect to gamma correction) can be accomplished with
:class:`colors.PowerNorm`.
So this works:
import matplotlib as mpl
import matplotlib.pylab as plt
par = plt.hist2d(x, y, norm=mpl.colors.LogNorm(), cmap=mpl.cm.gray)
I have a 2 lists, first with dates (datetime objects) and second with some values for these dates.
When I create a simple plot:
plt.plot_date(x=dates, y=dur, fmt='r-')
I get a very ugly image like this.
How I can smooth this line? I think about extrapolation, but have not found a simple function for this. In Scipy there are very difficult tools for this, but I don't understand what I must add to my data for extrapolation.
You can make it smooth using sp.polyfit
Code:
import scipy as sp
import numpy as np
import matplotlib.pyplot as plt
# sampledata
x = np.arange(199)
r = np.random.rand(100)
y = np.convolve(r, r)
# plot sampledata
plt.plot(x, y, color='grey')
# smoothen sampledata using a 50 degree polynomial
p = sp.polyfit(x, y, deg=50)
y_ = sp.polyval(p, x)
# plot smoothened data
plt.plot(x, y_, color='r', linewidth=2)
plt.show()