scipy: interpolation, cubic & linear - python

I'm trying to interpolate my data set (the first column is the time, the third column is the actual data):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
data = np.genfromtxt("data.csv", delimiter=" ")
x = data[:, 0]
y = data[:, 2]
xx = np.linspace(x.min(), x.max(), 1000)
y_smooth = interp1d(x, y)(xx)
#y_smooth = interp1d(x, y, kind="cubic")(xx)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(xx, y_smooth, "r-")
plt.show()
but I see some strange difference between linear and cubic interpolation.
Here is the result for linear:
Here is the same for cubic:
I'm not sure why the cubic graph jumps all the time and why y_smooth contains incorrect values:
ipdb> y_smooth_linear.max()
141.5481144
ipdb> y_smooth_cubic.max()
1.2663431888584225e+18
Can anybody explain how I can change my code to get a correct interpolation?
UPD: here is the data.csv file

Your data contains several y values for the same x value. This violates the assumptions of most interpolation algorithms.
Either discard the rows with duplicate x values, average the y values for each individual x, or obtain a better resolution for the x values such that they aren't the same anymore.
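A quick way to confirm this diagnosis is to count how often each x value occurs. The sketch below uses a small made-up array in place of the first column of data.csv, so the values are assumptions for illustration only.

```python
import numpy as np

# Made-up stand-in for the time column; the real values come from data.csv.
x = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0, 4.0])

# return_counts=True reports how many times each distinct value occurs.
values, counts = np.unique(x, return_counts=True)
duplicated = values[counts > 1]

print("duplicated x values:", duplicated)
print("safe to interpolate:", duplicated.size == 0)
```

If `duplicated` is non-empty, one of the three remedies above is needed before calling interp1d with kind="cubic".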

Given cfh's observation that x has duplicate values, you could use np.unique
to select a unique value of y for each x:
x2, idx = np.unique(x, return_index=True)
y2 = y[idx]
return_index=True causes np.unique to return not only the unique values, x2, but also the locations, idx, of the unique xs in the original x array. Note that this selects the first value of y for each unique x.
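To make the "first value of y" behaviour concrete, here is a minimal sketch on toy arrays (the numbers are assumptions, not taken from data.csv):

```python
import numpy as np

# Toy data: x = 1.0 appears twice, x = 3.0 three times.
x = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0])
y = np.array([10.0, 20.0, 22.0, 30.0, 40.0, 44.0, 48.0])

# idx holds the position of the first occurrence of each unique x.
x2, idx = np.unique(x, return_index=True)
y2 = y[idx]

print(x2)   # unique x values, sorted
print(idx)  # index of the first occurrence of each unique x
print(y2)   # the y value kept for each unique x
```

Here the y values 22.0, 44.0 and 48.0 are dropped because they belong to repeated x values.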
If you'd like to average all the y values for each unique x, you could use
stats.binned_statistic:
import scipy.stats as stats
x2, inv = np.unique(x, return_inverse=True)
y2, bin_edges, binnumber = stats.binned_statistic(
    x=inv, values=y, statistic='mean', bins=inv.max()+1)
return_inverse=True tells np.unique to return indices from which the
original array can be reconstructed. Those indices can also serve as categorical
labels or "factors", which is how they are being used in the call to
binned_statistic above.
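The same per-x average can also be computed with NumPy alone. This is a sketch of an alternative to binned_statistic, not the method used above, and the arrays are toy values chosen for illustration:

```python
import numpy as np

x = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0])
y = np.array([10.0, 20.0, 22.0, 30.0, 40.0, 44.0, 48.0])

# inv labels every sample with the index of its unique x, so x2[inv] == x.
x2, inv = np.unique(x, return_inverse=True)

# Sum the y values per label, then divide by the number of samples per label.
sums = np.bincount(inv, weights=y)
counts = np.bincount(inv)
y_mean = sums / counts

print(inv)
print(y_mean)
```

Because inv runs from 0 to len(x2) - 1, np.bincount produces exactly one entry per unique x.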
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import scipy.stats as stats
data = np.genfromtxt("data.csv", delimiter=" ")
x = data[:, 0]
y = data[:, 2]
x2, idx, inv = np.unique(x, return_index=True, return_inverse=True)
y_uniq = y[idx]
y_ave, bin_edges, binnumber = stats.binned_statistic(
    x=inv, values=y, statistic='mean', bins=inv.max()+1)
xx = np.linspace(x.min(), x.max(), 1000)
y_smooth = interp1d(x, y)(xx)
y_smooth2 = interp1d(x2, y_uniq, kind="cubic")(xx)
y_smooth3 = interp1d(x2, y_ave, kind="cubic")(xx)
fig, ax = plt.subplots(nrows=3, sharex=True)
ax[0].plot(xx, y_smooth, "r-", label='linear')
ax[1].plot(xx, y_smooth2, "b-", label='cubic (first y)')
ax[2].plot(xx, y_smooth3, "b-", label='cubic (ave y)')
ax[0].legend(loc='best')
ax[1].legend(loc='best')
ax[2].legend(loc='best')
plt.show()

Related

non linear regression scatter plot

My data points are:
x = [5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03]
y = [494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715]
The x axis of my plot must be logarithmic!
I want to draw an S-shaped regression curve like the one in the attached image. How do I do this (in MATLAB or Python)?
IMG
UPDATE: I tried:
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
import numpy as np
#create data
x = np.array([5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03])
y= np.array([494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715])
#define x as 100 equally spaced values between the min and max of original x
xnew = np.linspace(x.min(), x.max(), 100)
#define spline
spl = make_interp_spline(x, y, k=2)
y_smooth = spl(xnew)
#create smooth line chart
plt.plot(x,y, 'o', xnew, y_smooth)
plt.xscale("log")
plt.show()
My results are shown in the attached plot.
How can I make the curve even smoother? Varying k doesn't make it better.
Note that the higher the degree you use for the k argument, the more “wiggly” the curve will be.
Depending on how curved you want the line to be, you can modify the value of k.
Try this:
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
import numpy as np
#create data
x = np.array([5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03])
y= np.array([494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715])
#define x as 200 equally spaced values between the min and max of original x
xnew = np.linspace(x.min(), x.max(), 200)
#define spline
spl = make_interp_spline(x, y, k=3)
y_smooth = spl(xnew)
#create smooth line chart
plt.plot(xnew, y_smooth)
plt.show()
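Since the x values span more than three decades and the plot uses a logarithmic x axis, it may help to interpolate against log10(x) rather than x itself. The sketch below is one possible approach, not part of the answer above: it swaps make_interp_spline for scipy's PchipInterpolator, a shape-preserving cubic that does not overshoot the data, and it assumes the corrected x value 3.33E-04 from the question's code.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

x = np.array([5.00e-07, 1.40e-06, 4.10e-06, 1.25e-05,
              3.70e-05, 1.11e-04, 3.33e-04, 1.00e-03])
y = np.array([494.55, 333.4666667, 333.3333333, 333.1,
              303.4966667, 197.7533333, 66.43333333, 67.715])

# Work in log10(x), where the samples are roughly evenly spaced.
log_x = np.log10(x)
interp = PchipInterpolator(log_x, y)

# Sample evenly in log space so the curve is well resolved on a log axis.
log_new = np.linspace(log_x[0], log_x[-1], 200)
xnew = 10.0 ** log_new
y_smooth = interp(log_new)

print(y_smooth.min(), y_smooth.max())
```

Plotting `xnew` against `y_smooth` with `plt.xscale("log")` then gives a curve that passes through every data point and stays within the range of the measured y values.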

3d plot from two vectors and an array

I have two vectors of lengths 81 and 105 that store my X and Y values, and an (81, 105) array (actually a list of lists) that stores my Z values for those X, Y. What would be the best way to plot this in 3D? This is what I've tried:
Z = np.load('Z.npy')
X = np.load('X.npy')
Y = np.linspace(0, 5, 105)
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap= 'viridis')
plt.show()
I get the following error : ValueError: shape mismatch: objects cannot be broadcast to a single shape
OK, I got it running. There are some tricks here, which I mention in the code comments.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3d projection on older matplotlib
from random import shuffle
# Produce some data.
x = np.linspace(0, 1, 81)
y = np.linspace(0, 1, 105)
z = [[i for i in range(81)] for _ in range(105)]
array_z = np.array(z)  # shape (105, 81): rows follow y, columns follow x
# Randomize the order of x and y.
shuffle(x)
shuffle(y)
# Match each (x, y) pair with its z value.
data = []
for i in range(len(x)):
    for j in range(len(y)):
        # Be careful how your data is stored in your Z array.
        data.append([x[i], y[j], array_z[j][i]])
# Store the triples in a DataFrame.
results = pd.DataFrame(data, columns=['x', 'y', 'z'])
# Plot the data; c= is required for cmap to take effect.
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(results.x, results.y, results.z, c=results.z, cmap='viridis')
plt.show()
The picture looks weird because I generated artificial data. Hope it helps.
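An alternative that keeps plot_surface is to expand the two vectors into 2-D coordinate grids with the same shape as Z, which is what the shape-mismatch error in the question is complaining about. The sketch below uses synthetic stand-ins for the asker's X, Y and Z (the values are assumptions; only the shapes match the question):

```python
import numpy as np

# Stand-ins: X has 81 entries, Y has 105, and Z[i, j] belongs to (X[i], Y[j]).
X = np.linspace(0, 1, 81)
Y = np.linspace(0, 5, 105)
Z = np.add.outer(X**2, np.sin(Y))   # shape (81, 105)

# indexing="ij" makes the grids follow Z's layout: first axis X, second axis Y.
Xg, Yg = np.meshgrid(X, Y, indexing="ij")

print(Xg.shape, Yg.shape, Z.shape)
```

With these grids, `ax.plot_surface(Xg, Yg, np.asarray(Z), cmap='viridis')` receives three arrays of identical shape. With the default `indexing="xy"` the grids would have shape (105, 81) and Z would have to be transposed instead.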

How to use scipy.interpolate.interp2d for a vector of data?

I have a table of measured values for a quantity that depends on two parameters.
So say I have a function fuelConsumption(speed, temperature), for which data on a mesh are known.
Now I want to interpolate the expected fuelConsumption for a lot of measured data points (speed, temperature) from a pandas.DataFrame (and return a vector with the values for each data point).
I am currently using SciPy's interpolate.interp2d for interpolation, but when passing the parameters as two vectors [s1,s2] and [t1,t2] (only two ordered values for simplicity) it will construct a mesh and return:
[[f(s1,t1), f(s2,t1)], [f(s1,t2), f(s2,t2)]]
The result I am hoping to get is:
[f(s1,t1), f(s2, t2)]
How can I interpolate to get the output I want?
From scipy v0.14 onwards you can use scipy.interpolate.RectBivariateSpline with grid=False:
import numpy as np
from scipy.interpolate import RectBivariateSpline
from matplotlib import pyplot as plt
x, y = np.ogrid[-1:1:10j,-1:1:10j]
z = (x + y)*np.exp(-6.0 * (x * x + y * y))
spl = RectBivariateSpline(x, y, z)
xi = np.linspace(-1, 1, 50)
yi = np.linspace(-1, 1, 50)
zi = spl(xi, yi, grid=False)
fig, ax = plt.subplots(1, 1)
ax.imshow(z, cmap=plt.cm.coolwarm, origin='lower', extent=(-1, 1, -1, 1))
ax.scatter(xi, yi, s=60, c=zi, cmap=plt.cm.coolwarm)
plt.show()
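Applied to the question's setting, the same call can be sketched as follows. The grid axes `speeds` and `temps`, the table values and the query points are all made up for illustration, and kx=1, ky=1 (bilinear instead of the default bicubic) is chosen here only so the result is easy to check by hand.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Hypothetical measurement mesh; names and values are assumptions.
speeds = np.linspace(20.0, 120.0, 6)
temps = np.linspace(-10.0, 30.0, 5)
# table[i, j] is the consumption measured at speeds[i], temps[j].
table = 0.05 * speeds[:, None] + 0.02 * temps[None, :]

spl = RectBivariateSpline(speeds, temps, table, kx=1, ky=1)

# One (speed, temperature) pair per data point, e.g. two DataFrame columns.
s = np.array([35.0, 90.0, 110.0])
t = np.array([0.0, 12.5, 25.0])

pointwise = spl(s, t, grid=False)   # one value per (s[i], t[i]) pair
on_mesh = spl(s, t)                 # 3x3 mesh over all combinations

print(pointwise)
print(on_mesh.shape)
```

The pointwise result is the diagonal of the mesh result, which is the vector the question asks for. For DataFrame input, the two columns can be passed as NumPy arrays, for example via `.to_numpy()`.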

How to smoothen data in Python?

I am trying to smoothen a scatter plot shown below using SciPy's B-spline representation of 1-D curve. The data is available here.
The code I used is:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
data = np.genfromtxt("spline_data.dat", delimiter = '\t')
x = 1000 / data[:, 0]
y = data[:, 1]
x_int = np.linspace(x[0], x[-1], 100)
tck = interpolate.splrep(x, y, k = 3, s = 1)
y_int = interpolate.splev(x_int, tck, der = 0)
fig = plt.figure(figsize = (5.15,5.15))
plt.subplot(111)
plt.plot(x, y, marker = 'o', linestyle='')
plt.plot(x_int, y_int, linestyle = '-', linewidth = 0.75, color='k')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
I tried changing the order of the spline and the smoothing condition, but I am not getting a smooth plot.
B-spline fitting should be able to smooth the data, so what is wrong? Is there an alternative method to smooth this data?
Use a larger smoothing parameter. For example, s=1000:
tck = interpolate.splrep(x, y, k=3, s=1000)
This produces:
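To see why the value of s matters, it can help to compare s=0 with a larger value on synthetic data. In the sketch below the data, the noise level `sigma` and the choice s = m * sigma**2 (a common starting point when the noise level is roughly known and the weights are left at their default) are assumptions for illustration, not values derived from the asker's file.

```python
import numpy as np
from scipy import interpolate

rng = np.random.default_rng(0)

# Synthetic noisy samples standing in for the real measurements.
x = np.linspace(0.0, 10.0, 200)
sigma = 0.3
y = np.sin(x) + rng.normal(scale=sigma, size=x.size)

# s=0 forces the spline through every point, so it reproduces the noise.
tck_interp = interpolate.splrep(x, y, k=3, s=0)
# A larger s tolerates a larger sum of squared residuals, so fewer knots are used.
tck_smooth = interpolate.splrep(x, y, k=3, s=x.size * sigma**2)

for label, tck in [("s=0", tck_interp), ("s=m*sigma^2", tck_smooth)]:
    resid = y - interpolate.splev(x, tck)
    print(label, "knots:", len(tck[0]), "sum of squared residuals:", float(resid @ resid))
```

Because s bounds the sum of squared residuals, a sensible value scales with the number of points and with the square of the noise amplitude, which is why s=1 is far too small for data with y values in the hundreds.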
Assuming we are dealing with noisy observations of some phenomenon, Gaussian process regression might also be a good choice. Knowledge about the variance of the noise can be built into the model through the nugget parameter, and the other parameters can be found by maximum likelihood estimation. Here's a simple example of how it could be applied (it uses the legacy GaussianProcess class, which was removed in scikit-learn 0.20 in favor of GaussianProcessRegressor):
import matplotlib.pyplot as plt
import numpy as np
from sklearn.gaussian_process import GaussianProcess
data = np.genfromtxt("spline_data.dat", delimiter='\t')
x = 1000 / data[:, 0]
y = data[:, 1]
x_pred = np.linspace(x[0], x[-1], 100)
# <GP regression>
gp = GaussianProcess(theta0=1, thetaL=0.00001, thetaU=1000, nugget=0.000001)
gp.fit(np.atleast_2d(x).T, y)
y_pred = gp.predict(np.atleast_2d(x_pred).T)
# </GP regression>
fig = plt.figure(figsize=(5.15, 5.15))
plt.subplot(111)
plt.plot(x, y, marker='o', linestyle='')
plt.plot(x_pred, y_pred, linestyle='-', linewidth=0.75, color='k')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
which will give:
In your specific case, you could also try changing the last argument of the np.linspace function to a smaller number, np.linspace(x[0], x[-1], 10), for example.
Demo code:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
data = np.random.rand(100,2)
tempx = list(data[:, 0])
tempy = list(data[:, 1])
x = np.array(sorted([point*10 + tempx.index(point) for point in tempx]))
y = np.array([point*10 + tempy.index(point) for point in tempy])
x_int = np.linspace(x[0], x[-1], 10)
tck = interpolate.splrep(x, y, k = 3, s = 1)
y_int = interpolate.splev(x_int, tck, der = 0)
fig = plt.figure(figsize = (5.15,5.15))
plt.subplot(111)
plt.plot(x, y, marker = 'o', linestyle='')
plt.plot(x_int, y_int, linestyle = '-', linewidth = 0.75, color='k')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
You could also smooth the data with a rolling mean in pandas:
import pandas as pd
data = [...(your data here)...]
smoothendData = pd.Series(data).rolling(window=5).mean()
The window argument is the moving-average (rolling mean) period; older pandas versions exposed the same operation as pd.rolling_mean(data, 5). You can also reverse the data (data.reverse()), take a rolling mean of the reversed data, and combine it with the forward rolling mean. Another option is exponentially weighted moving averages:
Pandas: Exponential smoothing function for column
or using bandpass filters:
fft bandpass filter in python
http://docs.scipy.org/doc/scipy/reference/signal.html
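As a rough illustration of the filtering route, here is a sketch that uses a low-pass (rather than band-pass) Butterworth filter from scipy.signal. The signal, the filter order and the cutoff are assumptions chosen for the example. filtfilt applies the filter forward and then backward, which is closely related to the idea above of combining a forward and a reversed rolling mean.

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(1)

# Synthetic signal: a slow sine plus white noise.
t = np.linspace(0.0, 1.0, 500)
clean = np.sin(2 * np.pi * 2 * t)
noisy = clean + rng.normal(scale=0.3, size=t.size)

# Third-order Butterworth low-pass; 0.05 is the cutoff as a fraction of Nyquist.
b, a = butter(3, 0.05)
# Forward-backward filtering, so the smoothed curve is not shifted in time.
smooth = filtfilt(b, a, noisy)

print("MSE noisy :", np.mean((noisy - clean) ** 2))
print("MSE smooth:", np.mean((smooth - clean) ** 2))
```

Lowering the cutoff smooths more aggressively, at the cost of flattening genuine fast features in the data.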

Plot larger points on bottom and smaller on top

I'm looking for a way to produce a scatter plot in Python where smaller points are drawn above larger ones to improve the figure's "readability" (is there a similar word for an image?)
Here's a simple MWE:
import numpy as np
import matplotlib.pyplot as plt
def random_data(N):
    # Generate some random data.
    return np.random.uniform(70., 250., N)
# Data lists.
N = 1000
x = random_data(N)
y = random_data(N)
z1 = random_data(N)
z2 = random_data(N)
cm = plt.get_cmap('RdYlBu')
plt.scatter(x, y, s=z1, c=z2, cmap=cm)
plt.colorbar()
plt.show()
which produces:
I'd like the smaller points to be drawn last so they won't be hidden behind larger points. How could I do this?
Sort the arrays by marker size, largest first, before plotting:
order = np.argsort(-z1) # for desc
x = np.take(x, order)
y = np.take(y, order)
z1 = np.take(z1, order)
z2 = np.take(z2, order)
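The effect of the sort can be checked on a tiny made-up example. The values below are assumptions; `x[order]` is equivalent to `np.take(x, order)` for 1-D arrays.

```python
import numpy as np

# Marker sizes and matching coordinates (toy values).
z1 = np.array([80.0, 240.0, 120.0, 200.0])
x = np.array([1.0, 2.0, 3.0, 4.0])

# argsort sorts ascending, so negating z1 yields indices from largest to smallest.
order = np.argsort(-z1)

# Indexing every array with the same order keeps the points aligned.
z1_sorted = z1[order]
x_sorted = x[order]

print(order)       # → [1 3 2 0]
print(z1_sorted)   # → [240. 200. 120.  80.]
```

Since scatter draws markers in array order, the largest markers are drawn first and the smallest end up on top.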
The figure using alpha is more readable.
import numpy as np
import matplotlib.pyplot as plt
def random_data(N):
    # Generate some random data.
    return np.random.uniform(70., 250., N)
# Data lists.
N = 1000
x = random_data(N)
y = random_data(N)
z1 = random_data(N)
z2 = random_data(N)
order = np.argsort(-z1)
x = np.take(x, order)
y = np.take(y, order)
z1 = np.take(z1, order)
z2 = np.take(z2, order)
cm = plt.get_cmap('RdYlBu')
plt.scatter(x, y, s=z1, c=z2, cmap=cm, alpha=0.7) # alpha can be 0 ~ 1
plt.colorbar()
plt.show()
The output is
