My objective is to randomly generate good-looking continuous functions, where good-looking means functions that can be recovered from their plots.
Essentially I want to generate random time-series data for 1 second with 1024 samples per second. If I randomly choose 1024 values, the plot looks very noisy and nothing meaningful can be extracted from it. At the end I have attached plots of two sinusoids, one with a frequency of 3 Hz and another with a frequency of 100 Hz. I consider the 3 Hz cosine a good function because I can extract the time series back by looking at the plot. But the 100 Hz sinusoid is bad for me, as I can't recover the time series from the plot. So, in the above-mentioned sense of goodness, I want to randomly generate good-looking continuous functions/time series.
The method I am thinking of using is as follows (in Python):
(1) Choose 32 points on the x-axis between 0 and 1 using x=np.linspace(0,1,32).
(2) For each of these 32 points choose a random value using y=np.random.rand(32).
(3) Then I need an interpolation or curve-fitting method which takes (x,y) as input and outputs a continuous function, something like func=curve_fit(x,y).
(4) I can obtain the time series by sampling from the func function.
Following are the questions that I have:
1) What is the best curve-fitting or interpolation method that I can use? It should also be available in Python.
2) Is there a better method to generate good-looking functions without using curve fitting or interpolation?
Edit
Here is the code I am using currently for generating random time series of length 1024. In my case I need to scale the function between 0 and 1 on the y-axis, hence for me l=0 and h=1. If that scaling is not needed, you just need to uncomment a line in each function to randomize the scaling.
import numpy as np
from scipy import interpolate
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
## Curve fitting technique
def random_poly_fit():
    l = 0
    h = 1
    degree = np.random.randint(2, 11)      # random polynomial degree
    c_points = np.random.randint(2, 32)    # random number of control points
    cx = np.linspace(0, 1, c_points)
    cy = np.random.rand(c_points)
    z = np.polyfit(cx, cy, degree)         # fit a polynomial to the control points
    f = np.poly1d(z)
    y = f(x)                               # evaluate on the global sampling grid x
    # l,h=np.sort(np.random.rand(2))
    y = MinMaxScaler(feature_range=(l, h)).fit_transform(y.reshape(-1, 1)).reshape(-1)
    return y
## Cubic Spline Interpolation technique
def random_cubic_spline():
    l = 0
    h = 1
    c_points = np.random.randint(4, 32)    # random number of control points (at least 4)
    cx = np.linspace(0, 1, c_points)
    cy = np.random.rand(c_points)
    z = interpolate.CubicSpline(cx, cy)    # interpolating cubic spline through the control points
    y = z(x)                               # evaluate on the global sampling grid x
    # l,h=np.sort(np.random.rand(2))
    y = MinMaxScaler(feature_range=(l, h)).fit_transform(y.reshape(-1, 1)).reshape(-1)
    return y
func_families = [random_poly_fit, random_cubic_spline]
func = np.random.choice(func_families)  # pick a random generator
x = np.linspace(0, 1, 1024)             # global sampling grid used inside the generators
y = func()
plt.plot(x,y)
plt.show()
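Regarding question 2) above, one interpolation-free alternative is to low-pass filter white noise, which directly removes the fast oscillations that make a plot unreadable. A minimal sketch in the same style as the functions above, assuming scipy.ndimage is available (the sigma range is an arbitrary choice):
from scipy.ndimage import gaussian_filter1d
def random_smooth_noise():
    l = 0
    h = 1
    sigma = np.random.randint(10, 50)  # larger sigma -> smoother curve (range is an assumption)
    y = gaussian_filter1d(np.random.rand(1024), sigma)  # smooth 1024 samples of white noise
    # l,h=np.sort(np.random.rand(2))
    y = MinMaxScaler(feature_range=(l, h)).fit_transform(y.reshape(-1, 1)).reshape(-1)
    return y
If this fits your notion of goodness, random_smooth_noise can simply be appended to func_families.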
Add sine and cosine signals
from numpy.random import randint
x = np.linspace(0,1,1000)
for i in range(10):
    y = randint(0,100)*np.sin(randint(0,100)*x) + randint(0,100)*np.cos(randint(0,100)*x)
    y = MinMaxScaler(feature_range=(-1,1)).fit_transform(y.reshape(-1, 1)).reshape(-1)
    plt.plot(x,y)
plt.show()
Output:
Convolve sine and cosine signals
for i in range(10):
    y = np.convolve(randint(0,100)*np.sin(randint(0,100)*x), randint(0,100)*np.cos(randint(0,100)*x), 'same')
    y = MinMaxScaler(feature_range=(-1,1)).fit_transform(y.reshape(-1, 1)).reshape(-1)
    plt.plot(x,y)
plt.show()
Output:
Related
I am plotting two lists of data against each other, namely freq and data. freq stands for frequency, and data contains the numeric observations for each frequency.
In the next step, I apply ordinary least-squares linear regression between freq and data, using stats.linregress on the logarithmic scale. My aim is to apply the linear regression in log-log scale, not on the normal scale.
Before doing so, I transform both freq and data with np.log10, since I plan to plot a straight regression line on the logarithmic scale using plt.loglog.
Problem:
The problem is that the regression line, plotted in red, lies far from the actual data, plotted in green. I assume there is a problem in the combination with plt.loglog in my code, hence the visual distance between the green data and the red regression line. How can I fix this, so that the regression line plots on top of the actual data?
Here is my reproducible code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Data
freq = [0.0102539, 0.0107422, 0.0112305, 0.0117188, 0.012207, 0.0126953,
0.0131836]
data = [4.48575, 4.11893, 3.69591, 3.34766, 3.18452, 3.23554, 3.43357]
# Plot log10 of freq vs. data
plt.loglog(freq, data, c="green")
# Linear regression
log_freq = np.log10(freq)
log_data = np.log10(data)
reg = stats.linregress(log_freq, log_data)
slope = reg[0]
intercept = reg[1]
plt.plot(freq, slope*log_freq + intercept, color="red")
And here is a screenshot of the code’s result:
You can convert your data sets to log base 10 first, then do linear regression and plot them accordingly.
Note that after the log transformation, the numbers in log_freq will all be negative; therefore the x-axis cannot be log-scaled.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Data
freq = np.array([0.0102539, 0.0107422, 0.0112305, 0.0117188, 0.012207, 0.0126953,
0.0131836])
data = np.array([4.48575, 4.11893, 3.69591, 3.34766, 3.18452, 3.23554, 3.43357])
# transform data to log base 10
log_freq = np.log10(freq)
log_data = np.log10(data)
# Plot log_freq vs. log_data
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(log_freq, log_data, c="green", label='Original data (log 10 base)')
# Linear regression
reg = stats.linregress(log_freq, log_data)
# Plot fitted freq vs. data
ax.plot(log_freq, reg.slope * log_freq + reg.intercept, color="red",
label='Fitted line on the original data (log 10 base)')
plt.legend()
plt.tight_layout()
plt.show()
References:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
https://numpy.org/doc/stable/reference/generated/numpy.log10.html
First of all, I question the necessity of log-log axes, because the ranges of the data, or at least the ranges of the data that you've shown us, are limited on both coordinates.
In the code below, I have computed the base-10 logarithms of your arrays and used the standard linear-regression formulas on the logarithms of the data to obtain the equation of a straight line,
y = a + b·x,
in, so to say, the logarithmic space.
Because a straight line in log-space corresponds, in data-space, to the power law y = pow(10, a)·pow(x, b), I have plotted the original data in log-log, and the power law, also in log-log, obtaining a straight line in the log-log representation.
import matplotlib.pyplot as plt
from math import log10
freq = [.0102539, .0107422, .0112305, .0117188, .012207, .0126953, .0131836]
data = [4.48575, 4.11893, 3.69591, 3.34766, 3.18452, 3.23554, 3.43357]
n = len(freq)
# the following block of code is the unfolding of the formulas in
# https://mathworld.wolfram.com/LeastSquaresFittingPowerLaw.html
# START ##############################################
lx, ly = [[log10(V) for V in v] for v in (freq, data)]
sum_x = sum(x for x in lx)
sum_y = sum(y for y in ly)
sum_x2 = sum(x**2 for x in lx)
sum_y2 = sum(y**2 for y in ly)
sum_xy = sum(x*y for x, y in zip(lx, ly))
# coefficients of a straight line "y = a + b x" in log-log space
b = (n*sum_xy - sum_x*sum_y)/(n*sum_x2-sum_x**2)
a = (sum_y - b*sum_x)/n
A = pow(10, a)
# END ##############################################
plt.loglog(freq, data)
plt.loglog(freq, [A*pow(x, b) for x in freq])
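For what it's worth, the same coefficients can be obtained in one line with numpy.polyfit on the log-transformed data; a quick sketch:
import numpy as np
b_np, a_np = np.polyfit(np.log10(freq), np.log10(data), 1)  # slope b and intercept a in log-log space
These should match the b and a computed by the unrolled formulas above.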
I'm looking for a function which mimics MATLAB's cscvn function from its Curve Fitting Toolbox, suitable for points in 3D space. The closest function I've found is scipy.interpolate.splprep, which can compute in 3 dimensions but loses accuracy with fewer data points. If smoothing is reduced to the point of fitting the points, the curve has kinks.
I have a discrete dataset made up of physical points (elevation data) that I'm looking to model, so the spline must pass through those points. There is a finite number of points at varying chord lengths from one another.
Here's a sample of the quick test function I've written to test Python splines. Unfortunately, I can't share my MATLAB code, but the cscvn function splines smoothly and passes through all data points.
import scipy as sp
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import splprep, splev, interp2d
x = np.linspace(0, 10, num = 20) #list of known x coordinates
y = 2*x #list of known y coordinates
z = x*x #list of known z coordinates
## Note: you must have more points than the degree of the spline: if k = 3, you need at least 4 points.
print([x,y,z])
tck, u = splprep([x,y,z], s = 26) # Generate function out of provided points, default k = 3
newPoints = splev(u, tck) # Creating spline points
print(newPoints)
ax = plt.axes(projection = "3d")
ax.plot3D(x, y, z, 'go') # Green is the actual 3D function
ax.plot3D(newPoints[0], newPoints[1], newPoints[2], 'r-') # Red is the spline
plt.show()
Here is an example of many points creating a smooth curve (red), but the line doesn't align with the physical data points (green).
Here is an example of kinks in the spline (red) created by too few data points (green). This is more akin to what my dataset looks like.
Change your u to:
unew = np.arange(0, 1.00, 0.005)
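That is, evaluate the spline on a dense, regularly spaced parameter grid instead of the sparse u returned by splprep; the "kinks" are just straight plot segments between too few evaluation points. A sketch based on the question's code, additionally passing s=0 (an assumption) so the spline passes exactly through every data point:
tck, u = splprep([x, y, z], s=0)  # s=0: interpolate exactly through the points
unew = np.arange(0, 1.00, 0.005)  # dense parameter grid (np.linspace(0, 1, 201) would also include the endpoint)
newPoints = splev(unew, tck)      # evaluate the spline densely for a smooth curve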
I'm plotting x and y points. This results in a curved line: the line first bends, then after a point it is straight, and after some time it bends again. I want to retrieve those two transition points. Though x is linear and y is plotted against x, y does not depend linearly on x.
I tried matplotlib for plotting and numpy polynomial functions, and am currently looking into splines, but it seems that for these y needs to be directly dependent on x.
Your data is noisy, so you can't use a simple numerical derivative. Instead, as you may have found already, you should fit it with a spline and then check the curvature of the spline.
Keying off this answer, you can fit a spline and calculate the second derivative (curvature) like this:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
x = file['n']        # the data columns from the question
y = file['Ds/2']
y_spline = UnivariateSpline(x, y)
x_range = np.linspace(x[0], x[-1], 1000)  # or could use x_range = x
y_spline_deriv = y_spline.derivative(n=2)  # second derivative of the fitted spline
curvature = y_spline_deriv(x_range)
Then you can find the start and end of the straight region like this:
straight_points = np.where(np.abs(curvature) <= 0.1)[0]  # pick your threshold
start_idx = straight_points[0]
end_idx = straight_points[-1]
start_x = x_range[start_idx]
end_x = x_range[end_idx]
Alternatively, if you're mainly interested in finding the flattest part of the curve (as shown in your graphic), you could try calculating the first derivative and then finding regions where the slope is within some small amount of the minimum slope anywhere in the data. In that case, just substitute y_spline_deriv = y_spline.derivative(n=1) in the code above.
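For reference, here is a self-contained sketch of the same approach on synthetic data; the test curve, the noise level, the smoothing factor, and the 0.1 threshold are all arbitrary assumptions:
import numpy as np
from scipy.interpolate import UnivariateSpline
# Synthetic noisy curve: a bend, a straight middle section, then another bend.
x = np.linspace(0, 10, 200)
y = np.piecewise(x, [x < 3, (x >= 3) & (x <= 7), x > 7],
                 [lambda t: (t - 3)**2, 0, lambda t: (t - 7)**2])
y += np.random.normal(scale=0.05, size=x.shape)
y_spline = UnivariateSpline(x, y, s=1)    # smoothing factor chosen for this noise level
x_range = np.linspace(x[0], x[-1], 1000)
curvature = y_spline.derivative(n=2)(x_range)
straight_points = np.where(np.abs(curvature) <= 0.1)[0]
print("straight section from x =", x_range[straight_points[0]],
      "to x =", x_range[straight_points[-1]])  # roughly 3 and 7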
I've been having invalid input errors when working with the scipy interp2d function. It turns out the problem comes from the bisplrep function, as shown here:
import numpy as np
from scipy import interpolate
# Case 1
x = np.linspace(0,1)
y = np.zeros_like(x)
z = np.ones_like(x)
tck = interpolate.bisplrep(x,y,z) # or interp2d
Returns: ValueError: Invalid inputs
It turned out the test data I was giving interp2d contained only one distinct value for the 2nd axis, as in the test sample above. The bisplrep function inside interp2d considers it as invalid input:
This may be considered as an acceptable behaviour: interp2d & bisplrep expect a 2D grid, and I'm only giving them values along one line.
On a side note, I find the error message quite unclear. One could include a test in interp2d to deal with such cases, something along the lines of:
if len(np.unique(x)) == 1 or len(np.unique(y)) == 1:
    raise ValueError("Can't build 2D splines if x or y values are all the same")
This may be enough to detect this kind of invalid input and raise a more explicit error message, or even to directly call the more appropriate interp1d function (which works perfectly here).
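For reference, a quick sketch reusing x and z from Case 1 shows that interp1d indeed handles this data without complaint:
f = interpolate.interp1d(x, z)  # 1D interpolation along x, ignoring the constant y
print(f(0.5))                   # 1.0, as expected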
I thought I had correctly understood the problem. However, consider the following code sample:
# Case 2
x = np.linspace(0,1)
y = x
z = np.ones_like(x)
tck = interpolate.bisplrep(x,y,z)
In that case, y being proportional to x, I'm also feeding bisplrep with data along one line. But, surprisingly, bisplrep is able to compute a 2D spline interpolation in that case. I plotted it:
# Plot
def plot_0to1(tck):
    import matplotlib.pyplot as plt
    from matplotlib import cm            # needed for cm.coolwarm below
    from mpl_toolkits.mplot3d import Axes3D
    X = np.linspace(0,1,10)
    Y = np.linspace(0,1,10)
    Z = interpolate.bisplev(X,Y,tck)
    X,Y = np.meshgrid(X,Y)
    fig = plt.figure()
    ax = Axes3D(fig)
    ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm,
                    linewidth=0, antialiased=False)
    plt.show()
plot_0to1(tck)
The result is the following:
where bisplrep seems to fill the gaps with 0's, as better shown when I extend the plot below:
Regardless of whether filling with 0's is expected, my real question is: why does bisplrep work in Case 2 but not in Case 1?
Or, in other words: do we want 2D interpolation to return an error when it is fed input along one direction only (in which case both Case 1 and Case 2 should fail), or not (in which case both should return something, even if unpredictable)?
I was originally going to show you how much of a difference it makes for 2d interpolation if your input data are oriented along the coordinate axes rather than in some general direction, but it turns out that the result would be even messier than I had anticipated. I tried using a random dataset over an interpolated rectangular mesh, and comparing that to a case where the same x and y coordinates were rotated by 45 degrees for interpolation. The result was abysmal.
I then tried doing a comparison with a smoother dataset: turns out scipy.interpolate.interp2d has quite a few issues. So my bottom line will be "use scipy.interpolate.griddata".
For instructive purposes, here's my (quite messy) code:
import numpy as np
import scipy.interpolate as interp  # used below as interp.interp2d / interp.griddata
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.cm as cm
n = 10 # rough number of points
dom = np.linspace(-2,2,n+1) # 1d input grid
x1,y1 = np.meshgrid(dom,dom) # 2d input grid
z = np.random.rand(*x1.shape) # ill-conditioned sample
#z = np.cos(x1)*np.sin(y1) # smooth sample
# first interpolator with interp2d:
fun1 = interp.interp2d(x1,y1,z,kind='linear')
# construct twice finer plotting and interpolating mesh
plotdom = np.linspace(-1,1,2*n+1) # for interpolation and plotting
plotx1,ploty1 = np.meshgrid(plotdom,plotdom)
plotz1 = fun1(plotdom,plotdom) # interpolated points
# construct 45-degree rotated input and interpolating meshes
rotmat = np.array([[1,-1],[1,1]])/np.sqrt(2) # 45-degree rotation
x2,y2 = rotmat.dot(np.vstack([x1.ravel(),y1.ravel()])) # rotate input mesh
plotx2,ploty2 = rotmat.dot(np.vstack([plotx1.ravel(),ploty1.ravel()])) # rotate plotting/interp mesh
# interpolate on rotated mesh with interp2d
# (reverse rotate by using plotx1, ploty1 later!)
fun2 = interp.interp2d(x2,y2,z.ravel(),kind='linear')
# I had to generate the rotated points element-by-element
# since fun2() accepts only rectangular meshes as input
plotz2 = np.array([fun2(xx,yy) for (xx,yy) in zip(plotx2.ravel(),ploty2.ravel())])
# try interpolating with griddata
plotz3 = interp.griddata(np.array([x1.ravel(),y1.ravel()]).T,z.ravel(),np.array([plotx1.ravel(),ploty1.ravel()]).T,method='linear')
plotz4 = interp.griddata(np.array([x2,y2]).T,z.ravel(),np.array([plotx2,ploty2]).T,method='linear')
# function to plot a surface
def myplot(X,Y,Z):
    fig = plt.figure()
    ax = Axes3D(fig)
    ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                    linewidth=0, antialiased=False, cmap=cm.coolwarm)
    plt.show()
# plot interp2d versions
myplot(plotx1,ploty1,plotz1) # Cartesian meshes
myplot(plotx1,ploty1,plotz2.reshape(2*n+1,-1)) # rotated meshes
# plot griddata versions
myplot(plotx1,ploty1,plotz3.reshape(2*n+1,-1)) # Cartesian meshes
myplot(plotx1,ploty1,plotz4.reshape(2*n+1,-1)) # rotated meshes
So here's a gallery of the results. Using random input z data, and interp2d, Cartesian (left) vs rotated interpolation (right):
Note the horrible scale on the right side, considering that the input points are all between 0 and 1: even its mother wouldn't recognize the data set. There are also runtime warnings during the evaluation of the rotated data set, so we're being warned that it's all crap.
Now let's do the same with griddata:
We should note that these figures are much closer to each other, and they seem to make way more sense than the output of interp2d. For instance, note the overshoot in the scale of the very first figure.
These artifacts always arise between input data points. Since it's still interpolation, the input points have to be reproduced by the interpolating function, but it's pretty weird that a linear interpolating function overshoots between data points. It's clear that griddata doesn't suffer from this issue.
Consider an even more clear case: the other set of z values, which are smooth and deterministic. The surfaces with interp2d:
HELP! Call the interpolation police! Already the Cartesian input case has inexplicable (well, at least by me) spurious features in it, and the rotated input case poses the threat of s͔̖̰͕̞͖͇ͣ́̈̒ͦ̀̀ü͇̹̞̳ͭ̊̓̎̈m̥̠͈̣̆̐ͦ̚m̻͑͒̔̓ͦ̇oͣ̐ͣṉ̟͖͙̆͋i͉̓̓ͭ̒͛n̹̙̥̩̥̯̭ͤͤͤ̄g͈͇̼͖͖̭̙ ̐z̻̉ͬͪ̑ͭͨ͊ä̼̣̬̗̖́̄ͥl̫̣͔͓̟͛͊̏ͨ͗̎g̻͇͈͚̟̻͛ͫ͛̅͋͒o͈͓̱̥̙̫͚̾͂.
So let's do the same with griddata:
The day is saved, thanks to The Powerpuff Girls scipy.interpolate.griddata. Homework: check the same with cubic interpolation.
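For the homework, only the method argument needs to change; a sketch reusing the arrays from the code above:
# cubic griddata on the Cartesian input mesh
plotz5 = interp.griddata(np.array([x1.ravel(), y1.ravel()]).T, z.ravel(),
                         np.array([plotx1.ravel(), ploty1.ravel()]).T, method='cubic')
myplot(plotx1, ploty1, plotz5.reshape(2*n+1, -1))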
By the way, a very short answer to your original question is in help(interp.interp2d):
| Notes
| -----
| The minimum number of data points required along the interpolation
| axis is ``(k+1)**2``, with k=1 for linear, k=3 for cubic and k=5 for
| quintic interpolation.
For linear interpolation you need at least 4 points along the interpolation axis, i.e. at least 4 unique x and y values have to be present to get a meaningful result. Check these:
nvals = 3 # -> RuntimeWarning
x = np.linspace(0,1,10)
y = np.random.randint(low=0,high=nvals,size=x.shape)
z = x
interp.interp2d(x,y,z)
nvals = 4 # -> no problem here
x = np.linspace(0,1,10)
y = np.random.randint(low=0,high=nvals,size=x.shape)
z = x
interp.interp2d(x,y,z)
And of course this all ties in to your question: it makes a huge difference whether your geometrically 1d data set lies along one of the Cartesian axes, or in a general direction where the coordinate values assume various different values. It's probably meaningless (or at least very ill-defined) to try 2d interpolation from a geometrically 1d data set, but at least the algorithm shouldn't break if your data are along a general direction of the x,y plane.
Suppose I have
t= [0,7,10,17,23,29,31]
f_t= [4,3,11,19,12,9,17]
and I have plotted f_t vs t.
Now from plotting these 7 data points, I want to retrieve 100 data points and save them in a text file. What do I have to do?
Note that I am not asking about fitting the data; I know the plot is linear between consecutive points.
What I am asking is: if I create an array like t=np.arange(0,31,.1), what is the corresponding array of f_t that agrees with the previous plot? That is, for any t between t=0 and t=7, f_t is determined by the straight line connecting (0,4) and (7,3), and so on.
You could use linear regression, which gives you a straight-line formula from which you can compute as many points as you want.
If the data looks more like a curve, then you should try a polynomial regression of higher degree.
E.g.:
import pylab
import numpy
py_x = [0,7,10,17,23,29,31]
py_y = [4,3,11,19,12,9,17]
x = numpy.asarray(py_x)
y = numpy.asarray(py_y)
poly = numpy.polyfit(x,y,1) # 1 is the degree here. If you want curves, put 2, 3 or 5...
poly now holds the polynomial coefficients, which you can use to calculate other points:
for z in range(100):
    print(numpy.polyval(poly, z))  # this returns the fitted f(z)
The function np.interp will do linear interpolation between your data points:
f2 = np.interp(np.arange(0,31,.1), t, f_t)
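A complete sketch of that approach, including saving the 100 requested points to a text file (the filename is an arbitrary choice; np.linspace is used to get exactly 100 points):
import numpy as np
t = [0, 7, 10, 17, 23, 29, 31]
f_t = [4, 3, 11, 19, 12, 9, 17]
t_new = np.linspace(0, 31, 100)    # exactly 100 evenly spaced points
f_new = np.interp(t_new, t, f_t)   # piecewise-linear interpolation between the data points
np.savetxt("interpolated.txt", np.column_stack([t_new, f_new]))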