I have done some work in Python, but I'm new to scipy. I'm trying to use the methods from the interpolate library to come up with a function that will approximate a set of data.
I've looked up some examples to get started, and could get the sample code below working in Python(x,y):
import numpy as np
from scipy.interpolate import interp1d, Rbf
import pylab as P
# show the plot (empty for now)
P.clf()
P.show()
# generate random input data
original_data = np.linspace(0, 1, 10)
# random noise to be added to the data
noise = (np.random.random(10)*2 - 1) * 1e-1
# calculate f(x)=sin(2*PI*x)+noise
f_original_data = np.sin(2 * np.pi * original_data) + noise
# create interpolator
rbf_interp = Rbf(original_data, f_original_data, function='gaussian')
# Create new sample data (for input), calculate f(x)
#using different interpolation methods
new_sample_data = np.linspace(0, 1, 50)
rbf_new_sample_data = rbf_interp(new_sample_data)
# draw all results to compare
P.plot(original_data, f_original_data, 'o', ms=6, label='f_original_data')
P.plot(new_sample_data, rbf_new_sample_data, label='Rbf interp')
P.legend()
The plot is displayed as follows:
Now, is there any way to get a polynomial expression representing the interpolated function created by Rbf (i.e. the method created as rbf_interp)?
Or, if this is not possible with Rbf, any suggestions using a different interpolation method, another library, or even a different tool are also welcome.
The RBF uses whatever functions you ask, it is of course a global model, so yes there is a function result, but of course its true that you will probably not like it since it is a sum over many gaussians. You got:
rbf.nodes # the factors for each of the RBF (probably gaussians)
rbf.xi # the centers.
rbf.epsilon # the width of the gaussian, but remember that the Norm plays a role too
So with these things you can calculate the distances (with rbf.xi then pluggin the distances with the factors in rbf.nodes and rbf.epsilon into the gaussian (or whatever function you asked it to use). (You can check the python code of __call__ and _call_norm)
So you get something like sum(rbf.nodes[i] * gaussian(rbf.epsilon, sqrt((rbf.xi - center)**2)) for i, center in enumerate(rbf.nodes)) to give some funny half code/formula, the RBFs function is written in the documentation, but you can also check the python code.
The answer is no, there is no "nice" way to write down the formula, or at least not in a short way. Some types of interpolations, like RBF and Loess, do not directly search for a parametric mathematical function to fit to the data and instead they calculate the value of each new data point separately as a function of the other points.
These interpolations are guaranteed to always give a good fit for your data (such as in your case), and the reason for this is that to describe them you need a very large number of parameters (basically all your data points). Think of it this way: you could interpolate linearly by connecting consecutive data points with straight lines. You could fit any data this way and then describe the function in a mathematical form, but it would take a large number of parameters (at least as many as the number of points). Actually what you are doing right now is pretty much a smoothed version of that.
If you want the formula to be short, this means you want to describe the data with a mathematical function that does not have many parameters (specifically the number of parameters should be much lower than the number of data points). Such examples are logistic functions, polynomial functions and even the sine function (that you used to generate the data). Obviously, if you know which function generated the data that will be the function you want to fit.
RBF likely stands for Radial Basis Function. I wouldn't be surprised if scipy.interpolate.Rbf was the function you're looking for.
However, I doubt you'll be able to find a polynomial expression to represent your result.
If you want to try different interpolation methods, check the corresponding Scipy documentation, that gives link to RBF, splines...
I don’t think SciPy’s RBF will give you the actual function. But one thing that you could do is sample the function that SciPy’s RBF gave you (ie 100 points). Then use Lagrange interpretation with those points. This will generate a polynomial function for you. Here is an example on how this would look. If you do not want to use Lagrange interpolation, You can also use “Newton’s dividend difference method” to generate a polynomial function.
My answer is based on numpy only :
import matplotlib.pyplot as plt
import numpy as np
x_data = [324, 531, 806, 1152, 1576, 2081, 2672, 3285, 3979, 4736]
y_data = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
x = np.array(x_data)
y = np.array(y_data)
model = np.poly1d(np.polyfit(x, y, 2))
ynew = model(x)
plt.plot(x, y, 'o', x, ynew, '-' , )
plt.ylabel( str(model).strip())
plt.show()
Related
I need to non-linearly expand on each pixel value from 1 dim pixel vector with taylor series expansion of specific non-linear function (e^x or log(x) or log(1+e^x)), but my current implementation is not right to me at least based on taylor series concepts. The basic intuition behind is taking pixel array as input neurons for a CNN model where each pixel should be non-linearly expanded with taylor series expansion of non-linear function.
new update 1:
From my understanding from taylor series, taylor series is written for a function F of a variable x in terms of the value of the function F and it's derivatives in for another value of variable x0. In my problem, F is function of non-linear transformation of features (a.k.a, pixels), x is each pixel value, x0 is maclaurin series approximation at 0.
new update 2
if we use taylor series of log(1+e^x) with approximation order of 2, each pixel value will yield two new pixel by taking first and second expansion terms of taylor series.
graphic illustration
Here is the graphical illustration of the above formulation:
Where X is pixel array, p is approximation order of taylor series, and α is the taylor expansion coefficient.
I wanted to non-linearly expand pixel vectors with taylor series expansion of non-linear function like above illustration demonstrated.
My current attempt
This is my current attempt which is not working correctly for pixel arrays. I was thinking about how to make the same idea applicable to pixel arrays.
def taylor_func(x, approx_order=2):
x_ = x[..., None]
x_ = tf.tile(x_, multiples=[1, 1, approx_order+ 1])
pows = tf.range(0, approx_order + 1, dtype=tf.float32)
x_p = tf.pow(x_, pows)
x_p_ = x_p[..., None]
return x_p_
x = Input(shape=(4,4,3))
x_new = Lambda(lambda x: taylor_func(x, max_pow))(x)
my new updated attempt:
x_input= Input(shape=(32, 32,3))
def maclurin_exp(x, powers=2):
out= 0
for k in range(powers):
out+= ((-1)**k) * (x ** (2*k)) / (math.factorial(2 * k))
return res
x_input_new = Lambda(lambda x: maclurin_exp(x, max_pow))(x_input)
This attempt doesn't yield what the above mathematical formulation describes. I bet I missed something while doing the expansion. Can anyone point me on how to make this correct? Any better idea?
goal
I wanted to take pixel vector and make non-linearly distributed or expanded with taylor series expansion of certain non-linear function. Is there any possible way to do this? any thoughts? thanks
This is a really interesting question but I can't say that I'm clear on it as of yet. So, while I have some thoughts, I might be missing the thrust of what you're looking to do.
It seems like you want to develop your own activation function instead of using something RELU or softmax. Certainly no harm there. And you gave three candidates: e^x, log(x), and log(1+e^x).
Notice log(x) asymptotically approaches negative infinity x --> 0. So, log(x) is right out. If that was intended as a check on the answers you get or was something jotted down as you were falling asleep, no worries. But if it wasn't, you should spend some time and make sure you understand the underpinnings of what you doing because the consequences can be quite high.
You indicated you were looking for a canonical answer and you get a two for one here. You get both a canonical answer and highly performant code.
Considering you're not likely to able to write faster, more streamlined code than the folks of SciPy, Numpy, or Pandas. Or, PyPy. Or Cython for that matter. Their stuff is the standard. So don't try to compete against them by writing your own, less performant (and possibly bugged) version which you will then have to maintain as time passes. Instead, maximize your development and run times by using them.
Let's take a look at the implementation e^x in SciPy and give you some code to work with. I know you don't need a graph for what you're at this stage but they're pretty and can help you understand how they Taylor (or Maclaurin, aka Euler-Maclaurin) will work as the order of the approximation changes. It just so happens that SciPy has Taylor approximation built-in.
import scipy
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import approximate_taylor_polynomial
x = np.linspace(-10.0, 10.0, num=100)
plt.plot(x, np.exp(x), label="e^x", color = 'black')
for degree in np.arange(1, 4, step=1):
e_to_the_x_taylor = approximate_taylor_polynomial(np.exp, 0, degree, 1, order=degree + 2)
plt.plot(x, e_to_the_x_taylor(x), label=f"degree={degree}")
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.0, shadow=True)
plt.tight_layout()
plt.axis([-10, 10, -10, 10])
plt.show()
That produces this:
But let's say if you're good with 'the maths', so to speak, and are willing to go with something slightly slower if it's more 'mathy' as in it handles symbolic notation well. For that, let me suggest SymPy.
And with that in mind here is a bit of SymPy code with a graph because, well, it looks good AND because we need to go back and hit another point again.
from sympy import series, Symbol, log, E
from sympy.functions import exp
from sympy.plotting import plot
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = 13,10
plt.rcParams['lines.linewidth'] = 2
x = Symbol('x')
def taylor(function, x0, n):
""" Defines Taylor approximation of a given function
function -- is our function which we want to approximate
x0 -- point where to approximate
n -- order of approximation
"""
return function.series(x,x0,n).removeO()
# I get eyestain; feel free to get rid of this
plt.rcParams['figure.figsize'] = 10, 8
plt.rcParams['lines.linewidth'] = 1
c = log(1 + pow(E, x))
plt = plot(c, taylor(c,0,1), taylor(c,0,2), taylor(c,0,3), taylor(c,0,4), (x,-5,5),legend=True, show=False)
plt[0].line_color = 'black'
plt[1].line_color = 'red'
plt[2].line_color = 'orange'
plt[3].line_color = 'green'
plt[4].line_color = 'blue'
plt.title = 'Taylor Series Expansion for log(1 +e^x)'
plt.show()
I think either option will get you where you need go.
Ok, now for the other point. You clearly stated after a bit of revision that log(1 +e^x) was your first choice. But the others don't pass the sniff test. e^x vacillates wildly as the degree of the polynomial changes. Because of the opaqueness of algorithms and how few people can conceptually understand this stuff, Data Scientists can screw things up to a degree people can't even imagine. So make sure you're very solid on theory for this.
One last thing, consider looking at the CDF of the Erlang Distribution as an activation function (assuming I'm right and you're looking to roll your own activation function as an area of research). I don't think anyone has looked at that but it strikes as promising. I think you could break out each channel of the RGB as one of the two parameters, with the other being the physical coordinate.
You can use tf.tile and tf.math.pow to generate the elements of the series expansion. Then you can use tf.math.cumsum to compute the partial sums s_i. Eventually you can multiply with the weights w_i and compute the final sum.
Here is a code sample:
import math
import tensorflow as tf
x = tf.keras.Input(shape=(32, 32, 3)) # 3-channel RGB.
# The following is determined by your series expansion and its order.
# For example: log(1 + exp(x)) to 3rd order.
# https://www.wolframalpha.com/input/?i=taylor+series+log%281+%2B+e%5Ex%29
order = 3
alpha = tf.constant([1/2, 1/8, -1/192]) # Series coefficients.
power = tf.constant([1.0, 2.0, 4.0])
offset = math.log(2)
# These are the weights of the network; using a constant for simplicity here.
# The shape must coincide with the above order of series expansion.
w_i = tf.constant([1.0, 1.0, 1.0])
elements = offset + alpha * tf.math.pow(
tf.tile(x[..., None], [1, 1, 1, 1, order]),
power
)
s_i = tf.math.cumsum(elements, axis=-1)
y = tf.math.reduce_sum(w_i * s_i, axis=-1)
I have a cloud of data points (x,y) that I would like to interpolate and smooth.
Currently, I am using scipy :
from scipy.interpolate import interp1d
from scipy.signal import savgol_filter
spl = interp1d(Cloud[:,1], Cloud[:,0]) # interpolation
x = np.linspace(Cloud[:,1].min(), Cloud[:,1].max(), 1000)
smoothed = savgol_filter(spl(x), 21, 1) #smoothing
This is working pretty well, except that I would like to give some weights to the data points given at interp1d. Any suggestion for another function that is handling this ?
Basically, I thought that I could just multiply the occurrence of each point of the cloud according to its weight, but that is not very optimized as it increases a lot the number of points to interpolate, and slows down the algorithm ..
The default interp1d uses linear interpolation, i.e., it simply computes a line between two points. A weighted interpolation does not make much sense mathematically in such scenario - there is only one way in euclidean space to make a straight line between two points.
Depending on your goal, you can look into other methods of interpolation, e.g., B-splines. Then you can use scipy's scipy.interpolate.splrep and set the w argument:
w - Strictly positive rank-1 array of weights the same length as x and y. The weights are used in computing the weighted least-squares spline fit. If the errors in the y values have standard-deviation given by the vector d, then w should be 1/d. Default is ones(len(x)).
I am having issues in implementing some less-than-usual interpolation problem. I have some (x,y) data points scattered along some curve which a priori I don't know, and I want to reconstruct this curve at my best, interpolating my point with min square error. I thought of using scipy.interpolate.splrep for this purpose (but maybe there are better options you would advise to use). The additional difficulty in my case, is that I want to constrain the spline curve to pass through some specific points of my original data. I assume that playing with knots and weights could make the trick, but I don't know how to do so (I am procrastinating avoidance of spline interpolation theory besides basic fitting procedures). Also, for some undisclosed reasons, when I try to setup knots in my splrep I get the same error of this post, which keeps complicating things. The following is my sample code:
from __future__ import division
import numpy as np
import scipy.interpolate as spi
import matplotlib.pylab as plt
# Some surrogate sample data
f = lambda x : x**2 - x/2.
x = np.arange(0.,20.,0.1)
y = f(4*(x + np.random.normal(size=np.size(x))))
# I want to use spline interpolation with least-square fitting criterion, making sure though that the spline starts
# from the origin (or in general passes through a precise point of my dataset).
# In my case for example I would like the spline to originate from the point in x=0. So I attempted to include as first knot x=0...
# but it won't work, nor I am sure this is the right procedure...
fy = spi.splrep(x,y)
fy = spi.splrep(x,y,t=fy[0])
yy = spi.splev(x,fy)
plt.plot(x,y,'-',x,yy,'--')
plt.show()
which despite the fact I am even passing knots computed from a first call of splrep, it will give me:
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/fitpack.py", line 289, in splrep
res = _impl.splrep(x, y, w, xb, xe, k, task, s, t, full_output, per, quiet)
File "/usr/lib64/python2.7/site-packages/scipy/interpolate/_fitpack_impl.py", line 515, in splrep
raise _iermess[ier][1](_iermess[ier][0])
ValueError: Error on input data
You use the weights argument of splrep: can give these points you need fixed very large weights. This is a workaround for sure, so keep an eye on the fit quality and stability.
Setting high weights for specific points is indeed a working solution as suggested by #ev-br. In addition, because there is no direct way to match derivatives at the extrema of the curve, the same rationale can be applied in this case as well. Say you want the derivative in y[0] and y[-1] match the derivative of your data points, then you add large weights also for y[1] and y[-2], i.e.
weights = np.ones(len(x))
weights[[0,-1]] = 100 # Promote spline interpolant through first and last point
weights[[1,-2]] = 50 # Make spline interpolant derivative tend to derivatives at first/last point
fy = spi.splrep(x,y,w=weights,s=0.1)
yy = spi.splev(x,fy)
I have a moderate size data set, namely 20000 x 2 floats in a two column matrix. The first column is the the x column which represents the distance to the original point along a trajectory, another column is the y column which represents the work has done to the object. This data set is obtained from lab operations, so it's fairly arbitrary. I've already turned this structure into numpy array. I want to plot y vs x in a figure with a smooth curve. So I hope the following code could help me:
x_smooth = np.linspace(x.min(),x.max(), 20000)
y_smooth = spline(x, y, x_smooth)
plt.plot(x_smooth, y_smooth)
plt.show()
However, when my program execute the line y_smooth = spline(x,y,x_smooth), it takes a very long time,say 10 min, and even sometimes it will blow my memory that I have to restart my machine. I tried to reduce the chunk number to 200 and 2000 and none of them works. Then I checked the official scipy reference: scipy.interpolate.spline here. And they said that spline is deprecated in v 0.19, but I'm not using the new version. If spline is deprecated for quite a bit of the time, how to use the equivalent Bspline now? If spline is still functioning, then what causes the slow performance
One portion of my data could look like this:
13.202 0.0
13.234738 -0.051354643759
12.999116 0.144464320836
12.86252 0.07396528119
13.1157 0.10019738758
13.357109 -0.30288563381
13.234004 -0.045792536285
12.836279 0.0362257166275
12.851597 0.0542649286915
13.110691 0.105297378401
13.220619 -0.0182963209185
13.092143 0.116647353635
12.545676 -0.641112204849
12.728248 -0.147460703493
12.874176 0.0755861585235
12.746764 -0.111583725833
13.024995 0.148079528382
13.106033 0.119481137144
13.327233 -0.197666132456
13.142423 0.0901867159545
Several issues here. First and foremost, spline fitting you're trying to use is global. This means that you're solving a system of linear equations of the size 20000 at the construction time (evaluations are weakly sensitive to the dataset size though). This explains why the spline construction is slow.
scipy.interpolate.spline, furthermore, does linear algebra with full matrices --- hence memory consumption. This is precisely why it's deprecated from scipy 0.19.0 on.
The recommended replacement, available in scipy 0.19.0, is the BSpline/ make_interp_spline combo:
>>> spl = make_interp_spline(x, y, k=3) # returns a BSpline object
>>> y_new = spl(x_new) # evaluate
Notice it is not BSpline(x, y, k): BSpline objects do not know anything about the data or fitting or interpolation.
If you are using older scipy versions, your options are:
CubicSpline(x, y) for cubic splines
splrep(x, y, s=0) / splev combo.
However, you may want to think if you really need twice continuously differentiable functions. If only once differentiable functions are smooth enough for your purposes, then you can use local spline interpolations, e.g. Akima1DInterpolator or PchipInterpolator:
In [1]: import numpy as np
In [2]: from scipy.interpolate import pchip, splmake
In [3]: x = np.arange(1000)
In [4]: y = x**2
In [5]: %timeit pchip(x, y)
10 loops, best of 3: 58.9 ms per loop
In [6]: %timeit splmake(x, y)
1 loop, best of 3: 5.01 s per loop
Here splmake is what spline uses under the hood, and it's also deprecated.
Most interpolation methods in SciPy are function-generating, i.e. they return function which you can then execute on your x data. For example, using CubicSpline method, which connects all points with pointwise cubic spline would be
from scipy.interpolate import CubicSpline
spline = CubicSpline(x, y)
y_smooth = spline(x_smooth)
Based on your description I think that you correctly want to use BSpline. To do so, follow the pattern above, i.e.
from scipy.interpolate import BSpline
order = 2 # smoothness order
spline = BSpline(x, y, order)
y_smooth = spline(x_smooth)
Since you have such amount of data, it probably must be very noisy. I'd suggest using bigger spline order, which relates to the number of knots used for interpolation.
In both cases, your knots, i.e. x and y, should be sorted. These are 1D interpolation (since you are using only x_smooth as input). You can sort them using np.argsort. In short:
from scipy.interpolate import BSpline
sort_idx = np.argsort(x)
x_sorted = x[sort_idx]
y_sorted = y[sort_idx]
order = 20 # smoothness order
spline = BSpline(x_sorted, y_sorted, order)
y_smooth = spline(x_smooth)
plt.plot(x_sorted, y_sorted, '.')
plt.plot(x_smooth, y_smooth, '-')
plt.show()
My problem can be generalize to how to smoothly plot 2d graphs when data points are randomized. Since you are only dealing with two columns of data, if you sort your data by independent variable, at least your data points will be connected in order, and that's how matplotlib connects your data points.
#Dawid Laszuk has provided one solution to sort data by independent variable, and I'll display mine here:
plotting_columns = []
for i in range(len(x)):
plotting_columns.append(np.array([x[i],y[i]]))
plotting_columns.sort(key=lambda pair : pair[0])
plotting_columns = np.array(plotting_columns)
traditional sort() by filter condition could also do the sorting job efficient here.
But it's just your first step. The following steps are not hard either, to smooth your graph, you also want to keep your independent variable in linear ascending order with identical step interval, so
x_smooth = np.linspace(x.min(), x.max(), num_steps)
is enough to do the job. Usually, if you have plenty of data points, for example, more than 10000 points (correctness and accuracy are not human verifiable), you just want to plot the significant points to display the trend, then only smoothing x is enough. So you can plt.plot(x_smooth,y) simply.
You will notice that x_smooth will generate many x values that will not have corresponding y value. When you want to maintain the correctness, you need to use line fitting functions. As #ev-br demonstrated in his answer, spline functions are expensive on purpose. Therefore you might want to do some simpler trick. I smoothed my graph without using those functions. And you have some simple steps to it.
First, round your values so that your data will not vary too much in small intervals. (You can skip this step)
You can change one line when you constructing the plotting_columns as:
plotting_columns.append(np.around(np.array(x[i],y[i]), decimal=4))
After done this, you can filter out the point that you don't want to plot by choosing the points close to the x_smooth values:
new_plots = []
for i in range(len(x_smooth)):
if plotting_columns[:,0][i] >= x_smooth[i] - error and plotting_columns[:,0][i]< x_smooth[i] + error:
new_plots.append(plotting_columns[i])
else:
# Remove all points between the interval #
This is how I solved my problems.
I'm trying to interpolate some data for the purpose of plotting. For instance, given N data points, I'd like to be able to generate a "smooth" plot, made up of 10*N or so interpolated data points.
My approach is to generate an N-by-10*N matrix and compute the inner product the original vector and the matrix I generated, yielding a 1-by-10*N vector. I've already worked out the math I'd like to use for the interpolation, but my code is pretty slow. I'm pretty new to Python, so I'm hopeful that some of the experts here can give me some ideas of ways I can try to speed up my code.
I think part of the problem is that generating the matrix requires 10*N^2 calls to the following function:
def sinc(x):
import math
try:
return math.sin(math.pi * x) / (math.pi * x)
except ZeroDivisionError:
return 1.0
(This comes from sampling theory. Essentially, I'm attempting to recreate a signal from its samples, and upsample it to a higher frequency.)
The matrix is generated by the following:
def resampleMatrix(Tso, Tsf, o, f):
from numpy import array as npar
retval = []
for i in range(f):
retval.append([sinc((Tsf*i - Tso*j)/Tso) for j in range(o)])
return npar(retval)
I'm considering breaking up the task into smaller pieces because I don't like the idea of an N^2 matrix sitting in memory. I could probably make 'resampleMatrix' into a generator function and do the inner product row-by-row, but I don't think that will speed up my code much until I start paging stuff in and out of memory.
Thanks in advance for your suggestions!
This is upsampling. See Help with resampling/upsampling for some example solutions.
A fast way to do this (for offline data, like your plotting application) is to use FFTs. This is what SciPy's native resample() function does. It assumes a periodic signal, though, so it's not exactly the same. See this reference:
Here’s the second issue regarding time-domain real signal interpolation, and it’s a big deal indeed. This exact interpolation algorithm provides correct results only if the original x(n) sequence is periodic within its full time interval.
Your function assumes the signal's samples are all 0 outside of the defined range, so the two methods will diverge away from the center point. If you pad the signal with lots of zeros first, it will produce a very close result. There are several more zeros past the edge of the plot not shown here:
Cubic interpolation won't be correct for resampling purposes. This example is an extreme case (near the sampling frequency), but as you can see, cubic interpolation isn't even close. For lower frequencies it should be pretty accurate.
If you want to interpolate data in a quite general and fast way, splines or polynomials are very useful. Scipy has the scipy.interpolate module, which is very useful. You can find many examples in the official pages.
Your question isn't entirely clear; you're trying to optimize the code you posted, right?
Re-writing sinc like this should speed it up considerably. This implementation avoids checking that the math module is imported on every call, doesn't do attribute access three times, and replaces exception handling with a conditional expression:
from math import sin, pi
def sinc(x):
return (sin(pi * x) / (pi * x)) if x != 0 else 1.0
You could also try avoiding creating the matrix twice (and holding it twice in parallel in memory) by creating a numpy.array directly (not from a list of lists):
def resampleMatrix(Tso, Tsf, o, f):
retval = numpy.zeros((f, o))
for i in xrange(f):
for j in xrange(o):
retval[i][j] = sinc((Tsf*i - Tso*j)/Tso)
return retval
(replace xrange with range on Python 3.0 and above)
Finally, you can create rows with numpy.arange as well as calling numpy.sinc on each row or even on the entire matrix:
def resampleMatrix(Tso, Tsf, o, f):
retval = numpy.zeros((f, o))
for i in xrange(f):
retval[i] = numpy.arange(Tsf*i / Tso, Tsf*i / Tso - o, -1.0)
return numpy.sinc(retval)
This should be significantly faster than your original implementation. Try different combinations of these ideas and test their performance, see which works out the best!
I'm not quite sure what you're trying to do, but there are some speedups you can do to create the matrix. Braincore's suggestion to use numpy.sinc is a first step, but the second is to realize that numpy functions want to work on numpy arrays, where they can do loops at C speen, and can do it faster than on individual elements.
def resampleMatrix(Tso, Tsf, o, f):
retval = numpy.sinc((Tsi*numpy.arange(i)[:,numpy.newaxis]
-Tso*numpy.arange(j)[numpy.newaxis,:])/Tso)
return retval
The trick is that by indexing the aranges with the numpy.newaxis, numpy converts the array with shape i to one with shape i x 1, and the array with shape j, to shape 1 x j. At the subtraction step, numpy will "broadcast" the each input to act as a i x j shaped array and the do the subtraction. ("Broadcast" is numpy's term, reflecting the fact no additional copy is made to stretch the i x 1 to i x j.)
Now the numpy.sinc can iterate over all the elements in compiled code, much quicker than any for-loop you could write.
(There's an additional speed-up available if you do the division before the subtraction, especially since inthe latter the division cancels the multiplication.)
The only drawback is that you now pay for an extra Nx10*N array to hold the difference. This might be a dealbreaker if N is large and memory is an issue.
Otherwise, you should be able to write this using numpy.convolve. From what little I just learned about sinc-interpolation, I'd say you want something like numpy.convolve(orig,numpy.sinc(numpy.arange(j)),mode="same"). But I'm probably wrong about the specifics.
If your only interest is to 'generate a "smooth" plot' I would just go with a simple polynomial spline curve fit:
For any two adjacent data points the coefficients of a third degree polynomial function can be computed from the coordinates of those data points and the two additional points to their left and right (disregarding boundary points.) This will generate points on a nice smooth curve with a continuous first dirivitive. There's a straight forward formula for converting 4 coordinates to 4 polynomial coefficients but I don't want to deprive you of the fun of looking it up ;o).
Here's a minimal example of 1d interpolation with scipy -- not as much fun as reinventing, but.
The plot looks like sinc, which is no coincidence:
try google spline resample "approximate sinc".
(Presumably less local / more taps ⇒ better approximation,
but I have no idea how local UnivariateSplines are.)
""" interpolate with scipy.interpolate.UnivariateSpline """
from __future__ import division
import numpy as np
from scipy.interpolate import UnivariateSpline
import pylab as pl
N = 10
H = 8
x = np.arange(N+1)
xup = np.arange( 0, N, 1/H )
y = np.zeros(N+1); y[N//2] = 100
interpolator = UnivariateSpline( x, y, k=3, s=0 ) # s=0 interpolates
yup = interpolator( xup )
np.set_printoptions( 1, threshold=100, suppress=True ) # .1f
print "yup:", yup
pl.plot( x, y, "green", xup, yup, "blue" )
pl.show()
Added feb 2010: see also basic-spline-interpolation-in-a-few-lines-of-numpy
Small improvement. Use the built-in numpy.sinc(x) function which runs in compiled C code.
Possible larger improvement: Can you do the interpolation on the fly (as the plotting occurs)? Or are you tied to a plotting library that only accepts a matrix?
I recommend that you check your algorithm, as it is a non-trivial problem. Specifically, I suggest you gain access to the article "Function Plotting Using Conic Splines" (IEEE Computer Graphics and Applications) by Hu and Pavlidis (1991). Their algorithm implementation allows for adaptive sampling of the function, such that the rendering time is smaller than with regularly spaced approaches.
The abstract follows:
A method is presented whereby, given a
mathematical description of a
function, a conic spline approximating
the plot of the function is produced.
Conic arcs were selected as the
primitive curves because there are
simple incremental plotting algorithms
for conics already included in some
device drivers, and there are simple
algorithms for local approximations by
conics. A split-and-merge algorithm
for choosing the knots adaptively,
according to shape analysis of the
original function based on its
first-order derivatives, is
introduced.