Suppose I have the following code
import numpy as np
f = lambda x,y: (np.sum(x) + np.sum(y))**2
x = np.array([1,2,3])
y = np.array([4,5,6])
df_dx
df_dy
df2_dx2
df2_dxdy
...
Is there a fast way to compute all the derivatives (single and mixed) of such a function? The module should apply the classical finite-difference technique at the array level, i.e. add h = tol elementwise to the array variables (depending on the derivative), evaluate the function, and divide by h.
(My real case is much more complicated as it involves an array valued function coming from a DLL I can't modify...the number of variables is arbitrary, please do not focus on this particular toy example)
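A minimal sketch of the array-level finite-difference idea described above, not an existing module. The helper names `shifted`, `fd_first` and `fd_second` and the step sizes are illustrative assumptions. Here a "derivative" means the derivative with respect to a uniform elementwise shift of one array argument, and central differences are used instead of the one-sided difference for better accuracy.

```python
import numpy as np

def shifted(args, shifts):
    """Return the arrays in args, each with its scalar shift added elementwise."""
    return [a + s for a, s in zip(args, shifts)]

def fd_first(f, args, i, h=1e-5):
    """Central difference with respect to a uniform shift of args[i]."""
    up = [0.0] * len(args)
    dn = [0.0] * len(args)
    up[i], dn[i] = h, -h
    return (f(*shifted(args, up)) - f(*shifted(args, dn))) / (2 * h)

def fd_second(f, args, i, j, h=1e-4):
    """Central difference for the second (pure or mixed) derivative.

    For i == j the two shifts add up, so the effective step is 2*h.
    """
    def at(si, sj):
        s = [0.0] * len(args)
        s[i] += si
        s[j] += sj
        return f(*shifted(args, s))
    return (at(h, h) - at(h, -h) - at(-h, h) + at(-h, -h)) / (4 * h * h)

# Toy function from the question
f = lambda x, y: (np.sum(x) + np.sum(y))**2
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
args = [x, y]

df_dx = fd_first(f, args, 0)
df_dy = fd_first(f, args, 1)
df2_dx2 = fd_second(f, args, 0, 0)
df2_dxdy = fd_second(f, args, 0, 1)
```

Because the helpers only call `f` on shifted copies of the inputs, the same approach should carry over to an array-valued black-box function, at the cost of two to four function evaluations per derivative.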
I have the following problem. I have a function f defined in python using numpy functions. The function is smooth and integrable on positive reals. I want to construct the double antiderivative of the function (assuming that both the value and the slope of the antiderivative at 0 are 0) so that I can evaluate it on any positive real smaller than 100.
Definition of antiderivative of f at x:
integrate f(s) with s from 0 to x
Definition of double antiderivative of f at x:
integrate (integrate f(t) with t from 0 to s) with s from 0 to x
The actual form of f is not important, so I will use a simple one for convenience. But please note that even though my example has a known closed form, my actual function does not.
import numpy as np
f = lambda x: np.exp(-x)*x
My solution is to construct the antiderivative as an array using naive numerical integration:
N = 10000
delta = 100/N
xs = np.linspace(0,100,N+1)
vs = f(xs)
avs = np.cumsum(vs)*delta
aavs = np.cumsum(avs)*delta
This of course works but it gives me arrays instead of functions. But this is not a big problem as I can interpolate aavs using a spline to get a function and get rid of the arrays.
from scipy.interpolate import UnivariateSpline
aaf = UnivariateSpline(xs, aavs)
The function aaf is approximately the double antiderivative of f.
The problem is that even though it works, there is quite a bit of overhead before I can get my function and precision is expensive.
My other idea was to interpolate f by a spline and take the antiderivative of that, however this introduces numerical errors that are too big for what I want to use the function.
Is there any better way to do that? By better I mean faster without sacrificing accuracy.
Edit: What I hope is possible is to use some kind of Fourier transform to avoid integrating twice. I hope that there is some convenient transform of vs that allows to multiply the values component-wise with xs and transform back to get the double antiderivative. I played with this a bit, but I got lost.
Edit: I figured out that using the trapezoidal rule instead of a naive sum increases the accuracy quite a bit. Using Simpson's rule should increase the accuracy further, but it's somewhat fiddly to do with numpy arrays.
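For illustration, a cumulative trapezoidal rule can be written with plain NumPy as in the sketch below. The helper name `cumulative_trapezoid_np` is an assumption made for this example (SciPy's `scipy.integrate.cumulative_trapezoid` provides the same operation), and the closed form used for the comparison exists only for the toy f.

```python
import numpy as np

def cumulative_trapezoid_np(values, xs):
    """Cumulative trapezoidal integral of sampled values; the result is 0 at xs[0]."""
    areas = 0.5 * (values[1:] + values[:-1]) * np.diff(xs)
    return np.concatenate(([0.0], np.cumsum(areas)))

f = lambda x: np.exp(-x) * x
N = 10000
delta = 100 / N
xs = np.linspace(0, 100, N + 1)
vs = f(xs)

# Double antiderivative with the trapezoidal rule
avs_trap = cumulative_trapezoid_np(vs, xs)
aavs_trap = cumulative_trapezoid_np(avs_trap, xs)

# Double antiderivative with the naive sum from the question
aavs_naive = np.cumsum(np.cumsum(vs) * delta) * delta

# Closed form of the double antiderivative, known only for this toy f
exact = xs - 2 + (xs + 2) * np.exp(-xs)
err_trap = np.max(np.abs(aavs_trap - exact))
err_naive = np.max(np.abs(aavs_naive - exact))
```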
Edit: As @user202729 rightfully complains, this seems off. The reason it seems off is that I have skipped some details. I explain here why what I say makes sense, but it does not affect my question.
My actual goal is not to find the double antiderivative of f, but to find a transformation of this. I have skipped that because I think it only confuses the matter.
The function f decays exponentially as x approaches 0 or infinity. I am minimizing the numerical error in the integration by starting the sum from 0 and going up to approximately the peak of f. This ensures that the relative error is approximately constant. Then I start from the opposite direction from some very big x and go back to the peak. Then I do the same for the antiderivative values.
Then I transform the aavs by another function which is sensitive to numerical errors. Then I find the region where the errors are big (the values oscillate violently) and drop these values. Finally I approximate what I believe are good values by a spline.
Now if I use spline to approximate f, it introduces an absolute error which is the dominant term in a rather large interval. This gets "integrated" twice and it ends up being a rather large relative error in aavs. Then once I transform aavs, I find that the 'good region' has shrunk considerably.
EDIT: The actual form of f is something I'm still looking into. However, it is going to be a generalisation of the lognormal distribution. Right now I am playing with the following family.
I start by defining a generalization of the normal distribution:
from scipy import special

def pdf_n(params, center=0.0, slope=8):
    scale, min, diff = params
    if diff > 0:
        r = min
        l = min + diff
    else:
        r = min - diff
        l = min
    def retfun(m):
        x = (m - center)/scale
        # exponent blends smoothly from l (far left) to r (far right)
        E = special.expit(slope*x)*(r - l) + l
        return np.exp( -np.power(1 + x*x, E)/2 )
    return np.vectorize(retfun)
It may not be obvious what is happening here, but the result is quite simple. The function decays as exp(-x^(2l)) on the left and as exp(-x^(2r)) on the right. For min=1 and diff=0, this is the normal distribution. Note that this is not normalized. Then I define
g = pdf_n(params)
f = np.vectorize(lambda x:g(np.log(x))/x/area)
where area is the normalization constant.
Note that this is not the actual code I use. I stripped it down to the bare minimum.
You can compute the two np.cumsum (and the divisions) at once more efficiently using Numba. This is significantly faster since there is no need for several temporary arrays to be allocated, filled, read again and freed. Here is a naive implementation:
import numba as nb
import numpy as np

@nb.njit('float64[::1](float64[::1], float64)')  # Assume vs is contiguous
def doubleAntiderivative_naive(vs, delta):
    res = np.empty(vs.size, dtype=np.float64)
    sum1, sum2 = 0.0, 0.0
    for i in range(vs.size):
        sum1 += vs[i] * delta   # running antiderivative
        sum2 += sum1 * delta    # running double antiderivative
        res[i] = sum2
    return res
However, the naive sum is not very good in terms of numerical stability. A Kahan summation is needed to improve the accuracy (or possibly the alternative Kahan–Babuška–Klein algorithm if you are paranoid about accuracy and performance does not matter so much). Note that NumPy uses a pairwise algorithm, which is quite good but far from perfect in terms of accuracy (it is a good compromise between performance and accuracy).
Moreover, delta can be factored out of the summation (i.e. the result just needs to be multiplied by delta**2 at the end).
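To illustrate the factoring of delta with plain NumPy (a small sketch using the toy f from the question, not part of the Numba code), the two expressions below agree up to rounding:

```python
import numpy as np

f = lambda x: np.exp(-x) * x
N = 10000
delta = 100 / N
vs = f(np.linspace(0, 100, N + 1))

# delta applied after each cumulative sum, as in the question
step_by_step = np.cumsum(np.cumsum(vs) * delta) * delta

# delta factored out and applied once at the end
factored = np.cumsum(np.cumsum(vs)) * delta**2

max_diff = np.max(np.abs(step_by_step - factored))
```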
Here is an implementation using the more accurate Kahan summation:
@nb.njit('float64[::1](float64[::1], float64)')
def doubleAntiderivative_accurate(vs, delta):
    res = np.empty(vs.size, dtype=np.float64)
    delta2 = delta * delta
    sum1, sum2 = 0.0, 0.0
    c1, c2 = 0.0, 0.0
    for i in range(vs.size):
        # Kahan summation of the antiderivative of vs
        y1 = vs[i] - c1
        t1 = sum1 + y1
        c1 = (t1 - sum1) - y1
        sum1 = t1
        # Kahan summation of the double antiderivative of vs
        y2 = sum1 - c2
        t2 = sum2 + y2
        c2 = (t2 - sum2) - y2
        sum2 = t2
        res[i] = sum2 * delta2
    return res
Here is the performance of the approaches on my machine (with an i5-9600KF processor):
Numpy cumsum: 51.3 us
Naive Numba: 11.6 us
Accurate Numba: 37.2 us
Here is the relative error of the approaches (based on the provided input function):
Numpy cumsum: 1e-13
Naive Numba: 5e-14
Accurate Numba: 2e-16
Perfect precision: 1e-16 (assuming 64-bit numbers are used)
If f can be easily computed using Numba (this is the case here), then vs[i] can be replaced by calls to f (inlined by Numba). This helps to reduce the memory consumption of the computation (N can be huge without saturating your RAM).
As for the interpolation, splines often give good numerical results, but they are quite expensive to compute and, AFAIK, they require the whole array to be computed first (each item of the array affects the whole spline, although some items may have a negligible impact on their own). Depending on your needs, you could consider using Lagrange polynomials instead. Be careful when using Lagrange polynomials near the edges. In your case, you can easily avoid the numerical divergence at the edges by extending the array with the border values (since you know the derivative at each edge of vs is 0). With this method you can apply the interpolation on the fly, which can be good for both performance (typically if the computation is parallelized) and memory usage.
First, I created a version of the code I found more intuitive. Here I multiply cumulative sum values by bin widths. I believe there is a small error in the original version of the code related to the bin width issue.
import numpy as np
f = lambda x: np.exp(-x)*x
N = 1000
xs = np.linspace(0,100,N+1)
domainwidth = ( np.max(xs) - np.min(xs) )
binwidth = domainwidth / N
vs = f(xs)
avs = np.cumsum(vs)*binwidth
aavs = np.cumsum(avs)*binwidth
Next, for visualization here is some very simple plotting code:
import matplotlib
import matplotlib.pyplot as plt
plt.figure()
plt.scatter( xs, vs )
plt.figure()
plt.scatter( xs, avs )
plt.figure()
plt.scatter( xs, aavs )
plt.show()
The first integral matches the known result of the example expression and can be seen on wolfram
Below is a simple function that extracts an element from the double antiderivative. Note that int truncates rather than rounds, so it is a poor way to pick the nearest sample. I assume this is what you have implemented already.
def extract_double_antideriv_value(x):
    return aavs[int(x/binwidth)]
singleresult = extract_double_antideriv_value(50.24)
print('singleresult', singleresult)
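As a hedged alternative to truncating with int, the lookup can interpolate linearly between the two neighbouring samples with `np.interp`. The sketch below rebuilds the arrays so that it runs on its own; the names `lookup_truncate` and `lookup_interp` are made up for this example.

```python
import numpy as np

f = lambda x: np.exp(-x) * x
N = 1000
xs = np.linspace(0, 100, N + 1)
binwidth = (xs[-1] - xs[0]) / N
aavs = np.cumsum(np.cumsum(f(xs)) * binwidth) * binwidth

def lookup_truncate(x):
    # Takes the sample to the left of x (int truncates toward zero)
    return aavs[int(x / binwidth)]

def lookup_interp(x):
    # Linear interpolation between the two samples surrounding x
    return np.interp(x, xs, aavs)

truncated = lookup_truncate(50.24)
interpolated = lookup_interp(50.24)
```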
Whatever full computation steps are required, we need to know them before we can start optimizing. Do you have a million different functions to integrate? If you only need to query a single double anti-derivative many times, your original solution should be fairly ideal.
Symbolic Approximation:
Have you considered approximating the original function f by something that has a closed-form integral? You have a limited domain on which the function lives. Perhaps approximate f with a Taylor series (which can be constructed with a known maximum error) and then integrate exactly? (Consider Padé, Taylor, Fourier, Chebyshev, Lagrange (as suggested by another answer), etc.)
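One way to try this idea, sketched here with NumPy's Chebyshev class: interpolate f by a Chebyshev series on the domain and integrate the series twice in closed form. The degree of 80 is an assumption that happens to suit the toy f on [0, 100]; a different f would need its own degree and error check, and the closed form used for the comparison exists only for the toy example.

```python
import numpy as np
from numpy.polynomial import Chebyshev

f = lambda x: np.exp(-x) * x

# Chebyshev interpolant of f on [0, 100]; the degree is an assumed choice
approx_f = Chebyshev.interpolate(f, deg=80, domain=[0, 100])

# Integrate the series twice; lbnd=0 makes both antiderivatives vanish at x = 0
aaf_cheb = approx_f.integ(m=2, lbnd=0)

# Compare against the closed form, which exists only for this toy f
xs_check = np.linspace(0, 100, 2001)
exact = xs_check - 2 + (xs_check + 2) * np.exp(-xs_check)
max_err = np.max(np.abs(aaf_cheb(xs_check) - exact))
```

The result `aaf_cheb` is a callable series object, so it can be evaluated at any point of the domain without a separate interpolation step.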
Log Tricks:
Another alternative to dealing with spiky errors, would be to take the log of your original function. Is f always positive? Is the integration error caused because the neighborhood around the max is very small? If so, you can study ln(f) or even ln(ln(f)) instead. It would really help to understand what f looks like more.
Approximation Integration Tricks
There exist countless integration tricks in general, which can give approximate closed-form solutions to otherwise intractable integrals. A very common one when exponential functions are involved (I think yours is exponential?) is Laplace's method. But which trick to pull out of the bag is highly dependent upon the conditions which f satisfies.
I have two signals which I need to correlate or convolve. Each signal is sampled non-uniformly and the values of the signal I have with me are the timestamp and the magnitude of the signal at that timestamp. The values of the signal at all the other times can be assumed to be zero. The timestamps of the signal have a resolution running in microseconds.
An example of how the signal looks is shown below:
As can be seen, the resolution of the signal is in microseconds and signal is mostly sparse.
If I were to convolve two signals of this type, I would first have to pad the signals with zeros (since I would have to discretise the signal). While the padding can be done with a resolution of microseconds, the number of values to be multiplied becomes too big and the operation becomes increasingly slow. Most of the multiplications in this convolution would be multiplications by zero (which are pretty much useless). I have therefore chosen to round to 2 decimal places (0.xxxxxx becomes 0.xx), since I have to perform 40,000 similar convolutions. I have written my resampling function as shown below.
import numpy as np
import math
def resampled_signal_lists_with_zeros(signal_dict, xlimits):
    '''
    resamples the given signal with precision determined by the round function.
    signal_dict is a dictionary with timestamp as key and signal magnitude is the value of the key.
    xlimits is an array containing the start and stop time of the signal.
    '''
    t_stamps_list = list(signal_dict.keys())
    t_list = list(np.arange(int(math.floor(xlimits[0])), int(math.ceil(xlimits[1])), 0.005))
    t_list = [round(t, 2) for t in t_list]
    s_list = list()
    time_keys = [round(t, 2) for t in t_stamps_list]
    i = 0
    for t in t_list:
        if i < len(t_stamps_list):
            if t==time_keys[i]:
                s_list.append(signal_dict[t_stamps_list[i]])
                i+=1
            else:
                s_list.append(0)
        else:
            s_list.append(0)
    return t_list, s_list
The correlation of two signals padded in the above manner is done using scipy as follows:
from scipy.signal import correlate
output = correlate(s_1, s_2, mode='same')
Computing the output in the above manner is pretty slow. Since the signal is pretty sparse and most of the multiplications are multiplications by zero, I think there should be a better way to do the same operation. Is there a way to get the result of the convolution of the two sparse signals faster?
Let f(x) and g(x) be "sparse" pulse functions defined with support in [0, inf). Discretize them over some linear mesh, so that f = [f0, f1,...] and g = [g0, g1, ...]. Let l_f be the length of f, and l_g be the length of g.
If the signals are very sparse, like point measures, then this could be done combinatorically.
Let indices f_sup = u1, u2, ... be the sparse support of f, of length k_f.
And g_sup = v1, v2, ... be the sparse support of g, of length k_g.
Then for convolution, the support points simply add, requiring k_f*k_g operations:
out_function_by_index = []
# (used for loops, but this could be done via comprehension for speed)
for ii in f_sup:
    for jj in g_sup:
        out_function_by_index.append((ii+jj,f[ii]*g[jj]))
To reconstitute the output to a function over the same linear mesh:
out_function = [0.0 for x in range(l_f+l_g-1)]
for tup in out_function_by_index:
    # accumulate: several (ii, jj) pairs can land on the same output index
    out_function[tup[0]] += tup[1]
Do a sign flip on f or g for the respective (non-commutative) cross-correlation.
If on the other hand the sparse signals are "piecewise sparse," you could pre-process the signals to identify sub-intervals that constitute sufficiently dense support, and do the combinatoric approach with convolution per sub-interval--ie just ordinary convolution between these dense-support sub-intervals and add the result of all these inter-function sub-interval convolutions.
I'm looking for a numpy function (or a function from any other package) that would efficiently evaluate
$$\prod_{i=1}^{N} f(x_i)$$
with f being a vector-valued function of a vector-valued input x. The product is taken to be a simple component-wise multiplication.
The issue here is that both the length of each x vector and the total number of result vectors (f of x) to be multiplied (N) are very large, on the order of millions. Therefore, it is impossible to generate all the results at once (they wouldn't fit in memory) and then multiply them afterwards using np.multiply.reduce or the like.
A toy example of the type of code I would like to replace is:
import numpy as np
x = np.ones(1000000)
prod = f(x)
for i in range(2, 1000000):
    prod *= f(i * np.ones(1000000))
with f a vector-valued function with the dimension of its output equal to the dimension of its input.
To be sure: I'm not looking for equivalent code, but for a single, highly optimized function. Is there such a thing?
For those familiar with Wolfram Mathematica: It would be the equivalent to Product. In Mathematica, I would be able to simply write Product[f[i ConstantArray[1,1000000]],{i,1000000}].
Numpy ufuncs all have a reduce method. np.multiply is a ufunc. So it's a one-liner:
np.multiply.reduce(v)
Where v is the vector of values you compute in what is hopefully an equally efficient manner.
To compute the vector, just apply your function to the input:
v = f(x)
So with your example:
np.multiply.reduce(np.sin(x))
Alternative
A simpler way to phrase the same thing is np.prod:
np.prod(v)
You can also use the prod method directly on your vector:
v.prod()
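A small sketch comparing these reductions on a stack of result vectors, together with a running product that never holds more than one result vector in memory. The function `f` below is a hypothetical stand-in, and the sizes are kept tiny so that the example runs instantly.

```python
import numpy as np

def f(x):
    # Hypothetical stand-in for the real vector-valued function
    return 1.0 + 1.0 / x**2

n_components, n_terms = 5, 4

# Small case: all result vectors fit in memory as rows of a 2-D array
rows = np.stack([f(i * np.ones(n_components)) for i in range(1, n_terms + 1)])
prod_reduce = np.multiply.reduce(rows, axis=0)  # component-wise product over rows
prod_np = np.prod(rows, axis=0)                 # same result via np.prod

# Large case: keep a running product and multiply in place
prod_running = np.ones(n_components)
for i in range(1, n_terms + 1):
    np.multiply(prod_running, f(i * np.ones(n_components)), out=prod_running)
```

The in-place form avoids allocating a new product array per term, although each call to `f` still allocates its own result.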
Problem: I want to numerically integrate a function f(t,N) that may be written as a linear combination of N other known functions g_1(t), ..., g_N(t).
My Solution I: I know the functions g_i and also the coefficients, so my initial idea was to create a row vector of coefficients and a column vector containing the lambda functions g_i, and then use np.dot for the inner product to get the function object I want. Unfortunately, you cannot just add two function objects nor multiply a function object by a scalar.
My Solution II: Of course I can do something like (basically defining point wise what I want):
def f(t,N,a,g):
    """
    a = numpy array of coefficients
    g = numpy array of lambda functions corresponding to functions g_i
    """
    res = 0
    for i in range(N):  # range replaces Python 2's xrange
        res += a[i] * g[i](t)
    return res
But the for loop is of course not very great, especially when:
I need to run this function at many many time steps t
I pass this function f into a numerical integration routine like scipy.integrate.quad.
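Since integration is linear, one option worth checking is to integrate each g_i once and combine the results with np.dot, instead of integrating the combined function. The sketch below uses three stand-in functions and coefficients chosen only for illustration, and compares both routes with `scipy.integrate.quad`.

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical coefficients and basis functions, for illustration only
a = np.array([1.0, 2.0, 3.0])
g = [np.sin, np.cos, np.exp]

def f(t):
    # Pointwise linear combination, as in Solution II
    return sum(ai * gi(t) for ai, gi in zip(a, g))

# Route 1: integrate the combined function directly
direct, _ = quad(f, 0.0, 1.0)

# Route 2: integrate each g_i once, then take the inner product with a
parts = np.array([quad(gi, 0.0, 1.0)[0] for gi in g])
by_linearity = np.dot(a, parts)

# Closed form for this particular choice of g
exact = (1 - np.cos(1.0)) + 2 * np.sin(1.0) + 3 * (np.e - 1)
```

With the second route, the integrals of the g_i can be reused for any other coefficient vector over the same interval.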
briefly:
In Cython you could speed up indexing using memoryviews.
If these equations are linear, you could superimpose them using sympy:
example:
import sympy as sy
x,y = sy.symbols('x y')
g0 = x*0.33 + 6
g1 = x*0.72 + 1.3
g2 = x*11.2 - 6.5
gn = x*3.3 - 7.3
G = [g0,g1,g2,gn]
# this is superimposition
print(sum(G).subs(x,15.1))
print(sum(gi.subs(x,15.1) for gi in G))
'''
output:
228.305000000000
228.305000000000
'''
If it's not what you want, give some example input and output, so that I can try without going in blind...
With little RAM available, you could pass the final equation to numexpr and evaluate it with some input. Otherwise it's best to work on numpy arrays.
The sparse matrix format (dok) assumes that values of keys not in the dictionary are equal to zero. Is there any way to make it use a default value other than zero?
Also, is there a way to calculate the log of a sparse matrix (akin to np.log in regular numpy matrix)
That feature is not built-in, but if you really need this, you should be able to write your own dok_matrix class, or subclass Scipy's one. The Scipy implementation is here. At least in the places where dict.* calls are made, the default value needs to be changed --- and maybe there are some other changes that need to be made.
However, I'd try to reformulate the problem so that this is not needed. If you for instance do linear algebra, you can isolate the constant term, and do instead
from scipy.sparse.linalg import LinearOperator
A = whatever_dok_matrix_minus_constant_term
def my_matvec(x):
    return A*x + constant_term * x.sum()
op = LinearOperator(A.shape, matvec=my_matvec)
To most linear algebra routines (e.g. iterative solvers), you can pass in op instead of A.
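A small runnable version of this idea, with a dense check. The matrix entries and the value of `constant_term` are made up for illustration, and `A @ x` is used for the sparse matrix-vector product.

```python
import numpy as np
from scipy.sparse import dok_matrix
from scipy.sparse.linalg import LinearOperator

# Hypothetical data: the "default value" of every entry is constant_term,
# and A stores only the deviations from it
constant_term = 0.5
A = dok_matrix((3, 3), dtype=np.float64)
A[0, 1] = 2.0
A[2, 0] = -1.0

def my_matvec(x):
    x = np.ravel(x)
    # (A + constant_term * ones) @ x == A @ x + constant_term * sum(x)
    return A @ x + constant_term * x.sum()

op = LinearOperator(A.shape, matvec=my_matvec, dtype=np.float64)

x = np.array([1.0, 2.0, 3.0])
sparse_result = op.matvec(x)

# Dense reference: every entry shifted by the constant
dense_result = (A.toarray() + constant_term) @ x
```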
As to the matrix logarithm: the logarithm of a sparse matrix (as in scipy.linalg.logm) is typically dense, so you should just convert the matrix to a dense one first, and then compute the logarithm as usual. As far as I can see, using a sparse matrix would give no performance gain. If you only need the product of the logarithm and a vector, log(A) * v, some Krylov method might help, though.
If you OTOH want to compute the logarithm elementwise, you can modify the .data attribute directly (available at least in COO, CSR, and CSC)
x = A.tocoo()
x.data = np.log(x.data)
A = x.todok()
This leaves the zero elements alone, but as above, this allows treating the constant part separately.