How to cache the function that is returned by scipy interpolation - python

Trying to speed up a potential flow aerodynamic solver. Instead of calculating the velocity at an arbitrary point with a relatively expensive formula, I tried to precalculate a velocity field so that I could interpolate the values and (hopefully) speed up the code. The result was a slow-down, due (I think) to the scipy.interpolate.RegularGridInterpolator call being run again on every evaluation. How can I cache the function that is the result of this call? Everything I tried gives me hashing errors.
I have a method that implements the interpolator and a second 'factory' method to reduce the argument list so that it can be used in an ODE solver.
x_panels and y_panels are 1D arrays/tuples, vels is a 2D array/tuple, x and y are floats.
def _vol_vel_factory(x_panels, y_panels, vels):
    # Function factory method
    def _vol_vel(x, y, t=0):
        return _volume_velocity(x, y, x_panels, y_panels, vels)
    return _vol_vel

def _volume_velocity(x, y, x_panels, y_panels, vels):
    velfunc = sp_int.RegularGridInterpolator(
        (x_panels, y_panels), vels
    )
    return velfunc(np.array([x, y])).reshape(2)
By passing tuples instead of arrays as inputs I was able to get a bit further, but converting the method output to a tuple did not make a difference; I still got the hashing error.
In any case, caching the result of the _volume_velocity method is not really what I want to do; I really want to somehow cache the result of _vol_vel_factory, whose result is a function. I am not sure if this is even a valid concept.
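For context, a minimal sketch of what this could amount to in practice: building the RegularGridInterpolator once inside the factory, so the closure reuses it on every evaluation instead of reconstructing it. This reuses the names from the question but is an assumption about the intent, not the solver's actual code:

import numpy as np
import scipy.interpolate as sp_int

def _vol_vel_factory(x_panels, y_panels, vels):
    # Build the interpolator once per velocity field; the closure reuses it,
    # so the ODE solver's repeated calls no longer pay the construction cost.
    velfunc = sp_int.RegularGridInterpolator((x_panels, y_panels), vels)
    def _vol_vel(x, y, t=0):
        return velfunc(np.array([x, y])).reshape(2)
    return _vol_vel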

scipy.interpolate.RegularGridInterpolator returns NumPy arrays when evaluated, and NumPy arrays are not cacheable because they do not implement __hash__.
You can, however, store another (hashable) representation of the array, cache that, and convert it back to a NumPy array when needed. For details on how to do that, see the following:
How to hash a large object (dataset) in Python?
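A minimal sketch of that idea, keyed on byte representations of the arrays so functools.lru_cache can hash the arguments. The helper name and the float64 assumption are mine, not from the linked post:

import functools
import numpy as np
import scipy.interpolate as sp_int

@functools.lru_cache(maxsize=None)
def _vol_vel_factory_cached(x_bytes, y_bytes, vels_bytes, vels_shape):
    # Rebuild the arrays from their (hashable) byte representations, assuming float64 data.
    x_panels = np.frombuffer(x_bytes)
    y_panels = np.frombuffer(y_bytes)
    vels = np.frombuffer(vels_bytes).reshape(vels_shape)
    velfunc = sp_int.RegularGridInterpolator((x_panels, y_panels), vels)
    def _vol_vel(x, y, t=0):
        return velfunc(np.array([x, y])).reshape(2)
    return _vol_vel

# Usage: convert the arrays once at the call site so the arguments are hashable.
# vol_vel = _vol_vel_factory_cached(x_panels.tobytes(), y_panels.tobytes(),
#                                   vels.tobytes(), vels.shape)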

Related

Bound Scipy optimisation of a function returning 1D data

This is more of a question about what an appropriate approach to my problem would be.
I have a function that takes a 1D vector as input and returns a 1D array (in actuality it's a 2D array that has been flattened). I am looking to do least-squares optimisation of this function. I already have my bounds and constraints on x sorted, and had thought about doing something like this:
result = optimize.minimize(func,x0,method='SLSQP',bounds=my_bounds,constraints=dict_of_constraints,args=(my_args,))
However, this approach uses _minimize_slsqp, which requires that the objective function return a scalar. Is there an approach that would work similarly to the above, but on an objective function that returns 1D (or 2D?) data?
Cheers
You need to form a scalar function (a function that returns a single scalar value). Likely something like
||F(x)||
where ||.|| is a norm. This new scalar function can be passed on to optimize.minimize.
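For example, a minimal sketch wrapping a placeholder vector-valued F(x) in a sum-of-squares objective (F here is illustrative, not your actual function):

import numpy as np
from scipy import optimize

def F(x):
    # Placeholder vector-valued function; replace with the real residuals.
    return np.array([x[0] - 1.0, x[1] + 2.0, x[0] * x[1]])

def scalar_objective(x):
    r = F(x)
    return np.dot(r, r)  # squared 2-norm ||F(x)||^2

x0 = np.zeros(2)
# bounds=my_bounds, constraints=dict_of_constraints, args=(my_args,) can be
# passed here exactly as in the question.
result = optimize.minimize(scalar_objective, x0, method='SLSQP')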

Vector to matrix function in NumPy without accessing elements of vector

I would like to create a NumPy function that computes the Jacobian of a function at a certain point - with the Jacobian hard coded into the function.
Say I have a vector containing two arbitrary scalars X = np.array([[x],[y]]), and a function f(X) = np.array([[2xy],[3xy]]).
This function has Jacobian J = np.array([[2y, 2x],[3y, 3x]])
How can I write a function that takes in the array X and returns the Jacobian? Of course, I could do this using array indices (e.g. x = X[0,0]), but am wondering if there is a way to do this directly without accessing the individual elements of X.
I am looking for something that works like this:
def foo(x, y):
    return np.array([[2*y, 2*x], [3*y, 3*x]])

X = np.array([[3], [7]])
J = foo(X)
Given that this is possible on 1-dimensional arrays, e.g. the following works:
def foo(x):
    return np.array([x, x, x])

X = np.array([1, 2, 3, 4])
J = foo(X)
You want the Jacobian, which is the differential of the function. Is that correct? I'm afraid NumPy is not the right tool for that.
NumPy works with fixed numbers, not with variables. That is, given some numbers, it can calculate the value of a function. The differential is a different function that has a special relationship to the original function but is not the same. You cannot just calculate the differential; you must deduce it from the functional form of the original function using differentiation rules. NumPy cannot do that.
As far as I know you have three options:
use a numeric library to calculate the differential at a specific point. However, you will only get the Jacobian at a specific point (x, y), not a formula for it.
take a look at a Python CAS library such as SymPy. There you can define expressions in terms of variables and compute the differential with respect to those variables (see the sketch after this list).
use a library that performs automatic differentiation. Machine learning toolkits like PyTorch or TensorFlow have excellent support for automatic differentiation and good integration with NumPy arrays. They essentially calculate the differential by knowing the differential of every basic operation, such as multiplication or addition. For composed functions, the chain rule is applied, so the differential can be calculated for arbitrarily complex functions.
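As an illustration of the second option, a minimal SymPy sketch for the f in the question (the variable names are mine):

import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Matrix([2*x*y, 3*x*y])

# Symbolic Jacobian: Matrix([[2*y, 2*x], [3*y, 3*x]])
J = f.jacobian([x, y])

# lambdify turns the symbolic matrix into a NumPy-backed numeric function.
J_func = sp.lambdify((x, y), J, 'numpy')

X = np.array([[3], [7]])
print(J_func(*X.ravel()))  # [[14  6], [21  9]]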

Store values of integration from array, then use that new array

I'm new to Python so I'm really struggling with this. I want to define a function, apply a certain calculation to it for an array of different values, store those newly calculated values in a new array, and then use those new values in another calculation. My attempt is this:
import numpy as np
from scipy.integrate import quad

radii = np.arange(10)  # array of radius values

def rho(r):
    return (r**2)

for i in range(len(radii)):
    def M[r]:  # new array by integrating over values from 0 to radii
        scipy.integrate.quad(rho(r), 0, radii[i])
    def P(r):
        return (5*M[r])  # make new array using values from M[r] calculated above
Alright, this script is a bit of a mess, so let's unpack it. I've never used scipy.integrate.quad, but I looked it up and, along with testing it, determined that those are valid arguments for quad. There are more efficient ways to do this, but in the interests of preservation, I'll try to keep the overall structure of your script, just fixing the bugs and errors. So, as I understand it, you want to write this:
import numpy as np
from scipy.integrate import quad

# Here's where we start to make changes. First, we're going to define the function,
# taking in two parameters: r and the array radii.
# We don't need to specify data types, because Python is dynamically typed.
# It is good practice to define your functions before the start of the program.
def M(r, radii):
    # Preallocate the output array so there is something to assign into.
    output = np.zeros(len(radii))
    # The loop goes _inside_ the function, otherwise we're just defining the function
    # M(r) over and over again to a slightly different thing!
    for i in range(len(radii)):
        # Also note: since we imported quad from scipy.integrate, we only need to reference
        # quad, and in fact referencing scipy.integrate.quad just causes an error!
        # quad returns a (value, error-estimate) pair, so keep element [0].
        output[i] = quad(r, 0, radii[i])[0]
        # We can also multiply by 5 in this function, so we really only need one.
        # Hell, we don't actually _need_ a function at all, unless you're planning
        # to reference it multiple times in other parts of a larger program.
        output[i] *= 5
    return output

# You have a choice between doing the maths _inside_ the main function or maybe in a
# lambda function like this, which is a bit more pythonic than a 1-line normal function.
# Use like so:
rho = lambda r: r**2

# Beginning of program (this is my example of what calling the function with a list
# called radii might look like)
radii = np.arange(10)
new_array = M(rho, radii)
If this solution is correct, please mark it as accepted.
I hope this helps!
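If you don't need the intermediate function at all, a more compact variant of the same computation (one of the "more efficient ways" alluded to above) could look like the sketch below. Note that quad expects the callable rho itself, not rho(r), and returns a (value, error-estimate) pair:

import numpy as np
from scipy.integrate import quad

rho = lambda r: r**2
radii = np.arange(10)

# Integrate rho from 0 to each radius, keep the integral value ([0]), and scale by 5.
P = np.array([5 * quad(rho, 0, R)[0] for R in radii])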

An issue with parallelising function broadcasting over a mesh using dask

I am looking to parallelise a function which takes multiple 1-dimensional ranges (of the form np.linspace(x, y, t)) of numerical input values (the number is variable, but let's say it takes five), creates a mesh out of these ranges, and then evaluates some (5-dimensional) cost function over this mesh. In its current form it looks something like this:
def func_5d(a, b, c, d, e):
    return a + b + c + d + e

def range_search(a_range, b_range, c_range, d_range, e_range):
    mesh = itertools.product(a_range, b_range, c_range, d_range, e_range)
    func_eval = map(lambda x: (func_5d(np.array(x)), x), mesh)
    return func_eval
So, here I would be looking to parallelise the function range_search using dask. Ideally, this would be done by creating a dask mesh, which could then be chunked and mapped through to our cost function using either multi-threading or multi-core processing. Looking through the dask documentation, it does not appear that dask.array contains any suitable mechanism to achieve this. There is a dask.array.meshgrid function, extended from the numpy library, but this does not support chunking. Additionally, dask.array does not seem to contain a parallelised map function. However, there is one in dask.bag, but the documentation seems to suggest that dask.bag is intended only as a module for preliminary processing of raw data (in formats such as CSV, JSON, etc.). dask.bag objects do also have a method called product() which seems to imitate itertools.product; however, this only takes one other dask.bag object as an argument, so meshing 5 arrays requires this method call to be stacked (4 times), which aside from being hideously ugly is also inefficient when the number of inputs is variable.
From here, I don't really know where to go. I have worked through the Jupyter notebooks that Dask has put together, but they do not seem to hold an answer to my question. Any suggestions on the best approach to parallelising functions of the above form would be much appreciated.
I would use Numpy Slicing for this
a[:, None, None] + b[None, :, None] + c[None, None, :]
You will want to make sure that your input vectors are chunked finely enough that the products of them will still fit comfortably in memory.
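A minimal sketch of what that can look like with dask.array; the range lengths, chunk sizes, and the final reduction are illustrative choices, not part of the original question:

import numpy as np
import dask.array as da

# 1-D parameter ranges, chunked so each broadcast block stays small.
a = da.from_array(np.linspace(0.0, 1.0, 1000), chunks=100)
b = da.from_array(np.linspace(0.0, 1.0, 1000), chunks=100)
c = da.from_array(np.linspace(0.0, 1.0, 1000), chunks=100)

# Broadcasting builds the full 3-D mesh lazily; each chunk is evaluated in parallel.
cost = a[:, None, None] + b[None, :, None] + c[None, None, :]

# Reduce before calling compute() so the full mesh never has to be materialised at once.
best = cost.min().compute()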

scipy 'Minimize the sum of squares of a set of equations'

I face a problem with the scipy 'leastsq' optimisation routine. If I execute the following program it says
raise errors[info][1], errors[info][0]
TypeError: Improper input parameters.
and sometimes index out of range for an array...
from scipy import *
import numpy
from scipy import optimize
from numpy import asarray
from math import *

def func(apar):
    apar = numpy.asarray(apar)
    x = apar[0]
    y = apar[1]
    eqn = abs(x-y)
    return eqn

Init = numpy.asarray([20.0, 10.0])

x = optimize.leastsq(func, Init, full_output=0, col_deriv=0, factor=100, diag=None, warning=True)
print 'optimized parameters: ', x
print '******* The End ******'
I don't know what the problem is with my func / optimize.leastsq() call; please help me.
leastsq works with vectors, so the residual function func needs to return a vector with at least as many entries as there are parameters (two, here). So if you replace return eqn with return [eqn, 0.], your example will work. Running it gives:
optimized parameters: (array([10., 10.]), 2)
which is one of the many correct answers for the minimum of the absolute difference.
If you want to minimize a scalar function, fmin is the way to go, optimize.fmin(func, Init).
The issue here is that these two functions, although they look the same for scalars, are aimed at different goals. leastsq finds the least squared error, generally from a set of idealised curves, and is just one way of doing a "best fit". On the other hand, fmin finds the minimum value of a scalar function.
Obviously yours is a toy example, for which neither of these really makes sense, so which way you go will depend on what your final goal is.
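Putting the two suggestions side by side, a minimal sketch based on the code in the question:

import numpy as np
from scipy import optimize

def func(apar):
    x, y = np.asarray(apar)
    return abs(x - y)

Init = np.asarray([20.0, 10.0])

# leastsq needs at least as many residuals as parameters, so pad the residual vector.
params, status = optimize.leastsq(lambda p: [func(p), 0.0], Init)

# fmin minimises the scalar objective directly.
best = optimize.fmin(func, Init)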
Since you want to minimize a simple scalar function (func() returns a single value, not a list of values), scipy.optimize.leastsq() should be replaced by a call to one of the fmin functions (with the appropriate arguments):
x = optimize.fmin(func, Init)
works correctly!
In fact, leastsq() minimizes the sum of squares of a list of values. It does not appear to work on a (list containing a) single value, as in your example (even though in theory it could).
Just looking at the least squares docs, it might be that your function func is defined incorrectly. You're assuming that you always receive an array of at least length 2, but the optimize function is insanely vague about the length of the array you will receive. You might try printing whatever apar is, to see what you're actually getting.
If you're using something like ipython or the python shell, you ought to be getting stack traces that show you exactly which line the error is occurring on, so start there. If you can't figure it out from there, posting the stack trace would probably help us.
