I'm working on a panel method code at the moment. To keep us from being bogged down in the minutiae, I won't show the code - this is a question about overall program structure.
Currently, I solve my system by:
1. Generating the corresponding rows of the A matrix and b vector in an explicit component for each boundary condition.
2. Assembling the partial outputs into the full A and b.
3. Solving the linear system, Ax = b, using a LinearSystemComp.
Here's a (crude) diagram:
I would prefer to be able to do this by just writing one implicit component to represent each boundary condition, vectorising the inputs/outputs to represent multiple rows/cols in the matrix, then allowing OpenMDAO to solve for x while driving the residual for each boundary condition to 0.
I've run into trouble trying to make this work, as each implicit component is underdetermined (more rows in the output vector x than component residuals; that is, A1.x - b1 = R1 with length(R1) < length(x)). Essentially, I would like OpenMDAO to take each of these underdetermined implicit systems and find the value of x that solves the determined full system - without needing to do all of the assembling stuff myself.
Something like this:
To try and make my goal clearer, I'll explain what I actually want from the perspective of my panel method. I'd like a component, let's say Influence, that computes the potential induced by a given panel at a given point in the panel's reference frame. I'd like to vectorise the input panels and points such that it can compute the influence coefficient of many panels on one point, of many points on one panel, or of many points on many panels.
I'd then like a system of implicit boundary conditions to find the correct value of mu to solve the system. These boundary conditions, again, should be able to be vectorised to compute the violation of the boundary condition at many points under the influence of many panels.
I get confused again at this part. Not every boundary condition will use the influence coefficient values - some, like the Kutta condition, are enforced directly on a few entries of the mu vector.
How would I implement this as an implicit component? It has no inputs, and doesn't output the full mu vector.
I appreciate that the question is rather long and rambling, but I'm pretty confused. To summarise:
How can I use OpenMDAO to solve multiple individually underdetermined (but combined, fully determined) implicit systems?
How can I use OpenMDAO to write an implicit component that takes no inputs and only uses a portion of the overall solution vector?
In the OpenMDAO docs there is a close analog to what you are trying to accomplish: the node-voltage analysis tutorial. In that code, a BalanceComp is used to create an implicit relationship that is similar to what you're describing. It's singular on its own, but as part of a larger group it forms a well-defined system.
You'll need to find a way to build similar components for your model. Each "row" in your equation will be associated with one state variable (one entry in your x vector).
In the simplest case, each row (or set of rows) would have one input which is the associated row of the A matrix, a second input which is ALL of the other values of x, and a final input which is the entry of the b vector (the right-hand-side vector). Then you could evaluate the residual for that specific row, which would be the following:
R['x_i'] = np.sum(A*x_full) - b
where x_full is the assembly of the full x-vector from the x_other input and the x_i state variable.
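Here is a very rough sketch of what one such row component might look like; the input names (A_row, x_other, b) and the index bookkeeping are my own assumptions, not a prescription:

import numpy as np
import openmdao.api as om

class RowResidual(om.ImplicitComponent):
    # one row of A.x = b, with a single state x_i owned by this component

    def initialize(self):
        self.options.declare('size', types=int)   # length of the full x vector
        self.options.declare('index', types=int)  # which entry of x this row owns

    def setup(self):
        n = self.options['size']
        self.add_input('A_row', shape=n)        # this row of the A matrix
        self.add_input('x_other', shape=n - 1)  # every other entry of x
        self.add_input('b', shape=1)            # this row's right-hand-side entry
        self.add_output('x_i', shape=1)         # this row's state
        self.declare_partials('*', '*', method='fd')  # analytic partials would be better

    def apply_nonlinear(self, inputs, outputs, residuals):
        i = self.options['index']
        # assemble the full x vector with x_i slotted into its own position
        x_full = np.insert(inputs['x_other'], i, outputs['x_i'])
        residuals['x_i'] = np.dot(inputs['A_row'], x_full) - inputs['b']

You would then connect each row's state into the other rows' x_other inputs (that is the index bookkeeping alluded to below) and put a Newton solver with a direct linear solver on the enclosing group, so that all the rows are converged together.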
#########
Having proposed the above solution, I have to say that I don't think this is a particularly efficient way to build or solve this linear system. It is modular, and might give you some flexibility, but you're jumping through a lot of hoops to avoid doing some index-math, and shoving everything into a matrix.
Granted, the derivatives might be a bit easier in your design, because the matrix assembly is going to get handled "magically" by the connections you have to create between the various row-components. So maybe it's worth the trade... but I would say you might be better off trying a more traditional coding approach and using JAX or some other AD code to make the derivatives easier.
I am currently simulating a structural optimization problem in which the gradients of responses are extracted from Nastran and provided to the SLSQP optimizer in OpenMDAO. The number of constraints changes in subsequent iterations, because the design variables include both shape and sizing variables, so a new mesh is generated every time. A constraint component is defined in OpenMDAO, and it reads the response data exported from Nastran. Now, the issue here is in defining the shape of its output variable "f_const". The shape of this output variable needs to adjust according to the shape of the available response array, since outputs['f_const'] = np.loadtxt("nastran_const.dat"). Here, nastran_const.dat is the file containing response data extracted from Nastran. The shape of this data is not known at the beginning of a design iteration and keeps changing during the subsequent iterations. So, if some shape of f_const is defined at the start, then it does not change later and gives an error because of the mismatch in the shapes.
In the OpenMDAO docs I found https://openmdao.org/newdocs/versions/latest/features/experimental/dyn_shapes.html?highlight=varying%20shape
It explains that the shape of an input/output variable can be set dynamically by linking it to a connected or local variable whose shape is already known. This is different from my case because the shape of the stress array is not known before the start of the computation. The shape of f_const has to be defined in setup, and I cannot figure out how to change it later. Please guide me in this regard.
You can't have arrays that change shape like that. The "dynamic" shaping you found in the docs refers to setup-time variation. Once setup has finished, though, sizes are fixed. So we need a way for your arrays to be of a fixed size.
If you really must re-mesh every time (which I don't recommend) then there are two possible solutions I can think of:
Over-allocation
Constraint Aggregation
Option 1 -- Over Allocation
This topic is covered in detail in this related question, but briefly what you could do is allocate an array big enough that you always have enough space. Then you can use one entry of the array to record how many active entries are in it. Any non-active entries would be set to a default value that won't violate your constraints.
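A minimal sketch of the padding, assuming a hypothetical MAX_CON size and a "never active" default of -1.0 for a constraint of the form value <= 0:

import numpy as np

MAX_CON = 5000       # assumed: larger than any mesh will ever produce
SAFE_VALUE = -1.0    # assumed: a value that can never violate the constraint

def padded_constraints(raw):
    f_const = np.full(MAX_CON, SAFE_VALUE)
    f_const[:raw.size] = raw           # active entries straight from NASTRAN
    return f_const, raw.size           # also report how many entries are active

# inside your component's compute():
#     raw = np.loadtxt("nastran_const.dat")
#     outputs['f_const'], outputs['n_active'] = padded_constraints(raw)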
You'll have to be very careful with the way you define the derivatives. For active array entries, the derivatives come from NASTRAN. For inactive ones, you could set them to 0, but note that you are creating a discrete discontinuity when an entry switches to active. This may very well give the optimizer fits when it's trying to converge and derivatives of active constraints keep flipping between 0 and nonzero values.
I really don't recommend this approach, but if you absolutely must have "variable size" arrays then over-allocation is your best bet.
Option 2 -- Constraint Aggregation
The key idea here is to use an aggregation function to collapse all the stress constraints into a single value. For structural problems this is most often done with a KS function. OpenMDAO has a KSComp in its standard library that you can use.
The key is that this component requires a constant-sized input, so again, over-allocation would be used here. In this case you shouldn't need to track the number of active values in the array, because you are passing all of it to the aggregation function. KS functions behave like smooth max functions, so a bunch of padded 0's shouldn't affect the result.
Your problem still has a discontinuous operation going on with the re-meshing and the noisy constraint array. The KS function should smooth some of that, but not all of it. I still think you'll have trouble converging, but it should work better than raw over-allocation.
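A rough sketch of wiring the padded array into the aggregation (names, shapes, and the MAX_CON size are assumptions; check the KSComp docs for the exact options and I/O names):

import numpy as np
import openmdao.api as om

MAX_CON = 5000   # assumed fixed, over-allocated constraint size

prob = om.Problem()
prob.model.add_subsystem('ks', om.KSComp(width=MAX_CON),
                         promotes_inputs=[('g', 'f_const')])
# in the real problem: prob.model.add_constraint('ks.KS', upper=0.0)

prob.setup()
prob.set_val('f_const', np.full((1, MAX_CON), -1.0))  # padded constraint values
prob.run_model()
print(prob.get_val('ks.KS'))   # single aggregated constraint value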
Option 3 -- The "right" answer
Find a way to fix your grid, so it never changes. I know this is hard if you're using VSP to generate your discretizations and letting NASTRAN re-grid things from there... but it's not impossible at all.
OpenVSP has a set of geometry-query functions that can be used to back-fit fixed meshes into the parametric space of the geometry. If you do that, then you can regenerate the geometry in VSP and use the parametric space to move your fixed grids with it. This is how the pyGeo tool from the University of Michigan MDO Lab does it, and it works very well.
It's a modest amount of work (though a lot less if you use pyGeo directly), but I think it's well worth it. You'll get faster components and a much more stable optimization.
I have been using the SLSQP algorithm to run some MDO problems with ExplicitComponents only. Each component has a runtime of around 10 seconds and 60-100 input variables. Most of the input variables are static input variables that will remain constant during the entire optimization. The static input variables originate from an IndepVarComp. The ExplicitComponents are black boxes, so no information is available on the partials.
I noticed that when the Jacobian is calculated in the compute_totals(), the components are linearized with respect to all their input values. In the compute_approximations() a finite difference is calculated over all the input values, including the static input values. So, my question is: why is a finite difference calculation performed over these static input variables? As the values remain constant, I’m not sure why this information would be useful?
Furthermore, if I understand it correctly, the components are linearized to get the sub-Jacobians, which are then used to calculate the total Jacobian. However, is it possible to directly calculate a finite difference over the entire group instead of linearizing each component? With the runtimes of my components and the number of input variables, it will take a long time to perform the linearization of each component. However, the optimization problem has only 3 design variables. So, if I could perform three finite difference calculations over the entire MDA to calculate the total Jacobian, the total runtime would decrease significantly.
To answer your questions in reverse order:
1) Can you FD over the entire model instead of each individual component? Yes!
You can set up FD over any group in your model, including the top-level group. Then the FD is taken across that group rather than across each component in it.
We call that computing a semi-total derivative: in general you can select a sub-group in your model, in which case the FD is approximating a total derivative across that group, but that total derivative is still effectively a partial derivative for the overall model. Hence, semi-total derivative.
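A minimal sketch of that setup (the component and variable names here are placeholders, and the black box is replaced with a trivial function so the example runs):

import openmdao.api as om

class ExpensiveBlackBox(om.ExplicitComponent):
    def setup(self):
        self.add_input('x', 1.0)
        self.add_output('y', 0.0)
        # note: no partials are declared; the group-level FD covers them

    def compute(self, inputs, outputs):
        outputs['y'] = inputs['x'] ** 2

prob = om.Problem()
model = prob.model
ivc = model.add_subsystem('ivc', om.IndepVarComp(), promotes=['*'])
ivc.add_output('x', 3.0)

grp = model.add_subsystem('mda', om.Group(), promotes=['*'])
grp.add_subsystem('bb', ExpensiveBlackBox(), promotes=['*'])
grp.approx_totals(method='fd')   # one FD pass across the whole 'mda' group

prob.setup()
prob.run_model()
print(prob.compute_totals(of=['y'], wrt=['x']))   # d(y)/d(x) at x=3, approximately 6.0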
2) Why is a finite difference calculation performed over these static input variables?
In theory, you're correct that you don't really need partial derivatives of the inputs that can't change. As of OpenMDAO 2.4, we don't handle that situation automatically though, and we don't have plans to add that in the near future. However, the framework is only taking FD across the partials you tell it to. It sounds like you are declaring your partials like this:
self.declare_partials(of=['*'], wrt=['*'], method='fd')
So you're specifically asking the framework to compute all those partials. Instead, you could specify in the wrt argument only the inputs you know are actually changing. Of course, this is mathematically incorrect, because there is a derivative wrt the static inputs. If someone later connects something to those inputs and tries an optimization, they would get a wrong answer. But as long as you're careful, you can specifically ask for only the partials you want from any component and simply leave the non-changing inputs as effectively 0.
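For example, a sketch of a component that only declares FD partials with respect to the input that actually changes (the variable names here are placeholders):

import openmdao.api as om

class BlackBox(om.ExplicitComponent):
    def setup(self):
        self.add_input('dynamic_in', shape=3)    # actually varies during the optimization
        self.add_input('static_in', shape=50)    # never changes
        self.add_output('out', shape=3)
        # FD only wrt the changing input; the static one is simply left out
        self.declare_partials(of='out', wrt='dynamic_in', method='fd')

    def compute(self, inputs, outputs):
        outputs['out'] = 2.0 * inputs['dynamic_in'] + inputs['static_in'].sum()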
I am modeling electrical current through various structures with the help of FiPy. To do so, I solve Laplace's equation for the electrical potential. Then, I use Ohm's law to derive the field and with the help of the conductivity, I obtain the current density.
FiPy stores the potential as a cell-centered variable and its gradient as a face-centered variable which makes sense to me. I have two questions concerning face-centered variables:
If I have a two- or three-dimensional problem, FiPy computes the gradient in all directions (ddx, ddy, ddz). The gradient is a FaceVariable which is always defined on the face between two cell centers. For a structured (quadrilateral) grid, only one of the derivatives should be nonzero, since for any face the positions of the two cell centers involved should only differ in one coordinate. In my simulations, however, it occurs frequently that more than one of the derivatives (ddx, ddy, ddz) is nonzero, even for a structured grid.
The manual gives the following explanation for the FaceGrad-Method:
Return gradient(phi) as a rank-1 FaceVariable using differencing for the normal direction (second-order gradient).
I do not see, how this differs from my understanding pointed out above.
What makes it even more problematic: whenever "too many" derivatives are included, current does not seem to be conserved, even in the simplest structures I model...
Is there a clever way to access the data stored in the face-centered variable? Let's assume I would want to compute the electrical current going through my modeled structure.
As of right now, I save the data stored in the FaceVariable as a tsv-file. This yields a table with (x,y,z)-positions and (ddx, ddy, ddz)-values. I read the file and save the data into arrays to use it in Python. This seems counter-intuitive and really inconvenient. It would be a lot better to be able to access the FaceVariable along certain planes or at certain points.
The documentation does not make it clear, but .faceGrad includes tangential components which account for more than just the neighboring cell center values.
Please see this Jupyter notebook for explicit expressions for the different types of gradients that FiPy can calculate (yes, this stuff should go into the documentation: #560).
The value is accessible with myFaceVar.value and the coordinates with myFaceVar.mesh.faceCenters. FiPy is designed around unstructured meshes and so taking arbitrary slices is not trivial. CellVariable objects support interpolation by calling myCellVar((xs, ys, zs)), but FaceVariable objects do not. See this discussion.
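A small sketch of pulling the data straight out of the FiPy objects instead of going through a tsv file (the Grid2D mesh and the facesRight mask are just illustrative choices):

import fipy as fp

mesh = fp.Grid2D(nx=10, ny=10, dx=1.0, dy=1.0)
phi = fp.CellVariable(mesh=mesh, value=0.0)
phi.setValue(mesh.cellCenters.value[0])      # phi = x, so the gradient should be ~(1, 0)

grad = phi.faceGrad                          # rank-1 FaceVariable
values = grad.value                          # (2, nFaces) array: ddx and ddy rows
coords = grad.mesh.faceCenters.value         # (2, nFaces) array of face positions

right = mesh.facesRight.value                # boolean mask selecting one plane of faces
print(values[..., right])                    # gradient components on just those faces

# cell-centred variables can also be interpolated at arbitrary points:
print(phi(((5.0,), (5.0,))))                 # value of phi at (x, y) = (5, 5)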
After using scipy.integrate for a while, I am at the point where I need more functionality, like bifurcation analysis or parameter estimation. This is why I'm interested in using PyDSTool, but from the documentation I can't figure out how to work with ModelSpec, or whether it is actually what will lead me to the solution.
Here is a toy example of what I am trying to do: I have a network with two nodes, both having the same (SIR) dynamics, described by two ODEs, but different initial conditions. The equations are coupled between nodes via the epsilons (see formula below).
The formulas are linked as a picture for better readability; the 'n' and 'm' are indices, not exponents:
http://image.noelshack.com/fichiers/2014/28/1404918182-odes.png
(could not use the upload on Stack, sadly)
In the two-node case my code (using PyDSTool) looks like this:
# multiple SIR metapopulations
# parameter and initial condition definition; a dict is a must
import PyDSTool as pdt

params = {'alpha': 0.7, 'beta': 0.1, 'epsilon1': 0.5, 'epsilon2': 0.5}
ini = {'s1': 0.99, 's2': 1, 'i1': 0.01, 'i2': 0.00}

DSargs = pdt.args(name='SIRtest_multi',
                  ics=ini,
                  pars=params,
                  tdata=[0, 20],
                  # the for-macro generates formulas for s1,s2 and i1,i2;
                  # sum works similarly but sums over the expressions in it
                  varspecs={'s[o]': 'for(o,1,2,-alpha*s[o]*sum(k,1,2,epsilon[k]*i[k]))',
                            'i[l]': 'for(l,1,2,alpha*s[l]*sum(m,1,2,epsilon[m]*i[m]))'})

# generator
DS = pdt.Generator.Vode_ODEsystem(DSargs)

# computation; a trajectory object is generated
trj = DS.compute('test')

# extraction of the points for plotting
pts = trj.sample()

# plotting; pylab is imported along with PyDSTool as plt
pdt.plt.plot(pts['t'], pts['s1'], label='s1')
pdt.plt.plot(pts['t'], pts['i1'], label='i1')
pdt.plt.plot(pts['t'], pts['s2'], label='s2')
pdt.plt.plot(pts['t'], pts['i2'], label='i2')
pdt.plt.legend()
pdt.plt.xlabel('t')
pdt.plt.show()
But in my original problem there are more than 1000 nodes with 5 ODEs each, every node is coupled to a different number of other nodes, and the epsilon values are not equal for all the nodes. So tinkering with this syntax has not led me anywhere near the solution yet.
What I am actually thinking of is a way to construct separate sub-models/solvers(?) for every node, each having its own parameters (epsilons, since they are different for every node), and then link them to each other. This is the point where I do not know whether it is possible in PyDSTool, and whether it is the way to handle this kind of problem.
I looked through the examples and the docs of PyDSTool but could not figure out how to do it, so help is very much appreciated! If the way I'm trying to do things is unorthodox or plain stupid, you are welcome to make suggestions on how to do it more efficiently. (Which is actually the more efficient/faster/better way to solve problems like this: subdivide it into many small (still not decoupled) models/solvers, or one containing all the ODEs at once?)
(I'm neither a mathematician nor a programmer, but willing to learn, so please be patient!)
The solution is definitely not to build separate simulation models. That won't work because so many variables will be continuously coupled between the sub-models. You absolutely must have all the ODEs in one place together.
It sounds like the solution you need is to use the ModelSpec object constructs. These let you hierarchically build the sub-model definitions out of symbolic pieces. They can have their own "epsilon" parameters, etc. Once you've declared all the pieces, you let PyDSTool make the final strings containing the ODE definitions for you. I suggest you look at the tutorial example at:
http://www.ni.gsu.edu/~rclewley/PyDSTool/Tutorial/Tutorial_compneuro.html
and the provided examples: ModelSpec_test.py, MultiCompartments.py. But, remember that you still have to have a source for the parameters and coupling data (i.e., a big matrix or dictionary loaded from a file) to be able to automate the process of building the model, otherwise you'd still be writing it all out by hand.
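Even before getting into ModelSpec, here is a rough sketch of that kind of automation: building the varspec strings for N nodes directly from a coupling table and feeding them to the same Vode_ODEsystem generator used above. The 'eps' and 'coupling' structures are hypothetical stand-ins for data you would load from your own files:

import PyDSTool as pdt

N = 4                                              # 1000+ in the real problem
eps = {n: 0.5 for n in range(1, N + 1)}            # per-node epsilon values
coupling = {n: list(range(1, N + 1))
            for n in range(1, N + 1)}              # nodes influencing node n (here: all of them)

params = {'alpha': 0.7, 'beta': 0.1}
params.update({'epsilon%d' % n: eps[n] for n in eps})

ini, varspecs = {}, {}
for n in range(1, N + 1):
    infl = '+'.join('epsilon%d*i%d' % (m, m) for m in coupling[n])
    varspecs['s%d' % n] = '-alpha*s%d*(%s)' % (n, infl)
    varspecs['i%d' % n] = 'alpha*s%d*(%s)' % (n, infl)
    ini['s%d' % n], ini['i%d' % n] = 1.0, 0.0
ini['s1'], ini['i1'] = 0.99, 0.01                  # seed the infection in node 1

DSargs = pdt.args(name='SIR_network', ics=ini, pars=params,
                  tdata=[0, 20], varspecs=varspecs)
DS = pdt.Generator.Vode_ODEsystem(DSargs)

The ModelSpec route described below achieves the same thing in a more structured, reusable way.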
You have to build some classes for the components that you want to have. You might also create a factory function (compare 'makeSoma' in the neuralcomp.py toolbox) that will take all your sub-components and create an ODE based on summing something up from each of the declared components. At the end, you can refer to the parameters by their position in the hierarchy. One might be 's1.epsilon' while another might be 'i4.epsilon'.
Unfortunately, to build models like this efficiently you will have to learn to do some more complex programming! So start by understanding all the steps in the tutorial. You can contact me directly through the SourceForge support discussions, or email me once you've got started and have specific questions.
I am building a script that generates input data [parameters] for another program to calculate. I would like to optimize the resulting data. Previously I have been using the Powell optimization from scipy. The pseudo-code looks something like this.
def value(param):
    run_program(param)
    result = parse_output()   # pseudo-code: read the objective from the program's output
    return result

scipy.optimize.fmin_powell(value, param)
This works great; however, it is incredibly slow as each iteration of the program can take days to run. What I would like to do is coarse grain parallelize this. So instead of running a single iteration at a time it would run (number of parameters)*2 at a time. For example:
# Initial guess
param = [1, 2, 3, 4, 5]
# Modify guess by plus/minus another matrix that is changeable at each iteration
jump = [1, 1, 1, 1, 1]

# Modify each variable plus/minus jump
for num, a in enumerate(param):
    new_param1 = param[:]
    new_param1[num] = new_param1[num] + jump[num]
    run_program(new_param1)

    new_param2 = param[:]
    new_param2[num] = new_param2[num] - jump[num]
    run_program(new_param2)

# Wait until all programs are complete -> parse output
# Output = [[value, param], ...]
# Create new guess
# Repeat
The number of variables can range from 3 to 12, so something like this could potentially speed up the code from taking a year down to a week. All variables are dependent on each other and I am only looking for local minima from the initial guess. I have started an implementation using Hessian matrices; however, that is quite involved. Is there anything out there that already does this, is there a simpler way, or does anyone have suggestions to get started?
So the primary question is the following:
Is there an algorithm that takes a starting guess, generates multiple guesses, then uses those multiple guesses to create a new guess, and repeats until a threshold is found? No analytic derivatives are available. What is a good way of going about this, is there something already built that does this, and what other options are there?
Thank you for your time.
As a small update I do have this working by calculating simple parabolas through the three points of each dimension and then using the minima as the next guess. This seems to work decently, but is not optimal. I am still looking for additional options.
My current best implementation is parallelizing the inner loop of Powell's method.
Thank you everyone for your comments. Unfortunately it looks like there is simply no concise answer to this particular problem. If I get around to implementing something that does this I will paste it here; however, as the project is not particularly important and the results are not pressingly needed, I will likely be content letting it take up a node for a while.
I had the same problem while I was at university: we had a Fortran algorithm to calculate the efficiency of an engine based on a group of variables. At the time we used modeFRONTIER and, if I recall correctly, none of the algorithms were able to generate multiple guesses.
The normal approach would be to have a DOE, and there were some algorithms to generate the DOE to best fit your problem. After that we would run the single DOE entries in parallel, and an algorithm would "watch" the development of the optimizations, showing the current best design.
Side note: if you don't have a cluster and need more computing power, HTCondor may help you.
Are derivatives of your goal function available? If yes, you can use gradient descent (old, slow but reliable) or conjugate gradient. If not, you can approximate the derivatives using finite differences and still use these methods. I think in general, if using finite difference approximations to the derivatives, you are much better off using conjugate gradients rather than Newton's method.
A more modern method is SPSA, which is a stochastic method and doesn't require derivatives. For somewhat well-behaved problems, SPSA requires far fewer evaluations of the goal function than a finite-difference approximation to conjugate gradients for the same rate of convergence.
There are two ways of estimating gradients, one easily parallelizable, one not:
- around a single point, e.g. (f(x + h*direction_i) - f(x)) / h; this is easily parallelizable up to N_dim
- "walking" gradient: walk from x_0 in direction e_0 to x_1, then from x_1 in direction e_1 to x_2 ...; this is sequential.
Minimizers that use gradients are highly developed, powerful, and converge quadratically (on smooth enough functions). The user-supplied gradient function can of course be a parallel-gradient-estimator.
A few minimizers use "walking" gradients, among them Powell's method; see Numerical Recipes p. 509. So I'm confused: how do you parallelize its inner loop?
I'd suggest scipy fmin_tnc with a parallel-gradient-estimator, maybe using central, not one-sided, differences.
(Fwiw, this compares some of the scipy no-derivative optimizers on two 10-d functions; ymmv.)
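A sketch of such a parallel central-difference gradient fed to fmin_tnc; value() here is a trivial stand-in for the expensive external program, and the step size h is an arbitrary assumption:

import numpy as np
from multiprocessing import Pool
from scipy.optimize import fmin_tnc

def value(param):
    # stand-in for: write inputs, run the external program, parse its output
    return float(np.sum((np.asarray(param) - 3.0) ** 2))

def parallel_grad(param, pool, h=1e-3):
    param = np.asarray(param, dtype=float)
    points = []
    for i in range(param.size):
        step = np.zeros_like(param)
        step[i] = h
        points.extend([param + step, param - step])
    vals = pool.map(value, points)            # 2*N evaluations run concurrently
    return np.array([(vals[2 * i] - vals[2 * i + 1]) / (2 * h)
                     for i in range(param.size)])

if __name__ == '__main__':
    with Pool() as pool:
        x, nfeval, rc = fmin_tnc(value, [1.0, 2.0, 3.0, 4.0, 5.0],
                                 fprime=lambda p: parallel_grad(p, pool))
    print(x)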
I think what you want to do is use the threading capabilities built into Python.
Provided your working function has more or less the same run-time whatever the params, it would be efficient.
Create 8 threads in a pool, run 8 instances of your function, get 8 results, run your optimisation algo to change the params using those 8 results, repeat... profit?
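A tiny sketch of that loop, written here as a simple pattern search; I've used a process pool (rather than threads) since the work is an external program, and value / propose_candidates are hypothetical stand-ins for the real black box and update rule:

from multiprocessing import Pool

def value(param):
    # stand-in for the expensive external program
    return sum((p - 3.0) ** 2 for p in param)

def propose_candidates(params, jump):
    # one +jump and one -jump candidate per parameter, as in the question's loop
    cands = []
    for i in range(len(params)):
        for sign in (+1.0, -1.0):
            cand = list(params)
            cand[i] += sign * jump[i]
            cands.append(cand)
    return cands

if __name__ == '__main__':
    params, jump = [1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5
    best = value(params)
    with Pool(8) as pool:
        for _ in range(20):                       # fixed number of outer iterations
            cands = propose_candidates(params, jump)
            results = pool.map(value, cands)      # all candidate runs in parallel
            if min(results) < best:
                best = min(results)
                params = cands[results.index(best)]
            else:
                jump = [j / 2.0 for j in jump]    # shrink the step if nothing improved
    print(params, best)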
If I haven't misunderstood what you are asking, you are trying to minimize your function one parameter at a time.
You can do this by creating a set of functions of a single argument, where for each function you freeze all the arguments except one.
Then you loop, optimizing each variable in turn and updating the partial solution.
This method can speed up the minimization of functions of many parameters by a great deal, provided the energy landscape is not too complex (i.e. the dependency between the parameters is not too strong).
Given a function
energy(*args) -> value
you create the guess and the functions:
guess = [1, 1, 1, 1]
funcs = [lambda x, i=i: energy(*(guess[:i] + [x] + guess[i+1:])) for i in range(len(guess))]
Then you put them in a while loop for the optimization:
while not converged:
    for func in funcs:
        # optimize for func (a one-dimensional problem)
        # update the guess
    # check for convergence
This is a very simple yet effective method of simplifying your minimization task. I can't really recall what this method is called (coordinate descent, I believe), but a close look at the Wikipedia entry on minimization should do the trick.
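For completeness, a runnable sketch of that loop using scipy's one-dimensional minimizer; energy() is a hypothetical stand-in for the real objective:

from scipy.optimize import minimize_scalar

def energy(*args):
    # stand-in objective with a minimum at (3, 3, 3, 3)
    return sum((a - 3.0) ** 2 for a in args)

guess = [1.0, 1.0, 1.0, 1.0]
for sweep in range(50):
    old = list(guess)
    for i in range(len(guess)):
        f_i = lambda x, i=i: energy(*(guess[:i] + [x] + guess[i + 1:]))
        guess[i] = minimize_scalar(f_i).x       # 1-D optimization of one coordinate
    if max(abs(g - o) for g, o in zip(guess, old)) < 1e-8:
        break
print(guess)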
You could parallelize at two points: 1) parallelize the calculation of a single iteration, or 2) start N initial guesses in parallel.
For 2) you need a job controller to manage the N initial-guess discovery threads.
Please add an extra output to your program: a "lower bound" indicating that descents from the current input parameters won't produce output values below this bound.
The N initial-guess threads can then compete with each other; if any one thread's lower bound is higher than another thread's current value, that thread can be dropped by your job controller.
Parallelizing local optimizers is intrinsically limited: they start from a single initial point and try to work downhill, so later points depend on the values of previous evaluations. Nevertheless there are some avenues where a modest amount of parallelization can be added.
As another answer points out, if you need to evaluate your derivative using a finite-difference method, preferably with an adaptive step size, this may require many function evaluations, but the derivative with respect to each variable may be independent; you could maybe get a speedup by a factor of twice the number of dimensions of your problem. If you've got more processors than you know what to do with, you can use higher-order-accurate gradient formulae that require more (parallel) evaluations.
Some algorithms, at certain stages, use finite differences to estimate the Hessian matrix; this requires about half the square of the number of dimensions of your problem, and all of it can be done in parallel.
Some algorithms may also be able to use more parallelism at a modest algorithmic cost. For example, quasi-Newton methods try to build an approximation of the Hessian matrix, often updating this by evaluating a gradient. They then take a step towards the minimum and evaluate a new gradient to update the Hessian. If you've got enough processors so that evaluating a Hessian is as fast as evaluating the function once, you could probably improve these by evaluating the Hessian at every step.
As far as implementations go, I'm afraid you're somewhat out of luck. There are a number of clever and/or well-tested implementations out there, but they're all, as far as I know, single-threaded. Your best bet is to use an algorithm that requires a gradient and compute your own in parallel. It's not that hard to write an adaptive one that runs in parallel and chooses sensible step sizes for its numerical derivatives.