I use Z3Py to build large formulas (~1500 Bool variables, ~90k assertions) and I am currently using Solver.add to add the assertions, which are mostly small (eg. implications on 2 variables).
My code looks something like this, with about 10 outer for loops in sequence. The loop nesting depth varies from 2 to 6.
s = Solver()
for i in range(A):
    for j in range(B):
        ...
        s.add(Implies(vars[i, j, ...], vars[k, l, ...]))
The problem is that building the solver takes ~11 seconds (with __debug__ == False), while finding a solution only takes about 8 seconds.
Profiling shows that a lot of time is spent in Z3_sort_to_ast, z3code.Elementaries.Check (called by the former), and other methods which seem like they could be inlined at least, if not somehow eliminated.
How does one optimize the creation of the Z3 Solver? Maybe there is a more low-level, internal interface which could speed things up?
I see 3 options:
Use the SMT-LIB interface
Skip the high-level API of Z3
Rewrite the code using the C API directly
If the interaction with Z3 is minimal (solve and get a model), SMT-LIB might be the best option.
If the Python code is quite complex to rewrite in C, please give pySMT a try. The way we integrate with Z3 skips the high-level API and calls the underlying C functions exposed at the Python level directly. There is some overhead from pySMT itself, but it typically pays off. You can have a look at [1] for some ideas on how we do it.
[1] https://github.com/pysmt/pysmt/blob/master/pysmt/solvers/z3.py#L853
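If you go the SMT-LIB route, the whole formula can be emitted as plain text with ordinary string operations, which sidesteps the per-call overhead of the Python bindings entirely. The sketch below is illustrative: the variable-naming scheme (v_i_j) and the problem dimensions are made up, and the resulting text would be handed to Z3 either via the command line (z3 -smt2 problem.smt2) or back inside Python via Solver().from_string(...).

```python
# Sketch: emit the problem as SMT-LIB 2 text instead of building it
# assertion-by-assertion through the Z3Py API.
A, B = 3, 3  # placeholder problem dimensions

lines = ["(set-logic QF_UF)"]
# Declare one Boolean per (i, j) pair.
for i in range(A):
    for j in range(B):
        lines.append(f"(declare-const v_{i}_{j} Bool)")
# Assert the small implications directly as text (pattern is illustrative).
for i in range(A - 1):
    for j in range(B - 1):
        lines.append(f"(assert (=> v_{i}_{j} v_{i + 1}_{j + 1}))")
lines.append("(check-sat)")
lines.append("(get-model)")

smtlib_text = "\n".join(lines)
# Hand smtlib_text to Z3, e.g. write it to problem.smt2 and run
# `z3 -smt2 problem.smt2`, or load it with Solver().from_string(smtlib_text).
print(smtlib_text.splitlines()[0])
```

String concatenation like this is cheap compared to 90k wrapped API calls, at the cost of losing the convenience of the Z3Py expression objects.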
Related
It is often said that compiled languages such as C perform better than interpreted languages such as Python. Therefore, I might be interested in migrating Python implementations to C/C++ (assuming they also have access to some Z3 API that is in use and maintenance).
However, this migration only makes sense in one case: if my performance loss is due to the language and not due to Z3. Therefore, I would like to know if there is any way to know what percentage of execution is being executed by Z3 and what percentage by pure Python.
A very naive possibility would be to use a timer just before and after each call to Z3 in my implementation and add up those times, to finally see how much of the total they represent. A sketch of this idea (pseudo-code):
time_start = take_time()
time_z3 = 0
while executing:
    time_local = take_time()
    call_z3()
    time_z3 += take_time() - time_local
time_total = take_time() - time_start
print(time_z3 / time_total)
This, even though it is an ugly solution, would answer my first question: how long Z3 takes out of the total execution time.
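The same split can be obtained less intrusively with Python's built-in profiler: time spent inside Z3's native code shows up attributed to the z3/z3core wrapper functions in the report, so you can read the Python-vs-Z3 ratio straight off the statistics. The sketch below uses a stand-in workload instead of real Z3 calls, purely to show the mechanics:

```python
import cProfile
import io
import pstats

def build_and_solve():
    # Stand-in for the real code; in practice this would build the
    # Z3 solver and call check().
    return sum(i * i for i in range(50000))

profiler = cProfile.Profile()
profiler.enable()
build_and_solve()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()

# With real Z3 code, entries mentioning 'z3' or 'z3core' in this report
# are the time spent crossing into (or inside) the C library.
print("build_and_solve" in report)
```

Summing the cumulative times of the z3-module entries and dividing by the total gives the percentage the question asks for, without touching the measured code.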
However, I would want even more information, if possible: not only how long Z3's computations take, but also whether going through Python forces Z3 to perform large data transformations before the information arrives "pure" (i.e., as if I had written it directly for Z3). If so, Z3's measured time would be considerably longer than it needs to be. In other words: I would like to know how much time Z3 spends purely on the logical work of searching for models, excluding transformations and other processes.
Specifically: I want to know whether other languages such as C++ perform these transformations more cheaply, and therefore whether the Z3 API of some other language is more recommended/effective/optimized than Python's.
I know it's abstract, but I hope the question was understood, and if not, we can discuss it in the comments.
Scenario
In the world of mathematical optimization, the need arises to model inequality constraints g_k(...). g_k(...) can sometimes be a call to an external program that is, for all intents and purposes, a black box. Simply finding satisfying assignments can be beneficial for certain kinds of engineering analysis.
Example
An example application of the above scenario for Reals (though it could also be Ints or Booleans):
min f(x,y)
g1(x,y) <= 25
g2(x,y) >= 7.7
x, y ∈ Real
x >= 0
x <= 50.0
y >= 0
y <= 5.0
g1 and g2 are Python functions that call an external program. The functions return a real number. Following this Z3 format to find a model that simply satisfies the constraints would be represented as:
from z3 import *
from ExternalCodes import Code1, Code2  # For example, these could be Python wrappers to C++ codes

def g_1(x, y):
    return Code1(x, y)  # isinstance(Code1(x, y), float) == True

def g_2(x, y):
    return Code2(x, y)  # isinstance(Code2(x, y), float) == True

s = Solver()
x, y = Reals('x y')
s.add(g_1(x, y) <= 25.0)
s.add(g_2(x, y) >= 7.7)
s.add(x >= 0)
s.add(x <= 50.0)
s.add(y >= 0)
s.add(y <= 5.0)
if s.check() == sat:
    m = s.model()
    print(m)
Questions
I understand that the types returned by Code1 and Code2 need to be Z3 datatypes. How do I convert Python types to Z3 types, as mentioned in the 1st comment?
How is Z3 used to find a sat model when constraints may need to be evaluated in external code, i.e. declared functions? I understand it may be inefficient, I would lose heuristics, etc., because it is undecidable, but for certain engineering applications, enumerating a sat solution, if one exists, is more advantageous than employing an optimizer from the get-go.
Relevant answers
z3python: using math library
-Not quite the same application. I'm curious whether, 4 years later, that answer is still the case, but this is not my question.
Can Z3 call an externally defined function?
-In essence, the same question. There is no Z3Py answer, and unfortunately the Rise4fun link is broken. I also cannot find the mentioned F#/.NET example in the GitHub repo.
You're looking for uninterpreted functions.
Search for the term "uninterpreted functions" in http://ericpony.github.io/z3py-tutorial/advanced-examples.htm
Your question seems to make some assumptions about how SMT solvers can be used; which don't quite reflect the current state-of-the-art. A great resource to read about the use of SMT solvers is: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/nbjorner-smt-application-chapter.pdf It would be well worth your time to go over it to see how it applies in practice.
Some general comments:
Z3py
Z3py is not a system for reasoning about Python programs. Z3py is a collection of libraries so you can script Z3 in a much clearer and easier way. There are many such bindings from many languages: C/C++/Java/Haskell/Scala, you name it. The advantage of Z3py is that it is easier to learn and use. But you shouldn't think of it as a system to reason about Python itself. It's merely a way of scripting Z3 in a lightweight form.
Solver logics
SMT solvers essentially work on decidable fragments of (mostly quantifier-free) many-sorted logics of various theories. You can find these in detail at: http://smtlib.cs.uiowa.edu/logics.shtml
SMTLib
Most solvers accept input in the so-called SMT-Lib format, detailed here: http://smtlib.cs.uiowa.edu/papers/smt-lib-reference-v2.6-r2017-07-18.pdf
Note that any "binding" at the C/Python/Java etc. level is merely a programmer's convenience. While many solvers also provide extended features, the SMTLib language is what you should think of. In your particular case, uninterpreted functions are well described in the above document.
Types
SMTLib understands a set of types: Integers, Reals, Bit-vectors (machine integers), Floating-point, etc. It also allows composite types such as algebraic data types, which can even be recursive. (Though solver support varies.) You have to "map" your external function types to these types: Hopefully, there's something close enough. If not, feel free to ask specific questions about types you are interested in.
"Importing" functions
It is impossible to import functions written in other languages (Python/C/C++ etc.) into SMTLib and reason about them. There is no mechanism for doing so, nor will there ever be. This isn't the goal of an SMT solver. If you want to reason about programs written in a particular language, then you should look for tools that are specifically designed to work on those languages. (For instance Dafny for general imperative programs, Klee/CBMC for C programs, LiquidHaskell for Haskell programs, etc.) These tools vary in their capabilities, and in what they allow you to specify and prove. Note that these tools often use SMT solvers underneath to accomplish their tasks, not the other way around.
Sticking to SMTLib
If there are no other tools available (and unfortunately this is likely the case for most languages out there, especially legacy ones), you're essentially stuck with whatever the SMTLib language provides. In your case, the best method for modeling such "external" functions using SMTLib is to use uninterpreted functions, together with axioms. In general, you need axioms to restrict the behavior of the uninterpreted functions themselves, to model your external functions. On the flip side, if the axioms are quantified (which in general they will be), the solver might return unknown.
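As a small illustration of that approach (the function name and the monotonicity axiom are invented for the example), an external black-box function g could be modelled in SMT-LIB as an uninterpreted function constrained by an axiom:

```
; g is uninterpreted: the solver knows only what the axioms say about it.
(declare-fun g (Real) Real)
; Quantified axiom restricting g's behaviour (say, g is monotone);
; because it is quantified, the solver may answer unknown.
(assert (forall ((a Real) (b Real))
          (=> (<= a b) (<= (g a) (g b)))))
; The original constraint, now over the uninterpreted g.
(assert (<= (g 3.0) 25.0))
(check-sat)
```

The solver then searches for an interpretation of g consistent with the axioms, rather than ever calling the external code.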
Summary
While SMT solvers are great tools, you should always keep in mind that the language they operate on is SMTLib, not Python/C or anything else. Those bindings are simply means of accessing the solver as you might incorporate it in a bigger project. I hope that clears the expected use case.
Asking specific questions about what you've tried and what you're trying to achieve (code samples) is the best way to get mileage out of this forum!
I have been using sympy to work with systems of differential equations. I write the equations symbolically, use autowrap to compile them through cython, and then pass the resulting function to the scipy ODE solver. One of the major benefits of doing this is that I can solve for the jacobian symbolically using the sympy jacobian function, compile it, and pass it to the ODE solver as well.
This has been working great for systems of about 30 variables. Recently I tried doing it with 150 variables, and what happened was that I ran out of memory when compiling the jacobian function. This is on Windows with Anaconda and the Microsoft Visual C++ 14 tools for Python. Basically, during compilation of the jacobian, which is now a 22000-element vector, memory usage during the linking step went up to about 7GB (on my 8GB laptop) before finally crashing out.
Does someone have some suggestions before I go and try on a machine with more memory? Are other operating systems or other C compilers likely to improve the situation?
I know lots of people do this type of work, so if there's an answer, it will be beneficial to a good chunk of the community.
Edit: response to some of Jonathan's comments:
Yes, I'm fully aware that this is an N^2 problem. The jacobian is a matrix of all partial derivatives, so it will have size N^2. There is no real way around this scaling. However, a 22000-element array is not nearly at the level that would create a memory problem during runtime -- I only have the problem during compilation.
Basically there are three levels that we can address this at.
1) solve the ODE problem without the jacobian, or somehow split up the jacobian to not have a 150x150 matrix. That would address the very root, but it certainly limits what I can do, and I'm not yet convinced that it's impossible to compile the jacobian function
2) change something about the way sympy automatically generates C code, to split it up into multiple chunks, use more functions for intermediate expressions, to somehow make the .c file smaller. People with more sympy experience might have some ideas on this.
3) change something about the way the C is compiled, so that less memory is needed.
I thought that by posting a separate question more oriented around #3 (literal referencing of large array -- compiler out of memory) , I would get a different audience answering. That is in fact exactly what happened. Perhaps the answer to #3 is "you can't" but that's also useful information.
Following a lot of the examples posted at http://www.sympy.org/scipy-2017-codegen-tutorial/ I was able to get this to compile.
The key things were
1) instead of using autowrap, write the C code directly with more control over it. Among other things, this allows passing the argument list as a vector instead of expanding it. This took some effort to get working (setting up the compiler flags through distutils, etc, etc) but in the end it worked well. Having the repo from the course linked above as an example helped a lot.
2) using common subexpression elimination (sympy.cse) to dramatically reduce the size of the expressions for the jacobian elements.
(1) by itself didn't do that much to help in this case (although I was able to use it to vastly improve performance of smaller models). The code was still 200 MB instead of the original 300 MB. But combining it with (2) (cse) I was able to get it down to a meager 1.7 MB (despite 14000 temporary variables).
The cse takes about 20-30 minutes on my laptop. After that, it compiles quickly.
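The common subexpression elimination step from (2) can be sketched on a toy input (the expressions below are invented stand-ins for jacobian entries; a real jacobian would have thousands of them, but the mechanics are the same):

```python
import sympy as sp

x, y = sp.symbols("x y")
# Toy "jacobian-like" expressions with heavy shared structure.
exprs = [sp.sin(x + y) * sp.cos(x + y),
         sp.sin(x + y) + sp.cos(x + y)]

# cse returns (replacements, reduced): replacements is a list of
# (temp_symbol, subexpression) pairs, and reduced holds the original
# expressions rewritten in terms of those temporaries.
replacements, reduced = sp.cse(exprs)
for sym, sub in replacements:
    print(sym, "=", sub)
print(reduced)
```

Each (temp_symbol, subexpression) pair becomes one temporary variable in the generated C, which is what shrinks the emitted source so drastically: every shared subexpression is computed once instead of being re-expanded in every jacobian entry.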
Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
I've been working on one of the coding challenges on InterviewStreet.com and I've run into a bit of an efficiency problem. Can anyone suggest where I might change the code to make it faster and more efficient?
Here's the code
Here's the problem statement if you're interested
If your question is about optimising Python code generally (which I think it should be ;) then there are all sorts of interesting things you can do, but first:
You probably shouldn't be obsessively optimising python code! If you're using the fastest algorithm for the problem you're trying to solve and python doesn't do it fast enough you should probably be using a different language.
That said, there are several approaches you can take (because sometimes, you really do want to make python code faster):
Profile (do this first!)
There are lots of ways of profiling python code, but there are two that I'll mention: cProfile (or profile) module, and PyCallGraph.
cProfile
This is what you should actually use, though interpreting the results can be a bit daunting.
It works by recording when each function is entered or exited, and what the calling function was (and tracking exceptions).
You can run a function in cProfile like this:
import cProfile
cProfile.run('myFunction()', 'myFunction.profile')
Then to view the results:
import pstats
stats = pstats.Stats('myFunction.profile')
stats.strip_dirs().sort_stats('time').print_stats()
This will show you in which functions most of the time is spent.
PyCallGraph
PyCallGraph provides perhaps the prettiest and easiest way of profiling Python programs, and it's a good introduction to understanding where the time in your program is spent; however, it adds significant execution overhead.
To run pycallgraph:
pycallgraph graphviz ./myprogram.py
Simple! You get a png graph image as output (perhaps after a while...)
Use Libraries
If you're trying to do something in python that a module already exists for (maybe even in the standard library), then use that module instead!
Most of the standard library modules are written in C, and they will execute hundreds of times faster than equivalent Python implementations of, say, bisection search.
Make the Interpreter do as Much of Your Work as You Can
The interpreter will do some things for you, like looping. Really? Yes! You can use the built-in functions map, reduce, and filter to significantly speed up tight loops:
consider:
for x in xrange(0, 100):
    doSomethingWithX(x)
vs:
map(doSomethingWithX, xrange(0,100))
Well obviously this could be faster because the interpreter only has to deal with a single statement, rather than two, but that's a bit vague... in fact, this is faster for two reasons:
all flow control (have we finished looping yet...) is done in the interpreter
the doSomethingWithX function name is only resolved once
In the for loop, each time around the loop python has to check exactly where the doSomethingWithX function is! Even with caching this is a bit of an overhead.
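If in doubt, measure. A quick timeit sketch comparing the two forms (the workload function and sizes are illustrative; note that in Python 3, map is lazy and xrange is gone, so the map result must be forced with list to make the comparison fair):

```python
import timeit

def work(x):
    return x * 2

def with_loop():
    # Explicit loop: the name 'work' is looked up on every iteration.
    out = []
    for x in range(1000):
        out.append(work(x))
    return out

def with_map():
    # map resolves 'work' once and drives the loop from C;
    # list() forces the lazy map object so the work actually runs.
    return list(map(work, range(1000)))

loop_t = timeit.timeit(with_loop, number=200)
map_t = timeit.timeit(with_map, number=200)
print(f"loop: {loop_t:.4f}s  map: {map_t:.4f}s")
```

The exact numbers vary by machine and interpreter version, which is precisely why timing beats reasoning from first principles for micro-optimisations like this.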
Remember that Python is an Interpreted Language
(Note that this section really is about tiny tiny optimisations that you shouldn't let affect your normal, readable coding style!)
If you come from a background of a programming in a compiled language, like c or Fortran, then some things about the performance of different python statements might be surprising:
try:ing is cheap, ifing is expensive
If you have code like this:
if somethingcrazy_happened:
    uhOhBetterDoSomething()
else:
    doWhatWeNormallyDo()
And doWhatWeNormallyDo() would throw an exception if something crazy had happened, then it would be faster to arrange your code like this:
try:
    doWhatWeNormallyDo()
except SomethingCrazy:
    uhOhBetterDoSomething()
Why? Well, the interpreter can dive straight in and start doing what you normally do; in the first case the interpreter has to do a symbol lookup each time the if statement is executed, because the name could refer to something different since the last time the statement was executed! (And a name lookup, especially if somethingcrazy_happened is global, can be nontrivial.)
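Both arrangements above can be put side by side in a runnable sketch (the function names follow the snippets; the exception class and the global flag are illustrative stand-ins):

```python
# EAFP ("easier to ask forgiveness") vs LBYL ("look before you leap"):
# identical behaviour, different cost profile.
class SomethingCrazy(Exception):
    pass

something_crazy_happened = False  # illustrative global flag

def do_what_we_normally_do():
    if something_crazy_happened:
        raise SomethingCrazy
    return "normal"

def uh_oh_better_do_something():
    return "recovered"

def lbyl():
    # Pays for the global-name lookup and test on every single call.
    if something_crazy_happened:
        return uh_oh_better_do_something()
    return do_what_we_normally_do()

def eafp():
    # The common path runs unchecked; the exception machinery only
    # costs anything when something actually goes wrong.
    try:
        return do_what_we_normally_do()
    except SomethingCrazy:
        return uh_oh_better_do_something()

print(lbyl(), eafp())
```

The try form wins when the crazy case is rare; if exceptions are raised frequently, the unwinding cost dominates and the if form is cheaper.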
You mean Who??
Because of cost of name lookups it can also be better to cache global values within functions, and bake-in simple boolean tests into functions like this:
Unoptimised function:
def foo():
    if condition_that_rarely_changes:
        doSomething()
    else:
        doSomethingElse()
Optimised approach: instead of testing a variable on every call, exploit the fact that the interpreter is already doing a name lookup on the function anyway!
When the condition becomes true:
foo = doSomething # now foo() calls doSomething()
When the condition becomes false:
foo = doSomethingElse # now foo() calls doSomethingElse()
PyPy
PyPy is a Python implementation written in RPython, a restricted subset of Python. Surely that means it will run code slower? Well, no. PyPy actually uses a Just-In-Time compiler (JIT) to run python programs.
If you don't use any external libraries (or the ones you do use are compatible with PyPy), then this is an extremely easy way to (almost certainly) speed up repetitive tasks in your program.
Basically the JIT can generate code that will do what the python interpreter would, but much faster, since it is generated for a single case, rather than having to deal with every possible legal python expression.
Where to look Next
Of course, the first place you should have looked was to improve your algorithms and data structures, and to consider things like caching, or even whether you need to be doing so much in the first place, but anyway:
This page of the python.org wiki provides lots of information about how to speed up python code, though some of it is a bit out of date.
Here's the BDFL himself on the subject of optimising loops.
There are quite a few things, even from my own limited experience that I've missed out, but this answer was long enough already!
This is all based on my own recent experiences with some python code that just wasn't fast enough, and I'd like to stress again that I don't really think any of what I've suggested is actually a good idea, sometimes though, you have to....
First off, profile your code so you know where the problems lie. There are many examples of how to do this, here's one: https://codereview.stackexchange.com/questions/3393/im-trying-to-understand-how-to-make-my-application-more-efficient
You do a lot of indexed access as in:
for pair in range(i-1, j):
    if coordinates[pair][0] >= 0 and coordinates[pair][1] >= 0:
Which could be written more plainly as:
for coord in coordinates[i-1:j]:
    if coord[0] >= 0 and coord[1] >= 0:
List comprehensions are cool and "pythonic", but this code would probably run faster if you didn't create 4 lists:
N = int(raw_input())
coordinates = []
coordinates = [raw_input() for i in xrange(N)]
coordinates = [pair.split(" ") for pair in coordinates]
coordinates = [[int(pair[0]), int(pair[1])] for pair in coordinates]
I would instead roll all those together into one simple loop or if you're really dead set on list comprehensions, encapsulate the multiple transformations into a function which operates on the raw_input().
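Rolled together, the four comprehensions collapse into a single pass. The sketch below pulls the parsing into a helper that takes already-read lines, so it works the same whether the lines come from raw_input() (Python 2, as in the original) or input() (Python 3); the sample data is made up:

```python
# The four list comprehensions collapsed into one pass over the input.
def parse_coordinates(lines):
    # Each line is "x y"; split once and convert both fields in one step.
    return [[int(a), int(b)] for a, b in (line.split() for line in lines)]

raw = ["3 4", "-1 7", "0 0"]  # stand-in for N lines read from stdin
coordinates = parse_coordinates(raw)
print(coordinates)
```

This builds one list instead of four, and as a bonus the helper is easy to unit-test without faking stdin.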
This answer shows how I locate code to optimize.
Suppose there is some line of code you could replace, and it is costing, say, 40% of the time.
Then it resides on the call stack 40% of the time.
If you take 10 samples of the call stack, it will appear on 4 of them, give or take.
It really doesn't matter how many samples show it.
If it appears on two or more, and if you can replace it, you will save whatever time it costs.
Most of the interview street problems seem to be tested in a way that will verify that you have found an algorithm with the right big O complexity rather than that you have coded the solution in the most optimal way possible.
In other words if you are failing some of the test cases due to running out of time the problem is likely that you need to figure out a solution with lower algorithmic complexity rather than micro-optimize the algorithm you have. This is why they generally state that N can be quite large.
In a game that I am writing, I use a 2D vector class which I have written to handle the speeds of the objects. This is called a large number of times every frame as there are a lot of objects on the screen, so any increase I can make in its speed will be useful.
It is pretty simple, consisting mostly of wrappers to the related math functions. It would be quite trivial to rewrite in C, but I am not sure whether doing so will make any significant difference as all it really does is call the underlying math functions, add, multiply or divide.
So, my question is under what circumstances does it make sense to rewrite in C? Where will you see a significant speed boost, and where can you see a reasonable speed boost without rewriting an extensive amount of the program?
If you're vector-munging, give numpy a try first. Chances are you will get speeds not far from C if you utilize numpy's vector manipulation functions wisely.
Other than that, your question is very heuristic. If your code is too slow:
Profile it - chances are you'll be able to improve it in Python
Use the correct optimized C-based libraries (numpy in your case)
Try psyco
Try rewriting parts with cython
If all else fails, rewrite in C
First measure then optimize
You should never optimize anything, be it in C or any other language, without timing your code before and after your optimization:
your clever optimization could in fact induce a slow down
optimizing something that takes 1% of the total execution time will never gain you more than 1% in performance
The common approach is:
profile your code
identify a hotspot
time this hotspot
optimize it
time the hotspot again and see if it's faster; if it's not, go back to 3.
If you can't find hotspots it could mean that your app is already optimized, or that you are not using the right algorithm for your problem. In both cases profiling helps you understand what your code does.
For profiling python code under Linux, you can use pyprof2calltree which works in conjunction with kcachegrind, and is totally awesome.
Common wisdom is "profile", "measure", etc. Well - maybe. Just get in the debugger and take 10 stackshots. If more than one of them terminates in your wrapper code, then it is costing more than 10% roughly, so you should consider re-doing it in C, to save that time. Chances are you will find other things also that are costing more than that.
A nice Profiler I use on Linux is pycallgraph - however, as your program gets bigger it starts to create much larger images which are harder to trace. I'm pretty sure you can exclude modules, though.