Gurobi and CPLEX are solvers that have been very popular in recent years. CPLEX offers an easier license for academics and is also said to be very high-performance. Gurobi, on the other hand, is claimed to be the fastest solver in recent years, with continuous improvements, although its performance is said to decrease as the number of constraints grows.
In terms of speed and performance, which solver is generally recommended for large-scale problems with a quadratic objective function and not too many constraints?
Does using them from Python affect their performance?
Math programming is inherently hard and there will likely always be instances where one solver is faster than another. Often, problems are solved quickly just because some heuristic was "lucky".
Also, the size of a problem alone is not a reliable measure for its difficulty. There are tiny instances that are still unsolved while we can solve instances with millions of constraints in a very short amount of time.
When you're looking for the best performance, you should analyze the solver's behavior by inspecting the log file and then adjust parameters accordingly. If you have the opportunity to test different solvers, go for it to have even more options available. Be wary of blanket recommendations for any of the established, state-of-the-art solvers, especially ones made without hands-on computational experiments.
You also need to consider the difficulty of the modeling environment/language and how much time you might need to finish the modeling part.
To answer your question concerning Gurobi's Python interface: this is a very performant and popular tool for all kinds of applications and is most likely not going to impact the overall solving time. In the majority of cases, the actual solving time is still the dominant factor while the model construction time is negligible.
As mattmilten already said, if you compare the performance of the major commercial solvers on a range of problems, you will find instances where one is clearly better than the others. However, that will depend on many details that might seem irrelevant. We did a side-by-side comparison on our own collection of problem instances (saved as MPS files) that were all generated from the same C++ code on different sub-problems of a large optimisation problem. So they were essentially just different sets of data in the same model, and we still found big variations across the solvers. It really does depend on the details of your specific problem.
For research purposes, I often find myself using the very good dense linear algebra packages available in the Python ecosystem. Mostly numpy, scipy and pytorch, which are (if I understand correctly) heavily based on BLAS/Lapack.
Of course, these go a long way, being quite extensive and having in general quite good performance (in terms of both robustness and speed of execution).
However, I often have quite specific needs that are not currently covered by these libraries. For instance, I recently found myself starting to code structure-preserving linear algebra for symplectic and (skew-)Hamiltonian matrices in Cython (which I find to be a good compromise between speed of execution and ease of integration with the bulk of the Python code). This process most often consists of rewriting algorithms from the 1970s and 80s based on dated and sometimes painful-to-decipher research papers, which I would not mind not having to go through.
I'm not the only person doing this on the Stack Exchange network either! Below are links to people asking exactly this kind of question:
Function to Convert Square Matrix to Upper Hessenberg with Similarity Transformations
It seems to me (from reading this community's literature, not from first-hand acquaintance with its members or experience in the field) that a large part of these algorithms are tested using MATLAB, which is prohibitively expensive for me to get my hands on, and which would probably not work well with the rest of my codebase.
Hence my question: where can I find open-source examples of implementations of "research-level" dense linear algebra algorithms that might easily be used from Python or copied?
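As a concrete illustration of the kind of algorithm in question, here is a minimal NumPy sketch of the Hessenberg reduction mentioned in the linked question, done with Householder similarity transformations. (SciPy already ships a production version as scipy.linalg.hessenberg; this hand-rolled variant is only meant to show that such "research-level" routines can be prototyped in a few lines of NumPy.)

```python
import numpy as np

def hessenberg(A):
    """Reduce a square matrix to upper Hessenberg form via Householder
    similarity transformations. Returns H similar to A (same eigenvalues)."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for k in range(n - 2):
        # Build a Householder vector that zeroes H[k+2:, k]
        x = H[k + 1:, k]
        alpha = np.linalg.norm(x)
        if alpha == 0.0:
            continue                     # column already reduced
        if x[0] < 0:
            alpha = -alpha               # sign choice avoids cancellation
        v = x.copy()
        v[0] += alpha
        v /= np.linalg.norm(v)
        # Apply P = I - 2 v v^T from the left (rows) and right (columns)
        H[k + 1:, :] -= 2.0 * np.outer(v, v @ H[k + 1:, :])
        H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
    return H
```

Because both sides of the similarity transform are applied, the result keeps the spectrum of the input while gaining zeros below the first subdiagonal.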
I have an implementation in Python that makes use of theorem proving. I would like to know if there is a possibility to speed up the SMT solving part, which is currently using Z3.
I am trying out different solvers, and have found cvc4/cvc5 and Yices as multi-theory (arithmetic, equality, bit-vectors...) solvers. I also found dReal and MetiTarski (this one seems to be out of date) for the specific case of real arithmetic.
My intention is to test my implementation with those tools' APIs to see whether I can use one solver or another depending on the sort I want to solve.
However, I would like to know in advance whether there is some kind of comparison between these solvers, to give my findings a more useful foundation. I am interested in both standard benchmarks and user tests published on GitHub or Stack Exchange.
I only found this cvc5 paper (https://www-cs.stanford.edu/~preiner/publications/2022/BarbosaBBKLMMMN-TACAS22.pdf), which, unsurprisingly, suggests it as the best option. I also found this minimal comparison (https://lemire.me/blog/2020/11/08/benchmarking-theorem-provers-for-programming-tasks-yices-vs-z3/), which reports that Yices is 15 times faster than Z3 on one concrete example.
Any advice?
Yices: https://yices.csl.sri.com/
cvc5: https://cvc5.github.io/
dReal: http://dreal.github.io/
MetiTarski: https://www.cl.cam.ac.uk/~lp15/papers/Arith/index.html
You can always look at the results of the SMT competition: https://smt-comp.github.io
Having said that, I think it's a fool's errand to look for the "best." There isn't a good yardstick to compare all solvers in a meaningful way: it all depends on your particular application.
If your system allows for using multiple backend solvers, why not take advantage of the many cores on modern machines: spawn all of them and take the result of the first to complete. Any a priori selection of a single solver will suffer on cases where another would perform better. At that point, running all available solvers and taking the fastest result is the best way to utilize your hardware.
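The "spawn them all, take the first" strategy can be sketched in a few lines with the standard library. This is only an illustration: the solver callables here are placeholders for whatever API your backends expose, and a real portfolio would also need a way to kill the losing solver processes rather than just cancelling futures.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def solve_portfolio(problem, solvers):
    """Run every solver on the same problem concurrently and return
    (solver_name, result) from whichever finishes first.

    `solvers` maps a name to a callable taking the problem instance."""
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        futures = {pool.submit(fn, problem): name
                   for name, fn in solvers.items()}
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        winner = next(iter(done))
        for f in pending:
            f.cancel()  # best effort; real backends need an explicit kill
        return futures[winner], winner.result()
```

With process-based backends you would use ProcessPoolExecutor (or raw subprocesses) instead, so the slower solvers can actually be terminated once a winner is known.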
I have an optimization model written in Pyomo. When I run it using Gurobi, it outputs the answer to the problem very quickly, mostly because of its efficient presolver. Is there a way to presolve in Pyomo before calling the actual solver, so I can test my model using non-commercial packages like Couenne or CBC?
As @gmavrom mentions, it's important to know what you are trying to accomplish with a presolve, as many different techniques may be considered "presolve" operations. The commercial solvers put a lot of engineering effort into the tuning of their respective presolve operations.
As @Erwin points out, commercial AMLs like AMPL also sometimes provide presolve capabilities.
Within Pyomo, you can implement various "presolve" techniques by operating directly on the optimization modeling objects. See the feasibility-based bounds tightening implemented in pyomo.contrib.fbbt as an example: https://github.com/Pyomo/pyomo/blob/master/pyomo/contrib/fbbt/fbbt.py
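To give a feel for what feasibility-based bounds tightening does (independently of Pyomo's actual API), here is a toy pure-Python sketch for a single linear constraint. The function name and data layout are made up for illustration; pyomo.contrib.fbbt does the real thing over whole models, including nonlinear expressions.

```python
def tighten_bounds(coeffs, lower, upper, bounds):
    """Toy feasibility-based bounds tightening for one linear constraint
        lower <= sum(c_i * x_i) <= upper.

    `coeffs` maps variable name -> coefficient, `bounds` maps variable
    name -> (lb, ub). Returns a dict of tightened bounds."""
    new = dict(bounds)
    for var, c in coeffs.items():
        # Interval range of the constraint body excluding this variable
        rest_lo = sum(min(ci * new[v][0], ci * new[v][1])
                      for v, ci in coeffs.items() if v != var)
        rest_hi = sum(max(ci * new[v][0], ci * new[v][1])
                      for v, ci in coeffs.items() if v != var)
        # Implied bounds on c*var, then on var itself
        lo_term, hi_term = lower - rest_hi, upper - rest_lo
        if c > 0:
            implied = (lo_term / c, hi_term / c)
        else:
            implied = (hi_term / c, lo_term / c)
        lb, ub = new[var]
        new[var] = (max(lb, implied[0]), min(ub, implied[1]))
    return new
```

For example, given x + y <= 10 with x in [0, 100] and y in [0, 3], interval reasoning alone tightens x to [0, 10]: no matter what y does, x above 10 is infeasible.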
Since there is currently no easy way to profile TensorFlow operations (Can I measure the execution time of individual operations with TensorFlow?), can anyone help me understand the benefits of using segment operations (e.g. segment_sum) compared to using multiple operations on pre-segmented tensors? Would segment_sum be more efficient than dynamic_partition or gather followed by multiple reduce_sum calls? Would segment_sum be equally parallelizable?
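To make the two alternatives being compared concrete, here is a NumPy model of their semantics (not the TensorFlow implementations themselves): segment_sum as a single scatter-add pass, versus gathering each segment and reducing it separately.

```python
import numpy as np

def segment_sum(data, segment_ids):
    """NumPy model of tf.math.segment_sum: sum rows of `data` that share a
    segment id. Ids are assumed sorted and starting at 0, as TF requires."""
    n_segments = segment_ids[-1] + 1
    out = np.zeros((n_segments,) + data.shape[1:], dtype=data.dtype)
    np.add.at(out, segment_ids, data)   # one scatter-add pass over the data
    return out

def segmented_sums_via_gather(data, segment_ids):
    """The alternative: gather each segment, then one reduce_sum per segment."""
    ids = np.asarray(segment_ids)
    return np.stack([data[ids == s].sum(axis=0)
                     for s in range(ids[-1] + 1)])
```

Both produce the same values; the difference the question is really asking about is whether the fused single-pass form maps to faster (and equally parallel) kernels than the gather-then-reduce sequence, which is exactly what the answer below suggests measuring.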
I've updated the SO question you linked to with some information about the CPU inference profiling tools we've recently released at:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/benchmark
Unfortunately the overall question is a lot harder to answer, since it depends on:
Whether you're focused on training, or inference.
Whether you're using a GPU, and if so what kind and how many.
Whether you're running distributed.
What your data looks like, and where the bottlenecks are.
What I usually end up doing is building small sub-graphs that are representative of the sort of ops I'm considering, and then timing how long they take on the sort of data I'll be feeding in. I know that isn't immediately helpful, since the experimentation can be time-consuming, but it is the best way to get an intuitive understanding of the optimal solution for your particular circumstances.
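The timing part of that workflow can be as simple as a small standard-library harness like the one below (the workload passed in would be your representative sub-graph; the helper name here is made up).

```python
import statistics
import time

def time_op(fn, *args, repeats=20, warmup=3):
    """Micro-benchmark helper: run `fn(*args)` several times and report the
    median wall-clock time, ignoring a few warm-up runs (which absorb
    one-off costs such as kernel compilation or cache warming)."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)
```

The median is used rather than the mean because timing samples on a busy machine are skewed by occasional slow outliers.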
I am working on a theoretical graph theory problem which involves taking combinations of hyperedges in a hypergraph to analyse the various cases.
I have implemented an initial version of the main algorithm in Python, but due to its combinatorial structure (and probably my implementation) the algorithm is quite slow.
One way I am considering speeding it up is by using either PyPy or Cython.
Looking at the documentation, it seems Cython doesn't offer a great speedup when it comes to tuples. This might be problematic for the implementation, since I represent hyperedges as tuples, so the majority of the algorithm consists of manipulating tuples (they are all the same length, around 6 elements each).
Since both my C and Python skills are quite minimal, I would appreciate advice on the best way to optimise the code given its reliance on tuples/lists. Is there any documentation on using lists/tuples with Cython (or PyPy)?
If your algorithm is bad in terms of computational complexity, then no tool can save you: you need to write a better algorithm. Consult a good graph theory book or Wikipedia; it's usually relatively easy, although some problems have algorithms that are both non-trivial and very hard to implement. This sounds like the kind of code PyPy can speed up quite significantly, and without any modifications to your code, but only by a constant factor. Cython does not speed up your code much without type declarations, and this sort of problem cannot really be sped up just by adding types.
The constant part is what's crucial here: if the algorithm's complexity grows like, say, 2^n (which is typical for a naive algorithm), then adding an extra node to the graph doubles your running time. That means 10 extra nodes multiply the time by 1024, 20 nodes by 1024*1024, and so on. Even if you're super-lucky and PyPy speeds up your algorithm by 100x, that factor stays constant as the graph grows (and you quickly run out of the universe's lifetime one way or another).
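A quick back-of-the-envelope check of that claim: with 2^n growth, a constant 100x speedup only buys you about log2(100), roughly 6.6, extra nodes before the exponential eats the whole gain.

```python
import math

speedup = 100.0                     # hypothetical constant-factor gain (e.g. PyPy)
extra_nodes = math.log2(speedup)    # nodes you can add before 2^n absorbs it
# Doubling per node: 10 extra nodes cost 2**10 = 1024x, 20 cost about a million x
cost_10 = 2 ** 10
cost_20 = 2 ** 20
```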
what would be the best way to proceed in optimising the code...
Profile first. The standard cProfile module does simple profiling very well. Optimising your code before profiling is quite pointless.
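A minimal cProfile session looks like the following; the workload function is a made-up stand-in for the real hyperedge enumeration.

```python
import cProfile
import io
import pstats

def build_edges(n):
    # Hypothetical stand-in for the real hyperedge enumeration
    return [tuple(range(i, i + 6)) for i in range(n)]

profiler = cProfile.Profile()
profiler.enable()
edges = build_edges(10_000)
profiler.disable()

# Print the five functions with the highest cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The report tells you where the time actually goes, which is what should drive any Cython/PyPy decision.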
Besides, for graphs you can try the excellent networkx module. Also, if you deal with long sorted lists, have a look at the bisect and heapq modules.
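For reference, those last two standard-library suggestions look like this in practice (the data here is invented purely for illustration):

```python
import bisect
import heapq

# bisect keeps a list sorted: O(log n) lookup, O(n) insert
degrees = [1, 3, 4, 7]
bisect.insort(degrees, 5)                 # list stays sorted
pos = bisect.bisect_left(degrees, 4)      # index of first element >= 4

# heapq gives O(log n) access to the smallest item, e.g. for best-first search
frontier = []
heapq.heappush(frontier, (2.5, ("a", "b")))
heapq.heappush(frontier, (1.0, ("c", "d")))
cost, edge = heapq.heappop(frontier)      # cheapest entry comes out first
```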