Scipy: Brute (grid-search) with multithreading? - python

I'm using scipy.optimize.brute(), but I noticed that it only uses one of my cores. One big advantage of a grid search is that all iterations of the solution algorithm are independent of each other.
Given that that's the case - why is brute() not implemented to run on multiple cores? If there is no good reason - is there a quick way to extend it, or does it make more sense to write the whole routine from scratch?

scipy.optimize.brute takes an arbitrary Python function. There is no guarantee this function is threadsafe. Even if it is, Python's global interpreter lock means that unless the function bypasses the GIL in C, it can't be run on more than one core anyway.
If you want to parallelize your brute-force search, you should write it yourself, e.g. with the multiprocessing module, since separate processes each have their own GIL; for thread-based parallelism you may have to write some Cython or C that releases the GIL.
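For illustration, here is a rough sketch of such a hand-rolled parallel grid search using the standard multiprocessing module (the names `objective` and `parallel_brute` are made up for this example; the function to evaluate must be picklable, i.e. defined at module top level):

```python
import itertools
import multiprocessing as mp

def objective(params):
    # hypothetical objective; swap in your own (must be picklable,
    # i.e. a top-level function, not a lambda or closure)
    x, y = params
    return (x - 0.5) ** 2 + (y + 1.0) ** 2

def parallel_brute(func, ranges, workers=4):
    """Evaluate func on the full Cartesian grid with a process pool.

    Every grid point is independent, so separate processes give real
    parallelism despite the GIL. Returns (best_params, best_value)."""
    grid = list(itertools.product(*ranges))
    with mp.Pool(workers) as pool:
        values = pool.map(func, grid)
    best_value, best_params = min(zip(values, grid))
    return best_params, best_value

if __name__ == "__main__":
    xs = [i / 10 for i in range(-10, 11)]  # -1.0 .. 1.0
    ys = [i / 10 for i in range(-20, 1)]   # -2.0 .. 0.0
    print(parallel_brute(objective, (xs, ys), workers=2))
```

Because pool.map distributes whole grid points, this only pays off when one objective evaluation is expensive relative to the pickling overhead.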

Do you have scikit-learn installed? With a bit of refactoring you could use sklearn.grid_search.GridSearchCV (moved to sklearn.model_selection in later releases), which supports multiprocessing via joblib.
You would need to wrap your local optimization function as an object that exposes the generic scikit-learn estimator interface, including a .score(...) method (or you could pass in a separate scoring function to the GridSearchCV constructor via the scoring= kwarg).
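A minimal sketch of such a wrapper, assuming a hypothetical two-parameter objective (`objective` and `ObjectiveWrapper` are invented names; the import path assumes a recent scikit-learn, where GridSearchCV lives in sklearn.model_selection rather than sklearn.grid_search):

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV

def objective(a, b):
    # hypothetical objective to minimize
    return (a - 1.0) ** 2 + (b + 2.0) ** 2

class ObjectiveWrapper(BaseEstimator):
    """Exposes the generic estimator interface GridSearchCV expects."""
    def __init__(self, a=0.0, b=0.0):
        self.a = a
        self.b = b

    def fit(self, X, y=None):
        return self  # nothing to fit; the params come from the grid

    def score(self, X, y=None):
        return -objective(self.a, self.b)  # higher score = better

grid = {"a": np.linspace(-2, 2, 9), "b": np.linspace(-4, 0, 9)}
search = GridSearchCV(ObjectiveWrapper(), grid, n_jobs=2, cv=2)
search.fit(np.zeros((4, 1)))  # dummy data; only the grid matters
print(search.best_params_)
```

The dummy X and cv=2 are only there to satisfy GridSearchCV's cross-validation machinery; the actual work happens in score().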

Related

Can scipy.optimize.minimize(..., method='SLSQP', ...) use multiple cores?

I am using scipy.optimize.minimize for nonlinear constrained optimization.
I tested two methods (trust-constr, SLSQP).
On a machine (Ubuntu 20.04.1 LTS) where nproc reports 32 cores,
scipy.optimize.minimize(..., method='trust-constr', ...) uses multiple cores (CPU usage around 1600%), while
scipy.optimize.minimize(..., method='SLSQP', ...) uses only one core.
According to another post (scipy optimise minimize -- parallelisation options), it seems that this is not a python problem, rather, a BLAS/LAPACK/MKL problem.
However, if it were a BLAS problem, it seems to me that all methods should be limited to a single core.
In the post, someone replied that SLSQP uses multiple cores.
Does the parallelization support of scipy.optimize.minimize depend on the chosen method?
How can I make SLSQP use multiple cores?
One observation I made by looking into
anaconda3/envs/[env_name]/lib/python3.8/site-packages/scipy/optimize:
trust-constr is implemented in Python (the _trustregion_constr directory);
SLSQP is implemented in compiled Fortran code (the _slsqp.cpython-38-x86_64-linux-gnu.so file).
Reading the _slsqp.py source file, you may notice that scipy's SLSQP does not use MPI or multiprocessing (or any parallel processing at all).
Adding some sort of multiprocessing/MPI support is not trivial, because you would have to do surgery on the backend to add the necessary MPI barriers/synchronization points (and make sure that all processes/threads stay in sync while the main "optimizer" runs on a single core).
If you're heading down this path, it's relevant to mention: SLSQP as implemented in SciPy has an inefficient order of operations. When it computes derivatives, it first perturbs all design variables to find the gradient of the objective function (a wrapper function is created at runtime to do this), and then SLSQP's Python wrapper computes the gradients of the constraint functions by perturbing each design variable again.
If speeding up SLSQP is critical, fixing this order of operations in the backend (where it treats gradients of objectives and constraints differently) matters for the many problems in which calculating objectives and constraints shares a lot of common work. I'd say both backend updates belong in this category... something for the dev forums to ponder.
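One workaround that avoids touching the backend: supply your own gradient through minimize's jac= argument and compute the finite differences in parallel yourself. A rough sketch with the standard multiprocessing module (`parallel_fd_grad`, `_shift_eval`, and `objective` are hypothetical names; simple forward differences with step h are assumed):

```python
import multiprocessing as mp

def _shift_eval(args):
    # evaluate func with design variable i perturbed by h
    func, x, i, h = args
    xp = list(x)
    xp[i] += h
    return func(xp)

def parallel_fd_grad(func, x, h=1e-6, workers=2):
    """Forward-difference gradient with each perturbed evaluation
    farmed out to a process pool. func must be picklable, i.e. a
    top-level function."""
    f0 = func(x)
    tasks = [(func, x, i, h) for i in range(len(x))]
    with mp.Pool(workers) as pool:
        f_shifted = pool.map(_shift_eval, tasks)
    return [(fi - f0) / h for fi in f_shifted]

def objective(x):
    # hypothetical expensive objective
    return sum(v * v for v in x)

if __name__ == "__main__":
    print(parallel_fd_grad(objective, [1.0, 2.0]))
```

You could then pass jac=lambda x: parallel_fd_grad(objective, x) to scipy.optimize.minimize(..., method='SLSQP', ...), which at least parallelizes the objective's gradient; constraint gradients can be supplied the same way via each constraint's 'jac' entry. This only pays off when a single function evaluation is expensive relative to the process overhead.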

How can I do parallel processing in Python?

I have a problem where I need to solve thousands of independent nonnegative least squares problems using nnls in scipy. All the problems are small, about 100×100 matrices. To speed things up I've tried to use the multiprocessing module in Python with the Pool class. I get about a factor-of-2 improvement if I set the number of threads in numpy to 1 and use multiprocessing, versus using multithreaded numpy and no multiprocessing. But the performance is very unpredictable. For instance, if I move sections of code into a separate function (to make it easier to read) or call pool.map from a class method, performance can decrease by 50%. So it seems like the multiprocessing module is too unreliable to be used.
Does anyone know what can cause this behaviour or know of a better alternative to multiprocessing?
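For reference, the setup described above, pinning the BLAS libraries to one thread before NumPy is imported and farming the independent nnls solves out to a process pool, can be sketched like this (`solve_one` is a hypothetical stand-in for one of the thousands of small problems, here built from random data):

```python
import os
# pin BLAS to one thread per worker *before* numpy is imported,
# otherwise every worker process spawns its own BLAS thread pool
# and the cores get oversubscribed
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import multiprocessing as mp

import numpy as np
from scipy.optimize import nnls

def solve_one(seed):
    # stand-in for one small independent NNLS problem; defined at
    # module level so it pickles cleanly for the pool
    rng = np.random.default_rng(seed)
    A = rng.random((100, 100))
    b = rng.random(100)
    x, rnorm = nnls(A, b)
    return rnorm

if __name__ == "__main__":
    with mp.Pool(processes=4) as pool:
        residuals = pool.map(solve_one, range(16))
    print(len(residuals))
```

The erratic timings the question describes are consistent with pickling overhead: moving code into nested functions or methods changes what has to be serialized for each task, so keeping the worker a small top-level function tends to give the most predictable performance.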

Tensorflow: why tf.nn.conv2d runs faster than tf.layers.conv2d?

I am writing a simple implementation of AlexNet. I tried both tf.nn.conv2d and tf.layers.conv2d, and it turns out that the loss drops faster with tf.nn.conv2d, even though the structure is exactly the same. Does anyone have an explanation for that?
If you follow the chain of function calls, you will find that tf.layers.conv2d() calls tf.nn.conv2d(), so no matter which one you use, tf.nn.conv2d() ends up being called; it is just faster to call it yourself. You can use traceback.print_stack() to verify that for yourself.
NOTE: This does not mean they are one and the same; select the function based on your needs, as tf.layers.conv2d() performs various other tasks.

Defining tensorflow operations in python with attributes

I am trying to register a python function and its gradient as a tensorflow operation.
I found many useful examples e.g.:
Write Custom Python-Based Gradient Function for an Operation? (without C++ Implementation)
https://programtalk.com/python-examples/tensorflow.python.framework.function.Defun/
Nonetheless I would like to register attributes in the operation and use these attributes in the gradient definition by calling op.get_attr('attr_name').
Is this possible without going down to C implementation?
May you give me an example?
Unfortunately, I don't believe it is possible to add attributes without a C++ implementation of the operation. One feature that may help, though, is that you can define 'private' attributes by prefixing the name with an underscore. I'm not sure whether this is well documented or what the long-term guarantees are, but you can try setting '_my_attr_name' and you should be able to retrieve it later.

how does RandomizedLasso in sklearn use the variable n_jobs?

When I have to parallelize an algorithm in python I usually use the multiprocessing map function.
In sklearn's randomized Lasso it seems they are using something different: RandomizedLasso
I am not an expert in parallel computing in Python, and I hope I can learn something new from this.
Can anyone explain to me what they are using?
In their situation I would have used multiprocessing. Why did they choose something different?
n_jobs is fed to joblib, which is used for all parallel processing in scikit-learn. As you can see on the joblib website, it's much easier to use than multiprocessing; it's also more feature-rich, as it can use either processes or threads (the latter are faster when executing C code that releases the GIL) and has shared-memory support for NumPy arrays.
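As a rough illustration of why joblib is easier than raw multiprocessing, a batch of independent tasks can be dispatched in a single line (`fit_one` is a hypothetical stand-in for the per-resample model fit that RandomizedLasso dispatches internally):

```python
from joblib import Parallel, delayed

def fit_one(alpha):
    # hypothetical stand-in for one independent unit of work,
    # e.g. fitting a Lasso model on one resampled dataset
    return alpha * 2

# n_jobs sets the number of workers; the generator of delayed()
# calls is the list of tasks to run in parallel
results = Parallel(n_jobs=2)(delayed(fit_one)(a) for a in [1, 2, 3])
print(results)  # [2, 4, 6]
```

Unlike Pool.map, this requires no explicit pool setup or teardown, and the same call works whether the backend uses processes or threads.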
