Occasional timeout problems with get() function using Python multiprocessing - python

I am relatively new to Python, but I have an issue that I just cannot understand.
I am trying to implement multiprocessing in a piece of code that computes the Mandelbrot fractal. My problem is that it seemingly works and doesn't work at random, and I don't understand why.
I have narrowed the problem down to the .get() call that I use to retrieve data from the "ApplyResult" multiprocessing objects. This call will sometimes time out or hang forever until it is terminated. Occasionally the code runs once or twice and then stops working, which suggests to me some kind of resource that is being used and not released, but nothing I add to the code seems to help.
I have tried removing the if __name__ == '__main__' part, which seemingly makes no difference, and I have tried the other cleanup functions I could find in the multiprocessing documentation, including .join(), .kill(), and .terminate(), as well as leaving out the .close() altogether.
The relevant part of my code looks like this
import numpy as np
import multiprocessing as mp
import os

I = 100
T = 2
N = 100

C_rerow = np.reshape(np.linspace(-2, 1, N), (1, N))
C_imrow = np.reshape(np.linspace(1.5, -1.5, N), (N, 1))
C = C_rerow + C_imrow*1j

def mandel(C_row):
    Mandelbrot = np.ones_like(np.absolute(C_row))
    Z_old = np.zeros_like(C[:,0])
    for i in range(I):
        Z_new = Z_old**2 + C_row
        Z_old = Z_new
        Bool = (np.absolute(Z_new) > T) & (Mandelbrot == 1)
        Mandelbrot[Bool] = i/I
    return Mandelbrot

def MultiP(C, I, T):
    pool = mp.Pool(processes=os.cpu_count() - 1)
    Mc = []
    for j in range(N):
        C_row = C[j,:]
        test = pool.apply_async(mandel, (C_row,))
        Mc.append(test)
    pool.close()
    return Mc

if __name__ == '__main__':
    Mc = MultiP(C, I, T)
    Mc = [Mc[i].get(timeout=5) for i in range(len(Mc))]
Another strange issue that I can't understand: if I rename the output of MultiP to something different from what the function returns (e.g. rename Mc = MultiP() to MC = MultiP()), then the returned values just will not be saved. I've never seen anything like that happen with functions before, and I'm not sure if the problems are related, but I thought I would mention it.
Any input would be appreciated.

This is a bit long for a comment and perhaps not quite an answer. The bottom line is that I could not reproduce your error, but I did have a few remarks.
Code that creates new processes needs to be conditionally executed under the test if __name__ == '__main__': on those platforms that use the operating system's spawn rather than fork method of creating new processes. In that case, the new process does not inherit a copy of the main process's address space as it was when the sub-process was created, but instead starts execution from the top of the program, re-executing everything it finds at global scope. If it weren't for the if __name__ == '__main__': test, it would recursively attempt to create even more sub-processes in an endless loop. Consequently, it's good practice not to do complex calculations at global scope that do not really need to be there, for they will be re-executed by every sub-process in your multiprocessing pool. It is better to move those calculations into the if __name__ == '__main__': block. If the worker functions you will be calling need access to those values, you can either pass them as function arguments (which can be costly for large pieces of data) or initialize each sub-process's address space once with a global variable, as is done below.
I also tried to re-create the problem you cited in renaming the return value from MultiP from Mc to MC but had no problems. I also corrected your indentation errors.
When you execute the sequence pool.close() followed by pool.join(), you will block until all submitted tasks complete. So if you have submitted asynchronous tasks with apply_async and do not need any return values from the returned AsyncResult instances, you can use close and join to be sure the tasks have finished executing. If you are calling method get on the returned AsyncResult instances, you are also guaranteed that the tasks have completed (or timed out in your case), in which case there is really no need to be issuing close and join. By the way, even if you get a TimeoutError exception when you call get, signifying that the task has timed out, the task is still actually running. Presumably you do not want to wait for timed-out tasks to complete, so you should call pool.terminate() to kill any running tasks (this is implicitly called at the termination of a with Pool() as pool: block).
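For example, a minimal sketch of that pattern (the worker function here is a hypothetical stand-in, not your mandel):

import multiprocessing as mp

def work(x):  # hypothetical worker, for illustration only
    return x * x

if __name__ == '__main__':
    with mp.Pool() as pool:  # pool.terminate() is called on exit from the block
        results = [pool.apply_async(work, (x,)) for x in range(10)]
        try:
            values = [r.get(timeout=5) for r in results]
        except mp.TimeoutError:
            values = None  # any task still running is killed when the block exits
    print(values)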
Note the comments I have added.
Rhetorical questions for you (they don't need to be answered, but should be thought about):
You have global variables C, I, T, N. Function MultiP accepts C, I and T as arguments ignoring the global variables but accesses global variable N. mandel accesses everything it needs as global variables. Isn't this inconsistent?
MultiP contains the logic to repeatedly call mandel. It could do this with or without multiprocessing. With multiprocessing it could use apply_async or the potentially more efficient map method (if all the arguments were put into a list and you used a suitable chunksize argument that turned out to be greater than 1; a sketch of that variant follows the output below). Yet you return to the caller not the final results but rather a list of AsyncResult instances. This means the caller is dependent on the implementation details of MultiP. Since the caller is immediately "getting" the results anyway, wouldn't it be wiser just to have MultiP "get" and return the results, reducing the coupling between caller and callee?
import numpy as np
import multiprocessing as mp
import os

N = 100
I = 100
T = 2

def init_pool(c):
    global C
    C = c

def mandel(C_row):
    Mandelbrot = np.ones_like(np.absolute(C_row))
    Z_old = np.zeros_like(C[:,0])
    for i in range(I):
        Z_new = Z_old**2 + C_row
        Z_old = Z_new
        Bool = (np.absolute(Z_new) > T) & (Mandelbrot == 1)
        Mandelbrot[Bool] = i/I
    return Mandelbrot

def MultiP(C, I, T):
    # initialize each sub-process's global C variable:
    pool = mp.Pool(processes=os.cpu_count() - 1, initializer=init_pool, initargs=(C,))
    Mc = []
    for j in range(N):
        C_row = C[j,:]
        test = pool.apply_async(mandel, (C_row,))
        Mc.append(test)
    # previous 5 statements can be replaced with:
    # Mc = [pool.apply_async(mandel, (C[j,:],)) for j in range(N)]
    #pool.close() # not required
    return Mc

if __name__ == '__main__':
    # moved here so the calculations are done once:
    C_rerow = np.reshape(np.linspace(-2, 1, N), (1, N))
    C_imrow = np.reshape(np.linspace(1.5, -1.5, N), (N, 1))
    C = C_rerow + C_imrow*1j
    MC = MultiP(C, I, T)
    # The following can throw a TimeoutError exception:
    MC = [MC[i].get(timeout=5) for i in range(len(MC))]
    print(MC)
Prints:
test.py:17: RuntimeWarning: overflow encountered in square
  Z_new = Z_old**2 + C_row
test.py:17: RuntimeWarning: invalid value encountered in square
  Z_new = Z_old**2 + C_row
test.py:19: RuntimeWarning: overflow encountered in absolute
  Bool = (np.absolute(Z_new) > T) & (Mandelbrot == 1)
... (the same overflow / invalid-value warnings repeat many times, once per diverging row) ...
[array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        ...
        1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]), array([1., 1., 1., ...]),
etc.
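For completeness, a sketch of the map variant mentioned above (it reuses init_pool and mandel from the corrected code; the chunksize value is a guess to tune, not a measured optimum):

if __name__ == '__main__':
    C_rerow = np.reshape(np.linspace(-2, 1, N), (1, N))
    C_imrow = np.reshape(np.linspace(1.5, -1.5, N), (N, 1))
    C = C_rerow + C_imrow*1j
    with mp.Pool(processes=os.cpu_count() - 1, initializer=init_pool, initargs=(C,)) as pool:
        # map blocks until all rows are processed and returns the results directly,
        # so the caller never deals with AsyncResult instances at all
        MC = pool.map(mandel, [C[j,:] for j in range(N)], chunksize=10)
    print(MC)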

Related

Cython: Converting a Cython function with `except` into C like function

I am basically trying to wrap a C library and want to pass Python functions into wrapped C functions using Cython. I am using global variables to store the Python function, calling it in a Cython wrapper and then passing the Cython wrapper as a C function to some other C function. I want to capture the errors in the Python function when called by the Cython wrapper, so I add an except? -1 at the end of the Cython wrapper declaration.
The problem is that when I pass a Cython function with the except keyword, it fails to convert it into a C-like function. If I don't, I can't handle errors occurring in the Python function (I need to do this, as the function is called in a loop).
A MRE is shown below.
# file: mre.pyx
# cython: language_level=3
import numpy as np
cimport numpy as np

_glob_params = None
_glob_func = None

cdef extern from "mre.h":
    double eval_func(double (*func)(double, double *, int))

cdef double _func_wrapper(double x, double *params, int n) except? -1:
    return _glob_func(x, *_glob_params)

def run_func(func, params=(), Py_ssize_t size=100):
    cdef Py_ssize_t i
    cdef np.ndarray[np.float64_t, ndim=1] out = np.empty(size, dtype=np.float64)
    global _glob_func, _glob_params
    _glob_func = func
    _glob_params = params
    for i in range(size):
        out[i] = eval_func(_func_wrapper)
    return out
// file : mre.h
#pragma once
#include <stdlib.h>
#include <stdio.h>

double eval_func(double (*func)(double, double *, int)) {
    double res = func(1.0, NULL, 0);
    if ( res < 0. ) {
        fprintf(stderr, "ERROR: func < 0. -> not allowed!\n");
    }
    return res;
}
Error:
Error compiling Cython file:
------------------------------------------------------------
...
_glob_func = func
_glob_params = params
for i in range(size):
out[i] = eval_func(_func_wrapper)
^
------------------------------------------------------------
mre.pyx:24:27: Cannot assign type 'double (double, double *, int) except? -1.0' to 'double (*)(double, double *, int)'
Without except? -1:
>>> from mre import run_func
>>> run_func(None)
TypeError: 'NoneType' object is not callable
Exception ignored in: 'mre._func_wrapper'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not callable
TypeError: 'NoneType' object is not callable
Exception ignored in: 'mre._func_wrapper'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
...
...
...
TypeError: 'NoneType' object is not callable
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
So, is there a way to handle exceptions in Cython without except? I tried to use ctypes to wrap the Python function but it is very slow in practice.
The way exceptions work in Python, once an exception occurs, the global state must be restored, i.e. the error indicator must be cleared (e.g. via PyErr_Clear()); otherwise the interpreter is in an inconsistent state and strange things can happen.
Because your C code is not aware of Python, it cannot clear the error indicator, thus it must happen in Cython code. One possibility would be to add an additional wrapper:
cdef double wrapper_noraise(double x, double *params, int n):
    try:
        return _func_wrapper(x, params, n)
    except:
        return -1.0
and now pass wrapper_noraise to C code.
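With that extra wrapper, the only change presumably needed in run_func is the call site (a sketch; wrapper_noraise as defined above):

for i in range(size):
    out[i] = eval_func(wrapper_noraise)  # C sees a plain function pointer; -1.0 signals an error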
Or, in your case, you could directly change _func_wrapper to:
cdef double _func_wrapper(double x, double *params, int n):
    try:
        return _glob_func(x, *_glob_params)
    except:
        return -1.0

Suggesting Values in Nevergrad Package

Steps to reproduce
import nevergrad as ng
import numpy as np

loc = ng.p.Scalar(lower=-5, upper=5)
scale = ng.p.Scalar(lower=0, upper=5)
s = ng.p.Scalar(lower=0, upper=10)
k = ng.p.Choice(list(range(2, 6)))
w = ng.p.Array(shape=(self.times.shape[0],)).set_bounds(-10, 10)
instru = ng.p.Instrumentation(loc=loc,
                              scale=scale,
                              s=s,
                              k=k,
                              w=w)
optimizer = ng.optimizers.DE(parametrization=instru,
                             budget=budget)
optimizer.suggest((), {'k': 3, 'loc': -2, 's': 2, 'scale': 2, 'w': np.ones(self.times.shape[0])})
Observed Results
ValueError: Tuple value must be a tuple of size 0, got: ((), {'k': 3, 'loc': -2, 's': 2, 'scale': 2, 'w': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1.])}).
Current value: ()
Expected Results
For initial values to be set in an optimizer run
Has anyone had success using the suggest method in Nevergrad?
If so, would you mind copying/pasting working code? I've been trying different forms of the example in the documentation, but cannot seem to get it to work.
The question was answered in a relevant Github thread:
Basically, suggest should be called the same way as the function to optimize. In your case, given you are using an Instrumentation, I guess it should be:
optimizer.suggest(k=3, loc=-2, s=2, scale=2, w=np.ones(self.times.shape[0]))
Another option, which can work for all but the Choice parameter, would be to use the init option of Array and Scalar (e.g. loc = ng.p.Scalar(init=-2, lower=-5, upper=5)); a sketch of that variant follows.
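A minimal sketch of the init-based variant (parameter names taken from the question; n_times is a hypothetical stand-in for self.times.shape[0], and whether the optimizer actually asks the initial point first is version-dependent):

import nevergrad as ng
import numpy as np

n_times = 23  # stand-in for self.times.shape[0] in the question
loc = ng.p.Scalar(init=-2, lower=-5, upper=5)
scale = ng.p.Scalar(init=2, lower=0, upper=5)
s = ng.p.Scalar(init=2, lower=0, upper=10)
k = ng.p.Choice(list(range(2, 6)))  # Choice has no init option
w = ng.p.Array(init=np.ones(n_times)).set_bounds(-10, 10)
instru = ng.p.Instrumentation(loc=loc, scale=scale, s=s, k=k, w=w)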

Is there a way in numpy to test whether a matrix is Unitary

I was wondering if there is any function in numpy to determine whether a matrix is Unitary?
This is the function I wrote but it is not working. I would be thankful if you guys can find an error in my function and/or tell me another way to find out if a given matrix is unitary.
def is_unitary(matrix: np.ndarray) -> bool:
    unitary = True
    n = matrix.size
    error = np.linalg.norm(np.eye(n) - matrix.dot(matrix.transpose().conjugate()))
    if not (error < np.finfo(matrix.dtype).eps * 10.0 * n):
        unitary = False
    return unitary
Let's take an obviously unitary array:
>>> a = 0.7
>>> b = (1-a**2)**0.5
>>> m = np.array([[a,b],[-b,a]])
>>> m.dot(m.conj().T)
array([[ 1., 0.],
[ 0., 1.]])
and try your function on it:
>>> is_unitary(m)
Traceback (most recent call last):
File "<ipython-input-28-8dc9ddb462bc>", line 1, in <module>
is_unitary(m)
File "<ipython-input-20-3758c2016b67>", line 5, in is_unitary
error = np.linalg.norm(np.eye(n) - matrix.dot( matrix.transpose().conjugate()))
ValueError: operands could not be broadcast together with shapes (4,4) (2,2)
which happens because
>>> m.size
4
>>> np.eye(m.size)
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
If we replace n = matrix.size with len(m) or m.shape[0] or something, we get
>>> is_unitary(m)
True
I might just use
>>> np.allclose(np.eye(len(m)), m.dot(m.T.conj()))
True
where allclose has rtol and atol parameters.
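Putting those pieces together, a corrected version of the original function might look like this (tolerances left at allclose's defaults; pass rtol/atol if you need something stricter):

import numpy as np

def is_unitary(m: np.ndarray) -> bool:
    # n must be the matrix dimension, not the element count
    n = m.shape[0]
    return np.allclose(np.eye(n), m.dot(m.conj().T))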
If you are using NumPy's matrix class, there is a property for the Hermitian conjugate, so:
def is_unitary(m):
    return np.allclose(np.eye(m.shape[0]), m.H * m)
e.g.
In [79]: P = np.matrix([[0,-1j],[1j,0]])
In [80]: is_unitary(P)
Out[80]: True

Stochastic Optimization in Python

I am trying to combine cvxopt (an optimization solver) and PyMC (a sampler) to solve convex stochastic optimization problems.
For reference, installing both packages with pip is straightforward:
pip install cvxopt
pip install pymc
Both packages work independently perfectly well. Here is an example of how to solve an LP problem with cvxopt:
# Testing that cvxopt works
from cvxopt import matrix, solvers
# Example from http://cvxopt.org/userguide/coneprog.html#linear-programming
c = matrix([-4., -5.])
G = matrix([[2., 1., -1., 0.], [1., 2., 0., -1.]])
h = matrix([3., 3., 0., 0.])
sol = solvers.lp(c, G, h)
# The solution sol['x'] is correct: (1,1)
However, when I try using it with PyMC (e.g. by putting a distribution on one of the coefficients), PyMC gives an error:
import pymc as pm
import cvxopt

c1 = pm.Normal('c1', mu=-4, tau=.5**-2)

@pm.deterministic
def my_lp_solver(c1=c1):
    c = matrix([c1, -5.])
    G = matrix([[2., 1., -1., 0.], [1., 2., 0., -1.]])
    h = matrix([3., 3., 0., 0.])
    sol = solvers.lp(c, G, h)
    solution = np.array(sol['x'], dtype=float).flatten()
    return solution

m = pm.MCMC(dict(c1=c1, x=x))
m.sample(20000, 10000, 10)
I get the following PyMC error:
<ipython-input-21-5ce2909be733> in x(c1)
14 @pm.deterministic
15 def x(c1=c1):
---> 16 c = matrix([c1, -5.])
17 G = matrix([[2., 1., -1., 0.], [1., 2., 0., -1.]])
18 h = matrix([3., 3., 0., 0.])
TypeError: invalid type in list
Why? Is there any way to make cvxopt play nicely with PyMC?
Background:
In case anyone wonders, PyMC allows you to sample from any function of your choice. In this particular case, the function from which we sample is one that maps an LP problem to a solution. We are sampling from this function because our LP problem contains stochastic coefficients, so one cannot just apply an LP solver off-the-shelf.
More specifically in this case, a single PyMC output sample is simply a solution to the LP problem. As parameters of the LP problem vary (according to distributions of your choice), the output samples from PyMC would be different, and the hope is to get a posterior distribution.
The approach above is inspired by this answer; the only difference is that I am hoping to use a true general-purpose solver (in this case cvxopt).
The type of c1 generated with pm.Normal is a numpy array; you just need to strip that out and convert it with float(c1), and then it works fine:
>>> @pm.deterministic
... def my_lp_solver(c1=c1):
...     c = matrix([float(c1), -5.])
...     G = matrix([[2., 1., -1., 0.], [1., 2., 0., -1.]])
...     h = matrix([3., 3., 0., 0.])
...     sol = solvers.lp(c, G, h)
...     solution = np.array(sol['x'], dtype=float).flatten()
...     return solution
...
pcost dcost gap pres dres k/t
0: -8.1223e+00 -1.8293e+01 4e+00 0e+00 7e-01 1e+00
1: -8.8301e+00 -9.4605e+00 2e-01 1e-16 4e-02 3e-02
2: -9.0229e+00 -9.0297e+00 2e-03 2e-16 5e-04 4e-04
3: -9.0248e+00 -9.0248e+00 2e-05 3e-16 5e-06 4e-06
4: -9.0248e+00 -9.0248e+00 2e-07 2e-16 5e-08 4e-08
Optimal solution found.
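To see the type issue in isolation, here is a minimal sketch (behavior as reported in the question; newer cvxopt releases may be more permissive about numpy scalars):

import numpy as np
from cvxopt import matrix

c1 = np.float64(-4.0)            # the kind of value a pm.Normal node passes into the deterministic
# matrix([c1, -5.])              # raises TypeError: invalid type in list, as in the question
print(matrix([float(c1), -5.]))  # works once converted to a built-in float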

Python: An elegant/efficient way to evaluate function over bi-dimensional indexes?

I am very new to Python (in the past I used Mathematica, Maple, or Matlab scripts). I am very impressed by how NumPy can evaluate functions over arrays, but I am having problems trying to implement it in several dimensions. My question is very simple (please don't laugh): is there a more elegant and efficient way to evaluate some function f (which is defined over R^2) without using loops?
import numpy
M = numpy.zeros((10,10))
for i in range(0,10):
    for j in range(0,10):
        M[i,j] = f(i,j)
return M
The goal when coding with numpy is to implement your computation on the whole array, as much as possible. So if your function is, for example, f(x,y) = x**2 +2*y and you want to apply it to all integer pairs x,y in [0,10]x[0,10], do:
x,y = np.mgrid[0:10, 0:10]
fxy = x**2 + 2*y
If you don't find a way to express your function in such a way, then:
Ask how to do it (and state explicitly the function definition)
use numpy.vectorize
Same example using vectorize:
def f(x,y): return x**2 + 2*y
x,y = np.mgrid[0:10, 0:10]
fxy = np.vectorize(f)(x.ravel(),y.ravel()).reshape(x.shape)
Note that in practice I only use vectorize, similarly to Python's map, when the contents of the arrays are not numbers. A typical example is computing the length of every list in an array of lists:
# construct a sample list of lists
list_of_lists = np.array([range(i) for i in range(1000)])
print np.vectorize(len)(list_of_lists)
# [0,1 ... 998,999]
Yes, many numpy functions operate on N-dimensional arrays. Take this example:
>>> M = numpy.zeros((3,3))
>>> M[0][0] = 1
>>> M[2][2] = 1
>>> M
array([[ 1., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 1.]])
>>> M > 0.5
array([[ True, False, False],
[False, False, False],
[False, False, True]], dtype=bool)
>>> numpy.sum(M)
2.0
Note the difference between numpy.sum, which operates on N-dimensional arrays, and sum, which only goes 1 level deep:
>>> sum(M)
array([ 1., 0., 1.])
So if you build your function f() out of operations that work on n-dimensional arrays, then f() itself will work on n-dimensional arrays.
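For instance, a minimal sketch of that principle (f here is an arbitrary example built only from ufuncs and arithmetic, not the asker's actual function):

import numpy as np

def f(x, y):
    # composed entirely of array operations, so it broadcasts element-wise
    return np.sin(x) * np.cos(y) + x**2

i, j = np.mgrid[0:10, 0:10]  # index grids, as in the mgrid answer above
M = f(i, j)                  # equivalent to the double loop, with no Python-level iteration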
You can also use numpy multi-dimension slicing, like below. You just provide slices for each dimension:
arr = np.zeros((5,5)) # 5 rows, 5 columns
# update only first column
arr[:,0] = 1
# update only last row ... same as arr[-1] = 1
arr[-1,:] = 1
# update center
arr[1:-1, 1:-1] = 1
print arr
output:
array([[ 1., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.]])
A pure python answer, not depending upon numpy tools, is to make the Cartesian Product of two sequences:
from itertools import product

for i, j in product(range(0, 10), range(0, 10)):
    M[i,j] = f(i,j)
Edit: Actually, I should have read the question properly. This still uses loops, just one less loop.
