Multiprocessing of two for loops - python

I'm struggling with the implementation of an algorithm in Python (2.7) to parallelize the computation of a physics problem. There's a parameter space over two variables (say a and b) over which I'd like to run my program f(a, b), which returns two other variables, c and d.
Up to now, I've worked with two for-loops over a and b to calculate two arrays for c and d, which are then saved as txt documents. Since the parameter space is relatively large and each evaluation of a point f(a, b) takes relatively long, it would be great to use all 8 of my CPU cores for the parameter-space scan.
I've read about multithreading and multiprocessing, and it seems that multiprocessing is what I'm looking for. Do you know of a good code example for this application, or resources for learning the basics of multiprocessing for my rather simple application?

Here is an example of how you might use multiprocessing with a simple function that takes two arguments and returns a tuple of two numbers, and a parameter space over which you want to do the calculation:
from itertools import product
from multiprocessing import Pool
import numpy as np
def f(a, b):
    c = a + b
    d = a * b
    return (c, d)
a_vals = [1, 2, 3, 4, 5, 6]
b_vals = [10, 11, 12, 13, 14, 15, 16, 17]
na = len(a_vals)
nb = len(b_vals)
p = Pool(8) # <== maximum number of simultaneous worker processes
answers = np.array(p.starmap(f, product(a_vals, b_vals))).reshape(na, nb, 2)
c_vals = answers[:,:,0]
d_vals = answers[:,:,1]
This gives the following:
>>> c_vals
array([[11, 12, 13, 14, 15, 16, 17, 18],
       [12, 13, 14, 15, 16, 17, 18, 19],
       [13, 14, 15, 16, 17, 18, 19, 20],
       [14, 15, 16, 17, 18, 19, 20, 21],
       [15, 16, 17, 18, 19, 20, 21, 22],
       [16, 17, 18, 19, 20, 21, 22, 23]])
>>> d_vals
array([[ 10,  11,  12,  13,  14,  15,  16,  17],
       [ 20,  22,  24,  26,  28,  30,  32,  34],
       [ 30,  33,  36,  39,  42,  45,  48,  51],
       [ 40,  44,  48,  52,  56,  60,  64,  68],
       [ 50,  55,  60,  65,  70,  75,  80,  85],
       [ 60,  66,  72,  78,  84,  90,  96, 102]])
The call to p.starmap returns a list of 2-tuples, from which the c and d values are then extracted.
This assumes that you will do your file I/O in the main program after getting back all the results.
Addendum:
If p.starmap is unavailable (Python 2), then instead you can change your function to take a single input (a 2-element tuple):
def f(inputs):
    a, b = inputs
    # ... etc as before ...
and then use p.map in place of p.starmap in the above code.
If it is not convenient to change the function (e.g. it is also called from elsewhere), then you can of course write a wrapper function:
def f_wrap(inputs):
    a, b = inputs
    return f(a, b)
and call that instead.


Print list in specified range Python

I'm new to Python and I have this problem
I have a list of numbers like this:
n = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
I want to print from 11 to 37, that means the output = 11, 13,.... 37.
I tried print(n[11:37]), but of course it prints [37, 41, 43, 47], because those are list indices, not values.
Any ideas, or does Python have a built-in method for this?
This should do the job...
n = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
n.sort()
mylist = [x for x in n if x in range(11, 38)]
print(mylist)
Want to print that as a comma-separated string:
print(str(mylist).strip('[]'))
This will work. (Assuming list is sorted)
print n[n.index(11): n.index(37)+1]
Output:
[11, 13, 17, 19, 23, 29, 31, 37]
Considering your list is ordered and it has no duplicates:
n = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(",".join(map(str,n[n.index(11): n.index(37)+1])))
Using numpy:
import numpy as np
narr = np.array(n)
m = (narr >= 11) & (narr <= 37)
for v in narr[m]:
    print(v)
# or, to get rid of the loop:
print('\n'.join(map(str, narr[m])))
It's pretty simple; since your list is already sorted, you can write
my_list = [x for x in n if x in range(11, 38)]
print(*my_list)
What the '*' does is unpack the list into individual elements, a technique known as unpacking. This produces the result you actually wanted, rather than a printed list.
If your data is sorted, you can use a generator expression with either a range object or chained comparisons:
n = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(*(i for i in n if i in range(11, 38)), sep=', ')
print(*(i for i in n if 11 <= i <= 37), sep=', ')
If your data is unsorted and you can use indices of the first occurrences of each value, you can slice your list:
print(*n[n.index(11): n.index(37)+1], sep=', ')
Result with the data you have provided:
11, 13, 17, 19, 23, 29, 31, 37

Multivariable regression with scipy curve_fit: always off by a systematic amount

I have been doing multivariable linear regression (equations of the form: y=b0+b1*x1+b2*x2+...+bnxn) in python. I could successfully solve the following function:
def MultipleRegressionFunc(X, B):
    y = B[0]
    for i in range(0, len(X)):
        y += X[i] * B[i+1]
    return y
I'll skip the details of that function for now. Suffice it to say that using the curve_fit wrapper in scipy with this function has successfully allowed me to solve systems with many variables.
Now I've been wanting to consider possible interactions between variables, so I've modified the function as follows:
def MultipleRegressionFuncBIS(X, B):
    # Define the terms of the equation
    # The first term is 1*b0 (intercept)
    terms = [1]
    # Add terms for the "non-interaction" part of the equation
    for x in X:
        terms.append(x)
    # Add terms for the 'interaction' part of the equation
    for x in list(combinations(X, 2)):
        terms.append(x[0] * x[1])
    # I'm proceeding in this way because I found that some operations on
    # iterables are not well handled when curve_fit passes numpy arrays
    # to the function
    # Set a float object to hold the result of the calculation
    y = 0.0
    # Iterate through each term in the equation, adding its value to y
    for i in range(0, len(terms)):
        y += B[i] * terms[i]
    return y
I made a wrapper function for the above to be able to pass multiple linear coefficients to it via curve_fit.
def wrapper_func(X, *B):
    return MultipleRegressionFuncBIS(X, B)
Here's some mock input, generated by applying the following formula: 1+2*x1+3*x2+4*x3+5*x1*x2+6*x1*x3+7*x2*x3
x1=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53]
x2=[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54]
x3=[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]
y=[91, 192, 329, 502, 711, 956, 1237, 1554, 1907, 2296, 2721, 3182, 3679, 4212, 4781, 5386, 6027, 6704, 7417, 8166, 8951, 9772, 10629, 11522, 12451, 13416, 14417, 15454, 16527, 17636, 18781, 19962, 21179, 22432, 23721, 25046, 26407, 27804, 29237, 30706, 32211, 33752, 35329, 36942, 38591, 40276, 41997, 43754, 45547, 47376, 49241, 51142, 53079]
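(For reference, the y list above can be regenerated directly from that formula; a minimal sanity-check sketch using only the data as given:)

```python
# the three predictors from the question: consecutive integers offset by 1
x1 = list(range(1, 54))
x2 = list(range(2, 55))
x3 = list(range(3, 56))
# apply y = 1 + 2*x1 + 3*x2 + 4*x3 + 5*x1*x2 + 6*x1*x3 + 7*x2*x3 pointwise
y = [1 + 2*a + 3*b + 4*c + 5*a*b + 6*a*c + 7*b*c
     for a, b, c in zip(x1, x2, x3)]
```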
Then I obtain the linear coefficients by calling the following:
linear_coeffs=list(curve_fit(wrapper_func,[x1,x2,x3],y,p0=[1.1,2.2,3.1,4.1,5.1,6.1,7.1],bounds=(0.0,10.0))[0])
print linear_coeffs
Notice that here the p0 estimates are manually set to values extremely close to the real values to rule out the possibility that curve_fit is having a hard time converging.
Yet, the output for this particular case deviates more than I would expect from the real values (expected: [1.0,2.0,3.0,4.0,5.0,6.0,7.0]):
[1.1020684140370627, 2.1149407566785214, 2.9872182044259676, 3.9734017072175436, 5.0575156518729969, 5.9605293645760549, 6.9819549835509491]
Now, here's my problem. While the coefficients do not perfectly match the input model, that is a secondary concern. I expect some error in real-life examples, although it is puzzling in this noise-free mock example. My main problem is that the error is systematic: in the example above, using the coefficients estimated by curve_fit, the residuals are systematically equal to 0.10206841 for all values of x1, x2, x3. Other mock datasets produce different, but still systematic, residuals.
Can you think of any explanation for this systematic error?
I post here because I suspect it is a coding issue rather than a statistical one. I'm very willing to move this question to Cross Validated if it turns out I made a stats error.
Many thanks!

python combining a range and a list of numbers

range(5, 15) [1, 1, 5, 6, 10, 10, 10, 11, 17, 28]
range(6, 24) [4, 10, 10, 10, 15, 16, 18, 20, 24, 30]
range(7, 41) [9, 18, 19, 23, 23, 26, 28, 40, 42, 44]
range(11, 49) [9, 23, 24, 27, 29, 31, 43, 44, 45, 45]
range(38, 50) [1, 40, 41, 42, 44, 48, 49, 49, 49, 50]
I get the above output from a print command in a function. What I really want is a combined list of the range and the list; for example, for the top line: 5, 6, 7, ..., 14, 1, 1, 5, 6, etc.
The output range comes from
range_draws = range(int(lower), int(upper))
which I naively thought would give a list of the numbers. The other numbers come from a sliced list.
Could someone help me get the desired result?
The range() function returns a special range object to save on memory (no need to keep all the numbers in memory when only the start, end and step size will do). Cast it to a list to 'expand' it:
list(yourrange) + otherlist
To quote the documentation:
The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).
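For the first line of the output above, for instance, that would look like this (a minimal sketch using the numbers from the question):

```python
lower, upper = 5, 15
range_draws = list(range(int(lower), int(upper)))  # expand the range object
sliced = [1, 1, 5, 6, 10, 10, 10, 11, 17, 28]      # the sliced-list numbers
combined = range_draws + sliced                    # plain list concatenation
```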

fast categorization (binning)

I have a huge number of entries, each one a float. These data x are accessible via an iterator. I need to classify all the entries using selections like 10<y<=20, 20<y<=50, ..., where the y are data from another iterable. The number of entries is much larger than the number of selections. At the end I want a dictionary like:
{0: [all events with 10<x<=20],
 1: [all events with 20<x<=50], ...}
or something similar. For example I'm doing:
for x, y in itertools.izip(variable_values, binning_values):
    thebin = binner_function(y)
    self.data[tuple(thebin)].append(x)
in general y is multidimensional.
This is very slow; is there a faster solution, for example with numpy? I think the problem comes from the list.append method I'm using, and not from the binner_function.
A fast way to get the assignments in numpy is using np.digitize:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html
You'd still have to split the resulting assignments up into groups. If x or y is multidimensional, you will have to flatten the arrays first. You could then get the unique bin assignments, and iterate over those in conjunction with np.where to split the assignments up into groups. This will probably be faster if the number of bins is much smaller than the number of elements that need to be binned.
As a somewhat trivial example that you will need to tweak/elaborate on for your particular problem (but which is hopefully enough to get you started with a numpy solution):
In [1]: import numpy as np
In [2]: x = np.random.normal(size=(50,))
In [3]: b = np.linspace(-20,20,50)
In [4]: assign = np.digitize(x,b)
In [5]: assign
Out[5]:
array([23, 25, 25, 25, 24, 26, 24, 26, 23, 24, 25, 23, 26, 25, 27, 25, 25,
       25, 25, 26, 26, 25, 25, 26, 24, 23, 25, 26, 26, 24, 24, 26, 27, 24,
       25, 24, 23, 23, 26, 25, 24, 25, 25, 27, 26, 25, 27, 26, 26, 24])
In [6]: uid = np.unique(assign)
In [7]: adict = {}
In [8]: for ii in uid:
   ...:     adict[ii] = np.where(assign == ii)[0]
   ...:
In [9]: adict
Out[9]:
{23: array([ 0,  8, 11, 25, 36, 37]),
 24: array([ 4,  6,  9, 24, 29, 30, 33, 35, 40, 49]),
 25: array([ 1,  2,  3, 10, 13, 15, 16, 17, 18, 21, 22, 26, 34, 39, 41, 42, 45]),
 26: array([ 5,  7, 12, 19, 20, 23, 27, 28, 31, 38, 44, 47, 48]),
 27: array([14, 32, 43, 46])}
For dealing with flattening and then unflattening numpy arrays, see:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.unravel_index.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel_multi_index.html
np.searchsorted is your friend. As noted in another answer on the same topic, it's currently a good bit faster than digitize, and does the same job.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.searchsorted.html
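A minimal sketch of the searchsorted approach for intervals like 10<y<=20 and 20<y<=50 (toy data, not from the question):

```python
import numpy as np

x = np.array([5.0, 12.0, 25.0, 47.0, 33.0, 20.0])
edges = np.array([10.0, 20.0, 50.0])
# side='left': a value equal to an edge is assigned to the lower bin,
# matching half-open intervals of the form 10 < y <= 20
assign = np.searchsorted(edges, x, side='left')
# bin 0 is everything <= 10, bin 1 is 10 < y <= 20, bin 2 is 20 < y <= 50
# group element indices by bin id, as in the digitize example above
groups = {b: np.where(assign == b)[0] for b in np.unique(assign)}
```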

Array initialization in Python

I want to initialize an array with 10 values starting at X and incrementing by Y. I cannot use range() directly, as it requires the maximum value, not the number of values.
I can do this in a loop, as follows:
a = []
v = X
for i in range(10):
    a.append(v)
    v = v + Y
But I'm certain there's a cute python one liner to do this ...
>>> x = 2
>>> y = 3
>>> [i*y + x for i in range(10)]
[2, 5, 8, 11, 14, 17, 20, 23, 26, 29]
You can use this:
>>> x = 3
>>> y = 4
>>> range(x, x+10*y, y)
[3, 7, 11, 15, 19, 23, 27, 31, 35, 39]
Just another way of doing it
Y=6
X=10
N=10
[y for x,y in zip(range(0,N),itertools.count(X,Y))]
[10, 16, 22, 28, 34, 40, 46, 52, 58, 64]
And yet another way
map(lambda (x,y):y,zip(range(0,N),itertools.count(10,Y)))
[10, 16, 22, 28, 34, 40, 46, 52, 58, 64]
And yet another way
import numpy
numpy.array(range(0,N))*Y+X
array([10, 16, 22, 28, 34, 40, 46, 52, 58, 64])
And even this
C=itertools.count(10,Y)
[C.next() for i in xrange(10)]
[10, 16, 22, 28, 34, 40, 46, 52, 58, 64]
[x + i*y for i in xrange(10)]
will do the job (note: xrange(1, 10) would skip x itself and yield only 9 values).
If I understood your question correctly:
Y = 6
a = [x + Y for x in range(10)]
Edit: Oh, I see I misunderstood the question. Carry on.
