cexprtk with multiprocessing Python - python

I am using the cexprtk wrapper in Python to evaluate arithmetic expressions, as it offers very fast evaluation compared to the standard eval(). For a large list of expressions the initial overhead is cumbersome, because it has to compile all the terms, which can take a long time.
However, it offers a very nice feature whereby you only need to compile once and can then re-evaluate the expressions later with different values for the variables, which is what I want to do.
I was wondering if it was possible to apply Python multiprocessing to this compilation process? I would break apart the large list of arithmetic expressions into sub-lists and feed them separately into functions which apply the cexprtk compilation to the different lists. These can then be run in parallel.
I attempted to do this, but the output is nan whatever I try. Here is a very simple example showing a working cexprtk code without multiprocessing:
import cexprtk
st = cexprtk.Symbol_Table({"W":1, "X":3, "Y":1, "Z":2}, add_constants= True)
L = ['W+X+Y+Z','Y^2*W+Z']
A = [cexprtk.Expression(x, st) for x in L]
print(A[0]()) ## This gives 7 which is correct
print(A[1]()) ## This gives 3 which is correct
Now here is the attempt at using multiprocessing with two lists and two queues:
from multiprocessing import Process, Queue
import cexprtk
st = cexprtk.Symbol_Table({"W":1, "X":3, "Y":1, "Z":2}, add_constants= True)
## Define two lists
L = ['W+X+Y+Z','Y^2*W+Z']
L2 = ['W^5+Z-Y','Y^7+20-X']
## Define functions and put into queue (que)
def myfunc1(que):
    lst1 = [cexprtk.Expression(x, st) for x in L]
    que.put(lst1)
def myfunc2(que):
    lst2 = [cexprtk.Expression(x, st) for x in L2]
    que.put(lst2)
queue1 = Queue()
queue2 = Queue()
p1 = Process(target= myfunc1, args= (queue1,))
p2 = Process(target= myfunc2, args= (queue2,))
p1.start()
p2.start()
ans1 = queue1.get()
ans2 = queue2.get()
print(ans1[0]()) # Gives nan
print(ans2[0]()) # Gives nan
I feel as though this falls in the category of "embarrassingly parallel problems", as the lists are completely separate and no communication is needed between the processes. I have used this exact method of multiprocessing before with great success, but in this instance it is not giving an answer, and as there are no error messages I have no feedback to work with.
If you use eval() instead it works without issue, so I assume it is the cexprtk wrapper. Is there a way to achieve what I am after? Or is the Python -> C++ -> Python round trip too much for multiprocessing?


How to ensure multiprocessing code uses the configured CPU cores?

I use a multiprocessing Pool to run my code in parallel. I first tried 4 cores on an HPC system with sub. With 4 cores, the run time is about 4 times shorter than with 1 core. When I check with qstat, the job sometimes uses 4 cores but afterwards only 1 core, with exactly the same code.
Could you please advise what is wrong with my code or the system?
import pandas as pd
import numpy as np
from multiprocessing import Pool
from datetime import datetime
t1 = pd.read_csv("template.csv",header=None)
s1 = pd.read_csv("/home/donp/dude_1000_raw_raw/dude_1000_raw_raw_adfr.csv")
s2 = pd.read_csv("/home/donp/dude_1000_raw_raw/dude_1000_raw_raw_dock.csv")
s3 = pd.read_csv("/home/donp/dude_1000_raw_raw/dude_1000_raw_raw_gemdock.csv")
s4 = pd.read_csv("/home/donp/dude_1000_raw_raw/dude_1000_raw_raw_ledock.csv")
s5 = pd.read_csv("/home/donp/dude_1000_raw_raw/dude_1000_raw_raw_plants.csv")
s6 = pd.read_csv("/home/donp/dude_1000_raw_raw/dude_1000_raw_raw_psovina.csv")
s7 = pd.read_csv("/home/donp/dude_1000_raw_raw/dude_1000_raw_raw_quickvina2.csv")
s8 = pd.read_csv("/home/donp/dude_1000_raw_raw/dude_1000_raw_raw_smina.csv")
s9 = pd.read_csv("/home/donp/dude_1000_raw_raw/dude_1000_raw_raw_vina.csv")
s10 = pd.read_csv("/home/donp/dude_1000_raw_raw/dude_1000_raw_raw_vinaxb.csv")
#number of core and arrays
n = 4
m = (len(t1) // n)+1
g= m*n - len(t1)
for g1 in range(g):
def block_linear(i):
    temp = pd.DataFrame(np.zeros((m,29)))
    for a in range(0,m):
        sum_matrix = (t1.iloc[a,0]*s1) + (t1.iloc[a,1]*s2) + (t1.iloc[a,2]*s3)+ (t1.iloc[a,3]*s4) + (t1.iloc[a,4]*s5) + (t1.iloc[a,5]*s6) + (t1.iloc[a,6]*s7) + (t1.iloc[a,7]*s8) + (t1.iloc[a,8]*s9) + (t1.iloc[a,9]*s10)
        rank_sum= pd.DataFrame.rank(sum_matrix,axis=0,ascending=True,method='min') #real-True
        temp.iloc[a,:] = rank_sum.iloc[999].values
    temp['median'] = temp.median(axis=1)
    temp.index = range(i*m,(i+1)*m)
    return temp
if __name__ == '__main__':
    pool = Pool(processes=n)
    results = pool.map(block_linear,range(0,n))
The main idea is to cut the large table into smaller ones, run the calculations, and combine the results.
Without more detail on how you use multiprocessing's Pool, it is really difficult to understand the problem and help. Please note that Pool does not guarantee parallelization: the apply function, for example, uses only one worker of the Pool and blocks until the call has finished. You can check out more details about it here and there.
But assuming you are using the library properly, you should make sure your code is fully parallelizable: an I/O operation on disk, for example, can bottleneck your parallelization and make your code run in only one process at a time.
I hope it helped.
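To make the difference between the blocking and the parallel Pool calls concrete, here is a minimal sketch. The function square and the helpers run_blocking and run_parallel are made up for illustration and are not from the question. Each result carries the worker's process id, so you can see which process did the work.

```python
from multiprocessing import Pool
import os


def square(x):
    # Hypothetical workload: return the result and the worker's process id.
    return x * x, os.getpid()


def run_blocking(pool, values):
    # Pool.apply waits for each call to finish before the next one is
    # submitted, so the calls run one after another.
    return [pool.apply(square, (v,)) for v in values]


def run_parallel(pool, values):
    # Pool.map distributes the values across the workers in the pool.
    return pool.map(square, values)


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        blocking = run_blocking(pool, range(8))
        parallel = run_parallel(pool, range(8))
    print([result for result, _ in blocking])
    print(sorted({pid for _, pid in parallel}))
```

Both calls return the same values. Only the second one is allowed to keep several workers busy at the same time.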
Since you provided more details about your problem, I can give more specific tips:
The first thing is that your code is not parallel at all. You are just calling the same function N times. This is not how multiprocessing should work.
Instead, the part that should run in parallel is the one that usually sits in a for loop, like the one you have inside block_linear().
So, here is what I recommend:
You should change your code to first calculate all your weighted sums and only after that do the rest of the operations. This will help a lot with parallelization.
So, put this operation in a function:
def weighted_sum(column,df2):
    temp = pd.DataFrame(np.zeros(m))
    for a in range(0,m):
        result = (t1.iloc[a,column]*df2)
        temp.iloc[a] = result
    return temp
Then you use pool.starmap to run the function in parallel for the 10 dataframes you have, something like this:
results = pool.starmap(weighted_sum,[(0,s1),(1,s2),(2,s3),....,(9,s10)])
ps: pool.starmap is similar to pool.map but accepts a list of tuple arguments. You can have more details about it here.
Last but not least, you should operate on your results to finish your calculations. Since you will have one weighted_sum per column, you can apply a sum over the columns and then the rank_sum.
This is not fully runnable code that solves your problem, but a general guide to how you should restructure your code to benefit from multiprocessing. I recommend testing it on a subsample of the data frames to make sure it works properly before you run it on all your data.
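As a rough illustration of the starmap idea, here is a small self-contained sketch that uses plain nested lists instead of pandas data frames. The names scale_table, add_tables and weighted_sum_tables are invented for this example, and the tiny tables stand in for s1 ... s10. It only shows the shape of the approach, not a drop-in replacement for the code in the question.

```python
from multiprocessing import Pool


def scale_table(weight, table):
    # Multiply every cell of one table (a list of rows) by its weight.
    return [[weight * cell for cell in row] for row in table]


def add_tables(tables):
    # Element-wise sum of equally shaped tables.
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*tables)]


def weighted_sum_tables(weights, tables, processes=2):
    # Each (weight, table) pair becomes one task; starmap unpacks the pair
    # into the two arguments of scale_table.
    with Pool(processes=processes) as pool:
        scaled = pool.starmap(scale_table, zip(weights, tables))
    return add_tables(scaled)


if __name__ == "__main__":
    tables = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
    print(weighted_sum_tables([0.5, 2], tables))
```

Whether this pays off depends on how large the tables are, since every table has to be pickled and sent to a worker before it can be scaled.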

Parallelize Python's reduce command

In Python I'm running a command of the form
reduce(func, bigArray[1:], bigArray[0])
and I'd like to add parallel processing to speed it up.
I am aware I can do this manually by splitting the array, running processes on the separate portions, and combining the result.
However, given the ubiquity of running reduce in parallel, I wanted to see if there's a native way, or a library, that will do this automatically.
I'm running a single machine with 6 cores.
For anyone stumbling across this, I ended up writing a helper to do it
import multiprocessing
from functools import reduce

# reduceFunc is your own two-argument reduction function.
def parallelReduce(l, numCPUs, connection=None):
    if numCPUs == 1 or len(l) <= 100:
        returnVal= reduce(reduceFunc, l[1:], l[0])
        if connection != None:
            connection.send(returnVal)
        return returnVal
    parent1, child1 = multiprocessing.Pipe()
    parent2, child2 = multiprocessing.Pipe()
    p1 = multiprocessing.Process(target=parallelReduce, args=(l[:len(l) // 2], numCPUs // 2, child1, ) )
    p2 = multiprocessing.Process(target=parallelReduce, args=(l[len(l) // 2:], numCPUs // 2 + numCPUs%2, child2, ) )
    p1.start()
    p2.start()
    leftReturn, rightReturn = parent1.recv(), parent2.recv()
    p1.join()
    p2.join()
    returnVal = reduceFunc(leftReturn, rightReturn)
    if connection != None:
        connection.send(returnVal)
    return returnVal
Note that you can get the number of CPUs with multiprocessing.cpu_count().
Using this function showed substantial performance increase over the serial version.
If you're able to combine map and reduce (or want to concatenate the results instead of doing a more general reduce) you could use mr4mp:
The code for the _reduce function inside the class appears to implement parallel processing by using multiprocessing.pool to parallelize the usual reduce, roughly as follows:
reduce(<Function used to reduce>, pool.map(partial(reduce, <function used to reduce>), <List of results to reduce>))
I haven't tried it yet but it seems the syntax is:
mr4mp.pool().mapreduce(<Function to be mapped>,<Function used to reduce>, <List of entities to apply function on>)
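The "reduce the partial reduces" pattern described above can also be sketched with the standard library alone. This is a minimal sketch under two assumptions: the reduction function is associative, so regrouping the operands does not change the result, and it is picklable, so it can be sent to the workers. The names split_chunks and parallel_reduce are made up for this example.

```python
from functools import partial, reduce
from multiprocessing import Pool
import operator


def split_chunks(items, n_chunks):
    # Split items into at most n_chunks contiguous, non-empty slices.
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_reduce(func, items, processes=2):
    # Reduce every chunk in a worker, then reduce the partial results.
    # Assumes func is associative and items is not empty.
    chunks = split_chunks(items, processes)
    with Pool(processes=processes) as pool:
        partials = pool.map(partial(reduce, func), chunks)
    return reduce(func, partials)


if __name__ == "__main__":
    data = list(range(1, 1001))
    print(parallel_reduce(operator.add, data, processes=4))
```

For a cheap function such as addition the cost of pickling the chunks will usually outweigh the gain. The pattern is more likely to help when each call to the reduction function is expensive.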

multiprocessing.Pool.map() not working as expected

I understand from simple examples that Pool.map is supposed to behave identically to the 'normal' python code below except in parallel:
def f(x):
    # complicated processing
    return x+1
y_serial = []
x = range(100)
for i in x: y_serial += [f(i)]
y_parallel = pool.map(f, x)
# y_serial == y_parallel!
However I have two bits of code that I believe should follow this example:
#Linear version
price_datas = []
for csv_file in loop_through_zips(data_directory):
    price_datas += [process_bf_data_csv(csv_file)]
#Parallel version
p = Pool()
price_data_parallel = p.map(process_bf_data_csv, loop_through_zips(data_directory))
However the Parallel code doesn't work whereas the Linear code does. From what I can observe, the parallel version appears to be looping through the generator (it's printing out log lines from the generator function) but then not actually performing the "process_bf_data_csv" function. What am I doing wrong here?
.map first pulls every value from your generator into a list and only then starts handing out the work.
Try waiting longer (until the generator runs out), or use multithreading and a queue instead.
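Another option worth trying is Pool.imap, which returns an iterator of results instead of a finished list. In CPython, Pool.map converts an input without a length into a list before dispatching anything, while imap does not take that step. In the sketch below, numbered_items and work are hypothetical stand-ins for loop_through_zips and process_bf_data_csv.

```python
from multiprocessing import Pool


def numbered_items(n):
    # Hypothetical stand-in for a generator such as loop_through_zips.
    for i in range(n):
        yield i


def work(x):
    # Hypothetical stand-in for process_bf_data_csv.
    return x + 1


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # Results arrive in input order and can be consumed one by one,
        # while later inputs are still being processed.
        for result in pool.imap(work, numbered_items(5)):
            print(result)
```

If the order of the results does not matter, imap_unordered has the same interface and yields each result as soon as it is ready.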

How to get multiple return objects from a function used in multiprocessing?

This may be a very easy question, but it has definitely worn me out.
To use multiprocessing, I wrote the following code. The main function creates two processes which both use the same function, called prepare_input_data(), but process different input datasets. This function must return multiple objects and values for each input, to be used in the next steps of the code (not included here).
What I want is to get more than one value or object as a return from the function I am using in multiprocessing.
def prepare_input_data(inputdata_address,temporary_address, output):
    name = p.name
    data_address = inputdata_address
    layer = loading_layer(data_address)
    preprocessing_object = Preprocessing(layer)
    nodes= preprocessing_object.node_extraction(layer)
    tree = preprocessing_object.index_nodes()
    roundabouts_dict , roundabouts_tree= find_roundabouts(layer.address, layer, temporary_address)
    #return layer, nodes, tree, roundabouts_dict, roundabouts_tree
    #return [layer, nodes, tree, roundabouts_dict, roundabouts_tree]
    output.put( [layer, nodes, tree, roundabouts_dict, roundabouts_tree])
if __name__ == '__main__':
print "the data preparation in multi processes starts here"
processes =[]
ref_process = Process(name ="reference", target=prepare_input_data, args=("D:/Ehsan/Skane/Input/Skane_data/Under_processing/identicals/clipped/test/NVDB_test3.shp", "D:/Ehsan/Skane/Input/Skane_data/Under_processing/temporary/",output))
cor_process = Process(name ="corresponding", target=prepare_input_data, args=("D:/Ehsan/Skane/Input/Skane_data/Under_processing/identicals/clipped/test/OSM_test3.shp", "D:/Ehsan/Skane/Input/Skane_data/Under_processing/temporary/",output))
for p in processes:
print "the whole data preparation took ",time.time()-start_time
for p in processes:
#ref_info = outputs[0]
# ref_nodes=ref_info[0]
Previous ERROR
When I use return, ref_info[0] is of NoneType.
Based on the answer here, I changed it to a Queue object passed to the function; then I used put() to add the results and get() to retrieve them for further processing.
Traceback (most recent call last):
File "C:\Python27\ArcGISx6410.2\Lib\multiprocessing\queues.py", line 262, in _feed
UnpickleableError: Cannot pickle <type 'geoprocessing spatial reference object'> objects
Could you please help me solve how to return more than one value from a function in multiprocessing?
Parallel programming with shared state is a rocky road that even experienced programmers get wrong. A much more beginner-friendly method is to copy data around. This is the only way to move data between subprocesses (not quite true, but that's an advanced topic).
Citing https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes, you'll want to set up a multiprocessing.Queue to fill with the returned data from each of your subprocesses. Afterward you can pass the queue on to the next stage to be read from.
For multiple different datasets, such as your layer, nodes, tree, etc, you can use multiple queues to differentiate each return value. It may seem a bit cluttered to use a queue for each, but it's simple and understandable and safe.
Hope that helps.
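As a minimal sketch of the queue approach, the example below lets each process put one tuple of plain, picklable values on a shared queue, tagged with a name so the results can be told apart. The functions prepare and collect and the toy datasets are invented for illustration. Objects that cannot be pickled (like the spatial reference object in the traceback) would still fail here and would have to be converted to plain data first.

```python
from multiprocessing import Process, Queue


def prepare(name, values, output):
    # Put one tuple per process; the name says which dataset it belongs to.
    output.put((name, min(values), max(values), sum(values)))


def collect(output, count):
    # Read exactly `count` results and index them by their tag.
    results = {}
    for _ in range(count):
        name, low, high, total = output.get()
        results[name] = (low, high, total)
    return results


if __name__ == "__main__":
    output = Queue()
    jobs = {"reference": [3, 1, 2], "corresponding": [10, 30, 20]}
    processes = [Process(target=prepare, args=(name, values, output))
                 for name, values in jobs.items()]
    for p in processes:
        p.start()
    # Read the results before joining, so the workers never wait on a full queue.
    results = collect(output, len(processes))
    for p in processes:
        p.join()
    print(results["reference"])
    print(results["corresponding"])
```

Tagging each result is one way to avoid relying on the order in which the processes finish. Using one queue per process, as suggested above, works as well.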
If you use the Process class from jpe_types.paralel, it will return the return value of the process's target function, like so:
import jpe_types.paralel
def fun():
    return 4, 23.4, "hi", None
if __name__ == "__main__":
    p = jpe_types.paralel.Process(target = fun)
Otherwise you could use a pipe:
import multiprocessing as mp
def fun(returner):
    returner.send((1, 23,"hi", None))
if __name__ == "__main__":
    processes = []
    for i in range(2):
        sender, recever = mp.Pipe()
        p = mp.Process(target = fun, args=(sender,))
        p.start()
        processes.append((p, recever))
    resses = []
    for p, rcver in processes:
        resses.append(rcver.recv())
        p.join()
Using the connection guarantees that the return values don't get scrambled.
If you are looking to get multiple return values from multiprocessing, then you can do that. Here's a simple example, first in serial python, then with multiprocessing:
>>> a,b = range(10), range(10,0,-1)
>>> import math
>>> map(math.modf, (1.*i/j for i,j in zip(a,b)))
[(0.0, 0.0), (0.1111111111111111, 0.0), (0.25, 0.0), (0.42857142857142855, 0.0), (0.6666666666666666, 0.0), (0.0, 1.0), (0.5, 1.0), (0.3333333333333335, 2.0), (0.0, 4.0), (0.0, 9.0)]
>>> from multiprocessing import Pool
>>> res = Pool().imap(math.modf, (1.*i/j for i,j in zip(a,b)))
>>> for i,ai in enumerate(a):
...     x,y = res.next()
...     print("{x},{y} = modf({u}/{d})".format(x=x,y=y,u=ai,d=b[i]))
...
0.0,0.0 = modf(0/10)
0.111111111111,0.0 = modf(1/9)
0.25,0.0 = modf(2/8)
0.428571428571,0.0 = modf(3/7)
0.666666666667,0.0 = modf(4/6)
0.0,1.0 = modf(5/5)
0.5,1.0 = modf(6/4)
0.333333333333,2.0 = modf(7/3)
0.0,4.0 = modf(8/2)
0.0,9.0 = modf(9/1)
So to get multiple values in the return from a function with multiprocessing, you only need to have a function that returns multiple values… you will just get the values back as a list of tuples.
The major issue with multiprocessing, as you can see from your error, is that most functions don't serialize. So, if you really want to do what it seems like you want to do, I'd strongly suggest you use pathos (as discussed below). The largest barrier you will have with multiprocessing is that the functions you pass as the target must be serializable. There are several modifications you can make to your prepare_input_data function, the first of which is to make sure it is encapsulated. If your function is not fully encapsulated (e.g. it has name-reference lookups outside of its own scope), then it probably won't pickle with pickle. That means you need to include all imports inside the target function and pass any other variables in through the function input. The error you are seeing (UnpickleableError) is due to your target function and its dependencies not being serializable, and not because you can't return multiple values from multiprocessing.
While I'd encapsulate the target function anyway as a matter of good practice, it can be a bit tedious and could slow your code down a hair. I also suggest that you convert your code to use dill and pathos.multiprocessing: dill is an advanced serializer that can pickle almost all python objects, and pathos provides a multiprocessing fork that uses dill. That way, you can pass most python objects in the pipe (i.e. apply) or the map that is available from the Pool object, and not worry too much about refactoring your code to make sure plain old pickle and multiprocessing can handle it.
Also, I'd use an asynchronous map instead of doing what you are doing above. pathos.multiprocessing has the ability to take multiple arguments in the map function, so you don't need to wrap them in the tuple args as you've done above. The interface should be much cleaner with an asynchronous map, and you can return multiple values if you need to; just pack them in a tuple.
Here are some examples that should demonstrate what I'm referring to above.
Return multiple values:
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> def addsub(x,y):
...     return x+y, x-y
...
>>> a,b = range(10),range(-10,10,2)
>>> res = Pool().imap(addsub, a, b)
>>> for i,ai in enumerate(a):
...     add,sub = res.next()
...     print("{a} + {b} = {p}; {a} - {b} = {m}".format(a=ai,b=b[i],p=add,m=sub))
...
0 + -10 = -10; 0 - -10 = 10
1 + -8 = -7; 1 - -8 = 9
2 + -6 = -4; 2 - -6 = 8
3 + -4 = -1; 3 - -4 = 7
4 + -2 = 2; 4 - -2 = 6
5 + 0 = 5; 5 - 0 = 5
6 + 2 = 8; 6 - 2 = 4
7 + 4 = 11; 7 - 4 = 3
8 + 6 = 14; 8 - 6 = 2
9 + 8 = 17; 9 - 8 = 1
Asynchronous map:
Python multiprocessing - tracking the process of pool.map operation
Can't pickle <type 'instancemethod'> when using python's multiprocessing Pool.map()
What can multiprocessing and dill do together?
We still can't run your code, but if you post code that can be run, it would be easier to help edit it (using the pathos fork and the asynchronous map or otherwise).
FYI: A release for pathos is a little bit overdue (i.e. late), so if you want to try it, it's best to get the code here: https://github.com/uqfoundation

Speed up Python eval when reading and evaluating list of equations from file

I have put together a simple Python script which reads a large list of algebraic expressions from a text file on separate lines, evaluates the mathematics on each line and puts it into a numpy array. The eigenvalues of this matrix are then found. The parameters A,B,C will then be changed and the program run again, hence a function is used to achieve this.
Some of these text files will have millions of lines of equations, so after profiling the code I found that the eval command accounts for approximately 99% of the execution time. I am aware of the dangers of using eval but this code will only ever be used by myself. All other parts of the code are fast, except the call to eval.
Here is the code where mat_size is set to 500 which represents a 500*500 array meaning 250,000 lines of equations are being read in from the file. I cannot provide the file as it is ~ 0.5GB in size, but have provided an example of what it looks like below and it only uses basic mathematical operations.
import numpy as np
from numpy import *
from scipy.linalg import eigvalsh
mat_size = 500
# Read the file line by line
with open("test_file.txt", 'r') as f:
    lines = f.readlines()
# Function to evaluate the maths and build the numpy array
def my_func(A,B,C):
    lst = []
    for i in lines:
        # Strip the \n
        new = eval(i.rstrip())
        lst.append(new)
    # Build the numpy array
    AA = np.array(lst,dtype=np.float64)
    # Resize it to mat_size
    matt = np.resize(AA,(mat_size,mat_size))
    return matt
# Function to find eigenvalues of matrix
def optimise(x):
    A,B,C = x
    test = my_func(A,B,C)
    ev = eigvalsh(test)
    return ev[-(1)]
# Define what A,B,C are, this can be changed each time the program is run
x0 = [7.65,5.38,4.00]
# Print result
print(optimise(x0))
A few lines of an example input text file: (mat_size can be changed to 2 to run this file)
I am aware eval is usually bad practice and slow, so I looked for other means of achieving a speed-up. I tried the methods outlined here, but none of them appeared to work. I also tried applying sympy to the problem, but that caused a massive slowdown. What is a better way of going about this problem?
From the suggestion to use numexpr instead, I have come across an issue where it grinds to a halt compared to the standard eval. For some instances the matrix elements contain quite a lot of algebraic expressions. Here is an example of just one matrix element, i.e one of the equations in the file (it contains a few more terms not defined in the code above, but can be easily defined at top of the code):
numexpr completely chokes when the matrix elements are of this form, whereas eval evaluates them instantaneously. For just a 10*10 matrix (100 equations in the file) numexpr takes about 78 seconds to process the file, whereas eval takes 0.01 seconds. Profiling the code that uses numexpr reveals that the getExprNames and precompile functions of numexpr cause the issue, with precompile taking 73.5 seconds of the total time and getExprNames taking 3.5 seconds. Why would precompile cause such a bottleneck in this particular calculation, along with getExprNames? Is this module just not well suited to long algebraic expressions?
I found a way to speed eval() up in this particular instance by making use of the multiprocessing library. I read the file in as usual, but then break the list into equal-sized sub-lists, which can be processed separately on different CPUs, and the evaluated sub-lists are recombined at the end. This offers a nice speedup over the original method. I am sure the code below can be simplified/optimised, but for now it works (for instance, what if there is a prime number of list elements? That would mean unequal lists). Some rough benchmarks show it is ~ 3 times faster using the 4 CPUs of my laptop. Here is the code:
from multiprocessing import Process, Queue
with open("test.txt", 'r') as h:
    linesHH = h.readlines()
# Get the number of list elements
size = len(linesHH)
# Break apart the list into the desired number of chunks
chunk_size = size // 4
chunks = [linesHH[x:x+chunk_size] for x in range(0, len(linesHH), chunk_size)]
# Declare variables
A = 0.1
B = 2
C = 2.1
m3 = 1
z3 = 2
# Declare all the functions that process the substrings
def my_funcHH1(A,B,C,que): #add an argument to the function for assigning a queue to each chunk function
    lstHH1 = []
    for i in chunks[0]:
        HH1 = eval(i)
        lstHH1.append(HH1)
    que.put(lstHH1)
def my_funcHH2(A,B,C,que):
    lstHH2 = []
    for i in chunks[1]:
        HH2 = eval(i)
        lstHH2.append(HH2)
    que.put(lstHH2)
def my_funcHH3(A,B,C,que):
    lstHH3 = []
    for i in chunks[2]:
        HH3 = eval(i)
        lstHH3.append(HH3)
    que.put(lstHH3)
def my_funcHH4(A,B,C,que):
    lstHH4 = []
    for i in chunks[3]:
        HH4 = eval(i)
        lstHH4.append(HH4)
    que.put(lstHH4)
queue1 = Queue()
queue2 = Queue()
queue3 = Queue()
queue4 = Queue()
# Declare the processes
p1 = Process(target= my_funcHH1, args= (A,B,C,queue1))
p2 = Process(target= my_funcHH2, args= (A,B,C,queue2))
p3 = Process(target= my_funcHH3, args= (A,B,C,queue3))
p4 = Process(target= my_funcHH4, args= (A,B,C,queue4))
# Start them
p1.start()
p2.start()
p3.start()
p4.start()
HH1 = queue1.get()
HH2 = queue2.get()
HH3 = queue3.get()
HH4 = queue4.get()
# Obtain the final result by combining lists together again.
mergedlist = HH1 + HH2 + HH3 + HH4
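One possible simplification, sketched here under my own assumptions rather than taken from the original code, is to use a single worker function with Pool.map. The chunking below uses ceiling division, so the last chunk is simply shorter when the number of lines does not divide evenly. The names split_chunks, eval_chunk and eval_parallel are invented for this example, and the variables are passed in explicitly as a dictionary instead of being read from globals.

```python
from multiprocessing import Pool


def split_chunks(lines, n_chunks):
    # Contiguous slices; the last one is shorter if sizes do not divide evenly.
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def eval_chunk(args):
    # Evaluate every expression of one chunk with the supplied variable values.
    chunk, variables = args
    return [eval(line, {}, dict(variables)) for line in chunk]


def eval_parallel(lines, variables, processes=4):
    chunks = split_chunks(lines, processes)
    with Pool(processes=processes) as pool:
        parts = pool.map(eval_chunk, [(chunk, variables) for chunk in chunks])
    # pool.map keeps the input order, so concatenating restores the line order.
    return [value for part in parts for value in part]


if __name__ == "__main__":
    lines = ["A+B", "A*C", "B**2", "C-A", "A+B+C"]
    print(eval_parallel(lines, {"A": 0.1, "B": 2, "C": 2.1}))
```

With five lines and four processes this produces chunks of 2, 2 and 1 lines, so only three workers receive a task.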

