Python: How can you call VTK functions in parallel?

I am trying to speed up my python script, which uses vtk methods (and vtkobjects) for processing of geometric measurements. Since some of my methods include looping over very similar meshes and computing enclosed points for each of them, I simply wanted to parallelise such for loops:
averaged_contained_points = []
for intersection_actor in intersection_actors:
    contained_points = vtk_mesh.points_inside_mesh(point_data=point_data, mesh=intersection_actor.GetMapper().GetInput())
    mean_pos = np.mean(contained_points, axis=0)
    averaged_contained_points.append(mean_pos)
In this case the function vtk_mesh.points_inside_mesh calls vtk.vtkSelectEnclosedPoints() and takes a vtkActor and vtkPolyData as input.
The main question is: How can this be converted to run in parallel?
My initial attempt was to import multiprocessing, but I then switched to pathos.multiprocessing, which seems to have a few advantages, although the two work in much the same way.
The problem is that the code below doesn't work.
from pathos.multiprocessing import ProcessingPool

def _parallel_generate_intersection_avg(inputs):
    point_data = inputs[0]
    intersection_actor = inputs[1]
    contained_points = vtk_mesh.points_inside_mesh(point_data=point_data, mesh=intersection_actor.GetMapper().GetInput())
    # Compare with ==, not "is": integer identity is an implementation detail
    if len(contained_points) == 0:
        return np.array([-1, -1, -1])
    return np.mean(contained_points, axis=0)

pool = ProcessingPool(CPU_COUNT)
inputs = [[point_data, intersection_actor] for intersection_actor in intersection_actors]
averaged_contained_points = pool.map(_parallel_generate_intersection_avg, inputs)
It results in this sort of error:
pickle.PicklingError: Can't pickle 'vtkobject' object: (vtkPolyData)0x111ed5bf0
I have done some research and found that vtkobjects probably can't be pickled:
Can't pickle <type 'instancemethod'> when using python's multiprocessing Pool.map()
However, since I couldn't find a solution for running python vtk code in parallel with the available answers, please let me know if you have any suggestions.
[EDIT]
I didn't try to implement threading, mainly, because I read the comments to the answer in this thread: How do I parallelize a simple Python loop?
Using multiple threads on CPython won't give you better performance
for pure-Python code due to the global interpreter lock (GIL)

It seems that threading doesn't use pickle http://pymotw.com/2/multiprocessing/basics.html:
Unlike with threading, to pass arguments to a multiprocessing Process
the argument must be able to be serialized using pickle.
If you nevertheless want to use multiprocessing or pickle, you should pass a picklable object as input to your function; for example, see tvtk (http://docs.enthought.com/mayavi/tvtk/README.html#pickling-tvtk-objects) or use a string as the input of a vtk reader/writer.
Example:
import vtk

def functionWithPickableInput(inputstring0):
    # Rebuild the vtkPolyData inside the worker from its string form
    r0 = vtk.vtkPolyDataReader()
    r0.ReadFromInputStringOn()
    r0.SetInputString(inputstring0)
    r0.Update()
    polydata0 = r0.GetOutput()
    return functionWithVtkInput(polydata0)

# compute the strings to use as input (each is the content of the corresponding vtk file)
vtkstrings = []
w = vtk.vtkPolyDataWriter()
w.WriteToOutputStringOn()
for mesh in meshes:
    w.SetInputData(mesh)
    w.Update()
    vtkstrings.append(w.GetOutputString())
Here I chose to write everything in memory (see methods in http://www.vtk.org/doc/nightly/html/classvtkDataReader.html#a122da63792e83f8eabc612c2929117c3, http://www.vtk.org/doc/nightly/html/classvtkDataWriter.html#a8972eec261faddc3e8f68b86a1180c71 ).
Of course, you will have to call the writer outside the parallel loop, so you will have to judge whether the overhead of the writer is reasonable relative to the function you want to parallelize. You can also read your polydata from a file if you have RAM problems.
If you are familiar with MPI, have a look at mpi4py: http://www.kitware.com/blog/home/post/716
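The serialise-in-the-parent, rebuild-in-the-worker pattern described in this answer can be sketched without VTK installed. This is only an illustration under stated assumptions: FakeMesh is a hypothetical stand-in that refuses to be pickled (much as a vtkPolyData does), and mesh_to_string / mesh_from_string are made-up helpers that play the role of the vtkPolyDataWriter / vtkPolyDataReader calls above.

```python
import pickle
from multiprocessing import Pool


class FakeMesh:
    """Hypothetical stand-in for a VTK object: it cannot be pickled."""

    def __init__(self, points):
        self.points = list(points)

    def __reduce__(self):
        raise pickle.PicklingError("Can't pickle FakeMesh")


def mesh_to_string(mesh):
    # Plays the role of vtkPolyDataWriter with WriteToOutputStringOn().
    return ";".join(",".join(repr(c) for c in p) for p in mesh.points)


def mesh_from_string(text):
    # Plays the role of vtkPolyDataReader with ReadFromInputStringOn().
    return FakeMesh(tuple(float(c) for c in p.split(",")) for p in text.split(";"))


def mean_position(text):
    # Worker: receives a plain string and rebuilds the mesh locally.
    mesh = mesh_from_string(text)
    n = len(mesh.points)
    return tuple(sum(p[i] for p in mesh.points) / n for i in range(3))


def run_parallel(meshes, processes=2):
    # Serialise in the parent, outside the parallel section.
    strings = [mesh_to_string(m) for m in meshes]
    with Pool(processes) as pool:
        return pool.map(mean_position, strings)
```

Call run_parallel(meshes) from under an if __name__ == '__main__': guard. Only strings cross the process boundary, so the unpicklable objects never reach pickle; the price is the serialisation work done up front in the parent.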

Related

Multiprocessing Z3 in Python

I have a large list of various types of objects I would like Z3 to synthesize in my Python project. Since constraints associated with each object to be synthesized are independent, this process can be completely parallelized. That is, instead of synthesizing one value at a time, if I have a machine with 4 cores, I can synthesize 4 values at the same time. To do this, we must use Python's multiprocessing package instead of threading (due to GIL and the fact that the workload should be CPU-bound).
For simplicity, say I have a simple str synthesizer that synthesizes a new str that is lexicographically less than a given input value, something like this:
def lt_constraint(value):
    solver = Solver()
    var = String("var")
    # do some processing on 'value', which is an input string
    # ... define char and _chars in code here
    template = Concat(Re(StringVal(value[:offset])), char, Star(_chars))
    solver.add(InRe(var, template))
    if solver.check() == sat:
        value = solver.model()[var]
        return convert_to_str(value)
Now if I have a number of values, I want to run the function above in parallel:
from pathos.multiprocessing import ProcessingPool as Pool
with Pool(processes=4) as pool:
    value_list = ['This', 'is', 'an', 'example']
    synthesized_strs = pool.map(lt_constraint, value_list)
I use pathos hoping that it will handle the pickling issue, but I still receive this error:
TypeError: cannot pickle 're.Match' object
which I believe is because Z3 uses methods in re and they need to be pickled when pickling lt_constraint(), but dill cannot pickle those.
Is there any other way to parallelize Z3 for my case (other than implementing pickling myself for re or what not)?
Thanks!
Stack Overflow works best when you include your whole code, so people can experiment with it. Having said that, I had good luck with the following:
from z3 import *
import concurrent.futures

def getVal(value):
    solver = Solver()
    var = Int('var')
    solver.add(var > value)
    if solver.check() == sat:
        return solver.model()[var].as_long()
    else:
        return 'CANT SOLVE'

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(getVal, i) for i in [1, 2, 3]]
    results = [f.result() for f in futures]
    print(results)
This prints:
$ python3.9 a.py
[2, 3, 4]
Without the details of how you constructed your lt_constraint, it's hard to tell whether this will work for your case. But the concurrent.futures library seems to work well with z3, at least as far as simple constraints are concerned. Give this a try and see if it handles your case as well. If not, please post the full code as a minimal reproducible example. See https://stackoverflow.com/help/minimal-reproducible-example
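One reason the thread-based version sidesteps the error is that threads share memory, so arguments are handed over as-is instead of being pickled. The sketch below illustrates this with the standard library only (no z3): an re.Match object, the type named in the error above, cannot be pickled but can still be passed to a ThreadPoolExecutor. The helper names is_picklable and first_word are made up for this illustration.

```python
import pickle
import re
from concurrent.futures import ThreadPoolExecutor


def is_picklable(obj):
    # True if obj could be sent to another process by multiprocessing.
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def first_word(match):
    # Receives the re.Match object itself, not a pickled copy.
    return match.group(0)


matches = [re.match(r"\w+", text) for text in ["This is", "an example"]]

with ThreadPoolExecutor(max_workers=2) as executor:
    words = list(executor.map(first_word, matches))

print(is_picklable(matches[0]))
print(words)
```

Whether threads give a real speed-up for a CPU-bound solver depends on whether the library releases the GIL while it works, which this sketch does not check.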

Running rpy2 in parallel using multiprocessing raises weird exception that cannot be caught

So this is a problem that I have not been able to solve, nor do I know of a good way to make an MCVE out of it. Essentially, it has been briefly discussed here, but as the comments show, there was some disagreement, and the final verdict is still out. Hence I am posting a similar question again, hoping to get a better answer.
Background
I have sensor data from a couple of thousand sensors, that I get every minute. My interest lies in forecasting the data. For this I am using the ARIMA family of forecasting models. Long story short, after discussion with the rest of my research group, we decided to use the Arima function available in the R package forecast, instead of the statsmodels implementation of the same.
Problem Definition
Since I have data from a few thousand sensors, and I would like to analyse at least a whole week's worth of data to begin with, I have seven times as many series as sensors: some 14k sensor-day combinations. Finding the best ARIMA order (the one that minimizes BIC) and forecasting the next day of the week takes about 1 minute for each sensor-day combination, which means upwards of 11 days to process just one week of data on a single core!
This is obviously a waste, when I have 15 more cores just idling away the whole time. So, obviously, this is a problem for parallel processing. Note that each sensor-day combination does not influence any other sensor-day combination. Also, the rest of my code is fairly well profiled, and optimized.
Issue
The issue is that I get this weird error that I cannot catch anywhere. Here is the error reproduced:
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/kartik/miniconda3/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/kartik/miniconda3/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/kartik/miniconda3/lib/python3.5/multiprocessing/pool.py", line 429, in _handle_results
    task = get()
  File "/home/kartik/miniconda3/lib/python3.5/multiprocessing/connection.py", line 251, in recv
    return ForkingPickler.loads(buf.getbuffer())
  File "/home/kartik/miniconda3/lib/python3.5/site-packages/rpy2/robjects/robject.py", line 55, in _reduce_robjectmixin
    rinterface_level=rinterface_factory(rdumps, rtypeof)
ValueError: Mismatch between the serialized object and the expected R type (expected 6 but got 24)
Here are a few characteristics of this error that I have discovered:
It is raised in the rpy2 package
It has something to do with Thread 3. Since Python is zero indexed, I am guessing this is the fourth thread. Therefore, 4x6 = 24, which adds up to the numbers shown in the final error statement
rpy2 is being used in only one place in my code where it might have to recode returned values into Python types. Protecting that line in try: ... except: ... does not catch that exception
The exception is not raised when I ditch the multiprocessing and call the function within a loop
The exception does not crash the program, just suspends it forever (till I Ctrl+C it into terminating)
Nothing I have tried so far has had any effect in resolving the error
Things Tried
I have tried everything from extreme procedural coding, with functions to deal with the least cases (that is only one function to be called in parallel), to extreme encapsulation, where the executable block in the if __name__ == '__main__': calls a function which reads in the data, does the necessary grouping, then passes the groups to another function, which imports multiprocessing and calls another function in parallel, which imports the processing module that imports rpy2, and passes the data to the Arima function in R.
Basically, it doesn't matter if rpy2 is called and initialized deep inside function nests, such that it has no idea another instance might be initialized, or if it is called and initialized once, globally, the error is raised if multiprocessing is involved.
Pseudo Code
Here is an attempt to present at least some basic pseudo code such that the error might be reproduced.
import numpy as np
import pandas as pd

def arima_select(y, order):
    from rpy2 import robjects as ro
    from rpy2.robjects.packages import importr
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()
    forecast = importr('forecast')
    res = forecast.Arima(y, order=ro.FloatVector(order))
    return res

def arima_wrapper(data):
    data = data[['tstamp', 'val']]
    data.set_index('tstamp', inplace=True)
    return arima_select(data, (1,1,1))

def applyParallel(groups, func):
    from multiprocessing import Pool, cpu_count
    with Pool(cpu_count()) as p:
        ret_list = p.map(func, [group for _, group in groups])
    return pd.concat(ret_list, keys=[name for name, _ in groups])

def wrapper():
    df = pd.read_csv('file.csv', parse_dates=[1], infer_datetime_format=True)
    df['day'] = df['tstamp'].dt.day
    res = applyParallel(df.groupby(['sensor', 'day']), arima_wrapper)
    print(res)
Obviously, the above code can be encapsulated further, but I think it should reproduce the error quite accurately.
Data Sample
Here is the output of print(data.head(6)) when placed immediately below data.set_index('tstamp', inplace=True) in arima_wrapper from the pseudo code above:
Or alternatively, data for a sensor, for a whole week can be generated simply with:
def data_gen(start_day):
    r = pd.Series(pd.date_range('2016-09-{}'.format(str(start_day)),
                                periods=24*60, freq='T'),
                  name='tstamp')
    d = pd.Series(np.random.randint(10, 80, 1440), name='val')
    s = pd.Series(['sensor1']*1440, name='sensor')
    return pd.concat([s, r, d], axis=1)

df = pd.concat([data_gen(day) for day in range(1,8)], ignore_index=True)
Observations and Questions
The first observation is that this error is only raised when multiprocessing is involved, not when the function (arima_wrapper) is called in a loop. Therefore, it must be associated somehow with multiprocessing issues. R is not very multiprocess friendly, but when written in the way shown in the pseudo code, each instance of R should not know about the existence of the other instances.
The way the pseudo code is structured, there must be an initialization of rpy2 for each call inside the multiple subprocesses spawned by multiprocessing. If that were true, each instance of rpy2 should have spawned its own instance of R, which should just execute one function, and terminate. That would not raise any errors, because it would be similar to the single threaded operation. Is my understanding here accurate, or am I completely or partially missing the point?
Were all instances of rpy2 to somehow share an instance of R, then I might reasonably expect the error. What is true: is R shared among all instances of rpy2, or is there an instance of R for each instance of rpy2?
How might this issue be overcome?
Since SO hates question threads with multiple questions in them, I will prioritize my questions such that partial answers will be accepted. Here is my priority list:
How might this issue be overcome? A working code example that does not raise the issue will be accepted as answer even if it does not answer any other question, provided no other answer does better, or was posted earlier.
Is my understanding of Python imports accurate, or am I missing the point about multiple instances of R? If I am wrong, how should I edit the import statements such that a new instance is created within each subprocess? Answers to this question are likely to point me towards a probable solution, and will be accepted, provided no answer does better, or was posted earlier
Is R shared among all instances of rpy2 or is there an instance of R for each instance of rpy2? Answers to this question will be accepted only if they lead to a resolution of the problem.
(...) Long story short (...)
Really ?
How might this issue be overcome? A working code example that does not
raise the issue will be accepted as answer even if it does not answer
any other question, provided no other answer does better, or was
posted earlier.
Answers may leave a quite bit of work on your end...
Is my understanding of Python imports accurate, or am
I missing the point about multiple instances of R? If I am wrong, how
should I edit the import statements such that a new instance is
created within each subprocess? Answers to this question are likely to
point me towards a probable solution, and will be accepted, provided
no answer does better, or was posted earlier
Python packages/modules are "uniquely" imported across your process which means that all code using the package/module within the process is using the same single import (you don't have a copy per import in a given block).
Because of this, I'd recommend using an initialization function when creating your Pool rather than repeatedly importing rpy2 and setting up the conversion each time a task is sent to a worker. You may also gain performance if each task is short.
import pandas as pd

# Both names are filled in once per worker process by worker_init()
ro = None
forecast = None

def arima_select(y, order):
    # ro and forecast are the module-level globals set by worker_init()
    res = forecast.Arima(y, order=ro.FloatVector(order))
    return res

def worker_init():
    # Declared global so that the import below binds the module-level name ro
    global ro, forecast
    from rpy2 import robjects as ro
    from rpy2.robjects.packages import importr
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()
    forecast = importr('forecast')

def applyParallel(groups, func):
    from multiprocessing import Pool, cpu_count
    with Pool(cpu_count(), worker_init) as p:
        ret_list = p.map(func, [group for _, group in groups])
    return pd.concat(ret_list, keys=[name for name, _ in groups])
Is R shared among all
instances of rpy2 or is there an instance of R for each instance of
rpy2? Answers to this question will be accepted only if they lead to a
resolution of the problem.
rpy2 makes R available by linking its C shared library. There is one such library per Python process, and it is a stateful library (R is not able to handle concurrency). I think that your issue has more to do with object serialization (see http://rpy2.readthedocs.io/en/version_2.8.x/robjects_serialization.html#object-serialization) than with concurrency.
What is happening is some apparent confusion when reconstructing the R objects after Python pickled the rpy2 object. More specifically, look at the R object types mentioned in the error message:
>>> from rpy2.rinterface import str_typeint
>>> str_typeint(6)
'LANGSXP'
>>> str_typeint(24)
'RAWSXP'
I am guessing that the R object returned by forecast.Arima contains an unevaluated R expression (for example the call that led to that result object), and that when it is serialized and unserialized it comes back as something different (a raw vector of bytes). This is possibly a bug in R's own serialization mechanism (since rpy2 uses it under the hood). For now, to solve your issue, you may want to extract from the forecast.Arima result what you care most about and return only that from the function call run by the worker.
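The idea of returning only the values you need can be sketched independently of rpy2. Everything in the example below is hypothetical: FitResult stands in for a rich result object that does not survive pickling, fit stands in for the model fit, and summarise reduces the result to plain Python values that pickle handles reliably.

```python
import pickle


class FitResult:
    """Hypothetical rich result object that does not survive pickling."""

    def __init__(self, coef, score, call):
        self.coef = coef
        self.score = score
        self.call = call  # e.g. an unevaluated expression


def fit(series):
    # Hypothetical model fit: the "model" is just the mean of the series.
    mean = sum(series) / len(series)
    score = sum((x - mean) ** 2 for x in series)
    return FitResult(coef=[mean], score=score, call=lambda: fit(series))


def summarise(result):
    # Keep only plain floats and lists.
    return {"coef": [float(c) for c in result.coef], "score": float(result.score)}


def worker(series):
    summary = summarise(fit(series))
    # Round-trip through pickle here so that a serialization problem raises
    # inside the task, where an ordinary try/except can see it.
    return pickle.loads(pickle.dumps(summary))
```

Passing worker rather than fit to Pool.map means only the small dictionary travels back to the parent process.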
The following changes to the arima_select function in the pseudo code presented in the question work:
import numpy as np
import pandas as pd
from rpy2 import rinterface as ri
ri.initr()

def arima_select(y, order):
    def rimport(packname):
        as_environment = ri.baseenv['as.environment']
        require = ri.baseenv['require']
        require(ri.StrSexpVector([packname]),
                quiet = ri.BoolSexpVector((True, )))
        packname = ri.StrSexpVector(['package:' + str(packname)])
        pack_env = as_environment(packname)
        return pack_env
    frcst = rimport("forecast")
    # NOTE: const is not defined in this snippet as posted
    args = (('y', ri.FloatSexpVector(y)),
            ('order', ri.FloatSexpVector(order)),
            ('include.constant', ri.StrSexpVector(const)))
    return frcst['Arima'].rcall(args, ri.globalenv)
Keeping the rest of the pseudo code the same. Note that I have since optimized the code further, and it does not require all the functions presented in the question. Basically, the following is necessary and sufficient:
import numpy as np
import pandas as pd
from rpy2 import rinterface as ri
ri.initr()

def arima(y, order=(1,1,1)):
    # This is the same as arima_select above, just renamed to arima
    ...

def applyParallel(groups, func):
    from multiprocessing import Pool, cpu_count
    with Pool(cpu_count(), worker_init) as p:
        ret_list = p.map(func, [group for _, group in groups])
    return pd.concat(ret_list, keys=[name for name, _ in groups])

def main():
    # Create your df in your favorite way:
    def data_gen(start_day):
        r = pd.Series(pd.date_range('2016-09-{}'.format(str(start_day)),
                                    periods=24*60, freq='T'),
                      name='tstamp')
        d = pd.Series(np.random.randint(10, 80, 1440), name='val')
        s = pd.Series(['sensor1']*1440, name='sensor')
        return pd.concat([s, r, d], axis=1)
    df = pd.concat([data_gen(day) for day in range(1,8)], ignore_index=True)
    applyParallel(df.groupby(['sensor', pd.Grouper(key='tstamp', freq='D')]),
                  arima) # Note one may use partial from functools to pass order to arima
Note that I also do not call arima directly from applyParallel since my goal is to find the best model for the given series (for a sensor and day). I use a function arima_wrapper to iterate through the order combinations, and call arima at each iteration.

share variable (data from file) among multiple python scripts with not loaded duplicates

I would like to load a big matrix contained in matrix_file.mtx. The load must happen only once. Once the variable matrix is loaded into memory, I would like many python scripts to share it without duplicates, in order to have a memory-efficient multi-script program in bash (or python itself). I can imagine some pseudocode like this:
# Loading and sharing script:
import share
matrix = open("matrix_file.mtx","r")
share.send_to_shared_ram(matrix, as_variable('matrix'))
# Shared matrix variable processing script_1
import share
pointer_to_matrix = share.share_variable_from_ram('matrix')
type(pointer_to_matrix)
# output: <type 'numpy.ndarray'>
# Shared matrix variable processing script_2
import share
pointer_to_matrix = share.share_variable_from_ram('matrix')
type(pointer_to_matrix)
# output: <type 'numpy.ndarray'>
...
The idea is for pointer_to_matrix to point to matrix in RAM, which is loaded only once for the n scripts (not n times). They are called separately from a bash script (or, if possible, from a python main):
$ python Load_and_share.py
$ python script_1.py -args string &
$ python script_2.py -args string &
$ ...
$ python script_n.py -args string &
I'd also be interested in solutions via hard disk, i.e. the matrix could be stored on disk while the shared object accesses it as required. Nonetheless, the object (a kind of pointer) in RAM should be usable as if it were the whole matrix.
Thank you for your help.
Between the mmap module and numpy.frombuffer, this is fairly easy:
import mmap
import numpy as np

with open("matrix_file.mtx","rb") as matfile:
    mm = mmap.mmap(matfile.fileno(), 0, access=mmap.ACCESS_READ)
    # Optionally, on UNIX-like systems in Py3.3+, add:
    # os.posix_fadvise(matfile.fileno(), 0, len(mm), os.POSIX_FADV_WILLNEED)
    # to trigger background read in of the file to the system cache,
    # minimizing page faults when you use it
matrix = np.frombuffer(mm, np.uint8)
Each process would perform this work separately, and get a read only view of the same memory. You'd change the dtype to something other than uint8 as needed. Switching to ACCESS_WRITE would allow modifications to shared data, though it would require synchronization and possibly explicit calls to mm.flush to actually ensure the data was reflected in other processes.
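For experimenting with the memory-mapping idea without NumPy, here is a standard-library sketch. It swaps numpy.frombuffer for memoryview.cast, and it assumes the matrix has first been dumped to a raw binary file of 8-byte floats (a Matrix Market .mtx file is text, so mapping it directly would expose its characters rather than its numbers). The helper names write_matrix and read_entry are made up for this example.

```python
import mmap
from array import array


def write_matrix(path, rows):
    # One-off conversion step: dump the values as raw 8-byte floats, row by row.
    flat = array("d", (value for row in rows for value in row))
    with open(path, "wb") as f:
        flat.tofile(f)


def read_entry(path, n_cols, i, j):
    # Map the file read-only; the operating system can share the cached pages
    # between all processes that map the same file.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            raw = memoryview(mm)
            view = raw.cast("d")  # reinterpret the bytes as doubles, no copy
            try:
                return view[i * n_cols + j]
            finally:
                # Views must be released before the mapping can be closed.
                view.release()
                raw.release()
```

Each script would call read_entry (or keep its own mapping open for longer) against the same file; nothing is copied into the process until a value is actually read.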
A more complex solution that follows your initial design more closely might be to use multiprocessing.SyncManager to create a connectable shared "server" for data, allowing a single common store of data to be registered with the manager and returned to as many users as desired; creating an Array (based on ctypes types) with the correct type on the manager, then registering a function that returns the same shared Array to all callers would work too (each caller would then convert the returned Array via numpy.frombuffer as before). It's much more involved (it would be easier to have a single Python process initialize an Array, then launch Processes that would share it automatically thanks to fork semantics), but it's the closest to the concept you describe.

How to return multiple values using scipy ndimage.generic_filter in Python?

I'm looking for a way to output multiple values using the generic_filter module in scipy.ndimage like so:
import numpy as np
from scipy import ndimage

a = np.array([range(1,5),range(5,9),range(9,13),range(13,17)])

def summary(a):
    minVal = np.min(a)
    maxVal = np.max(a)
    return [minVal,maxVal]

[arrMin, arrMax] = ndimage.generic_filter(a, summary, footprint=np.ones((3,3)))
But I keep getting the error that a float is expected.
I've played with the 'output' parameter, like so:
arrMin = np.zeros(np.shape(a))
arrMax = np.zeros(np.shape(a))
ndimage.generic_filter(a, summary, footprint=np.ones((3,3)), output = [arrMin, arrMax])
to no avail. I've also tried returning a named tuple, a class, or a dictionary, as per this question, none of which have worked.
Based on the comments, you want to perform multiple filters simultaneously rather than performing them separately.
Unfortunately I do not think this filter works that way. It expects you to return a single filtered output value for each corresponding input value. I looked for a way to do simultaneous filters with numpy/scipy but couldn't find anything.
If you can manage a data flow that allows you to load the image, filter, process and produce some small result data in separate parallel paths (one for each filter), then you may get some benefit from using multiprocessing but if you use it naively it's likely to take more time than doing everything sequentially. If you really have a bottleneck that multiprocessing solves you should also look into sharing your input array rather than loading it in each process.

What is the correct way to clean up when using PyOpenAL?

I'm looking at PyOpenAL for some sound needs with Python (obviously). Documentation is sparse (consisting of a demo script, which doesn't work unmodified) but as far as I can tell, there are two layers. Direct wrapping of OpenAL calls and a lightweight 'pythonic' wrapper - it is the latter I'm concerned with. Specifically, how do you clean up correctly? If we take a small example:
import time
import pyopenal

pyopenal.init(None)
l = pyopenal.Listener(22050)
b = pyopenal.WaveBuffer("somefile.wav")
s = pyopenal.Source()
s.buffer = b
s.looping = False
s.play()
while s.get_state() == pyopenal.AL_PLAYING:
    time.sleep(1)
pyopenal.quit()
As it is, a message is printed to the terminal along the lines of "one source not deleted, one buffer not deleted". But I am assuming that we can't use the native OpenAL calls with these objects, so how do I clean up correctly?
EDIT:
I eventually just ditched pyopenal and wrote a small ctypes wrapper over OpenAL and alure (pyopenal exposes the straight OpenAL functions, but I kept getting SIGFPE). Still curious as to what I was supposed to do here.
# release the references to l, b and s
del l
del b
del s
# now the WaveBuffer and Source should be destroyed, so we can call:
pyopenal.quit()
Probably the destructor of pyopenal calls quit() before exit, so you don't need to call it yourself.
