as someone who explores the new and mighty world of Python, I am running into an understanding problem for my coding and it would be great if someone could help me on this one.
To make my problem simple I have made an example.
Lets say, I have two functions, running via multiprocessing simultaneously. One is a permanent data listener and one prints the value of it out. In addition I have one object which owns the data, data is set via set/get. So the challenge is how both function can access the data without putting it to global. I guess my lack of understanding is somewhere in how to transfer the object between functions.
NOTE : both functions do not need to be in sync and the while is just for endless loop. It just how to bring the data over.
This gives a code like (I know it is not working, just to get the idea) :
import multiprocessing
#simply a data object
class data(object):
def __init__(self):
self.__value = 1
def set_value(self, value):
self.__value = value
def get_value(self):
return self.__value
# Data listener
def f1(count):
zae = 0
while True:
zae += 1
count.set_value = zae
def f2(count):
while True:
print (count.get_value)
#MainPart
if __name__ == '__main__':
print('start')
count = data()
jobs = []
p1 = multiprocessing.Process(target =f1(count))
p2 = multiprocessing.Process(target =f2(count))
jobs.append(p1)
jobs.append(p2)
p1.start()
p2.start()
print ('end')
Please enlight me,
regards
AdrianMonk
This looks like a neat case for using memory-mapped files.
When a process memory-maps a file (say F) and another process comes along and maps the same file (i.e maps to F.fileno() too), then exactly the same block of memory is mapped into the second process's address space. This allows the two processes to exchange information extremely rapidly by writing into the shared memory. .
Of course you have to manage the proper access (read,write,etc) in your mappings, and then it is just a matter of properly polling/writing the proper locations in the file to satisfy the logic of your application
(see http://docs.python.org/2/library/mmap.html).
The communication channels Pipe or Queue from multiprocessing are designed to solve exactly this kind of problem
Related
I'm using praw to get new submissions from reddit:
for submission in submissions:
print('Submission being evaluated:', submission.id)
p = Process(target = evaluate, args = (submission.id, lock))
p.start()
When using this code I sometimes get ids that link to older submissions.
So I changed my script to check if the submissions are new:
for submission in submissions:
if ((time.time()-submission.created) < 15): #if submission is new
lock.acquire()
print('Submission being evaluated:', submission.id)
lock.release()
p = Process(target = evaluate, args = (submission.id, lock))
p.start()
else:
lock.acquire()
print("Submission "+submission.id+" was older than 15 seconds")
lock.release()
But for an extended period of time the else part didn't get executed even though I got a fair amount of old submission ids with the previous script.
So my question is, when I run print(submission.id) is it running in the background when the subprocess is created, maybe causing a problem and changing the value of submission.id or is it just some coincidence that with the second script I got no old submissions?
Thanks in advance!
To answer the question in the title, no.
sys.stdout, the stream print writes to, is usually line buffered though (though that shouldn't matter in this case, as print writes a newline character (unless told not to)), and is shared between threads and subprocesses (unless explicitly unshared).
Without knowing more about the code around this, it's hard to say more. (Who knows, maybe you have a background thread somewhere in there that sneakily changes submission.id?)
EDIT:
The new information in the original post, namely that
print('Submission being evaluated:', submission.id)
is being printed, not
print(submission.id)
is critical.
Each argument of a print() call is printed atomically, but if two processes or threads are print()ing simultaneously, let's say print('a', 'b'), it's entirely possible that you get a a b b instead of a b a b.
Here is a function I like to use for safely printing to the console. That is the proper use of Lock(), use it around very simple operations. I actually use it in a class, so I dont pass around the lock object as in the below example, but same principle.
Also, the answer is likely yes, but it's more uncertain than certain. Are you also using a lock everytime you read and write submission.id? Generally, if you have an object shared by multiple processes, its best to do this, and, also best to use the Value class from the multiprocessing library, since value objects are designed to be safely shared between processes. Below is a trivial but clear example without processes (thats your job!).
from multiprocessing import Process, Lock, #...
myLock = Lock()
myID = ""
def SetID(id, lock):
with lock:
id = "set with lock"
return
def SafePrint(msg, lock):
lock.acquire()
print(msg)
lock.release()
return
SetID(myID, myLock)
SafePrint(myID, myLock)
I am fairly new to Python, and my experience is specific to its use in Powerflow modelling through the API provided in Siemens PSS/e. I have a script that I have been using for several years that runs some simulation on a large data set.
In order to get to finish quickly, I usually split the inputs up into multiple parts, then run multiple instances of the script in IDLE. Ive recently added a GUI for the inputs, and have refined the code to be more object oriented, creating a class that the GUI passes the inputs to but then works as the original script did.
My question is how do I go about splitting the process from within the program itself rather than making multiple copies? I have read a bit about the mutliprocess module but I am not sure how to apply it to my situation. Essentially I want to be able to instantiate N number of the same object, each running in parallel.
The class I have now (called Bot) is passed a set of arguments and creates a csv output while it runs until it finishes. I have a separate block of code that puts the pieces together at the end but for now I just need to understand the best approach to kicking multiple Bot objects off once I hit run in my GUI. Ther are inputs in the GUI to specify the number of N segments to be used.
I apologize ahead of time if my question is rather vague. Thanks for any information at all as Im sort of stuck and dont know where to look for better answers.
Create a list of configurations:
configurations = [...]
Create a function which takes the relevant configuration, and makes use of your Bot:
def function(configuration):
bot = Bot(configuration)
bot.create_csv()
Create a Pool of workers with however many CPUs you want to use:
from multiprocessing import Pool
pool = Pool(3)
Call the function multiple times which each configuration in your list of configurations.
pool.map(function, configurations)
For example:
from multiprocessing import Pool
import os
class Bot:
def __init__(self, inputs):
self.inputs = inputs
def create_csv(self):
pid = os.getpid()
print('TODO: create csv in process {} using {}'
.format(pid, self.inputs))
def use_bot(inputs):
bot = Bot(inputs)
bot.create_csv()
def main():
configurations = [
['input1_1.txt', 'input1_2.txt'],
['input2_1.txt', 'input2_2.txt'],
['input3_1.txt', 'input3_2.txt']]
pool = Pool(2)
pool.map(use_bot, configurations)
if __name__ == '__main__':
main()
Output:
TODO: create csv in process 10964 using ['input2_1.txt', 'input2_2.txt']
TODO: create csv in process 8616 using ['input1_1.txt', 'input1_2.txt']
TODO: create csv in process 8616 using ['input3_1.txt', 'input3_2.txt']
If you'd like to make life a little less complicated, you can use multiprocess instead of multiprocessing, as there is better support for classes and also for working in the interpreter. You can see below, we can now work directly with a method on a class instance, which is not possible with multiprocessing.
>>> from multiprocess import Pool
>>> import os
>>>
>>> class Bot(object):
... def __init__(self, x):
... self.x = x
... def doit(self, y):
... pid = os.getpid()
... return (pid, self.x + y)
...
>>> p = Pool()
>>> b = Bot(5)
>>> results = p.imap(b.doit, range(4))
>>> print dict(results)
{46552: 7, 46553: 8, 46550: 5, 46551: 6}
>>> p.close()
>>> p.join()
Above, I'm using imap, to get an iterator on the results, which I'll just dump into a dict. Note that you should close your pools after you are done, to clean up. On Windows, you may also want to look at freeze_support, for cases where the code otherwise fails to run.
>>> import multiprocess as mp
>>> mp.freeze_support
For example, if a process is generating an image, and other parallel process is accessing this image through a get method, my intuition tells me that it may be dangerous to access that image while it is being written.
In C++ I have to use mutexes to make sure the image isn't accessed while it is being written, otherwise I'm experiencing random segfaults. but since python has some data protection mechanisms that I don't fully know, I'm not sure if I need to do this.
PSEUDO-CODE:
Class Camera(object):
def __init__(self):
self._capture = camera_handler() #camera_handler is a object that loads the driver and lets you control the camera.
self.new_image = None
self._is_to_run = False
def start(self):
self._is_to_run = True
self._thread = thread_object(target=self.run)
self._thread.start()
def run(self):
while(self._is_to_run):
self.new_image = self._capture.update()
cam = Camera()
cam.start()
while True:
image = cam.new_image
result = do_some_process_image(image)
Is this safe?
First of al, the threading module uses threads, not different processes!
The crucial difference between threads and processes is that the former share an address space (memory), while the latter don't.
The "standard" python implementation (CPython) uses a Global Interpreter Lock to ensure that only one thread at a time can be executing Python bytecode. So for data that can be updated with one one bytecode instruction (like store_fast) you might not need mutexes. When a thread that is modifying such a variable is interrupted, either the store has been done or it hasn't.
But in general you definitely need to protect data structures from reading and modification by multiple threads. If a thread is interrupted while it is in the proces of modifying say a large dictionary and execution is passed to another thread that tries to read from the dictionary, it might find the data in an inconsistant state.
Python shouldn't segfault in situations like this - the global intepreter lock is your friend. However, even in your example there's every chance that a camera interface is going to go into some random C library that doesn't necessarily behave itself. Even then, it doesn't prevent all race conditions in your code and you could easily find inconsistent data because of that.
Python does have Lock which is very low-level and doesn't provide much functionality. Condition is a higher-level type that is better for implementing a mutex-like lock:
# Consume one item
with cv:
while not an_item_is_available():
cv.wait()
get_an_available_item()
# Produce one item
with cv:
make_an_item_available()
cv.notify()
Incidentally, there was a mutex in Python 2, which was deprecated in 2.6 and removed in Python 3.
I think what you are looking for is is the lock Object -> https://docs.python.org/2/library/threading.html#lock-objects
A primitive lock is a synchronization primitive that is not owned by a
particular thread when locked. In Python, it is currently the lowest
level synchronization primitive available, implemented directly by the
thread extension module.
In your example, I would encapsulate the access to the image in a function like this
def image_access(self, image_Data = None):
lock = Lock()
lock.acquire()
temp = self.new_image
try:
if image_Data not None:
self.new_image = image_Data
finally:
lock.release()
if image_Data is None:
return temp
For more on Thread synchronization, see -> http://effbot.org/zone/thread-synchronization.htm
Edit:
Here are the cahnges to the ohter functions
def run(self):
while(self._is_to_run):
self.image_access(self._capture.update())
...
while True:
result = do_some_process_image(cam.image_access())
I have an object that connects to a websocket remote server. I need to do a parallel process at the same time. However, I don't want to create a new connection to the server. Since threads are the easier way to do this, this is what I have been using so far. However, I have been getting a huge latency because of GIL. Can I achieve the same thing as threads but with multiprocesses in parallel?
This is the code that I have:
class WebSocketApp(object):
def on_open(self):
# Create another thread to make sure the commands are always been read
print "Creating thread..."
try:
thread.start_new_thread( self.read_commands,() )
except:
print "Error: Unable to start thread"
Is there an equivalent way to do this with multiprocesses?
Thanks!
The direct equivalent is
import multiprocessing
class WebSocketApp(object):
def on_open(self):
# Create another process to make sure the commands are always been read
print "Creating process..."
try:
multiprocessing.Process(target=self.read_commands,).start()
except:
print "Error: Unable to start process"
However, this doesn't address the "shared memory" aspect, which has to be handled a little differently than it is with threads, where you can just use global variables. You haven't really specified what objects you need to share between processes, so it's hard to say exactly what approach you should take. The multiprocessing documentation does cover ways to deal with shared state, however. Do note that in general it's better to avoid shared state if possible, and just explicitly pass state between the processes, either as an argument to the Process constructor or via a something like a Queue.
You sure can, use something along the lines of:
from multiprocessing import Process
class WebSocketApp(object):
def on_open(self):
# Create another thread to make sure the commands are always been read
print "Creating thread..."
try:
p = Process(target = WebSocketApp.read_commands, args = (self, )) # Add other arguments to this tuple
p.start()
except:
print "Error: Unable to start thread"
It is important to note, however, that as soon as the object is sent to the other process the two objects self and self in the different threads diverge and represent different objects. If you wish to communicate you will need to use something like the included Queue or Pipe in the multiprocessing module.
You may need to keep a reference of all the processes (p in this case) in your main thread in order to be able to communicate that your program is terminating (As a still-running child process will appear to hang the parent when it dies), but that depends on the nature of your program.
If you wish to keep the object the same, you can do one of a few things:
Make all of your object properties either single values or arrays and then do something similar to this:
from multiprocessing import Process, Value, Array
class WebSocketApp(object):
def __init__(self):
self.my_value = Value('d', 0.3)
self.my_array = Array('i', [4 10 4])
# -- Snip --
And then these values should work as shared memory. The types are very restrictive though (You must specify their types)
A different answer is to use a manager:
from multiprocessing import Process, Manager
class WebSocketApp(object):
def __init__(self):
self.my_manager = Manager()
self.my_list = self.my_manager.list()
self.my_dict = self.my_manager.dict()
# -- Snip --
And then self.my_list and self.my_dict act as a shared-memory list and dictionary respectively.
However, the types for both of these approaches can be restrictive so you may have to roll your own technique with a Queue and a Semaphore. But it depends what you're doing.
Check out the multiprocessing documentation for more information.
I have found information on multiprocessing and multithreading in python but I don't understand the basic concepts and all the examples that I found are more difficult than what I'm trying to do.
I have X independent programs that I need to run. I want to launch the first Y programs (where Y is the number of cores of my computer and X>>Y). As soon as one of the independent programs is done, I want the next program to run in the next available core. I thought that this would be straightforward, but I keep getting stuck on it. Any help in solving this problem would be much appreciated.
Edit: Thanks a lot for your answers. I also found another solution using the joblib module that I wanted to share. Suppose that you have a script called 'program.py' that you want to run with different combination of the input parameters (a0,b0,c0) and you want to use all your cores. This is a solution.
import os
from joblib import Parallel, delayed
a0 = arange(0.1,1.1,0.1)
b0 = arange(-1.5,-0.4,0.1)
c0 = arange(1.,5.,0.1)
params = []
for i in range(len(a0)):
for j in range(len(b0)):
for k in range(len(c0)):
params.append((a0[i],b0[j],c0[k]))
def func(parameters):
s = 'python program.py %g %g %g' % parameters[0],parameters[1],parameters[2])
command = os.system(s)
return command
output = Parallel(n_jobs=-1,verbose=1000)(delayed(func)(i) for i in params)
You want to use multiprocessing.Pool, which represents a "pool" of workers (default one per core, though you can specify another number) that do your jobs. You then submit jobs to the pool, and the workers handle them as they become available. The easiest function to use is Pool.map, which runs a given function for each of the arguments in the passed sequence, and returns the result for each argument. If you don't need return values, you could also use apply_async in a loop.
def do_work(arg):
pass # do whatever you actually want to do
def run_battery(args):
# args should be like [arg1, arg2, ...]
pool = multiprocessing.Pool()
ret_vals = pool.map(do_work, arg_tuples)
pool.close()
pool.join()
return ret_vals
If you're trying to call external programs and not just Python functions, use subprocess. For example, this will call cmd_name with the list of arguments passed, raise an exception if the return code isn't 0, and return the output:
def do_work(subproc_args):
return subprocess.check_output(['cmd_name'] + list(subproc_args))
Hi i'm using the object QThread from pyqt
From what i understood, your thread when he is running can only use his own variable and proc, he cannot change your main object variables
So before you run it be sur to define all the qthread variables you will need
like this for example:
class worker(QThread)
def define(self, phase):
print 'define'
self.phase=phase
self.start()#will run your thread
def continueJob(self):
self.start()
def run(self):
self.launchProgramme(self.phase)
self.phase+=1
def launchProgramme(self):
print self.phase
i'm not well aware of how work the basic python thread but in pyqt your thread launch a signal
to your main object like this:
class mainObject(QtGui.QMainWindow)
def __init__(self):
super(mcMayaClient).__init__()
self.numberProgramme=4
self.thread = Worker()
#create
self.connect(self.thread , QtCore.SIGNAL("finished()"), self.threadStoped)
self.connect(self.thread , QtCore.SIGNAL("terminated()"), self.threadStopped)
connected like this, when the thread.run stop, it will launch your threadStopped proc in your main object where u can get the value of your thread Variables
def threadStopped(self):
value=self.worker.phase
if value<self.numberProgramme:
self.worker.continueJob()
after that you just have to lauch another thread or not depending of the value you get
This is for pyqt threading of course, in python basic thread, the way to execute the def threadStopped could be different.