In Python, I'm writing a script that runs an external process. This external process does the following steps:
1. Fetch a value from a config file, taking into account other running processes.
2. Run another process, using the value from step 1.
Step 1 can be bypassed by passing in a value to use. Trying to use the same value concurrently is an error, but using it sequentially is valid (think of it as a pool of "pids", with no more than 10 available). Other processes (e.g. a user logging in) can also take one of these "pids".
The external process takes a few hours to run, and multiple independent copies must be run. Running them sequentially works, but takes too long.
I'm changing the script to run these processes concurrently using the multiprocessing module. A simplified version of my code is:
from multiprocessing import Pool
import subprocess

def longRunningTask(n):
    subprocess.call(["ls", "-l"])  # real code uses a process with no screen I/O

if __name__ == '__main__':
    myArray = [1, 2, 3, 4, 5]
    pool = Pool(processes=3)
    pool.map(longRunningTask, myArray)
Using this code fails, because it uses the same "pid" for every process started.
The solutions I've come up with are:
1. If the call fails, wait a random delay and try again. This could end up busy-waiting for hours if enough "pids" are in use.
2. Create a Queue of the available "pids", get() an item from it before starting the process, and put() it back when the process completes. This would still need to wait if the "pid" was in use, the same as number 1.
3. Use a Manager to hold an array of "pids" that are in use (starting empty). Before starting the process, get a "pid", check whether it's in the array (start again if it is), add it to the array, and remove it when done.
Are there problems with approach 3, or is there a different way to do it?
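For reference, a minimal sketch of approach 2: a shared, blocking queue of the available "pid" tokens handed to each worker. The Manager queue, the pool of 10 tokens, and the ls command are stand-ins for the real configuration:

from multiprocessing import Pool, Manager
import subprocess

def longRunningTask(args):
    n, pid_queue = args
    pid = pid_queue.get()                  # blocks until a "pid" token is free
    try:
        subprocess.call(["ls", "-l"])      # real code would pass pid to the external process
    finally:
        pid_queue.put(pid)                 # return the token even if the call fails

if __name__ == '__main__':
    manager = Manager()
    pid_queue = manager.Queue()
    for pid in range(10):                  # assumed pool of 10 tokens
        pid_queue.put(pid)
    myArray = [1, 2, 3, 4, 5]
    pool = Pool(processes=3)
    pool.map(longRunningTask, [(n, pid_queue) for n in myArray])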
How can I make cmds.duplicate execute immediately when called in Maya, instead of waiting for the entire script to run and then executing in a batch? For example, with the script below, all of the duplication results appear only once the entire script has finished executing:
import time
import maya.cmds as cmds
import pymel.core as pm

for i in range(1, 6):
    pm.select("pSphere{}".format(i))
    time.sleep(0.5)
    cmds.duplicate()
I have tried to use Python multithreading, like this:
import threading
import time
import maya.cmds as cmds

def test():
    for i in range(50):
        cmds.duplicate('pSphere1')
        time.sleep(0.1)

thread = threading.Thread(target=test)
thread.start()
#thread.join()
Sometimes this succeeds, but sometimes it crashes Maya. If the main thread joins, it does not achieve the effect either. When I do a large number of cmds.duplicate calls, memory consumption becomes very high and the program runs more and more slowly. In addition, all of the duplicate results appear together only after the entire Python script has run, so I suspect that when I call cmds.duplicate, Maya does not finish executing and outputting the command, but temporarily puts the results in a container of variable capacity. As my calls increase, the dynamic expansion of that container makes the program slower and slower and drives memory consumption up dramatically. Since I have seen other plug-ins show command execution results in real time, I assume there is a proper way to do this that I just haven't found yet.
Your assumptions are not correct. Maya does not need to display anything to complete a tool. If you want to see the results in between, you can try to use:
pm.refresh()
but this will not change the behaviour in general. I suppose your memory problems have a different source. You could check whether it helps to temporarily turn off history or the undo queue.
And of course Ennakard is right with the answer that most Maya commands are not thread-safe unless mentioned in the docs. Every node creation and modification has to be done in the main thread.
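A minimal sketch of that suggestion, assuming the loop runs in Maya's main thread (for example from the Script Editor); the object name and loop count are placeholders:

import maya.cmds as cmds

cmds.undoInfo(stateWithoutFlush=False)       # suspend undo recording temporarily
try:
    for i in range(50):
        cmds.duplicate('pSphere1')
        cmds.refresh()                       # force a viewport update in between
finally:
    cmds.undoInfo(stateWithoutFlush=True)    # always re-enable undo recording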
The simple answer is that you don't; Maya commands in general, and most interactions with Maya, are not thread-safe.
Threading is usually used for data manipulation before the data gets used to modify anything in Maya, but once you start creating nodes, setting attributes, or making any other Maya modification, no threading.
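If the heavy preparation really does run in a background thread, one common pattern is to hand every scene modification back to the main thread. A hedged sketch, assuming maya.utils is available and using placeholder object names:

import threading
import maya.utils
import maya.cmds as cmds

def duplicate_in_main_thread(name):
    cmds.duplicate(name)                     # scene modification, main thread only

def worker():
    names = ["pSphere1"] * 10                # stand-in for expensive data preparation
    for name in names:
        maya.utils.executeDeferred(duplicate_in_main_thread, name)

threading.Thread(target=worker).start()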
The problem:
When sending 1000 tasks to apply_async, they run in parallel on all 48 CPUs, but then sometimes fewer and fewer CPUs run, until only one CPU is left running, and only when that last one finishes its task do all the CPUs continue running again, each with a new task. It shouldn't need to wait for any "task batch" like this.
My (simplified) code:
from multiprocessing import Pool
pool = Pool(47)
tasks = [pool.apply_async(json2features, (j,)) for j in jsons]
feats = [t.get() for t in tasks]
jsons = [...] is a list of about 1000 JSONs already loaded to memory and parsed to objects.
json2features(json) does some CPU-heavy work on a json, and returns an array of numbers.
This function may take between 1 second and 15 minutes to run, and because of this I sort the jsons using a heuristic, s.t. hopefully the longest tasks are first in the list, and thus start first.
The json2features function also prints when a task is finished and how long it took. It all runs on an Ubuntu server with 48 cores, and as I said above, it starts out great, using all 47 cores. Then, as tasks get completed, fewer and fewer cores run, which would sound perfectly fine were it not for the fact that after the last core finishes (when I see its print to stdout), all CPUs start running again on new tasks, meaning it wasn't really the end of the list. It may do the same thing again, and then again for the actual end of the list.
Sometimes it can be using just one core for 5 minutes, and when the task is finally done, it starts using all cores again, on new tasks. (So it's not stuck on some IPC overhead)
There are no repeated jsons, nor any dependencies between them (it's all static, fresh-from-disk data, no references etc..), nor any dependency between json2features calls (no global state or anything) except for them using the same terminal for their print.
I was suspicious that the problem was that a worker doesn't get released until get is called on its result, so I tried the following code:
from multiprocessing import Pool
pool = Pool(47)
tasks = [pool.apply_async(print, (i,)) for i in range(1000)]
# feats = [t.get() for t in tasks]
And it does print all 1000 numbers, even though get isn't called.
I have run out of ideas as to what the problem might be.
Is this really the normal behavior of Pool?
Thanks a lot!
The multiprocessing.Pool relies on a single os.pipe to deliver the tasks to the workers.
Usually on Unix, the default pipe size ranges from 4 to 64 KiB. If the JSONs you are delivering are large, you might get the pipe clogged at any given point in time.
This means that, while one of the workers is busy reading the large JSON from the pipe, all the other workers will starve.
It is generally a bad practice to share large data via IPC as it leads to bad performance. This is even underlined in the multiprocessing programming guidelines.
Avoid shared state
As far as possible one should try to avoid shifting large amounts of data between processes.
Instead of reading the JSON files in the main process, just send the workers their file names and let them open and read the content. You will surely notice an improvement in performance because you are moving the JSON loading phase into the concurrent domain as well.
Note that the same is true also for the results. A single os.pipe is used to return the results to the main process as well. If one or more workers clog the results pipe then you will get all the processes waiting for the main one to drain it. Large results should be written to files as well. You can then leverage multithreading on the main process to quickly read back the results from the files.
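A minimal sketch of that suggestion: send the workers file names, let each worker load and process its own JSON, and write large results to disk so only short paths travel back over the pipe. The json2features stub and file names are placeholders:

import json
from multiprocessing import Pool

def json2features(obj):
    return [len(obj)]                         # placeholder for the real CPU-heavy work

def process_file(path):
    with open(path) as f:
        obj = json.load(f)                    # load inside the worker, not in the parent
    feats = json2features(obj)
    out_path = path + ".feats.json"
    with open(out_path, "w") as f:
        json.dump(feats, f)                   # write the (possibly large) result to disk
    return out_path                           # only a short string goes back over the pipe

if __name__ == "__main__":
    json_paths = ["a.json", "b.json"]         # placeholder input file list
    with Pool(47) as pool:
        result_paths = pool.map(process_file, json_paths)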
I have some Python code which performs a non-colliding task; no race conditions can occur as a result of this parallelism. I'm merely attempting to increase the speed of processing: I have 4 files, and rather than reading each of them one at a time, I'd like to open all four and read/edit data from them simultaneously.
I've read a few questions on here explaining that true parallelism with multi-threading in Python isn't possible due to the Global Interpreter Lock, but that multiprocessing gets around this. For the record, my code does exactly what it's meant to when I just run it four times in separate terminals - I'm guessing this is effectively "multiprocessing", but I'd like a cleaner programmatic solution.
The data-sets are large, so it can be assumed that as soon as a "process" is given to the interpreter, it is essentially locked for a large time period:
Example:
import multiprocessing

def worker():
    while True:
        # do some stuff
        return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
This gives me the issue that the above executes the first process and then never realistically starts the second until execution has finished on the first, making them run one after the other.
Is there a way I can effectively execute worker X number of times, either by starting them all at the same time or by preventing them from running until they have all started? My OS has access to 8 cores.
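A hedged sketch of one way to do this: start all the workers up front and use a multiprocessing.Barrier so none of them begins its real work until every one has started. The worker body and the count of 4 are placeholders:

import multiprocessing

def worker(barrier, i):
    barrier.wait()                            # block until all workers have started
    print("worker", i, "running")             # placeholder for the real file processing

if __name__ == '__main__':
    n = 4
    barrier = multiprocessing.Barrier(n)
    jobs = [multiprocessing.Process(target=worker, args=(barrier, i)) for i in range(n)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()                              # wait for every worker to finish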
I'm new to multiprocessing in Python. Consider the following function:
def do_something_parallel(self):
    result_operation1 = doit.main(A, B)
    do_something_else(C)
Now the point is that I want doit.main to run in another process and to be non-blocking, so that the code in do_something_else runs immediately after the first call has been launched in another process.
How can I do it using the Python subprocess module?
Is there a difference between using subprocess and creating a new process alongside the current one? Why would we need a child process of another process?
Note: I do not want to use a multithreaded approach here.
EDIT: I wondered whether using the subprocess module and the multiprocessing module in the same function is prohibited?
The reason I want this is that I have two things to run: first an exe file, and second a function, and each needs its own process.
If you want to run Python code in a separate process, you could use the multiprocessing module:
import multiprocessing

if __name__ == "__main__":
    multiprocessing.Process(target=doit.main, args=[A, B]).start()
    do_something_else()  # this runs immediately without waiting for main() to return
I wondered whether using the subprocess module and the multiprocessing module in the same function is prohibited?
No. You can use both subprocess and multiprocessing in the same function (moreover, multiprocessing may use subprocess to start its worker processes internally).
The reason I want this is that I have two things to run: first an exe file, and second a function, and each needs its own process.
You don't need multiprocessing to run an external command without blocking (obviously, in its own process); subprocess.Popen() is enough:
import subprocess
p = subprocess.Popen(['command', 'arg 1', 'arg 2'])
do_something_else() # this runs immediately without waiting for command to exit
p.wait() # this waits for the command to finish
subprocess.Popen is definitely what you want if the "worker" process is an executable. Threading is what you need when you need things to happen asynchronously, and multiprocessing is what you need if you want to take advantage of multiple cores for improved performance (although you will likely find yourself also using threads at the same time, as they handle the asynchronous output of multiple parallel processes).
The main limitation of multiprocessing is passing information. When a new process is spawned, an entire separate instance of the Python interpreter is started, with its own independent memory allocation. The result is that variables changed by one process are not changed for other processes. For that functionality you need shared memory objects (also provided by the multiprocessing module). One implementation I have done was a parent process that started several worker processes and passed each of them both an input queue and an output queue. The function given to the child processes was a loop designed to do some calculations on the inputs pulled from the input queue and then put the results on the output queue. I then designated a special input that the child would recognize to end the loop and terminate the process.
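A minimal sketch of that input/output queue pattern, using None as the special shutdown input; the doubling "calculation" and the worker count are placeholders:

from multiprocessing import Process, Queue

def worker(in_q, out_q):
    while True:
        item = in_q.get()
        if item is None:                      # special input that ends the loop
            break
        out_q.put(item * 2)                   # placeholder for the real calculation

if __name__ == "__main__":
    in_q, out_q = Queue(), Queue()
    workers = [Process(target=worker, args=(in_q, out_q)) for _ in range(3)]
    for p in workers:
        p.start()
    for item in range(10):
        in_q.put(item)
    for _ in workers:
        in_q.put(None)                        # one shutdown message per worker
    results = [out_q.get() for _ in range(10)]
    for p in workers:
        p.join()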
On your edit - Popen will start the other process in parallel, as will multiprocessing. If you need the child process to communicate with the executable, be sure to pass the file stream handles to the child process somehow.
I have a function that I would like to be evaluated across multiple nodes in a cluster. I've gotten simple examples to run on our cluster using MPI4py, but was hoping to find a python package that makes things a little more user friendly (like implementing the map feature of multiprocessing) but also has a little more control over how many processes get spawned and on which of the nodes. I've seen a few packages that implement map but not any that control how many processes are spawned on each node.
The following code gets close to illustrating what I mean. However, instead of writing it in the typical MPI4py way, I've written it like you would with the map function. I wrote it this way because this is ultimately how I'd like to implement the code (with a module that emulates map) and because I'm not quite sure how I'd write it using MPI to achieve what I want to do.
from numpy import *
from multiprocessing import Pool

def foo(n):
    random.seed(n)
    a = random.randn(1000, 1000)
    b = random.randn(1000, 1000)
    c = dot(a, b)
    return c.mean()

if __name__ == '__main__':
    pool = Pool(processes=4)
    results = pool.map(foo, range(4))
    print(results)
The reason why I want to control the number of processes sent to each node is that some of the instructions inside of foo can be multithreaded (like dot which would also be linked to the MKL libraries).
If I have a cluster of 12 computers with 2 cores each, I'd like to just send out one job to each of the 12 nodes, where it would implicitly take advantage of both cores. I don't want to spawn 24 jobs (one for each core) because I'm worried about possible thread-thrashing when both processes try to use both cores. I also can't just spawn 12 processes because I can't be certain it will send one to each node and not 2 to the first 6 nodes.
First off, should this be a major concern? How much of an effect would running 24 processes instead of 12 have on performance?
If it will make a difference, is there a python package that will overlay on top of MPI4py and do what I'm looking for?
I wanted the same thing, so I wrote up a proof of concept that keeps track of how many worker processes are idle on each host. If you have a job that will use two threads, then it waits until a host has two idle workers, assigns the job to one of those workers, and keeps the other worker idle until the job is finished.
To specify how many processes to launch on each host, you use a hostfile.
The key is for the root process to receive messages from any other process:
source_host, worker_rank, result = MPI.COMM_WORLD.recv(source=MPI.ANY_SOURCE)
That way, it finds out as soon as each job is finished. Then when it's ready, it sends the job to a specific worker:
comm.send(row, dest=worker_rank)
At the end, it tells all the workers to shut down by sending a None message:
comm.send(None, dest=worker_rank)
After I wrote this, I found jbornschein's mpi4py task pull example. It doesn't handle the thread counts for each job, but I like the way it uses tags for different message types.
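A hedged sketch of that master/worker pattern using the same recv and send calls, run under mpiexec; the job list and the squaring "work" are placeholders, and per-job thread counts are not handled:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:                                          # master
    jobs = list(range(20))                             # placeholder job list
    n_jobs = len(jobs)
    results = []
    # give every worker its first job (or a shutdown message if none are left)
    for worker_rank in range(1, size):
        comm.send(jobs.pop(0) if jobs else None, dest=worker_rank)
    # collect results; refill each worker that reports back
    while len(results) < n_jobs:
        worker_rank, result = comm.recv(source=MPI.ANY_SOURCE)
        results.append(result)
        comm.send(jobs.pop(0) if jobs else None, dest=worker_rank)
    print(results)
else:                                                  # worker
    while True:
        job = comm.recv(source=0)
        if job is None:                                # shutdown message
            break
        comm.send((rank, job * job), dest=0)           # placeholder "work": square the job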