Here is my code for a simple multiprocessing task in Python:
from multiprocessing import Process

def myfunc(num):
    tmp = num * num
    print 'squared O/P will be ', tmp
    return tmp

a = [i**3 for i in range(5)]   # just defining a list
task = [Process(target=myfunc, args=(i,)) for i in a]   # creating processes

for each in task: each.start()   # starting processes <------ problem line
for each in task: each.join()    # waiting for all to finish up
When I run this code, it hangs at a certain point. To identify where, I ran it line by line in the Python shell and found that when I call 'each.start()', the shell pops up a dialog box saying:
"The program is still running, do you want to kill it?"
and when I select 'yes', the shell closes.
When I replace Process with 'threading.Thread', the same code runs, but with this garbled output:
Squared Squared Squared Squared Squared 0 1491625
36496481
Can anyone help with this? Thanks in advance.
To run my Python code I use the IdleX IDE, which I start from a terminal.
I have an Intel Xeon processor with 4 cores / 8 threads, and 8 GB of RAM.
With a little thought I finally found the problem.
This is happening because printing is not atomic: while one thread (or process) is writing its output, another can write at the same time, so the pieces get interleaved and the result looks scrambled. This is a race condition.
To solve it, either collect the results in a thread-safe container such as deque() from the collections module and print them afterwards, or, better, put a Lock around the print. The Lock prevents the race condition.
So the edit would be:
def myfunc(num):
    lock.acquire()
    # .......some code.....
    # .......some code......
    lock.release()
That's all.
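For the threading version of the code in the question, a minimal sketch of that fix might look like this; the names mirror the question, and the 'with lock:' form is equivalent to acquire()/release() but also releases the lock if the print raises:

import threading

lock = threading.Lock()   # one lock shared by every thread

def myfunc(num):
    tmp = num * num
    with lock:            # only one thread may print at a time
        print('squared O/P will be %d' % tmp)
    return tmp

nums = [i**3 for i in range(5)]
task = [threading.Thread(target=myfunc, args=(n,)) for n in nums]
for each in task:
    each.start()
for each in task:
    each.join()

Note that a plain threading.Lock is not shared across processes; for the multiprocessing version you would have to pass a multiprocessing.Lock to each Process instead.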
But one problem still persists, and that is with the multiprocessing module: even after using the Lock, the problem mentioned in the question remains.
Save the code above into a .py file and then run it in a gnome-terminal with
python myfile.py
Where "myfile.py" is the filename you saved to.
I would assume that the IDE you are using is confused somehow by Process()
Related
I am currently trying to parallelize a part of an existing program, and have encountered a strange behaviour that causes my program to stop execution (it is still running, but does not make any further progress).
First, here is a minimal working example:
import multiprocessing
import torch

def foo(x):
    print(x)
    return torch.zeros(x, x)

size = 200

# first block
print("Run with single core")
res = []
for i in range(size):
    res.append(foo(i))

# second block
print("Run with multiprocessing")
data = [i for i in range(size)]
with multiprocessing.Pool(processes=1) as pool:
    res = pool.map(foo, data)
The problem is that the script stops running during multiprocessing, and reliably at x = 182. Up to this point I could come up with some reasonable explanations, but now comes the strange part: if I run only the parallel code (so only the second block), the script works perfectly fine. It also works if I first run the parallel version and then the single-threaded code. Only when I run the first block and then the second block does the program get stuck. The same holds if I run the second block, then the first one, and then the second one again; in that case the first multiprocessing run works fine, and it gets stuck the second time I run the multiprocessing version.
The problem seems not to stem from a lack of memory, since I can increase size to much higher values (1000+) when only running the multiprocessing code. Additionally, I have no problem when I use np.zeros((x, x)) instead of torch.zeros. Removing the print function does not help either, so I kept it in for demonstration purposes.
This was also reproducible on other machines, stopping as well at x = 182 when running both blocks, but working fine when only running the second. Python and Pytorch versions were respectively (3.7.3, 1.7.1), (3.8.9, 1.8.1), and (3.8.5, 1.9.0). All systems were Ubuntu based.
Does anyone have an explanation for this behaviour, or did I miss any parameters/options I need to set for this to work?
I am using multiprocessing to calculate a large mass of data; i.e. I periodically spawn a process so that the total number of processes is equal to the number of CPUs on my machine.
I periodically print out the progress of the entire calculation... but this is inconveniently interspersed with Python's welcome messages from each child!
To be clear, this is a Windows-specific problem due to how multiprocessing is handled.
E.g.
> python -q my_script.py
Python Version: 3.7.7 on Windows
Then many duplicates of the same version message follow, one for each child process.
How can I suppress these?
I understand that if you run Python on the command line with a -q flag, it suppresses the welcome message; though I don't know how to translate that into my script.
EDIT:
I tried to include the interpreter flag -q like so:
multiprocessing.set_executable(sys.executable + ' -q')
Yet to no avail. I receive a FileNotFoundError, which tells me I cannot pass options this way because of how the arguments are checked.
Anyway, here is the relevant section of code (it's an entire function):
def _parallelize(self, buffer, func, cpus):
    ## Number of Parallel Processes ##
    cpus_max = mp.cpu_count()
    cpus = min(cpus_max, cpus) if cpus else int(0.75 * cpus_max)

    ## Total Processes to-do ##
    N = ceil(self.SampleLength / DATA_MAX)  # Number of Child Processes
    print("N: ", N)
    q = mp.Queue()  # Child Process results Queue

    ## Initialize each CPU w/ a Process ##
    for p in range(min(cpus, N)):
        mp.Process(target=func, args=(p, q)).start()

    ## Collect Validation & Start Remaining Processes ##
    for p in tqdm(range(N)):
        n, data = q.get()                # Collects a Result
        i = n * DATA_MAX                 # Shifts to Proper Interval
        buffer[i:i + len(data)] = data   # Writes to open HDF5 file
        if p < N - cpus:                 # Starts a new Process
            mp.Process(target=func, args=(p + cpus, q)).start()
SECOND EDIT:
I should probably mention that I'm doing everything within an anaconda environment.
The message is printed on interactive startup.
A spawned process does inherit some flags from the parent process.
But looking at the code in multiprocessing, it does not seem possible to change these parameters from within the program.
So the easiest way to get rid of the messages should be to add the -q option to the original python invocation that starts your program.
I have confirmed that the -q flag is inherited.
So that should suppress the message for the original process and the children that it spawns.
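As a quick, hypothetical check (the file name check_quiet.py is just an example), you can print sys.flags.quiet in the parent and in a spawned child; when started with 'python -q check_quiet.py', both should report 1:

# check_quiet.py  -- illustrative only; run as: python -q check_quiet.py
import multiprocessing as mp
import sys

def child():
    # spawn re-launches the interpreter, carrying over most interpreter flags
    print('child quiet flag :', sys.flags.quiet)

if __name__ == '__main__':
    mp.set_start_method('spawn')   # the default on Windows anyway
    print('parent quiet flag:', sys.flags.quiet)
    p = mp.Process(target=child)
    p.start()
    p.join()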
Edit:
If you look at the implementation of set_executable, you will see that you cannot add or change arguments that way. :-(
Edit2:
You wrote:
I'm doing everything within an anaconda environment.
Do you mean a virtual environment, or some kind of fancy IDE like Spyder?
If you ever have a Python problem, first try reproducing it in plain CPython, running from the command line. IDEs and fancy environments like Anaconda sometimes do weird things when running Python.
I'm running Python 2.7 on the GCE platform to do calculations. The GCE instances boot, install various packages, copy 80 GB of data from a storage bucket and run a "workermaster.py" script with nohup. The workermaster runs an infinite loop which checks a task-queue bucket for tasks. When the task bucket isn't empty, it picks a random file (task) and passes the work to a calculation module. If there is nothing to do, the workermaster sleeps for a number of seconds and checks the task list again. The workermaster runs continuously until the instance is terminated (or something breaks!).
Currently this works quite well, but my problem is that my code only runs instances with a single CPU. If I want to scale up the calculations, I have to create many identical single-CPU instances, which means a large cost overhead for creating many 80 GB disks and transferring the data to them each time, even though each calculation only "reads" one small portion of the data. I want to make everything more efficient and cost-effective by making my workermaster capable of using multiple CPUs, but after reading many tutorials and other questions on SO I'm completely confused.
I thought I could just turn the important part of my workermaster code into a function, and then create a pool of processes that "call" it using the multiprocessing module. Once the workermaster loop is running on each CPU, the processes do not need to interact with each other or depend on each other in any way; they just happen to be running on the same instance. The workermaster prints out information about where it is in the calculation, and I'm also confused about how it will be possible to tell the "print" statements from each process apart, but I guess that's a few steps from where I am now! My problems/confusion are that:
1) My workermaster "def" doesn't return any value because it just starts an infinite loop, whereas every web example seems to have something in the format myresult = pool.map(.....); and
2) My workermaster "def" doesn't need any arguments/inputs - it just runs, whereas the examples of multiprocessing that I have seen on SO and on the Python Docs seem to have iterables.
In case it is important, the simplified version of the workermaster code is:
# module imports are here
# filepath definitions go here

def workermaster():
    while True:
        tasklist = cloudstoragefunctions.getbucketfiles('<my-task-queue-bucket>')
        if tasklist:
            tasknumber = random.randint(2, len(tasklist))
            assignedtask = tasklist[tasknumber]
            print 'Assigned task is now: ' + assignedtask
            subprocess.call('gsutil -q cp gs://<my-task-queue-bucket>/' + assignedtask + ' "' + taskfilepath + assignedtask + '"', shell=True)
            tasktype = assignedtask.split('#')[0]
            if tasktype == 'Calculation':
                currentcalcid = assignedtask.split('#')[1]
                currentfilenumber = assignedtask.split('#')[2].replace('part', '')
                currentstartfile = assignedtask.split('#')[3]
                currentendfile = assignedtask.split('#')[4].replace('.csv', '')
                calcmodule.docalc(currentcalcid, currentfilenumber, currentstartfile, currentendfile)
            elif tasktype == 'Analysis':
                pass  # set up and run analysis module, etc.
            print ' Operation completed!'
            os.remove(taskfilepath + assignedtask)
        else:
            print 'There are no tasks to be processed. Going to sleep...'
            time.sleep(30)
I'm trying to "call" the function multiple times using the multiprocessing module. I think I need to use the "pool" method, so I've tried this:
import multiprocessing

if __name__ == "__main__":
    p = multiprocessing.Pool()
    pool_output = p.map(workermaster, [])
My understanding from the docs is that the __name__ line is there only as a workaround for doing multiprocessing on Windows (which I am using for development, but GCE runs on Linux). The p = multiprocessing.Pool() line creates a pool of workers equal to the number of system CPUs, since no argument is specified. If the number of CPUs were 1, I would expect the code to behave as it did before I attempted to use multiprocessing. The last line is the one I don't understand. I thought it was telling each of the processors in the pool that the "target" (the thing to run) is workermaster. From the docs there appears to be a compulsory argument which is an iterable, but I don't really understand what that should be in my case, as workermaster doesn't take any arguments. I've tried passing it an empty list, an empty string, and empty brackets (a tuple?), and it doesn't do anything.
Would it be possible for someone to help me out? There are lots of discussions about using multiprocessing, and the threads Mulitprocess Pools with different functions and python code with mulitprocessing only spawns one process each time seem close to what I am doing, but they still have iterables as arguments. If there is anything critical that I have left out, please advise and I will modify my post. Thank you to anyone who can help!
Pool() is useful if you want to run the same function with different arguments.
If you want to run a function only once, then use a normal Process().
If you want to run the same function 2 times, then you can manually create 2 Process() objects.
If you want to use Pool() to run a function 2 times, then pass it a list with 2 arguments (even if you don't need the arguments), because that list is how Pool() knows to run it 2 times.
But if you run the function 2 times against the same folder, it may run the same task 2 times; if you run it 5 times, it may run the same task 5 times. I don't know if that is what you want.
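To make the two options concrete, here is a rough sketch under the assumption that workermaster takes no arguments and you want, say, 2 workers; the count of 2 and the run() wrapper name are purely illustrative:

import multiprocessing

def workermaster():
    pass   # the infinite task-polling loop from the question goes here

def run(_):
    # one-parameter wrapper so Pool.map() has something to iterate over
    workermaster()

if __name__ == '__main__':
    # Option 1: plain Process objects, one per worker you want
    workers = [multiprocessing.Process(target=workermaster) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Option 2: Pool.map() needs an iterable, so pass dummy items;
    # two items make the pool call run() (and therefore workermaster) twice
    pool = multiprocessing.Pool(processes=2)
    pool.map(run, [None, None])
    pool.close()
    pool.join()

Since the real workermaster loops forever, neither option returns on its own; the join calls are there just to keep the parent process alive.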
As for Ctrl+C, I found Catch Ctrl+C / SIGINT and exit multiprocesses gracefully in python on Stack Overflow, but I don't know if it resolves your problem.
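For reference, the pattern described in that answer roughly looks like the sketch below: workers are initialised to ignore SIGINT, and the parent catches KeyboardInterrupt and terminates the pool. Treat it as a sketch of the idea, not a tested fix for your exact setup; the work() function is a stand-in.

import multiprocessing
import signal
import time

def init_worker():
    # children ignore Ctrl+C; only the parent reacts to it
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def work(n):
    time.sleep(1)
    return n * n

if __name__ == '__main__':
    pool = multiprocessing.Pool(4, initializer=init_worker)
    try:
        # map_async + get(timeout) keeps the parent responsive to Ctrl+C
        results = pool.map_async(work, range(100)).get(timeout=3600)
        pool.close()
    except KeyboardInterrupt:
        pool.terminate()   # stop the children immediately on Ctrl+C
    finally:
        pool.join()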
I have the following situation, where I create a pool in a for loop as follows (I know it's not very elegant, but I have to do this for pickling reasons). Assume that pathos.multiprocessing is equivalent to Python's multiprocessing library (as it is, up to some details that are not relevant for this problem).
I have the following code I want to execute:
self.pool = pathos.multiprocessing.ProcessingPool(number_processes)
for i in range(5):
    all_responses = self.pool.map(wrapper_singlerun, range(self.no_of_restarts))
    pool._clear()
Now my problem: the loop successfully runs the first iteration. However, at the second iteration, the algorithm suddenly stops (it does not finish the pool.map operation). I suspected that zombie processes were being generated, or that the process was somehow switched. Below you will find everything I have tried so far.
for i in range(5):
    pool = pathos.multiprocessing.ProcessingPool(number_processes)
    all_responses = self.pool.map(wrapper_singlerun, range(self.no_of_restarts))
    pool._clear()
    gc.collect()
    for p in multiprocessing.active_children():
        p.terminate()
    gc.collect()
    print("We have so many active children: ", multiprocessing.active_children())  # Returns []
The above code works perfectly well on my Mac. However, when I upload it to the cluster with the following specs, it gets stuck after the first iteration:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04 LTS"
This is the link to pathos' multiprocessing library file.
I am assuming that you are trying to call this from inside some function, which is not the correct way to use it.
You need to wrap it in:
if __name__ == '__main__':
    for i in range(5):
        pool = pathos.multiprocessing.Pool(number_processes)
        all_responses = pool.map(wrapper_singlerun, range(self.no_of_restarts))
If you don't, it will keep creating copies of itself and putting them onto the stack, which will ultimately fill the stack and block everything. The reason it works on the Mac is that it has fork, while Windows does not.
I am trying to use the Python multiprocessing library in order to parallelize a task I am working on:
import multiprocessing as MP

def myFunction((x, y, z)):
    ...create a sqlite3 database specific to x, y, z
    ...write to the database (one DB per process)

y = 'somestring'
z = <large read-only global dictionary to be shared>

jobs = []
for x in X:
    jobs.append((x, y, z,))

pool = MP.Pool(processes=16)
pool.map(myFunction, jobs)
pool.close()
pool.join()
Sixteen processes are started, as seen in htop; however, no errors are returned, no files are written, and no CPU is used.
Could it be that there is an error in myFunction that is not reported to STDOUT and blocks execution?
Perhaps it is relevant that the Python script is called from a bash script running in the background.
The lesson learned here was to follow the strategy suggested in one of the comments and use multiprocessing.dummy until everything works.
At least in my case, errors were not visible otherwise and the processes were still running as if nothing had happened.
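A sketch of that debugging swap, assuming a setup like the one in the question (the placeholder inputs are invented): multiprocessing.dummy exposes the same Pool interface but runs the workers as threads in the main process, so exceptions and prints from myFunction show up immediately. Once everything works, change the import back to the real multiprocessing module.

import multiprocessing.dummy as MP   # thread-backed drop-in for debugging

def myFunction(job):
    x, y, z = job                    # unpack inside the function
    # ...create and write the per-task sqlite3 database here...
    print('processed', x)

if __name__ == '__main__':
    y = 'somestring'
    z = {}                           # placeholder for the shared dictionary
    jobs = [(x, y, z) for x in range(16)]
    pool = MP.Pool(processes=16)
    pool.map(myFunction, jobs)       # exceptions now raise in this process
    pool.close()
    pool.join()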