How am I using the multiprocessing (python) module wrong?

Can someone help me figure out why the following code won't run properly? I want to spawn new processes as the previous ones finish, but running this code immediately runs everything: all the jobs report finished and stopped when they aren't, and their console windows are still open. Any thoughts on why is_alive() returns False when the process is actually still alive?
import subprocess
import sys
import multiprocessing
import time

start_on = 33 #'!'
end_on = 34
num_processors = 4;
jobs = []

def createInstance():
    global start_on, end_on, jobs
    cmd = "python scrape.py" + " " + str(start_on) + " " + str(end_on)
    print cmd
    p = multiprocessing.Process(target=processCreator(cmd))
    jobs.append(p)
    p.start()
    start_on += 1
    end_on += 1
    print "length of jobs is: " + str(len(jobs))

def processCreator(cmd):
    subprocess.Popen(cmd, creationflags=subprocess.CREATE_NEW_CONSOLE)

if __name__ == '__main__':
    num_processors = input("How many instances to run simultaneously?: ")
    for i in range(num_processors):
        createInstance()

    while len(jobs) > 0:
        jobs = [job for job in jobs if job.is_alive()]
        for i in range(num_processors - len(jobs)):
            createInstance()
        time.sleep(1)

    print('*** All jobs finished ***')

Your code is spawning 2 processes on each createInstance() call, and I think that's what is confusing the is_alive() check.
p = multiprocessing.Process(target=processCreator(cmd))
This will spawn 1 process to run processCreator(cmd). Then, inside it, subprocess.Popen(cmd, creationflags=subprocess.CREATE_NEW_CONSOLE) spawns a child process to run the command and returns immediately, so the multiprocessing process finishes almost right away, which is why is_alive() reports False even though the console window is still running.
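As a side note, if you wanted to keep multiprocessing here, the usual pattern is to pass the callable and its arguments separately (target=..., args=...) so the function runs inside the child instead of being called on the spot. A minimal sketch of that pattern, with a run_command helper standing in for processCreator and the question's scrape.py arguments used purely for illustration:
import multiprocessing
import subprocess

def run_command(cmd):
    # Runs inside the child process and blocks until the command exits,
    # so the Process stays alive exactly as long as the command does.
    subprocess.call(cmd)

if __name__ == '__main__':
    cmd = ["python", "scrape.py", "33", "34"]  # illustrative arguments only
    p = multiprocessing.Process(target=run_command, args=(cmd,))  # note: no () after run_command
    p.start()
    print(p.is_alive())  # True while scrape.py is still running
    p.join()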
I think this version will work, removing the usage of multiprocessing. I have also changed the cmd definition (see the subprocess docs):
import subprocess
import sys
import time

start_on = 33 #'!'
end_on = 34
num_processors = 4
jobs = []

def createInstance():
    global start_on, end_on, jobs
    cmd = ["python", "scrape.py", str(start_on), str(end_on)]
    print(str(cmd))
    p = subprocess.Popen(cmd, creationflags=subprocess.CREATE_NEW_CONSOLE)
    jobs.append(p)  # Popen launches the process immediately; no start() call is needed
    start_on += 1
    end_on += 1
    print "length of jobs is: " + str(len(jobs))

if __name__ == '__main__':
    num_processors = input("How many instances to run simultaneously?: ")
    for i in range(num_processors):
        createInstance()

    while len(jobs) > 0:
        jobs = [job for job in jobs if job.poll() is None]
        for i in range(num_processors - len(jobs)):
            createInstance()
        time.sleep(1)

    print('*** All jobs finished ***')


subprocess.wait(timeout=15) is not working

Code:
import os
import subprocess
execs = ['C:\\Users\\XYZ\\PycharmProjects\\Task1\\dist\\Multiof2.exe',   # --> Child 1
         'C:\\Users\\XYZ\\PycharmProjects\\Task1\\dist\\Multiof5.exe',   # --> Child 2
         'C:\\Users\\XYZ\\PycharmProjects\\Task1\\dist\\Multiof10.exe', ...]  # --> Child 3 and more

print('Parent Process id : ', os.getpid())
process = [subprocess.Popen(exe) for exe in execs]
for proc in process:
    try:
        proc.wait(timeout=15)
        print('Child Process id : ', proc.pid)
        if proc.returncode == 0:
            print(proc.pid, 'Exited')
    except subprocess.TimeoutExpired:
        proc.terminate()
        print('Child Process with pid', proc.pid, 'is killed')
Some child processes will take more than 15 seconds to execute, so I have to kill the processes that time out. But proc.wait(timeout=15) is not raising an exception; instead it just lets each process run to completion.
I also tried [subprocess.Popen(exe, timeout=15) for exe in execs] but got
Error:
TypeError: __init__() got an unexpected keyword argument 'timeout'
You're not waiting for the processes in parallel - you're giving them each a sequential 15 seconds to do their thing.
You might want something like this to give all of them up to 15 seconds in parallel.
import os
import subprocess
import time
execs = [
    "C:\\Users\\XYZ\\PycharmProjects\\Task1\\dist\\Multiof2.exe",
    "C:\\Users\\XYZ\\PycharmProjects\\Task1\\dist\\Multiof5.exe",
    "C:\\Users\\XYZ\\PycharmProjects\\Task1\\dist\\Multiof10.exe",
]

print("Parent Process id : ", os.getpid())
process = [subprocess.Popen(exe) for exe in execs]
start_time = time.time()
while True:
    elapsed_time = time.time() - start_time
    any_alive = False
    for proc in process:
        ret = proc.poll()
        if ret is None:  # still alive
            if elapsed_time >= 15:
                print("Killing:", proc.pid)
                proc.terminate()
            any_alive = True
    if not any_alive:
        break
    time.sleep(1)
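If you would rather keep the wait(timeout=...) style from the question, another sketch (assuming the same shared 15-second budget for all children) is to shrink each wait's timeout by the time already spent:
import subprocess
import time

DEADLINE = 15  # shared budget in seconds, as in the question
process = [subprocess.Popen(exe) for exe in execs]  # execs as defined above
start_time = time.time()
for proc in process:
    remaining = DEADLINE - (time.time() - start_time)
    try:
        # A timeout of 0 (or less) checks once and raises TimeoutExpired if still running
        proc.wait(timeout=max(remaining, 0))
        print('Child Process id : ', proc.pid, 'Exited with', proc.returncode)
    except subprocess.TimeoutExpired:
        proc.terminate()
        print('Child Process with pid', proc.pid, 'is killed')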

How to know the lifetime of a process with Python?

Scenario: I have multiple Firefox browsers open. At some point I run my script to shut down all Firefox processes that have been up for more than 30 minutes.
I'm doing this on Windows.
Is it possible to get lifetime from a process?
import psutil

PROCNAME = "firefox.exe"

# Shuts down all PROCNAME processes
for proc in psutil.process_iter():
    if proc.name() == PROCNAME:
        proc.kill()
I just used psutil as Amandan suggested above. I used the PROCNAME "Google Chrome" since that is the browser I'm running and I was able to get the process creation time using the method below.
I assume you can subtract the current time from the process creation time to get the time that the browser has been running.
import psutil
import datetime

PROCNAME = "Google Chrome"

for proc in psutil.process_iter():
    if proc.name() == PROCNAME:
        p = psutil.Process(proc.ppid())
        print(f"Creation time of {PROCNAME} process: ", datetime.datetime.fromtimestamp(p.create_time()).strftime("%Y-%m-%d %H:%M:%S"))
psutil is a really great module to retrieve information for all system processes and it is cross-platform.
psutil doesn't directly report how long a process has been running; however, it does provide the process creation time, so the running time can easily be figured out.
import psutil
import time

PROCNAME = "firefox.exe"

for proc in psutil.process_iter():
    if proc.name() == PROCNAME:
        etime = time.time() - proc.create_time()
        print(etime)
        if etime > 1800:  # 30 minutes or more of running time
            proc.kill()
Another way for Windows:
import os, time, datetime, threading
import subprocess, psutil, statistics

def perf_psutil(n=100):
    liste = []
    count = 0
    ct = time.time()
    while count < n:
        count += 1
        t0 = time.perf_counter_ns()
        p = [proc.create_time() for proc in psutil.process_iter() if proc.name() == "explorer.exe"][0]
        d = ct - p
        t1 = time.perf_counter_ns()
        liste.append(t1 - t0)
    print("Date in sec:", p)
    print("Duration:", d)
    print("Performance psutil :", statistics.mean(liste)/10**9)

def perf_wmic(n=100):
    liste = []
    count = 0
    ct = datetime.datetime.now()
    while count < n:
        count += 1
        t0 = time.perf_counter_ns()
        p = [x.split(b'CreationDate=')[1] for x in subprocess.Popen('wmic PROCESS WHERE NAME="Explorer.exe" GET * /format:list <nul', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT).communicate()[0].replace(b'\r\r\n', b',').split(b',') if x.startswith(b'CreationDate=')][0]
        d = datetime.datetime.now() - datetime.datetime(int(p[:4]), int(p[4:6]), int(p[6:8]), int(p[8:10]), int(p[10:12]), int(p[12:14]), int(p[15:-4]))
        t1 = time.perf_counter_ns()
        liste.append(t1 - t0)
    print("Date :", p)
    print("Duration:", d.total_seconds())
    print("Performance wmic :", statistics.mean(liste)/10**9)

print('########## PSUTIL ##########')
perf_psutil(10)
print('############################')
print('########### WMIC ###########')
perf_wmic(10)
print('############################')
Results :
########## PSUTIL ##########
Date in sec: 1624790271.081833
Duration: 18625.84829068184
Performance psutil : 0.17050247
############################
########### WMIC ###########
Date : b'20210627123751.081832+120'
Duration: 18628.22999
Performance wmic : 0.06602881
############################

Run python process (using Pool of Multiprocessing ) parallel for batch

I need to write code for roughly a quarter million input files to run as a batch. I saw this post: https://codereview.stackexchange.com/questions/20416/python-parallelization-using-popen
I can't figure out how to implement this in my code.
What I want
I want to give the job a specific number of cores, or in other words, only a specific number of processes should run at any given time.
If one process finishes, another one should take its place.
My code (using subprocess)
Main.py
import subprocess
import os
import multiprocessing
import time
MAXCPU = multiprocessing.cpu_count()
try:
    cp = int(raw_input("Enter Number of CPU's to use (Total %d) = " % MAXCPU))
    assert cp <= MAXCPU
except:
    print "Bad command taking all %d cores" % MAXCPU
    cp = MAXCPU  # set MAXCPU as CPU

list_pdb = [i for i in os.listdir(".") if i.endswith(".pdb")]  # Input PDB files
assert len(list_pdb) != 0
c = {}
d = {}
t = {}
devnull = file("Devnull", "wb")

for each in range(0, len(list_pdb), cp):  # Number of cores in Use = 4
    for e in range(cp):
        if each + e < len(list_pdb):
            args = ["sh", "Child.sh", list_pdb[each + e], str(cp)]
            p = subprocess.Popen(args, shell=False,
                                 stdout=devnull, stderr=devnull)
            c[p.pid] = p
            print "Started Process : %s" % list_pdb[each + e]
    while c:
        print c.keys()
        pid, status = os.wait()
        if pid in c:
            print "Ended Process"
            del c[pid]
devnull.close()
Child.sh
#!/bin/sh
sh grand_Child.sh
sh grand_Child.sh
sh grand_Child.sh
sh grand_Child.sh
# Some heavy processes with $1
grand_Child.sh
#!/bin/sh
sleep 5
Here's a version of the code using multiprocessing.Pool. It's a lot simpler, as the module does nearly all the work!
This version also does:
lots of logging, when a proc starts/ends
prints how many files will be processed
lets you process more than numcpus at a time
Often when running multiprocess jobs, it's best to run more processes than CPUs, since different procs will spend time waiting on I/O rather than on the CPU. A common rule of thumb is 2n+1, so on a 4-core system you'd run 2*4+1 = 9 procs for a job. (I generally hardcode "5" or "10" until there's a reason to change; I'm lazy that way :) )
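In code, that rule of thumb would look roughly like this (a sketch only; the script below just asks the user for a count instead):
import multiprocessing

n_cpus = multiprocessing.cpu_count()
pool = multiprocessing.Pool(processes=2 * n_cpus + 1)  # oversubscribe for I/O-bound work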
Enjoy!
source
import glob
import multiprocessing
import os
import subprocess
MAXCPU = multiprocessing.cpu_count()
TEST = False
def do_work(args):
    path, numproc = args
    curproc = multiprocessing.current_process()
    print curproc, "Started Process, args={}".format(args)
    devnull = open(os.devnull, 'w')
    cmd = ["sh", "Child.sh", path, str(numproc)]
    if TEST:
        cmd.insert(0, 'echo')
    try:
        return subprocess.check_output(
            cmd, shell=False,
            stderr=devnull,
        )
    finally:
        print curproc, "Ended Process"

if TEST:
    cp = MAXCPU
    list_pdb = glob.glob('t*.py')
else:
    cp = int(raw_input("Enter Number of processes to use (%d CPUs) = " % MAXCPU))
    list_pdb = glob.glob('*.pdb')  # Input PDB files

# assert cp <= MAXCPU
print '{} files, {} procs'.format(len(list_pdb), cp)
assert len(list_pdb) != 0

pool = multiprocessing.Pool(cp)
print pool.map(
    do_work, [(path, cp) for path in list_pdb],
)
pool.close()
pool.join()
output
27 files, 4 procs
<Process(PoolWorker-2, started daemon)> Started Process, args=('tdownload.py', 4)
<Process(PoolWorker-2, started daemon)> Ended Process
<Process(PoolWorker-2, started daemon)> Started Process, args=('tscapy.py', 4)
<Process(PoolWorker-2, started daemon)> Ended Process

Processes stop working while queue is not empty

I'm trying to write a script in Python to convert URLs into their corresponding IPs. Since the URL file is huge (nearly 10GB), I'm trying to use the multiprocessing lib.
I create one process to write output to a file and a set of processes to convert URLs.
Here is my code:
import multiprocessing as mp
import socket
import time
num_processes = mp.cpu_count()
sentinel = None
def url2ip(inqueue, output):
    v_url = inqueue.get()
    print 'v_url ' + v_url
    try:
        v_ip = socket.gethostbyname(v_url)
        output_string = v_url + '|||' + v_ip + '\n'
    except:
        output_string = v_url + '|||-1' + '\n'
    print 'output_string ' + output_string
    output.put(output_string)
    print output.full()

def handle_output(output):
    f_ip = open("outputfile", "a")
    while True:
        output_v = output.get()
        if output_v:
            print 'output_v ' + output_v
            f_ip.write(output_v)
        else:
            break
    f_ip.close()
if __name__ == '__main__':
    output = mp.Queue()
    inqueue = mp.Queue()
    jobs = []

    proc = mp.Process(target=handle_output, args=(output, ))
    proc.start()

    print 'run in %d processes' % num_processes
    for i in range(num_processes):
        p = mp.Process(target=url2ip, args=(inqueue, output))
        jobs.append(p)
        p.start()

    for line in open('inputfile', 'r'):
        print 'ori ' + line.strip()
        inqueue.put(line.strip())

    for i in range(num_processes):
        # Send the sentinel to tell the workers to end
        inqueue.put(sentinel)

    for p in jobs:
        p.join()

    output.put(None)
    proc.join()
However, it does not work. It produces some output (4 out of 10 URLs in the test file), but then it just stops while the queues are not empty (I did check queue.empty()).
Could anyone suggest what's wrong? Thanks.
Your workers exit after processing a single URL each; they need to loop internally until they get the sentinel. However, you should probably just look at multiprocessing.Pool instead, as that does the bookkeeping for you.
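A minimal sketch of that looping worker, reusing the inqueue, output, and sentinel names from the question:
def url2ip(inqueue, output):
    # Keep pulling URLs until the sentinel (None) arrives, then exit.
    while True:
        v_url = inqueue.get()
        if v_url is sentinel:
            break
        try:
            v_ip = socket.gethostbyname(v_url)
            output.put(v_url + '|||' + v_ip + '\n')
        except socket.error:
            output.put(v_url + '|||-1' + '\n')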

Sending and receiving async over multiprocessing.Pipe() in Python

I'm having some issues getting Pipe.send to work in this code. What I would ultimately like to do is send and receive messages to and from the foreign process while it's running in a fork. This is eventually going to be integrated into a pexpect loop for talking to interpreter processes.
from multiprocessing import Process, Pipe
from pexpect import spawn
class CockProc(Process):

    def start(self):
        self.process = spawn('coqtop', ['-emacs-U'])

    def run(self, conn):
        while True:
            if not conn.poll():
                cmd = conn.recv()
                self.process.send(cmd)
                self.process.expect('\<\/prompt\>')
                result = self.process.before + self.process.after + " "
                conn.send(result)

q, p = Pipe()
proc = CockProc()
proc.start()
proc.run(p)

res = q.recv()
command = raw_input(res + " ")
q.send(command)
res = q.recv()
parent_conn.send('OHHAI')
p.join()
This works, but might need some more work. Not sure how many of these I can create and loop over.
from multiprocessing import Process, Pipe
from pexpect import spawn
class CockProc(Process):

    def start(self):
        self.process = spawn('coqtop', ['-emacs-U'])

    def run(self, conn):
        if conn.poll():
            cmd = conn.recv()
            self.process.send(cmd + "\n")
            print "sent comm"
        self.process.expect('\<\/prompt\>')
        result = self.process.before + self.process.after + " "
        conn.send(result)

here, there = Pipe(duplex=True)
proc = CockProc()
proc.start()
proc.run(there)

while True:
    if here.poll():
        res = here.recv()
        command = raw_input(res + " ")
        here.send(command)
        proc.run(there)
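For comparison, the more conventional pattern is to let Process.start() actually spawn a child and hand it one end of a duplex Pipe, so send/recv really cross a process boundary. A minimal sketch, with a stub worker standing in for the coqtop/pexpect interaction:
from multiprocessing import Process, Pipe

def worker(conn):
    # Child: echo back every command until the parent sends None.
    while True:
        cmd = conn.recv()
        if cmd is None:
            break
        conn.send("child saw: " + cmd)

if __name__ == '__main__':
    here, there = Pipe(duplex=True)
    proc = Process(target=worker, args=(there,))
    proc.start()

    for command in ["Check 1 + 1.", "Quit."]:  # stand-in commands
        here.send(command)
        print(here.recv())

    here.send(None)  # tell the child to exit
    proc.join()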
