PROBLEM
I have two separate processes running in parallel and I would like them to communicate back and forth.
EXPLANATION OF THE CODE
The code is in Python 2.7. In my stripped-to-minimum script, I use a queue for communication between the processes. Process p1 puts data into the queue. Process p2 gets the data from the queue and does something with it. Process p2 then puts the modified data back into the queue, and finally process p1 gets the modified data back from the queue. The modified data must return to process p1 because that process is really an eventlet server that sends and receives requests.
CODE
#!/usr/bin/python2.7 python2.7
# -*- coding: utf-8 -*-
# script for back-and-forth data exchange between processes

# common modules
import os
import sys
import time
from multiprocessing import Process
from multiprocessing import Queue
from datetime import datetime

someData = {}

class Load():
    def post(self):
        timestamp = str(datetime.now())
        someData = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
        queue1.put(someData) # put into queue
        print "#20 process 1: put in queue1 =>", someData
        time.sleep(3)
        while True: # queue1 checking loop, comment out the loop if use time.sleep only
            if queue1.empty() == False:
                timestamp = str(datetime.now())
                res = queue1.get()
                res = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
                print "#28 get from queue1 =>", res
                break
            else:
                print "#31 queue1 empty"
                time.sleep(1)
        # while True: # queue2 checking loop
        #     if queue2.empty() == False:
        #         timestamp = str(datetime.now())
        #         res = queue2.get()
        #         res = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
        #         print "#39 get from queue2 =>", res
        #         break
        #     else:
        #         print "#42 queue2 empty"
        #         time.sleep(1)

class Unload():
    def get(self):
        try:
            if queue1.empty() == False:
                data = queue1.get() # retrieve package from queue
                #queue1.close()
                #queue1.join_thread()
                timestamp = str(datetime.now())
                data = {"process":"p2","class":"Unload()","method":"get()","timestamp":timestamp}
                print "#54 process 2: get from queue1 =>", data
                self.doSomething(data) # call method
            else:
                print "#57 queue1 empty"
                pass
        except:
            print "#60 queue1 error"
            pass
    def doSomething(self, data):
        time.sleep(3)
        timestamp = str(datetime.now())
        someData = {"process":"p2","class":"Unload()","method":"doSomething()","timestamp":timestamp}
        self.someData = someData
        print "#68 process 2: do something =>", someData
        self.put()
    def put(self):
        time.sleep(3)
        timestamp = str(datetime.now())
        self.someData = {"process":"p2","class":"Unload()","method":"put()","timestamp":timestamp}
        print "#75 process 2: put back in queue1 =>", self.someData
        res = self.someData
        queue1.put(res)
        #print "#78 process 2: put back in queue2 =>", self.someData
        #res = self.someData
        #queue2.put(res)
        #queue2.close()
        #queue2.join_thread()

# main
if __name__ == '__main__':
    queue1 = Queue()
    #queue2 = Queue()
    global p1, p2
    p1 = Process(target=Load().post(), args=(queue1,)) # process p1
    #p1 = Process(target=Load().post(), args=(queue1,queue2,))
    p1.daemon = True
    p1.start()
    p2 = Process(target=Unload().get(), args=(queue1,)) # process p2
    #p2 = Process(target=Unload().get(), args=(queue1,queue2,))
    p2.start()
    p2.join()
QUESTION
I have checked other resources on the subject, but they all involve one-way communication. Below is the list of resources.
use-get-nowait-in-python-without-raising-empty-exception
in-python-how-do-you-get-data-back-from-a-particular-process-using-multiprocess
how-to-use-multiprocessing-queue-with-lock
multiprocessing module supports locks
thread-that-i-can-pause-and-resume
exchange-data-between-two-python-processes
How do I get process1 to wait for and retrieve the modified data from process2? Should I consider another approach for the communication between the processes, e.g. pipes or ZeroMQ?
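On the pipes option: multiprocessing.Pipe gives a duplex (two-way) connection out of the box, so one connection object per process is enough for a full round trip. The sketch below is only an illustration of that idea, written in Python 3 syntax; the process_request helper and the "do_stuff" payload are made-up placeholders, not part of the script above.

```python
from multiprocessing import Pipe, Process

def process_request(request):
    # stand-in for the real work done by process p2
    return request + " processed"

def worker(conn):
    # p2 side: receive, modify, and send back on the same duplex connection
    request = conn.recv()
    conn.send(process_request(request))
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()  # duplex by default
    p2 = Process(target=worker, args=(child_conn,))
    p2.start()
    parent_conn.send("do_stuff")   # p1 -> p2
    print(parent_conn.recv())      # p2 -> p1: prints "do_stuff processed"
    p2.join()
```

Note that a Pipe has exactly two ends, so it fits a strict two-process dialogue; queues remain the better fit when more processes may join.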
ATTEMPT 1: using time.sleep() without the while loops in process 1
With only the time.sleep(), the data goes into the queue and comes back into it, but never reaches its final destination in process 1. So far so good, but the final step is missing. The results are below.
#20 process 1: put in queue1 => {'process': 'p1', 'timestamp': '2020-02-23 11:40:30.234466', 'class': 'Load()', 'method': 'post()'}
#54 process 2: get from queue1 => {'process': 'p2', 'timestamp': '2020-02-23 11:40:33.239113', 'class': 'Unload()', 'method': 'get()'}
#68 process 2: do something => {'process': 'p2', 'timestamp': '2020-02-23 11:40:36.242500', 'class': 'Unload()', 'method': 'doSomething()'}
#75 process 2: put back in queue1 => {'process': 'p2', 'timestamp': '2020-02-23 11:40:39.245856', 'class': 'Unload()', 'method': 'put()'}
ATTEMPT 2: using the while loop in process 1
With the while loop checking the queue, the data goes into the queue but is caught again right after by process 1 itself; it never reaches process 2. The results are below.
#20 process 1: put in queue1 => {'process': 'p1', 'timestamp': '2020-02-23 11:46:14.606356', 'class': 'Load()', 'method': 'post()'}
#28 get from queue1 => {'process': 'p1', 'timestamp': '2020-02-23 11:46:17.610202', 'class': 'Load()', 'method': 'post()'}
#57 queue1 empty
ATTEMPT 3: using two queues
Using two queues: queue1 from process1 to process2, queue2 from process2 to process1. The data goes into queue1 but does not return on queue2; it mysteriously vanishes. The results are below.
#20 process 1: put in queue1 => {'process': 'p1', 'timestamp': '2020-02-23 11:53:39.745177', 'class': 'Load()', 'method': 'post()'}
#42 queue2 empty
----- UPDATE 20200224: attempts 4, 5 and 6 -----------------------------------------------------------------
ATTEMPT 4: using two queues with manager.Queue()
Using two queues created with manager.Queue(): queue1 from process1 to process2, queue2 from process2 to process1. The data goes into queue1 but does not return on queue2; again it mysteriously vanishes. The code and results are below.
The code of the attempt 4:
#!/usr/bin/python2.7 python2.7
# -*- coding: utf-8 -*-
# script for serialized interprocess data exchange

# common modules
import os
import sys
import time
import multiprocessing
from multiprocessing import Process
from multiprocessing import Queue
from multiprocessing import Manager
from datetime import datetime

someData = {}
manager = multiprocessing.Manager()
queue1 = manager.Queue()
queue2 = manager.Queue()

class Load():
    def post(self):
        timestamp = str(datetime.now())
        someData = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
        queue1.put(someData) # put into queue
        print "#20 process 1: put in queue1 =>", someData
        time.sleep(3)
        # while True: # queue1 checking loop
        #     if queue1.empty() == False:
        #         timestamp = str(datetime.now())
        #         res = queue1.get()
        #         res = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
        #         print "#28 get from queue1 =>", res
        #         break
        #     else:
        #         print "#31 queue1 empty"
        #         time.sleep(1)
        while True: # queue2 checking loop
            if queue2.empty() == False:
                timestamp = str(datetime.now())
                res = queue2.get()
                res = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
                print "#39 get from queue2 =>", res
                break
            else:
                print "#42 queue2 empty"
                time.sleep(1)

class Unload():
    def get(self):
        try:
            if queue1.empty() == False:
                data = queue1.get() # retrieve package from queue
                #queue1.close()
                #queue1.join_thread()
                timestamp = str(datetime.now())
                data = {"process":"p2","class":"Unload()","method":"get()","timestamp":timestamp}
                print "#54 process 2: get from queue1 =>", data
                self.doSomething(data) # call method
            else:
                print "#57 queue1 empty"
                pass
        except:
            print "#60 queue1 error"
            pass
    def doSomething(self, data):
        time.sleep(3)
        timestamp = str(datetime.now())
        someData = {"process":"p2","class":"Unload()","method":"doSomething()","timestamp":timestamp}
        self.someData = someData
        print "#68 process 2: do something =>", someData
        self.put()
    def put(self):
        time.sleep(3)
        timestamp = str(datetime.now())
        self.someData = {"process":"p2","class":"Unload()","method":"put()","timestamp":timestamp}
        res = self.someData
        #print "#75 process 2: put back in queue1 =>", self.someData
        #queue1.put(res)
        print "#78 process 2: put back in queue2 =>", self.someData
        queue2.put(res)
        #queue2.close()
        #queue2.join_thread()

# main
if __name__ == '__main__':
    manager = multiprocessing.Manager()
    queue1 = manager.Queue()
    queue2 = manager.Queue()
    global p1, p2
    #p1 = Process(target=Load().post(), args=(queue1,)) # process p1
    p1 = Process(target=Load().post(), args=(queue1,queue2,))
    p1.daemon = True
    p1.start()
    #p2 = Process(target=Unload().get(), args=(queue1,)) # process p2
    p2 = Process(target=Unload().get(), args=(queue1,queue2,))
    p2.start()
    p2.join()
The results of the attempt 4:
#20 process 1: put in queue1 => {'process': 'p1', 'timestamp': '2020-02-24 13:06:17.687762', 'class': 'Load()', 'method': 'post()'}
#42 queue2 empty
ATTEMPT 5: using one queue with manager.Queue()
Using one queue created with manager.Queue(): queue1 from process1 to process2, and queue1 back from process2 to process1. The data goes into queue1 but is caught again right after; it never reaches process 2. The code and results are below.
The code of the attempt 5:
#!/usr/bin/python2.7 python2.7
# -*- coding: utf-8 -*-
# script for serialized interprocess data exchange

# common modules
import os
import sys
import time
import multiprocessing
from multiprocessing import Process
from multiprocessing import Queue
from multiprocessing import Manager
from datetime import datetime

someData = {}
manager = multiprocessing.Manager()
queue1 = manager.Queue()
#queue2 = manager.Queue()

class Load():
    def post(self):
        timestamp = str(datetime.now())
        someData = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
        queue1.put(someData) # put into queue
        print "#25 process 1: put in queue1 =>", someData
        time.sleep(3)
        while True: # queue1 checking loop
            if queue1.empty() == False:
                timestamp = str(datetime.now())
                res = queue1.get()
                res = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
                print "#33 get from queue1 =>", res
                break
            else:
                print "#36 queue1 empty"
                time.sleep(1)
        # while True: # queue2 checking loop
        #     if queue2.empty() == False:
        #         timestamp = str(datetime.now())
        #         res = queue2.get()
        #         res = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
        #         print "#44 get from queue2 =>", res
        #         break
        #     else:
        #         print "#47 queue2 empty"
        #         time.sleep(1)

class Unload():
    def get(self):
        try:
            if queue1.empty() == False:
                data = queue1.get() # retrieve package from queue
                #queue1.close()
                #queue1.join_thread()
                timestamp = str(datetime.now())
                data = {"process":"p2","class":"Unload()","method":"get()","timestamp":timestamp}
                print "#59 process 2: get from queue1 =>", data
                self.doSomething(data) # call method
            else:
                print "#62 queue1 empty"
                pass
        except:
            print "#65 queue1 error"
            pass
    def doSomething(self, data):
        time.sleep(3)
        timestamp = str(datetime.now())
        someData = {"process":"p2","class":"Unload()","method":"doSomething()","timestamp":timestamp}
        self.someData = someData
        print "#73 process 2: do something =>", someData
        self.put()
    def put(self):
        time.sleep(3)
        timestamp = str(datetime.now())
        self.someData = {"process":"p2","class":"Unload()","method":"put()","timestamp":timestamp}
        res = self.someData
        print "#81 process 2: put back in queue1 =>", self.someData
        queue1.put(res)
        #print "#83 process 2: put back in queue2 =>", self.someData
        #queue2.put(res)
        #queue2.close()
        #queue2.join_thread()

# main
if __name__ == '__main__':
    manager = multiprocessing.Manager()
    queue1 = manager.Queue()
    #queue2 = manager.Queue()
    global p1, p2
    p1 = Process(target=Load().post(), args=(queue1,)) # process p1
    #p1 = Process(target=Load().post(), args=(queue1,queue2,))
    p1.daemon = True
    p1.start()
    p2 = Process(target=Unload().get(), args=(queue1,)) # process p2
    #p2 = Process(target=Unload().get(), args=(queue1,queue2,))
    p2.start()
    p2.join()
The result of the attempt 5:
#25 process 1: put in queue1 => {'process': 'p1', 'timestamp': '2020-02-24 14:08:13.975886', 'class': 'Load()', 'method': 'post()'}
#33 get from queue1 => {'process': 'p1', 'timestamp': '2020-02-24 14:08:16.980382', 'class': 'Load()', 'method': 'post()'}
#62 queue1 empty
ATTEMPT 6: using the queue timeouts
As suggested, I tried using the queue timeouts. The approach is again queue1 from process1 to process2 and queue2 from process2 to process1. The data goes into queue1 but does not return on queue2; again it mysteriously vanishes. The code and results are below.
The code of the attempt 6:
#!/usr/bin/python2.7 python2.7
# -*- coding: utf-8 -*-
# script for serialized interprocess data exchange

# common modules
import os
import sys
import time
import uuid
import Queue
#from Queue import Empty
import multiprocessing
from multiprocessing import Process
#from multiprocessing import Queue
from datetime import datetime

someData = {}

class Load():
    def post(self):
        timestamp = str(datetime.now())
        someData = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
        queue1.put(someData) # put into queue
        print "#24 process 1: put in queue1 =>", someData
        time.sleep(3)
        # while True: # queue1 checking loop
        #     if queue1.empty() == False:
        #         timestamp = str(datetime.now())
        #         res = queue1.get()
        #         res = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
        #         print "#33 get from queue1 =>", res
        #         break
        #     else:
        #         print "#36 queue1 empty"
        #         time.sleep(1)
        while True: # queue2 checking loop
            try:
                someData = queue2.get(True,1)
                timestamp = str(datetime.now())
                someData = {"process":"p1","class":"Load()","method":"post()","timestamp":timestamp}
                print "#43 process 1: got from queue2 =>", someData
                break
            except Queue.Empty:
                print "#46 process1: queue2 empty"
                continue

class Unload():
    def get(self):
        while True: # queue1 checking loop
            try:
                someData = queue1.get(True,1)
                timestamp = str(datetime.now())
                someData = {"process":"p2","class":"Unload()","method":"get()","timestamp":timestamp}
                print "#56 process2: got from queue1 =>", someData
                break
            except Queue.Empty:
                print "#59 process2: queue1 empty"
                continue
        self.doSomething(someData) # call method
    def doSomething(self, data):
        time.sleep(3)
        timestamp = str(datetime.now())
        someData = {"process":"p2","class":"Unload()","method":"doSomething()","timestamp":timestamp}
        self.someData = someData
        print "#68 process2: do something =>", someData
        self.put(someData)
    def put(self,data):
        time.sleep(3)
        timestamp = str(datetime.now())
        self.someData = {"process":"p2","class":"Unload()","method":"put()","timestamp":timestamp}
        someData = self.someData
        #print "#81 process 2: put back in queue1 =>", self.someData
        #queue1.put(res)
        print "#78 process2: put back in queue2 =>", someData
        queue2.put(someData)

# main
if __name__ == '__main__':
    queue1 = multiprocessing.Queue()
    queue2 = multiprocessing.Queue()
    global p1, p2
    #p1 = Process(target=Load().post(), args=(queue1,)) # process p1
    p1 = Process(target=Load().post(), args=(queue1,queue2,))
    p1.daemon = True
    p1.start()
    #p2 = Process(target=Unload().get(), args=(queue1,)) # process p2
    p2 = Process(target=Unload().get(), args=(queue1,queue2,))
    p2.start()
    p2.join()
The results of the attempt 6:
#24 process 1: put in queue1 => {'process': 'p1', 'timestamp': '2020-02-24 18:14:46.435661', 'class': 'Load()', 'method': 'post()'}
#46 process1: queue2 empty
NOTE: The suggested approach works when I use it without the classes. The code is below:
import uuid
import multiprocessing
from multiprocessing import Process
import Queue

def load(que_in, que_out):
    request = {"id": uuid.uuid4(), "workload": "do_stuff", }
    que_in.put(request)
    print("load: sent request {}: {}".format(request["id"], request["workload"]))
    while True:
        try:
            result = que_out.get(True, 1)
        except Queue.Empty:
            continue
        print("load: got result {}: {}".format(result["id"], result["result"]))

def unload(que_in, que_out):
    def processed(request):
        return {"id": request["id"], "result": request["workload"] + " processed", }
    while True:
        try:
            request = que_in.get(True, 1)
        except Queue.Empty:
            continue
        print("unload: got request {}: {}".format(request["id"], request["workload"]))
        result = processed(request)
        que_out.put(result)
        print("unload: sent result {}: {}".format(result["id"], result["result"]))

# main
if __name__ == '__main__':
    que_in = multiprocessing.Queue()
    que_out = multiprocessing.Queue()
    p1 = Process(target=load, args=(que_in, que_out)) # process p1
    p1.daemon = True
    p1.start()
    p2 = Process(target=unload, args=(que_in, que_out)) # process p2
    p2.start()
    p2.join()
----- UPDATE 20200225: attempt 7 ------------------------------------------------------------------------------
ATTEMPT 7: using one queue with queue timeouts in different classes (working)
In this attempt I use one queue shared between methods of different classes, with the corrected timeouts. The data goes from process1 to process2 and back from process2 to process1 through a shared_queue. This time the data travels correctly. The code and results are below.
The code of the attempt 7:
import uuid
import multiprocessing
from multiprocessing import Process
import Queue

class Input():
    def load(self, shared_queue):
        request = {"id": uuid.uuid4(), "workload": "do_stuff", }
        shared_queue.put(request)
        print("load: sent request {}: {}".format(request["id"], request["workload"]))
        while True:
            try:
                result = shared_queue.get(True, 1)
            except Queue.Empty:
                continue
            print("load: got result {}: {}".format(result["id"], result["result"]))
            break

class Output():
    def unload(self, shared_queue):
        def processed(request):
            return {"id": request["id"], "result": request["workload"] + " processed", }
        while True:
            try:
                request = shared_queue.get(True, 1)
            except Queue.Empty:
                continue
            print("unload: got request {}: {}".format(request["id"], request["workload"]))
            result = processed(request)
            shared_queue.put(result)
            print("unload: sent result {}: {}".format(result["id"], result["result"]))

# main
if __name__ == '__main__':
    shared_queue = multiprocessing.Queue()
    up = Input()
    down = Output()
    p1 = Process(target=up.load, args=(shared_queue,)) # process p1
    p1.daemon = True
    p1.start()
    p2 = Process(target=down.unload, args=(shared_queue,)) # process p2
    p2.start()
    p1.join()
    p2.join()
The results of the attempt 7:
load: sent request a461357a-b39a-43c4-89a8-a77486a5bf45: do_stuff
unload: got request a461357a-b39a-43c4-89a8-a77486a5bf45: do_stuff
unload: sent result a461357a-b39a-43c4-89a8-a77486a5bf45: do_stuff processed
load: got result a461357a-b39a-43c4-89a8-a77486a5bf45: do_stuff processed
I think you just missed the queue timeout usage:
try:
    result = que_out.get(True, 1)
except queue.Empty:
    continue
This simplified example may help you:
import uuid
from multiprocessing import Process
from multiprocessing import Queue
import queue

def load(que_in, que_out):
    request = {"id": uuid.uuid4(), "workload": "do_stuff", }
    que_in.put(request)
    print("load: sent request {}: {}".format(request["id"], request["workload"]))
    while True:
        try:
            result = que_out.get(True, 1)
        except queue.Empty:
            continue
        print("load: got result {}: {}".format(result["id"], result["result"]))

def unload(que_in, que_out):
    def processed(request):
        return {"id": request["id"], "result": request["workload"] + " processed", }
    while True:
        try:
            request = que_in.get(True, 1)
        except queue.Empty:
            continue
        print("unload: got request {}: {}".format(request["id"], request["workload"]))
        result = processed(request)
        que_out.put(result)
        print("unload: sent result {}: {}".format(result["id"], result["result"]))

# main
if __name__ == '__main__':
    que_in = Queue()
    que_out = Queue()
    p1 = Process(target=load, args=(que_in, que_out)) # process p1
    p1.daemon = True
    p1.start()
    p2 = Process(target=unload, args=(que_in, que_out)) # process p2
    p2.start()
    p2.join()
Output
load: sent request d9894e41-3e8a-4474-9563-1a99797bc722: do_stuff
unload: got request d9894e41-3e8a-4474-9563-1a99797bc722: do_stuff
unload: sent result d9894e41-3e8a-4474-9563-1a99797bc722: do_stuff processed
load: got result d9894e41-3e8a-4474-9563-1a99797bc722: do_stuff processed
SOLUTION: using one shared queue
I solved the problem by following the suggestions and making some adjustments, the key one being correct targeting of the methods of the different classes. The back-and-forth flow of the data between the two separate processes is now correct. An important note for me is to pay extra attention to the someData package exchanged between the two processes: it really has to be the same package that is tossed around. Hence the identifier entry "id": uuid.uuid4(), which lets me check that the package is the same at every passage.
#!/usr/bin/python2.7 python2.7
# -*- coding: utf-8 -*-
# script for back and forth communication between two separate processes using a shared queue

# common modules
import os
import sys
import time
import uuid
import Queue
import multiprocessing
from multiprocessing import Process
from datetime import datetime

someData = {}

class Load():
    def post(self, sharedQueue):
        timestamp = str(datetime.now()) # for timing checking
        someData = {"timestamp":timestamp, "id": uuid.uuid4(), "workload": "do_stuff",}
        self.someData = someData
        sharedQueue.put(someData) # put into the shared queue
        print("#25 p1 load: sent someData {}: {} {}".format(someData["id"], someData["timestamp"], someData["workload"]))
        time.sleep(1) # for the time flow
        while True: # sharedQueue checking loop
            try:
                time.sleep(1) # for the time flow
                timestamp = str(datetime.now())
                someData = sharedQueue.get(True,1)
                someData["timestamp"] = timestamp
                print("#37 p1 load: got back someData {}: {} {}".format(someData["id"], someData["timestamp"], someData["workload"]))
                break
            except Queue.Empty:
                print("#37 p1: sharedQueue empty")
                continue

class Unload():
    def get(self, sharedQueue):
        while True: # sharedQueue checking loop
            try:
                someData = sharedQueue.get(True,1)
                self.someData = someData
                timestamp = str(datetime.now())
                someData["timestamp"] = timestamp
                print("#50 p2 unload: got someData {}: {} {}".format(someData["id"], someData["timestamp"], someData["workload"]))
                break
            except Queue.Empty:
                print("#53 p2: sharedQueue empty")
                continue
        time.sleep(1) # for the time flow
        self.doSomething(someData, sharedQueue) # pass the data and the queue to the method
    def doSomething(self, someData, sharedQueue): # execute some code here
        timestamp = str(datetime.now())
        someData["timestamp"] = timestamp
        print("#62 p2 unload: doSomething {}: {} {}".format(someData["id"], someData["timestamp"], someData["workload"]))
        self.put(someData, sharedQueue)
        time.sleep(1) # for the time flow
    def put(self, someData, sharedQueue):
        timestamp = str(datetime.now())
        someData["timestamp"] = timestamp
        sharedQueue.put(someData)
        print("#71 p2 unload: put someData {}: {} {}".format(someData["id"], someData["timestamp"], someData["workload"]))
        time.sleep(1) # for the time flow

# main
if __name__ == '__main__':
    sharedQueue = multiprocessing.Queue()
    trx = Load()
    rcx = Unload()
    p1 = Process(target=trx.post, args=(sharedQueue,)) # process p1
    p1.daemon = True
    p1.start()
    p2 = Process(target=rcx.get, args=(sharedQueue,)) # process p2
    p2.start()
    p1.join()
    p2.join()
You have to use Manager-wrapped queue(s) to propagate changes across processes; otherwise each process has its own separate queue object and can't see the others. The Manager creates a single shared instance of the queue for all child processes.
So queue1 = Queue() becomes queue1 = manager.Queue(), with from multiprocessing import Manager at the top. If you want to keep your two-queue approach, you obviously have to wrap the second queue in the same way.
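As a minimal sketch of that change (written in Python 3 syntax; the echo worker and the "hello" payload are only illustrative, not taken from the question):

```python
from multiprocessing import Manager, Process

def echo(queue1, queue2):
    # child process: read one request from queue1 and answer on queue2
    item = queue1.get()
    queue2.put(item + " seen by child")

if __name__ == '__main__':
    manager = Manager()
    queue1 = manager.Queue()  # both queues live in the manager process
    queue2 = manager.Queue()
    p = Process(target=echo, args=(queue1, queue2))
    p.start()
    queue1.put("hello")
    print(queue2.get())       # blocks until the child replies: "hello seen by child"
    p.join()
```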
Relevant resources:
Multiple queues from one multiprocessing Manager
Python documentation
The purpose of my program is to download a file with threads. I define a unit size and use len/unit threads, where len is the length of the file to be downloaded.
Using my program the file can be downloaded, but the threads do not stop. I can't find the reason why.
This is my code...
#! /usr/bin/python
import urllib2
import threading
import os
from time import ctime

class MyThread(threading.Thread):
    def __init__(self,func,args,name=''):
        threading.Thread.__init__(self);
        self.func = func;
        self.args = args;
        self.name = name;
    def run(self):
        apply(self.func,self.args);

url = 'http://ubuntuone.com/1SHQeCAQWgIjUP2945hkZF';
request = urllib2.Request(url);
response = urllib2.urlopen(request);
meta = response.info();
response.close();
unit = 1000000;
flen = int(meta.getheaders('Content-Length')[0]);
print flen;
if flen%unit == 0:
    bs = flen/unit;
else :
    bs = flen/unit+1;
blocks = range(bs);
cnt = {};
for i in blocks:
    cnt[i]=i;

def getStr(i):
    try:
        print 'Thread %d start.'%(i,);
        fout = open('a.zip','wb');
        fout.seek(i*unit,0);
        if (i+1)*unit > flen:
            request.add_header('Range','bytes=%d-%d'%(i*unit,flen-1));
        else :
            request.add_header('Range','bytes=%d-%d'%(i*unit,(i+1)*unit-1));
        #opener = urllib2.build_opener();
        #buf = opener.open(request).read();
        resp = urllib2.urlopen(request);
        buf = resp.read();
        fout.write(buf);
    except BaseException:
        print 'Error';
    finally :
        #opener.close();
        fout.flush();
        fout.close();
        del cnt[i];
        # filelen = os.path.getsize('a.zip');
        print 'Thread %d ended.'%(i),
        print cnt;
        # print 'progress : %4.2f'%(filelen*100.0/flen,),'%';

def main():
    print 'download at:',ctime();
    threads = [];
    for i in blocks:
        t = MyThread(getStr,(blocks[i],),getStr.__name__);
        threads.append(t);
    for i in blocks:
        threads[i].start();
    for i in blocks:
        # print 'this is the %d thread;'%(i,);
        threads[i].join();
    #print 'size:',os.path.getsize('a.zip');
    print 'download done at:',ctime();

if __name__=='__main__':
    main();
Could someone please help me understand why the threads aren't stopping?
I can't really address your code example because it is quite messy and hard to follow, but one likely reason the threads never end is that a request stalls and never finishes. urllib2 lets you specify a timeout for how long a request is allowed to take.
What I would recommend for your own code is to split the work up into a queue, start a fixed number of threads (instead of a variable number), and let the worker threads pick up work until it is done. Give the HTTP requests a timeout; if the timeout expires, try again or put the work back into the queue.
Here is a generic example of how to use a queue, a fixed number of workers and a sync primitive between them:
import threading
import time
from Queue import Queue

def worker(queue, results, lock):
    local_results = []
    while True:
        val = queue.get()
        if val is None:
            break
        # pretend to do work
        time.sleep(.1)
        local_results.append(val)
    with lock:
        results.extend(local_results)
    print threading.current_thread().name, "Done!"

num_workers = 4
threads = []
queue = Queue()
lock = threading.Lock()
results = []

for i in xrange(100):
    queue.put(i)

for _ in xrange(num_workers):
    # Use None as a sentinel to signal the threads to end
    queue.put(None)
    t = threading.Thread(target=worker, args=(queue,results,lock))
    t.start()
    threads.append(t)

for t in threads:
    t.join()

print sorted(results)
print "All done"
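Incidentally, the fixed-pool-plus-queue pattern above is what concurrent.futures.ThreadPoolExecutor packages up for you (standard library in Python 3, available to Python 2 via the "futures" backport). A rough equivalent of the example, with the sleep standing in for downloading one block, might look like this:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def work(val):
    # stand-in for downloading one block (the real request should use a timeout)
    time.sleep(.01)
    return val

# the pool keeps exactly 4 worker threads and shuts them down on exit
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(100)))  # map preserves input order

print(results == list(range(100)))  # True
```

The executor handles the sentinel/join bookkeeping itself, so the worker threads are guaranteed to have exited by the time the with-block closes.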