I have a simple problem/question about the code below.
from threading import Thread

ip = '192.168.0.'
count = 0
while count <= 255:
    print(count)
    count += 1
    for i in range(10):
        ipg = ip + str(count)
        t = Thread(target=conn, args=(ipg, 80))  # conn is my connection function
        t.start()
I want to execute 10 threads at a time, wait for them to finish, and then continue with the next 10 threads until count reaches 255.
I understand my problem and why it executes 10 threads for every count increase, but not how to solve it. Any help would be appreciated.
This can easily be achieved with the concurrent.futures library.
Here's some example code:
from concurrent.futures import ThreadPoolExecutor

ip = '192.168.0.'
count = 0
THREAD_COUNT = 10

def work_done(future):
    result = future.result()
    # work with your result here

def main():
    global count  # count is reassigned inside this function
    with ThreadPoolExecutor(THREAD_COUNT) as executor:
        while count <= 255:
            count += 1
            ipg = ip + str(count)
            # conn is the connection function from the question
            executor.submit(conn, ipg, 80).add_done_callback(work_done)

if __name__ == '__main__':
    main()
Here the executor returns a Future for every task it submits.
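For reference, you can also inspect a Future directly instead of attaching a callback (a minimal sketch, reusing THREAD_COUNT and the conn function from the question):

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(THREAD_COUNT) as executor:
    future = executor.submit(conn, '192.168.0.1', 80)
    print(future.running())   # True while the task is still executing
    print(future.result())    # blocks until the task finishes, re-raising any exception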
Keep in mind that the add_done_callback() callback runs as soon as each task finishes (typically in the worker thread). If you would rather collect all the results in one place after submitting the work, wait on the future objects separately. Here's a code snippet for that:
from concurrent.futures import ThreadPoolExecutor, wait

futures = []
with ThreadPoolExecutor(THREAD_COUNT) as executor:
    while count <= 255:
        count += 1
        ipg = ip + str(count)
        futures.append(executor.submit(conn, ipg, 80))

done, not_done = wait(futures)
for future in done:
    result = future.result()
    # work with your result here
Hope this helps!
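As a side note, if you only need the results in submission order, executor.map() is an even shorter variant (a sketch, again assuming the conn function from the question):

from concurrent.futures import ThreadPoolExecutor

ips = ['192.168.0.' + str(i) for i in range(1, 256)]
with ThreadPoolExecutor(10) as executor:
    for result in executor.map(lambda ip: conn(ip, 80), ips):
        pass  # work with each result here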
There are two viable options: multiprocessing with ThreadPool, as @martineau suggested, and using a queue. Here's an example with queue that executes requests concurrently in 10 different threads (a minimal ThreadPool sketch follows the sample output below). Note that it doesn't do any kind of batching; as soon as a thread completes, it picks up the next task without caring about the status of the other workers:
import queue
import threading

def conn():
    try:
        while True:
            ip, port = que.get_nowait()
            print('Connecting to {}:{}'.format(ip, port))
            que.task_done()
    except queue.Empty:
        pass

que = queue.Queue()
for i in range(256):
    que.put(('192.168.0.' + str(i), 80))

# Start workers
threads = [threading.Thread(target=conn) for _ in range(10)]
for t in threads:
    t.start()

# Wait for the queue to empty
que.join()

# Wait for the workers to die
for t in threads:
    t.join()
Output:
Connecting to 192.168.0.0:80
Connecting to 192.168.0.1:80
Connecting to 192.168.0.2:80
Connecting to 192.168.0.3:80
Connecting to 192.168.0.4:80
Connecting to 192.168.0.5:80
Connecting to 192.168.0.6:80
Connecting to 192.168.0.7:80
Connecting to 192.168.0.8:80
Connecting to 192.168.0.9:80
Connecting to 192.168.0.10:80
Connecting to 192.168.0.11:80
...
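For completeness, here is a minimal sketch of the ThreadPool option mentioned above (Python 3; the conn function here is a hypothetical stand-in for the real connection check):

from multiprocessing.pool import ThreadPool

def conn(ip, port):
    print('Connecting to {}:{}'.format(ip, port))

args = [('192.168.0.' + str(i), 80) for i in range(256)]

pool = ThreadPool(10)        # at most 10 worker threads at a time
pool.starmap(conn, args)     # blocks until every task is done
pool.close()
pool.join()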
I modified your code so that it has the correct logic to do what you want. Please note that I didn't run it, but I hope you'll get the general idea:
import time
from threading import Thread

ip = '192.168.0.'
count = 0
while count <= 255:
    print(count)
    # a list to keep your threads while they're running
    alist = []
    for i in range(10):
        # count must be increased here to count threads to 255
        count += 1
        ipg = ip + str(count)
        t = Thread(target=conn, args=(ipg, 80))
        t.start()
        alist.append(t)
    # check if threads are still running
    while len(alist) > 0:
        time.sleep(0.01)
        for t in alist:
            if not t.is_alive():
                # remove completed threads
                alist.remove(t)
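A slightly simpler variant of the same idea joins each batch instead of polling is_alive() (a sketch, again assuming your conn function):

from threading import Thread

ip = '192.168.0.'
for start in range(1, 256, 10):
    batch = [Thread(target=conn, args=(ip + str(n), 80))
             for n in range(start, min(start + 10, 256))]
    for t in batch:
        t.start()
    for t in batch:
        t.join()  # wait for the whole batch before starting the next one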
I have found several other questions that touch on this topic but none that are quite like my situation.
I have several very large text files (3+ gigabytes in size).
I would like to process them (say 2 documents) in parallel using multiprocessing. As part of my processing (within a single process) I need to make an API call, and because of this I would like each process to have its own threads to run asynchronously.
I have come up with a simplified example (I have commented the code to try to explain what I think it should be doing):
import multiprocessing
from threading import Thread
import threading
from queue import Queue
import time

def process_huge_file(*, file_, batch_size=250, num_threads=4):
    # create an APICaller instance for each process, each with its own Queue
    api_call = APICaller()
    batch = []
    # create threads that will run asynchronously to make API calls
    # I expect these to block immediately since there is nothing in the Queue
    # (which is what api_call.run depends on to make a call)
    threads = []
    for i in range(num_threads):
        thread = Thread(target=api_call.run)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    ####
    # start processing the file line by line
    for line in file_:
        # if we are at our batch size, add the batch to the api_call to let
        # the threads do their API calling
        if i % batch_size == 0:
            api_call.queue.put(batch)
        else:
            # add fake line to batch
            batch.append(fake_line)

class APICaller:
    def __init__(self):
        # thread-safe queue to feed the threads which point at instances
        # of these APICaller objects
        self.queue = Queue()

    def run(self):
        print("waiting for something to do")
        self.queue.get()
        print("processing item in queue")
        time.sleep(0.1)
        print("finished processing item in queue")

if __name__ == "__main__":
    # fake docs
    fake_line = "this is a fake line of some text"
    # two fake docs with 1000 lines each
    fake_docs = [[fake_line] * 1000 for i in range(2)]
    ####
    num_processes = 2
    procs = []
    for idx, doc in enumerate(fake_docs):
        proc = multiprocessing.Process(target=process_huge_file, kwargs=dict(file_=doc))
        proc.start()
        procs.append(proc)
    for proc in procs:
        proc.join()
As the code is now, "waiting for something to do" prints 8 times (which makes sense: 4 threads per process) and then it stops, or "deadlocks", which is not what I expect. I expect it to start sharing time with the threads as soon as I start putting items in the Queue, but the code does not appear to make it that far. I would ordinarily step through to find the hang-up, but I still don't have a solid understanding of how best to debug with threads (another topic for another day).
In the meantime, can someone help me figure out why my code is not doing what it should be doing?
I have made a few adjustments and additions, and the code appears to do what it is supposed to now. The main adjustments are: adding a CloseableQueue class (from Brett Slatkin's Effective Python, Item 55) and ensuring that I call close and join on the queue so that the threads exit properly (in the original version the threads were joined before anything was put on the queue, so each queue.get() blocked forever). Full code with these changes below:
import multiprocessing
from threading import Thread
import threading
from queue import Queue
import time

from concurrency_utils import CloseableQueue

def sync_process_huge_file(*, file_, batch_size=250):
    batch = []
    for idx, line in enumerate(file_):
        # do processing on the text
        if idx % batch_size == 0:
            time.sleep(0.1)
            batch = []
            # api_call.queue.put(batch)
        else:
            computation = 0
            for i in range(100000):
                computation += i
            batch.append(line)

def process_huge_file(*, file_, batch_size=250, num_threads=4):
    api_call = APICaller()
    batch = []
    # api call threads
    threads = []
    for i in range(num_threads):
        thread = Thread(target=api_call.run)
        threads.append(thread)
        thread.start()
    for idx, line in enumerate(file_):
        # do processing on the text
        if idx % batch_size == 0:
            api_call.queue.put(batch)
        else:
            computation = 0
            for i in range(100000):
                computation += i
            batch.append(line)
    for _ in threads:
        api_call.queue.close()  # one sentinel per worker thread
    api_call.queue.join()
    for thread in threads:
        thread.join()

class APICaller:
    def __init__(self):
        self.queue = CloseableQueue()

    def run(self):
        for item in self.queue:
            print("waiting for something to do")
            print("processing item in queue")
            time.sleep(0.1)
            print("finished processing item in queue")
        print("exiting run")

if __name__ == "__main__":
    # fake docs
    fake_line = "this is a fake line of some text"
    # two fake docs with 10000 lines each
    fake_docs = [[fake_line] * 10000 for i in range(2)]
    ####
    time_s = time.time()
    num_processes = 2
    procs = []
    for idx, doc in enumerate(fake_docs):
        proc = multiprocessing.Process(target=process_huge_file, kwargs=dict(file_=doc))
        proc.start()
        procs.append(proc)
    for proc in procs:
        proc.join()
    time_e = time.time()
    print(f"took {time_e - time_s} ")

# concurrency_utils.py
from queue import Queue

class CloseableQueue(Queue):
    SENTINEL = object()

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def close(self):
        self.put(self.SENTINEL)

    def __iter__(self):
        while True:
            item = self.get()
            try:
                if item is self.SENTINEL:
                    return  # exit thread
                yield item
            finally:
                self.task_done()
As expected, this is a great speedup over running synchronously: 120 seconds vs. 50 seconds.
I'm trying to perform actions with Python requests. Here is my code:
import threading
import resource
import time
import sys
import requests

# Maximum open-file limit, used for the thread limiter.
maxOpenFileLimit = resource.getrlimit(resource.RLIMIT_NOFILE)[0]  # For example, it shows 50.

# Will use one session for every thread.
requestSessions = requests.Session()
# Making the requests pool bigger to prevent [Errno -3] when sockets are stuck in CLOSE_WAIT.
adapter = requests.adapters.HTTPAdapter(pool_maxsize=(maxOpenFileLimit + 100))
requestSessions.mount('http://', adapter)
requestSessions.mount('https://', adapter)

def threadAction(a1, a2):
    global number
    time.sleep(1)  # My actions with Requests for each thread.
    print number = number + 1

number = 0  # Count of completed actions

ThreadActions = []  # Action tasks.
for i in range(50):  # I have 50 websites I need to do in parallel threads.
    a1 = i
    for n in range(10):  # Every website I need to do in 3 threads
        a2 = n
        ThreadActions.append(threading.Thread(target=threadAction, args=(a1, a2)))

for item in ThreadActions:
    # But I can't do more than 50 threads at once, because of maxOpenFileLimit.
    while True:
        # Thread limiter, analogue of BoundedSemaphore.
        if int(threading.activeCount()) < threadLimiter:  # threadLimiter is set elsewhere in my script
            item.start()
            break
        else:
            continue

for item in ThreadActions:
    item.join()
But the thing is that after I get 50 threads up, the thread limiter starts waiting for some thread to finish its work, and here is the problem: after the script reaches the limiter, lsof -i | grep python | wc -l shows far fewer than 50 active connections, whereas before the limiter it showed all of the <= 50 connections. Why is this happening? Or should I use requests.close() instead of requests.session() to stop it reusing already opened sockets?
Your limiter is a tight loop that takes up most of your processing time. Use a thread pool to limit the number of workers instead.
import multiprocessing.pool

# Will use one session for every thread.
requestSessions = requests.Session()
# Making the requests pool bigger to prevent [Errno -3] when sockets are stuck in CLOSE_WAIT.
adapter = requests.adapters.HTTPAdapter(pool_maxsize=(maxOpenFileLimit + 100))
requestSessions.mount('http://', adapter)
requestSessions.mount('https://', adapter)

def threadAction(a1, a2):
    global number
    time.sleep(1)  # My actions with Requests for each thread.
    print number = number + 1  # DEBUG: This doesn't update number and wouldn't be
                               # thread safe if it did

number = 0  # Count of completed actions

pool = multiprocessing.pool.ThreadPool(50)  # at most 50 worker threads at a time

ThreadActions = []  # Action tasks.
for i in range(50):  # I have 50 websites I need to do in parallel threads.
    a1 = i
    for n in range(10):  # Every website I need to do in 3 threads
        a2 = n
        ThreadActions.append((a1, a2))

# ThreadPool.map takes a one-argument function, so unpack the (a1, a2) tuples here
pool.map(lambda args: threadAction(*args), ThreadActions, chunksize=1)
pool.close()
pool.join()
I used Python to execute this program on Ubuntu:
import thread
import time

# Define a function for the thread
def print_time(threadName, delay):
    count = 0
    while True:
        count += 1

# Create the threads as follows
try:
    for index in xrange(1, 50000):
        thread.start_new_thread(print_time, ("Thread-" + str(index), 0,))
except:
    print "Error: unable to start thread"

while 1:
    pass
I want all 8 cores at 100% usage, but through System Monitor I only get 50% usage on the first 4 cores and 25% usage on the last 4 cores.
How can I make all 8 cores reach 100% usage with Python?
Something like this will get you started. You'd need to tweak num_processes in order to match your hardware.
import multiprocessing as mp
import time

def slow_func():
    while True:
        for i in xrange(99999):
            j = i*i

def main():
    num_processes = 4
    for _ in range(num_processes):
        process = mp.Process(target=slow_func)
        process.daemon = True
        process.start()
    while True:
        time.sleep(1)

if __name__ == '__main__':
    main()
Edit: this works for me on Windows with 4 cores and gives 4x 25% processor usage.
To compare with the threading module, you can import threading and replace the line process = mp.Process(target=slow_func) with process = threading.Thread(target=slow_func), as in the sketch below. You should find it uses only one of your cores.
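For reference, the threaded version would look like this (a sketch reusing slow_func and time from the snippet above; on CPython the GIL keeps these CPU-bound threads on a single core):

import threading

def threaded_main():
    num_threads = 4
    for _ in range(num_threads):
        thread = threading.Thread(target=slow_func)
        thread.daemon = True
        thread.start()
    while True:
        time.sleep(1)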
I have a Python script with threads, and I need the following: if after, for example, 1 hour the threads are not finished, finish all threads and exit the script; if the hour is not up, wait for all my threads to finish.
I tried with a daemon thread that sleeps for the hour and then calls sys.exit(), but it doesn't work for me: the script always waits for my sleeping thread, then waits until the worker threads finish, and the sys.exit() has no effect.
import socket, threading, time, sys
from sys import argv
import os

acc_time = 0
transactions_ps = 5

ins = open(sys.argv[1], 'r')
msisdn_list = []
for line in ins:
    msisdn_list.append(line.strip('\n'))
    # print line
ins.close()

def worker(msisdn_list):
    semaphore.acquire()
    global transactions_ps
    print " ***** ", threading.currentThread().getName(), "launched"
    count = 1
    acc_time = 0
    print "len: ", len(msisdn_list)
    for i in msisdn_list:
        try:
            init = time.time()
            time.sleep(2)
            print "sleeping...", i
            time.sleep(4)
            final = time.time()
            acc_time = acc_time + final - init
            print acc_time
        except IOError:
            print "Connection failed", sys.exc_info()[0]
    print "Stopping ", threading.currentThread().getName()
    semaphore.release()

def kill_process(secs_to_die):
    time.sleep(secs_to_die)
    sys.exit()

seconds_to_die = 3600
thread_kill = threading.Thread(target=kill_process, args=(seconds_to_die,))
thread_kill.start()

max_con = 5
semaphore = threading.BoundedSemaphore(max_con)
for i in range(0, 28, transactions_ps):
    w = threading.Thread(target=worker, args=(msisdn_list[i:i+transactions_ps-1],))
    w.setDaemon(True)
    w.start()
How can I do it?
A minimal change to your code that would fix the issue is threading.Barrier:
barrier = Barrier(number_of_threads, timeout=3600)
# create (number_of_threads - 1) threads and pass them the barrier
# each thread calls barrier.wait() on exit
barrier.wait()  # returns after number_of_threads .wait() calls, or raises BrokenBarrierError on timeout
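For concreteness, here is a hedged sketch of that structure (Python 3; the worker body and the chunking are placeholders, and the 3600-second timeout matches your one hour):

import threading, time

number_of_threads = 6  # 5 workers plus the main thread
barrier = threading.Barrier(number_of_threads, timeout=3600)

def worker(items):
    for i in items:
        time.sleep(2)       # placeholder for the real per-item work
    try:
        barrier.wait()      # tell the others this worker is done
    except threading.BrokenBarrierError:
        pass                # timed out; just let the daemon thread die

for chunk in ([1, 2], [3, 4], [5, 6], [7, 8], [9, 10]):
    threading.Thread(target=worker, args=(chunk,), daemon=True).start()

try:
    barrier.wait()          # returns once every worker has arrived
except threading.BrokenBarrierError:
    exit('timeout')         # daemon workers are abandoned on exit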
A simpler alternative is to use multiprocessing.dummy.Pool, which creates daemon threads:
from multiprocessing.dummy import Pool  # use threads

start = timer()
endtime = start + 3600
for result in pool.imap_unordered(work, args):
    if timer() > endtime:
        exit("timeout")
The code doesn't time out until a work item is done; i.e., it expects that processing a single item from the list doesn't take long.
Complete example:
#!/usr/bin/env python3
import logging
import multiprocessing as mp
from multiprocessing.dummy import Pool
from time import monotonic as timer, sleep

info = mp.get_logger().info

def work(i):
    info("start %d", i)
    sleep(1)
    info("end %d", i)

seconds_to_die = 3600
max_con = 5

mp.log_to_stderr().setLevel(logging.INFO)  # enable logging
pool = Pool(max_con)  # no more than max_con at a time
start = timer()
endtime = start + seconds_to_die
for _ in pool.imap_unordered(work, range(10000)):
    if timer() > endtime:
        exit("timeout")
You may refer to this implementation of KThread:
http://python.todaysummary.com/q_python_45717.html
I am working on creating an HTTP client which can generate hundreds of connections each second and send up to 10 requests on each of those connections. I am using threading to achieve concurrency.
Here is my code:
def generate_req(reqSession):
    requestCounter = 0
    while requestCounter < requestRate:
        try:
            response1 = reqSession.get('http://20.20.1.2/tempurl.html')
            if response1.status_code == 200:
                client_notify('r')
        except (exceptions.ConnectionError, exceptions.HTTPError, exceptions.Timeout) as Err:
            client_notify('F')
            break
        requestCounter += 1

def main():
    for q in range(connectionPerSec):
        s1 = requests.session()
        t1 = threading.Thread(target=generate_req, args=(s1,))
        t1.start()
Issues:
1) It is not scaling above 200 connections/sec with requestRate = 1. I ran other available HTTP clients on the same client machine against the same server; those tests run fine and are able to scale.
2) When requestRate = 10, connections/sec drops to 30. Reason: it is not able to create the targeted number of threads every second.
For issue #2, the client machine is not able to create enough request sessions and start new threads. As soon as requestRate is set to more than 1, things start to fall apart.
I suspect it has something to do with the HTTP connection pooling which requests uses.
Please suggest what I am doing wrong here.
I wasn't able to get things to fall apart; however, the following code has some new features:
1) extended logging, including specific per-thread information
2) all threads are join()ed at the end to make sure the parent process doesn't leave them hanging
3) multithreaded print tends to interleave the messages, which can be unwieldy. This version uses yield so a future version can accept the messages and print them clearly.
source
import requests, threading, time
from requests import exceptions

requestRate = 1
connectionPerSec = 2

def client_notify(msg):
    return time.time(), threading.current_thread().name, msg

def generate_req(reqSession):
    requestCounter = 0
    while requestCounter < requestRate:
        try:
            response1 = reqSession.get('http://127.0.0.1/')
            if response1.status_code == 200:
                print client_notify('r')
        except (exceptions.ConnectionError, exceptions.HTTPError, exceptions.Timeout):
            print client_notify('F')
            break
        requestCounter += 1

def main():
    for cnum in range(connectionPerSec):
        s1 = requests.session()
        th = threading.Thread(
            target=generate_req, args=(s1,),
            name='thread-{:03d}'.format(cnum),
        )
        th.start()

    for th in threading.enumerate():
        if th != threading.current_thread():
            th.join()

if __name__ == '__main__':
    main()
output
(1407275951.954147, 'thread-000', 'r')
(1407275951.95479, 'thread-001', 'r')