python-requests with multithreading - python

I am working on creating an HTTP client which can generate hundreds of connections each second and send up to 10 requests on each of those connections. I am using threading to achieve concurrency.
Here is my code:
def generate_req(reqSession):
    requestCounter = 0
    while requestCounter < requestRate:
        try:
            response1 = reqSession.get('http://20.20.1.2/tempurl.html')
            if response1.status_code == 200:
                client_notify('r')
        except (exceptions.ConnectionError, exceptions.HTTPError, exceptions.Timeout) as Err:
            client_notify('F')
            break
        requestCounter += 1

def main():
    for q in range(connectionPerSec):
        s1 = requests.session()
        t1 = threading.Thread(target=generate_req, args=(s1,))
        t1.start()
Issues:
1. It is not scaling above 200 connections/sec with requestRate = 1. I ran other available HTTP clients on the same client machine against the same server; those tests ran fine and were able to scale.
2. When requestRate = 10, connections/sec drops to 30. Reason: the client is not able to create the targeted number of threads every second.
For issue #2, the client machine is not able to create enough request sessions and start new threads. As soon as requestRate is set to more than 1, things start to fall apart.
I suspect it has something to do with the HTTP connection pooling which requests uses.
Please suggest what I am doing wrong here.
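For reference, the size of the connection pool that requests keeps per host can be tuned by mounting a larger HTTPAdapter on the session; a minimal sketch (the pool sizes are illustrative):
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
# pool_connections: how many host pools are kept; pool_maxsize: connections cached per host.
adapter = HTTPAdapter(pool_connections=100, pool_maxsize=100)
session.mount('http://', adapter)
session.mount('https://', adapter)
# Requests made through this session can now keep up to 100 connections alive per host.
response = session.get('http://20.20.1.2/tempurl.html')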

I wasn't able to get things to fall apart; however, the following code has some new features:
1) extended logging, including specific per-thread information
2) all threads are join()ed at the end to make sure the parent process doesn't leave them hanging
3) multithreaded print tends to interleave the messages, which can be unwieldy. This version has client_notify() return the message data, so a future version can collect the messages and print them cleanly (see the sketch after the output below).
source
import exceptions, requests, threading, time

requestRate = 1
connectionPerSec = 2

def client_notify(msg):
    return time.time(), threading.current_thread().name, msg

def generate_req(reqSession):
    requestCounter = 0
    while requestCounter < requestRate:
        try:
            response1 = reqSession.get('http://127.0.0.1/')
            if response1.status_code == 200:
                print client_notify('r')
        except (exceptions.ConnectionError, exceptions.HTTPError, exceptions.Timeout):
            print client_notify('F')
            break
        requestCounter += 1

def main():
    for cnum in range(connectionPerSec):
        s1 = requests.session()
        th = threading.Thread(
            target=generate_req, args=(s1,),
            name='thread-{:03d}'.format(cnum),
        )
        th.start()
    for th in threading.enumerate():
        if th != threading.current_thread():
            th.join()

if __name__ == '__main__':
    main()
output
(1407275951.954147, 'thread-000', 'r')
(1407275951.95479, 'thread-001', 'r')
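On point 3, one possible shape for that future version is to have the workers push the returned tuples onto a Queue and let a single thread do the printing; a minimal sketch (Python 3):
import queue
import threading
import time

log_q = queue.Queue()

def client_notify(msg):
    # workers only enqueue; they never print directly
    log_q.put((time.time(), threading.current_thread().name, msg))

def printer():
    # a single consumer thread prints, so lines never interleave
    while True:
        item = log_q.get()
        if item is None:          # sentinel: stop the printer
            break
        print(item)

# usage: start printer() in its own thread, run the workers,
# then log_q.put(None) once every worker has been join()ed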

Related

Threading in python and Queue leads to memory leak or memory error

I created some simple code using multithreading in Python with a Queue. I have a main thread which keeps adding data to the queue (the queue maxsize is 2000), and there are 5 different threads which take items out of the queue and publish them to Redis on a specific channel.
The code works perfectly fine, but after 5 or 6 hours the publish mechanism becomes slow.
The threads that remove data from the queue become slow and start throwing buffer overflow errors once the queue size reaches maxsize; the rate of adding data to the queue stays the same as it was at the beginning.
The issue manifests differently on Linux systems with different configurations. How can I identify what kind of error it is throwing? How do I debug the problem?
As stated, the code is very simple: the main thread adds data to the queue one item at a time, and the 5 other threads take the data out of the queue one item at a time.
Sharing the code:
import socket
import threading
import Queue
import redis
import logging
import sys

logging.basicConfig(level=logging.DEBUG)
redisPub = redis.StrictRedis(host='127.0.0.1', port=6379)

def main():
    try:
        recv_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_UDP)
        recv_sock.bind(("", 8000))
        recv_sock.setblocking(0)
        recv_sock.settimeout(5)
    except Exception, e:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        logging.error("Socket Connection Unsuccessful")
        print "Program halted."
        sys.exit()
    sendThEvent = threading.Event()
    sendThEvent.clear()
    sendPktQ = Queue.Queue(maxsize=2000)
    for i in range(0, 5):
        thSend = threading.Thread(name='sendThread', target=sendThread, args=(sendPktQ, sendThEvent))
        thSend.setDaemon(True)
        thSend.start()
    packetsCounting = 0
    while True:
        try:
            recvData, recvAddr = recv_sock.recvfrom(2048)
            sendPktQ.put(recvData)
            packetsCounting += 1
        else:  # note: this 'else' has no matching 'if' in the posted snippet
            logging.error("There is some error")
        except socket.timeout:
            continue
        except Exception, e:
            exc_type, exc_obj, exc_tb = sys.exc_info()
            logging.error(e)

def sendThread(sendPktQ, listenEvent):
    if sendPktQ == None:
        pass
    else:
        while True:
            instanceSentCnt += 1
            if sendPktQ.qsize() < 1:
                event_is_set = listenEvent.wait(0)
            packetDict = sendPktQ.get()
            data = redisPub.publish('chnlName', packetToSend)
            logging.info("packet size reached to ============================================ %s ------------ %s" % (len(packetToSend), data))

if __name__ == "__main__":
    main()
Any inputs will be highly appreciated.
I see a sendPktQ.get(), but no sendPktQ.task_done(), for example:
async def _dispatch_packet(self, destination=None) -> None:
    """Send a command unless in listen_only mode."""
    if not self.command_queue.empty():
        cmd = self.command_queue.get()
        if not (destination is None or self.config.get("listen_only")):
            destination.write(bytearray(f"{cmd}\r\n".encode("ascii")))
            await asyncio.sleep(0.05)
        self.command_queue.task_done()
See the queue library docs.
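A threaded sketch of the same get()/task_done() pairing, closer to the question's setup (Python 3 naming; redisPub is assumed to be the client from the question):
import queue
import threading

send_q = queue.Queue(maxsize=2000)

def send_thread():
    while True:
        packet = send_q.get()                     # blocks until an item is available
        try:
            redisPub.publish('chnlName', packet)  # redisPub as defined in the question
        finally:
            send_q.task_done()                    # always mark the item as processed

# producer side: send_q.put(recv_data)
# send_q.join() will then block until every queued item has been task_done()'d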

Python - Entire PC network gets slower during repeated request

First of all, I am new to networking (HTTP communication) and Python.
I am currently using the requests and threading modules to periodically send or receive data with a specific site. The target site is 'https://api.coinone.co.kr', but I think it does not matter here.
With the example code below, I let Python fetch data every 1 second. At first, it works pretty well. Each request takes about 0.07 s on my computer.
import requests
import time
import threading

url0 = 'https://api.coinone.co.kr/ticker/'

class Fetch:
    def __init__(self):
        self.thread = threading.Thread(target=self.fcn)
        self.t0 = time.perf_counter()
        self.period = 1
        self.req0 = None

    def fcn(self):
        while True:
            # headers = None
            headers = {'Connection': 'close'}

            # requests
            t0 = time.perf_counter()
            req0 = requests.get(url0, headers=headers, params={'currency': 'all'})
            resp0 = req0.json()
            self.req0 = req0
            reqTimeInt0 = time.perf_counter() - t0

            # prints
            print('elapsed time: %0.3f' % t0)
            # print(req0.request.headers)
            print(resp0['result'])
            print('requests time interval: %0.3f' % reqTimeInt0)
            print('')

            # timer
            self.t0 += self.period
            now = time.perf_counter()
            sleepInterval = self.t0 - now
            if sleepInterval > 0:
                time.sleep(sleepInterval)
            else:
                while self.t0 - now <= 0:
                    self.t0 += self.period

f1 = Fetch()
f1.thread.start()
But as time passes, the time needed for each HTTP GET increases. After about 3 hours, one request takes 0.80 s, roughly 10 times longer than it took in the initial state. Furthermore, not only do the Python requests get slower, the entire PC network gets slower (including internet browsing) without any increase in CPU, RAM, or network usage. Closing the console does not bring the network speed back to normal, and I have to reboot the PC. After rebooting, the network is completely recovered and the internet works fine.
It seems like some burden on the network connection accumulates with each Python request. I tried adding 'Connection: close' to the headers, but it didn't work. Will requests.Session() fix the problem?
I don't even know where to start to figure out the problem. I have to make the repeated requests for at least several days without breaking the connection.
Thank you.
Use a session; it won't open a new network connection for every request, it will reuse a single one to make all the requests.
Here are the suggested modifications:
class Fetch:
    def __init__(self):
        self.session = requests.Session()   # note the parentheses: instantiate the session
        self.thread = threading.Thread(target=self.fcn)
        self.t0 = time.perf_counter()
        self.period = 1
        self.req0 = None

    def fcn(self):
        while True:
            # drop the 'Connection: close' header, otherwise the session cannot reuse the connection
            headers = None

            # requests
            t0 = time.perf_counter()
            req0 = self.session.get(url0, headers=headers, params={'currency': 'all'})
            resp0 = req0.json()
            self.req0 = req0

            # ... the rest of the code goes here ...
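To check whether the session is really reusing connections, one option is to turn on urllib3's debug logging and watch how often a new connection is opened; a minimal sketch:
import logging
import requests

logging.basicConfig(level=logging.DEBUG)   # urllib3 logs connection handling at DEBUG

session = requests.Session()
for _ in range(3):
    session.get('https://api.coinone.co.kr/ticker/', params={'currency': 'all'})
# With a reused session you should see a single "Starting new HTTPS connection" line;
# one such line per request means connections are not being kept alive.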

Multiple threads in while loop

I have a simple problem / question about the below code.
ip = '192.168.0.'
count = 0
while count <= 255:
    print(count)
    count += 1
    for i in range(10):
        ipg = ip + str(count)
        t = Thread(target=conn, args=(ipg, 80))
        t.start()
I want to execute 10 threads each time, wait for them to finish, and then continue with the next 10 threads until count <= 255.
I understand my problem and why it executes 10 threads for every count increase, but not how to solve it; any help would be appreciated.
This can easily be achieved using the concurrent.futures library.
Here's the example code:
from concurrent.futures import ThreadPoolExecutor

ip = '192.168.0.'
count = 0
THREAD_COUNT = 10

def work_done(future):
    result = future.result()
    # work with your result here

def main():
    global count   # count is reassigned inside main(), so declare it global
    with ThreadPoolExecutor(THREAD_COUNT) as executor:
        while count <= 255:
            count += 1
            ipg = ip + str(count)
            executor.submit(conn, ipg, 80).add_done_callback(work_done)

if __name__ == '__main__':
    main()
Here the executor returns a future for every task it submits.
Keep in mind that if you use add_done_callback(), the finished task is handed back to the main thread (which would block your main thread); if you really want true parallelism, you should wait on the future objects separately. Here's the code snippet for that:
from concurrent.futures import ThreadPoolExecutor, wait

futures = []
with ThreadPoolExecutor(THREAD_COUNT) as executor:
    while count <= 255:
        count += 1
        ipg = ip + str(count)
        futures.append(executor.submit(conn, ipg, 80))

done, not_done = wait(futures)
for future in done:
    result = future.result()
    # work with your result here
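Another option for consuming results as workers finish is concurrent.futures.as_completed; a minimal sketch (conn and THREAD_COUNT as above):
from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(THREAD_COUNT) as executor:
    futures = [executor.submit(conn, ip + str(i), 80) for i in range(1, 256)]
    for future in as_completed(futures):
        result = future.result()   # re-raises any exception conn() raised
        # work with your result here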
hope this helps!
There are two viable options: multiprocessing with ThreadPool, as #martineau suggested, or using a queue. Here's an example with a queue that executes requests concurrently in 10 different threads. Note that it doesn't do any kind of batching; as soon as a thread completes a task, it picks up the next one without waiting for the other workers (a minimal ThreadPool sketch follows the example output below):
import queue
import threading

def conn():
    try:
        while True:
            ip, port = que.get_nowait()
            print('Connecting to {}:{}'.format(ip, port))
            que.task_done()
    except queue.Empty:
        pass

que = queue.Queue()
for i in range(256):
    que.put(('192.168.0.' + str(i), 80))

# Start workers
threads = [threading.Thread(target=conn) for _ in range(10)]
for t in threads:
    t.start()

# Wait que to empty
que.join()

# Wait workers to die
for t in threads:
    t.join()
Output:
Connecting to 192.168.0.0:80
Connecting to 192.168.0.1:80
Connecting to 192.168.0.2:80
Connecting to 192.168.0.3:80
Connecting to 192.168.0.4:80
Connecting to 192.168.0.5:80
Connecting to 192.168.0.6:80
Connecting to 192.168.0.7:80
Connecting to 192.168.0.8:80
Connecting to 192.168.0.9:80
Connecting to 192.168.0.10:80
Connecting to 192.168.0.11:80
...
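For completeness, a minimal sketch of the ThreadPool variant mentioned above (conn is assumed to take an ip and a port, as in the question):
from multiprocessing.pool import ThreadPool

tasks = [('192.168.0.' + str(i), 80) for i in range(256)]
with ThreadPool(10) as pool:        # 10 worker threads
    pool.starmap(conn, tasks)       # calls conn(ip, port); returns once every task is done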
I modified your code so that it has the correct logic to do what you want. Please note that I haven't run it, but I hope you'll get the general idea:
import time
from threading import Thread

ip = '192.168.0.'
count = 0
while count <= 255:
    print(count)
    # a list to keep your threads while they're running
    alist = []
    for i in range(10):
        # count must be increased here to count threads to 255
        count += 1
        ipg = ip + str(count)
        t = Thread(target=conn, args=(ipg, 80))
        t.start()
        alist.append(t)
    # check if threads are still running
    while len(alist) > 0:
        time.sleep(0.01)
        for t in alist:
            if not t.is_alive():
                # remove completed threads
                alist.remove(t)

Python, Requests, Threading, How fast do python requests close its sockets?

I'm trying to perform actions with Python requests. Here is my code:
import threading
import requests
import resource
import time
import sys

# maximum open file limit, used by the thread limiter.
maxOpenFileLimit = resource.getrlimit(resource.RLIMIT_NOFILE)[0]  # For example, it shows 50.

# Will use one session for every thread.
requestSessions = requests.Session()
# Making the requests pool bigger to prevent [Errno -3] when sockets are stuck in CLOSE_WAIT status.
adapter = requests.adapters.HTTPAdapter(pool_maxsize=(maxOpenFileLimit + 100))
requestSessions.mount('http://', adapter)
requestSessions.mount('https://', adapter)

def threadAction(a1, a2):
    global number
    time.sleep(1)  # My actions with Requests for each thread.
    print number = number + 1

number = 0  # Count of complete actions
ThreadActions = []  # Action tasks.
for i in range(50):  # I have 50 websites I need to do in parallel threads.
    a1 = i
    for n in range(10):  # Every website I need to do in 3 threads
        a2 = n
        ThreadActions.append(threading.Thread(target=threadAction, args=(a1, a2)))

for item in ThreadActions:
    # But I can't do more than 50 Threads at once, because of maxOpenFileLimit.
    while True:
        # Thread limiter, analogue of BoundedSemaphore.
        if (int(threading.activeCount()) < threadLimiter):
            item.start()
            break
        else:
            continue

for item in ThreadActions:
    item.join()
But the thing is that after I get 50 threads up, the thread limiter starts waiting for some thread to finish its work. And here is the problem: once the script reaches the limiter, lsof -i | grep python | wc -l shows far fewer than 50 active connections, whereas before the limiter it showed all of the up-to-50 connections. Why is this happening? Or should I use requests.close() instead of requests.session() to prevent it from reusing already opened sockets?
Your limiter is a tight loop that takes up most of your processing time. Use a thread pool to limit the number of workers instead.
import multiprocessing.pool

# Will use one session for every thread.
requestSessions = requests.Session()
# Making the requests pool bigger to prevent [Errno -3] when sockets are stuck in CLOSE_WAIT status.
adapter = requests.adapters.HTTPAdapter(pool_maxsize=(maxOpenFileLimit + 100))
requestSessions.mount('http://', adapter)
requestSessions.mount('https://', adapter)

def threadAction(a1, a2):
    global number
    time.sleep(1)  # My actions with Requests for each thread.
    print number = number + 1  # DEBUG: This doesn't update number and wouldn't be
                               # thread safe if it did

number = 0  # Count of complete actions

pool = multiprocessing.pool.ThreadPool(50)  # 50 worker threads

ThreadActions = []  # Action tasks.
for i in range(50):  # I have 50 websites I need to do in parallel threads.
    a1 = i
    for n in range(10):  # Every website I need to do in 3 threads
        a2 = n
        ThreadActions.append((a1, a2))

pool.map(lambda args: threadAction(*args), ThreadActions, chunksize=1)
pool.close()

Python multiprocess Process never terminates

My routine below takes a list of urllib2.Requests and spawns a new process per request and fires them off. The purpose is asynchronous speed, so it's all fire-and-forget (no response needed). The issue is that the processes spawned in the code below never terminate, so after a few of these the box will OOM. Context: Django web app. Any help?
MP_CONCURRENT = int(multiprocessing.cpu_count()) * 2
if MP_CONCURRENT < 2: MP_CONCURRENT = 2
MPQ = multiprocessing.JoinableQueue(MP_CONCURRENT)

def request_manager(req_list):
    try:
        # put request list in the queue
        for req in req_list:
            MPQ.put(req)
            # call processes on queue
            worker = multiprocessing.Process(target=process_request, args=(MPQ,))
            worker.daemon = True
            worker.start()
        # move on after queue is empty
        MPQ.join()
    except Exception, e:
        logging.error(traceback.print_exc())

# process requests in queue
def process_request(MPQ):
    try:
        while True:
            req = MPQ.get()
            dr = urllib2.urlopen(req)
            MPQ.task_done()
    except Exception, e:
        logging.error(traceback.print_exc())
Maybe I am not right, but:
MP_CONCURRENT = int(multiprocessing.cpu_count()) * 2
if MP_CONCURRENT < 2: MP_CONCURRENT = 2
MPQ = multiprocessing.JoinableQueue(MP_CONCURRENT)

def request_manager(req_list):
    try:
        # put request list in the queue
        pool = []
        for req in req_list:
            MPQ.put(req)
            # call processes on queue
            worker = multiprocessing.Process(target=process_request, args=(MPQ,))
            worker.daemon = True
            worker.start()
            pool.append(worker)
        # move on after queue is empty
        MPQ.join()
        # Close not needed processes
        for p in pool: p.terminate()
    except Exception, e:
        logging.error(traceback.print_exc())

# process requests in queue
def process_request(MPQ):
    try:
        while True:
            req = MPQ.get()
            dr = urllib2.urlopen(req)
            MPQ.task_done()
    except Exception, e:
        logging.error(traceback.print_exc())
Or, using a multiprocessing Pool instead:
MP_CONCURRENT = int(multiprocessing.cpu_count()) * 2
if MP_CONCURRENT < 2: MP_CONCURRENT = 2
MPQ = multiprocessing.JoinableQueue(MP_CONCURRENT)
CHUNK_SIZE = 20  # number of requests sent to one process.
pool = multiprocessing.Pool(MP_CONCURRENT)

def request_manager(req_list):
    try:
        # put request list in the queue
        responce = pool.map(process_request, req_list, CHUNK_SIZE)  # function exits after all requests are called and the pool's work has ended
        # OR
        responce = pool.map_async(process_request, req_list, CHUNK_SIZE)  # request_manager exits after all requests are passed to the pool
    except Exception, e:
        logging.error(traceback.print_exc())

# process requests in queue
def process_request(req):
    dr = urllib2.urlopen(req)
This works ~5-10x faster than your code.
Another option: integrate a side broker into Django (such as RabbitMQ or something like it).
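One possible shape for the broker suggestion is Celery with RabbitMQ, where the Django view only enqueues the work; a minimal sketch (broker URL, task name, and the target URL are illustrative):
from celery import Celery
import requests

app = Celery('tasks', broker='amqp://localhost')    # RabbitMQ as the broker

@app.task
def fire_request(url):
    # the worker process does the slow I/O; the web process never waits for it
    requests.get(url, timeout=30)

# in the Django view: fire_request.delay('http://example.com/ping')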
OK, after some fiddling (and a good night's sleep) I believe I've figured out the problem (and thank you Eri, you were the inspiration I needed). The main issue with the zombie processes was that I was not signaling back that the process was finished (and killing it), both of which I (naively) thought were happening automagically with multiprocessing.
The code that worked:
# function that will be run through the pool
def process_request(req):
    try:
        dr = urllib2.urlopen(req, timeout=30)
    except Exception, e:
        logging.error(traceback.print_exc())

# process killer
def sig_end(r):
    sys.exit()

# globals
MP_CONCURRENT = int(multiprocessing.cpu_count()) * 2
if MP_CONCURRENT < 2: MP_CONCURRENT = 2
CHUNK_SIZE = 20
POOL = multiprocessing.Pool(MP_CONCURRENT)

# pool initiator
def request_manager(req_list):
    try:
        resp = POOL.map_async(process_request, req_list, CHUNK_SIZE, callback=sig_end)
    except Exception, e:
        logging.error(traceback.print_exc())
A couple of notes:
1) The function that will be hit by "map_async" ("process_request" in this example) must be defined first (and before the global declarations).
2) There is probably a more graceful way to exit the process (suggestions welcome; one possibility is sketched after these notes).
3) Using a pool in this example really was best (thanks again Eri) due to the "callback" feature, which allows me to throw a signal right away.
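On note 2, a sketch of one arguably more graceful shutdown: keep map_async fire-and-forget per request and close/join the pool once when the process exits, instead of calling sys.exit() from the callback (this assumes the pool lives for the whole process):
import atexit

def shutdown_pool():
    POOL.close()   # stop accepting new work
    POOL.join()    # wait for any outstanding requests, then the workers exit cleanly

atexit.register(shutdown_pool)

# request_manager stays as above, just without callback=sig_end:
#     POOL.map_async(process_request, req_list, CHUNK_SIZE)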
