I have code that performs load testing against a specific URL. But I have to load test a web service that has several different URLs. To do so, I need to make an array of URLs, and each thread should hit all the URLs in that array. How can I do this? This is my code:
import httplib2
import socket
import time
from threading import Event
from threading import Thread
from threading import current_thread
from urllib import urlencode
# Modify these values to control how the testing is done
# How many threads should be running at peak load.
NUM_THREADS = 50
# How many minutes the test should run with all threads active.
TIME_AT_PEAK_QPS = 20 # minutes
# How many seconds to wait between starting threads.
# Shouldn't be set below 30 seconds.
DELAY_BETWEEN_THREAD_START = 30 # seconds
quitevent = Event()
def threadproc():
    """This function is executed by each thread."""
    print "Thread started: %s" % current_thread().getName()
    h = httplib2.Http(timeout=30)
    while not quitevent.is_set():
        try:
            # HTTP requests to exercise the server go here
            # !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
            resp, content = h.request(
                "http://www.google.com")
            if resp.status != 200:
                print "Response not OK"
            # !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        except socket.timeout:
            pass
    print "Thread finished: %s" % current_thread().getName()

if __name__ == "__main__":
    runtime = (TIME_AT_PEAK_QPS * 60 + DELAY_BETWEEN_THREAD_START * NUM_THREADS)
    print "Total runtime will be: %d seconds" % runtime
    threads = []
    try:
        for i in range(NUM_THREADS):
            t = Thread(target=threadproc)
            t.start()
            threads.append(t)
            time.sleep(DELAY_BETWEEN_THREAD_START)
        print "All threads running"
        time.sleep(TIME_AT_PEAK_QPS*60)
        print "Completed full time at peak qps, shutting down threads"
    except:
        print "Exception raised, shutting down threads"
    quitevent.set()
    time.sleep(3)
    for t in threads:
        t.join(1.0)
    print "Finished"
Instead of passing a threadproc to Thread, extend the class:
class Worker(Thread):
    def __init__(self, urls):
        super(Worker, self).__init__()
        self.urls = urls

    def run(self):
        for url in self.urls:
            self.fetch(url)
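fetch() is left for you to fill in. A minimal sketch, assuming the httplib2 import and the quitevent flag from your script above (the fetch name and the loop-until-quitevent variant of run() are my assumptions, not requirements):

class Worker(Thread):
    def __init__(self, urls):
        super(Worker, self).__init__()
        self.urls = urls
        self.http = httplib2.Http(timeout=30)  # one connection object per thread

    def run(self):
        # keep cycling through every URL until the main thread signals shutdown
        while not quitevent.is_set():
            for url in self.urls:
                self.fetch(url)

    def fetch(self, url):
        try:
            resp, content = self.http.request(url)
            if resp.status != 200:
                print "Response not OK for %s" % url
        except socket.timeout:
            pass

In the main block you would then start Worker(urls) instances, where urls is your list of service URLs, instead of Thread(target=threadproc).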
That said, unless you are doing this to get a better understanding of threading and of how load testing works internally, I suggest using a mature testing framework like JMeter instead. Years of experience went into it which you would otherwise have to accumulate first.
Related
I am new to learning Python and got an exercise to create a multithreaded script that takes a list of 10 public FTP servers, connects to them anonymously, and just does a directory listing. I have the following code, and it works when I use the FTP connection inside the run function, but when I try to create an "ftp" function and use it, it keeps erroring out, and then the terminal gets stuck and I can't kill the program or get out, which I can't figure out either.
#!/usr/bin/python
import threading
import Queue
import time
from ftplib import FTP
sites = ["speedtest.tele2.net", "test.rebex.net", "test.talia.net", "ftp.swfwmd.state.fl.us", "ftp.heanet.ie", "ftp.rediris.es", "ftp.ch.freebsd.org", "ftp.mirror.nl", "ftp.ussg.iu.edu", "ftp.uni-bayreu$
class WorkerThread(threading.Thread) :
    def __init__(self, queue) :
        threading.Thread.__init__(self)
        self.queue = queue

    #def ftp(ip) :
    #    server = FTP(ip)
    #    server.login()
    #    server.retrlines('LIST')

    def run(self) :
        print "In WorkerThread"
        while True :
            counter = self.queue.get()
            print "Connecting to FTP Server %s" % counter
            #self.ftp(counter)
            #print "Ordered to sleep for %d seconds!" % counter
            #time.sleep(counter)
            #print "Finished sleeping for %d seconds" % counter
            server = FTP(counter)
            server.login()
            server.retrlines('LIST')
            self.queue.task_done()

queue = Queue.Queue()

for i in range(10) :
    print "Creating WorkerThread : %d" % i
    worker = WorkerThread(queue)
    worker.setDaemon(True)
    worker.start()
    print "WorkerThread %d Created!" % i

for j in sites :
    queue.put(j)

queue.join()
print "All Tasks Over!"
As suggested by:
Is there any way to kill a Thread in Python?
you should put a stop condition in place and make the threads check it. Together with join, this allows the thread to be terminated gracefully. Without going into other implications, try the code below.
#!/usr/bin/python
import threading
import Queue
import time
from ftplib import FTP
sites = ["speedtest.tele2.net"]
class WorkerThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self._stop = threading.Event()

    def ftp(self, ip):
        server = FTP(ip)
        server.login()
        server.retrlines('LIST')

    def run(self):
        print "In WorkerThread"
        while not self.stopped():
            counter = self.queue.get()
            print "Connecting to FTP Server %s" % counter
            self.ftp(counter)
            self.queue.task_done()

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.is_set()

if __name__ == '__main__':
    queue = Queue.Queue()
    for i in range(10):
        print "Creating WorkerThread : %d" % i
        worker = WorkerThread(queue)
        worker.setDaemon(True)
        worker.start()
        worker.stop()
        print "WorkerThread %d Created!" % i

    for j in sites:
        queue.put(j)

    queue.join()
    print "All Tasks Over!"
I have some heavy computation that needs to be done upon receiving a request without blocking the main IOLoop. To achieve that goal, I'm using ProcessPoolExecutor in a coroutine:
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from random import uniform
import uuid
import time
from datetime import datetime
import tornado.ioloop
import tornado.web
import tornado.httpserver
def worker_function(msg):
    start = time.time()
    count = 0
    seed = 1
    while count < 99999999:
        seed = uniform(1.1,1.2)
        count += 1
    end = time.time()
    msg['seed'] = seed
    msg['local_time'] = end - start
    return msg

class EventHandler(tornado.web.RequestHandler):
    def initialize(self):
        self.executor = ProcessPoolExecutor(2)

    @tornado.gen.coroutine
    def get(self):
        print "Received request at %s" % datetime.now()
        result = yield self.executor.submit(
            worker_function, {'id':str(uuid.uuid1())}
        )
        self.write(result)
        self.finish()
        print "Finished processing at %s" % datetime.now()

if __name__ == "__main__":
    counter = {'count':0}
    application = tornado.web.Application([
        (r"/test", EventHandler),
    ])
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
To test the correct behavior, I'm loading the URL in two separate browser tabs about 1 second apart. Here is what the script outputs:
Received request at 2015-09-09 23:58:00.899278
Received request at 2015-09-09 23:58:23.329648
Finished processing at 2015-09-09 23:58:44.530322
Finished processing at 2015-09-09 23:59:05.120466
The two processes are indeed running in parallel and I can see two CPU cores being used at 100% in htop. The problem is the 20-second delay between the two "Received request" lines.
How can I make sure that the main IOLoop stays snappy?
Ps: The script is running on a Linux VM with 2 CPU cores.
The main issue is that you're testing with a browser, and browsers don't like to request the same url twice at the same time even if it's in two different tabs (they wait for the first request to finish before starting the second to see if they get a cacheable response). Add some unique query parameter to each url and you should see both tabs proceed in parallel (or test with two different browsers instead of two tabs in the same browser).
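If you want to rule the browser out entirely, here is a quick sketch that fires two requests in parallel from separate threads (the localhost:8888/test address matches the script above; the ?tab= parameter name is just an arbitrary cache-buster I made up):

import threading
import urllib2
from datetime import datetime

def hit(tag):
    # a unique query string per request so nothing treats them as the same resource
    url = "http://localhost:8888/test?tab=%d" % tag
    print "started  %d at %s" % (tag, datetime.now())
    urllib2.urlopen(url).read()
    print "finished %d at %s" % (tag, datetime.now())

threads = [threading.Thread(target=hit, args=(i,)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()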
Also, your ProcessPoolExecutor should be a global (or a member of your Application) instead of a member of your RequestHandler. All requests should share the same executor.
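One way to do that, sketched against the question's code (the pool size of 2 is kept from the original; making executor a module-level global is just one of the two options mentioned above):

# created once at module level, shared by every request
executor = ProcessPoolExecutor(2)

class EventHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        result = yield executor.submit(
            worker_function, {'id': str(uuid.uuid1())}
        )
        self.write(result)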
The first few lines of the script explain the structure and the mechanism.
The problem I'm facing is that execution gets stuck at line 53. Once the Downloader acts on the first request it generates the API URL correctly; however, on reaching http_object.request(audioscrobbler_api) it gets stuck.
The script was coded and tested on another system and it yielded the correct result.
I can confirm that the httplib2 package is not broken, as it functions properly when methods of that library (including request) are called from other scripts.
What is causing the script to get stuck?
Script:
#
# Album artwork downloading module for Encore Music Player application.
# Loosely based on the producer-consumer model devised by E. W. Dijkstra.
#
# The Downloader class (implemented as a daemon thread) acts as the consumer
# in the system where it reads requests from the buffer and tries to fetch the
# artwork from ws.audioscrobbler.com (LastFM's web service portal).
#
# Requester class, the producer, is a standard thread class that places the request
# in the buffer when started.
#
# DBusRequester class provides an interface to the script and is made available on
# the session bus of the DBus daemon under the name of 'com.encore.AlbumArtDownloader'
# which enables the core music player to request downloads.
#
import threading, urllib, httplib2, md5, libxml2, os, dbus, dbus.service, signal
from collections import deque
from gi.repository import GObject
from dbus.mainloop.glib import DBusGMainLoop
requests = deque()
mutex = threading.Lock()
count = threading.Semaphore(0)
DBusGMainLoop(set_as_default = True)
class Downloader(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        while True:
            print "=> Downloader waiting for requests"
            count.acquire() # wait for new request if buffer is empty
            mutex.acquire() # enter critical section
            request = requests.popleft()
            mutex.release() # leave critical section
            (p, q) = request
            try:
                print "=> Generating api for %s by %s" % (p,q)
                params = urllib.urlencode({'method': 'album.getinfo', 'api_key': 'XXX', 'artist': p, 'album': q})
                audioscrobbler_api = "http://ws.audioscrobbler.com/2.0/?%s" % params
                print "=> Generated URL %s" % (audioscrobbler_api)
                http_object = httplib2.Http()
                print "=> Requesting response"
                resp, content = http_object.request(audioscrobbler_api)
                print "=> Received response"
                if not resp.status == 200:
                    print "Unable to fetch artwork for %s by %s" % (q, p)
                    continue # proceed to the next item in queue if request fails
                doc = libxml2.parseDoc(content)
                ctxt = doc.xpathNewContext()
                res = ctxt.xpathEval("//image[@size='medium']") # grab the element containing the link to a medium sized artwork
                if len(res) < 1:
                    continue # proceed to the next item in queue if the required image node is not found
                image_uri = res[0].content # extract uri from node
                wget_status = os.system("wget %s -q --tries 3 -O temp" % (image_uri))
                if not wget_status == 0:
                    continue # proceed to the next item in queue if download fails
                artwork_name = "%s.png" % (md5.md5("%s + %s" % (p, q)).hexdigest())
                os.system("convert temp -resize 64x64 %s" % artwork_name)
            except:
                pass # handle http request error

class Requester(threading.Thread):
    def __init__(self, request):
        self.request = request
        threading.Thread.__init__(self)

    def run(self):
        mutex.acquire() # enter critical section
        if not self.request in requests:
            requests.append(self.request)
            count.release() # signal downloader
        mutex.release() # leave critical section

class DBusRequester(dbus.service.Object):
    def __init__(self):
        bus_name = dbus.service.BusName('com.encore.AlbumArtDownloader', bus=dbus.SessionBus())
        dbus.service.Object.__init__(self, bus_name, '/com/encore/AlbumArtDownloader')

    @dbus.service.method('com.encore.AlbumArtDownloader')
    def queue_request(self, artist_name, album_name):
        request = (artist_name, album_name)
        requester = Requester(request)
        requester.start()

def sigint_handler(signum, frame):
    """Exit gracefully on receiving SIGINT."""
    loop.quit()
signal.signal(signal.SIGINT, sigint_handler)
downloader_daemon = Downloader()
downloader_daemon.daemon = True
downloader_daemon.start()
requester_service = DBusRequester()
loop = GObject.MainLoop()
loop.run()
On pressing Ctrl-C:
=> Downloader waiting for requests
=> Generating api for paul van dyk by evolution
=> Generated URL http://ws.audioscrobbler.com/2.0/?album=evolution&api_key=XXXXXXXXXXXXXXXXXXXX&method=album.getinfo&artist=paul+van+dyk
=> Requesting response
^C
Thanks !!
When your script gets stuck at line 53, can you break the execution using Ctrl + C and show us the traceback python gives?
The problem was caused by Python's Global Interpreter Lock (GIL).
GObject.threads_init()
fixes the problem.
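In the script above, that call would go near the top, before the downloader thread and the main loop are started. A sketch, where only the threads_init() line is new and the rest is unchanged from the question:

DBusGMainLoop(set_as_default = True)
GObject.threads_init()  # initialise GObject/GLib threading before any threads run

downloader_daemon = Downloader()
downloader_daemon.daemon = True
downloader_daemon.start()

requester_service = DBusRequester()

loop = GObject.MainLoop()
loop.run()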
In the code below, if I change one of the URLs to something invalid the whole process stops, and I can't exit from the terminal using Ctrl+C. So my question is: how should I handle exceptions in my thread's run method so that if an error happens it is caught and the code moves on to the next list element without failing the whole process?
#!/usr/bin/env python
import Queue
import threading
import urllib2
import time
hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com","http://apple.com"]
queue = Queue.Queue()
class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            #grabs host from queue
            host = self.queue.get()

            #grabs urls of hosts and prints first 1024 bytes of page
            url = urllib2.urlopen(host)
            print "connected"

            #signals to queue job is done
            self.queue.task_done()
start = time.time()
def main():
    #spawn a pool of threads, and pass them queue instance
    for i in range(5):
        t = ThreadUrl(queue)
        t.setDaemon(True)
        t.start()

    #populate queue with data
    for host in hosts:
        queue.put(host)

    #wait on the queue until everything has been processed
    queue.join()
main()
print "Elapsed Time: %s" % (time.time() - start)
Use a finally block to make sure the thread always signals the queue, even when there is an error.
def run(self):
    while True:
        #grabs host from queue
        host = self.queue.get()

        #grabs urls of hosts and prints first 1024 bytes of page
        try:
            url = urllib2.urlopen(host)
            print "connected"
        except urllib2.URLError:
            print "couldn't connect to %s" % host
        finally:
            #signals to queue job is done
            self.queue.task_done()
I am writing a small multi-threaded HTTP file downloader and would like to be able to shrink the pool of available threads as the code encounters errors.
The errors would be specific HTTP errors returned when the web server is not allowing any more connections.
E.g. if I set up a pool of 5 threads, each thread attempts to open its own connection and download a chunk of the file. The server may only allow 2 connections and will, I believe, return 503 errors. I want to detect this and shut down a thread, eventually limiting the size of the pool to, presumably, only the 2 connections the server will allow.
Can I make a thread stop itself?
Is self._Thread__stop() sufficient?
Do I also need to join()?
Here's my worker class that does the downloading. It grabs work from the queue to process; once a chunk is downloaded it dumps the result into resultQ to be saved to file by the main thread.
It's here that I would like to detect an HTTP 503 and stop/kill/remove a thread from the available pool, and of course re-add the failed chunk back to the queue so the remaining threads will process it.
class Downloader(threading.Thread):
    def __init__(self, queue, resultQ, file_name):
        threading.Thread.__init__(self)
        self.workQ = queue
        self.resultQ = resultQ
        self.file_name = file_name

    def run(self):
        while True:
            block_num, url, start, length = self.workQ.get()
            print 'Starting Queue #: %s' % block_num
            print start
            print length

            #Download the file
            self.download_file(url, start, length)

            #Tell queue that this task is done
            print 'Queue #: %s finished' % block_num
            self.workQ.task_done()

    def download_file(self, url, start, length):
        request = urllib2.Request(url, None, headers)
        if length == 0:
            return None
        request.add_header('Range', 'bytes=%d-%d' % (start, start + length))

        while 1:
            try:
                data = urllib2.urlopen(request)
            except urllib2.URLError, u:
                print "Connection did not start with", u
            else:
                break

        chunk = ''
        block_size = 1024
        remaining_blocks = length
        while remaining_blocks > 0:
            if remaining_blocks >= block_size:
                fetch_size = block_size
            else:
                fetch_size = int(remaining_blocks)
            try:
                data_block = data.read(fetch_size)
                if len(data_block) == 0:
                    print "Connection: [TESTING]: 0 sized block" + \
                        " fetched."
                if len(data_block) != fetch_size:
                    print "Connection: len(data_block) != length" + \
                        ", but continuing anyway."
                    self.run()
                    return
            except socket.timeout, s:
                print "Connection timed out with", s
                self.run()
                return
            remaining_blocks -= fetch_size
            chunk += data_block
        self.resultQ.put([start, chunk])
Below is where I initialize the thread pool; further down I put items into the queue.
# create a thread pool and give them a queue
for i in range(num_threads):
    t = Downloader(workQ, resultQ, file_name)
    t.setDaemon(True)
    t.start()
Can I make a thread stop itself?
Don't use self._Thread__stop(). It is enough to exit the thread's run() method (you can check a flag or read a sentinel value from a queue to know when to exit).
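A minimal sketch of the sentinel approach (the None sentinel and the worker shape here are illustrative, not taken from your code):

import Queue
import threading

def worker(q):
    while True:
        item = q.get()
        try:
            if item is None:      # sentinel: no more work, leave run()
                return
            # ... process item ...
        finally:
            q.task_done()

q = Queue.Queue()
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(5)]
for t in threads:
    t.start()
for chunk in ["a", "b", "c"]:
    q.put(chunk)
for _ in threads:
    q.put(None)                   # one sentinel per thread
for t in threads:
    t.join()                      # threads exit cleanly, no need for __stop()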
It's in here where I would like to detect a http 503 and stop/kill/remove a thread from the available pools - and of course re-add the failed chunk back to the queue so the remaining threads will process it
You can simplify the code by separating responsibilities:
download_file() should not try to reconnect in an infinite loop. If there is an error, let the code that calls download_file() resubmit it if necessary
the control over the number of concurrent connections can be encapsulated in a Semaphore object. The number of threads may differ from the number of concurrent connections in this case
import concurrent.futures  # on Python 2.x: pip install futures
from threading import BoundedSemaphore

def download_file(args):
    nconcurrent.acquire(timeout=args['timeout'])  # block if too many connections
    # ...
    nconcurrent.release()  # NOTE: don't release it on exception,
                           #       allow the caller to handle it

# you can put it into a dictionary: server -> semaphore instead of the global
nconcurrent = BoundedSemaphore(5)  # start with at most 5 concurrent connections

with concurrent.futures.ThreadPoolExecutor(max_workers=NUM_THREADS) as executor:
    future_to_args = dict((executor.submit(download_file, args), args)
                          for args in generate_initial_download_tasks())
    while future_to_args:
        for future in concurrent.futures.as_completed(dict(**future_to_args)):
            args = future_to_args.pop(future)
            try:
                result = future.result()
            except Exception as e:
                print('%r generated an exception: %s' % (args, e))
                if getattr(e, 'code', None) != 503:
                    # don't decrease number of concurrent connections
                    nconcurrent.release()
                # resubmit
                args['timeout'] *= 2
                future_to_args[executor.submit(download_file, args)] = args
            else:  # successfully downloaded `args`
                print('%r returned %r' % (args, result))
See ThreadPoolExecutor() example.
you should be using a threadpool to control the life of your threads:
http://www.inductiveload.com/posts/easy-thread-pools-in-python-with-threadpool/
Then when a thread exits, you can send a message to the main thread (which is handling the threadpool), change the size of the threadpool, and postpone new requests or failed requests in a stack that you'll empty.
tedelanay is absolutely right about the daemon status you're giving to your threads. There is no need to set them as daemons.
Basically, you can simplify your code; you could do something like the following:
import threadpool

def process_tasks():
    pool = threadpool.ThreadPool(4)

    requests = threadpool.makeRequests(download_file, arguments)

    for req in requests:
        pool.putRequest(req)

    #wait for them to finish (or you could go and do something else)
    pool.wait()

if __name__ == '__main__':
    process_tasks()
where arguments is up to your strategy. Either you give your threads a queue as an argument and then empty the queue, or you can process the queue in process_tasks, blocking while the pool is full and opening a new thread when a thread is done but the queue is not empty. It all depends on your needs and the context of your downloader.
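For example, arguments could simply be a list of (positional-args, keyword-args) pairs built from your chunk list, assuming a standalone download_file(url, start, length) function and the (args, kwargs) tuple format described in the threadpool documentation (the chunk values below are only illustrative):

# one (positional-args, keyword-args) pair per chunk to download
arguments = [
    ((url, start, length), {})
    for (url, start, length) in [
        ("http://example.com/file.bin", 0,    1024),
        ("http://example.com/file.bin", 1024, 1024),
    ]
]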
resources:
http://chrisarndt.de/projects/threadpool/
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/203871
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/196618
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302746
http://lethain.com/using-threadpools-in-python/
A Thread object terminates the thread simply by returning from the run method; it doesn't call stop. If you set your thread to daemon mode, there is no need to join, but otherwise the main thread needs to do it. It is common for the thread to use the resultq to report that it is exiting and for the main thread to use that info to do the join. This helps with orderly termination of your process. You can get strange errors during system exit if Python is still juggling multiple threads, and it's best to side-step that.
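A minimal sketch of that reporting pattern (the EXIT marker, queue names, and worker shape are illustrative, not from the question's code):

import Queue
import threading

EXIT = object()  # marker a worker puts on resultq just before leaving run()

def worker(workq, resultq):
    while True:
        item = workq.get()
        if item is None:            # sentinel from the main thread: stop working
            resultq.put(EXIT)       # report that this worker is exiting
            return
        resultq.put(("done", item))

workq, resultq = Queue.Queue(), Queue.Queue()
threads = [threading.Thread(target=worker, args=(workq, resultq)) for _ in range(2)]
for t in threads:
    t.start()
for i in range(5):
    workq.put(i)
for _ in threads:
    workq.put(None)

exited = 0
while exited < len(threads):
    msg = resultq.get()
    if msg is EXIT:
        exited += 1                 # a worker reported exit; joining it won't block
for t in threads:
    t.join()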