Concurrency Testing For A Web Service Using Python

I have a web service that is required to handle significant concurrent utilization and volume, and I need to test it. Since the service is fairly specialized, it does not lend itself well to a typical testing framework. The test would need to simulate multiple clients concurrently posting to a URL, parsing the resulting HTTP response, checking that a database has been appropriately updated, and making sure certain emails have been correctly sent/received.
The current opinion at my company is that I should write this framework in Python. I have never used Python with multiple threads before, and as I was doing my research I came across the Global Interpreter Lock, which seems to be the basis of most of Python's concurrency handling. It seems to me that the GIL would prevent Python from achieving true concurrency even on a multi-processor machine. Is this true? Does this scenario change if I use a compiler to compile Python to native code? Am I just barking up the wrong tree entirely, and is Python the wrong tool for this job?

The Global Interpreter Lock prevents threads from simultaneously executing Python code. This doesn't change when Python is compiled to bytecode, because the bytecode is still run by the Python interpreter, which enforces the GIL. threading works by switching threads every sys.getcheckinterval() bytecode instructions.
This doesn't apply to multiprocessing, because it creates multiple Python processes instead of threads. You can have as many of those as your system will support, running truly concurrently.
So yes, you can do this with Python, either with threading or multiprocessing.
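As a rough illustration, here is a minimal sketch of the multiprocessing approach, simulating concurrent clients POSTing to the service. The URL, payload, and process/request counts are placeholders; a real harness would also parse each response and verify the database and outgoing mail.

import urllib2
from multiprocessing import Pool

TARGET_URL = 'http://localhost:8080/endpoint'  # hypothetical service under test

def simulate_client(client_id):
    """POST one request and return the response body for later checks."""
    payload = 'client=%d' % client_id
    response = urllib2.urlopen(TARGET_URL, payload)  # passing data makes this a POST
    return response.read()

if __name__ == '__main__':
    pool = Pool(processes=8)                        # 8 truly concurrent clients
    bodies = pool.map(simulate_client, range(100))  # 100 requests spread over them
    pool.close()
    pool.join()
    # ...then check each body, the database rows, and the sent emails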

You can use Python's multiprocessing library to achieve this:
http://docs.python.org/library/multiprocessing.html

Assuming typical network conditions, as long as you have sufficient system resources Python's regular threading module will allow you to simulate a concurrent workload at a higher rate than any real workload, because the threads spend most of their time blocked on network I/O, during which the GIL is released.
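For example, a thread-per-client sketch along these lines (the URL and request count are again placeholders) keeps every simulated client blocked on the network most of the time, so the GIL is not the bottleneck:

import threading
import urllib2

URL = 'http://localhost:8080/endpoint'  # hypothetical service under test
responses = [None] * 50

def post_once(i):
    # the GIL is released while urlopen() blocks, so these requests overlap in time
    responses[i] = urllib2.urlopen(URL, 'n=%d' % i).read()

threads = [threading.Thread(target=post_once, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()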

Related

Python portable interprocess Semaphore/Event

I'm creating a website using Flask. My WSGI server, Gunicorn, spawns multiple processes.
I have some cross-process objects (notably files) that I want to constrain access to within these processes, and raise events when they are modified.
The choice is normally to use system-wide mutexes/semaphores and events.
However, I can't find a portable (Windows/Mac/Linux) solution for these on Python.
The multiprocessing module (see this question), as far as I can tell, only works for processes spawned by the multiprocessing module itself, which these are not.
There are POSIX semaphores also, but these only work on Linux.
Does anyone know of a more general solution?
I have been researching this for a while, and the closest thing I could find is the Python file-locking library fasteners.
It works quite well on all platforms. The problem is that it only implements a system-wide mutex, not semaphore-like counting. I have implemented my own counting on top of it, using a locked file with an integer counter and active waiting, but this is still fragile and will leave the system in a bad state if one of the processes crashes and doesn't update the count properly.
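For reference, a sketch of that fragile counter scheme on top of fasteners' InterProcessLock might look like the following. The paths and holder limit are made up, and as noted above, a crashed holder leaves the count stale.

import time
import fasteners

LOCK_PATH = '/tmp/sem.lock'    # arbitrary paths for the lock and counter files
COUNT_PATH = '/tmp/sem.count'
MAX_HOLDERS = 4                # the semaphore's capacity

def _read_count():
    try:
        return int(open(COUNT_PATH).read())
    except (IOError, ValueError):
        return 0

def acquire_slot():
    while True:                # active waiting, as described above
        with fasteners.InterProcessLock(LOCK_PATH):
            count = _read_count()
            if count < MAX_HOLDERS:
                open(COUNT_PATH, 'w').write(str(count + 1))
                return
        time.sleep(0.1)

def release_slot():
    with fasteners.InterProcessLock(LOCK_PATH):
        open(COUNT_PATH, 'w').write(str(_read_count() - 1))
    # a crashed holder never reaches this point, leaving the count stale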

Python how to overcome global interpreter lock

I have implemented a tool to extract data from a ClearQuest server using Python. I need to do a lot of searches in ClearQuest, so I have implemented it using threading.
To do that I try to open an individual ClearQuest session for each thread. When I try to run this I get a runtime error and none of the ClearQuest sessions open correctly.
I did a bit of research on the internet and found that it's because of the Global Interpreter Lock in Python. I would like to know how to overcome this GIL... any idea would be much appreciated.
Instead of using threads, use different processes and use some sort of IPC to communicate between each process.
I don't think you'll have RuntimeErrors because of the GIL. Can you paste the traceback? If you have some critical parts of the code that are not re-entrant, you'll have to isolate them using some concurrency primitives.
The main issue with the GIL is that it will forcibly serialise computation. The result is reduced throughput and scaling.
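A sketch of that shape, with one session per worker process; open_session() and run_query() are stand-ins for the actual ClearQuest API:

from multiprocessing import Pool

def do_search(query):
    session = open_session()              # hypothetical: the real API call goes here
    try:
        return run_query(session, query)  # hypothetical query call
    finally:
        session.close()

if __name__ == '__main__':
    queries = ['defect1', 'defect2', 'defect3', 'defect4']  # example inputs
    pool = Pool(processes=4)              # one ClearQuest session per process
    results = pool.map(do_search, queries)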

Python should I use threading

I currently have a couple of small applications, each under 500 lines. I intend to eventually run them on a small Linux ARM box. Is it better to combine these applications and use threading, or continue to have them as two separate applications?
These applications, plus a very small website, use a small SQLite database. Only one of the applications writes; everything else currently only reads. Due to constraints of the target box I am using Python 2.6.
I am using SQLite to prevent data loss over several days of use. There is no direct interaction between the two applications, though there is the potential for database locking issues, especially during periodic data maintenance. Preventing these issues is a concern, as is the footprint of the two applications, since the target devices are pretty small.
Depends on whether you need them to share data and how involved the sharing is. Other than that, from a speed point of view on a multi-processor machine, threading won't give you much of an advantage over separate processes.
If sharing can easily take place via a flat file or database, then just let them be separate rather than complicating things with threading.
For performance purposes, I suggest you use threads: a process consumes many more resources than a thread, and threads are faster to create and need less memory (useful in an embedded environment). Of course, you'll have to deal with the common traps of multithreaded programming (concurrent access is solved with locks, but locks can lead to deadlock...).
If you plan to use many libraries that make low-level calls, perhaps with C extensions that do not properly release the GIL (Global Interpreter Lock), then processes can be a better solution, allowing your applications to keep running even when one of them is blocked by the GIL.
If you need to pass data between the two, you could use the Queues and other mechanisms in the multiprocessing module.
It's often much simpler to use multiprocessing rather than sharing memory or objects using the threading module.
If you don't need to pass data between your programs, just run them separately (and release any locks on files or databases as soon as possible to reduce contention).
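If that need does arise later, a minimal sketch with multiprocessing.Queue (the producer and consumer bodies are placeholders) looks like this:

from multiprocessing import Process, Queue

def acquirer(q):
    for sample in range(5):   # stands in for the real data acquisition
        q.put(sample)
    q.put(None)               # sentinel: tells the consumer to stop

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print 'got', item     # stands in for the real processing

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=acquirer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()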
I have decided to adopt a process rather than a threaded approach to resolving this issue. The primary factor in this decision is simplicity. The second factor is that, while one of these applications will be carrying out data acquisition, the other will be communicating with a modem on an ad-hoc basis (receiving calls); I don't have control over the calling application, and based on my investigations there is the potential for a lot to go wrong.
There are a couple of factors which may change the approach further down the line, primarily the need for the two processes to interact to prevent data contention on the database. Secondly, if resources (memory/disk space/CPU) become an issue due to the size of the device, a single application should give me the ability to manage these problems.
That said, the data acquisition application is already threaded. This allows the parent thread to manage the worker when exceptions arise, as the device will not be in a managed environment.

How to program to have all processors on your machine used?

I am running a single-threaded Python program that performs massive data processing on my Windows box. My machine has 8 processors. When I monitor the CPU usage in the Performance tab of Windows Task Manager, it shows that I am using only a very small fraction of the processing power available to me. Only one processor is being used to the fullest, and all the rest are almost idle. What should I do to ensure that all my processors are used? Is multithreading a solution?
In CPython, multithreading cannot make use of extra processors or cores for Python code, because of the GIL.
You should spawn new processes instead of new threads.
This tool is by far the simplest among all that I have come across:
parallel python
Overview:
PP is a Python module which provides a mechanism for parallel execution of Python code on SMP (systems with multiple processors or cores) and clusters (computers connected via network).
It is light, easy to install and integrate with other Python software.
PP is an open source and cross-platform module written in pure Python.
Multithreading is required to use multiple cores within a single process, but it is not necessarily a solution; processor affinity can restrict the process to a subset of the available cores, even if you have more than enough threads to use them all.
As an addition to what Jon said, if you're using the standard Python interpreter you should understand its limitations with respect to multi-threading. If your threads are pure Python and aren't making system calls, they can't run concurrently on multiple processors due to the Global Interpreter Lock, so the benefits of multi-threading are minimal. In this case, the recommendation would be to go with multiple processes instead, or to switch to another Python implementation such as Jython or IronPython, which do not have a Global Interpreter Lock.
You can get that if your program is of the type that would benefit from Python's multiprocessing module.
multiprocessing uses multiple Python processes, which avoids problems with the GIL, so it's possible to use all of your cores with Python code. It has an easy thread-like map and is the basis for more complex schemes.
It is similar to Parallel Python but is limited to the local machine; it is included with Python 2.6 and higher, and its API mirrors Python's threading module.
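As a minimal sketch of that thread-like map (crunch() is a placeholder for the real per-chunk computation):

from multiprocessing import Pool, cpu_count

def crunch(chunk):
    return sum(x * x for x in chunk)   # stands in for the heavy work

if __name__ == '__main__':
    chunks = [range(i * 100000, (i + 1) * 100000) for i in range(8)]
    pool = Pool(processes=cpu_count()) # one worker per available core
    print pool.map(crunch, chunks)     # each chunk is crunched in its own process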
Assuming your task is parallelizable, then yes, threading is certainly a solution. In particular, if you have a lot of data items to process but they can all be handled independently then it should be relatively straightforward to parallelize.
Using multiple processes instead of multiple threads might be another solution - you haven't told us enough about the problem to say, really.
Do this.
Break your task in to steps or stages. Each step reads something, does part of the overall calculation and writes something.
"""Some Step."""
import json
for some_line in sys.stdin:
object= json.loads( some_line )
# process the object
json.dump( result, sys.stdout )
Something like that ought to do fine.
If you have multiple objects that must be communicated, make a simple dictionary of the objects.
results = { 'a': a, 'b': b }
Connect them in a pipeline, like this.
python step1.py | python step2.py | python step3.py >output_file.dat
If you can break things into 8 or more steps, you will use 8 or more cores. And, BTW, this will be blazingly fast for very little real work.

Threading in a PyQt application: Use Qt threads or Python threads?

I'm writing a GUI application that regularly retrieves data through a web connection. Since this retrieval takes a while, this causes the UI to be unresponsive during the retrieval process (it cannot be split into smaller parts). This is why I'd like to outsource the web connection to a separate worker thread.
[Yes, I know, now I have two problems.]
Anyway, the application uses PyQt4, so I'd like to know what the better choice is: Use Qt's threads or use the Python threading module? What are advantages / disadvantages of each? Or do you have a totally different suggestion?
Edit (re bounty): While the solution in my particular case will probably be using a non-blocking network request like Jeff Ober and Lukáš Lalinský suggested (so basically leaving the concurrency problems to the networking implementation), I'd still like a more in-depth answer to the general question:
What are advantages and disadvantages of using PyQt4's (i.e. Qt's) threads over native Python threads (from the threading module)?
Edit 2: Thanks all for your answers. Although there's no 100% agreement, there seems to be widespread consensus that the answer is "use Qt", since the advantage of that is integration with the rest of the library, while causing no real disadvantages.
For anyone looking to choose between the two threading implementations, I highly recommend they read all the answers provided here, including the PyQt mailing list thread that abbot links to.
There were several answers I considered for the bounty; in the end I chose abbot's for the very relevant external reference; it was, however, a close call.
Thanks again.
This was discussed not too long ago on the PyQt mailing list. Quoting Giovanni Bajo's comments on the subject:
It's mostly the same. The main difference is that QThreads are better integrated with Qt (asynchronous signals/slots, event loop, etc.). Also, you can't use Qt from a Python thread (you can't, for instance, post an event to the main thread through QApplication.postEvent): you need a QThread for that to work.
A general rule of thumb might be to use QThreads if you're going to interact somehow with Qt, and use Python threads otherwise.
And some earlier comment on this subject from PyQt's author: "they are both wrappers around the same native thread implementations". And both implementations use the GIL in the same way.
Python's threads will be simpler and safer, and since this is an I/O-based application, they are able to bypass the GIL (which is released during blocking I/O). That said, have you considered non-blocking I/O using Twisted or non-blocking sockets/select?
EDIT: more on threads
Python threads
Python's threads are system threads. However, Python uses a global interpreter lock (GIL) to ensure that the interpreter only ever executes byte-code in one thread at a time, switching threads after a fixed number of byte-code instructions. Luckily, Python releases the GIL during input/output operations, making threads useful for simulating non-blocking I/O.
Important caveat: This can be misleading, since the number of byte-code instructions does not correspond to the number of lines in a program. Even a single assignment may not be atomic in Python, so a mutex lock is necessary for any block of code that must be executed atomically, even with the GIL.
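A small demonstration of that caveat; with enough iterations the unlocked version can lose updates on CPython, while the locked one never does:

import threading

counter = 0
lock = threading.Lock()

def unsafe_add(n):
    global counter
    for _ in xrange(n):
        counter += 1           # read-modify-write; a thread switch here loses updates

def safe_add(n):
    global counter
    for _ in xrange(n):
        with lock:
            counter += 1       # the lock makes the read-modify-write atomic

threads = [threading.Thread(target=unsafe_add, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print counter                  # often less than 400000 without the lock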
Qt threads
When Python hands off control to a third-party compiled module, it releases the GIL. It becomes the responsibility of the module to ensure atomicity where required. When control is passed back, the Python interpreter reacquires the GIL. This can make using third-party libraries in conjunction with threads confusing. It is even more difficult to use an external threading library, because it adds uncertainty as to where and when control is in the hands of the module versus the interpreter.
Qt threads operate with the GIL released. Qt threads are able to execute Qt library code (and other compiled module code that does not acquire the GIL) concurrently. However, any Python code executed within the context of a Qt thread still acquires the GIL, and now you have to manage two sets of logic for locking your code.
In the end, both Qt threads and Python threads are wrappers around system threads. Python threads are marginally safer to use, since those parts that are not written in Python (implicitly using the GIL) use the GIL in any case (although the caveat above still applies).
Non-blocking I/O
Threads add extraordinary complexity to your application, especially when dealing with the already complex interaction between the Python interpreter and compiled module code. While many find event-based programming difficult to follow, event-based, non-blocking I/O is often much easier to reason about than threads.
With asynchronous I/O, you can always be sure that, for each open descriptor, the path of execution is consistent and orderly. There are, obviously, issues that must be addressed, such as what to do when code depending on one open channel further depends on the results of code to be called when another open channel returns data.
One nice solution for event-based, non-blocking I/O is the new Diesel library. It is restricted to Linux at the moment, but it is extraordinarily fast and quite elegant.
It is also worth your time to learn pyevent, a wrapper around the wonderful libevent library, which provides a basic framework for event-based programming using the fastest available method for your system (determined at compile time).
The advantage of QThread is that it's integrated with the rest of the Qt library. That is, thread-aware methods in Qt will need to know in which thread they run, and to move objects between threads, you will need to use QThread. Another useful feature is running your own event loop in a thread.
If you are accessing a HTTP server, you should consider QNetworkAccessManager.
I asked myself the same question when I was working on PyTalk.
If you are using Qt, you need to use QThread to be able to use the Qt framework, especially the signal/slot system.
With the signal/slot engine, you will be able to talk from one thread to another and to every part of your project.
Moreover, there is no real performance question about this choice, since both are C++ bindings.
Here is my experience of PyQt and thread.
I encourage you to use QThread.
Jeff has some good points. Only the main thread can do GUI updates. If you do need to update the GUI from within a thread, Qt 4's queued connection signals make it easy to send data across threads, and they will automatically be used if you're using QThread; I'm not sure whether they will be if you're using Python threads, although it's easy to add a parameter to connect().
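For illustration, here is a minimal QThread worker along those lines, assuming PyQt4; the URL and the GUI slot name are placeholders. The dataReady signal is delivered to the GUI thread through a queued connection.

from PyQt4 import QtCore
import urllib2

class Worker(QtCore.QThread):
    dataReady = QtCore.pyqtSignal(str)   # carries the result back to the GUI thread

    def __init__(self, url, parent=None):
        QtCore.QThread.__init__(self, parent)
        self.url = url

    def run(self):
        # runs in the worker thread; the GIL is released during the blocking read
        data = urllib2.urlopen(self.url).read()
        self.dataReady.emit(data)

# in the GUI thread:
# worker = Worker('http://example.com/data')    # hypothetical URL
# worker.dataReady.connect(window.updateView)   # auto => queued across threads
# worker.start()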
I can't really recommend either, but I can try describing differences between CPython and Qt threads.
First of all, CPython threads do not run concurrently, at least not Python code. Yes, they do create system threads for each Python thread; however, only the thread currently holding the Global Interpreter Lock is allowed to run (C extensions and FFI code might bypass it, but Python bytecode is not executed while a thread doesn't hold the GIL).
On the other hand, we have Qt threads, which are basically a common layer over system threads, don't have a Global Interpreter Lock, and thus are capable of running concurrently. I'm not sure how PyQt deals with them; however, unless your Qt threads call Python code, they should be able to run concurrently (barring various extra locks that might be implemented in various structures).
For extra fine-tuning, you can modify the number of bytecode instructions that are interpreted before the GIL changes hands. Lower values mean more context switching (and possibly higher responsiveness) but lower performance per individual thread (context switches have their cost; if you try switching every few instructions it doesn't help speed).
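For example, in CPython 2.x, where the check interval defaults to 100 bytecode instructions:

import sys

print sys.getcheckinterval()   # 100 by default
sys.setcheckinterval(10)       # hand off the GIL more often: snappier, but slower overall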
Hope it helps with your problems :)
I can't comment on the exact differences between Python and PyQt threads, but I've been doing what you're attempting to do using QThread, QNetworkAccessManager and making sure to call QApplication.processEvents() while the thread is alive. If GUI responsiveness is really the issue you're trying to solve, the latter will help.
