Controlling scheduling priority of python threads?

Controlling scheduling priority of python threads? - python

I've written a script that uses two thread pools of ten threads each to pull in data from an API. The thread pool implements this code on ActiveState. Each thread pool is monitoring a Redis database via PubSub for new entries. When a new entry is published, python passes the data to a function that uses python's Subprocess.POpen to execute a PHP shell to do the actual work of calling the API.
This system of launching PHP shells is necessary for functionality with my PHP web app, so launching PHP shells with Python can't be avoided.
This script will only be running on Linux servers.
How do I control the niceness (scheduling priority) of the application's threads?
Edit:
It seems controlling scheduling priority for individual threads in Python isn't possible. Is there a python solution, or at the very least a UNIX command I can run along with my script, to control the priority?
Edit 2:
Well I didn't end up finding a python way to handle it. I'm just running my script with nice now like this:
nice -n 19 python MyScript.py

I believe that threading priority is not controllable in python due to how they are implemented using a global interpreter lock (GIL). Having said that, even if you could give one thread more CPU processing priority, the python implementation that hands around the GIL would not be aware of this as it handed around the GIL. If you were able to increase niceness in a single thread in your pool (say it is doing a more important job) you would need to use your own implementation of locks to give the higher priority thread access to the GIL more often.
A google search returns this article which I believe is similar to what you are asking
Explains why it doesnt work
http://www.velocityreviews.com/forums/t329441-threading-priority.html
Explains the workaround I was suggesting
http://bytes.com/topic/python/answers/645966-setting-thread-priorities

The python threading-docs mention explicitly that there is no support for setting thread-priorities:
The design of this module is loosely based on Java’s threading model. However, where Java makes locks and condition variables basic behavior of every object, they are separate objects in Python. Python’s Thread class supports a subset of the behavior of Java’s Thread class; currently, there are no priorities, no thread groups, and threads cannot be destroyed, stopped, suspended, resumed, or interrupted. The static methods of Java’s Thread class, when implemented, are mapped to module-level functions.

It doesn't work, but I tried:
getting the parent pid and priority
launching threads using concurrent.futures.ThreadPoolExecutor
using ctypes to get the (linux) thread id from within the thread(works)
using the tid with os.setpriority(os.PRIO_PROCESS,tid,parent_priority+1)
calling pool.shutdown() from the parent.
Even with liberal sprinkling of os.sched_yield(), the child threads never actually run past the setpriority().
Reading man pages, it seems threads don't have the capability to change (even their) scheduling priority; you have to do something with "capabilities" to give the thread the "CAP_SYS_NICE" capability. Running the process with root permissions didn't help either; child threads still don't run.

I know, a lot of time has passed, but I recently came across this question, and I thought it would be useful to add another option.
Have a look at threading2, which is a drop-in replacement and extension for the default threading module, with support – sort of – for priority and affinity.

I was wondering if this answer at another related question might be useful in this scenario? (link)
As you are already using Subprocess.POpen to launch your PHP script, it strikes me that you can use "preexec_fn" and either a predefined function, or a lambda function (as demonstrated in the above linked answer) to set the nice level of each launched PHP thread?

Related

Python portable interprocess Semaphore/Event

I'm creating a website using Flask. My WSGI server, Gunicorn, spawns multiple processes.
I have some cross-process objects (notably files) that I want to constrain access to within these processes, and raise events when they are modified.
The choice is normally to use system-wide mutexes/semaphores and events.
However, I can't find a portable (Windows/Mac/Linux) solution for these on Python.
The multiprocessing module (see this question), as far as I can tell, only works for processes spawned by the multiprocessing module itself, which these are not.
There are POSIX semaphores also, but these only work on Linux.
Does anyone know of a more general solution?

I have been researching this for a while, and the closest I could find is the python file-locking library fasteners:
It works quite well in all platforms. The problem it only implements system mutex, but not semaphore like counting. I have implementing my own counting in a locked file with an integer counter and active waiting, but this is still fragile and will leave the system in bad state if one of the process crashes and doesn't update the count properly.

Multiprocessing or Multithreading for plugin architecture in Python

I'm trying to implement a plugin architecture in Python.
I've started writing it using the Threading module where each plugin is a thread which I invoke using the Thread.start() method (since all plugins subclass BasePlugin which subclasses Thread). However I've just come across the multiprocessing module.
I'm currently wondering if I should switch to the multiprocessing module and share data using shared memory / Pipes etc...
I'd like to get other's opinions on this.
The plugin architecture I've been working on works as follows:
An event is received by the Plugin Manager. The Plugin Manager checks for all the plugins who've subscribed to that type of event. It activates them and sends them the event object (since it holds additional information). If one of the plugins is already active there is no need to spawn it (just send the event object to it).
In addition there are a few resources which belong only to one plugin at any point in time. Each plugin can request the resource (I'm not worrying about any race condition here since there won't be that many plugins active at once).

Threads share memory with the primary process and each other. For example you can have a list that is available to all threads. An item appended to a list can be seen by other threads. But you have to be careful. You have to understand which operations on data structures are thread safe and which are not. What happens to the behaviour of your program when two threads are checking for the existence of a key in a dictionary and then writing to it?
Multiple processes do not share memory. The new process that you start gets a copy of the memory at the point where it was spawned.
Threads use less resources. But can be hard to reason about. On the other hand communication between processes is tricky. And you can't just access an arbitrary Python data structure. Which it sounds like you want to be able to do.
A badly written plugin, if it was in a thread, could crash your whole program. Whereas if it was in a separate process this wouldn't happen. Maybe that's a consideration?

What happens to running threads after forking?

I'm using OpenERP, a Python based ERP, which uses different threads (one-thread per client, etc). I would like to use multiprocessing.Process() to fork() and call a long-running method.
My question is: what will happen to the parent's threads? Will they be copied and continue to run? Will the child process call accept() on the server socket?
Thanks for your answers,

Forking does not copy threads, only the main one. So be very careful with forking multithreaded application as it can cause unpredictable side-effects (e.g when forking happened while some thread was executing in a mutexed critical section), something really can be broken in your forked process unless you know the code you're forking ideally.
Though everything that I said above is true, there's a workaround (at least on Linux) called pthread_atfork() which acts as a callback when a process was forked (you can recreate all needed threads). Though it applies to C applications, it's not applied to Python ones.
For further information you can refer to:
Python issue tracker on this problem - http://bugs.python.org/issue6923
Seek around the web on similar ideas implementation, for example: http://code.google.com/p/python-atfork/

Threading in a PyQt application: Use Qt threads or Python threads?

I'm writing a GUI application that regularly retrieves data through a web connection. Since this retrieval takes a while, this causes the UI to be unresponsive during the retrieval process (it cannot be split into smaller parts). This is why I'd like to outsource the web connection to a separate worker thread.
[Yes, I know, now I have two problems.]
Anyway, the application uses PyQt4, so I'd like to know what the better choice is: Use Qt's threads or use the Python threading module? What are advantages / disadvantages of each? Or do you have a totally different suggestion?
Edit (re bounty): While the solution in my particular case will probably be using a non-blocking network request like Jeff Ober and Lukáš Lalinský suggested (so basically leaving the concurrency problems to the networking implementation), I'd still like a more in-depth answer to the general question:
What are advantages and disadvantages of using PyQt4's (i.e. Qt's) threads over native Python threads (from the threading module)?
Edit 2: Thanks all for you answers. Although there's no 100% agreement, there seems to be widespread consensus that the answer is "use Qt", since the advantage of that is integration with the rest of the library, while causing no real disadvantages.
For anyone looking to choose between the two threading implementations, I highly recommend they read all the answers provided here, including the PyQt mailing list thread that abbot links to.
There were several answers I considered for the bounty; in the end I chose abbot's for the very relevant external reference; it was, however, a close call.
Thanks again.

This was discussed not too long ago in PyQt mailing list. Quoting Giovanni Bajo's comments on the subject:
It's mostly the same. The main difference is that QThreads are better
integrated with Qt (asynchrnous signals/slots, event loop, etc.).
Also, you can't use Qt from a Python thread (you can't for instance
post event to the main thread through QApplication.postEvent): you
need a QThread for that to work.
A general rule of thumb might be to use QThreads if you're going to interact somehow with Qt, and use Python threads otherwise.
And some earlier comment on this subject from PyQt's author: "they are both wrappers around the same native thread implementations". And both implementations use GIL in the same way.

Python's threads will be simpler and safer, and since it is for an I/O-based application, they are able to bypass the GIL. That said, have you considered non-blocking I/O using Twisted or non-blocking sockets/select?
EDIT: more on threads
Python threads
Python's threads are system threads. However, Python uses a global interpreter lock (GIL) to ensure that the interpreter is only ever executing a certain size block of byte-code instructions at a time. Luckily, Python releases the GIL during input/output operations, making threads useful for simulating non-blocking I/O.
Important caveat: This can be misleading, since the number of byte-code instructions does not correspond to the number of lines in a program. Even a single assignment may not be atomic in Python, so a mutex lock is necessary for any block of code that must be executed atomically, even with the GIL.
QT threads
When Python hands off control to a 3rd party compiled module, it releases the GIL. It becomes the responsibility of the module to ensure atomicity where required. When control is passed back, Python will use the GIL. This can make using 3rd party libraries in conjunction with threads confusing. It is even more difficult to use an external threading library because it adds uncertainty as to where and when control is in the hands of the module vs the interpreter.
QT threads operate with the GIL released. QT threads are able to execute QT library code (and other compiled module code that does not acquire the GIL) concurrently. However, the Python code executed within the context of a QT thread still acquires the GIL, and now you have to manage two sets of logic for locking your code.
In the end, both QT threads and Python threads are wrappers around system threads. Python threads are marginally safer to use, since those parts that are not written in Python (implicitly using the GIL) use the GIL in any case (although the caveat above still applies.)
Non-blocking I/O
Threads add extraordinarily complexity to your application. Especially when dealing with the already complex interaction between the Python interpreter and compiled module code. While many find event-based programming difficult to follow, event-based, non-blocking I/O is often much less difficult to reason about than threads.
With asynchronous I/O, you can always be sure that, for each open descriptor, the path of execution is consistent and orderly. There are, obviously, issues that must be addressed, such as what to do when code depending on one open channel further depends on the results of code to be called when another open channel returns data.
One nice solution for event-based, non-blocking I/O is the new Diesel library. It is restricted to Linux at the moment, but it is extraordinarily fast and quite elegant.
It is also worth your time to learn pyevent, a wrapper around the wonderful libevent library, which provides a basic framework for event-based programming using the fastest available method for your system (determined at compile time).

The advantage of QThread is that it's integrated with the rest of the Qt library. That is, thread-aware methods in Qt will need to know in which thread they run, and to move objects between threads, you will need to use QThread. Another useful feature is running your own event loop in a thread.
If you are accessing a HTTP server, you should consider QNetworkAccessManager.

I asked myself the same question when I was working to PyTalk.
If you are using Qt, you need to use QThread to be able to use the Qt framework and expecially the signal/slot system.
With the signal/slot engine, you will be able to talk from a thread to another and with every part of your project.
Moreover, there is not very performance question about this choice since both are a C++ bindings.
Here is my experience of PyQt and thread.
I encourage you to use QThread.

Jeff has some good points. Only one main thread can do any GUI updates. If you do need to update the GUI from within the thread, Qt-4's queued connection signals make it easy to send data across threads and will automatically be invoked if you're using QThread; I'm not sure if they will be if you're using Python threads, although it's easy to add a parameter to connect().

I can't really recommend either, but I can try describing differences between CPython and Qt threads.
First of all, CPython threads do not run concurrently, at least not Python code. Yes, they do create system threads for each Python thread, however only the thread currently holding Global Interpreter Lock is allowed to run (C extensions and FFI code might bypass it, but Python bytecode is not executed while thread doesn't hold GIL).
On the other hand, we have Qt threads, which are basically common layer over system threads, don't have Global Interpreter Lock, and thus are capable of running concurrently. I'm not sure how PyQt deals with it, however unless your Qt threads call Python code, they should be able to run concurrently (bar various extra locks that might be implemented in various structures).
For extra fine-tuning, you can modify the amount of bytecode instructions that are interpreted before switching ownership of GIL - lower values mean more context switching (and possibly higher responsiveness) but lower performance per individual thread (context switches have their cost - if you try switching every few instructions it doesn't help speed.)
Hope it helps with your problems :)

I can't comment on the exact differences between Python and PyQt threads, but I've been doing what you're attempting to do using QThread, QNetworkAcessManager and making sure to call QApplication.processEvents() while the thread is alive. If GUI responsiveness is really the issue you're trying to solve, the later will help.

Terminate long running python threads

What is the recommended way to terminate unexpectedly long running threads in python ? I can't use SIGALRM, since
Some care must be taken if both
signals and threads are used in the
same program. The fundamental thing to
remember in using signals and threads
simultaneously is: always perform
signal() operations in the main thread
of execution. Any thread can perform
an alarm(), getsignal(), pause(),
setitimer() or getitimer(); only the
main thread can set a new signal
handler, and the main thread will be
the only one to receive signals
(this is enforced by the Python signal
module, even if the underlying thread
implementation supports sending
signals to individual threads). This
means that signals can’t be used as a
means of inter-thread
communication.Use locks instead.
Update: each thread in my case blocks -- it is downloading a web page using urllib2 module and sometimes operation takes too many time on an extremely slow sites. That's why I want to terminate such slow threads

Since abruptly killing a thread that's in a blocking call is not feasible, a better approach, when possible, is to avoid using threads in favor of other multi-tasking mechanisms that don't suffer from such issues.
For the OP's specific case (the threads' job is to download web pages, and some threads block forever due to misbehaving sites), the ideal solution is twisted -- as it generally is for networking tasks. In other cases, multiprocessing might be better.
More generally, when threads give unsolvable issues, I recommend switching to other multitasking mechanisms rather than trying heroic measures in the attempt to make threads perform tasks for which, at least in CPython, they're unsuitable.

As Alex Martelli suggested, you could use the multiprocessing module. It is very similar to the Threading module so that should get you off to a start easily. Your code could be like this for example:
import multiprocessing
def get_page(*args, **kwargs):
# your web page downloading code goes here
def start_get_page(timeout, *args, **kwargs):
p = multiprocessing.Process(target=get_page, args=args, kwargs=kwargs)
p.start()
p.join(timeout)
if p.is_alive():
# stop the downloading 'thread'
p.terminate()
# and then do any post-error processing here
if __name__ == "__main__":
start_get_page(timeout, *args, **kwargs)
Of course you need to somehow get the return values of your page downloading code. For that you could use multiprocessing.Pipe or multiprocessing.Queue (or other ways available with multiprocessing). There's more information, as well as samples you could check here.
Lastly, the multiprocessing module is included in python 2.6. It is also available for python 2.5 and 2.4 at pypi (you can use easy_install multiprocessing) or just visit pypi and download and install the packages manually.
Note: I realize this has been posted awhile ago. I was having a similar problem to this and stumbled here and saw Alex Martelli's suggestion. Had it implemented for my problem and decided to share it. (I'd like to thank Alex for pointing me in the right direction.)

Use synchronization objects and ask the thread to terminate. Basically, write co-operative handling of this.
If you start yanking out the thread beneath the python interpreter, all sorts of odd things can occur, and it's not just in Python either, most runtimes have this problem.
For instance, let's say you kill a thread after it has opened a file, there's no way that file will be closed until the application terminates.

If you are trying to kill a thread whose code you do not have control over, it depends if the thread is in a blocking call or not. In my experience if the thread is properly blocking, there is no recommended and portable way of doing this.
I've run up against this when trying to work with code in the standard library (multiprocessing.manager I'm looking at you) with loops coded with no exit condition: nice!
There are some interuptable thread implementations out there (see here for an example), but then, if you have the control of the threaded code yourself, you should be able to write them in a manner where you can interupt them with a condition variable of some sort.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.