I have an APScheduler application running in the background and wanted to be able to interrupt certain jobs after they started. I want to do this to have more control over the processes and to prevent them from affecting other jobs.
I looked up for solutions using APScheduler itself, but what I want seems to be more like interrupting a function that is already running.
Is there a way to do this while not turning off my program? I don't want to delete the job or to stop the application as a whole, just to stop the process at specific and unpredictable situations.
On APScheduler 3, the only way to do this is to set a globally accessible variable or a property thereof to some value that is then explicitly checked by the target function. So your code will have to support the notion of being interrupted. On APScheduler 4 (not released yet, as of this writing), there is going to be better cancellation support, so coroutine-based jobs can be externally cancelled without explicitly supporting this. Regular functions still need to have explicit checks but there will now be a built-in mechanism for that.
Related
What is the best way to continuously repeat the execution of a given function at a fixed interval while being able to terminate the executor (thread or process) immediately?
Basically I know two approaches:
use multiprocessing and function with infinite cycle and time.sleep at the end. Processing is terminated with process.terminate() in any state.
use threading and constantly recreate timers at the end of the thread function. Processing is terminated by timer.cancel() while sleeping.
(both “in any state” and “while sleeping” are fine, even though the latter may be not immediate). The problem is that I have to use both multiprocessing and threading as the latter appears not to work on ARM (some fuzzy interaction of python interpreter and vim, outside of vim everything is fine) (I was using the second approach there, have not tried threading+cycle; no code is currently left) and the former spawns way too many processes which I would like not to see unless really required. This leads to a problem of having to code two different approaches while threading with cycle is just a few more imports for drop-in replacements of all multiprocessing stuff wrapped in if/else (except that there is no thread.terminate()). Is there some better way to do the job?
Currently used code is here (currently with cycle for both jobs), but I do not think it will be much useful to answer the question.
Update: The reason why I am using this solution are functions that display file status (and some other things like branch) in version control systems in vim statusline. These statuses must be updated, but updating them immediately cannot be done without using hooks and I have no idea how to set hooks temporary and remove on vim quit without possibly spoiling user configuration. Thus standard solution is cache expiring after N seconds. But when cache expired I need to do an expensive shell call and the delay appears to be noticeable, the more noticeable the heavier IO load is. What I am implementing now is updating values for viewed buffers each N seconds in a separate process thus delays are bothering that process and not me. Threads are likely to also work because GIL does not affect calls to external programs.
I'm not clear on why a single long-lived thread that loops infinitely over the tasks wouldn't work for you? Or why you end up with many processes in the multiprocess option?
My immediate reaction would have been a single thread with a queue to feed it things to do. But I may be misunderstanding the problem.
I do not know how do it simply and/or cleanly in Python, but I was wondering if maybe you couldn't take avantage of an existing system scheduler, e.g. crontab for *nix system.
There is an API in python and it might satisfied your needs.
I've written a script that uses two thread pools of ten threads each to pull in data from an API. The thread pool implements this code on ActiveState. Each thread pool is monitoring a Redis database via PubSub for new entries. When a new entry is published, python passes the data to a function that uses python's Subprocess.POpen to execute a PHP shell to do the actual work of calling the API.
This system of launching PHP shells is necessary for functionality with my PHP web app, so launching PHP shells with Python can't be avoided.
This script will only be running on Linux servers.
How do I control the niceness (scheduling priority) of the application's threads?
Edit:
It seems controlling scheduling priority for individual threads in Python isn't possible. Is there a python solution, or at the very least a UNIX command I can run along with my script, to control the priority?
Edit 2:
Well I didn't end up finding a python way to handle it. I'm just running my script with nice now like this:
nice -n 19 python MyScript.py
I believe that threading priority is not controllable in python due to how they are implemented using a global interpreter lock (GIL). Having said that, even if you could give one thread more CPU processing priority, the python implementation that hands around the GIL would not be aware of this as it handed around the GIL. If you were able to increase niceness in a single thread in your pool (say it is doing a more important job) you would need to use your own implementation of locks to give the higher priority thread access to the GIL more often.
A google search returns this article which I believe is similar to what you are asking
Explains why it doesnt work
http://www.velocityreviews.com/forums/t329441-threading-priority.html
Explains the workaround I was suggesting
http://bytes.com/topic/python/answers/645966-setting-thread-priorities
The python threading-docs mention explicitly that there is no support for setting thread-priorities:
The design of this module is loosely based on Java’s threading model. However, where Java makes locks and condition variables basic behavior of every object, they are separate objects in Python. Python’s Thread class supports a subset of the behavior of Java’s Thread class; currently, there are no priorities, no thread groups, and threads cannot be destroyed, stopped, suspended, resumed, or interrupted. The static methods of Java’s Thread class, when implemented, are mapped to module-level functions.
It doesn't work, but I tried:
getting the parent pid and priority
launching threads using concurrent.futures.ThreadPoolExecutor
using ctypes to get the (linux) thread id from within the thread(works)
using the tid with os.setpriority(os.PRIO_PROCESS,tid,parent_priority+1)
calling pool.shutdown() from the parent.
Even with liberal sprinkling of os.sched_yield(), the child threads never actually run past the setpriority().
Reading man pages, it seems threads don't have the capability to change (even their) scheduling priority; you have to do something with "capabilities" to give the thread the "CAP_SYS_NICE" capability. Running the process with root permissions didn't help either; child threads still don't run.
I know, a lot of time has passed, but I recently came across this question, and I thought it would be useful to add another option.
Have a look at threading2, which is a drop-in replacement and extension for the default threading module, with support – sort of – for priority and affinity.
I was wondering if this answer at another related question might be useful in this scenario? (link)
As you are already using Subprocess.POpen to launch your PHP script, it strikes me that you can use "preexec_fn" and either a predefined function, or a lambda function (as demonstrated in the above linked answer) to set the nice level of each launched PHP thread?
I've been playing around with the Tornado chat demo. At a casual glance it seems like the new_messages method is not thread safe - it seems like items might get added to the waiters array while that same array is being iterated in the for loop.
Is this demo not thread safe? Or, is it thread safe simply because the Python set object is itself thread-safe? Are Python set objects thread safe? I seem to find conflicting opinions on this question (and the word set is demonically difficult to search for effectively in Google!)
Bonus points - why is the waiters array set to a new set at the conclusion of the iteration instead of emptying the set?
There are no threads involved in Tornado applications by default. Tornado is an event based system, so there is only one execution path. The thing that you will need to figure out about tornado is at what moments you yield execution back to the IOLoop.
While the GIL does protect against a class of thread errors, you still can write applications that access and modify the data outside the access path of the program.
It's thread-safe simply because pure Python is always thread-safe. Due to the global interpreter lock, only one Python thread is running at any one time.
I'm designing a long running process, triggered by a Django management command, that needs to run on a fairly frequent basis. This process is supposed to run every 5 min via a cron job, but I want to prevent it from running a second instance of the process in the rare case that the first takes longer than 5 min.
I've thought about creating a touch file that gets created when the management process starts and is removed when the process ends. A second management command process would then check to make sure the touch file didn't exist before running. But that seems like a problem if a process dies abruptly without properly removing the touch file. It seems like there's got to be a better way to do that check.
Does anyone know any good tools or patterns to help solve this type of issue?
For this reason I prefer to have a long-running process that gets its work off of a shared queue. By long-running I mean that its lifetime is longer than a single unit of work. The process is then controlled by some daemon service such as supervisord which can take over control of restarting the process when it crashes. This delegates the work appropriately to something that knows how to manage process lifecycles and frees you from having to worry about the nitty gritty of posix processes in the scope of your script.
If you have a queue, you also have the luxury of being able to spin up multiple processes that can each take jobs off of the queue and process them, but that sounds like it's out of scope of your problem.
Task is:
I have task queue stored in db. It grows. I need to solve tasks by python script when I have resources for it. I see two ways:
python script working all the time. But i don't like it (reason posible memory leak).
python script called by cron and do a little part of task. But i need to solve the problem of one working active script in memory (To prevent active scripts count grow). What is the best solution to implement it in python?
Any ideas to solve this problem at all?
You can use a lockfile to prevent multiple scripts from running out of cron. See the answers to an earlier question, "Python: module for creating PID-based lockfile". This is really just good practice in general for anything that you need to make sure won't have multiple instances running, actually, so you should look into it even if you do have the script running constantly, which I do suggest.
For most things, it shouldn't be too hard to avoid memory leaks, but if you're having a lot of trouble with it (I sometimes do with complex third-party web frameworks, for example), I would suggest instead writing the script with a small, carefully-designed main loop that monitors the database for new jobs, and then uses the multiprocessing module to fork off new processes to complete each task.
When a task is complete, the child process can exit, immediately freeing any memory that isn't properly garbage collected, and the main loop should be simple enough that you can avoid any memory leaks.
This also offers the advantage that you can run multiple tasks in parallel if your system has more than one CPU core, or if your tasks spend a lot of time waiting for I/O.
This is a bit of a vague question. One thing you should remember is that it is very difficult to leak memory in Python, because of the automatic garbage collection. croning a Python script to handle the queue isn't very nice, although it would work fine.
I would use method 1; if you need more power you could make a small Python process that monitors the DB queue and starts new processes to handle the tasks.
I'd suggest using Celery, an asynchronous task queuing system which I use myself.
It may seem a bit heavy for your use case, but it makes it easy to expand later by adding more worker resources if/when needed.