I have a Flask app that runs a large integer-optimization job using Google OR-Tools. The underlying package is written in C++, but I call it through its Python API. When a very long-running task comes in, it seems to block all other requests and the server becomes unresponsive. The obvious solution is to move the optimization into a separate service, which I will implement when I have the time.
But I'm curious whether I'm missing something. As I understand Python threading, the interpreter switches threads every N bytecode instructions, which should allow the other requests to get a turn. Does the same also happen in the case of gevent? Or is it actually allowing the other threads to run, but greedily capturing all the resources?
It would be really nice if someone with some insight could tell me what is and is not possible.
I am developing a Python WSGI script to interface with an HDHomeRun Prime. In a perfect world, it will pass URI values as commands to FFMPEG and display the resulting stream in a browser. I have the "show stuff in browser" and the "pass instructions to FFMPEG" parts working fine, but I do not have them working simultaneously.
1) Given that this middleware is being used to transcode MPEG-2 to H.264, does it make more sense to use multiprocessing or multithreading to start and stop the respective processes?
2) If the WSGI script is brokering the initiation of FFMPEG feeds (if the input feed isn't already brokered) and connecting clients to the associated FFServer streams, does that mean I'll need to use some sort of pool to keep track of the middleware's activities?
I don't fully understand your whole process, but IMO you should start with multithreading, as it is much easier to set up (variables are shared as usual in Python). If that doesn't meet your requirements (e.g. not fast enough), you can move to multiprocessing, but it will increase the complexity if you have never used multiprocessing in Python (processes do not share memory, so you need to use queues or shared variables).
Set up your threads:
import threading

# args must be a tuple, hence the trailing comma
a = threading.Thread(target=func, args=(vars,))
a.start()
A nice tutorial here.
You should also read about Python's GIL (global interpreter lock) to understand what is actually happening when you use threading/multiprocessing.
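If you do later move to multiprocessing, here is a minimal sketch (not from the question, the worker function is purely illustrative) of the queue-based communication mentioned above:

import multiprocessing

def func(x, result_queue):
    result_queue.put(x * x)          # processes don't share memory, so the result goes back through a queue

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=func, args=(10, q))
    p.start()
    print(q.get())                   # prints 100
    p.join()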
First of all, I should say that this might be more of a design question than a question about code itself.
I have a network with one server and multiple clients (written in Twisted because I need its asynchronous, non-blocking features); each server-client pair does nothing but send and receive messages.
However, at some point I want one client to run a Python file when it receives a certain message. That client should keep listening and talking to the server, and I should also be able to stop that file if needed, so my first thought is to start a thread for that Python file and forget about it.
In the end it should go like this: the server sends a message to ClientA; ClientA's dataReceived function interprets the message and decides to run that Python file (which may take an unknown amount of time and may contain blocking calls); when that Python file finishes running, the result should be sent to ClientB.
So, my questions are:
Would starting a thread for that Python file in ClientA be a good idea?
Since I want to send the result of that Python file to ClientB, can I have another reactor loop inside that Python file?
In any case, I would highly appreciate any kind of advice, as neither Python nor Twisted is my specialty and all these ideas may not be the best ones.
Thanks!
At first reading, I thought you were implying that Twisted isn't Python. If you are thinking that, keep in mind the following:
Twisted is a Python framework, i.e. it is Python. Specifically, it's about getting the most out of a single process/thread/core by allowing the programmer to hand-tune the scheduling/order of operations in their own code (which is nearly the opposite of the typical use of threads).
While you can interact with threads in Twisted, it's quite tricky to do without ruining Twisted's efficiency (for a longer description of threads vs. events, see this SO answer: https://stackoverflow.com/a/23876498/3334178).
If you really want to run your new Python code away from your Twisted code (i.e. get that work running on a different core), then I would look at spawning it off as a process; see Glyph's answer in this SO question: https://stackoverflow.com/a/5720492/3334178 for good libraries to get that done.
Processes give the proper separation to let your Twisted app run without meaningful slowdown, and you should find that all your starting/stopping/pausing/killing needs are covered.
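A minimal sketch of what spawning the script as a child process from Twisted could look like, using ProcessProtocol and reactor.spawnProcess (the script name and the result handling are assumptions, not your code):

import sys
from twisted.internet import protocol, reactor

class ScriptRunner(protocol.ProcessProtocol):
    def __init__(self):
        self.output = []

    def outReceived(self, data):
        self.output.append(data)           # collect stdout from the child script

    def processEnded(self, reason):
        result = b"".join(self.output)
        print("script finished:", result)  # here you would forward the result to ClientB
        reactor.stop()                     # only for this standalone demo; the real client keeps running

reactor.spawnProcess(ScriptRunner(), sys.executable, [sys.executable, "long_task.py"], env=None)
reactor.run()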
Answering your questions specifically:
Would starting a thread for that Python file in ClientA be a good idea?
I would say no, it's generally not a good idea, and in your specific case you should look at using processes instead.
Can I have another reactor loop inside that Python file?
Strictly speaking, no, you can't have multiple reactors, but what Twisted can do is concurrently manage hundreds or thousands of separate tasks, all in the same reactor, which will do what you need. I.e. run all your different async tasks in one reactor; that is what Twisted is built for.
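Just as an illustration of many tasks sharing one reactor, a tiny sketch (timings and the work function are made up):

from twisted.internet import reactor, task

def do_work(n):
    print("task", n, "done")

# many independent tasks, all scheduled on the same single reactor
for i in range(5):
    task.deferLater(reactor, i * 0.5, do_work, i)

reactor.callLater(3, reactor.stop)
reactor.run()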
BTW, I always recommend the following tutorial for Twisted: the krondo Twisted Introduction, http://krondo.com/?page_id=1327. It's long, but if you get through it this kind of work will become very clear.
All the best!
What is the best way to continuously repeat the execution of a given function at a fixed interval while being able to terminate the executor (thread or process) immediately?
Basically I know two approaches:
use multiprocessing and a function with an infinite loop and time.sleep at the end; processing is terminated with process.terminate() in any state.
use threading and constantly recreate timers at the end of the thread function; processing is terminated by timer.cancel() while sleeping.
(Both “in any state” and “while sleeping” are fine, even though the latter may not be immediate.) The problem is that I have to use both multiprocessing and threading: the latter appears not to work on ARM (some fuzzy interaction of the Python interpreter and vim; outside of vim everything is fine; I was using the second approach there, have not tried threading + loop, and no code is currently left), and the former spawns way too many processes, which I would like not to see unless really required. This leads to having to code two different approaches, while threading with a loop is just a few more imports for drop-in replacements of all the multiprocessing stuff wrapped in if/else (except that there is no thread.terminate()). Is there some better way to do the job?
The code currently in use is here (currently with a loop for both jobs), but I do not think it will be of much use in answering the question.
Update: the reason I am using this solution is a set of functions that display file status (and some other things, like the branch) of version-control systems in the vim statusline. These statuses must be updated, but updating them immediately cannot be done without using hooks, and I have no idea how to set hooks temporarily and remove them on vim quit without possibly spoiling the user's configuration. Thus the standard solution is a cache that expires after N seconds. But when the cache expires I need to do an expensive shell call, and the delay is noticeable, the more so the heavier the IO load is. What I am implementing now is updating the values for viewed buffers every N seconds in a separate process, so the delays bother that process and not me. Threads are likely to work as well, because the GIL does not affect calls to external programs.
I'm not clear on why a single long-lived thread that loops infinitely over the tasks wouldn't work for you, or why you end up with many processes in the multiprocessing option.
My immediate reaction would have been a single thread with a queue to feed it things to do. But I may be misunderstanding the problem.
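Not your code, but a rough sketch of that idea: a worker thread fed by a queue, plus an Event so it can be stopped promptly even while it is waiting between runs:

import queue
import threading
import time

stop = threading.Event()
jobs = queue.Queue()

def worker():
    while not stop.is_set():
        try:
            func, interval = jobs.get(timeout=0.5)
        except queue.Empty:
            continue
        func()
        # Event.wait() returns as soon as stop is set, so shutdown is prompt even mid-interval
        if not stop.wait(interval):
            jobs.put((func, interval))      # re-schedule the job for the next run

t = threading.Thread(target=worker)
t.start()
jobs.put((lambda: print("status refreshed"), 2))

time.sleep(7)       # let it run a few times
stop.set()          # terminates promptly, even while the worker is sleeping
t.join()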
I do not know how to do it simply and/or cleanly in Python, but I was wondering whether you couldn't take advantage of an existing system scheduler, e.g. crontab on *nix systems.
There is a Python API for it, and it might satisfy your needs.
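If the API meant here is the third-party python-crontab package (an assumption on my part), a minimal sketch could look like this (paths are placeholders):

# pip install python-crontab
from crontab import CronTab

cron = CronTab(user=True)                                      # current user's crontab
job = cron.new(command="/usr/bin/python /path/to/worker.py")   # placeholder command
job.minute.every(5)                                            # run every 5 minutes
cron.write()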
I would like to kill a function that executes for too long. Importantly, this function is inside a C extension (wrapped in Cython), and I would like the solution to work in a multithreaded environment. Since it is wrapped in Cython, the thread running it could hold the GIL.
I have no control whatsoever over what is happening inside this extension (and I think that this code will not respond to interrupts).
I'm fairly certain that this code will only be run on Unix machines. But the question "Python kill hanging function" does not apply, because I think signals would not work in a multithreaded environment (AFAIK it is undefined which thread will catch them), but I might be wrong on this one :) so correct me.
Is there any way for me to resolve this without spawning new processes?
My solution is to wrap this function in another Python process and, if needed, kill that process.
A piece of advice for anyone who googles this question: since process startup time (starting the interpreter, loading modules and then loading data into memory) can take a couple of seconds, you need to group your function calls so this overhead won't kill you (so there is no generic reusable solution, really).
An example solution was posted in answer to another question: How to interrupt native extension code without killing the interpreter?
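For reference, a bare-bones sketch of the wrap-it-in-a-process idea (the function and timeout are placeholders; a real setup would batch calls or reuse a worker to amortize the startup cost mentioned above, and would use a queue or pipe to get results back):

import multiprocessing

def run_with_timeout(func, args=(), timeout=10.0):
    """Run func(*args) in a child process and kill it if it exceeds timeout seconds."""
    proc = multiprocessing.Process(target=func, args=args)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()        # SIGTERM is handled by the OS, so it works even if the child holds the GIL in C code
        proc.join()
        raise TimeoutError("call exceeded %s seconds" % timeout)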
The task is:
I have a task queue stored in a DB, and it keeps growing. I need to process the tasks with a Python script when I have the resources for it. I see two ways:
a Python script working all the time, but I don't like it (the reason: possible memory leaks);
a Python script called by cron that does a small part of the task, but then I need to solve the problem of keeping only one active script in memory (to prevent the number of active scripts from growing). What is the best way to implement this in Python?
Any ideas to solve this problem at all?
You can use a lockfile to prevent multiple scripts from running out of cron. See the answers to an earlier question, "Python: module for creating PID-based lockfile". This is really just good practice in general for anything that must not have multiple instances running, so you should look into it even if you do have the script running constantly (which is what I suggest).
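A very small sketch of that kind of lock, using flock on a lockfile (the path is arbitrary):

import fcntl
import sys

lock_file = open("/tmp/my_task_runner.lock", "w")
try:
    # non-blocking exclusive lock; fails immediately if another instance holds it
    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:
    sys.exit("another instance is already running")

# ... do the actual work; the lock is released when the process exits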
For most things, it shouldn't be too hard to avoid memory leaks, but if you're having a lot of trouble with it (I sometimes do with complex third-party web frameworks, for example), I would suggest instead writing the script with a small, carefully-designed main loop that monitors the database for new jobs, and then uses the multiprocessing module to fork off new processes to complete each task.
When a task is complete, the child process can exit, immediately freeing any memory that isn't properly garbage collected, and the main loop should be simple enough that you can avoid any memory leaks.
This also offers the advantage that you can run multiple tasks in parallel if your system has more than one CPU core, or if your tasks spend a lot of time waiting for I/O.
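A rough sketch of that main loop, where fetch_pending_tasks and handle_task are hypothetical placeholders for the DB query and the actual work:

import multiprocessing
import time

def fetch_pending_tasks():
    return []                        # placeholder: query the DB for new jobs here

def handle_task(task):
    pass                             # placeholder: do the expensive work for one job

def main():
    while True:
        for task in fetch_pending_tasks():
            p = multiprocessing.Process(target=handle_task, args=(task,))
            p.start()
            p.join()                 # or keep a list of processes to run several in parallel
        time.sleep(5)                # poll interval

if __name__ == "__main__":
    main()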
This is a bit of a vague question. One thing you should remember is that it is very difficult to leak memory in Python because of the automatic garbage collection. Cron'ing a Python script to handle the queue isn't very elegant, although it would work fine.
I would use method 1; if you need more power you could make a small Python process that monitors the DB queue and starts new processes to handle the tasks.
I'd suggest using Celery, an asynchronous task queuing system which I use myself.
It may seem a bit heavy for your use case, but it makes it easy to expand later by adding more worker resources if/when needed.
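For a sense of scale, a minimal Celery task sketch (the Redis broker URL is just an assumption; any supported broker works):

# tasks.py
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def process_job(job_id):
    # fetch the job from the DB and do the expensive work here
    return job_id

You would run a worker with `celery -A tasks worker` and enqueue work from your producer with `process_job.delay(some_id)`.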