What packages should I look at for writing a python daemon and processing jobs? Also, what do I need to do for a python daemon?
I'm pretty happy with beanstalkd, which has client libraries available in various languages:
Daemon:
http://kr.github.com/beanstalkd/
Python client library:
http://code.google.com/p/pybeanstalk/
Your question is a bit ambiguous, but I'm assuming you mean you would like to write a python daemon that will process jobs that get thrown in a queue. If not, please say as much. :-)
I've heard a lot of great things about redis. The folks at github built resque as a job processing daemon for Ruby. If you're language flexible, you could just use that, but if you're not, you could emulate it in as much or as little depth as you like making use of redis as your queue system. Depending on how pluggable and extensible you need it to be, this could be a really simple thing to implement.
Another option I ran across after some more googling is redqueue. It looks like it might already implement most of a job queue.
If you're using django, you may wish to consider the Celery project. It's a job queue system based on RabbitMQ which is yet another queuing server with excellent reviews.
As far as creating a daemon in python, there are a number of options. You can look at this page on activestate, which is a good start. Better yet, you can use python-daemon to do it all for you. But if you use one of the above options or beanstalkd as recommended by mczepiel, you probably won't have to make your process run as a daemon.
I have recently (this week) implemented a queue in RabbitMQ with a python daemon extracting the information and storing it on a database (using Django ORM). The daemon has a intermediate buffer so it will wait a little and write in the database in batches, instead of writing each time a little message arrives.
I've made the integration with the queue using this little flopsy module, which is easy to set up. The only problem I've got it to be able to set up a timeout for waiting a message, as the module has not a clear way of doing that. After a while playing with the interactive shell and making a few dir(), I manage to get to the socket object and set up the timeout.
I considered also Celery, but seems to be more focused on using internally a RabbitMQ to allow you to launch tasks (periodically or asynchronously), more that using a queue to communicating with other systems. In our case, the queue can be feed both by Python systems and Ruby ones.
Once I've completed the process, I've made some adjustments to allow running it as a daemon (mostly storing the standard output to a file to allow easy logging) and then create a bash script that launch a start-stop-daemon command. I've followed more or less this schema
I discovered python-daemon just about one day late, so after the work is done it makes no sense revisiting it, but maybe it makes more sense for a Python project.
Related
I am running a number of FastAPI instances with uvicorn with python's subprocess.Popen. I have a small GUI made with PySimpleGUI with which I want to be able to close servers and restart them at will.
The first problem I encountered is that, at least in Windows, starting the uvicorn server appear to create not one, but two, new processes, and calling Popen.terminate() only closes one of these processes, which does not free up the port associated with the server. I fixed this problem using the psutil package to check what new processes have been created after I instantiate a Popen object, and track and terminate the second process with psutil.
What is still a major problem, is that calling psutil.terminate() on the process does not call the FastAPI function under #app.on_event("shutdown"). In the past, we have run all of our servers in individual terminal windows, and find that ctrl-c on those terminal windows will call the shutdown event, but I have found no other way of doing so. ctrl-c on my interface will obviously take down the interface and all the servers, and is somewhat unreliable in hitting the shutdown events for all the servers. My other idea was use psutil.send_signal(signal.CTRL_C_EVENT), but this has the same effect as calling ctrl-c in terminal.
So I am at a loss. I have seen multiple posts around saying that this is a general shortcoming of uvicorn, but have not seen anything that directly confirms my own experience or offers a solution. I also know that the "shutdown" and "startup" events in FastAPI are ported in from Starlette, and are not very well documented in either package. I have seen suggestions to use guvicorn, but my brief look into that confirmed that it is not compatible with windows. Any suggestions?
TL;DR:
APIs are meant to be long-running processes
there is a whole industry around virtualizing to manage the orchestration automatically of when to start or stop a service
there is also "serverless" infrastructure you can hang any of your processes with you not having to spend any effort in this field as it is not meant to be a thing
If you still want to go against everyone else and do manage it your self you can do as this answered question
##### SOLUTION #####
pid = proc.pid
parent = psutil.Process(pid)
for child in parent.children(recursive=True):
child.kill()
##### SOLUTION END ####
A bit of explanation:
From the conception of Rest API as an architecture pattern it was meant to be awaiting always for user's requests coming over the web. it has never been the general intent to manage gracefully and develop a product to handle gracefully the shut down of something that "was meant to run forever" and we build processes to do work to keep it running 24/7/365 as an industry.
If you ever want to leverage the ability to start or stop one to many APIS simultaneously withing the same device is highly recommended d you at least go with something like containers and Kubernetes and just scripting commands against the CLI of Kubernetes for such purpose. In exchange for the extra effort you will gain process isolation from others and the base OS layer ( which will still be less effort than building all that tooling yourself on your own.
My personal favorite is not doing even that and going straight with lambdas as is way easier and better in so many ways. Don't take it from me but from one of the industry-leading companies, Cloudflare and their statements on the subject
Serverless computing offers a number of advantages over traditional cloud-based or server-centric infrastructure. For many developers, serverless architectures offer greater scalability, more flexibility, and quicker time to release, all at a reduced cost.
First of all, I should say that this might more a design question rather than about code itself.
I have a network with one Server and multiple Clients (written in Twisted cause I need these asynchronous non-blocking features), such server-client couple it's just only receiving-sending messages.
However, at some point, I want one client to run a python file when received a certain message. That client should keep listening and talking to the Server, also, I should be able to stop that file if needed, so my first thought is starting a thread for that python file and forget about it.
At the end it should go like this: Server sends message to ClientA, ClientA and its dataReceived function interprets the message and decides to run that python file (which I don't know how long will take and maybe contains blocking calls), when that python file finishes running should send the result to ClientB.
So, questions are:
Would it be starting a thread a good idea for that python file in ClientA?
As I want to send the result of that python file to ClientB, can I have another reactor loop inside that python file?
In any case I would highly appreciate any kind of advice as both python and twisted are not my specialty and all these ideas may not be the best ones.
Thanks!
At first reading, I though you were implying twisted isn't python. If you are thinking that, keep in mind the following:
Twisted is a python framework, I.E. it is python. Specifically it's about getting the most out of a single process/thread/core by allowing the programmer to hand-tune the scheduling/order-of-ops in their own code (which is nearly the opposite of the typical use of threads).
While you can interact with threads in twisted, its quite tricky to do without ruining the efficiency of twisted. (for longer description of threads vs events see SO: https://stackoverflow.com/a/23876498/3334178 )
If you really want to spawn your new python away from your twisted python (I.E. get this work running on a different core) then I would look at spawning it off as a process, see Glyph's answer in this SO: https://stackoverflow.com/a/5720492/3334178 for good libraries to get that done.
Processes give the proper separation to allow your twisted apps to run without meaningful slowdown and you should find all your starting/stoping/pausing/killing needs will be fulfilled.
Answering your questions specifically:
Would it be starting a thread a good idea for that python file in ClientA?
I would say "No" its generally not a good idea, and in your specific case you should look at using processes instead.
Can I have another reactor loop inside that python file?
Strictly speaking "no you can't have multiple reactors" - but what twisted can do is concurrently manage hundreds or thousands of separate tasks, all in the same reactor, will do what you need. I.E. run all your different async tasks in one reactor, that is what twisted is built for.
BTW I always recommend the following tutorial for twisted: krondo Twisted Introduction http://krondo.com/?page_id=1327 Its long, but if you get through it this kind of work will become very clear.
All the best!
In python, is there a cross platform way of creating something similar to Windows named Event in one process, and set it from another process to signal something to the first one?
My specific problem is that I need to create a process that on startup will check if any other instances of itself are running, and if so, signal them to quit. With Windows API I would use CreateEvent with the lpName parameter, and SetEvent.
I've spent about a day now searching for a good answer to this and here is what I am coming up with at this moment:
It is possible to use signals to indicate to the process that some change needs to take place, however in a more complex legacy codebase I am dealing with it causes the process to crash. Signaling interrupts various I/O processes and alike based on python signal docs. You can implement signal handler with signal.SIGUSR1
import signal
def signal_handler(signum, stack):
print('Signal %d received'%signum)
signal.signal(signal.SIGUSR1, signal_handler)
This code can be triggered in Linux et al. through:
$ kill -s SIGUSR1 $pid
I am presently leaning towards kazoo Python Zookeeper library. It requires to stand up Zookeeper as infrastructure.
I do have an additional need for toggling configuration values in my case. However Zookeeper supports a number of interprocessor communication tools that will serve your needs.
UPDATE:
I finally settled on a named pipe (FIFO), calling it inside a thread with readline.
if not os.path.exists(fifo_name):
os.mkfifo(fifo_name)
while True:
with open(fifo_name, 'r') as config_fifo:
line = config_fifo.readline()[:-1]
print(line)
I used tempfile.gettempdir() to find a good location to place the FIFO in the file system. It requires quite a bit of refinement however, since I did not care to parse passed content while you might. Also if you are planing on having more then one consumer of the event you are going to have it propagated to only one consumer as it is a queue.
It seems to me that this is not so much a question as to whether this is possible in Python, but whether such a cross-platform approach exists: if one does, then even if no directly written Python exists, one can always make system calls using subprocess.call() and the like.
As for whether it's a possibility, I can't profess to be much of an expert, but a bit of a search has thrown up these discussions which might prove helpful to you.
This is more of a Python general question however in a context of django.
For now I have this view in django which has to process a lot of data. Usually it takes the server (nginx with django running in proxy using) a couple of minutes to do it. Sometimes the server times out. I don't want to increase the time-out time in nginx. I realize that if I can fork a process in python in the django view so that the forked (child) process will do all the data crunching independently of the django view, then the view would be able to return the request to the user immediately (therefore never timing-out) and the child process would continue running in the background finishing up all the calculation.
So here is the question:
How can I fork an independent process in python (and if possible for the python code to be in the same file)? And if possible how can I assign a unix process priority level to it?
I looked at some of the ways of forking a process in python and it seems there are a few options. Which one is the best appropriate for this scenario?
Thank you.
the 'best practice' answer is to use a queue manager, typically RabbitMQ or any backend handled by Django-celery.
Still, there are a few lighter options that do spawn a new thread. what these options usually lack is some way to track progress, or keep the number of threads under control.
check Django-utils to see if it's enough. if not, go for Celery.
If you really want to fork and set priority, you can use os.fork and os.nice, but I think the multiprocessing module or Celery would be more applicable here.
I've just begun learning sockets with Python. So I've written some examples of chat servers and clients. Most of what I've seen on the internet seems to use threading module for (asynchronous) handling of clients' connections to the server. I do understand that for a scalable server you need to use some additional tricks, because thousands of threads can kill the server (correct me if I'm wrong, but is it due to GIL?), but that's not my concern at the moment.
The strange thing is that I've found somewhere in Python documentation that creating subprocesses is the right way (unfortunately I've lost the reference, sorry :( ) for handling sockets.
So the question is: to use threading or multiprocessing? Or is there even better solution?
Please, give me the answer and explain the difference to me.
By the way: I do know that there are things like Twisted which are well-written.
I'm not looking for a pre-made scalable server, I am instead trying to understand how to write one that can be scaled or will deal with at least 10k clients.
EDIT: The operating system is Linux.
Facebook needed a scalable server so they wrote Tornado (which uses async). Twisted is also famously scalable (it also uses async). Gunicorn is also a top performer (it uses multiple processes). None of the fast, scalable tools that I know about uses threading.
An easy way to experiment with the different approaches is to start with the SocketServer module in the standard library: http://docs.python.org/library/socketserver.html . It lets you easily switch approaches by alternately inheriting from either ThreadingMixin or ForkingMixin.
Also, if you're interested in learning about the async approach, the easiest way to build your understanding is to read a blog post discussing the implementation of Tornado: http://golubenco.org/2009/09/19/understanding-the-code-inside-tornado-the-asynchronous-web-server-powering-friendfeed/
Good luck and happy computing :-)
thousands of threads can kill the server (correct me if I'm wrong, but is it due to GIL?)
For one thing, GIL has nothing to do with no. of threads. If you're are doing IO within these threads, you could have hundreds of thousands of these threads without any problem from GIL or otherwise.
GIL comes into play when you have CPU intensive tasks.
See this very informative talk from David Beazly to know more about GIL.