I'm trying to create a background service in Python. The service will be called from another Python program. It needs to run as a daemon process because it uses a heavy object (300MB) that has to be loaded previously into the memory. I've had a look at python-daemon and still haven't found out how to do it. In particular, I know how to make a daemon run and periodically do some stuff itself, but I don't know how to make it callable from another program. Could you please give some help?
I had a similar situation when I wanted to access a big binary matrix from a web app.
Of course there are many solutions, but I used Redis, a popular in-memory database/cache system, to store and access my object successfully. It has practical Python bindings (several probably equivalent wrapper libraries).
The main advantage is that when the service goes down, a copy of the data still remains on disk. Also, I noticed that once in place, it could be used for other things in my app (for instance Celery proposes it as backend), and actually, for other services in any other unrelated program.
Related
Edit for clarify my question:
I want to attach a python service on uwsgi using this feature (I can't understand the examples) and I also want to be able to communicate results between them. Below I present some context and also present my first thought on the communication matter, expecting maybe some advice or another approach to take.
I have an already developed python application that uses multiprocessing.Pool to run on demand tasks. The main reason for using the pool of workers is that I need to share several objects between them.
On top of that, I want to have a flask application that triggers tasks from its endpoints.
I've read several questions here on SO looking for possible drawbacks of using flask with python's multiprocessing module. I'm still a bit confused but this answer summarizes well both the downsides of starting a multiprocessing.Pool directly from flask and what my options are.
This answer shows an uWSGI feature to manage daemon/services. I want to follow this approach so I can use my already developed python application as a service of the flask app.
One of my main problems is that I look at the examples and do not know what I need to do next. In other words, how would I start the python app from there?
Another problem is about the communication between the flask app and the daemon process/service. My first thought is to use flask-socketIO to communicate, but then, if my server stops I need to deal with the connection... Is this a good way to communicate between server and service? What are other possible solutions?
Note:
I'm well aware of Celery, and I pretend to use it in a near future. In fact, I have an already developed node.js app, on which users perform actions that should trigger specific tasks from the (also) already developed python application. The thing is, I need a production-ready version as soon as possible, and instead of modifying the python application, that uses multiprocessing, I thought it would be faster to create a simple flask server to communicate with node.js through HTTP. This way I would only need to implement a flask app that instantiates the python app.
Edit:
Why do I need to share objects?
Simply because the creation of the objects in questions takes too long. Actually, the creation takes an acceptable amount of time if done once, but, since I'm expecting (maybe) hundreds to thousands of requests simultaneously having to load every object again would be something I want to avoid.
One of the objects is a scikit classifier model, persisted on a pickle file, which takes 3 seconds to load. Each user can create several "job spots" each one will take over 2k documents to be classified, each document will be uploaded on an unknown point in time, so I need to have this model loaded in memory (loading it again for every task is not acceptable).
This is one example of a single task.
Edit 2:
I've asked some questions related to this project before:
Bidirectional python-node communication
Python multiprocessing within node.js - Prints on sub process not working
Adding a shared object to a manager.Namespace
As stated, but to clarify: I think the best solution would be to use Celery, but in order to quickly have a production ready solution, I trying to use this uWSGI attach daemon solution
I can see the temptation to hang on to multiprocessing.Pool. I'm using it in production as part of a pipeline. But Celery (which I'm also using in production) is much better suited to what you're trying to do, which is distribute work across cores to a resource that's expensive to set up. Have N cores? Start N celery workers, which of which can load (or maybe lazy-load) the expensive model as a global. A request comes in to the app, launch a task (e.g., task = predict.delay(args), wait for it to complete (e.g., result = task.get()) and return a response. You're trading a little bit of time learning celery for saving having to write a bunch of coordination code.
I have been looking at daemons for Linux such as httpd and have also looked at some code that can be used as a skeleton. I have done a fair amount of research and now I want to practice writing it. However, I'm not sure of what can I use a daemon for. Any good examples/ideas that I can try to execute?
I was thinking of using a daemon along with libnotify on Ubuntu to have pop-up notifications of select tweets.
Is this a bad example for implementing a daemon?
Will you even need a daemon for this?
Can this be implemented as a service rather than a daemon?
First: PEP 3143 tries to enumerate all of the fiddly details you have to get right to write a daemon in Python. And it specifies a library that takes care of those details for you.
The PEP was deferred—at least in part because the community felt it was more a responsibility of POSIX or some Linux standards group or something to first define exactly what is essential to being a daemon, before Python could have its own position on how to implement one. But it's still a great guide. However, the reference implementation of that proposed library still lives on, as python-daemon, which you can install from PyPI.
Meanwhile, the really interesting question for this project isn't so much service vs. daemon, as root vs. user. Do you want a single process that keeps track of all users' twitter accounts, and sends notifications to anyone who's logged in? Just a per-user process? Or maybe both, a single process watching all the tweets, then sending notifications via user processes?
Of course you don't really need a daemon or service for this. For example, it could be a GUI app whose main window is a configuration dialog, which keeps running (maybe with a traybar thingy) even when you close the config dialog, and it would work just as well. The question isn't whether you need a daemon, but whether it's more appropriate. Which really is a design choice.
Ok, so, I might be missing the plot a bit here, but would really like some help. I am quite new to development etc. and have now come to a point where I need to implement DBus (or some other inter-program communication). I am finding the concept a bit hard to understand though.
My implementation will be to use an HTML website to change certain variables to be used in another program, thus allowing for the program to be dynamically changed in its working. I am doing this on a raspberry PI using Raspbian. I am running a webserver to host my website, and this is where the confusion comes in.
As far as I understand, DBus runs a service which allows you to call methods from a program in another program. So does this mean that my website needs to run a DBUS service to allow me to call methods from it into my program? To complicate things a bit more, I am coding in Python, so I am not sure if I can run a Python script on my website that would allow me to run a DBUS service. Would it be better to use JavaScript?
For me, the most logical solution would be to run a single DBUS service that somehow imports method from different programs and can be queried by others who want to run those methods. Is that possible?
Help would be appreciated!
Thank you in advance!
So does this mean that my website needs to run a DBUS service to
allow me to call methods from it into my program?
A dbus background process (a daemon) would run on your web server, yes.
In fact dbus provides two daemons. One is a system daemon which permits
objects to receive system information (e.g. printer availability for exampple)
and the second is a general user application to application IPC daemon. It is the
second daemon that you definitely use for different applications to communicate.
I am coding in Python, so I am not sure if I can run a Python script
on my website that would allow me to run a DBUS service.
There is no problem using python; dbus has bindings for many languages (e.g Java, perl, ruby, c++, Python). dbus objects can be mapped to python objects.
the most logical solution would be to run a single DBUS service that
somehow imports method from different programs and can be queried by
others who want to run those methods. Is that possible?
Correct - dbus provides a mechanism by which a client process will create dbus object or objects which allow that process to other services to other dbus-aware processes.
This sounds like you should write an isolated D-Bus service to act as a datastore, and communicate synchronously with it in your scripts to write and read values. You can use shelve to persist the values between service invocations.
In the tutorial, the "Making method calls" section covers synchronous calls, and "Exporting objects" covers writing most of the D-Bus service.
I'm trying to do some machinery automation with python, but I've run into a problem.
I have code that does the actual control, code that logs, code the provides a GUI, and some other modules all being called from a single script.
The issue is that an error in one module halts all the others. So, for instance a bug in the GUI will kill the control systems.
I want to be able to have the modules run independently, so one can crash, be restarted, be patched, etc without halting the others.
The only way I can find to make that work is to store the variables in an SQL database, or files or something.
Is there a way for one python script to sort of ..debug another? so that one script can read or change the variables in the other? I can't find a way to do that that also allows to scripts to be started and stopped independently.
Does anyone have any ideas or advice?
A fairly effective way to do this is to use message passing. Each of your modules are independent, but they can send and receive messages to each other. A very good reference on the many ways to achieve this in Python is the Python wiki page for parallel processing.
A generic strategy
Split your program into pieces where there are servers and clients. You could then use middleware such as 0MQ, Apache ActiveMQ or RabbitMQ to send data between different parts of the system.
In this case, your GUI could send a message to the log parser server telling it to begin work. Once it's done, the log parser will send a broadcast message to anyone interested telling the world the a reference to the results. The GUI could be a subscriber to the channel that the log parser subscribes to. Once it receives the message, it will open up the results file and display whatever the user is interested in.
Serialization and deserialization speed is important also. You want to minimise the overhead for communicating. Google Protocol Buffers and Apache Thrift are effective tools here.
You will also need some form of supervision strategy to prevent a failure in one of the servers from blocking everything. supervisord will restart things for you and is quite easy to configure. Again, it is only one of many options in this space.
Overkill much?
It sounds like you have created a simple utility. The multiprocessing module is an excellent way to have different bits of the program running fairly independently. You still apply the same strategy (message passing, no shared shared state, supervision), but with different tactics.
You want multiply independent processes, and you want them to talk to each other. Hence: read what methods of inter-process communication are available on your OS. I recommend sockets (generic, will work over a n/w and with diff OSs). You can easily invent a simple (maybe http-like) protocol on top of TCP, maybe with json for messages. There is a bunch of classes coming with Python distribution to make it easy (SocketServer.ThreadingMixIn, SocketServer.TCPServer, etc.).
This is my first hack at doing any system-level programming (mostly a LAMPhp, specifically Drupal, web dev up to this point).
Because of availability of a library with a very specific feature, I am using Python for an upcoming project. I need to run, restart as needed, monitor and respond to the output of multiple Python script processes, controlled ideally via a HTTP API from another master program which keeps a database of processes that need to be running, and some metadata about those processes (parameters, pid, etc). I'm planning on building this master program in PHP as I have far more experience in it, hence the want for a nice HTTP API.
Is there some best practice for this type of system? Some initial research lead me to supervisord (which has XML-RPC built in, apparently), but I thought I'd check the wisdom of the masses who've actually been down this road before moving forward with testing.
I can't say I have been down this road, but I am working to go down this road. I would look into the multiprocessing libraries for Python. There are network transparent libraries. A couple of routes you could take with those:
1. Create a process that controls all of the other processes. Make this process a server you can control with your PHP.
2. Determine how to get PHP to communicate to these networked Python processes. They may still need to be launched from a central Python process however.