We have a Django app that uses Celery's apply_async() call to send tasks to our RabbitMQ server. The problem is that each apply_async() call opens a new connection to the RabbitMQ server, so when thousands of requests come into the Django app, thousands of connections are opened on the broker.
In the Celery documentation for apply_async, there is a connection parameter:
connection – Re-use existing broker connection instead of establishing a new one.
My question is: how can I use it in a Django app? I cannot find any examples of how to do this. We are running Django under Gunicorn; ideally each worker would create one connection to the broker and reuse it between requests. That way, the number of connections opened on the broker is limited by the number of workers running.
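Something along these lines is what I'm hoping to end up with. A rough, untested sketch (module and task names are placeholders), assuming Celery's connection_or_acquire() hands out a pooled broker connection:

# Rough sketch (untested): reuse a pooled broker connection per worker.
# myproject/celery.py is assumed to define the Celery app as `app`, and
# `process_order` is a placeholder task name.
from myproject.celery import app
from myproject.tasks import process_order

def enqueue_order(order_id):
    # connection_or_acquire() hands out a connection from Celery's internal
    # broker pool (limited by the broker_pool_limit setting) instead of
    # opening a new TCP connection to RabbitMQ for every call.
    with app.connection_or_acquire() as conn:
        process_order.apply_async(args=[order_id], connection=conn)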
Related
I have a Python application that interacts with a Vertica database through the vertica-python client. Currently there is no connection pool to manage the connections; instead, a new connection is opened for every request and closed at the end of the request. However, this design becomes costly when handling concurrent requests. The Python application runs behind uWSGI and an Nginx server to process multiple requests.
I would like to use an existing connection pool to handle connections to Vertica from Python, but I can't seem to find connection pools like c3p0 or HikariCP for Python. Could you please help me find connection-pooling options for Python and Vertica?
For native Postgres, have a look at some of the connection pools discussed at Should PostgreSQL connections be pooled in a Python web app, or create a new connection per request?
For Vertica, it doesn't look like connection pooling is available in the native driver, though it might be worth posting an issue on GitHub if you'd like more specific details. You could probably use Vertica's ODBC driver through pyODBC, since that supports connection pooling if configured as discussed at http://www.unixodbc.org/doc/conn_pool.html
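As a rough, untested sketch of the pyODBC route (the DSN name and query are placeholders; the driver-manager pooling itself is configured in odbcinst.ini as described in the unixODBC docs):

# Sketch only: Vertica over ODBC with pooling handled by the driver manager.
# Assumes a DSN named "VerticaDSN" pointing at the Vertica ODBC driver.
import pyodbc

# pyodbc's module-level pooling switch; must be set before the first
# connection is made. unixODBC pooling is configured separately in odbcinst.ini.
pyodbc.pooling = True

def fetch_rows(query):
    conn = pyodbc.connect("DSN=VerticaDSN")
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()
    finally:
        # With pooling enabled, close() returns the connection to the pool
        # rather than tearing down the session.
        conn.close()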
Some background info: I'm building a web application on top of the Pyramid web framework. In production, I use CherryPy as the WSGI server.
The question is: how is the DB connection managed, given that I use Postgres + SQLAlchemy for DB access?
The default SQLAlchemy setup uses internal connection pooling.
A certain number of connections is created on process startup (depending on your setup, you can have M processes running N threads each)
The connections are recycled across requests (assuming you have set up your SQLAlchemy connection properly; the question does not show any code for this)
The pool can grow, and if the maximum connection limit is reached, an exception is raised
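A minimal sketch of configuring the pool explicitly (the connection URL and numbers are illustrative, not taken from the question):

# SQLAlchemy's default QueuePool, configured explicitly.
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql://user:password@localhost/mydb",
    pool_size=5,        # connections kept open per process
    max_overflow=10,    # extra connections allowed under load
    pool_timeout=30,    # seconds to wait for a free connection before raising
    pool_recycle=3600,  # recycle connections older than an hour
)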
I have a few servers that require executing commands on other servers. For example, a Bitbucket Server post-receive hook executing a git pull on another server. Another example is the CI server pulling a new Docker image and restarting an instance on another server.
I would normally use ssh for this, creating a user/group specifically for the job with limited permissions.
A few downsides with ssh:
A synchronous ssh call means the git push has to wait until the remote command completes.
If a host is not contactable for whatever reason, the ssh command will fail.
Maintaining keys, users, and sudoers permissions can become unwieldy.
A few possibilities:
Find an open-source, out-of-the-box solution (I have tried, with no luck so far)
Set up a REST API on each server that accepts calls with some type of authentication, e.g. POST https://server/git/pull/?apikey=a1b2c3
Set up Python/Celery to execute tasks on a different queue for each host. This means a Celery worker on each server that can execute commands, and possibly a service that accepts REST API calls and converts them to Celery tasks.
Is there a nice solution to this problem?
Defining the problem
You want to be able to trigger a remote task without waiting for it to complete.
This can be achieved in any number of ways, including with SSH. You can execute a remote command without waiting for it to complete by closing or redirecting all I/O streams, e.g. like this:
ssh user@host "/usr/bin/foobar </dev/null >/dev/null 2>&1"
You want to be able to defer the task if the host is currently unavailable.
This requires a queuing/retry system of some kind. You will also need to decide whether the target hosts will be querying for messages ("pull") or whether messages will be sent to the target hosts from elsewhere ("push").
You want to simplify access control as much as possible.
There's no way to completely avoid this issue. One solution would be to put most of the authentication logic in a centralized task server. This splits the problem into two parts: configuring access rights in the task server, and configuring authentication between the task server and the target hosts.
Example solutions
Hosts attempt to start tasks over SSH using the method above for asynchrony. If a host is unavailable, the task is written to a local file. Cron job periodically retries sending failed tasks. Access control via SSH keys.
Hosts add tasks by writing commands to files on an SFTP server. Cron job on target hosts periodically checks for new commands and executes them if found. Access control managed via SSH keys on the SFTP server.
Hosts post tasks to a REST API which adds them to a queue. A Celery daemon on each target host consumes from its own queue and executes tasks (see the sketch after this list). Access managed primarily by credentials sent to the task queuing server.
Hosts post tasks to API which adds tasks to queue. Task consumer nodes pull tasks off the queue and send requests to API on target hosts. Authentication managed by cryptographic signature of sender appended to request, verified by task server on target host.
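As a rough illustration of the Celery variant (broker URL, queue name, and task are made up): a worker on each target host consumes only its own queue, so routing a task to that queue causes it to run on that host.

# remote_tasks.py -- hypothetical sketch, names are placeholders.
import subprocess
from celery import Celery

app = Celery("remote_tasks", broker="amqp://user:password@broker-host//")

@app.task
def git_pull(repo_path):
    # Runs on whichever host's worker consumed the queue this task was sent to.
    subprocess.run(["git", "-C", repo_path, "pull"], check=True)

# Sender side (e.g. from the REST API handler):
#   git_pull.apply_async(args=["/srv/myrepo"], queue="host-web01")
# On target host web01, start a worker bound only to its own queue:
#   celery -A remote_tasks worker -Q host-web01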
You can also look into tools that do some or all of the required functions out of the box. For example, some Google searching came up with Rundeck which seems to have some job scheduling capabilities and a REST API. You should also consider whether you can leverage any existing automated deployment or management tools already present in your system.
Conclusions
Ultimately, there's no single right answer to this question. It really depends on your particular needs. Ask yourself: How much time and effort do you want to spend creating this system? What about maintenance? How reliable does it need to be? How much does it need to scale? And so on, ad infinitum...
I need to implement a fairly simple Django server that serves some HTTP requests and also listens to a RabbitMQ message queue that streams information into the Django app (information that should be written to the DB). The data must be written to the DB in a synchronized order, so I can't use the obvious Celery/RabbitMQ configuration. I was told there is no way to do this in the same Django project, since Django listens for HTTP requests in its own process and can't handle another process listening to RabbitMQ, which would force me to add another Python/Django project for the RabbitMQ/DB-write part, working with the same models the HTTP-bound Django project works with. You can smell the trouble with this configuration from here. Any ideas how to solve this?
Thanks!
If anyone else bumps into this problem:
The solution is to run a RabbitMQ consumer in a process separate from Django (but in the same Django codebase). It is not run through WSGI, etc.; you have to start it by itself.
The consumer connects to the appropriate RabbitMQ queues and writes the data into the Django models. The usual Django process(es) then effectively serve as a "read model" of the data inserted/updated/created/deleted by the message queue (RabbitMQ or other) from a remote process.
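A bare-bones sketch of such a consumer process, assuming pika as the client library (the settings module, queue name, and Event model are placeholders):

# Standalone consumer sharing the Django codebase; run it as its own process.
import json
import os

import django
import pika

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
django.setup()

from myapp.models import Event  # imported after django.setup()

def handle(channel, method, properties, body):
    payload = json.loads(body)
    Event.objects.create(**payload)  # write through the shared Django models
    channel.basic_ack(delivery_tag=method.delivery_tag)

def main():
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="events", durable=True)
    channel.basic_consume(queue="events", on_message_callback=handle)
    channel.start_consuming()  # run this script on its own, not under WSGI

if __name__ == "__main__":
    main()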
What's the way to go to build an HTML GUI for, e.g., a multiplexed TCP server in Python?
I am familiar with building websites with Django, but the thing I don't understand is: how does the TCP server part communicate with the Django views? How could I implement the data sharing (or am I missing the wood for the trees)?
The problem for me is the mapping between the stateless "get and leave" request cycle and the stateful Python module running as a daemon.
greetings
Edit: my standalone application skeleton:
#!/usr/bin/python
from django.core.management import setup_environ
import settings

setup_environ(settings)

from myapp.models import fanzy


def main():
    for each in fanzy.objects.all():
        print each.id, each.foo


if __name__ == '__main__':
    main()
Django is just Python, so anything you've written in Python can be imported and referenced in the 'views' that you write for Django to serve back as HTTP responses.
In answer to another part of your question, the way an HTTP server handling TCP connections communicates with the Python framework is most commonly through a protocol called WSGI. This is a good place to get more knowledge about the principles of WSGI. This is another.
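To get a feel for the interface, this is the classic minimal WSGI application; the HTTP server calls this callable once per request:

# The smallest possible WSGI application.
def application(environ, start_response):
    body = b"Hello from WSGI\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]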
With regards to running a background process and serving up a view of that processes' activities, it may be better to keep the two problems separate. You could write data to a file or a database and then access and serve this data via your web application.
These are just general comments, because your question is not totally clear. Please feel free to respond with further questions.
It's not always as easy as importing the libraries, mostly because of process lifetime. For example, if you run Django through CGI with one request per process, then your TCP server won't stay alive between views. Similarly, if you use multiple processes to handle requests (e.g. using FastCGI), then you'll have several servers running at the same time.
If you want to have permanent network connections alive independently of request lifetimes, you'll need to run the TCP server in an external (daemon) process. This is the standard procedure for some caching schemes, where all your Django processes share cached data via a single daemon (e.g. Redis).
Basically, you have two approaches.
Global connection
Establish a connection per Django process (if you have more than one) as a global object, and forward requests to it from your views. This is most convenient if your TCP server is coded to handle multiple requests per connection. However, you'll have problems if your Django process is multi-threaded.
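A rough sketch of that approach (host, port, and framing are placeholders); the lock only serialises access in case the worker is threaded:

# One socket per Django process, created lazily and reused across views.
import socket
import threading

_lock = threading.Lock()  # serialise access from multiple threads
_conn = None

def _get_connection():
    global _conn
    if _conn is None:
        _conn = socket.create_connection(("127.0.0.1", 9000))
    return _conn

def send_command(payload):
    with _lock:
        conn = _get_connection()
        conn.sendall(payload)
        return conn.recv(4096)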
Connection per request
If your TCP server can accept multiple short-lived connections, this is also a viable approach. Just open the connection for the lifetime of your view. If this object is used often enough, you can even add some piece of middleware that opens up the connection and stores it in the request object.
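For example, a sketch of such middleware in the newer Django middleware style (host, port, and attribute name are made up):

# Opens a short-lived TCP connection per request and attaches it to the
# request object; closed again once the response has been produced.
import socket

class TCPConnectionMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        request.tcp_conn = socket.create_connection(("127.0.0.1", 9000))
        try:
            return self.get_response(request)
        finally:
            request.tcp_conn.close()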