In my Django project, I am using Celery to run a periodic task that checks a URL that responds with JSON and updates my database with some elements from that JSON.
Since requests to that URL are rate-limited, updating the whole database with my task takes about 40 minutes in total, and I will run the task every 2 hours.
If I open a view of my Django project, which also requests information from the database, while the task is running asynchronously in the background, will I run into any problems?
While requesting information from your database you are reading it, and in your Celery task you are writing to it. Only one write can happen at a time, but you can read as many times as you want, because reads do not take an exclusive lock on the database.
The only time you are going to run into issues using the database with Celery is when you use the database as the Celery broker, because the workers will then continuously poll the database for tasks. If you use a normal broker (such as RabbitMQ or Redis) you should not have issues.
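For reference, a minimal sketch of such a periodic task, assuming the common setup where Celery reads its configuration from Django settings with the CELERY_ namespace; the task name, model, fields, and URL are placeholders, not taken from the question:

```python
# tasks.py -- rough sketch; SomeModel, its fields, and the URL are placeholders
import requests
from celery import shared_task

from myapp.models import SomeModel  # hypothetical model


@shared_task
def update_database():
    # Fetch the rate-limited JSON feed and upsert rows one at a time;
    # each save is a short write, so your views' reads are not blocked
    # for the whole 40-minute run.
    response = requests.get("https://example.com/api/data")
    response.raise_for_status()
    for item in response.json():
        SomeModel.objects.update_or_create(
            external_id=item["id"],
            defaults={"value": item["value"]},
        )


# settings.py -- schedule it every 2 hours with Celery beat
CELERY_BEAT_SCHEDULE = {
    "update-database": {
        "task": "myapp.tasks.update_database",
        "schedule": 2 * 60 * 60,  # seconds
    },
}
```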
My problem is basically that I am building a customized management system based on Django (3.1) and Python (3.7.9), in which I am pulling data from a third-party tool. The tool does not give me webhooks for every piece of data that I want for visualization and analysis.
The webhook gives me only bits of information, and I have to perform a GET request to their API to fetch the remaining details if they are not already in my database. They require a successful webhook response within 5 seconds, otherwise they trigger a retry.
If I do the GET request inside the webhook function, the 5-second limit is exceeded. The solutions I came up with were Django middleware and Django triggers, and I am a bit confused about which would be best suited to my problem.
Note: I cannot lower the Django version, as I have to use async functions.
This would be a good use case for a task scheduler like Celery.
Django-triggers is an interface to the Celery scheduler, so it might be a good fit.
Keep in mind that Celery has to be run as a separate process alongside Django.
Another popular task scheduler is rq-scheduler.
This offers a simple implementation using Redis as a message queue. Note that load-balanced/multi-instance applications are not easily set up with RQ.
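For instance, a rough sketch of handing the slow GET request off to a Celery task so the webhook can respond within the 5-second limit; the view, task, payload fields, and URL are assumptions, not your actual code:

```python
# views.py -- minimal sketch with placeholder names
import json

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

from .tasks import fetch_full_record


@csrf_exempt
@require_POST
def webhook(request):
    payload = json.loads(request.body)
    # Queue the slow third-party GET request and acknowledge immediately,
    # well inside the 5-second deadline.
    fetch_full_record.delay(payload["id"])
    return JsonResponse({"status": "accepted"})


# tasks.py
import requests
from celery import shared_task


@shared_task
def fetch_full_record(record_id):
    # Fetch the remaining details from the third-party API.
    response = requests.get(f"https://third-party.example/api/records/{record_id}")
    response.raise_for_status()
    # ... save response.json() to your models here
```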
I want to store data fetched from a URL in my Django database, but I don't want this code to execute every time the server runs. What is a good way to achieve this?
I have a Django Postgres database with a DateField that holds the date on which some message (SMS and email) should be sent. I would like to schedule the delivery somehow (so basically run a function with parameters on that date). Everything is running on AWS Lambda.
I read Django - Set Up A Scheduled Job? but I am wondering whether there is a strictly AWS solution, or maybe something better than https://aws.amazon.com/solutions/instance-scheduler/.
Thanks!
In the end I used AWS CloudWatch, which runs a Lambda that pings my Django endpoint every [n] minutes.
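For what it's worth, the Lambda side of that setup can be as small as the sketch below; the endpoint URL is a placeholder, and the CloudWatch Events (EventBridge) rule simply invokes the handler on a schedule:

```python
# lambda_function.py -- pings the Django endpoint that sends any messages
# whose date is due; the URL is a placeholder.
import urllib.request


def lambda_handler(event, context):
    with urllib.request.urlopen(
        "https://example.com/internal/send-due-messages/", timeout=30
    ) as resp:
        return {"status": resp.status}
```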
I have a Django project where I am using Celery with RabbitMQ to perform a set of async tasks. The setup I have planned goes like this:
Django app running on one server.
Celery workers and RabbitMQ running on another server.
My initial issue is: how do I access Django models from the Celery tasks residing on another server?
And assuming I cannot access the Django models, is there a way, once a task completes, to send a callback to the Django application passing values, so that I can update Django's database based on the values passed?
Concerning your first question, accessing Django models from the workers' server:
Your Django app must be available on both Server A (serving users) and Server B (hosting the Celery workers).
Concerning your second question, updating the database based on the values: do you mean the result of the async task? If so, then you have two options:
You can just save whatever you need to save from within the task itself, assuming you have access to the database.
You could use a results backend (one of which is through the Django ORM), as mentioned in the official Celery documentation about Keeping Results.
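A rough sketch of the first option, assuming the worker server has the same Django project deployed and can reach the database; the model and helper names are placeholders:

```python
# tasks.py on the worker server -- the DATABASES setting there must point at
# a database the worker can reach.
from celery import shared_task

from myapp.models import Report  # hypothetical model


@shared_task
def generate_report(report_id):
    report = Report.objects.get(pk=report_id)
    report.result = run_expensive_computation(report)  # hypothetical helper
    report.save(update_fields=["result"])
```

For the second option, django-celery-results provides the Django ORM results backend: add django_celery_results to INSTALLED_APPS and set CELERY_RESULT_BACKEND = "django-db".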
I've used the following setup in my application:
The task is initiated from Django: information is extracted from the model instance and passed to the task as a dictionary. NB: this is more future-proof, as Celery 4 defaults to JSON encoding.
The remote server runs the task and creates a dictionary of results.
The remote server then calls an update task that is only listened for by a worker on the Django server.
The Django worker reads the results dictionary and updates the model.
The Django worker listens to a separate queue, though this isn't strictly necessary. A results backend isn't used; the data needed is just passed to the task.
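A rough sketch of that pattern; the task, queue, model, and helper names are placeholders:

```python
# tasks.py -- shared between both servers so the task signatures match
from celery import shared_task


@shared_task
def heavy_task(data):
    # Runs on the remote worker; receives a plain dict, never a model instance.
    results = {"id": data["id"], "score": compute_score(data)}  # hypothetical helper
    # Hand the results to a queue that only the Django-side worker consumes.
    update_model.apply_async(args=[results], queue="django_updates")


@shared_task
def update_model(results):
    # Runs on the Django server, where the ORM and database are available.
    # Imported lazily so the remote server does not need the models.
    from myapp.models import Item  # hypothetical model
    Item.objects.filter(pk=results["id"]).update(score=results["score"])
```

The Django-side worker is then started with something like celery -A proj worker -Q django_updates, so it is the only one consuming that queue.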
I am doing some tasks with tens of thousands of Active Directory objects that can take several minutes to load.
To speed things up, I'd like to just refresh this data into the SQLite database in the middle of the night (since there's no need for it to be current).
Is there a typical way to approach this type of problem? Perhaps have Django periodically run a function somehow?
You can write a custom Django management command and use cron or at to execute it.
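A rough sketch of such a management command; the command, model, and loader names are placeholders:

```python
# myapp/management/commands/refresh_directory.py -- placeholder names throughout
from django.core.management.base import BaseCommand

from myapp.directory import load_directory_objects  # hypothetical AD loader
from myapp.models import DirectoryObject  # hypothetical model


class Command(BaseCommand):
    help = "Refresh the local SQLite cache of Active Directory objects"

    def handle(self, *args, **options):
        for obj in load_directory_objects():
            DirectoryObject.objects.update_or_create(
                guid=obj["guid"],
                defaults={"name": obj["name"]},
            )
        self.stdout.write(self.style.SUCCESS("Directory cache refreshed"))
```

A crontab entry such as 0 2 * * * /path/to/venv/bin/python /path/to/project/manage.py refresh_directory would then run it nightly.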
Alternatively, just use a Django cron library:
django-cron
django-crontab