I have a FastAPI application deployed on DigitalOcean. It has multiple API endpoints, and in some of them I have to run a scraping function as a background job using the RQ package, so as not to keep the user waiting for a server response.
I've already managed to create a Redis database on DigitalOcean and successfully connect the application to it, but I'm facing issues with running the RQ worker.
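For context, an endpoint hands the scraping job off to RQ roughly like this (a minimal sketch; the scrape function and route are placeholders for my actual code):
import os
import redis
from rq import Queue
from fastapi import FastAPI

from scraper import scrape  # placeholder for my actual scraping function

app = FastAPI()
conn = redis.Redis.from_url(url=os.getenv('REDIS_URL'))
q = Queue(connection=conn)

@app.post("/scrape")
def start_scrape(target_url: str):
    job = q.enqueue(scrape, target_url)  # returns immediately; the worker does the rest
    return {"job_id": job.get_id()}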
Here's the worker code, inspired by RQ's official documentation:
import os
import redis
from rq import Worker, Queue, Connection

listen = ['high', 'default', 'low']

# Connect to DigitalOcean's Redis database
REDIS_URL = os.getenv('REDIS_URL')
conn = redis.Redis.from_url(url=REDIS_URL)

# Create an RQ queue using the Redis connection
q = Queue(connection=conn)

with Connection(conn):
    worker = Worker([q], connection=conn)  # This instruction works fine
    worker.work()  # The deployment fails here; the DigitalOcean server crashes at this instruction
The worker/job execution runs just fine locally but fails on DO's server.
What could this be due to? Is there anything I'm missing, or any kind of configuration that needs to be done on DO's end?
Thank you in advance!
I also tried FastAPI's BackgroundTasks class. At first it ran smoothly, but the job would stop halfway through with no feedback from the class itself on what was happening in the background. I'm guessing it's due to a timeout that doesn't seem to have a custom configuration in FastAPI (perhaps because its background tasks are meant to be low-cost and fast).
I'm also thinking of trying Celery, but I'm afraid I would run into the same issues as with RQ.
Create a systemd service file for the Gunicorn app using this command:
sudo nano /etc/systemd/system/myproject.service
[Unit]
Description=Gunicorn instance to serve myproject
After=network.target
[Service]
User=user
Group=www-data
WorkingDirectory=/home/user/myproject
Environment="PATH=/home/user/myproject/myprojectvenv/bin"
ExecStart=/home/user/myproject/myprojectvenv/bin/gunicorn --workers 3 --bind unix:myproject.sock -m 007 wsgi:app
[Install]
WantedBy=multi-user.target
The RQ worker is configured with supervisor-style directives, so it belongs in a separate supervisor config (e.g. /etc/supervisor/conf.d/rq_worker.conf), not inside the systemd unit. Note also that RQ's CLI is rq worker followed by the queue names (the -A and -l flags belong to Celery), and the worker may need --url pointing at your DigitalOcean Redis URL so it doesn't try to connect to localhost:
[program:rq_worker]
command=/home/user/myproject/myprojectvenv/bin/rq worker high default low
directory=/home/user/myproject
autostart=true
autorestart=true
stderr_logfile=/var/log/rq_worker.err.log
stdout_logfile=/var/log/rq_worker.out.log
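After saving both files, reload and enable them with the standard systemd and supervisor commands:
sudo systemctl daemon-reload
sudo systemctl start myproject
sudo systemctl enable myproject
sudo supervisorctl reread
sudo supervisorctl update
One caveat: the question's app is FastAPI, which is ASGI rather than WSGI, so the gunicorn ExecStart above would also need the uvicorn worker class (e.g. -k uvicorn.workers.UvicornWorker main:app, module name assumed) instead of a plain wsgi:app entry point.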
I have a Flask application which is running using Gunicorn.
This Flask application has an API endpoint which takes two hours to complete.
If the same API is called twice with a 30-minute gap between the two calls, then the process handling the first API call gets restarted after the second call.
Example:
An initial process starts handling API_1.
After 30 mins, API_1 is called again; the process handling the previous API_1 call then gets restarted.
Command used to start the Gunicorn server:
nohup gunicorn --bind 0.0.0.0:5000 --workers=8 run:app --timeout 7200 --preload > output.log &
Number of cores: 8
I am not facing any issue while running Flask in development mode.
Any idea why it's behaving like this?
You can use other worker types. Your current worker type is sync; if you want to serve a long request, you should probably use a threaded or async worker:
nohup gunicorn --bind 0.0.0.0:5000 -k gevent run:app
Refer to this document.
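The threaded variant mentioned above would look roughly like this (a sketch; the thread count is an arbitrary example, and note that the gevent worker above requires the gevent package to be installed):
nohup gunicorn --bind 0.0.0.0:5000 -k gthread --threads 4 --timeout 7200 run:app > output.log &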
I tried adding the websockets example project to the datastore project. The websockets work, but when a page queries the Datastore or tries to put a new entity, I get a 502 response. In the logs it shows a critical error on the service worker. If I remove the websocket code, the datastore code works as intended. The only difference I can see is that the entrypoints for the app samples differ slightly.
the websocket sample uses
entrypoint: gunicorn -b :$PORT -k flask_sockets.worker main:app
while the datastore sample uses
entrypoint: gunicorn -b :$PORT main:app
websocket sample https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/flexible/websockets
datastore sample https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/flexible/datastore
The problem appears to be that GRPC (the default transport mechanism of the Cloud Datastore client) is not compatible with gevent. Aside from using a different websockets framework, you can work around the issue by activating grpc's gevent compatibility patch, using the following code:
import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()
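For the patch to take effect it has to run before the Datastore client is created, e.g. at the top of main.py (a sketch; the module layout is an assumption):
# main.py
import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()  # patch gRPC before any client is constructed

from google.cloud import datastore
client = datastore.Client()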
As a complement to Andrew's answer, you can extend the gunicorn worker class to run gRPC applications.
# gevent_grpc_worker.py
from gunicorn.workers.ggevent import GeventWorker
from grpc.experimental import gevent

class GeventGrpcWorker(GeventWorker):
    def patch(self):
        super(GeventGrpcWorker, self).patch()
        gevent.init_gevent()
        self.log.info('patched grpc')

# config.py for gunicorn
import multiprocessing
from gevent_grpc_worker import GeventGrpcWorker

# http://docs.gunicorn.org/en/stable/design.html#how-many-workers
workers = multiprocessing.cpu_count() * 2 + 1
worker_connections = 10000
# Use an asynchronous worker as most of the work is waiting for websites to load
worker_class = '.'.join([GeventGrpcWorker.__module__, GeventGrpcWorker.__name__])
timeout = 30
Then start your managed application with:
gunicorn -c config.py app:app
As you said, it seems there is a problem with the flask_sockets worker; I have tested it and it does not work with the datastore client.
I tried the Flask-SocketIO framework with the eventlet worker instead, and the datastore queries work fine.
entrypoint: gunicorn -b :$PORT --worker-class eventlet -w 1 main:app
You also need to add the eventlet module to the requirements.txt file: eventlet==0.24.1
The downside is that this breaks compatibility with the websocket code, so you need to rewrite that part. Keep in mind that the code samples are just intended to show in a few lines how to use the Google Cloud products; copy-pasting them without adapting the configuration underlying the app.yaml is not a good idea.
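The rewritten websocket part might look roughly like this with Flask-SocketIO (a sketch; the event name and entity kind are assumptions):
# main.py
from flask import Flask
from flask_socketio import SocketIO, send
from google.cloud import datastore

app = Flask(__name__)
socketio = SocketIO(app, async_mode='eventlet')
client = datastore.Client()

@socketio.on('message')
def handle_message(msg):
    # persist the incoming message in Datastore
    entity = datastore.Entity(key=client.key('Message'))
    entity.update({'text': msg})
    client.put(entity)
    send('stored')  # acknowledge back over the websocket

if __name__ == '__main__':
    socketio.run(app)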
I have written this task in the tasks.py file which is under my Django app directory, myapp.
from datetime import datetime

from celery.schedules import crontab
from celery.task import periodic_task
from celery.utils.log import get_task_logger

from myapp.models import Url

logger = get_task_logger(__name__)

# Periodic task that runs every minute
@periodic_task(run_every=crontab(hour="*", minute="*", day_of_week="*"))
def news():
    '''
    Grab url
    '''
    logger.info("Start task")
    now = datetime.now()
    urls = []
    urls.append(crawler())  # crawler() returns a dict of {title: url}; defined elsewhere in the app
    for url_dict in list(reversed(urls)):
        for title, url in url_dict.items():
            # Save each scraped url in the database
            Url.objects.create(title=title, url=url)
    logger.info("Task finished: result = %s" % url)
The main objective of this task is to push the url and title to the Django database every minute.
To run this Celery task, I need to invoke these commands using Django's ./manage.py utility, and I am planning to host this app on Heroku:
python manage.py celeryd --verbosity=2 --loglevel=DEBUG
python manage.py celerybeat --verbosity=2 --loglevel=DEBUG
But I need to run these two commands as daemons in the background. How can I run these commands as daemons so that my Celery tasks can run?
A quick fix is to put "&" after your commands, i.e.
python manage.py celeryd --verbosity=2 --loglevel=DEBUG &
python manage.py celerybeat --verbosity=2 --loglevel=DEBUG &
After hitting enter, these tasks will act as daemons and still print out useful debug info, so this is great for the initial stage and sometimes for small applications that do not rely heavily on Celery.
For anything more serious I suggest using supervisor. See THIS POST, which gives really nice info on Celery, Django and supervisor integration. Read the "Running Celery workers as daemons" part of the post.
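A supervisor config along these lines daemonizes both commands (a sketch; the project path and program names are assumptions):
[program:celeryd]
command=python manage.py celeryd --verbosity=2 --loglevel=DEBUG
directory=/path/to/your/project
autostart=true
autorestart=true

[program:celerybeat]
command=python manage.py celerybeat --verbosity=2 --loglevel=DEBUG
directory=/path/to/your/project
autostart=true
autorestart=true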
I am working on a Django-based web app. During unit testing, I need to write a test which needs a Celery worker running in the background.
I have already used:
CELERY_EAGER_PROPAGATES_EXCEPTIONS=True
CELERY_ALWAYS_EAGER=True
BROKER_BACKEND='memory'
in my overridden settings, but these do not run a Celery worker for me in the background when needed.
Any help would be much appreciated.
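For reference, I am applying those settings in the test roughly like this (a sketch; the task name is a placeholder):
from django.test import TestCase
from django.test.utils import override_settings

from myapp.tasks import my_task  # placeholder task

@override_settings(CELERY_ALWAYS_EAGER=True,
                   CELERY_EAGER_PROPAGATES_EXCEPTIONS=True,
                   BROKER_BACKEND='memory')
class MyTaskTest(TestCase):
    def test_task_runs(self):
        # with ALWAYS_EAGER, .delay() executes synchronously in-process,
        # so no separate worker is involved
        my_task.delay()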
Celery won't get run by Django automatically.
You can start a worker process by running from your project root:
$ celery -A my_proj worker
my_proj should be the application name you configured with app = Celery('my_proj')
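If a test genuinely needs a live worker rather than eager mode, Celery also ships a testing helper that starts one in a background thread (a sketch, assuming Celery 4+, with my_task as a placeholder and app as your Celery instance):
from celery.contrib.testing.worker import start_worker
from my_proj.celery import app

with start_worker(app, perform_ping_check=False):
    # the embedded worker consumes tasks for the duration of this block
    result = my_task.delay()
    assert result.get(timeout=10)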
I'm using Flask as a webserver for my UI (it's a simple web interface which controls recording from a webcam and a framegrabber simultaneously using GStreamer on Ubuntu; kind of a simple player).
Every time, I need to run the command "python main.py" manually from the command prompt to start the server.
I've tried the init.d solution, and even writing a simple shell script launched on every reboot at startup, but it fails to keep the server up and running to the end (it just invokes the server and then terminates it, I guess).
Is there any solution that could help me start the webserver every time the system boots, and keep it up and running?
I'd like to configure my system to boot directly into the browser, so I don't want the user to need to take any further action.
Any kind of suggestion/help is appreciated.
I'd like to suggest using supervisor; the documentation is here.
For a very simple demo, after you've installed it and finished the setup, create a new config file (e.g. under /etc/supervisor/conf.d/) like this:
[program:flask_app]
command = python main.py
directory = /dir/to/your/app
autostart = true
autorestart = true
then
$ sudo supervisorctl update
Now you should be good to go. The Flask app will start every time you boot your machine. (Note: the distribution package is already integrated into the service management infrastructure; if you're using another install method, see here.)
To check whether your app is running:
$ sudo supervisorctl status
For production, you can use nginx + uwsgi + supervisor. The Flask deployment documentation is here.
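A minimal nginx server block for that production setup might look like this (a sketch; the server name and socket path are assumptions):
server {
    listen 80;
    server_name example.com;

    location / {
        include uwsgi_params;
        uwsgi_pass unix:/dir/to/your/app/app.sock;
    }
}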
One well-documented solution is to use Gunicorn and an Nginx server:
Install components and set up a Python virtualenv with dependencies
Create the wsgi.py file:
from myproject import application
if __name__ == "__main__":
    application.run()
That will be handled by Gunicorn:
gunicorn --bind 0.0.0.0:8000 wsgi
Configure Gunicorn by setting up a systemd config file, /etc/systemd/system/myproject.service:
[Unit]
Description=Gunicorn instance to serve myproject
After=network.target

[Service]
User=sammy
Group=www-data
WorkingDirectory=/home/sammy/myproject
Environment="PATH=/home/sammy/myproject/myprojectenv/bin"
ExecStart=/home/sammy/myproject/myprojectenv/bin/gunicorn --workers 3 --bind unix:myproject.sock -m 007 wsgi:application

[Install]
WantedBy=multi-user.target
Start the Gunicorn service now and enable it at boot:
sudo systemctl start myproject
sudo systemctl enable myproject
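To confirm the service came up:
sudo systemctl status myproject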