Handle a gunicorn worker termination from FastAPI - python

A FastAPI application restarts after a gunicorn worker timeout. Is it possible to handle such a signal from the FastAPI application (the shutdown event doesn't help) before the application restarts?
The problem is that some function exceeds the default time limit (30 seconds), which is expected, and we want to handle the situation by catching such a signal to notify the user about an error. Otherwise, the user sees "upstream connect error or disconnect/reset before headers. reset reason: connection termination".
INFO [83] uvicorn.error Application startup complete. ()
CRITICAL [70] gunicorn.error WORKER TIMEOUT (pid:83) (83,)
CRITICAL [70] gunicorn.error WORKER TIMEOUT (pid:83) (83,)
WARNING [70] gunicorn.error Worker with pid 83 was terminated due to signal 6 (83, 6)
WARNING [70] gunicorn.error Worker with pid 83 was terminated due to signal 6 (83, 6)
INFO [83] gunicorn.error Booting worker with pid: 83 (83,)
INFO [83] gunicorn.error Booting worker with pid: 83 (83,)
INFO [83] uvicorn.error Started server process [83] (83,)
INFO [83] uvicorn.error Waiting for application startup. ()
INFO [83] uvicorn.error Application startup complete. ()
Unfortunately, a timeout increase isn't feasible.
I did try @app.on_event("shutdown") and some of FastAPI's general exception handling methods, but nothing helped.

Gunicorn sends SIGABRT (signal 6) to a worker process when it times out.
Thus the process, FastAPI in this case, needs to catch the signal itself, but on_event cannot do that: FastAPI (Starlette) events are application lifecycle events, not OS signals.
There is a simple solution, though: Gunicorn server hooks. From the Gunicorn docs:
def worker_abort(worker)
Called when a worker received the SIGABRT signal. This call generally happens on timeout. The callable needs to accept one instance variable for the initialized Worker.
Of course, the request that was in flight is lost at that point; you have to find another way to send a response to the user. I recommend using FastAPI BackgroundTasks or Celery.
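For illustration, a minimal sketch of such a config file, loaded with something like gunicorn -c gunicorn_conf.py main:app (the module and app names are assumptions, and notify_ops() is a hypothetical stand-in for whatever out-of-band notification you use):

# gunicorn_conf.py -- a minimal sketch of the worker_abort server hook
import logging

timeout = 30  # seconds before gunicorn aborts a hung worker

def worker_abort(worker):
    # Runs inside the worker right after it receives SIGABRT on timeout.
    # The original HTTP request is already lost at this point, so only
    # out-of-band actions (logging, queuing a notification) make sense here.
    logging.getLogger("gunicorn.error").warning(
        "Worker %s aborted after exceeding the %ss timeout", worker.pid, timeout)
    notify_ops(worker.pid)  # hypothetical helper, e.g. enqueuing a Celery task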

Related

Server Request Timeout Error for Flask Application in Heroku using gunicorn

I have tried increasing the timeout in the Procfile, but I am still facing a request timeout error when I process some data on the server: 30 seconds after sending a request, the server times out. Is there any way to increase this request timeout?
I get the category and page number from the user and then scrape data from a website; while the app is busy scraping, the server times out after 30 seconds, even though the request is still being processed in the background.
I am using Heroku with gunicorn and my Procfile settings are:
web: gunicorn main:app --timeout 60 --workers=3 --threads=3 --worker-connections=1000
It is not possible to increase the HTTP timeout beyond 30 seconds: the limit is enforced by Heroku's router, not by gunicorn, so the --timeout 60 in the Procfile has no effect on it.
It looks like you need a different approach, one where the user does not hang waiting for the response to be processed and transmitted. You could consider:
a page showing "work in progress" that reloads every 30 seconds, checking whether a background job has completed (a minimal sketch follows this list)
notifying the user (email, browser notification) once the request has been processed by the backend
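A minimal sketch of the first option, assuming Flask, with an in-memory job store (good enough for a single worker process only) and a hypothetical do_scraping() standing in for the real scraping code:

# app.py -- hedged sketch of the "work in progress" polling pattern
import threading
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
jobs = {}  # in-memory store; use redis or a database with multiple workers

def run_job(job_id, category, page):
    result = do_scraping(category, page)  # hypothetical long-running scraper
    jobs[job_id] = {"status": "done", "result": result}

@app.route("/scrape/<category>/<int:page>", methods=["POST"])
def start(category, page):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running"}
    threading.Thread(target=run_job, args=(job_id, category, page)).start()
    return jsonify({"job_id": job_id}), 202  # responds well under 30 seconds

@app.route("/status/<job_id>")
def status(job_id):
    return jsonify(jobs.get(job_id, {"status": "unknown"}))

The "work in progress" page then polls /status/<job_id> every few seconds and renders the result once the status flips to done.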

Gunicorn is not respecting timeout when using UvicornWorker

I am setting up a timeout check, so I made an endpoint:
import time

@app.get("/tc", status_code=200)
def timeout_check():
    time.sleep(500)
    return "NOT OK"
I am using the docker image tiangolo/uvicorn-gunicorn-fastapi:python3.7
and my command to run the server:
CMD ["gunicorn","--log-level","debug","--keep-alive","15", "--reload", "-b", "0.0.0.0:8080", "--timeout", "15", "--worker-class=uvicorn.workers.UvicornH11Worker", "--workers=10", "myapp.main:app"]
I am expecting the endpoint to fail after 15 seconds, but it doesn't. It seems like the timeout is not respected. Any fix for that?
Async workers behave differently from sync workers:
With sync workers, the worker is blocked while fulfilling the request, so if a request takes longer than timeout, the worker is killed, and the request with it.
With async workers, the worker is not blocked and stays responsive to other requests even while one request takes a long time; i.e., worker timeout and request timeout are different things in this case.
There is no request timeout parameter for uvicorn right now.
For more details, see https://github.com/benoitc/gunicorn/issues/1493
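If an application-level timeout is acceptable, a hedged workaround is to cancel slow handlers yourself from a middleware with asyncio.wait_for; this is a sketch, not a uvicorn or gunicorn feature. Note that it only genuinely cancels async def endpoints: a sync endpoint like the time.sleep() one above keeps running in its threadpool thread even after the client has received the error response.

import asyncio

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
REQUEST_TIMEOUT = 15  # seconds; mirrors the --timeout value above

@app.middleware("http")
async def enforce_request_timeout(request: Request, call_next):
    try:
        return await asyncio.wait_for(call_next(request), timeout=REQUEST_TIMEOUT)
    except asyncio.TimeoutError:
        # the handler coroutine is cancelled; the client gets a gateway timeout
        return JSONResponse({"detail": "request timed out"}, status_code=504)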

AWS Elastic Beanstalk sqsd error while processing a message

I have an Elastic Beanstalk Python worker environment. The average job running time is about 20 seconds. Sometimes the following scenario happens:
sqsd picks a message from the SQS queue and sends it to the worker.
The worker starts processing the message.
Within a few seconds (anywhere from 1 to 30), sqsd gets the following error and parks the message in the dead-letter queue, as I have configured the retries to 1.
127.0.0.1 (-) - - [23/Nov/2017:19:48:17 +0000] "POST / HTTP/1.1" 500 527 "-" "aws-sqsd/2.3"
The worker continues to process the message and finishes successfully; I have logs to trace that.
This makes the environment unhealthy overall.
I have connection timeout = 60 seconds, inactivity timeout = 600, visibility timeout = 600, HTTP connections = 2.
I also have the following in the configs:
option_settings:
  aws:elasticbeanstalk:container:python:
    NumProcesses: 3
    NumThreads: 10
files:
  "/etc/httpd/conf.d/wsgi_custom.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      WSGIApplicationGroup %{GLOBAL}
Is this because of some memory limit that WSGI puts on every request? That is the only thing I can think of.

How to share in memory resources between Flask methods when deploying with Gunicorn

I have implemented a simple microservice using Flask, where the method that handles the request computes a response based on the request data and a rather large data structure loaded into memory.
Now, when I deploy this application using gunicorn with a large number of workers, I would simply like to share the data structure between the request handlers of all workers. Since the data is only read, there is no need for locking or anything similar. What is the best way to do this?
Essentially what would be needed is this:
load/create the large data structure when the server is initialized
somehow get a handle inside the request handling method to access the data structure
As far as I understand, gunicorn lets me implement various hook functions, e.g. for when the server is initialized, but a Flask request handler does not know anything about gunicorn's server data structures.
I do not want to use something like redis or a database system for this, since all the data lives in a data structure that needs to be loaded in memory, with no deserialization involved.
The calculation carried out for each request using the large data structure can be lengthy, so it must happen concurrently in a truly independent thread or process for each request (this should scale by running on a multi-core machine).
You can use preloading.
This will allow you to create the data structure ahead of time, then fork each request handling process. This works because of copy-on-write and the knowledge that you are only reading from the large data structure.
Note: Although this will work, it should probably only be used for very small apps or in a development environment. I think the more production-friendly way of doing this would be to queue up these calculations as tasks on the backend since they will be long-running. You can then notify users of the completed state.
Here is a little snippet to see the difference preloading makes.
# app.py
import flask

app = flask.Flask(__name__)

def load_data():
    print('calculating some stuff')
    return {'big': 'data'}

@app.route('/')
def index():
    return repr(data)

data = load_data()
Running with gunicorn app:app --workers 2:
[2017-02-24 09:01:01 -0500] [38392] [INFO] Starting gunicorn 19.6.0
[2017-02-24 09:01:01 -0500] [38392] [INFO] Listening at: http://127.0.0.1:8000 (38392)
[2017-02-24 09:01:01 -0500] [38392] [INFO] Using worker: sync
[2017-02-24 09:01:01 -0500] [38395] [INFO] Booting worker with pid: 38395
[2017-02-24 09:01:01 -0500] [38396] [INFO] Booting worker with pid: 38396
calculating some stuff
calculating some stuff
And running with gunicorn app:app --workers 2 --preload:
calculating some stuff
[2017-02-24 09:01:06 -0500] [38403] [INFO] Starting gunicorn 19.6.0
[2017-02-24 09:01:06 -0500] [38403] [INFO] Listening at: http://127.0.0.1:8000 (38403)
[2017-02-24 09:01:06 -0500] [38403] [INFO] Using worker: sync
[2017-02-24 09:01:06 -0500] [38406] [INFO] Booting worker with pid: 38406
[2017-02-24 09:01:06 -0500] [38407] [INFO] Booting worker with pid: 38407

How to find out why uWSGI kills workers?

I have an app on Pyramid. I run it in uWSGI with this config:
[uwsgi]
socket = mysite:8055
master = true
processes = 4
vacuum = true
lazy-apps = true
gevent = 100
And nginx config:
server {
    listen 8050;
    include uwsgi_params;

    location / {
        uwsgi_pass mysite:8055;
    }
}
Usually everything is fine, but sometimes uWSGI kills workers, and I have no idea why.
I see in the uWSGI logs:
DAMN ! worker 2 (pid: 4247) died, killed by signal 9 :( trying respawn ...
Respawned uWSGI worker 2 (new pid: 4457)
but there are no Python exceptions in the logs.
Sometimes I see in the uWSGI logs:
invalid request block size: 11484 (max 4096)...skip
[uwsgi-http key: my site:8050 client_addr: 127.0.0.1 client_port: 63367] hr_instance_read(): Connection reset by peer [plugins/http/http.c line 614]
And nginx errors.log:
*13388 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1,
*13955 recv() failed (104: Connection reset by peer) while reading response header from upstream, client:
I think this can be solved by adding buffer-size=32768, but it is unlikely that this is why uWSGI kills the workers.
Why can uWSGI kill workers? And how can I find out the reason?
The line "DAMN ! worker 2 (pid: 4247) died, ..." tells me nothing.
Signal 9 means the worker received a SIGKILL, so something sent a kill to it. It is relatively likely that the out-of-memory killer decided to kill your app because it was using too much memory. Try watching the workers with a process monitor and see whether one uses a lot of memory.
Try adding the harakiri-verbose = true option to the uWSGI config.
I had the same problem. For me, editing the uwsgi.ini file to raise reload-on-rss from 2048 to 4048 and set harakiri to 600 solved it.
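Putting those suggestions together, the relevant uwsgi.ini fragment might look like this (the values are illustrative, not recommendations):

[uwsgi]
# log what a worker was doing when it is killed on timeout
harakiri-verbose = true
# kill a worker stuck on a single request for more than 600 seconds
harakiri = 600
# recycle a worker once its resident memory exceeds this many megabytes
reload-on-rss = 4048
# larger request buffer, for the "invalid request block size" errors above
buffer-size = 32768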
For me it was that I hadn't filled out app.config["SERVER_NAME"] = "x"
