Webhook Endpoints - Handling Many Concurrent Requests

This isn't specifically a programming question so much as an infrastructure one, but of all the Stack Exchange sites, StackOverflow seems to be the most knowledgeable about RESTful APIs.
I have a single endpoint configured for handling events that could take in up to 1k events within a 3-minute window. I am noticing a lot of "missed" events, but I'm not willing to blame over-utilization right away without fully understanding what is happening.
The listening endpoint is /users/events?user=2345345, where 2345345 is the user id. From here we perform the necessary actions on that particular user. But what if, while this is happening, the next user, 2895467, performs an action that results in a new event being sent to /users/events?user=2895467 before the first one has been processed? What happens?
I intend to alleviate the concern by using celery to signal tasks which would greatly reduce this, but is it fair to assume that events could be missed while this single endpoint remains synchronous?

Real-life behavior depends on the approach used for "deployment".
For example, if you are using uwsgi with a single unthreaded worker behind nginx, then requests will be processed "sequentially": if a second request arrives before the first has been processed, the second will be "queued" (added to the backlog).
How long it can stay queued and how many requests may be in the queue depend on the configuration of nginx (listen backlog), the configuration of uwsgi (concurrency, listen backlog) and even the configuration of the OS kernel (search for net.core.somaxconn and net.core.netdev_max_backlog). When the queue becomes "full", new "concurrent" connections will be dropped instead of being added to the queue.

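A toy model of the backlog behaviour described in the answer above, using only the standard library. The bounded queue.Queue stands in for the listen backlog; BACKLOG_SIZE and accept_event are made-up names for this sketch, and a real nginx/uwsgi/kernel stack is more nuanced (clients may retry, for instance):

```python
import queue

BACKLOG_SIZE = 3  # hypothetical backlog limit, kept small for the demo

backlog = queue.Queue(maxsize=BACKLOG_SIZE)
dropped = []


def accept_event(user_id):
    """Queue an event if there is room; otherwise record it as dropped."""
    try:
        backlog.put_nowait(user_id)
        return True
    except queue.Full:
        dropped.append(user_id)
        return False


# Five events arrive while the single worker is still busy with an earlier one.
incoming = (2345345, 2895467, 3000001, 3000002, 3000003)
results = [accept_event(uid) for uid in incoming]
```

Once the queue is full, later events are rejected rather than delayed, which is one way events can appear to be "missed" on a synchronous endpoint.
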
Related

Update single database value on a website with many users

For this question, I'm particularly struggling with how to structure this:
User accesses website
User clicks button
Value x in database increments
My issue is that multiple people could potentially be on the website at the same time and click the button - I want to make sure each user is able to click the button, and update the value and read the incremented value too, but I don't know how to circumvent any synchronisation/concurrency issues.
I'm using flask to run my website backend, and I'm thinking of using MongoDB or Redis to store my single value that needs to be updated.
Please comment if there is any lack of clarity in my question, but this is a problem I've really been struggling with how to solve.
Thanks :)

Redis: I think you can use the Redis HINCRBY command (or INCR, if the value is a plain key rather than a hash field), or create a distributed lock so that there is only one writer at a time and only the writer holding the lock can make the update from your Flask app. Make sure the lock is released once the writer is done with it, or after a certain period of time.
MySQL: you can start a transaction, make the update, and commit the change to make sure the data is right.

To solve this problem I would suggest you follow a micro service architecture.
A service called worker would handle the flask route that's called when the user clicks on the link/button on the website. It would generate a message to be sent to another service called queue manager that maintains a queue of increment/decrement messages from the worker service.
There can be multiple worker service instances running concurrently but the queue manager is a singleton service that takes the messages from each service and adds them to the queue. If the queue manager is busy the worker service will either timeout and retry or return a failure message to the user. If the queue is full a response is sent back to the worker to retry n number of times, and you can count down that n.
A third service called storage manager is run every time the queue is not empty, this service sends the messages to the storage solution (whatever mongo, redis, good ol' sql) and it will ensure the increment/decrement messages are handled in the order they were received in the queue. You could also include a time stamp from the worker service in the message if you wanted to use that to sort the queue.
Generally, whatever hosting environment you use for Flask will run a production web server such as gunicorn, which supports multiple concurrent worker instances to handle the HTTP requests, and this would naturally be your worker service.
How you build and coordinate the queue manager and storage manager is down to implementation preference, for instance you could use something like Google Cloud pub/sub system to send messages between different deployed services but that's just off the top of my head. There's a load of different ways to do it, and you're in the best position to decide that.
Without knowing more details about what you're trying to achieve and what's the requirements for concurrent traffic I can't go into greater detail, but that's roughly how I've approached this type of problem in the past. If you need to handle more concurrent users at the website, you can pick a hosting solution with more concurrent workers. If you need the queue to be longer, you can pick a host with more memory, or else write the queue to an intermediate storage. This will slow it down but will make recovering from a crash easier.
You also need to consider handling when messages fail between different services, how to recover from a service crashing or the queue filling up.
EDIT: Been thinking about this over the weekend and a much simpler solution is to just create a new record in a table directly from the flask route that handles user clicks. Then to get your total you just get a count from this table. Your bottlenecks are going to be how many concurrent workers your flask hosting environment supports and how many concurrent connections your storage supports. Both of these can be solved by throwing more resources at them.
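The simpler idea from the EDIT can be sketched as follows, again with sqlite3 as a stand-in for whatever storage is actually used. The clicks table and the two helper functions are hypothetical names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clicks (id INTEGER PRIMARY KEY AUTOINCREMENT, user_id INTEGER NOT NULL)"
)


def record_click(user_id):
    """What the click route would do: append one row, never update a shared value."""
    with conn:
        conn.execute("INSERT INTO clicks (user_id) VALUES (?)", (user_id,))


def total_clicks():
    """The value x is simply the number of rows in the table."""
    return conn.execute("SELECT COUNT(*) FROM clicks").fetchone()[0]


for uid in (1, 2, 2, 3):
    record_click(uid)
```

Because every click is an independent insert, there is no shared value for two requests to fight over; the trade-off is that the table grows with every click.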

Best way to create a queue for handling request to REST Api created via Django

I have the following scenario:
Back end => a geospatial database and the Open Data Cube tool
API => Users can define parameters (xmin, xmax, ymin, ymax) to make GET requests
Process => On each request, analytics are calculated and the satellite images' pixel values are given back to the user
My question is the following: As the process is quite heavy (it can reserve many GB of RAM), how is it possible to handle multiple requests at the same time? Is there any queue where I can save the requests and serve each one sequentially?
Language/frameworks => Python 3.8 and Django
Thanks in advance

Celery + Rabbitmq/Redis is probably what you need.
In this configuration, your heavy processes become "tasks". When called with .delay(), they go into the queue and are no longer handled by your main process.
You might want to check the tutorial:
https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html
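To show the shape of the idea without installing anything, here is a standard-library stand-in: a queue.Queue plus one worker thread. This is not Celery's API; delay, heavy_analysis, and the job ids are hypothetical names, and a real setup would use Celery with RabbitMQ or Redis as the broker:

```python
import queue
import threading

task_queue = queue.Queue()
results = {}


def heavy_analysis(xmin, xmax, ymin, ymax):
    """Placeholder for the expensive geospatial computation."""
    return (xmax - xmin) * (ymax - ymin)


def delay(job_id, *args):
    """Rough analogue of a task's .delay(): enqueue and return immediately."""
    task_queue.put((job_id, args))


def worker():
    """Single worker: jobs run one at a time, so only one heavy job holds RAM."""
    while True:
        item = task_queue.get()
        if item is None:  # sentinel: stop the worker
            task_queue.task_done()
            break
        job_id, args = item
        results[job_id] = heavy_analysis(*args)
        task_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

delay("job-1", 0, 10, 0, 5)
delay("job-2", 2, 4, 1, 4)
task_queue.join()     # wait until both jobs have been processed
task_queue.put(None)  # shut the worker down
thread.join()
```

With a single worker the requests are served strictly one after another, which matches the "serve each one sequentially" requirement; adding workers trades memory for throughput.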

There are many asynchronous message queueing technologies that allow you to do this, lots of which have Python APIs too.
You probably want to use request-response messaging, to correlate the requests you get with the replies you want to send.
A message queueing technology will allow you to take the requests, store them on a queue, and have your server handle them when it's ready. Storing requests on a queue means that they won't get lost. This also allows your application to scale - as more requests come in, they can be dealt with by multiple application instances and still return only one result!
The answer above recommends Celery, which is a great choice for this kind of project. Depending on your requirements, you can also use pymqi (https://dsuch.github.io/pymqi/examples.html) or ZeroMQ (see the question "ZeroMQ - Multiple Publishers and Listener" for an example of the request-response pattern) if you need technologies that are more heavy-duty.
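The request-response correlation mentioned above can be sketched with in-process queues. This is only an illustration of the pattern under simple assumptions (threads instead of separate application instances, queue.Queue instead of a real message broker); all names are made up:

```python
import queue
import threading
import uuid

request_queue = queue.Queue()
reply_queue = queue.Queue()


def server_instance():
    """One of several application instances taking requests off the queue."""
    while True:
        message = request_queue.get()
        if message is None:  # sentinel: stop this instance
            break
        correlation_id, payload = message
        reply_queue.put((correlation_id, payload.upper()))


workers = [threading.Thread(target=server_instance) for _ in range(3)]
for w in workers:
    w.start()

# The client tags each request so replies can be matched regardless of order.
pending = {}
for text in ("alpha", "beta", "gamma"):
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = text
    request_queue.put((correlation_id, text))

replies = {}
for _ in range(len(pending)):
    correlation_id, result = reply_queue.get(timeout=5)
    replies[pending[correlation_id]] = result

for _ in workers:
    request_queue.put(None)
for w in workers:
    w.join()
```

Replies may come back in any order because three instances share the work; the correlation id is what ties each reply to the request that caused it.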

RabbitMQ Queued messages keep increasing

We have a Windows based Celery/RabbitMQ server that executes long-running python tasks out-of-process for our web application.
What this does, for example, is take a CSV file and process each line. For every line it books one or more records in our database.
This seems to work fine, I can see the records being booked by the worker processes. However, when I check the rabbitMQ server with the management plugin (the web based management tool) I see the Queued messages increasing, and not coming back down.
Under connections I see 116 connections, about 10-15 per virtual host, all "running" but when I click through, most of them have 'idle' as State.
I'm also wondering why these connections are still open, and if there is something I need to change to make them close themselves.
Under 'Queues' I can see more than 6200 items with state 'idle', and not decreasing.
So concretely I'm asking if these are normal statistics or if I should worry about the Queues increasing but not coming back down and the persistent connections that don't seem to close...
Other than the rather concise help inside the management tool, I can't seem to find any information about what these stats mean and if they are good or bad.
I'd also like to know why the messages are still visible in the queues, and why they are not removed, as the tasks seem to be completed just fine.
Any help is appreciated.

Answering my own question;
Celery sends a result message back to the calling code for every task. This message is sent back via the same AMQP queue.
This is why the tasks were working, but the queue kept filling up. We were not handling these results, or even interested in them.
I added ignore_result=True to the celery task, so the task does not send result messages back into the queue. This was the main solution to the problem.
Furthermore, the configuration option CELERY_SEND_EVENTS=False was added to speed up Celery. If set to True, this option has Celery send events for external monitoring tools.
On top of that, CELERY_TASK_RESULT_EXPIRES=3600 now makes sure that even if results are sent back, they expire after one hour if not picked up/acknowledged.
Finally CELERY_RESULT_PERSISTENT was set to False, this configures celery to not store these result messages on disk. They will vanish when the server crashes, which is fine in our case, as we don't use them.
So in short; if you don't need feedback in your app about if and when the tasks are finished, use ignore_result=True on the celery task, so that no messages are sent back.
If you do need that information, make sure you pick up and handle the results, so that the queue stops filling up.
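Collected in one place, the settings this answer describes might look like the fragment below. It is only a sketch of a settings module using the old-style uppercase names quoted in the answer; newer Celery releases use different setting names, so check the documentation for your version. The per-task ignore_result=True flag is set on the task itself and is not shown here.

```python
# Celery settings as named in the answer above (old-style uppercase names).

# Do not emit events for external monitoring tools.
CELERY_SEND_EVENTS = False

# If a result is sent back anyway, let it expire after one hour (value in seconds).
CELERY_TASK_RESULT_EXPIRES = 3600

# Do not store result messages on disk; they vanish if the server crashes.
CELERY_RESULT_PERSISTENT = False
```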

If you don't need the reliability then you can make your queues transient.
http://celery.readthedocs.org/en/latest/userguide/optimizing.html#optimizing-transient-queues
CELERY_DEFAULT_DELIVERY_MODE = 'transient'

Are uwsgi processes stuck when using Comet(Long Polling)?

I believe nginx is event-based, so with a single worker it can accept multiple requests, say 100 requests/second. These requests are then passed on to uwsgi to be processed; once uwsgi is done, it pushes the result back to nginx, and nginx pushes the result to the user who made the HTTP request.
Assuming I am only using 1 worker (no threads) for uwsgi, it will process these 100 requests one by one, right? So it will need 100 processing passes to complete all of the requests.
Now what happens if I plan to use long polling to get quick updates on my front end (see "How does facebook, gmail send the real time notification?")?
I believe it will force uwsgi to process a single request (the long-polling one) and suspend all the other requests, causing the entire system to break down.
Do I have any misconception about how uwsgi works, or is there another solution for implementing long polling?
Thank You

Your analysis is right: long polling is not well suited to multiprocess or multithreaded modes (in terms of cost), because each process/thread can manage only a single request at a time. Luckily, uWSGI supports dozens of non-blocking/evented/microthread-based technologies (like gevent, or lower-level greenlets). If your app can be adapted to these patterns (and this is not a no-brainer task, so do not hope monkey-patching will be enough), you will win.
In addition to this, if you like/tolerate callback-based programming and you do not need uWSGI-specific features, I find Tornado a great solution for the problem.
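To illustrate why an evented model copes with long polling, here is a small sketch using the standard-library asyncio module as a stand-in for gevent or Tornado (it is not uWSGI configuration). One thread holds 100 "requests" open at once; long_poll and the timings are invented for the demo:

```python
import asyncio
import time


async def long_poll(client_id, notification):
    """Hold the 'request' open until a notification is available."""
    await notification.wait()  # yields control instead of blocking the thread
    return f"client {client_id}: update delivered"


async def main():
    notification = asyncio.Event()
    # 100 long-poll requests wait concurrently in a single thread.
    polls = [asyncio.create_task(long_poll(i, notification)) for i in range(100)]
    await asyncio.sleep(0.1)  # other requests could be served in the meantime
    notification.set()        # something happened: wake every waiting client
    return await asyncio.gather(*polls)


start = time.monotonic()
responses = asyncio.run(main())
elapsed = time.monotonic() - start
```

A waiting client costs a small coroutine here rather than a whole process or thread, which is the cost argument made in the answer.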

Time out issues with chrome and flask

I have a web application which acts as an interface to an offsite server which runs a very long task. The user enters information and hits submit and then chrome waits for the response, and loads a new webpage when it receives it. However depending on the network, input of the user, the task can take a pretty long time and occasionally chrome loads a "no data received page" before the data is returned (though the task is still running).
Is there a way to put either a temporary page while my task is thinking or simply force chrome to continue waiting? Thanks in advance

While you could change your timeout on the server or other tricks to try to keep the page "alive", keep in mind that there might be other parts of the connection that you have no control over that could timeout the request (such as the timeout value of the browser, or any proxy between the browser and server, etc). Also, you might need to constantly up your timeout value if the task takes longer to complete (becomes more advanced, or just slower because more people use it).
In the end, this sort of problem is typically solved by a change in your architecture.
Use a Separate Process for Long-Running Tasks
Rather than submitting the request and running the task in the handling view, the view starts the running of the task in a separate process, then immediately returns a response. This response can bring the user to a "Please wait, we're processing" page. That page can use one of the many push technologies out there to determine when the task was completed (long-polling, web-sockets, server-sent events, an AJAX request every N seconds, or the dead-simplest: have the page reload every 5 seconds).
Have your Web Request "Kick Off" the Separate Process
Anyway, as I said, the view handling the request doesn't do the long action: it just kicks off a background process to do the task for it. You can create this background process dispatch yourself (check out this Flask snippet for possible ideas), or use a library like Celery or RQ.
Once the task is complete, you need some way of notifying the user. This will be dependent on what sort of notification method you picked above. For a simple "ajax request every N seconds", you need to create a view that handles the AJAX request that checks if the task is complete. A typical way to do this is to have the long-running task, as a last step, make some update to a database. The requests for checking the status can then check this part of the database for updates.
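The kick-off-and-poll flow described above can be sketched without Flask or Celery. In this hedged example a ThreadPoolExecutor plays the background process and a plain dict plays the database; submit_view, status_view, and long_task are hypothetical names, not part of any framework:

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
task_status = {}  # stands in for the database the task updates


def long_task(task_id, seconds):
    """Pretend to call the slow offsite server, then record completion."""
    time.sleep(seconds)
    task_status[task_id] = {"state": "done", "result": f"finished after {seconds}s"}


def submit_view(seconds):
    """What the submit handler would do: start the task and return at once."""
    task_id = str(uuid.uuid4())
    task_status[task_id] = {"state": "pending", "result": None}
    executor.submit(long_task, task_id, seconds)
    return task_id


def status_view(task_id):
    """What the polling endpoint would return to the 'please wait' page."""
    return task_status[task_id]


task_id = submit_view(0.2)
while status_view(task_id)["state"] != "done":
    time.sleep(0.05)  # the page would poll every N seconds instead
final_poll = status_view(task_id)
executor.shutdown()
```

In a real deployment the status would live in a database or cache shared by all web workers, since an in-memory dict is only visible to one process.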
Advantages and Disadvantages
Using this method (rather than trying to fit the long-running task into a request) has a few benefits:
1.) Handling long-running web requests is a tricky business due to the fact that there are multiple points that could time out (besides the browser and server). With this method, all your web requests are very short and much less likely to timeout.
2.) Flask (and other frameworks like it) is typically deployed with a fixed number of threads that can respond to web requests. Assume it has 8 threads: if four of them are handling the long requests, that leaves only four threads to handle more typical requests (like a user getting their profile page). Half of your web server could be tied up doing something that is not serving web content! At worst, you could have all eight threads running a long process, meaning your site is completely unable to respond to web requests until one of them finishes.
The main drawback: there is a little more set up work in getting a task queue up and running, and it does make your entire system slightly more complex. However, I would highly recommend this strategy for long-running tasks that run on the web.

I believe this is due to your web server (Apache in most cases) having a timeout that is too small. Try increasing this number.
For Apache, have a look at the Timeout option.
EDIT: I don't think you can set this timeout in Chrome (see this topic on Google forums, even though it's really old).
In Firefox, on the about:config page, type timeout and you'll have some options you can set. I have no idea about Internet Explorer.

Let's assume:
This is not a server issue, so we don't have to go fiddle with Apache, nginx, etc. timeout settings.
The delay is minutes, not hours or days, just to make the scenario manageable.
You control the web page on which the user hits submit, and from which user interaction is managed.
If those assumptions hold, I'd suggest not using a standard HTML form submission, but rather having the submit button kick off a JavaScript function to oversee processing. It would put up a "please be patient...this could take a little while" style message, then use jQuery.ajax, say, to call the long-time-taking server with a long timeout value. jQuery timeouts are measured in milliseconds, so 60000 = 60 seconds. If the task takes longer than that, increase your specified timeout accordingly. I have seen reports that not all clients will allow super-extra-long timeouts (e.g. Safari on iOS apparently has a 60-second limitation). But in general, this will give you a platform from which to manage the interactions (with your user, with the slow server) rather than being at the mercy of simple web form submission.
There are a few edge cases here to consider. The web server timeouts may indeed need to be adjusted upward (Apache 2.2 defaults to 300 seconds, aka 5 minutes, while Apache 2.4 defaults to 60 seconds, and nginx is lower than 300 seconds too, IIRC). Your client timeouts (on iOS, say) may have maximums too low for the delays you're seeing. Etc. Those cases would require either adjusting at the server, or adopting a different interaction strategy. But an AJAX-managed interaction is where I would start.
