I'm using an external REST API which limits my API requests to 1 CPS (one call per second).
This is the architecture:
Versions:
Flask
RabbitMQ 3.6.4
amqp 1.4.9
kombu 3.0.35
Celery 3.1.23
Python 2.7
The API client sends a web request to the internal API; the internal API processes the request and controls the rate at which it is sent to RabbitMQ. These tasks can take from 5 seconds to 120 seconds, and there are situations in which tasks queue up and get sent to the external API at a higher rate than the one defined, resulting in numerous failed requests (around 5% of requests fail).
Possible solutions:
Increase External API limit
Add more workers
Keep track of failed tasks and retry them later
Although those solutions may work, they don't exactly address the implementation of my rate limiter or control the real rate at which my workers can issue the API requests, and ultimately it is that external rate I need to control.
I believe that if I could control the rate at which RabbitMQ delivers messages to the workers, that would be a better option. I found the RabbitMQ prefetch option, but can anyone recommend other options for controlling the rate at which messages are sent to consumers?
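For reference, a minimal sketch of what capping prefetch looks like in a Celery 3.1 configuration file; these settings bound how many unacknowledged messages each worker process holds, but they do not by themselves enforce a time-based rate such as 1 CPS:
# celeryconfig.py (sketch)
CELERYD_PREFETCH_MULTIPLIER = 1   # each worker process reserves only one message at a time
CELERY_ACKS_LATE = True           # do not reserve the next message until the current task finishes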
You will need to create your own rate limiter as Celery's rate limit only works per-worker and "does not work as you expect it to".
I have personally found it completely breaks when trying to add new tasks from another task.
I think the requirements for rate limiting vary too widely and depend on the application itself, so Celery's implementation is intentionally kept simple.
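For context, this is what Celery's built-in rate limit looks like (a sketch with a hypothetical task name, assuming a Celery app instance named app); because each worker enforces it independently, running N workers multiplies the effective rate seen by the external service by N:
# Built-in per-worker rate limit (sketch)
@app.task(name='external_api.call', rate_limit='1/s')  # hypothetical task name
def call_external_api(payload):
    ...  # with two workers, the external API can still receive up to 2 calls/s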
Here is an example I've created using Celery + Django + Redis.
Basically it adds a convenience method to your App.Task class which keeps track of your task execution rate in Redis. If the rate is too high, the task will retry at a later time.
This example sends an SMTP message, but that can easily be replaced with an API call.
The algorithm is inspired by this Figma blog post: https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/
https://gist.github.com/Vigrond/2bbea9be6413415e5479998e79a1b11a
# Rate limiting with Celery + Django + Redis
# Multiple Fixed Windows Algorithm inspired by Figma https://www.figma.com/blog/an-alternative-approach-to-rate-limiting/
# and Celery's sometimes ambiguous, vague, and one-paragraph documentation
#
# Celery's Task is subclassed and the is_rate_okay function is added
# celery.py or however your App is implemented in Django
import os
import math
import time
from celery import Celery, Task
from django_redis import get_redis_connection
from django.conf import settings
from django.utils import timezone
app = Celery('your_app')
# Get Redis connection from our Django 'default' cache setting
redis_conn = get_redis_connection("default")
# We subclass the Celery Task
class YourAppTask(Task):

    def is_rate_okay(self, times=30, per=60):
        """
        Checks to see if this task is hitting our defined rate limit too much.
        This example sets a rate limit of 30/minute.

        times (int): The "30" in "30 times per 60 seconds".
        per (int): The "60" in "30 times per 60 seconds".

        The Redis structure we create is a Hash of timestamp keys with counter values:
            {
                '1560649027.515933': '2',  # unlikely to have more than 1
                '1560649352.462433': '1',
            }
        The Redis key is expired after the amount of 'per' has elapsed.
        The algorithm totals the counters and checks the sum against 'times'.

        This algorithm currently does not implement the "leniency" described
        at the bottom of the Figma article referenced at the top of this code.
        That is left up to you and depends on the application.

        Returns True if under the limit, otherwise False.
        """
        # Get a timestamp accurate to the microsecond
        timestamp = timezone.now().timestamp()

        # Set our Redis key to our task name
        key = f"rate:{self.name}"

        # Create a pipeline to execute the Redis commands atomically
        pipe = redis_conn.pipeline()

        # Increment our current task hit in the Redis hash
        pipe.hincrby(key, timestamp)

        # Grab the current expiration of our task key
        pipe.ttl(key)

        # Grab all of our task hits in our current frame (of 60 seconds)
        pipe.hvals(key)

        # This returns a list of our command results: [current task hits, expiration, list of all task hits]
        result = pipe.execute()

        # If our expiration is not set, set it. This is not part of the atomicity of the pipeline above.
        if result[1] < 0:
            redis_conn.expire(key, per)

        # We must convert bytes to int before adding up the counters and comparing to our limit
        if sum(int(count) for count in result[2]) <= times:
            return True
        else:
            return False
app.Task = YourAppTask
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()
...
# SMTP Example
import random

from YourApp.celery import app
from django.core.mail import EmailMessage

# We set infinite max_retries so backlogged email tasks do not disappear
@app.task(name='smtp.send-email', max_retries=None, bind=True)
def send_email(self, to_address):
    if not self.is_rate_okay():
        # We implement a random countdown between 30 and 60 seconds
        # so tasks don't come flooding back at the same time
        raise self.retry(countdown=random.randint(30, 60))

    message = EmailMessage(
        'Hello',
        'Body goes here',
        'from@yourdomain.com',
        [to_address],
    )
    message.send()
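The same retry-if-over-limit pattern wrapped around an external API call might look roughly like this (a sketch, not part of the gist; the task name, the requests library, and the endpoint URL are placeholder choices):
# External API call variant (sketch)
import random
import requests  # hypothetical HTTP client choice

from YourApp.celery import app

@app.task(name='external.api-call', max_retries=None, bind=True)
def call_external_api(self, payload):
    # Roughly "1 call per second" across all workers
    if not self.is_rate_okay(times=1, per=1):
        # Spread retries out so backed-up tasks do not flood back at once
        raise self.retry(countdown=random.randint(5, 15))

    response = requests.post('https://external.example.com/endpoint', json=payload)
    response.raise_for_status()
    return response.json()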
Related
I found that most of the docs regarding Django Channels are about WebSockets. But I want to use them in a different way, and I believe it is possible.
How do I run an async periodic task using Django Channels? For example, I want to check the temperature on some website (through its API) every 15 seconds, and I need a notification when it goes above 20.
It also means that this task will live for a long time (maybe even for 3 months). Is Django capable of keeping the consumers alive for that long?
Thank you.
Channels is easily put to work for background tasks - see here for notes on doing so with the new version of Channels:
https://github.com/jayhale/channels-examples-bg-task
With a background task in place, you could set up a task that you could call with cron that puts your tasks in the queue at whatever period you would like. Downside to cron is that it doesn't do sub-minute scheduling out of the box - some hacking is required. See: Running a cron every 30 seconds
E.g.:
Add a job to the app user's crontab:
# crontab
*/5 * * * * python manage.py cron_every_5_minutes
Your custom management command can spawn the tasks for channels:
# myapp/management/commands/cron_every_5_minutes.py
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer
from django.core.management.base import BaseCommand

channel_layer = get_channel_layer()

class Command(BaseCommand):
    help = 'Periodic task to be run every 5 minutes'

    def handle(self, *args, **options):
        async_to_sync(channel_layer.send)('background-tasks', {'type': 'task_5_min'})
Regarding the expected reliability of a worker - they can run indefinitely, but you should expect them to occasionally fail. Managing the workers is more of a question of how you intend to supervise the processes (or containers, or however you are architecting).
I need to allow users to submit requests for very, very large jobs. We are talking 100 gigabytes of memory and 20 hours of computing time. This costs our company a lot of money, so it was stipulated that only 2 jobs could be running at any time, and requests for new jobs when 2 are already running would be rejected (and the user notified that the server is busy).
My current solution uses an Executor from concurrent.futures, and requires setting the Apache server to run only one process, reducing responsiveness (current user count is very low, so it's okay for now).
If possible I would like to use Celery for this, but I did not see in the documentation any way to accomplish this particular setting.
How can I run up to a limited number of jobs in the background in a Django application, and notify users when jobs are rejected because the server is busy?
I have two solutions for this particular case, one an out of the box solution by celery, and another one that you implement yourself.
You can do something like this with Celery workers. In particular, you create only two worker processes with concurrency=1 each (or a single worker with concurrency=2, though then both slots live in the same worker); this way, only two jobs can run at once. You then need a way to raise an exception when both slots are occupied: use inspect to count the number of active tasks and throw the exception if required. For an implementation, you can check out this SO post.
You might also be interested in rate limits.
You can do it all yourself, using a locking solution of your choice. In particular, a nice implementation that makes sure only two processes are running, using Redis (and redis-py), is as simple as the following (assuming you know Redis, since you already use Celery):
from redis import StrictRedis

# SystemLimitsReached is a custom exception you define in your application
redis = StrictRedis('localhost', 6379)
locks = ['compute:lock1', 'compute:lock2']

for key in locks:
    lock = redis.lock(key, blocking_timeout=5)
    acquired = lock.acquire()
    if acquired:
        do_huge_computation()
        lock.release()
        break
    print("Gonna try next possible slot")

if not acquired:
    raise SystemLimitsReached("Already at max capacity !")
This way you make sure only two running processes can exist in the system. A third process will block on the lock.acquire() line for blocking_timeout seconds per lock; if the locking was successful, acquired will be True, otherwise it's False and you tell your user to wait.
I had the same requirement some time ago, and what I ended up coding was something like the solution above. In particular:
This has the least amount of race conditions possible
It's easy to read
Doesn't depend on a sysadmin suddenly doubling the concurrency of workers under load and blowing up the whole system.
You can also implement the limit per user, meaning each user can have 2 simultaneously running jobs, by changing the lock keys from compute:lock1 to compute:userId:lock1 and lock2 accordingly. You can't do this with vanilla Celery.
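A minimal sketch of that per-user variant (the helper name and key format are illustrative; redis is the StrictRedis instance from the snippet above):
# Per-user lock keys: each user gets at most two concurrent jobs (sketch)
def acquire_user_slot(redis, user_id):
    for suffix in ('lock1', 'lock2'):
        lock = redis.lock('compute:%s:%s' % (user_id, suffix), blocking_timeout=5)
        if lock.acquire():
            return lock  # caller must release() it when the computation is done
    return None  # both slots busy: tell this user to wait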
First of all you need to limit concurrency on your worker (docs):
celery -A proj worker --loglevel=INFO --concurrency=2 -n <worker_name>
This will help make sure that you do not have more than 2 active tasks even if you have errors in the code.
Now you have 2 ways to implement task number validation:
You can use inspect to get number of active and scheduled tasks:
from celery import current_app

def start_job():
    inspect = current_app.control.inspect()
    active_tasks = inspect.active() or {}
    scheduled_tasks = inspect.scheduled() or {}
    worker_key = 'celery@%s' % <worker_name>   # substitute the worker name used above
    worker_tasks = active_tasks.get(worker_key, []) + scheduled_tasks.get(worker_key, [])

    if len(worker_tasks) >= 2:
        raise MyCustomException('It is impossible to start more than 2 tasks.')
    else:
        my_task.delay()
You can store number of currently executing tasks in DB and validate task execution based on it.
The second approach could be better if you want to scale this functionality later - for example, introducing premium users, or not allowing a single user to run 2 requests at once.
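A rough sketch of that second approach, using Django's cache as the counter store (key names and the limit of 2 are illustrative; a database row updated inside a transaction would work the same way):
# Counter-based validation (sketch)
from django.core.cache import cache

def start_job(user_id):
    key = 'running_jobs:%s' % user_id   # or one global key for a system-wide limit
    cache.add(key, 0, timeout=None)     # create the counter if it does not exist yet
    if cache.incr(key) > 2:
        cache.decr(key)
        raise MyCustomException('It is impossible to start more than 2 tasks.')
    my_task.delay(user_id)              # the task must cache.decr(key) when it finishes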
First
You need the first part of SpiXel's solution. According to him, "you only create two worker processes with concurrency=1".
Second
Set a timeout for tasks waiting in the queue (via CELERY_EVENT_QUEUE_TTL), and set a queue length limit, as described in "how to limit number of tasks in queue and stop feeding when full?".
Therefore, when two jobs are already running and a task has been waiting in the queue for 10 seconds (or whatever period you like), the task will time out. And if the queue is full, newly arriving tasks will be dropped.
Third
You need something extra to notify users when jobs are rejected because the server is busy.
Dead Letter Exchanges are what you need here. Every time a task fails because of the queue length limit or message timeout, it is dead-lettered: "Messages will be dropped or dead-lettered from the front of the queue to make room for new messages once the limit is reached."
You can set "x-dead-letter-exchange" to route these messages to another queue; once that queue receives a dead-lettered message, you can send a notification message to the user.
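Declaring such a queue from the Celery/kombu side might look roughly like this (a sketch; the exchange names, the 10-second TTL and the length limit of 2 are placeholders):
# Bounded queue with dead-lettering (sketch)
from kombu import Exchange, Queue

dead_letter_exchange = Exchange('jobs.dlx', type='direct')

CELERY_QUEUES = (
    Queue('jobs', Exchange('jobs'), routing_key='jobs',
          queue_arguments={
              'x-max-length': 2,                      # queue length limit
              'x-message-ttl': 10000,                 # 10 s waiting limit, in milliseconds
              'x-dead-letter-exchange': 'jobs.dlx',
              'x-dead-letter-routing-key': 'rejected',
          }),
    # a consumer on this queue can notify the user that the server was busy
    Queue('rejected-jobs', dead_letter_exchange, routing_key='rejected'),
)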
I have a Flask application that is supposed to display the result of a long running function to the user on a specified route. The result is about to change every hour or so. In order to avoid the user having to wait for the result, I want to have it cached somewhere in the application and re-compute it in specific intervals in the background (e.g. every hour) so that no user request ever has to wait for the long running computation function.
The idea I came up with to solve this is as follows, however, I am not completely sure whether this is really "safe" to do in a production environment with a multi-threaded or even multi-processed webserver such as waitress, eventlet, gunicorn or what not.
To re-compute the result in the background, I use a BackgroundScheduler from the APScheduler library.
The result is then left-appended in a collections.deque object which is registered as a module-wide variable (since there is no better possibility to save application wide globals in a Flask application as far as I know?!). Since the maximum size of the deque is set as 2, old results will pop out on the right side of the deque as new ones come in.
A Flask view now returns deque[0] to the requester which should always be the newest result. I decided for deque over Queue since the latter has no built-in possibility to read the first item without removing it.
Thus, it is guaranteed that no user ever has to wait for the result because the old one only disappears from "cache" in the very moment the new one comes in.
See below for a minimal example of this. When running the script and hitting http://localhost:5000, one can see the caching in action: "Job finished at" should never lag "Current time" by more than 10 seconds plus the very short time needed to re-compute it, yet a request should never have to wait the time.sleep(5) seconds from the job function before it returns.
Is this a valid implementation for the given requirement that will also work in a production-ready WSGI server setting or should this be accomplished differently?
from flask import Flask
from apscheduler.schedulers.background import BackgroundScheduler
import time
import datetime
from collections import deque
# a global deque that is filled by APScheduler and read by a Flask view
deque = deque(maxlen=2)
# a function filling the deque that is executed in regular intervals by APScheduler
def some_long_running_job():
    print('complicated long running job started...')
    time.sleep(5)
    job_finished_at = datetime.datetime.now()
    deque.appendleft(job_finished_at)

# a function setting up the scheduler
def start_scheduler():
    scheduler = BackgroundScheduler()
    scheduler.add_job(some_long_running_job,
                      trigger='interval',
                      seconds=10,
                      next_run_time=datetime.datetime.utcnow(),
                      id='1',
                      name='Some Job name'
                      )
    scheduler.start()
# a flask application
app = Flask(__name__)
# a flask route returning an item from the global deque
@app.route('/')
def display_job_result():
    current_time = datetime.datetime.now()
    job_finished_at = deque[0]
    return '''
        Current time is: {0} <br>
        Job finished at: {1}
    '''.format(current_time, job_finished_at)

# start the scheduler and flask server
if __name__ == '__main__':
    start_scheduler()
    app.run()
Thread-safety is not enough if you run multiple processes:
Even though collections.deque is thread-safe:
Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.
Source: https://docs.python.org/3/library/collections.html#collections.deque
Depending on your configuration, your webserver might run multiple workers in multiple processes, so each of those processes has their own instance of the object.
Even with one worker, thread-safety might not be enough:
You might have selected an asynchronous worker type. The asynchronous worker won't know when it's safe to yield and your code would have to be protected against scenarios like this:
Worker for request 1 reads value a and yields
Worker for request 2 also reads value a, writes a + 1 and yields
Worker for request 1 writes value a + 1, even though it should be a + 1 + 1
Possible solutions:
Use something outside of the Flask app to store the data. This can be a database, in this case preferably an in-memory database like Redis. Or if your worker type is compatible with the multiprocessing module, you can try to use multiprocessing.managers.BaseManager to provide your Python object to all worker processes.
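Picking up the Redis suggestion, a minimal sketch of keeping the result outside the Flask process so every worker reads the same value (the key name, client setup, and adapted versions of the question's functions are illustrative):
# Shared result via Redis instead of a module-level deque (sketch)
import datetime
import redis

r = redis.StrictRedis(host='localhost', port=6379)

def some_long_running_job():
    # ... the expensive computation ...
    r.set('latest_job_result', datetime.datetime.now().isoformat())

def display_job_result():
    value = r.get('latest_job_result')
    return 'Job finished at: %s' % (value.decode() if value else 'not computed yet')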
Are global variables thread safe in flask? How do I share data between requests?
How can I provide shared state to my Flask app with multiple workers without depending on additional software?
Store large data or a service connection per Flask session
I use Google App Engine (Python) as the backend of a mobile game, which includes social network integration (twitter) and global & relative leaderboards. My application makes use of two task queues, one for building out the relationships between players, and one for updating those objects when a player's score changes.
Model
class RelativeUserScore(ndb.Model):
    ID_FORMAT = "%s:%s"  # "friend_id:follower_id"

    #--- NDB Properties
    follower_id = ndb.StringProperty(indexed=True)         # the follower
    user_id = ndb.StringProperty(indexed=True)             # the followed (AKA friend)
    points = ndb.IntegerProperty(indexed=True)             # user data denormalization
    screen_name = ndb.StringProperty(indexed=False)        # user data denormalization
    profile_image_url = ndb.StringProperty(indexed=False)  # user data denormalization
This allows me to build the relative leaderboards by querying for objects where the requesting user is the follower.
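For concreteness, the relative leaderboard query described above would look something like this (the limit of 100 is illustrative):
# Relative leaderboard for the requesting user, highest scores first (sketch)
def get_relative_leaderboard(requesting_user_id, limit=100):
    return (RelativeUserScore
            .query(RelativeUserScore.follower_id == requesting_user_id)
            .order(-RelativeUserScore.points)
            .fetch(limit))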
Push Task Queues
I basically have two major tasks to be performed:
sync-twitter tasks will fetch the friends / followers from twitter's API, and build out the relative user score models. Friends are checked on user sign up, and again if their twitter friend count changes. Followers are only checked on user sign up. This runs in its own module with F4 instances, and min_idle_instances set to 20 (I'd like to reduce both settings if possible, though the instance memory usage requires at least F2 instances).
- name: sync-twitter
  target: sync-twitter            # target version / module
  bucket_size: 100                # default is 5, max is 100?
  max_concurrent_requests: 1000   # default is 1000. what is the max?
  rate: 100/s                     # default is 5/s. what is the max?
  retry_parameters:
    min_backoff_seconds: 30
    max_backoff_seconds: 900
update-leaderboard tasks will update all the user's objects after they play a game (which only takes about 2 minutes to do). This runs in its own module with F2 instances, and min_idle_instances set to 10 (I'd like to reduce both settings if possible).
- name: update-leaderboard
  target: update-leaderboard      # target version / module
  bucket_size: 100                # default is 5, max is 100?
  max_concurrent_requests: 1000   # default is 1000. what is the max?
  rate: 100/s                     # default is 5/s. what is the max?
I've already optimized these tasks to make them run asynchronously, and have reduced their run time significantly. Most of the time, the tasks take between .5 and 5 seconds. I've also put both task queues on their own dedicated modules, and have automatic scaling turned up pretty high (using the F4 and F2 instance classes respectively). However, I'm still running into a few issues.
As you can also see I've tried to max out the bucket_size and max_concurrent_requests, so that these tasks run as fast as possible.
Problems
Every once in a while I get a DeadlineExceededError on the request handler that initiates the call. DeadlineExceededErrors: The API call taskqueue.BulkAdd() took too long to respond and was cancelled.
Every once in a while I get a chunk of similar errors within the tasks themselves (for both task types): "Process terminated because the request deadline was exceeded during a loading request". (Note that this isn't listed as a DeadlineExceededError.) The logs show these tasks took up the entire 600 seconds allowed. They end up getting rescheduled, and when they re-run, they only take the expected .5 to 5 seconds. I've tried using AppStats to gain more insight into what's going on, but these calls never get recorded as they get killed before AppStats is able to save.
With users updating their score as frequently as every two minutes, the update-leaderboard queue starts to fall behind somewhere around 10K CCU. I'd ideally like to be prepared for at least 100K CCU. (By CCU I'm meaning actual users playing our game, not number of concurrent requests, which is only about 500 front-end api requests/second at 25K users. - I use locust.io to load test)
Potential Optimizations / Questions
My first thought is maybe the first two issues deal with only having a single task queue for each of the task types. Maybe this is happening because the underlying Bigtable is splitting during these calls? (See this article, specifically "Queue Sharding for Stable Performance")
So, maybe sharding each queue into 10 different queues. I'd think problem #3 would also benefit from this queue sharding. So...
1. Any idea as to the underlying causes of problems #1 and #2? Would sharding actually help eliminate these errors?
2. If I do queue sharding, could I keep all the queues on their same respective module, and rely on its autoscaling to meet the demand? Or would I be better off with module per shard?
3. Any way to dynamically scale the sharding?
My next thought is to try to reduce the calls to update-leaderboard tasks - something where not every completed game translates directly into a leaderboard update. But I'd need a guarantee that if the user plays only one game, the objects still get updated eventually. Any suggestions on implementing this reduction?
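For illustration, one such reduction could use named tasks so that at most one leaderboard update per user is enqueued per time window, since the task queue rejects duplicate names (a sketch only; the handler URL, the 2-minute window, and the helper name are hypothetical):
# Coalescing leaderboard updates with named (deduplicated) push tasks (sketch)
import time
from google.appengine.api import taskqueue

def schedule_leaderboard_update(user_id):
    window = int(time.time() // 120)   # one update slot per 2-minute window
    try:
        taskqueue.add(
            queue_name='update-leaderboard',
            url='/tasks/update-leaderboard',
            params={'user_id': user_id},
            name='update-%s-%d' % (user_id, window),  # duplicate names are rejected
            countdown=120)                            # run at the end of the window
    except (taskqueue.TaskAlreadyExistsError, taskqueue.TombstonedTaskError):
        pass  # an update for this user is already scheduled in this window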
Finally, all of the modules' auto scaling parameters and the queue's parameters were set arbitrarily, trying to err on the side of maxing these out. Any advice on setting these appropriately so that I'm not spending any more resources than I need?
I have the following code (simplified version)
for data in my_data_array:
    res = api_request(data)
    # write result to db
These requests may take some time and there are a lot of them. How can I make each iteration of the loop asynchronous and send progress (the percentage of completed requests) to the front-end with Django?
If I have to use Tornado or Celery, please give me links with information on how to integrate them with Django.
You will need Celery (or another async task queue). To integrate it with Django, see http://celery.readthedocs.org/en/latest/django/first-steps-with-django.html. I recommend using Celery with Redis, because Redis is often already used as a cache, so you don't need to install another broker for Celery (usually RabbitMQ).
To get the progress bar, count the total number of tasks (len(my_data_array)), store the value in the cache (e.g. under the key total_count), and add a second key (e.g. complete_count) with a value of zero. In every task that completes, increment the complete_count value.
The last step is to query the status: a simple view that loads these two values from the cache and returns them to the user (HTML/JSON).
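A minimal sketch of this counter scheme with Django's cache and a Celery task (key names and function names are illustrative; api_request is the function from the question's loop):
# Progress tracking with cache counters (sketch)
from celery import shared_task
from django.core.cache import cache
from django.http import JsonResponse

def start_batch(my_data_array):
    cache.set('total_count', len(my_data_array), timeout=None)
    cache.set('complete_count', 0, timeout=None)
    for data in my_data_array:
        process_item.delay(data)

@shared_task
def process_item(data):
    res = api_request(data)      # the per-item API call from the question
    # ... write result to db ...
    cache.incr('complete_count')

def progress(request):
    # simple view the front-end can poll for the percentage
    total = cache.get('total_count') or 0
    done = cache.get('complete_count') or 0
    percent = int(100 * done / total) if total else 0
    return JsonResponse({'percent': percent})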