When a service bus queue contains any messages, I want my python webjob to scale out so the messages are processed faster.
I have a python webjob which feeds off a Service Bus Queue. The queue is populated each day at midnight and can have between 0 and around 400k messages added to it.
The bottleneck in the current processing is where some data needs to be downloaded, which means that scaling up the webjob won't help as much as parallelising it.
I scaled out from 1 instance to 10, but that doesn't appear to affect the rate at which messages are consumed from the queue, which suggests this isn't working the way I expect. On 1 instance it processed ~1.53k messages in an hour; in the hour since scaling out to 10 instances it processed ~1.5k messages (so essentially no difference).
The code I'm using to interface with the queue is this (if there is a better way of doing this in Python please let me know!):
import time

from azure.servicebus import ServiceBusService, Message, Queue

bus_service = ServiceBusService(
    service_namespace=<namespace>,
    shared_access_key_name='RootManageSharedAccessKey',
    shared_access_key_value=<key>)

while True:
    msg = bus_service.receive_queue_message(<queue>, peek_lock=False, timeout=1)
    if msg.body is None:
        print("No messages in queue")
        time.sleep(5)
    else:
        number = int(msg.body.decode('utf-8'))
        print(number)
I know in C# there is a QueueTrigger attribute for webjobs but I don't know of anything similar for Python.
I would expect that the more instances running in the app service, the faster messages would be processed, so why isn't that what I see?
The bottleneck in the program was the database, which was at maximum. Adding more instances just increased the number of calls on the database and therefore slowed down each instance.
Scaling up the database and optimising the database calls improved performance and also now means that multiple instances can be spun up to further increase throughput.
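For anyone hitting the same wall: once the database is no longer the limit, you can also parallelise within a single instance. Here is a rough sketch using the same old-SDK calls as in the question and a thread pool; the thread count and the handle() function are assumptions, not part of the original code.

import time
from concurrent.futures import ThreadPoolExecutor
from azure.servicebus import ServiceBusService

def consume(worker_id):
    # one client per thread so no connection is shared across threads
    bus = ServiceBusService(
        service_namespace=<namespace>,
        shared_access_key_name='RootManageSharedAccessKey',
        shared_access_key_value=<key>)
    while True:
        msg = bus.receive_queue_message(<queue>, peek_lock=False, timeout=1)
        if msg.body is None:
            time.sleep(5)
        else:
            handle(int(msg.body.decode('utf-8')))  # handle() is a hypothetical per-message worker

with ThreadPoolExecutor(max_workers=8) as pool:  # 8 threads is an arbitrary starting point
    for i in range(8):
        pool.submit(consume, i)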
For this question, I'm particularly struggling with how to structure this:
User accesses website
User clicks button
Value x in database increments
My issue is that multiple people could be on the website at the same time and click the button. I want to make sure every user can click the button, update the value, and read the incremented value, but I don't know how to avoid synchronisation/concurrency issues.
I'm using flask to run my website backend, and I'm thinking of using MongoDB or Redis to store my single value that needs to be updated.
Please comment if there is any lack of clarity in my question, but this is a problem I've really been struggling to solve.
Thanks :)
Redis: you can use the HINCRBY command, or create a distributed lock so there is only one writer at a time and only the lock-holding writer can make the update from your Flask app. Make sure you release the lock after a certain period of time, or once the writer is done with it.
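A minimal sketch of the HINCRBY route with redis-py; the 'counters' hash and the 'x' field names are placeholders.

from redis import StrictRedis

r = StrictRedis('localhost', 6379)
# HINCRBY is executed atomically on the Redis server, so concurrent clicks
# cannot lose updates, and the new value is returned to the caller
new_value = r.hincrby('counters', 'x', 1)
print(new_value)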
MySQL: you can start a transaction, make the update, and commit the change to make sure the data stays correct.
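And a sketch of the MySQL route, assuming mysql-connector-python and a counters table with (name, value) columns; the table, credentials, and database name are placeholders.

import mysql.connector

conn = mysql.connector.connect(user='app', password='secret', database='appdb')
try:
    cur = conn.cursor()
    # autocommit is off by default, so both statements run in one transaction;
    # the UPDATE takes a row lock, which serialises concurrent clicks on the same row
    cur.execute("UPDATE counters SET value = value + 1 WHERE name = %s", ("x",))
    cur.execute("SELECT value FROM counters WHERE name = %s", ("x",))
    (new_value,) = cur.fetchone()
    conn.commit()
finally:
    conn.close()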
To solve this problem I would suggest you follow a microservice architecture.
A service called worker would handle the Flask route that's called when the user clicks the link/button on the website. It would generate a message to be sent to another service called queue manager, which maintains a queue of increment/decrement messages from the worker service.
There can be multiple worker service instances running concurrently, but the queue manager is a singleton service that takes the messages from each worker and adds them to the queue. If the queue manager is busy, the worker service will either time out and retry or return a failure message to the user. If the queue is full, a response is sent back to the worker, which can retry up to n times, counting n down on each attempt.
A third service called storage manager runs whenever the queue is not empty; it sends the messages to the storage solution (Mongo, Redis, good ol' SQL, whatever) and ensures the increment/decrement messages are handled in the order they were received in the queue. You could also include a timestamp from the worker service in the message if you wanted to use it to sort the queue.
Generally, whatever hosting environment you use for Flask will run gunicorn as the production web server and support multiple concurrent worker instances to handle the HTTP requests; these would naturally be your worker service.
How you build and coordinate the queue manager and storage manager is down to implementation preference. For instance, you could use something like Google Cloud's Pub/Sub system to send messages between the deployed services, but that's just off the top of my head. There are plenty of ways to do it, and you're in the best position to decide.
Without knowing more about what you're trying to achieve and what the requirements for concurrent traffic are, I can't go into greater detail, but that's roughly how I've approached this type of problem in the past. If you need to handle more concurrent users at the website, pick a hosting solution with more concurrent workers. If you need the queue to be longer, pick a host with more memory, or write the queue to intermediate storage; this will slow things down but will make recovering from a crash easier.
You also need to consider how to handle messages failing between services, and how to recover from a service crashing or the queue filling up.
EDIT: I've been thinking about this over the weekend, and a much simpler solution is to just create a new record in a table directly from the Flask route that handles user clicks. To get your total, you just take a count from that table. Your bottlenecks will be how many concurrent workers your Flask hosting environment supports and how many concurrent connections your storage supports; both can be solved by throwing more resources at them.
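A minimal sketch of that simpler approach, assuming Flask-SQLAlchemy; the Click model, route, and database URL are hypothetical.

from flask import Flask, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///clicks.db'
db = SQLAlchemy(app)

class Click(db.Model):
    id = db.Column(db.Integer, primary_key=True)

with app.app_context():
    db.create_all()

@app.route('/click', methods=['POST'])
def click():
    # one INSERT per click: no read-modify-write, so concurrent clicks can't clash
    db.session.add(Click())
    db.session.commit()
    total = db.session.query(Click).count()  # the running total is just a COUNT
    return jsonify(total=total)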
I've built a simple application using Django Channels where a consumer receives data from an external service and then sends this data to subscribers on another consumer.
I'm new to websockets and I have a small concern: the consumer is receiving a lot of data, on the order of 100 (or more) JSON records per second. At what point should I be worried about this service crashing or running into performance issues? Is there some sort of limit for what I'm doing?
There is no explicit limit; however, it is worth noting that each instance (open connection) of the consumer can only process one WS message at a time.
So if you have a single websocket connection and are sending lots and lots of WS messages down it, and the consumer does work on each one (e.g. writes it to the db), the queue of messages might fill up and you will get an error.
There are a few solutions to this:
Open multiple WS connections and share out the load.
In your consumer, before doing any work that will take time, put it onto a work queue and have a background task (that you do not await) consume it.
For the second option it is probably a good idea to create this background queue in your connect method and shut it down / wait for it to flush everything in the disconnect method, as sketched below.
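A sketch of that second option with channels' AsyncWebsocketConsumer; the consumer name, queue size, and process_record() are assumptions.

import asyncio
import json
from channels.generic.websocket import AsyncWebsocketConsumer

class FeedConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        self.queue = asyncio.Queue(maxsize=1000)          # buffer for incoming records
        self.worker = asyncio.create_task(self._drain())  # background task, not awaited
        await self.accept()

    async def disconnect(self, close_code):
        await self.queue.join()  # wait for buffered work to be flushed
        self.worker.cancel()     # then stop the background task

    async def receive(self, text_data=None, bytes_data=None):
        # enqueue quickly so the consumer is free to take the next WS message
        await self.queue.put(json.loads(text_data))

    async def _drain(self):
        while True:
            record = await self.queue.get()
            await self.process_record(record)  # the slow part, e.g. a db write
            self.queue.task_done()

    async def process_record(self, record):
        ...  # hypothetical: whatever per-record work was blocking the consumer before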
--
If you are expecting a massive amount of data and don't want to fork out for a costly (high-memory) VM, you might be better off using a server that is not written in Python.
My suggestion for a Python developer would be https://docs.vapor.codes/4.0/websockets/. This is a Swift server framework, and Swift is linguistically very close to type-annotated Python, so it is easier for a Python dev to pick up than other high-performance options.
I need to allow users to submit requests for very, very large jobs. We are talking 100 gigabytes of memory and 20 hours of computing time. This costs our company a lot of money, so it was stipulated that only 2 jobs could be running at any time, and requests for new jobs when 2 are already running would be rejected (and the user notified that the server is busy).
My current solution uses an Executor from concurrent.futures, and requires setting the Apache server to run only one process, reducing responsiveness (current user count is very low, so it's okay for now).
If possible I would like to use Celery for this, but I did not see in the documentation any way to accomplish this particular setting.
How can I run up to a limited number of jobs in the background in a Django application, and notify users when jobs are rejected because the server is busy?
I have two solutions for this particular case: one is an out-of-the-box solution from Celery, and the other you implement yourself.
You can do something like this with Celery workers. In particular, you create only two worker processes with concurrency=1 (or one worker with concurrency=2, but that will use threads rather than separate processes); this way, only two jobs can run asynchronously. Now you need a way to raise an exception if both slots are occupied: use inspect to count the number of active tasks and throw an exception if required. For an implementation, you can check out this SO post.
You might also be interested in rate limits.
You can do it all yourself, using a locking solution of your choice. In particular, a nice implementation that makes sure only two processes are running, using Redis (and redis-py), is as simple as the following. (Assuming you know Redis, since you know Celery.)
from redis import StrictRedis

redis = StrictRedis('localhost', 6379)
locks = ['compute:lock1', 'compute:lock2']  # one lock per allowed job slot

acquired = False
for key in locks:
    lock = redis.lock(key, blocking_timeout=5)
    acquired = lock.acquire()
    if acquired:
        try:
            do_huge_computation()
        finally:
            lock.release()  # always free the slot, even if the job raises
        break
    print("Gonna try next possible slot")

if not acquired:
    raise SystemLimitsReached("Already at max capacity !")
This way you make sure that only two running processes can exist in the system. A third process will block on the lock.acquire() line for blocking_timeout seconds; if the locking was successful, acquired will be True, otherwise it's False and you tell your user to wait!
I had the same requirement some time ago, and what I ended up coding was something like the solution above. In particular:
It has the fewest race conditions possible.
It's easy to read.
It doesn't depend on a sysadmin suddenly doubling the concurrency of workers under load and blowing up the whole system.
You can also implement the limit per user, meaning each user can have 2 simultaneously running jobs, by changing the lock keys from compute:lock1 to compute:userId:lock1 and lock2 accordingly. You can't do that with vanilla Celery.
First of all you need to limit concurrency on your worker (docs):
celery -A proj worker --loglevel=INFO --concurrency=2 -n <worker_name>
This will help make sure that you do not have more than 2 active tasks even if there are errors in the code.
Now you have 2 ways to implement task number validation:
You can use inspect to get number of active and scheduled tasks:
from celery import current_app

def start_job():
    inspect = current_app.control.inspect()
    active_tasks = inspect.active() or {}
    scheduled_tasks = inspect.scheduled() or {}
    worker_key = 'celery@%s' % <worker_name>
    worker_tasks = active_tasks.get(worker_key, []) + scheduled_tasks.get(worker_key, [])
    if len(worker_tasks) >= 2:
        raise MyCustomException('It is impossible to start more than 2 tasks.')
    else:
        my_task.delay()
You can store number of currently executing tasks in DB and validate task execution based on it.
The second approach could be better if you want to scale this functionality later, e.g. introduce premium users, or stop a single user from executing 2 requests at once.
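A sketch of that second approach with the Django ORM, assuming a JobSlot model holding a running-jobs counter; the model, field, and exception names are hypothetical, and my_task is the task from above.

from django.db import transaction

MAX_JOBS = 2

class ServerBusy(Exception):
    pass

def try_start_job():
    with transaction.atomic():
        # select_for_update() locks the row, so two requests can't both pass the check
        slot = JobSlot.objects.select_for_update().get(pk=1)
        if slot.running >= MAX_JOBS:
            raise ServerBusy('It is impossible to start more than 2 tasks.')
        slot.running += 1
        slot.save(update_fields=['running'])
    my_task.delay()  # the task should decrement `running` when it finishes, even on failure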
First
You need the first part of SpiXel's solution. According to him, "you only create two worker processes with concurrency=1".
Second
Set a timeout for tasks waiting in the queue, via CELERY_EVENT_QUEUE_TTL, and a queue length limit, as described in "How to limit number of tasks in queue and stop feeding when full?".
Therefore, when the two workers are already running jobs, a task waiting in the queue longer than 10 seconds (or whatever period you like) will time out, and if the queue is full, newly arriving tasks will be dropped.
Third
You need something extra to notify "users when jobs are rejected because the server is busy".
Dead letter exchanges are what you need: every time a task fails because of the queue length limit or a message timeout, "messages will be dropped or dead-lettered from the front of the queue to make room for new messages once the limit is reached."
You can set "x-dead-letter-exchange" to route to another queue; once that queue receives the dead-lettered message, you can send a notification message to the user. A sketch of the wiring is below.
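A rough sketch of that wiring using kombu queue arguments with Celery; the exchange and queue names, limits, and broker URL are placeholders, and the broker must be RabbitMQ for these arguments to apply.

from celery import Celery
from kombu import Exchange, Queue

app = Celery('proj', broker='amqp://guest@localhost//')

app.conf.task_queues = (
    # main work queue: bounded length, 10 s message TTL, dead-letters to jobs.dlx
    Queue(
        'jobs',
        Exchange('jobs', type='direct'),
        routing_key='jobs',
        queue_arguments={
            'x-max-length': 10,
            'x-message-ttl': 10000,                 # milliseconds
            'x-dead-letter-exchange': 'jobs.dlx',
            'x-dead-letter-routing-key': 'jobs',
        },
    ),
    # a consumer on this queue turns each dead-lettered message into a user notification
    Queue('jobs.dead', Exchange('jobs.dlx', type='direct'), routing_key='jobs'),
)

The worker then consumes from the jobs queue (e.g. with -Q jobs), and a small separate consumer on jobs.dead handles the notifications.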
I use Google App Engine (Python) as the backend of a mobile game, which includes social network integration (twitter) and global & relative leaderboards. My application makes use of two task queues, one for building out the relationships between players, and one for updating those objects when a player's score changes.
Model
from google.appengine.ext import ndb

class RelativeUserScore(ndb.Model):
    ID_FORMAT = "%s:%s"  # "friend_id:follower_id"

    # --- NDB Properties
    follower_id = ndb.StringProperty(indexed=True)         # the follower
    user_id = ndb.StringProperty(indexed=True)             # the followed (AKA friend)
    points = ndb.IntegerProperty(indexed=True)             # user data denormalization
    screen_name = ndb.StringProperty(indexed=False)        # user data denormalization
    profile_image_url = ndb.StringProperty(indexed=False)  # user data denormalization
This allows me to build the relative leaderboards by querying for objects where the requesting user is the follower.
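For illustration, that query looks roughly like the following; the page size and ordering shown here are placeholders.

def relative_leaderboard(requesting_user_id, limit=50):
    # all scores of users this user follows, best first
    return (RelativeUserScore
            .query(RelativeUserScore.follower_id == requesting_user_id)
            .order(-RelativeUserScore.points)
            .fetch(limit))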
Push Task Queues
I basically have two major tasks to be performed:
sync-twitter tasks will fetch the friends / followers from twitter's API, and build out the relative user score models. Friends are checked on user sign up, and again if their twitter friend count changes. Followers are only checked on user sign up. This runs in its own module with F4 instances, and min_idle_instances set to 20 (I'd like to reduce both settings if possible, though the instance memory usage requires at least F2 instances).
- name: sync-twitter
  target: sync-twitter          # target version / module
  bucket_size: 100              # default is 5, max is 100?
  max_concurrent_requests: 1000 # default is 1000. what is the max?
  rate: 100/s                   # default is 5/s. what is the max?
  retry_parameters:
    min_backoff_seconds: 30
    max_backoff_seconds: 900
update-leaderboard tasks will update all the user's objects after they play a game (which only takes about 2 minutes to do). This runs in its own module with F2 instances, and min_idle_instances set to 10 (I'd like to reduce both settings if possible).
- name: update-leaderboard
  target: update-leaderboard    # target version / module
  bucket_size: 100              # default is 5, max is 100?
  max_concurrent_requests: 1000 # default is 1000. what is the max?
  rate: 100/s                   # default is 5/s. what is the max?
I've already optimized these tasks to run asynchronously, which reduced their run time significantly; most of the time they take between 0.5 and 5 seconds. I've also put both task queues on their own dedicated modules, with automatic scaling turned up pretty high (and using F4 and F2 instance types respectively). However, I'm still running into a few issues.
As you can also see I've tried to max out the bucket_size and max_concurrent_requests, so that these tasks run as fast as possible.
Problems
Every once in a while I get a DeadlineExceededError on the request handler that initiates the call. DeadlineExceededErrors: The API call taskqueue.BulkAdd() took too long to respond and was cancelled.
Every once in a while I get a chunk of similar errors within the tasks themselves (for both task types): "Process terminated because the request deadline was exceeded during a loading request". (Note that this isn't listed as a DeadlineExceededError.) The logs show these tasks took up the entire 600 seconds allowed. They end up getting rescheduled, and when they re-run, they only take the expected 0.5 to 5 seconds. I've tried using AppStats to gain more insight into what's going on, but these calls never get recorded as they are killed before AppStats is able to save.
With users updating their score as frequently as every two minutes, the update-leaderboard queue starts to fall behind somewhere around 10K CCU. I'd ideally like to be prepared for at least 100K CCU. (By CCU I'm meaning actual users playing our game, not number of concurrent requests, which is only about 500 front-end api requests/second at 25K users. - I use locust.io to load test)
Potential Optimizations / Questions
My first thought is maybe the first two issues deal with only having a single task queue for each of the task types. Maybe this is happening because the underlying Bigtable is splitting during these calls? (See this article, specifically "Queue Sharding for Stable Performance")
So, maybe sharding each queue into 10 different queues. I'd think problem #3 would also benefit from this queue sharding. So...
1. Any idea as to the underlying causes of problems #1 and #2? Would sharding actually help eliminate these errors?
2. If I do queue sharding, could I keep all the queues on their same respective module, and rely on its autoscaling to meet the demand? Or would I be better off with module per shard?
3. Any way to dynamically scale the sharding?
My next thought is to try to reduce the calls to update-leaderboard tasks, so that not every completed game translates directly into a leaderboard update. But I'd need something where, even if the user only plays one game, the objects are still guaranteed to be updated eventually. Any suggestions on implementing this reduction?
Finally, all of the modules' auto scaling parameters and the queue's parameters were set arbitrarily, trying to err on the side of maxing these out. Any advice on setting these appropriately so that I'm not spending any more resources than I need?
We have a Windows based Celery/RabbitMQ server that executes long-running python tasks out-of-process for our web application.
What this does, for example, is take a CSV file and process each line. For every line it books one or more records in our database.
This seems to work fine, I can see the records being booked by the worker processes. However, when I check the rabbitMQ server with the management plugin (the web based management tool) I see the Queued messages increasing, and not coming back down.
Under connections I see 116 connections, about 10-15 per virtual host, all "running" but when I click through, most of them have 'idle' as State.
I'm also wondering why these connections are still open, and if there is something I need to change to make them close themselves:
Under 'Queues' I can see more than 6200 items with state 'idle', and not decreasing.
So concretely I'm asking if these are normal statistics or if I should worry about the Queues increasing but not coming back down and the persistent connections that don't seem to close...
Other than the rather concise help inside the management tool, I can't seem to find any information about what these stats mean and if they are good or bad.
I'd also like to know why the messages are still visible in the queues and why they are not removed, as the tasks seem to be completed just fine.
Any help is appreciated.
Answering my own question:
Celery sends a result message back for every task in the calling code. This message is sent back via the same AMQP queue.
This is why the tasks were working, but the queue kept filling up. We were not handling these results, or even interested in them.
I added ignore_result=True to the celery task, so the task does not send result messages back into the queue. This was the main solution to the problem.
Furthermore, the configuration option CELERY_SEND_EVENTS=False was added to speed up Celery. If set to True, this option has Celery send events for external monitoring tools.
On top of that CELERY_TASK_RESULT_EXPIRES=3600 now makes sure that even if results are sent back, that they expire after one hour if not picked up/acknowledged.
Finally, CELERY_RESULT_PERSISTENT was set to False; this configures Celery not to store these result messages on disk. They will vanish when the server crashes, which is fine in our case, as we don't use them.
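Putting those pieces together, a rough sketch of the resulting configuration; the app name, broker URL, and task are placeholders, and the old-style setting names are used as above.

from celery import Celery

app = Celery('myapp', broker='amqp://guest@localhost//')

app.conf.update(
    CELERY_SEND_EVENTS=False,          # no events for external monitoring tools
    CELERY_TASK_RESULT_EXPIRES=3600,   # any results that do get stored expire after an hour
    CELERY_RESULT_PERSISTENT=False,    # don't write result messages to disk
)

@app.task(ignore_result=True)          # the main fix: no result message sent back
def process_csv_line(line):
    ...  # book the record(s) in the database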
So in short; if you don't need feedback in your app about if and when the tasks are finished, use ignore_result=True on the celery task, so that no messages are sent back.
If you do need that information, make sure you pick up and handle the results, so that the queue stops filling up.
If you don't need the reliability then you can make your queues transient.
http://celery.readthedocs.org/en/latest/userguide/optimizing.html#optimizing-transient-queues
CELERY_DEFAULT_DELIVERY_MODE = 'transient'