We are developing a Python server on Google App Engine that should be capable of handling incoming HTTP POST requests (around 1,000 to 3,000 per minute in total). Each request will trigger some datastore write operations. In addition, we will write a web client as a human-usable interface for displaying and analysing the stored data.
First we are trying to estimate GAE usage so that we have at least an approximation of the costs we would have to cover in future, based on the number of requests. For datastore write operations and data storage size it is fairly easy to come up with an approximate number, but it is not so obvious for the frontend and backend instance hours.
As far as I understand it, each time a request comes in, an instance is started, which then keeps running for 15 minutes. If another request comes in within those 15 minutes, the same instance is used. And now it gets a bit tricky, I think: if two requests come in at the very same time (which is not so odd with 3,000 requests per minute), does Google fire up another instance, so that Google would count an additional (at least) 0.25 instance hours? Also, I am not quite sure how a web client that constantly performs read operations on the datastore in order to display and analyse data would increase the instance hours.
Does anyone know a reliable way of counting instance hours and producing meaningful estimates? We would use that information to judge how expensive it would be to run an application on GAE compared to simply renting a web server.
There's no 100% sure way to assess the number of frontend instance hours. An instance can serve more than one request at a time. In addition, the algorithm of the scheduler (the system that starts the instances) is not documented by Google.
Depending on how demanding your code is, I think you can expect a standard F1 instance to handle up to 5 requests in parallel; that's a maximum, and 2 is a safer bet.
My recommendation, if possible, would be to simulate standard interaction on your website with a limited number of users, see how the number of instances grows, and then extrapolate.
For example, let's say you simulate 100 requests per minute during 2 hours, and you see that GAE spawns 5 instances for that, then you can extrapolate that a continuous load of 3000 requests per minute would require 150 instances during the same 2 hours. Then I would double this number for safety, and end up with an estimate of 300 instances.
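In code form, that back-of-the-envelope estimate looks like this (all the numbers are the hypothetical ones from the example above):

    # Instance estimate extrapolated from an observed load test.
    # All numbers are hypothetical, taken from the example above.
    observed_requests_per_min = 100
    observed_instances = 5
    target_requests_per_min = 3000
    safety_factor = 2  # double for safety, since the scheduler is a black box

    scale = target_requests_per_min / observed_requests_per_min  # 30x
    estimate = observed_instances * scale * safety_factor
    print(estimate)  # 300.0 instances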
I have an API deployed on Cloud Run where each request results in a read + write to Cloud Datastore. A non-trivial number of the requests are first-timers (the read from Datastore will return null), so adding caching in front of it may not be too helpful.
Over the past month, the average wall time around calling Datastore and having the data (data = client.get(key, eventual=True)) is 48ms. The payloads are small (a list of dicts, with 10 elements on average, and each dict has two floats).
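A minimal sketch of the kind of measurement I mean (the kind and key name are placeholders):

    import time
    from google.cloud import datastore

    client = datastore.Client()
    key = client.key("Payload", "some-id")  # placeholder kind/name

    start = time.perf_counter()
    data = client.get(key, eventual=True)  # eventually consistent read by key
    elapsed_ms = (time.perf_counter() - start) * 1000
    print("datastore get took %.1f ms" % elapsed_ms)  # ~48 ms on average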
I'm not sure if I should say that latency is high, but my API has a budget of 100ms to do all that it needs to do and return. If just the data fetching takes ~50% of that time, I'm looking for ways to optimize things.
Questions:
In general, how does 50ms sound for fairly small payloads, fetched by key, from within GCP?
What should I expect (in terms of latency) from Memorystore within GCP?
Assuming you are using Cloud Run and Datastore in the same location, I would say that 50ms is around the expected latency for a Datastore read; the size of the payload does not matter as much for reads (10 vs. 1,000 document reads do not make a big difference in processing/propagation time).
Since you have such a small window for your API to operate in, this could indeed be a problem if some unexpected delays occur.
I have never used Memorystore, so I can't say what actual latency you could expect, but it might be a better option given that every ms counts for your app.
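For what it's worth, a minimal sketch of a Memorystore (Redis) read-through cache in front of Datastore; the host IP, key scheme and TTL are assumptions, and within the same region a Redis GET is typically on the order of a millisecond:

    import json
    import redis
    from google.cloud import datastore

    ds = datastore.Client()
    cache = redis.Redis(host="10.0.0.3", port=6379)  # hypothetical Memorystore IP

    def get_payload(entity_id):
        cache_key = "payload:%s" % entity_id
        cached = cache.get(cache_key)
        if cached is not None:
            return json.loads(cached)  # cache hit: ~1 ms instead of ~50 ms
        entity = ds.get(ds.key("Payload", entity_id), eventual=True)
        if entity is None:
            return None  # first-timer: nothing stored yet, nothing to cache
        data = dict(entity)
        cache.set(cache_key, json.dumps(data), ex=300)  # 5 min TTL, an assumption
        return data

Note that first-timer requests still pay the full Datastore latency, which matches the caveat in the question; the cache only helps repeat reads.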
I am trying to continuously crawl a large amount of information from a site using the REST API they provide. I have the following constraints:
Stay within the API limit (5 calls/sec)
Utilise the full limit (make exactly 5 calls per second, i.e. 5*60 = 300 calls per minute)
Each call will be with different parameters (params will be fetched from db or in-memory cache)
Calls will be made from AWS EC2 (or GAE) and processed data will be stored in AWS RDS/DynamoDB
For now I am just using a scheduled task that runs a Python script every minute; the script makes 10-20 API calls, processes the responses, and stores the data in the DB. I want to scale this procedure up (to 5*60 = 300 calls per minute) and make it manageable via code (pushing new tasks, pausing/resuming them easily, monitoring failures, changing call frequency).
My question is: what are the best available tools to achieve this? Any suggestion/guidance/link is appreciated.
I do know the names of some task-queuing frameworks like Celery/RabbitMQ/Redis, but I don't know much about them. However, I am willing to learn one or more of them if they are the best tools to solve my problem; I just want to hear from SO veterans before jumping in ☺
Also please let me know if there's any other AWS service I should look to use (SQS or AWS Data Pipeline?) to make any step easier.
You needn't add an external dependency just for rate-limiting, as your use case is rather straightforward.
I can think of two options:
Modify the script (that currently wakes up every minute and makes 10-20 API calls) to wake up every second and make 5 calls (sequentially or in parallel).
In your current design, your API calls might not be properly distributed across 1 minute, i.e. you might be making all your 10-20 calls in the first, say, 20 seconds.
If you change that script to run every second, your API call rate will be more balanced.
Change your Python script into a long-running daemon and use a rate-limiter library, such as this. You can configure it to make 1 call per x seconds; see the sketch after this list.
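A minimal sketch of the daemon option, using a plain pacing loop rather than any particular rate-limiter library (fetch_next_params, call_api and store are placeholders for your own code):

    import time

    CALLS_PER_SECOND = 5
    INTERVAL = 1.0 / CALLS_PER_SECOND  # 0.2 s between calls

    def run_forever():
        next_slot = time.monotonic()
        while True:
            params = fetch_next_params()   # placeholder: read from DB/cache
            response = call_api(params)    # placeholder: the actual REST call
            store(response)                # placeholder: write to RDS/DynamoDB
            # Sleep only for whatever remains of this 0.2 s slot, so slow
            # calls don't silently lower the overall rate below 5/sec.
            next_slot += INTERVAL
            time.sleep(max(0.0, next_slot - time.monotonic()))

This keeps the calls evenly spaced at exactly 5/sec regardless of how long each call takes, as long as a call averages under 0.2 s.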
I have made an Azure Machine Learning Experiment which takes a small dataset (12x3 array) and some parameters and does some calculations using a few Python modules (a linear regression calculation and some more). This all works fine.
I have deployed the experiment and now want to throw data at it from the front end of my application. The API call goes in and comes back with correct results, but it takes up to 30 seconds to calculate a simple linear regression. Sometimes it is 20 seconds, sometimes only 1 second. I even got it down to 100 ms once (which is what I'd like), but 90% of the time the request takes more than 20 seconds to complete, which is unacceptable.
I guess it has something to do with it still being an experiment, or still being in a development slot, but I can't find the settings to make it run on a faster machine.
Is there a way to speed up my execution?
Edit: To clarify: the varying timings were obtained with the same test data, simply by sending the same request multiple times. This made me conclude that it must have something to do with my request being put in a queue, some start-up latency, or throttling of some other kind.
First, I am assuming you are doing your timing test on the published AML endpoint.
When a call is made to AML, the first call must warm up the container. By default a web service has 20 containers. Each container starts cold, and a cold container can cause a large (30 sec) delay. In the string returned by the AML endpoint, only count requests that have the isWarm flag set to true. Hammering the service with MANY requests (relative to how many containers you have running) can get all of your containers warmed up.
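A rough sketch of such a warm-up/timing loop; the endpoint URL, API key and request body are hypothetical, and the isWarm flag is assumed to appear in the returned string as described above:

    import time
    import requests

    URL = "https://example.azureml.net/score"         # hypothetical endpoint
    HEADERS = {"Authorization": "Bearer MY_API_KEY",  # hypothetical key
               "Content-Type": "application/json"}
    BODY = {"Inputs": {"input1": []}}                 # hypothetical payload

    warm = []
    for _ in range(100):  # hammer the service to warm all ~20 containers
        start = time.perf_counter()
        resp = requests.post(URL, headers=HEADERS, json=BODY)
        elapsed = time.perf_counter() - start
        # Only count responses whose returned string reports a warm container.
        if '"isWarm":true' in resp.text.replace(" ", ""):
            warm.append(elapsed)

    if warm:
        print("%d warm requests, avg %.3f s" % (len(warm), sum(warm) / len(warm)))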
If you are sending dozens of requests per instance, the endpoint might be getting throttled. You can adjust the number of calls your endpoint can accept as follows:
go to manage.windowsazure.com/
open the Azure ML section in the left bar
select your workspace
go to the web services tab
select your web service from the list
adjust the number of calls with the slider
By enabling debugging on your endpoint, you can get logs of the execution time of each of your modules. You can use this to determine whether a module is not running as you intended, which may add to the time.
Overall, there is some overhead when using the Execute Python Script module, but I'd expect this request to complete in under 3 seconds.
I use Google App Engine (Python) as the backend of a mobile game, which includes social network integration (Twitter) and global & relative leaderboards. My application makes use of two task queues: one for building out the relationships between players, and one for updating those objects when a player's score changes.
Model
    from google.appengine.ext import ndb

    class RelativeUserScore(ndb.Model):
        ID_FORMAT = "%s:%s"  # "friend_id:follower_id"

        #--- NDB Properties
        follower_id = ndb.StringProperty(indexed=True)         # the follower
        user_id = ndb.StringProperty(indexed=True)             # the followed (AKA friend)
        points = ndb.IntegerProperty(indexed=True)             # user data denormalization
        screen_name = ndb.StringProperty(indexed=False)        # user data denormalization
        profile_image_url = ndb.StringProperty(indexed=False)  # user data denormalization
This allows me to build the relative leaderboards by querying for objects where the requesting user is the follower.
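For example, a relative leaderboard page for the requesting user is roughly this query (the limit is an arbitrary placeholder):

    def get_relative_leaderboard(follower_id, limit=50):
        # Top-scoring friends of the requesting user, best first.
        # Requires a composite index on (follower_id, -points) in index.yaml.
        return (RelativeUserScore
                .query(RelativeUserScore.follower_id == follower_id)
                .order(-RelativeUserScore.points)
                .fetch(limit))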
Push Task Queues
I basically have two major tasks to be performed:
sync-twitter tasks fetch the friends/followers from Twitter's API and build out the relative user score models. Friends are checked on user sign-up, and again if the user's Twitter friend count changes. Followers are only checked on user sign-up. This runs in its own module with F4 instances and min_idle_instances set to 20 (I'd like to reduce both settings if possible, though the instance memory usage requires at least F2 instances).
    - name: sync-twitter
      target: sync-twitter           # target version / module
      bucket_size: 100               # default is 5, max is 100?
      max_concurrent_requests: 1000  # default is 1000. what is the max?
      rate: 100/s                    # default is 5/s. what is the max?
      retry_parameters:
        min_backoff_seconds: 30
        max_backoff_seconds: 900
update-leaderboard tasks will update all the user's objects after they play a game (which only takes about 2 minutes to do). This runs in its own module with F2 instances, and min_idle_instances set to 10 (I'd like to reduce both settings if possible).
    - name: update-leaderboard
      target: update-leaderboard     # target version / module
      bucket_size: 100               # default is 5, max is 100?
      max_concurrent_requests: 1000  # default is 1000. what is the max?
      rate: 100/s                    # default is 5/s. what is the max?
I've already optimized these tasks to run asynchronously, which reduced their run time significantly; most of the time a task takes between 0.5 and 5 seconds. I've also put each task queue in its own dedicated module with automatic scaling turned up pretty high (using F4 and F2 instance classes respectively). However, I'm still running into a few issues.
As you can also see, I've tried to max out bucket_size and max_concurrent_requests so that these tasks run as fast as possible.
Problems
Every once in a while I get a DeadlineExceededError on the request handler that initiates the call: "The API call taskqueue.BulkAdd() took too long to respond and was cancelled."
Every once in a while I get a chunk of similar errors within the tasks themselves (for both task types): "Process terminated because the request deadline was exceeded during a loading request". (Note that this isn't listed as a DeadlineExceededError.) The logs show these tasks took up the entire 600 seconds allowed. They end up getting rescheduled, and when they re-run, they take only the expected 0.5 to 5 seconds. I've tried using AppStats to gain more insight into what's going on, but these calls never get recorded, as they are killed before AppStats is able to save.
With users updating their score as frequently as every two minutes, the update-leaderboard queue starts to fall behind somewhere around 10K CCU. I'd ideally like to be prepared for at least 100K CCU. (By CCU I mean actual users playing our game, not the number of concurrent requests, which is only about 500 front-end API requests/second at 25K users; I use locust.io to load test.)
Potential Optimizations / Questions
My first thought is that the first two issues may stem from having only a single task queue for each of the task types. Maybe this is happening because the underlying Bigtable is splitting during these calls? (See this article, specifically "Queue Sharding for Stable Performance".)
So maybe I should shard each queue into 10 different queues; I'd think problem #3 would also benefit from this queue sharding. So...
1. Any idea as to the underlying causes of problems #1 and #2? Would sharding actually help eliminate these errors?
2. If I do queue sharding, could I keep all the queues on their same respective module, and rely on its autoscaling to meet the demand? Or would I be better off with module per shard?
3. Any way to dynamically scale the sharding?
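To illustrate question 2, the sharded enqueue I have in mind just fans tasks out over N statically declared queues; NUM_SHARDS and the queue/handler names below are placeholders, with matching queues declared in queue.yaml:

    import zlib
    from google.appengine.api import taskqueue

    NUM_SHARDS = 10  # placeholder; update-leaderboard-0..9 declared in queue.yaml

    def enqueue_leaderboard_update(user_id):
        # Deterministically spread users across N queues so one hot queue's
        # underlying Bigtable splits don't stall every update.
        shard = zlib.crc32(user_id) % NUM_SHARDS
        taskqueue.add(queue_name="update-leaderboard-%d" % shard,
                      url="/tasks/update-leaderboard",  # placeholder handler
                      params={"user_id": user_id})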
My next thought is to try to reduce the calls to update-leaderboard tasks: something where not every completed game translates directly into a leaderboard update, yet if the user plays only one game, the objects are still guaranteed to be updated eventually. Any suggestions on implementing this reduction? One pattern I'm considering is sketched below.
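The sketch relies on task-name deduplication: name the task after the user plus a coarse time window, so repeated games within the window collapse into a single queued update, while a single game still schedules exactly one. The window length is an arbitrary placeholder:

    import time
    from google.appengine.api import taskqueue

    WINDOW_SECONDS = 120  # coalesce all games a user finishes within 2 minutes

    def schedule_leaderboard_update(user_id):
        window = int(time.time()) // WINDOW_SECONDS
        try:
            # Named tasks are deduplicated: a second add within the same
            # window raises instead of enqueueing a duplicate update.
            taskqueue.add(name="update-%s-%d" % (user_id, window),
                          queue_name="update-leaderboard",
                          url="/tasks/update-leaderboard",
                          params={"user_id": user_id},
                          countdown=WINDOW_SECONDS)  # run after the window closes
        except (taskqueue.TaskAlreadyExistsError,
                taskqueue.TombstonedTaskError):
            pass  # an update is already scheduled for this window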
Finally, all of the modules' auto-scaling parameters and the queues' parameters were set somewhat arbitrarily, trying to err on the side of maxing them out. Any advice on setting these appropriately so that I'm not spending more resources than I need?
We have recently launched a Django site which, amongst other things, has a screen presenting all sorts of data. A request is sent to the server every 10 seconds to get new data. The average response size is 10kb.
The site currently serves approximately 30 clients, meaning every client sends a GET request every 10 seconds (only about 3 requests per second in aggregate).
When testing locally, responses came back after 80ms. After deployment, with ~30 users, responses take up to 20 seconds!!
So the initial thought is that my code sucks. I went through all my queries and did everything I could to optimize them and reduce calls to the database (which was hard; nearly everything is something like object.filter(id=num), and my tables have fewer than 5k rows at the moment...).
But then I noticed the same issue occurs in the admin panel, which is clearly optimized and doesn't contain my perhaps-inefficient code, since I didn't write it. Opening the users tab takes 30 seconds on certain requests!!
So, what is it? Do I argue with the company sysadmins and demand a better server? They say we don't need better hardware (we're running on a dual-core 2.67GHz machine with 4GB RAM, which isn't a lot, but still shouldn't be THAT slow).
Doesn't the fact that the admin site is also slow imply that this is a hardware issue?