GAE: Does execution continue after hitting "Exceeded soft private memory limit"? - python

One of my GAE task-queue requests exceeded the soft memory limit (log below). My understanding of the soft memory limit is that it lets the request complete and then after it finishes, it shuts down the instance.
However, from the logs it looks like execution stops when I hit the soft memory limit: I see no further log output after the memory-limit message, and from inspecting my state it does not look like the request completed. I'm not sure if it matters, but this request was executing within a deferred-library task queue.
So, if a task hits the soft private memory limit, does execution continue until the request completes, or does it halt immediately? Or is it possible that execution continues and only the logging is no longer recorded?
Log:
2012-04-11 23:45:13.203
Exceeded soft private memory limit with 145.848 MB after servicing 3 requests total
W 2012-04-11 23:45:13.203
After handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application.

What happens here is that, at the end of the request, the handler checks the memory status; if it is above the limit, it logs a warning and shuts the instance down.
Since the task has completed successfully (you can see it terminated with status 200), it will not be retried.
If, during handler execution, memory usage goes far above the limit, the handler shuts down the instance and returns a 500 error; in that case the task will be retried.

From my experience: if your instance hits the soft memory limit, your request will still be finished, but the response status will be 500.
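If you want to sidestep the ambiguity altogether, one option is to checkpoint the work yourself before the soft limit is reached. The sketch below is an illustration, not an established pattern from the answers above; the threshold, `handle()`, and `process_items()` are hypothetical, while `runtime.memory_usage()` is the same API the logs in these questions use.

```python
SOFT_LIMIT_MB = 128.0  # F1/B1 instance class; adjust for yours


def should_checkpoint(current_mb, limit_mb=SOFT_LIMIT_MB, headroom=0.8):
    """True once memory use crosses a safety fraction of the soft limit."""
    return current_mb > limit_mb * headroom

# Inside a deferred task you would combine this with the runtime API,
# re-queueing the remaining work and returning 200 before the limit hits:
#
#   from google.appengine.api import runtime
#   from google.appengine.ext import deferred
#
#   def process_items(items):
#       for i, item in enumerate(items):
#           handle(item)  # hypothetical per-item work
#           if should_checkpoint(runtime.memory_usage().current()):
#               deferred.defer(process_items, items[i + 1:])
#               return  # finish cleanly so this task is not retried
```

This way the task always ends with a 200 and the question of "does execution continue?" never arises, because the instance is recycled between cleanly finished requests.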

Related

Memory leak in simple Google App Engine example

I seem to have a memory leak in my Google App Engine app but I cannot figure out why.
After narrowing down the lines of code responsible, I have reduced the problem to a simple cron job that runs regularly, and all it does is load some entities using a query.
I include the memory usage using logging.info(runtime.memory_usage()) and I can see that the memory usage increases from one call to the next until it exceeds the soft private memory limit.
Below is the code I have used:
class User(ndb.Model):
    _use_cache = False
    _use_memcache = False
    name = ndb.StringProperty(required=True)
    ...[OTHER_PROPERTIES]...

class TestCron(webapp2.RequestHandler):
    def get(self):
        is_cron = self.request.headers.get('X-AppEngine-Cron') == 'true'
        if is_cron:
            logging.info("Memory before keys:")
            logging.info(runtime.memory_usage())
            keys = models.User.query().fetch(1000, keys_only=True)
            logging.info("Memory before get_multi:")
            logging.info(runtime.memory_usage())
            user_list = ndb.get_multi(keys)
            logging.info("Memory after:")
            logging.info(runtime.memory_usage())
            logging.info(len(user_list))

app = webapp2.WSGIApplication([
    ('/test_cron', TestCron)
], debug=True)
And in cron.yaml I have:
- description: Test cron
  url: /test_cron
  schedule: every 1 mins from 00:00 to 23:00
When running this task every minute, a new instance has to be started every two iterations. The first time it starts with 36 MB, and on completion the log says:
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
But on the second execution it starts with 107 MB of memory already in use (meaning it didn't clear the memory from the previous iteration?), and it exceeds the soft private memory limit and terminates the process, saying:
Exceeded soft private memory limit of 128 MB with 134 MB after servicing 6 requests total
After handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application.
The full output just alternates between these two logs. Note that I have disabled the cache in the model definition, so shouldn't the memory usage be reset at every function call?
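For what it's worth, disabling the ndb caches does not bound the peak memory of the request itself: get_multi on 1000 keys still materializes all 1000 entities at once, and CPython (especially 2.x) often keeps freed heap memory for reuse rather than returning it to the OS, so successive runs can appear to accumulate. A sketch of processing in smaller chunks instead (the chunk size of 100 is illustrative, not a recommendation from the question):

```python
def batches(seq, size):
    """Yield successive slices of seq with at most `size` items each."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# In the cron handler, fetch keys once but materialize entities in
# small chunks, dropping references between chunks:
#
#   keys = models.User.query().fetch(1000, keys_only=True)
#   for chunk in batches(keys, 100):
#       users = ndb.get_multi(chunk)
#       ...  # do the actual work on this chunk
#       del users
```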

Why Does the Google Sheet API and Google Drive API consume so much memory upon requests [duplicate]

A user of my application attempted to send a file as an email attachment using my application. However, doing so raised the following exception which I'm having trouble deciphering
Exceeded soft private memory limit with 192.023 MB after servicing 2762 requests total
While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application.
What is the "soft private memory limit" and what was likely to bring about this exception?
The "soft private memory limit" is the memory limit at which App Engine will stop an instance from receiving any more requests, wait for any outstanding requests, and terminate the instance. Think of it as a graceful shutdown when you're using too much memory.
Hitting the soft limit once in a while is ok since all your requests finish as they should. However, every time this happens, your next request may start up a new instance which may have a latency impact.
I assume you are using the lowest-class frontend or backend instance (F1 or B1). Both have a 128 MB memory quota, so your app most likely went over that limit. However, this quota does not appear to be strictly enforced and Google allows some leniency (hence the term soft limit); I have had several F1 instances consume ~200 MB of memory for minutes before App Engine terminated them.
Try increasing your instance class to the next higher level (F2 or B2), which has a 256 MB memory quota, and see if the error recurs. Also investigate whether the error is reproducible every time you send an e-mail with an attachment, because what you are seeing may be the symptom rather than the cause, and the part of your app that consumes lots of memory may lie somewhere else.
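For reference, the instance class for a frontend module is set in its configuration file; a minimal sketch (the rest of the app.yaml is omitted):

```yaml
instance_class: F2   # 256 MB memory quota, vs 128 MB for the default F1
```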

Hanging on last Backend (or Module) memcache writes until /_ah/stop

I have a batch processing Backend (B4) on python27 runtime with threading enabled. It does a bunch of unpickle/pickle and Numpy/array stuff.
Recently I noticed that I was getting much higher backend charges, which would hit quota almost every time. I migrated to Modules (also B4), thinking that might solve it since I saw the "backends are being removed" notice. However, I still see the same issue.
What seems to happen is that the code hangs on the last (always the last) memcache write until my quota has been drained. The moment the /_ah/stop call is made (because of quota), the backend wakes up again, resumes its processing, then exits because of the shutdown request.
Here are all the relevant logs:
2013-08-05 15:23:33.962 /BatchRankings 500 19413478ms 0kb instance=0 AppEngine-Google; (+http://code.google.com/appengine)
I 2013-08-05 10:00:04.118
mem usage at start of meleerumble: 24.55078125MB
... lots more logs ...
I 2013-08-05 10:01:03.550
split bots into 18 sections
I 2013-08-05 15:23:03.086
wrote 564 bots to memcache
E 2013-08-05 15:23:33.962
Process terminated because the backend took too long to shutdown.
Look at the timestamps between splitting and writing to memcache: over 5 hours, when this should take a few seconds (and does on all of the other iterations of this code).
In addition, in my logs just below the actual request handler, I see this:
2013-08-05 15:23:02.938 /_ah/stop 200 5ms 0kb instance=0
So, from what I can tell, it looks like the backend hangs inside of the memcache writing, and the /_ah/stop wakes it up when I hit my quota.
Here is the relevant code between those two logging points:
client = memcache.Client()
if len(botsdict) > 0:
    splitlist = dict_split(botsdict, 32)
    logging.info("split bots into " + str(len(splitlist)) + " sections")
    for d in splitlist:
        rpcList.append(client.set_multi_async(d))
    logging.info("wrote " + str(len(botsdict)) + " bots to memcache")
I don't see how 18 set_multi_async calls can take 5h23m. Can the logs be trusted here? Could it be that the actual code is finished but somehow the exit never registered and the logging was the problem? I'm having to disable my backend processing because of this, since it just eats as much quota as I throw at it.
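One thing worth ruling out (an assumption about the cause, not a confirmed diagnosis): set_multi_async returns RPC objects, and the snippet appends them to rpcList without ever collecting their results, so nothing waits on or checks those writes. A sketch, with a plausible dict_split implementation since the original one is not shown (splitting into chunks of 32 keys matches the "564 bots, 18 sections" in the logs):

```python
def dict_split(d, chunk_size):
    """Split a dict into a list of smaller dicts of at most chunk_size items."""
    items = list(d.items())
    return [dict(items[i:i + chunk_size])
            for i in range(0, len(items), chunk_size)]

# After queueing the async writes, explicitly collect their results so
# failures surface and the writes are known to have finished:
#
#   rpcs = [client.set_multi_async(d) for d in dict_split(botsdict, 32)]
#   for rpc in rpcs:
#       rpc.get_result()  # blocks until this batch of writes completes
```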
Any help regarding what on earth is happening here would be much appreciated.


google appengine/python: Can I depend on Task queue retry on failure to keep insertions to a minimum?

Say my app has a page on which people can add comments, and after each comment is added a task queue worker is added. So if 100 comments are added, 100 task queue insertions are made. (Note: the above is a hypothetical example to illustrate my question.)
Say I wanted to ensure that the number of insertions is kept to a minimum (so I don't run into the 10k insertion limit). Could I do something as follows?
a) As each comment is added, call taskqueue.add(name="stickytask", url="/blah"). Since this is a named task, it will not be re-inserted while a task of the same name exists.
b) The /blah worker URL reads the newly added comments, processes the first one, and then, if more comments remain to be processed, returns a status code other than 200. This ensures that the task is retried, and on the next try it processes the next comment, and so on.
So all 100 comments are processed with one or a few task queue insertions. (Note: if there is a lull in activity where no new comments are added and all comments have been processed, then the next added comment will result in a new task queue insertion.)
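As a sketch of step (a): a named task is rejected while a task with that name exists, and the name remains tombstoned for some time after the task completes, so a common pattern is to put a time bucket in the name. The prefix and bucket size here are illustrative, not taken from the question:

```python
def task_name(prefix, now_seconds, bucket_seconds=60):
    """One well-known task name per time bucket, e.g. 'stickytask-12345'."""
    return "%s-%d" % (int(now_seconds) // bucket_seconds and 0 or 0, 0) if False else "%s-%d" % (prefix, int(now_seconds) // bucket_seconds)

# Usage on App Engine (add() raises if the name is taken or tombstoned):
#
#   import time
#   from google.appengine.api import taskqueue
#   try:
#       taskqueue.add(name=task_name("stickytask", time.time()),
#                     url="/blah")
#   except (taskqueue.TaskAlreadyExistsError,
#           taskqueue.TombstonedTaskError):
#       pass  # a worker for this bucket is already scheduled
```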
However, the docs (see snippet below) note that "the system will back off gradually". Does this mean that each non-200 HTTP status code returned inserts a delay before the next retry?
From the docs:
If the execution of a particular Task fails (by returning any HTTP status code other than 200 OK), App Engine will attempt to retry until it succeeds. The system will back off gradually so as not to flood your application with too many requests, but it will retry failed tasks at least once a day at minimum.
There's no reason to fake a failure (and incur backoff, etc.): that's a hacky and fragile arrangement. If you fear that simply scheduling a task per new comment might exceed the task queues' current strict limits, then "batch up" as-yet-unprocessed comments in the datastore (and possibly also in memcache for a potential speedup, but that's optional) and don't schedule any task at that time.
Rather, keep a cron job executing (say) every minute, which may deal with some comments or schedule an appropriate number of tasks to deal with pending comments -- as you schedule tasks from just one cron job, it's easy to ensure you're never scheduling over 10,000 per day.
Don't let task queues make you forget that cron is also there: a good architecture for "batch-like" processing will generally use both cron jobs and queued tasks to simplify its overall design.
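A minimal cron.yaml entry for such a sweeper (the URL and schedule are illustrative):

```yaml
cron:
- description: process pending comments
  url: /tasks/process_comments
  schedule: every 1 minutes
```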
To maximize the amount of useful work accomplished in a single request (whether from a queued task or a cron job), consider an approach based on monitoring your CPU usage: when CPU is the factor limiting the work you can perform per request, this helps you pack as many small schedulable units of work into a single request as is prudently feasible. I think this approach is more solid than waiting for an OverQuotaError, catching it, and rapidly closing up, since that may have other consequences outside your app's control.
