App Engine Deferred: Tracking Down Memory Leaks

App Engine Deferred: Tracking Down Memory Leaks - python

We have an App Engine application that writes many files of a relatively large size to Google Cloud Store. These files are CSVs that are dynamically created, so we use Python's StringIO.StringIO as a buffer and csv.writer as the interface for writing to that buffer.
In general, our process looks like this:
# imports as needed
# (gcs is the Google Cloud Store client)
buffer = StringIO.StringIO()
writer = csv.writer(buffer)
# ...
# write some rows
# ...
data = file_buffer.getdata()
filename = 'someFilename.csv'
try:
with gcs.open(filename, content_type='text/csv', mode='w') as file_stream:
file_stream.write(data)
file_stream.close()
except Exception, e:
# handle exception
finally:
file_buffer.close()
As we understand it, the csv.writer does not need to be closed itself. Rather, only the buffer above and the file_stream need be closed.
We run the above process in a deferred, invoked by App Engine's task queue. Ultimately, we get the following error after a few invocations of our task:
Exceeded soft private memory limit of 128 MB with 142 MB after servicing 11 requests total
Clearly, then, there is a memory leak in our application. However, if the above code is correct (which we admit may not be the case), then our only other idea is that some large amount of memory is being held through the servicing of our requests (as the error message suggests).
Thus, we are wondering if some entities are kept by App Engine during the execution of a deferred. We should also note that our CSVs are ultimately written successfully, despite these error messages.

The symptom described isn't necessarily an indication of an application memory leak. Potential alternate explanations include:
the app's baseline memory footprint (which for the scripting-language sandboxes like python can be bigger than the footprint at the instance startup time, see Memory usage differs greatly (and strangely) between frontend and backend) may be too high for the instance class configured for the app/module. To fix - chose a higher memory instance class (which, as a side effect, also means a faster class instance). Alternatively, if the rate of instance killing due to exceeding memory limits is tolerable, just let GAE recycle the instances :)
peaks of activity, especially if multi-threaded request handling is enabled, means higher memory consumption and also potential overloading of the memory garbage collector. Limiting the number of requests performed in parallel, adding (higher) delays in lower priority deferred task processing and other similar measures reducing the average request processing rate per instance can help give the garbage collector a chance to cleanup leftovers from requests. Scalability should not be harmed (with dynamic scaling) as other instances would be started to help with the activity peak.
Related Q&As:
How does app engine (python) manage memory across requests (Exceeded soft private memory limit)
Google App Engine DB Query Memory Usage
Memory leak in Google ndb library

Related

How to reduce memory consumption (RAM) on Python/Django project?

My memory usage on a Django DRF API project increases over time and RAM is getting filled once I reach 50+ API calls.
So far I tried
loaded all models, class variable upfront
used memory profiler, cleaned code as possible to reduce variable usage
added garbage collection : gc.disable() at beginning and gc.enable() at end of code
added ctypes malloc.trim() at end of code etc
setting gunicorn max-requests limit ( this results in more model loading / response time at that moment)
Any suggestions on how to free up memory at the end of each request ?

Due to the way that the CPython interpreter manages memory, it very rarely actually frees any allocated memory. Generally CPython processes will keep growing and growing in memory usage
Since you are using Gunicorn you can set the max_requests setting which will regularly restart your workers and alleviate some "memory leak" issues

Is there a way to limit Ray object storage max memory usage

I'm trying to capitalize on Ray's parallelization model to process a file record by record. The code works beautifully, but the object storage grows fast and ends up crashing. I'm avoiding using ray.get(function.remote()) because it kills the performance because the task is composed of a few million subtasks and the overhead of waiting for a task to finish. Is there a way to set a global limit to the object storage?
#code which constantly backpressusre the obejct storage, freeing space, but causes performance to be worse than serial execution
for record in infile:
ray.get(createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict),dinucList,tetranucList, filename))
#code that maximizes throughput but makes the object storage grow constantly
for record in infile:
createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict),dinucList,tetranucList, filename)
#the called function returns either 0 or 1.

You can do ray.init(object_store_memory=10**9) to limit the object store to use 1 GB of your system RAM (as opposed to all of it by default).
object_store_memory – The amount of memory (in bytes) to start the
object store with. By default, this is automatically set based on
available system memory.
(see docs on ray.init())
There is more information in the documentation on memory management at https://docs.ray.io/en/releases-1.11.0/ray-core/memory-management.html.

Why Does the Google Sheet API and Google Drive API consume so much memory upon requests [duplicate]

A user of my application attempted to send a file as an email attachment using my application. However, doing so raised the following exception which I'm having trouble deciphering
Exceeded soft private memory limit with 192.023 MB after servicing
2762 requests total
While handling this request, the process that handled this request was
found to be using too much memory and was terminated. This is likely to
cause a new process to be used for the next request to your application.
If you see this message frequently, you may have a memory leak in
your application.
What is the "soft private memory limit" and what was likely to bring about this exception?

The "soft private memory limit" is the memory limit at which App Engine will stop an instance from receiving any more requests, wait for any outstanding requests, and terminate the instance. Think of it as a graceful shutdown when you're using too much memory.
Hitting the soft limit once in a while is ok since all your requests finish as they should. However, every time this happens, your next request may start up a new instance which may have a latency impact.

I assume you are using the lowest-class frontend or backend instance. (F1 or B1 class) Both have 128 MB memory quota, so your app most likely went over this quota limit. However, this quota appears to be not strictly enforced and Google have some leniency in this (thus the term soft limit), I had several F1 app instances consuming ~200MB of memory for minutes before being terminated by the App Engine.
Try increasing your instance class to the next higher-level class (F2 or B2) that have 256MB memory quota and see if the error re-occurs. Also, do some investigation whether the error is reproducible every time you send e-mail with attachments. Because it's possible that what you are seeing is the symptom but not the cause, and the part of your app that consumes lots of memory lies somewhere else.

App Engine instance memory constantly increasing

I'd expect the memory usage of my app engine instances (Python) to be relatively flat after an initial startup period. Each request to my app is short lived, and it seems all memory usage of single request should be released shortly afterwards.
This is not the case in practice however. Below is a snapshot of instance memory usage provided by the console. My app has relatively low traffic so I generally have only one instance running. Over the two-day period in the graph, the memory usage trend is constantly increasing. (The two blips are where two instances were briefly running.)
I regularly get memory exceeded errors so I'd like to prevent this continuous increase of memory usage.
At the time of the snapshot:
Memcache is using less than 1MB
Task queues are empty
Traffic is low (0.2 count/second)
I'd expect the instance memory usage to fall in these circumstances, but it isn't happening.
Because I'm using Python with its automatic garbage collection, I don't see how I could have caused this.
Is this expected app engine behavior and is there anything I can do to fix it?

I found another answer that explains part of what is going on here. I'll give a summary based on that answer:
When using NDB, entities are stored in a context cache, and the context cache is part of your memory usage.
From the documentation, one would expect that memory to be released upon the completion of an HTTP request.
In practice, the memory is not released upon the completion of the HTTP request. Apparently, context caches are reused, and the cache is cleared before its next use, which can take a long time to happen.
For my situation, I am adding _use_cache=False to most entities to prevent them from being stored in the context cache. Because of the way my app works, I don't need the context caches for these entities, and this reduces memory usage.
The above is only a partial solution however!
Even with caching turned off for most of my entities, my memory usage is still constantly increasing! Below is snapshot over a 2.5 day period where the memory continuously increases from 36 MB to 81 MB. This is over the 4th of July weekend with low traffic.

What is the "soft private memory limit" in GAE?

A user of my application attempted to send a file as an email attachment using my application. However, doing so raised the following exception which I'm having trouble deciphering
Exceeded soft private memory limit with 192.023 MB after servicing
2762 requests total
While handling this request, the process that handled this request was
found to be using too much memory and was terminated. This is likely to
cause a new process to be used for the next request to your application.
If you see this message frequently, you may have a memory leak in
your application.
What is the "soft private memory limit" and what was likely to bring about this exception?

The "soft private memory limit" is the memory limit at which App Engine will stop an instance from receiving any more requests, wait for any outstanding requests, and terminate the instance. Think of it as a graceful shutdown when you're using too much memory.
Hitting the soft limit once in a while is ok since all your requests finish as they should. However, every time this happens, your next request may start up a new instance which may have a latency impact.

I assume you are using the lowest-class frontend or backend instance. (F1 or B1 class) Both have 128 MB memory quota, so your app most likely went over this quota limit. However, this quota appears to be not strictly enforced and Google have some leniency in this (thus the term soft limit), I had several F1 app instances consuming ~200MB of memory for minutes before being terminated by the App Engine.
Try increasing your instance class to the next higher-level class (F2 or B2) that have 256MB memory quota and see if the error re-occurs. Also, do some investigation whether the error is reproducible every time you send e-mail with attachments. Because it's possible that what you are seeing is the symptom but not the cause, and the part of your app that consumes lots of memory lies somewhere else.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

App Engine Deferred: Tracking Down Memory Leaks - python

Related

How to reduce memory consumption (RAM) on Python/Django project?

Is there a way to limit Ray object storage max memory usage

Why Does the Google Sheet API and Google Drive API consume so much memory upon requests [duplicate]

App Engine instance memory constantly increasing

What is the "soft private memory limit" in GAE?

Categories

Resources