I'd expect the memory usage of my App Engine instances (Python) to be relatively flat after an initial startup period. Each request to my app is short-lived, and it seems the memory used by a single request should be released shortly afterwards.
This is not the case in practice however. Below is a snapshot of instance memory usage provided by the console. My app has relatively low traffic so I generally have only one instance running. Over the two-day period in the graph, the memory usage trend is constantly increasing. (The two blips are where two instances were briefly running.)
I regularly get memory exceeded errors so I'd like to prevent this continuous increase of memory usage.
At the time of the snapshot:
Memcache is using less than 1MB
Task queues are empty
Traffic is low (0.2 count/second)
I'd expect the instance memory usage to fall in these circumstances, but it isn't happening.
Because I'm using Python with its automatic garbage collection, I don't see how I could have caused this.
Is this expected app engine behavior and is there anything I can do to fix it?
I found another answer that explains part of what is going on here. I'll give a summary based on that answer:
When using NDB, entities are stored in a context cache, and the context cache is part of your memory usage.
From the documentation, one would expect that memory to be released upon the completion of an HTTP request.
In practice, the memory is not released upon the completion of the HTTP request. Apparently, context caches are reused, and the cache is cleared before its next use, which can take a long time to happen.
For my situation, I am adding _use_cache=False to most entities to prevent them from being stored in the context cache. Because of the way my app works, I don't need the context caches for these entities, and this reduces memory usage.
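For illustration, turning off the context cache with ndb can be done per model kind, per call, or globally via the cache policy. A minimal sketch (the model name and property here are made up, not from my app):

from google.appengine.ext import ndb

class LogEntry(ndb.Model):              # hypothetical model
    message = ndb.StringProperty()

    # Skip the in-process (context) cache and memcache for this kind,
    # so these entities don't pile up in the request's context cache.
    _use_cache = False
    _use_memcache = False

# The same thing can be done per call:
entry = ndb.Key(LogEntry, 1234).get(use_cache=False)

# ...or globally for the current context via the cache policy:
ndb.get_context().set_cache_policy(lambda key: False)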
The above is only a partial solution, however!
Even with caching turned off for most of my entities, my memory usage is still constantly increasing! Below is a snapshot over a 2.5-day period in which memory continuously increases from 36 MB to 81 MB. This is over the 4th of July weekend with low traffic.
Related
Memory usage on my Django DRF API project increases over time, and RAM fills up once I reach 50+ API calls.
So far I have tried:
loading all models and class variables upfront
using a memory profiler and cleaning up the code as much as possible to reduce variable usage
adding garbage collection: gc.disable() at the beginning and gc.enable() at the end of the code
adding a ctypes call to libc's malloc_trim() at the end of the code, etc. (sketched below)
setting Gunicorn's max-requests limit (this results in more model loading / longer response time at that moment)
Any suggestions on how to free up memory at the end of each request?
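For reference, the ctypes malloc_trim attempt above looks roughly like this as Django middleware (a sketch, assuming glibc on Linux; the middleware name is made up):

import ctypes
import gc

_libc = ctypes.CDLL("libc.so.6")  # glibc only; malloc_trim doesn't exist elsewhere

def memory_trim_middleware(get_response):
    """Try to hand freed memory back to the OS after each response.
    This mostly helps with allocator fragmentation; it cannot release
    memory that live Python objects still reference."""
    def middleware(request):
        response = get_response(request)
        gc.collect()            # collect reference cycles first
        _libc.malloc_trim(0)    # ask glibc to return free heap pages to the OS
        return response
    return middleware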
Due to the way the CPython interpreter manages memory, it very rarely actually frees any allocated memory; generally, CPython processes will just keep growing in memory usage.
Since you are using Gunicorn, you can set the max_requests setting, which will regularly restart your workers and alleviate some "memory leak" issues.
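If you go that route, the settings can live in a gunicorn.conf.py, for example (a sketch; tune the numbers to your traffic and model-loading cost):

# gunicorn.conf.py
# Recycle each worker after it has served roughly this many requests, so any
# slowly accumulating memory is released when the worker process exits.
max_requests = 500

# Add jitter so that all workers don't restart at the same moment.
max_requests_jitter = 50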
Some observations on Heroku that don't completely mesh with my mental model.
My understanding is that CPython will never release memory once it has been allocated by the OS. So we should never observe a decrease in resident memory of CPython processes. And this is in fact my observation from occasionally profiling my Django application on Heroku; sometimes the resident memory will increase, but it will never decrease.
However, sometimes Heroku will alert me that my worker dyno is using >100% of its memory quota. This generally happens when a long-running response-data-heavy HTTPS request that I make to an external service (using the requests library) fails due to a server-side timeout. In this case, memory usage will spike way past 100%, then gradually drop back to less than 100% of quota, when the alarm ceases.
My question is, how is this memory released back to the OS? AFAIK it can't be CPython releasing it. My guess is that the incoming bytes from the long-running TCP connection are being buffered by the OS, which has the power to de-allocate. It's murky to me when exactly "ownership" of TCP bytes is transferred to my Django app. I'm certainly not explicitly reading lines from the input stream, I delegate all of that to requests.
Apparently, at one time, CPython did NOT ever release memory back to the OS. Then a patch was introduced in Python 2.5 that allowed memory to be released under certain circumstances, detailed here. Therefore it's no longer true to say that python doesn't release memory; it's just that python doesn't often release memory, because it doesn't handle memory fragmentation very well.
At a high level, Python keeps track of its memory in 256K blocks called arenas. Object pools are held in these arenas. Python is smart enough to free arenas back to the OS when they're empty, but it still doesn't handle fragmentation across arenas very well.
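You can also watch this bookkeeping directly on reasonably recent CPython builds (it exists in 3.3+ and in late 2.7 point releases; check your interpreter, hence the hasattr guard):

import sys

# Dump pymalloc's internal bookkeeping (arenas, pools, blocks) to stderr.
# Comparing a dump taken before and after a large allocation/free shows how
# many arenas are actually in use versus merely sitting around fragmented.
if hasattr(sys, '_debugmallocstats'):
    sys._debugmallocstats()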
In my particular circumstance, I was reading large HTTP responses. If you dig down the code chain starting with HttpAdapter.send() in the requests library, you'll eventually find that socket.read() in the python socket library is making a system call to receive from its socket in chunks of 8192 bytes (default buffer size). This is the point at which the OS copies bytes from the kernel to the process, where they will be designated by CPython as string objects of size 8K and shoved into an arena. Note that StringIO, which is the python-land buffer object for sockets, simply keeps a list of these 8K strings rather than mushing them together into a super-string object.
Since 8K fits precisely 32 times into 256K, I think what is happening is that the received bytes are nicely filling up entire arenas without much fragmentation. These arenas can then be freed to the OS when the 8K strings filling them are deleted.
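As an aside, if the full body is never needed in memory at once, requests can hand over those chunks as they arrive instead of accumulating them in a buffer. A sketch (the URL and the per-chunk handler are placeholders):

import requests

# Stream the response instead of buffering it all in memory. Each
# iter_content() chunk corresponds roughly to the 8K socket reads above
# and can be processed and dropped immediately.
resp = requests.get("https://example.com/big-export.csv", stream=True, timeout=300)
for chunk in resp.iter_content(chunk_size=8192):
    process(chunk)  # hypothetical per-chunk handler
resp.close()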
I think I understand why the memory is released gradually (asynchronous garbage collection?), but I still don't understand why it takes so long to release after a connection error. If the memory release always took this long, I should be seeing these memory usage errors all the time, because my Python memory usage should spike whenever one of these calls is made. I've checked my logs, and I can sometimes see these violations last for minutes. That seems like an insanely long interval for memory release.
Edit: I have a solid theory on the issue now. This error is being reported to me by a logging system that keeps a reference to the last traceback. The traceback maintains a reference to all the variables in the frames of the traceback, including the StringIO buffer, which in turn holds references to all the 8K-strings read from the socket. See the note under sys.exc_clear(): "This function is only needed in only a few obscure situations. These include logging and error handling systems that report information on the last or current exception."
Therefore, in exception cases, the 8K-string ref counts don't drop to zero and immediately empty their arenas as they would in the happy path; we have to wait for background garbage collection to detect their reference cycles.
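A tiny illustration of that failure mode (a stand-in, not my actual logging system): as long as something holds the exc_info tuple, every frame local reachable from the traceback, including the big buffer, stays alive.

import sys

last_exc_info = None  # stand-in for a logging system's "last error" slot

def fetch():
    buf = 'x' * (8 * 1024 * 1024)  # stand-in for buffered response data
    raise IOError('server-side timeout')

try:
    fetch()
except IOError:
    # The traceback references fetch()'s frame, whose locals include buf,
    # so buf cannot be reclaimed while last_exc_info is alive.
    last_exc_info = sys.exc_info()

# Dropping the reference (or calling sys.exc_clear() on Python 2) makes
# the buffer collectable again.
last_exc_info = None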
The GC delay is compounded by the fact that when this exception occurs, lots of objects are allocated over 5 minutes until timeout, which I'm guessing is plenty of time for lots of the 8K-strings to make it into the 2nd generation. With default GC thresholds of (700, 10, 10), it would take roughly 700*10 allocations for string objects to make it into the 2nd generation. That comes out to 7000*8192 ~= 57MB, which means that all the strings received before the last 57MB of the bytestream make it into the 2nd gen, maybe even 3rd gen if 570MB is streamed (but that seems high).
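Spelling the same estimate out against the interpreter's live thresholds (just a sanity check of the arithmetic):

import gc

gen0, gen1, gen2 = gc.get_threshold()    # (700, 10, 10) by default in CPython

# Roughly gen0 * gen1 surviving allocations promote objects into
# generation 2; each buffered socket read here is an 8 KiB string.
allocs_to_gen2 = gen0 * gen1             # 700 * 10 = 7000
approx_bytes = allocs_to_gen2 * 8192     # 7000 * 8192 bytes
print(approx_bytes / 1e6)                # ~57 (decimal MB), matching the estimate above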
Intervals on the order of minutes still seem awfully long for garbage collection of the 2nd generation, but I guess it's possible. Recall that GC isn't triggered only by allocations; the condition is actually trigger == (allocations - deallocations > threshold).
TL;DR Large responses fill up socket buffers that fill up arenas without much fragmentation, allowing Python to actually release their memory back to the OS. In unexceptional cases, this memory will be released immediately upon exit of whatever context referenced the buffers, because the ref count on the buffers will drop to zero, triggering an immediate reclamation. In exceptional cases, as long as the traceback is alive, the buffers will still be referenced, therefore we will have to wait for garbage collection to reclaim them. If the exception occurred in the middle of a connection and a lot of data was already transmitted, then by the time of the exception many buffers will have been classified as members of an elder generation, and we will have to wait even longer for garbage collection to reclaim them.
CPython will release memory, but it's a bit murky.
CPython allocates chunks of memory at a time; let's call them fields.
When you instantiate an object, CPython will use blocks of memory from an existing field if possible, meaning there are enough contiguous blocks for said object.
If there aren't enough contiguous blocks, it'll allocate a new field.
Here's where it gets murky.
A field is only freed when it contains zero objects, and while there's garbage collection in CPython, there's no "trash compactor". So if you have a couple of objects spread across a few fields, and each field is only 70% full, CPython won't move those objects together and free up some fields.
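A rough, Linux-only way to watch that effect (a sketch, not a benchmark; exact numbers vary with CPython version and allocator behaviour):

import gc

def vm_rss_kb():
    # Linux-only: read the process's resident set size from /proc.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

print('baseline ', vm_rss_kb())
objs = [bytearray(200) for _ in range(200000)]   # ~55 MB of small objects
print('allocated', vm_rss_kb())

del objs[::2]          # free every other object, scattered across fields
gc.collect()
print('half gone', vm_rss_kb())   # usually barely drops: most fields still occupied

del objs               # free the rest
gc.collect()
print('all gone ', vm_rss_kb())   # whole fields empty out, so RSS falls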
It seems pretty reasonable that the large data chunk you're pulling from the HTTP call is getting allocated to "new" fields, but then something goes sideways, the object's reference count goes to zero, then garbage collection runs and returns those fields to the OS.
We have an App Engine application that writes many files of a relatively large size to Google Cloud Store. These files are CSVs that are dynamically created, so we use Python's StringIO.StringIO as a buffer and csv.writer as the interface for writing to that buffer.
In general, our process looks like this:
import csv
import StringIO

import cloudstorage as gcs  # the Google Cloud Storage client for App Engine

# Build the CSV in an in-memory buffer.
file_buffer = StringIO.StringIO()
writer = csv.writer(file_buffer)
# ...
# write some rows
# ...
data = file_buffer.getvalue()
filename = 'someFilename.csv'
try:
    # The with block closes file_stream for us, even if write() raises.
    with gcs.open(filename, content_type='text/csv', mode='w') as file_stream:
        file_stream.write(data)
except Exception as e:
    pass  # handle exception
finally:
    file_buffer.close()
As we understand it, the csv.writer does not need to be closed itself. Rather, only the buffer above and the file_stream need be closed.
We run the above process in a deferred, invoked by App Engine's task queue. Ultimately, we get the following error after a few invocations of our task:
Exceeded soft private memory limit of 128 MB with 142 MB after servicing 11 requests total
Clearly, then, there is a memory leak in our application. However, if the above code is correct (which we admit may not be the case), then our only other idea is that some large amount of memory is being held through the servicing of our requests (as the error message suggests).
Thus, we are wondering if some entities are kept by App Engine during the execution of a deferred. We should also note that our CSVs are ultimately written successfully, despite these error messages.
The symptom described isn't necessarily an indication of an application memory leak. Potential alternate explanations include:
the app's baseline memory footprint (which, for the scripting-language sandboxes like Python, can be bigger than the footprint at instance startup time; see Memory usage differs greatly (and strangely) between frontend and backend) may be too high for the instance class configured for the app/module. To fix it, choose a higher-memory instance class (which, as a side effect, also means a faster instance). Alternatively, if the rate of instances being killed for exceeding memory limits is tolerable, just let GAE recycle the instances :)
peaks of activity, especially if multi-threaded request handling is enabled, mean higher memory consumption and can also overload the memory garbage collector. Limiting the number of requests performed in parallel, adding (higher) delays to lower-priority deferred task processing, and other similar measures that reduce the average request-processing rate per instance can help give the garbage collector a chance to clean up leftovers from previous requests. Scalability should not be harmed (with dynamic scaling), as other instances would be started to help with the activity peak.
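Separately, one concrete way to lower the peak memory of the CSV-writing code in this question is to drop the intermediate StringIO buffer and let csv.writer write straight to the GCS file stream, so only one row at a time is held in memory. A minimal sketch, assuming the same cloudstorage client as in the question:

import csv

import cloudstorage as gcs  # the Google Cloud Storage client for App Engine

def write_rows_to_gcs(filename, rows):
    # gcs.open() returns a file-like object, so csv.writer can write to it
    # directly; rows can be any iterable, e.g. a datastore query.
    with gcs.open(filename, mode='w', content_type='text/csv') as file_stream:
        writer = csv.writer(file_stream)
        for row in rows:
            writer.writerow(row)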
Related Q&As:
How does app engine (python) manage memory across requests (Exceeded soft private memory limit)
Google App Engine DB Query Memory Usage
Memory leak in Google ndb library
I'm seeing strange patterns of memory consumption in GAE using Python. I'm monitoring the memory use at the very beginning and at the very end of every request using google.appengine.api.runtime.memory_usage().current(). One request used 42 MB at both the beginning and the end, and the next request, 3 minutes later, started with 117 MB of memory usage and ended with the same 117 MB.
My question is: what happened between the only two requests served by the single running instance that caused an extra 75 MB of memory usage?
I'm looking for a memory profiler that lets me dig into an instance and see how the memory is being used: by which global variables, processes, code, imported modules, and so on.
In this case the usual memory profiling tools don't help, because the extra memory usage occurs outside a request, so I'm thinking of connecting to the instance using the remote_api_shell and debugging/profiling the memory from there.
If anyone can help, or has experienced similarly strange memory consumption and found a solution, I would appreciate it.
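For now, a low-tech starting point I can use is to log the runtime's own memory number at the edges of every request with a small WSGI wrapper; it at least narrows down whether the jump happens inside a request or between them. A sketch for the Python 2 GAE runtime, using the API mentioned above (wrap your existing WSGI app with it):

import logging

from google.appengine.api import runtime

def memory_logging_middleware(app):
    """Wrap a WSGI app and log instance memory before and after each request."""
    def wrapped(environ, start_response):
        before = runtime.memory_usage().current()
        try:
            return app(environ, start_response)
        finally:
            after = runtime.memory_usage().current()
            logging.info('memory: %.1f MB -> %.1f MB (%s)',
                         before, after, environ.get('PATH_INFO'))
    return wrapped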
A user attempted to send a file as an email attachment using my application. However, doing so raised the following exception, which I'm having trouble deciphering:
Exceeded soft private memory limit with 192.023 MB after servicing
2762 requests total
While handling this request, the process that handled this request was
found to be using too much memory and was terminated. This is likely to
cause a new process to be used for the next request to your application.
If you see this message frequently, you may have a memory leak in
your application.
What is the "soft private memory limit" and what was likely to bring about this exception?
The "soft private memory limit" is the memory limit at which App Engine will stop an instance from receiving any more requests, wait for any outstanding requests, and terminate the instance. Think of it as a graceful shutdown when you're using too much memory.
Hitting the soft limit once in a while is ok since all your requests finish as they should. However, every time this happens, your next request may start up a new instance which may have a latency impact.
I assume you are using the lowest-class frontend or backend instance (F1 or B1). Both have a 128 MB memory quota, so your app most likely went over this limit. However, this quota does not appear to be strictly enforced and Google allows some leniency (hence the term soft limit); I have had several F1 instances consume ~200 MB of memory for minutes before being terminated by App Engine.
Try increasing your instance class to the next level up (F2 or B2), which has a 256 MB memory quota, and see if the error recurs. Also, investigate whether the error is reproducible every time you send an e-mail with attachments, because it's possible that what you are seeing is the symptom rather than the cause, and the part of your app that consumes lots of memory lies somewhere else.