GAE python Memory profiler / memory use

GAE python Memory profiler / memory use - python

I'm having strange patterns of memory consumption in GAE using Python. I'm monitoring the memory use in every request at the very beginning and at the very end using google.appengine.api.runtime.memory_usage().current(). I have a request that at the begin and end uses 42MB and the next request, 3 minutes later, started with 117MB of memory usage, and ended with the same 117MB.
My question is what happened between the two only request of the only one instance used that caused a 75MB memory extra usage?
I'm looking for a memory profiler that let me go deep in an instance and see how is the memory being used, by what global variables, what processes, code, modules imported, and so on.
In this case the normal memory profiler tools doesn't help because the extra memory usage occurs outside a request, so I'm thinking in connect to the instance using the remote_api_shell and from them debug/profile the memory.
If anyone can help me or have experienced similar memory strange comsumptions and solutions I will appreciate.

Related

How to reduce memory consumption (RAM) on Python/Django project?

My memory usage on a Django DRF API project increases over time and RAM is getting filled once I reach 50+ API calls.
So far I tried
loaded all models, class variable upfront
used memory profiler, cleaned code as possible to reduce variable usage
added garbage collection : gc.disable() at beginning and gc.enable() at end of code
added ctypes malloc.trim() at end of code etc
setting gunicorn max-requests limit ( this results in more model loading / response time at that moment)
Any suggestions on how to free up memory at the end of each request ?

Due to the way that the CPython interpreter manages memory, it very rarely actually frees any allocated memory. Generally CPython processes will keep growing and growing in memory usage
Since you are using Gunicorn you can set the max_requests setting which will regularly restart your workers and alleviate some "memory leak" issues

Multiprocessing memory leak, processes that stay around forever

I'm trying to solve a multiprocessing memory leak and am trying to fully understand where the problem is. My architecture is looking for the following: A main process that delegates tasks to a few sub-processes. Right now there are only 3 sub-processes. I'm using Queues to send data to these sub-processes and it's working just fine except the memory leak.
It seems most issues people are having with memory leaks involve people either forgetting to join/exit/terminate their processes after completion. My case is a bit different. I want these processes to stay around forever for the entire duration of the application. So the main process will launch these 3 sub-processes, and they will never die until the entire app dies.
Do I still need to join them for any reason?
Is this a bad idea to keep processes around forever? Should I consider killing them and re-launching them at some point despite me not wanting to do that?
Should I not be using multiprocessing.Process for this use case?
I'm making a lot of API calls and generating a lot of dictionaries and arrays of data within my sub processes. I'm assuming my memory leak comes from not properly cleaning that up. Maybe my problem is entirely there and not related to the way I'm using multiprocessing.Process?
from multiprocessing import Process
# This is how I'm creating my 3 sub processes
procs = []
for name in names:
proc = Process(target=print_func, args=(name,))
procs.append(proc)
proc.start()
# Then I want my sub-processes to live forever for the remainder of the application's life
# But memory leaks until I run out of memory
Update 1:
I'm seeing this memory growth/leaking on MacOS 10.15.5 as well as Ubuntu 16.04. It behaves the same way in both OSs. I've tried python 3.6 and python 3.8 and have seen the same results
I never had this leak before going multiprocess. So that's why I was thinking this was related to multiprocess. So when I ran my code on one single process -> no leaking. Once I went multiprocess running the same code -> leaking/bloating memory.
The data that's actually bloating are lists of data (floats & strings). I confirmed this using the python package pympler, which is a memory profiler.
The biggest thing that changed since my multiprocess feature was added is, my data is gathered in the subprocesses then sent to the main process using Pyzmq. So I'm wondering if there are new pointers hanging around somehow preventing python from garbage collecting and fully releasing this lists of floats and strings.
I do have a feature that every ~30 seconds clears "old" data that I no longer need (since my data is time-sensitive). I'm currently investigating this to see if it's working as expected.
Update 2:
I've improved the way I'm deleting old dicts and lists. It seems to have helped but the problem still persists. The python package pympler is showing that I'm no longer leaking memory which is great. When I run it on mac, my activity monitor is showing a consistent increase of memory usage. When I run it on Ubuntu, the free -m command is also showing consistent memory bloating.
Here's what my memory looks like shortly after running the script:
ubuntu:~/Folder/$ free -m
total used free shared buff/cache available
Mem: 7610 3920 2901 0 788 3438
Swap: 0 0 0
After running for a while, memory bloats according to free -m:
ubuntu:~/Folder/$ free -m
total used free shared buff/cache available
Mem: 7610 7385 130 0 93 40
Swap: 0 0 0
ubuntu:~/Folder/$
It eventually crashes from using too much memory.
To test where the leak comes from, I've turned off my feature where my subprocess send data to my main processes via Pyzmq. So the subprocesses are still making API calls and collecting data, just not doing anything with it. The memory leak completely goes away when I do this. So clearly the process of sending data from my subprocesses and then handling the data on my main process is where the leak is happening. I'll continue to debug.
Update 3 POSSIBLY SOLVED:
I may have resolved the issue. Still testing more thoroughly. I did some extra memory clean up on my dicts and lists that contained data. I also gave my EC2 instances ~20 GB of memory. My apps memory usage timeline looks like this:
Runtime after 1 minutes: ~4 GB
Runtime after 2 minutes: ~5 GB
Runtime after 3 minutes: ~6 GB
Runtime after 5 minutes: ~7 GB
Runtime after 10 minutes: ~9 GB
Runtime after 6 hours: ~9 GB
Runtime after 10 hours: ~9 GB
What's odd is that slow increment. Based on how my code works, I don't understand how it slowly increases memory usage from minute 2 to minute 10. It should be using max memory by around minute 2 or 3. Also, previously when I was running ALL of this logic on one single process, my memory usage was pretty low. I don't recall exactly what it was, but it was much much lower than 9 GB.
I've done some reading on Pyzmq and it appears to use a ton of memory. I think the massive memory usage increase comes from Pyzmq. Since I'm using it to send a massive amount of data between processes. I've read that Pyzmq is incredibly slow to release memory from large data messages. So it's very possible that my memory leak was not really a memory leak, it was just me using way way more memory due to Pyzmq and multi-processing sending data around.. I could confirm this by running my code from before my recent changes on a machine with ~20GB of memory.
Update 4 SOLVED:
My previous theory checked out. There was never a memory leak to begin with. The usage of Pyzmq with massive amounts of data dramatically increases memory usage to the point to where I had to ~6x my memory on my EC2 instance. So Pyzmq seems to either use a ton of memory or be very slow at releasing memory or both. Regardless, this has been resolved.

Given that you are on Linux, I'd suggest using https://github.com/vmware/chap to understand why the processes are growing.
To do that, first use ps to figure out the process IDs for each of your processes (the main and the child processes) then use "gcore " for each process to gather a live core. Gather cores again for each process after they have grown a bit.
For each core, you can open it in chap and use the following commands:
redirect on
describe used
The result will be files named like the original cores, followed by ".describe_used".
You can compare them to see which allocations are new.
Once you have identified some interesting new allocations for a process, try using "describe incoming" repeatedly from the chap prompt until you have seen how those allocations are used.

How big is the overhead of tracemalloc?

I would like to log in a production system the current memory usage of a Python script. AWS has Container Insights, but they are extremely well-hidden and I'm not sure how to use them properly within other dashboards / logging- and altering systems. Also, I'm not certain if the log peak memory at all.
The Python script is the production system. It is running on AWS within a Docker container and I ran into issues with a previous approach (link).
tracemalloc seems to be able to to give me the information I want:
# At the start of the script
import tracemalloc
tracemalloc.start()
# script running...
# At the end
current, peak = tracemalloc.get_traced_memory()
logger.info(f"Current memory usage is {current / 10**6} MB")
logger.info(f"Peak memory usage was {peak / 10**6} MB")
tracemalloc.stop()
However, the docs state:
The tracemalloc module is a debug tool
So would it be a bad idea to wrap this around production code? How much overhead is it? Are there other reasons not to use that in production?
(I have a pretty good idea of which parts of the code need most memory and where the peak memory is reached. I want to monitor that part (or maybe rather the size of those few objects / few lines of code). The alternative to tracemalloc seems to be to use something like this)

I've been trying to answer the same question. The best answer I've found is from
https://www.mail-archive.com/python-list#python.org/msg443129.html
which quotes a factor of 3-4 increase memory usage with tracemalloc based on a simple experiment.

Spark+Python set GC memory threshold

I'm trying to run a Python worker (PySpark app) which is using too much memory and my app is getting killed my YARN because of exceeding memory limits (I'm trying to lower memory usage in order to being able to spawn more workers).
I come from Java/Scala, so Python GC works similar than JVM in my head...
Is there a way to tell Python what's the amount of "available memory" it has? I mean, Java GCs when your heap size is almost-full. I want to perform the same operation on Python, so yarn doesn't kill my application because of using too much memory when that memory is garbage (I'm on Python3.3 and there are memory references # my machine).
I've seen resource hard and soft limits, but no documentation say if GCs trigger on them or not. AFAIK nothing triggers GCs by memory usage, does any1 know a way to do so?
Thanks,

CPython (I assume this is the one you use) is significantly different compared to Java. The main garbage collecting method is reference counting. Unless you deal with circular references (IMHO it is not common in normal PySpark workflows) you won't need full GC sweeps at all (data related objects should be collected once data is spilled / pickled).
Spark is also known to kill idle Python workers, even if you enable reuse option, so quite often it skips GC completely.
You can control CPython garbage collecting behavior using set_threshold method:
gc.set_threshold(threshold0[, threshold1[, threshold2]]
or trigger GC sweep manually with collect:
gc.collect(generation=2)
but in my experience most of the GC problems in PySpark come from JVM part, not Python.

App Engine instance memory constantly increasing

I'd expect the memory usage of my app engine instances (Python) to be relatively flat after an initial startup period. Each request to my app is short lived, and it seems all memory usage of single request should be released shortly afterwards.
This is not the case in practice however. Below is a snapshot of instance memory usage provided by the console. My app has relatively low traffic so I generally have only one instance running. Over the two-day period in the graph, the memory usage trend is constantly increasing. (The two blips are where two instances were briefly running.)
I regularly get memory exceeded errors so I'd like to prevent this continuous increase of memory usage.
At the time of the snapshot:
Memcache is using less than 1MB
Task queues are empty
Traffic is low (0.2 count/second)
I'd expect the instance memory usage to fall in these circumstances, but it isn't happening.
Because I'm using Python with its automatic garbage collection, I don't see how I could have caused this.
Is this expected app engine behavior and is there anything I can do to fix it?

I found another answer that explains part of what is going on here. I'll give a summary based on that answer:
When using NDB, entities are stored in a context cache, and the context cache is part of your memory usage.
From the documentation, one would expect that memory to be released upon the completion of an HTTP request.
In practice, the memory is not released upon the completion of the HTTP request. Apparently, context caches are reused, and the cache is cleared before its next use, which can take a long time to happen.
For my situation, I am adding _use_cache=False to most entities to prevent them from being stored in the context cache. Because of the way my app works, I don't need the context caches for these entities, and this reduces memory usage.
The above is only a partial solution however!
Even with caching turned off for most of my entities, my memory usage is still constantly increasing! Below is snapshot over a 2.5 day period where the memory continuously increases from 36 MB to 81 MB. This is over the 4th of July weekend with low traffic.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.