I would like to log the current memory usage of a Python script in a production system. AWS has Container Insights, but they are extremely well hidden and I'm not sure how to use them properly within other dashboards / logging and alerting systems. I'm also not certain whether they log peak memory at all.
The Python script is the production system. It is running on AWS within a Docker container and I ran into issues with a previous approach (link).
tracemalloc seems to be able to give me the information I want:
# At the start of the script
import logging
import tracemalloc

logger = logging.getLogger(__name__)  # assumes logging is configured elsewhere
tracemalloc.start()
# script running...
# At the end
current, peak = tracemalloc.get_traced_memory()
logger.info(f"Current memory usage is {current / 10**6} MB")
logger.info(f"Peak memory usage was {peak / 10**6} MB")
tracemalloc.stop()
However, the docs state:
The tracemalloc module is a debug tool
So would it be a bad idea to wrap this around production code? How much overhead does it add? Are there other reasons not to use it in production?
(I have a pretty good idea of which parts of the code need the most memory and where the peak memory is reached. I want to monitor that part (or perhaps rather the size of those few objects / few lines of code). The alternative to tracemalloc seems to be to use something like this.)
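Narrowing tracemalloc down to just those few lines or files would presumably look roughly like the following sketch ("hot_module.py" stands in for the real filename):
import tracemalloc

tracemalloc.start()
# ... run the memory-heavy part of the script ...
snapshot = tracemalloc.take_snapshot()

# Keep only allocations made in the module of interest.
snapshot = snapshot.filter_traces([
    tracemalloc.Filter(inclusive=True, filename_pattern="*/hot_module.py"),
])
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)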
I've been trying to answer the same question. The best answer I've found is from
https://www.mail-archive.com/python-list#python.org/msg443129.html
which reports a factor of 3-4 increase in memory usage with tracemalloc enabled, based on a simple experiment.
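If that overhead is too high, a much cheaper (but coarser) option is to read the OS-level peak RSS via the resource module; it covers the whole process (interpreter, C extensions and all) rather than individual Python allocations. A minimal sketch, assuming a logger is already configured elsewhere:
import resource
import sys

def log_peak_rss(logger):
    """Log the peak resident set size of the current process."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    logger.info(f"Peak RSS so far: {peak_bytes / 10**6:.1f} MB")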
I have a Python program parallelized with joblib.Parallel. However, as you can see in this top screenshot, each process is using much less than 100% of the CPU, and the process state is "D", i.e., waiting for IO.
The program runs a function once for each of 10,000 (very small) datasets. Each function execution takes a few minutes and, besides doing calculations, queries a SQLite database via SQLAlchemy (reading only), loading quite a bit of data into memory.
I suspect that the memory loading, and perhaps even a leak, may be causing the slow-down, but it may also come from other parts of the program.
Is there any way to get the Python function stack where the IO is stalling when running in parallel?
For CPU profiling, I usually use cProfile. However, here I need to understand memory issues and IO blocking. A further complication is that these issues do not occur when I run only one process, so I need a method that works across multiple worker processes.
For memory profiling, I see from other questions that there are object-counting tools and allocation trackers such as guppy3 and heapy. However, here I think a stack trace (showing which part of the code is stalling or memory-heavy) would be more helpful than knowing what kind of object it is.
tracemalloc can show stack traces.
Probably something like the following could work:
import tracemalloc
tracemalloc.start(25)  # keep up to 25 frames per allocation so the traceback comparison is useful
snapshot1 = tracemalloc.take_snapshot()
# ... call the function leaking memory ...
snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, 'traceback')
stat = top_stats[0]
print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
for line in stat.traceback.format():
    print(line)
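One caveat when combining this with joblib.Parallel and its default process-based backend: tracemalloc only traces the process it was started in, so the snapshot has to be taken inside the worker function and shipped back to the parent. A rough sketch, where process_dataset and datasets are placeholders for the real work:
import tracemalloc
from joblib import Parallel, delayed

def traced_work(dataset):
    # Start tracing inside the worker process; keep 25 frames per allocation.
    tracemalloc.start(25)
    result = process_dataset(dataset)  # placeholder for the real function
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Return the top allocation sites as strings so the parent can log them.
    top = snapshot.statistics("traceback")[:3]
    return result, [str(stat) for stat in top]

results = Parallel(n_jobs=4)(delayed(traced_work)(d) for d in datasets)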
My memory usage on a Django DRF API project increases over time, and RAM fills up once I reach 50+ API calls.
So far I have tried:
loading all models and class variables upfront
using memory_profiler and cleaning up the code where possible to reduce variable usage
adding garbage-collection calls: gc.disable() at the beginning and gc.enable() at the end of the code
calling malloc_trim() via ctypes at the end of the code (see the sketch below), etc.
setting the gunicorn max-requests limit (this results in more model loading / longer response times at that moment)
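For reference, the malloc_trim call via ctypes is typically done against glibc, roughly like this sketch (Linux only):
import ctypes

def trim_malloc():
    """Ask glibc to return free heap pages to the OS (glibc/Linux only)."""
    try:
        ctypes.CDLL("libc.so.6").malloc_trim(0)
    except (OSError, AttributeError):
        pass  # not glibc, or malloc_trim is unavailable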
Any suggestions on how to free up memory at the end of each request?
Due to the way the CPython interpreter manages memory, it very rarely returns allocated memory to the operating system. Generally, CPython processes will keep growing in memory usage.
Since you are using Gunicorn, you can set the max_requests setting, which will regularly restart your workers and alleviate some "memory leak" issues.
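For example, in a gunicorn.conf.py (the numbers here are placeholders; tune them to your workload):
# gunicorn.conf.py -- recycle each worker after roughly 1000 requests.
# The jitter spreads restarts out so the workers don't all recycle at once.
max_requests = 1000
max_requests_jitter = 50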
I'm trying to run a Python worker (a PySpark app) which is using too much memory, and my app is getting killed by YARN for exceeding memory limits (I'm trying to lower memory usage in order to be able to spawn more workers).
I come from Java/Scala, so in my head Python's GC works similarly to the JVM's...
Is there a way to tell Python how much "available memory" it has? I mean, the JVM GCs when your heap size is almost full. I want to do the same in Python, so YARN doesn't kill my application for using too much memory when that memory is actually garbage (I'm on Python 3.3, and there are memory references on my machine).
I've seen resource hard and soft limits, but no documentation says whether GC triggers on them or not. AFAIK nothing triggers GC based on memory usage; does anyone know a way to do so?
Thanks,
CPython (I assume this is the one you use) is significantly different from Java. The main garbage-collection mechanism is reference counting. Unless you deal with circular references (which, IMHO, are not common in normal PySpark workflows), you won't need full GC sweeps at all (data-related objects should be collected once the data is spilled / pickled).
Spark is also known to kill idle Python workers, even if you enable the reuse option, so quite often it skips GC completely.
You can control CPython's garbage-collection behavior using the set_threshold method:
gc.set_threshold(threshold0[, threshold1[, threshold2]])
or trigger a GC sweep manually with collect:
gc.collect(generation=2)
but in my experience most of the GC problems in PySpark come from the JVM side, not Python.
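That said, if you really want to tie collection to memory usage, nothing in CPython does it automatically; you would have to check the process RSS yourself and call gc.collect() when it crosses a threshold you pick. A rough sketch (Linux-only, with an arbitrary limit), keeping in mind that this only helps with objects stuck in reference cycles:
import gc

RSS_LIMIT_KB = 512 * 1024  # arbitrary example threshold: 512 MB

def current_rss_kb():
    """Current resident set size in kB, read from /proc (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def collect_if_needed():
    # Only cyclic garbage benefits; ordinary objects are freed immediately
    # by reference counting.
    if current_rss_kb() > RSS_LIMIT_KB:
        gc.collect()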
I'd expect the memory usage of my App Engine instances (Python) to be relatively flat after an initial startup period. Each request to my app is short-lived, and it seems all memory used by a single request should be released shortly afterwards.
This is not the case in practice however. Below is a snapshot of instance memory usage provided by the console. My app has relatively low traffic so I generally have only one instance running. Over the two-day period in the graph, the memory usage trend is constantly increasing. (The two blips are where two instances were briefly running.)
I regularly get memory exceeded errors so I'd like to prevent this continuous increase of memory usage.
At the time of the snapshot:
Memcache is using less than 1MB
Task queues are empty
Traffic is low (0.2 count/second)
I'd expect the instance memory usage to fall in these circumstances, but it isn't happening.
Because I'm using Python with its automatic garbage collection, I don't see how I could have caused this.
Is this expected app engine behavior and is there anything I can do to fix it?
I found another answer that explains part of what is going on here. I'll give a summary based on that answer:
When using NDB, entities are stored in a context cache, and the context cache is part of your memory usage.
From the documentation, one would expect that memory to be released upon the completion of an HTTP request.
In practice, the memory is not released upon the completion of the HTTP request. Apparently, context caches are reused, and the cache is cleared before its next use, which can take a long time to happen.
For my situation, I am adding _use_cache=False to most entities to prevent them from being stored in the context cache. Because of the way my app works, I don't need the context caches for these entities, and this reduces memory usage.
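For reference, disabling the in-context cache can be done per model class or per call; a minimal sketch (the model, property, and key names here are made up):
from google.appengine.ext import ndb

class LogEntry(ndb.Model):
    # Hypothetical model: skip the in-context cache for all of its entities.
    _use_cache = False
    message = ndb.StringProperty()

# ...or per call, via a context option:
entity = some_key.get(use_cache=False)  # some_key is a placeholder ndb.Key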
The above is only a partial solution however!
Even with caching turned off for most of my entities, my memory usage is still constantly increasing! Below is a snapshot over a 2.5-day period in which the memory continuously increases from 36 MB to 81 MB. This was over the 4th of July weekend with low traffic.
I'm seeing strange patterns of memory consumption in GAE using Python. I'm monitoring the memory usage at the very beginning and at the very end of every request using google.appengine.api.runtime.memory_usage().current(). I have a request that used 42 MB at both the beginning and the end, and the next request, 3 minutes later, started with 117 MB of memory usage and ended with the same 117 MB.
My question is: what happened between the only two requests on the single instance in use that caused an extra 75 MB of memory usage?
I'm looking for a memory profiler that lets me go deep into an instance and see how the memory is being used: by which global variables, processes, code, imported modules, and so on.
In this case the usual memory profiling tools don't help, because the extra memory usage occurs outside a request, so I'm thinking of connecting to the instance using the remote_api_shell and debugging/profiling the memory from there.
If anyone can help, or has experienced similarly strange memory consumption and found solutions, I would appreciate it.
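For what it's worth, the per-request measurement described above can be wrapped in a small WSGI middleware so every handler is covered; a sketch that mirrors the runtime call quoted in the question (the middleware class name is made up):
import logging

from google.appengine.api import runtime

class MemoryLoggingMiddleware(object):
    """Log instance memory before and after each request (sketch)."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        before = runtime.memory_usage().current()
        try:
            return self.app(environ, start_response)
        finally:
            after = runtime.memory_usage().current()
            logging.info("memory: %s MB -> %s MB (%s)",
                         before, after, environ.get("PATH_INFO"))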