I have a small python program with a footprint of 12 MB when running. The task is mostly waiting for serial data input and updating a fixed memory structure (not growing) with latest data.
The memory usage stays the same over time (taskmanager)
If I start the program in debug mode it starts up with about a 50 MB footprint but then increases memory usage with a rate of about 4 MB/sec.
Is this a normal behaviour or is there a way to stop / slow down the memory eating?
I am on w10/64, using python 3.6 and pycharm community 2018.2
Generally if additional memory is used Python will not give this back to the operating system but will retain this for later use. Generally this memory is partitioned and allocated to a pool - cPython uses these pools to later allocate the memory to objects of different sizes.
An increasing memory footprint is nothing to be worried about in Python. To find out more check this blog post by Artem Golubin: https://rushter.com/blog/python-memory-managment/
Related
My memory usage on a Django DRF API project increases over time and RAM is getting filled once I reach 50+ API calls.
So far I tried
loaded all models, class variable upfront
used memory profiler, cleaned code as possible to reduce variable usage
added garbage collection : gc.disable() at beginning and gc.enable() at end of code
added ctypes malloc.trim() at end of code etc
setting gunicorn max-requests limit ( this results in more model loading / response time at that moment)
Any suggestions on how to free up memory at the end of each request ?
Due to the way that the CPython interpreter manages memory, it very rarely actually frees any allocated memory. Generally CPython processes will keep growing and growing in memory usage
Since you are using Gunicorn you can set the max_requests setting which will regularly restart your workers and alleviate some "memory leak" issues
I'm trying to solve a multiprocessing memory leak and am trying to fully understand where the problem is. My architecture is looking for the following: A main process that delegates tasks to a few sub-processes. Right now there are only 3 sub-processes. I'm using Queues to send data to these sub-processes and it's working just fine except the memory leak.
It seems most issues people are having with memory leaks involve people either forgetting to join/exit/terminate their processes after completion. My case is a bit different. I want these processes to stay around forever for the entire duration of the application. So the main process will launch these 3 sub-processes, and they will never die until the entire app dies.
Do I still need to join them for any reason?
Is this a bad idea to keep processes around forever? Should I consider killing them and re-launching them at some point despite me not wanting to do that?
Should I not be using multiprocessing.Process for this use case?
I'm making a lot of API calls and generating a lot of dictionaries and arrays of data within my sub processes. I'm assuming my memory leak comes from not properly cleaning that up. Maybe my problem is entirely there and not related to the way I'm using multiprocessing.Process?
from multiprocessing import Process
# This is how I'm creating my 3 sub processes
procs = []
for name in names:
proc = Process(target=print_func, args=(name,))
procs.append(proc)
proc.start()
# Then I want my sub-processes to live forever for the remainder of the application's life
# But memory leaks until I run out of memory
Update 1:
I'm seeing this memory growth/leaking on MacOS 10.15.5 as well as Ubuntu 16.04. It behaves the same way in both OSs. I've tried python 3.6 and python 3.8 and have seen the same results
I never had this leak before going multiprocess. So that's why I was thinking this was related to multiprocess. So when I ran my code on one single process -> no leaking. Once I went multiprocess running the same code -> leaking/bloating memory.
The data that's actually bloating are lists of data (floats & strings). I confirmed this using the python package pympler, which is a memory profiler.
The biggest thing that changed since my multiprocess feature was added is, my data is gathered in the subprocesses then sent to the main process using Pyzmq. So I'm wondering if there are new pointers hanging around somehow preventing python from garbage collecting and fully releasing this lists of floats and strings.
I do have a feature that every ~30 seconds clears "old" data that I no longer need (since my data is time-sensitive). I'm currently investigating this to see if it's working as expected.
Update 2:
I've improved the way I'm deleting old dicts and lists. It seems to have helped but the problem still persists. The python package pympler is showing that I'm no longer leaking memory which is great. When I run it on mac, my activity monitor is showing a consistent increase of memory usage. When I run it on Ubuntu, the free -m command is also showing consistent memory bloating.
Here's what my memory looks like shortly after running the script:
ubuntu:~/Folder/$ free -m
total used free shared buff/cache available
Mem: 7610 3920 2901 0 788 3438
Swap: 0 0 0
After running for a while, memory bloats according to free -m:
ubuntu:~/Folder/$ free -m
total used free shared buff/cache available
Mem: 7610 7385 130 0 93 40
Swap: 0 0 0
ubuntu:~/Folder/$
It eventually crashes from using too much memory.
To test where the leak comes from, I've turned off my feature where my subprocess send data to my main processes via Pyzmq. So the subprocesses are still making API calls and collecting data, just not doing anything with it. The memory leak completely goes away when I do this. So clearly the process of sending data from my subprocesses and then handling the data on my main process is where the leak is happening. I'll continue to debug.
Update 3 POSSIBLY SOLVED:
I may have resolved the issue. Still testing more thoroughly. I did some extra memory clean up on my dicts and lists that contained data. I also gave my EC2 instances ~20 GB of memory. My apps memory usage timeline looks like this:
Runtime after 1 minutes: ~4 GB
Runtime after 2 minutes: ~5 GB
Runtime after 3 minutes: ~6 GB
Runtime after 5 minutes: ~7 GB
Runtime after 10 minutes: ~9 GB
Runtime after 6 hours: ~9 GB
Runtime after 10 hours: ~9 GB
What's odd is that slow increment. Based on how my code works, I don't understand how it slowly increases memory usage from minute 2 to minute 10. It should be using max memory by around minute 2 or 3. Also, previously when I was running ALL of this logic on one single process, my memory usage was pretty low. I don't recall exactly what it was, but it was much much lower than 9 GB.
I've done some reading on Pyzmq and it appears to use a ton of memory. I think the massive memory usage increase comes from Pyzmq. Since I'm using it to send a massive amount of data between processes. I've read that Pyzmq is incredibly slow to release memory from large data messages. So it's very possible that my memory leak was not really a memory leak, it was just me using way way more memory due to Pyzmq and multi-processing sending data around.. I could confirm this by running my code from before my recent changes on a machine with ~20GB of memory.
Update 4 SOLVED:
My previous theory checked out. There was never a memory leak to begin with. The usage of Pyzmq with massive amounts of data dramatically increases memory usage to the point to where I had to ~6x my memory on my EC2 instance. So Pyzmq seems to either use a ton of memory or be very slow at releasing memory or both. Regardless, this has been resolved.
Given that you are on Linux, I'd suggest using https://github.com/vmware/chap to understand why the processes are growing.
To do that, first use ps to figure out the process IDs for each of your processes (the main and the child processes) then use "gcore " for each process to gather a live core. Gather cores again for each process after they have grown a bit.
For each core, you can open it in chap and use the following commands:
redirect on
describe used
The result will be files named like the original cores, followed by ".describe_used".
You can compare them to see which allocations are new.
Once you have identified some interesting new allocations for a process, try using "describe incoming" repeatedly from the chap prompt until you have seen how those allocations are used.
I'm having issues in K8s with memory allocation. Pod gets killed by OMM Killer, because of strictly defined limits, let's say 1GB.
How python 3.7 (python memory manager) allocates memory? From the pod memory graph I can assume that it allocates double amount of memory for heap.
Linux:
You can limit the used resources from inside your program - not sure if it helps you though:
resources.setrlimit
Sets new limits of consumption of resource. The limits argument must be a tuple (soft, hard)
of two integers describing the new limits. A value of RLIM_INFINITY can be used to request a limit that is unlimited.
For windows this might help: Limit python script RAM usage in Windows
Edit: Post for limiting under linux: Limit RAM usage to python program
I have a 700MB SQLite3 database that I'm reading/writing to with a simple Python program.
I'm trying to gauge the memory usage of the program as it operates on the database. I've used these methods:
use Python's memory_profiler to measure memory usage for the main loop function which runs all the insert/selects
use Python's psutil to measure peak memory usage during execution
manually watching the memory usage via top/htop
The first two support the conclusion it uses no more than 20MB at any given time. I can start with an empty database and fill it up with 700MB of data and it remains under 20MB:
Memory profiler's figure never went above 15.805MiB:
Line # Mem usage Increment Line Contents
================================================
...
229 13.227 MiB 0.000 MiB #profile
230 def loop(self):
231 """Loop to record DB entries"""
234 15.805 MiB 2.578 MiB for ev in range(self.numEvents):
...
pstuil said peak usage was 16.22265625MB
Now top/htop is a little weirder. Both said that the python process's memory usage wasn't above 20MB, but I could also clearly see the free memory steadily decreasing as it filled up the database via the used number:
Mem: 4047636k total, 529600k used, 3518036k free, 83636k buffers
My questions:
is there any "hidden" memory usage? Does Python call libsqlite in such a way that it might use memory on its own that isn't reported as belonging to Python either via psutil or top?
is the above method sound for determining the memory usage of a program interacting with the database? Especially top: is top reliable for measuring memory usage of a single process?
is it more or less true that a process interacting with a SQLite database doesn't need to load any sizeable part of it into memory in order to operate on it?
Regarding the last point, my ultimate objective is to use a rather large SQLite database of unknown size on an embedded system with limited RAM and I would like to know if it's true that that memory usage is more or less constant regardless of the size of the database.
SQLite's memory usage doesn't depend on the size of the database; SQLite can handle terabyte-sized databases just fine, and it only loads the parts of the database that it needs (plus a small, configurable-sized cache).
SQLite should be fine on embedded systems; that's originally what it's designed for.
I have a Twisted server under load. When the server is under load, memory usage increases, and it is never reclaimed (even when there are no more clients). Next time it goes into high load, memory usage increases again. Here's a snapshot of the situation at that point:
RSS memory is 400 MB (should be 200MB with usual max number of clients).
gc.garbage is empty, so there are no uncollectable objects.
Using objgraph.py shows no obvious candidates for leaks (no notable difference between a normal, healthy process and a leaking process).
Using pympler shows a few tens of MB (only) used by Python objects (mostly dict, list, str and other native containers).
Valgrind with leak-check=full enabled doesn't show any major leaks (only couple of MBs 'definitively lost') - so C extensions are not the culprit. The total memory also doesn't add up with the 400MB+ shown by top:
==23072== HEAP SUMMARY:
==23072== in use at exit: 65,650,760 bytes in 463,153 blocks
==23072== total heap usage: 124,269,475 allocs, 123,806,322 frees, 32,660,215,602 bytes allocated
The only explanation I can find is that some objects are not tracked by the garbage collector, so that they are not shown by objgraph and pympler, yet use an enormous amount of RAM.
What other tools or solutions do I have? Would compiling the Python interpreter in debug mode help, by using sys.getobjects?
If the code is only leaking under load (did you verify this?), I'd have a look at all spots where messages are buffered. Does the memory usage of the process itself increase? Or does the memory use of the system increase? If it's the latter case, your server might simply be too slow to keep up with the incoming messages and the OS buffer fill up..