Py_EndInterpreter in Python 3 is very slow

The application I am working on recently migrated from embedding Python 2.7 to Python 3.8.
We noticed a significant slowdown when calling Py_EndInterpreter in Python 3.8 when using many sub-interpreters.
Looking at the CPU usage I can see that all the time is spent doing garbage collection.
Py_EndInterpreter -> PyImport_Cleanup -> _PyGC_CollectNoFail -> collect.
99% of the CPU time is spent in the collect() function invoked by _PyGC_CollectNoFail.
When calling Py_EndInterpreter with 500 sub-interpreters alive, each call takes ~2 seconds! For a total of ~3 minutes to end the 500 sub-interpreters.
By comparison, in Python 2.7 each call to Py_EndInterpreter takes 1-2 ms regardless of how many sub-interpreters are alive, for a total of ~500 ms to close all sub-interpreters.
When using few sub-interpreters (fewer than 20), the performance is almost identical between Python 2.7 and 3.8.
I tried looking at other applications that use many sub-interpreters, but it seems to be a very rare use case and I could not find anyone else reporting the same problem.
Is anyone else using many sub-interpreters and running into similar trouble?
It seems that my current options are:
Take the performance hit...
Leak a bunch of memory and not call Py_EndInterpreter
Fundamentally change how my application embeds python and not use sub-interpreters
??

Related

Python script taking several hundred MB of memory

I have a Python script that runs for about 10 minutes.
As it runs, it gradually consumes RAM. It starts out with about 77MB then ends with 216MB.
It doesn't really do anything that memory intensive. It does create a large number of ActiveRecord objects, some of which form large graphs with other ActiveRecord objects. The graphs can contain several dozen nodes. However, I fully expect gc to detect the cyclic references and free them as the graphs go out of scope, which they do periodically throughout the script's execution.
When I use gc.get_objects, I see that, after I run gc.collect(), there are always 0 objects in generations 0 and 1. However, generation 2 has 105K objects when things get going, and ends with about 1.1 million objects when the script finishes. This causes problems, since I am using a small machine with only about 1GB of RAM. (I don't mean to imply that I understand Python's memory management very well; pretty much all I know about it has come from the time spent trying to figure out this problem.)
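For reference, the kind of inspection described above can be done roughly like this (a minimal sketch; the exact reporting is mine, not from the original script):

import gc

# gc.get_count() returns the three internal collection counters (roughly,
# allocations since each generation was last collected), not an exact
# count of live objects per generation.
print("collection counters: %r" % (gc.get_count(),))

unreachable = gc.collect()  # force a full collection of all generations
print("unreachable objects found: %d" % unreachable)

# After a full collection, surviving tracked objects end up in the oldest
# generation; len(gc.get_objects()) gives that total.
print("objects tracked by the collector: %d" % len(gc.get_objects()))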
Most of the script is pure Python, though it does use the MySQLdb library to interface the ActiveRecord objects with MySQL.
Should I consider this behavior normal for Python and just buy more RAM, or should I view it as a memory leak? As time goes on, the script will get bigger and take longer to run, so I'm concerned that I will just have to buy more and more RAM.

Profiling Django Startup Memory Footprint

I'm trying to optimize my Django 1.8 project's memory usage, and I have noticed that even from initialization, it's using 80+MB, which seems excessive. I observe this when I simply run ./manage shell --plain. By contrast, an empty Django project on startup uses only 30MB. I want to know what's consuming so much memory.
I have tried various things to reduce memory consumption, including a fair amount of stripping down of the project by removing its apps and URLs. I tried poking around gc.get_objects but it's not comprehensible. I got excited about tracemalloc, so I built a custom Python 2.7.8 with tracemalloc included, only to realize that it won't start tracking until I call start() from the prompt, at which point the memory has already been consumed.
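One common way to make the gc.get_objects() output digestible is to bucket it by type; a minimal sketch (this is not from the original post):

import collections
import gc

# Histogram of live, gc-tracked objects by type name, largest buckets first
counts = collections.Counter(type(o).__name__ for o in gc.get_objects())
for name, n in counts.most_common(20):
    print("%8d  %s" % (n, name))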
Questions:
What kinds of things could be causing this high memory floor?
What process can I use to determine the consumer?
And yes, I realize the versions badly need upgrades. Thanks!
UPDATE #1
I did manage to get some use out of tracemalloc. I inserted the import and start() call at the beginning of manage.py.
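The instrumentation was roughly as follows (a sketch; the snapshot and reporting code is my reconstruction, and on Python 2.7 it relies on the pytracemalloc backport exposing the same API):

# at the very top of manage.py, before any Django imports
import tracemalloc
tracemalloc.start()
baseline = tracemalloc.take_snapshot()

# ... later, once start-up has finished (e.g. at the shell prompt) ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.compare_to(baseline, 'lineno')[:10]:
    print(stat)  # one line per source location, biggest allocations first

The top of the resulting report: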
/Users/john/.venv/proj-tracemalloc/lib/python2.7/site-packages/zinnia/comparison.py:19: size=25.0 MiB (+25.0 MiB), count=27168 (+27168), average=993 B
/usr/local/tracemalloc-py2.7.8/py27/lib/python2.7/importlib/__init__.py:37: size=3936 KiB (+3936 KiB), count=9581 (+9581), average=420 B
...
It did reveal one interesting thing: the first line points at zinnia, a blogging app. This loop seems to use a lot of memory, although I guess it's temporary. By line 3 of the output, everything is 1.5MB and smaller.
import sys, unicodedata, six  # imports needed for this snippet (from zinnia/comparison.py)

PUNCTUATION = dict.fromkeys(
    i for i in range(sys.maxunicode)
    if unicodedata.category(six.unichr(i)).startswith('P')
)
UPDATE #2
I went through the tedious process of uninstalling packages and apps and fixing broken code. In the end, it looks like maybe 10MB was due to my internal apps and 25MB was due to the Django Debug Toolbar. That got me down to 45MB. It doesn't seem unreasonable for my core app to be using 15MB, which takes it down to the 30MB core. I don't use the toolbar in production, but it obviously needs additional memory when it is actually in use.
In the end, I didn't learn much, but at least nothing seems insanely wrong. I was disappointed with tracemalloc, but hopefully it will work better once it is integrated into Python 3.

Using more memory than available

I have written a program that expands a database of prime numbers. This program is written in Python and runs on Windows 10 (x64) with 8GB of RAM.
The program stores all primes it has found in a list of integers for further calculations and uses approximately 6-7GB of RAM while running. During some runs, however, this figure has dropped to below 100MB. The memory usage then stays low for the duration of the run, though it increases as expected as more numbers are added to the prime list. Note that not all runs result in a memory drop.
(Memory usage measured with Task Manager.)
These seemingly random drops have led me to the following theories:
There's a bug in my code, making it drop critical data and messing up the results (most likely but not supported by the results)
Python just happens to optimize my code extremely well after a while.
Python or Windows is compensating for my over-usage of the RAM by cleaning out portions of my prime-number array that aren't used that much. (eventually resulting in incorrect calculations)
Python or Windows is compensating for my over-usage of the RAM by allocating disk space instead of ram.
Questions
What could be the reason(s) for this memory drop?
How does python handle programs that use more than available RAM?
How does Windows handle programs that use more than available RAM?
1, 2, and 3 are incorrect theories.
4 is correct. Windows (not Python) is moving some of your process memory to swap space. This is almost totally transparent to your application - you don't need to do anything special to respond to or handle this situation. The only thing you will notice is your application may get slower as information is written to and read from disk. But it all happens transparently. See https://en.wikipedia.org/wiki/Virtual_memory for more information.
Have you heard of paging? Windows dumps some RAM (that hasn't been used in a while) to your hard drive to keep your computer from running out of RAM and ultimately crashing.
Only Windows deals with this memory management. If you use Windows 10, it will also compress memory, somewhat like a zip file.
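If you want to observe the paging from inside the program, one option (assuming the third-party psutil package is available) is to compare the process's resident memory with its total virtual size:

import psutil

info = psutil.Process().memory_info()  # the current process

# rss: memory actually held in physical RAM right now
# vms: total virtual memory used by the process, including pages swapped to disk
print("resident: %.1f MB  virtual: %.1f MB" % (info.rss / 1e6, info.vms / 1e6))

When Windows pages part of the process out, the resident figure drops while the virtual figure stays large, which matches the drop seen in Task Manager.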

IronPython vs Python speed with numpy

I have some Python source code that manipulates lists of lists of numbers (say about 10,000 floating-point numbers) and does various calculations on them, including many calls to numpy.linalg.norm, for example.
Run time had not been an issue until we recently started using this code from a C# UI (running the Python code from C# via IronPython). I extracted a set of function calls (doing the things described in the first paragraph) and found that this code takes about 4x longer to run in IronPython compared to Python 2.7 (and this is after excluding the startup/setup time in C#/IronPython). I'm using a C# stopwatch around the repeated IronPython calls from C#, and the timeit module around an execfile in Python 2.7 (so the Python results include more operations, like loading the file and creating the objects, whereas the C# timing doesn't). The former takes about 4.0 seconds while the latter takes about 0.9 seconds.
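For the Python 2.7 side, the timing was roughly of this form (a sketch; the script name is a placeholder):

import timeit

# Run the whole script once, including file loading and object creation
elapsed = timeit.timeit("execfile('analysis.py')", number=1)
print("%.2f s" % elapsed)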
Would you expect this kind of difference? Any ideas how I might resolve this issue? Other comments?
Edit:
Here is a simple example of code that runs about 10x slower on IronPython on my machine (4 seconds in Python 2.7 and 40 seconds in IronPython):
import numpy as np  # needed for the snippet below

n = 700
for i in range(n - 1):
    for j in range(i, n):
        dist = np.linalg.norm(np.array([i, i, i]) - np.array([j, j, j]))
You're using NUMPY?! You're lucky it works in IronPython at all! The support is being added literally as we speak!
To be exact, there's a CPython-extension-to-IronPython interface project and there's a native CLR port of numpy. I dunno which one you're using, but both ways are orders of magnitude slower than working with the C version in CPython.
UPDATE:
The SciPy for IronPython port by Enthought that you're apparently using looks abandoned: the last commits in the linked repos are a few years old, and it's missing from http://www.scipy.org/install.html too. Judging by the article, it was a partial port with the interface in .NET and the core in C, linked by a custom interface layer. The previous paragraph applies to it as well.
Using the information from "Faster alternatives to numpy.argmax/argmin which is slow", you may get some speedup if you limit how much data passes back and forth between the CLR and the C core.
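For example, the nested loop from the question can be rewritten so that numpy does all the pairwise work in a few whole-array operations instead of roughly 245,000 tiny calls that each cross the interface boundary (a sketch; I have not tested it under IronPython):

import numpy as np

n = 700
points = np.column_stack([np.arange(n)] * 3).astype(float)  # the [i, i, i] vectors

# Pairwise differences and norms computed in bulk via broadcasting:
# dists[i, j] == norm(points[i] - points[j])
diffs = points[:, np.newaxis, :] - points[np.newaxis, :, :]
dists = np.sqrt((diffs ** 2).sum(axis=-1))

Whether this helps through the IronPython bridge depends on which port is in use, but it removes most of the per-call crossings; the original loop only needs the upper triangle of dists (j >= i), which np.triu can extract if required.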

Python process consuming increasing amounts of system memory, but heapy shows roughly constant usage

I'm trying to identify a memory leak in a Python program I'm working on. I'm currently running Python 2.7.4 on 64-bit Mac OS. I installed heapy to hunt down the problem.
The program involves creating, storing, and reading a large database using the shelve module. I am not using the writeback option, which I know can create memory problems.
Heapy shows that, during program execution, memory usage is roughly constant. Yet my activity monitor shows rapidly increasing memory use. Within 15 minutes, the process has consumed all my system memory (16GB), and I start seeing page-outs. Any idea why heapy isn't tracking this properly?
Take a look at this fine article. You are most likely not seeing memory leaks but memory fragmentation. The best workaround I have found is to identify what the output of your large working-set operation actually is, load the large dataset in a new process, calculate the output there, and then return that output to the original process, for example along the lines sketched below.
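A minimal sketch of that idea using the multiprocessing module (the list-building below is just a stand-in for whatever the real memory-hungry step is):

import multiprocessing

def heavy_step(n):
    # Runs in a child process: build the large working set there and
    # return only the small result the parent actually needs.
    big = [i * i for i in range(n)]  # stand-in for the real large dataset
    return sum(big)

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=1)
    result = pool.apply(heavy_step, (10 ** 7,))  # blocks until the child finishes
    pool.close()
    pool.join()
    # The child has exited, so all of its memory is returned to the OS;
    # only the small result lives on in the parent process.
    print(result)

The fragmentation-prone allocations all happen in a process that goes away, so the parent's heap never grows.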
This answer has some great insight and an example, as well. I don't see anything in your question that seems like it would preclude the use of PyPy.
