I am writing a data mining script to pull information off of a program called Agisoft PhotoScan for my lab. PhotoScan uses its own Python library (and I'm not sure how to access pip for this particular build), which has caused me a few problems installing other packages. After dragging, dropping, and praying, I've gotten a few packages to work, but I'm still facing a memory leak. If there is no way around it, I can try to install some more packages to weed out the leak, but I'd like to avoid this if possible.
My understanding of Python garbage collection so far is that once an object loses all references to it, it should be deleted. I used sys.getrefcount() to check all my variables, but the counts all stay constant. I have a hunch that the issue could be in the mysql-connector package I installed, or in PhotoScan itself, but I am not sure how to go about testing. I will be more than happy to provide code if that will help!
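For reference, this is roughly the kind of check I have been running (a minimal sketch; "data" is a stand-in, not one of my real variables):

import sys
import gc

def log_refcounts(**objs):
    # sys.getrefcount() counts its own argument, so every value is one
    # higher than the true count; what matters is that the numbers stay
    # flat between checks, which mine do.
    for name, obj in objs.items():
        print(name, sys.getrefcount(obj))
    gc.collect()  # collect any reference cycles before the next pass

data = [1, 2, 3]  # stand-in for the PhotoScan objects I am tracking
log_refcounts(data=data)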
It turns out that the memory leak was indeed in the PhotoScan program. I've worked around it by having a separate script open and close PhotoScan, running my original script once per launch. Thank you all for the help!
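For anyone curious, the workaround is essentially the following (a rough sketch; the executable name, script path, and command-line flags are placeholders for my actual setup):

import subprocess

# Launch a fresh PhotoScan process for each run so the leaked memory is
# released back to the OS when the process exits. "photoscan" and
# "mine_data.py" are placeholders, not real paths.
for task_id in range(10):
    subprocess.run(["photoscan", "-r", "mine_data.py", str(task_id)], check=True)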
Recently I wrote a Python program that requires a lot of memory. My computer does not have enough memory, and the program blows up.
I know that the operating system will use part of the hard disk as virtual memory, which can relieve a memory shortage. Changing the operating system's virtual memory settings would solve the problem for my Python program, but the scope of impact is too wide.
Can Python implement program-level virtual memory? That is, when memory is insufficient, can the hard disk be mapped into the program's memory?
I need to run a Python program with large memory consumption.
Using disk space as memory is usually called swapping.
It is usually simpler to do it yourself than to write a script that does it for you. But if you insist on your script doing it, one way is simply to execute the commands you would otherwise run manually.
Here is a tutorial on how to add swap to a Linux system (first result on Google): https://linuxize.com/post/create-a-linux-swap-file/
Run each command from that tutorial using subprocess and you will get the desired result.
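For example, following the tutorial's steps (a minimal sketch; it needs root privileges, and the swap file path and 1G size are just examples):

import subprocess

commands = [
    ["fallocate", "-l", "1G", "/swapfile"],  # reserve the file
    ["chmod", "600", "/swapfile"],           # root-only permissions
    ["mkswap", "/swapfile"],                 # format it as swap space
    ["swapon", "/swapfile"],                 # enable it
]
for cmd in commands:
    subprocess.run(cmd, check=True)          # abort if any step fails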
If you are on Windows (which you did not mention), the same approach applies, but I could not quickly find an easy way to do it with commands.
I am a beginner learning to program python using VS code so my knowledge about both the VS code and the python extension is limited. I am facing two very annoying problems.
Firstly, when the Python extension starts, the memory usage of VS Code jumps from ~300 MB to 1-1.5 GB. If I have anything else open, everything gets extremely sluggish. This seems a bit abnormal to me. I have tried disabling all other extensions, but the memory consumption remains the same. Is there a way (or some settings I can change) to reduce the memory consumption?
Secondly, the IntelliSense autocomplete takes quite a bit of time (sometimes 5-10 minutes) before it starts to kick in. It also sometimes stops working completely. Any pointers as to what could be causing that?
PS: I am using VS Code version 1.50 (September update) and Python via Anaconda 4.8.3.
Beyond the memory VS Code itself occupies as an editor, it needs to load the corresponding language services and language extensions, and these take up additional memory.
For memory, it is recommended that you uninstall unnecessary third-party extensions and duplicate language services. In addition, it is a good habit to use virtual environments in VS Code: the virtual environment's folder lives inside the project, so installed packages are stored with the project rather than taking up system-wide resources.
As for automatic completion, this feature is provided by the corresponding language service and extension. Please try reloading VS Code and waiting for the language service to finish loading before editing code.
You can also try the extension "Pylance", which provides outstanding language-service features as well as automatic completion.
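For example, once Pylance is installed, you can select it explicitly in your settings.json (assuming a reasonably recent version of the Python extension):

"python.languageServer": "Pylance"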
At least for the IntelliSense, you could try setting
"python.jediEnabled": false
in your settings.json file. This will let you use a newer version of the IntelliSense engine, but it might need to download first.
But beyond that, I'd suggest using PyCharm instead. It's quite snappy, and it has a free version.
I'm trying to understand if there is a way to avoid memory leaks in Python in general. It has happened a few times already that I had to use external pip packages that gave me memory leak issues.
I would like to know a way to always work around this.
More specifically, does wrapping the guilty code in a separate Python process always help? If not, why? Is there some other way to deal with this?
Thanks
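To make the question concrete, this is the kind of wrapping I mean (a minimal sketch; leaky_call stands in for the offending package code):

import multiprocessing

def leaky_call(x):
    # stand-in for whatever library call leaks memory
    return x * 2

if __name__ == "__main__":
    # maxtasksperchild=1 recycles the worker after every task, so any
    # memory a task leaked is reclaimed by the OS when the worker exits
    with multiprocessing.Pool(processes=1, maxtasksperchild=1) as pool:
        results = pool.map(leaky_call, range(5))
    print(results)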
I'm working on solving a memory leak in my Python application.
Here's the thing - it really only appears to happen on Windows Server 2008 (not R2) but not earlier versions of Windows, and it also doesn't look like it's happening on Linux (although I haven't done nearly as much testing on Linux).
To troubleshoot it, I set up debugging on the garbage collector:
import gc

# report uncollectable objects; DEBUG_INSTANCES/DEBUG_OBJECTS widen
# what gets printed (these two flags exist only in Python 2)
gc.set_debug(gc.DEBUG_UNCOLLECTABLE | gc.DEBUG_INSTANCES | gc.DEBUG_OBJECTS)
Then, periodically, I log the contents of gc.garbage.
Thing is, gc.garbage is always empty, yet my memory usage goes up and up and up.
Very puzzling.
If there's never any garbage in gc.garbage, then I'm not sure what you're trying to do by enabling GC debugging. Sure, it'll tell you which objects are considered for cleanup, but that's not particularly interesting if you end up with no circular references that can't be cleaned up.
If your program is using more and more memory according to the OS, there can generally be four different cases at play:
Your application is storing more and more things, keeping references to each one so they don't get collected.
Your application is creating circular references between objects that can't be cleaned up by the gc module (typically because one of them has a __del__ method.)
Your application is freeing (and re-using) memory, but the OS doesn't want the memory re-used, so it keeps allocating new blocks of memory.
The leak is a real memory leak but in a C/C++ extension module your code is using.
From your description it sounds like it's unlikely to be #1 (as it would behave the same on any OS) and apparently not #2 either (since there's nothing in gc.garbage.) Considering #3, Windows (in general) has a memory allocator that's notoriously bad with fragmented allocations, but Python works around this with its obmalloc frontend for malloc(). It may still be an issue specific to the Windows Server 2008 system libraries that makes it look like your application is using more and more memory, though. Or it may be a case of #4: a C/C++ extension module, or a DLL used by Python or an extension module, with a memory leak.
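If you do want to double-check case #1, one cheap approach is to periodically count live objects by type and watch for a type whose count keeps climbing. A minimal sketch (it only sees objects the garbage collector tracks):

import gc
from collections import Counter

def object_census(top=10):
    # Count every object the collector knows about, grouped by type
    # name; a type whose count grows between calls is a suspect.
    counts = Counter(type(o).__name__ for o in gc.get_objects())
    for name, n in counts.most_common(top):
        print(name, n)

object_census()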
In general, the first culprit for memory leaks in Python is to be found in C extensions.
Do you use any of them?
Furthermore, you say the issue happens only on 2008; I would then check your extensions for any incompatibility, because Vista and 2008 introduced quite a lot of small changes that caused issues in that area.
As an alternative, try executing your application in Windows compatibility mode, choosing Windows XP - this could help solve the issue, especially if it's related to changes in security.
I'm getting seriously frustrated at how slow python startup is. Just importing more or less basic modules takes a second, since python runs down the sys.path looking for matching files (and generating 4 stat() calls - ["foo", "foo.py", "foo.pyc", "foo.so"] - for each check). For a complicated project environment, with tons of different directories, this can take around 5 seconds -- all to run a script that might fail instantly.
Do folks have suggestions for how to speed up this process? For instance, one hack I've seen is to set the LD_PRELOAD_32 environment variable to a library that caches the result of ENOENT calls (e.g. failed stat() calls) between runs. Of course, this has all sorts of problems (potentially confusing non-python programs, negative caching, etc.).
Zipping up as many pyc files as feasible (with the proper directory structure for packages), and putting that zipfile as the very first entry in sys.path (on the best available local disk, ideally), can speed up startup times a lot.
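A rough sketch of building such an archive (assuming your packages live under a "lib/" directory; the names here are examples, not a convention):

import os
import sys
import zipfile

# Bundle the packages into one archive: a single lookup in the zip's
# table of contents replaces the repeated stat() probing of every
# directory on sys.path.
with zipfile.ZipFile("deps.zip", "w") as zf:
    for root, dirs, files in os.walk("lib"):
        for name in files:
            if name.endswith((".py", ".pyc")):
                path = os.path.join(root, name)
                # store paths relative to lib/ so the package layout survives
                zf.write(path, os.path.relpath(path, "lib"))

sys.path.insert(0, "deps.zip")  # or put it first via PYTHONPATH at startup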
The first things that come to mind are:
Try a shorter sys.path
Make sure your modules are compiled to pyc's so they'll load faster (see the sketch after this list)
Make sure you don't double import, or import too much
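For the pyc point above, the standard library can precompile a whole tree ahead of time (a minimal sketch; "myproject" is a placeholder path):

import compileall

# Byte-compile everything under the project directory so imports can
# load cached bytecode instead of compiling source on first import.
compileall.compile_dir("myproject", quiet=1)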
Other than that, are you sure that the disk operations are what's bogging you down? Is your disk/operating system really busy or old and slow?
Maybe a defrag is in order?
When trying to speed things up, profiling is key. Otherwise, how will you know which parts of your code are really the slow ones?
A while ago, I created the runtime and import-profile visualizer tuna, and I think it may be useful here. Simply create an import profile (with Python 3.7+) and run tuna on it:
python3.7 -X importtime -c "import scipy" 2> scipy.log
tuna scipy.log
If you run out of options, you can create a ramdisk to store your Python packages. A ramdisk appears as a directory in your file system, but is actually mapped directly to your computer's RAM. There are instructions for setting one up on Linux/Red Hat.
Beware: A ramdisk is volatile, so you'll also need to keep a backup of your files on your regular hard drive, otherwise you'll lose your data when your computer shuts down.
Something's missing from your premise--I've never seen some "more-or-less" basic modules take over a second to import, and I'm not running Python on what I would call cutting-edge hardware. Either you're running on some seriously old hardware, or you're running on an overloaded machine, or either your OS or Python installation is broken in some way. Or you're not really importing "basic" modules.
If it's any of the first three issues, you need to look at the root problem for a solution. If it's the last, we really need to know what the specific packages are to be of any help.