I'm trying to understand whether there is a general way to avoid memory leaks in Python. It has already happened a few times that I had to use external pip packages that gave me memory leak issues.
I would like to know a way to always monkey-patch around this.
More specifically, does wrapping the guilty code in a separate Python process always help? If not, why not? Is there some other way to deal with this?
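For reference, this is the kind of process-wrapping I mean; leaky_lib is just a hypothetical stand-in for the offending package:

    import multiprocessing

    def leaky_task(arg):
        import leaky_lib  # hypothetical stand-in for the leaking package
        return leaky_lib.process(arg)

    if __name__ == "__main__":
        # run the suspect code in a child process; when the child exits,
        # the OS reclaims all of its memory, leaked or not
        worker = multiprocessing.Process(target=leaky_task, args=("data",))
        worker.start()
        worker.join()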
Thanks
I am writing a data-mining script to pull information out of a program called Agisoft PhotoScan for my lab. PhotoScan uses its own Python library (and I'm not sure how to access pip for this particular build), which has caused me a few problems installing other packages. After dragging, dropping, and praying, I've gotten a few packages to work, but I'm still facing a memory leak. If there is no way around it, I can try to install some more packages to weed out the leak, but I'd like to avoid this if possible.
My understanding of Python garbage collection so far is that when an object loses all references to it, it should be deleted. I used sys.getrefcount() to check all my variables, and the counts all stay constant. I have a hunch that the issue could be in the mysql-connector package I installed, or in PhotoScan itself, but I am not sure how to go about testing. I will be more than happy to provide code if that will help!
It turns out that the memory leak was indeed with the PhotoScan program. I've worked around it by having a separate script open and close it, running my original script once each time. Thank you all for the help!
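For anyone hitting the same thing, the wrapper looked roughly like this (photoscan.sh, its -r script flag, and original_script.py are placeholder names for my setup; adjust to yours):

    import subprocess

    # launch a fresh PhotoScan instance per dataset, so whatever it
    # leaks is released by the OS when the process exits
    for dataset in ["chunk1", "chunk2", "chunk3"]:
        subprocess.check_call(["photoscan.sh", "-r", "original_script.py", dataset])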
In my CI/CD environment I have multiple projects that use mostly the same tests, with a bit of variation. Since the tests are mostly the same, and different projects/builds just use them a bit differently, I am looking for a way (if there is one) to package the tests themselves and pass them around the projects. EDIT: Packaging the tested code is not possible.
The ultimate usage will be something like this:
    pip install <test-package>
    pytest -m <some-mark-depending-on-build/project> --<additional-variables>
Is there a way to do this?
EDIT: If there is, please point me toward a solution.
Thanks in advance.
Keeping it here for reference.
The way to do this is to create a test package that can be run as a Python module, via a __main__.py.
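A minimal sketch of that layout, with shared_tests as a hypothetical package name:

    # shared_tests/__main__.py
    import sys
    import pytest

    if __name__ == "__main__":
        # forward CLI args (e.g. -m <some-mark>) to pytest and point it
        # at the tests bundled inside this installed package
        sys.exit(pytest.main(sys.argv[1:] + ["--pyargs", "shared_tests"]))

After pip-installing the package, each project can run it with python -m shared_tests -m <its-mark>.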
After researching and some testing, I've concluded that in my case this will create more code to maintain than I would otherwise properly reuse.
I'm developing a website in Python using the (excellent) Flask framework. In the backend code I use APScheduler to run some cron-like jobs every minute, and I use NumPy to calculate some standard deviations. I don't know whether the usage of these modules matters, but I thought I'd better mention them, since I guess they might be the most likely cause.
Anyway, in the middle of operation, Python itself seemed to crash, giving the following:
    *** Error in `/usr/bin/python': double free or corruption (out): 0x00007f7c3c017260 ***
I might be wrong, but as far as I know this is pretty serious. So my question is: what could cause this, and how can I get more information about a crash like this? What does the (out) mean? I can't really reproduce it, but it has happened 4 times in about 5 months now. I'm running the standard Python 2.7 on Ubuntu Server 14.04.
I searched around and found a couple of discussions about similar crashes, in which one thing keeps coming back: concurrency seems to be related somehow (which is why I mentioned the usage of APScheduler).
If anybody has any idea how I could debug this or what could possibly be the cause of this; all tips are welcome!
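For reference, one way to capture more detail the next time it happens: the faulthandler module (stdlib in Python 3.3+, available for 2.7 via pip install faulthandler) prints the Python-level traceback on fatal signals, including the SIGABRT that glibc raises on a double free:

    import faulthandler

    # installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL;
    # on a crash, the Python traceback is written to stderr
    faulthandler.enable()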
I had a similar issue.
I had an unused dependency: spacy == 1.6.0
Removing it solved the issue.
(Maybe upgrading the spacy version could also work.)
spacy is written in Cython, an optimising static compiler for Python, so it might be related to some memory-allocation bug in spacy's implementation.
I have a directory with several Python modules in it. Each module is mutually exclusive of all the others, but after lots of trial and error I have found that each module horks out when using the multiprocessing functionality in Python. I have used the join() function on each process, and it's just not working like I want.
What I am really looking for is the ability to drop new mutually exclusive Python modules into the directory and have them invoked when the directory is launched. Does anyone know how to do this?
It sounds to me like you are asking about plugin architecture and sandboxing. Does that sound right?
The plugin component has been done and written about elsewhere. SO has code examples of basic ways to import all the files.
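For example, a minimal loader along those lines, assuming the modules live in a package directory named plugins (hypothetical):

    import importlib
    import pkgutil

    def load_plugins(package_name="plugins"):
        # import every module found in the package's directory
        package = importlib.import_module(package_name)
        return [
            importlib.import_module(package_name + "." + name)
            for _, name, _ in pkgutil.iter_modules(package.__path__)
        ]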
The sandbox part is going to be harder. Have a look at RestrictedPython, the Restricted Execution docs, and the generally older but nevertheless helpful discussions of sandboxing.
If you aren't worried about untrusted code but rather want to isolate errors, you could just wrap each module in a generic try/except that handles all errors. This would make debugging harder but would ensure that an error in one module didn't bring down the whole system.
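A sketch of that wrapping, assuming each module exposes some run() entry point (the module names and run() are hypothetical):

    import importlib
    import logging

    logging.basicConfig(level=logging.INFO)

    for name in ["module_a", "module_b"]:  # hypothetical module names
        try:
            importlib.import_module(name).run()
        except Exception:
            # log the failure and keep the other modules running
            logging.exception("module %s failed; skipping", name)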
If you aren't worried about untrusted code but do need to have each file totally isolated, then you might be best off looking into the various systems of interprocess communication. I've actually had some luck using Redis for this (which sounds ridiculous but has actually been very easy and effective).
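A sketch of the Redis route, using the redis-py client (pip install redis); the "results" key name is arbitrary:

    import json
    import redis

    r = redis.Redis()

    # worker process: push a result onto a shared list
    r.rpush("results", json.dumps({"module": "module_a", "ok": True}))

    # coordinator process: block until a result arrives
    _key, raw = r.blpop("results")
    print(json.loads(raw))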
Anyway, hopefully some of that helps you. Without more information it's hard to provide more than general thoughts and a guide to better googling.
Is there any other way to debug SWIG extensions except for doing the following?

    gdb python stuff.py
I have wrapped the legacy library libkdtree++ and followed all the SWIG-related memory management points (borrowed ref vs. owned ref, etc.). But still, I am not sure whether my binding is leaking memory. It would be helpful to be able to debug each exposed function step by step: starting from Python, going via the C glue binding into C space, and returning back.
Is there already such a possibility?
gdb 7.0 supports Python scripting. It might help you in this particular case.
Well, for debugging, you use a debugger ;-).
When debugging, it may be a good idea to configure Python with --with-pydebug and recompile. The interpreter then performs additional checks.
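For instance, a pydebug build exposes sys.gettotalrefcount(), which is handy for spotting leaked references:

    import sys

    # only exists in a --with-pydebug build; counts every live reference
    if hasattr(sys, "gettotalrefcount"):
        print(sys.gettotalrefcount())
    else:
        print("not a pydebug build")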
If you are looking for memory leaks, there is a simple way:
Run your code over and over in a loop, and watch Python's memory consumption.
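A rough sketch of that, using the stdlib resource module (Unix-only; workload() is a hypothetical stand-in for the wrapped libkdtree++ call):

    import resource

    def workload():
        pass  # call the wrapped function here, e.g. my_binding.query(...)

    for i in range(100000):
        workload()
        if i % 10000 == 0:
            # peak resident set size so far (kilobytes on Linux)
            rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            print("iteration %d: max RSS = %d kB" % (i, rss))

If the max RSS keeps climbing without bound across iterations, something in the binding is holding on to memory.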