Memory leak debugging in Python

I have a Python script that runs fine on my main computer, but when I deployed it to an Ubuntu server it started crashing. After puzzling over it for a long time I looked at the system logs: it turned out Ubuntu was forcibly killing the script because it ran out of memory (the server has only 512 MB of RAM). How can I profile the script's memory consumption under different workloads?

Have a look at something like Guppy3, which includes heapy, a 'heap analysis toolset' that can help you find where the memory's being used/held. Some links to information on how to use it are in the project's README.
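A minimal sketch of how heapy is typically used (assuming Guppy3 is installed; run_suspect_workload() is a placeholder for the part of your script you want to measure):

    from guppy import hpy

    h = hpy()
    h.setrelheap()             # count allocations relative to this point
    run_suspect_workload()     # placeholder: the code you suspect of leaking
    print(h.heap())            # breakdown of live objects by type, count and size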

If you have a core, consider using https://github.com/vmware/chap, which will let you look at both Python and native allocations.
Once you have opened the core, "summarize used" is probably a good place to start.
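For reference, a rough idea of the invocation (./core is a placeholder for the path to your actual core file):

    chap ./core          # open the core file in chap's interactive shell
    summarize used       # then, at the chap prompt, summarize allocations still in use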

Related

Memory handling for OSX vs RPi

I have a question I can't seem to find the answer to. I've been using a Raspberry Pi to automate some scripts that pull data from SQL databases. One issue that has come up a few times is that my Python process gets killed, and from the logs it looks like it's due to insufficient RAM. This is on a Raspberry Pi 3B+, so only 1 GB of RAM. My question is: would it be any different running on, say, a 1 GB OSX system? Does another operating system/CPU architecture manage RAM better in this scenario, for example by swapping to disk? Or is this something the operating system cannot influence directly, because it is a Python process?
Note: this is really just for my own understanding of how these factors interact. I am pretty sure rewriting the code to process the data in chunks would work around the problem on the RPi.
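As a rough illustration of the chunking workaround mentioned in the note, a minimal sketch assuming an open DB-API connection named conn and a hypothetical per-chunk handler process_rows():

    # Stream the result set in fixed-size chunks instead of loading it all at
    # once, so peak memory stays small on a 1 GB machine.
    cursor = conn.cursor()                     # conn: an open DB-API connection
    cursor.execute("SELECT * FROM big_table")  # big_table: placeholder name
    while True:
        rows = cursor.fetchmany(1000)          # at most 1000 rows per chunk
        if not rows:
            break
        process_rows(rows)                     # placeholder for per-chunk work
    cursor.close()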

PyCharm reindexing when project is stored on a NAS

I have been working with PyCharm for quite some time now and I recently upgraded my storing system with a NAS.
Everything is working fine except one thing: PyCharm scans through my files to reindex them very often, which makes me lose a lot of time waiting for it to finish.
When the reindexing occurs:
When a script ends
When a debugging session ends
When PyCharm loses the focus, i.e. I use another application
So it happens basically ALL the time, taking quite a long time (several minutes sometimes).
Misc.:
Windows 10
PyCharm Community Edition 2018.1
Netgear - ReadyNas 422
Do you have any ideas on how to solve this issue?
So I contacted IntelliJ support and here is their response:
Working with network drives/folders is not supported officially yet.
Using remote development features is recommended (remote interpreter,
deployment etc). Here is more detailed answer
https://intellij-support.jetbrains.com/hc/en-us/community/posts/207069145/comments/207464249.
What I ended up doing, which is really not ideal, is creating a local copy of my project environment and syncing it with a folder on my NAS. To do so I used the SyncBackPro software.
I'm using PyCharm both at home and at work with code stored on a Samba share (using its remote interpreter feature). I don't encounter constant reindexing, but by default it does not support file system notifications on such shares, so it has no way of knowing when a file changed.
However, as a programmer this shouldn't discourage you! You can drop in your own file system notifier that connects to your remote system (assuming your NAS runs Linux and supports SSH) and thus avoid the performance drop.
I actually wrote such a proxy to run the fsnotifier on a remote system a few years ago and I'm still using it. If you are interested, check out https://github.com/ThiefMaster/fsnotifier-remote
Some things in the repo are outdated (JetBrains removed that stupid file size check, for example), but it should still give you a good basis to start from if you are interested in using it.

Slow page loading on apache when using Flask

The Issue
I am using my laptop with Apache as a server for a local project involving TensorFlow and Python. The project exposes an API written in Flask that services GET and POST requests coming from an app, and possibly another user, on the local network. The problem is that the initial page keeps loading whenever I import tensorflow or the object detection package from the research folder of the TensorFlow GitHub repository; it never finishes, effectively getting stuck. I suspect the issue has to do with the packages being large, but I had no such problem when running the application on Flask's built-in development server.
Are there any pointers I should look at when trying to solve this issue? I checked memory usage and it doesn't seem to rise substantially, and neither does CPU usage.
Debugging process
I am able to serve a basic hello world on the root page quite quickly, and I have isolated the issue to the point where the import takes place: that is where it gets stuck.
The only thing I can think of is limiting the number of threads that are launched, but setting the threads per child to 5 and the number of connections to 5 in the httpd-mpm.conf file didn't help.
The error/access logs don't provide much insight to the matter.
A few notes:
Thus far I have used Flask's development server with multi-threading enabled to serve those requests, but I found it prone to crashing after 5 minutes of continuous running, so I am now trying to serve the Python scripts through Apache via the WSGI interface.
I should also note that I am not serving HTML files, just basic GET and POST requests; I am just viewing the responses in the browser.
If it helps, I also don't use virtual environments.
I am using Windows 10, Apache 2.4 and mod_wsgi 4.5.24
The tensorflow module, being a C extension module, may not be implemented to work properly in Python sub-interpreters. To work around this, force your application to run in the context of the main Python interpreter. Details in:
http://modwsgi.readthedocs.io/en/develop/user-guides/application-issues.html#python-simplified-gil-state-api
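If that guide applies to your setup, the mod_wsgi directive it describes for forcing the main interpreter is WSGIApplicationGroup, placed alongside your existing WSGIScriptAlias in the Apache configuration (a minimal sketch):

    WSGIApplicationGroup %{GLOBAL}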

Odd message and processing hangs

I have a large project that runs on an application server. It does pipelined processing of large batches of data and works fine on one Linux system (the old production environment) and one Windows system (my dev environment).
However, we're upgrading our infrastructure and moving to a new Linux system for production, based on the same image used for the existing production system (we use AWS). The Python version (2.7) and libraries should therefore be identical; we're also verifying this ourselves using file hashes.
Our issue is that when we attempt to start processing on the new server, a very strange message is written to standard out, "Removing descriptor: [some number]", after which the server hangs. I cannot reproduce this on the dev machine.
Has anyone encountered behavior like this in Python before? Besides modules in the Python standard library we are also using eventlet and BeautifulSoup. From the standard library we lean heavily on urllib2, re, cElementTree, and multiprocessing (mostly the pools).
wberry was correct in his comment: I was running into a maximum-descriptors-per-process issue. This seems highly dependent on the operating system. Reducing the size of the batches each worker handled to below the process's file descriptor limit solved the problem.
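For anyone who hits the same thing, a minimal sketch (Linux, standard library only) of checking, and where permitted raising, the per-process file descriptor limit from inside Python:

    import resource

    # Inspect the current per-process limit on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("RLIMIT_NOFILE: soft=%s hard=%s" % (soft, hard))

    # Either keep each batch well below `soft`, or raise the soft limit
    # (never above `hard`) before creating the pools.
    new_soft = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))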

Configuring python

I am new to Python and struggling to find out how to control the amount of memory a Python process can take. I am running Python on a CentOS machine with more than 2 GB of main memory. Python is taking up only 128 MB of this and I want to allocate more to it. I searched all over the internet for the last half an hour and found absolutely nothing! Why is it so difficult to find information on Python-related stuff :(
I would be happy if someone could shed some light on how to configure Python for things like allowed memory size, number of threads, etc.
A link to a site where the most important configurable parameters of Python are described would also be appreciated.
Forget all that: Python just allocates more memory as needed; there isn't a myriad of command-line arguments for the VM as in Java. Just let it run. For all command-line switches you can run python -h or read man python.
Are you sure the machine does not have a 128 MB process limit? If you are running the Python script as CGI inside a web server, it is quite likely that there is a process limit set; you will need to look at the web server configuration.
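As a quick way to verify that from inside the script itself, the standard resource module (Unix only) reports any memory limits imposed on the process; RLIM_INFINITY means no limit (a minimal sketch):

    import resource

    # Print whatever memory-related limits apply to this process.
    # A finite RLIMIT_AS or RLIMIT_DATA value would explain a hard cap.
    for name in ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_RSS"):
        limit = getattr(resource, name, None)   # not every platform defines all
        if limit is not None:
            soft, hard = resource.getrlimit(limit)
            print("%s: soft=%s hard=%s" % (name, soft, hard))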
