How to profile python process for memory usage? - python

We run a corporate forum developed in Python (with the Django framework). We are intermittently observing memory usage spikes on our production setup and want to track down the cause.
These incidents occur at random and, as far as we can tell so far, are not directly related to load.
I have searched the internet, and Stack Overflow in particular, for suggestions but could not find a similar situation.
I did find a lot of profiling utilities, such as Python memory profiler, but these require including the modules at the code level, and since this happens in production, profilers are not much help (we plan to review our implementation in the next release).
We want to investigate the issue as it occurs.
So I would like to know whether there is any tool we can use to create a dump for offline analysis (just like heap dumps in Java).
Any pointers?
Is gdb the only option?
OS: Linux
Python: 2.7 (we currently do not plan to upgrade unless that would help fix this issue)
Cheers!
AJ

Maybe you can try using valgrind. It's a bit tricky, but if you are interested you can follow up here:
How to use valgrind with python?


Python VS code taking too much memory and taking too long to auto complete

I am a beginner learning to program python using VS code so my knowledge about both the VS code and the python extension is limited. I am facing two very annoying problems.
Firstly, when the Python extension starts, the memory usage of VS Code jumps from ~300 MB to 1-1.5 GB. If I have anything else open, everything gets extremely sluggish. This seems a bit abnormal to me. I have tried disabling all other extensions but the memory consumption remains the same. Is there a way (or some settings that I can change) to reduce the memory consumption?
Secondly, the intellisense autocomplete takes quite a bit of time (sometimes 5-10 mins) before it starts to kick in. Also it stops working sometimes completely. Any pointers what could be causing that?
PS: I am using VS code version 1.50 (September update) and python anaconda 4.8.3.
Besides the memory used by VSCode itself as a code editor, it also needs to load the corresponding language services and language extensions, which take up additional memory.
To reduce memory usage, it is recommended that you uninstall unnecessary third-party extensions and duplicate language services. In addition, it is a good habit to use virtual environments in VSCode. The virtual environment folder lives inside the project, and installed packages are stored in the project rather than in the system-wide installation.
Automatic completion is provided by the corresponding language service and extension. Please try reloading VSCode and waiting for the language service to finish loading before editing code.
You can also try the "Pylance" extension, which provides outstanding language service functions, including automatic completion.
At least for the intellisense, you could try changing
"python.jediEnabled": false
in your settings.json file. This will allow you to use a newer version of the intellisense, but it might need to download first.
But beyond that, I’d suggest using PyCharm instead. It’s quite snappy, and it has a free version.

How to distribute a Python program with embedded Firebird SQL for Linux and Windows

Summary:
What is the best (easiest, most flexible, simplest) way to redistribute a Firebird SQL database with Python code in a way that end users can use it without going to the trouble of installing and maintaining Firebird?
Background (somewhat long-winded):
I've been trying to write a program that sifts through stock fundamentals and appraises different companies' stock based on those fundamentals and randomized weights. I noticed that after some length of time, the program seemed to just stall. I did use multi-threading here and there, and I considered dead/livelocks, but apart from just combing through the code and seeing if it makes sense, I can't debug that. I noticed I was also eating up a lot of RAM, since all this data was held in big Python dicts in memory. So I figured putting it in SQL databases would fix that.
After a few weeks, I got the code working again with SQLAlchemy and SQLite. Now the problem is that the appraisal function takes ten minutes (!) per stock. Multiplied by a total of twelve "genomes" competing initially, this would add up to around 200 hours. I started thinking maybe this had to do with SQLite's concurrency locks, or something along those lines, so I started trying to use Firebird, since it is the only other one I know that stores a database in a file.
Question elaboration:
Ideally, I would be able to just throw my code on a disk or a server, take it to another computer with Python on it, and run everything out of the box. That's doable with SQLite. Is it possible with Firebird? I know that there is a separate embedded package for Windows, but that Linux only has the libfbembed library that ships with the Classic server. The docs said that Linux always requires some version of the Firebird server proper to be installed.
Will end users need to do any database administration to make that work; that is, will they need to set up users and such manually, as if I had just given them an fdb file and told them to figure out the rest? Or is it enough to install the basic packages for Firebird? Will I ever be able to get something close to the simplicity of SQLite in redistributing Firebird databases? Is there any special syntax I need to pass to SQLAlchemy/FDB/KInterbasDB to use an embedded server? (I could not find anything about this on either SQLAlchemy's or FDB's websites.) Can my program run seamlessly on Linux and Windows, or will there need to be slightly different setups for each case?
Thanks in advance, anyone who can answer some of these questions.
Well, I can only give partial answers, but I think that'll be enough to start with.
Let's start with Firebird embedded: as you have written, on Linux you have to provide a full install. There is no other way.
HINT: Use the native tgz provided from firebird, not any package delivered from distribution - to avoid dependency hell.
Installing Firebird on Windows : The Windows Firebird Installer is mostly a 'click-through' thing. Luckily you can customize the installer: Install Firebird and look into doc\scripted-install.txt.
HINT: on Win7/8 don't install into %PROGRAM FILES% or %PROGRAM FILES (x86)%
Talking to firebird:
AFAIK you have two options, but for both I don't know if and how they will work with SQLAlchemy:
The fdb module, which comes from the Firebird project. When installing the fdb package, make sure the appropriate fbclient.dll is in the search path.
The pyfirebirdsql module: https://github.com/nakagami/pyfirebirdsql/ which needs no DLL or the like. Partial drawback: it is not as fast as the fdb module, as there is no real database engine. Personally I only use it for short lookups.
Using the fdb module you can also talk to the Firebird services API, for everything from creating and dropping databases to querying header statistics and running backup/restore actions.
That should at least answer the question if the end user needs to perform any database administration.

Tracking global migration to Python 3.x

Python 3.x is looking ever more tempting with cleaned-up syntax (I like it, others may not), new features, and what looks like a gradual progression towards more speed and better multithreading.
But Python 3.x is still held back by lack of 3rd party support. Important packages like Django, Twisted, etc. are not ported. It's hard to get an overview of where the bottlenecks in the migration are, how far it has come, and whether it's progressing at all. The migration dependencies are also hard to map. Also, projects are probably waiting for Python 3.x to offer some major improvement over 2.x that would justify the effort of porting.
Ideally, there would be a site for tracking this migration overall, with (links to) migration plans and dependencies shown so that people willing to help the migration globally could coordinate their efforts and help specific projects. Perhaps also linking to projects' bug tracking systems for relevant migration-related bugs.
But perhaps I'm just not looking hard enough. Does someone know of any efforts to track global migration to Python 3.x?
(By "global", I mean the universe of open source projects built on Python.)
Update:
There's a poll right now on the Python home page which asks about packages you'd like to see ported to Python 3.x.
Georg Brandl has made a script that generates a graph of the number of packages supporting Python 3.
The Link on the CheeseShop front page shows the packages in question: http://pypi.python.org/pypi?%3aaction=browse&c=533&show=all
There is also a (pretty crummy) list of unported packages, ordered by how many packages depend on them: http://onpython3yet.com/ Why do I say it's crummy? Well, because it is generated entirely without manual cleanup, resulting in things like Python being listed as a package. This is to a large extent because people don't know that the "Dependencies" listing isn't a place to list any sort of random dependency; it should list the packages that should be auto-installed when you use easy_install/pip. But in the Django world, for example, they don't know that, so you see things like "django-saddle" depending on Django and Python, and hence not being easy_installable.
That said, the list is interesting, and we see that PIL really should get ported.
Now, this is not anything "global"; it's just the packages on PyPI, which tend to be mostly Python modules, not separate applications. But I think the general trend is visible there anyway.
The Python Package Index (PyPI) allows you to search for Python 3rd-party modules that support Python 3.x. It even has a Python 3 packages link which lists them all.
But that doesn't track individual projects' progress on Python 3 support. It just tells you which projects have achieved it.
Something I'd be interested to see is a graph of the total number/percentage of Python 3 packages in PyPI over time (from Python 3 release until present). I don't know if anyone has tracked this, or if the PyPI administrators have enough history data to produce such graphs.

Writing a kernel mode profiler for processes in python

I would like some guidance on writing a "process profiler" that runs in kernel mode. I am asking for a kernel-mode profiler because I run loads of applications and I do not want my profiler to be swapped out.
By "process profiler" I mean something that would monitor resource usage by a process, including its threads and their statistics.
I wish to write this in Python. Please point me to some modules or helpful resources.
Any guidance or suggestions would be appreciated.
Thanks,
Edit: I would like to add that currently my interest is to write this only for Linux. However, after I build it I will have to support Windows.
It's going to be very difficult to do the process monitoring part in Python, since the python interpreter doesn't run in the kernel.
I suspect there are two easy approaches to this:
Use the /proc filesystem if you have one (you don't mention your OS)
Use dtrace if you have dtrace (again, without the OS, who knows.)
Okay, following up after the edit.
First, there's no way you're going to be able to write code that runs in the kernel, in python, and is portable between Linux and Windows. Or at least if you were to, it would be a hack that would live in glory forever.
That said, though, if your purpose is to profile Python, there are a lot of Python tools available to get information from the Python interpreter at run time.
If instead your desire is to get process information from other processes in general, you're going to need to examine the options available to you in the various OS APIs. Linux has a /proc filesystem; that's a useful start. I suspect Windows has similar APIs, but I don't know them.
If you have to write kernel code, you'll almost certainly need to write it in C or C++.
Don't try to get Python running in kernel space!
You would be much better using an existing tool and getting it to spit out XML that can be sucked into Python. I wouldn't want to port the Python interpreter to kernel-mode (it sounds grim writing it).
The /proc option does sound good.
Some code that reads /proc information to determine memory usage and such. Should get you going:
http://www.pixelbeat.org/scripts/ps_mem.py reads memory information of processes using Python through /proc/smaps, like charlie suggested.
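To give a rough idea of what the /proc approach looks like, here is a minimal sketch (this is not the ps_mem.py script itself, just an illustration). It assumes a Linux system where /proc/<pid>/smaps is readable, and the function names sum_smaps_field and process_memory_kb are made up for this example.

```python
def sum_smaps_field(smaps_text, field="Rss"):
    """Sum every '<field>:  N kB' line found in the text of an smaps file.

    smaps lists one block per memory mapping; each block carries lines
    such as 'Rss:   132 kB' and 'Pss:    66 kB'.
    """
    prefix = field + ":"
    total_kb = 0
    for line in smaps_text.splitlines():
        if line.startswith(prefix):
            # Second whitespace-separated token is the size in kB.
            total_kb += int(line.split()[1])
    return total_kb


def process_memory_kb(pid, field="Rss"):
    """Read /proc/<pid>/smaps (Linux only) and total the given field."""
    with open("/proc/%d/smaps" % pid) as smaps_file:
        return sum_smaps_field(smaps_file.read(), field)
```

For example, process_memory_kb(os.getpid()) would report the resident set size of the calling process in kB, and passing field="Pss" totals the proportional share instead. Reading another user's process usually requires appropriate permissions.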
Some of your comments on other answers suggest that you are a relatively inexperienced programmer. Therefore I would strongly suggest that you stay away from kernel programming, as it is very hard even for experienced programmers.
Why would you want to write something that
is a very complex system (just look at existing profiling infrastructures and how complex they are)
can not be done in python (I don't know any kernel that would allow execution of python in kernel mode)
already exists (oprofile on Linux)
Have you looked at PSI? (http://www.psychofx.com/psi/)
"PSI is a Python module providing direct access to real-time system and process information. PSI is a Python C extension, providing the most efficient access to system information directly from system calls."
It might give you what you are looking for, or at least a starting point.
Edit 2014:
I'd recommend checking out psutil instead:
https://pypi.python.org/pypi/psutil
psutil is actively maintained and has some nifty process monitoring features. PSI seems to be somewhat dead (last release 2009).
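As a small sketch of what monitoring a process with psutil could look like (assuming psutil is installed, e.g. via pip; the helper names process_snapshot and format_snapshot are invented for this example, while the psutil calls themselves are part of its Process API):

```python
def process_snapshot(pid=None):
    """Collect a few resource figures for one process (default: this one)."""
    # Imported here so the rest of the module loads without psutil installed.
    import psutil  # third-party package

    proc = psutil.Process(pid)  # pid=None means the current process
    return {
        "pid": proc.pid,
        "name": proc.name(),
        "rss_bytes": proc.memory_info().rss,
        "num_threads": proc.num_threads(),
        # Blocks for 0.1 s to measure CPU usage over that interval.
        "cpu_percent": proc.cpu_percent(interval=0.1),
    }


def format_snapshot(snap):
    """Render a snapshot dict as a single log-friendly line."""
    fields = dict(snap, rss_kib=snap["rss_bytes"] // 1024)
    return ("%(name)s (pid %(pid)d): rss=%(rss_kib)d KiB "
            "threads=%(num_threads)d cpu=%(cpu_percent).1f%%" % fields)
```

Calling format_snapshot(process_snapshot()) in a loop with a sleep in between would give a crude user-space monitor. psutil also runs on Windows, which matters given the edit to the question.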

Python build/release system

I started using Pyant recently to do various build/release tasks, but have since discovered that development of this project has ended.
I did some research and can't seem to find any other Python build tools that are comparable. Can anyone recommend one? I basically need it to do what Ant does: SVN updates, moving/copying files, archiving, etc., driven by an XML file.
Thanks,
g
Probably the best answer is to use Ant as-is... that is, use the Java version. My second suggestion would be to use scons. It won't take much time using scons before you're asking, "Who ever thought of using XML to script a build?"
It's not completely comparable, but I tend to use fabric. It's more geared towards deployment, with support for SSHing to a production host, running things as root there, etc.
Some people use Paver for build/deployment of Python packages. While I know it works, it does not appeal to me that much.
What about Maven? (http://maven.apache.org/) With the right plugins it can do much more than Ant; it can even use Ant for building if you configure it to.
It's very flexible and supports the full product life cycle. I really recommend you take a look at it.
