My research requires processing memory traces of applications. For C/C++ programs, this is easy using Intel's Pin library. However, as suggested in "Use Intel Pin to instrument Python scripts", I may need to instrument the Python runtime itself, and I'm not sure that would represent the true memory behavior of a given Python script because of the interpreter's overheads (if this is not the case, please comment). Some existing Python memory profilers only report runtime memory "usage", such as heap space usage.
I ended up building an executable from my Python script with PyInstaller and running my Pintool over it. However, I'm not sure this is the right approach.
Is there any way (a library, or some hack into the Python runtime) to get the memory traces accessed by Python scripts?
I have a Python program that reads lines of files and analyzes them. The program intentionally reads many lines into the RAM.
The program started raising MemoryError while appending a line (as a str) to a list. When I check Task Manager (the program runs on Windows 10), I see that the program's memory is stable at 1635 MB and the machine's total memory use is below 50%.
I read that Python does not limit memory, so what could be the reason?
Technical details:
I use Python 3.6.5 on a 64-bit Windows 10 machine with 16 GB of RAM. I run the program from a PowerShell terminal, not through an IDE.
I see that the memory of the program is on 1635MB
Windows EXEs compiled as 32-bit have, by default, a 2GB memory limit even when on 64-bit OS SKUs where plenty more memory is available. You're at 1.6 GB, so you're probably bumping up against this limit.
Make sure you are running the 64-bit version of Python.exe. Python.org's download page defaults to 32-bit for unknown reasons. But if you browse to the bottom of their download page for a given release, you can find the x86-64 version for 64-bit architecture.
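A minimal way to confirm which build is actually running, using only the standard library (a sketch; the helper name python_bitness is illustrative, not from the answer): a 32-bit interpreter has 4-byte pointers and a 64-bit one has 8-byte pointers.

```python
import platform
import struct
import sys


def python_bitness():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(python_bitness(), "bit interpreter:", sys.executable)
    # platform.architecture() usually reports the same thing as a string such as "64bit".
    print(platform.architecture()[0])
```

If this prints 32, installing the x86-64 build and rerunning the script is the first thing to try.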
I have written a bioinformatics Python program that makes heavy use of Python's multiprocessing package. I see discrepancies in the memory used by child processes when run on Mac OS X and on Linux: Mac OS X uses less memory.
When I profile the memory of the child processes on each system, I see a pronounced difference between the platforms. I measure each process when it begins and when it ends as follows (based on this SO answer; note that Mac OS X reports the process's memory usage in bytes and Linux reports it in kilobytes):
import resource
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
Linux reports that each process requires 1 GB, whereas Mac OS X reports that each job takes roughly 300 MB. What's more, on Mac OS X each process seems to start small and grow over its lifetime, whereas on Linux it starts at and stays around 1 GB.
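Because of that unit difference, raw ru_maxrss numbers are easy to misread across the two systems. A small sketch that normalizes the value to bytes, assuming only the two platforms discussed here (treating every platform other than "darwin" as kilobyte-based is an assumption); maxrss_to_bytes and peak_rss_bytes are hypothetical helpers:

```python
import sys


def maxrss_to_bytes(raw_maxrss, platform_name=sys.platform):
    """Convert a ru_maxrss value to bytes.

    macOS ("darwin") reports ru_maxrss in bytes, Linux in kilobytes; every
    other platform is treated like Linux here, which is an assumption.
    """
    if platform_name == "darwin":
        return raw_maxrss
    return raw_maxrss * 1024


def peak_rss_bytes():
    """Peak resident set size of the calling process, in bytes (POSIX only)."""
    import resource  # the resource module does not exist on Windows

    return maxrss_to_bytes(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```

On Linux, calling resource.getrusage(resource.RUSAGE_CHILDREN) in the parent reports the peak RSS of the largest child that has terminated and been waited for, which can be simpler than measuring inside every child.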
So my questions:
Does this have something to do with the way each platform handles forking? Perhaps Mac OS X spawns a new process whereas Linux forks by default. I am using Python 2.7, so I can't control the start method of processes (I think).
Am I right in thinking that this is a forking issue? Has anyone else come across this problem? How can I control the memory usage in Linux?
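For reference, on Python 3.4 and later (not the Python 2.7 used here) the start method can be chosen explicitly through multiprocessing.get_context. A minimal sketch with a hypothetical make_pool helper; with "spawn", each worker starts in a fresh interpreter instead of as a forked copy of the parent, which is one way to test the forking hypothesis:

```python
import multiprocessing


def make_pool(processes, start_method="spawn"):
    """Create a Pool that uses an explicit start method (Python 3.4+ only).

    "spawn" starts each worker in a fresh interpreter, so workers do not
    inherit the parent's memory image the way forked workers do.
    """
    ctx = multiprocessing.get_context(start_method)
    return ctx.Pool(processes)
```

With "spawn", the calling code must sit under an `if __name__ == "__main__":` guard and the worker function must be defined at module level so it can be pickled, for example `with make_pool(4) as pool: results = pool.map(analyse, jobs)`, where analyse and jobs stand in for your own worker function and inputs.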
Is Python generally slower on Windows than on a *nix machine? Python seems to blaze on my Mac OS X machine, whereas it seems to run slower on my Windows Vista machine. The machines have similar processing power, and the Vista machine has 1 GB more memory.
I particularly notice this in Mercurial but I figure this may simply be how Mercurial is packaged on windows.
I wanted to follow up on this, and I found something that I believe is 'my answer'. It appears that Windows (Vista, which is where I notice this) is not as fast at handling files. This was mentioned by tony-p-lee.
I found this comparison of Ubuntu vs. Vista vs. Windows 7. Their results are interesting, and as they say, you need to take them with a grain of salt. But I think the results point to the cause. Python, which I feel was indirectly tested, is about equivalent, if not a tad faster, on Windows. See the section "Richards benchmark".
[Graph: file-transfer benchmark, Ubuntu vs. Vista vs. Windows 7 (source: tuxradar.com)]
I think this specifically helps address the question, because Hg is really just a series of file reads, copies and general file handling. It's likely that this is causing the delay.
http://www.tuxradar.com/content/benchmarked-ubuntu-vs-vista-vs-windows-7
No real numbers here, but it certainly feels like Python's startup time is slower on Windows. I regularly switch between Ubuntu at home and Windows 7 at work, and Python starts up an order of magnitude faster on Ubuntu, despite my work machine being at least 4x faster.
As for runtime performance, it feels about the same for "quiet" applications. GUI operations using Tk are definitely slower on Windows. Console applications on Windows are also slower, but this is most likely because the Windows console (cmd) renders output slowly rather than because Python runs slowly.
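Since there are no numbers here, one way to get some is to time how long the interpreter takes to start, do nothing and exit. A rough sketch with a hypothetical helper; passing ("-S",) tells Python not to import the site module at startup, which helps separate core interpreter start-up from the work site does:

```python
import statistics
import subprocess
import sys
import time


def interpreter_startup_seconds(runs=5, extra_args=()):
    """Median wall-clock time for this interpreter to start, run "pass", and exit."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *extra_args, "-c", "pass"], check=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    print("default: %.3f s" % interpreter_startup_seconds())
    print("with -S: %.3f s" % interpreter_startup_seconds(extra_args=("-S",)))
```

Running the same script on both machines gives a like-for-like comparison instead of an impression.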
Perhaps Python depends on opening a lot of files (importing different modules).
Windows doesn't handle file opens as efficiently as Linux.
Or perhaps Linux has more utilities that depend on Python, so Python scripts and modules are more likely to already be in the system's file cache.
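The file-open hypothesis can be checked with a micro-benchmark. This sketch (a hypothetical helper, standard library only) times repeated open/close calls on one small temporary file; it measures only the open path for a single, probably cached file, not the full cost of the import machinery, and any on-access virus scanning will show up in the result:

```python
import os
import tempfile
import time


def seconds_per_open(n=2000):
    """Average wall-clock time to open and close one small existing file, over n tries."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # only the path is needed; each loop iteration reopens it
    try:
        start = time.perf_counter()
        for _ in range(n):
            with open(path, "rb"):
                pass
        return (time.perf_counter() - start) / n
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("%.1f microseconds per open/close" % (seconds_per_open() * 1e6))
```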
I run Python locally on Windows XP and 7 as well as on OS X on my MacBook. I've seen no noticeable performance differences in the command-line interpreter, wxPython apps run the same, and Django apps also perform virtually identically.
One thing I noticed at work was that the Kaspersky virus scanner tended to slow the Python interpreter WAY down. It would take 3-5 seconds for the Python prompt to appear and 7-10 seconds for Django's test server to fully load. Properly disabling its active scanning brought startup times back to near zero.
With the OS and network libraries, I can confirm slower performance on Windows, at least for versions <= 2.6.
I wrote a CLI podcast-fetcher script that ran great on Ubuntu but wouldn't download anything faster than about 80 kB/s (where ~1.6 MB/s is my usual max) on either XP or 7.
I could partially correct this by tweaking the buffer size for download streams, but there was definitely a major bottleneck on Windows, either in networking or in I/O, that simply wasn't a problem on Linux.
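The original script isn't shown, so this is only a sketch of what tweaking the buffer size can look like: read the stream in chunks of a configurable size and write them out. copy_stream is a hypothetical helper, and the 64 KiB default is an assumed starting point, not a value from the answer.

```python
DEFAULT_CHUNK = 64 * 1024  # assumed starting point; tune per system


def copy_stream(src, dst, chunk_size=DEFAULT_CHUNK):
    """Copy binary file-like src to dst in chunk_size pieces; return bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

With Python 3's urllib.request this might be used as `with urllib.request.urlopen(url) as resp, open(path, "wb") as out: copy_stream(resp, out, chunk_size=256 * 1024)`, where url and path are placeholders. If the byte count isn't needed, shutil.copyfileobj(src, dst, length) performs the same chunked loop.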
Based on this, it seems that system and OS-interfacing tasks are better optimized for *nixes than they are for Windows.
Interestingly, I ran a direct comparison of a popular Python app on a Windows 10 x64 machine (admittedly low-powered) and on an Ubuntu 14.04 VM running on the same machine.
I have not tested load speeds etc, but am just looking at processor usage between the two. To make the test fair, both were fresh installs and I duplicated a part of my media library and applied the same config in both scenarios. Each test was run independently.
On Windows, Python was using 20% of my processor, and it drove System Compressed Memory up to 40% (this is an old machine with 6 GB of RAM).
With the VM on Ubuntu (linked to my Windows file system), processor usage is about 5%, with compressed memory down to about 20%.
This is a huge difference. My trigger for running this test was that the Python app was driving my CPU to 100% and failing to operate. I have now been running it in the VM for 2 weeks, and my processor usage is down to 65-70% on average. So in both long- and short-term tests, and taking into account the overhead of running a VM and a second operating system, this Python app is significantly faster on Linux. I can also confirm that the Python app responds better, as does everything else on my machine.
Now this could be very application specific, but it is at minimum interesting.
The PC has an old AMD II X2 X265 processor, 6 GB of RAM and an SSD (which Python ran from; the VM used a regular 5200 rpm hard disk that is also used for a lot of other things, including recording two CCTV cameras).
My team is incorporating the Python 2.4.4 runtime into our project in order to leverage some externally developed functionality.
Our platform has a 450 MHz SH4 application core and limited memory for use by the Python runtime and application.
We have ported Python, but initial testing has highlighted the following hurdles:
a) start-up times for the Python runtime can be as long as 25 seconds (when importing the libraries concerned and, in turn, their dependencies)
b) Python never seems to release memory to the OS during garbage collection; the only recourse is to close the runtime and restart it (incurring the start-up delays noted above, which is often impractical)
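To find where the start-up time goes, it can help to time each top-level import separately. A minimal Python 3 sketch with a hypothetical helper; Python 2.4 has no importlib, but __import__(name) can be timed the same way, and on Python 3.7+ `python -X importtime` prints a per-module breakdown to stderr:

```python
import importlib
import time


def time_imports(module_names):
    """Return {name: seconds} for importing each module, in the given order.

    A module that is already loaded (or was pulled in by an earlier import in
    the list) costs almost nothing, so run this in a fresh interpreter.
    """
    timings = {}
    for name in module_names:
        start = time.perf_counter()
        importlib.import_module(name)
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # "json" and "decimal" are stand-ins for the libraries your project imports.
    for name, seconds in time_imports(["json", "decimal"]).items():
        print("%-10s %.3f s" % (name, seconds))
```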
If we can mitigate these issues, our use of Python would improve substantially. Any guidance from the SO community would be very valuable, especially from anyone who knows the internals of how the Python execution engine operates.
Perhaps it is hard to believe, but CPython 2.4 never releases memory to the OS. This is reportedly fixed in Python 2.5.
In addition, processor performance improved in Python 2.5 and again in Python 2.6.
See the C API section of "What's New in Python 2.5" and look for the item on Evan Jones's patch to obmalloc.
Alex Martelli (whose advice should always at least be considered) says that using multiple processes is the only way to free memory. If you cannot use multiprocessing (a module added in Python 2.6), os.fork is at least available. Using os.fork in the most primitive manner (fork one worker process at the beginning, wait for it to finish, fork a new one, and so on) is still better than relaunching the interpreter and paying 25 seconds each time.
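The primitive fork pattern described above can be sketched as follows (Python 3, POSIX only, so not directly usable on 2.4; run_in_forked_child is a hypothetical helper, not from the answer). The parent pays the import cost once; each job runs in a forked child whose memory is returned to the OS when it exits, and the result comes back through a pipe:

```python
import os
import pickle


def run_in_forked_child(func, *args):
    """Run func(*args) in a forked child process and return its result.

    The child's memory goes back to the OS when it exits, so the parent
    never accumulates the job's peak memory usage.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compute, send (ok, value) to the parent, then exit immediately.
        status = 1
        try:
            os.close(read_fd)
            try:
                payload = pickle.dumps((True, func(*args)))
            except Exception as exc:
                payload = pickle.dumps((False, exc))
            with os.fdopen(write_fd, "wb") as out:
                out.write(payload)
            status = 0
        finally:
            # os._exit skips atexit handlers and flushing of stdio buffers inherited from the parent.
            os._exit(status)
    # Parent: read until the child closes the pipe, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        payload = inp.read()
    os.waitpid(pid, 0)
    if not payload:
        raise RuntimeError("child exited without sending a result")
    ok, value = pickle.loads(payload)
    if not ok:
        raise value  # re-raise the child's exception in the parent
    return value
```

The parent still holds whatever it allocated itself, so the memory-heavy work should happen inside func, in the child.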