Can python implement program-level virtual memory? - python

Recently, I wrote a Python program that requires a lot of memory. The computer did not have enough memory, and the program crashed.
It is known that the operating system can use part of the hard disk as virtual memory, which can solve the problem of insufficient memory. Changing the operating system's virtual memory settings would solve the problem for Python programs, but the scope of that change is too wide.
Can Python implement program-level virtual memory? That is, when memory is insufficient, the hard disk is mapped into the program's memory.
I need to run a Python program with large memory consumption.

Using disk space as memory is usually called swapping.
It is usually simpler to do this yourself than to write a script that does it for you. But if you insist on your script doing it, one way is simply to execute the same commands you would run manually.
Here is a tutorial on how to add swap to a Linux system (first result on Google): https://linuxize.com/post/create-a-linux-swap-file/
Take each command in that tutorial, run it using subprocess, and you will get the desired result.
If you are on Windows (which you did not say), the same idea applies, but I could not quickly find an easy way to do it from the command line.
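For the Linux case, a minimal sketch of that idea in Python (assuming root privileges and the same commands as the linked tutorial; the path and size below are only example values):

import subprocess

def add_swap(path="/swapfile", size_gb=4):
    # Create and enable a swap file by running the usual Linux commands.
    # Requires root privileges; path and size are placeholders for illustration.
    commands = [
        ["fallocate", "-l", f"{size_gb}G", path],  # reserve disk space
        ["chmod", "600", path],                    # restrict permissions
        ["mkswap", path],                          # format the file as swap
        ["swapon", path],                          # enable it
    ]
    for cmd in commands:
        subprocess.run(cmd, check=True)

add_swap()

Once the swap file is enabled, the OS can page your program's memory out to disk when RAM runs out, just as if you had run the commands by hand.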

Related

How can I measure the memory required to run a Jupyter notebook?

I am preparing a Jupyter notebook which uses large arrays (1-40 GB), and I want to give its memory requirements, or rather:
the amount of free memory (M) necessary to run the Jupyter server and then the notebook (locally),
the amount of free memory (N) necessary to run the notebook (locally) when the server is already running.
The best idea I have is to:
run /usr/bin/time -v jupyter notebook,
assume that "Maximum resident set size" is the memory used by the server alone (S),
download the notebook as a *.py file,
run /usr/bin/time -v ipython notebook.py,
assume that "Maximum resident set size" is the memory used by the code itself (C),
and then assume N > C and M > S + C.
I think there must be a better way, as:
I expect the Jupyter notebook server to use additional memory to communicate with the client, etc.,
there is also additional memory used by the client running in a browser,
uncollected garbage contributes to C, but it should not be counted as required memory, should it?
I think your task is hard.
You have no guarantee that Python actually keeps every variable in RAM; the OS may decide to push some of that memory to disk using swap.
You can try to disable swap, but there may be other things that cache data.
You can force garbage collection in Python using the gc module, but the results were inconsistent when I tried it.
I don't know whether M and N are actually useful to you, or whether you are just trying to size a server. If it is the latter, renting servers of increasing size on AWS or DigitalOcean and running a runtime benchmark may give you faster and more reliable results.
The jupyter-resource-usage extension comes preinstalled with the Jupyter notebook found in Anaconda installations. It tells you how much memory you have and how much you need, and its readout is located in the top-right corner.
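If a rough number from inside the running notebook is enough, one sketch (assuming Linux or macOS, since the resource module's ru_maxrss is not available on Windows, and measuring only the kernel process, not the server or the browser):

import resource
import sys

def peak_rss_mib():
    # Peak resident set size of this process.
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024

# run the cells that allocate your large arrays, then:
print(f"peak RSS so far: {peak_rss_mib():.0f} MiB")

This roughly corresponds to C above, and avoids restarting the server with /usr/bin/time for every measurement.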

Instrument memory access of python scripts

My research requires processing memory traces of applications. For C/C++ programs, this is easy using Intel's PIN library. However, as suggested here Use Intel Pin to instrument Python scripts, I may need to instrument the Python runtime itself, and I'm not sure that will represent the true memory behavior of a given Python script because of interpreter overhead (if this is not the case, please comment). The existing Python memory profilers only report the runtime's memory "usage" in terms of heap space, etc.
I ended up building an executable from my Python script using PyInstaller and running my PINTool over it. However, I'm not sure this is the right approach.
Is there any way (a library, or a hack into the Python runtime) to obtain the memory traces accessed by Python scripts?
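For what it's worth, the standard library's tracemalloc can attribute heap allocations to Python source lines, which is allocation-level information rather than the load/store traces a PINTool records; a minimal sketch (the workload below is a placeholder):

import tracemalloc

tracemalloc.start(25)  # keep up to 25 stack frames per allocation

data = [bytes(1024) for _ in range(10_000)]  # placeholder workload

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    # each entry lists a source line and the memory allocated there
    print(stat)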

Python VS code taking too much memory and taking too long to auto complete

I am a beginner learning to program in Python using VS Code, so my knowledge of both VS Code and the Python extension is limited. I am facing two very annoying problems.
Firstly, when the Python extension starts, the memory usage of VS Code jumps from ~300 MB to 1-1.5 GB. If I have anything else open, everything gets extremely sluggish. This seems a bit abnormal to me. I have tried disabling all other extensions, but the memory consumption remains the same. Is there a way (or some settings I can change) to reduce the memory consumption?
Secondly, the IntelliSense autocomplete takes quite a bit of time (sometimes 5-10 minutes) before it kicks in. It also sometimes stops working completely. Any pointers as to what could be causing that?
PS: I am using VS Code version 1.50 (September update) and Anaconda Python 4.8.3.
VS Code is a code editor; in addition to the memory occupied by VS Code itself, it needs to download the corresponding language services and language extensions, and these take up memory as well.
For memory, it is recommended that you uninstall unnecessary third-party extensions and duplicate language services. In addition, it is a good habit to use virtual environments in VS Code: the virtual environment folder lives inside the project, and packages are installed there rather than occupying system-wide resources.
Automatic completion is provided by the corresponding language service and extension. Please try reloading VS Code and waiting for the language service to load before editing code.
You can also try the extension "Pylance", which provides excellent language service features, including automatic completion.
At least for the IntelliSense, you could try setting
"python.jediEnabled": false
in your settings.json file. This lets you use a newer version of the IntelliSense, but it might need to download first.
Beyond that, I'd suggest using PyCharm instead. It's quite snappy, and it has a free version.

Standalone Python interpreter

I want to run a Python program without any underlying OS.
I have read articles on running Python on small microcontrollers, but I want it on a bigger processor (Intel, ARM).
My criteria are:
It could be directly run as binary.
The Python interpreter could be loaded, onto which I can run my program.
At worst, tell me an extremely small, basic OS I can run it on.
Note: I want to use my program like a minimalistic operating system. I should be able to load it like any other OS, and it should be able to access memory and have basic I/O.
Note 2: Will there be limitations in terms of python's functions?
Note: this post describes x86 exclusively, which is, next to ARM, one of the architectures requested by the OP.
It could be directly run as binary.
Binary? Python is not compiled, so no binary is produced. I think you mean just "run a Python program directly" here.
You could implement an additional compilation step, so that Python source files are compiled to bytecode prior to being executed, of course.
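As a rough illustration of such a pre-compilation step under ordinary CPython (the file and directory names below are placeholders), the standard library already exposes it:

import py_compile
import compileall

# compile a single source file to a bytecode (.pyc) file
py_compile.compile("main.py", cfile="main.pyc")

# or compile a whole directory tree ahead of time
compileall.compile_dir("myproject", quiet=1)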
The Python interpreter could be loaded, onto which I can run my program.
"loaded" is a problem here. You need software to load the interpreter, displaying a chicken-egg problem. Intel x86 solves the problem by using a so-called BIOS (Basic I/O System), which starts further, user-defined programs. This "user-defined" program would be your Python interpreter then.
On more modern machines, UEFI is used instead of the legacy BIOS.
I want to use my program like a minimalistic operating system. I
should be able to load it like any other OS, and it should be able to
access memory and have basic I/O.
The aforementioned BIOS provides, as the acronym says, basic I/O functionality such as reading/writing from/to disks, reading/writing from/to the screen, etc. You can either use these basic routines and build abstractions on top of them, or bypass them and rewrite everything from scratch. That includes graphics drivers (a basic VGA driver will suffice), disk drivers (for loading Python files from disk), and a filesystem (a simple FAT-16 is sufficient).
After all, you not only need to write a Python interpreter but a whole development environment from scratch.
Will there be limitations in terms of python's functions?
It depends on what you implement. For networking you need the appropriate drivers; for file handling, a filesystem plus a secondary-storage driver. You are the ultimate master of the system you create, so it is up to you how un/limited your Python environment will be.

Python - Memory Leak

I'm working on solving a memory leak in my Python application.
Here's the thing - it really only appears to happen on Windows Server 2008 (not R2) but not earlier versions of Windows, and it also doesn't look like it's happening on Linux (although I haven't done nearly as much testing on Linux).
To troubleshoot it, I set up debugging on the garbage collector:
gc.set_debug(gc.DEBUG_UNCOLLECTABLE | gc.DEBUG_INSTANCES | gc.DEBUG_OBJECTS)
Then, periodically, I log the contents of gc.garbage.
Thing is, gc.garbage is always empty, yet my memory usage goes up and up and up.
Very puzzling.
If there's never any garbage in gc.garbage, then I'm not sure what you're trying to do by enabling GC debugging. Sure, it'll tell you which objects are considered for cleanup, but that's not particularly interesting if you end up with no circular references that can't be cleaned up.
If your program is using more and more memory according to the OS, there can generally be four different cases at play:
Your application is storing more and more things, keeping references to each one so they don't get collected.
Your application is creating circular references between objects that can't be cleaned up by the gc module (typically because one of them has a __del__ method.)
Your application is freeing (and re-using) memory, but the OS doesn't want the memory re-used, so it keeps allocating new blocks of memory.
The leak is a real memory leak but in a C/C++ extension module your code is using.
From your description it sounds like it's unlikely to be #1 (as it would behave the same on any OS) and apparently not #2 either (since there's nothing in gc.garbage). Considering #3, Windows (in general) has a memory allocator that's notoriously bad with fragmented allocations, but Python works around this with its obmalloc frontend for malloc(). It may still be an issue specific to the Windows Server 2008 system libraries that makes it look like your application is using more and more memory, though. Or it may be a case of #4: a C/C++ extension module, or a DLL used by Python or an extension module, with a memory leak.
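One way to check whether case #1 applies is to periodically count live objects by type and see which counts keep growing; a minimal sketch using only the standard library (where you take the two snapshots is up to your application):

import gc
from collections import Counter

def object_counts():
    # count objects tracked by the garbage collector, grouped by type name
    return Counter(type(o).__name__ for o in gc.get_objects())

before = object_counts()
# ... let the application run for a while ...
after = object_counts()

for name, growth in (after - before).most_common(10):
    print(f"{name}: +{growth}")

If no Python-level type grows while the process size does, that points more towards #3 or #4.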
In general, the first culprit for memory leaks in Python is to be found in C extensions.
Do you use any of them?
Furthermore, you say the issue happens only on 2008; I would then check the extensions for incompatibilities, because Vista and 2008 introduced quite a lot of small changes that caused issues in that area.
As an alternative, try executing your application in Windows compatibility mode, choosing Windows XP - this could help solve the issue, especially if it's related to changes in security.
