I have written a program that does some processing, and I want to reduce its execution time. I think this could be done by running it from my RAM, which is 1 GB.
So, will running my program from RAM make any difference to the execution time, and if so, how can it be done?
Believe it or not, when you use a modernish computer system, most of your computation is already done from RAM. (Well, technically it's done from processor registers, but those are filled from RAM, so let's brush that aside for the purposes of this answer.)
This is thanks to the magic we call caches and buffers. A disk "cache" in RAM is filled by the operating system whenever something is read from permanent storage. Any further reads of that same data (until and unless it is "evicted" from the cache) only read memory instead of the permanent storage medium.
A "buffer" works similarly for write output, with data first being written to RAM and then eventually flushed out to the underlying medium.
So, in the course of normal operation, any run of your program after the first (unless you've done a lot of work in between) will already be from RAM. Ditto the program's input file: if it's been read recently, it's already cached in memory! So you're unlikely to be able to speed things up by putting it in memory yourself.
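You can see the cache at work with a minimal Python 3 sketch like this (data.bin is a hypothetical large file standing in for your program's input): time two consecutive reads of the same file. The second read is typically much faster because it is served from the page cache in RAM rather than from the disk.

    import time

    PATH = "data.bin"  # hypothetical large input file; substitute your own

    def timed_read(path):
        # Read the whole file and report how long it took.
        start = time.perf_counter()
        with open(path, "rb") as f:
            data = f.read()
        print("read %d bytes in %.3f s" % (len(data), time.perf_counter() - start))

    timed_read(PATH)  # first read: may have to go to the disk
    timed_read(PATH)  # repeat read: usually served from the RAM page cache

On a freshly booted machine the difference is usually obvious; on a warm system even the first read may already be cached.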
Now, if you want to force things for some reason, you can create a "ramdisk", which is a filesystem backed by RAM. In Linux the easy way to do this is to mount "tmpfs" or put files in the /dev/shm directory. Files on a tmpfs filesystem go away when the computer loses power and are stored entirely in RAM, but otherwise behave like normal disk-backed files.

From the way your question is phrased, I don't think this is what you want. I think your real answer is "whatever performance problems you think you have, this is not the cause, sorry".
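For completeness, if you did want to experiment with that anyway, here is a tiny sketch, assuming a Linux system where /dev/shm is a tmpfs mount and using a hypothetical input file input.dat:

    import shutil

    SRC = "input.dat"            # hypothetical input file on disk
    DST = "/dev/shm/input.dat"   # /dev/shm is a RAM-backed tmpfs on most Linux systems

    shutil.copy(SRC, DST)        # one-time copy into the ramdisk

    with open(DST, "rb") as f:
        data = f.read()          # subsequent reads are served from RAM
    # ... run your processing on `data` here ...

Whether this helps at all depends on whether the disk was ever the bottleneck in the first place.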
Is a file stored to disk when it is only present for a fraction of a second?
I'm running Python 3.7 on Ubuntu 18.04.
I use a Python script that extracts JSON files from a zip package. The resulting files are processed and then deleted.
Since I'm running on an SSD, I want to spare it unnecessary write cycles.
Does Linux buffer such writes in RAM, or do I need to assume that I'm forcing my poor SSD through several thousand write cycles per second?
Linux may cache file operations under some circumstances, but you're looking for it to optimize by avoiding ever committing a whole sequence of operations to storage at all, based on there being no net effect. I do not think you can expect that.
It sounds like you might be better served by using a different filesystem in the first place. Linux has memory-backed file systems (served by the tmpfs filesystem driver, for example), so perhaps you want to set up such a filesystem for your application to use for these scratch files.1 Do note, however, that these are backed by virtual memory, so, although this approach should reduce the number of write cycles on your SSD, it might not eliminate all writes.
1 For example, see https://unix.stackexchange.com/a/66331/289373
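As a rough sketch of that idea, assuming a Linux system where /dev/shm is a tmpfs mount and a hypothetical archive named input.zip, you can point a temporary directory at the tmpfs mount so the extracted JSON files never touch the SSD:

    import json
    import tempfile
    import zipfile
    from pathlib import Path

    ARCHIVE = "input.zip"  # hypothetical zip package containing JSON files

    # /dev/shm is a tmpfs mount on most Linux distributions, so these scratch
    # files live in RAM (spilling to swap under memory pressure), not on the SSD.
    with tempfile.TemporaryDirectory(dir="/dev/shm") as scratch:
        with zipfile.ZipFile(ARCHIVE) as zf:
            zf.extractall(scratch)

        for path in Path(scratch).glob("*.json"):
            with open(path) as f:
                record = json.load(f)
            # ... process `record` here ...
    # The scratch directory and everything in it is removed automatically here.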
I have written a program that expands a database of prime numbers. The program is written in Python and runs on Windows 10 (x64) with 8 GB of RAM.
The program stores all the primes it has found in a list of integers for further calculations and uses approximately 6-7 GB of RAM while running. During some runs, however, this figure has dropped to below 100 MB. The memory usage then stays low for the duration of the run, though it increases as expected as more numbers are added to the prime array. Note that not all runs result in a memory drop.
Memory usage measured with Task Manager.
These seemingly random drops have led me to the following theories:
There's a bug in my code, making it drop critical data and messing up the results (most likely but not supported by the results)
Python just happens to optimize my code extremely well after a while.
Python or Windows is compensating for my over-usage of the RAM by cleaning out portions of my prime-number array that aren't used that much. (eventually resulting in incorrect calculations)
Python or Windows is compensating for my over-usage of the RAM by allocating disk space instead of RAM.
Questions
What could be the reason(s) for this memory drop?
How does python handle programs that use more than available RAM?
How does Windows handle programs that use more than available RAM?
1, 2, and 3 are incorrect theories.
4 is correct. Windows (not Python) is moving some of your process memory to swap space. This is almost totally transparent to your application - you don't need to do anything special to respond to or handle this situation. The only thing you will notice is your application may get slower as information is written to and read from disk. But it all happens transparently. See https://en.wikipedia.org/wiki/Virtual_memory for more information.
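If you want to watch this happen from inside the program, here is a minimal sketch using the third-party psutil package (an assumption on my part; it is not required for anything above): the resident figure drops while system swap/page-file usage rises, yet your list of primes is untouched.

    import psutil  # third-party package: pip install psutil

    proc = psutil.Process()            # the current Python process
    mem = proc.memory_info()
    print("resident in RAM : %6.1f MB" % (mem.rss / 1e6))  # roughly what Task Manager reports
    print("virtual size    : %6.1f MB" % (mem.vms / 1e6))  # committed memory, including pages moved to the page file

    swap = psutil.swap_memory()
    print("system swap used: %6.1f MB" % (swap.used / 1e6))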
Have you heard of paging? Windows writes some RAM pages (those that haven't been used in a while) out to your hard drive to keep your computer from running out of RAM and ultimately crashing.
Only Windows deals with this memory management, not Python. And if you use Windows 10, it will also compress memory pages, somewhat like a zip file.
I'm new to python so I apologize for any misconceptions.
I have a Python file that needs to read/write to stdin/stdout many, many times (hundreds of thousands) for a large data science project. I know this is not ideal, but I don't have a choice in this case.
After about an hour of running (close to halfway completed), the process gets terminated on my Mac due to "Low Swap", which I believe refers to a lack of memory. Apart from the reads/writes, I'm hardly doing any computing and am really just trying to get this to run successfully before going any further.
My question: does writing to stdin/stdout a few hundred thousand times use up that much memory? The file basically needs to loop through some large lists (15k ints) and do that a few thousand times. I've got 500 GB of hard drive space and 12 GB of RAM and am still getting the errors. I even spun up an EC2 instance on AWS and STILL had memory errors. Is it possible that I have some sort of memory leak in the script even though I'm hardly doing anything? Is there any way I can reduce the memory usage to run this successfully?
Appreciate any help.
the process gets terminated on my Mac due to "Low Swap", which I believe refers to a lack of memory
Swap space is not part of your RAM; it is space on your drive that the operating system uses as an overflow area for main memory.

When a process reads a file, the data is pulled into main memory (the page cache). Those pages are "clean", so the kernel can simply drop them whenever it needs the memory for something else.

Writes are different: dirty data has to be buffered in RAM until it is flushed to the underlying file or pipe. But the usual culprit is memory your process allocates and keeps referencing, which cannot be dropped, only paged out. If that keeps growing, RAM fills up, the least recently used (LRU) pages are pushed out to swap, and eventually swap fills up as well. On macOS, that is the point where the kernel starts terminating processes with the "Low Swap" message you are seeing.
Is it possible that I have some sort of memory leak in the script even though I'm hardly doing anything?
Possibly
Is there any way I can reduce the memory usage to run this successfully?
One way is to think about how you are managing your data. Cached reads don't create swap pressure, because clean pages can simply be dropped without needing to be saved anywhere. The pressure comes from data your process accumulates and keeps referencing. Instead of building everything up in memory and writing it all out at the end, process the input in chunks and write out (and flush) your results periodically, so the working set stays bounded.
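A minimal sketch of that pattern (process_line is a hypothetical stand-in for your real per-record work): handle each line as it arrives and flush the output periodically, instead of accumulating results in large lists.

    import sys

    def process_line(line):
        # Hypothetical placeholder for your real computation.
        return line.strip().upper()

    for count, line in enumerate(sys.stdin, 1):
        sys.stdout.write(process_line(line) + "\n")
        if count % 10000 == 0:
            sys.stdout.flush()   # push buffered output out regularly

    sys.stdout.flush()

Because each line is processed and then forgotten, the working set stays roughly constant no matter how many hundreds of thousands of lines pass through.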
I'm trying to identify a memory leak in a Python program I'm working on. I'm currently running Python 2.7.4 on 64-bit Mac OS. I installed heapy to hunt down the problem.
The program involves creating, storing, and reading a large database using the shelve module. I am not using the writeback option, which I know can create memory problems.
Heapy shows that memory usage is roughly constant during program execution. Yet my Activity Monitor shows rapidly increasing memory use. Within 15 minutes, the process has consumed all my system memory (16 GB), and I start seeing page-outs. Any idea why heapy isn't tracking this properly?
Take a look at this fine article. You are, most likely, not seeing memory leaks but memory fragmentation. The best workaround I have found is to identify what the output of your large working set operation actually is, load the large dataset in a new process, calculate the output, and then return that output to the original process.
This answer has some great insight and an example, as well. I don't see anything in your question that seems like it would preclude the use of PyPy.
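As a rough sketch of that workaround (compute_output and primes.db are hypothetical stand-ins for your shelve-heavy operation), run the large working-set step in a child process with multiprocessing, so that all of the child's memory, fragmented or not, is handed back to the OS when the worker is torn down:

    from multiprocessing import Pool

    def compute_output(db_path):
        # Hypothetical stand-in: open the shelve, do the large working-set
        # calculation, and return only the small final result to the parent.
        import shelve
        db = shelve.open(db_path)
        try:
            return len(db)        # placeholder computation
        finally:
            db.close()

    if __name__ == "__main__":
        pool = Pool(processes=1)
        try:
            # The heavy lifting happens in the child process; its heap is
            # freed wholesale when the pool shuts the worker down.
            result = pool.apply(compute_output, ("primes.db",))
            print(result)
        finally:
            pool.close()
            pool.join()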
I'm dealing with some big (tens of millions of records, around 10 GB) database files using SQLite. I'm doing this through Python's standard sqlite3 interface.
When I try to insert millions of records into the database, or create indices on some of the columns, my computer slowly runs out of memory. If I look at the normal system monitor, it looks like the majority of the system memory is free. However, when I use top, it looks like I have almost no system memory free. If I sort the processes by their memory consumption, none of them uses more than a couple of percent of my memory (including the Python process that is running sqlite).
Where is all the memory going? Why do top and Ubuntu's system monitor disagree about how much system memory I have? Why does top tell me that I have very little memory free, and at the same time not show which process(es) is (are) using all the memory?
I'm running Ubuntu 11.04, sqlite3, and Python 2.7.
Ten to one says you are being confused by Linux's filesystem buffer/cache.
See:
ofstream leaking memory
https://superuser.com/questions/295900/linux-sort-all-data-in-memory/295902#295902
Test it by running (as root):
echo 3 > /proc/sys/vm/drop_caches
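If you would rather just look at the numbers, here is a small sketch (Linux-specific, works in Python 2 or 3) that reads /proc/meminfo and prints how much memory is merely buffer/cache that the kernel will give back under pressure:

    def meminfo_mb(*fields):
        # Parse /proc/meminfo (values are reported in kB) into megabytes.
        values = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                values[key] = int(rest.split()[0]) / 1024.0
        return [values[name] for name in fields]

    free, buffers, cached = meminfo_mb("MemFree", "Buffers", "Cached")
    print("MemFree: %.0f MB   Buffers: %.0f MB   Cached: %.0f MB" % (free, buffers, cached))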
The memory may not be assigned to any process; it can be, for example, files on a tmpfs filesystem (/dev/shm, or sometimes /tmp). You should show us the output of top or free (note that those tools do not report a single 'memory usage' value) so we can say more about where the memory is going.
In the case of inserting records into a database, it may be a temporary image of the current transaction, kept before it is committed to the real database. Splitting the insertion into many smaller transactions (if applicable) may help.
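A rough sketch of that batching idea with Python's built-in sqlite3 module (the database file, table, and record source here are all hypothetical):

    import sqlite3

    conn = sqlite3.connect("big.db")   # hypothetical database file
    conn.execute("CREATE TABLE IF NOT EXISTS records (id INTEGER, value TEXT)")

    BATCH_SIZE = 100000
    batch = []
    i = 0
    while i < 10000000:                # stand-in for your real record source
        batch.append((i, "value-%d" % i))
        i += 1
        if len(batch) == BATCH_SIZE:
            with conn:                 # one transaction per batch; commits on exit
                conn.executemany("INSERT INTO records VALUES (?, ?)", batch)
            batch = []

    if batch:
        with conn:
            conn.executemany("INSERT INTO records VALUES (?, ?)", batch)
    conn.close()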
That said, I am just guessing; there is not enough data here.
P.S. It seems I misread the original question (I assumed the computer slows down) and there is no problem. sehe's answer is probably better.