This question already has answers here:
Find free disk space in python on OS/X
(7 answers)
Closed 3 years ago.
I have a script that is going to download a lot of data from the internet, but I have no idea how long it will take or how big the data will be.
To be more precise, I want to analyze some live videos, and for that I will download the content using youtube-dl. Since I want to leave it running for a week or two, is there a way to avoid running out of disk space? For example, could the script check at a regular interval how much free space is left and stop execution if it drops below a certain value?
Thanks
You can use shutil.disk_usage(path). From the docs:
shutil.disk_usage(path)
Return disk usage statistics about the given path as a named tuple with the attributes total, used and free, which are the amount of total, used and free space, in bytes. On Windows, path must be a directory; on Unix, it can be a file or directory.
Use shutil.disk_usage.
import shutil

total, used, free = shutil.disk_usage("/")
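For the periodic check described in the question, a minimal sketch could look like the following; the 5 GiB threshold, the 60-second interval and the path are placeholders you would adjust:

import shutil
import time

MIN_FREE_BYTES = 5 * 1024 ** 3   # stop once less than ~5 GiB is free (placeholder)
CHECK_INTERVAL = 60              # seconds between checks (placeholder)

def wait_until_low_on_space(path="/"):
    # Poll the free space on the given path and return once it drops
    # below the threshold, so the caller can stop the download.
    while True:
        total, used, free = shutil.disk_usage(path)
        if free < MIN_FREE_BYTES:
            print("Low disk space: %d bytes free, stopping." % free)
            return
        time.sleep(CHECK_INTERVAL)

You would run this alongside the download (e.g. in a separate thread) and terminate the youtube-dl process once it returns.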
This question already has answers here:
How did Python read this binary faster the second time?
(2 answers)
Closed 1 year ago.
This is something I came across while working on a project, and I'm kind of confused. I have a .txt file with ~15,000 lines. When I run the program once, it takes around 4-5 seconds to go through all the lines. But I added a while True before opening the file, and a file.close() at the end, so that it continuously opens the file, goes through all the lines, and then closes it.
After the first run, I noticed that each pass takes around 1 second to complete. I made sure to close the file afterwards, so what might be causing it to be so much faster?
It's called "file caching" or "warming the cache". All of the major operating systems allocate a goodly portion of your RAM to a file cache. When you read a file, those buffers are retained for a while instead of being released right away. If you read the same file again, it can often pull the data from RAM instead of going to disk.
This question already has answers here:
Locking a file in Python
(15 answers)
Closed 3 years ago.
This is not a duplicate of Locking a file in Python
I have two scripts: one runs every 30 minutes, the other runs every minute, and they both use the same file to do a few things.
Every 30 minutes they end up accessing the same file at the same time, and they corrupt it.
I was thinking about making one of them wait, but they are two independent scripts and I am not sure if this is possible.
Any idea?
I thought about using

with FileLock("document.txt"):
    # ... work with the file ...

The problem that arises is: if script-1 acquires the lock on "document.txt" and then script-2 wants to access document.txt, will it wait until script-1 finishes, or will it skip that line of code? The second behaviour isn't an option for me.
Also, once the lock is acquired, how do I remove it when it's no longer needed?
One of the simplest ways to get this done (assuming you have write access to the file's directory) is to create an additional file (like filename.lck) to signal that a script is currently working on that file. Obviously, once a script has finished with the file, that lock file needs to be removed.
But honestly, I would be surprised if such a locking mechanism were not already available in Python or a third-party package. How exactly do you open and close the file in question? Maybe some parameter already takes care of the locking.
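A minimal sketch of that lock-file idea (Python 3), assuming both scripts agree on the lock-file name; document.txt.lck is just an illustrative choice, and os.O_CREAT | os.O_EXCL makes the check-and-create step atomic:

import os
import time

LOCK_PATH = "document.txt.lck"   # agreed-upon lock file, illustrative name

def acquire_lock(timeout=60):
    # Wait until the lock file can be created exclusively; whoever creates
    # it first holds the lock, everyone else retries until it disappears.
    deadline = time.time() + timeout
    while True:
        try:
            fd = os.open(LOCK_PATH, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:      # another script currently holds the lock
            if time.time() > deadline:
                raise TimeoutError("could not acquire the lock in time")
            time.sleep(1)

def release_lock():
    os.remove(LOCK_PATH)

acquire_lock()
try:
    with open("document.txt", "a") as f:
        f.write("safe write\n")
finally:
    release_lock()

If both scripts wrap their access to document.txt in the same acquire/release pair, whichever one arrives second simply waits instead of corrupting the file.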
This question already has answers here:
Getting processor information in Python
(10 answers)
Closed 7 years ago.
I'm trying to find out where this value is stored on both Windows and OS X, in order to do some calculations for better task distribution.
Core speed in Hz
Thanks in advance.
Using platform.processor() only returns the name, not the speed.
I only managed to get it through this:
import subprocess

# "wmic cpu get name" returns something like "Name\nIntel(R) Core(TM) i5 CPU @ 2.50GHz"
info = subprocess.check_output(["wmic", "cpu", "get", "name"])
print info.split('@')[1].split(' ')[1]    # take the "2.50GHz" part after the '@'
But for the moment I have no way to tell whether it will always return the same result on every machine (no access to other computers right now).
Machine ID
There is currently no cross-platform Python way of getting a machine ID; however, this has been asked before:
Get a unique computer ID in Python on windows and linux
If you just want the machine name, use platform.node().
Number of cores
The multiprocessing module contains the multiprocessing.cpu_count() function.
Core speed in Hz
There is currently no cross-platform Python way of getting the CPU frequency; however, this has been asked before: Getting processor information in Python
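Putting the parts that do exist together, a minimal sketch; the Windows-only frequency lookup via wmic mirrors the approach in the question and is only an illustration, not a cross-platform solution:

import multiprocessing
import platform
import subprocess

print("machine name: %s" % platform.node())
print("number of cores: %d" % multiprocessing.cpu_count())

# No cross-platform frequency API exists, so fall back to wmic on Windows only.
if platform.system() == "Windows":
    out = subprocess.check_output(["wmic", "cpu", "get", "MaxClockSpeed"])
    print("max clock speed (MHz): %s" % out.decode().split()[1])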
This question already has answers here:
Finding duplicate files and removing them
(10 answers)
Closed 8 years ago.
I've been tasked with consolidating about 15 years of records from a laboratory, most of which is either student work or raw data. We're talking 100,000+ human-generated files.
My plan is to write a Python 2.7 script that will map the entire directory structure, create a checksum for each file, and then flag duplicates for deletion. I'm expecting probably 10-25% duplicates.
My understanding is that MD5 collisions are possible, theoretically, but so unlikely that this is essentially a safe procedure (let's say that if 1 collision happened, my job would be safe).
Is this a safe assumption? In case implementation matters, the only Python libraries I intend to use are:
hashlib for the checksums;
sqlite3 for storing the results in a database;
os for walking the directory tree.
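A minimal sketch of that plan, assuming only paths and MD5 digests need to be recorded; the database filename, table name and ROOT path below are placeholders:

import hashlib
import os
import sqlite3

ROOT = "/path/to/records"              # placeholder for the top-level directory
db = sqlite3.connect("checksums.db")   # illustrative database name
db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT, md5 TEXT)")

def md5_of(path, chunk_size=1024 * 1024):
    # Hash in chunks so large files never have to fit in memory at once.
    h = hashlib.md5()
    with open(path, "rb") as f:
        chunk = f.read(chunk_size)
        while chunk:
            h.update(chunk)
            chunk = f.read(chunk_size)
    return h.hexdigest()

for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        full = os.path.join(dirpath, name)
        db.execute("INSERT INTO files VALUES (?, ?)", (full, md5_of(full)))
db.commit()

# Flag duplicates: any digest that appears more than once.
dupes = db.execute(
    "SELECT md5, COUNT(*) FROM files GROUP BY md5 HAVING COUNT(*) > 1"
).fetchall()
print("%d duplicated digests found" % len(dupes))

Flagging rather than deleting keeps the final decision manual, which is probably wise for irreplaceable lab records.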
The probability of finding an MD5 collision between two files by accident is about 2^-128, i.e.:
0.000000000000000000000000000000000000002938735877055718769921841343055614194546663891
For comparison, the probability of being hit by a 15 km asteroid is about 0.00000002. So yes, I'd say it's a safe assumption.
Backing up the files and testing the script thoroughly remains good advice; human mistakes and bugs are much more likely to happen.
Recent research on MD5 collisions may have worried you: in 2013, algorithms were published that can generate MD5 collisions in about a second on an ordinary computer. However, this does not rule out using MD5 for checking file integrity and detecting duplicates. It is highly unlikely that two normal-use files will share the same hash unless you deliberately construct colliding binary files. If you're still concerned, use a hash function with a larger output, such as SHA-512.
This question already has answers here:
In-memory size of a Python structure
(7 answers)
Closed 8 years ago.
Clearly Python cannot have unlimited memory available. I am writing a script that creates lists of dictionaries, and each list has between 250K and 6M objects (there are 6 lists).
Is there a way to actually calculate (possibly based on the RAM of the machine) the maximum memory and the memory required to run the script?
The actual issue I came across:
While running one of the scripts that populates a list with 1.25-1.5 million dictionaries, when it hits 1.227... it simply stops, but returns no error, let alone a MemoryError. So I am not even sure this is a memory limit. I have print statements so I can watch what is going on, and at that point nothing prints anymore and it appears to hang, even though up to that section the code was running a couple of thousand lines per second. Any ideas as to what is making it stop? Is this memory or something else?
If you have that many objects to store, you need to store them on disk, not in memory. Consider using a database.
If you import the sys module, you will have access to sys.getsizeof(). You will have to look at each object in the list, and for each dictionary add up the size of every key and value. For more on this, see this previous question: In-memory size of a Python structure.
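A minimal sketch of that idea; sys.getsizeof() only reports the container's own footprint, so the keys and values are added explicitly (shared or interned objects get counted more than once, so treat this as a rough estimate):

import sys

def approx_dict_size(d):
    # The dict itself plus every key and value it holds.
    return sys.getsizeof(d) + sum(
        sys.getsizeof(k) + sys.getsizeof(v) for k, v in d.items()
    )

def approx_list_of_dicts_size(lst):
    # The list object plus all of the dictionaries inside it.
    return sys.getsizeof(lst) + sum(approx_dict_size(d) for d in lst)

records = [{"id": i, "value": str(i) * 10} for i in range(100000)]   # toy data
print("approx. %.1f MiB" % (approx_list_of_dicts_size(records) / (1024.0 * 1024.0)))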