I'm trying to write a Python script that runs on Windows. Files are copied to a folder every few seconds, and I'm polling that folder every 30 seconds for names of the new files that were copied to the folder after the last poll.
What I have tried is to use one of the os.path.getXtime(folder_path) functions and compare that with the timestamp of my previous poll. If the getXtime value is larger than the timestamp, then I work on those files.
I have tried to use the function os.path.getctime(folder_path), but that didn't work because the files were created before I wrote the script. I tried os.path.getmtime(folder_path) too but the modified times are usually smaller than the poll timestamp.
Finally, I tried os.path.getatime(folder_path), which works for the first time the files were copied over. The problem is I also read the files once they were in the folder, so the access time keeps getting updated and I end up reading the same files over and over again.
I'm not sure what a better way or function to do this would be.
You've got a bit of an XY problem here. You want to know when files in a folder change, you tried a home-rolled solution, it didn't work, and now you want to fix your home-rolled solution.
Can I suggest that instead of terrible hackery, you use an existing package designed for monitoring for file changes? One that is not a polling loop, but actually gets notified of changes as they happen? While inotify is Linux-only, there are other options for Windows.
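For example, the third-party watchdog package delivers native change notifications on Windows (it uses ReadDirectoryChangesW under the hood). A minimal sketch, where `C:\incoming` stands in for whatever folder the files are copied into:

```python
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

class NewFileHandler(FileSystemEventHandler):
    """Collects the paths of files created in the watched folder."""
    def __init__(self):
        self.new_files = []

    def on_created(self, event):
        # Fires as each file appears; no polling, no timestamp comparisons.
        if not event.is_directory:
            self.new_files.append(event.src_path)

if __name__ == "__main__":
    handler = NewFileHandler()
    observer = Observer()
    observer.schedule(handler, r"C:\incoming", recursive=False)  # hypothetical folder
    observer.start()
    try:
        while True:
            while handler.new_files:
                print("process:", handler.new_files.pop(0))
            time.sleep(1)
    finally:
        observer.stop()
        observer.join()
```

Install with `pip install watchdog`. Because the handler fires once per creation event, each file gets processed exactly once, with none of the access-time trouble from the polling approach.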
Okay, so I'm looking for an easy way to check if the contents of files in a folder have changed, and if one has changed, to update the version of that file.
I'm guessing this is what's called logging? I am completely new to this, so it's a bit hard to explain the concept of what I'm looking for. I'll give an example:
Let's say I have a reference folder that contains my original data.
Then, every time I run my code, it inspects the contents of the files in that reference folder.
If the contents are the same, the code continues to run normally.
But if the contents of the files have changed, it updates the version of that file (for example: from '1.0.0' to '1.0.1') and keeps a copy of the changes.
Is there a way to do this in python or a module that helps me accomplish this? Or where can I start looking into this?
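There's no single module for exactly this, but the usual building block is hashing each file's contents and comparing against hashes saved from the previous run. A minimal sketch, with the state kept in a JSON file (the version-bumping and copy-keeping are left to you, since that scheme is yours to define):

```python
import hashlib
import json
from pathlib import Path

def file_hash(path):
    """SHA-256 of a file's contents; it changes iff the contents change."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def changed_files(reference_dir, state_file="hashes.json"):
    """Return names of files whose contents differ from the previous run."""
    state_path = Path(state_file)
    old = json.loads(state_path.read_text()) if state_path.exists() else {}
    new, changed = {}, []
    for path in sorted(Path(reference_dir).iterdir()):
        if path.is_file():
            digest = file_hash(path)
            new[path.name] = digest
            if path.name in old and old[path.name] != digest:
                changed.append(path.name)
    state_path.write_text(json.dumps(new))
    return changed
```

For each name that `changed_files()` returns, you would then bump its version string (e.g. from '1.0.0' to '1.0.1') and copy the old file aside before overwriting your record of it.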
I have created a program using Python which goes through each file in my downloads directory and moves that file to another directory based on the suffix at the end of the file (.mp3, .mp4, .txt, .jpg, ...). How would I go about automating this program so that it runs in the background of my computer every couple of hours?
What you are referring to is often called a "cronjob". Python has a module called python-crontab that can do this type of thing. Here is a tutorial to help you get started.
I do not know much about Python, but in Node.js there is a tool called cronjob. When you set up a routine, it calls your scripts according to the schedule you set. Maybe there is an equivalent version in Python.
I'm trying to record some stats on a script I'm running in python (a few percentages, less than 12 characters worth). I want it to be efficient. I want the stats to keep being updated as the script runs so that if the script were to exit I still have the stats available to be updated when I run the script again.
I've thought of approaches such as recording in a CSV (which seems inefficient, since there looks to be no functionality to keep updating the same row) or updating the title of a file within the system. But I can think of nothing as clean and efficient as I was hoping for. Any ideas?
You could really store it in a .txt file, or anything else, if you'd like. Using Python's built-in csv module, overwriting rows directly (rather than having to recreate the file) isn't possible AFAIK. Check out the sqlite3 module for storing the information in a database.
https://docs.python.org/3/library/sqlite3.html
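A minimal sketch of that sqlite3 idea: one table keyed by stat name, with `INSERT OR REPLACE` so the same row is overwritten on every update. The file and column names here are just placeholders:

```python
import sqlite3

def save_stat(db_path, name, value):
    """Overwrite (or create) a single named stat; persists across script exits."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS stats (name TEXT PRIMARY KEY, value REAL)"
        )
        conn.execute(
            "INSERT OR REPLACE INTO stats (name, value) VALUES (?, ?)",
            (name, value),
        )

def load_stat(db_path, name, default=None):
    """Fetch a previously saved stat, or default if it was never recorded."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS stats (name TEXT PRIMARY KEY, value REAL)"
        )
        row = conn.execute(
            "SELECT value FROM stats WHERE name = ?", (name,)
        ).fetchone()
    return row[0] if row else default
```

Each `save_stat` call updates the row in place, so the database stays tiny no matter how often the script restarts.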
I have a directory that is full of potentially millions of files. These files "mark" themselves when used, and then my Python program wants to find the "marked" ones then record that they were marked and unmark them. They are individual html files so they can't easily communicate with the python program themselves during this marking process (the user will just open whatever ones they choose).
Because they are marked when used, if I access them by modification date, one at a time, once I reach one that isn't marked I can stop (or at least once I get to one that was modified a decent amount of time in the future). However, all ways I've seen of doing this so far require accessing every file's metadata at least once, and then sorting this data, which isn't ideal with the magnitude of files I have. Note that this check occurs during an update step which occurs every 5 seconds or so combined with other work and so the time ideally needs to be independent of the number of files in the directory.
So is there a way in Python to traverse a directory in order of modification date without visiting every file's metadata at least once?
No, I don't think there is a way to fetch file names in chunks sorted by modification dates.
You should use file system notifications to know about modified files.
For example use https://github.com/gorakhargosh/watchdog or https://github.com/seb-m/pyinotify/wiki
I have been given the task of converting a C++ script into a Python script. The aim of the script is to loop through all the directories (by start and end date) in the mediaDB and calculate what size the zip file is going to be. I am stuck on getting the for loop to go through the directories; it's so different in Python from C++, which I have more experience in. Could anyone offer any suggestions?
C++ Code
// This will loop over each core files directory and sum the file size.
for (directory_iterator dirIt(mediaDBCoreFilesDir); dirIt != directory_iterator(); ++dirIt)
Also, if anyone has any ideas as to how to get the last-update timestamp from a file in Python, that would be very much appreciated. The C++ code is:
// Get the last update timestamp from the file
std::time_t t = last_write_time(*dirIt);
ptime fileTimeStamp = from_time_t(t);
EDIT: Firstly, I am trying to write a for loop that goes over all directories and sums up their file sizes. I don't need to edit, delete, or print any directories, just get the file sizes. Is it then more appropriate to use os.walk and os.path.getsize?
Secondly, I need to retrieve the last-updated timestamp from the files.
Though I don't really understand this process of getting the timestamp.
You're looking for os.walk or glob.glob for enumerating files in a directory subtree, and os.stat (or os.lstat) for getting the timestamp of the most recent modification (the st_mtime field).
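A sketch of both pieces using only the standard library: os.walk to sum file sizes under a tree (the directory_iterator loop), and st_mtime converted to a datetime (the last_write_time / from_time_t pair). Filtering the directories by start and end date is left out, since that logic lives elsewhere in your script:

```python
import datetime
import os

def tree_size(root):
    """Sum the sizes of all files under root, like the C++ directory_iterator loop."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total

def last_write_time(path):
    """Last-modified timestamp as a datetime, like from_time_t(last_write_time(...))."""
    return datetime.datetime.fromtimestamp(os.stat(path).st_mtime)
```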