Get the last written file from a series of sub-folders - python

I have tried this solution:
How to get the latest file in a folder using python
The code I tried is:
import glob
import os

# recursive=True is needed for '**' to descend into sub-folders
list_of_files = glob.glob('/path/to/folder/**/*.csv', recursive=True)
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)
The output I got matched the Windows timestamps for the files. But I maintain a separate log of when files are written to each sub-folder, and when I opened that log I saw that the last updated file was not the one the Python code reported.
This worried me, as my whole process depends on the last file written.
Kindly let me know what I can do to get the last updated file through Python.
I want to read the file that was updated last, but as Windows is not keeping the last-modified timestamp the way I expected, I am not seeing any other way out.
Does anyone have any other way to go about it?

On Linux, os.path.getctime() returns the time of the last metadata change (which usually tracks the last modification), but on Windows it returns the creation time. You need to use os.path.getmtime() to get the modification time on Windows.
import glob
import os

list_of_files = glob.glob('/path/to/folder/**/*.csv', recursive=True)
latest_file = max(list_of_files, key=os.path.getmtime)
print(latest_file)
This code should work for you.

os.path.getctime gives the creation time of the file on Windows - it seems you want os.path.getmtime, which is the modification time of the file, so try:
latest_file = max(list_of_files, key=os.path.getmtime)
and see if that does what you want.
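To see the difference on a concrete file, a quick check (the path is a hypothetical stand-in for one of your CSVs):

import datetime
import os

path = '/path/to/folder/sub/data.csv'  # hypothetical file

# On Windows, getctime() reports creation time and getmtime() the last write.
created = datetime.datetime.fromtimestamp(os.path.getctime(path))
modified = datetime.datetime.fromtimestamp(os.path.getmtime(path))
print('created: ', created)
print('modified:', modified)

A file that was copied or downloaded recently can have a creation time newer than its modification time, which is exactly the situation that confuses the getctime-based version.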

Related

Python Finding the newest file in a folder using glob

I am using the following code to try to find the most up-to-date file in a folder.
However I am getting a funny character back.
The actual file is called G:\\foo\\example – YTD.zip
# Identify the file with the latest create timestamp
list_of_files = glob.glob('G:\\foo\\*.zip')
latest_file = max(list_of_files, key=os.path.getctime)
latest_file
'G:\\foo\\example � YTD.zip'
Can anyone help?
I am a complete Python newbie, so I appreciate this could be a rookie question.

Open and read latest json file one time only

SO members... how can I read the latest json file in a directory one time only (and if there is no new file, print something)? So far I can only read the latest file. The sample script below (run every 45 minutes) opens and reads the latest json file in a directory. In this case the latest file is file3.json (a json file is created every 30 minutes). But if file4 is not created for some reason (for example, the server fails to create a new json file) and the script runs again, it will still read the same last file3.
Files in the directory:
file1.json
file2.json
file3.json
The script below is able to open and read the latest json file created in the directory.
import glob
import json
import os

listFiles = glob.iglob('logFile/*.json')
latestFile = max(listFiles, key=os.path.getctime)
with open(latestFile, 'r') as f:
    mydata = json.load(f)
    print(mydata)
To ensure the script reads only the newest file, and reads it one time only, I expect something like below:
listFiles = glob.iglob('logFile/*.json')
latestFile = max(listFiles, key=os.path.getctime)
if latestFile is newer than the previously read file:  # not sure how to compare the latest file with the previous one
    with open(latestFile, 'r') as f:
        mydata = json.load(f)
        print(mydata)
else:
    print("no new file created")
Thank you for your help. An example solution would be good to share.
I can't figure out the solution... it seems simple, but a few days of trial and error brought no luck.
(1) Make sure to read the latest file in the directory.
(2) Make sure to read any files that may have been missed (due to the script failing to run).
(3) Read every file once only, and if there is no new file, give a warning.
Thank you.
After the SO discussion and suggestions, I have a few methods that resolve, or at least accommodate, some of the requirements. I simply move files that have been processed, as in the sketch below. If no file is created, the script does nothing, and if the script fails, then once things are back to normal it reads all the related files that are available. I think it's good for now. Thank you guys...
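A minimal sketch of that move-files-after-processing idea (the processed/ sub-folder name is an assumption, not from the original post):

import glob
import json
import os
import shutil

PROCESSED_DIR = 'logFile/processed'  # hypothetical destination for handled files
os.makedirs(PROCESSED_DIR, exist_ok=True)

# Oldest first, so files missed by a failed run are caught up in order.
pending = sorted(glob.glob('logFile/*.json'), key=os.path.getctime)
if not pending:
    print("no new file created")
for path in pending:
    with open(path, 'r') as f:
        mydata = json.load(f)
    print(mydata)
    shutil.move(path, PROCESSED_DIR)  # a moved file is never read twice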
Below is the answer, or rather an approach, I would like to propose:
The idea is as follows:
Every log file that is written to the directory can have a key-value pair in it called "creation_time": timestamp (in the fileX.json that gets stored on the server). Your script runs every 45 minutes to pick up the file that was dumped to the directory. In the normal case you simply read the file, and finally, when you exit the script, you store the name of the last file read and the creation_time taken from that fileX.json into a logger.json.
An example logger.json is as follows:
{
    "creation_time": "03520201330",
    "file_name": "file3.json"
}
Whenever the server fails or a delay occurs, fileX.json could be rewritten, or new fileX.json files could have been created in the directory. In these situations you would first open logger.json and obtain both the timestamp and the last filename, as shown in the example above. Using the last filename, you can compare the old timestamp stored in the logger with the new timestamp in that fileX.json. If they match, basically nothing has changed: you only read the files ahead of it and rewrite the logger.
If they do not match, you re-read the last fileX.json again and then proceed to read the other files ahead of it.
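A rough sketch of that logger approach (the file and key names match the example above; the rest, including the assumption that fileX names sort in creation order, is mine):

import glob
import json
import os

LOGGER = 'logger.json'  # state file holding the last read name and timestamp

def load_state():
    if os.path.exists(LOGGER):
        with open(LOGGER, 'r') as f:
            return json.load(f)
    return {'creation_time': None, 'file_name': None}

state = load_state()
read_any = False
for path in sorted(glob.glob('logFile/*.json')):
    name = os.path.basename(path)
    with open(path, 'r') as f:
        data = json.load(f)
    if state['file_name'] is not None:
        if name < state['file_name']:
            continue  # read on an earlier run
        if name == state['file_name'] and data.get('creation_time') == state['creation_time']:
            continue  # same file, unchanged: only the files ahead are new
    print(data)  # process the file here
    state = {'creation_time': data.get('creation_time'), 'file_name': name}
    read_any = True

if read_any:
    with open(LOGGER, 'w') as f:
        json.dump(state, f)
else:
    print('no new file created')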

Getting the latest file name etc using glob.glob & max (os.path.getctime)

I am trying to get the file name of the latest file in a directory which holds a couple hundred files, on a network drive.
Basically the idea is to snip the file name (it's the date/time the file was downloaded, e.g. xyz201912191455.csv) and paste it into a config file every time the script is run.
Now, building list_of_files usually takes about a second, but computing latest_file takes about 100 seconds, which is extremely slow.
Is there a faster way to extract the information about the latest file?
The code sample is below:
import os
import glob
import time
from configparser import ConfigParser

list_of_files = glob.glob(r'filepath\*', recursive=True)
latest_file = max(list_of_files, key=os.path.getctime)
list_of_files2 = glob.glob(r'filepath\*', recursive=True)
latest_file2 = max(list_of_files2, key=os.path.getctime)
If the filenames already include the datetime, why bother getting their stat information at all? And since the names are like xyz201912191455.csv, one can use [-16:-4] to extract 201912191455; as these are zero-padded, they sort lexicographically in numerical order. Also, recursive=True is not needed here, as the pattern does not contain **.
list_of_files = glob.glob(r'filepath\*')
latest_file = max(list_of_files, key=lambda n: n[-16:-4])
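A quick check of that slice on hypothetical names shows why it works:

names = ['xyz201912191455.csv', 'xyz201912201030.csv']  # hypothetical examples
print(names[0][-16:-4])  # -> '201912191455'
print(max(names, key=lambda n: n[-16:-4]))  # -> 'xyz201912201030.csv'

This avoids a stat call per file, which is what makes the getctime version so slow on a network drive.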

How can I check to see if a file exists before proceeding using Python

I have some code that will find the newest file in a directory and append a time stamp to the file name. It works great as long as there is a file in the directory to rename. If there isn't, I am getting:
"ValueError: max() arg is an empty sequence"
Here's my code:
import os
import glob
import datetime

now = datetime.datetime.now()
append = now.strftime("%H%M%S")
newest = max(glob.iglob('1234_fileName*.LOG'), key=os.path.getmtime)
newfile = append + "_" + newest
os.rename(newest, newfile)
Any suggestions for simplifying the code would be appreciated, as well as an explanation of how to run only if a "1234_fileName*.LOG" (note the wildcard) file is detected.
What I need this program to do is run periodically (I can use Task Scheduler for that) and check for a new file. If there is a new file, append the hours, minutes and seconds to its name.
Thanks!
You could use glob.glob(), which returns a list, instead of glob.iglob(), which returns an iterator:
files = glob.glob('1234_fileName*.LOG')
if files:
    newest = max(files, key=os.path.getmtime)
    newfile = append + "_" + newest
    os.rename(newest, newfile)
Both glob() and iglob() list the directory contents the same way under the hood, so there is no performance difference for a single directory.
max() is complaining that you're asking for the largest of 0 items, and throwing a ValueError. You'll have to catch it. This will still propagate any IOErrors that might occur:
import os, glob, datetime

try:
    app = datetime.datetime.now().strftime("%H%M%S")
    newest = max(glob.iglob("1234_fileName*.LOG"), key=os.path.getmtime)
    newfile = app + "_" + newest
    os.rename(newest, newfile)
except ValueError:
    pass
os.access allows you to check access rights before operating on a file; its documentation includes an example.
Also, it's fine to just do things inside a try .. except IOError.
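A minimal sketch of the os.access check (the concrete file name is my stand-in for one matching the question's pattern; the flags are standard os constants):

import os

path = '1234_fileName.LOG'  # hypothetical concrete file

# F_OK tests existence; R_OK and W_OK test read and write permission.
if os.access(path, os.F_OK) and os.access(path, os.W_OK):
    print('safe to rename')
else:
    print('missing or not writable')

Note that the docs warn of a race window between such a check and the actual operation, which is one reason the try/except style above is often preferred.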

Python - Sort files in directory and use latest file in code

Long-time reader, first-time poster. I am very new to Python and I will try to ask my question properly.
I have posted a snippet of the .py code I am using below. I am attempting to get the latest modified file in the current directory and then pass it along later in the code.
This is the error I get in my log file when I attempt to run the script:
WindowsError: [Error 2] The system cannot find the file specified: '05-30-2012_1500.wav'
So it appears that it is in fact pulling a file from the directory, but that's about it. And actually, the file it pulls up is not the most recently modified file in that directory.
latest_page = max(os.listdir("/"), key=os.path.getmtime)
cause = channel.FilePlayer.play(latest_page)
os.listdir returns the names of files, not full paths to those files. Generally, when you use os.listdir(SOME_DIR), you then need os.path.join(SOME_DIR, fname) to get a path you can use to work with the file.
This might work for you:
files = [os.path.join("/", fname) for fname in os.listdir("/")]
latest = max(files, key=os.path.getmtime)
cause = channel.FilePlayer.play(latest)
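For what it's worth, the same idea sketched with pathlib (Python 3.4+), which handles the join implicitly; this alternative is my addition, not part of the original answer:

from pathlib import Path

root = Path('/')  # the directory from the question
files = [p for p in root.iterdir() if p.is_file()]
if files:
    latest = max(files, key=lambda p: p.stat().st_mtime)
    print(latest)  # pass str(latest) to channel.FilePlayer.play()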
