I am using the following code to try to find the most recent file in a folder, but I am getting a garbled character back.
The actual file is called G:\foo\example - YTD.zip
# Identify the file with the latest create timestamp
list_of_files = glob.glob('G:\\foo\\\\*.zip')
latest_file = max(list_of_files, key=os.path.getctime)
latest_file
'G:\\foo\\example � YTD.zip'
Can anyone help?
I am a complete Python newbie, so I appreciate this could be a rookie question.
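The `�` is typically the Unicode replacement character, shown when the console's encoding cannot render a character in the file name. In names like "example - YTD.zip" the dash is often not a plain hyphen but an en dash (U+2013). A quick way to see which character is involved (the path below is a hypothetical stand-in for the real one, assuming the dash is U+2013):

```python
# Sketch: find the non-ASCII character the console could not render.
# The file name here is an assumption - the dash may be U+2013 (en dash).
name = 'G:\\foo\\example \u2013 YTD.zip'
for ch in name:
    if ord(ch) > 127:
        # ascii() shows the escape sequence instead of a garbled glyph
        print(ascii(ch), hex(ord(ch)))  # prints '\u2013' 0x2013
```

If that is the case, the file itself is fine; only the console's display encoding is at fault.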
I have some folders which contain a lot of files. They are all named like this:
Name-0000000000.txt
Name-0000000001.txt
Name-0000000002.txt
Name-0000000003.txt
and so on.
There can be 5,000,000 files like this in a folder.
I now want to know how to find out whether one or more files are missing.
I would like to check whether any consecutive number is missing, but how? I know I can check for the first and last file in that folder:
import glob
import os
list_of_files = glob.glob('K:/path_to_files/*')
first_file = min(list_of_files, key=os.path.getctime)
latest_file = max(list_of_files, key=os.path.getctime)
print(first_file)
print(latest_file)
But I have no clue how to find the missing files :(
Does anyone have an idea?
I have not tried this code myself, but something like this should work:
import glob
import os

list_of_files = glob.glob('K:/path_to_files/*')
# glob returns full paths, so compare against the base names,
# and use a set for fast membership tests
file_names = set(os.path.basename(f) for f in list_of_files)
for i in range(0, 5000000):  # put the highest numbered file here
    some_file = "Name-" + str(i).zfill(10) + ".txt"
    if some_file not in file_names:
        print("file: " + some_file + " is not in the list.")
This code might need some minor adjustments for your specific case, but it should be enough to guide you in the right direction :)
This solution only works if you know that exactly one file is missing. Take the sum of all the file names (after removing the suffix and converting them to integers) and subtract it from the expected sum; the result is the missing file's number.
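The summation idea can be sketched like this (a minimal version, assuming exactly one file is missing and the names follow the Name-NNNNNNNNNN.txt pattern from the question; the function name is my own):

```python
import os

def find_single_missing(file_names):
    # Extract the zero-padded number from names like 'Name-0000000002.txt'
    nums = [int(os.path.splitext(n)[0].split('-')[1]) for n in file_names]
    lo, hi = min(nums), max(nums)
    # Sum of the full consecutive range lo..hi
    expected = (lo + hi) * (hi - lo + 1) // 2
    # The difference is the single absent number
    return expected - sum(nums)

print(find_single_missing(['Name-0000000000.txt',
                           'Name-0000000001.txt',
                           'Name-0000000003.txt']))  # prints 2
```

For more than one missing file, comparing a set of expected names against a set of actual names (as in the loop above) is the simpler approach.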
Am I doing something wrong, or is finding the most recent file at a given path supposed to be fairly slow?
The code below takes upwards of 3 minutes. Is this expected for parsing through a list of ~850 files?
I am using a glob pattern to find only .txt files, so after searching through my file-share location it returns a list of ~850 files. That is the list it parses to get max(File) by key=os.path.getctime.
I tried sorting instead of max and just grabbing the top file, but that wasn't any faster.
import os
import glob

def get_latest_file(path, fileRegex):
    fullpath = os.path.join(path, fileRegex)
    # glob.glob returns a list, so the empty check below actually works
    # (iglob returns a generator, which is always truthy)
    list_of_files = glob.glob(fullpath, recursive=True)
    if not list_of_files:
        return ''
    return max(list_of_files, key=os.path.getctime)

path = r'C:\Desktop\Test'  # raw string, so the backslashes are literal
fileRegex = '*.*txt'
latestFile = get_latest_file(path, fileRegex)
Try using os.scandir(); it sped up my file searching massively.
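A sketch of the os.scandir() approach (the function name is my own): scandir yields DirEntry objects, and on Windows DirEntry.stat() can reuse information gathered while scanning the directory, avoiding an extra round trip per file, which matters most over a network share:

```python
import os

def latest_txt_file(folder):
    # DirEntry.stat() can reuse data fetched during the directory scan,
    # so this avoids a separate os.path.getctime call per file.
    entries = [e for e in os.scandir(folder)
               if e.is_file() and e.name.endswith('.txt')]
    if not entries:
        return None
    return max(entries, key=lambda e: e.stat().st_ctime).path
```

Usage is simply `latest_txt_file(r'C:\Desktop\Test')`; it returns the full path of the newest .txt file, or None if the folder has none.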
I have a Python script that fails at a specific line of code. I wrote it in Sublime Text and had no errors. This is on Python 3.8 and 3.7.
import os
import glob
import time
<lines 4-28 of script are omitted>
list_of_files = glob.glob('Y:\\foldername\\foldername\\Reports\\*.csv')
latest_file = max(list_of_files, key=os.path.getctime)
create_time = os.path.getctime(latest_file)
When I run this script in PyCharm it fails with the following error, confirming that the issue is with this specific line of code: latest_file = max(list_of_files, key=os.path.getctime)
Here is the error in PyCharm
line 30, in <module>
    latest_file = max(list_of_files, key=os.path.getctime)
ValueError: max() arg is an empty sequence
If I remove that one line, the script runs fine in Windows Task Scheduler, in IDLE, and everywhere else. I will probably just find another way to get the latest file in the specified folder.
Can you help me figure out why the script fails? Does the ValueError mean I cannot pass any arguments to max()? Or point me in the right direction to fix it. I'm fairly new to Python, so this is a learning opportunity!
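The error means the glob pattern matched no files when the script ran under PyCharm, so max() received an empty list. A likely cause is a different working directory or the mapped Y: drive not being available in that context. Since Python 3.4, max() accepts a default= argument for exactly this case; a minimal sketch:

```python
import glob
import os

# Pattern from the question; on a machine without a Y: drive it matches
# nothing, which is exactly the empty-sequence case that raises ValueError.
list_of_files = glob.glob('Y:\\foldername\\foldername\\Reports\\*.csv')
latest_file = max(list_of_files, key=os.path.getctime, default=None)
if latest_file is None:
    print('No .csv files found - check the path and drive mapping.')
```

So the parameters to max() are fine; the input list is simply empty in the PyCharm environment.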
I am trying to get the name of the latest file in a directory that has a couple hundred files, on a network drive.
Basically, the idea is to snip the file name (it is the date/time the file was downloaded, e.g. xyz201912191455.csv) and paste it into a config file every time the script is run.
Now, list_of_files usually runs in about a second, but latest_file takes about 100 seconds, which is extremely slow.
Is there a faster way to extract the information about the latest file?
The code sample as below:
import os
import glob
import time
from configparser import ConfigParser
import configparser
list_of_files = glob.glob(r'filepath\*', recursive=True)
latest_file = max(list_of_files, key=os.path.getctime)
list_of_files2 = glob.glob(r'filepath\*', recursive=True)
latest_file2 = max(list_of_files2, key=os.path.getctime)
If the filenames already include the datetime, why bother getting their stat information at all? If the names are like xyz201912191455.csv, you can use [-16:-4] to extract 201912191455, and since these are zero-padded they sort lexicographically in numerical order. Also, recursive=True is not needed here, as the pattern does not contain **.
list_of_files = glob.glob(r'filepath\*')
latest_file = max(list_of_files, key=lambda n: n[-16:-4])
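To see the slice in action on some made-up names (the timestamps below are invented for illustration): this avoids touching the filesystem entirely, so it is effectively instant even over a network drive.

```python
names = ['xyz201912191455.csv',
         'xyz202001020304.csv',
         'abc201911111111.csv']
# n[-16:-4] grabs the 12 digits just before '.csv', e.g. '201912191455'
latest = max(names, key=lambda n: n[-16:-4])
print(latest)  # prints xyz202001020304.csv
```

Note this assumes every matched file really ends in a 12-digit timestamp plus .csv; a stray file with a different name would be compared on the wrong characters.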
I have tried this solution:
How to get the latest file in a folder using python
The code I tried is:
import glob
import os
list_of_files = glob.glob('/path/to/folder/**/*.csv')
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)
The output reflected the Windows timestamps for the files.
But I maintain a separate log of file writes in each sub-folder.
When I opened that log, I saw that the last updated file was not the one the Python code reported.
I was shocked, as my whole process depends on the last file written.
Kindly let me know what I can do to get the last updated file through Python.
I want to read the file which was updated last, but since Windows does not seem to order the files by "last modified" the way I expect, I am not seeing any other way out.
Does anyone have another way to look at this?
On Linux, os.path.getctime() returns the time of the last metadata change (not the creation time), while on Windows it returns the creation time. You need to use os.path.getmtime to get the modification time on Windows.
import glob
import os
list_of_files = glob.glob('/path/to/folder/**/*.csv')
latest_file = max(list_of_files, key=os.path.getmtime)
print(latest_file)
This code should work for you.
os.path.getctime is the creation time of the file on Windows; it seems you want os.path.getmtime, which is the modification time of the file, so try:
latest_file = max(list_of_files, key=os.path.getmtime)
and see if that does what you want.
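For completeness, the same idea can be written with pathlib (the folder path is the placeholder from the question); Path.stat().st_mtime is the same modification time that os.path.getmtime returns, and default=None covers the empty-folder case:

```python
from pathlib import Path

# Placeholder path from the question; if it does not exist on this
# machine, the glob simply yields nothing and latest is None.
files = Path('/path/to/folder').glob('**/*.csv')
latest = max(files, key=lambda p: p.stat().st_mtime, default=None)
print(latest)
```

This walks sub-folders the same way the `**` pattern does with glob.glob(..., recursive=True).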