I have some folders that each contain a lot of files. They are all named like this:
Name-0000000000.txt
Name-0000000001.txt
Name-0000000002.txt
Name-0000000003.txt
and so on.
There can be 5,000,000 files like this in a folder.
I now want to know how to find out whether one or more files are missing.
I would like to check whether any consecutive number is missing, but how? I know I can check for the first and last file in that folder:
import glob
import os
list_of_files = glob.glob('K:/path_to_files/*')
first_file = min(list_of_files, key=os.path.getctime)
latest_file = max(list_of_files, key=os.path.getctime)
print(first_file)
print(latest_file)
But I have no clue how to find missing files :(
Anyone have an idea?
I have not tried this code myself, but something like this should work:
import glob
import os

# glob returns full paths, so compare against the bare file names;
# a set makes each membership test O(1), which matters with millions of files
names = {os.path.basename(p) for p in glob.glob('K:/path_to_files/*')}
for i in range(0, 5000000):  # put the highest file number here
    some_file = "Name-" + str(i).zfill(10) + ".txt"
    if some_file not in names:
        print("file: " + some_file + " is not in the list.")
This code might need some minor adjustments for your specific case, but it should be enough to point you in the right direction :)
This solution only works if you know that exactly one file is missing. Take the sum of all the file names (after removing the prefix and suffix and converting them to integers) and subtract it from the expected sum; the result is the missing file number.
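As a sketch of that summation trick, assuming the numbers start at 0 and exactly one file is missing (the folder path and `Name-` prefix follow the question's naming scheme):

```python
import glob
import os

def find_single_missing(folder):
    """Return the one missing sequence number via the sum trick.

    Assumes names like Name-0000000042.txt, numbers starting at 0,
    and exactly one file missing from the sequence.
    """
    numbers = [int(os.path.basename(p)[5:-4])  # strip "Name-" and ".txt"
               for p in glob.glob(os.path.join(folder, 'Name-*.txt'))]
    n = len(numbers)  # n files present means the full sequence is 0..n
    expected = n * (n + 1) // 2  # sum of 0..n
    return expected - sum(numbers)
```

This runs in a single pass with O(1) extra memory, but unlike the loop-and-check approach it cannot report more than one gap.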
I am trying to grab all of the mp3 files in my Downloads directory (after downloading them programmatically) and move them to a new folder. However, any time I try to use glob to grab a list of the available .mp3 files, I have to glob twice for it to work properly (the first time it runs it returns an empty list). Does anyone know what I am doing wrong here?
import glob
import os
import shutil

newpath = r'localpath/MP3s'
if not os.path.exists(newpath):
    os.makedirs(newpath)

list_of_files = glob.glob('localpath/Downloads/*.mp3')
for i in list_of_files:
    shutil.move(i, newpath)
This turned out to be a timing issue. The files I was trying to access were still in the process of downloading, which is why the glob was returning an empty list. I inserted a time.sleep(5) before the glob, and it now runs smoothly.
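A fixed sleep works but is fragile: too short and the files are still missing, too long and the script wastes time. A more robust sketch polls until the glob matches something or a timeout expires (the `wait_for_files` helper and its defaults are illustrative, not from the original code):

```python
import glob
import time

def wait_for_files(pattern, timeout=30, poll=0.5):
    """Poll glob(pattern) until it returns at least one match or the
    timeout expires; return whatever the final glob yields."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        matches = glob.glob(pattern)
        if matches:
            return matches
        time.sleep(poll)
    return glob.glob(pattern)
```

Note that browsers often download to a temporary name (e.g. Chrome's `.crdownload`), so a `*.mp3` pattern tends to match only files that have finished downloading, which is exactly the behavior wanted here.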
May I suggest an alternate approach:
from pathlib import Path
from shutil import move

music = Path("./soundtrack")  # absolute paths work too
newMusic = Path("./newsoundtrack")

# create the folder specified in newMusic if it does not exist
newMusic.mkdir(exist_ok=True)

# this selects everything; if you want only mp3s, do:
# list_of_files = music.glob("*.mp3")
list_of_files = music.glob("*")

for file in list_of_files:
    move(str(file), str(newMusic))
I am trying to get the file name of the latest file in a directory that holds a couple hundred files on a network drive.
Basically the idea is to snip the file name (it's the date/time the file was downloaded, e.g. xyz201912191455.csv) and paste it into a config file every time the script is run.
Now, building list_of_files usually runs in about a second, but computing latest_file takes about 100 seconds, which is extremely slow.
Is there a faster way to extract the information about the latest file?
The code sample as below:
import os
import glob
import time
from configparser import ConfigParser
import configparser
list_of_files = glob.glob(r'filepath\*', recursive=True)
latest_file = max(list_of_files, key=os.path.getctime)
list_of_files2 = glob.glob(r'filepath\*', recursive=True)
latest_file2 = max(list_of_files2, key=os.path.getctime)
If the filenames already include the datetime, why bother getting their stat information? And if the names are like xyz201912191455.csv, you can use [-16:-4] to extract 201912191455; as these are zero-padded, they sort lexicographically in numerical order. Also, recursive=True is not needed here because the pattern contains no **.
list_of_files = glob.glob(r'filepath\*')
latest_file = max(list_of_files, key=lambda n: n[-16:-4])
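If you do need real timestamps, the slowness comes from os.path.getctime issuing a separate stat round trip to the network drive for every file. os.scandir returns DirEntry objects whose stat results are typically fetched during the single directory listing (notably on Windows), so this sketch usually avoids the per-file round trips:

```python
import os

def latest_by_ctime(folder):
    """Return the path of the directory entry with the newest ctime.

    DirEntry.stat() results are usually cached from the directory scan
    itself, avoiding one network stat call per file.
    """
    with os.scandir(folder) as entries:
        latest = max(entries, key=lambda e: e.stat().st_ctime)
    return latest.path
```

How much this helps depends on the OS and the network filesystem; on Windows shares the speedup is often dramatic because the listing already carries the metadata.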
In multiple folders I have a file called _status.json
e.g.:
C:\Users\Me\.fscrawler\Folder1\_status.json
C:\Users\Me\.fscrawler\Folder2\_status.json
....
C:\Users\Me\.fscrawler\*\_status.json
I want to write a short python code, to delete all those files.
I already tried the following code, but it does not work. I don't know why, but I think the solution is pretty easy:
import os
os.remove(C:\Users\Me\.fscrawler\*\_status.json)
You will have to walk through all the subfolders to find and delete the file:
import os

for root, dirs, files in os.walk(folder_path):
    for name in files:
        if name == '_status.json':
            os.remove(os.path.join(root, name))  # delete the file
I would look into the glob module, and use it to find the files:
example:
import glob
relative_path_to_files = glob.glob('**/_status.json', recursive=True)
then you can operate on the list as you wish :)
Edit:
relative_path_to_files is a list, so you have to iterate over its elements and operate on them:
here is a complete example to find all _status.json in the current directory and its sub-tree recursively:
import glob
import os

for f in glob.glob('**/_status.json', recursive=True):
    os.remove(f)
I am new to programming. I usually learn for a while, then take a long break and forget most of what I learned. Never mind that background info.
I tried to create a function that would help me rename the files inside a folder, adding an increment at the end of each new name (e.g. blueberry1, blueberry2, ...):
import os

def rename_files(loc, new_name):
    file_list = os.listdir(loc)
    for file_name in file_list:
        count = 1
        if count <= len(file_list):
            composite_name = new_name + str(count)
            os.rename(file_name, composite_name)
            count += 1
Well apparently this code doesn't work. Any idea how to fix it?
You need to join the file to the path:
os.rename(os.path.join(loc, file_name), composite_name)
You can also use enumerate for the count:
import os

def rename_files(loc, new_name):
    for ind, file_name in enumerate(os.listdir(loc), 1):
        composite_name = new_name + str(ind)
        os.rename(os.path.join(loc, file_name),
                  os.path.join(loc, composite_name))
listdir just returns the file names, not the paths, so Python has no way of knowing where the original file actually lives unless your cwd is that same directory.
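As a quick sanity check, here is the corrected function exercised against a throwaway directory (the `blueberry` base name is just the example from the question; keeping the original extension is an extra touch not in the answer above):

```python
import os
import tempfile

def rename_files(loc, new_name):
    """Rename every file in loc to new_name1, new_name2, ...
    (numbering follows os.listdir order, which is arbitrary)."""
    for ind, file_name in enumerate(os.listdir(loc), 1):
        ext = os.path.splitext(file_name)[1]  # preserve the extension
        os.rename(os.path.join(loc, file_name),
                  os.path.join(loc, new_name + str(ind) + ext))

loc = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt'):
    open(os.path.join(loc, name), 'w').close()
rename_files(loc, 'blueberry')
print(sorted(os.listdir(loc)))  # ['blueberry1.txt', 'blueberry2.txt']
```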
I have n files of the following kind in a folder:
log_20140114-10-43-20_5750.txt
log_20140114-10-43-23_5750.txt
log_20140114-10-43-25_5750.txt
The only variation across all of the above files is the timestamp, and I need the file with the latest timestamp; in this case, "log_20140114-10-43-25_5750.txt".
I am very new to Python. Please help me.
import os
import re

r = re.compile(r'log_\d{8}-\d{2}-\d{2}-\d{2}_\d{4}\.txt$')
latest_file = max(filter(r.search, os.listdir('/path/to/logs')))
print(latest_file)
(Edited to include filtering the list of files and take #abarnert's efficiency advice.)
If the file names already have the timestamp in them, you can sort the names and take the last one from the list:
lst = os.listdir('.')
lst.sort()
print(lst[-1])
import glob

filelist = glob.glob('./log*.txt')
filelist.sort()
print(filelist[-1])