Python: remove files from a folder which are not in a list

I'm looking for help with the code below, where I want to remove from a folder any files that are not listed in a given CSV file.
I read the input file into a pandas DataFrame and convert it to a list, then
read the file names from the folder and compare each one against that list: if it is present, continue; if not, remove it. But my code removes all the files, including the ones that do match.
I only want to remove the files which are not present in the file I'm reading with the pandas DataFrame.
import os
import pandas as pd

path = "Adwords/"
flist = pd.read_csv('C:/mediaops/mapping/adword/file_name.csv')
file_name = flist['fileName'].tolist()

for filename in os.listdir(path):
    print(filename)
    if filename == file_name:
        continue
    elif filename != file_name:
        os.remove(filename)

for filename in os.listdir(path):
    print(filename)
    if filename not in file_name:
        os.remove(filename)

In your original solution, you are comparing filename == file_name and filename != file_name, but you cannot do that.
filename is a string and file_name is a list, and you cannot use == to compare them. You need the membership operators in and not in, as in if filename not in file_name:, which is what I do in my answer below (thanks to Tobias's answer).
Now that that is out of the way, you can iterate through all files using os.listdir, then use os.remove to remove the necessary files, additionally using os.path.join to get the full path of each file!
import os

# List all files in path
for filename in os.listdir(path):
    # If the file is not present in the list
    if filename not in file_name:
        # Get the full path of the file and remove it
        full_file_path = os.path.join(path, filename)
        os.remove(full_file_path)

The problem is that file_name is a list of strings, whereas filename is a single string, so the check filename != file_name is always true and every file is therefore removed. Instead, use in and not in to check whether the string is (or is not) in the list of strings. Using a set would also be faster. Also, those variable names are really confusing.
set_of_files = set(file_name)
for filename in os.listdir(path):
    if filename not in set_of_files:
        os.remove(filename)
Also, as noted in Devesh's answer, you may have to join the filename to the path in order to be able to actually remove the file.
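Putting those two points together - the set membership test plus os.path.join - a minimal sketch, reusing path and file_name from the question:
set_of_files = set(file_name)
for filename in os.listdir(path):
    if filename not in set_of_files:
        # join the folder path so the remove targets the right file
        os.remove(os.path.join(path, filename))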

When I implemented these answers, they deleted all the files in the directory, not just the ones missing from my list. So I wrote a script for any weary traveler who may need it. You need to set the path to where your files are and create a CSV file whose first column holds the base names of the files you want to keep. You can also set the extension of the files to look at, if they all happen to share one.
The script turns the CSV into a list from each row's first column, then checks whether each file in the current directory is present in that list. If it is not, it removes it.
import os
import csv

data_path = "/path/to/your/dir"
csv_guide = "filenamestokeep.csv"
csv_path = os.path.join(data_path, csv_guide)
ext = ".txt"  # extension of the files to look at (set to yours)

with open(csv_path, 'r') as csvfile:
    good_files = []
    for row in csv.reader(csvfile):
        if len(row) > 0:
            good_files.append(row[0])
print(good_files)

all_files = os.listdir(data_path)
for filename in all_files:
    if filename.endswith(ext) and filename not in good_files:
        full_file_path = os.path.join(data_path, filename)
        print(f"File to delete: {filename}")
        os.remove(full_file_path)
    else:
        print(f"Ignored -- {filename}")

Related

Edit identical line in several files [duplicate]

I need to iterate through all .asm files inside a given directory and do some actions on them.
How can this be done in an efficient way?
Python 3.6 version of the above answer, using os - assuming that you have the directory path as a str object in a variable called directory_in_str:
import os

directory = os.fsencode(directory_in_str)

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".asm") or filename.endswith(".py"):
        # print(os.path.join(directory, filename))
        continue
    else:
        continue
Or recursively, using pathlib:
from pathlib import Path

pathlist = Path(directory_in_str).glob('**/*.asm')
for path in pathlist:
    # because path is object not string
    path_in_str = str(path)
    # print(path_in_str)
Use rglob to replace glob('**/*.asm') with rglob('*.asm')
This is like calling Path.glob() with '**/' added in front of the given relative pattern:
from pathlib import Path

pathlist = Path(directory_in_str).rglob('*.asm')
for path in pathlist:
    # because path is object not string
    path_in_str = str(path)
    # print(path_in_str)
Original answer:
import os

for filename in os.listdir("/path/to/dir/"):
    if filename.endswith(".asm") or filename.endswith(".py"):
        # print(os.path.join(directory, filename))
        continue
    else:
        continue
This will iterate over all descendant files, not just the immediate children of the directory:
import os

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # print(os.path.join(subdir, file))
        filepath = subdir + os.sep + file
        if filepath.endswith(".asm"):
            print(filepath)
You can try using the glob module:
import glob

for filepath in glob.iglob('my_dir/*.asm'):
    print(filepath)
and since Python 3.5 you can search subdirectories as well:
glob.glob('**/*.txt', recursive=True) # => ['2.txt', 'sub/3.txt']
From the docs:
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched.
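As a quick illustration of those wildcards (the file names here are hypothetical):
import glob

glob.glob('part?.asm')        # ? matches one character: part1.asm, parta.asm, ...
glob.glob('frame_[0-9].asm')  # [] matches a range: frame_0.asm through frame_9.asm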
Since Python 3.5, things are much easier with os.scandir() and 2-20x faster (source):
with os.scandir(path) as it:
    for entry in it:
        if entry.name.endswith(".asm") and entry.is_file():
            print(entry.name, entry.path)
Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.
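A small sketch of what that buys you: the entry can usually answer is_file() and stat() without extra system calls (path is assumed to be defined as in the snippet above):
import os

with os.scandir(path) as it:
    for entry in it:
        if entry.is_file() and entry.name.endswith(".asm"):
            info = entry.stat()  # often served from cached data, per the quote above
            print(entry.path, info.st_size)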
Python 3.4 and later offer pathlib in the standard library. You could do:
from pathlib import Path
asm_pths = [pth for pth in Path.cwd().iterdir()
            if pth.suffix == '.asm']
Or if you don't like list comprehensions:
asm_pths = []
for pth in Path.cwd().iterdir():
    if pth.suffix == '.asm':
        asm_pths.append(pth)
Path objects can easily be converted to strings.
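For example (the file name here is illustrative):
import os
from pathlib import Path

p = Path.cwd() / 'example.asm'
as_str = str(p)           # plain string form
as_fspath = os.fspath(p)  # also accepted wherever an os.PathLike is expected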
Here's how I iterate through files in Python:
import os

path = 'the/name/of/your/path'
folder = os.fsencode(path)

filenames = []
for file in os.listdir(folder):
    filename = os.fsdecode(file)
    if filename.endswith(('.jpeg', '.png', '.gif')):  # whatever file types you're using...
        filenames.append(filename)

filenames.sort()  # now you have the filenames and can do something with them
NONE OF THESE TECHNIQUES GUARANTEE ANY ITERATION ORDERING
Yup, super unpredictable. Notice that I sort the filenames, which is important if the order of the files matters, e.g. for video frames or time-dependent data collection. Be sure to put indices in your filenames though!
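If the indices are not zero-padded, a plain lexicographic sort puts frame_10 before frame_2. One way to sort numerically - a sketch assuming names like frame_<n>.png - is:
import re

filenames = ['frame_10.png', 'frame_2.png', 'frame_1.png']  # sample names
filenames.sort(key=lambda name: int(re.search(r'\d+', name).group()))
# -> ['frame_1.png', 'frame_2.png', 'frame_10.png']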
You can use glob to refer to the directory and the list:
import glob
import os

# to get the current working directory name
cwd = os.getcwd()

# Load the images from the images folder.
for f in glob.glob('images/*.jpg'):
    # The original called an undefined helper get_dir_name(f); one plausible
    # reading is the name of the file's parent directory:
    dir_name = os.path.basename(os.path.dirname(f))
    image_file_name = dir_name + '.jpg'
    # To print the file name with path (the path will be a string)
    print(image_file_name)
To get the list of all entries in a directory as an array, you can use os:
os.listdir(directory)
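If you only want the sub-directories rather than every entry, filter with os.path.isdir - a small sketch:
import os

dirs = [d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d))]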
I'm not quite happy with this implementation yet; I wanted to have a custom constructor that does DirectoryIndex._make(next(os.walk(input_path))), such that you can just pass the path you want a file listing for. Edits welcome!
import collections
import os

DirectoryIndex = collections.namedtuple('DirectoryIndex', ['root', 'dirs', 'files'])

index = DirectoryIndex(*next(os.walk('.')))
for file_name in index.files:
    file_path = os.path.join(index.root, file_name)
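One way to get the custom constructor the answer wishes for - a sketch, not part of the original answer - is a classmethod on a namedtuple subclass:
import collections
import os

class DirectoryIndex(collections.namedtuple('DirectoryIndex', ['root', 'dirs', 'files'])):
    @classmethod
    def from_path(cls, input_path):
        # Index only the top level of input_path
        return cls._make(next(os.walk(input_path)))

index = DirectoryIndex.from_path('.')
for file_name in index.files:
    file_path = os.path.join(index.root, file_name)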
I really like using the scandir function that is built into the os library. Here is a working example:
import os

i = 0
with os.scandir('/usr/local/bin') as root_dir:
    for path in root_dir:
        if path.is_file():
            i += 1
            print(f"Full path is: {path} and just the name is: {path.name}")
print(f"{i} files scanned successfully.")
Get all the .asm files in a directory by doing this.
import os

path = "path_to_file"
file_type = '.asm'

for filename in os.listdir(path=path):
    if filename.endswith(file_type):
        print(filename)
        print(f"{path}/{filename}")
        # do something below
I don't understand why some answers are complicated. This is how I would do it with Python 2.7. Replace DIRECTORY_TO_LOOP with the directory you want to use.
import os

DIRECTORY_TO_LOOP = '/var/www/files/'

for root, dirs, files in os.walk(DIRECTORY_TO_LOOP, topdown=False):
    for name in files:
        print(os.path.join(root, name))

Reading multiple txt files from multiple folders

I have 20 folders, each containing 50 txt files. I need to read all of them in order to compare the word counts of each folder. I know how to read multiple files from one folder, but it is slow; is there a more efficient way than reading the folders one by one, like below?
import re
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
import os
import glob

1. folder1
folder_path = '/home/runner/Final-Project/folder1'
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename, 'r') as f:
        text = f.read()
        print(filename)
        print(len(text))

2. folder2
folder_path = '/home/runner/Final-Project/folder2'
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename, 'r') as f:
        text = f.read()
        print(filename)
        print(len(text))
You can do something similar using glob, like you have, but with the directory names:
folder_path = '/home/runner/Final-Project'
for filename in glob.glob(os.path.join(folder_path, '*', '*.txt')):
    ...  # process your files
The first '*' in the os.path.join() call matches directories of any name, so calling glob.glob() like this finds any text file in any direct sub-directory of folder_path.
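If the folders nest deeper than one level, recursive globbing (Python 3.5+) covers every depth - a variant of the same idea:
import glob
import os

folder_path = '/home/runner/Final-Project'
for filename in glob.glob(os.path.join(folder_path, '**', '*.txt'), recursive=True):
    print(filename)  # or process the file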
The function below returns a list of the files in all directories and sub-directories, without using glob. Read from the list of files and open each one to process it.
import os

def list_of_files(dirName):
    files_list = os.listdir(dirName)
    all_files = list()
    for entry in files_list:
        # Create full path
        full_path = os.path.join(dirName, entry)
        if os.path.isdir(full_path):
            all_files = all_files + list_of_files(full_path)
        else:
            all_files.append(full_path)
    return all_files

print(list_of_files(<Dir Path>))  # <Dir Path> ==> your directory path
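For comparison, a sketch of the same flat listing built with os.walk, which handles the recursion internally (the function name is illustrative):
import os

def list_of_files_walk(dirName):
    all_files = []
    for root, _, files in os.walk(dirName):
        for name in files:
            all_files.append(os.path.join(root, name))
    return all_files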

How to read all .txt files from a directory

I would like to read all the contents from all the text files in a directory. I have 4 text files in the "path" directory, and my code is:
for filename in os.listdir(path):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        content = f.read()
        thelist = content.splitlines()
        f.close()

print(filepath)
print(content)
print()
When I run the code, I only get the contents of one text file.
I would be thankful for any advice or suggestions, or for pointers to other informative questions on Stack Overflow.
If you need to filter file names by suffix, i.e. file extension, you can either use the string method endswith or the glob module of the standard library: https://docs.python.org/3/library/glob.html
Here is an example which saves each file's content as a string in a list.
import os

path = '.'  # or your path
files_content = []
for filename in filter(lambda p: p.endswith("txt"), os.listdir(path)):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        files_content += [f.read()]
Here is an example of the glob way:
import glob

for filename in glob.glob('*txt'):
    print(filename)
This should list your files, and you can then read them one by one. All the lines of the files are stored in the all_lines list. If you wish to store the content too, you can keep appending it as well.
from pathlib import Path
from os import listdir
from os.path import isfile, join

path = "path_to_dir"
only_files = [f for f in listdir(path) if isfile(join(path, f))]
all_lines = []
for file_name in only_files:
    file_path = Path(path) / file_name
    with open(file_path, 'r') as f:
        file_content = f.read()
    all_lines.append(file_content.splitlines())
    print(file_content)
# use all_lines
Note: when using with you do not need to call close() explicitly
Reference: How do I list all files of a directory?
Basically, if you want to read all the files, you need to save their contents somehow. In your example, you are overwriting thelist with content.splitlines() on each iteration, which discards everything already in it.
Instead, define thelist outside of the loop and call thelist.append(content.splitlines()) each time, which adds the content to the list on every iteration.
Then you can iterate over thelist later and get the data out.
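A minimal sketch of that fix, reusing the names from the question (path as defined there):
import os

thelist = []  # defined once, outside the loop
for filename in os.listdir(path):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        content = f.read()
    thelist.append(content.splitlines())  # accumulate instead of overwrite

for lines in thelist:
    print(lines)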

Python: If filename in specified path contains string, then move to folder

New to python here.
I would like to create a script that will scan my directory and, if a filename contains a certain string, automatically move that file to a folder of my choice.
I have tried this, but with no luck:
import os
import shutil
import fnmatch
import glob

ffe_path = 'E:/FFE'
new_path = 'E:/FFE/Membership/letters'
keyword = 'membership'

os.chdir('E:/FFE/Membership')
os.mkdir('letters')

source_dir = 'E:/FFE'
dest_dir = 'E:/FFE/Membership/letters'
os.chdir(source_dir)

for top, dirs, files in os.walk(source_dir):
    for filename in files:
        if not filename.endswith('.docx'):
            continue
        file_path = os.path.join(top, filename)
        with open(file_path, 'r') as f:
            if '*membership' in f.read():
                shutil.move(file_path, os.path.join(dest_dir, filename))
Any insight would be greatly appreciated.
A simple function will do the trick:
import os
import shutil

def copyCertainFiles(source_folder, dest_folder, string_to_match, file_type=None):
    # Check all files in source_folder
    for filename in os.listdir(source_folder):
        # Move the file if the filename contains the string to match
        if file_type is None:
            if string_to_match in filename:
                shutil.move(os.path.join(source_folder, filename), dest_folder)
        # Check if the keyword and the file type both match
        elif isinstance(file_type, str):
            if string_to_match in filename and file_type in filename:
                shutil.move(os.path.join(source_folder, filename), dest_folder)
source_folder = full/relative path of source folder
dest_folder = full/relative path of destination folder (will need to be created beforehand)
string_to_match = a string on the basis of which the files will be copied
file_type (optional) = if only a particular file type should be moved.
You can, of course, make this function even better, by having arguments for ignoring case, automatically creating a destination folder if it does not exist, copying all files of a particular file type if no keyword is specified, and so on. Furthermore, you can also use regexes to match file types, which is far more flexible, as sketched below.
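For instance, a regex can match several extensions and ignore case in one test - a sketch with an illustrative pattern, assuming the source_folder and dest_folder arguments from the function above:
import os
import re
import shutil

pattern = re.compile(r'membership.*\.(docx|pdf)$', re.IGNORECASE)  # illustrative pattern
for filename in os.listdir(source_folder):
    if pattern.search(filename):
        shutil.move(os.path.join(source_folder, filename), dest_folder)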
f.read() reads the file's contents. You most likely want to check whether the string is in the file's name, not in its contents. I fixed your code to look at the file name:
import os
import shutil

ffe_path = 'E:/FFE'
new_path = 'E:/FFE/Membership/letters'
keyword = 'membership'

os.chdir('E:/FFE/Membership')
os.mkdir('letters')

source_dir = 'E:/FFE'
dest_dir = 'E:/FFE/Membership/letters'
os.chdir(source_dir)

for top, dirs, files in os.walk(source_dir):
    for filename in files:
        if not filename.endswith('.docx'):
            continue
        file_path = os.path.join(top, filename)
        if keyword in filename:
            shutil.move(file_path, os.path.join(dest_dir, filename))

Want to write unique filenames and paths to text file, duplicate names will have different extensions

I am trying to write out the filepath for files with specific file extensions to a text file. There are some files that have different extensions but the same file name, and I am assuming these are duplicates and only want to retain one entry. Here is what I have for code - it is not writing anything out to the file. What am I missing?
import os

path = r'S:\Photogr\ASC'
file_ext_lst = ['.2dm', '.2de', '.3dm', '.3de', '.dgn']
txtfile = r'D:\test\microstation_filenames_paths.txt'

for dirpath, dirnames, filenames in os.walk(path):
    for filename in filenames:
        fullPath = os.path.join(dirpath, filename)
        name = os.path.splitext(filename)[0]
        if filename[-4:] in file_ext_lst:
            with open(txtfile, 'r+') as f:
                for line in f:
                    if name not in line:
                        f.write(fullPath + '\n')
            f.close()
The following code writes the unique file names and paths to a text file, keeping only the first entry for names that appear with several extensions.
import os

# path = r'S:\Photogr\ASC'
path = 'temp'
file_ext_lst = ['.2dm', '.2de', '.3dm', '.3de', '.dgn']
txtfile = r'D:\test\microstation_filenames_paths.txt'

found = dict()
for dirpath, _, filenames in os.walk(path):
    for filename in filenames:
        fullPath = os.path.join(dirpath, filename)
        name, ext = os.path.splitext(filename)
        if ext not in file_ext_lst:
            continue
        if name not in found:
            found[name] = fullPath

with open('unique.txt', 'w') as outf:
    print('Unique files:', file=outf)
    for name, path in found.items():
        print('{:<10} {}'.format(name, path), file=outf)
Disclaimer: I haven't tried creating some sample files and testing the code - if my first two suggestions do not help, I can try and look further!
You can remove f.close() after with open(), Python does that automatically for you.
You can also simplify the with open() block to:
with open(txtfile, 'r+') as f:
    if name not in f.read():
        f.write(fullPath + '\n')
On another note: opening and writing to your text file could happen a lot, which would be very slow - I would suggest storing your candidates in a list first and writing them to the text file only after the os.walk() part, as sketched below.
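A sketch of that batching idea, reusing the names from the question:
import os

found = set()
lines = []
for dirpath, _, filenames in os.walk(path):
    for filename in filenames:
        name, ext = os.path.splitext(filename)
        if ext in file_ext_lst and name not in found:
            found.add(name)
            lines.append(os.path.join(dirpath, filename))

with open(txtfile, 'w') as f:  # opened once, after the walk
    f.write('\n'.join(lines) + '\n')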
