Shutil find and remove files - python

I am trying to automate some work which is currently done by hand.
The aim is to find all the documents which have, for example, the number 408710 in their file name. Please note that the file name also includes other letters or digits. An example could be 2rsgf54087105f85sfr. The program should search for all the files which contain the combination 408710 and then move them to the right path.
I know how to move the files, but so far I can only move them by entering the exact file name. That way I only get one file and not all the files with the mentioned combination. Of course I do not know the exact file names in advance anyway.
Here is the code for the part that is working:
import shutil
src = "C:/Users/Startklar/Desktop/Ausgangsordner"
dst = "C:/Users/Startklar/Desktop/Empfangsordner/Sven"
dst2 = "C:/Users/Startklar/Desktop/Empfangsordner/Gerald"
# move files
shutil.move(src=src + "/AA023300408710LFVI.docx", dst=dst)
shutil.move(src=src + "/BB023310187105ADIK.docx", dst=dst2)

If you just want to remove the files, you can do it like this using a regular expression:
import os
import re
path = "C:/Users/Startklar/Desktop/Ausgangsordner"  # folder to search
regexp = r'yourPattern.*\.docx$'
res = [f for f in os.listdir(path) if re.search(regexp, f)]
for f in res:
    print('Remove: ' + f)
    os.remove(os.path.join(path, f))  # os.listdir returns bare names, so add the folder
You will need to find a regular expression that matches only the files you would like to remove.
If you in fact want to move the files, as in your example, it looks like this (just guessing the regexp from your example):
import os
import re
import shutil
src = "C:/Users/Startklar/Desktop/Ausgangsordner"
filters = [["C:/Users/Startklar/Desktop/Empfangsordner/Sven", r'.*LFVI\.docx$'],
           ["C:/Users/Startklar/Desktop/Empfangsordner/Gerald", r'.*ADIK\.docx$']]
for f in os.listdir(src):
    for dst, regexp in filters:
        if re.search(regexp, f):
            # os.listdir returns bare names, so prepend the source folder
            shutil.move(src=os.path.join(src, f), dst=dst)
            break  # the file has been moved, stop testing further filters
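For the goal stated in the question (collecting every file whose name contains a given number, wherever it appears), a plain substring test may already be enough. The following is a minimal sketch rather than a tested drop-in: `move_matching` and its `rules` mapping are hypothetical names, and which number belongs to which folder is an assumption based on the question.

```python
import os
import shutil


def move_matching(src, rules):
    """Move every file in src whose name contains one of the keys of rules.

    rules maps a substring (for example "408710") to a destination folder.
    Returns the list of (file name, destination) pairs that were moved.
    """
    moved = []
    for name in sorted(os.listdir(src)):
        path = os.path.join(src, name)
        if not os.path.isfile(path):
            continue  # skip sub-folders
        for needle, dst in rules.items():
            if needle in name:
                shutil.move(path, os.path.join(dst, name))
                moved.append((name, dst))
                break  # a file can only be moved once
    return moved
```

With the folders from the question this could be called as `move_matching(src, {"408710": dst})`; the destination folders must already exist.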

Related

Segregate files based on filename

I've got a directory containing multiple images, and I need to separate them into two folders based on a portion of the file name. Here's a sample of the file names:
22DEC167603520981127600_03.jpg
13NOV162302999230157801_07.jpg
08JAN147603811108236510_02.jpg
21OCT152302197661710099_07.jpg
07MAR172302551529900521_01.jpg
19FEB173211074174309177_09.jpg
19FEB173211881209232440_02.jpg
19FEB172302491000265198_04.jpg
I need to move the files into two folders according to the four digits that directly follow the date, so files containing 2302 and 3211 would go into an existing folder named "panchromatic" and files with 7603 would go into another folder named "sepia".
I've tried multiple examples from other questions, and none seem to fit this problem. I'm very new to Python, so I'm not sure what example to post. Any help would be greatly appreciated.
You can do this the easy way or the hard way.
Easy way
Test if your filename contains the substring you're looking for.
import os
import shutil
files = os.listdir('.')
for f in files:
    # skip non-jpeg files
    if not f.endswith('.jpg'):
        continue
    # move if panchromatic
    if '2302' in f or '3211' in f:
        shutil.move(f, os.path.join('panchromatic', f))
    # move if sepia
    elif '7603' in f:
        shutil.move(f, os.path.join('sepia', f))
    # notify if something else
    else:
        print('Could not categorize file with name %s' % f)
This solution in its current form is susceptible to mis-classification, as the number we're looking for can appear by chance later in the string. I'll leave you to find ways to mitigate this.
Hard way
Regular expressions. Match the four digits after the date with a regular expression. Left for you to explore!
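One way the regular-expression route could look is sketched below. It assumes every name starts with a two-digit day, a three-letter upper-case month and a two-digit year, as in the sample names; `NAME_RE`, `FOLDERS` and `classify` are hypothetical names chosen for this illustration. Anchoring the pattern at the start of the name avoids the mis-classification mentioned above, because digits that happen to appear later in the name are never inspected.

```python
import re

# Assumed layout: 2-digit day, 3-letter month, 2-digit year, then the 4-digit code.
NAME_RE = re.compile(r'^\d{2}[A-Z]{3}\d{2}(\d{4})')

# Target folder per code, taken from the question.
FOLDERS = {'2302': 'panchromatic', '3211': 'panchromatic', '7603': 'sepia'}


def classify(filename):
    """Return the target folder for filename, or None if it cannot be categorized."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    return FOLDERS.get(match.group(1))
```

The result of `classify(f)` can then be passed to `shutil.move` in place of the substring tests in the easy version.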
Self-explanatory, with Python 3, or Python 2 + the pathlib backport:
import pathlib
import shutil
# Directory paths. Tailor this to your files layout
# see https://docs.python.org/3/library/pathlib.html#module-pathlib
source_dir = pathlib.Path('.')
sepia_dir = source_dir / 'sepia'
panchro_dir = source_dir / 'panchromatic'
assert sepia_dir.is_dir()
assert panchro_dir.is_dir()
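destinations = {
    ('2302', '3211'): panchro_dir,
    ('7603',): sepia_dir
}
for filepath in source_dir.glob('*.jpg'):
    # The four digits sit right after the 7-character date, e.g. 22DEC16.
    # Use .name so a directory prefix cannot shift the slice.
    marker = filepath.name[7:11]
    for key, value in destinations.items():
        if marker in key:
            # glob() already yields paths that include source_dir
            shutil.move(str(filepath), str(value))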

In Python, How do I check whether a file exists starting or ending with a substring?

I know about os.path.isfile(fname), but now I need to search if a file exists that is named FILEnTEST.txt where n could be any positive integer (so it could be FILE1TEST.txt or FILE9876TEST.txt)
I guess a solution to this could involve substrings that the filename starts/ends with OR one that involves somehow calling os.path.isfile('FILE' + n + 'TEST.txt') and replacing n with any number, but I don't know how to approach either solution.
You would need to write your own filtering system, by getting all the files in a directory and then matching them to a regex string and seeing if they fail the test or not:
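import os
import re
# raw string, escaped dot and $ so that only exactly FILE<digits>TEST.txt matches
pattern = re.compile(r"FILE\d+TEST\.txt$")
directory = "/test/"
for filepath in os.listdir(directory):
    if pattern.match(filepath):
        print(filepath)  # do stuff with the matching file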
I'm not near a machine with Python installed on it to test the code, but it should be something along those lines.
You can use a regular expression:
/FILE\d+TEST\.txt/
Example: regexr.com.
Then you can use said regular expression and iterate through all of the files in a directory.
import re
import os
filename_re = r'FILE\d+TEST\.txt'
directory = '.'  # folder to search, adjust as needed
for filename in os.listdir(directory):
    if re.search(filename_re, filename):
        # this file has the form FILEnTEST.txt
        # do what you want with it now
        print(filename)
You can also do it as such:
import os
import re
if any(re.search('regex', file) for file in os.listdir(directory)):
    # there's at least 1 such file
    print('found a matching file')
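The approaches above can be combined into a single existence check. This is a sketch under a few assumptions: `matching_file_exists` and `FILE_RE` are hypothetical names, the whole name must match (hence `fullmatch`), and `\d+` also accepts `0` and leading zeros, which is slightly looser than "any positive integer".

```python
import os
import re

# FILE, one or more digits, TEST.txt; the dot is escaped so it matches literally.
FILE_RE = re.compile(r'FILE\d+TEST\.txt')


def matching_file_exists(directory):
    """Return True if directory holds a regular file named FILE<digits>TEST.txt."""
    return any(
        FILE_RE.fullmatch(name) and os.path.isfile(os.path.join(directory, name))
        for name in os.listdir(directory)
    )
```

Because `any` stops at the first hit, the directory listing is not scanned further than necessary.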

Creating subdirectories and sorting files based on filename PYTHON

I have a large directory with many part files with their revisions, I want to recursively create a new folder for each part, and then move all of the related files into that folder. I am trying to do this by isolating a 7 digit number which would be used as an identifier for the part, and all the related filenames would also include this number.
import os
import shutil
import csv
import glob
from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree, copy2, Error, copystat
from shutil import copytree, ignore_patterns
dirname = ' '
# pattern = '*???????*'
for root, dirs, files in os.walk(dirname):
    for fpath in files:
        print(fpath)
        if fpath[0:6].isdigit():
            matchdir = os.mkdir(os.path.join(os.path.dirname(fpath)))
            partnum = str(fpath[0:6])
            pattern = str(partnum)
            filematch = fnmatch(files, pattern)
            print(filematch)
            shutil.move(filematch, matchdir)
This is what I have so far. Basically, I'm not sure how to get the original filename and use it as the matching pattern for the rest of the files. The original filename I want to use for this matching pattern is just a 7 digit number, and all of the related files may have other characters ((REV-2), for example).
Don't overthink it
I think you're getting confused about what os.walk() gives you - recheck the docs. dirs and files are just a list of names of the directories / files, not the full paths.
Here's my suggestion. Assuming that you're starting with a directory layout something like:
directory1
1234567abc.txt
1234567abc.txt
1234567bcd.txt
2234567abc.txt
not-interesting.txt
And want to end with something like:
directory1
1234567
abc.txt
1234567
abc.txt
bcd.txt
2234567
abc.txt
not-interesting.txt
If that's correct, then there's no need to rematch the files in the directory, just operate on each file individually, and make the part directory only if it doesn't already exist. I would also use a regular expression to do this, so something like:
import os
import re
import shutil
dirname = '.'  # top-level directory to process, adjust as needed
for root, dirs, files in os.walk(dirname):
    for fname in files:
        # Match a string starting with 7 digits followed by everything else.
        # Capture each part in a group so we can access them later.
        match_object = re.match(r'([0-9]{7})(.*)$', fname)
        if match_object is None:
            # The regular expression did not match, ignore the file.
            continue
        # Form the new directory path using the number from the regular expression and the current root.
        new_dir = os.path.join(root, match_object.group(1))
        if not os.path.isdir(new_dir):
            os.mkdir(new_dir)
        new_file_path = os.path.join(new_dir, match_object.group(2))
        # Or, if you don't want to change the filename, use this line instead:
        # new_file_path = os.path.join(new_dir, fname)
        old_file_path = os.path.join(root, fname)
        shutil.move(old_file_path, new_file_path)
Note that I have:
Switched the sense of the condition, we continue the loop immediately if the file is not interesting. This is a useful pattern to use to make sure that your code does not get too heavily indented.
Changed the name of fpath to fname. This is because it's not a path but just the name of the file, so it's better to call it fname.
Please clarify the question if that's not what you meant!
[edit] to show how to copy the file without changing its name.
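Before moving anything in a large directory, it can help to compute the planned moves first and inspect them. The sketch below does only that and never touches the file system; `plan_moves` and `PART_RE` are hypothetical names, and it keeps the original file name (the variant without renaming).

```python
import os
import re

# Seven leading digits identify the part; the rest of the name is kept as-is.
PART_RE = re.compile(r'([0-9]{7})(.*)$')


def plan_moves(root, file_names):
    """Return (old_path, new_path) pairs for names starting with a 7-digit part number."""
    plan = []
    for fname in file_names:
        match = PART_RE.match(fname)
        if match is None:
            continue  # not a part file, leave it where it is
        new_dir = os.path.join(root, match.group(1))
        plan.append((os.path.join(root, fname), os.path.join(new_dir, fname)))
    return plan
```

Printing the result of `plan_moves(root, files)` inside the `os.walk` loop shows what would happen before `shutil.move` is ever called.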

Python - Shutil.copy vs fnmatch vs regular expression

I have a directory with multiple files beginning with integers. I am attempting to copy some of them to another directory based on a string pattern within the file name. I can successfully copy multiple files starting with integers (which I commented out), but am having trouble filtering based on the string pattern. I'm using shutil.copy, but am having trouble in determining whether to use regex or fnmatch.
My code below filters correctly, but still copies all files, not files with the specific string 'TEST_Payroll'. Any help to do this would be appreciated. Thanks!!
import re
import os
import fnmatch
import shutil
src_files = os.listdir('C:/Users/acars/Desktop/a')
regex_txt = 'TEST_Payroll'
source = 'C:/Users/acars/Desktop/a'
dest1 = 'C:/Users/acars/Desktop/b'
for file_name in src_files:
    #if not file_name.startswith(('0','1','2','3','4','5','6','7','8','9',)):
    if fnmatch.filter(file_name, 'TEST_Payroll'):
        continue
    src = os.path.join(source, file_name)
    dst = os.path.join(dest1, file_name)
    shutil.copy(src, dst)
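How about using re.search inside your loop:
for file_name in src_files:
    if not re.search(r'TEST_Payroll', file_name):
        continue  # name does not contain the pattern, do nothing
    # do something with the file, e.g. copy it
    shutil.copy(os.path.join(source, file_name), os.path.join(dest1, file_name))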
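A likely reason the loop in the question copies everything: `fnmatch.filter` expects a list of names as its first argument. Given a single string, it iterates over the individual characters, none of which matches the pattern 'TEST_Payroll', so it returns an empty list and the `continue` never runs. To test one name, `fnmatch.fnmatch` (or the case-sensitive `fnmatch.fnmatchcase`) is the matching call, and the pattern needs wildcards to match a substring. A minimal sketch, with `select_payroll` as a hypothetical helper name:

```python
import fnmatch


def select_payroll(file_names, pattern='*TEST_Payroll*'):
    """Return the names that match the shell-style pattern, case-sensitively."""
    return [name for name in file_names if fnmatch.fnmatchcase(name, pattern)]
```

The returned names can then be joined with the source and destination folders and passed to `shutil.copy`.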

Python - Strip all drive letters from csv file and replace with Z:

Here is the code example. Basically, output.csv needs to have any drive letter A:-Y: removed and replaced with Z:. I tried to do this with a list (not complete yet), but it generates the error: TypeError: expected a character buffer object
#!/usr/bin/python
import os.path
import os
import shutil
import csv
import re
# Create the videos directory in the current directory
# If the directory exists ignore it.
#
# Moves all files with the .wmv extension to the
# videos folder for file structure
#
#Crawl the videos directory then change to videos directory
# create the videos.csv file in the videos directory
# replace any drive letter A:-Y: with Z:
def createCSV():
    directory = "videos"
    if not os.path.isdir("." + directory + "/"):
        os.mkdir("./" + directory + "/")
    for file in os.listdir("./"):
        if os.path.splitext(file)[1] == ".wmv":
            shutil.move(file, os.path.join("videos", file))
    listDirectory = os.listdir("videos")
    os.chdir(directory)
    f = open("videos.csv", "w")
    f.writelines(os.path.join(os.getcwd(), f + '\n') for f in listDirectory)
    f = open('videos.csv', 'r')
    w = open('output.csv', 'w')
    f_cont = f.readlines()
    for line in f_cont:
        regex = re.compile("\b[GHI]:")
        re.sub(regex, "Z:", line)
        w.write(line)
    f.close()
createCSV()
EDIT:
I think my flow/logic is wrong: the output.csv file that gets created still contains G:. It was not renamed to Z:\ by the re.sub line.
I can see you use some pythonic snippets, with smart use of path.join and commented code. This can get even better, so let's rewrite a few things to solve your drive letter issue and end up with more pythonic code along the way:
#!/usr/bin/env python
# -*- coding= UTF-8 -*-
# Firstly, modules can be documented using docstring, so drop the comments
"""
Create the videos directory in the current directory
If the directory exists ignore it.
Moves all files with the .wmv extension to the
videos folder for file structure
Crawl the videos directory then change to videos directory
create the videos.csv file in the videos directory
create output.csv replace any drive letter A:-Y: with Z:
"""
# no need to import both os and os.path, as the second is contained in the first one
import os
import shutil
import csv
# import glob, it will be handy
import glob
import ntpath # this is to split the drive
# don't really need to use a function
# Here, don't bother checking if the directory exists
# and you don't need add any slash either
directory = "videos"
ext = "*.wmv"
try:
    os.mkdir(directory)
except OSError:
    pass
listDirectory = [] # creating a buffer so no need to list the dir twice
for file in glob.glob(ext): # much easier this way, isn't it ?
    shutil.move(file, os.path.join(directory, file)) # good catch for shutil :-)
    listDirectory.append(file)
os.chdir(directory)
# you've smartly imported the csv module, so let's use it !
f = open("videos.csv", "w")
vid_csv = csv.writer(f)
w = open('output.csv', 'w')
out_csv = csv.writer(w)
# let's do everything in one loop
for file in listDirectory:
    file_path = os.path.abspath(file)
    # Python includes functions to deal with drive letters :-D
    # I use ntpath because I am under linux but you can use
    # normal os.path functions on windows with the same names
    file_path_with_new_letter = ntpath.join("Z:", ntpath.splitdrive(file_path)[1])
    # let's write the csv, using tuples
    vid_csv.writerow((file_path, ))
    out_csv.writerow((file_path_with_new_letter, ))
# close both files so everything is flushed to disk
f.close()
w.close()
It seems like the problem is in the loop at the bottom of your code. The string's replace method doesn't take a list as its first argument, but another string. You need to loop through your removeDrives list and call line.replace with every item in that list.
You could use
for driveletter in removedrives:
    line = line.replace(driveletter, 'Z:')
thereby iterating over your list and replacing one of the possible drive letters after the other. As abyx wrote, replace expects a string, not a list, so you need this extra step.
Or use a regular expression like
import re
regex = re.compile(r"\b[FGH]:")
line = re.sub(regex, "Z:", line)  # re.sub returns the new string, so keep the result
Additional bonus: Regex can check that it's really a drive letter and not, for example, a part of something bigger like OH: hydrogen group.
Apart from that, I suggest you use os.path's own path manipulation functions instead of trying to implement them yourself.
And of course, if you do anything further with the CSV file, take a look at the csv module.
A commenter above has already mentioned that you should close all the files you've opened. Or use the with statement:
with open("videos.csv", "w") as f:
    do_stuff()
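Two things probably explain why the question's output.csv still contains G:. First, `re.sub` returns a new string and leaves `line` unchanged, so its result has to be written out. Second, `"\b"` without the `r` prefix is a backspace character rather than a word boundary. A minimal sketch that puts the regular expression and the with statement together; `to_z_drive`, `convert` and `DRIVE_RE` are hypothetical names, and the pattern assumes the drive letter is the first thing on each line:

```python
import re

# A single letter A-Y (either case) followed by a colon, at the very start of the line.
DRIVE_RE = re.compile(r'^[A-Ya-y]:')


def to_z_drive(line):
    """Return line with a leading drive letter A:-Y: replaced by Z:."""
    return DRIVE_RE.sub('Z:', line)


def convert(in_path, out_path):
    """Copy in_path to out_path, rewriting the drive letter of every line."""
    # Both files are closed automatically when the with block ends.
    with open(in_path) as src, open(out_path, 'w') as out:
        for line in src:
            out.write(to_z_drive(line))
```

In the question's script this would be called as `convert('videos.csv', 'output.csv')` after videos.csv has been written and closed.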
