Python - Shutil.copy vs fnmatch vs regular expression

I have a directory with multiple files beginning with integers. I am attempting to copy some of them to another directory based on a string pattern within the file name. I can successfully copy multiple files starting with integers (which I commented out), but am having trouble filtering based on the string pattern. I'm using shutil.copy, but am having trouble in determining whether to use regex or fnmatch.
My code below filters correctly, but still copies all files, not files with the specific string 'TEST_Payroll'. Any help to do this would be appreciated. Thanks!!
import re
import os
import fnmatch
import shutil
src_files = os.listdir('C:/Users/acars/Desktop/a')
regex_txt = 'TEST_Payroll'
source = 'C:/Users/acars/Desktop/a'
dest1 = 'C:/Users/acars/Desktop/b'
for file_name in src_files:
    #if not file_name.startswith(('0','1','2','3','4','5','6','7','8','9',)):
    if fnmatch.filter(file_name, 'TEST_Payroll'):
        continue
    src = os.path.join(source, file_name)
    dst = os.path.join(dest1, file_name)
    shutil.copy(src, dst)

How about using re.search inside the loop:
if re.search(r'TEST_Payroll', file_name):
    pass  # do something with the file
else:
    pass  # otherwise do nothing
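For context on why the question's loop copies everything: fnmatch.filter expects a list of names as its first argument. Given a single string, it iterates over the individual characters, none of which match 'TEST_Payroll', so it returns an empty list and the continue never runs. A minimal sketch of the fnmatch route follows; the function name copy_matching and the wildcard pattern '*TEST_Payroll*' are assumptions made for illustration, not part of the original post.

```python
import fnmatch
import os
import shutil


def copy_matching(source, dest, pattern="*TEST_Payroll*"):
    """Copy files from source to dest whose names match a shell-style pattern."""
    copied = []
    for file_name in sorted(os.listdir(source)):
        src = os.path.join(source, file_name)
        # Skip sub-directories and names that do not match the pattern.
        if not os.path.isfile(src) or not fnmatch.fnmatch(file_name, pattern):
            continue
        shutil.copy(src, os.path.join(dest, file_name))
        copied.append(file_name)
    return copied
```

It could be called as copy_matching('C:/Users/acars/Desktop/a', 'C:/Users/acars/Desktop/b'). Note that fnmatch.fnmatch ignores case on Windows; fnmatch.fnmatchcase is the case-sensitive variant.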

Related

Create folders with file name and rename part of it

I have some PDF files, and I need to create folders named after part of each file name and move the PDFs into those folders.
cwd = os.getcwd()
padrao = '_V1_A0V0_T07-54-369-664_S00001.pdf'
for file in glob.glob("*.pdf"):
    dst = cwd + "\\" + file.replace(str(padrao), '').replace('P', '')
    os.mkdir(dst)
    shutil.move(file, dst)
Example: I have the files P9883231_V1_A0V0_T07-54-369-664_S00001.pdf, P9883231_V1_A0V0_T07-54-369-664_S00002.pdf and P1235567_V1_A0V0_T07-54-369-664_S00001.pdf.
In this example I need the script to create two folders: 9883231 and 1235567 (the number after the leading 'P' must be the name of the folder).
Notice that my code removes the unwanted parts to build the folder name: the 'P' at the beginning and padrao = '_V1_A0V0_T07-54-369-664_S00001.pdf'.
The problem is that the number at the end of padrao can vary: the file can end with "02.pdf", "03.pdf", and so on.
In the example above, the folder 9883231 should contain both files.
Regular expressions can do the trick here:
import re
import os
import glob
import shutil
cwd = os.getcwd()
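padrao = '_V1_A0V0_T07-54-369-664_S000'
for file in glob.glob("*.pdf"):
    # Capture the part between the leading P and padrao; the last two digits may vary.
    dst = os.path.join(cwd, re.findall(r"P(.*)" + re.escape(padrao) + r"\d{2}\.pdf", file)[0])
    # exist_ok avoids an error when a second file maps to the same folder.
    os.makedirs(dst, exist_ok=True)
    shutil.move(file, dst)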
Notice that I removed the part of padrao that varies.
The regex matches all strings that begin with a P, followed by the part to capture, followed by the padrao string value, followed by 2 digits, followed by .pdf; and takes the first occurrence (no check is made whether it found anything here ...)
Also, it is better practice to use os.path.join() to avoid issues when creating path strings (notably when changing OS).
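As a variation on the answer above, here is a sketch that checks whether the regex matched before using the result, and that tolerates a folder that already exists. The function name group_pdfs_by_id and the assumed naming scheme "P<digits>_<anything>.pdf" are illustrative assumptions based on the example file names in the question.

```python
import os
import re
import shutil

# Assumed naming scheme: "P<digits>_<anything>.pdf"; the digits name the folder.
ID_PATTERN = re.compile(r"P(\d+)_.*\.pdf")


def group_pdfs_by_id(directory):
    """Move each matching PDF into a sub-folder named after its numeric ID."""
    moved = {}
    for name in sorted(os.listdir(directory)):
        match = ID_PATTERN.fullmatch(name)
        if match is None:
            continue  # leave files that do not follow the scheme untouched
        target_dir = os.path.join(directory, match.group(1))
        os.makedirs(target_dir, exist_ok=True)
        shutil.move(os.path.join(directory, name), os.path.join(target_dir, name))
        moved.setdefault(match.group(1), []).append(name)
    return moved
```

Because the ID is captured up to the first underscore, the variable suffix ("S00001", "S00002", ...) no longer needs to be spelled out in the pattern.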

Shutil find and remove files

I am trying to automate some work which is currently done by hand.
The aim is to find all documents that have, for example, the number 408710 in their file name. Please note that the file name also includes other letters or digits; an example could be 2rsgf54087105f85sfr. The program should search for all files that contain the combination 408710 and then move them to the right path.
I know how to move files, but so far I can only move them by entering the exact file name. That way I move only one file, not all the files containing the combination, and of course I do not know the exact file names in advance.
Here is the code that works so far:
import shutil
src = "C:/Users/Startklar/Desktop/Ausgangsordner"
dst = "C:/Users/Startklar/Desktop/Empfangsordner/Sven"
dst2 = "C:/Users/Startklar/Desktop/Empfangsordner/Gerald"
# remove files
shutil.move(src=src + "/AA023300408710LFVI.docx", dst=dst)
shutil.move(src=src + "/BB023310187105ADIK.docx", dst=dst2)
If you just want to remove the files you can do it like this using regexp:
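import os
import re
path = "C:/Users/Startklar/Desktop/Ausgangsordner"  # folder to search (taken from the question)
regexp = r'yourPattern.*\.docx$'
res = [f for f in os.listdir(path) if re.search(regexp, f)]
for f in res:
    print('Remove: ' + f)
    # os.listdir returns bare names, so join with the folder before deleting.
    os.remove(os.path.join(path, f))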
You will need to find a regular expression that matches only the files you would like to remove.
If you in fact want to move the files, as in your example, it looks like this (just guessing the regexp from your example):
import os
import re
import shutil
src = "C:/Users/Startklar/Desktop/Ausgangsordner"
filters = [["C:/Users/Startklar/Desktop/Empfangsordner/Sven", r'.*LFVI\.docx$'],
           ["C:/Users/Startklar/Desktop/Empfangsordner/Gerald", r'.*ADIK\.docx$']]
for f in os.listdir(src):
    for dst, regexp in filters:
        if re.search(regexp, f):
            # os.listdir returns bare names, so build the full source path.
            shutil.move(src=os.path.join(src, f), dst=dst)
            break  # a file can only be moved once
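Since the question asks for files whose names contain a fixed run of digits such as 408710, a plain substring test may be enough and no regular expression is needed. The sketch below assumes that reading of the question; the function name move_files_containing is hypothetical.

```python
import os
import shutil


def move_files_containing(src, dst, token):
    """Move every file in src whose name contains token into dst."""
    os.makedirs(dst, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src)):
        path = os.path.join(src, name)
        # Only regular files are moved; sub-folders are left alone.
        if os.path.isfile(path) and token in name:
            shutil.move(path, os.path.join(dst, name))
            moved.append(name)
    return moved
```

With the paths from the question, this could be called as move_files_containing(src, dst, "408710").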

Moving Files: Matching Partial File/Directory Criteria (lastName, firstName) - Glob, Shutil

EDIT: ANSWER. Below is the answer to the question. I will leave all the subsequent text in place to show how difficult I made such an easy task.
from pathlib import Path
import shutil
base = "C:/Users/Kenny/Documents/Clients"
for file in Path("C:/Users/Kenny/Documents/Scans").iterdir():
    name = file.stem.split('-')[0].rstrip()
    subdir = Path(base, name)
    if subdir.exists():
        dest = Path(subdir, file.name)
        shutil.move(file, dest)
Preface:
I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on the matching client's name. The earlier question is linked below; a very kind person, Elis Byberi, helped me correct my original code. I'm encountering another problem, though.
To see our discussion and a similar question discussed:
-Python- Move All PDF Files in Folder to NewDirectory Based on Matching Names, Using Glob or Shutil
Python move files from directories that match given criteria to new directory
Question: How can you move all of the named files in :/Scans to their appropriately matched folder in :/Clients?
Background: Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed (I tried writing a program to auto-rename.. didn't work) based on client and content, such that the folder encloses PDFs labeled as follows:
lastName, firstName - [contentVariable]
(repeat the above 100,000x)
Within the :/C drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, named similar to the pattern above, as 'lastName, firstName'
EDIT: The code below will move the entire Scans folder to the Clients folder, which is close, but not exactly what I need to be doing. I only need to move the files within Scans to the corresponding client folder names.
import glob
import shutil
import os
source = "C:/Users/Kenny/Documents/Scans"
dest = "C:/Users/Kenny/Documents/Clients"
os.chdir("C:/Users/Kenny/Documents/Clients")
pattern = '*,*'
for x in glob.glob(pattern):
    fileName = os.path.join(source, x)
    print(fileName)
    shutil.move(source, dest)
EDIT 2 - CLOSE!: The code below will move all the files in Scans to the Clients folder, which is close, but not exactly what I need to be doing. I need to get each file into the correct corresponding file folder within the Clients folder.
This is a step forward from moving the entire Scans folder I would think.
source = "C:/Users/Kenny/Documents/Scans"
dest = "C:/Users/Kenny/Documents/Clients"
for (dirpath, dirnames, filenames) in walk(source):
    for file in filenames:
        shutil.move(path.join(dirpath,file), dest)
I have the following code below as well, and I am aware it does not do what I want it to do, so I am definitely missing something..
import glob
import shutil
import os
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.listdir(path)
for file in dirs:
    print(file)
dest_dir = "C:/Users/Kenny/Documents/Clients/{^w, $w}?"
for file in glob.glob(r'C:Users/Kenny/Documents/Clients/{^w, $w}?'):
    print(file)
    shutil.move(file, dest_dir)
1) Should I use os.scandir instead of os.listdir ?
2) Am I moving in the correct direction if I modify the code as such:
import glob
import shutil
import os
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.scandir(path)
for file in dirs:
    print(file)
dest_dir = "C:/Users/Kenny/Documents/Clients/*"
for file in glob.glob(r'C:Users/Kenny/Documents/Clients, *'):
    dest_dir = os.path.join(file, glob.glob)
    shutil.move(file, dest_dir)
Note within the 'for file in glob.glob(r'C:Users/Kenny/Documents/Clients/{^w, $w}?' I have tried replacing 'Clients/{^w, $w}?' with just 'Clients/*'
For the above, I only need the file in :/Scans, written as, "lastName, firstName - [content]" to be matched and moved to /Clients/[lastName, firstName] --- the [content] does not matter. But there are both greedy and nongreedy expressions... which is why I'm unsure about using * or {^w, $w}? -- because we have clients with the same last names, but different first names.
The following error is generated when running the first command:
Error 1
Error 2
The following error (though, there is no error?) is generated when running the second command:
Error 3
EDIT/POSSIBLE ANSWER
I have not yet tested this, but fnmatch.fnmatch(filename, pattern) can be used to test whether the filename string matches the pattern string, returning True or False; fnmatch.translate(pattern) instead converts the pattern into a regular expression string.
From here perhaps you could write a conditional statement:
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        shutil.move(source, destination)
or
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        shutil.move(file.join(eachFile, source), destination)
I have not tested the two aforesaid codes. I have no idea if they work, but editing allows others to see how my train of thought is progressing.
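One possible refinement of the pathlib approach shown at the top of this question: splitting the file name on the first " - " (with spaces) rather than on every '-' keeps hyphenated surnames intact. This is only a sketch under the naming scheme described above ("lastName, firstName - [content]"); the function name file_scans is hypothetical.

```python
import shutil
from pathlib import Path


def file_scans(scans_dir, clients_dir):
    """Move 'lastName, firstName - content' files into matching client folders."""
    moved = []
    for scan in sorted(Path(scans_dir).iterdir()):
        if not scan.is_file():
            continue
        # Split on the first " - " only, so hyphenated surnames stay intact.
        client, separator, _content = scan.stem.partition(" - ")
        target_dir = Path(clients_dir) / client.strip()
        # Files without a separator or without a client folder are left in place.
        if separator and target_dir.is_dir():
            shutil.move(str(scan), str(target_dir / scan.name))
            moved.append(scan.name)
    return moved
```

Because the whole "lastName, firstName" prefix is compared against the folder name, clients who share a last name but differ in first name end up in separate folders.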

In Python, How do I check whether a file exists starting or ending with a substring?

I know about os.path.isfile(fname), but now I need to search if a file exists that is named FILEnTEST.txt where n could be any positive integer (so it could be FILE1TEST.txt or FILE9876TEST.txt)
I guess a solution to this could involve substrings that the filename starts/ends with OR one that involves somehow calling os.path.isfile('FILE' + n + 'TEST.txt') and replacing n with any number, but I don't know how to approach either solution.
You would need to write your own filtering system, by getting all the files in a directory and then matching them to a regex string and seeing if they fail the test or not:
import os
import re
pattern = re.compile(r"FILE\d+TEST\.txt")
dir = "/test/"
for filepath in os.listdir(dir):
    # fullmatch requires the whole name to fit the pattern.
    if pattern.fullmatch(filepath):
        pass  # do stuff with matching file
I'm not near a machine with Python installed on it to test the code, but it should be something along those lines.
You can use a regular expression:
/FILE\d+TEST\.txt/
Example: regexr.com.
Then you can use said regular expression and iterate through all of the files in a directory.
import re
import os
directory = '.'  # assumption: search the current directory
filename_re = r'FILE\d+TEST\.txt'
for filename in os.listdir(directory):
    if re.search(filename_re, filename):
        # this file has the form FILEnTEST.txt
        # do what you want with it now
        pass
You can also do it as such:
import os
import re
directory = '.'  # assumption: search the current directory
if any(re.search('regex', file) for file in os.listdir(directory)):
    pass  # there's at least 1 such file
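The answers above can be combined into a single existence check. The sketch below is one way to do it; the function name numbered_file_exists is hypothetical, and \d+ also accepts leading zeros (FILE007TEST.txt), which may or may not be wanted.

```python
import os
import re

# Matches FILE<one or more digits>TEST.txt and nothing else.
FILE_N_TEST = re.compile(r"FILE\d+TEST\.txt")


def numbered_file_exists(directory):
    """Return True if directory contains a file named FILE<n>TEST.txt."""
    return any(
        FILE_N_TEST.fullmatch(name) and os.path.isfile(os.path.join(directory, name))
        for name in os.listdir(directory)
    )
```

Using fullmatch with an escaped dot avoids false positives such as FILE1TESTxtxt or FILE12TEST.txt.bak, which re.search with the unescaped pattern would accept.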

Creating subdirectories and sorting files based on filename PYTHON

I have a large directory with many part files with their revisions, I want to recursively create a new folder for each part, and then move all of the related files into that folder. I am trying to do this by isolating a 7 digit number which would be used as an identifier for the part, and all the related filenames would also include this number.
import os
import shutil
import csv
import glob
from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree, copy2, Error, copystat
from shutil import copytree, ignore_patterns
dirname = ' '
# pattern = '*???????*'
for root, dirs, files in os.walk(dirname):
    for fpath in files:
        print(fpath)
        if fpath[0:6].isdigit():
            matchdir = os.mkdir(os.path.join(os.path.dirname(fpath)))
            partnum = str(fpath[0:6])
            pattern = str(partnum)
            filematch = fnmatch(files, pattern)
            print(filematch)
            shutil.move(filematch, matchdir)
This is what I have so far. Basically, I'm not sure how to get the original filename and use it as the matching pattern for the rest of the files. The original filename I want to use for this matching pattern is just a 7 digit number, and all of the related files may have other characters (REV-2, for example).
Don't overthink it
I think you're getting confused about what os.walk() gives you - recheck the docs. dirs and files are just a list of names of the directories / files, not the full paths.
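To illustrate the point about os.walk(): each iteration yields the current directory (root) plus lists of bare directory and file names, so a full path has to be built with os.path.join(root, name). A small sketch, where the function name full_paths is hypothetical:

```python
import os


def full_paths(top):
    """Return the full path of every file below top."""
    paths = []
    for root, dirs, files in os.walk(top):
        for fname in files:
            # fname is a bare name such as "1234567abc.txt"; join it with root.
            paths.append(os.path.join(root, fname))
    return sorted(paths)
```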
Here's my suggestion. Assuming that you're starting with a directory layout something like:
directory1
    1234567abc.txt
    1234567abc.txt
    1234567bcd.txt
    2234567abc.txt
    not-interesting.txt
And want to end with something like:
directory1
    1234567
        abc.txt
    1234567
        abc.txt
        bcd.txt
    2234567
        abc.txt
    not-interesting.txt
If that's correct, then there's no need to rematch the files in the directory, just operate on each file individually, and make the part directory only if it doesn't already exist. I would also use a regular expression to do this, so something like:
import os
import re
import shutil
# dirname is the top-level directory, as defined in the question.
for root, dirs, files in os.walk(dirname):
    for fname in files:
        # Match a string starting with 7 digits followed by everything else.
        # Capture each part in a group so we can access them later.
        match_object = re.match(r'([0-9]{7})(.*)$', fname)
        if match_object is None:
            # The regular expression did not match, ignore the file.
            continue
        # Form the new directory path using the number from the regular expression and the current root.
        new_dir = os.path.join(root, match_object.group(1))
        if not os.path.isdir(new_dir):
            os.mkdir(new_dir)
        new_file_path = os.path.join(new_dir, match_object.group(2))
        # Or, if you don't want to change the filename, use this line instead:
        # new_file_path = os.path.join(new_dir, fname)
        old_file_path = os.path.join(root, fname)
        shutil.move(old_file_path, new_file_path)
Note that I have:
Switched the sense of the condition, we continue the loop immediately if the file is not interesting. This is a useful pattern to use to make sure that your code does not get too heavily indented.
Changed the name of fpath to fname. This is because it's not a path but just the name of the file, so it's better to call it fname.
Please clarify the question if that's not what you meant!
[edit] to show how to copy the file without changing its name.
