Moving Files: Matching Partial File/Directory Criteria (lastName, firstName) - Glob, Shutil

EDIT: ANSWER. Below is the answer to the question. I will leave all the subsequent text in place just to show how difficult I made such an easy task.
from pathlib import Path
import shutil
base = "C:/Users/Kenny/Documents/Clients"
for file in Path("C:/Users/Kenny/Documents/Scans").iterdir():
    name = file.stem.split('-')[0].rstrip()
    subdir = Path(base, name)
    if subdir.exists():
        dest = Path(subdir, file.name)
        shutil.move(file, dest)
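A slightly more defensive variant of the same idea, offered only as a sketch. The function name move_scans_to_clients, the dry_run flag, and the returned lists are assumptions added for illustration and are not part of the original answer. It splits on the first " - " separator rather than on any hyphen, so a hyphenated last name is not cut short, and it reports the files that have no matching client folder.

```python
from pathlib import Path
import shutil


def move_scans_to_clients(scans_dir, clients_dir, dry_run=True):
    """Move 'lastName, firstName - content.pdf' files into matching client folders.

    Returns (moved, unmatched) as two lists of file names.
    With dry_run=True nothing is moved; the lists show what would happen.
    """
    scans = Path(scans_dir)
    clients = Path(clients_dir)
    moved, unmatched = [], []
    # sorted() takes a snapshot, so moving files does not disturb the iteration.
    for file in sorted(scans.iterdir()):
        if not file.is_file():
            continue
        # Everything before the first " - " is assumed to be the client name.
        name, sep, _ = file.stem.partition(" - ")
        name = name.strip()
        subdir = clients / name
        if not sep or not name or not subdir.is_dir():
            unmatched.append(file.name)
            continue
        if not dry_run:
            shutil.move(str(file), str(subdir / file.name))
        moved.append(file.name)
    return moved, unmatched
```

Running it once with dry_run=True and inspecting the unmatched list before running it with dry_run=False is a cheap way to catch misspelled client names among the scans.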
Preface:
I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on the matching client's name. The earlier question is linked below; a very kind person, Elis Byberi, helped me correct my original code. I'm encountering another problem, though.
To see our discussion and a similar question discussed:
-Python- Move All PDF Files in Folder to NewDirectory Based on Matching Names, Using Glob or Shutil
Python move files from directories that match given criteria to new directory
Question: How can you move all of the named files in :/Scans to their appropriately matched folder in :/Clients?
Background: Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed (I tried writing a program to auto-rename.. didn't work) based on client and content, such that the folder encloses PDFs labeled as follows:
lastName, firstName - [contentVariable]
(repeat the above 100,000x)
Within the :/C drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, named similar to the pattern above, as 'lastName, firstName'
EDIT: The code below will move the entire Scans folder to the Clients folder, which is close, but not exactly what I need to be doing. I only need to move the files within Scans to the corresponding client folders.
import glob
import shutil
import os
source = "C:/Users/Kenny/Documents/Scans"
dest = "C:/Users/Kenny/Documents/Clients"
os.chdir("C:/Users/Kenny/Documents/Clients")
pattern = '*,*'
for x in glob.glob(pattern):
    fileName = os.path.join(source, x)
    print(fileName)
    shutil.move(source, dest)
EDIT 2 - CLOSE!: The code below will move all the files in Scans to the Clients folder, which is close, but not exactly what I need to be doing. I need to get each file into the correct corresponding file folder within the Clients folder.
This is a step forward from moving the entire Scans folder I would think.
from os import walk, path
import shutil
source = "C:/Users/Kenny/Documents/Scans"
dest = "C:/Users/Kenny/Documents/Clients"
for (dirpath, dirnames, filenames) in walk(source):
    for file in filenames:
        shutil.move(path.join(dirpath, file), dest)
I have the following code below as well, and I am aware it does not do what I want it to do, so I am definitely missing something..
import glob
import shutil
import os
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.listdir(path)
for file in dirs:
    print(file)
dest_dir = "C:/Users/Kenny/Documents/Clients/{^w, $w}?"
for file in glob.glob(r'C:Users/Kenny/Documents/Clients/{^w, $w}?'):
    print(file)
    shutil.move(file, dest_dir)
1) Should I use os.scandir instead of os.listdir ?
2) Am I moving in the correct direction if I modify the code as such:
import glob
import shutil
import os
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.scandir(path)
for file in dirs:
    print(file)
dest_dir = "C:/Users/Kenny/Documents/Clients/*"
for file in glob.glob(r'C:Users/Kenny/Documents/Clients, *'):
    dest_dir = os.path.join(file, glob.glob)
    shutil.move(file, dest_dir)
Note that within the line for file in glob.glob(r'C:Users/Kenny/Documents/Clients/{^w, $w}?'): I have tried replacing 'Clients/{^w, $w}?' with just 'Clients/*'.
For the above, I only need the file in :/Scans, written as, "lastName, firstName - [content]" to be matched and moved to /Clients/[lastName, firstName] --- the [content] does not matter. But there are both greedy and nongreedy expressions... which is why I'm unsure about using * or {^w, $w}? -- because we have clients with the same last names, but different first names.
The following error is generated when running the first command:
Error 1
Error 2
The following error (though, there is no error?) is generated when running the second command:
Error 3
EDIT/POSSIBLE ANSWER
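I have not yet tested this, but fnmatch.fnmatch(filename, pattern) can be used to test whether the filename string matches the pattern string, returning True or False. fnmatch.translate(pattern) does not test anything itself; it converts the shell-style pattern into a regular expression that can be used with re.match.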
From here perhaps you could write a conditional statement..
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        shutil.move(source, destination)
or
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        shutil.move(file.join(eachFile, source), destination)
I have not tested the two snippets above. I have no idea if they work, but editing allows others to see how my train of thought is progressing.
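To make the difference between the two fnmatch calls concrete, here is a small sketch. The client and file names are invented examples, and the pattern "client name, then ' - ', then anything" is an assumption based on the naming scheme described above.

```python
import fnmatch
import re

client = "Smith, John"                      # hypothetical client folder name
filename = "Smith, John - tax return.pdf"   # hypothetical scan name

# fnmatchcase() compares case-sensitively on every platform and returns a bool.
pattern = client + " - *"
is_match = fnmatch.fnmatchcase(filename, pattern)

# translate() does not test anything; it returns the equivalent regex source.
regex = re.compile(fnmatch.translate(pattern))
regex_match = regex.match(filename) is not None

# A client with the same last name but another first name does not match.
other_match = fnmatch.fnmatchcase("Smith, Jane - letter.pdf", pattern)
```

Because the whole "lastName, firstName" prefix is part of the pattern, clients who share a last name are kept apart. One caveat: characters such as [, ], * and ? in a client name would be read as wildcards, so a plain string comparison on the part before " - " is the safer choice when names may contain them.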

Related

Python: Finding files in directory but ignoring folders and their contents

So my program search_file.py is trying to look for .log files in the directory it is currently placed in. I used the following code to do so:
import os
# This is to get the directory that the program is currently running in
dir_path = os.path.dirname(os.path.realpath(__file__))
# for loop is meant to scan through the current directory the program is in
for root, dirs, files in os.walk(dir_path):
    for file in files:
        # Check if file ends with .log, if so print file name
        if file.endswith('.log'):
            print(file)
My current directory is as follows:
search_file.py
sample_1.log
sample_2.log
extra_file (this is a folder)
And within the extra_file folder we have:
extra_sample_1.log
extra_sample_2.log
Now, when the program runs and prints the files out it also takes into account the .log files in the extra_file folder. But I do not want this. I only want it to print out sample_1.log and sample_2.log. How would I approach this?

Try this:
import os
files = os.listdir()
for file in files:
    if file.endswith('.log'):
        print(file)
The problem in your code is that os.walk traverses the whole directory tree, not just your current directory. os.listdir returns a list of all entry names in a single directory, defaulting to the current working directory, which is what you are looking for.
os.walk documentation
os.listdir documentation

By default, os.walk does a root-first traversal of the tree, so you know the first emitted data is the good stuff. So, just ask for the first one. And since you don't really care about root or dirs, use _ as the "don't care" variable name:
# get root files list.
_, _, files = next(os.walk(dir_path))
for file in files:
    # Check if file ends with .log, if so print file name
    if file.endswith('.log'):
        print(file)
It's also common to use glob:
import os
from glob import glob
dir_path = os.path.dirname(os.path.realpath(__file__))
for file in glob(os.path.join(dir_path, "*.log")):
    print(file)
This runs the risk of matching a directory whose name ends in ".log", so you could also add a test using os.path.isfile(file).
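The glob approach with the os.path.isfile check added could look like the sketch below. The helper name top_level_logs is hypothetical, and glob.escape is used on the directory part so that special characters in the folder path are not treated as wildcards.

```python
import glob
import os


def top_level_logs(dir_path):
    """Return names of regular files ending in .log directly inside dir_path."""
    matches = glob.glob(os.path.join(glob.escape(dir_path), "*.log"))
    # Drop directories that happen to be named like "something.log".
    return sorted(os.path.basename(p) for p in matches if os.path.isfile(p))
```

Since the pattern contains no "**" and no subfolder component, files inside extra_file are never visited.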

Move files one by one to newly created directories for each file with Python 3

What I have is an initial directory with a file inside D:\BBS\file.x and multiple .txt files in the work directory D:\
What I am trying to do is copy the folder BBS with its content, incrementing its name by a number, and then copy/move each existing .txt file into its own newly created directory, giving \BBS1, \BBS2, ..., \BBSn (n depends on the number of .txt files).
Visual example of the Before and After:
Initial view of the \WorkFolder
Desired view of the \WorkFolder
Right now I have reached only creating of a new directory and moving txt in it but all at once, not as I would like to. Here's my code:
from pathlib import Path
from shutil import copy
import shutil
import os
wkDir = Path.cwd()
src = wkDir.joinpath('BBS')
count = 0
for content in src.iterdir():
    addname = src.name.split('_')[0]
    out_folder = wkDir.joinpath(f'!{addname}')
    out_folder.mkdir(exist_ok=True)
    out_path = out_folder.joinpath(content.name)
    copy(content, out_path)
files = os.listdir(wkDir)
for f in files:
    if f.endswith(".txt"):
        shutil.move(f, out_folder)
I kindly request assistance with incrementing the folder name and copying the files one by one, each into its own newly created directory, as described above.
I don't have much skill with Python in general. Python 3, OS: Windows.
Thanks in advance

Now, I understand what you want to accomplish. I think you can do it quite easily by only iterating over the text files and for each one you copy the BBS folder. After that you move the file you are currently at. In order to get the folder_num, you may be able to just access the file name's characters at the particular indexes (e.g. f[4:6]) if the name is always of the pattern TextXX.txt. If the prefix "Text" may vary, it is more stable to use regular expressions like in the following sample.
Also, the function shutil.copytree copies a directory with its children.
import os
import re
import shutil
from pathlib import Path
wkDir = Path.cwd()
src = wkDir.joinpath('BBS')
for f in os.listdir(wkDir):
    if f.endswith(".txt"):
        # first run of digits in the file name, e.g. "01" in Text01.txt
        folder_num = re.findall(r"\d+", f)[0]
        target = wkDir.joinpath(f"{src.name}{folder_num}")
        # copy BBS
        shutil.copytree(src, target)
        # move .txt file
        shutil.move(f, target)

Copy files (as backup) and change original file names (rearranging contents)

I'm a total Python noob, but I want to learn it and integrate it into my workflow.
I have about 400 files whose names contain 4 parts separated by underscores:
-> Version_Date_ProjectName_ProjectNumber
As we always look at the project number first, we rearranged the filename parts for new projects to:
-> ProjectNumber_Version_ProjectName
My problem now is that I would like to rename all the existing files to the new format while keeping a backup of them in a subdirectory called "Archiv".
It just has to be a simple script that I put in the directory, so that every file in this directory is copied as a backup and renamed to the new format.
EDIT:
My first step was to create a subfolder within the source directory, and it worked somehow. But now I see that I only need to back up the files with a specific file extension.
import os, shutil
src_dir= os.curdir
dst_dir= os.path.join(os.curdir, "Archiv")
shutil.copytree(src_dir, dst_dir)
I tried to extend the code with the solutions from here, but it doesn't work. :/

import os
import shutil
import glob
src_path = "YOUR_SOURCE_PATH"
dest_path = "YOUR_DESTINATION_PATH"
# create the backup folder if it does not exist yet
if not os.path.exists(dest_path):
    os.makedirs(dest_path)
files = glob.iglob(os.path.join(src_path, "*.pdf"))
for file in files:
    if os.path.isfile(file):
        # copy2 also preserves timestamps and other metadata
        shutil.copy2(file, dest_path)
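The snippet above only makes the backup. A possible sketch of the renaming step follows, under several assumptions that the question does not confirm: each name has exactly four underscore-separated parts, no part contains an underscore itself, the date is dropped in the new scheme (as in the target format given in the question), and the extension is .pdf. The function names new_name and backup_and_rename are hypothetical.

```python
import shutil
from pathlib import Path


def new_name(old_stem):
    """Map 'Version_Date_ProjectName_ProjectNumber' to 'ProjectNumber_Version_ProjectName'."""
    parts = old_stem.split("_")
    if len(parts) != 4:
        return None  # does not follow the old scheme; leave untouched
    version, _date, project_name, project_number = parts
    return f"{project_number}_{version}_{project_name}"


def backup_and_rename(folder, suffix=".pdf", archive_name="Archiv"):
    """Copy each matching file into the archive subfolder, then rename the original."""
    folder = Path(folder)
    archive = folder / archive_name
    archive.mkdir(exist_ok=True)
    renamed = []
    # sorted() takes a snapshot, so renaming does not disturb the iteration.
    for file in sorted(folder.glob(f"*{suffix}")):
        stem = new_name(file.stem)
        if stem is None or not file.is_file():
            continue
        shutil.copy2(file, archive / file.name)  # the backup keeps the old name
        target = file.with_name(stem + file.suffix)
        if target.exists():
            continue  # never overwrite an existing file
        file.rename(target)
        renamed.append(target.name)
    return renamed
```

Files that were already renamed have only three parts, so running the script a second time leaves them alone.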

Finding correct path to files in subfolders with os.walk with python?

I am trying to create a program that copies files with certain file extension to the given folder. When files are located in subfolders instead of the root folder the program fails to get correct path. In its current state the program works perfectly for the files in the root folder, but it crashes when it finds matching items in subfolders. The program tries to use rootfolder as directory instead of the correct subfolder.
My code is as follows
# Selective_copy.py walks through file tree and copies files with
# certain extension to given folder
import shutil
import os
import re
# Deciding the folders and extensions to be targeted
# TODO: user input instead of static values
extension = "zip"
source_folder = "/Users/viliheikkila/documents/kooditreeni/"
destination_folder = "/Users/viliheikkila/documents/test"
def Selective_copy(source_folder):
    # create regex to identify file extensions
    mo = re.compile(r"(\w+).(\w+)") # Group(2) represents the file extension
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for i in filenames:
            if mo.search(i).group(2) == extension:
                file_path = os.path.abspath(i)
                print("Copying from " + file_path + " to " + destination_folder)
                shutil.copy(file_path, destination_folder)
Selective_copy(source_folder)

dirpath is one of the things provided by walk for a reason: it gives the path to the directory that the items in filenames are located in. You can use that to determine the subfolder you should be using.

file_path = os.path.abspath(i)
This line is blatantly wrong.
Keep in mind that filenames holds a list of base file names. At this point it is just a list of strings and (technically) they are not associated at all with files in the filesystem.
os.path.abspath does string-only operations: it joins the file name onto the current working directory. As a result, the resulting path points to a file that does not exist.
What should be done instead is to join the directory path and the base file name (both values are yielded by os.walk):
file_path = os.path.join(dirpath, i)
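Put together, the corrected function might look like the sketch below. Beyond the os.path.join fix it makes two changes that are my own suggestions rather than part of the answers above: the extension is read with os.path.splitext instead of a regular expression, and the source and destination folders are passed in as parameters.

```python
import os
import shutil


def selective_copy(source_folder, destination_folder, extension="zip"):
    """Copy every file under source_folder with the given extension."""
    os.makedirs(destination_folder, exist_ok=True)
    copied = []
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for name in filenames:
            # splitext returns ('archive', '.zip'); compare including the dot.
            if os.path.splitext(name)[1].lower() != "." + extension.lower():
                continue
            # Join with dirpath so files in subfolders resolve correctly.
            file_path = os.path.join(dirpath, name)
            shutil.copy(file_path, destination_folder)
            copied.append(file_path)
    return copied
```

Two things to watch for: files with the same name in different subfolders overwrite each other in the destination, and the destination folder should not sit inside the source tree, or the walk may pick up the copies.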

Creating subdirectories and sorting files based on filename PYTHON

I have a large directory with many part files with their revisions, I want to recursively create a new folder for each part, and then move all of the related files into that folder. I am trying to do this by isolating a 7 digit number which would be used as an identifier for the part, and all the related filenames would also include this number.
import os
import shutil
import csv
import glob
from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree, copy2, Error, copystat
from shutil import copytree, ignore_patterns
dirname = ' '
# pattern = '*???????*'
for root, dirs, files in os.walk(dirname):
    for fpath in files:
        print(fpath)
        if fpath[0:6].isdigit():
            matchdir = os.mkdir(os.path.join(os.path.dirname(fpath)))
            partnum = str(fpath[0:6])
            pattern = str(partnum)
            filematch = fnmatch(files, pattern)
            print(filematch)
            shutil.move(filematch, matchdir)
This is what I have so far. Basically, I'm not sure how to get the original filename and use it as the matching pattern for the rest of the files. The original filename I want to use for this matching pattern is just a 7-digit number, and all of the related files may have other characters (REV-2, for example).

Don't overthink it
I think you're getting confused about what os.walk() gives you - recheck the docs. dirs and files are just a list of names of the directories / files, not the full paths.
Here's my suggestion. Assuming that you're starting with a directory layout something like:
directory1
1234567abc.txt
1234567abc.txt
1234567bcd.txt
2234567abc.txt
not-interesting.txt
And want to end with something like:
directory1
1234567
abc.txt
1234567
abc.txt
bcd.txt
2234567
abc.txt
not-interesting.txt
If that's correct, then there's no need to rematch the files in the directory, just operate on each file individually, and make the part directory only if it doesn't already exist. I would also use a regular expression to do this, so something like:
import os
import re
import shutil
for root, dirs, files in os.walk(dirname):
    for fname in files:
        # Match a string starting with 7 digits followed by everything else.
        # Capture each part in a group so we can access them later.
        match_object = re.match('([0-9]{7})(.*)$', fname)
        if match_object is None:
            # The regular expression did not match, ignore the file.
            continue
        # Form the new directory path using the number from the regular expression and the current root.
        new_dir = os.path.join(root, match_object.group(1))
        if not os.path.isdir(new_dir):
            os.mkdir(new_dir)
        new_file_path = os.path.join(new_dir, match_object.group(2))
        # Or, if you don't want to change the filename, use:
        new_file_path = os.path.join(new_dir, fname)
        old_file_path = os.path.join(root, fname)
        shutil.move(old_file_path, new_file_path)
Note that I have:
Switched the sense of the condition, we continue the loop immediately if the file is not interesting. This is a useful pattern to use to make sure that your code does not get too heavily indented.
Changed the name of fpath to fname. This is because it's not a path but just the name of the file, so it's better to call it fname.
Please clarify the question if that's not what you meant!
[edit] to show how to copy the file without changing its name.
