Recursively find and copy files from many folders - python

I have some files in an array that I want to recursively search from many folders
An example of the filename array is ['A_010720_X.txt','B_120720_Y.txt']
Example of folder structure is as below which I can also provide as an array e.g ['A','B'] and ['2020-07-01','2020-07-12']. The "DL" remains the same for all.
C:\A\2020-07-01\DL
C:\B\2020-07-12\DL
etc
I have tried to use shutil but it doesn't seem to work effectively for my requirement as I can only pass in a full file name and not a wildcard. The code I have used with shutil which works but without wildcards and with absolute full file name and path e.g the code below will only give me A_010720_X.txt
I believe the way to go would be using glob or pathlib which i have not used before or cannot find some good examples similar to my use case
import shutil
filenames_i_want = ['A_010720_X.txt','B_120720_Y.txt']
RootDir1 = r'C:\A\2020-07-01\DL'
TargetFolder = r'C:\ELK\LOGS\ATH\DEST'
for root, dirs, files in os.walk((os.path.normpath(RootDir1)), topdown=False):
for name in files:
if name in filenames_i_want:
print ("Found")
SourceFolder = os.path.join(root,name)
shutil.copy2(SourceFolder, TargetFolder)

I think this should do what you need assuming they are all .txt files.
import glob
import shutil
filenames_i_want = ['A_010720_X.txt','B_120720_Y.txt']
TargetFolder = r'C:\ELK\LOGS\ATH\DEST'
all_files = []
for directory in ['A', 'B']:
files = glob.glob('C:\{}\*\DL\*.txt'.format(directory))
all_files.append(files)
for file in all_files:
if file in filenames_i_want:
shutil.copy2(file, TargetFolder)

Related

Delete file by non standard extension

I know how to delete files by extension but what if my files are looking like this:
update_24-08-2022_14-54.zip.001
Where last 3 digits can be between 001-029
Here is code that I'm using for standard zip files
files_in_directory = os.listdir(directory)
filtered_files = [file for file in files_in_directory if file.endswith(".zip")]
for file in filtered_files:
path_to_file = os.path.join(directory, file)
os.remove(path_to_file)
Assuming the double extensions are of the form .zip.xyz, with xyz being triple digits, you can use globbing:
import glob
import os
for path in glob.glob('*.zip.[0-9][0-9][0-9]'):
os.remove(path)
(As a usual precaution, check first, by replacing os.remove with print).
If you have a specific directory, its name stored in directory, you can use:
import glob
import os
for path in glob.glob(os.path.join(directory, '*.zip.[0-9][0-9][0-9]')):
os.remove(path)
There is no need to join the directory and path inside the for loop (as is the case in the question): path itself will already contain the directory name.

Python loop through directories

I am trying to use python library os to loop through all my subdirectories in the root directory, and target specific file name and rename them.
Just to make it clear this is my tree structure
My python file is located at the root level.
What I am trying to do, is to target the directory 942ba loop through all the sub directories and locate the file 000000 and rename it to 000000.csv
the current code I have is as follow:
import os
root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
for f in files:
print(dirs)
if f == '000000':
dirs = dirs.strip(root)
f_new = f + '.csv'
os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working, because when I run my code, for some reasons the code strips the date from the subduers
can anyone help me to understand how to solve this issue?
A more efficient way to iterate through the folders and only select the files you are looking for is below:
source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
files = [os.path.normpath(os.path.join(root,f)) for root,dirs,files in os.walk(source_folder) for f in files if '000000' in f and not f.endswith('.gz')]
for file in files:
os.rename(f, f"{f}.csv")
The list comprehension stores the full path to the files you are looking for. You can change the condition inside the comprehension to anything you need. I use this code snippet a lot to find just images of certain type, or remove unwanted files from the selected files.
In the for loop, files are renamed adding the .csv extension.
I would use glob to find the files.
import os, glob
zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
files = glob.glob('*{}/000000'.format(zdir))
for fly in files:
os.rename(fly, '{}.csv'.format(fly))

Moving Files: Matching Partial File/Directory Criteria (lastName, firstName) - Glob, Shutil

EDIT: ANSWER Below is the answer to the question. I will leave all subsequent text there just to show you how difficult I made such an easy task..
from pathlib import Path
import shutil
base = "C:/Users/Kenny/Documents/Clients"
for file in Path("C:/Users/Kenny/Documents/Scans").iterdir():
name = file.stem.split('-')[0].rstrip()
subdir = Path(base, name)
if subdir.exists():
dest = Path(subdir, file.name)
shutil.move(file, dest)
Preface:
I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on the matching client's name. This question is linked below - a very kind person, Elis Byberi, helped assist me in correcting my original code. I'm encountering another problem though..
To see our discussion and a similar question discussed:
-Python- Move All PDF Files in Folder to NewDirectory Based on Matching Names, Using Glob or Shutil
Python move files from directories that match given criteria to new directory
Question: How can you move all of the named files in :/Scans to their appropriately matched folder in :/Clients.
Background: Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed (I tried writing a program to auto-rename.. didn't work) based on client and content, such that the folder encloses PDFs labeled as follows:
lastName, firstName - [contentVariable]
(repeat the above 100,000x)
Within the :/C drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, named similar to the pattern above, as 'lastName, firstName'
EDIT: The code below will move the entire Scans folder to the Clients folder, which is close, but not exactly what I need to be doing. I only need to move the files within Scans to the corresponding Client fold names.
import glob
import shutil
import os
source = "C:/Users/Kenny/Documents/Scans"
dest = "C:/Users/Kenny/Documents/Clients"
os.chdir("C:/Users/Kenny/Documents/Clients")
pattern = '*,*'
for x in glob.glob(pattern):
fileName = os.path.join(source, x)
print(fileName)
shutil.move(source, dest)
EDIT 2 - CLOSE!: The code below will move all the files in Scans to the Clients folder, which is close, but not exactly what I need to be doing. I need to get each file into the correct corresponding file folder within the Clients folder.
This is a step forward from moving the entire Scans folder I would think.
source = "C:/Users/Kenny/Documents/Scans"
dest = "C:/Users/Kenny/Documents/Clients"
for (dirpath, dirnames, filenames) in walk(source):
for file in filenames:
shutil.move(path.join(dirpath,file), dest)
I have the following code below as well, and I am aware it does not do what I want it to do, so I am definitely missing something..
import glob
import shutil
import os
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.listdir(path)
for file in dirs:
print(file)
dest_dir = "C:/Users/Kenny/Documents/Clients/{^w, $w}?"
for file in glob.glob(r'C:Users/Kenny/Documents/Clients/{^w, $w}?'):
print(file)
shutil.move(file, dest_dir)
1) Should I use os.scandir instead of os.listdir ?
2) Am I moving in the correct direction if I modify the code as such:
import glob
import shutil
import os
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.scandir(path)
for file in dirs:
print(file)
dest_dir = "C:/Users/Kenny/Documents/Clients/*"
for file in glob.glob(r'C:Users/Kenny/Documents/Clients, *'):
dest_dir = os.path.join(file, glob.glob)
shutil.move(file, dest_dir)
Note within the 'for file in glob.glob(r'C:Users/Kenny/Documents/Clients/{^w, $w}?' I have tried replacing 'Clients/{^w, $w}?' with just 'Clients/*'
For the above, I only need the file in :/Scans, written as, "lastName, firstName - [content]" to be matched and moved to /Clients/[lastName, firstName] --- the [content] does not matter. But there are both greedy and nongreedy expressions... which is why I'm unsure about using * or {^w, $w}? -- because we have clients with the same last names, but different first names.
The following error is generated when running the first command:
Error 1
Error 2
The following error (though, there is no error?) is generated when running the second command:
Error 3
EDIT/POSSIBLE ANSWER
Have not yet tested this but, fnmatch(filename, pattern), or, fnmatch.translate(pattern) can be used to test whether the filename string matches the pattern string, returning True or False.
From here perhaps you could write a conditional statement..
for file in os.listdir('.'):
if fnmatch.fnmatch(file, '*.txt'):
shutil.move(source, destination)
or
for file in os.listdir('.'):
if fnmatch.fnmatch(file, '*.txt'):
shutil.move(file.join(eachFile, source), destination)
I have not tested the two aforesaid codes. I have no idea if they work, but editing allows others to see how my train of thought is progressing.

Creating subdirectories and sorting files based on filename PYTHON

I have a large directory with many part files with their revisions, I want to recursively create a new folder for each part, and then move all of the related files into that folder. I am trying to do this by isolating a 7 digit number which would be used as an identifier for the part, and all the related filenames would also include this number.
import os
import shutil
import csv
import glob
from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree, copy2, Error, copystat
from shutil import copytree, ignore_patterns
dirname = ' '
# pattern = '*???????*'
for root, dirs, files in os.walk(dirname):
for fpath in files:
print(fpath)
if fpath[0:6].isdigit():
matchdir = os.mkdir(os.path.join(os.path.dirname(fpath)))
partnum = str(fpath[0:6])
pattern = str(partnum)
filematch = fnmatch(files, pattern)
print(filematch)
shutil.move(filematch, matchdir)
This is what I have so far, basically I'm not sure how to get the original filename and use it as the matching patter for the rest of the files. The original filename I want to use for this matching pattern is just a 7 digit number, and all of the related files may have other characters (REV-2) for example.
Don't overthink it
I think you're getting confused about what os.walk() gives you - recheck the docs. dirs and files are just a list of names of the directories / files, not the full paths.
Here's my suggestion. Assuming that you're starting with a directory layout something like:
directory1
1234567abc.txt
1234567abc.txt
1234567bcd.txt
2234567abc.txt
not-interesting.txt
And want to end with something like:
directory1
1234567
abc.txt
1234567
abc.txt
bcd.txt
2234567
abc.txt
not-interesting.txt
If that's correct, then there's no need to rematch the files in the directory, just operate on each file individually, and make the part directory only if it doesn't already exist. I would also use a regular expression to do this, so something like:
import os
import re
import shutil
for root, dirs, files in os.walk(dirname):
for fname in files:
# Match a string starting with 7 digits followed by everything else.
# Capture each part in a group so we can access them later.
match_object = re.match('([0-9]{7})(.*)$', fname)
if match_object is None:
# The regular expression did not match, ignore the file.
continue
# Form the new directory path using the number from the regular expression and the current root.
new_dir = os.path.join(root, match_object.group(1))
if not os.path.isdir(new_dir):
os.mkdir(new_dir)
new_file_path = os.path.join(new_dir, match_object.group(2))
# Or, if you don't want to change the filename, use:
new_file_path = os.path.join(new_dir, fname)
old_file_path = os.path.join(root, fname)
shutil.move(old_file_path, new_file_path)
Note that I have:
Switched the sense of the condition, we continue the loop immediately if the file is not interesting. This is a useful pattern to use to make sure that your code does not get too heavily indented.
Changed the name of fpath to fname. This is because it's not a path but just the name of the file, so it's better to call it fname.
Please clarify the question if that's not what you meant!
[edit] to show how to copy the file without changing its name.

How to check if multiple files exist in different directories

I know how to use python to check to see if a file exists, but what I am after is trying to see if multiple files of the same name exist throughout my working directory. Take for instance:
gamedata/areas/
# i have 2 folders in this directory
# testarea and homeplace
1. gamedata/areas/testarea/
2. gamedata/areas/homeplace/
Each folder of homeplace and testarea for instance contains a file called 'example'
Is there a pythonic way to use 'os' or similiar to check to see if the file 'example' can be found in both testarea and homeplace?
Although is their a way to do this without manually and statically using
os.path.isfile()
because throughout the life of the program new directories will be made, and I don't want to constantly go back into the code to change it.
You can check in every directory bellow gamedata/areas/:
This only goes down one level, you could extend it to go down as many levels as you want.
from os import listdir
from os.path import isdir, isfile, join
base_path = "gamedata/areas/"
files = listdir(base_path)
only_directories = [path for path in files if isdir(join(base_path,path))]
for directory_path in only_directories:
dir_path = join(base_path, directory_path)
for file_path in listdir(dir_path):
full_file_path = join(base_path, dir_path, file_path)
is_file = isfile(full_file_path)
is_example = "example" in file_path
if is_file and is_example:
print "Found One!!"
Hope it helps!
Maybe something like
places = ["testarea", "homeplace"]
if all(os.path.isfile(os.path.join("gamedata/areas/", x, "example") for x in places)):
print("Missing example")
If the condition is false, this doesn't tell you which subdirectory does not contain the file example, though. You can update places as necessary.
As I mentioned in the comments, os.walk is your friend:
import os
ROOT="gamedata/areas"
in_dirs = [path for (path, dirs, filenames)
in os.walk(ROOT)
if 'example' in filenames]
in_dirs will be a list of subdirectories where example is found

Categories

Resources