I've got a directory containing multiple images, and I need to separate them into two folders based on a portion of the file name. Here's a sample of the file names:
22DEC167603520981127600_03.jpg
13NOV162302999230157801_07.jpg
08JAN147603811108236510_02.jpg
21OCT152302197661710099_07.jpg
07MAR172302551529900521_01.jpg
19FEB173211074174309177_09.jpg
19FEB173211881209232440_02.jpg
19FEB172302491000265198_04.jpg
I need to move the files into two folders according to the numbers in bold after the date - so files containing 2302 and 3211 would go into an existing folder named "panchromatic" and files with 7603 would go into another folder named "sepia".
I've tried multiple examples from other questions, and none seem to fit this problem. I'm very new to Python, so I'm not sure what example to post. Any help would be greatly appreciated.
You can do this the easy way or the hard way.
Easy way
Test if your filename contains the substring you're looking for.
import os
import shutil
files = os.listdir('.')
for f in files:
    # skip non-jpeg files
    if not f.endswith('.jpg'):
        continue
    # move if panchromatic
    if '2302' in f or '3211' in f:
        shutil.move(f, os.path.join('panchromatic', f))
    # move if sepia
    elif '7603' in f:
        shutil.move(f, os.path.join('sepia', f))
    # notify if something else
    else:
        print('Could not categorize file with name %s' % f)
This solution in its current form is susceptible to mis-classification, as the number we're looking for can appear by chance later in the string. I'll leave you to find ways to mitigate this.
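One possible mitigation, assuming every name starts with a seven-character date such as 22DEC16, is to look only at the four characters that follow it. The names CODES and classify are made up for this sketch:

```python
# Map each four-digit code to its destination folder (folder names from the question).
CODES = {'2302': 'panchromatic', '3211': 'panchromatic', '7603': 'sepia'}


def classify(filename):
    """Return the destination folder for filename, or None if the code is unknown."""
    # Characters 0-6 hold the date (e.g. '22DEC16'); the code occupies 7-10.
    return CODES.get(filename[7:11])


print(classify('22DEC167603520981127600_03.jpg'))
print(classify('13NOV162302999230157801_07.jpg'))
```

Because only the slice is tested, a 7603 that happens to appear later in the name no longer triggers a match.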
Hard way
Regular expressions. Match the four digits after the date with a regular expression. Left for you to explore!
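As a starting point for that exploration, here is a minimal sketch. It assumes the date is always two digits, three capital letters and two digits, as in the sample names; CODE_RE and extract_code are illustrative names only:

```python
import re

# The date (two digits, three capital letters, two digits), then capture
# the four-digit code that follows it.
CODE_RE = re.compile(r'^\d{2}[A-Z]{3}\d{2}(\d{4})')


def extract_code(filename):
    """Return the four-digit code after the date, or None if the name does not fit."""
    match = CODE_RE.match(filename)
    return match.group(1) if match else None


print(extract_code('08JAN147603811108236510_02.jpg'))
```

The returned code can then be compared against '2302', '3211' and '7603' to pick the folder.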
Self-explanatory, with Python 3, or Python 2 plus the pathlib backport:
import pathlib
import shutil
# Directory paths. Tailor this to your files layout
# see https://docs.python.org/3/library/pathlib.html#module-pathlib
source_dir = pathlib.Path('.')
sepia_dir = source_dir / 'sepia'
panchro_dir = source_dir / 'panchromatic'
assert sepia_dir.is_dir()
assert panchro_dir.is_dir()
destinations = {
    ('2302', '3211'): panchro_dir,
    ('7603',): sepia_dir
}
# list() takes a snapshot so that moving files does not disturb the iteration
for filepath in list(source_dir.glob('*.jpg')):
    # glob() already yields paths that include source_dir; slice the bare
    # name so the offsets do not depend on the directory prefix
    marker = filepath.name[7:11]
    for key, value in destinations.items():
        if marker in key:
            shutil.move(str(filepath), str(value))
Related
I am trying to automate some work which is currently done by hand.
The aim is to find all the documents which have, for example, the number 408710 in their file name. Please note that the file name does also include other letters or figures. An example could be 2rsgf54087105f85sfr. The program should now search for all the files which own the combination 408710 and then move them into the right path.
I do know how to move the files, but so far I am only able to move the files by entering the exact file name. In that case I do only have one file and not all the files with the mentioned combination. Of course I do not know the exact file name in advance anyway.
Here the code for the stuff which is working:
import shutil
src = "C:/Users/Startklar/Desktop/Ausgangsordner"
dst = "C:/Users/Startklar/Desktop/Empfangsordner/Sven"
dst2 = "C:/Users/Startklar/Desktop/Empfangsordner/Gerald"
# move files
shutil.move(src=src + "/AA023300408710LFVI.docx", dst=dst)
shutil.move(src=src + "/BB023310187105ADIK.docx", dst=dst2)
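If you just want to remove the files, you can do it like this using a regexp:
import os
import re
path = "C:/Users/Startklar/Desktop/Ausgangsordner"  # folder to search (src from the question)
regexp = r'yourPattern.*\.docx$'
res = [f for f in os.listdir(path) if re.search(regexp, f)]
for f in res:
    print('Remove: ' + f)
    # os.listdir returns bare names, so join them with the folder
    os.remove(os.path.join(path, f))
You will need to find a regular expression that matches only the files you would like to remove.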
If you in fact want to move the files, as in your example, it looks like this (just guessing the regexp from your example):
import os
import re
import shutil
src = "C:/Users/Startklar/Desktop/Ausgangsordner"
filters = [["C:/Users/Startklar/Desktop/Empfangsordner/Sven", r'.*LFVI\.docx$'],
           ["C:/Users/Startklar/Desktop/Empfangsordner/Gerald", r'.*ADIK\.docx$']]
for f in os.listdir(src):
    for dst, regexp in filters:
        if re.search(regexp, f):
            # os.listdir returns bare names, so build the full source path
            shutil.move(src=os.path.join(src, f), dst=dst)
            break  # a file can only be moved once
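For the original requirement (every file whose name contains a combination such as 408710), a plain wildcard may be enough. The sketch below is only one way to do it; move_matching is a hypothetical helper, and the demonstration runs in throwaway temporary folders rather than the real ones:

```python
import shutil
import tempfile
from pathlib import Path


def move_matching(src, dst, fragment):
    """Move every file in src whose name contains fragment into dst; return moved names."""
    moved = []
    # sorted() materialises the matches before anything is moved.
    for path in sorted(Path(src).glob('*' + fragment + '*')):
        if path.is_file():
            shutil.move(str(path), str(Path(dst) / path.name))
            moved.append(path.name)
    return moved


# Demonstration with the two file names from the question.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, 'in'), Path(tmp, 'out')
    src.mkdir()
    dst.mkdir()
    for name in ('AA023300408710LFVI.docx', 'BB023310187105ADIK.docx'):
        (src / name).touch()
    demo_moved = move_matching(src, dst, '408710')
    print(demo_moved)
```

This assumes the fragment contains no glob metacharacters such as * or [; plain digits are fine.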
I am trying to write a program that will sort my files into folders based on the names of the files using Python. However, I am unsure on how to do this.
This is what I know how to do already, which moves a file, but only one.
import shutil
original = r'C:\Users\******\Documents\stacktest\cat1.txt'
new = r'C:\Users\******\Documents\stacktest\cat\cat1.txt'
shutil.move(original, new)
This moves the cat1.txt file into the cat folder.
Sorry if this post isn't clear. I'll try to clarify if needed. If anyone can help me figure this out, then thank you for your help!
I am not sure which files you want to move, so let's say you have some criterion up front. If you have an explicit list of files, just set filenames = <list of names> and skip reading and filtering the folder contents.
import os
import shutil
folder_from = r'C:\Users\******\Documents\stacktest'
folder_to = r'C:\Users\******\Documents\stacktest\cat'
# read all files in the folder
# note that os.listdir returns all files and directories in the folder
filenames = [f for f in os.listdir(folder_from) if os.path.isfile(os.path.join(folder_from, f))]
# filter filenames by the criteria you have in mind
# (startswith('cat') is only an example criterion; substitute your own)
filenames = [f for f in filenames if f.startswith('cat')]
# move the files
for f in filenames:
    shutil.move(os.path.join(folder_from, f), os.path.join(folder_to, f))
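If the goal is to derive the folder from the name itself, one guess at the rule (cat1.txt goes to cat) is to strip trailing digits from the file stem. That rule is an assumption, as are the name sort_into_folders and the dog1.txt test file. A rough sketch, run in a temporary folder:

```python
import re
import shutil
import tempfile
from pathlib import Path


def sort_into_folders(folder):
    """Move each file into a subfolder named after its stem minus trailing digits."""
    folder = Path(folder)
    # Take a snapshot of the files first, since the loop changes the folder.
    for path in [p for p in folder.iterdir() if p.is_file()]:
        group = re.sub(r'\d+$', '', path.stem)   # 'cat1' -> 'cat'
        if not group:
            continue  # the stem was all digits; leave the file where it is
        target = folder / group
        target.mkdir(exist_ok=True)
        shutil.move(str(path), str(target / path.name))


with tempfile.TemporaryDirectory() as tmp:
    for name in ('cat1.txt', 'cat2.txt', 'dog1.txt'):
        Path(tmp, name).touch()
    sort_into_folders(tmp)
    demo_layout = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob('*.txt'))
    print(demo_layout)
```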
EDIT: ANSWER Below is the answer to the question. I will leave all subsequent text there just to show you how difficult I made such an easy task..
from pathlib import Path
import shutil
base = "C:/Users/Kenny/Documents/Clients"
for file in Path("C:/Users/Kenny/Documents/Scans").iterdir():
    name = file.stem.split('-')[0].rstrip()
    subdir = Path(base, name)
    if subdir.exists():
        dest = Path(subdir, file.name)
        shutil.move(file, dest)
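To see what the name-extraction line does on its own, here is a small sketch. The file names are invented and client_name is just an illustrative wrapper around the same expression:

```python
from pathlib import PurePath


def client_name(filename):
    """Return the 'lastName, firstName' part of 'lastName, firstName - content.pdf'."""
    stem = PurePath(filename).stem       # drop the extension
    return stem.split('-')[0].rstrip()   # text before the first hyphen, trailing space removed


print(client_name('Smith, John - invoice.pdf'))
print(client_name('Smith-Jones, Ann - invoice.pdf'))
```

Note that a hyphenated surname is cut at its own hyphen, so such a file would not find its folder; splitting on ' - ' (with the spaces) might be safer if names like that occur.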
Preface:
I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on the matching client's name. This question is linked below - a very kind person, Elis Byberi, helped assist me in correcting my original code. I'm encountering another problem though..
To see our discussion and a similar question discussed:
-Python- Move All PDF Files in Folder to NewDirectory Based on Matching Names, Using Glob or Shutil
Python move files from directories that match given criteria to new directory
Question: How can you move all of the named files in :/Scans to their appropriately matched folder in :/Clients?
Background: Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed (I tried writing a program to auto-rename.. didn't work) based on client and content, such that the folder encloses PDFs labeled as follows:
lastName, firstName - [contentVariable]
(repeat the above 100,000x)
Within the :/C drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, named similar to the pattern above, as 'lastName, firstName'
EDIT: The code below will move the entire Scans folder to the Clients folder, which is close, but not exactly what I need to be doing. I only need to move the files within Scans to the corresponding client folders.
import glob
import shutil
import os
source = "C:/Users/Kenny/Documents/Scans"
dest = "C:/Users/Kenny/Documents/Clients"
os.chdir("C:/Users/Kenny/Documents/Clients")
pattern = '*,*'
for x in glob.glob(pattern):
    fileName = os.path.join(source, x)
    print(fileName)
    shutil.move(source, dest)
EDIT 2 - CLOSE!: The code below will move all the files in Scans to the Clients folder, which is close, but not exactly what I need to be doing. I need to get each file into the correct corresponding file folder within the Clients folder.
This is a step forward from moving the entire Scans folder I would think.
source = "C:/Users/Kenny/Documents/Scans"
dest = "C:/Users/Kenny/Documents/Clients"
for (dirpath, dirnames, filenames) in walk(source):
    for file in filenames:
        shutil.move(path.join(dirpath,file), dest)
I have the following code below as well, and I am aware it does not do what I want it to do, so I am definitely missing something..
import glob
import shutil
import os
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.listdir(path)
for file in dirs:
    print(file)
dest_dir = "C:/Users/Kenny/Documents/Clients/{^w, $w}?"
for file in glob.glob(r'C:Users/Kenny/Documents/Clients/{^w, $w}?'):
    print(file)
    shutil.move(file, dest_dir)
1) Should I use os.scandir instead of os.listdir ?
2) Am I moving in the correct direction if I modify the code as such:
import glob
import shutil
import os
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.scandir(path)
for file in dirs:
    print(file)
dest_dir = "C:/Users/Kenny/Documents/Clients/*"
for file in glob.glob(r'C:Users/Kenny/Documents/Clients, *'):
    dest_dir = os.path.join(file, glob.glob)
    shutil.move(file, dest_dir)
Note: within the line for file in glob.glob(r'C:Users/Kenny/Documents/Clients/{^w, $w}?'), I have tried replacing 'Clients/{^w, $w}?' with just 'Clients/*'.
For the above, I only need the file in :/Scans, written as, "lastName, firstName - [content]" to be matched and moved to /Clients/[lastName, firstName] --- the [content] does not matter. But there are both greedy and nongreedy expressions... which is why I'm unsure about using * or {^w, $w}? -- because we have clients with the same last names, but different first names.
The following error is generated when running the first command:
Error 1
Error 2
The following error (though, there is no error?) is generated when running the second command:
Error 3
EDIT/POSSIBLE ANSWER
Have not yet tested this, but fnmatch.fnmatch(filename, pattern) can be used to test whether the filename string matches the pattern string, returning True or False; fnmatch.translate(pattern) instead converts the pattern into a regular expression.
From here perhaps you could write a conditional statement..
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        shutil.move(source, destination)
or
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        shutil.move(file.join(eachFile, source), destination)
I have not tested the two aforesaid codes. I have no idea if they work, but editing allows others to see how my train of thought is progressing.
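For reference, here is a small sketch of what the three fnmatch functions return. The file names are invented for illustration:

```python
import fnmatch
import re

names = ['notes.txt', 'photo.jpg', 'Smith, John - scan.pdf']

# fnmatch.fnmatch returns True or False for a single name.
is_text = fnmatch.fnmatch('notes.txt', '*.txt')

# fnmatch.filter returns the names from a list that match the pattern.
smith_files = fnmatch.filter(names, 'Smith, John - *')

# fnmatch.translate returns a regular-expression string, not a boolean.
pattern_re = re.compile(fnmatch.translate('*.txt'))

print(is_text, smith_files, bool(pattern_re.match('photo.jpg')))
```

A prefix pattern such as 'Smith, John - *' keeps clients with the same last name but different first names apart, since the whole 'lastName, firstName' part has to match.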
I have a large directory with many part files with their revisions, I want to recursively create a new folder for each part, and then move all of the related files into that folder. I am trying to do this by isolating a 7 digit number which would be used as an identifier for the part, and all the related filenames would also include this number.
import os
import shutil
import csv
import glob
from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree, copy2, Error, copystat
from shutil import copytree, ignore_patterns
dirname = ' '
# pattern = '*???????*'
for root, dirs, files in os.walk(dirname):
    for fpath in files:
        print(fpath)
        if fpath[0:6].isdigit():
            matchdir = os.mkdir(os.path.join(os.path.dirname(fpath)))
            partnum = str(fpath[0:6])
            pattern = str(partnum)
            filematch = fnmatch(files, pattern)
            print(filematch)
            shutil.move(filematch, matchdir)
This is what I have so far. Basically, I'm not sure how to get the original filename and use it as the matching pattern for the rest of the files. The original filename I want to use for this matching pattern is just a 7 digit number, and all of the related files may have other characters (REV-2, for example).
Don't overthink it
I think you're getting confused about what os.walk() gives you - recheck the docs. dirs and files are just a list of names of the directories / files, not the full paths.
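A quick way to see this is to walk a small throwaway tree and record what comes back. The directory and file names here are made up for the demonstration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, 'sub'))
    open(os.path.join(top, 'sub', '1234567abc.txt'), 'w').close()

    walked = []
    for root, dirs, files in os.walk(top):
        for fname in files:
            # fname is a bare name; join it with root to get a usable path.
            walked.append((fname, os.path.join(root, fname)))

print(walked)
```

The first element of each pair is what os.walk hands you; the second is what functions such as shutil.move actually need.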
Here's my suggestion. Assuming that you're starting with a directory layout something like:
directory1
    1234567abc.txt
directory2
    1234567abc.txt
    1234567bcd.txt
    2234567abc.txt
    not-interesting.txt
And want to end with something like:
directory1
    1234567
        abc.txt
directory2
    1234567
        abc.txt
        bcd.txt
    2234567
        abc.txt
    not-interesting.txt
If that's correct, then there's no need to rematch the files in the directory, just operate on each file individually, and make the part directory only if it doesn't already exist. I would also use a regular expression to do this, so something like:
import os
import re
import shutil
for root, dirs, files in os.walk(dirname):
    for fname in files:
        # Match a string starting with 7 digits followed by everything else.
        # Capture each part in a group so we can access them later.
        match_object = re.match('([0-9]{7})(.*)$', fname)
        if match_object is None:
            # The regular expression did not match, ignore the file.
            continue
        # Form the new directory path using the number from the regular expression and the current root.
        new_dir = os.path.join(root, match_object.group(1))
        if not os.path.isdir(new_dir):
            os.mkdir(new_dir)
        new_file_path = os.path.join(new_dir, match_object.group(2))
        # Or, if you don't want to change the filename, use:
        new_file_path = os.path.join(new_dir, fname)
        old_file_path = os.path.join(root, fname)
        shutil.move(old_file_path, new_file_path)
Note that I have:
Switched the sense of the condition, we continue the loop immediately if the file is not interesting. This is a useful pattern to use to make sure that your code does not get too heavily indented.
Changed the name of fpath to fname. This is because it's not a path but just the name of the file, so it's better to call it fname.
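To make the two capture groups concrete, here is the same regular expression applied to two of the example names from the layout above:

```python
import re

# Group 1 is the seven-digit part number, group 2 is the rest of the name.
match_object = re.match(r'([0-9]{7})(.*)$', '1234567abc.txt')
part_number, remainder = match_object.groups()
print(part_number, remainder)

# A name that does not start with seven digits gives None instead of a match.
no_match = re.match(r'([0-9]{7})(.*)$', 'not-interesting.txt')
print(no_match)
```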
Please clarify the question if that's not what you meant!
[edit] to show how to copy the file without changing its name.
I'm rather new to Python, but I have been attempting to learn the basics.
Anyway, I have several zip files that, once extracted (a painfully slow process, btw), produce several hundred subdirectories with 2-3 files in each. What I want to do is pull out all the files ending with 'dem.tif' and place them in a separate folder (move, not copy).
I may have jumped into the deep end here, but the code I've written runs without error, so it must not be finding the files (which do exist!), as it gives me the else statement. Here is the code I've created:
import os
src = 'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Extracted' # input
dst = 'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Analyses' # desired location
def move():
    for (dirpath, dirs, files) in os.walk(src):
        if files.endswith('dem.tif'):
            shutil.move(os.path.join(src,files),dst)
            print ('Moving ', + files, + ' to ', + dst)
        else:
            print 'No Such File Exists'
First, welcome to the community, and python! You might want to change your user name, especially if you frequent here. :)
I suggest the following (stolen from Mr. Beazley):
# genfind.py
#
# A function that generates files that match a given filename pattern
import os
import shutil
import fnmatch
def gen_find(filepat, top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)
# Example use
if __name__ == '__main__':
    # Raw strings keep the backslashes literal (in Python 3, '\N' and '\U' are escape sequences)
    src = r'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Extracted' # input
    dst = r'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Analyses' # desired location
    filesToMove = gen_find("*dem.tif", src)
    for name in filesToMove:
        shutil.move(name, dst)
I think you've mixed up the way you should be using os.walk().
for dirpath, dirs, files in os.walk(src):
    print(dirpath)
    print(dirs)
    print(files)
    for filename in files:
        if filename.endswith('dem.tif'):
            shutil.move(...)
        else:
            ...
Update: the questioner has clarified below that he / she is actually calling the move function, which was the first point in my answer.
There are a few other things to consider:
You've got the order of elements returned in each tuple from os.walk wrong, I'm afraid - check the documentation for that function.
Assuming you've fixed that, also bear in mind that you need to iterate over files, and you need to os.path.join each of those to the directory path that os.walk yields (the first element of each tuple), rather than to src.
The above would be obvious, hopefully, if you print out the values returned by os.walk and comment out the rest of the code in that loop.
With code that does potentially destructive operations like moving files, I would always first try some code that just prints out the parameters to shutil.move until you're sure that it's right.
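One way to build that habit into the script is a dry-run switch. This is only a sketch: move_dem_files and its dry_run parameter are invented here, and the demonstration uses temporary folders with made-up file names instead of the real data:

```python
import os
import shutil
import tempfile


def move_dem_files(src, dst, dry_run=True):
    """Collect (and optionally move) every file under src ending in 'dem.tif'."""
    planned = []
    for dirpath, dirs, files in os.walk(src):
        for filename in files:
            if filename.endswith('dem.tif'):
                planned.append((os.path.join(dirpath, filename), dst))
    for old_path, target in planned:
        if dry_run:
            print('Would move', old_path, 'to', target)
        else:
            shutil.move(old_path, target)
    return planned


with tempfile.TemporaryDirectory() as tmp:
    extracted = os.path.join(tmp, 'Extracted')
    analyses = os.path.join(tmp, 'Analyses')
    os.makedirs(os.path.join(extracted, 'tile1'))
    os.makedirs(analyses)
    open(os.path.join(extracted, 'tile1', 'tile1_dem.tif'), 'w').close()
    open(os.path.join(extracted, 'tile1', 'tile1_num.tif'), 'w').close()
    demo_plan = move_dem_files(extracted, analyses)        # only prints
    move_dem_files(extracted, analyses, dry_run=False)     # really moves
    demo_result = os.listdir(analyses)
```

Run it with the default first, check the printed pairs, and pass dry_run=False only once they look right.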
Any particular reason you need to do it in Python? Would a simple shell command not be simpler? If you're on a Unix-like system, or have access to Cygwin on Windows:
find src_dir -name "*dem.tif" -exec mv {} dst_dir \;