i have some pdfs files that i need to create folders with part of your name and move the pdfs to the folders.
cwd = os.getcwd()
padrao = '_V1_A0V0_T07-54-369-664_S00001.pdf'
for file in glob.glob("*.pdf"):
dst = cwd + "\\" + file.replace(str(padrao), '').replace('P', '')
os.mkdir(dst)
shutil.move(file, dst)
ex: I have the file P9883231_V1_A0V0_T07-54-369-664_S00001.pdf, P9883231_V1_A0V0_T07-54-369-664_S00002.pdf and
P1235567_V1_A0V0_T07-54-369-664_S00001.pdf.
In this example I need the script to create two folders: 9883231 and 1234567. (the part in italics must be the name of the folder)
notice that in my code I remove the unwanted parts to create the folder, the 'P' at the beginning and part of padrao = '_V1_A0V0_T07-54-369-664_S00001.pdf'
The problem is that at the end of the padrao the number can be variable, the file can end with "02.pdf" , "03.pdf"
In the example I mentioned above, the folder 9883231 should contain both files.
Regular expressions can do the trick here:
import re
import os
import glob
import shutil
cwd = os.getcwd()
padrao = '_V1_A0V0_T07-54-369-664_S000'
for file in glob.glob("*.pdf"):
dst = os.path.join(cwd, re.findall("P(.*)" + padrao + "\d{2}.pdf", file)[0])
os.mkdir(dst)
shutil.move(file, dst)
Notice that I remove the part of padrao that varies.
The regex matches all strings that begin ith a P, followed by the padrao string value, followed by 2 digits, followed by .pdf; and takes the first occurence (no check is made wether it found anything here ...)
Also, it is better practice to use os.path.join() to avoid issues when creating path strings (when whanging os notably)
Related
I have a directory with some files with an underscore and no extension that I would like to add an extension to it and get rid of the underscore.
Example: list of files in directory
filename1_.jpg
file_name_2_
file_name3
The renamed files should be changed to look like whats below:
file_name_2.jpg
file_name3.jpg
I was going to start with just looking for files with no extension but that just list all the files.
Code I tried below:
# Finding files with extension using for loop
#import os module
import os
# Specifies the path in path variable
path = "/tmp/0/jpg/"
for i in os.listdir(path):
# List files with extension
if i.endswith(""):
print("Files with extension no extension:",i)
PS: I'm trying to do this all in python since it's going to be used in another piece of python code.
The issue with if i.endswith("") is that it lists every single file. What you want to do is check for every file that ends in an underscore, and then double check it has no extension. This is my method:
import os
path = r"your path"
for i in os.listdir(path):
if i.endswith("_"):
if os.path.splitext(i)[1] == "":
print("Files with extension no extension:",i)
os.rename(str(i), f"{str(i)[:-1]}.jpg")
The beginning is what you did. However, I then check to see if it ends with an _. After that, the line os.path.splitext(i)[1] basically recieves the extension of the file, and we check to make sure their is no extension (ie extension = ""). Finally, we use the os.rename() function to rename the file.
str(i)[:-1] basically removes the last character of the file, which in this case is the underscore. We then add a .jpg extension to it, and now all the files have been renamed.
I've reused your snippet with the specified usecase:
# Finding files with extension using for loop
#import os module
import os
# Specifies the path in path variable
path = "/tmp/0/jpg/"
names = []
for i in os.listdir(path):
# List files with extension
# this will handle the files with _ at the end and does not have an extension
if i.endswith("_") or not i.endswith('.jpg'):
names.append(f'{i.replace("_","")}.jpg')
# this will handle the files that contains _ and ends with an extension
elif '_' in i and i.endswith('.jpg'):
names.append(f'{i.replace("_", "")}')
print(names)
Output:
['filename2.jpg', 'filename3.jpg', 'filename1.jpg']
You can use the regex library re:
import os
import re
path = '<Your\\Path\\>'
for file in os.listdir(path):
if not(re.search(r'\..*', file)) and os.path.isfile(path + file):
new_name = re.sub('_$', '', file)
new_name += '.jpg'
os.rename(file, new_name)
Explanation:
We escape the dot symbol \.
We match anything (every character) afterwards
The os.path.isfile makes sure you won't rename folders, because they are being listed as well in the os.listdir() method
Using re.sub in order to remove the _ in file names that end with this character (preserving it if it shows anywhere else in the name)
If we find matches, it means the file is in the format random.something and it If it finds something, we can skip it as the file's name has an extension.
If you only meant to change the name of files that don't end with .jpg you can change the regex pattern in my code to \.jpeg
EDIT: ANSWER Below is the answer to the question. I will leave all subsequent text there just to show you how difficult I made such an easy task..
from pathlib import Path
import shutil
base = "C:/Users/Kenny/Documents/Clients"
for file in Path("C:/Users/Kenny/Documents/Scans").iterdir():
name = file.stem.split('-')[0].rstrip()
subdir = Path(base, name)
if subdir.exists():
dest = Path(subdir, file.name)
shutil.move(file, dest)
Preface:
I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on the matching client's name. This question is linked below - a very kind person, Elis Byberi, helped assist me in correcting my original code. I'm encountering another problem though..
To see our discussion and a similar question discussed:
-Python- Move All PDF Files in Folder to NewDirectory Based on Matching Names, Using Glob or Shutil
Python move files from directories that match given criteria to new directory
Question: How can you move all of the named files in :/Scans to their appropriately matched folder in :/Clients.
Background: Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed (I tried writing a program to auto-rename.. didn't work) based on client and content, such that the folder encloses PDFs labeled as follows:
lastName, firstName - [contentVariable]
(repeat the above 100,000x)
Within the :/C drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, named similar to the pattern above, as 'lastName, firstName'
EDIT: The code below will move the entire Scans folder to the Clients folder, which is close, but not exactly what I need to be doing. I only need to move the files within Scans to the corresponding Client fold names.
import glob
import shutil
import os
source = "C:/Users/Kenny/Documents/Scans"
dest = "C:/Users/Kenny/Documents/Clients"
os.chdir("C:/Users/Kenny/Documents/Clients")
pattern = '*,*'
for x in glob.glob(pattern):
fileName = os.path.join(source, x)
print(fileName)
shutil.move(source, dest)
EDIT 2 - CLOSE!: The code below will move all the files in Scans to the Clients folder, which is close, but not exactly what I need to be doing. I need to get each file into the correct corresponding file folder within the Clients folder.
This is a step forward from moving the entire Scans folder I would think.
source = "C:/Users/Kenny/Documents/Scans"
dest = "C:/Users/Kenny/Documents/Clients"
for (dirpath, dirnames, filenames) in walk(source):
for file in filenames:
shutil.move(path.join(dirpath,file), dest)
I have the following code below as well, and I am aware it does not do what I want it to do, so I am definitely missing something..
import glob
import shutil
import os
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.listdir(path)
for file in dirs:
print(file)
dest_dir = "C:/Users/Kenny/Documents/Clients/{^w, $w}?"
for file in glob.glob(r'C:Users/Kenny/Documents/Clients/{^w, $w}?'):
print(file)
shutil.move(file, dest_dir)
1) Should I use os.scandir instead of os.listdir ?
2) Am I moving in the correct direction if I modify the code as such:
import glob
import shutil
import os
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.scandir(path)
for file in dirs:
print(file)
dest_dir = "C:/Users/Kenny/Documents/Clients/*"
for file in glob.glob(r'C:Users/Kenny/Documents/Clients, *'):
dest_dir = os.path.join(file, glob.glob)
shutil.move(file, dest_dir)
Note within the 'for file in glob.glob(r'C:Users/Kenny/Documents/Clients/{^w, $w}?' I have tried replacing 'Clients/{^w, $w}?' with just 'Clients/*'
For the above, I only need the file in :/Scans, written as, "lastName, firstName - [content]" to be matched and moved to /Clients/[lastName, firstName] --- the [content] does not matter. But there are both greedy and nongreedy expressions... which is why I'm unsure about using * or {^w, $w}? -- because we have clients with the same last names, but different first names.
The following error is generated when running the first command:
Error 1
Error 2
The following error (though, there is no error?) is generated when running the second command:
Error 3
EDIT/POSSIBLE ANSWER
Have not yet tested this but, fnmatch(filename, pattern), or, fnmatch.translate(pattern) can be used to test whether the filename string matches the pattern string, returning True or False.
From here perhaps you could write a conditional statement..
for file in os.listdir('.'):
if fnmatch.fnmatch(file, '*.txt'):
shutil.move(source, destination)
or
for file in os.listdir('.'):
if fnmatch.fnmatch(file, '*.txt'):
shutil.move(file.join(eachFile, source), destination)
I have not tested the two aforesaid codes. I have no idea if they work, but editing allows others to see how my train of thought is progressing.
I have multiple text files with names containing 6 groups of period-separated digits matching the pattern year.month.day.hour.minute.second.
I want to add a .txt suffix to these files to make them easier to open as text files.
I tried the following code and I I tried with os.rename without success:
Question
How can I add .txt to the end of these file names?
path = os.chdir('realpath')
for f in os.listdir():
file_name = os.path.splitext(f)
name = file_name +tuple(['.txt'])
print(name)
You have many problems in your script. You should read each method's documentation before using it. Here are some of your mistakes:
os.chdir('realpath') - Do you really want to go to the reapath directory?
os.listdir(): − Missing argument, you need to feed a path to listdir.
print(name) - This will print the new filename, not actually rename the file.
Here is a script that uses a regex to find files whose names are made of 6 groups of digits (corresponding to your pattern year.month.day.hour.minute.second) in the current directory, then adds the .txt suffix to those files with os.rename:
import os
import re
regex = re.compile("[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+")
for filename in os.listdir("."):
if regex.match(filename):
os.rename(filename, filename + ".txt")
I have a large directory with many part files with their revisions, I want to recursively create a new folder for each part, and then move all of the related files into that folder. I am trying to do this by isolating a 7 digit number which would be used as an identifier for the part, and all the related filenames would also include this number.
import os
import shutil
import csv
import glob
from fnmatch import fnmatch, filter
from os.path import isdir, join
from shutil import copytree, copy2, Error, copystat
from shutil import copytree, ignore_patterns
dirname = ' '
# pattern = '*???????*'
for root, dirs, files in os.walk(dirname):
for fpath in files:
print(fpath)
if fpath[0:6].isdigit():
matchdir = os.mkdir(os.path.join(os.path.dirname(fpath)))
partnum = str(fpath[0:6])
pattern = str(partnum)
filematch = fnmatch(files, pattern)
print(filematch)
shutil.move(filematch, matchdir)
This is what I have so far, basically I'm not sure how to get the original filename and use it as the matching patter for the rest of the files. The original filename I want to use for this matching pattern is just a 7 digit number, and all of the related files may have other characters (REV-2) for example.
Don't overthink it
I think you're getting confused about what os.walk() gives you - recheck the docs. dirs and files are just a list of names of the directories / files, not the full paths.
Here's my suggestion. Assuming that you're starting with a directory layout something like:
directory1
1234567abc.txt
1234567abc.txt
1234567bcd.txt
2234567abc.txt
not-interesting.txt
And want to end with something like:
directory1
1234567
abc.txt
1234567
abc.txt
bcd.txt
2234567
abc.txt
not-interesting.txt
If that's correct, then there's no need to rematch the files in the directory, just operate on each file individually, and make the part directory only if it doesn't already exist. I would also use a regular expression to do this, so something like:
import os
import re
import shutil
for root, dirs, files in os.walk(dirname):
for fname in files:
# Match a string starting with 7 digits followed by everything else.
# Capture each part in a group so we can access them later.
match_object = re.match('([0-9]{7})(.*)$', fname)
if match_object is None:
# The regular expression did not match, ignore the file.
continue
# Form the new directory path using the number from the regular expression and the current root.
new_dir = os.path.join(root, match_object.group(1))
if not os.path.isdir(new_dir):
os.mkdir(new_dir)
new_file_path = os.path.join(new_dir, match_object.group(2))
# Or, if you don't want to change the filename, use:
new_file_path = os.path.join(new_dir, fname)
old_file_path = os.path.join(root, fname)
shutil.move(old_file_path, new_file_path)
Note that I have:
Switched the sense of the condition, we continue the loop immediately if the file is not interesting. This is a useful pattern to use to make sure that your code does not get too heavily indented.
Changed the name of fpath to fname. This is because it's not a path but just the name of the file, so it's better to call it fname.
Please clarify the question if that's not what you meant!
[edit] to show how to copy the file without changing its name.
I have a directory with multiple files beginning with integers. I am attempting to copy some of them to another directory based on a string pattern within the file name. I can successfully copy multiple files starting with integers (which I commented out), but am having trouble filtering based on the string pattern. I'm using shutil.copy, but am having trouble in determining whether to use regex or fnmatch.
My code below filters correctly, but still copies all files, not files with the specific string 'TEST_Payroll'. Any help to do this would be appreciated. Thanks!!
import re
import os
import fnmatch
import shutil
src_files = os.listdir('C:/Users/acars/Desktop/a')
regex_txt = 'TEST_Payroll'
source = 'C:/Users/acars/Desktop/a'
dest1 = 'C:/Users/acars/Desktop/b'
for file_name in src_files:
#if not file_name.startswith(('0','1','2','3','4','5','6','7','8','9',)):
if fnmatch.filter(file_name, 'TEST_Payroll'):
continue
src = os.path.join(source, file_name)
dst = os.path.join(dest1, file_name)
shutil.copy(src, dst)
How about using,
if re.search(r'TEST_Payroll',file_name):
#do something with file
else:
#else do nothing