I am writing a script that saves some images in a folder each time it runs.
I would like it to make a new folder on each run, with enumerated folder names. For example, the first time I run it, it saves the images in C:\images\folder1; the next time, it saves them in C:\images\folder2, then C:\images\folder3, and so on.
And if I delete these folders and run the script again, it should start from "C:\images\folder1" again.
I found this solution, which works for file names but not for folder names:
Create file but if name exists add number
The pathlib library is the standard Pythonic way of dealing with any kind of folders or files, and it is system independent. As for creating a new folder name, that can be done in a number of ways. You could check for the existence of each folder (like Patrick Gorman's answer), you could save a user config file with a counter that keeps track of where you left off, or you could call your folder-creation function again with an incremented counter if the folder already exists. If you are planning on having a large number of sub-directories (millions), then you might consider performing a binary search for the next folder to create (instead of iterating through the directory).
Anyway, on Windows, creating a file/folder with a name that already exists adds (2), (3), (4), etc. to the name. The space and parentheses make it particularly easy to identify the number of the file/folder. If you want the number appended directly, like folder1, folder2, folder3, etc., then it becomes a little trickier to detect. We essentially need to check whether the folder name ends with an integer. Finding particular expressions within a tricky string is normally done with re (regular expressions). If we had a space and parentheses, we probably wouldn't need re to detect the integer in the string.
from pathlib import Path
import re

def create_folder(string_or_path):
    path = Path(string_or_path)
    if not path.exists():
        #You can't create files and folders with the same name in Windows. Hence, check exists.
        path.mkdir()
    else:
        #Check if string ends with numbers and group the first part and the numbers.
        search = re.search('(.*?)([0-9]+$)',path.name)
        if search:
            basename,ending = search.groups()
            newname = basename + str(int(ending)+1)
        else:
            newname = path.name + '1'
        create_folder(path.parent.joinpath(newname))

path = Path(r'C:\images\folder1')
create_folder(path) #creates folder1
create_folder(path) #creates folder2, since folder1 exists
create_folder(path) #creates folder3, since folder1 and 2 exist

path = Path(r'C:\images\space')
create_folder(path) #creates space
create_folder(path) #creates space1, since space exists
Note: Be sure to use raw-strings when dealing with windows paths, since "\f" means something in a python string; hence you either have to do "\\f" or tell python it is a raw-string.
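For the original requirement (start again at folder1 once the old folders are gone), the same idea can also be written as a plain loop. This is only a sketch: the helper name `make_next_folder` and its `prefix` parameter are made up for illustration, and it assumes all numbered folders live directly inside one parent directory.

```python
from pathlib import Path
import tempfile


def make_next_folder(parent, prefix="folder"):
    """Create and return the first free '<prefix><n>' directory inside parent.

    Hypothetical helper for illustration; not part of pathlib.
    """
    parent = Path(parent)
    parent.mkdir(parents=True, exist_ok=True)
    n = 1
    while True:
        candidate = parent / f"{prefix}{n}"
        try:
            # mkdir raises FileExistsError when the name is already taken,
            # so checking and creating happen in a single step.
            candidate.mkdir()
            return candidate
        except FileExistsError:
            n += 1


# Demonstration in a throwaway directory instead of C:\images
with tempfile.TemporaryDirectory() as tmp:
    created = [make_next_folder(tmp).name for _ in range(3)]

print(created)
```

Because the loop always starts counting at 1, it also reuses a gap: if folder2 is deleted while folder1 and folder3 still exist, the next call creates folder2 again. Whether that is desirable depends on the use case.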
I feel like you could do something by getting a list of the directories and then looping over numbers 1 to n for the different possible directories until one can't be found.
from pathlib import Path
import os

path = Path('.')
folder = "folder"
i = 1
# Collect the directory names as strings; a string never compares equal to a Path object.
dirs = {e.name for e in path.iterdir() if e.is_dir()}
while True:
    if folder+str(i) not in dirs:
        folder = folder+str(i)
        break
    i = i+1
os.mkdir(folder)
I'm sorry if I made any typos, but that seems like a way that should work.
Related
I am writing a simple Python script that looks in the subfolders of the selected folder for files and summarizes which extensions are used and how many.
I am not really familiar with os.walk, and I am really stuck on the "for file in files" section:
`
for file in files:
    total_file_count += 1
    # Get the file extension
    extension = file.split(".")[-1]
    # If the extension is not in the dictionary, add it
    if extension not in file_counts[subfolder]:
        file_counts[subfolder][extension] = 1
    # If the extension is already in the dictionary, increase the count by 1
    else:
        file_counts[subfolder][extension] += 1
`
I thought a for loop was the best option for summarizing the files and extensions, but it only takes the last subfolder and gives an output of the files that are in that last folder.
Does anybody have a fix or a different approach for it?
FULL CODE:
`
import os

# Set file path using / {End with /}
root_path="C:/Users/me/Documents/"
# Initialize variables to keep track of file counts
total_file_count=0
file_counts = {}
# Iterate through all subfolders and files using os.walk
for root, dirs, files in os.walk(root_path):
    # Get currenty subfolder name
    subfolder = root.split("/")[-1]
    print(subfolder)
# Initialize a count for each file type
file_counts[subfolder] = {}
# Iterate through all files in the subfolder
for file in files:
    total_file_count += 1
    # Get the file extension
    extension = file.split(".")[-1]
    # If the extension is not in the dictionary, add it
    if extension not in file_counts[subfolder]:
        file_counts[subfolder][extension] = 1
    # If the extension is already in the dictionary, increase the count by 1
    else:
        file_counts[subfolder][extension] += 1
# Print total file count
print(f"There are a total of {total_file_count} files.")
# Print the file counts for each subfolder
for subfolder, counts in file_counts.items():
    print(f"In the {subfolder} subfolder:")
    for extension, count in counts.items():
        print(f"There are {count} .{extension} files")
`
Thank you in advance :)
If I understand correctly, you want to count the extensions in ALL subfolders of the given folder, but are only getting one folder. If that is indeed the problem, then the issue is this loop:
for root, dirs, files in os.walk(root_path):
    # Get currenty subfolder name
    subfolder = root.split("/")[-1]
    print(subfolder)
You are iterating through os.walk, but you keep overwriting the subfolder variable. So while it will print out every subfolder, it will only remember the LAST subfolder it encounters, which leads to the code reporting on only one subfolder.
Solution 1: Fix the loop
If you want to stick with os.walk, you just need to fix the loop. First things first - define files as a real variable. Don't rely on using the temporary variable from the loop. You actually already have this: file_counts!
Then, you need some way to save the files. I see that you want to split this up by subfolder, so what we can do is use file_counts to map each subfolder to a list of files (you are trying to do this, but are fundamentally misunderstanding some Python code; see my note below about this).
So now, we have a dictionary mapping each subfolder to a list of files! We would just need to iterate through this and count the extensions. The final code looks something like this:
import os

root_path = "C:/Users/me/Documents/"
total_file_count = 0
file_counts = {}
extension_counts = {}
# Iterate through all subfolders and files using os.walk
for root, dirs, files in os.walk(root_path):
    subfolder = root.split("/")[-1]
    file_counts[subfolder] = files
    extension_counts[subfolder] = {}
# Iterate through all subfolders, and then through all files
for subfolder in file_counts:
    for file in file_counts[subfolder]:
        total_file_count += 1
        extension = file.split(".")[-1]
        if extension not in extension_counts[subfolder]:
            extension_counts[subfolder][extension] = 1
        else:
            extension_counts[subfolder][extension] += 1
Solution 2: Use glob
Instead of os.walk, you can use the glob module, which will return a list of all files and directories wherever you search. It is a powerful tool that uses wildcard matching, and you can read about it here
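As a rough sketch of what Solution 2 could look like (this is not code from the question): the function name `count_extensions_with_glob` is made up, and it counts extensions over the whole tree rather than per subfolder. It assumes Python 3.5+ for `recursive=True`.

```python
import glob
import os
import tempfile
from collections import Counter


def count_extensions_with_glob(root_path):
    """Count file extensions below root_path using a recursive glob pattern."""
    counts = Counter()
    # glob.escape keeps characters like '[' in the folder name from being
    # treated as wildcards; '**' matches any number of nested directories.
    pattern = os.path.join(glob.escape(root_path), "**", "*")
    for path in glob.glob(pattern, recursive=True):
        if os.path.isfile(path):
            # splitext returns (name, '.ext'); files without a dot give ''
            counts[os.path.splitext(path)[1]] += 1
    return counts


# Small demonstration tree in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for name in ("a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "c.py")):
        open(os.path.join(tmp, name), "w").close()
    demo_counts = count_extensions_with_glob(tmp)

print(dict(demo_counts))
```

Keep in mind that a plain `*` pattern does not match names starting with a dot, so hidden files are skipped by this approach.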
Note
In your code, you write
# Initialize a count for each file type
file_counts[subfolder] = {}
Which feels like a MATLAB coding scheme. First, subfolder is a variable, and not a vector, so this would only initialize a count for a single file type (and even if it was a list, you get an unhashable type error). Second, this seems to stem from the idea that continuously assigning a variable in a loop builds a list instead of overwriting, which is not true. If you want to do that, you need to initialize an empty list, and use .append().
Note 2: Electric Boogaloo
There are two big ways to make this code good, and here are hints
Look into default dictionaries. They will make your code less redundant
Do you REALLY need to save the numbers and THEN count? What if you counted directly?
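One possible way to follow both hints at once is sketched below. It is an illustration, not the only solution: the function name `count_extensions_per_folder` is invented, and it uses the full folder path as the key (an assumption made here so that two subfolders with the same name do not overwrite each other).

```python
import os
import tempfile
from collections import Counter, defaultdict


def count_extensions_per_folder(root_path):
    """Map each folder below root_path to a Counter of its file extensions."""
    # defaultdict(Counter) creates an empty Counter the first time a folder
    # is seen, so no "if key not in dict" checks are needed.
    counts = defaultdict(Counter)
    for root, dirs, files in os.walk(root_path):
        for file in files:
            # Count directly while walking instead of storing file lists first
            counts[root][os.path.splitext(file)[1]] += 1
    return counts


# Small demonstration tree in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "sub")
    os.makedirs(sub)
    for path in (os.path.join(tmp, "a.txt"), os.path.join(tmp, "b.txt"), os.path.join(sub, "c.py")):
        open(path, "w").close()
    result = count_extensions_per_folder(tmp)

for folder, counter in result.items():
    print(folder, dict(counter))
```

Folders that contain no files simply do not appear in the result, which may or may not be what you want.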
Rather than using os.walk you could use the rglob and glob methods of a Path object. E.g.,
from pathlib import Path

root_path="C:/Users/me/Documents/"
root = Path(root_path)
# get a list of all the directories within root (and recursively within those subdirectories)
# note: rglob patterns must be relative, so call rglob on the root path itself
dirs = [d for d in root.rglob("*") if d.is_dir()]
dirs.append(root) # append root directory
# loop through all directories
for curdir in dirs:
    # get suffixes (i.e., extensions) of all files in the directory
    suffixes = set([s.suffix for s in curdir.glob("*") if s.is_file()])
    print(f"In the {curdir}:")
    # loop through the suffixes
    for suffix in suffixes:
        # get all the files in the current directory with that extension
        suffiles = curdir.glob(f"*{suffix}")
        print(f"There are {len(list(suffiles))} {suffix} files")
I have written code that creates a dictionary storing the absolute paths of all folders in the current path as keys, and each folder's filenames as values. This code will only be applied to paths whose folders contain only image files. Here:
import os
import re

# Main method
the_dictionary_list = {}
for name in os.listdir("."):
    if os.path.isdir(name):
        path = os.path.abspath(name)
        print(f'\u001b[45m{path}\033[0m')
        match = re.match(r'/(?:[^\\])[^\\]*$', path)
        print(match)
        list_of_file_contents = os.listdir(path)
        print(f'\033[46m{list_of_file_contents}')
        the_dictionary_list[path] = list_of_file_contents
        print('\n')
print('\u001b[43mthe_dictionary_list:\033[0m')
print(the_dictionary_list)
The thing is that I want this dictionary to store only the last folder names as keys instead of their absolute paths. I was planning to use the regular expression /(?:[^\\])[^\\]*$ to obtain the last name (of a file or folder from a given path), and then add those last names as keys to the dictionary in the for loop.
I wanted to test the code above first to see if it was doing what I wanted, but it didn't seem so: the value of the match variable was None in each iteration, which didn't make sense to me. Everything else works fine.
So I would like to know what I'm doing wrong here.
I would highly recommend to use the builtin library pathlib. It would appear you are interested in the f.name part. Here is a cheat sheet.
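To make that concrete, here is a small sketch. The example path is made up, and `PureWindowsPath` is used only so the snippet behaves the same on any operating system; with real paths on your own machine, `Path(path).name` does the same job.

```python
import re
from pathlib import PureWindowsPath

path = r"C:\images\folder1"  # made-up example path

# re.match only matches at the very start of the string. The pattern begins
# with "/", but the path begins with "C:", so no match is possible.
regex_result = re.match(r'/(?:[^\\])[^\\]*$', path)
print(regex_result)

# .name is the last component of the path; no regular expression needed.
last_name = PureWindowsPath(path).name
print(last_name)
```

Note also that in the question's loop, the `name` returned by os.listdir(".") already is the last component, so it could be used as the dictionary key directly.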
I decided to rewrite the code above, in case of wanting to apply it only in the current directory (where this program would be found).
import os

# Main method
the_dictionary_list = {}
for subdir in os.listdir("."):
    if os.path.isdir(subdir):
        path = os.path.abspath(subdir)
        print(f'\u001b[45m{path}\033[0m')
        list_of_file_contents = os.listdir(path)
        print(f'\033[46m{list_of_file_contents}')
        the_dictionary_list[subdir] = list_of_file_contents
        print('\n')
print('\033[1;37;40mThe dictionary list:\033[0m')
for subdir in the_dictionary_list:
    print('\u001b[43m'+subdir+'\033[0m')
    for archivo in the_dictionary_list[subdir]:
        print(" ", archivo)
    print('\n')
print(the_dictionary_list)
This would be useful in case the user wants to run the program with a double click on a specific location (my personal case)
I am trying to rename folders in bulk based on the folders/files contained within them (then moving the image files at the end of each path into their respective model/color folder directories).
Each folder/file has a similar naming convention of MODEL_COLOR.
The code below works, though seems to only be working correctly on the first folder, in other words, the folders are being renamed correctly but the last leg of code seems to be taking the folder which contains the images and moves it to the corresponding path, instead of specifically moving the images to the corresponding path and dropping the folder they're originally in.
On the first folder the loop iterates, it actually moves the images to the correct Model > Color directory, though on all folders after that it seems to be moving the folder containing the images into the correct Model > Color directory, instead of just moving the images alone into the corresponding directory.
After looking at the forums I've seen similar issues where when changing the directory or deleting certain instances, the loop can't iterate correctly due to the initial set changing during the looping process (i.e. deleting or renaming part of the path while iterating). I'm pretty sure it's a simple fix but I can't seem to find the solution that'll work best.
Standard FolderNames:
CL4003IN_45F
CL4003IN_56F
CL40157U_01D
CL40157U_52H
import glob, os, shutil

folder = 'C:\\testing'
# create new folder directory based on Model/Color [is working, but moves file_path into base directory]
# arr = []
for file_path in glob.glob(os.path.join(folder, '*_*')):
    new_dir = file_path.rpartition('_')[0]
    new_subdir = file_path.rpartition('_')[2]
    try:
        os.mkdir(os.path.join(new_dir, new_subdir))
    except WindowsError:
        # Handle the case where the target dir already exist.
        pass
    shutil.move(file_path, os.path.join(new_dir, new_subdir))
    # arr.append(file_path)
Completing the iteration of glob before the loop by storing its results in a list helped avoid some unwanted errors.
#...
for file_path in list(glob.glob(os.path.join(folder, '*_*'))):
...#
However, removing the following from the loop:
try:
    os.mkdir(os.path.join(new_dir, new_subdir))
except WindowsError:
    pass
allowed the code to iterate through all the folders in the directory without moving the folder that contains the files into the new_dir > new_subdir directory.
The new code that works across a multitude of folders within a directory is:
import glob, os, shutil

folder = 'C:\\testing'
# create new folder directory based on Model > Color
for file_path in list(glob.glob(os.path.join(folder, '*_*'), recursive=True)):
    new_dir = file_path.rpartition('_')[0]
    new_subdir = file_path.rpartition('_')[2]
    shutil.move(file_path, os.path.join(new_dir, new_subdir))
This may not be the most efficient code (and may not work across all instances, that remains to be determined!), though definitely works as intended for now.
Special thanks to those that helped with their suggestions.
The reason why you don't see the actual error is that you catch too many errors in your except: statement.
You intend to catch FileExistsError, so you should also just look for this one. Otherwise you would have noticed that the code in fact throws a FileNotFoundError.
The reason for that is that os.mkdir does not automatically create parent directories. It only creates directories one layer deep, but your code requires two layers of new directories.
For that, you would have to use os.makedirs(...) instead. Conveniently, os.makedirs also accepts an exist_ok flag, which gets rid of your entire try:-except: construct.
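The difference between the two calls can be seen in a small experiment. This is a sketch in a temporary directory; the folder names are borrowed from the question purely as an example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Two levels that do not exist yet: base/CL4003IN/45F
    target = os.path.join(base, "CL4003IN", "45F")

    try:
        os.mkdir(target)  # fails: the parent "CL4003IN" is missing
        mkdir_error = None
    except FileNotFoundError as exc:
        mkdir_error = type(exc).__name__

    os.makedirs(target, exist_ok=True)  # creates both levels
    os.makedirs(target, exist_ok=True)  # already exists: silently accepted
    created = os.path.isdir(target)

print(mkdir_error, created)
```

Catching only FileNotFoundError here (rather than a broad error class) is what makes the failure visible instead of silently swallowed.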
A further annotation is that you have a lot of duplicate calculations in your code:
file_path.rpartition('_') gets calculated twice
os.path.join(new_dir, new_subdir) gets calculated twice
I'd suggest storing those in meaningful variables. This speeds up your code, makes it more readable and more maintainable.
Here's a reworked version of your code:
import glob
import os
import shutil

folder = 'C:\\testing'
for file_path in glob.glob(os.path.join(folder, '*_*')):
    (new_dir, _, new_subdir) = file_path.rpartition('_')
    target_path = os.path.join(new_dir, new_subdir)
    os.makedirs(target_path, exist_ok=True)
    shutil.move(file_path, target_path)
Further improvements/fixes
There's still a bunch wrong in your code:
You don't check if the thing found by glob is a file
Splitting by _ does not stop at the directory divider. This means that something like C:\\my_files\\bla will get split into C:\\my and files\\bla.
I think you didn't care about either of those, because you thought 'the user would not use the script like this'. But this is actually a case that will happen:
You have a file C:\\my_files\\CL4003IN_45F, which will get moved to C:\\my_files\\CL4003IN\\45F\\CL4003IN_45F, as expected.
You run the script again. The script will find C:\\my_files\\CL4003IN. It doesn't check if it's a folder, so it will process it anyway. It will then split it into C:\\my and files\\CL4003IN.
The entire folder C:\\my_files\\CL4003IN will get moved to C:\\my\\files\\CL4003IN. Therefore the original file CL4003IN_45F ends up in C:\\my\\files\\CL4003IN\\CL4003IN\\45F\\CL4003IN_45F
The solution is:
only use rpartition on the filename, not the entire path
check if it actually is a file, or a directory
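The splitting problem described above can be reproduced without touching the file system. The snippet below is just an illustration using the example paths from this answer; `PureWindowsPath` only parses the strings and works on any operating system.

```python
from pathlib import PureWindowsPath

folder_path = PureWindowsPath(r"C:\my_files\CL4003IN")
file_path = PureWindowsPath(r"C:\my_files\CL4003IN_45F")

# rpartition on the whole path splits at the underscore in "my_files"
whole_path_split = str(folder_path).rpartition("_")
print(whole_path_split)

# rpartition on the name only: no underscore, so the first element is empty
# and the entry can be skipped
name_only_split = folder_path.name.rpartition("_")
print(name_only_split)

# For a properly named file, splitting the name gives model and color
model, _, color = file_path.name.rpartition("_")
print(model, color)
```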
Most of these tasks get solved easier with pathlib. I took the liberty of rewriting your code and fixing those issues:
from pathlib import Path

folder = Path('C:\\testing')
for file_path in folder.iterdir():
    file_name = file_path.name
    # Ignore directories
    if not file_path.is_file():
        continue
    # Split the filename by '_'
    (target_dir, _, target_subdir) = file_name.rpartition('_')
    # If no '_' found, ignore
    if not target_dir:
        continue
    # Compute the target path and create it if necessary
    target_path = folder / target_dir / target_subdir
    target_path.mkdir(parents=True, exist_ok=True)
    # Move the file to its new position
    target_file_path = target_path / file_name
    file_path.rename(target_file_path)
One last remark: folder.iterdir() actually does return an iterator. But that shouldn't be a problem in this case, as we explicitly check if the path is an existing file and not a directory or something that already got deleted. But if you want to be 100% safe, write list(folder.iterdir()).
I'm trying to write a Python script that searches a folder for all files with the .txt extension. In the manuals, I have only seen it hardcoded into glob.glob("hardcoded path").
How do I make the directory that glob searches for patterns a variable? Specifically: A user input.
This is what I tried:
import glob
input_directory = input("Please specify input folder: ")
txt_files = glob.glob(input_directory+"*.txt")
print(txt_files)
Despite giving the right directory with the .txt files, the script prints an empty list [ ].
If you are not sure whether a path contains a separator symbol at the end (usually '/' or '\'), you can concatenate using os.path.join. This is a much more portable method than appending your local OS's path separator manually, and much shorter than writing a conditional to determine if you need to every time:
import glob
import os
input_directory = input('Please specify input folder: ')
txt_files = glob.glob(os.path.join(input_directory, '*.txt'))
print(txt_files)
For Python 3.4+, you can use pathlib.Path.glob() for this:
import pathlib
input_directory = pathlib.Path(input('Please specify input folder: '))
if not input_directory.is_dir():
    # Input is invalid. Bail or ask for a new input.
    raise SystemExit(f'{input_directory} is not a directory')
for file in input_directory.glob('*.txt'):
    # Do something with file.
    print(file)
There is a time of check to time of use race between the is_dir() and the glob, which unfortunately cannot be easily avoided because glob() just returns an empty iterator in that case. On Windows, it may not even be possible to avoid because you cannot open directories to get a file descriptor. This is probably fine in most cases, but could be a problem if your application has a different set of privileges from the end user or from other applications with write access to the parent directory. This problem also applies to any solution using glob.glob(), which has the same behavior.
Finally, Path.glob() returns an iterator, and not a list. So you need to loop over it as shown, or pass it to list() to materialize it.
I'm using glob.glob to search a folder, and the sub-folders therein, for all the invoices I have. To simplify that, I'm going to add the program to the context menu and have it take the path as the first part of the pattern:
import glob
for filename in glob.glob(path + "/**/*.pdf", recursive=True):
    print(filename)
I'll have it keep the list and send those files to a Printer, in a later version, but for now just writing the name is a good enough test.
So my question is twofold:
Is there anything fundamentally wrong with the way I'm writing this?
Can anyone point me in the direction of how to actually capture folder path and provide it as path-variable?
You should have a look at this question: Python script on selected file. It shows how to set up a "Send To" command in the context menu. This command calls a Python script and provides the selected file name via sys.argv[1]. I assume that also works for a directory.
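A minimal sketch of how the script side could look, assuming (as above, untested for directories) that the context-menu entry passes the selected folder as the first command-line argument. The function name `find_pdfs` is made up for this example.

```python
import glob
import os
import sys


def find_pdfs(path):
    """Return all .pdf files below path, searching sub-folders recursively."""
    # os.path.join avoids worrying about a trailing separator in path;
    # glob.escape protects folder names that contain wildcard characters.
    pattern = os.path.join(glob.escape(path), "**", "*.pdf")
    return glob.glob(pattern, recursive=True)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # sys.argv[0] is the script itself; sys.argv[1] is the first argument
        for filename in find_pdfs(sys.argv[1]):
            print(filename)
    else:
        print(f"usage: {sys.argv[0]} FOLDER")
```

Keeping the search in a function means the later "send to printer" step can reuse the returned list instead of re-running the glob.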
I do not have Python 3.5, so I cannot set the flag recursive=True; I therefore prefer to give you a solution that runs on any Python version (known to date).
The solution consists of calling os.walk() to explore the directories and using the built-in set type.
It is better to use a set instead of a list, because with a list you would need more code to check that the directory you want to add is not already listed.
So basically you can keep two sets: one for the names of the files you want to print and the other one for the directories and their sub-folders.
You can adapt this solution to your class/method:
import os

path = '.' # Any path you want
exten = '.pdf'
directories_list = set()
files_list = set()
# Loop over directories
for dirpath, dirnames, files in os.walk(path):
    for name in files:
        # Check if extension matches
        if name.lower().endswith(exten):
            files_list.add(name)
            directories_list.add(dirpath)
You can then loop over directories_list and files_list to print them out.