Looping through the folder - Python

I need to solve a trivial task: running a sequence of commands in a loop:
1) take an input .dcd file from the folder
2) perform some operations on the file
3) save the results in a list
My code (which is not working!) looks like this:
import os
# make a list of the input DCD files
path="./inputs/"
dirs=os.listdir(path)
for traj in dirs:
    trajectory = command(traj)
It correctly determines the name of each input file but reports that every file is empty.
Alternatively, I used the script below, which loops through the files using a number embedded in the name of each file (this does not suit my current task, because I need to keep the original name of each input file and avoid numbering):
# number of input files
n=3
for i in xrange(1, n+1):
    trajectory = command('./inputs/file_%d.dcd' % i)
In the latter case all the dcd files were loaded correctly (in contrast to the first example)! So the question is: what should I fix in the first example?

os.listdir() gives you only the base filenames relative to the directory. No path is included.
Prefix your filenames with the path:
for traj in dirs:
    trajectory = command(os.path.join(path, traj))
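If the folder may also contain other files, the same fix can be combined with a filter on the extension. This is only a minimal sketch: the helper name list_dcd_files is hypothetical, and command stands for whatever loader the question uses.

```python
import os


def list_dcd_files(path):
    """Return full paths of the .dcd files directly inside `path`, sorted by name."""
    full_paths = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)  # os.listdir() returns bare names only
        if name.endswith(".dcd") and os.path.isfile(full):
            full_paths.append(full)
    return full_paths
```

Each returned path can be passed to command() directly, and os.path.basename(full) recovers the original file name if it is needed to label the results.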

Related

Simple Python program that checks in each subfolder how many files there are and which extensions the file contains

I am writing a simple Python script that looks for files in the subfolders of a selected folder and summarizes which extensions are used and how many files have each one.
I am not really familiar with os.walk and I am really stuck on the "for file in files" section:
`
for file in files:
    total_file_count += 1
    # Get the file extension
    extension = file.split(".")[-1]
    # If the extension is not in the dictionary, add it
    if extension not in file_counts[subfolder]:
        file_counts[subfolder][extension] = 1
    # If the extension is already in the dictionary, increase the count by 1
    else:
        file_counts[subfolder][extension] += 1
`
I thought a for loop was the best option for summarizing the files and extensions, but it only processes the last subfolder and outputs the files that are in that last folder.
Does anybody have a fix or a different approach for it?
FULL CODE:
`
import os
# Set file path using / {End with /}
root_path="C:/Users/me/Documents/"
# Initialize variables to keep track of file counts
total_file_count=0
file_counts = {}
# Iterate through all subfolders and files using os.walk
for root, dirs, files in os.walk(root_path):
    # Get current subfolder name
    subfolder = root.split("/")[-1]
    print(subfolder)
# Initialize a count for each file type
file_counts[subfolder] = {}
# Iterate through all files in the subfolder
for file in files:
    total_file_count += 1
    # Get the file extension
    extension = file.split(".")[-1]
    # If the extension is not in the dictionary, add it
    if extension not in file_counts[subfolder]:
        file_counts[subfolder][extension] = 1
    # If the extension is already in the dictionary, increase the count by 1
    else:
        file_counts[subfolder][extension] += 1
# Print total file count
print(f"There are a total of {total_file_count} files.")
# Print the file counts for each subfolder
for subfolder, counts in file_counts.items():
    print(f"In the {subfolder} subfolder:")
    for extension, count in counts.items():
        print(f"There are {count} .{extension} files")
`
Thank you in advance :)
If I understand correctly, you want to count the extensions in ALL subfolders of the given folder, but are only getting one folder. If that is indeed the problem, then the issue is this loop
for root, dirs, files in os.walk(root_path):
    # Get current subfolder name
    subfolder = root.split("/")[-1]
    print(subfolder)
You are iterating through os.walk, but you keep overwriting the subfolder variable. So while it will print out every subfolder, it will only remember the LAST subfolder it encounters, which is why the code reports on only one subfolder.
Solution 1: Fix the loop
If you want to stick with os.walk, you just need to fix the loop. First things first - define files as a real variable. Don't rely on using the temporary variable from the loop. You actually already have this: file_counts!
Then, you need some way to save the files. I see that you want to split this up by subfolder, so what we can do is use file_counts to map each subfolder to a list of files (you are trying to do this, but are fundamentally misunderstanding some Python code; see my note below about this).
So now, we have a dictionary mapping each subfolder to a list of files! We would just need to iterate through this and count the extensions. The final code looks something like this:
total_file_count = 0
file_counts = {}
extension_counts = {}
# Iterate through all subfolders and files using os.walk
for root, dirs, files in os.walk(root_path):
    subfolder = root.split("/")[-1]
    file_counts[subfolder] = files
    extension_counts[subfolder] = {}
# Iterate through all subfolders, and then through all files
for subfolder in file_counts:
    for file in file_counts[subfolder]:
        total_file_count += 1
        extension = file.split(".")[-1]
        if extension not in extension_counts[subfolder]:
            extension_counts[subfolder][extension] = 1
        else:
            extension_counts[subfolder][extension] += 1
Solution 2: Use glob
Instead of os.walk, you can use the glob module, which returns a list of all files and directories matching the pattern you search for. It is a powerful tool that uses wildcard matching, and it is described in the Python standard library documentation.
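As a rough sketch of what the glob approach could look like (the function name count_extensions_glob is made up for illustration, and this version totals the extensions over the whole tree rather than per subfolder):

```python
import glob
import os
from collections import Counter


def count_extensions_glob(root_path):
    """Count file extensions under `root_path`, searching recursively with glob."""
    counts = Counter()
    pattern = os.path.join(root_path, "**", "*")
    # recursive=True makes "**" match any number of nested directories
    for entry in glob.glob(pattern, recursive=True):
        if os.path.isfile(entry):
            counts[os.path.splitext(entry)[1]] += 1
    return counts
```

Keep in mind that glob skips names starting with a dot by default, so hidden files are not counted here.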
Note
In your code, you write
# Initialize a count for each file type
file_counts[subfolder] = {}
This feels like a MATLAB coding scheme. First, subfolder is a single variable, not a vector, so this only initializes the counts for a single key (and even if it were a list, you would get an unhashable type error). Second, this seems to stem from the idea that repeatedly assigning a variable in a loop builds a list instead of overwriting it, which is not true. If you want to do that, you need to initialize an empty list and use .append().
Note 2: Electric Boogaloo
There are two big ways to improve this code; here are some hints:
Look into default dictionaries. They will make your code less redundant
Do you REALLY need to save the numbers and THEN count? What if you counted directly?
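For anyone who wants to see where those hints lead, here is one possible sketch that counts directly while walking. The function name count_extensions_by_folder is hypothetical, and using the full directory path as the key (instead of the last path component) is my own choice, made to avoid collisions between subfolders that share a name.

```python
import os
from collections import Counter, defaultdict


def count_extensions_by_folder(root_path):
    """Map each directory under `root_path` to a Counter of its file extensions."""
    counts = defaultdict(Counter)
    for root, dirs, files in os.walk(root_path):
        for name in files:
            # splitext returns (stem, ".ext"); files without a suffix give ""
            counts[root][os.path.splitext(name)[1]] += 1
    return counts
```

Directories that contain no files simply do not appear in the result, and sum(sum(c.values()) for c in counts.values()) gives the total file count.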
Rather than using os.walk you could use the rglob and glob methods of a Path object. E.g.,
from pathlib import Path
root_path="C:/Users/me/Documents/"
# get a list of all the directories within root (recursively, including nested subdirectories)
dirs = [d for d in Path(root_path).rglob("*") if d.is_dir()]
dirs.append(Path(root_path)) # append root directory
# loop through all directories
for curdir in dirs:
    # get suffixes (i.e., extensions) of all files in the directory
    suffixes = set([s.suffix for s in curdir.glob("*") if s.is_file()])
    print(f"In the {curdir}:")
    # loop through the suffixes
    for suffix in suffixes:
        # get all the files in the current directory with exactly that extension
        suffiles = [f for f in curdir.glob("*") if f.is_file() and f.suffix == suffix]
        print(f"There are {len(suffiles)} {suffix} files")

Why does my loop work correctly on the first iteration but not on the full set I'm looping through?

I am trying to rename folders in bulk based on the folders/files contained within them (then moving the image files at the end of each path into their respective model/color folder directories).
Each folder/file has a similar naming convention of MODEL_COLOR.
The code below works, though seems to only be working correctly on the first folder, in other words, the folders are being renamed correctly but the last leg of code seems to be taking the folder which contains the images and moves it to the corresponding path, instead of specifically moving the images to the corresponding path and dropping the folder they're originally in.
On the first folder the loop iterates, it actually moves the images to the correct Model > Color directory, though on all folders after that it seems to be moving the folder containing the images into the correct Model > Color directory, instead of just moving the images alone into the corresponding directory.
After looking at the forums I've seen similar issues where when changing the directory or deleting certain instances, the loop can't iterate correctly due to the initial set changing during the looping process (i.e. deleting or renaming part of the path while iterating). I'm pretty sure it's a simple fix but I can't seem to find the solution that'll work best.
Standard FolderNames:
CL4003IN_45F
CL4003IN_56F
CL40157U_01D
CL40157U_52H
import glob, os, shutil
folder = 'C:\\testing'
# create new folder directory based on Model/Color [is working, but moves file_path into base directory]
# arr = []
for file_path in glob.glob(os.path.join(folder, '*_*')):
    new_dir = file_path.rpartition('_')[0]
    new_subdir = file_path.rpartition('_')[2]
    try:
        os.mkdir(os.path.join(new_dir, new_subdir))
    except WindowsError:
        # Handle the case where the target dir already exist.
        pass
    shutil.move(file_path, os.path.join(new_dir, new_subdir))
    # arr.append(file_path)
Completing the iteration of glob before the loop, by storing its results in a list, helped avoid some unwanted errors.
#...
for file_path in list(glob.glob(os.path.join(folder, '*_*'))):
...#
But it was removing the following from the loop:
try:
    os.mkdir(os.path.join(new_dir, new_subdir))
except WindowsError:
    pass
that allowed the code to iterate through all the folders in the directory without transferring the folder before the file into the new_dir > new_subdir directory.
The new code that works across a multitude of folders within a directory is:
import glob, os, shutil
folder = 'C:\\testing'
# create new folder directory based on Model > Color
for file_path in list(glob.glob(os.path.join(folder, '*_*'), recursive=True)):
    new_dir = file_path.rpartition('_')[0]
    new_subdir = file_path.rpartition('_')[2]
    shutil.move(file_path, os.path.join(new_dir, new_subdir))
This may not be the most efficient code (and may not work across all instances, that remains to be determined!), though definitely works as intended for now.
Special thanks to those that helped with their suggestions.
The reason why you don't see the actual error is that you catch too many errors in your except: statement.
You intend to catch FileExistsError, so you should catch only that one. Otherwise you would have noticed that the code in fact throws a FileNotFoundError.
The reason for that is that os.mkdir does not automatically create parent directories. It only creates directories one layer deep, but your code requires two layers of new directories.
For that, you would have to use os.makedirs(...) instead. Conveniently, os.makedirs also accepts an exist_ok flag, which gets rid of your entire try:-except: construct.
A further annotation is that you have a lot of duplicate calculations in your code:
file_path.rpartition('_') gets calculated twice
os.path.join(new_dir, new_subdir) gets calculated twice
I'd suggest storing those in meaningful variables. This speeds up your code, makes it more readable and more maintainable.
Here's a reworked version of your code:
import glob
import os
import shutil
folder = 'C:\\testing'
for file_path in glob.glob(os.path.join(folder, '*_*')):
    (new_dir, _, new_subdir) = file_path.rpartition('_')
    target_path = os.path.join(new_dir, new_subdir)
    os.makedirs(target_path, exist_ok=True)
    shutil.move(file_path, target_path)
Further improvements/fixes
There's still a bunch wrong in your code:
You don't check if the thing found by glob is a file
Splitting by _ does not stop at the directory divider. This means that something like C:\\my_files\\bla will get split into C:\\my and files\\bla.
I think you didn't care about either of those, because you thought 'the user would not use the script like this'. But this is actually a case that will happen:
You have a file C:\\my_files\\CL4003IN_45F, which will get moved to C:\\my_files\\CL4003IN\\45F\\CL4003IN_45F, as expected.
You run the script again. The script will find C:\\my_files\\CL4003IN. It doesn't check if it's a folder, so it will process it anyway. It will then split it into C:\\my and files\\CL4003IN.
The entire folder C:\\my_files\\CL4003IN will get moved to C:\\my\\files\\CL4003IN. Therefore the original file CL4003IN_45F ends up in C:\\my\\files\\CL4003IN\\CL4003IN\\45F\\CL4003IN_45F
The solution is:
only use rpartition on the filename, not the entire path
check if it actually is a file, or a directory
Most of these tasks get solved easier with pathlib. I took the liberty of rewriting your code and fixing those issues:
from pathlib import Path
folder = Path('C:\\testing')
for file_path in folder.iterdir():
    file_name = file_path.name
    # Ignore directories
    if not file_path.is_file():
        continue
    # Split the filename by '_'
    (target_dir, _, target_subdir) = file_name.rpartition('_')
    # If no '_' found, ignore
    if not target_dir:
        continue
    # Compute the target path and create it if necessary
    target_path = folder / target_dir / target_subdir
    target_path.mkdir(parents=True, exist_ok=True)
    # Move the file to its new position
    target_file_path = target_path / file_name
    file_path.rename(target_file_path)
One last remark: folder.iterdir() actually does return an iterator. But that shouldn't be a problem in this case, as we explicitly check that the path is an existing file and not a directory or something that already got deleted. But if you want to be 100% safe, write list(folder.iterdir()).

How to create a unique folder name (location path) in Windows?

I am writing a script to save some images in a folder each time it runs.
I would like to make a new folder each time it runs, with enumerated folder names. For example, the first time I run it, it saves the images in C:\images\folder1; the next time I run it, it saves them in C:\images\folder2, then C:\images\folder3, and so on.
And if I delete these folders and start running it again, it should start from "C:\images\folder1" again.
I found this solution, which works for file names but not for folder names:
Create file but if name exists add number
The pathlib library is the standard Pythonic way of dealing with any kind of folders or files, and it is system independent. As for creating a new folder name, that could be done in a number of ways. You could check for the existence of each folder (like Patrick Gorman's answer), or you could save a user config file with a counter that keeps track of where you left off, or you could call your folder-creation function again with an incremented counter if the folder already exists. If you are planning on having a large number of subdirectories (millions), then you might consider performing a binary search for the next folder to create (instead of iterating through the directory).
Anyway, when you create a file/folder with an existing name in Windows, it adds a (2), (3), (4), etc. to the name. The space and parentheses make it particularly easy to identify the number of the file/folder. If you want the number directly appended, like folder1, folder2, folder3, etc., then it becomes a little trickier to detect: we essentially need to check which integer the folder name ends with. Finding particular expressions within a tricky string is normally done with re (regular expressions). If we had a space and parentheses, we probably wouldn't need re to detect the integer in the string.
from pathlib import Path
import re
def create_folder(string_or_path):
    path = Path(string_or_path)
    if not path.exists():
        #You can't create files and folders with the same name in Windows. Hence, check exists.
        path.mkdir()
    else:
        #Check if string ends with numbers and group the first part and the numbers.
        search = re.search('(.*?)([0-9]+$)',path.name)
        if search:
            basename,ending = search.groups()
            newname = basename + str(int(ending)+1)
        else:
            newname = path.name + '1'
        create_folder(path.parent.joinpath(newname))
path = Path(r'C:\images\folder1')
create_folder(path) #creates folder1
create_folder(path) #creates folder2, since folder1 exists
create_folder(path) #creates folder3, since folder1 and 2 exist
path = Path(r'C:\images\space')
create_folder(path) #creates space
create_folder(path) #creates space1, since space exists
Note: Be sure to use raw-strings when dealing with windows paths, since "\f" means something in a python string; hence you either have to do "\\f" or tell python it is a raw-string.
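A small illustration of that raw-string note (the variable names are just for the demo; the path is the one from the question):

```python
# "\f" is a single form-feed character, so the backslash silently disappears
broken = "C:\\images" + "\folder1"
escaped = "C:\\images\\folder1"
raw = r"C:\images\folder1"

print(len("\f"))        # 1
print(escaped == raw)   # True
print(broken == raw)    # False
```

The escaped and raw spellings produce the same string, while the plain one contains a control character where the backslash and the letter f were intended.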
I feel like you could do something by getting a list of the directories and then looping over numbers 1 to n for the different possible directories until one can't be found.
from pathlib import Path
import os
path = Path('.')
folder = "folder"
i = 1
# compare against the directory names (strings), not the Path objects
dirs = [e.name for e in path.iterdir() if e.is_dir()]
while True:
    if folder+str(i) not in dirs:
        folder = folder+str(i)
        break
    i = i+1
os.mkdir(folder)
I'm sorry if I made any typos, but that seems like a way that should work.

Export in each loop in their corresponding folder?

I have a list with directories.
shapelist
that has:
['C:\\Users\\user\\Desktop\\etg\\v1\\ASTENOT\\ASTENOT.shp',
'C:\\Users\\user\\Desktop\\etg\\v2\\ASTENOT\\ASTENOT.shp',
'C:\\Users\\user\\Desktop\\etg\\v3\\ASTENOT\\ASTENOT.shp',
'C:\\Users\\user\\Desktop\\etg\\v4\\ASTENOT\\ASTENOT.shp']
I want in each loop to use each ASTENOT from the list above which resides in a separate folder.
I have solved this part.
The issue is how to export each outcome in the corresponding folder where each input (each ASTENOT in every loop used) is located.
Example:
I am using this specific function in the loop.
arcpy.FeatureToLine_management(['ASTENOT'],'ASTENOT_lines')
The ['ASTENOT'] position is for the input and
the 'ASTENOT_lines' position is for the output of the function.
How can I make the output exported in the folder of each corresponding input?
Example: the ASTENOT_lines of the first loop to be exported in the v1\\ASTENOT\\ location the second in v2\\ASTENOT\\ and so on.
My attempt:
for i in shapelist:
    arcpy.FeatureToLine_management([i],'ASTENOT_lines')
but this exports everything to the current working directory and not to the folder of the corresponding input in each loop.
You can pass an absolute path to the FeatureToLine_management method.
The absolute path can be generated by simply replacing ASTENOT.shp in the input path with ASTENOT_lines.
So you can change your code to
for i in shapelist:
    outputfile = i.replace('ASTENOT.shp', 'ASTENOT_lines')
    arcpy.FeatureToLine_management([i], outputfile)
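If you would rather not depend on the input file name inside the path, the output location can also be derived from the parent folder of each input. This is just a sketch of the path handling: the helper output_path_for is hypothetical, and the arcpy call itself is left exactly as in the loop above.

```python
from pathlib import PureWindowsPath


def output_path_for(shapefile, output_name="ASTENOT_lines"):
    """Return `output_name` placed in the same folder as `shapefile`."""
    # PureWindowsPath parses backslash-separated paths on any operating system
    return str(PureWindowsPath(shapefile).parent / output_name)
```

Inside the loop, output_path_for(i) would then take the place of the i.replace(...) expression.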

Python program to produce dictionary of file extensions and sizes

I am trying to create a program in Python that will search through a directory of files and create a dictionary whose keys are the various file extensions in the directory, and whose values constitute lists containing the number of times that extension appears in the directory, the size of the largest file with that extension, the size of the smallest, and the average size of files with that extension.
I have written the following so far:
for root, dirs, files in os.walk('.'):
    contents={}
    for name in files:
        size=(os.path.getsize(name))
        title, extension=os.path.splitext(name)
        if extension not in contents:
            contents[extension]=[1, size, size, size]
        else:
            contents[extension][0]=contents[extension][0]+1
            contents[extension][3]=contents[extension][3]+size
            if size>=contents[extension][1]:
                contents[extension][1]=size
            elif size<contents[extension][2]:
                contents[extension][2]=size
    contents[extension][3]=contents[extension][3]/contents[extension][0]
    print(contents)
If I import os and use os.chdir() to enter the directory I want to explore, this script works to the extent that it returns a dictionary whose keys are the extensions in the directory, and whose values are lists that correctly identify the number of times that extension appears, the size of the largest file with that extension, and the size of the smallest. Where it goes wrong is that the average is calculated correctly in one case, but in the others it is incorrect but in inconsistent ways.
Any advice for fixing this? I'd like the dictionary to show the proper averages in each case. I'm new to Python, and programming, and am clearly missing something!
Thanks in advance.
In your last step,
contents[extension][3]=contents[extension][3]/contents[extension][0]
you're only performing this for a single extension; you need to loop through all your extensions:
for extension in contents:
    contents[extension][3]=contents[extension][3]/contents[extension][0]
One thing that's certainly a problem is that to get the size of a file, you need to use the correct relative path. When os.walk() recurses into a subdirectory, the relative path is root+"/"+name -- not just name. So you should be getting the size like this:
size=os.path.getsize(root+"/"+name)
(Your variable root is not actually the "root" of the directory tree; it is each directory whose files are being listed in files.)
Will this fix the problem? Who knows. The way your code is now it should be raising an exception, so either you don't have any subdirectories or you are not showing us your complete code.
Try:
for root, dirs, files in os.walk('.'):
    contents={}
    for name in files:
        size=(os.path.getsize(name))
        title, extension=os.path.splitext(name)
        if extension not in contents:
            contents[extension]=[1, size, size, size]
        else:
            contents[extension][0]=contents[extension][0]+1
            contents[extension][3]=contents[extension][3]+size
            if size>=contents[extension][1]:
                contents[extension][1]=size
            elif size<contents[extension][2]:
                contents[extension][2]=size
    for k in contents.keys():
        contents[k][3]=contents[k][3] / float(contents[k][0])
    print(contents)
You were calculating the average for only one of the extensions, the last one.
Also use float: in Python 2, dividing two integers truncates the result, so without it the average is not going to be exact.
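Putting the suggestions from the answers together (joining root with the file name, and averaging every extension once the walk has finished), a possible version could look like the sketch below. The function name extension_stats is made up, it collects one dictionary for the whole tree rather than one per directory, and it assumes Python 3, where / is already true division.

```python
import os


def extension_stats(top):
    """Map each extension under `top` to [count, largest, smallest, average] sizes."""
    contents = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            # join with root so files in subdirectories are found
            size = os.path.getsize(os.path.join(root, name))
            extension = os.path.splitext(name)[1]
            if extension not in contents:
                contents[extension] = [1, size, size, size]
            else:
                stats = contents[extension]
                stats[0] += 1
                stats[1] = max(stats[1], size)
                stats[2] = min(stats[2], size)
                stats[3] += size  # running total, averaged below
    # turn each running total into an average, for every extension
    for stats in contents.values():
        stats[3] = stats[3] / stats[0]
    return contents
```

Calling extension_stats('.') after os.chdir() into the directory of interest matches how the question runs its script.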
