Traverse directories to a specific depth in Python [duplicate]


I would like to search for and print directories under c:// (for example) whose names contain SP30070156-1, but only look at the 1st and 2nd levels down.
What is the most efficient way to do this in Python 2 without the script running through all the subdirectories (there are so many in my case that it would take a very long time)?
Typical directory names are as follows:
Rooty Hill SP30068539-1 3RD Split Unit AC Project
Oxford Falls SP30064418-1 Upgrade SES MSB
Queanbeyan SP30066062-1 AC

You can try to create a function based on os.walk(). Something like this should get you started:
import os
def walker(base_dir, level=1, string=None):
    results = []
    for root, dirs, files in os.walk(base_dir):
        _root = root.replace(base_dir + '\\', '') #you may need to remove the "+ '\\'"
        if _root.count('\\') < level:
            if string is None:
                results.append(dirs)
            else:
                # match directory names that contain the string, not only exact names
                if any(string in d for d in dirs):
                    results.append(dirs)
    return results
Then you can just call it with string='SP30070156-1' and level 1 then level 2.
Not sure if it's going to be faster than 40s, though.
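One way to avoid visiting the whole tree is to empty the dirs list in place once the depth limit is reached, because os.walk with topdown=True only descends into the names left in that list. The following is a minimal sketch rather than a definitive implementation: find_dirs is a hypothetical name, and it assumes that depth 1 means the immediate children of base_dir.

```python
import os


def find_dirs(base_dir, keyword, max_depth=2):
    """Return paths of directories whose name contains keyword,
    looking at most max_depth levels below base_dir."""
    matches = []
    for root, dirs, _files in os.walk(base_dir, topdown=True):
        # Depth of the entries in `dirs`: 1 for children of base_dir,
        # 2 for grandchildren, and so on.
        rel = os.path.relpath(root, base_dir)
        depth = 1 if rel == os.curdir else rel.count(os.sep) + 2
        matches.extend(os.path.join(root, d) for d in dirs if keyword in d)
        if depth >= max_depth:
            # Emptying the list in place stops os.walk from going deeper.
            del dirs[:]
    return matches
```

Calling find_dirs(some_root, 'SP30070156-1', max_depth=2) would then only list the first two levels and never read anything below them.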

Here is the code I used. The method is quick at listing, and it is even quicker when filtering for a keyword.
import os
MAX_DEPTH = 1
class Found(Exception):
    pass  # raised to stop the walk at the first match
#folders = [r'U:\I-Project Works\PPM 20003171\PPM 11-12 NSW', r'U:\I-Project Works\PPM 20003171\PPM 11-12 QLD']
folders = [r'U:\I-Project Works\PPM 20003171\PPM 11-12 NSW']
try:
    for stuff in folders:
        for root, dirs, files in os.walk(stuff, topdown=True):
            for dir in dirs:
                if "SP30070156-1" in dir:
                    sp_path = os.path.join(root, dir)
                    print(sp_path)
                    raise Found
            # stop descending once the depth limit is reached
            if root.count(os.sep) - stuff.count(os.sep) == MAX_DEPTH - 1:
                del dirs[:]
except Found:
    print("found")

Related

shutil.move - Cannot move directory into itself... (Python)

It's late night for me and I'm banging my head against the wall as to why I can't figure this out.
Trying to split a directory with 100,000 folders (directories) into 4 subfolders with 25,000 folders/directories in each sub_directory.
Here is the code I have:
import os
import shutil
import alive_progress
from alive_progress import alive_bar
import time
# Set the directory you want to separate
src_dir = r'C:\Users\Administrator\Desktop\base'
# Set the number of directories you want in each subdirectory
num_dirs_per_subdir = 25000
# Set the base name for the subdirectories
subdir_base_name = '25k-Split'
# Calculate the number of subdirectories needed
num_subdirs = len(os.listdir(src_dir)) // num_dirs_per_subdir
# Iterate over the subdirectories
for i in range(num_subdirs):
    # Create the subdirectory path
    subdir_path = os.path.join(src_dir, f'{subdir_base_name}_{i}')
    # Create the subdirectory
    os.mkdir(subdir_path)
    # Get the directories to move
    dirs_to_move = os.listdir(src_dir)[i*num_dirs_per_subdir:(i+1)*num_dirs_per_subdir]
    # Iterate over the directories to move
    with alive_bar(1000, force_tty=True) as bar:
        for directory in dirs_to_move:
            # Construct the source and destination paths
            src_path = os.path.join(src_dir, directory)
            dst_path = os.path.join(subdir_path, directory)
            bar()
            # Move the directory
            shutil.move(src_path, dst_path)
            bar()
I of course receive the following error:
Cannot move a directory 'C:\Users\Administrator\Desktop\base\25k-Split_0' into itself 'C:\Users\Administrator\Desktop\base\25k-Split_0\25k-Split_0'
Any help greatly appreciated.

You have 4 bugs:
1. You don't calculate the number of subdirectories needed correctly.
Change
num_subdirs = len(os.listdir(src_dir)) // num_dirs_per_subdir
to a ceiling division:
num_subdirs = (len(os.listdir(src_dir)) + num_dirs_per_subdir - 1) // num_dirs_per_subdir
If you have 1 directory and want 25,000 directories per subdirectory, how many subdirectories do you need? 1, not 0. (Simply adding 1 to the floor division would create an extra, empty subdirectory whenever the count divides evenly.)
2. You need to check if the subdirectory already exists:
# Create the subdirectory path
subdir_path = os.path.join(src_dir, f'{subdir_base_name}_{i}')
if os.path.exists(subdir_path):
    raise RuntimeError(f"{subdir_path} already exists")
# Create the subdirectory
os.mkdir(subdir_path)
3. You should give the target directory to shutil.move:
shutil.move(src_path, subdir_path)
4. You recalculate the directory list every time, which includes the subdirectories you just created:
# outside loop
directories = os.listdir(src_dir)
# ...
dirs_to_move = directories[i*num_dirs_per_subdir:(i+1)*num_dirs_per_subdir]
I believe issues #2 & 4 are the main problem.

The problem is this line:
dirs_to_move = os.listdir(src_dir)[...
You keep fetching the directory list each time you go through the outer loop range(num_subdirs). After you handle the first subdir, the second iteration of the loop also gets the subdir you just created.
Delete the line above from inside the first loop and calculate the directories to move only once, outside the loops. Then index into that list to get the dirs to move without refetching the directory list, like this:
all_dirs = os.listdir(src_dir)
# Iterate over the subdirectories
for i in range(num_subdirs):
    dir_index = i * num_dirs_per_subdir
    dirs_to_move = all_dirs[dir_index : dir_index+num_dirs_per_subdir]
    ...
Your logic doesn't work if the number of directories doesn't divide into num_dirs_per_subdir exactly. Here is how you can fix that:
    start_index = i*num_dirs_per_subdir
    end_index = start_index + num_dirs_per_subdir
    if end_index > len(all_dirs):
        dirs_to_move = all_dirs[start_index:]
    else:
        dirs_to_move = all_dirs[start_index : end_index]
    ...
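The two points above (list the directory once, and cope with a remainder) can be combined into a small helper. This is only a sketch: chunk_names is a hypothetical name, and it relies on the fact that a Python slice running past the end of a list is simply shortened, so no explicit bounds check is needed.

```python
def chunk_names(names, chunk_size):
    """Split a list of names into consecutive chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division: a partial final chunk still counts as a chunk.
    num_chunks = (len(names) + chunk_size - 1) // chunk_size
    # A slice that runs past the end of the list is simply shortened.
    return [names[i * chunk_size:(i + 1) * chunk_size] for i in range(num_chunks)]


chunks = chunk_names(["a", "b", "c", "d", "e"], 2)
print(chunks)  # [['a', 'b'], ['c', 'd'], ['e']]
```

In the script from the question, names would be the result of a single os.listdir(src_dir) call made before any subdirectory is created, and each chunk would be moved into its own subdirectory.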

problem with moving files using os.rename

I have this block of code where I try to move all the files in a folder to a different folder.
import os
from os import listdir
from os.path import isfile, join
def run():
    print("Do you want to convert 1 file (0) or do you want to convert all the files in a folder(1)")
    oneortwo = input("")
    if oneortwo == "0":
        filepathonefile = input("what is the filepath of your file?")
        filepathonefilewithoutfullpath = os.path.basename(filepathonefile)
        newfolder = "C:/Users/EL127032/Documents/fileconvertion/files/" + filepathonefilewithoutfullpath
        os.rename(filepathonefile,newfolder)
    if oneortwo == "1" :
        filepathdirectory = input("what is the filepath of your folder?")
        filesindirectory = [f for f in listdir(filepathdirectory) if isfile(join(filepathdirectory, f))]
        numberoffiles = len(filesindirectory)
        handlingfilenumber = 0
        while numberoffiles > handlingfilenumber:
            currenthandlingfile = filesindirectory[handlingfilenumber]
            oldpathcurrenthandling = filepathdirectory + "/" + currenthandlingfile
            futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + currenthandlingfile
            os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
But when I run this, it gives:
os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
FileNotFoundError: [WinError 2] System couldn't find the file: 'C:\Users\EL127032\Documents\Eligant - kopie\Klas 1\Stermodules\Basisbiologie/lopen (1).odt' -> 'C:/Users/EL127032/Documents/fileconvertion/files/lopen (1).odt'
Can someone help me, please?

You are trying to move the same file twice.
The bug is in this part:
numberoffiles = len(filesindirectory)
handlingfilenumber = 0
while numberoffiles > handlingfilenumber:
    currenthandlingfile = filesindirectory[handlingfilenumber]
    oldpathcurrenthandling = filepathdirectory + "/" + currenthandlingfile
    futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + currenthandlingfile
    os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
The first time you loop, handlingfilenumber will be 0, so you will move the 0-th file from your filesindirectory list.
Then you loop again, handlingfilenumber is still 0, so you try to move it again, but it is not there anymore (you moved it already on the first turn).
You forgot to increment handlingfilenumber. Add handlingfilenumber += 1 on a line after os.rename and you will be fine.
while loops are more error-prone than simpler for loops, so I recommend using a for loop when appropriate.
Here, you want to move each file, so a for loop suffices:
for currenthandlingfile in filesindirectory:
    oldpathcurrenthandling = filepathdirectory + "/" + currenthandlingfile
    futurepathcurrenhandlingfile = "C:/Users/EL127032/Documents/fileconvertion/files/" + currenthandlingfile
    os.rename(oldpathcurrenthandling, futurepathcurrenhandlingfile)
There is no need to use len, initialize a counter, increment it, or fetch the n-th element, and it takes fewer lines.
Three other things:
You could have found the cause of the problem yourself by debugging; there are plenty of resources online that explain how to do it. Just printing the name of the file about to be moved (oldpathcurrenthandling) would have shown it twice and revealed the problem behind the OS error.
Your variable names are not very readable. Consider following the standard style guide for variable names (PEP 8) and standard jargon: for example, filepathonefilewithoutfullpath becomes filename, oldpathcurrenthandling becomes source_file_path (following the source/destination convention), ...
When you have an error, include the stack trace that Python gives you. It would have pointed directly to the second os.rename call; the first one (when you move only one file) does not contribute to the problem. It also helps in building a Minimal Reproducible Example.
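Putting the advice above together, a for loop with more readable names might look like the sketch below. The function name move_files is hypothetical, and the sketch swaps os.rename for shutil.move, which also works when source and destination are on different drives; the destination folder is assumed to exist already.

```python
import os
import shutil


def move_files(source_dir, destination_dir):
    """Move every regular file in source_dir into destination_dir.

    Returns the list of file names that were moved.
    """
    moved = []
    for filename in sorted(os.listdir(source_dir)):
        source_file_path = os.path.join(source_dir, filename)
        if not os.path.isfile(source_file_path):
            continue  # skip subdirectories
        destination_file_path = os.path.join(destination_dir, filename)
        shutil.move(source_file_path, destination_file_path)
        moved.append(filename)
    return moved
```

Because the loop visits each name exactly once, no file can be moved twice.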

How do I get the size of sub directory from a directory in python?

Here is the code:
import os
def get_size(path):
    total_size = 0
    for root, dirs, files in os.walk(path):
        for f in files:
            fp = os.path.join(root, f)
            total_size += os.path.getsize(fp)
    return total_size
for root, dirs, files in os.walk(r'F:\House'):
    print(root, get_size(root))
OUTPUT :
F:\House 21791204366
F:\House\house md 1832264906
F:\House\house md\house M D 1 1101710538
F:\House\Season 2 3035002265
F:\House\Season 3 3024588888
F:\House\Season 4 2028970391
F:\House\Season 5 3063415301
F:\House\Season 6 2664657424
F:\House\Season 7 3322229429
F:\House\Season 8 2820075762
I need only the subdirectories directly under the main directory, with their sizes. My code descends all the way to the last directory and prints its size too.
As an example:
F:\House 21791204366
F:\House\house md 1832264906
F:\House\house md\house M D 1 1101710538
It has printed the size of house md as well as of house M D 1 (which is a subdirectory of house md). But I only want it to go down to the house md subdirectory level.
DESIRED OUTPUT:
I need the size of each subdirectory directly below the main directory (specified by the user), and not of the sub-subdirectories (although their sizes should be included in their parent directories).
How do I go about it?

To print the size of each immediate subdirectory and the total size for the parent directory similar to du -bcs */ command:
#!/usr/bin/env python3.6
"""Usage: du-bcs <parent-dir>"""
import os
import sys
if len(sys.argv) != 2:
    sys.exit(__doc__) # print usage
parent_dir = sys.argv[1]
total = 0
for entry in os.scandir(parent_dir):
    if entry.is_dir(follow_symlinks=False): # directory
        size = get_tree_size_scandir(entry)
        # print the size of each immediate subdirectory
        print(size, entry.name, sep='\t')
    elif entry.is_file(follow_symlinks=False): # regular file
        size = entry.stat(follow_symlinks=False).st_size
    else:
        continue
    total += size
print(total, parent_dir, sep='\t') # print the total size for the parent dir
where get_tree_size_scandir() is defined in a separate, linked answer [text in Russian; code in Python, C, C++, bash].
The size of a directory here is the apparent size of all regular files in it and its subdirectories recursively. It doesn't count the size for the directory entries themselves or the actual disk usage for the files. Related: why is the output of du often so different from du -b.
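The definition of get_tree_size_scandir() is not reproduced here. As a rough idea of what such a helper could look like, here is a minimal sketch that follows the description above (apparent size of regular files, recursively, without following symlinks). It is an assumption about the helper's behaviour, not the linked author's code, and it needs Python 3.6 or later.

```python
import os


def get_tree_size_scandir(path):
    """Return the total apparent size in bytes of the regular files
    under path, recursing into subdirectories (symlinks not followed)."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                total += get_tree_size_scandir(entry.path)
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

os.scandir accepts path-like objects, so the helper can be called with an os.DirEntry as well as with a string path.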

Instead of using os.walk in your get_size function, you can use listdir in conjunction with isdir:
for name in os.listdir(path):
    fp = os.path.join(path, name)
    if not os.path.isdir(fp):
        # Do your stuff
        total_size += os.path.getsize(fp)
...
os.walk will visit the entire directory tree, whereas listdir will only visit the files in the current directory.
However, be aware that this will not add the size of the subdirectories to the directory size. So if "Season 1" has 5 files of 100MB each, and 5 directories of 100 MB each, then the size reported by your function will be 500MB only.
Hint: Use recursion if you want the size of subdirectories to get added as well.
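Following the hint, a recursive version built on listdir and isdir could be sketched as below. The names dir_size and immediate_subdir_sizes are hypothetical, and the sketch assumes the tree contains no symlinks (isdir follows them, so a link loop would recurse forever).

```python
import os


def dir_size(path):
    """Recursively sum the sizes of all files below path."""
    total_size = 0
    for name in os.listdir(path):
        fp = os.path.join(path, name)
        if os.path.isdir(fp):
            total_size += dir_size(fp)  # add the whole subtree
        else:
            total_size += os.path.getsize(fp)
    return total_size


def immediate_subdir_sizes(parent):
    """Map each immediate subdirectory name of parent to its recursive size."""
    sizes = {}
    for name in os.listdir(parent):
        fp = os.path.join(parent, name)
        if os.path.isdir(fp):
            sizes[name] = dir_size(fp)
    return sizes
```

Only the first level is reported, while the sizes of deeper levels are still counted in their parents, which is the output the question asks for.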

Break early from nested os.walk?

I am writing my own redundant file cleanup utility (mainly to help me learn Python 2.7)
The processing logic involves three steps:
1. Walk through potential redundant folder tree getting a filename.
2. Walk through the "golden" tree searching for a file previously found in 1.
3. If the files are equal, delete the redundant file (found in 1).
At this point, to save time, I want to break out of searching through golden tree.
Here is what I have so far.
# step 1
for redundant_root, redundant_dirs, redundant_files in os.walk(redundant_input_path):
    redundant_path = redundant_root.split('/')
    for redundant_file in redundant_files:
        redundant_filename = redundant_root + "\\" + redundant_file
        # step 2
        for golden_root, golden_dirs, golden_files in os.walk(golden_input_path):
            golden_path = golden_root.split('/')
            for golden_file in golden_files:
                golden_filename = golden_root + "\\" + golden_file
                # step 3
                if filecmp.cmp(golden_filename, redundant_filename, True):
                    print("removing " + redundant_filename)
                    os.remove(redundant_filename)
                    try:
                        (os.rmdir(redundant_root))
                    except:
                        pass
                    # here is where I want to break from continuing to search through the golden tree.
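A common way to leave several nested loops at once is to move the inner search into its own function and return from it at the first match. The sketch below illustrates that idea under a few assumptions: find_duplicate and remove_redundant are hypothetical names, the two trees do not overlap, and contents are compared with shallow=False (a byte-by-byte comparison) rather than the shallow comparison used in the question.

```python
import filecmp
import os


def find_duplicate(golden_input_path, candidate):
    """Return the first file under golden_input_path whose content
    equals candidate, or None if there is no such file."""
    for golden_root, _dirs, golden_files in os.walk(golden_input_path):
        for golden_file in golden_files:
            golden_filename = os.path.join(golden_root, golden_file)
            if filecmp.cmp(golden_filename, candidate, shallow=False):
                return golden_filename  # leaves both loops at once
    return None


def remove_redundant(redundant_input_path, golden_input_path):
    """Delete every file in the redundant tree that has a copy in the
    golden tree and return the paths that were removed."""
    removed = []
    for redundant_root, _dirs, redundant_files in os.walk(redundant_input_path):
        for redundant_file in redundant_files:
            redundant_filename = os.path.join(redundant_root, redundant_file)
            if find_duplicate(golden_input_path, redundant_filename):
                os.remove(redundant_filename)
                removed.append(redundant_filename)
    return removed
```

The return inside find_duplicate plays the role of the missing break: the golden tree is no longer searched once a match has been found.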

Python - Can glob be used multiple times?

I want the user to process files in 2 different folders. The user does so by selecting a folder for First_Directory and another folder for Second_Directory. Each of these is defined, has its own algorithm, and works fine if only one directory is selected at a time. If the user selects both, only the First_Directory is processed.
Both also use the glob module, as shown in the simplified code below, which is where I think the problem lies. My question is: can the glob module be used multiple times and, if not, is there an alternative?
##Test=name
##First_Directory=folder
##Second_Directory=folder
path_1 = First_Directory
path_2 = Second_Directory
path = path_1 or path_2
os.chdir(path)
def First(path_1):
    output_1 = glob.glob('./*.shp')
    #Do some processing
def Second(path_2):
    output_2 = glob.glob('./*.shp')
    #Do some other processing
if path_1 and path_2:
    First(path_1)
    Second(path_2)
elif path_1:
    First(path_1)
elif path_2:
    Second(path_2)
else:
    pass

You can modify your function to only look for .shp files in the path of interest. Then you can use that function for one path or both.
import glob
import os
def globFolder(path):
    return glob.glob(os.path.join(path, '*.shp'))
# raw strings, so that "\f" is not read as a form-feed character
path1 = r"C:\folder\data1"
path2 = r"C:\folder\data2"
Then you can use that generic function:
totalResults = globFolder(path1) + globFolder(path2)
This will combine both lists.
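The reason only one folder was processed in the question is that os.chdir is called once and both functions then glob the current directory. Passing the folder into the pattern avoids that. A minimal sketch, with the hypothetical names shapefiles_in and shapefiles_in_all, assuming an unselected folder arrives as None or an empty string:

```python
import glob
import os


def shapefiles_in(folder):
    """Return the .shp files directly inside folder, without changing
    the current working directory."""
    return sorted(glob.glob(os.path.join(folder, "*.shp")))


def shapefiles_in_all(folders):
    """Collect the .shp files of every selected folder, in order."""
    results = []
    for folder in folders:
        if folder:  # skip folders the user left unselected
            results.extend(shapefiles_in(folder))
    return results
```

glob.glob can be called as many times as needed; each call only depends on the pattern it is given.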

I think restructuring your code can achieve your goal:
def First(path, check):
    if check:
        output = glob.glob(os.path.join(path, '*.shp'))
        #Do some processing
    else:
        output = glob.glob(os.path.join(path, '*.shp'))
        #Do some other processing
    return output
#
#
#
First(path_1, True)
First(path_2, False)
