Suppose that I have a directory with 3 directories a, b, c. Within each of these directories I have the four directories 1, 2, 3, 4. Is there a way to list all paths, i.e. a/1, ..., a/4, ..., c/1, ..., c/4, in Python?
Is there a generalization? For example, given a directory, list all paths within at most 3 levels.
You can try this:
import os

n_levels = 2
cur_level = ['/your/start/path']
for level in range(n_levels):
    cur_level = [os.path.join(p, f)
                 for p in cur_level
                 if os.path.isdir(p)
                 for f in os.listdir(p)]
print(cur_level)
It starts with cur_level = ['/your/start/path'] as level 0, enumerates all files and directories in it and stores their full paths back in the list cur_level. The same is then done for each path in cur_level at the next level, and the results are again saved as a list.
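For the original two-level case, a pathlib glob pattern gives the same listing. This is a minimal sketch, assuming exactly two levels are wanted; the start path is a placeholder:

from pathlib import Path

start = Path('/your/start/path')  # placeholder path
# '*/*' matches every entry exactly two levels below the start directory,
# e.g. a/1 ... c/4 for the layout described in the question
two_levels = sorted(start.glob('*/*'))
print(two_levels)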
It's late at night for me and I'm banging my head against the wall trying to figure this out.
I'm trying to split a directory containing 100,000 folders into 4 subfolders with 25,000 folders in each subdirectory.
Here is the code I have:
import os
import shutil
import alive_progress
from alive_progress import alive_bar
import time

# Set the directory you want to separate
src_dir = r'C:\Users\Administrator\Desktop\base'
# Set the number of directories you want in each subdirectory
num_dirs_per_subdir = 25000
# Set the base name for the subdirectories
subdir_base_name = '25k-Split'
# Calculate the number of subdirectories needed
num_subdirs = len(os.listdir(src_dir)) // num_dirs_per_subdir

# Iterate over the subdirectories
for i in range(num_subdirs):
    # Create the subdirectory path
    subdir_path = os.path.join(src_dir, f'{subdir_base_name}_{i}')
    # Create the subdirectory
    os.mkdir(subdir_path)
    # Get the directories to move
    dirs_to_move = os.listdir(src_dir)[i*num_dirs_per_subdir:(i+1)*num_dirs_per_subdir]
    # Iterate over the directories to move
    with alive_bar(1000, force_tty=True) as bar:
        for directory in dirs_to_move:
            # Construct the source and destination paths
            src_path = os.path.join(src_dir, directory)
            dst_path = os.path.join(subdir_path, directory)
            bar()
            # Move the directory
            shutil.move(src_path, dst_path)
            bar()
I of course receive the following error:
Cannot move a directory 'C:\Users\Administrator\Desktop\base\25k-Split_0' into itself 'C:\Users\Administrator\Desktop\base\25k-Split_0\25k-Split_0'
Any help greatly appreciated.
You have 4 bugs:
1. You don't calculate the number of subdirectories needed correctly.
Change
num_subdirs = len(os.listdir(src_dir)) // num_dirs_per_subdir
to
num_subdirs = len(os.listdir(src_dir)) // num_dirs_per_subdir + 1
If you have 1 directory and want 25,000 directories per subdirectory, how many subdirectories do you need? One, not zero.
2. You need to check if the subdirectory already exists:
# Create the subdirectory path
subdir_path = os.path.join(src_dir, f'{subdir_base_name}_{i}')
if os.path.exists(subdir_path):
    raise RuntimeError(f"{subdir_path} already exists")
# Create the subdirectory
os.mkdir(subdir_path)
3. You should give the target directory to shutil.move:
shutil.move(src_path, subdir_path)
4. You recalculate the directory list every time, and after the first iteration it includes the subdirectories you just created:
# outside loop
directories = os.listdir(src_dir)
# ...
dirs_to_move = directories[i*num_dirs_per_subdir:(i+1)*num_dirs_per_subdir]
I believe issues #2 and #4 are the main problems.
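For reference, a hedged sketch that pulls these fixes together might look like the following. It reuses the names and path from the question and rounds the subdirectory count up instead of always adding 1 (a small variation on fix #1); treat it as illustrative rather than tested against the original data:

import os
import shutil

src_dir = r'C:\Users\Administrator\Desktop\base'
num_dirs_per_subdir = 25000
subdir_base_name = '25k-Split'

# Snapshot the listing once, before any split folders are created (fix #4)
all_dirs = os.listdir(src_dir)
# Round up so a final partial batch still gets its own subdirectory (variation on fix #1)
num_subdirs = -(-len(all_dirs) // num_dirs_per_subdir)

for i in range(num_subdirs):
    subdir_path = os.path.join(src_dir, f'{subdir_base_name}_{i}')
    if os.path.exists(subdir_path):  # fix #2
        raise RuntimeError(f"{subdir_path} already exists")
    os.mkdir(subdir_path)
    for directory in all_dirs[i * num_dirs_per_subdir:(i + 1) * num_dirs_per_subdir]:
        # Pass the target directory to shutil.move (fix #3)
        shutil.move(os.path.join(src_dir, directory), subdir_path)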
The problem is this line:
dirs_to_move = os.listdir(src_dir)[...
You keep fetching the directory list each time you go through the outer loop range(num_subdirs). After you handle the first subdirectory, the second iteration of the loop also picks up the subdirectory you just created.
Delete that line from inside the loop and compute the list of directories to move outside the loops, only once. Then index into it to get the directories to move without refetching the directory listing, like this:
all_dirs = os.listdir(src_dir)

# Iterate over the subdirectories
for i in range(num_subdirs):
    dir_index = i * num_dirs_per_subdir
    dirs_to_move = all_dirs[dir_index : dir_index + num_dirs_per_subdir]
    ...
Your logic also doesn't work if the number of directories doesn't divide evenly by num_dirs_per_subdir. Here is one way to fix that:
start_index = i * num_dirs_per_subdir
end_index = start_index + num_dirs_per_subdir
if end_index > len(all_dirs):
    dirs_to_move = all_dirs[start_index:]
else:
    dirs_to_move = all_dirs[start_index:end_index]
...
This is my code.
folder_out = []
for a in range(1,80):
    folder_letter = "/content/drive/MyDrive/project/Dataset/data/"
    folder_out[a] = os.path.join(folder_letter, str(a))
    folder_out.append(folder_out[a])
and this is the error
and this is what I want
You are using the os module wrong; you want to use os.listdir(<your directory here>) to get a list of all entries in the directory:
import os

dir = os.listdir("/content/drive/MyDrive/project/Dataset/data/")
for f in dir:
    print(f)
If you just want a list of all directories, just use os.listdir("/content/drive/MyDrive/project/Dataset/data/")
It's pointless to create a separate variable for every path; they are unnecessary. You can store everything in lists, dictionaries, and so on. Creating new variables inside a loop like that is very bad practice.
Code correction: save the paths in a list instead, and access them with loops or slicing.
import os

folder_out = []
for a in range(1, 80):
    folder_letter = "/content/drive/MyDrive/project/Dataset/data/"
    folder = os.path.join(folder_letter, str(a))
    folder_out.append(folder)
print(folder_out)
This gives a list of folder paths:
['/content/drive/MyDrive/project/Dataset/data/1', '/content/drive/MyDrive/project/Dataset/data/2', '/content/drive/MyDrive/project/Dataset/data/3', '/content/drive/MyDrive/project/Dataset/data/4', '/content/drive/MyDrive/project/Dataset/data/5', '/content/drive/MyDrive/project/Dataset/data/6', '/content/drive/MyDrive/project/Dataset/data/7', '/content/drive/MyDrive/project/Dataset/data/8', '/content/drive/MyDrive/project/Dataset/data/9',.....]
If you want to iterate over them:
for element in folder_out:
    print(element)
which prints each path on its own line.
Or like this, with a counter variable (note that c must be initialised before the loop):
c = 0
for x in folder_out:
    print(f"folder_out{c}: {x}")
    c = c + 1
which gives what you want:
folder_out0: /content/drive/MyDrive/project/Dataset/data/1
folder_out1: /content/drive/MyDrive/project/Dataset/data/2
folder_out2: /content/drive/MyDrive/project/Dataset/data/3
folder_out3: /content/drive/MyDrive/project/Dataset/data/4
folder_out4: /content/drive/MyDrive/project/Dataset/data/5
folder_out5: /content/drive/MyDrive/project/Dataset/data/6
folder_out6: /content/drive/MyDrive/project/Dataset/data/7
folder_out7: /content/drive/MyDrive/project/Dataset/data/8
folder_out8: /content/drive/MyDrive/project/Dataset/data/9
folder_out9: /content/drive/MyDrive/project/Dataset/data/10
folder_out10: /content/drive/MyDrive/project/Dataset/data/11
folder_out11: /content/drive/MyDrive/project/Dataset/data/12
folder_out12: /content/drive/MyDrive/project/Dataset/data/13
folder_out13: /content/drive/MyDrive/project/Dataset/data/14
folder_out14: /content/drive/MyDrive/project/Dataset/data/15
folder_out15: /content/drive/MyDrive/project/Dataset/data/16
If you want to create a folder for each path:
import os

for x in folder_out:
    os.mkdir(x)
which will create 79 empty folders.
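If some of the parent folders might not exist yet, or the script may be re-run, a hedged variant using os.makedirs avoids failures (exist_ok is a standard parameter of os.makedirs):

import os

for x in folder_out:
    # Creates missing parent directories and does not fail if x already exists
    os.makedirs(x, exist_ok=True)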
I needed a way to pull 10% of the files in a folder, at random, for sampling after every "run." Luckily, my current files are numbered sequentially. So my current method is to list the file names, parse the numerical portion, pull the max and min values, count the number of files and multiply by .1, then use random.sample to get a "random [10%] sample." I also write these names to a .txt and then use shutil.copy to move the actual files.
Obviously, this does not work if I have an outlier, i.e. if I have a file 345.txt among other files from 513.txt - 678.txt. I was wondering if there was a direct way to simply pull a number of files from a folder, randomly? I have looked it up and cannot find a better method.
Thanks.
Using numpy.random.choice(array, N) you can select N items at random from an array.
import numpy as np
import os
# list all files in dir
files = [f for f in os.listdir('.') if os.path.isfile(f)]
# select 0.1 of the files randomly
random_files = np.random.choice(files, int(len(files)*.1))
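One caveat worth noting: np.random.choice samples with replacement by default, so the same file name can appear more than once in the result. Passing replace=False gives distinct files:

# Sample 10% of the files without repeats
random_files = np.random.choice(files, int(len(files) * .1), replace=False)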
I was unable to get the other methods to work easily with my code, but I came up with this.
import os, shutil
from random import choice

# `files` and `subdir` are assumed to be defined earlier in the script
output_folder = 'C:/path/to/folder'
for x in range(int(len(files) * .1)):
    to_copy = choice(files)
    shutil.copy(os.path.join(subdir, to_copy), output_folder)
This will give you the list of names in the folder with mypath being the path to the folder.
from os import listdir
from os.path import isfile, join
from random import shuffle

onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
# shuffle() works in place and returns None, so slice the shuffled list itself
shuffle(onlyfiles)
small_list = onlyfiles[:len(onlyfiles) // 10]
This should work
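A hedged alternative that skips the manual shuffle: random.sample picks k distinct items directly.

from random import sample

# 10% of the files, without repeats, leaving onlyfiles untouched
small_list = sample(onlyfiles, len(onlyfiles) // 10)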
You can use the following strategy:
1. Use files = os.listdir(path) to get all entries in the directory as a list of names.
2. Next, count your files with count = len(files).
3. Using that count you can pick a random index: random_position = random.randrange(0, count).
4. Repeat step 3 and save the values in a list until you have enough positions (count / 10 in your case).
5. After that you can get the required file names with files[random_position].
6. Use a for loop for the iteration; a rough sketch follows below.
Hope this helps!
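A minimal sketch of those steps, assuming path points at the folder to sample from (note that randrange can pick the same index twice):

import os
import random

path = '/some/folder'  # placeholder path
files = os.listdir(path)
count = len(files)

sample_names = []
for _ in range(count // 10):
    random_position = random.randrange(0, count)
    sample_names.append(files[random_position])
print(sample_names)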
Based on Karl's solution (which did not work for me under Win 10, Python 3.x), I came up with this:
import numpy as np
import os
# List all files in dir
files = os.listdir("C:/Users/.../Myfiles")
# Select 0.5 of the files randomly
random_files = np.random.choice(files, int(len(files)*.5))
# Get the remaining files
other_files = [x for x in files if x not in random_files]
# Do something with the files
for x in random_files:
    print(x)
Following is the code:
import os

def get_size(path):
    total_size = 0
    for root, dirs, files in os.walk(path):
        for f in files:
            fp = os.path.join(root, f)
            total_size += os.path.getsize(fp)
    return total_size

for root, dirs, files in os.walk('F:\House'):
    print(root, get_size(root))
OUTPUT:
F:\House 21791204366
F:\House\house md 1832264906
F:\House\house md\house M D 1 1101710538
F:\House\Season 2 3035002265
F:\House\Season 3 3024588888
F:\House\Season 4 2028970391
F:\House\Season 5 3063415301
F:\House\Season 6 2664657424
F:\House\Season 7 3322229429
F:\House\Season 8 2820075762
I only need the subdirectories directly under the main directory, with their sizes. My code keeps going down to the last directory and writes its size too.
As an example:
F:\House 21791204366
F:\House\house md 1832264906
F:\House\house md\house M D 1 1101710538
It has printed the size for house md as well as for house M D 1 (which is a subdirectory inside house md), but I only want output down to the house md subdirectory level.
DESIRED OUTPUT:
I need the size of each subdirectory directly below the main directory level (specified by the user), not of the sub-subdirectories (but their sizes should be included in their parent directories).
How do I go about it?
To print the size of each immediate subdirectory and the total size for the parent directory similar to du -bcs */ command:
#!/usr/bin/env python3.6
"""Usage: du-bcs <parent-dir>"""
import os
import sys

if len(sys.argv) != 2:
    sys.exit(__doc__)  # print usage
parent_dir = sys.argv[1]

total = 0
for entry in os.scandir(parent_dir):
    if entry.is_dir(follow_symlinks=False):  # directory
        size = get_tree_size_scandir(entry)
        # print the size of each immediate subdirectory
        print(size, entry.name, sep='\t')
    elif entry.is_file(follow_symlinks=False):  # regular file
        size = entry.stat(follow_symlinks=False).st_size
    else:
        continue
    total += size
print(total, parent_dir, sep='\t')  # print the total size for the parent dir
where get_tree_size_scandir() is defined in a separate, linked answer [text in Russian, code in Python, C, C++, bash].
The size of a directory here is the apparent size of all regular files in it and its subdirectories recursively. It doesn't count the size for the directory entries themselves or the actual disk usage for the files. Related: why is the output of du often so different from du -b.
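Since that definition isn't reproduced here, this is a minimal sketch of what a get_tree_size_scandir() along those lines might look like (apparent size of regular files only, symlinks not followed):

import os

def get_tree_size_scandir(path):
    """Return the total apparent size of regular files under path, recursively."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            total += get_tree_size_scandir(entry.path)
        elif entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_size
    return total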
Instead of using os.walk in your get_size function, you can use listdir in conjunction with isdir:
for file in os.listdir(path):
    fp = os.path.join(path, file)
    if not os.path.isdir(fp):
        # Do your stuff
        total_size += os.path.getsize(fp)
...
os.walk will visit the entire directory tree, whereas listdir will only visit the files in the current directory.
However, be aware that this will not add the size of the subdirectories to the directory size. So if "Season 1" has 5 files of 100 MB each and 5 directories of 100 MB each, the size reported by your function will be only 500 MB.
Hint: Use recursion if you want the size of subdirectories to get added as well.
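A rough sketch of that recursive approach, kept close to the listdir/isdir style above (the function name is illustrative):

import os

def get_size_recursive(path):
    total_size = 0
    for name in os.listdir(path):
        fp = os.path.join(path, name)
        if os.path.isdir(fp):
            total_size += get_size_recursive(fp)  # include subdirectory sizes in the parent
        else:
            total_size += os.path.getsize(fp)
    return total_size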
I would like to search for and print directories under C:\, for example, but only list those in the first and second levels down that contain SP30070156-1.
What is the most efficient way to do this using Python 2, without the script walking through the entire directory tree (there are so many subdirectories in my case that it would take a very long time)?
Typical directory names are as follows:
Rooty Hill SP30068539-1 3RD Split Unit AC Project
Oxford Falls SP30064418-1 Upgrade SES MSB
Queanbeyan SP30066062-1 AC
You can try to create a function based on os.walk(). Something like this should get you started:
import os

def walker(base_dir, level=1, string=None):
    results = []
    for root, dirs, files in os.walk(base_dir):
        _root = root.replace(base_dir + '\\', '')  # you may need to remove the "+ '\\'"
        if _root.count('\\') < level:
            if string is None:
                results.append(dirs)
            else:
                if string in dirs:
                    results.append(dirs)
    return results
Then you can just call it with string='SP30070156-1' and level=1, then level=2.
Not sure if it's going to be faster than 40s, though.
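A hedged usage example of the function above; the base path is an assumption, not from the post:

# Note that `string in dirs` matches an exact directory name, so a substring
# check (e.g. any(string in d for d in dirs)) may be needed for names like
# "Queanbeyan SP30066062-1 AC".
hits = walker(r'C:\projects', level=2, string='SP30070156-1')
print(hits)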
Here is the code I used. The method lists quickly; if filtered for a keyword it is even quicker.
import os

MAX_DEPTH = 1

class Found(Exception):
    pass

#folders = ['U:\I-Project Works\PPM 20003171\PPM 11-12 NSW', 'U:\I-Project Works\PPM 20003171\PPM 11-12 QLD']
folders = ['U:\I-Project Works\PPM 20003171\PPM 11-12 NSW']

try:
    for stuff in folders:
        for root, dirs, files in os.walk(stuff, topdown=True):
            for dir in dirs:
                if "SP30070156-1" in dir:
                    sp_path = root + "\\" + dir
                    print(sp_path)
                    raise Found
            if root.count(os.sep) - stuff.count(os.sep) == MAX_DEPTH - 1:
                del dirs[:]
except Found:
    print "found"