I have a directory of folders and subfolders. What I'm trying to do is get the number of subfolders within each folder and plot them on a scatter plot using matplotlib. I have the code to get the number of files, but how would I get the number of subfolders within a folder? This probably has a simple answer, but I'm new to Python. Any help is appreciated.
This is the code I have so far to get the number of files:
import os
import matplotlib.pyplot as plt

def fcount(path):
    count1 = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count1 += 1
    return count1

path = "/Desktop/lay"
print(fcount(path))
import os

def fcount(path, counts=None):
    # Avoid a mutable default argument; create the dict on the first call.
    if counts is None:
        counts = {}
    count = 0
    for f in os.listdir(path):
        child = os.path.join(path, f)
        if os.path.isdir(child):
            child_count = fcount(child, counts)
            count += child_count + 1  # unless include self
    counts[path] = count
    return count

path = "/Desktop/lay"
counts = {}
print(fcount(path, counts))
Here is a full, tested implementation. It returns the number of subfolders at every depth, not counting the current folder itself. If you want to include the current folder, move the + 1 to the last line instead of where the comment is.
I think os.walk could be what you are looking for:
import os

def fcount(path):
    count1 = 0
    for root, dirs, files in os.walk(path):
        count1 += len(dirs)
    return count1

path = "/home/"
print(fcount(path))
This walks the whole tree and gives you the total number of directories under the given path, at every depth.
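Since the question asks for a count per folder (to plot), here is a minimal sketch that keeps one number per directory instead of a single total. The function name subfolder_counts is made up for illustration; it relies only on os.walk yielding each directory together with the list of its immediate subdirectories.

```python
import os

def subfolder_counts(path):
    """Map each directory under path to its number of immediate subfolders.

    Hypothetical helper for illustration, not part of the original answer.
    """
    counts = {}
    for root, dirs, _files in os.walk(path):
        # dirs lists only the direct children of root
        counts[root] = len(dirs)
    return counts
```

The values of the returned dictionary could then be handed to something like plt.scatter(range(len(counts)), list(counts.values())), assuming plt is matplotlib.pyplot as in the question.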
Try the following recipe:
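import glob
# The trailing slash makes the pattern match directories only;
# "path/*" on its own would also match files.
folders = glob.glob("path/*/")
print(len(folders))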
Answering to:
how would I get the number of subfolders within a folder
You can use the os.path.isdir function similarly to os.path.isfile to count directories.
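A minimal sketch of that suggestion, mirroring the file-counting function from the question. The name count_subfolders is made up for illustration, and it counts only immediate subfolders, not nested ones.

```python
import os

def count_subfolders(path):
    """Count the immediate subdirectories of path (hypothetical helper)."""
    count = 0
    for name in os.listdir(path):
        # isdir needs the full path, not just the bare entry name
        if os.path.isdir(os.path.join(path, name)):
            count += 1
    return count
```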
I guess you are looking for os.walk. Look in the Python reference, it says that:
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down
or bottom-up. For each directory in the tree rooted at directory top
(including top itself), it yields a 3-tuple (dirpath, dirnames,
filenames).
So, you can try to do this to get only the directories:
count = 0  # must be initialised before the loop
for root, dirs, files in os.walk('/usr/bin'):
    for name in dirs:
        print(os.path.join(root, name))
        count += 1
I need to show the total file count for all top-level directories, including the ones that have a file count of zero. Each top-level directory can contain subdirectories. I need the total count listed next to top-level directory only.
cnt = 0
for dirpath, dirnames, files in os.walk(FILES):
    filecount = len(files)
    cnt += filecount
    print(dirnames, ": ", filecount)
How can I get the above to print something like:
top-level-dir1: 234
top-level-dir2: 0
top-level-dir3: 5
....etc.
So, total files, including what's in the nested subfolders, but print the total next to the top-level folders only.
for directory in os.listdir(DOCUMENTS):
    if os.path.isdir(directory):
        filecount = 0
        for dirpath, dirnames, files in os.walk(directory):
            filecount += len(files)
        print(directory, ": ", filecount)
I'm close, but this just shows file count as 1 for each.
You are resetting the filecount variable for each directory. Instead, you want the count to persist over each directory in DOCUMENTS.
Also, os.listdir(DOCUMENTS) returns only the names of the entries in DOCUMENTS, not their full paths; for the script to work on directories other than the current working directory, you need to use os.path.join() to build the full path before calling os.path.isdir() and os.walk().
Finally, as @TimRoberts says, you should use os.path.isdir() to check the output of os.listdir(), as it can also return files.
Something like this should do the trick:
import os

target = "target_directory"
for dir_name in os.listdir(target):
    dir_path = os.path.join(target, dir_name)
    if os.path.isdir(dir_path):
        file_count = 0
        for _, _, files in os.walk(dir_path):
            file_count += len(files)
        print(f"{dir_name}: {file_count}")
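If the totals are needed for something other than printing, the same loop can be wrapped in a function that returns them. This is only a sketch; the name files_per_top_level_dir and the choice of a dictionary as the return type are assumptions made for illustration.

```python
import os

def files_per_top_level_dir(target):
    """Return {top-level directory name: total number of files beneath it}.

    Hypothetical helper; directories with no files map to 0.
    """
    totals = {}
    for entry in sorted(os.listdir(target)):
        entry_path = os.path.join(target, entry)
        if os.path.isdir(entry_path):
            # Sum the file lists of every directory in this subtree
            totals[entry] = sum(len(files) for _, _, files in os.walk(entry_path))
    return totals
```

Files that sit directly in the target directory are skipped, since only top-level directories are reported.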
After researching this specific task, I found that most of the solutions given for this kind of problem return either a list of all the files or the TOTAL size of the folder/file.
What I am trying to achieve is output in a CSV file that states the folder structure, i.e. folders - sub folders - files (optional), along with the size information for EACH.
There is no specific format for the CSV. I just need to know the tree structure with the size of the folder/sub-folder.
The reason behind this is that we are moving from physical servers to the cloud. In order to verify whether all the data was retained correctly during conversion I need to make a similar list of all SHARED DRIVES which can later be validated.
Looking forward to meaningful insights. Thanks!
Edit:
Sooo, that should be what you are asking for:
import os
import csv

def sizeof_fmt(num, suffix='B'):
    # Convert a byte count into a human-readable string (1024-based units).
    for unit in ['', 'K', 'M', 'G', 'T', 'P', 'E', 'Z']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Y', suffix)

def get_size(start_path='.'):
    # Total size of every file below start_path, formatted for display.
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return sizeof_fmt(total_size)

# newline="" (Python 3) stops the csv module writing blank rows on Windows.
with open("yourfilename.csv", mode="w", newline="") as dir_file:
    csv_writer = csv.writer(dir_file, delimiter=",")

    def files_and_sizes(start_path):
        # Write one row per directory, then recurse into it.
        for name in os.listdir(start_path):
            path = os.path.join(start_path, name)
            if os.path.isdir(path):
                csv_writer.writerow([name, get_size(path)])
                files_and_sizes(path)

    files_and_sizes(r"C:\your\path\here")
Updated to better fit the question.
You can get all files with sizes like this:
import os

all_files_with_size = []

def files_and_sizes(start_path):
    current_dir = []
    for file in os.listdir(start_path):
        # os.path.join picks the right separator for the platform
        path = os.path.join(start_path, file)
        if os.path.isdir(path):
            current_dir.append(files_and_sizes(path))
        else:
            current_dir.append((file, os.lstat(path).st_size))
    return current_dir
It will return a list containing all files like (file, size) and a sublist for each directory.
I recommend appending the entries to a file, but the formatting is up to you.
Also, if you want the directory sizes as well:
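if os.path.isdir(path):
    # append() takes a single argument, so pass the pair as a tuple.
    # For a directory, st_size is the size of the entry itself, not of its contents.
    current_dir.append((file, os.lstat(path).st_size))
    current_dir.append(files_and_sizes(path))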
I believe you will have to use a combination of the solutions that you have already found. Such as 'os.listdir(path)' to get the contents of a directory, 'os.lstat(path).st_size' to get file size, and 'os.path.isdir(path)' and 'os.path.isfile(path)' to determine the type.
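One way to combine those pieces without re-walking every subtree is a bottom-up os.walk, sketched below. The function names, the two-column CSV layout, and the use of raw byte counts are all assumptions made for illustration; the sketch also assumes every file is readable (os.path.getsize raises OSError on, for example, a broken symlink).

```python
import csv
import os

def directory_sizes(start_path):
    """Return {directory path: total bytes of all files beneath it}."""
    sizes = {}
    # topdown=False visits children before their parents, so each
    # subdirectory's total is already known when the parent is reached.
    for dirpath, dirnames, filenames in os.walk(start_path, topdown=False):
        total = sum(os.path.getsize(os.path.join(dirpath, f)) for f in filenames)
        # Symlinked directories are listed but not walked, hence the default of 0.
        total += sum(sizes.get(os.path.join(dirpath, d), 0) for d in dirnames)
        sizes[dirpath] = total
    return sizes

def write_size_report(start_path, csv_path):
    """Write one (path, size in bytes) row per directory; layout is an assumption."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "size_bytes"])
        for path, size in sorted(directory_sizes(start_path).items()):
            writer.writerow([path, size])
```

Running the same report against the old and the new location and comparing the two CSV files would be one way to check that nothing was lost, provided the relative paths line up.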
I currently have code that searches for files by keyword. Is there a way to show the number of files found as the code runs, and/or show the progress? I have a large directory to search and would like to see the progress if possible. The code I currently have doesn't show much info or processing time.
import os
import shutil
import time
import sys

def update_progress_bar():
    # Print one dot per match, staying on the same line.
    print('.', end='')
    sys.stdout.flush()

print('Starting ', end='')
sys.stdout.flush()
path = '//server/users/'
keyword = 'monthly report'
for root, dirs, files in os.walk(path):
    for name in files:
        if keyword in name.lower():
            time.sleep(0)
            update_progress_bar()
print(' Done!')
This is pretty simple, but why not just keep a counter?
files_found = 0
for root, dirs, files in os.walk(path):
    for name in files:
        if keyword in name.lower():
            files_found += 1
            time.sleep(0)
            update_progress_bar()
print("Found {}".format(files_found))
Edit: if you want to calculate progress you should first figure out how many files you'll be iterating over. If you use a nested list comprehension you can flatten each of the files from each triple emitted by os.walk.
# Flatten the per-directory file lists into a single list of names.
filenames = [name for _, _, files in os.walk(path) for name in files]
num_files = float(len(filenames))
Now at each step you can describe the progress as being the current step number divided by the number of files. In other words, using enumerate to get the step number:
files_found = 0
for step, name in enumerate(filenames):
    progress = step / num_files
    print("{}% complete".format(progress * 100))
    if keyword in name.lower():
        files_found += 1
        time.sleep(0)
        update_progress_bar()
If you want to get more creative in how you print the progress that's a different question.
I am looking to get a count of folders and subfolders with a given name. Here I am searching for the number of subfolders named "L-4". It returns zero, and I am sure that's not true. What did I miss?
import os
path = "R:\\"
i = 0
for (path, dirs, files) in os.walk(path):
    if os.path.dirname == "L-4":
        i += 1
print(i)
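os.path.dirname is a reference to the standard library function, not a string, so the comparison is always false. Perhaps you wanted to call it, as in os.path.dirname(path); note, though, that this returns the parent directory, so os.path.basename(path) is what gives the folder's own name.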
You could instead count how many times L-4 appears in the dirs list:
i = 0
for root, dirs, files in os.walk(path):
    i += dirs.count('L-4')
print(i)
or, as a one-liner:
print(sum(dirs.count('L-4') for _, dirs, _ in os.walk(path)))
I'm looking for a way to do a non-recursive os.walk() walk, just like os.listdir() works. But I need to return in the same way the os.walk() returns. Any idea?
Thank you in advance.
Add a break after the filenames for loop:
for root, dirs, filenames in os.walk(workdir):
    for fileName in filenames:
        print(fileName)
    break  # prevent descending into subfolders
This works because (by default) os.walk first lists the files in the requested folder and then goes into subfolders.
next(os.walk(...))
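To spell that one-liner out: os.walk is a generator, and the first tuple it yields describes the top directory itself, so taking just that item gives a single-level listing in the usual (root, dirs, files) shape. A small sketch follows; the wrapper name list_one_level and the fallback value are assumptions for illustration.

```python
import os

def list_one_level(path):
    """Return (root, dirs, files) for path only, without descending."""
    # The first item yielded by os.walk describes path itself.
    # The default covers a missing or unreadable path, for which
    # os.walk (with its default onerror=None) yields nothing.
    return next(os.walk(path), (path, [], []))
```

Without the second argument to next(), a path that does not exist would raise StopIteration instead.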
A slightly more parametrised solution would be this:
for root, dirs, files in os.walk(path):
    if not recursive:
        # Emptying dirs in place stops os.walk from descending further.
        while len(dirs) > 0:
            dirs.pop()
    # some fancy code here using generated list
Edit: fixed the if/while issue. Thanks, @Dirk van Oosterbosch :}
Well what Kamiccolo meant was more in line with this:
for str_dirname, lst_subdirs, lst_files in os.walk(str_path):
    if not bol_recursive:
        while len(lst_subdirs) > 0:
            lst_subdirs.pop()
Empty the directories list
for r, dirs, f in os.walk('/tmp/d'):
    del dirs[:]
    print(f)
Flexible Function for counting files:
You can set recursive searching and what types you want to look for. The default argument: file_types=("", ) looks for any file. The argument file_types=(".csv",".txt") would search for csv and txt files.
from os import walk as os_walk

def count_files(path, recurse=True, file_types=("",)):
    file_count = 0
    # Without recursion, wrap the first os.walk result in a one-item tuple.
    iterator = os_walk(path) if recurse else (next(os_walk(path)),)
    for _, _, file_names in iterator:
        for file_name in file_names:
            file_count += 1 if file_name.endswith(file_types) else 0
    return file_count