Count and print number of files in subfolders using Python - python

My folder structure is as follows
Folder A
Folder B1
Folder B2
....
Folder Bn
How can I count the number of files in each of the folders (Folder B1 - Folder Bn), check if the number of files is larger than a given limit and print the folder name and number of files in it on the screen?
Like this:
Folders with too many files:
Folder B3 101
Folder B7 256
Here's what I've tried so far. It goes through every subfolder in each of my Folder B1 etc. I just need file count in one level.
import os, sys, csv
path = '/Folder A/'
outwriter = csv.writer(open("numFiles.csv", 'w'))
dir_count = []
for root, dirs, files in os.walk(path):
    for d in dirs:
        a = str(d)
        count = 0
        for fi in files:
            count += 1
        y = (a, count)
        dir_count.append(y)
for i in dir_count:
    outwriter.writerow(i)
And then I just printed numFiles.csv. Not quite how I'd like to do it.
Thanks in advance!

As they are all contained in that single folder, you only need to search that directory:
import os
path = '/Folder A/'
mn = 20  # limit on the number of entries per folder
# get all directories directly under path whose names start with "B"
folders = [name for name in os.listdir(path)
           if os.path.isdir(os.path.join(path, name)) and name.startswith("B")]
for folder in folders:
    contents = os.listdir(os.path.join(path, folder))  # get list of contents
    if len(contents) > mn:  # if greater than the limit, print folder and number of contents
        print(folder, len(contents))

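os.walk(path) yields a 3-tuple for every directory it visits: (directory, subdirectories, files).
directory -> path of the current directory; subdirectories -> list of subdirectory names in it; files -> list of file names in it.
So you can code it like this:
import os
path = '/Folder A/'  # top-level folder from the question
your_limit = 100     # placeholder, set your own limit
for dir, subdir, files in os.walk(path):
    if len(files) > your_limit:
        # len() returns an int, so convert it before concatenating
        print(dir + " has crossed limit, " + "total files: " + str(len(files)))
        for x in files:
            print(x)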
If you want to walk only one level, you need to code it like this:
for x in os.listdir(path):
    full = os.path.join(path, x)  # os.listdir returns bare names, so join them with path
    if os.path.isdir(full):
        count = len([y for y in os.listdir(full) if os.path.isfile(os.path.join(full, y))])
        if count > your_limit:
            print(x + " has crossed limit:", count)
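The one-level approach above can also be sketched with os.scandir, which reports whether each entry is a file or a directory without a separate os.path call. This is only a minimal sketch: the function names folders_over_limit and print_report are made up for illustration, and it assumes every direct subfolder should be checked, not only those matching a name pattern.

```python
import os

def folders_over_limit(parent, limit):
    """Return sorted (folder_name, file_count) pairs for the direct
    subfolders of `parent` that hold more than `limit` regular files.
    Only one level is inspected; nested subfolders are not counted."""
    results = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue
            with os.scandir(entry.path) as children:
                count = sum(1 for child in children if child.is_file())
            if count > limit:
                results.append((entry.name, count))
    return sorted(results)

def print_report(parent, limit):
    """Print the report in the layout asked for in the question."""
    print("Folders with too many files:")
    for name, count in folders_over_limit(parent, limit):
        print(name, count)
```

Calling print_report('/Folder A/', 100) would print the heading followed by one line per offending folder.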

Related

How to find the size of all files in a directory and all its sub-directories?

I'm trying to print the name and the size of all files in a directory and all its sub-directories, but it only prints the name and size of the files in the first directory but not the sub-directories. Any help will be appreciated.
import os
path = os.getcwd()
walk_method = os.walk(path)
while True:
    try:
        p, sub_dir, files = next(walk_method)
        break
    except:
        break
size_of_file = [
    (f, os.stat(os.path.join(path, f)).st_size)
    for f in files
]
for sub in sub_dir:
    i = os.path.join(path, sub)
    size = 0
    for k in os.listdir(i):
        size += os.stat(os.path.join(i, k)).st_size
    size_of_file.append((sub, size))
for f, s in sorted(size_of_file, key = lambda x: x[1]):
    print("{} : {}MB".format(os.path.join(path, f), round(s/(1024*1024), 3)))
I'm expecting to print the name and file size of all files in the current directory and all the sub-directories.
The documentation has some helpful example code that you might have chosen to follow. A loop forever / next() / break approach could be made to work, I'm sure, but it's not idiomatic and that style does not improve the maintainability of the code.
from pathlib import Path
import os
total = 0
for root, dirs, files in os.walk("."):
    for file in files:
        path = Path(root) / file
        print(path)
        total += path.stat().st_size
print(f"Total of {total} bytes.")
I think pathlib works very well here. There are many ways of solving this, but one simple example is something like this:
from pathlib import Path
dir = "."
paths = list(Path(dir).glob('**/*'))
for path in paths:
    if path.is_file():
        print(f"{path.name}, {path.lstat().st_size}")
You don't need the loop but for simplicity in this example I just used it.
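To illustrate the remark about not needing the loop, here is a minimal sketch that collects the sizes in a dict comprehension using Path.rglob. The helper names file_sizes and print_sizes are hypothetical, and the MB formatting simply mirrors the question's output.

```python
from pathlib import Path

def file_sizes(top):
    """Map every regular file under `top` (searched recursively)
    to its size in bytes."""
    return {p: p.stat().st_size for p in Path(top).rglob("*") if p.is_file()}

def print_sizes(top):
    """Print each file with its size in MB, smallest first, then the total."""
    sizes = file_sizes(top)
    for path, size in sorted(sizes.items(), key=lambda item: item[1]):
        print(f"{path} : {round(size / (1024 * 1024), 3)}MB")
    print(f"Total of {sum(sizes.values())} bytes.")
```

print_sizes(".") would list every file below the current directory, including those in sub-directories.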

Rename specific files in a folder

I am trying to write a program that renames files whose numbering has gaps. For example:
input: a folder contains these files: ['spam001.txt', 'spam002.txt', 'spam003.txt', 'spam005.txt', 'spam007.txt']
output: I want to end up with the same files named like this: ['spam001.txt', 'spam002.txt', 'spam003.txt', 'spam004.txt', 'spam005.txt']
Here is what I have done so far:
import os,shutil
count = 1
path = '.\\filling in the gaps' #this is the folder where i have the .txt files
files = os.listdir(path)
for i in files:
    if(i[6] != str(count)):
        os.rename(os.path.join(path,i),'spam00%s.txt' %(count))
    count = count+1
My problem is that when I run the program, for the output mentioned above , I get ['spam001.txt', 'spam002.txt', 'spam003.txt'] in the 'filling in the gaps' folder and I get
['spam004.txt', 'spam005.txt'] in the folder that contains 'filling in the gaps' folder.
Basically, my program renames the files, but the files end up in another folder. Any idea why this is happening?
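The second argument to os.rename was a bare file name, so it was resolved relative to the current working directory instead of the 'filling in the gaps' folder. Here is a working version of your code, providing your desired results: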
import os
count = 1
path = '.\\filling in the gaps'  # this is the folder where i have the .txt files
files = os.listdir(path)
for i in files:
    if (i[6] != str(count)):
        os.rename(os.path.join(path, i),
                  os.path.join(path, 'spam00%s.txt' % (count))
                  )
    count = count + 1
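A slightly more defensive variant is sketched below. It sorts the listing first (os.listdir returns names in arbitrary order) and zero-pads the counter, so it keeps working past nine files. The function name fill_gaps and its parameters are assumptions made for this sketch, and it assumes all names share the same prefix and number width, so that alphabetical order matches numeric order.

```python
import os

def fill_gaps(folder, prefix="spam", ext=".txt", width=3):
    """Renumber prefix<NNN>ext files in `folder` so the numbers run
    1, 2, 3, ... without gaps. Returns the (old, new) pairs renamed."""
    names = sorted(n for n in os.listdir(folder)
                   if n.startswith(prefix) and n.endswith(ext))
    renamed = []
    for number, old in enumerate(names, start=1):
        new = f"{prefix}{number:0{width}d}{ext}"
        if new != old:
            # Join both source and destination with the folder so the
            # file stays where it is.
            os.rename(os.path.join(folder, old), os.path.join(folder, new))
            renamed.append((old, new))
    return renamed
```

Because the files are processed in ascending order and numbers only ever move down, a new name is always free by the time it is needed.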

Rename files and then store the names in a CSV using Python

I want to rename files in subfolders which I can do with this following code.
import os
# This is to rename files
def main():
    # raw string, so the backslashes are not treated as escape sequences
    path = r"D:\Proyekan\Data yang udah di extract"
    count = 1
    for root, dirs, files in os.walk(path):
        for i in files:
            fullpath = os.path.join(root, i)
            os.rename(fullpath, os.path.join(root, "Flight_" + str(1000000+count) + ".mat"))
            count += 1
if __name__ == '__main__':
    main()
The code works for subfolders so:
Folder 1 the file names are Flight_1-Flight_25 and
Folder 2 the file names are Flight_26-Flight_27.
The problem is that I need to record each old name alongside its new name and folder, so I can track the files. For example:
123123 to Flight 1 in Folder 1,
12313 to Flight 2 in Folder 2,
and so on. Any ideas how? I got lost trying to figure it out.
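One possible way to keep such a record, offered as a hedged sketch: plan all renames first, then perform them while writing one CSV row per file. The function name rename_and_log, the log_path parameter and the column names are assumptions made for this example. It also assumes the log file lives outside the folder being renamed and that no file is already called Flight_<n>.mat.

```python
import csv
import os

def rename_and_log(top, log_path, start=1000001, ext=".mat"):
    """Rename every file under `top` to Flight_<n><ext> and write
    (folder, old name, new name) rows to a CSV file at `log_path`."""
    # Plan first and rename afterwards, so the walk never sees a file
    # that has already been renamed.
    planned = []
    number = start
    for root, dirs, files in os.walk(top):
        dirs.sort()  # visit subfolders in a predictable order
        for old in sorted(files):
            planned.append((root, old, f"Flight_{number}{ext}"))
            number += 1
    with open(log_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["folder", "old_name", "new_name"])
        for root, old, new in planned:
            os.rename(os.path.join(root, old), os.path.join(root, new))
            writer.writerow([root, old, new])
    return planned
```

The resulting CSV can be opened in a spreadsheet to look up which original file became which Flight number.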

Rename multiple files inside multiple folders

So I have a lot of folders with a certain name. In each folder I have 200+ items. The items inside the folders have names like:
CT.34562346.246.dcm
RD.34562346.dcm
RN.34562346.LAO.dcm
And some along that style.
I now wish to rename all files inside all folders so that the number (34562346) is replaced with the name of the folder. So for example in the folder named "1" the files inside should become:
CT.1.246.dcm
RD.1.dcm
RN.1.LAO.dcm
So only the large number is replaced. And yes, all files are similar like this. It would be the number after the first . that should be renamed.
So far I have:
import os
base_dir = "foo/bar/" #In this dir I have all my folders
dir_list = []
for dirname in os.walk(base_dir):
    dir_list.append(dirname[0])
This one just lists the entire paths of all folders.
dir_list_split = []
for name in dir_list[1:]: #The 1 is because it lists the base_dir as well
    x = name.split('/')[2]
    dir_list_split.append(x)
This one extracts the name of each folder.
The next step would be to enter each folder and rename the files in it, and this is where I'm stuck.
The pathlib module, which was new in Python 3.4, is often overlooked. I find that it often makes code simpler than it would otherwise be with os.walk.
In this case, .glob('**/*.*') looks recursively through all of the folders and subfolders that I created in a sample folder called example. The *.* part means that it considers all files.
I put path.parts in the loop to show you that pathlib arranges to parse pathnames for you.
I check that the string constant '34562346' is in its correct position in each filename first. If it is, then I simply replace it with the item from .parts that is the next level of folder 'up' the folder tree.
Then I can replace the rightmost element of .parts with the newly altered filename to create the new pathname and then do the rename. In each case I display the new pathname, if it was appropriate to create one.
>>> from pathlib import Path
>>> from os import rename
>>> for path in Path('example').glob('**/*.*'):
...     path.parts
...     if path.parts[-1][3:11]=='34562346':
...         new_name = path.parts[-1].replace('34562346', path.parts[-2])
...         new_path = '/'.join(list(path.parts[:-1])+[new_name])
...         new_path
...         ## rename(str(path), new_path)
...     else:
...         'no change'
...
('example', 'folder_1', 'id.34562346.6.a.txt')
'example/folder_1/id.folder_1.6.a.txt'
('example', 'folder_1', 'id.34562346.wax.txt')
'example/folder_1/id.folder_1.wax.txt'
('example', 'folder_2', 'subfolder_1', 'ty.34562346.90.py')
'example/folder_2/subfolder_1/ty.subfolder_1.90.py'
('example', 'folder_2', 'subfolder_1', 'tz.34562346.98.py')
'example/folder_2/subfolder_1/tz.subfolder_1.98.py'
('example', 'folder_2', 'subfolder_2', 'doc.34.34562346.implication.rtf')
'no change'
This will rename files in subdirectories too:
import os
rootdir = "foo" + os.sep + "bar"
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        filepath = subdir + os.sep + file
        foldername = subdir.split(os.sep)[-1]
        number = ""
        foundnumber = False
        # scan only the file name, so digits in the folder names are not picked up
        for c in file:
            if c.isdigit():
                foundnumber = True
                number = number + c
            elif foundnumber:
                break
        if foundnumber:
            # replace only the first run of digits in the file name
            newfilepath = subdir + os.sep + file.replace(number, foldername, 1)
            os.rename(filepath, newfilepath)
Split each file name on the . and replace the second item with the folder name, then join on .'s again for the new file name. Here's some sample code that demonstrates the concept.
folder_name = ['1', '2']
file_names = ['CT.2345.234.dcm', 'BG.234234.222.dcm', "RA.3342.221.dcm"]
for folder in folder_name:
    new_names = []
    for x in file_names:
        file_name = x.split('.')
        file_name[1] = folder
        back_together = '.'.join(file_name)
        new_names.append(back_together)
    print(new_names)
Output
['CT.1.234.dcm', 'BG.1.222.dcm', 'RA.1.221.dcm']
['CT.2.234.dcm', 'BG.2.222.dcm', 'RA.2.221.dcm']
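The split-and-join idea can be applied to files on disk roughly as follows. This is a minimal sketch using pathlib. The function name rename_with_folder_name and the dry_run flag are made up for illustration, and it assumes the files sit directly inside the subfolders of the base directory.

```python
from pathlib import Path

def rename_with_folder_name(base_dir, dry_run=True):
    """For each file in a direct subfolder of `base_dir`, replace the
    second dot-separated field of its name with the subfolder's name.
    Returns the (old_path, new_path) pairs; renames only if dry_run is False."""
    changes = []
    for path in sorted(Path(base_dir).glob("*/*")):
        if not path.is_file():
            continue
        fields = path.name.split(".")
        if len(fields) < 3 or fields[1] == path.parent.name:
            continue  # no middle field to replace, or already renamed
        fields[1] = path.parent.name
        target = path.with_name(".".join(fields))
        changes.append((path, target))
        if not dry_run:
            path.rename(target)
    return changes
```

Running it once with the default dry_run=True and inspecting the returned pairs is a cheap way to check the result before renaming anything.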

Traverse through directories in python

I am trying to traverse directories and count as I go along, so that at the end the program output will be like:
output:
./file last accessed 1/1/2000 # just sample date
./dirA has 1 file and 1 sub-dir
./dirA/test/ has 5 files
Here is the code, but I'm out of ideas now:
#!/usr/bin/python
import os, os.path, time
startDir = os.getcwd()
directories = [startDir]  # must come after startDir is defined
fileCount = 0
directoryCount = 0
while len(directories) > 0:
    directory = directories.pop()
    for name in os.listdir(directory):
        fullpath = os.path.join(directory, name)
        lastAccess = os.stat(fullpath).st_atime
        accessTime = time.asctime(time.gmtime(lastAccess))
        if os.path.isfile(fullpath):
            print(fullpath + " is file" + " " + accessTime)
            fileCount += 1
        elif os.path.isdir(fullpath):
            directories.append(fullpath)
            directoryCount += 1
            print(fullpath + " " + accessTime)
print(fileCount, directoryCount)  # only test printing for now
So that is where I am right now. In case I wasn't clear: I want to list the files in the current directory (and its sub-directories) along with the time they were last accessed. I also want to list each directory with the number of files and sub-directories in it.
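A hint using os.walk:
for x, y, z in os.walk('your_path'):
    for file in z:
        fullpath = os.path.join(x, file)  # x is the directory currently being visited
        print(fullpath + ": " + time.asctime(time.gmtime(os.stat(fullpath).st_atime)))
os.walk yields a 3-tuple (dir, subdirs, files) for each directory it visits.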
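Building on that hint, the per-directory counts asked for in the question could be produced along these lines. It is only a sketch: the function name summarize and the exact wording of the output lines are assumptions, not something taken from the original answer.

```python
import os
import time

def summarize(start_dir):
    """Return one line per directory with its file and sub-directory
    counts, followed by one line per file with its last access time."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(start_dir):
        lines.append(f"{dirpath} has {len(filenames)} file(s) "
                     f"and {len(dirnames)} sub-dir(s)")
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            accessed = time.asctime(time.gmtime(os.stat(full_path).st_atime))
            lines.append(f"{full_path} last accessed {accessed}")
    return lines
```

Printing the result with print("\n".join(summarize(os.getcwd()))) gives a report for the current directory and everything below it.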
