Make a list of root with os.walk - python

I am trying to set up a specific folder/file structure, which I will then copy into my test setup. I want a list of unique folders that I can then create.
How do I get root into a list?
If I do the following:
for root, dirs, filenames in os.walk(path):
    print(root)
I get:
/Users/Me/Folder
/Users/Me/Folder/SubFolder
But when I want to use it in a for loop, it gets messed up:
for root, dirs, filenames in os.walk(path):
    for x in root:
        print(x)
and I get this result:
/
U
s
e
r
s
/
M
e
/
F
o
l
.
.
. and so on

To get a variable you are iterating over into a list, simply append it to a list:
folders = []
for root, dirs, filenames in os.walk(path):
    folders.append(root)
To create the folders in that list, you can simply use os.mkdir(path), which raises an error if the folder already exists:
for path in folders:
    os.mkdir(path)
If you want an additional print statement to see which folders you created, use:
for path in folders:
    os.mkdir(path)
    print("created:{}".format(path))
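If the goal is to recreate the same folder layout in a different location, a minimal sketch could look like the following. The function name copy_tree_structure and its parameters are hypothetical, and the sketch assumes Python 3.2 or later for the exist_ok argument of os.makedirs.

```python
import os

def copy_tree_structure(src_path, dst_path):
    """Recreate the folder layout of src_path under dst_path (no files are copied)."""
    created = []
    for root, dirs, filenames in os.walk(src_path):
        # Path of the current folder relative to the top of the walk
        # ('.' for the top folder itself)
        rel = os.path.relpath(root, src_path)
        target = os.path.normpath(os.path.join(dst_path, rel))
        # makedirs also creates missing parents; exist_ok avoids an error on reruns
        os.makedirs(target, exist_ok=True)
        created.append(target)
    return created
```

The returned list holds one entry per folder, so it can be printed or reused in a later loop.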

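Finally I found the answer. root is a single string, so looping over it yields one character at a time; root.splitlines() wraps a path without line breaks in a one-element list, so the loop yields the whole path:
for root, dirs, filenames in os.walk(src_path):
    for x in root.splitlines():
        print(x)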

Related

How to get a list of list of subfolder files with full path?

I would like to get the same list structure that I am getting with this approach, but with full paths. Instead I get one flat list, which I would have to break down manually, and that defeats the purpose of automating the task.
For example, I have a folder called test with 4 subfolders called A, B, C and D, and inside each subfolder there are the files file1, file2 and file3.
import os
import openpyxl
#Create a regex that matches files with the american date format
path = r'C:\Users\Dell_G7_15\Desktop\test'
pathWalk = os.walk(path)
fileIndex = os.listdir(path)
wb = openpyxl.Workbook()
i=0
filenames = []
filesPathLink=[]
for foldernames in pathWalk:
    filenames.append(foldernames[-1])  # creating list of filenames
    i = i + 1
filenames.pop(0)  # delete first value of the list that causes error
print(filenames)
When I print filenames I get:
[['file1', 'file2', 'file3'],['file1', 'file2', 'file3'],['file1', 'file2', 'file3']]
I am looking for the same list structure, but with the full path of each file, so it would look like this:
[['../A/file1', '../A/file2', '../A/file3'],[....],[....]]
Is this what you are looking for?
For the following folder and sub folders -
# root/
#   -img0.jpg
#   sub1/
#     -img1.jpg
#     -img1 copy.jpg
#   sub2/
#     -img2.jpg
#     subsub1/
#       -img3.jpg
path = '/Users/name/Desktop/root'
[[r+'/'+fname for fname in f] for r,d,f in os.walk(path)]
[['/Users/name/Desktop/root/img0.jpg'],
['/Users/name/Desktop/root/sub1/img1 copy.jpg',
'/Users/name/Desktop/root/sub1/img1.jpg'],
['/Users/name/Desktop/root/sub2/img2.jpg'],
['/Users/name/Desktop/root/sub2/subsub1/img3.jpg']]
For completeness' sake, if anyone is looking for a flat list of all files with paths inside a multi-level folder structure, try this:
[r+'/'+fname for r,d,f in os.walk(path) for fname in f]
['/Users/name/Desktop/root/img0.jpg',
'/Users/name/Desktop/root/sub1/img1 copy.jpg',
'/Users/name/Desktop/root/sub1/img1.jpg',
'/Users/name/Desktop/root/sub2/img2.jpg',
'/Users/name/Desktop/root/sub2/subsub1/img3.jpg']
EDIT: Simple loop without a list comprehension
filepaths = []
for r,d,f in os.walk(path):
    l = []
    for fname in f:
        l.append(r+'/'+fname)
    filepaths.append(l)
print(filepaths)
[['/Users/name/Desktop/root/img0.jpg'],
['/Users/name/Desktop/root/sub1/img1 copy.jpg',
'/Users/name/Desktop/root/sub1/img1.jpg'],
['/Users/name/Desktop/root/sub2/img2.jpg'],
['/Users/name/Desktop/root/sub2/subsub1/img3.jpg']]
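A variant of the same idea, sketched with os.path.join so that the path separator matches the operating system (useful for the Windows path in the question). The function name files_by_folder is hypothetical, and skipping folders that contain no files is an assumption about what the asker wants, since it makes the pop(0) workaround unnecessary.

```python
import os

def files_by_folder(path):
    """Return one list of full file paths per folder that contains files."""
    return [
        [os.path.join(root, name) for name in sorted(files)]
        for root, _dirs, files in os.walk(path)
        if files  # skip folders without files, such as a top folder holding only subfolders
    ]
```

Each inner list corresponds to one folder; the order of the folders themselves follows os.walk and is not guaranteed to be alphabetical.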

Print statement not responding in my filemanagement system

I have 2 folders: Source and Destination. Each of those folders has 3 subfolders named A, B and C. The 3 subfolders in Source all contain multiple files. The 3 subfolders in Destination are still empty.
I need the full paths of all the files, because my goal is to overwrite the files in Destination A, B and C with those from Source A, B and C.
Why are my two print statements not printing anything? I get no errors.
import os
src = r'c:\data\AM\Desktop\Source'
dst = r'c:\data\AM\Desktop\Destination'
os.chdir(src)
for root, subdirs, files in os.walk(src):
    for f in subdirs:
        subdir_paths = os.path.join(src, f)
        subdir_paths1 = os.path.join(dst, f)
        for a in files:
            file_paths = os.path.join(subdir_paths, a)
            file_paths1 = os.path.join(subdir_paths1, a)
            print(file_paths)
            print(file_paths1)
Problem
As jasonharper said in a comment,
You are misunderstanding how os.walk() works. The files returned in files are in the root directory; you are acting as though they existed in each of the subdirs directories, which are actually in root themselves.
The reason nothing is printed is that, on the first iteration, files is empty, so for a in files is not entered. Then on the following iterations (where root is A, B and C respectively), subdirs is empty, so for f in subdirs is not entered.
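To see this behaviour directly, here is a small sketch that builds a throwaway tree shaped like Source and records what os.walk yields at each step. The helper name describe_walk and the file name file.txt are made up for illustration.

```python
import os
import tempfile

def describe_walk(top):
    """Return (relative root, sorted subdirs, sorted files) for each os.walk step."""
    steps = []
    for root, subdirs, files in os.walk(top):
        steps.append((os.path.relpath(root, top), sorted(subdirs), sorted(files)))
    return steps

# Build a temporary tree: three subfolders, one file in each
top = tempfile.mkdtemp()
for name in ("A", "B", "C"):
    os.mkdir(os.path.join(top, name))
    open(os.path.join(top, name, "file.txt"), "w").close()

for step in describe_walk(top):
    print(step)
```

The first tuple has three subfolders and no files, and every later tuple has one file and no subfolders, so the two nested loops in the question never run at the same time.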
Solution
In fact you can ignore subdirs entirely. Instead walk the current dir, and join src/dst + root + a:
import os
src = r'c:\data\AM\Desktop\Source'
dst = r'c:\data\AM\Desktop\Destination'
os.chdir(src)
for root, subdirs, files in os.walk('.'):
    src_dir = os.path.join(src, root)
    dst_dir = os.path.join(dst, root)
    for a in files:
        src_file = os.path.join(src_dir, a)
        dst_file = os.path.join(dst_dir, a)
        print(src_file)
        print(dst_file)
The output will have an extra dot directory between src/dst and root, because os.walk('.') reports every root relative to '.'. If anyone could tell me how to get rid of it, I'm all ears.
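One possible way to avoid the extra dot, sketched under the assumption that walking src directly is acceptable: compute each root relative to src with os.path.relpath and clean the joined path with os.path.normpath. The generator name paired_paths is hypothetical, and no os.chdir call is needed with this approach.

```python
import os

def paired_paths(src, dst):
    """Yield (source file, destination file) pairs for every file under src."""
    for root, _subdirs, files in os.walk(src):
        # Relative location of the current folder; '.' for src itself
        rel = os.path.relpath(root, src)
        # normpath collapses the '.' component, so none appears in the result
        dst_dir = os.path.normpath(os.path.join(dst, rel))
        for name in files:
            yield os.path.join(root, name), os.path.join(dst_dir, name)
```

Each pair could then be printed, or handed to a copy function such as shutil.copy2.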

Scanning for file paths with glob

I am searching for all .csv's located in a subfolder with glob like so:
def scan_for_files(path):
    file_list = []
    for path, dirs, files in os.walk(path):
        for d in dirs:
            for f in glob.iglob(os.path.join(path, d, '*.csv')):
                file_list.append(f)
    return file_list
If I call:
path = r'/data/realtimedata/trades/bitfinex/'
scan_for_files(path)
I get the correct recursive list of files:
['/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_12.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_13.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_15.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_11.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_09.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_10.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_08.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_14.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_14.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_12.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_10.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_08.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_09.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_15.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_11.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_13.csv']
However when using the actual sub-directory containing the files I want - it returns an empty list. Any idea why this is happening? Thanks.
path = r'/data/realtimedata/trades/bitfinex/btcusd/'
scan_for_files(path)
returns: []
Looks like btcusd is a bottom-level directory. That means that when you call os.walk with the r'/data/realtimedata/trades/bitfinex/btcusd/' path, the dirs variable will be an empty list [], so the inner loop for d in dirs: does not execute at all.
My advice would be to re-write your function to iterate over the files directly, and not the directories... don't worry, you'll get there eventually, that's the nature of a directory tree.
def scan_for_files(path):
    file_list = []
    for root, _, files in os.walk(path):
        for f in files:
            # files holds plain file names, so filter by extension and join with root
            if f.endswith('.csv'):
                file_list.append(os.path.join(root, f))
    return file_list
However, on more recent versions of python (3.5+), you can use recursive glob:
def scan_for_files(path):
    return glob.glob(os.path.join(path, '**', '*.csv'), recursive=True)
Source.
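As an alternative sketch, pathlib offers Path.rglob, which searches a folder and all of its subfolders in a single call. The pattern parameter and the sorting of the result are additions for illustration and are not part of the original question.

```python
from pathlib import Path

def scan_for_files(path, pattern="*.csv"):
    """Return matching files in path and all of its subfolders, as sorted strings."""
    return sorted(str(p) for p in Path(path).rglob(pattern) if p.is_file())
```

Because rglob also looks in the starting folder itself, this version returns the same files whether it is called on the bitfinex folder or directly on btcusd.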

Print second folder name

I have a Python code that sorts the folders inside a folder.
However, I want to print the name of the second folder and not all of them.
Any suggestions?
for root,dirs,files in os.walk("C:\\Folder testing"):
    for dirname in sorted(dirs, key=int, reverse=True):
        print(dirs)
I wouldn't use os.walk for printing just one folder.
I'd rather make a list of all the folders, and then select the one I want:
some_path = "C:\\Folder testing"
dirs = [f for f in os.listdir(some_path) if os.path.isdir(os.path.join(some_path, f))]
dirs_sorted = sorted(dirs, key=int, reverse=True)
try:
    print(dirs_sorted[1])
except IndexError:
    print("Folder doesn't exist")
Beware that your sorting method requires that the folder names are numbers only.
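If some folder names might not be numeric, a sketch like the following skips them instead of failing in int(). The function name nth_numeric_folder is hypothetical, and it assumes Python 3.5 or later for os.scandir.

```python
import os

def nth_numeric_folder(path, n=1):
    """Return the n-th (0-based) folder name when numeric names are sorted high to low."""
    names = [
        entry.name
        for entry in os.scandir(path)
        # keep only folders whose names consist of decimal digits
        if entry.is_dir() and entry.name.isdecimal()
    ]
    ordered = sorted(names, key=int, reverse=True)
    # None signals that there are fewer than n + 1 numeric folders
    return ordered[n] if n < len(ordered) else None
```

Calling nth_numeric_folder("C:\\Folder testing") would then give the second folder in descending numeric order, or None if fewer than two numeric folders exist.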

count number of folders with given name

I am looking to get a count of folders and subfolders with a given name. Here I am searching for the number of subfolders named "L-4". It returns zero, and I am sure that's not true. What did I miss?
import os
path = "R:\\"
i = 0
for (path, dirs, files) in os.walk(path):
    if os.path.dirname == "L-4":
        i += 1
print(i)
os.path.dirname is a reference to the standard library function, not a string, so comparing it to "L-4" is always false. Perhaps you meant to call a function on path here, such as os.path.basename(path), which returns the name of the folder being visited; os.path.dirname(path) would return the path of its parent.
You could instead count how many times L-4 appears in the dirs list:
i = 0
for root, dirs, files in os.walk(path):
    i += dirs.count('L-4')
print(i)
or, as a one-liner:
print(sum(dirs.count('L-4') for _, dirs, _ in os.walk(path)))
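If the locations of the matching folders are also of interest, a small sketch along the same lines can collect their full paths and take the length of the result as the count. The function name find_folders_named is hypothetical.

```python
import os

def find_folders_named(top, name):
    """Return the full paths of all folders called `name` anywhere below top."""
    matches = []
    for root, dirs, _files in os.walk(top):
        # dirs holds the names of the folders directly inside root
        if name in dirs:
            matches.append(os.path.join(root, name))
    return matches
```

The count is then len(find_folders_named(path, "L-4")). Only exact name matches are included, so a folder such as L-40 is not counted.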
