I am trying to write a simple function that returns a list of files in a directory and its subdirectories. I shamelessly stole the majority of this function from another SO poster. I am using Python 2.6.4.
def getFiles(Asite):
    # returns a list of config files
    from os import listdir
    from os.path import isfile, join
    mypath = '/etc/config/' + Asite
    print mypath
    files = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    return files
The function simply returns an empty list, []. It appears that the mypath variable is not being interpolated by the listdir() and isfile() functions. Before anyone asks, yes, I have verified that there are in fact files located at mypath. Why is my files array empty?
Thanks all for your helpful comments. It turns out that the directory I was searching in only had subdirectories in it, but no files. So os.listdir() didn't work for searching subdirectories, as it only goes one level deep (thank you abhishekgarg). I ended up using the code below, which worked great (I also found it on SO and modified it a bit).
import os

def getFiles(Asite):
    # returns a list of config file paths, relative to mypath
    files = []
    mypath = '/share/profile/base/sol-10-sparc-base/config/' + Asite
    for dirname, dirnames, filenames in os.walk(mypath):
        for filename in filenames:
            pfile = os.path.join(dirname, filename)
            files.append(pfile[len(mypath):])
    return files
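Just to illustrate the call (the site name 'siteA' here is hypothetical, not from the original setup):

# 'siteA' is a hypothetical site name, for illustration only
for pfile in getFiles('siteA'):
    print pfile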
I am trying to use the Python os library to loop through all the subdirectories in my root directory, target files with a specific name, and rename them.
Just to make it clear, this is my tree structure:
My python file is located at the root level.
What I am trying to do is target the directory 942ba, loop through all the subdirectories, locate the file 000000, and rename it to 000000.csv.
The current code I have is as follows:
import os

root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
    for f in files:
        print(dirs)
        if f == '000000':
            dirs = dirs.strip(root)
            f_new = f + '.csv'
            os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working: when I run my code, for some reason it strips the date from the subdirs.
Can anyone help me understand how to solve this issue?
A more efficient way to iterate through the folders and only select the files you are looking for is below:
import os

source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
files = [os.path.normpath(os.path.join(root, f))
         for root, dirs, files in os.walk(source_folder)
         for f in files
         if '000000' in f and not f.endswith('.gz')]
for file in files:
    os.rename(file, f"{file}.csv")
The list comprehension stores the full path to the files you are looking for. You can change the condition inside the comprehension to anything you need. I use this code snippet a lot to find just images of a certain type, or to remove unwanted files from the selected files.
In the for loop, the files are renamed by adding the .csv extension.
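As a side note, the reason the code in the question mangled the subdirectory names is that str.strip(root) does not remove the root prefix; it removes any characters that appear anywhere in root from both ends of the string, which is how dates in the directory names got eaten. A quick demonstration of that pitfall:

# str.strip() takes a *set of characters* to remove from both ends,
# not a prefix string, so digits from the root pattern eat into dates:
print('20211004_reports'.strip('0123456789-/'))  # prints '_reports'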
I would use glob to find the files.
import os, glob

zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
files = glob.glob('*{}/000000'.format(zdir))
for fly in files:
    os.rename(fly, '{}.csv'.format(fly))
I am learning Python and just tried out the os.walk() function. I am using Python 3.4.4 (64-bit) on the Windows platform. According to my understanding, Python should execute my statements line by line.
In this code I have iterated over a directory. The directory structure is:
I need to print all the files first, followed by the directory names. The code written is:
import os

dir_path = r"D:\python_os_walk_check"
for root, dirs, files in os.walk(dir_path):
    for file_name in files:
        print(file_name)
    for dir_name in dirs:
        print(dir_name)
The output printed is:
first_folder
second_folder
test1.txt
test2.txt
I expected the output to be:
test1.txt
test2.txt
first_folder
second_folder
Where am I going wrong?
You first get the contents of dir_path, which is only the two directories. The files are inside the directories, so you get to them later, in a second and third iteration of your loop. Add a print(root) as the first thing inside the loop and you'll see more clearly what's happening.
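For illustration, here is a minimal sketch of the same loop with the root printed first (assuming the dir_path from the question):

import os

dir_path = r"D:\python_os_walk_check"
for root, dirs, files in os.walk(dir_path):
    print(root)  # shows which directory this iteration is visiting
    for file_name in files:
        print(file_name)
    for dir_name in dirs:
        print(dir_name)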
You could store the names of files/directories and print them once your for loop completes.
import os

dir_path = "D:\\python_os_walk_check"
file_list = []
dir_list = []
for root, dirs, files in os.walk(dir_path):
    for file_name in files:
        file_list.append(file_name)
    for dir_name in dirs:
        dir_list.append(dir_name)

for x in file_list:
    print(x)
for i in dir_list:
    print(i)
You might get something a bit more like what you seem to want if you use the topdown parameter of os.walk, like so:
import os

dir_path = r"C:\Python34\Tools\pynche"
for root, dirs, files in os.walk(dir_path, topdown=False):
    for file_name in files:
        print(file_name)
    for dir_name in dirs:
        print(dir_name)
Unfortunately I think this is only good for two-level folders.
As a more up-to-date solution, you could consider the abilities offered by the pathlib module (section 11.1 of the Python 3 docs).
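For instance, a minimal pathlib sketch that prints all files before all directories (assuming the same dir_path as above):

from pathlib import Path

dir_path = Path(r"D:\python_os_walk_check")
entries = list(dir_path.rglob("*"))  # every file and directory in the tree
for entry in entries:
    if entry.is_file():
        print(entry.name)
for entry in entries:
    if entry.is_dir():
        print(entry.name)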
I know how to use python to check to see if a file exists, but what I am after is trying to see if multiple files of the same name exist throughout my working directory. Take for instance:
gamedata/areas/
# i have 2 folders in this directory
# testarea and homeplace
1. gamedata/areas/testarea/
2. gamedata/areas/homeplace/
Each folder of homeplace and testarea for instance contains a file called 'example'
Is there a pythonic way to use 'os' or similar to check to see if the file 'example' can be found in both testarea and homeplace?
Also, is there a way to do this without manually and statically using
os.path.isfile()
because throughout the life of the program new directories will be made, and I don't want to constantly go back into the code to change it.
You can check in every directory below gamedata/areas/:
This only goes down one level, you could extend it to go down as many levels as you want.
from os import listdir
from os.path import isdir, isfile, join

base_path = "gamedata/areas/"
files = listdir(base_path)
only_directories = [path for path in files if isdir(join(base_path, path))]
for directory_path in only_directories:
    dir_path = join(base_path, directory_path)
    for file_path in listdir(dir_path):
        # dir_path already includes base_path, so join only once
        full_file_path = join(dir_path, file_path)
        is_file = isfile(full_file_path)
        is_example = "example" in file_path
        if is_file and is_example:
            print "Found One!!"
Hope it helps!
Maybe something like
import os

places = ["testarea", "homeplace"]
if not all(os.path.isfile(os.path.join("gamedata/areas/", x, "example")) for x in places):
    print("Missing example")

Even when this reports a missing file, it doesn't tell you which subdirectory lacks example, though. You can update places as necessary.
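If you do want to know which subdirectories are missing the file, a small variation (same paths as above) collects them instead:

import os

places = ["testarea", "homeplace"]
missing = [x for x in places
           if not os.path.isfile(os.path.join("gamedata/areas/", x, "example"))]
for x in missing:
    print("Missing example in", x)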
As I mentioned in the comments, os.walk is your friend:
import os

ROOT = "gamedata/areas"
in_dirs = [path for (path, dirs, filenames)
           in os.walk(ROOT)
           if 'example' in filenames]
in_dirs will be a list of the subdirectories where example is found.
I need to iterate through the subdirectories of a given directory and search for files. If I find a file, I have to open it, change the content, and replace it with my own lines.
I tried this:
import os

rootdir = 'C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f = open(file, 'r')
        lines = f.readlines()
        f.close()
        f = open(file, 'w')
        for line in lines:
            newline = "No you are not"
            f.write(newline)
        f.close()
but I am getting an error. What am I doing wrong?
The actual walk through the directories works as you have coded it. If you replace the contents of the inner loop with a simple print statement you can see that each file is found:
import os

rootdir = 'C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print(os.path.join(subdir, file))
If you still get errors when running the above, please provide the error message.
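One likely culprit, though this is an assumption since no error message was posted: open(file, 'r') receives the bare filename, which is resolved against the current working directory rather than the directory being walked, so it fails for files in subdirectories. Joining the filename onto subdir avoids that; a minimal sketch:

import os

rootdir = 'C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        filepath = os.path.join(subdir, file)  # full path, not just the bare name
        with open(filepath, 'w') as f:
            f.write("No you are not")  # overwrite the contents with your own line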
Another way of returning all files in subdirectories is to use the pathlib module, introduced in Python 3.4, which provides an object-oriented approach to handling filesystem paths (pathlib is also available on Python 2.7 via the pathlib2 module on PyPI):
from pathlib import Path

rootdir = Path('C:/Users/sid/Desktop/test')
# Return a list of regular files only, not directories
file_list = [f for f in rootdir.glob('**/*') if f.is_file()]
# For absolute paths instead of paths relative to the current dir
file_list = [f for f in rootdir.resolve().glob('**/*') if f.is_file()]
Since Python 3.5, the glob module also supports recursive file finding:
import os
from glob import iglob

rootdir_glob = 'C:/Users/sid/Desktop/test/**/*'  # Note the added asterisks
# This will return absolute paths
file_list = [f for f in iglob(rootdir_glob, recursive=True) if os.path.isfile(f)]
The file_list from either of the above approaches can be iterated over without the need for a nested loop:
for f in file_list:
    print(f)  # Replace with desired operations
From Python 3.5 onward you can use **, as in glob.iglob(path + '/**', recursive=True), and it seems the most Pythonic solution, i.e.:
import glob, os

for filename in glob.iglob('/pardadox-music/**', recursive=True):
    if os.path.isfile(filename):  # filter dirs
        print(filename)
Output:
/pardadox-music/modules/her1.mod
/pardadox-music/modules/her2.mod
...
Notes:
From the docs for glob.iglob(pathname, recursive=False):
Return an iterator which yields the same values as glob() without actually storing them all simultaneously. If recursive is true, the pattern '**' will match any files and zero or more directories and subdirectories.
If the directory contains files starting with . they won’t be matched by default. For example, consider a directory containing card.gif and .card.gif:

>>> import glob
>>> glob.glob('*.gif')
['card.gif']
>>> glob.glob('.c*')
['.card.gif']
You can also use rglob(pattern),
which is the same as calling glob() with **/ added in front of the given relative pattern.
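For example, this rglob call (a minimal sketch reusing the directory from above) is equivalent to glob('**/*.mod'):

from pathlib import Path

# Path.rglob('*.mod') is the same as Path.glob('**/*.mod')
for path in Path('/pardadox-music').rglob('*.mod'):
    print(path)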
What is the simplest way to get the full recursive list of files inside a folder with python? I know about os.walk(), but it seems overkill for just getting the unfiltered list of all files. Is it really the only option?
There's nothing preventing you from creating your own function:
import os

def listfiles(folder):
    for root, folders, files in os.walk(folder):
        for filename in folders + files:
            yield os.path.join(root, filename)
You can use it like so:
for filename in listfiles('/etc/'):
    print filename
os.walk() is not overkill by any means. It can generate your list of files and directories in a jiffy:
import os

files = [os.path.join(dirpath, filename)
         for (dirpath, dirs, files) in os.walk('.')
         for filename in (dirs + files)]
You can turn this into a generator, to process only one path at a time and save on memory.
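A minimal sketch of that generator variant, with the same logic as the list comprehension above:

import os

paths = (os.path.join(dirpath, filename)
         for (dirpath, dirs, files) in os.walk('.')
         for filename in (dirs + files))
for path in paths:
    print(path)  # each path is produced lazily, one at a time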
You could also use the find program itself from Python by using sh
import sh
text_files = sh.find(".", "-iname", "*.txt")
Either that, or manually recurse with isdir()/isfile() and listdir(); or you could use subprocess.check_output() and call find .. Basically, os.walk() is the highest-level option, a semi-manual solution based on listdir() is slightly lower level, and if for some reason you want the same output find . would give you, you can make a system call with subprocess.
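A sketch of the subprocess variant mentioned above (this assumes a Unix-like system with find on the PATH):

import subprocess

# Run `find .` and split its output into one path per line
output = subprocess.check_output(["find", "."])
for path in output.decode().splitlines():
    print(path)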
pathlib.Path.rglob is pretty simple; it lists the entire directory tree.
(The argument is a filepath search pattern; "*" means list everything.)
import pathlib

for path in pathlib.Path("directory_to_list/").rglob("*"):
    print(path)
os.walk() is hard to use; just skip it and use pathlib instead.
Here is a Python function mimicking the list.files function from the R language.
import pathlib

def list_files(path, pattern, full_names=False, recursive=True):
    if recursive:
        files = pathlib.Path(path).rglob(pattern)
    else:
        files = pathlib.Path(path).glob(pattern)
    if full_names:
        files = [str(f) for f in files]
    else:
        files = [f.name for f in files]
    return files
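Example usage (the directory and pattern here are hypothetical):

# List all .txt files under a hypothetical data/ tree, with full paths
print(list_files("data", "*.txt", full_names=True, recursive=True))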
import os

path = "path/to/your/dir"
for (path, dirs, files) in os.walk(path):
    print files
Is this overkill, or am I missing something?