Python - Import all files in a directory tree


I want to import all python files in a directory tree, i.e. if we have the following directory structure:
tests/
tests/foo.py
tests/subtests/bar.py
(Imagine that the tree is of arbitrary depth).
I would like to call import_all('tests') and have it load foo.py and bar.py. Importing with the usual module names (tests.foo and tests.subtests.bar) would be nice, but is not required.
My actual use case is that I have a whole bunch of code containing Django forms; I want to identify which forms use a particular field class. My plan for the above code is to load all of my code, and then examine all loaded classes to find form classes.
What's a nice, simple way to go about this in Python 2.7?

Here's a rough-and-ready version using os.walk:
import os
prefix = 'tests/unit'
for dirpath, dirnames, filenames in os.walk(prefix):
    # keep plain modules only: skip __init__.py and anything that is not a .py file
    trimmedmods = [os.path.splitext(f)[0] for f in filenames
                   if f.endswith('.py') and not f.startswith('__')]
    for m in trimmedmods:
        # turn the directory path into a dotted package name
        mod = dirpath.replace('/', '.') + '.' + m
        print(mod)
        __import__(mod)

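import os
import sys
my_dir = '/whatever/directory/'
sys.path.insert(0, my_dir)  # module names below are resolved relative to my_dir
files = [os.path.join(dirpath, f) for dirpath, dirnames, filenames in os.walk(my_dir) for f in filenames if f.endswith('.py') and f != '__init__.py']
# __import__ expects a dotted module name rather than a file path (subdirectories need an __init__.py on Python 2.7)
names = [os.path.splitext(os.path.relpath(f, my_dir))[0].replace(os.sep, '.') for f in files]
modules = [__import__(name) for name in names]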
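For readers on Python 3, here is a minimal sketch of the import_all helper the question asks for. It is not the method used in the answers above: it loads each file by path with importlib.util, which is a swapped-in technique, and it assumes every file can be executed on its own (no relative imports between the files). The function name comes from the question; everything else is illustrative.

```python
import importlib.util
import os


def import_all(root):
    """Load every .py file under root and return {dotted_name: module}."""
    root = os.path.abspath(root)
    base = os.path.dirname(root)  # dotted names are built relative to root's parent
    modules = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            # skip __init__.py and anything that is not Python source
            if not filename.endswith('.py') or filename.startswith('__'):
                continue
            path = os.path.join(dirpath, filename)
            rel = os.path.splitext(os.path.relpath(path, base))[0]
            name = rel.replace(os.sep, '.')
            # load the file directly from its path under the derived name
            spec = importlib.util.spec_from_file_location(name, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            modules[name] = module
    return modules
```

Calling import_all('tests') on the tree from the question would give keys such as tests.foo and tests.subtests.bar. The modules are not registered in sys.modules, so this only suits inspection tasks like the one described.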

Related

Python loop through directories

I am trying to use Python's os library to loop through all the subdirectories of my root directory, find files with a specific name, and rename them.
Just to make it clear, this is my tree structure.
My Python file is located at the root level.
What I am trying to do is target the directory 942ba, loop through all its subdirectories, locate the file 000000, and rename it to 000000.csv.
The current code I have is as follows:
import os
root = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
for dirs, subdirs, files in os.walk(root):
    for f in files:
        print(dirs)
        if f == '000000':
            dirs = dirs.strip(root)
            f_new = f + '.csv'
            os.rename(os.path.join(r'{}'.format(dirs), f), os.path.join(r'{}'.format(dirs), f_new))
But this is not working: when I run my code, for some reason it strips the date from the subdirectory names.
Can anyone help me understand how to solve this issue?

A more efficient way to iterate through the folders and select only the files you are looking for is below:
import os
source_folder = '<path-to-dir>/942ba956-8967-4bec-9540-fbd97441d17f/'
files = [os.path.normpath(os.path.join(root, f)) for root, dirs, filenames in os.walk(source_folder) for f in filenames if '000000' in f and not f.endswith('.gz')]
for file in files:
    os.rename(file, f"{file}.csv")
The list comprehension stores the full path to each file you are looking for. You can change the condition inside the comprehension to anything you need. I use this code snippet a lot to find just images of a certain type, or to remove unwanted files from the selected files.
In the for loop, each file is renamed by adding the .csv extension.
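As a sketch of why the code in the question loses part of the directory name: str.strip(chars) removes any leading or trailing characters that appear in chars, rather than removing a prefix. The helper below, add_csv_extension, is a hypothetical name, and the sample paths are made up for illustration; it renames only files whose name matches exactly and never rewrites the directory path.

```python
import os


def add_csv_extension(root, name='000000'):
    """Rename every file called `name` under root to `name` + '.csv'."""
    renamed = []
    for dirpath, dirnames, filenames in os.walk(root):
        if name in filenames:
            old_path = os.path.join(dirpath, name)  # dirpath is used unchanged
            new_path = old_path + '.csv'
            os.rename(old_path, new_path)
            renamed.append(new_path)
    return renamed


# str.strip treats its argument as a set of characters, so digits of the
# date are eaten as well:
stripped = 'root/942ba/2021-04-29'.strip('root/942ba/')
print(stripped)  # 021-04-

# os.path.relpath removes the leading directory without touching the rest:
relative = os.path.relpath('root/942ba/2021-04-29', 'root/942ba/')
print(relative)  # 2021-04-29
```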

I would use glob to find the files.
import os, glob
zdir = '942ba956-8967-4bec-9540-fbd97441d17f'
files = glob.glob('*{}/000000'.format(zdir))
for fly in files:
    os.rename(fly, '{}.csv'.format(fly))

Ignoring folders when organizing files

I am fairly new to python, and trying to write a program that organizes files based on their extensions
import os
import shutil
newpath1 = r'C:\Users\User1\Documents\Downloads\Images'
if not os.path.exists(newpath1): # check to see if they already exist
    os.makedirs(newpath1)
newpath2 = r'C:\Users\User1\Documents\Downloads\Documents'
if not os.path.exists(newpath2):
    os.makedirs(newpath2)
newpath3 = r'C:\Users\User1\Documents\Downloads\Else'
if not os.path.exists(newpath3):
    os.makedirs(newpath3)
source_folder = r"C:\Users\User1\Documents\Downloads" # the location of the files we want to move
files = os.listdir(source_folder)
for file in files:
    if file.endswith(('.JPG', '.png', '.jpg')):
        shutil.move(os.path.join(source_folder,file), os.path.join(newpath1,file))
    elif file.endswith(('.pdf', '.pptx')):
        shutil.move(os.path.join(source_folder,file), os.path.join(newpath2,file))
    #elif file is folder:
        #do nothing
    else:
        shutil.move(os.path.join(source_folder,file), os.path.join(newpath3,file))
I want it to move files based on their extensions. However, I am trying to figure out how to stop the folders from moving. Any help would be greatly appreciated.
Also, for some reason, not every file is being moved, even though they have the same extension.

As with most path operations, I recommend using the pathlib module. pathlib has been available since Python 3.4 and has a portable (multi-platform), high-level API for file system operations.
I recommend using the following methods on Path objects to determine their type:
Path.is_file()
Path.is_dir()
import shutil
from pathlib import Path
# Using class for nicer grouping of target directories
# Note that pathlib.Path enables Unix-like path construction, even on Windows
class TargetPaths:
    IMAGES = Path.home().joinpath("Documents/Downloads/Images")
    DOCUMENTS = Path.home().joinpath("Documents/Downloads/Documents")
    OTHER = Path.home().joinpath("Documents/Downloads/Else")
    __ALL__ = (IMAGES, DOCUMENTS, OTHER)
for target_dir in TargetPaths.__ALL__:
    if not target_dir.is_dir():
        target_dir.mkdir(exist_ok=True)
source_folder = Path.home().joinpath("Documents/Downloads") # the location of the files we want to move
# Get absolute paths to the files in source_folder
# files is a generator (only usable once)
files = (path.absolute() for path in source_folder.iterdir() if path.is_file())
def move(source_path, target_dir):
    # keep the original file name inside the target directory
    shutil.move(str(source_path), str(target_dir.joinpath(source_path.name)))
for path in files:
    if path.suffix in ('.JPG', '.png', '.jpg'):
        move(path, TargetPaths.IMAGES)
    elif path.suffix in ('.pdf', '.pptx'):
        move(path, TargetPaths.DOCUMENTS)
    else:
        move(path, TargetPaths.OTHER)
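A compact variant of the same idea, offered as a sketch rather than a drop-in replacement. It assumes the target folders live inside the source folder, as in the question, and it compares suffixes case-insensitively, which is one possible reason why files with "the same" extension (for example .PNG versus .png) are not all moved. The names SUFFIX_TO_FOLDER and organize are hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from lower-cased suffix to target folder name
SUFFIX_TO_FOLDER = {
    '.jpg': 'Images', '.png': 'Images',
    '.pdf': 'Documents', '.pptx': 'Documents',
}


def organize(source_folder, default_folder='Else'):
    """Move regular files in source_folder into subfolders chosen by suffix."""
    source_folder = Path(source_folder)
    # Take a snapshot of the listing first; is_file() leaves directories out
    files = [p for p in source_folder.iterdir() if p.is_file()]
    moved = {}
    for path in files:
        folder = SUFFIX_TO_FOLDER.get(path.suffix.lower(), default_folder)
        target_dir = source_folder / folder
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved[path.name] = folder
    return moved
```

Because only entries for which is_file() is true are collected, existing subfolders stay where they are.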

Look at os.walk in particular. For each directory in the tree it yields a 3-tuple of (dirpath, dirnames, filenames).
In your case, [x[0] for x in os.walk(dirname)] gives you the path of every directory under dirname, so you can tell the folders apart from the files.

How can I search sub-folders using glob.glob module? [duplicate]

I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')
But this cannot access the subfolders as well. Does anyone know how I can use the same command to access subfolders as well?

In Python 3.5 and newer use the new recursive **/ functionality:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)
When recursive is set, ** followed by a path separator matches 0 or more subdirectories.
In earlier Python versions, glob.glob() cannot list files in subdirectories recursively.
In that case I'd use os.walk() combined with fnmatch.filter() instead:
import os
import fnmatch
path = 'C:/Users/sam/Desktop/file1'
configfiles = [os.path.join(dirpath, f)
               for dirpath, dirnames, files in os.walk(path)
               for f in fnmatch.filter(files, '*.txt')]
This'll walk your directories recursively and return all absolute pathnames to matching .txt files. In this specific case the fnmatch.filter() may be overkill, you could also use a .endswith() test:
import os
path = 'C:/Users/sam/Desktop/file1'
configfiles = [os.path.join(dirpath, f)
               for dirpath, dirnames, files in os.walk(path)
               for f in files if f.endswith('.txt')]

There's a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):
1. glob.glob('*.txt') : matches all files ending in '.txt' in the current directory
2. glob.glob('*/*.txt') : matches all files ending in '.txt' in the immediate subdirectories only, but not in the current directory
3. glob.glob('**/*.txt') : same as 2 (without recursive=True, ** behaves like *)
4. glob.glob('*.txt',recursive=True) : same as 1
5. glob.glob('*/*.txt',recursive=True) : same as 2
6. glob.glob('**/*.txt',recursive=True) : matches all files ending in '.txt' in the current directory and in all subdirectories
So whenever a pattern contains **, specify recursive=True.
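The behaviour of these patterns can be checked on a throwaway tree. The sketch below builds one in a temporary directory (the file names a.txt, sub/b.txt and sub/deep/c.txt are made up for illustration) and reports what each pattern matches, relative to the top of the tree. demo_patterns is a hypothetical helper name.

```python
import glob
import os
import tempfile


def demo_patterns():
    """Build a small tree and report what each pattern matches."""
    with tempfile.TemporaryDirectory() as top:
        for rel in ('a.txt', 'sub/b.txt', 'sub/deep/c.txt'):
            path = os.path.join(top, *rel.split('/'))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, 'w').close()

        def match(pattern, recursive=False):
            # escape the directory part so only `pattern` is treated as a glob
            full = os.path.join(glob.escape(top), pattern)
            hits = glob.glob(full, recursive=recursive)
            return sorted(os.path.relpath(h, top).replace(os.sep, '/') for h in hits)

        return {
            "*.txt": match('*.txt'),
            "*/*.txt": match('*/*.txt'),
            "**/*.txt": match('**/*.txt'),
            "**/*.txt, recursive=True": match('**/*.txt', recursive=True),
        }


for label, hits in demo_patterns().items():
    print(label, '->', hits)
```

Only the last call reaches sub/deep/c.txt, and it is also the only one of the ** and */ patterns that includes a.txt at the top level.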

To find files in immediate subdirectories:
configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')
For a recursive version that traverses all subdirectories, you can use ** and pass recursive=True (available since Python 3.5):
configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)
Both function calls return lists. You could use glob.iglob() to return paths one by one. Or use pathlib:
from pathlib import Path
path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir
Both methods return iterators (you can get paths one by one).

The glob2 package supports wildcards and is reasonably fast:
import timeit
code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)
On my laptop it takes approximately 2 seconds to match >60,000 file paths.

You can use Formic with Python 2.6
import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")
Disclosure - I am the author of this package.

Here is an adapted version that enables glob.glob-like functionality without using glob2.
import fnmatch
import os
def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))
    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            # match the pattern against the whole path, not just the file name
            if fnmatch.filter([full_path], pattern):
                matches.append(full_path)
    return matches
So if you have the following dir structure
tests/files
├── a0
│   ├── a0.txt
│   ├── a0.yaml
│   └── b0
│       ├── b0.yaml
│       └── b00.yaml
└── a1
You can do something like this
files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']
This is essentially an fnmatch pattern match on the whole path, rather than on the file name only.
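A short sketch of what matching on the whole path means in practice: in fnmatch, * is not stopped by path separators, so a pattern can span directories. The paths are taken from the example tree above; fnmatchcase is used so the result does not depend on the platform's case rules.

```python
import fnmatch

path = 'tests/files/a0/b0/b0.yaml'

# The pattern is compared with the full path string
whole_path_match = fnmatch.fnmatchcase(path, '**/b0/b*.yaml')
# '*' also matches across '/' characters, unlike in glob
star_crosses_separator = fnmatch.fnmatchcase(path, '*.yaml')
# A bare file name has no '/b0/' part, so the same pattern fails
bare_name_match = fnmatch.fnmatchcase('b0.yaml', '**/b0/b*.yaml')

print(whole_path_match, star_crosses_separator, bare_name_match)  # True True False
```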

(The first options are of course mentioned in other answers, here the goal is to show that glob uses os.scandir internally, and provide a direct answer with this).
Using glob
As explained before, with Python 3.5+, it's easy:
import glob
for f in glob.glob('d:/temp/**/*', recursive=True):
    print(f)
#d:\temp\New folder
#d:\temp\New Text Document - Copy.txt
#d:\temp\New folder\New Text Document - Copy.txt
#d:\temp\New folder\New Text Document.txt
Using pathlib
from pathlib import Path
for f in Path('d:/temp').glob('**/*'):
    print(f)
Using os.scandir
os.scandir is what glob does internally. So here is how to do it directly, with a use of yield:
import os
def listpath(path):
    for f in os.scandir(path):
        # f is an os.DirEntry; f.path already includes the parent directory
        if f.is_dir():
            yield f.path
            yield from listpath(f.path)
        else:
            yield f.path
for f in listpath('d:\\temp'):
    print(f)

configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt')
This doesn't work in all cases; use glob2 instead:
configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt')

If you can install glob2 package...
import glob2
filenames = glob2.glob("C:\\top_directory\\**\\*.ext") # Where ext is a specific file extension
folders = glob2.glob("C:\\top_directory\\**\\")
All filenames and folders:
all_ff = glob2.glob("C:\\top_directory\\**\\**")

If you're running Python 3.4+, you can use the pathlib module. The Path.glob() method supports the ** pattern, which means “this directory and all subdirectories, recursively”. It returns a generator yielding Path objects for all matching files.
from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")

You can use the function glob.glob() or glob.iglob() directly from glob module to retrieve paths recursively from inside the directories/files and subdirectories/subfiles.
Syntax:
glob.glob(pathname, *, recursive=False) # pathname = '/path/to/the/directory' or subdirectory
glob.iglob(pathname, *, recursive=False)
In your example, it is possible to write like this:
import glob
import os
configfiles = [f for f in glob.glob("C:/Users/sam/Desktop/*.txt")]
for f in configfiles:
    print(f'Filename with path: {f}')
    print(f'Only filename: {os.path.basename(f)}')
    print(f'Filename without extensions: {os.path.splitext(os.path.basename(f))[0]}')
Output:
Filename with path: C:/Users/sam/Desktop/test_file.txt
Only filename: test_file.txt
Filename without extensions: test_file
Help:
Documentation for os.path.splitext and documentation for os.path.basename.

As pointed out by Martijn, glob can only do this through the ** pattern introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following returns a lazily evaluated iterator that behaves similarly:
import os, glob, itertools
configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
                                            for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))
Note that you can only iterate once over configfiles in this approach though. If you require a real list of configfiles that can be used in multiple operations you would have to create this explicitly by using list(configfiles).
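The one-pass behaviour is easy to see in isolation. In this sketch the nested list per_directory is a made-up stand-in for the per-directory glob.iglob results, so that it runs without touching the file system.

```python
import itertools

# Stand-in for the results that glob.iglob would yield for each directory
per_directory = [['a.txt'], [], ['b.txt', 'c.txt']]

configfiles = itertools.chain.from_iterable(per_directory)
first_pass = list(configfiles)   # consumes the iterator
second_pass = list(configfiles)  # nothing is left to iterate over

print(first_pass)   # ['a.txt', 'b.txt', 'c.txt']
print(second_pass)  # []
```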

rglob recurses all the way down to the deepest level of your directory structure. If you only want to go one level deep, do not use it.
I realize the OP was talking about using glob.glob. I believe this answers the intent, however, which is to search all subfolders recursively.
The rglob function recently produced a 100x increase in speed for a data processing algorithm which was using the folder structure as a fixed assumption for the order of data reading. However, with rglob we were able to do a single scan once through all files at or below a specified parent directory, save their names to a list (over a million files), then use that list to determine which files we needed to open at any point in the future based on the file naming conventions only vs. which folder they were in.
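The scan-once idea can be sketched roughly as follows, assuming pathlib's Path.rglob and an index keyed by file name. build_index is a hypothetical helper; the actual data processing code described above is not shown in the answer.

```python
from collections import defaultdict
from pathlib import Path


def build_index(root):
    """Scan root once and map each file name to the paths where it occurs."""
    index = defaultdict(list)
    for path in Path(root).rglob('*'):
        if path.is_file():
            index[path.name].append(path)
    return index
```

Later lookups such as index['some_name.csv'] then work from the naming convention alone, without walking the folders again.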

os.walk to crawl through folder structure

I have some code that looks at a single folder and pulls out files,
but now the folder structure has changed and I need to trawl through the folders looking for files that match.
This is what the old code looks like:
GSB_FOLDER = r'D:\Games\Gratuitous Space Battles Beta'
def get_module_data():
    module_folder = os.path.join(GSB_FOLDER, 'data', 'modules')
    filenames = [os.path.join(module_folder, f) for f in
                 os.listdir(module_folder)]
    data = [parse_file(f) for f in filenames]
    return data
But now the folder structure has changed to be like this
GSB_FOLDER\data\modules
\folder1\data\modules
\folder2\data\modules
\folder3\data\modules
where folder1, 2 or 3 could be any text string.
How do I rewrite the code above to do this?
I have been told about os.walk, but I'm just learning Python, so any help is appreciated.

Nothing much changes: you just call os.walk, and it will recursively go through the directory tree and return the files, e.g.
data = []
for root, dirs, files in os.walk('/tmp'):
    if os.path.basename(root) != 'modules':
        continue
    # accumulate results from every matching folder
    data += [parse_file(os.path.join(root, f)) for f in files]
Here I am checking files only in folders named 'modules'; you can change that check to do something else, e.g. match paths which have modules somewhere in them with root.find('/modules') >= 0.
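Applied to the folder layout in the question, the check could be narrowed to folders whose path ends in data/modules. This is a sketch under that assumption; find_module_files is a hypothetical name, and the caller would pass each returned path to the question's own parse_file.

```python
import os


def find_module_files(root):
    """Return every file that sits directly in a .../data/modules folder under root."""
    wanted = os.path.join('data', 'modules')
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # compare the end of the path, using the platform's separator
        if dirpath.endswith(os.sep + wanted):
            found.extend(os.path.join(dirpath, f) for f in sorted(filenames))
    return found
```

With that in place, get_module_data could be reduced to [parse_file(f) for f in find_module_files(GSB_FOLDER)].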

os.walk is a nice, easy way to get the directory structure of everything inside a directory you pass it;
in your example, you could do something like this:
for dirpath, dirnames, filenames in os.walk("...GSB_FOLDER"):
    # whatever you want to do with these folders
    if os.path.join("data", "modules") in dirpath:
        print(dirpath, dirnames, filenames)
Try that out; it should be fairly self-explanatory how it works.

Created a function that kind of serves a general purpose of crawling through directory structure and returning files and/or paths that match pattern.
import os
import re
def directory_spider(input_dir, path_pattern="", file_pattern="", maxResults=500):
    file_paths = []
    if not os.path.exists(input_dir):
        raise FileNotFoundError("Could not find path: %s"%(input_dir))
    for dirpath, dirnames, filenames in os.walk(input_dir):
        if re.search(path_pattern, dirpath):
            file_list = [item for item in filenames if re.search(file_pattern,item)]
            file_path_list = [os.path.join(dirpath, item) for item in file_list]
            file_paths += file_path_list
        # stop walking once enough results have been collected
        if len(file_paths) > maxResults:
            break
    return file_paths[0:maxResults]
Example usages:
directory_spider('/path/to/find') --> Finds the first 500 files in the path if it exists
directory_spider('/path/to/find', path_pattern="", file_pattern=r"\.py$", maxResults=10)

You can use os.walk as @Anurag has detailed, or you can try my small pathfinder library:
data = [parse_file(f) for f in pathfinder.find(GSB_FOLDER, just_files=True)]

Make multiple directories based on a list

Hi
I would like to make multiple new directories in a set root directory, each one named from a list of names
e.g.
List looks like this
Folder_1
Folder_x
Folder_y
is there an easy way to do this in python?

import os
root_path = '/whatever/your/root/path/is/'
folders = ['Folder_1','Folder_x','Folder_y']
for folder in folders:
    os.mkdir(os.path.join(root_path,folder))

Here's one way to do it using a flexible custom function. Note that it uses os.makedirs() instead of os.mkdir() which means that it will also create the root folder if necessary, as well as allowing the subfolder paths to contain intermediate-level directories if desired.
The code also uses functools.partial() to create a temporary local function named concat_path() to use with the built-in map() function to concatenate the root directory's name with each subfolder's. It then uses os.makedirs() on each of those to create the subfolder path.
import os
from functools import partial
def makefolders(root_dir, subfolders):
    concat_path = partial(os.path.join, root_dir)
    for subfolder in map(concat_path, subfolders):
        os.makedirs(subfolder, exist_ok=True) # Python 3.2+
if __name__=='__main__':
    root_dir = '/path/to/root/folder'
    subfolders = ('Numbers/Folder_1', 'Letters/Folder_x', 'Letters/Folder_y')
    makefolders(root_dir, subfolders)

Make folder names as desired
import os
root_path = '/home/sagnik'
folders = [None] * 201
for x in range(0, 201):
    print(str(x))
    folders[x] = "folder" + str(x)
Create folders
for folder in folders:
    os.mkdir(os.path.join(root_path, folder))

os.mkdir(name_of_dir)
is your friend.

os.path.join to combine your root dir and name, and os.mkdir to create the directories. Looping over things is easily enough done with for.

import os
root_dir = 'root_path\\whateverYouWant\\'
list_ = ['Folder_1', 'Folder_x', 'Folder_y']
for folder in list_:
    os.makedirs(root_dir + folder)

I was in the same situation too and finally got a small working solution; try it.
I had two files: the program file, and a .txt file containing the list of folder names, one per line.
import os
with open('folder.txt', 'r') as f:
    for g in f:
        name = g.strip()  # drop the trailing newline
        if name:
            os.mkdir(name)

import os
dir_names = ["ABC1", "ABC2", "ABC3"]
#Create three folders on Desktop
#dir_path = os.path.expanduser("~/Desktop")
dir_path = os.path.join(os.environ['USERPROFILE'], 'Desktop')
for folder in dir_names:
    try:
        os.mkdir(os.path.join(dir_path, folder))
        print(folder)
    except FileExistsError:
        print("Folder already exists")
        break

from os import makedirs
makedirs('1/2/3/4/5/6/7/8/4/4/5/5/5/5/5/5/5/55/5/5/5/5')
This creates every intermediate directory along the path as well, so you may get more than you want.
