I have two files in two different directories: '/home/test/first/first.pdf' and '/home/text/second/second.pdf'. I use the following code to compress them:
import zipfile, StringIO
buffer = StringIO.StringIO()
first_path = '/home/test/first/first.pdf'
second_path = '/home/text/second/second.pdf'
zip = zipfile.ZipFile(buffer, 'w')
zip.write(first_path)
zip.write(second_path)
zip.close()
When I open the zip file I created, it contains a home folder with two sub-folders, first and second, and the PDF files inside those. I don't know how to include only the two PDF files instead of having the full path stored in the archive. I hope my question is clear; please help.
The zipfile write() method supports an extra argument (arcname), which is the name under which the file is stored in the archive, so you only need to change your code to:
from os.path import basename
...
zip.write(first_path, basename(first_path))
zip.write(second_path, basename(second_path))
zip.close()
When you have some spare time, reading the documentation for zipfile will be helpful.
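A minimal Python 3 sketch of the same idea, assuming the archive is built in memory with io.BytesIO (which takes the place of StringIO for binary data in Python 3). The helper name zip_flat is hypothetical and not part of zipfile:

```python
import io
import os
import zipfile


def zip_flat(paths):
    """Return the bytes of a zip archive holding each file under its base name only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # arcname replaces the full path that would otherwise be stored.
            archive.write(path, arcname=os.path.basename(path))
    return buffer.getvalue()
```

Note that two input files sharing the same base name would end up as duplicate entries in the archive, so this flat layout only suits inputs with distinct file names.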
I use this function to zip a directory without including the absolute path:
import zipfile
import os
def zipDir(dirPath, zipPath):
    zipf = zipfile.ZipFile(zipPath, mode='w')
    lenDirPath = len(dirPath)
    for root, _, files in os.walk(dirPath):
        for file in files:
            filePath = os.path.join(root, file)
            # Store the path relative to dirPath instead of the absolute path.
            zipf.write(filePath, filePath[lenDirPath:])
    zipf.close()
#end zipDir
I suspect there might be a more elegant solution, but this one should work:
def add_zip_flat(zip, filename):
    dir, base_filename = os.path.split(filename)
    os.chdir(dir)
    zip.write(base_filename)
zip = zipfile.ZipFile(buffer, 'w')
add_zip_flat(zip, first_path)
add_zip_flat(zip, second_path)
zip.close()
You can override the filename in the archive with the arcname parameter:
import zipfile
from pathlib import Path
with zipfile.ZipFile(file="sample.zip", mode="w", compression=zipfile.ZIP_DEFLATED) as out_zip:
    for f in Path.home().glob("**/*.txt"):
        out_zip.write(f, arcname=f.name)
Documentation reference: https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.write
It can also be done this way (allowZip64 allows creating archives larger than 2 GB):
import os, zipfile
def zipdir(path, ziph):
    """zipper"""
    for root, _, files in os.walk(path):
        for file_found in files:
            abs_path = os.path.join(root, file_found)
            # Store only the file name, dropping the directory part.
            ziph.write(abs_path, file_found)
# DEST_FILE and SOURCE_DIR are placeholders for your own paths.
zipf = zipfile.ZipFile(DEST_FILE, 'w', zipfile.ZIP_DEFLATED, allowZip64=True)
zipdir(SOURCE_DIR, zipf)
zipf.close()
As João Pinto said, the arcname argument of ZipFile.write is what you need. Reading the documentation of pathlib is also helpful: you can easily get a relative path with pathlib.Path.relative_to, with no need to switch to os.path.
import zipfile
from pathlib import Path
folder_to_compress = Path("/path/to/folder")
path_to_archive = Path("/path/to/archive.zip")
with zipfile.ZipFile(
    path_to_archive,
    mode="w",
    compression=zipfile.ZIP_DEFLATED,
    compresslevel=7,
) as zip:
    for file in folder_to_compress.rglob("*"):
        relative_path = file.relative_to(folder_to_compress)
        print(f"Packing {file} as {relative_path}")
        zip.write(file, arcname=relative_path)
How can I find all the files in a directory having the extension .txt in Python?
You can use glob:
import glob, os
os.chdir("/mydir")
for file in glob.glob("*.txt"):
    print(file)
or simply os.listdir:
import os
for file in os.listdir("/mydir"):
    if file.endswith(".txt"):
        print(os.path.join("/mydir", file))
or, if you want to traverse the directory tree, use os.walk:
import os
for root, dirs, files in os.walk("/mydir"):
    for file in files:
        if file.endswith(".txt"):
            print(os.path.join(root, file))
Use glob.
>>> import glob
>>> glob.glob('./*.txt')
['./outline.txt', './pip-log.txt', './test.txt', './testingvim.txt']
Something like this should do the job:
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith('.txt'):
            print(file)
You can simply use pathlib's glob [1]:
import pathlib
list(pathlib.Path('your_directory').glob('*.txt'))
or in a loop:
for txt_file in pathlib.Path('your_directory').glob('*.txt'):
    # do something with "txt_file"
    pass
If you want it recursive you can use .glob('**/*.txt').
[1] The pathlib module was added to the standard library in Python 3.4. On older Python versions you can install back-ports of the module (e.g. using conda or pip): pathlib and pathlib2.
Something like this will work:
>>> import os
>>> path = '/usr/share/cups/charmaps'
>>> text_files = [f for f in os.listdir(path) if f.endswith('.txt')]
>>> text_files
['euc-cn.txt', 'euc-jp.txt', 'euc-kr.txt', 'euc-tw.txt', ... 'windows-950.txt']
import os
path = 'mypath/path'
files = os.listdir(path)
files_txt = [i for i in files if i.endswith('.txt')]
I like os.walk():
import os
# dir is the directory to search
for root, dirs, files in os.walk(dir):
    for f in files:
        if os.path.splitext(f)[1] == '.txt':
            fullpath = os.path.join(root, f)
            print(fullpath)
Or with generators:
import os
fileiter = (os.path.join(root, f)
            for root, _, files in os.walk(dir)
            for f in files)
txtfileiter = (f for f in fileiter if os.path.splitext(f)[1] == '.txt')
for txt in txtfileiter:
    print(txt)
Here are more versions of the same that produce slightly different results:
glob.iglob()
import glob
for f in glob.iglob("/mydir/*/*.txt"):  # generator, search immediate subdirectories
    print(f)
glob.glob1()
print(glob.glob1("/mydir", "*.tx?"))  # literal_directory, basename_pattern (undocumented helper)
fnmatch.filter()
import fnmatch, os
print(fnmatch.filter(os.listdir("/mydir"), "*.tx?"))  # include dot-files
Try this; it will find all your files recursively:
import glob, os
os.chdir("H:\\wallpaper")  # use whatever directory you want
# use a double backslash (\\), not a single one
for file in glob.glob("**/*.txt", recursive=True):
    print(file)
Python v3.5+
Fast method using os.scandir in a recursive function. Searches for all files with a specified extension in folder and sub-folders. It is fast, even for finding 10,000s of files.
I have also included a function to convert the output to a Pandas Dataframe.
import os
import re
import pandas as pd
import numpy as np
def findFilesInFolderYield(path, extension, containsTxt='', subFolders=True, excludeText=''):
    r""" Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)
    path:        Base directory to find files
    extension:   File extension to find. e.g. 'txt'. Regular expression. Or 'ls\d' to match ls1, ls2, ls3 etc
    containsTxt: List of Strings, only finds file if it contains this text. Ignore if '' (or blank)
    subFolders:  Bool. If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText: Text string. Ignore if ''. Will exclude if text string is in path.
    """
    if type(containsTxt) == str:  # if a string and not in a list
        containsTxt = [containsTxt]
    myregexobj = re.compile(r'\.' + extension + '$')  # Makes sure the file extension is at the end and is preceded by a .
    try:  # Trap missing paths and file-permission problems
        for entry in os.scandir(path):
            if entry.is_file() and myregexobj.search(entry.path):
                bools = [True for txt in containsTxt if txt in entry.path and (excludeText == '' or excludeText not in entry.path)]
                if len(bools) == len(containsTxt):
                    yield entry.stat().st_size, entry.stat().st_atime_ns, entry.stat().st_mtime_ns, entry.stat().st_ctime_ns, entry.path
            elif entry.is_dir() and subFolders:  # if it's a directory, repeat the process recursively
                yield from findFilesInFolderYield(entry.path, extension, containsTxt, subFolders, excludeText)
    except FileNotFoundError as fnf:  # subclass of OSError, so it has to be caught first
        print(path + ' not found ', fnf)
    except OSError as ose:
        print('Cannot access ' + path + '. Probably a permissions error ', ose)
def findFilesInFolderYieldandGetDf(path, extension, containsTxt, subFolders=True, excludeText=''):
    r""" Converts the data returned by findFilesInFolderYield into a pandas DataFrame.
    Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)
    path:        Base directory to find files
    extension:   File extension to find. e.g. 'txt'. Regular expression. Or 'ls\d' to match ls1, ls2, ls3 etc
    containsTxt: List of Strings, only finds file if it contains this text. Ignore if '' (or blank)
    subFolders:  Bool. If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText: Text string. Ignore if ''. Will exclude if text string is in path.
    """
    fileSizes, accessTimes, modificationTimes, creationTimes, paths = zip(*findFilesInFolderYield(path, extension, containsTxt, subFolders, excludeText))
    df = pd.DataFrame({
        'FLS_File_Size': fileSizes,
        'FLS_File_Access_Date': accessTimes,
        'FLS_File_Modification_Date': modificationTimes,
        'FLS_File_Creation_Date': creationTimes,
        'FLS_File_PathName': paths,
    })
    # The *_ns values are integer nanoseconds since the epoch, which pd.to_datetime converts directly.
    df['FLS_File_Modification_Date'] = pd.to_datetime(df['FLS_File_Modification_Date'])
    df['FLS_File_Creation_Date'] = pd.to_datetime(df['FLS_File_Creation_Date'])
    df['FLS_File_Access_Date'] = pd.to_datetime(df['FLS_File_Access_Date'])
    return df
ext = 'txt'  # regular expression
containsTxt = []
path = r'C:\myFolder'
df = findFilesInFolderYieldandGetDf(path, ext, containsTxt, subFolders=True)
path.py is another alternative: https://github.com/jaraco/path.py
from path import path
p = path('/path/to/the/directory')
for f in p.files(pattern='*.txt'):
    print(f)
To get all '.txt' file names inside 'dataPath' folder as a list in a Pythonic way:
from os import listdir
from os.path import isfile, join
path = "/dataPath/"
onlyTxtFiles = [f for f in listdir(path) if isfile(join(path, f)) and f.endswith(".txt")]
print(onlyTxtFiles)
Python has all tools to do this:
import os
the_dir = 'the_dir_that_want_to_search_in'
all_txt_files = filter(lambda x: x.endswith('.txt'), os.listdir(the_dir))
I did a test (Python 3.6.4, W7x64) to see which solution is the fastest for one folder, no subdirectories, to get a list of complete file paths for files with a specific extension.
To make it short, for this task os.listdir() is the fastest and is 1.7x as fast as the next best: os.walk() (with a break!), 2.7x as fast as pathlib, 3.2x faster than os.scandir() and 3.3x faster than glob.
Please keep in mind, that those results will change when you need recursive results. If you copy/paste one method below, please add a .lower() otherwise .EXT would not be found when searching for .ext.
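The advice about .lower() can be sketched as follows, assuming a flat (non-recursive) search with os.listdir. The function name list_files_with_ext is made up for illustration:

```python
import os


def list_files_with_ext(directory, ext):
    """Return full paths of entries in directory whose name ends with ext, ignoring case."""
    ext = ext.lower()
    return [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        # Lower-casing the name lets ".EXT" match a search for ".ext".
        if name.lower().endswith(ext)
    ]
```

The original spelling of each file name is preserved in the result; only the comparison is case-insensitive.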
import os
import pathlib
import timeit
import glob
def a():
    path = pathlib.Path().cwd()
    list_sqlite_files = [str(f) for f in path.glob("*.sqlite")]
def b():
    path = os.getcwd()
    list_sqlite_files = [f.path for f in os.scandir(path) if os.path.splitext(f)[1] == ".sqlite"]
def c():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(".sqlite")]
def d():
    path = os.getcwd()
    os.chdir(path)
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob("*.sqlite")]
def e():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob1(str(path), "*.sqlite")]
def f():
    path = os.getcwd()
    list_sqlite_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".sqlite"):
                list_sqlite_files.append( os.path.join(root, file) )
        break
print(timeit.timeit(a, number=1000))
print(timeit.timeit(b, number=1000))
print(timeit.timeit(c, number=1000))
print(timeit.timeit(d, number=1000))
print(timeit.timeit(e, number=1000))
print(timeit.timeit(f, number=1000))
Results:
# Python 3.6.4
0.431
0.515
0.161
0.548
0.537
0.274
import os
import sys
# Expect two arguments: the directory and the extension mask.
if len(sys.argv) < 3:
    print('no params')
    sys.exit(1)
dir = sys.argv[1]
mask = sys.argv[2]
files = os.listdir(dir)
res = list(filter(lambda x: x.endswith(mask), files))
print(res)
To get an array of ".txt" file names from a folder called "data" in the same directory I usually use this simple line of code:
import os
fileNames = [fileName for fileName in os.listdir("data") if fileName.endswith(".txt")]
This code makes my life simpler.
import os
fnames = ([file for root, dirs, files in os.walk(dir)
for file in files
if file.endswith('.txt') #or file.endswith('.png') or file.endswith('.pdf')
])
for fname in fnames: print(fname)
Use fnmatch: https://docs.python.org/2/library/fnmatch.html
import fnmatch
import os
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
A copy-pastable solution similar to the one of ghostdog:
def get_all_filepaths(root_path, ext):
    """
    Search all files which have a given extension within root_path.

    This ignores the case of the extension and searches subdirectories, too.

    Parameters
    ----------
    root_path : str
    ext : str

    Returns
    -------
    list of str

    Examples
    --------
    >>> get_all_filepaths('/run', '.lock')
    ['/run/unattended-upgrades.lock',
     '/run/mlocate.daily.lock',
     '/run/xtables.lock',
     '/run/mysqld/mysqld.sock.lock',
     '/run/postgresql/.s.PGSQL.5432.lock',
     '/run/network/.ifstate.lock',
     '/run/lock/asound.state.lock']
    """
    import os
    all_files = []
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(ext):
                all_files.append(os.path.join(root, filename))
    return all_files
You can also use yield to create a generator and thus avoid assembling the complete list:
def get_all_filepaths(root_path, ext):
    import os
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(ext):
                yield os.path.join(root, filename)
I suggest using fnmatch together with the upper method. That way you can find any of the following: Name.txt, Name.TXT, Name.Txt.
import fnmatch
import os
for file in os.listdir("/Users/Johnny/Desktop/MyTXTfolder"):
    if fnmatch.fnmatch(file.upper(), '*.TXT'):
        print(file)
Here's one with extend():
import glob
import os
types = ('*.jpg', '*.png')
images_list = []
# path is the directory to search
for files in types:
    images_list.extend(glob.glob(os.path.join(path, files)))
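An alternative sketch that avoids one glob call per pattern: str.endswith accepts a tuple of suffixes, so a single directory listing can be filtered for several extensions at once. The function name list_images and the default extensions are illustrative assumptions.

```python
import os


def list_images(path, extensions=(".jpg", ".png")):
    """Return sorted full paths in path whose names end with one of extensions."""
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        # endswith() with a tuple is true if any of the suffixes matches.
        if name.endswith(tuple(extensions))
    )
```

Unlike the glob version, this matches on the plain suffix only, so it does not support wildcard patterns.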
Functional solution with sub-directories:
from fnmatch import filter
from functools import partial
from itertools import chain
from os import path, walk
print(*chain(*(map(partial(path.join, root), filter(filenames, "*.txt")) for root, _, filenames in walk("mydir"))))
In case the folder contains a lot of files or memory is a constraint, consider using generators:
import os
def yield_files_with_extensions(folder_path, file_extension):
    for _, _, files in os.walk(folder_path):
        for file in files:
            if file.endswith(file_extension):
                yield file
Option A: Iterate
for f in yield_files_with_extensions('.', '.txt'):
    print(f)
Option B: Get all
files = [f for f in yield_files_with_extensions('.', '.txt')]
Use Python's os module to find files with a specific extension.
A simple example:
import os
# This is the path where you want to search
path = r'd:'
# This is the extension you want to detect
extension = '.txt'  # this can be: .jpg .png .xls .log ...
for root, dirs_list, files_list in os.walk(path):
    for file_name in files_list:
        if os.path.splitext(file_name)[-1] == extension:
            file_name_path = os.path.join(root, file_name)
            print(file_name)
            print(file_name_path)  # This is the full path of the matching file
Many users have replied with os.walk answers, which return not only the files in the directory itself but also the files in all of its subdirectories. To stay in the top-level directory:
import os
def files_in_dir(path, extension=''):
    r"""
    Generator: yields all of the files in <path> ending with
    <extension>
    \param path Absolute or relative path to inspect,
    \param extension [optional] Only yield files matching this,
    \yield [filenames]
    """
    for _, dirs, files in os.walk(path):
        dirs[:] = []  # do not recurse directories.
        yield from [f for f in files if f.endswith(extension)]
# Example: print all the .py files in './python'
for filename in files_in_dir('./python', '.py'):
    print("-", filename)
Or for a one-off where you don't need a generator:
path, ext = "./python", ".py"
for _, _, dirfiles in os.walk(path):
    matches = (f for f in dirfiles if f.endswith(ext))
    break
for filename in matches:
    print("-", filename)
If you are going to use matches for something else, you may want to make it a list rather than a generator expression:
matches = [f for f in dirfiles if f.endswith(ext)]
I am trying to find all the .c files in a directory using Python.
I wrote this, but it is just returning me all files - not just .c files:
import os
import re
results = []
for folder in gamefolders:
    for f in os.listdir(folder):
        if re.search('.c', f):
            results += [f]
print results
How can I just get the .c files?
Try changing the inner loop to something like this:
results += [each for each in os.listdir(folder) if each.endswith('.c')]
Try "glob":
>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']
KISS
import os
results = []
for folder in gamefolders:
    for f in os.listdir(folder):
        if f.endswith('.c'):
            results.append(f)
print(results)
There is a better solution than using regular expressions directly: the standard library's fnmatch module, which deals with file name patterns. (See also the glob module.)
Write a helper function:
import fnmatch
import os
def listdir(dirname, pattern="*"):
    return fnmatch.filter(os.listdir(dirname), pattern)
and use it as follows:
result = listdir("./sources", "*.c")
for _, _, filenames in os.walk(folder):
    for file in filenames:
        fileExt = os.path.splitext(file)[-1]
        if fileExt == '.c':
            results.append(file)
For another alternative you could use fnmatch:
import fnmatch
import os
results = []
for root, dirs, files in os.walk(path):
    for _file in files:
        if fnmatch.fnmatch(_file, '*.c'):
            results.append(os.path.join(root, _file))
print(results)
or with a list comprehension:
for root, dirs, files in os.walk(path):
    results += [os.path.join(root, _file)
                for _file in files
                if fnmatch.fnmatch(_file, '*.c')]
or using filter:
for root, dirs, files in os.walk(path):
    results += [os.path.join(root, _file)
                for _file in fnmatch.filter(files, '*.c')]
Change the working directory to the given path so that you can search for files within it. If you don't change the directory, this code searches your current directory:
import os    # importing the os library
import glob  # importing the glob library
path = input()  # input from the user
os.chdir(path)
filedata = glob.glob('*.c')  # all files with the .c extension are stored in filedata
print(filedata)
import os, re
cfile = re.compile(r"^.*?\.c$")
results = []
for name in os.listdir(directory):
    if cfile.match(name):
        results.append(name)
The implementation of shutil.copytree is in the docs. I modified it to take a list of extensions to INCLUDE.
import os
from shutil import copy2, copystat, Error
def my_copytree(src, dst, symlinks=False, *extentions):
    """ I modified the 2.7 implementation of shutil.copytree
    to take a list of extensions to INCLUDE, instead of an ignore list.
    """
    names = os.listdir(src)
    os.makedirs(dst)
    errors = []
    for name in names:
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                my_copytree(srcname, dstname, symlinks, *extentions)
            else:
                ext = os.path.splitext(srcname)[1]
                if ext not in extentions:
                    # skip the file
                    continue
                copy2(srcname, dstname)
            # XXX What about devices, sockets etc.?
        # catch the Error from the recursive copytree so that we can
        # continue with other files (Error subclasses OSError, so it comes first)
        except Error as err:
            errors.extend(err.args[0])
        except OSError as why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    # except WindowsError: # can't copy file access times on Windows
    #     pass
    except OSError as why:
        errors.append((src, dst, str(why)))
    if errors:
        raise Error(errors)
Usage: for example, to copy only .config and .bat files (the third argument is symlinks, so it has to be passed before the extensions):
my_copytree(source, targ, False, '.config', '.bat')
This is pretty clean.
The commands come from the os library.
This code searches the current working directory and lists only the specified file type. You can change this by replacing 'os.getcwd()' with your target directory, and choose the file type by replacing '(ext)'. os.fsdecode is there so you don't get a bytes error from .endswith(). This also sorts alphabetically; you can remove sorted() for the raw list.
import os
filenames = sorted([os.fsdecode(file) for file in os.listdir(os.getcwd()) if os.fsdecode(file).endswith(".(ext)")])
Here's yet another solution, using pathlib (and Python 3):
from pathlib import Path
gamefolder = "path/to/dir"
result = sorted(Path(gamefolder).glob("**/*.c"))
Notice the double asterisk (**) in the glob() argument. It makes the search cover gamefolder as well as its subdirectories. If you only want to search gamefolder itself, use a single * in the pattern: "*.c". For more details, see the documentation.
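To see the difference between the recursive and the non-recursive pattern, here is a small sketch that uses only documented pathlib calls. The helper name c_files is hypothetical.

```python
from pathlib import Path


def c_files(folder, recursive=True):
    """Return sorted .c files in folder, optionally including subdirectories."""
    root = Path(folder)
    # rglob("*.c") behaves like glob("**/*.c"); glob("*.c") stays in the top level.
    matches = root.rglob("*.c") if recursive else root.glob("*.c")
    return sorted(matches)
```

Calling c_files(gamefolder, recursive=False) would return only the files directly inside the folder.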
If you replace '.c' with '[.]c$', you're searching for files whose names end in .c, rather than all files that contain a c with at least one character before it.
Edit: Alternatively, compare f[-2:] with '.c'; this MAY be computationally cheaper than running a regexp match.
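A short demonstration of why the unescaped pattern matches too much; the file names in the list are made up for illustration:

```python
import re

names = ["main.c", "music.mp3", "abc", "notes.cpp", "c"]

# '.c' means "any character followed by c", so it matches almost everything.
loose = [n for n in names if re.search(".c", n)]

# '[.]c$' means "a literal dot, then c, at the end of the name".
strict = [n for n in names if re.search(r"[.]c$", n)]

# A plain suffix comparison needs no regular expression at all.
by_suffix = [n for n in names if n.endswith(".c")]

print(loose)
print(strict)
print(by_suffix)
```

Here the loose pattern lets through every name except the single-character "c", while both stricter checks keep only main.c.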
Just to be clear, if you wanted a literal dot character in your search term, you could have escaped it too:
r'.*\.c$' would give you what you needed. You could also use
results.append(f) instead of what you had listed as results += [f].
This function returns a list of all file names with the specified extension that live in the specified directory:
import os
def listFiles(path, extension):
    return [f for f in os.listdir(path) if f.endswith(extension)]
print(listFiles('/Path/to/directory/with/files', '.txt'))
If you want to list all files with the specified extension in a certain directory and its subdirectories you could do:
import os
def filterFiles(path, extension):
    return [file for root, dirs, files in os.walk(path) for file in files if file.endswith(extension)]
print(filterFiles('/Path/to/directory/with/files', '.txt'))
You can actually do this with just os.listdir, where folder is one entry of gamefolders:
import os
results = [f for f in os.listdir(folder) if f.endswith('.c')]