Getting Every File in a Windows Directory - python

I have a folder in Windows 7 which contains multiple .txt files. How would one get every file in said directory as a list?

You can use os.listdir(".") to list the contents of the current directory ("."):
import os
for name in os.listdir("."):
    if name.endswith(".txt"):
        print(name)
If you want the whole list as a Python list, use a list comprehension:
a = [name for name in os.listdir(".") if name.endswith(".txt")]

import os
import glob
os.chdir('c:/mydir')
files = glob.glob('*.txt')
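You can also avoid os.chdir(), which changes the working directory for the whole program, by joining the directory into the pattern instead. This is a minimal sketch: the helper name txt_files_in is hypothetical, and glob.escape() is used so that characters such as [ in the folder name are matched literally rather than as wildcards.

```python
import glob
import os


def txt_files_in(directory):
    """Return the full paths of the .txt files directly inside `directory` (hypothetical helper)."""
    # Join the escaped directory and the wildcard instead of calling os.chdir(),
    # so the current working directory is left untouched.
    pattern = os.path.join(glob.escape(directory), "*.txt")
    # Sort for a stable order; glob itself returns entries in arbitrary order.
    return sorted(glob.glob(pattern))


# Example (hypothetical folder):
# print(txt_files_in(r"C:\mydir"))
```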

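None of the other answers address the fact that passing glob.glob() a Windows path with backslashes (for example, C:\okay\what\i_guess\) may not work as expected, for instance when the backslashes are not escaped in an ordinary string literal. One way around this is pathlib (note that the "**/*.txt" pattern below also searches subdirectories):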
from pathlib import Path
glob_path = Path(r"C:\okay\what\i_guess")
file_list = [str(pp) for pp in glob_path.glob("**/*.txt")]

import fnmatch
import os
txt_files = [file for file in os.listdir('.') if fnmatch.fnmatch(file, '*.txt')]

If you just need the current directory, use os.listdir.
>>> import os
>>> os.listdir('.')  # get the files/directories
>>> [os.path.abspath(x) for x in os.listdir('.')]  # get the absolute paths
>>> [x for x in os.listdir('.') if os.path.isfile(x)]  # only files
>>> [x for x in os.listdir('.') if x.endswith('.txt')]  # files ending in .txt only
You can also use os.walk if you need to get the contents of a directory recursively. Refer to the Python documentation for os.walk.
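Here is a minimal recursive sketch with os.walk. It assumes you want full paths and want .TXT and .txt treated alike, which is common on Windows. The name find_txt_files is hypothetical.

```python
import os


def find_txt_files(root):
    """Recursively collect full paths of .txt files under `root` (hypothetical helper)."""
    matches = []
    # os.walk yields (dirpath, dirnames, filenames) for every directory in the tree.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare case-insensitively, since Windows file names often use .TXT.
            if name.lower().endswith(".txt"):
                # filenames are bare names, so join them with their directory.
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```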

Related

how to get name of a file in directory using python

There is an mkv file in a folder named "export". What I want to do is to make a Python script which fetches the file name from that export folder.
Let's say the folder is at "C:\Users\UserName\Desktop\New_folder\export".
How do I fetch the name?
I tried using os.path.basename and os.path.splitext, but they didn't work out like I expected.
os.path implements some useful functions on path names, but it doesn't look at what is actually in a directory. For that purpose, you can use os.listdir.
The following command will give you a list of the contents of the given path:
os.listdir(r"C:\Users\UserName\Desktop\New_folder\export")
(The r prefix makes this a raw string; without it, "\U" starts a Unicode escape and Python 3 raises a SyntaxError.)
Now, if you just want .mkv files, you can use the fnmatch module (which provides support for Unix shell-style wildcards) to get the file names you expect:
import fnmatch
import os
print([f for f in os.listdir(r"C:\Users\UserName\Desktop\New_folder\export") if fnmatch.fnmatch(f, '*.mkv')])
Also, as @Padraic Cunningham mentioned, a more Pythonic way of dealing with file names is the glob module:
list(map(os.path.basename, glob.iglob(pth + "*.mkv")))
You can use glob:
from glob import glob
pth = "C:/Users/UserName/Desktop/New_folder/export/"
print(glob(pth + "*.mkv"))
pth + "*.mkv" will match all the files ending with .mkv.
To just get the basenames you can use map or a list comp with iglob:
from glob import iglob
from os import path
print(list(map(path.basename,iglob(pth+"*.mkv"))))
print([path.basename(f) for f in iglob(pth+"*.mkv")])
iglob returns an iterator so you don't build a list for no reason.
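Because the question mentions os.path.basename and os.path.splitext: both work on path strings, so they become useful once glob has given you the paths. This minimal sketch assumes you want each .mkv file's name both with and without the extension. mkv_names is a hypothetical helper name.

```python
import glob
import os


def mkv_names(folder):
    """Return (file name, name without extension) pairs for the .mkv files in folder."""
    pairs = []
    pattern = os.path.join(glob.escape(folder), "*.mkv")
    for full_path in sorted(glob.glob(pattern)):
        name = os.path.basename(full_path)   # e.g. "movie.mkv"
        stem, _ext = os.path.splitext(name)  # e.g. ("movie", ".mkv")
        pairs.append((name, stem))
    return pairs


# print(mkv_names(r"C:\Users\UserName\Desktop\New_folder\export"))
```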
I assume you're basically asking how to list files in a given directory. What you want is:
import os
print(os.listdir(r"C:\Users\UserName\Desktop\New_folder\export"))
If there are multiple files and you want only the ones that end in .mkv, you could do:
import os
files = os.listdir(r"C:\Users\UserName\Desktop\New_folder\export")
mkv_files = [f for f in files if f.endswith(".mkv")]
print(mkv_files)
If you need a recursive search through subfolders, os.walk will give you the file names; the same loop also gives you each file's directory (path), so you can build its full path with os.path.join(path, filename).
import os, fnmatch
for path, dirs, files in os.walk(os.path.abspath(r"C:/Users/UserName/Desktop/New_folder/export/")):
    for filename in fnmatch.filter(files, "*.mkv"):
        print(filename)
You can use glob:
import glob
import os
for file in glob.glob(r'C:\Users\UserName\Desktop\New_folder\export\*.mkv'):
    print(os.path.basename(file))
This will list all the files with the .mkv extension, e.g.
file.mkv, file2.mkv and so on.
From os.walk you can read the file names as a list of lists (one list per directory):
import os
files = [file_names for _, _, file_names in os.walk(DIRECTORY_PATH)]
for file_name in files[0]:  # files[0] holds only the files in the top-level directory
    print(file_name)

Iterating through directories with Python

I need to iterate through the subdirectories of a given directory and search for files. If I get a file I have to open it and change the content and replace it with my own lines.
I tried this:
import os
rootdir ='C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f=open(file,'r')
        lines=f.readlines()
        f.close()
        f=open(file,'w')
        for line in lines:
            newline = "No you are not"
            f.write(newline)
        f.close()
but I am getting an error. What am I doing wrong?
The actual walk through the directories works as you have coded it. If you replace the contents of the inner loop with a simple print statement you can see that each file is found:
import os
rootdir = 'C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print(os.path.join(subdir, file))
If you still get errors when running the above, please provide the error message.
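The error message isn't shown. The most likely cause is that open(file) receives only the bare file name from os.walk, which Python resolves relative to the current working directory rather than to subdir. Here is a minimal sketch of the loop with that fixed. It assumes, as in the question, that every line of each (text) file should be replaced with the same text, and that each replacement line should end with a newline (the original wrote them without one). replace_lines_in_tree and its new_line parameter are illustrative names.

```python
import os


def replace_lines_in_tree(rootdir, new_line="No you are not\n"):
    """Overwrite every file under rootdir, replacing each line with new_line."""
    for subdir, _dirs, files in os.walk(rootdir):
        for name in files:
            # os.walk gives bare names; join with subdir to get a usable path.
            path = os.path.join(subdir, name)
            with open(path, "r") as f:
                line_count = len(f.readlines())
            # Reopen for writing only after the read handle has been closed.
            with open(path, "w") as f:
                f.write(new_line * line_count)
```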
Another way of returning all files in subdirectories is to use the pathlib module, introduced in Python 3.4, which provides an object-oriented approach to handling filesystem paths (pathlib is also available on Python 2.7 via the pathlib2 module on PyPI):
from pathlib import Path
rootdir = Path('C:/Users/sid/Desktop/test')
# Return a list of regular files only, not directories
file_list = [f for f in rootdir.glob('**/*') if f.is_file()]
# For absolute paths instead of paths relative to the current dir
file_list = [f for f in rootdir.resolve().glob('**/*') if f.is_file()]
Since Python 3.5, the glob module also supports recursive file finding:
import os
from glob import iglob
rootdir_glob = 'C:/Users/sid/Desktop/test/**/*' # Note the added asterisks
# This will return absolute paths
file_list = [f for f in iglob(rootdir_glob, recursive=True) if os.path.isfile(f)]
The file_list from either of the above approaches can be iterated over without the need for a nested loop:
for f in file_list:
    print(f)  # Replace with desired operations
From Python 3.5 onward, you can use ** with glob.iglob(path + '/**', recursive=True), which seems the most Pythonic solution, e.g.:
import glob, os
for filename in glob.iglob('/pardadox-music/**', recursive=True):
    if os.path.isfile(filename):  # filter dirs
        print(filename)
Output:
/pardadox-music/modules/her1.mod
/pardadox-music/modules/her2.mod
...
Notes:
glob.iglob
glob.iglob(pathname, recursive=False)
Return an iterator which yields the same values as glob() without actually storing them all simultaneously.
If recursive is True, the pattern '**' will match any files and
zero or more directories and subdirectories.
If the directory contains files starting with . they won’t be matched by default. For example, consider a directory containing card.gif and .card.gif:
>>> import glob
>>> glob.glob('*.gif')
['card.gif']
>>> glob.glob('.c*')
['.card.gif']
You can also use pathlib's Path.rglob(pattern),
which is the same as calling Path.glob() with "**/" added in front of the given relative pattern.
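Here is a short sketch of rglob. It assumes you want only regular files, so directories whose names happen to match the pattern are skipped. The function name find_files and the default "*.mod" pattern are illustrative, echoing the example output above.

```python
from pathlib import Path


def find_files(root, pattern="*.mod"):
    """Recursively list regular files under root that match pattern."""
    # rglob("*.mod") behaves like glob("**/*.mod"): it searches root and all subdirectories.
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())
```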

read filenames and write directly into a list

Is it possible to get Python to look in a folder and put all of the filenames (with a certain extension) into a list?
e.g.:
[filename1.txt, filename2.txt,...]
You can do this easily with the glob module:
import glob
filenames = glob.glob('<some_path>/*.<extension>')
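I always use the os module for this, and it works well for me (note that os.listdir returns every entry, without filtering by extension):
import os
file_list = os.listdir(path)  # path is the folder you want to list
print(file_list)
# ['file1.txt', 'file2.txt', ...]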
Here's a quick answer I found:
import os
# filter() returns a lazy iterator in Python 3, so wrap it in list() to get a list
txt_files = list(filter(lambda x: x.endswith('.txt'), os.listdir('mydir')))
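The snippets above compare extensions case-sensitively and may also return directories. Here is a sketch that handles both, assuming a case-insensitive match is wanted (as is usual on Windows). files_with_extension is a hypothetical helper.

```python
import os


def files_with_extension(folder, extension=".txt"):
    """Return the names of files in folder whose extension matches, ignoring case."""
    wanted = extension.lower()
    return sorted(
        name
        for name in os.listdir(folder)
        # splitext splits off the last extension: "a.tar.gz" -> ("a.tar", ".gz")
        if os.path.splitext(name)[1].lower() == wanted
        # skip directories that happen to have a matching name
        and os.path.isfile(os.path.join(folder, name))
    )
```

Usage: files_with_extension('mydir') returns a real list, for example ['B.TXT', 'a.txt'] (sorted() puts uppercase letters before lowercase).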

How to get the newest directory in Python

I'm looking for a method that can find the newest directory created inside another directory.
The only method I have is os.listdir(), but it shows all files and directories inside. How can I list only directories, and how can I access the attributes of a directory to find out which one was created most recently?
Thanks
import os
dirs = [d for d in os.listdir('.') if os.path.isdir(d)]
newest = sorted(dirs, key=lambda x: os.path.getctime(x), reverse=True)[:1]  # a list with at most one name
Update:
Maybe some more explanation:
[d for d in os.listdir('.') if os.path.isdir(d)]
is a list comprehension. You can read more about them in the Python tutorial.
The code does the same as
dirs = []
for d in os.listdir('.'):
    if os.path.isdir(d):
        dirs.append(d)
would do, but the list comprehension is considered more readable.
sorted() is a built-in function; the Sorting HOW TO in the Python documentation has examples.
The code I showed sorts all elements within dirs by os.path.getctime(ELEMENT) in reverse.
The result is again a list, which can of course be accessed using [index] syntax and slicing.
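The same idea can be written with max(), which finds the newest entry in a single pass instead of sorting, and os.scandir(), which can often tell directories apart without an extra os.path.isdir() call per name. This is a minimal sketch. newest_subdir is a hypothetical name, and it returns None when there are no subdirectories.

```python
import os


def newest_subdir(parent):
    """Return the name of the subdirectory of parent with the largest st_ctime, or None."""
    with os.scandir(parent) as entries:
        # follow_symlinks=False skips symlinks that point at directories.
        subdirs = [e for e in entries if e.is_dir(follow_symlinks=False)]
        if not subdirs:
            return None
        # st_ctime is the creation time on Windows but the last metadata change on Unix.
        return max(subdirs, key=lambda e: e.stat().st_ctime).name
```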
Here's a little function I wrote to return the name of the newest directory:
#!/usr/bin/env python
import os
import glob
import operator
def findNewestDir(directory):
    os.chdir(directory)  # note: this changes the working directory of the whole script
    dirs = {}
    for dir in glob.glob('*'):
        if os.path.isdir(dir):
            dirs[dir] = os.path.getctime(dir)
    lister = sorted(dirs.items(), key=operator.itemgetter(1))
    return lister[-1][0]
print("The newest directory is", findNewestDir('/Users/YOURUSERNAME/Sites'))
The following Python code prints each directory in the current folder together with its ctime, so you can compare them:
import os
import glob
for dir in glob.glob('*'):
    if os.path.isdir(dir):
        print(dir, ":", os.path.getctime(dir))
Check out os.walk and the examples in the docs for an easy way to get directories.
root, dirs, files = next(os.walk('/your/path'))
Then check out os.path.getctime which depending on your os may be creation or modification time. If you are not already familiar with it, you will also want to read up on os.path.join.
os.path.getctime(path) Return the system’s ctime which, on some systems (like Unix) is the time of the last change, and, on others (like Windows), is the creation time for path.
max((os.path.getctime(os.path.join(root, f)), f) for f in dirs)

How do you get a directory listing sorted by creation date in python?

What is the best way to get a list of all files in a directory, sorted by date [created | modified], using python, on a windows machine?
I've done this in the past for a Python script to determine the last updated files in a directory:
import glob
import os
search_dir = "/mydir/"
# keep only regular files (this drops directories; note that os.path.isfile()
# follows symlinks, so symlinks to files are kept)
# thanks to J.F. Sebastian for pointing out that the requirement was a list
# of files (presumably not including directories)
files = list(filter(os.path.isfile, glob.glob(search_dir + "*")))
files.sort(key=lambda x: os.path.getmtime(x))
That should do what you're looking for based on file mtime.
EDIT: Note that you can also use os.listdir() in place of glob.glob() if desired. I used glob in my original code because I wanted to search only for files with a particular set of file extensions, which glob() is better suited to. To use listdir, here's what it would look like:
import os
search_dir = "/mydir/"
os.chdir(search_dir)
files = filter(os.path.isfile, os.listdir(search_dir))
files = [os.path.join(search_dir, f) for f in files] # add path to each file
files.sort(key=lambda x: os.path.getmtime(x))
Update: to sort dirpath's entries by modification date in Python 3:
import os
from pathlib import Path
paths = sorted(Path(dirpath).iterdir(), key=os.path.getmtime)
(put @Pygirl's answer here for greater visibility)
If you already have a list of filenames files, then to sort it in place by creation time on Windows (make sure that the list contains absolute paths):
files.sort(key=os.path.getctime)
The list of files you could get, for example, using glob as shown in @Jay's answer.
Old answer:
Here's a more verbose version of @Greg Hewgill's answer. It conforms most closely to the question's requirements. It makes a distinction between creation and modification dates (at least on Windows).
#!/usr/bin/env python
from stat import S_ISREG, ST_CTIME, ST_MODE
import os, sys, time
# path to the directory (relative or absolute)
dirpath = sys.argv[1] if len(sys.argv) == 2 else r'.'
# get all entries in the directory w/ stats
entries = (os.path.join(dirpath, fn) for fn in os.listdir(dirpath))
entries = ((os.stat(path), path) for path in entries)
# leave only regular files, insert creation date
entries = ((stat[ST_CTIME], path)
           for stat, path in entries if S_ISREG(stat[ST_MODE]))
# NOTE: on Windows `ST_CTIME` is a creation date
#  but on Unix it could be something else
# NOTE: use `ST_MTIME` to sort by a modification date
for cdate, path in sorted(entries):
    print(time.ctime(cdate), os.path.basename(path))
Example:
$ python stat_creation_date.py
Thu Feb 11 13:31:07 2009 stat_creation_date.py
There is an os.path.getmtime function that gives the modification time as the number of seconds since the epoch.
It is a convenience wrapper around os.stat(path).st_mtime, so it is shorter to write rather than faster:
import os
os.chdir(directory)
sorted(filter(os.path.isfile, os.listdir('.')), key=os.path.getmtime)
Here's my version:
import os
def getfiles(dirpath):
    a = [s for s in os.listdir(dirpath)
         if os.path.isfile(os.path.join(dirpath, s))]
    a.sort(key=lambda s: os.path.getmtime(os.path.join(dirpath, s)))
    return a
First, we build a list of the file names. isfile() is used to skip directories; it can be omitted if directories should be included. Then, we sort the list in-place, using the modify date as the key.
Here's a one-liner:
import os
import time
from pprint import pprint
pprint([(x[0], time.ctime(x[1].st_ctime)) for x in sorted([(fn, os.stat(fn)) for fn in os.listdir(".")], key = lambda x: x[1].st_ctime)])
This calls os.listdir() to get a list of the filenames, then calls os.stat() for each one to get the creation time, then sorts against the creation time.
Note that this method calls os.stat() only once per file, which is more efficient than calling it for each comparison in a sort. (Passing key= to sorted(), as in the other answers, also calls the key function only once per element.)
In python 3.5+
from pathlib import Path
sorted(Path('.').iterdir(), key=lambda f: f.stat().st_mtime)
Without changing directory:
import os
path = '/path/to/files/'
name_list = os.listdir(path)
full_list = [os.path.join(path, i) for i in name_list]
time_sorted_list = sorted(full_list, key=os.path.getmtime)
print(time_sorted_list)
# if you want just the filenames sorted, simply remove the dir from each
sorted_filename_list = [os.path.basename(i) for i in time_sorted_list]
print(sorted_filename_list)
from pathlib import Path
import os
sorted(Path('./').iterdir(), key=lambda t: t.stat().st_mtime)
or
sorted(Path('./').iterdir(), key=os.path.getmtime)
or
sorted(os.scandir('./'), key=lambda t: t.stat().st_mtime)
where mtime is the modification time.
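If you need only the few most recently modified files rather than the full sorted list, heapq.nlargest() can pick them without sorting every entry. This is a sketch under that assumption. most_recent_files and its n parameter are hypothetical.

```python
import heapq
import os


def most_recent_files(folder, n=5):
    """Return the names of the n most recently modified regular files in folder."""
    with os.scandir(folder) as entries:
        files = [e for e in entries if e.is_file()]
    # nlargest avoids a full sort when n is small; the result is ordered newest first.
    newest = heapq.nlargest(n, files, key=lambda e: e.stat().st_mtime)
    return [e.name for e in newest]
```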
Here's my answer using glob without filter if you want to read files with a certain extension in date order (Python 3).
import glob
import os
dataset_path = '/mydir/'
files = glob.glob(dataset_path+"/morepath/*.extension")
files.sort(key=os.path.getmtime)
# *** the shortest and best way ***
# getmtime --> sort by modified time
# getctime --> sort by created time
import glob,os
lst_files = glob.glob("*.txt")
lst_files.sort(key=os.path.getmtime)
print("\n".join(lst_files))
sorted(filter(os.path.isfile, os.listdir('.')),
key=lambda p: os.stat(p).st_mtime)
You could use next(os.walk('.'))[-1] instead of filtering with os.path.isfile, but that leaves dead symlinks in the list, and os.stat will fail on them.
For completeness, here is a version with os.scandir (reported to be about 2x faster than pathlib, since each os.DirEntry caches its stat() result):
import os
sorted(os.scandir('/tmp/test'), key=lambda d: d.stat().st_mtime)
Here is a basic version for learning purposes:
import os, sys
import time
dirpath = sys.argv[1] if len(sys.argv) == 2 else r'.'
for name in os.listdir(dirpath):
    full_path = os.path.realpath(os.path.join(dirpath, name))
    st = os.stat(full_path)
    print(time.ctime(st.st_ctime), full_path)
Alex Coventry's answer will raise an exception if a file is a symlink to a nonexistent file; the following code handles that case:
import os
import time
sorted(filter(os.path.isfile, os.listdir('.')),
       key=lambda p: os.stat(p).st_mtime if os.path.exists(p) else time.time())
When the file doesn't exist, the current time is used, so the symlink goes at the very end of the list.
This was my version:
import os
folder_path = r'D:\Movies\extra\new\dramas'  # your path
os.chdir(folder_path)  # make the path the current directory
x = sorted(os.listdir(), key=os.path.getctime)  # sorted by creation time (on Windows)
for name in x:
    print(name)  # print every entry (files and folders) inside folder_path
Here are a few lines that filter by extension and also provide a sort option:
import os
import re
def get_sorted_files(src_dir, regex_ext='.*', sort_reverse=False):
    # regex_ext is a regular expression for the extension, e.g. 'txt' or 'txt|csv'
    files_to_evaluate = [os.path.join(src_dir, f) for f in os.listdir(src_dir) if re.search(r'.*\.({})$'.format(regex_ext), f)]
    files_to_evaluate.sort(key=os.path.getmtime, reverse=sort_reverse)
    return files_to_evaluate
Put the directory/folder in the path; to restrict the results to a specific file type, add its extension to the pattern. You then get the file names in chronological order.
This works for me.
import glob, os
# file_location and date_file are your own folder names
path = os.path.expanduser(os.path.join(file_location, date_file))
os.chdir(path)
saved_file = glob.glob('*.xlsx')
saved_file.sort(key=os.path.getmtime)
print(saved_file)
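Be careful with relying on the order of os.listdir: the documentation says it returns entries in arbitrary order, not sorted by modification time, so reversing it like this is not a reliable way to get the most recently modified files:
import os
last_modified = os.listdir()[::-1]  # NOT guaranteed to be in modification order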
Maybe you should use shell commands. In Unix/Linux, find piped with sort will probably be able to do what you want.
