I want to read the folders in a directory in Python and put them in a list. My main concern is that the most recent folder should end up at a position that is known to me, either the first or the last element of the list. I am attaching an image showing the folder names. I want the folder named 20181005 to be either first or last in the list.
I have tried this with os.listdir, but I am not confident about the order in which this function reads the folders and stores them in the list. Does it keep the order on disk, or does it use the creation date or modification date? If I could sort on the basis of the name (20181005 etc.), that would be really good.
Kindly suggest a suitable method for this.
Regards
os.listdir returns directory contents in arbitrary order, but you can sort that yourself:
import os
l = sorted(os.listdir())
Since it seems that your folder names are ISO dates, they should sort correctly and the most recent one should be the last element after sorting.
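As a minimal sketch of that idea (the function name dated_folders is made up for illustration, and it assumes every folder of interest is named with exactly eight digits in YYYYMMDD form), the following keeps only directories whose names are valid dates and returns them oldest first:

```python
import os
from datetime import datetime


def dated_folders(path="."):
    """Return sub-folder names that are valid YYYYMMDD dates, oldest first."""
    names = []
    with os.scandir(path) as entries:
        for entry in entries:
            name = entry.name
            # Only consider directories with exactly eight digits in the name.
            if not (entry.is_dir() and len(name) == 8 and name.isdigit()):
                continue
            try:
                datetime.strptime(name, "%Y%m%d")
            except ValueError:
                continue  # eight digits, but not a real calendar date
            names.append(name)
    # Zero-padded YYYYMMDD strings sort chronologically as plain text.
    return sorted(names)
```

With this, dated_folders(".")[-1] would be the most recent folder, provided the returned list is not empty.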
If you need the creation and modification times, you can get them with the os.path functions (note that on Unix, getctime returns the time of the last metadata change rather than the creation time). If you want to sort by them, I would probably put the data in something like a pandas DataFrame to make it easier to manipulate.
import os
from datetime import datetime
import pandas as pd
path = "."
objects = os.listdir(path)
dirs = list()
for o in objects:
    opath = os.path.join(path, o)
    # Keep directories only, with their timestamps as datetime objects
    if os.path.isdir(opath):
        dirs.append(dict(path=opath,
                         mtime=datetime.fromtimestamp(os.path.getmtime(opath)),
                         ctime=datetime.fromtimestamp(os.path.getctime(opath))))
data = pd.DataFrame(dirs)
# sort_values returns a new DataFrame, so keep the result
data = data.sort_values(by='mtime')
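If pandas is not otherwise needed, the same ordering can be sketched with the standard library alone. The helper name dirs_by_mtime is hypothetical; it reads each directory's modification time through os.scandir and returns the paths oldest first:

```python
import os


def dirs_by_mtime(path="."):
    """Return the paths of sub-directories of `path`, oldest modification first."""
    with os.scandir(path) as entries:
        # Pair each directory with its modification time so the pairs sort by time.
        stamped = [(entry.stat().st_mtime, entry.path)
                   for entry in entries if entry.is_dir()]
    return [p for _, p in sorted(stamped)]
```

The most recently modified directory is then the last element of the returned list.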
Assuming your directories are named in YYYYMMDD format, you can use listdir and sort to get the latest directory at the last index.
import os
from os import listdir
mypath = 'D:\\anil'
list_dirs = []
for f in listdir(mypath):
    if os.path.isdir(os.path.join(mypath, f)):
        list_dirs.append(f)
list_dirs.sort()
for current_dir in list_dirs:
    print(current_dir)
I have written code that creates a dictionary storing the absolute paths of all folders in the current path as keys, and the file names inside each folder as the corresponding values. This code would only be applied to paths whose folders contain nothing but image files. Here:
import os
import re
# Main method
the_dictionary_list = {}
for name in os.listdir("."):
    if os.path.isdir(name):
        path = os.path.abspath(name)
        print(f'\u001b[45m{path}\033[0m')
        match = re.match(r'/(?:[^\\])[^\\]*$', path)
        print(match)
        list_of_file_contents = os.listdir(path)
        print(f'\033[46m{list_of_file_contents}')
        the_dictionary_list[path] = list_of_file_contents
        print('\n')
print('\u001b[43mthe_dictionary_list:\033[0m')
print(the_dictionary_list)
The thing is that I want this dictionary to store only the last folder name of each path as the key instead of the absolute path, so I was planning to use the regular expression /(?:[^\\])[^\\]*$, which should extract the last name (of a file or folder) from a given path, and then add those names as keys to the dictionary in the for loop.
I wanted to test the code above first to see if it was doing what I wanted, but it was not: the value of the match variable was None in every iteration, which did not make sense to me. Everything else works fine.
So I would like to know what I'm doing wrong here.
I would highly recommend using the standard-library module pathlib. It would appear you are interested in the f.name part. Here is a cheat sheet.
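A likely reason the match was always None is that re.match only tries to match at the very start of the string, so a pattern beginning with "/" cannot match a path that starts with anything else (re.search would be needed to find something near the end). With pathlib no regular expression is required. The sketch below uses a hypothetical helper, folder_contents, and assumes each sub-folder's name is the desired key:

```python
from pathlib import Path


def folder_contents(root="."):
    """Map each sub-folder's name (not its full path) to its sorted entry names."""
    result = {}
    for sub in Path(root).iterdir():
        if sub.is_dir():
            # sub.name is the last path component, e.g. "cats" for /x/y/cats
            result[sub.name] = sorted(child.name for child in sub.iterdir())
    return result
```

Calling folder_contents(".") from the directory that holds the image folders would give a dictionary keyed by folder name only.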
I decided to rewrite the code above so that it applies only to the current directory (the one where the program is located).
import os
# Main method
the_dictionary_list = {}
for subdir in os.listdir("."):
    if os.path.isdir(subdir):
        path = os.path.abspath(subdir)
        print(f'\u001b[45m{path}\033[0m')
        list_of_file_contents = os.listdir(path)
        print(f'\033[46m{list_of_file_contents}')
        the_dictionary_list[subdir] = list_of_file_contents
        print('\n')
print('\033[1;37;40mThe dictionary list:\033[0m')
for subdir in the_dictionary_list:
    print('\u001b[43m'+subdir+'\033[0m')
    for archivo in the_dictionary_list[subdir]:
        print(" ", archivo)
print('\n')
print(the_dictionary_list)
This is useful when the user wants to run the program with a double click in a specific location (my personal case).
I have a directory containing a large number of files. I want to find all files, where the file name contains specific strings (e.g. a certain ending like '.txt', a model ID 'model_xy', etc.) as well as one of the entries in an integer array (e.g. a number of years I would like to select).
I tried this the following way:
import numpy as np
import glob
startyear = 2000
endyear = 2005
timerange = str(np.arange(startyear,endyear+1))
data_files = []
for file in glob.glob('/home/..../*model_xy*'+timerange+'.txt'):
    data_files.append(file)
print(data_files)
Unfortunately, like this, other files outside of my 'timerange' are still selected.
glob.glob does not support full regular expressions, but it does support shell-style character classes such as [0-5]. Moreover, glob.glob returns a list, so you don't need to iterate through it and append to a new list.
import glob
data_files = glob.glob("/home/..../*model_xy*200[0-5].txt")
# For a recursive search, use ** together with recursive=True.
# The call below searches for matching files anywhere under /home/.
data_files = glob.glob("/home/**/*model_xy*200[0-5].txt", recursive=True)
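The original attempt probably fails because str(np.arange(2000, 2006)) is the text '[2000 2001 2002 2003 2004 2005]', which glob reads as a character class matching a single character. A character class such as 200[0-5] works for this range, but it does not generalize to ranges like 1998 to 2012. One possible sketch for arbitrary ranges is to glob broadly and filter afterwards. The function name files_for_years is hypothetical, and the code assumes the four-digit year sits directly before ".txt", as in the question's pattern:

```python
import glob
import os
import re


def files_for_years(pattern, start_year, end_year):
    """Return paths matching `pattern` whose trailing year is in the given range."""
    years = {str(y) for y in range(start_year, end_year + 1)}
    # Assumption: the file name ends with a four-digit year followed by ".txt".
    year_at_end = re.compile(r"(\d{4})\.txt$")
    selected = []
    for path in glob.glob(pattern):
        found = year_at_end.search(os.path.basename(path))
        if found and found.group(1) in years:
            selected.append(path)
    return sorted(selected)
```

A call might look like files_for_years(os.path.join(some_dir, "*model_xy*.txt"), 2000, 2005), where some_dir stands for the directory that holds the data files.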
Basically what the title says: what is the best approach to do this?
I was looking at a few tools like os.walk and scandir, but I am not sure how I would store the results and decide which file to open if there are multiple matches. I was thinking I would need to store them in a dictionary and then choose the numbered item I want.
You can use
list_of_files = os.listdir(some_directory)
which returns a list of the names of the entries in that directory. You can then easily add some of these names to a dictionary keyed by their index in this list.
Here is a function that implements the specifications you have outlined. It may require some tinkering as your specs evolve, but it's an OK start. See the docs for the standard-library os module for more info :)
import os
def my_files_dict(directory, filename):
    myfilesdict = []
    with os.scandir(directory) as myfiles:
        for f in myfiles:
            # is_file is a method, so it must be called
            if f.name == filename and f.is_file():
                myfilesdict.append(f.name)
    return dict(enumerate(myfilesdict))
I have a lot of files in a directory with name like:
'data_2000151_avg.txt', 'data_2000251_avg.txt', 'data_2003051_avg.txt'...
Assume that one of them is called fname. I would like to extract a subset from each like so:
fname.split('_')[1][:4]
This gives 2000 as a result. I want to collect these values from all the files in the directory and create a list of the unique ones. How do I do that?
You should use os.
import os
dirname = 'PathToFile'
myuniquelist = []
for d in os.listdir(dirname):
    # The files in the question are named like 'data_2000151_avg.txt'
    if d.startswith('data_'):
        myuniquelist.append(d.split('_')[1][:4])
EDIT: Just saw your comment on wanting a set. After the for loop add this line.
myuniquelist = list(set(myuniquelist))
If unique list means a list of unique values, then a combination of glob (in case the folder contains files that do not match the desired name format) and set should do the trick:
from glob import glob
uniques = {fname.split('_')[1][:4] for fname in glob('data_*_avg.txt')}
# In case you really do want a list
unique_list = list(uniques)
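This assumes the files reside in the current working directory. Otherwise, prepend the path in the pattern, e.g. glob('path/to/data_*_avg.txt'); in that case, split os.path.basename(fname) rather than fname so that underscores in directory names do not interfere.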
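A sketch of the variant for files in another directory follows. The helper name unique_years is hypothetical; it takes the base name of each match before splitting, so underscores in the directory path cannot shift the index:

```python
import os
from glob import glob


def unique_years(directory):
    """Return the sorted unique 4-character prefixes of the second '_' field."""
    pattern = os.path.join(directory, "data_*_avg.txt")
    # basename() drops the directory part, which may itself contain underscores.
    return sorted({os.path.basename(p).split("_")[1][:4] for p in glob(pattern)})
```

The result is sorted only to make it deterministic; a plain set works just as well when the order does not matter.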
To list the files in a directory you can use os.listdir(). To generate the unique values, a set comprehension is the best fit.
import os
data = {f.split('_')[1][:4] for f in os.listdir(dir_path)}
list(data) #if you really need a list
I have this line of code in my Python script. It searches all the files in a particular directory for *cycle*.log.
for searchedfile in glob.glob("*cycle*.log"):
This works perfectly. However, when I run my script on a network location, it does not go through the files in order and instead processes them in a seemingly random order.
Is there a way to force the code to search by date order?
This question has been asked for php but I am not sure of the differences.
Thanks
To sort files by date:
import glob
import os
files = glob.glob("*cycle*.log")
files.sort(key=os.path.getmtime)
print("\n".join(files))
See also Sorting HOW TO.
Essentially the same as @jfs, but in one line using sorted:
import os,glob
searchedfiles = sorted(glob.glob("*cycle*.log"), key=os.path.getmtime)
Well. The answer is nope. glob uses os.listdir which is described by:
"Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory."
So you are actually lucky that you got it sorted. You need to sort it yourself.
This works for me:
import glob
import os
import time
searchedfile = glob.glob("*.cpp")
files = sorted( searchedfile, key = lambda file: os.path.getctime(file))
for file in files:
    print("{} - {}".format(file, time.ctime(os.path.getctime(file))) )
Also note that this uses getctime, which is the creation time on Windows but the time of the last metadata change on Unix. If you want the modification time, use getmtime instead.
If your paths are in sortable order then you can always sort them as strings (as others have already mentioned in their answers).
However, if your paths use a date/time format like %d.%m.%Y, it becomes a bit more involved. Since strptime does not support wildcards, we developed a module, datetime-glob, to parse the date/times from paths that include wildcards.
Using datetime-glob, you could walk through the tree, list a directory, parse the date/times and sort them as tuples (date/time, path).
From the module's test cases:
import pathlib
import tempfile
import datetime_glob
def test_sort_listdir(self):
    with tempfile.TemporaryDirectory() as tempdir:
        pth = pathlib.Path(tempdir)
        (pth / 'some-description-20.3.2016.txt').write_text('tested')
        (pth / 'other-description-7.4.2016.txt').write_text('tested')
        (pth / 'yet-another-description-1.1.2016.txt').write_text('tested')
        matcher = datetime_glob.Matcher(pattern='*%-d.%-m.%Y.txt')
        subpths_matches = [(subpth, matcher.match(subpth.name)) for subpth in pth.iterdir()]
        dtimes_subpths = [(mtch.as_datetime(), subpth) for subpth, mtch in subpths_matches]
        subpths = [subpth for _, subpth in sorted(dtimes_subpths)]
        # yapf: disable
        expected = [
            pth / 'yet-another-description-1.1.2016.txt',
            pth / 'some-description-20.3.2016.txt',
            pth / 'other-description-7.4.2016.txt'
        ]
        # yapf: enable
        self.assertListEqual(subpths, expected)
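For comparison, here is a rough standard-library sketch of the same ordering. It is not how datetime-glob works internally; it swaps in a regular expression plus datetime.strptime, and it assumes the date is always the last part of the file name before ".txt" in day.month.year form. The names DATE_AT_END and sort_by_embedded_date are made up for illustration:

```python
import re
from datetime import datetime
from pathlib import Path

# Assumption: names end in "<day>.<month>.<year>.txt", e.g. "report-7.4.2016.txt".
DATE_AT_END = re.compile(r"(\d{1,2}\.\d{1,2}\.\d{4})\.txt$")


def sort_by_embedded_date(directory):
    """Return paths in `directory` ordered by the date embedded in their names."""
    dated = []
    for path in Path(directory).iterdir():
        found = DATE_AT_END.search(path.name)
        if not found:
            continue  # no date in the expected position
        try:
            when = datetime.strptime(found.group(1), "%d.%m.%Y")
        except ValueError:
            continue  # digits in the right shape, but not a valid date
        dated.append((when, path))
    return [path for _, path in sorted(dated)]
```

Files without a parsable date are skipped silently here; depending on the use case, raising an error instead might be preferable.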
Using glob alone, no. As you're using it right now, glob returns all the matching file names at once and has no option for ordering them. If only the final result is important, you could use a second loop that checks each file's date and re-sorts based on that. If the order in which files are processed matters, glob on its own is probably not the best way to do this.
You can sort the list of files that come back using os.path.getmtime or os.path.getctime. See this other SO answer and note the comments as well.