Glob search files in date order? - python

I have this line of code in my Python script. It searches all the files in a particular directory for *cycle*.log.
for searchedfile in glob.glob("*cycle*.log"):
This works perfectly; however, when I run my script against a network location it does not search the files in order but seemingly at random.
Is there a way to force the code to search by date order?
This question has been asked for php but I am not sure of the differences.
Thanks

To sort files by date:
import glob
import os
files = glob.glob("*cycle*.log")
files.sort(key=os.path.getmtime)
print("\n".join(files))
See also Sorting HOW TO.
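If you want the newest file first instead, the same mtime key works with reverse=True. A small self-contained sketch (the file names here are made up for the demonstration):

```python
import glob
import os
import tempfile
import time

# Create three sample log files with increasing modification times,
# then sort them newest-first by adding reverse=True to the mtime sort.
with tempfile.TemporaryDirectory() as d:
    for name in ("a_cycle_1.log", "b_cycle_2.log", "c_cycle_3.log"):
        with open(os.path.join(d, name), "w") as f:
            f.write("x")
        time.sleep(0.05)  # ensure distinct modification times
    files = sorted(glob.glob(os.path.join(d, "*cycle*.log")),
                   key=os.path.getmtime, reverse=True)
    names = [os.path.basename(p) for p in files]
print(names)
```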

Essentially the same as @jfs's answer, but in one line using sorted:
import os,glob
searchedfiles = sorted(glob.glob("*cycle*.log"), key=os.path.getmtime)

Well, the answer is no. glob uses os.listdir, which is documented as:
"Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory."
So you were actually lucky whenever you got a sorted result; you need to sort it yourself.
This works for me:
import glob
import os
import time
searchedfile = glob.glob("*.cpp")
files = sorted(searchedfile, key=lambda file: os.path.getctime(file))
for file in files:
    print("{} - {}".format(file, time.ctime(os.path.getctime(file))))
Also note that this sorts by getctime, which is creation time on Windows but metadata-change time on Unix; if you want modification time, use getmtime instead.

If your paths are in sortable order then you can always sort them as strings (as others have already mentioned in their answers).
However, if your paths use a datetime format like %d.%m.%Y, it becomes a bit more involved. Since strptime does not support wildcards, we developed a module datetime-glob to parse the date/times from paths that include wildcards.
Using datetime-glob, you could walk through the tree, list a directory, parse the date/times and sort them as tuples (date/time, path).
From the module's test cases:
import pathlib
import tempfile
import datetime_glob
def test_sort_listdir(self):
    with tempfile.TemporaryDirectory() as tempdir:
        pth = pathlib.Path(tempdir)
        (pth / 'some-description-20.3.2016.txt').write_text('tested')
        (pth / 'other-description-7.4.2016.txt').write_text('tested')
        (pth / 'yet-another-description-1.1.2016.txt').write_text('tested')

        matcher = datetime_glob.Matcher(pattern='*%-d.%-m.%Y.txt')
        subpths_matches = [(subpth, matcher.match(subpth.name)) for subpth in pth.iterdir()]
        dtimes_subpths = [(mtch.as_datetime(), subpth) for subpth, mtch in subpths_matches]
        subpths = [subpth for _, subpth in sorted(dtimes_subpths)]

        # yapf: disable
        expected = [
            pth / 'yet-another-description-1.1.2016.txt',
            pth / 'some-description-20.3.2016.txt',
            pth / 'other-description-7.4.2016.txt'
        ]
        # yapf: enable
        self.assertListEqual(subpths, expected)

Using glob, no. As you are using it, glob simply returns whatever matches in the order the operating system lists them, and it has no option for ordering the results. If only the final result matters, you can make a second pass that checks each file's date and re-sorts on that. If the traversal order itself matters, glob is probably not the best tool for the job.
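That second pass can be sketched like this: pair each match with its date in a loop, then sort the pairs (the pattern is the one from the question):

```python
import glob
import os

# Pair each matching file with its modification time, then sort the pairs;
# tuples sort by their first element, so this orders files oldest-first.
dated = []
for path in glob.glob("*cycle*.log"):
    dated.append((os.path.getmtime(path), path))
dated.sort()
ordered = [path for _, path in dated]
```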

You can sort the list of files that come back using os.path.getmtime or os.path.getctime. See this other SO answer and note the comments as well.
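For instance (a sketch; note that getctime means creation time on Windows but metadata-change time on Unix):

```python
import glob
import os

files = glob.glob("*cycle*.log")
by_mtime = sorted(files, key=os.path.getmtime)  # last-modified order
by_ctime = sorted(files, key=os.path.getctime)  # creation / inode-change order
```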


Select a random file fairly from nested directories

I have a file structure that looks something like this
main_directory
+-subdirectory1
+-file1
+-file2
+-file3
+-subdirectory2
+-file4
+-file5
+-subdirectory3
+-file6
I want to write a function, that gets the path to main_directory and returns one of the files at random, but with each file being equally likely. The nesting level of all files is the same and I can know in advance how deeply they are nested, though ideally I'd prefer to have a solution that works for all cases.
I know that I can use random.choice(os.listdir("/path/to/main_directory")) to get a random subdirectory, and I could repeat that recursively until I have a file or something, but that would, for example, make file6 a lot more likely than all the other files.
glob does the recursion for you when you include /**/ in the pattern and pass recursive=True.
from glob import glob
from random import choice
random_file = choice(glob(f'{main_directory}/**/*', recursive=True))
If you want to be more specific, use a more specific glob pattern, such as **/*.log, or filter the list of files glob() returns in another way.
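One such filter is os.path.isfile, which drops the directories that **/* also matches. A self-contained sketch that rebuilds the tree from the question in a temporary directory:

```python
import glob
import os
import tempfile
from random import choice

# Recreate the layout from the question, then choose uniformly among the
# files only: isfile() filters out the subdirectories that glob also matches.
with tempfile.TemporaryDirectory() as main_directory:
    layout = {"subdirectory1": ["file1", "file2", "file3"],
              "subdirectory2": ["file4", "file5"],
              "subdirectory3": ["file6"]}
    for sub, names in layout.items():
        os.makedirs(os.path.join(main_directory, sub))
        for name in names:
            open(os.path.join(main_directory, sub, name), "w").close()
    candidates = [p for p in glob.glob(f"{main_directory}/**/*", recursive=True)
                  if os.path.isfile(p)]
    random_file = choice(candidates)
```

Every file appears exactly once in candidates, so each is equally likely regardless of how the subdirectories are populated.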
nesting level of all files is the same and I can know in advance how deeply they are nested
In such case you might harness glob.glob for example if files are just in subdirectories of current working directory you might do:
import glob
import os
import random
files = [i for i in glob.glob("*/*") if os.path.isfile(i)]
randomfile = random.choice(files)

prevent getfiles from seeing .DS and other hidden files

I am currently working on a Python project on my Macintosh, and from time to time I get unexpected errors because .DS or other files that are not visible to the not-root user are found in folders. I am using the following command
filenames = getfiles.getfiles(file_directory)
to retrieve information about the number and names of the files in a folder. So I was wondering if there is a possibility to prevent the getfiles command from seeing these types of files, for example by limiting its rights or the extensions it can see (all files are of .txt format).
Many thanks in advance!
In your case, I would recommend switching to the Python standard library module glob.
If all files are of .txt format and they are located in the directory /sample/directory/, you can use the following script to get the list of files you want.
import glob
filenames = glob.glob("/sample/directory/*.txt")
You can easily use wildcard patterns to match the files you want and filter out the ones you do not need. More details can be found here.
Keep in mind that with pattern matching you can handle much more complicated cases than the above example as your needs grow.
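To be precise, glob patterns are shell-style wildcards rather than full regular expressions; the fnmatch module applies the same matching to plain name lists and can translate a pattern into its regex form when wildcards alone are not expressive enough (the file names below are made up):

```python
import fnmatch
import re

names = ["notes.txt", ".DS_Store", "data.csv", "report.txt"]
# fnmatch.filter applies a glob-style pattern to any list of names...
txt_files = fnmatch.filter(names, "*.txt")
# ...and translate() yields the underlying regular expression, which you
# can then extend for more complex matching.
pattern = re.compile(fnmatch.translate("*.txt"))
```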
Another good example of using glob to glob multiple extensions can be found from Here.
If you only want to get the basenames of those files, you can always use standard library os to extract basenames from the full paths.
import os
file_basenames = [os.path.basename(full_path) for full_path in filenames]
There isn't an option to filter within getfiles, but you could filter the list after.
Most-likely you will want to skip all "dot files" ("system files", those with a leading .), which you can accomplish with code like the following.
filenames = [f for f in getfiles.getfiles(file_directory) if not os.path.basename(f).startswith('.')]
Welcome to Stackoverflow.
You might find the glob module useful. The glob.glob function takes a path including wildcards and returns a list of the filenames that match.
This would allow you to either select the files you want, like
filenames = glob.glob(os.path.join(file_directory, "*.txt"))
Alternatively, select the files you don't want, and ignore them:
exclude_files = glob.glob(os.path.join(file_directory, ".*"))
for filename in getfiles.getfiles(file_directory):
    if filename in exclude_files:
        continue
    # process the file

Is there a way to be able to use a variable path using os

The goal is to walk through a path that is half fixed and half variable.
I am trying to run through a path (down to the lowest folder, which is called Archive) and fill a list with the files that have a certain ending. This works quite well for a fixed path such as this:
fileInPath = '\\server123456789\provider\COUNTRY\CATEGORY\Archive'
My code runs through the path (recursive) and lists all files that have a certain ending. This works well. For simplicity I will just print the file name in the following code.
import csv
import os
fileInPath = '\\\\server123456789\\provider\\COUNTRY\\CATEGORY\\Archive'
fileOutPath = 'some path'
csvSeparator = ';'
fileList = []
for subdir, dirs, files in os.walk(fileInPath):
    for file in files:
        if file[-3:].upper() == 'PAR':
            print(file)
The problem is that I cannot manage to make COUNTRY and CATEGORY variable, e.g. by using *.
The standard library module pathlib provides a simple way to do this.
Your file list can be obtained with
from pathlib import Path
list(Path("//server123456789/provider/").glob("*/*/Archive/*.PAR"))
Note I'm using / instead of \\; pathlib handles the conversion for you on Windows.

How can I read files with similar names on python, rename them and then work with them?

I've already posted here with the same question, but sadly I couldn't come up with a solution (even though some of you gave me awesome answers, most of them weren't what I was looking for), so I'll try again, this time giving more information about what I'm trying to do.
So, I'm using a program called GMAT to get some outputs (.txt files with numerical values). These outputs have different names, but because I'm using them to more than one thing I'm getting something like this:
GMATd_1.txt
GMATd_2.txt
GMATf_1.txt
GMATf_2.txt
Now, what I need to do is to use these outputs as inputs in my code. I need to work with them in other functions of my script, and since I will have a lot of these .txt files I want to rename them as I don't want to use them like './path/etc'.
So what I wanted was to write a loop that could get these files and rename them inside the script so I can use these files with the new name in other functions (outside the loop).
So instead of having to this individually:
GMATds1= './path/GMATd_1.txt'
GMATds2= './path/GMATd_2.txt'
I wanted to write a loop that would do that for me.
I've already tried using a dictionary:
import os
import fnmatch
examples = {}
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'thing*.txt'):
        examples[filename[:6]] = filename
This does work but I can't use the dictionary key outside the loop.
If I understand correctly, you are trying to fetch files with similar names (at least a recurring pattern) and rename them. This can be accomplished with the following code:
import glob
import os
all_files = glob.glob('path/to/directory/with/files/GMAT*.txt')
for file in all_files:
    new_path = create_new_path(file)  # possibly split the file name, change directory and/or filename
    os.rename(file, new_path)
The glob library allows for searching files with * wildcards and makes it hence possible to search for files with a specific pattern. It lists all the files in a certain directory (or multiple directories if you include a * wildcard as a directory). When you iterate over the files, you could either directly work with the input of the files (as you apparently intend to do) or rename them as shown in this snippet. To rename them, you would need to generate a new path - so you would have to write the create_new_path function that takes the old path and creates a new one.
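create_new_path is not a library function but a placeholder you would write yourself. One hypothetical version, which drops the first underscore so that GMATd_1.txt becomes GMATd1.txt in the same directory, could look like this (the mapping is just an illustration, not the only sensible choice):

```python
import os

def create_new_path(old_path):
    # Hypothetical mapping: remove the first underscore in the file name,
    # keeping the file in its original directory.
    directory, filename = os.path.split(old_path)
    return os.path.join(directory, filename.replace("_", "", 1))

new_path = create_new_path(os.path.join("some_dir", "GMATd_1.txt"))
```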
Since Python 3.4 you can use the built-in pathlib package instead of os or glob.
from pathlib import Path
import shutil
for file_src in Path("path/to/files").glob("GMAT*.txt"):
    file_dest = str(file_src.resolve()).replace("ds", "d_")
    shutil.move(file_src, file_dest)
you can use
import os

path = '.....'   # path where these files are located
path1 = '.....'  # path where you want these files to be stored
i = 1
for file in os.listdir(path):
    if file.endswith('.txt'):
        os.rename(path + "/" + file, path1 + "/" + str(i) + ".txt")
        i += 1
It will rename all the .txt files in the source folder to 1.txt, 2.txt, ..., n.txt.

Query Related to Python - Folders Read

I want to read folders in Python and probably make a list of them. My main concern is that the most recent folder should be at a location known to me: it can be the first or the last element of the list. I am attaching an image suggesting the folder names. I want the folder named 20181005 either first or last in the list.
I have tried this task with os.listdir, but I am not very confident about how this function reads folders and stores them in a list. Would it store the folders in name order, or use creation or modification date? If I could sort on the basis of name (20181005 etc.), that would be really good.
Kindly suggest a suitable method.
Regards
os.listdir returns directory contents in arbitrary order, but you can sort that yourself:
l = sorted(os.listdir())
Since it seems that your folder names are ISO dates, they should sort correctly and the most recent one should be the last element after sorting.
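You can check that with a few hypothetical names:

```python
# YYYYMMDD names compare chronologically as plain strings,
# so the most recent folder ends up last after sorting.
folders = ["20181005", "20170101", "20180930"]
folders.sort()
latest = folders[-1]
```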
If you need to access creation & modification times you can do that with os.path functions. If you want to sort by that, I would probably choose to put it in something like a pandas DataFrame to make it easier to manipulate.
import os
from datetime import datetime
import pandas as pd

path = "."
objects = os.listdir(path)
dirs = list()
for o in objects:
    opath = os.path.join(path, o)
    if os.path.isdir(opath):
        dirs.append(dict(path=opath,
                         mtime=datetime.fromtimestamp(os.path.getmtime(opath)),
                         ctime=datetime.fromtimestamp(os.path.getctime(opath))))
data = pd.DataFrame(dirs)
data = data.sort_values(by='mtime')  # sort_values returns a new DataFrame
Assuming your directories have YYYYMMDD-format names, you can use listdir and sort to get the latest directory at the last index.
import os
from os import listdir

mypath = 'D:\\anil'
list_dirs = []
for f in listdir(mypath):
    if os.path.isdir(os.path.join(mypath, f)):
        list_dirs.append(f)
list_dirs.sort()
for current_dir in list_dirs:
    print(current_dir)
