python - get highest number in filenames in a directory [duplicate]

This question already has answers here:
Find file in directory with the highest number in the filename
(2 answers)
Closed 3 years ago.
I'm developing a timelapse camera on a read-only filesystem that writes images to a USB stick. There is no real-time clock or internet connection, so I can't use datetime to keep the files in temporal order and prevent overwriting.
I could store images as 1.jpg, 2.jpg, 3.jpg and so on and keep the counter in a file last.txt on the USB stick, but I'd rather avoid that, so I'm trying to work out the last filename at boot. However, with 9.jpg and 10.jpg present, print(max(glob.glob('/home/pi/Desktop/timelapse/*'))) returns 9.jpg, because max() compares the paths as strings and '9' sorts after '1'. I also suspect glob would be slow with thousands of files. How can I solve this?
EDIT
I found this solution:
import glob
import os
import ntpath
highest = 0  # avoid shadowing the built-in max()
for name in glob.glob('/home/pi/Desktop/timelapse/*.jpg'):
    n = int(os.path.splitext(ntpath.basename(name))[0])
    if n > highest:
        highest = n
print(highest)
but it takes about 3 s per 10,000 files. Is there a faster solution, apart from dividing the files into sub-folders?
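One idea worth benchmarking (a sketch, not a guaranteed speed-up): os.scandir iterates the directory lazily instead of materializing the whole list the way glob does, which can help with very large directories:
import os

highest = 0
with os.scandir('/home/pi/Desktop/timelapse') as entries:
    for entry in entries:
        stem, ext = os.path.splitext(entry.name)
        if ext == '.jpg' and stem.isdigit():
            highest = max(highest, int(stem))
print(highest)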

Here:
latest_file_index = max(int(f[:f.index('.')]) for f in os.listdir('path_to_folder_goes_here'))
Another idea is just to use the length of the file list (assuming all files in the folder are the jpg files, numbered consecutively from 1 with no gaps):
latest_file_index = len(os.listdir('path_to_folder_goes_here'))

You need to extract the numbers from the filenames and convert them to integers to get proper numeric ordering.
For example, like so:
from pathlib import Path
folder = Path('/home/pi/Desktop/timelapse')
highest = max(int(file.stem) for file in folder.glob('*.jpg'))
For more complicated file-name patterns this approach could be extended with regular expressions.
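For instance, a minimal sketch (Python 3.8+ for the := operator) assuming a hypothetical naming scheme like image_0042.jpg; the regex is an assumption, to be adapted to your actual pattern:
import re
from pathlib import Path

folder = Path('/home/pi/Desktop/timelapse')
pattern = re.compile(r'image_(\d+)$')  # hypothetical scheme: image_<number>.jpg
highest = max((int(m.group(1)) for f in folder.glob('*.jpg')
               if (m := pattern.match(f.stem))), default=0)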

Using re:
import re
filenames = [
    'file1.jpg',
    'file2.jpg',
    'file3.jpg',
    'file4.jpg',
    'fileA.jpg',
]
# We'll match a general pattern of any characters before a number + '.jpg'.
# The number is grouped with parentheses, so we extract it with group(1)
# and convert it to int so the comparison is numeric, not lexicographic.
# Non-matching names (like 'fileA.jpg') are skipped by the 'if' check.
numbered = []
for file in filenames:
    match = re.match(r'.*?(\d+)\.jpg$', file)
    if match:
        numbered.append((int(match.group(1)), file))
max_file = max(numbered)[1]
print(f'The file with the maximum number is: {max_file}')
Output:
The file with the maximum number is: file4.jpg
Note: This will work whether there are letters before the number in the filename or not, so you can name the files (pretty much) whatever you want.
Second solution: use the creation date.
This is similar to the first, but we'll use the os module and iterate over the directory, returning the file with the latest creation date:
import os

_dir = r'C:\...\...'
# pick the entry whose creation time is greatest
max_file = max(os.listdir(_dir), key=lambda name: os.path.getctime(os.path.join(_dir, name)))
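One caveat: os.path.getctime is the creation time on Windows, but on most Unix filesystems it is the inode-change time. If "latest" should mean most recently modified, the same pattern works with getmtime (a sketch reusing _dir from above):
max_file = max(os.listdir(_dir), key=lambda name: os.path.getmtime(os.path.join(_dir, name)))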

You can use os.walk(), because it gives you the list of filenames it finds; then append to another list every value you get after removing the '.jpg' extension and casting the string to int, and a simple call of max() will do the work.
import os

# taken from https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory
_, _, filenames = next(os.walk(os.getcwd()), (None, None, []))
values = []
for filename in filenames:
    try:
        values.append(int(filename.lower().replace('.jpg', '')))
    except ValueError:
        pass  # not a file with format x.jpg
max_value = max(values, default=0)  # default covers an empty directory

Related

How can I read specific files in a folder (files within a range) in Python

For example, I have some 43,000 txt files in my folder; however, I don't want to read all of them, just those within a given range, like from 1.txt to 14400.txt. How can I achieve this in Python? For now, I'm reading all the files in the directory like this:
for each in glob.glob("data/*.txt"):
with open(each , 'r') as file:
content = file.readlines()
with open('{}.csv'.format(each[0:-4]) , 'w') as file:
file.writelines(content)
Any way I can achieve the desired results?
Since glob.glob() returns a list, you can simply iterate over a slice of it using something like:
import glob
for each in glob.glob("*")[:5]:
print(each)
Just use variables for the slice boundaries and I think this achieves the results you are looking for.
Edit: Also, note that slicing past the end of a list is safe in Python: an out-of-range slice simply returns whatever elements exist, so no bounds check is needed.
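For example, slicing a result list that holds only two entries with [:5] just returns those two entries:
import glob

files = glob.glob("*")  # suppose this finds only two files
print(files[:5])        # no IndexError: prints the two entries that exist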
If the files have numerically consecutive names starting with 1.txt, you can use range() to help construct the filenames:
for num in range(1, 14401):  # range() excludes the end, so 14401 includes 14400.txt
    filename = "data/%d.txt" % num
I found a solution here: How to extract numbers from a string in Python?
import os
import re

filepath = './'
for filename in os.listdir(filepath):
    numbers_in_name = re.findall(r'\d+', filename)
    if numbers_in_name and int(numbers_in_name[0]) < 5:
        print(os.path.join(filepath, filename))
        # do other stuff with the filenames
You can use re to get the numbers in the filename. This prints all filenames whose first number is smaller than 5, for example.
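A sketch adapted to the asker's actual range (1.txt through 14400.txt), assuming the numeric part is the entire file stem:
import os

filepath = './'
for filename in os.listdir(filepath):
    stem, ext = os.path.splitext(filename)
    if ext == '.txt' and stem.isdigit() and 1 <= int(stem) <= 14400:
        print(os.path.join(filepath, filename))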

Searching for filenames containing multiple keywords in python

I have a directory containing a large number of files. I want to find all files where the file name contains specific strings (e.g. a certain ending like '.txt', a model ID 'model_xy', etc.) as well as one of the entries in an integer array (e.g. a set of years I would like to select).
I tried this the following way:
import numpy as np
import glob
startyear = 2000
endyear = 2005
timerange = str(np.arange(startyear,endyear+1))
data_files = []
for file in glob.glob('/home/..../*model_xy*' + timerange + '.txt'):
    data_files.append(file)
print(data_files)
Unfortunately, like this, other files outside of my 'timerange' are still selected.
You can use character ranges (shell-style wildcards, not full regex) in glob.glob. Moreover, glob.glob returns a list, so you don't need to iterate through it and append to a new list.
import glob
data_files = glob.glob("/home/..../*model_xy*200[0-5].txt")
# or if you want to do recursive search you can use **
# Below code will search for all those files under /home/ recursively
data_file = glob.glob("/home/**/*model_xy*200[0-5].txt")
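Note that a character class like 200[0-5] only works while the range fits a single digit position. For arbitrary year ranges, one sketch (keeping the question's elided /home/.... path) is a glob per year:
import glob

startyear, endyear = 2000, 2005
data_files = []
for year in range(startyear, endyear + 1):
    data_files.extend(glob.glob('/home/..../*model_xy*{}.txt'.format(year)))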

How can I move files with random names from one folder to another in Python? [duplicate]

This question already has answers here:
python copy files by wildcards
(3 answers)
Closed 4 years ago.
I have a large number of .txt files named as "cb" + number (like cb10.txt, cb13.txt), and I need to filter them out of a source folder that contains all of the "cb" + number files, including the target files.
The numbers in the target file names are all random, so I have to list all the target file names explicitly.
import fnmatch
import os
import shutil

os.chdir('/Users/college_board_selection')
os.getcwd()
source = '/Users/college_board_selection'
dest = '/Users/seperated_files'
files = os.listdir(source)
for f in os.listdir('.'):
    names = ['cb10.txt', 'cb11.txt']
    if names in f:
        shutil.move(f, dest)
if names in f: isn't going to work as f is a filename, not a list. Maybe you want if f in names:
But you don't need to scan a whole directory for this; just loop over the files you're targeting, if they exist:
for f in ['cb10.txt', 'cb11.txt']:
    if os.path.exists(f):
        shutil.move(f, dest)
If you have a lot of cbxxx.txt files, an alternative would be to compute the intersection of this list with the result of os.listdir, using a set (faster lookup than a list, worthwhile when there are many elements):
for f in {'cb10.txt', 'cb11.txt'}.intersection(os.listdir(".")):
    shutil.move(f, dest)
On Linux, with a lot of "cb" files, this would be faster because listdir doesn't perform a stat call per file, whereas os.path.exists does.
EDIT: if the files have the same prefix/suffix, you can build the lookup set with a set comprehension to avoid tedious copy/paste:
s = {'cb{}.txt'.format(i) for i in ('10', '11')}
for f in s.intersection(os.listdir(".")):
    shutil.move(f, dest)
or for the first alternative:
for p in ['10', '11']:
    f = "cb{}.txt".format(p)
    if os.path.exists(f):
        shutil.move(f, dest)
EDIT: if all cb*.txt files must be moved, then you can use glob.glob("cb*.txt"). I won't elaborate; the linked "duplicate target" answer explains it better.
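For completeness, a minimal sketch of that glob-based variant (dest as in the question):
import glob
import shutil

dest = '/Users/seperated_files'
for f in glob.glob("cb*.txt"):
    shutil.move(f, dest)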

Glob search files in date order?

I have this line of code in my Python script. It searches all the files in a particular directory for *cycle*.log.
for searchedfile in glob.glob("*cycle*.log"):
This works perfectly; however, when I run my script against a network location it does not process the files in order, but randomly.
Is there a way to force the code to search by date order?
This question has been asked for php but I am not sure of the differences.
Thanks
To sort files by date:
import glob
import os
files = glob.glob("*cycle*.log")
files.sort(key=os.path.getmtime)
print("\n".join(files))
See also Sorting HOW TO.
Essentially the same as #jfs but in one line using sorted
import os,glob
searchedfiles = sorted(glob.glob("*cycle*.log"), key=os.path.getmtime)
Well. The answer is nope. glob uses os.listdir, which is documented as follows:
"Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory."
So you are actually lucky that you got it sorted. You need to sort it yourself.
This works for me:
import glob
import os
import time
searchedfile = glob.glob("*.cpp")
files = sorted(searchedfile, key=os.path.getctime)
for file in files:
    print("{} - {}".format(file, time.ctime(os.path.getctime(file))))
Also note that this uses creation time; if you want to use modification time, use getmtime instead.
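For instance, reusing the names from the snippet above:
files = sorted(searchedfile, key=os.path.getmtime)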
If your paths are in sortable order then you can always sort them as strings (as others have already mentioned in their answers).
However, if your paths use a datetime format like %d.%m.%Y, it becomes a bit more involved. Since strptime does not support wildcards, we developed a module, datetime-glob, to parse the date/times from paths including wildcards.
Using datetime-glob, you could walk through the tree, list a directory, parse the date/times and sort them as tuples (date/time, path).
From the module's test cases:
import pathlib
import tempfile

import datetime_glob

def test_sort_listdir(self):
    with tempfile.TemporaryDirectory() as tempdir:
        pth = pathlib.Path(tempdir)
        (pth / 'some-description-20.3.2016.txt').write_text('tested')
        (pth / 'other-description-7.4.2016.txt').write_text('tested')
        (pth / 'yet-another-description-1.1.2016.txt').write_text('tested')
        matcher = datetime_glob.Matcher(pattern='*%-d.%-m.%Y.txt')
        subpths_matches = [(subpth, matcher.match(subpth.name)) for subpth in pth.iterdir()]
        dtimes_subpths = [(mtch.as_datetime(), subpth) for subpth, mtch in subpths_matches]
        subpths = [subpth for _, subpth in sorted(dtimes_subpths)]
        # yapf: disable
        expected = [
            pth / 'yet-another-description-1.1.2016.txt',
            pth / 'some-description-20.3.2016.txt',
            pth / 'other-description-7.4.2016.txt',
        ]
        # yapf: enable
        self.assertListEqual(subpths, expected)
Using glob alone, no. As you're using it, glob returns all the matching files at once and has no way of ordering them. If only the final result is important, you could add a second pass that checks each file's date and re-sorts on that. If the parse order matters, glob is probably not the best way to do this.
You can sort the list of files that come back using os.path.getmtime or os.path.getctime. See this other SO answer and note the comments as well.

Finding and printing file name of zero length files in python

I'm learning about the os module and I need to work out how to print the file names of only zero-length files, along with their count.
So far I've figured the easiest way to do it is to generate a list or a tuple of files and their sizes in this format:
((zerotextfile1.txt, 0), (notazerotextfile.txt, 15))
Then use an if statement to only print out only files with zero length.
Then use a sum function to add the number of list items to get the count of zero length files.
So far, I've got bits and pieces - it's how to put them together I'm having trouble with.
Some of my bits (viable code I've managed to write, not much, I know):
import os
place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
for files in os.walk(place):
    print(files)
Then there is stuff like os.path.getsize(), which requires that I pass in a filename, so I figure I've got to loop over the file names and call it on each one to get this to work, right?
Any tips or pointing in the right direction would be vastly appreciated!
import os

place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
for root, dirs, files in os.walk(place):
    for f in files:
        file_path = os.path.join(root, f)  # full path to the file
        size = os.path.getsize(file_path)  # pass the full path to getsize()
        if size == 0:
            print(f, file_path)
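To also get the count that the question asks for, a small extension of the same walk, collecting the paths first and printing the total at the end:
import os

place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
zero_length = []
for root, dirs, files in os.walk(place):
    for f in files:
        file_path = os.path.join(root, f)
        if os.path.getsize(file_path) == 0:
            zero_length.append(file_path)

for file_path in zero_length:
    print(file_path)
print('count:', len(zero_length))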
Are you looking for the following?
import os

place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
for root, dirs, filenames in os.walk(place):
    for name in filenames:
        if os.path.getsize(os.path.join(root, name)) == 0:
            print(name)
