Finding and printing file name of zero length files in python - python

I'm learning about the os module and I need to work out how to print the file names of only the zero-length files, along with a count of them.
So far I've figured the easiest way to do it is to generate a list or a tuple of files and their sizes in this format:
((zerotextfile1.txt, 0), (notazerotextfile.txt, 15))
Then use an if statement to print out only the files with zero length.
Then use a sum function to add up the matching list items to get the count of zero-length files.
So far, I've got bits and pieces - it's how to put them together I'm having trouble with.
Some of my bits (viable code I've managed to write, not much, I know):
import os
place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
for files in os.walk(place):
    print(files)
Then there is stuff like os.path.getsize() which requires I put in a filename, so I figure I've got to use a for loop to print a list of the file names in this function in order to get it to work, right?
Any tips or pointing in the right direction would be vastly appreciated!

import os
place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
for root, dirs, files in os.walk(place):
    for f in files:
        file_path = os.path.join(root, f)  # full path to the file
        size = os.path.getsize(file_path)  # pass the full path to getsize()
        if size == 0:
            print(f, file_path)
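The question also asks for a count, which the loop above does not keep. A minimal sketch of one way to get both, assuming it is acceptable to collect the paths in a list first (the function name find_empty_files is made up for this example):

```python
import os

def find_empty_files(top):
    """Return the full paths of all zero-length files under `top`."""
    empty = []
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return empty
```

Printing each item of the returned list gives the names, and len() of the list gives the count, so no separate sum step is needed.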

Are you looking for the following?
import os
place = 'C:\\Users\\Me\\Documents\\Python Programs\\'
for entry in os.walk(place):  # entry is (dirpath, dirnames, filenames)
    for name in entry[2]:
        if os.path.getsize(os.path.join(entry[0], name)) == 0:
            print(name)

Related

python - get highest number in filenames in a directory [duplicate]

I'm developing a timelapse camera on a read-only filesystem which writes images to a USB stick. It has no real-time clock or internet connection, so I can't use datetime to maintain the temporal order of files and prevent overwriting.
I could store images as 1.jpg, 2.jpg, 3.jpg and so on, and update the counter in a file last.txt on the USB stick, but I'd rather avoid that. Instead I'm trying to calculate the last filename at boot. However, with 9.jpg and 10.jpg present, print(max(glob.glob('/home/pi/Desktop/timelapse/*'))) returns 9.jpg. I also think that glob would be slow with thousands of files. How can I solve this?
EDIT
I found this solution:
import glob
import os
import ntpath
max=0
for name in glob.glob('/home/pi/Desktop/timelapse/*.jpg'):
    n=int(os.path.splitext(ntpath.basename(name))[0])
    if n>max:
        max=n
print(max)
but it takes about 3 s for every 10,000 files. Is there a faster solution, apart from dividing the files into sub-folders?
Here:
latest_file_index = max([int(f[:f.index('.')]) for f in os.listdir('path_to_folder_goes_here')])
Another idea is just to use the length of the file list (assuming all files in the folder are the jpg files, numbered from 1 with no gaps):
latest_file_index = len(os.listdir('path_to_folder_goes_here'))
You need to extract the numbers from the filenames and convert them to integer to get proper numeric ordering.
For example like so:
from pathlib import Path
folder = Path('/home/pi/Desktop/timelapse')
highest = max(int(file.stem) for file in folder.glob('*.jpg'))
For more complicated file-name patterns this approach could be extended with regular expressions.
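To see why the integer conversion matters, here is a small illustration with made-up names (the helper index_of is not from the answer above): plain max() compares strings character by character, which is what made 9.jpg win over 10.jpg in the question.

```python
names = ["1.jpg", "9.jpg", "10.jpg"]

# Plain max() compares strings character by character, so "9" beats "1".
print(max(names))  # 9.jpg

# Comparing the integer value of the stem gives the numeric maximum.
def index_of(name):
    return int(name.rsplit(".", 1)[0])

print(max(names, key=index_of))  # 10.jpg
```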
Using re:
import re
filenames = [
'file1.jpg',
'file2.jpg',
'file3.jpg',
'file4.jpg',
'fileA.jpg',
]
### We match one or more digits immediately before '.jpg' at the end of the name.
### The number is captured in a group (the parentheses) and converted to int,
### so the comparison is numeric rather than alphabetical.
### Names that do not match (such as 'fileA.jpg') are skipped by the conditional 'if'.
pattern = re.compile(r'(\d+)\.jpg$')
matches = [(pattern.search(name), name) for name in filenames]
max_file = max((int(m.group(1)), name) for m, name in matches if m)[1]
print(f'The file with the maximum number is: {max_file}')
Output:
The file with the maximum number is: file4.jpg
Note: This will work whether there are letters before the number in the filename or not, so you can name the files (pretty much) whatever you want.
Second solution: use the creation date.
This is similar to the first, but we'll use the os module and iterate the directory, returning a file with the latest creation date:
import os
_dir = r'C:\...\...'
# Look up each creation time once, then keep the name(s) with the latest one
ctimes = {x: os.path.getctime(os.path.join(_dir, x)) for x in os.listdir(_dir)}
latest = max(ctimes.values())
max_file = [x for x, t in ctimes.items() if t == latest]
You can use os.walk(), because it gives you the list of filenames it finds. Strip the '.jpg' extension from each name, cast the rest to int, append it to another list, and then a simple call to max will do the work.
import os
# taken from https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory
_, _, filenames = next(os.walk(os.getcwd()), (None, None, []))
values = []
for filename in filenames:
    try:
        values.append(int(filename.lower().replace('.jpg','')))
    except ValueError:
        pass # not a file with format x.jpg
max_value = max(values)
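On the speed question, one option that may help is os.scandir, which yields directory entries lazily instead of building the full list of names first. This is only a sketch and I have not timed it against the glob version; the function name highest_index and the extension parameter are my own choices for this example.

```python
import os

def highest_index(folder, extension=".jpg"):
    """Return the largest integer file stem in `folder`, or None if there is none."""
    highest = None
    with os.scandir(folder) as entries:
        for entry in entries:
            stem, ext = os.path.splitext(entry.name)
            if ext.lower() != extension or not stem.isdecimal():
                continue  # skip names that are not of the form <number>.jpg
            value = int(stem)
            if highest is None or value > highest:
                highest = value
    return highest
```

Returning None for a folder with no numbered files avoids the ValueError that max() raises on an empty sequence, so the camera could start at 1 on a fresh USB stick.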

Segregate files based on filename

I've got a directory containing multiple images, and I need to separate them into two folders based on a portion of the file name. Here's a sample of the file names:
22DEC167603520981127600_03.jpg
13NOV162302999230157801_07.jpg
08JAN147603811108236510_02.jpg
21OCT152302197661710099_07.jpg
07MAR172302551529900521_01.jpg
19FEB173211074174309177_09.jpg
19FEB173211881209232440_02.jpg
19FEB172302491000265198_04.jpg
I need to move the files into two folders according to the four digits that follow the date (for example, 7603 in the first name above). Files containing 2302 and 3211 would go into an existing folder named "panchromatic" and files with 7603 would go into another folder named "sepia".
I've tried multiple examples from other questions, and none seem to fit this problem. I'm very new to Python, so I'm not sure what example to post. Any help would be greatly appreciated.
You can do this the easy way or the hard way.
Easy way
Test if your filename contains the substring you're looking for.
import os
import shutil
files = os.listdir('.')
for f in files:
    # skip non-jpeg files
    if not f.endswith('.jpg'):
        continue
    # move if panchromatic
    if '2302' in f or '3211' in f:
        shutil.move(f, os.path.join('panchromatic', f))
    # move if sepia
    elif '7603' in f:
        shutil.move(f, os.path.join('sepia', f))
    # notify if something else
    else:
        print('Could not categorize file with name %s' % f)
This solution in its current form is susceptible to mis-classification, as the number we're looking for can appear by chance later in the string. I'll leave you to find ways to mitigate this.
Hard way
Regular expressions. Match the four digits after the date with a regular expression. Left for you to explore!
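For anyone who wants a starting point, here is a hedged sketch of the regular-expression route. It assumes every name begins with a DDMMMYY date (as in 22DEC16) immediately followed by the four-digit code, which is what the sample names suggest; the names NAME_PATTERN, FOLDERS and classify are invented for this example.

```python
import re

# Assumed layout: DDMMMYY date (e.g. 22DEC16) followed by the four-digit code.
NAME_PATTERN = re.compile(r"^\d{2}[A-Z]{3}\d{2}(\d{4})")

# Code-to-folder mapping taken from the question.
FOLDERS = {"2302": "panchromatic", "3211": "panchromatic", "7603": "sepia"}

def classify(filename):
    """Return the destination folder name, or None if the name does not fit."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return FOLDERS.get(match.group(1))
```

Because the pattern is anchored to the start of the name, a code that happens to appear later in the long number is ignored, which addresses the mis-classification risk of the substring test.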
Self-explanatory, with Python 3, or Python 2 plus the pathlib backport:
import pathlib
import shutil
# Directory paths. Tailor this to your files layout
# see https://docs.python.org/3/library/pathlib.html#module-pathlib
source_dir = pathlib.Path('.')
sepia_dir = source_dir / 'sepia'
panchro_dir = source_dir / 'panchromatic'
assert sepia_dir.is_dir()
assert panchro_dir.is_dir()
destinations = {
('2302', '3211'): panchro_dir,
('7603',): sepia_dir
}
for filepath in source_dir.glob('*.jpg'):
    # characters 7-10 of the bare file name hold the four-digit code
    marker = filepath.name[7:11]
    for key, value in destinations.items():
        if marker in key:
            # glob() already yields paths that include source_dir
            shutil.move(str(filepath), str(value))

Given a filename, go to the next file in a directory

I am writing a method that takes a filename and a path to a directory and returns the next available filename in the directory or None if there are no files with names that would sort after the file.
There are plenty of questions about how to list all the files in a directory or iterate over them, but I am not sure if the best solution to finding a single next filename is to use the list that one of the previous answers generated and then find the location of the current file in the list and choose the next element (or None if we're already on the last one).
EDIT: here's my current file-picking code. It's reused from a different part of the project, where it is used to pick a random image from a potentially nested series of directories.
import os
import random
# picks a file from a directory
# if the file is also a directory, pick a file from the new directory
# this might choke up if it encounters a directory only containing invalid files
def pickNestedFile(directory, bad_files):
    file=None
    while file is None or file in bad_files:
        file=random.choice(os.listdir(directory))
    #file=directory+file # use the full path name
    print("Trying "+file)
    if os.path.isdir(os.path.join(directory, file))==True:
        print("It's a directory!")
        return pickNestedFile(directory+"/"+file, bad_files)
    else:
        return directory+"/"+file
The program I am using this in takes a folder of chatlogs and picks a random log, starting position, and length. These are then processed into a MOTD-like series of (typically) short log snippets. I need the next-file picking ability for when the length is unusually long or the starting line is at the end of the file, so that the snippet continues at the top of the next file (a.k.a. wrap around midnight).
I am open to the idea of using a different method to choose the file, since the above method does not give the filename and directory as discrete values, and I'd have to use a listdir and match to get an index anyway.
You should probably consider rewriting your program to not have to use this. But this would be how you could do it:
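import os
def nextFile(filename,directory):
    # os.listdir returns names in arbitrary order, so sort them first
    fileList = sorted(os.listdir(directory))
    try:
        nextIndex = fileList.index(filename) + 1
    except ValueError:
        return None  # filename is not in the directory
    if nextIndex == len(fileList):
        return None  # filename is already the last one
    return fileList[nextIndex]
print(nextFile("mail","test"))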
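A variation that might suit the "wrap around midnight" case is to locate the position with the bisect module, so the lookup still works if the current file has been removed from the directory in the meantime. This is a sketch under the assumption that plain string sorting matches the order you want; next_file is a hypothetical name.

```python
import bisect
import os

def next_file(filename, directory):
    """Return the name that sorts right after `filename`, or None at the end."""
    names = sorted(os.listdir(directory))
    # bisect_right gives the position just past `filename`, whether or not
    # `filename` itself is still present in the listing.
    position = bisect.bisect_right(names, filename)
    if position == len(names):
        return None
    return names[position]
```

Note that os.listdir also returns subdirectory names, so a nested layout would need an extra os.path.isfile filter before sorting.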
I tweaked the accepted answer to allow new files to be added to the directory on the fly and for it to work if a file is deleted or changed or doesn't exist. There are better ways to work with filenames/paths, but the example below keeps it simple. Maybe it's helpful:
import os
def next_file_in_dir(directory, current_file=None):
    file_list = os.listdir(directory)
    next_index = 0
    if current_file in file_list:
        next_index = file_list.index(current_file) + 1
    if next_index >= len(file_list):
        next_index = 0
    return file_list[next_index]
file_name = None
directory = "videos"
user_advanced_to_next = True
while user_advanced_to_next:
    file_name = next_file_in_dir(directory=directory, current_file=file_name )
    user_advanced_to_next = play_video("{}/{}".format(directory, file_name))
finish_and_clean_up()

Copy multiple files with control

I want to copy multiple files from one directory, renaming them and placing them in new directories in increments of 500. For example, the first 500 files in C:\Pics (with random original names) will be renamed 500-1000 and the new directory they are placed in is called 500; files 1000-1500 would go into directory 1000, and so on.
The current code does not rename the files but instead puts them in a new directory with the correct number. This was just a start. I believe the code below is a good start; can anyone help me modify it to get the desired results?
import os, glob
target = 'C:\Pics'
prefix = 'p0'
os.chdir(target)
allfiles = os.listdir(target)
count = 500
for filename in allfiles:
    if not glob.glob('*.jpg'): continue
    dirname = prefix + str(count)
    target = os.path.join(dirname, filename)
    os.renames(filename, target)
    count +=1
os.listdir and glob.glob are similar functions. They both return lists of files/dirs, so they don't belong in the same loop (at least not the way you're trying to use them). The main difference is that os.listdir just takes a directory and returns basically *.* from it (minus . and ..), whereas glob.glob expects a "globbing pattern" which can contain * ? [] in a restricted, regex-like syntax. The function you might be thinking of here (instead of glob.glob) is fnmatch.fnmatch, which applies a globbing pattern to a single file name.
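A quick illustration of the difference, using a made-up list of names rather than a real directory: fnmatch works on strings you already have, so it never touches the filesystem.

```python
import fnmatch

names = ["a.jpg", "b.png", "notes.txt", "c.jpg"]

# fnmatch.fnmatch tests one name against a pattern.
print(fnmatch.fnmatch("a.jpg", "*.jpg"))
print(fnmatch.fnmatch("b.png", "*.jpg"))

# fnmatch.filter applies the same test to a whole list.
jpgs = fnmatch.filter(names, "*.jpg")
print(jpgs)
```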
os.listdir(path)
Return a list containing the names of the entries in the directory
given by path. The list is in arbitrary order. It does not include the
special entries '.' and '..' even if they are present in the
directory.
Availability: Unix, Windows.
Changed in version 2.3: On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects. Undecodable filenames will still be returned as string
objects.
glob.glob(pathname)
Return a possibly-empty list of path names that
match pathname, which must be a string containing a path
specification. pathname can be either absolute (like
/usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif),
and can contain shell-style wildcards. Broken symlinks are included in
the results (as in the shell).
Sorry, too lazy to actually mock up files and test this, but then I'd be doing all the work for you. But this should work (or be darn close to what I think you're aiming at). ;)
import os
import fnmatch
import os.path
target = r'C:\Pics'  # raw string so the backslash is taken literally
os.chdir(target)
allfiles = os.listdir(target)
count = 500
for filename in allfiles:
    if not fnmatch.fnmatch(filename, '*.jpg'):
        continue
    # start a new directory every 500 files: p0500, p1000, ...
    if count % 500 == 0:
        dirname = 'p%04d' % count
        if not os.path.exists(dirname):
            os.mkdir(dirname)
    target = os.path.join(dirname, '%d.jpg' % count)
    os.rename(filename, target)
    count += 1
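Since the code above is untested, it may be worth computing the new names first and only renaming once the list looks right. The following is a sketch of that idea, not part of the answer above; plan_renames, start and group_size are names I made up, and the p%04d directory naming is copied from the answer.

```python
import os

def plan_renames(filenames, start=500, group_size=500):
    """Pair each name with its numbered destination without touching the disk."""
    plan = []
    for offset, filename in enumerate(sorted(filenames)):
        number = start + offset
        # Round down to the start of the group: 500-999 -> 500, 1000-1499 -> 1000.
        group_start = (number // group_size) * group_size
        dirname = "p%04d" % group_start
        extension = os.path.splitext(filename)[1]
        plan.append((filename, os.path.join(dirname, "%d%s" % (number, extension))))
    return plan
```

Printing the returned pairs shows exactly which file would land where; passing each pair to os.renames afterwards would create the directories as needed.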

Moving specific files in subdirectories into a directory - python

I'm rather new to Python but I have been attempting to learn the basics.
Anyway, I have several files that, once I have extracted them from their zip files (a painfully slow process, by the way), produce several hundred subdirectories with 2-3 files in each. What I want to do now is extract all the files ending with 'dem.tif' and place them in a separate folder (move, not copy).
I may have attempted to jump into the deep end here, but the code I've written runs without error, so it must not be finding the files (which do exist!), as it gives me the else statement. Here is the code I've created:
import os
src = 'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Extracted' # input
dst = 'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Analyses' # desired location
def move():
    for (dirpath, dirs, files) in os.walk(src):
        if files.endswith('dem.tif'):
            shutil.move(os.path.join(src,files),dst)
            print ('Moving ', + files, + ' to ', + dst)
        else:
            print 'No Such File Exists'
First, welcome to the community, and python! You might want to change your user name, especially if you frequent here. :)
I suggest the following (stolen from Mr. Beazley):
# genfind.py
#
# A function that generates files that match a given filename pattern
import os
import shutil
import fnmatch
def gen_find(filepat,top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist,filepat):
            yield os.path.join(path,name)
# Example use
if __name__ == '__main__':
    # raw strings: otherwise \U and \N are read as escape sequences in Python 3
    src = r'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Extracted' # input
    dst = r'O:\DATA\ASTER GDEM\Original\North America\UTM Zone 14\USA\Analyses' # desired location
    filesToMove = gen_find("*dem.tif",src)
    for name in filesToMove:
        shutil.move(name, dst)
I think you've mixed up the way you should be using os.walk().
for dirpath, dirs, files in os.walk(src):
    print(dirpath)
    print(dirs)
    print(files)
    for filename in files:
        if filename.endswith('dem.tif'):
            shutil.move(...)
        else:
            ...
Update: the questioner has clarified below that he / she is actually calling the move function, which was the first point in my answer.
There are a few other things to consider:
You've got the order of elements returned in each tuple from os.walk wrong, I'm afraid - check the documentation for that function.
Assuming you've fixed that, also bear in mind that you need to iterate over files, and you need to os.path.join each of those to the directory currently being walked (the first element of each tuple), rather than to src.
The above would be obvious, hopefully, if you print out the values returned by os.walk and comment out the rest of the code in that loop.
With code that does potentially destructive operations like moving files, I would always first try some code that just prints out the parameters to shutil.move until you're sure that it's right.
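As a concrete version of that dry-run advice, here is a small sketch that only collects the (source, destination) pairs and leaves the files alone. The function name plan_moves and its suffix parameter are assumptions made for this example.

```python
import os

def plan_moves(src, dst, suffix="dem.tif"):
    """Return (source, destination) pairs for files under `src` ending in `suffix`."""
    moves = []
    for dirpath, dirnames, filenames in os.walk(src):
        for name in filenames:
            if name.endswith(suffix):
                moves.append((os.path.join(dirpath, name), os.path.join(dst, name)))
    return moves
```

Print the pairs first; once they look right, loop over them and call shutil.move(source, destination) on each.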
Any particular reason you need to do it in Python? Would a simple shell command not be simpler? If you're on a Unix-like system, or have access to Cygwin on Windows:
find src_dir -name "*dem.tif" -exec mv {} dst_dir \;
