This is part of a program I'm writing. The goal is to extract all the GPX files from, say, G:\ (specified with -e G:\ on the command line). It creates an 'Exports' folder and recursively copies every file with a matching extension there. It works great (a friend helped me write it!), but there's a problem: it also creates empty directories and subdirectories for folders that did not contain any GPX files.
import argparse, shutil, os

def ignore_list(path, files):  # This ignore list is passed to copytree in the function below.
    ret = []
    for fname in files:
        fullFileName = os.path.normpath(path) + os.sep + fname
        if not os.path.isdir(fullFileName) \
                and not fname.endswith('gpx'):
            ret.append(fname)
        # This isn't doing what it's supposed to.
        elif os.path.isdir(fullFileName) \
                and len(os.listdir(fullFileName)) == 0:
            ret.append(fname)
    return ret

def gpxextract(src, dest):
    shutil.copytree(src, dest, ignore=ignore_list)
Later in the program, the extraction is triggered like this:
if args.extractpath:
    path = args.extractpath
    gpxextract(path, 'Exports')
So the extraction above does work. But the len() check above, which is meant to prevent the creation of empty dirs, does not. I suspect the best way is to os.rmdir them somehow after the export, but although I get no error, the folders remain.
So how can I successfully prune this Exports folder so that only dirs with GPXs will be in there? :)
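The len() check only fires when a source directory is literally empty, so a folder holding only non-GPX files (or only GPX-free subfolders) still gets created. A minimal sketch of an alternative, assuming Python 3 and that a case-insensitive `.gpx` suffix is what counts as a GPX file: let the ignore callback skip any directory whose whole subtree contains no GPX file. `contains_gpx` and `ignore_without_gpx` are hypothetical names, not part of shutil.

```python
import os
import shutil

def contains_gpx(directory):
    """Return True if any file under `directory`, at any depth, ends in .gpx."""
    for _root, _dirs, files in os.walk(directory):
        if any(name.lower().endswith('.gpx') for name in files):
            return True
    return False

def ignore_without_gpx(path, names):
    """copytree ignore callback: skip non-GPX files and GPX-free directories."""
    ignored = []
    for name in names:
        full = os.path.join(path, name)
        if os.path.isdir(full):
            # Skip the directory entirely if nothing below it is a GPX file.
            if not contains_gpx(full):
                ignored.append(name)
        elif not name.lower().endswith('.gpx'):
            ignored.append(name)
    return ignored

def gpxextract(src, dest):
    shutil.copytree(src, dest, ignore=ignore_without_gpx)
```

This trades speed for simplicity: each subtree is re-walked once per ancestor directory, which is fine for a USB drive but may be slow on very deep trees. Pruning empty folders after the copy avoids that cost.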
If I understand you correctly, you want to delete empty folders? If that is the case, you can do a bottom-up delete over the Exports folder: os.rmdir fails for any folder that is not empty, so only the empty ones disappear. Something like:
import os
for root, dirs, files in os.walk('Exports', topdown=False):
    for dn in dirs:
        pth = os.path.join(root, dn)
        try:
            os.rmdir(pth)
        except OSError:
            pass
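A self-contained version of the same idea, as a sketch: `prune_empty_dirs` is a hypothetical helper name, and it assumes that any directory os.rmdir refuses to delete (not empty, a symlink, no permission) should simply be kept.

```python
import os

def prune_empty_dirs(root):
    """Delete every empty directory below `root`, deepest first.

    Returns the list of removed paths. `root` itself is never removed.
    """
    removed = []
    # topdown=False yields children before their parents, so a parent that
    # becomes empty once its children are gone is removed in the same pass.
    for dirpath, dirnames, _filenames in os.walk(root, topdown=False):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                os.rmdir(path)  # raises OSError if not empty or not removable
                removed.append(path)
            except OSError:
                pass  # keep it
    return removed
```

Calling `prune_empty_dirs('Exports')` right after `gpxextract(...)` should leave only folders that contain GPX files, directly or further down.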
Related
Folder structure:
Folder/
    files
    subfolder1/
        files
        inner_subfolder1/
            files
            sub_inner_folder/
                files
The problem is that the files in sub_inner_folder are not encrypted.
import os
import sys
import PyPDF2

def encypt_Files():
    for folder, subfolders, files in os.walk('/home/username/Desktop/folder'):
        for subfolder in subfolders:
            os.chdir(folder)
            for files in os.listdir():
                if files.endswith('.pdf'):
                    PdfReaderobj = PyPDF2.PdfFileReader(open(files, 'rb'))
                    PdfWriterobj = PyPDF2.PdfFileWriter()
                    if PdfReaderobj.isEncrypted:
                        break
                    else:
                        PdfWriterobj.addPage(PdfReaderobj.getPage(0))
                        PdfWriterobj.encrypt(sys.argv[1])
                        resultPdf = open(files.strip('.pdf')+'_encrypted.pdf', 'wb')
                        PdfWriterobj.write(resultPdf)
                        resultPdf.close()
You call os.chdir(folder) where it should be os.chdir(subfolder). Also, you need to change the directory back using os.chdir("..") when you're done with that directory.
If you start in the wrong working directory, you won't be able to chdir() anywhere. So you need an os.chdir("/home/username/Desktop/folder") first.
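Also, a missing file or a permission problem raises an exception that aborts the loop. Wrap the file handling in a try block and add:
try:
    ...  # open and encrypt the PDF here
except FileNotFoundError:
    pass  # or whatever
except PermissionError:
    pass  # or whatever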
But os.walk() already gives you the list of files in each folder. You should only need to loop over those, which also gets rid of os.listdir().
Yet another option which sounds totally reasonable to me:
import glob
# recursive=True is required for ** to descend into subfolders (Python 3.5+)
for result in glob.iglob('/home/username/Desktop/folder/**/*.pdf', recursive=True):
    print(result)
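One detail worth knowing: `**` only crosses directory levels when `recursive=True` is passed (Python 3.5+); without it, it behaves like a single `*`. A small sketch, with `find_pdfs` as a hypothetical name; `glob.escape` keeps brackets or asterisks in the root path from being treated as wildcards.

```python
import glob
import os

def find_pdfs(root):
    """Return every .pdf path under `root`, including files directly in `root`."""
    # With recursive=True, '**' matches zero or more directory levels.
    pattern = os.path.join(glob.escape(root), '**', '*.pdf')
    return sorted(glob.glob(pattern, recursive=True))
```

Like the shell, glob skips names starting with a dot by default, so hidden folders and files are not returned.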
One problem I see is that you're breaking out of the inner for loop on finding an encrypted file. That should probably be a continue, but the fact that you're making a new iterator under the name files (which already holds the list from os.walk) suggests you may need to rethink the whole strategy.
And another problem is that you're chdiring to a relative path that may no longer be relative to where you are. I suggest using os.path.join instead.
Oh, and you're chdiring to folder instead of to the presumably intended subfolder.
I suggest you start over. Use the files iterator provided by os.walk, and use os.path.join to list out the full path to each file in the directory structure. Then add your pdf encryption code using the full path to each file, and ditch chdir.
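A minimal sketch of that advice, assuming the PDF-specific work is done elsewhere: `iter_pdf_paths` and `encrypted_name` are hypothetical helpers, and no chdir() is involved because every path is built from the `dirpath` that os.walk yields. It also reaches the files in sub_inner_folder, which the original loop over `subfolders` never processes, because that folder has no subfolders of its own.

```python
import os

def iter_pdf_paths(root):
    """Yield the full path of every .pdf file under `root`, at any depth."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith('.pdf'):
                # dirpath is the folder currently being visited, so joining it
                # with the bare file name gives a usable path without chdir().
                yield os.path.join(dirpath, name)

def encrypted_name(pdf_path):
    """Hypothetical naming rule: 'dir/report.pdf' -> 'dir/report_encrypted.pdf'."""
    base, ext = os.path.splitext(pdf_path)
    return base + '_encrypted' + ext
```

In the encryption loop you would then open each `path` from `iter_pdf_paths('/home/username/Desktop/folder')` directly and write the result to `encrypted_name(path)`.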
import os
import sys
import PyPDF2

foldername = []  # used to store folder paths
for folders, subfolders, files in os.walk(path):  # path should be absolute because of the chdir below
    foldername.append(folders)  # storing folder paths in list
for file_path in foldername:
    os.chdir(file_path)  # changing directory path from stored folder paths
    for files in os.listdir():
        if files.endswith('.pdf'):
            pdfReaderobj = PyPDF2.PdfFileReader(open(files, 'rb'))
            pdfWriterobj = PyPDF2.PdfFileWriter()
            if pdfReaderobj.isEncrypted:
                continue
            else:
                pdfWriterobj.addPage(pdfReaderobj.getPage(0))
                pdfWriterobj.encrypt(sys.argv[1])
                # str.strip('.pdf') strips characters, not the suffix; splitext removes only the extension
                resultPdf = open(os.path.splitext(files)[0] + '_encrypted.pdf', 'wb')
                pdfWriterobj.write(resultPdf)
                resultPdf.close()
I am trying to create a program that copies files with a certain file extension to a given folder. When files are located in subfolders instead of the root folder, the program fails to get the correct path. In its current state the program works perfectly for files in the root folder, but it crashes when it finds matching items in subfolders: it tries to use the root folder as the directory instead of the correct subfolder.
My code is as follows:
# Selective_copy.py walks through file tree and copies files with
# certain extension to given folder
import shutil
import os
import re

# Deciding the folders and extensions to be targeted
# TODO: user input instead of static values
extension = "zip"
source_folder = "/Users/viliheikkila/documents/kooditreeni/"
destination_folder = "/Users/viliheikkila/documents/test"

def Selective_copy(source_folder):
    # create regex to identify file extensions
    mo = re.compile(r"(\w+).(\w+)")  # Group(2) represents the file extension
    for dirpath, dirnames, filenames in os.walk(source_folder):
        for i in filenames:
            if mo.search(i).group(2) == extension:
                file_path = os.path.abspath(i)
                print("Copying from " + file_path + " to " + destination_folder)
                shutil.copy(file_path, destination_folder)

Selective_copy(source_folder)
dirpath is one of the things provided by walk for a reason: it gives the path to the directory that contains the items in filenames. You can use it to determine the subfolder you should be using.
file_path = os.path.abspath(i)
This line is blatantly wrong.
Keep in mind that filenames holds a list of base file names. At this point it's just a list of strings, and (technically) they are not associated at all with files in the filesystem.
os.path.abspath does string-only operations and merges the file name with the current working directory. As a result, the merged path points to a file that does not exist.
What you should do instead is join dirpath (the root) with the base file name; both values are yielded by os.walk:
file_path = os.path.join(dirpath, i)
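Putting that fix into the original function gives something like the sketch below. Assumptions: Python 3, the extension is compared case-insensitively with os.path.splitext instead of the regex, and the hypothetical `selective_copy` returns what it copied so the result can be checked. Files with the same name in different subfolders will overwrite each other in the destination, and if the destination lies inside the source tree the walk may pick up the copies as well.

```python
import os
import shutil

def selective_copy(source_folder, destination_folder, extension):
    """Copy every file under source_folder with the given extension into destination_folder.

    Returns the list of source paths that were copied.
    """
    wanted = '.' + extension.lower().lstrip('.')
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_folder):
        for name in filenames:
            if os.path.splitext(name)[1].lower() == wanted:
                # The bare name is only meaningful relative to dirpath.
                file_path = os.path.join(dirpath, name)
                shutil.copy(file_path, destination_folder)
                copied.append(file_path)
    return copied
```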
I need to os.walk from my parent path (tutu) through all subfolders. The deepest subfolders contain the files that I need to process with my code. All the deepest folders that have files share the same file 'layout': one *.adf.txt file, one *.idf.txt file, one *.sdrf.txt file and one or more *.dat files, as shown in the pictures.
My problem is that I don't know how to use the os module to iterate, from my parent folder, through all subfolders sequentially. I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continues to the sub-subfolder inside it, if one exists. If it exists, it should verify whether that file layout is present (this is no problem...), and if it is, apply the code (no problem either). If not, and if that folder has no more subfolders, it should return to the parent folder and os.walk to the next subfolder, and so on for all subfolders in my parent folder (tutu). To sum up, I need a function like the one below (written in a Python/imaginary-code hybrid):
for all folders in tutu:
    if os.havefiles in os.walk(current_path):  # 'havefiles' doesn't exist, I think...
        for filename in os.walk(current_path):
            if 'adf' in filename:
                etc...
                #my code
    elif:
        while true:
            go deep
    else:
        os.chdir(parent_folder)
Do you think it is best to define a function to call in my code to do the job?
This is the code that I've tried to use, without success, of course:
import csv
import os
import fnmatch
abs_path=os.path.abspath('.')
for dirname, subdirs, filenames in os.walk('.'):
    # print path to all subdirectories first.
    for subdirname in subdirs:
        print os.path.join(dirname, subdirname), 'os.path.join(dirname, subdirname)'
        current_path= os.path.join(dirname, subdirname)
        os.chdir(current_path)
        for filename in os.walk(current_path):
            print filename, 'f in os.walk'
            if os.path.isdir(filename)==True:
                break
            elif os.path.isfile(filename)==True:
                print filename, 'file'
                #code here
Thanks in advance...
I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continue to the sub-subfolder inside that subfolder, if it exists.
This doesn't make any sense. If a folder is empty, it doesn't have any subfolders.
Maybe you mean that if it has no regular files, then recurse into its subfolders, but if it has any, don't recurse, and instead check the layout?
To do that, all you need is something like this:
import collections
import os

for dirname, subdirs, filenames in os.walk('.'):
    if filenames:
        # can't use os.path.splitext, because that will give us .txt instead of .adf.txt
        # partition('.')[-1] is everything after the first dot, without the dot: 'x.adf.txt' -> 'adf.txt'
        extensions = collections.Counter(filename.partition('.')[-1]
                                         for filename in filenames)
        if (extensions['adf.txt'] == 1 and extensions['idf.txt'] == 1 and
                extensions['sdrf.txt'] == 1 and extensions['dat'] >= 1 and
                len(extensions) == 4):
            pass  # got a match, do what you want
        # Whether this is a match or not, prune the walk.
        del subdirs[:]
I'm assuming here that you only want to find directories that have exactly the specified files, and no others. To remove that last restriction, just remove the len(extensions) == 4 part.
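Wrapped into a function that can be tested, the same approach might look like this. It is a sketch: `find_layout_dirs` is a hypothetical name, and the suffix keys have no leading dot because `partition('.')[-1]` returns everything after the first dot (`'s1.adf.txt'` gives `'adf.txt'`). Looking up a missing key in a Counter returns 0 without adding it, so the `len(extensions) == 4` test is not affected by those lookups.

```python
import collections
import os

def find_layout_dirs(root):
    """Return directories holding exactly one *.adf.txt, *.idf.txt and *.sdrf.txt
    plus one or more *.dat files, and nothing else."""
    matches = []
    for dirname, subdirs, filenames in os.walk(root):
        if filenames:
            extensions = collections.Counter(name.partition('.')[-1] for name in filenames)
            if (extensions['adf.txt'] == 1 and extensions['idf.txt'] == 1 and
                    extensions['sdrf.txt'] == 1 and extensions['dat'] >= 1 and
                    len(extensions) == 4):
                matches.append(dirname)
            # A directory with files is a leaf for our purposes: don't descend further.
            del subdirs[:]
    return matches
```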
There's no need to explicitly iterate over subdirs or anything, or recursively call os.walk from inside os.walk. The whole point of walk is that it's already recursively visiting every subdirectory it finds, except when you explicitly tell it not to (by pruning the list it gives you).
os.walk will automatically "dig down" recursively, so you don't need to recurse the tree yourself.
I think this should be the basic form of your code:
import csv
import os
import fnmatch
directoriesToMatch = [list here...]
filenamesToMatch = [list here...]
abs_path=os.path.abspath('.')
for dirname, subdirs, filenames in os.walk('.'):
    if len(set(directoriesToMatch).difference(subdirs))==0:  # all dirs are there
        if len(set(filenamesToMatch).difference(filenames))==0:  # all files are there
            if <any other filename/directory checking code>:
                # processing code here ...
According to the Python documentation, if for whatever reason you don't want os.walk to keep recursing into some directories, just delete those entries from subdirs in place (this only works with the default topdown=True):
http://docs.python.org/2/library/os.html
If you instead want to check that there are NO sub-directories where you find your files to process, you could also change the dirs check to:
if len(subdirs)==0:  # check that this directory has no subdirectories
I'm not sure I quite understand the question, so I hope this helps!
Edit:
Ok, so if you need to check there are no files instead, just use:
if len(filenames)==0:
But as I stated above, it would probably be better to just look FOR specific files instead of checking for empty directories.
My program doesn't recognize folders as directories; it treats them as files. Because of this, the recursion prints the folders as files, and then, since no folders are queued for traversal, the program finishes.
import os
import sys

class DRT:
    def dirTrav(self, dir, buff):
        newdir = []
        for file in os.listdir(dir):
            print(file)
            if(os.path.isdir(file)):
                newdir.append(os.path.join(dir, file))
        for f in newdir:
            print("dir: " + f)
            self.dirTrav(f, "")

dr = DRT()
dr.dirTrav(".", "")
See os.walk in the Python documentation:
This example displays the number of bytes taken by non-directory files in each directory under the starting directory, except that it doesn’t look under any CVS subdirectory:
import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
    print(root, "consumes", end=" ")
    print(sum(getsize(join(root, name)) for name in files), end=" ")
    print("bytes in", len(files), "non-directory files")
    if 'CVS' in dirs:
        dirs.remove('CVS')  # don't visit CVS directories
The problem is that you're not checking the right thing. file is just the filename, not the pathname. That's why you need os.path.join(dir, file), on the next line, right? So you need it in the isdir call, too. But you're just passing file.
So, instead of asking "is ./foo/bar/baz a directory?" you're just asking "is baz a directory?" It interprets just baz as ./baz, as you'd expect. And, since there (probably) is no "./baz", you get back False.
So, change this:
if(os.path.isdir(file)):
    newdir.append(os.path.join(dir, file))
to:
path = os.path.join(dir, file)
if os.path.isdir(path):
    newdir.append(path)
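With that change applied, the manual recursion works. Here is a sketch of the corrected traversal as a plain function (`list_dirs_recursive` is a hypothetical name) that returns the directories instead of printing them. Note that os.path.isdir follows symlinks, so a symlink loop would recurse until Python's recursion limit; os.walk avoids this by not following directory symlinks unless `followlinks=True` is passed.

```python
import os

def list_dirs_recursive(directory):
    """Return every subdirectory below `directory`, found by manual recursion."""
    found = []
    for name in os.listdir(directory):
        # Test the joined path, not the bare name: a bare name is resolved
        # against the current working directory, not against `directory`.
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            found.append(path)
            found.extend(list_dirs_recursive(path))
    return found
```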
All that being said, using os.walk as sotapme suggested is simpler than trying to build it yourself.
I am running a script that walks a directory structure and generates new files in each folder in the directory. I want to delete some of the files right after creation. This is my idea, but it is quite wrong I imagine:
directory = os.path.dirname(obj)
m = MeshExporterApplication(directory)
os.remove(os.path.join(directory,"*.mesh.xml"))
How do you put wildcards in a path? I guess not like /home/me/*.txt, but that is what I am trying.
Thanks,
Gareth
You can use the glob module:
import glob
glob.glob("*.mesh.xml")
to get a list of matching files. Then you delete them, one by one.
directory = os.path.dirname(obj)
m = MeshExporterApplication(directory)
# you can use absolute paths in the glob
# to ensure that you're purging the files in
# the right directory, e.g. "/tmp/*.mesh.xml"
for f in glob.glob(os.path.join(directory, "*.mesh.xml")):
    os.remove(f)
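A reusable version of this, as a sketch: `remove_matching` is a hypothetical helper, and it only looks at files directly inside `directory`, not in its subfolders. `glob.escape` is used so that a directory name containing `[` or `*` is matched literally.

```python
import glob
import os

def remove_matching(directory, pattern='*.mesh.xml'):
    """Delete files directly inside `directory` whose names match `pattern`.

    Returns the list of removed paths.
    """
    removed = []
    # Building the pattern from `directory` makes the result independent
    # of the current working directory.
    for path in glob.glob(os.path.join(glob.escape(directory), pattern)):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```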
Do a for loop with the list of files in the directory as the thing you are looping over:
import os
import re
directory = os.path.dirname(obj)
m = MeshExporterApplication(directory)
for filename in os.listdir(directory):
    if re.match(r".*\.mesh\.xml$", filename) is not None:
        os.remove(os.path.join(directory, filename))