Folder structure:
Folder
/ \
/ \
subfolder1 files
/\
/ \
inner_subfolder1 files
/\
/ \
sub_inner_folder files
/
files
Problem here is files in sub_inner_folder are not encrypted.
def encypt_Files():
for folder, subfolders, files in os.walk('/home/username/Desktop/folder'):
for subfolder in subfolders:
os.chdir(folder)
for files in os.listdir():
if files.endswith('.pdf'):
PdfReaderobj = PyPDF2.PdfFileReader(open(files, 'rb'))
PdfWriterobj = PyPDF2.PdfFileWriter()
if PdfReaderobj.isEncrypted:
break
else:
PdfWriterobj.addPage(PdfReaderobj.getPage(0))
PdfWriterobj.encrypt(sys.argv[1])
resultPdf = open(files.strip('.pdf')+'_encrypted.pdf', 'wb')
PdfWriterobj.write(resultPdf)
resultPdf.close()
probem here is files in sub_inner_folder are not encrypted.
You do os.chdir(folder) where is should be os.chdir(subfolder). Also, you need to change the directory back using os.chdir("..") when you're done with that directory.
If you start on the wrong working directory, you won't be able to chdir() anywhere. So you need a os.chdir("/home/username/Desktop/folder") first.
Also, permission may break out of the loop. Add
except FileNotFoundError:
pass # or whatever
except PermissionError:
pass # or whatever
But: os.walk() already gives you a list of files. You should just need to loop over these. That way you also get rid of os.listdir()
Yet another option which sounds totally reasonable to me:
import glob
for result in glob.iglob('/home/username/Desktop/folder/**/*.pdf'):
print(result)
One problem I see is that you're breaking out of the inner for loop on finding an encrypted file. That should probably be a continue, but your making a new iterator using files suggests you may need to rethink the whole strategy.
And another problem is that you're chdiring to a relative path that may no longer be relative to where you are. I suggest using os.path.join instead.
Oh, and you're chdiring to folder instead of to the presumably intended subfolder.
I suggest you start over. Use the files iterator provided by os.walk, and use os.path.join to list out the full path to each file in the directory structure. Then add your pdf encryption code using the full path to each file, and ditch chdir.
foldername=[] # used to store folder paths
for folders,subfolders,files in os.walk(path):
foldername.append(folders) # storing folder paths in list
for file_path in foldername:
os.chdir(file_path) # changing directory path from stored folder paths
for files in os.listdir():
if files.endswith('.pdf'):
pdfReaderobj=PyPDF2.PdfFileReader(open(files,'rb'))
pdfWriterobj=PyPDF2.PdfFileWriter()
if pdfReaderobj.isEncrypted:
continue
else:
pdfWriterobj.addPage(pdfReaderobj.getPage(0))
pdfWriterobj.encrypt(sys.argv[1])
resultPdf=open(files.strip('.pdf')+'_encrypted.pdf','wb')
pdfWriterobj.write(resultPdf)
resultPdf.close()
Related
I have my jupyter notebook (python script) in current directory. In current directory, I have two subfolders, namely a and b. In both directories a and b I have equal number of .dat files with same names. For example, directory a contains files, namely x1-x1-val_1, x1-x1-val_5, x1-x1-val_11...x1-x1-val_86 and x1-x2-val_1, x1-x2-val_5, x1-x2-val_11...x1-x2-val_86, i.e. values are in range(1,90,5). Likewise I have files in directory b.
I want to use my python script to access files in a and b to perform iterative operations on .dat files. My present code works only if I keep files of directory a or b in current directory. For example, my script uses following function.
def get_info(test):
my_dict = {'test':test}
c = []
for i in range(1,90,5):
x_val = 'x_val_'+test+'-val_'+str(i)
y_val = 'y_val_'+test+'-val_'+str(i)
my_dict[x_val],my_dict[y_val]= np.loadtxt(test+'-val_'+str(i)+'.dat'
,usecols= (1,2),unpack=True)
dw = compute_yy(my_dict[x_val],my_dict[y_val],test)
c.append(dw)
my_dict.update({test+'_c'+:np.array(c)})
return my_dict
I call get_info() by using following:
tests = ['x1-x1', 'x1-x2']
new_dict = {}
for i in tests:
new_dict.update({i:get_info(i)})
How can I use my code to access files in either directory a and/or b? I know its about providing correct path, but I am unsure how can I do so. One way I thought is following;
ext = '.dat'
for files in os.listdir(path_to_dir):
if files.endswith(ext):
print(files) # do operations
Alternative could be to make use of os.path.join(). However, I am unable to solve it such that I can use same python script (with minimum changes perhaps) that can use files and iterate on them which are in subfolders a and b. Thanks for your feedback in advance!
If you want to run get_info() on every folder separatelly then you have two methods:
First: described by #medium-dimensional in comment
You can use os.chdir(folder) to change Current Working Directory. And then code will run with files in this folder
You can see current working directory with print( os.getcwd() )
os.chdir("a")
get_info(i)
os.chdir("..") # move back to parent folder
os.chdir("b")
get_info(i)
os.chdir("..") # move back to parent folder
chdir() (similar to command cd in console) can use relative path (r"a") full path (r"C:\full\path\to\a") and .. to move to parent folder (r"a\..\b")
If files can be in nested folders then .. may not go back you can use getcwd()
cwd = os.getcwd()
os.chdir("folder1/folder2/a")
get_info(i)
os.chdir(cwd) # move back to previous folder
os.chdir("folder1/folder2/b")
get_info(i)
os.chdir(cwd) # move back to previous folder
(BTW: in console on Linux you can use cd - to move back to previous folder)
Second: use folder when you open file
Every command which gets filename can also get path with folder\filename (it can be relative path, full path, and path with ..) like
r"a\filename.dat"
r"C:\full\path\to\b\filename.dat"
r"a\..\b\filename.dat"
So you could define function with extra option folder
def get_info(text, folder):
and use this folder when you read file
loadtxt(folder + r'\' + test+'-val_'+str(i)+'.dat', ...)
or more readable with f-string
loadtxt(rf'{folder}\{test}-val_{i}.dat', ...)
And later you run it as
get_info(i, "a")
get_info(i, "b")
I am trying to write a zip file using Python's zipfile module that starts at a certain subfolder but still maintains the tree structure from that subfolder. For example, if I pass "C:\Users\User1\OneDrive\Documents", the zip file will contain everything from Documents onward, with all of Documents' subfolders maintained within Documents. I have the following code:
import zipfile
import os
import datetime
def backup(src, dest):
"""Backup files from src to dest."""
base = os.path.basename(src)
now = datetime.datetime.now()
newFile = f'{base}_{now.month}-{now.day}-{now.year}.zip'
# Set the current working directory.
os.chdir(dest)
if os.path.exists(newFile):
os.unlink(newFile)
newFile = f'{base}_{now.month}-{now.day}-{now.year}_OVERWRITE.zip'
# Write the zipfile and walk the source directory tree.
with zipfile.ZipFile(newFile, 'w') as zip:
for folder, _ , files in os.walk(src):
print(f'Working in folder {os.path.basename(folder)}')
for file in files:
zip.write(os.path.join(folder, file),
arcname=os.path.join(
folder[len(os.path.dirname(folder)) + 1:], file),
compress_type=zipfile.ZIP_DEFLATED)
print(f'\n---------- Backup of {base} to {dest} successful! ----------\n')
I know I have to use the arcname parameter for zipfile.write(), but I can't figure out how to get it to maintain the tree structure of the original directory. The code as it is now writes every subfolder to the first level of the zip file, if that makes sense. I've read several posts suggesting I use os.path.relname() to chop off the root, but I can't seem to figure out how to do it properly. I am also aware that this post looks similar to others on Stack Overflow. I have read those other posts and cannot figure out how to solve this problem.
The arcname parameter will set the exact path within the zip file for the file you are adding. You issue is when you are building the path for arcname you are using the wrong value to get the length of the prefix to remove. Specifically:
arcname=os.path.join(folder[len(os.path.dirname(folder)) + 1:], file)
Should be changed to:
arcname=os.path.join(folder[len(src):], file)
This is part of a program I'm writing. The goal is to extract all the GPX files, say at G:\ (specified with -e G:\ at the command line). It would create an 'Exports' folder and dump all files with matching extensions there, recursively that is. Works great, a friend helped me write it!! Problem: empty directories and subdirectories for dirs that did not contain GPX files.
import argparse, shutil, os
def ignore_list(path, files): # This ignore list is specified in the function below.
ret = []
for fname in files:
fullFileName = os.path.normpath(path) + os.sep + fname
if not os.path.isdir(fullFileName) \
and not fname.endswith('gpx'):
ret.append(fname)
elif os.path.isdir(fullFileName) \ # This isn't doing what it's supposed to.
and len(os.listdir(fullFileName)) == 0:
ret.append(fname)
return ret
def gpxextract(src,dest):
shutil.copytree(src,dest,ignore=ignore_list)
Later in the program we have the call for extractpath():
if args.extractpath:
path = args.extractpath
gpxextract(extractpath, 'Exports')
So the above extraction does work. But the len function call above is designed to prevent the creation of empty dirs and does not. I know the best way is to os.rmdir somehow after the export, and while there's no error, the folders remain.
So how can I successfully prune this Exports folder so that only dirs with GPXs will be in there? :)
If I understand you correctly, you want to delete empty folders? If that is the case, you can do a bottom up delete folder operation -- which will fail for any any folders that are not empty. Something like:
for root, dirs, files in os.walk('G:/', topdown=true):
for dn in dirs:
pth = os.path.join(root, dn)
try:
os.rmdir(pth)
except OSError:
pass
I need to os.walk from my parent path (tutu), by all subfolders. For each one, each of the deepest subfolders have the files that i need to process with my code. For all the deepest folders that have files, the file 'layout' is the same: one file *.adf.txt, one file *.idf.txt, one file *.sdrf.txt and one or more files *.dat., as pictures shown.
My problem is that i don't know how to use the os module to iterate, from my parent folder, to all subfolders sequentially. I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continue to the sub-subfolder inside that subfolder, if it exists. If exists, then verify if that file layout is present (this is no problem...), and if it is, then apply the code (no problem too). If not, and if that folder don't have more sub-folders, return to the parent folder and os.walk to the next subfolder, and this for all subfolders into my parent folder (tutu). To resume, i need some function like that below (written in python/imaginary code hybrid):
for all folders in tutu:
if os.havefiles in os.walk(current_path):#the 'havefiles' donĀ“t exist, i think...
for filename in os.walk(current_path):
if 'adf' in filename:
etc...
#my code
elif:
while true:
go deep
else:
os.chdir(parent_folder)
Do you think that is best a definition to call in my code to do the job?
this is the code that i've tried to use, without sucess, of course:
import csv
import os
import fnmatch
abs_path=os.path.abspath('.')
for dirname, subdirs, filenames in os.walk('.'):
# print path to all subdirectories first.
for subdirname in subdirs:
print os.path.join(dirname, subdirname), 'os.path.join(dirname, subdirname)'
current_path= os.path.join(dirname, subdirname)
os.chdir(current_path)
for filename in os.walk(current_path):
print filename, 'f in os.walk'
if os.path.isdir(filename)==True:
break
elif os.path.isfile(filename)==True:
print filename, 'file'
#code here
Thanks in advance...
I need a function that, for the current subfolder in os.walk, if that subfolder is empty, continue to the sub-subfolder inside that subfolder, if it exists.
This doesn't make any sense. If a folder is empty, it doesn't have any subfolders.
Maybe you mean that if it has no regular files, then recurse into its subfolders, but if it has any, don't recurse, and instead check the layout?
To do that, all you need is something like this:
for dirname, subdirs, filenames in os.walk('.'):
if filenames:
# can't use os.path.splitext, because that will give us .txt instead of .adf.txt
extensions = collections.Counter(filename.partition('.')[-1]
for filename in filenames)
if (extensions['.adf.txt'] == 1 and extensions['.idf.txt'] == 1 and
extensions['.sdrf.txt'] == 1 and extensions['.dat'] >= 1 and
len(extensions) == 4):
# got a match, do what you want
# Whether this is a match or not, prune the walk.
del subdirs[:]
I'm assuming here that you only want to find directories that have exactly the specified files, and no others. To remove that last restriction, just remove the len(extensions) == 4 part.
There's no need to explicitly iterate over subdirs or anything, or recursively call os.walk from inside os.walk. The whole point of walk is that it's already recursively visiting every subdirectory it finds, except when you explicitly tell it not to (by pruning the list it gives you).
os.walk will automatically "dig down" recursively, so you don't need to recurse the tree yourself.
I think this should be the basic form of your code:
import csv
import os
import fnmatch
directoriesToMatch = [list here...]
filenamesToMatch = [list here...]
abs_path=os.path.abspath('.')
for dirname, subdirs, filenames in os.walk('.'):
if len(set(directoriesToMatch).difference(subdirs))==0: # all dirs are there
if len(set(filenamesToMatch).difference(filenames))==0: # all files are there
if <any other filename/directory checking code>:
# processing code here ...
And according to the python documentation, if you for whatever reason don't want to continue recursing, just delete entries from subdirs:
http://docs.python.org/2/library/os.html
If you instead want to check that there are NO sub-directories where you find your files to process, you could also change the dirs check to:
if len(subdirs)==0: # check that this is an empty directory
I'm not sure I quite understand the question, so I hope this helps!
Edit:
Ok, so if you need to check there are no files instead, just use:
if len(filenames)==0:
But as I stated above, it would probably be better to just look FOR specific files instead of checking for empty directories.
I am running a script that walks a directory structure and generates new files in each folder in the directory. I want to delete some of the files right after creation. This is my idea, but it is quite wrong I imagine:
directory = os.path.dirname(obj)
m = MeshExporterApplication(directory)
os.remove(os.path.join(directory,"*.mesh.xml"))
How to you put wildcards in a path? I guess not like /home/me/*.txt, but that is what I am trying.
Thanks,
Gareth
You can use the glob module:
import glob
glob.glob("*.mesh.xml")
to get a list of matching files. Then you delete them, one by one.
directory = os.path.dirname(obj)
m = MeshExporterApplication(directory)
# you can use absolute pathes in the glob
# to ensure, that you're purging the files in
# the right directory, e.g. "/tmp/*.mesh.xml"
for f in glob.glob("*.mesh.xml"):
os.remove(f)
do a for loop with the list of files as the thing you are looping over.
directory = os.path.dirname(obj)
m = MeshExporterApplication(directory)
for filename in os.listdir(dir):
if not(re.match(".*\.mesh\".xml ,filename) is None):
os.remove(directory + "/" + file)