I have code that converts all the .jpg files in a folder into one PDF, but it is not working. I believe the problem is with the directory I am passing in. Below are the code and my output. It reports that the PDF was written, but the directory name is missing from the message.
root = "C:\\Users\\Matthew\\Desktop\\Comics\\"
try:
    n = 0
    for dirpath, dirnames, filenames in os.walk(root):
        PdfOutputFileName = os.path.basename(dirpath) + ".pdf"
        c = canvas.Canvas(PdfOutputFileName)
        if n > 0 :
            for filename in filenames:
                LowerCaseFileName = filename.lower()
                if LowerCaseFileName.endswith(".jpg"):
                    print(filename)
                    filepath = os.path.join(dirpath, filename)
                    print(filepath)
                    im = ImageReader(filepath)
                    imagesize = im.getSize()
                    c.setPageSize(imagesize)
                    c.drawImage(filepath,0,0)
                    c.showPage()
                    c.save()
        n = n + 1
        print "PDF of Image directory created" + PdfOutputFileName
except:
    print "Failed creating PDF"
The below is my output:
PDF of Image directory created.pdf
At the start, n is 0.
The os.walk loop runs only once in this case (there is probably only one directory to scan, and the single print statement confirms it). It provides filenames and dirnames as lists, but the n > 0 test skips the inner loop on that first pass, so nothing is drawn.
for dirpath, dirnames, filenames in os.walk(root):
    PdfOutputFileName = os.path.basename(dirpath) + ".pdf"
    c = canvas.Canvas(PdfOutputFileName)
    if n > 0 :
My advice: get rid of the if n > 0 test.
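As for the directory name missing from the printed message, here is a minimal sketch, assuming the cause is the trailing separator in root. os.path.basename returns an empty string for a path that ends with a separator, so the output name collapses to ".pdf". The sketch uses ntpath (the Windows flavour of os.path) so that it behaves the same on any platform; pdf_name_for is a made-up helper name, not part of the original code.

```python
import ntpath  # Windows implementation of os.path, importable on any OS

root = "C:\\Users\\Matthew\\Desktop\\Comics\\"

# A trailing separator leaves nothing after the last backslash.
empty_name = ntpath.basename(root)


def pdf_name_for(dirpath):
    """Hypothetical helper: build '<folder name>.pdf' even if dirpath ends with a separator."""
    # normpath drops the trailing separator, so basename sees the folder name.
    return ntpath.basename(ntpath.normpath(dirpath)) + ".pdf"


print(repr(empty_name))
print(pdf_name_for(root))
```

Dropping the trailing "\\" from root would have the same effect as the normpath call.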
Have a look at img2pdf for lossless conversion in Python:
https://gitlab.mister-muffin.de/josch/img2pdf
Example CLI usage:
img2pdf img1.png img2.jpg -o out.pdf
I have a folder that is structured like this
post
-----1
------10am
-----------images
-----2
-------10am
-----------images
-----3
------10am
-----------images
The numbered folders go up to 31, each with the same subfolder '10am', and inside that is another folder, 'images'.
In another folder, I have all the image and .txt files that I need to copy, based on folder name, using Python.
So what I need to do now is copy
'2.jpg' into 'post\2\10am\images' and
'2.txt' into 'post\2\10am', and so on.
Here is my code so far:
import os,shutil
sampleFiles = r"\practice image and text"
destination = r"\posts"
time = '10am'
sample = os.listdir(sampleFiles)
# sample = ['10.jpg', '10.txt', '11.jpg', '11.txt', '13.png', '13.txt', '16.jpg', '16.txt', '17.jpg', '17.txt', '18.jpg', '18.txt', '2.jpg', '2.txt', '20.jpg', '20.txt', '23.jpg', '23.txt', '24.jpg', '24.txt', '25.jpg', '25.txt', '27.jpg', '27.txt', '3.jpg', '3.txt','4.jpg', '4.txt', '5.jpg', '5.txt', '6.jpg', '6.txt', '9.jpg', '9.txt']
for root, dirs, files in os.walk(destination):
    for folderName in dirs:
        #get root + foldername
        rootWithFolder = os.path.join(root, folderName)
        #get path to date
        pathToDate = rootWithFolder.endswith(int(folderName)) # how to get the number?
        # get path to image folders
        if rootWithFolder.endswith('images'):
            pathToImage = rootWithFolder
            #copy .jpg files to pathToImage
            shutil.copy(sampleFiles + '\\' + str(pathToDate) + '.jpg' , pathToImage) #not the most elegant way
            #copy .txt files to pathToDate
            shutil.copy(sampleFiles + '\\' + str(pathToDate) + '.txt' , pathToDate + '\\' + 'time') #not the most elegant way
In my code, I am stuck on how to get pathToDate so that I can copy each file based on the name of the folder.
I tried using a function like this:
def allfiles(list):
    for i in range(len(list)):
        return list[i] # returns only the first value of the list
        # print(list[i]) #but this one returns all the value of the list
allfiles(sample)
but it only returns the first item of the list.
My question is: how can I get the folders that are named as a number and ignore the others, like the '10am' and 'images' folders?
Or is there a better way to do this? Thank you.
Here's a suggestion if you're still looking for a solution... I'd go the other way round:
from pathlib import Path
from shutil import copy
sample_folder = Path("practice image and text")
dest_folder = Path("posts")
for file in sample_folder.glob("*.*"):
    number, suffix = file.name.split(".")
    if suffix == "txt":
        copy(file, dest_folder / number / "10am")
    else:
        copy(file, dest_folder / number / "10am" / "images")
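For anyone who wants to try the idea without touching real data, here is a rough, self-contained variant of the loop above. It is only a sketch under a few assumptions of mine: the function name distribute and the time_folder parameter are made up, files whose numbered folder does not exist are skipped instead of raising, and Path.stem / Path.suffix replace the split(".") so that names with more than one dot do not break the unpacking.

```python
import tempfile
from pathlib import Path
from shutil import copy


def distribute(sample_folder, dest_folder, time_folder="10am"):
    """Copy '<n>.txt' into dest/<n>/<time_folder> and any other file into .../images."""
    copied = []
    for file in sorted(Path(sample_folder).iterdir()):
        if not file.is_file():
            continue
        # stem is the name without its last extension, e.g. '2' for '2.jpg'.
        target = Path(dest_folder) / file.stem / time_folder
        if file.suffix.lower() != ".txt":
            target = target / "images"
        if not target.is_dir():
            continue  # no matching numbered folder, so skip this file
        copied.append(Path(copy(file, target)))
    return copied


# Throwaway demo tree so the sketch can run anywhere.
with tempfile.TemporaryDirectory() as tmp:
    samples = Path(tmp) / "practice image and text"
    posts = Path(tmp) / "posts"
    samples.mkdir()
    (posts / "2" / "10am" / "images").mkdir(parents=True)
    for name in ("2.jpg", "2.txt", "99.jpg"):
        (samples / name).write_text("demo")
    result = [p.relative_to(posts).as_posix() for p in distribute(samples, posts)]
    print(result)
```

Here 99.jpg is left alone because there is no posts/99 folder in the demo tree.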
A script was supplied to me in order to upload files to a cloud bucket. You input the dir where the files you want to upload are and bingo bango, done.
What needs to happen is that there are additional sub dirs with their own files in them that I would like to transfer as well based on the input of the root dir. They would need to retain their tree structure relative to the root dir input.
Using the current code I get a write error/access denied fail. I know this is because the for loop is using os.listdir which can't parse the extra sub dirs and files but I'm not sure how to modify.
I attempted to get all the information I needed using os.walk and parsing that out. I verified with some print tests that it was looking in the right place for everything. However I hit a wall when I got this error when running the script:
folder\folder\lib\ntpath.py", line 76, in join
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not list
I understand that something is being generated as a list when it shouldn't be but I'm not sure how to go about this...
This is the original script provided to me below. I have added the variable at the top just to be a little less abstract.
local_directory_path = r'C:\folder\folder\sync\FROM_LOCAL_UPLOAD'
def upload_folder_to_cloud(self, mount_id, local_directory_path):
    ''' This method will list every file at the local_directory_path and then for each,
    it will call the api method athera.sync.upload_file for every file in your local directory
    '''
    _, destination_folder = os.path.split(local_directory_path)
    if not destination_folder:
        self.logger.error("Make sure the provided 'local_directory_path' does not end with a '/' or a '\\'")
        sys.exit(2)
    destination_folder = destination_folder + "/"
    self.logger.info("Folder = {}".format(destination_folder))
    for filename in os.listdir(local_directory_path):
        destination_path = destination_folder + filename
        filepath = os.path.join(local_directory_path, filename)
        with open(filepath, "rb") as f:
            _, err = self.client.upload_file(self.group_id, mount_id, f, destination_path=destination_path,)
            if err != None:
                self.logger.error(err)
                sys.exit(4)
    return destination_folder
This is what I modified it to as a test:
for root, dirs, files in os.walk(local_directory_path):
    srcFile = (os.path.join(files))
    srcRoot = (os.path.join(root))
    rootSplit = os.path.normpath(srcRoot).split(os.path.sep)
    srcDirs = '/'.join(rootSplit[4:])
    src = str('fixLocalFolder') + '/' + str(srcDirs) +'/'+ (files)
    dst = str(srcDirs) + '/' + (files)
    destination_folder = str(srcRoot) + "/"
    destination_path = str(destination_folder) + str(srcFile)
    filepath = os.path.join((str(srcDirs), str(srcFile)))
    with open(filepath, "rb") as f:
        _, err = self.client.upload_file(
            self.group_id,
            mount_id,
            f,
            destination_path=destination_path,
        )
        if err != None:
            self.logger.error(err)
            sys.exit(4)
return destination_folder
I do not code for a living so I am sure I am not going about this the right way. I apologize for any code atrocities in advance. Thank you!
I do see some issues in that code, even without testing it. Something like the following might work for that loop. (Note! Untested!).
for root, dirs, files in os.walk(local_directory_path):
    # Iterate through files in the currently processed directory
    for current_file in files:
        # Full path to file
        src_file = os.path.join(root, current_file)
        # Get the sub-path relative to the original root
        # (the start must be the local root, not the destination folder name).
        sub_path = os.path.relpath(root, start=local_directory_path)
        # Get the destination path
        destination_path = os.path.join(sub_path, current_file)
        with open(src_file, "rb") as f:
            _, err = self.client.upload_file(
                self.group_id,
                mount_id,
                f,
                destination_path=destination_path,
            )
            if err is not None:
                self.logger.error(err)
                sys.exit(4)
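I believe your central problem was a misunderstanding of what os.walk gives you. It yields a listing of each directory (and subdirectory), one after another, as a (root, dirs, files) tuple. dirs and files are lists of names, which is why passing files straight to os.path.join raised the TypeError.
Thus the values in successive iterations might look like this (when listing /mydir):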
# First iteration:
root = "/mydir"
dirs = ["subdir", ...]
files = ["something.doc", "something else.txt"]
# Second iteration:
root = "/mydir/subdir"
dirs = ["sub-sub-dir1", ...]
files = ["file1.txt", "file2.txt", ...]
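To make the relative-path part concrete without involving the upload client, here is a small sketch that only plans the uploads. The function name plan_uploads is made up, and I am assuming two things the question does not confirm: that the remote side expects '/' separators, and that the destination should keep the top folder name (as destination_folder + filename did in the original script).

```python
import os
import tempfile


def plan_uploads(local_directory_path):
    """Return (source file, destination path) pairs for every file under the folder."""
    local_directory_path = os.path.normpath(local_directory_path)
    parent = os.path.dirname(local_directory_path)
    pairs = []
    for root, dirs, files in os.walk(local_directory_path):
        dirs.sort()  # make the traversal order deterministic
        # Path of this directory relative to the parent of the root,
        # so it still starts with the uploaded folder's own name.
        sub_path = os.path.relpath(root, start=parent)
        for current_file in sorted(files):
            src_file = os.path.join(root, current_file)
            # Assumption: the remote side wants forward slashes.
            destination_path = os.path.join(sub_path, current_file).replace(os.sep, "/")
            pairs.append((src_file, destination_path))
    return pairs


# Throwaway demo tree.
with tempfile.TemporaryDirectory() as tmp:
    top = os.path.join(tmp, "FROM_LOCAL_UPLOAD")
    os.makedirs(os.path.join(top, "sub", "deeper"))
    for rel in ("a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "deeper", "c.txt")):
        with open(os.path.join(top, rel), "w") as handle:
            handle.write("demo")
    destinations = [dst for _, dst in plan_uploads(top)]
    print(destinations)
```

Each pair could then be fed to the existing open(...) / upload_file(...) block in place of filepath and destination_path.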
I wrote a script which will scan the shared folders and get the file age of all the files in those folders and some other file meta info.
my_paths = []
for p in ['\\\\data\\hold\\app1','\\\\data\\hold\\app2']:
    for dirpath, dirnames, filenames in os.walk(p):
        my_paths.append(filenames)
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            if last_modified(full_path) < threshold:
                string_output='Age :' +age_file(full_path)+'\n'+'File Size : '+file_size(full_path)+'\n'
                print(string_output)
                #create(string_output);
Right now I have to hard-code all the shares in the script. Instead, I thought of making the script read them from a text file in which I can update the paths.
text.txt
\\data\hold\app1
\\data\hold\app2
Revamped code:
my_paths = []
for line in open(r'd:\mydata\text.txt').readlines():
    my_paths.append(line.strip())
    print my_paths
    for dirpath, dirnames, filenames in os.walk(line):
        my_paths.append(filenames)
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            if last_modified(full_path) < threshold:
                string_output='Age :' +age_file(full_path)+'\n'+'File Size : '+file_size(full_path)+'\n'
                print(string_output)
                #create(string_output);
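my_paths returns data like ['\\data\hold\app1','\\data\hold\app2'], but somehow the rest is not working.
You simply forgot to strip the line you read in the loop. A line returned by readlines() keeps its trailing newline, so os.walk is given a path that does not exist and silently yields nothing: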
...
for dirpath, dirnames, filenames in os.walk(line.strip()):
...
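A minimal sketch of the read-then-walk flow, in case it helps. The helper names read_paths and files_under are made up, the demo uses a temporary folder instead of the real shares, and the last_modified / age_file checks from the question are left out.

```python
import os
import tempfile


def read_paths(list_file):
    """Return the non-empty lines of list_file with surrounding whitespace stripped."""
    with open(list_file) as handle:
        return [line.strip() for line in handle if line.strip()]


def files_under(paths):
    """Yield the full path of every file below each directory in paths."""
    for top in paths:
        for dirpath, dirnames, filenames in os.walk(top):
            for filename in filenames:
                yield os.path.join(dirpath, filename)


# Throwaway demo: one fake share and a list file that points at it.
with tempfile.TemporaryDirectory() as tmp:
    share = os.path.join(tmp, "app1")
    os.makedirs(share)
    with open(os.path.join(share, "report.log"), "w") as handle:
        handle.write("demo")
    list_file = os.path.join(tmp, "text.txt")
    with open(list_file, "w") as handle:
        handle.write(share + "\n\n")  # trailing newline and a blank line on purpose
    my_paths = read_paths(list_file)
    found = [os.path.basename(p) for p in files_under(my_paths)]
    # What the unstripped version effectively did: walk a path ending in '\n'.
    unstripped = list(os.walk(share + "\n"))
    print(my_paths, found, unstripped)
```

The empty unstripped list is the "nothing happens" symptom from the question: os.walk ignores the error for a missing directory unless an onerror callback is passed.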
I wrote a loop which ignores all sub-directories which contain .txt files within them.
src = raw_input("Enter source disk location: ")
src = os.path.abspath(src)
dst = raw_input("Enter first destination to copy: ")
dst = os.path.abspath(dst)
dest = raw_input("Enter second destination to move : ")
dest = os.path.abspath(dest)
path_patter = '(\S+)_(\d+)_(\d+)_(\d+)__(\d+)_(\d+)_(\d+)'
for dir, dirs, files in os.walk(src):
    if any(f.endswith('.txt') for f in files):
        dirs[:] = [] # do not recurse into subdirectories
        continue
    files = [os.path.join(dir, f) for f in files ]
    for f in files:
        part1 = os.path.dirname(f)
        part2 = os.path.dirname(os.path.dirname(part1))
        part3 = os.path.split(part1)[1]
        path_miss1 = os.path.join(dst, "missing_txt")
        path_miss = os.path.join(path_miss1, part3)
        path_missing = os.path.join(dest, "missing_txt")
        searchFileName = re.search(path_patter, part3)#### update
        if searchFileName:#####update
            try:
                if not os.path.exists(path_miss):
                    os.makedirs(path_miss)
                else:
                    pass
                if os.path.exists(path_miss):
                    distutils.dir_util.copy_tree(part1, path_miss)
                else:
                    debug_status += "missing_file\n"
                    pass
                if (get_size(path_miss)) == 0:
                    os.rmdir(path_miss)
                else:
                    pass
                if not os.path.exists(path_missing):
                    os.makedirs(path_missing)
                else:
                    pass
                if os.path.exists(path_missing):
                    shutil.move(part1, path_missing)
                else:
                    pass
                if (get_size(path_missing)) == 0:
                    os.rmdir(path_missing)
                else:
                    pass
            except Exception:
                pass
        else:
            continue
How can I modify this code to compare each directory name with the regular expression? (It still has to ignore directories that contain .txt files.)
import os
import re
def createEscapedPattern(path,pattern):
    newPath = os.path.normpath(path)
    newPath = newPath.replace("\\","\\\\\\\\")
    return newPath + "\\\\\\\\" + pattern
def createEscapedPath(path):
    newPath = os.path.normpath(path)
    return newPath.replace("\\","\\\\")
src = 'C:\\Home\\test'
path_patter = '(\S+)_(\d+)_(\d+)_(\d+)__(\d+)_(\d+)_(\d+)$'
p = re.compile(createEscapedPattern(src,path_patter))
for dir, dirs, files in os.walk(src):
    if any(f.endswith('.txt') for f in files):
        dirs[:] = []
        continue
    if any(p.match(createEscapedPath(dir)) for f in files):
        for f in files:
            print createEscapedPath(dir + "/" + f)
    p = re.compile(createEscapedPattern(dir,path_patter))
There are a couple of things I did here; I hope this example helps:
I wrote this for a Windows file system, so I used two path-conversion functions.
This script ignores directories that contain .txt files, as in your implementation.
This script starts at the src directory and only prints file names if the pattern matches. This is done for all subdirectories that are not ignored by the previous rule.
I used Python's re module and recompile the pattern for each directory, so you get: 'directory/(\S+)_(\d+)_(\d+)_(\d+)__(\d+)_(\d+)_(\d+)$'
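If only the directory's own name has to match, a simpler route (a sketch of a different technique, not the approach above) is to test os.path.basename(dirpath) against the pattern, which avoids escaping the whole path. The function name matching_dirs and the folder names in the demo are made up for illustration.

```python
import os
import re
import tempfile

# Same pattern as in the question, written as a raw string.
NAME_PATTERN = re.compile(r'(\S+)_(\d+)_(\d+)_(\d+)__(\d+)_(\d+)_(\d+)$')


def matching_dirs(src):
    """Return directories under src whose name matches NAME_PATTERN and that hold no .txt file."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(src):
        dirnames.sort()  # deterministic traversal order
        if any(name.endswith('.txt') for name in filenames):
            dirnames[:] = []  # do not recurse into subdirectories
            continue
        # Only the last path component is compared with the pattern.
        if NAME_PATTERN.match(os.path.basename(dirpath)):
            matches.append(dirpath)
    return matches


# Throwaway demo tree with one match, one .txt directory and one non-match.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("run_01_02_2013__10_11_12", "with_txt_01_02_2013__10_11_12", "other"):
        os.makedirs(os.path.join(tmp, rel))
    with open(os.path.join(tmp, "with_txt_01_02_2013__10_11_12", "notes.txt"), "w") as handle:
        handle.write("demo")
    found = [os.path.basename(d) for d in matching_dirs(tmp)]
    print(found)
```

The copy and move steps from the question could then run once per returned directory instead of once per file.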
I'm trying to collect the path of every file under the Data folder into a list, files[]. I can read the files directly in Data and in its immediate subfolders, but my code does not go any deeper: it cannot reach a subfolder of a subfolder without my writing another loop. What I want is a loop that descends to any depth under Data so that I get all the file paths.
For example this is what I get:
['Data/DataReader.py', 'Data/DataReader - Copy.py', 'Data/Dat/DataReader.py', 'Data/fge/er.txt']
This is what I want but it can still go into deeper folders:
['Data/DataReader.py', 'Data/DataReader - Copy.py', 'Data/Dat/DataReader.py', 'Data/fge/er.txt', 'Data/fge/Folder/dummy.png', 'Data/fge/Folder/AnotherFolder/data.dat']
This is my current code. What would I need to add or change?
import os
from os import walk
files = []
folders = []
for (dirname, dirpath, filename) in walk('Data'):
    folders.extend(dirpath)
    files.extend(filename)
    break
filecount = 0
for i in files:
    i = 'Data/' + i
    files[filecount] = i
    filecount += 1
foldercount = 0
for i in folders:
    i = 'Data/' + i
    folders[foldercount] = i
    foldercount += 1
subfolders = []
subf_files = []
for i in folders:
    for (dirname, dirpath, filename) in walk(i):
        subfolders.extend(dirpath)
        subf_files.extend(filename)
        break
    subf_files_count = 0
    for a in subf_files:
        a = i + '/'+a
        files = files
        files.append(a)
        print files
    subf_files = []
print files
print folders
Thanks a lot!
I don't understand what you are trying to do, especially why you break out of your walk after the first element. os.walk already descends into every subfolder:
import os
files = []
folders = []
for (path, dirnames, filenames) in os.walk('Data'):
    folders.extend(os.path.join(path, name) for name in dirnames)
    files.extend(os.path.join(path, name) for name in filenames)
print files
print folders
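On Python 3 the same listing can also be sketched with pathlib, whose rglob("*") descends to any depth. This is an alternative to the os.walk loop above, not a replacement for it. The function name list_tree is made up, and the demo builds a temporary Data folder that imitates part of the layout from the question.

```python
import tempfile
from pathlib import Path


def list_tree(top):
    """Return (files, folders) below top as sorted lists of POSIX-style paths."""
    top = Path(top)
    files, folders = [], []
    for entry in top.rglob("*"):  # every file and folder, at any depth
        # Keep the top folder's own name as the prefix, e.g. 'Data/fge/er.txt'.
        relative = (Path(top.name) / entry.relative_to(top)).as_posix()
        (folders if entry.is_dir() else files).append(relative)
    return sorted(files), sorted(folders)


# Throwaway demo tree.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "Data"
    (data / "fge" / "Folder" / "AnotherFolder").mkdir(parents=True)
    for rel in ("DataReader.py", "fge/er.txt", "fge/Folder/dummy.png",
                "fge/Folder/AnotherFolder/data.dat"):
        (data / rel).write_text("demo")
    files, folders = list_tree(data)
    print(files)
    print(folders)
```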