Merge PDF files using Python PyPDF2

I watched a video on how to merge PDF files into one PDF file, and I tried to modify the code slightly so that it works on a folder containing the PDF files.
The main folder (Spyder) contains Demo.py, and this is the code:
import os
from PyPDF2 import PdfFileMerger
source_dir = os.getcwd() + './PDF Files'
merger = PdfFileMerger()
for item in os.listdir(source_dir):
    if item.endswith('pdf'):
        merger.append(item)
merger.write('.PDF Files/Output/Complete.pdf')
merger.close()
I have a subfolder named PDF Files inside the main folder Spyder. I put the PDF files in this subfolder, and inside PDF Files I created a folder named Output.
I got a FileNotFoundError for 1.pdf, even though printing item inside the loop shows the PDF names.
The traceback:
Traceback (most recent call last):
File "demo.py", line 9, in <module>
merger.append(item)
File "C:\Users\Future\AppData\Local\Programs\Python\Python36\lib\site-packages\PyPDF2\merger.py", line 203, in append
self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
File "C:\Users\Future\AppData\Local\Programs\Python\Python36\lib\site-packages\PyPDF2\merger.py", line 114, in merge
fileobj = file(fileobj, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '1.pdf'

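I was able to solve it like this. os.listdir() returns bare file names, so each name has to be prefixed with the folder before it is passed to merger.append(); otherwise PyPDF2 looks for the file in the current working directory: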
import os
from PyPDF2 import PdfFileMerger
source_dir = './PDF Files/'
merger = PdfFileMerger()
for item in os.listdir(source_dir):
    if item.endswith('pdf'):
        #print(item)
        merger.append(source_dir + item)
merger.write(source_dir + 'Output/Complete.pdf')
merger.close()
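A slightly more robust variant of the fix above, as a sketch: it builds each path with os.path.join instead of string concatenation, sorts the names (os.listdir returns entries in arbitrary order, so the page order of the merged file would otherwise be unpredictable; note the sort is lexicographic, so "10.pdf" comes before "2.pdf"), matches the extension case-insensitively and skips subfolders such as Output. The helper names list_pdfs and merge_folder are hypothetical. The merge step assumes that newer PyPDF2 releases expose PdfMerger while older ones (as in the question) only have PdfFileMerger, so it tries both and imports lazily; the listing part works on its own.

```python
import os


def list_pdfs(source_dir):
    """Return full paths of the .pdf files directly inside source_dir, sorted by name."""
    names = sorted(os.listdir(source_dir))  # os.listdir gives bare names in arbitrary order
    return [
        os.path.join(source_dir, name)  # full path, independent of the working directory
        for name in names
        if name.lower().endswith(".pdf")
        and os.path.isfile(os.path.join(source_dir, name))  # skip folders like Output
    ]


def merge_folder(source_dir, output_path):
    """Merge every PDF in source_dir into output_path (hypothetical helper, needs PyPDF2)."""
    # Assumption: newer PyPDF2 has PdfMerger; older releases only have PdfFileMerger.
    try:
        from PyPDF2 import PdfMerger as Merger
    except ImportError:
        from PyPDF2 import PdfFileMerger as Merger
    out_dir = os.path.dirname(output_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)  # make sure the Output folder exists
    merger = Merger()
    for path in list_pdfs(source_dir):
        merger.append(path)
    merger.write(output_path)
    merger.close()
```

With the folder layout from the question, the call would be merge_folder('./PDF Files', './PDF Files/Output/Complete.pdf'); because the output lands in the Output subfolder, it is not picked up again on the next run.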

Related

Unable to open or read log files: FileNotFoundError [duplicate]

I'm trying to run the following script, which simply reads an image and saves it again:
from PIL import Image
import os
rootdir = '/home/user/Desktop/sample'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        im = Image.open(file)
        im.save(file)
However, I get the following error:
Traceback (most recent call last):
File "test.py", line 10, in <module>
im = Image.open(file)
File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 2258, in open
fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: '1.jpg'
So, what I'm trying to do is simply read the file 1.jpg and save it again, provided that 1.jpg is located in the directory.
How can I fix this issue?
Thanks.
You need to provide the full path, because file holds only the file name (the tail of the path), not the entire path.
You can use os.path.join to join the root directory to the file name:
for root, dirs, files in os.walk(rootdir):
    for file in files:
        path = os.path.join(root, file)
        im = Image.open(path)
        im.save(path)
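On Python 3, pathlib offers an alternative to the os.walk plus os.path.join pattern: Path.rglob() yields paths that already include the root directory, so there is no bare file name to forget to join. This is a minimal sketch; the helper name image_paths and the default suffix list are assumptions, and the question itself runs Python 2.7, which does not ship pathlib.

```python
from pathlib import Path


def image_paths(rootdir, suffixes=(".jpg", ".jpeg", ".png")):
    """Yield full paths of image files anywhere under rootdir (hypothetical helper)."""
    for path in sorted(Path(rootdir).rglob("*")):
        # rglob yields paths that already start with rootdir, so no join is needed
        if path.is_file() and path.suffix.lower() in suffixes:
            yield path
```

The loop from the question would then become: for path in image_paths('/home/user/Desktop/sample'), open the image with Image.open(str(path)) and save it with im.save(str(path)).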

Merge PDF files with same prefix using PyPDF2 Python

I have multiple PDF files with different prefixes, and I want to merge them based on the third underscore-separated field (for example, 1 in 0_2021_1_123.pdf), using the Python library PyPDF2.
This is the error message
Traceback (most recent call last):
File "C:/test2.py", line 12, in <module>
merger.append(filename)
File "C:\py\lib\site-packages\PyPDF2\merger.py", line 203, in append
self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
File "C:\py\lib\site-packages\PyPDF2\merger.py", line 114, in merge
fileobj = file(fileobj, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '0_2021_564495_12345.pdf'
Process finished with exit code 1
For example:
0_2021_1_123.pdf
0_2021_1_1234.pdf
0_2021_1_12345.pdf
0_2021_2_123.pdf
0_2021_2_1234.pdf
0_2021_2_12345.pdf
Expected outcome
1_merged.pdf
2_merged.pdf
Here is what I tried, but I get the error above and it doesn't work. Any help is much appreciated.
from PyPDF2 import PdfFileMerger
import io
import os
files = os.listdir("C:\\test\\raw")
x=0
merger = PdfFileMerger()
for filename in files:
    print(filename.split('_')[2])
    prefix = filename.split('_')[2]
    if filename.split('_')[2] == prefix:
        merger.append(filename)
merger.write("C:\\test\\result" + prefix + "_merged.pdf")
merger.close()
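The code above has several problems: prefix is taken from the same filename it is compared with, so the if is always true; merger.append(filename) gets a bare name from os.listdir, which is resolved against the current working directory (hence the FileNotFoundError); "C:\\test\\result" + prefix has no path separator; and a single merger collects every file. A sketch that groups the names first and uses one merger per group might look like this. The helper names group_by_field and merge_groups are hypothetical, and the PdfMerger/PdfFileMerger fallback assumes the class names of newer and older PyPDF2 releases respectively.

```python
import os
from collections import defaultdict


def group_by_field(filenames, index=2, sep="_"):
    """Group .pdf names by the sep-separated field at position index (hypothetical helper)."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        stem, ext = os.path.splitext(name)
        parts = stem.split(sep)
        if ext.lower() != ".pdf" or len(parts) <= index:
            continue  # skip non-PDFs and names without enough fields
        groups[parts[index]].append(name)
    return dict(groups)


def merge_groups(src_dir, out_dir):
    """Write one <prefix>_merged.pdf per group into out_dir (needs PyPDF2)."""
    try:
        from PyPDF2 import PdfMerger as Merger
    except ImportError:
        from PyPDF2 import PdfFileMerger as Merger
    os.makedirs(out_dir, exist_ok=True)
    for prefix, names in group_by_field(os.listdir(src_dir)).items():
        merger = Merger()  # a fresh merger per group, so groups don't accumulate
        for name in names:
            merger.append(os.path.join(src_dir, name))  # full path, not the bare name
        merger.write(os.path.join(out_dir, prefix + "_merged.pdf"))
        merger.close()
```

For the example files, merge_groups("C:\\test\\raw", "C:\\test\\result") would write 1_merged.pdf and 2_merged.pdf into C:\test\result.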

Merging PDF files with Python

I have been trying to debug this code, which merges a folder of PDFs into one PDF file:
import os
from PyPDF2 import PdfFileMerger
loc = "C:\\Users\\anzal\\desktop\\pdf"
x = [a for a in os.listdir(loc) if a.endswith(".pdf")]
print(x)
merger = PdfFileMerger()
for pdf in x:
    merger.append(open(pdf,'rb'))
with open("result.pdf", "wb") as fout:
    merger.write(fout)
But it doesn't find the PDF files; I get the following output and error:
['A1098e.pdf', 'J1098e.pdf']
Traceback (most recent call last):
File "combopdf.py", line 14, in <module>
merger.append(open(pdf,'rb'))
FileNotFoundError: [Errno 2] No such file or directory: 'A1098e.pdf'
Any ideas on how to fix this? Thanks.
Use absolute paths:
loc = "C:\\Users\\anzal\\desktop\\pdf"
x = [loc+"\\"+a for a in os.listdir(loc) if a.endswith(".pdf")]  # loc+"\\"+ is the part to add
Right now it's looking for the .pdf files in the current working directory (the directory from which the script is being run), and I'm pretty sure that's not C:/Users/anzal/desktop/pdf.
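Instead of prefixing each name by hand, glob.glob() can list the files with the directory already attached, because every match keeps the directory part of the pattern. A sketch, with the helper name pdfs_in assumed; note that on POSIX systems the *.pdf pattern is case-sensitive, while on Windows it is not.

```python
import glob
import os


def pdfs_in(folder):
    """Return the .pdf files in folder as paths that include folder (hypothetical helper)."""
    # glob keeps the directory part of the pattern in every match,
    # so the results can be opened from any working directory.
    return sorted(glob.glob(os.path.join(folder, "*.pdf")))
```

In the question's loop this would be used as for pdf in pdfs_in(loc): merger.append(pdf), passing the path and letting the merger open the file itself.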

tarfile doesn't work for .gz files

I have a nested tar file with this structure:
tarfile.tar.gz
--tar1.gz
----tar1.txt
--tar2.gz
--tar3.gz
I wanted to write a little Python script to extract all the archives breadth-first into the same folder hierarchy, i.e. tar1.txt should end up in tarfile/tar1/.
Here's the script:
#!/usr/bin/python
import os
import re
import tarfile
data = os.path.join(os.getcwd(), 'data')
dirs = [data]
while len(dirs):
    dirpath = dirs.pop(0)
    for subpath in os.listdir(dirpath):
        if not re.search('(.tar)?.gz$', subpath):
            continue
        with tarfile.open(os.path.join(dirpath, subpath)) as tarf:
            tarf.extractall(path=dirpath)
    for subpath in os.listdir(dirpath):
        newpath = os.path.join(dirpath, subpath)
        if os.path.isdir(newpath):
            dirs.append(newpath)
        elif dirpath != data or os.path.islink(newpath):
            os.remove(newpath)
But when I run the script, I get the following error:
Traceback (most recent call last):
File "./extract.py", line 16, in <module>
with tarfile.open(os.path.join(dirpath, subpath)) as tarf:
File "/usr/lib/python2.7/tarfile.py", line 1678, in open
raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
The '.tar.gz' file is extracted fine, but not the nested '.gz' files. What's going on here? Does the tarfile module not handle .gz files?
.gz denotes that the file is gzipped; .tar.gz means a tar file that has been gzipped. tarfile handles gzipped tars perfectly well, but it doesn't handle files that aren't tar archives (like your tar1.gz).
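Since a .gz extension alone doesn't say whether the content is a tar archive, one option is to test with tarfile.is_tarfile() and fall back to the gzip module for a plain gzipped file. A minimal sketch; extract_any is a hypothetical helper, and naming the decompressed output by dropping ".gz" (tar1.gz becomes tar1) is an assumption. The "data" extraction filter is only passed on Python versions that provide it.

```python
import gzip
import os
import shutil
import tarfile


def extract_any(path, dest_dir):
    """Hypothetical helper: unpack a tar archive or gunzip a plain .gz file.

    Returns the paths written into dest_dir.
    """
    if tarfile.is_tarfile(path):
        # tarfile.open() in its default "r" mode autodetects gzip/bz2/xz compression.
        with tarfile.open(path) as tf:
            # Use the "data" extraction filter where this Python supports it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tf.extractall(path=dest_dir, **extra)
            return [os.path.join(dest_dir, m.name) for m in tf.getmembers()]
    if path.endswith(".gz"):
        # Not a tar archive: a single gzip-compressed file (like tar1.gz).
        out_path = os.path.join(dest_dir, os.path.basename(path)[:-3])
        with gzip.open(path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return [out_path]
    raise ValueError("neither a tar archive nor a .gz file: " + path)
```

In the breadth-first script, calling extract_any(os.path.join(dirpath, subpath), dirpath) in place of the tarfile.open block would handle both kinds of file.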

Python zipfile gives an error for some files

I have simple code that zips files using the zipfile module. Some files are zipped fine, but for others I get a FileNotFoundError. I have checked whether file size is the cause, and it is not.
I can pack a file with a name like example file.py, but when the file is inside a directory, like 'Analyze Files en-US_es-ES.xlsx', it fails.
It works when I change os.path.basename to os.path.join, but I don't want to zip the whole folder structure; I want a flat structure in my zip.
Here is my code:
import os
import zipfile
path = input()
x=zipfile.ZipFile('new.zip', 'w')
for root, dir, files in os.walk(path):
    for eachFile in files:
        x.write(os.path.basename(eachFile))
x.close()
Error looks like this:
Traceback (most recent call last):
File "C:/Users/mypc/Desktop/Zip test.py", line 15, in <module>
x.write(os.path.basename(eachFile))
File "C:\Python34\lib\zipfile.py", line 1326, in write
st = os.stat(filename)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'Analyze Files en-US_ar-SA.xlsx'
Simply change the working directory to each folder while adding its files; the files are then written under their bare names, so the zip has no directory structure:
import os
import zipfile
path = input()
baseDir = os.getcwd()
with zipfile.ZipFile('new.zip', 'w') as z:
    for root, dir, files in os.walk(path):
        os.chdir(root)
        for eachFile in files:
            z.write(eachFile)
        os.chdir(baseDir)
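An alternative to changing the working directory is the arcname parameter of ZipFile.write(filename, arcname): the first argument is where the file is read from, the second is the name stored in the archive. Passing the full path as filename and the bare name as arcname gives a flat archive without os.chdir, which also leaves the process-wide working directory alone. A sketch; zip_flat is a hypothetical helper, and raising on duplicate names is a design choice (two files with the same name in different folders would clash in a flat archive, and zipfile itself would only warn).

```python
import os
import zipfile


def zip_flat(src_dir, zip_path):
    """Zip every file under src_dir into zip_path without folders (hypothetical helper)."""
    seen = set()
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)  # where the file really is
                if os.path.abspath(full_path) == os.path.abspath(zip_path):
                    continue  # don't add the archive to itself
                if name in seen:
                    raise ValueError("duplicate file name in flat archive: " + name)
                seen.add(name)
                zf.write(full_path, arcname=name)  # how it is stored in the zip
    return sorted(seen)
```

With the question's input, zip_flat(input(), 'new.zip') would store 'Analyze Files en-US_es-ES.xlsx' and similar files at the top level of new.zip.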
