Merge PDF files with the same prefix using PyPDF2 in Python

I have multiple PDF files that have different prefixes. I want to merge these PDF files based on the third prefix (the third underscore-separated value in the file name). I want to do this using the Python library PyPDF2.
This is the error message:
Traceback (most recent call last):
  File "C:/test2.py", line 12, in <module>
    merger.append(filename)
  File "C:\py\lib\site-packages\PyPDF2\merger.py", line 203, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "C:\py\lib\site-packages\PyPDF2\merger.py", line 114, in merge
    fileobj = file(fileobj, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '0_2021_564495_12345.pdf'
Process finished with exit code 1
For example:
0_2021_1_123.pdf
0_2021_1_1234.pdf
0_2021_1_12345.pdf
0_2021_2_123.pdf
0_2021_2_1234.pdf
0_2021_2_12345.pdf
Expected outcome
1_merged.pdf
2_merged.pdf
Here is what I tried, but I am getting the error above and it does not work. Any help is much appreciated.
from PyPDF2 import PdfFileMerger
import io
import os
files = os.listdir("C:\\test\\raw")
x=0
merger = PdfFileMerger()
for filename in files:
    print(filename.split('_')[2])
    prefix = filename.split('_')[2]
    if filename.split('_')[2] == prefix:
        merger.append(filename)
merger.write("C:\\test\\result" + prefix + "_merged.pdf")
merger.close()
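One possible approach, as a minimal sketch rather than a definitive answer: group the full paths by the third underscore-separated field first, then write one merged file per group. The function names group_pdfs_by_third_field and merge_groups are made up for illustration, and the merge step assumes a PyPDF2 release older than 3.0, where PdfFileMerger is still available.

```python
import os
from collections import defaultdict


def group_pdfs_by_third_field(source_dir):
    """Map the third underscore-separated field of each PDF name to its full paths."""
    groups = defaultdict(list)
    for filename in sorted(os.listdir(source_dir)):
        parts = filename.split("_")
        if not filename.lower().endswith(".pdf") or len(parts) < 3:
            continue  # skip non-PDFs and names without a third field
        # os.listdir returns bare names, so join the directory back on
        groups[parts[2]].append(os.path.join(source_dir, filename))
    return dict(groups)


def merge_groups(source_dir, result_dir):
    """Write one <prefix>_merged.pdf per group into result_dir."""
    # Imported here so the grouping helper works without PyPDF2 installed.
    from PyPDF2 import PdfFileMerger  # assumption: PyPDF2 older than 3.0

    for prefix, paths in group_pdfs_by_third_field(source_dir).items():
        merger = PdfFileMerger()  # a fresh merger for every group
        for path in paths:
            merger.append(path)
        merger.write(os.path.join(result_dir, prefix + "_merged.pdf"))
        merger.close()
```

With the folders from the question, a call such as merge_groups("C:\\test\\raw", "C:\\test\\result") would be expected to produce 1_merged.pdf and 2_merged.pdf, provided the result folder already exists.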

Related

Merge PDF Files using python PyPDF2

I watched a video to learn how to merge PDF files into one PDF file, and I tried to modify the code slightly so that it works on a folder containing the PDF files.
The main folder (Spyder) contains Demo.py, and this is the code:
import os
from PyPDF2 import PdfFileMerger
source_dir = os.getcwd() + './PDF Files'
merger = PdfFileMerger()
for item in os.listdir(source_dir):
    if item.endswith('pdf'):
        merger.append(item)
merger.write('.PDF Files/Output/Complete.pdf')
merger.close()
Inside the main folder Spyder I have a subfolder named PDF Files, which holds the PDF files, and inside PDF Files I created a folder named Output.
I get a file-not-found error for 1.pdf, although printing item inside the loop shows the PDF names.
The traceback of the error:
Traceback (most recent call last):
  File "demo.py", line 9, in <module>
    merger.append(item)
  File "C:\Users\Future\AppData\Local\Programs\Python\Python36\lib\site-packages\PyPDF2\merger.py", line 203, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "C:\Users\Future\AppData\Local\Programs\Python\Python36\lib\site-packages\PyPDF2\merger.py", line 114, in merge
    fileobj = file(fileobj, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '1.pdf'
I solved it like this:
import os
from PyPDF2 import PdfFileMerger
source_dir = './PDF Files/'
merger = PdfFileMerger()
for item in os.listdir(source_dir):
    if item.endswith('pdf'):
        #print(item)
        # os.listdir returns bare names, so prepend the folder
        merger.append(source_dir + item)
merger.write(source_dir + 'Output/Complete.pdf')
merger.close()

Merging PDF files with Python

I have been trying to debug this code for merging a folder of PDFs into one PDF file:
import os
from PyPDF2 import PdfFileMerger
loc = "C:\\Users\\anzal\\desktop\\pdf"
x = [a for a in os.listdir(loc) if a.endswith(".pdf")]
print(x)
merger = PdfFileMerger()
for pdf in x:
    merger.append(open(pdf,'rb'))
with open("result.pdf", "wb") as fout:
    merger.write(fout)
But it doesn't recognize the pdf files - I get the following error:
['A1098e.pdf', 'J1098e.pdf']
Traceback (most recent call last):
  File "combopdf.py", line 14, in <module>
    merger.append(open(pdf,'rb'))
FileNotFoundError: [Errno 2] No such file or directory: 'A1098e.pdf'
Any ideas on how to fix this? Thanks.
Use absolute paths by prefixing each name with loc + "\\":
loc = "C:\\Users\\anzal\\desktop\\pdf"
x = [loc+"\\"+a for a in os.listdir(loc) if a.endswith(".pdf")]
Right now it's looking for the .pdf files in the directory from which the script is being run, and I'm pretty sure that's not C:/Users/anzal/desktop/pdf.
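As a sketch of the same fix without hard-coding the separator, os.path.join can build the full paths. The helper name list_pdf_paths is hypothetical and only used for illustration.

```python
import os


def list_pdf_paths(folder):
    """Return the full paths of the .pdf files directly inside folder."""
    return [
        os.path.join(folder, name)  # os.listdir yields bare names only
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(".pdf")
    ]
```

The returned paths can be passed straight to merger.append, regardless of the directory the script is started from.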

Why do I get an IOError saying my file doesn't exist even though it exists in the directory?

I am trying to loop over a directory in Python, and for one specific file, which happens to be the last file in the directory, I get an IOError.
The error I get is:
IOError: [Errno 2] No such file or directory: 'nod_gyro_instance_11_P_4.csv'
My script:
for filename in os.listdir("/Users/my_name/PycharmProjects/My_Project/Data/Nod/Gyro"):
    data = []
    if filename.endswith(".csv"):
        data.append(k_fold(filename))
        continue
    else:
        continue
k_fold does this:
def k_fold(myfile, myseed=11109, k=20):
    # Load data
    data = open(myfile).readlines()
The entire traceback:
Traceback (most recent call last):
  File "/Users/my_name/PycharmProjects/MY_Project/Cross_validation.py", line 30, in <module>
    data.append(k_fold(filename))
  File "/Users/my_name/PycharmProjects/My_Project/Cross_validation.py", line 8, in k_fold
    data = open(myfile).readlines()
IOError: [Errno 2] No such file or directory: 'nod_gyro_instance_11_P_4.csv'
My CSV files are such:
nod_gyro_instance_0_P_4.csv
nod_gyro_instance_0_P_3.csv
nod_gyro_instance_0_P_2.csv
nod_gyro_instance_0_P_5.csv
...
nod_gyro_instance_11_P_4.csv
nod_gyro_instance_10_P_6.csv
nod_gyro_instance_10_P_5.csv
nod_gyro_instance_10_P_4.csv
Why doesn't it recognize my nod_gyro_instance_10_P_4.csv file?
os.listdir returns just filenames, not absolute paths. If you're not currently in that same directory, trying to read the file will fail.
You need to join the dirname onto the filename returned:
data_dir = "/Users/my_name/PycharmProjects/My_Project/Data/Nod/Gyro"
for filename in os.listdir(data_dir):
    k_fold(os.path.join(data_dir, filename))
Alternatively, you could use glob to do both the listing (with full paths) and extension filtering:
import glob
for filename in glob.glob("/Users/my_name/PycharmProjects/My_Project/Data/Nod/Gyro/*.csv"):
    k_fold(filename)
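A separate point worth noting: in the question as posted, data = [] appears inside the loop, so the list would be reset on every iteration. A minimal sketch that combines the glob approach with a list created once is shown here; the name load_all_csv and the loader parameter are assumptions for illustration, with loader standing in for k_fold.

```python
import glob
import os


def load_all_csv(data_dir, loader):
    """Apply loader to every .csv file in data_dir and collect the results."""
    results = []  # created once, outside the loop
    # glob returns paths that already include data_dir
    for path in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        results.append(loader(path))
    return results
```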

Creating and saving a PDF in ReportLab returns an error

I'm trying to create a PDF using ReportLab but I keep getting an error. Ideally I want to save the created PDF to a specific directory, but this is just for testing, and the save function only saves it to the current working directory.
import os
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
folder_path = "/home/ro/A Python Scripts/dest_test/"
folder_name = os.path.basename(folder_path)
pdf_name = folder_name + '.py'
def generate_pdf(folder_paths, folder_names, speedy_share_links):
    c = canvas.Canvas(folder_names)
    c.drawString(100,780,folder_names)
    c.drawString(100,750,speedy_share_links)
    c.save()
generate_pdf(folder_path, folder_name, "hiya")
I get the following error
Traceback (most recent call last):
  File "pdf.py", line 16, in <module>
    generate_pdf(folder_path, folder_name, "hiya")
  File "pdf.py", line 14, in generate_pdf
    c.save()
  File "/usr/lib/python2.7/dist-packages/reportlab/pdfgen/canvas.py", line 1209, in save
    self._doc.SaveToFile(self._filename, self)
  File "/usr/lib/python2.7/dist-packages/reportlab/pdfbase/pdfdoc.py", line 216, in SaveToFile
    f = open(filename, "wb")
IOError: [Errno 2] No such file or directory: u''
Your path has an empty basename, which is why the error shows the empty string.
Replace the line:
folder_path = "/home/ro/A Python Scripts/dest_test/"
With the line:
folder_path = "/home/ro/A Python Scripts/dest_test/foobar.pdf"
and your program will generate foobar.pdf in the current directory.
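To keep the folder path and still derive a file name from it, one option is to normalize the path before taking the basename, since os.path.basename returns an empty string for a path that ends in a separator. This is a sketch under that assumption; the helper name pdf_path_for_folder is hypothetical.

```python
import os


def pdf_path_for_folder(folder_path):
    """Build <folder>/<folder name>.pdf, tolerating a trailing slash."""
    normalized = os.path.normpath(folder_path)  # drops the trailing separator
    folder_name = os.path.basename(normalized)  # now the last path component
    return os.path.join(normalized, folder_name + ".pdf")


# With the trailing slash, basename alone yields an empty string
print(repr(os.path.basename("/home/ro/A Python Scripts/dest_test/")))  # ''
```

The returned path could then be passed to canvas.Canvas so that the file is written inside the target folder instead of the current working directory.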

Parse XML Tag value for all files in directory using Python

I can't quite make the leap despite pre-existing similar questions. Help would be valued!
I am trying to recursively parse all XML files in a directory and its subdirectories.
I am looking for the value of the id attribute on the Operator tag.
Example source XML:
<Operators>
  <Operator id="OId_LD">
    <OperatorCode>LD</OperatorCode>
    <OperatorShortName>ARRIVA THE SHIRES LIMIT</OperatorShortName>
This is the code I have thus far:
from xml.dom.minidom import parse
import os
def jarv(target_folder):
    for root,dirs,files in os.walk(target_folder):
        for targetfile in files:
            if targetfile.endswith(".xml"):
                print targetfile
                dom=parse(targetfile)
                name = dom.getElementsByTagName('Operator_id')
                print name[0].firstChild.nodeValue
This is the terminal command I am running:
python -c "execfile('xml_tag.py'); jarv('/Users/admin/Projects/AtoB_GTFS')"
And this is the error I receive:
tfl_64-31_-37434-y05.xml
encodings.xml
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "xml_tag.py", line 8, in jarv
    dom=parse(targetfile)
  File "/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/minidom.py", line 1918, in parse
    return expatbuilder.parse(file)
  File "/usr/local/Cellar/python/2.7.8_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/expatbuilder.py", line 922, in parse
    fp = open(file, 'rb')
IOError: [Errno 2] No such file or directory: 'encodings.xml'
(frigo)andytmac:AtoB_GTFS admin$ python -c "execfile('xml_tag.py'); jarv('/Users/admin/Projects/AtoB_GTFS')"
tfl_64-31_-37434-y05.xml
If I comment out the code after the print targetfile line, it does list all the XML files I have.
Thanks for your assistance,
Andy
You're not looking in the right place (relative path): when you use for root, dirs, files in os.walk(target_folder):, files is a list of the file names in the directory root, not their absolute paths.
Try replacing dom=parse(targetfile) with dom = parse(os.path.join(root, targetfile))
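Putting the path fix together, a minimal sketch could look like the following. It is written for Python 3 (the question uses Python 2), the function name operator_ids is made up for illustration, and it assumes, based on the example XML, that id is an attribute of the Operator element rather than a tag named Operator_id.

```python
import os
from xml.dom.minidom import parse


def operator_ids(target_folder):
    """Yield (path, id) for every <Operator id="..."> found under target_folder."""
    for root, dirs, files in os.walk(target_folder):
        for name in files:
            if not name.endswith(".xml"):
                continue
            # files holds bare names, so join them with the current root
            path = os.path.join(root, name)
            dom = parse(path)
            for node in dom.getElementsByTagName("Operator"):
                yield path, node.getAttribute("id")
```

Iterating over operator_ids('/Users/admin/Projects/AtoB_GTFS') would then give one (path, id) pair per Operator element; files that are not well-formed XML would still raise a parse error.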
