Joining 2 PDFs using Python join.py results in blank PDF - python

When I run join.py on computer A, I get a properly joined 2-page PDF. When I run it on computers B and C, I get a one-page blank PDF. All three computers are MacBook Pros running OS X 10.10.1.
Running the following command from Script Editor produces no errors:
do shell script "python '/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py' -o '/Volumes/SSD/Users/username/Desktop/CD123AD9-77DB-4678-B301-9498CFD4E344/Welcome Packet.pdf' /Volumes/SSD/Users/username/Desktop/CD123AD9-77DB-4678-B301-9498CFD4E344/*.pdf"
Any ideas on what is causing the joined PDF to come out as 1 page and blank?

This is an old post, but I had exactly the same problem. After a lot of different attempts, I eventually managed to solve it (with the help of SO users, of course; thanks, Marcomdm) in the following way:
I created a string with the necessary command parameters laid out:
stringToCall = "%s %s %s%s" % ("./join.py -o", endPdf, pdfInputDir,"/*.pdf")
os.chdir("/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/")
os.system(stringToCall)
where endPdf is the path of the final PDF, built with
endPdf = os.path.join(pdfOutputDir, pdfName + ".pdf")
Worked like a charm. Hope this helps.
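A hedged alternative to the os.system call above: expand the wildcard in Python with glob and hand subprocess an argument list, so the spaces in the paths need no shell quoting. The join.py path and the -o flag come from the question; the helper names build_join_command and join_pdfs, the "python" interpreter name, and the choice to skip the output file when it already sits in the input folder are assumptions made for illustration.

```python
import glob
import os
import subprocess

# Path taken from the question.
JOIN_SCRIPT = ("/System/Library/Automator/Combine PDF Pages.action"
               "/Contents/Resources/join.py")


def build_join_command(end_pdf, pdf_input_dir, script=JOIN_SCRIPT):
    """Return the argument list for join.py (hypothetical helper)."""
    inputs = sorted(glob.glob(os.path.join(pdf_input_dir, "*.pdf")))
    # Assumption: an existing output file in the input folder should not
    # be fed back in as one of the inputs.
    inputs = [p for p in inputs
              if os.path.abspath(p) != os.path.abspath(end_pdf)]
    if not inputs:
        raise ValueError("no input PDFs found in %r" % pdf_input_dir)
    return ["python", script, "-o", end_pdf] + inputs


def join_pdfs(end_pdf, pdf_input_dir):
    """Run join.py and return its exit status."""
    return subprocess.call(build_join_command(end_pdf, pdf_input_dir))
```

Calling join_pdfs(endPdf, pdfInputDir) would then replace the os.chdir and os.system pair; a non-zero return value means join.py reported a failure.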

Related

Error when trying to merge two slides

How do you merge two slides saved as PDFs? Specifically, I want to know how to add a second slide to a first slide. I tried copying the code from a Python Udemy course while changing the variable names, I found one solution online that looked far from concise, and I glanced at a Stack Overflow thread that ended with there maybe not being a solution. Four times, running the Udemy code gave an error saying one of my PDFs was corrupted, and the slide was never added to the other one. The expected output is a PDF file with two slides, one from each input file. Here's a very rough excerpt of my code (attached). Note: the practice presentations I tried to merge are called ~sept-1.pdf and second-page.pdf, and I changed the username in the file path to remove my nickname.
Goal: Take the one and only slide from PDF 2 and add it to the end of PDF 1.
Import modules
!pip install PyPDF4 then import PyPDF4
Change directory to desktop
import os
os.getcwd()
os.chdir('/Users/mac_username_name/Desktop')
os.getcwd()
Stuff after changing the directory to the desktop
Adding to my desired PDF
f = open('sept-1.pdf','rb')
pdf_reader = PyPDF4.PdfFileReader(f)
first_page = pdf_reader.getPage(0)
pdf_writer = PyPDF4.PdfFileWriter()
pdf_writer.addPage(first_page)
pdf_output = open("newer.pdf","wb")
pdf_writer.write(pdf_output)
f.close()
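The excerpt above only ever adds the first page of sept-1.pdf to the writer; nothing from second-page.pdf is read. A minimal sketch of the stated goal follows, assuming PyPDF4 is installed and using the same PdfFileReader and PdfFileWriter calls as the excerpt plus getNumPages. The function name merge_pdfs is hypothetical, and the sketch does not address the "corrupted" error, whose cause the question does not show.

```python
from contextlib import ExitStack


def merge_pdfs(input_paths, output_path):
    """Copy every page of each input PDF, in order, into one output PDF.

    Returns the number of pages written. Hypothetical helper.
    """
    import PyPDF4  # third-party; imported here so the sketch loads without it

    writer = PyPDF4.PdfFileWriter()
    page_count = 0
    with ExitStack() as stack:
        for path in input_paths:
            # Keep each input open until write(): page data is read lazily.
            handle = stack.enter_context(open(path, "rb"))
            reader = PyPDF4.PdfFileReader(handle)
            for index in range(reader.getNumPages()):
                writer.addPage(reader.getPage(index))
                page_count += 1
        with open(output_path, "wb") as out:
            writer.write(out)
    return page_count
```

With the files from the question, merge_pdfs(["sept-1.pdf", "second-page.pdf"], "newer.pdf") would be expected to write a two-slide newer.pdf.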

Python lowriter call getting different result than from terminal

So, I'm trying to use lowriter to convert some files from doc/docx to pdf inside my program. When I run it as a subprocess, I get 'Application Error', as shown below:
>>rest = subprocess.getoutput('lowriter --convert-to pdf --outdir ./data/temp_convert "<FILE_PATH>/file.docx"')
>>rest
'Application Error'
But when I run it directly from the terminal, everything works fine:
$ lowriter --convert-to pdf --outdir ./data/temp_convert "<FILE_PATH>/file.docx"
convert <FILE_PATH>/file.docx -> ./data/temp_convert/file.pdf using filter : writer_pdf_Export
Any idea what might be causing this?
P.S. This is an old script that worked until last week, and I have changed nothing since the last working build.
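subprocess.getoutput returns only a single string and discards the exit status, which makes a failure like this hard to narrow down. As a diagnostic sketch (not a fix), the call could go through subprocess.run with an argument list, keeping the return code, stdout, and stderr separate. The helper name run_and_report is hypothetical; the lowriter arguments are the ones from the question.

```python
import subprocess


def run_and_report(args, **kwargs):
    """Run a command given as an argument list.

    Returns (returncode, stdout, stderr) so each can be inspected.
    Extra keyword arguments (for example env= or cwd=) go to subprocess.run.
    """
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
        **kwargs
    )
    return completed.returncode, completed.stdout, completed.stderr


# Arguments from the question; "<FILE_PATH>" is the question's placeholder.
LOWRITER_ARGS = [
    "lowriter", "--convert-to", "pdf",
    "--outdir", "./data/temp_convert",
    "<FILE_PATH>/file.docx",
]
```

Comparing the result of run_and_report(LOWRITER_ARGS) with the terminal run, and passing cwd= or env= to mimic the terminal session, may show whether the working directory or environment differs between the two.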

What happened when I used pandas to read CSV files multiple times in a Kaggle notebook?

I am participating in Kaggle's NCAA March Madness Analytics Competition. I used pandas to read the information from CSV files but ran into the following problem:
seeds = pd.read_csv('/kaggle/input/march-madness-analytics-2020/2020DataFiles/2020DataFiles/2020-Womens-Data/WDataFiles_Stage1/WNCAATourneySeeds.csv')
seeds
Here the output is empty. And I tried again like this:
rank = seeds.merge(teams)
Then there came an error:
NameError: name 'seeds' is not defined.
I can't figure out what happened. I also tried it offline, where nothing unusual happened. Am I missing something, and how can I fix it? Note that this was not the first time I had used read_csv() to read a CSV file in this notebook, though I can't tell whether that is related to the problem.
A bare file name is resolved against the notebook's current working directory, so the CSV file must be in that folder.
Run this to find out the destination:
%pwd
Put the file in the destination and run this:
seeds = pd.read_csv('WNCAATourneySeeds.csv')
You can also run this:
seeds = pd.read_csv(r'C:\Users....\WNCAATourneySeeds.csv')
Here "C" is the drive where your file is saved; replace "..." with the rest of the path to the file. If you write the path with "\", keep the r prefix so the backslashes are not treated as escape characters.
I finally found the problem. I hadn't noticed that I was writing my code in a Markdown cell. Stupid me!

Error in my python script produces 2 - 3 times too many jpgs (pdf2image) sometimes, but not always

I am using pdf2image to change pdfs to jpgs in about 1600 folders. I have looked around and adapted code from many SO answers, but this one section seems to be overproducing jpgs in certain folders (hard to tell which).
In one particular case, using an Adobe Acrobat tool to make pdfs creates 447 jpgs (correct amount) but my script makes 1059. I looked through and found some pdf pages are saved as jpgs multiple times and inserted into the page sequences of other pdf files.
For example:
PDF A has 1 page and creates PDFA_page_1.jpg.
PDF B has 44 pages and creates PDFB_page_1.jpg through ....page_45.jpg because PDF A shows up again as page_10.jpg. If this is confusing, let me know.
I have tried messing with the index portion of the loop (specifically, taking the +1 away, using pages instead of page, and placing the naming convention in a variable rather than directly in the .save and .move calls).
I also tried using the fmt='jpg' parameter in pdf2image.py but was unable to produce the correct naming scheme because I am unsure how to iterate the page numbers without the for page in pages loop.
for pdf_file in os.listdir(pdf_dir):
    if pdf_file.endswith(".pdf") and pdf_file.startswith("602024"):
        #Convert function from pdf2image
        pages = convert_from_path(pdf_file, 72, output_folder=final_directory)
        print(pages)
        pdf_file = pdf_file[:-4]
        for page in pages:
            #save with designated naming scheme <pdf file name> + page index
            jpg_name = "%s-page_%d.jpg" % (pdf_file,pages.index(page)+1)
            page.save(jpg_name, "JPEG")
            #Moves jpg to the mini_jpg folder
            shutil.move(jpg_name, 'mini_jpg')
            #no_Converted += 1
# Delete ppm files
dir_name = final_directory
ppm_remove_list = os.listdir(dir_name)
for ppm_file in ppm_remove_list:
    if ppm_file.endswith(".ppm"):
        os.remove(os.path.join(dir_name, ppm_file))
There are no error messages, just 2 - 3 times as many jpgs as I expected in just SOME cases. Folders with many single-page pdfs do not experience this problem, nor do folders with a single multi-page pdf. Some folders with multiple multi-page pdfs also function correctly.
I am not sure I understand how that could happen. If you can create a reproducible example, feel free to open an issue on the official repository: https://github.com/Belval/pdf2image
Do provide example PDFs; otherwise, I can't test.
As an aside, instead of pages.index, use for i, page in enumerate(pages); the page number will then be i + 1.
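The enumerate suggestion can be sketched as below. list.index returns the position of the first element that compares equal, so two pages that compare equal would get the same number, whereas enumerate numbers strictly by position. The helper name numbered_names is hypothetical, and this is not claimed to explain the extra JPGs; it only removes one source of ambiguity in the naming.

```python
def numbered_names(pdf_file, pages):
    """Yield (jpg_name, page) pairs, numbering pages from 1 by position."""
    # Strip a trailing ".pdf", as the question does with pdf_file[:-4].
    stem = pdf_file[:-4] if pdf_file.lower().endswith(".pdf") else pdf_file
    for i, page in enumerate(pages):
        yield "%s-page_%d.jpg" % (stem, i + 1), page
```

Inside the original loop this would read: for jpg_name, page in numbered_names(pdf_file, pages): page.save(jpg_name, "JPEG").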

How do I run a script in the command line that is saved in Jupyter Notebook?

I don't know how to run the code I write. If it's a simple practice problem that just prints output values, then of course it works, but when I practice something I would actually use, like a web scraper that needs a Google search term as input, I don't know how to run it so that it can accept input. This specific example is the "I'm Feeling Lucky" Google search project from chapter 11 of 'Automate the Boring Stuff'. I work in Jupyter Notebook, and my research suggests that I need to run my code from the command line. If my file name is 'Google', what do I type into the command line to make this work if, for example, I want to search for the word 'dog'? I've looked all over the internet for answers and can't seem to find them, most likely because the explanations are going over my head. Any help is appreciated.
#! python3
#Opens several Google search results.
import os
import requests, sys, webbrowser, bs4
print('Googling...')
res = requests.get('http://google.com/search?q=' + 'Soup'.join(sys.argv[1:]))
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text)
linkElems = soup.select('.r a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get('href'))
There is no error message when I run the script in my jupyter notebook. It just prints 'Googling...' as expected, but does not allow for a google search input.
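The script reads its search terms from sys.argv[1:], which is only filled in when the file is launched from a terminal; inside a notebook cell, sys.argv holds the kernel's own launch arguments instead. A minimal sketch of the argument handling, using only the standard library, is shown below. The function names and the file name Google.py are assumptions, and the terms are joined with a space rather than the 'Soup' string used in the question's code.

```python
from urllib.parse import urlencode


def build_search_url(terms):
    """Join the search terms and URL-encode them into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})


def main(argv):
    """argv is expected to look like sys.argv: script name, then the terms."""
    terms = argv[1:]
    if not terms:
        raise SystemExit("usage: python Google.py SEARCH TERMS")
    url = build_search_url(terms)
    print("Googling...", url)
    return url
```

Saved as Google.py with a final line main(sys.argv) (after import sys), the script would be started from a terminal as python Google.py dog; within a notebook, calling main(["Google.py", "dog"]) directly passes the same terms without the command line.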
