How to save web page as pdf automatically in Selenium python


I'm trying to save a web page as a PDF, but all I get is a file name selection window. How can I enter a file name and save the file automatically?
settings = {
"appState": {
"recentDestinations": [{
"id": "Save as PDF",
"origin": "local",
"account": "",
"margin": 0,
'size': 'auto'
}],
"selectedDestinationId": "Save as PDF",
"version": 2,
"margin": 0,
'size': 'auto'
}
}
# There is probably a lot of excess here; I tried everything that might help
prefs = {'printing.print_preview_sticky_settings': json.dumps(settings),
'profile.default_content_settings.popups': 0,
'download.name': 'test.pdf',  # This doesn't work :(
'download.default_directory': download_path,
'savefile.default_directory': download_path,
'download.prompt_for_download': False,
"download.directory_upgrade": True,
"safebrowsing_for_trusted_sources_enabled": False,
"safebrowsing.enabled": True,
"download.extensions_to_open": "",
"plugins.always_open_pdf_externally": True,
}
options.add_experimental_option('prefs', prefs)
options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(service=ser, options=options)
driver.maximize_window()
driver.get('url')
driver.execute_script('window.print();')
time.sleep(20)
I couldn't find a solution online. I have tried every option I could think of, but none of them works for me.

The window.print() route in Selenium gives you no way to set the output file name. One alternative is a third-party tool such as wkhtmltopdf.
Install wkhtmltopdf
Download the wkhtmltopdf binaries from the official website and install them on your system.
Add wkhtmltopdf to your PATH
Add the wkhtmltopdf binary to your system PATH so that your Python script can find it.
Use the save_as_pdf function
The save_as_pdf function takes a Selenium webdriver instance and a filename as arguments and saves the current page as a PDF.
import subprocess
def save_as_pdf(driver, filename):
    # wkhtmltopdf fetches the URL itself, so cookies and login state
    # from the Selenium session are not carried over
    subprocess.run(['wkhtmltopdf', driver.current_url, filename], check=True)

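I was able to solve this problem with the pyautogui library, although I don't think it is the best solution.
import time
import pyautogui as pag
driver.execute_script('window.print();')
time.sleep(20)  # wait for the file name dialog to appear
pag.typewrite('test.pdf')  # type the file name into the dialog
time.sleep(1)
pag.press("enter")  # confirm the save
time.sleep(20)  # give the PDF time to be written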
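An approach that avoids the save dialog altogether might be worth trying. The sketch below assumes a Selenium 4 driver, whose print_page() method returns the rendered PDF as a base64-encoded string; the helper name save_page_as_pdf is made up for illustration and is not part of Selenium.

```python
import base64
from pathlib import Path


def save_page_as_pdf(driver, pdf_path):
    """Save the page currently loaded in `driver` to `pdf_path`.

    Assumes a Selenium 4 driver: print_page() returns the PDF as a
    base64-encoded string, so no print dialog is involved.
    """
    pdf_base64 = driver.print_page()
    target = Path(pdf_path)
    target.write_bytes(base64.b64decode(pdf_base64))
    return target
```

With a real driver this would be called as save_page_as_pdf(driver, 'test.pdf') after driver.get(...). Some Chrome versions reportedly support this only in headless mode, so treat it as something to test rather than a guaranteed fix.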

Related

Custom Download folder for chromedriver in python

I want to save the files I get with a scraper into custom folders. I looked around and none of the solutions I found worked for me. Here is my configuration:
options = webdriver.ChromeOptions()
prefs = {
'profile.default_content_settings.popups': 0,
'download.default_directory': my_data_folder,
"download.directory_upgrade": True,
"download.prompt_for_download": False,
"safebrowsing.enabled":False,
}
options.add_argument('--remote-debugging-port=9222')
options.add_experimental_option("useAutomationExtension", False)
desired_caps = {
'prefs': {
'savefile': {
'default_directory': my_data_folder,
"directory_upgrade": True,
"extensions_to_open": ""
}
}
}
options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(chrome_options=options, desired_capabilities=desired_caps)
But when I try to download a file, it goes to ~/Downloads/ instead of my_data_folder.
I have tried prefs and desired_caps independently, to no avail.
I am using Chromium 108.0.5359.22 (snap).
Help is appreciated!
I have tried:
How to download to a specific folder with Chromedriver?
Define download directory for chromedriver selenium with python
and many other posts and blogs. All of these solutions are summarised in the script above.
Thanks!
UPDATE
If I add
options.add_argument("--headless")
the folder option works, but running headless is not desirable for other reasons. Is there a better way to fix this problem?

Have you thought about using shutil to move the file after the download?
Here's how I implemented that in another project I was working on:
import os
import shutil
# pick the most recently created file in the download folder
filename = max(
    [os.path.join(download_folder, f) for f in os.listdir(download_folder)],
    key=os.path.getctime)
# move/rename it; replace "filename.format" with the name you want
shutil.move(
    filename,
    os.path.join(download_folder, "filename.format")
)
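One gap in moving the newest file is that the download may not have finished yet. A minimal sketch of a more defensive version follows. It assumes Chrome marks in-progress downloads with a .crdownload suffix and that nothing else writes to the folder at the same time. The function name move_latest_download and its timeout and poll parameters are hypothetical, and it compares modification times rather than creation times.

```python
import shutil
import time
from pathlib import Path


def move_latest_download(download_dir, destination, timeout=30.0, poll=0.5):
    """Move the newest finished file in `download_dir` to `destination`.

    Waits until at least one finished file exists and no .crdownload
    (in-progress) file remains. Raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        entries = [p for p in Path(download_dir).iterdir() if p.is_file()]
        partial = [p for p in entries if p.suffix == ".crdownload"]
        finished = [p for p in entries if p.suffix != ".crdownload"]
        if finished and not partial:
            newest = max(finished, key=lambda p: p.stat().st_mtime)
            destination = Path(destination)
            # Create the custom folder if it does not exist yet.
            destination.parent.mkdir(parents=True, exist_ok=True)
            return Path(shutil.move(str(newest), str(destination)))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished download in {download_dir}")
        time.sleep(poll)
```

A call such as move_latest_download(os.path.expanduser("~/Downloads"), os.path.join(my_data_folder, "report.pdf")) would then run right after the click that triggers the download; the file name here is only an example.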

Python selenium print page problem with page layout

I have run into a problem with Python and Selenium while trying to print (save as PDF) a batch of pages from an essay, using a Python program I wrote. The content of some pages does not fit on one A4 page, and the automatic page breaking cuts the pages badly. As the attached image of the PDF shows, the bottom row of the first page is cut in half (and is therefore unreadable), and the top 3-4 rows of the next page's content are missing (because of the top margin, I guess). I would like to fix this in my code so that the page breaks fall correctly and every row shows.
The PDF is not in English, but I think that is irrelevant to the problem.
The parts of my code that configure printing:
chrome_options = webdriver.ChromeOptions()
settings = {
"recentDestinations": [{
"id": "Save as PDF",
"origin": "local",
"account": "",
}],
"selectedDestinationId": "Save as PDF",
"version": 2,
"isHeaderFooterEnabled": False
}
prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings)}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'
browser = webdriver.Chrome(chrome_options=chrome_options)
And I trigger printing like this:
browser.execute_script('window.print();')
A possible solution would be a larger page size, but I need A4 size because I would like to print it at home.
an occurrence of the bug here

How to Save as PDF with legal size document using Python and Selenium

I have a working script using Python, Selenium, and the Chrome webdriver to save webpages as PDFs. However, I need to save them on legal sized documents (216 x 356 mm), while my current script only saves files in letter size (216 x 279 mm).
Here's the code that I currently have:
# Attach printing options to webdriver
app_state = {
"recentDestinations": [
{
"id": "Save as PDF",
"origin": "local",
"account": ""
}
],
"selectedDestinationId": "Save as PDF",
"isCssBackgroundEnabled": True,
"isHeaderFooterEnabled": False,
"isLandscapeEnabled": True,
"version": 2
}
prefs = {
'printing.print_preview_sticky_settings.appState': json.dumps(app_state)
}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(options=chrome_options)
Is there a way to save documents using legal size (or change the paper size in any way)?
I've been searching for other prefs and options to change the paper setting and/or dimensions, but haven't had any luck at all.
Thanks!

If you add
"mediaSize": {"height_microns": 355600, "width_microns": 215900}
to the appState dictionary, the paper size should be set to legal.
To change to any other size in the list, you can look up the dimensions and convert them to microns, or inspect the dropdown that lets you choose the paper size, find your desired size among the options, and copy the values.
For this solution to work, the values have to match the ones in the dropdown exactly; otherwise the size won't be selected and Chrome falls back to Letter.
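The millimetre-to-micron conversion can be sketched as a small helper. The function names mm_to_microns and build_print_prefs are made up for illustration. The appState keys are the ones used in the question and answer above. Whether Chrome accepts a given size still depends on it matching an entry in the dropdown.

```python
import json


def mm_to_microns(mm):
    """Convert millimetres to whole microns (1 mm = 1000 microns)."""
    return round(mm * 1000)


def build_print_prefs(width_mm, height_mm):
    """Build Chrome prefs that select Save as PDF with a given paper size."""
    app_state = {
        "recentDestinations": [
            {"id": "Save as PDF", "origin": "local", "account": ""}
        ],
        "selectedDestinationId": "Save as PDF",
        "version": 2,
        "mediaSize": {
            "width_microns": mm_to_microns(width_mm),
            "height_microns": mm_to_microns(height_mm),
        },
    }
    return {
        "printing.print_preview_sticky_settings.appState": json.dumps(app_state)
    }


# Legal paper is 8.5 x 14 in, i.e. 215.9 x 355.6 mm.
legal_prefs = build_print_prefs(215.9, 355.6)
```

The resulting dictionary would be passed to chrome_options.add_experimental_option('prefs', legal_prefs) in place of the prefs built by hand.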

How do i disable headers and footers Selenium Printing

Does anyone know how to disable the "headers and footers" option when printing in Selenium? It is enabled by default. Thank you!
import json
import os
from selenium import webdriver
# setting html path
htmlPath = os.getcwd() + "\\sample.html"
addr = "file:///" + htmlPath
# setting Chrome Driver
chromeOpt = webdriver.ChromeOptions()
appState = {
"recentDestinations": [
{
"id": "Save as PDF",
"origin": "local",
"account": ""
}
],
"selectedDestinationId": "Save as PDF",
"version": 2
}
prefs = {
'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
chromeOpt.add_experimental_option('prefs', prefs)
chromeOpt.add_argument('--kiosk-printing')
driver = webdriver.Chrome('.\\bin\\chromedriver', options=chromeOpt)
# HTML open and print
driver.get(addr)
driver.execute_script('return window.print()')

Just add "isHeaderFooterEnabled": False to your appState, as shown below.
appState = {
"recentDestinations": [
{
"id": "Save as PDF",
"origin": "local",
"account": ""
}
],
"selectedDestinationId": "Save as PDF",
"version": 2,
"isHeaderFooterEnabled": False
}
To see the option deselected in the print preview, temporarily comment out the chromeOpt.add_argument('--kiosk-printing') line so that the dialog stays open.
You can find details about the Chromium print preview options on the page below:
https://github.com/chromium/chromium/blob/eadef3f685cd9e96e94fcb9645b6838b6d0907a8/chrome/browser/resources/print_preview/data/model.js

This can be done with Selenium by following the pyppeteer examples.
You need to be able to send commands to Chromium and call the Page.printToPDF DevTools API, as shown in the following snippet:
import base64
# Page.printToPDF takes paper size and margins in inches
result = send_cmd(driver, "Page.printToPDF", params={
    'landscape': False,
    'paperWidth': 8.27,    # A4 width
    'paperHeight': 11.69,  # A4 height
    'marginTop': 0.39, 'marginBottom': 0.39,  # about 1 cm
    'marginLeft': 0.39, 'marginRight': 0.39,
    'displayHeaderFooter': False,
    'scale': 1
})
with open(out_path_full, 'wb') as file:
    file.write(base64.b64decode(result['data']))
I've included a full example in my GitHub repo with more of the available settings.
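Because Page.printToPDF expects paper dimensions and margins in inches, a small helper can translate a paper name and a margin in centimetres into its parameters. This is only a sketch: build_print_to_pdf_params, PAPER_SIZES_CM and cm_to_inches are hypothetical names. The dictionary keys are parameters of the DevTools command.

```python
CM_PER_INCH = 2.54

# Paper sizes in centimetres (width, height); extend as needed.
PAPER_SIZES_CM = {
    "A4": (21.0, 29.7),
    "Letter": (21.59, 27.94),
    "Legal": (21.59, 35.56),
}


def cm_to_inches(cm):
    """Convert centimetres to inches."""
    return cm / CM_PER_INCH


def build_print_to_pdf_params(paper="A4", margin_cm=1.0, landscape=False):
    """Return a params dict for the DevTools Page.printToPDF command."""
    width_cm, height_cm = PAPER_SIZES_CM[paper]
    margin_in = cm_to_inches(margin_cm)
    return {
        "landscape": landscape,
        "displayHeaderFooter": False,
        "printBackground": True,
        "scale": 1,
        "paperWidth": cm_to_inches(width_cm),
        "paperHeight": cm_to_inches(height_cm),
        "marginTop": margin_in,
        "marginBottom": margin_in,
        "marginLeft": margin_in,
        "marginRight": margin_in,
    }
```

The returned dictionary can be passed as the params argument of whatever function sends the DevTools command.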

PDF printing from Selenium with chromedriver

I am trying to print HTML/CSS content as a PDF with Selenium, chromedriver and Python.
I can print with the code below, but I cannot change the print settings. I would like to print in Letter size with no header/footer. The official chromedriver and Selenium documentation doesn't say much about this, so I'm stuck. Does anyone know how the print settings can be changed, or whether it can't be done at all?
import json
import os
from selenium import webdriver
# setting html path
htmlPath = os.getcwd() + "\\sample.html"
addr = "file:///" + htmlPath
# setting Chrome Driver
chromeOpt = webdriver.ChromeOptions()
appState = {
"recentDestinations": [
{
"id": "Save as PDF",
"origin": "local",
"account": ""
}
],
"selectedDestinationId": "Save as PDF",
"version": 2
}
prefs = {
'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
chromeOpt.add_experimental_option('prefs', prefs)
chromeOpt.add_argument('--kiosk-printing')
driver = webdriver.Chrome('.\\bin\\chromedriver', options=chromeOpt)
# HTML open and print
driver.get(addr)
driver.execute_script('return window.print()')

Add --headless and try it like this:
import base64
pdf = driver.execute_cdp_cmd("Page.printToPDF", {
    "printBackground": True
})
with open("file.pdf", "wb") as f:
    f.write(base64.b64decode(pdf['data']))
Here are some options you can fiddle with
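For the Letter-size, no-header/footer case asked about above, the call could be wrapped roughly as follows. This is a sketch that assumes a Chromium-based Selenium driver, since execute_cdp_cmd() is what the answer relies on. The wrapper name print_to_pdf and its overrides argument are made up for illustration.

```python
import base64


def print_to_pdf(driver, pdf_path, **overrides):
    """Render the current page with Page.printToPDF and write it to disk.

    `driver` is assumed to be a Chromium-based Selenium driver that
    provides execute_cdp_cmd(). Paper dimensions are given in inches.
    """
    params = {
        "printBackground": True,
        "displayHeaderFooter": False,  # no header/footer
        "paperWidth": 8.5,             # Letter width
        "paperHeight": 11,             # Letter height
    }
    params.update(overrides)  # e.g. landscape=True
    result = driver.execute_cdp_cmd("Page.printToPDF", params)
    with open(pdf_path, "wb") as f:
        f.write(base64.b64decode(result["data"]))
    return pdf_path
```

Calling print_to_pdf(driver, "file.pdf") after driver.get(addr) would write the PDF without opening a print dialog. Any other Page.printToPDF parameter can be passed as a keyword argument.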
