Python selenium print page problem with page layout - python

I ran into a problem with Python Selenium while trying to print (save as PDF) a batch of pages from an essay, using a Python program I wrote and the Selenium package. The content of some pages does not fit on one A4 page, and the automatic page breaking cuts the pages badly. As you can see in the attached image of the PDF, the bottom row of the first page is cut in half (and therefore cannot be read), and the top 3-4 rows of the next page's content are also missing (due to the margin at the top of the page, I guess?). I would like to implement a solution in my code so that the page breaking is done correctly (every row shows).
The PDF is not in English, but I think that is irrelevant to the problem.
The part of my code that configures printing:
chrome_options = webdriver.ChromeOptions()
settings = {
    "recentDestinations": [{
        "id": "Save as PDF",
        "origin": "local",
        "account": "",
    }],
    "selectedDestinationId": "Save as PDF",
    "version": 2,
    "isHeaderFooterEnabled": False
}
prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings)}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'
browser = webdriver.Chrome(chrome_options=chrome_options)
And I call the print like this:
browser.execute_script('window.print();')
A possible solution would be a larger page size, but I need A4 size because I would like to print it at home.
an occurrence of the bug here

Related

How to save web page as pdf automatically in Selenium python

I'm trying to save a web page as a PDF but all I get is a file name selection window. How to automatically enter a file name and save it?
settings = {
    "appState": {
        "recentDestinations": [{
            "id": "Save as PDF",
            "origin": "local",
            "account": "",
            "margin": 0,
            'size': 'auto'
        }],
        "selectedDestinationId": "Save as PDF",
        "version": 2,
        "margin": 0,
        'size': 'auto'
    }
}
# There is probably a lot of excess here, I tried to use everything that can help
prefs = {
    'printing.print_preview_sticky_settings': json.dumps(settings),
    'profile.default_content_settings.popups': 0,
    'download.name': 'test.pdf',  # It doesn't work(
    'download.default_directory': download_path,
    'savefile.default_directory': download_path,
    'download.prompt_for_download': False,
    "download.directory_upgrade": True,
    "safebrowsing_for_trusted_sources_enabled": False,
    "safebrowsing.enabled": True,
    "download.extensions_to_open": "",
    "plugins.always_open_pdf_externally": True,
}
options.add_experimental_option('prefs', prefs)
options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(service=ser, options=options)
driver.maximize_window()
driver.get('url')
driver.execute_script('window.print();')
time.sleep(20)
I couldn't find a solution on the internet. I tried every option I could think of, but none of them works for me.
Selenium 3 has no built-in function that saves a web page as a PDF (Selenium 4 added driver.print_page()). Alternatively, you can use a third-party tool, such as wkhtmltopdf, to accomplish this.
Install wkhtmltopdf
Download the wkhtmltopdf binaries from the official website and install them on your system.
Add wkhtmltopdf to your PATH
Add the wkhtmltopdf binary to your system PATH so that your Python script can find it.
Use the save_as_pdf function
The save_as_pdf function takes a Selenium webdriver instance and a filename as arguments and saves the current page as a PDF.
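import subprocess
import tempfile

def save_as_pdf(driver, filename):
    # Dump the rendered HTML to a temporary file, then let the wkhtmltopdf CLI convert it.
    # Note: relative links to CSS or images may not resolve from the temporary file.
    with tempfile.NamedTemporaryFile('w', suffix='.html', encoding='utf-8', delete=False) as html_file:
        html_file.write(driver.page_source)
    subprocess.run(['wkhtmltopdf', html_file.name, filename], check=True)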
I was able to solve this problem using the pyautogui library, although I don't think it is the best solution:
import pyautogui as pag
driver.execute_script('window.print();')
time.sleep(20)
pag.typewrite('test.pdf')
time.sleep(1)
pag.press("enter")
time.sleep(20)

How to use Python to catch a web page response from another browser tab?

I'm currently writing a script that downloads Instagram stories from any private account that you follow. The best tool I've found so far is this webpage, which uses your account session from the web browser to get the content.
The thing is that when this site gets the query response with all the stories information it automatically opens a new tab in the browser with all the content in JSON format.
Example:
If you enter the url with the requested stories (such as https://www.instagram.com/stories/highlights/1234567890/) a new tab will be opened with plain text like this:
{
  "data": {
    "reels_media": [
      {
        "__typename": "GraphHighlightReel",
        "id": "some_id",
        "latest_reel_media": null,
        "owner": {
          "__typename": "GraphUser",
          "id": "some_id",
          "profile_pic_url": "some_url",
          "username": "some_username"
        },
        "items": [
          {
            "__typename": "GraphStoryVideo",
            "id": "some_id",
            "dimensions": { "height": 1136, "width": 640 },
            "display_resources": [
              {
                "src": "some_url",
                "config_width": 640,
                "config_height": 1136
              },
              {
                "src": "some_url",
                "config_width": 750,
                "config_height": 1331
              },
              ...
And when you copy-paste all the JSON content in the "Paste alien text here..." box you get all the media displayed to download directly.
What I'm doing right now is download the result HTML file with the media and then pass it to my script to download the stuff. But what I want to do is to catch the response directly inside the script using some kind of "fake browser" module. The problem is that I don't know how to get the response if the content is opened in a different tab.
If needed I can post part of my script to show how it works right now.
Thanks in advance for your time.
Use selenium
Primarily it is for automating web applications for testing purposes, but is certainly not limited to just that.
Boring web-based administration tasks can (and should) also be automated as well.
Example:
open a new Firefox browser
load the Yahoo homepage
search for “seleniumhq”
close the browser
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get('http://www.yahoo.com')
assert 'Yahoo' in browser.title
elem = browser.find_element(By.NAME, 'p') # Find the search box
elem.send_keys('seleniumhq' + Keys.RETURN)
browser.quit()
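For the new-tab part of the question specifically, here is a minimal sketch. It assumes the JSON opens in a second window handle as plain text and has the shape shown in the question. The helper names `read_json_from_new_tab` and `largest_media_urls` are hypothetical; `window_handles`, `switch_to.window`, `execute_script`, and `close` are regular Selenium WebDriver calls.

```python
import json


def read_json_from_new_tab(driver, original_handle):
    """Switch to the first tab that is not the original one and parse its text as JSON."""
    for handle in driver.window_handles:
        if handle != original_handle:
            driver.switch_to.window(handle)
            break
    else:
        raise RuntimeError("no new tab was opened")
    # Assumption: the new tab shows the raw JSON as plain text in the page body.
    raw = driver.execute_script("return document.body.innerText")
    # Close the JSON tab and return focus to the tab we started from.
    driver.close()
    driver.switch_to.window(original_handle)
    return json.loads(raw)


def largest_media_urls(payload):
    """Pick the widest display resource of every story item in the payload."""
    urls = []
    for reel in payload["data"]["reels_media"]:
        for item in reel["items"]:
            resources = item.get("display_resources", [])
            if resources:
                best = max(resources, key=lambda r: r["config_width"])
                urls.append(best["src"])
    return urls
```

In a script, you would remember `driver.current_window_handle` before triggering the request, wait until a second handle appears, and then pass the remembered handle to `read_json_from_new_tab`.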

Download PDF from PDF Viewer using Selenium/Python/Chrome

I am trying to navigate through a website, and whenever a PDF viewer appears, I want to download the PDF file. To keep it simple at first, I only log in to the page, navigate to the first page that holds a PDF, and try to download it.
The code I used so far:
options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    "download.default_directory": "/Users/XXX/Documents",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True
})
browser = webdriver.Chrome("/Users/XXX/Documents/chromedriver", options=options)
browser.get('the login webpage')
From here I log in and navigate to the desired webpage.
And from then on, I don't really know how to get the PDF...
Hope someone can help me out here.
Thank you

How to Save as PDF with legal size document using Python and Selenium

I have a working script using Python, Selenium, and the Chrome webdriver to save webpages as PDFs. However, I need to save them on legal sized documents (216 x 356 mm), while my current script only saves files in letter size (216 x 279 mm).
Here's the code that I currently have:
# Attach printing options to webdriver
app_state = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local",
            "account": ""
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "isCssBackgroundEnabled": True,
    "isHeaderFooterEnabled": False,
    "isLandscapeEnabled": True,
    "version": 2
}
prefs = {
    'printing.print_preview_sticky_settings.appState': json.dumps(app_state)
}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
driver = webdriver.Chrome(options=chrome_options)
Is there a way to save documents using legal size (or change the paper size in any way)?
I've been searching for other prefs and options to change the paper setting and/or dimensions, but haven't had any luck at all.
Thanks!
If you add
"mediaSize": {"height_microns": 355600, "width_microns": 215900}
to the appState dictionary, the paper size should be set to legal.
If you want to change to any other size (that is in the list), you can look up the dimensions and convert them to microns, or inspect the dropdown that lets you choose the paper size, search for your desired size in the options, and copy the values.
For this solution to work, the values have to match exactly the ones in the dropdown; otherwise nothing is selected and the size defaults to Letter.
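The conversion to microns can be sketched as follows. The helper names `media_size_inches` and `media_size_mm` are hypothetical, and only the legal values (215900 x 355600) come from the answer above. The A4 values are an assumption that Chrome's dropdown uses exactly 210 x 297 mm, so check them against the dropdown before relying on them.

```python
import json

MICRONS_PER_INCH = 25400  # 1 inch = 25.4 mm = 25,400 microns


def media_size_inches(width_in, height_in):
    """Build a mediaSize entry from paper dimensions given in inches."""
    return {
        "width_microns": round(width_in * MICRONS_PER_INCH),
        "height_microns": round(height_in * MICRONS_PER_INCH),
    }


def media_size_mm(width_mm, height_mm):
    """Build a mediaSize entry from paper dimensions given in millimetres."""
    return {
        "width_microns": round(width_mm * 1000),
        "height_microns": round(height_mm * 1000),
    }


app_state = {
    "recentDestinations": [{"id": "Save as PDF", "origin": "local", "account": ""}],
    "selectedDestinationId": "Save as PDF",
    "version": 2,
    "mediaSize": media_size_inches(8.5, 14),  # US Legal, same values as in the answer
}
prefs = {"printing.print_preview_sticky_settings.appState": json.dumps(app_state)}
```

Passing `prefs` to `add_experimental_option('prefs', prefs)` then works the same way as in the question's script.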

PDF printing from Selenium with chromedriver

I am trying to print HTML/CSS content as a PDF with Selenium, chromedriver, and Python.
I can print with the code below, but I cannot change the print settings. I would like to print in Letter size with no header/footer. The official chromedriver and Selenium documentation doesn't tell me much, so I'm stuck. Does anyone know how the print settings can be changed, or whether it cannot be done at all?
import json
import os
from selenium import webdriver

# setting html path
htmlPath = os.getcwd() + "\\sample.html"
addr = "file:///" + htmlPath

# setting Chrome Driver
chromeOpt = webdriver.ChromeOptions()
appState = {
    "recentDestinations": [
        {
            "id": "Save as PDF",
            "origin": "local",
            "account": ""
        }
    ],
    "selectedDestinationId": "Save as PDF",
    "version": 2
}
prefs = {
    'printing.print_preview_sticky_settings.appState': json.dumps(appState)}
chromeOpt.add_experimental_option('prefs', prefs)
chromeOpt.add_argument('--kiosk-printing')
driver = webdriver.Chrome('.\\bin\\chromedriver', options=chromeOpt)

# HTML open and print
driver.get(addr)
driver.execute_script('return window.print()')
Add --headless and try it like this:
import base64

pdf = driver.execute_cdp_cmd("Page.printToPDF", {
    "printBackground": True
})
with open("file.pdf", "wb") as f:
    f.write(base64.b64decode(pdf['data']))
Here are some options you can fiddle with
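As a rough sketch of what those options look like for the Letter size, no header/footer case from the question: `paperWidth`, `paperHeight`, the margins (all in inches), `displayHeaderFooter`, `landscape`, and `printBackground` are parameters of the DevTools `Page.printToPDF` command. The helper names, the paper table, and the 0.4 inch default margin are illustrative assumptions, not part of the answer.

```python
import base64

# Paper sizes in inches (width, height); A4 is converted from 210 x 297 mm.
PAPER_SIZES_IN = {
    "letter": (8.5, 11.0),
    "legal": (8.5, 14.0),
    "a4": (210 / 25.4, 297 / 25.4),
}


def build_print_params(paper="letter", margin_in=0.4, landscape=False):
    """Build the parameter dict for the Page.printToPDF DevTools command."""
    width, height = PAPER_SIZES_IN[paper]
    return {
        "landscape": landscape,
        "displayHeaderFooter": False,  # no header/footer
        "printBackground": True,
        "paperWidth": width,
        "paperHeight": height,
        "marginTop": margin_in,
        "marginBottom": margin_in,
        "marginLeft": margin_in,
        "marginRight": margin_in,
    }


def save_pdf(cdp_result, path):
    """Decode the base64 'data' field returned by Page.printToPDF and write it to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(cdp_result["data"]))


def print_page_to_pdf(url, path, paper="letter"):
    """Open url in headless Chrome and save it as a PDF (requires Selenium and chromedriver)."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        result = driver.execute_cdp_cmd("Page.printToPDF", build_print_params(paper))
        save_pdf(result, path)
    finally:
        driver.quit()
```

Because the size is passed directly to the command, this route does not depend on the print-preview appState preferences at all.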
