I am trying to print my HTML file to PDF by opening it with Selenium (ChromeDriver) in Python. After the HTML file is open, I try to print the document to PDF, but whenever I open the print window (manually or through Selenium commands) it closes immediately.
I have tried the following code, but it does not produce a .pdf file:
import json

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# CHROMEDRIVER_PATH and path_html are defined elsewhere in my script
chrome_options = webdriver.ChromeOptions()
settings = {"recentDestinations": [{"id": "Save as PDF", "origin": "local", "account": ""}], "selectedDestinationId": "Save as PDF", "version": 2}
prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings),
         'savefile.default_directory': path_html}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
browser = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
browser.get(path_html)
browser.execute_script('window.print();')
browser.close()
I am using the latest ChromeDriver version for Windows from https://chromedriver.chromium.org/downloads.
If I open the file in my regular browser (not through Selenium), I can print the HTML document to a .pdf file without any problem.
UPDATE:
If anyone runs into the same problem: I solved it by using a wrapper that converts the HTML file to PDF; it essentially does what the code above does.
See the converter function in https://pypi.org/project/pyhtml2pdf/
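For reference, another way to get the PDF without any print dialog is Chrome's own DevTools command Page.printToPDF, which Selenium 4 exposes on the Chrome driver as execute_cdp_cmd. This is a swapped-in technique, not necessarily what pyhtml2pdf does internally. A minimal sketch, assuming Selenium 4 and a Chromium-based browser; to_file_url and print_to_pdf are hypothetical helper names, and the driver is passed in so the function does not depend on how it was created.

```python
import base64
from pathlib import Path


def to_file_url(path: str) -> str:
    """Turn a local file path into the file:// URL that driver.get() expects."""
    return Path(path).resolve().as_uri()


def print_to_pdf(driver, pdf_path: str, background: bool = True) -> Path:
    """Render the driver's current page to a PDF file via Page.printToPDF.

    `driver` is assumed to be a Selenium 4 Chrome WebDriver (anything with an
    execute_cdp_cmd(cmd, params) method works). Chrome returns the PDF as
    base64 text in result["data"].
    """
    result = driver.execute_cdp_cmd(
        "Page.printToPDF",
        {"printBackground": background},  # like ticking "Background graphics"
    )
    out = Path(pdf_path)
    out.write_bytes(base64.b64decode(result["data"]))
    return out


# Usage sketch (requires Selenium 4 and Chrome; "report.html" is a placeholder):
#   from selenium import webdriver
#   options = webdriver.ChromeOptions()
#   options.add_argument("--headless=new")  # older Chrome only supported printToPDF headless
#   driver = webdriver.Chrome(options=options)
#   driver.get(to_file_url("report.html"))
#   print_to_pdf(driver, "report.pdf")
#   driver.quit()
```

Because Chrome itself writes the PDF, kiosk printing and the savefile.default_directory pref are not involved at all.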
I am trying to scrape a website and download all the webpages as .html files (including all their assets) so that each locally downloaded page opens exactly as it does on the server.
I am currently using Selenium, ChromeDriver, and Python.
Approach:
I updated the Chrome browser prefs and then logged in to the website. After logging in, I want to download the webpage the same way you would by pressing Ctrl+S on the keyboard.
The code below opens the page I want to download, but it neither suppresses the Windows "Save As" pop-up nor downloads the page to the specified path.
from selenium import webdriver
import pyautogui

chrome_options = webdriver.ChromeOptions()
preferences = {
    "download.default_directory": "C:\\Users\\pathtodir",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True
}
chrome_options.add_experimental_option("prefs", preferences)
driver = webdriver.Chrome(options=chrome_options)
driver.get("***URL to the website***")
driver.find_element("xpath", '//*[@id="id_username"]').send_keys('username')
driver.find_element("xpath", '//*[@id="id_password"]').send_keys('password')
driver.find_element("xpath", '//*[@id="datagrid-0"]/div[2]/div[1]/div[1]/table/tbody/tr[1]/td[2]/a').click()
pyautogui.hotkey('ctrl', 's')
pyautogui.typewrite('hello1' + '.html')
pyautogui.hotkey('enter')
Can somebody help me understand what I am doing wrong? Please also suggest any alternative Python library that could do this.
To save a page, first obtain the page source behind it with the page_source property.
Then open a file with a specific encoding using codecs.open: open it in write mode ("w") with the encoding "utf-8". Finally, use the file's write method to write the content obtained from page_source.
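import codecs
import os

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service("path to chromedriver.exe"))
driver.implicitly_wait(0.5)
driver.get("***URL to the website***")
h = driver.page_source  # page_source is a property, not a method
n = os.path.join(r"C:\ANYPATH", "Page.html")
with codecs.open(n, "w", "utf-8") as f:
    f.write(h)
driver.quit()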
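Note that page_source only returns the page's HTML; stylesheets, images, and scripts are still referenced by URL, so the saved file may not look like the live page once you are offline. If you also need the assets, one option (an alternative, not part of the approach above) is the Chrome DevTools command Page.captureSnapshot, which returns the whole page as a single MHTML archive that Chrome can open. It is marked experimental in the DevTools protocol, so treat this as a sketch assuming Selenium 4 with Chrome; save_mhtml is a hypothetical helper name.

```python
from pathlib import Path


def save_mhtml(driver, out_path: str) -> Path:
    """Save the driver's current page, including its assets, as one MHTML file.

    `driver` is assumed to be a Selenium 4 Chrome WebDriver. Page.captureSnapshot
    is an experimental Chrome DevTools command, so it is Chrome/Chromium-only.
    """
    snapshot = driver.execute_cdp_cmd("Page.captureSnapshot", {"format": "mhtml"})
    out = Path(out_path)
    # newline="" writes the archive's CRLF line endings unchanged
    with open(out, "w", encoding="utf-8", newline="") as f:
        f.write(snapshot["data"])
    return out


# Usage sketch, after logging in as in the question (the path is a placeholder):
#   save_mhtml(driver, r"C:\ANYPATH\Page.mhtml")
```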
I was able to fix the issue: my code quit before the browser had finished downloading the file. Adding time.sleep() fixed it.
Updated code:
import time

import pyautogui
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("***URL to the website***")
driver.find_element("xpath", '//*[@id="id_username"]').send_keys('username')
driver.find_element("xpath", '//*[@id="id_password"]').send_keys('password')
driver.find_element("xpath", '//*[@id="datagrid-0"]/div[2]/div[1]/div[1]/table/tbody/tr[1]/td[2]/a').click()
# Open Chrome's "Save As" dialog, as in the question
pyautogui.hotkey('ctrl', 's')
FILE_NAME = r'C:\ANYPATH\Page.html'
pyautogui.typewrite(FILE_NAME)
pyautogui.press('enter')
# Give the browser time to finish writing the file before quitting
time.sleep(10)
driver.quit()
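A fixed time.sleep(10) can be too short for large pages and wastes time on small ones. A sketch of an alternative that polls until the target file exists and no partial download is left in its folder. It assumes Chrome's convention of writing in-progress downloads with a .crdownload suffix; wait_for_file is a hypothetical helper name.

```python
import time
from pathlib import Path


def wait_for_file(path: str, timeout: float = 30.0, poll: float = 0.5) -> Path:
    """Block until `path` exists and its folder holds no *.crdownload files.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    target = Path(path)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        partial = list(target.parent.glob("*.crdownload"))  # Chrome's in-progress files
        if target.exists() and not partial:
            return target
        time.sleep(poll)
    raise TimeoutError(f"{target} was not completely downloaded within {timeout} s")
```

In the updated code, wait_for_file(FILE_NAME) would take the place of time.sleep(10) just before driver.quit().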
I'm trying to "Print as PDF" a set of webpage reports repeatedly with a Selenium ChromeDriver script in Python. When the "Background graphics" option is selected in Chrome's print dialog, the header of the report renders perfectly.
However, when I run the script through Selenium, I cannot figure out how to set the background graphics option, so the saved PDF comes out without the header's background graphics.
Here is a minimal reproducible script that I've been trying to run:
import json
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

chrome_options = webdriver.ChromeOptions()
settings = {
    "recentDestinations": [{
        "id": "Save as PDF",
        "origin": "local",
        "account": "",
    }],
    "selectedDestinationId": "Save as PDF",
    "version": 2,
    #"cssBackground": 2,
}
prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings)}
chrome_options.add_experimental_option('prefs', prefs)
chrome_options.add_argument('--kiosk-printing')
CHROMEDRIVER_PATH = '/usr/local/bin/chromedriver'
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH), options=chrome_options)
driver.get("https://dash.gallery/dash-multipage-report/")
sleep(3)
driver.execute_script('window.print();')
sleep(5)
driver.quit()
I've dug through Chromium's documentation and I believe I've found the cssBackground setting. There is also a BackgroundGraphicsModeRestriction that might be affecting things. I found what looked like a useful lead on setting the value in another StackOverflow post, but I can't put the pieces together to have the "Background graphics" box checked from the start. Any help would be greatly appreciated, as I'm pretty stuck. I've also tried styling my page with -webkit-print-color-adjust: exact;, but that doesn't seem to capture the entire background.
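One thing worth trying, as a sketch under an assumption you should check against your Chrome version: Chromium's print-preview code persists the "Background graphics" checkbox (the cssBackground setting) in the sticky app state under the key isCssBackgroundEnabled, as a boolean, rather than under cssBackground. If that holds, adding it to the same settings dictionary should pre-check the box. build_print_prefs is a hypothetical helper; the other keys are copied from the script above.

```python
import json


def build_print_prefs(background_graphics: bool = True) -> dict:
    """Chrome prefs that preselect "Save as PDF" for --kiosk-printing.

    Assumption: print preview stores the "Background graphics" checkbox under
    the sticky-settings key "isCssBackgroundEnabled".
    """
    settings = {
        "recentDestinations": [{"id": "Save as PDF", "origin": "local", "account": ""}],
        "selectedDestinationId": "Save as PDF",
        "version": 2,
        "isCssBackgroundEnabled": background_graphics,
    }
    return {"printing.print_preview_sticky_settings.appState": json.dumps(settings)}


# Usage with the script above:
#   chrome_options.add_experimental_option("prefs", build_print_prefs())
#   chrome_options.add_argument("--kiosk-printing")
```

If you are on Selenium 4, another route that avoids the print dialog entirely is driver.print_page() with a PrintOptions object (from selenium.webdriver.common.print_page_options) whose background attribute is set to True; it returns the PDF as a base64 string.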
I am trying to build a one-file .exe with PyInstaller that uses chromedriver_autoinstaller so that ChromeDriver is always up to date. The code works fine when run from the IDE, but once I build the .exe with PyInstaller it prints the 'Specific issue with chrome_autoinstaller' message. Because chromedriver_autoinstaller fails, the program then cannot find ChromeDriver. The .exe does work without the autoinstaller if I reference the chromedriver file path directly, but I would prefer to use this package if possible.
import json

import chromedriver_autoinstaller
from selenium import webdriver


class LoginPCC:
    def __init__(self):  # create an instance of this class. Begins by logging in
        chrome_options = webdriver.ChromeOptions()
        settings = {
            "recentDestinations": [{
                "id": "Save as PDF",
                "origin": "local",
                "account": "",
            }],
            "selectedDestinationId": "Save as PDF",
            "version": 2
        }
        prefs = {'printing.print_preview_sticky_settings.appState': json.dumps(settings),
                 "plugins.always_open_pdf_externally": True}
        chrome_options.add_experimental_option('prefs', prefs)
        chrome_options.add_argument('--kiosk-printing')
        try:
            chromedriver_autoinstaller.install()
        except:
            print('Specific issue with chrome_autoinstaller')
        try:
            self.driver = webdriver.Chrome(options=chrome_options)
        except:
            print('Cannot find chromedriver')
Python 3.7
PyInstaller 4.0.dev
Selenium 3.141.0
chromedriver-autoinstaller 0.2
The below worked for me:
chromedriver_autoinstaller.install(cwd=True)
The first time you launch the EXE, a new folder is created in the folder where the EXE is located; that is where chromedriver is stored. This is not the prettiest solution, but it works.
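If the EXE is started from a shortcut or from another folder, the current working directory is not necessarily the EXE's folder. A sketch that changes into the EXE's folder first, assuming that install(cwd=True) downloads into the current working directory and returns the driver's path; app_dir and install_chromedriver_next_to_app are hypothetical names. It also prints the real exception instead of using a bare except, which makes failures like the one in the question easier to diagnose.

```python
import os
import sys


def app_dir() -> str:
    """Folder containing the running EXE (PyInstaller build) or the script."""
    if getattr(sys, "frozen", False):  # PyInstaller sets sys.frozen in bundles
        return os.path.dirname(sys.executable)
    return os.path.dirname(os.path.abspath(sys.argv[0]))


def install_chromedriver_next_to_app():
    """Run chromedriver_autoinstaller.install(cwd=True) from app_dir()."""
    import chromedriver_autoinstaller  # third-party; imported only when needed

    previous = os.getcwd()
    os.chdir(app_dir())
    try:
        return chromedriver_autoinstaller.install(cwd=True)
    except Exception as exc:  # report the actual cause, then re-raise
        print(f"chromedriver_autoinstaller failed: {exc!r}")
        raise
    finally:
        os.chdir(previous)  # restore the caller's working directory
```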
I'm trying to download a PDF file from a website and am stuck on the next step.
"https://www.southtechhosting.com/SanJoseCity/CampaignDocsWebRetrieval/Search/SearchByElection.aspx"
Page with Links to PDF Files
PDF file to download
Using Selenium and ChromeDriver, I was able to click the PDF link on the "Page with Links" page, but then I get a pop-up form instead of a download.
I tried disabling the Chrome PDF Viewer ("plugins.plugins_list":[{"enabled":False,"name":"Chrome PDF Viewer"}]), but that doesn't work.
The pop-up form (viewed in "PDF file to download") has a hover link to download the PDF file. I've tried ActionChains(), but I get an exception when running this code:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

element_to_hover = driver.find_element(By.XPATH, "//paper-icon-button[@id='download']")
hover = ActionChains(driver).move_to_element(element_to_hover)
hover.perform()
I'm looking for the most efficient way to download PDF files in this kind of situation. Thanks!
Please try this:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

chromeOptions = webdriver.ChromeOptions()
# Download PDFs instead of opening them in Chrome's built-in viewer
prefs = {"plugins.always_open_pdf_externally": True}
chromeOptions.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=chromeOptions)
driver.get('https://www.southtechhosting.com/SanJoseCity/CampaignDocsWebRetrieval/Search/SearchByElection.aspx')
# Code to open the pop-up
driver.find_element(By.XPATH, '//*[@id="ctl00_DefaultContent_ASPxRoundPanel1_btnFindFilers_CD"]').click()
driver.find_element(By.XPATH, '//*[@id="ctl00_GridContent_gridFilers_DXCBtn0"]').click()
driver.find_element(By.XPATH, '//*[@id="ctl00_DefaultContent_gridFilingForms_DXCBtn0"]').click()
# The PDF link lives inside an iframe
driver.switch_to.frame(driver.find_element(By.TAG_NAME, 'iframe'))
a = driver.find_element(By.LINK_TEXT, "Click here")
ActionChains(driver).key_down(Keys.CONTROL).click(a).key_up(Keys.CONTROL).perform()
UPDATE:
To exit the popup, you can try this:
from selenium.webdriver.common.by import By

driver.switch_to.default_content()
driver.find_element(By.XPATH, '//*[@id="ctl00_GenericPopupSizeable_InnerPopupControl_HCB-1"]/img').click()
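Once you know the PDF's URL (for example from the link's href attribute), it is often quicker to skip the viewer and fetch the file directly while reusing the logged-in Selenium session's cookies. A standard-library-only sketch; cookie_header and download_with_cookies are hypothetical helper names, and sites that check additional headers or tokens may still reject the request.

```python
import urllib.request
from pathlib import Path


def cookie_header(cookies) -> str:
    """Build a Cookie header value from driver.get_cookies() output."""
    # Each Selenium cookie is a dict with at least "name" and "value" keys
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def download_with_cookies(url: str, cookies, out_path: str, user_agent: str = "") -> Path:
    """Fetch `url` with the browser session's cookies and save the body to `out_path`."""
    headers = {"Cookie": cookie_header(cookies)}
    if user_agent:  # some servers expect the same User-Agent as the browser
        headers["User-Agent"] = user_agent
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=60) as response:
        data = response.read()
    out = Path(out_path)
    out.write_bytes(data)
    return out


# Usage sketch with the open Selenium session (pdf_url is a hypothetical
# variable holding the link's href):
#   ua = driver.execute_script("return navigator.userAgent")
#   download_with_cookies(pdf_url, driver.get_cookies(), "filing.pdf", ua)
```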
I'm using Selenium to download an embedded PDF that is reached through several layers of logins and other browser actions. Following instructions from various other posts, I've set up ChromeDriver with the following options:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

chromedriver = r'C:\Users\cj9250\AppData\Local\Continuum\anaconda3\chromedriver.exe'
download_dir = "C:\\Users\\CJ9250\\Downloads\\"  # for linux/*nix, download_dir="/usr/Public"
options = webdriver.ChromeOptions()
profile = {
    "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
    "download.default_directory": download_dir,
    "download.extensions_to_open": "applications/pdf",
    "plugins.always_open_pdf_externally": True,
    "download.prompt_for_download": False,
    "safebrowsing.enabled": True
}
options.add_experimental_option("prefs", profile)
browser = webdriver.Chrome(service=Service(chromedriver), options=options)
However, before the file is downloaded to my specified directory, a box appears (showing reportSho.do with an Open button) that I have to click.
The 'Open' element doesn't have an XPath that I can find with the inspector. I'm guessing this is some kind of internal security setting in ChromeDriver, but I can't find a way past it.
My end goal is simply to download an embedded PDF from a page open in a Selenium test; this seemed to be the only suggested course of action.
I'm not sure why it worked, but I modified my profile variable to:
profile = {
    "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],  # Disable Chrome's PDF Viewer
    "download.default_directory": download_dir,
    "download.extensions_to_open": "applications/pdf",
    "safebrowsing.enabled": False
}
From there, I was able to find a frame element on the page containing the Open button. One of the element's attributes held a URL, and when I told the browser to navigate to that URL, it downloaded the file to my specified directory!
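The same idea can be scripted. A sketch, assuming the PDF's URL sits on an embed, iframe, frame, or object element (src for the first three, data for object, which is typical but not guaranteed for every page); find_pdf_url is a hypothetical helper name. With "plugins.always_open_pdf_externally": True in the prefs, navigating to that URL should make Chrome save the PDF instead of displaying it.

```python
# Tag -> attribute that usually carries the document URL (an assumption; adjust per page)
PDF_CONTAINERS = (("embed", "src"), ("iframe", "src"), ("frame", "src"), ("object", "data"))


def find_pdf_url(driver):
    """Return the first URL found on an embed/iframe/frame/object element, or None.

    `driver` is assumed to be a Selenium 4 WebDriver; "tag name" is the string
    value of selenium.webdriver.common.by.By.TAG_NAME.
    """
    for tag, attr in PDF_CONTAINERS:
        for element in driver.find_elements("tag name", tag):
            url = element.get_attribute(attr)
            if url:
                return url
    return None


# Usage sketch (with "plugins.always_open_pdf_externally": True in the prefs):
#   url = find_pdf_url(browser)
#   if url:
#       browser.get(url)  # Chrome should save the PDF to download.default_directory
```

If the element sits inside a nested frame, switch into that frame with browser.switch_to.frame(...) before calling the helper.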