Downloading PDF from popup/form with Selenium Python ChromeDriver - python

I'm trying to download a PDF file from a website and am stuck on the next step.
"https://www.southtechhosting.com/SanJoseCity/CampaignDocsWebRetrieval/Search/SearchByElection.aspx"
Page with Links to PDF Files
PDF file to download
I was able to click the PDF link on the "Page with Links" using Selenium and ChromeDriver, but I get a popup form instead of a download.
I tried disabling the Chrome PDF Viewer ("plugins.plugins_list":[{"enabled":False,"name":"Chrome PDF Viewer"}]), but that doesn't work.
The popup form (shown in "PDF file to download") has a hover link to download the PDF file. I've tried ActionChains(), but I get an exception after running this code:
from selenium.webdriver.common.action_chains import ActionChains
element_to_hover = driver.find_element_by_xpath("//paper-icon-button[@id='download']")
hover = ActionChains(driver).move_to_element(element_to_hover)
hover.perform()
Looking for the most efficient way to download pdf files in this type of situation. Thanks!

Please try this:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
chromeOptions = webdriver.ChromeOptions()
# Download PDFs instead of opening them in Chrome's built-in viewer
prefs = {"plugins.always_open_pdf_externally": True}
chromeOptions.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(chrome_options=chromeOptions)
driver.get('https://www.southtechhosting.com/SanJoseCity/CampaignDocsWebRetrieval/Search/SearchByElection.aspx')
# Code to open the pop-up
driver.find_element_by_xpath('//*[@id="ctl00_DefaultContent_ASPxRoundPanel1_btnFindFilers_CD"]').click()
driver.find_element_by_xpath('//*[@id="ctl00_GridContent_gridFilers_DXCBtn0"]').click()
driver.find_element_by_xpath('//*[@id="ctl00_DefaultContent_gridFilingForms_DXCBtn0"]').click()
# The download link is inside the pop-up's iframe
driver.switch_to.frame(driver.find_element_by_tag_name('iframe'))
a = driver.find_element_by_link_text("Click here")
# Ctrl+click the download link
ActionChains(driver).key_down(Keys.CONTROL).click(a).key_up(Keys.CONTROL).perform()
UPDATE:
To exit the popup, you can try this:
driver.switch_to.default_content()
driver.find_element_by_xpath('//*[@id="ctl00_GenericPopupSizeable_InnerPopupControl_HCB-1"]/img').click()
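The preference used in the answer can be combined with a download directory so the PDF lands in a known place. The following is only a minimal sketch: the helper name build_pdf_download_prefs and the "downloads" folder are hypothetical, and the preference keys are the ones that appear in the snippets on this page.

```python
import os


def build_pdf_download_prefs(download_dir):
    """Return Chrome prefs that save PDFs to download_dir instead of previewing them.

    Hypothetical helper for illustration; only the preference keys come from the thread.
    """
    return {
        # Save the PDF instead of opening it in Chrome's built-in viewer.
        "plugins.always_open_pdf_externally": True,
        # An absolute path is the safe choice for the download folder.
        "download.default_directory": os.path.abspath(download_dir),
        # Do not show the "Save as" prompt.
        "download.prompt_for_download": False,
    }


prefs = build_pdf_download_prefs("downloads")

# Possible usage with Selenium (not executed here):
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option("prefs", prefs)
#   driver = webdriver.Chrome(options=options)
```

Keeping the preferences in one function makes it easy to reuse the same settings across scripts.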

Related

Download entire webpage as HTML (including the HTML assets) without save as pop up using Selenium and Python

I am trying to scrape a website and download all the webpages as .html files (including all the HTML assets) so that each locally downloaded page opens exactly as it does on the server.
Currently using Selenium, Chrome Webdriver, and Python.
Approach:
I tried updating the Chrome browser's prefs and then logging in to the website. After logging in, I want to download the webpage the same way we do by pressing Ctrl+S on the keyboard.
The code below opens the page I want to download, but it neither suppresses the Windows "Save as" pop-up nor downloads the page to the specified path.
from selenium import webdriver
import pyautogui
chrome_options = webdriver.ChromeOptions()
preferences = {
"download.default_directory":"C:\\Users\\pathtodir",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"safebrowsing.enabled": True
}
chrome_options.add_experimental_option("prefs", preferences)
driver = webdriver.Chrome(options=chrome_options)
driver.get(***URL to the website***)
driver.find_element("xpath", '//*[@id="id_username"]').send_keys('username')
driver.find_element("xpath", '//*[@id="id_password"]').send_keys('password')
driver.find_element("xpath", '//*[@id="datagrid-0"]/div[2]/div[1]/div[1]/table/tbody/tr[1]/td[2]/a').click()
pyautogui.hotkey('ctrl', 's')
pyautogui.typewrite('hello1' + '.html')
pyautogui.hotkey('enter')
Can somebody please help me to understand what I am doing wrong? Please suggest if there is any other alternative library that can be used in python.
To save a page, first obtain its source through the driver's page_source attribute.
Then open a file with the codecs.open method in write mode (w) with utf-8 encoding, and use the write method to store the content obtained from page_source.
from selenium import webdriver
import codecs
import os
driver = webdriver.Chrome(executable_path="path to chromedriver.exe")
driver.implicitly_wait(0.5)
driver.get(***URL to the website***)
# Grab the HTML of the loaded page
h = driver.page_source
n = os.path.join(r"C:\ANYPATH", "Page.html")
# Write the HTML to disk as UTF-8
f = codecs.open(n, "w", "utf-8")
f.write(h)
f.close()
driver.quit()
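The file-writing step can also be done with the built-in open, which accepts an encoding directly. This is a minimal sketch: save_html is a hypothetical helper, and sample_html stands in for driver.page_source so the snippet runs without a browser. Note that this saves only the HTML text, not the images, CSS, or scripts the page refers to.

```python
import os
import tempfile


def save_html(html, path):
    """Write an HTML string to path as UTF-8 and return the path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path


# Stand-in for driver.page_source (assumption for illustration)
sample_html = "<html><body><p>héllo</p></body></html>"
target = os.path.join(tempfile.mkdtemp(), "Page.html")
save_html(sample_html, target)
```

The with block closes the file even if the write fails, so no explicit close call is needed.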
I was able to fix the issue. The problem was that my code quit before the browser had finished downloading the file. Adding time.sleep() fixed it.
Updated code:
import time
from selenium import webdriver
import pyautogui
driver.get(***URL to the website***)
driver.find_element("xpath", '//*[@id="id_username"]').send_keys('username')
driver.find_element("xpath", '//*[@id="id_password"]').send_keys('password')
driver.find_element("xpath", '//*[@id="datagrid-0"]/div[2]/div[1]/div[1]/table/tbody/tr[1]/td[2]/a').click()
FILE_NAME = r'C:\ANYPATH\Page.html'
pyautogui.typewrite(FILE_NAME)
pyautogui.press('enter')
time.sleep(10)
driver.quit()
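A fixed sleep works but either wastes time or is too short for large files. One alternative is to poll for the saved file before quitting the driver. The following is a rough sketch, not the poster's method: wait_for_file is a hypothetical helper, and it assumes that a file whose size is unchanged between two polls has finished being written.

```python
import os
import tempfile
import time


def wait_for_file(path, timeout=30.0, poll=0.5):
    """Return True once path exists and its size is unchanged between two polls.

    Returns False if that does not happen within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            if size == last_size:
                return True
            last_size = size
        time.sleep(poll)
    return False


# Demonstration with a file that already exists on disk
demo_path = os.path.join(tempfile.mkdtemp(), "Page.html")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("<html></html>")
finished = wait_for_file(demo_path, timeout=5.0, poll=0.05)
```

In the script above, a call such as wait_for_file(FILE_NAME) could replace time.sleep(10) before driver.quit().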

Selenium Python Authenticating issue

To open a page that requires authentication in Selenium, the code below works.
driver = webdriver.Firefox()
driver.get("https://username:password@testwebsite.com/testpage.html")
driver.implicitly_wait(30)
But this only works when the page is opened directly from Selenium as the initial page.
In my project, I click a link that opens a page where the same kind of authentication is required.
How can I handle this? I cannot call driver.get(...) directly because I cannot open this page directly.
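One possible direction, offered only as a sketch: read the link's target address instead of clicking it, embed the credentials in that address, and pass the result to driver.get. This assumes the link's href is readable and that the browser still accepts credentials embedded in the URL, which is not guaranteed. The helper with_basic_auth below is hypothetical; it percent-encodes the username and password so characters such as @ do not break the URL.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url, username, password):
    """Return url with username:password embedded before the host."""
    parts = urlsplit(url)
    # Drop any credentials already present and keep host[:port] as written.
    host = parts.netloc.rpartition("@")[2]
    netloc = f"{quote(username, safe='')}:{quote(password, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


auth_url = with_basic_auth("https://testwebsite.com/testpage.html", "username", "p@ssword")

# Possible usage with Selenium (not executed here; "link" is a hypothetical element):
#   href = link.get_attribute("href")
#   driver.get(with_basic_auth(href, "username", "password"))
```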

Is there a function in python that can download Microsoft Forms excel sheet?

I created a Microsoft Forms survey in Google Chrome and sent the link to several people to complete. Now I would like to see all the responses from those who have completed it. How can I download the Microsoft Forms Excel sheet directly from Chrome using Python?
Code:
import selenium
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("Microsoft Form Link")
button = driver.find_element_by_id('Open in Excel')
button.click()
But I got this error:
WebDriverException: Message: 'chrome driver' executable needs to be in PATH. Please see
https://sites.google.com/a/chromium.org/chromedriver/home
You have to download the chromedriver version that matches your Chrome browser from here: https://chromedriver.chromium.org/downloads
Then you have to add this:
driver = webdriver.Chrome(executable_path=r'PATH TO THE FILE\chromedriver.exe')
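Before hard-coding a path, it can help to check whether the driver is already reachable through PATH, since that is what the error message complains about. A minimal sketch using the standard library (find_driver is a hypothetical helper name):

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    return shutil.which(name)


driver_path = find_driver()
if driver_path is None:
    print("chromedriver is not on PATH; add its folder to PATH or pass its full path explicitly")
else:
    print("found chromedriver at", driver_path)
```

If the function returns None, either of the two fixes applies: add the folder containing chromedriver to PATH, or pass the full path as in the line above.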

Why does the selenium download not work?

With selenium I try to download something (in order to verify its content), using the following code as a proof-of-concept:
import time
from selenium import webdriver
profile = webdriver.FirefoxProfile()
#Set Location to store files after downloading.
profile.set_preference("browser.download.dir", "/tmp")
profile.set_preference("browser.download.folderList", 2)
#Set Preference to not show file download confirmation dialogue using MIME types Of different file extension types.
#profile.set_preference("browser.helperApps.neverAsk.saveToDisk",
# "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;")
profile.set_preference("browser.download.manager.showWhenStarting", False )
profile.set_preference("pdfjs.disabled", True )
profile.set_preference("browser.helperApps.neverAsk.saveToDisk","application/zip")
profile.set_preference("plugin.disable_full_page_plugin_for_types", "application/zip")
browser = webdriver.Firefox(profile)
browser.implicitly_wait(10)
browser.get('https://www.thinkbroadband.com/download')
time.sleep(15)
elem = browser.find_element_by_xpath('//a[@href="http://ipv4.download.thinkbroadband.com/5MB.zip"]')
elem.click()
time.sleep(15)
However, nothing 'happens' (i.e. the download is not performed), and also no error message is shown. When I click on that download link manually, the test-file is being downloaded into /tmp.
Is there anything I am missing?
The issue could be that the click in this case needs to go through a child element:
elem = browser.find_element_by_xpath('//a[@href="http://ipv4.download.thinkbroadband.com/5MB.zip"]/img')
elem.click()
Otherwise, when you click a link, the browser checks the link in the background, and the site seems to have a problem: when I open the target link in a browser, I get an empty response.

Download the whole html page content using selenium

I need to download the whole content of HTML pages: images, CSS, and JS.
First option:
Download the page with urllib or requests,
Extract the page info with Beautiful Soup or lxml,
Download all links, and
Edit the links in the original page to be relative.
Disadvantages:
Multiple steps.
The downloaded page will never be identical to the remote page, possibly because of JS or AJAX content.
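The extraction step of the first option can be sketched with the standard library alone. This is an illustration under assumptions, not a complete downloader: it uses html.parser in place of Beautiful Soup or lxml, the class name AssetCollector is hypothetical, and the example.com URLs are placeholders. It only lists the asset URLs; downloading them and rewriting the links would be further steps.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect absolute URLs referenced by img, script, and link tags."""

    # tag name -> attribute that holds the URL
    ASSET_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ASSET_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page URL
                self.assets.append(urljoin(self.base_url, value))


page = (
    '<html><head><link rel="stylesheet" href="/css/site.css">'
    '<script src="js/app.js"></script></head>'
    '<body><img src="https://cdn.example.com/logo.png"></body></html>'
)
collector = AssetCollector("https://example.com/docs/page.html")
collector.feed(page)
```

Because this parses static HTML only, it shares the disadvantage listed above: anything added later by JavaScript or AJAX will not appear in the list.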
Second option:
Some authors recommend automating the web browser to download the page, so that the JavaScript and AJAX are executed before the download.
Scraping AJAX sites and JavaScript
I want to use this option.
First attempt
I copied this piece of Selenium code to do two steps:
Open the URL in the Firefox browser
Download the page.
The code
import os
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False )
profile.set_preference('browser.download.dir', os.environ["HOME"])
profile.set_preference("browser.helperApps.alwaysAsk.force", False )
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/html,text/webviewhtml,text/x-server-parsed-html,text/plaintext,application/octet-stream');
browser = webdriver.Firefox(profile)
def open_new_tab(url):
    ActionChains(browser).send_keys(Keys.CONTROL, "t").perform()
    browser.get(url)
    return browser.current_window_handle

# call the function
open_new_tab("https://www.google.com")
# Result: the browser is opened at the given url, no download occurs
Result
Unfortunately, no download occurs; it just opens the browser at the URL provided (the first step).
Second attempt
I thought of downloading the page in a separate function, so I added this one.
The function added
def save_current_page():
    ActionChains(browser).send_keys(Keys.CONTROL, "s").perform()

# call the function
open_new_tab("https://www.google.com")
save_current_page()
Result
Nothing more: the browser opens at the given URL, but no download occurs.
Question
How can I automate downloading webpages with Selenium?
