Open website and save as image - python

I am currently using the following code to open a website:
import webbrowser
webbrowser.open('http://test.com')
I am now trying to save the open webpage as a .gif, any advise on how to do this please?

How about selenium and its save_screenshot()?
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://test.com")
driver.save_screenshot("screenshot.png")
driver.close()

Related

Python - Element item Xpath with Selenium

I'm creating a bot to download a pdf from a website. I used selenium to open google chrome and I can open the website window but I select the Xpath of the first item in the grid, but the click to download the pdf does not occur. I believe I'm getting the wrong Xpath.
I leave the site I'm accessing and my code below. Could you tell me what am I doing wrong? Am I getting the correct Xpath? Thank you very much in advance.
This site is an open government data site from my country, Brazil, and for those trying to access from outside, maybe the IP is blocked, but the page would be this:
Image site
Source site
Edit
Page source code
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
service = Service(ChromeDriverManager().install())
navegador = webdriver.Chrome(service=service)
try:
navegador.get("https://www.tce.ce.gov.br/cidadao/diario-oficial-eletronico")
time.sleep(2)
elem = navegador.find_element(By.XPATH, '//*[#id="formUltimasEdicoes:consultaAvancadaDataTable:0:j_idt101"]/input[1]')
elem.click()
time.sleep(2)
navegador.close()
navegador.quit()
except:
navegador.close()
navegador.quit()
I think you'll need this PDF, right?:
<a class="maximenuck " href="https://www.tce.ce.gov.br/downloads/Jurisdicionado/CALENDARIO_DAS_OBRIGACOES_ESTADUAIS_2020_N.pdf" target="_blank"><span class="titreck">Estaduais</span></a>
You'll need to locate that element by xpath, and then download the pdf's using the "href" value requests.get("Your_href_url")
The XPATH in your source-code is //*[#id="menu-principal"]/div[2]/ul/li[5]/div/div[2]/div/div[1]/ul/li[14]/div/div[2]/div/div[1]/ul/li[3]/a but that might not always be the same.

Download entire webpage as HTML (including the HTML assets) without save as pop up using Selenium and Python

I am trying to scrape a website and download all the webpages as .html files (including all the HTML assets) so that the locally downloaded page opens just like the same in the server.
Currently using Selenium, Chrome Webdriver, and Python.
Approach:
I tried updating the prefs of the chrome browser. And then login into the website. After logging in I want to download the webpage similarly we do download by clicking ctrl + s from the keyboard.
Below code opens the desired page I want to download but does not disable Windows's save as a pop-up and neither downloads the page to the specified path.
from selenium import webdriver
import pyautogui
chrome_options = webdriver.ChromeOptions()
preferences = {
"download.default_directory":"C:\\Users\\pathtodir",
"download.prompt_for_download": False,
"download.directory_upgrade": True,
"safebrowsing.enabled": True
}
chrome_options.add_experimental_option("prefs", preferences)
driver = webdriver.Chrome(options=chrome_options)
driver.get(***URL to the website***)
driver.find_element("xpath", '//*[#id="id_username"]').send_keys('username')
driver.find_element("xpath", '//*[#id="id_password"]').send_keys('password')
driver.find_element("xpath", '//*[#id="datagrid-0"]/div[2]/div[1]/div[1]/table/tbody/tr[1]/td[2]/a').click()
pyautogui.hotkey('ctrl', 's')
pyautogui.typewrite('hello1' + '.html')
pyautogui.hotkey('enter')
Can somebody please help me to understand what I am doing wrong? Please suggest if there is any other alternative library that can be used in python.
To save a page first obtain the page source behind the webpage with the help of the page_source method.
Then open a file with a particular encoding with the codecs.open method. The file has to be opened in the write mode represented by w and encoding type as utf−8. Then use the write method to write the content obtained from the page_source method.
from selenium import webdriver
import codecs
driver = webdriver.Chrome(executable_path="path to chromedriver.exe")
driver.implicitly_wait(0.5)
driver.get(***URL to the website***)
h = driver.page_source
n=os.path.join("C:\ANYPATH","Page.html")
f = codecs.open(n, "w", "utf−8")
f.write(h)
driver.quit()
I was able to fix the issue, the problem was that my code quit before the browser was able to download the file. Adding time.sleep() fixed it.
Updated code:
from selenium import webdriver
import pyautogui
driver.get(***URL to the website***)
driver.find_element("xpath", '//*[#id="id_username"]').send_keys('username')
driver.find_element("xpath", '//*[#id="id_password"]').send_keys('password')
driver.find_element("xpath", '//*[#id="datagrid-0"]/div[2]/div[1]/div[1]/table/tbody/tr[1]/td[2]/a').click()
FILE_NAME = r'C:\ANYPATH\Page.html'
pyautogui.typewrite(FILE_NAME)
pyautogui.press('enter')
time.sleep(10)
driver.quit()

python selenium having difficulties with click

Currently using python and trying to have selenium click the "About" on google without using id. When I try to use .click() it does not execute, what is wrong with my code? I have looked at many videos and tutorials and it looks correct.
from selenium import webdriver
from time import sleep
browser = webdriver.Safari()
browser.get('http://google.com')
browser.maximize_window()
elm = browser.find_element_by_link_text('About')
browser.implicitly_wait(5)
elm.click()
I think, you can try using find_element_by_xpath.
First you will copy xpath of about link then you can try like below:
from selenium import webdriver
from time import sleep
browser = webdriver.Safari()
browser.get('http://google.com')
browser.maximize_window()
elm = browser.find_element_by_xpath('//*[#id="fsl"]/a[3]')
browser.implicitly_wait(5)
elm.click()
So the issue ended up being safari. For some reason the web driver safari was not allowing me to use .click. I switched to chrome and it worked.

Selenium + Chrome + Python

I want to save a webpage. This looks simples. I used the code below. This opens the browser but the page is not saved.
Why?
When this works, where the file will be saved?
Thanks
Detais:
Chrome 68.0.3440.106 - 64 bits
ChromeDriver 2.41
Code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
browser = webdriver.Chrome(executable_path=r"C:\Program Files (x86)\Selenium\chromedriver.exe")
browser.get('https://automatetheboringstuff.com')
ActionChains(browser).key_down(Keys.CONTROL).send_keys('s').key_up(Keys.CONTROL).perform()
If you are looking to save the html of a page, you can get that from the page source.
html = browser.page_source
and if you want to write this to a file, you can do this:
html_file = open('some_file_name.html', 'w')
html_file.write(html)

How to download complete webpage using Python Selenium

I have to write a Python code that will get URL, open a Chrome/Firefox browser using Selenium and will download it as a "Complete Webpage", mean with the CSS assets for example.
I know the basis of using Selenium, like:
from selenium import webdriver
ff = webdriver.firefox()
ff.get(URL)
ff.close
How can I perform the downloading action (Like clicking automatically in the browser CTRL+S)?
You can try following code to get HTML page as file:
from selenium import webdriver
ff = webdriver.Firefox()
ff.get(URL)
with open('/path/to/file.html', 'w') as f:
f.write(ff.page_source)
ff.close

Categories

Resources