Python - Element item Xpath with Selenium

Python - Element item Xpath with Selenium - python

I'm creating a bot to download a pdf from a website. I used selenium to open google chrome and I can open the website window but I select the Xpath of the first item in the grid, but the click to download the pdf does not occur. I believe I'm getting the wrong Xpath.
I leave the site I'm accessing and my code below. Could you tell me what am I doing wrong? Am I getting the correct Xpath? Thank you very much in advance.
This site is an open government data site from my country, Brazil, and for those trying to access from outside, maybe the IP is blocked, but the page would be this:
Image site
Source site
Edit
Page source code
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
service = Service(ChromeDriverManager().install())
navegador = webdriver.Chrome(service=service)
try:
navegador.get("https://www.tce.ce.gov.br/cidadao/diario-oficial-eletronico")
time.sleep(2)
elem = navegador.find_element(By.XPATH, '//*[#id="formUltimasEdicoes:consultaAvancadaDataTable:0:j_idt101"]/input[1]')
elem.click()
time.sleep(2)
navegador.close()
navegador.quit()
except:
navegador.close()
navegador.quit()

I think you'll need this PDF, right?:
<a class="maximenuck " href="https://www.tce.ce.gov.br/downloads/Jurisdicionado/CALENDARIO_DAS_OBRIGACOES_ESTADUAIS_2020_N.pdf" target="_blank"><span class="titreck">Estaduais</span></a>
You'll need to locate that element by xpath, and then download the pdf's using the "href" value requests.get("Your_href_url")
The XPATH in your source-code is //*[#id="menu-principal"]/div[2]/ul/li[5]/div/div[2]/div/div[1]/ul/li[14]/div/div[2]/div/div[1]/ul/li[3]/a but that might not always be the same.

Related

How to use selenium for webscraping google flights?

I'm trying to pull the airline names and prices of a specific flight. I'm having trouble with the x.path and/or using the right html tags because when I run the code below, all I get back is 14 empty lists.
from selenium import webdriver
from lxml import html
from time import sleep
driver = webdriver.Chrome(r"C:\Users\14074\Python\chromedriver")
URL = 'https://www.google.com/travel/flights/searchtfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA'
driver.get(URL)
sleep(1)
tree = html.fromstring(driver.page_source)
for flight_tree in tree.xpath('//div[#class="TQqf0e sSHqwe tPgKwe ogfYpf"]'):
title = flight_tree.xpath('.//*[#id="yDmH0d"]/c-wiz[2]/div/div[2]/div/c-wiz/div/c-wiz/div[2]/div[2]/div/div[2]/div[6]/div/div[2]/div/div[1]/div/div[1]/div/div[2]/div[2]/div[2]/span/text()')
price = flight_tree.xpath('.//span[contains(#data-gs, "CjR")]')
print(title, price)
#driver.close()
This is just the first part of my code but I can't really continue without getting this to work. If anyone has some ideas on what I'm doing wrong that would be amazing! It's been driving me crazy. Thank you!

I noticed a few issues with your code. First of all, I believe that when entering this page, first google will show you the "I agree to terms and conditions" popup before showing you the content of the page, therefore you need to first click on that button.
Also, you should use the find_elements_by_xpath function directly on driver instead of using the page content, as this also allows you to render the javascript content. You can find more info here: python tree.xpath return empty list
To get more info on how to scrape using selenium and python you could check out this guide: https://www.webscrapingapi.com/python-selenium-web-scraper/
I used the following code to scrape the titles. (I also changed the xpaths to do so, by extracting them directly from google chrome. You can do that by right clicking on an element -> inspect and in the elements tab where the element is, you can right click -> copy -> Copy xpath)
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
# I used these for the code to work on my windows subsystem linux
option = webdriver.ChromeOptions()
option.add_argument('--no-sandbox')
option.add_argument('--disable-dev-sh-usage')
driver = webdriver.Chrome(ChromeDriverManager().install(), options=option)
URL = 'https://www.google.com/travel/flights/searchtfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA'
driver.get(URL)
driver.find_element_by_xpath('//*[#id="yDmH0d"]/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div[1]/div/button/span').click() # this is necessary to pres the I agree button
elements = driver.find_elements_by_xpath('//*[#id="yDmH0d"]/c-wiz[2]/div/div[2]/div/c-wiz/div/c-wiz/div[2]/div[3]/div[3]/c-wiz/div/div[2]/div[1]/div/div/ol/li')
for flight_tree in elements:
title = flight_tree.find_element_by_xpath('.//*[#class="W6bZuc YMlIz"]').text
print(title)

I tried the below code, with screen maximized and having explicit waits and could successfully extract the information, please see below :
Sample code :
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.get("https://www.google.com/travel/flights/searchtfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA")
wait = WebDriverWait(driver, 10)
titles = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div/descendant::h3")))
for name in titles:
print(name.text)
price = name.find_element(By.XPATH, "./../following-sibling::div/descendant::span[2]").text
print(price)
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output :
Tokyo
₹38,473
Mumbai
₹3,515
Dubai
₹15,846

How can I use Selenium (Python) to do a Google Search and then open the results of the first page in new tabs?

As the title said, I'd like to performa a Google Search using Selenium and then open all results of the first page on separate tabs.
Please have a look at the code, I can't get any further (it's just my 3rd day learning Python)
Thank you for your help !!
Code:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pyautogui
query = 'New Search Query'
browser = webdriver.Chrome('/Users/MYUSERNAME/Desktop/Desktop-Files/Chromedriver/chromedriver')
browser.get('http://www.google.com')
search = browser.find_element_by_name('q')
search.send_keys(query)
search.send_keys(Keys.RETURN)
element = browser.find_element_by_class_name('LC20lb')
element.click()
The reason why I imported pyautogui is because I tried simulating a right click and then open in new tab for each result but it was a little confusing :)

Forget about pyautogui as what you want to do can be done in Selenium. Same with most of the rest. You just do not need it. See if this code meets your needs.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
query = 'sins of a solar empire' #my query about a video game
browser = webdriver.Chrome()
browser.get('http://www.google.com')
search = browser.find_element_by_name('q')
search.send_keys(query)
search.send_keys(Keys.RETURN)
links = browser.find_elements_by_class_name('r') #I went on Google Search and found the container class for the link
for link in links:
url = link.find_element_by_tag_name('a').get_attribute("href") #this code extracts the url of the HTML link
browser.execute_script('''window.open("{}","_blank");'''.format(url)) # this code uses Javascript to open a new tab and open the given url in that new tab
print(link.find_element_by_tag_name('a').get_attribute("href"))

How to force a link that is invoked via javascript to open in a new tab

I am trying to scrape a webpage that opens up hyperlinks via javascript as shown below. I am using Selenium with Python.
<a href="javascript:openlink('120000020846')">
<subtitle>Blah blah blah</subtitle>
</a>
Using XPATH, I was able to open up the hyperlink using the following Python code.
driver = webdriver.Chrome()
xpath = '//a/subtitle[contains(text(),"Blah blah blah")]'
link_to_open = driver.find_element(By.XPATH,xpath);
link_to_open.click()
However, the link opens in the same tab. This is not what I want, as I want the links to open in a new tab, so that I can retain the information of the current page and continue processing the rest of the links.
Would greatly appreciate if anyone can give me some pointers if this can be done?
Thank you so much! :)

You can force a link to open in a new tab through the ActionChains implementation as follows :
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
action = ActionChains(driver)
link_to_open = driver.find_element(By.XPATH, "//a/subtitle[contains(.,'Blah blah blah')]")
action.key_down(Keys.CONTROL).click(link_to_open).key_up(Keys.CONTROL).perform()

Selenium drop down option unable to web scrape

So I have to web scrape the info of car year, model and make from https://auto-buy.geico.com/nb#/sale/vehicle/gskmsi/ (if the link doesn't work, kindly go to 'https://geico.com', fill in the zip code as '75002', enter random details in customer info and you will land up in the vehicle info link).
Having browsed through various answers, I have figured out that I can't use mechanize or something similar owing to the browser sending JavaScript requests every time I select an option in the menu. That leaves something like Selenium to help me.
Following is my code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Ie("IEDriverServer.exe")
WebDriverWait(driver, 10)
driver.get('https://auto-buy.geico.com/nb#/sale/customerinformation/gskmsi')
html = driver.page_source
soup = BeautifulSoup(html)
select = Select(driver.find_element_by_id('vehicleYear'))
print(select)
The output is an empty [] because it's unable to locate the form.
Please let me know how to select the data from the forms of the page.
P.S.: Though I have used IE, any code correction using Mozilla or Chrome is also welcome.

You need to fill out all the info in "Customer" tab using Selenium and then wait for the appearance of this select element:
from selenium.webdriver.support import ui
select_element = ui.Select(ui.WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "vehicleYear"))))
Then select a needed option:
select_element.select_by_visible_text("2017")
Hope it helps you!

Webscrape Flashscore with Python/Selenium

I started to learn scrape websites with Python and Selenium. I choose selenium because I need to navigate through the website and I also have to login.
I wrote an script that is able to open a firefox window and it opens the website www.flashscore.com. With this script I also be able to login and navigate to the different sports section (main menu) they have.
The code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
# open website
driver = webdriver.Firefox()
driver.get("http://www.flashscore.com")
# login
driver.find_element_by_id('signIn').click()
username = driver.find_element_by_id("email")
password = driver.find_element_by_id("passwd")
username.send_keys("*****")
password.send_keys("*****")
driver.find_element_by_name("login").click()
# go to the tennis section
link = driver.find_element_by_link_text('Tennis')
link.click()
#go to the live games tab in the tennis section
# ?????????????????????????????'
Then it went more difficult. I also want to navigate to, for example, the sections "live games" and "finished" tabs in the sports sector. This part wouldn't work. I tried many things but I can't get into one of this tabs. When analyzing the website I see that they use some Iframes. I also find some code to switch to a Iframes window. But the problem is, I can't find the name of the Iframe where the tabs are that I want to click on. Maybe the Iframes are not the problem and do I look to the wrong way. (Maybe the problem is caused by some javascript?)
Can anybody please help me with this?

No, the iframes are not the problem in this case. The "Live games" element is not inside an iframe. Locate it by link text and click:
live_games_link = driver.find_element_by_link_text("LIVE Games")
live_games_link.click()
You may need to wait for this link to be clickable before actually trying to click it:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
wait = WebDriverWait(driver, 10)
live_games_link = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "LIVE Games")))
live_games_link.click()

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python - Element item Xpath with Selenium - python

Related

How to use selenium for webscraping google flights?

How can I use Selenium (Python) to do a Google Search and then open the results of the first page in new tabs?

How to force a link that is invoked via javascript to open in a new tab

Selenium drop down option unable to web scrape

Webscrape Flashscore with Python/Selenium

Categories

Resources