So I have to scrape the car year, make, and model from https://auto-buy.geico.com/nb#/sale/vehicle/gskmsi/ (if the link doesn't work, kindly go to 'https://geico.com', fill in the zip code as '75002', enter random details in the customer info form, and you will land on the vehicle info page).
Having browsed through various answers, I have figured out that I can't use mechanize or anything similar, because the browser sends JavaScript requests every time I select an option in the menu. That leaves something like Selenium to help me.
Following is my code:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Ie("IEDriverServer.exe")
WebDriverWait(driver, 10)
driver.get('https://auto-buy.geico.com/nb#/sale/customerinformation/gskmsi')
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
select = Select(driver.find_element_by_id('vehicleYear'))
print(select)
The output is an empty [] because it's unable to locate the form.
Please let me know how to select the data from the forms of the page.
P.S.: Though I have used IE, any code correction using Firefox or Chrome is also welcome.
You need to fill out all the info in the "Customer" tab using Selenium and then wait for this select element to appear:
from selenium.webdriver.support import ui
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
select_element = ui.Select(ui.WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "vehicleYear"))))
Then select the desired option:
select_element.select_by_visible_text("2017")
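For the first step, a minimal sketch of filling one field on the "Customer" tab (the firstName ID and the submit-button selector below are hypothetical; inspect the actual form for the real locators):
first_name = ui.WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "firstName")))  # hypothetical ID
first_name.send_keys("John")
# ...fill the remaining fields the same way, then submit the tab
driver.find_element_by_css_selector("button[type='submit']").click()  # hypothetical selector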
Hope it helps you!
I'm trying to pull the airline names and prices of a specific flight. I'm having trouble with the XPath and/or using the right HTML tags, because when I run the code below, all I get back is 14 empty lists.
from selenium import webdriver
from lxml import html
from time import sleep
driver = webdriver.Chrome(r"C:\Users\14074\Python\chromedriver")
URL = 'https://www.google.com/travel/flights/search?tfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA'
driver.get(URL)
sleep(1)
tree = html.fromstring(driver.page_source)
for flight_tree in tree.xpath('//div[@class="TQqf0e sSHqwe tPgKwe ogfYpf"]'):
    title = flight_tree.xpath('.//*[@id="yDmH0d"]/c-wiz[2]/div/div[2]/div/c-wiz/div/c-wiz/div[2]/div[2]/div/div[2]/div[6]/div/div[2]/div/div[1]/div/div[1]/div/div[2]/div[2]/div[2]/span/text()')
    price = flight_tree.xpath('.//span[contains(@data-gs, "CjR")]')
    print(title, price)
#driver.close()
This is just the first part of my code, but I can't really continue without getting this to work. If anyone has some ideas on what I'm doing wrong, that would be amazing! It's been driving me crazy. Thank you!
I noticed a few issues with your code. First of all, when you enter this page, Google will show an "I agree to terms and conditions" popup before showing the content of the page, so you need to click that button first.
Also, you should use the find_elements_by_xpath function directly on the driver instead of on the page source, as this also lets you access the JavaScript-rendered content. You can find more info here: python tree.xpath return empty list
To get more info on how to scrape using selenium and python you could check out this guide: https://www.webscrapingapi.com/python-selenium-web-scraper/
I used the following code to scrape the titles. (I also changed the XPaths, extracting them directly from Google Chrome. You can do that by right-clicking an element -> Inspect, then in the Elements tab right-clicking the element -> Copy -> Copy XPath.)
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
# I needed these options to make the code work on Windows Subsystem for Linux
option = webdriver.ChromeOptions()
option.add_argument('--no-sandbox')
option.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(ChromeDriverManager().install(), options=option)
URL = 'https://www.google.com/travel/flights/search?tfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA'
driver.get(URL)
driver.find_element_by_xpath('//*[@id="yDmH0d"]/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div[1]/div/button/span').click() # this is necessary to press the "I agree" button
elements = driver.find_elements_by_xpath('//*[@id="yDmH0d"]/c-wiz[2]/div/div[2]/div/c-wiz/div/c-wiz/div[2]/div[3]/div[3]/c-wiz/div/div[2]/div[1]/div/div/ol/li')
for flight_tree in elements:
    title = flight_tree.find_element_by_xpath('.//*[@class="W6bZuc YMlIz"]').text
    print(title)
I tried the code below, with the window maximized and with explicit waits, and could successfully extract the information. Please see below.
Sample code:
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.get("https://www.google.com/travel/flights/searchtfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA")
wait = WebDriverWait(driver, 10)
titles = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div/descendant::h3")))
for name in titles:
    print(name.text)
    price = name.find_element(By.XPATH, "./../following-sibling::div/descendant::span[2]").text
    print(price)
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output:
Tokyo
₹38,473
Mumbai
₹3,515
Dubai
₹15,846
I am trying to get some information from a website. The Web Inspector shows the HTML source, including what JavaScript rendered into it. So I wanted to use chromedriver to render the page in order to extract certain information that cannot be accessed by simply requesting the website.
Now what seems confusing is that even the driver is not returning anything.
My code looks like this:
driver = webdriver.Chrome('path/Chromedriver')
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
results = soup.find_all("tr", class_="odd")
And the website is:
https://www.amundietf.co.uk/professional/product/view/LU1681038243
Is there anything else that gets rendered into the HTML when the Web Inspector is opened that Chromedriver is not able to handle?
Thanks for your answers in advance!
First you need to accept the privacy settings, then click validateDisclaimer to enter the site:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
url = "https://www.amundietf.co.uk/professional/product/view/LU1681038243"
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
driver.implicitly_wait(10)
driver.get(url)
driver.find_element_by_id("footer_tc_privacy_button_3").click()
driver.find_element_by_id("validateDisclaimer").click()
WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".fpFrame.fpBannerMore #blockleft>#part_principale_1")))
soup = BeautifulSoup(driver.page_source, 'html.parser')
results = soup.find_all("tr", class_="odd")
print(results)
After that, you need to wait for the page to load and define the elements you are looking for correctly.
Your question really contains several problems that should be solved one by one; I just pointed out the first of them.
Update
I solved the issue.
You will need to parse the result yourself.
So, you had these problems:
You did not click the two buttons.
You did not wait for the table you need to load.
You did not use any explicit waits. In Selenium you must use them.
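For the parsing step mentioned above, a minimal sketch (each matched row is a BeautifulSoup Tag, so its td cells can be read directly):
for row in results:
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    print(cells)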
I want to click on the Select Year dropdown and select a year from it, then go to that page and fetch the HTML.
I've written this piece of code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome('C:/Users/SoumyaPandey/Desktop/Galytix/Scrapers/data_ingestion/chromedriver.exe')
driver.get('https://investors.aercap.com/results-and-events/financial-results')
driver.maximize_window()
driver.find_element_by_id('year_filter_chosen').click()
driver.find_element_by_class_name('active-result')
I'm just starting to work with Selenium and have no clue how to proceed further.
I tried to look for the next class after clicking on the dropdown. I want to set the attribute value 'data-option-array-index' to 1 first, open the page, and get the HTML, then keep changing the value of this attribute.
Any help would be much appreciated!!
driver.get('https://investors.aercap.com/results-and-events/financial-results')
elem=driver.find_element_by_css_selector('#year-filter')
driver.execute_script("arguments[0].style.display = 'block';", elem)
selectYear=Select(elem)
selectYear.select_by_index(1)
Simply find the element and use Select after you change its style to display: block to access its values.
Imports
from selenium.webdriver.support.select import Select
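If you want to iterate over every year, as the question suggests, a sketch building on the code above (re-finding the element on each pass in case the page re-renders):
for i in range(len(selectYear.options)):
    elem = driver.find_element_by_css_selector('#year-filter')  # re-find to avoid a stale element
    driver.execute_script("arguments[0].style.display = 'block';", elem)
    Select(elem).select_by_index(i)
    html = driver.page_source  # the rendered HTML for this year's results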
For the select tag, Selenium has the great class Select; an example is provided by my colleague in the neighboring answer. But there's also a slightly easier way to do it:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('https://investors.aercap.com/results-and-events/financial-results')
el = driver.find_element(By.CSS_SELECTOR, '#year-filter')
time.sleep(3)
driver.execute_script("arguments[0].style.display = 'block';", el)
el.send_keys('2018')
time.sleep(3)
driver.quit()
As the title says, I'd like to perform a Google search using Selenium and then open all results of the first page in separate tabs.
Please have a look at the code; I can't get any further (it's just my 3rd day learning Python).
Thank you for your help!!
Code:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import pyautogui
query = 'New Search Query'
browser = webdriver.Chrome('/Users/MYUSERNAME/Desktop/Desktop-Files/Chromedriver/chromedriver')
browser.get('http://www.google.com')
search = browser.find_element_by_name('q')
search.send_keys(query)
search.send_keys(Keys.RETURN)
element = browser.find_element_by_class_name('LC20lb')
element.click()
The reason I imported pyautogui is that I tried simulating a right click and then "open in new tab" for each result, but it was a little confusing :)
Forget about pyautogui, as what you want to do can be done in Selenium. The same goes for most of your other imports: you just do not need them. See if this code meets your needs.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
query = 'sins of a solar empire' #my query about a video game
browser = webdriver.Chrome()
browser.get('http://www.google.com')
search = browser.find_element_by_name('q')
search.send_keys(query)
search.send_keys(Keys.RETURN)
links = browser.find_elements_by_class_name('r') #I went on Google Search and found the container class for the link
for link in links:
    url = link.find_element_by_tag_name('a').get_attribute("href") #this code extracts the url of the HTML link
    browser.execute_script('''window.open("{}","_blank");'''.format(url)) # this code uses Javascript to open a new tab and open the given url in that new tab
    print(link.find_element_by_tag_name('a').get_attribute("href"))
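If you then want to work in the opened tabs, you can switch between them using window handles, for example:
for handle in browser.window_handles:
    browser.switch_to.window(handle)
    print(browser.title)  # or scrape each result page here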
I came across a website that I am hoping to scrape some data from, but the site seems to be un-scrapable with my limited Python knowledge. When using driver.find_element_by_xpath, I usually run into timeout exceptions.
Using the code I provided below, I hope to click on the first result and go to a new page. On the new page, I want to scrape the product title and pack size. But no matter how I try, I cannot even get Python to click the right thing for me, let alone scrape the data. Can someone help out?
My desired output is:
Tris(triphenylphosphine)rhodium(I) chloride, 98%
190420010
1 GR 87.60
5 GR 367.50
This is the code I have so far:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = "http://www.acros.com/"
cas = "14694-95-2" # need to select for the appropriate one
driver = webdriver.Firefox()
driver.get(url)
country = driver.find_element_by_name("ddlLand")
for option in country.find_elements_by_tag_name("option"):
    if option.text == "United States":
        option.click()
driver.find_element_by_css_selector("input[type = submit]").click()
choice = driver.find_element_by_name("_ctl1:DesktopThreePanes1:ThreePanes:_ctl4:ddlType")
for option in choice.find_elements_by_tag_name("option"):
    if option.text == "CAS registry number":
        option.click()
inputElement = driver.find_element_by_id("_ctl1_DesktopThreePanes1_ThreePanes__ctl4_tbSearchString")
inputElement.send_keys(cas)
driver.find_element_by_id("_ctl1_DesktopThreePanes1_ThreePanes__ctl4_btnGo").click()
Your code as presented works fine for me, in that it directs the instance of Firefox to http://www.acros.com/DesktopModules/Acros_Search_Results/Acros_Search_Results.aspx?search_type=CAS&SearchString=14694-95-2 which shows the search results.
If you locate the iframe element on that page:
<iframe id="searchAllFrame" allowtransparency="" background-color="transparent" frameborder="0" width="1577" height="3000" scrolling="auto" src="http://newsearch.chemexper.com/misc/hosted/acrosPlugin/center.shtml?query=14694-95-2&searchType=cas¤cy=&country=NULL&language=EN&forGroupNames=AcrosOrganics,FisherSci,MaybridgeBB,BioReagents,FisherLCMS&server=www.acros.com"></iframe>
and use driver.switch_to.frame to switch to that frame, then the data you want should be scrapable from there, for example:
driver.switch_to.frame(driver.find_element_by_xpath("//iframe[@id='searchAllFrame']"))
You can then carry on using the driver as usual to find elements within that iframe. (I think switch_to_frame works similarly, but is deprecated.)
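A minimal sketch of that flow (the row selector is an assumption; adjust it to the actual markup inside the frame):
driver.switch_to.frame(driver.find_element_by_xpath("//iframe[@id='searchAllFrame']"))
for row in driver.find_elements_by_css_selector("table tr"):  # assumed selector for the results table
    print(row.text)
driver.switch_to.default_content()  # switch back to the main page when done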
(I can't seem to find a decent link to the docs for switch_to; the API reference isn't all that helpful.)