I am new to web scraping. I am trying to scrape a website (for getting flight prices) using Python Selenium. The site has drop-down menus for fields like departure city and arrival city, and the first option listed should be accessed and used.
Let's say the departure-city field is a button: on clicking that button, a new HTML page is loaded with an input box and a list, where every list element contains a button.
While debugging I found that once my keyword is entered, the options loading screen appears but the options never load, even after increasing the sleep time (I have attached an image of this).
[image: options stuck on the loading screen]
Manually there are no issues with the drop-down menu: as soon as I enter my departure city, the options load and I choose the respective location.
I also tried to use ActionChains, unfortunately with no luck. I am attaching my code below. Any help is appreciated.
# Manually trying to access the first element:
import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.ui import WebDriverWait

# Accessing the input field:
browser.find_element("xpath", "//button[contains(., 'Departing')]").click()
element = WebDriverWait(browser, 10).until(ec.presence_of_element_located((By.ID, "location")))
# Sending the keyword:
element.send_keys("Los Angeles")
# Selecting the first element (XPath indices start at 1, so li[1], not li[0]):
first_elem = browser.find_element("xpath", "//div[@class='classname default-padding']//div//ul//li[1]//button")
Using Action Chains:
from selenium.webdriver.common.action_chains import ActionChains

browser.find_element("xpath", "//button[contains(., 'Departing')]").click()
time.sleep(10)
element = WebDriverWait(browser, 10).until(ec.presence_of_element_located((By.ID, "location")))
element.send_keys("Los Angeles")
time.sleep(3)
list_elem = browser.find_element("xpath", "//div[@class='class name default-padding']//div//ul//li[1]//button")
size = element.size
action = ActionChains(browser)
action.move_to_element_with_offset(to_element=list_elem, xoffset=int(0.5 * size['width']), yoffset=70).click().perform()
action.click(on_element=list_elem)
action.perform()
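For reference, a minimal sketch of the selection step using an explicit wait instead of fixed sleeps; the XPath is an assumption pieced together from the markup described above, not verified against the real page:

first_option = WebDriverWait(browser, 20).until(
    ec.element_to_be_clickable(
        # assumed markup: suggestion buttons inside li elements under the padded div
        (By.XPATH, "(//div[contains(@class, 'default-padding')]//ul//li)[1]//button")
    )
)
first_option.click()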
My code runs fine and prints the title for all rows except the rows with dropdowns.
For example, row 4 has a dropdown when clicked. I implemented a try block which would, in theory, open the dropdown and then pull the titles.
But the click/scrape for the rows with these dropdowns prints nothing.
Expected output: print all titles, including the ones in dropdowns.
from selenium import webdriver
from bs4 import BeautifulSoup
import time
driver = webdriver.Chrome()
driver.get('https://cslide.ctimeetingtech.com/esmo2021/attendee/confcal/session/list')
time.sleep(4)
page_source = driver.page_source
soup = BeautifulSoup(page_source,'html.parser')
productlist=soup.find_all('div',class_='card item-container session')
for property in productlist:
    sessiontitle = property.find('h4', class_='session-title card-title').text
    print(sessiontitle)
    try:
        ifDropdown = driver.find_elements_by_class_name('item-expand-action expand')
        ifDropdown.click()
        time.sleep(4)
        newTitle = driver.find_element_by_class_name('card-title').text
        print(newTitle)
    except:
        newTitle = 'none'
There were a couple of issues. First, when you locate an element from the driver by class name and it has more than one class, you need to separate the classes with dots, not spaces, so that the driver knows it's dealing with another class.
Second, find_elements returns a list, and a list has no .click(), so you get an error, which your except catches but treats as though there was no link to click.
I rewrote it (without soup for now) so that it instead checks (with the dot replacing the space) for a link to open within the session, and then loops over the new titles that appear.
Here is what I have and tested. Note that this only gets the sessions and subsessions currently in view; you will need to add logic to scroll and get the rest.
# stuff to initialize the driver is above here; I used Firefox
# Open the website page
URL = "https://cslide.ctimeetingtech.com/esmo2021/attendee/confcal/session/list"
driver.get(URL)
time.sleep(4)  # time for page to populate

product_list = driver.find_elements_by_css_selector('div.card.item-container.session')
# above line gets all top-level sessions
for product in product_list:
    session_title = product.find_element_by_css_selector('h4.card-title').text
    print(session_title)
    dropdowns = product.find_elements_by_class_name('item-expand-action.expand')
    # above line finds the dropdown within this session, if any
    if len(dropdowns) == 0:  # nothing more for this session
        continue  # move to next session
    # still here: click on the dropdown, using execute_script because the link can overlap the chevron
    driver.execute_script("arguments[0].scrollIntoView(true); arguments[0].click();",
                          dropdowns[0])
    time.sleep(4)  # wait for subsessions to appear
    session_titles = product.find_elements_by_css_selector('h4.card-title')
    session_index = 0  # suppress reprinting the title of the master session
    for session_title in session_titles:
        if session_index > 0:
            print("   " + session_title.text)  # indent for clarity
        session_index = session_index + 1
# still to do: deal with other sessions that only get paged into view when you scroll
# that is a different question
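If you do want the rest, a rough sketch of one way to page more sessions into view (untested, and assuming the list simply loads more cards as you scroll):

import time

last_count = 0
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give newly paged sessions time to render
    sessions = driver.find_elements_by_css_selector('div.card.item-container.session')
    if len(sessions) == last_count:  # nothing new appeared; assume we have them all
        break
    last_count = len(sessions)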
I'm making a project which goes to my orders page on Amazon and collects data like product name, price, and delivery date using Selenium (because there is no API for that, and it can't be done with bs4). I log in and get to the orders page without any problem, but I'm stuck where I have to find the delivery date using find element by class (I chose class because all the delivery-date texts have the same class), and Selenium says it cannot find it.
No, it's not in an iframe, as I can't see the option for This Frame when I right-click on that element.
Here is the code:
import requests
from selenium import webdriver
import time
userid = '...'  # redacted: your Amazon login email
passwd = '...'  # redacted: your Amazon password
browser = webdriver.Chrome()
browser.get('https://www.amazon.in/gp/your-account/order-history?ref_=ya_d_c_yo')
email_input = browser.find_element_by_id('ap_email')
email_input.send_keys(userid)
email_input.submit()
passwd_input = browser.find_element_by_id('ap_password')
passwd_input.send_keys(passwd)
passwd_input.submit()
time.sleep(5)
date = browser.find_element_by_class_name('a-color-secondary value')
print(date.text)
Finding the element by XPath seems to work, but fails to find the date for all orders, as the XPath is different for every element.
Any help is appreciated.
Thanks
Refers to this line:
date = browser.find_element_by_class_name('a-color-secondary value')
It seems like your target element has multiple class names, a-color-secondary and value. Sadly, .find_element_by_class_name works for just a single class name.
Instead you can use .find_element_by_css_selector:
date = browser.find_element_by_css_selector('.a-color-secondary.value')
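And since the question mentions the date differing for every order, the plural find_elements with the same CSS selector should cover all of them; a small sketch:

# grab every element carrying both classes and print each delivery date
dates = browser.find_elements_by_css_selector('.a-color-secondary.value')
for date in dates:
    print(date.text)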
The below code works. It returns data in the default loaded table (making use of the answer provided here: link), but how do I access the other tables (which can be reached by clicking on the 'Contracts' button and selecting a different contract from the menu, e.g. Mar 2019)?
driver.get("http://www.cmegroup.com/tools-information/quikstrike/treasury-analytics.html")
# Need to include some more time here for data in iframe to load?
driver.implicitly_wait(3)
driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))
soup = BeautifulSoup(driver.page_source, 'html.parser')
CMEtreasuryAnalytics._table = soup.select('table.grid')[0]
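As for the comment in the code about giving the iframe time to load, a sketch using an explicit wait instead of implicitly_wait (the 20-second timeout is an arbitrary choice):

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# wait until the iframe is present, then switch into it in one step
WebDriverWait(driver, 20).until(
    EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe"))
)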
I tried this but get the following error returned: NoSuchFrameException: Message: no such frame: element is not a frame
driver.get("http://www.cmegroup.com/tools-nformation/quikstrike/treasury-analytics.html")
cDate = 'Dec 2018'
driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))
elements = driver.find_elements_by_class_name("square-corners ")
options = [element.get_attribute("innerText") for element in elements]
if cDate in options:
    element = elements[options.index(cDate)]
else:
    pass
driver.switch_to.frame(element)
I've also tried to click(), but couldn't get that to work either. I'm new to Selenium and would appreciate some pointers on how to access the said data. I'm using Python and the Chrome webdriver.
OK, I think I worked it out. The menu lies within the iframe, so after getting the element details, I need to click() the menu, then element.click(), then scrape the displayed data. The final code follows, but I don't know if it's the most straightforward way to approach it.
driver.get("http://www.cmegroup.com/tools-nformation/quikstrike/treasury-analytics.html")
cDate = 'Jun 2019'
driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))
elements = driver.find_elements_by_class_name("square-corners ")
options = [element.get_attribute("innerText") for element in elements]
if cDate in options:
    element = elements[options.index(cDate)]
else:
    pass
# Click the dropdown menu labelled 'Contracts'
driver.find_element_by_xpath('//*[@id="ctl00_MainContent_ucViewControl_IntegratedStrikeAsYield_ucContractPicker_ucTrigger_lnkTrigger"]').click()
driver.implicitly_wait(1)
element.click()
driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))
soup = BeautifulSoup(driver.page_source, 'html.parser')
CMEtreasuryAnalytics._table = soup.select('table.grid')[0]
Update:
The above worked for a while but then started failing with the below message. So maybe this is the right track, but I need a better way to select an option from the drop-down list labelled 'Contracts'. How to do that?
Message: unknown error: Element is not clickable at point (511, 475). Other element would receive the click: <
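One untested workaround for that 'not clickable' error is to skip the native click and click the menu option from JavaScript, reusing the element variable located above; a sketch:

# scroll the option into view and click it via JS to dodge the overlapping element
driver.execute_script(
    "arguments[0].scrollIntoView(true); arguments[0].click();", element
)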
I am trying to scrape a webpage that employs JS objects.
I am using Selenium in a Python environment; I use Selenium to load what I want, that being the "VIEW SELECT TV PACKAGE DETAILS" text, which launches a modal container.
In this container, there are package headings, with channels underneath them. I am trying to iterate over each heading, and grab the channel names within each.
This is the webpage
Here is my code which will help you navigate to the container I am trying to scrape:
from selenium import webdriver
import pandas as pd
url = "https://www.rogers.com/consumer/tv#/packages"
#create a new Chrome session
driver = webdriver.Chrome()
driver.implicitly_wait(5)
driver.get(url)
#change the province to Ontario
province_button = driver.find_element_by_class_name("dropdown-toggle")
province_button.click() #clicks dropdown
province_button = driver.find_element_by_link_text("Ontario")
province_button.click() #selects Ontario from the dropdown
#visit TV portal page, re-init url again
driver.get(url)
#####BEGIN SCRAPING PACKAGE INFO#####
#open Select Package window
package_button = driver.find_element_by_class_name("Package-details")
package_button.click() #clicks dropdown
package_data = driver.find_elements_by_class_name("Package-channels")
The package_data var returns all my headings and channel names, but no indication of which strings were headings and which were channels. I know I could write some complex regex to do the trick, but I'm hoping for a dynamic approach. Any advice is appreciated. Thanks!
******EDITED*******
Per the comments below, here is code that takes the WebElements into a variable instead of outputting to the console:
select_package_data = []
headingsCount = len(driver.find_elements_by_xpath("//div[@class='modal-content']//*[contains(@class,'Package-channels--heading ng-binding')]"))
for index in range(headingsCount):
    head = driver.find_element_by_xpath("//div[@class='modal-content']//*[contains(@class,'Package-channels--heading ng-binding')][index]".replace('index', str(index+1)))
    select_package_data.append(head.text)
    channelsPerheading = driver.find_elements_by_xpath("(//div[@class='modal-content']//ul[@ng-if='vm.channels'])[index]/li[not(contains(@class,'Package-channels--heading ng-binding'))]".replace('index', str(index+1)))
    temp_list = []
    for channel in channelsPerheading:
        temp_list.append(channel.text.encode('utf-8'))
    select_package_data.insert((index+1), temp_list[:])
*********EDITED V2 PER COMMENTS:*********
Final code required adding a parenthesis in the XPath; I believe this is due to the [index] appended to the end of the actual XPath when assigning it to a variable:
#get the count of headings in the modal container
headingsCount = len(driver.find_elements_by_xpath("//div[@class='modal-content']//*[contains(@class,'Package-channels--heading ng-binding')]"))
#use this count as an iterator
for index in range(headingsCount):
    #get the current heading - we use the replace method bc xpath is not zero-indexed
    head = driver.find_element_by_xpath("(//div[@class='modal-content']//*[contains(@class,'Package-channels--heading ng-binding')])[index]".replace('index', str(index+1)))
    header_placeholder = head.text
    ##takes heading element as text to use for dataframe row index label
    #goes to //ul tag in accordance with current index, finds all BUT the headings
    channelsPerheading = driver.find_elements_by_xpath("(//div[@class='modal-content']//ul[@ng-if='vm.channels'])[index]/li[not(contains(@class,'Package-channels--heading ng-binding'))]".replace('index', str(index+1)))
    temp_list = []
    for channel in channelsPerheading:  #append the channels as text to a temp list
        temp_list.append(channel.text.encode('utf-8'))
The simplest way to fetch all the headings and channels in the modal window is by using the below XPaths. These XPaths are dynamic rather than hardcoded: even if new channels or headings are added in the future, they will still work.
headings = driver.find_elements_by_xpath("//div[@class='modal-content']//*[contains(@class,'Package-channels--heading ng-binding')]")
print('all headings: '+str(len(headings)))
channels = driver.find_elements_by_xpath("//div[@class='modal-content']//a[contains(@class,'PackageChannelImage')]")
print('all channels: '+str(len(channels)))
Output:
all headings: 17
all channels: 243
You can use the below approach to fetch the channels per heading and print them.
headingsCount = len(driver.find_elements_by_xpath("//div[@class='modal-content']//*[contains(@class,'Package-channels--heading ng-binding')]"))
for index in range(headingsCount):
    print('For heading: ' + driver.find_element_by_xpath("(//div[@class='modal-content']//*[contains(@class,'Package-channels--heading ng-binding')])[index]".replace('index', str(index+1))).text + ', Channels are:')
    channelsPerheading = driver.find_elements_by_xpath("(//div[@class='modal-content']//ul[@ng-if='vm.channels'])[index]/li[not(contains(@class,'Package-channels--heading ng-binding'))]".replace('index', str(index+1)))
    for channel in channelsPerheading:
        print(channel.text.encode('utf-8').strip())
I have pasted the output here
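If you would rather collect the results than print them, the same loop can build a dict keyed by heading; a sketch reusing the XPaths above:

# collect {heading: [channel names]} instead of printing
channels_by_heading = {}
for index in range(headingsCount):
    heading = driver.find_element_by_xpath(
        "(//div[@class='modal-content']//*[contains(@class,'Package-channels--heading ng-binding')])[index]"
        .replace('index', str(index + 1))).text
    items = driver.find_elements_by_xpath(
        "(//div[@class='modal-content']//ul[@ng-if='vm.channels'])[index]/li[not(contains(@class,'Package-channels--heading ng-binding'))]"
        .replace('index', str(index + 1)))
    channels_by_heading[heading] = [item.text.strip() for item in items]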
I have a problem scraping data from the following site: https://arcc.sdcounty.ca.gov/Pages/Assessors-Roll-Tax.aspx.
I have to do these steps in order:
Select the drop-down option 'Street Address'
Enter a street address into the text field (e.g. 43 Hadar Dr)
Click the 'Submit' button.
After clicking submit, I should be directed to a page that has the APN number for a given address.
The problem:
I am able to do the above steps. However, when I select a drop-down option and input an address in the textbox, it fails: the textbox input is cleared before clicking 'Submit', but ONLY when I have selected a drop-down option.
I have tried using Selenium's expected conditions to trigger the input in the text box after a drop-down option has been selected, but that did nothing. I am looking for any help identifying why this problem occurs, as well as any advice on solutions.
Thanks. Much appreciated.
My code:
from selenium import webdriver
from selenium.webdriver.support.ui import Select, WebDriverWait

driver = webdriver.Chrome()
driver.get('https://arcc.sdcounty.ca.gov/Pages/Assessors-Roll-Tax.aspx')
#Selects drop down option ('Street Address')
mySelect = Select(driver.find_element_by_id("ctl00_ctl43_g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12_ctl00_ddlSearch"))
my=mySelect.select_by_value('0')
wait = WebDriverWait(driver,300)
#Enter address in text box to left of drop down
driver.find_element_by_id("ctl00_ctl43_g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12_ct l00_txtSearch").send_keys("11493 hadar dr")
#Click 'Submit' button to return API numbers associated with address
driver.find_element_by_id("ctl00_ctl43_g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12_ctl00_btnSearch").click()
driver.quit()
Just changed a few things in your code to make it work.
mySelect = Select(driver.find_element_by_id("ctl00_ctl43_g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12_ctl00_ddlSearch"))
To find_element_by_name(...):
mySelect = Select(driver.find_element_by_name("ctl00$ctl43$g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12$ctl00$ddlSearch"))
And
my=mySelect.select_by_value('0')
To select_by_visible_text('...'):
my = mySelect.select_by_visible_text("Street Address")
And
driver.find_element_by_id("ctl00_ctl43_g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12_ct l00_txtSearch").send_keys("11493 hadar dr")
To find_element_by_xpath(...), since I usually get better results when finding elements by xpath.
driver.find_element_by_xpath('//*[@id="ctl00_ctl43_g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12_ctl00_txtSearch"]').send_keys("11493 hadar dr")
This is how it all looks:
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
from selenium.webdriver.support.ui import Select
driver = webdriver.Chrome()
driver.get('https://arcc.sdcounty.ca.gov/Pages/Assessors-Roll-Tax.aspx')
#Selects drop down option ('Street Address')
mySelect = Select(driver.find_element_by_name("ctl00$ctl43$g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12$ctl00$ddlSearch"))
my = mySelect.select_by_visible_text("Street Address")
wait = WebDriverWait(driver,300)
#Enter address in text box to left of drop down
driver.find_element_by_xpath('//*[@id="ctl00_ctl43_g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12_ctl00_txtSearch"]').send_keys("11493 hadar dr")
#Click 'Submit' button to return API numbers associated with address
driver.find_element_by_id("ctl00_ctl43_g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12_ctl00_btnSearch").click()
driver.quit()
Not sure if this is your situation, but one thing that jumped out from your question is the text-box input... Often, when filling in a website text box, even though the text is clearly visible, the text is not actually read by the text-box method until the focus (cursor) is clicked or tabbed out and away from the text box.
Tabbing the text cursor out of the text entry box first, before clicking submit, will often solve this issue.
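In code, that tab-out is one extra key send before the click; a sketch reusing the locators from the answer above:

from selenium.webdriver.common.keys import Keys

# type the address, then tab out so the page registers the input
search_box = driver.find_element_by_xpath('//*[@id="ctl00_ctl43_g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12_ctl00_txtSearch"]')
search_box.send_keys("11493 hadar dr")
search_box.send_keys(Keys.TAB)  # move focus away before submitting
driver.find_element_by_id("ctl00_ctl43_g_d30f33ca_a5a7_4f69_bb21_cd4abc25ea12_ctl00_btnSearch").click()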