I am running a loop that performs a search and grabs an element. The element on each search page appears to have the same CSS selector, yet the script always prints the element associated with one search: the search I first began testing the script with. I'm not sure if this is a CSS selector issue, or perhaps a cookie issue?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
EXE_PATH = r'C:\geckodriver.exe'
tickers = ["bitcoin", "ethereum", "litecoin"]

for t in tickers:
    with webdriver.Firefox(executable_path=EXE_PATH) as driver:
        wait = WebDriverWait(driver, 10)
        driver.get("https://coingecko.com/en")
        driver.find_element_by_css_selector(".px-2").send_keys(t + Keys.RETURN)
        first_result = wait.until(presence_of_element_located((By.CSS_SELECTOR, "div.text-3xl > span:nth-child(1)")))
        price = first_result.get_attribute("innerHTML")
        print(price)
I found the root cause of the issue, my friend. When you send the ticker to the search field, the page takes some time to load the suggestions, because the search field is an auto-suggest dropdown. Your script hits Enter as soon as it sends the ticker, so in the background the page selects the default suggestion, and since bitcoin sits at rank 1 in the trending list, bitcoin gets selected every time due to the lack of delay between sending the ticker and hitting Enter. I have modified your script; you can view it below. If you don't want to use sleep, add a WebDriverWait that waits for the desired option to appear in the search field's dropdown (a sketch of that approach follows the code). Hope that helps you. Please mark this as accepted if you are happy with my answer.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located

EXE_PATH = r'C:\chromedriver.exe'  # adjust to your chromedriver location
tickers = ["bitcoin", "ethereum", "litecoin"]

for t in tickers:
    with webdriver.Chrome(executable_path=EXE_PATH) as driver:
        wait = WebDriverWait(driver, 10)
        driver.get("https://coingecko.com/en")
        driver.find_element_by_css_selector(".px-2").send_keys(t)
        time.sleep(5)  # give the auto-suggest dropdown time to populate
        driver.find_element_by_css_selector(".px-2").send_keys(Keys.RETURN)
        first_result = wait.until(presence_of_element_located((By.CSS_SELECTOR, "div.text-3xl > span:nth-child(1)")))
        price = first_result.get_attribute("innerHTML")
        print(price)
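If you would rather not hard-code a sleep, here is a minimal sketch of the explicit-wait variant: it waits until the suggestion dropdown actually renders an entry before pressing Enter. The suggestion-list selector ("[data-target='search.suggestions'] a") is my assumption about CoinGecko's markup, not something from the original post, so verify it against the live page.

from selenium.webdriver.support import expected_conditions as EC

for t in tickers:
    with webdriver.Chrome(executable_path=EXE_PATH) as driver:
        wait = WebDriverWait(driver, 10)
        driver.get("https://coingecko.com/en")
        search_box = driver.find_element_by_css_selector(".px-2")
        search_box.send_keys(t)
        # Wait for the auto-suggest dropdown to show a suggestion before hitting Enter.
        # The selector below is a guess at the suggestion links -- adjust to the real markup.
        wait.until(EC.visibility_of_element_located(
            (By.CSS_SELECTOR, "[data-target='search.suggestions'] a")))
        search_box.send_keys(Keys.RETURN)
        first_result = wait.until(presence_of_element_located(
            (By.CSS_SELECTOR, "div.text-3xl > span:nth-child(1)")))
        print(first_result.get_attribute("innerHTML"))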
I'm trying to pull the airline names and prices of a specific flight. I'm having trouble with the XPath and/or using the right HTML tags, because when I run the code below, all I get back is 14 empty lists.
from selenium import webdriver
from lxml import html
from time import sleep
driver = webdriver.Chrome(r"C:\Users\14074\Python\chromedriver")
URL = 'https://www.google.com/travel/flights/search?tfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA'
driver.get(URL)
sleep(1)
tree = html.fromstring(driver.page_source)
for flight_tree in tree.xpath('//div[@class="TQqf0e sSHqwe tPgKwe ogfYpf"]'):
    title = flight_tree.xpath('.//*[@id="yDmH0d"]/c-wiz[2]/div/div[2]/div/c-wiz/div/c-wiz/div[2]/div[2]/div/div[2]/div[6]/div/div[2]/div/div[1]/div/div[1]/div/div[2]/div[2]/div[2]/span/text()')
    price = flight_tree.xpath('.//span[contains(@data-gs, "CjR")]')
    print(title, price)
#driver.close()
This is just the first part of my code, but I can't really continue without getting this to work. If anyone has some ideas on what I'm doing wrong, that would be amazing! It's been driving me crazy. Thank you!
I noticed a few issues with your code. First of all, when entering this page, Google will first show you the "I agree to terms and conditions" popup before showing you the content of the page, therefore you need to click on that button first.
Also, you should use the find_elements_by_xpath function directly on the driver instead of on the page content, as this also allows you to render the JavaScript content. You can find more info here: python tree.xpath return empty list
To get more info on how to scrape using selenium and python you could check out this guide: https://www.webscrapingapi.com/python-selenium-web-scraper/
I used the following code to scrape the titles. (I also changed the XPaths to do so, by extracting them directly from Google Chrome. You can do that by right-clicking on an element -> Inspect, and in the Elements tab, right-click the element -> Copy -> Copy XPath.)
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
# these options were needed for the code to work on my Windows Subsystem for Linux setup
option = webdriver.ChromeOptions()
option.add_argument('--no-sandbox')
option.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(ChromeDriverManager().install(), options=option)

URL = 'https://www.google.com/travel/flights/search?tfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA'
driver.get(URL)

driver.find_element_by_xpath('//*[@id="yDmH0d"]/c-wiz/div/div/div/div[2]/div[1]/div[4]/form/div[1]/div/button/span').click()  # press the "I agree" button first
elements = driver.find_elements_by_xpath('//*[@id="yDmH0d"]/c-wiz[2]/div/div[2]/div/c-wiz/div/c-wiz/div[2]/div[3]/div[3]/c-wiz/div/div[2]/div[1]/div/div/ol/li')
for flight_tree in elements:
    title = flight_tree.find_element_by_xpath('.//*[@class="W6bZuc YMlIz"]').text
    print(title)
I tried the code below, with the screen maximized and explicit waits, and could successfully extract the information; please see below.
Sample code:
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.get("https://www.google.com/travel/flights/search?tfs=CBwQAhopagwIAxIIL20vMHBseTASCjIwMjEtMTItMjNyDQgDEgkvbS8wMWYwOHIaKWoNCAMSCS9tLzAxZjA4chIKMjAyMS0xMi0yN3IMCAMSCC9tLzBwbHkwcAGCAQsI____________AUABSAGYAQE&tfu=EgYIAhAAGAA")
wait = WebDriverWait(driver, 10)
titles = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div/descendant::h3")))
for name in titles:
    print(name.text)
    price = name.find_element(By.XPATH, "./../following-sibling::div/descendant::span[2]").text
    print(price)
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output:
Tokyo
₹38,473
Mumbai
₹3,515
Dubai
₹15,846
I am trying to scrape the headers of Wikipedia pages as an exercise, and I want to be able to distinguish between headers with "h2" and "h3" tags.
Therefore i wrote this code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys #For being able to input key presses
import time #Useful for if your browser is faster than your code
PATH = r"C:\Users\Alireza\Desktop\chromedriver\chromedriver.exe" #Location of the chromedriver
driver = webdriver.Chrome(PATH)
driver.get("https://de.wikipedia.org/wiki/Alpha-Beta-Suche") #Open website in Chrome
print(driver.title) #Print title of the website to console
h1Header = driver.find_element_by_tag_name("h1") #Find the first heading in the article
h2HeaderTexts = driver.find_elements_by_tag_name("h2") #List of all other major headers in the article
h3HeaderTexts = driver.find_elements_by_tag_name("h3") #List of all minor headers in the article
for items in h2HeaderTexts:
    scor = items.find_element_by_class_name("mw-headline")

driver.quit()
However, this does not work and the program does not terminate.
Does anybody have a solution for this?
The problem lies in the for loop: apparently I cannot scrape any elements by class name (or anything else) from the elements in h2HeaderTexts, although this should be possible.
You can filter in the XPath itself:
from selenium import webdriver
from selenium.webdriver.common.by import By

PATH = r"C:\Users\Alireza\Desktop\chromedriver\chromedriver.exe"  # Location of the chromedriver
driver = webdriver.Chrome(PATH)
driver.maximize_window()
driver.implicitly_wait(30)
driver.get("https://de.wikipedia.org/wiki/Alpha-Beta-Suche")  # Open website in Chrome
print(driver.title)
for item in driver.find_elements(By.XPATH, "//h2/span[@class='mw-headline']"):
    print(item.text)
This should give you the h2 headings that contain a span with the class mw-headline.
Output:
Informelle Beschreibung
Der Algorithmus
Implementierung
Optimierungen
Vergleich von Minimax und AlphaBeta
Geschichte
Literatur
Weblinks
Fußnoten
Process finished with exit code 0
Update 1:
The reason your loop keeps running and the program does not terminate: if you look at the page's HTML source, the first h2 tag has no child span with class mw-headline, so Selenium tries to locate an element that is not in the HTML DOM. Note the difference between the locator methods here: find_elements returns a list of matching web elements, or an empty list if none are found, so it never raises an exception, whereas find_element raises NoSuchElementException when nothing matches. A small sketch of the difference follows.
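To make that difference concrete, a minimal sketch, assuming a driver already on the article page and reusing the mw-headline class from the question:

from selenium.common.exceptions import NoSuchElementException

for h2 in driver.find_elements_by_tag_name("h2"):
    # find_elements: returns an empty list when this h2 has no mw-headline child,
    # so nothing is raised and the loop simply moves on
    headlines = h2.find_elements_by_class_name("mw-headline")
    if headlines:
        print(headlines[0].text)

    # find_element: raises NoSuchElementException when the child is missing
    try:
        print(h2.find_element_by_class_name("mw-headline").text)
    except NoSuchElementException:
        print("this h2 has no mw-headline span")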
You should wait until the elements appear on the page before accessing them.
Also, there are several elements with the h1 tag name on that page.
To search for elements inside another element, you should use an XPath starting with a dot; otherwise you will search the entire page and get the first match anywhere.
The first h2 element on that page has no element with class name mw-headline inside it, so you should handle this case too.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys  # For being able to input key presses
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time #Useful for if your browser is faster than your code
PATH = r"C:\Users\Alireza\Desktop\chromedriver\chromedriver.exe" #Location of the chromedriver
driver = webdriver.Chrome(PATH)
wait = WebDriverWait(driver, 20)
driver.get("https://de.wikipedia.org/wiki/Alpha-Beta-Suche") #Open website in Chrome
print(driver.title) #Print title of the website to console
wait.until(EC.visibility_of_element_located((By.XPATH, "//h1")))
h1Headers = driver.find_elements_by_tag_name("h1") #Find the first heading in the article
h2HeaderTexts = driver.find_elements_by_tag_name("h2") #List of all other major headers in the article
h3HeaderTexts = driver.find_elements_by_tag_name("h3") #List of all minor headers in the article
for items in h2HeaderTexts:
    scor = items.find_elements_by_xpath(".//span[@class='mw-headline']")
    if scor:
        print(scor[0].text)  # do what you need with the scor[0] element

driver.quit()
Your version does not finish executing because Selenium raises an exception when it cannot locate an element.
Devs do not like to use try/catch, but I personally have not found a better way to work around it. If you do:
from selenium.common.exceptions import NoSuchElementException

for items in h2HeaderTexts:
    try:
        scor = items.find_element_by_class_name('mw-headline').text
        print(scor)
    except NoSuchElementException:
        print('nothing found')
You will notice that it executes to the end and you get a result.
The page I need to scrape data from: Digikey Search result
Issue
Only 100 rows can be shown in each table, so I have to move between multiple tables using the NextPageButton.
As illustrated in the code below, I do click it, but the results retrieved every time are those of the first table; they never move on to the next table's results after my click action ActionChains(driver).click(element).perform().
Keep in mind that NO new page is opened; the click is intercepted by some sort of JavaScript that does rich-UI work on the same page to load a new table of data.
My Expectations
I am just trying to validate that I can move to the next table; then I will edit the code to loop through all of them.
This piece of code should return the data in the second table of results, BUT it actually returns the values from the first table, which loaded initially with the URL. This means that either the click action didn't occur, or it did occur but the WebDriver content isn't being updated by the dynamic JavaScript interactions on the page.
I will appreciate any help. Thanks!
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver import ActionChains
import time
import sys
url = "https://www.digikey.com/en/products/filter/coaxial-connectors-rf-terminators/382?s=N4IgrCBcoA5QjAGhDOl4AYMF9tA"
chrome_driver_path = "..PATH\\chromedriver"
chrome_options = Options()
chrome_options.add_argument("--headless")

browser = webdriver.Chrome(  # renamed from "webdriver" to avoid shadowing the module
    executable_path=chrome_driver_path,
    options=chrome_options
)

with browser as driver:
    wait = WebDriverWait(driver, 10)
    driver.get(url)
    wait.until(presence_of_element_located((By.CSS_SELECTOR, "tbody")))

    element = driver.find_element_by_css_selector("button[data-testid='btn-next-page']")
    ActionChains(driver).click(element).perform()
    time.sleep(10)  # too much time, I know, but it rules out a waiting issue; something should have updated by now

    results = driver.find_elements_by_css_selector("tbody")
    for count in results:
        countArr = count.text
        print(countArr)
        print()

    driver.close()
Finally found a SOLUTION !
Source of the solution.
As expected, the issue was in the clicking action itself: it is somehow not being done right, or not being done at all, as illustrated in the source question.
the solution is to click the button using Javascript execution.
Change this line:
ActionChains(driver).click(element).perform()
to the following:
driver.execute_script("arguments[0].click();",element)
That's it..
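To extend this into the loop over all result tables that the question ultimately wants, here is a rough sketch, reusing the wait, driver, and selectors from the question. The check for a disabled next-page button is my assumption about how Digikey marks the last page, so verify it against the real markup:

while True:
    wait.until(presence_of_element_located((By.CSS_SELECTOR, "tbody")))
    for table in driver.find_elements_by_css_selector("tbody"):
        print(table.text)

    buttons = driver.find_elements_by_css_selector("button[data-testid='btn-next-page']")
    # Assumption: the button disappears or gains a "disabled" attribute on the last page
    if not buttons or buttons[0].get_attribute("disabled"):
        break
    driver.execute_script("arguments[0].click();", buttons[0])
    time.sleep(2)  # crude settle delay; waiting for the old rows to go stale would be more robust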
I would like to extract the "First Published" dates using Selenium in Python, yet I am facing problems trying to get any visible date results. Even though I get successful results when looking up the XPath in the browser's inspection tab, and the element count comes back non-zero, no date text is printed to my console.
My Code:
import time
from selenium import webdriver
driver = webdriver.Firefox(executable_path=r'C:\Users\MyComputer\PycharmProjects\SeleniumProject\venv\Lib\site-packages\selenium\webdriver\common\geckodriver.exe')
driver.get('https://tools.cisco.com/security/center/publicationListing.x?resourceIDs=93036,5834,80720&apply=1,1,1&totalbox=3&pt0=Cisco&cp0=93036&pt1=Cisco&cp1=5834&pt2=Cisco&cp2=80720#~FilterByProduct')
time.sleep(20)
prices = driver.find_elements_by_xpath('//span[@class="ng-binding" and contains(text(),"GMT")]')
for post in prices:
    if post.text != "":
        print(post.text)
print(len(prices))
driver.close()
I have tried other visible XPaths on the website to test in Python, and I can get the 20 vulnerability titles that show up on screen, as seen when you open the link. So I am assuming that I have to tell Selenium to click every link and extract the date, and do that for every title? But then how am I able to get them all in one go through the browser inspection tab?
All help is appreciated.
Thanks!
Selenium can retrieve only visible text. If you don't want to open all the hidden sections, you can use get_attribute('textContent') or get_attribute('innerText'):
for post in prices:
    print(post.get_attribute('innerText'))
Please find below a solution if you want to fetch the first published date:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r"C:\New folder\chromedriver.exe")
driver.maximize_window()
driver.get("https://tools.cisco.com/security/center/publicationListing.x?resourceIDs=93036,5834,80720&apply=1,1,1&totalbox=3&pt0=Cisco&cp0=93036&pt1=Cisco&cp1=5834&pt2=Cisco&cp2=80720#~FilterByProduct")
print(driver.current_url)
element = WebDriverWait(driver, 15).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='tab-2']//tr[1]//td[1]//table[1]//tbody[1]//tr[1]/td[4]/span")))
print(element.text)
If you want to print all the dates:
dates = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@id='tab-2']//tr[*]//td[1]//table[1]//tbody[1]//tr[*]/td[4]/span")))
print(len(dates))
for date in dates:
    print(date.text)
I'm trying to automate the search process in this website: https://www.bcbsga.com/health-insurance/provider-directory/searchcriteria
The process involves clicking on the "Continue" button to search in 'guest' mode. The next page has a list of drop-down items to refine the search criteria. My code either produces an "Element not visible" exception (which I corrected by using a wait) or times out. Please help.
Here's my code:
# imports assumed by this snippet (Select lives in selenium.webdriver.support.ui)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait, Select
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assuming a Chrome driver; any browser driver works

# navigate to the desired page
driver.get("https://www.bcbsga.com/health-insurance/provider-directory/searchcriteria")

# get the guest button
btnGuest = driver.find_element_by_id("btnGuestContinue")

# click the guest button
btnGuest.click()
wait = WebDriverWait(driver,10)
#Find a Doctor Search Criteria page
element = wait.until(EC.visibility_of_element_located((By.ID,"ctl00_MainContent_maincontent_PFPlanQuestionnaire_ddlQuestionnaireInsurance")))
lstGetInsurance = Select(element)
lstGetInsurance.select_by_value("BuyMyself$14States")
# close the browser window
#driver.quit()
You can use the search input and Keys.RETURN:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
divID = 'ctl00_MainContent_maincontent_PFPlanQuestionnaire_ddlQuestionnaireInsurance_chosen'
inputID = 'ctl00_MainContent_maincontent_PFPlanQuestionnaire_ddlQuestionnaireInsurance_chosen_input'
inputValue = 'I buy it myself (or plan to buy it myself)'
driver = webdriver.Chrome()
driver.get("https://www.bcbsga.com/health-insurance/provider-directory/searchcriteria")
driver.find_element_by_id("btnGuestContinue").click()
driver.implicitly_wait(10)
driver.find_element_by_id(divID).click()
driver.find_element_by_id(inputID).send_keys(inputValue)
driver.find_element_by_id(inputID).send_keys(Keys.RETURN)
time.sleep(6)
driver.close()