Selenium webdriver finds elements from the previous page, not the current one - Python

I am trying to automate a YouTube search, click on a search result, and then follow the recommended videos on the right-hand side. The code I wrote goes to YouTube, performs the search, and clicks on a video, which then opens in the browser. However, I cannot get it to click on one of the recommended videos.
The problem seems to be that, when I use recommended_videos = driver.find_elements_by_id("video-title") to get a list of the recommended video elements on the right, what I get instead is a list from the previous page (from when I first typed in the word and got the search results).
The code does work properly when I go directly to a video link with driver.get(url), instead of doing a search first.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
import random
seed = 1
random.seed(seed)
driver = webdriver.Firefox()
driver.get("http://www.youtube.com/")
element = driver.find_element_by_tag_name("input")
# Put the word "history" in the search box and hit enter
element.send_keys("history")
element.send_keys(Keys.RETURN)
time.sleep(5)
# Get a list of elements (videos) that get returned by the search
search_results = driver.find_elements_by_id("video-title")
# Click randomly on one of the first five results
search_results[random.randint(0,4)].click()
# Go to the end of the page (I don't know if this is necessary)
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)
time.sleep(10)
# Get the recommended videos the same way as above. This is where the problem starts, because recommended_videos essentially becomes the same thing as the previous page's search_results, even though the browser is in a new page now.
recommended_videos = driver.find_elements_by_id("video-title")
recommended_videos[random.randint(0,4)].click()
So when I click (last line), I get the error
ElementNotInteractableException: Element <a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" href="/watch?v=1oean5l__Cc"> could not be scrolled into view

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
import random
seed = 1
random.seed(seed)
driver = webdriver.Chrome()
driver.get("https://www.youtube.com")
element = driver.find_element_by_tag_name("input")
# Put the word "history" in the search box and hit enter
element.send_keys("history")
element.send_keys(Keys.RETURN)
time.sleep(5)
# Get a list of elements (videos) that get returned by the search
search_results = driver.find_elements_by_id("video-title")
# Click randomly on one of the first eleven results
search_results[random.randint(0, 10)].click()
time.sleep(4)
# Get the recommended videos the same way as above, then keep clicking through them
while True:
    recommended_videos = driver.find_elements_by_xpath("//*[@id='dismissable']/div/a")
    print(recommended_videos)
    recommended_videos[random.randint(1, 4)].click()
    time.sleep(4)
I am also completely new, and thank you for getting me interested in Selenium. I hope this code helps you. Change the driver to Firefox if you want.
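For what it's worth, the likely root cause is that YouTube is a single-page application: clicking a result does not trigger a full page load, so the old search-result elements can still be matched. Here is a minimal sketch of another way to handle it, continuing from the script above (WebDriverWait, EC, and By are already imported); the "related" sidebar id is an assumption to verify by inspecting the live watch page:
clicked = search_results[random.randint(0, 4)]
clicked.click()
# Wait until the clicked search-result element has actually been torn down
WebDriverWait(driver, 15).until(EC.staleness_of(clicked))
# Then look for titles only inside the recommendations sidebar
sidebar = WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.ID, "related"))  # "related" id is an assumption
)
recommended_videos = sidebar.find_elements_by_id("video-title")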

Related

Selenium not able to find particular elements on slow loading page

I am attempting to scrape the website basketball-reference and am running into an issue I can't seem to solve. I am trying to grab the box score element for each game played. This is something I was able to do easily with urlopen, but because other portions of the site require Selenium, I thought I would rewrite the entire process with Selenium.
The issue seems to be that even if I wait until I see the first element load using WebDriverWait, when I then move on to grabbing the elements, I get nothing returned.
One thing I found interesting: if I did a full site print using my results from urlopen with something like print(uClient.read()), I would get roughly 300 more lines of HTML after beautifying compared to doing the same with print(driver.page_source), even with an implicit wait set for 5 minutes.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.wait = WebDriverWait(driver, 10)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH,'//*[@id="content"]/div[3]/div[1]')))
box = driver.find_elements_by_class_name('game_summary expanded nohover')
print (box)
driver.quit()
Try the code below; it is working on my computer. Do let me know if you still face a problem.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.wait = WebDriverWait(driver, 60)
driver.get('https://www.basketball-reference.com/boxscores/')
driver.wait.until(EC.presence_of_element_located((By.XPATH, '//*[@id="content"]/div[3]/div[1]')))
boxes = driver.wait.until(
    EC.presence_of_all_elements_located((By.XPATH, "//div[@class='game_summary expanded nohover']")))
print("Number of Elements Located : ", len(boxes))
for box in boxes:
    print(box.text)
    print("-----------")
driver.quit()
If it resolves your problem, then please mark it as the answer. Thanks.
Actually, the site doesn't require Selenium at all. All the data is there through a simple requests call (it's just inside HTML comments, which you would need to parse). Secondly, you can grab the box scores quite easily with pandas:
import pandas as pd
dfs = pd.read_html('https://www.basketball-reference.com/boxscores/')
for idx, table in enumerate(dfs[:-2]):
    print(table)
    if (idx + 1) % 3 == 0:
        print("-----------")

How to scrape data from multiple pages

import os
from webdriver_manager.chrome import ChromeDriverManager
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--start-maximized')
options.page_load_strategy = 'eager'
driver = webdriver.Chrome(options=options)
url = "https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24#MS24"
driver.get(url)
wait = WebDriverWait(driver, 20)
I want to find the value of cash EPS (standalone as well as consolidated), but the main problem is that only 5 values are on the page; the other values are retrieved with the arrow button until it ends.
How to retrieve such values in one go?
Taking my comment further into the code.
Comment:
The paging element's href becomes "javascript:void();" once the clicks go past the page count. If there is still data, it holds a page number there instead (4 in this case): moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/…. So either condition can be used for the exit!
The comments in the code refer to this suggestion.
import pandas as pd

df_list = pd.read_html(driver.page_source)  # read the table through pandas
result = df_list[0]  # load the result, which will eventually be appended to for the next pages
current_page = driver.find_element_by_class_name('nextpaging')  # find the paging span element
while True:
    current_page.click()
    time.sleep(20)  # sleep for 20 seconds
    current_page = driver.find_element_by_class_name('nextpaging')
    paging_link = current_page.find_element_by_xpath('..')  # get the parent of this span, which has the href
    print(f"Current url : {driver.current_url} Next paging link : {paging_link.get_attribute('href')}")
    if "void" in paging_link.get_attribute('href'):
        print(f"Time to exit {paging_link.get_attribute('href')}")
        break  # exit rule
    df_list = pd.read_html(driver.page_source)
    result = result.append(df_list[0])  # append the result
Based on looking at the URL while navigating through this site:
https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/1#MS24
It appears the arrows navigate to a new URL, incrementing a number in the URL in front of the # symbol.
So, navigating through pages looks like this:
Page1: https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/1#MS24
Page2: https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/2#MS24
Page3: https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/3#MS24
etc...
These separate URLs can be used to navigate through this particular website. Something like this would probably work:
def get_pg_url(pgnum):
    return 'https://www.moneycontrol.com/financials/marutisuzukiindia/ratiosVI/MS24/{}#MS24'.format(pgnum)
Web scraping requires tuning to fit the target site. I entered pgnum=10000, which resulted in the text "Data Not Available for Key Financial Ratios" being displayed. You can probably use this text to tell you when there are no remaining pages.
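Putting those pieces together, a minimal sketch of the page loop under the assumptions above (the URL pattern holds, and the "Data Not Available" text marks the end):
import pandas as pd

frames = []
pgnum = 1
while True:
    driver.get(get_pg_url(pgnum))
    if "Data Not Available for Key Financial Ratios" in driver.page_source:
        break  # no more pages
    frames.append(pd.read_html(driver.page_source)[0])
    pgnum += 1
result = pd.concat(frames)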

Dynamic element (table) in page is not updated when I use click() in Selenium, so I couldn't retrieve the new data

The page I need to scrape data from: Digikey search result
Issue
Only 100 rows are allowed in each table, so I have to move between multiple tables using the next-page button.
As illustrated in the code below, I actually do this, but what I retrieve every time are the first table's results; it doesn't move on to the next table's results after my click action ActionChains(driver).click(element).perform().
Keep in mind that NO new page is opened; the click is intercepted by some JavaScript that does rich UI work on the same page to load a new table of data.
My Expectations
I am just trying to validate that I can move to the next table; then I will edit the code to loop through all of them.
This piece of code should return the data in the second table of results, BUT it actually returns the values from the first table, which loaded initially with the URL. This means either the click action didn't occur, or it did occur but the WebDriver content isn't being updated when interacting with dynamic JavaScript elements on the page.
I would appreciate any help, thanks.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver import ActionChains
import time
import sys
url = "https://www.digikey.com/en/products/filter/coaxial-connectors-rf-terminators/382?s=N4IgrCBcoA5QjAGhDOl4AYMF9tA"
chrome_driver_path = "..PATH\\chromedriver"
chrome_options = Options()
chrome_options.add_argument ("--headless")
webdriver = webdriver.Chrome(
    executable_path=chrome_driver_path,
    options=chrome_options
)
with webdriver as driver:
    wait = WebDriverWait(driver, 10)
    driver.get(url)
    wait.until(presence_of_element_located((By.CSS_SELECTOR, "tbody")))
    element = driver.find_element_by_css_selector("button[data-testid='btn-next-page']")
    ActionChains(driver).click(element).perform()
    time.sleep(10)  # too much time, I know, but it makes sure this is not a waiting issue; something needs to be updated
    results = driver.find_elements_by_css_selector("tbody")
    for count in results:
        countArr = count.text
        print(countArr)
        print()
    driver.close()
Finally found a SOLUTION!
Source of the solution.
As expected, the issue was in the clicking action itself. It is somehow not being done right, or not being done at all, as illustrated in the linked source question.
The solution is to click the button using JavaScript execution.
Change this line:
ActionChains(driver).click(element).perform()
to be as following:
driver.execute_script("arguments[0].click();",element)
That's it.
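Building on that fix, here is a rough sketch of looping through all result pages with the JavaScript click; the exit condition (the btn-next-page button disappearing or being disabled on the last page) is an assumption to verify against the live page:
from selenium.common.exceptions import NoSuchElementException

while True:
    # Read the current table before moving on
    for body in driver.find_elements_by_css_selector("tbody"):
        print(body.text)
    try:
        next_button = driver.find_element_by_css_selector("button[data-testid='btn-next-page']")
    except NoSuchElementException:
        break  # no next-page button left, so we are done
    if next_button.get_attribute("disabled"):
        break  # button present but disabled on the last page (assumption)
    driver.execute_script("arguments[0].click();", next_button)
    time.sleep(5)  # crude wait; an explicit wait on the table contents would be better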

Using selenium and python to extract data when it pops up after mouse hover

Hi guys, this is my first question. I am trying to extract data from a website, but the problem is that the data only appears when I hover my mouse over it. The website with the data is http://insideairbnb.com/melbourne/. I want to extract the occupancy rate for each listing from the panel that pops up when I hover the mouse pointer over the points on the map. I am trying to use @frianH's code from this Stack Overflow post: Scrape website with dynamic mouseover event. I am a newbie at data extraction using Selenium; I have knowledge of the bs4 package. I haven't been successful in finding the right XPath to complete the task. Thank you in advance. My code so far is:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
browser = webdriver.Chrome(options=chrome_options, executable_path='C:\\Users\\Kunal\\chromedriver.exe')
browser.get('http://insideairbnb.com/melbourne/')
browser.maximize_window()
# wait for all circles
elements = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//*[@id="map"]/div[1]/div[2]/div[2]/svg')))
table = browser.find_element_by_class_name('leaflet-zoom-animated')
# scroll the table into view
browser.execute_script("arguments[0].scrollIntoView(true);", table)
data = []
for circle in elements:
    # move the pointer to each circle
    ActionChains(browser).move_to_element(circle).perform()
    # wait for the mouseover effect to appear
    mouseover = WebDriverWait(browser, 30).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="neighbourhoodBoundaries"]')))
    data.append(mouseover.text)
print(data[0])
Thanks in advance.
So I checked out the page a bunch, and it seems quite resistant to Selenium's own methods, so we'll have to rely on JavaScript. Here's the full code-
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
browser = webdriver.Chrome(options=chrome_options, executable_path='chromedriver.exe')
browser.get('http://insideairbnb.com/melbourne/')
browser.maximize_window()
# Set up a 30 seconds webdriver wait
explicit_wait30 = WebDriverWait(browser, 30)
try:
    # Wait for all circles to load
    circles = explicit_wait30.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')))
except TimeoutException:
    browser.refresh()

data = []
for circle in circles:
    # Execute mouseover on the element
    browser.execute_script("const mouseoverEvent = new Event('mouseover');arguments[0].dispatchEvent(mouseoverEvent)", circle)
    # Wait for the data to appear
    listing = explicit_wait30.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#listingHover')))
    # listing now contains the full element list - you can parse this yourself and add the necessary data to `data`
    .......
    # Close the listing
    browser.execute_script("arguments[0].click()", listing.find_element_by_tag_name('button'))
I'm also using CSS selectors instead of XPath. Here's how the flow works-
circles = explicit_wait30.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'svg.leaflet-zoom-animated > g:nth-child(2) > circle')))
This waits until all the circles are present and extracts them into circles.
Do keep in mind that the page is very slow to load the circles, so I've set up a try/except block to refresh the page automatically if it doesn't load within 30 seconds. Feel free to change this however you want.
Now we have to loop through all the circles-
for circle in circles:
Next is simulating a mouseover event on the circle; we'll be using JavaScript to do this.
This is what the javascript will look like (note that circle refers to the element we will pass from selenium)
const mouseoverEvent = new Event('mouseover');
circle.dispatchEvent(mouseoverEvent)
This is how the script is executed through selenium-
browser.execute_script("const mouseoverEvent = new Event('mouseover');arguments[0].dispatchEvent(mouseoverEvent)", circle)
Now we have to wait for the listing to appear-
listing = explicit_wait30.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '#listingHover')))
Now you have listing, which is an element that also contains many other elements; you can extract each of them however you want pretty easily and store the results inside data.
If you do not care about extracting each element differently, simply doing .text on listing will result in something like this-
'Tanya\n(No other listings)\n23127829\nSerene room for a single person or a couple.\nGreater Dandenong\nPrivate room\n$37 income/month (est.)\n$46 /night\n4 night minimum\n10 nights/year (est.)\n2.7% occupancy rate (est.)\n0.1 reviews/month\n1 reviews\nlast: 20/02/2018\nLOW availability\n0 days/year (0%)\nclick listing on map to "pin" details'
That's it, then you can append the result into data and you're done!
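Since the question specifically asks for the occupancy rate, here is a small sketch of pulling just that number out of the listing text with a regular expression, assuming the "% occupancy rate" wording shown above stays stable:
import re

match = re.search(r'([\d.]+)% occupancy rate', listing.text)
if match:
    data.append(float(match.group(1)))  # e.g. 2.7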

Scraper doesn't stop clicking on the next page button

I've written a script in Python in combination with Selenium to get some names and corresponding addresses displayed upon a search; the search keyword is "Saskatoon". However, the data in this case traverses multiple pages. My script does almost everything except for one thing.
It still runs even though there are no more pages to traverse. The last page also holds the "›" sign for the next-page option, and it is not grayed out.
Here is the link: Page_link
Search_keyword: Saskatoon (in the city/town field).
Here is what I've written:
from selenium import webdriver; import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("above_link")
time.sleep(3)
search_input = driver.find_element_by_id("cityField")
search_input.clear()
search_input.send_keys("Saskatoon")
search_input.send_keys(Keys.ENTER)
while True:
    try:
        wait.until(EC.visibility_of_element_located((By.LINK_TEXT, "›"))).click()
        time.sleep(2)
    except:
        break
driver.quit()
BTW, I've just taken the name and address part out of this script, since I suppose it is not relevant here. Thanks.
You can use the class attribute of the "›" button: on the last page it is "ng-scope disabled", while on the rest of the pages it is "ng-scope":
wait.until(EC.visibility_of_element_located((By.XPATH, "//li[@class='ng-scope']/a[.='›']"))).click()
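Dropped into the original loop, that could look like the sketch below; catching TimeoutException instead of a bare except is the other change, so the loop exits once the enabled "›" button stops matching:
from selenium.common.exceptions import TimeoutException

while True:
    try:
        # Only matches the arrow while its parent <li> is still enabled
        wait.until(EC.visibility_of_element_located(
            (By.XPATH, "//li[@class='ng-scope']/a[.='›']"))).click()
        time.sleep(2)
    except TimeoutException:
        break  # the arrow now sits inside li.ng-scope.disabled, so we are done
driver.quit()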
