Parsing a dynamically loaded webpage with Selenium - python

I'm trying to parse https://www.flashscore.com/football/albania/ using Selenium in Python, but my webdriver often doesn't wait for the scores to finish loading.
Here's the code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
driver.get("https://www.flashscore.com/football/albania/")
try:
WebDriverWait(driver, 100).until(
lambda s: s.execute_script("return jQuery.active == 0"))
print(driver.page_source)
finally:
driver.quit()
Occasionally, this will print out source code for a flashscore page with a blank table (i.e. the driver does not wait for the scores to finish loading). I suspect that this is because some of the live scores on the page are dynamically loaded. Is there any way to improve my wait condition?

There's an accept cookies button, so we have to click on that first.
I am using Explicit waits, first presence of table and then visibility of it's main body.
Code :
driver.maximize_window()
driver.implicitly_wait(30)
wait = WebDriverWait(driver, 30)
driver.get("https://www.flashscore.com/football/albania/")
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
try:
wait.until(EC.presence_of_element_located((By.ID, "live-table")))
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "section.event")))
print(driver.page_source)
finally:
driver.quit()
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output is certainly to long, so I would be able to post it here because stackoverflow won't allow me to do so.

Related

find the URL after button click from the website using selenium python

Every Button of the website may contain the link, for the below website how to find out URL appears in next tab.
wants to print and scrape the URL after the button click
am using firefox web driver
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
driver.find_element_by_xpath("//span[contains(text(),'Ingredients')]").click()
time.sleep(3)
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabelâ„¢')]").click()
This should be easy, just use driver.current_url. So with your code you could try
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
driver.find_element_by_xpath("//span[contains(text(),'Ingredients')]").click()
time.sleep(3)
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabelâ„¢')]").click()
time.sleep(5)
driver.switch_to.window(driver.window_handles[1])
print(driver.current_url)
I saw few problems:
1 Waits. Get rid of time.sleep(). Replace it with explicit/implicit waits. I observed that these elements are the last that are loaded on the page: picture[class='loaded']. So, I added wait for them.
2 To switch between tabs use: driver.switch_to.window(driver.window_handles[1]), driver.switch_to.window(driver.window_handles[0]) - to switch to initial tab.
Solution for Chrome
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
# driver.implicitly_wait(10)
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
wait = WebDriverWait(driver, 30)
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "picture[class='loaded']")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".collapsed>a[title='Ingredients']")))
driver.find_element_by_css_selector(".collapsed>a[title='Ingredients']").click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(),'Go to SmartLabel')]")))
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabel')]").click()
driver.switch_to.window(driver.window_handles[1])
print(driver.current_url)
driver.close()
driver.switch_to.window(driver.window_handles[0])
print(driver.current_url)
Output:
https://smartlabel.unileverusa.com/011111375512-0001-en-US/index.html
https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html
For Firefox you'll need to wait for at least one element on the second page, otherwise the output will not give you expected link:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.implicitly_wait(10)
driver.get("https://www.dove.com/us/en/skin-care/body-lotion/cream-oil-intensive-body-lotion.html")
wait = WebDriverWait(driver, 30)
wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "picture[class='loaded']")))
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".collapsed>a[title='Ingredients']")))
driver.find_element_by_css_selector(".collapsed>a[title='Ingredients']").click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(text(),'Go to SmartLabel')]")))
driver.find_element_by_xpath("//button[contains(text(),'Go to SmartLabel')]").click()
driver.switch_to.window(driver.window_handles[1])
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".container-fluid.content-section")))
print(driver.current_url)
driver.close()
driver.switch_to.window(driver.window_handles[0])
print(driver.current_url)
P.S. If you are looking for a way to find links by attribute names, there is no way because this button does not have such. The link is generated.

Script fails to keep clicking on load more button

I've written a script in Python in association with selenium to keep clicking on MORE button to load more items until there are no new items left to load from a webpage. However, my below script can click once on that MORE button available on the bottom of that page.
Link to that site
This is my try so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://angel.co/companies?company_types[]=Startup&company_types[]=Private+Company&company_types[]=Mobile+App&locations[]=1688-United+States"
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get(link)
while True:
for elems in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,".results .name a.startup-link"))):
print(elems.get_attribute("href"))
try:
loadmore = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,"[class='more']")))
driver.execute_script("arguments[0].scrollIntoView();", loadmore)
loadmore.click()
except Exception:break
driver.quit()
How can I keep clicking on that MORE button until there are no such button left to click and parse the links as I've already tried using for loop.
I've managed to solve the problem pursuing sir Andersson's logic within my exising script. This is what the modified script look like.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://angel.co/companies?company_types[]=Startup&company_types[]=Private+Company&company_types[]=Mobile+App&locations[]=1688-United+States"
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get(link)
while True:
try:
loadmore = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,"[class='more']")))
driver.execute_script("arguments[0].click();", loadmore)
wait.until(EC.staleness_of(loadmore))
except Exception:break
for elems in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,".results .name a.startup-link"))):
print(elems.get_attribute("href"))
driver.quit()
why not just?
while (driver.FindElements(By.ClassName("more")).Count > 0)
{
driver.FindElement(By.ClassName("more")).Click();
//Some delay to wait lazyload to complete
}
c# example. pretty sure that it can be done with python as well

cannot search the tags from python but can view it at chrome

cannot search the following tags inside this URL
class="iw_component" id="c1417094965155"
i can view it from my desktop chrome browser, but cannot read it when execute the following python script
import time
from selenium import webdriver
from pyvirtualdisplay import Display
display=Display(visible=0,size=(800,800))
display.start()
driver=webdriver.Firefox()
driver.get('url')
time.sleep(5)
title=driver.page_source
print title
driver.close()
display.stop()
You can use the class name to locate elements using find_elements_by_class_name:
divs = driver.find_elements_by_class_name("iw_component")
for div in divs: # use a descriptive variable name
html_id = div.get_attribute("id")
...
Also, instead of time.sleep(5) to simulate/delay python waiting until all the elements are loaded, Explicit Waits can be used to wait for specific elements:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.ID, "myDynamicElement"))
)
finally:
driver.quit()
For you, the part for the locator strategy would be:
presence_of_element_located((By.CLASS_NAME, "iw_component"))

Python Selenium Xpath from firebug not found

I am trying to login to the ESPN website using selenium. Here is my code thus far
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.maximize_window()
url = "http://www.espn.com/fantasy/"
driver.get(url)
login_button = driver.find_element_by_xpath("/html/body/div[6]/section/section/div/section[1]/div/div[1]/div[2]/a[2]")
login_button.click()
try:
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.XPATH, "/html/body/div[2]/div/div/section/section/form/section/div[1]/div/label/span[2]/input")))
except:
driver.quit()
Basically, there are 2 steps, first I have to click the login button and then I have to fill in the form. Currently, I am clicking the login button and the form is popping up but then I can't find the form. I have been using firebug to get the xpath as suggested in other SO questions. I don't really know much about selenium so I am not sure where to look
Try to use
driver.switch_to_frame('disneyid-iframe')
# handle authorization pop-up
driver.switch_to_default_content() # if required
This works for me, switching to the iframe first. Note that you will need to switch back out of the iframe after entering the credentials.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.maximize_window()
url = "http://www.espn.com/fantasy/"
driver.get(url)
login_button = driver.find_element_by_xpath("/html/body/div[6]/section/section/div/section[1]/div/div[1]/div[2]/a[2]")
login_button.click()
iframe = driver.find_element_by_id("disneyid-iframe")
driver.switch_to.frame(iframe)
try:
element = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.XPATH, "/html/body/div[2]/div/div/section/section/form/section/div[1]/div/label/span[2]/input")))
element.send_keys("my username")
import time
time.sleep(100)
finally:
driver.quit()

Python Selenium - Can click element directly when it was present no need waiting for page load complete?

Current scenario is some page needs more time to loading. I want click an element when it was present, But no need wait for whole page load complete. How to fix this problem? Can provide some demo or example?
Thanks a lot.
The driver.get() call is blocking. There is a beta feature in the Firefox driver that could work combined with explicit waits:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
profile = webdriver.FirefoxProfile()
profile.set_preference("webdriver.load.strategy", "unstable")
profile.update_preferences()
driver = webdriver.Firefox(firefox_profile=profile)
driver.get("http://somedomain/url_that_delays_loading")
try:
element = WebDriverWait(driver, 10).until(
EC.element_to_be_clickable((By.ID,'someid'))
)
finally:
driver.quit()

Categories

Resources