How to scrape some links from a website using Selenium - Python

I've been trying to parse the links ending with 20012019.csv from a webpage using the script below, but I'm always getting a timeout exception error. It seems to me that I did things the right way.
However, any insight as to where I'm going wrong will be highly appreciated.
My attempt so far:
from selenium import webdriver

url = 'https://promo.betfair.com/betfairsp/prices'

def get_info(driver, link):
    driver.get(link)
    for item in driver.find_elements_by_css_selector("a[href$='20012019.csv']"):
        print(item.get_attribute("href"))

if __name__ == '__main__':
    driver = webdriver.Chrome()
    try:
        get_info(driver, url)
    finally:
        driver.quit()

Your code is fine (I tried it and it works); the reason you get a timeout is that the default page load timeout is 60s according to this answer, and the page is huge.
Add this to your code before making the get request (to wait 180s before timing out):
driver.set_page_load_timeout(180)
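For illustration, a minimal sketch of where that call would sit in your script (the 180s value is just the suggestion above, not a required number):

def get_info(driver, link):
    driver.set_page_load_timeout(180)  # raise the page load timeout before the get request
    driver.get(link)
    for item in driver.find_elements_by_css_selector("a[href$='20012019.csv']"):
        print(item.get_attribute("href"))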

You were close. You have to induce WebDriverWait for the visibility of all elements located, and you need to change the line:
for item in driver.find_elements_by_css_selector("a[href$='20012019.csv']"):
to:
for item in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[href$='20012019.csv']"))):
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
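Putting it all together, a sketch of the revised script might look like this (same logic as the original; only the loop and the imports change):

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

url = 'https://promo.betfair.com/betfairsp/prices'

def get_info(driver, link):
    driver.get(link)
    # wait up to 30s for all matching links to become visible before iterating
    for item in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a[href$='20012019.csv']"))):
        print(item.get_attribute("href"))

if __name__ == '__main__':
    driver = webdriver.Chrome()
    try:
        get_info(driver, url)
    finally:
        driver.quit()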

Related

Webdriver selenium - can't find any element from TradingView

I'm new to using Selenium, but I've watched enough videos and followed enough articles to know something is missing. I'm trying to get values from TradingView, but the problem I'm running into is that I simply can't find any of the elements, not by XPath or CSS. I went ahead and tried to do a simple element-visibility test, as shown in the code below, and to my surprise it times out.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

chrome_options = Options()
# Stops the UI interface (chrome browser) from popping up
# chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r'c:\se\chromedriver.exe', options=chrome_options)

driver.get("https://www.tradingview.com/chart/")
timeout = 20
try:
    WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[1]")))
    print("Page loaded")
except TimeoutException:
    print("Timed out waiting for page to load")
    driver.quit()
I tried to click on one of the chart buttons too, using the following, and that doesn't work either. I noticed that, unlike on many other websites, on TradingView the elements don't have names and don't generate a relative path (only a full one) using XPath.
driver.find_element_by_xpath('/html/body/div[2]/div[5]/div/div[2]/div/div/div/div/div[4]').click()
Any help is greatly appreciated!
I think there must be an issue with the XPath.
When I try to click the AAPL button, it works for me.
The XPath I used is:
(//div[contains(text(),'AAPL')])[1]
If you specify exactly which element should be clicked, I will try it.
Also, get familiar with the concept of frames, because these types of websites have a lot of frames in them.
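For reference, a minimal sketch of the click using that XPath with an explicit wait (the 20s timeout is an arbitrary choice); if the button turns out to live inside a frame, you would switch into it first:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 20)
# if the element lives inside a frame, switch into it first, e.g.:
# driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))
wait.until(EC.element_to_be_clickable((By.XPATH, "(//div[contains(text(),'AAPL')])[1]"))).click()
# driver.switch_to.default_content()  # return to the main document afterwards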

How to select a textbox using Selenium in Python

I want to select a textbox using XPath or any other locator, but I am unable to do so. The code works fine for one part of the page, while no locator works for the other half of the page. I am not sure if my code is wrong or if something else is the problem.
I have attached the HTML part.
Here is my code:
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome()
driver.get('Website')
driver.implicitly_wait(50)
driver.find_element_by_xpath('//*[@id="j_username"]').send_keys("Username")
driver.find_element_by_xpath('//*[@id="j_password"]').send_keys("Password")
driver.find_element_by_xpath('//*[@id="b_submit"]').click()
driver.find_element_by_xpath('//*[@id="15301"]/div[1]/a/span').click()
driver.find_element_by_xpath('//*[@id="22261"]/a').click()
driver.find_element_by_xpath('//*[@id="22323"]/a').click()
driver.implicitly_wait(50)
driver.find_element_by_xpath('//*[@id="filterRow"]').clear()
The last line is where I am getting the following error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="filterRow"]"}
The page may not have finished rendering when you try to find the element, so it gives a NoSuchElementException.
Try the following approach (note the plural find_elements, which returns a list instead of raising when nothing matches):
elems = driver.find_elements_by_xpath('//*[@id="filterRow"]')
if len(elems) > 0:
    elems[0].clear()
Hope this helps.
You can wait till the element loads using an explicit wait:
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 20)
Filter_Row = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="filterRow"]')))
Filter_Row.clear()
Try the above code and see what happens.
Try the solution below:
wait = WebDriverWait(driver, 20)
wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='filterRow']"))).clear()
Note: add the following imports to your solution:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
As you mentioned in one of the comments, a new page opens upon clicking the tab. Please check whether it opens in a new frame. If so, first switch to the frame containing your element using the statement below:
driver.switch_to.frame(driver.find_element_by_name(name))
To navigate back to the original frame, you can use:
driver.switch_to.default_content()
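A rough sketch of how that would fit together with the filterRow step; "frameName" is a hypothetical placeholder for the actual name attribute of the frame:

# switch into the frame that holds the element (the name is hypothetical)
driver.switch_to.frame(driver.find_element_by_name("frameName"))
wait = WebDriverWait(driver, 20)
wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='filterRow']"))).clear()
# switch back to the top-level document when done
driver.switch_to.default_content()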
Using find_element_by_css_selector:
driver.find_element_by_css_selector("input")
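For example, a hedged sketch combining that selector with an explicit wait; the bare "input" selector assumes the target textbox is the first input element on the page:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait for the first <input> to become visible, then clear it
box = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "input")))
box.clear()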

Unable to make my script wait conditionally

I've tried to write a script in Python in combination with Selenium to wait for a certain element to become available. The content I wish my script to wait for is captcha-protected. I do not want to set a fixed time, so I need it to wait until I can solve the captcha myself.
Here is what I've tried:
import time
from selenium import webdriver

URL = "https://www.someurl.com/"

driver = webdriver.Chrome()
driver.get(URL)
while not driver.find_element_by_css_selector(".listing-content"):
    time.sleep(1)
print(driver.current_url)
driver.quit()
But, the script throws an error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element:
How can I make my script wait until the element is available no matter how long it takes?
If you don't want to hardcode the wait time, you can use an explicit wait along with float("inf"), which in Python stands for INFINITY:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
wait(driver, float("inf")).until(EC.presence_of_element_located((By.CLASS_NAME, "listing-content")))
As you've asked how to organize the try/except block, here is an idea. I would suggest sticking with the inf-wait method, however.
from selenium.common.exceptions import NoSuchElementException

while True:
    try:
        driver.find_element_by_css_selector(".listing-content")
        break
    except NoSuchElementException:
        time.sleep(0.1)
I would include the time.sleep() statement to minimize your number of function calls.
You should use WebDriverWait:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
element = WebDriverWait(driver, 10000).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".listing-content")))
It will not wait indefinitely, but you can set the timeout high. Otherwise you could try to use the WebDriverWait statement in a loop.
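The loop variant mentioned above could be sketched like this: retry a bounded wait until the element finally appears (e.g. once the captcha has been solved by hand):

from selenium.common.exceptions import TimeoutException

while True:
    try:
        element = WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".listing-content")))
        break  # found it; stop retrying
    except TimeoutException:
        continue  # still not there; keep waiting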

Unable to use "explicit wait" in the right way

I've written a script using Python with Selenium to click on some links listed in the sidebar of Google Maps. When any of the items gets clicked, the related information attached to that lead shows up in the right-side area. The script is doing fine. However, I've used a hardcoded delay to do the job. How can I get rid of the hardcoded delay by achieving the same with an explicit wait? Thanks in advance.
Link to the site: website
The script I'm trying with:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "replace_with_above_link"

driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver, 10)
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "[id^='rlimg0_']"))):
    item.location
    time.sleep(3)  # wish to try with explicit wait but can't find any idea
    item.click()
driver.quit()
I tried wait.until(EC.staleness_of(item)) instead of the hardcoded delay, but no luck.
If you want to wait until the new data is displayed after each click, you may try the code below:
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "[id^='rlimg0_']"))):
    div = driver.find_element_by_xpath("//div[@class='xpdopen']")
    item.location
    item.click()
    wait.until(EC.staleness_of(div))
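The idea, assuming the page behaves as it appears to: each click re-renders the open details panel (the div with class xpdopen), so waiting for the old panel element to go stale stands in for the fixed 3-second delay.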

Can't get rid of hardcoded delay even when Explicit Wait is already there

I've written some code in Python in combination with Selenium to parse the different questions from quora.com. My scraper is doing its job at this moment. The thing is, I've used a hardcoded delay here for the scraper to work, even though an Explicit Wait has already been defined. As the page is an infinitely scrolling one, I tried to limit the scrolling process to a fixed number of iterations. Now, I have two questions:
Why is wait.until(EC.staleness_of(page)) not working within my scraper? It is commented out now.
If I use something else instead of page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link"))), the scraper throws an error: can't focus element.
Btw, I do not wish to go for the page = driver.find_element_by_tag_name('body') option.
Here is what I've written so far:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.quora.com/topic/C-programming-language")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link")))
for scroll in range(10):
    page.send_keys(Keys.PAGE_DOWN)
    time.sleep(2)
    # wait.until(EC.staleness_of(page))
for item in wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "rendered_qtext"))):
    print(item.text)
driver.quit()
You can try the code below to trigger as many XHRs as possible and then parse the page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
driver.get("https://www.quora.com/topic/C-programming-language")
wait = WebDriverWait(driver, 10)
page = wait.until(EC.visibility_of_element_located((By.CLASS_NAME, "question_link")))
links_counter = len(wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "question_link"))))
while True:
    page.send_keys(Keys.END)
    try:
        wait.until(lambda driver: len(driver.find_elements_by_class_name("question_link")) > links_counter)
        links_counter = len(driver.find_elements_by_class_name("question_link"))
    except TimeoutException:
        break
for item in wait.until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "rendered_qtext"))):
    print(item.text)
driver.quit()
Here we scroll the page down and wait up to 10 seconds for more links to load, or break out of the while loop if the number of links remains the same.
As for your questions:
wait.until(EC.staleness_of(page)) is not working because when you scroll the page down you don't get a new DOM; you just trigger an XHR that adds more links to the existing DOM, so the first link (page) never becomes stale in this case.
(I'm not quite confident about this, but...) I guess you can send keys only to nodes that can be focused (where a user can set focus manually), e.g. links, input fields, textareas, buttons..., but not content divisions (div), paragraphs (p), etc.
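A hedged illustration of that focus point, reusing the class names from the script above (whether each element is an anchor or a plain div on the live page is an assumption):

# sending keys to a focusable node (a link) is fine:
link = driver.find_element_by_class_name("question_link")   # presumably an <a>
link.send_keys(Keys.PAGE_DOWN)

# sending keys to a plain content node would raise
# WebDriverException: cannot focus element
text = driver.find_element_by_class_name("rendered_qtext")  # presumably a <div>/<p>
text.send_keys(Keys.PAGE_DOWN)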
