I am new to Selenium and I would like to write a Python script that searches for some keywords on Google and automatically opens a page when the keywords are found.
I have been trying to adapt the script from this website:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
if __name__ == "__main__":
    driver = webdriver.Firefox()
    wait = WebDriverWait(driver, 100)
    driver.get("http://google.com")
    inputElement = driver.find_element_by_name("q")
    inputElement.send_keys("Irwin Kwan")
    wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@href='http://irwinhkwan.wordpress.com/']")))
    blog = driver.find_element_by_xpath("//a[@href='http://irwinhkwan.wordpress.com/']")
    blog.click()
    driver.quit()
It works perfectly when an element matching the XPath is present on the page. However, if it is not, the script blocks for the full 100-second timeout.
How can I check whether an element is there, and click it only if it is?
I tried using try as shown in the documentation, with NoSuchElementException, but I admit to being confused by the "double negation" ;-):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()
Ultimately, I would like a while loop that goes through the Google results up to page number X and clicks each time it finds a specific address.
If someone can give me a hand, that would be great! I am pretty sure it is trivial, but despite all the documentation available online I haven't been able to work this out for the last couple of days...
Here is a little wrapper for WebElement I've written:
import logging

from selenium.common.exceptions import NoSuchElementException

logger = logging.getLogger(__name__)

class Element(object):
    def __init__(self, xpath=None):
        self.xpath = xpath

    def __get_element(self):
        # Driver() is my own factory that returns the current WebDriver instance
        return Driver().find_element_by_xpath(self.xpath)

    def is_exist(self):
        try:
            return self.__get_element().is_displayed()
        except NoSuchElementException as e:
            logger.exception(e)
            return False
You can't click an element that is not visible; you first need to expose it, and only then click.
Hope it helps.
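For the "click it only if it is there" part of the question, a common pattern is to wait with a short timeout and treat a TimeoutException as "the element is not there". A minimal sketch using the XPath from the question (same older-style Selenium API as the code above):
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get("http://google.com")
driver.find_element_by_name("q").send_keys("Irwin Kwan\n")

try:
    # wait at most 10 seconds; a TimeoutException simply means the link
    # is not on this results page, so we can move on instead of blocking
    blog = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable(
            (By.XPATH, "//a[@href='http://irwinhkwan.wordpress.com/']")))
    blog.click()
except TimeoutException:
    print("Link not found on this page")
finally:
    driver.quit()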
I have tried to scrape info from that site, specifically from a table. Every time I try, I get errors saying the elements don't exist.
https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d
I tried adding time.sleep(5) to my code and a scroll-down function to load all the elements, but both were ineffective.
Do you have any advice for me?
EDIT
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
# Options
chrome_options = Options()
chrome_options.add_argument("--headless")
# Set drive
chrome_driver_path = r"C:\Users\kacpe\OneDrive\Pulpit\Python\Projekty\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path, options=chrome_options)
driver.get("https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//table/tbody/tr[0]")))
    print(element)
except TimeoutException as e:
    print(e)
I added code in response to your request. My main goal is to scrape the content of the table on that site. I added explicit waits to my code and I still can't select anything from the table; it looks like the script doesn't see anything in that area.
One way to try to solve it is to use the XPath of the element, or its position relative to a stable ancestor, so that Selenium always lands on the same "line" and returns the value of the information you are searching for.
Ex1: find_element(By.XPATH, '//*[@id="wmd-input"]')  # in this case, the input box on this page.
If that doesn't work, try this one.
Ex2: browser.implicitly_wait(30)  # sets a timeout during which Selenium keeps retrying element lookups while the page loads.
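Also note that XPath indexing is 1-based, so the //table/tbody/tr[0] from the question can never match anything; the first row is tr[1]. A minimal sketch of waiting for the rows (assuming the table lives in the top-level document; if the site renders it inside an iframe, you would have to driver.switch_to.frame(...) first):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d")

# XPath positions start at 1: tr[1] is the first row, tr[0] matches nothing
rows = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, "//table/tbody/tr")))
for row in rows:
    print(row.text)
driver.quit()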
I've written a script in Python with Selenium to keep clicking a MORE button to load more items until there are no new items left to load on a webpage. However, my script below clicks that MORE button, available at the bottom of the page, only once.
Link to that site
This is my try so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://angel.co/companies?company_types[]=Startup&company_types[]=Private+Company&company_types[]=Mobile+App&locations[]=1688-United+States"
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get(link)
while True:
    for elems in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".results .name a.startup-link"))):
        print(elems.get_attribute("href"))
    try:
        loadmore = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[class='more']")))
        driver.execute_script("arguments[0].scrollIntoView();", loadmore)
        loadmore.click()
    except Exception:
        break
driver.quit()
How can I keep clicking that MORE button until there is no such button left to click, and parse the links? I've already tried using a for loop, as shown above.
I've managed to solve the problem by pursuing sir Andersson's logic within my existing script. This is what the modified script looks like:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://angel.co/companies?company_types[]=Startup&company_types[]=Private+Company&company_types[]=Mobile+App&locations[]=1688-United+States"
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get(link)
while True:
    try:
        loadmore = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[class='more']")))
        driver.execute_script("arguments[0].click();", loadmore)
        wait.until(EC.staleness_of(loadmore))
    except Exception:
        break

for elems in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".results .name a.startup-link"))):
    print(elems.get_attribute("href"))

driver.quit()
why not just?
while (driver.FindElements(By.ClassName("more")).Count > 0)
{
    driver.FindElement(By.ClassName("more")).Click();
    // some delay to wait for the lazy load to complete
}
This is a C# example, but I'm pretty sure it can be done in Python as well.
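A rough Python equivalent of that loop, for completeness (a sketch; like the C# version it relies on a fixed sleep rather than an explicit wait):
import time

from selenium.webdriver.common.by import By

while len(driver.find_elements(By.CLASS_NAME, "more")) > 0:
    driver.find_element(By.CLASS_NAME, "more").click()
    time.sleep(2)  # some delay to let the lazy load complete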
I am writing a scraping script for the website Upwork, and need to click through each page of job listings. Here is my Python code, in which I used Selenium to crawl the pages.
from bs4 import BeautifulSoup
import requests
from os.path import basename
from selenium import webdriver
import time
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
driver = webdriver.Chrome("./chromedriver")
driver.get("https://www.upwork.com/o/jobs/browse/c/design-creative/")
link = driver.find_element_by_link_text("Next")
while EC.elementToBeClickable(By.linkText("Next")):
    wait.until(EC.element_to_be_clickable((By.linkText, "Next")))
    link.click()
There are a couple of problems:
EC has no attribute elementToBeClickable. In Python you should use element_to_be_clickable
Your link is defined on the first page only, so using it on the second page will give you a StaleElementReferenceException
There is no wait variable defined in your code. I guess you mean something like
wait = WebDriverWait(driver, 10)
By has no attribute linkText. Try LINK_TEXT instead
Try the code below to get the required behavior:
from selenium.common.exceptions import TimeoutException
while True:
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Next"))).click()
    except TimeoutException:
        break
This should allow you to click the Next button while it's available.
So, what I want to do is run a function on a specific webpage (one that matches my regex).
Right now I'm checking it every second and it works, but I'm sure there is a better way (as it's flooding that website with GET requests).
while flag:
    time.sleep(1)
    print(driver.current_url)
    if driver.current_url == "mydesiredURL_by_Regex":
        time.sleep(1)
        myfunction()
I was thinking to do that somehow with WebDriverWait but not really sure how.
This is how I implemented it eventually. Works well for me:
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 5)
desired_url = "https://yourpageaddress"
def wait_for_correct_current_url(desired_url):
    wait.until(
        lambda driver: driver.current_url == desired_url)
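Called like this (wait.until raises TimeoutException after the 5 seconds configured above if the URL never matches):
# after triggering whatever action is expected to navigate to desired_url:
wait_for_correct_current_url(desired_url)
print("Now at:", driver.current_url)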
I was thinking to do that somehow with WebDriverWait
Exactly. First of all, see if the built-in Expected Conditions may solve that:
title_is
title_contains
Sample usage:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
wait.until(EC.title_is("title"))
wait.until(EC.title_contains("part of title"))
If not, you can always create a custom Expected Condition to wait for the URL to match a desired regular expression.
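A minimal sketch of such a custom condition (the class name url_matches_regex is made up for this example; recent Selenium releases also ship a built-in EC.url_matches that does the same job):
import re

class url_matches_regex(object):
    """Expected condition: the current URL matches the given regular expression."""
    def __init__(self, pattern):
        self.pattern = pattern

    def __call__(self, driver):
        return re.search(self.pattern, driver.current_url) is not None

wait.until(url_matches_regex(r"mydesiredURL_by_Regex"))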
To really know that the URL has changed, you need to know the old one. Using WebDriverWait the implementation in Java would be something like:
WebDriverWait wait = new WebDriverWait(driver, 10);
wait.until(ExpectedConditions.not(ExpectedConditions.urlToBe(oldUrl)));
I know the question is for Python, but it's probably easy to translate.
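A rough Python translation (the built-in EC.url_changes shown in the next answer encapsulates the same check):
old_url = driver.current_url
# trigger the navigation here, then wait until the URL differs from the old one
WebDriverWait(driver, 10).until(lambda d: d.current_url != old_url)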
Here is an example using WebDriverWait with expected_conditions:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://example.com/before'

driver = webdriver.Chrome()
driver.get(url)

# wait up to 10 secs for the url to change away from the starting one,
# or else `TimeoutException` is raised
WebDriverWait(driver, 10).until(EC.url_changes(url))
Use url_matches to match a regex pattern against the current URL. It does re.search(pattern, url) under the hood.
from selenium import webdriver
import re
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
pattern='https://www.example.com/'
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
wait.until(EC.url_matches(pattern))
I am using Python and Selenium to scrape a website. What I do is go to the homepage and type in a keyword, such as 1300746-79-5. On the resulting page, I am trying to scrape the data in the "pricing" section. Specifically, I need to get the "SKU-Pack Size" and "Price(USD)" information. But this information is rendered by JavaScript, so I cannot see it in the page source. I am wondering how I can achieve this.
I have written some code that gets me to the page of interest, but I still cannot see the JavaScript-rendered information. Here is what I have so far.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pprint
# Create a new instance of the Firefox driver
driver = webdriver.Chrome(r'C:\Users\Rei\Desktop\chromedriver.exe')
driver.get("http://www.sigmaaldrich.com/united-states.html")
print(driver.title)
inputElement = driver.find_element_by_name("Query")
# type in the search
inputElement.send_keys("1300746-79-5")
inputElement.submit()
Everything you have done looks correct to me.
"SKU-Pack Size" and "Price(USD)" information are not "encrypted", but retrieved after JavaScript clicking action. All you need to do is to click product name or pricing link.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pprint
driver = webdriver.Chrome()
driver.get("http://www.sigmaaldrich.com/united-states.html")
print(driver.title)
inputElement = driver.find_element_by_name("Query")
# type in the search
inputElement.send_keys("1300746-79-5")
inputElement.submit()
pricing_link = driver.find_element_by_css_selector("li.priceValue a")
print(pricing_link.text)
pricing_link.click()
# then deal with the data you want
price_table = WebDriverWait(driver, 10).until(
EC.presence_of_element_located((By.CSS_SELECTOR, ".priceAvailContainer tbody"))
)
print('price_table.text: ' + price_table.text)
driver.quit()