I wanted to get a list of events from python.org by selecting elements with an HTML selector.
from selenium import webdriver
from selenium.webdriver.common.by import By
chrome_driver_path = r"C:\development\chromedriver.exe"
driver = webdriver.Chrome(executable_path=chrome_driver_path)
driver.get("https://python.org")
event_time = driver.find_element(By.CLASS_NAME, ".event-widget time")
for time in event_time:
    print(time.text)
The locator type is wrong: it should be CSS_SELECTOR, not CLASS_NAME. Also, since you iterate over the results, you have to use find_elements, not find_element:
event_time = driver.find_elements(By.CSS_SELECTOR, ".event-widget time")
for time in event_time:
    print(time.text)
Output:
2023-02-05
2023-02-16
2023-02-21
2023-02-25
2023-03-06
About Invalid Selector Exception:
Selenium throws an InvalidSelectorException when an XPath or CSS selector does not conform to the XPath or CSS specification. In other words, it occurs when you pass in a selector that can't be parsed by Selenium WebDriver's selector engine, or when an element-retrieval command is used with an unknown locator strategy.
Problem:
In your case, .event-widget time is a CSS selector, not a class name; I do not see any class attribute with that value anywhere in the HTML structure.
Solution:
Try using some other locators like ID, name or XPath. If you are specific about using ClassName, then make sure that web element's class attribute is accurately matching in your code.
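As a side note, the distinction is easy to check mechanically. Below is a throwaway helper (not part of Selenium, the name is made up for illustration) that encodes the rule By.CLASS_NAME enforces: a single bare class token, with no spaces and no leading dot.

```python
def is_valid_class_name_locator(value: str) -> bool:
    # By.CLASS_NAME expects one bare class token: a space would be a
    # descendant combinator and a leading dot is CSS-selector syntax.
    return " " not in value and not value.startswith(".")

print(is_valid_class_name_locator(".event-widget time"))  # False -> needs CSS_SELECTOR
print(is_valid_class_name_locator("event-widget"))        # True  -> fine for CLASS_NAME
```

Anything that fails this check should be passed with By.CSS_SELECTOR (or By.XPATH) instead.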
Related
For instance, I have this website: https://skinport.com/item/stattrak-usp-s-black-lotus-minimal-wear/6128018 and want to get the current price of the item. Selenium doesn't find the element by class name, XPath, or CSS selector. I think that's because the page source doesn't contain the price; the site builds it with a few scripts that render the current price.
So I have something like this in python:
driver.get("https://skinport.com/item/stattrak-usp-s-black-lotus-field-tested/6196388")
print(price = driver.find_element(By.XPATH, '//*[@id="content"]/div[1]/div[2]/div/div/div[2]/div[1]/div'))
And I get this error: selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element
With
print(driver.find_elements(By.CSS_SELECTOR("#content > div.ItemPage > div.ItemPage-column.ItemPage-column--right > div:nth-child(1) > div > div.ItemPage-price > div.ItemPage-value > div")))
I get this error: TypeError: 'str' object is not callable
You are missing a wait.
You should let the page load before accessing that element. The preferred way to do that is an explicit wait with expected conditions.
You are also missing the text attribute needed to retrieve the text from the web element.
Also, your locator is bad.
Something like this should work:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver.get("https://skinport.com/item/stattrak-usp-s-black-lotus-field-tested/6196388")
wait = WebDriverWait(driver, 20)
price = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.ItemPage-price div.Tooltip-link"))).text
print(price)
My guess is you didn't add an implicit/explicit wait to your driver session. Your XPath seems to work.
Post your code. Maybe we could figure it out together.
The link to the documentation:
https://selenium-python.readthedocs.io/waits.html
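For intuition, WebDriverWait(driver, timeout).until(condition) essentially polls the condition until it returns something truthy or the timeout expires. Here is a minimal pure-Python sketch of that loop (wait_until and fake_element_present are made-up names for illustration; the real implementation also handles a list of ignored exceptions):

```python
import time

def wait_until(condition, timeout=10, poll_frequency=0.5):
    """Rough sketch of WebDriverWait(driver, timeout).until(condition):
    poll until the condition returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(poll_frequency)

# Usage: simulate an "element" that only appears on the third poll.
attempts = {"n": 0}
def fake_element_present():
    attempts["n"] += 1
    return "element" if attempts["n"] >= 3 else None

print(wait_until(fake_element_present, timeout=5, poll_frequency=0.01))  # element
```

This is why an explicit wait succeeds where an immediate find_element fails: the locator is simply retried until the scripts have rendered the element.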
I'm currently trying to use some automation while performing a patent-searching task. I'd like to get all the links corresponding to the search query results. In particular, I'm interested in Apple patents from 2015 onwards. The code is the following:
import selenium
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as options
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By
new_driver_path = r"C:/Users/alexe/Desktop/Apple/PatentSearch/geckodriver-v0.30.0-win64/geckodriver.exe"
ops = options()
serv = Service(new_driver_path)
browser1 = selenium.webdriver.Firefox(service=serv, options=ops)
browser1.get("https://patents.google.com/?assignee=apple&after=priority:20150101&sort=new")
elements = browser1.find_elements(By.CLASS_NAME, "search-result-item")
links = []
for elem in elements:
    href = elem.get_attribute('href')
    if href:
        links.append(href)
links = set(links)
for href in links:
    print(href)
And the output is the following:
https://patentimages.storage.googleapis.com/ed/06/50/67e30960a7f68d/JP2021152951A.pdf
https://patentimages.storage.googleapis.com/86/30/47/7bc39ddf0e1ea7/KR20210106968A.pdf
https://patentimages.storage.googleapis.com/ca/2a/bc/9380e1657c2767/US20210318798A1.pdf
https://patentimages.storage.googleapis.com/c1/1a/c6/024f785fd5ea10/AU2021204695A1.pdf
https://patentimages.storage.googleapis.com/b3/19/cc/8dc1fae714194f/US20210312694A1.pdf
https://patentimages.storage.googleapis.com/e6/16/c0/292a198e6f1197/AU2021218193A1.pdf
https://patentimages.storage.googleapis.com/3e/77/e0/b59cf47c2b30a1/AU2021212005A1.pdf
https://patentimages.storage.googleapis.com/1b/3d/c2/ad77a8c9724fbc/AU2021204422A1.pdf
https://patentimages.storage.googleapis.com/ad/bc/0f/d1fcc65e53963e/US20210314041A1.pdf
The problem here is that I've got one missing link (shown in a screenshot of the result item and the missing link).
I've tried different selectors and still got the same result: one link is missing. I've also searched with different parameters, and the pattern is that all the missing links aren't tied to a PDF output. I've spent a lot of time trying to figure out the reason, so I would be really grateful if you could provide me with any clue on the matter. Thanks in advance!
The option highlighted has no <a> tag with class pdfLink in it. Put the line of code that extracts the link in a try block; if the required element is not found, fall back to whatever <a> tag is available for that article.
Try like below once:
driver.get("https://patents.google.com/?assignee=apple&after=priority:20150101&sort=new")
articles = driver.find_elements_by_tag_name("article")
print(len(articles))
for article in articles:
    try:
        # Use a dot in the XPath to find an element within an element.
        link = article.find_element_by_xpath(".//a[contains(@class,'pdfLink')]").get_attribute("href")
        print(link)
    except:
        print("Exception")
        link = article.find_element_by_xpath(".//a").get_attribute("href")
        print(link)
10
https://patentimages.storage.googleapis.com/86/30/47/7bc39ddf0e1ea7/KR20210106968A.pdf
https://patentimages.storage.googleapis.com/e6/16/c0/292a198e6f1197/AU2021218193A1.pdf
https://patentimages.storage.googleapis.com/3e/77/e0/b59cf47c2b30a1/AU2021212005A1.pdf
https://patentimages.storage.googleapis.com/c1/1a/c6/024f785fd5ea10/AU2021204695A1.pdf
https://patentimages.storage.googleapis.com/1b/3d/c2/ad77a8c9724fbc/AU2021204422A1.pdf
https://patentimages.storage.googleapis.com/ca/2a/bc/9380e1657c2767/US20210318798A1.pdf
Exception
https://patents.google.com/?assignee=apple&after=priority:20150101&sort=new#
https://patentimages.storage.googleapis.com/b3/19/cc/8dc1fae714194f/US20210312694A1.pdf
https://patentimages.storage.googleapis.com/ed/06/50/67e30960a7f68d/JP2021152951A.pdf
https://patentimages.storage.googleapis.com/ad/bc/0f/d1fcc65e53963e/US20210314041A1.pdf
To extract all the href attributes of the PDFs using Selenium and Python, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.search-result-item[href]")))])
Using XPATH:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[contains(@class, 'search-result-item') and @href]")))])
Console Output:
['https://patentimages.storage.googleapis.com/86/30/47/7bc39ddf0e1ea7/KR20210106968A.pdf', 'https://patentimages.storage.googleapis.com/e6/16/c0/292a198e6f1197/AU2021218193A1.pdf', 'https://patentimages.storage.googleapis.com/3e/77/e0/b59cf47c2b30a1/AU2021212005A1.pdf', 'https://patentimages.storage.googleapis.com/c1/1a/c6/024f785fd5ea10/AU2021204695A1.pdf', 'https://patentimages.storage.googleapis.com/1b/3d/c2/ad77a8c9724fbc/AU2021204422A1.pdf', 'https://patentimages.storage.googleapis.com/ca/2a/bc/9380e1657c2767/US20210318798A1.pdf', 'https://patentimages.storage.googleapis.com/b3/19/cc/8dc1fae714194f/US20210312694A1.pdf', 'https://patentimages.storage.googleapis.com/ed/06/50/67e30960a7f68d/JP2021152951A.pdf', 'https://patentimages.storage.googleapis.com/ad/bc/0f/d1fcc65e53963e/US20210314041A1.pdf']
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
PS: You can extract only nine (9) href attributes, as one of the search items is a <span> element and isn't a link, i.e. doesn't have the href attribute.
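To see why the [href] filter matters, here is a stand-alone sketch using only the standard library's html.parser on made-up markup (the sample HTML, URLs, and the PdfLinkExtractor name are hypothetical, mimicking the result list): it keeps <a> tags that carry an href and skips the <span> item, the same filtering the a.search-result-item[href] selector performs.

```python
from html.parser import HTMLParser

class PdfLinkExtractor(HTMLParser):
    """Collect href attributes of <a> tags; tags without href
    (like the <span> result item) are simply skipped."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

# Hypothetical markup: one result item is a <span> with no link.
sample = (
    '<a class="search-result-item" href="https://example.com/US1.pdf">US1</a>'
    '<span class="search-result-item">no pdf available</span>'
    '<a class="search-result-item" href="https://example.com/US2.pdf">US2</a>'
)
parser = PdfLinkExtractor()
parser.feed(sample)
print(parser.links)  # ['https://example.com/US1.pdf', 'https://example.com/US2.pdf']
```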
I'm having a lot of issues understanding how to do this. What I need to do is simple: flag whenever my automated Google search is not able to find any search results. My code example:
driver = webdriver.Chrome(executable_path)
driver.get("https://google.com/")
search = driver.find_element_by_name("q")
search.send_keys('site:'+'www.pa.gov'+ ' "ADT.com" '+'\n')
if driver.find_element(By.XPATH, "//*[@id='topstuff']/div/div/p[1]/text()[2]"):
    print(True)
else:
    print(False)
I keep getting this error:
InvalidSelectorException: invalid selector: The result of the xpath expression "//*[@id='topstuff']/div/div/p[1]/text()[2]" is: [object Text]. It should be an element.
(Session info: chrome=87.0.4280.88)
This is the link I've searched: No Results
What am I doing wrong?
This error message...
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: The result of the xpath expression "//a[following-sibling::input[@value="ST"]]/@href" is: [object Attr]. It should be an element.
...implies that your XPath expression did not select an element: Selenium's element-retrieval commands can only return element nodes, not text or attribute nodes.
You need to replace:
driver.find_element(By.XPATH, "//*[@id='topstuff']/div/div/p[1]/text()[2]")
with an expression that selects the element itself, and then read its text:
driver.find_element(By.XPATH, "//*[@id='topstuff']/div/div/p[1]").text
Additional Considerations
Selenium WebDriver supports XPath 1.0 only, which returns the set of nodes selected by the expression.
You can find the XPath 1.0 specification in XML Path Language (XPath) Version 1.0.
However, the trailing text() in the expression:
//*[@id='topstuff']/div/div/p[1]/text()
selects a text node rather than an element, and XPath can address other non-element nodes as well. A couple of examples:
//@version: selects all the version attribute nodes that are in the same document as the context node
../@lang: selects the lang attribute of the parent of the context node
Such expressions are valid XPath, but they can't be used as Selenium locators, because an element-retrieval command must return elements.
Solution
So effectively your block of code will be:
if driver.find_elements(By.XPATH, "//*[@id='topstuff']/div/div/p[1]"):
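Note that find_elements returns a list of matches and an empty list is falsy in Python, so the result doubles as a presence check. A tiny illustration (presence_check is a made-up stand-in, not a Selenium API):

```python
def presence_check(matches):
    # Mirrors `if driver.find_elements(...):`
    # [] -> False (element absent), non-empty list -> True (element present)
    return bool(matches)

print(presence_check([]))                     # False
print(presence_check(["<p>No results</p>"]))  # True
```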
The XPath you have provided isn't supported by Selenium.
I believe you are capturing the error for the wrong website.
Induce WebDriverWait() and wait for visibility_of_element_located() with the following XPath.
Add a try..except block to handle any error that occurs.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path)
driver.get("https://google.com/")
search = driver.find_element_by_name("q")
search.send_keys('site:' + 'www.pa.gov' + ' "ADT.com" ' + '\n')
try:
    Searchelement = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//div[@id='topstuff']//p[@role='heading']")))
    print(True)
    print("============================")
    print(Searchelement.text)
    print("============================")
    # If you want to get a specific node value, then try the below XPath
    print(driver.find_element_by_xpath("//div[@id='topstuff']//p[@role='heading']/span/em").text)
except:
    print(False)
Console output:
True
============================
Your search - site:www.pa.gov "ADT.com" - did not match any documents.
============================
site:www.pa.gov "ADT.com"
I want to find a text element on the web page (address is below).
I am interested in the Reddit post title (it is pointed out on the screenshot).
I used Ctrl+Shift+I, inspected the element, and got:
<h2 class="s1ua9il2-0 hsWiTe" data-redditstyle="true">The problems of a dutchman in China.</h2>
There is no id and no name. Only a tag ('h2') and a class ("s1ua9il2-0 hsWiTe"), right?
My code below doesn't work:
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(r"C:\Users\vishniakov\Desktop\python bj\driver\chromedriver.exe",chrome_options=options)
driver.get("https://www.reddit.com/r/funny/comments/9kxrv6/the_problems_of_a_dutchman_in_china/")
print(driver.title)
elem = driver.find_element_by_class_name("s1ua9il2-0 hsWiTe")
#driver.quit()
ERROR:
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: Compound class names not permitted
Also, finding it by css_selector doesn't work either when I copy the selector from DevTools.
Try to do it by XPath: press F12, click on the element, and copy its XPath. For example:
browser.find_element_by_xpath('//*[@id="modal-manager"]/div/div/div[2]/div/div[3]/div[1]/button')
You can also match on the element's text:
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, '//button[text()="Some text"]')
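If you'd rather keep using the classes, note that a compound class attribute maps directly onto a CSS selector by joining the tokens with dots. A small sketch (compound_class_to_css is a made-up helper, not part of Selenium):

```python
def compound_class_to_css(tag: str, class_attr: str) -> str:
    # By.CLASS_NAME rejects compound values like "s1ua9il2-0 hsWiTe";
    # joining the tokens with dots yields an equivalent CSS selector.
    return tag + "".join("." + token for token in class_attr.split())

print(compound_class_to_css("h2", "s1ua9il2-0 hsWiTe"))  # h2.s1ua9il2-0.hsWiTe
```

The result can then be passed to driver.find_element(By.CSS_SELECTOR, ...) instead of find_element_by_class_name. Bear in mind that Reddit's class names like s1ua9il2-0 are generated and may change between page loads.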
I am trying to wait for a few elements in a page to load before continuing with entering/getting data from the page. Some reading has lead me to this kind of code:
#including imports in case they are influencing things
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver.get('example.com')
try:
    element1_present = EC.presence_of_element_located((By.ID, 'id'))
    WebDriverWait(driver, timeout).until(element1_present)
except TimeoutException:
    print('Timed out waiting for page to load')
# Then get some data from example.com
This works just fine; however, I'd like to determine if the element is present through a CSS selector, not an ID. The Selenium documentation is confusing me. It states that the presence_of_element_located method should take a "locator" as an argument, but looking at the By documentation, I do not see how (By.ID, 'id') is a valid locator (though clearly it works, and I don't understand this), and more specifically, I don't see how to code it to be a CSS selector locator.
I have attempted By.cssSelector, By.CSS, and other similar terms in place of By.ID, and moving parentheses about, but I am always returned:
AttributeError: type object 'By' has no attribute 'cssSelector'
Or something similar. I'm clearly missing some documentation somewhere, because I can't figure out why ID is a valid attribute for By and cssSelector isn't. What am I missing?
You want to use By.CSS_SELECTOR.
For your reference, here are the attributes available for the By class:
ID = "id"
XPATH = "xpath"
LINK_TEXT = "link text"
PARTIAL_LINK_TEXT = "partial link text"
NAME = "name"
TAG_NAME = "tag name"
CLASS_NAME = "class name"
CSS_SELECTOR = "css selector"
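For intuition on why (By.ID, 'id') works: a locator is just a 2-tuple pairing one of the strings above with a selector value, and the expected conditions unpack it and hand it to driver.find_element(*locator). A minimal sketch using plain strings so it runs without Selenium installed (describe is a made-up function standing in for what an expected condition does with the tuple):

```python
# These are the string values the By attributes hold.
CSS_SELECTOR = "css selector"
ID = "id"

def describe(locator):
    # An expected condition does essentially this unpacking before
    # calling driver.find_element(by, value).
    by, value = locator
    return f"find by {by}: {value!r}"

print(describe((CSS_SELECTOR, "div.ItemPage-price")))  # find by css selector: 'div.ItemPage-price'
print(describe((ID, "topstuff")))                      # find by id: 'topstuff'
```

So (By.CSS_SELECTOR, "div.example") is exactly as valid a locator as (By.ID, 'id'); only the strategy string differs.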