How to get all elements with multiple classes in selenium - python

This is how I get the website
from selenium import webdriver
url = '...'
driver = webdriver.Firefox()
driver.get(url)
Now I want to extract all elements with a certain classes into a list
<li class="foo foo-default cat bar"/>
How would I get all the elements from the website with these classes?
There is something like
fruit = driver.find_element_by_css_selector("#fruits .tomatoes")
But when I do this (I tried without spaces between the selectors too)
elements = driver.find_element_by_css_selector(".foo .foo-default .cat .bar")
I get
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: .foo .foo-default .cat .bar
Stacktrace:
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:183:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.jsm:395:5
element.find/</<@chrome://remote/content/marionette/element.js:300:16
These are the classes I copied from the website's DOM, though...

If this is just the HTML
<li class="foo foo-default cat bar"/>
You can replace each space with a . and put a . before the first class name to build a CSS selector for the locator.
elements = driver.find_elements(By.CSS_SELECTOR, "li.foo.foo-default.cat.bar")
print(len(elements))
or my recommendation would be to use it with explicit waits:
elements_using_ec = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.foo.foo-default.cat.bar")))
print(len(elements_using_ec))
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
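The rule described above (drop the spaces, prefix every class with a dot) can be sketched as a small helper. This is only an illustration: `class_attr_to_selector` is a hypothetical name, not part of Selenium, and it assumes the class names need no CSS escaping.

```python
def class_attr_to_selector(class_attr, tag=""):
    """Turn a class attribute value such as "foo foo-default cat bar"
    into a compound CSS selector such as ".foo.foo-default.cat.bar"."""
    classes = class_attr.split()  # split on any run of whitespace
    if not classes:
        raise ValueError("class attribute is empty")
    # An optional tag name goes in front, then one ".name" per class.
    return tag + "".join("." + name for name in classes)


selector = class_attr_to_selector("foo foo-default cat bar", tag="li")
print(selector)  # li.foo.foo-default.cat.bar
```

The resulting string could then be passed to driver.find_elements(By.CSS_SELECTOR, selector).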

Have you tried without spaces between class names?
fruit = driver.find_element_by_css_selector(".foo.foo-default.cat.bar")

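There is a plural variant of the function, find_elements_by_css_selector (note the "s" in "elements"), which returns all matches as a list:
driver.find_elements_by_css_selector(".foo.foo-default.cat.bar")
This works.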

Related

How to scrape <li> tag with class like active/selected?

I'm trying to scrape a list from a website. There are two different lists, and one will load only after the first option is chosen. Issue is, I'm unable to select the first option. I scraped the list of all available options. But after writing it, I have to select it from the given option, and I'm unable to do so.
I've tried using browser.find_element_by_css_selector(....).click(), but it's showing the elementnotfound exception even after putting the proper wait condition. I think that's because it's unable to find that element.
browser.find_element_by_css_selector("#Brand_name").send_keys(company[i])
element= browser.find_element_by_css_selector("#Brand_name_selectWrap")
browser.implicitly_wait(5) # seconds
browser.find_element_by_css_selector("""#Brand_name_selectWrap > ul > li.selected""").click()
PS: Following is the link which I'm trying to scrape. I need all the mobiles listed company wide.
https://bangalore.quikr.com/Escrow/post-classifieds-ads/?postadcategoryid=227
Can someone kindly suggest some better way?
You can gather all options and their rel attributes into a dictionary and then loop that with appropriate wait conditions for the sublist to appear:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
d = webdriver.Chrome()
d.get('https://bangalore.quikr.com/Escrow/post-classifieds-ads/?postadcategoryid=227')
options = {i.get_attribute('textContent'):i.get_attribute('rel') for i in d.find_elements_by_css_selector('#Brand_name_selectWrap .optionLists li:not(.optionHeading) a')}
input_element = d.find_element_by_id('Brand_name')
for k,v in options.items():
    input_element.click()
    input_element.send_keys(k)
    selector = '[rel="' + v + '"]'
    WebDriverWait(d, 3).until(EC.element_to_be_clickable((By.CSS_SELECTOR, selector))).click()
    WebDriverWait(d, 2).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#Model_selectWrap.showCustomSelect")))
I created a dictionary that holds all the mobile options. The keys are the text that goes into the input box, and the values are the rel attribute values of those elements (each option has a rel attribute). This means I can type the phone name from the key to generate the dropdown of possible values, then use the rel attribute in a CSS attribute = value selector to make sure I click the right one.
The rel attribute inside anchor tags (<a>) describes the relation to the document where the link points to.
The selector variable just holds the current CSS attribute = value selector for getting a mobile dropdown option by its rel attribute value.
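As a rough sketch of how such an attribute = value selector can be built safely, the helper below escapes backslashes and double quotes before wrapping the value. The name `attr_selector` and the sample option data are made up for illustration; the real rel values come from the page.

```python
def attr_selector(name, value):
    """Build a CSS [name="value"] selector, escaping backslashes and
    double quotes so the value stays a valid CSS string."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '[{}="{}"]'.format(name, escaped)


# Hypothetical text -> rel pairs, standing in for the scraped dictionary.
options = {"Apple": "apple_1", "Samsung": "samsung_2"}
selectors = {text: attr_selector("rel", rel) for text, rel in options.items()}
print(selectors["Apple"])  # [rel="apple_1"]
```

Plain string concatenation, as in the answer, gives the same result whenever the value contains no quotes or backslashes.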
The element is covered by another element, so you'll need to use ActionChains to perform the click.
You'll need to import it:
from selenium.webdriver.common.action_chains import ActionChains
Then click on the input first, and only after that call send_keys:
input_el = browser.find_element_by_css_selector('#Brand_name')
ActionChains(browser).move_to_element(input_el).click().perform()
It's a good practice to use WebDriverWait to validate your conditions:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
wait = WebDriverWait(browser, 15)
input_el = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR,"#Brand_name")))
ActionChains(browser).move_to_element(input_el).click().perform()

Why can't Selenium find this class on Wikipedia?

I am trying to pull a table from wikipedia. When I try and pull it using the following driver.find_element_by_class_name(name) it will not work. However when going to the html source code I can explicitly see the class name that I am looking for.
I do realize there are other ways to pull this table and I have moved on to easier ways. I am curious as to why Selenium does not find the class when it is in the HTML.
from selenium import webdriver
driver = webdriver.Chrome(r"\chromedriver_win32\chromedriver.exe")
driver.get(r'https://en.wikipedia.org/wiki/List_of_airports_in_the_United_States')
driver.implicitly_wait(2)
driver.find_element_by_class_name(name='wikitable sortable jquery-tablesorter')
However, the error I get is
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".wikitable sortable jquery-tablesorter"}
(Session info: chrome=75.0.3770.142)
wikitable sortable jquery-tablesorter is 3 class names: wikitable, sortable, and jquery-tablesorter. .find_element_by_class_name() only takes a single parameter consisting of a single class name, e.g. .find_element_by_class_name("wikitable"). That may or may not find the element you want based on whether that class name uniquely locates the element that you want.
Another option would be to use a CSS selector so that you can use all three classes in a single locator, e.g.
.wikitable.sortable.jquery-tablesorter
where the . indicates a class name in CSS selector syntax. See the CSS selector references below for more info on CSS selectors and their syntax.
W3C Selectors Overview
Selenium Tips: CSS Selectors
Taming Advanced CSS Selectors
To handle dynamic elements, use WebDriverWait with visibility_of_element_located and the following CSS selector.
WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".wikitable.sortable.jquery-tablesorter")))
You need the following imports.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
If you want to print the contents of the table:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(r"\chromedriver_win32\chromedriver.exe")
driver.get(r'https://en.wikipedia.org/wiki/List_of_airports_in_the_United_States')
print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".wikitable.sortable.jquery-tablesorter"))).text)
Pass a single class name to find_element_by_class_name(); it does not accept several space-separated class names. So, instead of writing:
driver.find_element_by_class_name(name='wikitable sortable jquery-tablesorter')
write:
driver.find_element_by_class_name('wikitable')
Hope it helps :)

How do I find element that contains specific class value?

The script fails to find the element if its class attribute contains more than one value.
For example this class:
<a class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal">
I want to find this element only by using class -- s-access-detail-page.
By looking for an element like this, I'm getting an error that element is not found:
find_element_by_css_selector("a[class*='s-access-detail-page']")
Same thing if I'm looking for an element with a class that contains:
a-link-normal a-text-normal
class on the page:
The page I'm parsing is on Amazon: https://www.amazon.com/s?k=smart+watches&page=1
I need to get the product URLs.
You can use just the following CSS Selector:
.s-access-detail-page
Hope it helps you!
Try any of these; they should work.
find_element_by_css_selector("a.a-link-normal")
OR
find_element_by_css_selector(".a-link-normal")
OR
find_element_by_css_selector("a.s-access-detail-page")
OR
find_element_by_css_selector("a.s-color-twister-title-link")
Make sure you include a wait; then a simple class selector is enough:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.amazon.com/s?k=smart+watches&page=1'
d = webdriver.Chrome()
d.get(url)
links = WebDriverWait(d,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".s-access-detail-page")))
linkUrls = [link.get_attribute('href') for link in links]
print(linkUrls)
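To illustrate why a plain class selector such as .s-access-detail-page is enough, here is a browser-free sketch of the matching rule: an element matches when the class is one of the whitespace-separated tokens in its class attribute, whatever other classes are present. It uses only the standard library parser, not Selenium, and the sample markup and hrefs are invented.

```python
from html.parser import HTMLParser


class ClassLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose class list contains a token."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute is a whitespace-separated list of tokens.
        class_tokens = (attributes.get("class") or "").split()
        if self.wanted_class in class_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


# Made-up markup loosely modelled on the anchor shown in the question.
SAMPLE_HTML = """
<a class="a-link-normal s-access-detail-page a-text-normal" href="/watch-1">One</a>
<a class="a-link-normal a-text-normal" href="/help">Help</a>
<a class="s-access-detail-page" href="/watch-2">Two</a>
"""

collector = ClassLinkCollector("s-access-detail-page")
collector.feed(SAMPLE_HTML)
print(collector.hrefs)  # ['/watch-1', '/watch-2']
```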

Grabbing text from a list with no ID or class using Selenium

I don't understand why the list I'm trying to extract the text from is returning blanks when I'm definitely using the correct Xpath. Here is my code:
driver = webdriver.Firefox()
driver.get("https://www.omegawatches.com/watch-omega-specialities-first-omega-wrist-chronograph-51652483004001")
betweenLugs = driver.find_element(By.XPATH, "/html/body/div[2]/main/div[3]/div/div/div[2]/div/div[2]/div[3]/div/ul/li[1]")
print(betweenLugs.text)
This should grab the first list item and measurement
Between lugs: 20 mm
I have also tried other methods, but the fact that XPath doesn't pick it up tells me something is wrong, and that no matter how I do it, I won't be able to extract the text inside the lists. Does anyone know what I am doing wrong? This is the first time I've run into this problem.
The XPath is wrong. It fails at /div[2], which doesn't match anything. This is an example of why you shouldn't use absolute paths.
The section has an id attribute, so use it:
betweenLugs = driver.find_elements(By.XPATH, "//*[@id='product-info-data-5bea7fa7406d7']/ul/li[1]")[0]
You might also want to add a wait for the page to load:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
betweenLugs = WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.XPATH, "//*[@id='product-info-data-5bea7fa7406d7']/ul/li[1]")))
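The difference between an absolute path and a path anchored on an id can be tried without a browser. The sketch below uses the standard library's ElementTree, which supports only a subset of XPath, on a small invented document; the markup and the id value are assumptions, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is far larger.
SAMPLE = """
<html><body>
  <div><main>
    <div id="product-info-data">
      <ul>
        <li><span>Between lugs:</span> <span>20 mm</span></li>
        <li><span>Case:</span> <span>Steel</span></li>
      </ul>
    </div>
  </main></div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Absolute-style path: one wrong index (div[2]) and nothing matches.
absolute = root.find("./body/div[2]/main/div/ul/li[1]")

# Relative path anchored on the id: independent of the outer layout.
relative = root.find(".//*[@id='product-info-data']/ul/li[1]")


def normalized_text(element):
    """Join all nested text and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


print(absolute)                   # None
print(normalized_text(relative))  # Between lugs: 20 mm
```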
OK, try this and see if it solves the problem:
between_lugs = driver.find_element_by_xpath("//*[contains(text(), 'Between lugs')]").get_attribute("innerHTML")
between_lugs_value = driver.find_element_by_xpath("//*[contains(text(), 'Between lugs')]/../span").get_attribute("innerHTML")
final_text = between_lugs + " " + between_lugs_value
That page already has jQuery on it so you can just:
driver.execute_script("return jQuery('li:contains(Between lugs)').text().trim().replace(/\s+/g, ' ')")
You can fiddle with selectors in the Chrome console; it makes this much easier.
Another simpler approach might be the following one:
from contextlib import closing
from selenium import webdriver
from selenium.webdriver.support import ui
url = "https://www.omegawatches.com/watch-omega-specialities-first-omega-wrist-chronograph-51652483004001"
with closing(webdriver.Chrome()) as wd:
    wait = ui.WebDriverWait(wd, 10)
    wd.get(url)
    item = wait.until(lambda wd: wd.find_element_by_xpath("//*[contains(@class,'technical-data')]//li")).get_attribute('textContent')
    print(' '.join(item.split()))
Output:
Between lugs: 20 mm
Using a scroll down and a wait with a css selector to target the parent li
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
driver = webdriver.Chrome() #Firefox()
driver.get("https://www.omegawatches.com/watch-omega-specialities-first-omega-wrist-chronograph-51652483004001")
driver.execute_script("window.scrollTo(0, 2000)")
betweenLugs = WebDriverWait(driver, 10).until(expected_conditions.visibility_of_element_located((By.CSS_SELECTOR, "#product-info-data-5beaf5497d916 > ul > li:nth-child(1)")))
print(betweenLugs.text)

selenium css selector or xpath for complex class doesn't work when run as script

The following code, which extracts elements using css selector, works in the ipython3 terminal, but doesn't find the elements when run as script:
from selenium import webdriver
driver = webdriver.Chrome()
url = scrape_url + "&keywords=" + keyword
driver.get(url)
driver.find_elements_by_css_selector(".search-result.search-result__occluded-item.ember-view")
The complex class of the element:
"search-result search-result__occluded-item ember-view"
The following xpath worked in the terminal, but not as a script:
driver.find_elements_by_xpath("//li[contains(@class, 'search-result search-result__occluded-item')]")
This might be a timing issue: required element could be generated dynamically, so you need to wait some time until it appears in DOM:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
driver = webdriver.Chrome()
url = scrape_url + "&keywords=" + keyword
driver.get(url)
wait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//li[contains(@class, 'search-result search-result__occluded-item')]")))
Also, some class names may be assigned dynamically. That's why a compound name such as "search-result search-result__occluded-item ember-view" might not work without an explicit wait.
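Why a lookup can succeed in an interactive terminal but fail in a script becomes clearer with a simplified model of an explicit wait: keep calling the lookup until it returns something or a timeout expires. The sketch below only illustrates that polling idea and is not Selenium's implementation; `wait_until` and `find_results` are hypothetical names, and the fake lookup stands in for a page that renders late.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.
    Raises TimeoutError if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Stand-in for a page whose elements appear only on the third lookup.
attempts = {"count": 0}


def find_results():
    attempts["count"] += 1
    return ["li#1", "li#2"] if attempts["count"] >= 3 else []


found = wait_until(find_results, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])  # ['li#1', 'li#2'] 3
```

A single lookup right after driver.get(url) corresponds to the first attempt here, which returns an empty list.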
If you can't find any elements with a Selenium CSS selector, you can always try XPath instead.
More information about that can be found here.
Pass only part of the class name, for example:
driver.find_elements_by_css_selector(".search-result__occluded-item")
