The script is failing to find an element when the class attribute contains multiple values.
For example this class:
<a class="a-link-normal s-access-detail-page s-color-twister-title-link a-text-normal">
I want to find this element using only the class s-access-detail-page.
When I look for the element like this, I get an error that the element is not found:
find_element_by_css_selector("a[class*='s-access-detail-page']")
The same thing happens if I look for an element whose class contains:
a-link-normal a-text-normal
even though those classes appear in the class attribute on the page.
The URL I'm parsing is an Amazon search page: https://www.amazon.com/s?k=smart+watches&page=1
I need to get the product URLs.
You can use just the following CSS Selector:
.s-access-detail-page
Hope it helps you!
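For example, a minimal sketch (assuming the search results have already loaded in the driver) could look like this:
# Collect every link that carries the s-access-detail-page class
links = driver.find_elements_by_css_selector(".s-access-detail-page")
# Extract the product URLs from the matched anchors
product_urls = [link.get_attribute("href") for link in links]
print(product_urls)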
Try any of these. They should work.
find_element_by_css_selector("a.a-link-normal")
OR
find_element_by_css_selector(".a-link-normal")
OR
find_element_by_css_selector("a.s-access-detail-page")
OR
find_element_by_css_selector("a.s-color-twister-title-link")
Ensure you have a wait in place, and then you can use just a simple class selector:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.amazon.com/s?k=smart+watches&page=1'
d = webdriver.Chrome()
d.get(url)
links = WebDriverWait(d,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".s-access-detail-page")))
linkUrls = [link.get_attribute('href') for link in links]
print(linkUrls)
I'm doing a scraping process using Selenium in which my goal is to extract the views, likes, comments and shares of the videos made with a given audio on TikTok.
In the process I found this path:
<div data-e2e="music-item-list" mode="compact" class="tiktok-yvmafn-DivVideoFeedV2 e5w7ny40">
This contains the different videos for the audio; however, they are inside <div> elements rather than <li> elements.
How do I convert the divs contained in the path into a list that I can manipulate?
This is what I did:
url = 'https://www.tiktok.com/music/Sweater-Weather-Sped-Up-7086537183875599110'
driver.get(url)
posts = driver.find_element(By.XPATH, '//div[@data-e2e="music-item-list"]')
post1 = posts[0]
A proper way to locate those elements would be to wait for them first, then locate them as a list and access them:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
[...]
wait = WebDriverWait(driver, 20)
[...]
posts = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//div[@data-e2e="music-item-list"]/div')))
for post in posts:
print(post.text)
Selenium documentation: https://www.selenium.dev/documentation/
This is how I get the website
from selenium import webdriver
url = '...'
driver = webdriver.Firefox()
driver.get(url)
Now I want to extract all elements with certain classes into a list:
<li class=foo foo-default cat bar/>
How would I get all the elements from the website with these classes?
There is something like
fruit = driver.find_element_by_css_selector("#fruits .tomatoes")
But when I do this (I tried without spaces between the selectors too)
elements = driver.find_element_by_css_selector(".foo .foo-default .cat .bar")
I get
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: .foo .foo-default .cat .bar
Stacktrace:
WebDriverError#chrome://remote/content/shared/webdriver/Errors.jsm:183:5
NoSuchElementError#chrome://remote/content/shared/webdriver/Errors.jsm:395:5
element.find/</<#chrome://remote/content/marionette/element.js:300:16
These are the classes I copied from the website's DOM though...
If this is just the HTML
<li class=foo foo-default cat bar/>
You can remove the spaces and prefix each class with a . to build a CSS selector locator:
elements = driver.find_elements(By.CSS_SELECTOR, "li.foo.foo-default.cat.bar")
print(len(elements))
or my recommendation would be to use it with explicit waits:
elements_using_ec = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.foo.foo-default.cat.bar")))
print(len(elements_using_ec))
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Have you tried without spaces between class names?
fruit = driver.find_element_by_css_selector(".foo.foo-default.cat.bar")
There is also the plural form
driver.find_elements_by_css_selector(".foo.foo-default.cat.bar")
(note the extra s in find_elements), which returns all matching elements. This works.
I want to retrieve every div class='abcd' element from a website using Selenium together with the 'waiter' and 'XPATH' helpers from the explicit package.
The source code is something like this:
<div class='abcd'>
<a> Something </a>
</div>
<div class='abcd'>
<a> Something else </a>
...
When I run the following code (Python) I get only 'Something' as a result. I'd like to iterate over every instance of the div class='abcd' appearing in the source code of the website.
from explicit import waiter, XPATH
from selenium import webdriver
driver = webdriver.Chrome(PATH)
result = waiter.find_element(driver, "//div[@class='abcd']/a", by=XPATH).text
Sorry if the explanation isn't too technical, I'm only starting with webscraping. Thanks
I've used it like this. You can use this approach too if you like.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(PATH)
css_selector = "div.abcd"
results = WebDriverWait(driver, 10).until((expected_conditions.presence_of_all_elements_located((By.CSS_SELECTOR, css_selector))))
for result in results:
print(result.text)
I am trying to pull a table from Wikipedia. When I try to pull it using driver.find_element_by_class_name(name), as shown below, it will not work. However, when going to the HTML source code I can explicitly see the class name that I am looking for.
I do realize there are other ways to pull this table and I have moved on to easier ways. I am curious as to why Selenium does not find the class when it is in the HTML.
from selenium import webdriver
driver = webdriver.Chrome(r"\chromedriver_win32\chromedriver.exe")
driver.get(r'https://en.wikipedia.org/wiki/List_of_airports_in_the_United_States')
driver.implicitly_wait(2)
driver.find_element_by_class_name(name='wikitable sortable jquery-tablesorter')
However, the error I get is
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".wikitable sortable jquery-tablesorter"}
(Session info: chrome=75.0.3770.142)
wikitable sortable jquery-tablesorter is 3 class names: wikitable, sortable, and jquery-tablesorter. .find_element_by_class_name() only takes a single parameter consisting of a single class name, e.g. .find_element_by_class_name("wikitable"). That may or may not find the element you want based on whether that class name uniquely locates the element that you want.
Another option would be to use a CSS selector so that you can use all three classes in a single locator, e.g.
.wikitable.sortable.jquery-tablesorter
where the . indicates a class name in CSS selector syntax. See the CSS selector references below for more info on CSS selectors and their syntax; a minimal usage sketch follows them.
W3C Selectors Overview
Selenium Tips: CSS Selectors
Taming Advanced CSS Selectors
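Putting the pieces together, a minimal sketch (on the same Wikipedia page, and assuming the table is already present when the lookup runs) might look like this:
# Single class name: finds the first element with class "wikitable",
# which may or may not be the specific table you want
table_by_class = driver.find_element_by_class_name("wikitable")
# Compound CSS selector: requires all three classes on the same element
table_by_css = driver.find_element_by_css_selector(".wikitable.sortable.jquery-tablesorter")
print(table_by_css.text)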
To handle the dynamic element, use WebDriverWait with visibility_of_element_located and the following CSS selector.
WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".wikitable.sortable.jquery-tablesorter")))
You need the following imports.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
If you want to print the text of the table:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(r"\chromedriver_win32\chromedriver.exe")
driver.get(r'https://en.wikipedia.org/wiki/List_of_airports_in_the_United_States')
print(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.CSS_SELECTOR,".wikitable.sortable.jquery-tablesorter"))).text)
Pass the class name directly to the find_element_by_class_name() function. So, instead of writing:
driver.find_element_by_class_name(name='wikitable sortable jquery-tablesorter')
write:
driver.find_element_by_class_name('wikitable sortable jquery-tablesorter')
Hope it helps :)
The following code, which extracts elements using a CSS selector, works in the ipython3 terminal, but doesn't find the elements when run as a script:
from selenium import webdriver
driver = webdriver.Chrome()
url = scrape_url + "&keywords=" + keyword
driver.get(url)
driver.find_elements_by_css_selector(".search-result.search-result__occluded-item.ember-view")
The compound class of the element:
"search-result search-result__occluded-item ember-view"
The following xpath worked in the terminal, but not as a script:
driver.find_elements_by_xpath("//li[contains(@class, 'search-result search-result__occluded-item')]")
This might be a timing issue: the required element could be generated dynamically, so you need to wait until it appears in the DOM:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
driver = webdriver.Chrome()
url = scrape_url + "&keywords=" + keyword
driver.get(url)
wait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//li[contains(@class, 'search-result search-result__occluded-item')]")))
Also, some class names could be assigned dynamically. That's why using the compound name "search-result search-result__occluded-item ember-view" might not work without an explicit wait.
If you can't find the elements with a Selenium CSS selector, you can always try to use XPath instead.
More information about that can be found here.
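For instance, a rough XPath equivalent of the class-based lookup (assuming search-result__occluded-item is the stable part of the class) could be:
# Match every <li> whose class attribute contains the stable class fragment
elements = driver.find_elements_by_xpath("//li[contains(@class, 'search-result__occluded-item')]")
print(len(elements))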
Pass only a partial class name, like:
driver.find_elements_by_css_selector(".search-result__occluded-item")