I am learning Selenium and trying to scrape the Amazon website with it. Here is the link I am trying to scrape.
In the above URL I am trying to find all the elements with the class a-size-mini and extract the link from each of them.
Here is my code:
links = driver.find_elements_by_class_name("a-size-mini")
for link in links:
    element = WebDriverWait(driver, 5).until(
        EC.presence_of_element_located((By.LINK_TEXT, link.text)))
    print(element.get_attribute('href'))
But this is returning None, and I am not sure what I am doing wrong. The length of the links list shows as 55, and when I print the element variable I get the following:
<selenium.webdriver.remote.webelement.WebElement (session="121606058bd493d1a70fc957699d7f6d", element="c3dd6f5b-a9bb-409c-8ee2-666cac7e7432")>
So these variables are not empty or None, but when I try to extract the link with the get_attribute('href') method it returns None.
Please help me out. Thanks in advance.
Please try this instead:
links = driver.find_elements_by_xpath('//h2[contains(@class, "a-size-mini")]/a')
This XPath selects the <a> inside each matching <h2>, and that anchor is the element that actually carries the href attribute, so you can call get_attribute('href') on each result directly.
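A quick way to sanity-check that XPath without launching a browser is to run it with lxml against a static stand-in for one result card. The fragment below is invented for illustration (the class values only mimic Amazon's markup); because the XPath selects the <a> itself, lxml's .get("href") — the analogue of Selenium's get_attribute('href') — returns the link directly:

```python
from lxml import html

# Invented stand-in for a single Amazon result card
fragment = html.fromstring("""
<div>
  <h2 class="a-size-mini a-spacing-none">
    <a class="a-link-normal" href="/dp/B000TEST">Example product</a>
  </h2>
</div>
""")

# Same XPath the answer passes to driver.find_elements_by_xpath
hrefs = [a.get("href") for a in fragment.xpath('//h2[contains(@class, "a-size-mini")]/a')]
print(hrefs)  # ['/dp/B000TEST']
```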
I want to scrape the URLs within the HTML of the 'Racing-Next to Go' section of www.tab.com.au.
Here is an excerpt of the HTML:
<a ng-href="/racing/2020-07-31/MACKAY/MAC/R/8" href="/racing/2020-07-31/MACKAY/MAC/R/8"><i ng-
All I want to scrape is the last bit of that HTML which is a link, so:
/racing/2020-07-31/MACKAY/MAC/R/8
I have tried to find the element by using xpath, but I can't get the URL I need.
My code:
driver = webdriver.Firefox(executable_path=r"C:\Users\Harrison Pollock\Downloads\Python\geckodriver-v0.27.0-win64\geckodriver.exe")
driver.get('https://www.tab.com.au/')
elements = driver.find_elements_by_xpath('/html/body/ui-view/main/div[1]/ui-view/version[2]/div/section/section/section/race-list/ul/li[1]/a')
for e in elements:
    print(e.text)
Probably you want to use get_attribute instead of .text. Documentation here.
elements = driver.find_elements_by_xpath('/html/body/ui-view/main/div[1]/ui-view/version[2]/div/section/section/section/race-list/ul/li[1]/a')
for e in elements:
    print(e.get_attribute("href"))
Yes, you can use the getAttribute(attributeLocator) function for your requirement:
selenium.getAttribute("//xpath@href");
Specify the XPath of the element whose attribute value you need to read.
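That attributeLocator form comes from the legacy Selenium RC API, but the underlying idea — addressing the href attribute node directly in XPath — works in any XPath 1.0 engine. A small lxml sketch (the racing URL is taken from the question; the link text is invented):

```python
from lxml import html

tree = html.fromstring(
    '<p><a href="/racing/2020-07-31/MACKAY/MAC/R/8">Mackay R8</a></p>'
)

# //a/@href selects the attribute node itself, so the result is the string value
hrefs = tree.xpath('//a/@href')
print(hrefs)  # ['/racing/2020-07-31/MACKAY/MAC/R/8']
```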
The value /racing/2020-07-31/MACKAY/MAC/R/8 within the HTML is the value of the href attribute, not the innerText.
Solution
Instead of using the text attribute, you need to use get_attribute("href"), and the effective lines of code will be:
elements = driver.find_elements_by_xpath('/html/body/ui-view/main/div[1]/ui-view/version[2]/div/section/section/section/race-list/ul/li[1]/a')
for e in elements:
    print(e.get_attribute("href"))
I am learning to use scrapy and playing with XPath selectors, and decided to practice by scraping job titles from craigslist.
Here is the html of a single job link from the craigslist page I am trying to scrape the job titles from:
<a href="…" class="result-title hdrlnk">Full Stack .NET C# Developer (Mid-Level, Senior) ***LOCAL ONLY***</a>
What I wanted to do was retrieve all of the similar a tags with the class result-title, so I used the XPath selector:
titles = response.xpath('//a[@class="result-title"/text()]').getall()
but the output I receive is an empty list: []
I was able to copy the XPath directly from Chrome's inspector, which ended up working perfectly and gave me a full list of job title names. This selector was:
titles = response.xpath('*//div[@id="sortable-results"]/ul/li/p/a/text()').getall()
I can see why this second XPath selector works, but I don't understand why my first attempt did not. Can someone explain why my first XPath selector failed? I have also provided a link to the full HTML of the craigslist page below if that is helpful/necessary. I am new to scrapy and want to learn from my mistakes. Thank you!
view-source:https://orangecounty.craigslist.org/search/sof
Like this:
'//a[contains(@class,"result-title ")]/text()'
Or:
'//a[starts-with(@class,"result-title ")]/text()'
I use contains() or starts-with() because the class of the a node is
result-title hdrlnk
not just
result-title
In your XPath:
'//a[@class="result-title"/text()]'
even if the class was result-title, the syntax is wrong, you should use:
'//a[@class="result-title"]/text()'
Simply '//a[@class="result-title hdrlnk"]/text()'
Two fixes were needed:
/text() moved outside of []
"result-title hdrlnk", not just "result-title", in the attribute test: XPath's = compares the entire attribute string, unlike a CSS class selector, which matches individual class tokens.
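The difference is easy to demonstrate with lxml (Scrapy's response.xpath follows the same XPath 1.0 semantics); the markup below is a made-up stand-in for one Craigslist result row:

```python
from lxml import html

# Made-up stand-in for one result row; the class attribute has two tokens
doc = html.fromstring(
    '<p><a class="result-title hdrlnk" href="/job/1">Full Stack Developer</a></p>'
)

# Exact match compares the whole attribute string, so a partial value misses
assert doc.xpath('//a[@class="result-title"]/text()') == []

# Either reproduce the full value, or match a substring with contains()
exact = doc.xpath('//a[@class="result-title hdrlnk"]/text()')
partial = doc.xpath('//a[contains(@class, "result-title")]/text()')
print(exact, partial)  # ['Full Stack Developer'] ['Full Stack Developer']
```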
I am trying to scrape the target website for product_links. The program should open the required URL in the browser and scrape all the links with a particular class name. But for some reason, I am getting a NoSuchElementException for this piece of code
links = driver.find_elements_by_class_name("styles__StyledTitleLink-mkgs8k-5")
for link in links:
    self.driver.implicitly_wait(15)
    product_links.append(link.find_element_by_css_selector('a').get_attribute('href'))
I tried printing out the text in each link with link.text in the for loop. The code is actually selecting the required elements. But for some reason is not able to extract the href URL from each link. I am not sure what I am doing wrong.
This is the entire error message
NoSuchElementException: Message: no such element: Unable to locate
element: {"method":"css selector","selector":"a"} (Session info:
chrome=83.0.4103.106)
The error says no element matched the CSS selector 'a', so you need to try other locators to identify the element. Try an XPath such as //a[contains(text(),'text of that element')].
You are looking for a class name generated by a build tool; note the random string at the end of the class name. These generated names won't be the same on every site or build.
If you want to scrape them, find a different, generic class, or find all elements whose class contains the substring "StyledTitleLink".
Here's how to do it with JQuery
You should try and find a different solution to your problem
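One way to match on the stable part of a generated class name is a contains() XPath. Sketched here with lxml on invented markup (the class name is copied from the question; the tag structure and link are hypothetical) — in Selenium the same XPath would go to driver.find_elements_by_xpath:

```python
from lxml import html

# Invented markup with a builder-generated class suffix, as in the question
doc = html.fromstring(
    '<h3 class="styles__StyledTitleLink-mkgs8k-5">'
    '<a href="/p/widget-123">Widget</a></h3>'
)

# Match the stable "StyledTitleLink" substring, not the full generated name
links = doc.xpath('//*[contains(@class, "StyledTitleLink")]//a/@href')
print(links)  # ['/p/widget-123']
```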
I am using a school class-schedule website, and I want to access the div element that contains info on how many seats are in a class and who is teaching it, in order to scrape it. I first find the element which contains the div I want; after that I try to find that div using XPaths. The problem I face is that when I use either find_element_by_xpath or find_elements_by_xpath to get the div I want, I get this error:
'list' object has no attribute 'find_element_by_xpath'
Is this error happening because the div element I want to find is nested? Is there a way to get nested elements using a div tag?
Here is the code I currently have:
driver = webdriver.Chrome(ChromeDriverManager().install())
url = "https://app.testudo.umd.edu/soc/202008/INST"
driver.get(url)
section_container = driver.find_elements_by_id('INST366')
sixteen_grid = section_container.find_element_by_xpath(".//div[#class = 'sections sixteen colgrid']").text
the info i want is this:
<div class="sections sixteen colgrid">...</div>
its currently inside this id tag:
<div id="INST366" class="course">...</div>
It would be greatly appreciated if anyone could help me out with this.
From documentation of find_elements_by_id:
Returns : list of WebElement - a list with elements if any was found. An empty list if not
Which means section_container is a list. You can't call find_element_by_xpath on a list, but you can call it on each element within the list, because each of them is a WebElement.
What does the documentation say about find_element_by_id?
Returns : WebElement - the element if it was found
In this case you can use find_element_by_xpath directly. Which one should you use? It depends on your need: whether the first match is enough to keep digging for information, or you need all the matches.
After fixing that you will encounter a second problem: your information is only displayed after JavaScript runs when you click on "Show Sections", so you need to do that before locating what you want. For that, get the toggle link (the a element) and click on it.
The new code will look like this:
from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
url = "https://app.testudo.umd.edu/soc/202008/INST"
driver.get(url)
section_container = driver.find_element_by_id('INST366')
section_container.find_element_by_xpath(".//a[#class='toggle-sections-link']").click()
sleep(1)
section_info = section_container.find_element_by_xpath(".//div[#class='sections sixteen colgrid']").text
driver.quit()
I am trying to scrape information from a website using a CSS Selector in order to get a specific text element but have come across a problem. I try to search for my desired portion of the website but my program is telling me that it does not exist. My program returns an empty list.
I am using the requests and lxml libraries and am using CSS Selectors to do my HTML Scraping. I have Python 3.7. I try searching for the part of the website that I need with a selector and it is not appearing. I have also tried using XPath but that has failed as well. I have tried using the following selector:
div#showtimes
When I use this selector, I get the following result:
[<Element div at 0x3bf6f60>]
I get the expected result, which is the desired element. When I try to go one step further and access the element nested inside of the div#showtimes element (see below), I get an empty list.
div#showtimes div
I get the following result:
[]
Through inspection of the website's HTML, I know that there is a nested element within the div#showtimes element. This problem has occurred on other web pages as well. I am using the code below.
import requests
from lxml import html
from lxml.cssselect import CSSSelector
# Set URL
url = "http://www.fridleytheatres.com/location/7425/Paramount-7-Theatres-Showtimes"
# Get HTML from page
page = requests.get(url)
data = html.fromstring(page.text)
# Set up CSSSelector
sel = CSSSelector('div#showtimes div')
# Apply Selector
results = sel(data)
print(results)
I expect the output to be a list containing a div element, but it is returning an empty list [].
If I understand the problem correctly, you're attempting to get a div element which is a child of div#showtimes. Try using div#showtimes > div.