Selenium: locating an element by data-reactid attribute when web scraping - Python

I'm trying to scrape some data from Yahoo! Finance, but I've noticed that most of the elements have an attribute called data-reactid. When I tell Selenium to locate the element by name or id, I get an error each time. I've never used the XPath method, but could someone take a look at https://finance.yahoo.com/quote/IBM?
I want to save the element with data-reactid='35', which holds the $165 close price, to a variable (named data, for example) and then print that variable.

Use the following CSS selector (here I used a nested element structure):
price_per_share = driver.find_element_by_css_selector("#quote-header-info > div > div > div > span[data-reactid='35']")
print(price_per_share.text)
It's more accurate. Hope it helps you!
PS: data-reactid is a custom attribute of the span element.
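If you want to see how selecting by a custom attribute behaves without driving a browser, here is a minimal sketch using only the standard library's ElementTree on a made-up snippet (the attribute values below are invented for illustration; the live Yahoo! page is rendered dynamically, so you still need Selenium there):

```python
# Illustration only: selecting an element by a custom attribute on a static,
# well-formed snippet. The snippet below is made up; on the live page you
# would use Selenium as shown above.
import xml.etree.ElementTree as ET

snippet = """
<div id="quote-header-info">
  <span data-reactid="34">163.87</span>
  <span data-reactid="35">165.00</span>
</div>
"""

root = ET.fromstring(snippet)
# ElementTree supports simple XPath predicates like [@attr='value']
price_span = root.find(".//span[@data-reactid='35']")
print(price_span.text)  # 165.00
```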

css_locator = 'div.quote-header-section span[data-reactid="35"]'
price = driver.find_element_by_css_selector(css_locator).text
print(price)

Related

Using Python + Selenium to select a tag within divs of a class within a div?

Alright so, what I'm trying to do is search for the first a tag within the divs of a specific class, inside a div with a specific ID. Using Python + Selenium, of course.
Right now I have as my code
newest_elements = driver.find_elements_by_css_selector("div.elements > a")
What this does is search all divs on the page with class "elements" and take the topmost link from each of those divs. But I do not want to search every div on the entire page with the class "elements"; I only want to search the "elements" divs that are inside another, larger div with a specific id called "list-all".
How do I achieve this? Thanks in advance for your help guys
According to your description, instead of
newest_elements = driver.find_elements_by_css_selector("div.elements > a")
You should use
newest_elements = driver.find_elements_by_css_selector("div#list-all div.elements > a")
You may also need to add waits/delays here.
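The scoping in that selector can also be done in two explicit steps: narrow to the container first, then search inside it. Here is a sketch of the same idea with the standard library's ElementTree on a made-up snippet, showing why scoping to "list-all" excludes the other "elements" divs:

```python
# Sketch of the scoping idea on a made-up snippet: first narrow to the
# container with id="list-all", then search only within it.
import xml.etree.ElementTree as ET

snippet = """
<body>
  <div class="elements"><a href="/outside">outside</a></div>
  <div id="list-all">
    <div class="elements"><a href="/first">first</a></div>
    <div class="elements"><a href="/second">second</a></div>
  </div>
</body>
"""

root = ET.fromstring(snippet)
container = root.find(".//div[@id='list-all']")           # scope first
links = container.findall(".//div[@class='elements']/a")  # then search within
print([a.get("href") for a in links])  # ['/first', '/second']
```

The one-line CSS selector `div#list-all div.elements > a` expresses exactly this two-step narrowing.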

How to extract text from div class using Selenium with Python

I am trying to create a bot to pay some bills automatically. The issue is that I can't extract the amount (text) under a div class; the error is "element not found".
I used driver.find_element_by_xpath and WebDriverWait. Can you please indicate how to get the highlighted text (see the attached link)? Thanks in advance. Page_inspect
I believe there was some issue with your XPath. Try the below; it should work:
amount = WebDriverWait(self.driver, self.timeout).until(EC.presence_of_element_located((By.XPATH, '//div[starts-with(@class,"bill-summary-total")]//div[contains(@data-ng-bind-html,"vm.productList.totalAmt")]')))
print('Your amount is: {}'.format(amount.text))
return float(amount.text)
You can use:
driver.find_element_by_xpath("//div[@data-ng-bind-html='vm.productList.totalAmt']").text
I have written the XPath on the basis of your attached image. Use list indexing to get the target div. For example:
driver.find_element_by_xpath("(//div[@data-ng-bind-html='vm.productList.totalAmt'])[1]").text
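One caveat before the `return float(amount.text)` line above: the displayed amount may carry a currency symbol or thousands separators (an assumption; the screenshot isn't visible here). A small helper like this sketch makes the conversion robust:

```python
import re

def parse_amount(text):
    """Extract a float from a displayed amount such as '$1,234.50'.
    Assumes ',' is the thousands separator and '.' the decimal point."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group().replace(",", ""))

print(parse_amount("$1,234.50"))  # 1234.5
print(parse_amount("Total: 89"))  # 89.0
```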

How do I access nested HTML elements using Selenium?

I am using a school class-schedule website, and I want to access the div element that contains info on how many seats are in a class and who is teaching it, in order to scrape it. I first find the element that contains the div I want; after that, I try to find the div by using XPaths. The problem I face is that when I try to use either find_element_by_xpath or find_elements_by_xpath to get the div, I get this error:
'list' object has no attribute 'find_element_by_xpath'
Is this error happening because the div element I want to find is nested? Is there a way to get nested elements using a div tag?
Here is the code I have currently:
driver = webdriver.Chrome(ChromeDriverManager().install())
url = "https://app.testudo.umd.edu/soc/202008/INST"
driver.get(url)
section_container = driver.find_elements_by_id('INST366')
sixteen_grid = section_container.find_element_by_xpath(".//div[@class = 'sections sixteen colgrid']").text
The info I want is this:
<div class="sections sixteen colgrid">...</div>
It's currently inside this tag:
<div id="INST366" class="course">...</div>
It would be greatly appreciated if anyone could help me out with this.
From documentation of find_elements_by_id:
Returns : list of WebElement - a list with elements if any was found. An empty list if not
Which means section_container is a list. You can't call find_element_by_xpath on a list but you can on each element within the list because they are WebElement.
What does the documentation say about find_element_by_id?
Returns : WebElement - the element if it was found
In this case you can use find_element_by_xpath directly. Which one should you use? It depends on your need: use find_element if you need the first match to keep digging for information, or find_elements if you need all the matches.
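This singular/plural split has a direct analogue in the standard library's ElementTree, which can be run without a browser (made-up snippet for illustration):

```python
# find_element / find_elements mirrors find vs findall in ElementTree:
# find returns the first match (or None); findall always returns a list.
import xml.etree.ElementTree as ET

root = ET.fromstring("<courses><div id='a'>one</div><div id='b'>two</div></courses>")

first = root.find(".//div")        # first match -> can keep digging on it
all_divs = root.findall(".//div")  # a list, possibly empty

print(first.text)                  # one
print([d.text for d in all_divs])  # ['one', 'two']
```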
After fixing that, you will encounter a second problem: your information is only displayed after JavaScript runs when clicking on "Show Sections", so you need to do that before locating what you want. For that, get the a element and click on it.
The new code will look like this:
from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
url = "https://app.testudo.umd.edu/soc/202008/INST"
driver.get(url)
section_container = driver.find_element_by_id('INST366')
section_container.find_element_by_xpath(".//a[#class='toggle-sections-link']").click()
sleep(1)
section_info = section_container.find_element_by_xpath(".//div[#class='sections sixteen colgrid']").text
driver.quit()

Scraping with XPath using requests and lxml, but having problems

I keep running into an issue when I scrape data with lxml using XPath. I want to scrape the Dow price, but when I print it out in Python it says <Element span at 0x448d6c0>. I know that must be a reference to a block of memory, but I just want the price. How can I print the price instead of its place in memory?
from lxml import html
import requests
page = requests.get('https://markets.businessinsider.com/index/realtime-chart/dow_jones')
content = html.fromstring(page.content)
#This will create a list of prices:
prices = content.xpath('//*[@id="site"]/div/div[3]/div/div[3]/div[2]/div/table/tbody/tr[1]/th[1]/div/div/div/span')
print(prices)
You're getting lxml element objects, which, as you noticed, print as memory locations. To access the value you want, read the .text attribute on each of them.
Additionally, I would highly recommend changing your XPath since it's a literal location and subject to change.
prices = content.xpath("//div[@id='site']//div[@class='price']//span[@class='push-data ']")
prices_holder = [i.text for i in prices]
prices_holder
['25,389.06',
'25,374.60',
'7,251.60',
'2,813.60',
'22,674.50',
'12,738.80',
'3,500.58',
'1.1669',
'111.7250',
'1.3119',
'1,219.58',
'15.43',
'6,162.55',
'67.55']
Also of note, you will only get the values at load. If you want the prices as they change, you'd likely need to use Selenium.
The variable prices is a list containing a web element. You need to read its text attribute to extract the value.
print(prices[0].text)
'25,396.03'
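Note that the scraped values are strings with thousands separators, so strip those before doing any arithmetic. A short sketch using a sample of the output above:

```python
# The scraped values are strings like '25,396.03'; remove the thousands
# separators before converting them for arithmetic.
prices_holder = ["25,389.06", "25,374.60", "7,251.60"]  # sample of the output above

numeric = [float(p.replace(",", "")) for p in prices_holder]
print(numeric)  # [25389.06, 25374.6, 7251.6]
```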

How to skip certain tags with BeautifulSoup?

I'm a beginner in Python and currently I'm trying to write a simple script using BeautifulSoup to extract some information from a web page and write it to a CSV file. What I'm trying to do here, is to go through all the lists on the web page. In the specific HTML file which I'm looking to work with, only one 'ul' has an id and I wish to skip that one and save all the other list elements in an array. My code doesn't work and I can't figure out how to solve my problem.
for ul in content_container.findAll('ul'):
    if 'id' in ul:
        continue
    else:
        for li in ul.findAll('li'):
            list.append(li.text)
            print(li.text)
Here, when I print the list out, I still see the elements from the ul with the id. I know it's a simple problem, but I'm stuck at the moment. Any help would be appreciated.
You are looking for id=False. Use this:
for ul in content_container.find_all('ul', id=False):
    for li in ul.find_all('li'):
        list.append(li.text)
        print(li.text)
This will ignore all tags that have id as an attribute. Also, your approach was nearly correct: you just need to check whether id is present in the tag's attributes, not in the tag itself (as you are doing). attrs is a dict, not a method, so use if 'id' in ul.attrs instead of if 'id' in ul.
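The attribute check is just a dict-membership test. The same idea can be demonstrated without BeautifulSoup using the standard library's ElementTree, where the attribute dict is called .attrib (BeautifulSoup's equivalent is tag.attrs); the snippet below is made up for illustration:

```python
# Skip any <ul> carrying an id attribute, collect the rest of the <li> text.
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<div>"
    "<ul id='skip-me'><li>a</li></ul>"
    "<ul><li>b</li><li>c</li></ul>"
    "</div>"
)

items = []
for ul in root.iter("ul"):
    if "id" in ul.attrib:  # dict-membership test, like 'id' in tag.attrs
        continue
    items.extend(li.text for li in ul.iter("li"))

print(items)  # ['b', 'c']
```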
Try this:
all_uls = content_container.find_all('ul')
# assuming that the ul with the id is the first ul
for i in range(1, len(all_uls)):
    print(all_uls[i])
