I want to get information from the table on the page https://www.oddsportal.com/soccer/england/premier-league/wolves-newcastle-utd-nNNqedbR/.
The table updates its items automatically (probably with JS/AJAX).
If I write the following code, I get the error 'HtmlElement' object has no attribute 'find_element_by_xpath':
import lxml.html
from selenium import webdriver

url = 'https://www.oddsportal.com/soccer/england/premier-league/wolves-newcastle-utd-nNNqedbR/'
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(chrome_options=options)
driver.get(url)
html = lxml.html.fromstring(driver.page_source)
tbody = html.find_element_by_xpath('//*[@id="odds-data-table"]/div[1]/table/tbody')
trows = tbody.find_elements_by_tag_name("tr")
lxml is (presumably) the lxml library, so your html object is an lxml HtmlElement. As the exception says, it does not have the find_element_by_xpath() and find_elements_by_tag_name() methods; those belong to the selenium library.
So instead of working with the html object, work with driver:
tbody = driver.find_element_by_xpath('//*[@id="odds-data-table"]/div[1]/table/tbody')
trows = tbody.find_elements_by_tag_name("tr")
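Alternatively, if you would rather keep parsing with lxml, note that lxml elements expose an .xpath() method (which returns a list of matches) rather than Selenium's find_element_* API. Here is a minimal sketch on a static snippet; the table markup below is invented for illustration and is not the real oddsportal page:

```python
import lxml.html

# Stand-in for driver.page_source (hypothetical markup for illustration)
page_source = """
<div id="odds-data-table">
  <div><table><tbody>
    <tr><td>1.50</td></tr>
    <tr><td>2.75</td></tr>
  </tbody></table></div>
</div>
"""

html = lxml.html.fromstring(page_source)
# lxml uses .xpath(), which returns a list of matching elements
tbody = html.xpath('//*[@id="odds-data-table"]/div[1]/table/tbody')[0]
trows = tbody.xpath('.//tr')
for row in trows:
    print(row.text_content().strip())
```

In a real script you would pass driver.page_source in place of the hard-coded string.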
I am a newbie in the Python Selenium environment. I am trying to get the SQL version table from https://www.sqlserverversions.com.
from selenium.webdriver.common.by import By
from selenium import webdriver
# define the website to scrape and the path where the chromedriver is located
website = "https://www.sqlserverversions.com"
driver = webdriver.Chrome(executable_path='/Users//Downloads/chromedriver/chromedriver.exe')
# define 'driver' variable
# open Google Chrome with chromedriver
driver.get(website)
matches = driver.find_elements(By.TAG_NAME, 'tr')
for match in matches:
    b = match.find_elements(By.XPATH, "./td[1]")
    print(b.text)
It says AttributeError: 'list' object has no attribute 'text'. Am I using the right syntax and the right parameters to grab the data?
Below is the table from which I am trying to get the data:
(screenshot of the versions table)
Below are the parameters which I am trying to put in the code:
(screenshot of the table markup)
Please advise what needs to be modified in the code to obtain the data in table format.
Thanks,
Arun
If you need data only from the first table:
from selenium.webdriver.common.by import By
from selenium import webdriver
website = "https://www.sqlserverversions.com"
driver = webdriver.Chrome(executable_path='/Users//Downloads/chromedriver/chromedriver.exe')
driver.get(website)
show_service_pack_versions = True
xpath_first_table_sql_rows = "(//table[@class='tbl'])[1]//tr/td/a[starts-with(text(),'SQL Server')]//ancestor::tr"
matches = driver.find_elements(By.XPATH, xpath_first_table_sql_rows)
for match in matches:
    sql_server_a_element = match.find_element(By.XPATH, "./td/a[2]")
    print(sql_server_a_element.text)
    sql_server_rtm_version_a_element = match.find_element(By.XPATH, ".//td[@class='rtm']")
    print('RTMs:')
    print(sql_server_rtm_version_a_element.text)
    if show_service_pack_versions:
        print('SPs:')
        sql_server_sp_version_td_elements = match.find_elements(By.XPATH, ".//td[@class='sp']")
        for td in sql_server_sp_version_td_elements:
            print('---')
            print(td.text)
    print('----------------------------------')
If you set show_service_pack_versions = False, the information regarding service packs will be skipped.
There was a part of your code where you were calling b.text after getting the result of find_elements, which returns a list. You can only call .text on a single WebElement, not on a list of them. Here's the updated code:
from selenium.webdriver.common.by import By
from selenium import webdriver
website = "https://www.sqlserverversions.com"
driver = webdriver.Chrome(executable_path='/Users//Downloads/chromedriver/chromedriver.exe')
driver.get(website)
matches = driver.find_elements("css selector", "tr")
for match in matches[1:]:
    items = match.find_elements("css selector", "td")
    for item in items:
        print(item.text)
That will print out A LOT of rows, unless you limit the loop.
If you just need text it's simpler to do it on the browser side:
data = driver.execute_script("""
return [...document.querySelectorAll('tr')].map(tr => [...tr.querySelectorAll('td')].map(td => td.innerText))
""")
I'm trying to pull the href and the data-promoname from the URL:
https://www2.deloitte.com/global/en/pages/about-deloitte/topics/combating-covid-19-with-resilience.html?icid=covid-19_article-nav
I tried the code below, but it can only extract the href under the class "promo-focus". I also want to get the text "COVID-19 Economic cases: Scenarios for business leaders" from data-promoname.
driver = webdriver.Chrome(executable_path=r'C:\chromedriver.exe')
url = "https://www2.deloitte.com/global/en/pages/about-deloitte/topics/combating-covid-19-with-resilience.html?icid=covid-19_article-nav"
driver.get(url)
for i in driver.find_elements_by_class_name('promo-focus'):
    print(i.get_attribute('href'))
Can anyone tell me how to do that using Python?
Try using the .text attribute to get the text.
Example
from selenium import webdriver
chrome_browser = webdriver.Chrome()
url = "https://www2.deloitte.com/global/en/pages/about-deloitte/topics/combating-covid-19-with-resilience.html?icid=covid-19_article-nav"
chrome_browser.get(url)
for a in chrome_browser.find_elements_by_class_name('promo-focus'):
    print(a.get_attribute('href'))
    print(a.text)
To get the value of data-promoname, you can use the .get_attribute method. It can be used to get the value of any attribute of the corresponding tag.
driver_path = 'C:/chromedriver.exe' #the path to your chrome driver
browser = webdriver.Chrome(driver_path)
url_to_open = 'https://www2.deloitte.com/global/en/pages/about-deloitte/topics/combating-covid-19-with-resilience.html?icid=covid-19_article-nav'
browser.get(url_to_open)
for a in browser.find_elements_by_class_name('promo-focus'):
    print(a.get_attribute('href'))
    print(a.get_attribute("data-promoname"))
If you are looking for the content being displayed on the page under the anchor tags, you can use .text instead
print(a.text)
I am trying to get data from evernote 'shared notebook'.
For example, from this one: https://www.evernote.com/pub/missrspink/evernoteexamples#st=p&n=56b67555-158e-4d10-96e2-3b2c57ee372c
I tried to use Beautiful Soup:
import requests
from bs4 import BeautifulSoup

url = 'https://www.evernote.com/pub/missrspink/evernoteexamples#st=p&n=56b67555-158e-4d10-96e2-3b2c57ee372c'
r = requests.get(url)
bs = BeautifulSoup(r.text, 'html.parser')
bs
The result doesn't contain any text information from the notebook, only some code.
I have also seen advice to use Selenium and find elements by XPath.
For example, I want to find the heading of this note, 'Term 3 Week2'. In Google Chrome I found that its XPath is '/html/body/div[1]/div[1]/b/span/u/b'.
So I tried this:
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(url)
t = driver.find_element_by_xpath('/html/body/div[1]/div[1]/b/span/u/b')
But it also didn't work; the result was 'NoSuchElementException:... '.
I am a newbie in Python and especially in parsing, so I would be glad to receive any help.
I am using Python 3.6.2 and Jupyter Notebook.
Thanks in advance.
The easiest way to interface with Evernote is to use their official Python API.
After you've configured your API key and can generally connect, you can then download and reference Notes and Notebooks.
Evernote Notes use their own template language called ENML (EverNote Markup Language) which is a subset of HTML. You'll be able to use BeautifulSoup4 to parse the ENML and extract the elements you're looking for.
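For instance, once you have a note's ENML string back from the API, BeautifulSoup can extract the parts you need. The ENML below is a made-up sample for illustration, not real Evernote output:

```python
from bs4 import BeautifulSoup

# Hypothetical ENML content, just to illustrate the parsing step;
# real notes come back from the Evernote API as a similar XML string.
enml = """<?xml version="1.0" encoding="UTF-8"?>
<en-note>
  <div><b><span><u><b>Term 3 Week2</b></u></span></b></div>
  <div>Some note text here.</div>
</en-note>"""

soup = BeautifulSoup(enml, 'html.parser')
# Grab the underlined heading text
heading = soup.find('u').get_text()
print(heading)  # Term 3 Week2
# Or pull all of the note's visible text at once
print(soup.find('en-note').get_text(' ', strip=True))
```

Since ENML is close to HTML, the usual find/find_all/get_text tools work the same way they would on a web page.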
If you're trying to extract information against a local installation (instead of their web app) you may also be able to get what you need from the executable. See how to pass arguments to the local install to extract data. For this you're going to need to use the Python3 subprocess module.
HOWEVER
If you want to use selenium, this will get you started:
import selenium.webdriver.support.ui as ui
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
# your example URL
URL = 'https://www.evernote.com/pub/missrspink/evernoteexamples#st=p&n=56b67555-158e-4d10-96e2-3b2c57ee372c'
# create the browser interface, and a generic "wait" that we can use
# to intelligently block while the driver looks for elements we expect.
# 10: maximum wait in seconds
# 0.5: polling interval in seconds
driver = Chrome()
wait = ui.WebDriverWait(driver, 10, 0.5)
driver.get(URL)
# Note contents are loaded in an iFrame element
find_iframe = By.CSS_SELECTOR, 'iframe.gwt-Frame'
find_html = By.TAG_NAME, 'html'
# .. so we have to wait for the iframe to exist, switch our driver context
# and then wait for that internal page to load.
wait.until(EC.frame_to_be_available_and_switch_to_it(find_iframe))
wait.until(EC.visibility_of_element_located(find_html))
# since ENML is "just" HTML we can select the top tag and get all the
# contents inside it.
doc = driver.find_element_by_tag_name('html')
print(doc.get_attribute('innerHTML')) # <-- this is what you want
# cleanup our browser instance
driver.quit()
I am trying to scrape data from a data table on this website: http://www.oddsshark.com/ncaab/lsu-alabama-odds-february-18-2017-744793
The site has multiple tabs, which change the HTML (I am working in the 'matchup' tab). Within that matchup tab, there is a drop-down menu that changes the data table that I am trying to access. The items in the table that I am trying to access are 'li' tags within an unordered list. I just want to scrape the data from the "Overall" category of the drop-down menu.
I have been unable to access the data that I want; the item that I'm trying to access is coming back as NoneType. Is there a way to do this?
url = "http://www.oddsshark.com/ncaab/lsu-alabama-odds-february-18-2017-744793"
import requests
from bs4 import BeautifulSoup

html_page = requests.get(url)
soup = BeautifulSoup(html_page.content, 'html.parser')
dataList = []
for ultag in soup.find_all('ul', {'class': 'base-list team-stats'}):
    print(ultag)
    for iltag in ultag.find_all('li'):
        dataList.append(iltag.get_text())
The problem is that the content of the tab you are trying to pull data from is dynamically loaded using React JS, so you have to use the selenium module in Python to open a browser, click the list element "Matchup" programmatically, and then get the source after clicking it.
On my mac I installed selenium and the chromewebdriver using these instructions:
https://gist.github.com/guylaor/3eb9e7ff2ac91b7559625262b8a6dd5f
Then signed the python file, so that the OS X firewall doesn't complain to us when trying run it, using these instructions:
Add Python to OS X Firewall Options?
Then ran the following python3 code:
import os
import time
from selenium import webdriver
from bs4 import BeautifulSoup as soup
# Setup Selenium Chrome Web Driver
chromedriver = "/usr/local/bin/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
# Navigate in Chrome to specified page.
driver.get("http://www.oddsshark.com/ncaab/lsu-alabama-odds-february-18-2017-744793")
# Find the matchup list element using a CSS selector and click it.
driver.find_element_by_css_selector("li[id='react-tabs-0']").click()
# Wait for content to load
time.sleep(1)
# Get the current page source.
source = driver.page_source
# Parse into soup() the source of the page after the link is clicked and use "html.parser" as the Parser.
soupify = soup(source, 'html.parser')
dataList = []
for ultag in soupify.find_all('ul', {'class': 'base-list team-stats'}):
    print(ultag)
    for iltag in ultag.find_all('li'):
        dataList.append(iltag.get_text())
# We are done with the driver so quit.
driver.quit()
Hope this helps; I noticed this was a similar problem to the one I just solved here: Beautifulsoup doesn't reach a child element
I am using Selenium and Splinter to run my UI tests for my web application. I am generating a random id for the views on the page and would like to select the random ID for testing.
Here is the code I am using
from selenium import webdriver
from splinter import Browser
executable_path = {'executable_path':'./chromedriver.exe'}
browser = Browser('chrome', **executable_path)
data_view_id = browser.find_by_xpath('//ul[@class="nav"]').find_by_xpath('.//a[@href="#"]')[0].get_attribute("data-view-id")
# I am trying to get the id for the first item in the nav list and use it elsewhere
print(data_view_id)
This is the error I am receiving:
AttributeError: 'WebDriverElement' object has no attribute 'get_attribute'
I've looked at the readthedocs page for WebElement and it has the get_attribute method. I cannot find any documentation regarding WebDriverElement and need help accessing the underlying WebElement instead.
That WebDriverElement is from Splinter, not Selenium.
In Splinter, you access attributes like a dict (see the Splinter docs)
data_view_id = browser.find_by_xpath('//ul[@class="nav"]').find_by_xpath('.//a[@href="#"]')[0]['data-view-id']
Or if you wanted to do it in Selenium:
browser.find_element_by_xpath('//ul[@class="nav"]').find_element_by_xpath('.//a[@href="#"]').get_attribute("data-view-id")