Get text inside a span using Selenium PhantomJS - Python

I tried to get the text inside a span using the Selenium webdriver with PhantomJS. My code looks like this:
href = driver.find_elements_by_xpath("//a[@class='_8mlbc _vbtk2 _t5r8b']")
for rt in href:
    rt.click()
    if href:
        name = driver.find_elements_by_xpath("//*[@class='_99ch8']/span").text
        # name = driver.find_element_by_xpath("//li[a[@title='nike']]/span").text
        print(name)
In the HTML:
<li class="_99ch8"><a class="_4zhc5 notranslate _ebg8h" title="nike" href="/nike/">nike</a><span><span>Nobody believed a boy from Madeira would make it to the stars. Except the boy from Madeira. </span><br>#nike<span> </span>#soccer<span> </span>#football<span> </span>#CR7<span> </span>#Cristiano<span> </span>#CristianoRonaldo<span> </span>#Mercurial<span> </span>#justdoit</span></li>
I want to get the text inside the span.

You cannot use an XPath expression that returns a text node; that is not a valid option for Selenium, since a selector must return a WebDriver element.
Also note that the class name of the li appears to be dynamic, so you might use the title attribute value of the child anchor instead:
driver.find_element_by_xpath("//li[a[@title='nike']]/span").text
UPDATE
The complete code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.instagram.com/nike/')
links = driver.find_elements_by_xpath('//a[contains(@href, "/?taken-by=nike")]')
for link in links:
    link.click()
    wait(driver, 5).until(EC.presence_of_element_located((By.XPATH, "//div/article")))
    print(driver.find_element_by_xpath("//li[a[@title='nike']]/span").text)
    driver.find_element_by_xpath("//div[@role='dialog']/button").click()
UPDATE#2
You can also simply grab the same text without opening each image:
links = driver.find_elements_by_xpath('//img')
for img in links:
    print(img.get_attribute('alt'))
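If some images are missing their alt text, a small guard avoids printing blank lines; a minimal sketch of the same approach:
for img in driver.find_elements_by_xpath('//img'):
    alt = img.get_attribute('alt')
    if alt:  # skip images without a caption in the alt attribute
        print(alt)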

I think, first of all, if you want a single element, you need to use the find_element_by_xpath() method instead of find_elements_by_xpath() to get to the element.
If you are using find_elements_by_xpath(), then you need a loop to print all the names that come back in the name variable.
Also, using the .text property of an element gives you the desired result.
Try this:
name = driver.find_element_by_xpath("//li[@class='_99ch8']/span").text
print(name)
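If the class actually matches several li elements, a hedged sketch of the find_elements_by_xpath() variant described above (same XPath, just looped):
names = driver.find_elements_by_xpath("//li[@class='_99ch8']/span")
for name in names:  # each entry is a WebElement, so .text works per element
    print(name.text)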

Related

How to use XPath to scrape JavaScript website values

I'm trying to scrape (in Python) the savings interest rate from this website using the value's XPath.
I've tried everything: BeautifulSoup, Selenium, etree, etc. I've been able to scrape a few other websites successfully. However, this site and many others are giving me fits. I'd love a solution that can scrape info from several sites regardless of their formatting, using XPath expressions.
My current attempt:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

service = Service(executable_path="/chromedriver")
options = Options()
options.add_argument('--incognito')
options.headless = True
driver = webdriver.Chrome(service=service, options=options)

url = 'https://www.americanexpress.com/en-us/banking/online-savings/account/'
driver.get(url)

element = driver.find_element(By.XPATH, '//*[@id="hysa-apy-2"]')
print(element.text)
if element.text == "":
    print("Error: Element text is empty")
driver.quit()
The interest rates are written inside span elements. All span elements that contain interest rates share the same class, heading-6. But bear in mind that the result returns two span elements for each interest rate, one for each viewport.
The XPath selector:
'//span[@class="heading-6"]'
You can also get elements by their text containing APY:
'//span[contains(., "APY")]'
But this selector looks for all span elements in the DOM that contain the word APY.
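A minimal sketch putting the class-based selector to work, reusing the driver and the By import from the question's script; since each rate appears twice (one span per viewport), the duplicates are collapsed here, which is an assumption about the desired output:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

rates = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.XPATH, '//span[@class="heading-6"]'))
)
seen = []
for span in rates:
    # Each rate is rendered once per viewport, so collapse duplicates in order.
    if span.text and span.text not in seen:
        seen.append(span.text)
print(seen)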
If you can find a unique id, it should be your first choice, e.g. find_element(By.ID, 'hysa-apy-2'), as @John Gordon suggested in a comment.
But sometimes, even when the element is found, its text has not loaded yet.
Use XPath and add the condition text()!="":
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//span[@id="hysa-apy-2" and text()!=""]')))
This requires the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Can't get all the necessary links from a web page via Selenium

I'm currently trying to use some automation while performing a patent-searching task. I'd like to get all the links corresponding to the search query results. In particular, I'm interested in Apple patents starting from the year 2015. The code is the following:
import selenium
from selenium import webdriver
from selenium.webdriver.firefox.options import Options as options
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By

new_driver_path = r"C:/Users/alexe/Desktop/Apple/PatentSearch/geckodriver-v0.30.0-win64/geckodriver.exe"
ops = options()
serv = Service(new_driver_path)
browser1 = selenium.webdriver.Firefox(service=serv, options=ops)
browser1.get("https://patents.google.com/?assignee=apple&after=priority:20150101&sort=new")

elements = browser1.find_elements(By.CLASS_NAME, "search-result-item")
links = []
for elem in elements:
    href = elem.get_attribute('href')
    if href:
        links.append(href)
links = set(links)
for href in links:
    print(href)
And the output is the following:
https://patentimages.storage.googleapis.com/ed/06/50/67e30960a7f68d/JP2021152951A.pdf
https://patentimages.storage.googleapis.com/86/30/47/7bc39ddf0e1ea7/KR20210106968A.pdf
https://patentimages.storage.googleapis.com/ca/2a/bc/9380e1657c2767/US20210318798A1.pdf
https://patentimages.storage.googleapis.com/c1/1a/c6/024f785fd5ea10/AU2021204695A1.pdf
https://patentimages.storage.googleapis.com/b3/19/cc/8dc1fae714194f/US20210312694A1.pdf
https://patentimages.storage.googleapis.com/e6/16/c0/292a198e6f1197/AU2021218193A1.pdf
https://patentimages.storage.googleapis.com/3e/77/e0/b59cf47c2b30a1/AU2021212005A1.pdf
https://patentimages.storage.googleapis.com/1b/3d/c2/ad77a8c9724fbc/AU2021204422A1.pdf
https://patentimages.storage.googleapis.com/ad/bc/0f/d1fcc65e53963e/US20210314041A1.pdf
The problem here is that one link is missing:
[screenshot: the result item and the missing link]
So I've tried different selectors and still got the same result: one link is missing. I've also tried searching with different parameters, and the pattern is that all the missing links are the ones not tied to a PDF. I've spent a lot of time trying to figure out the reason, so I would be really grateful if you could provide any clue on the matter. Thanks in advance!
The highlighted option has no a tag with class pdfLink in it. Put the line of code that extracts the link in a try block; if the required element is not found, fall back to whatever a tag is available for that article.
Try it like below:
driver.get("https://patents.google.com/?assignee=apple&after=priority:20150101&sort=new")
articles = driver.find_elements_by_tag_name("article")
print(len(articles))
for article in articles:
try:
link = article.find_element_by_xpath(".//a[contains(#class,'pdfLink')]").get_attribute("href") # Use a dot in the xpath to find an element with in an element.
print(link)
except:
print("Exception")
link = article.find_element_by_xpath(".//a").get_attribute("href")
print(link)
Output:
10
https://patentimages.storage.googleapis.com/86/30/47/7bc39ddf0e1ea7/KR20210106968A.pdf
https://patentimages.storage.googleapis.com/e6/16/c0/292a198e6f1197/AU2021218193A1.pdf
https://patentimages.storage.googleapis.com/3e/77/e0/b59cf47c2b30a1/AU2021212005A1.pdf
https://patentimages.storage.googleapis.com/c1/1a/c6/024f785fd5ea10/AU2021204695A1.pdf
https://patentimages.storage.googleapis.com/1b/3d/c2/ad77a8c9724fbc/AU2021204422A1.pdf
https://patentimages.storage.googleapis.com/ca/2a/bc/9380e1657c2767/US20210318798A1.pdf
Exception
https://patents.google.com/?assignee=apple&after=priority:20150101&sort=new#
https://patentimages.storage.googleapis.com/b3/19/cc/8dc1fae714194f/US20210312694A1.pdf
https://patentimages.storage.googleapis.com/ed/06/50/67e30960a7f68d/JP2021152951A.pdf
https://patentimages.storage.googleapis.com/ad/bc/0f/d1fcc65e53963e/US20210314041A1.pdf
To extract all the href attributes of the PDFs using Selenium and Python, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.search-result-item[href]")))])
Using XPATH:
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[contains(@class, 'search-result-item') and @href]")))])
Console Output:
['https://patentimages.storage.googleapis.com/86/30/47/7bc39ddf0e1ea7/KR20210106968A.pdf', 'https://patentimages.storage.googleapis.com/e6/16/c0/292a198e6f1197/AU2021218193A1.pdf', 'https://patentimages.storage.googleapis.com/3e/77/e0/b59cf47c2b30a1/AU2021212005A1.pdf', 'https://patentimages.storage.googleapis.com/c1/1a/c6/024f785fd5ea10/AU2021204695A1.pdf', 'https://patentimages.storage.googleapis.com/1b/3d/c2/ad77a8c9724fbc/AU2021204422A1.pdf', 'https://patentimages.storage.googleapis.com/ca/2a/bc/9380e1657c2767/US20210318798A1.pdf', 'https://patentimages.storage.googleapis.com/b3/19/cc/8dc1fae714194f/US20210312694A1.pdf', 'https://patentimages.storage.googleapis.com/ed/06/50/67e30960a7f68d/JP2021152951A.pdf', 'https://patentimages.storage.googleapis.com/ad/bc/0f/d1fcc65e53963e/US20210314041A1.pdf']
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
PS: You can extract only nine (9) href attributes, as one of the search items is a <span> element rather than a link, i.e. it doesn't have an href attribute.

How do I scrape something from multiple divs under one parent div in Selenium?

Hello, I am new to scraping. I want to scrape some text from multiple divs that are under one parent div. I have attached a screenshot of the HTML.
Under the class "partnerships_cont"
there are multiple divs with class "items"; from these divs I want to scrape the div I marked. But I run into an error.
This is the code I used:
def get_partnerships(driver):
    WebDriverWait(driver, 15).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@class, 'partnerships-cont')]/div[1]")))
    partnerships_cont = driver.find_element_by_xpath("//div[contains(@class, 'partnerships-cont')]")
    items = partnerships_cont.find_element_by_xpath("//div[contains(@class, 'item')]")
    for item in items:
        div = item.find_element_by_xpath("//div[1]")
        text = div.find_element_by_xpath("//div").text
        print(text)

driver = webdriver.Chrome(r'C:\Users\User\AppData\Local\Programs\Python\Python37\Lib\site-packages\chromedriver_py\chromedriver_win32.exe')
driver.get('https://xangle.io/project/ZRX/full-disclosure')
get_partnerships(driver)
No matter what I do I get this error:
TypeError: 'WebElement' object is not iterable
Could you please tell me why I get this error, and how do I fix it?
The exception occurs because it should be find_elements() instead of find_element(). Change the code as below and try:
items = partnerships_cont.find_elements_by_xpath("//div[contains(@class, 'item')]")
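Note also that the XPath expressions inside your loop start with //, which searches the whole document rather than within item; a hedged sketch of the relative form (the inner div structure is assumed from your screenshot):
for item in items:
    # A leading dot scopes the XPath to the current element.
    div = item.find_element_by_xpath(".//div[1]")
    print(div.text)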
An optimized version of your code to grab the name and description:
def get_partnerships(driver):
    items = driver.find_elements_by_css_selector('div.partnerships-cont>.item')
    for item in items:
        name = item.find_element_by_css_selector('div.name.fv1')
        desc = item.find_element_by_css_selector('div.description.fv1')
        print(name.text)
        print(desc.text)

driver = webdriver.Chrome(r'C:\Users\User\AppData\Local\Programs\Python\Python37\Lib\site-packages\chromedriver_py\chromedriver_win32.exe')
driver.get('https://xangle.io/project/ZRX/full-disclosure')
get_partnerships(driver)
You can use a simple CSS selector to achieve that. Induce WebDriverWait() and wait for visibility_of_all_elements_located():
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver

driver = webdriver.Chrome(r'C:\Users\User\AppData\Local\Programs\Python\Python37\Lib\site-packages\chromedriver_py\chromedriver_win32.exe')
driver.get('https://xangle.io/project/ZRX/full-disclosure')
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".partnerships-cont>.item>.info-cont>.name.fv1")))
for ele in elements:
    print(ele.text)
Output:
Harbor
Aragon

Scraper unable to extract titles from a website

I've written a script in Python with Selenium to extract the titles of different news items displayed in the left sidebar of the finance.yahoo website. I've used a CSS selector to get the content. However, the script neither gives any result nor throws any error, and I can't figure out the mistake I'm making. I hope somebody will take a look into it. Thanks in advance.
Here is my script:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://finance.yahoo.com/")
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "u.StretchedBox")))
for item in driver.find_elements_by_css_selector("u.StretchedBox span"):
    print(item.text)
driver.quit()
Elements within which the titles are:
<h3 class="M(0)" data-reactid="128"><a rel="nofollow noopener noreferrer" class="Fw(b) Fz(20px) Lh(23px) LineClamp(2,46px) Fz(17px)--sm1024 Lh(19px)--sm1024 LineClamp(2,38px)--sm1024 Td(n) C(#0078ff):h C(#000)" target="_blank" href="https://beap.gemini.yahoo.com/mbclk?bv=1.0.0&es=bVwDtPMGIS8NDKqncZWZBjLsQQHm58Z9cLJuMqC6LadDlYfVCoy.d3GqO599EPAiYnsxB0SB8aRURPve9Q8mOEjH.NrcVcVDhldut.C_9Vn16XER1q1G07a48FMQ_.sv9GCyVx7zcj1kBtWPysaYzQqboJWgUo5DRRHbAnejwVtYRPHJTEptil92tx_ccJZ9FnxE8L3tfDuS0Q3l5ftVhamTOon_nzuvtvqqBwD7X0T.7Z3wZBgtH93gM1xImZ0hdFUzsuQPDAjZWs1KdH0YsXIf3uLrmcJFoI9leh8KRljnIPC.RdhOF6OYcJfHtDks85nSIgfOsMyUr1wEhMA2Qa2htpEg5w.P4UIXeoldjzJ_NsUrtXqEFIJNKoaeq_FNiQ9wcI16utKO87167zkfSPzVY09d3pVLZg20V7tqTThOkG_IakPnmlOriJKnufsBWj1wp.6Q4PasAt2g4Y1yw9U71FIfG2dDwpryRKDWrUBfTvjwwItlSyXyvWvIYUyXXxR74qWcIEC3KAvVN7.iqSckV_EssVM8ytp5HiN4iTACpEmc96rpdNEqHYpRotwze8NF5cDubsZbW58Hauq_aO.DbhZJ7TbBDx5vZK_M%26lp=https%3A%2F%2Fin.search.yahoo.com%2Fsearch%3Fp%3Dcheap%2Bairfare%2Bdomestic%26fr%3Dstrm-tts-thg%26.tsrc%3Dstrm-tts-thg%26type%3Dcheapairfaredomestic-in" data-reactid="129">
<u class="StretchedBox" data-reactid="130"></u>
<span data-reactid="131">The Cheapest Domestic Airfare Rates</span></a></h3>
You didn't get either an error or results because:
find_elements_...() methods return a list. If your selector matches no elements you won't get an error, just an empty list; iterating over an empty list raises no error either.
Your CSS selector matches a span that is a descendant of a u with class="StretchedBox", but the required span is actually not a descendant; it is a sibling.
Try the code below:
for item in driver.find_elements_by_css_selector("u.StretchedBox+span"):
    print(item.text)

Python Selenium: find and click an element

How do I use Python to simply find a link containing text/id/etc and then click that element?
My imports:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
Code to go to my URL:
browser = webdriver.Firefox() # Get local session of firefox
browser.get(myURL) # Load page
# right here I want to look for element containing "Click Me" in the text
# and then click it
Look at the method list of the WebDriver class - http://selenium.googlecode.com/svn/trunk/docs/api/py/webdriver_remote/selenium.webdriver.remote.webdriver.html
For example, to find an element by its ID and click it, the code will look like this:
element = browser.find_element_by_id("your_element_id")
element.click()
There are a couple of ways to find a link (or any element):
element = driver.find_element_by_id("element-id")
element = driver.find_element_by_name("element-name")
element = driver.find_element_by_xpath("//input[@id='element-id']")
element = driver.find_element_by_link_text("link-text")
element = driver.find_element_by_class_name("class-name")
I think the best option for you is find_element_by_link_text, since it's a link.
Once you have saved the element in a variable, you call the click function: element.click() or element.send_keys(Keys.RETURN).
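A minimal sketch of that approach (the link text "Click Me" is taken from your comment, and myURL from your snippet):
from selenium import webdriver

browser = webdriver.Firefox()
browser.get(myURL)

# Locate the link by its visible text, then click it.
element = browser.find_element_by_link_text("Click Me")
element.click()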
Take a look at the selenium-python documentation; there are a couple of examples there.
You can just use this code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Firefox()
driver.get("https://play.spotify.com/")  # change this to your link
wait = WebDriverWait(driver, 250)
# it will wait up to 250 seconds for an element to come into view; you can change the value
submit = wait.until(EC.presence_of_element_located((By.LINK_TEXT, 'Click Me')))
submit.click()
The wait here is the best fit for JavaScript-enabled websites: suppose the page loads in the webdriver, but due to JavaScript some element only appears after a few seconds; then it is best to use a wait there.
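Since the element is going to be clicked, an expected condition that also checks clickability may fit better; a hedged variant of the wait above:
submit = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'Click Me')))
submit.click()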
For a link containing text:
browser = webdriver.Firefox()
browser.find_element_by_link_text(link_text).click()
You can also do it by XPath:
browser.find_element_by_xpath("//a[@href='your_link']").click()
Finding Element by Link Text
driver.find_element_by_link_text("")
Finding Element by Name
driver.find_element_by_name("")
Finding Element By ID
driver.find_element_by_id("")
Try SST - it is a simple yet very good test framework for Python.
Install it first: http://testutils.org/sst/index.html
Then:
Imports:
from sst.actions import *
First define a variable, element, for the element you're after:
element = assert_element(text="some words and more words")
I used this: http://testutils.org/sst/actions.html#assert-element
Then click that element:
click_element(element)
And that's it.
First start one instance of the browser. Then you can use any of the following methods to get an element or link. After locating the element you can use element.click(), element.send_keys(Keys.RETURN), or any other WebDriver method.
browser = webdriver.Firefox()
Selenium provides the following methods to locate elements on a page:
To find individual elements (these methods return a single element):
browser.find_element_by_id(id)
browser.find_element_by_name(name)
browser.find_element_by_xpath(xpath)
browser.find_element_by_link_text(link_text)
browser.find_element_by_partial_link_text(partial_link_text)
browser.find_element_by_tag_name(tag_name)
browser.find_element_by_class_name(class_name)
browser.find_element_by_css_selector(css_selector)
To find multiple elements (these methods return a list). You can then iterate through the list, or locate an individual element with elementlist[n]; see the sketch below.
browser.find_elements_by_name(name)
browser.find_elements_by_xpath(xpath)
browser.find_elements_by_link_text(link_text)
browser.find_elements_by_partial_link_text(partial_link_text)
browser.find_elements_by_tag_name(tag_name)
browser.find_elements_by_class_name(class_name)
browser.find_elements_by_css_selector(css_selector)
browser.find_element_by_xpath("")
browser.find_element_by_id("")
browser.find_element_by_name("")
browser.find_element_by_class_name("")
Inside the ("") you have to put the respective xpath, id, name, class_name, etc...
