How to extract the value of the src attribute using Selenium

How to extract the value of the src attribute using Selenium - python

I have problems with something I don't even know the name.
I'm trying to reach the link next to where it says src= "LINK" with selenium.
I have class name = tWeCI, I guess I need to achieve using this but I have no idea how to do it.
Code trials:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
browser.get("https://www.instagram.com/p/CYMHMEGBSRT/")
a = browser.find_element(by=By.CLASS_NAME, value="tWeCl")
Output:
https://instagram.fist4-1.fna.fbcdn.net/v/t50.2886-16/271301182_1082110899295465_6686845989868801609_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5jbGlwcy5iYXNlbGluZSIsInFlX2dyb3VwcyI6IltcImlnX3dlYl9kZWxpdmVyeV92dHNfb3RmXCJdIn0&_nc_ht=instagram.fist4-1.fna.fbcdn.net&_nc_cat=101&_nc_ohc=6RJahYZ39DIAX_ul9E4&edm=AABBvjUBAAAA&vs=453007466354843_3354420889&_nc_vs=HBksFQAYJEdENjZLeERwbE1LVExOZ0RBRWtPNE5YLWNzeGNicV9FQUFBRhUAAsgBABUAGCRHS0ZaS1JCMldIUXZxRzhCQUJ4bFB2ZnVrbE1hYnFfRUFBQUYVAgLIAQAoABgAGwAVAAAmqN%2FFrpms4z8VAigCQzMsF0A%2BmZmZmZmaGBJkYXNoX2Jhc2VsaW5lXzFfdjERAHX%2BBwA%3D&_nc_rid=c8bb85b1e6&ccb=7-4&oe=62408107&oh=00_AT-E77LrFH7G6VVXGeh8bRFQsu95hhQlqUGZdinzFBIUYQ&_nc_sid=83d603"
Snapshot of the HTML:

To print the value of the src attribute you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get('https://www.instagram.com/p/CYMHMEGBSRT/')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "video.tWeCl"))).get_attribute("src"))
Using XPATH:
driver.get('https://www.instagram.com/p/CYMHMEGBSRT/')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//video[#class='tWeCl']"))).get_attribute("src"))
Console Output:
https://instagram.fpnq13-2.fna.fbcdn.net/v/t50.2886-16/271301182_1082110899295465_6686845989868801609_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjcyMC5jbGlwcy5iYXNlbGluZSIsInFlX2dyb3VwcyI6IltcImlnX3dlYl9kZWxpdmVyeV92dHNfb3RmXCJdIn0&_nc_ht=instagram.fpnq13-2.fna.fbcdn.net&_nc_cat=101&_nc_ohc=6RJahYZ39DIAX-Iphu9&edm=AABBvjUBAAAA&vs=453007466354843_3354420889&_nc_vs=HBksFQAYJEdENjZLeERwbE1LVExOZ0RBRWtPNE5YLWNzeGNicV9FQUFBRhUAAsgBABUAGCRHS0ZaS1JCMldIUXZxRzhCQUJ4bFB2ZnVrbE1hYnFfRUFBQUYVAgLIAQAoABgAGwAVAAAmqN%2FFrpms4z8VAigCQzMsF0A%2BmZmZmZmaGBJkYXNoX2Jhc2VsaW5lXzFfdjERAHX%2BBwA%3D&_nc_rid=34a08d5a01&ccb=7-4&oe=62408107&oh=00_AT9sh_BH__zjeReDB7lde4t3avzYqDjimTJRnoZi6Lj-TQ&_nc_sid=83d603
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Related

How to locate the email address element with a Facebook page using Selenium

So I am trying to first pull the body where the email address lies on this specific facebook page and then pull the email address plus other attributes within that body. However, I have tried every way to locate these elements using Chropath, and SelectorHub but nothing has worked. Any idea why I am unable to use these tools?
Code trials:
from ast import Return
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome("C:/Users/Carson/Desktop/chromedriver.exe")
driver.get("https://www.facebook.com/TheVillageAtGracyFarms")
driver.maximize_window()
try:
WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, "body._6s5d._71pn.system-fonts--body.segoe:nth-child(2) div.rq0escxv.l9j0dhe7.du4w35lb div.rq0escxv.l9j0dhe7.du4w35lb:nth-child(6) div.du4w35lb.l9j0dhe7.cbu4d94t.j83agx80 div.j83agx80.cbu4d94t.l9j0dhe7 div.j83agx80.cbu4d94t.l9j0dhe7.jgljxmt5.be9z9djy.qfz8c153 div.j83agx80.cbu4d94t.d6urw2fd.dp1hu0rb.l9j0dhe7.du4w35lb:nth-child(1) div.j83agx80.cbu4d94t.dp1hu0rb:nth-child(1) div.j83agx80.cbu4d94t.buofh1pr.dp1hu0rb.hpfvmrgz div.l9j0dhe7.dp1hu0rb.cbu4d94t.j83agx80 div.bp9cbjyn.j83agx80.cbu4d94t.d2edcug0:nth-child(4) div.rq0escxv.d2edcug0.ecyo15nh.k387qaup.r24q5c3a.hv4rvrfc.dati1w0a.tr9rh885:nth-child(2) div.rq0escxv.l9j0dhe7.du4w35lb.pfnyh3mw.gs1a9yip.j83agx80.btwxx1t3.lhclo0ds.taijpn5t.sv5sfqaa.o22cckgh.obtkqiv7.fop5sh7t div.rq0escxv.l9j0dhe7.du4w35lb.hpfvmrgz.g5gj957u.aov4n071.oi9244e8.bi6gxh9e.h676nmdw.aghb5jc5.o387gat7.g1e6inuh.fhuww2h9.rek2kq2y:nth-child(1) div.lpgh02oy:nth-child(2) div.sjgh65i0 div.j83agx80.l9j0dhe7.k4urcfbm div.rq0escxv.l9j0dhe7.du4w35lb.hybvsw6c.io0zqebd.m5lcvass.fbipl8qg.nwvqtn77.k4urcfbm.ni8dbmo4.stjgntxs.sbcfpzgs div.sej5wr8e div.rq0escxv.l9j0dhe7.du4w35lb.j83agx80.pfnyh3mw.i1fnvgqd.gs1a9yip.owycx6da.btwxx1t3.hv4rvrfc.dati1w0a.discj3wi.b5q2rw42.lq239pai.mysgfdmx.hddg9phg:nth-child(2) > div.rq0escxv.l9j0dhe7.du4w35lb.j83agx80.cbu4d94t.d2edcug0.hpfvmrgz.rj1gh0hx.buofh1pr.g5gj957u.p8fzw8mz.pcp91wgn.iuny7tx3.ipjc6fyt")))
print("Dub")
except:
print("Failed")
driver.quit()

The Email Address field within the webpage contains dynamic elements.
To locate the Email Address field instead of presence_of_element_located() you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategies:
Using XPATH and the text #:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., '#')]"))).text)
Using XPATH and the texts # & com:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(., '#') and contains(., 'com')]"))).text)
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Web scraping Tennis24 in play stats

I have been trying to work out how to scrape the live and updating statistics on Tennis 24 "https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0" a page such as this but when I try to use selenium nothing is returned. even if I just try to return the 1 element such as
<div class="statText statText--awayValue">4</div>
Could someone please give me some pointers as this is my first scraping project?

To print the text 4 you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using XPATH and text attribute:
driver.get('https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='statText statText--titleValue' and text()='Aces']//following::div"))).text)
Using XPATH and get_attribute('innerHTML'):
driver.get('https://www.tennis24.com/match/4xFaW6fP/#match-statistics;0')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='statText statText--titleValue' and text()='Aces']//following::div"))).get_attribute('innerHTML'))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

How can I select the image link from src html part using python selenium

I'm trying to get all the links of the images from the below mentioned html code type:
<img src="https://img.xyz.com/rcmshopping/PROD_IMAGES/fffff_2018_05_10_10_49_23.jpg">
I am not able to fetch any link from this part of html out f 3 any 1 is good for me. Im new to python please guide.

To print the value of the src attribute you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using XPATH:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[#class='elevatezoom-gallery']/img"))).get_attribute("src"))
Using CSS_SELECTOR:
print(WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.elevatezoom-gallery>img"))).get_attribute("src"))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can achieve that by using the find_elements_by_css_selector() function.
links = self.webdriver.find_elements_by_css_selector('img[src="https://img.xyz.com/rcmshopping/PROD_IMAGES/fffff_2018_05_10_10_49_23.jpg"])

xpath to get a simple div id="xyz" with selenium

the html I'm digging (there is no iframe on it) is:
<div id="app" class="container-fluid">
I try to get the block of the DOM tree corresponding to the above div
this is my code:
from selenium.common.exceptions import NoSuchElementException
try:
app = driver.find_element_by_xpath('//div[#id="app"]')
print(4444444444444444, app)
is_vue_app_loaded = len(app.text) != 0
except NoSuchElementException as e:
print(f'e: {e}')
I get this error:
4444444444444444 <selenium.webdriver.firefox.webelement.FirefoxWebElement
(session="c8da2a89-63fd-4dce-bdb5-28fdf0e67b01",
element="b9be2828-115e-4965-8ebc-b75c09236193")>
e: Message: Unable to locate element: //div[#id='app']
Is there a way to ask the driver to return me the raw html so I can check that the DOM try I think I have is the one I have actualy ?

Check if the element is present inside any iframe.If there you need to switch it first.
If Not then try.
Induce WebDriverWait() and visibility_of_element_located()
WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH,"//div[#id='app' and #class='container-fluid']")))
You need to import following libraries.
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

If your usecase is to return me the raw html of the element you have to induce WebDriverWait for the visibility_of_element_located() inconjunction with get_attribute("outerHTML") and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.container-fluid#app"))).get_attribute("outerHTML"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='container-fluid' and #id='app']"))).get_attribute("outerHTML"))
Note : You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Why can't I find element with selenium?

So I've tried almost every method that I knew of including xpath and nothing. It should be a simple find element and click and that's it, but I can't figure it out.
Here is my code
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
ff_options = Options()
#profile
binary = FirefoxBinary("C:\\Program Files\\Mozilla Firefox\\firefox.exe")
profile = webdriver.FirefoxProfile('C:\\Users\\bravoj\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\7k4o5psw.CCC Deafualt')
ff_driver = webdriver.Firefox(executable_path='C:\\Users\\bravoj\Downloads\\geckodriver.exe')
#fire fox driver
ff_driver.get('about:profiles')
#ff_driver.get('about:preferences#home')
ff_driver.find_element_by_id('profiles-set-as-default').click()
There is a duplicate code too

'profiles-set-as-default' is not id attribute, it's data-l10n-id attribute. You can use css_selector to locate it, and get unique element using previous brother element with data-l10n-args {CCC Deafult}
ff_driver.find_element_by_css_selector('[data-l10n-args*="CCC Deafult"] ~ [data-l10n-id="profiles-set-as-default"]')

The element with text as Set as default profile is a dynamic element so to locate and click() on the element you have to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.tab#profiles button[data-l10n-id='profiles-set-as-default']"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[#data-l10n-id='profiles-set-as-default' and text()='Set as default profile']"))).click()
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to extract the value of the src attribute using Selenium - python

Related

How to locate the email address element with a Facebook page using Selenium

Web scraping Tennis24 in play stats

How can I select the image link from src html part using python selenium

xpath to get a simple div id="xyz" with selenium

Why can't I find element with selenium?

Categories

Resources