Python web scraping with Selenium Chrome driver - python

I'm trying to get the number of publications of an Instagram account, which is in a span tag, using Python Selenium with the Chrome driver. This is part of the HTML code:
<!doctype html>
<html lang="fr" class="js logged-in client-root js-focus-visible sDN5V">
<head>…</head>
<body>
<div id="react-root">
  <form enctype="multipart/form-data" method="POST" role="presentation">…</form>
  <section class="_9eogI E3X2T">
    <div></div>
    <main class="SCxLW o64aR" role="main">
      <div class="v9tJq AAaSh VfzDr">
        <header class="HVbuG">…</header>
        <div class="-vDIg">…</div>
        <div class="_4bSq7">…</div>
        <ul class="_3dEHb">
          <li class="LH36I">
            <span class="_81NM2">
              <span class="g47SY 10XF2">6 588</span>
              "publications"
            </span>
          </li>
THE PYTHON CODE
def get_publications_number(self, user):
    self.nav_user(user)
    sleep(16)
    publication = self.driver.find_element_by_xpath('//div[contains(id,"react-root")]/section/main/div/ul/li[1]/span/span')
THE ERROR MESSAGE
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element:
{"method":"xpath","selector":"//div[contains(id,"react-root")]/section/main/div/ul/li[1]/span/span"}
(Session info: chrome=80.0.3987.149)
IMPORTANT:
This XPath is pasted from the Chrome element inspector, so I don't think it's the problem. When I use self.driver.find_elements_by_xpath() (with an 's') there is no error, and if I do:
for value in publication:
    print(value.text)
there is no error either, but nothing is printed.
SO THE QUESTION IS:
Why am I getting this error while the Xpath exists?

Try
'//div[@id="react-root"]//ul/li//span[contains(., "publications")]/span'
Explanation:
//div[@id="react-root"] << find the element which has the id "react-root"
//ul/li << inside the found react root, find li elements anywhere (//) which are children of a ul element
//span[contains(., "publications")] << in the found li elements, find span elements anywhere which contain "publications" as text
/span << get the span children of the found span
One more thing: find_element_by_xpath returns the first element that matches. In case you have more than one 'publications', you can collect them all with the XPath above (if you want to) by using find_elements_by_xpath instead of find_element_by_xpath in Selenium.
Recently I found this page, which is quite a good read to start mastering XPath; check it out if you want to know more.
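For example, a minimal sketch with the selector above (assuming driver is the question's self.driver and has already navigated to the profile page, as nav_user does):
publications = driver.find_elements_by_xpath(
    '//div[@id="react-root"]//ul/li//span[contains(., "publications")]/span'
)
for value in publications:
    print(value.text)  # e.g. "6 588"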

//div[contains(id,"react-root")]/section/main/div/ul/li[1]/span/span
Use this XPath. It might work. I think you made a comma error there.

Related

XPATH target div and image in loop?

Here's the document structure:
<div class="search-results-container">
<div>
<div class="feed-shared-update-v2">
<div class="update-components-actor">
<div class="update-components-actor__image">
<img class="presence-entity__image" src="https://www.testimage.com/test.jpg"/>
<span></span>
<span>test</span>
</div>
</div>
</div>
</div>
<div>
<div class="feed-shared-update-v2">
<div class="update-components-actor">
<div class="update-components-actor__image">
<img class="presence-entity__image" src="https://www.testimage.com/test.jpg"/>
<span></span>
<span>test</span>
</div>
</div>
</div>
</div>
</div>
Not sure of the best way to do this, but hoping someone can help. I have a for loop that grabs all the divs that precede a div with class "feed-shared-update-v2". This works:
elements = driver.find_elements(By.XPATH, "//*[contains(@class, 'feed-shared-update-v2')]//preceding::div[1]")
I then run a for loop over it:
for card in elements:
However, I'm having trouble trying to target the img and the second span in these for loops. I tried:
for card in elements:
    profilePic = card.find_element(By.XPATH, ".//following::div[@class='update-components-actor']//following::img[1]").get_attribute('src')
    text = card.find_element(By.XPATH, ".//following::div[@class='update-components-text']//following::span[2]").text
but this produces an error saying:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//following::div[@class='update-components-actor']//following::img[1]"}
So I'm hoping someone can point me in the right direction as to what I'm doing wrong. I know it's my XPath syntax and that I'm not allowed to chain "following"s (although even just trying .//following doesn't work, so is ".//" not the right syntax?), but I'm not sure what the right syntax should be, especially since the span does not have a class. :(
Thanks!
I guess you are overusing the following:: axis. Simply try the following (no pun intended):
For your first expression use
//*[contains(@class, 'feed-shared-update-v2')]/..
This will select the parent <div> of the <div class="feed-shared-update-v2">. So you will select the whole surrounding element.
To retrieve the children you want, use these XPaths: .//img/@src and .//span[2]. Full code is
for card in elements:
    profilePic = card.find_element(By.XPATH, ".//img").get_attribute('src')
    text = card.find_element(By.XPATH, ".//span[2]").text
That's all. Hope it helps.
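For completeness, a minimal end-to-end sketch (assuming the By import and that the page has finished loading):
from selenium.webdriver.common.by import By

elements = driver.find_elements(By.XPATH, "//*[contains(@class, 'feed-shared-update-v2')]/..")
for card in elements:
    profilePic = card.find_element(By.XPATH, ".//img").get_attribute('src')
    text = card.find_element(By.XPATH, ".//span[2]").text
    print(profilePic, text)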
It seems there is no div with the class update-components-text in your HTML.
Did you mean update-components-actor?
I'm not a big fan of XPath, but when I copied your HTML and your img selector, it did find 2 img elements for me. Maybe you are not waiting for the element to load, and then it fails?
Try using implicit/explicit waits in your code.
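A minimal sketch of an explicit wait (the 10-second timeout is an assumption):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait until at least one card image is present before scraping
WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".feed-shared-update-v2 img"))
)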
I know you are using XPath, but consider using CSS.
This might do the trick:
.feed-shared-update-v2 span:nth-of-type(2)
And if you want a CSS selector for the img:
.feed-shared-update-v2 img
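A minimal sketch of using those CSS selectors (assuming the By import and that the driver is already on the page):
for card in driver.find_elements(By.CSS_SELECTOR, ".feed-shared-update-v2"):
    src = card.find_element(By.CSS_SELECTOR, "img").get_attribute("src")
    text = card.find_element(By.CSS_SELECTOR, "span:nth-of-type(2)").text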

Locating an element using Python and Selenium via innerHTML

I'm new to Selenium and I'm trying to write my first real script using the package for Python.
I'm using:
Windows 10
Python 3.10.5
Selenium 4.3.0
So far I've been able to do everything I need with different selectors, like ID, name, XPath, etc.
However, I've stumbled upon an issue where I need to find a specific element by using its innerHTML.
The issue I'm facing is that I need to find an element with the innerHTML value "Changed", as seen in the HTML below.
The first challenge I'm facing is that the element doesn't have a unique ID, name or other identifier, and there are many objects/elements of "dlx-treeview-node".
The second challenge is that an absolute XPath won't work, because the element changes position depending on where you are on the website (the number of "dlx-treeview-node" elements changes), so if I use such an XPath I'll get the wrong element depending on where I am.
I can successfully get the name by using the below XPath, "get_attribute" and printing to console (which is why I know it's innerHTML and not innerText), but as mentioned this will change depending on where I am on the website.
I would really appreciate any help I can get to solve this challenge and to learn more about the use of Selenium with Python.
Code trials:
select_filter_name = wait.until(EC.element_to_be_clickable((By.XPATH, "/html/body/div/app-root/dlx-select-filter-attribute-dialog/dlx-dialog-window/div/div[2]/div/div/div[5]/div/div/dlx-view-column-selector-component/div[1]/dlx-treeview/div/dlx-treeview-nodes/div/dlx-treeview-nodes/div/dlx-treeview-node[16]/div/div/div/div[2]/div/dlx-text-truncater/div")))
filter_name = select_filter_name.get_attribute("innerHTML")
print(filter_name)
HTML:
<dlx-treeview-node _nghost-nrk-c188="" class="ng-star-inserted">
<div _ngcontent-nrk-c188="" dlx-droppable="" dlx-draggable="" dlx-file-drop="" class="d-flex flex-column position-relative dlx-hover on-hover-show-expandable-menu bg-control-active bg-control-hover">
<div _ngcontent-nrk-c188="" class="d-flex flex-row ml-2">
<div _ngcontent-nrk-c188="" class="d-flex flex-row text-nowrap expand-horizontal" style="padding-left: 15px;">
<!---->
<div _ngcontent-nrk-c188="" class="d-flex align-self-center ng-star-inserted" style="min-width: 16px; margin-left: 3px;">
<!---->
</div>
<!---->
<div _ngcontent-nrk-c188="" class="d-flex flex-1 flex-no-overflow-x" style="padding: 3.5px 0px;">
<div class="d-flex flex-row justify-content-start flex-no-overflow-x align-items-center expand-horizontal ng-star-inserted">
<!---->
<dlx-text-truncater class="overflow-hidden d-flex flex-no-overflow-x ng-star-inserted">
<div class="text-truncate expand-horizontal ng-star-inserted">Changed</div>
<!---->
<!---->
</dlx-text-truncater>
<!---->
</div>
<!---->
<!---->
<!---->
</div>
</div>
<!---->
<!---->
</div>
</div>
<!---->
<dlx-attachment-content _ngcontent-nrk-c188="">
<div style="position: fixed; z-index: 10001; left: -10000px; top: -10000px; pointer-events: auto;">
<!---->
<!---->
</div>
</dlx-attachment-content>
</dlx-treeview-node>
Edit-1:
NOTE: I'm not sure I'm using the correct terms for HTML, so please correct me if I'm wrong.
I've learned that I have a follow-up question:
How do I search for the text as described, but only within the "dlx-treeview-node" elements (there are about 100 of these)? So basically, searching in the "children" of these.
I'm asking because I've learned that there are more elements with the specific text I'm searching for in other places.
Edit-2/solution:
I ended up finding my own solution before I received answers; I'm writing it here in case it can help anyone else.
The reply that is marked as the "answer" is the one that came closest to what I needed.
The final code ended up like this (first searching the nodes - then searching the children for the specific innerHTML):
select_filter_name = wait.until(EC.element_to_be_clickable((By.XPATH, "//dlx-treeview-node[.//div[text()='Changed']]")))
Presuming the innerText of the <div> element is unique within the HTML DOM, you can locate the element with the innerHTML "Changed" using either of the following XPath-based locator strategies:
Using xpath and text():
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Changed']")))
Using xpath and contains():
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[contains(., 'Changed')]")))
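Both snippets assume the usual imports; a minimal runnable sketch:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.XPATH, "//div[text()='Changed']"))
)
print(element.get_attribute("innerHTML"))  # "Changed"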
Just run this code on your page and you will get an array of XPaths for all div elements whose text contains "Changed":
# Define an XPath helper function on the page (used in the next step)
driver.execute_script("""
    window.getXPathOfElement = function(elt) {
        var path = "";
        for (; elt && elt.nodeType == 1; elt = elt.parentNode) {
            // count preceding siblings with the same tag to build a 1-based index
            var idx = 1;
            for (var sib = elt.previousSibling; sib; sib = sib.previousSibling) {
                if (sib.nodeType == 1 && sib.tagName == elt.tagName) idx++;
            }
            var xname = elt.tagName;
            if (idx > 1) xname += "[" + idx + "]";
            path = "/" + xname + path;
        }
        return path;
    };
""")

# Get the XPaths of all div nodes whose text contains "Changed"
xpaths = driver.execute_script("""
    return Array.from(document.querySelectorAll("div"))
        .filter(el => el.textContent.includes('Changed'))
        .map(node => window.getXPathOfElement(node));
""")
Write-up:
The first execute_script adds a JavaScript function to the page called getXPathOfElement; this function accepts an HTML node element and returns the XPath string for that node.
The second execute_script gets all elements which are a div containing the text "Changed", loops through each of them, and gives you an array of strings, where each string is an XPath obtained by calling getXPathOfElement on the node.
The JS is quite simple and harmless.
Tips:
Check that the length of xpaths is greater than or equal to 1.
Index xpaths, such as xpaths[0], or loop over it to make your changes.
You will now have an XPath which can be used like a normal selector.
Good luck.
Edit 1
execute_script() synchronously executes JavaScript in the current window/frame.
or find more here
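A minimal sketch of consuming the returned list (assuming at least one match and the usual By import):
from selenium.webdriver.common.by import By

if len(xpaths) >= 1:
    element = driver.find_element(By.XPATH, xpaths[0])
    print(element.text)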

Get href from link above another element with Selenium

I'm using Selenium and I need to get an href from a link that sits above many tags!
But the only information that I can use, and that I know for sure, is the text "Test text!" from the h3 tag!
Here is the example:
<a href="/link/post" class="link" >
<div class="inner">
<div class="header flex">
<h3 class="mb-0">
Test text!
</h3>
</div>
</div>
</a>
Try using the following xpath to locate the desired element:
//a[@href and .//h3[contains(text(),'Test text!')]]
So, to get the href value you have to do:
from selenium.webdriver.common.by import By
href = driver.find_element(By.XPATH, "//a[@href and .//h3[contains(text(),'Test text!')]]").get_attribute("href")
An alternative to the approach in Prophet's answer would be to use an XPath like
//h3[contains(text(),'Test text!')]/ancestor::a
i.e. first search for the h3 tag and then for an a tag above it.
Prophet's answer uses the opposite approach: first find all a tags and then keep only the one with the correct h3 tag below.
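A minimal sketch of the ancestor-axis approach (assuming the By import):
from selenium.webdriver.common.by import By

link = driver.find_element(By.XPATH, "//h3[contains(text(),'Test text!')]/ancestor::a")
print(link.get_attribute("href"))  # "/link/post"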

Why did changing my xpath make my selenium click work consistently?

I am running a series of selenium tests with python. I have a navigation on the page I'm testing that has this structure:
<ul>
  <li class="has-sub">
    <a href="#">
      <span> First nav </span>
    </a>
    <ul style="display:block">
      <li>
        <a href="#">
          <span> First subnav </span>
        </a>
      </li>
      <li>...</li>
      <li>...</li>
      <li>...</li>
    </ul>
  </li>
  <li>...</li>
</ul>
Now I am clicking on the first subnav; that is, I click on First nav to open up the list, then on First subnav. I implement a WebDriverWait to wait for the element to be visible and click on it via its XPath,
//span[1]
I often got timeout exceptions waiting for the subnav span to be visible after clicking on the nav, which made me think something was wrong with clicking on the first nav to open up the list. So I changed the XPath of the first nav (//span[1]) to
//li[@class='has-sub']/descendant::span[text()='First subnav']
and now I never get timeout exceptions when waiting for the subnav span to be visible. So it seems like it's now clicking on the nav span every time to open it up, giving me no timeout when trying to get to the subnav. Does anyone have any idea why that is?
Here is my Python code as well:
Inside the LoadCommPage class:
def click_element(self, by, locator):
    try:
        WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((by, locator)))
        print "pressing element " + str(locator)
        self.driver.find_element(by, locator).click()
    except TimeoutException:
        print "no clickable element in 10 sec"
        print self.traceback.format_exc()
        self.driver.close()
Inside the main test (load_comm_page is an instance of LoadCommPage, where click_element is defined):
load_comm_page.click_element(*LoadCommPageLocators.sys_ops_tab)
And another class for the locators:
class LoadCommPageLocators(object):
firstnav_tab = (By.XPATH, "//li[@class='has-sub']/descendant::span[text()='First nav']")
XPath indexes begin at one, not zero, so the XPath
//span[1]
is looking for the first span element in the HTML, whereas
//span[2]
will look for the second span.
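A minimal sketch of waiting for and clicking the more specific locator (assuming the same WebDriverWait/EC imports as in the question):
locator = (By.XPATH, "//li[@class='has-sub']/descendant::span[text()='First nav']")
WebDriverWait(driver, 10).until(EC.visibility_of_element_located(locator))
driver.find_element(*locator).click()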

NoSuchElementException error when using "find_element_by_link_text"

Selenium fails to find element by link text:
time.sleep(3)
driver = self.driver
driver.implicitly_wait(10)
findHeaderLearn = driver.find_element_by_link_text('Learn')
findHeaderLearn.click()
pageTitle = driver.title
driver.back()
return pageTitle
I get this error:
raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: u'Unable to locate element: {"method":"link text","selector":"Learn"}' ; Stacktrace:
I read extensively through the web but can't find any hints as to why it can't find the element.
I added an "implicitly_wait(10)" to make sure the element is visible, but it didn't solve the problem.
Any other ideas?
Here is the HTML code:
<div class="l-wrap">
<h1 id="site-logo">
<div id="nav-global">
<h2 class="head">
<ul class="global-nav">
<li class="global-nav-item">
<li class="global-nav-item">
<a class="global-nav-link" href="/learn/">Learn</a> ======> im trying to find this element
</li>
<li class="global-nav-item">
<li class="global-nav-item">
<li class="global-nav-item global-nav-item-last buy-menu">
<li class="global-nav-item global-nav-addl">
</ul>enter code here
</div>
<a class="to-bottom" href="#l-footer">Jump to Bottom of Page</a>
</div>
Try a different locator, ideally a CSS selector or XPath. Don't use find_element_by_link_text.
CSS Selector:
findHeaderLearn = driver.find_element_by_css_selector("#nav-global a[href*='learn']")
XPath:
findHeaderLearn = driver.find_element_by_xpath(".//*[#id='nav-global']//a[contains(#href, 'learn')]")
# findHeaderLearn = driver.find_element_by_xpath(".//*[#id='nav-global']//a[text()='Learn']")
find_element_by_link_text might not be as reliable as a CSS selector or XPath. Also, verify that the element is not stale before performing any actions on it.
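A minimal sketch combining the CSS selector with an explicit wait (the 10-second timeout is an assumption):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

findHeaderLearn = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "#nav-global a[href*='learn']"))
)
findHeaderLearn.click()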
