I want to scrape the text of a few fields based on their web elements (XPath, classes, etc.).
<div class = myOnlyElement>
<div> ......
<div class = afafasf> ......</div>
<div class = klklkl> ......
<div class = qwqwqwq> ......
<div class = reaction> text i need</div>
</div>
</div>
</div>
</div>
<div class = myElement>
<div> ......
<div class = dfdfdf> ......</div>
<div class = ghgghghg> ......
<div class = erererere> ......
<div class = reaction> text i don't need</div>
</div>
</div>
</div>
</div>
Suppose the page markup looks like this. I find the outer element like this:
myelem = driver.find_element_by_class_name('myOnlyElement')
Now I only want to pick the element with class "reaction" that holds the text I need.
This is what I am doing:
myelem.find_element_by_class_name('reaction')
If this class is present, it captures it, but in some cases it picks the class = "reaction" element whose text is "text i don't need".
I hope I have stated my question clearly. Can you please help me?
The easiest approach for this kind of task: right-click the text on the web page and inspect it, then right-click the node in the DOM inspector and choose Copy -> Copy Full XPath. You might then need .text or .source to get the values, so try it and experiment.
To print the text "text i need" you can use either of the following Locator Strategies:
Using css_selector and get_attribute():
print(driver.find_element_by_css_selector("div.myOnlyElement div.reaction").get_attribute("innerHTML"))
Using xpath and text attribute:
print(driver.find_element_by_xpath("//div[@class='myOnlyElement']//div[@class='reaction']").text)
Ideally, to print the text "text i need" you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR and get_attribute():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.myOnlyElement div.reaction"))).get_attribute("innerHTML"))
Using XPATH and text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='myOnlyElement']//div[@class='reaction']"))).text)
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
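As a quick offline check of the scoping idea, the sketch below runs the same descendant selector against a well-formed stand-in for the question's markup using Python's built-in xml.etree.ElementTree. The SAMPLE markup and the reaction_texts helper are illustrative assumptions, and ElementTree only understands a subset of XPath, so this checks the selector logic rather than replacing the Selenium call.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the page in the question (class names copied from it).
SAMPLE = """
<body>
  <div class="myOnlyElement">
    <div><div class="klklkl"><div class="reaction">text i need</div></div></div>
  </div>
  <div class="myElement">
    <div><div class="ghgghghg"><div class="reaction">text i don't need</div></div></div>
  </div>
</body>
"""

# Only "reaction" elements nested somewhere under "myOnlyElement" are selected.
SCOPED_XPATH = ".//div[@class='myOnlyElement']//div[@class='reaction']"


def reaction_texts(markup, xpath=SCOPED_XPATH):
    """Return the text of every element matched by xpath, in document order."""
    root = ET.fromstring(markup)
    return [element.text for element in root.findall(xpath)]


print(reaction_texts(SAMPLE))
```

Passing the unscoped path ".//div[@class='reaction']" to the same helper returns both texts, which is the ambiguity the scoped selector avoids.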
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
Outro
Link to useful documentation:
get_attribute() method Gets the given attribute or property of the element.
text attribute returns The text of the element.
Difference between text and innerHTML using Selenium
Related
I'm trying to pull a specific number out of a div in Python Selenium but can't figure out how to do it. I want to get the "post_parent" ID 947630 whenever it belongs to a "post_name" number starting with 09007.
I'm looking to do this across multiple "post_name" values, so I'd feed it something like search_text = "0900766b80090cb6". There will be more of them in the future, so it has to read the "post_name" first and then pull the matching "post_parent", if that makes sense.
Appreciate any advice anyone has to offer.
<div class="hidden" id="inline_947631">
<div class="post_title">Interface Converter</div>
<div class="post_name">0900766b80090cb6</div>
<div class="post_author">28</div>
<div class="comment_status">closed</div>
<div class="ping_status">closed</div>
<div class="_status">inherit</div>
<div class="jj">06</div>
<div class="mm">07</div>
<div class="aa">2001</div>
<div class="hh">15</div>
<div class="mn">44</div>
<div class="ss">17</div>
<div class="post_password"></div>
<div class="post_parent">947630</div>
<div class="page_template">default</div>
<div class="tags_input" id="rs-language-code_947631">de</div>
</div>
Notice that <div class="post_name">0900766b80090cb6</div> and <div class="post_parent">947630</div> are sibling nodes of each other.
You can use xpath -> following-sibling like this:
Code:
search_text = "0900766b80090cb6"
post_parent_num = driver.find_element(By.XPATH, f"//div[@class='post_name' and text()='{search_text}']//following-sibling::div[@class='post_parent']").text
print(post_parent_num)
or Using ExplicitWait:
search_text = "0900766b80090cb6"
post_parent_num = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, f"//div[@class='post_name' and text()='{search_text}']//following-sibling::div[@class='post_parent']"))).get_attribute('innerText')
print(post_parent_num)
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
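Since the question mentions feeding in several post_name values later, the lookup above could be wrapped as in the following sketch. The names post_parent_xpath and collect_post_parents are hypothetical, the code assumes the post_name values never contain a single quote, and the Selenium imports sit inside the function only so that the XPath builder can be used on its own.

```python
def post_parent_xpath(post_name):
    """Build the XPath for the post_parent sibling of an exact post_name."""
    return (
        f"//div[@class='post_name' and text()='{post_name}']"
        "/following-sibling::div[@class='post_parent']"
    )


def collect_post_parents(driver, post_names):
    """Map each post_name to its post_parent text, or None when nothing matches."""
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    results = {}
    for name in post_names:
        try:
            element = driver.find_element(By.XPATH, post_parent_xpath(name))
        except NoSuchElementException:
            results[name] = None
            continue
        # textContent is readable even when the surrounding <div> is hidden.
        results[name] = element.get_attribute("textContent").strip()
    return results


print(post_parent_xpath("0900766b80090cb6"))
```

With a live driver, a call such as collect_post_parents(driver, ["0900766b80090cb6"]) would return a dictionary keyed by post_name.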
Update:
NoSuchElementException:
Please check in the dev tools (Google Chrome) whether the XPath matches a unique entry in the HTML DOM.
XPath that you should check:
//div[@class='post_name' and text()='0900766b80090cb6']//following-sibling::div[@class='post_parent']
Steps to check:
Press F12 in Chrome -> go to the Elements section -> press CTRL + F -> paste the XPath and see whether your desired element is highlighted with 1/1 matching node.
If //div[@class='post_name' and text()='0900766b80090cb6']//following-sibling::div[@class='post_parent'] is unique, then you need to check the conditions below as well.
Check if it's in any iframe/frame/frameset.
Solution: switch to iframe/frame/frameset first and then interact with this web element.
Check if it's in any shadow-root.
Solution: Use driver.execute_script('return document.querySelector to have a web element returned, and then operate on it accordingly.
Make sure that the element is rendered properly before interacting with it. Put some hardcoded delay or Explicit wait and try again.
Solution: time.sleep(5) or
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='post_name' and text()='0900766b80090cb6']//following-sibling::div[@class='post_parent']"))).text
Check whether you have been redirected to a new tab or window without switching to it; if so, you will likely get a NoSuchElementException.
Solution: switch to the relevant window/tab first.
If you have switched to an iframe and the new desired element is not in the same iframe context then first switch to default content and then interact with it.
Solution: switch to default content and then switch to respective iframe.
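The uniqueness check from the steps above can also be scripted instead of done by hand in the dev tools. The following is a minimal sketch: match_status is a hypothetical helper name, FakeDriver is only a stand-in so the logic can be tried without a browser, and the literal "xpath" is the string value of By.XPATH.

```python
def match_status(driver, xpath):
    """Report whether an XPath matches zero, one, or several elements."""
    # find_elements returns an empty list instead of raising, so counting is safe.
    count = len(driver.find_elements("xpath", xpath))
    if count == 0:
        return "missing"
    if count == 1:
        return "unique"
    return f"ambiguous ({count} matches)"


class FakeDriver:
    """Stand-in for a WebDriver that pretends `matches` elements were found."""

    def __init__(self, matches):
        self.matches = matches

    def find_elements(self, by, value):
        return [object() for _ in range(self.matches)]


print(match_status(FakeDriver(1), "//div[@class='post_parent']"))
```

A "missing" result on a real driver points toward the iframe, shadow-root, timing, or window checks listed above, while "ambiguous" suggests the locator needs to be narrowed.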
I don't see any specific relation between "post_parent" ID 947630 and "post_name" number starting 09007. Moreover, the parent <div> has class="hidden".
However, to pull the specific number you can use either of the following locator strategies:
Using css_selector:
print(driver.find_element(By.CSS_SELECTOR, "div[id^='inline'] div.post_parent").text)
Using xpath:
print(driver.find_element(By.XPATH, "//div[starts-with(@id, 'inline_')]//div[@class='post_parent']").text)
Ideally you need to induce WebDriverWait for the presence_of_element_located() and you can use either of the following locator strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div[id^='inline'] div.post_parent"))).text)
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//div[starts-with(@id, 'inline_')]//div[@class='post_parent']"))).text)
Note: You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can create a method and use the following xpath to get the post_parent text based on post_name text.
def getPostPatent(postname):
    element = driver.find_element(By.XPATH, "//div[@class='post_name' and starts-with(text(),'{}')]/following-sibling::div[@class='post_parent']".format(postname))
    print(element.get_attribute("textContent"))
getPostPatent('09007')
This will print the value if the post_name text starts with '09007'.
Because the parent element is hidden, you need to use textContent to get the value.
<div>
<div class="alk_dvImage"><img class="alk_prImg" src="https://a random photo" alt="a random product">
</div>
<div class="product-score"></div>
<a href="/products/" class="alk_prName alk_pr" title="Products Title">Strong Graphic Card
</a>
</div>
Let's assume we have the HTML given above. I want to extract the title of the 'a' element, which is nested in a div. I also want the class of this same element. However, when I try this code:
browser.find_element_by_css_selector('a.alk_prName alk_pr')
it does not return anything. By the way, I couldn't find any way to get the title of the a element.
What happens?
You're not chaining the classes with dots in your selector. Try the following:
browser.find_element_by_css_selector('a.alk_prName.alk_pr').get_attribute("title")
Example:
from selenium import webdriver
browser = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')
html_content = """
<a href="/products/" class="alk_prName alk_pr" title="Products Title">Strong Graphic Card</a>
"""
browser.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
browser.find_element_by_css_selector('a.alk_prName.alk_pr').get_attribute("title")
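The class-chaining rule can be captured in a small helper that converts a raw class attribute value into a compound selector. This is only a sketch, and css_for_classes is a hypothetical name. The reason the original selector failed is that a space in CSS is the descendant combinator, so 'a.alk_prName alk_pr' looks for an <alk_pr> element inside the link.

```python
def css_for_classes(tag, class_attr):
    """Turn a tag name and a raw class attribute value into a CSS selector."""
    # Each whitespace-separated class becomes its own ".name" segment.
    return tag + "".join(f".{name}" for name in class_attr.split())


selector = css_for_classes("a", "alk_prName alk_pr")
print(selector)  # a.alk_prName.alk_pr
```

The resulting string can be passed to find_element_by_css_selector in place of the hand-written selector.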
To print the value of the title attribute i.e. Products Title you can use either of the following Locator Strategies:
Using css_selector:
print(driver.find_element(By.CSS_SELECTOR, "a.alk_prName.alk_pr[href='/products/']").get_attribute("title"))
Using xpath:
print(driver.find_element(By.XPATH, "//a[@class='alk_prName alk_pr' and @href='/products/'][contains(., 'Strong Graphic Card')]").get_attribute("title"))
Ideally you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.alk_prName.alk_pr[href='/products/']"))).get_attribute("title"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='alk_prName alk_pr' and @href='/products/'][contains(., 'Strong Graphic Card')]"))).get_attribute("title"))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
I hope you're doing well and can help me with this short question.
I'm trying to locate the following object, id=C39_W133_V136_thtmlb_button_27, using the text located inside a span (text = "Edit"). I have tried different approaches, but none has worked so far. Any ideas?
<a href="javascript:void(0)" class="th-bt th-bt-icontext-dis icon-font" tabindex="-1" oncontextmenu="return false;" ondragstart="return false;" id="C39_W133_V136_thtmlb_button_27">
::before
<img class="th-bt-img" src="/SAP/BC/BSP/SAP/thtmlb_styles/sap_skins/belize/images/1x1.png">
<span class="th-bt-span"><b class="th-bt-b">Edit</b></span>
<b class="th-bt-b">Edit</b>
</a>
In order to locate an element using text contained in an element, the only option is to use XPath.
//a[./b[.='Edit']]
^ Start at the top of the document and find an A tag
    ^ ...that has a child B tag
        ^ ...whose text is exactly 'Edit'
To locate the <a> element that has a descendant <span> with the text Edit, you need to induce WebDriverWait for the visibility_of_element_located() and you can use the following Locator Strategy:
Using XPATH:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[contains(@id, 'thtmlb_button')][.//b[text()='Edit']]")))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
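To see that a text predicate selects the outer <a> rather than the inner <b>, the idea can be tried offline with Python's built-in xml.etree.ElementTree. This is a sketch under assumptions: the SAMPLE markup is a trimmed, well-formed version of the question's snippet, the second link with id "other_button" is invented for contrast, and ElementTree's [b='...'] predicate is used as a close equivalent of the full XPath above because ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the question's markup; "other_button" is invented for contrast.
SAMPLE = """
<div>
  <a id="C39_W133_V136_thtmlb_button_27" class="th-bt">
    <span class="th-bt-span"><b class="th-bt-b">Edit</b></span>
    <b class="th-bt-b">Edit</b>
  </a>
  <a id="other_button" class="th-bt"><b class="th-bt-b">Save</b></a>
</div>
"""


def ids_of_links_labelled(markup, label):
    """Return the id of every <a> that has a direct <b> child with this exact text."""
    root = ET.fromstring(markup)
    # The predicate filters the <a> elements; the <b> is only used as a condition.
    return [link.get("id") for link in root.findall(f".//a[b='{label}']")]


print(ids_of_links_labelled(SAMPLE, "Edit"))
```

Only the id of the Edit button is returned, which mirrors what the Selenium locator hands back as the element to interact with.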
How can I filter elements that have the same class?
<html>
<body>
<p class="content">Link1.</p>
</body>
</html>
<html>
<body>
<p class="content">Link2.</p>
</body>
</html>
You can try to get the list of all elements with class = "content" by using find_elements_by_class_name:
a = driver.find_elements_by_class_name("content")
Then you can click on the link that you are looking for.
By.CLASS_NAME was not yet mentioned:
from selenium.webdriver.common.by import By
driver.find_element(By.CLASS_NAME, "content")
This is the list of attributes which can be used as locators in By:
CLASS_NAME
CSS_SELECTOR
ID
LINK_TEXT
NAME
PARTIAL_LINK_TEXT
TAG_NAME
XPATH
As per the HTML:
<html>
<body>
<p class="content">Link1.</p>
</body>
</html>
<html>
<body>
<p class="content">Link2.</p>
</body>
</html>
Two <p> elements have the same class, content.
So to filter the elements having the same class i.e. content and create a list you can use either of the following Locator Strategies:
Using class_name:
elements = driver.find_elements_by_class_name("content")
Using css_selector:
elements = driver.find_elements_by_css_selector(".content")
Using xpath:
elements = driver.find_elements_by_xpath("//*[@class='content']")
Ideally, to click on the element you need to induce WebDriverWait for the visibility_of_all_elements_located() and you can use either of the following Locator Strategies:
Using CLASS_NAME:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "content")))
Using CSS_SELECTOR:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".content")))
Using XPATH:
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@class='content']")))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
References
You can find a couple of relevant discussions in:
How to identify an element through classname even though there are multiple elements with the same classnames using Selenium and Python
Unable to locate element using className in Selenium and Java
What are properties of find_element_by_class_name in selenium python?
How to locate the last web element using classname attribute through Selenium and Python
Use nth-child, for example: http://www.w3schools.com/cssref/sel_nth-child.asp
driver.find_element(By.CSS_SELECTOR, 'p.content:nth-child(1)')
or http://www.w3schools.com/cssref/sel_firstchild.asp
driver.find_element(By.CSS_SELECTOR, 'p.content:first-child')
The simplest way is to use find_element_by_class_name('class_name')
The first answer has been deprecated, and the other answers only return one result. This is the correct answer:
driver.find_elements(By.CLASS_NAME, "content")
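Once find_elements has returned every element with the shared class, the list still has to be narrowed to the one that is wanted. One possible way, sketched here, is to filter on the visible text. pick_by_text is a hypothetical helper, and the SimpleNamespace objects are stand-ins for WebElements so the filter can be tried without a browser.

```python
from types import SimpleNamespace


def pick_by_text(elements, wanted):
    """Return the elements whose text equals `wanted` after trimming whitespace."""
    return [element for element in elements if element.text.strip() == wanted]


# Stand-ins for the two <p class="content"> elements from the question.
fake_elements = [SimpleNamespace(text="Link1."), SimpleNamespace(text=" Link2. ")]

matches = pick_by_text(fake_elements, "Link2.")
print(len(matches))
```

With a real driver, the first argument would be the list returned by driver.find_elements(By.CLASS_NAME, "content").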
I want to extract the first span, which contains the text "Extract this text". I already tried:
element.find_element_by_css_selector(".moreContent span:nth-child(1)").text.strip('"')
This is not working, and I am not sure why. The output is just empty.
<p class="mainText">
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
<span class="moreEllipses">… </span>
<span class="moreContent">
<span> Extract this text </span>
<span class="link moreLink">Show More</span>
</span>
</p>
However, I am getting this, so Selenium finds the element. Why is the output empty?
<selenium.webdriver.remote.webelement.WebElement (session="e7012b303842651848aa0b0e40f5d5c1", element="df5644e9-fc98-4300-ad86-9ff433154d82")>
EDIT:
I managed to solve this by clicking the Show More button. For some reason I can't extract the content if it is not visible, even if it is present in the page.
As per your cssSelector, it seems you are targeting the element below:
<span> Extract this text </span>
You can use below Xpath:
(//p[@class='mainText']//span[@class='moreContent']/span)[1]
OR
(//span[@class='moreContent']/span)[1]
Example Code:
element = driver.find_element_by_xpath("(//p[@class='mainText']//span[@class='moreContent']/span)[1]").text
To extract the text from the first <span> i.e. Extract this text you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR and text property:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.mainText span.moreContent>span"))).text)
Using XPATH and get_attribute() method:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[@class='mainText']//span[@class='moreContent']/span"))).get_attribute("innerHTML"))
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
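The empty output described in the question's edit matches how the text property behaves: it returns only rendered text, so a span that is in the DOM but not displayed yields an empty string. A fallback to textContent, as sketched below, is one way to read such content without clicking Show More. text_or_text_content is a hypothetical helper and FakeHiddenElement is a stand-in for a real WebElement.

```python
def text_or_text_content(element):
    """Prefer the rendered text; fall back to the DOM textContent if it is empty."""
    visible = element.text.strip()
    if visible:
        return visible
    # .text is empty for elements that are not displayed; textContent is not.
    raw = element.get_attribute("textContent")
    return (raw or "").strip()


class FakeHiddenElement:
    """Stand-in for a WebElement that is present in the DOM but not displayed."""

    text = ""

    def get_attribute(self, name):
        return " Extract this text " if name == "textContent" else None


print(text_or_text_content(FakeHiddenElement()))
```

When waiting for such an element, presence_of_element_located is the matching condition, since visibility_of_element_located would time out on content that is not displayed.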