How to get all text inside a parent div with XPath - Python

I want to get all the text inside a div with XPath.
Here is the HTML code:
<div class="JobDescriptionsc__DescriptionContainer-sc-1jylha1-2 dGyoDf">
<div class="DraftEditorContainersc__DraftEditorContainer-sc-1x4uima-0 cGUaQf">
<div class="DraftEditor-root">
<div class="DraftEditor-editorContainer">
<div class="public-DraftEditor-content" contenteditable="false" spellcheck="false" style="outline:none;user-select:text;-webkit-user-select:text;white-space:pre-wrap;word-wrap:break-word">
<div data-contents="true">
#Here the all text
<div class="" data-block="true" data-editor="d54la" data-offset-key="bhkoa-0-0">
<div data-offset-key="bhkoa-0-0" class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr">
<span data-offset-key="bhkoa-0-0" style="font-weight:bold">
<span data-text="true">Job Description:</span>
</span>
</div>
</div>
<div class="" data-block="true" data-editor="d54la" data-offset-key="51e5u-0-0">
<div data-offset-key="51e5u-0-0" class="public-DraftStyleDefault-block public-DraftStyleDefault-ltr">
<span data-offset-key="51e5u-0-0">
<span data-text="true">· Identify & developed application base on predefined business requirements.</span>
</span>
</div>
</div>
...
#there's more, I'm just showing you a few
</div>
</div>
</div>
</div>
</div>
</div>
This is my XPath code:
dom_job.xpath('//*[@class="DraftEditorContainersc__DraftEditorContainer-sc-1x4uima-0 cGUaQf"]//text()')
I need all the text inside the parent div. Can this be done with XPath?

I'm assuming the Python module which provides your XPath interpreter supports XPath version 1. Your XPath expression below returns the set of all text nodes which are descendants of the div element:
//*[@class="DraftEditorContainersc__DraftEditorContainer-sc-1x4uima-0 cGUaQf"]//text()
You should be able to iterate over that collection of text nodes in Python and concatenate them into a single string.
But if you want the concatenated value of the text nodes within a particular div, it's simpler to apply the XPath string() function to the div, e.g.:
string(//*[@class="DraftEditorContainersc__DraftEditorContainer-sc-1x4uima-0 cGUaQf"])
See https://www.w3.org/TR/1999/REC-xpath-19991116/#function-string
Note that, in XPath 1, if you apply the string() function to a set containing more than one node (such as the set of text nodes returned by your first query), the function returns the string value of only the first node in document order.
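The two approaches above can be sketched with the standard library alone. This is only an illustration under stated assumptions: it uses xml.etree.ElementTree instead of the asker's XPath module, the SAMPLE markup and the class name "editor" are made-up stand-ins for the real page, and itertext() is used as a rough analogue of both //text() and string().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the question's markup.
SAMPLE = """
<div class="outer">
  <div class="editor">
    <div><span><span>Job Description:</span></span></div>
    <div><span><span>Identify and develop applications.</span></span></div>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)
container = root.find(".//div[@class='editor']")

# itertext() yields every descendant text node in document order,
# which mirrors what //text() selects below the container.
text_nodes = [t.strip() for t in container.itertext() if t.strip()]

# Joining the raw nodes mirrors string(container): one concatenated value.
full_text = "".join(container.itertext())

print(text_nodes)
```

The first form keeps the pieces separate, which is handy when each block should become its own line; the second gives one string, whitespace included.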

Related

XPath local-name() SyntaxError: The expression is not a legal expression

I'm trying to web scrape a table from an iframe. In order to switch the driver to that frame I'm using driver.find_element_by_xpath, but the problem is that the path in the html code includes some namespaces that I cannot get Python to figure out using the local-name() function.
Here is the chunk of the HTML I'm using:
<xbrl:campo-captura xbrl:solo-lectura="true" xbrl:id-hecho-plantilla="ar_pros_CorporateStructure_11933a35-3932-44c0-b394-f0ebd4f722d2"
id="8a97271e-df5c-4fbe-bedf-513ea1508bf2"><div>
<div>
<i style="cursor:pointer; float:right;margin-right:-20px;" id="d9fa20ae-c55f-4344-baf5-0112a13827b6" class="i i-arrow-down-2 botonDetalleOperacionXbrl">
</i>
<div id="abrir_nota_F2a26d5a7-2934-4ff0-86df-7a8983c05e47" style="cursor:pointer;float:right;margin-right:-20px;margin-top:20px;" data-toggle="tooltip" data-placement="right" title="Abrir nota">
<i class="fa fa-external-link"></i>
</div>
</div>
<div class="campoTextBlock">
<div id="F2a26d5a7-2934-4ff0-86df-7a8983c05e47">
<div class="celdaAnchoFijo textBlockLimit div-default divTextBlockMaximo" id="divAreaTextod9fa20ae-c55f-4344-baf5-0112a13827b6" style="overflow-y:hidden">
<iframe scrolling="no" id="frame_8a97271e-df5c-4fbe-bedf-513ea1508bf2" style="width:100%;height:100%" frameborder="0"></iframe>
</div>
</div>
</div>
<div>
</div>
</div></xbrl:campo-captura>
I want to get to the "iframe" using something like:
framLogin = driver.find_element_by_xpath('//[local-name()="campo-captura"][@*[local-name()="id-hecho-plantilla" and .="ar_pros_CorporateStructure_11933a35-3932-44c0-b394-f0ebd4f722d2"]]/div[2]/div/div/iframe')
The message I get is
Given xpath expression ... is invalid: SyntaxError: Document.evaluate: The expression is not a legal expression.
I've already looked for more information, but nothing I have found applies to Python.
I'm aware I could get to the iframe by using its id, but later on I want to loop over other URLs and scrape the same tables, which have the exact same format, and the iframe's id is not constant.
Your immediate syntax error can be fixed by changing
//[local-name()="campo-captura"]
to
//*[local-name()="campo-captura"]
  ^

CSS selector for a parent's own text

I want to get this figure: $185,000,000. Is there any way to get the text from the parent tag while avoiding the text from its child tags?
<div class="txt-block">
<h4 class="inline">Budget:</h4>
$185,000,000
<span class="attribute">(estimated)</span>
</div>
<div class="txt-block">
<h4 class="inline">Budget:</h4>
<span class="value">$185,000,000</span>
<span class="attribute">(estimated)</span>
</div>
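Yes, you can do this. Simply write
response.css('.txt-block::text').extract_first()
This selects only the text nodes that are direct children of .txt-block, so the text inside the child tags is skipped and you get $185,000,000. Whitespace-only text nodes may be selected as well, so you may need to strip the results. If you put a space between .txt-block and ::text (that is, '.txt-block ::text'), the text of the children is extracted too.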
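The idea of "the parent's own text" can also be sketched without Scrapy. The example below is a standard-library illustration, not the Scrapy API: own_text is a hypothetical helper, and it relies on ElementTree storing the text that follows a child tag in that child's .tail attribute.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<div class="txt-block">
  <h4 class="inline">Budget:</h4>
  $185,000,000
  <span class="attribute">(estimated)</span>
</div>
"""

def own_text(element):
    """Return the text that belongs to element itself, skipping child tags."""
    # element.text is the text before the first child; each child's .tail
    # is the text that follows that child inside the same parent.
    parts = [element.text] + [child.tail for child in element]
    return " ".join(p.strip() for p in parts if p and p.strip())

block = ET.fromstring(SAMPLE)
budget = own_text(block)
print(budget)
```

Whitespace-only pieces are dropped before joining, which is the same clean-up the selector result may need.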

Loop through tags inside tags in Selenium/Python

I am trying to use selenium to loop through a list of properties on a web page and return the property address and auction time. I have the following python code so far and html for the web page below.
I'm able to return the links to every property in the list, but can't seem to return the values I need from the "H4" tags. I think I'm doing something wrong with getting the elements by XPath, but I can't figure it out.
Any help would be greatly appreciated!
HTML:
<div data-elem-id="asset_list_content">
<a href="/details/123-memory-lane">
<div data-elm-id="asset_2352111_address" class="styles__address-container--2l39p styles__u-mr-1--3qZyj">
<h4 data-elm-id="asset_2352111_address_content_1" class="styles__asset-font-big--vQU7K">123 memory-lane</h4>
<label data-elm-id="asset_2352111_address_content_2" class="styles__asset-font-small--2JgrX">POWDER SPRINGS, GA 30127, Cobb County</label>
</div>
<div class="styles__auction-container--45DZU styles__u-ml-1--34mF_">
<h4 data-elm-id="asset_2352111_auction_date" class="styles__asset-font-big--vQU7K">Apr 04, 10:00am</h4>
</div>
</a>
<a href="/details/456-memory-lane">
<div data-elm-id="asset_8463157_address" class="styles__address-container--2l39p styles__u-mr-1--3qZyj">
<h4 data-elm-id="asset_8463157_address_content_1" class="styles__asset-font-big--vQU7K">456 memory-lane</h4>
<label data-elm-id="asset_8463157_address_content_2" class="styles__asset-font-small--2JgrX">POWDER SPRINGS, GA 30127, Cobb County</label>
</div>
<div class="styles__auction-container--45DZU styles__u-ml-1--34mF_">
<h4 data-elm-id="asset_8463157_auction_date" class="styles__asset-font-big--vQU7K">March 10, 10:00am</h4>
</div>
</a>
</div>
Python (Selenium):
propertyList = browser.find_elements_by_xpath('//div[@data-elm-id="asset_list_content"]')
for element in propertyList:
    propertyLinks = element.find_elements_by_tag_name('a')
    for propertyLink in propertyLinks:
        propertyAddress = propertyLink.get_element_by_xpath('//h4[1]')
        propertyAuctionTime = propertyLink.get_element_by_xpath('//h4[2]')
        print(propertyAddress).text
        print(propertyAuctionTime).text
Output:
propertyAddress = propertyLink.get_element_by_xpath('//h4[1]')
AttributeError: 'WebElement' object has no attribute 'get_element_by_xpath'
The error occurs because get_element_by_xpath() is not a valid WebElement method. Earlier in your code you used find_elements_by_xpath(), which returns a list; to find the single element you are looking for, use find_element_by_xpath() instead.
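Once the method name is fixed, the paths themselves may still need attention: an XPath that starts with // searches the whole document even when it is called on an element, whereas .// searches only below that element, and //h4[2] matches an h4 that is the second h4 child of its own parent. The sketch below illustrates scoping the search to each link. It uses xml.etree.ElementTree on a trimmed, hypothetical copy of the markup rather than Selenium, so treat it as an analogy, not as the Selenium API.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the listing markup from the question.
SAMPLE = """
<div data-elm-id="asset_list_content">
  <a href="/details/123-memory-lane">
    <div><h4>123 memory-lane</h4><label>POWDER SPRINGS, GA 30127</label></div>
    <div><h4>Apr 04, 10:00am</h4></div>
  </a>
  <a href="/details/456-memory-lane">
    <div><h4>456 memory-lane</h4><label>POWDER SPRINGS, GA 30127</label></div>
    <div><h4>March 10, 10:00am</h4></div>
  </a>
</div>
"""

root = ET.fromstring(SAMPLE)
listings = []
for link in root.findall("./a"):
    # ".//h4" is evaluated relative to this <a>, so each link yields
    # only its own headings, in document order.
    headings = link.findall(".//h4")
    address, auction_time = headings[0].text, headings[1].text
    listings.append((link.get("href"), address, auction_time))

for listing in listings:
    print(listing)
```

In Selenium the equivalent would presumably be to collect the h4 elements below each link with a path beginning with .// and index into the resulting list.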

Find the elements only after a specific text in html using selenium python

Let's say I have the following HTML code:
<div class="12">
<div class="something"></div>
</div>
<div class="12">
<div class="34">
<span>TODAY</span>
</div>
</div>
<div class="12">
<div class="something"></div>
</div>
<div class="12">
<div class="something"></div>
</div>
Now, if I use driver.find_elements_by_class_name("something"), I get every element with that class in the HTML. But I want only the elements that come after a specific word ("TODAY"). How can I exclude the ones that appear before that word? The following divs could be at any nesting level.
You can use search by XPath as below:
driver.find_elements_by_xpath('//*/text()[.="some specific word"]/following-sibling::div[@class="something"]')
Note that you might need some modifications if your real HTML differs from the simplified HTML provided.
Update
Replace following-sibling with following if the required div nodes are not siblings:
driver.find_elements_by_xpath('//*/text()[.="some specific word"]/following::div[@class="something"]')
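What the following axis does here can be sketched as a document-order walk. This is a standard-library approximation, not Selenium code: elements_after_marker is a hypothetical helper, the body wrapper and the id attributes are added only so the matches can be told apart, and unlike following:: this walk would also accept descendants of the marker element.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
  <div class="12"><div class="something" id="before"/></div>
  <div class="12"><div class="34"><span>TODAY</span></div></div>
  <div class="12"><div class="something" id="after-1"/></div>
  <div class="12"><div class="something" id="after-2"/></div>
</body>
"""

def elements_after_marker(root, marker_text, class_name):
    """Collect elements with class_name that come after the marker text."""
    seen_marker = False
    matches = []
    # iter() walks the tree in document order, the same order the
    # XPath following:: axis relies on.
    for element in root.iter():
        if not seen_marker:
            seen_marker = (element.text or "").strip() == marker_text
        elif element.get("class") == class_name:
            matches.append(element)
    return matches

root = ET.fromstring(SAMPLE)
found = [e.get("id") for e in elements_after_marker(root, "TODAY", "something")]
print(found)
```

The element with id "before" is skipped because it is visited before the marker text is seen.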

How to uniquely identify xpath for multiple tags and multiple values

I have the following HTML:
<div class='content active'>
<div>
<div class='var'>
<div class='field var-field'>
<label>Interface Name</label>
<div class='ui input'>
<input type='input' placeholder='.*' value> ==$0
</div>
</div>
</div>
</div>
<div>
<div class='var'>
<div class='field var-field'>
<label>Neighbor Id</label>
<div class='ui input'>
<input type='input' placeholder='.*' value> ==$0
</div>
</div>
</div>
</div>
</div>
I need to send text to the text box with label: Interface Name.
Is there a way to write an XPath that uniquely identifies that text box so I can send text to it?
Note that the label is the only thing that identifies each field uniquely. The other attributes in the tags are the same for both.
I tried using the AND operator, with no luck.
Please help me out here.
Try this:
//label[text()='Interface Name']/following-sibling::div/child::input
To send text to the <input> element located by its <label>, you can create a function as follows:
def test_me(myText):
    driver.find_element_by_xpath("//label[.='" + myText + "']//following::div[1]/input").send_keys("hello")
Now you can call this function from anywhere in your script as follows:
test_me("Interface Name")
# or
test_me("Neighbor Id")
You can use this XPath: //*[text()='Interface Name']/following-sibling::div/input
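The label-based lookup in these answers can be tried offline with a small sketch. It assumes a simplified, well-formed version of the markup and uses xml.etree.ElementTree rather than Selenium; input_for_label is a hypothetical helper, and setting the value attribute merely stands in for send_keys.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical version of the form from the question.
SAMPLE = """
<div class="content active">
  <div class="field var-field">
    <label>Interface Name</label>
    <div class="ui input"><input type="input" placeholder=".*"/></div>
  </div>
  <div class="field var-field">
    <label>Neighbor Id</label>
    <div class="ui input"><input type="input" placeholder=".*"/></div>
  </div>
</div>
"""

def input_for_label(root, label_text):
    """Find the <input> that shares a parent with the given <label>."""
    # [label='...'] keeps only the wrappers whose <label> child has that text;
    # the input is then reached through the sibling <div>.
    return root.find(f".//div[label='{label_text}']/div/input")

root = ET.fromstring(SAMPLE)
field = input_for_label(root, "Interface Name")
field.set("value", "hello")  # stand-in for Selenium's send_keys("hello")

values = [i.get("value") for i in root.iter("input")]
print(values)
```

Only the input next to the "Interface Name" label is touched; the "Neighbor Id" input keeps no value.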
