How to get all texts with class jss160:
<div class="jss157 red">
<img class="jss158" src="/static/media/e18d9.png" alt="Izimu">
<span style="margin: 0px 3px; font-size: 10px;">:</span>
<span>
<span>
<span class="jss159">ご質問</span>
<span class="jss160">質問</span>
</span>
<span>
<span class="jss159">答え</span>
<span class="jss160 answer">絶対に。</span>
</span>
</span>
</div>
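My goal is to get the text of every element with class jss160. With my current code, the only output is 質問
With Selenium you could do the following (find_elements returns every match, not just the first):
from selenium.webdriver.common.by import By
texts = [el.text for el in browser.find_elements(By.CSS_SELECTOR, "span.jss160")]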
To understand this better, try to play with the CSS selector, for example with simple code like this with lxml.html and cssselect installed:
import lxml.html
tree = lxml.html.fromstring("""<div class="jss157 red">
<img class="jss158" src="/static/media/e18d9.png" alt="Izimu">
<span style="margin: 0px 3px; font-size: 10px;">:</span>
<span>
<span>
<span class="jss159">ご質問</span>
<span class="jss160">質問</span>
</span>
<span>
<span class="jss159">答え</span>
<span class="jss160 answer">絶対に。</span>
</span>
</span>
</div>""")
# [class='jss160'] compares the whole attribute value, so class="jss160 answer" is not matched
tree.cssselect("span[class='jss160']")[0].text
# '質問'
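To see why the exact attribute match returns only one text, it can help to look at class matching by token. The following is a minimal sketch using only the standard library's html.parser (a swapped-in technique, not lxml or Selenium). The class name ClassTextCollector and the VOID_TAGS set are made up for this illustration, and the sketch assumes the matching elements are properly closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; ignored when tracking nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list contains a token."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.texts = []
        self._depth = 0  # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
            return
        # Split the class attribute into tokens, like the .jss160 selector does.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.texts.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


html = """<div class="jss157 red">
<span><span class="jss159">ご質問</span><span class="jss160">質問</span></span>
<span><span class="jss159">答え</span><span class="jss160 answer">絶対に。</span></span>
</div>"""

collector = ClassTextCollector("jss160")
collector.feed(html)
print(collector.texts)
```

Because the class attribute is split into tokens, the span with class="jss160 answer" is collected as well, which is the same behaviour as the CSS selector span.jss160.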
IIUC, you are using Selenium WebDriver.
You can get all elements with class jss160 using
elements = driver.find_elements(By.CLASS_NAME, "jss160")
which returns a list, so get the text of each element using
[element.text for element in elements]
I am automating a site with Selenium and running into a problem: every time I refresh the page, the element ID changes. Whether I copy the XPath, the CSS selector, or the ID, they all contain a number that changes.
The code I'm using is simple. I just want to click the button, which I can do with
browser.find_element(by=By.CSS_SELECTOR, value='*VALUE HERE*').click()
but I don't know what to put as the value.
Here is the HTML code:
<a class="x4-tab x4-unselectable x4-box-item x4-tab-default x4-noicon x4-tab-noicon x4-tab-default-noicon x4-top x4-tab-top x4-tab-default-top x4-tab-after-title x4-active x4-tab-active x4-tab-default-active x4-top-active x4-tab-top-active x4-tab-default-top-active" hidefocus="on" unselectable="on" id="tab-1965" tabindex="0" data-ui-comp-name="wm-np-tab-wrkstn" style="right: auto; left: 66px; margin: 0px; top: 0px;">
<span id="tab-1965-btnWrap" role="presentation" class="x4-tab-wrap" unselectable="on">
<span id="tab-1965-btnEl" class="x4-tab-button" role="presentation">
<span id="tab-1965-btnInnerEl" class="x4-tab-inner x4-tab-inner-center"
unselectable="on">Workstations</span>
<span role="presentation" id="tab-1965-btnIconEl" class="x4-tab-icon-el " unselectable="on"
style="">
</span>
</span>
</span>
</a>
If you look at the HTML, anywhere you see that number 1965, that number will change if the page is refreshed. How do I make selenium find this element no matter what that number is?
Also, not sure if this matters but this is all in an iframe which I have selenium target by using
frame1 = browser.find_element(by=By.CLASS_NAME, value='defaultView')
browser.switch_to.frame(frame1)
Another problem is that the HTML is almost identical to that of the other buttons. The only differences between the buttons are that number (which changes) and the label, where it says "Workstations". Here is the button next to it, which is for servers.
<span id="tab-1964-btnWrap" role="presentation" class="x4-tab-wrap" unselectable="on">
<span id="tab-1964-btnEl" class="x4-tab-button" role="presentation">
<span id="tab-1964-btnInnerEl" class="x4-tab-inner x4-tab-inner-center"
unselectable="on">Servers</span>
<span role="presentation" id="tab-1964-btnIconEl" class="x4-tab-icon-el"
unselectable="on" style="">
</span>
</span>
</span>
</a>
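You can use XPath for this. Match on the parts of the id that stay constant:
browser.find_element(by=By.XPATH, value="//span[starts-with(@id, 'tab-') and contains(@id, '-btnEl')]").click()
Note that this clicks the first tab that matches, so add a condition on the tab's text if you need a specific one.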
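Since the tabs differ only in their label, one option is to build the XPath from the label text. This is a small sketch, not a tested locator for the real page: the helper name tab_xpath is made up, it assumes the label sits in a span with the x4-tab-inner class as in the HTML above, and it does not handle labels that contain a single quote.

```python
def tab_xpath(label):
    """Build an XPath for the <a> tab whose visible label equals `label`."""
    if "'" in label:
        # Quoting rules for XPath string literals are out of scope for this sketch.
        raise ValueError("labels containing a single quote are not handled")
    return (
        # The id prefix is stable even though the number after it changes.
        "//a[starts-with(@id, 'tab-')]"
        # Keep only the tab whose inner label span has exactly this text.
        f"[.//span[contains(@class, 'x4-tab-inner') and normalize-space()='{label}']]"
    )


print(tab_xpath("Workstations"))
```

After switching into the iframe, the result could be passed to Selenium as browser.find_element(by=By.XPATH, value=tab_xpath("Workstations")).click().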
This is an example of the HTML (I've tried to make it a lot neater than what it actually looks like):
<P>
random text
<br>
<br>
<i>Anonymous</i>
<span style="font-size: 10px; margin-left: 10px; color: #994;">Nov 30 12:46pm</span>
<span style="font-size: 10px; margin-left: 20px;">
<a style="color: #888; text-decoration: none;" title="Flag as offensive post"
href="/flag?a=248830&r=1">FLAG
</a>
</span>
<hr> **THIS IS THE TEXT I NEED**
<br>
<br>
<i>Anonymous</i>
<span style="font-size: 10px; margin-left: 10px; color: #994;">Nov 30 3:40pm</span>
<span style="font-size: 10px; margin-left: 20px;">
<a style="color: #888; text-decoration: none;" title="Flag as offensive post"
href="/flag?a=248830&r=2">FLAG
</a>
</span>
<hr>**THIS IS THE TEXT I NEED**
<br>
<br>
<script type="text/javascript">
<script type="text/javascript" src="//cdn.chitika.net/getads.js" async></script>
**THIS IS THE TEXT I NEED**
<br>
<br>
<i>Anonymous</i>
I'm trying to get the text from the hr tag. However, doing
for i in soup.find_all('hr'):
print(i.text)
does not work. Instead, I get a blank output.
I've also tried
soup.find('i').previousSibling
but that also outputs a blank. I'm not sure if that's because there are two <br> tags before it.
How can I get the **THIS IS THE TEXT I NEED**?
The text you need isn't inside the <hr>, which is an empty element; it's in the enclosing <p>. So you can get it like this:
soup = BeautifulSoup(doc, "html.parser")
ps = soup.findAll("p")
print(ps[0].getText())
Now considering that this prints:
random text
Anonymous
Nov 30 12:46pm
FLAG
**THIS IS THE TEXT I NEED**
Anonymous
Nov 30 3:40pm
FLAG
**THIS IS THE TEXT I NEED**
**THIS IS THE TEXT I NEED**
Anonymous
You'll need to parse out the text you need with something like:
import re
rawText = ps[0].getText()
matches = re.findall(r'\*\*.*\*\*',rawText)
for m in matches:
print(m)
Which prints out:
**THIS IS THE TEXT I NEED**
**THIS IS THE TEXT I NEED**
**THIS IS THE TEXT I NEED**
But you'll need to fish out your text some other way, because I doubt it is really surrounded by asterisks. Edit: as a side note, you can use soup.find instead of soup.findAll, but I don't think that really matters.
You could try just accessing the next element:
for hr in soup.find_all('hr'):
print(hr.next_element.get_text(strip=True))
For your HTML this displays:
**THIS IS THE TEXT I NEED**
**THIS IS THE TEXT I NEED**
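The same idea (take the text node that directly follows each <hr>) can also be sketched without BeautifulSoup, using the standard library's html.parser. This is only an illustration under assumptions: the class name TextAfterHr and the sample markup are made up, and it only picks up text that follows an <hr> with no other tag in between.

```python
from html.parser import HTMLParser


class TextAfterHr(HTMLParser):
    """Collect the first non-blank text node that directly follows each <hr>."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._after_hr = False

    def handle_starttag(self, tag, attrs):
        # Any tag other than <hr> means the text is no longer "directly after".
        self._after_hr = (tag == "hr")

    def handle_endtag(self, tag):
        # <hr/> triggers an end-tag callback too; do not reset the flag for it.
        if tag != "hr":
            self._after_hr = False

    def handle_data(self, data):
        if self._after_hr and data.strip():
            self.texts.append(data.strip())
            self._after_hr = False


sample = """<p>random text<br><br><i>Anonymous</i>
<hr> first reply<br><br><i>Anonymous</i>
<hr>second reply<br><br></p>"""

parser = TextAfterHr()
parser.feed(sample)
print(parser.texts)
```

Like the next_element approach, this misses text that comes after some other tag, such as the third reply after the <script> element in the question.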
I am trying to click on download button. The HTML Code for the button is as below:
<a class="x-btn toolbar-menu x-unselectable x-box-item x-toolbar-item x-btn-transparent-medium" style="padding: 0px 5px; right: auto; left: 1121px; margin: 0px; top: 0px;" hidefocus="on" unselectable="on" id="toolbarbutton-1054" tabindex="-1" data-qtip="<b>Export</b><br/>Export your report into a CSV file." componentid="toolbarbutton-1054">
<span id="toolbarbutton-1054-btnWrap" data-ref="btnWrap" role="presentation" unselectable="on" style="" class="x-btn-wrap x-btn-wrap-transparent-medium ">
<span id="toolbarbutton-1054-btnEl" data-ref="btnEl" role="presentation" unselectable="on" style="" class="x-btn-button x-btn-button-transparent-medium x-btn-no-text x-btn-icon x-btn-icon-left x-btn-button-center ">
<span id="toolbarbutton-1054-btnIconEl" data-ref="btnIconEl" role="presentation" unselectable="on" class="x-btn-icon-el x-btn-icon-el-transparent-medium sdc-icon-export " style=""> </span>
<span id="toolbarbutton-1054-btnInnerEl" data-ref="btnInnerEl" unselectable="on" class="x-btn-inner x-btn-inner-transparent-medium">
</span>
</span>
</span>
</a>
I tried this :
driver.find_element(By.ID , "toolbarbutton-1054-btnEl").click()
Getting an error: selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable
When I run the command below, it does not raise an error and the element is found. I just cannot click on it.
driver.find_element(By.ID , "toolbarbutton-1054-btnEl")
You can locate the button by its classes with a CSS selector. Chain the class names with dots, because a class-name lookup accepts only a single class name:
button = driver.find_element(By.CSS_SELECTOR, "a.x-btn.toolbar-menu")
button.click()
You can also use driver.find_element(By.XPATH, 'xpath here').
Actually, a span is not natively a clickable element.
Please execute the following JavaScript code using Python Selenium:
document.getElementById('toolbarbutton-1054-btnEl').click();
I don't know how to execute javascript code using python selenium but it will work.
I used it with C# Selenium.
In Python it should work with the following code:
s=driver.find_element(By.ID , "toolbarbutton-1054-btnEl")
driver.execute_script("arguments[0].click();",s)
I used the link below for my reference:
https://www.tutorialspoint.com/running-javascript-in-selenium-using-python
<div id="crmMasthead" tabindex="-1">
<div id="crmTopBar" class="ms-crm-TopBarContainer ms-crm-TopBarContainerGlobal newNavBarMode">
<div id="crmAppMessageBar" class="crmAppMessageBar" style="display: none; height: 0px;">
<div id="crmRibbonManager" currentribbonelement="commandContainer15" style="height: 62px; display: block; visibility: visible;">
<div id="commandContainer15" style="display: inline;">
<ul class="ms-crm-CommandBar-Menu" role="application">
<li id="ewrb_importfile|NoRelationship|HomePageGrid|Mscrm.HomepageGrid.ewrb_importfile.NewRecord" class="ms-crm-CommandBarItem ms-crm-CommandBar-Menu ms-crm-CommandBar-Button" tabindex="-1" title="New Create a new Import File record." command="ewrb_importfile|NoRelationship|HomePageGrid|Mscrm.NewRecordFromGrid" style="white-space: pre-line; display: inline-block;">
<span class="ms-crm-CommandBar-Button ms-crm-Menu-Label-Hovered" tabindex="-1" style="max-width:200px">
<a class="ms-crm-Menu-Label" tabindex="0" onclick="return false">
<img class="ms-crm-ImageStrip-New_16 ms-crm-commandbar-image16by16" tabindex="-1" src="/_imgs/imagestrips/transparent_spacer.gif" style="vertical-align:top"/>
<span class="ms-crm-CommandBar-Menu" tabindex="-1" style="max-width:150px" command="ewrb_importfile|NoRelationship|HomePageGrid|Mscrm.NewRecordFromGrid"> New </span>
<div class="ms-crm-div-NotVisible"> Create a new Import File record. </div>
</a>
</span>
</li>
The XPath I get looks like this:
.//*[@id='ewrb_importfile|NoRelationship|HomePageGrid|Mscrm.HomepageGrid.ewrb_importfile.NewRecord']/span/a
When I use it, Selenium doesn't click the button.
First, try removing the leading "." (dot) from your XPath and check whether it works.
Second, try writing the XPath yourself. For this <a> node, try this one:
//a[@class="ms-crm-Menu-Label"]
You should also check whether the part of the HTML you shared is inside an iframe. From the fragment you posted, it is not possible to tell, so please share more of the page.
It is also a good idea to check whether the button is visible. Finally, do you receive any error message? If so, share it.
I'm learning to extract content from a Website using Python and BeautifulSoup.
This is the HTML structure:
<div id="preview-prediction" class="two-cols rc-b rc-r">
<span style="position: absolute; top: 0.5em; left: 1em; color: #808080;">Prediction: </span>
<div class="home">
<div class="team-name">
<img src="http://164.177.157.12/img/teams/13.png" class="team-emblem">
Arsenal
</div>
<span class="predicted-score">2</span>
<div class="clear"></div>
</div>
<div class="away">
<span class="predicted-score">1</span>
<div class="team-name">
Liverpool
<img src="http://164.177.157.12/img/teams/26.png" class="team-emblem">
</div>
<div class="clear"></div>
</div>
</div>
I want to extract the exact text from a specific tag in the page. I cannot use find_all() or find() because the page has this complex structure, so I'm using the select() function with a CSS selector:
soup.select("#preview-prediction > .home > .team-name > .team-link")
The last class, team-link, contains the text I need to extract. How can I do this?
This would create a list of all the contents of selected tags.
>>> [i.text for i in soup.select('#preview-prediction > .home > .team-name > .team-link')]
['Arsenal']
OR
This would print the contents of first selected tag.
>>> soup.select('#preview-prediction > .home > .team-name > .team-link')[0].text
'Arsenal'