I have a text area backed by TinyMCE, which returns the entered text as HTML.
So if I enter this:
I, this is an example.
I receive: <p> I, this is an example.</p>
In my view I strip the HTML tags, and if the tag is "p" I replace it with a \n.
Result: I, this is an example.\n
But if I enter this:
I,
this is an example.
it returns:
<p> I,</p>
<p>this is an example.</p>
Result:
I,\n
this is an example.\n
I want to remove the extra line break, so that the result is: I,\n this is an example.\n
I would recommend fixing the <p> tag issue on the client side, as you can configure tinymce not to use <p> tags at all:
tinyMCE.init({
...
forced_root_block : false
});
http://www.tinymce.com/wiki.php/Configuration:forced_root_block
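If changing the editor configuration is not an option, the stray line break can also be dropped on the server. Below is a minimal sketch using only Python's standard html.parser, assuming the goal is exactly one \n per paragraph. The names ParagraphText and paragraphs_to_text are made up for this illustration and are not part of TinyMCE or any framework.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect text and emit one newline at the end of every <p>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False
            self.parts.append("\n")

    def handle_data(self, data):
        # Whitespace-only text between paragraphs (the stray line break
        # that separates </p> from the next <p>) is dropped.
        if self.in_p or data.strip():
            self.parts.append(data)


def paragraphs_to_text(markup):
    """Hypothetical helper: strip tags, keep one newline per paragraph."""
    parser = ParagraphText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(repr(paragraphs_to_text("<p>I,</p>\n<p>this is an example.</p>")))
```

Other tags are simply ignored here, so inline markup such as <strong> keeps its text but loses the tag.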
I'm learning to create an Omegle bot, but the Omegle interface was created in HTML and I don't know very much about HTML nor MechanicalSoup.
In the part where the text is inserted, the code snippet is as follows:
<td class="chatmsgcell">
<div class="chatmsgwrapper">
<textarea class="chatmsg " cols="80" rows="3"></textarea>
</div>
</td>
In the part of the button to send the text, the code snippet is:
<td class="sendbthcell">
<div class="sendbtnwrapper">
<button class="sendbtn">Send<div class="btnkbshortcut">Enter</div></button>
</div>
</td>
I want to set a text in textarea and send it via button.
Looking at some examples in HTML, I guess the correct way to set text in a textarea is as follows:
<textarea>Here's a text.</textarea>
Also, I'm new to MechanicalSoup, but I think I know how to find and set a value in HTML code:
# example in the Twitter interface
login_form = login_page.soup.find("form", {"class": "signin"})
LOGIN = "yourlogin"
login_form.find("input", {"name": "session[username_or_email]"})["value"] = LOGIN
From what I understand, the first argument is the name of the tag and a second argument is a dictionary whose first element is the name of the attribute and the second element is the value of the attribute.
But the textarea tag doesn't have an attribute for setting text, like value="Here's a text.". What should I do to set text in a textarea using MechanicalSoup?
I know it's not the answer you expect, but reading the doc would help ;-).
The full documentation is available at:
https://mechanicalsoup.readthedocs.io/
You probably want to start with the tutorial:
https://mechanicalsoup.readthedocs.io/en/stable/tutorial.html
In short, you need to select the form you want to fill-in:
browser.select_form('form[action="/post"]')
Then, filling-in fields is as simple as
browser["custname"] = "Me"
browser["custtel"] = "00 00 0001"
browser["custemail"] = "nobody@example.com"
browser["comments"] = "This pizza looks really good :-)"
I am trying to pull the text out of a tag that follows an element I'm starting with. The HTML looks like this, with multiple entries of the same structure:
<h5>
Title
</h5>
<div class="author">
<p>"Author A, Author B"</p>
</div>
<div id="abstract-more#####" class="collapse">
<p>
<strong>Abstract:</strong>
"Text here..."
</p>
<p>...</p>
So once I've isolated a given title element/node (stored as 'paper'), I want to store the author and abstract text. It works when I use this to get the author:
author = paper.find_element_by_xpath("./following::div[contains(@class, 'author')]/p").text
But it returns a blank output for 'abstract' when I use this:
abstract = paper.find_element_by_xpath("./following::div[contains(@id, 'abstract-more')]/p").text
Why does it work fine for the author but not for the abstract? I've tried using .// instead of ./ and other slight tweaks but to no avail. I also don't know why it's not giving an error out and saying it can't find the abstract element and is instead just returning a blank...
Try this:
//div[contains(@id, 'abstract-more')]/p[1]
Please use starts-with in xpath instead of contains.
XPath: .//div[starts-with(@id, 'abstract-more')]/p
abstract = paper.find_element_by_xpath(".//div[starts-with(@id, 'abstract-more')]/p").text
You can try this XPath:
//div[@class="author"]/following-sibling::div[contains(@id,'abstract-more')]/p[1]
In code:
abstract = paper.find_element_by_xpath("//div[@class='author']/following-sibling::div[contains(@id,'abstract-more')]/p[1]")
print(abstract.text)
I am trying to generate an HTML object using the html class from Python's py._xmlgen. In this HTML I want to place a <br> after each line of text.
The problem is that when running this code
from py._xmlgen import html
if __name__ == '__main__':
    br_standalone = html.br('some text between br tags')
    div_standalone = html.div('some more text between div tags')
    print(br_standalone)
    print(div_standalone)
    div_with_br = div_standalone
    div_with_br.append(br_standalone)
    print(div_with_br)
I get this output:
<br>some text between br tags</br>
<div>some more text between div tags</div>
<div>some more text between div tags<br>some text between br tags</br></div>
If you render this HTML in a browser and inspect the result, you will see that the closing </br> is also rendered as a second line break. So, for each html.br I get 2 line breaks instead of a single one.
How can I get the output to have just a single br tag for each html.br I use?
Thanks to user3080953's comment: "br tags shouldn't have content inside them", I modified the code.
In order to get a single br tag after the text, use the following empty tag:
br_standalone = html.br()
So, when using the following code:
from py._xmlgen import html
if __name__ == '__main__':
    div1 = html.div(class_='multiline_div')
    div1.append('text line #1')
    div1.append(html.br())
    div1.append('text line #2')
    div1.append(html.br())
    print(div1)
I got the following html:
<div class="multiline_div">
text line #1<br/>
text line #2<br/>
</div>
Note an additional finding here: you can append plain text to the html.div and then append a br tag, which creates multi-line text within the div tag.
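For comparison, the same structure can be produced without py._xmlgen. This is a sketch with the standard library's xml.etree.ElementTree, which is a different tool and not the py._xmlgen API. There, text that follows a <br> is stored in that element's tail. The helper name build_multiline_div is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_multiline_div(lines):
    """Return a <div> in which every line of text is followed by one <br>."""
    div = ET.Element("div", {"class": "multiline_div"})
    previous_br = None
    for line in lines:
        if previous_br is None:
            div.text = line          # text before the first child element
        else:
            previous_br.tail = line  # text that follows the previous <br>
        previous_br = ET.SubElement(div, "br")
    # method="html" writes <br> as a void element, with no closing tag
    return ET.tostring(div, encoding="unicode", method="html")


print(build_multiline_div(["text line #1", "text line #2"]))
```

The br elements are created empty, which matches the point made in the comment quoted above: a br tag should not have content inside it.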
I'm pulling contact information (text) from a website and I can currently pull all the class data, using the following XPath syntax:
//*[@id="nomapdata"]/div/div/div/div[2]/div[1]
Using this XPath expression for the element, I get the following text as the result:
Name
Title
Company Website
Phone Number
I want to pull each of these elements individually, but the problem is that the data is separated by <br> tags, and I haven't had success isolating each element.
Below is an example of the HTML structure:
<div class="col-sm-d">
"
Name"
<br>
"
Title"
<br>
<a href="www.website.com" target="_blank">http://www.website.com</a>
<br>
"
Phone: (555) 555-5555"
<br>
The only element I am able to isolate is the website.
How can I isolate each data in this scenario?
Try to get the list of text nodes as
driver.find_element_by_xpath('//*[@id="nomapdata"]/div/div/div/div[2]/div[1]').text.split("\n")
If there are more text nodes after the phone number which you don't want to use:
driver.find_element_by_xpath('//*[@id="nomapdata"]/div/div/div/div[2]/div[1]').text.split("\n")[:4]
You can use the same locator but get the innerHTML instead of .text. This will get you all the HTML between the open and close <DIV> tags. Then you can split the resulting string by <br> and you will have all the desired pieces. From your sample HTML, it looks like you will probably want to strip() each piece to remove spaces and you will have to process/parse the link portion however you need.
s = driver.find_element_by_xpath("//*[@id='nomapdata']/div/div/div/div[2]/div[1]").get_attribute("innerHTML")
data = [item.strip() for item in s.split("<br>")]
data will now be an array of strings, e.g.
['Name', 'Title', 'http://www.website.com', 'Phone: (555) 555-5555']
You can then process whatever else you want/need to.
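The splitting step can be tried without a browser. Here is a rough sketch that works on a plain string standing in for the innerHTML value. The sample string is an assumption modeled on the HTML in the question, and split_on_br is a hypothetical helper name. It also strips the remaining tags, such as the <a> around the website, and drops empty pieces.

```python
import re


def split_on_br(inner_html):
    """Split an HTML fragment on <br> and return the non-empty text pieces."""
    pieces = re.split(r"<br\s*/?>", inner_html, flags=re.IGNORECASE)
    cleaned = []
    for piece in pieces:
        # Remove any remaining tags (e.g. the <a ...> wrapper), then trim
        text = re.sub(r"<[^>]+>", "", piece).strip()
        if text:
            cleaned.append(text)
    return cleaned


# Assumed stand-in for what get_attribute("innerHTML") might return
sample = (
    "\nName\n<br>\nTitle\n<br>\n"
    '<a href="www.website.com" target="_blank">http://www.website.com</a>\n'
    "<br>\nPhone: (555) 555-5555\n<br>\n"
)
print(split_on_br(sample))
```

A regular expression is enough for a flat fragment like this one. For nested or irregular markup, a real HTML parser would be the safer choice.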
First, get the elements:
var elements = _webDriver.FindElements(By.XPath(@"//*[@id='nomapdata']/div/div/div/div[2]/div[1]"));
Second, split each element's text on line breaks and map the pieces to your class:
foreach (var element in elements)
{
    // Text holds the element's visible text, with one line per <br>-separated entry
    var temp = element.Text.Split('\n');
    YourClass yourClass = new YourClass
    {
        Name = temp[0],
        Title = temp[1],
        CompanyWebsite = temp[2],
        PhoneNumber = temp[3],
    };
    yourList.Add(yourClass);
}
I am having a problem with Python and the Scrapy library. When this code:
self.item['char_SP4_TIP'] = response.xpath('//p[contains(@class, "spell-tooltip")]/text()').extract()
runs, it extracts the text from the paragraph but it splits it by the <br> tags.
So instead of being able to access it as self.item['char_SP4_TIP'][0], I have to access [0], [1], [2], etc., for however many <br> tags there are. Is there any way to stop it from splitting on the <br> tags? Thanks.
Your xpath selects all text nodes, but a <br> is not a text node.
<p class='spell-description'> blah <br><br> blah2 </p>
Selects these ^^^^ ^^^^^
You can join the split text.
texts = response.xpath('//p[contains(@class, "spell-tooltip")]/text()').extract()
text = '\n'.join(texts)
If there are multiple <p> tags with that class:
text = ['\n'.join(p.xpath('./text()').extract())
        for p in response.xpath('//p[contains(@class, "spell-tooltip")]')]
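To see why the text comes back in pieces, here is a small sketch that uses the standard library's xml.etree.ElementTree rather than Scrapy. That swap is an assumption made so the example runs without a crawler. The markup is the sample paragraph written with self-closing <br/> tags so that it parses as XML, and it uses the spell-tooltip class from the question.

```python
import xml.etree.ElementTree as ET

markup = '<p class="spell-tooltip"> blah <br/><br/> blah2 </p>'
p = ET.fromstring(markup)

# itertext() yields each text node separately; the <br/> elements
# sit between the text nodes and contribute no text of their own.
pieces = [t.strip() for t in p.itertext() if t.strip()]

# Joining the pieces gives a single string per paragraph
text = "\n".join(pieces)
print(pieces)
print(repr(text))
```

The same idea carries over to the Scrapy selectors above: extract the text nodes of one paragraph, then join them into a single string.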