Python, return new line in function return - python

I have code like this `
celldata = ""
count = 0
for tableData in y:
    count = count + 1
    strcount = str(count)
    celldata += strcount + ")" + tableData.text + "\n"
return celldata
`
I am returning the value to be used in Flask. I want each row produced by the for loop to appear on a new line, but even with \n the Flask web app shows celldata on one single line, with one space between each row's output.
Here is my current output for celldata in flask web
1)xxxx 2)yyyy
I want the flask web url to return
1)xxxx
2)yyyy

You're presumably returning HTML, and viewing that HTML in a browser.
In HTML, all runs of whitespace are equivalent—there's no difference between '\n' and ' '. The browser should convert them all to single spaces, and then decide how to flow the results nicely.
So, you're going to have to learn some basic HTML. But here are a few quick hints to get you started:
<p>one paragraph</p> <p>another paragraph</p> defines two separate paragraphs.
<p>one paragraph<br />with a line break in the middle</p> defines a paragraph with a line break in the middle.
<table><tr><td>row one</td></tr> <tr><td>row two</td></tr></table> defines a table of two rows (and one column).
The last one is the most complicated, but given that you've got things named tableData and celldata, I suspect it may be what you actually want here.
HTML itself only specifies "structure", not layout. It's up to the browser to decide what "two paragraphs" or "a line break" or "two rows" actually means in terms of actual pixels. If you want finer control, you need to learn CSS as well as HTML, which lets you specify explicit styles for these elements.
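As a rough illustration of the table hint above, here is a minimal sketch that builds one table row per entry. The helper name `rows_to_table` and the sample strings are assumptions, not part of the original code. It uses `html.escape` from the standard library so that text containing `<` or `&` is displayed rather than treated as markup.

```python
from html import escape


def rows_to_table(entries):
    """Return an HTML table with one numbered row per entry (hypothetical helper)."""
    rows = []
    for count, text in enumerate(entries, start=1):
        # Escape the text so characters like < and & are shown literally.
        rows.append("<tr><td>{}) {}</td></tr>".format(count, escape(text)))
    # The newlines only make the HTML source readable; the browser ignores them.
    return "<table>\n{}\n</table>".format("\n".join(rows))


print(rows_to_table(["xxxx", "yyyy"]))
```

In the question's code, the list passed in would presumably be something like `[tableData.text for tableData in y]`.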

If you are trying to format this as HTML, I would suggest you add <br /> also to the returned text:
celldata = []
for count, tableData in enumerate(y, start=1):
    celldata.append('{}) {}<br/>'.format(count, tableData.text))
return '\n'.join(celldata)
This first builds a list of entries with the correct numbering, and then joins the entries together with newlines. The newline is purely cosmetic and will only affect how the HTML appears when viewed as source. It is the <br /> which ensures each entry appears on a different line.
enumerate() is used to automatically count your entries for you.

Related

Can i create a list column in SQLAlchemy

I want to input the content of a text area and want to output it on another page but seems like there is nothing like multiline-text area in Flask. When I do the following
content = request.form['content']
it returns a string with line breaks as '\n', but when I try to output that content after replacing \n with an HTML line-break tag, it doesn't seem to work.
So I thought I can store the multiline content in the form of a list.
So is there db column for the list, something like
content = db.Column(db.list(String))
or is there any other alternative.
Just to clarify, to the computer these 2 text examples are exactly equivalent:
myString = """Hello
World"""
myString = "Hello\nWorld"
We can confirm this by checking the repr value for both versions
repr(myString)
# 'Hello\nWorld'
Whether or not the newlines are rendered in a "friendly" way is entirely dependent on how you choose to display the string. In HTML, a line break is written as a <br> tag, so one option would be to insert <br> tags and store the resulting HTML-formatted string in your database. However, this may pose a security hazard, either by allowing malicious links to be made clickable or by allowing JavaScript snippets to be executed when the page is rendered.
The simplest solution would be to use the HTML <pre> tag, which tells the browser to keep the whitespace and line breaks exactly as they appear in the text. Using the same myString value as before, we can display it nicely with
<pre>
{{ myString }}
</pre>
using the Jinja2 syntax, as long as we pass this string to the render_template function, for example
from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def index():
    myString = "Hello\nWorld"
    return render_template("index.html", myString=myString)
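If you would rather have <br> tags than a <pre> block, one way to address the security concern mentioned above is to escape the text first and only then insert the tags. The following is a minimal sketch using only the standard library; the function name `newlines_to_br` is an assumption made for illustration.

```python
from html import escape


def newlines_to_br(text):
    """Escape user text first, then turn each newline into a <br> tag."""
    # Escaping first means any tags typed by the user are displayed, not executed.
    safe = escape(text)
    # Browsers usually submit textarea content with \r\n line endings.
    safe = safe.replace("\r\n", "\n").replace("\r", "\n")
    return safe.replace("\n", "<br>\n")


print(newlines_to_br("Hello\nWorld <script>"))
```

In a Jinja2 template with autoescaping turned on, the result would also need to be marked as safe (for example with the `safe` filter), otherwise the <br> tags themselves get escaped.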

Using the Join command to eliminate extra paragraph breaks

So I have this text:
'
Location
Address
Number
Website
'
The top and bottom lines are actually empty; the single quotes are only there to mark them. I basically want each line to follow the previous one without any blank lines in between. This is what I would like it to look like.
Location
Address
Number
Website
I want to strip all of the line breaks and just have each result one line after another. This is the code to scrape the information from a webpage.
results = soup.findAll('div', class_='name')
for each in results:
    worksheet.write(row,1,each.text)
    row += 1
Each time I run through this, I want the results to print one line after another. Thanks.
Is there a reason you cannot use a simple if?
results = soup.findAll('div', class_='name')
for each in results:
    if each.text:
        worksheet.write(row,1,each.text)
        row += 1
To join the text of the results with a line break, use:
'\n'.join(each.text for each in results)
To join with new lines and collapse any repeated new lines, use:
import re
line = re.sub(r"\n+", "\n", '\n'.join(each.text for each in results))
This is useful if you don't know how many new lines exist between the pieces of text (it reduces multiple newlines to one).
Also the answer given by Malvolio is to avoid the blank line while writing:
if each.text:
This line checks whether each has any text; if it doesn't, the indented statements below it are skipped.
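If the blank lines are inside the text of a single element, as the quoted example in the question suggests, checking for empty text is not enough. A minimal sketch of stripping them, using a plain string in place of each.text (the helper name `nonblank_lines` and the sample string are assumptions):

```python
def nonblank_lines(text):
    """Return the non-empty lines of text, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Stand-in for each.text: leading, trailing and repeated blank lines.
scraped = "\nLocation\n\nAddress\nNumber\n\n\nWebsite\n"
print("\n".join(nonblank_lines(scraped)))
```

Each item of the returned list could then be written to its own worksheet row, or the joined string written to a single cell.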

How do I preserve new lines when extracting text from html using lxml.text_content()

I am trying to learn to use Whoosh. I have a large collection of html documents I want to search. I discovered that the text_content() method creates some interesting problems for example I might have some text that is organized in a table that looks like
<html><table><tr><td>banana</td><td>republic</td></tr><tr><td>stateless</td><td>person</td></table></html>
When I take the original string, build the tree, and then use text_content to get the text in the following manner
mytree = html.fromstring(myString)
text = mytree.text_content()
The results have no spaces (as should be expected)
'bananarepublicstatelessperson'
I tried to insert new lines using string.replace()
myString = myString.replace('</tr>','</tr>\n')
I confirmed that the new line was present
'<html><table><tr><td>banana</td><td>republic</td></tr>\n<tr><td>stateless</td><td>person</td></table></html>'
but when I run the same code from above the line feeds are not present. Thus the resulting text_content() looks just like above.
This is a problem for me because I need to be able to separate words. I thought I could add non-breaking spaces after each td, and line breaks after rows as well as after body elements etc., to get text that reasonably conforms to my original source.
I will note that I did some more testing and found that line breaks inserted after paragraph tag closes were preserved. But there is a lot of text in the tables that I need to be able to search.
Thanks for any assistance
You could use this solution:
import re
def striphtml(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)
>>> striphtml('I Want This <b>text!</b>')
'I Want This text!'
Found here: using python, Remove HTML tags/formatting from a string
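Stripping the tags alone would still run the table cells together. One possible alternative, sketched here with the standard library's html.parser rather than lxml, is to emit a separator whenever certain tags close. The class name, the helper name and the choice of separator tags are assumptions.

```python
from html.parser import HTMLParser


class SpacedTextExtractor(HTMLParser):
    """Collect text, adding a space after each cell and a newline after each row."""

    # Closing tags that are followed by a separator (an assumed, partial list).
    SEPARATORS = {"td": " ", "th": " ", "tr": "\n", "p": "\n"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.SEPARATORS:
            self.parts.append(self.SEPARATORS[tag])


def extract_text(markup):
    parser = SpacedTextExtractor()
    parser.feed(markup)
    parser.close()
    # Tidy each line: collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


sample = ("<html><table><tr><td>banana</td><td>republic</td></tr>"
          "<tr><td>stateless</td><td>person</td></table></html>")
print(extract_text(sample))
```

For the table from the question this keeps the words separate and puts each row on its own line, which should be enough for indexing.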

Interpreting nested HTML <blockquote>s in Python?

I have a web app that reads from the Tumblr API and reformats the way that "reblog chains" are formatted.
With Tumblr, commentary for a post is stored as HTML blockquotes. As users respond to the commentary above, another level gets added to the blockquote chain, eventually resulting in many nested reblog chains.
Here is an example of how a "reblog chain" looks in plain HTML:
<p><a class="tumblr_blog" href="http://chainsaw-police.tumblr.com/post/96158438802/example-tumblr-post">chainsaw-police</a>:</p><blockquote>
<p><a class="tumblr_blog" href="http://example-blog-domain.tumblr.com/post/96158384215/example-tumblr-post">example-blog-domain</a>:</p><blockquote>
<p>Here is an example of a Tumblr post.</p> <p>It can have multiple <p> elements sometimes. It may only have one, though, at other times.</p>
</blockquote>
<p>This is an example of a user “reblogging” a post. As you can see, the previous comment is stored above as a <blockquote>.</p>
</blockquote>
<p>This is another reblog. As you can see, all of the previous comments are stored as blockquotes, with earlier ones being residing deeper in the nest of blockquotes.</p>
And this is what it looks like when rendered.
I want to be able to reformat the reblog chain so that it looks more like this:
example-blog-domain:
Here is an example of a Tumblr post.
It can have multiple <p> elements sometimes. It may only have one, though, at other times.
chainsaw-police:
This is an example of a user “reblogging” a post. As you can see, the previous comment is stored above as a <blockquote>.
example-blog-domain:
This is another reblog. As you can see, all of the previous comments are stored as blockquotes, with earlier ones being residing deeper in the nest of blockquotes.
I know, It's an incredibly confusing structure, hence why I'm trying to write something to make it more readable.
Is there any way to interpret the HTML and split the reblogs up into individual "comments"? For example, having an array or dict that has the username and the commentary would be more than enough. However, after messing with lxml and BeautifulSoup for months, I'm at my wits' end.
If there was even a way to do it in CSS, which I highly doubt, that would be fine.
Thanks in advance, everyone!
I don't think CSS has such functionality.
You need to parse the HTML into a structure with lxml, ... and then render it; that is the easier way. You could also write a regexp-based filter that does not let invalid pieces of HTML through.
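As a rough sketch of the "parse it into a structure" idea, here is one way it might look using the standard library's html.parser instead of lxml. It assumes the layout shown in the question: each author link has exactly the class tumblr_blog, sits in its own <p>, and is immediately followed by the <blockquote> holding that author's comment. The class name `ReblogParser`, the function `parse_reblogs` and the shortened sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class ReblogParser(HTMLParser):
    """Split a nested-blockquote reblog chain into (author, paragraphs) pairs."""

    def __init__(self, latest_label="Latest"):
        super().__init__()
        # Stack of open comments; the bottom entry is the newest reblog.
        self.stack = [{"author": latest_label, "paragraphs": []}]
        self.comments = []          # finished comments, oldest first
        self.pending_author = ""    # name from the most recent tumblr_blog link
        self.in_author_link = False
        self.skip_paragraph = False
        self.buffer = None          # text of the <p> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            # Text directly inside belongs to the author named just before.
            self.stack.append({"author": self.pending_author, "paragraphs": []})
        elif tag == "p":
            self.buffer, self.skip_paragraph = [], False
        elif tag == "a" and dict(attrs).get("class") == "tumblr_blog":
            # A paragraph holding an author link is a heading, not a comment.
            self.in_author_link, self.skip_paragraph = True, True
            self.pending_author = ""

    def handle_endtag(self, tag):
        if tag == "blockquote" and len(self.stack) > 1:
            self.comments.append(self.stack.pop())
        elif tag == "a":
            self.in_author_link = False
        elif tag == "p" and self.buffer is not None:
            if not self.skip_paragraph:
                self.stack[-1]["paragraphs"].append("".join(self.buffer).strip())
            self.buffer = None

    def handle_data(self, data):
        if self.in_author_link:
            self.pending_author += data
        elif self.buffer is not None:
            self.buffer.append(data)


def parse_reblogs(markup):
    parser = ReblogParser()
    parser.feed(markup)
    parser.close()
    # The outermost (newest) comment is still on the stack; it goes last.
    return [(c["author"], c["paragraphs"]) for c in parser.comments + parser.stack[:1]]


sample = (
    '<p><a class="tumblr_blog" href="#">chainsaw-police</a>:</p><blockquote>'
    '<p><a class="tumblr_blog" href="#">example-blog-domain</a>:</p><blockquote>'
    '<p>Original post.</p><p>Second paragraph.</p></blockquote>'
    '<p>First reblog.</p></blockquote>'
    '<p>Newest reblog.</p>'
)
for author, paragraphs in parse_reblogs(sample):
    print(author, paragraphs)
```

Because inner blockquotes close before outer ones, the comments come out oldest first, and the list of (author, paragraphs) pairs can then be rendered however you like.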
reddit user /u/joyeusenoelle has answered my question over at /r/LearnPython using a tonne of convoluted regexes that end up looking more like a voodoo magic spell than a text manipulation script.
Lots of regexes later, I think I've solved this for an
arbitrarily-deep comment chain.
import re
with open("tcomment.txt","r") as tf:
    text = ""
    for line in tf:
        text += line
tf.close()
text = text.replace("\n","")
text = text.replace(">",">\n")
text = text.replace("<","\n<")
text = re.sub(r"</p>\s*<p>","<br><br>", text)
text = text.replace("<p>\n", "")
text = text.replace("</p>\n","\n")
text = re.sub("<[/]{0,1}blockquote>","<chunk>",text)
text = re.sub("<a class=\"tumblr_blog\"[^>]+?>","<chunk>",text)
text = text.replace("</a>","")
text = re.sub("\n+","", text)
text = re.sub(r"\s{2,}"," ", text)
text = re.sub(r"<chunk>\s*<chunk>","<chunk>",text)
bits = text.split("<chunk>")
bits[0] = "Latest:"
comments = []
for i in range(len(bits)):
    temp = ""
    j = 0 - (i+1)
    if (len(bits)-i) > i:
        temp = "<b>" + bits[i] + "</b> " + bits[j]
        comments.append(temp)
comments.reverse()
for comment in comments:
    print("<p>%s</p>" % (comment))
    print()
The line bits[0] = "Latest:" can be changed to whatever you want the
most recent comment to display, and you'll probably want to change how
the text comes into the script.
For the text you provided, this gives me:
<p><b>example-blog-domain:</b> Here is an example of a Tumblr post.<br><br>It can have multiple <p> elements sometimes. It may
only have one, though, at other times.
<p><b>chainsaw-police:</b> This is an example of a user "reblogging" a post. As you can see, the previous comment is stored
above as a <blockquote>.
<p><b>Latest:</b> This is another reblog. As you can see, all of the previous comments are stored as blockquotes, with earlier ones
being residing deeper in the nest of blockquotes.
e: Some thoughts: this is in Python 3, but everything but the print
statements should work in Python 2, I think. I used text.split()
whenever possible because direct string manipulation is typically
faster than regular expressions are, but that may not be appropriate
here. And finally, it's possible that I'm making more work for myself
than I need to in the substitutions section, but at this point I've
looked at the code too long to figure out if it could be slimmed down.

Improper output while calling ajax request and appending output

Ajax output is
\u001b[1mGetting NS records for yahoo.com\u001b[0m\n\n\n\nIp Address\tServer Name\n\n----------\t-----------\n\n68.180.131.16\tns1.yahoo.com\n\n98.138.11.157\tns4.yahoo.com\n\n203.84.221.53\tns3.yahoo.com\n\n68.142.255.16\tns2.yahoo.com\n\n119.160.247.124\tns5.yahoo.com\n\n202.43.223.170\tns6.yahoo.com\n\n\n\nZone Transfer not enabled\n\n
When I append into html it looks like
[1mGetting NS records for yahoo.com[0m Ip Address Server Name ---------- ----------- 68.180.131.16 ns1.yahoo.com 98.138.11.157 ns4.yahoo.com 203.84.221.53 ns3.yahoo.com 68.142.255.16 ns2.yahoo.com 119.160.247.124 ns5.yahoo.com 202.43.223.170 ns6.yahoo.com Zone Transfer not enabled
"\t" and "\n" don't seem to be working.
Please help.
HTML does not render tabs and line breaks. For a line break in HTML, use <br>. There are no tabs in HTML, but if you just want to insert some spaces, you can use &nbsp; for each blank space (of course, you can always insert a single space, but multiple spaces will get collapsed unless you explicitly use &nbsp;).
Another option is to wrap your text in a <pre></pre> element to display the text exactly as you have it formatted in the HTML source (you may need to play with the CSS if you don't like the default formatting of <pre> content). web2py also includes a CODE() helper, which uses <pre> but also enables line numbers and syntax highlighting.
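The output in the question also contains terminal colour codes (the \u001b[1m and \u001b[0m parts), which <pre> will not hide. A minimal server-side sketch in Python, assuming the text can be cleaned before it is sent back to the page, might look like this. The function name and the regular expression are illustrative and not part of web2py.

```python
import re
from html import escape

# Matches ANSI "CSI" sequences such as ESC[1m (bold) and ESC[0m (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def terminal_output_to_pre(raw):
    """Strip ANSI colour codes, escape the text, and wrap it in <pre>."""
    plain = ANSI_CSI.sub("", raw)
    # Inside <pre>, the remaining tabs and newlines are shown as they are.
    return "<pre>{}</pre>".format(escape(plain))


raw = "\u001b[1mGetting NS records for yahoo.com\u001b[0m\n\nIp Address\tServer Name\n"
print(terminal_output_to_pre(raw))
```

The returned string can be appended to the page as HTML; the tabs and newlines are kept because they end up inside the <pre> element.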
