How to handle line-breaks in HTML forms? - python

I have a form with a textarea and need to submit multiple lines of input to the textarea.
I use :
rows = [('a','b'), ('c','d')]
data_set = [ '%s\n' % '|'.join(row) for row in rows ] # Note : ADDED '\n'
data_dump = ''.join(data_set)
from mechanize import Browser
br = Browser()
br.open('http://example.com/page.html')
br.select_form(nr=1)
br.form['my_text_area']=data_dump
br.submit()
Problem:
The web server does not see the input as multiple lines.
The added \n does not simulate line breaks in the input.
What am I doing wrong?
Feel free to ask for more info if I have missed something !
Update
I also tried \n\r in place of \n, but the problem persists.

I figured it out with the help of https://stackoverflow.com/users/87015/salman-a
CR = \r
LF = \n
HTML forms submit a line break as CRLF, so:
\r\n worked!
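A minimal sketch of the fix, reusing the rows and data_dump names from the question. Only the string building is shown; the mechanize calls stay as they were, and the trailing CRLF after the last row is kept from the original code.

```python
# Join each row with '|' and end every line with CRLF,
# the line break that HTML form submissions use.
rows = [('a', 'b'), ('c', 'd')]
data_dump = ''.join('|'.join(row) + '\r\n' for row in rows)

print(repr(data_dump))
```

The resulting string can be assigned to br.form['my_text_area'] exactly as before.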

Related

How to convert text from shell to HTML?

This is probably a really easy question, but I'm new to Python, so bear with me.
What I need to do: execute a command in the shell and output the result as a telegra.ph page.
Problem: the telegra.ph API ignores \n, so all the output text ends up on one line.
I use the Python Telegraph API wrapper: https://github.com/python273/telegraph
I understand that I need to convert my text to HTML-like markup and remove <p> tags. I've tried some scripts, but my program gave me this error:
telegraph.exceptions.NotAllowedTag: span tag is not allowed
So I removed all span tags and got the same result as without converting.
Then I tried replace("\n", "<p>") but got stuck on the closing tags...
Code:
import subprocess
from telegraph import Telegraph
telegraph = Telegraph()
telegraph.create_account(short_name='1111')
tmp = subprocess.run("arp", capture_output=True, text=True, shell=True).stdout
print( '\n\n\n'+tmp+'\n\n\n\n') ### debug line
response = telegraph.create_page(
    'Random',
    html_content='<p>' + tmp + '</p>'
)
print('https://telegra.ph/{}'.format(response['path']))
The closest HTML equivalent to \n is the "hard break" <br/> tag.
It does not require a closing tag, because it contains nothing and directly signifies a line break.
Assuming it is supported by telegra.ph, you could simply do:
tmp = tmp.replace('\n', '<br/>')
Add this line to convert all intermediate newlines to individual <p>-sections:
tmp = "</p><p>".join(tmp.split("\n"))
tmp.split("\n") splits the string into a list of lines.
"</p><p>".join(...) glues everything together again, closing the previous <p> section and starting a new one.
This way, the example works for me and line breaks are correctly displayed on the page.
EDIT: As the other answer suggests, of course you can also use <br/> tags. It depends on what you want to achieve!
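Both conversions can be sketched as small helpers. The function names and the sample text are made up for illustration. Escaping with html.escape and stripping the trailing newline are additions that the answers above do not include. Whether telegra.ph accepts <br/> is still an assumption.

```python
import html


def with_breaks(text: str) -> str:
    """Turn each newline into a <br/> tag (hard line break)."""
    escaped = html.escape(text.rstrip("\n"))
    return escaped.replace("\n", "<br/>")


def with_paragraphs(text: str) -> str:
    """Wrap each line in its own <p>...</p> section."""
    escaped = html.escape(text.rstrip("\n"))
    return "<p>" + "</p><p>".join(escaped.split("\n")) + "</p>"


# Hypothetical stand-in for the captured shell output
sample = "a b\nc d\n"
print(with_breaks(sample))
print(with_paragraphs(sample))
```

Because with_paragraphs already adds the outer tags, its result would be passed as html_content without the extra '<p>' + ... + '</p>' wrapping.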
It is not clear to me why the telegraph module replaces newlines with spaces. In this case it seems reasonable to disable this functionality.
import subprocess
import re
import telegraph
from telegraph import Telegraph
telegraph.utils.RE_WHITESPACE = re.compile(r'([ ]{10})', re.UNICODE)
telegraph = Telegraph()
telegraph.create_account(short_name='1111')
tmp = subprocess.run("/usr/sbin/arp",
                     capture_output=True,
                     text=True,
                     shell=True).stdout
response = telegraph.create_page(
    'Random',
    html_content='<pre>' + tmp + '</pre>'
)
print('https://telegra.ph/{}'.format(response['path']))
This would produce a page that comes close to the actual formatted arp output.

Python Mechanize write to TinyMCE Text Editor

I'm using Python Mechanize for adding an event to WordPress but I can't seem to figure out how to write to the TinyMCE Editor in the 'Add New' Event section.
I've been able to make a draft so far by just setting the Title with some value for testing purposes but I am stuck here. What I've done so far is...
br = mechanize.Browser()
response = br.open(url)
Intermediate steps to get to the correct page that don't need to be listed...
Once on the correct page I choose the form that I want to work with, select it and set the title. Once I submit I can actually travel to my drafts section in my normal chrome/firefox browser to see a draft has been created.
for f in br.forms():
    if f.name == postForm:
        print(f)
        br.select_form(f.name)
        br.form['post_title'] = 'Creating from MECHANIZE'
br.submit(name='save', label='Save Draft')
What would be the intermediary steps to input data into the TinyMCE editor?
I realized that by writing:
br.form['content'] = "some content"
You are able to write to the custom textarea. Any HTML content that you have in triple double-quotes will show up as you want once you submit the post.

Python (Anaconda Spyder) Turkish Character issue

I have a problem with Turkish characters in Python 3.5.
You can see the issue in the pictures. How can I fix this?
My code is below. In the last lines, print(blink1.text) prints corrupted characters, but print("çÇğĞıİuÜoÖşŞ") prints fine, even though the characters are the same.
from bs4 import BeautifulSoup
import requests
r = requests.get("http://www.ensonhaber.com/son-dakika")
soup = BeautifulSoup(r.text)
for tag in soup.find_all("ul", attrs={"class": "ui-list"}):
    for link1 in tag.find_all('li'):
        for link2 in link1.find_all('a', href=True):
            print("www.ensonhaber.com" + link2['href'])
            print("\n")
            print(link2['title'])
        for link3 in link1.find_all('span', attrs={"class": "spot"}):
            # summary section: print(link3.text)
            print("\n")
            rbodysite = "http://www.ensonhaber.com" + link2['href']
            rbody = requests.get(rbodysite)
            soupbody = BeautifulSoup(rbody.text)
            for btag in soupbody.find_all("article", attrs={"class": ""}):
                for blink1 in btag.find_all("p"):
                    print(blink1.text)
print("çÇğĞıİuÜoÖşŞ")
My output :
Hangi Åehirde çekildiÄi bilinmeyen videoda bir çocuk, ailesiyle yolculuk yaparken gördüÄü trafik polisinin üÅüdüÄünü düÅünerek gözyaÅlarına boÄuldu. Trafik polisi, yanına gelen çocuÄu "Ben üÅümüyorum" diyerek teselli etti.
çÇğĞıİuÜoÖşŞ
The problem is almost certainly a wrong code page. Python is code-page agnostic, and neither print nor BeautifulSoup is going to fix it for you.
The site seems to serve all pages in UTF-8, so I think your terminal uses something else. I don't know which character sets contain ı, but the locations of the corrupted characters and their values suggest Windows-1254. You need to call iconv, but first you need to read the <meta charset= tag, because the page won't always be UTF-8. On the other side, you also need to know your terminal's encoding, which is harder to get.
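To illustrate the kind of mismatch described above, here is a small sketch that reproduces similar corruption in pure Python. It assumes the bytes are UTF-8 and are wrongly read as Windows-1254 (cp1254). The sample word is taken from the question's output, and the exact pair of encodings involved in the original problem is not confirmed.

```python
# Original text, as the site is assumed to send it (UTF-8 bytes).
text = "üşüdüğünü"
raw = text.encode("utf-8")

# Decoding with the wrong code page yields mojibake.
garbled = raw.decode("cp1254")

# Reversing the wrong step and decoding correctly restores the text.
repaired = garbled.encode("cp1254").decode("utf-8")

print(garbled)
print(repaired)
```

The same idea applies to downloaded pages: decode the raw bytes with the encoding the page declares instead of relying on a guessed default.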

Remove newline in python with urllib

I am using Python 3.x. When I use urllib.request to download a web page, I get a lot of \n in the output. I have tried to remove them using the methods given in other threads on the forum, but without success. I have used the strip() and replace() functions, but no luck! I am running this code in Eclipse. Here is my code:
import urllib.request

# Downloading entire Web Document
def download_page(a):
    opener = urllib.request.FancyURLopener({})
    try:
        open_url = opener.open(a)
        page = str(open_url.read())
        return page
    except:
        return ""

raw_html = download_page("http://www.zseries.in")
print("Raw HTML = " + raw_html)
# Remove line breaks
raw_html2 = raw_html.replace('\n', '')
print("Raw HTML2 = " + raw_html2)
I cannot spot the reason for all the \n in the raw_html variable.
Your download_page() function corrupts the HTML (the str() call); that is why you see \n (two characters, \ and n) in the output. Don't use .replace() or a similar workaround; fix the download_page() function instead:
from urllib.request import urlopen
with urlopen("http://www.zseries.in") as response:
    html_content = response.read()
At this point html_content contains a bytes object. To get it as text, you need to know its character encoding, e.g., by reading it from the Content-Type HTTP header:
encoding = response.headers.get_content_charset('utf-8')
html_text = html_content.decode(encoding)
See A good way to get the charset/encoding of an HTTP response in Python.
If the server doesn't pass the charset in the Content-Type header, then there are complex rules for figuring out the character encoding of an HTML5 document, e.g., it may be specified inside the HTML document: <meta charset="utf-8"> (you would need an HTML parser to get it).
If you read the html correctly then you shouldn't see literal characters \n in the page.
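The difference between str() and .decode() can be seen without any network access. In this sketch, raw is a made-up stand-in for what response.read() returns, and UTF-8 is assumed as the encoding.

```python
# Stand-in for response.read(): a bytes object with real newlines.
raw = "<html>\n<body>hello</body>\n</html>".encode("utf-8")

# str() on bytes returns the repr: b'...' with newlines shown as \ and n.
as_repr = str(raw)

# decode() returns the actual text, with real newline characters.
as_text = raw.decode("utf-8")

print(as_repr)
print(as_text)
```

Here as_repr contains no newline character at all, only the two-character sequence backslash plus n, which is why replace('\n', '') had no effect in the question.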
If you look at the source you've downloaded, the \n escape sequences you're trying to replace() are actually escaped themselves: \\n. Try this instead:
import urllib.request

def download_page(a):
    opener = urllib.request.FancyURLopener({})
    open_url = opener.open(a)
    page = str(open_url.read()).replace('\\n', '')
    return page
I removed the try/except clause because generic except statements without targeting a specific exception (or class of exceptions) are generally bad. If it fails, you have no idea why.
It seems they are literal \n characters, so I suggest doing this:
raw_html2 = raw_html.replace('\\n', '')

unnecessary exclamation marks(!)'s in HTML code

I am emailing the content of a text file "gerrit.txt" # http://pastie.org/8289257 in Outlook using the code below.
However, after the email is sent, when I look at the source code ( #http://pastie.org/8289379) of the email in Outlook, I see unnecessary
exclamation marks (!) in the code, which mess up the output. Can anyone explain why this happens and how to avoid it?
from email.mime.text import MIMEText
from smtplib import SMTP

def email(body, subject):
    msg = MIMEText("%s" % body, 'html')
    msg['Content-Type'] = "text/html; charset=UTF8"
    msg['Subject'] = subject
    s = SMTP('localhost', 25)
    s.sendmail('userid#company.com', ['userid2#company.com'], msg=msg.as_string())

def main():
    # open gerrit.txt and read the content into body
    with open('gerrit.txt', 'r') as f:
        body = f.read()
    subject = "test email"
    email(body, subject)
    print("Done")

if __name__ == '__main__':
    main()
Some info available here: http://bugs.python.org/issue6327
Note that mailservers have a 990-character limit on each line
contained within an email message. If an email message is sent that
contains lines longer than 990-characters, those lines will be
subdivided by additional line ending characters, which can cause
corruption in the email message, particularly for HTML content. To
prevent this from occurring, add your own line-ending characters at
appropriate locations within the email message to ensure that no lines
are longer than 990 characters.
I think you must split your HTML into several lines. You can use the textwrap.wrap method.
Adding a '\n' inside my HTML string, about 20 characters before the place where the "!" was appearing, solved my problem.
I also faced the same issue. It happens because Outlook doesn't support lines longer than 990 characters; beyond that it causes the issues below:
Nested tables
Color change of column heading
Adding unwanted ! marks
Here is a solution for it.
If you are adding a single line, you can write
line[:40] + "\r\n" + line[40:]
If you are building a table, you can do the same in a loop, like
"<td>" + line[j][:40] + "\r\n" + line[j][40:] + "</td>"
In my case the html is being constructed outside of the python script and is passed in as an argument. I added line breaks after each html tag within the python script which resolved my issue:
import re
result_html = re.sub(">", ">\n", html_body)
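A short sketch of how the line-break approach above could be checked against the 990-character limit quoted earlier. The helper names and the sample table are made up for illustration. The check only inspects line lengths; it does not guarantee that a mail server leaves the message untouched.

```python
import re

MAX_LINE = 990  # per-line limit quoted above


def add_line_breaks(html_body: str) -> str:
    """Insert a newline after every '>' so that tags end their line."""
    return re.sub(">", ">\n", html_body)


def longest_line(text: str) -> int:
    """Length of the longest line in text (0 for empty input)."""
    return max((len(line) for line in text.splitlines()), default=0)


# Hypothetical HTML body: a large table emitted on a single line
html_body = "<table><tr>" + "<td>cell</td>" * 2000 + "</tr></table>"
result_html = add_line_breaks(html_body)

print(longest_line(html_body), longest_line(result_html))
```

If a single text node between two tags is itself longer than the limit, this approach is not enough, and the text would also need to be split, for example with textwrap.wrap as suggested above.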
