Unnecessary exclamation marks (!) in HTML code - Python

I am emailing the content of a text file, "gerrit.txt" (http://pastie.org/8289257), using the code below. After the email is sent, when I look at the source of the email in Outlook (http://pastie.org/8289379), I see unnecessary exclamation marks (!) in the HTML, which mess up the output. Can anyone explain why this happens and how to avoid it?
from email.mime.text import MIMEText
from smtplib import SMTP
def email(body, subject):
    msg = MIMEText("%s" % body, 'html')
    msg['Content-Type'] = "text/html; charset=UTF8"
    msg['Subject'] = subject
    s = SMTP('localhost', 25)
    s.sendmail('userid@company.com', ['userid2@company.com'], msg=msg.as_string())
def main():
    # open gerrit.txt and read the content into body
    with open('gerrit.txt', 'r') as f:
        body = f.read()
    subject = "test email"
    email(body, subject)
    print("Done")
if __name__ == '__main__':
    main()

Some info available here: http://bugs.python.org/issue6327
Note that mailservers have a 990-character limit on each line
contained within an email message. If an email message is sent that
contains lines longer than 990-characters, those lines will be
subdivided by additional line ending characters, which can cause
corruption in the email message, particularly for HTML content. To
prevent this from occurring, add your own line-ending characters at
appropriate locations within the email message to ensure that no lines
are longer than 990 characters.
I think you need to split your HTML into several lines. You can use the textwrap.wrap function for that.
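A minimal sketch of that suggestion, assuming the HTML tolerates a line break wherever it currently has a space (which is not true inside a `<pre>` block). The helper name `wrap_html` and the 900-character margin are illustrative choices, not something taken from the bug report:

```python
import textwrap

MAX_LINE = 900  # assumed safety margin below the 990-character limit


def wrap_html(html, width=MAX_LINE):
    """Re-wrap every line of `html` so that none is longer than `width`."""
    wrapped = []
    for line in html.splitlines():
        # Break only at whitespace, never inside a word or at a hyphen, so
        # tags and attribute values without spaces stay intact. A single
        # word longer than `width` is left as it is.
        pieces = textwrap.wrap(line, width=width,
                               break_long_words=False,
                               break_on_hyphens=False)
        # wrap() returns [] for an empty line; keep the blank line.
        wrapped.extend(pieces or [""])
    return "\n".join(wrapped)


# One long line of markup, similar to a generated table.
html = "<table>" + "<tr><td>cell</td> <td>value</td></tr> " * 200 + "</table>"
safe_html = wrap_html(html)
print(max(len(line) for line in safe_html.splitlines()))
```

The wrapped string can then be passed to MIMEText in place of the original body.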

Adding a '\n' inside my HTML string, roughly 20 characters before the point where the "!" was appearing, solved my problem.

I faced the same issue. It happens because Outlook doesn't support lines longer than 990 characters; when a line exceeds that, it starts causing problems such as:
Nested tables
Color change of column heading
Unwanted ! marks
Here is the solution.
If you are adding a single line, you can split it:
line[:40] + "\r\n" + line[40:]
If you are building a table, you can do the same inside the loop:
"<td>" + line[j][:40] + "\r\n" + line[j][40:] + "</td>"

In my case the html is being constructed outside of the python script and is passed in as an argument. I added line breaks after each html tag within the python script which resolved my issue:
import re
result_html = re.sub(">", ">\n", html_body)
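The answers above all insert line breaks by hand. A different approach, not taken from any of these answers, is to let the email package apply a content transfer encoding that folds long lines itself. The sketch below assumes Python 3.6+ and the `EmailMessage` API; the helper name `build_html_message` and the addresses are hypothetical:

```python
from email.message import EmailMessage


def build_html_message(html_body, subject, sender, recipient):
    """Build an HTML email whose body is sent as quoted-printable."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # Quoted-printable inserts soft line breaks that the receiving mail
    # client removes again, so the HTML itself is not altered.
    msg.set_content(html_body, subtype="html", cte="quoted-printable")
    return msg


# A single 5000-character line, far beyond the 990-character limit.
long_html = "<p>" + "x" * 5000 + "</p>"
msg = build_html_message(long_html, "test email",
                         "sender@example.com", "recipient@example.org")
longest = max(len(line) for line in msg.as_string().splitlines())
print(longest)
```

The resulting message could then be handed to `smtplib.SMTP.send_message` instead of calling `sendmail` with `as_string()`.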

Related

How to add multiple embedded images to an email in Python?

This question is a continuation of this answer: https://stackoverflow.com/a/49098251/19308674. I'm trying to add multiple embedded images (not just one) to the email content.
I want to loop through a list of images from a folder and put different text next to each image, something like a "Weather Next 10 days" email in which each day's image has its own text beside it.
from email.message import EmailMessage
from email.utils import make_msgid
import mimetypes
msg = EmailMessage()
# generic email headers
msg['Subject'] = 'Hello there'
msg['From'] = 'ABCD <abcd@example.com>'
msg['To'] = 'PQRS <pqrs@example.org>'
# set the plain text body
msg.set_content('This is a plain text body.')
# now create a Content-ID for the image
image_cid = make_msgid(domain='example.com')
# if `domain` argument isn't provided, it will
# use your computer's name
# set an alternative html body
msg.add_alternative("""\
<html>
<body>
<p>This is an HTML body.<br>
It also has an image.
</p>
<img src="cid:{image_cid}">
</body>
</html>
""".format(image_cid=image_cid[1:-1]), subtype='html')
# image_cid looks like <long.random.number@example.com>
# to use it as the img src, we don't need `<` or `>`
# so we use [1:-1] to strip them off
# now open the image and attach it to the email
with open('path/to/image.jpg', 'rb') as img:
    # know the Content-Type of the image
    maintype, subtype = mimetypes.guess_type(img.name)[0].split('/')
    # attach it
    msg.get_payload()[1].add_related(img.read(),
                                     maintype=maintype,
                                     subtype=subtype,
                                     cid=image_cid)
# the message is ready now
# you can write it to a file
# or send it using smtplib
If I'm able to guess what you are trying to ask, the solution is simply to generate a unique cid for each image.
from email.message import EmailMessage
from email.utils import make_msgid
# import mimetypes
msg = EmailMessage()
msg["Subject"] = "Hello there"
msg["From"] = "ABCD <abcd@example.com>"
msg["To"] = "PQRS <pqrs@example.org>"
# create a Content-ID for each image
image_cid = [make_msgid(domain="example.com")[1:-1],
             make_msgid(domain="example.com")[1:-1],
             make_msgid(domain="example.com")[1:-1]]
msg.set_content("""\
<html>
<body>
<p>This is an HTML body.<br>
It also has three images.
</p>
<img src="cid:{image_cid[0]}"><br/>
<img src="cid:{image_cid[1]}"><br/>
<img src="cid:{image_cid[2]}">
</body>
</html>
""".format(image_cid=image_cid), subtype='html')
for idx, imgtup in enumerate([
        ("path/to/first.jpg", "jpeg"),
        ("file/name/of/second.png", "png"),
        ("path/to/third.gif", "gif")]):
    imgfile, imgtype = imgtup
    with open(imgfile, "rb") as img:
        msg.add_related(
            img.read(),
            maintype="image",
            subtype=imgtype,
            cid=f"<{image_cid[idx]}>")
# The message is ready now.
# You can write it to a file
# or send it using smtplib
Kudos for using the modern EmailMessage API; we still see way too many questions which blindly copy/paste the old API from Python <= 3.5 with MIMEMultipart etc etc.
I took out the mimetypes image format guessing logic in favor of spelling out the type of each image in the code. If you need Python to guess, you know how to do that, but for a small static list of images, it seems to make more sense to just specify each, and avoid the overhead as well as the unlikely but still not impossible problem of having the heuristics guess wrong.
I'm guessing your images will all use the same format, and so you could actually simply hardcode subtype="png" or whatever.
It should hopefully be obvious how to add more per-image information into the loop over image tuples, though if your needs go beyond the trivial, you'll probably want to encapsulate the image and its various attributes into a simple class.
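As a rough sketch of that encapsulation, the class below bundles each image with its caption and its own Content-ID. The names `InlineImage` and `build_message`, the captions, and the placeholder bytes are all hypothetical; real code would read the bytes from the image files:

```python
import html
from dataclasses import dataclass, field
from email.message import EmailMessage
from email.utils import make_msgid


@dataclass
class InlineImage:
    data: bytes     # raw image bytes, e.g. open(path, "rb").read()
    subtype: str    # "jpeg", "png", "gif", ...
    caption: str    # text shown next to the image
    # Content-ID including the angle brackets, generated once per image
    cid: str = field(
        default_factory=lambda: make_msgid(domain="example.com"))


def build_message(images):
    msg = EmailMessage()
    msg["Subject"] = "Hello there"
    msg["From"] = "ABCD <abcd@example.com>"
    msg["To"] = "PQRS <pqrs@example.org>"
    # One paragraph per image; the img src uses the cid without < and >.
    rows = "\n".join(
        '<p><img src="cid:{}"> {}</p>'.format(
            img.cid[1:-1], html.escape(img.caption))
        for img in images)
    msg.set_content(
        "<html>\n<body>\n{}\n</body>\n</html>\n".format(rows),
        subtype="html")
    for img in images:
        msg.add_related(img.data, maintype="image",
                        subtype=img.subtype, cid=img.cid)
    return msg


# Placeholder bytes stand in for real image files in this sketch.
images = [
    InlineImage(b"first image bytes", "jpeg", "Monday: sunny"),
    InlineImage(b"second image bytes", "png", "Tuesday: rain"),
]
msg = build_message(images)
```

Adding another image then only means appending one more `InlineImage` to the list.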
Your message apparently makes no sense for a recipient who cannot access the HTML part, so I took out the bogus text/plain part you had. You were effectively sending a different message entirely to recipients whose preference is to view plain text over HTML; if that was genuinely your intent, please stop it. If you are unable to provide the same information in the plain text version as in the HTML version, at least don't make it look to those recipients like you had nothing of importance to say in the first place.
Tangentially, please don't fake email addresses of domains you don't own. You will end up tipping off the spammers and have them trying to send unsolicited messages to an innocent third party. Always use IANA-reserved domains like example.com, example.org etc which are guaranteed to never exist in reality. I edited your question to fix this.

How to convert text from shell to HTML?

This is probably an easy question, but I'm new to Python, so bear with me.
What I need to do: execute a command in the shell and publish its output as a telegra.ph page.
Problem: the telegra.ph API ignores \n, so all the output text ends up on one line.
Python Telegraph API wrapper used: https://github.com/python273/telegraph
I understand that I need to convert my text to something HTML-like and remove <p> tags. I've tried some scripts, but my program gave me this error:
telegraph.exceptions.NotAllowedTag: span tag is not allowed
So I removed all span tags and got the same result as without converting.
Then I tried replace("\n", "<p>") but got stuck on the closing tags...
Code:
import subprocess
from telegraph import Telegraph
telegraph = Telegraph()
telegraph.create_account(short_name='1111')
tmp = subprocess.run("arp", capture_output=True, text=True, shell=True).stdout
print( '\n\n\n'+tmp+'\n\n\n\n') ### debug line
response = telegraph.create_page(
    'Random',
    html_content='<p>' + tmp + '</p>'
)
print('https://telegra.ph/{}'.format(response['path']))
The closest HTML equivalent to \n is the "hard break" <br/> tag.
It does not require a closing tag, because it contains nothing and directly signifies a line break.
Assuming it is supported by telegra.ph, you could simply do:
tmp = tmp.replace('\n', '<br/>')
Add this line to convert all intermediate newlines to individual <p>-sections:
tmp = "</p><p>".join(tmp.split("\n"))
tmp.split("\n") splits the string into an array of lines.
"</p><p>".join(...) glues everything together again, closing the previous <p>-section and starting a new one.
This way, the example works for me and line breaks are correctly displayed on the page.
EDIT: As the other answer suggests, of course you can also use <br/> tags. It depends on what you want to achieve!
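The split-and-join idea can be sketched as a small helper that also escapes characters such as < and >, which command output may contain and which would otherwise be read as tags. The function name `text_to_paragraphs` and the sample text are made up for illustration, and nothing here calls the telegraph library:

```python
import html


def text_to_paragraphs(text):
    """Turn each line of plain text into its own <p> element."""
    # Drop leading/trailing newlines so no empty paragraphs are added
    # at the start or the end.
    lines = text.strip("\n").split("\n")
    return "".join("<p>{}</p>".format(html.escape(line)) for line in lines)


# Hypothetical command output containing characters that need escaping.
sample = "Address  HWtype\n192.0.2.1 <incomplete>\n"
html_content = text_to_paragraphs(sample)
print(html_content)
```

The result can be passed as html_content without wrapping it in an extra <p> pair.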
It is not clear to me why the telegraph module replaces newlines with spaces. In this case it seems reasonable to disable this functionality.
import subprocess
import re
import telegraph
from telegraph import Telegraph
telegraph.utils.RE_WHITESPACE = re.compile(r'([ ]{10})', re.UNICODE)
telegraph = Telegraph()
telegraph.create_account(short_name='1111')
tmp = subprocess.run("/usr/sbin/arp",
                     capture_output=True,
                     text=True,
                     shell=True).stdout
response = telegraph.create_page(
    'Random',
    html_content='<pre>' + tmp + '</pre>'
)
print('https://telegra.ph/{}'.format(response['path']))
This produces a page that comes close to the actual formatted arp output.

Remove newline in python with urllib

I am using Python 3.x. While using urllib.request to download a webpage, I am getting a lot of \n in between. I have tried to remove them using the methods given in other threads on the forum, including the strip() and replace() functions, but with no luck. I am running this code in Eclipse. Here is my code:
import urllib.request
#Downloading entire Web Document
def download_page(a):
    opener = urllib.request.FancyURLopener({})
    try:
        open_url = opener.open(a)
        page = str(open_url.read())
        return page
    except:
        return ""
raw_html = download_page("http://www.zseries.in")
print("Raw HTML = " + raw_html)
#Remove line breaks
raw_html2 = raw_html.replace('\n', '')
print("Raw HTML2 = " + raw_html2)
I am not able to spot the reason for getting so many \n in the raw_html variable.
Your download_page() function corrupts the HTML (the str() call), which is why you see \n (two characters, \ and n) in the output. Don't use .replace() or a similar workaround; fix the download_page() function instead:
from urllib.request import urlopen
with urlopen("http://www.zseries.in") as response:
    html_content = response.read()
At this point html_content contains a bytes object. To get it as text, you need to know its character encoding e.g., to get it from Content-Type http header:
encoding = response.headers.get_content_charset('utf-8')
html_text = html_content.decode(encoding)
See A good way to get the charset/encoding of an HTTP response in Python.
If the server doesn't pass a charset in the Content-Type header, there are complex rules for figuring out the character encoding of an HTML5 document; for example, it may be specified inside the HTML document itself: <meta charset="utf-8"> (you would need an HTML parser to get it).
If you read the html correctly then you shouldn't see literal characters \n in the page.
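The difference between str() and decode() can be seen without any network access. The bytes literal below is a made-up stand-in for what response.read() returns:

```python
# Stand-in for the bytes returned by response.read().
raw = b"<html>\n<body>\n</body>\n</html>\n"

# str() on a bytes object returns its repr: the b'...' wrapper is kept
# and each newline becomes the two characters backslash and n.
as_repr = str(raw)

# decode() converts the bytes to text and keeps real newline characters.
as_text = raw.decode("utf-8")

print(as_repr)
print(as_text)
```

This is why replace('\n', '') finds nothing to remove in the str() result: it contains no real newline characters at all.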
If you look at the source you've downloaded, the \n escape sequences you're trying to replace() are actually escaped themselves: \\n. Try this instead:
import urllib.request
def download_page(a):
    opener = urllib.request.FancyURLopener({})
    open_url = opener.open(a)
    page = str(open_url.read()).replace('\\n', '')
    return page
I removed the try/except clause because generic except statements without targeting a specific exception (or class of exceptions) are generally bad. If it fails, you have no idea why.
These seem to be literal \n characters, so I suggest doing it like this:
raw_html2 = raw_html.replace('\\n', '')

HTML formatting issues with python smtplib and Outlook 2010

I am generating html files using elementtree.ElementTree.dump on an Element. The files look ok in all browsers, and the underlying code within the files looks fine (no unclosed brackets or anything).
When I send an email to Outlook 2010 via smtplib, I am seeing weird formatting issues. These issues will be 100% repeatable, so the issue is logical. Here is an example:
<table b="" order="1">
That is from the source code of a HTML email I sent myself. It is correctly written as:
<table border="1">
within the original source code.
If in Outlook I write a HTML email using the original HTML as source, it correctly formats. (New email-attach html file->insert as text)
Is the issue going to be Outlook or Python? The function I used for reading the html file and sending is below.
def email_Report(mailOptions):
    reportName = time.strftime("%Y%m%d.%H%M") + ".html"
    ElementTree(mailOptions['report']).write("/home/%s/%s" % (mailOptions['username'], reportName))
    # Set sender and receiver to the user building the report.
    mailaddr = '%s@acme.com' % (mailOptions['username'])
    # Access the report file. Added binary in case we ever use code on Windows
    filename = "/home/%s/%s" % (mailOptions['username'], reportName)
    open_file = open(filename, 'rb')
    emsg = MIMEText(open_file.read(), 'html')
    open_file.close()
    emsg['Subject'] = "Report for %s generated by %s %s" % (mailOptions['zone'], mailOptions['username'], time.strftime("%d%m%Y-%H%M"))
    emsg['To'] = mailaddr
    emsg['From'] = mailaddr
    # Hostname can be a parameter to SMTP method if localhost isn't listening
    sc = smtplib.SMTP()
    sc.connect()
    sc.sendmail(mailaddr, mailaddr, emsg.as_string())
    sc.close()
    return
The HTML is extremely simple. No CSS, no title or head tags etc. Just html->body->table->tr->th->(newrow)->td->td etc. Could I have overlooked something like encoding/escaping? Do I have to use mime multipart? I am using Python 2.4.3 and can't use any module that didn't come stock.
Are you sure you're not running into the 990-character limit for mail servers, as described in "workaround for the 990 character limitation for email mailservers"?
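One way to check this before sending is to look for over-long lines in the generated report. This is only a diagnostic sketch: the helper name `find_long_lines` and the sample report are made up, and the code uses Python 3 syntax rather than the Python 2.4 from the question:

```python
def find_long_lines(text, limit=990):
    """Return (line_number, length) for every line longer than `limit`."""
    return [(number, len(line))
            for number, line in enumerate(text.splitlines(), start=1)
            if len(line) > limit]


# Hypothetical report in which the whole table sits on a single line.
report = ("<html>\n<body>\n"
          '<table border="1">' + "<tr><td>x</td></tr>" * 100 + "</table>\n"
          "</body>\n</html>\n")
print(find_long_lines(report))
```

An empty list means no line exceeds the limit, and the cause is probably elsewhere.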

How to handle line-breaks in HTML forms?

I have a form with a textarea and need to submit multiple lines of input to the textarea.
I use :
rows = [('a','b'), ('c','d')]
data_set = [ '%s\n' % '|'.join(row) for row in rows ] # Note : ADDED '\n'
data_dump = ''.join(data_set)
from mechanize import Browser
br = Browser()
br.open('http://example.com/page.html')
br.select_form(nr=1)
br.form['my_text_area']=data_dump
br.submit()
Problem:
The web server is not able to see the input as multiple lines.
The added \n is not working for simulating line breaks in the input.
What am I doing wrong?
Feel free to ask for more info if I have missed something!
Update
I also tried \n\r in place of \n, but the problem persists.
I figured it out with the help of https://stackoverflow.com/users/87015/salman-a
CR = \r
LF = \n
HTML forms expect a line break as CRLF, so using \r\n worked!
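Applied to the code from the question, the fix might look like the sketch below. It only builds the textarea value and shows its URL-encoded form; the mechanize calls are left out, and the field name my_text_area is taken from the question:

```python
from urllib.parse import urlencode

rows = [('a', 'b'), ('c', 'd')]

# End every row with CRLF instead of a bare LF.
data_dump = ''.join('|'.join(row) + '\r\n' for row in rows)

# In the encoded form body, each line break appears as %0D%0A.
encoded = urlencode({'my_text_area': data_dump})
print(encoded)
```

The value of data_dump can be assigned to br.form['my_text_area'] exactly as in the original code.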
