I am trying to send an email, as shown below, using conflictedblocks_string. The `>>>>>>>` marker prints fine but gets messed up when sent as email. Can anyone explain why, and how to fix it?
conflictedblocks_string = ''
conflictedblocks = {'README': '<<<<<<< HEAD\nTBD1\n=======\nTRP1\n>>>>>>> b9bde66...\n', 'DO_NOT_READ': 'Probably a new file'}
for key,value in conflictedblocks.items():
    conflictedblocks_string += key + ":" + "\n" + value + "\n"
print conflictedblocks_string --> `>>>>>>>` prints fine
sendemail(conflictedblocks_string ) --> `>>>>>>>` messed up while sending email
sendemail api snippet:
body = '''%s''' % (data)
msg = MIMEText(body)
mail = smtplib.SMTP('company.apple.com', 25)
mail.sendmail(sender, receivers, msg.as_string())
CURRENT OUTPUT:-
EXPECTED OUTPUT:-
README:
<<<<<<< HEAD
TBD1
=======
TRP1
>>>>>>> b9bde66...
DO_NOT_READ:
Probably a new file
There's nothing at all wrong with your code. Or the mail servers. The email has >>>>>>> in it, just as you intended.
However, many mail programs and webmail systems translate > at the start of a line into an indent marker when formatting mail for viewing.
Traditionally, > at the start of a line is how you mark that you're quoting someone inline. So, to make email threads easier to read, mail clients turn those quotes into something that looks more like quotes.
For example, this is a traditional plain-text email:
John shouted:
> My father said:
>> No! You will BE KILL BY DEMONS
> No! I must kill the demons
The radio said:
> No, John, You are the demons.
And then John was a zombie.
An email client might render it like this:
John shouted:
My father said:
No! You will BE KILL BY DEMONS
No! I must kill the demons
The radio said:
No, John, You are the demons.
And then John was a zombie.
There's no universal workaround that will work for every client, because every client has its own heuristic code that messes things up. But there are a few things that often work.
First, if you send both HTML and plain-text versions of your mail, most of the fancy clients that would have treated > as formatting will display the HTML instead, which you can format however you want, while clients that refuse to display HTML will probably also not try to do anything with the >.
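A minimal sketch of that first option, assuming Python 3 and the standard library's email.mime classes. The function name build_alternative_message is made up for illustration, and how any particular client renders the result is not guaranteed:

```python
import html
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_alternative_message(text):
    """Return a multipart/alternative message with plain-text and HTML bodies.

    Hypothetical helper for illustration; not part of the original sendemail API.
    """
    msg = MIMEMultipart("alternative")
    # Plain-text part goes first; clients usually prefer the last part they can show.
    msg.attach(MIMEText(text, "plain"))
    # Escape <, > and & so the conflict markers survive as literal text,
    # and wrap them in <pre> so the line breaks are kept.
    html_body = "<html><body><pre>%s</pre></body></html>" % html.escape(text)
    msg.attach(MIMEText(html_body, "html"))
    return msg


conflict = "README:\n<<<<<<< HEAD\nTBD1\n=======\nTRP1\n>>>>>>> b9bde66...\n"
message = build_alternative_message(conflict)
```

The resulting message.as_string() can be passed to smtplib's sendmail() the same way as the MIMEText in the question, after setting the Subject, From and To headers.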
Another option is to put the diff you're trying to include as an attachment, instead of the body. You can try to mark it as an inline attachment, in hopes that some clients will show it without making the user click on the attachment and open it, but I don't think too many clients like inline plain-text attachments.
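The attachment route could look roughly like this in Python 3. The helper name and the default filename conflicts.txt are assumptions for the example, not anything from the question:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message_with_attachment(summary, diff_text, filename="conflicts.txt"):
    """Return a multipart/mixed message carrying diff_text as an attachment.

    Hypothetical helper; the filename is an arbitrary choice for this sketch.
    """
    msg = MIMEMultipart()  # multipart/mixed by default
    # Short human-readable body.
    msg.attach(MIMEText(summary, "plain"))
    # The diff itself travels as a separate text part marked as an attachment,
    # so the mail client has no reason to reinterpret its leading > characters.
    attachment = MIMEText(diff_text, "plain")
    attachment.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(attachment)
    return msg


mail = build_message_with_attachment(
    "Conflicts found, see attachment.",
    "<<<<<<< HEAD\nTBD1\n=======\nTRP1\n>>>>>>> b9bde66...\n",
)
```

Changing "attachment" to "inline" in the Content-Disposition header is the way to try the inline variant mentioned above.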
Prefixing the line with a space often works, like this:
<<<<<<< HEAD
TBD1
=======
TRP1
 >>>>>>> b9bde66
But of course the person reading the mail will have to know about the extra space—and if they're copying and pasting, they'll have to remember to remove it.
If that works, prefixing every line in the email, or just every line in the diff, with a space, would also work. It doesn't look quite as ugly, but if anything it can cause more copy-paste headaches.
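For the space-prefix trick, a small sketch (Python 3; the function name protect_leading_gt is invented here) that only touches lines beginning with >:

```python
def protect_leading_gt(text, prefix=" "):
    """Prefix every line that starts with '>' so it is not taken for a quote.

    Hypothetical helper; other lines are returned unchanged.
    """
    lines = text.split("\n")
    return "\n".join(
        prefix + line if line.startswith(">") else line
        for line in lines
    )


body = protect_leading_gt("<<<<<<< HEAD\nTBD1\n=======\nTRP1\n>>>>>>> b9bde66...\n")
```

To prefix every line instead, drop the startswith check. Whether a given client then leaves the text alone still depends on that client's heuristics.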
It is unrelated to Python and unrelated to mail sending...
Historically, when mails were just plain ASCII text, the > character was used (as a convention) in responses to mark citations from the original mail.
With HTML and richer character sets, the citations are now indicated with vertical bars | and different colors.
In order to give a nicer user experience, mail readers interpret > characters in plain text mails and format them as modern citation marks.
So:
your mail was correctly sent and contains the correct >
the culprit is your mail reader which (wrongly here) formats initial > in a body line as if it was a citation
I am working on a CRM, where I am receiving hundreds of emails for offers/requirements per day. I am building an API that will process the email and will insert entries in the CRM.
I am using imap_tools to get the mails in my API, but I am stuck at the point where there's a thread/conversation. I read some articles about using the References or In-Reply-To header from the mail, but no luck so far. I have also tried using the Message-ID, but it gave me the same email thread instead of multiple emails.
I am getting an email thread/conversation as a single email, and I want to get separate emails so I can process them easily.
Here's what I have done so far.
from imap_tools import MailBox
with MailBox('mail.mail.com').login('abc#abc.com', 'password', 'INBOX') as mailbox:
    for msg in mailbox.fetch():
        From = msg.headers['from'][0]
        To = msg.headers['to'][0]
        subject = msg.headers['subject'][0]
        received_date = msg.headers['date'][0]
        raw_email = msg.text
        process_email(raw_email) #processing the email
The issue you are facing is not related to the headers reference or in-reply-to. Most email clients will append the previous email as quoted text to the new mail when you reply. Hence in a thread, a mail will have the body of all previous mails as quoted text.
In most cases, and I say most since email conventions vary a lot from client to client, the client will quote the previous mail by prepending > to all quoted lines:
new message
> old message
>> very old message
As a hacky solution, you can drop all lines that start with >.
In Python, you can use splitlines() and filter:
lines = email.splitlines()
new_lines = [i for i in lines if not i.startswith('>')]
or
new_lines = list(filter(lambda i: not i.startswith('>'), lines))
You may use regular expressions or other techniques too.
The issue with this solution is obvious: if an email contains > at the start of a line for any other reason, information will be lost. Hence a more complicated approach is to select the lines starting with >, compare them with the previous emails in the thread (found via the References header), and remove those which match.
Google has their patented implementation here
https://patents.google.com/patent/US7222299
Source: How to remove the quoted text from an email and only show the new text
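A rough sketch of that comparison idea, assuming Python 3 and that you already have the bodies of the earlier messages in the thread as strings. The function name strip_known_quotes is made up, and the matching here is a plain exact-line comparison, not Google's patented method:

```python
def strip_known_quotes(body, previous_bodies):
    """Drop quoted lines whose text also appears in an earlier message.

    Hypothetical helper: quoted lines that match nothing are kept,
    so a literal leading '>' in new text survives.
    """
    known = set()
    for previous in previous_bodies:
        for line in previous.splitlines():
            # Normalise by removing any quote markers and surrounding spaces.
            text = line.lstrip("> ").strip()
            if text:
                known.add(text)

    kept = []
    for line in body.splitlines():
        if line.startswith(">"):
            text = line.lstrip("> ").strip()
            # Skip empty quote lines and lines already seen in the thread.
            if not text or text in known:
                continue
        kept.append(line)
    return kept


reply = "new message\n> old message\n>> very old message\n>>>>>>> b9bde66"
new_lines = strip_known_quotes(reply, ["old message\n> very old message"])
```

Clients that re-wrap quoted text will defeat an exact match, so a fuzzier comparison would be needed in practice.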
Edit
I realized Gmail follows the > quoting and other clients may follow other methods. There's a Wikipedia article on it: https://en.wikipedia.org/wiki/Posting_style
Conceptually the approach needed will be similar, but each type of client will need to be handled separately.
I have a bot I'm writing using imaplib in Python to fetch emails from Gmail and output some useful data from them. I've hit a snag on selecting the inbox, though; the existing sorting system uses custom labels to separate emails from different customers. I've partially replicated this system in my test email, but imaplib.select() throws "imaplib.IMAP4.error: SELECT command error: BAD [b'Could not parse command']" with custom labels. Screenshot attached. My bot has no problem with the default Gmail folders, fetching INBOX or [Gmail]/Spam. In that case, it hits an error later in the code that deals with a completely different problem I have yet to fix. The point, though, is that imaplib.select() is successful with default inboxes and just not with custom labels.
The way my code works is it works through all the available inboxes, compares it to a user-inputted name, and if they match, saves the name and sets a boolean to true to signal that it found a match. It then checks, if there was a match (the user-inputted inbox exists) it goes ahead, otherwise it throws an error message and resets. It then attempts to select the inbox the user entered.
I've verified that the variable the program's saving the inbox name to matches what's listed as the name in the imap.list() command. I have no idea what the issue is.
I could bypass the process by iterating through all mail to find the emails I'm looking for, but it's far more efficient to use the existing sorting system due to the sheer number of emails on the account I'll be using.
Any help is appreciated!
EDIT: Code attached after request. Thank you to the person who told me to do so.
'''
Fetches emails from the specified inbox and outputs them to a popup
'''
def fetchEmails(self):
    #create an imap object. Must be local otherwise we can only establish a single connection
    #imap states are kinda bad
    imap = imaplib.IMAP4_SSL(host="imap.gmail.com", port=993)
    #Login and fetch a list of available inboxes
    imap.login(username.get(), password.get())
    type, inboxList = imap.list()
    #Set a reference boolean and iterate through the list
    inboxNameExists = False
    for i in inboxList:
        #Finds the name of the inbox
        name = self.inboxNameParser(i.decode())
        #If the given inbox name is encountered, set its existence to true and break
        if name.casefold() == inboxName.get().casefold():
            inboxNameExists = True
            break
    #If the inbox name does not exist, break and give error message
    if inboxNameExists != True:
        self.logout(imap)
        tk.messagebox.showerror("Disconnected!", "That Inbox does not exist.")
        return
    '''
    If/else to correctly feed the imap.select() method the inbox name
    Apparently inboxes containing spaces require quotations before and after
    Selects the inbox and pushes it to a variable
    two actually but the first is unnecessary(?)
    imap is weird
    '''
    if(name.count(" ") > 0):
        status, messages = imap.select("\"" + name + "\"")
    else:
        status, messages = imap.select(name)
    #Int containing total number of emails in inbox
    messages = int(messages[0])
    #If there are no messages disconnect and show an infobox
    if messages == 0:
        self.logout(imap)
        tk.messagebox.showinfo("Disconnected!", "The inbox is empty.")
    self.mailboxLoop(imap, messages)
Figured the issue out after a few hours banging through it with a friend. As it turns out the problem was that imap.select() wants quotations around the mailbox name if it contains spaces. So imap.select("INBOX") is fine, but with spaces you'd need imap.select("\"" + "Label Name" + "\"")
You can see this reflected in the code I posted with the last if/else statement.
Python imaplib requires mailbox names with spaces to be surrounded by double quotation marks. So imap.select("INBOX") is fine, but with spaces you'd need imap.select("\"" + "Label Name" + "\"").
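If you'd rather not branch on whether the name contains a space, a small helper can always quote it. This is only a sketch (Python 3; quote_mailbox is a made-up name), and it does not handle non-ASCII label names, which IMAP expects in modified UTF-7:

```python
def quote_mailbox(name):
    """Wrap a mailbox name in double quotes for use with imap.select().

    Hypothetical helper: escapes backslashes and embedded double quotes,
    then adds the surrounding quotes.
    """
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


quoted = quote_mailbox("Label Name")
```

Usage would then be imap.select(quote_mailbox(name)) for every name. A quoted string is a valid way to write a mailbox name in IMAP, so names without spaces should be accepted in quoted form too.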
I am working on some analytics for our email help line. I can see the headers and everything that is in them, but I need to separate each header component into its own field/variable. What is the best way to accomplish this?
Here is the code I currently have.
import win32com.client
import win32com
import pandas as pd
M_date = []
M_sender = []
M_sub = []
M_flag = []
M_cat = []
M_folder = []
outlook = win32com.client.Dispatch("outlook.application").GetNamespace("MAPI")
for i in range(0, 20):
    try:
        inbox = outlook.getdefaultfolder(6).folders[i]
        try:
            for message in inbox.items:
                try:
                    Folder = str(inbox) + " " + str(i)
                    Sender= message.sendername
                    Subject= message.subject
                    Dates= message.ReceivedTime
                    M_import = message.Importance
                    if message.FlagRequest == None :
                        Flag = ""
                    else:
                        Flag = message.FlagRequest
                    if message.Categories == None:
                        cat = ""
                    else:
                        cat = message.Categories
                    msg = message.PropertyAccessor.GetProperty("http://schemas.microsoft.com/mapi/proptag/0x007D001F")
                    print(msg) #debug header
                    M_folder.append(Folder)
                    M_date.append(Dates.strftime("%b %d %Y %H:%M:"))
                    M_sender.append(Sender)
                    M_sub.append(Subject)
                    M_flag.append(Flag)
                    M_cat.append(cat)
                except:
                    pass
        except:
            pass
    except:
        pass
df = pd.DataFrame({
    'In folder': M_folder,
    'Date': M_date,
    'Sender': M_sender,
    'Subject': M_sub,
    'flags': M_flag,
    'Categrories': M_cat})
df.to_csv('email_data.csv', index=False)
Thanks
Transport headers is a string which contains properties and their values separated by ":". Basically you need to loop through all lines backwards. If the line starts with space or tab, append it to the previous line and delete the current line. Then loop through all lines and separate them into the header name (left of the first ":") and the header value (right of the first ":").
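In Python that could be sketched as follows. The function name parse_transport_headers is invented for the example, and it walks the lines forwards rather than backwards, which gives the same result:

```python
def parse_transport_headers(raw):
    """Split a transport-header string into (name, value) pairs.

    Hypothetical helper. A list of pairs is returned rather than a dict
    because headers such as Received can appear more than once.
    """
    unfolded = []
    for line in raw.splitlines():
        if line[:1] in (" ", "\t") and unfolded:
            # Continuation line: join it to the previous header with one space.
            unfolded[-1] += " " + line.strip()
        elif line:
            unfolded.append(line)

    headers = []
    for line in unfolded:
        # Split on the first ":" only, since values may contain colons.
        name, separator, value = line.partition(":")
        if separator:
            headers.append((name.strip(), value.strip()))
    return headers


sample = (
    "From: Alice <alice@example.com>\r\n"
    "Subject: a long\r\n"
    "\tsubject line\r\n"
    "To: bob@example.com\r\n"
)
pairs = parse_transport_headers(sample)
```

In the question's loop, msg is the string to pass in, and each pair can be appended to its own list for the DataFrame.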
I do not know Python so I cannot provide any code, but I can tell you about the format of the Transport Message Headers. (I must learn Python, my son-in-law swears by it.)
The Transport Message Headers contain an indefinite number of lines separated by carriage return linefeed. In VBA to access the individual lines, you would have something like:
Dim msgParts() As String
msgParts = Split(msg, vbCrLf)
If a line starts with one or more spaces and/or horizontal tabs, it is a continuation of the previous line. Replace all the spaces and tabs at the beginning of a continuation line with one space and append it to the previous line.
A line, together with any continuation lines, starts “Xxxx: ”. “Xxxx” will be “To” or “From” or any of the other specified identifiers or a private identifier.
The lines are specified in RFCs (Requests for Comments). I would start with RFC 5321 and follow the references to the related RFCs; the header format itself is defined in RFC 5322. Or perhaps I would not.
I have not looked at the RFCs for SMTP (Simple Mail Transfer Protocol) for many years. My recollection is that they were once much simpler. For example, my recollection is that the specification dealt with the continuations and then dealt with the combined line; this would have been standard practice when I was young. I was looking at the specification for email addresses which seemed overly complicated with lots of CRLFs that I did not remember as being allowed within a line. I finally realised that the specification for an email address allowed for a continuation line break between any two elements. In my humble opinion, this made for an unnecessarily complex specification. I would also expect the processing code to be slower since it would be attempting to solve two separate problems at the same time.
In the end, I gave up on the SMTP RFCs. Partly because of the continuation line issue but mainly because they now handle a lot of specialised situations that are quite outside the needs of the simple emails I send and receive. I decided it was easier to analyse the emails I had sent or received than attempt to simplify the specification down to my requirements.
My interest in looking at the Transport Message Headers was because I wanted to identify the other party of every email. For every email in my Outlook folders, I was either the sender or I was one of the recipients. If I was the sender, I wanted the first or only recipient. If I was a recipient, I wanted the sender. This proved difficult or impossible from the properties such as To and From because they usually contain display names. The display names for myself, were every possible variation of my name. If this issue is relevant to you, I am happy to share how I handled it.
I've been experimenting with a Python CGI script to send an e-mail (hosted with a commercial web host - 123reg), and the problem is that whenever I run the script from my web browser, it sends two identical e-mails.
The code to send the mail is definitely only being executed once, there are no loops which could cause it to happen twice, I am definitely not clicking the button twice. No exceptions are thrown and the "success" page is sent to the browser as normal.
The strangest thing is that when I comment out the code to print the result page (which is very simple and has no side effects, just 3 print statements in a row) and replace it with a dummy print statement (print "Content-type: text/plain\n\ntest"), it works properly and only sends one e-mail.
I have tried googling the problem to no avail.
I am at my wit's end because this problem doesn't make any sense to me. I'm pretty sure it must be my script since inexplicably it works when you comment out those print statements.
I'd appreciate any help, thanks.
EDIT:
Here's the code which, when commented out, fixes the problem:
print "Content-type: text/html"
print
print page
EDIT:
The code to send the e-mail:
#send_email function: sends message from from_addr, assumes valid input
def send_email(from_addr, message):
    #form the email headers/text:
    email = "From: " + from_addr + "\n"
    email += "To: " + TO[0] + "\n"
    email += "Subject: " + SUBJECT + "\n"
    email += "\n"
    email += message
    #return true for success, false for failure:
    try:
        server = smtplib.SMTP(SERVER)
        server.sendmail(from_addr, TO, email)
        server.quit()
        return True
    except smtplib.SMTPException:
        return False
#end of send_email function
I'd post the code to format the page variable, but all it does is read from a file, format a string and return the string. Nothing unusual going on.
EDIT
OK, I've commented out the file IO code in the create_page function and it solves the issue, but I don't understand why, and I don't know how to modify it so that it'll work properly.
The create_page function, and therefore the file IO, was still being executed when I found that commenting out the print statements solved the problem.
This is the file IO code from before I commented it out (it's at the very start of the create_page function and the rest of the function simply modifies the page string, then returns it):
#read the template from the file:
frame_f = open(FRAME)
page = frame_f.read()
frame_f.close()
EDIT:
I have just replaced the file IO by copying and pasting the file text directly into a string in my source file, so there is no longer any file IO. This still hasn't fixed the problem. At this point my only theory is that computers hate me...
EDIT:
I'll have to post this here since stackoverflow won't let me answer my own question since I'm a newbie here...
EDIT:
OK, I posted it as an actual answer now.
PROBLEM SOLVED!
It turns out that it was the browser's fault all along. The reason I didn't notice this sooner was because I tested it in both Firefox and Chrome ages ago to rule the browser out, however it turns out that both Chrome and Firefox share this same bug.
I realised what was happening when the server logs finally updated: GET requests were often followed immediately (1 second later) by another GET request. I did some googling and found this:
What causes Firefox to make a GET request after submitting a form via the POST method?
It turns out that if you have an img tag with an empty src attribute e.g.
<img src=""/>
(I had some javascript which modified that tag), Firefox will send a duplicate GET request in place of a request for the image. It also turns out that Chrome has the same problem. This also explains why the problem was only happening when I was trying to include my html template.
It would help if you posted more code, but does the "page" variable contain code that would execute the email server a second time, or cause a page refresh that would trigger the email a second time?
The same thing will happen if you have a Javascript call with an empty src or "#" as src:
<script type="text/javascript" src="#"></script>
Perhaps also with an empty href for a css link. I haven't experienced that, but I'd expect the same behavior.
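To check a template for this before it bites you, something like the following could scan the HTML for the suspicious attributes. It is a sketch for Python 3 using the standard html.parser module. The class and function names are made up, and the list of tag/attribute pairs only covers the cases mentioned above:

```python
from html.parser import HTMLParser


class EmptyUrlFinder(HTMLParser):
    """Collect tags whose URL attribute is empty or just '#'."""

    # (tag, attribute) pairs taken from the cases discussed above.
    SUSPECTS = {("img", "src"), ("script", "src"), ("link", "href")}

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) not in self.SUSPECTS:
                continue
            # value is None when the attribute has no value at all.
            if value is None or value.strip() in ("", "#"):
                self.findings.append((tag, name, value))


def find_empty_urls(page):
    """Return a list of (tag, attribute, value) for suspicious attributes."""
    finder = EmptyUrlFinder()
    finder.feed(page)
    finder.close()
    return finder.findings


template = (
    '<img src=""/>'
    '<script type="text/javascript" src="#"></script>'
    '<img src="logo.png"><a href="#">top</a>'
)
problems = find_empty_urls(template)
```

An ordinary a href="#" link is deliberately ignored, since it does not make the browser fetch anything on page load.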
# settings.py
EMAIL_BACKEND = 'django.core.mail.backends.filebased.EmailBackend'
# view.py
from django.core.mail import send_mail
def send_letter(request):
    the_text = 'this is a test of a really long line that has more words that could possibly fit in a single column of text.'
    send_mail('some_subject', the_text, 'me#test.com', ['me#test.com'])
The Django view code above, results in a text file that contains a broken line:
this is a test of a really long line that has more words that could possibl=
y fit in a single column of text.
-------------------------------------------------------------------------------
Anyone know how to change it so the output file doesn't have linebreaks? Is there some setting in Django that controls this? Version 1.2 of Django.
Update - to back up a level and explain my original problem :) I'm
using the django-registration app, which sends an email with an
account activation link. This link is a long URL, with a random
token at the end (30+ characters), and as a result, the line is breaking in the middle of the token.
In case the problem was using the Django's filebased EmailBackend, I switched to the smtp backend and ran the built-in Python smtpd server, in debugging mode. This dumped my email to the console, where it was still broken.
I'm sure django-registration is working, with zillions of people using it :) So it must be something I've done wrong or mis-configured. I just have no clue what.
Update 2 - according to a post in a Django list, it's really the underlying Python email.MIMEText object, which, if correct, only pushes the problem back a little more. It still doesn't tell me how to fix it. Looking at the docs, I don't see anything that even mentions line-wrapping.
Update 3 (sigh) - I've ruled out it being a MIMEText object problem. I used a pure Python program and the smtplib/MIMEText to create and send a test email, and it worked fine. It also used a charset = "us-ascii", which someone suggested was the only charset to not wrap text in MIMEText objects. I don't know if that's correct or not, but I did look more closely at my Django email output, and it has a charset of "utf-8".
Could the wrong charset be the problem? And if so, how do I change it in Django?
Here's the entire output stream from Django's email:
---------- MESSAGE FOLLOWS ----------
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Subject: some_subject
From: me#test.com
To: me#test.com
Date: Tue, 17 May 2011 19:58:16 -0000
this is a test of a really long line that has more words that could possibl=
y fit in a single column of text.
------------ END MESSAGE ------------
You might be able to get your email client to not break on the 78 character soft limit by creating an EmailMessage object and passing in headers={'format': 'flowed'} Like so:
from django.core.mail import EmailMessage
def send_letter(request):
    the_text = 'this is a test of a really long line that has more words that could possibly fit in a single column of text.'
    email = EmailMessage(
        subject='some_subject',
        body=the_text,
        from_email='me#test.com',
        to=['me#test.com'],
        headers={'format': 'flowed'})
    email.send()
If this doesn't work, try using a non-debug smtp setup to send the file to an actual email client that renders the email according to rules defined in the email header.
Try defining EMAIL_BACKEND in your settings.py. Maybe it won't solve your problem, but that is the right place to define it; otherwise it's likely not going to be used.
(Since I'm not sure I'm solving your problem here, I was trying to comment on your question, but apparently I cannot.)
The email lines aren't "broken" per se -- they're just represented in the quoted-printable encoding. As such, at 76 characters, =\n is inserted. Any competent mail client ought to decode the message properly and remove the break.
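You can see the soft line break being undone with the standard quopri module. This is just a Python 3 illustration using the two lines from the question's output:

```python
import quopri

# The body exactly as it appears in the dumped message: the "=" at the
# end of the first line is a quoted-printable soft line break.
encoded = (
    b"this is a test of a really long line that has more words that could possibl=\n"
    b"y fit in a single column of text.\n"
)

# Decoding removes the "=\n" pair and rejoins the line.
decoded = quopri.decodestring(encoded)
print(decoded.decode("ascii"))
```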
If you want to represent the body of an email decoded, you can do so by passing decode=True to the get_payload method:
body = email.get_payload(decode=True)
This tells the message to decode the quoted-printable encoding.
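A fuller sketch of the same call in Python 3, parsing a message like the one in the question with the standard email package. The variable names are arbitrary, and the message text is a trimmed copy of the dump above:

```python
import email

# Trimmed copy of the message dumped by the debugging server.
raw = (
    'Content-Type: text/plain; charset="utf-8"\n'
    "MIME-Version: 1.0\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "Subject: some_subject\n"
    "\n"
    "this is a test of a really long line that has more words that could possibl=\n"
    "y fit in a single column of text.\n"
)

message = email.message_from_string(raw)
# decode=True undoes the Content-Transfer-Encoding and returns bytes,
# so decode those bytes with the charset declared in Content-Type.
body = message.get_payload(decode=True).decode("utf-8")
print(body)
```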
More to the point, if your main concern is getting the python console debugging server to print the message decoded, you could do something quick and dirty like this snippet rather than using the built-in DebuggingServer. More properly, you could parse the "data" string as an Email object, print out the headers you care about, then print the body with decode=True.
I've seen this in Python 2.5, and it's fixed in Python 2.7.
The relevant code in email/generator.py now has a comment saying
# Header's got lots of smarts, so use it. Note that this is
# fundamentally broken though because we lose idempotency when
# the header string is continued with tabs. It will now be
# continued with spaces. This was reversedly broken before we
# fixed bug 1974. Either way, we lose.
You can read about the bug here http://bugs.python.org/issue1974
Or you can just change the '\t' to ' ' in this line of email/generator.py
print >> self._fp, Header(
    v, maxlinelen=self._maxheaderlen,
    header_name=h, continuation_ws='\t').encode()