=)
I need to get all messages in an email inbox that were sent from a specific address.
For that I use this command:
self.server.search(None, '(HEADER FROM "test#gmail.com")')
It works, but when I search for messages from st#gmail.com I get the same results. I know that this criterion matches every message whose From header CONTAINS the given string, but for me test#gmail.com and st#gmail.com are different addresses. How can I search for addresses that are EQUAL to the string rather than ones that CONTAIN it?
import imaplib
self.server = imaplib.IMAP4(self.imap_ssl_host, self.imap_ssl_port)
You can try searching for <test#gmail.com> instead of test#gmail.com.
A message from test#gmail.com usually says From: Firstname Lastname <test#gmail.com>, which contains the substring <test#, and most IMAP searches are substring searches, including FROM. If this hack is enough for you and for whatever server you're using, good for you. Otherwise you need to do client-side filtering to remove the false positives.
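The client-side filtering mentioned above could look roughly like this. It is a minimal sketch that assumes you have already fetched the raw From header of each candidate message; `is_exact_sender` is a hypothetical helper name, and the addresses are placeholders.

```python
from email.utils import parseaddr


def is_exact_sender(from_header, wanted):
    """Return True only if the address in a From header equals `wanted`.

    parseaddr() splits 'Firstname Lastname <user@example.com>' into a
    display name and the bare address, so the comparison is on the whole
    address and not on a substring of the header.
    """
    _, address = parseaddr(from_header)
    return address.lower() == wanted.lower()


# Both headers would be returned by a server-side substring search for
# "st@example.com"; only the second one is an exact match.
candidates = [
    "Firstname Lastname <test@example.com>",
    "Someone Else <st@example.com>",
]
exact = [h for h in candidates if is_exact_sender(h, "st@example.com")]
```

The server-side search still does the heavy lifting; this step only discards the false positives it lets through.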
I am working on a CRM, where I am receiving hundreds of emails for offers/requirements per day. I am building an API that will process the email and will insert entries in the CRM.
I am using imap_tools to get the mails in my API, but I am stuck when a mail is part of a thread/conversation. I read some articles about using the References or In-Reply-To headers of the mail, but with no luck so far. I have also tried using the Message-ID, but it gave me the same email thread instead of multiple emails.
I am getting an email thread/conversation as a single email, and I want to get separate emails so I can process them easily.
Here's what I have done so far.
from imap_tools import MailBox
with MailBox('mail.mail.com').login('abc#abc.com', 'password', 'INBOX') as mailbox:
    for msg in mailbox.fetch():
        From = msg.headers['from'][0]
        To = msg.headers['to'][0]
        subject = msg.headers['subject'][0]
        received_date = msg.headers['date'][0]
        raw_email = msg.text
        process_email(raw_email) #processing the email
The issue you are facing is not related to the References or In-Reply-To headers. Most email clients append the previous email as quoted text to the new mail when you reply. Hence, in a thread, a mail will contain the bodies of all previous mails as quoted text.
In most cases (and I say most because email conventions vary a lot from client to client), the client quotes the previous mail by prepending > to every quoted line:
new message
> old message
>> very old message
As a hacky solution, you can drop all lines that start with >.
In Python, you can use splitlines() and filter:
lines = email.splitlines()
new_lines = [i for i in lines if not i.startswith('>')]
or
new_lines = list(filter(lambda i: not i.startswith('>'), lines))
You may use regular expressions or other techniques too.
The issue with this solution is obvious: if an email contains > at the start of a line that is not a quote, that line will be lost. Hence a more complicated approach is to select the lines starting with >, compare them with the previous emails in the thread (found via the References header), and remove only those that match.
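A rough sketch of that comparison-based approach follows. It assumes you already have the plain-text bodies of the earlier mails in the thread (for example, looked up by the Message-IDs listed in the References header); `strip_known_quotes` is a hypothetical helper, not part of imap_tools.

```python
def strip_known_quotes(body, previous_bodies):
    """Drop '>'-quoted lines only when they repeat text from earlier mails."""
    # Collect every non-empty line seen in earlier mails of the thread.
    known = set()
    for previous in previous_bodies:
        for line in previous.splitlines():
            stripped = line.strip()
            if stripped:
                known.add(stripped)

    kept = []
    for line in body.splitlines():
        if line.startswith('>'):
            # Remove any depth of quote markers ('>', '>>', '> >', ...).
            unquoted = line.lstrip('> ').strip()
            if not unquoted or unquoted in known:
                continue  # empty quote line, or text repeated from the thread
        kept.append(line)
    return "\n".join(kept)
```

A line such as "> 5 means greater than five" survives, because it never appeared in an earlier mail. Clients that re-wrap quoted text would defeat the exact line comparison, so treat this as a starting point only.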
Google has their patented implementation here
https://patents.google.com/patent/US7222299
Source: How to remove the quoted text from an email and only show the new text
Edit
I realized Gmail follows the > quoting and other clients may follow other methods. There's a Wikipedia article on it: https://en.wikipedia.org/wiki/Posting_style
Conceptually the approach will be similar, but each type of client will need to be handled separately.
I'm writing a bot that uses imaplib in Python to fetch emails from Gmail and output some useful data from them. I've hit a snag on selecting the inbox, though: the existing sorting system uses custom labels to separate emails from different customers. I've partially replicated this system in my test email account, but imaplib.select() throws "imaplib.IMAP4.error: SELECT command error: BAD [b'Could not parse command']" with custom labels. My bot has no problem with the default Gmail folders, such as INBOX or [Gmail]/Spam. In that case it hits an error later in the code, which is a completely different problem I have yet to fix. The point, though, is that imaplib.select() is successful with the default folders and fails only with custom labels.
The way my code works is it works through all the available inboxes, compares it to a user-inputted name, and if they match, saves the name and sets a boolean to true to signal that it found a match. It then checks, if there was a match (the user-inputted inbox exists) it goes ahead, otherwise it throws an error message and resets. It then attempts to select the inbox the user entered.
I've verified that the variable the program's saving the inbox name to matches what's listed as the name in the imap.list() command. I have no idea what the issue is.
I could bypass the process by iterating through all mail to find the emails I'm looking for, but it's far more efficient to use the existing sorting system due to the sheer number of emails on the account I'll be using.
Any help is appreciated!
EDIT: Code attached after request. Thank you to the person who told me to do so.
'''
Fetches emails from the specified inbox and outputs them to a popup
'''
def fetchEmails(self):
    #create an imap object. Must be local otherwise we can only establish a single connection
    #imap states are kinda bad
    imap = imaplib.IMAP4_SSL(host="imap.gmail.com", port=993)
    #Login and fetch a list of available inboxes
    imap.login(username.get(), password.get())
    type, inboxList = imap.list()
    #Set a reference boolean and iterate through the list
    inboxNameExists = False
    for i in inboxList:
        #Finds the name of the inbox
        name = self.inboxNameParser(i.decode())
        #If the given inbox name is encountered, set its existence to true and break
        if name.casefold() == inboxName.get().casefold():
            inboxNameExists = True
            break
    #If the inbox name does not exist, break and give error message
    if not inboxNameExists:
        self.logout(imap)
        tk.messagebox.showerror("Disconnected!", "That Inbox does not exist.")
        return
    '''
    If/else to correctly feed the imap.select() method the inbox name
    Apparently inboxes containing spaces require quotation marks before and after
    Selects the inbox and pushes it to a variable
    two actually but the first is unnecessary(?)
    imap is weird
    '''
    if " " in name:
        status, messages = imap.select("\"" + name + "\"")
    else:
        status, messages = imap.select(name)
    #Int containing total number of emails in inbox
    messages = int(messages[0])
    #If there are no messages disconnect and show an infobox
    if messages == 0:
        self.logout(imap)
        tk.messagebox.showinfo("Disconnected!", "The inbox is empty.")
    self.mailboxLoop(imap, messages)
Figured the issue out after a few hours banging through it with a friend. As it turns out the problem was that imap.select() wants quotations around the mailbox name if it contains spaces. So imap.select("INBOX") is fine, but with spaces you'd need imap.select("\"" + "Label Name" + "\"")
You can see this reflected in the code I posted with the last if/else statement.
Python's imaplib requires mailbox names that contain spaces to be surrounded by double quotes. So imap.select("INBOX") is fine, but with spaces you'd need imap.select("\"" + "Label Name" + "\"").
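The quoting can be wrapped in a small helper so the if/else is not needed at every call site. This is a sketch under the assumption that the server accepts any mailbox name as an IMAP quoted string; `quote_mailbox` is a hypothetical name, not an imaplib function.

```python
def quote_mailbox(name):
    """Wrap a mailbox name in double quotes for imaplib's select().

    Backslashes and double quotes inside the name are escaped, as the
    IMAP quoted-string syntax requires.
    """
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# Hypothetical usage with an already logged-in imaplib.IMAP4_SSL object:
# status, messages = imap.select(quote_mailbox("Label Name"))
```

Quoting every name, including ones without spaces such as INBOX, keeps the calling code uniform.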
I am trying to grab a list of messages that have a specific content e.g. billing emails and work on data in there.
In order to get these messages, I run the following
service.users().messages().list(userId=user_id, page_token=page_token, q=query).execute()
which returns all the messages.
I want to limit the messages that I get to those that conform to the following criteria:
Sent in the last two days
Definitely deny if the from: address is in a list of email addresses, i.e. a blacklist (e.g. notifications, facebook)
Definitely accept if the from: address is in a list of email addresses, i.e. a whitelist
Check whether the subject: matches one of a set of strings
I understand that I can create a query that would match the email address and subject (from:bill#pge.com AND subject:"Your bill for this month"), but the blacklist and whitelist, as mentioned above, can become significantly large as the scope and the number of vendors I can accept increases, and similar is the case with subject. So my question is:
Is there a limit on the number of query terms?
Is there a way to achieve this other than generating a very long query string combining the black list whitelist and subject (from:abc#this.com AND NOT from:xyz#that.com AND subject:"Your bill" AND subject:"This month's bill")?
Note: For project settings I mostly conform to https://developers.google.com/gmail/api/quickstart/python
There's no documented limit on the number of query terms you can use. And yes, you would have to build a long query string programmatically, combining all the addresses from the lists. You can check the available operators here [1]. The best approach would be something like this:
1) Use "after" or "newer" operators with a timestamp from 2 days before the current date.
2) -from:{xxx#xxx.com xxx#xxx.com ...}
3) from:{xxx#xxx.com xxx#xxx.com ...}
4) subject:{xxx xxx ...}
[1] https://support.google.com/mail/answer/7190
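The four steps above can be sketched as a small query builder. This is only an illustration: `build_gmail_query` is a hypothetical helper, the addresses are placeholders, and it uses the `newer_than:` operator and the `{ }` OR-grouping in the `{from:a from:b}` form shown on the search operators page [1]. How whitelist and subject terms should combine is an assumption here (accept if either matches).

```python
def build_gmail_query(whitelist, blacklist, subjects, days=2):
    """Build a Gmail search string from sender lists and subject phrases."""
    # 1) Only messages from the last `days` days.
    parts = [f"newer_than:{days}d"]
    # 2) Exclude every blacklisted sender: -{from:a from:b ...}
    if blacklist:
        parts.append("-{" + " ".join(f"from:{a}" for a in blacklist) + "}")
    # 3) and 4) Accept whitelisted senders OR matching subject phrases.
    accept = [f"from:{a}" for a in whitelist]
    accept += [f'subject:"{s}"' for s in subjects]
    if accept:
        parts.append("{" + " ".join(accept) + "}")
    return " ".join(parts)


query = build_gmail_query(
    whitelist=["bill@example.com"],
    blacklist=["notifications@example.org"],
    subjects=["Your bill for this month"],
)
# The result is passed as q=query to service.users().messages().list(...).
```

Since no term limit is documented, it may be worth checking how the API behaves once the lists grow large, and splitting the request into several queries if needed.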
I am retrieving emails from my email server using IMAPClient (Python), by checking for emails flagged with "\Recent". After the email has been read the email server automatically sets the email flag to "\Seen".
What I want to do is reset the email flag to "\Recent" so that when I check the email directly on the server it still appears as unread.
What I'm finding is that IMAPClient throws an exception when I try to add the "\Recent" flag to an email using IMAPClient's "set_flags" method. Adding any other flag works fine.
The IMAPClient documentation says the Recent flag is read-only, but I was wondering if there is still a way to mark an email as unread.
From my understanding, email software like Thunderbird allows you to mark emails as unread, so I assume there must be a way to do it.
Thanks.
For completeness, here's an actual example using IMAPClient. The \Seen flag is updated in order to control whether messages are marked as read or unread.
from imapclient import IMAPClient, SEEN
client = IMAPClient(...)
client.select_folder('INBOX')
msg_ids = client.search(...)
# Mark messages as read
client.add_flags(msg_ids, [SEEN])
# Mark messages as unread
client.remove_flags(msg_ids, [SEEN])
Note that add_flags and remove_flags are used instead of set_flags because the latter resets the flags to just those specified. When setting the read/unread status you typically want to leave any other message flags intact.
It's also worth noting that it's possible to call fetch using the "BODY.PEEK" data item to retrieve parts of messages without affecting the \Seen flag. This can avoid the need to fix up the \Seen flag after downloading a message.
See section 6.4.5 of RFC 3501 for more details.
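A minimal sketch of the BODY.PEEK approach, assuming `client` is a logged-in IMAPClient with a folder already selected. `fetch_bodies_without_marking_read` is a hypothetical helper name. Note that the server reports the data under the key `BODY[]`, without the `.PEEK` part.

```python
def fetch_bodies_without_marking_read(client, msg_ids):
    """Fetch full raw messages while leaving the \\Seen flag untouched.

    `client` is expected to behave like imapclient.IMAPClient:
    fetch() returns {msg_id: {b'BODY[]': raw_bytes, ...}}.
    """
    response = client.fetch(msg_ids, ["BODY.PEEK[]"])
    return {msg_id: data[b"BODY[]"] for msg_id, data in response.items()}
```

With this, messages that were unread before the download stay unread, so there is nothing to reset afterwards.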
The IMAPClient docs specifically state that the '\Recent' flag is read-only:
http://imapclient.readthedocs.org/en/latest/#message-flags
This is probably a feature (or limitation) of IMAP and IMAP servers. (That is: probably not an IMAPClient limitation).
Remove the '\Seen' flag to mark something as unread.
Disclaimer: I'm familiar with IMAP but not Python-IMAPClient specifically.
Normally the 'seen' flag determines whether an email summary is shown in normal or bold type.
You should be able to reset the seen flag. However, the recent flag may not be under your direct control. The IMAP server sets it when it notices new messages arriving.
#Menno Smits:
I'm having issues adding the '\Seen' flag to a mail after parsing through it.
I only want to mark a mail as READ when it contains a particular text.
I've been trying to use add_flags in the form "client.add_flags(msg_ids, [SEEN])" you gave above, but I keep getting "store failed: Command received in invalid state". What exactly goes into the [SEEN]? (Is this just a placeholder or the exact syntax?)
Here is a portion of my code:
#login and authentication
context=ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
iobj=imapclient.IMAPClient('outlook.office365.com', ssl=True,ssl_context=context)
iobj.login(uname,pwd)
iobj.select_folder('INBOX', readonly=True)
unread=iobj.search('UNSEEN')
print('There are: ',len(unread),' UNREAD emails')
for i in unread:
    mail=iobj.fetch(i,['BODY[]'])
    mail_body=html2text.html2text(mcontent.html_part.get_payload().decode(mcontent.html_part.charset))
    ##Do some regex to parse the email to check if it contains text
    meter_no=(re.findall(r'\nACCOUNT NUMBER: (\d+)', mail_body))
    req_type=(re.findall(r'Complaint:..+?\n(.+)\n', mail_body))
    if 'Key Change' in req_type:
        if meter_no in kct['Account_no'].values:
            print('Going to sendmail')# Call a function
            sending_email(meter_no,subject,phone_no,req_type,)
            mail[b'FLAGS']=r'b\Seen'+','+''+r'b\Answered'##Trying to manually alter the flag but didn't work##
            iobj.add_flags(i,br'\Seen')# Didn't work too (but is 'i' my msg_id??)
            iobj.add_flags(i,[SEEN]) # Complains Name SEEN not defined
        else: print('KCT is yet to be generated')
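Two things in the snippet above look like probable causes, though this is a guess and not a confirmed diagnosis. First, the folder is selected with readonly=True, and a server will normally refuse to store flags on a folder opened that way. Second, SEEN is a constant that IMAPClient exports, so it has to be imported (from imapclient import SEEN) or written as imapclient.SEEN. The sketch below assumes an IMAPClient-like `client`; `mark_matching_as_read` is a hypothetical helper, and SEEN is defined locally with the same bytes value so the block has no third-party imports.

```python
import re

# Same value as the SEEN constant exported by the imapclient package.
SEEN = br"\Seen"


def mark_matching_as_read(client, pattern, folder="INBOX"):
    """Mark unread messages as read only if their raw text matches `pattern`."""
    # Open the folder read-write; flag changes need a writable selection.
    client.select_folder(folder, readonly=False)
    marked = []
    for msg_id in client.search("UNSEEN"):
        # BODY.PEEK[] fetches the message without setting \Seen implicitly.
        raw = client.fetch([msg_id], ["BODY.PEEK[]"])[msg_id][b"BODY[]"]
        if re.search(pattern, raw.decode("utf-8", errors="replace")):
            client.add_flags([msg_id], [SEEN])
            marked.append(msg_id)
    return marked
```

The values returned by search() are the message IDs, so the loop variable can be passed to add_flags as shown.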
Is it possible to use wildcards in searching for a specific sender on IMAP folder?
typ, data = M.SEARCH(None, 'from','"security#website*"')
IMAP RFC 3501 6.4.4:
In all search keys that use strings, a message matches the key if
the string is a substring of the field. The matching is
case-insensitive.
So you need to search without the * and you should get almost the same result
(you get every sender that contains security#website ...).
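To make the substring rule concrete, here is a small sketch. `from_search_args` is a hypothetical helper that drops a trailing * and produces the arguments for imaplib's search(); `matches_like_imap` only imitates the RFC 3501 rule locally (case-insensitive substring) and is not how the server is implemented.

```python
def from_search_args(fragment):
    """Build imaplib search() criteria for a sender fragment, without wildcards."""
    cleaned = fragment.rstrip("*")
    return ("FROM", '"' + cleaned + '"')


def matches_like_imap(header_value, key):
    """Imitate the RFC 3501 string-key rule: case-insensitive substring."""
    return key.lower() in header_value.lower()


# Hypothetical usage with a connected imaplib object M:
# typ, data = M.search(None, *from_search_args("security@website*"))
```

Because the match is already a substring match, "security@website" finds senders at website.com, website.org, and so on, which is what the wildcard was meant to do.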