Selectively get email messages with Gmail API - python

I am trying to fetch the messages that have specific content (e.g. billing emails) and work on the data in them.
To get these messages, I run the following:
    service.users().messages().list(userId=user_id, pageToken=page_token, q=query).execute()
which returns all the messages.
I want to limit the messages I get to those that conform to the following criteria:
Sent in the last two days
Definitely deny if from: address in a list of email addresses i.e. blacklist e.g. notifications, facebook
Definitely accept if from: address in a list of email addresses i.e. whitelist
Look if the subject: matches a set of strings
I understand that I can create a query that matches the email address and subject (from:bill@pge.com AND subject:"Your bill for this month"), but the blacklist and whitelist mentioned above can become very large as the scope and the number of vendors I accept grow, and the same goes for the subjects. So my questions are:
Is there a limit on the number of query terms?
Is there a way to achieve this other than generating a very long query string combining the blacklist, whitelist and subjects (from:abc@this.com AND NOT from:xyz@that.com AND subject:"Your bill" AND subject:"This month's bill")?
Note: For project settings I mostly conform to https://developers.google.com/gmail/api/quickstart/python

There is no documented limit on the number of query terms. And yes, you would have to build a long query string programmatically, combining all the addresses from the lists. You can check the available operators here [1]. The best approach would be something like this:
1) Use the "after" or "newer" operator with a timestamp from 2 days before the current date.
2) -from:{xxx@xxx.com xxx@xxx.com ...}
3) from:{xxx@xxx.com xxx@xxx.com ...}
4) subject:{xxx xxx ...}
[1] https://support.google.com/mail/answer/7190
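As a rough sketch of the steps above, the query string could be assembled like this. The helper name build_query and its parameters are made up for illustration, and it uses newer_than:2d (a relative-date operator from the same help page) instead of a computed timestamp. It assumes the intended logic is: drop blacklisted senders, then keep a message if the sender is whitelisted or the subject matches. Subjects that themselves contain double quotes are not handled.

```python
def build_query(whitelist, blacklist, subjects, days=2):
    """Build a Gmail search string (hypothetical helper, not part of the API)."""
    # Only messages from the last `days` days
    parts = ["newer_than:%dd" % days]
    # Exclude every blacklisted sender with the "-" operator
    parts += ["-from:%s" % addr for addr in blacklist]
    # Inside { }, terms are OR-ed: whitelisted sender OR matching subject
    accept = ["from:%s" % addr for addr in whitelist]
    accept += ['subject:"%s"' % s for s in subjects]
    if accept:
        parts.append("{" + " ".join(accept) + "}")
    return " ".join(parts)


query = build_query(
    whitelist=["bill@pge.com"],
    blacklist=["notifications@facebook.com"],
    subjects=["Your bill"],
)
print(query)
```

The resulting string is what you would pass as q= to messages().list(). Since no limit is documented, splitting very large lists across several queries and merging the results may be a sensible precaution, but that is a guess rather than a documented requirement.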

Related

Get email address of members of an Exchange Distribution List - Python

I used win32com.client and could successfully access the members of an Exchange distribution list using Python. However, because two users have the same first and last name, I would like to access their email addresses instead of their names.
Using the loop below, I can go through the members of an Exchange distribution list and print the names of all members:
    import win32com.client

    # Get the Outlook MAPI namespace and all of its address lists
    outlook_obj = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    address_lists = outlook_obj.AddressLists
    # Access the Exchange distribution lists
    dist_lists = address_lists['All Distribution Lists']
    # Try different numbers until you find the index of your desired distribution list,
    # or loop through all the entries and find the one you are looking for
    dl_index = a_numerical_index_greater_than_zero
    for m in dist_lists.AddressEntries.Item(dl_index).GetExchangeDistributionList().Members:
        print(str(m))
The above script works perfectly and prints the names of all the members of the distribution list. However, I am looking for the distinct email addresses of the members, as names are not distinct (I can have two people with the same name Jack Smith, but jack.smith@xyz.com and jack.smith2@xyz.com are still distinct).
I used the object definition from this source to build the above code, but I seem unable to get from the members to their email addresses.
Appreciate any help!
Okay - I found my answer and I am sharing it in case others need it.
The expression below returns the ExchangeUser object for a member:
    dist_lists.AddressEntries.Item(dl_index).GetExchangeDistributionList().Members[0].GetExchangeUser()
and that object gives you access to all the details of the account, including the email address. Below is the exact code to fetch the email address of the user:
    dist_lists.AddressEntries.Item(dl_index).GetExchangeDistributionList().Members[0].GetExchangeUser().PrimarySmtpAddress
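To get the addresses of all members rather than just the first one, the same two calls can be applied in a loop. This is only a sketch: the function name member_smtp_addresses is made up, and it assumes that GetExchangeUser() returns None for members that are not Exchange users (for example nested lists), so those are skipped.

```python
def member_smtp_addresses(members):
    """Collect PrimarySmtpAddress for each member (hypothetical helper)."""
    addresses = []
    for member in members:
        user = member.GetExchangeUser()
        # Skip members that are not Exchange users (assumed to yield None)
        if user is not None:
            addresses.append(user.PrimarySmtpAddress)
    return addresses
```

With the objects from the question, it would be called as member_smtp_addresses(dist_lists.AddressEntries.Item(dl_index).GetExchangeDistributionList().Members).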

IMAP search for address is equal not contains

=)
I need to get all the messages in an email inbox that come from a specific address.
For that I use the command:
    self.server.search(None, '(HEADER FROM "test@gmail.com")')
and it works, but when I search for messages from st@gmail.com I get the same results. I know that this criterion finds all messages whose From header CONTAINS the given string. But for me test@gmail.com and st@gmail.com are different addresses. How can I search for addresses that are EQUAL to the string rather than ones that CONTAIN it?
import imaplib
self.server = imaplib.IMAP4(self.imap_ssl_host, self.imap_ssl_port)
You can try searching for <test@gmail.com> instead of test@gmail.com.
A message from test@gmail.com usually says From: Firstname Lastname <test@gmail.com>, which contains the substring <test@, and most IMAP searches are substring searches, including FROM. If this hack is enough for you and whatever server you're using, good for you; otherwise you need to do client-side filtering to remove the false positives.
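The client-side filtering could look roughly like the sketch below. It uses only the standard library; the function name is_from is made up, and it assumes you have already fetched the raw headers (or the whole message) as bytes for each hit returned by the search.

```python
from email import message_from_bytes
from email.utils import getaddresses


def is_from(raw_message, address):
    """Return True only if a From address equals `address` exactly (hypothetical helper)."""
    msg = message_from_bytes(raw_message)
    # getaddresses splits "Name <addr>" pairs into (name, addr) tuples
    senders = getaddresses(msg.get_all("From", []))
    return any(addr.lower() == address.lower() for _, addr in senders)


raw = b"From: Firstname Lastname <test@gmail.com>\r\nSubject: hi\r\n\r\nbody"
print(is_from(raw, "test@gmail.com"))
print(is_from(raw, "st@gmail.com"))
```

One way to get the bytes is to fetch only the From header for each message number, e.g. with the fetch item '(BODY.PEEK[HEADER.FIELDS (FROM)])', and pass the returned header block to is_from.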

Extracting the Server Name \Ip from the Description using Python

I have a column called Description in my Dataframe. I have text in that column as below.
Description
Summary: SD1: Low free LOG space in database saptempdb: 2.99% Date: 01/01/2017 Severity: Major Reso
Summary: SD1: Low free DATA space in database 10:101:101:1 2.99% Date: 01/01/2017 Severity: Major Res
Summary: SAP SolMan Sys=SM1_SNG01AMMSOL04,MO=AGEEPM40,Alert=Columnstore Unloads,Desc= ,Cat=Exception
How can I extract the server names or IPs from the descriptions above? I have around 10000 rows.
I have written the code below to split the sentences on spaces. Now I need to filter out the server names or IPs.
    df['sentsplit'] = df["Description"].str.split(" ")
    print(df)
The general case of what you're asking is "How do I parse this input?". The task then is what knowledge of your input can you exploit to answer your question? Do all the lines follow one or a few forms? Can you place any restrictions on where the hostname or IP address will be on each line?
Given your input, here's a regex I might apply. Quick and dirty -- not elegant -- but if it's only for 10,000 lines, and a one-off job, who cares? It's functional:
database (\d+:\d+:\d+:\d+)|database (\w+)|Sys=([^, ]+),
This regex assumes that the IP address will always come after the word database and a space, OR that the hostname will come after the word database and a space, OR that the hostname will be preceded by Sys= and followed by a comma.
Obviously, test for your purposes, and fine-tune as appropriate. In the Python API:
    import re

    host_or_ip_re = re.compile(r'database (\d+:\d+:\d+:\d+)|database (\w+)|Sys=([^, ]+),')
    for line in log:
        m = host_or_ip_re.search(line)
        if m:
            print(m.groups())
The detail that always trips me up is the difference between match and search: match only matches at the beginning of the string, while search scans the whole string for the first match.
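Since only one of the three groups matches on any given line, it may be handy to collapse them into a single value. Here is a small sketch of that, run against shortened versions of the sample rows from the question; the function name extract_host is made up.

```python
import re

HOST_OR_IP_RE = re.compile(r'database (\d+:\d+:\d+:\d+)|database (\w+)|Sys=([^, ]+),')


def extract_host(description):
    """Return the first hostname or IP found, or None (hypothetical helper)."""
    m = HOST_OR_IP_RE.search(description)
    if m is None:
        return None
    # Exactly one alternative matched, so exactly one group is not None
    return next(g for g in m.groups() if g is not None)


samples = [
    "Summary: SD1: Low free LOG space in database saptempdb: 2.99% Date: 01/01/2017",
    "Summary: SD1: Low free DATA space in database 10:101:101:1 2.99% Date: 01/01/2017",
    "Summary: SAP SolMan Sys=SM1_SNG01AMMSOL04,MO=AGEEPM40,Alert=Columnstore Unloads",
]
hosts = [extract_host(s) for s in samples]
print(hosts)
```

On the DataFrame from the question, something like df["Description"].apply(extract_host) should give a new column with one value per row.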

How to grab the individual address from a mail group on Outlook?

I am trying to list all the recipients I send emails to by using MAPI.
On MSDN, I found a script that does this. One problem is that for a mail group it only shows the group address, whereas I want to list all the individual addresses within the group.
Does anyone know if this is possible?
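    # MAPI property tag for the SMTP address
    PR_SMTP_ADDRESS = "http://schemas.microsoft.com/mapi/proptag/0x39FE001E"

    recips = message.Recipients
    for recip in recips:
        pa = recip.PropertyAccessor
        smtpAddress = pa.GetProperty(PR_SMTP_ADDRESS)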
Process the Recipient.AddressEntry.Members collection recursively. It will be null for the recipients who are not distribution lists.
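The recursion described above could be sketched like this. The function name expand_address_entry is made up, and the sketch assumes that Members comes back as None for anything that is not a distribution list and that lists do not contain each other in a cycle.

```python
def expand_address_entry(entry):
    """Flatten an address entry into its individual entries (hypothetical helper)."""
    members = entry.Members
    if members is None:
        # Not a distribution list: this is an individual address
        return [entry]
    expanded = []
    for member in members:
        # A member may itself be a distribution list, so recurse
        expanded.extend(expand_address_entry(member))
    return expanded
```

It would be called once per recipient, e.g. expand_address_entry(recip.AddressEntry), and the SMTP address can then be read from each returned entry.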

Django or python manipulate email addresses and reason about domains

I want to be able to parse email addresses to isolate the domain part, and test if an email address is part of a given domain.
The email module doesn't, as far as I can tell, do that. Is there anything worth using to do this other than the usual string handling and regex routines?
Note: I know how to deal with python strings. I don't need basic recipes, although awesome recipes are welcome.
The problem here is essentially that email addresses have the format (schematically) userpart@sub\.domain\.[sld]+\.tld.
Stripping the part before the @ is easy; the hard part is parsing the domain to work out which parts are subdomains on a larger organisation's domain, rather than generic second-level (or, I guess even higher order) public domains.
Imagine parsing user@mail.organisation.co.uk to find that the organisation's domain name is organisation.co.uk and so be able to match both mail.organisation.co.uk and finance.organisation.co.uk as subdomains of organisation.co.uk.
There are basically two possible (non-dns-based) approaches: build a finite automaton that knows about all generic slds and their relation to the tld (including popular 'fake' slds like uk.com), or try to guess, based on the knowledge that there must be a tld, and assuming that if there are three (or more) elements, the second-level domain is generic if it has fewer than three/four characters. The relative drawbacks of each approach should be obvious.
The alternative is to look through DNS entries to work out what is a registered domain, which has its own drawbacks.
In any case, I would rather piggyback on the work of others.
As per @dm03514's comment, there is a Python library that does exactly this: tldextract:
    >>> import tldextract
    >>> tldextract.extract('foo@bar.baz.org.uk')
    ExtractResult(subdomain='bar', domain='baz', tld='org.uk')
With this simple script, we replace @ with @. so that the domain part always starts with a dot. The endswith check then matches the domain itself and its subdomains, but not a different domain that merely ends with the same text.
    def address_in_domain(address, domain):
        return address.replace('@', '@.').endswith('.' + domain)

    if __name__ == '__main__':
        addresses = [
            'user1@domain.com',
            'user1@anotherdomain.com',
            'user2@org.domain.com',
        ]
        print([a for a in addresses if address_in_domain(a, 'domain.com')])
        # Prints: ['user1@domain.com', 'user2@org.domain.com']
