I have this regex for extracting emails which works fine:
([a-zA-Z][\w\.-]*[a-zA-Z0-9])#([a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z])
however there are some e-mails I don't want to include like:
server#example.com
noreply#example.com
name#example.com
I've been trying to add things like ^(?!server|noreply|name), but it isn't working.
Also, will using parentheses as above affect the (name, domain) tuples?
Just check for those email addresses after you extract them...
import re

bad_addresses = ['server#example.com', 'noreply#example.com', 'name#example.com']
# No capture groups here, so findall returns each whole address as one string
emails = re.findall(r'[a-zA-Z][\w.-]*[a-zA-Z0-9]#[a-zA-Z0-9][\w.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z.]*[a-zA-Z]', contentwithemails)
for item in emails[:]:
    if item in bad_addresses:
        emails.remove(item)
You have to loop over a slice of emails ( emails[:] ), because removing items from a list while a for loop is iterating over it makes the loop skip elements. The slice creates a "ghost" copy that can be read while the real list is modified.
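A small sketch of why the copy matters. The sample addresses and the helper names remove_in_place and remove_via_copy are made up for illustration:

```python
bad_addresses = ['server#example.com', 'noreply#example.com']

def remove_in_place(emails):
    # Buggy on purpose: the list shrinks while the loop is walking over it
    for item in emails:
        if item in bad_addresses:
            emails.remove(item)
    return emails

def remove_via_copy(emails):
    # Iterate over a copy (emails[:]) and modify the original list
    for item in emails[:]:
        if item in bad_addresses:
            emails.remove(item)
    return emails

sample = ['a#x.com', 'noreply#example.com', 'server#example.com', 'b#x.com']
skipped = remove_in_place(list(sample))   # 'server#example.com' is never visited
cleaned = remove_via_copy(list(sample))
print(skipped)
print(cleaned)
```

In the first function, removing 'noreply#example.com' shifts the next element into the position the loop has already passed, so that element is never checked.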
Check the results from your regex for any emails that match the bad emails list.
results = list_from_your_regex
invalids = ['info', 'server', 'noreply', ...]
valid_emails = [good for good in results if good.split('#')[0] not in invalids]
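One thing to keep in mind: the pattern in the question has two capture groups, so re.findall returns (name, domain) tuples rather than whole strings. A minimal sketch of filtering on the name part under that assumption follows. The sample text and the names EMAIL_RE, text and pairs are invented for illustration, and the addresses keep the # separator used in this thread:

```python
import re

# The question's pattern, with both capture groups kept
EMAIL_RE = re.compile(
    r'([a-zA-Z][\w.-]*[a-zA-Z0-9])#'
    r'([a-zA-Z0-9][\w.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z.]*[a-zA-Z])'
)

text = "Contact alice#example.com, noreply#example.com or bob.smith#mail.example.org"
invalids = {'info', 'server', 'noreply', 'name'}

# Each match is a (name, domain) tuple because the pattern has two groups
pairs = EMAIL_RE.findall(text)

# Keep only the pairs whose name part is not on the unwanted list
valid_pairs = [(name, domain) for name, domain in pairs if name not in invalids]
valid_emails = ['#'.join(pair) for pair in valid_pairs]
print(valid_emails)
```

Filtering after extraction keeps the regex itself unchanged, which tends to be easier to maintain than a growing negative lookahead.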
Related
I am facing the following problem: I want to have an allowed senders list in my email parser, so it will look like:
allowed_emails = ["email1#gmail.com", "email2#gmail.com", "email3#gmail.com"]
I want to use this list in the line below. For now it works with only one allowed_sender; how can I pass a list as an argument?
result, data = mail.search(None,'(UNSEEN FROM "%s")' % allowed_sender)
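One possible approach, sketched under assumptions: mail is taken to be an imaplib.IMAP4 connection with a mailbox already selected (the call signature in the question matches imaplib), and search_unseen_from is a hypothetical helper name. It simply runs the same search once per sender and merges the message ids:

```python
def search_unseen_from(mail, senders):
    """Run one UNSEEN FROM search per allowed sender and merge the ids.

    `mail` is assumed to behave like imaplib.IMAP4: search() returns a
    (status, [bytes]) pair where the bytes hold space-separated ids.
    """
    message_ids = []
    for sender in senders:
        result, data = mail.search(None, '(UNSEEN FROM "%s")' % sender)
        # Skip failed searches and empty results
        if result == 'OK' and data and data[0]:
            message_ids.extend(data[0].split())
    return message_ids

# Usage (assumes an existing, logged-in connection named mail):
# allowed_emails = ["email1#gmail.com", "email2#gmail.com", "email3#gmail.com"]
# ids = search_unseen_from(mail, allowed_emails)
```

This costs one round trip per sender, which is usually acceptable for a short allow-list.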
I'm attempting to implement the mass mail send out.
Here is the mass mail doc: Just a link to the Django Docs
In order to achieve this I need to create this tuple:
datatuple = (
('Subject', 'Message.', 'from#example.com', ['john#example.com']),
('Subject', 'Message.', 'from#example.com', ['jane#example.com']),
)
I query the ORM for some recipients details. Then I would imagine there's some looping involved, each time adding another recipient to the tuple. All elements of the message are the same except for username and email.
So far I have:
recipients = notification.objects.all().values_list('username','email')
# this returns [(u'John', u'john#example.com'), (u'Jane', u'jane#example.com')]
for recipient in recipients:
    to = recipient[1] #access the email
    subject = "my big tuple loop"
    dear = recipient[0] #access the name
    message = "This concerns tuples!"
    #### add each recipient to datatuple
    send_mass_mail(datatuple)
I've been trying something along the lines of this:
SO- tuple from a string and a list of strings
If I understand correctly, this is pretty simple with a comprehension.
emails = [
(u'Subject', u'Message.', u'from#example.com', [address])
for name, address in recipients
]
send_mass_mail(emails)
Note that we leverage Python's ability to unpack tuples into a set of named variables. For each element of recipients, we assign its zeroth element to name and its first element to address. So in the first iteration, name is u'John' and address is u'john#example.com'.
If you need to vary the 'Message.' based on the name, you can use string formatting or any other formatting/templating mechanism of your choice to generate the message:
emails = [
(u'Subject', u'Dear {}, Message.'.format(name), u'from#example.com', [address])
for name, address in recipients
]
Since the above are list comprehensions, they result in emails being a list. If you really need this to be a tuple instead of a list, that's easy, too:
emails = tuple(
(u'Subject', u'Message.', u'from#example.com', [address])
for name, address in recipients
)
For this one, we're actually passing a generator object into the tuple constructor. This has the performance benefits of using a generator without the overhead of creating an intermediate list. You can do that pretty much anywhere in Python where an iterable argument is accepted.
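The pieces above can be combined into a sketch that runs without Django. The recipients list stands in for the values_list result from the question, and build_datatuple is a hypothetical helper name:

```python
# Stand-in for notification.objects.all().values_list('username', 'email')
recipients = [(u'John', u'john#example.com'), (u'Jane', u'jane#example.com')]

def build_datatuple(recipients, subject, sender):
    """Return one (subject, message, from_email, recipient_list) tuple per recipient."""
    return tuple(
        (subject, u'Dear {}, this concerns tuples!'.format(name), sender, [address])
        for name, address in recipients
    )

datatuple = build_datatuple(recipients, u'Subject', u'from#example.com')
for entry in datatuple:
    print(entry)
```

In a Django project the result would then be passed on as send_mass_mail(datatuple).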
Just a little bit of cleanup needed here:
1) actually build the collection in the loop (append each message as one whole tuple; list.append adds the tuple as a single element, so the trailing comma after it is harmless but not required)
2) move the send_mass_mail call outside the loop
This should be working code:
recipients = notification.objects.all().values_list('username','email')
# this returns [(u'John', u'john#example.com'), (u'Jane', u'jane#example.com')]
datatuple = []
for recipient in recipients:
    to = recipient[1] #access the email
    subject = "my big tuple loop"
    dear = recipient[0] #access the name
    message = "This concerns tuples!"
    #### add each recipient to datatuple
    datatuple.append((subject, message, "from#example.com", [to,]),)
send_mass_mail(tuple(datatuple))
EDIT:
jpmc26's technique is definitely more efficient, and if you're planning to have a large email list to send to you should use that. Most likely you should use whichever code makes the most sense to you personally so that when your requirements change you can easily understand how to update.
I am using the IMAPClient library in Python. I am able to download the attached document in the email. I am interested in only Excel files.
I would like to extract the recipient list from the email. Any idea how to do it in Python?
Here is a code snippet which might be useful:
for ind_mail in emails:
    msg_string = ind_mail['RFC822'].decode("utf-8")
    #print(msg_string.decode("utf-8"))
    email_msg = email.message_from_string(msg_string)
    for part in email_msg.walk():
        # Download only Excel File
        filetype = part.get_content_type()
        if filetype == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
            #download
            pass
The straightforward answer to your question is to read the corresponding headers' values inside that first loop, i.e.:
to_rcpt = email_msg.get_all('to', [])
cc_rcpt = email_msg.get_all('cc', [])
The MIME standard doesn't enforce uniqueness on the headers (though it strongly suggests it), hence get_all; if a header is not present, you still get an empty list that a subsequent loop can iterate over.
But as tripleee has rightfully pointed out, the mime headers can be easily censored, spoofed or simply removed.
Yet this is the only info persisted and returned by a server, and it is what all mail clients use to present the recipients to us :)
Calling msg.get_all returns a list containing one entry per header, so if the header occurs several times you get one string per occurrence.
BUT
If a single header holds several comma-separated addresses, you only get one string for it and you have to split it yourself.
The best way to have the list of all the emails from a specific header is to use getaddresses (https://docs.python.org/3/library/email.utils.html#email.utils.getaddresses)
from email.utils import getaddresses
to_rcpt = getaddresses(email_msg.get_all('to', []))
get_all returns a list of all the "To:" headers, and getaddresses parses each entry and returns every address present in each header. For instance:
import email
# The headers must start on the first line, hence the backslash after the opening quotes
message = """\
To: "Bob" <email1#gmail.com>, "John" <email2#gmail.com>
To: email3#gmail.com, email4#gmail.com
"""
email_msg = email.message_from_string(message)
to_rcpt = getaddresses(email_msg.get_all('to', []))
# => [('Bob', 'email1#gmail.com'), ('John', 'email2#gmail.com'), ('', 'email3#gmail.com'), ('', 'email4#gmail.com')]
Right now I'm "removing" emails from a list by building a new list that excludes the things I don't want. It looks like this:
pattern = re.compile(r'b\.com')
emails = ['user#a.com', 'user#b.com', 'user#c.com', 'user#d.com']
emails = [e for e in emails if pattern.search(e) is None]
# resulting list: ['user#a.com', 'user#c.com', 'user#d.com']
However, now I need to filter out multiple domains, so I have a list of domains that need to be filtered out.
pattern_list = ['b.com', 'c.com']
Is there a way to do this still in list comprehension form or am I going to have to revert back to nested for loops?
Note: splitting the string at the # and doing word[1] in pattern_list won't work because c.com needs to catch sub.c.com as well.
There are a few ways to do this, even without using a regex. One is:
[e for e in emails if not any(pat in e for pat in pattern_list)]
This will also exclude emails like user#crumb.com and bob.com#bob.com, but so does your original solution. It does not, however, exclude cases like user#bocom, which a regex with an unescaped dot (b.com) would match. It's not clear whether your existing solution does exactly what you intend.
Another possibility is to combine your patterns into one with rx = '|'.join(pattern_list) and then match on that regex. Again, though, you'll need to use a more complex regex if you want to only match b.com as a full domain (not as just part of the domain or as part of the username).
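A sketch of that stricter regex, assuming the # separator used in the examples here (swap in the real separator for real addresses). The helper name build_domain_filter and the extra sample addresses are made up for illustration. Each listed domain is escaped and must be preceded by # or a dot and followed by the end of the string, so sub.c.com is caught while crumb.com is not:

```python
import re

def build_domain_filter(domains):
    """Compile a regex matching addresses whose domain is, or ends with, a listed domain."""
    alternatives = '|'.join(re.escape(d) for d in domains)
    # [#.] : the domain starts right after the separator or after a subdomain dot
    # $    : the domain must run to the end of the address
    return re.compile(r'[#.](?:' + alternatives + r')$')

pattern_list = ['b.com', 'c.com']
blocked = build_domain_filter(pattern_list)

emails = ['user#a.com', 'user#b.com', 'user#sub.c.com', 'user#crumb.com', 'bob.com#d.com']
kept = [e for e in emails if blocked.search(e) is None]
print(kept)
```

This stays in list comprehension form, so no nested for loops are needed.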
import re
# Escape the dots and anchor at the end so the pattern only matches the end of the address
pattern = re.compile(r'b\.com$|c\.com$')
emails = ['user#a.com', 'user#b.com', 'user#c.com', 'user#d.com']
emails = [e for e in emails if pattern.search(e) is None]
print(emails)
What about this?
This app will download a webpage and find all email addresses in the text of the page and return a list of them.
This is my current code:
def emails(content):
    'return list of email addresses contained in string content'
    email = []
    content = urlopen(url).read().decode()
    pattern='[A-Za-z0-9_.]+\#[A-Za-z0-9_.]+\....'
    email.append(re.findall(pattern,content))
    print(email)
But for some reason I get:
[['somePERSON#university.ca"']]
instead of :
['somePERSON#university.ca']
re.findall actually returns a list, so you are appending a list to the list. You could do something like email.extend(re.findall(pattern,content)) if you didn't want that behavior (although I usually do checks for matches on their own line to ensure that matches are found and non-matches are handled properly).
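A short sketch of the difference between append and extend here. The sample content string is made up, and the pattern is a simplified assumption (it ends in a run of letters rather than three arbitrary characters), not the one from the question:

```python
import re

content = 'Write to somePERSON#university.ca" or other#example.com'
pattern = r'[A-Za-z0-9_.]+#[A-Za-z0-9_.]+\.[A-Za-z]{2,}'

matches = re.findall(pattern, content)  # findall already returns a list

appended = []
appended.append(matches)   # adds the whole list as a single element: nested list

extended = []
extended.extend(matches)   # adds each match individually: flat list

print(appended)
print(extended)
```

Since findall already returns a list, returning matches directly is often enough and avoids the extra container altogether.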