Extracting URL from email inbox


There has been some confusion about what I am trying to do, so I am asking again. I want to write a script that runs against my inbox and gives me the From address, Subject, and URL in the email body. The problem is that the URL parsing in my script pulls every URL from the email, not just the one in the body. Here is an example:
To: Tom@mail.com
From: Joe@test.com
Subject: Confirm your test score
Please go to the following URL to confirm your test score. WWW.test.com/confirmation
Thanks again for your input.
Signed
Joe
(Part of Joe's signature has an image)
The URL for the image is
http://www.test.com/wp-content/uploads/_client_image/66-dcfc0fc8.png
I want my output to be
From: Joe@test.com
Subject: Confirm your test score
URL: WWW.test.com/confirmation
I get this instead
From: Joe@test.com
Subject: Confirm your test score
URL: WWW.test.com/confirmation, http://www.test.com/wp-content/uploads/_client_image/66-dcfc0fc8.png
And here is my script
import re
import mailbox
import urlparse
mbx=mailbox.mbox("Mail Box Path")
url_pattern = re.compile('''["']http://[^+]*?['"]''')
for k, m in mbx.iteritems():
    print "From %s\n" % m['from']
    print "Subject %s\n" % m['subject']
    print "URL %s\n" % url_pattern.findall(m.as_string())

Signatures count as the body of the email - so you can't really separate them.
If you're sure there's only one link in the email that you care about, you could try just looking at only the first URL you match - but there isn't a (reliable) way to make sure that you're only interacting with the body of the email and not the signature as well.
Someone even wrote a paper on this - it's extremely difficult, especially when you can't control the format of the emails you're dealing with.
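The "first URL only" suggestion can be sketched as follows. This is a minimal Python 3 sketch, not the asker's script: the helper name first_body_url, the URL pattern, and the sample message are assumptions made for illustration. It only looks at text/plain parts and returns the first match, so it still cannot tell a body link from a signature link if the signature comes first.

```python
import re
from email import message_from_string

# Assumed pattern: anything starting with http://, https:// or www.
# up to the next whitespace, angle bracket, or quote.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE)


def first_body_url(msg):
    """Return the first URL found in a text/plain part, or None."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        match = URL_PATTERN.search(text)
        if match:
            # Trim punctuation that commonly trails a URL in prose.
            return match.group(0).rstrip(".,;:!?)")
    return None


# Hypothetical message modelled on the example in the question.
raw = (
    "From: Joe@test.com\n"
    "To: Tom@mail.com\n"
    "Subject: Confirm your test score\n"
    "\n"
    "Please go to the following URL to confirm your test score. "
    "WWW.test.com/confirmation\n"
    "Thanks again for your input.\n"
    "http://www.test.com/wp-content/uploads/_client_image/66-dcfc0fc8.png\n"
)
msg = message_from_string(raw)
print("From:", msg["from"])
print("Subject:", msg["subject"])
print("URL:", first_body_url(msg))
```

With mailbox.mbox, each message yielded by the mailbox could be passed to the same helper, since mailbox messages are email.message.Message objects.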

Related

How to send hyperlink with SendGrid using Python

I'm trying to send a simple mail with SendGrid which must contain one hyperlink.
I'm not doing anything fancy, just following the documentation example with some changes
import os
import sendgrid
from sendgrid.helpers.mail import *
sg = sendgrid.SendGridAPIClient(api_key=os.environ.get('SENDGRID_API_KEY'))
from_email = Email("test@example.com")
to_email = To("test@example.com")
subject = "Sending with SendGrid is Fun"
content = Content("text/html", '<html>google</html>')
mail = Mail(from_email, to_email, subject, content)
response = sg.client.mail.send.post(request_body=mail.get())
It looks fine to me, but once I run the script and the mail is sent, it shows up like plain text I cannot click on.
I also tried many other combinations removing the <html> tag, using single and double quotes with the backslash, but nothing really worked. I even tried to do the same thing without the Mail Helper Class, but it didn't work.
Thanks very much for the help.

content = Content(
    "text/html", "Hi User, \n This is a test email.\n This is to also check if hyperlinks work <a href='https://www.google.com'> Google </a> Regards Karthik")
This helped me. I believe you don't need to include the <html> tags.
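The key point in the answer above is that a clickable link needs an anchor tag with an href attribute. A small sketch of building that HTML safely is shown here; the helper name link_html is hypothetical, and it uses only the standard library. The resulting string would be passed as the second argument to Content("text/html", ...) as in the answer.

```python
from html import escape


def link_html(url, label):
    """Build an <a> element, escaping the URL and the visible label."""
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))


# Paragraph tags are used because "\n" does not produce a line break in HTML.
body = "<p>Hi User,</p><p>This is a test email with a link: {}</p>".format(
    link_html("https://www.google.com", "Google")
)
print(body)
```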

Finding links in an emails body with Python

I am working on a Python project that connects to an email server and looks at the latest email to tell the user whether it contains an attachment or an embedded link. I have the former working but not the latter.
The trouble may be in the if any() part of my script, as it seems to only half work when I test it. It may also be due to how the email string is printed out.
Here is my code for connecting to Gmail and then looking for the link.
import imaplib
import email
word = ["http://", "https://", "www.", ".com", ".co.uk"] #list of strings to search for in email body
#connection to the email server
mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('email@gmail.com', 'password')
mail.list()
# Out: list of "folders" aka labels in gmail.
mail.select("Inbox", readonly=True) # connect to inbox.
result, data = mail.uid('search', None, "ALL") # search and return uids instead
ids = data[0] # data is a list.
id_list = ids.split() # ids is a space separated string
latest_email_uid = data[0].split()[-1]
result, data = mail.uid('fetch', latest_email_uid, '(RFC822)') # fetch the email headers and body (RFC822) for the given ID
raw_email = data[0][1] # here's the body, which is raw headers and html and body of the whole email
# including headers and alternate payloads
print "---------------------------------------------------------"
print "Are there links in the email?"
print "---------------------------------------------------------"
msg = email.message_from_string(raw_email)
for part in msg.walk():
    # each part is a either non-multipart, or another multipart message
    # that contains further parts... Message is organized like a tree
    if part.get_content_type() == 'text/plain':
        plain_text = part.get_payload()
        print plain_text # prints the raw text
        if any(word in plain_text for word in word):
            print '****'
            print 'found link in email body'
            print '****'
        else:
            print '****'
            print 'no link in email body'
            print '****'
So, as you can see, I have a variable called 'word' which contains a list of keywords to search for in the plain-text email.
When I send a test email with an embedded link in the format 'http://' or 'https://', the script prints the email body with the link in the text, like this:
---------------------------------------------------------
Are there links in the email?
---------------------------------------------------------
Test Link <http://www.google.com/>
****
found link in email body
****
I get the message 'found link in email body', which is the result I am looking for in my test phase; in the final program this will trigger something else.
However, if I add an embedded link with no http://, such as google.com, the link doesn't print out and I don't get the result, even though the email has an embedded link.
Is there a reason for this? I also suspect my if any() check is not the best approach. I didn't really understand it when I added it, but it worked for http:// links. Then I tried just a .com and ran into this problem, which I am having trouble solving.

To check if there are attachments to an e-mail you can search the headers for Content-Type and see if it says "multipart/*". E-mails with multipart content types may contain attachments.
To inspect the text for links, images, etc, you can try using Regular Expressions. As a matter of fact, this is probably your best option in my opinion. With regex (or Regular Expressions) you can find strings that match a given pattern. The pattern "<a[^>]+href=\"(.*?)\"[^>]*>(.*)?</a>", for example, should match all links in your email message regardless of whether they are a single word or a full URL. I hope that helps!
Here's an example of how you can implement this in Python:
import re
# Sample body; the href uses single quotes to match the pattern below.
text = "This is your e-mail body. It contains a link to <a href='http://www.google.com'>Google</a>."
link_pattern = re.compile('<a[^>]+href=\'(.*?)\'[^>]*>(.*)?</a>')
search = link_pattern.search(text)
if search is not None:
    # group(0) is the whole <a>...</a> element; group(1) is the href value.
    print("Link found! -> " + search.group(0))
else:
    print("No links were found.")
For the end user the link will just appear as "Google", without the www and certainly without http(s). However, the source will have the HTML wrapping it, so by inspecting the raw body of the message you can find all links.
My code is not perfect, but I hope it gives you a general direction. You can look up multiple patterns in your e-mail body text, for image occurrences, videos, etc. To learn regular expressions you'll need to do a little research; the Wikipedia article on the subject is one starting point.
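As an alternative to matching anchor tags with a regular expression, the two cases discussed in this thread can be sketched with the standard library: html.parser for href values in an HTML part, and a regex for bare links such as google.com in a plain-text part. The names LinkCollector, html_links, and text_links, as well as the short list of top-level domains, are assumptions made for this sketch and would need extending for real mail.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def html_links(html_text):
    """Return all href values found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Assumed pattern: explicit http(s):// or www. links, or a bare domain
# ending in one of a few illustrative top-level domains.
TEXT_LINK = re.compile(
    r"\b(?:https?://|www\.)[^\s<>]+"
    r"|\b[\w-]+(?:\.[\w-]+)*\.(?:com|co\.uk|org|net)\b(?:/[^\s<>]*)?",
    re.IGNORECASE,
)


def text_links(text):
    """Return link-like substrings found in plain text."""
    return [match.group(0) for match in TEXT_LINK.finditer(text)]


print(html_links("<p>A link to <a href='http://www.google.com'>Google</a>.</p>"))
print(text_links("Test Link <http://www.google.com/>"))
print(text_links("visit google.com today"))
```

The parser handles either quote style and extra attributes on the tag, which the single-quote regex above does not.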

How to extract user information from github using their email address?

I have the email addresses of some users and I want to extract information from their GitHub accounts. How can I do this using Python? I found this (https://help.github.com/articles/searching-users/), but how can it help me extract user information?

You can either look up how to scrape the information from the web or try using the API for user emails.

I know that retrieving user data using their e-mail is possible using the GitHub API, which should return a JSON item with the user's information. I believe most of the tutorials for using the API use Ruby, though I see no reason why the same general principles wouldn't carry over to Python.
Otherwise, if you choose to use a web scraper instead, I'd recommend using BeautifulSoup.
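A minimal Python 3 sketch of the API route, assuming GitHub's user search endpoint with the in:email qualifier; the function names and the User-Agent string are hypothetical. Note that this kind of search can only match addresses that users have made public on their profiles, and unauthenticated requests are rate limited.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_URL = "https://api.github.com/search/users"


def build_email_search_url(email_address):
    """Build a user-search URL restricted to the email field."""
    query = "{} in:email".format(email_address)
    return SEARCH_URL + "?" + urlencode({"q": query})


def search_users_by_email(email_address, token=None):
    """Return the list of matching user records (performs a network call)."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "email-lookup-sketch",  # hypothetical client name
    }
    if token:
        # Optional personal access token, sent in a header rather than the URL.
        headers["Authorization"] = "token " + token
    request = Request(build_email_search_url(email_address), headers=headers)
    with urlopen(request) as response:
        return json.load(response)["items"]


print(build_email_search_url("octocat@github.com"))
```

Each returned item includes a login, which can then be used against the https://api.github.com/users/ endpoint to fetch the full profile.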

You can try this. It isn't exactly the solution you asked for, but it should work for you. This code takes the username, not the email address, as input. The API endpoint is given in the code. To connect you need an access token (similar to a password), so create your own personal token here: https://github.com/settings/tokens . With that in place you can loop over users and extract whatever information you want.
P.S.: If this solution doesn't meet your requirements, you can follow this link: https://developer.github.com/v3/users/emails/ and adapt the API call accordingly.
import urllib
import json
serviceurl = 'https://api.github.com/users/'
# Placeholder: substitute your own personal access token and never publish a real one.
access_token = "YOUR_PERSONAL_ACCESS_TOKEN"
while True:
    user = raw_input('Enter user_name : ')
    if len(user) < 1:
        break
    # Build the URL per request so usernames do not accumulate across iterations.
    url = serviceurl + user + '?' + urllib.urlencode({'access_token': access_token})
    print 'Retrieving', url
    uh = urllib.urlopen(url)
    data = uh.read()
    #print data
    js = json.loads(str(data))
    print json.dumps(js, indent=4)
    # To inspect individual fields instead:
    # for i in js:
    #     print i
    # print js["email"]

How do I log into Google through python requests?

I'm making an API using Python requests, and HTTP GET is working fine, but I'm having a little bit of trouble with HTTP POST. As with a lot of websites, you can read information, but in order to make a POST request (such as following a user, or writing a post), you need to have an authenticated session. This website uses Google to log in. Normally, I would just pass the username:password into the POST request form data to log in, but this Google login is pretty wonky (and quite frankly I'm not that experienced). Does anyone have a reference or an example to help me out?

I do not know about Python requests, but sending an email is as easy as this:
import yagmail
yagmail.SMTP(emailh).send(email, subject, body)
#emailh = your email (just the username, without @gmail.com)
#email = send to (full email including domain, e.g. ***@gmail.com or ***@outlook.com)
#subject = subject of the message
#body = body of the message
Even better:
emailh = raw_input('Your email: ')
email = raw_input('Send to: ')
subject = raw_input('Subject: ')
body = raw_input('Body: ')
yagmail.SMTP(emailh).send(email, subject, body)
print('Email Sent.')
That is, if this is what you are asking about.

Sending a Html file via python

I have a test.html file that I want to send via email (I am referring to the page content). Is there a way to get the content from the HTML file and send it as an email? If you have any other ideas, please share.

Here's a quick and dirty script I just wrote which might be what you're looking for.
https://gist.github.com/1790847
"""
this is a quick and dirty script to send HTML email - emphasis on dirty :)
python emailpage.py http://www.sente.cc
made to answer: http://stackoverflow.com/questions/9226719/sending-a-html-file-via-python
Stuart Powers
"""
import lxml.html
import smtplib
import sys
import os
page = sys.argv[1] #the webpage to send
root = lxml.html.parse(page).getroot()
root.make_links_absolute()
content = lxml.html.tostring(root)
# A blank line must separate the headers from the body.
message = """From: Stuart Powers <stuart.powers@gmail.com>
To: Stuart Powers <stuart.powers@gmail.com>
MIME-Version: 1.0
Content-type: text/html
Subject: %s

%s""" %(page, content)
smtpserver = smtplib.SMTP("smtp.gmail.com",587)
smtpserver.starttls()
smtpserver.login("stuart.powers@gmail.com",os.environ["GPASS"])
smtpserver.sendmail('stuart.powers@gmail.com', ['stuart.powers@gmail.com'], message)

There are many ways to read files in Python, and there are also ways to send emails in Python. Why don't you look up the documentation and come back with a specific coding error?
Sending emails in python: http://docs.python.org/library/email-examples.html
Reading files in python: http://docs.python.org/tutorial/inputoutput.html
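Putting the two documentation links together, a minimal Python 3 sketch using only the standard library might look like this. The function names, the plain-text fallback wording, and the SMTP host, port, and credentials are all assumptions for illustration; it is not the script from the answer above.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_html_message(html_path, sender, recipient, subject):
    """Read an HTML file and wrap it in a multipart/alternative message."""
    html = Path(html_path).read_text(encoding="utf-8")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Plain-text fallback for clients that do not render HTML.
    msg.set_content("This message requires an HTML-capable mail client.")
    msg.add_alternative(html, subtype="html")
    return msg


def send_message(msg, host, port, username, password):
    """Send the message over SMTP with STARTTLS (performs a network call)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)
```

A call such as build_html_message("test.html", "me@example.com", "you@example.com", "Test page") followed by send_message(...) with your provider's settings would send the page content; EmailMessage takes care of the header and body separation that a hand-built message string has to get right.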
