Extract email content through Airflow


Is there a way I can extract the following information from an email:
Subject
Sender
Receiver
Timestamp
I want to search for emails based on the address they were cc-ed to. The account is Gmail.
I've gone through this article, but it only covers downloading attachments.
I only need the information above, which I then want to store in BQ.
How can I achieve this?
So far, this is what I've done:
from airflow.operators import IMAPAttachmentOperator

extract_email = IMAPAttachmentOperator(
    imap_conn_id='my_email_conn',
    mailbox='inbox',
    search_criteria={"CC": "some_email@gmail.com"},
    task_id='extract_email_content',
    dag=dag)

As far as I can tell there's no ready-made operator, so you need to build your own. On the bright side, there seems to be enough logic in Airflow that can be reused or used as guidance. For example, here in ImapHook is logic for retrieving emails, which you can then filter in your operator. Another option would be to use search, something like ImapHook().mail_client.search(...); see the search docs for reference.
Regarding storing the information in BQ: construct your custom operator to save the required information to BQ using BigQueryHook.insert_rows.
You can also use GCSHook.upload to upload the data to GCS and then use GCSToBigQueryOperator.
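
A minimal sketch of the search-and-extract step described above, using only Python's standard imaplib and email modules. It assumes mail_client is an already-authenticated imaplib.IMAP4_SSL-style object (such as the mail_client mentioned above); the function names extract_headers and fetch_cc_headers are hypothetical, and loading the resulting rows into BQ is left out.

```python
from email import message_from_bytes, policy


def extract_headers(raw_message: bytes) -> dict:
    """Parse raw RFC 822 bytes and return the header fields of interest."""
    msg = message_from_bytes(raw_message, policy=policy.default)

    def text(name):
        # Headers that are absent come back as None
        value = msg[name]
        return str(value) if value is not None else None

    date_header = msg["Date"]
    sent_at = date_header.datetime if date_header is not None else None
    return {
        "subject": text("Subject"),
        "sender": text("From"),
        "receiver": text("To"),
        "cc": text("Cc"),
        "timestamp": sent_at.isoformat() if sent_at else None,
    }


def fetch_cc_headers(mail_client, cc_address, mailbox="INBOX"):
    """Search a mailbox for messages cc-ed to cc_address and parse their headers.

    mail_client is assumed to behave like an authenticated imaplib.IMAP4_SSL.
    """
    mail_client.select(mailbox, readonly=True)
    _, data = mail_client.search(None, "CC", f'"{cc_address}"')
    rows = []
    for num in data[0].split():
        # BODY.PEEK[HEADER] downloads only the headers and leaves the message unread
        _, parts = mail_client.fetch(num, "(BODY.PEEK[HEADER])")
        rows.append(extract_headers(parts[0][1]))
    return rows
```

Each returned dict maps to one table row, so the list can be handed to whichever BQ loading path you choose.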

Related

Python: Getting the text of email body using Gmail API

I am trying to extract body of the email from my gmail accounts based off their email ids. I am able to acquire the body, however I am not able to generalize the approach and have to resort to hardcoding.
The following code snippet is modification from here.
if 'parts' in email['payload']:
    ## path 1
    data = email['payload']['parts'][0]['parts'][0]["body"]["data"]
    ### Only one of the paths will work. Need to find other patterns!
    ## path 2
    data = email['payload']['parts'][0]["body"]["data"]
else:
    data = email['payload']['body']['data']
My problem is that so far I have found only two patterns. I want to generalize the approach to get the body without hardcoding the paths.
Assumption: the email body is HTML, not plain text, so sending a plain-text email will not work (I've tried).
The API documentation for the message structure is found here. I have created a test file containing the different JSON structures, which I can send over if anyone wants to help me investigate.
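
One possible way to avoid hardcoded paths is to walk the payload recursively. The sketch below assumes each part is a dict with the mimeType, body and parts keys of the Gmail message structure and that body data is URL-safe base64; the function name find_body is hypothetical.

```python
import base64


def find_body(payload, mime_type="text/html"):
    """Depth-first search for the first part with the given MIME type."""
    if payload.get("mimeType") == mime_type:
        data = payload.get("body", {}).get("data")
        if data:
            # Restore any stripped padding before decoding URL-safe base64
            padded = data + "=" * (-len(data) % 4)
            return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
    # Multipart containers keep their children under "parts", nested to any depth
    for part in payload.get("parts", []):
        found = find_body(part, mime_type)
        if found is not None:
            return found
    return None
```

With the variable from the snippet above, this would be called as find_body(email['payload']), and it returns None when no HTML part exists.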

Azure Blob Bindings with Azure Function (Python)

I currently have a process that reads from SQL, uses pandas and pd.ExcelWriter to format the data, and emails it out. I want my function to read from SQL (no problem) and write to a blob, then (using the SendGrid binding) attach that file from the blob and send it out.
My question is: do I need both an in (attaching for email) and an out (archiving to the blob) binding for that blob? Additionally, is this the simplest way to do this? It'd be nice to send it and write to the blob as two unconnected operations instead of sequentially.
It also appears that with the binding, I have to hard code the name of the file in the blob-path? That seems a little ridiculous. Does anyone know a workaround, or have I perhaps misunderstood?

do I need both an in (attaching for email) and an out (archiving to the blob) binding for that blob?
Firstly, I don't think you can bind the blob as both in and out simultaneously if the blob does not exist yet. If you try it, you will find that it returns an error. I suppose you could send the mail directly with the content from SQL and write it to the blob, without reading the content back from the blob.
I have to hard code the name of the file in the blob-path?
If you can accept a GUID or datetime blob name, you can bind the path with {rand-guid} or {DateTime} (you can format the time).
If you cannot accept this binding, you can pass the blob path in the trigger body as JSON data. If you use another trigger, such as a queue trigger, you can also pass JSON data containing the path value.

How to get all content posted by a Facebook Group using Graph API

I am very new to the Graph API and trying to write a simple python script that first identifies all pages that a user has liked and all groups that he/she is a part of. To do this, I used the following:
To get the groups he has joined:
API: /{user-id}/groups
Permissions req: user_groups
To get the pages he has liked:
API: /{user-id}/likes
Permissions req: user_likes
and
url='https://graph.facebook.com/'+userId+'/likes?access_token='+accessToken +'&limit='+str(limit)
Now that I can see the IDs of the groups in the JSON output, I want to hit them one by one and fetch all content (posts, comments, photos etc.) posted within each group. Is this possible and, if yes, how can I do it? What API calls do I have to make?

That's quite a broad question; before asking here, you should have tried searching on SO.
Anyway, I'll tell you broadly how you can do it.
First of all, go through the official documentation of the Graph API: Graph API Reference.
You'll find every API that can be used to fetch the data, for example /group and /page. You'll also learn what kind of access token, with what permissions, is required for each API call.
Here are some API calls useful to you:
to fetch the group/page's posts: /{group-id/page-id}/posts
to fetch the comments of a post: /{post-id}/comments
to fetch the group/page's photos: /{group-id/page-id}/photos
and so on. Once you go through the documentation and test some API calls, things will be much clearer. It's quite easy!
Hope it helps. Good luck!
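
As a small illustration of the calls listed above, here is a sketch of a URL builder that percent-encodes the query string instead of concatenating it by hand. The helper name graph_url is hypothetical, and the IDs and token in the usage note are placeholders.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com/"


def graph_url(path, access_token, **params):
    """Build a Graph API URL for a path such as '{group-id}/posts'."""
    query = urlencode({"access_token": access_token, **params})
    return f"{GRAPH_BASE}{path.lstrip('/')}?{query}"
```

For example, graph_url("12345/posts", "TOKEN", limit=25) builds the posts URL for a group with the placeholder ID 12345, and graph_url("67890/comments", "TOKEN") does the same for the comments of a post.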

Here's an example using facepy:
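import csv
import json

from facepy import GraphAPI

graph = GraphAPI(APP_TOKEN)  # APP_TOKEN must hold your access token
groupIDs = ("[id here]", "[etc]")
outfile_name = "teacher-groups-summary-export-data.csv"
f = csv.writer(open(outfile_name, "w", newline=""))
for gID in groupIDs:
    # page=True makes facepy return a generator that yields one page per iteration
    groupData = graph.get(gID + "/feed", page=True, retry=3, limit=500)
    for data in groupData:
        # Round-trip through JSON to turn Decimal values into plain JSON types.
        # DecimalEncoder is not defined in the original snippet; it is assumed
        # to be a json.JSONEncoder subclass that can serialize Decimal.
        json_data = json.dumps(data, indent=4, cls=DecimalEncoder)
        data = json.loads(json_data)
        print("Paging group data...")
        for item in data["data"]:
            ...  # etc., dealing with items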

Check the API reference. You should use feed.
You can use /{group-id}/feed to get an array of Post objects of the group. Remember to include a user access token for a member of the group.

How can I edit a secondary calendar using google python API

I wonder if it's possible to create and delete events with Google API in a secondary calendar. I know well how to do it in main calendar so I only ask, how to change calendar_service to read and write to other calendar.
I've tried logging in with the secondary calendar's email, but that fails with a BadAuthentication error. The URL was surely correct, because it was read by the API.
Waiting for your help.

I'm answering my own question so I can finally accept an answer. The problem was solved some time ago.
The most important information is in this documentation.
Each query can be run with a uri as an argument, for example "InsertEvent(event, uri)". The uri can be set manually (from the Google Calendar settings) or automatically, as described in the post below. Note that CalendarEventQuery takes only the username, not the whole URL.
Both are constructed this way:
user = "abcd1234@group.calendar.google.com"
uri = "http://www.google.com/calendar/feeds/{{ user }}/private/full-noattendees"
What's useful is that you can run queries with different uris and add/delete events in many different calendars in one script.
Hope someone finds it helpful.

I had the same issue, but I found this solution (I do not remember where).
The solution is to extract the secondary calendar's user from the src URL provided by Google.
It is probably not the best one, but it works.
Note: the code is extracted from a real project (some parts have been removed) and must be adapted to your particular case. It is provided as a sample to support the explanation and will not work as is.
import gdata.calendar.service

# 1 - Connect using the main user's email address in the classical way
cal_client = gdata.calendar.service.CalendarService()
# insert the connection/login code here
# 2 - Loop over the existing calendars to find the one whose title matches cal_title
feed = cal_client.GetAllCalendarsFeed()
for a_calendar in feed.entry:
    if cal_title in a_calendar.title.text:
        # If you print a_calendar.content.src you will see that the URL
        # contains an email-like identifier. This is the one to use to work
        # with the calendar.
        cal_user = a_calendar.content.src.split('/')[5].replace('%40', '@')
Then you just have to replace the default user with cal_user in the API calls to work on the secondary calendar.
The replace call is required because the Google API functions do their own internal conversion of special characters like '%'.
I hope this will help you.
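
The extraction step can be tried on its own without the gdata client. This is a sketch under assumptions: the sample src URL in the usage note is made up to match the feed URL pattern quoted earlier, the helper names are hypothetical, and unquote is used as a more general stand-in for the replace('%40', '@') call.

```python
from urllib.parse import unquote


def calendar_user_from_src(src):
    """Return the calendar user embedded in a calendar feed URL.

    The sixth '/'-separated segment holds the percent-encoded user.
    """
    return unquote(src.split("/")[5])


def event_feed_uri(user):
    """Build the event feed uri for a calendar user, following the pattern quoted earlier."""
    return f"http://www.google.com/calendar/feeds/{user}/private/full-noattendees"
```

For example, calendar_user_from_src("http://www.google.com/calendar/feeds/abcd1234%40group.calendar.google.com/private/full") yields the decoded calendar user, which can then be passed to event_feed_uri.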

Insert inline image into Lotus Notes message

I've been able to send emails using Lotus Notes and VBA and Python using the COM API like this:
Can I use Lotus Notes to send mail?
My question is how can I insert an image inline with the body text (not as an attachment) in a programmatic way (equivalent to the Edit | Paste Special)? I haven't been able to find any workable solutions from a few Google searches. Any solution using stock VBA or Python would be appreciated.
Thanks!

If you don't need to do anything specific to Notes, i.e. work with a specific form with @functions etc., then you are much better off constructing the message as a multipart MIME message.
You need to set up the session so that the document you create is MIME, and you can then set up your message appropriately; see NotesSession.ConvertMIME. You will then use NotesMIMEEntity and NotesMIMEHeader objects to construct the MIME message.
If you are unfamiliar with how MIME messages are constructed then this is going to be a little tricky, so you may want to have a look at some raw MIME messages to see what they look like. From there you should be able to work out how to use the API of the NotesMIMEEntity and NotesMIMEHeader classes to construct the message.
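
To see what such a raw MIME message looks like, one option is to build one with Python's standard email package and print it. This sketch does not use the Notes API at all; it only illustrates the multipart structure (an HTML part that references an inline image through a Content-ID) that you would then reproduce with NotesMIMEEntity. The function name, subject line and example.com domain are placeholders.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_inline_image_message(image_bytes, subtype="png"):
    """Build a message whose HTML body shows an image inline via a cid: reference."""
    msg = EmailMessage()
    msg["Subject"] = "Inline image example"
    msg.set_content("Plain-text fallback for clients that do not render HTML.")

    # The Content-ID links the <img> tag to the image part
    cid = make_msgid(domain="example.com")  # looks like "<...@example.com>"
    msg.add_alternative(
        f'<html><body><p>Image below:</p><img src="cid:{cid[1:-1]}"></body></html>',
        subtype="html",
    )
    # Attaching the image to the HTML part turns that part into multipart/related
    html_part = msg.get_payload()[1]
    html_part.add_related(image_bytes, maintype="image", subtype=subtype, cid=cid)
    return msg
```

Printing build_inline_image_message(data).as_string() shows the nesting: multipart/alternative on the outside, with multipart/related wrapping the HTML and the image.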

It should be possible to do this using the DXLImporter class, available from VBA through the COM interface. DXL is a Notes-specific XML, which you can generate to a temp file, then import into your database. There is sample code on this blog entry, which may be close to what you are looking for (this imports a rich-text body, including in-line image, and then attaches that rich text to a mail document).
http://www.cubetoon.com/2008/notes-rich-text-manipulation-using-dxl/
Other options you might consider are:
(1) using the C or C++ APIs: definitely more effort, especially when working with rich text, but with essentially no limits. (http://www.ibm.com/developerworks/lotus/library/capi-nd/index.html)
(2) using the MIDAS Toolkit from Genii (http://www.geniisoft.com), which extends the LotusScript APIs and exposes much of what is in the C API.
