I currently have a process that reads from SQL, uses pandas and pd.ExcelWriter to format the data, and emails it out. I want my function to read from SQL (no problem), write to a blob, and then (using the SendGrid binding) attach that file from the blob and send it out.
My question is: do I need both an in binding (attaching for email) and an out binding (archiving to the blob) for that blob? Additionally, is this the simplest way to do this? It'd be nice to send the email and write to the blob as two unconnected operations instead of sequentially.
It also appears that with the binding, I have to hard-code the name of the file in the blob path? That seems a little ridiculous; does anyone know a workaround, or have I perhaps misunderstood?
do I need both an in (attaching for email) and an out (archiving to the blob) binding for that blob?
Firstly, I don't think you can bind the blob as both input and output at the same time if it doesn't exist yet; if you try, you'll find it returns an error. And I suppose you could send the mail directly with the content from SQL and write it to the blob at the same time; there is no need to read the content back from the blob.
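For example, a minimal sketch of those two independent operations using the azure-storage-blob and sendgrid Python packages could look like the following; the connection string, container name, addresses and report_bytes are placeholders, not values from the original question.

# Sketch: archive the report to Blob Storage and email it with SendGrid as two
# independent operations. Connection string, container, addresses and
# report_bytes are placeholders.
import base64

from azure.storage.blob import BlobServiceClient
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import (Attachment, Disposition, FileContent,
                                   FileName, FileType, Mail)


def archive_and_send(report_bytes: bytes, blob_name: str) -> None:
    # 1) Archive the Excel report to a blob.
    blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    blob_service.get_blob_client("report-archive", blob_name).upload_blob(
        report_bytes, overwrite=True)

    # 2) Email the same bytes with SendGrid; no need to read them back from the blob.
    message = Mail(
        from_email="reports@example.com",
        to_emails="recipient@example.com",
        subject="Daily report",
        html_content="Report attached.")
    message.attachment = Attachment(
        FileContent(base64.b64encode(report_bytes).decode()),
        FileName(blob_name),
        FileType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
        Disposition("attachment"))
    SendGridAPIClient("<sendgrid-api-key>").send(message)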
I have to hard code the name of the file in the blob-path?
If a GUID or a datetime is acceptable as the blob name, you can bind the path with {rand-guid} or {DateTime} (the DateTime value can be formatted).
If that binding doesn't work for you, you can pass the blob path from the trigger body as JSON data (the original answer showed this in a screenshot) and reference it in the binding path, as sketched below. If you use another trigger, such as a queue trigger, you can likewise pass JSON data carrying the path value.
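To illustrate the binding-expression idea, here is a rough sketch using the Azure Functions Python v2 programming model (an assumption on my part, since the original answer showed function.json and a screenshot); the route, container name and the "filename" property are placeholders, and {rand-guid} or {DateTime} could be used in the path instead if a generated name is acceptable.

# Sketch (Azure Functions Python v2 programming model): the blob output path is
# resolved from the JSON trigger body, so nothing is hard-coded.
import azure.functions as func

app = func.FunctionApp()


def build_report_bytes() -> bytes:
    # Placeholder for the existing SQL + pandas/ExcelWriter step.
    return b""


@app.route(route="archive", auth_level=func.AuthLevel.FUNCTION)
@app.blob_output(arg_name="outputblob",
                 path="report-archive/{filename}",   # taken from the request JSON
                 connection="AzureWebJobsStorage")
def archive(req: func.HttpRequest, outputblob: func.Out[bytes]) -> func.HttpResponse:
    # POST body example: {"filename": "daily-report.xlsx"}
    outputblob.set(build_report_bytes())
    return func.HttpResponse("archived", status_code=200)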
This is the problem I have to face, I'll be as clear as possible.
My clients are sending daily reports of their sales via an Excel file on Google Drive. I am only a viewer on this file, and I have it in my "Shared with me" folder in Google Drive. Ideally, I would like to import this file into Python to be able to post-process and analyze it. A few notes:
I have seen solutions here on Stack Overflow that suggest adding a shortcut to Drive in order to import it, but I have two problems with that:
After adding it I see a .gsheet file that I do not know how to load in Python
I am not sure that the file would be dynamically updated on daily basis
I cannot use gdown since I am only a viewer on that file, unfortunately!
Let me know if you have other ideas/approaches! Thanks
You can list all the files shared with you using the Drive API.
We will need to use the following method:
Files.list (Drive API, https://developers.google.com/drive/api/v3/reference/files/list) to list all the files you have access to.
You can use the API Explorer available on most documentation pages, and once you have a better grasp of the API behaviour, start experimenting with this code sample: https://developers.google.com/drive/api/quickstart/python. The Quickstart builds a simple list of files with Python.
I recommend you use the following flow:
Call the Files.list method with the following parameters:
{
    "q": "not 'me' in owners",
    "fields": "nextPageToken, files(size, owners, name, id, mimeType)"
}
This will return only the files that are shared with you (files you do not own). You cannot handle a .gsheet file as a regular file, because it isn't one; instead, use the Google Sheets API (https://developers.google.com/sheets/api/reference/rest) to fetch the data inside the Google Sheet file. The same is true for Google Docs and Google Slides: each has its respective API you can use to access or manipulate the data in the file.
If you look closely at the parameters we are using: q filters the results so that only files you can access but do not own are listed (you can also filter by files owned by a particular email address). The other parameter, fields, makes the response much shorter; since you won't use every property of a file, it produces a simpler response that takes less time for the server to build and less bandwidth to send. Adjust the fields parameter if you need more or less data.
Finally, note the nextPageToken property in the fields parameter. The API response is paginated, meaning you will receive up to a certain number of files in one response; to retrieve the next page of results, make the same call again, passing the nextPageToken you obtained in the previous response as a new parameter in the request. This is explained in this documentation article: https://developers.google.com/calendar/api/guides/pagination.
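As a rough illustration (not part of the original answer), a minimal sketch of that list-and-paginate flow with the google-api-python-client library could look like the following; the creds object is assumed to come from the OAuth flow in the Quickstart linked above.

# Sketch: list files shared with me, following nextPageToken until all pages
# are consumed. `creds` is assumed to come from the OAuth flow in the Quickstart.
from googleapiclient.discovery import build


def list_shared_files(creds):
    service = build("drive", "v3", credentials=creds)
    files, page_token = [], None
    while True:
        response = service.files().list(
            q="not 'me' in owners",
            fields="nextPageToken, files(size, owners, name, id, mimeType)",
            pageToken=page_token,
        ).execute()
        files.extend(response.get("files", []))
        page_token = response.get("nextPageToken")
        if page_token is None:
            return files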
Note: If you need clarification on how to execute certain actions on a Google Sheet file I recommend you submit a new question since additional tasks with other APIs are outside the scope of this question and will make this response much larger than it needs to be.
Is there a way I can extract the following information from an email:
Subject
Sender
Receiver
Timestamp
The search should be based on the address the email is CC'd to, and the mailbox is Gmail.
I've gone through this article, but it covers downloading attachments.
I only need the information above, and then I want to store it in BQ.
How can I achieve this?
So far, this is what I've done:
from airflow.operators import IMAPAttachmentOperator

extract_email = IMAPAttachmentOperator(
    imap_conn_id='my_email_conn',
    mailbox='inbox',
    search_criteria={"CC": "some_email#gmail.com"},
    task_id='extract_email_content',
    dag=dag)
As far as I can tell there's no ready-made operator, so you need to build your own. The bright side is that there seems to be enough logic in Airflow that can be reused or used as guidance. For example, ImapHook contains the logic for getting emails, and you can then filter them in your operator. Another option would be to use search, something like ImapHook().mail_client.search(...); see the search docs for reference.
Regarding storing the information in BQ - construct your custom operator to save the required information to BQ using BigQueryHook.insert_rows.
You can also use GCSHook.upload to upload the data to GCS and then use GCSToBigQueryOperator.
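As a rough sketch (not from the original answer), the extraction itself can also be done with Python's standard imaplib and email modules and then streamed into BigQuery with the google-cloud-bigquery client; the host, credentials, table id and CC address below are placeholders, and the function could be wrapped in a PythonOperator or a custom operator.

# Sketch: pull Subject/From/To/Date for messages CC'd to a given address and
# load them into BigQuery. Host, credentials, table id and the CC address are
# placeholders; the destination table is assumed to already exist.
import imaplib
from email import message_from_bytes
from email.utils import parsedate_to_datetime

from google.cloud import bigquery


def extract_and_load(cc_address="someone@gmail.com",
                     table_id="my-project.my_dataset.email_metadata"):
    client = imaplib.IMAP4_SSL("imap.gmail.com")
    client.login("me@gmail.com", "<app-password>")
    client.select("inbox")
    _, data = client.search(None, "CC", cc_address)

    rows = []
    for num in data[0].split():
        _, msg_data = client.fetch(num, "(RFC822)")
        msg = message_from_bytes(msg_data[0][1])
        rows.append({
            "subject": msg["Subject"],
            "sender": msg["From"],
            "receiver": msg["To"],
            "timestamp": parsedate_to_datetime(msg["Date"]).isoformat(),
        })
    client.logout()

    # Stream the rows into BigQuery.
    errors = bigquery.Client().insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")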
I used this code to store the .xlsx attachments from a specific email address in Outlook, but now I would like to store these files in a SQL Server database rather than in a folder on my laptop. Do you have any idea how to store these files directly in a database? Many thanks.
import os

outputDir = r"C:\Users\CMhalla\Desktop\Hellmann_attachment"
i = 0

# messages is the Items collection of an Outlook folder (win32com MAPI namespace)
for m in messages:
    if m.SenderEmailAddress == 'adress#outlook.com':
        body_content = m.Body
        for attachment in m.Attachments:
            i = i + 1
            attachment.SaveAsFile(os.path.join(outputDir, attachment.FileName + str(i) + '.xlsx'))
The Outlook object model doesn't provide any property or method for saving attachments to a DB directly. You need to save the file on disk first and then add it to the DB in any convenient way.
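As an illustration of that save-then-insert approach (my own sketch, not code from the answer), a minimal pyodbc example might look like this; the connection string, table and column names are assumptions, with the file stored in a varbinary(max) column.

# Sketch: read a saved attachment from disk and insert it into SQL Server as
# varbinary(max). Connection string, table and columns are placeholders, e.g.
#   CREATE TABLE dbo.Attachments (FileName nvarchar(260), Content varbinary(max))
import os

import pyodbc


def store_attachment(path: str, conn_str: str) -> None:
    with open(path, "rb") as f:
        data = f.read()
    with pyodbc.connect(conn_str) as conn:
        conn.execute(
            "INSERT INTO dbo.Attachments (FileName, Content) VALUES (?, ?)",
            os.path.basename(path),
            pyodbc.Binary(data))
        conn.commit()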
However, you may be interested in reading the byte array of the attached item in Outlook. In that case you can write the byte array directly to the DB without touching the file system, which may slow down the overall performance. The PR_ATTACH_DATA_BIN property contains binary attachment data, typically accessed through the Object Linking and Embedding (OLE) IStream interface. This property holds the attachment when the value of the PR_ATTACH_METHOD property is ATTACH_BY_VALUE, which is the usual attachment method and the only one required to be supported.
The Outlook object model cannot retrieve large binary or string MAPI properties using PropertyAccessor.GetProperty. At the low level (Extended MAPI), the IMAPIProp::GetProps() method does not work for large PT_STRING8 / PT_UNICODE / PT_BINARY properties. They must be opened as IStream in the following way: IMAPIProp::OpenProperty(PR_ATTACH_DATA_BIN, IID_IStream, ...). See "PropertyAccessor.GetProperty(PR_ATTACH_DATA_BIN) fails for Outlook attachment" for more information.
You can also use Microsoft Power Automate to save the attachment to a drive and then upload the file to the Python environment.
I am developing a web application in which users can upload excel files. I know I can use the OPENROWSET function to read data from excel into a SQL Server but I am refraining from doing so because this function requires a file path.
It seems kind of indirect to upload a file to a directory and then tell SQL Server to go look in that directory for the file, instead of just giving SQL Server the file.
The other option would be to read the Excel file into a pandas dataframe and then use the to_sql function, but pandas' read_excel function is quite slow, and I am sure the other method (OPENROWSET) would be faster.
Which of these two methods is "correct" when handling file uploads from a web application?
If the first method is not frowned upon or "incorrect", then I am almost certain it is faster and will use it. I just want an experienced developer's thoughts or opinions. The web app's backend is Python and Flask.
If I am understanding your question correctly, you are trying to load the contents of an xls(x) file into a SQL Server database. This is actually not trivial, because depending on what is in the Excel file you might want one table, or more probably multiple tables based on the data. So I would step back for a bit and ask three questions:
What is the data I need to save, and how should that data be structured in my SQL tables? Forget about Excel at this point; maybe just examine the first row of data and see how you need to save it.
How do I get the file into my web application? For example, when the user uploads a file you would want to use a POST form and send the file data to your server and your server to save that file (for example, either on S3, or in a /tmp folder, or into memory for temporary processing).
Now that you know what your input is (the xls(x) file and its location) and how you need to save your data (the SQL schema), it's time to decide what the best tool for the job is. Pandas is probably not going to be a good tool unless you literally just want to load the file and dump it as-is, with minimal (if any) changes, into a single table. At this point I would suggest something like xlrd if you only have xls files, or openpyxl for xlsx files. That way you can shape your data any way you want and handle, for example, malformed dates, empty cells (should they default to something?), mismatched types, and so on (a rough sketch of this flow follows at the end of this answer).
In other words, the task you're describing is not trivial at all. It will take quite a bit of planning and designing, and then a good deal of Python code once you have your design decided. Feel free to ask more specific questions here if you need to (for example, how to capture the POST data of a file upload, or whatever else you need help with).
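To make that flow concrete, here is a rough sketch (under my own assumptions, not from the original answer) of a Flask upload endpoint that parses the workbook with openpyxl and bulk-inserts the rows with pyodbc; the form field name, table and column mapping are all placeholders.

# Sketch: accept an uploaded .xlsx, parse it in memory with openpyxl and
# bulk-insert the rows into SQL Server. The "file" form field, dbo.Sales table
# and two-column layout are placeholder assumptions.
import io

import pyodbc
from flask import Flask, request
from openpyxl import load_workbook

app = Flask(__name__)
CONN_STR = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...;Trusted_Connection=yes"


@app.route("/upload", methods=["POST"])
def upload():
    workbook = load_workbook(io.BytesIO(request.files["file"].read()), read_only=True)
    sheet = workbook.active

    # Skip the header row; validate/shape values here as needed.
    rows = [(r[0], r[1]) for r in sheet.iter_rows(min_row=2, values_only=True)]

    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        cursor.fast_executemany = True
        cursor.executemany("INSERT INTO dbo.Sales (ItemName, Amount) VALUES (?, ?)", rows)
        conn.commit()
    return {"inserted": len(rows)}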
The docs are pretty clear about how to create/get a blob, but I can't find any reference to how to modify and save an existing blob.
Is this actually possible given the BlobInfo object?
https://developers.google.com/appengine/docs/python/blobstore/overview#Writing_Files_to_the_Blobstore
You cannot modify an existing blob.
You can use the Files API to read from an existing blob and write to a new blob.
If you don't want to use the Files API to read the existing blob then you can use a BlobReader.
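Following that read-then-rewrite pattern, a minimal sketch on the old App Engine Python runtime might look like this; the mime type and the "modification" step are placeholders, and the Files API calls follow the docs linked above.

# Sketch: read the existing blob with BlobReader, write the modified content
# to a new blob with the Files API, and keep the new blob key.
from google.appengine.api import files
from google.appengine.ext import blobstore


def copy_with_changes(old_blob_key):
    original = blobstore.BlobReader(old_blob_key).read()
    modified = original.replace('foo', 'bar')  # whatever "modify" means for your data

    file_name = files.blobstore.create(mime_type='application/octet-stream')
    with files.open(file_name, 'a') as f:
        f.write(modified)
    files.finalize(file_name)
    return files.blobstore.get_blob_key(file_name)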
I use http://tsssaver.1conan.com to save blobs. You put in your info, and it sends a request for all signed firmwares.
You do not need to be on the version to save blobs; a blob is just a small file from Apple's servers that uses some of your device's info and their encryption key to verify a restore.
If you are hoping to save blobs for your current iOS version, which is no longer signed by Apple, so that you can restore to it in the future, you are out of luck: you may be able to save the blob on your end, but you can't obtain Apple's encryption key to verify and perform the restore for future use.