I have a couple hundred daily Excel attachments in email that I want to pull appropriate data from and save into a database. I want to avoid saving each attachment to disk only to re-open from disk to read, since I'll never need the files saved to disk ever again. For this project, sure, I could just do it and delete them, but there ought to be a better way.
Here's what I'm doing so far:
from win32com.client import Dispatch

outlook = Dispatch("Outlook.Application").GetNamespace("MAPI")
folder = outlook.Folders[blah].Folders[blahblah]
for item in folder.Items:
    for att in item.Attachments:
        att.SaveAsFile(???)  # This is where I need something cool, like a stream or bytes or something that I don't understand
        # do something with the file, either read with pandas or openpyxl
If I can get around even doing the save and have pandas / openpyxl read it without saving, that would be great, but neither of them can read the att directly.
The Outlook Object Model won't let you do that: Attachment.SaveAsFile only lets you specify a valid file name.
On the Extended MAPI level (C++ or Delphi only), the one and only way to access attachment data (Extended MAPI does not know anything about files) is to open the PR_ATTACH_DATA_BIN MAPI property as an IStream interface: IAttach::OpenProperty(PR_ATTACH_DATA_BIN, IID_IStream, ...). You can then retrieve the data directly from the IStream interface.
If using Redemption (any language; I am its author) is an option, it exposes RDOAttachment.AsStream / AsArray / AsText properties that allow you to access raw attachment data without saving it to a file first.
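Staying with the plain Outlook Object Model from Python, a common workaround is a temp-file round trip: save the attachment, read the bytes back, delete the file immediately, then hand the bytes to pandas in memory. A minimal sketch, assuming `att` is a win32com Attachment object (the helper name is mine, not part of the Outlook API):

```python
import os
import tempfile

def attachment_to_bytes(att):
    """Round-trip an Outlook attachment through a temp file and return its raw bytes.

    SaveAsFile is the only data access the Outlook Object Model exposes,
    so the file briefly exists on disk but never outlives this call.
    """
    fd, path = tempfile.mkstemp(suffix=".xlsx")
    os.close(fd)  # SaveAsFile wants to write the file itself
    try:
        att.SaveAsFile(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

# Once you have the bytes, pandas can read them from memory:
# import io, pandas as pd
# df = pd.read_excel(io.BytesIO(attachment_to_bytes(att)))
```

This keeps the rest of the pipeline purely in memory, at the cost of one short-lived temp file per attachment.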
This is the problem I have to face; I'll be as clear as possible.
My clients are sending daily reports of their sales via an Excel file on Google Drive. I am only a viewer on this file and I have it in my "shared with me" folder in Google Drive. Ideally, I would like to load this file in Python to be able to post-process it and analyze it. A few notes:
I have seen solutions here on StackOverflow that suggest adding a shortcut to Drive in order to import it but I have two problems with that:
After adding it I see a .gsheet file that I do not know how to load in Python
I am not sure that the file would be dynamically updated on a daily basis
I cannot use gdown since I am only a viewer on that file, unfortunately!
Let me know if you have other ideas/approaches! Thanks
You can list all the files shared with you using the Drive API.
We will need to use the following method:
Files.list [Drive API](https://developers.google.com/drive/api/v3/reference/files/list) to list all files you have access to.
You can use the API explorer available in most documentation pages. Once you have a better grasp of the API behaviour, experiment starting with this code sample: https://developers.google.com/drive/api/quickstart/python. This Quickstart produces a simple list of files with Python.
I recommend you use the following flow:
Call the Files.list method with the following parameters:
{
"q": "not ('me' in owners or creator = 'me')",
"fields": "nextPageToken,items(fileSize,owners,title,id,mimeType)"
}
This will return only the files shared with you (files of which you are neither owner nor creator). You will not handle a .gsheet file as a regular file, because it is not one; instead, use the Google Sheets API (https://developers.google.com/sheets/api/reference/rest) to fetch the data inside the Google Sheet file. The same is true for Google Docs and Google Slides: each has its respective API you can use to access and manipulate the data in its files.
If you look closely at the parameters we are using: q filters the results to list only files you don't own but can access (you can also filter for files owned by a particular email address). The other parameter, fields, makes the response much shorter; since you won't make use of every property of a file, it yields a simplified response that takes less time for the server to process and less bandwidth. Adjust the fields parameter if you need more or less data.
Finally, direct your focus to the nextPageToken property in the fields parameter. The API response is paginated, meaning you will receive up to a certain number of files per response; to retrieve the next page of results, make the same call again, passing the nextPageToken you obtained in the response as a new parameter in the request. This is explained in this documentation article: https://developers.google.com/calendar/api/guides/pagination.
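The flow above can be sketched with google-api-python-client. This is a sketch under assumptions: `service` is a Drive client built with `googleapiclient.discovery.build("drive", "v3", credentials=...)` as in the Quickstart, and the query and field names use the v3 spelling (`sharedWithMe`, `files`, `name`, `size`); adjust them if you target the older v2 naming shown earlier:

```python
def list_shared_files(service):
    """Collect all files shared with the caller, following nextPageToken pagination.

    `service` is assumed to be a Drive v3 API client object.
    """
    files = []
    page_token = None
    while True:
        response = service.files().list(
            q="sharedWithMe",  # only files others shared with you
            fields="nextPageToken, files(id, name, mimeType, size, owners)",
            pageToken=page_token,
        ).execute()
        files.extend(response.get("files", []))
        page_token = response.get("nextPageToken")
        if page_token is None:  # no more pages
            return files
```

The loop keeps requesting pages until the response comes back without a nextPageToken, exactly the pattern the pagination article describes.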
Note: If you need clarification on how to execute certain actions on a Google Sheet file, I recommend you submit a new question, since additional tasks with other APIs are outside the scope of this question and would make this response much larger than it needs to be.
I need to save Outlook mails with their attachments to a .msg file in Python. Currently, working with win32com.client, I use message.SaveAs(path + name), which gives me a nice .msg file, but that does not include attachments (if attachments exist). Attached files are visible using message.Attachments.Count and message.Attachments, but how can I create a .msg file with the attachments included, stored as one file, the way it works when messages are exported straight from Outlook?
how can I create a .msg-file with the attachments included to store as one file which works when messages are exported straight from Outlook?
The Outlook object model doesn't provide anything for that. The best you can do is save the attached files alongside your mail items (.msg). Use the Attachment.SaveAsFile method, which saves the attachment to the specified path.
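A minimal sketch of that approach, assuming `message` is a win32com MailItem: save the .msg and its attachments side by side in one output folder. The `safe_name` helper is my own addition (Outlook subject lines and attachment names can contain characters Windows forbids in file names), and 3 is the OlSaveAsType value for olMSG:

```python
import os
import re

def safe_name(name):
    """Replace characters Windows does not allow in file names (illustrative helper)."""
    return re.sub(r'[\\/:*?"<>|]', "_", name).strip() or "untitled"

def save_message_with_attachments(message, out_dir):
    """Save a mail item as .msg plus each of its attachments as separate files.

    `message` is assumed to be a win32com MailItem; 3 == olMSG.
    """
    os.makedirs(out_dir, exist_ok=True)
    message.SaveAs(os.path.join(out_dir, safe_name(message.Subject) + ".msg"), 3)
    for att in message.Attachments:
        att.SaveAsFile(os.path.join(out_dir, safe_name(att.FileName)))
```

You end up with one folder per export run rather than a single self-contained file, which is the trade-off the answer describes.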
I used this code to store attached .xlsx files from a specific email address in Outlook, but now I would like to store these files in a SQL Server database instead of a folder on my laptop. Do you have any idea how to store these files directly in a database? Many thanks.
import os

outputDir = r"C:\Users\CMhalla\Desktop\Hellmann_attachment"
i = 0
for m in messages:
    if m.SenderEmailAddress == 'address@outlook.com':
        body_content = m.Body
        for attachment in m.Attachments:
            i = i + 1
            # append a counter before the extension so repeated names don't collide
            name, ext = os.path.splitext(attachment.FileName)
            attachment.SaveAsFile(os.path.join(outputDir, name + str(i) + ext))
The Outlook object model doesn't provide any property or method for saving attachments to a database directly. You need to save the file on disk first and then add it to the database in any convenient way.
However, you may be interested in reading the byte array of the attached item in Outlook. In that case you may write the byte array directly to the database without touching the file system, which could otherwise slow down overall performance. The PR_ATTACH_DATA_BIN property contains binary attachment data, typically accessed through the Object Linking and Embedding (OLE) IStream interface. This property holds the attachment when the value of the PR_ATTACH_METHOD property is ATTACH_BY_VALUE, which is the usual attachment method and the only one required to be supported.
The Outlook object model cannot retrieve large binary or string MAPI properties using PropertyAccessor.GetProperty. On the low level (Extended MAPI), the IMAPIProp::GetProps() method does not work for large PT_STRING8 / PT_UNICODE / PT_BINARY properties. They must be opened as IStream in the following way: IMAPIProp::OpenProperty(PR_ATTACH_DATA_BIN, IID_IStream, ...). See "PropertyAccessor.GetProperty(PR_ATTACH_DATA_BIN) fails for outlook attachment" for more information.
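Once you have the attachment bytes (for example, by saving to a temp file, reading it back, and deleting it), inserting them into the database is a plain parameterized INSERT. A minimal sketch, assuming a DB-API connection (with SQL Server this would typically come from `pyodbc.connect(...)`) and an illustrative `attachments` table; `?` is the parameter marker used by pyodbc:

```python
def store_attachment(conn, sender, filename, data):
    """Insert one attachment's raw bytes into an `attachments` table.

    `conn` is any DB-API connection object whose driver uses `?` placeholders
    (pyodbc does); table and column names here are illustrative.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO attachments (sender, filename, content) VALUES (?, ?, ?)",
            (sender, filename, data),
        )
```

Passing the bytes as a parameter (never string-formatting them into the SQL) lets the driver send them as a binary value such as SQL Server's varbinary(max).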
You can use Microsoft Power Automate to save the attachment in the drive and then upload the file to the Python environment.
I'm getting an image from a URL with Pillow and creating a stream (BytesIO/StringIO).
r = requests.get("http://i.imgur.com/SH9lKxu.jpg")
stream = Image.open(BytesIO(r.content))
Since I want to upload this image using an <input type="file" /> with Selenium WebDriver, I can do something like this to upload a file:
self.driver.find_element_by_xpath("//input[@type='file']").send_keys("PATH_TO_IMAGE")
I would like to know if it's possible to upload that image from a stream without having to mess with files / file paths. I'm trying to avoid filesystem reads/writes and do it in memory, or at most with temporary files. I'm also wondering if that stream could be encoded to Base64 and then uploaded by passing the string to the send_keys function you can see above :$
PS: Hope you like the image :P
You seem to be asking multiple questions here.
First, how do you convert a JPEG without downloading it to a file? You're already doing that, so I don't know what you're asking here.
Next, "And do it in-memory or as much with temporary files." I don't know what this means, but you can do it with temporary files using the tempfile module in the stdlib, and you can do it in-memory too; both are easy.
Next, you want to know how to do a streaming upload with requests. The easy way to do that, as explained in Streaming Uploads, is to "simply provide a file-like object for your body". This can be a tempfile, but it can just as easily be a BytesIO. Since you're already using one in your question, I assume you know how to do this.
(As a side note, I'm not sure why you're using BytesIO(r.content) when requests already gives you a way to use a response object as a file-like object, and even to do it by streaming on demand instead of by waiting until the full content is available, but that isn't relevant here.)
If you want to upload it with selenium instead of requests… well then you do need a temporary file. The whole point of selenium is that it's scripting a web browser. You can't just type a bunch of bytes at your web browser in an upload form, you have to select a file on your filesystem. So selenium needs to fake you selecting a file on your filesystem. This is a perfect job for tempfile.NamedTemporaryFile.
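A minimal sketch of that temp-file step, assuming `data` holds the image bytes (e.g. `r.content` from the question); the helper name is mine:

```python
import tempfile

def bytes_to_temp_path(data, suffix=".jpg"):
    """Write in-memory bytes to a named temp file and return its path.

    Selenium's send_keys needs a real filesystem path; delete=False keeps
    the file around until the upload happens (remove it yourself afterwards).
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(data)
        return f.name

# path = bytes_to_temp_path(r.content)
# driver.find_element_by_xpath("//input[@type='file']").send_keys(path)
# ... after the form is submitted: os.remove(path)
```

The suffix matters on some sites that sniff the file extension, so match it to the actual image format.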
Finally, "I'm also Wondering If that stream could be encoded to Base64".
Sure it can. Since you're just converting the image in-memory, you can just encode it with, e.g., base64.b64encode. Or, if you prefer, you can wrap your BytesIO in a codecs wrapper to base-64 it on the fly. But I'm not sure why you want to do that here.
I am writing a bit of code that will take a CSV file input and perform an operation based on its contents. In the admin panel I am designing, the admin should be able to select a CSV file on their local system which my application will then read. The application does not need to store the CSV file, just read from it for a one-time operation.
Any ideas on how to best handle this in Pyramid?
What you want is essentially a file upload, followed by additional processing on the uploaded data. You can create input elements of type "file" in HTML forms to allow uploading of files.
Refer to the cookbook in the Pyramid documentation on file uploads for how to handle the uploaded data on the server side (summarized: use the file-like object `request.POST[field_name].file`).
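A minimal sketch of that pattern: parse the rows straight from the upload's file-like object without ever writing it to disk. The field name "csv" and the view function are illustrative assumptions, not part of Pyramid's API:

```python
import csv
import io

def read_uploaded_csv(field_storage):
    """Parse all rows from an uploaded CSV without persisting it.

    `field_storage` is the FieldStorage-like object request.POST["csv"]
    gives you in Pyramid; its .file attribute is a binary file-like object,
    so wrap it in TextIOWrapper before handing it to the csv module.
    """
    text = io.TextIOWrapper(field_storage.file, encoding="utf-8", newline="")
    return list(csv.reader(text))

# In a Pyramid view (sketch):
# def upload_view(request):
#     rows = read_uploaded_csv(request.POST["csv"])
#     ...  # one-time processing; nothing touches the filesystem
```

`newline=""` is what the csv module expects so it can handle embedded line breaks in quoted fields itself.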