Python - download Excel file from email attachment then parse it

EDIT - UPDATE
I have created a horrible hack that opens the Excel file and saves it again under the same filename before loading it into pandas. This is really horrible, but I can't see any other way to solve the problem, as attachment.SaveAsFile creates an endian problem.
I have the following code that finds an email in my Outlook inbox and downloads the attached Excel file to a directory. When I try to open the file to parse it for another part of my script, it fails with a formatting error.
I know this is caused by the way Python saves the file, because when I save it manually it works fine.
Any help greatly appreciated.
from win32com.client import Dispatch
import win32com.client as win32  # needed for win32.gencache below
import email
import datetime as date
import pandas as pd
import os
outlook = Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)  # 6 = olFolderInbox
all_inbox = inbox.Items
val_date = date.date.today()
sub_today = 'Hi'
att_today = 'Net - Regional.xls'
## loop through the inbox and stop at the first message with a matching subject
for msg in all_inbox:
    yourstring = msg.Subject.encode('ascii', 'ignore').decode('ascii')
    if yourstring.find('Regional Reporting Week') != -1:
        break
## get attachments
for att in msg.Attachments:
    if att.FileName == att_today:
        attachments = msg.Attachments
        break
attachment = attachments.Item(1)
fn = os.path.join(os.getcwd(), att_today)
attachment.SaveAsFile(fn)
# terrible hack but workable in the short term
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.DisplayAlerts = False
excel.Visible = True
wb = excel.Workbooks.Open(fn)
wb.SaveAs(fn)
wb.Close(True)
xl = pd.ExcelFile(fn)
data_df = xl.parse("RawData - Global")
print(data_df)

What is the file name string of att_today? Is it using the appropriate extension?
You're saving it as a ".xls" file. Could the attachment actually be a ".xlsx" file?
Aside from the ".SaveAsFile()" method, you may want to look into ".ExtractFile" or "WriteToFile".
Lastly, even if Python saves the file differently from how you save it manually, you could still use a third-party Excel package to read the file properly before rewriting it for manual opening and viewing.
For ".xls" files, I would recommend xlrd.
For ".xlsx" files, I would recommend openpyxl.
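As a rough sketch of that recommendation, the reader package could be picked from the file extension and passed to pandas through the `engine` argument of `read_excel`. The helper name `pick_engine` and the mapping below are illustrative assumptions, not part of the original script, and the extension only tells you what the file claims to be.

```python
import os

# Hypothetical mapping from file extension to the pandas `engine` name.
ENGINE_BY_EXTENSION = {
    ".xls": "xlrd",       # legacy binary workbooks
    ".xlsx": "openpyxl",  # zip-based workbooks
    ".xlsm": "openpyxl",  # zip-based workbooks with macros
}


def pick_engine(filename):
    """Return the pandas engine name for a filename, or None if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return ENGINE_BY_EXTENSION.get(extension)


# Usage (needs pandas plus the chosen reader package installed):
# import pandas as pd
# data_df = pd.read_excel(fn, sheet_name="RawData - Global", engine=pick_engine(fn))
```

If `pick_engine` returns None, pandas falls back to its own engine detection.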

Related

xlwings Book opens an Excel workbook but changes the file name to lowercase

I used xlwings to open an Excel workbook. It worked fine until last month. But today, when I run the same code, it opens my workbook but converts the file name to lowercase.
Does anybody know why that is? And how can I keep my original capitalization?
I am using Windows 10.
For example, when I run the code below, ABC.xlsx is automatically renamed to abc.xlsx by xlwings.
import xlwings as xw
fullPath = '\\\\xxx\\xxx\\ABC.xlsx'
psw = '123'
wb = xw.Book(fullPath, password = psw)
I had the same issue and I am not sure why it does that (I think it only happens to protected workbooks). An easy fix is renaming the file again as follows:
import os
old_file_name = os.path.split(fullPath)[0] +'\\' + os.path.split(fullPath)[1].lower()
new_file_name = fullPath
os.rename(old_file_name, new_file_name)
This code assumes that your path and file are saved under fullPath.
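A slightly more defensive variant of the same rename might look like the sketch below. The function name `restore_case` is made up for illustration; it only renames when a lowercased copy exists and the correctly capitalized name does not, so it can be called safely after every run.

```python
import os


def restore_case(full_path):
    """Rename a lowercased file back to the capitalization in full_path.

    Returns True if a rename happened, False otherwise.
    """
    folder, name = os.path.split(full_path)
    entries = os.listdir(folder or ".")
    lowered = name.lower()
    # Nothing to do if the desired name is already present,
    # or if there is no lowercased copy to rename.
    if name in entries or lowered not in entries:
        return False
    os.rename(os.path.join(folder, lowered), full_path)
    return True
```

It would be called as restore_case(fullPath) once the workbook has been closed.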

Objects of type 'WindowsPath' can not be converted to a COM VARIANT

I have an xlsx file which is a template for a receipt. It contains images and cells. I used to go into the file manually, update the information and then export it to PDF before sending it to my clients. I would like to convert the xlsx file to PDF through Python if possible.
My problem is that I can't find a tutorial, or a decent video, that simply takes an xlsx file and converts it to PDF.
I've tried getting openpyxl to save it with a .pdf extension, but I knew that was a long shot. I also tried to follow an example on Stack Overflow, but it didn't work that well.
I keep getting :
File "<COMObject <unknown>>", line 5, in ExportAsFixedFormat
Objects of type 'WindowsPath' can not be converted to a COM VARIANT
and I'm pretty stuck.
#this file will open a wb and save it as another file name
#this first part opens a file from a location and makes a copy to another location
from pathlib import Path
from win32com import client
#sets filename and file
file_name = 'After Summer Bookings.xlsx'
dir_path = Path('C:/Users/BOTTL/Desktop/Business')
new_file_name = 'hello.pdf'
new_save_place = Path('C:/Users/BOTTL/Desktop/Business Python')
xlApp = client.Dispatch("Excel.Application")
books = xlApp.Workbooks.Open(dir_path / file_name)
ws = books.Worksheets[0]
ws.Visible = 1
ws.ExportAsFixedFormat(0, new_save_place / new_file_name)
I'd like it to open the xlsx file I have called After Summer Bookings.xlsx and save it as a pdf file called hello.pdf
Solved it myself :)
from pathlib import Path
from win32com import client
#sets filename and file
file_name = 'After Summer Bookings.xlsx'
dir_path = Path('C:/Users/BOTTL/Desktop/Business')
new_file_name = 'hello.pdf'
new_save_place = ('C:/Users/BOTTL/Desktop/Business Python/')
path_and_place = new_save_place + new_file_name
xlApp = client.Dispatch("Excel.Application")
books = xlApp.Workbooks.Open(dir_path / file_name)
ws = books.Worksheets[0]
ws.Visible = 1
ws.ExportAsFixedFormat(0,path_and_place)
When concatenating the location and the filename, it didn't like that I had made it a Path. ExportAsFixedFormat needs a plain string, so now that I build the output path as a string, it works like a dream :)
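An alternative that keeps pathlib for joining is to convert the result with str() just before handing it to COM. The sketch below is an illustration only: the helper name `to_com_path` is an assumption, and PureWindowsPath is used so the joining logic can be checked on any operating system.

```python
from pathlib import PureWindowsPath


def to_com_path(folder, file_name):
    """Join folder and file name and return a plain str for COM calls."""
    return str(PureWindowsPath(folder) / file_name)


pdf_path = to_com_path("C:/Users/BOTTL/Desktop/Business Python", "hello.pdf")
print(pdf_path)  # C:\Users\BOTTL\Desktop\Business Python\hello.pdf
```

The resulting string can then be passed as ws.ExportAsFixedFormat(0, pdf_path).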

Using Python to extract an Excel table, paste it as an image into an Outlook email and send it

I need to write code that makes Python look into an Excel sheet, extract a table and paste it as an image into an email. So far, I have managed to get Python to write web content from a website into that Excel file, and even to send out an email. The gap lies in the step described in the first sentence above. The whole idea of this project is to avoid opening the Excel file at all and to run everything via Python.
I have spent quite a bit of time on this, but the best result so far is one where Python can print cell data from Excel, not the entire table as an image.
I would really appreciate some help here. Thank you and cheers.
Code attached:
import requests
response = requests.get("")
txt=response.text
lines = txt.split('\n')
print(lines[25])
from openpyxl import load_workbook
wb = load_workbook(filename = 'abc.xlsm', read_only=False, keep_vba=True)
ws1 = wb['Sheet1']
ws1['A2'].value = (lines[25])
wb.save('abc.xlsm')
# Up to this point I have managed to get Python to extract web content and write into an existing excel file.
import win32com.client
olMailItem = 0x0
obj = win32com.client.Dispatch("Outlook.Application")
newMail = obj.CreateItem(olMailItem)
newMail.Subject = "My Subject"
newMail.Body = "My Body"
newMail.To = "..."
newMail.Send()
# This sends out the email.
# This tries to get Python to print out the entire sheet, but it comes out quite jumbled:
import xlrd
book = xlrd.open_workbook('C:/Users/adriel.cheng/Desktop/Inspection Sales_v8.xlsm')
print (book.nsheets)
print (book.sheet_names())
# An alternative approach, but it also comes out jumbled:
import pandas as pd
xl = pd.ExcelFile('C:/Users/adriel.cheng/Desktop/Inspection Sales_v8.xlsm')
xl.sheet_names
df = xl.parse("Inspections sheet")
print (df)

Saving attachments from outlook, error when loading with pandas/xlrd

I have this script, which has previously worked for other emails, to download attachments:
import win32com.client as win
import xlrd
outlook = win.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder("6")
all_inbox = inbox.Items
subject = 'Email w/Attachment'
attachment1 = 'Attachment - 20160715.xls'
for msg in all_inbox:
    if msg.subject == subject:
        break
for att in msg.Attachments:
    if att.FileName == attachment1:
        break
att.SaveAsFile('L:\\My Documents\\Desktop\\' + attachment1)
workbook = xlrd.open_workbook('L:\\My Documents\\Desktop\\' + attachment1)
However, when I try to open the file using xlrd (or pandas), I get this:
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\r\nVisit '
Can anyone explain what's gone wrong here?
Is there a way I can open the attachment without saving it, and just copy a worksheet and save that copy as a .csv file instead?
Thank you
It's possible the file you are trying to download is not a true Excel file, but a text file such as a CSV saved with an .xls extension. The evidence is the error message Expected BOF record; found b'\r\nVisit '. A genuine .xls file is binary and would not begin with readable text like this. You could get around it with a try/except:
import pandas as pd
import xlrd
from xlrd import XLRDError
try:  # try to read it as a .xls file
    workbook = xlrd.open_workbook(path)
except XLRDError:  # if that fails, read it as CSV
    workbook = pd.read_csv(path)
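Instead of relying on the exception, the first bytes of the saved attachment can be inspected directly. The sketch below is only an illustration (the name `sniff_container` is hypothetical): legacy .xls workbooks live in an OLE2 compound file, .xlsx workbooks are zip archives, and anything else is reported as "other", which is where a CSV or HTML file renamed to .xls would land.

```python
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file, used by binary .xls
ZIP_MAGIC = b"PK\x03\x04"                          # zip archive, used by .xlsx


def sniff_container(path):
    """Return 'ole2', 'zip' or 'other' based on the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "other"
```

A file beginning with b'\r\nVisit ', as in the error above, would be reported as "other", so reading it as text is the more promising route. Note that other document types share these containers too, so the result narrows the format down rather than proving it.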

Password Protecting Excel file using Python

I haven't found much on the topic of creating a password-protected Excel file using Python.
In openpyxl, I did find a SheetProtection module using:
from openpyxl.worksheet import SheetProtection
However, the problem is I'm not sure how to use it. It's not an attribute of Workbook or Worksheet so I can't just do this:
wb = Workbook()
ws = wb.worksheets[0]
ws_encrypted = ws.SheetProtection()
ws_encrypted.password = 'test'
...
Does anyone know if such a request is even possible with Python? Thanks!
Here's a workaround I use. It generates a VBS script and calls it from within your Python script.
import subprocess
from pathlib import Path
def set_password(excel_file_path, pw):
    excel_file_path = Path(excel_file_path)
    # VBScript that reopens the workbook and saves it with an open password
    vbs_script = \
f"""' Save with password required upon opening
Set excel_object = CreateObject("Excel.Application")
Set workbook = excel_object.Workbooks.Open("{excel_file_path}")
excel_object.DisplayAlerts = False
excel_object.Visible = False
workbook.SaveAs "{excel_file_path}",, "{pw}"
excel_object.Application.Quit
"""
    # write
    vbs_script_path = excel_file_path.parent.joinpath("set_pw.vbs")
    with open(vbs_script_path, "w") as file:
        file.write(vbs_script)
    # execute
    subprocess.call(['cscript.exe', str(vbs_script_path)])
    # remove
    vbs_script_path.unlink()
    return None
Looking at the docs for openpyxl, I noticed there is indeed an openpyxl.worksheet.SheetProtection class. However, it seems to already be part of a worksheet object:
>>> wb = Workbook()
>>> ws = wb.worksheets[0]
>>> ws.protection
<openpyxl.worksheet.protection.SheetProtection object at 0xM3M0RY>
Checking dir(ws.protection) shows there is a method set_password that when called with a string argument does indeed seem to set a protected flag.
>>> ws.protection.set_password('test')
>>> wb.save('random.xlsx')
I opened random.xlsx in LibreOffice and the sheet was indeed protected. However, I only needed to toggle an option to turn off protection, and not enter any password, so I might be doing it wrong still...
You can use Python's win32com to save an Excel file with a password.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
#Before saving the file set DisplayAlerts to False to suppress the warning dialog:
excel.DisplayAlerts = False
wb = excel.Workbooks.Open(your_file_name)
# refer https://learn.microsoft.com/en-us/previous-versions/office/developer/office-2007/bb214129(v=office.12)?redirectedfrom=MSDN
# FileFormat = 51 is for .xlsx extension
wb.SaveAs(your_file_name, 51, 'your password')
wb.Close()
excel.Application.Quit()
Here is a rework of Michał Zawadzki's solution that doesn't require creating and executing a separate vbs file:
def PassProtect(Path, Pass):
    from win32com.client.gencache import EnsureDispatch
    xlApp = EnsureDispatch("Excel.Application")
    xlwb = xlApp.Workbooks.Open(Path)
    xlApp.DisplayAlerts = False
    xlApp.Visible = False  # Visible belongs to the application, not the workbook
    xlwb.SaveAs(Path, Password = Pass)
    xlwb.Close()
    xlApp.Quit()
PassProtect(FullExcelWorkbookPathGoesHere, DesiredPasswordGoesHere)
If you wanted to choose a file name that's in your project's folder, you could also do:
from os.path import abspath
PassProtect(abspath(FileNameInsideProjectFolderGoesHere), DesiredPasswordGoesHere)
openpyxl is unlikely ever to provide workbook encryption. However, you can add this yourself because Excel files in the xlsx format (Excel 2007 and later) are zip archives: create a file in openpyxl and add a password to it using standard utilities.
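The zip-archive claim is easy to check with the standard library. The sketch below only tests the container and adds no password; the helper name `is_zip_container` is an assumption, and a tiny in-memory zip stands in for a real workbook so the example stays self-contained.

```python
import io
import zipfile


def is_zip_container(source):
    """Return True if source (a path or a binary file object) is a zip archive."""
    return zipfile.is_zipfile(source)


# Build a tiny zip in memory as a stand-in for a real .xlsx file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
buffer.seek(0)

print(is_zip_container(buffer))  # True
```

For a workbook on disk, is_zip_container('random.xlsx') would be the equivalent call.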
