How do you read Excel files with xlrd on App Engine - python

I am using xlrd on App Engine with Flask.
I can't read the input file, and it keeps showing the same error message.
The code is:
def read_rows(inputfile):
    rows = []
    wb = xlrd.open_workbook(inputfile)
    sh = wb.sheet_by_index(0)
    for rownum in range(sh.nrows):
        rows.append(sh.row_values(rownum))
    return rows
@app.route('/process_input/', methods=['POST', 'GET'])
def process_input():
    inputfile = request.files['file']
    rows = read_rows(inputfile)
    payload = json.dumps(dict(rows=rows))
    return payload
I realize this might be because the upload is never saved as a file. Is there a workaround? This would help many others as well. Any help is appreciated, thanks.
Update: I found a solution and posted it below. Anyone confused about using xlrd can refer to the open source project repo I posted. The key is passing the content of the file instead of the filename.

I finally found a solution.
Here's how I do it: instead of saving the file, I read its content and let xlrd parse it.
def read_rows(inputfile):
    rows = []
    wb = xlrd.open_workbook(file_contents=inputfile.read())
    sh = wb.sheet_by_index(0)
    for rownum in range(sh.nrows):
        rows.append(sh.row_values(rownum))
    return rows
This worked nicely and turned the Excel files into JSON-serializable lists. If you want to output JSON, simply use json.dumps().
A full code example can be found at https://github.com/cjhendrix/HXLator/blob/master/gae/main.py; it shows a full implementation with xlrd and how to work with the data.
Thanks for the pointers.

Use:
wb = xlrd.open_workbook(file_contents=inputfile.read())
The way you are invoking open_workbook, it expects its first argument to be a filename, not a Flask FileStorage object wrapping the actual file. The file_contents argument takes the file's bytes, hence the read() call.
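The filename versus content distinction can be sketched as a small helper that decides which keyword open_workbook should receive. This is only an illustration: the helper name workbook_kwargs is hypothetical, and io.BytesIO merely stands in for the uploaded file object. The only xlrd details assumed are the filename and file_contents keywords already used in this thread.

```python
import io
import os


def workbook_kwargs(source):
    """Return keyword arguments for xlrd.open_workbook (hypothetical helper).

    A path is passed as `filename`; raw bytes, or anything with a .read()
    method (such as an uploaded file object), is passed as `file_contents`.
    """
    if isinstance(source, (str, os.PathLike)):
        return {"filename": os.fspath(source)}
    if isinstance(source, (bytes, bytearray)):
        return {"file_contents": bytes(source)}
    if hasattr(source, "read"):
        return {"file_contents": source.read()}
    raise TypeError("expected a path, bytes, or a file-like object")


# io.BytesIO stands in for request.files['file'] in this sketch.
upload = io.BytesIO(b"fake spreadsheet bytes")
kwargs = workbook_kwargs(upload)
# With xlrd installed: wb = xlrd.open_workbook(**kwargs)
```

Keeping the decision in one place means the view code never has to care whether it was handed a path or an upload.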

Judging from your traceback:
File "/Users/fauzanerichemmerling/Desktop/GAEHxl/gae/lib/xlrd/__init__.py", line 941, in biff2_8_load
    f = open(filename, open_mode)
You can try changing this line in your bundled copy of xlrd to:
    f = filename

Related

Excel python openpyxl methods

The openpyxl methods don't work correctly for me. This pandas code runs fine:
import pandas as pd
import os
from openpyxl import load_workbook
path = "C:\\My files\\Staff\\xProject\\ProjektExcelPython\\test_files\\"
spreadsheet_file = pd.read_excel(os.path.join(path, "PlikExcelDoKonwersji.xlsx"), engine='openpyxl', header = 1)
print(spreadsheet_file)
It works perfectly, but if I try to use methods from openpyxl, I get an error.
import pandas as pd
import os
from openpyxl import load_workbook
path = "C:\\My files\\Staff\\xProject\\ProjektExcelPython\\test_files\\"
spreadsheet_file = pd.read_excel(os.path.join(path, "PlikExcelDoKonwersji.xlsx"), engine='openpyxl', header = 1)
#sheet = spreadsheet_file.sheet_by_name('sheet')
book = load_workbook(path, "PlikExcelDoKonwersji.xlsx")
sheet = book['SendMail1']
data = []
for row in sheet.rows:
    print(row[1].value)
Error:
line 94, in _validate_archive
raise InvalidFileException(msg)
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm
Process finished with exit code 1
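The load_workbook function takes the file path as a single argument; a second positional argument is interpreted as read_only, so the folder alone ends up being treated as the file.
Try: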
book = load_workbook(os.path.join(path, "PlikExcelDoKonwersji.xlsx"))
Instead of passing the path as two values, pass it as one.
For example, try:
book = load_workbook("C:\\My files\\Staff\\xProject\\ProjektExcelPython\\test_files\\PlikExcelDoKonwersji.xlsx")
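Why the two-argument call fails can be illustrated without openpyxl at all. The sketch below uses a hypothetical stand-in, load_workbook_stub, that copies only the first two parameters of the load_workbook signature and opens nothing. The folder name is also made up.

```python
import os


def load_workbook_stub(filename, read_only=False):
    """Hypothetical stand-in with the first two parameters of load_workbook.

    It opens nothing; it only reports how the arguments were bound.
    """
    return {"filename": filename, "read_only": read_only}


directory = "test_files"  # hypothetical folder
name = "PlikExcelDoKonwersji.xlsx"

# Two positional arguments: the folder is taken as the file name and the
# real file name lands in read_only.
wrong = load_workbook_stub(directory, name)

# One joined path: the file name keeps its .xlsx extension.
right = load_workbook_stub(os.path.join(directory, name))

print(repr(os.path.splitext(wrong["filename"])[1]))  # no extension at all
print(repr(os.path.splitext(right["filename"])[1]))  # '.xlsx'
```

A path with no recognised extension is consistent with the "does not support file format" message in the traceback.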
Thanks guys, it works, but now another problem has come up.
I would like to skip the first row as before. I used header = 1 with pandas, but that doesn't work here, so I checked the documentation:
def load_workbook(filename, read_only=False, keep_vba=KEEP_VBA,
                  data_only=False, keep_links=True):
So it isn't there. How do you manage it in openpyxl?
I would like to do something like this:
Find a name in column[1] (the same will be in column[2]) and get one of the values from row[7] using column[1] and column[2].
Thanks for any suggestions.
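One possible approach to this follow-up, under some assumptions: openpyxl has no header argument, but the first row can be skipped while iterating, for example with sheet.iter_rows(min_row=2, values_only=True), which yields plain tuples. The sketch below runs on a hand-written list of tuples standing in for that output. The function name find_value, the sample data, and the zero-based column positions are all hypothetical, and it reflects only one reading of the lookup described above.

```python
def find_value(rows, name_a, name_b, col_a=0, col_b=1, target_col=6):
    """Return the target cell of the first row whose two key columns match.

    `rows` is any iterable of tuples, e.g. what
    sheet.iter_rows(min_row=2, values_only=True) yields in openpyxl.
    Column positions are zero-based indexes into each tuple.
    """
    for row in rows:
        if row[col_a] == name_a and row[col_b] == name_b:
            return row[target_col]
    return None


# Made-up rows standing in for worksheet data (first row already skipped).
sample_rows = [
    ("Anna", "Nowak", None, None, None, None, "anna@example.com"),
    ("Jan", "Kowalski", None, None, None, None, "jan@example.com"),
]

result = find_value(sample_rows, "Jan", "Kowalski")
print(result)
```

Returning None for a missing match keeps the caller in charge of deciding whether that is an error.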

Flask Uploads reading an xlsx file without saving it

I'd like to upload an Excel file in my web app, read its contents and display some cells. So basically I don't need to save the file, as that would be a waste of time.
Relevant code:
if form.validate_on_submit():
    f = form.xml_file.data.stream
    xml = f.read()
    workbook = xlrd.open_workbook(xml)
    sheet = workbook.sheet_by_index(0)
I can't wrap my mind around this, as I keep getting filetype errors no matter what I try. I'm using Flask Uploads, WTF.file and xlrd for reading the file.
Reading the file works fine if I first save it with f.save.
To answer my own question, I solved it with:
if form.validate_on_submit():
    # Put the file object (stream) into a variable
    xls_object = form.xml_file.data.stream
    # Open it as a workbook
    workbook = xlrd.open_workbook(file_contents=xls_object.read())

TypeError: expected str, bytes or os.PathLike object, not FieldFile

I have these Excel sheets, which I am getting from a queryset in Django REST framework. I converted the queryset to a list and want to process each Excel file.
Do I need to store the file somewhere in my app before reading it, or will storing it in a variable and reading it work fine?
What is the best way to accomplish this?
This is what I tried, but it does not seem to work.
excel_data = list(ExcelFiles.objects.all())
print("excel_data", excel_data)
for item in excel_data:
    print("item id ", item.id)
    print("item.company is ", item.company)
    print("item again", item.plan_type)
    print("item.excel is ", item.excelFile)
    print("item.status is", item.status)
    if item.status == False:
        if hasattr(item, 'excelFile'):
            print(item.excelFile)
            excel_sheet = item.excelFile
            wb = xlrd.open_workbook(excel_sheet)  # error occurs here
            sheet = wb.sheet_by_index(0)
            print("sheet", sheet)
I am using xlrd.
We don't know what ExcelFiles.objects.all() contains, or what error you get. I would say there are two things you can do:
Check what item.excelFile is. Is it a link? Since you don't want to save the file, a link that points to it would be best.
xlrd needs the content behind the link rather than the link itself, so download it first. You could try:
import urllib.request
import xlrd
link = 'https://raw.githubusercontent.com/SheetJS/test_files/a9c6bbb161ca45a077779ecbe434d8c5d614ee37/AutoFilter.xls'
# Download to a temporary file and get its local path
file_name, headers = urllib.request.urlretrieve(link)
print(file_name)
workbook = xlrd.open_workbook(file_name)
print(workbook)
Resource: Open an excel from http website using xlrd
FieldFile is a proxy object for accessing the stored file on the server. From the docs, FieldFile.name is:
The name of the file including the relative path from the root of the Storage of the associated FileField.
So you can simply pass the path to the file:
excel_sheet = item.excelFile
wb = xlrd.open_workbook(excel_sheet.name)  # pass the path to the file instead
Because name is relative to the storage root, excel_sheet.path (the file's local filesystem path) may be the safer choice if your working directory is somewhere else.

Cells with formulas created by Python openpyxl appear empty when viewed in mail

I have a weekly report to produce. I chose to create it with the openpyxl Python module and send it by mail. When I open the received mail (Outlook), the cells with formulas appear empty, but when I download the file and open it, the data appears. OS: Fedora 20. Parts of the code:
# imported modules from openpyxl ...
wb = Workbook()
ws = wb.active
counter = 3
ws.append(row)
for day in data:
    row = ['']*(len(hosts)*2 +5)
    row[0] = day.dayDate
    row[1] = '=SUM(F'+str(counter)+':'+get_column_letter(len(hosts)+5)+str(counter)+\
        ')/(COUNT(F'+str(counter)+':'+get_column_letter(len(hosts)+5)+str(counter)+'))'
    row[2] = '=SUM('+get_column_letter(len(hosts)+6)+str(counter)+':'+\
        get_column_letter(len(hosts)*2+5)+str(counter)+')/COUNT('+\
        get_column_letter(len(hosts)+6)+str(counter)+':'+\
        get_column_letter(len(hosts)*2+5)+str(counter)+')'
    row[3] = '=MAX('+get_column_letter(len(hosts)+6)+str(counter)+':'+\
        get_column_letter(len(hosts)*2+5)+str(counter)+')'
    row[4] = '=_xlfn.STDEV.P('+get_column_letter(len(hosts)+6)+str(counter)\
        +':'+get_column_letter(len(hosts)*2+5)+str(counter)+')'
    counter += 1
Then I create some charts from the data, etc., save the workbook, and send it by mail:
wb.save(pathToFile+fileName+'.xlsx')
os.system('echo -e "'+msg+'" | mail -s "'+fileName+'" -a '+\
    pathToFile+fileName+'.xlsx -r '+myUsr+' '+ppl2send2)
Those are parts of the actual code. Does anyone have an idea why the email doesn't show the results of the formulas in the cells? Thanks in advance :)
openpyxl doesn't compute results for formulas inserted into a spreadsheet, so the file it saves holds the formulas without any calculated values. If you open the sheet with Excel and save it, the values are filled in.
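If the recipients only need to see the numbers in the mail preview, one alternative (my assumption, not something taken from the question) is to compute the statistics in Python and append plain values instead of formula strings. Here is a minimal sketch using only the standard library. The function name summary_row and the sample numbers are hypothetical; statistics.pstdev is the population standard deviation, which is what STDEV.P computes.

```python
import statistics


def summary_row(day_date, first_block, second_block):
    """Build a row of precomputed values instead of formula strings.

    Mirrors the question's columns: average of the first block, then
    average, maximum and population standard deviation of the second.
    """
    return [
        day_date,
        statistics.mean(first_block),
        statistics.mean(second_block),
        max(second_block),
        statistics.pstdev(second_block),
    ]


# Made-up measurements for one day.
row = summary_row("2015-01-05", [10, 20, 30], [2, 4, 4, 4, 5, 5, 7, 9])
print(row)
# With openpyxl, ws.append(row) would then store numbers, not formulas.
```

The trade-off is that the cells no longer update when someone edits the raw values in the workbook.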
openpyxl has a problem with formulas: after you update your Excel file, you need to open it and save it to get the values generated. There are two ways to solve this problem.
(I wouldn't advise using this one unless you really need it.)
You can automate opening and saving the file from Python before reading it, using the win32com module (this requires Windows with Excel installed):
import win32com.client
wb.save('PUT YOUR FILE PATH HERE')
ab = win32com.client.Dispatch("Excel.Application")
wb2 = ab.Workbooks.Open('PUT YOUR FILE PATH HERE')
ws = ab.Sheets('PUT THE SHEET NAME HERE')
ab.DisplayAlerts = False
wb2.Save()
wb2.Close()
ab.Application.Quit()
#Now you can read from the file and you can see the values generated from the formula
Or you can use xlwings instead of openpyxl. With this module you don't have to worry about re-saving the Excel file; the module does it for you (xlwings also drives an installed copy of Excel).
import xlwings
wb= xlwings.Book('PUT YOUR FILE PATH HERE')
ws = wb.sheets[0]
#Do your update here example ws.range(2, 8).value = 34
wb.save()
wb.close()

pd.read_excel throws PermissionError if file is open in Excel

Whenever I have the file open in Excel and run the code, I get the following error. This is surprising because I thought read_excel was a read-only operation and would not require the file to be unlocked.
Traceback (most recent call last):
File "C:\Users\Public\a.py", line 53, in <module>
main()
File "C:\Users\Public\workspace\a.py", line 47, in main
blend = plStream(rootDir);
File "C:\Users\Public\workspace\a.py", line 20, in plStream
df = pd.read_excel(fPath, sheetname="linear strategy", index_col="date", parse_dates=True)
File "C:\Users\Public\Continuum\Anaconda35\lib\site-packages\pandas\io\excel.py", line 163, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\Public\Continuum\Anaconda35\lib\site-packages\pandas\io\excel.py", line 206, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Users\Public\Continuum\Anaconda35\lib\site-packages\xlrd\__init__.py", line 394, in open_workbook
f = open(filename, "rb")
PermissionError: [Errno 13] Permission denied: '<Path to File>'
Generally, Excel imposes a lot of restrictions when opening files (it can't open the same file twice, can't open two different files with the same name, etc.).
I don't have Excel on my machine to test, but checking the docs for read_excel, I noticed that it lets you set the engine.
From the stack trace you posted, the error seems to be thrown by xlrd, which is the default engine used by pandas here.
Try one of the other engines:
Supported engines: “xlrd”, “openpyxl”, “odf”, “pyxlsb”, default “xlrd”.
For example:
df = pd.read_excel(fPath, sheet_name="linear strategy", index_col="date", parse_dates=True, engine="openpyxl")
I know this is not a real answer, but you might want to submit a bug report to the pandas or xlrd teams.
As a workaround, I suggest having Python create a copy of the original file and then read from the copy. After that, the code should delete the copied file. It's a bit of extra work, but it should work.
Example:
import shutil
shutil.copy("C://Test//Test.xlsx", "C://Test//koko.xlsx")
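The copy, read, delete sequence described above could be wrapped up as follows. This is a sketch under assumptions: the helper name read_via_temp_copy is hypothetical, and whether the copy itself succeeds while Excel holds the file depends on how the file is locked. The demonstration uses a small throwaway file rather than a real workbook.

```python
import os
import shutil
import tempfile


def read_via_temp_copy(src_path, reader):
    """Copy src_path to a temporary folder, call reader(copy_path), clean up.

    `reader` is any callable taking a path, for example
    lambda p: pd.read_excel(p, engine="openpyxl").
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        copy_path = os.path.join(tmp_dir, os.path.basename(src_path))
        shutil.copy(src_path, copy_path)
        # The temporary folder and the copy are removed when the block exits.
        return reader(copy_path)


def read_bytes(path):
    """Tiny reader used only for this demonstration."""
    with open(path, "rb") as fh:
        return fh.read()


# Demonstration with a throwaway file instead of a real workbook.
with tempfile.TemporaryDirectory() as demo_dir:
    original = os.path.join(demo_dir, "Test.xlsx")
    with open(original, "wb") as fh:
        fh.write(b"demo bytes")
    data = read_via_temp_copy(original, read_bytes)
```

Using a temporary directory means the copy is deleted even if the reader raises an exception.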
I would suggest using the xlwings module instead, which offers more functionality.
First, load your workbook.
If the spreadsheet is in the same folder as your Python script:
import xlwings as xw
workbook = xw.Book('myfile.xls')
Alternatively, with a full path:
workbook = xw.Book(r'C:\Users\...\myfile.xls')
Then you can create your pandas DataFrame by specifying the sheet within your spreadsheet and the cell where your dataset begins (this needs import pandas as pd):
df = workbook.sheets[0].range('A1').options(pd.DataFrame,
                                             header=1,
                                             index=False,
                                             expand='table').value
You can specify a sheet either by its name or by its position (first, second, etc.):
workbook.sheets[0] or workbook.sheets['sheet_name']
Lastly, you can install the xlwings module with pip install xlwings.
Most likely there are no issues in your code. (It would be easier to tell if you posted the code.)
You need to change the permissions of the directory you are using so that all users have read and write permissions.
I got this to work by first setting the working directory and then opening the file. It may have something to do with shared drive permissions and the read_excel function.
import os
import pandas as pd
os.chdir("c:\\Users\\...\\")
filepath = "...\\filename.xlsx"
sheetname = 'sheet1'
df_xls = pd.read_excel(filepath, sheet_name=sheetname, engine='openpyxl')
I fixed this error simply by closing the .xlsx file that was open.
You can set engine='xlrd'; then you can run the code while Excel has the file open.
df = pd.read_excel(filename, sheetname, engine='xlrd')
You may need to pip install xlrd if you don't have it.
You may also want to check whether the file has a password. If it does, you can open it with the password using the code below:
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
print("Excel library version:", xlApp.Version)
filename, password = "PUT YOUR FILE PATH HERE", "PUT YOUR PASSWORD HERE"
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
xlws = xlwb.Sheets(1)  # put the sheet number here; counts from 1, not from 0
print(xlws.Name)
print(xlws.Cells(1, 1))  # that's A1
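Another suggestion is to set engine='python' so that it runs even while the file is open:
df = pd.read_excel(filename, engine = 'python')
Note, however, that 'python' is not among the read_excel engines listed above (it is a read_csv parser option), so this may raise an error depending on your pandas version.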
