I have this dataframe, and I want to save it as a excel file in a sharepoint folder.
This is my code:
from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext
# auth
client_credentials = ClientCredential(var_client_id, var_client_secret)
ctx = ClientContext(var_sp_site).with_credentials(client_credentials)
df = pd.DataFrame(sql_table)
var_relative_url = "sharepoint_path/sharepoint_path"
target_folder = ctx.web.get_folder_by_server_relative_url(var_relative_url)
target_folder.upload_file(content=df.to_excel(excel_writer='teste.xlsx'), file_name='teste.xlsx').execute_query() # Here is my problem
When I execute this code, the excel file is created at the folder, but when I try to open the file on sharepoint interface it raises a error ("cannot be opened").
This code will run on a cloud function, so I can't use local files to upload.
I'm investigating this issue right now. Not solved yet buy I can give you a work around: use .save()
wb = pd.ExcelWriter( outputFile, mode='w', engine="openpyxl" )
myDataFrame.to_excel( wb, sheet_name='sheet1', index=False )
wb.save()
From error to warning ;)
Related
I am trying to read an excel file from SharePoint to python and I get the following error:
ValueError: Excel file format cannot be determined, you must specify an engine manually
My Code:
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
url_sp = 'https://company.sharepoint.com/teams/TeamE'
username_sp = 'MyUsername'
password_sp = 'MyPassword'
folder_url_sp = '/Shared%20Documents/02%20Team%20IAP/06_Da-An/Data/E/Edate.xlsx?web=1'
#Authentication
ctx_auth = AuthenticationContext(url_sp)
if ctx_auth.acquire_token_for_user(username_sp, password_sp):
ctx = ClientContext(url_sp, ctx_auth)
web = ctx.web
ctx.load(web)
ctx.execute_query()
print('Authentication sucessfull')
else:
print(ctx_auth.get_last_error())
import io
response = File.open_binary(ctx,folder_url_sp)
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)
data = pd.read_excel(bytes_file_obj,sheet_name = None)
Can it be related to the fact that the Excel file consists of several worksheets?
Can you help me further?
Thanks in advance
Several Sheets should not be a problem. Have you tried specifying an engine in your code like the error message says ?
data = pd.read_excel(bytes_file_obj, sheet_name=None, engine= ... )
Possible options can be found in the documentation of pandas here (scroll down to engine: str, default None). The explanation
If io is not a buffer or path, this must be set to identify io
seem to fit your fit your problem
I am reading an Excel file (.xlsx) with pysmb.
import tempfile
from smb.SMBConnection import SMBConnection
conn = SMBConnection(userID, password, client_machine_name, server_name, use_ntlm_v2 = True)
conn.connect(server_ip, 139)
file_obj = tempfile.TemporaryFile()
file_attributes, filesize = conn.retrieveFile(service_name, test.xlsx, file_obj)
This step works, I am able to transform the file in pandas.DataFrame
import pandas as pd
pd.read_excel(file_obj)
Next, I want to save the file, the file is saved but if I want to open it with Excel, I have an error message "Excel has run into an error"
Here the code to save the file
conn.storeFile(service_name, 'test_save.xlsx', file_obj)
file_obj.close()
How can I save correctly the file and open it with excel ?
Thank you
I tried with a .txt file file and it is working. An error occurs with .xlsx, .xls and .pdf files. I have also tried without extension, same issue, imossible to open the file.
I would like to save the file with .pdf and .xlsx extension, and open it.
Thank you.
I found a solution an I will post it here in case someone face a similar issue.
Excel can be save as a binary stream.
from io import BytesIO
df = pd.read_excel(file_obj)
output = BytesIO()
writer = pd.ExcelWriter(output, engine='xlsxwriter')
df.to_excel(writer, sheet_name='data', index = False)
writer.save()
output.seek(0)
conn.storeFile(service_name, 'test_save.xlsx', output)
I found to have problem with conversion of .xlsx file to .csv using pandas library.
Here is the code:
import pandas as pd
# If pandas is not installed: pip install pandas
class Program:
def __init__(self):
# file = input("Insert file name (without extension): ")
file = "Daty"
self.namexlsx = "D:\\" + file + ".xlsx"
self.namecsv = "D:\\" + file + ".csv"
Program.export(self.namexlsx, self.namecsv)
def export(namexlsx, namecsv):
try:
read_file = pd.read_excel(namexlsx, sheet_name='Sheet1', index_col=0)
read_file.to_csv(namecsv, index=False, sep=',')
print("Conversion to .csv file has been successful.")
except FileNotFoundError:
print("File not found, check file name again.")
print("Conversion to .csv file has failed.")
Program()
After running the code the console shows the ValueError: File is not a recognized excel file error
File i have in that directory is "Daty.xlsx". Tried couple of thigns like looking up to documentation and other examples around internet but most had similar code.
Edit&Update
What i intend afterwards is use the created csv file for conversion to .db file. So in the end the line of import will go .xlsx -> .csv -> .db. The idea of such program came as a training, but i cant get past point described above.
You can use like this-
import pandas as pd
data_xls = pd.read_excel('excelfile.xlsx', 'Sheet1', index_col=None)
data_xls.to_csv('csvfile.csv', encoding='utf-8', index=False)
I checked the xlsx itself, and apparently for some reason it was corrupted with columns in initial file being merged into one column. After opening and correcting the cells in the file everything runs smoothly.
Thank you for your time and apologise for inconvenience.
i have build an app in django to extract data from an mssql server and display the results on a table on a template.
what i want to do now is to export the same sql query results to an excel file. I have used pymssql driver to connect to the db and pysqlalchemy.
This is what i did, but some how excel file wasn't created when the function was call
def download_excel(request):
if "selectdate" in request.POST:
if "selectaccount" in request.POST:
selected_date = request.POST["selectdate"]
selected_acc = request.POST["selectaccount"]
if selected_date==selected_date:
if selected_acc==selected_acc:
convert=datetime.datetime.strptime(selected_date, "%Y-%m-%d").toordinal()
engine=create_engine('mssql+pymssql://username:password#servername /db')
connection = engine.connect()
metadata=MetaData()
fund=Table('gltrxdet',metadata,autoload=True,autoload_with=engine)
rate=Table('gltrx_all',metadata,autoload=True,autoload_with=engine)
stmt=select([fund.columns.account_code,fund.columns.description,fund.columns.nat_balance,fund.columns.rate_type_home,rate.columns.date_applied,rate.columns.date_entered,fund.columns.journal_ctrl_num,rate.columns.journal_ctrl_num])
stmt=stmt.where(and_(rate.columns.journal_ctrl_num==fund.columns.journal_ctrl_num,fund.columns.account_code==selected_acc,rate.columns.date_entered==convert))
df = pd.read_sql(stmt,connection)
writer = pd.ExcelWriter('C:\excel\export.xls')
df.to_excel(writer, sheet_name ='bar')
writer.save()
my code actually worked. I thought it was going to save the excel file to 'C:\excel' folder so i was looking for the file in the folder but i couldn't find the excel file. The excel file was actually exported to my django project folder instead.
How to i allow the end user to be able to download the file to their desktop instead of exporting it to the server itself
I manage to get it to work with much time spend research. This code will export sql query to excel file which will allow end user to download the excel file
import pandas as pd
from django.http import HttpResponse
try:
from io import BytesIO as IO # for modern python
except ImportError:
from StringIO import StringIO as IO # for legacy python
def download_excel(request):
if "selectdate" in request.POST:
if "selectaccount" in request.POST:
selected_date = request.POST["selectdate"]
selected_acc = request.POST["selectaccount"]
if selected_date==selected_date:
if selected_acc==selected_acc:
convert=datetime.datetime.strptime(selected_date, "%Y-%m-%d").toordinal()
engine=create_engine('mssql+pymssql://username:password#servername /db')
metadata=MetaData(connection)
fund=Table('gltrxdet',metadata,autoload=True,autoload_with=engine)
rate=Table('gltrx_all',metadata,autoload=True,autoload_with=engine)
stmt=select([fund.columns.account_code,fund.columns.description,fund.columns.nat_balance,rate.columns.date_applied,fund.columns.journal_ctrl_num,rate.columns.journal_ctrl_num])
stmt=stmt.where(and_(rate.columns.journal_ctrl_num==fund.columns.journal_ctrl_num,fund.columns.account_code==selected_acc,rate.columns.date_applied==convert))
results=connection.execute(stmt)
sio = StringIO()
df = pd.DataFrame(data=list(results), columns=results.keys())
####dowload excel file##########
excel_file = IO()
xlwriter = pd.ExcelWriter(excel_file, engine='xlsxwriter')
df.to_excel(xlwriter, 'sheetname')
xlwriter.save()
xlwriter.close()
excel_file.seek(0)
response = HttpResponse(excel_file.read(), content_type='application/ms-excel vnd.openxmlformats-officedocument.spreadsheetml.sheet')
# set the file name in the Content-Disposition header
response['Content-Disposition'] = 'attachment; filename=myfile.xls'
return response
Whenever I have the file open in Excel and run the code, I get the following error which is surprising because I thought read_excel should be a read only operation and would not require the file to be unlocked?
Traceback (most recent call last):
File "C:\Users\Public\a.py", line 53, in <module>
main()
File "C:\Users\Public\workspace\a.py", line 47, in main
blend = plStream(rootDir);
File "C:\Users\Public\workspace\a.py", line 20, in plStream
df = pd.read_excel(fPath, sheetname="linear strategy", index_col="date", parse_dates=True)
File "C:\Users\Public\Continuum\Anaconda35\lib\site-packages\pandas\io\excel.py", line 163, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\Public\Continuum\Anaconda35\lib\site-packages\pandas\io\excel.py", line 206, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Users\Public\Continuum\Anaconda35\lib\site-packages\xlrd\__init__.py", line 394, in open_workbook
f = open(filename, "rb")
PermissionError: [Errno 13] Permission denied: '<Path to File>'
Generally Excel have a lot of restrictions when opening files (can't open the same file twice, can't open 2 different files with the same name ..etc).
I don't have excel on machine to test, but checking the docs for read_excel I've noticed that it allows you to set the engine.
from the stack trace you posted it seems like the error is thrown by xlrd which is the default engine used by pandas.
try using any of the other ones
Supported engines: “xlrd”, “openpyxl”, “odf”, “pyxlsb”, default “xlrd”.
so try with the rest, like
df = pd.read_excel(fPath, sheetname="linear strategy", index_col="date", parse_dates=True, engine="openpyxl")
I know this is not a real answer, but you might want to submit a bug report to pandas or xlrd teams.
As a workaround I suggest making python create a copy of the original file then read from the copy. After that the code should delete the copied file. It's a bit of extra work but should work.
Example
import shutil
shutil.copy("C://Test//Test.xlsx", "C://Test//koko.xlsx")
I would suggest using the xlwings module instead which allows for greater functionality.
Firstly, you will need to load your workbook using the following line:
If the spreadsheet is in the same folder as your python script:
import xlwings as xw
workbook = xw.Book('myfile.xls')
Alternatively:
workbook = xw.Book('"C:\Users\...\myfile.xls')
Then, you can create your Pandas DataFrame, by specifying the sheet within your spreadsheet and the cell where your dataset begins:
df = workbook.sheets[0].range('A1').options(pd.DataFrame,
header=1,
index=False,
expand='table').value
When specifying a sheet you can either specify a sheet by its name or by its location (i.e. first, second etc.) in the following way:
workbook.sheets[0] or workbook.sheets['sheet_name']
Lastly, you can simply install the xlwings module by using Pip install xlwings
Mostly there is no issues in your code. [ If you publish the code it will be easier.]
You need to change the permissions of the directory you are using so that all users have read and write permissions.
I got this to work by first setting the working directory, then opening the file. Maybe something to do with shared drive permissions and read_excel function.
import os
import pandas as pd
os.chdir("c:\\Users\\...\\")
filepath = "...\\filename.xlsx"
sheetname = 'sheet1'
df_xls = pd.read_excel(filepath, sheet_name=sheetname, engine='openpyxl')
I fix this error simply closing the .xlsx file that was open.
You can set engine = 'xlrd', then you can run the code while Excel has the file open.
df = pd.read_excel(filename, sheetname, engine = 'xlrd')
You may need to pip install xlrd if you don't have it
You may also want to check if the file has a password? Alternatively you can open the file with the password required using the code below:
import sys
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
print "Excel library version:", xlApp.Version
filename, password = <-- enter your own filename and password
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
xlws = xlwb.Sheets([insert number here]) # counts from 1, not from 0
print xlws.Name
print xlws.Cells(1, 1) # that's A1
You can set engine='python' then you can run it even if the file is open
df = pd.read_excel(filename, engine = 'python')