pd.read_excel throws PermissionError if file is open in Excel - python

Whenever I have the file open in Excel and run the code, I get the following error. This is surprising, because I thought read_excel was a read-only operation that would not require the file to be unlocked.
Traceback (most recent call last):
File "C:\Users\Public\a.py", line 53, in <module>
main()
File "C:\Users\Public\workspace\a.py", line 47, in main
blend = plStream(rootDir);
File "C:\Users\Public\workspace\a.py", line 20, in plStream
df = pd.read_excel(fPath, sheetname="linear strategy", index_col="date", parse_dates=True)
File "C:\Users\Public\Continuum\Anaconda35\lib\site-packages\pandas\io\excel.py", line 163, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\Public\Continuum\Anaconda35\lib\site-packages\pandas\io\excel.py", line 206, in __init__
self.book = xlrd.open_workbook(io)
File "C:\Users\Public\Continuum\Anaconda35\lib\site-packages\xlrd\__init__.py", line 394, in open_workbook
f = open(filename, "rb")
PermissionError: [Errno 13] Permission denied: '<Path to File>'

Generally, Excel has a lot of restrictions when opening files (you can't open the same file twice, can't open two different files with the same name, etc.).
I don't have Excel on this machine to test, but checking the docs for read_excel I noticed that it allows you to set the engine.
From the stack trace you posted, it seems the error is thrown by xlrd, which is the default engine used by pandas, so try one of the other ones.
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb", default "xlrd".
For example:
df = pd.read_excel(fPath, sheet_name="linear strategy", index_col="date", parse_dates=True, engine="openpyxl")
(Newer pandas versions spell the keyword sheet_name; the old sheetname keyword was removed.)
I know this is not a real answer, but you might want to submit a bug report to the pandas or xlrd teams.
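To try the other engines systematically, here is a minimal sketch that calls pd.read_excel with each engine name in turn and returns the first result that loads. The helper name read_excel_trying_engines is hypothetical, and each engine needs its own package installed. One caveat from the traceback itself: the failure happens at the plain open(filename, "rb") call, and every engine has to open the file somehow, so if the PermissionError comes from the operating system, switching engines may not help.

```python
import pandas as pd


def read_excel_trying_engines(path, engines=("openpyxl", "xlrd", "odf", "pyxlsb"), **kwargs):
    """Try each read_excel engine in turn and return the first DataFrame that loads.

    Hypothetical helper. Any exception (unknown engine name, engine package not
    installed, unsupported file format, OS errors such as PermissionError) makes
    it move on to the next engine.
    """
    errors = {}
    for engine in engines:
        try:
            return pd.read_excel(path, engine=engine, **kwargs)
        except Exception as exc:  # broad on purpose: each engine fails differently
            errors[engine] = exc
    raise RuntimeError(f"No engine could read {path!r}: {errors!r}")
```

Usage with the call from the question: df = read_excel_trying_engines(fPath, sheet_name="linear strategy", index_col="date", parse_dates=True)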

As a workaround, I suggest having Python create a copy of the original file and then read from the copy. Afterwards, the code should delete the copied file. It's a bit of extra work, but it should work.
Example
import shutil
shutil.copy(r"C:\Test\Test.xlsx", r"C:\Test\koko.xlsx")
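A minimal sketch of the full copy, read, delete workaround, assuming the copy itself is allowed while Excel has the file open. It copies the workbook to a temporary file (keeping the extension so pandas can infer the format), reads the copy, and deletes it even if reading fails. read_excel_via_copy is a hypothetical helper name.

```python
import os
import shutil
import tempfile

import pandas as pd


def read_excel_via_copy(path, **kwargs):
    """Copy the workbook to a temporary file, read the copy, then delete it."""
    suffix = os.path.splitext(path)[1]  # keep .xlsx/.xls so pandas picks the right reader
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the name is needed; shutil.copy reopens the file itself
    try:
        shutil.copy(path, tmp_path)
        return pd.read_excel(tmp_path, **kwargs)
    finally:
        os.remove(tmp_path)  # clean up the copy whether or not reading succeeded
```

Usage: df = read_excel_via_copy(fPath, sheet_name="linear strategy", index_col="date", parse_dates=True)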

I would suggest using the xlwings module instead, which allows for greater functionality.
First, you will need to load your workbook.
If the spreadsheet is in the same folder as your Python script:
import xlwings as xw
workbook = xw.Book('myfile.xls')
Alternatively, using the full path:
workbook = xw.Book(r'C:\Users\...\myfile.xls')
Then, you can create your Pandas DataFrame, by specifying the sheet within your spreadsheet and the cell where your dataset begins:
import pandas as pd
df = workbook.sheets[0].range('A1').options(pd.DataFrame,
                                            header=1,
                                            index=False,
                                            expand='table').value
When specifying a sheet you can either specify a sheet by its name or by its location (i.e. first, second etc.) in the following way:
workbook.sheets[0] or workbook.sheets['sheet_name']
Lastly, you can install the xlwings module with pip install xlwings

There are mostly no issues in your code (if you post the full code, it will be easier to tell).
You need to change the permissions of the directory you are using so that all users have read and write permissions.

I got this to work by first setting the working directory and then opening the file. Maybe it has something to do with shared drive permissions and the read_excel function.
import os
import pandas as pd
os.chdir("c:\\Users\\...\\")
filepath = "...\\filename.xlsx"
sheetname = 'sheet1'
df_xls = pd.read_excel(filepath, sheet_name=sheetname, engine='openpyxl')
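If the underlying problem was a relative path resolving against the wrong working directory, an alternative to os.chdir() (which changes the directory for the whole process) is to build the full path explicitly and check that it exists. A sketch under that assumption; read_sheet and its default sheet name are hypothetical.

```python
from pathlib import Path

import pandas as pd


def read_sheet(base_dir, relative_name, sheet_name="sheet1"):
    """Read one sheet, resolving the path explicitly instead of calling os.chdir()."""
    path = (Path(base_dir).expanduser() / relative_name).resolve()
    if not path.is_file():
        # Fail with the fully resolved path, which makes wrong-directory mistakes obvious
        raise FileNotFoundError(f"Workbook not found: {path}")
    return pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl")
```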

I fixed this error simply by closing the .xlsx file that was open.

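You can set engine='xlrd'; then you can run the code while Excel has the file open.
df = pd.read_excel(filename, sheetname, engine='xlrd')
You may need to pip install xlrd if you don't have it. Note that xlrd 2.0 and later only read legacy .xls files; reading .xlsx this way requires an older xlrd release.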
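Because current xlrd only reads .xls, the engine is best chosen from the file extension. A small sketch of that mapping; the table reflects which formats each pandas engine handles, and pick_engine is a hypothetical helper.

```python
from pathlib import Path

# File extension -> pandas read_excel engine (xlrd >= 2.0 handles only legacy .xls)
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xlsb": "pyxlsb",
    ".ods": "odf",
}


def pick_engine(filename):
    """Return the engine name for a workbook path, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No engine mapped for {suffix!r} files") from None
```

Usage: df = pd.read_excel(filename, sheetname, engine=pick_engine(filename))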

You may also want to check whether the file has a password. If it does, you can open it with the required password using the code below (Windows only, needs pywin32):
import win32com.client
xlApp = win32com.client.Dispatch("Excel.Application")
print("Excel library version:", xlApp.Version)
filename, password = ..., ...  # <-- enter your own filename and password
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
sheet_number = ...  # <-- insert the sheet number here; counts from 1, not from 0
xlws = xlwb.Sheets(sheet_number)
print(xlws.Name)
print(xlws.Cells(1, 1).Value)  # that's A1

You can set engine='python' then you can run it even if the file is open
df = pd.read_excel(filename, engine = 'python')

Related

Error open file after saving it with storeFile of pysmb

I am reading an Excel file (.xlsx) with pysmb.
import tempfile
from smb.SMBConnection import SMBConnection
conn = SMBConnection(userID, password, client_machine_name, server_name, use_ntlm_v2 = True)
conn.connect(server_ip, 139)
file_obj = tempfile.TemporaryFile()
file_attributes, filesize = conn.retrieveFile(service_name, 'test.xlsx', file_obj)
This step works, I am able to transform the file in pandas.DataFrame
import pandas as pd
pd.read_excel(file_obj)
Next, I want to save the file. The file is saved, but when I try to open it with Excel I get the error message "Excel has run into an error".
Here the code to save the file
conn.storeFile(service_name, 'test_save.xlsx', file_obj)
file_obj.close()
How can I save the file correctly and open it with Excel?
Thank you
I tried with a .txt file and it works. The error occurs with .xlsx, .xls and .pdf files. I have also tried without an extension: same issue, impossible to open the file.
I would like to save the file with .pdf and .xlsx extensions and open it.
Thank you.
I found a solution and I will post it here in case someone faces a similar issue.
An Excel file can be saved to a binary stream.
from io import BytesIO
df = pd.read_excel(file_obj)
output = BytesIO()
# the with block saves and closes the writer (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='data', index=False)
output.seek(0)
conn.storeFile(service_name, 'test_save.xlsx', output)
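A likely reason the first attempt produced an unreadable file: after retrieveFile() writes into file_obj, the file position sits at the end of the data, and a consumer that reads from the current position (as storeFile() presumably does) gets nothing or only a tail. The output.seek(0) before storeFile() in the answer handles exactly this. The stdlib-only sketch below demonstrates the effect; PAYLOAD is made-up stand-in data, not a real workbook, and the function name is hypothetical.

```python
import tempfile

# Stand-in bytes for the downloaded workbook (hypothetical, not a valid .xlsx)
PAYLOAD = b"PK\x03\x04 stand-in for the downloaded workbook bytes"


def bytes_a_reader_would_see(rewind):
    """Write PAYLOAD the way retrieveFile() fills file_obj, then read it back."""
    with tempfile.TemporaryFile() as file_obj:
        file_obj.write(PAYLOAD)  # the position is now at the end of the data
        if rewind:
            file_obj.seek(0)  # go back to the start before handing the object on
        return file_obj.read()  # a consumer reading "from here" gets exactly this


print(bytes_a_reader_would_see(rewind=False))  # b''
print(bytes_a_reader_would_see(rewind=True))   # the full payload
```

Under that assumption, calling file_obj.seek(0) right before conn.storeFile(...) may be enough to upload the original bytes unchanged, without re-encoding them through pandas.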

python xlwings I want to open a file as read only using the caller when opening the file normally has a dialog option for read only or password

When I open a file in my code, that file is read-only protected and requires a password in order to write. I only want to read it, so I want xlwings to open the file and get past the read-only dialog as if I had clicked "Read Only".
import xlwings as xw
import pandas as pd
def main():
    wb = xw.Book.caller()
    ws = wb.sheets["Engine"]
    wbDiners = xw.Book(ws["Diners"].value)
    dfRDiners = wbDiners.sheets["Invoice-Retail-Email"]["C1:H1000"].options(pd.DataFrame, index=False, header=False).value
xw.Book has a read_only parameter. Set it to True to get rid of the dialog:
wbDiners = xw.Book(ws["Diners"].value, read_only=True)
See also: https://docs.xlwings.org/en/stable/api.html#xlwings.Book

Save an excel file from a dataframe pandas to sharepoint (office365 API)

I have this dataframe, and I want to save it as an Excel file in a SharePoint folder.
This is my code:
from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext
# auth
client_credentials = ClientCredential(var_client_id, var_client_secret)
ctx = ClientContext(var_sp_site).with_credentials(client_credentials)
df = pd.DataFrame(sql_table)
var_relative_url = "sharepoint_path/sharepoint_path"
target_folder = ctx.web.get_folder_by_server_relative_url(var_relative_url)
target_folder.upload_file(content=df.to_excel(excel_writer='teste.xlsx'), file_name='teste.xlsx').execute_query() # Here is my problem
When I execute this code, the Excel file is created in the folder, but when I try to open it in the SharePoint interface it raises an error ("cannot be opened").
This code will run on a cloud function, so I can't use local files to upload.
I'm investigating this issue right now. It's not solved yet, but I can give you a workaround: use .save()
wb = pd.ExcelWriter(outputFile, mode='w', engine="openpyxl")
myDataFrame.to_excel(wb, sheet_name='sheet1', index=False)
wb.save()  # pandas 2.0 removed ExcelWriter.save(); use wb.close() there
From error to warning ;)
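The original call passes content=df.to_excel(...), but DataFrame.to_excel writes to its target and returns None, so nothing useful reaches upload_file. A sketch that serializes the DataFrame to .xlsx bytes in memory instead (no local file, which suits a cloud function); it uses openpyxl as the engine, and dataframe_to_xlsx_bytes is a hypothetical name.

```python
from io import BytesIO

import pandas as pd


def dataframe_to_xlsx_bytes(df, sheet_name="sheet1"):
    """Serialize a DataFrame to the bytes of an .xlsx workbook, entirely in memory."""
    buffer = BytesIO()
    with pd.ExcelWriter(buffer, engine="openpyxl") as writer:  # closing the writer finalizes the file
        df.to_excel(writer, sheet_name=sheet_name, index=False)
    return buffer.getvalue()
```

Then, assuming upload_file accepts bytes for content (as I understand the Office365 client does), the upload line from the question would become target_folder.upload_file(content=dataframe_to_xlsx_bytes(df), file_name='teste.xlsx').execute_query()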

Pandas, Python - Problem with converting xlsx to csv

I have a problem converting an .xlsx file to .csv using the pandas library.
Here is the code:
import pandas as pd
# If pandas is not installed: pip install pandas
class Program:
    def __init__(self):
        # file = input("Insert file name (without extension): ")
        file = "Daty"
        self.namexlsx = "D:\\" + file + ".xlsx"
        self.namecsv = "D:\\" + file + ".csv"
        Program.export(self.namexlsx, self.namecsv)

    @staticmethod
    def export(namexlsx, namecsv):
        try:
            # note: index_col=0 moves the first column into the index,
            # and to_csv(index=False) below then drops it from the CSV
            read_file = pd.read_excel(namexlsx, sheet_name='Sheet1', index_col=0)
            read_file.to_csv(namecsv, index=False, sep=',')
            print("Conversion to .csv file has been successful.")
        except FileNotFoundError:
            print("File not found, check file name again.")
            print("Conversion to .csv file has failed.")

Program()
After running the code, the console shows the error ValueError: File is not a recognized excel file
The file I have in that directory is "Daty.xlsx". I tried a couple of things, like looking at the documentation and other examples on the internet, but most had similar code.
Edit & Update
What I intend to do afterwards is use the created CSV file for conversion to a .db file, so in the end the import will go .xlsx -> .csv -> .db. The idea for the program came as a training exercise, but I can't get past the point described above.
You can do it like this:
import pandas as pd
data_xls = pd.read_excel('excelfile.xlsx', 'Sheet1', index_col=None)
data_xls.to_csv('csvfile.csv', encoding='utf-8', index=False)
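Since the plan in the question is .xlsx -> .csv -> .db, here is a sketch of the whole chain with pandas and the standard sqlite3 module. It reads with index_col=None as in the answer above; with index_col=0 plus to_csv(index=False), as in the question's code, the first column would silently disappear. The function name and the table name data are hypothetical.

```python
import sqlite3

import pandas as pd


def xlsx_to_csv_to_db(xlsx_path, csv_path, db_path, table="data", sheet_name="Sheet1"):
    """Convert one sheet to CSV, then load that CSV into a SQLite table.

    Returns the number of rows written to the table.
    """
    # index_col=None keeps every column as data (nothing is moved into the index)
    df = pd.read_excel(xlsx_path, sheet_name=sheet_name, index_col=None)
    df.to_csv(csv_path, index=False, encoding="utf-8")

    csv_df = pd.read_csv(csv_path, encoding="utf-8")
    conn = sqlite3.connect(db_path)
    try:
        csv_df.to_sql(table, conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()  # close explicitly so the .db file is not left locked
    return len(csv_df)
```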
I checked the xlsx file itself, and for some reason it was corrupted: the columns in the original file had been merged into one column. After opening the file and correcting the cells, everything runs smoothly.
Thank you for your time, and apologies for the inconvenience.

Unprotect an Excel file programmatically

We're getting an Excel file from a client that has open protection and Write Reserve protection turned on. I want to remove the protection so I can open the Excel file with the python xlrd module. I've installed the pywin32 package to access the Excel file through COM, and I can open it with my program supplying the two passwords, save, and close the file with no errors. I'm using Unprotect commands as described in MSDN network, and they're not failing, but they're also not removing the protection. The saved file still requires two passwords to open it after my program is done. Here's what I have so far:
import os, sys
impdir = "\\\\xxx.x.xx.x\\allshare\\IT\\NewBusiness\\Python_Dev\\import\\"
sys.path.append(impdir)
from UsefulFunctions import *
import win32com.client
wkgdir = pjoin(nbShare, 'NorthLake\\_testing')
filename = getFilename(wkgdir, '*Collections*.xls*')
xcl = win32com.client.Dispatch('Excel.Application')
xcl.visible = True
pw_str = input("Enter password: ")
try:
    wb = xcl.workbooks.open(filename, 0, False, None, pw_str, pw_str)
except Exception as e:
    print("Error:", str(e))
    sys.exit()
wb.Unprotect(pw_str)
wb.UnprotectSharing(pw_str)
wb.Save()
xcl.Quit()
Can anyone provide me the correct syntax for unprotect commands that will work?
This function works for me
import win32com.client

def Remove_password_xlsx(filename, pw_str):
    xcl = win32com.client.Dispatch("Excel.Application")
    wb = xcl.Workbooks.Open(filename, False, False, None, pw_str)
    xcl.DisplayAlerts = False
    # SaveAs with empty Password and WriteResPassword strings removes both passwords
    wb.SaveAs(filename, None, '', '')
    xcl.Quit()
This post helped me a lot, so I thought I would post what I used for my solution in case it helps someone else: just Unprotect, DisplayAlerts = False, and Save. That made it easy for me, and the file is overwritten with a usable unprotected file.
import os, sys
import win32com.client
def unprotect_xlsx(filename):
    xcl = win32com.client.Dispatch('Excel.Application')
    pw_str = '12345'
    wb = xcl.workbooks.open(filename)
    wb.Unprotect(pw_str)
    wb.UnprotectSharing(pw_str)
    xcl.DisplayAlerts = False
    wb.Save()
    xcl.Quit()

if __name__ == '__main__':
    filename = 'test.xlsx'
    unprotect_xlsx(filename)
You can unprotect Excel worksheets with the Python openpyxl module without knowing the password:
from openpyxl import load_workbook
sample = load_workbook(filename="sample.xlsx")
for sheet in sample: sheet.protection.disable()
sample.save(filename="sample.xlsx")
sample.close()
where the filename argument is the path of your Excel file; here I have used a path in the local directory.
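A variant of the openpyxl approach that also reports which worksheets were actually protected. Note this only covers worksheet protection (the lock on editing cells); as far as I know, openpyxl cannot load a workbook that needs a password just to open, which is the situation in the question. unprotect_sheets is a hypothetical name.

```python
from openpyxl import load_workbook


def unprotect_sheets(path):
    """Turn off worksheet protection on every sheet; return the names that were protected."""
    wb = load_workbook(filename=path)
    changed = []
    for sheet in wb:  # iterating a Workbook yields its worksheets
        if sheet.protection.sheet:
            sheet.protection.disable()
            changed.append(sheet.title)
    wb.save(filename=path)  # overwrites the original file
    wb.close()
    return changed
```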
If you are on macOS (or maybe Linux? not tested):
You have to install Microsoft Excel and xlwings
pip install xlwings
Then run this:
import pandas as pd
import xlwings as xw
def _process(filename):
    wb = xw.Book(filename)
    sheet = wb.sheets[0]
    df = sheet.used_range.options(pd.DataFrame, index=False, header=True).value
    wb.close()
    return df
Resources:
Adapted from this script:
https://davidhamann.de/2018/02/21/read-password-protected-excel-files-into-pandas-dataframe/
xlwings documentation: https://docs.xlwings.org/en/stable/api.html
The suggestion from #Tim Williams worked. (Use SaveAs and pass empty strings for the Password and WriteResPassword parameters.) I used 'None' for the 'format' parameter after filename, and I used a new filename to keep Excel from prompting me asking if OK to overwrite the existing file. I also found that I did not need the wb.Unprotect and wb.UnprotectSharing calls using this approach.
Hey I tried the solution provided by #Enoch Sit
def Remove_password_xlsx(filename, pw_str):
    xcl = win32com.client.Dispatch("Excel.Application")
    wb = xcl.Workbooks.Open(filename, False, False, None, pw_str)
    xcl.DisplayAlerts = False
    wb.SaveAs(filename, None, '', '')
    xcl.Quit()
but got the error NameError: name 'pw_str' is not defined
:'(
