Python: Error with pandas to_excel() function, Permission Error [win32]

I'm reading from an Excel (.xlsx) file using pandas read_excel() and trying to write back to the same file using pandas to_excel(). With small files (20-30 rows) it works fine, but with a larger file (200,000 rows) it raises a permission error:
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\...\\AppData\\Local\\Temp\\1\\openpyxl._fbk93l5'
I'm assuming the reader somehow still holds the file when the program tries to overwrite it, but I'm not sure how to resolve this. I make sure to close the file in Excel before running the program.
Edit: these are my read and write functions:
import pandas as pd
from time import sleep

def readData(excelFilePath):
    print("Reading data...\n")
    data = pd.read_excel(excelFilePath)
    return data

def writeData(data, excelFilePath):
    # Overwrites the same workbook that readData() read from.
    data.to_excel(excelFilePath, index=False)
    print("\nData Updated...\nProgram exiting...")
    sleep(2)
I read the data, manipulate it, then write back to the same file.
Any help is appreciated,
Thanks
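One thing to try, assuming the reader's handle really is what stays open: read through pd.ExcelFile as a context manager so the workbook is closed before to_excel() runs. This is a sketch, not a confirmed fix; the path in the traceback points at an openpyxl temp file rather than the workbook itself, so the cause may lie elsewhere. The names read_then_overwrite and transform are hypothetical.

```python
import pandas as pd


def read_then_overwrite(excel_path, transform):
    """Read a workbook, apply transform, and write the result back.

    Hypothetical helper: the context manager makes pandas release its
    handle on excel_path before the write starts.
    """
    with pd.ExcelFile(excel_path) as xls:
        data = pd.read_excel(xls)
    # The reader is closed here, so this process no longer holds the file.
    result = transform(data)
    result.to_excel(excel_path, index=False)
    return result
```

If the error persists with this pattern, the open reader is probably not the cause, and trying a different writer engine would be a reasonable next experiment.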

Related

Pandas exported excel file reads as Zip file format

I have a Python script that uses pandas to perform some aggregation on a huge DataFrame and then tries to export the result as an Excel "xlsx" file.
This is the last step in the process:
print("Exporting to Excel...")
sum_df = sum_df.set_index('products_code')
with pd.ExcelWriter(OUTPUT_FILE, engine='openpyxl') as writer:
    sum_df.to_excel(writer, sheet_name="stocks")
print("Done!")
The file exports normally, but whenever I try to upload it to the server, the server rejects it and reads it as a zip file instead of an xlsx file. I found a quick fix: opening the file in Microsoft Excel and just hitting save and exit seems to fix the issue. But I don't know the reason for this behavior, and I'm looking for a way to save it as a valid Excel file directly from the script.
Any ideas?
As discussed in the comments, it seems using xlsxwriter as the engine has solved this issue. E.g.:
print("Exporting to Excel...")
sum_df = sum_df.set_index('products_code')
with pd.ExcelWriter(OUTPUT_FILE, engine='xlsxwriter') as writer:
    sum_df.to_excel(writer, sheet_name="stocks")
print("Done!")
It would be good to know what software was used on the server, if possible, in case other people encounter this issue.
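For background, an .xlsx file is a ZIP archive of XML parts, so a server that sniffs file content may well report it as a zip. Below is a minimal sketch for inspecting what an exported file contains, using only the standard library. The helpers looks_like_xlsx and list_parts are hypothetical, and the two part names checked are the ones a typical workbook package carries. It does not explain why one engine's output was rejected and the other's accepted.

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if path is a ZIP archive holding the core workbook parts."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
    # Parts a typical workbook package is expected to carry.
    return {"[Content_Types].xml", "xl/workbook.xml"} <= names


def list_parts(path):
    """List archive members in stored order, for comparing two exports."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Comparing list_parts() for the openpyxl export and for the copy re-saved by Excel may show what differs between the two files.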

Python - trying to import/open incorrectly formatted .xls file

I'm trying to write some Python code which needs to take data from an .xls file created by another application (outside of my control). I've tried using pandas and xlrd and neither is able to open the file. I get the error messages:
"Excel file format cannot be determined, you must specify an engine manually." using Pandas.
"Unsupported format, or corrupt file: Expected BOF record; found b'\r\n\t'" using xlrd
I think it has to do with the way the file is exported from the program that creates it. When opened directly through Excel, I get the error message "The file format and extension don't match". However, you can ignore this message and the file opens in a usable format and can be edited and all of the expected values are in the right cells etc. Interestingly, when I go to save the file in Excel, the default option that comes up is a webpage.
Currently I have a workaround: I can open the file in Excel, save it as a .csv, then read it into Python as a csv. This does have to be done through Excel though; if I just change the file extension to .csv, the resulting file is garbage.
However, ideally I would like to avoid the user having to do anything manually. I would greatly appreciate any suggestions of ways this might be possible (i.e. can I 'open' the file in Excel and save it through Excel using Python commands?) or any packages or commands I can use to open/fix badly formatted .xls files.
Cheers!
P.S. I'm pretty new to Python and only have experience in R otherwise so my current knowledge is quite limited, apologies in advance!
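Try this:
from pathlib import Path
import pandas as pd

# filename is the path to the problem file (left undefined in the answer).
file_path = Path(filename)
# Pass the path itself; the original called read() on an undefined name.
df = pd.read_excel(file_path, engine='openpyxl')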
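The xlrd message shows that the file starts with b'\r\n\t', i.e. whitespace rather than a binary header, and Excel offers to save it as a web page. Both hint that the file may be HTML or another text format carrying an .xls extension. Here is a minimal sketch for checking the leading bytes, assuming nothing about the exporting program; sniff_format is a hypothetical name.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls container
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx and other ZIP-based files


def sniff_format(path):
    """Guess the real format of a file from its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(512)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Skip a UTF-8 byte order mark and leading whitespace before testing.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n")
    if text.startswith(b"<"):
        # Markup of some kind: an HTML table or an XML spreadsheet.
        return "markup"
    return "unknown"
```

If this returns "markup" and the content is an HTML table, pandas.read_html(path) may be able to parse it (it needs an HTML parser such as lxml installed). A tab-separated text file would instead call for pandas.read_csv(path, sep='\t'). Which one applies depends on what the file actually contains.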

Pandas ExcelWriter creates corrupt file even in try except statement

I have an Excel file with multiple sheets, one of which contains multiple data tables. I wrote a function to update the data tables in this file while keeping the rest of the sheets intact. When saving, you can overwrite the existing file or save it as a new file.
Here is the relevant code:
try:
    wb = op.load_workbook(master_file)
    writer = pd.ExcelWriter(save_name, engine='openpyxl')
    writer.book = wb
    writer.sheets = dict((ws.title, ws) for ws in wb.worksheets)
    table1.to_excel(writer, sheet_name="New Sheet")
    # Format and print more data tables....
    wb.remove(wb["Data Sheet"])
    wb["New Sheet"].title = "Data Sheet"
except Exception:
    return
writer.close()
My code works fine when there are no errors. However, if something goes wrong and it throws an error, it produces a corrupted Excel file that is 0 KB and can't be opened.
I want to prevent this because if you try to overwrite an existing file and an error is thrown, all of the information in the file would be lost. This would be a huge issue.
However, even inside a try block that catches all exceptions, ExcelWriter still creates a corrupted Excel file with the save name.
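The empty file suggests the target is created before any data is written, so catching the exception afterwards comes too late to protect it. One common way around this is to write to a temporary file in the same directory and swap it into place only after the write succeeds. Here is a sketch using only the standard library; atomic_write and write_fn are hypothetical names, and write_fn stands for whatever code builds the workbook at the path it is given.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(target, write_fn):
    """Run write_fn(tmp_path); replace target only if it returns normally."""
    target = Path(target)
    # Same directory keeps os.replace on one filesystem; same suffix lets
    # pandas still pick its engine from the file extension.
    fd, tmp_name = tempfile.mkstemp(suffix=target.suffix, dir=str(target.parent))
    os.close(fd)
    try:
        write_fn(tmp_name)
        os.replace(tmp_name, target)
    except BaseException:
        # Any failure leaves the original target untouched.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

With the code above, the body of the try block would move into a function that uses the temporary path as its save name, e.g. atomic_write(save_name, build_workbook). Here build_workbook is a hypothetical function that creates the ExcelWriter on the path it receives and closes it before returning.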

pandas read_csv producing "IOError: Initializing from file failed" error. How do I fix this?

I have not been able to get pandas read_csv function to work on Mac. It does not matter how the file path is configured, or what csv file I am trying to use (I have used multiple files). It always results in IOError: Initializing from file failed. I have no clue what to try. This is the format I am using.
df = pd.read_csv(r'/Users/me/Documents/file.csv')
I have tried changing the working directory and using df = pd.read_csv('file.csv'), but this produces the same error.
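This error appears to come from pandas failing to open a file that does exist, so it can help to check the path with the standard library before calling read_csv. Below is a minimal diagnostic sketch; diagnose_path is a hypothetical helper, and the checks are guesses at common causes, not a confirmed explanation for this case.

```python
import os
from pathlib import Path


def diagnose_path(path):
    """Return a list of problems found with path; empty means none found."""
    p = Path(path).expanduser()
    problems = []
    if not p.exists():
        problems.append("path does not exist")
    elif not p.is_file():
        problems.append("path is not a regular file")
    elif not os.access(p, os.R_OK):
        problems.append("file is not readable by the current user")
    if not str(p).isascii():
        problems.append("path contains non-ASCII characters")
    return problems
```

If the list comes back empty, passing an open file object, as in pd.read_csv(open(path)), or adding engine='python' are workarounds sometimes suggested for this error.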

Open a read-only Excel file using Python

I have a program (zTree) that is writing an Excel file and updating it constantly. What I need this Python program to do is read in the data from the Excel file as it's updating. The problem is that when I try to read in the data using xlrd, I get the error:
peek = f.read(peeksz)
IO Error: [Errno 13] Permission denied
which comes up because Excel is in read-only mode. Is there any way to read in the data of an Excel file in read-only mode using Python?
Just tested it on Windows 7 (64-bit), and in this case it works:
import xlrd
workbook = xlrd.open_workbook('C:/User/myaccount/Book1.xls')
worksheet = workbook.sheet_by_name('Sheet1')
print(worksheet)
Could it be that you are trying to copy it first, or that your Python is trying to put a temporary copy of the file in the py-directory? That would give the IOError.
