Python does not read the Excel Sheet


I am trying to read an Excel sheet and create a pandas data frame from it, but it keeps saying that the sheet does not exist, even though it actually exists. Has anyone faced something similar?
This is the code I used:
import pandas as pd
excel = pd.ExcelFile("Berlin_Club_List.xlsx")
clubs = pd.read_excel(excel, 'Berlin_Club_List')

Is the file in your working directory?
If it is not, you need to provide the full path.
(Would be a comment but not enough rep)

To check your actual working directory, you can do the following:
import os
print(os.getcwd())
Please show the actual error message you got. This will help others understand and debug the issue you are facing.
Definitely check the documentation for pd.read_excel. In IPython or Jupyter you can type
pd.read_excel?
(or call help(pd.read_excel) in a plain Python interpreter). You will see that you can specify the sheet you want with
pd.read_excel(excel, sheet_name='sheetname')
and so on.
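A minimal sketch combining both suggestions, assuming pandas with openpyxl installed. The helper name read_sheet_checked is hypothetical, not a pandas function. It checks that the file exists relative to the working directory and compares the requested name against ExcelFile.sheet_names before reading, so a typo, a case difference or a stray space shows up directly in the error message.

```python
import os

import pandas as pd


def read_sheet_checked(path, sheet):
    """Hypothetical helper: read one sheet with clearer errors for a missing file or sheet."""
    if not os.path.isfile(path):
        # Relative paths are resolved against the current working directory.
        raise FileNotFoundError(f"{path!r} not found (working directory: {os.getcwd()})")
    with pd.ExcelFile(path) as xl:
        names = xl.sheet_names  # sheet names exactly as stored in the workbook
        if sheet not in names:
            # Matching is exact: letter case and stray spaces both count.
            raise ValueError(f"sheet {sheet!r} not found; available: {names}")
        return pd.read_excel(xl, sheet_name=sheet)


# Usage (assumes the file from the question is in the working directory):
# clubs = read_sheet_checked("Berlin_Club_List.xlsx", "Berlin_Club_List")
```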

Related

Error when using Writer.Close() function within my Pandas and Openpyxl code

I have written code that combines some CSV files into a single Excel file, and I end the 'writer' with:
writer.save()
writer.close()
However, I get the following error when trying to then open that file after the code has finalised:
We found a problem with some content in 'the file.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
This seems to be purely related to the writer.close() call, because without it I don't get the error. However, I then cannot open the file, as Excel states that someone else is using it (i.e. openpyxl).
I'm not sure if it's relevant, but my file system runs on a cloud-based OneDrive system.
My current plan beyond the 'writer.close()' is to pause the script to allow me to print the excel to PDF (I found this to be unreliable via Python), and then 'hit continue' to continue with exporting the PDF via Email.
Any ideas on how to resolve this error?

Without seeing more of your code, and ideally an example of the data you are writing, it's tough to make any assumptions. Based on the error you are seeing, the issue is most likely caused by the inputs/data going into the xlsx file rather than by the 'writer' itself. This is Excel saying that data in your file is 'corrupted' by its standards and needs to be fixed.
You should be able to 'recover' the file in Excel, which will identify the problem spots in your file; you can then trace those back into your Python program and address them properly to eliminate the problem.
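A minimal sketch of the combine-CSVs step, assuming pandas with openpyxl. It uses pd.ExcelWriter as a context manager, which saves and closes the file once when the with-block ends, so no separate save()/close() calls are needed (ExcelWriter.save() was deprecated in pandas 1.5 and removed in 2.0). The function name combine_csvs and the one-sheet-per-CSV layout are assumptions for illustration.

```python
from pathlib import Path

import pandas as pd


def combine_csvs(csv_paths, out_path):
    """Hypothetical helper: write each CSV to its own sheet of a single workbook."""
    # Leaving the with-block saves and closes the workbook exactly once.
    with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
        for csv_path in csv_paths:
            df = pd.read_csv(csv_path)
            # Excel limits sheet names to 31 characters; stems must also be unique.
            sheet = Path(csv_path).stem[:31]
            df.to_excel(writer, sheet_name=sheet, index=False)
    return out_path
```

Because the file is fully closed when the function returns, it should be safe to open it in Excel afterwards; whether OneDrive sync also holds a lock is a separate question.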

Reading excel file with pylightxl

My organization needs me to use the pylightxl library to read some bulky Excel xlsx files. I have never used this library before and I'm getting a strange error in PyCharm that I simply do not understand.
I've tried googling, but there isn't much support for pylightxl on the web. Does anyone know how to help?

For completeness of this post: it looks like there was a bug in the early days of pylightxl for files that were converted from xls to xlsx. This issue has been resolved with #31 in version 1.52+.

For those who might run into the same issue in the future, here's how I solved it, or rather worked around it.
The Excel files I had been given were originally in xls format, which pylightxl does not support.
I converted them to xlsx by clicking "Save As" in Excel and then tried to read them with pylightxl, which is when I got the strange error above. It must be something to do with the format.
So I ended up saving them as CSV instead, and reading those was successful.
So if anyone runs into this error, try a different format for the document you're trying to read.
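Before switching formats, it can help to confirm what a file really is, since a legacy .xls that was only renamed to .xlsx keeps its old binary container. A minimal stdlib sketch, assuming only that legacy .xls files start with the Compound File Binary signature D0 CF 11 E0 A1 B1 1A E1 and that .xlsx files are ZIP packages containing [Content_Types].xml. The helper name sniff_excel_format is hypothetical, and this does not diagnose the pylightxl bug itself.

```python
import zipfile

# First 8 bytes of a Compound File Binary (legacy .xls) file.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_excel_format(path):
    """Hypothetical helper: guess the real container format, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head == OLE2_SIGNATURE:
        return "xls"
    if zipfile.is_zipfile(path):
        # Office Open XML packages (.xlsx/.xlsm) always contain this part.
        with zipfile.ZipFile(path) as zf:
            if "[Content_Types].xml" in zf.namelist():
                return "xlsx"
        return "zip"
    return "unknown"  # e.g. CSV or other plain text
```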

Python - Openpyxl - "UserWarning: Unknown extension" issue

I am trying to learn Python (day 2) and am hoping to practice with Excel books first as this is where I am comfortable/fluent.
Right off the bat I am getting an error that I don't quite understand when running the code below:
import openpyxl
wb = openpyxl.load_workbook("/Users/Scott/Desktop/Workbook1.xlsx")
print(wb.sheetnames)
This does print my sheet names as requested, but it is followed by:
/Users/Scott/PycharmProjects/Excel/venv/lib/python3.7/site-packages/openpyxl/worksheet/_reader.py:293: UserWarning: Unknown extension is not supported and will be removed
warn(msg)
I have found other questions that point to slicers/conditional formatting etc, but that does not apply here. This is a book I just made and only added 3 sheets before saving. It has no data, no formatting, and the extension is valid. I have no add-ons installed on my excel either.
Any idea why I am getting this warning? How do I resolve it?
Python: 3.7
openpyxl: 2.6

I had a similar issue. I developed an application that reads and writes Excel files. It worked well on my Windows computer, but when I tried to run it on a friend's Mac, it showed the same warning. I could "fix" it by changing how the workbook is loaded, like this:
import openpyxl as op
wb = op.load_workbook(file, read_only=True, data_only=True)
But, as you can see, with this configuration you can only read Excel files. In the end, I realized that my friend didn't have Microsoft Office installed on his computer. Installing it truly solved my problem.

This question was from a couple years ago but I'm encountering it now with openpyxl and require a fix, as the warning is confounding and misleading to my end users.
The warning from openpyxl comes via the stdlib warnings library, which can be suppressed.
import warnings
warnings.simplefilter("ignore")
That's the "hit it with a hammer" approach. More granular levels of warnings suppression can be found here: https://docs.python.org/3/library/warnings.html
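A more granular sketch: the context manager below (a hypothetical helper, not part of openpyxl) ignores only UserWarnings whose message starts with "Unknown extension" and that are raised from openpyxl modules, and only inside the with-block. The message and module arguments of warnings.filterwarnings are regular expressions matched against the start of the warning text and of the module name.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def suppress_unknown_extension_warning():
    """Hypothetical helper: silence only openpyxl's 'Unknown extension' UserWarning."""
    # catch_warnings restores the previous filter list on exit.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="Unknown extension",
            category=UserWarning,
            module="openpyxl",
        )
        yield


# Usage (assumes openpyxl is installed):
# with suppress_unknown_extension_warning():
#     wb = openpyxl.load_workbook("/Users/Scott/Desktop/Workbook1.xlsx")
```

Other warnings, including other UserWarnings from openpyxl, still reach your end users.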

This is exactly the problem I encountered just now.
In my situation (this may not apply to everyone), I discovered that you just need to close Excel and rerun the code. Very simple.
If this doesn't work, you can refer to the other answers.
Thanks

To understand the error, you need to know what's inside an XLSX file. The best way to take a look is to change the extension to zip and open that. Inside you will see a file called [Content_Types].xml and directories for the other content. If you check out the XML in Content_Types you will see a <Types ...> tag containing other tags like this:
<Default Extension="png" ContentType="image/png"/>
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
Note the "Extension" property. This is what the warning refers to. In the example above, my file included Extension="png" - the unknown extension.
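Instead of renaming the file, the same inspection can be done from Python, since zipfile opens the workbook directly regardless of its extension. A minimal stdlib sketch; the helper name default_extensions is hypothetical, and the namespace URI is the standard Open Packaging Conventions content-types namespace.

```python
import xml.etree.ElementTree as ET
import zipfile

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def default_extensions(xlsx_path):
    """Hypothetical helper: map each <Default Extension=...> to its ContentType."""
    with zipfile.ZipFile(xlsx_path) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    return {
        el.get("Extension"): el.get("ContentType")
        for el in root.iter(f"{CT_NS}Default")
    }


# Usage: print(default_extensions("Workbook1.xlsx"))
```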
For me, it was enough to specify read_only=True and the warning went away, e.g.:
wb = openpyxl.load_workbook(file, read_only=True)
I could also fix the issue by copying everything except the images to a new workbook and saving that. After checking, the xml in the new workbook no longer contained the png property.
Note that reading into pandas with pd.read_excel uses openpyxl and generates the same "Unknown extension" warning, but there is no way to pass through the read_only parameter. You can suppress that specific warning with:
import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='openpyxl')

How to fix [Errno13] permission denied when trying to read excel file?

I tried the following code to be able to read an excel file from my personal computer.
import xlrd
book = xlrd.open_workbook('C:\\Users\eline\Documents\***\***\Python', 'Example 1.xlsx')
But I am getting the error 'Permission denied'. I am using Windows, and if I look at the properties of the directory under the 'Security' tab, I see three groups/users, and all three have every permission except the last one, called 'special permissions' (as far as I know, I do not need that permission to read the Excel file in Python).
I have no idea how to fix this error. Furthermore, I do not have the Excel file open on my computer when running the simulation.
I really hope someone can help me to fix this error.

Sometimes, it is because you try to read the Excel file while it is opened. Close the file in Excel and you are good to go.

book = xlrd.open_workbook('C:\\Users\eline\Documents\***\***\Python', 'Example 1.xlsx')
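You cannot pass the path to xlrd like this: the path needs to be a single string.
If you want to keep the folder and the file name separate, you can join them with the os module:
import os
import xlrd
# A raw string keeps the Windows backslashes from being read as escape sequences.
# Note: xlrd 2.0 and later only read legacy .xls files, not .xlsx.
book = xlrd.open_workbook(os.path.join(r'C:\Users\eline\Documents\***\***\Python', 'Example 1.xlsx'))
In your case, [Errno 13] Permission denied happens because xlrd.open_workbook treats its second positional argument as a log file, not as part of the file name, so it tries to open the folder itself as if it were a file, which is not allowed.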
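A minimal stdlib sketch that builds the full path with pathlib and fails early with a clearer error when the result is a folder or does not exist. The helper name workbook_path is hypothetical. Since xlrd 2.x no longer reads .xlsx, the usage comment assumes pandas with openpyxl instead.

```python
from pathlib import Path


def workbook_path(folder, filename):
    """Hypothetical helper: join folder and file name and make sure it names a file."""
    path = Path(folder) / filename
    if path.is_dir():
        # On Windows, opening a directory as a file raises [Errno 13] Permission denied.
        raise IsADirectoryError(f"{path} is a folder, not a workbook")
    if not path.is_file():
        raise FileNotFoundError(path)
    return path


# Usage (assumes pandas + openpyxl; the *** parts are elided in the question):
# df = pd.read_excel(workbook_path(r"C:\Users\eline\Documents\***\***\Python", "Example 1.xlsx"))
```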

I ran into this situation as well while reading an Excel file into a data frame. To me it appears to be a Python and/or Excel bug, which we should probably not hide by using os.path.join even if that solves the problem. My situation involved an Excel spreadsheet with cells linked to another CSV file. If this Excel file has just been opened and is still open when I try to read it in Python, the read fails.
Python reads it correctly if I do an otherwise unnecessary save of the open Excel file.

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In the environment, we have an excel file, which includes rawdata in one sheet and pivot table and charts in another sheet.
I need to append rows every day to raw data automatically using a python job.
I am not sure, but there may be some VB Script running on the front end which will refresh the pivot tables.
I used openpyxl and, by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=True while loading the workbook to keep the VBA modules inside so that pivoting keeps working. But after saving the workbook, MS Office no longer opens the xlsx file and says the format or the extension is not valid. I can still see the data using Python, but not in Office. If I don't use keep_vba=True, pivoting does not work and only the previous values are present (of course, as I understand it, because the VBA script is needed for pivoting).
Could you explain to me what's happening? I am new to Python and don't know its concepts well.
How can I fix this in openpyxl, or is there a better alternative to openpyxl? Data connections in MS Office are not an option for me.
As I understand it, xlsx may need special modules to save the VB script in the same way MS Office would. If so, then what is the purpose of keep_vba=True?
I would be grateful if you could explain in more detail. I would love to know.
As I have very little time to complete this task, I am looking for a quick answer here instead of going through all the concepts.
Thank you!

You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.
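One way to follow both suggestions is to choose keep_vba from the file's own extension: VBA is kept only for macro-enabled .xlsm files, and the workbook is saved back under the same name, so the extension always matches what was loaded. A minimal sketch assuming openpyxl; the function name append_rows and its arguments are hypothetical, and it does not refresh pivot tables.

```python
from pathlib import Path

import openpyxl


def append_rows(path, sheet, rows):
    """Hypothetical helper: append rows to a sheet, keeping VBA only for .xlsm files."""
    path = Path(path)
    # Only macro-enabled workbooks (.xlsm) can carry VBA; a plain .xlsx cannot.
    keep_vba = path.suffix.lower() == ".xlsm"
    wb = openpyxl.load_workbook(str(path), keep_vba=keep_vba)
    ws = wb[sheet]
    for row in rows:
        ws.append(row)  # writes after the last used row
    wb.save(str(path))  # same name and extension as the loaded file
    return ws.max_row
```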
