Reading a .xls file using pandas read_excel [duplicate]


My code:
import xlrd
wb = xlrd.open_workbook("Z:\\Data\\Locates\\3.8 locates.xls")
sh = wb.sheet_by_index(0)
print(sh.cell(0, 0).value)
The error:
Traceback (most recent call last):
  File "Z:\Wilson\tradedStockStatus.py", line 18, in <module>
    wb = xlrd.open_workbook("Z:\\Data\\Locates\\3.8 locates.xls")
  File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 429, in open_workbook
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1545, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 1539, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<table r'
The file doesn't seem to be corrupted or of a different format.
Anything to help find the source of the issue would be great.

Try opening it as HTML with pandas:
import pandas as pd
data = pd.read_html('filename.xls')  # returns a list of DataFrames, one per <table>
Or try any other Python HTML parser.
That is not a real Excel file but an HTML file that Excel happens to be able to open.

You say:
The file doesn't seem to be corrupted or of a different format.
However as the error message says, the first 8 bytes of the file are '<table r' ... that is definitely not Excel .xls format. Open it with a text editor (e.g. Notepad) that won't take any notice of the (incorrect) .xls extension and see for yourself.
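
If you would rather check from Python than from a text editor, here is a minimal sketch that looks at the leading bytes of the file. The function name and the returned labels are made up for illustration; the byte signatures are the usual ones for the legacy .xls container and for zip archives (.xlsx is a zip archive).

```python
# Leading bytes ("magic numbers") of the formats discussed in this thread.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls container
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx files are zip archives


def sniff_spreadsheet(path):
    """Return a rough guess of what a file really is, ignoring its extension."""
    with open(path, "rb") as fh:
        head = fh.read(2048)  # the first couple of kilobytes are enough
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx-or-zip"
    # Skip a UTF-8 byte order mark and leading whitespace before looking for markup.
    text = head.lstrip(b"\xef\xbb\xbf").lstrip().lower()
    if text.startswith(b"<?xml"):
        return "xml"
    if text.startswith(b"<"):
        return "html"
    return "unknown-or-text"
```

A file that begins with '<table r', like the one in the question, comes back as "html", which suggests trying pd.read_html rather than xlrd. Anything reported as "unknown-or-text" still needs a look in a text editor to tell CSV, tab-delimited text, and other content apart.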

This can also happen with some files while they are open in Excel.

I had a similar problem and it was related to the xlrd version. Check it in a Python terminal:
>>> import xlrd
>>> xlrd.__VERSION__
If you have '0.9.0' you can open almost all files. If you have '0.6.0', which is what I found on Ubuntu, you may have problems with the newest Excel files. You can download the latest version of xlrd and install it the standard Distutils way.

I hit a similar problem when I downloaded an .xls file and tried to open it with the xlrd library. I then tried converting the .xls into .xlsx as detailed here: how to convert xls to xlsx
It works like a charm: rather than opening the .xls, I now work with the .xlsx file using the openpyxl library.
Hope this helps solve your issue.

I faced the same xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; error and solved it by writing an XML-to-XLSX converter. The reason is that xlrd does not support the XML Spreadsheet format (*.xml), which is neither XLS nor XLSX.
import pandas as pd
from bs4 import BeautifulSoup

def convert_to_xlsx():
    # The 'xml' parser of BeautifulSoup requires lxml to be installed.
    with open('sample.xls') as xml_file:
        soup = BeautifulSoup(xml_file.read(), 'xml')
    writer = pd.ExcelWriter('sample.xlsx')
    for sheet in soup.findAll('Worksheet'):
        # Collect the text of every cell, row by row.
        sheet_as_list = []
        for row in sheet.findAll('Row'):
            sheet_as_list.append([cell.Data.text if cell.Data else '' for cell in row.findAll('Cell')])
        pd.DataFrame(sheet_as_list).to_excel(writer, sheet_name=sheet.attrs['ss:Name'], index=False, header=False)
    # ExcelWriter.save() was removed in pandas 2.0; call writer.close() there instead.
    writer.save()
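
For anyone who only needs the cell values and would rather avoid the bs4 dependency, the same idea can be sketched with the standard library's ElementTree. This is an illustration, not the converter above: the function name is made up, it assumes the file uses the usual Excel 2003 XML Spreadsheet namespace, and it ignores ss:Index attributes, so sparse rows are not padded.

```python
import xml.etree.ElementTree as ET

# Namespace normally declared by Excel 2003 "XML Spreadsheet" files (assumed here).
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def read_xml_spreadsheet(xml_text):
    """Return {sheet_name: rows}, where each row is a list of cell strings."""
    root = ET.fromstring(xml_text)
    sheets = {}
    for sheet in root.findall("ss:Worksheet", NS):
        name = sheet.get("{%s}Name" % SS)
        rows = []
        for row in sheet.iter("{%s}Row" % SS):
            cells = []
            for cell in row.findall("ss:Cell", NS):
                data = cell.find("ss:Data", NS)
                # Empty cells have no <Data> child; represent them as ''.
                cells.append((data.text or "") if data is not None else "")
            rows.append(cells)
        sheets[name] = rows
    return sheets
```

Each list of rows can be passed straight to pd.DataFrame if you want to continue in pandas.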

In my case, after opening the file with a text editor as @john-machin suggested, I realized the file was not in Excel's binary format at all: it was plain CSV that had been saved with an .xls extension. I renamed the file with the right extension and used read_csv instead:
import os
import pandas as pd

os.rename('sample_file.xls', 'sample_file.csv')
# on_bad_lines='skip' needs pandas 1.3+; older versions use error_bad_lines=False
df = pd.read_csv('sample_file.csv', on_bad_lines='skip')

The file may not be a real Excel workbook but HTML saved with an .xls extension. In that case pandas can read it as HTML:
import pandas as pd
df = pd.read_html('file.xls')
This returns a list of DataFrames (check its type and you will see a list). https://pandas.pydata.org/pandas-docs/version/0.17.1/io.html#io-read-html
You need to extract the one you want, for instance with df[0].

I ran into this problem too. I opened the file in Excel and saved it in another format, such as Excel 97-2003, and that solved the problem.

I had the same issue. Those old files are formatted like a tab-delimited file. I've been able to open my problem files with read_table, i.e. df = pd.read_table('trouble_maker.xls').
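
To check whether a mislabeled file really is tab-delimited before choosing between read_table and read_csv, a rough standard-library sketch follows. The function name is hypothetical and the heuristic is crude: it only compares how often tabs and commas occur in the first line.

```python
import csv


def read_delimited_text(path, encoding="utf-8"):
    """Read a text file into rows, guessing tab vs. comma from its first line."""
    with open(path, newline="", encoding=encoding) as fh:
        first_line = fh.readline()
        # Pick whichever candidate delimiter occurs more often in the first line.
        delimiter = "\t" if first_line.count("\t") > first_line.count(",") else ","
        fh.seek(0)
        return list(csv.reader(fh, delimiter=delimiter))
```

If the rows come back split into sensible columns, the file is plain delimited text and pd.read_table (tabs) or pd.read_csv (commas) should read it as well.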

I got this error when I tried to read some XLSX files from a folder while one of them was open in Excel. After I closed that XLSX file, the error went away.

Try this. It worked for me:
import pandas as pd
data = pd.read_csv('filename.xls')

I just downloaded xlrd, created an Excel document (Excel 2007) for testing, and got the same error (the message says 'found PK\x03\x04\x14\x00\x06\x00'). The extension is .xlsx. After I saved it in the older .xls format, the error disappeared.

I ran into the same problem.
It lies in the .xls file itself: it looks like an Excel file, but it isn't (see whether a warning pops up when you simply open the .xls in Excel).
sjmachin's comment of Jan 19, 2013 at https://github.com/python-excel/xlrd/issues/26 helps.

I worked on the same issue and finally solved it. Since this is the top result for the question, here is what I did.
Observations:
1 - The file was not actually XLS: I renamed it to .txt and saw HTML text in the file.
2 - I renamed the file to .html and tried reading it with pd.read_html, which failed.
3 - I added the <html>, <head> and <body> tags, which were missing from the text, and removed the style block so that the table displays when the local file is opened in a browser. That WORKED.
The code below may help someone.
import shutil
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Work on a text copy of the mislabeled file
shutil.copy('your.xls', 'file.txt')
with open('file.txt', 'r') as src:
    txt = src.read()
# Delete the style block so that the data displays in an HTML page
txt = txt.replace(r'<style> .text { mso-number-format:\#; } </script>', '')
# Add head and body, which are missing from the HTML text
txt_with_head = '<html><head></head><body>' + txt + '</body></html>'
# Save the result as HTML; the with block closes the file before it is read again
with open('output.html', 'w') as html_file:
    html_file.write(txt_with_head)
# Use Beautiful Soup to read it back (this path points at the output.html written above)
url = r"C:\Users\hitesh kumar\PycharmProjects\OEM ML\output.html"
with open(url) as page:
    soup = BeautifulSoup(page.read(), features="lxml")
my_table = soup.find("table", attrs={'border': '1'})
# Wrap the markup in StringIO: newer pandas versions deprecate passing literal HTML strings
frame = pd.read_html(StringIO(str(my_table)))[0]
print(frame.head())
frame.to_excel('testoutput.xlsx', sheet_name='sheet1', index=False)

Open the file in Google Sheets, download it from Sheets as CSV, and then re-upload it to Drive. You can then open the CSV file from Python.

Two things I know of: download the xls file again, and, if you are working in Google Colab, upload the file again from your computer and run pd.read_excel("filename.xlsx") once more. It should work.

As others already wrote, the file is actually HTML. To see the first table you can use
df = pd.read_html(file)
df[0]
To see how many tables there are you can use
print('Tables found:', len(df))

This worked for me, using encoding="utf-8" as suggested in this post:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 100: character maps to <undefined>
import pandas as pd
from bs4 import BeautifulSoup

def convert_to_xlsx():
    # Open with an explicit encoding to avoid the 'charmap' decode error
    with open('sample.xls', encoding="utf-8") as xml_file:
        soup = BeautifulSoup(xml_file.read(), 'xml')
    writer = pd.ExcelWriter('sample.xlsx')
    for sheet in soup.findAll('Worksheet'):
        sheet_as_list = []
        for row in sheet.findAll('Row'):
            sheet_as_list.append([cell.Data.text if cell.Data else '' for cell in row.findAll('Cell')])
        pd.DataFrame(sheet_as_list).to_excel(writer, sheet_name=sheet.attrs['ss:Name'], index=False,
                                             header=False)
    # ExcelWriter.save() was removed in pandas 2.0; call writer.close() there instead.
    writer.save()

melike's answer works for me, except that the final output statement didn't work. If anyone has the same issue and wants to write the xlsx file to a local path, just modify the last three lines:
import pandas as pd
from bs4 import BeautifulSoup

def convert_to_xlsx():
    with open('sample.xls') as xml_file:
        soup = BeautifulSoup(xml_file.read(), 'xml')
    writer = pd.ExcelWriter('sample.xlsx')
    for sheet in soup.findAll('Worksheet'):
        sheet_as_list = []
        for row in sheet.findAll('Row'):
            sheet_as_list.append([cell.Data.text if cell.Data else '' for cell in row.findAll('Cell')])
        output_df = pd.DataFrame(sheet_as_list)
        output_df.to_excel(writer, sheet_name='sheet1', index=False, header=False)
    # close() writes the workbook to disk
    writer.close()

import os
import pandas as pd
# Rename the file if it's not already a .csv file
if not os.path.exists('3.8 locates.csv'):
    os.rename('3.8 locates.xls', '3.8 locates.csv')
# Load the data into a pandas dataframe
df = pd.read_csv("3.8 locates.csv", sep='\t|\n', engine='python')
# Show the first 5 rows of the dataframe
print(df.head())
The code imports the os and pandas modules and then does the following:
1. Checks whether the file '3.8 locates.csv' exists.
2. If it does not exist, renames '3.8 locates.xls' to '3.8 locates.csv'.
3. Loads the contents of '3.8 locates.csv' into a pandas DataFrame with pd.read_csv. The sep argument is set to '\t|\n', which pandas treats as a regular expression, and the engine argument is set to 'python' because regular-expression separators need the Python parser.
4. Prints the first 5 rows of the DataFrame with df.head().
Note: The code may not work as expected if the file is not a valid tab-separated or newline-separated file.

There's nothing wrong with your file. xlrd does not yet support xlsx (Excel 2007+) files, although it has been purported to support them for some time.
Simplistix github
Two days ago they committed a pre-alpha version to their git repository that integrates xlsx support. Other forums suggest using a DOM parser for xlsx files, since an xlsx file is just a zip archive containing XML. I have not tried this. There is another package with functionality similar to xlrd called openpyxl, which you can get with easy_install or pip. I have not tried it either; however, its API is supposed to be similar to xlrd's.

I know there should be a proper way to solve this, but just to save time I uploaded my xlsx sheet to Google Sheets and then downloaded it again from Google Sheets. It is working now. If you don't have time to solve the problem properly, you can try this.

It sometimes helps to add ?raw=true at the end of the file path. For example:
wb = xlrd.open_workbook("Z:\\Data\\Locates\\3.8 locates.xls?raw=true")

Related

Python Pandas can't read .xls file though engine is xlrd

I have a 1 GB Excel sheet in xls format (old Excel), and I can't read it with pandas:
df = pd.read_excel("filelocation/filename.xls", engine="xlrd")
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<html>\r\n'
If I remove the engine argument, I get this error instead:
ValueError: Excel file format cannot be determined, you must specify an engine manually
Any advice will be appreciated, thanks.

One of these options should work:
data = pandas.read_table(r"filelocation/filename.xls")
or
data = pandas.read_html("filelocation/filename.xls")
Otherwise, try another HTML parser. I agree with @AKX: this doesn't look like an Excel file.

CSV file with Arabic characters is displayed as symbols in Excel

I am using Python to extract Arabic tweets from Twitter and save them as a CSV file, but when I open the saved file in Excel the Arabic text is displayed as symbols. Inside Python, Notepad, or Word, however, it looks fine.
Where is the problem?

This is a problem I face frequently with Microsoft Excel when opening CSV files that contain Arabic characters. Try the following workaround, which I tested on the latest versions of Microsoft Excel on both Windows and macOS:
1. Open Excel with a blank workbook.
2. On the Data tab, click the From Text button (if it is not active, make sure an empty cell is selected).
3. Browse to and select the CSV file.
4. In the Text Import Wizard, change File origin to "Unicode (UTF-8)".
5. Click Next and, under Delimiters, select the delimiter used in your file, e.g. comma.
6. Finish and select where to import the data.
The Arabic characters should now display correctly.

Just use encoding='utf-8-sig' instead of encoding='utf-8', as follows:
import csv

data = u"اردو"
# utf-8-sig writes a byte order mark, which Excel uses to detect UTF-8
with open('example.csv', 'w', encoding='utf-8-sig', newline='') as fh:
    writer = csv.writer(fh)
    writer.writerow([data])
It worked on my machine.
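
To see what utf-8-sig actually changes, here is a small sketch with hypothetical helper names. It writes a CSV with that encoding and then checks that the file starts with the three-byte UTF-8 byte order mark, which is the hint Excel relies on.

```python
import codecs
import csv


def write_csv_for_excel(path, rows):
    """Write rows as CSV with a UTF-8 byte order mark at the start of the file."""
    with open(path, "w", newline="", encoding="utf-8-sig") as fh:
        csv.writer(fh).writerows(rows)


def has_utf8_bom(path):
    """True if the file begins with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as fh:
        return fh.read(3) == codecs.BOM_UTF8
```

Reading the file back with encoding='utf-8-sig' strips the mark again, so the round trip in Python returns the original text.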

The only solution I've found for saving Arabic text from Python into a file that Excel opens correctly is to use pandas and write an .xlsx file instead of a .csv. xlsx seems a million times better. Here's the code I put together, which worked for me:
import pandas as pd

def turn_into_csv(data, csver):
    # Despite its name, this writes <csver>.xlsx, not a CSV file.
    ids = []
    texts = []
    for each in data:
        texts.append(each["full_text"])
        ids.append(str(each["id"]))
    df = pd.DataFrame({'ID': ids, 'FULL_TEXT': texts})
    writer = pd.ExcelWriter(csver + '.xlsx', engine='xlsxwriter')
    # Recent pandas versions no longer accept an encoding argument in to_excel,
    # so the original encoding="utf-8-sig" is dropped here.
    df.to_excel(writer, sheet_name='Sheet1')
    # Close the pandas Excel writer and output the Excel file
    # (writer.save() is not available on pandas 2.0 and later).
    writer.close()

The fastest way, after saving the file as .csv from Python:
1. Open the .csv file in Notepad++.
2. From the Encoding drop-down menu, choose UTF-8-BOM.
3. Click Save As and save it under the same name with the .csv extension (e.g. data.csv), leaving the file type as it is (.txt).
4. Re-open the file with Microsoft Excel.

Excel is known to have an awful CSV import system. Long story short: if you import a CSV file on the same system that just exported it, it will work smoothly. Otherwise, the CSV file is expected to use the Windows system encoding and delimiter.
A rather awkward but robust alternative is to use LibreOffice or Oracle OpenOffice. Both are far behind Excel on every feature except CSV handling: they let you specify the delimiter and optional quoting character along with the encoding of the CSV file, and you can then save the result as xlsx.

Although my CSV file was already encoded as UTF-8, explicitly saving it again with Notepad resolved the problem.
Steps:
1. Open your CSV file in Notepad.
2. Click File --> Save as...
3. In the "Encoding" drop-down, select UTF-8.
4. Rename your file using the .csv extension.
5. Click Save.
6. Reopen the file with Excel.

Pandas: XLRD error while reading an xls file from a url

I am getting the error below while reading an xls file:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x08jstanle'
I tried various solutions and other tools such as xlrd and pyexcel, with no luck; I still get this error. I hope someone out there has a solution. I also tried reading it as a raw file with Python's io library, but the file contains multiple sheets and their sequence needs to be maintained.
Thanks in advance.

There are two possible reasons for this:
1). The file you are getting from the source URL is not in the format that its extension claims.
2). XLS files are encrypted if you explicitly apply a workbook password, but also if you password-protect some of the worksheet elements. As such, it is possible to have an encrypted XLS file even if you don't need a password to open it.
If you have problem number one, the solution is to open the file and save it in a supported format:
import io
from xlwt import Workbook

# filename is the path to the downloaded, mislabeled file (tab-separated text)
with io.open(filename, "r", encoding="utf-8") as file1:
    data = file1.readlines()
# Creating a workbook object
xldoc = Workbook()
# Adding a sheet to the workbook object
sheet = xldoc.add_sheet("Sheet1", cell_overwrite_ok=True)
# Iterating and saving the data to the sheet
for i, row in enumerate(data):
    # Two things are done here:
    # removing the '\n' that comes with each line read from the file,
    # and getting the values by splitting on '\t'
    for j, val in enumerate(row.replace('\n', '').split('\t')):
        sheet.write(i, j, val)
# Saving the file as an Excel file
xldoc.save('myexcel.xls')
The file you downloaded may also be HTML. Use the snippet below to check a single file:
import pandas as pd
df_list = pd.read_html('filename.xlsx')  # a list of DataFrames, one per table
df = df_list[0]

CParserError: Error tokenizing data

I'm having some trouble reading a CSV file:
import pandas as pd
df = pd.read_csv('Data_Matches_tekha.csv', skiprows=2)
I get
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 526, saw 5
and when I add sep=None to the read_csv call I get another error:
Error: line contains NULL byte
I tried adding unicode='utf-8', and I even tried the csv reader, but nothing works with this file.
The CSV file is totally fine; I checked it and I see nothing wrong with it.

In your actual code, the line is:
>>> pandas.read_csv("Data_Matches_tekha.xlsx", sep=None)
You are trying to read an Excel file, not a plain-text CSV, which is why things are not working.
Excel files (xlsx) are in a special binary format (a zip archive of XML parts) and cannot be read as simple text files the way CSV files can.
You need to either convert the Excel file to CSV (note: if you have multiple sheets, each sheet should be converted to its own CSV file) and then read those files, or read the Excel file directly.
For the latter you can use read_excel, or a library like xlrd that is designed to read the binary format of Excel files; see Reading/parsing Excel (xls) files with Python for more information on that.
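
A quick way to tell whether a file is really an xlsx workbook before handing it to read_csv is to test whether it is a zip archive. The sketch below uses a made-up function name and assumes the workbook part sits at xl/workbook.xml, which is where Excel normally writes it.

```python
import zipfile


def looks_like_xlsx(path):
    """True if the file is a zip archive that contains an Excel workbook part."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Assumption: the workbook is stored under its conventional name.
        return "xl/workbook.xml" in archive.namelist()
```

If this returns True, use pd.read_excel; if it returns False and the file opens cleanly in a text editor, read_csv is the better candidate.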

Use read_excel instead of read_csv if it is an Excel file:
import pandas as pd
df = pd.read_excel("Data_Matches_tekha.xlsx")

I encountered the same error when I used to_csv to write some data and then read it in another script. I found an easy solution that bypasses pandas' read function entirely: the pickle module.
pickle is part of the Python standard library, so there is nothing to install (pip install pickle is not needed).
First, write your data with the code below:
import pickle
with open(path, 'wb') as output:
    pickle.dump(variable_to_save, output)
Then load your data in another script using:
import pickle
with open(path, 'rb') as infile:
    data = pickle.load(infile)
Note that if you want to read your saved data with a different Python version than the one that wrote it, you can specify that in the writing step with protocol=x, where x is a pickle protocol number that the reading version supports (for example 2 for Python 2).
I hope this can be of some use.
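
As a small illustration of the protocol argument mentioned above, here is a sketch with hypothetical helper names. Protocol 2 is used as the default on the assumption that the reader might be an old Python 2 interpreter; use a higher number if both sides run Python 3.

```python
import pickle


def save_compatible(obj, path, protocol=2):
    """Pickle obj to path using an explicitly chosen protocol number."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=protocol)


def load_pickle(path):
    """Load a pickled object; the protocol is detected from the file itself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Only load pickle files that you created yourself, since unpickling data from an untrusted source can execute arbitrary code.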

