I'm trying to read an Excel file (.xlsx) and I'm getting the error "IndexError: list index out of range". My code is simple:
import pandas as pd
pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xlsx')
The error:
File "<ipython-input-16-fd0112985376>", line 2, in <module>
pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xlsx')
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 364, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 1233, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 522, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 420, in __init__
self.book = self.load_workbook(self.handles.handle)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 533, in load_workbook
return load_workbook(
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 317, in load_workbook
reader.read()
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
return cls(**attrib)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 94, in __init__
self.named_styles = self._merge_named_styles()
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 114, in _merge_named_styles
self._expand_named_style(style)
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\stylesheet.py", line 124, in _expand_named_style
xf = self.cellStyleXfs[named_style.xfId]
File "C:\Users\Felipe.dias\Anaconda3\lib\site-packages\openpyxl\styles\cell_style.py", line 185, in __getitem__
return self.xf[idx]
IndexError: list index out of range
Maybe the Excel file is not properly formatted, as suggested in a couple of other questions. But when I save the Excel file manually, its type is "Excel Workbook (*.xlsx)", not "Strict Open XML Spreadsheet" as in those other questions:
[Screenshot: saving the Excel file manually]
I downloaded this file from the web, so maybe the file is broken, but I don't know how to check it.
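The traceback suggests the zip archive itself opens fine: openpyxl gets as far as parsing the stylesheet and fails on self.cellStyleXfs[named_style.xfId], meaning a named cell style points at a cellStyleXfs entry that doesn't exist. Here is a minimal standard-library sketch for checking both things (download integrity and that specific stylesheet problem), assuming the styles live at the conventional xl/styles.xml path; check_named_styles is a hypothetical helper, not part of pandas or openpyxl:

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import List, Optional, Tuple

# SpreadsheetML main namespace used inside xl/styles.xml
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def check_named_styles(path) -> Tuple[Optional[str], int, List[Tuple[str, int]]]:
    """Hypothetical helper: look for the dangling named-style problem in an .xlsx.

    Returns (first member with a bad CRC or None,
             number of <xf> entries in cellStyleXfs,
             list of (style name, xfId) pairs whose xfId is out of range).
    """
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the name of the first corrupt member, or None
        first_bad = zf.testzip()
        root = ET.fromstring(zf.read("xl/styles.xml"))

    xfs = root.find("m:cellStyleXfs", NS)
    n_xfs = len(xfs.findall("m:xf", NS)) if xfs is not None else 0

    dangling = []
    for style in root.iterfind("m:cellStyles/m:cellStyle", NS):
        xf_id = int(style.get("xfId", "0"))
        # This is the index lookup that raises IndexError inside openpyxl
        if xf_id >= n_xfs:
            dangling.append((style.get("name", ""), xf_id))
    return first_bad, n_xfs, dangling
```

If the first value is None, the download has no CRC errors; a non-empty dangling list then points at a stylesheet that was written incorrectly by whatever exported the file, rather than at a broken download.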
Thanks for your attention!
Edit 1:
Here is a screenshot of the website's HTML:
I don't know HTML, but I found it strange that the link's id is "xls-link" while its href is "./Html/DIARIO_16-11-2021.xlsx".
As @Wayne said, when he downloaded the file it came as .xls. After reading this answer, I tried running
pd.read_excel(r'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xls')
And got the error "[Errno 2] No such file or directory: 'M:\PUBLIC\Felipe Dias\ONS\DIARIO_16-11-2021.xls' "
Then I tried to open the file manually and save it as .xls. After running the code above, it actually worked!
But now my problem is that I would have to manually open and re-save as .xls more than 5,000 daily files, which is tedious. Does anyone know how I could do this automatically (without opening each file by hand, since I still can't figure out why the original fails)?
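Since re-saving the file through a spreadsheet program fixed it, one way to automate that step is LibreOffice's headless converter (soffice --headless --convert-to), driven from Python. A minimal sketch, assuming LibreOffice is installed and its soffice executable is on PATH; build_convert_command and convert_folder are hypothetical helpers, and the default target mirrors the manual .xls re-save that worked:

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(src: Path, outdir: Path, target: str = "xls") -> List[str]:
    """Command line for LibreOffice's headless converter (assumes `soffice` is on PATH)."""
    return ["soffice", "--headless", "--convert-to", target, "--outdir", str(outdir), str(src)]


def convert_folder(folder: Path, outdir: Path, target: str = "xls",
                   dry_run: bool = False) -> List[List[str]]:
    """Re-save every .xlsx in `folder` into `outdir`.

    With dry_run=True the commands are only built, not executed.
    A separate output folder avoids overwriting the originals.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    commands = [build_convert_command(p, outdir, target) for p in sorted(folder.glob("*.xlsx"))]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first file LibreOffice fails to convert
            subprocess.run(cmd, check=True)
    return commands
```

For example, convert_folder(Path(r"M:\PUBLIC\Felipe Dias\ONS"), Path(r"M:\PUBLIC\Felipe Dias\ONS\converted")), then read each result with pd.read_excel(path, engine="xlrd"), since pandas uses xlrd for .xls files. On Windows you may need the full path to soffice.exe instead of relying on PATH. If you are on pandas 2.2 or later, pd.read_excel(path, engine="calamine") (which requires the python-calamine package) may also be worth trying on the original files before converting anything.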
Related
I am writing a Python function that opens two .csv files and makes changes to the data inside. I am using pandas and pd.read_csv('text') to open the files. Everything works well, and the function works for one .csv file. However, when I try it on a different, smaller .csv file, the file cannot even be opened.
This is part of the error I am getting when I try to open the .csv file.
Traceback (most recent call last):
File "C:\Users\...\Downloads\test\test.py", line 3, in <module>
df = pd.read_csv('data2.csv')
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\util\_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\util\_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 611, in _read
return parser.read(nrows)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 230, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 8836, saw 5
This is the code I am using to access the .csv files.
import pandas as pd
df = pd.read_csv('test.csv')
All the files are in the correct folders and the file paths are all correct. Any help is appreciated, thanks!
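The error means that line 8836 has one more field than the header, which is typically an unquoted comma inside a value or a stray trailing comma. A minimal standard-library sketch that lists every such row so you can inspect them; find_bad_rows is a hypothetical helper, and it assumes a comma delimiter and that the first row is the header:

```python
import csv
from typing import Iterable, List, Tuple


def find_bad_rows(lines: Iterable[str], delimiter: str = ",") -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []
    expected = len(header)
    bad = []
    for row in reader:
        # Skip blank lines, as pandas does by default (skip_blank_lines=True)
        if row and len(row) != expected:
            # line_num is the 1-based physical line the row ended on
            bad.append((reader.line_num, len(row)))
    return bad
```

Usage: with open('data2.csv', newline='') as fh: print(find_bad_rows(fh)). If the offending rows are genuinely junk, pandas 1.3 and later can drop them with pd.read_csv(path, on_bad_lines="skip"), or report them with on_bad_lines="warn".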
I want to read an .xlsm file with pandas:
pd.read_excel("data.xlsm", engine='openpyxl', sheet_name="sheet1")
But I get these warnings:
C:\Users\anaconda3\lib\site-packages\openpyxl\worksheet\_read_only.py:79: UserWarning: Unknown extension is not supported and will be removed
for idx, row in parser.parse():
C:\Users\anaconda3\lib\site-packages\openpyxl\worksheet\_read_only.py:79: UserWarning: Conditional Formatting extension is not supported and will be removed
for idx, row in parser.parse():
As another attempt, I saved the data file in .xlsx format and tried to read it with:
pd.read_excel("data.xlsx", engine='openpyxl', sheet_name="sheet1")
And this time, I get the following error:
File "C:\Users\AppData\Local\Temp\ipykernel_28028\1689108907.py", line 1, in <module>
data = pd.read_excel(data_original_filepath, engine='openpyxl', sheet_name=sheet_name)
File "C:\Users\anaconda3\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 457, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "C:\Users\anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 1419, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "C:\Users\anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 525, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "C:\Users\anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 518, in __init__
self.book = self.load_workbook(self.handles.handle)
File "C:\Users\anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 536, in load_workbook
return load_workbook(
File "C:\Users\anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 317, in load_workbook
reader.read()
File "C:\Users\anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 278, in read
self.read_workbook()
File "C:\Users\anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 150, in read_workbook
self.parser.parse()
File "C:\Users\anaconda3\lib\site-packages\openpyxl\reader\workbook.py", line 49, in parse
package = WorkbookPackage.from_tree(node)
File "C:\Users\anaconda3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 83, in from_tree
obj = desc.from_tree(el)
File "C:\Users\anaconda3\lib\site-packages\openpyxl\descriptors\sequence.py", line 85, in from_tree
return [self.expected_type.from_tree(el) for el in node]
File "C:\Users\anaconda3\lib\site-packages\openpyxl\descriptors\sequence.py", line 85, in <listcomp>
return [self.expected_type.from_tree(el) for el in node]
File "C:\Users\anaconda3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
return cls(**attrib)
TypeError: __init__() missing 1 required positional argument: 'id'
Any idea how to solve this issue?
In fact, I have to read the .xlsm file. Changing the format to .xlsx was only for testing purposes.
Please try this block of code.
import openpyxl
file='data.xlsm'
wb=openpyxl.load_workbook(file, data_only=True, read_only=False, keep_vba=True)
Install the latest version of openpyxl from the openpyxl web page.
It works if you specify a sheet_name:
pd.read_excel("data.xlsm", sheet_name="sheet1")
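Note that the messages from the first .xlsm attempt are UserWarnings, not errors: openpyxl reports that it drops conditional-formatting and other extension elements it doesn't support, and the cell values should still be read. If the noise is a problem, here is a minimal sketch that silences only those two warnings around a single call; call_quietly is a hypothetical helper, and the message pattern is copied from the warnings in the question:

```python
import warnings
from typing import Any, Callable


def call_quietly(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func, ignoring openpyxl's 'extension is not supported' UserWarnings."""
    with warnings.catch_warnings():
        # The message regex is matched against the start of the warning text
        warnings.filterwarnings(
            "ignore",
            message=r".*extension is not supported and will be removed",
            category=UserWarning,
        )
        return func(*args, **kwargs)
```

Usage: df = call_quietly(pd.read_excel, "data.xlsm", engine="openpyxl", sheet_name="sheet1"). Other warnings still reach you, and the filter is restored when the call returns.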
I'm trying to execute the following code and I'm constantly experiencing this issue.
import pandas as pd
df = pd.read_excel('First_Run.xlsx', engine='openpyxl')
print(df.head())
I've made sure that the Excel file exists at that path. I have tried multiple ways to resolve the issue but haven't found a solution.
Here's the output of the code block.
Traceback (most recent call last):
File "c:\Users\fharookshaik\Desktop\Gmail Bot\temp.py", line 7, in <module>
df = pd.read_excel('First_Run.xlsx',engine='openpyxl')
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\util\_decorators.py", line 299, in wrapper
return func(*args, **kwargs)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\excel\_base.py", line 336, in read_excel
io = ExcelFile(io, storage_options=storage_options, engine=engine)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\excel\_base.py", line 1131, in __init__
self._reader = self._engines[engine](self._io, storage_options=storage_options)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\excel\_openpyxl.py", line 475, in __init__
super().__init__(filepath_or_buffer, storage_options=storage_options)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\excel\_base.py", line 391, in __init__
self.book = self.load_workbook(self.handles.handle)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\excel\_openpyxl.py", line 486, in load_workbook
return load_workbook(
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\zipfile.py", line 1269, in __init__
self._RealGetContents()
File "C:\Users\fharookshaik\AppData\Local\Programs\Python\Python38\lib\zipfile.py", line 1336, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
I hope this will be answered soon by the brilliant minds of this community.
Thanks in advance.
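BadZipFile means the bytes on disk are not a zip archive, and a genuine .xlsx always is one. A quick way to find out what the file really contains is to look at its leading "magic" bytes. A minimal standard-library sketch; sniff_spreadsheet is a hypothetical helper:

```python
ZIP_MAGIC = b"PK\x03\x04"                         # every .xlsx/.xlsm is a zip archive
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (also encrypted workbooks)


def sniff_spreadsheet(path) -> str:
    """Guess what a supposed Excel file really contains from its first 8 bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if not head:
        return "empty"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head == OLE2_MAGIC:
        return "ole2"
    if head.lstrip().startswith(b"<"):
        return "markup"  # HTML or XML saved with an Excel extension
    return "unknown"     # e.g. CSV or other plain text
```

A result of "ole2" suggests a legacy .xls (try engine="xlrd") or a password-protected workbook; "markup" suggests an HTML/XML export that pd.read_html may handle; "empty" means the file was truncated, for example by an earlier write that failed.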
I am trying to use the TensorFlow image classifier, but halfway through the download my internet connection dropped and the file was never fully downloaded.
I understand that I have to delete the partial download and run the script again to make this work, but I am not sure where the file is or how to find it.
I tried searching for the file name "Inception-2015-12-05.tgz" and nothing showed up. My guess is that it was saved under a temporary file name during the download.
File "classify_image.py", line 227, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "classify_image.py", line 190, in main
maybe_download_and_extract()
File "classify_image.py", line 186, in maybe_download_and_extract
tarfile.open(filepath, 'r:gz').extractall(dest_directory)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tarfile.py", line 2007, in extractall
numeric_owner=numeric_owner)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tarfile.py", line 2049, in extract
numeric_owner=numeric_owner)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tarfile.py", line 2119, in _extract_member
self.makefile(tarinfo, targetpath)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tarfile.py", line 2168, in makefile
copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tarfile.py", line 248, in copyfileobj
buf = src.read(bufsize)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 276, in read
return self._buffer.read(size)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 482, in read
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
Delete the /tmp/imagenet folder, then restart the download.
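Rather than searching for the file by name, you can check the archive before extracting it and delete it only if it is incomplete. A minimal standard-library sketch; tgz_is_complete and remove_if_incomplete are hypothetical helpers, and the /tmp/imagenet location comes from the answer above:

```python
import tarfile
import zlib
from pathlib import Path


def tgz_is_complete(path) -> bool:
    """Return True if every file inside a .tar.gz can be read to the end."""
    try:
        with tarfile.open(path, "r:gz") as tf:
            for member in tf:
                if member.isfile():
                    fh = tf.extractfile(member)
                    # A truncated gzip stream raises EOFError (as in the traceback above)
                    while fh.read(1 << 16):
                        pass
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False
    return True


def remove_if_incomplete(path) -> bool:
    """Delete a partially downloaded archive so the next run downloads it again."""
    p = Path(path)
    if p.exists() and not tgz_is_complete(p):
        p.unlink()
        return True
    return False
```

Usage: remove_if_incomplete("/tmp/imagenet/inception-2015-12-05.tgz"), adjusting the file name to whatever is actually in that folder, then rerun classify_image.py.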
I'm trying to load a CSV with pandas, but am running into a problem if the file name has accents. It's clearly an encoding problem, but although read_csv lets you set encoding for text within the file, I can't figure out how to encode the file name properly.
input_file = r'C:\...\Datasets\%s\Provinces\Points\%s.csv' % (country, province)
self.locs = pandas.read_csv(input_file,sep=',',skipinitialspace=True)
The CSV file is Anzoátegui.csv. When I get the error, input_file is:
input_file = 'C:\\...\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv'
Error code:
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist
So maybe it's converting my string to bytes? I tried using io.StringIO(input_file) as well, which puts the correct file name as a column header on an empty DataFrame:
Empty DataFrame
Columns: [C:\PF2\QGIS Valmiera\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv]
Index: []
Any ideas on how to get this file to load? Unfortunately I can't just strip out accents, as I have to interface with software that requires the proper name, and I have a ton of files to format (not just the one). Thanks!
Edit: Full error
Traceback (most recent call last):
File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_comm.py", line 891, in doIt
result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_vars.py", line 486, in evaluateExpression
result = eval(compiled, updated_globals, frame.f_locals)
File "<string>", line 1, in <module>
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 404, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 486, in __init__
self._make_engine(self.engine)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 594, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 952, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3040)
File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5387)
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist
Ok folks, I got a little lost in dependency hell, but it turns out that this issue was fixed in pandas 0.14.0. Install the updated version to get files named with accents to import correctly.
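For anyone stuck on a pandas version older than 0.14.0, a workaround that should sidestep the bug is to open the file with Python's built-in open(), which accepts non-ASCII paths, and pass the file object to read_csv instead of the path. A minimal sketch; read_via_handle is a hypothetical helper, and any callable that accepts a file object (pandas.read_csv, csv.reader) can be passed as the reader:

```python
from typing import Any, Callable


def read_via_handle(path: str, reader: Callable[..., Any],
                    encoding: str = "utf-8", **kwargs: Any) -> Any:
    """Open `path` with Python's own open() and hand the file object to `reader`.

    The reader never sees the file name, so it never has to encode it.
    """
    with open(path, encoding=encoding, newline="") as fh:
        return reader(fh, **kwargs)
```

Usage: self.locs = read_via_handle(input_file, pandas.read_csv, sep=',', skipinitialspace=True). The encoding argument here is for the file's contents; set it to whatever the CSV was actually saved with.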
See the comments on GitHub.
Thanks for the input!