Pandas can't read excel encoding

I'm trying to import an Excel file into pandas with df = pd.read_excel(file_path), but I keep getting this error:
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
Traceback (most recent call last):
File "/Users/santanna_santanna/PycharmProjects/KlooksExplore/FindCos/FindCos_Functions.py", line 5468, in <module>
adjust_sheet(y1,y2,y3)
File "/Users/santanna_santanna/PycharmProjects/KlooksExplore/FindCos/FindCos_Functions.py", line 5130, in adjust_sheet
y1=pd.read_excel(y1)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/pandas/util/_decorators.py", line 118, in wrapper
return func(*args, **kwargs)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 230, in read_excel
io = ExcelFile(io, engine=engine)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 294, in __init__
self.book = xlrd.open_workbook(self._io)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/__init__.py", line 162, in open_workbook
ragged_rows=ragged_rows,
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 119, in open_workbook_xls
bk.get_sheets()
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 719, in get_sheets
self.get_sheet(sheetno)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 710, in get_sheet
sh.read(self)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/sheet.py", line 815, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/biffh.py", line 249, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/timemachine.py", line 30, in <lambda>
unicode = lambda b, enc: b.decode(enc)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
The file I'm trying to import is this one.
Is this an encoding problem, or is some character in the file causing it? How can I solve it?

pd.read_excel('data.csv', encoding='utf-8')

@astrobiologist gave a good hint.
Since I didn't want the hassle of patching anything, I solved it by opening the file in OpenOffice and saving it as an Excel 97 file. That finally worked.

Related

Cocos Creator cannot build for the Android platform

I got the following output:
Traceback (most recent call last):
File "C:\CocosCreator\resources\cocos2d-x\tools\cocos2d-console\bin\cocos.py", line 983, in
run_plugin(command, argv, plugins)
File "C:\CocosCreator\resources\cocos2d-x\tools\cocos2d-console\bin\cocos.py", line 875, in run_plugin
plugin.run(argv, dependencies_objects)
File "C:\CocosCreator\resources\cocos2d-x\tools\cocos2d-console\plugins\plugin_new\project_new.py", line 258, in run
self.parse_args(argv)
File "C:\CocosCreator\resources\cocos2d-x\tools\cocos2d-console\plugins\plugin_new\project_new.py", line 104, in parse_args
description=self.__class__.brief_description())
File "C:\CocosCreator\resources\cocos2d-x\tools\cocos2d-console\plugins\plugin_new\project_new.py", line 43, in brief_description
return MultiLanguage.get_string('NEW_BRIEF')
File "C:\CocosCreator\resources\cocos2d-x\tools\cocos2d-console\bin\MultiLanguage.py", line 52, in get_string
fmt = cls.get_instance().get_current_string(key)
File "C:\CocosCreator\resources\cocos2d-x\tools\cocos2d-console\bin\MultiLanguage.py", line 158, in get_current_string
ret = ret.encode(self.encoding)
File "C:\CocosCreator\resources\utils\Python27\lib\encodings\cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to <undefined>
Does anyone know why this happened?

I don't know Python, but the problem seems to be solved after I changed line 158 of MultiLanguage.py (in the get_current_string method) from:
ret = ret.encode(self.encoding)
to:
ret = ret.encode("utf8") #self.encoding
I hope this helps people who run into the same problem.
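The change above likely works because cp1252 only covers Western European characters, while UTF-8 can encode any Unicode string. A minimal sketch of the same idea in Python 3, assuming a hypothetical helper named encode_with_fallback (it is not part of cocos2d-console) that keeps the configured encoding when it works and falls back to UTF-8 only when it does not:

```python
def encode_with_fallback(text, preferred="cp1252"):
    """Encode text with the preferred codec, falling back to UTF-8.

    Hypothetical helper for illustration; not part of cocos2d-console.
    """
    try:
        return text.encode(preferred)
    except UnicodeEncodeError:
        # The preferred code page has no mapping for some character.
        return text.encode("utf-8")


print(encode_with_fallback("build"))       # representable in cp1252
print(encode_with_fallback("创建新项目"))   # not representable, so UTF-8 is used
```

Compared with hard-coding "utf8", this keeps the original behaviour for strings the configured code page can already represent.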

Check your SDK and NDK versions. The recommended NDK version is r17 to r19.

Encoding error when opening an Excel file with python xlrd module

I have some Excel files with the .xls extension. When I use xlrd to open them, it fails, and I don't know how to solve it.
oldbook=xlrd.open_workbook('file.xls')
oldsheet=oldbook.sheets()[0]
PS C:\Users\我是猫\Desktop\python> python -u "c:\Users\我是猫\Desktop\python\a.py"
Traceback (most recent call last):
File "c:\Users\我是猫\Desktop\python\a.py", line 64, in <module>
oldbook=xlrd.open_workbook(result)
File "E:\python\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
ragged_rows=ragged_rows,
File "E:\python\lib\site-packages\xlrd\book.py", line 117, in open_workbook_xls
bk.parse_globals()
File "E:\python\lib\site-packages\xlrd\book.py", line 1209, in parse_globals
self.handle_format(data)
File "E:\python\lib\site-packages\xlrd\formatting.py", line 538, in handle_format
unistrg = unpack_unicode(data, 2)
File "E:\python\lib\site-packages\xlrd\biffh.py", line 284, in unpack_unicode
strg = unicode(rawstrg, 'utf_16_le')
File "E:\python\lib\site-packages\xlrd\timemachine.py", line 31, in <lambda>
unicode = lambda b, enc: b.decode(enc)
File "E:\python\lib\encodings\utf_16_le.py", line 16, in decode
return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 10-11: illegal encoding
PS C:\Users\我是猫\Desktop\python>

Try overriding the encoding used:
oldbook = xlrd.open_workbook('file.xls', encoding_override="cp1252")
You can also try encoding_override="utf-8". Experiment with different encodings until you find the right one.
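The trial-and-error approach can be sketched as a small loop. This is only a sketch: the function name, the candidate list, and the opener parameter are assumptions added for illustration, and only xlrd.open_workbook with its encoding_override argument comes from the answer. As far as I understand, encoding_override targets older files with missing or wrong code page information, so it may not help when the failure occurs in UTF-16 data.

```python
def open_with_first_working_encoding(path,
                                     candidates=("cp1252", "utf-8", "latin-1"),
                                     opener=None):
    """Return (encoding, workbook) for the first candidate that opens cleanly.

    Hypothetical helper; `opener` defaults to xlrd.open_workbook and is
    only a parameter so the loop can be tried without a real .xls file.
    """
    if opener is None:
        import xlrd  # third-party package, imported lazily
        opener = xlrd.open_workbook

    errors = {}
    for encoding in candidates:
        try:
            return encoding, opener(path, encoding_override=encoding)
        except UnicodeDecodeError as exc:
            # Remember why this candidate failed and try the next one.
            errors[encoding] = exc
    raise ValueError("no candidate encoding worked: %r" % (errors,))
```

Note that latin-1 maps every byte to a character, so it never raises a decode error. Listing it last makes it a catch-all, but the resulting text may be wrong, so check a few cells by eye.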

Encoding problem with /usr/lib64/python3.4/http/client.py

I do not understand the error below. If I run:
python3.4 ./bug.py "salé.txt"
it works fine. If I run:
python3.4 ./bug.py "Capture d’écran du 2019-03-21 15-17-10.png"
I get this error:
Traceback (most recent call last):
File "./bug.py", line 45, in <module>
status=testB_CreateSimpleDocumentWithFile(session)
File "./bug.py", line 32, in testB_CreateSimpleDocumentWithFile
status, result = session.create_document_with_properties(path,mydoc,simple_document,properties=props,files=kk)
File "/home/karim/testatrium/nuxeolib/session.py", line 345, in create_document_with_properties
_document_properties, _ = self.encode_properties(properties, files)
File "/home/karim/testatrium/nuxeolib/session.py", line 251, in encode_properties
_names, _sizes = self.upload_files(files, batch_id=_batch_id)
File "/home/karim/testatrium/nuxeolib/session.py", line 136, in upload_files
_status, _result = self.execute_api(param=_param, headers=_headers, file_name=_name)
File "/home/karim/testatrium/nuxeolib/session.py", line 1325, in execute_api
_connection.request(method, url, headers=h2, body=data)
File "/usr/lib64/python3.4/http/client.py", line 1139, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python3.4/http/client.py", line 1179, in _send_request
self.putheader(hdr, value)
File "/usr/lib64/python3.4/http/client.py", line 1110, in putheader
values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 9: ordinal not in range(256)
The problem comes from the Right Single Quotation Mark. I haven't managed to fix it.
Thanks for any advice.
Karim

Since the file name is not under your control, I would sanitize it.
Any of the methods in this question would solve the problem.
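Two common ways to sanitize such a name can be sketched with the standard library alone. The helper names below are made up for illustration and are not part of nuxeolib. The first drops or approximates non-ASCII characters (lossy), and the second percent-encodes them as UTF-8 (reversible, but the server must decode the name again).

```python
import unicodedata
from urllib.parse import quote


def ascii_filename(name):
    """Return an ASCII-only approximation of name (lossy)."""
    # NFKD splits accented letters into base letter + combining mark;
    # encoding with errors="ignore" then drops everything non-ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def percent_encoded_filename(name):
    """Return name with non-ASCII characters percent-encoded as UTF-8."""
    return quote(name)


name = "Capture d’écran du 2019-03-21 15-17-10.png"
print(ascii_filename(name))
print(percent_encoded_filename(name))
```

Either result contains only ASCII characters, so it can be encoded as latin-1 when http.client writes the header.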

Python pandas to excel UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11

After scraping an e-commerce website, I saved all the data into a pandas DataFrame. When I try to save the DataFrame to an Excel file, I get the following error:
Traceback (most recent call last):
File "<ipython-input-7-3dafdf6b87bd>", line 2, in <module>
sheet_name='Dolci', encoding='iso-8859-1')
File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1466, in to_excel
excel_writer.save()
File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\io\excel.py", line 1502, in save
return self.book.close()
File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\workbook.py", line 299, in close
self._store_workbook()
File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\workbook.py", line 607, in _store_workbook
xml_files = packager._create_package()
File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\packager.py", line 139, in _create_package
self._write_shared_strings_file()
File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\packager.py", line 286, in _write_shared_strings_file
sst._assemble_xml_file()
File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\sharedstrings.py", line 53, in _assemble_xml_file
self._write_sst_strings()
File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\sharedstrings.py", line 83, in _write_sst_strings
self._write_si(string)
File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\sharedstrings.py", line 110, in _write_si
self._xml_si_element(string, attributes)
File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\xmlwriter.py", line 122, in _xml_si_element
self.fh.write("""<si><t%s>%s</t></si>""" % (attr, string))
File "C:\ProgramData\Anaconda2\lib\codecs.py", line 706, in write
return self.writer.write(data)
File "C:\ProgramData\Anaconda2\lib\codecs.py", line 369, in write
data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11: ordinal not in range(128)
The code I use is this:
df.to_excel('my_file.xlsx',sheet_name='Dolci', encoding='iso-8859-1')
but it doesn't work. I have even tried:
df.to_excel('my_file.xlsx',sheet_name='Dolci', encoding='utf-8')
but it still gives me an error.
Can somebody help me with this issue?

It looks like you are using the xlsxwriter engine in ExcelWriter.
Try using openpyxl instead:
writer = pd.ExcelWriter('file_name.xlsx', engine='openpyxl')
df.to_excel(writer)
writer.save()

The to_excel method has an engine parameter that matters here. Try:
df.to_excel('filename.xlsx', engine='openpyxl')
It works for me.

Adding to @Vadym's response, you may have to close your writer for the file to be created.
writer = pd.ExcelWriter(xlPath, engine='openpyxl')
df.to_excel(writer)
writer.close()
"depends on the behaviour of the used engine"
See:
https://github.com/pandas-dev/pandas/issues/9145
This should be a comment but I don't have the rep...
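A complementary check, offered only as a sketch: byte 0xe2 is the lead byte of three-byte UTF-8 sequences such as the right single quotation mark (E2 80 99), which suggests that some scraped cells may hold UTF-8 byte strings rather than text. Under that assumption, decoding those cells before writing avoids the implicit ASCII decode. The helper name to_text and the sample row are made up for illustration. With pandas it could be applied to every cell, for example through DataFrame.applymap in versions that provide it.

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings to text; leave every other value unchanged.

    Hypothetical helper; assumes the byte strings are UTF-8 encoded.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Sample row mixing a UTF-8 byte string, a text string, and a number.
row = [b"Torta dell\xe2\x80\x99anno", "Tiramisù", 4.5]
print([to_text(cell) for cell in row])
```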

'utf-8' decode error in tensorflow tutorial

I'm running into this bizarre problem where when I run
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/home/fqiao/development/MNIST_data/', one_hot=True)
I get:
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 199, in read_data_sets
train_images = extract_images(local_file)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 58, in extract_images
magic = _read32(bytestream)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 51, in _read32
return numpy.frombuffer(bytestream.read(4), dtype=dt)[0]
File "/usr/lib/python3.5/gzip.py", line 274, in read
return self._buffer.read(size)
File "/usr/lib/python3.5/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/usr/lib/python3.5/gzip.py", line 461, in read
if not self._read_gzip_header():
File "/usr/lib/python3.5/gzip.py", line 404, in _read_gzip_header
magic = self._fp.read(2)
File "/usr/lib/python3.5/gzip.py", line 91, in read
self.file.read(size-self._length+read)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/default/_gfile.py", line 45, in sync
return fn(self, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/default/_gfile.py", line 199, in read
return self._fp.read(n)
File "/usr/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
However, if I just run the code in input_data.py directly, everything appears to be fine:
>>> dt = numpy.dtype(numpy.uint32).newbyteorder('>')
>>> f = tf.gfile.Open('/home/fqiao/development/MNIST_data/train-images-idx3-ubyte.gz', 'rb')
>>> bytestream = gzip.GzipFile(fileobj=f)
>>> testbytes = numpy.frombuffer(bytestream.read(4), dtype=dt)[0]
>>> testbytes
2051
Does anyone have any idea what's going on?
My system: Ubuntu 15.10 x64 python 3.5.0.

This bug was fixed in a recent change (555e73d). MNIST files need to be opened in binary 'rb' mode instead of text 'r' mode.
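The difference between the two modes can be reproduced with the standard library alone. The sketch below does not use TensorFlow or the real MNIST archive. It writes a tiny gzip file containing the same 4-byte big-endian header value (2051) and reads it back. The function name read_magic and the sample file name are made up for illustration.

```python
import gzip
import os
import struct
import tempfile


def read_magic(path):
    """Read the big-endian uint32 header from a gzip-compressed file."""
    with open(path, "rb") as raw:  # binary mode: no text decoding happens
        with gzip.GzipFile(fileobj=raw) as stream:
            return struct.unpack(">I", stream.read(4))[0]


# Build a small stand-in file instead of the real MNIST archive.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample-idx3-ubyte.gz")
with gzip.open(sample_path, "wb") as out:
    out.write(struct.pack(">I", 2051))

print(read_magic(sample_path))

# Text mode tries to decode the gzip header (1f 8b ...) as UTF-8 and fails.
try:
    with open(sample_path, "r", encoding="utf-8") as text_file:
        text_file.read(2)
except UnicodeDecodeError as exc:
    print(exc)
```

The second byte of every gzip file is 0x8b, which is not a valid UTF-8 start byte, so any text-mode read of a .gz file fails in the same way as the traceback in the question.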

In my case, the problem was in the encoding of the data file.
Open the file using vim and execute:
:set fileencoding=utf-8
That solved the issue in my case.
