Encoding problem with /usr/lib64/python3.4/http/client.py - python

I do not understand the error below. If I run:
python3.4 ./bug.py "salé.txt"
it works fine. If I run:
python3.4 ./bug.py "Capture d’écran du 2019-03-21 15-17-10.png"
I get this error:
Traceback (most recent call last):
File "./bug.py", line 45, in <module>
status=testB_CreateSimpleDocumentWithFile(session)
File "./bug.py", line 32, in testB_CreateSimpleDocumentWithFile
status, result = session.create_document_with_properties(path,mydoc,simple_document,properties=props,files=kk)
File "/home/karim/testatrium/nuxeolib/session.py", line 345, in create_document_with_properties
_document_properties, _ = self.encode_properties(properties, files)
File "/home/karim/testatrium/nuxeolib/session.py", line 251, in encode_properties
_names, _sizes = self.upload_files(files, batch_id=_batch_id)
File "/home/karim/testatrium/nuxeolib/session.py", line 136, in upload_files
_status, _result = self.execute_api(param=_param, headers=_headers, file_name=_name)
File "/home/karim/testatrium/nuxeolib/session.py", line 1325, in execute_api
_connection.request(method, url, headers=h2, body=data)
File "/usr/lib64/python3.4/http/client.py", line 1139, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python3.4/http/client.py", line 1179, in _send_request
self.putheader(hdr, value)
File "/usr/lib64/python3.4/http/client.py", line 1110, in putheader
values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 9: ordinal not in range(256)
The problem comes from the Right Single Quotation Mark (U+2019) in the file name, but I have not managed to fix it.
Thanks for any advice.
Karim

Since the file name is not under your control, I would sanitize it before it goes into the header.
Any of the methods in this question would solve the problem.
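As an illustration, here is a minimal sketch of one possible sanitizer; the specific replacement of the curly apostrophe and the 'ignore' policy are my assumptions, not the only reasonable choice:

import unicodedata

def sanitize_filename(name):
    # Map the curly apostrophe to a plain one, normalize the rest,
    # then drop anything that still cannot be encoded as latin-1.
    name = name.replace('\u2019', "'")
    name = unicodedata.normalize('NFKC', name)
    return name.encode('latin-1', 'ignore').decode('latin-1')

print(sanitize_filename("Capture d’écran du 2019-03-21 15-17-10.png"))
# Capture d'écran du 2019-03-21 15-17-10.png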

Related

Encoding error when opening an Excel file with python xlrd module

I have some Excel files with the .xls extension. When I use xlrd to open these files, it fails, and I do not know how to solve it.
oldbook=xlrd.open_workbook('file.xls')
oldsheet=oldbook.sheets()[0]
PS C:\Users\我是猫\Desktop\python> python -u "c:\Users\我是猫\Desktop\python\a.py"
Traceback (most recent call last):
File "c:\Users\我是猫\Desktop\python\a.py", line 64, in <module>
oldbook=xlrd.open_workbook(result)
File "E:\python\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
ragged_rows=ragged_rows,
File "E:\python\lib\site-packages\xlrd\book.py", line 117, in open_workbook_xls
bk.parse_globals()
File "E:\python\lib\site-packages\xlrd\book.py", line 1209, in parse_globals
self.handle_format(data)
File "E:\python\lib\site-packages\xlrd\formatting.py", line 538, in handle_format
unistrg = unpack_unicode(data, 2)
File "E:\python\lib\site-packages\xlrd\biffh.py", line 284, in unpack_unicode
strg = unicode(rawstrg, 'utf_16_le')
File "E:\python\lib\site-packages\xlrd\timemachine.py", line 31, in <lambda>
unicode = lambda b, enc: b.decode(enc)
File "E:\python\lib\encodings\utf_16_le.py", line 16, in decode
return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 10-11: illegal encoding
PS C:\Users\我是猫\Desktop\python>
Try overriding the encoding used:
oldbook = xlrd.open_workbook('file.xls', encoding_override="cp1252")
You can also try encoding_override="utf-8"; experiment with the encoding until you find the right one.
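If you do not know which encoding the file actually uses, a small loop like the one below (the candidate list is only a guess) lets you try several in turn. Note that single-byte encodings such as cp1252 and latin-1 can decode any byte sequence, so they will never raise here; also check that the cell text looks right.

import xlrd

for enc in ('utf-8', 'gbk', 'cp1252', 'latin-1'):
    try:
        oldbook = xlrd.open_workbook('file.xls', encoding_override=enc)
        print('Opened with encoding ' + enc)
        break
    except UnicodeDecodeError:
        # Wrong multi-byte encoding; try the next candidate.
        continue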

Pandas can't read excel encoding

I'm trying to import an Excel file into pandas. I'm using df=pd.read_excel(file_path) but it keeps giving me this error:
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
Traceback (most recent call last):
File "/Users/santanna_santanna/PycharmProjects/KlooksExplore/FindCos/FindCos_Functions.py", line 5468, in <module>
adjust_sheet(y1,y2,y3)
File "/Users/santanna_santanna/PycharmProjects/KlooksExplore/FindCos/FindCos_Functions.py", line 5130, in adjust_sheet
y1=pd.read_excel(y1)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/pandas/util/_decorators.py", line 118, in wrapper
return func(*args, **kwargs)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 230, in read_excel
io = ExcelFile(io, engine=engine)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 294, in __init__
self.book = xlrd.open_workbook(self._io)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/__init__.py", line 162, in open_workbook
ragged_rows=ragged_rows,
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 119, in open_workbook_xls
bk.get_sheets()
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 719, in get_sheets
self.get_sheet(sheetno)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 710, in get_sheet
sh.read(self)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/sheet.py", line 815, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/biffh.py", line 249, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/timemachine.py", line 30, in <lambda>
unicode = lambda b, enc: b.decode(enc)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
The file I'm trying to import is this one.
Is this an encoding problem, or is some character in the file causing it? What would be the way to solve it?
pd.read_excel('data.csv', encoding='utf-8')
@astrobiologist gave a good hint.
Since I didn't want the hassle of going into patches, the way I solved it was to open the file in OpenOffice and save it as an Excel 97 file. That finally worked.

'utf-8' decode error in tensorflow tutorial

I'm running into a bizarre problem: when I run
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/home/fqiao/development/MNIST_data/', one_hot=True)
I get:
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 199, in read_data_sets
train_images = extract_images(local_file)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 58, in extract_images
magic = _read32(bytestream)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 51, in _read32
return numpy.frombuffer(bytestream.read(4), dtype=dt)[0]
File "/usr/lib/python3.5/gzip.py", line 274, in read
return self._buffer.read(size)
File "/usr/lib/python3.5/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/usr/lib/python3.5/gzip.py", line 461, in read
if not self._read_gzip_header():
File "/usr/lib/python3.5/gzip.py", line 404, in _read_gzip_header
magic = self._fp.read(2)
File "/usr/lib/python3.5/gzip.py", line 91, in read
self.file.read(size-self._length+read)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/default/_gfile.py", line 45, in sync
return fn(self, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/default/_gfile.py", line 199, in read
return self._fp.read(n)
File "/usr/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
However, if I just run the code in input_data.py directly, everything appears to be fine:
>>> import gzip
>>> import numpy
>>> import tensorflow as tf
>>> dt = numpy.dtype(numpy.uint32).newbyteorder('>')
>>> f = tf.gfile.Open('/home/fqiao/development/MNIST_data/train-images-idx3-ubyte.gz', 'rb')
>>> bytestream = gzip.GzipFile(fileobj=f)
>>> testbytes = numpy.frombuffer(bytestream.read(4), dtype=dt)[0]
>>> testbytes
2051
Does anyone have any idea what's going on?
My system: Ubuntu 15.10 x64 python 3.5.0.
The bug has been addressed by a recent change, 555e73d: the MNIST files need to be opened in binary ('rb') mode instead of text ('r') mode.
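That is also why the interactive snippet above works: it opens the file with 'rb'. A minimal standalone illustration of the same read without TensorFlow's gfile, using the path from the question:

import gzip
import numpy

# Opening the gzip archive in binary mode means no UTF-8 decoding is attempted on the bytes.
with open('/home/fqiao/development/MNIST_data/train-images-idx3-ubyte.gz', 'rb') as f:
    dt = numpy.dtype(numpy.uint32).newbyteorder('>')
    magic = numpy.frombuffer(gzip.GzipFile(fileobj=f).read(4), dtype=dt)[0]
print(magic)  # prints 2051 for the MNIST image file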
In my case, the problem was the encoding of the data file.
Open the file in vim and execute:
:set fileencoding=utf-8
That solved the issue for me.

Python decoding errors with BeautifulSoup, requests, and lxml

I'm attempting to pull some data off a popular browser-based game, but am having trouble with some decoding errors:
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.neopets.com/")
p = BeautifulSoup(r.text)
This produces the following stack trace:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.linux-x86_64/egg/bs4/__init__.py", line 172, in __init__
File "build/bdist.linux-x86_64/egg/bs4/__init__.py", line 185, in _feed
File "build/bdist.linux-x86_64/egg/bs4/builder/_lxml.py", line 195, in feed
File "parser.pxi", line 1187, in lxml.etree._FeedParser.close (src/lxml/lxml.etree.c:87912)
File "parsertarget.pxi", line 130, in lxml.etree._TargetParserContext._handleParseResult (src/lxml/lxml.etree.c:97055)
File "lxml.etree.pyx", line 294, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:8862)
File "saxparser.pxi", line 274, in lxml.etree._handleSaxCData (src/lxml/lxml.etree.c:93385)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb1 in position 476: invalid start byte
Doing the following:
print repr(r.text[476 - 10: 476 + 10])
Produces:
u'ttp-equiv="X-UA-Comp'
I'm really not sure what the issue here is. Any help is greatly appreciated. Thank you.
.text on a response returns a decoded unicode value, but perhaps you should let BeautifulSoup do the decoding for you:
p = BeautifulSoup(r.content, from_encoding=r.encoding)
r.content returns the undecoded raw bytestring, and r.encoding is the encoding detected from the response headers.
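Putting that together with the code from the question, the complete call looks roughly like this:

import requests
from bs4 import BeautifulSoup

r = requests.get("http://www.neopets.com/")
# Hand BeautifulSoup the raw bytes plus the encoding the server declared,
# and let it do the decoding itself instead of relying on r.text.
p = BeautifulSoup(r.content, from_encoding=r.encoding)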

Python+mako Unicode problem

I am trying to read the contents of a DB table and display it as a web page using Mako and Bottle. The table has some Unicode (UTF-8) fields in it. When the template is rendered I get:
UnicodeDecodeError('ascii', 'MOTOROLA MILESTONE\xe2\x84\xa2 PLUS',
18, 19, 'ordinal not in range(128)')
With the following stack trace:
Traceback (most recent call last):
File "/workspace/web/controller/bottle.py", line 499, in handle
return handler(**args)
File "webserver/webserver.py", line 101, in download
return html_tmpl(tmpl, **kwds)
File "webserver/webserver.py", line 116, in html_tmpl
return tmpl.render(**kwds)
File "/usr/lib/python2.5/site-packages/Mako-0.3.4-py2.5.egg/mako/template.py", line 189, in render
return runtime._render(self, self.callable_, args, data)
File "/usr/lib/python2.5/site-packages/Mako-0.3.4-py2.5.egg/mako/runtime.py", line 403, in _render
_render_context(template, callable_, context, *args, **_kwargs_for_callable(callable_, data))
File "/usr/lib/python2.5/site-packages/Mako-0.3.4-py2.5.egg/mako/runtime.py", line 434, in _render_context
_exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
File "/usr/lib/python2.5/site-packages/Mako-0.3.4-py2.5.egg/mako/runtime.py", line 457, in _exec_template
callable_(context, *args, **kwargs)
File "download_android_index_html", line 41, in render_body
File "download_android_index_html", line 23, in fill_devices
File "download_android_index_html", line 68, in render_fill_devices
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 18: ordinal not in range(128)
The calling function is:
def html_tmpl(tmpl, **kwds):
    kwds['nav'] = templates_lookup.get_template('nav.html').render()
    kwds['nav_bottom'] = templates_lookup.get_template('nav_bottom.html').render()
    base_path = request.path.replace("de/", "").replace("fr/", "")
    kwds['languages'] = templates_lookup.get_template('languages.html').render(en_url=base_path, fr_url="/fr"+base_path)
    kwds['analytics'] = ''
    return tmpl.render(**kwds)
How do I go about this? I've tried:
return tmpl.render_unicode(**kwds)
and
return tmpl.render_unicode(**kwds).encode('utf-8', 'replace')
with no luck, and this answer did not help much.
Any ideas?
The problem is not that render_unicode cannot convert a Python unicode object into UTF-8; it's that somewhere a byte string exists that Mako assumes is ASCII but that actually holds non-ASCII data.
Start at the beginning: decode all incoming strings into unicode internally. You have a string input that needs fixing.
I suggest naming all variables at the boundary with a sort of Hungarian notation, perhaps rawstr_myvar and u_myvar.
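A minimal sketch of decoding at the boundary (Python 2; the variable names and the assumption that the DB returns UTF-8 byte strings are mine):

# Byte string as it comes out of the DB (contains a UTF-8 encoded trademark sign).
rawstr_name = 'MOTOROLA MILESTONE\xe2\x84\xa2 PLUS'

# Decode once, at the boundary, so only unicode objects reach Mako.
u_name = rawstr_name.decode('utf-8')

print repr(u_name)   # u'MOTOROLA MILESTONE\u2122 PLUS'

Then put u_name, not rawstr_name, into kwds before calling tmpl.render(**kwds).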
