Python+mako Unicode problem - python

I am trying to read the contents of a DB table and display them as a web page using Mako and Bottle. The table has some Unicode (UTF-8) fields in it. Rendering fails with this error:
UnicodeDecodeError('ascii', 'MOTOROLA MILESTONE\xe2\x84\xa2 PLUS', 18, 19, 'ordinal not in range(128)')
With the following stack trace:
Traceback (most recent call last):
File "/workspace/web/controller/bottle.py", line 499, in handle
return handler(**args)
File "webserver/webserver.py", line 101, in download
return html_tmpl(tmpl, **kwds)
File "webserver/webserver.py", line 116, in html_tmpl
return tmpl.render(**kwds)
File "/usr/lib/python2.5/site-packages/Mako-0.3.4-py2.5.egg/mako/template.py", line 189, in render
return runtime._render(self, self.callable_, args, data)
File "/usr/lib/python2.5/site-packages/Mako-0.3.4-py2.5.egg/mako/runtime.py", line 403, in _render
_render_context(template, callable_, context, *args, **_kwargs_for_callable(callable_, data))
File "/usr/lib/python2.5/site-packages/Mako-0.3.4-py2.5.egg/mako/runtime.py", line 434, in _render_context
_exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
File "/usr/lib/python2.5/site-packages/Mako-0.3.4-py2.5.egg/mako/runtime.py", line 457, in _exec_template
callable_(context, *args, **kwargs)
File "download_android_index_html", line 41, in render_body
File "download_android_index_html", line 23, in fill_devices
File "download_android_index_html", line 68, in render_fill_devices
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 18: ordinal not in range(128)
The calling function is:
def html_tmpl(tmpl, **kwds):
    kwds['nav'] = templates_lookup.get_template('nav.html').render()
    kwds['nav_bottom'] = templates_lookup.get_template('nav_bottom.html').render()
    base_path = request.path.replace("de/","").replace("fr/","")
    kwds['languages'] = templates_lookup.get_template('languages.html').render(en_url=base_path,fr_url="/fr"+base_path)
    kwds['analytics'] = ''
    return tmpl.render(**kwds)
How do I go about this? I've tried:
return tmpl.render_unicode(**kwds)
and
return tmpl.render_unicode(**kwds).encode('utf-8', 'replace')
with no luck, and this answer did not help much.
Any ideas?

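The problem is not that render_unicode cannot convert a Python unicode object into UTF-8. It's that a byte string (str) object exists somewhere that holds non-ASCII data, and Python assumes it is ASCII when it has to combine it with unicode.
Start at the beginning: decode all incoming strings into unicode as soon as they enter your program, and work with unicode internally. You have a string input (here, most likely a value coming from the DB) that needs fixing.
I suggest naming all variables at the boundary with a sort of Hungarian notation, perhaps rawstr_myvar for undecoded bytes and u_myvar for unicode, so it is obvious which values have already been decoded.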
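As a minimal sketch of "decode at the boundary", the helper below turns byte strings into text and leaves text untouched. The name to_text and the choice of UTF-8 are assumptions for illustration, and the sketch is written for Python 3; on the Python 2.5 setup in the traceback the same idea would be str.decode('utf-8') producing a unicode object.

```python
def to_text(value, encoding="utf-8"):
    """Return text; decode byte strings using the given encoding.

    Hypothetical helper: call it once, where data enters the program
    (for example on each DB field), not inside the template.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# The byte string from the error message, named with the suggested prefixes.
rawstr_name = b"MOTOROLA MILESTONE\xe2\x84\xa2 PLUS"
u_name = to_text(rawstr_name)

print(u_name)
```

With every template argument passed through such a helper first, the renderer only ever sees decoded text and never has to guess an encoding.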

Related

Unable to push data to xcom in airflow

from airflow.operators.python import get_current_context
context = get_current_context()
ti = context['ti']
ti.xcom_push(key="file", value = doc )
I have the above code in a task, and doc is the data that I want to pass to XCom. It's throwing the following error stack trace:
Traceback (most recent call last):
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/decorators/base.py", line 217, in execute
return_value = super().execute(context)
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/operators/python.py", line 175, in execute
return_value = self.execute_callable()
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/operators/python.py", line 192, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/opt/bitnami/airflow/dags/rover_ocr_pipeline.py", line 65, in retrieve
ti.xcom_push(key="file", value = doc )
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/utils/session.py", line 75, in wrapper
return func(*args, session=session, **kwargs)
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 2294, in xcom_push
XCom.set(
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/utils/session.py", line 72, in wrapper
return func(*args, **kwargs)
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/models/xcom.py", line 234, in set
value = cls.serialize_value(
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/models/xcom.py", line 627, in serialize_value
return json.dumps(value, cls=XComEncoder).encode("UTF-8")
File "/opt/bitnami/python/lib/python3.9/json/__init__.py", line 234, in dumps
return cls(
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/utils/json.py", line 176, in encode
return super().encode(o)
File "/opt/bitnami/python/lib/python3.9/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/opt/bitnami/python/lib/python3.9/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/opt/bitnami/airflow/venv/lib/python3.9/site-packages/airflow/utils/json.py", line 153, in default
CLASSNAME: o.__module__ + "." + o.__class__.__qualname__,
AttributeError: 'bytes' object has no attribute '__module__'
This was working until now, so I am guessing it's an issue with the Airflow version. Previously I was using 2.3.4; now I am using 2.5.0.
Airflow is running on kubernetes cluster and using airflow:2.5.0-debian-11-r11 image.
Moving this from the comments to an actual answer.
XCom serializes every value before storing it in the XCom table; in the traceback above this happens through json.dumps with XComEncoder. A bytes object is not JSON-serializable, so the encoder tried to treat it as a custom class and failed with the AttributeError shown. Converting the bytes to a normal string by base64-encoding them allowed the value to be stored in XCom.
While probably not worth the effort for just this case, this could be handled automatically by creating a custom XCom backend that detects byte strings and performs the conversion behind the scenes.
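The base64 workaround can be sketched with the standard library alone. The function names and the sample payload are hypothetical; in the task, the encoded string would be what gets passed as value to ti.xcom_push, and the consuming task would decode it again.

```python
import base64
import json


def bytes_to_xcom_str(data):
    """Encode raw bytes as an ASCII-only str that JSON can serialize."""
    return base64.b64encode(data).decode("ascii")


def xcom_str_to_bytes(text):
    """Reverse bytes_to_xcom_str in the task that pulls the value."""
    return base64.b64decode(text)


doc = b"\x89PNG\r\n\x1a\n binary payload"  # stand-in for the real document bytes

encoded = bytes_to_xcom_str(doc)
as_json = json.dumps(encoded)         # works: encoded is a plain str
restored = xcom_str_to_bytes(json.loads(as_json))
```

Base64 grows the payload by roughly a third, which matters if the documents are large, since XCom values are stored in the metadata database by default.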

Encoding problem with /usr/lib64/python3.4/http/client.py

I do not understand the error below. If I run :
python3.4 ./bug.py "salé.txt"
It is fine.
If I run : python3.4 ./bug.py "Capture d’écran du 2019-03-21 15-17-10.png"
I got this error :
Traceback (most recent call last):
File "./bug.py", line 45, in <module>
status=testB_CreateSimpleDocumentWithFile(session)
File "./bug.py", line 32, in testB_CreateSimpleDocumentWithFile
status, result = session.create_document_with_properties(path,mydoc,simple_document,properties=props,files=kk)
File "/home/karim/testatrium/nuxeolib/session.py", line 345, in create_document_with_properties
_document_properties, _ = self.encode_properties(properties, files)
File "/home/karim/testatrium/nuxeolib/session.py", line 251, in encode_properties
_names, _sizes = self.upload_files(files, batch_id=_batch_id)
File "/home/karim/testatrium/nuxeolib/session.py", line 136, in upload_files
_status, _result = self.execute_api(param=_param, headers=_headers, file_name=_name)
File "/home/karim/testatrium/nuxeolib/session.py", line 1325, in execute_api
_connection.request(method, url, headers=h2, body=data)
File "/usr/lib64/python3.4/http/client.py", line 1139, in request
self._send_request(method, url, body, headers)
File "/usr/lib64/python3.4/http/client.py", line 1179, in _send_request
self.putheader(hdr, value)
File "/usr/lib64/python3.4/http/client.py", line 1110, in putheader
values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 9: ordinal not in range(256)
The problem comes from the Right Single Quotation Mark (U+2019) in the file name. I have not managed to fix it.
Thanks for any advice.
Karim
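http.client encodes header values as Latin-1. The é in salé.txt exists in Latin-1, so that name works, but U+2019 does not, so the second name fails. Since the file name is not under your control, I would sanitize it before putting it in a header.
Any of the methods in this question would solve the problem.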
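Two possible ways to sanitize the name are sketched below, using only the standard library. Both function names are hypothetical, and which form the server accepts is an assumption that has to be checked against the API being called: the first drops anything that has no ASCII equivalent, the second keeps all information by percent-encoding the UTF-8 bytes.

```python
import unicodedata
from urllib.parse import quote, unquote


def ascii_fallback(name):
    """Strip accents and drop characters with no ASCII equivalent (lossy)."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def percent_encoded(name):
    """Percent-encode the UTF-8 bytes of the name (lossless, ASCII-only)."""
    return quote(name, safe="")


file_name = "Capture d\u2019\u00e9cran du 2019-03-21 15-17-10.png"

lossy = ascii_fallback(file_name)
lossless = percent_encoded(file_name)

# Both results can now be encoded the way http.client encodes header values.
lossy.encode("latin-1")
lossless.encode("latin-1")
```

The percent-encoded form can be turned back into the original name with urllib.parse.unquote on the receiving side, whereas the ASCII fallback cannot be reversed.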

Pandas can't read excel encoding

I'm trying to import an Excel file into Pandas. I'm using df=pd.read_excel(file_path) but it keeps giving me this error:
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
Traceback (most recent call last):
File "/Users/santanna_santanna/PycharmProjects/KlooksExplore/FindCos/FindCos_Functions.py", line 5468, in <module>
adjust_sheet(y1,y2,y3)
File "/Users/santanna_santanna/PycharmProjects/KlooksExplore/FindCos/FindCos_Functions.py", line 5130, in adjust_sheet
y1=pd.read_excel(y1)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/pandas/util/_decorators.py", line 118, in wrapper
return func(*args, **kwargs)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 230, in read_excel
io = ExcelFile(io, engine=engine)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 294, in __init__
self.book = xlrd.open_workbook(self._io)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/__init__.py", line 162, in open_workbook
ragged_rows=ragged_rows,
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 119, in open_workbook_xls
bk.get_sheets()
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 719, in get_sheets
self.get_sheet(sheetno)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 710, in get_sheet
sh.read(self)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/sheet.py", line 815, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/biffh.py", line 249, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
File "/Users/santanna_santanna/anaconda3/lib/python3.6/site-packages/xlrd/timemachine.py", line 30, in <lambda>
unicode = lambda b, enc: b.decode(enc)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
The file I'm trying to import is this one.
Is this an encoding problem, or is some character in the file causing it? What would be the way to solve it?
pd.read_excel('data.csv', encoding='utf-8')
@astrobiologist gave a good hint.
Since I didn't want the hassle of going into patches, the solution I found was to open the file in Open Office and save it as an Excel 97 file. That finally worked.

How do I trace UnicodeDecodeError: 'utf8' codec can't decode byte 0x85 in position 1950: invalid start byte

I am trying to run an app in development but I keep getting
UnicodeDecodeError: 'utf8' codec can't decode byte 0x85 in position 1950: invalid start byte
How do I trace the exact part of the code where this is coming from? I can't make sense of the error's source.
Below is the full error screen:
{u'selected': {}, u'categories': {u'ratings': ((1, u'*'), (2, u'**'), (3, u'***'), (4, u'****'), (5, u'*****')), u'genre s': <QuerySet []>, u'actors': <QuerySet []>, u'directors': <QuerySet []>}}
Internal Server Error: /movie/
Traceback (most recent call last):
File "c:\python27\lib\site-packages\django\core\handlers\exception.py", line 41, in inner
response = get_response(request)
File "c:\python27\lib\site-packages\django\core\handlers\base.py", line 187, in _get_response
response = self.process_exception_by_middleware(e, request)
File "c:\python27\lib\site-packages\django\core\handlers\base.py", line 185, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "C:\Users\Roland\Documents\Web2\myproject_env\myproject2\movies\views.py", line 70, in movie_list
return render(request, "movies/movie_list.html",context)
File "c:\python27\lib\site-packages\django\shortcuts.py", line 30, in render
content = loader.render_to_string(template_name, context, request, using=using)
File "c:\python27\lib\site-packages\django\template\loader.py", line 67, in render_to_string
template = get_template(template_name, using=using)
File "c:\python27\lib\site-packages\django\template\loader.py", line 21, in get_template
return engine.get_template(template_name)
File "c:\python27\lib\site-packages\django\template\backends\django.py", line 39, in get_template
return Template(self.engine.get_template(template_name), self)
File "c:\python27\lib\site-packages\django\template\engine.py", line 162, in get_template
template, origin = self.find_template(template_name)
File "c:\python27\lib\site-packages\django\template\engine.py", line 136, in find_template
name, template_dirs=dirs, skip=skip,
File "c:\python27\lib\site-packages\django\template\loaders\base.py", line 38, in get_template
contents = self.get_contents(origin)
File "c:\python27\lib\site-packages\django\template\loaders\filesystem.py", line 29, in get_contents
return fp.read()
File "C:\Users\Roland\Documents\Web2\myproject_env\lib\codecs.py", line 314, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x85 in position 1950: invalid start byte
Reading a long traceback like this can be tricky, for sure.
First you need to know how to read the lines of the traceback. They come in pairs: a description of where to find the code, and a snippet of the code. Like this, the last pair of lines:
File "C:\Users\Roland\Documents\Web2\myproject_env\lib\codecs.py", line 314, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
This tells you which file, line number, and function to look in to find the line of code that was involved. This last line is the one that raised the actual error. However, it's not a line of your code - it's from a library. It's pretty rare that you need to read the code of libraries to debug exceptions in your code. So we look at the next line up:
File "c:\python27\lib\site-packages\django\template\loaders\filesystem.py", line 29, in get_contents
return fp.read()
This is the line that called the line we just looked at. It's also not your code - it's from django. So you keep looking up, line by line, until you find a line from your code. Here it is:
File "C:\Users\Roland\Documents\Web2\myproject_env\myproject2\movies\views.py", line 70, in movie_list
return render(request, "movies/movie_list.html",context)
This isn't the line that caused the exception, but it's the last part of the code that you're responsible for that ultimately led to the exception. So this is most likely where you'll need to make a change. In fact, looking further up the traceback, this is the only line you have control over - the rest are all in libraries. So this is definitely where you'll need to make the change. Now you can narrow your search to find more information: you need to call render with different arguments, somehow. I'd search for django render UnicodeDecodeError. And in fact, the first hit is another Stack Overflow question with exactly the same problem! Unfortunately the answer isn't very helpful. But if you keep searching, you'll find one that is.
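The traceback also shows what was being decoded: the failure happens while Django's filesystem loader reads the template, so the byte at position 1950 is in the template file itself. A small sketch for locating such a byte is below. The function name and the sample data are hypothetical; for the real case the bytes would come from opening movies/movie_list.html in binary mode.

```python
def find_invalid_utf8(data):
    """Return (position, offending bytes) for the first byte sequence
    that is not valid UTF-8, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.end]
    return None


# Stand-in for the template contents; 0x85 is not valid UTF-8 on its own.
sample = b"<p>Coming soon\x85</p>"

result = find_invalid_utf8(sample)
print(result)

# 0x85 is the ellipsis character in Windows-1252, which hints that the file
# may have been saved in that encoding instead of UTF-8.
print(b"\x85".decode("cp1252"))
```

If the check confirms a non-UTF-8 byte, re-saving the template as UTF-8 in the editor is one likely fix, though that is an inference from the traceback and not something stated in the thread.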

Python SUDS unicode decode error returned from Webservice

I am attempting to use a Webservice created by one of our developers that allows us to upload files into the system, within certain restrictions.
Using SUDS, I get the following information:
Suds ( https://fedorahosted.org/suds/ ) version: 0.4 GA build: R699-20100913
Service ( ConnectToEFS ) tns="http://tempuri.org/"
Prefixes (3)
ns0 = "http://schemas.microsoft.com/2003/10/Serialization/"
ns1 = "http://schemas.microsoft.com/Message"
ns2 = "http://tempuri.org/"
Ports (1):
(BasicHttpBinding_IConnectToEFS)
Methods (2):
CreateContentFolder(xs:string FileCode, xs:string FolderName, xs:string ContentType, xs:string MetaDataXML, )
UploadFile(ns1:StreamBody FileByteStream, )
Types (4):
ns1:StreamBody
ns0:char
ns0:duration
ns0:guid
My method to using UploadFile is as follows:
def webserviceUploadFile(self, targetLocation, fileName, fileSource):
    fileSource = './test_files/' + fileSource
    ntlm = WindowsHttpAuthenticated(username=uname, password=upass)
    client = Client(webservice_url, transport=ntlm)
    client.set_options(soapheaders={'TargetLocation':targetLocation, 'FileName': fileName})
    body = client.factory.create('AIRDocument')
    body_file = open(fileSource, 'rb')
    body_data = body_file.read()
    body.FileByteStream = body_data
    return client.service.UploadFile(body)
Running this gets me the following result:
Traceback (most recent call last):
File "test_cases.py", line 639, in test_upload_file_invalid_extension
result_string = self.HM.webserviceUploadFile('9999', 'AD-1234-5424__44.exe',
'test_data.pdf')
File "test_cases.py", line 81, in webserviceUploadFile
return client.service.UploadFile(body)
File "build\bdist.win32\egg\suds\client.py", line 542, in __call__
return client.invoke(args, kwargs)
File "build\bdist.win32\egg\suds\client.py", line 595, in invoke
soapenv = binding.get_message(self.method, args, kwargs)
File "build\bdist.win32\egg\suds\bindings\binding.py", line 120, in get_message
content = self.bodycontent(method, args, kwargs)
File "build\bdist.win32\egg\suds\bindings\document.py", line 63, in bodycontent
p = self.mkparam(method, pd, value)
File "build\bdist.win32\egg\suds\bindings\document.py", line 105, in mkparam
return Binding.mkparam(self, method, pdef, object)
File "build\bdist.win32\egg\suds\bindings\binding.py", line 287, in mkparam
return marshaller.process(content)
File "build\bdist.win32\egg\suds\mx\core.py", line 62, in process
self.append(document, content)
File "build\bdist.win32\egg\suds\mx\core.py", line 75, in append
self.appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 102, in append
appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 243, in append
Appender.append(self, child, cont)
File "build\bdist.win32\egg\suds\mx\appender.py", line 182, in append
self.marshaller.append(parent, content)
File "build\bdist.win32\egg\suds\mx\core.py", line 75, in append
self.appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 102, in append
appender.append(parent, content)
File "build\bdist.win32\egg\suds\mx\appender.py", line 198, in append
child.setText(tostr(content.value))
File "build\bdist.win32\egg\suds\sax\element.py", line 251, in setText
self.text = Text(value)
File "build\bdist.win32\egg\suds\sax\text.py", line 43, in __new__
result = super(Text, cls).__new__(cls, *args, **kwargs)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 10: ordinal not in range(128)
After much research and talking with the developer of the webservice, I modified the body_data = body_file.read() into body_data = body_file.read().decode("UTF-8") which gets me this error:
Traceback (most recent call last):
File "test_cases.py", line 639, in test_upload_file_invalid_extension
result_string = self.HM.webserviceUploadFile('9999', 'AD-1234-5424__44.exe', 'test_data.pdf')
File "test_cases.py", line 79, in webserviceUploadFile
body_data = body_file.read().decode("utf-8")
File "C:\python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe2 in position 10: invalid continuation byte
Which is less than helpful.
After more research into the problem, I tried adding errors='ignore' to the UTF-8 decode, and this was the result:
<TransactionDescription>Error in INTL-CONF_France_PROJ_MA_126807.docx: An exception has been thrown when reading the stream.. Inner Exception: System.Xml.XmlException: The byte 0x03 is not valid at this location. Line 1, position 318.
at System.Xml.XmlExceptionHelper.ThrowXmlException(XmlDictionaryReader reader, String res, String arg1, String arg2, String arg3)
at System.Xml.XmlUTF8TextReader.Read()
at System.ServiceModel.Dispatcher.StreamFormatter.MessageBodyStream.Exhaust(XmlDictionaryReader reader)
at System.ServiceModel.Dispatcher.StreamFormatter.MessageBodyStream.Read(Byte[] buffer, Int32 offset, Int32 count). Source: System.ServiceModel</TransactionDescription>
Which pretty much stumps me on what to do. Based on the stack trace returned by the webservice, it looks like it wants UTF-8, but I can't seem to get the data to the webservice without Python or SUDS throwing a fit, or without ignoring problems in the encoding. The system I'm working on only takes in Microsoft Office type files (doc, xls, and the like), PDFs, and TXT files, so using a format where I have more control over the encoding is not an option. I also tried detecting the encoding used by the sample PDF and the sample DOCX. The suggested encodings (Latin-1, ISO8859-x, and several Windows code pages) were all accepted by Python and SUDS, but not by the webservice.
Also note that the example shown comes from a test for an invalid extension. The same error occurs in what should be a test of a successful upload, which is really the only time the final stack trace shows up.
These files are binary data, not text, so decoding them with any text encoding will either fail or corrupt them. Use base64.b64encode(body_file.read()) instead; it returns the file contents as a base64 string, so the value you put in the request is a plain string.
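A minimal sketch of that suggestion follows. The helper name and the temporary sample file are hypothetical, and whether the webservice expects base64 in FileByteStream is an assumption that the service's developer would need to confirm.

```python
import base64
import os
import tempfile


def read_file_as_base64(path):
    """Read a file in binary mode and return its contents as ASCII text."""
    with open(path, "rb") as handle:
        return base64.b64encode(handle.read()).decode("ascii")


# Stand-in for a real .docx or .pdf: bytes that are not valid UTF-8.
payload = b"PK\x03\x04\xe2\x28binary"
with tempfile.NamedTemporaryFile(delete=False, suffix=".docx") as tmp:
    tmp.write(payload)
    sample_path = tmp.name

encoded = read_file_as_base64(sample_path)
os.remove(sample_path)

print(encoded)
```

In the upload method above, the encoded string would take the place of body_data, so SUDS only has to marshal plain ASCII text.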
