python appengine unicodeencodeerror on search api snippeted results - python

I'm crawling pages and indexing them with the App Engine Search API (Spanish and Catalan pages, with accented characters). I can run searches and build a page of results.
The problem arises when I use a query object with snippeted_fields, as it always raises a UnicodeEncodeError:
File "/home/otger/python/jobs-gae/src/apps/search/handlers/results.py", line 82, in find_documents
return index.search(query_obj)
File "/opt/google_appengine_1.7.6/google/appengine/api/search/search.py", line 2707, in search
apiproxy_stub_map.MakeSyncCall('search', 'Search', request, response)
File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_stub_map.py", line 94, in MakeSyncCall
return stubmap.MakeSyncCall(service, call, request, response)
File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_stub_map.py", line 320, in MakeSyncCall
rpc.CheckSuccess()
File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_rpc.py", line 156, in _WaitImpl
self.request, self.response)
File "/opt/google_appengine_1.7.6/google/appengine/ext/remote_api/remote_api_stub.py", line 200, in MakeSyncCall
self._MakeRealSyncCall(service, call, request, response)
File "/opt/google_appengine_1.7.6/google/appengine/ext/remote_api/remote_api_stub.py", line 234, in _MakeRealSyncCall
raise pickle.loads(response_pb.exception())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 52: ordinal not in range(128)
I've found a similar question on Stack Overflow, "GAE Full Text Search development console UnicodeEncodeError", but it says this was a bug fixed in 1.7.0. I get the same error with both 1.7.5 and 1.7.6.
When indexing pages I add two fields: description and description_ascii. If I generate snippets for description_ascii, it works perfectly.
Is it possible to generate snippets of non-ASCII content on dev_appserver?

I think this is a bug; I reported it as a new defect: https://code.google.com/p/googleappengine/issues/detail?id=9335.
Temporary solution for the dev server: locate the google.appengine.api.search module (search.py) and patch the function _DecodeUTF8 by adding an inline if, like this:
def _DecodeUTF8(pb_value):
    """Decodes a UTF-8 encoded string into unicode."""
    if pb_value is not None:
        return pb_value.decode('utf-8') if not isinstance(pb_value, unicode) else pb_value
    return None
Workaround: until the issue is solved, implement the snippet functionality yourself. Assuming the field the snippet is based on is called snippet_base:
query = search.Query(query_string=query_string,
                     options=search.QueryOptions(
                         ...
                         returned_fields=[... 'snippet_base' ...]
                     ))
results = search.Index(name="<index-name>").search(query)
if results:
    for res in results.results:
        res.snippet = some_snippeting_function(res.field("snippet_base"))
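The workaround above leaves some_snippeting_function undefined. A minimal sketch of what such a helper could look like follows; the name make_snippet, its parameters, and the "..." marker are all hypothetical and not part of the Search API. It only cuts a window of text around the first match and does no highlighting or stemming.

```python
# -*- coding: utf-8 -*-
import re


def make_snippet(text, query, width=40, marker=u"..."):
    """Return an excerpt of `text` around the first match of `query`.

    Hypothetical helper for illustration; not part of the App Engine SDK.
    """
    if not text:
        return u""
    # Case-insensitive literal search; re.escape keeps query characters literal.
    match = re.search(re.escape(query), text, re.IGNORECASE | re.UNICODE)
    if match is None:
        # No hit: fall back to the beginning of the text.
        excerpt = text[:2 * width]
        return excerpt + (marker if len(text) > len(excerpt) else u"")
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    prefix = marker if start > 0 else u""
    suffix = marker if end < len(text) else u""
    return prefix + text[start:end] + suffix
```

The helper expects the text itself, so in the loop above you would probably pass the field's string value (I believe a Search API field exposes it as .value) rather than the field object.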

Related

I can't use unicode characters in EMAIL_PASSWORD with django

I am using django 3.0.8 and in my settings.py, I've specified the password for my e-mail account using EMAIL_PASSWORD = '...'. My password contains umlauts and upon manually sending a mail from the shell I get this error:
>>> from django.core.mail import send_mail
>>> send_mail('Django mail', 'This e-mail was sent with django', ..., fail_silently=False)
Traceback (most recent call last):
File "/usr/lib/python3.6/code.py", line 91, in runcode
exec(code, self.locals)
File "<console>", line 1, in <module>
File "/home/admin/.local/lib/python3.6/site-packages/django/core/mail/__init__.py", line 60, in send_mail
return mail.send()
File "/home/admin/.local/lib/python3.6/site-packages/django/core/mail/message.py", line 276, in send
return self.get_connection(fail_silently).send_messages([self])
File "/home/admin/.local/lib/python3.6/site-packages/django/core/mail/backends/smtp.py", line 102, in send_messages
new_conn_created = self.open()
File "/home/admin/.local/lib/python3.6/site-packages/django/core/mail/backends/smtp.py", line 69, in open
self.connection.login(self.username, self.password)
File "/usr/lib/python3.6/smtplib.py", line 721, in login
initial_response_ok=initial_response_ok)
File "/usr/lib/python3.6/smtplib.py", line 630, in auth
response = encode_base64(initial_response.encode('ascii'), eol='')
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 19: ordinal not in range(128)
If I remove the umlaut everything works as it should. Apparently smtplib manually encodes with ascii and I don't know how to tell it not to. Any ideas?
This is a known issue in smtplib.
As of July 2020 it is still open, with the fix awaiting review.
Bottom line: the problem is known, and there is no official solution yet.
In the meantime, you have a few options:
Manually patch /usr/lib/python3.6/smtplib.py the way the proposed fix does; it is very simple, just replace ascii with utf-8 in 3 places. I haven't tested this myself, but other users report that it works. Back up the file first, or use a virtual env with a patched smtplib.py.
Use an alternative to smtplib. I don't know of one in the Python standard library, but you can always call a command-line utility such as mail through the subprocess module, or use a web service that sends mail through a REST API (there are plenty, but this will not work if your mail server sits on a local company network or similar).
Change your password and forget about the problem.
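To illustrate what the ascii-to-utf-8 patch changes, here is a small standalone sketch that mimics the failing step from the traceback (encode the credential, then base64 it) without touching smtplib. The password value is made up, and whether a given mail server accepts UTF-8 encoded credentials is an assumption you would need to check.

```python
import base64

password = "pässwort"  # hypothetical password containing an umlaut

# What the unpatched code effectively does: encode as ASCII, which fails.
ascii_failed = False
try:
    password.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii encoding fails:", exc)

# What the patched code would do instead: encode as UTF-8, then base64.
token = base64.b64encode(password.encode("utf-8")).decode("ascii")
print("base64 of UTF-8 bytes:", token)
```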

DjangoUnicodeDecodeError during calling form.errors with EmailField

I am porting my site to Django 1.9 and don't know how to resolve this issue correctly.
In my form I have a plain EmailField from django forms. If validation fails, there should be a message about it (I pass {'form_errors': form.errors} to the context to work with it).
But in that case django returns
DjangoUnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in
position 0: ordinal not in range(128). You passed in
()
In django.core.validators there is a validator for it:
@deconstructible
class EmailValidator(object):
    message = _('Enter a valid email address.')
    ...
If I change message to message = 'error', everything works fine.
So, the question: how can I fix this without editing Django's files?
The problem is in accessing form.errors; the error is raised even if I only print it (print form.errors). Errors for other fields (IntegerField, URLField, for example) work fine; the problem occurs only with EmailField.
In the view, the process currently looks like this:
from django.http import JsonResponse
...
if form.is_valid():
    ...
else:
    return JsonResponse({'form_errors': form.errors})
last traceback is:
File "/path/views.py", line 331, in custom_form_post
response = JsonResponse({'form_errors': form.errors})
File "/path/.env/local/lib/python2.7/site-packages/django/http/response.py", line 505, in __init__
data = json.dumps(data, cls=encoder, **json_dumps_params)
File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
sort_keys=sort_keys, **kw).encode(obj)
File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "/path/.env/lib/python2.7/_abcoll.py", line 581, in __iter__
v = self[i]
File "/path/.env/local/lib/python2.7/site-packages/django/forms/utils.py", line 146, in __getitem__
return list(error)[0]
File "/path/.env/local/lib/python2.7/site-packages/django/core/exceptions.py", line 165, in __iter__
yield force_text(message)
File "/path/.env/local/lib/python2.7/site-packages/django/utils/encoding.py", line 88, in force_text
raise DjangoUnicodeDecodeError(s, *e.args)
DjangoUnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128). You passed in <django.utils.functional.__proxy__ object at 0x40a6a90c> (<class 'django.utils.functional.__proxy__'>)
Well, the problem is in the translations; in this case, the Russian localization.
I have no idea why the translation from Django's own localization files failed.
But for anyone with a similar problem:
Create a locale file if you don't have one yet (https://docs.djangoproject.com/en/1.9/topics/i18n/translation/#localization-how-to-create-language-files)
Add these lines to django.po:
msgid "Enter a valid email address."
msgstr "Введите правильный адрес электронной почты." (or whatever translation you need)
Compile (django-admin compilemessages)

Unicode String in urllib.request [duplicate]

The short version: I have a variable s = 'bär'. I need to convert s to ASCII so that s = 'b%C3%A4r'.
Long version:
I'm using urllib.request.urlopen() to read an mp3 pronunciation file from URL. This has worked very well, except I ran into a problem because the URLs often contain unicode characters. For example, the German "Bär". The full URL is https://d7mj4aqfscim2.cloudfront.net/tts/de/token/bär. Indeed, typing this into Chrome as a URL works, and navigates me to the mp3 file without problems. However, feeding this same URL to urllib creates a problem.
I determined this was a unicode problem because the stack-trace reads:
Traceback (most recent call last):
File "importer.py", line 145, in <module>
download_file(tuple[1], tuple[0], ".mp3")
File "importer.py", line 81, in download_file
with urllib.request.urlopen(url) as in_stream, open(to_fname+ext, 'wb') as out_file: #`with object as name:` safely __enter__() and __exit__() the runtime of object. `as` assigns `name` as referring to the object `object`.
File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 162, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 465, in open
response = self._open(req, data)
File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 483, in _open
'_open', req)
File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 443, in _call_chain
result = func(*args)
File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1283, in https_open
context=self._context, check_hostname=self._check_hostname)
File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1240, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1083, in request
self._send_request(method, url, body, headers)
File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1118, in _send_request
self.putrequest(method, url, **skips)
File "C:\Users\quesm\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 960, in putrequest
self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 19: ordinal not in range(128)
... and other than the obvious UnicodeEncodeError, I can see it's trying to encode() to ASCII.
Interestingly, when I copied the URL from Chrome (instead of simply typing it into the Python interpreter), it translated the bär to b%C3%A4r. When I feed this to urllib.request.urlopen(), it processes fine, because all of these characters are ASCII. So my goal is to make this conversion within my program. I tried to get my original string to the unicode equivalent, but unicodedata.normalize() in all of its variants didn't work; further, I'm not sure how to store the Unicode as ASCII, given that Python 3 stores all strings as Unicode and thus makes no attempt to convert the text.
Use urllib.parse.quote:
>>> urllib.parse.quote('bär')
'b%C3%A4r'
>>> urllib.parse.urljoin('https://d7mj4aqfscim2.cloudfront.net/tts/de/token/',
... urllib.parse.quote('bär'))
'https://d7mj4aqfscim2.cloudfront.net/tts/de/token/b%C3%A4r'
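If you only have the complete URL rather than the last path segment, one possible approach is to split it, quote the path and query, and reassemble it. The sketch below assumes the host name itself is ASCII (an internationalized domain would need IDNA encoding, which is not handled here); ascii_url is a hypothetical helper name. The "%" character is kept as safe so that a URL that is already percent-encoded is not encoded twice.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def ascii_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper; assumes the host part is already plain ASCII.
    """
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so existing escapes survive.
    path = quote(parts.path, safe="/%")
    # Keep the usual query delimiters intact.
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

The result can then be passed to urllib.request.urlopen() in place of the original string.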

Nested text encodings in suds requests

Environment: Python 2.7.4 (partly on Windows, partly on Linux, see below), suds (SVN HEAD with minor modifications)
I need to call into a web service that takes a single argument, which is an XML string (yes, I know…), i.e. the request is declared in the WSDL with the following type:
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="actionString" type="s:string"/>
</s:sequence>
</s:complexType>
I'm using cElementTree to construct this inner XML document, then I pass it as the only parameter to the client.service.ProcessAction(request) method that suds generates.
For a while, this worked okay:
root = ET.Element(u'ActionCommand')
value = ET.SubElement(root, u'value')
value.text = saxutils.escape(complex_value)
request = u'<?xml version="1.0" encoding="utf-8"?>\n' + ET.tostring(root, encoding='utf-8')
client.service.ProcessAction(request)
I had added the saxutils.escape call at some point to fix my first encoding problems, without really understanding why I need it or what difference it makes.
Now (possibly due to the first occurrence of the pound sign), I suddenly get the following exception:
Traceback (most recent call last):
File "/app/module.py", line 135, in _process_web_service_call
request = u'<?xml version="1.0" encoding="utf-8"?>\n' + ET.tostring(root, encoding='utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 137: ordinal not in range(128)
Position 137 here corresponds to the location of the special characters inside the inner XML request. Apparently, cElementTree.tostring() returns a 'str', not a 'unicode', even when an encoding is given. So Python tries to decode this str into unicode (why with 'ascii'?) so that it can concatenate it with the unicode literal. This fails (of course, because the str is actually encoded in UTF-8, not ASCII).
So I figured, fine, I'll decode it to unicode myself then:
root = ET.Element(u'ActionCommand')
value = ET.SubElement(root, u'value')
value.text = saxutils.escape(complex_value)
request_encoded_str = ET.tostring(root, encoding='utf-8')
request_unicode = request_encoded_str.decode('utf-8')
request = u'<?xml version="1.0" encoding="utf-8"?>\n' + request_unicode
client.service.ProcessAction(request)
Except that now, it blows up inside suds, which tries to decode the outer XML request for some reason:
Traceback (most recent call last):
File "/app/module.py", line 141, in _process_web_service_call
raw_response = client.service.ProcessAction(request)
File "/app/.heroku/python/lib/python2.7/site-packages/suds/client.py", line 542, in __call__
return client.invoke(args, kwargs)
File "/app/.heroku/python/lib/python2.7/site-packages/suds/client.py", line 602, in invoke
result = self.send(soapenv)
File "/app/.heroku/python/lib/python2.7/site-packages/suds/client.py", line 643, in send
reply = transport.send(request)
File "/app/.heroku/python/lib/python2.7/site-packages/suds/transport/https.py", line 64, in send
return HttpTransport.send(self, request)
File "/app/.heroku/python/lib/python2.7/site-packages/suds/transport/http.py", line 118, in send
return self.invoke(request)
File "/app/.heroku/python/lib/python2.7/site-packages/suds/transport/http.py", line 153, in invoke
u2response = urlopener.open(u2request, timeout=tm)
File "/app/.heroku/python/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/app/.heroku/python/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/app/.heroku/python/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/app/.heroku/python/lib/python2.7/urllib2.py", line 1222, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/app/.heroku/python/lib/python2.7/urllib2.py", line 1181, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/app/.heroku/python/lib/python2.7/httplib.py", line 973, in request
self._send_request(method, url, body, headers)
File "/app/.heroku/python/lib/python2.7/httplib.py", line 1007, in _send_request
self.endheaders(body)
File "/app/.heroku/python/lib/python2.7/httplib.py", line 969, in endheaders
self._send_output(message_body)
File "/app/.heroku/python/lib/python2.7/httplib.py", line 827, in _send_output
msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 565: ordinal not in range(128)
The position 565 here again corresponds with the same character as above, except this time it's the location of my inner XML request embedded into the outer XML request (SOAP) created by suds.
I'm confused. Can anyone help me out of this mess? :)
To make matters worse, all of this only happens on the server under Linux. None of these raises an exception in my development environment on Windows. (Bonus points for an explanation as to why that is, just because I'm curious. I suspect it has to do with a different default encoding.) However, they all are not accepted by the server. What does work on Windows is if I drop the saxutils.escape and then hand a proper unicode object to suds. This however still results in the same UnicodeDecodeError on Linux.
Update: I started debugging this on Windows (where it works fine), and in the line 827 of httplib.py, it indeed tries to concatenate the unicode object msg (containing the HTTP headers) and the str object message_body, leading to the implicit unicode decoding with the incorrect encoding. I guess it just happens to not fail on Windows for some reason. I don't understand why suds tries to send a str object when I put a unicode object in at the top.
This turned out to be more than absurd. I still understand only small parts of the whole problem and situation, but I managed to solve it.
So let's trace it back. My last attempt was the sanest one, I believe, so let's start there:
msg += message_body
That line in Python's httplib.py tries to concatenate a unicode and a str object, which leads to an implicit .decode('ascii') of the str, even though the str is UTF8-encoded. Why is that? Because msg is a unicode object.
msg = "\r\n".join(self._buffer)
self._buffer is a list of HTTP headers. Inspecting that, only one header in there was unicode, 'infecting' the resulting string: the action and endpoint.
And there's the problem: I'm using unicode_literals from __future__ (makes it more future-proof, right? right???) and I'm passing my own endpoint into suds.
By just doing an .encode('utf-8') on the URL, all my problems went away. Even the whole saxutils.escape was no longer needed (even though it weirdly also didn't hurt).
tl;dr: make sure you're not passing any unicode objects anywhere into httplib or suds, I guess.
root = ET.Element(u'ActionCommand')
value = ET.SubElement(root, u'value')
value.text = complex_value
request = ET.tostring(root, encoding='utf-8').decode('utf-8')
client.service.ProcessAction(request)

UnicodeEncodeError:'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

I am new to Python. Today, while writing a search function, I ran into an error. I use the SQLAlchemy ORM, and in my function I pass a Chinese word as the keyword. The HTML page gives me a UnicodeEncodeError at /user/search: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256).
My code looks like this:
def user_search(request):
    name = request.GET.get('name').strip()
    user_list = list()
    if name:
        user_list = User.get_by_name(name)

class User(object):
    @classmethod
    def get_by_name(cls, name):
        return DBSession.query(cls).filter(cls.name==name)
and the Traceback is here:
Traceback:
File "/usr/local/lib/python2.6/dist-packages/django/core/handlers/base.py" in get_response
111. response = callback(request, *callback_args, **callback_kwargs)
File "/home/jiankong/git/admin-server/lib/decorators.py" in wrapper
75. return func(request, *args, **kwargs)
File "/home/jiankong/git/admin-server/lib/decorators.py" in wrapper
39. output = function(request, *args, **kwargs)
File "/home/jiankong/git/admin-server/apps/user/user_views.py" in user_search
47. users = jump_page(paginator, page)
File "/home/jiankong/git/admin-server/apps/user/utils.py" in jump_page
92. return paginator.page(1)
File "/usr/local/lib/python2.6/dist-packages/django/core/paginator.py" in page
37. number = self.validate_number(number)
File "/usr/local/lib/python2.6/dist-packages/django/core/paginator.py" in validate_number
28. if number > self.num_pages:
File "/usr/local/lib/python2.6/dist-packages/django/core/paginator.py" in _get_num_pages
60. if self.count == 0 and not self.allow_empty_first_page:
File "/usr/local/lib/python2.6/dist-packages/django/core/paginator.py" in _get_count
48. self._count = self.object_list.count()
File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.8.1dev-py2.6-linux-i686.egg/sqlalchemy/orm/query.py" in count
2414. return self.from_self(col).scalar()
File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.8.1dev-py2.6-linux-i686.egg/sqlalchemy/orm/query.py" in scalar
2240. ret = self.one()
File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.8.1dev-py2.6-linux-i686.egg/sqlalchemy/orm/query.py" in one
2209. ret = list(self)
File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.8.1dev-py2.6-linux-i686.egg/sqlalchemy/orm/query.py" in __iter__
2252. return self._execute_and_instances(context)
File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.8.1dev-py2.6-linux-i686.egg/sqlalchemy/orm/query.py" in _execute_and_instances
2267. result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.8.1dev-py2.6-linux-i686.egg/sqlalchemy/engine/base.py" in execute
664. params)
File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.8.1dev-py2.6-linux-i686.egg/sqlalchemy/engine/base.py" in _execute_clauseelement
764. compiled_sql, distilled_params
File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.8.1dev-py2.6-linux-i686.egg/sqlalchemy/engine/base.py" in _execute_context
871. context)
File "/usr/local/lib/python2.6/dist-packages/SQLAlchemy-0.8.1dev-py2.6-linux-i686.egg/sqlalchemy/engine/default.py" in do_execute
324. cursor.execute(statement, parameters)
File "/usr/local/lib/python2.6/dist-packages/MySQL_python-1.2.4-py2.6-linux-i686.egg/MySQLdb/cursors.py" in execute
183. query = query % db.literal(args)
File "/usr/local/lib/python2.6/dist-packages/MySQL_python-1.2.4-py2.6-linux-i686.egg/MySQLdb/connections.py" in literal
264. return self.escape(o, self.encoders)
File "/usr/local/lib/python2.6/dist-packages/MySQL_python-1.2.4-py2.6-linux-i686.egg/MySQLdb/connections.py" in unicode_literal
202. return db.literal(u.encode(unicode_literal.charset))
Exception Type: UnicodeEncodeError at /user/search
Exception Value: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)
When I hit the error, I ran a test in the Python shell, and there it worked fine. The code is here:
from apps.user.models import User
user = User.get_by_name('某人').first()
print user
print user.name
某人
So what can I do to make it work in my HTML page? Much appreciated!
I'm assuming that you're using MySQL with the MySQLdb driver here.
The default encoding used by the MySQLdb driver is latin-1, which does not support your character set. You'll need to use UTF-8 (or others, but UTF-8 is the most common) to be able to communicate with your database through MySQLdb (see http://docs.sqlalchemy.org/en/rel_0_8/dialects/mysql.html#unicode).
To do that, create your engine with the following line:
create_engine('mysql+mysqldb://USER:@SERVER:PORT/DB?charset=utf8', encoding='utf-8')
You can also construct your engine URL using the sqlalchemy.engine.url.URL class and pass it to the create_engine function. I find this useful when your settings live in a config file.
from sqlalchemy import create_engine
import sqlalchemy.engine.url as url

# cfg is assumed to be a dict of settings loaded from your config file
engine_url = url.URL(
    drivername='mysql+' + cfg['MYSQL_PYTHON_DRIVER'],
    host=cfg['MYSQL_HOST'],
    port=cfg['MYSQL_PORT'],
    username=cfg['MYSQL_USER'],
    password=cfg['MYSQL_PWD'],
    database=cfg['MYSQL_DB'],
    query={'charset': 'utf8'}
)
db = create_engine(engine_url, encoding='utf-8')
Hope that helps.
Based on your stack trace, you're using MySQL-Python with unicode encoding turned on, since it's doing an encode. So you likely need to specify a compatible encoding (note that these are all settings used by the MySQLdb DBAPI; SQLAlchemy just passes them through):
create_engine('mysql+mysqldb:///mydb?charset=utf8&use_unicode=1')
http://docs.sqlalchemy.org/en/rel_0_8/dialects/mysql.html#unicode
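The last frame of the traceback, u.encode(unicode_literal.charset), can be reproduced without a database. The sketch below is only an illustration of why the charset matters, not driver code: it encodes the same name from the question with latin-1 (the default mentioned above) and with UTF-8.

```python
# -*- coding: utf-8 -*-
name = u"某人"  # the keyword used in the question

# With the driver's default charset the encode step fails.
error = None
try:
    name.encode("latin-1")
except UnicodeEncodeError as exc:
    error = str(exc)
print(error)

# With charset=utf8 the same step succeeds and round-trips cleanly.
encoded = name.encode("utf-8")
print(encoded.decode("utf-8") == name)
```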
