weird python encoding error with elasticsearch - python

I recently ran into an encoding-related issue with Python and Elasticsearch that I find weird. I have a script that downloads some data from an API. It works fine on localhost, but when I run the same script on my server it throws an exception:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 143: ordinal not in range(128)
I understand this is because of some non-ASCII characters present in the downloaded data. But why does the same script run on my local machine without throwing any error?
I cannot post the complete code because it is part of a bigger application and would not make sense on its own, apart from this:
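from elasticsearch.helpers import bulk

# ES is the Elasticsearch client instance, created elsewhere in the application.
def save_data(results, index, dimension):
    def actionGenerator(results, index_name, doc_type_name):
        # Wrap every parsed row in a bulk action for the given index and type.
        for i in results:
            yield {
                "_index": index_name,
                "_type": doc_type_name,
                "_source": i
            }
    res = bulk(ES, actionGenerator(results, index, dimension), chunk_size=15000)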
Is this error caused by the ES client? In any case, it works just fine on localhost.
Judging by the traceback, I guess the exception is thrown here:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/srv/fbweaver/src/main/tasks/task.py", line 189, in data_dump
parse_csv_file(FILE_NAME,REPORT_DOWNLOAD_PATH,breakdown.capitalize())
File "/srv/fbweaver/src/main/tasks/task.py", line 128, in parse_csv_file
save_data(result,'fb',breakdown)
File "/srv/fbweaver/src/main/tasks/task.py", line 22, in save_data
res = bulk(ES,actionGenerator(results,index,dimension),chunk_size=15000)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 190, in bulk
for ok, item in streaming_bulk(client, actions, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 162, in streaming_bulk
for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 87, in _process_bulk_chunk
resp = client.bulk('\n'.join(bulk_actions) + '\n', **kwargs)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 143: ordinal not in range(128)
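A note on the traceback above: the failing line is '\n'.join(bulk_actions), and in Python 2 joining byte strings with unicode strings makes the interpreter decode the byte strings with the ASCII codec, which fails on a byte such as 0xe0. One possible way around it, sketched here under the assumption that the parsed rows are dicts whose keys or values may still be undecoded UTF-8 bytes, is to normalize everything to text before the rows reach bulk. The helper names to_text and action_generator and the utf-8 default are illustrative and not part of the elasticsearch client. The sketch is written for Python 3; on Python 2 the same idea applies with str/unicode in place of bytes/str.

```python
def to_text(value, encoding="utf-8"):
    """Recursively convert byte strings inside a document to text."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {to_text(k, encoding): to_text(v, encoding) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_text(item, encoding) for item in value]
    return value


def action_generator(results, index_name, doc_type_name):
    """Yield bulk actions whose _source contains only text, never raw bytes."""
    for row in results:
        yield {
            "_index": index_name,
            "_type": doc_type_name,
            "_source": to_text(row),
        }
```

If the downloaded data is not actually UTF-8, the encoding argument would have to match whatever the API really returns.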

Related

Django server won't run

I just tried to start a Django project on Win7 (x64), but I ran into the following issue:
$ python manage.py runserver
Performing system checks...
System check identified no issues (0 silenced).
March 24, 2018 - 14:24:08
Django version 1.11.3, using settings 'superlists.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CTRL-BREAK.
Unhandled exception in thread started by <function check_errors.<locals>.wrapper at 0x035BD978>
Traceback (most recent call last):
File "C:\Users\alesya\.virtualenvs\superlists\lib\site-packages\django\utils\autoreload.py", line 227, in wrapper
fn(*args, **kwargs)
File "C:\Users\alesya\.virtualenvs\superlists\lib\site-packages\django\core\management\commands\runserver.py", line 149, in inner_run
ipv6=self.use_ipv6, threading=threading, server_cls=self.server_cls)
File "C:\Users\alesya\.virtualenvs\superlists\lib\site-packages\django\core\servers\basehttp.py", line 164, in run
httpd = httpd_cls(server_address, WSGIRequestHandler, ipv6=ipv6)
File "C:\Users\alesya\.virtualenvs\superlists\lib\site-packages\django\core\servers\basehttp.py", line 74, in __init__
super(WSGIServer, self).__init__(*args, **kwargs)
File "c:\users\alesya\appdata\local\programs\python\python36-32\Lib\socketserver.py", line 453, in __init__
self.server_bind()
File "c:\users\alesya\appdata\local\programs\python\python36-32\Lib\wsgiref\simple_server.py", line 50, in server_bind
HTTPServer.server_bind(self)
File "c:\users\alesya\appdata\local\programs\python\python36-32\Lib\http\server.py", line 138, in server_bind
self.server_name = socket.getfqdn(host)
File "c:\users\alesya\appdata\local\programs\python\python36-32\Lib\socket.py", line 673, in getfqdn
hostname, aliases, ipaddrs = gethostbyaddr(name)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 14: invalid start byte
My computer has an ASCII-only name, so I did not understand what was happening.
I did all of the same things on another Win7 machine and everything was OK.
Maybe someone can help?
UPD. My problem was caused by a modified 'hosts' file, which contains a lot of disabled addresses.
Thanks to everyone for the answers.
Use Python 3. With Python 2.x, many characters, such as accented letters, cause unexpected crashes.
Try this:
a.encode('utf-8').strip()
where "a" is the string containing the non-ASCII character.
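Since the update above traces the problem to the 'hosts' file, a quick check for lines that are not valid UTF-8 can help locate the offending entry. This is only a sketch: the function name find_undecodable_lines is made up for illustration, and the sample bytes stand in for a real file (on Windows the hosts file normally lives at C:\Windows\System32\drivers\etc\hosts and would be read in binary mode).

```python
def find_undecodable_lines(data, encoding="utf-8"):
    """Return (line_number, raw_line) pairs for lines that fail to decode."""
    bad = []
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Made-up sample content; the second line contains bytes that are not UTF-8.
sample = b"127.0.0.1 localhost\n# 10.0.0.5 \xbb\xe0\xef\n"
problems = find_undecodable_lines(sample)
print(problems)
```

For a real file, the bytes would come from open(path, "rb").read() instead of the sample.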

DjangoUnicodeDecodeError during calling form.errors with EmailField

I am porting my site to django 1.9 and don't know how to resolve this issue correctly.
In my form I have the usual EmailField from Django forms. If validation fails, there should be a message about it (I pass {'form_errors': form.errors} to the context for further handling).
But in that case Django raises
DjangoUnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in
position 0: ordinal not in range(128). You passed in
()
In django.core.validators there is a validator for it:
@deconstructible
class EmailValidator(object):
    message = _('Enter a valid email address.')
    ...
If I change message to message = 'error', everything works fine.
So, the question: how can I fix this issue without editing Django files?
The problem is in calling form.errors; the error is raised even if I only want to print it (print form.errors). Errors for other fields (IntegerField, URLField, for example) work fine; the problem occurs only for EmailField.
In the view, the process currently looks like this:
from django.http import JsonResponse
...
if form.is_valid():
    ...
else:
    return JsonResponse({'form_errors': form.errors})
last traceback is:
File "/path/views.py", line 331, in custom_form_post
response = JsonResponse({'form_errors': form.errors})
File "/path/.env/local/lib/python2.7/site-packages/django/http/response.py", line 505, in __init__
data = json.dumps(data, cls=encoder, **json_dumps_params)
File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
sort_keys=sort_keys, **kw).encode(obj)
File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "/path/.env/lib/python2.7/_abcoll.py", line 581, in __iter__
v = self[i]
File "/path/.env/local/lib/python2.7/site-packages/django/forms/utils.py", line 146, in __getitem__
return list(error)[0]
File "/path/.env/local/lib/python2.7/site-packages/django/core/exceptions.py", line 165, in __iter__
yield force_text(message)
File "/path/.env/local/lib/python2.7/site-packages/django/utils/encoding.py", line 88, in force_text
raise DjangoUnicodeDecodeError(s, *e.args)
DjangoUnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128). You passed in <django.utils.functional.__proxy__ object at 0x40a6a90c> (<class 'django.utils.functional.__proxy__'>)
Well, the problem is in the translations, in this case the Russian localization.
I have no idea why the translation from Django's own localization files failed.
But for anyone who has a similar problem:
Create a locale file if you do not have one yet (https://docs.djangoproject.com/en/1.9/topics/i18n/translation/#localization-how-to-create-language-files)
Add these lines to django.po:
msgid "Enter a valid email address."
msgstr "Введите правильный адрес электронной почты." (or whatever translation you need)
Compile the messages (django-admin compilemessages)

SSHLibrary UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 660: invalid start byte

I am using the Robot Framework SSHLibrary to open a connection to a RHEL server, but the connection was unsuccessful. Robot Framework throws the following error:
FAIL : UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 660: invalid start byte
20151212 12:47:36.022 : DEBUG :
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\SSHLibrary\library.py", line 792, in login
return self._login(self.current.login, username, password, delay)
File "C:\Python27\lib\site-packages\SSHLibrary\library.py", line 832, in _login
login_output = login_method(username, *args)
File "C:\Python27\lib\site-packages\SSHLibrary\abstractclient.py", line 150, in login
return self._read_login_output(delay)
File "C:\Python27\lib\site-packages\SSHLibrary\abstractclient.py", line 165, in _read_login_output
return self.read(delay)
File "C:\Python27\lib\site-packages\SSHLibrary\abstractclient.py", line 299, in read
return self._decode(output)
File "C:\Python27\lib\site-packages\SSHLibrary\abstractclient.py", line 302, in _decode
return output.decode(self.config.encoding)
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
I passed encoding=latin-1 to the Robot Framework Open Connection keyword, because the login response from the remote server contained Latin-1 characters.
After that, the login was successful.
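To illustrate why the encoding matters here: the byte 0xa9 from the error message cannot start a UTF-8 sequence, but in Latin-1 it is the copyright sign. The login banner below is made up for the example; only the byte value comes from the traceback. The snippet is written for Python 3.

```python
raw = b"Copyright \xa9 Example Corp"  # made-up login banner containing byte 0xa9

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xa9 is a continuation byte and is not valid at the start of a character.
    utf8_ok = False

# Latin-1 maps every byte to a character, so this decode cannot fail.
latin1_text = raw.decode("latin-1")

print(utf8_ok)
print(latin1_text)
```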

python appengine unicodeencodeerror on search api snippeted results

I'm crawling pages and indexing them with the App Engine Search API (Spanish and Catalan pages, with accented characters). I'm able to perform searches and build a page of results.
The problem arises when I try to use a query object with snippeted_fields, as it always generates a UnicodeEncodeError:
File "/home/otger/python/jobs-gae/src/apps/search/handlers/results.py", line 82, in find_documents
return index.search(query_obj)
File "/opt/google_appengine_1.7.6/google/appengine/api/search/search.py", line 2707, in search
apiproxy_stub_map.MakeSyncCall('search', 'Search', request, response)
File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_stub_map.py", line 94, in MakeSyncCall
return stubmap.MakeSyncCall(service, call, request, response)
File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_stub_map.py", line 320, in MakeSyncCall
rpc.CheckSuccess()
File "/opt/google_appengine_1.7.6/google/appengine/api/apiproxy_rpc.py", line 156, in _WaitImpl
self.request, self.response)
File "/opt/google_appengine_1.7.6/google/appengine/ext/remote_api/remote_api_stub.py", line 200, in MakeSyncCall
self._MakeRealSyncCall(service, call, request, response)
File "/opt/google_appengine_1.7.6/google/appengine/ext/remote_api/remote_api_stub.py", line 234, in _MakeRealSyncCall
raise pickle.loads(response_pb.exception())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 52: ordinal not in range(128)
I've found a similar question on Stack Overflow, "GAE Full Text Search development console UnicodeEncodeError", but it says this was a bug fixed in 1.7.0. I get the same error with both 1.7.5 and 1.7.6.
When indexing pages I add two fields: description and description_ascii. If I generate snippets for description_ascii, it works perfectly.
Is it possible to generate snippets of non-ASCII content on dev_appserver?
I think this is a bug; I reported a new defect issue: https://code.google.com/p/googleappengine/issues/detail?id=9335.
Temporary solution for the dev server: locate the google.appengine.api.search module (search.py) and patch the function _DecodeUTF8 by adding an inline if like this:
def _DecodeUTF8(pb_value):
    """Decodes a UTF-8 encoded string into unicode."""
    if pb_value is not None:
        return pb_value.decode('utf-8') if not isinstance(pb_value, unicode) else pb_value
    return None
Workaround: until the issue is solved, implement the snippet functionality yourself. Assuming the field that serves as the base for the snippet is called snippet_base:
query = search.Query(query_string=query_string,
                     options=
                     search.QueryOptions(
                         ...
                         returned_fields= [... 'snippet_base' ...]
                     ))
results = search.Index(name="<index-name>").search(query)
if results:
    for res in results.results:
        res.snippet = some_snippeting_function(res.field("snippet_base"))
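The workaround leaves some_snippeting_function undefined. A minimal sketch of what such a function could do is shown below, assuming it receives the field's text as a plain string together with the search term; make_snippet, width and marker are hypothetical names, and this is not how the Search API builds its own snippets. It is written for Python 3, where strings are Unicode by default.

```python
def make_snippet(text, term, width=30, marker="..."):
    """Return a window of `text` around the first case-insensitive match of `term`.

    Assumes lower() does not change the length of the text, which holds for
    Spanish and Catalan but not for every language.
    """
    position = text.lower().find(term.lower())
    if position < 0:
        # No match: fall back to the beginning of the text.
        start, end = 0, min(len(text), 2 * width)
    else:
        start = max(0, position - width)
        end = min(len(text), position + len(term) + width)
    prefix = marker if start > 0 else ""
    suffix = marker if end < len(text) else ""
    return prefix + text[start:end] + suffix


example = make_snippet("Oferta de trabajo en Barcelona para programación web",
                       "barcelona", width=5)
print(example)
```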

Python: Overriding os.path.supports_unicode_filenames on Ubuntu

I am running a Python web app on an Ubuntu server, while I develop locally on OS X.
I use a lot of unicode strings for the Hebrew language, including manipulating filenames of images, so they will be saved on the filesystem with Hebrew characters.
My Ubuntu server is fully configured for UTF-8 - I have other images on the file system (outside of this app) with Hebrew names, in Hebrew named directories, etc.
However, my app returns errors when trying to save an image with a Hebrew filename on Ubuntu (but not on OS X).
The error being:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
After a lot of investigating, I arrived at the last possible cause as far as I can see:
# Inside my virtualenv, Mac OS X
>>> import os.path
>>> os.path.supports_unicode_filenames
True
# Inside my virtualenv, Ubuntu 12.04
>>> import os.path
>>> os.path.supports_unicode_filenames
False
And just for the curious, here are my Ubuntu locale settings:
locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Update: adding the code, and an example string:
# a string, of the type I would get for instance.product.name, as used below.
u'\\u05e7\\u05e8\\u05d5\\u05d1-\\u05e8\\u05d7\\u05d5\\u05e7'
#utils.py
# I get an image object from django, and I run this function so django
# can use the generated filepath for the image.
def get_upload_path(instance, filename):
    tmp = filename.split('.')
    extension = '.' + tmp[-1]
    if instance.__class__.__name__ == 'MyClass':
        seo_filename = unislugify(instance.product.name)
        # unislugify takes a string and strips spaces, etc.
        value = IMAGES_PRODUCT_DIR + seo_filename + extension
    else:
        value = IMAGES_GENERAL_DIR + unislugify(filename)
    return value
Example stacktrace:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-66: ordinal not in range(128)
Stacktrace (most recent call last):
File "django/core/handlers/base.py", line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "django/contrib/admin/options.py", line 366, in wrapper
return self.admin_site.admin_view(view)(*args, **kwargs)
File "django/utils/decorators.py", line 91, in _wrapped_view
response = view_func(request, *args, **kwargs)
File "django/views/decorators/cache.py", line 89, in _wrapped_view_func
response = view_func(request, *args, **kwargs)
File "django/contrib/admin/sites.py", line 196, in inner
return view(request, *args, **kwargs)
File "django/utils/decorators.py", line 25, in _wrapper
return bound_func(*args, **kwargs)
File "django/utils/decorators.py", line 91, in _wrapped_view
response = view_func(request, *args, **kwargs)
File "django/utils/decorators.py", line 21, in bound_func
return func(self, *args2, **kwargs2)
File "django/db/transaction.py", line 209, in inner
return func(*args, **kwargs)
File "django/contrib/admin/options.py", line 1055, in change_view
self.save_related(request, form, formsets, True)
File "django/contrib/admin/options.py", line 733, in save_related
self.save_formset(request, form, formset, change=change)
File "django/contrib/admin/options.py", line 721, in save_formset
formset.save()
File "django/forms/models.py", line 497, in save
return self.save_existing_objects(commit) + self.save_new_objects(commit)
File "django/forms/models.py", line 628, in save_new_objects
self.new_objects.append(self.save_new(form, commit=commit))
File "django/forms/models.py", line 731, in save_new
obj.save()
File "django/db/models/base.py", line 463, in save
self.save_base(using=using, force_insert=force_insert, force_update=force_update)
File "django/db/models/base.py", line 551, in save_base
result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
File "django/db/models/manager.py", line 203, in _insert
return insert_query(self.model, objs, fields, **kwargs)
File "django/db/models/query.py", line 1593, in insert_query
return query.get_compiler(using=using).execute_sql(return_id)
File "django/db/models/sql/compiler.py", line 909, in execute_sql
for sql, params in self.as_sql():
File "django/db/models/sql/compiler.py", line 872, in as_sql
for obj in self.query.objs
File "django/db/models/fields/files.py", line 249, in pre_save
file.save(file.name, file, save=False)
File "django/db/models/fields/files.py", line 86, in save
self.name = self.storage.save(name, content)
File "django/core/files/storage.py", line 44, in save
name = self.get_available_name(name)
File "django/core/files/storage.py", line 70, in get_available_name
while self.exists(name):
File "django/core/files/storage.py", line 230, in exists
return os.path.exists(self.path(name))
File "python2.7/genericpath.py", line 18, in exists
os.stat(path)
os.path.supports_unicode_filenames is always False on POSIX systems except Darwin. That is because they do not really care about the encoding of a filename; it is simply a byte sequence. The locale settings specify how to interpret these bytes, which is why you can end up with broken characters in a terminal when the locale setting is not right.
How are you running your web app? If you are running it through a web server (Apache?) using CGI or WSGI, the locale may not be what you see in the shell, so this could be the reason why Python tries to use the ASCII codec to encode the pathname.
To make it work, you could manually encode the pathname as UTF-8 when opening the file.
Edit:
So what fails is a call to os.stat, which, when called with a unicode string, tries to convert it to a byte string according to the filesystem encoding (sys.getfilesystemencoding(), derived from the process locale), which within a uWSGI environment always seems to be ASCII when using Python 2. To fix this, make sure to encode any unicode string to UTF-8 before it is passed on to os.stat.
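The suggestion above, encoding the path to UTF-8 before it reaches os.stat, might look roughly like this. The helper name to_fs_bytes is made up for illustration, and the temporary directory only stands in for the real upload directory. The sketch is written for Python 3, where it runs as-is; on Python 2 the same idea applies to unicode objects.

```python
import os
import shutil
import sys
import tempfile


def to_fs_bytes(path, encoding="utf-8"):
    """Encode a text path explicitly instead of relying on the process locale."""
    if isinstance(path, bytes):
        return path
    return path.encode(encoding)


# Encoding the interpreter would pick on its own for text paths.
print(sys.getfilesystemencoding())

directory = tempfile.mkdtemp()
name = u"\u05e7\u05e8\u05d5\u05d1-\u05e8\u05d7\u05d5\u05e7.jpg"  # Hebrew filename
full_path = os.path.join(to_fs_bytes(directory), to_fs_bytes(name))

# Create the file and check for it using the byte path only.
with open(full_path, "wb") as handle:
    handle.write(b"")
exists = os.path.exists(full_path)

shutil.rmtree(directory)  # clean up the temporary directory
```

Because the path is already a byte string when it reaches the operating system, the locale of the server process no longer decides how it is encoded.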
Thanks to everyone for the help. I still have not solved this issue with uWSGI.
But, this was the last straw in "configuring" uWSGI for me, I went back to gunicorn as the app server and everything works fine. I sure would like to use uWSGI as it is an ambitious project, but at the end of the day I am a developer and not a sys admin, and gunicorn is much easier to just get working in the common use cases.
