Django: uploaded file encoding issues - python

On my development machine uploads work fine, but when I deploy and test on the server, uploads fail with the following error:
UnicodeEncodeError at /upload
'ascii' codec can't encode characters in position 25-30: ordinal not in range(128)
I use
django-ajax-uploader,
Django version: 1.3.1,
Python version: 2.6
I believe it happens with files that have Roman, Russian, or Chinese filenames.
I also found a similar discussion at
Why do I get a ASCII encoding error with Unicode data in Python 2.4 but not in 2.7?
but it is about differences between Python versions.
I tried setting the $LANG environment variable to en_US.utf8 and similar values, but it didn't work.
Can anyone give me advice or point me in the right direction?
Thanks,
Sultan

See "If you get a UnicodeEncodeError" in the Django docs.
Personally, I prefer to rename uploaded files so that their names use only ASCII characters, which avoids other problems as well. Here is a link to an article with code that describes subclassing FileSystemStorage.
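A minimal sketch of the renaming idea, not the code from the linked article. The helper name ascii_filename is hypothetical, and it uses only the standard library. In a Django project you could, for example, call it from the get_valid_name method of a FileSystemStorage subclass, but that wiring is an assumption and is not shown here.

```python
# -*- coding: utf-8 -*-
import re
import unicodedata
import uuid


def ascii_filename(name):
    """Return an ASCII-only version of an uploaded file name (hypothetical helper)."""
    # Split off the extension so it survives the cleanup.
    stem, dot, ext = name.rpartition(".")
    if not dot:
        stem, ext = name, ""
    # Decompose accented letters, then drop everything outside ASCII.
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    ext = unicodedata.normalize("NFKD", ext).encode("ascii", "ignore").decode("ascii")
    # Keep only characters that are safe on most filesystems.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("._")
    ext = re.sub(r"[^A-Za-z0-9]+", "", ext)
    # Names written entirely in e.g. Cyrillic or Chinese end up empty,
    # so fall back to a random stem instead of an empty file name.
    if not stem:
        stem = uuid.uuid4().hex
    return stem + "." + ext if ext else stem
```

Accented Latin names keep a readable form, while names with no ASCII equivalent get a random stem. If you need to show the original name to users, store it separately (for example in a model field).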

Related

IEPY Package - Installation Error - Syntax Error - Python

I have some trouble interpreting the commands in the documentation (http://iepy.readthedocs.io/en/latest/tutorial.html) of the Python IEPY package for Natural Language Processing. The first step in working with IEPY is to create an "instance" (as a non-programmer I have little idea what that is all about). The documentation also provides the command to do it:
iepy --create <project_name>
My problem is that when I type this exact line into my command line, I get a "syntax error". I suppose I am not following some convention and am misinterpreting what I should actually type. I would be glad to hear what I am doing wrong.
Another source of the problem could be an improper installation of some of the additional libraries required by IEPY. After trying to install IEPY using
pip install iepy, I got a long error message after "collecting django-angular-0.7.8". The command line reports an "Exception", gives a reference to the line where it happens, and describes the source of the error (at least, this is how I interpret the output), which is:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 4: invalid start byte
Here is the screenshot (cannot use copy-paste in my command line)
Installation of IEPY: The Error Message While Installing IEPY
P.S. It looks like the documentation is generally aimed at Linux users, while I am using Windows 8.1. Is that the source of my troubles?

How to solve the UnicodeDecodeError when using stanford parser API in NLTK for python?

I want to use the Stanford parser from Python. I use Windows 7, I've installed Python 2.7 and NLTK 3.0, and I downloaded the Stanford parser from the official site.
I first hit the javahome environment problem, which I solved, and then I got this error message:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
and I can't find a solution for this problem.
I used the following code:
# -*- coding: utf-8 -*-
from nltk.parse import stanford
parser = stanford.StanfordParser(model_path='C:\Program Files (x86)\stanford-parser-full-2015-01-30\edu\stanford\nlp\models\lexparser\englishPCFG.ser.gz')
sent = 'my name is zim'
parser.parse(sent)
I've searched Stack Overflow for a solution but didn't find one.
If os.environ or the exported paths are set properly, as described in Stanford Parser and NLTK, then the issue should be one of:
specifying the encoding in the NLTK API, and
the encoding of your input string.
So the solution would be:
update NLTK to the latest stable version, i.e. sudo pip install -U nltk
use Python 3, or specify the encoding for your string
If you're somehow unable to update your Python or NLTK, then:
specify the encoding when using the Stanford API in NLTK (because of https://github.com/nltk/nltk/issues/877)
specify the encoding for your string (see How to output NLTK chunks to file?)
It is STRONGLY recommended that you use Python 3, especially when handling text input.
If all else fails, you only have the old version of NLTK, and you must use Python 2.7, then:
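import six
from nltk.parse import stanford
# Raw string (r"...") so that the "\n" in "\nlp" is not read as a newline escape.
path_to_model = r"C:\Program Files (x86)\stanford-parser-full-2015-01-30\edu\stanford\nlp\models\lexparser\englishPCFG.ser.gz"
# Pass the encoding explicitly to the Stanford API wrapper.
parser = stanford.StanfordParser(model_path=path_to_model, encoding='utf8')
# Make sure the input sentence is a unicode string on Python 2.
sent = six.text_type('my name is zim')
parser.parse(sent)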
See the six docs: http://pythonhosted.org//six/#six.text_type
0xe9 isn't a valid ASCII byte, so your englishPCFG.ser.gz must not be ASCII encoded. You'll need to figure out what encoding it's using (probably UTF-8) and tell StanfordParser() about it with the encoding keyword argument.
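To see where a message like this comes from, here is a small sketch that reproduces it outside NLTK. The sample bytes and the choice of Latin-1 are assumptions for illustration only. They say nothing about the actual encoding of the model file or of the parser output.

```python
# Three sample bytes; 0xe9 is "é" in Latin-1 but is not an ASCII byte.
raw = b"\xe9t\xe9"

try:
    # Decoding with the ASCII codec fails on the very first byte.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

# Naming the right encoding explicitly makes the same bytes decode fine.
text = raw.decode("latin-1")

print(message)
print(text)
```

The point is that the bytes themselves are fine. The error only appears when something decodes them with the default ASCII codec instead of the encoding they were written in.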
I found the problem that caused the error I encountered:
raise OSError('Java command failed : ' + str(cmd)) OSError: Java command failed :...
This error is due to the path being misinterpreted in the following statement:
parser = stanford.StanfordParser(model_path='C:\Program Files (x86)\stanford-parser-full-2015-01-30\edu\stanford\nlp\models\lexparser\englishPCFG.ser.gz')
Python interpreted the \n in ...\nlp\... as a newline escape followed by lp\..., so the path could not be found.
As a simple workaround, I renamed the nlp folder, and it worked.
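Renaming the folder works, but the path can also be kept as it is. A short sketch of the usual options, using a shortened stand-in path (edu\nlp) rather than the real one:

```python
import ntpath

# In a normal string literal, "\n" is an escape sequence for a newline.
broken = "edu\nlp"

# Option 1: a raw string keeps every backslash literally.
raw = r"edu\nlp"

# Option 2: double the backslash.
escaped = "edu\\nlp"

# Option 3: build the Windows path from its parts.
joined = ntpath.join("edu", "nlp")

print(repr(broken))   # contains a newline character
print(repr(raw))      # contains a backslash followed by "n"
```

Forward slashes are another option, since Windows generally accepts them in paths, but whether the Java side accepts them here is something you would have to try.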

In trying to install flask and virtualenv in PYTHON, there are two errors. 1. UnicodeDecodeError:'ascii' codec. (with REDIRECTION ISSUE) 2. OSError

As described in the title, I'm trying to install Flask for Python. After installing virtualenv, I tried to create a virtual environment, and when I entered 'virtualenv venv' I got two errors.
1. UnicodeDecodeError
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc7 in position 5: ordinal not in range(128)
I searched for a long time for a solution and found various suggestions in other questions. Most of them are about encodings such as 'cp949', 'mbcs' and so on.
So I tried replacing 'ascii' with them (cp949, mbcs, ...), and got a terrible error. The encoding was redirected! Although I modified the encoding in the 'site.py' file, when I ran 'virtualenv venv' again in cmd on Windows, the error showed up again. When I checked the site.py file afterwards, the encoding had been redirected. It's odd.
Any suggestions, comments, or ideas?
2. OSError
OSError: Command C:\myproject\venv\Scripts\python.exe -c "import sys, pip; sys...d\"] + sys.argv[1:]))" setuptools pip failed with error code 2
I have no idea at all about this one. Does anybody have an idea?
Here are screenshots of the errors (linked because my reputation score is too low to embed images):
https://drive.google.com/open?id=0BzJpcZZu99ZnMl9nQTFnS3VaSWM&authuser=0
https://drive.google.com/open?id=0BzJpcZZu99ZnOVh5RUtnSklTbDA&authuser=0
Please help. Thank you very much.

Why LC_CTYPE needs to be set manually for Python in some cases

I am trying to use Django's admin documentation. I followed this tutorial and installed docutils. After installing it, I ran the Django development server with python manage.py runserver and got the error unknown locale: UTF-8.
I solved the issue as explained in this question:
export LC_CTYPE=en_US.UTF-8
export LC_ALL=en_US.UTF-8
But my question is: what is the origin of this problem? Does docutils have some compatibility problem with Python, or is it something else?
This is an old issue, but it still happens on OS X El Capitan. The origin of the problem is that Python assumes the locale environment variable has the format language_region.encoding. That assumption is too strict, because OS X falls back to a bare UTF-8 when a valid language and region pair is not available.
There is a lengthy discussion of this issue at bugs.python.org.
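To make the format difference concrete, here is a rough sketch that checks whether a value has the language_region.encoding shape. The function name and the regular expression are my own simplification. Python's real locale parsing is more involved and also accepts values such as C or POSIX, which this check ignores.

```python
import os
import re

# Rough pattern for the "language_region.encoding" shape, e.g. "en_US.UTF-8".
# This is a simplification, not the rule Python's locale module applies.
LOCALE_SHAPE = re.compile(r"^[a-z]{2,3}_[A-Z]{2}\.[A-Za-z0-9-]+$")


def has_full_locale_shape(value):
    """Return True if value looks like language_region.encoding (hypothetical helper)."""
    return bool(LOCALE_SHAPE.match(value))


# Report what the current process sees in LC_CTYPE.
current = os.environ.get("LC_CTYPE", "")
print(repr(current), has_full_locale_shape(current))
```

With this check, en_US.UTF-8 passes and a bare UTF-8 does not, which is the situation that leads to "unknown locale: UTF-8".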

UnicodeEncodeError with django: inconsistent behavior

I deployed a Django project on WebFaction. All went fine until recently, when all of a sudden I started to get this error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 64-68: ordinal not in range(128)
The URL contains Russian characters. The odd thing is that when I restart Apache, the error goes away, so it is difficult for me to pin down.
Read:
https://docs.djangoproject.com/en/dev/howto/deployment/wsgi/modwsgi/#if-you-get-a-unicodeencodeerror
Most likely you need to ensure that UTF-8 is set as the lang locale for the environment Apache runs under.
Otherwise you need to ensure you handle Unicode issues in your code yourself where appropriate.
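A quick way to check what the Apache process actually sees is to log the encoding settings from inside the application. This is only a diagnostic sketch. The function name is made up, and where you call it from (for example a temporary view or the WSGI script) is up to you.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings of the current process."""
    return {
        # Locale variables inherited from the environment, None if unset.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        # Encoding Python uses for file names on this system.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding Python prefers for text data under the current locale.
        "preferred_encoding": locale.getpreferredencoding(),
    }


for key, value in sorted(encoding_report().items()):
    print(key, "=", value)
```

If the values differ between a shell session and the running web process, the environment Apache runs under is the place to look.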
This error occurs because the filename or the file contents contain garbage characters or text in a language other than English.
So you can wrap the value in unicode() for this, or check the NLTK library for ways of handling this situation.
I guess it has to do with WebFaction or my incorrect handling of Apache: I actually had a command in my crontab that restarted Apache.
I found a similar question (about deliberately restarting Apache), where someone from WebFaction suggested:
touch /path to /wsgi.py
instead of:
apache2/restart
After I replaced .../restart with the line above, the error messages stopped.
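If the reload is triggered from a Python script rather than from the shell, the same effect as touch can be sketched with the standard library. The function below is a generic helper, and the path in the usage note is a placeholder, not a real location.

```python
import os


def touch(path):
    """Update a file's modification time, like the shell command touch."""
    # Append mode creates the file if it is missing and never truncates it.
    with open(path, "a"):
        # None means "set access and modification times to now".
        os.utime(path, None)
```

Usage would be something like touch("/path/to/wsgi.py"), with the path replaced by the location of your own WSGI script.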
