Cannot decode/encode in UTF-8

I have a text-box which allows users to enter a word.
The user enters: über
In the backend, I get the word like this:
def form_process(request):
    word = request.GET.get('the_word')
    word = word.encode('utf-8')
    #word = word.decode('utf-8')
    print word
For some reason, I cannot decode or encode this!!
It gives me the error:
UnicodeEncodeError
('ascii', u'\ufffd', 0, 1, 'ordinal not in range(128)')
Edit: When I do "repr(word)", this is what I get:
u'\ufffd'

Did you remember to put:
accept-charset="utf-8"
in the form tag?
EDIT: Is the DEFAULT_CHARSET in settings.py set to 'utf-8' ?

Solved!
I had escape(word) ...in the javascript ...before I passed it to the server.
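
A likely explanation for why escape() caused this, sketched in Python 3 (the thread itself uses Python 2): JavaScript's legacy escape() percent-encodes ü as the single Latin-1 byte %FC, which is not valid UTF-8, so a server that decodes the query string as UTF-8 substitutes U+FFFD. encodeURIComponent() emits the UTF-8 bytes %C3%BC instead. The two literal strings below are written out by hand as what those functions produce for über; they are not taken from the original post.

```python
from urllib.parse import unquote

# What JavaScript's escape("über") sends: ü as the Latin-1 byte 0xFC.
escaped = "%FCber"
# What encodeURIComponent("über") sends: ü as the UTF-8 bytes 0xC3 0xBC.
uri_encoded = "%C3%BCber"

# unquote() decodes as UTF-8 and, by default, replaces invalid bytes with U+FFFD.
broken = unquote(escaped)
intact = unquote(uri_encoded)

print(repr(broken))
print(repr(intact))
```

If this is what happened, the fix on the client side is to use encodeURIComponent(word) rather than escape(word).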

Is there any reason to use print word? If not, it should work without those lines.
def form_process(request):
    word = request.GET.get('the_word')

Related

jinja 2 passing arabic to render template

Hi, I am using Jinja2 on Google App Engine to render a template, but when I pass an Arabic or Persian string as a template variable I get this error:
فروشگاه {{ name }}
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)
Below I've tried to encode it so that Jinja would accept it, but then the string doesn't appear at all:
def deccode(n):
    n = n.decode("utf-8")
    n = n.encode("ascii","ignore")
    return n
name = 'رشد'
name = deccode(name)
logo = 'roshd'
logo = deccode(logo)
ss = {'name': name, 'logo': logo}
s = template.render(ss)
<div class=" title">
<i class="dropdown icon"></i>
فروشگاه
So what's the best way to pass Arabic to Jinja2?

Make sure to pass a unicode string to the template. Assuming you are on Python 2 this means prefixing the string literal with u:
name = u'رشد'
Also, get rid of your custom decode function; it's not needed. Make sure to save the file as UTF-8, though, and add a comment in the first line of the file indicating its encoding, as mentioned in @manikandan's answer and PEP 263.
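
To see why the string vanished with the custom function, here is a minimal Python 3 sketch (the question is Python 2, where the same steps apply to a byte string). It replays the two steps of deccode() on the value from the question; the variable names are only illustrative.

```python
name = "رشد"  # a text (Unicode) string, which is what the template should receive

# The question's deccode() helper, step by step, starting from UTF-8 bytes:
raw = name.encode("utf-8")
decoded = raw.decode("utf-8")                 # back to text, nothing lost
stripped = decoded.encode("ascii", "ignore")  # every non-ASCII character is dropped

print(repr(stripped))
```

Because every character in the name is non-ASCII, the "ignore" error handler leaves nothing behind, which matches the empty output in the rendered template.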

Follow https://www.python.org/dev/peps/pep-0263/
Include this line at the beginning of python file
# -*- coding: utf-8 -*-
Maybe you can also remove your custom deccode() function

Invalid continuation byte saving cipher

I have the following function to create cipher text and then save it:
def create_credential(self):
    des = DES.new(CIPHER_N, DES.MODE_ECB)
    text = str(uuid.uuid4()).replace('-','')[:16]
    cipher_text = des.encrypt(text)
    return cipher_text
def decrypt_credential(self, text):
    des = DES.new(CIPHER_N, DES.MODE_ECB)
    return des.decrypt(text)
def update_access_credentials(self):
    self.access_key = self.create_credential()
    print repr(self.access_key) # "\xf9\xad\xfbO\xc1lJ'\xb3\xda\x7f\x84\x10\xbbv&"
    self.access_password = self.create_credential()
    self.save()
And I will call:
>>> from main.models import *
>>> u=User.objects.all()[0]
>>> u.update_access_credentials()
And this is the stacktrace I get:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 738: invalid start byte
Why is this occurring and how would I get around it?

You are storing a bytestring into a Unicode database field, so it'll try and decode to Unicode.
Either use a database field that can store opaque binary data, decode explicitly to Unicode (latin-1 maps bytes one-on-one to Unicode codepoints) or wrap your data into a representation that can be stored as text.
For Django 1.6 and up, use a BinaryField, for example. For earlier versions, using a binary-to-text conversion (such as Base64) would be preferable over decoding to Latin-1; the result of the latter would not give you meaningful textual data but Django may try to display it as such (in the admin interface for example).
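
The text-based options can be compared with a small Python 3 sketch. It uses only the standard library and the ciphertext bytes printed in the question; the variable names are illustrative, and no Django model is involved.

```python
import base64

# Ciphertext bytes copied from the repr() shown in the question.
cipher_text = b"\xf9\xad\xfbO\xc1lJ'\xb3\xda\x7f\x84\x10\xbbv&"

# Arbitrary bytes are usually not valid UTF-8, which is what a text field trips over.
try:
    cipher_text.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Option 1: Latin-1 maps each byte to one code point, so it always round-trips.
as_latin1 = cipher_text.decode("latin-1")
# Option 2: Base64 yields plain ASCII text that is safe in any text column.
as_base64 = base64.b64encode(cipher_text).decode("ascii")

print(valid_utf8, len(as_latin1), as_base64)
```

Both conversions are reversible, but only the Base64 form is printable ASCII, which is why it is the safer choice when the value may be displayed.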

It's occurring because you're attempting to save non-text data in a text field. Either use a non-text field instead, or encode the data as text via e.g. Base-64 encoding.

Using base64 encoding and decoding here fixed this:
import base64
def create_credential(self):
    des = DES.new(CIPHER_N, DES.MODE_ECB)
    text = str(uuid.uuid4()).replace('-','')[:16]
    cipher_text = des.encrypt(text)
    base64_encrypted_message = base64.b64encode(cipher_text)
    return base64_encrypted_message
def decrypt_credential(self, text):
    text = base64.b64decode(text)
    des = DES.new(CIPHER_N, DES.MODE_ECB)
    message = des.decrypt(text)
    return message

Python convert file content to unicode form

For example, I have a file a.js whose content is:
Hello, 你好, bye.
It contains two Chinese characters whose Unicode escape form is \u4f60\u597d
I want to write a Python program that converts the Chinese characters in a.js to their Unicode escape form and writes the result to b.js, whose content should be: Hello, \u4f60\u597d, bye.
My code:
fp = open("a.js")
content = fp.read()
fp.close()
fp2 = open("b.js", "w")
result = content.decode("utf-8")
fp2.write(result)
fp2.close()
But it seems that the Chinese characters are still single characters, not the ASCII escape sequences I want.

>>> print u'Hello, 你好, bye.'.encode('unicode-escape')
Hello, \u4f60\u597d, bye.
But you should consider using JSON, via json.
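
For comparison, a minimal Python 3 sketch of both approaches (the one-liner above is Python 2). It works on the sample sentence from the question rather than on a file, and the variable names are illustrative.

```python
import json

text = "Hello, 你好, bye."

# unicode_escape returns bytes; decode them as ASCII to get an ordinary string.
escaped = text.encode("unicode_escape").decode("ascii")

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True)
# and also adds the surrounding quotes needed for a JavaScript/JSON literal.
as_json = json.dumps(text)

print(escaped)
print(as_json)
```

Note that unicode_escape also escapes backslashes and control characters such as newlines, so applying it to a whole source file changes more than the Chinese characters; json.dumps on individual string values avoids that.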

You can try codecs module
codecs.open(filename, mode[, encoding[, errors[, buffering]]])
a = codecs.open("a.js", "r", "cp936").read() # a is a unicode object
codecs.open("b.js", "w", "utf16").write(a)

There are two ways to do this.
First, use the 'encode' method:
str1 = "Hello, 你好, bye. "
print(str1.encode("raw_unicode_escape"))
print(str1.encode("unicode_escape"))
You can also use the 'codecs' module:
import codecs
print(codecs.raw_unicode_escape_encode(str1))

I found that repr(content.decode("utf-8")) will return "u'Hello, \u4f60\u597d, bye'"
so repr(content.decode("utf-8"))[2:-1] will do the job

you can use repr:
a = u"Hello, 你好, bye. "
print repr(a)[2:-1]
or you can use encode method:
print a.encode("raw_unicode_escape")
print a.encode("unicode_escape")

How can I understand this python error message?

Hi can you help me decode this message and what to do:
main.py", line 1278, in post
message.body = "%s %s/%s/%s" % (msg, host, ad.key().id(), slugify(ad.title.encode('utf-8')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
Thanks
UPDATE having tried removing the encode call it appears to work:
class Recommend(webapp.RequestHandler):
    def post(self, key):
        ad= db.get(db.Key(key))
        email = self.request.POST['tip_email']
        host = os.environ.get("HTTP_HOST", os.environ["SERVER_NAME"])
        senderemail = users.get_current_user().email() if users.get_current_user() else 'info@monton.cl' if host.endswith('.cl') else 'info@monton.com.mx' if host.endswith('.mx') else 'info@montao.com.br' if host.endswith('.br') else 'admin@koolbusiness.com'
        message = mail.EmailMessage(sender=senderemail, subject="%s recommends %s" % (self.request.POST['tip_name'], ad.title) )
        message.to = email
        message.body = "%s %s/%s/%s" % (self.request.POST['tip_msg'],host,ad.key().id(),slugify(ad.title))
        message.send()
        matched_images=ad.matched_images
        count = matched_images.count()
        if ad.text:
            p = re.compile(r'(www[^ ]*|http://[^ ]*)')
            text = p.sub(r'\1',ad.text.replace('http://',''))
        else:
            text = None
        self.response.out.write("Message sent<br>")
        path = os.path.join(os.path.dirname(__file__), 'market', 'market_ad_detail.html')
        self.response.out.write(template.render(path, {'user_url':users.create_logout_url(self.request.uri) if users.get_current_user() else users.create_login_url(self.request.uri),
            'user':users.get_current_user(), 'ad.user':ad.user,'count':count, 'ad':ad, 'matched_images': matched_images,}))

The problem here is that message.body is being built as an ASCII string, but one of the values you are formatting into it contains non-ASCII characters.
If ASCII-only output is acceptable, you can have Python substitute the '?' character for anything that is not ASCII:
u"UNICODE STRING".encode('ascii','replace').decode('ascii')
So, using your example above:
message.body = "%s %s/%s/%s" % (
    msg.encode('ascii','replace').decode('ascii'),
    host.encode('ascii','replace').decode('ascii'),
    ad.key().id(),  # numeric ID, nothing to encode
    slugify(ad.title).encode('ascii','replace').decode('ascii'))
Or just apply the encode/decode to the one variable that has the non-ASCII character.
But this isn't an optimal solution. The best idea is to make message.body a unicode string. Since that doesn't seem feasible (I'm not familiar with GAE), you can use this to at least avoid the errors.
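
The error message itself can be reproduced with a short Python 3 sketch. In Python 2, "%s" formatting that mixes a unicode object with a byte string makes the interpreter decode the byte string with the ascii codec, which is probably why removing .encode('utf-8') helped. Python 3 does not do this implicitly, so the decode is written out explicitly here. The title "México" is a hypothetical value chosen so that the failing byte lands at position 1, as in the traceback.

```python
title = "México"  # hypothetical ad title, not from the original post
encoded = title.encode("utf-8")  # b'M\xc3\xa9xico'

# Decoding those bytes as ASCII fails at the first non-ASCII byte.
try:
    encoded.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The lossy workaround from the answer: replace what ASCII cannot represent.
ascii_only = title.encode("ascii", "replace").decode("ascii")

print(error)
print(ascii_only)
```

The workaround loses information (the accented letter becomes '?'), so keeping everything as Unicode text until the final output step is the better long-term approach.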

You've got a Unicode character in a place that you're not supposed to. Most often I find this error is having MS Word-style slanted quotes.

One of these fields contains characters that cannot be encoded. Switching to Python 3 (it has better Unicode support) or declaring the encoding of the script should make the problem go away. In 2.x, the best way to declare the encoding is the encoding comment line: add # -*- coding: utf-8 -*- to the top of your script. See http://evanjones.ca/python-utf8.html for a fuller explanation of using Python with UTF-8. Then handle strings like this:
s = "hello normal string"
u = unicode( s, "utf-8" )
backToBytes = u.encode( "utf-8" )

I had a similar problem when using django-nonrel and Google App Engine.
The problem was the folder containing the application. This probably isn't the problem described in this question, but maybe it helps someone avoid wasting time like I did.
First try moving your application folder, for example to /home/, and run it again; if that doesn't work, try something else.

Unicode with active_directory

I'm using the active_directory module, and trying to print a list of the users. My code is:
import active_directory as ad
users = ad.AD_Object("LDAP://OU=Home, DC=dome, DC=net")
for user in users.search(objectCategory="Person"):
    print str(user)
It prints some of the users until it meets a Unicode username. Then it throws the following error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-14: ordinal not in range(128)
What can I do?
Thank you very much.

Try:
print user.decode('utf-8')
