The following code:
import simplejson,urllib,urllib2
query=[u'नेपाल']
urlbase="http://search.twitter.com/search.json"
values={'q':query[0]}
data=urllib.urlencode(values)
req=urllib2.Request(urlbase,data)
response=urllib2.urlopen(req)
json=simplejson.load(response)
print json
throws exception:
SyntaxError: Non-ASCII character '\xe0' in file ques.py on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
The code works if query contains standard ASCII characters. I tried looking at the suggested link but couldn't figure out how to specify encoding for Devanagari characters.
You need to add the UTF-8 header to your file to tell the Python interpreter that there are unicode literals. You also have to encode the parameters as UTF-8. Here's a working version:
# -*- coding: utf-8 -*-
import simplejson,urllib,urllib2
query=[u'नेपाल']
urlbase="http://search.twitter.com/search.json"
values={'q':query[0].encode('utf-8')}
data=urllib.urlencode(values)
req=urllib2.Request(urlbase,data)
response=urllib2.urlopen(req)
json=simplejson.load(response)
print json
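For comparison, here is a minimal Python 3 sketch of just the encoding step (no network call; in Python 3 no coding declaration is needed because source files default to UTF-8, and urllib.parse.urlencode percent-encodes str values as UTF-8):

```python
# Python 3: source defaults to UTF-8, so no coding comment is required.
from urllib.parse import urlencode

query = 'नेपाल'
data = urlencode({'q': query})  # percent-encodes the UTF-8 bytes
print(data)  # q=%E0%A4%A8%E0%A5%87%E0%A4%AA%E0%A4%BE%E0%A4%B2
```

Note that %E0 is exactly the '\xe0' byte named in the SyntaxError above: the first UTF-8 byte of the Devanagari text.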
An empty-box character shows up in a text in place of a non-ASCII character, so I need to replace that character in Python, but I get an error:
SyntaxError: Non-ASCII character '\xef' in file pyautoGuiTiReg/main.py on line 197, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details.
I am not sure which encoding to use, as this is a very unusual symbol.
As @AnshumanTiwari mentioned in the comments, utf-8 is the way to go when in doubt. Adding # -*- coding: utf-8 -*- to the top of the file makes it work perfectly.
To replace it, try this (note: the "empty box" is usually U+FFFD, whose UTF-8 encoding is the three bytes \xef\xbf\xbd; removing only \xef would leave invalid continuation bytes behind and the decode would fail):
to_bytes = 'yourString'.encode('utf-8')
replaced = to_bytes.replace(b'\xef\xbf\xbd', b'').decode('utf-8')
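A runnable sketch (Python 3 here for illustration), assuming the box character is U+FFFD, the Unicode replacement character:

```python
# Python 3 sketch: strip U+FFFD (the "empty box" replacement character),
# whose UTF-8 encoding is the three bytes ef bf bd.
raw = 'abc\ufffddef'.encode('utf-8')     # simulate text containing the symbol
assert b'\xef\xbf\xbd' in raw            # its UTF-8 form starts with \xef
cleaned = raw.replace(b'\xef\xbf\xbd', b'').decode('utf-8')
print(cleaned)  # abcdef
```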
The é character belongs to UTF-8, as shown at:
https://www.utf8-chartable.de/unicode-utf8-table.pl
As the official documentation (https://www.python.org/dev/peps/pep-0263/) says:
'In Python 2.1, Unicode literals can only be written using the Latin-1 based encoding "unicode-escape"....'
I use Python 2.7.13
so in my code (as described in https://www.python.org/dev/peps/pep-0263/), I have tried each of the following in turn (after #!/usr/bin/python):
# coding=utf-8
# -*- coding: utf-8 -*-
the last one also appears in the post solution Correct way to define Python source code encoding
but it still does not work:
SyntaxError: Non-ASCII character '\xc3' in file ./<file_name>.py on line 160, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Any ideas, folks? Thanks.
I want to run my code in the terminal, but it shows me this error:
SyntaxError: Non-ASCII character '\xd8' in file streaming.py on line 72, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
I tried to encode the Arabic string using this:
# -*- coding: utf-8 -*-
st = 'المملكة العربية السعودية'.encode('utf-8')
It's very important for me to run it on the terminal so I can't use IDLE.
The problem is that since you are pasting your characters directly into a Python file, the interpreter (Python 2) attempts to read them as ASCII (it has to parse the literal even before your encode call runs), which is illegal. What you want is a unicode literal when pasting non-ASCII bytes:
x=u'المملكة العربية السعودية' #Or whatever the corresponding bytes are
print x.encode('utf-8')
You can also try to set the entire source file to be read as utf-8:
#!/usr/bin/python
# -*- coding: utf-8 -*-
and don't forget to make the file executable. Lastly, you can import the behaviour from Python 3:
from __future__ import unicode_literals
at the top of the file, so string literals are unicode by default. Note that \xd8 appears as phi in my terminal, so make sure the encoding is correct.
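For illustration (Python 3 here, where literals are unicode by default), the relationship between the text and its UTF-8 bytes looks like this; note that the first byte really is \xd8, the byte named in the SyntaxError:

```python
# Python 3: str is unicode; encode explicitly to get the UTF-8 bytes.
text = 'المملكة العربية السعودية'
raw = text.encode('utf-8')
assert raw[0] == 0xd8               # alef's first UTF-8 byte, as in the error
assert raw.decode('utf-8') == text  # round-trips cleanly
```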
I'm trying to send a Hebrew string through the Parse REST API in Django code.
The code is fine - sending a string in English works perfectly.
When the letters are in Hebrew, I get the following error:
Non-ASCII character '\xd7' but no encoding declared;
how can I set encoding programmatically for a specific line?
It's explained in the docs:
Python supports writing Unicode literals in any encoding, but you have
to declare the encoding being used. This is done by including a
special comment as either the first or second line of the source file
In your case:
# -*- coding: utf-8 -*-
I have a dummy Python module with the utf-8 header that looks like this:
# -*- coding: utf-8 -*-
a = "á"
print type(a), a
Which prints:
<type 'str'> á
But I thought that all string literals inside a Python module declared as utf-8 would automatically be of type unicode instead of str. Am I missing something, or is this the correct behaviour?
In order to get a as a unicode string I use:
a = u"á"
But this doesn't seem very "polite", nor practical. Is there a better option?
# -*- coding: utf-8 -*-
doesn't make string literals Unicode. Take this example: I have a file with an Arabic comment and a string; the file is UTF-8:
# هذا تعليق عربي
print type('نص عربي')
If I run it, it throws a SyntaxError exception:
SyntaxError: Non-ASCII character '\xd9' in file file.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
so to allow this I have to add that line to tell the interpreter that the file is UTF-8 encoded:
# -*-coding: utf-8 -*-
# هذا تعليق عربي
print type('نص عربي')
now it runs fine but it still prints <type 'str'> unless I make the string Unicode:
# -*-coding: utf-8 -*-
# هذا تعليق عربي
print type(u'نص عربي')
No, the codec at the top only informs Python how to interpret the source code, and uses that codec to interpret Unicode literals. It does not turn literal bytestrings into unicode values. As PEP 263 states:
This PEP proposes to introduce a syntax to declare the encoding of
a Python source file. The encoding information is then used by the
Python parser to interpret the file using the given encoding. Most
notably this enhances the interpretation of Unicode literals in
the source code and makes it possible to write Unicode literals
using e.g. UTF-8 directly in an Unicode aware editor.
Emphasis mine.
Without the codec declaration, Python has no idea how to interpret non-ASCII characters:
$ cat /tmp/test.py
example = '☃'
$ python2.7 /tmp/test.py
File "/tmp/test.py", line 1
SyntaxError: Non-ASCII character '\xe2' in file /tmp/test.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
If Python behaved the way you expect it to, you would not be able to use literal bytestring values that contain non-ASCII byte values either.
If your terminal is configured to display UTF-8 values, then printing a UTF-8 encoded byte string will look 'correct', but only by virtue of luck that the encodings match.
The correct way to get unicode values, is by using unicode literals or by otherwise producing unicode (decoding from byte strings, converting integer codepoints to unicode characters, etc.):
unicode_snowman = '\xe2\x98\x83'.decode('utf8')
unicode_snowman = unichr(0x2603)
In Python 3, the codec also applies to how variable names are interpreted, as you can use letters and digits outside of the ASCII range in names. The default codec in Python 3 is UTF-8, as opposed to ASCII in Python 2.
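In Python 3 the same snowman can be produced with chr (unichr is gone) and an explicit bytes literal; a quick sketch:

```python
# Python 3 equivalents of the two constructions above
unicode_snowman = b'\xe2\x98\x83'.decode('utf8')
assert unicode_snowman == chr(0x2603)   # chr now covers all of Unicode
print(unicode_snowman)  # ☃
```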
No, this is just the source code encoding. Please see http://www.python.org/dev/peps/pep-0263/
To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors)
#!/usr/bin/python
# -*- coding: <encoding name> -*-
or
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
This doesn't make all literals unicode; it just specifies how unicode literals should be decoded.
One should use the unicode() function or the u prefix to make a literal unicode.
N.B. In Python 3, all strings are unicode.
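A quick Python 3 check of that last point:

```python
# Python 3: a plain literal is already unicode (str); raw bytes need b''
s = 'نص عربي'
assert isinstance(s, str)
assert s.encode('utf-8')[:2] == b'\xd9\x86'  # ن encodes as d9 86 in UTF-8
```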