Python "SyntaxError: Non-ASCII character '\xe2' in file" [duplicate] - python

This question already has answers here:
SyntaxError: Non-ASCII character '\xa3' in file when function returns '£'
(6 answers)
Closed 2 years ago.
I am writing some python code and I am receiving the error message as in the title, from searching this has to do with the character set.
Here is the line that causes the error
hc = HealthCheck("instance_health", interval=15, target808="HTTP:8080/index.html")
I cannot figure out what character is not in the ANSI ASCII set? Furthermore searching "\xe2" does not give anymore information as to what character that appears as. Which character in that line is causing the issue?
I have also seen a few fixes for this issue but I am not sure which to use. Could someone clarify what the issue is (python doesn't interpret unicode unless told to do so?), and how I would clear it up properly?
EDIT:
Here are all the lines near the one that errors
def createLoadBalancer():
conn = ELBConnection(creds.awsAccessKey, creds.awsSecretKey)
hc = HealthCheck("instance_health", interval=15, target808="HTTP:8080/index.html")
lb = conn.create_load_balancer('my_lb', ['us-east-1a', 'us-east-1b'],[(80, 8080, 'http'), (443, 8443, 'tcp')])
lb.configure_health_check(hc)
return lb

If you are just trying to use UTF-8 characters or don't care if they are in your code, add this line to the top of your .py file
# -*- coding: utf-8 -*-

You've got a stray byte floating around. You can find it by running
with open("x.py") as fp:
for i, line in enumerate(fp):
if "\xe2" in line:
print i, repr(line)
where you should replace "x.py" by the name of your program. You'll see the line number and the offending line(s). For example, after inserting that byte arbitrarily, I got:
4 "\xe2 lb = conn.create_load_balancer('my_lb', ['us-east-1a', 'us-east-1b'],[(80, 8080, 'http'), (443, 8443, 'tcp')])\n"

Or you could just simply use:
# coding: utf-8
at top of .py file

\xe2 is the '-' character, it appears in some copy and paste it uses a different equal looking '-' that causes encoding errors.
Replace the '-'(from copy paste) with the correct '-' (from you keyboard button).

Change the file character encoding,
put below line to top of your code always
# -*- coding: utf-8 -*-

I had the same error while copying and pasting a comment from the web
For me it was a single quote (') in the word
I just erased it and re-typed it.

Adding # coding=utf-8 line in first line of your .py file will fix the problem.
Please read more about the problem and its fix on below link, in this article problem and its solution is beautifully described : https://www.python.org/dev/peps/pep-0263/

I got this error for characters in my comments (from copying/pasting content from the web into my editor for note-taking purposes).
To resolve in Text Wrangler:
Highlight the text
Go the the Text menu
Select "Convert to ASCII"

Based on PEP 0263 -- Defining Python Source Code Encodings
Python will default to ASCII as standard encoding if no other
encoding hints are given.
To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors)
#!/usr/bin/python
# -*- coding: <encoding name> -*-
or
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :

I had the same issue and just added this to the top of my file (in Python 3 I didn't have the problem but do in Python 2
#!/usr/local/bin/python
# coding: latin-1

If it helps anybody, for me that happened because I was trying to run a Django implementation in python 3.4 with my python 2.7 command

I my case \xe2 was a ’ which should be replaced by '.
In general I recommend to convert UTF-8 to ASCII using e.g. https://onlineasciitools.com/convert-utf8-to-ascii
However if you want to keep UTF-8 you can use
#-*- mode: python -*-
# -*- coding: utf-8 -*-

After about a half hour of looking through stack overflow, It dawned on me that if the use of a single quote " ' " in a comment will through the error:
SyntaxError: Non-ASCII character '\xe2' in file
After looking at the traceback i was able to locate the single quote used in my comment.

I had this exact issue running the simple .py code below:
import sys
print 'version is:', sys.version
DSM's code above provided the following:
1 'print \xe2\x80\x98version is\xe2\x80\x99, sys.version'
So the issue was that my text editor used SMART QUOTES, as John Y suggested. After changing the text editor settings and re-opening/saving the file, it works just fine.

I am trying to parse that weird windows apostraphe and after trying several things here is the code snippet that works.
def convert_freaking_apostrophe(self,string):
try:
issuer_rename = string.decode('windows-1252')
except:
issuer_rename = string.decode('latin-1')
issuer_rename = issuer_rename.replace(u'’', u"'")
issuer_rename = issuer_rename.encode('ascii','ignore')
try:
os.rename(directory+"/"+issuer,directory+"/"+issuer_rename)
print "Successfully renamed "+issuer+" to "+issuer_rename
return issuer_rename
except:
pass
#HANDLING FOR FUNKY APOSTRAPHE
if re.search(r"([\x90-\xff])", issuer):
issuer = self.convert_freaking_apostrophe(issuer)

I fixed this using pycharm. At the bottom of pycharm you can see file encoding. I noticed that it is UT-8. I changed it to US-ASCII

I had the same issue but it was because I copied and pasted the string as it is.
Later when I manually typed the string as it is the error vanished.
I had the error due to the - sign. When I replaced it with manually inputting a - the error was solved.
Copied string 10 + 3 * 5/(16 − 4)
Manually typed string 10 + 3 * 5/(16 - 4)
you can clearly see there is a bit of difference between both the hyphens.
I think it's because of the different formatting used by different OS or maybe just different software.

For me the problem had caused due to "’" that symbol in the quotes. As i had copied the code from a pdf file it caused that error. I just replaced "’" by this "'".

If you want to spot what character caused this just assign the problematic variable to a string and print it in a iPython console.
In my case
In [1]: array = [[24.9, 50.5]​, [11.2, 51.0]] # Raises an error
In [2]: string = "[[24.9, 50.5]​, [11.2, 51.0]]" # Manually paste the above array here
In [3]: string
Out [3]: '[[24.9, 50.5]\xe2\x80\x8b, [11.2, 51.0]]' # Here they are!

for me, the problem was caused by typing my code into Mac Notes and then copied it from Mac Notes and pasted into my vim session to create my file. This made my single quotes the curved type. to fix it I opened my file in vim and replaced all my curved single quotes with the straight kind, just by removing and retyping the same character. It was Mac Notes that made the same key stroke produce the curved single quote.

I was unable to find what's the issue for long but later I realised that I had copied a line "UTC-12:00" from web and the hyphen/dash in this was causing the problem. I just wrote this "-" again and the problem got resolved.
So, sometimes the copy pasted lines also give errors. In such cases, just re-write the copy pasted code and it works. On re-writing, it would look like nothing got changed but the error will be gone.

Plenty of good solutions here.
One challenge not really addressed in any of them is how to visually identify certain hard-to-spot non-ASCII characters that resemble other plain ASCII ones. For example, en dashes can appear almost exactly like hyphens and curly quotes look a lot like straight quotes, depending on your text editor's font.
This one-liner, which should work on Mac or Linux, will strip characters not in the ASCII printable range and show you the differences side-by-side:
# assumes Bash shell; for Bourne shell (sh), rearrange as a pipe and
# give '-' as second argument to 'sdiff' instead
sdiff --suppress-common-lines script.py <(tr -cd '\11\12\15\40-\176' <script.py)
The characters \11, \12, and \15 are tab, newline, and carriage return, respectively, in octal; the remaining range is the visible ASCII characters. (hat tip)
Another tip gleaned from this SO thread uses an inverse character class consisting of anything not in the ASCII visible range, and highlights it:
grep --color '[^ -~]' script.py
This should also work fine with the macOS / BSD version of grep.

When I have a similar issue when reading text files i use...
f = open('file','rt', errors='ignore')

Related

Print Bangla in pythone

I declare some variables in Bangla without any syntax error.
But when I want to print it, its gives me the error.
SyntaxError: Non-UTF-8 code starting with '\xff' in file D:/Project/Python Tutorials Repo/condition/condition.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
This is my script it Github: https://github.com/banglaosc/condition/blob/master/condition.py
unicode UTF-8 change picture
Here in this image, you can see the red mark. In your code editor, you can see that your Unicode change with another format(UTF-16LE). try to convert with Unicode UTF-16LE to UTF-8 from the bottom right corner. It's working for me.
The encoding is at the bottom right in that PyCharm screenshot: UTF-16LE. PEP 263 says Python will assume a file is ASCII, but it looks like that's been switched to UTF-8 in Python 3.
Try switching the file encoding to UTF-8 by clicking the "UTF-16LE" at the bottom right. If that's not possible, declare the encoding at the top of the script like this:
# -*- coding: UTF-16LE -*-
EDIT: The following applies to python3.5+ this will not work in python2.7
When I run your code I do not get that error. Pycharm show it as an error, but the python interpreter has no issue with the characters. The exception actually raised when running this is TypeError because the variable জুকারবার্গ is an int, and you are trying to use the + on it with a string. The following will execute with no errors.
বিল_গেটস = 20
জুকারবার্গ = 30
ওয়ারেন_বাফেট = 35
ইলন_মাস্ক = 10
if বিল_গেটস > জুকারবার্গ:
print(str(বিল_গেটস) + "বেশি ধনি")
else:
print(str(জুকারবার্গ) + "বেশি ধনি")

Python SyntaxError: Non-UTF-8

I converted my Python script to a Mac.app (via py2app). I try to run it and get the following error:
SyntaxError: Non-UTF-8 code starting with '\xcf' in file
py2app/dist/myapp.app/Contents/MacOS/myapp on line 1, but no encoding declared; see
http://python.org/dev/peps/pep-0263/ for details
I visited the PEP website and added the following to the first two lines of my script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
I have also put my code into various online tools (such as this one) to check whether there are any non-UTF-8 characters but I'm not getting any issues.
I did copy some text from an Excel file however there were no special symbols that I was aware of.
The script is approx 800 lines so is there a way of identifying the problem that doesn't involve manually scanning the script line-by-line?
EDIT
Not exactly a fix, but converting my script into an executable instead of a .app has fixed the issue and it now runs correctly.
Python 3 uses UTF-8 as default encoding. This simplify the codes you get from Internet (and other packages). \xcf in UTF-8 is valid only if the byte before has predefined values, which it is not the case: Non-UTF8 code starting mean this, it is not a valid start (first byte) of UTF8 codepoint encoding.
As you see in the comment, you may convert the file into UTF-8, many times you can ignore the initial encoding (often such errors are from comments, e.g. author name). you may convert it, e.g. on options in Saving As on your original editor.
As an alternate way, you can specify the encoding on the first few lines of your code, see PEP-263 on how to do it. Note: Python has hardcoded byte strings to check [because it has not idea of encoding], so try to copy exactly the string as in such document. I think such line # -*- coding: latin-1 -*- should be ok, but this could misinterpret some characters, so test your program. If you do no know the original encoding, the easier way it is to convert original source (because you should in any case check all strings in the source code, and check if you guessed the correct encoding).

SyntaxError: Non-ASCII character - Scrapy [duplicate]

Say I have a function:
def NewFunction():
return '£'
I want to print some stuff with a pound sign in front of it and it prints an error when I try to run this program, this error message is displayed:
SyntaxError: Non-ASCII character '\xa3' in file 'blah' but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details
Can anyone inform me how I can include a pound sign in my return function? I'm basically using it in a class and it's within the '__str__' part that the pound sign is included.
I'd recommend reading that PEP the error gives you. The problem is that your code is trying to use the ASCII encoding, but the pound symbol is not an ASCII character. Try using UTF-8 encoding. You can start by putting # -*- coding: utf-8 -*- at the top of your .py file. To get more advanced, you can also define encodings on a string by string basis in your code. However, if you are trying to put the pound sign literal in to your code, you'll need an encoding that supports it for the entire file.
Adding the following two lines at the top of my .py script worked for me (first line was necessary):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
First add the # -*- coding: utf-8 -*- line to the beginning of the file and then use u'foo' for all your non-ASCII unicode data:
def NewFunction():
return u'£'
or use the magic available since Python 2.6 to make it automatic:
from __future__ import unicode_literals
The error message tells you exactly what's wrong. The Python interpreter needs to know the encoding of the non-ASCII character.
If you want to return U+00A3 then you can say
return u'\u00a3'
which represents this character in pure ASCII by way of a Unicode escape sequence. If you want to return a byte string containing the literal byte 0xA3, that's
return b'\xa3'
(where in Python 2 the b is implicit; but explicit is better than implicit).
The linked PEP in the error message instructs you exactly how to tell Python "this file is not pure ASCII; here's the encoding I'm using". If the encoding is UTF-8, that would be
# coding=utf-8
or the Emacs-compatible
# -*- encoding: utf-8 -*-
If you don't know which encoding your editor uses to save this file, examine it with something like a hex editor and some googling. The Stack Overflow character-encoding tag has a tag info page with more information and some troubleshooting tips.
In so many words, outside of the 7-bit ASCII range (0x00-0x7F), Python can't and mustn't guess what string a sequence of bytes represents. https://tripleee.github.io/8bit#a3 shows 21 possible interpretations for the byte 0xA3 and that's only from the legacy 8-bit encodings; but it could also very well be the first byte of a multi-byte encoding. But in fact, I would guess you are actually using Latin-1, so you should have
# coding: latin-1
as the first or second line of your source file. Anyway, without knowledge of which character the byte is supposed to represent, a human would not be able to guess this, either.
A caveat: coding: latin-1 will definitely remove the error message (because there are no byte sequences which are not technically permitted in this encoding), but might produce completely the wrong result when the code is interpreted if the actual encoding is something else. You really have to know the encoding of the file with complete certainty when you declare the encoding.
Adding the following two lines in the script solved the issue for me.
# !/usr/bin/python
# coding=utf-8
Hope it helps !
You're probably trying to run Python 3 file with Python 2 interpreter. Currently (as of 2019), python command defaults to Python 2 when both versions are installed, on Windows and most Linux distributions.
But in case you're indeed working on a Python 2 script, a not yet mentioned on this page solution is to resave the file in UTF-8+BOM encoding, that will add three special bytes to the start of the file, they will explicitly inform the Python interpreter (and your text editor) about the file encoding.

How does the "magic lines(s)" in python work, when specifying encoding in python file?

At the start of a python file (first line) sometimes I read
# -*- coding: utf-8 -*-
and sometimes I read
# encoding: utf-8
Both lines seem to do the same thing: specifying utf8 as encoding for all the text put in the file.
I have to questions:
Why does this even work? I thought the interpreter ignores everything after a # because it invokes a comment.
What is the difference between the two lines above? Does the interpreter just ignore the -*-?
The two forms are equivalent. The -*- version is a special kind of comment that Emacs understands. See PEP 263 for more information.
If a comment like in either of these forms is one of the first two lines of a file, the interpreter will use the specified encoding to read the file.
It works because the implementation looks for it, there is nothing magical about it. There is no difference, all possible variants are defined by PEP 263 (the only difference is that the first one is Emacs-compatible).

Correct way to define Python source code encoding

PEP 263 defines how to declare Python source code encoding.
Normally, the first 2 lines of a Python file should start with:
#!/usr/bin/python
# -*- coding: <encoding name> -*-
But I have seen a lot of files starting with:
#!/usr/bin/python
# -*- encoding: <encoding name> -*-
=> encoding instead of coding.
So what is the correct way of declaring the file encoding?
Is encoding permitted because the regex used is lazy? Or is it just another form of declaring the file encoding?
I'm asking this question because the PEP does not talk about encoding, it just talks about coding.
Check the docs here:
"If a comment in the first or second line of the Python script matches the regular expression coding[=:]\s*([-\w.]+), this comment is processed as an encoding declaration"
"The recommended forms of this expression are
# -*- coding: <encoding-name> -*-
which is recognized also by GNU Emacs, and
# vim:fileencoding=<encoding-name>
which is recognized by Bram Moolenaar’s VIM."
So, you can put pretty much anything before the "coding" part, but stick to "coding" (with no prefix) if you want to be 100% python-docs-recommendation-compatible.
More specifically, you need to use whatever is recognized by Python and the specific editing software you use (if it needs/accepts anything at all). E.g. the coding form is recognized (out of the box) by GNU Emacs but not Vim (yes, without a universal agreement, it's essentially a turf war).
Just copy paste below statement on the top of your program.It will solve character encoding problems
#!/usr/bin/env python
# -*- coding: utf-8 -*-
PEP 263:
the first or second line must match
the regular
expression "coding[:=]\s*([-\w.]+)"
So, "encoding: UTF-8" matches.
PEP provides some examples:
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
# This Python file uses the following encoding: utf-8
import os, sys
As of today — June 2018
PEP 263 itself mentions the regex it follows:
To define a source code encoding, a magic comment must be placed into
the source files either as first or second line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors):
#!/usr/bin/python
# -*- coding: <encoding name> -*-
or:
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
More precisely, the first or second line must match the following regular expression:
^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)
So, as already summed up by other answers, it'll match coding with any prefix, but if you'd like to be as PEP-compliant as it gets (even though, as far as I can tell, using encoding instead of coding does not violate PEP 263 in any way) — stick with 'plain' coding, with no prefixes.
If I'm not mistaken, the original proposal for source file encodings was to use a regular expression for the first couple of lines, which would allow both.
I think the regex was something along the lines of coding: followed by something.
I found this: http://www.python.org/dev/peps/pep-0263/
Which is the original proposal, but I can't seem to find the final spec stating exactly what they did.
I've certainly used encoding: to great effect, so obviously that works.
Try changing to something completely different, like duhcoding: ... to see if that works just as well.
I suspect it is similar to Ruby - either method is okay.
This is largely because different text editors use different methods (ie, these two) of marking encoding.
With Ruby, as long as the first, or second if there is a shebang line contains a string that matches:
coding: encoding-name
and ignoring any whitespace and other fluff on those lines. (It can often be a = instead of :, too).

Categories

Resources