I needed to start dealing with foreign characters, and in doing so, I think I royally screwed up a file's encoding.
The error I'm getting is:
Lexical error at line 1, column 8. Encountered: "" (0), after : ""
The first line of the file is:
import xml.etree.cElementTree as ET
Also of note: when I pasted the line above into the textarea to ask this question, and submitted, an unknown character appeared between every character (e
I have been unable to fix this issue by adding an explicit coding definition:
# -*- coding: utf-8 -*-
I have also been unable to revert the file (using Hg) to a previous version, nor copy/paste code into a new file, or replace the broken file with copied/pasted code.
Please help!
If it is indeed a zero character in there, you may find you've injected some UTF-16/UCS-2 text. That particular Unicode encoding would have a zero byte in between every ASCII character.
The best way to find out is to do a hex dump of your file with something like od -xcb myfile.py.
If that is the case, then you'll need to edit the file with something that's able to see those characters, and fix them up.
vi would be my first choice (since that's what I'm used to) but I don't want to start any holy wars with the Emacs illuminati. In vi, they'll most likely show up as ^@ characters.
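If od is not at hand, the same check can be done from Python. This is a minimal sketch, assuming the file really is UTF-16 as guessed above; looks_like_utf16 and convert_to_utf8 are hypothetical helper names, and the zero-byte ratio is only a heuristic that suits mostly-ASCII text.

```python
import codecs

UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_like_utf16(raw):
    """Heuristic: a UTF-16 BOM, or NUL bytes in over half of every other byte."""
    if raw.startswith(UTF16_BOMS):
        return True
    if len(raw) < 4:
        return False
    even, odd = raw[0::2], raw[1::2]
    return max(even.count(0) / len(even), odd.count(0) / len(odd)) > 0.5


def convert_to_utf8(raw):
    """Decode UTF-16 bytes and return the same text encoded as UTF-8."""
    if raw.startswith(UTF16_BOMS):
        text = raw.decode("utf-16")      # the BOM selects the byte order and is dropped
    elif raw[1::2].count(0) >= raw[0::2].count(0):
        text = raw.decode("utf-16-le")   # ASCII text: NULs sit in the odd positions
    else:
        text = raw.decode("utf-16-be")   # ASCII text: NULs sit in the even positions
    return text.encode("utf-8")


# Demo on in-memory bytes; for a real file, read it with open(path, "rb").
sample = "import xml.etree.cElementTree as ET".encode("utf-16-le")
print(looks_like_utf16(sample))
print(convert_to_utf8(sample))
```

Writing the converted bytes back with open(path, "wb") would replace the file, so keep a copy of the original first.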
So this is an odd question, but I'm trying to process Bengali characters like খ (I tried with Arabic و and Japanese 片 as well) in VS Code, and all was going well until suddenly I got this error:
SyntaxError: Non-UTF-8 code starting with '\xe0' in file ..., but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Note: When using the Arabic character و and the Japanese character 片, I got similar errors but with different notation - "\xd9" and "\xe7" respectively.
My code is not the problem because it's simply text = [long foreign language text] and that itself gives me an error. However, I noticed, through some experimenting, that this only produced an error if I exceeded 167 foreign language characters (for Japanese as well, but for Arabic the threshold was higher).
To find that limit, I created a string (without spaces) of only খ and kept incrementing the number of characters till I got the error. At 167 characters (as per this character count website), everything worked fine. But as I added another character (total 168 characters), the above error was thrown.
The common answers to this question in other Stack Overflow posts such as this and this don't seem to work for me. That is likely because this doesn't really sound like an encoding problem. If it were an encoding problem, it should have thrown an error regardless of the length of the string, right?
I tried to replicate this in the Spyder IDE and it doesn't seem to have any such problems or limits. That leads me to believe this is a VS Code problem. Is anyone familiar with such issues or knows how to solve them in VS Code?
I like working in VS Code so I'd rather not have to change just for this.
My whole code if it matters:
# (167 Characters) Gives no error in VS Code
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
# (168 Characters) Gives error in VS Code but not in Spyder IDE
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
The traceback, in case it matters, is:
File "filename.py", line 5
SyntaxError: Non-UTF-8 code starting with '\xe0' in file filename.py on line 16, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Edit
I tried adding # coding: utf-8 at the top, but it still caused the issue in my VS Code.
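The '\xe0', '\xd9' and '\xe7' values in the error messages are the first UTF-8 byte of each of the three characters. A small sketch that shows this (utf8_info is a hypothetical helper name). It also shows that the Arabic character takes two bytes where the other two take three, which may be related to the higher threshold seen for Arabic; that link is an assumption, not something the question confirms.

```python
def utf8_info(ch):
    """Return (number of UTF-8 bytes, first byte) for a single character."""
    encoded = ch.encode("utf-8")
    return len(encoded), encoded[0]


# The three characters mentioned in the question.
for ch in ("খ", "و", "片"):
    size, lead = utf8_info(ch)
    print(ch, size, hex(lead))
```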
Could you try to add this at the beginning of the file:
# coding:utf-8
Update:
It seems that the length of the string and even the variable name can trigger the error Non-UTF-8 code starting with '\xe0' in xxx on line xxx, but no encoding declared.
Confusingly, I get the error Non-UTF-8 code starting with '\xe0' with this code:
text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text2)
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
text3 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text3)
While this works, where I only changed text to text5 and nothing else:
text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text2)
text5 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text5)
text3 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text3)
This does not work either:
text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text2)
But if I only add some lines, it works:
text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text2)
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
And this does not work either:
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
All of the problems mentioned above can be solved with # coding:utf-8 or # -*- coding: utf-8 -*-
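Before adding a declaration, it can help to check whether the file on disk really is UTF-8, since the declaration only tells Python how to decode the bytes and does not change how the editor saved them. A minimal sketch (first_non_utf8 is a hypothetical helper; pass it the bytes read with open(path, "rb")):

```python
def first_non_utf8(raw):
    """Return (line number, byte offset, byte value) of the first invalid
    UTF-8 byte in raw, or None if the bytes decode cleanly."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        line = raw.count(b"\n", 0, err.start) + 1
        return line, err.start, raw[err.start]
    return None


# Demo on in-memory bytes: the second line holds a Latin-1 encoded character.
broken = b'a = 1\ntext = "' + "\u00e9".encode("latin-1") + b'"\n'
print(first_non_utf8(broken))
print(first_non_utf8("text = '\u0996'\n".encode("utf-8")))
```

If this returns None for the real file, the bytes are valid UTF-8 and the problem lies elsewhere than the saved encoding.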
I was able to make your error go away by specifying the encoding at the top of the file. Specifically, add this line to the top of your file:
# -*- coding: cp1252 -*-
By default Python 2 uses ASCII as the source encoding (Python 3 uses UTF-8), but this line changes the encoding to cp1252. The cp1252 encoding covers Western European languages; it does not include Arabic, Bengali or Japanese characters, so this only helps if the file was really saved as cp1252. It looks like a common legacy encoding for Japanese characters is shift-jis, but I have not tried this.
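One thing to watch for: if the file is actually saved as UTF-8 (an assumption; the question does not say), declaring cp1252 makes the error disappear but the bytes are then read as the wrong characters. A sketch of that effect, where misdecode is a hypothetical helper:

```python
def misdecode(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode those bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_encoding)


# Each character turns into two or three unrelated cp1252 characters.
for ch in ("খ", "و", "片"):
    print(repr(ch), "->", repr(misdecode(ch)))
```

Printing a string literal after such a declaration is a quick way to see whether the declared encoding matches the file.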
I made a program which contains chinese and russian words, but when I ran it, I had a problem with the encoding
In the code that I shared, a complete sentence with some Russian and Chinese characters is shown. With that variable assignment the SyntaxError arises. But when I write sentence=input() and the user enters the same sentence, no error appears.
sentence='n紙sнo頭q愛z語u買gлd娜xтgлj鳥u買gлcхd娜u買 рj鳥pщi魚d娜gлh園d娜gлn紙r無z語 рr無pщl電pщv書kмz語u買gлkмu買o頭d娜r無n紙r無d娜o頭pщh園z語gлh園d娜gлpщcхo頭z語gлu買kмwзd娜cхgлsнgлz語r無kмd娜u買o頭pщh園z語gлpщgлz語aчi魚d娜o頭z語xтgлv書z語u買gлd娜cхgлv書j鳥pщcхgлn紙z語h園d娜l電z語xтgлv書r無d娜pщr無gлo頭z語h園z語gлo頭kмn紙z語gлh園d娜gлpщn紙cхkмv書pщv書kмz語u買d娜xтgлd娜u買o頭r無d娜gлxтj鳥xтgлh園kмwзd娜r無xтz語xтgлo頭kмn紙z語xтgлh園d娜gлd娜xтo頭r無j鳥v書o頭j鳥r無pщxтgлh園d娜gлh園pщo頭z語xтgлxтd娜gлd娜u買v書j鳥d娜u買o頭r無pщgлh園kмv書v書kмz語u買pщr無kмz語xтgлh園d娜gлh園pщo頭z語xтgлd娜u買gлd娜xтo頭d娜gлo頭j鳥o頭z語r無kмpщcхgлpщn紙r無d娜u買h園d娜r無d娜l電z語xтgлpщgлj鳥o頭kмcхkмñсpщr無gлd娜xтo頭pщgлd娜xтo頭r無j鳥v書o頭j鳥r無pщgлr無d娜wзkмxтpщu買h園z語gлxтj鳥xтgлl電d娜o頭z語h園z語xтgлl電pщxтgлj鳥o頭kмcхkмñсpщh園z語xт'
SyntaxError: Non-UTF-8 code starting with '\xe5' in file hjs.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
How can I solve it?
First of all, welcome to Stack Overflow!
Second, you could solve your problem by using Python 3 or, for Python 2, following what is said in this answer.
But why?
Well, according to the aforementioned PEP 263,
Python will default to ASCII as standard encoding if no other encoding hints are given.
And in the PEP you can see the same thing that the mentioned answer says, to add the line # -*- coding: <encoding name> -*-
And why isn't Python 3 affected by this issue?
As said in here,
Since Python 3.0, the language’s str type contains Unicode characters(...)
So there is no need for adding the coding magic comment.
For more on that, the full Unicode article linked above is great reading, and as it is a classic on Stack Overflow, please see this.
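Python 3 reads source files as UTF-8 by default, so the file itself has to be saved as UTF-8. If the editor saved it in some other encoding (an assumption; the question does not say which), one option is to re-save it. A minimal sketch, where resave_as_utf8 is a hypothetical helper and cp1251 is only used for the demo:

```python
import os
import tempfile


def resave_as_utf8(path, source_encoding):
    """Rewrite the file at path as UTF-8, decoding it from source_encoding."""
    with open(path, "r", encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Demo: write a cp1251 file to a temporary location, then convert it.
fd, demo_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as f:
    f.write("sentence = '\u043f\u0440\u0438\u0432\u0435\u0442'\n".encode("cp1251"))
resave_as_utf8(demo_path, "cp1251")
with open(demo_path, "rb") as f:
    print(f.read())
os.remove(demo_path)
```

Most editors can do the same thing through a "save with encoding" option, which is usually the simpler route.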
Working on German words (sometimes containing umlaut characters) in an Excel 2007 spreadsheet (I use xlrd, xlwt and openpyxl), I get the following value:
var = str(ws.cell(row=i+k,column=0).value).encode('latin-1')
I get with print(var):
'[a word')
until it reaches a word containing umlaut characters, when I get:
Traceback (most recent call last):
File "C:\Users\cristina\Documents\horia\Linguistics3\px t3.py", line 68, in <module>
var = str(ws4.cell(row=i+k,column=0).value).encode('latin-1')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 3:ordinal not in range(128)
And the program stops.
If I define var as:
var = u'str(ws4.cell(row=i+k,column=0).value)'.encode('latin-1')
When trying to print(var), I get:
var=str(ws.cell(row=i+k,column=0).value)
The program runs normally until the end
I can get the value of var in Python Shell, but not by "print(var)" in the program.
Can anybody give me a solution?
First of all, read this: http://www.joelonsoftware.com/articles/Unicode.html (seriously)
Then, understand that Python 2 has two distinct data types:
unicode, for "agnostic" handling of all possible characters, but which cannot be used in input/output, such as "print" or writing to files, without being encoded into the other data type: strings.
Strings are encoding-dependent.
What I am almost sure is going on there, given your error message, is that the ws4.cell(row=i+k,column=0).value call is returning a unicode value. (I can't test it in my non-Windows environment here.) To be sure, instead of guessing, you may want to run things there once with
print(type(ws4.cell(row=i+k,column=0).value)) just to assert you are getting unicode values.
Thus, when you do str(ws4.cell(...).value) you are telling Python to convert unicode to str without any explicit encoding, so the default ASCII codec is used - that is the call that raises your error, not the subsequent "encode" call.
If that is what is going on, simply replace that str call with unicode:
var = unicode(ws4.cell(row=i+k,column=0).value).encode('latin-1')
That should fix your problem. I hope you've read the article I linked above - it is helpful.
Also, mark your Python source code with the corresponding encoding you are using - otherwise you will get an error on any non-ASCII char in your source code.
For example, write this on the very first line of your code:
# coding: latin1
(Although for any serious project you should be using utf-8 instead.)
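The same distinction exists in Python 3, where the two types are str (text) and bytes. A minimal sketch in Python 3 syntax (an assumption, since the question uses Python 2), with a sample word standing in for the spreadsheet cell; encode_or_none is a hypothetical helper:

```python
def encode_or_none(text, encoding):
    """Encode text with the given codec; return None if it cannot be encoded."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


word = "Stra\u00dfe"  # "Strasse" with a sharp s; \u00df is the character from the traceback

print(encode_or_none(word, "ascii"))    # the ASCII codec cannot represent \u00df
print(encode_or_none(word, "latin-1"))  # Latin-1 has a single byte for it
print(encode_or_none(word, "utf-8"))    # UTF-8 uses two bytes for it
```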
I'm trying to start learning Python, but I became confused from the first step.
I'm getting started with Hello, World, but when I try to run the script, I get:
Syntax Error: Non-UTF-8 code starting with '\xe9' in file C:\Documents and Settings\Home\workspace\Yassine frist stared\src\firstModule.py on line 5 but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details.
Add this as the first line:
# -*- coding: utf-8 -*-
Put as the first line of your program this:
# coding: utf-8
See also Correct way to define Python source code encoding
First off, you should know what an encoding is. Read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
Now, the problem you are having is that most people write code in ASCII. Roughly speaking, that means that they use Latin letters, numerals and basic punctuation only in the code files themselves. You appear to have used a non-ASCII character code inside your program, which is confusing Python.
There are two ways to fix this. The first is to tell Python with what encoding you would like it to read the text file. You can do that by adding a # coding declaration at the top of the file. The second, and probably better, is to restrict yourself to ASCII code. Remember that you can always have whatever characters you like inside strings, by writing them in their encoded form as e.g. \x00 or whatever.
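As a sketch of the second approach in Python 3, here is a source fragment that contains only ASCII characters yet produces a string with a non-ASCII character. The sample word is an illustration, not taken from the question:

```python
# All three literals below are written with ASCII characters only.
word_escaped = "caf\u00e9"                               # Unicode escape
word_named = "caf\N{LATIN SMALL LETTER E WITH ACUTE}"    # named escape
word_bytes = b"caf\xc3\xa9"                              # the UTF-8 bytes of the same word

print(word_escaped == word_named)
print(word_bytes.decode("utf-8") == word_escaped)
```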
When you run Python through the interpreter, you must run it in this format: python filename.py (command line args) or you will also get this error. I made the comment because you mentioned you were a beginner.
I'm using Python for S60.
I want to use strings in Hebrew, to show them in the GUI and to send them in SMS messages.
It seems that the PythonScriptShell doesn't accept such expressions, for example:
u"אבגדה"
what can I do?
thanks
Update:
I added the line:
# -*- coding: utf-8 -*-
as the first line in the source file, and in Notepad++ I selected: Encoding>>Convert to utf8.
Now the GUI appears in Hebrew, but when I select an option, the selected value (probably) cannot be compared to a Hebrew string in the code, and there is no response.
In the PythonScriptShell this warning appears:
Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal.
Help me, please.
I just tested this in both the Bluetooth and on-phone consoles with PyS60 2.0, and non-ASCII unicode was handled without exceptions.
If you have that string in the file rather than typing it in the console, the error is caused by the lack of an encoding declaration in the file.
Add # -*- coding: utf-8 -*- as the first line there.
Convert your words to unicode characters using
unichr
e.g. unichr(1507) for the character ף
refer to the decimal values in this table: http://www.ssec.wisc.edu/~tomw/java/unicode.html#x0590
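In Python 3 the equivalent built-in is chr (an assumption about the reader's version, since PyS60 is Python 2 and uses unichr). A short sketch that builds the string from the question out of decimal code points:

```python
# 1507 is the decimal code point of the final-pe character mentioned above.
FINAL_PE = chr(1507)

# Code points 1488 to 1492 are the first five letters of the Hebrew alphabet.
first_five = "".join(chr(cp) for cp in range(1488, 1493))

print(FINAL_PE)
print(first_five)
```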
Add this helper:
ru = lambda txt: str(txt).decode('utf-8','ignore')
And wrap each string in that function:
ru("אבגדה")
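The warning quoted in the question typically appears in Python 2 when a byte string containing non-ASCII bytes is compared with a unicode string. The usual remedy is to decode to text on both sides before comparing. A minimal sketch in Python 3 syntax (an assumption, since PyS60 is Python 2), where to_text is a hypothetical helper:

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding, "ignore")
    return value


expected = "\u05d0\u05d1"            # two Hebrew letters, written as escapes
selected = expected.encode("utf-8")  # what a widget might hand back as bytes

print(selected == expected)           # bytes and str never compare equal
print(to_text(selected) == expected)  # decoding first makes the comparison meaningful
```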