SyntaxError: Non-UTF-8 code starting with '\xe7'

SyntaxError: Non-UTF-8 code starting with '\xe7' - python

I made a program which contains chinese and russian words, but when I ran it, I had a problem with the encoding
In the code that I shared, a complete sentence with some Russian and Chinese characters is shown. With that variable assignment the SyntaxError arises. But when i write sentence=input(), when the user enters the same sentence no error appears.
sentence='n紙sнo頭q愛z語u買gлd娜xтgлj鳥u買gлcхd娜u買 рj鳥pщi魚d娜gлh園d娜gлn紙r無z語 рr無pщl電pщv書kмz語u買gлkмu買o頭d娜r無n紙r無d娜o頭pщh園z語gлh園d娜gлpщcхo頭z語gлu買kмwзd娜cхgлsнgлz語r無kмd娜u買o頭pщh園z語gлpщgлz語aчi魚d娜o頭z語xтgлv書z語u買gлd娜cхgлv書j鳥pщcхgлn紙z語h園d娜l電z語xтgлv書r無d娜pщr無gлo頭z語h園z語gлo頭kмn紙z語gлh園d娜gлpщn紙cхkмv書pщv書kмz語u買d娜xтgлd娜u買o頭r無d娜gлxтj鳥xтgлh園kмwзd娜r無xтz語xтgлo頭kмn紙z語xтgлh園d娜gлd娜xтo頭r無j鳥v書o頭j鳥r無pщxтgлh園d娜gлh園pщo頭z語xтgлxтd娜gлd娜u買v書j鳥d娜u買o頭r無pщgлh園kмv書v書kмz語u買pщr無kмz語xтgлh園d娜gлh園pщo頭z語xтgлd娜u買gлd娜xтo頭d娜gлo頭j鳥o頭z語r無kмpщcхgлpщn紙r無d娜u買h園d娜r無d娜l電z語xтgлpщgлj鳥o頭kмcхkмñсpщr無gлd娜xтo頭pщgлd娜xтo頭r無j鳥v書o頭j鳥r無pщgлr無d娜wзkмxтpщu買h園z語gлxтj鳥xтgлl電d娜o頭z語h園z語xтgлl電pщxтgлj鳥o頭kмcхkмñсpщh園z語xт'
SyntaxError: Non-UTF-8 code starting with '\xe5' in file hjs.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
How can I solve it?

First of all, welcome to Stack Overflow!
Second, you could solve your problem by using Python 3 or, for Python 2, following what is said in this answer.
But why?
Well, according to the aforementioned PEP 263,
Python will default to ASCII as standard encoding if no other encoding hints are given.
And in the PEP you can see the same thing that the mentioned answer says, to add the line # -*- coding: <encoding name> -*-
And why isn't Python 3 affected by this issue?
As said in here,
Since Python 3.0, the language’s str type contains Unicode characters(...)
So there is no need for adding the coding magic comment.
For more on that the full unicode article linked above is a great reading, and as it is a classic in StackOverflow, please see this.

Related

Why does only VS Code show "SyntaxError: Non-UTF-8 code starting with '\xe0'" when reading foreign characters, but only beyond certain length? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 1 year ago.
Improve this question
So this is an odd question but I'm trying to process Bengali characters like খ ( I tried with Arabic و and Japanese 片 as well as well) on VS Code and all was going well until suddenly I got this error:
SyntaxError: Non-UTF-8 code starting with '\xe0' in file ..., but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Note: When using arabic character و and japanese character 片, I got similar errors but with different notation - "\xd9" and "\xe7" respectively.
My code is not the problem because it's simply text = [long foreign language text] and that itself gives me an error. However, I noticed, through some experimenting, that this was only producing an error if I exceeded 167 foreign language characters (for Japanese as well, but for arabic the threshold was higher).
To find that limit, I created a string (without spaces) of only খ and kept incrementing the number of characters till I got the error. At 167 characters (as per this character count website), everything worked fine. But as I added another character (total 168 characters), the above error was thrown.
The common answers to this question in other stackoverflow posts such as this and this don't seem to work for me. That is likely because this doesn't really sound like an encoding problem. If it was an encoding problem, it should have thrown an error regardless of the length of the string right?
I tried to replicate this in the Spyder IDE and it doesn't seem to have any such problems or limits. That leads me to believe this is a VS Code problem. Is anyone familiar with such issues or knows how to solve them in VS Code?
I like working in VS Code so I'd rather not have to change just for this.
My whole code if it matters:
# (167 Characters) Gives no error in VS Code
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
# (168 Characters) Gives error in VS Code but not in Spyder IDE
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
The traceback, incase it matters is:
File "filename.py", line 5
SyntaxError: Non-UTF-8 code starting with '\xe0' in file filename.py on line 16, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Edit
Tried with # coding: utf-8 in front but still caused an issue on my vscode.

Could you try to add this at the beginning of the file:
# coding:utf-8
Update:
Update:
It seems like the length of the character and even the variable name can cause the problem of Non-UTF-8 code starting with '\xe0' in xxx on line xxx, but no encoding declared;
It's confusing, I will get the error of Non-UTF-8 code starting with '\xe0' with these codes:
text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text2)
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
text3 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text3)
While this works, as I only change text to text5, without change anything others:
text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text2)
text5 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text5)
text3 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text3)
This does not work too:
text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text2)
But if I only add some lines, it will work:
text2 = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text2)
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
And this does not work too:
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
text = "খখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখখ"
print(text)
All of the problems have mentioned above, can be solved with # coding:utf-8 or # -*- coding: utf-8 -*-

I was able to make your error go away by specifing the encoding at the top of the file. Specifically, add this line to the top of your file:
# -*- coding: cp1252 -*-
By default python will use ascii as the standard encoding, but this line changes the encoding to cp1252. The cp1252 encoding standard is used for many European languages including Arabic. It looks like the default encoding for Japanese characters is shift-jis, but I have not tried this.

Python program is running in IDLE but not in command line

Before someone says this is a duplicate question, I just want to let you know that the error I am getting from running this program in command line is different from all the other related questions I've seen.
I am trying to run a very short script in Python
from bs4 import BeautifulSoup
import urllib.request
html = urllib.request.urlopen("http://dictionary.reference.com/browse/word?s=t").read().strip()
dhtml = str(html, "utf-8").strip()
soup = BeautifulSoup(dhtml.strip(), "html.parser")
print(soup.prettify())
But I keep getting an error when I run this program with python.exe. UnicodeEncodeError: 'charmap' codec can't encode character '\u025c. I have tried a lot of methods to get around this, but I managed to isolate it to the problem of converting bytes to strings. When I run this program in IDLE, I get the HTML as expected. What is it that IDLE is automatically doing? Can I use IDLE's interpretation program instead of python.exe? Thanks!
EDIT:
My problem is caused by print(soup.prettify()) but type(soup.prettify()) returns str?
RESOLVED:
I finally made a decision to use encode() and decode() because of the trouble that has been caused. If someone knows how to actually resolve a question, please do; also, thank you for all your answers

UnicodeEncodeError: 'charmap' codec can't encode character '\u025c'
The console character encoding can't represent '\u025c' i.e., "ɜ" Unicode character (U+025C LATIN SMALL LETTER REVERSED OPEN E).
What is it that IDLE is automatically doing?
IDLE displays Unicode directly (only BMP characters) if the corresponding font supports given Unicode characters.
Can I use IDLE's interpretation program instead of python.exe
Yes, run:
T:\> py -midlelib -r your_script.py
Note: you could write arbitrary Unicode characters to the Windows console if Unicode API is used:
T:\> py -mpip install win-unicode-console
T:\> py -mrun your_script.py
See What's the deal with Python 3.4, Unicode, different languages and Windows?

I just want to let you know that the error I am getting from running this program in command line is different from all the other related questions I've seen.
Not really. You have PrintFails like everyone else.
The Windows console can't print Unicode. (This isn't strictly true, but going into exactly why, when and how you can get Unicode out of the console is a painful exercise and not usually worth it.) Trying to print a character that isn't in the console's limited encoding can't work, so Python gives you an error.
print them out (which I need an easier solution to because I cannot do .encode("utf-8") for a lot of elements
You could run the command set PYTHONIOENCODING=utf-8 before running the script to tell Python to use and encoding which can include any character (so no errors), but any non-ASCII output will still come out garbled as its encoding won't match the console's actual code page.
(Or indeed just use IDLE.)

I finally made a decision to use encode() and decode() because of the trouble that has been caused. If someone knows how to actually resolve a question, please do; also, thank you for all your answers

How to find right encoding in python? [duplicate]

This question already has answers here:
How to determine the encoding of text
(16 answers)
Closed 5 years ago.
I'm trying to get rid of diacritics in my textfile. I converted a pdf to text with a tool, not made by myself. I wasn't able to understand which encoding they use. The text is written in Nahuatl, orthographically familiar with Spanish.
I transformed the text into a list of strings. No I'm trying to do the following:
# check whether there is a not-ascii character in the item
def is_ascii(word):
check = string.ascii_letters + "."
if word not in check:
return False
return True
# if there is a not ascii-character encode the string
def to_ascii(word):
if is_ascii(word) == False:
newWord = word.encode("utf8")
return newWord
return word
What I want to get is a unicode-version of my string. It doesn't work so far and I tried several encodings like latin1, cp1252, iso-8859-1. What I get is Can anybody tell me what I did wrong?
How can I find out the right encoding?
Thank you!
EDIT:
I wrote to the people that developed the converter (pdf-txt) and they said they were using unicode already. So John Machin was right with (1) in his answer.
As I wrote in some comment that wasn't clear to me, because in the Eclipse debugger the list itself showed some signs in unicodes, others not. And if I looked at the items seperately they were all decoded in some way, so that I actually saw unicode.
Thank you for your help!

Edit your question to show the version of Python you are using. Guessing the version from your code is not possible. Whether you are using Python 3.X or 2.X matters a lot. Following remarks assume Python 2.x.
You already seem to have determined that you have UTF-8 encoded text. Try the_text.decode('utf8'). Note decode, NOT encode.
If decoding with UTF-8 does not raise UnicodeDecodeError and your text is not trivially short, then it is very close to certain that UTF-8 is the correct encoding.
If the above does not work, show us the result of print repr(the_text).
Note that it is counter-productive trying to check whether the file is encoded in ASCII -- ASCII is a subset of UTF-8. Leaving some data as str objects and other as unicode is messy in Python 2.x and won't work in Python 3.X
In any case, your first function doesn't do what you think it does; it returns False for any input string whose length is 2 or more. Please consider unit-testing functions as you write them; it makes debugging much faster later on.
Note that latin1 and iso-8859-1 are the same encoding. As latin1 encodes the first 256 codepoints in Unicode in the same order, then it is impossible to get UnicodeDecodeError raised by text.decode('latin1'). "No error" is this case has exactly zero diagnostic value.
Update in response to this comment from OP:
I use Python 2.7. If I use text.decode("utf8") it raises the following
error: UnicodeEncodeError: 'latin-1' codec can't encode character
u'\u2014' in position 0: ordinal not in range(256).
That can happen two ways:
(1) In a single statement like foo = text.decode('utf8'), text is already a unicode object so Python 2.X tries to encode it using the default encoding (latin-1 ???).
(2) Possibly in two different statements, first foo = text.decode('utf8') where text is an str object encoded in UTF-8, and this statement doesn't raise an error, followed by something like print foo and your sys.stdout.encoding is latin-1 (???).
I can't imagine why you have "ticked" my answer as correct. Nobody knows what the question is yet!
Please edit your question to show your code (insert print repr(text) just before the text.decode("utf8") line), and the result of running it. Show the repr() result and the full traceback (so that we can determine which line is causing the error).
I ask again: can you make your file available for analysis?
By the way, u'\u2014' is an "EM DASH" and is a valid character in cp1252 (but not in latin-1, as you have seen from the error message). What version of what operating system are you using?
And to answer your last question, NO, you must NOT attempt to decode your text using every codec in the known universe. You are ALREADY getting plausible Unicode; something (your code?) is decoding something somehow -- the presence of u'\u2014' is enough evidence of that. Just show us your code and its results.

If you have read some bytes and want to interpret them as an unicode string, then you have to use .decode() rather than encode().
Like #delnan said in the comment, I hope you know the encoding. If not, the guesswork should go easy once you fix the function used.
BTW even if there are only ASCII characters in that word, why not .decode() it too? You'd have the same data type (unicode) everywhere, which will make your program simpler.

My first step in Python

I'm trying to start learning Python, but I became confused from the first step.
I'm getting started with Hello, World, but when I try to run the script, I get:
Syntax Error: Non-UTF-8 code starting with '\xe9' in file C:\Documents and Settings\Home\workspace\Yassine frist stared\src\firstModule.py on line 5 but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details.

add to the first line is
# -*- coding: utf-8 -*-

Put as the first line of your program this:
# coding: utf-8
See also Correct way to define Python source code encoding

First off, you should know what an encoding is. Read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
Now, the problem you are having is that most people write code in ASCII. Roughly speaking, that means that they use Latin letters, numerals and basic punctuation only in the code files themselves. You appear to have used a non-ASCII character code inside your program, which is confusing Python.
There are two ways to fix this. The first is to tell Python with what encoding you would like it to read the text file. You can do that by adding a # coding declaration at the top of the tile. The second, and probably better, is to restrict yourself to ASCII code. Remember that you can always have whatever characters you like inside strings, by writing them in their encoded form as e.g. \x00 or whatever.

When you run Python through the interpreter, you must run it in this format: python filename.py (command line args) or you will also get this error. I made the comment because you mentioned you were a beginner.

Lexical error: Encountered: "" (0), after : ""

I needed to start dealing with foreign characters, and in doing so, I think I royally screwed up a file's encoding.
The error I'm getting is:
Lexical error at line 1, column 8. Encountered: "" (0), after : ""
The first line of the file is:
import xml.etree.cElementTree as ET
Also of note: when I pasted the line above into the textarea to ask this question, and submitted, an unknown character appeared between every character (e
I have been unable to fix this issue by adding an explicit coding definition:
# -*- coding: utf-8 -*-
I have also been unable to revert the file (using Hg) to a previous version, nor copy/paste code into a new file, or replace the broken file with copied/pasted code.
Please help!

If it is indeed a zero character in there, you may find you've injected some UTF-16/UCS-2 text. That particular Unicode encoding would have a zero byte in between every ASCII character.
The best way to find out is to do a hex dump of you file with something like od -xcb myfile.py.
If that is the case, then you'll need to edit the file with something that's able to see those characters, and fix them up.
vi would be my first choice (since that's what I'm used to) but I don't want to start any holy wars with the Emacs illuminati. In vi, they'll most likely show up as ^# characters.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.