Where does this come from: -*- coding: utf-8 -*-

Python recognizes the following as an instruction that defines the file's encoding:
# -*- coding: utf-8 -*-
I'm sure I have seen this kind of instruction before (-*- var: value -*-). Where does it come from? What is the full specification, e.g. can the value include spaces, special symbols, newlines, even -*- itself?
My program will be writing plain text files and I'd like to include some metadata in them using this format.

This way of specifying the encoding of a Python file comes from PEP 0263 - Defining Python Source Code Encodings.
It is also recognized by GNU Emacs (see Python Language Reference, 2.1.4 Encoding declarations), though I don't know if it was the first program to use that syntax.

# -*- coding: utf-8 -*- is a Python 2 thing.
In Python 3.0+ the default encoding of source files is already UTF-8 so you can safely delete that line, because unless it says something other than some variation of "utf-8", it has no effect. See Should I use encoding declaration in Python 3?
pyupgrade is a tool you can run on your code to remove those comments and other useless leftovers from Python 2, like having all your classes inherit from object.

These are so-called file local variables, which Emacs reads and applies when it opens the file. See the corresponding section of the Emacs manual; you can define them either in the header or in the footer of the file.
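For a program that writes its own metadata in this style, a minimal sketch might look like the following. The helper name format_file_variables is hypothetical, and the restrictions it enforces (no newlines, semicolons, or -*- inside names or values) are a conservative assumption rather than the full Emacs rules; the Emacs manual is the place to check what values may actually contain.

```python
def format_file_variables(variables):
    """Build a '-*- name: value; ... -*-' line from a dict of simple values.

    Hypothetical helper: it rejects anything that could confuse a reader
    of the line instead of trying to escape it.
    """
    parts = []
    for name, value in variables.items():
        text = str(value)
        for piece in (name, text):
            if any(bad in piece for bad in ("\n", ";", "-*-")):
                raise ValueError(f"unsupported characters in {piece!r}")
        parts.append(f"{name}: {text}")
    return "-*- " + "; ".join(parts) + " -*-"


header = format_file_variables({"coding": "utf-8", "mode": "text"})
print(header)  # -*- coding: utf-8; mode: text -*-
```

Rejecting awkward values outright keeps the sketch honest about what it does not know; anything richer should be checked against how Emacs actually parses the line.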

In PyCharm, I'd leave it out. It turns off the UTF-8 indicator at the bottom with a warning that the encoding is hard-coded. Don't think you need the PyCharm comment mentioned above.

Related

Setting default encoding Openerp/Python

Do you guys know how to change the default encoding of an OpenERP file?
I've tried adding # -*- coding: utf-8 -*- but it doesn't work (is there a setting that ignores this comment? Just a wild guess). When I call sys.getdefaultencoding(), it still returns ASCII.
Regards

The comment # -*- coding: utf-8 -*- tells the python parser the encoding of the source file. It affects how the bytecode compiler converts unicode literals in the source code. It has no effect on the runtime environment.
You should explicitly define the encoding when converting strings to unicode. If you are getting UnicodeDecodeError, post your problem scenario and I'll try to help.
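As a rough illustration of decoding explicitly, here is a small Python 3 sketch. The sample bytes and the choice of UTF-8 are assumptions made up for the example, not something taken from the question.

```python
import sys

# Bytes as they might arrive from a file, socket, or database.
raw = b"caf\xc3\xa9"

# Name the encoding explicitly instead of relying on any default.
text = raw.decode("utf-8")
print(len(raw), len(text))  # 5 4

# The interpreter-wide default is a runtime setting; a coding comment
# in a source file does not change it.
print(sys.getdefaultencoding())
```

Five bytes become four characters because the last two bytes encode a single accented letter.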

Python, Emacs and Encoding

I have trouble with Emacs+Python 2.7.1+Encoding. According to PEP 0263, Python uses the same declaration of source encoding as emacs does.
There is no problem when I start my Python source code script with the following encoding tag:
#!/usr/bin/python
# -*- mode=python; encoding:us-ascii -*-
But when I add a line ending mode to my encoding such as in:
#!/usr/bin/python
# -*- mode=python; encoding:us-ascii-unix -*-
Emacs still accepts my encoding information, but I get the following error from Python when executing my script:
File "./unicode.py", line 2
SyntaxError: encoding problem: with BOM
Is there a way to tell Emacs about the line ending I want to use and at the same time tell Python about the source file encoding?

You can write two blocks: one that is parsed only by the interpreter, and one that is only parsed by Emacs:
#!/usr/bin/python
# coding: us-ascii
print "Hello World"
# Local Variables:
# mode: python
# coding: us-ascii-unix
# End:
Note that (1) us-ascii is the default in Python 2.x; and (2) Emacs is usually able to determine the line ending convention automatically; so you might be able to get along without declaring anything.

How does the "magic lines(s)" in python work, when specifying encoding in python file?

At the start of a python file (first line) sometimes I read
# -*- coding: utf-8 -*-
and sometimes I read
# encoding: utf-8
Both lines seem to do the same thing: specifying utf8 as encoding for all the text put in the file.
I have two questions:
Why does this even work? I thought the interpreter ignores everything after a # because it starts a comment.
What is the difference between the two lines above? Does the interpreter just ignore the -*-?

The two forms are equivalent. The -*- version is a special kind of comment that Emacs understands. See PEP 263 for more information.
If a comment in either of these forms appears on one of the first two lines of a file, the interpreter will use the specified encoding to read the file.

It works because the implementation looks for it; there is nothing magical about it. There is no difference between the two: all possible variants are defined by PEP 263 (the only difference is that the first one is Emacs-compatible).
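You can watch the lookup happen from Python 3 itself, since the standard library exposes it as tokenize.detect_encoding. The wrapper name declared_encoding and the sample sources below are just for illustration.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python's tokenizer detects for source_bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


emacs_style = b"# -*- coding: cp1252 -*-\nx = 1\n"
plain_style = b"# encoding: cp1252\nx = 1\n"
no_comment = b"x = 1\n"

print(declared_encoding(emacs_style))  # cp1252
print(declared_encoding(plain_style))  # cp1252
print(declared_encoding(no_comment))   # utf-8 (the Python 3 default)
```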

Hebrew characters in Python code on eclipse

I'm writing Python code in Eclipse and whenever I use Hebrew characters I get the following syntax error:
SyntaxError: Non-ASCII character '\xfa' in file ... on line 66, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
How do I declare unicode/utf-8 encoding?
I tried adding
-*- coding: Unicode -*-
or
-*- coding: utf-8 -*-
in the comment section at the beginning of the .py file. It didn't work.
I'm running eclipse with pydev, python 2.6 on windows 7.

I tried it also and here is my conclusion:
You should add
# -*- coding: utf-8 -*-
on the very first line of your file (the second line also works, e.g. after a shebang).
And yes, I work with windows...
If I got it right, you are missing the #

Ensure that the encoding the editor is using to enter data matches the declared encoding in the file metadata.
This isn't something unique to Eclipse or Python; it applies to all character data formats and text editors.
Python has a number of options for dealing with string literals in both the str and unicode types via escape sequences. I believe there were changes to string literals between Python 2 and 3.
Python 2.7 string literals
Python 3.2 string literals

I had the same thing and it was because I'd tried to do:
a='言語版の記事'
When I should have done:
a=u'言語版の記事'
I think it's python/pydev complaining when trying to parse the source, rather than eclipse as such.

"Unicode" is certainly wrong, and \xfa is not UTF-8. Figure out which encoding is actually being used and declare that instead.
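One way to narrow it down is to try candidate encodings on the offending byte. The candidate cp1255 (Windows Hebrew) below is only a guess based on the question mentioning Hebrew on Windows; nothing in the question confirms which encoding the file was saved in.

```python
offending = b"\xfa"

# 0xFA can never appear in valid UTF-8, so this decode fails.
try:
    offending.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# In Windows-1255 the same byte is the Hebrew letter tav (U+05EA).
as_cp1255 = offending.decode("cp1255")

print(is_utf8)              # False
print(hex(ord(as_cp1255)))  # 0x5ea
```

If the guess is right, declaring cp1255 (or re-saving the file as UTF-8 and declaring utf-8) would make the declaration match the bytes on disk.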

Correct way to define Python source code encoding

PEP 263 defines how to declare Python source code encoding.
Normally, the first 2 lines of a Python file should start with:
#!/usr/bin/python
# -*- coding: <encoding name> -*-
But I have seen a lot of files starting with:
#!/usr/bin/python
# -*- encoding: <encoding name> -*-
=> encoding instead of coding.
So what is the correct way of declaring the file encoding?
Is encoding permitted because the regex used is lazy? Or is it just another form of declaring the file encoding?
I'm asking this question because the PEP does not talk about encoding, it just talks about coding.

Check the docs here:
"If a comment in the first or second line of the Python script matches the regular expression coding[=:]\s*([-\w.]+), this comment is processed as an encoding declaration"
"The recommended forms of this expression are
# -*- coding: <encoding-name> -*-
which is recognized also by GNU Emacs, and
# vim:fileencoding=<encoding-name>
which is recognized by Bram Moolenaar’s VIM."
So, you can put pretty much anything before the "coding" part, but stick to "coding" (with no prefix) if you want to be 100% python-docs-recommendation-compatible.
More specifically, you need to use whatever is recognized by Python and the specific editing software you use (if it needs/accepts anything at all). E.g. the coding form is recognized (out of the box) by GNU Emacs but not Vim (yes, without a universal agreement, it's essentially a turf war).
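To see why any prefix works, you can run the regular expression quoted from the docs over a few candidate lines. The candidate lines are made up for illustration, and this only exercises the pattern, not the interpreter's full rules (such as the first-or-second-line requirement).

```python
import re

# Regular expression quoted from the Python documentation above.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

candidates = [
    "# -*- coding: utf-8 -*-",
    "# vim:fileencoding=latin-1",
    "# -*- encoding: utf-8 -*-",
    "# just an ordinary comment",
]

results = {}
for line in candidates:
    match = CODING_RE.search(line)
    # Group 1 is the encoding name; None means no declaration was found.
    results[line] = match.group(1) if match else None
    print(f"{line!r} -> {results[line]}")
```

Both "fileencoding" and "encoding" match simply because they end in "coding"; the pattern does not care what precedes it.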

Just copy and paste the lines below at the top of your program. They declare the source file as UTF-8, which resolves the "no encoding declared" error as long as the file is actually saved as UTF-8.
#!/usr/bin/env python
# -*- coding: utf-8 -*-

PEP 263:
the first or second line must match the regular expression "coding[:=]\s*([-\w.]+)"
So, "encoding: UTF-8" matches.
PEP provides some examples:
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
# This Python file uses the following encoding: utf-8
import os, sys

As of today — June 2018
PEP 263 itself mentions the regex it follows:
To define a source code encoding, a magic comment must be placed into
the source files either as first or second line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors):
#!/usr/bin/python
# -*- coding: <encoding name> -*-
or:
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
More precisely, the first or second line must match the following regular expression:
^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)
So, as already summed up by other answers, it'll match coding with any prefix, but if you'd like to be as PEP-compliant as it gets (even though, as far as I can tell, using encoding instead of coding does not violate PEP 263 in any way) — stick with 'plain' coding, with no prefixes.

If I'm not mistaken, the original proposal for source file encodings was to use a regular expression for the first couple of lines, which would allow both.
I think the regex was something along the lines of coding: followed by something.
I found this: http://www.python.org/dev/peps/pep-0263/
Which is the original proposal, but I can't seem to find the final spec stating exactly what they did.
I've certainly used encoding: to great effect, so obviously that works.
Try changing to something completely different, like duhcoding: ... to see if that works just as well.
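That experiment is easy to run in Python 3 without touching a file, because compile() honours the declaration when it is handed bytes. This is only a sketch: the source snippet is invented, and Latin-1 is chosen just so that a wrong guess at the encoding would be visible.

```python
# A made-up prefix before "coding", plus a string literal whose single
# byte 0xE9 is only valid source if the declaration is honoured.
source = b"# duhcoding: latin-1\ntext = '\xe9'\n"

namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)

# 0xE9 decoded as Latin-1 is U+00E9, so the declaration was picked up.
print(hex(ord(namespace["text"])))  # 0xe9
```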

I suspect it is similar to Ruby - either method is okay.
This is largely because different text editors use different methods (ie, these two) of marking encoding.
With Ruby, it works as long as the first line (or the second, if there is a shebang line) contains a string that matches:
coding: encoding-name
Whitespace and other fluff on those lines is ignored. (It can often be a = instead of :, too.)
