I'm new to Python.
I'm trying to print some Chinese text to the Windows 10 command line and to a file, but I ran into a problem.
Here is my code:
fh = open("hello.txt", "w")
str="欢迎大家加入自由职业者群体。谢谢大家"
print(str)
fh.write(str)
fh.close()
The default encoding of files is locale.getpreferredencoding(False), which seems to be cp1252 on your system. Specify the encoding when opening the file.
Also use with and the file will be closed for you when it exits the block:
#!python3.6
with open('hello.txt','w',encoding='utf8') as fh:
    str="欢迎大家加入自由职业者群体。谢谢大家"
    print(str)
    fh.write(str)
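As a rough sketch of the point about default encodings, the snippet below prints the encoding that open() would fall back to and then round-trips the text through a file with an explicit encoding. The temporary directory and the variable names are only for illustration and are not part of the original code.

```python
import locale
import os
import tempfile

text = "欢迎大家加入自由职业者群体。谢谢大家"

# The encoding open() falls back to when none is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)
print("default file encoding:", default_encoding)

# Write and read back with an explicit encoding, so the result does not
# depend on the platform default. The temp path is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write(text)
with open(path, encoding="utf-8") as fh:
    round_tripped = fh.read()

# Only ASCII is printed here, so the console font and code page do not matter.
print("round trip ok:", round_tripped == text)
```

If the default printed on your machine is not a UTF-8 variant, that would explain why the original open("hello.txt", "w") failed on Chinese text.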
To see the Chinese characters on the console you'll need to install a Chinese language pack, and change the console font to one that supports Chinese. Using an IDE that supports UTF-8 will also work. The "boxed question mark" characters are what is displayed when the font doesn't support the characters. If you cut-n-paste those characters to an application like Notepad that has support for Chinese fonts you should see the correct characters.
Here's my US Windows system with the Chinese Language Pack. The console is configured with the SimHei font.
Couple of issues:
There shouldn't be any indentation after the fh variable declaration. You shouldn't name a string "str", because that shadows the built-in str type. In Python 2, if you want to use characters outside ASCII in your source, you need to declare the file's encoding like so: # -*- coding: utf-8 -*- (put that at the top of your file); Python 3 already assumes UTF-8 source by default. Then it should work, although the terminal does sometimes have issues with non-Latin characters.
# -*- coding: utf-8 -*-
fh = open("hello.txt", "w")
str1="欢迎大家加入自由职业者群体。谢谢大家"
print(str1)
fh.write(str1)
fh.close()
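To illustrate why naming a variable "str" is risky, here is a small sketch. The function name shadowing_demo is made up for this example; it only shows that the built-in becomes unreachable inside the scope where the name is reused.

```python
def shadowing_demo():
    # Inside this function, "str" now refers to a string value,
    # not to the built-in str type.
    str = "hello"
    try:
        str(5)  # tries to call the string "hello"
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


result = shadowing_demo()
print(result)

# Outside the function the built-in is untouched.
print(str(5))
```

At module level the same assignment would hide the built-in for the rest of the script, which is why a name like str1 or text is safer.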
Edit
Official solution is, use PyCharm!
For me, changing the font in command line worked.
System: Windows 10
Font that worked: NSimSun
Related
I was trying to run some simple code that should count the number of "ü" characters in a string, but just doing:
string1= "pingüino"
print(string1)
gives the error:
SyntaxError: Non-UTF-8 code starting with '\xfc'
I get that when opening a file you can do:
with open("path to file", "r", encoding="utf-8") as file1:
which forces the UTF-8 encoding for file1. But what if I just want to store the string in a variable, without using a text file? I'm currently using PyCharm, and under File->Settings->File encoding the global encoding is UTF-8 and the project encoding is windows-1252. (I tried changing it to UTF-8, but then the ü appears as an unrecognized symbol in the code.) The Python version I'm using is 3.7.7.
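The way to achieve what I wanted was to add a magic comment to the code. You need to add:
# encoding: utf-8
on the first or second line of the file. It tells Python which encoding to use when reading the source file, which allows string literals with special characters. The declared encoding has to match the encoding the editor actually saves the file in.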
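The '\xfc' in the error message can be reproduced without any file. This sketch assumes the editor saved the source as windows-1252 (cp1252), which matches the project encoding mentioned in the question, and shows why Python 3, which reads source as UTF-8 by default, rejects that byte.

```python
word = "pingüino"

# How an editor configured for windows-1252 would store the word on disk.
cp1252_bytes = word.encode("cp1252")
print(cp1252_bytes)  # b'ping\xfcino'

# A lone 0xfc byte is not valid UTF-8, so decoding as UTF-8 fails.
try:
    cp1252_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print("valid UTF-8:", decodes_as_utf8)

# Saved as UTF-8, the same character takes two bytes instead.
utf8_bytes = word.encode("utf-8")

# Once the source is read correctly, counting works as expected.
count = word.count("ü")
print("count:", count)
```

So either the file has to be saved as UTF-8, or the coding comment has to name the encoding the file is really stored in.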
I am trying to add some symbols to a text file. I cannot define these symbols in my editor,
but it works from the command line.
symbols = '$¢£¥€¤' works in the interpreter but not in the editor (Sublime). However, the interpreter doesn't print these symbols correctly on the command line unless I call decode("utf-8") first; then print works fine.
symbols = '$¢£¥€¤'
s=symbols.decode("utf-8")
I use Python 2.7 and the Sublime Text editor.
This is the error I get when I run it from the editor:
SyntaxError: Non-ASCII character '\xc2' in file /home/programmer/Desktop/seleniumIns.py on line 184, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
How can I fix this so I can add the symbols to my original program in the editor?
When you run a python file containing unicode, you need to tell the interpreter what encoding is used.
In your case put at the very first line of your script this line:
# -*- coding: utf-8 -*-
And you'll be able to use utf-8!
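As a sketch of where the '\xc2' in the error message comes from, the snippet below encodes the symbols as UTF-8 and looks at the first non-ASCII byte, then writes them to a file with an explicit encoding. It is written for Python 3 (the question uses 2.7, where io.open behaves the same way); the temporary path is illustrative only.

```python
import io
import os
import tempfile

symbols = u"$¢£¥€¤"

# '¢' is U+00A2, which UTF-8 stores as the two bytes 0xc2 0xa2.
encoded = symbols.encode("utf-8")
first_non_ascii = [b for b in bytearray(encoded) if b > 0x7F][0]
print(hex(first_non_ascii))  # 0xc2

# Writing the symbols to a text file with an explicit encoding.
path = os.path.join(tempfile.mkdtemp(), "symbols.txt")
with io.open(path, "w", encoding="utf-8") as out:
    out.write(symbols)
with io.open(path, encoding="utf-8") as src:
    read_back = src.read()

print("round trip ok:", read_back == symbols)
```

The 0xc2 byte suggests the editor is already saving the file as UTF-8, so only the declaration was missing.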
I'm trying to run a simple utility I wrote for Linux, which I thought would run without problems on Windows. Wrong.
The script parses a simple file using the "re" module for regex. The problem is that the expression fails every time because Windows doesn't handle the text file correctly: it is UTF-8 and contains characters like áéíóú or ñ (it's in Spanish).
I've found a lot of material about printing text in Unicode, but nothing about reading a Unicode line from a text file or using regex with Unicode on Windows. I thought you might shed some light on the subject.
open() uses locale.getpreferredencoding(False) encoding to decode a file. It is likely to be utf-8 on POSIX systems and it is something else on Windows e.g., cp1252.
If you know the text in the file is stored using utf-8 character encoding then pass it explicitly:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import io
import re
with io.open("filename.txt", encoding='utf-8') as file:
    for line in file:
        if re.search(u"(?u)unicode\s+pattern…", line):
            pass  # found..
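A self-contained variant of the same idea might look like the sketch below. The sample Spanish text, the pattern, and the temporary file are invented for illustration; they stand in for the real file and regex, which the question does not show.

```python
import io
import os
import re
import tempfile

# Create a small UTF-8 file to read (stand-in for the real input file).
path = os.path.join(tempfile.mkdtemp(), "filename.txt")
with io.open(path, "w", encoding="utf-8") as out:
    out.write(u"el niño comió\nplain ascii line\n")

# Hypothetical pattern: any word containing the letter ñ.
pattern = re.compile(r"\w*ñ\w*", re.UNICODE)

matches = []
with io.open(path, encoding="utf-8") as src:
    for line in src:
        found = pattern.search(line)
        if found:
            matches.append(found.group(0))

print(len(matches), "matching line(s)")
```

Because the file is decoded explicitly, the regex sees the same characters on Windows and on Linux.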
I'm writing Python code in Eclipse, and whenever I use Hebrew characters I get the following syntax error:
SyntaxError: Non-ASCII character '\xfa' in file ... on line 66, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
How do I declare unicode/utf-8 encoding?
I tried adding
-*- coding: Unicode -*-
or
-*- coding: utf-8 -*-
in the comment section at the beginning of the .py file. It didn't work.
I'm running Eclipse with PyDev, Python 2.6 on Windows 7.
I tried it also and here is my conclusion:
You should add
# -*- coding: utf-8 -*-
at the very first line in your file.
And yes, I work with windows...
If I got it right, you are missing the #
Ensure that the encoding the editor is using to enter data matches the declared encoding in the file metadata.
This isn't something unique to Eclipse or Python; it applies to all character data formats and text editors.
Python has a number of options for dealing with string literals in both the str and unicode types via escape sequences. I believe there were changes to string literals between Python 2 and 3.
Python 2.7 string literals
Python 3.2 string literals
I had the same thing and it was because I'd tried to do:
a='言語版の記事'
When I should have done:
a=u'言語版の記事'
I think it's python/pydev complaining when trying to parse the source, rather than eclipse as such.
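The '\xfa' byte from the error message in the question is a hint about the file's real encoding. The sketch below tries a few candidate encodings on that single byte. The candidate list is a guess: cp1255 and iso-8859-8 are common Hebrew code pages, but the question does not say which one the editor actually used.

```python
raw = b"\xfa"

# Candidate encodings to try (assumed, not taken from the question).
candidates = ["utf-8", "cp1255", "iso-8859-8", "cp1252"]

results = {}
for name in candidates:
    try:
        results[name] = raw.decode(name)
    except UnicodeDecodeError:
        results[name] = None  # the byte is not valid in this encoding

# ascii() keeps the output printable on any console.
for name in candidates:
    print(name, ascii(results[name]))
```

A lone 0xfa byte cannot be decoded as UTF-8, while both Hebrew code pages map it to a Hebrew letter, so declaring one of them (or re-saving the file as UTF-8) would be the next thing to try.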
"Unicode" is certainly wrong, and \xfa is not UTF-8. Figure out which encoding is actually being used and declare that instead.
The title explains it well. I have set up Notepad++ to run the Python script in the command prompt when I press F8, but all Swedish characters look messed up in CMD, while they look perfectly fine in e.g. IDLE.
This simple example code:
#!/usr/bin/env python
#-*- coding: UTF-8 -*-
print "åäö"
Looks like this.
As you can see, the output of the batch file I use to open Python in cmd (below) shows the characters correctly, but the Python script above it does not. How do I fix this? I just want to show the characters correctly; I don't necessarily have to use UTF-8.
I open the file in cmd using this method.
Update: Solved. I added a "chcp 1252" line at the top of the batch file and a "cls" line below it to clear the message about which code page is active. Then I used "# -*- coding: cp1252 -*-" in the Python script and changed the font in cmd to Lucida Console. This is done by clicking the cmd icon in the title bar of the cmd window and going into Properties.
You're printing out UTF-8 bytes, but your console is not set to UTF-8. Either write Unicode as UTF-16, or set your console codepage to UTF-8.
print u"åäö"
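The garbling can be simulated without a console. This sketch assumes the console is using code page 437, which is one common default and only an assumption here, and decodes the UTF-8 bytes the way such a console would.

```python
text = "åäö"

# What a byte-string print sends to the console when the source is UTF-8.
utf8_bytes = text.encode("utf-8")

# What a cp437 console makes of those bytes: six unrelated characters.
mojibake = utf8_bytes.decode("cp437")
print(ascii(mojibake))

# The damage is reversible, which confirms it is purely a decoding mismatch.
restored = mojibake.encode("cp437").decode("utf-8")
print("restored ok:", restored == text)
```

Each of the three letters becomes two wrong characters, which matches the typical look of UTF-8 text shown in a legacy code page.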
I had the same issue and I used cp1252:
C:\>chcp 1252
This made the console use the 1252 code page, and when I then ran my program it displayed the Swedish characters correctly.
Python will normally convert Unicode strings to the Windows console's encoding. Note that to use Unicode properly, you need Unicode strings (e.g., u'string') and need to declare the encoding the file is saved in with a coding: line.
For example, this (saved in UTF-8 as x.py on my system):
# coding: utf8
print u"åäö"
Produces this:
C:\>chcp
Active code page: 437
C:\>x
åäö
You'll only be able to successfully print characters that are supported by the active code page.
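One way to check in advance whether a string fits a given code page is to try encoding it. The helper name can_encode is made up for this sketch, and the code pages are passed in by hand rather than read from the console.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


swedish = "åäö"
chinese = "欢迎大家"

print(can_encode(swedish, "cp437"))   # True
print(can_encode(chinese, "cp437"))   # False
print(can_encode(chinese, "cp1252"))  # False
print(can_encode(chinese, "utf-8"))   # True
```

In a real script, sys.stdout.encoding could be passed as the encoding argument to test against the console that is actually in use.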
Set coding to: # -*- coding: ISO-8859-1 -*-
This worked for me and I tried a lot of different solutions to get it to work with Visual Studio IDE for Python.
# -*- coding: ISO-8859-1 -*-
print ("åäö")