Python: Can't write to file - UnicodeEncodeError

This code should write some text to a file.
When I write the text to the console, everything works. But when I try to write it to a file, I get a UnicodeEncodeError. I know that this is a common problem that can be solved with the proper decode or encode call, but I tried that and still get the same UnicodeEncodeError. What am I doing wrong?
I've attached an example.
print "(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)".decode("utf-8")%(dict.get('name'),dict.get('description'),dict.get('ico'),dict.get('city'),dict.get('ulCislo'),dict.get('psc'),dict.get('weby'),dict.get('telefony'),dict.get('mobily'),dict.get('faxy'),dict.get('emaily'),dict.get('dic'),dict.get('ic_dph'),dict.get('kategorie')[0],dict.get('kategorie')[1],dict.get('kategorie')[2])
(StarBuy s.r.o.,Inzertujte s foto, auto-moto, oblečenie, reality, prácu, zvieratá, starožitnosti, dovolenky, nábytok, všetko pre deti, obuv, stroj....
with open("test.txt","wb") as f:
    f.write("(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)".decode("utf-8")%(dict.get('name'),dict.get('description'),dict.get('ico'),dict.get('city'),dict.get('ulCislo'),dict.get('psc'),dict.get('weby'),dict.get('telefony'),dict.get('mobily'),dict.get('faxy'),dict.get('emaily'),dict.get('dic'),dict.get('ic_dph'),dict.get('kategorie')[0],dict.get('kategorie')[1],dict.get('kategorie')[2]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u010d' in position 50: ordinal not in range(128)
Where could the problem be?

To write Unicode text to a file, you could use the io.open() function:
#!/usr/bin/env python
from io import open
with open('utf8.txt', 'w', encoding='utf-8') as file:
    file.write(u'\u010d')
This is the behavior of the built-in open() on Python 3.
Note: you should not use the binary file mode ('b') if you want to write text.
The # coding: utf8 comment, which defines the source code encoding, has nothing to do with it.
If you see sys.setdefaultencoding() outside of site.py or Python's own tests, assume the code is broken.
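As a rough sketch of how this applies to the question's data, the snippet below builds a record as Unicode text, writes it with io.open(), and reads it back. The record dictionary, its values, and the temporary file path are made up for illustration. Only io.open() and its encoding argument come from the answer above. It is written for Python 3.

```python
import io
import os
import tempfile

# Hypothetical record standing in for the question's `dict`.
record = {
    u"name": u"StarBuy s.r.o.",
    u"description": u"oble\u010denie, pr\u00e1cu",
}

# Build the whole line as Unicode text first.
line = u"(%s,%s)" % (record[u"name"], record[u"description"])

path = os.path.join(tempfile.mkdtemp(), "utf8.txt")

# Text mode ('w', not 'wb') plus an explicit encoding: the file object
# encodes the Unicode text to UTF-8 bytes on write.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(line)

# Reading with the same encoding gives back the same text.
with io.open(path, "r", encoding="utf-8") as f:
    read_back = f.read()

print(read_back == line)
```

The key design choice is to keep everything as text until the file boundary and let the file object do the single encode step.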

#ned-batchelder is right. You have to declare that the system default encoding is "utf-8". The coding comment # -*- coding: utf-8 -*- doesn't do this.
To declare the system default encoding, you have to import the module sys, and call sys.setdefaultencoding('utf-8'). However, sys was previously imported by the system and its setdefaultencoding method was removed. So you have to reload it before you call the method.
So you will need to add the following code at the beginning:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

You may need to explicitly declare that Python uses UTF-8 encoding.
The answer to this SO question explains how to do that: Declaring Encoding in Python

For Python 2:
Declare the source encoding at the top of the file (if not done yet):
# -*- coding: utf-8 -*-
Format first, then call .encode on the result (decoding or encoding only the format string still leaves a unicode object that is implicitly encoded as ASCII on write):
with open("test.txt","wb") as f:
    f.write((u"(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"%(dict.get('name'),dict.get('description'),dict.get('ico'),dict.get('city'),dict.get('ulCislo'),dict.get('psc'),dict.get('weby'),dict.get('telefony'),dict.get('mobily'),dict.get('faxy'),dict.get('emaily'),dict.get('dic'),dict.get('ic_dph'),dict.get('kategorie')[0],dict.get('kategorie')[1],dict.get('kategorie')[2])).encode("utf-8"))

Related

Encode Strings in Python

I want to run my code in the terminal but it shows me this error:
SyntaxError: Non-ASCII character '\xd8' in file streaming.py on line
72, but no encoding declared; see http://python.org/dev/peps/pep-0263/
for detail
I tried to encode the Arabic string using this:
# -*- coding: utf-8 -*-
st = 'المملكة العربية السعودية'.encode('utf-8')
It's very important for me to run it in the terminal, so I can't use IDLE.
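The problem is that you are pasting non-ASCII characters directly into a Python file. Without an encoding declaration, the Python 2 interpreter reads the source as ASCII (it has to parse the literal before your .encode call ever runs), so any non-ASCII byte is a syntax error. The declaration is only honored on the first or second line of the file. Once the source encoding is declared, what you want for non-ASCII text is a unicode literal: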
x=u'المملكة العربية السعودية' #Or whatever the corresponding bytes are
print x.encode('utf-8')
You can also set the entire source file to be read as utf-8:
#!/usr/bin/python
# -*- coding: utf-8 -*-
Don't forget to make the script executable. Lastly, you can import Python 3's string behavior from __future__:
from __future__ import unicode_literals
at the top of the file, so that string literals are unicode by default. Note that \xd8 appears as phi in my terminal, so make sure the terminal's encoding is correct too.
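A small sketch of the difference between the text and its encoded bytes, assuming the file is saved as UTF-8 and carries the declaration on its first line. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-

# A unicode literal holds text (characters), not bytes.
country = u'المملكة العربية السعودية'

# Encoding turns the text into UTF-8 bytes, e.g. for a binary file or socket.
encoded = country.encode('utf-8')

# The byte string is longer than the text because each Arabic letter
# needs more than one byte in UTF-8.
print(len(country))
print(len(encoded))

# The first byte is 0xd8, the same byte named in the SyntaxError message.
print(encoded[:1] == b'\xd8')

# Decoding with the same codec restores the original text.
print(encoded.decode('utf-8') == country)
```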

Python, Windows and a Unicode file

I'm trying to run a simple utility I wrote for Linux, which I thought would run without problems on Windows. Wrong.
The script parses a simple file using the "re" module for regex. The problem is that the expression fails every time because Windows doesn't handle the text file well: it is UTF-8 and contains characters like áéíóú or ñ (it's in Spanish).
I've found a lot of material about printing text in Unicode, but nothing about reading a Unicode line from a text file or using regex with Unicode on Windows. I thought you might shed some light on the subject.
open() decodes a file with the encoding returned by locale.getpreferredencoding(False). That is likely to be utf-8 on POSIX systems and something else on Windows, e.g. cp1252.
If you know the text in the file is stored using the utf-8 character encoding, pass it explicitly:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import io
import re
with io.open("filename.txt", encoding='utf-8') as file:
    for line in file:
        if re.search(u"(?u)unicode\s+pattern…", line):
            pass  # found..
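A runnable sketch of the same idea, assuming a hypothetical Spanish word list. The file name, sample words, and pattern are invented for illustration. It also prints the locale default that open() would fall back to if no encoding were passed. It is written for Python 3.

```python
import io
import locale
import os
import re
import tempfile

# The encoding open() would use if none were passed; it varies by platform.
print(locale.getpreferredencoding(False))

path = os.path.join(tempfile.mkdtemp(), "palabras.txt")

# Write a small UTF-8 file (hypothetical sample data).
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"ni\u00f1o\ncami\u00f3n\nperro\n")

# Match whole words that contain at least one accented vowel or n-tilde.
# re.UNICODE makes \w cover accented letters on Python 2; on Python 3
# that is already the default for text patterns.
pattern = re.compile(
    u"^\\w*[\u00e1\u00e9\u00ed\u00f3\u00fa\u00f1]\\w*$", re.UNICODE
)

matches = []
with io.open(path, encoding="utf-8") as f:
    for line in f:
        word = line.rstrip(u"\n")
        if pattern.search(word):
            matches.append(word)

print(len(matches))
```

Because both the write and the read name the encoding, the result should not depend on whether the script runs on Linux or Windows.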

How do I direct output to a file when there are UTF-8 characters?

I have a python script that grabs a bunch of recent tweets from the twitter API and dumps them to screen. It works well, but when I try to direct the output to a file something strange happens and a print statement causes an exception:
> ./tweets.py > tweets.txt
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2018' in position 61: ordinal not in range(128)
I understand that the problem is with a UTF-8 character in one of the tweets that doesn't translate well to ASCII, but what is a simple way to dump the output to a file? Do I fix this in the python script or is there a way to coerce it at the commandline?
BTW, the script was written in Python2.
Without modifying the script, you can just set the environment variable PYTHONIOENCODING=utf8 and Python will assume that encoding when redirecting to a file.
References:
https://docs.python.org/2.7/using/cmdline.html#envvar-PYTHONIOENCODING
https://docs.python.org/3.3/using/cmdline.html#envvar-PYTHONIOENCODING
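To see the effect without touching a real script, the sketch below starts a child Python process with PYTHONIOENCODING set and captures its output through a pipe. The child one-liner is a stand-in for the tweets script, not the script itself.

```python
import os
import subprocess
import sys

# Child program: prints a single non-ASCII character (U+2018).
child_code = "print(u'\\u2018')"

# Copy the current environment and add the variable, much as a shell does
# for `PYTHONIOENCODING=utf8 ./tweets.py > tweets.txt`.
env = dict(os.environ)
env["PYTHONIOENCODING"] = "utf8"

# stdout is a pipe here, so the child is not writing to a terminal,
# which is the same situation as a redirect to a file.
raw = subprocess.check_output([sys.executable, "-c", child_code], env=env)

# The captured output is the UTF-8 encoding of U+2018 plus a newline.
print(repr(raw.strip()))
```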
You may need to encode the unicode object with .encode('utf-8').
In your Python file, add this as the first line:
# -*- coding: utf-8 -*-
If your script runs standalone, put it on the second line, after the shebang:
#!/usr/local/bin/python
# -*- coding: utf-8 -*-
Here is the document: PEP 0263

How does "coding: pyxl" work in Python?

pyxl and interpy use a very interesting trick to extend the Python syntax, based on the coding: declaration from PEP-263:
# coding: pyxl
print <html><body>Hello World!</body></html>
or
# coding: interpy
package = "Interpy"
print "Enjoy #{package}!"
How could I write my own coding: if I wanted to? And could I use more than one?
I'm Syrus, the creator of interpy.
Thanks to codec declarations (# coding: your_codec_name), Python gives us a chance to preprocess the file before it is converted to bytecode.
This is how it works:
Reading the file contents
At first, Python reads the file and stores its content. As the content could be encoded in a strange format, Python tries to decode it. Here is where the magic happens.
If no coding is declared, Python will try to decode the content with the default source encoding: ASCII or UTF-8, depending on the Python version. This is why you have to write # coding: utf-8 when using non-ASCII characters (á, ñ, Ð, ...) in Python 2, because ASCII is the default there.
Decoding file contents
If we register a custom codec (both encoder and decoder), and a file tells Python it is using our codec (via # coding: codec_name), then Python will decode the file with our codec.
Registering our codec
To register the codec without needing an import, we create a path configuration file (.pth) which registers the codec before any non-main-module is executed.
Transforming file contents
Once the decoder of our codec is called, we can modify the output however we want, but... how do we recognize Python syntax (tokens) inside this content?
Simply call the Python tokenizer with the file contents and modify the desired tokens.
In the case of interpy, it changes the behavior only when Python strings are found in the file content.
Sending back transformed (decoded) contents
Once we transform the content, we send it back to the Python compiler to be compiled to bytecode.
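The steps above can be sketched as a toy codec. This is a minimal illustration and not how pyxl or interpy are actually implemented: the codec name upper_strings, the helper names, and the transformation (upper-casing plain string literals) are all invented here. It registers a decoder with codecs.register() and applies it directly with codecs.decode(). Hooking it up to a real # coding: line would likely also need incremental and stream decoders plus the .pth registration described above. It is written for Python 3.

```python
import ast
import codecs
import io
import tokenize

CODEC_NAME = "upper_strings"  # hypothetical codec name


def transform(source_text):
    """Upper-case every plain string literal, leaving other tokens alone."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source_text).readline):
        tok_type, tok_string = tok[0], tok[1]
        if tok_type == tokenize.STRING:
            value = ast.literal_eval(tok_string)
            if isinstance(value, str):
                tok_string = repr(value.upper())
        out.append((tok_type, tok_string))
    # Given (type, string) pairs, untokenize() regenerates valid source,
    # though the spacing may differ from the original.
    return tokenize.untokenize(out)


def decode(data, errors="strict"):
    """Decode bytes as UTF-8, then rewrite the resulting source text."""
    text, consumed = codecs.lookup("utf-8").decode(data, errors)
    return transform(text), consumed


def search(name):
    """Codec search function: answer only for our own codec name."""
    if name != CODEC_NAME:
        return None
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(name=CODEC_NAME, encode=utf8.encode, decode=decode)


codecs.register(search)

# "Source file" contents as bytes, decoded through the custom codec.
source_bytes = b'package = "Interpy"\ngreeting = "Enjoy " + package\n'
rewritten = codecs.decode(source_bytes, CODEC_NAME)

# Compile and run the rewritten source, as the interpreter would.
namespace = {}
exec(compile(rewritten, "<decoded>", "exec"), namespace)
print(namespace["greeting"])
```

Working on tokens rather than raw text is what keeps the rewrite from touching comments or identifiers that merely look like strings.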
Hope you find this useful!

Setting default encoding Openerp/Python

Do you guys know how to change the default encoding of an openerp file?
I've tried adding # -*- coding: utf-8 -*- but it doesn't work (is there a setup that ignores this command? Just a wild guess). When I call sys.getdefaultencoding(), it still reports ASCII.
Regards
The comment # -*- coding: utf-8 -*- tells the python parser the encoding of the source file. It affects how the bytecode compiler converts unicode literals in the source code. It has no effect on the runtime environment.
You should explicitly define the encoding when converting strings to unicode. If you are getting UnicodeDecodeError, post your problem scenario and I'll try to help.
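A short sketch of what defining the encoding explicitly can look like, assuming the incoming data is UTF-8 encoded bytes. The sample value is made up.

```python
import sys

# The interpreter-wide default is not affected by the source coding comment.
print(sys.getdefaultencoding())

# Hypothetical input: UTF-8 bytes as they might arrive from a file or socket.
raw = b"oble\xc4\x8denie"

# Decode with an explicit codec instead of relying on the default.
text = raw.decode("utf-8")

# Encode explicitly again when bytes are needed for output.
round_trip = text.encode("utf-8")

print(round_trip == raw)
```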
