Unicode error when writing Russian symbols to CSV - Python

I want to write Cyrillic symbols to a CSV file, but I get a Unicode encode error. English symbols work perfectly. I'm using Python 3.6.2.
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-6: ordinal not in range(128)
import csv

with open("test.csv", 'w') as csvfile:
    csvfile = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    hello = 'привет, мир!'
    csvfile.writerow([hello])

Declare the encoding of the file when you open it. newline='' is also required per the csv documentation.
import csv

with open('test.csv', 'w', encoding='utf8', newline='') as csvfile:
    csvfile = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    hello = 'привет, мир!'
    csvfile.writerow([hello])
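To read the file back, the same encoding and newline='' apply; a minimal sketch:

import csv

# Read the rows back with the matching encoding
with open('test.csv', encoding='utf8', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter=',', quotechar='|'):
        print(row)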

On Python 2, you just need to encode the hello string before you write it to the (csv) file. Otherwise Python tries to encode it as ASCII; for non-ASCII characters you can use UTF-8 encoding instead:
# -*- coding: utf-8 -*-
import csv

with open("test.csv", 'w') as csvfile:
    csvfile = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    hello = u'привет, мир!'  # Better way of declaring a unicode string literal
    csvfile.writerow([hello.encode("utf-8")])

Add this code at the top of your file (Python 2 only; the reload is needed because setdefaultencoding is removed from sys at interpreter startup):
# encoding=utf8
import sys

reload(sys)
sys.setdefaultencoding('utf8')

For Python 2 users, use this instead of the normal open function:
import codecs
codecs.open(out_path, encoding='utf-8', mode='w')
This is the equivalent of the following in Python 3:
open(out_path, 'w', encoding='utf8')
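A minimal Python 2 sketch with codecs.open; out_path here is just an assumed example path, and the returned file object encodes the unicode string for you:

# -*- coding: utf-8 -*-
import codecs

out_path = 'out.txt'  # assumed example path
f = codecs.open(out_path, encoding='utf-8', mode='w')
f.write(u'привет, мир!')  # unicode goes in, UTF-8 bytes land in the file
f.close()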

Related

How to fix an encoding issue in a Python script for the Spanish alphabet

This short script converts a CSV into JSON. The CSV contains letters from the Spanish alphabet, which is still UTF-8 I believe. The script seems to have an issue reading letters like ñ or é from the CSV.
After executing
python2 csvToJSON.py
the console returns 'UnicodeDecodeError: 'utf8' codec can't decode byte 0xed in position 11: invalid continuation byte'
# -*- coding: utf-8 -*-
import codecs
import csv
import json

csvfile = codecs.open('practice.csv', encoding='utf-8').read()
# csvfile = open('practice.csv', 'r')
jsonfile = open('file.json', 'w')
fieldnames = ("contraseña", "id")
reader = csv.DictReader(csvfile, fieldnames)
for row in reader:
    json.dump(row, jsonfile)
    jsonfile.write('\n')
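One possible cause: 0xed is not a valid UTF-8 continuation byte, but it is í in Latin-1/Windows-1252, so the CSV may not actually be saved as UTF-8. A minimal Python 2 sketch under that assumption (the Python 2 csv module works on byte strings, so the file is opened in binary mode and each field is decoded after parsing):

# -*- coding: utf-8 -*-
import csv
import json

# Sketch: assumes practice.csv is actually Latin-1/Windows-1252 encoded
with open('practice.csv', 'rb') as csvfile, open('file.json', 'w') as jsonfile:
    fieldnames = ("contraseña", "id")
    reader = csv.DictReader(csvfile, fieldnames)
    for row in reader:
        # Decode every byte-string field once the csv module has parsed the row
        decoded = {key: value.decode('latin-1') if isinstance(value, str) else value
                   for key, value in row.items()}
        json.dump(decoded, jsonfile)
        jsonfile.write('\n')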

How to display Cyrillic text from a file in Python?

I want to read some Cyrillic text from a .txt file in Python 3.
This is what the text file contains.
абцдефгчийклмнопярстувшхыз
I used:
with open('text.txt', 'r') as myfile:
    text = myfile.read()
print(text)
But this is the output in the Python shell:
ÿþ01F45D3G89:;<=>?O#ABC2HEK7
Can someone explain why this is the output?
Python supports utf-8 for this sort of thing.
You should be able to do:
with open('text.txt', encoding='utf-8', mode='r') as my_file:
    ...
Also, be sure that your text file is saved with utf-8 encoding. I tested this in my shell and without proper encoding my output was:
?????????????????????
With proper encoding:
file = open('text.txt', encoding='utf-8', mode='r')
text = file.read()
print(text)
абцдефгчийклмнопярстувшхы
Try working on the file using codecs. You need to
import codecs
and then do
text = codecs.open('text.txt', 'r', 'utf-8').read()
Basically, you need UTF-8.
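For what it's worth, the leading ÿþ in that garbled output is the UTF-16 little-endian byte order mark, which suggests the file was saved as UTF-16 rather than UTF-8. If re-saving it as UTF-8 is not an option, opening it as UTF-16 may also work; a minimal sketch:

# Sketch: decode the file as UTF-16 (the "ÿþ" prefix is the UTF-16 LE BOM)
with open('text.txt', 'r', encoding='utf-16') as myfile:
    print(myfile.read())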

Do I have to encode a unicode variable before writing it to a file?

I read the "Unicode Pain" article a few days ago, and I keep the "Unicode Sandwich" in mind.
Now I have to handle some Chinese and I've got a list
chinese = [u'中文', u'你好']
Do I need to encode before writing to the file?
add_line_break = [word + u'\n' for word in chinese]
encoded_chinese = [word.encode('utf-8') for word in add_line_break]

with open('filename', 'wb') as f:
    f.writelines(encoded_chinese)
Somehow I found out that in Python 2 I can do this:
chinese = ['中文', '你好']

with open('filename', 'wb') as f:
    f.writelines(chinese)
No unicode matters involved. :D
You don't have to do that, you could use io or codecs to open the file with encoding.
import io

with io.open('file.txt', 'w', encoding='utf-8') as f:
    f.write(u'你好')
codecs.open has the same syntax.
In Python 3,
with open('file.txt', 'w', encoding='utf-8') as f:
    f.write('你好')
will do just fine.
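Applied to the list from the question, a minimal sketch (no manual encode step is needed, since io.open encodes on write):

# -*- coding: utf-8 -*-
import io

chinese = [u'中文', u'你好']

# io.open handles the encoding, so the words can stay unicode all the way through
with io.open('filename', 'w', encoding='utf-8') as f:
    f.writelines(word + u'\n' for word in chinese)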

Writing CSV file with umlauts causing "UnicodeEncodeError: 'ascii' codec can't encode character"

I am trying to write characters with umlauts (double dots) such as ä, ö and Ö. I can write them to the file with data.encode("utf-8"), but the result, b'\xc3\xa4\xc3\xa4\xc3\x96', is not nice (the UTF-8 bytes end up as literal characters). I want "ääÖ" stored in the file as written.
How can I write data with umlaut characters to a CSV file in Python 3?
import csv

data = "ääÖ"

with open("test.csv", "w") as fp:
    a = csv.writer(fp, delimiter=";")
    data = resultFile
    a.writerows(data)
Traceback:
File "<ipython-input-280-73b1f615929e>", line 5, in <module>
a.writerows(data)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 15: ordinal not in range(128)
Add a parameter encoding to the open() and set it to 'utf8'.
import csv

data = "ääÖ"

with open("test.csv", 'w', encoding='utf8') as fp:
    a = csv.writer(fp, delimiter=";")
    a.writerows(data)
Edit: removed the use of the io library, since open is the same as io.open in Python 3.
This solution should work on both Python 2 and 3 (the coding declaration is not needed in Python 3):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv

data = "ääÖ"

with open("test.csv", "w") as fp:
    a = csv.writer(fp, delimiter=";")
    a.writerows(data)
Credits to:
Working with utf-8 encoding in Python source
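One detail worth flagging: writerows expects an iterable of rows, so passing the string "ääÖ" writes each character as its own row. If the goal is a single cell containing "ääÖ", a minimal sketch would use writerow instead:

import csv

data = "ääÖ"

# writerow takes one row (a list of fields), so the whole string lands in one cell
with open("test.csv", "w", encoding="utf8", newline="") as fp:
    a = csv.writer(fp, delimiter=";")
    a.writerow([data])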

This is my current way of writing to a file. However, I can't do UTF-8?

f = open("go.txt", "w")
f.write(title)
f.close()
What if "title" is in Japanese/UTF-8? How do I modify this code to be able to write "title" without getting the ASCII error?
Edit: Then, how do I read this file in UTF-8?
How to use UTF-8:
import codecs
# ...
# title is a unicode string
# ...
f = codecs.open("go.txt", "w", "utf-8")
f.write(title)
# ...
fileObj = codecs.open("go.txt", "r", "utf-8")
u = fileObj.read() # Returns a Unicode string from the UTF-8 bytes in the file
It depends on whether you want to insert a UTF-8 byte order mark (BOM). The only way I know of to do that is to open a normal file and write it yourself:
import codecs

f = open('go.txt', 'wb')
f.write(codecs.BOM_UTF8)
f.write(title.encode('utf-8'))
f.close()
Generally, though, I don't want to add a UTF-8 BOM, and the following will suffice:
import codecs
f = codecs.open('go.txt', 'w', 'utf-8')
f.write(title)
f.close()
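For completeness, on Python 3 the built-in open takes an encoding argument, so the same write-then-read round trip becomes (title here is just an assumed example value):

# Python 3 sketch: open() encodes and decodes for you
title = u'日本語のタイトル'  # assumed example value

with open('go.txt', 'w', encoding='utf-8') as f:
    f.write(title)

with open('go.txt', 'r', encoding='utf-8') as f:
    print(f.read())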
