Problems with unicode using pystache - python

Rendering a bunch of files, I get - with some of them - the following problem:
'ascii' codec can't decode byte 0xe5 in position 7128: ordinal not in range(128)
It seems some of those files are in unicode. How can I read those files so that pystache is able to render them? Currently I am reading those files as follows:
content = open(filename, 'r').read()
Is there an equivalent (simple) way of reading the full unicode file?

I fixed this by changing
content = open(filename, 'r').read()
to
import codecs
content = codecs.open(filename, 'r', 'utf-8').read()
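For reference, here is a minimal sketch of the same fix using io.open, which accepts an encoding argument on both Python 2 and Python 3. The helper name read_text, the demo template, and the UTF-8 default are assumptions for illustration; the real files may use another encoding.

```python
import io
import os
import tempfile


def read_text(filename, encoding="utf-8"):
    """Return the whole file decoded to text (hypothetical helper)."""
    with io.open(filename, "r", encoding=encoding) as f:
        return f.read()


# Demo: write a small template with non-ASCII characters, then read it back.
fd, demo_path = tempfile.mkstemp(suffix=".mustache")
os.close(fd)
with io.open(demo_path, "w", encoding="utf-8") as f:
    f.write(u"Hej {{name}}, v\u00e4lkommen till \u00c5re!")

content = read_text(demo_path)  # a unicode string, not raw bytes
os.remove(demo_path)
```

The decoded string can then be handed to the renderer instead of the raw bytes.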

Related

Script crashes when trying to read a specific Japanese character from file

I was trying to save some Japanese characters from a text file into a string. Most of the characters, like "道", cause no problems, but others, like "坂", don't work. When I try to read them, my script crashes.
Do I have to use a specific encoding while reading the file?
That's my code btw:
with open(path, 'r') as file:
    lines = [line.rstrip() for line in file]
The error is:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 310: character maps to <undefined>
You have to specify the encoding when working with non-ASCII text, like this:
file = open(filename, encoding="utf8")
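To see why the codec matters, here is a small sketch that encodes "坂" as UTF-8 and then decodes the bytes two ways. It assumes the file is UTF-8 and that the platform default is cp1252; neither is confirmed by the question, and the variable names are only illustrative.

```python
text = "坂"
utf8_bytes = text.encode("utf-8")  # three bytes: e5 9d 82

# Decoding with the matching codec round-trips the character.
roundtrip = utf8_bytes.decode("utf-8")

# Decoding with cp1252 (an assumed Windows default) fails, because
# byte 0x9d has no mapping in that code page.
try:
    utf8_bytes.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True
```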

'utf-8' codec can't decode byte 0x8a in position 170: invalid start byte

I am trying to do this:
fh = request.FILES['csv']
fh = io.StringIO(fh.read().decode())
reader = csv.DictReader(fh, delimiter=";")
This always fails with the error in the title, and I have spent almost 8 hours on it.
Here is my understanding:
I am using Python 3, so the file fh is in bytes. I am decoding it into a string and putting it in memory via StringIO.
Then csv.DictReader() tries to read it as dicts into memory. It is failing here.
I also tried io.StringIO(fh.read().decode('utf-8')), but got the same error.
What am I missing? :/
The error occurs because the file contains non-ASCII bytes that the chosen codec cannot decode. One simple way to handle such data is to call decode() on the bytes with the encoding the file was actually saved in (here a is the bytes object; replace 'utf-8' with the file's real encoding if it is not UTF-8):
a.decode('utf-8')
Also, you could try opening the file as:
with open('filename', 'r', encoding='utf-8') as f:
    content = f.read()  # your code using f as the file object
Use 'rb' if your file is binary.
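Since the asker already tried UTF-8, one possible approach is to try a short list of candidate encodings on the uploaded bytes before handing the text to csv.DictReader. This is only a sketch: read_semicolon_csv and the codec list are hypothetical, and trying codecs in order is a guess at the encoding, not a detection of it.

```python
import csv
import io


def read_semicolon_csv(raw, encodings=("utf-8-sig", "cp1252")):
    """Decode raw bytes with the first codec that works, then parse rows.

    Hypothetical helper; the codec list is an assumption.
    """
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("none of the candidate encodings matched")
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


# Stand-in for request.FILES['csv'].read(): bytes saved as cp1252,
# where "Š" is the single byte 0x8a.
raw = "name;city\nŠimon;Brno\n".encode("cp1252")
rows = read_semicolon_csv(raw)
```

A single-byte codec such as cp1252 accepts almost any byte sequence, so it belongs at the end of the list; otherwise it would mask a more specific match.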

Not able to read file due to unicode error in python

I'm trying to read a file and when I'm reading it, I'm getting a unicode error.
def reading_File(self,text):
    url_text = "Text1.txt"
    with open(url_text) as f:
        content = f.read()
Error:
    content = f.read()# Read the whole file
  File "/home/soft/anaconda/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 404: ordinal not in range(128)
Why is this happening? I'm trying to run the same on Linux system, but on Windows it runs properly.
According to the question,
"I'm trying to run the same on Linux system, but on Windows it runs properly."
Since we know from the question and some of the other answers that the file's contents are neither ASCII nor UTF-8, it's a reasonable guess that the file is encoded with one of the 8-bit encodings common on Windows.
As it happens, 0x92 maps to the character 'RIGHT SINGLE QUOTATION MARK' in the cp125* encodings, which are used in the US and Latin/European regions.
So the file should probably be opened like this:
# Python3
with open(url_text, encoding='cp1252') as f:
    content = f.read()
# Python2
import codecs
with codecs.open(url_text, encoding='cp1252') as f:
    content = f.read()
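The claim about byte 0x92 can be checked directly with a short sketch like the one below (variable names are illustrative only):

```python
import unicodedata

# Decode the single offending byte with cp1252 and look up its name.
char = b"\x92".decode("cp1252")
name = unicodedata.name(char)

# The same byte on its own is not valid UTF-8.
try:
    b"\x92".decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```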
There can be two reasons for that to happen:
1. The file contains text encoded with an encoding other than 'ascii' and, according to your comments on other answers, 'utf-8'.
2. The file doesn't contain text at all; it is binary data.
In case 1 you need to figure out how the text was encoded and use that encoding to open the file:
open(url_text, encoding=your_encoding)
In case 2 you need to open the file in binary mode:
open(url_text, 'rb')
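To decide between the two cases, it can help to open the file in binary mode and look at the bytes around the position reported in the error. The sketch below assumes a hypothetical helper named bytes_around and builds a small demo file; with the real file you would pass its path and the position from the traceback.

```python
import os
import tempfile


def bytes_around(path, position, width=10):
    """Return the raw bytes surrounding `position` (hypothetical helper)."""
    with open(path, "rb") as f:
        raw = f.read()
    start = max(0, position - width)
    return raw[start:position + width]


# Demo file: ASCII text with one cp1252 apostrophe (0x92) at offset 3.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"don\x92t panic")

snippet = bytes_around(demo_path, 3, width=3)
os.remove(demo_path)
```

If the surrounding bytes look like readable text with a few odd bytes, case 1 is likely; if they look like arbitrary binary values, case 2 is.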
It looks like the default encoding on your system is ASCII, whereas Python 3 normally uses UTF-8. You can pass the encoding explicitly when opening the file:
open(file, encoding='utf-8')
Check your system default encoding,
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
If it's not UTF-8, set your system's locale to a UTF-8 one:
export LANGUAGE=en_US.UTF-8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
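Note that sys.stdout.encoding describes the terminal stream. As far as I know, on the Python 3 versions discussed here, open() without an encoding argument falls back to the locale's preferred encoding instead, so a sketch like this shows both values side by side:

```python
import locale
import sys

# Encoding that open() falls back to when no encoding= is given
# (assumption: a Python 3 version where the locale default applies).
default_file_encoding = locale.getpreferredencoding(False)

# Encoding used for printing to the terminal; may differ from the above.
stdout_encoding = sys.stdout.encoding

print(default_file_encoding, stdout_encoding)
```

If the first value is something like 'ANSI_X3.4-1968' (ASCII), that would explain the 'ascii' codec in the traceback.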
You can use codecs.open to fix this issue with the correct encoding:
import codecs
with codecs.open(filename, 'r', 'utf8') as ff:
    content = ff.read()

Codec issue while reading txt file via Python

I'm having an issue opening a .txt file (which contains only plain text) with Python 3.6 using the simple open() approach:
with open('3003.txt', 'r') as myfile:
    data=myfile.read()
It gives the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 182: invalid continuation byte
I tried a couple of approaches I found by googling, but they don't solve the problem. What should be done to solve this?
You can open the file in binary mode with Python 3's open(), which returns bytestrings instead of decoded text:
with open('3003.txt', 'rb') as myfile:
    data=myfile.read()
Note the 'b', meaning binary mode.
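Binary mode avoids the error but leaves you with bytes, which still have to be decoded to get text. A sketch of two common options follows; the sample bytes stand in for the file's contents, and latin-1 is only an assumption about the real encoding.

```python
raw = b"caf\xe0 ok"  # stand-in for myfile.read() in 'rb' mode

# Option 1: keep decoding as UTF-8, marking undecodable bytes with U+FFFD.
lossy = raw.decode("utf-8", errors="replace")

# Option 2: decode with a single-byte codec (latin-1 is an assumption).
latin1 = raw.decode("latin-1")
```

Option 1 never raises but loses the original character; option 2 keeps it only if the guessed codec is the right one.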

UnicodeDecodeError: save to file in python

I want to read a file, find something in it, and save the result, but when I try to save it, it gives me an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)
Code to save to file:
fileout.write((key + ';' + nameDict[key]+ ';'+src + alt +'\n').decode('utf-8'))
What can I do to fix it?
Thank you
You are trying to concatenate unicode values with byte strings, then turn the result to unicode, to write it to a file object that most likely only takes byte strings.
Don't mix unicode and byte strings like that.
Open the file to write to with io.open() to automate encoding Unicode values, then handle only unicode in your code:
import io
with io.open(filename, 'w', encoding='utf8') as fileout:
    # code gathering stuff from BeautifulSoup
    fileout.write(u'{};{};{}{}\n'.format(key, nameDict[key], src, alt))
You may want to check out the csv module to handle writing out delimiter-separated values. If you do go that route on Python 2, you'll have to explicitly encode your columns:
import csv
with open(filename, 'wb') as fileout:
    writer = csv.writer(fileout, delimiter=';')
    # code gathering stuff from BeautifulSoup
    row = [key, nameDict[key], src + alt]
    writer.writerow([c.encode('utf8') for c in row])
If some of this data comes from other files, make sure you also decode to Unicode first; again, io.open() to read these files is probably the best option, to have the data decoded to Unicode values for you as you read.
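For readers on Python 3, where the csv module works with text rather than bytes, a rough equivalent might look like the sketch below. The row contents and the temporary output path are placeholders, not data from the question.

```python
import csv
import os
import tempfile

# Placeholder row standing in for [key, nameDict[key], src + alt].
rows = [["k1", "Zoë", "img.png" + "alt text"]]

fd, out_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

# Text mode with an explicit encoding; newline='' lets csv control line endings.
with open(out_path, "w", encoding="utf8", newline="") as fileout:
    writer = csv.writer(fileout, delimiter=";")
    writer.writerows(rows)

with open(out_path, encoding="utf8", newline="") as f:
    written = f.read()
os.remove(out_path)
```

Here the file object does the encoding, so the rows stay as ordinary strings and no per-column encode() call is needed.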
