Python script utf-8 issue - python

I am trying to use this Python text-to-speech converter to convert Greek text to MP3.
The Git repository says UTF-8 is supported, but when I try to convert text like "Γεια σου" it throws the error shown below.
What I type in cmd: gtts-cli.py "Γεια σου" -l el -o hi.mp3
What I get:
'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)
Any ideas?
Update:
I added UTF-8 decoding as shown below. I even updated to Python 3. I still get a similar error:
'utf8' codec can't decode byte 0xc3 in position 0: invalid continuation byte
What I added:
text = args.text.decode('utf-8')
Any ideas?

There is a related open issue in this project; please take a look.
It looks like somebody has already written a fix, but it has not been merged yet.
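The second error is consistent with the command-line argument arriving as Windows-1253 (Greek) bytes rather than UTF-8: in that code page Γ is the single byte 0xc3, which UTF-8 treats as the start of a two-byte sequence. A minimal Python 3 sketch of that mismatch follows. The choice of cp1253 is an assumption about the console's code page, not something the question confirms.

```python
# Assumption: the Windows console passes the argument in code page 1253 (Greek).
text = "Γεια σου"
raw = text.encode("cp1253")      # the bytes a cp1253 console would hand over

print(hex(raw[0]))               # first byte, the one named in the second error

try:
    raw.decode("utf-8")          # what args.text.decode('utf-8') attempts
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print(exc)

decoded = raw.decode("cp1253")   # decoding with the matching code page
print(decoded == text)
```

If this is what is happening, decoding with the console's actual code page instead of a hard-coded 'utf-8' would be the thing to try.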

Related

python pyinstaller UnicodeDecodeError cp949

I get a UnicodeDecodeError when I try to install pyinstaller.
The error message reads:
UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2 in position 208687: illegal multibyte sequence
When I google this error, it looks like a problem with the codec used to read a file.
I tried some of the solutions I found online, but they didn't work.
How can I fix this?
I think your code has a function that prints data in an encoding the Windows shell cannot display. Remove those print calls and try again. (I cannot comment because I do not have enough rep, so I wrote this here.)

Python encoding issue while reading a file

I am trying to read a file that contains the character "ë". The problem is that I cannot figure out how to read it, no matter what I try with the encoding. When I look at the file manually in TextEdit, it is listed as an unknown 8-bit file. If I try changing it to UTF-8, UTF-16 or anything else, it either does not work or messes up the entire file. I tried reading the file with standard Python commands as well as with codecs, and cannot come up with anything that reads it correctly. I include a code sample of the read below. Does anyone have any clue what I am doing wrong? This is Python 2.7.10, by the way.
readFile = codecs.open("FileName",encoding='utf-8')
The line I am trying to read is this with nothing else in it.
Aeëtes
Here are some of the errors I get:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x91 in position 0: invalid start byte
UnicodeError: UTF-16 stream does not start with BOM (I know this one just means that it is not a UTF-16 file.)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x91 in position 0: ordinal not in range(128)
If I don't use a codec, the word comes in as Ae?tes, which then crashes later in the program. Just to be clear, none of the suggested questions, nor anything else I found on the net, pointed to an answer. One other detail that might help is that I am using OS X, not Windows.
Credit for this answer goes to RadLexus for figuring out the proper encoding, and also to Mad Physicist, who put me on the right track even though I had not considered all possible encodings.
The issue is apparently that the Mac saved the .txt file as mac_roman. If you use that encoding, it works perfectly.
This is the line of code that I used to read it.
readFile = codecs.open("FileName",encoding='mac_roman')
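The byte 0x91 in the error messages is how Mac OS Roman stores "ë", which is why that codec works. Here is a short Python 3 sketch of the same read. The file name and the temporary directory are made up for the demo.

```python
import io
import os
import tempfile

# Hypothetical sample file: the word from the question, stored as Mac OS Roman bytes.
sample_path = os.path.join(tempfile.mkdtemp(), "aeetes.txt")
with open(sample_path, "wb") as f:
    f.write(b"Ae\x91tes\n")          # 0x91 is "ë" in mac_roman

# io.open accepts an encoding argument in both Python 2 and Python 3.
with io.open(sample_path, encoding="mac_roman") as f:
    word = f.read().strip()

print(word)
```

Using a with block also closes the file, which the one-line codecs.open call above leaves to the caller.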

Django ascii/utf-8 encoding error on Windows caused by ImageField [duplicate]

I'm running a Django application on Windows.
I noticed that it stopped working after I added an ImageField to one of my models (when I comment out this field, the application runs).
I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb3 in position 35: ordinal not in range(128)
I've tried setting the locale variables:
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
export LC_LANG='en_US.UTF-8'
But it didn't help.
I also tried changing the default encoding in manage.py:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
but then I get the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb3 in position 35: invalid start byte
How can I fix it?
Is it Python 2.x? Add this line to the beginning of the file.
# -*- coding: utf8 -*-
or check this answer
There are a few possible solutions I can think of. One is to encode it in base64 instead of encoding it directly. You could also try using UTF-16 as your codec and see if that helps.

Django getting encoding error only on command line

I have a script that reads an XML file and writes it into the database.
When I run it through the browser (calling it via a view) it works fine, but
when I created a command for it (./manage.py importxmlfile) I get the following message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 6: ordinal not in range(128)
I'm not sure why it would only happen when calling the import from the command line. Any ideas?
Update
I'm trying to convert an lxml.etree._ElementUnicodeResult object to a string using str(result) and save it in the DB (utf8 collation).
This produces the error mentioned above only on the command line.
Ah, don't use str(result).
Instead, do:
result.encode('utf-8')
When you call str(result), Python 2 uses the default system encoding (usually ASCII) to encode the Unicode characters in result. This fails for any character whose ordinal is not in range(128). Rather than relying on the ASCII codec, call .encode() and tell Python which codec to use.
Check out the Python Unicode HowTo for more information. You might also want to check out this related question or this excellent presentation on the subject.
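The difference between the implicit ASCII encoding and an explicit codec can be reproduced without lxml. In this small Python 3 sketch the sample string is made up, and encode('ascii') stands in for what Python 2's str() does to a unicode object.

```python
# Made-up sample value; \xfc is "ü", the character named in the error message.
result = "M\xfcller"

try:
    result.encode("ascii")           # roughly what str(result) does in Python 2
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print(exc)

encoded = result.encode("utf-8")     # explicit codec that can represent "ü"
print(encoded)
```

Naming the codec explicitly makes the result independent of the environment, which would explain why the same code can behave differently in a view and on the command line.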

Diacritic signs

How should I write "mąka" in Python without getting an exception?
I've tried var = u"mąka", var = unicode("mąka") and so on, but nothing helps.
I have a coding declaration on the first line of my file, and I still get this exception:
'utf8' codec can't decode byte 0xb1 in position 0: unexpected code byte
Save the following 2 lines into write_mako.py:
# -*- encoding: utf-8 -*-
open(u"mąka.txt", 'w').write("mąka\n")
Run:
$ python write_mako.py
A file named mąka.txt containing the word mąka should be created in the current directory.
If it doesn't work, you can use chardet to detect the actual encoding of the file (see the chardet example usage):
import chardet
print chardet.detect(open('write_mako.py', 'rb').read())
In my case it prints:
{'confidence': 0.75249999999999995, 'encoding': 'utf-8'}
The # -*- coding: -*- line must specify the encoding the source file is saved in. This error message:
'utf8' codec can't decode byte 0xb1 in position 0: unexpected code byte
indicates you aren't saving the source file in UTF-8. You can save your source file in any encoding that supports the characters you are using in the source code, just make sure you know what it is and have an appropriate coding line.
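The byte 0xb1 is a hint about the real encoding of the source file: "ą" is 0xb1 in ISO-8859-2, whereas in UTF-8 it is the two bytes 0xc4 0x85. The Python 3 sketch below illustrates this. That the file was saved as ISO-8859-2 is a guess based on that byte, not something stated in the question.

```python
letter = "\u0105"                        # ą

latin2_bytes = letter.encode("iso-8859-2")
utf8_bytes = letter.encode("utf-8")
print(latin2_bytes, utf8_bytes)

try:
    latin2_bytes.decode("utf-8")         # 0xb1 cannot start a UTF-8 sequence
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print(exc)
```

If the guess is right, either re-saving the file as UTF-8 or changing the coding line to iso-8859-2 should make the declaration and the file agree.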
What exception are you getting?
You might try saving your source code file as UTF-8, and putting this at the top of the file:
# coding=utf-8
That tells Python that the file’s saved as UTF-8.
This code works for me, saving the file as UTF-8:
v = u"mąka"
print repr(v)
The output I get is:
u'm\u0105ka'
Please copy and paste the exact error you are getting. If you are getting this error:
UnicodeEncodeError: 'charmap' codec can't encode character ... in position ...: character maps to <undefined>
Then you are trying to output the character somewhere that does not support UTF-8 (e.g. your shell's character encoding is set to something other than UTF-8).
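To see whether the output encoding is the culprit, one can inspect sys.stdout.encoding and try encoding the text with it. In this Python 3 sketch, cp1252 is used as an assumed example of a console code page that has no "ą"; the actual code page on the asker's machine is not known.

```python
import sys

word = "m\u0105ka"                       # mąka

# Encoding the interpreter uses when printing to the current stdout.
print(sys.stdout.encoding)

# Assumed example: cp1252 has no "ą", so strict encoding fails with a 'charmap' error.
try:
    word.encode("cp1252")
    fits = True
except UnicodeEncodeError as exc:
    fits = False
    print(exc)

# Replacing unencodable characters avoids the exception at the cost of losing the letter.
fallback = word.encode("cp1252", errors="replace")
print(fallback)
```

The replace handler only hides the problem; switching the terminal to an encoding that contains the character is the real fix.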
