How to use the input method for Japanese - python

message = input()
The line can accept user input for Japanese on my Mac. However, when the code runs on Ubuntu 18.04, it reports the following error:
message = input()
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 4-5: invalid continuation byte
How can I make the encoding work on Ubuntu? I searched, and most posts are about reading files in UTF-8. What about the simple input() function?
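One possible workaround, sketched here under the assumption that the Ubuntu terminal is sending bytes in a non-UTF-8 Japanese encoding: read the raw bytes from sys.stdin.buffer and decode them explicitly instead of letting input() decode them. The helper name read_line and the list of candidate encodings are hypothetical and would need adjusting to the actual terminal settings.

```python
import sys


def read_line(stream=None, encodings=("utf-8", "euc_jp", "shift_jis")):
    """Read one line of raw bytes and decode it with the first encoding that fits.

    `encodings` is an assumed candidate list, not something the question specifies.
    """
    # Fall back to the process's binary stdin when no stream is passed in.
    stream = stream if stream is not None else sys.stdin.buffer
    raw = stream.readline().rstrip(b"\r\n")
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

With this in place, message = read_line() could stand in for message = input(). If the terminal locale is simply not UTF-8, fixing the locale (for example LANG) is probably the cleaner solution.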

Related

How do I read from a file that has the 0xEB character in it?

I have tried using open("oxeb.txt").read() in Python 2 and it works, but it doesn't work in Python 3.
I know that the default encoding in Python 2 is ASCII and the default encoding in Python 3 is UTF-8, so I tried doing this in Python 3: open("oxeb.txt").read() and it STILL doesn't work.
How can I read a file with this character in it, independent of my Python version?
Note: this is the error I get: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 4: invalid continuation byte
You can open the file in binary mode.
You then get raw bytes rather than a text string, so you need to decode them yourself.
with open("oxeb.txt", "rb") as f:  # binary mode: no implicit decoding
    raw = f.read()
text = raw.decode('iso-8859-1')  # every byte value is valid in ISO-8859-1
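Since the question asks for something that works regardless of the Python version, here is a minimal sketch using io.open, which accepts an encoding argument on both Python 2 and Python 3. The helper name read_latin1 is hypothetical, and the code assumes the file really is ISO-8859-1 (where 0xEB is "ë").

```python
import io


def read_latin1(path):
    """Return the file's contents as text, decoded as ISO-8859-1."""
    # io.open behaves like Python 3's built-in open() on both major versions.
    with io.open(path, encoding="iso-8859-1") as f:
        return f.read()
```

For example, text = read_latin1("oxeb.txt") returns a unicode string on Python 2 and a str on Python 3.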

Python script utf-8 issue

I try to use this python text-to-speech converter to convert Greek into mp3.
Git says utf-8 is supported but when I try to translate text like "Γεια σου" it throws an error as shown below:
What I type on cmd: gtts-cli.py "Γεια σου" -l el -o hi.mp3
What I get:
'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)
Any ideas?
Update:
I added utf-8 support as shown below. I even updated to python3. Still getting a similar error...
'utf8' codec can't decode byte 0xc3 in position 0: invalid continuation byte
What I added:
text = args.text.decode('utf-8')
Any ideas?
There is a related open issue in this project; please take a look.
It looks like somebody has already created a fix, but it has not been merged yet.
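As a rough illustration of why decoding with 'utf-8' fails here: the byte 0xc3 at position 0 is what "Γ" looks like in the Windows Greek code page cp1253, so the argument bytes are probably not UTF-8 at all. The sketch below assumes cp1253 (the real console code page may differ), and the helper name decode_arg is hypothetical, not part of gTTS.

```python
def decode_arg(raw, encoding="cp1253"):
    """Decode a command-line argument given as bytes; pass text through unchanged.

    `encoding` is an assumption about the console code page.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    # On Python 3, command-line arguments are already text and have no .decode().
    return raw
```

On Python 3 the arguments arrive as str already, so calling .decode() on them is not needed and would fail.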

python pyinstaller UnicodeDecodeError cp949

I get a UnicodeDecodeError when I try to install pyinstaller.
The error message reads:
UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2 in position 208687: illegal multibyte sequence
When I google this error, it looks like a problem with the codec used to read a file.
I tried some of the solutions I found online, but they didn't work.
How can I fix this?
I think your code has a function that prints data using a codec the Windows shell cannot display. Remove those print calls and try again. (I don't have enough reputation to comment, so I'm posting this as an answer.)

Python Unicode Decode Error for Byte not in file

I am reading a large file in Python line by line with readline(). After reaching close to 672,280 lines I get an error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 228: invalid start byte
However, I searched the file with grep for the byte 0xfd and it returned nothing. I also wrote C++ code to go through the file looking for the byte 0xfd and still found nothing. So I have no idea what is going on here. Is it an error because the file is too big?
I just don't see how a decoding error can happen for a byte that is not in the file.
Thanks
You can try opening the file with the ISO-8859-1 encoding:
open('myfile.txt', encoding = "ISO-8859-1")
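To check whether the byte really is in the file, one option is to read the lines in binary mode and decode each one yourself; the UnicodeDecodeError then reports the exact offset within that line. This is only a diagnostic sketch for Python 3, and the helper name first_undecodable is hypothetical.

```python
def first_undecodable(path, encoding="utf-8"):
    """Return (line_number, offset_in_line, offending_bytes) or None if all lines decode."""
    with open(path, "rb") as f:
        for line_no, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start and exc.end delimit the bytes the codec rejected.
                return line_no, exc.start, raw[exc.start:exc.end]
    return None
```

Calling first_undecodable("myfile.txt") would point at the first line and byte that the UTF-8 codec rejects, which can then be inspected with a hex viewer.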

Django getting encoding error only on command line

I have a script that reads an XML file and writes it into the Database.
When I run it through the browser (call it via a view) it works fine, but
when I created a Command for it (./manage.py importxmlfile) I get the following message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 6: ordinal not in range(128)
I'm not sure why it would only happen when calling the import via command line.. any ideas?
Update
I'm trying to convert an lxml.etree._ElementUnicodeResult object to string and save it in the DB (utf8 collation) using str(result).
This produces the error mentioned above only on Command Line.
Ah, don't use str(result).
Instead, do:
result.encode('utf-8')
When you call str(result), Python uses the default system encoding (usually ASCII) to try to encode the characters in result. This breaks if any character's ordinal is not in range(128). Rather than relying on the ASCII codec, call .encode() and tell Python which codec to use.
Check out the Python Unicode HowTo for more information. You might also want to check out this related question or this excellent presentation on the subject.
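The difference between the two codecs can be sketched as follows. The sample string and the helper name to_utf8_bytes are made up for illustration; the error in the question comes from Python 2, where str() on a unicode object encodes implicitly with ASCII, while this sketch makes both encodings explicit so it also runs on Python 3.

```python
def to_utf8_bytes(text):
    """Encode text explicitly as UTF-8 instead of relying on a default codec."""
    return text.encode("utf-8")


sample = u"M\u00fcller"  # hypothetical value containing u'\xfc'

try:
    sample.encode("ascii")  # roughly what the implicit ASCII encoding attempts
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False  # u'\xfc' has no ASCII representation

utf8_bytes = to_utf8_bytes(sample)  # works for any Unicode text
```

Passing the codec explicitly makes the result independent of the environment, which would explain why the same code behaves differently in the browser and on the command line.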
