I'm trying to start a script, but I have run into a problem.
[ERROR] 'charmap' codec can't encode character '\u300b' in position 11: character maps to <undefined>
Your code is printing a character that the console's encoding cannot represent. Try changing the output encoding to "utf-8", or put the code below on the very first lines of your script (reconfigure requires Python 3.7+):
import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')
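Since reconfigure only exists on Python 3.7+, a defensive variant can check for it before calling it. The following is a minimal sketch, not a definitive fix; force_utf8 is a hypothetical helper name, and the in-memory stream only stands in for a console with a restrictive code page.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Returns True when the encoding was changed, False when the stream
    has no reconfigure() method (Python < 3.7 or a replaced stdout).
    force_utf8 is a hypothetical helper, not a standard-library function.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True


# Demonstration on an in-memory stream that starts out ASCII-only.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="ascii")
changed = force_utf8(console)
console.write("\u300b")  # the character from the error message
console.flush()
```

In a real script the call would presumably be force_utf8(sys.stdout) near the top of the file, before anything is printed.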
I am reading some files from Google Cloud Storage using Python:
spark = SparkSession.builder.appName('aggs').getOrCreate()
df = spark.read.option("sep","\t").option("encoding", "UTF-8").csv('gs://path/', inferSchema=True, header=True,encoding='utf-8')
df.count()
df.show(10)
However, I keep getting an error that complains about the df.show(10) line:
df.show(10)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 350, in show
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 162: ordinal not in range(128)
I googled this and it seems to be a common error. The suggested solution is to add the "UTF-8" encoding to spark.read.option, which I have already done. That didn't help and I am still getting the error. Could anyone help? Thanks in advance.
How about exporting PYTHONIOENCODING before running your Spark job:
export PYTHONIOENCODING=utf8
For Python 3.7+ the following should also do the trick:
import sys
sys.stdout.reconfigure(encoding='utf-8')
For Python 2.x you can use the following:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
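To see what PYTHONIOENCODING changes, the sketch below launches a child Python 3 interpreter with the variable set and captures what it prints. This is only an illustration under assumptions: run_child is a hypothetical helper, the one-line child script stands in for a real job and is not Spark code, and whether the variable reaches your Spark driver depends on how the job is launched.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child interpreter that prints U+300B under PYTHONIOENCODING.

    Returns (returncode, stdout_bytes). run_child is a hypothetical helper.
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # The escape is interpreted by the child, so no non-ASCII text is
    # passed on the command line.
    child_code = "print('\\u300b')"
    proc = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout


ok_code, ok_out = run_child("utf-8")   # child can encode the character
bad_code, bad_out = run_child("ascii")  # child raises UnicodeEncodeError
```

The second call reproduces the 'ascii' codec failure on purpose, which makes it easy to confirm that the environment variable, and not the data, decides whether printing works.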
I'm trying to use this Python text-to-speech converter to convert Greek text into MP3.
The Git repository says UTF-8 is supported, but when I try to convert text like "Γεια σου" it throws the error shown below:
What I type on cmd: gtts-cli.py "Γεια σου" -l el -o hi.mp3
What I get:
'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)
Any ideas?
Update:
I added UTF-8 support as shown below and even updated to Python 3, but I am still getting a similar error:
'utf8' codec can't decode byte 0xc3 in position 0: invalid continuation byte
What I added:
text = args.text.decode('utf-8')
Any ideas?
There is a related open issue in this project; please take a look.
It looks like somebody has already created a fix, but it has not been merged yet.
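One possible explanation for the second error, offered as an assumption about the asker's console rather than something this thread confirms: the byte 0xc3 is what the first letter, Γ, encodes to in the Greek code page cp1253, so the argument may have arrived in that code page instead of UTF-8. The sketch below reproduces the same message under that assumption.

```python
# Assumption: the console passed the argument in the Greek code page cp1253.
text = "Γεια σου"
raw = text.encode("cp1253")  # bytes as the console would have sent them

try:
    raw.decode("utf-8")  # what text = args.text.decode('utf-8') attempts
    utf8_reason = None
except UnicodeDecodeError as exc:
    utf8_reason = exc.reason

# Decoding with the code page that produced the bytes recovers the text.
recovered = raw.decode("cp1253")
```

If this is what is happening, decoding with the console's actual encoding rather than a hard-coded 'utf-8' would be the direction to look in. On Python 3, command-line arguments normally arrive as str already, so no decode call is needed there.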
I am getting the message 'ascii' codec can't encode character u'\xe9' when I write a string to my file. Here's how I am writing the file:
my_file = open(output_path, "w")
my_file.write(output_string)
my_file.close()
I have been searching and found answers like "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)". The first one didn't work, and then I found this one, but I'm confused about why I would be encoding data that I want to be able to read:
import io
f = io.open(filename, 'w', encoding='utf8')
Thanks for the help
As mentioned, you're trying to write non-ASCII characters with the ASCII encoding. Python 2's built-in open function doesn't support the encoding parameter, so consider always using io.open in Python 2.7 (in Python 3 the built-in open is io.open).
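A minimal sketch of the full round trip, which may also help with the question of why data you want to read is being encoded: the encoding only describes the bytes on disk, and reading with the same encoding gives back the same text. The temporary path and the sample string are placeholders, not the asker's real values.

```python
import io
import os
import tempfile

output_string = u"caf\xe9"  # contains the u'\xe9' from the error message

# A temporary path stands in for the asker's output_path.
output_path = os.path.join(tempfile.mkdtemp(), "out.txt")

# io.open accepts encoding= on both Python 2.7 and Python 3.
with io.open(output_path, "w", encoding="utf-8") as my_file:
    my_file.write(output_string)

# Reading with the same encoding returns the original text.
with io.open(output_path, "r", encoding="utf-8") as my_file:
    read_back = my_file.read()

# Reading in binary mode shows the encoded bytes actually stored on disk.
with io.open(output_path, "rb") as my_file:
    on_disk = my_file.read()
```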
I'm trying to read a docx file in Python 2.7 with this code:
import docx
document = docx.Document('sim_dir_administrativo.docx')
docText = '\n\n'.join([
paragraph.text.encode('utf-8') for paragraph in document.paragraphs])
And then I'm trying to decode the string inside the file with this code, because I have some special characters (e.g. ã):
print docText.decode("utf-8")
But, I'm getting this error:
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position 494457: character maps to <undefined>
How can I solve this?
print can only output characters that your console's encoding can represent. You can find out which encoding that is with sys.stdout.encoding. To print special characters you must first encode them to that encoding; errors='replace' substitutes a placeholder for anything it cannot represent.
# -*- coding: utf-8 -*-
import sys
print sys.stdout.encoding
print u"Stöcker".encode(sys.stdout.encoding, errors='replace')
print u"Стоескер".encode(sys.stdout.encoding, errors='replace')
This code snippet was taken from this Stack Overflow answer.
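The snippet above is Python 2. A rough Python 3 equivalent, as a sketch only, encodes with errors='replace' and decodes again so the result is still a str that print accepts. printable is a hypothetical helper name, and in a real script the encoding argument would presumably be sys.stdout.encoding.

```python
def printable(text, encoding):
    """Return text with characters the target encoding lacks replaced by '?'.

    printable is a hypothetical helper, not a standard-library function.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


# ASCII cannot represent the o-umlaut or the en dash (u'\u2013'),
# so both become '?'.
ascii_safe = printable("St\u00f6cker \u2013 ok", "ascii")

# Latin-1 can represent the o-umlaut, so the text is unchanged.
latin_safe = printable("St\u00f6cker", "latin-1")
```

This trades fidelity for robustness: the script no longer crashes, but replaced characters are lost in the output.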
I'm trying to print a string from an archived web crawl, but when I do I get this error:
print page['html']
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 17710: ordinal not in range(128)
When I try print unicode(page['html']) I get:
print unicode(page['html'],errors='ignore')
TypeError: decoding Unicode is not supported
Any idea how I can properly code this string, or at least get it to print? Thanks.
You need to encode the unicode you saved to display it, not decode it -- unicode is the unencoded form. You should always specify an encoding, so that your code will be portable. The "usual" pick is utf-8:
print page['html'].encode('utf-8')
If you don't specify an encoding, whether or not it works will depend on what you're printing to -- your editor, OS, terminal program, etc.
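The direction of the two operations is easy to mix up, so here it is as a short sketch written for Python 3, where the unicode type of the question is simply str. The sample string is a placeholder for page['html'].

```python
html = "fran\u00e7ais"  # text containing the u'\xe7' from the error

# encode: text -> bytes, using an explicit encoding
encoded = html.encode("utf-8")

# decode: bytes -> text, using the same encoding
decoded = encoded.decode("utf-8")
```

Calling decode on something that is already text is what produced the "decoding Unicode is not supported" error in the question.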