Python 3.5 cannot read a file with special character ★

I have the code below, which I run in Python 3.5 both from an IDE and from PowerShell:
real_path_log_file = 'C:\\Users\\XXXXX\\Desktop\\report\\file1.log'
with open(real_path_log_file, 'r', encoding='utf-8') as open_log:
    read_log = open_log.readlines()
print(read_log)
From PowerShell I get the error below, but if I run the same script from PyCharm I get the file content as output.
Please note that the file I am reading contains the ★ character.
Traceback (most recent call last):
File ".\test_read_file.py", line 5, in <module>
print(read_log)
File "C:\Users\XXXXX\AppData\Local\Programs\Python\Python35\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2605' in position 367: character maps to <undefined>

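I'm no Python expert, but the traceback shows the failure is in print(), not in reading the file: print() encodes its output with the console's encoding (cp437 here), and that code page has no ★ character. Python 3.6 and later write to the Windows console through the Unicode API, which is presumably why 3.8.5 doesn't have this problem.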
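A minimal sketch of what goes wrong and one possible workaround on 3.5, assuming the console really is cp437 as the traceback indicates. The helper name safe_console_text is made up for illustration; it replaces characters the target encoding cannot represent with backslash escapes instead of raising.

```python
import sys

text = "rating: \u2605"  # U+2605 is the BLACK STAR from the question

# cp437 has no mapping for U+2605, so encoding for that console fails.
try:
    text.encode("cp437")
    cp437_ok = True
except UnicodeEncodeError as exc:
    cp437_ok = False
    # ascii() keeps this message printable on any console.
    print("cp437 cannot encode:", ascii(exc.object[exc.start:exc.end]))


def safe_console_text(s, encoding=None):
    """Hypothetical helper: escape characters the encoding cannot represent."""
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return s.encode(encoding, errors="backslashreplace").decode(encoding)


# Prints the star as a backslash escape instead of raising.
print(safe_console_text(text, "cp437"))
```

In the original script this would mean printing safe_console_text(line) for each line rather than print(read_log). The output is lossy (the star becomes an escape sequence), so it only helps when seeing the exact glyph in the console does not matter.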

Related

Scalene: An exception of type UnicodeEncodeError

I'm trying to run Scalene inside a .ipynb file in Jupyter with %%scalene and get the following error:
Scalene: An exception of type UnicodeEncodeError occurred. Arguments:
('charmap', '\r\n<html>\r\n <head>\r\n <title>Scalene</title> ...
followed by basically the whole HTML code of the Scalene GitHub website and ending with:
Traceback (most recent call last):
File "C:\Users\marci\anaconda3\envs\wc2022v2_env\lib\site-packages\scalene\scalene_profiler.py", line 1949, in run_profiler
exit_status = profiler.profile_code(
File "C:\Users\marci\anaconda3\envs\wc2022v2_env\lib\site-packages\scalene\scalene_profiler.py", line 1781, in profile_code
Scalene.generate_html(profile_fname=Scalene.__profile_filename, output_fname=Scalene.__profiler_html)
File "C:\Users\marci\anaconda3\envs\wc2022v2_env\lib\site-packages\scalene\scalene_profiler.py", line 1729, in generate_html
f.write(rendered_content)
File "C:\Users\marci\anaconda3\envs\wc2022v2_env\lib\encodings\cp1250.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xa3' in position 79534: character maps to <undefined>
Am I right to think it is the website coding that is causing this error? Or is it the browser?
If anything else: what can be done about it?
python 3.8.15
Got the same error related to cp1250. It seems the Scalene profiler was not tested in non-English locales. I tried chcp 65001 to switch to UTF-8, but it didn't help: chcp changes only the console code page, not the default encoding that open() uses for files.
My fix was to hack its source "(...)\scalene\scalene_profiler.py" at line 1728:
instead of:
with open(output_fname, "w") as f:
use:
with open(output_fname, "w", encoding='utf-8') as f:
That solved the problem.
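To illustrate why the explicit encoding matters, here is a small sketch that writes a string containing '\xa3' (the pound sign from the traceback) once as cp1250 and once as UTF-8. The file name profile.html, the sample content, and the helper write_html are invented for the demonstration and are not Scalene's own code.

```python
import os
import tempfile

# Stand-in for the rendered HTML; U+00A3 is the character from the traceback.
rendered_content = "<title>Scalene</title> price: \u00a3"


def write_html(path, content, encoding):
    """Hypothetical helper: write text with an explicitly chosen encoding."""
    with open(path, "w", encoding=encoding) as f:
        f.write(content)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "profile.html")

    # cp1250 has no pound sign, so this reproduces the UnicodeEncodeError.
    try:
        write_html(path, rendered_content, "cp1250")
        cp1250_ok = True
    except UnicodeEncodeError:
        cp1250_ok = False

    # UTF-8 can represent every Unicode character, so this succeeds.
    write_html(path, rendered_content, "utf-8")
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print("cp1250 write succeeded:", cp1250_ok)
print("utf-8 round trip intact:", round_trip == rendered_content)
```

Python 3.7 and later also offer UTF-8 mode (the PYTHONUTF8=1 environment variable), which makes open() default to UTF-8 without patching the package. Whether that is enough for Scalene inside Jupyter is untested here.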

Emoji support reading from file in python? [duplicate]

I need to analyse a text file in Tamil (UTF-8 encoded). I'm using the nltk package for Python in IDLE. When I try to read the text file, this is the error I get. How do I avoid this?
corpus = open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt').read()
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
corpus = open('C:\\Users\\Customer\\Desktop\\DISSERTATION\\ettuthokai.txt').read()
File "C:\Users\Customer\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 33: character maps to <undefined>
Since you are using Python 3, just add the encoding parameter to open():
corpus = open(
    r"C:\Users\Customer\Desktop\DISSERTATION\ettuthokai.txt", encoding="utf-8"
).read()
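The following sketch reproduces the situation with a throwaway file, under the assumption that the default encoding on the asker's machine is cp1252 (as the traceback suggests). The sample text and the helper read_text are illustrative only. The Tamil pulli sign U+0BCD is encoded in UTF-8 as E0 AF 8D, and byte 0x8D is undefined in cp1252, which matches the byte in the error message.

```python
import os
import tempfile

# A few Tamil characters, including the pulli sign U+0BCD (UTF-8: E0 AF 8D).
sample = "\u0b8e\u0b9f\u0bcd\u0b9f\u0bc1"


def read_text(path, encoding="utf-8"):
    """Hypothetical helper: read a whole file with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Decoding UTF-8 bytes as cp1252 hits the undefined byte 0x8D.
    try:
        read_text(path, encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Decoding with the encoding the file was written in works.
    corpus = read_text(path)

print("cp1252 failed:", cp1252_failed)
print("utf-8 read matches:", corpus == sample)
```

Using a with block also closes the file promptly, which the one-line open(...).read() form leaves to the garbage collector.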

About UnicodeDecodeError

I am writing a program to count words with Python 3.6. The code runs smoothly from the terminal, but if I run it in IDLE, the error below occurs:
Traceback (most recent call last):
File "/Users/zhangchaont/python/Course Python Programming/6.7V2.py", line 122, in <module>
main()
File "/Users/zhangchaont/python/Course Python Programming/6.7V2.py", line 21, in main
for line in txtFile:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 33: ordinal not in range(128)
How to solve this?
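Since there is not much information about your code, I can only suggest that, instead of codecs, you could also use this package:
https://github.com/iki/unidecode. Note that unidecode works on text that has already been decoded, so the file still has to be opened with an encoding that matches its contents. Open your file with open(), then pass the result of file_handle.read() to the method below: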
unidecode.unidecode_expect_nonascii(string)
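Independently of unidecode, the traceback shows the 'ascii' codec being used to decode the file, which means IDLE started with a different default encoding than the terminal. A minimal standard-library sketch that sidesteps the default by passing the encoding explicitly is shown below. It assumes the file is UTF-8 (byte 0xe3 is a plausible UTF-8 lead byte, but that is a guess), and count_words and words.txt are made-up names.

```python
import collections
import os
import tempfile


def count_words(path, encoding="utf-8"):
    """Hypothetical word counter that does not depend on the locale default."""
    counts = collections.Counter()
    with open(path, encoding=encoding) as txt_file:
        for line in txt_file:
            counts.update(line.split())
    return counts


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.txt")
    # U+3042 is encoded in UTF-8 as E3 81 82, so the file contains byte 0xe3.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\u3042 b \u3042\n")
    word_counts = count_words(path)

# Print only the count so the output is safe on any console.
print(word_counts.most_common(1)[0][1])
```

With the encoding spelled out, the script should behave the same whether it is started from the terminal or from IDLE.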

Why am I getting "UnicodeEncodeError: 'charmap' codec can't encode character" error message

I encountered a problem with a Python script I wrote while running it in a Windows CMD window, and boiled the essence of the problem down to the following SSCCE:
The Python script (x.py)
import sys
in_file = open (sys.argv[1], 'rt')
for line in in_file:
    line = line.rstrip ('\n')
    print ('line="%s"' % (line))
in_file.close ()
The input data file (x.txt)
Line 1
Line 2 “text”
Line 3
The command line invocation
python x.py x.txt
The error output
C:\junk>python x.py x.txt
line="Line 1"
Traceback (most recent call last):
File "x.py", line 7, in <module>
print ('line="%s"' % (line))
File "C:\Program Files (x86)\Python34\lib\encodings\cp862.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u201c' in position 13: character maps to <undefined>
C:\junk>
It seems to be failing on the second input record ("Line 2"). What am I doing wrong?
The answer turned out to be a Windows codepage issue.
The second input line uses the ANSI typographical characters 0x93 (147) and 0x94 (148), corresponding to the left and right double quotation marks, respectively. Although the input file was meant to be an ASCII file (i.e., characters < 128 decimal), word processors, in contrast to text editors, will often insert these specialized characters.
Python read the file well enough, but threw an exception when trying to print it to the console window. As the error output shows, the error message emanates from lib\encodings\cp862.py, which corresponds to code page 862, MS-DOS's code page for Hebrew. When the file is read, the ANSI character 0x93 (147) is decoded to Unicode U+201C ("LEFT DOUBLE QUOTATION MARK"); print() then has to encode that character for the console, and code page 862 has no mapping for it.
Executing the CHCP (Change Codepage) command gives:
C:\junk>chcp
Active code page: 862
C:\junk>
Changing the CMD window's codepage to CP 1252 (Windows-1252, the Western European ANSI code page, often loosely called "Latin-1") solves the problem:
C:\junk>chcp 1252
Active code page: 1252
C:\junk>python x.py x.txt
line="Line 1"
line="Line 2 “text”"
line="Line 3"
C:\junk>
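The code page difference can be checked directly from Python without touching the console settings. The sketch below decodes the bytes from x.txt as cp1252 (an assumption about how the file was read) and then tests which output encodings can represent the result. The helper can_encode is made up for illustration.

```python
# Bytes as they appear in x.txt: 0x93 and 0x94 are the curly quotes in cp1252.
line = b"Line 2 \x93text\x94".decode("cp1252")


def can_encode(text, codepage):
    """Hypothetical helper: report whether a codec can represent the text."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


# cp862 lacks the curly quotes; cp1252 and UTF-8 both have them.
results = {cp: can_encode(line, cp) for cp in ("cp862", "cp1252", "utf-8")}
print(results)
```

This matches the behaviour in the transcript: output fails under code page 862 and works after chcp 1252.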

UnicodeDecodeError in Python 2.7

I am trying to read a utf-8 encoded xml file in python and I am doing some processing on the lines read from the file something like below:
next_sent_separator_index = doc_content.find(word_value, int(characterOffsetEnd_value) + 1)
Here doc_content is the line read from the file and word_value is one of the strings from the same line. I get an encoding-related error on the above line whenever doc_content or word_value contains non-ASCII characters. So I tried decoding them first as UTF-8 (instead of relying on the default ascii codec), as below:
next_sent_separator_index = doc_content.decode('utf-8').find(word_value.decode('utf-8'), int(characterOffsetEnd_value) + 1)
But I am still getting UnicodeDecodeError as below :
Traceback (most recent call last):
File "snippetRetriver.py", line 402, in <module>
sentences_list,lemmatised_sentences_list = getSentenceList(form_doc)
File "snippetRetriver.py", line 201, in getSentenceList
next_sent_separator_index = doc_content.decode('utf-8').find(word_value.decode('utf-8'), int(characterOffsetEnd_value) + 1)
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 8: ordinal not in range(128)
Can anyone suggest a suitable way to avoid these kinds of encoding errors in Python 2.7?
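The traceback ends in a UnicodeEncodeError, which suggests the value is already a unicode object: in Python 2, calling .decode('utf-8') on a unicode object first encodes it implicitly with the ascii codec. Encode it explicitly before decoding (note that this call returns a (text, length) tuple):
codecs.utf_8_decode(input.encode('utf8'))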
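An alternative that avoids the encode/decode round trip is to decode only values that are still byte strings and leave text untouched. The sketch below is written to run on Python 3, and the same isinstance check should also work on Python 2.7, where bytes is an alias for str. The helper to_text and the sample strings are invented for illustration.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# One value arrives as UTF-8 bytes, the other is already text.
doc_content = to_text(b"caf\xc3\xa9 au lait")
word_value = to_text(u"caf\xe9")

# Both operands are now text, so find() needs no implicit conversion.
index = doc_content.find(word_value, 0)
print(index)
```

Normalising values once, right after they are read from the file, keeps later calls such as find() free of encoding concerns.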
