How to use Special characters in Python?

This is my code:
f = open('test.txt','w')
f.write("\N{Circled White Star}")
f.close()
And I get this error:
Traceback (most recent call last):
File "f:/mc experiment/python/SkyblockSniper-main/df.py", line 3, in <module>
f.write("\N{Circled White Star}")
File "F:\programing\python\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u272a' in position 0: character maps to <undefined>
What I expected:
The test.txt file should contain
✪
What I got:
<nothing>

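Try changing the encoding of the file you open. The traceback shows Python encoding the text with cp1252, the Windows default, which has no mapping for U+272A. UTF-8 worked in my testing.
You should also open files with a context manager rather than calling open() and close() by hand.
star = "✪"
with open('test.txt', 'w', encoding="UTF-8") as f:
    f.write(f"\n{star}")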

If you cannot type the Circled White Star symbol directly, its Unicode escape works as well:
star = '\u272A'
with open('test.txt', 'w', encoding="UTF-8") as f:
    f.write(star)
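To make the difference between the two encodings concrete, here is a minimal sketch. The helper name can_encode and the temporary file path are only illustrative and not part of the question.

```python
import os
import tempfile

star = "\N{CIRCLED WHITE STAR}"  # the same character as "\u272a"


def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 has no slot for U+272A, UTF-8 can represent every code point.
print("cp1252:", can_encode(star, "cp1252"))
print("utf-8:", can_encode(star, "utf-8"))

# Round trip through a temporary file, naming the encoding on both sides.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(star)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Passing the same encoding when reading the file back matters just as much as passing it when writing.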

Related

Editing UTF-8 text file on Windows

I'm trying to manipulate a text file with song names. I want to clean up the data, by changing all the spaces and tabs into +.
This is the code:
input = open('music.txt', 'r')
out = open("out.txt", "w")
for line in input:
    new_line = line.replace(" ", "+")
    new_line2 = new_line.replace("\t", "+")
    out.write(new_line2)
    #print(new_line2)
input.close()
out.close()
It gives me an error:
Traceback (most recent call last):
File "music.py", line 3, in <module>
for line in input:
File "C:\Users\nfeyd\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2126: character maps to <undefined>
As music.txt is saved in UTF-8, I changed the first line to:
input = open('music.txt', 'r', encoding="utf8")
This gives another error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u039b' in position 21: character maps to <undefined>
I tried other things with the out.write() but it didn't work.
This is the raw data of music.txt.
https://pastebin.com/FVsVinqW
I saved it in windows editor as UTF-8 .txt file.
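On Windows with older versions of Python 3, the default encoding for open() is the system code page (here cp1252), not UTF-8. You need to pass the encoding explicitly for both file handles: the input, so that the UTF-8 bytes are decoded correctly, and the output, so that characters such as '\u039b' can be written.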
with open('music.txt', 'r', encoding='utf-8') as infh,\
        open("out.txt", "w", encoding='utf-8') as outfh:
    for line in infh:
        line = line.replace(" ", "+").replace("\t", "+")
        outfh.write(line)
This demonstrates how you can use fewer temporary variables for the replacements; I also refactored to use a with context manager, and renamed the file handle variables to avoid shadowing the built-in input function.
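Going forward, a better solution may be to upgrade your Python version. Python 3.7 and later offer an opt-in UTF-8 mode (the -X utf8 option or the PYTHONUTF8=1 environment variable), and my understanding is that PEP 686 makes UTF-8 the default on Windows too, starting with Python 3.15.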
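To see which encoding open() will fall back to on a given machine, you can inspect the locale's preferred encoding and whether UTF-8 mode is active. This is a small sketch; the variable names are only illustrative, and sys.flags.utf8_mode assumes Python 3.7 or later.

```python
import locale
import sys

# Encoding that open() falls back to when no encoding= argument is passed.
default_encoding = locale.getpreferredencoding(False)

# True when Python runs in UTF-8 mode (-X utf8 or PYTHONUTF8=1).
utf8_mode = bool(sys.flags.utf8_mode)

print("default text encoding:", default_encoding)
print("UTF-8 mode enabled:", utf8_mode)
```

If the first line prints something like cp1252, any open() call without an explicit encoding will use that code page.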

Error when printing random line from text file

I need to print a random line from the file "Long films".
My code is:
import random
with open('Long films') as f:
    lines = f.readlines()
    print(random.choice(lines))
But it prints this error:
Traceback (most recent call last):
line 3, in <module>
lines = f.readlines()
line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 36: ordinal not in range(128)
What do I need to do in order to avoid this error?
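The problem is not with printing, it is with reading. Your file contains non-ASCII bytes, and Python is decoding it as ASCII. Try opening the file with an explicit encoding:
with open('Long films', encoding='latin-1') as f:
    ...
Note that latin-1 accepts every byte value, so it never raises, but it shows the wrong characters if the file is really UTF-8 (byte 0xe2 often starts a UTF-8 multi-byte sequence). In that case use encoding='utf-8' instead.
Also, have you changed your locale settings? Python 3 decodes text files with the locale's preferred encoding (see locale.getpreferredencoding()), which is usually UTF-8 on modern systems, so you typically should not be getting an 'ascii' codec error.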
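The effect of the chosen encoding can be checked on a few bytes without touching the file. The sketch below assumes the file is UTF-8 and uses a made-up sample string containing a curly apostrophe, whose UTF-8 encoding starts with byte 0xe2 like the one in the traceback.

```python
# Hypothetical sample text; the real file contents are not known.
raw = "It’s".encode("utf-8")

# ASCII cannot decode any byte above 0x7f.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

as_utf8 = raw.decode("utf-8")      # the intended text
as_latin1 = raw.decode("latin-1")  # no error, but three wrong characters

print(ascii_ok)
print(repr(as_utf8))
print(repr(as_latin1))
```

If the latin-1 result shows an "â" followed by odd characters wherever punctuation should be, the file is most likely UTF-8.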

I get python frameworks error while reading a csv file, when I try a different easier file it works fine

import csv
exampleFile = open('example.csv')
exampleReader = csv.reader(exampleFile)
for row in exampleReader:
    print('Row #' + str(exampleReader.line_num) + ' ' + str(row))
Traceback (most recent call last):
File "/Users/jossan113/Documents/Python II/test.py", line 7, in <module>
for row in exampleReader:
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x89 in position 4627: ordinal not in range(128)
Does anyone have any idea why I get this error? I tried a very simple CSV file from the internet and it worked just fine, but when I try the bigger file it doesn't.
The file contains non-ASCII characters, which were painful to deal with in old versions of Python. Since you are using 3.5, try opening the file as UTF-8 and see if the issue goes away:
exampleFile = open('example.csv', encoding="utf-8")
From the docs:
Since open() is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see locale.getpreferredencoding()). To decode a file using a different encoding, use the encoding argument of open:
import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
csv module docs
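A self-contained way to try this out is to write a small CSV file with non-ASCII content and read it back with the same encoding. The rows and the temporary file path below are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample data containing non-ASCII characters.
rows = [["name", "city"], ["Zoë", "Malmö"]]
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

read_back = []
with open(path, newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    for row in reader:
        read_back.append((reader.line_num, row))
        print("Row #" + str(reader.line_num) + " " + str(row))
```

If the real file still fails with encoding="utf-8", it was probably saved in a different encoding, and that encoding has to be named instead.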

How to print to a file a string with diacritics?

I have a word in Polish as a string variable which I need to print to a file:
# coding: utf-8
a = 'ilośc'
with open('test.txt', 'w') as f:
    print(a, file=f)
This throws
Traceback (most recent call last):
File "C:/scratches/scratch_3.py", line 5, in <module>
print(a, file=f)
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u015b' in position 3: character maps to <undefined>
After looking at existing answers (with .decode("utf-8") or with .encode("utf-8")) and trying various incantations, I finally managed to get the file created.
Unfortunately, what was written was b'ilośc' and not ilośc. When I tried to decode that before printing to the file, I got back to the initial error and the same traceback.
How to write a str containing diacritics to a file so that it is a string and not a bytes representation?
The traceback says that you are trying to save the 'ś' ('\u015b') character using the cp1252 encoding (the default is locale.getpreferredencoding(False), your Windows ANSI code page), which can't represent this character: Unicode defines more than a million code points, while cp1252 is a single-byte encoding that can represent at most 256 characters.
Use a character encoding that can represent the desired characters:
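with open(filename, 'w', encoding='utf-16') as file:
    print('ilośc', file=file)
Alternatively, keep using f.write() but pass an encoding that covers the character, such as UTF-8:
a = 'ilośc'
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write(a)
You can even write to the file in binary mode, encoding the string yourself (str.encode() defaults to UTF-8):
a = 'ilośc'
with open('test.txt', 'wb') as f:
    f.write(a.encode())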
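The b'...' output mentioned in the question comes from printing a bytes object to a text file: print() writes its repr rather than the text. A short sketch of what happens, with illustrative variable names:

```python
word = "ilośc"

# Encoding turns the str into bytes; 'ś' becomes two bytes in UTF-8.
utf8_bytes = word.encode("utf-8")
print(utf8_bytes)

# print(utf8_bytes, file=f) on a text file writes str(utf8_bytes),
# which is the b'...' representation, not the word itself.
as_written_by_print = str(utf8_bytes)

# cp1252 has no mapping for 'ś', which is the original error.
try:
    word.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Decoding with the matching encoding gives the str back.
round_trip = utf8_bytes.decode("utf-8")
```

So either write the str to a file opened in text mode with a suitable encoding, or write the encoded bytes to a file opened in binary mode, but do not mix the two.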

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

I don't know exactly what's the source of this error and how to fix it. I am getting it by running this code.
Traceback (most recent call last):
File "t1.py", line 86, in <module>
write_results(results)
File "t1.py", line 34, in write_results
dw.writerows(results)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 154, in writerows
return self.writer.writerows(rows)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
Any explanation is really appreciated!
I changed the code and now I get this error:
File "t1.py", line 88, in <module>
write_results(results)
File "t1.py", line 35, in write_results
dw.writerows(results)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 154, in writerows
return self.writer.writerows(rows)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
Here's the change:
with codecs.open('results.csv', 'wb', 'utf-8') as f:
    dw = csv.DictWriter(f, fieldnames=fields, delimiter='|')
    dw.writer.writerow(dw.fieldnames)
    dw.writerows(results)
The error is raised by this part of the code:
with open('results.csv', 'w') as f:
    dw = csv.DictWriter(f, fieldnames=fields, delimiter='|')
    dw.writer.writerow(dw.fieldnames)
    dw.writerows(results)
You're opening an ASCII file, and then you're trying to write non-ASCII data to it. I guess that whoever wrote that script happened to never encounter a non-ASCII character during testing, so they never ran into an error.
But if you look at the docs for the csv module, you'll see that the module can't correctly handle Unicode strings (which is what Beautiful Soup returns), that CSV files always have to be opened in binary mode, and that only UTF-8 or ASCII are safe to write.
So you need to encode all the strings to UTF-8 before writing them. I first thought that it should suffice to encode the strings on writing, but the Python 2 csv module chokes on the Unicode strings anyway. So I guess there's no other way but to encode each string explicitly:
In parse_results(), change the line
results.append({'url': url, 'create_date': create_date, 'title': title})
to
results.append({'url': url, 'create_date': create_date, 'title': title.encode("utf-8")})
That might already be sufficient since I don't expect URLs or dates to contain non-ASCII characters.
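This works for me on Python 2. Note that reload(sys) with sys.setdefaultencoding() is a Python 2-only workaround that is generally discouraged, because it changes implicit str/unicode conversions for the whole process:
import csv
import sys
reload(sys)
sys.setdefaultencoding('utf8')
data = [["a", "b", u'\xe9']]
with open("output.csv", "w") as csv_file:
    writer = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
    writer.writerows(data)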
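For comparison, on Python 3 the csv module works with str directly, so none of the per-field encoding is needed. The following is only a sketch of how the same write might look there: the field names are taken from the answer above, while the sample row and the temporary file path are invented for illustration.

```python
import csv
import os
import tempfile

fields = ["url", "create_date", "title"]
# Hypothetical row with a non-ASCII title.
results = [
    {"url": "http://example.com/1", "create_date": "2014-01-01", "title": "Café ✪"},
]

path = os.path.join(tempfile.mkdtemp(), "results.csv")

# Text mode with an explicit encoding; newline="" as the csv docs recommend.
with open(path, "w", newline="", encoding="utf-8") as f:
    dw = csv.DictWriter(f, fieldnames=fields, delimiter="|")
    dw.writeheader()
    dw.writerows(results)

with open(path, newline="", encoding="utf-8") as f:
    written = f.read()
print(written)
```

Here writeheader() takes the place of the dw.writer.writerow(dw.fieldnames) call used in the Python 2 code.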
