Changing the preferred encoding for Windows7 command prompt - python

I am using Python 3.3.
I am running a Python script to do some CAD file conversion, but I get the error below:
Conversion Failed
Traceback (most recent call last):
File "Start.py", line 141, in convertLib
lib.writeLibrary(modFile,symFile)
File "C:\Python33\Eagle2Kicad/Library\Library.py", line 67, in writeLibrary
self.writeSymFile(symFile)
File "C:\Python33\Eagle2Kicad/Library\Library.py", line 88, in writeSymFile
devicepart.write(symFile)
File "C:\Python33\Eagle2Kicad/Common\Symbol.py", line 51, in write
symbol.write(symFile)
File "C:\Python33\Eagle2Kicad/Common\Symbol.py", line 114, in write
symFile.write(pin.symRep())
File "C:\Python33\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x96' in position 4: character maps to <undefined>
It seems the preferred encoding in my Windows7 command prompt is NOT cp1252 (typing chcp shows "Active code page 437"). How can I change it to cp1252?
Can anyone please suggest?
EDIT:
As the default preferred encoding of the python command line is cp1252, I tried running the script from python command line instead of the windows command prompt. But I am still getting the same error as above. Can anyone please suggest?

When writing to files, the console encoding is not taken into account; only the locale is. From the open() function documentation:
In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
On your system, evidently that call returns 'cp1252'. The remedy is to always name the encoding for files explicitly; pass in an encoding argument to set the desired encoding for files:
with open(filename, 'w', encoding='utf8') as symFile:
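A minimal sketch of the point above, assuming a string that contains the same U+0096 character as in the traceback (the text, the can_encode helper, and the demo.lib file name are made up for illustration). It checks which codecs can encode the string and then writes it with an explicit encoding:

```python
import locale
import os
import tempfile

# Hypothetical sample containing U+0096, the character from the traceback.
text = "pin \x96 name"


def can_encode(s, encoding):
    """Return True if every character of s is representable in encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# This is what open() falls back to when no encoding argument is given.
print(locale.getpreferredencoding(False))

print(can_encode(text, "cp1252"))  # False: cp1252 has no mapping for U+0096
print(can_encode(text, "utf-8"))   # True: UTF-8 can encode any code point

# Naming the encoding makes the write independent of the locale.
path = os.path.join(tempfile.mkdtemp(), "demo.lib")
with open(path, "w", encoding="utf-8") as sym_file:
    sym_file.write(text)

with open(path, encoding="utf-8") as sym_file:
    round_trip = sym_file.read()
```

Running chcp in the console would not change any of this, because the code page of the console is not consulted when a file is opened.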

Related

Which encoding should Python open function use?

I'm getting an exception when reading a file that contains a RIGHT DOUBLE QUOTATION MARK Unicode symbol. It is encoded in UTF-8 (0xE2 0x80 0x9D). The minimal example:
import sys
print(sys.getdefaultencoding())
f = open("input.txt", "r")
f.readline()
This script fails when reading the first line, even if the right quotation mark is not on the first line. The exception looks like this:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 102: character maps to <undefined>
The input file is in utf-8 encoding, I've tried both with and without BOM. The default encoding returned by sys.getdefaultencoding() is utf-8.
This script fails on the machine with Python 3.6.5 but works well on another with Python 3.6.0. Both machines are Windows.
My questions are mostly theoretical, as this exception is thrown from external software that I cannot change, and it reads file that I don't wish to change. What should be the difference in these machines except the Python patch version? Why does vanilla open use cp1252 if the system default is utf-8?
As clearly stated in Python's open documentation:
In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
Windows defaults to a localized encoding (cp1252 on US and Western European versions). Linux typically defaults to utf-8.
Because it is platform-dependent, use the encoding parameter and specify the encoding of the file explicitly.
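To illustrate why the codec matters here, the following sketch decodes the three UTF-8 bytes of the RIGHT DOUBLE QUOTATION MARK from the question with both codecs. No file is involved; the bytes are built in memory, and the variable names are only illustrative:

```python
# UTF-8 bytes of U+201D RIGHT DOUBLE QUOTATION MARK, as given in the question.
raw = "\u201d".encode("utf-8")
print(raw)  # b'\xe2\x80\x9d'

# Decoding with the codec the file was written in works.
decoded = raw.decode("utf-8")

# Decoding the same bytes as cp1252 fails on 0x9d, which has no mapping there.
try:
    raw.decode("cp1252")
    cp1252_error = None
except UnicodeDecodeError as exc:
    cp1252_error = exc

print(cp1252_error)
```

Text-mode files are decoded in buffered chunks rather than one line at a time, which would explain why readline() can fail even when the offending character is not on the first line. Passing encoding="utf-8" to open() avoids the problem regardless of the machine's locale.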

CSV Module "UnicodeEncodeError" when using Dictwriter.writerows

I'm setting up a prod environment on a new Mac server that should mirror my dev environment. The job runs without a hitch on my dev computer, but on the server I'm getting this traceback:
Traceback (most recent call last):
File "/usr/local/share/Code/PycharmProjects/etl3/jira_scripts/jira_issues_incremental.py", line 189, in <module>
writer.writerows(rows)
File "/usr/local/bin/anaconda3/envs/etl3/lib/python3.5/csv.py", line 156, in writerows
return self.writer.writerows(map(self._dict_to_list, rowdicts))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 1195: ordinal not in range(128)
This job is being run through the Run Shell Script terminal in the Automator app. I've checked sys.getdefaultencoding() in the Automator terminal, as well as on the machine itself. Everything says utf8. I've also checked the encoding in my PostgreSQL database, and that is also set to UTF8. Here is my open statement for the file that the DictWriter is writing to:
with open(loadfile, 'w') as outf:
    writer = csv.DictWriter(
        f=outf,
        delimiter='|',
        fieldnames=fieldnames,
        extrasaction='ignore',
        escapechar=r'/',
        quoting=csv.QUOTE_MINIMAL
    )
    writer.writerows(rows)
I'm a little stumped as to where to even start to track down this error since all the default encodings seem to be correct... I should mention that this file is then copied to a PostgreSQL database using the psycopg2.cursor.copy_from command after, so the file should be written in a mode compatible with that.
You did not specify an encoding for your file, so the default codec is used for your system. Currently that is ASCII. See the open() documentation:
In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
Specify a different codec instead. UTF-8 would work:
with open(loadfile, 'w', encoding='utf8') as outf:
sys.getdefaultencoding() doesn't apply here; that's merely the default for unqualified str.encode() calls.
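A short sketch of the distinction, assuming a made-up row that contains the U+2019 character from the traceback (the field names, row contents, and issues.psv file name are hypothetical). It prints both settings and then writes the rows with an explicit codec:

```python
import csv
import locale
import os
import sys
import tempfile

# Default for str.encode() and bytes.decode(); always 'utf-8' on Python 3.
print(sys.getdefaultencoding())
# What open() uses when encoding is omitted; depends on the environment.
print(locale.getpreferredencoding(False))

# Hypothetical data containing U+2019, the character from the traceback.
fieldnames = ["key", "summary"]
rows = [{"key": "ABC-1", "summary": "user\u2019s issue", "unused": "x"}]

loadfile = os.path.join(tempfile.mkdtemp(), "issues.psv")
# newline="" is what the csv module documentation recommends for csv files.
with open(loadfile, "w", encoding="utf-8", newline="") as outf:
    writer = csv.DictWriter(
        outf,
        fieldnames=fieldnames,
        delimiter="|",
        extrasaction="ignore",
    )
    writer.writerows(rows)

with open(loadfile, encoding="utf-8", newline="") as inf:
    written = inf.read()
```

A job started from a minimal shell environment may see a different locale than an interactive terminal, so the second value is the one worth checking in the environment where the job actually runs.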

python Unicode decode error when accessing records of OrderedDict

Using Python 3.5.2 on Windows (32-bit), I'm reading a DBF file, which gives me each record as an OrderedDict.
from dbfread import DBF
Table = DBF('FME.DBF')
for record in Table:
    print(record)
When accessing the first record all is ok until I reach a record which contains diacritics:
Traceback (most recent call last):
File "getdbe.py", line 3, in <module>
for record in Table:
File "...\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbfread\dbf.py", line 311, in _iter_records
for field in self.fields]
File "...\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbfread\dbf.py", line 311, in <listcomp>
for field in self.fields]
File "...\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbfread\field_parser.py", line 75, in parse
return func(field, data)
File "...\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbfread\field_parser.py", line 83, in parseC
return decode_text(data.rstrip(b'\0 '), self.encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 11: ordinal not in range(128)
Even if I don't print the record I still have the problem.
Any idea ?
dbfread failed to detect the correct encoding from your DBF file. From the Character Encodings section of the documentation:
dbfread will try to detect the character encoding (code page) used in the file by looking at the language_driver byte. If this fails it reverts to ASCII. You can override this by passing encoding='my-encoding'.
Emphasis mine.
You'll have to pass in an explicit encoding; this will invariably be a Windows codepage. Take a look at the supported codecs in Python; you'll have to use one that starts with cp here. If you don't know which codepage to use, you'll have some trial-and-error work to do. Note that some codepages overlap in characters, so even if a codepage appears to produce legible results, you may want to continue searching and trying out different records in your data file to see what fits best.
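The trial-and-error step can be sketched with the standard library alone. The sample bytes below are hypothetical (only the 0x82 byte comes from the traceback), and the candidate list is just a starting point, not a recommendation:

```python
# Hypothetical field contents; only the 0x82 byte is taken from the traceback.
sample = b"Caf\x82"

# A few candidate codepages to compare; extend this list as needed.
candidates = ["cp437", "cp850", "cp1250", "cp1252"]

decoded = {}
for codec in candidates:
    try:
        decoded[codec] = sample.decode(codec)
    except UnicodeDecodeError:
        decoded[codec] = None  # this codepage has no mapping for some byte

for codec, text in decoded.items():
    # ascii() keeps the output printable on any console:
    # '\xe9' is e with acute accent, '\u201a' is a low quotation mark.
    print(codec, ascii(text))
```

Here cp437 and cp850 both turn 0x82 into an accented e, while cp1250 and cp1252 produce a low quotation mark, which shows the overlap mentioned above. Once a codepage looks right, it can be passed to dbfread as described in the quoted documentation, for example DBF('FME.DBF', encoding='cp850').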

UnicodeDecodeError even when importing simple txt file in Python

I would like to import even a simple text file into Python. For example, here's the contents of example.txt:
hello
my
friend
Very simple. However, when I try to import the file and read it:
f = open('example.txt')
f.read()
I get the following error:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
f.read()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
What's the source of this problem? Clearly there are not any non-ascii characters in the file.
I've tried this in IDLE, terminal (Mac OSX) and Rodeo and get similar issues in all.
I'm very new to Python and am concerned I may have screwed up something in installation. I've downloaded various versions over the years, straight from Python, Anaconda, macports, etc. and I'm wondering if the various sources are not playing nicely...
Python 3.5.1 on OSX 10.11.4.
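Maybe your file is saved with a BOM (byte order mark). A BOM is not valid ASCII, so the ASCII codec raises a UnicodeDecodeError on the very first byte. Note that a UTF-8 BOM starts with byte 0xef, whereas the 0xff at position 0 in your traceback is what a UTF-16 BOM (0xff 0xfe) starts with, so the file may actually be UTF-16. Try saving your file explicitly as UTF-8 (without BOM).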
Hope this helps!
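One way to check for a BOM is to look at the first raw bytes and compare them with the constants in the codecs module. This is only a sketch: sniff_bom is a hypothetical helper, and the sample bytes are built in memory to mimic a UTF-16 file whose first byte is 0xff, as in the traceback:

```python
import codecs

# UTF-32 must be checked before UTF-16, because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


# Simulated file contents: the example text saved as UTF-16 with a BOM.
raw = codecs.BOM_UTF16_LE + "hello\nmy\nfriend\n".encode("utf-16-le")

encoding = sniff_bom(raw)
text = raw.decode(encoding)  # the BOM-aware codec strips the BOM
print(encoding)
```

For a real file, the bytes could come from open('example.txt', 'rb').read(4), and the detected name could then be passed as the encoding argument of open().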

My python can only open the saved text file by notepad, why?

I am using Python 3.4.1 on Windows 7. I am trying to read a txt file exported from some software, but it seems that Python cannot read this text file. However, I found that if I open the text file in Notepad, add a space anywhere, and save it, Python then reads it without problems.
I tried the same code and the same file on my Mac, and it has the same problem as on Windows.
Original text file: not working. Opened and saved in Windows Notepad: working.
Opened and saved in Mac TextEdit: not working.
I suspect the original encoding of the text file might not be right.
Thanks
Python code
InputFileName=input("Please tell me the input file name:")
#StartLNum=int(input("Please tell me the start line number:"))
#EndLNum=int(input("Please tell me the end line number:"))
StartLNum=18
EndLNum=129
lnum=1
OutputName='out'+InputFileName
fw=open(OutputName,'w')
with open(InputFileName,"r") as fr:
    for line in fr:
        if (lnum >= StartLNum) & (lnum<=EndLNum):
            #print(line)
            fw.write(line)
        lnum+=1
fw.close()
Shell
>>> ================================ RESTART ================================
>>>
Please tell me the input file name:Jul-18-2014.txt
Traceback (most recent call last):
File "C:\Users\Jeremy\Desktop\read.py", line 13, in <module>
for line in fr:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xb3 in position 4309: illegal multibyte sequence
>>> ================================ RESTART ================================
>>>
Please tell me the input file name:Jul-18-2014.txt
>>>
Also, below is the error the same code reports on my Mac (Python 3.4.1, OS X 10.9):
Traceback (most recent call last):
File "/Users/Jeremy/Desktop/read.py", line 14, in <module>
for line in fr:
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb3 in position 4174: ordinal not in range(128)
When you save the file in Notepad, it is re-encoded in the default file encoding of your Windows installation. When Notepad opened the file, however, it auto-detected the original encoding.
Python opens files using that same system encoding by default, which is why you can now open the file. Quoting the open() function documentation:
encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used.
You'll have to explicitly specify the correct encoding for the file if you wanted to open it directly in Python:
with open(InputFileName, "r", encoding='utf-8-sig') as fr:
I used 'utf-8-sig' as an example here, as that is a file encoding that Notepad can auto-detect. It could well be that the encoding is UTF-16 or plain UTF-8 or any number of other encodings, however.
If you think that the file is encoded with a specific ANSI codepage, you still have to name the exact codepage. Your system is configured to use code page 936 (GBK), but that is not the correct encoding for this file.
See the codecs module for a list of supported encodings.
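Putting this together, the script from the question could be sketched as below, with the encoding passed in explicitly. The function name copy_line_range, the temporary file names, and the choice of UTF-16 for the demo input are all assumptions made for illustration; the real encoding of the exported file still has to be identified:

```python
import os
import tempfile


def copy_line_range(src, dst, start, end, encoding):
    """Copy lines start..end (1-based, inclusive) from src to dst."""
    # Both files name their encoding, so the locale default is never used.
    with open(src, "r", encoding=encoding) as fr, \
            open(dst, "w", encoding="utf-8") as fw:
        for lnum, line in enumerate(fr, start=1):
            if start <= lnum <= end:
                fw.write(line)


# Demo input: a small UTF-16 file standing in for the exported text file.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "export.txt")
dst = os.path.join(tmp, "outexport.txt")
with open(src, "w", encoding="utf-16") as f:
    f.write("".join("line {}\n".format(i) for i in range(1, 6)))

copy_line_range(src, dst, 2, 4, encoding="utf-16")

with open(dst, encoding="utf-8") as f:
    result = f.read()
```

Using enumerate() replaces the manual line counter, and opening both files in one with statement ensures the output file is closed even if decoding fails partway through.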
