UnicodeError when importing python file - python

I am trying to import a python file Sonderbuch_BASECASE_3ph.py into another python file test.py. test.py is in the main dir foo while Sonderbuch_BASECASE_3ph.py is in a subdir grid_data.
Sonderbuch_BASECASE_3ph.py has a function with the same name, which I need to import as well:
# Sonderbuch_BASECASE_3ph
from numpy import array
def Sonderbuch_BASECASE_3ph():
.....
Both of these attempts to import result in a SyntaxError:
from grid_data import Sonderbuch_BASECASE_3ph
import grid_data.Sonderbuch_BASECASE_3ph
Output:
Traceback (most recent call last):
File "C:/Users/Artur/Desktop/foo/test.py", line 1, in <module>
from grid_data import Sonderbuch_BASECASE_3ph
File "C:\Users\Artur\Desktop\foo\grid_data\Sonderbuch_BASECASE_3ph.py", line 1550
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xe4 in position 29: invalid continuation byte
Edit:
The encoding of the file seems to be windows-1252; at least that is what PyCharm proposes. Decoding the file as windows-1252 does not resolve the error, though. Sonderbuch_BASECASE_3ph.py is just a storage file for a dictionary, and I was hoping I could simply import it.
None of the encodings seem to work.

What's in your Sonderbuch_BASECASE_3ph.py file exactly?
I guess the two files use different encodings, which is why importing one into the other fails. Python 3 reads source files as UTF-8 unless told otherwise, so my guess is that test.py is UTF-8 while the other file is encoded in latin-1 or something similar. Check the encoding of both files (you can do this in PyCharm, Sublime, Notepad++, etc.). In PyCharm, the encoding of the current file is shown at the bottom right by default.
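If the module really is saved as windows-1252, two standard ways out are a PEP 263 declaration such as `# -*- coding: cp1252 -*-` on the first or second line of the file, or re-saving the file as UTF-8. Below is a minimal sketch of the second option. `reencode_to_utf8` is a hypothetical helper name, and cp1252 is only an assumption taken from what PyCharm proposed in the question:

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="cp1252"):
    """Rewrite a text file as UTF-8, assuming it is currently in source_encoding.

    Returns False if the file already decodes as UTF-8, True if it was rewritten.
    """
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, leave the file alone
    except UnicodeDecodeError:
        pass
    # source_encoding is a guess; a wrong guess may decode to wrong characters.
    text = raw.decode(source_encoding)
    # Work on bytes so that line endings are preserved exactly.
    Path(path).write_bytes(text.encode("utf-8"))
    return True
```

Keep a backup before running something like this on a real file, since a wrong guess for `source_encoding` can silently produce the wrong characters.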

Related

Which encoding should Python open function use?

I'm getting an exception when reading a file that contains a RIGHT DOUBLE QUOTATION MARK Unicode symbol. It is encoded in UTF-8 (0xE2 0x80 0x9D). The minimal example:
import sys
print(sys.getdefaultencoding())
f = open("input.txt", "r")
f.readline()
This script fails when reading the first line, even if the right quotation mark is not on the first line. The exception looks like this:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 102: character maps to <undefined>
The input file is in utf-8 encoding, I've tried both with and without BOM. The default encoding returned by sys.getdefaultencoding() is utf-8.
This script fails on the machine with Python 3.6.5 but works well on another with Python 3.6.0. Both machines are Windows.
My questions are mostly theoretical, as this exception is thrown from external software that I cannot change, and it reads a file that I don't wish to change. What could differ between these machines apart from the Python patch version? Why does plain open() use cp1252 if the system default is utf-8?
As clearly stated in Python's open documentation:
In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
Windows defaults to a localized encoding (cp1252 on US and Western European versions). Linux typically defaults to utf-8.
Because it is platform-dependent, use the encoding parameter and specify the encoding of the file explicitly.
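Note that sys.getdefaultencoding() is not what open() consults; it is the default for str.encode() and bytes.decode(), which is why it reports utf-8 while open() still uses cp1252. A small sketch of the failure and the fix, using a temporary file as a stand-in for input.txt:

```python
import locale
import os
import tempfile

data = "\u201d".encode("utf-8")  # RIGHT DOUBLE QUOTATION MARK: b'\xe2\x80\x9d'

# cp1252 leaves byte 0x9d undefined, so decoding the UTF-8 bytes with it fails.
try:
    data.decode("cp1252")
    cp1252_error = None
except UnicodeDecodeError as exc:
    cp1252_error = exc

# Naming the file's real encoding works regardless of the platform default.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.txt")
    with open(path, "wb") as f:
        f.write(data + b"\n")
    with open(path, "r", encoding="utf-8") as f:
        first_line = f.readline()

# This is the value open() falls back to when no encoding is given.
print(locale.getpreferredencoding(False))
```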

UnicodeDecodeError even when importing simple txt file in Python

I would like to import even a simple text file into Python. For example, here's the contents of example.txt:
hello
my
friend
Very simple. However, when I try to import the file and read it:
f = open('example.txt')
f.read()
I get the following error:
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
f.read()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
What's the source of this problem? Clearly there are not any non-ascii characters in the file.
I've tried this in IDLE, terminal (Mac OSX) and Rodeo and get similar issues in all.
I'm very new to Python and am concerned I may have screwed up something in installation. I've downloaded various versions over the years, straight from Python, Anaconda, macports, etc. and I'm wondering if the various sources are not playing nicely...
Python 3.5.1 on OSX 10.11.4.
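Your file probably starts with a byte order mark (BOM). A first byte of 0xff matches the UTF-16 little-endian BOM (FF FE); a UTF-8 BOM would start with 0xef instead. Either way, the BOM bytes are outside the ASCII range, so the ascii codec raises a UnicodeDecodeError on them. Try saving the file explicitly as UTF-8 without a BOM, or pass the file's actual encoding to open().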
Hope this helps!
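To check whether a file starts with a BOM, you can look at its first bytes. This is only a sketch: `sniff_bom` is a hypothetical helper, and a file without a BOM simply yields None, which says nothing about its encoding.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(path):
    """Return a codec name matching the file's BOM, or None if there is no BOM."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return None
```

The returned name can be passed straight to open(path, encoding=...); the "utf-16", "utf-32" and "utf-8-sig" codecs all consume the BOM while decoding.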

Can't handle strings in windows

I wrote some Python 2.7 code on Linux and it worked fine.
It uses
os.listdir(os.getcwd())
to read folder names as variables and uses them later in some parts.
On Linux I used a simple trick to manually convert the non-ASCII characters into ASCII ones:
str(str(tfile)[0:-4]).replace('\xc4\xb0', 'I').replace("\xc4\x9e", 'G').replace("\xc3\x9c", 'U').replace("\xc3\x87", 'C').replace("\xc3\x96", 'O').replace("\xc5\x9e", 'S') ,str(line.split(";")[0]).replace(" ", "").rjust(13, "0"),a))
This approach failed in windows. I tried
udata = str(str(str(tfile)[0:-4])).decode("UTF-8")
asci = udata.encode("ascii","ignore")
which also failed with the following:
DEM¦-RTEPE # at this string error occured
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Python27\lib\lib-tk\Tkinter.py", line 1532, in __call__
return self.func(*args)
File "C:\Users\benhur.satir\workspace\Soykan\tkinter.py", line 178, in SparisDerle
udata = str(str(str(tfile)[0:-4])).decode("utf=8")
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa6 in position 3: invalid start byte
How can I handle such characters on Windows?
NOTE: Leaving them as UTF-8 causes the xlswriter module to fail, so I need to convert them to ASCII. Missing characters are not desirable, yet acceptable.
Windows does not use UTF-8 for file names by default. You probably get the folder names in the default system encoding, generally win1252 (a variant of ISO-8859-1).
That is why you could not find the UTF-8 byte sequences in the file names. The exception also tells you that the name contains byte 0xa6, which in win1252 is ¦ (broken bar).
This does not tell you exactly which encoding your Windows system uses, since that depends on the localization, but it does prove that the data is not UTF-8 encoded.
How about this?
The string module provides ready-made sets of characters that you could use for the optional .replace() calls:
>>> import string
>>> string.digits+string.punctuation
'0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>>
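Once the names are available as Unicode text, stripping accents can be done without a hand-written replace chain. The sketch below is written for Python 3 (the question uses 2.7, so this is an adaptation, not the asker's code), and `to_ascii` is a hypothetical helper. It decomposes each character and drops whatever has no ASCII form, which matches the "missing characters are acceptable" requirement:

```python
import unicodedata


def to_ascii(name):
    """Best-effort ASCII version of a Unicode string.

    NFKD splits letters such as 'İ' into a base letter plus combining marks;
    encoding with errors='ignore' then drops the marks and anything else
    that is not ASCII.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("DEM\u0130RTEPE"))  # 'İ' is U+0130
```

Characters with no decomposition, such as the dotless 'ı', are removed entirely rather than replaced.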

UnicodeDecodeError when using Python 2.x unicodecsv

I'm trying to write out a csv file with Unicode characters, so I'm using the unicodecsv package. Unfortunately, I'm still getting UnicodeDecodeErrors:
# -*- coding: utf-8 -*-
import codecs
import unicodecsv
raw_contents = 'He observes an “Oversized Gorilla” near Ashford'
encoded_contents = unicode(raw_contents, errors='replace')
with codecs.open('test.csv', 'w', 'UTF-8') as f:
    w = unicodecsv.writer(f, encoding='UTF-8')
    w.writerow(["1", encoded_contents])
This is the traceback:
Traceback (most recent call last):
File "unicode_test.py", line 11, in <module>
w.writerow(["1", encoded_contents])
File "/Library/Python/2.7/site-packages/unicodecsv/__init__.py", line 83, in writerow
self.writer.writerow(_stringify_list(row, self.encoding, self.encoding_errors))
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 691, in write
return self.writer.write(data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 351, in write
data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 17: ordinal not in range(128)
I thought converting it to Unicode would be good enough, but that doesn't seem to be the case. I'd really like to understand what is happening so that I'm better prepared to handle these errors in other projects in the future.
From the traceback, it looks like I can reproduce the error like this:
>>> raw_contents = 'He observes an “Oversized Gorilla” near Ashford'
>>> raw_contents.encode('UTF-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 15: ordinal not in range(128)
>>>
Up until now, I thought I had a decent working knowledge of working with Unicode text in Python 2.x, but this has humbled me.
You should not use codecs.open() for your file. unicodecsv wraps the csv module, which always writes a byte string to the open file object. In order to write that byte string to a Unicode-aware file object such as returned by codecs.open(), it is implicitly decoded; this is where your UnicodeDecodeError exception stems from.
Use a file in binary mode instead:
with open('test.csv', 'wb') as f:
    w = unicodecsv.writer(f, encoding='UTF-8')
    w.writerow(["1", encoded_contents])
The binary mode is not strictly necessary unless your data contains embedded newlines, but the csv module wants to control how newlines are written to ensure that such values are handled correctly. However, not using codecs.open() is an absolute requirement.
The same thing happens when you call .encode() on a byte string; you already have encoded data there, so Python implicitly decodes to get a Unicode value to encode.
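The implicit step can be made visible by performing it explicitly. The sketch below uses Python 3 syntax (where the decode must be spelled out) to mimic what Python 2 does behind the scenes when .encode() is called on a byte string; it is an illustration of the mechanism, not the unicodecsv code path itself:

```python
# The UTF-8 bytes that the Python 2 byte-string literal in the question holds.
raw = "He observes an \u201cOversized Gorilla\u201d near Ashford".encode("utf-8")

failure = None
try:
    # Python 2 silently runs this ASCII decode before it can encode.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    failure = (exc.encoding, exc.start, raw[exc.start])

print(failure)
```

The failing position and byte match the second traceback in the question: byte 0xe2 at position 15 is the first byte of the opening curly quote.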

My python can only open the saved text file by notepad, why?

I am using Python 3.4.1 on Windows 7. I am trying to read a text file exported from another program, and Python cannot read it. However, if I open the file in Notepad, add a space anywhere, and save it, Python then reads it fine.
I tried the same code and the same file on my Mac and hit the same problem.
Original text file: not working. Opened and saved in Windows Notepad: working. Opened and saved in Mac TextEdit: not working.
I suspect the original encoding of the text file is not what Python expects.
Thanks
Python code
InputFileName=input("Please tell me the input file name:")
#StartLNum=int(input("Please tell me the start line number:"))
#EndLNum=int(input("Please tell me the end line number:"))
StartLNum=18
EndLNum=129
lnum=1
OutputName='out'+InputFileName
fw=open(OutputName,'w')
with open(InputFileName,"r") as fr:
    for line in fr:
        if lnum >= StartLNum and lnum <= EndLNum:
            #print(line)
            fw.write(line)
        lnum+=1
fw.close()
Shell
>>> ================================ RESTART ================================
>>>
Please tell me the input file name:Jul-18-2014.txt
Traceback (most recent call last):
File "C:\Users\Jeremy\Desktop\read.py", line 13, in <module>
for line in fr:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xb3 in position 4309: illegal multibyte sequence
>>> ================================ RESTART ================================
>>>
Please tell me the input file name:Jul-18-2014.txt
>>>
In addition, this is the error the same code reports on my Mac (Python 3.4.1, OS X 10.9):
Traceback (most recent call last):
File "/Users/Jeremy/Desktop/read.py", line 14, in <module>
for line in fr:
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb3 in position 4174: ordinal not in range(128)
When you save the file in Notepad, the file is reencoded to be saved as your default file encoding for your Windows installation. Notepad auto-detected the encoding when it opened the file, however.
Python opens files using that same system encoding by default, which is why you can now open the file. Quoting the open() function documentation:
encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used.
You'll have to explicitly specify the correct encoding for the file if you wanted to open it directly in Python:
with open(InputFileName, "r", encoding='utf-8-sig') as fr:
I used 'utf-8-sig' as an example here, as that is a file encoding that Notepad can auto-detect. It could well be that the encoding is UTF-16 or plain UTF-8 or any number of other encodings, however.
If you think that the file is encoded with a specific ANSI codepage, you still have to name the exact codepage. Your system is configured to use code page 936 (GBK), but that is not the correct encoding for this file.
See the codecs module for a list of supported encodings.
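When the encoding is unknown, one pragmatic approach is to try a short list of candidates and take the first that decodes without error. This is a rough sketch: `first_decodable` and the candidate list are assumptions for illustration, and a clean decode does not prove the guess is right, so inspect the resulting text before relying on it.

```python
def first_decodable(path, candidates=("utf-8-sig", "utf-16", "gbk")):
    """Return the first candidate encoding that decodes the whole file, else None."""
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this candidate cannot be right, try the next one
        return encoding
    return None
```

Strict encodings such as UTF-8 are worth trying first, because they reject most data that is not actually theirs; permissive single-byte code pages accept almost anything and belong at the end of the list.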
