DjangoUnicodeDecodeError while getting field - python

I ran into a problem while reading from a GEOSGeometry object. This is the code I used:
from django.contrib.gis.gdal import DataSource
ds = DataSource(shp_path)  # shp_path: path to the shapefile
lyr = ds[0]
for feat in lyr:
    geom_t = feat.geom.transform(wgs84, clone=True)  # wgs84 is defined elsewhere
    name = feat.get('name')
This code works fine for most of my shapefiles, but if the name field contains a non-ASCII string such as 'تست', it raises this error:
DjangoUnicodeDecodeError at /views/importdata/
'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte. You passed in b'\xc8\xe1\xe6\xc7\xd1 \xc7\xe3\xc7\xe3 \xd1\xd6\xc7' (<class 'bytes'>)
Unicode error hint
The string that could not be encoded/decoded was: �����
I found out that this is an internal error related to the GDAL or GEOS wrapper in Django. The error comes from this line:
return force_text(string, encoding=self._feat.encoding, strings_only=True)
in field.py, in this directory:
D:\Python\Python36\lib\site-packages\django\contrib\gis\gdal\field.py in as_string
Is there a way to solve this problem?
thanks

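For anyone else who runs into this problem: I had to set the encoding when creating the DataSource, because the attribute data in my shapefile was stored as cp1256 (Windows Arabic) rather than UTF-8.
Here is the relevant line:
ds = DataSource(shp, encoding='cp1256')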
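To see why the explicit encoding helps, the bytes from the error message can be decoded by hand. This is a minimal sketch in plain Python 3 (no Django required), assuming, as the fix suggests, that the attribute table was written as cp1256. The variable names are illustrative only.

```python
# Raw bytes copied from the DjangoUnicodeDecodeError message.
raw = b'\xc8\xe1\xe6\xc7\xd1 \xc7\xe3\xc7\xe3 \xd1\xd6\xc7'

# UTF-8 rejects them: 0xc8 announces a two-byte sequence, but the
# next byte (0xe1) is not a valid continuation byte.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# cp1256 maps every one of these bytes to an Arabic letter or a space.
decoded = raw.decode('cp1256')
```

Here utf8_ok ends up False while decoded holds readable Arabic text, which is consistent with the field having been written in cp1256.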

Related

Problems reading a collection of files that contain non-ASCII characters

I am trying to build word vectors using the following code segment:
import os
DIR = r"C:\Users\Desktop\data\rec.sport.hockey"  # raw string, so backslashes are not treated as escapes
posts = [open(os.path.join(DIR, f)).read() for f in os.listdir(DIR)]
x_train = vectorizer.fit_transform(posts)
However, the code fails with the following error message:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 240: invalid start byte
I think it is related to some non-ASCII characters. How can I solve this issue?
Is the file automatically generated? If not, one simple solution is to open the file with Notepad++ and convert the encoding to utf-8.
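If converting the files by hand is impractical, another option is to tell Python how to handle undecodable bytes when reading. The sketch below is for Python 3; read_posts is a hypothetical helper, and the choice of errors="replace" is an assumption. If the real encoding of the files is known, passing it as encoding is the better fix.

```python
import os
import tempfile


def read_posts(directory, encoding="utf-8", errors="replace"):
    """Read every file in `directory` as text.

    Bytes that cannot be decoded are replaced with U+FFFD instead of
    raising UnicodeDecodeError.
    """
    posts = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        with open(path, encoding=encoding, errors=errors) as fh:
            posts.append(fh.read())
    return posts


# Demonstration on a temporary directory with one file holding a stray 0x80 byte.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "post1.txt"), "wb") as fh:
    fh.write(b"price: \x80 5")

posts = read_posts(demo_dir)
```

The resulting list can then be passed to vectorizer.fit_transform as in the question.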

Importing VISA waveform from an oscilloscope into Python

I am having problems with the result returned by this VISA acquisition call:
ribData = []
ribData = inst.query('CURVe?')
I am using this call to grab a waveform from an oscilloscope. I am developing this program in Python.
If the values are positive, the call returns the binary values and I can graph them, but if I drop the waveform on the scope below the halfway point I receive the error:
Traceback (most recent call last):
  File "C:/_Python/TDS3054/mainTds.py", line 107, in cButton
    ribData = inst.query('CURVe?')
  File "C:\Python34\lib\site-packages\pyvisa\resources\messagebased.py", line 384, in query
    return self.read()
  File "C:\Python34\lib\site-packages\pyvisa\resources\messagebased.py", line 309, in read
    message = self.read_raw().decode(enco)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 401: ordinal not in range(128)
I use the following VISA calls to set up the oscilloscope prior to the acquisition.
inst.write('DATa:SOUrce CH2')
inst.write('DATa:WIDth 1')
inst.write('DATA:START 10')
inst.write('DATA:STOP 800')
inst.write('DATa:ENCdg RIBinary') # RIB -128 to +127
Because the failure occurs on the acquisition call (CURVe?), I was wondering whether there is a VISA library call that solves this problem. Perhaps I need to set the encoding to UTF-8, perhaps VISA doesn't deal with Unicode at all, or perhaps this is not my problem.
Most likely it was a Unicode problem, but I found the answer in the PyVISA documentation. There I found the function query_binary_values(), and I replaced the inst.query('CURVe?') call with it. This is how I used it:
tdsData = inst.query_binary_values('CURVe?', datatype='b', is_big_endian=True)
The data that was returned did not have the UnicodeDecodeError, and I was able to plot all the lines with no problem.
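The reason only waveforms below the halfway point fail can be shown without an instrument. With DATa:WIDth 1 and RIBinary, each sample is one signed byte, so negative samples are bytes of 0x80 and above, which the ASCII codec cannot decode. This is a minimal sketch using made-up sample bytes and the standard struct module. It illustrates the idea only and is not how PyVISA is implemented.

```python
import struct

# Four hypothetical one-byte samples; 0xfc is the byte from the traceback.
raw = bytes([0x05, 0x7f, 0xfc, 0x80])

# Decoding as text fails as soon as a byte >= 0x80 appears.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Interpreting the same bytes as signed 8-bit integers ('b') works for all values.
samples = list(struct.unpack(">%db" % len(raw), raw))
```

Here 0xfc becomes -4 and 0x80 becomes -128, matching the -128 to +127 range noted for RIBinary.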
Notice that the error says the 'ascii' codec is being used. That is not what you want here, because your instrument is almost certainly outputting a wider range of byte values than that codec can handle:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 401: ordinal not in range(128)
To fix this (at least in my case), you can specify a different encoding.
First open your instrument in Python, then set its encoding to 'latin-1', e.g.:
inst = rm.open_resource('USB0::0x1AB1::0x0588::DS1ET152915193::INSTR')
inst.encoding = "latin-1"
Then, to read the values and convert them to a list of integers, do something like:
data = list(map(ord, inst.query(":WAVeform:DATA? FFT")))  # latin-1 string to list of ints (0-255)
data should now contain the full range of values.

Unicode Error in Django while loading in data

So I'm trying to load this line in as a name for a model:
"Auf der grünen Wiese (1953)"
but I get the error
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 70: invalid start byte
I'm looking at: http://docs.python.org/2/howto/unicode.html#the-unicode-type
but I'm still not sure how to fix this. I can convert it to unicode with the option to replace or ignore errors, but I don't think that is the ideal solution.
I also see that django provides a few functions to help with this stuff: https://docs.djangoproject.com/en/dev/ref/unicode/ but I'm still not quite sure how to approach it.
The line is encoded using latin1. To decode it properly, you should do the following (assuming Python 2.x):
line = 'Auf der gr\xfcnen Wiese (1953)'
name = line.decode('latin1')
If you are reading this from a file, you can also do:
import codecs
f = codecs.open(path, 'r', 'latin1')
name = f.readline().strip()
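The answer above assumes Python 2. For comparison, here is a minimal Python 3 sketch of the file-reading variant, where the built-in open accepts an encoding argument directly. The temporary file and its name are made up for the demonstration.

```python
import os
import tempfile

# Create a small latin-1 encoded file to read back (demo data only).
path = os.path.join(tempfile.mkdtemp(), "titles.txt")
with open(path, "wb") as fh:
    fh.write(b"Auf der gr\xfcnen Wiese (1953)\n")

# Declaring the encoding up front yields a proper str with the umlaut intact.
with open(path, encoding="latin-1") as fh:
    name = fh.readline().strip()
```

The resulting name is an ordinary text string and can be assigned to a model field as is.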

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I am working with Google App Engine on Python 2.5.
I am getting a UnicodeDecodeError in the following code because my userName variable has the following value:
userName = unicode(userName).encode('utf-8') # äºï¼égãwmj is value in this variable
userName = unicode(userName).encode('utf-8')
strData = '{\"Sid\" :1, \"Oppid\" :%s, \"Aid\" :%s, \"EC\" :\"%s\", \"Name\" :\%s"' % (enemyID, userID, userEmpCode,userName)
params = {'deviceToken' : oDeviceToken,
          'message' : strMessage,
          'CertificateId' : certificateId,
          'Data' : strData
          }
result = urlfetch.fetch(url = url,
                        payload = urllib.urlencode(params),
                        method = urlfetch.POST,
                        headers = {"Authorization" : authString},
                        deadline = 30
                        )
I am doing the following to userName to encode it as UTF-8 so that I can send it in the payload:
userName = unicode(userName).encode('utf-8')
I believe the error occurs when I call urllib.urlencode(params).
Please advise what is going wrong,
and what the best overall strategy is for dealing with Unicode strings on App Engine with Python.
I have tried different solutions from different threads, but none of them worked.
Your problem seems to be that you're calling unicode(userName) without an encoding on your already-encoded string, so it "defaults to the current default string encoding", which seems to be ascii in your case.
You probably should not call unicode at all: if you know it's a unicode value, you're fine already; if not, call .decode with the correct encoding.
If you're unsure, test with isinstance, since trying to decode a unicode value will result in yet another error.
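The isinstance check can be sketched as a small helper. The answer is about Python 2, where the two types are str and unicode; the version below is written for Python 3, where they are bytes and str. The name to_text and the UTF-8 default are assumptions for illustration.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it only if it is still raw bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text: decoding again would be an error, so pass it through.
    return value


from_bytes = to_text(b"\xc3\xa4")   # UTF-8 bytes for U+00E4
from_text = to_text("\u00e4")       # already text, returned unchanged
```

With this in place, the value is decoded exactly once at the point where it enters the program and encoded exactly once where it leaves, for example just before building the payload.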
I had a similar issue when porting Python 3 code from Ubuntu Linux 14.04 to FreeBSD 10.3. The latter system seems to use ASCII by default instead of UTF-8 when opening files with Python 3.4.4.
Specifying encoding='utf-8' with the file open command resolved my issue:
open('filepath', encoding='utf-8')

Python Encoding\Decoding for writing to a text file

I've honestly spent a lot of time on this, and it's slowly killing me. I've stripped content from a PDF and stored it in an array. Now I'm trying to pull it back out of the array and write it into a txt file. However, I do not seem to be able to make it happen because of encoding issues.
allTheNTMs.append(contentRaw[s1:].encode("utf-8"))
for a in range(len(allTheNTMs)):
    kmlDescription = allTheNTMs[a]
    print kmlDescription #this prints out fine
    outputFile.write(kmlDescription)
The error I'm getting is "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 213: ordinal not in range(128)".
I'm just messing around now, but I've tried all kinds of ways to get this stuff to write out.
outputFile.write(kmlDescription).decode('utf-8')
Please forgive me if this is basic, I'm still learning Python (2.7).
Cheers!
EDIT1: Sample data looks something like the following:
Chart 3686 (plan, Morehead City) [ previous update 4997/11 ] NAD83 DATUM
Insert the accompanying block, showing amendments to coastline,
depths and dolphins, centred on: 34° 41´·19N., 76° 40´·43W.
Delete R 34° 43´·16N., 76° 41´·64W.
When I add the print type(raw), I get
Edit 2: When I just try to write the data, I receive the original error message (ascii codec can't decode byte...)
I will check out the suggested thread and video. Thanks folks!
Edit 3: I'm using Python 2.7
Edit 4: agf hit the nail on the head in the comments below when (s)he noticed that I was double encoding. I tried intentionally double encoding a string that had previously been working and produced the same error message that was originally thrown. Something like:
text = "Here's a string, but imagine it has some weird symbols and whatnot in it - apparently latin-1"
textEncoded = text.encode('utf-8')
textEncodedX2 = textEncoded.encode('utf-8')
outputfile.write(textEncoded) #Works!
outputfile.write(textEncodedX2) #failed
Once I figured out I was trying to double encode, the solution was the following:
allTheNTMs.append(contentRaw[s1:].encode("utf-8"))
for a in range(len(allTheNTMs)):
    kmlDescription = allTheNTMs[a]
    kmlDescriptionDecode = kmlDescription.decode("latin-1")
    outputFile.write(kmlDescriptionDecode)
It's working now, and I sure appreciate all of your help!!
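Why double encoding produces a decode error can be made explicit. In Python 2, calling .encode on a byte string first decodes it implicitly with the ASCII codec, and that hidden step is what fails. Python 3 no longer allows this, so the sketch below performs the hidden step by hand. The sample text is a made-up fragment in the style of the chart data.

```python
# Text containing a degree sign, an acute accent and a middle dot.
text = "34\u00b0 41\u00b4\u00b719N"

# First encode: fine, produces UTF-8 bytes (the degree sign becomes 0xc2 0xb0).
encoded = text.encode("utf-8")

# Python 2's second .encode('utf-8') would implicitly run this ASCII decode first:
try:
    encoded.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)
```

The captured message names the 'ascii' codec and byte 0xc2, the same byte reported in the original error.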
My guess is that the output file was opened with a latin1 or even utf-8 codec, so you cannot write UTF-8 encoded data to it because the codec tries to convert it again. To a normally opened file you can write any arbitrary byte string. Here is an example that recreates a similar error:
import codecs
u = u'सच्चिदानन्द हीरानन्द वात्स्यायन '
s = u.encode('utf-8')
f = codecs.open('del.text', 'wb',encoding='latin1')
f.write(s)
output:
Traceback (most recent call last):
  File "/usr/lib/wingide4.1/src/debug/tserver/_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "/usr/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
Solution:
This will work if you don't set any codec:
f = open('del.txt', 'wb')
f.write(s)
The other option is to write the unicode string directly to the file without encoding it first, provided the file has been opened with the correct codec, e.g.:
f = codecs.open('del.text', 'wb',encoding='utf-8')
f.write(u)
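The same two options carry over to Python 3, where the split between bytes and text is enforced by the type system. This is a minimal sketch with made-up file names in a temporary directory: either encode yourself and write bytes to a binary-mode file, or open the file in text mode with an encoding and write the text directly.

```python
import os
import tempfile

u = "34\u00b0 41\u00b4\u00b719N"   # sample text with non-ASCII characters
workdir = tempfile.mkdtemp()
bin_path = os.path.join(workdir, "bytes.txt")
txt_path = os.path.join(workdir, "text.txt")

# Option 1: binary mode, encode exactly once yourself.
with open(bin_path, "wb") as f:
    f.write(u.encode("utf-8"))

# Option 2: text mode with a declared encoding, write the text as is.
with open(txt_path, "w", encoding="utf-8") as f:
    f.write(u)

# Both files end up with identical bytes on disk.
with open(bin_path, "rb") as f:
    bin_bytes = f.read()
with open(txt_path, "rb") as f:
    txt_bytes = f.read()
```

Mixing the two, such as writing already-encoded bytes to a file that also encodes, is what leads to the double conversion described above.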
Your error message doesn't appear to relate to your Python syntax, but to the fact that you're trying to decode a byte value that has no equivalent in ASCII.
Hex 0xc2 appears to represent a Latin character: an uppercase A with an accent on top (in latin-1). Therefore, instead of using "allTheNTMs.append(contentRaw[s1:].encode("utf-8"))", try:
allTheNTMs.append(contentRaw[s1:].encode("latin-1"))
I'm not an expert in Python, so this may not work, but it would appear you're trying to encode a Latin character. Given the error message you are receiving, it would appear that Python is using the ASCII codec, which only covers the first 128 values; 0xc2 is indeed outside that range.
