Load .json into Python; UnicodeDecodeError

I am trying to load a JSON file into Python with no success. I have been googling for a solution for the past few hours and just cannot get it to load. I have tried the same json.load('filename') call that seems to have worked for everyone else. I keep getting:
"UnicodeDecodeError: 'utf8' codec can't decode byte 0xc2 in position 124: invalid continuation byte"
Here is the code I am using:
import json
json_data = open('myfile.json')
for line in json_data:
    data = json.loads(line)  # <-- I get the error at this line.
Here is a sample line from my file:
{"topic":"security","question":"Putting the Biba-LaPadula Mandatory Access Control Methods to Practise?","excerpt":"Text books on database systems always refer to the two Mandatory Access Control models; Biba for the Integrity objective and Bell-LaPadula for the Secrecy or Confidentiality objective.\n\nText books ...\r\n "}
What is my error if this seems to have worked for everyone in every example I have googled?

Have you tried:
json.loads(line.decode("utf-8"))
Similar question asked here: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2
Edit:
If the above does not work, this will, although it silently drops any bytes that are not valid UTF-8:
json.loads(line.decode("utf-8", "ignore"))
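On Python 3, str objects have no decode method, so the calls above apply to byte strings only. Here is a minimal sketch of the same idea for Python 3, assuming the file is meant to be UTF-8 with a few stray bytes. parse_json_line is a hypothetical helper name, and errors="replace" is swapped in for "ignore" so that the damage stays visible in the result:

```python
import json


def parse_json_line(raw):
    """Decode one line of bytes as UTF-8 and parse it as JSON.

    Hypothetical helper: invalid bytes become U+FFFD instead of raising.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to a lossy decode so one bad byte does not abort the load.
        text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


good = parse_json_line(b'{"topic": "caf\xc3\xa9"}')
bad = parse_json_line(b'{"topic": "x\xc2y"}')
```

To use it on a file, open the file in binary mode, for example open('myfile.json', 'rb'), and pass each line to the helper.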

Related

Python3 now throwing ASCII error loading data from JSON file

I am relatively new to Python and have spent at least two hours searching both the internet and now Stack Overflow without finding out what the problem is here. My code was working in Python 2, but now I persistently get the error message below. I even found this code online as an apparent answer to my question, claiming it worked in Python 3, but it does not work for me. Odd.
import json, sys
with open('2689364.json', 'r', encoding='utf-8') as json_data:
    d = json.load(json_data)
    print(d)
UnicodeEncodeError: 'ascii' codec can't encode character '\xb6' in position 1938: ordinal not in range(128)
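Note that this is an encode error, so it is raised by print(d) while writing to an ASCII-only stdout, not by json.load. Here is a minimal sketch of one workaround, assuming the goal is just to inspect the data: json.dumps escapes non-ASCII characters by default (ensure_ascii=True), so its output can be printed on any terminal. The sample dictionary is made up for illustration.

```python
import json

# Made-up sample standing in for the loaded data; '\xb6' is the character
# named in the error message.
d = {"text": "section \xb6 1"}

# ensure_ascii=True is the default: non-ASCII characters become \uXXXX escapes.
ascii_safe = json.dumps(d, ensure_ascii=True)
print(ascii_safe)
```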

Encoding error while trying to read Hindi text from csv file using pandas

I am trying to read Devanagari text from a CSV file using pandas. I get an error when using encoding="utf-8". When I changed it to encoding="latin1", I got NaN values.
Please help if someone has already encountered a similar problem or knows how to solve this.
Thanks in advance.
Here is the error I am getting:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 31: invalid start byte

I am trying to get HTML data from a site using urllib, but for some sites I end up with unknown characters in Python

Hey guys, I am trying to get HTML data from a site using urllib.urlopen(url).read(), but for some sites all I am getting is data like this:
6\xbdW\xb6\xd6\xff\xca\x9d\x9bO|\xc0\x96a\xc7\xc8\xf7\xa7\x10-\x8aM{\xf8\x
I have no clue what it is or why I am getting it. I tried googling it; some said it is an encoding/decoding problem. I tried that as well, but as you can see, no luck there, so please guide me in this darkness. Here is my code:
import urllib
url = "http://mangafox.me/manga/online_the_comic/c001/1.html"  # for this site and some others it is not working
page = urllib.urlopen(url).read()
print page
And you guys know what happens when this is printed.
This page is in gzip format; you have to decompress it before reading the data:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)
The 0x8b byte at the beginning of the data means gzip format (gzip streams start with the bytes 0x1f 0x8b).
You should take a look at this question:
twitter trends api UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: unexpected code byte
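The check described above can be sketched as follows, assuming Python 3 and that the response body is already in memory as bytes. maybe_gunzip is a hypothetical helper name; it only looks at the two gzip magic bytes and does not inspect HTTP headers.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def maybe_gunzip(body):
    """Return body decompressed if it looks like gzip, else unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


# Simulate a gzip-compressed response body.
compressed = gzip.compress(b"<html>hello</html>")
page = maybe_gunzip(compressed)
```

After this step the bytes can be decoded with the page's actual character encoding.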

python pyinstaller UnicodeDecodeError cp949

I get a UnicodeDecodeError when I try to install PyInstaller.
The error message reads:
UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2 in position 208687: illegal multibyte sequence
When I google this error, it looks like a problem with the codec used to read the file.
I tried some of the solutions found online, but they didn't work.
How can I fix this?
I think your code has a function that prints some data using a codec that the Windows shell cannot display. Remove those calls and try again. (I cannot comment because I do not have enough rep, so I wrote it here.)

Unicode Error in Django while loading in data

So I'm trying to load this line in as a name for a model:
"Auf der grünen Wiese (1953)"
but I get the error
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc in position 70: invalid start byte
I'm looking at: http://docs.python.org/2/howto/unicode.html#the-unicode-type
but I'm still not exactly sure how to fix this problem. I can convert it to unicode with the option to replace or ignore errors, but I don't think that is the ideal solution.
I also see that Django provides a few functions to help with this: https://docs.djangoproject.com/en/dev/ref/unicode/ but I'm still not quite sure how to approach it.
The line is encoded using latin1. To properly decode it, you should do (assuming Python 2.x):
line = 'Auf der gr\xfcnen Wiese (1953)'
name = line.decode('latin1')
If you are reading this from a file, you can also do:
import codecs
f = codecs.open(path, 'r', 'latin1')  # yields unicode strings decoded from latin1
name = f.readline().strip()
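On Python 3 the same decoding works on a bytes object, and the built-in open accepts an encoding argument, so codecs.open is not needed. Here is a minimal sketch, assuming the raw bytes are as shown in the answer:

```python
raw = b"Auf der gr\xfcnen Wiese (1953)"

# 0xfc is not a valid UTF-8 start byte, so a strict UTF-8 decode fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In Latin-1, 0xfc maps directly to U+00FC (u with diaeresis).
name = raw.decode("latin1")
```

When reading from a file, open(path, encoding='latin1') yields lines that are already decoded.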
