Read unicode encoded JSON and access it using Python

I have a very large Unicode JSON file in the following format:
{u'completed': True, u'entries': [{u'absolute_time':
u'2017-05-17T10:41:52Z', u'command': None, u'level':
u'NORMAL',......
It has JSON objects nested within JSON objects. I am unable to read and parse it due to the encoding. I tried the following code.
Could someone please tell me how to parse it and convert it to a normal JSON object?
import json
with open(r"inp.json", 'r') as jsonData:
    jsonToPython = json.load(jsonData)  # gives error here itself
    # jsonData = ast.literal_eval(jsonData)
print(json.dumps(jsonToPython))
# print(jsonToPython)

The file holds the repr of a Python object rather than JSON (note the u'' prefixes, True and None), so json.load cannot parse it. You can try to load the (stringified) Python object using ast:
>>> import ast
>>> #obj = open(r"inp.json", 'r').read()
>>> obj = "{u'completed': True, u'entries': [{u'absolute_time': u'2017-05-17T10:41:52Z'}]}"
>>> ast.literal_eval(obj)
{'completed': True, 'entries': [{'absolute_time': '2017-05-17T10:41:52Z'}]}
>>>
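To get from that Python-literal text to real JSON, the two steps can be chained. The following is a minimal sketch for Python 3, assuming the file content is a valid Python literal; `raw_text` is a made-up sample standing in for the contents of inp.json, and `python_repr_to_json` is a hypothetical helper name.

```python
import ast
import json

# Hypothetical sample standing in for the text read from inp.json
raw_text = (
    "{u'completed': True, u'entries': [{u'absolute_time': "
    "u'2017-05-17T10:41:52Z', u'command': None, u'level': u'NORMAL'}]}"
)

def python_repr_to_json(text):
    """Parse a Python-literal string and return (object, JSON string)."""
    obj = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    return obj, json.dumps(obj)   # True/None become true/null in the JSON text

parsed, as_json = python_repr_to_json(raw_text)
print(parsed["entries"][0]["level"])
print(as_json)
```

The resulting `as_json` string can then be written to a file and read back with json.load as usual.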

Related

Python JSON encoding invalid json format

I'm stuck. I have JSON with symbols like ' in the values, and the syntax mixes ' and ".
Example mixing double quotes and single quotes:
json ={
'key': "val_'_ue",
'secondkey': 'value'
}
With json.loads and json.dumps I get a str, not a dict I can iterate over. Any ideas how I can fix this?
print(postParams)  # = {'csrf-token': "TOKEN_INCLUDES_'_'_symbols", 'param2': 'params2value'}
jsn_dict2 = json.loads(json.dumps(postParams))
print(type(jsn_dict2))  # ERROR HERE why str and not dict
for key, val in jsn_dict2.items():
    print("key="+str(key))

You don't need to dumps() data that is already a JSON string:
jsn_dict = json.loads(json.dumps(res))
should be:
jsn_dict = json.loads(res)
UPDATE
According to the comments, the data looks like this:
postParams = "{'csrf-token': \"TOKEN_INCLUDES_'_'_symbols\", 'add-to-your-blog-submit-button': 'add-to-your-blog-submit-button'}"
So I found a library that can help with a damaged JSON string like this one.
First run:
pip install demjson
Then this code can help you:
>>> from demjson import decode
>>> data = decode(postParams)
>>> data
{'csrf-token': "TOKEN_INCLUDES_'_'_symbols",
 'add-to-your-blog-submit-button': 'add-to-your-blog-submit-button'}
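As a rough illustration of why the dumps/loads round trip returns a str, and of a standard-library alternative to demjson, here is a Python 3 sketch. It assumes the incoming text happens to be a valid Python dict literal (which is true for the sample from the question, but not for damaged JSON in general); the variable names are illustrative.

```python
import ast
import json

post_params = "{'csrf-token': \"TOKEN_INCLUDES_'_'_symbols\", 'param2': 'params2value'}"

# dumps() wraps the str in JSON quotes, and loads() simply unwraps it again,
# so the round trip hands back the same str rather than a dict
round_tripped = json.loads(json.dumps(post_params))
print(type(round_tripped))

# The text is a Python dict literal rather than JSON, so literal_eval can parse it
params = ast.literal_eval(post_params)
for key, val in params.items():
    print("key=" + key, "val=" + val)
```

If the input is not guaranteed to be a Python literal, ast.literal_eval raises an error (typically ValueError or SyntaxError), and a lenient parser such as the one suggested above may still be needed.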

In your JSON you have missed the "," (comma) separating the two keys. The correct structure is:
json_new ={
'key': "val_'_ue",
'secondkey': 'value'
}
To serialize it, use
json_actual=json.dumps(json_new)
and to read it back,
json_read=json.loads(json_actual)

how to dynamically create array variable in json

In Python, I can use the code below to dynamically fill an array with n IP addresses.
for i in range(12):
    addr[i] = "10.1.1." + str(i)
    print addr[i]
I want similar code for a JSON file. Is it possible to define variables with a loop in a JSON file?

You don't "loop over a json file" - this makes no sense actually. You create a json string representation of a Python object - usually a dict - and eventually write it to a file (or send it as HTTP response content etc), or create a Python object from a json string:
>>> import json
>>> addr = ["10.1.1.{}".format(i) for i in range(12)]
>>> print(repr(json.dumps(addr)))
'["10.1.1.0", "10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4", "10.1.1.5", "10.1.1.6", "10.1.1.7", "10.1.1.8", "10.1.1.9", "10.1.1.10", "10.1.1.11"]'
>>> print(repr(json.dumps({"addr": addr})))
'{"addr": ["10.1.1.0", "10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4", "10.1.1.5", "10.1.1.6", "10.1.1.7", "10.1.1.8", "10.1.1.9", "10.1.1.10", "10.1.1.11"]}'
If you have an existing json string (from a file or any other source) and want to update it, you just parse it (with json.loads) to a Python object, update the python object, and dump it back to json. Assuming you have the following json string (either from a file, http response or whatever, doesn't matter):
>>> source = '{"foo": "bar", "addr": ["10.1.1.0", "10.1.1.1", "10.1.1.2"]}'
and you want to change "foo"'s value to "baaz", add addresses up to 10.1.1.11 and add a "frobnicate: true" key=>value pair:
>>> import json
>>> data = json.loads(source)
>>> data["foo"] = "baaz"
>>> for i in range(12):
...     ip = "10.1.1.{}".format(i)
...     if ip not in data["addr"]:
...         data["addr"].append(ip)
...
>>> data["frobnicate"] = True
>>> updated_json = json.dumps(data)
>>> print(repr(updated_json))
'{"foo": "baaz", "addr": ["10.1.1.0", "10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4", "10.1.1.5", "10.1.1.6", "10.1.1.7", "10.1.1.8", "10.1.1.9", "10.1.1.10", "10.1.1.11"], "frobnicate": true}'
>>>
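The "write it to a file" step mentioned above might look like the following sketch. The file name addresses.json and the temporary directory are assumptions made only so the example is self-contained.

```python
import json
import os
import tempfile

# Build the list in Python; JSON itself has no loops or variables
config = {"addr": ["10.1.1.{}".format(i) for i in range(12)]}

# Hypothetical file name inside a throwaway temporary directory
path = os.path.join(tempfile.mkdtemp(), "addresses.json")

with open(path, "w") as fh:
    json.dump(config, fh, indent=2)  # serialize straight into the file

with open(path) as fh:
    loaded = json.load(fh)           # parse the file back into a dict

print(loaded["addr"][-1])
```

json.dump and json.load are the file-object counterparts of json.dumps and json.loads, so no intermediate string is needed.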

converting json string to dict in python

I am getting a value in request.body that looks like this:
a = '[data={"vehicle":"rti","action_time":"2015-04-21 14:18"}]'
type(a) == str
I want to convert this str to a dict. I tried this:
b=json.loads(a)
But I get this error:
ValueError: No JSON object could be decoded

The data you are receiving is not properly formatted JSON. You're going to have to do some parsing or data transformation before you can convert it using the json module.
If you know that the data always begins with the literal string '[data=' and always ends with the literal string ']', and that the rest of the data is valid json, you can simply strip off the problematic characters:
b = json.loads(a[6:-1])
If the data can't be guaranteed to be in precisely that format, you'll have to learn what the actual format is, and do more intelligent parsing.
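A slightly more defensive variant of the slicing approach checks the wrapper before stripping it, so unexpected input fails loudly instead of producing a confusing JSON error. This is only a sketch under the same assumption about the '[data=' prefix and ']' suffix; `parse_wrapped_json` is a hypothetical helper name.

```python
import json

PREFIX, SUFFIX = "[data=", "]"

def parse_wrapped_json(body):
    """Strip the assumed '[data=' ... ']' wrapper and parse the JSON inside."""
    if not (body.startswith(PREFIX) and body.endswith(SUFFIX)):
        raise ValueError("unexpected wrapper: {!r}".format(body[:20]))
    return json.loads(body[len(PREFIX):-len(SUFFIX)])

a = '[data={"vehicle":"rti","action_time":"2015-04-21 14:18"}]'
b = parse_wrapped_json(a)
print(b["vehicle"])
```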

It is not a valid json format that you are receiving.
A valid format is of type:
'{"data":{"vehicle":"rti","action_time":"2015-04-21 14:18"}}'

import json
a = '[data={"vehicle":"rti","action_time":"2015-04-21 14:18"}]'
r = a.split("=")
r[:] = r[0].replace("[", ""), r[1].replace("]", "")
d = '{"%s":%s}'%(r[0],r[1])
dp = json.loads(d)
print dp

Error in parsing .json file using Python

I am getting the following error. What does it mean?
AttributeError: 'bool' object has no attribute 'decode'
It is raised on this line: writer.writerow({k:v.decode('utf8') for k,v in dictionary.iteritems()})
My code looks like this:
import json
import csv
def make_csv(data):
    fname = "try.csv"
    with open(fname,'wb') as outf:
        dic_list = data['bookmarks']
        dictionary = dic_list[0]
        writer = csv.DictWriter(outf,fieldnames = sorted(dictionary.keys()), restval = "None", extrasaction = 'ignore')
        writer.writeheader()
        for dictionary in dic_list:
            writer.writerow({k:v.decode('utf8') for k,v in dictionary.iteritems()})
    return
def main():
    fil = "readability.json"
    f = open(fil,'rb')
    data = json.loads(f.read())
    print type(data)
    make_csv(data)
The json file looks like :
{ "bookmarks" : [{..},{..} ..... {..}],
"recommendations" : [{..},{..}...{..}]
}
where [..] = list and {..} = dictionary
EDIT :
The above problem was solved, but when I ran the code, the generated CSV file had some discrepancies: some values ended up in the wrong place, i.e. under different headers in the .csv file. Any suggestions?

Somewhere in your readability.json file you have an entry that's a boolean value, like true or false (in JSON), translated to the Python True and False objects.
You should not be using decode() in the first place, however, as json.loads() already produces Unicode values for strings.
Since this is Python 2, you want to encode your data to UTF-8 instead. Convert your objects to unicode first:
writer.writerow({
    k: unicode(v).encode('utf8')
    for k, v in dictionary.iteritems()
})
Converting existing Unicode strings to unicode is a no-op, but for integers, floating point values, None and boolean values you'll get a nice Unicode representation that can be encoded to UTF-8:
>>> unicode(True).encode('utf8')
'True'
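For comparison, under Python 3 the csv module works with text directly and converts non-string values with str(), so neither decode() nor encode() is needed. The sketch below assumes a made-up two-entry "bookmarks" sample in the shape described in the question and writes to an in-memory buffer rather than try.csv.

```python
import csv
import io
import json

# Hypothetical sample in the shape described in the question
data = json.loads(
    '{"bookmarks": [{"title": "caf\\u00e9", "favorite": true, "id": 1},'
    ' {"title": "b", "favorite": false, "id": 2}]}'
)

rows = data["bookmarks"]
# Stands in for open("try.csv", "w", newline="", encoding="utf-8")
out = io.StringIO(newline="")
writer = csv.DictWriter(out, fieldnames=sorted(rows[0]),
                        restval="None", extrasaction="ignore")
writer.writeheader()
for row in rows:
    writer.writerow(row)  # booleans and numbers are converted with str()

print(out.getvalue())
```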

Python: json.loads returns items prefixing with 'u'

I'll be receiving a JSON encoded string from Objective-C, and I am decoding a dummy string (for now) like the code below. My output comes out with character 'u' prefixing each item:
[{u'i': u'imap.gmail.com', u'p': u'aaaa'}, {u'i': u'333imap.com', u'p': u'bbbb'}...
How is JSON adding this Unicode character? What's the best way to remove it?
mail_accounts = []
da = {}
try:
    s = '[{"i":"imap.gmail.com","p":"aaaa"},{"i":"imap.aol.com","p":"bbbb"},{"i":"333imap.com","p":"ccccc"},{"i":"444ap.gmail.com","p":"ddddd"},{"i":"555imap.gmail.com","p":"eee"}]'
    jdata = json.loads(s)
    for d in jdata:
        for key, value in d.iteritems():
            if key not in da:
                da[key] = value
            else:
                da = {}
                da[key] = value
        mail_accounts.append(da)
except Exception, err:
    sys.stderr.write('Exception Error: %s' % str(err))
print mail_accounts

The u- prefix just means that you have a Unicode string. When you really use the string, it won't appear in your data. Don't be thrown by the printed output.
For example, try this:
print mail_accounts[0]["i"]
You won't see a u.

Everything is cool, man. The 'u' is a good thing, it indicates that the string is of type Unicode in python 2.x.
http://docs.python.org/2/howto/unicode.html#the-unicode-type

The d3 print below is the one you are looking for (which is the combination of dumps and loads) :)
Having:
import json
d = """{"Aa": 1, "BB": "blabla", "cc": "False"}"""
d1 = json.loads(d) # Produces a dictionary out of the given string
d2 = json.dumps(d) # Produces a string out of a given dict or string
d3 = json.dumps(json.loads(d)) # 'dumps' gets the dict from 'loads' this time
print "d1: " + str(d1)
print "d2: " + d2
print "d3: " + d3
Prints:
d1: {u'Aa': 1, u'cc': u'False', u'BB': u'blabla'}
d2: "{\"Aa\": 1, \"BB\": \"blabla\", \"cc\": \"False\"}"
d3: {"Aa": 1, "cc": "False", "BB": "blabla"}

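Those 'u' characters prepended to the strings signify that they are Unicode strings.
If you want to remove those 'u' characters from your object, you can do this:
import json, ast
jdata = ast.literal_eval(json.dumps(jdata)) # Removing uni-code chars
Note that this only works when the data contains no booleans or None, because json.dumps writes them as true, false and null, which ast.literal_eval cannot parse.
Let's check it in the Python shell: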
>>> import json, ast
>>> jdata = [{u'i': u'imap.gmail.com', u'p': u'aaaa'}, {u'i': u'333imap.com', u'p': u'bbbb'}]
>>> jdata = ast.literal_eval(json.dumps(jdata))
>>> jdata
[{'i': 'imap.gmail.com', 'p': 'aaaa'}, {'i': '333imap.com', 'p': 'bbbb'}]

Unicode is an appropriate type here. The JSONDecoder documentation describes the conversion table and states that JSON string objects are decoded into Unicode objects.
From 18.2.2. Encoders and Decoders:
JSON            Python
==================================
object          dict
array           list
string          unicode
number (int)    int, long
number (real)   float
true            True
false           False
null            None
"encoding determines the encoding used to interpret any str objects decoded by this instance (UTF-8 by default)."

The u prefix means that those strings are unicode rather than 8-bit strings. The best way to not show the u prefix is to switch to Python 3, where strings are unicode by default. If that's not an option, the str constructor will convert from unicode to 8-bit, so simply loop recursively over the result and convert unicode to str. However, it is probably best just to leave the strings as unicode.
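To see what the Python 3 route looks like in practice, the following sketch decodes a made-up JSON document and prints the Python type of each value. Under Python 3 there is no separate unicode or long type, so strings come back as str and integers as int.

```python
import json

# Hypothetical document with one value of each JSON type
sample = json.loads(
    '{"obj": {}, "arr": [], "s": "text", "i": 3, "r": 1.5,'
    ' "t": true, "f": false, "n": null}'
)

for key, value in sample.items():
    print(key, type(value).__name__)

# Printing the container shows no u prefix under Python 3
print(sample["s"], repr(sample["s"]))
```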

I kept running into this problem when trying to capture JSON data in the log with the Python logging library, for debugging and troubleshooting purposes. Getting the u character is a real nuisance when you want to copy the text and paste it into your code somewhere.
As everyone will tell you, this is because it is a Unicode representation, and it could come from the fact that you’ve used json.loads() to load in the data from a string in the first place.
If you want the JSON representation in the log, without the u prefix, the trick is to use json.dumps() before logging it out. For example:
import json
import logging
# Prepare the data
json_data = json.loads('{"key": "value"}')
# Log normally and get the Unicode indicator
logging.warning('data: {}'.format(json_data))
# Output: WARNING:root:data: {u'key': u'value'}
# Dump to a string before logging and get clean output!
logging.warning('data: {}'.format(json.dumps(json_data)))
# Output: WARNING:root:data: {"key": "value"}
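The same idea can be checked end to end by capturing the log output in memory. This is a sketch for Python 3; the logger name "json_demo" and the sample payload are assumptions, and sort_keys=True is used only to make the logged text stable.

```python
import io
import json
import logging

stream = io.StringIO()
logger = logging.getLogger("json_demo")           # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))  # write records into the buffer
logger.setLevel(logging.INFO)
logger.propagate = False                          # keep the root logger out of it

payload = json.loads('{"key": "value", "flag": true}')
logger.info("data: %s", json.dumps(payload, sort_keys=True))

logged = stream.getvalue()
print(logged)
```

Because the logged text is valid JSON, it can be pasted into other tools or parsed again with json.loads.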

Try this:
mail_accounts[0]["i"].encode("ascii")  # encode an individual string value; the dict itself has no encode()

Just replace the u' with a single quote...
print(str(mail_accounts).replace("u'", "'"))
