Related
Take a look at this example dictionary:
This is in a JSON file:
{
"fridge": {
"vegetables": {
"Cucumber": 0,
"Carrot": 2,
"Lettuce": 5
},
"drinks": {
"Water": 12,
"Juice": 4,
"Soda": 2
}
}
}
In this example we are showing the contents of my fridge: the AMOUNT (2) of every ITEM (Soda) in every CATEGORY (drinks). To do this, we first created a dictionary for the fridge, and inside it one dictionary per category that maps each item type to its amount.
Now let's say we went shopping... We bought some FRUITS at the supermarket and we got:
"fruits": {
"Apple": 3,
"Banana": 2,
"Melon": 1
}
We want to put this fruit data into my fridge, except there is no CATEGORY for "fruits" yet! So not only do we have to add a new dictionary inside the fridge, we already have the data we want to put into it as well!
Now, the fridge is just an example to help you understand what I want. How do you insert a new dictionary (one that already contains key-value pairs) into an existing one? In other words, I want my fridge to look like this:
{
"fridge": {
"vegetables": {
"Cucumber": 0,
"Carrot": 2,
"Lettuce": 5
},
"drinks": {
"Water": 12,
"Juice": 4,
"Soda": 2
},
"fruits": {
"Apple": 3,
"Banana": 2,
"Melon": 1
}
}
}
I tried append(), but as expected it does not work for dictionaries (it is for lists only), so I do not know what to do... Keep in mind that I do not want to re-define my data; I want to ADD data to the existing data so that I can edit it later. Would appreciate some help, thanks!
Suppose you have this string in proper json format:
j_str1='''\
{"fridge": {
"vegetables": {
"Cucumber": 0,
"Carrot": 2,
"Lettuce": 5
},
"drinks": {
"Water": 12,
"Juice": 4,
"Soda": 2
}
}}'''
And:
j_str2='''\
{"fruits": {
"Apple": 3,
"Banana": 2,
"Melon": 1
}}
'''
First convert to a Python dict:
import json
j=json.loads(j_str1)
>>> j
{'fridge': {'vegetables': {'Cucumber': 0, 'Carrot': 2, 'Lettuce': 5}, 'drinks': {'Water': 12, 'Juice': 4, 'Soda': 2}}}
Then update:
j["fridge"].update(json.loads(j_str2))
>>> j
{'fridge': {'vegetables': {'Cucumber': 0, 'Carrot': 2, 'Lettuce': 5}, 'drinks': {'Water': 12, 'Juice': 4, 'Soda': 2}, 'fruits': {'Apple': 3, 'Banana': 2, 'Melon': 1}}}
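Note that dict.update() merges at the top level and overwrites any key that already exists, so calling it again with a different "fruits" dict would replace the previous one; for example (a hypothetical follow-up call):
j["fridge"].update({"fruits": {"Apple": 5}})  # replaces the whole existing "fruits" dict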
Then convert back into json:
>>> print(json.dumps(j,indent=3))
{
"fridge": {
"vegetables": {
"Cucumber": 0,
"Carrot": 2,
"Lettuce": 5
},
"drinks": {
"Water": 12,
"Juice": 4,
"Soda": 2
},
"fruits": {
"Apple": 3,
"Banana": 2,
"Melon": 1
}
}
}
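Since the original data lives in a JSON file, here is a minimal sketch of doing the same merge file-to-file; the file names fridge.json and fridge_updated.json are only examples, not taken from the question:
import json

# Load the existing fridge data (file names are just examples)
with open("fridge.json") as f:
    fridge = json.load(f)

# Add the new category inside the nested "fridge" dict
fridge["fridge"]["fruits"] = {"Apple": 3, "Banana": 2, "Melon": 1}

# Write the updated structure back out
with open("fridge_updated.json", "w") as f:
    json.dump(fridge, f, indent=3)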
I have JSON like this:
json_text = """{
"b": 22,
"x": 12,
"a": 2,
"c": 4
}"""
When I generate an Excel file from this JSON like this:
import pandas as pd
df = pd.read_json(json_text)
file_name = 'test.xls'
file_path = "/tmp/" + file_name
df.to_excel(file_path, index=False)
print("path to excel " + file_path)
Pandas does its own ordering in the Excel file like this:
pandas_json = {
"a": 2,
"b": 22,
"c": 4,
"x": 12
}
I don't want this. I need the ordering that exists in the JSON. Please give me some advice on how to do this.
UPDATE:
If I have JSON like this:
json = [
{"b": 22, "x":12, "a": 2, "c": 4},
{"b": 22, "x":12, "a": 2, "c": 2},
{"b": 22, "x":12, "a": 4, "c": 4},
]
pandas will generate its own ordering like this:
pandas_json = [
{"a": 2, "b":22, "c": 4, "x": 12},
{"a": 2, "b":22, "c": 2, "x": 12},
{"a": 4, "b":22, "c": 4, "x": 12},
]
How can I make pandas preserve my own ordering?
You can read the JSON into an OrderedDict, which will retain the original order:
import json
from collections import OrderedDict
import pandas as pd
json_ = """{
"b": 22,
"x": 12,
"a": 2,
"c": 4
}"""
data = json.loads(json_, object_pairs_hook=OrderedDict)
pd.DataFrame.from_dict(data,orient='index')
0
b 22
x 12
a 2
c 4
Edit: the updated JSON from the question also works:
j="""[{"b": 22, "x":12, "a": 2, "c": 4},
{"b": 22, "x":12, "a": 2, "c": 2},{"b": 22, "x":12, "a": 4, "c": 4}]"""
data = json.loads(j, object_pairs_hook=OrderedDict)
pd.DataFrame.from_dict(data).to_json(orient='records')
'[{"b":22,"x":12,"a":2,"c":4},{"b":22,"x":12,"a":2,"c":2},
{"b":22,"x":12,"a":4,"c":4}]'
I have JSON output from the Linux fio command, as shown below, that I'd like to parse for values like a dictionary, extracting certain values from certain keys. But the nesting means each top-level key comes back with a huge blob as its "value". Any tips on how I can better parse these nested data structures?
{
"disk_util": [
{
"aggr_util": 96.278308,
"in_queue": 247376,
"write_ticks": 185440,
"read_ticks": 61924,
"write_merges": 0,
"read_merges": 0,
"write_ios": 240866,
"read_ios": 18257,
"name": "dm-0",
"util": 97.257058,
"aggr_read_ios": 18465,
"aggr_write_ios": 243642,
"aggr_read_merges": 1,
"aggr_write_merge": 72,
"aggr_read_ticks": 62420,
"aggr_write_ticks": 185796,
"aggr_in_queue": 245504
},
{
"util": 96.278308,
"name": "sda",
"read_ios": 18465,
"write_ios": 243642,
"read_merges": 1,
"write_merges": 72,
"read_ticks": 62420,
"write_ticks": 185796,
"in_queue": 245504
}
],
"jobs": [
{
"latency_window": 0,
"latency_percentile": 100,
"latency_target": 0,
"latency_depth": 64,
"latency_ms": {
">=2000": 0,
"2000": 0,
"1000": 0,
"750": 0,
"2": 0,
"4": 0,
"10": 0,
"20": 0,
"50": 0,
"100": 0,
"250": 0,
"500": 0
},
"latency_us": {
"1000": 0,
"750": 0,
"2": 0,
"4": 0,
"10": 0,
"20": 0,
"50": 0,
"100": 0,
"250": 0,
"500": 0
},
"write": {
"iops_samples": 35,
"iops_stddev": 1608.115728,
"iops_mean": 13835.571429,
"iops_max": 16612,
"iops_min": 9754,
"bw_samples": 35,
"drop_ios": 0,
"short_ios": 0,
"total_ios": 243678,
"runtime": 17611,
"iops": 13836.692976,
"bw": 55346,
"io_kbytes": 974712,
"io_bytes": 998105088,
"slat_ns": {
"stddev": 0,
"mean": 0,
"max": 0,
"min": 0
},
"clat_ns": {
"percentile": {
"0.00": 0
},
"stddev": 0,
"mean": 0,
"max": 0,
"min": 0
},
"lat_ns": {
"stddev": 0,
"mean": 0,
"max": 0,
"min": 0
},
"bw_min": 39016,
"bw_max": 66448,
"bw_agg": 99.994218,
"bw_mean": 55342.8,
"bw_dev": 6432.427333
},
"read": {
"iops_samples": 35,
"iops_stddev": 126.732776,
"iops_mean": 1048.257143,
"iops_max": 1336,
"iops_min": 772,
"bw_samples": 35,
"drop_ios": 0,
"short_ios": 0,
"total_ios": 18466,
"runtime": 17611,
"iops": 1048.549202,
"bw": 4194,
"io_kbytes": 73864,
"io_bytes": 75636736,
"slat_ns": {
"stddev": 0,
"mean": 0,
"max": 0,
"min": 0
},
"clat_ns": {
"percentile": {
"0.00": 0
},
"stddev": 0,
"mean": 0,
"max": 0,
"min": 0
},
"lat_ns": {
"stddev": 0,
"mean": 0,
"max": 0,
"min": 0
},
"bw_min": 3088,
"bw_max": 5344,
"bw_agg": 99.993188,
"bw_mean": 4193.714286,
"bw_dev": 506.844597
},
"job options": {
"rwmixread": "7",
"rw": "randrw",
"size": "1G",
"iodepth": "64",
"bs": "4k",
"filename": "test",
"name": "test"
},
"elapsed": 18,
"eta": 0,
"error": 0,
"groupid": 0,
"jobname": "test",
"trim": {
"iops_samples": 0,
"iops_stddev": 0,
"iops_mean": 0,
"iops_max": 0,
"iops_min": 0,
"bw_samples": 0,
"drop_ios": 0,
"short_ios": 0,
"total_ios": 0,
"runtime": 0,
"iops": 0,
"bw": 0,
"io_kbytes": 0,
"io_bytes": 0,
"slat_ns": {
"stddev": 0,
"mean": 0,
"max": 0,
"min": 0
},
"clat_ns": {
"percentile": {
"0.00": 0
},
"stddev": 0,
"mean": 0,
"max": 0,
"min": 0
},
"lat_ns": {
"stddev": 0,
"mean": 0,
"max": 0,
"min": 0
},
"bw_min": 0,
"bw_max": 0,
"bw_agg": 0,
"bw_mean": 0,
"bw_dev": 0
},
"usr_cpu": 11.447391,
"sys_cpu": 74.680597,
"ctx": 28972,
"majf": 0,
"minf": 31,
"iodepth_level": {
">=64": 99.975967,
"32": 0.1,
"16": 0.1,
"8": 0.1,
"4": 0.1,
"2": 0.1,
"1": 0.1
},
"latency_ns": {
"1000": 0,
"750": 0,
"2": 0,
"4": 0,
"10": 0,
"20": 0,
"50": 0,
"100": 0,
"250": 0,
"500": 0
}
}
],
"global options": {
"gtod_reduce": "1",
"direct": "1",
"ioengine": "libaio",
"randrepeat": "1"
},
"time": "Sat Oct 14 23:18:28 2017",
"timestamp_ms": 1508023108010,
"timestamp": 1508023108,
"fio version": "fio-3.1"
}
I'm importing it from a file really simplistically:
import json
my_file = open('fio.json', 'r')
my_dict = json.load(my_file)
for k, v in my_dict.items():
print("Key: {0}, value: {1}").format(k, v)
But when iterating, all the nested lists and dicts come out munged into one big value, like
Key: disk_util, value: [{u'aggr_write_ticks': 185796, u'write_merges': 0, u'write_ticks': 185440, u'write_ios': 240866, u'aggr_write_ios': 243642, u'aggr_read_ticks': 62420, u'read_ios': 18257, u'util': 97.257058, u'read_ticks': 61924, u'aggr_write_merge': 72, u'read_merges': 0, u'aggr_in_queue': 245504, u'aggr_read_ios': 18465, u'aggr_util': 96.278308, u'aggr_read_merges': 1, u'in_queue': 247376, u'name': u'dm-0'}, {u'read_merges': 1, u'name': u'sda', u'write_ios': 243642, u'read_ios': 18465, u'util': 96.278308, u'read_ticks': 62420, u'write_merges': 72, u'in_queue': 245504, u'write_ticks': 185796}]
json.load() preserves the JSON structure as nested Python objects (dicts and lists), so printing a top-level value prints the whole nested structure.
You also seem to have an error: the closing parenthesis is in the wrong position; .format() should be inside the print() call.
import json

my_file = open('fio.json', 'r')
my_dict = json.load(my_file)

for key in my_dict:
    print("Key: {0}, value: {1}".format(key, my_dict[key]))
I have a JSON message where, at the highest level, I have a dictionary of unknown depth and structure, and I am looking to traverse it and format it, ending up with a new, formatted dictionary. After using timeit, I found it to be very slow and discovered that recursion in Python is not very quick at all. With that understood, I don't know how to actually transform my recursive function "Foo.format_it" into a loop-based one, if that is possible.
import time
import json
class Foo(object):
def __init__(self):
self.msg_out = {}
self.msg_in = None
self.sample_data = """
{
"data": {
"a": "",
"b": "",
"c": "127.0.0.1",
"d": 80,
"e": {"f": false,"g": false,"h": false,"i": false,"j": false,"k": false},
"l": [ {"ii": 2, "hh": 10, "gg": 200, "aa": -1, "bb": -1, "ff":-1, "cc": -1, "dd": 3, "ee": 0},
{"ii": 5, "hh": 20, "gg": 300, "aa": -1, "bb": -1, "ff":-1, "cc": -1, "dd": -1, "ee": -1},
{"ii": 5, "hh": 30, "gg": -400, "aa": -1, "bb": -1, "ff":-1, "cc": -1, "dd": -1, "ee": -1}],
"m": true,
"n": true,
"o": 1000,
"p": 2000,
"q": "",
"r": 5,
"s": 0,
"t": true,
"u": true,
"v": {"jj": 5, "kk": 0, "ll": 10, "mm": 9, "nn": [ { "aa": 20, "bb": 30 }, { "aa": 20, "bb": 30 } ] }
}
}
"""
def format(self, msg_in):
print msg_in
self.msg_in = json.loads( msg_in )
self.msg_out = {}
self.format_it(self.msg_in, self.msg_out)
import pprint
print pprint.pformat(self.msg_out)
return json.dumps( self.msg_out )
def ff(self, val, out_struct):
if int(val) < 0:
out_struct[u'ff'] = ""
else:
out_struct[u'ff'] = str(val)
def format_it(self, item, out_struct):
if isinstance(item, dict):
for dict_key, dict_val in item.iteritems():
if dict_key in dir(self):
dict_key = getattr(self, dict_key)(dict_val, out_struct)
if dict_key:
if isinstance(dict_val, dict):
out_struct[dict_key] = {}
self.format_it(dict_val, out_struct[dict_key])
elif isinstance(dict_val, list):
out_struct[dict_key] = []
self.format_it(dict_val, out_struct[dict_key])
else:
out_struct[dict_key] = dict_val
elif isinstance(item, list):
for list_val in item:
if isinstance(list_val, dict):
out_struct.append({})
self.format_it(list_val, out_struct[-1])
elif isinstance(list_val, list):
out_struct.append([])
self.format_it(list_val, out_struct[-1])
else:
out_struct.append(list_val)
else:
pass
if __name__ == "__main__":
tic = time.clock()
f = Foo()
f.format(f.sample_data)
print (time.clock()-tic)
Here are the input and output data, as requested. In the simplest case only the key 'ff' needs to be formatted, so -1 becomes an empty string:
[IN]
{
"data": {
"a": "",
"b": "",
"c": "127.0.0.1",
"d": 80,
"e": {"f": false,"g": false,"h": false,"i": false,"j": false,"k": false},
"l": [ {"ii": 2, "hh": 10, "gg": 200, "aa": -1, "bb": -1, "ff":-1, "cc": -1, "dd": 3, "ee": 0},
{"ii": 5, "hh": 20, "gg": 300, "aa": -1, "bb": -1, "ff":-1, "cc": -1, "dd": -1, "ee": -1},
{"ii": 5, "hh": 30, "gg": -400, "aa": -1, "bb": -1, "ff":-1, "cc": -1, "dd": -1, "ee": -1}],
"m": true,
"n": true,
"o": 1000,
"p": 2000,
"q": "",
"r": 5,
"s": 0,
"t": true,
"u": true,
"v": {"jj": 5, "kk": 0, "ll": 10, "mm": 9, "nn": [ { "aa": 20, "bb": 30 }, { "aa": 20, "bb": 30 } ] }
}
}
[OUT]
{u'data': {u'a': u'',
u'b': u'',
u'c': u'127.0.0.1',
u'd': 80,
u'e': {u'f': False,
u'g': False,
u'h': False,
u'i': False,
u'j': False,
u'k': False},
u'l': [{u'aa': -1,
u'bb': -1,
u'cc': -1,
u'dd': 3,
u'ee': 0,
u'ff': '',
u'gg': 200,
u'hh': 10,
u'ii': 2},
{u'aa': -1,
u'bb': -1,
u'cc': -1,
u'dd': -1,
u'ee': -1,
u'ff': '',
u'gg': 300,
u'hh': 20,
u'ii': 5},
{u'aa': -1,
u'bb': -1,
u'cc': -1,
u'dd': -1,
u'ee': -1,
u'ff': '',
u'gg': -400,
u'hh': 30,
u'ii': 5}],
u'm': True,
u'n': True,
u'o': 1000,
u'p': 2000,
u'q': u'',
u'r': 5,
u's': 0,
u't': True,
u'u': True,
u'v': {u'jj': 5,
u'kk': 0,
u'll': 10,
u'mm': 9,
u'nn': [{u'aa': 20, u'bb': 30}, {u'aa': 20, u'bb': 30}]}}}
The code above is a bit pared down and uses tic/toc rather than timeit. With either, the execution of just the recursion comes out to around 0.0012 s (even with the object creation and the JSON load excluded from the timing).
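For reference, here is one way the recursive traversal could be rewritten with an explicit stack instead of recursion. This is only a sketch under the assumption that it is dropped into the same Foo class (so the per-key formatter methods such as ff still apply); it is not the original author's solution, and since the per-item work dominates, it mainly removes call overhead and the recursion depth limit rather than changing the timing dramatically.
def format_it_iterative(self, item, out_struct):
    # Each stack entry pairs an input node with the output container to fill
    stack = [(item, out_struct)]
    while stack:
        current, out = stack.pop()
        if isinstance(current, dict):
            for dict_key, dict_val in current.items():
                if dict_key in dir(self):
                    dict_key = getattr(self, dict_key)(dict_val, out)
                if dict_key:
                    if isinstance(dict_val, dict):
                        out[dict_key] = {}
                        stack.append((dict_val, out[dict_key]))
                    elif isinstance(dict_val, list):
                        out[dict_key] = []
                        stack.append((dict_val, out[dict_key]))
                    else:
                        out[dict_key] = dict_val
        elif isinstance(current, list):
            for list_val in current:
                if isinstance(list_val, dict):
                    out.append({})
                    stack.append((list_val, out[-1]))
                elif isinstance(list_val, list):
                    out.append([])
                    stack.append((list_val, out[-1]))
                else:
                    out.append(list_val)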
I created a nested dictionary in Python like this:
{
"Laptop": {
"sony": 1,
"apple": 2,
"asus": 5
},
"Camera": {
"sony": 2,
"sumsung": 1,
"nikon": 4
},
}
But I couldn't figure out how to write this nested dict to a JSON file. Any comments would be appreciated!
import json

d = {
"Laptop": {
"sony": 1,
"apple": 2,
"asus": 5,
},
"Camera": {
"sony": 2,
"sumsung": 1,
"nikon" : 4,
},
}
with open("my.json","w") as f:
json.dump(d,f)
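If you want the file to be human-readable, json.dump also accepts an indent argument, for example:
with open("my.json", "w") as f:
    json.dump(d, f, indent=4)  # pretty-printed output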