Append to an array in a JSON object - Python

I have the following JSON object in Python:
jsonobj = {
    "a": {
        "b": {
            "c": var1,
            "d": var2,
            "e": [],
        },
    },
}
And I would like to append key-value elements into "e", but I can't figure out the syntax for it. I tried appending with the following, but it doesn't come out right with the brackets and quotes:
jsonobj["a"]["b"]["e"].append("'f':" + var3)
Instead, I want "e" to be the following:
"e":[
{"f":var3, "g":var4, "h":var5},
{"f":var6, "g":var7, "h":var8},
]
Does anyone know the right way to append to this json array? Much appreciation.

jsobj["a"]["b"]["e"].append({"f":var3, "g":var4, "h":var5})
jsobj["a"]["b"]["e"].append({"f":var6, "g":var7, "h":var8})

Just add the dictionary as a dictionary object, not a string:
jsonobj["a"]["b"]["e"].append(dict(f=var3))
Full source:
var1 = 11
var2 = 32
jsonobj = {"a": {"b": {"c": var1,
                       "d": var2,
                       "e": [],
                       },
                 },
           }
var3 = 444
jsonobj["a"]["b"]["e"].append(dict(f=var3))
jsonobj will contain:
{'a': {'b': {'c': 11, 'd': 32, 'e': [{'f': 444}]}}}

jsonobj["a"]["b"]["e"] += [{'f': var3, 'g' : var4, 'h': var5},
{'f': var6, 'g' : var7, 'h': var8}]
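The `+=` form mutates the same list that `append` does; it is in-place extension, equivalent to `list.extend`. A minimal check with placeholder values:

```python
var3, var4, var5, var6, var7, var8 = 3, 4, 5, 6, 7, 8
e = []
e += [{'f': var3, 'g': var4, 'h': var5},
      {'f': var6, 'g': var7, 'h': var8}]
# the same two elements via extend, for comparison
extended = []
extended.extend([{'f': var3, 'g': var4, 'h': var5},
                 {'f': var6, 'g': var7, 'h': var8}])
assert e == extended and len(e) == 2
```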

Creating a nested dictionary from a CSV file with Python with 2 levels

I'm struggling with CSV import to a nested dictionary.
I found an example that's almost perfect for me:
UID,BID,R
U1,B1,4
U1,B2,3
U2,B1,2
import csv
new_data_dict = {}
with open("data.csv", 'r') as data_file:
    data = csv.DictReader(data_file, delimiter=",")
    for row in data:
        item = new_data_dict.get(row["UID"], dict())
        item[row["BID"]] = int(row["R"])
        new_data_dict[row["UID"]] = item
print(new_data_dict)
In my case I have one more level of nesting to do. My data looks like:
FID,UID,BID,R
A1,U1,B1,4
A1,U1,B2,3
A1,U2,B1,2
A2,U1,B1,4
A2,U1,B2,3
A2,U2,B1,2
Result should be:
{"A1":{"U1":{"B1":4, "B2": 3}, "U2":{"B1":2}},
"A2":{"U1":{"B1":4, "B2": 3}, "U2":{"B1":2}}}
How would I have to complete and correct the code posted above?
Thx,
Toby
Using a collections.defaultdict that recursively defines itself as a default dictionary, it's very easy to nest the levels.
This self-contained example (which uses a list of lines instead of a file) demonstrates it:
import collections
import csv, json

data_file = """FID,UID,BID,R
A1,U1,B1,4
A1,U1,B2,3
A1,U2,B1,2
A2,U1,B1,4
A2,U1,B2,3
A2,U2,B1,2
""".splitlines()

def nesteddict():
    return collections.defaultdict(nesteddict)

new_data_dict = nesteddict()
data = csv.DictReader(data_file, delimiter=",")
for row in data:
    new_data_dict[row["FID"]][row["UID"]][row["BID"]] = row["R"]
# dump as json to have a clean, indented representation
print(json.dumps(new_data_dict, indent=2))
result:
{
  "A1": {
    "U1": {
      "B1": "4",
      "B2": "3"
    },
    "U2": {
      "B1": "2"
    }
  },
  "A2": {
    "U1": {
      "B1": "4",
      "B2": "3"
    },
    "U2": {
      "B1": "2"
    }
  }
}
the "magic" line is this:
def nesteddict():
return collections.defaultdict(nesteddict)
each time a key is missing in the dictionary nesteddict is called, which creates a default dictionary with the same properties (saw that in an old StackOverflow answer: Nested defaultdict of defaultdict)
then creating the levels or updating them is done with just:
new_data_dict[row["FID"]][row["UID"]][row["BID"]] = row["R"]
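Since defaultdict subclasses dict, json.dumps accepts the nested result directly; and wrapping the value in int() at assignment time would give numeric leaves instead of the quoted strings shown above. A minimal sketch:

```python
import collections
import json

def nesteddict():
    return collections.defaultdict(nesteddict)

d = nesteddict()
d["A1"]["U1"]["B1"] = int("4")  # int() turns the CSV string "4" into the number 4
print(json.dumps(d))  # → {"A1": {"U1": {"B1": 4}}}
```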
If you're going simple, you can try this:
import csv
new_data_dict = {}
with open("data.csv", "r") as data_file:
    data = csv.DictReader(data_file, delimiter=",")
    for row in data:
        if row["R"] != "R":
            item = new_data_dict.get(row["UID"], dict())
            item[row["BID"]] = int(row["R"])
            temp_dict = new_data_dict.get(row["FID"], dict())
            if row["UID"] in temp_dict:
                temp_dict[row["UID"]].update(item)
            else:
                temp_dict[row["UID"]] = item
            new_data_dict[row["FID"]] = temp_dict
print(new_data_dict)
I just added a new dictionary called temp_dict before the assignment to new_data_dict so that previous values can be maintained.
Result:
{'A1': {'U1': {'B1': 4, 'B2': 3}, 'U2': {'B1': 2}}, 'A2': {'U1': {'B1': 4, 'B2': 3}, 'U2': {'B1': 2}}}
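The temp_dict bookkeeping can also be folded into chained dict.setdefault calls; a sketch using an in-memory CSV so it runs without a file:

```python
import csv
import io

# the same three A1 rows as in the question, as an in-memory file
rows = io.StringIO("FID,UID,BID,R\nA1,U1,B1,4\nA1,U1,B2,3\nA1,U2,B1,2\n")
new_data_dict = {}
for row in csv.DictReader(rows):
    # setdefault returns the existing sub-dict or inserts a fresh one
    users = new_data_dict.setdefault(row["FID"], {})
    users.setdefault(row["UID"], {})[row["BID"]] = int(row["R"])
print(new_data_dict)  # → {'A1': {'U1': {'B1': 4, 'B2': 3}, 'U2': {'B1': 2}}}
```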

Querying large Mongodb collection using pymongo

I want to query my MongoDB collection, which has more than 5k records. Each record has key-value pairs like
{
    "A" : "unique-value1",
    "B" : "service1",
    "C" : 1.2321,
    ...
},
...
Here A always has a unique value, B has a value like service1, service2, ..., service8, and C is some float value.
What I want is to get records like this, with only these key-value pairs:
{
    "A" : "unique-value1",
    "B" : "service1",
    "C" : 1.2321
}
{
    "A" : "unique-value2",
    "B" : "service2",
    "C" : 0.2321
}
{
    "A" : "unique-value3",
    "B" : "service1",
    "C" : 3.2321
}
I am not sure how to do this. Earlier I used MapReduce, but then I only needed to generate records with the A and C key-value pairs; now that I also need B, I don't know what I should do.
This is what I was doing:
map_reduce = Code("""
function () {
    emit(this.A, parseFloat(this.C));
}
""")
result = my_collection.map_reduce(map_reduce, reduce, out='temp_collection')
for doc in result.find({}):
    out = dict()
    out[doc['_id']] = doc['_id']
    out['cost'] = doc['value']
    out_handle.update_one(
        {'A': doc['_id']},
        {'$set': out},
        upsert=True
    )
Unless I've misunderstood what you need, it looks like you are making this harder than it needs to be. Just project the keys you want using the second parameter of the find method.
for record in db.testcollection.find({}, {'A': 1, 'B': 1, 'C': 1}):
    db.existingornewcollection.replace_one({'_id': record['_id']}, record, upsert=True)
Full example:
from pymongo import MongoClient
from bson.json_util import dumps

db = MongoClient()['testdatabase']
db.testcollection.insert_one({
    "A": "unique-value1",
    "B": "service1",
    "C": 1.2321,
    "D": "D",
    "E": "E",
    "F": "F",
})
for record in db.testcollection.find({}, {'A': 1, 'B': 1, 'C': 1}):
    db.existingornewcollection.replace_one({'_id': record['_id']}, record, upsert=True)
print(dumps(db.existingornewcollection.find_one({}, {'_id': 0}), indent=4))
gives:
{
    "A": "unique-value1",
    "B": "service1",
    "C": 1.2321
}
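For a large collection, the same projection can also run entirely server-side with an aggregation pipeline: a $project stage followed by $out, which writes the result to a target collection in one pass, with no client-side loop. A sketch of the pipeline, reusing the collection names from the example above:

```python
# Server-side alternative: project A, B, C and write the result to a
# target collection without pulling each document through the client.
pipeline = [
    {"$project": {"A": 1, "B": 1, "C": 1}},
    {"$out": "existingornewcollection"},  # $out replaces the target collection
]
# With a running mongod this would be executed as:
#   db.testcollection.aggregate(pipeline)
```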

Parsing multiple Json objects and merging into a single json object in Python

I have a string containing multiple JSON objects. I want to convert that string to a single JSON object.
For example,
Assuming the following input,
input = """
{
"a" : {
"x":"y",
"w":"z"
}
}
{
"b" : {
"v":"w",
"z":"l"
}
}
"""
The expected output is:
{
    "a" : {
        "x":"y",
        "w":"z"
    },
    "b" : {
        "v":"w",
        "z":"l"
    }
}
If we treat them as dictionaries and have
>>> a = {'a':{'a':1}}
>>> b = {'b':{'b':1}}
we can simply
>>> a.update(b)
>>> a
{'a': {'a': 1}, 'b': {'b': 1}}
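One caveat worth noting: update is a shallow merge, so if the two objects share a top-level key, the second one replaces the first rather than being merged:

```python
a = {'a': {'a': 1}, 'shared': {'x': 1}}
b = {'b': {'b': 1}, 'shared': {'y': 2}}
a.update(b)
# 'shared' is replaced wholesale, not deep-merged
print(a)  # → {'a': {'a': 1}, 'shared': {'y': 2}, 'b': {'b': 1}}
```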
you can take advantage of the fact that you can see when a dictionary begins by checking whether a line starts with '{':
import json

input = """
{
    "a" : {
        "x":"y",
        "w":"z"
    }
}
{
    "b" : {
        "v":"w",
        "z":"l"
    }
}"""
my_dicts = {}
start_dict = False
one_json = ''
for line in input.split('\n'):
    if line.startswith('{'):
        # a new top-level object begins: flush the previous one, if any
        if start_dict:
            my_dicts.update(json.loads(one_json))
            one_json = ''
        else:
            start_dict = True
    one_json = f'{one_json}\n{line}'
# take the last json
my_dicts.update(json.loads(one_json))
print(my_dicts)
output:
{'a': {'w': 'z', 'x': 'y'}, 'b': {'v': 'w', 'z': 'l'}}
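The standard library can also do the splitting without tracking lines or braces: json.JSONDecoder.raw_decode parses one value and reports where it stopped, so repeated calls walk through back-to-back objects. A sketch:

```python
import json

def merge_concatenated(text):
    """Merge back-to-back top-level JSON objects into one dict."""
    decoder = json.JSONDecoder()
    merged, idx = {}, 0
    text = text.strip()
    while idx < len(text):
        obj, end = decoder.raw_decode(text, idx)  # returns (value, stop index)
        merged.update(obj)
        idx = end
        while idx < len(text) and text[idx].isspace():
            idx += 1  # skip whitespace between objects
    return merged

print(merge_concatenated('{"a": {"x": "y"}}\n{"b": {"v": "w"}}'))
# → {'a': {'x': 'y'}, 'b': {'v': 'w'}}
```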
Build up a list of dictionaries by parsing each character (one could also parse line by line).
There is a good chance a library already exists for this, but here is one way to go:
import json

braces = []
dicts = []
chars = []
for ch in inp:  # "input" is a builtin, so the string is renamed to inp
    chars.append(ch)
    if ch == '{':
        braces.append('}')
    elif ch == '}':
        braces.pop()
        if not braces:  # a top-level object just closed: parse it
            text = ''.join(chars)
            if text.strip():
                dicts.append(json.loads(text))
            chars = []
Then, merge the dictionaries in the list:
merged_dict = {}
for dct in dicts:
    merged_dict.update(dct)
print(merged_dict)
{'a': {'x': 'y', 'w': 'z'}, 'b': {'v': 'w', 'z': 'l'}}
Output the merged dictionary as a JSON string with indentation:
merged_output = json.dumps(merged_dict, indent=4)

Getting pandas dataframe from nested dictionaries?

I am new to Python and I have not been able to find a good answer for my problem after looking for a while.
I am trying to create a Pandas dataframe from a list of dictionaries.
My list of nested dictionaries is the following:
{'category_1': [{'a': '151',
                 'b': '116',
                 'c': '86'}],
 'category_2': [{'d': '201',
                 'e': '211',
                 'f': '252'},
                {'d': '-1',
                 'e': '-9',
                 'f': '-7'}],
 'category_3': {'g': 'Valid',
                'h': None,
                'i': False,
                'j': False},
 'category_4': {'k': None,
                'l': None,
                'm': None,
                'n': None}}
And my output should be
a b c d e f g h i j k l m n
0 151 116 86 201,-1 211,-9 252,-7 valid None False False None None None None
What I tried:
I'm able to do categories 1, 3 and 4, but couldn't figure out the 2nd category.
I tried concat and a nested for loop to get it:
ex = pd.concat([pd.Series(d) for d in (eg1)], axis=1).T
Then merging it.
As I said, I couldn't figure it out as a whole!
I wrote a short recursive function that returns a series, or a concatenation of several series if one of the keys in your dict (e.g. category_2) contains a list of multiple dicts.
def expand(x):
    if type(x) == dict:
        return pd.Series(x)
    elif type(x) == list:
        return pd.concat([expand(i) for i in x])
If I start with the dictionary that you pasted in your example above:
d = {'category_1': [{'a': '151',
                     'b': '116',
                     'c': '86'}],
     'category_2': [{'d': '201',
                     'e': '211',
                     'f': '252'},
                    {'d': '-1',
                     'e': '-9',
                     'f': '-7'}],
     'category_3': {'g': 'Valid',
                    'h': None,
                    'i': False,
                    'j': False},
     'category_4': {'k': None,
                    'l': None,
                    'm': None,
                    'n': None}}
Then it's just a matter of concatenating all the series created by the recursive method I wrote:
output = pd.concat([expand(value) for key, value in d.items()])
And merging any duplicate indices so that their items appear in one row and are separated by commas. I also reshape the series into a df with one row and several columns:
output = pd.DataFrame(output.groupby(output.index).apply(lambda x: ','.join(x.astype(str)))).T
This results in a dataframe that matches your desired output:
output
a b c d e f g h i j k l m n
0 151 116 86 201,-1 211,-9 252,-7 Valid None False False None None None None
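The duplicate-index merge in the groupby step can be seen in isolation; a minimal sketch, assuming pandas is installed:

```python
import pandas as pd

# two entries share the index label 'd', as category_2 produces above
s = pd.Series(['201', '-1'], index=['d', 'd'])
merged = s.groupby(s.index).apply(lambda x: ','.join(x.astype(str)))
print(merged['d'])  # → 201,-1
```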
The code below recursively flattens the input structure, which can contain lists or other dicts. When it hits the leaves, it adds their content to a flattened dict, which can then be converted to a dataframe.
flattened_dict = {}

def flatten(obj, name=''):
    if isinstance(obj, dict):
        for key, value in obj.items():
            flatten(obj[key], key)
    elif isinstance(obj, list):
        for e in obj:
            flatten(e)
    else:
        if obj == 'null':
            obj = None
        flattened_dict[name] = [obj]

flatten(eg1)
The flattened dict can then be turned into a one-row dataframe with pd.DataFrame(flattened_dict).
Please note that you have to define the nulls as strings. The definition of the original dict is:
eg1 = {
    "my_list": {
        "category_1": [
            {
                "a": "151",
                "b": "116",
                "c": "86"
            }
        ],
        "category_2": [
            {
                "d": "201",
                "e": "211",
                "f": "252"
            },
            {
                "d": "-1 ",
                "e": "-9",
                "f": "-7"
            }
        ],
        "category_3": {
            "g": "Valid",
            "h": "null",
            "i": "Invalid",
            "j": "Invalid"
        },
        "category_4": {
            "k": "null",
            "l": "null",
            "m": "null",
            "n": "null"
        }
    }
}

Subtract dict A from dict B (deep del)?

If I have a deeply nested dict, is there a built-in way to subtract/remove a list of "paths" (e.g. keyA.keyB.key1, keyA.keyC.key2, etc.) or the keys of a second dict from the original dict? Or maybe there is a common module with functionality like this?
Here's a suggestion:
D = { "keyA": {
"keyB" : {
"keyC" : 42,
"keyD": 13
},
"keyE" : 55
}
}
def remove_path(dictionary, path):
for node in path[:-1]:
dictionary = dictionary[node]
del dictionary[path[-1]]
remove_path(D, ["keyA", "keyB", "keyD"])
print D # prints {'keyA': {'keyB': {'keyC': 42}, 'keyE': 55}}
You'll probably want to introduce some error checking, too.
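A variant with that error checking added, returning False instead of raising when the path is absent (a sketch, not from the original answer):

```python
def remove_path_safe(dictionary, path):
    """Delete path from dictionary; return False if the path doesn't exist."""
    node = dictionary
    for key in path[:-1]:
        node = node.get(key)
        if not isinstance(node, dict):
            return False  # an intermediate key is missing or not a dict
    if path[-1] not in node:
        return False
    del node[path[-1]]
    return True

D = {"keyA": {"keyB": {"keyC": 42, "keyD": 13}, "keyE": 55}}
assert remove_path_safe(D, ["keyA", "keyB", "keyD"]) is True
assert remove_path_safe(D, ["keyA", "nope", "keyD"]) is False
print(D)  # → {'keyA': {'keyB': {'keyC': 42}, 'keyE': 55}}
```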
Just in case the other answers aren't what you're looking for, here's one that subtracts one dictionary from another.
def subtract(a, b):
    """Remove the keys in b from a."""
    for k in b:
        if k in a:
            if isinstance(b[k], dict):
                subtract(a[k], b[k])
            else:
                del a[k]
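Applied to the example dict from the first answer (the function is repeated here so the snippet is self-contained):

```python
def subtract(a, b):
    """Remove the keys in b from a (recursing into nested dicts)."""
    for k in b:
        if k in a:
            if isinstance(b[k], dict):
                subtract(a[k], b[k])
            else:
                del a[k]

a = {'keyA': {'keyB': {'keyC': 42, 'keyD': 13}, 'keyE': 55}}
b = {'keyA': {'keyB': {'keyD': None}}}  # the value is ignored; only keys matter
subtract(a, b)
print(a)  # → {'keyA': {'keyB': {'keyC': 42}, 'keyE': 55}}
```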
Another solution:
d = {
    'A': {
        'C': {
            'D': {
                'E': 4,
            },
            'F': 5,
        },
    },
    'B': 2,
}

def DeepDictDel(path, dictionary):  # parameter renamed to avoid shadowing the built-in dict
    for key in path.split('.'):
        owner = dictionary
        dictionary = dictionary[key]
    del owner[key]

print(d)  # prints {'A': {'C': {'D': {'E': 4}, 'F': 5}}, 'B': 2}
DeepDictDel('A.C.D', d)
print(d)  # prints {'A': {'C': {'F': 5}}, 'B': 2}
