Comparing Python dictionaries and finding the difference between the two

So I'm trying to write a Python program that takes two .json files, compares their contents, and displays the differences between the two. So far my program takes user input to select two files and compares them just fine. I have hit a wall trying to figure out how to print what the actual differences between the two files are.
My program:
#!/usr/bin/env python2
import json
#get_json() requests user input to select a .json file
#and creates a python dict with the information
def get_json():
    file_name = raw_input("Enter name of JSON File: ")
    with open(file_name) as json_file:
        json_data = json.load(json_file)
    return json_data
#compare_json(x,y) takes 2 dicts and compares the contents
#prints 'Match' if equal, or 'Not a match' if there are differences
def compare_json(x,y):
    for x_values, y_values in zip(x.iteritems(), y.iteritems()):
        if x_values == y_values:
            print 'Match'
        else:
            print 'Not a match'
def main():
    json1 = get_json()
    json2 = get_json()
    compare_json(json1, json2)
if __name__ == "__main__":
    main()
Example of my .json:
{
    "menu": {
        "popup": {
            "menuitem": [
                {
                    "onclick": "CreateNewDoc()",
                    "value": "New"
                },
                {
                    "onclick": "OpenDoc()",
                    "value": "Open"
                },
                {
                    "onclick": "CloseDoc()",
                    "value": "Close"
                }
            ]
        },
        "id": "file",
        "value": "File"
    }
}

Your problem stems from pairing items by position. Each dictionary yields its key-value pairs in an order determined by its own contents, so when the two dictionaries do not hold exactly the same keys, a given key may sit at a different index in x.items() than in y.items(). In addition, zip() silently stops at the shorter of the two sequences, so extra keys are never examined. As a result, you are much better off checking whether each key exists in the other dictionary and comparing the associated values.
def compare_json(x,y):
    for x_key in x:
        if x_key in y and x[x_key] == y[x_key]:
            print 'Match'
        else:
            print 'Not a match'
    if any(k not in x for k in y):
        print 'Not a match'
If you want to print out the actual differences:
def printDiffs(x,y):
    diff = False
    for x_key in x:
        if x_key not in y:
            diff = True
            print "key %s in x, but not in y" %x_key
        elif x[x_key] != y[x_key]:
            diff = True
            print "key %s in x and in y, but values differ (%s in x and %s in y)" %(x_key, x[x_key], y[x_key])
    # also report keys that exist only in y
    for y_key in y:
        if y_key not in x:
            diff = True
            print "key %s in y, but not in x" %y_key
    if not diff:
        print "both files are identical"
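Since the example .json is nested (dictionaries containing lists of dictionaries), a top-level key comparison reports the whole "menu" value as different. Below is a minimal sketch of a recursive comparison, written for Python 3 rather than the Python 2 used above. The function name diff_json and the path notation are illustrative choices, not part of any library.

```python
def diff_json(a, b, path=''):
    """Return a list of human-readable differences between two parsed JSON values."""
    diffs = []
    if isinstance(a, dict) and isinstance(b, dict):
        # Walk the union of keys so that keys missing on either side are reported.
        for key in sorted(set(a) | set(b)):
            child = f'{path}/{key}'
            if key not in b:
                diffs.append(f'{child}: only in first')
            elif key not in a:
                diffs.append(f'{child}: only in second')
            else:
                diffs.extend(diff_json(a[key], b[key], child))
    elif isinstance(a, list) and isinstance(b, list):
        # Lists are compared position by position.
        for index in range(max(len(a), len(b))):
            child = f'{path}[{index}]'
            if index >= len(b):
                diffs.append(f'{child}: only in first')
            elif index >= len(a):
                diffs.append(f'{child}: only in second')
            else:
                diffs.extend(diff_json(a[index], b[index], child))
    elif a != b:
        diffs.append(f'{path}: {a!r} != {b!r}')
    return diffs


first = {'menu': {'id': 'file', 'popup': {'menuitem': [{'value': 'New'}, {'value': 'Open'}]}}}
second = {'menu': {'id': 'edit', 'popup': {'menuitem': [{'value': 'New'}]}}}
for line in diff_json(first, second):
    print(line)
```

Comparing lists by position is the simplest choice; it will report many differences if an element is inserted in the middle of a list.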

You might want to try the jsondiff library for Python.
https://pypi.python.org/pypi/jsondiff/0.1.0
The examples referenced from the site are below.
>>> from jsondiff import diff
>>> diff({'a': 1}, {'a': 1, 'b': 2})
{<insert>: {'b': 2}}
>>> diff({'a': 1, 'b': 3}, {'a': 1, 'b': 2})
{<update>: {'b': 2}}
>>> diff({'a': 1, 'b': 3}, {'a': 1})
{<delete>: ['b']}
>>> diff(['a', 'b', 'c'], ['a', 'b', 'c', 'd'])
{<insert>: [(3, 'd')]}
>>> diff(['a', 'b', 'c'], ['a', 'c'])
{<delete>: [1]}
# Similar items get patched
>>> diff(['a', {'x': 3}, 'c'], ['a', {'x': 3, 'y': 4}, 'c'])
{<update>: [(1, {<insert>: {'y': 4}})]}
# Special handling of sets
>>> diff({'a', 'b', 'c'}, {'a', 'c', 'd'})
{<add>: set(['d']), <discard>: set(['b'])}
# Parse and dump JSON
>>> print diff('["a", "b", "c"]', '["a", "c", "d"]', parse=True, dump=True, indent=2)
{
  "$delete": [
    1
  ],
  "$insert": [
    [
      2,
      "d"
    ]
  ]
}

Related

How to find indirect relation? [Python]

So I'm trying to find indirect relations in a dictionary, but I can't seem to find general code for my program. This is what I have:
#find if A is related to E
data = {"A": {"B": 5, "C": 7}, "B": {"E": 8}, "C": {}, "D": {}, "E": {"D": 9}}
if "E" in data["A"]:
    result = True
if "E" in data["B"] or "D" in data["C"]:
    result = True
else:
    result = False
print(result)
#output = True because "E" is in data["B"]
For this one example it works, and of course I could generalize it with x's and y's, but it wouldn't work if the data variable held a more complex dictionary. Maybe recursive code or a for loop? If somebody could help, it would be very much appreciated.
Thank you in advance.

for k,v in data.items():
    for l,u in data.items():
        if k in u:
            print(f"{k} in {u}")
So the desired function might be:
def has_indirect_rel(dico):
    for k,v in dico.items():
        for l,u in dico.items():
            if k in u: return True
    return False

First, the numbers aren't of interest to the problem at hand, so let's reduce the data from dict of dictionaries to dict of sets:
data = {'A': {'B', 'C'}, 'B': {'E'}, 'C': set(), 'D': set(), 'E': {'D'}}  # {} would be an empty dict, not a set
We could search the data recursively:
def has_relation(mapping, a, b):
    if b in mapping[a]:
        return True
    for c in mapping[a]:
        if has_relation(mapping, c, b):
            return True
    return False
print(has_relation(data, 'A', 'D'))
print(has_relation(data, 'A', 'E'))
print(has_relation(data, 'A', 'F'))
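The recursive search above assumes the relations contain no cycle. If, say, 'D' pointed back to 'A', a search for a missing target would recurse forever. Here is a hedged sketch of a variant that remembers visited keys. The name has_relation_safe and the seen parameter are hypothetical additions, and the sample data is altered to include a cycle.

```python
def has_relation_safe(mapping, start, target, seen=None):
    """Return True if target is reachable from start, tolerating cycles."""
    seen = set() if seen is None else seen
    seen.add(start)
    # .get() lets keys that appear only as values (no entry of their own) pass safely.
    for neighbour in mapping.get(start, ()):
        if neighbour == target:
            return True
        if neighbour not in seen and has_relation_safe(mapping, neighbour, target, seen):
            return True
    return False


# Same shape as the data above, except that 'D' now points back to 'A'.
cyclic = {'A': {'B', 'C'}, 'B': {'E'}, 'C': set(), 'D': {'A'}, 'E': {'D'}}
print(has_relation_safe(cyclic, 'A', 'D'))
print(has_relation_safe(cyclic, 'A', 'F'))
```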

Create a nested tree from list

From a list of lists, I would like to create a nested dictionary of which the keys would point to the next value in the sublist. In addition, I would like to count the number of times a sequence of sublist values occurred.
Example:
From a list of lists as such:
[['a', 'b', 'c'],
 ['a', 'c'],
 ['b']]
I would like to create a nested dictionary as such:
{
    'a': {
        {'b':
            {
                'c':{}
                'count_a_b_c': 1
            }
            'count_a_b*': 1
        },
        {'c': {},
            'count_a_c': 1
        }
        'count_a*': 2
    },
    {
        'b':{},
        'count_b':1
    }
}
Please note that the names of the keys for counts do not matter, they were named as such for illustration.

I was curious how I would do this and came up with the following:
lst = [['a', 'b', 'c'],
       ['a', 'c'],
       ['b']]
tree = {}
for branch in lst:
    count_str = 'count_*'
    last_node = branch[-1]
    cur_tree = tree
    for node in branch:
        # replace the trailing '_*' with the current node name
        if node == last_node:
            count_str = count_str[:-2] + f'_{node}'
        else:
            count_str = count_str[:-2] + f'_{node}_*'
        cur_tree[count_str] = cur_tree.get(count_str, 0) + 1
        cur_tree = cur_tree.setdefault(node, {})
Nothing special is happening here.
For your example:
import json
print(json.dumps(tree, sort_keys=True, indent=4))
produces:
{
    "a": {
        "b": {
            "c": {},
            "count_a_b_c": 1
        },
        "c": {},
        "count_a_b_*": 1,
        "count_a_c": 1
    },
    "b": {},
    "count_a_*": 2,
    "count_b": 1
}
It does not exactly reproduce what you imagined, but that is in part because your desired result is not a valid Python dictionary.
Still, it may be a starting point for solving your problem.
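Because the question says the names of the count keys do not matter, one alternative layout keeps the counters apart from the child nodes so they cannot collide with node names. This is only a sketch under that assumption. The function build_tree and the keys 'count', 'ends' and 'children' are made up for illustration.

```python
def build_tree(paths):
    """Build a nested dict in which every node records how often it was visited."""
    root = {}
    for path in paths:
        level = root
        entry = None
        for node in path:
            # 'count' = sequences passing through this node, 'ends' = sequences ending here.
            entry = level.setdefault(node, {'count': 0, 'ends': 0, 'children': {}})
            entry['count'] += 1
            level = entry['children']
        if entry is not None:
            entry['ends'] += 1
    return root


lst = [['a', 'b', 'c'], ['a', 'c'], ['b']]
tree = build_tree(lst)
print(tree['a']['count'])                   # sequences that start with 'a'
print(tree['a']['children']['b']['count'])  # sequences that start with 'a', 'b'
```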

Represent a dictionary in a list of dictionaries as a number

I am trying to give the user a choice of how many dictionaries in a list (all, a select few, or a single one) the script should run over.
Currently I can ask the user to select based on a key called "instance_name", but it is troublesome to type out the entire name. If the user simply hits Enter, it processes all of them. All dictionaries in the list have identical keys and structure.
data = r.json()
results = []
for i in data:
    print(i['instance_name'])
option = input("Please select instance to generate report. To generate for all, simply press [Enter]: ")
if len(option):
    for i in data:
        if option in i['instance_name']:
            results.append(
                (i['aps']['id'], i['instance_name'], i['login'], i['password']))
else:
    for i in data:
        results.append(
            (i['aps']['id'], i['instance_name'], i['login'], i['password']))
return results
Data looks like this:
>>> from pprint import pprint
>>> pprint(data[0])
{'aps': {'id': 'cd7f0e5f-dfad-41a8-ab99-52e7cbd75a94',
         'modified': '2017-05-27T07:26:45Z',
         'revision': 35,
         'status': 'aps:ready',
         'type': 'http://something/application/version'},
 'instance_name': 'Test1',
 'login': 'abcdd#xyz.com',
 'password': 'xxxxxx'}
>>> type(data)
<class 'list'>
>>> len(data)
17
>>>
Output currently is like this:
C:\Code>python -i options.py
Test1
Test2
...
Test17
Please select instance to generate report. To generate for all, simply press [Enter]: Test1
[('cd7f0e5f-dfad-41a8-ab99-52e7cbd75a94', 'Test1', 'abcdd#xyz.com', 'xxxxxx')]
>>>
Is there a way to show these names alongside a number, so that the user can enter a number, or several comma-separated numbers, to select one or more instances?
1. Test1
2. Test2
...
17. Test17
The list of dictionaries is not static, it can increase in the future.

Use enumerate, starting the count at 1:
for i, dictionary in enumerate(dictionary_list, 1):
    print(i, dictionary['instance_name'])
Result:
1 Test1
2 Test2
3 Test3
...

Here's my approach. First, create a map of the "index" to the dictionaries.
data = [
    {'instance_name': 'Test1', 'other': 'a'},
    {'instance_name': 'Test2', 'other': 'b'},
    {'instance_name': 'Test3', 'other': 'c'},
    {'instance_name': 'Test4', 'other': 'd'}
]
mapped_data = dict(enumerate(data, 1))
# { 1: {'instance_name': 'Test1', 'other': 'a'},
#   2: {'instance_name': 'Test2', 'other': 'b'},
#   3: {'instance_name': 'Test3', 'other': 'c'},
#   4: {'instance_name': 'Test4', 'other': 'd'}}
Then, write a function to look-up the dictionary given either the "index" or the value of instance_name.
def get_data(key, mapping, attr):
    # accept either an int index or a string typed by the user
    k = str(key).strip()
    # find first member of mapping.items() whose index or d[attr] matches
    return next(d for i, d in mapping.items() if k == str(i) or k == d[attr])
# binds mapped_data and 'instance_name' to the function
custom_get_data = lambda k: get_data(k, mapped_data, 'instance_name')
Finally, set up the prompt and input to reference these objects.
indices = range(1, len(data) + 1) # 1, 2, 3, 4
for i in indices:
    print('{}. {}'.format(i, mapped_data[i]['instance_name']))
option = input('Enter a value, nothing, or a comma separated list: ')
if not option:
    result_keys = indices
elif ',' in option:
    result_keys = option.split(',')
else:
    result_keys = [option]
# map() is lazy in Python 3, so materialize the results
results = list(map(custom_get_data, result_keys))
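The same idea can be folded into one function that takes the raw input string and returns the selected dictionaries, which also makes the parsing testable without a prompt. This is a sketch only. The name select_instances and its out-of-range behaviour (such numbers are tried as names) are assumptions, not part of the answers above.

```python
def select_instances(option, data, attr='instance_name'):
    """Return the dictionaries chosen by a string such as '1,3' or 'Test2'."""
    if not option.strip():
        return list(data)  # empty input selects everything
    chosen = []
    for token in option.split(','):
        token = token.strip()
        if token.isdigit() and 1 <= int(token) <= len(data):
            chosen.append(data[int(token) - 1])  # 1-based menu number
        else:
            # anything else is treated as an exact instance name
            chosen.extend(d for d in data if d[attr] == token)
    return chosen


data = [
    {'instance_name': 'Test1', 'other': 'a'},
    {'instance_name': 'Test2', 'other': 'b'},
    {'instance_name': 'Test3', 'other': 'c'},
    {'instance_name': 'Test4', 'other': 'd'},
]
for number, item in enumerate(data, 1):
    print('{}. {}'.format(number, item['instance_name']))
print(select_instances('1, 3', data))
```

In the script, option would come from input() instead of a literal string.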

make a dict/json from string with duplicate keys Python

I have a string that could be parsed as a JSON or dict object. My string variable looks like this :
my_string_variable = """{
    "a":1,
    "b":{
        "b1":1,
        "b2":2
    },
    "b": {
        "b1":3,
        "b2":2,
        "b4":8
    }
}"""
When I do json.loads(my_string_variable), I have a dict but only the second value of the key "b" is kept, which is normal because a dict can't contain duplicate keys.
What would be the best way to have some sort of defaultdict like this:
result = {
    "a": 1,
    "b": [{"b1": 1, "b2": 2}, {"b1": 3, "b2": 2, "b4": 8}],
}
I have already looked for similar questions but they all deal with dicts or lists as an input and then create defaultdicts to handle the duplicate keys.
In my case I have a string variable and I would want to know if there is a simple way to achieve this.

Something like the following can be done:
import json
def join_duplicate_keys(ordered_pairs):
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if isinstance(d[k], list):
                d[k].append(v)
            else:
                # second occurrence: start a list holding both values
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d
raw_post_data = '{"a":1, "b":{"b1":1,"b2":2}, "b": { "b1":3, "b2":2,"b4":8} }'
newdict = json.loads(raw_post_data, object_pairs_hook=join_duplicate_keys)
print(newdict)
Please note that the code above decides what to do based on the value type (isinstance(d[k], list)). If the original string itself holds a list as the value of a duplicated key, that list is extended rather than wrapped, so some extra handling is required to make the code robust.
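One way to address that caveat is to track which keys have already been merged in a separate set, instead of inspecting the value type. The following is a sketch under that assumption. The name join_duplicate_keys_safe is hypothetical.

```python
import json


def join_duplicate_keys_safe(ordered_pairs):
    """Collect values of duplicated keys into a list, even when the values are lists."""
    result = {}
    merged = set()  # keys whose value is a list created by this function
    for key, value in ordered_pairs:
        if key not in result:
            result[key] = value
        elif key in merged:
            result[key].append(value)
        else:
            result[key] = [result[key], value]
            merged.add(key)
    return result


text = '{"a": [1, 2], "a": [3], "b": 1}'
parsed = json.loads(text, object_pairs_hook=join_duplicate_keys_safe)
print(parsed)  # {'a': [[1, 2], [3]], 'b': 1}
```

With the type-based check, the same input would give [1, 2, [3]] for "a", because the first value already is a list.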

The accepted answer is perfectly fine. I just wanted to show another approach.
First, you give every key a list so that further values are easy to accumulate. At the end, you call pop on the lists that hold only one item, since a single-item list means that key was not duplicated:
import json
from collections import defaultdict
my_string_variable = '{"a":1, "b":{"b1":1,"b2":2}, "b": { "b1":3, "b2":2,"b4":8} }'
def join_duplicate_keys(ordered_pairs):
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)
    return {k: v.pop() if len(v) == 1 else v for k, v in d.items()}
d = json.loads(my_string_variable, object_pairs_hook=join_duplicate_keys)
print(d)
output:
{'a': 1, 'b': [{'b1': 1, 'b2': 2}, {'b1': 3, 'b2': 2, 'b4': 8}]}

Pretty print JSON dumps

I use this code to pretty print a dict into JSON:
import json
d = {'a': 'blah', 'b': 'foo', 'c': [1,2,3]}
print json.dumps(d, indent = 2, separators=(',', ': '))
Output:
{
  "a": "blah",
  "c": [
    1,
    2,
    3
  ],
  "b": "foo"
}
This is a little bit too much (newline for each list element!).
Which syntax should I use to have this:
{
  "a": "blah",
  "c": [1, 2, 3],
  "b": "foo"
}
instead?

I ended up using jsbeautifier:
import json
import jsbeautifier
opts = jsbeautifier.default_options()
opts.indent_size = 2
print(jsbeautifier.beautify(json.dumps(d), opts))
Output:
{
  "a": "blah",
  "c": [1, 2, 3],
  "b": "foo"
}

After years, I found a solution with the built-in pprint module:
import pprint
d = {'a': 'blah', 'b': 'foo', 'c': [1,2,3]}
pprint.pprint(d) # default width=80 so this will be printed in a single line
pprint.pprint(d, width=20) # here it will be wrapped exactly as expected
Output:
{'a': 'blah',
 'b': 'foo',
 'c': [1, 2, 3]}

Another alternative is print(json.dumps(d, indent=None, separators=(',\n', ': ')))
The output will be:
{"a": "blah",
"c": [1,
2,
3],
"b": "foo"}
Note that although the official docs at https://docs.python.org/2.7/library/json.html#basic-usage list the default argument as separators=None, that actually means "use the default of separators=(', ', ': ')". Note also that the comma separator doesn't distinguish between key/value pairs and list elements.

I couldn't get jsbeautifier to do much, so I used regular expressions. I had a JSON pattern like
'{\n    "string": [\n        4,\n        1.0,\n        6,\n        1.0,\n        8,\n        1.0,\n        9,\n        1.0\n    ],\n...'
that I wanted as
'{\n    "string": [ 4, 1.0, 6, 1.0, 8, 1.0, 9, 1.0],\n'
so
# requires: import json, re
t = json.dumps(apriori, indent=4)
t = re.sub(r'\[\n {7}', '[', t)          # pull the first list element up to the bracket
t = re.sub(r'(?<!\]),\n {7}', ',', t)    # join the remaining elements
t = re.sub(r'\n {4}\]', ']', t)          # pull the closing bracket up
outfile.write(t)
So instead of one "dump(apriori, t, indent=4)", I had those 5 lines.

This has been bugging me for a while as well. I found a one-liner I'm almost happy with:
print json.dumps(eval(str(d).replace('[', '"[').replace(']', ']"').replace('(', '"(').replace(')', ')"')), indent=2).replace('\"\\"[', '[').replace(']\\"\"', ']').replace('\"\\"(', '(').replace(')\\"\"', ')')
That essentially converts all lists or tuples to strings, then uses json.dumps with indent to format the dict. Then you just need to remove the quotes and you're done!
Note: I convert the dict to string to easily convert all lists/tuples no matter how nested the dict is.
PS. I hope the Python Police won't come after me for using eval... (use with care)

Perhaps not quite as efficient, but consider a simpler case (somewhat tested in Python 3, but probably would work in Python 2 also):
def dictJSONdumps( obj, levels, indentlevels = 0 ):
    import json
    if isinstance( obj, dict ):
        res = []
        for ix in sorted( obj, key=lambda x: str( x )):
            temp = ' ' * indentlevels + json.dumps( ix, ensure_ascii=False ) + ': '
            if levels:
                temp += dictJSONdumps( obj[ ix ], levels-1, indentlevels+1 )
            else:
                temp += json.dumps( obj[ ix ], ensure_ascii=False )
            res.append( temp )
        return '{\n' + ',\n'.join( res ) + '\n}'
    else:
        return json.dumps( obj, ensure_ascii=False )
This might give you some ideas, short of writing your own serializer completely. I used my own favorite indent technique, and hard-coded ensure_ascii, but you could add parameters and pass them along, or hard-code your own, etc.
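Along the same lines, here is a small sketch that expands dictionaries over several lines but leaves lists on one line, which is the layout the question asks for. It assumes string keys and is written for Python 3. The name dumps_compact_lists is made up, and dictionaries nested inside lists are printed inline as well.

```python
import json


def dumps_compact_lists(obj, indent=2, level=0):
    """Serialize obj to JSON text, expanding dicts but keeping lists inline."""
    if isinstance(obj, dict) and obj:
        pad = ' ' * (indent * (level + 1))
        items = [
            pad + json.dumps(str(key)) + ': ' + dumps_compact_lists(value, indent, level + 1)
            for key, value in obj.items()
        ]
        return '{\n' + ',\n'.join(items) + '\n' + ' ' * (indent * level) + '}'
    # Lists, scalars and empty dicts fall through to the standard encoder.
    return json.dumps(obj)


d = {'a': 'blah', 'c': [1, 2, 3], 'b': 'foo'}
print(dumps_compact_lists(d))
```

The result is still valid JSON, so json.loads() reads it back to the original dictionary.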
