compare two lists of dictionaries for specific fields - python

I've two lists containing dictionaries. I want to compare certain fields in each of these dictionaries.
current_list = [{"name": "Bill","address": "Home", "age": 23, "accesstime":11:14:01},
{"name": "Fred","address": "Home", "age": 26, "accesstime":11:57:43},
{"name": "Nora","address": "Home", "age": 33, "accesstime":11:24:14}]
backup_list = [{"name": "Bill","address": "Home", "age": 23, "accesstime":13:34:24},
{"name": "Fred","address": "Home", "age": 26, "accesstime":13:34:26},
{"name": "Nora","address": "Home", "age": 33, "accesstime":13:35:14}]
The lists and their dictionaries should be in the same order, and I just want to compare certain key/value pairs, such as name, address and age, while ignoring access time. What I have so far compares every key against every other key. So I just want to compare
current_list:dictionary[0][name] -> backup_list:dictionary[0][name] and then
current_list:dictionary[0][address] -> backup_list:dictionary[0][address]
and so on.
for x in current_list:
    for y in backup_list:
        for k, v in x.items():
            for kk, vv in y.items():
                if k == kk:
                    print("Match: {0}".format(kk))
                    break
                elif k != kk:
                    print("No match: {0}".format(kk))
Current output
Match name with name
No Match address with name
Match address with address
No Match age with name
No Match age with address
Match age with age
No Match dateRegistered with name
No Match dateRegistered with address
No Match dateRegistered with age
Match dateRegistered with dateRegistered
Preferred output
Match name with name
Match address with address
Match age with age
Update: due to a requirement change, my list became a list of ElementTree XML elements.
So instead of the above list, it becomes
backup_list = ["<Element 'New' at 0x0000000002698C28>, <Element 'Update' at 0x0000000002698CC8>, <Element 'New' at 0x0000000002698CC8>"]
where each element is an XML element containing:
{"name": "Nora", "address": "Home", "age": 33, "dateRegistered": 20140812}
Based on the answer below, this seems to satisfy my requirements so far:
value_to_compare = ["name", "address", "age"]
for i, elem in enumerate(current_list):
    backup_dict = backup_list[i]
    if elem.tag == "New":
        for key in value_to_compare:
            try:
                print("Match {0} {1} == {2}:".format(key, backup_dict.attrib[key], elem.attrib[key]))
            except KeyError:
                print("key {} not found".format(key))
            except:
                raise
    else:
        continue

I don't know if I fully understood your question, but I think the following code should do the trick:
compare_arguments = ["name", "age", "address"]
for cl, bl in zip(current_list, backup_list):
    for ca in compare_arguments:
        if cl[ca] == bl[ca]:
            # print the field name, as in the preferred output
            print("Match {0} with {0}".format(ca))
    print("-" * 10)
The code above uses zip to iterate over both lists in parallel. A separate list specifies the fields you want to compare. The main loop iterates over those fields and prints each one whose values match.
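The same zip-based idea can be wrapped in a small helper that reports mismatches as well as matches. This is only a sketch: the name compare_fields and the sample data are made up for illustration, and it assumes both lists have the same length and order.

```python
def compare_fields(current, backup, fields):
    """Yield (index, field, is_match) for each selected field of each pair."""
    for index, (cur, bak) in enumerate(zip(current, backup)):
        for field in fields:
            # .get() avoids a KeyError when a field is missing on either side;
            # note that two missing values compare equal (None == None).
            yield index, field, cur.get(field) == bak.get(field)


# Hypothetical sample data; access times are quoted so they are valid Python.
current_list = [{"name": "Bill", "address": "Home", "age": 23, "accesstime": "11:14:01"}]
backup_list = [{"name": "Bill", "address": "Away", "age": 23, "accesstime": "13:34:24"}]

for index, field, is_match in compare_fields(current_list, backup_list, ["name", "address", "age"]):
    print("{0} {1} with {1}".format("Match" if is_match else "No match", field))
```

Because "accesstime" is never listed in the fields argument, it is ignored without any special casing.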

Someone has already made a module called deepdiff that does this and sooo much more! Refer to this answer for their detailed explanation!
First - install it
pip install deepdiff
Then - enjoy
# of course import it
from deepdiff import DeepDiff
current_list, backup_list = [...], [...]  # values stated in question.
for c, b in zip(current_list, backup_list):
    dif = DeepDiff(c, b)
    for key in ["name", "age", "address"]:
        try:
            assert dif['values_changed'][f"root['{key}']"]
            # pass the below line to exclude any non-matching values like your desired output has
            print(f"No Match {key} with {key}")
        except KeyError:
            print(f"Match {key} with {key}")
Results: - as expected
Match name with name
Match address with address
Match age with age
Match name with name
Match address with address
Match age with age
Match name with name
Match address with address
Match age with age
Final Note
This module has soo much else you can utilize such as type changes, key changes/removals/additions, an extensive text comparison, and searches as well. Definitely well worth a look into.
~GL on your project!

Simply compare with this-
for current in current_list:
    for backup in backup_list:
        for a in backup:
            for b in current:
                if a == b:
                    if a == "name" or a == "age" or a == "address":
                        if backup[a] == current[b]:
                            print(backup[a])
                            print(current[b])

I do not understand the rationale of your data structure, but I think this will do the trick:
value_to_compare = ["name", "address", "age"]
for i, elem in enumerate(current_list):
    backup_dict = backup_list[i]
    for key in value_to_compare:
        try:
            print("Match {}: {} with {}".format(key, elem[key], backup_dict[key]))
        except KeyError:
            print("key {} not found".format(key))
            # may be a raise here.
        except:
            raise

You can compare all corresponding fields with this code:
for dct1, dct2 in zip(current_list, backup_list):
    for k, v in dct1.items():
        if k == "accesstime":
            continue
        if v == dct2[k]:
            print("Match: {0} with {0}".format(k))
        else:
            print("No match: {0} with {0}".format(k))
Note that the values of your "accesstime" keys are not valid Python objects!
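One way to make the sample data valid is to quote the times as strings and, if needed, parse them into datetime.time objects. The sketch below assumes the times are always written as "HH:MM:SS"; the helper name parse_time is made up for illustration.

```python
from datetime import datetime

# Assumption: access times are stored as quoted "HH:MM:SS" strings.
current = {"name": "Bill", "address": "Home", "age": 23, "accesstime": "11:14:01"}
backup = {"name": "Bill", "address": "Home", "age": 23, "accesstime": "13:34:24"}


def parse_time(text):
    """Convert an "HH:MM:SS" string into a datetime.time object."""
    return datetime.strptime(text, "%H:%M:%S").time()


# Collect the keys whose values differ, skipping "accesstime".
differing = [k for k in current if k != "accesstime" and current[k] != backup[k]]
print(differing)

# Parsed times can be compared directly if that is ever needed.
print(parse_time(current["accesstime"]) < parse_time(backup["accesstime"]))
```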

If you are happy to use a 3rd party library, this kind of task can be more efficiently implemented, and in a more structured way, via Pandas:
import pandas as pd
res = pd.merge(pd.DataFrame(current_list),
               pd.DataFrame(backup_list),
               on=['name', 'address', 'age'],
               how='outer',
               indicator=True)
print(res)
  accesstime_x address  age  name accesstime_y _merge
0     11:14:01    Home   23  Bill     13:34:24   both
1     11:57:43    Home   26  Fred     13:34:26   both
2     11:24:14    Home   33  Nora     13:35:14   both
The result _merge = 'both' for each row indicates the combination of ['name', 'address', 'age'] occurs in both lists but, in addition, you get to see the accesstime from each input.

You can use the zip function to iterate over both lists simultaneously.
elements_to_compare = ["name", "age", "address"]
for dic1, dic2 in zip(current_list, backup_list):
    for element in elements_to_compare:
        if dic1[element] == dic2[element]:
            print("Match {0} with {0}".format(element))

Related

How to traverse a json file and find the key name using value?

Can anyone please help me find a solution?
I have a JSON file like the one below:
{
    "Parent1": ["Child1", "Child2","Child5"],
    "Parent2": ["Child3", "Child4","Child5"]
}
Expectation: Python code to find the parent name using the child name
User input: Child1
Expected output : Parent1
OR
User input: Child5
Expected output : Parent1,Parent2
This is an approach you can use:
choice = input("Enter search string ")
ans = []
for i, j in data.items():
    if choice in j:
        ans.append(i)
ans will now contain the keys you need as a list.
You should read the JSON file and then loop over the dictionary to find matches for the given input:
def find_key_name_by_value(json_input, key_to_find):
    result = []
    # iterate over the function argument, not the built-in input()
    for key, value in json_input.items():
        if key_to_find in value:
            result.append(key)
    return result
json_input = {
    "Parent1": ["Child1", "Child2","Child5"],
    "Parent2": ["Child3", "Child4","Child5"]
}
result = find_key_name_by_value(json_input, "Child4")
print(result)
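If many lookups are needed, the loop above can be run once to build a reverse index from child to parents. This is a sketch of that variation, not something from the answers; the name build_child_index is hypothetical.

```python
from collections import defaultdict


def build_child_index(parents):
    """Map each child name to the list of parents that contain it."""
    index = defaultdict(list)
    for parent, children in parents.items():
        for child in children:
            index[child].append(parent)
    return index


data = {
    "Parent1": ["Child1", "Child2", "Child5"],
    "Parent2": ["Child3", "Child4", "Child5"],
}
child_index = build_child_index(data)
print(child_index["Child1"])
print(child_index["Child5"])
```

Each lookup afterwards is a single dictionary access instead of a scan over every parent.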

How can I print hello and all female members

test.json
{
"A Company":[{"female":["Jessica","Eve"]},{"male":["Mike","Peter"]}],
"B Company":[{"female":["Laura","Pamela"]},{"male":["Mark","Steve"]}]
}
test.py
import json
f = open('test.json',)
data = json.load(f)
for v in data.values():
    for element in v:
        print(element)
Output:
{'female': ['Jessica', 'Eve']}
{'male': ['Mike', 'Peter']}
{'female': ['Laura', 'Pamela']}
{'male': ['Mark', 'Steve']}
How can I print this: "Hello Jessica" "Hello Eve" "Hello Laura" "Hello Pamela"?
You can use a generator expression to extract the names and a for-loop to print the greetings without building an intermediate list:
data = {
    "A Company":[{"female":["Jessica","Eve"]},{"male":["Mike","Peter"]}],
    "B Company":[{"female":["Laura","Pamela"]},{"male":["Mark","Steve"]}]
}
names = (name for groups in data.values()
              for group in groups
              for name in group.get("female",[]))
for name in names: print("Hello",name)
Hello Jessica
Hello Eve
Hello Laura
Hello Pamela
You missed the innermost loop, where you iterate the inner records and check if they are Males or Females.
Please see the example:
import json
json_file = """
{
    "A Company":[{"female":["Jessica","Eve"]},{"male":["Mike","Peter"]}],
    "B Company":[{"female":["Laura","Pamela"]},{"male":["Mark","Steve"]}]
}
"""
parsed = json.loads(json_file)
for val in parsed.values():
    for record in val:
        # This below is the innermost loop
        for key, value in record.items():
            # If it's female then we use list comprehension to print the greetings
            if key == "female":
                [print(f"Hello {name}") for name in value]
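The nested loops above can also be collected into a function that returns the greetings rather than printing them as a side effect. This is a sketch; the function name greetings and its group parameter are made up for illustration.

```python
def greetings(data, group="female"):
    """Return a greeting for every name stored under `group` in any company."""
    return [
        "Hello {}".format(name)
        for records in data.values()
        for record in records
        # .get() returns an empty list for records that lack the group key.
        for name in record.get(group, [])
    ]


data = {
    "A Company": [{"female": ["Jessica", "Eve"]}, {"male": ["Mike", "Peter"]}],
    "B Company": [{"female": ["Laura", "Pamela"]}, {"male": ["Mark", "Steve"]}],
}

for line in greetings(data):
    print(line)
```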

Remove comma(,) from string number in dictionary list

I have a list of dictionaries. Sample data is below; I have n records like this.
datas = [{"_id":"1234as", "Total students":"123,321", "TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"}]
I tried
for data in datas:
    for i in data.values():
        re.sub('[^A-Za-z0-9]+', '', i)
        datas.append(i)
I just want to remove comma(,) from TotalStudents and TotalPresent and replace the value in datas.
Edit 1
In my list of dictionary I also have value as::
datas = [{"_id":"1234as","Totalstudents":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"},
{"_id":"9934 asas","TotalStudents":"NA","TotalPresent":""}]
Here, the value of TotalStudents is "NA" and the value of TotalPresent is "". Is there a way to replace "NA" or "" with "0" wherever they appear?
If you want to replace the values of specific keys, make sure that the keys are the same because the first dict in your example has Total Students but the second has TotalStudents.
Try this:
datas = [{"_id": "1234as", "Total Students": "123,321", "TotalPresent": "321,345"},
{"_id": "1234asas", "Total Students": "343,431", "TotalPresent": "541,656"}]
for d in datas:
    d["Total Students"] = d["Total Students"].replace(",", "")
    d["TotalPresent"] = d["TotalPresent"].replace(",", "")
print(datas)
# output: [{'_id': '1234as', 'Total Students': '123321', 'TotalPresent': '321345'}, {'_id': '1234asas', 'Total Students': '343431', 'TotalPresent': '541656'}]
If you want to remove commas from the values of all keys, you can try the following (but bear in mind that in this case, all the values of your dict must be strings):
datas = [{"_id": "1234as", "Total Students": "123,321", "TotalPresent": "321,345"},
{"_id": "1234asas", "Total Students": "343,431", "TotalPresent": "541,656"}]
for d in datas:
    for k in d:
        d[k] = d[k].replace(",", "")
You can iterate over the key,value pairs in the dictionaries. And after removing the comma replace the value for that key.
import re
datas = [{"_id": "1234as", "Total Students": "123,321", "TotalPresent": "321,345"},
{"_id": "1234asas", "TotalStudents": "343,431", "TotalPresent": "541,656"}]
for data in datas:
    for key, value in data.items():
        print(key, value)
        value = re.sub('[^A-Za-z0-9]+', '', value)
        data[key] = value
print(datas)
Result
_id 1234as
Total Students 123,321
TotalPresent 321,345
_id 1234asas
TotalStudents 343,431
TotalPresent 541,656
[{'_id': '1234as', 'Total Students': '123321', 'TotalPresent': '321345'},
{'_id': '1234asas', 'TotalStudents': '343431', 'TotalPresent': '541656'}]
This is one way to make your code work; it always replaces every value. If necessary, add your own checks to make it smarter.
EDIT
To catch the "NA" and "" values I have added some if statements. It's simple and stays close to your own code.
import re
datas = [{"_id":"1234as","TotalStudents":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"},
{"_id":"9934 asas","TotalStudents":"NA","TotalPresent":""}]
for data in datas:
    print(data)
    for key, value in data.items():
        if key == "TotalStudents":
            if value == "NA":
                value = "0"
            else:
                value = re.sub('[^A-Za-z0-9]+', '', value)
        elif key == "TotalPresent":
            if not value:
                value = "0"
            else:
                value = re.sub('[^A-Za-z0-9]+', '', value)
        data[key] = value
print()
for data in datas:
    print(data)
Result
{'_id': '1234as', 'TotalStudents': '123321', 'TotalPresent': '321345'}
{'_id': '1234asas', 'TotalStudents': '343431', 'TotalPresent': '541656'}
{'_id': '9934 asas', 'TotalStudents': '0', 'TotalPresent': '0'}
To make the code more efficient, you can place the new values directly in data. In this case you no longer replace "_id" with its own value.
import re
datas = [{"_id":"1234as","TotalStudents":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"},
{"_id":"9934 asas","TotalStudents":"NA","TotalPresent":""}]
for data in datas:
    print(data)
    for key, value in data.items():
        if key == "TotalStudents":
            if value == "NA":
                data[key] = "0"
            else:
                data[key] = re.sub('[^A-Za-z0-9]+', '', value)
        elif key == "TotalPresent":
            if not value:
                data[key] = "0"
            else:
                data[key] = re.sub('[^A-Za-z0-9]+', '', value)
print()
for data in datas:
    print(data)
re.sub does not work in place; it returns the altered str. More generally, since strs are immutable, functions that process them never work in place. A solution using re.sub might look like this:
import re
datas = [{"_id":"1234as","Total Students":"123,321","TotalPresent":"321,345"},
{"_id":"1234asas","TotalStudents":"343,431","TotalPresent":"541,656"}]
cleandatas = []
for data in datas:
    cleandatas.append({k: re.sub('[^A-Za-z0-9]+', '', v) for k, v in data.items()})
print(cleandatas)
Output:
[{'_id': '1234as', 'Total Students': '123321', 'TotalPresent': '321345'}, {'_id': '1234asas', 'TotalStudents': '343431', 'TotalPresent': '541656'}]
I used dict-comprehension to create new cleaned dicts
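The dict comprehension can be extended to cover the "NA" and "" cases from the question's edit while leaving "_id" untouched. This is a sketch under assumptions: the helper clean_value and the NUMERIC_KEYS set are hypothetical names, and the keys are assumed to be spelled consistently as TotalStudents and TotalPresent.

```python
def clean_value(value):
    """Strip thousands separators; map "NA" and "" to "0"."""
    if value in ("NA", ""):
        return "0"
    return value.replace(",", "")


# Assumed key names; adjust to match the real data.
NUMERIC_KEYS = {"TotalStudents", "TotalPresent"}

datas = [
    {"_id": "1234as", "TotalStudents": "123,321", "TotalPresent": "321,345"},
    {"_id": "9934 asas", "TotalStudents": "NA", "TotalPresent": ""},
]

# Only the numeric keys are cleaned, so the space in "_id" survives.
cleaned = [
    {k: clean_value(v) if k in NUMERIC_KEYS else v for k, v in d.items()}
    for d in datas
]
print(cleaned)
```

Using str.replace instead of a broad regular expression means only commas are removed, which is a deliberate narrowing compared with the '[^A-Za-z0-9]+' pattern.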

Parsing through JSON file with python and selecting multiple values on certain conditions

I have this JSON file.
{
    "reviewers":[
        {
            "user":{
                "name":"keyname",
                "emailAddress":"John#email",
                "id":3821,
                "displayName":"John Doe",
                "active":true,
                "slug":"jslug",
                "type":"NORMAL",
                "link":{
                    "url":"/users/John",
                    "rel":"self"
                }
            },
            "role":"REVIEWER",
            "approved":true
        },
        {
            "user":{
                "name":"keyname2",
                "emailAddress":"Harry#email",
                "id":6306,
                "displayName":"Harry Smith",
                "active":true,
                "slug":"slug2",
                "link":{
                    "type":"NORMAL",
                    "url":"/users/Harry",
                    "rel":"self"
                }
            },
            "role":"REVIEWER",
            "approved":false
        }
    ]
}
Initially, I was using a snippet of code that would go through and grab the full names of the reviewers.
def get_reviewers(json):
    reviewers = ""
    for key in json["reviewers"]:
        reviewers += key["user"]["displayName"] + ", "
    reviewers = reviewers[:-2]
    return reviewers
which would return "John Doe, Harry Smith". However, now I'm trying to get the script to return an (A) next to the user's name if their "approved" value is true.
So for example the code above would get the names, then see that John's approved tag is true and Harry's is false, then return "John Doe(A), Harry Smith". I'm just not sure where to even begin to do this. Can anyone point me in the right direction?
This is what I've been trying so far but obviously it isn't working as I'd like it to.
def get_reviewers(stash_json):
    reviewers = ""
    for key in stash_json["reviewers"]:
        if stash_json["reviewers"][0]["approved"] == true:
            reviewers += key["user"]["displayName"] + "(A)" + ", "
        else:
            reviewers += key["user"]["displayName"] + ", "
    reviewers = reviewers[:-2]
    return reviewers
which outputs Jason Healy(A), Joan Reyes(A)
This is what my stash_json outputs when put through pprint.
You probably want something along the lines of this:
def get_reviewers(stash_json):
    reviewers = ""
    for item in stash_json["reviewers"]:
        if item["approved"]:
            reviewers += item["user"]["displayName"] + "(A)" + ", "
        else:
            reviewers += item["user"]["displayName"] + ", "
    reviewers = reviewers[:-2]
    return reviewers
I think part of your confusion comes from the fact that "reviewers" is a list of dict elements, and each dict element has an "approved" key, but also a "user" key whose value is itself another dict.
Read the JSON file carefully, and for debugging purposes, use plenty of
print(...)
print(type(...)) # whether something is a dict, list, str, bool etc
or
from pprint import pprint # pretty printing
pprint(...)
This looks like a good place to use join and list comprehension:
def get_reviewers(stash_json):
    return ", ".join([item['user']['displayName'] + ('(A)' if item['approved'] else '') for item in stash_json['reviewers']])
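To check the behaviour against the example from the question, here is a slightly more defensive sketch with trimmed-down sample data. Treating a missing "approved" key as not approved is an assumption, not something the question specifies.

```python
def get_reviewers(stash_json):
    """Join reviewer display names, marking approved reviewers with "(A)"."""
    names = []
    for item in stash_json.get("reviewers", []):
        name = item["user"]["displayName"]
        # Assumption: a missing "approved" key counts as not approved.
        if item.get("approved", False):
            name += "(A)"
        names.append(name)
    return ", ".join(names)


# Sample data reduced to the fields the function actually reads.
sample = {
    "reviewers": [
        {"user": {"displayName": "John Doe"}, "approved": True},
        {"user": {"displayName": "Harry Smith"}, "approved": False},
    ]
}
print(get_reviewers(sample))  # John Doe(A), Harry Smith
```

Building a list and joining it also removes the need to slice off the trailing ", ".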

Turn a simple dictionary into dictionary with nested lists

Given the following data received from a web form:
for key in request.form.keys():
    print key, request.form.getlist(key)
group_name [u'myGroup']
category [u'social group']
creation_date [u'03/07/2013']
notes [u'Here are some notes about the group']
members[0][name] [u'Adam']
members[0][location] [u'London']
members[0][dob] [u'01/01/1981']
members[1][name] [u'Bruce']
members[1][location] [u'Cardiff']
members[1][dob] [u'02/02/1982']
How can I turn it into a dictionary like this? It's eventually going to be used as JSON but as JSON and dictionaries are easily interchanged my goal is just to get to the following structure.
event = {
    group_name : 'myGroup',
    notes : 'Here are some notes about the group',
    category : 'social group',
    creation_date : '03/07/2013',
    members : [
        {
            name : 'Adam',
            location : 'London',
            dob : '01/01/1981'
        },
        {
            name : 'Bruce',
            location : 'Cardiff',
            dob : '02/02/1982'
        }
    ]
}
Here's what I have managed so far. Using the following list comprehension I can easily make sense of the ordinary fields:
event = [ (key, request.form.getlist(key)[0]) for key in request.form.keys() if key[0:7] != "catches" ]
but I'm struggling with the members list. There can be any number of members. I think I need to separately create a list for them and add that to a dictionary with the non-iterative records. I can get the member data like this:
tmp_members = [(key, request.form.getlist(key)) for key in request.form.keys() if key[0:7]=="members"]
Then I can pull out the list index and field name:
member_arr = []
members_orig = [ (key, request.form.getlist(key)[0]) for key in request.form.keys() if key[0:7] == "members" ]
for i in members_orig:
    p1 = i[0].index('[')
    p2 = i[0].index(']')
    members_index = i[0][p1+1:p2]
    p1 = i[0].rfind('[')
    members_field = i[0][p1+1:-1]
But how do I add this to my data structure. The following won't work because I could be trying to process members[1][name] before members[0][name].
members_arr[int(members_index)] = {members_field : i[1]}
This seems very convoluted. Is there a simpler way of doing this, and if not, how can I get this working?
You could store the data in a dictionary and then use the json library.
import json
json_data = json.dumps(my_dict)  # my_dict is a placeholder for the dictionary you built
print(json_data)
This will print a json string.
Check out the json library here
Yes, convert it to a dictionary, then use json.dumps(), with some optional parameters, to print out the JSON in the format you need:
eventdict = {
    'group_name': 'myGroup',
    'notes': 'Here are some notes about the group',
    'category': 'social group',
    'creation_date': '03/07/2013',
    'members': [
        {'name': 'Adam',
         'location': 'London',
         'dob': '01/01/1981'},
        {'name': 'Bruce',
         'location': 'Cardiff',
         'dob': '02/02/1982'}
    ]
}
import json
print json.dumps(eventdict, indent=4)
The order of the key:value pairs is not always consistent, but if you're just looking for pretty-looking JSON that can be parsed by a script, while remaining human-readable, this should work. You can also sort the keys alphabetically, using:
print json.dumps(eventdict, indent=4, sort_keys=True)
The following python functions can be used to create a nested dictionary from the flat dictionary. Just pass in the html form output to decode().
def get_key_name(str):
    first_pos = str.find('[')
    return str[:first_pos]

def get_subkey_name(str):
    '''Used with lists of dictionaries only'''
    first_pos = str.rfind('[')
    last_pos = str.rfind(']')
    return str[first_pos:last_pos+1]

def get_key_index(str):
    first_pos = str.find('[')
    last_pos = str.find(']')
    return str[first_pos:last_pos+1]

def decode(idic):
    odic = {} # Initialise an empty dictionary
    # Scan all the top level keys
    for key in idic:
        # Nested entries have [] in their key
        if '[' in key and ']' in key:
            if key.rfind('[') == key.find('[') and key.rfind(']') == key.find(']'):
                print key, 'is a nested list'
                key_name = get_key_name(key)
                key_index = int(get_key_index(key).replace('[','',1).replace(']','',1))
                # Append can't be used because we may not get the list in the correct order.
                try:
                    odic[key_name][key_index] = idic[key][0]
                except KeyError: # List doesn't yet exist
                    odic[key_name] = [None] * (key_index + 1)
                    odic[key_name][key_index] = idic[key][0]
                except IndexError: # List is too short
                    odic[key_name] = odic[key_name] + ([None] * (key_index - len(odic[key_name]) + 1 ))
                    # TO DO: This could be a function
                    odic[key_name][key_index] = idic[key][0]
            else:
                key_name = get_key_name(key)
                key_index = int(get_key_index(key).replace('[','',1).replace(']','',1))
                subkey_name = get_subkey_name(key).replace('[','',1).replace(']','',1)
                try:
                    odic[key_name][key_index][subkey_name] = idic[key][0]
                except KeyError: # Dictionary doesn't yet exist
                    print "KeyError"
                    # The dictionaries must not be bound to the same object
                    odic[key_name] = [{} for _ in range(key_index+1)]
                    odic[key_name][key_index][subkey_name] = idic[key][0]
                except IndexError: # List is too short
                    # The dictionaries must not be bound to the same object
                    odic[key_name] = odic[key_name] + [{} for _ in range(key_index - len(odic[key_name]) + 1)]
                    odic[key_name][key_index][subkey_name] = idic[key][0]
        else:
            # This can be added to the output dictionary directly
            print key, 'is a simple key value pair'
            odic[key] = idic[key][0]
    return odic
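For comparison, here is a more compact sketch of the same idea using a regular expression, written for Python 3 rather than the Python 2 used above. It is not the answer's method: the name nest_form is hypothetical, it only handles keys of the form name[index][field], it expects each key to map to a single value rather than a list, and it closes up gaps in the indices instead of padding them.

```python
import re

# Matches keys such as "members[0][name]": list name, integer index, field name.
NESTED_KEY = re.compile(r"^(\w+)\[(\d+)\]\[(\w+)\]$")


def nest_form(flat):
    """Build a nested dict from a flat mapping of form keys to single values."""
    result = {}
    grouped = {}  # list name -> {index -> {field -> value}}
    for key, value in flat.items():
        match = NESTED_KEY.match(key)
        if match:
            name, index, field = match.groups()
            grouped.setdefault(name, {}).setdefault(int(index), {})[field] = value
        else:
            # Plain keys are copied across unchanged.
            result[key] = value
    # Sort by index so the arrival order of the form keys does not matter.
    for name, by_index in grouped.items():
        result[name] = [by_index[i] for i in sorted(by_index)]
    return result


flat = {
    "group_name": "myGroup",
    "members[1][name]": "Bruce",
    "members[0][name]": "Adam",
    "members[0][location]": "London",
}
event = nest_form(flat)
print(event)
```

Grouping by integer index first and sorting at the end sidesteps the problem of receiving members[1][name] before members[0][name].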
