Comparing values of two dictionaries' items

I need to compare the values of the items in two different dictionaries.
Let's say that RawData has items that represent phone numbers and their names.
RawData, for example, has items like: {'name': 'Customer Service', 'number': '123987546'} {'name': 'Switchboard', 'number': '48621364'}
Now I have FilteredData, which already contains some items from RawData: {'name': 'IT-support', 'number': '32136994'} {'name': 'Company Customer Service', 'number': '123987546'}
As you can see, Customer Service and Company Customer Service have the same value (number) but different keys (names). In my project there might be hundreds of similar duplicates, and we only want unique numbers to end up in FilteredData.
FilteredData is what we will be using later in the code, and RawData will be discarded.
Their names (keys) can be near-duplicates, but their numbers (values) cannot.
There are two ways to do this:
A. Remove the duplicate items from RawData before appending them to FilteredData.
B. Append them to FilteredData, then go through the numbers (values) there and remove the duplicates. Can I use a set to do that? It would obviously work on a plain list.
I'm not looking for the most time-efficient solution. I'd like the most simple and easy to learn one, if and when someone takes over my job someday. In my project it's mandatory for the next guy working on the code to get a quick grip of it.
I've already looked at sets, and tried to tackle the problem by nesting two for loops, but something tells me there has to be an easier way.
Of course I might have missed the obvious solution here.
Thanks in advance!

I hope I understand your problem here:
data = [{'name': 'Customer Service', 'number': '123987546'}, {'name': 'Switchboard', 'number': '48621364'}]
newdata = [{'name': 'IT-support', 'number': '32136994'}, {'name': 'Company Customer Service', 'number': '123987546'}]
def main():
    # Collect the numbers that are already present
    numbers = set()
    for entry in data:
        numbers.add(entry['number'])
    # Only append entries whose number has not been seen yet
    for entry in newdata:
        if entry['number'] not in numbers:
            data.append(entry)
            numbers.add(entry['number'])  # also catches duplicates within newdata
    print(data)
main()
Output:
[{'name': 'Customer Service', 'number': '123987546'},
{'name': 'Switchboard', 'number': '48621364'},
{'name': 'IT-support', 'number': '32136994'}]

What you can do is take dict.values() and build a set from it to remove duplicates, then go through the old dictionary, find the first key with each value and add it to a new one. Keep the set around: when you get the next dict entry, add its value to the set and check whether the set's length grew. If it did, the element is unique and you can add it to the dict.
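A minimal sketch of the set-length check described above, assuming the question's layout (lists of dicts with 'name' and 'number' keys) rather than a single dict. The lowercase names raw_data, filtered_data and seen_numbers are illustrative only:

```python
# Assumed layout: lists of {'name': ..., 'number': ...} dicts, as in the question
raw_data = [
    {'name': 'Customer Service', 'number': '123987546'},
    {'name': 'Switchboard', 'number': '48621364'},
]
filtered_data = [
    {'name': 'IT-support', 'number': '32136994'},
    {'name': 'Company Customer Service', 'number': '123987546'},
]

# Seed the set with the numbers already in filtered_data
seen_numbers = {item['number'] for item in filtered_data}

for item in raw_data:
    size_before = len(seen_numbers)
    seen_numbers.add(item['number'])
    # If the set grew, this number was new, so keep the item
    if len(seen_numbers) > size_before:
        filtered_data.append(item)

print([item['name'] for item in filtered_data])
# ['IT-support', 'Company Customer Service', 'Switchboard']
```

Checking `item['number'] not in seen_numbers` before adding gives the same result and is arguably easier to read; the length comparison just mirrors the description above.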

If you're willing to change how FilteredData is structured, you can just use a dict with the number as its key:
RawData = [
{'name': 'Customer Service', 'number': '123987546'},
{'name': 'Switchboard', 'number': '48621364'}
]
# Change how FilteredData is structured
FilteredDataMap = {
'32136994':
{'name': 'IT-support', 'number': '32136994'},
'123987546':
{'name': 'Company Customer Service', 'number': '123987546'}
}
for item in RawData:
    number = item.get('number')
    if number not in FilteredDataMap:
        FilteredDataMap[number] = item
# If you need the list of items
FilteredData = list(FilteredDataMap.values())
You can pull the actual list back out of the map using .values().

I take it the numbers are unique. Then another solution is to take advantage of the uniqueness of dictionary keys: convert each list of dictionaries into a dictionary of number: name pairs. Then you simply need to update the converted RawData with the converted FilteredData.
RawData = [
{'name': 'Customer Service', 'number': '123987546'},
{'name': 'Switchboard', 'number': '48621364'}
]
FilteredData = [
{'name': 'IT-support', 'number': '32136994'},
{'name': 'Company Customer Service', 'number': '123987546'}
]
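def convert_list(input_list):
    # Build a {number: name} dict; duplicate numbers collapse onto one key
    return {item['number']: item['name'] for item in input_list}
def unconvert_dict(input_dict):
    # Turn a {number: name} dict back into a list of {'name', 'number'} dicts
    return [{'name': val, 'number': key} for key, val in input_dict.items()]
NewRawData = convert_list(RawData)
NewFilteredData = convert_list(FilteredData)
# dict.update() works in place and returns None, so keep using NewRawData afterwards
NewRawData.update(NewFilteredData)
DesiredResultConverted = NewRawData
DesiredResult = unconvert_dict(DesiredResultConverted)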
In this example, the variables will have the following values (NewRawData as it is before the update):
NewRawData = {'123987546':'Customer Service', '48621364': 'Switchboard'}
NewFilteredData = {'32136994': 'IT-support', '123987546': 'Company Customer Service'}
When you update NewRawData with NewFilteredData, Company Customer Service will overwrite Customer Service as the value associated with the key 123987546. So,
DesiredResultConverted = {'123987546':'Company Customer Service', '48621364': 'Switchboard', '32136994': 'IT-support'}
Then, if you still prefer the original format, you can "unconvert" back.

Related

How to access specific attributes and output information in the object

Hello there, I have a JSON dataset with a whole bunch of entries like the one below.
Each date has multiple entries that look the same; there are many dates, each with multiple people added to it.
So I can get information by using, for example:
json_content['12-1-2021'][0]['name']
{'12-1-2021': [{'initials': 'IS',
'name': 'Sam',
'age': 23,
'address': 'Freedom Dr',
'city': 'Seattle',
'state': 'WA'},
{'initials': 'SD',
'name': 'Sed',
'age': 21,
'address': 'Washington Dr',
'city': 'Phoenix',
'state': 'AZ'}]}
I want to iterate through the dataset and select, for instance, all the people who live in Seattle regardless of the date (I may add the date later; I'm not sure about that requirement yet). But I can't see how to do that without specifying the date up front.
Thank you!

You definitely can. Since you don't care about the keys of the dictionary, just go through the values:
names = [
person['name']
for date in json_content.values()
for person in date if person['city'] == 'Seattle'
]
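If you don't want to make any assumptions about how valid the structure of the JSON is, you can check it explicitly along the way in addition to checking the city. Chain the checks with and rather than wrapping them in all([...]): the list passed to all() is built in full first, so person['city'] would raise an error for a record without a city before all() could short-circuit.
[
    person['name']
    for date in json_content.values() if isinstance(date, list)
    for person in date
    if isinstance(person, dict)
    and 'name' in person
    and person.get('city') == 'Seattle'
]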
Both of these get you ['Sam'] for your sample json.

If you want the complete person record, and you may need to bring the date in later, I would suggest the code below as a starting point.
for person_records in json_content.values():
    # Skip anything that is not a list of people
    if isinstance(person_records, list):
        for person in person_records:
            # .get() returns None instead of raising if "city" is missing
            if person.get("city") == "Seattle":
                print(person)
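If the date does turn out to be needed, iterating over .items() instead of .values() keeps it alongside each match. A sketch using the sample from the question; people_in_city is a hypothetical helper name, and returning (date, person) pairs is an assumption about what the eventual requirement will look like:

```python
# Sample data in the question's shape: {date: [person, ...]}
json_content = {
    '12-1-2021': [
        {'initials': 'IS', 'name': 'Sam', 'age': 23,
         'address': 'Freedom Dr', 'city': 'Seattle', 'state': 'WA'},
        {'initials': 'SD', 'name': 'Sed', 'age': 21,
         'address': 'Washington Dr', 'city': 'Phoenix', 'state': 'AZ'},
    ],
}


def people_in_city(data, city):
    """Hypothetical helper: return (date, person) pairs for everyone in `city`."""
    matches = []
    # .items() yields the date key together with its list of people
    for date, person_records in data.items():
        if not isinstance(person_records, list):
            continue  # skip anything that is not a list of people
        for person in person_records:
            if isinstance(person, dict) and person.get('city') == city:
                matches.append((date, person))
    return matches


for date, person in people_in_city(json_content, 'Seattle'):
    print(date, person['name'])  # 12-1-2021 Sam
```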

List comprehension and flattening deep data structure

Let's say I have a data structure that looks like the following (this is greatly simplified; my actual data has a significant amount of day-specific data for each job on each date):
data = {
'2019-01-01': {
'job-1-id': {'name': 'Job 1', 'address': '123 main st.'},
'job-2-id': {'name': 'Job 2', 'address': '824 1st Ave.'},
},
'2019-01-02': {
'job-1-id': {'name': 'Job 1', 'address': '123 main st.'},
'job-3-id': {'name': 'Job 3', 'address': '485 Pleasant Rd.'}
}
}
What I would like to do is flatten this into a list of dicts, pushing the date and job id into each record. E.g.:
data_flat = [
{'id': 'job-1-id', 'date': '2019-01-01', 'name': 'Job 1', 'address': '123 main st.'},
{'id': 'job-2-id', 'date': '2019-01-01', 'name': 'Job 2', 'address': '824 1st Ave.'},
{'id': 'job-1-id', 'date': '2019-01-02', 'name': 'Job 1', 'address': '123 main st.'},
{'id': 'job-3-id', 'date': '2019-01-02', 'name': 'Job 3', 'address': '485 Pleasant Rd.'},
]
Obviously I can loop over it and build a new list:
data_flat = []
for date, jobs in data.items():
    for job_id, job in jobs.items():
        data_flat.append({'id': job_id, 'date': date, 'name': job['name'], 'address': job['address']})
But is there a more pythonic/efficient way to do this using list comprehension with nested data like this? About all I can think of is using list comprehension for the inner loop, and then instead of appending, using extend to build the list. Thoughts?

A possible list comprehension solution is the following: unpack each job dictionary and add the id and date key-value pairs to it, while iterating over both loops:
[{**job, 'id': job_id, 'date': date} for date, jobs in data.items() for job_id, job in jobs.items()]
Which, written as a traditional for loop, looks like:
data_flat = []
for date, jobs in data.items():
    for job_id, job in jobs.items():
        data_flat.append({**job, 'id': job_id, 'date': date})
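The extend-based idea from the question (a comprehension for the inner loop, extend for the outer one) is also perfectly readable. A sketch using the sample data from the question, passing a generator expression straight to list.extend:

```python
data = {
    '2019-01-01': {
        'job-1-id': {'name': 'Job 1', 'address': '123 main st.'},
        'job-2-id': {'name': 'Job 2', 'address': '824 1st Ave.'},
    },
    '2019-01-02': {
        'job-1-id': {'name': 'Job 1', 'address': '123 main st.'},
        'job-3-id': {'name': 'Job 3', 'address': '485 Pleasant Rd.'},
    },
}

data_flat = []
for date, jobs in data.items():
    # The inner loop is a generator expression; extend() consumes it and
    # appends one flattened record per job for this date.
    data_flat.extend(
        {'id': job_id, 'date': date, **job} for job_id, job in jobs.items()
    )

for row in data_flat:
    print(row)
```

This keeps each loop level visible on its own line, which some readers find easier to follow than a single comprehension with two for clauses; the resulting list is the same.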

A flat list can also be built with reduce. The key is the initializer, the third argument of reduce, which starts the accumulator off as an empty list:
from functools import reduce
data_flat = reduce(
    # extend() returns None, so "or _list" passes the accumulator on to the next step
    lambda _list, date: _list.extend(
        {'date': date, 'id': _id, **detail} for _id, detail in data[date].items()) or _list,
    data,
    [])
In Python 3, reduce is no longer a built-in and has to be imported from functools, as in the first line. Note that the {**detail} dict unpacking needs Python 3.5 or later, so this version does not run on Python 2.

Filtering through a list with embedded dictionaries

I've got a JSON-format list of dictionaries, each with a nested venue dictionary. It looks like the following:
[{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
The number of entries in the list can be up to 100. I plan to present the 'name' of each entry that has London as its town, one result at a time. The rest are of no use to me. I'm a beginner at Python, so I would appreciate a suggestion on how to go about this efficiently. I initially thought it would be best to remove all entries that don't have London and then go through the remaining ones one by one.
I also wondered if it might be quicker to not filter but to cycle through the entire json and select the names of entries that have the town as London.

You can use filter:
data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
london_dicts = filter(lambda d: d['venue']['town'] == 'London', data)
for d in london_dicts:
print(d)
This is efficient because:
The loop itself runs in C (in CPython), although the lambda is still called as ordinary Python code for each item
filter returns a lazy iterator (in Python 3), so results are produced one at a time as they are needed rather than all at once
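Since the question asks to present each name one result at a time, the same lazy behaviour is available from a generator expression that yields just the names, with next() pulling one match per call. A sketch on the question's data:

```python
data = [{"id": 13, "name": "Albert", "venue": {"id": 123, "town": "Birmingham"}, "month": "February"},
        {"id": 17, "name": "Alfred", "venue": {"id": 456, "town": "London"}, "month": "February"},
        {"id": 20, "name": "David", "venue": {"id": 14, "town": "Southampton"}, "month": "June"},
        {"id": 17, "name": "Mary", "venue": {"id": 56, "town": "London"}, "month": "December"}]

# Generator expression: nothing is computed until next() asks for a value
london_names = (d['name'] for d in data if d['venue']['town'] == 'London')

first = next(london_names)           # 'Alfred'
second = next(london_names)          # 'Mary'
# A default stops next() from raising StopIteration once the matches run out
leftover = next(london_names, None)  # None

print(first, second, leftover)  # Alfred Mary None
```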

One way is to use list comprehension:
>>> data = [{"id":13, "name":"Albert", "venue":{"id":123, "town":"Birmingham"}, "month":"February"},
{"id":17, "name":"Alfred", "venue":{"id":456, "town":"London"}, "month":"February"},
{"id":20, "name":"David", "venue":{"id":14, "town":"Southampton"}, "month":"June"},
{"id":17, "name":"Mary", "venue":{"id":56, "town":"London"}, "month":"December"}]
>>> [d for d in data if d['venue']['town'] == 'London']
[{'id': 17,
'name': 'Alfred',
'venue': {'id': 456, 'town': 'London'},
'month': 'February'},
{'id': 17,
'name': 'Mary',
'venue': {'id': 56, 'town': 'London'},
'month': 'December'}]

Filtering Items in dictionary

I would like to filter one dictionary down to the items that also appear in another dictionary. So, say that I have two dictionaries, dict1 and dict2, where
dict1 = {
1:{'account_id':1234, 'case':1234, 'date': '12/31/15', 'content': 'some content'},
2:{'account_id':1235, 'case':1235, 'date': '12/15/15', 'content': 'some content'}
}
dict2 = {
1:{'account_id':1234, 'case':1234, 'date': '12/31/15', 'content': 'some different content'},
2:{'account_id':4321, 'case':4321, 'date': '6/12/15', 'content': 'some different content'},
3:{'account_id':1235, 'case':1235, 'date': '12/15/15', 'content': 'some different content'}
}
I would like to match on account_id, case and date, and have the output be a third dictionary containing the matching entries from dict2 (keys 1 and 3), renumbered:
out = {
1:{'account_id':1234, 'case':1234, 'date': '12/31/15', 'content': 'some different content'},
2:{'account_id':1235, 'case':1235, 'date': '12/15/15', 'content': 'some different content'}
}
How would I accomplish this? I am using Python 3.5

Well then, I believe this is what you are looking for:
from itertools import count
from operator import itemgetter
# Set the criteria for unique entry (prevents us from needing to write this twice)
get_identifier = itemgetter("account_id","case","date")
# Create a set of all unique items.
unique_entries = set(map(get_identifier, dict1.values()))
# Get all entries that match one of the unique entries
matched_entries = (d for d in dict2.values() if get_identifier(d) in unique_entries)
# Build a new dict, numbering the matches from 1.
out = dict(zip(count(1), matched_entries))
For more info about count() and itemgetter(), see their respective docs.
Set membership tests take O(1) time on average, and the generator expression avoids building an intermediate list, so the whole filter runs in roughly linear time in the size of the two dicts.
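If pulling in itertools.count feels like overkill, the built-in enumerate() with start=1 produces the same numbering. A sketch using the question's data, with the dates written as strings:

```python
from operator import itemgetter

dict1 = {
    1: {'account_id': 1234, 'case': 1234, 'date': '12/31/15', 'content': 'some content'},
    2: {'account_id': 1235, 'case': 1235, 'date': '12/15/15', 'content': 'some content'},
}
dict2 = {
    1: {'account_id': 1234, 'case': 1234, 'date': '12/31/15', 'content': 'some different content'},
    2: {'account_id': 4321, 'case': 4321, 'date': '6/12/15', 'content': 'some different content'},
    3: {'account_id': 1235, 'case': 1235, 'date': '12/15/15', 'content': 'some different content'},
}

# Matching criteria: a tuple of the three fields
get_identifier = itemgetter('account_id', 'case', 'date')
unique_entries = {get_identifier(d) for d in dict1.values()}

matched_entries = (d for d in dict2.values() if get_identifier(d) in unique_entries)
# enumerate(..., start=1) pairs each match with 1, 2, ... just like zip(count(1), ...)
out = dict(enumerate(matched_entries, start=1))

print(out)
```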

Use list of indices to manipulate a nested dictionary

I'm trying to perform operations on a nested dictionary (data retrieved from a yaml file):
data = {'services': {'web': {'name': 'x'}}, 'networks': {'prod': 'value'}}
I'm trying to modify the above using the inputs like:
{'services.web.name': 'new'}
I converted the above to a list of indices, ['services', 'web', 'name'], but I'm not sure how to perform the following operation in a loop:
data['services']['web']['name'] = 'new'
That way I can modify the data dict. There are other values I plan to change in the above dictionary (it is an extensive one), so I need a solution that also works in cases where I have to change, e.g.:
data['services2']['web2']['networks']['local'].
Is there an easy way to do this? Any help is appreciated.

You may iterate over the keys while moving a reference:
data = {'networks': {'prod': 'value'}, 'services': {'web': {'name': 'x'}}}
modification = {'services.web.name': 'new'}
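for key, value in modification.items():
    keyparts = key.split('.')
    # Walk down to the dict that holds the final key
    to_modify = data
    for keypart in keyparts[:-1]:
        to_modify = to_modify[keypart]
    # Assign through the reference; this mutates the original nested dict
    to_modify[keyparts[-1]] = value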
print(data)
Giving:
{'networks': {'prod': 'value'}, 'services': {'web': {'name': 'new'}}}
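The loop above raises a KeyError if an intermediate key such as 'services2' does not exist yet, which the question's second example suggests may happen. If creating missing levels is the desired behaviour (an assumption), dict.setdefault can do it while walking down. A sketch; set_by_path and the value 'value2' are hypothetical:

```python
def set_by_path(data, dotted_key, value):
    """Hypothetical helper: set data[k1][k2]...[kn] = value for 'k1.k2...kn',
    creating any missing intermediate dicts along the way."""
    *parents, last = dotted_key.split('.')
    node = data
    for key in parents:
        # setdefault returns the existing sub-dict, or inserts {} and returns it
        node = node.setdefault(key, {})
    node[last] = value


data = {'services': {'web': {'name': 'x'}}, 'networks': {'prod': 'value'}}
set_by_path(data, 'services.web.name', 'new')
set_by_path(data, 'services2.web2.networks.local', 'value2')  # hypothetical value
print(data)
```

If an intermediate value is not a dict (for example the string under 'networks.prod'), it has no setdefault method and this raises AttributeError, so an explicit isinstance check may be worth adding for messy YAML input.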
