Python dictionaries, find similarities

I have a Python dictionary with a thousand items, each of which is itself a dictionary. I'm looking for a clean and elegant way to go through the items and find and create templates.
Here's a simplified example of the individual dictionaries' structure:
{'id': 1,
'template': None,
'height': 80,
'width': 120,
'length': 75,
'weight': 100}
From this, I want to pass through once and, if 500 of the 1000 items share the same height and width, detect that so I can build a template from that data and assign the template id to 'template'. I could build a gigantic reference hash, but I'm hoping there's a cleaner, more elegant way to accomplish this.
The actual data includes closer to 30 keys, of which a small subset needs to be excluded from the template checking.

Given a dict of dicts named items:
import itertools as it
for (height, width), itemIter in it.groupby(items.values(), lambda x: (x['height'], x['width'])):
    # list(itemIter) holds the items in this group with dimensions (height, width)
    group = list(itemIter)

@eumiro had an excellent core idea, namely using itertools.groupby() to arrange the items with common values together in batches. However, besides neglecting to sort things first using the same key function, as @Jochen Ritzel pointed out (and as the documentation also mentions), he also didn't address several other things you mentioned wanting to do.
Below is a more complete and somewhat longer answer. It determines the templates and assigns them in one pass through the dict-of-dicts. To do this, it first creates a sorted list of items, then uses groupby() to batch them and, if there are enough in a group, creates a template and assigns its ID to each member.
inventory = {
    'item1': {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100},
    'item2': {'id': 2, 'template': None, 'height': 30, 'width': 40, 'length': 20, 'weight': 20},
    'item3': {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150},
    'item4': {'id': 4, 'template': None, 'height': 30, 'width': 40, 'length': 60, 'weight': 75},
    'item5': {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33}
}
import itertools as itools
def print_inventory():
    print 'inventory:'
    for key in sorted(inventory.iterkeys()):
        print ' {}: {}'.format(key, inventory[key])
print "-- BEFORE --"
print_inventory()
THRESHOLD = 2
ALLKEYS = ['template', 'height', 'width', 'length', 'weight']
EXCLUDEDKEYS = ['template', 'length', 'weight']
INCLUDEDKEYS = [key for key in ALLKEYS if key not in EXCLUDEDKEYS]
# determines which keys make up a template
sortby = lambda item, keys=INCLUDEDKEYS: tuple(item[key] for key in keys)
templates = {}
templateID = 0
sortedinventory = sorted(inventory.itervalues(), key=sortby)
for templatetuple, similariter in itools.groupby(sortedinventory, sortby):
    similaritems = list(similariter)
    if len(similaritems) >= THRESHOLD:
        # create and assign a template
        templateID += 1
        templates[templateID] = templatetuple  # tuple of values of INCLUDEDKEYS
        for item in similaritems:
            item['template'] = templateID
print
print "-- AFTER --"
print_inventory()
print
print 'templates:', templates
print
When I run it, the following is the output:
-- BEFORE --
inventory:
item1: {'weight': 100, 'height': 80, 'width': 120, 'length': 75, 'template': None, 'id': 1}
item2: {'weight': 20, 'height': 30, 'width': 40, 'length': 20, 'template': None, 'id': 2}
item3: {'weight': 150, 'height': 80, 'width': 100, 'length': 96, 'template': None, 'id': 3}
item4: {'weight': 75, 'height': 30, 'width': 40, 'length': 60, 'template': None, 'id': 4}
item5: {'weight': 33, 'height': 80, 'width': 100, 'length': 36, 'template': None, 'id': 5}
-- AFTER --
inventory:
item1: {'weight': 100, 'height': 80, 'width': 120, 'length': 75, 'template': None, 'id': 1}
item2: {'weight': 20, 'height': 30, 'width': 40, 'length': 20, 'template': 1, 'id': 2}
item3: {'weight': 150, 'height': 80, 'width': 100, 'length': 96, 'template': 2, 'id': 3}
item4: {'weight': 75, 'height': 30, 'width': 40, 'length': 60, 'template': 1, 'id': 4}
item5: {'weight': 33, 'height': 80, 'width': 100, 'length': 36, 'template': 2, 'id': 5}
templates: {1: (30, 40), 2: (80, 100)}
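The answer above is written for Python 2 (print statements, iterkeys(), itervalues()). As a rough Python 3 sketch of the same idea, the items can instead be bucketed in a plain dictionary keyed by the values being compared, which avoids the sort that groupby() needs. The function name assign_templates and its parameters are hypothetical, and the sketch assumes that every compared value is hashable and that 'id' is listed among the excluded keys:

```python
from collections import defaultdict


def assign_templates(items, excluded_keys, threshold):
    """Group items by their non-excluded values and tag the large groups."""
    groups = defaultdict(list)
    for item in items.values():
        # Sort the (key, value) pairs so every item builds its signature
        # in the same order, whatever the insertion order of its keys.
        signature = tuple(sorted(
            (key, value) for key, value in item.items()
            if key not in excluded_keys
        ))
        groups[signature].append(item)

    templates = {}
    for signature, members in groups.items():
        if len(members) >= threshold:
            template_id = len(templates) + 1
            templates[template_id] = dict(signature)
            for item in members:
                item['template'] = template_id
    return templates


inventory = {
    'item1': {'id': 1, 'template': None, 'height': 80, 'width': 120, 'length': 75, 'weight': 100},
    'item2': {'id': 2, 'template': None, 'height': 30, 'width': 40, 'length': 20, 'weight': 20},
    'item3': {'id': 3, 'template': None, 'height': 80, 'width': 100, 'length': 96, 'weight': 150},
    'item4': {'id': 4, 'template': None, 'height': 30, 'width': 40, 'length': 60, 'weight': 75},
    'item5': {'id': 5, 'template': None, 'height': 80, 'width': 100, 'length': 36, 'weight': 33},
}

templates = assign_templates(
    inventory,
    excluded_keys={'id', 'template', 'length', 'weight'},
    threshold=2,
)
```

With this five-item inventory, item2 and item4 share one template, item3 and item5 share another, and item1 keeps 'template': None, matching the output shown above.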

Related

Dictionary to Pandas Dataframe without un-nesting some values

I have the dictionary below, and I only want the columns to be key, metrics and collectionPeriod. These columns can hold nested values, which I would leave for now and un-nest later. But for some reason the values in the dataframe look off.
{'key': {'formFactor': 'PHONE', 'origin': 'https://www.sample'},
'metrics': {'cumulative_layout_shift': {'histogram': [{'start': '0.00',
'end': '0.10',
'density': 0.7861256879559706},
{'start': '0.10', 'end': '0.25', 'density': 123},
{'start': '0.25', 'density': 111}],
'percentiles': {'p75': '0.07'}},
'experimental_interaction_to_next_paint': {'histogram': [{'start': 0,
'end': 200,
'density': 0.5416453755748598},
{'start': 200, 'end': 500, 'density': 1},
{'start': 500, 'density': 23}],
'percentiles': {'p75': 504}},
'experimental_time_to_first_byte': {'histogram': [{'start': 0,
'end': 800,
'density': 123},
{'start': 800, 'end': 1800, 'density': 123},
{'start': 1800, 'density': 23}],
'percentiles': {'p75': 877}},
'first_contentful_paint': {'histogram': [{'start': 0,
'end': 1800,
'density': 22},
{'start': 1800, 'end': 3000, 'density': 664},
{'start': 3000, 'density': 67}],
'percentiles': {'p75': 1662}},
'first_input_delay': {'histogram': [{'start': 0,
'end': 100,
'density': 234},
{'start': 100, 'end': 300, 'density': 44},
{'start': 300, 'density': 555}],
'percentiles': {'p75': 34}},
'largest_contentful_paint': {'histogram': [{'start': 0,
'end': 2500,
'density': 0.7725250984877367},
{'start': 2500, 'end': 4000, 'density': 777},
{'start': 4000, 'density': 544}],
'percentiles': {'p75': 2352}}},
'collectionPeriod': {'firstDate': {'year': 2022, 'month': 10, 'day': 14},
'lastDate': {'year': 2022, 'month': 11, 'day': 10}}}
When I pass the dictionary above (res) to the code below, the keys of the nested dictionaries end up as the index, which is not what I want. The dataframe should have only 1 row:
import pandas as pd
df = pd.DataFrame.from_dict(res, orient='columns')
df

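Given the format of the data, wrap the dictionary in a list before passing it to pd.DataFrame.from_dict(), so that it is treated as a single record and produces one row with the nested values left as they are:
df = pd.DataFrame.from_dict([res])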

Nested dictionary from a txt with the dictionary

I have a txt file with the dictionary like this:
{'origin': {'Ukraine': 50, 'Portugal': 20, 'others': 10}, 'native language': {'ucranian': 50; 'english': 45, 'russian': 30, 'others': 10}, 'second language': {'ucranian': 50; 'english': 45, 'russian': 30, 'others': 10, 'none': 0}, 'profession': {'medical doctor': 50, 'healthcare professional': 40, 'cooker': 30, 'others': 10, 'spy': 0}, 'first aid skills': {'yes': 50, 'no': 0}, 'driving skills': {'yes': 40, 'no': 0}, 'cooking skills': {'yes': 50, 'some': 30, 'no': 0}, 'IT skills': {'yes': 50, 'little': 35, 'no': 0}}
I want to create a dictionary from this.
I tried using ast.literal_eval, but it gives me the following error:
SyntaxError: expression expected after dictionary key and ':'
This is my code:
import ast
def helpersSkills(helpersFile, skillsFile):
    """
    """
    helpers = open(helpersFile, 'r')
    skills = open(skillsFile, 'r')
    skillsLines = skills.read()
    dictionary = ast.literal_eval(skillsLines)
    ...
helpersSkills('helpersArrived2.txt', 'skills.txt')

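As @ThierryLathuille said, it was just typos in the txt file: a semicolon instead of a comma after 'ucranian': 50 in both the 'native language' and 'second language' entries.
With those fixed, it works: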
{'origin': {'Ukraine': 50, 'Portugal': 20, 'others': 10}, 'native language': {'ucranian': 50, 'english': 45, 'russian': 30, 'others': 10}, 'second language': {'ucranian': 50, 'english': 45, 'russian': 30, 'others': 10, 'none': 0}, 'profession': {'medical doctor': 50, 'healthcare professional': 40, 'cooker': 30, 'others': 10, 'spy': 0}, 'first aid skills': {'yes': 50, 'no': 0}, 'driving skills': {'yes': 40, 'no': 0}, 'cooking skills': {'yes': 50, 'some': 30, 'no': 0}, 'IT skills': {'yes': 50, 'little': 35, 'no': 0}}
This is the code:
import ast
def helpersSkills(helpersFile, skillsFile):
    """
    """
    helpers = open(helpersFile, 'r')
    skills = open(skillsFile, 'r')
    skillsLines = skills.read()
    dictionary = ast.literal_eval(skillsLines)
    ...
helpersSkills('helpersArrived2.txt', 'skills.txt')
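When a file like this fails to parse, the SyntaxError raised by ast.literal_eval carries the position of the problem, which can help locate a stray character such as the semicolons above. A minimal sketch, assuming the text has already been read from the file; the helper name parse_dict_literal and the two sample strings are hypothetical:

```python
import ast


def parse_dict_literal(text):
    """Parse a dict literal, reporting where the text is malformed."""
    try:
        value = ast.literal_eval(text)
    except SyntaxError as err:
        # lineno and offset point at the place the parser gave up.
        raise ValueError(
            f"malformed literal at line {err.lineno}, column {err.offset}: {err.msg}"
        ) from err
    if not isinstance(value, dict):
        raise ValueError("expected a dictionary literal")
    return value


good = "{'native language': {'ucranian': 50, 'english': 45}}"
bad = "{'native language': {'ucranian': 50; 'english': 45}}"

skills = parse_dict_literal(good)

try:
    parse_dict_literal(bad)
    bad_error = None
except ValueError as err:
    bad_error = str(err)
```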

How to append the sum of keys of each dictionary to another key?

I have JSON-like data as below:
l = {'itc': 'ball','classes': [{'scores': [{'len': 89,'width':50},{'len': 27,'width': 17}]},
{'scores': [{'len': 90,'width': 44},{'len': 0,'width': 0}]},
{'scores': [{'len': 50,'width': 26},{'len': 0,'width': 0}]}]}
Now I want to create a new list of dictionaries, like below:
output= [{'result': [{'len': 89, 'width': 50}, {'len': 27, 'width': 17}], 'total': 116}, {'result': [{'len': 90, 'width': 44}, {'len': 0, 'width': 0}], 'total': 90}, {'result': [{'len': 50, 'width': 26}, {'len': 0, 'width': 0}], 'total': 50}]
I was able to split the values and put them in the required format, but I can't get the sum of the 'len' values of each dictionary into that dictionary's own 'total'. Instead, my code accumulates the values across all the dictionaries. The code and the output I got are as follows:
added=[]
output=[]
for k,v in l.items():
    if k=='classes':
        for i in v:
            for ke,ve in i.items():
                if ke=='scores':
                    for j in ve:
                        for key,val in j.items():
                            if key=='len':
                                add = val
                                added.append(add)
                    sumed=sum(added)
                    out={'result':ve,'total':sumed}
                    output.append(out)
print(output)
Output I got:
[{'result': [{'len': 89, 'width': 50}, {'len': 27, 'width': 17}], 'total': 116}, {'result': [{'len': 90, 'width': 44}, {'len': 0, 'width': 0}], 'total': 206}, {'result': [{'len': 50, 'width': 26}, {'len': 0, 'width': 0}], 'total': 256}]
As you can see, it is summing up all the values so far and appending them to the key total. How do I put the sum of each dictionary's scores into the total key of that dictionary's result, as below?
output= [{'result': [{'len': 89, 'width': 50}, {'len': 27, 'width': 17}], 'total': 116}, {'result': [{'len': 90, 'width': 44}, {'len': 0, 'width': 0}], 'total': 90}, {'result': [{'len': 50, 'width': 26}, {'len': 0, 'width': 0}], 'total': 50}]

Use sum to get the total:
res = [{"result" : cl["scores"], "total" : sum(d["len"] for d in cl["scores"])} for cl in l["classes"]]
print(res)
Output
[{'result': [{'len': 89, 'width': 50}, {'len': 27, 'width': 17}], 'total': 116}, {'result': [{'len': 90, 'width': 44}, {'len': 0, 'width': 0}], 'total': 90}, {'result': [{'len': 50, 'width': 26}, {'len': 0, 'width': 0}], 'total': 50}]
Or the equivalent for-loop:
res = []
for cl in l["classes"]:
    scores = cl["scores"]
    total = sum(d["len"] for d in scores)
    res.append({"result": scores, "total": total})
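The running total in the question comes from the added list being created once, before the loops, so each class keeps appending to the values left over from the previous ones. A sketch of the smallest change to the original approach, creating a fresh list for every class (the loop variable names are illustrative):

```python
l = {'itc': 'ball', 'classes': [
    {'scores': [{'len': 89, 'width': 50}, {'len': 27, 'width': 17}]},
    {'scores': [{'len': 90, 'width': 44}, {'len': 0, 'width': 0}]},
    {'scores': [{'len': 50, 'width': 26}, {'len': 0, 'width': 0}]},
]}

output = []
for cl in l['classes']:
    added = []  # reset for every class instead of once at the top
    for score in cl['scores']:
        added.append(score['len'])
    output.append({'result': cl['scores'], 'total': sum(added)})

totals = [entry['total'] for entry in output]
```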

How do i sort a dictionary by a 'subkey' in Python

Could I get some guidance on how to do this?
Variable:
people = {'adam': {'distance': 14, 'age': 22, 'height': 1.3}, 'charles': {'distance': 3, 'age': 37, 'height': 1.4}, 'jeff': {'distance': 46, 'age': 42, 'height': 1.6}}
My intended output after sorting the people variable by the subkey 'distance':
people = {'charles': {'distance': 3, 'age': 37, 'height': 1.4}, 'adam': {'distance': 14, 'age': 22, 'height': 1.3}, 'jeff': {'distance': 46, 'age': 42, 'height': 1.6}}

Most answers provide a way to create a new dictionary based on the contents of the old one. If you want to simply reorder the keys of the existing dictionary, you can do something similar:
for k in sorted(people, key=lambda x: people[x]['distance']):
    people[k] = people.pop(k)
When a key is removed, it is also removed from the iteration order. Adding it back makes it the last key in the iteration order. Repeat this for every key, and you redefine the iteration order of the keys. This works because sorted completes its iteration over the dict before the for loop starts modifying it.
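One practical difference from building a new dictionary is that every other reference to the same dict object sees the new order too. A small sketch, assuming Python 3.7 or later (where dicts preserve insertion order); the alias variable exists only for illustration:

```python
people = {
    'adam': {'distance': 14, 'age': 22, 'height': 1.3},
    'charles': {'distance': 3, 'age': 37, 'height': 1.4},
    'jeff': {'distance': 46, 'age': 42, 'height': 1.6},
}
alias = people  # a second reference to the same dict object

# sorted() builds its list of keys before the loop starts mutating the dict.
for k in sorted(people, key=lambda x: people[x]['distance']):
    people[k] = people.pop(k)

order = list(alias)  # key order as seen through the other reference
```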

Just use sorted()
people = dict(sorted(people.items(), key=lambda x: x[1]['distance']))
or
people = {k: v for k, v in sorted(people.items(), key=lambda x: x[1]['distance'])}

Try the following code:
people = {'adam': {'distance': 14, 'age': 22, 'height': 1.3}, 'charles': {'distance': 3, 'age': 37, 'height': 1.4}, 'jeff': {'distance': 46, 'age': 42, 'height': 1.6}}
people = dict(sorted(people.items(), key=lambda item: item[1]['distance'], reverse=False))
print(people)
Output:
{'charles': {'distance': 3, 'age': 37, 'height': 1.4}, 'adam': {'distance': 14, 'age': 22, 'height': 1.3}, 'jeff': {'distance': 46, 'age': 42, 'height': 1.6}}

Python: mimic the effect of groupby with list/dict comprehension

I need to transform a list of dict:
original = [
{'type': 'a', 'length': 34, 'width': 74},
{'type': 'a', 'length': 15, 'width': 22},
{'type': 'b', 'length': 53, 'width': 54},
{'type': 'b', 'length': 11, 'width': 45},
]
into a dict with the value of type key as the key:
expected = {
'a': [
{'type': 'a', 'length': 34, 'width': 74},
{'type': 'a', 'length': 15, 'width': 22},
],
'b': [
{'type': 'b', 'length': 53, 'width': 54},
{'type': 'b', 'length': 11, 'width': 45},
],
}
This can be achieved with itertools.groupby or by iterating through the list manually, but is there any way to do it with just list/dict comprehension?

You could do something like this:
{t: [i for i in original if i['type'] == t] for t in {i['type'] for i in original}}
But it's both difficult to read and has a worst-case runtime complexity of O(n²), where n is the number of items in the list. Using itertools.groupby on a sorted list is both faster and easier to read.
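For comparison, here is a sketch of the groupby version the answer refers to, next to a single-pass collections.defaultdict alternative (the latter is not from the answer, just a common idiom). Both use the original list from the question; the names by_type, grouped and buckets are illustrative:

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

original = [
    {'type': 'a', 'length': 34, 'width': 74},
    {'type': 'a', 'length': 15, 'width': 22},
    {'type': 'b', 'length': 53, 'width': 54},
    {'type': 'b', 'length': 11, 'width': 45},
]

by_type = itemgetter('type')

# groupby() only merges adjacent items, so sort by the same key first.
grouped = {
    t: list(items)
    for t, items in groupby(sorted(original, key=by_type), key=by_type)
}

# Single pass over the list, no sorting needed.
buckets = defaultdict(list)
for item in original:
    buckets[item['type']].append(item)
```

Because sorted() is stable, the items within each group keep their original relative order in both versions.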
