Dictionary list/dict comparison - python

I would really appreciate any help with the below. From a list of dicts, I am looking to collapse all duplicate names into a single entry, with a second value holding the total of the prices for that name. I have compiled the below code as an example:
l = [{'id': 1, 'name': 'apple', 'price': '100', 'year': '2000', 'currency': 'eur'},
{'id': 2, 'name': 'apple', 'price': '150', 'year': '2071', 'currency': 'eur'},
{'id': 3, 'name': 'apple', 'price': '1220', 'year': '2076', 'currency': 'eur'},
{'id': 4, 'name': 'cucumber', 'price': '90000000', 'year': '2080', 'currency': 'eur'},
{'id': 5, 'name': 'pear', 'price': '1000', 'year': '2000', 'currency': 'eur'},
{'id': 6, 'name': 'apple', 'price': '150', 'year': '2022', 'currency': 'eur'},
{'id': 9, 'name': 'apple', 'price': '100', 'year': '2000', 'currency': 'eur'},
{'id': 10, 'name': 'grape', 'price': '150', 'year': '2022', 'currency': 'eur'},
]
new_list = []
for d in l:
    if d['name'] not in new_list:
        new_list.append(d['name'])
print(new_list)
price_list = []
for price in l:
    if price['price'] not in price_list:
        price_list.append(price['price'])
print(price_list)
The output I am hoping to achieve is:
[{'name': 'apple'}, {'price': <The total price for all apples>}]

Use a dictionary whose keys are the names and whose values are lists of prices. Then sum each list.
d = {}
for item in l:
    d.setdefault(item['name'], []).append(int(item['price']))
# replace each list of prices with its total
for name, prices in d.items():
    d[name] = sum(prices)
print(d)
Actually, I thought this was the same as yesterday's question, where you wanted the average. If you just want the total, you don't need the lists. Use a defaultdict containing integers, and just add the price to it.
from collections import defaultdict
d = defaultdict(int)
for item in l:
    d[item['name']] += int(item['price'])
print(d)
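If the average from the earlier question is wanted alongside the total, keeping the lists is what makes both possible. A minimal sketch, assuming a shortened sample list; the names items, prices_by_name and summary are illustrative and not from the question:

```python
from collections import defaultdict
from statistics import mean

# Shortened, hypothetical sample in the same shape as the question's data.
items = [
    {'id': 1, 'name': 'apple', 'price': '100'},
    {'id': 2, 'name': 'apple', 'price': '150'},
    {'id': 5, 'name': 'pear', 'price': '1000'},
]

# Collect every price under its name, converting the strings to ints.
prices_by_name = defaultdict(list)
for item in items:
    prices_by_name[item['name']].append(int(item['price']))

# Derive both figures from the collected lists.
summary = {
    name: {'total': sum(prices), 'average': mean(prices)}
    for name, prices in prices_by_name.items()
}
print(summary)
```

If only the total is needed, the defaultdict(int) version avoids storing the lists at all.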

This method only requires one loop:
prices = {}
for item in l:
    prices.update({item['name']: prices.get(item['name'], 0) + int(item['price'])})
print(prices)
Just for fun I decided to also implement the functionality with the item and price dictionaries separated as asked in the question, which gave the following horrendous code:
prices = []
for item in l:
    # get indices of prices of corresponding items
    price_idx = [n+1 for n, x in enumerate(prices) if item['name'] == x.get('name') and n % 2 == 0]
    if not price_idx:
        prices.append({'name': item['name']})
        prices.append({'price': int(item['price'])})
    else:
        prices[price_idx[0]].update({'price': prices[price_idx[0]]['price'] + int(item['price'])})
print(prices)
And requires the following function to retrieve prices:
def get_price(name):
    # names sit at even indices; the matching price dict is the next element
    for n, x in enumerate(prices):
        if n % 2 == 0 and x['name'] == name:
            return prices[n+1]['price']
Which honestly completely defeats the point of having a data structure. But if it answers your question, there you go.

This could be another one:
result = {}
for item in l:
    if item['name'] not in result:
        result[item['name']] = {'name': item['name'], 'price': 0}
    result[item['name']]['price'] += int(item['price'])
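The result dict in this approach is keyed by name; if a plain list of records is wanted instead, its values can be taken at the end. A small sketch under the assumption of a shortened sample list (items and records are illustrative names, not from the question):

```python
# Shortened, hypothetical sample in the same shape as the question's data.
items = [
    {'id': 1, 'name': 'apple', 'price': '100'},
    {'id': 5, 'name': 'pear', 'price': '1000'},
    {'id': 6, 'name': 'apple', 'price': '150'},
]

result = {}
for item in items:
    # Create the record on first sight of a name, then add to its total.
    entry = result.setdefault(item['name'], {'name': item['name'], 'price': 0})
    entry['price'] += int(item['price'])

# One record per name, in first-seen order (dicts keep insertion order in Python 3.7+).
records = list(result.values())
print(records)
```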

Related

Create a list of lists from a dictionary python

I have a list of dictionaries that I am wanting to convert to a nested list with the first element of that list(lst[0]) containing the dictionary keys and the rest of the elements of the list containing values for each dictionary.
[{'id': '123',
'name': 'bob',
'city': 'LA'},
{'id': '321',
'name': 'sally',
'city': 'manhattan'},
{'id': '125',
'name': 'fred',
'city': 'miami'}]
My expected output result is:
[['id','name','city'], ['123','bob','LA'],['321','sally','manhattan'],['125','fred','miami']]
What would be a way to go about this? Any help would be greatly appreciated.
You can use:
d = [{'id': '123',
'name': 'bob',
'city': 'LA'},
{'id': '321',
'name': 'sally',
'city': 'manhattan'},
{'id': '125',
'name': 'fred',
'city': 'miami'}]
[[k for k in d[0].keys()], *[list(i.values()) for i in d ]]
output:
[['id', 'name', 'city'],
['123', 'bob', 'LA'],
['321', 'sally', 'manhattan'],
['125', 'fred', 'miami']]
First, you get a list of the keys, then a list of the values for every inner dict:
>>> d = [{'id': '123',
'name': 'bob',
'city': 'LA'},
{'id': '321',
'name': 'sally',
'city': 'manhattan'},
{'id': '125',
'name': 'fred',
'city': 'miami'}]
>>> [list(d[0].keys())]+[list(i.values()) for i in d]
[['id', 'name', 'city'], ['123', 'bob', 'LA'], ['321', 'sally', 'manhattan'], ['125', 'fred', 'miami']]
Serious suggestion: To avoid the possibility of some dicts having a different iteration order, base the order off the first entry and use operator.itemgetter to get a consistent order from all entries efficiently:
import operator
d = [{'id': '123',
'name': 'bob',
'city': 'LA'},
{'id': '321',
'name': 'sally',
'city': 'manhattan'},
{'id': '125',
'name': 'fred',
'city': 'miami'}]
keys = list(d[0])
keygetter = operator.itemgetter(*keys)
result = [keys, *[list(keygetter(x)) for x in d]] # [keys, *map(list, map(keygetter, d))] might be a titch faster
If a list of tuples is acceptable, this is simpler/faster:
keys = tuple(d[0])
keygetter = operator.itemgetter(*keys)
result = [keys, *map(keygetter, d)]
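To see why the consistent ordering matters, here is a hedged sketch with a second dict whose keys were inserted in a different order. The sample rows and the names naive and stable are made up for illustration:

```python
import operator

# Hypothetical rows: same keys, but the second dict was built in a different order.
rows = [
    {'id': '123', 'name': 'bob', 'city': 'LA'},
    {'city': 'miami', 'name': 'fred', 'id': '125'},
]

# Relying on each dict's own iteration order misaligns the second row.
naive = [list(rows[0]), *[list(r.values()) for r in rows]]

# itemgetter pulls the values in the header's key order for every row.
keys = list(rows[0])
getter = operator.itemgetter(*keys)
stable = [keys, *[list(getter(r)) for r in rows]]

print(naive[2])
print(stable[2])
```

Note that operator.itemgetter returns a bare value rather than a tuple when given a single key, so this sketch assumes at least two keys per dict.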
Unserious suggestion: Let csv do it for you!
import csv
import io
dicts = [{'id': '123',
'name': 'bob',
'city': 'LA'},
{'id': '321',
'name': 'sally',
'city': 'manhattan'},
{'id': '125',
'name': 'fred',
'city': 'miami'}]
with io.StringIO() as sio:
    writer = csv.DictWriter(sio, dicts[0].keys())
    writer.writeheader()
    writer.writerows(dicts)
    sio.seek(0)
    result = list(csv.reader(sio))
This can be done with a for loop and the built-in enumerate() function.
listOfDicts = [
{"id": "123", "name": "bob", "city": "LA"},
{"id": "321", "name": "sally", "city": "manhattan"},
{"id": "125", "name": "fred", "city": "miami"},
]
results = []
for index, dic in enumerate(listOfDicts, start = 0):
    if index == 0:
        results.append(list(dic.keys()))
        results.append(list(dic.values()))
    else:
        results.append(list(dic.values()))
print(results)
output:
[['id', 'name', 'city'], ['123', 'bob', 'LA'], ['321', 'sally', 'manhattan'], ['125', 'fred', 'miami']]

Make dictionaries in list of dictionaries equal length

Assuming a list of dictionaries with unequal length, what's the best way to make them equal length i.e. for the missing key-value, add key but with value set to empty string or null:
lst = [
{'id': '123', 'name': 'john'},
{'id': '121', 'name': 'jane'},
{'id': '121'},
{'name': 'mary'}
]
to become:
lst = [
{'id': '123', 'name': 'john'},
{'id': '121', 'name': 'jane'},
{'id': '121', 'name': ''},
{'id': '', 'name': 'mary'}
]
The only way I can think of is converting to pandas dataframe then back to dict:
pd.DataFrame(lst).to_dict(orient='records')
Finding all the keys requires a full initial pass of the data:
>>> set().union(*lst)
{'id', 'name'}
Now iterate the dicts and set default for each key:
keys = set().union(*lst)
for d in lst:
    for k in keys:
        d.setdefault(k, '')
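The loop above modifies the dicts in place and, because keys is a set, appends missing keys in no particular order. If new dicts with a stable key order are preferred, a possible variant looks like this; it is a sketch, and the names keys and padded are illustrative:

```python
lst = [
    {'id': '123', 'name': 'john'},
    {'id': '121'},
    {'name': 'mary'},
]

# dict.fromkeys keeps first-seen order while dropping duplicate keys.
keys = list(dict.fromkeys(k for d in lst for k in d))

# Build fresh dicts; the originals in lst are left untouched.
padded = [{k: d.get(k, '') for k in keys} for d in lst]
print(padded)
```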
You could use collections.ChainMap to get all the keys:
>>> lst = [
... {'id': '123', 'name': 'john'},
... {'id': '121', 'name': 'jane'},
... {'id': '121'},
... {'name': 'mary'}
... ]
>>>
>>> from collections import ChainMap
>>>
>>> for k in ChainMap(*lst):
...     for d in lst:
...         _ = d.setdefault(k, '')
...
>>> lst
[{'id': '123', 'name': 'john'}, {'id': '121', 'name': 'jane'}, {'id': '121', 'name': ''}, {'name': 'mary', 'id': ''}]
Try using this snippet
lst = [
{'id': '123', 'name': 'john'},
{'id': '121', 'name': 'jane'},
{'id': '121'},
{'name': 'mary'}
]
for data in lst:
    if "name" not in data:
        data["name"] = ""
    if "id" not in data:
        data["id"] = ""
print(lst)
Here's one way (Python 3.5+).
>>> all_keys = set(key for d in lst for key in d)
>>> [{**dict.fromkeys(all_keys, ''), **d} for d in lst]
[{'id': '123', 'name': 'john'}, {'id': '121', 'name': 'jane'}, {'id': '121', 'name': ''}, {'id': '', 'name': 'mary'}]
(Note that the unpacking order is critical here: you must unpack d after the dictionary with the default values so that the values in d override the defaults.)

Cartesian product of multiple lists of dictionaries

I have two or more lists of dictionaries (something like JSON format), for example:
list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]
cartesian_product(list_1 * list_2) = [{'Name': 'John', 'Age':25, 'Product': 'Car', 'Id': 1}, {'Name': 'John', 'Age':25, 'Product': 'TV', 'Id': 2}, {'Name': 'Mary' , 'Age': 15, 'Product': 'Car', 'Id': 1}, {'Name': 'Mary' , 'Age': 15, 'Product': 'TV', 'Id': 2}]
How can I do this and be efficient with memory use? The way I'm doing it right now runs out of RAM with big lists. I know it's probably something with itertools.product, but I couldn't figure out how to do this with a list of dicts. Thank you.
PS: I'm doing it this way for the moment:
gen1 = (row for row in self.tables[0])
table = []
for row in gen1:
    gen2 = (dictionary for table in self.tables[1:] for dictionary in table)
    for element in gen2:
        new_row = {}
        new_row.update(row)
        new_row.update(element)
        table.append(new_row)
Thank you!
Here is a solution to the problem posted:
list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]
from itertools import product
ret_list = []
for i1, i2 in product(list_1, list_2):
    merged = {}
    merged.update(i1)
    merged.update(i2)
    ret_list.append(merged)
The key here is to make use of the update functionality of dicts to add members. This version leaves the parent dicts unmodified and, when both dicts share a key, silently keeps whichever value is seen last.
However, this will not help with memory usage. The simple fact is that if you want to do this operation in memory you will need to be able to store the starting lists and the resulting product. Alternatives include periodically writing to disk or breaking the starting data into chunks and deleting chunks as you go.
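If the merged rows can be consumed one at a time (written to disk, aggregated, and so on), the result list need not exist at all. A minimal sketch of that idea, assuming the input lists themselves fit in memory; iter_merged is a hypothetical helper name, not something from the question:

```python
from itertools import product

def iter_merged(*tables):
    """Yield one merged dict per combination instead of building a full list."""
    for combo in product(*tables):
        merged = {}
        for row in combo:
            merged.update(row)  # later tables win on duplicate keys
        yield merged

list_1 = [{'Name': 'John', 'Age': 25}, {'Name': 'Mary', 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1}, {'Product': 'TV', 'Id': 2}]

# Only one merged row is alive at a time inside this loop.
for row in iter_merged(list_1, list_2):
    print(row)
```

itertools.product still copies each input into a tuple internally, so the saving is on the output side only.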
Just convert the dictionaries to lists, take the product, and convert back to dictionaries again (Python 2 syntax):
import itertools
list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]
l1 = [l.items() for l in list_1]
l2 = [l.items() for l in list_2]
print [dict(l[0] + l[1]) for l in itertools.product(l1, l2)]
The output is:
[{'Age': 25, 'Id': 1, 'Name': 'John', 'Product': 'Car'}, {'Age': 25,
'Id': 2, 'Name': 'John', 'Product': 'TV'}, {'Age': 15, 'Id': 1,
'Name': 'Mary', 'Product': 'Car'}, {'Age': 15, 'Id': 2, 'Name':
'Mary', 'Product': 'TV'}]
If this isn't memory-efficient enough for you, then try:
for l in itertools.product((l1.iteritems() for l1 in list_1),
                           (l2.iteritems() for l2 in list_2)):
    # work with one product at a time
    pass
For Python 3:
import itertools
list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]
print ([{**l[0], **l[1]} for l in itertools.product(list_1, list_2)])

Python for loops and manipulations

Hey guys, I was scoping out the code from another post here on Stack Overflow and I noticed something about a for loop. If you change the list using pop or remove while looping over it, it messes up the indexes of that loop: it will skip whole items if you pop/remove an item from the list. The way I got around it was to make a copy of the list to use in the for loop while I manipulated the other list. I'm new to Python.
I added to his list. My program removes anyone weighing over 180 or anyone named Joe. At first I just used d_list, which is when I noticed this issue. Then I tried temp_list = d_list, which I thought made a separate copy, but I guess it doesn't. Then I used the list's copy method to make it work. This way I would not manipulate the list the for loop was using.
My question is: is that normal, and did I go about fixing it the right way? It seems to me that if the data were huge you would not want to make a copy of it. The other alternative I came up with is using a while loop instead of the outer for loop.
d_list = [ {'id':1, 'Name': 'Hannah', 'weight':150}, {'id':2, 'Name':'Andrew', 'weight':200}, {'id':3, 'Name':'Joe', 'weight':180},
{'id':4, 'Name':'Joe', 'weight':180}, {'id':5, 'Name':'Steve', 'weight':200}, {'id':6, 'Name':'Joe', 'weight':180},
{'id':7, 'Name':'George', 'weight':180}]
temp_list = d_list
#temp_list = d_list.copy()
print(d_list)
i = 0
for item in temp_list: # may make a while loop
    print(item, "i = ", i, end="[")
    for k, v in item.items():
        print(end="*")
        if (k == "weight") and (v > 180):
            d_list.pop(i)
            print('^popped^', i, end="") # <-- pop but you need an index
            i -= 1
        elif (k == "Name") and (v == "Joe"):
            d_list.remove(item) # <-- remove just uses item to find and remove
            print("^removed^", i, end="")
            i -= 1
    i += 1
    print("]")
print(d_list)
print("i = ", i)
Because of the problems you identified, this kind of thing is best done by making a new list of the qualifying elements. Also, it's silly to scan all keys and values; dictionaries are meant to be used by looking up the key:
newlist = []
for item in d_list:
    if item["weight"] <= 180 and item["Name"] != "Joe":
        newlist.append(item)
You can then free up the old list if you're worried about "wasting" space:
del d_list
temp_list = d_list creates a second reference to the same list, so any change made through either name is visible through both; that would definitely not work. temp_list = d_list.copy() creates a shallow copy, which would work, as would temp_list = d_list[:]. A better approach that avoids any copying at all is to iterate with reversed and just remove the elements from the list:
for item in reversed(d_list):
    if item.get("weight", 0) > 180 or item.get("Name") == "Joe":
        d_list.remove(item)
If you wanted to pop you could start at the end using a range in reverse:
for i in range(len(d_list) - 1, -1, -1):
    item = d_list[i]
    if item.get("weight", 0) > 180 or item.get("Name") == "Joe":
        d_list.pop(i)
A third option is a list comprehension using d_list[:] to mutate the original object/list:
d_list[:] = [d for d in d_list if d.get("weight", 0) <= 180 and d.get("Name") != "Joe"]
Or combine it with a generator expression:
d_list[:] = (d for d in d_list if d.get("weight", 0) <= 180 and d.get("Name") != "Joe")
All approaches will give you the same output. Using dict.get instead of iterating over all the items is also more efficient: we do two lookups per dict instead of looking at all the keys and values in each one.
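The claim that the approaches agree can be checked on the question's data. This is only a sketch: should_drop is a hypothetical helper introduced here to avoid repeating the condition, and each approach works on its own deep copy so the runs do not interfere:

```python
import copy

people = [
    {'id': 1, 'Name': 'Hannah', 'weight': 150}, {'id': 2, 'Name': 'Andrew', 'weight': 200},
    {'id': 3, 'Name': 'Joe', 'weight': 180}, {'id': 4, 'Name': 'Joe', 'weight': 180},
    {'id': 5, 'Name': 'Steve', 'weight': 200}, {'id': 6, 'Name': 'Joe', 'weight': 180},
    {'id': 7, 'Name': 'George', 'weight': 180},
]

def should_drop(d):
    # Hypothetical helper: same condition as in the loops above.
    return d.get("weight", 0) > 180 or d.get("Name") == "Joe"

# 1. Iterate in reverse and remove by value.
by_remove = copy.deepcopy(people)
for item in reversed(by_remove):
    if should_drop(item):
        by_remove.remove(item)

# 2. Walk the indexes backwards and pop by position.
by_pop = copy.deepcopy(people)
for i in range(len(by_pop) - 1, -1, -1):
    if should_drop(by_pop[i]):
        by_pop.pop(i)

# 3. Rebuild the contents in place with slice assignment.
by_slice = copy.deepcopy(people)
by_slice[:] = [d for d in by_slice if not should_drop(d)]

print([d['id'] for d in by_remove])
```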
Some timings using python3:
In [14]: %%timeit
d_list = [{'id': 1, 'Name': 'Hannah', 'weight': 150}, {'id': 2, 'Name': 'Andrew', 'weight': 200},
{'id': 3, 'Name': 'Joe', 'weight': 180},
{'id': 4, 'Name': 'Joe', 'weight': 180}, {'id': 5, 'Name': 'Steve', 'weight': 200},
{'id': 6, 'Name': 'Joe', 'weight': 180},
{'id': 7, 'Name': 'George', 'weight': 180}]
for item in reversed(d_list):
    if item.get("weight", 0) > 180 or item.get("Name") == "Joe":
        d_list.remove(item)
....:
100000 loops, best of 3: 4.35 µs per loop
In [15]: %%timeit
d_list = [{'id': 1, 'Name': 'Hannah', 'weight': 150}, {'id': 2, 'Name': 'Andrew', 'weight': 200},
{'id': 3, 'Name': 'Joe', 'weight': 180},
{'id': 4, 'Name': 'Joe', 'weight': 180}, {'id': 5, 'Name': 'Steve', 'weight': 200},
{'id': 6, 'Name': 'Joe', 'weight': 180},
{'id': 7, 'Name': 'George', 'weight': 180}]
for i in range(len(d_list) - 1, -1, - 1): # may make a while loop
    item = d_list[i]
    if item.get("weight", 0) > 180 or item.get("Name") == "Joe":
        d_list.pop(i)
....:
....:
100000 loops, best of 3: 4.48 µs per loop
In [16]: %%timeit
....: d_list = [{'id': 1, 'Name': 'Hannah', 'weight': 150}, {'id': 2, 'Name': 'Andrew', 'weight': 200},
....: {'id': 3, 'Name': 'Joe', 'weight': 180},
....: {'id': 4, 'Name': 'Joe', 'weight': 180}, {'id': 5, 'Name': 'Steve', 'weight': 200},
....: {'id': 6, 'Name': 'Joe', 'weight': 180},
....: {'id': 7, 'Name': 'George', 'weight': 180}]
....: d_list[:] = (d for d in d_list if d.get("weight", 0) <= 180 and d.get("Name") != "Joe")
....:
100000 loops, best of 3: 3.23 µs per loop
In [17]: %%timeit
d_list = [{'id': 1, 'Name': 'Hannah', 'weight': 150}, {'id': 2, 'Name': 'Andrew', 'weight': 200},
{'id': 3, 'Name': 'Joe', 'weight': 180},
{'id': 4, 'Name': 'Joe', 'weight': 180}, {'id': 5, 'Name': 'Steve', 'weight': 200},
{'id': 6, 'Name': 'Joe', 'weight': 180},
{'id': 7, 'Name': 'George', 'weight': 180}]
d_list[:] = [d for d in d_list if d.get("weight", 0) <= 180 and d.get("Name") != "Joe"]
....:
100000 loops, best of 3: 2.98 µs per loop
So the list comprehension is the fastest, followed by the generator expression. If you know the keys always exist, then accessing them directly with d["weight"] etc. will be faster still.
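As a footnote to the temp_list = d_list attempt in the question, the difference between a second name for the same list and a shallow copy can be seen directly. A small sketch with made-up data; alias and shallow are illustrative names:

```python
d_list = [{'id': 1, 'Name': 'Hannah'}, {'id': 2, 'Name': 'Andrew'}]

alias = d_list           # same list object under a second name
shallow = d_list.copy()  # new list object holding the same dicts

d_list.pop()
print(len(alias), len(shallow))

# A shallow copy shares the inner dicts, so edits to a dict show up in both lists.
shallow[0]['Name'] = 'Changed'
print(d_list[0]['Name'])
```

A shallow copy duplicates only the list of references, not the dicts themselves, so its cost grows with the number of items rather than with the size of each record.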

Create a new list of dicts in common between n lists of dicts?

I have an unknown number of lists of product results as dictionary entries that all have the same keys. I'd like to generate a new list of products that appear in all of the old lists.
'what products are available in all cities?'
given:
list1 = [{'id': 1, 'name': 'bat', 'price': 20.00}, {'id': 2, 'name': 'ball', 'price': 12.00}, {'id': 3, 'name': 'brick', 'price': 19.00}]
list2 = [{'id': 1, 'name': 'bat', 'price': 18.00}, {'id': 3, 'name': 'brick', 'price': 11.00}, {'id': 2, 'name': 'ball', 'price': 17.00}]
list3 = [{'id': 1, 'name': 'bat', 'price': 16.00}, {'id': 4, 'name': 'boat', 'price': 10.00}, {'id': 3, 'name': 'brick', 'price': 15.00}]
list4 = [{'id': 1, 'name': 'bat', 'price': 14.00}, {'id': 2, 'name': 'ball', 'price': 9.00}, {'id': 3, 'name': 'brick', 'price': 13.00}]
list...
I want a list of dicts in which the 'id' exists in all of the old lists:
result_list = [{'id': 1, 'name': 'bat'}, {'id': 3, 'name': 'brick'}]
The values that aren't constant for a given 'id' can be discarded, but the values that are the same for a given 'id' must be in the results list.
If I know how many lists I've got, I can do:
results_list = []
for dict in list1:
    if any(dict['id'] == d['id'] for d in list2):
        if any(dict['id'] == d['id'] for d in list3):
            if any(dict['id'] == d['id'] for d in list4):
                results_list.append(dict)
How can I do this if I don't know how many lists I've got?
Put the ids into sets and then take the intersection of the sets.
list1 = [{'id': 1, 'name': 'steve'}, {'id': 2, 'name': 'john'}, {'id': 3, 'name': 'mary'}]
list2 = [{'id': 1, 'name': 'jake'}, {'id': 3, 'name': 'tara'}, {'id': 2, 'name': 'bill'}]
list3 = [{'id': 1, 'name': 'peter'}, {'id': 4, 'name': 'rick'}, {'id': 3, 'name': 'marci'}]
list4 = [{'id': 1, 'name': 'susan'}, {'id': 2, 'name': 'evan'}, {'id': 3, 'name': 'tom'}]
lists = [list1, list2, list3, list4]
sets = [set(x['id'] for x in lst) for lst in lists]
intersection = set.intersection(*sets)
print(intersection)
Result:
{1, 3}
Note that we call set.intersection on the class rather than on a fresh instance, as in set().intersection(*sets). Called on the class, the first set in sets acts as the starting set; called on set(), the arguments are intersected with the empty set, and of course the intersection of anything with the empty set is empty.
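The difference between the two calls is easy to check. A short sketch using id sets like the ones in the example (the variable names common and empty are illustrative):

```python
sets = [{1, 2, 3}, {1, 3, 2}, {1, 4, 3}, {1, 2, 3}]

# Called on the class: the first set is the starting point.
common = set.intersection(*sets)

# Called on a fresh empty set: everything is intersected with set().
empty = set().intersection(*sets)

print(common, empty)
```

One caveat: set.intersection(*sets) raises a TypeError when sets is an empty list, so that case needs a guard if it can occur.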
If you want to turn this back into a list of dicts, you can do:
result = [{'id': i, 'name': None} for i in intersection]
print(result)
Result:
[{'id': 1, 'name': None}, {'id': 3, 'name': None}]
Now, if you also want to hold onto those attributes which are the same for all instances of a given id, you'll want to do something like this:
list1 = [{'id': 1, 'name': 'bat', 'price': 20.00}, {'id': 2, 'name': 'ball', 'price': 12.00}, {'id': 3, 'name': 'brick', 'price': 19.00}]
list2 = [{'id': 1, 'name': 'bat', 'price': 18.00}, {'id': 3, 'name': 'brick', 'price': 11.00}, {'id': 2, 'name': 'ball', 'price': 17.00}]
list3 = [{'id': 1, 'name': 'bat', 'price': 16.00}, {'id': 4, 'name': 'boat', 'price': 10.00}, {'id': 3, 'name': 'brick', 'price': 15.00}]
list4 = [{'id': 1, 'name': 'bat', 'price': 14.00}, {'id': 2, 'name': 'ball', 'price': 9.00}, {'id': 3, 'name': 'brick', 'price': 13.00}]
lists = [list1, list2, list3, list4]
sets = [set(x['id'] for x in lst) for lst in lists]
intersection = set.intersection(*sets)
all_keys = set(lists[0][0].keys())
result = []
for ident in intersection:
    res = [dic for lst in lists
           for dic in lst
           if dic['id'] == ident]
    replicated_keys = []
    for key in all_keys:
        if len(set(dic[key] for dic in res)) == 1:
            replicated_keys.append(key)
    result.append({key: res[0][key] for key in replicated_keys})
print(result)
Result:
[{'id': 1, 'name': 'bat'}, {'id': 3, 'name': 'brick'}]
What we do here is:
Look at each id in intersection and grab each dict corresponding to that id.
Find which keys have the same value in all of those dicts (one of which is guaranteed to be id).
Put those key-value pairs into result
This code assumes that:
Each dict in list1, list2, ... will have the same keys. If this assumption is false, let me know - it shouldn't be difficult to relax.
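One possible way to relax that assumption, and to avoid rescanning every list for each id, is to index each list by id first. This is a sketch rather than the answer's method: common_products is a hypothetical name, and it assumes ids are unique within each list and sortable:

```python
def common_products(lists):
    """Return dicts for ids present in every list, keeping only key-value pairs shared by all."""
    by_id = [{d['id']: d for d in lst} for lst in lists]
    if not by_id:
        return []
    # Ids present in every index; intersection accepts dicts and uses their keys.
    shared_ids = set(by_id[0]).intersection(*by_id[1:])
    result = []
    for ident in sorted(shared_ids):
        first, *rest = (index[ident] for index in by_id)
        # Keep a pair only if every other dict has the key with an equal value.
        result.append({k: v for k, v in first.items()
                       if all(k in other and other[k] == v for other in rest)})
    return result

list1 = [{'id': 1, 'name': 'bat', 'price': 20.00}, {'id': 2, 'name': 'ball', 'price': 12.00}, {'id': 3, 'name': 'brick', 'price': 19.00}]
list2 = [{'id': 1, 'name': 'bat', 'price': 18.00}, {'id': 3, 'name': 'brick', 'price': 11.00}, {'id': 2, 'name': 'ball', 'price': 17.00}]
list3 = [{'id': 1, 'name': 'bat', 'price': 16.00}, {'id': 4, 'name': 'boat', 'price': 10.00}, {'id': 3, 'name': 'brick', 'price': 15.00}]
list4 = [{'id': 1, 'name': 'bat', 'price': 14.00}, {'id': 2, 'name': 'ball', 'price': 9.00}, {'id': 3, 'name': 'brick', 'price': 13.00}]

print(common_products([list1, list2, list3, list4]))
```

A key missing from any one dict is simply treated as not shared, which is one reasonable reading of the requirement but not the only one.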
