Difference of list of dictionaries - python

I've searched quite a lot, but I haven't found any similar question to this one.
I have two lists of dictionaries in the following format:
data1 = [
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 6, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 7, 'date_time': datetime.datetime(2020, 4, 3, 16, 14, 21)},
]
data2 = [
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 6, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
]
Desired output:
final_data = [
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 7, 'date_time': datetime.datetime(2020, 4, 3, 16, 14, 21)},
]
I want only the dictionaries that are in data1 and not in data2.
Until now, when I found a match in two nested for loops I popped the dictionary out of the list, but that doesn't seem like a good approach to me. How can I achieve the desired output?
It doesn't have to be time efficient, since there will be at most tens of dictionaries in each list.
Current implementation:
counter_i = 0
for i in range(len(data1)):
    counter_j = 0
    for j in range(len(data2)):
        if data1[i-counter_i]['id'] == data2[j-counter_j]['id'] and data1[i-counter_i]['date_time'] == data2[j-counter_j]['date_time']:
            data1.pop(i-counter_i)
            data2.pop(j-counter_j)
            counter_i += 1
            counter_j += 1
            break

If performance is not an issue, why not:
for d in data2:
    try:
        data1.remove(d)
    except ValueError:
        pass
list.remove checks for object equality, not identity, so it will work for dicts with equal keys and values. Also, list.remove only removes one occurrence at a time.
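For illustration, here is a minimal sketch of that idea that also leaves data1 untouched by removing from a copy (assuming data1 and data2 are defined as in the question):

final_data = list(data1)      # shallow copy, so data1 itself is not modified
for d in data2:
    try:
        final_data.remove(d)  # removes the first dict that compares equal to d
    except ValueError:
        pass                  # d has no match in final_data

print(final_data)
# [{'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
#  {'id': 7, 'date_time': datetime.datetime(2020, 4, 3, 16, 14, 21)}]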

schwobaseggl's answer is probably the cleanest solution (just make a copy before removing if you need to keep data1 intact).
But if you want to use a set difference... well dicts are not hashable, because their underlying data could change and lead to issues (same reason why lists or sets are not hashable either).
However, you can put each dict's items into a frozenset to represent that dictionary (assuming the dictionary values are hashable, as schwobaseggl points out). Frozensets are hashable, so you can add those to a set, do a normal set difference, and reconstruct the dictionaries at the end.
I don't actually recommend doing it, but here we go:
final_data = [
    dict(s)
    for s in set(
        frozenset(d.items()) for d in data1
    ).difference(
        frozenset(d.items()) for d in data2
    )
]

You can go either way:
Method 1:
#using filter and lambda function
final_data = filter(lambda i: i not in data2, data1)
final_data = list(final_data)
Method 2:
# using list comprehension to perform task
final_data = [i for i in data1 if i not in data2]

Related

Assigning certain variable, then value to nested dictionary without modifying the original variable [duplicate]

To the point, I have two events:
a = {'key': 'a', 'time': datetime.datetime(2020, 2, 15, 11, 18, 18, 982000)}
b = {'key': 'b', 'time': datetime.datetime(2020, 2, 1, 11, 47, 14, 522000)}
my goal is to assign and nest one event to the other like this:
a['key2'] = b
and this is the result:
{'key': 'a', 'time': datetime.datetime(2020, 2, 15, 11, 18, 18, 982000), 'key2': {'key': 'b', 'time': datetime.datetime(2020, 2, 1, 11, 47, 14, 522000)}}
but when I assign a new key to the nested dict it works, but it also modifies variable b. Result:
a['key2']['nestedkey'] = {'somekey': 'somevalue'}
{'key': 'a', 'time': datetime.datetime(2020, 2, 15, 11, 18, 18, 982000), 'key2': {'key': 'b', 'time': datetime.datetime(2020, 2, 1, 11, 47, 14, 522000), 'nestedkey': {'somekey': 'somevalue'}}}
{'key': 'b', 'time': datetime.datetime(2020, 2, 1, 11, 47, 14, 522000), 'nestedkey': {'somekey': 'somevalue'}}
Can someone explain why variable b is getting modified? And is there any way to do it without modifying it?
In Python, by default you're not making a copy of an object when you assign it. So when you do a['key2'] = b, a['key2'] just holds a reference to b. Whether you modify b or a['key2'], you're modifying the same object.
To make a copy you can use deepcopy:
import copy
a['key2'] = copy.deepcopy(b)
Then it works as you expect: modifying a['key2'] will not modify b.
This happens because variable b is used by reference. Basically, a['key2'] = b says that a['key2'] points to the location in memory where b is stored, so when changes are made to a['key2'] or to the variable b, the same data is being changed.
To avoid this you can make a deep copy of b and assign that to a['key2'] like so:
import copy
a['key2'] = copy.deepcopy(b)
This should give you your desired results.
To get more details about how copy works see here
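As a quick, self-contained demonstration of the difference (a sketch using the b event from the question):

import copy
import datetime

b = {'key': 'b', 'time': datetime.datetime(2020, 2, 1, 11, 47, 14, 522000)}
alias = b                    # plain assignment: both names refer to the same dict
clone = copy.deepcopy(b)     # an independent copy of b (and of everything nested in it)

alias['nestedkey'] = {'somekey': 'somevalue'}
print('nestedkey' in b)      # True  - alias and b are the same object
print('nestedkey' in clone)  # False - the deep copy is unaffected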

DataFrame.replace by nested dict

I have a huge DataFrame in which a lot of entries need to be changed. So I've created a translation dict with the following structure:
{'Data.Bar Layer': {'1': 0,
                    '1.E': 21,
                    '2': 13,
                    '2.E': 22,
                    '3': 14,
                    '3.E': 24,
                    '4': 15,
                    '4.E': 23,
                    'B': 16,
                    'CL1': 1,
                    'CL2': 2,
                    'CL2a': 6,
                    'CL3': 3,
                    'CL3a': 4,
                    'CL4': 5,
                    'E': 18,
                    'L1': 7,
                    'L2': 8,
                    'L2a': 12,
                    'L3': 9,
                    'L3a': 10,
                    'L4': 11,
                    'T': 17,
                    'T&B': 19,
                    'T+B': 20},
 'Data.Bar Type': {'N': 0, 'R': 1},
 'Data.Defined No. Bars': {'No': 0, 'Yes': 1},
 'Data.Design Option': {'-1': 0, 'Main Model': 1},...}
The first key corresponds to the dataframe column and the second key to the value that needs to be changed; e.g. in column Data.Bar Layer, every '1' should become 0. This is the nested-dict format that the pandas.DataFrame.replace documentation describes.
However, the same values have to be exchanged multiple times, which (I guess) leads to the error:
Replacement not allowed with overlapping keys and values
Is there any workaround to avoid this error? I tried some approaches with apply and map, but they didn't work, unfortunately.
Thanks in advance and kind regards,
Max
There might be a more pythonic way, but this code works for me:
for col in your_dict.keys():
    df[col].replace(your_dict[col], inplace=True)
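If the per-column replace still runs into the overlapping-keys check, a hedged alternative (a sketch, not tested against the original data; your_dict and df are the names used above) is to map each column through its sub-dict and leave unmapped values untouched:

for col, mapping in your_dict.items():
    # .get(v, v) keeps values with no entry in the mapping unchanged,
    # and Series.map sidesteps replace()'s overlapping keys/values check
    df[col] = df[col].map(lambda v, m=mapping: m.get(v, v))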

Adding values from for-loop into an array

I'm trying to collect the lengths of the strings in the states list into a separate list and then sort them in descending order, but I'm having trouble getting all of the length values into the list; instead I end up with a single value after the iteration.
states = ["Abia", "Adamawa", "Anambra", "Akwa Ibom", "Bauchi", "Bayelsa", "Benue", "Borno", "Cross River", "Delta", "Ebonyi", "Enugu", "Edo", "Ekiti", "Gombe", "Imo", "Jigawa", "Kaduna", "Kano", "Katsina", "Kebbi", "Kogi", "Kwara", "Lagos", "Nasarawa", "Niger", "Ogun", "Ondo", "Osun", "Oyo", "Plateau", "Rivers", "Sokoto", "Taraba", "Yobe", "Zamfara"]
for i in states:
    a = [len(i)]
print(a)
Since you want the lengths sorted in descending order, use sorted with reverse=True and a list comprehension:
states = ["Abia", "Adamawa", "Anambra", "Akwa Ibom", "Bauchi", "Bayelsa", "Benue", "Borno", "Cross River", "Delta", "Ebonyi", "Enugu", "Edo", "Ekiti", "Gombe", "Imo", "Jigawa", "Kaduna", "Kano", "Katsina", "Kebbi", "Kogi", "Kwara", "Lagos", "Nasarawa", "Niger", "Ogun", "Ondo", "Osun", "Oyo", "Plateau", "Rivers", "Sokoto", "Taraba", "Yobe", "Zamfara"]
a = sorted([len(i) for i in states], reverse=True)
print (a)
Output
[11, 9, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3]
To get the indices of the sorted list without resorting to NumPy arrays, there are many ways: see here. I personally prefer to directly make use of NumPy's argsort. As the name suggests, it returns an array of indices corresponding to the sorted array/list in ascending order. To get the indices for descending order, you can just reverse the array returned by argsort by using [::-1]. Following is a solution to your problem:
import numpy as np
states = ["Abia", "Adamawa", "Anambra", "Akwa Ibom", "Bauchi", "Bayelsa", "Benue", "Borno", "Cross River", "Delta", "Ebonyi", "Enugu", "Edo", "Ekiti", "Gombe", "Imo", "Jigawa", "Kaduna", "Kano", "Katsina", "Kebbi", "Kogi", "Kwara", "Lagos", "Nasarawa", "Niger", "Ogun", "Ondo", "Osun", "Oyo", "Plateau", "Rivers", "Sokoto", "Taraba", "Yobe", "Zamfara"]
a = [len(i) for i in states]
indices_sorted = np.argsort(a)[::-1] # [::-1] gives you indices for decreasing order
Output
array([ 8, 3, 24, 35, 19, 1, 2, 30, 5, 4, 10, 16, 17, 33, 32, 31, 22,
13, 6, 7, 9, 11, 14, 25, 23, 20, 21, 26, 27, 34, 28, 18, 0, 12,
15, 29])
Now, as you can see, the first index in the above output is 8, which points to the 9th element of states, Cross River. Similarly, you can access and verify the other elements.
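If you'd rather avoid NumPy, a plain-Python sketch of the same idea uses sorted() over the indices (my addition, assuming the states list from the question):

a = [len(s) for s in states]
# indices of a, ordered so the longest names come first
order = sorted(range(len(a)), key=a.__getitem__, reverse=True)
print(order[0], states[order[0]])  # 8 Cross River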
You can use a list comprehension:
lengths = [len(state) for state in states]
If you need to use a for loop, create a list and append to it:
lengths = []
for i in states:
    lengths.append(len(i))
You can also do this using the map function without using a for loop:
a = list(map(len,states))
Via a list comprehension:
lens = [len(a) for a in states]

Merge two arrays by collections of two elements

I have an array containing an even number of integers. The array represents a pairing of an identifier and a count. The tuples have already been sorted by the identifier. I would like to merge a few of these arrays together. I have thought of a few ways to do it but they are fairly complicated and I feel there might be an easy way to do this with python.
I.e.:
[<id>, <count>, <id>, <count>]
Input:
[14, 1, 16, 4, 153, 21]
[14, 2, 16, 3, 18, 9]
Output:
[14, 3, 16, 7, 18, 9, 153, 21]
It would be better to store these as dictionaries than as lists (not just for this purpose, but for other use cases, such as extracting the value of a single ID):
x1 = [14, 1, 16, 4, 153, 21]
x2 = [14, 2, 16, 3, 18, 9]
# turn into dictionaries (could write a function to convert)
d1 = dict([(x1[i], x1[i + 1]) for i in range(0, len(x1), 2)])
d2 = dict([(x2[i], x2[i + 1]) for i in range(0, len(x2), 2)])
print d1
# {16: 4, 153: 21, 14: 1}
After that, you could use any of the solutions in this question to add them together. For example (taken from the first answer):
import collections
def d_sum(a, b):
    d = collections.defaultdict(int, a)
    for k, v in b.items():
        d[k] += v
    return dict(d)
print d_sum(d1, d2)
# {16: 7, 153: 21, 18: 9, 14: 3}
collections.Counter() is what you need here:
In [21]: lis1=[14, 1, 16, 4, 153, 21]
In [22]: lis2=[14, 2, 16, 3, 18, 9]
In [23]: from collections import Counter
In [24]: dic1=Counter(dict(zip(lis1[0::2],lis1[1::2])))
In [25]: dic2=Counter(dict(zip(lis2[0::2],lis2[1::2])))
In [26]: dic1+dic2
Out[26]: Counter({153: 21, 18: 9, 16: 7, 14: 3})
or :
In [51]: it1=iter(lis1)
In [52]: it2=iter(lis2)
In [53]: dic1=Counter(dict((next(it1),next(it1)) for _ in xrange(len(lis1)/2)))
In [54]: dic2=Counter(dict((next(it2),next(it2)) for _ in xrange(len(lis2)/2)))
In [55]: dic1+dic2
Out[55]: Counter({153: 21, 18: 9, 16: 7, 14: 3})
Use collections.Counter:
import itertools
import collections
def grouper(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)
count1 = collections.Counter(dict(grouper(2, lst1)))
count2 = collections.Counter(dict(grouper(2, lst2)))
result = count1 + count2
I've used the itertools library grouper recipe here to convert your data to dictionaries, but as other answers have shown you there are more ways to skin that particular cat.
result is a Counter with each id pointing to a total count:
Counter({153: 21, 18: 9, 16: 7, 14: 3})
Counters are multi-sets and will keep track of the count of each key with ease. It feels like a much better data structure for your data. They support summing, as used above, for example.
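If you then need the result back in the original flat [id, count, ...] layout, a small sketch (my addition, not part of the answers above) could look like this:

from collections import Counter

def merge_pairs(*flat_lists):
    # sum the counts per id, then flatten back to [id, count, id, count, ...]
    totals = Counter()
    for flat in flat_lists:
        totals.update(dict(zip(flat[0::2], flat[1::2])))
    merged = []
    for key in sorted(totals):
        merged.extend([key, totals[key]])
    return merged

print(merge_pairs([14, 1, 16, 4, 153, 21], [14, 2, 16, 3, 18, 9]))
# [14, 3, 16, 7, 18, 9, 153, 21]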
All of the previous answers look good, but I think that the JSON blob should be properly formed to begin with or else (from my experience) it can cause some serious problems down the road during debugging etc. In this case with id and count as the fields, the JSON should look like
[{"id":1, "count":10}, {"id":2, "count":10}, {"id":1, "count":5}, ...]
Properly formed JSON like that is much easier to deal with, and probably similar to what you have coming in anyway.
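For reference, a one-line sketch (my addition) that reshapes one of the flat lists from the question into that record form:

flat = [14, 1, 16, 4, 153, 21]
records = [{"id": i, "count": c} for i, c in zip(flat[0::2], flat[1::2])]
# [{'id': 14, 'count': 1}, {'id': 16, 'count': 4}, {'id': 153, 'count': 21}]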
This class is a bit general, but certainly extensible
from itertools import groupby

class ListOfDicts():
    def __init__(self, listofD=None):
        self.list = []
        if listofD is not None:
            self.list = listofD

    def key_total(self, group_by_key, aggregate_key):
        """Aggregate a list of dicts by a group-by key and an aggregation key."""
        out_dict = {}
        for k, g in groupby(self.list, key=lambda r: r[group_by_key]):
            print k
            total = 0
            for record in g:
                print " ", record
                total += record[aggregate_key]
            out_dict[k] = total
        return out_dict

if __name__ == "__main__":
    z = ListOfDicts([{'id': 1, 'count': 2, 'junk': 2},
                     {'id': 1, 'count': 4, 'junk': 2},
                     {'id': 1, 'count': 6, 'junk': 2},
                     {'id': 2, 'count': 2, 'junk': 2},
                     {'id': 2, 'count': 3, 'junk': 2},
                     {'id': 2, 'count': 3, 'junk': 2},
                     {'id': 3, 'count': 10, 'junk': 2},
                     ])
    totals = z.key_total("id", "count")
    print totals
Which gives
1
{'count': 2, 'junk': 2, 'id': 1}
{'count': 4, 'junk': 2, 'id': 1}
{'count': 6, 'junk': 2, 'id': 1}
2
{'count': 2, 'junk': 2, 'id': 2}
{'count': 3, 'junk': 2, 'id': 2}
{'count': 3, 'junk': 2, 'id': 2}
3
{'count': 10, 'junk': 2, 'id': 3}
{1: 12, 2: 8, 3: 10}

Merging 3 dict()'s in python

Is there a way to logically merge multiple dictionaries when they have common strings between them, even if those common strings match a value of one dict() to a key of another?
I see a lot of similar questions on SO, but none that seem to address my specific issue of relating keys in the "lower level" dicts to keys/values in the higher ones (level1dict).
Say we have:
level1dict = { '1':[1,3], '2':2 }
level2dict = { '1':4, '3':[5,9], '2':10 }
level3dict = { '1':[6,8,11], '4':12, '2':13, '3':[14,15], '5':16, '9':17, '10':[18,19,20]}
finaldict = level1dict
When I say logically I mean: in level1dict 1 = 1,3 and in level2dict 1 = 4 and 3 = 5,9, so overall (so far) 1 = 1,3,4,5,9 (sorting not important).
The result I would like to get to is
#.update or .append or .default?
finaldict = {'1':[1,3,4,5,9,6,8,11,12,14,15,16,17], '2':[2,10,18,19,20]}
Answered: Thank you Ashwini Chaudhary and Abhijit for the networkx module.
This is a problem of finding connected component subgraphs, and it is easiest to solve with networkx. Here is a solution to your problem:
>>> import networkx as nx
>>> level1dict = { '1':[1,3], '2':2 }
>>> level2dict = { '1':4, '3':[5,9], '2':10 }
>>> level3dict = { '1':[6,8,11], '4':12, '2':13, '3':[14,15], '5':16, '9':17, '10':[18,19,20]}
>>> G=nx.Graph()
>>> for lvl in (level1dict, level2dict, level3dict):
        for key, value in lvl.items():
            key = int(key)
            try:
                for node in value:
                    G.add_edge(key, node)
            except TypeError:
                G.add_edge(key, value)
>>> for sg in nx.connected_component_subgraphs(G):
        print sg.nodes()
[1, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 16, 17]
[2, 10, 13, 18, 19, 20]
>>>
Here is how you visualize it
>>> import matplotlib.pyplot as plt
>>> nx.draw(G)
>>> plt.show()
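If you also want the result in the finaldict shape from the question, a follow-on sketch (my addition; it reuses the graph G built above and assumes nx.connected_components is available) maps each component back onto level1dict's keys:

final_dict = {}
components = [set(c) for c in nx.connected_components(G)]
for key in level1dict:
    for comp in components:
        if int(key) in comp:
            final_dict[key] = sorted(comp)
            break
print(final_dict)
# {'1': [1, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 16, 17], '2': [2, 10, 13, 18, 19, 20]}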
A couple of notes:
It's not convenient that some values are numbers and some are lists. Try converting numbers to 1-item lists first.
If the order is not important, you'll be better off using sets instead of lists. They have methods for all sorts of "logical" operations.
Then you can do:
In [1]: dict1 = {'1': {1, 3}, '2': {2}}
In [2]: dict2 = {'1': {4}, '2': {10}, '3': {5, 9}}
In [3]: dict3 = {'1': {6, 8, 11}, '2': {13}, '4': {12}}
In [4]: {k: set.union(*(d[k] for d in (dict1, dict2, dict3)))
for k in set.intersection(*(set(d.keys()) for d in (dict1, dict2, dict3)))}
Out[4]: {'1': set([1, 3, 4, 6, 8, 11]), '2': set([2, 10, 13])}
In [106]: level1dict = { '1':[1,3], '2':2 }
In [107]: level2dict = { '1':4, '3':[5,9], '2':10 }
In [108]: level3dict = { '1':[6,8,11], '4':12, '2':13, '3':[14,15], '5':16, '9':17, '10':[18,19,20]}
In [109]: keys=set(level2dict) & set(level1dict) & set(level3dict)  # returns {'1', '2'}
In [110]: dic={}
In [111]: for key in keys:
              dic[key] = []
              for x in (level1dict, level2dict, level3dict):
                  if isinstance(x[key], int):
                      dic[key].append(x[key])
                  elif isinstance(x[key], list):
                      dic[key].extend(x[key])
   .....:
In [112]: dic
Out[112]: {'1': [1, 3, 4, 6, 8, 11], '2': [2, 10, 13]}
# now iterate over `dic` again to get the values related to the items present
# in the keys `'1'` and `'2'`.
In [122]: for x in dic:
              for y in dic[x]:
                  for z in (level1dict, level2dict, level3dict):
                      if str(y) in z and str(y) not in dic:
                          if isinstance(z[str(y)], (int, str)):
                              dic[x].append(z[str(y)])
                          elif isinstance(z[str(y)], list):
                              dic[x].extend(z[str(y)])
   .....:
In [123]: dic
Out[123]:
{'1': [1, 3, 4, 6, 8, 11, 5, 9, 14, 15, 12, 16, 17],
'2': [2, 10, 13, 18, 19, 20]}
