Is there a method of logically merging multiple dictionaries when they have strings in common, even when a value in one dict() matches a key in another?
I see a lot of similar questions on SO, but none that seem to address my specific issue of relating keys in the "lower level" dicts to keys/values in the higher ones (level1dict).
Say we have:
level1dict = { '1':[1,3], '2':2 }
level2dict = { '1':4, '3':[5,9], '2':10 }
level3dict = { '1':[6,8,11], '4':12, '2':13, '3':[14,15], '5':16, '9':17, '10':[18,19,20]}
finaldict = level1dict
By "logically" I mean: in level1dict 1 = 1,3; in level2dict 1 = 4 and 3 = 5,9; so overall (so far) 1 = 1,3,4,5,9 (sorting is not important).
The result I would like to get to is
#.update or .append or .default?
finaldict = {'1':[1,3,4,5,9,6,8,11,12,14,15,16,17], '2':[2,10,18,19,20]}
Answered: Thank you Ashwini Chaudhary and Abhijit for the networkx module.
This is a problem of connected component subgraphs and is best handled with networkx. Here is a solution to your problem:
>>> import networkx as nx
>>> level1dict = { '1':[1,3], '2':2 }
>>> level2dict = { '1':4, '3':[5,9], '2':10 }
>>> level3dict = { '1':[6,8,11], '4':12, '2':13, '3':[14,15], '5':16, '9':17, '10':[18,19,20]}
>>> G=nx.Graph()
>>> for lvl in (level1dict, level2dict, level3dict):
...     for key, value in lvl.items():
...         key = int(key)
...         try:
...             for node in value:
...                 G.add_edge(key, node)
...         except TypeError:
...             G.add_edge(key, value)
...
>>> for sg in nx.connected_component_subgraphs(G):
...     print sg.nodes()
...
[1, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 16, 17]
[2, 10, 13, 18, 19, 20]
>>>
Here is how you can visualize it:
>>> import matplotlib.pyplot as plt
>>> nx.draw(G)
>>> plt.show()
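Note that connected_component_subgraphs was removed in newer networkx releases; if you are on a recent version, a roughly equivalent hedged sketch uses connected_components, which yields the node sets directly:

import networkx as nx

G = nx.Graph()
for lvl in (level1dict, level2dict, level3dict):
    for key, value in lvl.items():
        key = int(key)
        # normalize scalar values to a 1-item list so we can always iterate
        nodes = value if isinstance(value, list) else [value]
        for node in nodes:
            G.add_edge(key, node)

for component in nx.connected_components(G):
    print(sorted(component))
# [1, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 16, 17]
# [2, 10, 13, 18, 19, 20]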
A couple of notes:
It's not convenient that some values are numbers and some are lists. Try converting numbers to 1-item lists first.
If the order is not important, you'll be better off using sets instead of lists. They have methods for all sorts of "logical" operations.
Then you can do:
In [1]: dict1 = {'1': {1, 3}, '2': {2}}
In [2]: dict2 = {'1': {4}, '2': {10}, '3': {5, 9}}
In [3]: dict3 = {'1': {6, 8, 11}, '2': {13}, '4': {12}}
In [4]: {k: set.union(*(d[k] for d in (dict1, dict2, dict3)))
   ...:     for k in set.intersection(*(set(d.keys()) for d in (dict1, dict2, dict3)))}
Out[4]: {'1': set([1, 3, 4, 6, 8, 11]), '2': set([2, 10, 13])}
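If you would rather not retype the dicts as sets by hand, here is a small hedged sketch (to_sets is just an illustrative helper name) that normalizes the original mixed int/list values first:

def to_sets(d):
    # wrap scalar values in a 1-item set, turn lists into sets
    return {k: set(v) if isinstance(v, list) else {v} for k, v in d.items()}

dict1, dict2, dict3 = (to_sets(d) for d in (level1dict, level2dict, level3dict))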
In [106]: level1dict = { '1':[1,3], '2':2 }
In [107]: level2dict = { '1':4, '3':[5,9], '2':10 }
In [108]: level3dict = { '1':[6,8,11], '4':12, '2':13, '3':[14,15], '5':16, '9':17, '10':[18,19,20]}
In [109]: keys=set(level2dict) & set(level1dict) & set(level3dict)  # returns {'1', '2'}
In [110]: dic={}
In [111]: for key in keys:
   .....:     dic[key]=[]
   .....:     for x in (level1dict,level2dict,level3dict):
   .....:         if isinstance(x[key],int):
   .....:             dic[key].append(x[key])
   .....:         elif isinstance(x[key],list):
   .....:             dic[key].extend(x[key])
   .....:
In [112]: dic
Out[112]: {'1': [1, 3, 4, 6, 8, 11], '2': [2, 10, 13]}
# now iterate over `dic` again to get the values related to the items present
# in the keys `'1'` and `'2'`.
In [122]: for x in dic:
   .....:     for y in dic[x]:
   .....:         for z in (level1dict,level2dict,level3dict):
   .....:             if str(y) in z and str(y) not in dic:
   .....:                 if isinstance(z[str(y)],(int,str)):
   .....:                     dic[x].append(z[str(y)])
   .....:                 elif isinstance(z[str(y)],list):
   .....:                     dic[x].extend(z[str(y)])
   .....:
In [123]: dic
Out[123]:
{'1': [1, 3, 4, 6, 8, 11, 5, 9, 14, 15, 12, 16, 17],
'2': [2, 10, 13, 18, 19, 20]}
I have a list like this:
hg = [['A1'], ['A1b'], ['A1b1a1a2a1a~'], ['BT'], ['CF'], ['CT'], ['F'], ['GHIJK'], ['I'], ['I1a2a1a1d2a1a~'], ['I2'], ['I2~'], ['I2a'], ['I2a1'], ['I2a1a'], ['I2a1a2'], ['I2a1a2~'], ['IJ'], ['IJK'], ['L1a2']]
For example, if we look at ['A1'], ['A1b'] and ['A1b1a1a2a1a~']:
I want to count how many times the patterns 'A1', 'A1b' and 'A1b1a1a2a1a~' occur.
Basically, A1 appears 3 times (A1 itself, A1 in A1b and A1 in A1b1a1a2a1a), A1b two times (A1b itself and A1b in A1b1a1a2a1a) and A1b1a1a2a1a one time. Obviously, I want to do that for the entire list.
However, if in the list we have for example E1b1a1, I don't want to count a match of A1 in E1b1a1.
So what I did is:
dic_test = {}
for i in hg:
    for j in hg:
        if ''.join(i) in ''.join(j):
            if ''.join(i) not in dic_test.keys():
                dic_test[''.join(i)] = 1
            else:
                dic_test[''.join(i)] += 1
print(dic_test)
output:{'A1': 3, 'A1b': 2, 'A1b1a1a2a1a~': 1, 'BT': 1, 'CF': 1, 'CT': 1, 'F': 2, 'GHIJK': 1, 'I': 12, 'I1a2a1a1d2a1a~': 1, 'I2': 7, 'I2~': 1, 'I2a': 5, 'I2a1': 4, 'I2a1a': 3, 'I2a1a2': 2, 'I2a1a2~': 1, 'IJ': 3, 'IJK': 2, 'L1a2': 1}
However, as explained above, there is one issue. For example, F should equal 1, not 2. The reason is that with the code above I look for F anywhere in the strings, not just at the beginning. But I don't know how to correct that!
There is a second thing that I don't know how to do:
Based on the output:
{'A1': 3, 'A1b': 2, 'A1b1a1a2a1a~': 1, 'BT': 1, 'CF': 1, 'CT': 1, 'F': 2, 'GHIJK': 1, 'I': 12, 'I1a2a1a1d2a1a~': 1, 'I2': 7, 'I2~': 1, 'I2a': 5, 'I2a1': 4, 'I2a1a': 3, 'I2a1a2': 2, 'I2a1a2~': 1, 'IJ': 3, 'IJK': 2, 'L1a2': 1}
I would like to sum the values of the dict based on shared patterns.
Example of the desired output: {A1b1a1a2a1a~: 6, 'BT': 1,'CF': 1, 'CT': 1, 'F': 1, 'GHIJK': 1, 'I1a2a1a1d2a1a~': 13, I2a1a2:35, 'IJK': 5, 'IJK': 5}
For example, A1b1a1a2a1a = 6 because it is made up of A1, which has a value of 3, A1b with a value of 2, and A1b1a1a2a1a itself with a value of 1.
I don't know how to do that.
Any helps will be much appreciated!
Thanks
You count 'F' twice because you are iterating over the product of hg and hg so that the condition if ''.join(i) in ''.join(j) happens twice for 'F'. I solved that by checking the indexes.
You mentioned in the comment that the pattern should be at the beginning of the string so in doesn't work here. You can use .startswith() for that.
I first created a dictionary from the items, but sorted (that's important for your second question about summing the values). They all start with the value of 1. Then I iterated over the items, increasing the value only if they are not in the same position.
For the second part of your question, because they are sorted, only the previous items can be at the beginning of the next items. So I got the pairs with .popitem(), which returns the last pair (in Python 3.7 and above), and checked its previous ones until the dictionary is empty.
hg = [['A1'], ['A1b'], ['A1b1a1a2a1a~'], ['BT'], ['CF'], ['CT'], ['F'], ['GHIJK'], ['I'], ['I1a2a1a1d2a1a~'], ['I2'], ['I2~'], ['I2a'], ['I2a1'], ['I2a1a'], ['I2a1a2'], ['I2a1a2~'], ['IJ'], ['IJK'], ['L1a2']]
# create a sorted dictionary of all items, each with the value of 1.
d = dict.fromkeys((item[0] for item in sorted(hg)), 1)

for idx1, (k, v) in enumerate(d.items()):
    for idx2, item in enumerate(hg):
        if idx1 != idx2 and item[0].startswith(k):
            d[k] += 1

print(d)
print("-----------------------------------")
# last pair in `d`
k, v = d.popitem()
result = {k: v}
while d:
    # pop last pair in `d`
    k1, v1 = d.popitem()
    # get last pair in `result`
    k2, v2 = next(reversed(result.items()))
    if k2.startswith(k1):
        result[k2] += v1
    else:
        result[k1] = v1
print({k: result[k] for k in reversed(result)})
output:
{'A1': 3, 'A1b': 2, 'A1b1a1a2a1a~': 1, 'BT': 1, 'CF': 1, 'CT': 1, 'F': 1, 'GHIJK': 1, 'I': 11, 'I1a2a1a1d2a1a~': 1, 'I2': 7, 'I2a': 6, 'I2a1': 5, 'I2a1a': 4, 'I2a1a2': 3, 'I2a1a2~': 2, 'I2~': 2, 'IJ': 2, 'IJK': 1, 'L1a2': 1}
-----------------------------------
{'A1b1a1a2a1a~': 6, 'BT': 1, 'CF': 1, 'CT': 1, 'F': 1, 'GHIJK': 1, 'I1a2a1a1d2a1a~': 12, 'I2a1a2~': 27, 'I2~': 2, 'IJK': 3, 'L1a2': 1}
I think you made a mistake in your expected result and it should be like this, but let me know if mine is wrong.
@S.B helped me to better understand what I wanted to do, so I made some modifications to the second part of the script.
I converted the dictionary "d" (renamed "hg_d") into a list of lists:
hg_d_to_list = list(map(list, hg_d.items()))
Then, I created a dictionary where the keys are the words and the values are the lists of words that match with startswith(), like this:
from collections import defaultdict

nested_HGs = defaultdict(list)
for i in range(len(hg_d_to_list)):
    for j in range(i+1, len(hg_d_to_list)):
        if hg_d_to_list[j][0].startswith(hg_d_to_list[i][0]):
            nested_HGs[hg_d_to_list[j][0]].append(hg_d_to_list[i][0])
nested_HGs
defaultdict(<class 'list'>, {'A1b': ['A1'], 'A1b1a1a2a1a': ['A1', 'A1b'], 'I1a2a1a1d2a1a~': ['I'], 'I2': ['I'], 'I2a': ['I', 'I2'], 'I2a1': ['I', 'I2', 'I2a'], 'I2a1a': ['I', 'I2', 'I2a', 'I2a1'], 'I2a1a2': ['I', 'I2', 'I2a', 'I2a1', 'I2a1a'], 'I2a1a2~': ['I', 'I2', 'I2a', 'I2a1', 'I2a1a', 'I2a1a2'], 'I2~': ['I', 'I2'], 'IJ': ['I'], 'IJK': ['I', 'IJ']})
Then, for each key of the dictionary "nested_HGs", I sum the counts of the key itself and of its associated values, using the values of the dictionary "hg_d", like this:
HGs_score = {}
for key, val in hg_d.items():
    for key2, val2 in nested_HGs.items():
        if key in val2 or key in key2:
            if key2 not in HGs_score.keys():
                HGs_score[key2] = val
            else:
                HGs_score[key2] += val
HGs_score
{'A1b': 5, 'A1b1a1a2a1a': 6, 'I1a2a1a1d2a1a~': 12, 'I2': 18, 'I2a': 24, 'I2a1': 29, 'I2a1a': 33, 'I2a1a2': 36, 'I2a1a2~': 38, 'I2~': 20, 'IJ': 13, 'IJK': 14}
Here, I realized that I don't care about the keys with a value of 1.
To finish, I get the key of the dictionary that has the highest value:
final_HG_classification = max(HGs_score, key=HGs_score.get)
final_HG_classification
'I2a1a2~'
It looks like it's working! Any suggestions or improvements are more than welcome.
Thanks in advance.
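As a possible simplification of the HGs_score loop above (a hedged sketch, assuming every match is either the key itself or one of its recorded prefixes, which is the case for this data): the score of each key is just its own count plus the counts of its prefixes, which nested_HGs already records.

# hedged sketch: same result as the HGs_score loop when matches are prefix-only
HGs_score = {key: hg_d[key] + sum(hg_d[p] for p in prefixes)
             for key, prefixes in nested_HGs.items()}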
I have data coming in for multiple samples, and each dataframe looks like the one below.
I wish to convert the extraction from each dataframe into a dictionary.
import pandas as pd

lake = pd.DataFrame({'t': ['t0', 't1', 't2', 't3'],
                     'area': [10, 20, 10, 15],
                     'freq': [100, 88, 130, 140],
                     'sensor1_avg': [2, 5, 2, 8],
                     'sensor2_avg': [3, 3, 2, 3],
                     'sensor3_avg': [7, 5, 2, 3],
                     'sensor4_avg': [7, 5, 2, 3]})
def process_df_todict(df):
    max_area = max(df.area)
    min_area = min(df.area)
    max_freq = max(df.freq)
    min_freq = min(df.freq)
    max_delta_sensor_avg = max(max(df.sensor1_avg)-min(df.sensor1_avg), max(df.sensor2_avg)-min(df.sensor2_avg), max(df.sensor3_avg)-min(df.sensor3_avg), max(df.sensor4_avg)-min(df.sensor4_avg))
    min_delta_sensor_avg = min(max(df.sensor1_avg)-min(df.sensor1_avg), max(df.sensor2_avg)-min(df.sensor2_avg), max(df.sensor3_avg)-min(df.sensor3_avg), max(df.sensor4_avg)-min(df.sensor4_avg))
    final_dict = {max_area : eval(max_area), min_area : eval(min_area), ....}
    return final_dict
process_df_todict(lake)
output: {'max_area': 20, 'min_area': 10, 'max_freq': 140, 'min_freq': 88, 'max_delta_sensor_avg': 6, 'min_delta_sensor_avg': 1}
Is there any better way to extract data out of the dataframe into a dict than what is shown?
Columns have built-in max/min methods you can use, and there is no need for eval:
max_area = df.area.max()
For the sensor columns summary, instead of enumerating all columns manually, you can use filter(like='sensor') and process them all in one go:
lake.filter(like='sensor').pipe(
    lambda sensors: sensors.max() - sensors.min()
).pipe(
    lambda delta: {
        'max_delta_sensor_avg': delta.max(),
        'min_delta_sensor_avg': delta.min()
    }
)
{'max_delta_sensor_avg': 6, 'min_delta_sensor_avg': 1}
Put together:
def process_df_todict(df):
    sensor_stats = df.filter(like='sensor').pipe(
        lambda sensors: sensors.max() - sensors.min()
    ).pipe(
        lambda delta: {'max_delta_sensor_avg': delta.max(), 'min_delta_sensor_avg': delta.min()}
    )
    return {
        'max_area': df.area.max(),
        'min_area': df.area.min(),
        'max_freq': df.freq.max(),
        'min_freq': df.freq.min(),
        **sensor_stats
    }
process_df_todict(lake)
{'max_area': 20, 'min_area': 10, 'max_freq': 140, 'min_freq': 88, 'max_delta_sensor_avg': 6, 'min_delta_sensor_avg': 1}
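A hedged variant of the same idea (a sketch, not part of the answer above): compute the per-column range in a single .agg call instead of the first .pipe:

delta = lake.filter(like='sensor').agg(lambda s: s.max() - s.min())
sensor_stats = {'max_delta_sensor_avg': delta.max(),
                'min_delta_sensor_avg': delta.min()}
# {'max_delta_sensor_avg': 6, 'min_delta_sensor_avg': 1}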
I've searched quite a lot, but I haven't found any question similar to this one.
I have two lists of dictionaries in following format:
import datetime

data1 = [
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 6, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 7, 'date_time': datetime.datetime(2020, 4, 3, 16, 14, 21)},
]
data2 = [
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 6, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
]
desired output:
final_data = [
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 7, 'date_time': datetime.datetime(2020, 4, 3, 16, 14, 21)},
]
I want only dictionaries which are in data1 and not in data2.
Until now when I found a match in two for loops I popped the dictionary out of the list but that does not seem like a good approach to me. How can I achieve desired output?
It doesn't have to be time efficient, since there will be at most tens of dictionaries in each list.
Current implementation:
counter_i = 0
for i in range(len(data1)):
    counter_j = 0
    for j in range(len(data2)):
        if data1[i-counter_i]['id'] == data2[j-counter_j]['id'] and data1[i-counter_i]['date_time'] == data2[j-counter_j]['date_time']:
            data1.pop(i-counter_i)
            data2.pop(j-counter_j)
            counter_i += 1
            counter_j += 1
            break
If performance is not an issue, why not:
for d in data2:
    try:
        data1.remove(d)
    except ValueError:
        pass
list.remove checks for object equality, not identity, so will work for dicts with equal keys and values. Also, list.remove only removes one occurrence at a time.
schwobaseggl's answer is probably the cleanest solution (just make a copy before removing if you need to keep data1 intact).
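For instance, a small hedged sketch of that copy-first variant (assuming data1 and data2 as defined in the question):

final_data = list(data1)  # shallow copy so data1 itself is left untouched
for d in data2:
    try:
        final_data.remove(d)
    except ValueError:
        pass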
But if you want to use a set difference... well dicts are not hashable, because their underlying data could change and lead to issues (same reason why lists or sets are not hashable either).
However, you can put all of a dict's key-value pairs in a frozenset to represent that dictionary (assuming the dictionary values are hashable - schwobaseggl). Frozensets are hashable, so you can add those to a set and do a normal set difference, then reconstruct the dictionaries at the end :D.
I don't actually recommend doing it, but here we go:
final_data = [
    dict(s)
    for s in set(
        frozenset(d.items()) for d in data1
    ).difference(
        frozenset(d.items()) for d in data2
    )
]
You can go either way:
Method 1:
#using filter and lambda function
final_data = filter(lambda i: i not in data2, data1)
final_data = list(final_data)
Method 2:
# using list comprehension to perform task
final_data = [i for i in data1 if i not in data2]
Hello, I have a dict of Counters that contains data like this:
{1301: Counter({'total': 18,
                'inDevelopment': 13,
                'isDuplicate': 2,
                'inAnalysis': 2,
                'inQuest': 1}),
 1302: Counter({'total': 15,
                'inDevelopment': 9,
                'inQuest': 1,
                'inValidation': 1,
                'inAnalysis': 1,
                'ongoing': 3})}
How can I retrieve its values in a list without repetition?
I mean I would like to extract all the existing values, but instead of getting them all, I would like to have them NOT duplicated, so instead of this:
[' inDevelopment','isDuplicate','inAnalysis', 'inQuest','total', 'inDevelopment','inQuest', 'inValidation','inAnalysis', 'ongoing']
The output would be like this :
['total','inDevelopment','isDuplicate','inAnalysis','inQuest','inValidation','ongoing']
Any help would be appreciated, thanks!
You can union Counter objects using | operator:
>>> from collections import Counter
>>> a = Counter('123')
>>> b = Counter('44144')
>>> a
Counter({'2': 1, '3': 1, '1': 1})
>>> b
Counter({'4': 4, '1': 1})
>>> a | b
Counter({'4': 4, '2': 1, '3': 1, '1': 1})
>>> list(a | b)
['2', '3', '1', '4']
In Python 2.x
>>> from collections import Counter
>>> d = {1301: Counter({'total': 18,
...
... "ongoing" : 3})}
>>> list(reduce(lambda a,b:a|b, d.values()))
['inAnalysis', 'inQuest', 'inDevelopment', ' inDevelopment', 'inValidation', 'ongoing', 'isDuplicate', 'total']
In Python 3.x
>>> from collections import Counter
>>> from functools import reduce
>>> d = ...
>>> list(reduce(lambda a,b:a|b, d.values()))
['inValidation', 'total', ' inDevelopment', 'inDevelopment', 'isDuplicate', 'ongoing', 'inQuest', 'inAnalysis']
UPDATE
You can also use set.union:
>>> list(set().union(*d.values()))
['inValidation', 'inDevelopment', 'isDuplicate', 'total', 'ongoing', 'inAnalysis', 'inQuest', ' inDevelopment']
This works in both Python 2.x and 3.x with the same code.
You can use np.unique (with import numpy as np):
>>> d = {1301: Counter({'total': 18,
...                     'inDevelopment': 13,
...                     'isDuplicate': 2,
...                     'inAnalysis': 2,
...                     'inQuest': 1}),
...      1302: Counter({'total': 15,
...                     'inDevelopment': 9,
...                     'inQuest': 1,
...                     'inValidation': 1,
...                     'inAnalysis': 1,
...                     'ongoing': 3})}
gives
>>> np.unique(list(d[1301]|d[1302]))
array(['inAnalysis', 'inDevelopment', 'inQuest', 'inValidation',
'isDuplicate', 'ongoing', 'total'],
dtype='|S13')
I have an array containing an even number of integers. The array represents a pairing of an identifier and a count. The tuples have already been sorted by the identifier. I would like to merge a few of these arrays together. I have thought of a few ways to do it but they are fairly complicated and I feel there might be an easy way to do this with python.
IE:
[<id>, <count>, <id>, <count>]
Input:
[14, 1, 16, 4, 153, 21]
[14, 2, 16, 3, 18, 9]
Output:
[14, 3, 16, 7, 18, 9, 153, 21]
It would be better to store these as dictionaries than as lists (not just for this purpose, but for other use cases, such as extracting the value of a single ID):
x1 = [14, 1, 16, 4, 153, 21]
x2 = [14, 2, 16, 3, 18, 9]
# turn into dictionaries (could write a function to convert)
d1 = dict([(x1[i], x1[i + 1]) for i in range(0, len(x1), 2)])
d2 = dict([(x2[i], x2[i + 1]) for i in range(0, len(x2), 2)])
print d1
# {16: 4, 153: 21, 14: 1}
After that, you could use any of the solutions in this question to add them together. For example (taken from the first answer):
import collections

def d_sum(a, b):
    d = collections.defaultdict(int, a)
    for k, v in b.items():
        d[k] += v
    return dict(d)

print d_sum(d1, d2)
# {16: 7, 153: 21, 18: 9, 14: 3}
collections.Counter() is what you need here:
In [21]: lis1=[14, 1, 16, 4, 153, 21]
In [22]: lis2=[14, 2, 16, 3, 18, 9]
In [23]: from collections import Counter
In [24]: dic1=Counter(dict(zip(lis1[0::2],lis1[1::2])))
In [25]: dic2=Counter(dict(zip(lis2[0::2],lis2[1::2])))
In [26]: dic1+dic2
Out[26]: Counter({153: 21, 18: 9, 16: 7, 14: 3})
or :
In [51]: it1=iter(lis1)
In [52]: it2=iter(lis2)
In [53]: dic1=Counter(dict((next(it1),next(it1)) for _ in xrange(len(lis1)/2)))
In [54]: dic2=Counter(dict((next(it2),next(it2)) for _ in xrange(len(lis2)/2)))
In [55]: dic1+dic2
Out[55]: Counter({153: 21, 18: 9, 16: 7, 14: 3})
Use collections.Counter:
import itertools
import collections

def grouper(n, iterable, fillvalue=None):
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

count1 = collections.Counter(dict(grouper(2, lst1)))
count2 = collections.Counter(dict(grouper(2, lst2)))
result = count1 + count2
I've used the itertools library grouper recipe here to convert your data to dictionaries, but as other answers have shown you there are more ways to skin that particular cat.
result is a Counter with each id pointing to a total count:
Counter({153: 21, 18: 9, 16: 7, 14: 3})
Counters are multi-sets and will keep track of the count of each key with ease. It feels like a much better data structure for your data. They support summing, as used above, for example.
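If you need the merged counts back in the question's flat [id, count, id, count, ...] form, a small hedged sketch (sorted by id, matching the expected output):

merged = count1 + count2
flat = [x for key in sorted(merged) for x in (key, merged[key])]
# [14, 3, 16, 7, 18, 9, 153, 21]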
All of the previous answers look good, but I think the JSON blob should be properly formed to begin with, or else (in my experience) it can cause serious problems down the road during debugging. In this case, with id and count as the fields, the JSON should look like:
[{"id":1, "count":10}, {"id":2, "count":10}, {"id":1, "count":5}, ...]
Properly formed JSON like that is much easier to deal with, and probably similar to what you have coming in anyway.
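For reference, a hedged sketch of reshaping the question's flat list into that list-of-dicts form (flat and records are just illustrative names):

flat = [14, 1, 16, 4, 153, 21]
records = [{"id": flat[i], "count": flat[i + 1]} for i in range(0, len(flat), 2)]
# [{'id': 14, 'count': 1}, {'id': 16, 'count': 4}, {'id': 153, 'count': 21}]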
This class is a bit general, but certainly extensible
from itertools import groupby

class ListOfDicts():
    def __init__(self, listofD=None):
        self.list = []
        if listofD is not None:
            self.list = listofD

    def key_total(self, group_by_key, aggregate_key):
        """ Aggregate a list of dicts by a specific key, and aggregation key"""
        # note: groupby only groups consecutive records, so self.list should
        # already be sorted/grouped by group_by_key
        out_dict = {}
        for k, g in groupby(self.list, key=lambda r: r[group_by_key]):
            print k
            total = 0
            for record in g:
                print " ", record
                total += record[aggregate_key]
            out_dict[k] = total
        return out_dict

if __name__ == "__main__":
    z = ListOfDicts([{'id': 1, 'count': 2, 'junk': 2},
                     {'id': 1, 'count': 4, 'junk': 2},
                     {'id': 1, 'count': 6, 'junk': 2},
                     {'id': 2, 'count': 2, 'junk': 2},
                     {'id': 2, 'count': 3, 'junk': 2},
                     {'id': 2, 'count': 3, 'junk': 2},
                     {'id': 3, 'count': 10, 'junk': 2},
                     ])
    totals = z.key_total("id", "count")
    print totals
Which gives
1
{'count': 2, 'junk': 2, 'id': 1}
{'count': 4, 'junk': 2, 'id': 1}
{'count': 6, 'junk': 2, 'id': 1}
2
{'count': 2, 'junk': 2, 'id': 2}
{'count': 3, 'junk': 2, 'id': 2}
{'count': 3, 'junk': 2, 'id': 2}
3
{'count': 10, 'junk': 2, 'id': 3}
{1: 12, 2: 8, 3: 10}