Compare dicts and merge them. No overwrite and no duplicate values - python

I made a mistake in my question here (wrong requested input and expected output):
Comparing dicts, updating NOT overwriting values
I am not looking for this solution:
Combining 2 dictionaries with common key
So this question is not a duplicate.
Problem statement:
requested input:
d1 = {'a': ['a'], 'b': ['b', 'c']}
d2 = {'b': ['c', 'd'], 'c': ['e','f']}
expected output (I don't care about the order of the keys / values!):
new_dict = {'a': ['a'], 'b': ['b', 'c', 'd'], 'c': ['e', 'f']}
The solution in Combining 2 dictionaries with common key
gives following output:
new_dict = {'a': ['a'], 'b': ['b', 'c', 'c', 'd'], 'c': ['e', 'f']}
I don't want the duplicates to be stored.
My solution (it works but it is not so efficient):
unique_vals = []
new_dict = {}
for key in list(d1.keys())+list(d2.keys()) :
    unique_vals = []
    try:
        for val in d1[key]:
            try:
                for val1 in d2[key]:
                    if(val1 == val) and (val1 not in unique_vals):
                        unique_vals.append(val)
            except:
                continue
    except:
        new_dict[key] = unique_vals
    new_dict[key] = unique_vals
for key in d1.keys():
    for val in d1[key]:
        if val not in new_dict[key]:
            new_dict[key].append(val)
for key in d2.keys():
    for val in d2[key]:
        if val not in new_dict[key]:
            new_dict[key].append(val)

Here is how I would go about it:
d1 = {'a': ['a'], 'b': ['b', 'c']}
d2 = {'b': ['c', 'd'], 'c': ['e','f']}
dd1 = {**d1, **d2}
dd2 = {**d2, **d1}
{k:list(set(dd1[k]).union(set(dd2[k]))) for k in dd1}
Produces the desired result.

I suggest using a default dictionary collection with a set as a default value.
It guarantees that all values will be unique and makes the code cleaner.
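In terms of efficiency, with two input dicts it is roughly linear in the total number of values, since building and uniting sets takes time proportional to their sizes.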
from collections import defaultdict
d1 = {'a': ['a'], 'b': ['b', 'c']}
d2 = {'b': ['c', 'd'], 'c': ['e','f']}
new_dict = defaultdict(set)
for k, v in d1.items():
    new_dict[k] = new_dict[k].union(set(v))
for k, v in d2.items():
    new_dict[k] = new_dict[k].union(set(v))
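The defaultdict approach above produces a defaultdict of sets rather than the dict of lists shown in the question. If plain lists are needed, a minimal sketch could add the values with set.update() in place and convert at the end. Sorting is an assumption here, used only so the output order is predictable:

```python
from collections import defaultdict

d1 = {'a': ['a'], 'b': ['b', 'c']}
d2 = {'b': ['c', 'd'], 'c': ['e', 'f']}

merged_sets = defaultdict(set)
for source in (d1, d2):
    for key, values in source.items():
        # update() adds the values to the existing set in place.
        merged_sets[key].update(values)

# Convert to a plain dict of lists; sorted() only makes the order predictable.
new_dict = {key: sorted(values) for key, values in merged_sets.items()}
print(new_dict)
# {'a': ['a'], 'b': ['b', 'c', 'd'], 'c': ['e', 'f']}
```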

Try this code. You can remove the deep copy if it is fine for you to modify the original dictionary.
import copy
def merge(left, right):
    res = copy.deepcopy(left)
    for k, v in right.items():
        # list(v) makes a copy, so res never shares a list with right
        res[k] = list(set(res[k]).union(v)) if k in res else list(v)
    return res

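A simple if statement, if you don't want to use a set:
# copy each list, so that appending below does not modify d1 or d2
d3 = {k: list(v) for k, v in d2.items()}
for k, v in d1.items():
    if k not in d3:
        d3[k] = list(v)
    else:
        for n in v:
            if n not in d3[k]:
                d3[k].append(n)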
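The answers above all handle exactly two dicts. As a sketch under the assumption that the list items are hashable and that you want to keep first-seen order instead of the arbitrary order a set gives, a dict can serve as an ordered set (dicts keep insertion order from Python 3.7 on). The name merge_unique is just an illustrative choice:

```python
def merge_unique(*dicts):
    """Merge any number of dicts of lists, keeping each value once per key.

    Assumes the values inside the lists are hashable. A dict is used as an
    ordered set, so values keep the order in which they were first seen.
    """
    seen = {}
    for source in dicts:
        for key, values in source.items():
            # dict.fromkeys(values) maps each value to None; update() leaves
            # values that are already present at their original position.
            seen.setdefault(key, {}).update(dict.fromkeys(values))
    # Build fresh lists, so the input dicts are never modified.
    return {key: list(ordered) for key, ordered in seen.items()}


d1 = {'a': ['a'], 'b': ['b', 'c']}
d2 = {'b': ['c', 'd'], 'c': ['e', 'f']}
print(merge_unique(d1, d2))
# {'a': ['a'], 'b': ['b', 'c', 'd'], 'c': ['e', 'f']}
```

Because it accepts *dicts, the same call also works for three or more inputs.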

Related

How to merge keys of dictionary which have the same value?

I need to combine two dictionaries by their value, resulting in a new key which is the list of keys with the shared value. All I can find online is how to add two values with the same key or how to simply combine two dictionaries, so perhaps I am just searching in the wrong places.
To give an idea:
dic1 = {'A': 'B', 'C': 'D'}
dic2 = {'D': 'B', 'E': 'F'}
Should result in:
dic3 = {['A', 'D']: 'B', 'C': 'D', 'E': 'F'}
I am not sure why you would need such a data structure, you can probably find a better solution to your problem. However, just for the sake of answering your question, here is a possible solution:
dic1 = {'A':'B', 'C':'D'}
dic2 = {'D':'B', 'E':'F'}
key_list = list(dic2.keys())
val_list = list(dic2.values())
r = {}
for k,v in dic1.items():
    if v in val_list:
        i = val_list.index(v) #get index at value
        k2 = key_list[i] #use index to retrieve the key at value
        r[(k, k2)] = v #make the dict entry
    else:
        r[k] = v
val_list = list(r.values()) #get all the values already processed
for k,v in dic2.items():
    if v not in val_list: #if missing value
        r[k] = v #add new entry
print(r)
output:
{('A', 'D'): 'B', 'C': 'D', 'E': 'F'}
You can't use a list as a key in a Python dictionary, since the key must be hashable and a list is not a hashable object, so I have used a tuple instead.
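To see the hashability rule in action, here is a small sketch with a hypothetical helper, can_be_dict_key, that simply tries to build a one-entry dict and catches the TypeError Python raises for unhashable keys:

```python
def can_be_dict_key(candidate):
    """Hypothetical helper: report whether `candidate` is usable as a dict key."""
    try:
        {candidate: 'B'}  # building the dict hashes the key
    except TypeError:
        return False
    return True


print(can_be_dict_key(['A', 'D']))    # False: a list is unhashable
print(can_be_dict_key(('A', 'D')))    # True: a tuple of strings is hashable
print(can_be_dict_key(('A', ['D'])))  # False: the tuple contains a list
```

The last case shows that a tuple is only hashable if everything inside it is hashable too.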
I would use a defaultdict of lists and build a reversed dict and in the end reverse it while converting the lists to tuples (because lists are not hashable and can't be used as dict keys):
from collections import defaultdict
dic1 = {'A':'B', 'C':'D'}
dic2 = {'D':'B', 'E':'F'}
temp = defaultdict(list)
for d in (dic1, dic2):
    for key, value in d.items():
        temp[value].append(key)
print(temp)
res = {}
for key, value in temp.items():
    if len(value) == 1:
        res[value[0]] = key
    else:
        res[tuple(value)] = key
print(res)
The printout from this (showing the middle step of temp) is:
defaultdict(<class 'list'>, {'B': ['A', 'D'], 'D': ['C'], 'F': ['E']})
{('A', 'D'): 'B', 'C': 'D', 'E': 'F'}
If you are willing to accept 1-element tuples as keys, the second part becomes much simpler:
res = {tuple(value): key for key, value in temp.items()}

create new dict from old dict values? [closed]

Goal
old_dct= {'A':'B','B':'C','C':'D','D':'E','F':'G','G':'K'}
new_dct = {'A':['B','C','D','E']
,'B':['C','D','E']
,'C':['D','E']
,'D':['E']
,'F':['G','K']
,'G':['K']}
The keys of new_dct are the same as those of old_dct. The value for each key is built by following old_dct: take the key's value, and as long as that value is itself a key of old_dct, keep following it, collecting every value along the way into a list. For example, 'A' is a key of old_dct with value 'B', 'B' is a key with value 'C', and so on, so the value of 'A' in new_dct is ['B','C','D','E'].
old_dct= {'A':'B','B':'C','C':'D','D':'E','F':'G','G':'K'}
def get_values(letter, old_dict, values):
    if letter in old_dict:
        values.append(letter)
        new_letter = old_dict[letter]
        return get_values(new_letter, old_dict, values)
    values.append(letter)
    return values
new_dict = {}
for key, value in old_dct.items():
    new_dict[key] = get_values(value, old_dct, [])
print(new_dict)
output
{'A': ['B', 'C', 'D', 'E'], 'B': ['C', 'D', 'E'], 'C': ['D', 'E'], 'D': ['E'], 'F': ['G', 'K'], 'G': ['K']}
You can try this:
old_dct= {'A':'B','B':'C','C':'D','D':'E','F':'G','G':'K'}
def itr_dct(key):
    lst = []
    g = key
    while old_dct.get(key) and g:
        g = old_dct.get(g)
        lst.append(g)
    return lst[:-1]
{k : itr_dct(k) for k,v in old_dct.items()}
Output:
{'A': ['B', 'C', 'D', 'E'],
'B': ['C', 'D', 'E'],
'C': ['D', 'E'],
'D': ['E'],
'F': ['G', 'K'],
'G': ['K']}
You could try a first approach like this (not very efficient, because it recalculates a lot of paths):
old_dct= {'A':'B','B':'C','C':'D','D':'E','F':'G','G':'K'}
new_dct = {}
for l in old_dct:
    current_path = []
    k = l
    while k in old_dct and k is not None:
        k = old_dct.get(k,None)
        if k:
            current_path.append(k)
    new_dct[l] = current_path
If you don't want to recalculate paths, you could check whether the path of the current letter was already calculated:
old_dct= {'A':'B','B':'C','C':'D','D':'E','F':'G','G':'K'}
new_dct = {}
for l in old_dct:
    if l not in new_dct:
        current_path = []
        k = l
        while k in old_dct and k is not None:
            k = old_dct.get(k,None)
            if k:
                current_path.append(k)
            if k in new_dct:
                # k is already in current_path, so only add the rest of its cached path
                current_path.extend(new_dct[k])
                break
        new_dct[l] = current_path
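The caching idea above can also be written recursively with a memo dict, so that each chain is computed exactly once regardless of iteration order. This is a sketch that assumes the chains are shorter than Python's recursion limit; follow_chains is a hypothetical name, and reporting a cycle with a ValueError (instead of looping forever) is an added assumption:

```python
def follow_chains(mapping):
    """Map each key to the list of values reached by following `mapping`.

    Assumes chains are acyclic and shorter than the recursion limit.
    """
    chains = {}          # memo: key -> its already computed chain
    in_progress = set()  # keys on the current recursion path, to detect cycles

    def chain(key):
        if key in chains:
            return chains[key]
        if key in in_progress:
            raise ValueError(f"cycle detected at {key!r}")
        in_progress.add(key)
        nxt = mapping[key]
        # Stop when the next value is not a key; otherwise reuse its chain.
        rest = chain(nxt) if nxt in mapping else []
        chains[key] = [nxt] + rest  # a new list, so chains never share lists
        in_progress.discard(key)
        return chains[key]

    for key in mapping:
        chain(key)
    # Rebuild in the original key order (the memo is filled depth-first).
    return {key: chains[key] for key in mapping}


old_dct = {'A': 'B', 'B': 'C', 'C': 'D', 'D': 'E', 'F': 'G', 'G': 'K'}
print(follow_chains(old_dct))
# {'A': ['B', 'C', 'D', 'E'], 'B': ['C', 'D', 'E'], 'C': ['D', 'E'], 'D': ['E'], 'F': ['G', 'K'], 'G': ['K']}
```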
def staggered_dict(d):
    l = []
    new_d = {}
    for key, value in d.items():
        temp_d = {**d}
        temp_d.pop(key)
        l.append(value)
        while value in temp_d:
            l.append(temp_d[value])
            temp_d = {**temp_d}
            value = temp_d.pop(value)
        new_d[key] = l
        l = []
    return new_d
old_dct = {'A':'B','B':'C','C':'D','D':'E','F':'G','G':'K'}
new_dct = staggered_dict(old_dct)
print(new_dct)
Out:
{'A': ['B', 'C', 'D', 'E'], 'B': ['C', 'D', 'E'], 'C': ['D', 'E'], 'D': ['E'], 'F': ['G', 'K'], 'G': ['K']}

Dictionary value replace

Imagine I have a dictionary like this:
d = {'1':['a'], '2':['b', 'c', 'd'], '3':['e', 'f'], '4':['g']}
Each key of the dictionary represents a unique person of a certain class.
Each key must have only one value.
Keys with one value represent the correct reassignment.
Keys with more than one value represent the possibilities. One of those values is the most correct for that key.
I have a processed list with the most correct values.
LIST = ['c', 'e']
I now need to check the values in LIST against every dictionary value with len(values) > 1 and replace that value with the match, so that the dictionary looks like this:
d = {'1':['a'], '2':['c'], '3':['e'], '4':['g']}
Initialise your correct values inside a set.
correct = {'c', 'e'}
# correct = set(LIST)
Now, assuming list values with more than one element can have only a single correct element, you can build a dictionary using a conditional comprehension:
d2 = {k : list(correct.intersection(v)) if len(v) > 1 else v for k, v in d.items()}
print(d2)
# {'1': ['a'], '2': ['c'], '3': ['e'], '4': ['g']}
If there can be more than one possible correct value, you can keep just one of them (set intersection does not preserve order, so which one is kept is arbitrary).
d2 = {}
for k, v in d.items():
    if len(v) > 1:
        c = list(correct.intersection(v))
        v = c[:1]
    d2[k] = v
print(d2)
# {'1': ['a'], '2': ['c'], '3': ['e'], '4': ['g']}
If you meant to mutate d in-place (because making a full copy can be expensive), then the above solution simplifies to
for k, v in d.items():
    if len(v) > 1:
        c = list(correct.intersection(v))
        d[k] = c[:1]
print(d)
# {'1': ['a'], '2': ['c'], '3': ['e'], '4': ['g']}
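Because set intersection does not preserve the order of v, which match gets kept in the approaches above is arbitrary when several candidates are correct. If you want the choice to be deterministic, you could keep the first candidate, in list order, that is in the correct set. A sketch (resolve is an illustrative name, and keeping the original candidates when nothing matches is an assumption, since the question does not say what should happen then):

```python
def resolve(d, correct_values):
    """Replace multi-candidate lists with the first candidate that is correct."""
    correct = set(correct_values)
    resolved = {}
    for key, candidates in d.items():
        if len(candidates) > 1:
            # First match in list order, or None if no candidate is correct.
            match = next((c for c in candidates if c in correct), None)
            # Assumption: with no match, keep a copy of the original candidates.
            resolved[key] = [match] if match is not None else list(candidates)
        else:
            resolved[key] = list(candidates)
    return resolved


d = {'1': ['a'], '2': ['b', 'c', 'd'], '3': ['e', 'f'], '4': ['g']}
print(resolve(d, ['c', 'e']))
# {'1': ['a'], '2': ['c'], '3': ['e'], '4': ['g']}
```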
Another approach in one statement using dict comprehension:
d = {'1':['a'], '2':['b', 'c', 'd'], '3':['e', 'f'], '4':['g']}
a = ['c', 'e']
output = {k: v if not any(j in set(a) for j in v) else list(set(v) & set(a)) if v and isinstance(v, (list, tuple)) else [] for k, v in d.items()}
# More readable like this:
# {
# k: v if not any(j in set(a) for j in v) else list(set(v) & set(a))
# if v and isinstance(v, (list, tuple))
# else [] for k, v in d.items()
# }
print(output)
Output:
{'1': ['a'], '2': ['c'], '3': ['e'], '4': ['g']}

Pythonic way to create a dictionary from a list where the keys are the elements that are found in another list and values are elements between keys

Considering that I have two lists like:
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4',
'b', 'some_other_el_1', 'some_other_el_2',
'c', 'another_element_1', 'another_element_2',
'd', '', '', 'another_element_3', 'd4'
]
and I need to create a dictionary where the keys are the elements of the second list that are also found in the first, and the values are lists of the elements found between those "keys", like:
result = {
'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']
}
What's a more pythonic way to do this?
Currently I'm doing this:
# I'm not sure that the first element in the second list
# will also be in the first so I have to create a key
d = {}
k = ''
d[k] = []
for x in l2:
    if x in l1:
        k = x
        d[k] = []
    else:
        d[k].append(x)
But I'm quite sure that this is not the best way to do it, and it also doesn't look nice :)
Edit:
I should also mention that neither list is necessarily ordered, and the second list does not necessarily start with an element from the first one.
I don't think you'll do much better if this is the most specific statement of the problem. I mean I'd do it this way, but it's not much better.
import collections
d = collections.defaultdict(list)
s = set(l1)
k = ''
for x in l2:
    if x in s:
        k = x
    else:
        d[k].append(x)
For fun, you can also do this with itertools and 3rd party numpy:
import numpy as np
from itertools import zip_longest, islice
arr = np.where(np.isin(l2, l1))[0]
res = {l2[i]: l2[i+1: j] for i, j in zip_longest(arr, islice(arr, 1, None))}
print(res)
{'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']}
Here is a version using itertools.groupby. It may or may not be more efficient than the plain version from your post, depending on how groupby is implemented, because the for loop has fewer iterations.
from itertools import groupby
from collections import defaultdict, deque
def group_by_keys(keys, values):
    """
    >>> sorted(group_by_keys('abcdef', [
    ...     1, 2, 3,
    ...     'b', 4, 5,
    ...     'd',
    ...     'a', 6, 7,
    ...     'c', 8, 9,
    ...     'a', 10, 11, 12
    ... ]).items())
    [('a', [6, 7, 10, 11, 12]), ('b', [4, 5]), ('c', [8, 9])]
    """
    keys = set(keys)
    result = defaultdict(list)
    current_key = None
    for is_key, items in groupby(values, key=lambda x: x in keys):
        if is_key:
            current_key = deque(items, maxlen=1).pop()  # last of items
        elif current_key is not None:
            result[current_key].extend(items)
    return result
This doesn't distinguish between keys that don't occur in values at all (like e and f), and keys for which there are no corresponding values (like d). If this information is needed, one of the other solutions might be better suited.
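If that distinction matters, one possible variation of group_by_keys (a sketch, not part of the original answer; group_by_keys_with_empty is a hypothetical name) registers every key it meets in values, so keys like d end up with an empty list while keys like e and f stay absent:

```python
from itertools import groupby


def group_by_keys_with_empty(keys, values):
    """Like group_by_keys, but keys that occur in values always get an entry.

    Keys that never occur in values (such as 'e' and 'f') are still absent.
    """
    keys = set(keys)
    result = {}
    current_key = None
    for is_key, items in groupby(values, key=lambda x: x in keys):
        if is_key:
            for item in items:
                # Register every key that occurs, even if no values follow it.
                result.setdefault(item, [])
                current_key = item
        elif current_key is not None:
            result[current_key].extend(items)
    return result


print(group_by_keys_with_empty('abcdef', [
    1, 2, 3,
    'b', 4, 5,
    'd',
    'a', 6, 7,
    'c', 8, 9,
    'a', 10, 11, 12,
]))
# {'b': [4, 5], 'd': [], 'a': [6, 7, 10, 11, 12], 'c': [8, 9]}
```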
Updated ... Again
I misinterpreted the question. If you are using large lists then list comprehensions are the way to go and they are fairly simple once you learn how to use them.
I am going to use two list comprehensions.
idxs = [i for i, val in enumerate(l2) if val in l1] + [len(l2)+1]
res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
print(res)
Results:
{'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']}
Speed Testing for large lists:
import collections
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4', *(str(i) for i in range(300)),
'b', 'some_other_el_1', 'some_other_el_2', *(str(i) for i in range(100)),
'c', 'another_element_1', 'another_element_2', *(str(i) for i in range(200)),
'd', '', '', 'another_element_3', 'd4'
]
def run_comp():
    idxs = [i for i, val in enumerate(l2) if val in l1] + [len(l2)+1]
    res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
def run_other():
    d = collections.defaultdict(list)
    k = ''
    for x in l2:
        if x in l1:
            k = x
        else:
            d[k].append(x)
import timeit
print('For Loop:', timeit.timeit(run_other, number=1000))
print("List Comprehension:", timeit.timeit(run_comp, number=1000))
Results:
For Loop: 0.1327093063242541
List Comprehension: 0.09343156142774986
old stuff below
This is rather simple with list comprehensions.
{key: [val for val in l2 if key in val] for key in l1}
Results:
{'a': ['a', 'a1', 'a2', 'a3', 'a4'],
'b': ['b', 'b1', 'b2', 'b3', 'b4'],
'c': ['c', 'c1', 'c2', 'c3', 'c4'],
'd': ['d', 'd1', 'd2', 'd3', 'd4'],
'e': [],
'f': []}
The code below shows what is happening above.
d = {}
for key in l1:
    d[key] = []
    for val in l2:
        if key in val:
            d[key].append(val)
The list comprehension / dictionary comprehension (first piece of code) is generally somewhat faster. A comprehension builds its list with a dedicated bytecode instruction, whereas the explicit loop has to look up and call d[key].append on every iteration. Note that list.append itself is amortized constant time: it does not walk the list, and it only occasionally has to reallocate memory as the list grows.
References:
http://www.pythonforbeginners.com/basics/list-comprehensions-in-python
https://docs.python.org/3.6/tutorial/datastructures.html#list-comprehensions
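To check the comprehension-versus-append speed difference on your own data rather than relying on a general rule, a minimal timeit sketch could compare the two styles. The data and the filter condition below are made up for illustration, and the timings will vary by machine and Python version:

```python
import timeit

# Made-up test data: 10,000 numeric strings.
l2 = [str(i) for i in range(10_000)]


def with_comprehension():
    return [x for x in l2 if x.endswith('7')]


def with_append():
    result = []
    for x in l2:
        if x.endswith('7'):
            result.append(x)
    return result


# Timing only makes sense if both versions build the same list.
assert with_comprehension() == with_append()

for func in (with_comprehension, with_append):
    seconds = timeit.timeit(func, number=100)
    print(f"{func.__name__}: {seconds:.3f} s for 100 runs")
```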
You can use itertools.groupby:
import itertools
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = ['x', 'q', 'we', 'da', 'po', 'a', 'el1', 'el2', 'el3', 'el4', 'b', 'some_other_el_1', 'some_other_el_2', 'c', 'another_element_1', 'another_element_2', 'd', '', '', 'another_element_3', 'd4']
groups = [[a, list(b)] for a, b in itertools.groupby(l2, key=lambda x:x in l1)]
final_dict = {groups[i][-1][-1]:groups[i+1][-1] for i in range(len(groups)-1) if groups[i][0]}
Output:
{'a': ['el1', 'el2', 'el3', 'el4'], 'b': ['some_other_el_1', 'some_other_el_2'], 'c': ['another_element_1', 'another_element_2'], 'd': ['', '', 'another_element_3', 'd4']}
Your code is readable, does the job and is reasonably efficient. There's no need to change much!
You could use more descriptive variable names and replace l1 with a set for faster lookup:
keys = {'a', 'c', 'b', 'e', 'f', 'd'}
keys_and_values = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4',
'b', 'some_other_el_1', 'some_other_el_2',
'c', 'another_element_1', 'another_element_2',
'd', '', '', 'another_element_3', 'd4'
]
current_key = None
result = {}
for x in keys_and_values:
    if x in keys:
        current_key = x
        result[current_key] = []
    elif current_key:
        result[current_key].append(x)
print(result)
# {'a': ['el1', 'el2', 'el3', 'el4'],
#  'b': ['some_other_el_1', 'some_other_el_2'],
#  'c': ['another_element_1', 'another_element_2'],
#  'd': ['', '', 'another_element_3', 'd4']}
def find_index():
    idxs = [l2.index(i) for i in set(l1).intersection(set(l2))]
    idxs.sort()
    idxs += [len(l2) + 1]
    res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
    return res
Comparison of methods, using justengel's test:
justengel:  run_comp       .455
justengel:  run_other      .244
mkrieger1:  group_by_keys  .160
me:         find_index     .068
Note that my method ignores keys that don't appear in l2, and doesn't handle cases where keys appear more than once in l2. Empty lists for keys that don't appear in l2 can be added with {**res, **{key: [] for key in set(l1).difference(set(l2))}}, which raises the time to .105.
Even cleaner than turning l1 into a set: use the keys of the dictionary you're building, like this:
d = {x: [] for x in l1}
k = None
for x in l2:
    if x in d:
        k = x
    elif k is not None:
        d[k].append(x)
This is because (in the worst case) your code would be iterating over all the values in l1 for every value in l2 on the if x in l1: line, because checking if a value is in a list takes linear time. Checking if a value is in a dictionary's keys is constant time in the average case (same with sets, as already suggested by Eric Duminil).
I set k to None and check for it because your code would've returned d with '': ['x','q','we','da','po'], which is presumably not what you want. This assumes l1 can't contain None.
My solution also assumes it's okay for the resulting dictionary to contain keys with empty lists if there are items in l1 that never appear in l2. If that's not okay, you can remove them at the end with
final_d = {k: v for k, v in d.items() if v}
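Putting the suggestions from this thread together (a set for fast lookups, skipping items before the first key, and optional empty lists for unused keys), a minimal sketch might look like the following. Two behaviours are assumptions rather than requirements from the question: a key that appears again extends its existing list instead of resetting it, and None must not be one of the keys. The function and parameter names are hypothetical:

```python
def split_by_keys(keys, items, include_missing=False):
    """Map each key found in `items` to the items that follow it.

    Assumptions: items before the first key are dropped, a repeated key
    extends its existing list, and None is not used as a key.
    """
    key_set = set(keys)  # average O(1) membership tests instead of O(len(keys))
    result = {}
    current = None
    for item in items:
        if item in key_set:
            current = item
            result.setdefault(current, [])
        elif current is not None:
            result[current].append(item)
    if include_missing:
        # Give keys that never appeared in `items` an empty list.
        for key in keys:
            result.setdefault(key, [])
    return result


l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = [
    'x', 'q', 'we', 'da', 'po',
    'a', 'el1', 'el2', 'el3', 'el4',
    'b', 'some_other_el_1', 'some_other_el_2',
    'c', 'another_element_1', 'another_element_2',
    'd', '', '', 'another_element_3', 'd4',
]
print(split_by_keys(l1, l2))
# {'a': ['el1', 'el2', 'el3', 'el4'], 'b': ['some_other_el_1', 'some_other_el_2'], 'c': ['another_element_1', 'another_element_2'], 'd': ['', '', 'another_element_3', 'd4']}
```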

Find same dictionary by value

I have a dictionary like the one below:
dict1 = {'a':{'a':20, 'b':30}, 'b':{'a':30, 'b':40}, 'c':{'a':20, 'b':30}, 'd':{'a':30, 'b':40}}
In this dictionary some of the nested dictionaries are identical ('a' and 'c', 'b' and 'd'), so the expected result is:
result = [['a','c'],['b','d']]
>>> seen = {}
>>> dict1 = {'a':{'a':20, 'b':30}, 'b':{'a':30, 'b':40}, 'c':{'a':20, 'b':30}, 'd':{'a':30, 'b':40}}
>>> for k in dict1:
...     fs = frozenset(dict1[k].items())
...     seen.setdefault(fs, []).append(k)
...
>>> list(seen.values())  # note: dict order is only guaranteed on Python 3.7+
[['a', 'c'], ['b', 'd']]
If order is needed:
>>> from collections import OrderedDict
>>> dict1 = {'a':{'a':20, 'b':30}, 'b':{'a':30, 'b':40}, 'c':{'a':20, 'b':30}, 'd':{'a':30, 'b':40}}
>>> seen = OrderedDict()
>>> for k in sorted(dict1):
...     fs = frozenset(dict1[k].items())
...     seen.setdefault(fs, []).append(k)
...
>>> list(seen.values())
[['a', 'c'], ['b', 'd']]
Note: This code is currently cross-compatible on Python 2/3. On Python 2 you can make it more efficient by using .iteritems() instead of .items()
A quick one: first collect the distinct values, then use a list comprehension.
>>> values = []
>>> for k in dict1:
...     if dict1[k] not in values:
...         values.append(dict1[k])
...
>>> values
[{'a': 20, 'b': 30}, {'a': 30, 'b': 40}]
>>> [[k for k in dict1 if dict1[k] == v] for v in values]
[['a', 'c'], ['b', 'd']]
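The frozenset approach requires every nested value to be hashable, so it raises TypeError if the inner dicts contain lists. As a sketch under that assumption, a small hypothetical freeze helper can convert nested dicts and lists into hashable equivalents first, and group_keys_by_value (also a hypothetical name) groups on the result in a single pass. One caveat of this particular helper: a list and a tuple with the same items become the same key.

```python
from collections import defaultdict


def freeze(obj):
    """Hypothetical helper: turn nested dicts/lists into hashable equivalents."""
    if isinstance(obj, dict):
        # Order-insensitive, like dict equality.
        return frozenset((key, freeze(value)) for key, value in obj.items())
    if isinstance(obj, list):
        # Order-sensitive, like list equality.
        return tuple(freeze(value) for value in obj)
    return obj  # assumed to be hashable already


def group_keys_by_value(d):
    """Group the keys of `d` whose values compare equal."""
    groups = defaultdict(list)
    for key, value in d.items():
        groups[freeze(value)].append(key)
    return list(groups.values())


dict1 = {'a': {'a': 20, 'b': 30}, 'b': {'a': 30, 'b': 40},
         'c': {'a': 20, 'b': 30}, 'd': {'a': 30, 'b': 40}}
print(group_keys_by_value(dict1))
# [['a', 'c'], ['b', 'd']]
```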
