Grouping dictionaries based on similar values is easy, but I have trouble thinking of a good way of doing the opposite: grouping dictionaries where the value of exactly one key differs from the rest.
For instance, take these:
a = {1: 'a', 2: 'b', 3:'c'}
b = {1: 'a', 2: 'b', 3:'d'}
c = {1: 'c', 2: 'b', 3:'d'}
These can be grouped into two different sets, where the value of one key differs:
# Expected output:
{3: {a, b}, # Differs on 3
1: {b, c}} # Differs on 1
I have trouble thinking of a good way of implementing such a function. Do you have any suggestions about how to go forward?
You can get a dictionary difference, assuming that the keys and values are hashable, by using sets on the items. You can then get a list of pairs of dicts, and what their difference is:
a = {1: 'a', 2: 'b', 3:'c'}
b = {1: 'a', 2: 'b', 3:'d'}
c = {1: 'c', 2: 'b', 3:'d'}
def diff_dict(dicta, dictb):
    aset = set(dicta.items())
    bset = set(dictb.items())
    diff = aset ^ bset  # items present in exactly one of the two dicts
    return tuple(set(x[0] for x in diff))  # keys whose values differ
print(diff_dict(a, b))
(3,)
import itertools
all_dicts = [a, b, c]
listgroup = []
for dicta, dictb in itertools.combinations(all_dicts, 2):
    key = diff_dict(dicta, dictb)
    listgroup.append((key, (dicta, dictb)))
If you only want pairs that differ on a single key, gate the append with an if len(key) == 1.
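To get from the list of pairs to the output shape asked for in the question, something like the following might work. It is only a sketch: the function name group_by_single_difference is made up here, and because dicts are not hashable and cannot be put in a set, it assumes the dicts are passed in under names and collects those names instead.

```python
from collections import defaultdict
from itertools import combinations

def group_by_single_difference(named_dicts):
    """Map each key to the names of the dicts that differ only on that key."""
    groups = defaultdict(set)
    for (name_a, dict_a), (name_b, dict_b) in combinations(named_dicts.items(), 2):
        # Keys of the items that appear in exactly one of the two dicts.
        differing = {key for key, _ in set(dict_a.items()) ^ set(dict_b.items())}
        if len(differing) == 1:
            groups[differing.pop()].update((name_a, name_b))
    return dict(groups)

named = {
    "a": {1: 'a', 2: 'b', 3: 'c'},
    "b": {1: 'a', 2: 'b', 3: 'd'},
    "c": {1: 'c', 2: 'b', 3: 'd'},
}
result = group_by_single_difference(named)
print(result)
```

Like the answer above, this compares every pair, so it is quadratic in the number of dicts.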
I need to combine two dictionaries by their value, resulting in a new key which is the list of keys with the shared value. All I can find online is how to add two values with the same key or how to simply combine two dictionaries, so perhaps I am just searching in the wrong places.
To give an idea:
dic1 = {'A': 'B', 'C': 'D'}
dic2 = {'D': 'B', 'E': 'F'}
Should result in:
dic3 = {['A', 'D']: 'B', 'C': 'D', 'E': 'F'}
I am not sure why you would need such a data structure; you can probably find a better solution to your problem. However, just for the sake of answering your question, here is a possible solution:
dic1 = {'A':'B', 'C':'D'}
dic2 = {'D':'B', 'E':'F'}
key_list = list(dic2.keys())
val_list = list(dic2.values())
r = {}
for k,v in dic1.items():
    if v in val_list:
        i = val_list.index(v) #get index at value
        k2 = key_list[i] #use index to retrieve the key at value
        r[(k, k2)] = v #make the dict entry
    else:
        r[k] = v
val_list = list(r.values()) #get all the values already processed
for k,v in dic2.items():
    if v not in val_list: #if missing value
        r[k] = v #add new entry
print(r)
output:
{('A', 'D'): 'B', 'C': 'D', 'E': 'F'}
You can't use a list as a key in a Python dictionary, since keys must be hashable and a list is not a hashable object, so I have used a tuple instead.
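A quick way to see the hashability point for yourself (just an illustration; the variable names are made up for this snippet):

```python
shared_keys = ['A', 'D']

# A list as a dict key is rejected with a TypeError because lists are unhashable.
try:
    bad = {shared_keys: 'B'}
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

# The same keys as a tuple are hashable and work fine.
good = {tuple(shared_keys): 'B'}

print(list_key_error)
print(good)
```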
I would use a defaultdict of lists and build a reversed dict and in the end reverse it while converting the lists to tuples (because lists are not hashable and can't be used as dict keys):
from collections import defaultdict
dic1 = {'A':'B', 'C':'D'}
dic2 = {'D':'B', 'E':'F'}
temp = defaultdict(list)
for d in (dic1, dic2):
    for key, value in d.items():
        temp[value].append(key)
print(temp)
res = {}
for key, value in temp.items():
    if len(value) == 1:
        res[value[0]] = key
    else:
        res[tuple(value)] = key
print(res)
The printout from this (showing the middle step of temp) is:
defaultdict(<class 'list'>, {'B': ['A', 'D'], 'D': ['C'], 'F': ['E']})
{('A', 'D'): 'B', 'C': 'D', 'E': 'F'}
If you are willing to accept 1-element tuples as keys, the second part becomes much simpler:
res = {tuple(value): key for key, value in temp.items()}
Hello Stackoverflow people,
I have a nested dictionary with lists as values and I want to create a dict where all the list entries get their corresponding key as value.
Example time!
# what I have
dict1 = {"A":[1,2,3], "B":[4,5,6], "C":[7,8,9]}
# what I want
dict2 = {1:"A", 2:"A", 3:"A", 4:"B", 5:"B", 6:"B", 7:"C", 8:"C", 9:"C"}
Any help will be much appreciated!
Try this
dict1 = {"A":[1,2,3], "B":[4,5,6], "C":[7,8,9]}
dict2= {}
for keys,values in dict1.items():
    for i in values:
        dict2[i]=keys
print(dict2)
Output
{1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'C', 8: 'C', 9: 'C'}
Hope it helps
Use a generator expression inside dict():
dict1 = {"A":[1,2,3], "B":[4,5,6], "C":[7,8,9]}
dict2 = dict((v1, k) for k, v in dict1.items() for v1 in v) # Here is the one-liner
Assuming the values of your dictionary are lists, you can use a dict comprehension, with a second loop to iterate over each list in the original dictionary:
{item: key for key, value in dict1.items() for item in value}
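One thing to watch for with all of the approaches above: if the same item appears in more than one list, the key processed last silently wins. If that can happen in your data, a variant along these lines might be safer. It is a sketch only; invert_keep_all and the overlapping sample data are invented for the illustration and are not from the question.

```python
from collections import defaultdict

def invert_keep_all(mapping):
    """Map every list item to the list of keys whose lists contain it."""
    inverted = defaultdict(list)
    for key, items in mapping.items():
        for item in items:
            inverted[item].append(key)
    return dict(inverted)

# Hypothetical input in which 2 belongs to both "A" and "B".
overlapping = {"A": [1, 2], "B": [2, 3]}
inverted = invert_keep_all(overlapping)
print(inverted)
```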
The following is my dictionary, and I need to check whether it has a repeated key or value:
dict = {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'd', '5': 'e'}
This should return False or some kind of indicator that lets me print out that a key or value might be repeated. It would be much appreciated if I could also identify whether it is a key or a value that is repeated (but this is not required).
Dictionaries can't have duplicate keys; if a key is repeated, only the last value is kept. So you only need to check the values (a one-liner is your friend):
print('There are duplicates' if len(set(dict.values())) != len(dict) else 'No duplicates')
Well in a dictionary keys can't repeat so we only have to deal with values.
dict = {...}
# get the values
values = list(dict.values())
And then you can use a set() to check for duplicates:
if len(values) == len(set(values)): print("no duplicates")
else: print("duplicates")
It's not possible to check if a key repeats in a dictionary, because dictionaries in Python only support unique keys. If you enter the dictionary as is, only the last value will be associated with the redundant key:
In [4]: dict = {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'd', '5': 'e'}
In [5]: dict
Out[5]: {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'e'}
A one-liner to find repeating values
In [138]: {v: [k for k in d if d[k] == v] for v in set(d.values())}
Out[138]: {'a': [' 1'], 'b': ['2', '3'], 'c': ['4'], 'e': ['5']}
This collects the unique values of the dict with set(d.values()) and then builds, for each one, a list of the keys that map to it.
Note: repeating keys will just be overwritten
In [139]: {'a': 1, 'a': 2}
Out[139]: {'a': 2}
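The one-liner above rescans the whole dict for every unique value. If the dict is large, a single pass with a defaultdict might be preferable. This is a sketch under that assumption; keys_by_repeated_value is a name made up here, and it reports only the values that occur more than once.

```python
from collections import defaultdict

def keys_by_repeated_value(d):
    """Return {value: [keys]} for values that occur under more than one key."""
    keys_for = defaultdict(list)
    for key, value in d.items():
        keys_for[value].append(key)
    return {value: keys for value, keys in keys_for.items() if len(keys) > 1}

d = {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'e'}
repeated = keys_by_repeated_value(d)
print(repeated)
```

An empty result means no value is repeated, so bool(repeated) also works as a yes/no check.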
What about
has_dupes = len(d) != len(set(d.values()))
I'm on my phone so I can't test it, but I think it will work.
Well, although keys should be unique according to the documentation, there are still situations where a repeated key can appear.
For example,
>>> import json
>>> a = {1:10, "1":20}
>>> b = json.dumps(a)
>>> b
'{"1": 20, "1": 10}'
>>> c = json.loads(b)
>>> c
{u'1': 10}
>>>
But in general, when Python finds a conflict, it keeps the latest value assigned to that key.
For your question, you should use a comparison such as
len(dict) == len(set(dict.values()))
Because a set in Python is an unordered collection of unique, hashable objects, it automatically collapses duplicate values in dict.values(), so the two lengths differ exactly when some value repeats.
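For the JSON case specifically, the repeated key is still visible in the text before it is turned into a dict, so it can be caught while parsing. json.loads accepts an object_pairs_hook that receives each object's (key, value) pairs in order; the hook below, reject_duplicate_keys, is a hypothetical helper written for this sketch, not part of the json module.

```python
import json
from collections import Counter

def reject_duplicate_keys(pairs):
    """object_pairs_hook that raises if a JSON object repeats a key."""
    counts = Counter(key for key, _ in pairs)
    dupes = [key for key, n in counts.items() if n > 1]
    if dupes:
        raise ValueError(f"duplicate keys: {dupes}")
    return dict(pairs)

text = '{"1": 20, "1": 10}'
try:
    json.loads(text, object_pairs_hook=reject_duplicate_keys)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```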
Say I have a list with some numbers that are duplicates.
list = [1,1,1,1,2,3,4,4,1,2,5,6]
I want to identify all the elements in the list that repeat consecutively, including the first element of each run, and replace them with values from a dictionary:
mydict = {1: 'a', 4: 'd'}
list = ['a','a','a','a',2,3,'d','d',1,2,5,6]
Because I want to replace the first instance of the repetition as well, I am quite confused as to how to proceed!
itertools.groupby is your friend:
from itertools import groupby
mydict = {1: 'a', 4: 'd'}
A = [1,1,1,1,2,3,4,4,1,2,5,6]
res = []
for k, g in groupby(A):
    size = len(list(g))
    if size > 1:
        res.extend([mydict[k]] * size) # see note 1
    else:
        res.append(k)
print(res) # -> ['a', 'a', 'a', 'a', 2, 3, 'd', 'd', 1, 2, 5, 6]
Notes:
1. If you want to catch possible KeyErrors and have a default value you want to fall back on, use mydict.get(k, <default>) instead of mydict[k]
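As a sketch of that note, here is the same loop wrapped in a function that uses .get and falls back to the original value when a repeated element has no entry in the dictionary. The function name replace_runs and the choice of fallback are assumptions made for this example; pick whatever default suits your data.

```python
from itertools import groupby

def replace_runs(values, mapping):
    """Replace runs of consecutive repeats with mapping[value], if present."""
    result = []
    for value, run in groupby(values):
        size = len(list(run))
        if size > 1:
            # Fall back to the value itself when it has no mapping entry.
            result.extend([mapping.get(value, value)] * size)
        else:
            result.append(value)
    return result

replaced = replace_runs([1, 1, 1, 1, 2, 3, 4, 4, 1, 2, 5, 6], {1: 'a', 4: 'd'})
unmapped = replace_runs([7, 7, 1], {1: 'a'})
print(replaced)
print(unmapped)
```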
Here is a list containing duplicates:
l1 = ['a', 'b', 'c', 'a', 'a', 'b']
Here is the desired result:
l1 = ['a', 'b', 'c', 'a_1', 'a_2', 'b_1']
How can the duplicates be renamed by appending a count number?
Here is an attempt to achieve this goal; however, is there a more Pythonic way?
for index in range(len(l1)):
    counter = 1
    list_of_duplicates_for_item = [dup_index for dup_index, item in enumerate(l1) if item == l1[index] and l1.count(l1[index]) > 1]
    for dup_index in list_of_duplicates_for_item[1:]:
        l1[dup_index] = l1[dup_index] + '_' + str(counter)
        counter = counter + 1
In Python, generating a new list is usually much easier than changing an existing list. We have generators to do this efficiently. A dict can keep count of occurrences.
l = ['a', 'b', 'c', 'a', 'a', 'b']
def rename_duplicates( old ):
    seen = {}
    for x in old:
        if x in seen:
            seen[x] += 1
            yield "%s_%d" % (x, seen[x])
        else:
            seen[x] = 0
            yield x
print(list(rename_duplicates(l)))
I would do something like this:
a1 = ['a', 'b', 'c', 'a', 'a', 'b']
a2 = []
d = {}
for i in a1:
    d.setdefault(i, -1)
    d[i] += 1
    if d[i] >= 1:
        a2.append('%s_%d' % (i, d[i]))
    else:
        a2.append(i)
print(a2)
Based on your comment to @mathmike, if your ultimate goal is to create a dictionary from a list with duplicate keys, I would use a defaultdict from the collections library.
>>> from collections import defaultdict
>>> multidict = defaultdict(list)
>>> multidict['a'].append(1)
>>> multidict['b'].append(2)
>>> multidict['a'].append(11)
>>> multidict
defaultdict(<type 'list'>, {'a': [1, 11], 'b': [2]})
I think the output you're asking for is messy itself, and so there is no clean way of creating it.
How do you intend to use this new list? Would a dictionary of counts like the following work instead?
{'a':3, 'b':2, 'c':1}
If so, I would recommend:
from collections import defaultdict
d = defaultdict(int) # values default to 0
for key in l1:
    d[key] += 1
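The standard library's collections.Counter does the same counting in one call, if you would rather not write the loop. A minimal sketch using the list from the question:

```python
from collections import Counter

l1 = ['a', 'b', 'c', 'a', 'a', 'b']

# Counter is a dict subclass that maps each element to its number of occurrences.
counts = Counter(l1)

print(dict(counts))
print(counts.most_common(1))
```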
I wrote this approach for renaming duplicates in a list with any separator and a numeric or alphabetical postfix (e.g. _1, _2 or _a, _b, _c etc.). It might not be the most efficient you could write, but I like it as clean, readable code that is also easy to extend.
def rename_duplicates(label_list, seperator="_", mode="numeric"):
    """
    options for 'mode': numeric, alphabet
    """
    import string
    if not isinstance(label_list, list) or not isinstance(seperator, str):
        raise TypeError("label_list and seperator must be of type list and str, respectively")
    for item in label_list:
        l_count = label_list.count(item)
        if l_count > 1:
            if mode == "alphabet":
                postfix_str = string.ascii_lowercase
                if len(postfix_str) < l_count:
                    # do something
                    pass
            elif mode == "numeric":
                # a list of strings, so counts above 9 are not split into single digits
                postfix_str = [str(i+1) for i in range(l_count)]
            else:
                raise ValueError("the 'mode' could be either 'numeric' or 'alphabet'")
            postfix_iter = iter(postfix_str)
            for i in range(l_count):
                item_index = label_list.index(item)
                label_list[item_index] += seperator + next(postfix_iter)
    return label_list
label_list = ['a', 'b', 'c', 'a', 'a', 'b']
use the function:
rename_duplicates(label_list)
result:
['a_1', 'b_1', 'c', 'a_2', 'a_3', 'b_2']
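If the repeated count() and index() calls become a bottleneck on long lists, the same numeric output can probably be produced in two linear passes with Counter. This is only a sketch: rename_all_duplicates is a name invented here, it covers the numeric mode only, and it returns a new list rather than modifying the input.

```python
from collections import Counter

def rename_all_duplicates(labels, separator="_"):
    """Suffix every member of a duplicated group with its running number."""
    totals = Counter(labels)   # how often each label occurs overall
    seen = Counter()           # how often each label has occurred so far
    renamed = []
    for label in labels:
        if totals[label] > 1:
            seen[label] += 1
            renamed.append(f"{label}{separator}{seen[label]}")
        else:
            renamed.append(label)
    return renamed

renamed = rename_all_duplicates(['a', 'b', 'c', 'a', 'a', 'b'])
print(renamed)
```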