Group dictionaries based on values that differ - python

Grouping dictionaries based on similar values is easy, but I have trouble thinking of a good way of doing the opposite: grouping dictionaries where the value of one key differs from the rest.
For instance, take these:
a = {1: 'a', 2: 'b', 3:'c'}
b = {1: 'a', 2: 'b', 3:'d'}
c = {1: 'c', 2: 'b', 3:'d'}
These can be grouped into two different sets, where exactly one of the key values differs:
# Expected output:
{3: {a, b}, # Differs on 3
1: {b, c}} # Differs on 1
I have trouble thinking of a good way of implementing such a function. Do you have any suggestions about how to go forward?

You can get a dictionary difference, assuming that the keys and values are hashable, by using sets on the items. You can then get a list of pairs of dicts, and what their difference is:
import itertools

a = {1: 'a', 2: 'b', 3:'c'}
b = {1: 'a', 2: 'b', 3:'d'}
c = {1: 'c', 2: 'b', 3:'d'}

def diff_dict(dicta, dictb):
    # Symmetric difference: the (key, value) pairs found in only one of the dicts
    aset = set(dicta.items())
    bset = set(dictb.items())
    diff = aset ^ bset
    # Keep only the keys on which the two dicts disagree
    return tuple(set(x[0] for x in diff))

print(diff_dict(a, b))
# prints (3,)

all_dicts = [a,b,c]
listgroup = []
for dicta, dictb in itertools.combinations(all_dicts, 2):
    key = diff_dict(dicta, dictb)
    listgroup.append((key, (dicta, dictb)))
If you only want single items, gate the append with an if len(key) == 1.
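To get closer to the output shape in the question, the pairs can be collected into a dict keyed by the differing key. The following is only a sketch under a few assumptions: dicts are not hashable, so they cannot be put in a set directly, and the sketch stores labels instead. The names `named`, `diff_keys` and `groups` are hypothetical and chosen for illustration.

```python
import itertools
from collections import defaultdict

a = {1: 'a', 2: 'b', 3: 'c'}
b = {1: 'a', 2: 'b', 3: 'd'}
c = {1: 'c', 2: 'b', 3: 'd'}

# Hypothetical labels for the dicts, since dicts themselves cannot be set members
named = {'a': a, 'b': b, 'c': c}

def diff_keys(dicta, dictb):
    """Return the sorted keys on which the two dicts disagree."""
    diff = set(dicta.items()) ^ set(dictb.items())
    return sorted({key for key, _ in diff})

groups = defaultdict(set)
for (name1, d1), (name2, d2) in itertools.combinations(named.items(), 2):
    keys = diff_keys(d1, d2)
    if len(keys) == 1:  # keep only pairs that differ on exactly one key
        groups[keys[0]].update((name1, name2))

print(dict(groups))
```

With the three example dicts, key 3 maps to the labels 'a' and 'b', and key 1 maps to 'b' and 'c'. The pair (a, c) differs on two keys and is skipped.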

Related

How to merge keys of dictionary which have the same value?

I need to combine two dictionaries by their value, resulting in a new key which is the list of keys with the shared value. All I can find online is how to add two values with the same key or how to simply combine two dictionaries, so perhaps I am just searching in the wrong places.
To give an idea:
dic1 = {'A': 'B', 'C': 'D'}
dic2 = {'D': 'B', 'E': 'F'}
Should result in:
dic3 = {['A', 'D']: 'B', 'C': 'D', 'E': 'F'}
I am not sure why you would need such a data structure; you can probably find a better solution to your problem. However, just for the sake of answering your question, here is a possible solution:
dic1 = {'A':'B', 'C':'D'}
dic2 = {'D':'B', 'E':'F'}
key_list = list(dic2.keys())
val_list = list(dic2.values())
r = {}
for k,v in dic1.items():
    if v in val_list:
        i = val_list.index(v) #get index at value
        k2 = key_list[i] #use index to retrieve the key at value
        r[(k, k2)] = v #make the dict entry
    else:
        r[k] = v
val_list = list(r.values()) #get all the values already processed
for k,v in dic2.items():
    if v not in val_list: #if missing value
        r[k] = v #add new entry
print(r)
output:
{('A', 'D'): 'B', 'C': 'D', 'E': 'F'}
You can't use a list as a key in a Python dictionary, since keys must be hashable and a list is not a hashable object, so I have used a tuple instead.
I would use a defaultdict of lists and build a reversed dict and in the end reverse it while converting the lists to tuples (because lists are not hashable and can't be used as dict keys):
from collections import defaultdict
dic1 = {'A':'B', 'C':'D'}
dic2 = {'D':'B', 'E':'F'}
temp = defaultdict(list)
for d in (dic1, dic2):
    for key, value in d.items():
        temp[value].append(key)
print(temp)
res = {}
for key, value in temp.items():
    if len(value) == 1:
        res[value[0]] = key
    else:
        res[tuple(value)] = key
print(res)
The printout from this (showing the middle step of temp) is:
defaultdict(<class 'list'>, {'B': ['A', 'D'], 'D': ['C'], 'F': ['E']})
{('A', 'D'): 'B', 'C': 'D', 'E': 'F'}
If you are willing to accept 1-element tuples as keys, the second part becomes much simpler:
res = {tuple(value): key for key, value in temp.items()}
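For reference, here is a self-contained sketch of that simplified variant on the question's data. It shows what the result looks like when every key is a tuple, including the 1-element ones. The comprehension variables are renamed for readability; nothing else is changed.

```python
from collections import defaultdict

dic1 = {'A': 'B', 'C': 'D'}
dic2 = {'D': 'B', 'E': 'F'}

# Reverse mapping: value -> list of keys that carry it
temp = defaultdict(list)
for d in (dic1, dic2):
    for key, value in d.items():
        temp[value].append(key)

# Every key becomes a tuple, even when only one original key had the value
res = {tuple(keys): value for value, keys in temp.items()}
print(res)  # {('A', 'D'): 'B', ('C',): 'D', ('E',): 'F'}
```

The uniform key type can make later lookups simpler, since the consumer never has to check whether a key is a string or a tuple.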

switch key and values in a dict of lists

Hello Stackoverflow people,
I have a nested dictionary with lists as values and I want to create a dict where all the list entries get their corresponding key as value.
Example time!
# what I have
dict1 = {"A":[1,2,3], "B":[4,5,6], "C":[7,8,9]}
# what I want
dict2 = {1:"A", 2:"A", 3:"A", 4:"B", 5:"B", 6:"B", 7:"C", 8:"C", 9:"C"}
Any help will be much appreciated!
Try this
dict1 = {"A":[1,2,3], "B":[4,5,6], "C":[7,8,9]}
dict2= {}
for keys,values in dict1.items():
    for i in values:
        dict2[i]=keys
print(dict2)
Output
{1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'C', 8: 'C', 9: 'C'}
Hope it helps
Use a generator expression with dict():
d = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
d2 = dict((v1, k) for k, v in d.items() for v1 in v) # Here is the one-liner
Assuming the values of your dictionary are lists, you can use a dict comprehension with a second loop that iterates over each list in the original dictionary:
{item: key for key, value in dict1.items() for item in value}
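One thing the comprehension does not show: if the same item appears in more than one list, the later key silently overwrites the earlier one. A minimal sketch of a stricter variant follows. The helper name `invert_lists` is hypothetical, and raising an error on collisions is an assumption about the desired behaviour, not something the question asks for.

```python
def invert_lists(mapping):
    """Map each list item back to its key; fail if an item occurs under two keys."""
    inverted = {}
    for key, items in mapping.items():
        for item in items:
            if item in inverted:
                raise ValueError("%r appears under both %r and %r"
                                 % (item, inverted[item], key))
            inverted[item] = key
    return inverted

dict1 = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
dict2 = invert_lists(dict1)
print(dict2)
```

For the question's data there are no collisions, so the result is the same as with the comprehension.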

Check if repeating Key or Value exists in Python Dictionary

The following is my dictionary, and I need to check whether it has a repeated key or value:
dict = {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'd', '5': 'e'}
This should return false or some kind of indicator which helps me print out that key or value might be repeated. It would be much appreciated if I am able to identify if a key is repeated or a Value (but not required).
Dictionaries can't have duplicate keys; if a key is repeated, only the last value is kept. So check the values instead (a one-liner is your friend):
print('There are duplicates' if len(set(dict.values())) != len(dict.values()) else 'No duplicates')
Well in a dictionary keys can't repeat so we only have to deal with values.
dict = {...}
# get the values
values = list(dict.values())
And then you can use a set() to check for duplicates:
if len(values) == len(set(values)):
    print("no duplicates")
else:
    print("duplicates")
It's not possible to check if a key repeats in a dictionary, because dictionaries in Python only support unique keys. If you enter the dictionary as is, only the last value will be associated with the redundant key:
In [4]: dict = {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'd', '5': 'e'}
In [5]: dict
Out[5]: {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'e'}
A one-liner to find repeating values
In [138]: {v: [k for k in d if d[k] == v] for v in set(d.values())}
Out[138]: {'a': [' 1'], 'b': ['2', '3'], 'c': ['4'], 'e': ['5']}
This collects the unique values of the dict with set(d.values()) and then creates a list of the keys that correspond to each value.
Note: repeating keys will just be overwritten
In [139]: {'a': 1, 'a': 2}
Out[139]: {'a': 2}
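The one-liner rescans the whole dict once per unique value. As a sketch of an alternative, the same grouping can be built in a single pass with a defaultdict and then filtered down to the values that really repeat. This assumes the values are hashable; the names `keys_by_value` and `repeated` are made up for the example.

```python
from collections import defaultdict

d = {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'e'}

# Single pass: value -> list of keys that map to it
keys_by_value = defaultdict(list)
for key, value in d.items():
    keys_by_value[value].append(key)

# Keep only the values shared by more than one key
repeated = {value: keys for value, keys in keys_by_value.items() if len(keys) > 1}
print(repeated)  # {'b': ['2', '3']}
```

An empty `repeated` dict means there are no duplicate values.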
What about
has_dupes = len(d) != len(set(d.values()))
I'm on my phone so I can't test it, but I think it will work.
Well, although keys should be unique according to the documentation, there is still a situation where a repeated key can appear.
For example,
>>> import json
>>> a = {1:10, "1":20}
>>> b = json.dumps(a)
>>> b
'{"1": 20, "1": 10}'
>>> c = json.loads(b)
>>> c
{u'1': 10}
>>>
But in general, when Python finds a conflict, it keeps the latest value assigned to that key.
For your question, you should use a comparison such as
len(dict) == len(set(dict.values()))
because a set in Python is an unordered collection of unique, hashable objects, so it automatically keeps only the unique values even when dict.values() contains duplicates.
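For the JSON case shown earlier in this answer, repeated keys can be detected while parsing, before the later value overwrites the earlier one. Here is a minimal sketch using the `object_pairs_hook` parameter of `json.loads`; the hook name `reject_duplicate_keys` is hypothetical, and raising an error is just one possible way to report the duplicate.

```python
import json

def reject_duplicate_keys(pairs):
    """Build a dict from (key, value) pairs, failing on the first repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate key: %r" % (key,))
        result[key] = value
    return result

text = '{"1": 20, "1": 10}'
try:
    json.loads(text, object_pairs_hook=reject_duplicate_keys)
except ValueError as exc:
    print(exc)  # duplicate key: '1'
```

This only works on the JSON text. Once a Python dict has been built, the information about repeated keys is already gone.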

Replacing all the elements of a list that are consecutive and duplicates in Python

Say I have a list with some numbers that are duplicates.
list = [1,1,1,1,2,3,4,4,1,2,5,6]
I want to identify all the elements in the list that are repeated consecutively, including the first element of each run, and replace them with values from a dictionary:
mydict = {1: 'a', 4: 'd'}
list = ['a','a','a','a',2,3,'d','d',1,2,5,6]
Because I want to replace the first instance of the repetition as well, I am quite confused as to how to proceed!
itertools.groupby is your friend:
from itertools import groupby
mydict = {1: 'a', 4: 'd'}
A = [1,1,1,1,2,3,4,4,1,2,5,6]
res = []
for k, g in groupby(A):
    size = len(list(g))
    if size > 1:
        res.extend([mydict[k]] * size) # see note 1
    else:
        res.append(k)
print(res) # -> ['a', 'a', 'a', 'a', 2, 3, 'd', 'd', 1, 2, 5, 6]
Notes:
1. If you want to avoid a possible KeyError and fall back on a default value, use mydict.get(k, <default>) instead of mydict[k].
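As a sketch of that note, the variant below uses `mydict.get(k, k)`, so a repeated run whose value has no entry in the dictionary is left unchanged. Using the element itself as the default is an assumption made for this example, and the input list is altered slightly so that it contains such a run (the two 2s).

```python
from itertools import groupby

mydict = {1: 'a', 4: 'd'}
A = [1, 1, 1, 1, 2, 2, 3, 4, 4, 1]

res = []
for k, g in groupby(A):
    size = len(list(g))
    if size > 1:
        # Fall back to the element itself when it has no replacement
        res.extend([mydict.get(k, k)] * size)
    else:
        res.append(k)

print(res)  # ['a', 'a', 'a', 'a', 2, 2, 3, 'd', 'd', 1]
```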

How to append count numbers to duplicates in a list in Python?

Here is a list containing duplicates:
l1 = ['a', 'b', 'c', 'a', 'a', 'b']
Here is the desired result:
l1 = ['a', 'b', 'c', 'a_1', 'a_2', 'b_1']
How can the duplicates be renamed by appending a count number?
Here is an attempt to achieve this goal; however, is there a more Pythonic way?
for index in range(len(l1)):
    counter = 1
    list_of_duplicates_for_item = [dup_index for dup_index, item in enumerate(l1) if item == l1[index] and l1.count(l1[index]) > 1]
    for dup_index in list_of_duplicates_for_item[1:]:
        l1[dup_index] = l1[dup_index] + '_' + str(counter)
        counter = counter + 1
In Python, generating a new list is usually much easier than changing an existing list. We have generators to do this efficiently. A dict can keep count of occurrences.
l = ['a', 'b', 'c', 'a', 'a', 'b']

def rename_duplicates(old):
    seen = {}  # maps each item to the number of repeats seen so far
    for x in old:
        if x in seen:
            seen[x] += 1
            yield "%s_%d" % (x, seen[x])
        else:
            seen[x] = 0
            yield x

print(list(rename_duplicates(l)))
I would do something like this:
a1 = ['a', 'b', 'c', 'a', 'a', 'b']
a2 = []
d = {}
for i in a1:
    d.setdefault(i, -1)
    d[i] += 1
    if d[i] >= 1:
        a2.append('%s_%d' % (i, d[i]))
    else:
        a2.append(i)
print(a2)
Based on your comment to @mathmike, if your ultimate goal is to create a dictionary from a list with duplicate keys, I would use a defaultdict from the collections library.
>>> from collections import defaultdict
>>> multidict = defaultdict(list)
>>> multidict['a'].append(1)
>>> multidict['b'].append(2)
>>> multidict['a'].append(11)
>>> multidict
defaultdict(<type 'list'>, {'a': [1, 11], 'b': [2]})
I think the output you're asking for is messy itself, and so there is no clean way of creating it.
How do you intend to use this new list? Would a dictionary of counts like the following work instead?
{'a':3, 'b':2, 'c':1}
If so, I would recommend:
from collections import defaultdict
d = defaultdict(int) # values default to 0
for key in l1:
    d[key] += 1
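The same counts can also be obtained with `collections.Counter` from the standard library, which is a swapped-in alternative to the defaultdict loop rather than part of the original answer. A short sketch on the question's list:

```python
from collections import Counter

l1 = ['a', 'b', 'c', 'a', 'a', 'b']

counts = Counter(l1)  # counts every item in a single pass
print(counts)         # Counter({'a': 3, 'b': 2, 'c': 1})
print(counts['z'])    # missing items count as 0 instead of raising KeyError
```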
I wrote this approach for renaming duplicates in a list with any separator and a numeric or alphabetical postfix (e.g. _1, _2 or _a, _b, _c, etc.). It might not be the most efficient solution, but I like it as clean, readable code that is also easy to extend.
def rename_duplicates(label_list, seperator="_", mode="numeric"):
    """
    options for 'mode': numeric, alphabet
    """
    import string
    if not isinstance(label_list, list) or not isinstance(seperator, str):
        raise TypeError("label_list and seperator must be of type list and str, respectively")
    for item in label_list:
        l_count = label_list.count(item)
        if l_count > 1:
            if mode == "alphabet":
                postfix_str = string.ascii_lowercase
                if len(postfix_str) < l_count:
                    # do something
                    pass
            elif mode == "numeric":
                # a list of strings, so multi-digit postfixes such as "10" stay whole
                postfix_str = [str(i+1) for i in range(l_count)]
            else:
                raise ValueError("the 'mode' could be either 'numeric' or 'alphabet'")
            postfix_iter = iter(postfix_str)
            for i in range(l_count):
                # renamed entries no longer match `item`, so index() finds the next one
                item_index = label_list.index(item)
                label_list[item_index] += seperator + next(postfix_iter)
    return label_list
label_list = ['a', 'b', 'c', 'a', 'a', 'b']
use the function:
rename_duplicates(label_list)
result:
['a_1', 'b_1', 'c', 'a_2', 'a_3', 'b_2']
