Duplicates in a dictionary (Python)

I need to write a function that returns true if the dictionary has duplicates in it. So pretty much if anything appears in the dictionary more than once, it will return true.
Here is what I have but I am very far off and not sure what to do.
d = {"a", "b", "c"}
def has_duplicates(d):
    seen = set()
    d={}
    for x in d:
        if x in seen:
            return True
        seen.add(x)
    return False
print has_duplicates(d)

If you are looking to find duplication in values of the dictionary:
def has_duplicates(d):
    return len(d) != len(set(d.values()))
print has_duplicates({'a': 1, 'b': 1, 'c': 2})
Outputs:
True

def has_duplicates(d):
    return False
Dictionaries do not contain duplicate keys, ever. Your function, btw., is equivalent to this definition, so it's correct (just a tad long).
If you want to find duplicate values, that's
len(set(d.values())) != len(d)
assuming the values are hashable.
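If the values might not be hashable (lists, for example), set() raises a TypeError. A minimal sketch of a fallback, assuming a quadratic scan is acceptable for the dictionary sizes involved; the name has_duplicate_values_unhashable is made up for illustration:

```python
def has_duplicate_values_unhashable(d):
    """Return True if any value occurs more than once, comparing with == only."""
    seen = []  # a list, because unhashable values cannot be stored in a set
    for value in d.values():
        if value in seen:  # linear scan, so the whole check is O(n**2)
            return True
        seen.append(value)
    return False


print(has_duplicate_values_unhashable({'a': [1], 'b': [1], 'c': [2]}))
print(has_duplicate_values_unhashable({'a': [1], 'b': [2]}))
```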

In your code, d = {"a", "b", "c"}, d is a set, not a dictionary.
Neither dictionary keys nor sets can contain duplicates. If you're looking for duplicate values, check if the set of the values has the same size as the dictionary itself:
def has_duplicate_values(d):
    return len(set(d.values())) != len(d)

Python dictionaries already have unique keys.
Are you possibly interested in unique values?
set(d.values())
If so, you can check the length of that set to see if it is smaller than the number of values. This works because sets eliminate duplicates from the input, so if the result is smaller than the input, it means some duplicates were found and eliminated.
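Put together, that check might look like the sketch below. The function name is illustrative, and it assumes every value is hashable:

```python
def has_duplicate_values(d):
    """Return True if at least two keys map to equal values."""
    values = list(d.values())
    # A set keeps one copy of each distinct value, so it is shorter
    # than the list of values exactly when some value repeats.
    return len(set(values)) < len(values)


print(has_duplicate_values({'a': 1, 'b': 1, 'c': 2}))
print(has_duplicate_values({'a': 1, 'b': 2, 'c': 3}))
```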

Not only is your general proposition that dictionaries can have duplicate keys false, but also your implementation is gravely flawed: d={} means that you have lost sight of your input d arg and are processing an empty dictionary!
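For comparison, here is a sketch of the same loop with the d={} line removed and the parameter renamed. It is only useful for iterables that can actually hold duplicates, such as lists; passing a dict iterates over its keys and therefore always gives False. It assumes the items are hashable.

```python
def has_duplicates(items):
    """Return True if any item of the iterable appears more than once."""
    seen = set()
    for x in items:
        if x in seen:
            return True
        seen.add(x)
    return False


print(has_duplicates(["a", "b", "a"]))   # a list can repeat an item
print(has_duplicates({"a": 1, "b": 1}))  # iterates over the (unique) keys
```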

The only thing a dictionary can have duplicates of is values. A dictionary is a key-value store where the keys are unique. In Python, you can create a dictionary like so:
d1 = {k1: v1, k2: v2, k3: v1}
d2 = dict([(k1, v1), (k2, v2), (k3, v1)])
d1 was created using the normal dictionary notation. d2 was created from a list of (key, value) pairs. Note that both versions have a duplicate value.
If you had a function that returned the number of unique values in a dictionary then you could say something like:
len(d1) != func(d1)
Fortunately, Python makes it easy to do this using sets. Simply converting d1 into a set is not sufficient. Let's make our keys and values real so you can run some code.
v1 = 1; v2 = 2
k1 = "a"; k2 = "b"; k3 = "c"
d1 = {k1: v1, k2: v2, k3: v1}
print len(d1)
s = set(d1)
print s
You will notice that s has three members too and looks like set(['c', 'b', 'a']). That's because a simple conversion only uses the keys in the dict. You want to use the values like so:
s = set(d1.values())
print s
As you can see there are only two elements because the value 1 occurs two times. One way of looking at a set is that it is a list with no duplicate elements. That's what print sees when it prints out a set as a bracketed list. Another way to look at it is as a dict with no values. Like many data processing activities you need to start by selecting the data that you are interested in, and then manipulating it. Start by selecting the values from the dict, then create a set, then count and compare.
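If you also want to know which values repeat, one option (not part of the answer above, and assuming the values are hashable and sortable) is collections.Counter. The name repeated_values is illustrative:

```python
from collections import Counter


def repeated_values(d):
    """Return a sorted list of the values that occur under more than one key."""
    counts = Counter(d.values())  # maps each value to how often it occurs
    return sorted(value for value, n in counts.items() if n > 1)


d1 = {"a": 1, "b": 2, "c": 1}
print(repeated_values(d1))
```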

This is not a dictionary; it is a set:
d = {"a", "b", "c"}
I don't know what you are trying to accomplish, but a dictionary can't have two entries with the same key. Assigning to an existing key replaces its value:
>>> d = {'a': 0, 'b':1}
>>> d['a'] = 2
>>> print d
{'a': 2, 'b': 1}

Related

Adding the values in a two different dictionaries and creating a new dictionary

I have the following two dictionaries
scores1={'a':10,'b':20,'c':30,'d':10} #dictionary holds value scores for a,b,c,d
and
scores2={'a':20,'b':10} #this dictionary only has scores for keys a and b
I need to collate and sum the scores for keys a and b in both dictionaries to produce the following output:
The answer could be 'done' using one of the following two methods (and there may be others I'd be interested to hear)
1. Using the creation of a new dictionary:
finalscores={a:30,b:30} #adds up the scores for keys a and b and makes a new dictionary
OR
2. Update the scores2 dictionary (adding the values from scores1 to the corresponding values in scores2).
An accepted answer would show both the above with any suitable explanation as well as suggest any more astute or efficient ways of solving the problem.
There was a suggestion on another SO answer that the dictionaries could simply be added:
print(scores1+scores2)
Is there any pythonic way to combine two dicts (adding values for keys that appear in both)?
But I want to do this in the simplest method possible, without iterator imports or classes
I have also tried, but to no avail:
newdict={}
newdict.update(scores1)
newdict.update(scores2)
for i in scores1.keys():
    try:
        addition = scores[i] + scores[i]
        newdict[i] = addition
    except KeyError:
        continue
For the first solution:
scores1={'a':10,'b':20,'c':30,'d':10} #dictionary holds value scores for a,b,c,d
scores2={'a':20,'b':10} #this dictionary only has scores for keys a and b
finalscores=dict((key, sum([scores1[key] if key in scores1 else 0, scores2[key] if key in scores2 else 0])) for key in set(scores1.keys()+scores2.keys()))
print(finalscores)
# outputs {'a': 30, 'c': 30, 'b': 30, 'd': 10}
This iterates over the set of all keys in both dictionaries, builds a two-element list holding each dictionary's value for that key (or 0 if the key is missing), and passes that list to sum. Finally, it builds a dictionary from the resulting (key, total) pairs.
EDIT
In multiple lines, to understand the logic, this is what the one-liner does:
finalscores = {}
for key in set(scores1.keys()+scores2.keys()):
    score_sum = 0
    if key in scores1:
        score_sum += scores1[key]
    if key in scores2:
        score_sum += scores2[key]
    finalscores[key] = score_sum
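Note that scores1.keys()+scores2.keys() relies on keys() returning lists, which holds only on Python 2. A sketch that should run on both Python 2.7 and Python 3, using a set union and dict.get; sum_scores is an illustrative name:

```python
def sum_scores(first, second):
    """Return a new dict with the per-key sum of two score dicts."""
    all_keys = set(first) | set(second)  # every key from either dict
    # get(key, 0) treats a missing key as a score of 0
    return {key: first.get(key, 0) + second.get(key, 0) for key in all_keys}


scores1 = {'a': 10, 'b': 20, 'c': 30, 'd': 10}
scores2 = {'a': 20, 'b': 10}
finalscores = sum_scores(scores1, scores2)
print(sorted(finalscores.items()))
```

Neither input dictionary is modified, which keeps this separate from the in-place second solution.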
For the second solution:
scores1={'a':10,'b':20,'c':30,'d':10} #dictionary holds value scores for a,b,c,d
scores2={'a':20,'b':10} #this dictionary only has scores for keys a and b
for k1 in scores1:
    if k1 in scores2:
        scores2[k1] += scores1[k1] # Adds scores1[k1] to scores2[k1], equivalent to do scores2[k1] = scores2[k1] + scores1[k1]
    else:
        scores2[k1] = scores1[k1]
print(scores2)
# outputs {'a': 30, 'c': 30, 'b': 30, 'd': 10}

Merging values from 2 dictionaries (Python)

(I'm new to Python!)
Trying to figure out this homework question:
The function takes as input two dictionaries, each mapping strings to integers. The function returns a dictionary that maps strings from the two input dictionaries to the sum of the integers in the two input dictionaries.
my idea was this:
def add(dicA,dicB):
    dicA = {}
    dicB = {}
    newdictionary = dicA.update(dicB)
however, that brings back None.
In the professor's example:
print(add({'alice':10, 'Bob':3, 'Carlie':1}, {'alice':5, 'Bob':100, 'Carlie':1}))
the output is:
{'alice':15, 'Bob':103, 'Carlie':2}
My issue really is that I don't understand how to add up the values from each dictionaries. I know that the '+' is not supported with dictionaries. I'm not looking for anyone to do my homework for me, but any suggestions would be very much appreciated!
From the documentation:
update([other])
Update the dictionary with the key/value pairs from other, overwriting existing keys. Return None.
You don't want to replace key/value pairs, you want to add the values for similar keys. Go through each dictionary and add each value to the relevant key:
def add(dicA,dicB):
    result = {}
    for d in dicA, dicB:
        for key in d:
            result[key] = result.get(key, 0) + d[key]
    return result
result.get(key, 0) will retrieve the value of an existing key or produce 0 if key is not yet present.
First of all, a.update(b) updates a in place, and returns None.
Secondly, a.update(b) wouldn't help you sum the values; for every key the two dictionaries share, it just overwrites the value in a with the one from b:
>>> a = {'alice':10, 'Bob':3, 'Carlie':1}
>>> b = {'alice':5, 'Bob':100, 'Carlie':1}
>>> a.update(b)
>>> a
{'alice': 5, 'Carlie': 1, 'Bob': 100}
It'd be easiest to use collections.Counter to achieve the desired result. As a plus, it does support addition with +:
from collections import Counter
def add(dicA, dicB):
    return dict(Counter(dicA) + Counter(dicB))
This produces the intended result:
>>> print(add({'alice':10, 'Bob':3, 'Carlie':1}, {'alice':5, 'Bob':100, 'Carlie':1}))
{'alice': 15, 'Carlie': 2, 'Bob': 103}
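One caveat worth knowing: Counter addition keeps only keys whose summed count is positive, so zero or negative totals silently disappear. A small sketch contrasting it with a plain loop; add_with_counter and add_with_get are illustrative names, and the negative score is invented to trigger the difference:

```python
from collections import Counter


def add_with_counter(a, b):
    # Counter.__add__ drops any key whose total is zero or negative
    return dict(Counter(a) + Counter(b))


def add_with_get(a, b):
    # Plain dict arithmetic keeps every key, whatever its total
    result = {}
    for d in (a, b):
        for key, value in d.items():
            result[key] = result.get(key, 0) + value
    return result


a = {'alice': 10, 'Bob': -3}
b = {'alice': 5, 'Bob': 3}
print(add_with_counter(a, b))  # 'Bob' sums to 0 and is dropped
print(add_with_get(a, b))      # 'Bob' is kept with a total of 0
```

For the professor's example, where all totals are positive, both versions agree.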
The following is not meant to be the most elegant solution, but to get a feeling on how to deal with dicts.
dictA = {'Alice':10, 'Bob':3, 'Carlie':1}
dictB = {'Alice':5, 'Bob':100, 'Carlie':1}
# how to iterate through a dictionary
for k,v in dictA.iteritems():
    print k,v
# make a new dict to keep tally
newdict={}
for d in [dictA,dictB]: # go through a list that has your dictionaries
    print d
    for k,v in d.iteritems(): # go through each dictionary item
        if not k in newdict.keys():
            newdict[k]=v
        else:
            newdict[k]+=v
print newdict
Output:
Bob 3
Alice 10
Carlie 1
{'Bob': 3, 'Alice': 10, 'Carlie': 1}
{'Bob': 100, 'Alice': 5, 'Carlie': 1}
{'Bob': 103, 'Alice': 15, 'Carlie': 2}
def add(dicA,dicB):
You define a function that takes two arguments, dicA and dicB.
dicA = {}
dicB = {}
Then you assign an empty dictionary to both those variables, overwriting the dictionaries you passed to the function.
newdictionary = dicA.update(dicB)
Then you update dicA with the values from dicB, and assign the result to newdictionary. dict.update always returns None though.
And finally, you don’t return anything from the function, so it does not give you any results.
In order to combine those dictionaries, you actually need to use the values that were passed to it. Since dict.update mutates the dictionary it is called on, this would change one of those passed dictionaries, which we generally do not want to do. So instead, we use an empty dictionary, and then copy the values from both dictionaries into it:
def add(dicA, dicB):
    newDictionary = {}
    newDictionary.update(dicA)
    newDictionary.update(dicB)
    return newDictionary
If you want the values to sum up automatically, then use a Counter instead of a normal dictionary:
from collections import Counter
def add(dicA, dicB):
    newDictionary = Counter()
    newDictionary.update(dicA)
    newDictionary.update(dicB)
    return newDictionary
I suspect your professor wants to achieve this using more simple methods. But you can achieve this very easily using collections.Counter.
from collections import Counter
def add(a, b):
    return dict(Counter(a) + Counter(b))
Your professor probably wants something like this:
def add(a, b):
    new_dict = copy of a
    for each key/value pair in b
        if key in new_dict
            add value to value already present in new_dict
        else
            insert key/value pair into new_dict
    return new_dict
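Translated line by line into Python, that outline might look like the sketch below. The copy is made with dict(a), which is one of several ways to copy a dictionary, so the caller's dictionary is left untouched:

```python
def add(a, b):
    new_dict = dict(a)  # shallow copy of a
    for key, value in b.items():
        if key in new_dict:
            new_dict[key] += value  # key seen before: add to the running total
        else:
            new_dict[key] = value   # new key: insert it as-is
    return new_dict


print(add({'alice': 10, 'Bob': 3, 'Carlie': 1},
          {'alice': 5, 'Bob': 100, 'Carlie': 1}))
```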
You can try this (it assumes both dictionaries have the same keys):
def add(dict1, dict2):
    return dict([(key,dict1[key]+dict2[key]) for key in dict1.keys()])
I personally like using a dictionary's get method for this kind of merge:
def add(a, b):
    result = {}
    for dictionary in (a, b):
        for key, value in dictionary.items():
            result[key] = result.get(key, 0) + value
    return result

Python: Why does the dict.fromkeys method not produce a working dictionary

I had the following dictionary:
ref_range = range(0,100)
aas = list("ACDEFGHIKLMNPQRSTVWXY*")
new_dict = {}
new_dict = new_dict.fromkeys(ref_range,{k:0 for k in aas})
Then I added a 1 to a specific key
new_dict[30]['G'] += 1
>>>new_dict[30]['G']
1
but
>>>new_dict[31]['G']
1
What is going on here? I only incremented the nested key 30, 'G' by one.
Note: If I generate the dictionary this way:
new_dict = {}
for i in ref_range:
    new_dict[i] = {a:0 for a in aas}
Everything behaves fine. I think this is similar to another question, but I wanted to know why this is happening rather than how to solve it.
fromkeys(S, v) sets all of the keys in S to the same value v. Meaning that all of the keys in your dictionary new_dict refer to the same dictionary object, not to their own copies of that dictionary.
To set each to a different dict object you cannot use fromkeys. You need to just set each key to a new dict in a loop.
Besides what you have you could also do
{i: {a: 0 for a in aas} for i in ref_range}
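The sharing can be seen directly with the is operator. A short sketch using the question's own data; the names shared and separate are illustrative:

```python
aas = list("ACDEFGHIKLMNPQRSTVWXY*")
ref_range = range(0, 100)

# fromkeys evaluates the value expression once and stores that one object
shared = dict.fromkeys(ref_range, {k: 0 for k in aas})
# the comprehension evaluates the inner dict expression once per key
separate = {i: {a: 0 for a in aas} for i in ref_range}

print(shared[30] is shared[31])      # same inner dict under every key
print(separate[30] is separate[31])  # a different inner dict per key

shared[30]['G'] += 1
separate[30]['G'] += 1
print(shared[31]['G'])    # changed too, because it is the same object
print(separate[31]['G'])  # unaffected
```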

Dividing dictionary into nested dictionaries, based on the key's name on Python 3.4

I have the following dictionary (short version, real data is much larger):
dict = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475, 'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0, 'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0,}
I want to divide it into smaller nested dictionaries, depending on a part of the key name.
Expected output would be:
# fixed closing brackets
dict1 = {'C-STD-B&M-SUM': {'-1': 0, '-10': 4.520475}}
dict2 = {'H-NSW-BAC-ART': {'-9': 0.33784000000000003, '0': 0}}
dict3 = {'H-NSW-BAC-ENG': {'-59': 0.020309999999999998, '-6': 0}}
Logic behind is:
dict1: if the part of the key name is 'C-STD-B&M-SUM', add to dict1.
dict2: if the part of the key name is 'H-NSW-BAC-ART', add to dict2.
dict3: if the part of the key name is 'H-NSW-BAC-ENG', add to dict3.
Partial code so far:
def divide_dictionaries(dict):
    c_std_bem_sum = {}
    for k, v in dict.items():
        if k[0:13] == 'C-STD-B&M-SUM':
            c_std_bem_sum = k[14:17], v
What I'm trying to do is to create the nested dictionaries that I need and then I'll create the dictionary and add the nested one to it, but I'm not sure if it's a good way to do it.
When I run the code above, the variable c_std_bem_sum becomes a tuple, with only two values that are changed at each iteration. How can I make it be a dictionary, so I can later create another dictionary, and use this one as the value for one of the keys?
One way to approach it would be to do something like
d = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475, 'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0, 'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0,}
def divide_dictionaries(somedict):
    out = {}
    for k,v in somedict.items():
        head, tail = k.split(":")
        subdict = out.setdefault(head, {})
        subdict[tail] = v
    return out
which gives
>>> dnew = divide_dictionaries(d)
>>> import pprint
>>> pprint.pprint(dnew)
{'C-STD-B&M-SUM': {'-1': 0, '-10': 4.520475},
'H-NSW-BAC-ART': {'-9': 0.33784000000000003, '0': 0},
'H-NSW-BAC-ENG': {'-59': 0.020309999999999998, '-6': 0}}
A few notes:
(1) We're using nested dictionaries instead of creating separate named dictionaries, which aren't convenient.
(2) We used setdefault, which is a handy way to say "give me the value in the dictionary, but if there isn't one, add this to the dictionary and return it instead.". Saves an if.
(3) We can use .split(":") instead of hardcoding the width, which isn't very robust -- at least assuming that's the delimiter, anyway!
(4) It's a bad idea to use dict, the name of a builtin type, as a variable name.
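A variation on the same idea, offered as a sketch rather than a replacement: collections.defaultdict can stand in for setdefault, and rsplit(":", 1) splits only at the last colon, which tolerates a colon inside the prefix. It still assumes every key contains at least one colon; a key without one raises ValueError when unpacking.

```python
from collections import defaultdict


def divide_dictionaries(somedict):
    out = defaultdict(dict)  # missing prefixes start as an empty dict
    for key, value in somedict.items():
        head, tail = key.rsplit(":", 1)  # split at the last colon only
        out[head][tail] = value
    return dict(out)  # hand back a plain dict


d = {'C-STD-B&M-SUM:-1': 0, 'C-STD-B&M-SUM:-10': 4.520475,
     'H-NSW-BAC-ART:-9': 0.33784000000000003, 'H-NSW-BAC-ART:0': 0,
     'H-NSW-BAC-ENG:-59': 0.020309999999999998, 'H-NSW-BAC-ENG:-6': 0}
dnew = divide_dictionaries(d)
print(dnew['C-STD-B&M-SUM'])
```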
That's because you're creating a dictionary and then overwriting it with a tuple:
>>> a = 1, 2
>>> print a
(1, 2)
Now for your example:
>>> def divide_dictionaries(dict):
...     c_std_bem_sum = {}
...     for k, v in dict.items():
...         if k[0:13] == 'C-STD-B&M-SUM':
...             new_key = k[14:17] # sure you don't want [14:], open ended?
...             c_std_bem_sum[new_key] = v
...     return c_std_bem_sum
Basically, this grabs the rest of the key (or only three characters of it, as you have it; [14:None] or [14:] would get the rest of the string) and then uses that as the new key for the dict.

retrieving keys from dictionaries depending on value in python

I'm trying to find the most efficient way in Python to create a dictionary of 'guids' (point ids in Rhino), retrieve them depending on the value(s) I assign them, change those value(s), and store them back in the dictionary. One catch is that in Rhinoceros3d the points have randomly generated ID numbers which I don't know, so I can only look them up by the value I give them.
Are dictionaries the correct way? Should the guids be the values instead of the keys?
a very basic example :
arrPts=[]
arrPts = rs.GetPoints() # ---> creates a list of point-ids
ptsDict = {}
for ind, pt in enumerate(arrPts):
    ptsDict[pt] = ('A'+str(ind))
for i in ptsDict.values():
    if '1' in i :
        print ptsDict.keys()
how can I make the above code print the key that has the value '1' , instead of all the keys? and then change the key's value from 1 to e.g. 2 ?
any help also on the general question would be appreciated to know I'm in the right direction.
Thanks
Pav
You can use dict.items().
An example:
In [1]: dic={'a':1,'b':5,'c':1,'d':3,'e':1}
In [2]: for x,y in dic.items():
   ...:     if y==1:
   ...:         print x
   ...:         dic[x]=2
   ...:
a
c
e
In [3]: dic
Out[3]: {'a': 2, 'b': 5, 'c': 2, 'd': 3, 'e': 2}
dict.items() returns a list of (key, value) tuples in Python 2.x:
In [4]: dic.items()
Out[4]: [('a', 2), ('c', 2), ('b', 5), ('e', 2), ('d', 3)]
and in Python 3.x it returns an iterable view instead of a list.
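The same lookup can be wrapped in a small helper that first collects the matching keys and then updates them, which behaves the same whether items() returns a list or a view. This is a sketch; keys_with_value is an illustrative name:

```python
def keys_with_value(d, target):
    """Return a list of every key in d whose value equals target."""
    return [key for key, value in d.items() if value == target]


dic = {'a': 1, 'b': 5, 'c': 1, 'd': 3, 'e': 1}
matches = keys_with_value(dic, 1)
print(sorted(matches))

# Update after the search is finished, so nothing changes mid-iteration
for key in matches:
    dic[key] = 2
print(dic)
```

The search is linear in the size of the dictionary, since a dict is indexed by key rather than by value.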
I think you want the GUID's to be values, not keys, since it looks like you want to look them up by something you assign. ...but it really depends on your use case.
# list of GUID's / Rhinoceros3d point ids
arrPts = ['D20EA4E1-3957-11d2-A40B-0C5020524153',
          '1D2680C9-0E2A-469d-B787-065558BC7D43',
          'ED7BA470-8E54-465E-825C-99712043E01C']
# reference each of these by a unique key
ptsDict = dict((i, value) for i, value in enumerate(arrPts))
# now `ptsDict` looks like: {0:'D20EA4E1-3957-11d2-A40B-0C5020524153', ...}
print(ptsDict[1]) # easy to "find" the one you want to print
# basically make both keys: `2`, and `1` point to the same guid
# Note: we've just "lost" the previous guid that the `2` key was pointing to
ptsDict[2] = ptsDict[1]
Edit:
If you were to use a tuple as the key to your dict, it would look something like:
ptsDict = {(loc, dist, attr3, attr4): 'D20EA4E1-3957-11d2-A40B-0C5020524153',
           (loc2, dist2, attr3, attr4): '1D2680C9-0E2A-469d-B787-065558BC7D43',
           ...
          }
As you know, tuples are immutable, so you can't change the key to your dict, but you can remove one key and insert another:
oldval = ptsDict.pop((loc2, dist2, attr3, attr4)) # remove old key and get value
ptsDict[(locx, disty, attr3, attr4)] = oldval # insert it back in with a new key
In order to have one key point to multiple values, you'd have to use a list or set to contain the guids:
{(loc, dist, attr3, attr4): ['D20E...', '1D2680...']}
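A sketch of building such a mapping with setdefault, assuming the guids arrive as (label, guid) pairs. The labels and the shortened guid strings are placeholders, not real Rhino ids, and group_guids is an illustrative name:

```python
def group_guids(pairs):
    """Map each label to the list of guids that carry it."""
    groups = {}
    for label, guid in pairs:
        # setdefault returns the existing list, or stores and returns a new one
        groups.setdefault(label, []).append(guid)
    return groups


pairs = [('A0', 'guid-1'), ('A1', 'guid-2'), ('A0', 'guid-3')]
groups = group_guids(pairs)
print(groups['A0'])
```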
