I have created three dictionaries: dict1, dict2, and dict3. I want to update dict1 with dict2 first, and then the resulting dictionary with dict3. I am not sure why the values are not adding up.
def wordcount_directory(directory):
    dict = {}
    filelist=[os.path.join(directory,f) for f in os.listdir(directory)]
    dicts=[wordcount_file(file) for file in filelist]
    dict1=dicts[0]
    dict2=dicts[1]
    dict3=dicts[2]
    for k,v in dict1.iteritems():
        if k in dict2.keys():
            dict1[k]+=1
        else:
            dict1[k]=v
    for k1,v1 in dict1.iteritems():
        if k1 in dict3.keys():
            dict1[k1]+=1
        else:
            dict1[k1]=v1
    return dict1
print wordcount_directory("C:\\Users\\Phil2040\\Desktop\\Word_count")
Maybe I am not understanding your question correctly, but are you trying to add all the values from each of the dictionaries together into one final dictionary? If so:
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'b': 5, 'c': 1, 'd': 9}
dict3 = {'d': 1, 'e': 7}
def add_dict(to_dict, from_dict):
    for key, value in from_dict.iteritems():
        to_dict[key] = to_dict.get(key, 0) + value
result = dict(dict1)
add_dict(result, dict2)
add_dict(result, dict3)
print result
This yields: {'a': 1, 'c': 4, 'b': 7, 'e': 7, 'd': 10}
It would be really helpful to post what the expected outcome should be for your question.
EDIT:
For an arbitrary amount of dictionaries:
result = dict(dicts[0])
for dict_sum in dicts[1:]:
    add_dict(result, dict_sum)
print(result)
If you really want to fix the code from your original question in the format it is in:
You are using dict1[k]+=1 when you should be performing dict1[k]+=dict2.get(k, 0).
The introduction of get removes the need to check for its existence with an if statement.
You need to iterate through dict2 and dict3 to bring their new keys into dict1.
(Not really a problem, but worth mentioning.) When checking whether the key is in the dictionary, it is recommended to simplify the test to if k in dict2: (see this post for more details).
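Applying those fixes to the loops from the question might look roughly like the sketch below. It is written for Python 3 (items() instead of iteritems()) and uses the sample dicts from above in place of the wordcount_file() results, which are assumed to map each word to its count.

```python
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'b': 5, 'c': 1, 'd': 9}
dict3 = {'d': 1, 'e': 7}

# Loop over dict2 and dict3 (not dict1), so keys that only they contain reach dict1
for source in (dict2, dict3):
    for k, v in source.items():
        # Add the other dict's count instead of 1; get() supplies 0 for
        # words dict1 has not seen yet, so no separate if/else is needed
        dict1[k] = dict1.get(k, 0) + v

print(dict1)  # {'a': 1, 'b': 7, 'c': 4, 'd': 10, 'e': 7}
```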
With the amazing built-in Counter class found by @DisplacedAussie, the answer can be simplified even further:
from collections import Counter
print(Counter(dict1) + Counter(dict2) + Counter(dict3))
The result yields: Counter({'d': 10, 'b': 7, 'e': 7, 'c': 4, 'a': 1})
The Counter object is a sub-class of dict, so it can be used in the same way as a standard dict.
Hmmm, here's a simple function that might help:
def dictsum(dict1, dict2):
    '''Modify dict1 to accumulate new sums from dict2
    '''
    k1 = set(dict1.keys())
    k2 = set(dict2.keys())
    for i in k1 & k2:
        dict1[i] += dict2[i]
    for i in k2 - k1:
        dict1[i] = dict2[i]
    return None
... for keys in the intersection, it updates each value by adding the second dictionary's value to the existing one; then, for keys only in dict2 (the difference), it adds those key/value pairs.
With that defined, you'd simply call:
dictsum(dict1, dict2)
dictsum(dict1, dict3)
... and be happy.
(I will note that functions that modify the contents of dictionaries in this fashion are not all that common. I'm returning None explicitly to follow the convention established by the list.sort() method: functions that modify the contents of a container, in Python, do not normally return copies of the container.)
If I understand your question correctly, you are iterating on the wrong dictionary. You want to iterate over dict2 and update dict1 with matching keys or add non-matching keys to dict1.
If so, here's how you need to update the for loops:
for k,v in dict2.iteritems(): # Iterate over dict2
    if k in dict1.keys():
        dict1[k]+=v # Add dict2's count for matching keys
    else:
        dict1[k]=v # Add non-matching keys to dict1
for k1,v1 in dict3.iteritems(): # Iterate over dict3
    if k1 in dict1.keys():
        dict1[k1]+=v1 # Add dict3's count for matching keys
    else:
        dict1[k1]=v1 # Add non-matching keys to dict1
I assume that wordcount_file(file) returns a dict of the words found in file, with each key being a word and the associated value being the count for that word. If so, your updating algorithm is wrong. You should do something like this:
keys1 = dict1.keys()
for k,v in dict2.iteritems():
    if k in keys1:
        dict1[k] += v
    else:
        dict1[k] = v
If there's a lot of data in these dicts you can make the key lookup faster by storing the keys in a set:
keys1 = set(dict1.keys())
You should probably put that code into a function, so you don't need to duplicate the code when you want to update dict1 with the data in dict3.
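Here is a minimal sketch of such a function. It assumes Python 3 and the same word-to-count dicts, and the name add_counts is hypothetical. It tests k in target directly. Membership on a dict is already a hash lookup; the keys() list, and therefore the set workaround, only matters in Python 2, where dict.keys() returns a list.

```python
def add_counts(target, source):
    """Add the counts in source to target, in place (hypothetical helper name)."""
    for k, v in source.items():
        # Testing membership on the dict itself is a hash lookup
        if k in target:
            target[k] += v
        else:
            target[k] = v


dict1 = {'a': 1, 'b': 2, 'c': 3}
add_counts(dict1, {'b': 5, 'c': 1, 'd': 9})  # stands in for dict2
add_counts(dict1, {'d': 1, 'e': 7})          # stands in for dict3
print(dict1)  # {'a': 1, 'b': 7, 'c': 4, 'd': 10, 'e': 7}
```

Because the same function handles both updates, the dict3 step no longer needs its own copy of the loop.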
You should take a look at collections.Counter, a subclass of dict that supports counting; using Counters would simplify this task considerably. But if this is an assignment (or you're using Python 2.6 or older) you may not be able to use Counters.
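If Counter is available, the accumulation could look roughly like this sketch. The three sample dicts stand in for what wordcount_file() is assumed to return. Counter.update() adds counts rather than replacing them, which is the key difference from dict.update().

```python
from collections import Counter

# Stand-ins for the per-file word-count dicts (assumed shape: word -> count)
dicts = [{'a': 1, 'b': 2, 'c': 3}, {'b': 5, 'c': 1, 'd': 9}, {'d': 1, 'e': 7}]

total = Counter()
for d in dicts:
    # Counter.update() adds each count to the running total
    total.update(d)

print(dict(total))  # {'a': 1, 'b': 7, 'c': 4, 'd': 10, 'e': 7}
```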
Related
I'm trying to sum the values of keys present in several dictionaries with this code:
import functools
import operator
import collections
my_dict = [{'a':0, 'b':1, 'c':5}, {'b':3, 'c':2}, {'b':1, 'c':1}]
sum_key_value = functools.reduce(operator.add, map(collections.Counter, my_dict))
print(sum_key_value)
# Output
# Counter({'c': 8, 'b': 5})
My question: if I want the output to keep all dictionary keys, even a key that does not appear in all the dictionaries (like a in my case), what is the best way to do that without using a loop?
Well, there are a lot of nice ways to do it with a for loop, but since you specifically want to avoid one, here's one way:
sum_key_value = dict(functools.reduce(lambda a, b: a.update(b) or a,
my_dict, collections.Counter()))
So what happens here is you create a single Counter, and use it to accumulate the values.
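The following comparison uses the question's data to show why this version keeps a while the operator.add version drops it. It assumes Python 3, where Counter.update() only adds counts and never filters out zero or negative results.

```python
import functools
import operator
from collections import Counter

my_dict = [{'a': 0, 'b': 1, 'c': 5}, {'b': 3, 'c': 2}, {'b': 1, 'c': 1}]

# Counter + Counter keeps only keys whose summed count is positive...
via_add = functools.reduce(operator.add, map(Counter, my_dict))

# ...while Counter.update() just adds counts, so 'a': 0 survives
via_update = Counter()
for d in my_dict:
    via_update.update(d)

print(dict(via_add))     # {'b': 5, 'c': 8}
print(dict(via_update))  # {'a': 0, 'b': 5, 'c': 8}
```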
As mentioned in the comments, adding Counter objects removes keys whose counts are not positive.
So the issue is not that you fail to get the union of all keys (with common values summed); that is indeed the behaviour, as you can see if we set a:2:
my_dict = [{'a':2, 'b':1, 'c':5}, {'b':3, 'c':2}, {'b':1, 'c':1}]
functools.reduce(operator.add, map(Counter, my_dict))
# Counter({'c': 8, 'b': 5, 'a': 2})
However, as shown in the question, the current implementation of Counter addition removes non-positive values (a:0).
If you really wanted to use Counter for this, you could tweak the current implementation a little by overriding __add__ to get the expected behaviour:
import functools
import operator
from collections import Counter
class Counter_tweaked(Counter):
    def __add__(self, other):
        if not isinstance(other, Counter):
            return NotImplemented
        result = Counter_tweaked()
        for elem, count in self.items():
            newcount = count + other[elem]
            result[elem] = newcount
        for elem, count in other.items():
            if elem not in self:
                result[elem] = count
        return result
functools.reduce(operator.add, map(Counter_tweaked, my_dict))
# Counter_tweaked({'c': 8, 'b': 5, 'a': 0})
The most straightforward approach here would be a loop. Keys might appear in any dictionary in the list (e.g. the third one could have the key "e"), so you would need at least one loop to collect all of the keys. Then you can loop through all the dictionaries again to sum up the values. Wrap it in a function of your own and you can call it without ever caring about loops again.
def sum_it_up(dictlist):
    outdic = {}
    for d in dictlist:
        for k in d.keys():
            outdic[k] = 0
    for d in dictlist:
        for k in d.keys():
            outdic[k]+=d[k]
    return outdic
my_dict = [{'a':0, 'b':1, 'c':5}, {'b':3, 'c':2}, {'b':1, 'c':1}]
sum_key_value = sum_it_up(my_dict)
Subtract (or update with) a Counter of zero values to keep zero-valued items:
from collections import Counter
my_dict = [{'a':0, 'b':1, 'c':5}, {'b':3, 'c':2}, {'b':1, 'c':1}]
sum_my_dict = Counter()
zero_dict = Counter()
my_dict_keys = set()
# sum the dicts and collect the keys of all of them
for dic in my_dict:
    sum_my_dict += Counter(dic)
    my_dict_keys.update(dic.keys())
# create a Counter with all zero values
for key in my_dict_keys:
    zero_dict[key] = 0
# in-place subtract of the zero Counter (alters sum_my_dict and restores zero-count keys)
sum_my_dict.subtract(zero_dict)
# an in-place update has the same effect as subtract here
# sum_my_dict.update(zero_dict)
print(sum_my_dict)
Counter({'c': 8, 'b': 5, 'a': 0})
I have two or more dictionaries, and I'd like to merge them into one, keeping multiple values of the same key as a list. I can't share the original code, so please help me with the following example.
Input:
a= {'a':1, 'b': 2}
b= {'aa':4, 'b': 6}
c= {'aa':3, 'c': 8}
Output:
c= {'a':1,'aa':[3,4],'b': [2,6], 'c': 8}
I suggest you read up on defaultdict: it lets you provide a factory method that initializes missing keys, i.e. if a key is looked up but not found, it creates a value by calling factory_method() with no arguments and stores it under that key. See this example; it might make things clearer:
from collections import defaultdict
a = {'a': 1, 'b': 2}
b = {'aa': 4, 'b': 6}
c = {'aa': 3, 'c': 8}
stuff = [a, b, c]
# our factory method is the list-constructor `list`,
# so whenever we look up a value that doesn't exist, a list is created;
# we can always be sure that we have list-values
store = defaultdict(list)
for s in stuff:
    for k, v in s.items():
        # since we know that our value is always a list, we can safely append
        store[k].append(v)
print(store)
This has the "downside" of creating one-element lists for single occurrences of values, but maybe you can work around that.
Please find a solution to your issue below. I hope it works for you.
from collections import defaultdict
a = {'a':1, 'b': 2}
b = {'aa':4, 'b': 6}
c={'aa':3, 'c': 8}
dd = defaultdict(list)
for d in (a,b,c):
    for key, value in d.items():
        dd[key].append(value)
print(dd)
Use defaultdict to automatically create a dictionary entry with an empty list.
To process all source dictionaries in a single loop, use itertools.chain.
The main loop just adds a value from the current item, to the list under
the current key.
As you wrote, a key that has only one item should not hold a list,
so you have to generate a working dictionary (using a dictionary comprehension),
limited to items whose value (a list) contains only one item.
The value of each such item should be just the first (and only) number
from the source list.
Then use this dictionary to update d.
So the whole script can be surprisingly short, as below:
from collections import defaultdict
from itertools import chain
a = {'a':1, 'b': 2}
b = {'aa':4, 'b': 6}
c = {'aa':3, 'c': 8}
d = defaultdict(list)
for k, v in chain(a.items(), b.items(), c.items()):
    d[k].append(v)
d.update({ k: v[0] for k, v in d.items() if len(v) == 1 })
As you can see, the actual processing code is contained in only 4 (last) lines.
If you print d, the result is:
defaultdict(<class 'list'>, {'a': 1, 'b': [2, 6], 'aa': [4, 3], 'c': 8})
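One detail worth keeping in mind: d is still a defaultdict, so reading a missing key later (for example d['zz']) silently inserts an empty list. Here is a sketch of the same script that hands back a plain dict instead, assuming callers expect a KeyError for unknown keys:

```python
from collections import defaultdict
from itertools import chain

a = {'a': 1, 'b': 2}
b = {'aa': 4, 'b': 6}
c = {'aa': 3, 'c': 8}

d = defaultdict(list)
for k, v in chain(a.items(), b.items(), c.items()):
    d[k].append(v)
# unwrap one-element lists, as in the answer above
d.update({k: v[0] for k, v in d.items() if len(v) == 1})

# Converting to a plain dict restores normal KeyError behaviour for missing keys
merged = dict(d)
print(merged)  # {'a': 1, 'b': [2, 6], 'aa': [4, 3], 'c': 8}
```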
I have two dictionaries. In both dictionaries, the value of each key is a single list. If any element in any list in dictionary 2 is equal to a key of dictionary 1, I want to replace that element with the first element in that dictionary 1 list.
In other words, I have:
dict1 = {'IDa':['newA', 'x'], 'IDb':['newB', 'x']}
dict2 = {1:['IDa', 'IDb']}
and I want:
dict2 = {1:['newA', 'newB']}
I tried:
for ID1, news in dict1.items():
    for x, ID2s in dict2.items():
        for ID in ID2s:
            if ID == ID1:
                print ID1, 'match'
                ID.replace(ID, news[0])
for k, v in dict2.items():
    print k, v
and I got:
IDb match
IDa match
1 ['IDa', 'IDb']
So it looks like everything up to the replace method is working. Is there a way to make this work? To replace an entire string in a value-list with a string in another value-list?
Thanks a lot for your help.
Try this:
dict1 = {'IDa':['newA', 'x'], 'IDb':['newB', 'x']}
dict2 = {1:['IDa', 'IDb']}
for key in dict2.keys():
    dict2[key] = [dict1[x][0] if x in dict1.keys() else x for x in dict2[key]]
print dict2
this will print:
{1: ['newA', 'newB']}
as required.
Explanation
dict.keys() gives us just the keys of a dictionary (i.e. just the left hand side of the colon). When we use for key in dict2.keys(), at present our only key is 1. If the dictionary was larger, it'd loop through all keys.
The following line uses a list comprehension. We know that dict2[key] gives us a list (the right side of the colon), so we loop through every element of the list (for x in dict2[key]) and return the first entry of the corresponding list in dict1 if we can find the element among the keys of dict1 (dict1[x][0] if x in dict1.keys()), and otherwise leave the element untouched (else x).
For example, if we changed our dictionaries to be the following:
dict1 = {'IDa':['newA', 'x'], 'IDb':['newB', 'x']}
dict2 = {1:['IDa', 'IDb'], 2:['IDb', 'IDc']}
we'd get the output:
{1: ['newA', 'newB'], 2: ['newB', 'IDc']}
because 'IDc' doesn't exist in the keys of dict1.
You could also use a dictionary comprehension, but I am not sure whether it works in Python 2.7; it may be limited to Python 3:
# Python 3
dict2 = {k: [dict1.get(e, [e])[0] for e in v] for k,v in dict2.items()}
Edit: I just checked, and this works in Python 2.7 too. There, dict2.iteritems() can be used instead of dict2.items() to avoid building an intermediate list:
# Python 2.7
dict2 = {k: [dict1.get(e, [e])[0] for e in v] for k,v in dict2.iteritems()}
This was a fun one!
dict2[1] = [dict1[val][0] if val in dict1 else val for val in dict2[1]]
Or, here is the same logic without list comprehension:
new_dict = {1: []}
for val in dict2[1]:
    if val in dict1:
        new_dict[1].append(dict1[val][0])
    else:
        new_dict[1].append(val)
dict2 = new_dict
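For reference, the loop in the question has no effect because Python strings are immutable. ID.replace(...) returns a new string, and that result is discarded. The minimal Python 3 sketch below keeps the question's loop shape but writes the new string back into the list by index. This mutates the lists inside dict2 in place, which is assumed to be acceptable here.

```python
dict1 = {'IDa': ['newA', 'x'], 'IDb': ['newB', 'x']}
dict2 = {1: ['IDa', 'IDb']}

for ID2s in dict2.values():
    # enumerate() supplies the position, so the list slot itself can be reassigned
    for i, ID in enumerate(ID2s):
        if ID in dict1:
            ID2s[i] = dict1[ID][0]

print(dict2)  # {1: ['newA', 'newB']}
```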
In Python 3, how to copy a key-value mapping from one dict to another, including remove if necessary? Here's some ugly code to do this:
if key in dict2:
    dict1[key] = dict2[key]
elif key in dict1:
    del dict1[key]
I'm hoping someone can reply with a cleaner and hopefully one-liner way to do this. (And I don't mean "just put that ugly code inside a function" because I don't want to add a hundred little functions to my code.) TIA.
UPDATE:
Since one comment asked for context, and others have tried to give answers that don't actually do what the question stated, I'm going to give an example context -- this isn't exactly what I'm trying to do, but it'll give you a good idea. I just wrote up this code quickly without testing, so hopefully it's not so wrong that I don't get the idea across. Note the ugly code I asked about originally is toward the bottom of this expanded example. TIA.
from datetime import datetime, timedelta
class TimestampedDict(dict):
    def __init__(self):
        self._ts = {}  # from key to timestamp
    def __setitem__(self, key, val):
        super().__setitem__(key, val)
        self._ts[key] = datetime.now()
    def __delitem__(self, key):
        super().__delitem__(key)
        # When deleting, timestamp is updated by 1 microsecond so that it wins
        # against the original but loses to any later changes.
        self._ts[key] = self._ts[key] + timedelta(microseconds=1)
    def update(self, other):
        for key in other._ts:
            if key not in self._ts or self._ts[key] < other._ts[key]:
                # Other wins, so replace the mapping (including del if necessary).
                if key in other:
                    self[key] = other[key]
                elif key in self:
                    del self[key]
                self._ts[key] = other._ts[key]
I might be misunderstanding, but are you looking to update dict1 with the values from dict2 and remove those that are missing? In that case, couldn't you simply set dict1 = dict2?
Otherwise if you are looking to combine two dictionaries and update shared keys, maybe the following will work:
dict1 = {1:2,3:4,5:6}
dict2 = {1:7}
dict(list(dict1.items()) + list(dict2.items()))  # in Python 3, items() views must be turned into lists before using +
Concatenating the items of the two dictionaries produces a new dictionary with all keys, taking the second dictionary's values where keys overlap. However, after rereading your question, I believe you are looking for the former solution (dict1 = dict2).
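In Python 3, which the question targets, there are also built-in ways to merge two dicts into a new one where the second dict wins on shared keys. Here is a short sketch using the sample data above. Dict unpacking needs Python 3.5+, and the | operator needs 3.9+ (shown only as a comment).

```python
dict1 = {1: 2, 3: 4, 5: 6}
dict2 = {1: 7}

# Unpack both into a new dict; for a shared key, the later (dict2) value wins
merged = {**dict1, **dict2}
print(merged)  # {1: 7, 3: 4, 5: 6}

# Python 3.9+ equivalent with the same semantics:
# merged = dict1 | dict2
```

Neither form touches dict1 or dict2. Like the items() version, it does not remove keys that are missing from dict2.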
Have you tried using update:
If this isn't what you want, can you please provide sample data and desired result?
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'a': 10, 'c': 30, 'd': 40}
d1.update(d2)
>>> d1
{'a': 10, 'b': 2, 'c': 30, 'd': 40}
As I mentioned, I don't think there is much to improve in your logic, but I find it interesting to see how twisted one can get, so here's a really ugly way of doing it using update():
def update(self, other):
    u = {k: v for k, v in other._ts.items() if k not in self._ts or self._ts[k] < v}
    self._ts.update(u)
    dict.update(self, {k: other[k] for k in u
                       if k in other or (self.pop(k, 0) and False)})
The expression (self.pop(k, 0) and False) is guaranteed to be falsy, but it also has the side effect of removing k from self. It is only evaluated when k in other is False.
>>> a = {1:1, 2:2, 3:3, 4:4}
>>> b = {1:2, 2:3}
>>> a.update({k: b[k] for k in [1,2,3] if k in b or (a.pop(k, 0) and False)})
>>> a
{1: 2, 2: 3, 4: 4}
I have 2 dictionaries.
dict1={('SAN RAMON', 'CA'): 1, ('UPLAND', 'CA'): 4, ('POUGHKEESIE', 'NY'): 3, ('CATTANOOGA', 'TN'): 1}
dict2={('UPLAND', 'CA'): 5223, ('PORT WASHING', 'WI'): 11174, ('PORT CLINTON', 'OH'): 6135, ('GRAIN VALLEY', 'MO'): 10352, ('GRAND JUNCTI', 'CO'): 49688, ('FAIRFIELD', 'IL'): 5165}
These are just samples, in reality each dict has hundreds of entries. I am trying to merge the two dictionaries and create dict 3 that contains {dict1.values(): dict2.values()} but only if that city appears in both dicts. So, one entry in dict3 would look like
{4:5223} # for 'UPLAND', 'CA' since it appears in both dict1 and dict2
This is just a small step in a larger function I am writing. I was going to try something like :
for item in dict1.keys():
    if item not in dict2.keys():
        del item
return dict[(dict1.keys())=(dict2.keys())]
I can't figure out how to make sure the number of complaints from dict1 matches the same city it is being referred to in dict2.
Here's what I think you want:
dict3 = dict((dict1[key], dict2[key]) for key in dict1 if key in dict2)
Expanded a little, it looks like this:
dict3 = {}
for key in dict1:
    if key in dict2:
        dict3[dict1[key]] = dict2[key]
The common keys are:
set(dict1.keys()) & set(dict2.keys())
create dict 3 that contains {dict1.values(): dict2.values()}
This doesn't make sense, dictionaries are key-value pairs... what do you really want? Tip:
dict3 = {}
for k in set(dict1.keys()) & set(dict2.keys()):
    dict3[dict1[k]]=dict2[k]
{4: 5223}
The straightforward way would be to check each key in one for membership in the other:
result = {}
for key in dict1:
    if key in dict2:
        result[dict1[key]] = dict2[key]
You could also try converting them into a set or frozenset and taking their intersection, but it's not clear to me whether that will be faster or not:
keys_in_both = frozenset(dict1) & frozenset(dict2)
result = dict((dict1[key], dict2[key]) for key in keys_in_both)
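In Python 3, dict.keys() already returns a set-like view, so the intersection can be taken without building sets first. The sketch below uses the question's sample data. Because dict1's values become the keys of dict3, two cities with the same complaint count in dict1 would collide, and only the last one processed would survive.

```python
dict1 = {('SAN RAMON', 'CA'): 1, ('UPLAND', 'CA'): 4,
         ('POUGHKEESIE', 'NY'): 3, ('CATTANOOGA', 'TN'): 1}
dict2 = {('UPLAND', 'CA'): 5223, ('PORT WASHING', 'WI'): 11174,
         ('PORT CLINTON', 'OH'): 6135, ('GRAIN VALLEY', 'MO'): 10352,
         ('GRAND JUNCTI', 'CO'): 49688, ('FAIRFIELD', 'IL'): 5165}

# Keys views support set operations directly in Python 3
common = dict1.keys() & dict2.keys()
# Map each shared city's complaint count (dict1) to its dict2 value
dict3 = {dict1[k]: dict2[k] for k in common}
print(dict3)  # {4: 5223}
```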