If i have 2 dictionaries x={'a':1,'b':2} and y={'a':1,'b':3}
and i want the output z={'a':2,'b':5}, is there a z=dict.add(x,y) function or should i convert both dictionaries into dataframes and then add them together with z=x.add(y)?
You could use Counter in this case for example:
from pprint import pprint
from collections import Counter
x={'a':1,'b':2}
y={'a':1,'b':3}
c = Counter()
c.update(x)
c.update(y)
pprint(dict(c))
Output:
{'a': 2, 'b': 5}
Or using +:
from pprint import pprint
from collections import Counter
x={'a':1,'b':2}
y={'a':1,'b':3}
pprint(dict(Counter(x) + Counter(y)))
collections.Counter is the natural method, but you can also use a dictionary comprehension after calculating the union of your dictionary keys:
x = {'a':1, 'b':2}
y = {'a':1, 'b':3}
dict_tup = (x, y)
keys = set().union(*dict_tup)
z = {k: sum(i.get(k, 0) for i in dict_tup) for k in keys}
print(z)
{'a': 2, 'b': 5}
Code:
from collections import Counter
x = {"a":1, "b":2}
y = {"a":1, "b":3}
c = Counter(x)
c += Counter(y)
z = dict(c)
print(z)
Output:
{'a': 2, 'b': 5}
Related
I have a string: 'AAAAATTT'
I want to write a program that would count each time 2 values are identical.
So in 'AAAAATTT' it would give a count of:
AA: 4
TT: 2
You can use collections.defaultdict for this. This is an O(n) complexity solution which loops through adjacent letters and builds a dictionary based on a condition.
Your output will be a dictionary with keys as repeated letters and values as counts.
The use of itertools.islice is to avoid building a new list for the second argument of zip.
from collections import defaultdict
from itertools import islice
x = 'AAAAATTT'
d = defaultdict(int)
for i, j in zip(x, islice(x, 1, None)):
if i == j:
d[i+j] += 1
Result:
print(d)
defaultdict(<class 'int'>, {'AA': 4, 'TT': 2}
You could use a Counter:
from collections import Counter
s = 'AAAAATTT'
print([(k*2, v - 1) for k, v in Counter(list(s)).items() if v > 1])
#output: [('AA', 4), ('TT', 2)]
You may use collections.Counter with dictionary comprehension and zip as:
>>> from collections import Counter
>>> s = 'AAAAATTT'
>>> {k: v for k, v in Counter(zip(s, s[1:])).items() if k[0]==k[1]}
{('A', 'A'): 4, ('T', 'T'): 2}
Here's another alternative to achieve this using itertools.groupby, but this one is not as clean as the above solution (also will be slow in terms of performance).
>>> from itertools import groupby
>>> {x[0]:len(x) for i,j in groupby(zip(s, s[1:]), lambda y: y[0]==y[1]) for x in (tuple(j),) if i}
{('A', 'A'): 4, ('T', 'T'): 2}
One way may be as following using Counter:
from collections import Counter
string = 'AAAAATTT'
result = dict(Counter(s1+s2 for s1, s2 in zip(string, string[1:]) if s1==s2))
print(result)
Result:
{'AA': 4, 'TT': 2}
You can try it with just range method without importing anything :
data='AAAAATTT'
count_dict={}
for i in range(0,len(data),1):
data_x=data[i:i+2]
if len(data_x)>1:
if data_x[0] == data_x[1]:
if data_x not in count_dict:
count_dict[data_x] = 1
else:
count_dict[data_x] += 1
print(count_dict)
output:
{'TT': 2, 'AA': 4}
Suppose I have the following dict:
L = {'A': {'root[1]': 'firstvalue', 'root[2]': 'secondvalue'}, 'B': {'root[3]': 'thirdvalue', 'root[4]': 'Fourthvalue'}}
How can I access the values of the keys root[1], root[2], root[3], root[4] (indexes of root[] is dynamic) in Python 2.7.
Try :
>>> L = {'A': {'root[1]': 'firstvalue', 'root[2]': 'secondvalue'}, 'B': {'root[3]': 'thirdvalue', 'root[4]': 'Fourthvalue'}}
>>> L['A']['root[1]']
'firstvalue'
>>> L['A']['root[2]']
'secondvalue'
>>> L['B']['root[3]']
'thirdvalue'
>>> L['B']['root[4]']
'Fourthvalue'
>>>
Something like this:
for (key, value) in L.items():
for (another_key, real_value) in value.items():
print(another_key, real_value)
To access the values from a dict nested inside the dict, i used the following step:
L = {'A': {'root[1]': 'firstvalue', 'root[2]': 'secondvalue'}, 'B': {'root[3]': 'thirdvalue', 'root[4]': 'Fourthvalue'}}
Solution:
F = {}
G = []
F = L.get("A", None)
F= {{'root[1]': 'firstvalue', 'root[2]': 'secondvalue'}}
for value in F.values():
G.append(value)
Output:
G = ['firstvalue', 'secondvalue']
I have two dictionaries mapping IDs to values. For simplicity, lets say those are the dictionaries:
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
As named, the dictionaries are not symmetrical.
I would like to get a dictionary of keys from dictionaries d_source and d_target whose values match. The resulting dictionary would have d_source keys as its own keys, and d_target keys as that keys value (in either a list, tuple or set format).
This would be The expected returned value for the above example should be the following list:
{'a': ('1', 'A'),
'b': ('B',),
'c': ('C',),
'3': ('C',)}
There are two somewhat similar questions, but those solutions can't be easily applied to my question.
Some characteristics of the data:
Source would usually be smaller than target. Having roughly few thousand sources (tops) and a magnitude more targets.
Duplicates in the same dict (both d_source and d_target) are not too likely on values.
matches are expected to be found for (a rough estimate) not more than 50% than d_source items.
All keys are integers.
What is the best (performance wise) solution to this problem?
Modeling data into other datatypes for improved performance is totally ok, even when using third party libraries (i'm thinking numpy)
All answers have O(n^2) efficiency which isn't very good so I thought of answering myself.
I use 2(source_len) + 2(dict_count)(dict_len) memory and I have O(2n) efficiency which is the best you can get here I believe.
Here you go:
from collections import defaultdict
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
def merge_dicts(source_dict, *rest):
flipped_rest = defaultdict(list)
for d in rest:
while d:
k, v = d.popitem()
flipped_rest[v].append(k)
return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
new_dict = merge_dicts(d_source, d_target)
By the way, I'm using a tuple in order not to link the resulting lists together.
As you've added specifications for the data, here's a closer matching solution:
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
def second_merge_dicts(source_dict, *rest):
"""Optimized for ~50% source match due to if statement addition.
Also uses less memory.
"""
unique_values = set(source_dict.values())
flipped_rest = defaultdict(list)
for d in rest:
while d:
k, v = d.popitem()
if v in unique_values:
flipped_rest[v].append(k)
return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
new_dict = second_merge_dicts(d_source, d_target)
from collections import defaultdict
from pprint import pprint
d_source = {'a': 1, 'b': 2, 'c': 3, '3': 3}
d_target = {'A': 1, 'B': 2, 'C': 3, '1': 1}
d_result = defaultdict(list)
{d_result[a].append(b) for a in d_source for b in d_target if d_source[a] == d_target[b]}
pprint(d_result)
Output:
{'3': ['C'],
'a': ['A', '1'],
'b': ['B'],
'c': ['C']}
Timing results:
from collections import defaultdict
from copy import deepcopy
from random import randint
from timeit import timeit
def Craig_match(source, target):
result = defaultdict(list)
{result[a].append(b) for a in source for b in target if source[a] == target[b]}
return result
def Bharel_match(source_dict, *rest):
flipped_rest = defaultdict(list)
for d in rest:
while d:
k, v = d.popitem()
flipped_rest[v].append(k)
return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
def modified_Bharel_match(source_dict, *rest):
"""Optimized for ~50% source match due to if statement addition.
Also uses less memory.
"""
unique_values = set(source_dict.values())
flipped_rest = defaultdict(list)
for d in rest:
while d:
k, v = d.popitem()
if v in unique_values:
flipped_rest[v].append(k)
return {k: tuple(flipped_rest.get(v, ())) for k, v in source_dict.items()}
# generate source, target such that:
# a) ~10% duplicate values in source and target
# b) 2000 unique source keys, 20000 unique target keys
# c) a little less than 50% matches source value to target value
# d) numeric keys and values
source = {}
for k in range(2000):
source[k] = randint(0, 1800)
target = {}
for k in range(20000):
if k < 1000:
target[k] = randint(0, 2000)
else:
target[k] = randint(2000, 19000)
best_time = {}
approaches = ('Craig', 'Bharel', 'modified_Bharel')
for a in approaches:
best_time[a] = None
for _ in range(3):
for approach in approaches:
test_source = deepcopy(source)
test_target = deepcopy(target)
statement = 'd=' + approach + '_match(test_source,test_target)'
setup = 'from __main__ import test_source, test_target, ' + approach + '_match'
t = timeit(stmt=statement, setup=setup, number=1)
if not best_time[approach] or (t < best_time[approach]):
best_time[approach] = t
for approach in approaches:
print(approach, ':', '%0.5f' % best_time[approach])
Output:
Craig : 7.29259
Bharel : 0.01587
modified_Bharel : 0.00682
Here is another solution. There are a lot of ways to do this
for key1 in d1:
for key2 in d2:
if d1[key1] == d2[key2]:
stuff
Note that you can use any name for key1 and key2.
This maybe "cheating" in some regards, although if you are looking for the matching values of the keys regardless of the case sensitivity then you might be able to do:
import sets
aa = {'a': 1, 'b': 2, 'c':3}
bb = {'A': 1, 'B': 2, 'd': 3}
bbl = {k.lower():v for k,v in bb.items()}
result = {k:k.upper() for k,v in aa.iteritems() & bbl.viewitems()}
print( result )
Output:
{'a': 'A', 'b': 'B'}
The bbl declaration changes the bb keys into lowercase (it could be either aa, or bb).
* I only tested this on my phone, so just throwing this idea out there I suppose... Also, you've changed your question radically since I began composing my answer, so you get what you get.
It is up to you to determine the best solution. Here is a solution:
def dicts_to_tuples(*dicts):
result = {}
for d in dicts:
for k,v in d.items():
result.setdefault(v, []).append(k)
return [tuple(v) for v in result.values() if len(v) > 1]
d1 = {'a': 1, 'b': 2, 'c':3}
d2 = {'A': 1, 'B': 2}
print dicts_to_tuples(d1, d2)
I have N dictionaries that contain the same keys, with values that are integers. I want to merge these into a single dictionary based on the maximum value. Currently I have something like this:
max_dict = {}
for dict in original_dict_list:
for key, val in dict.iteritems():
if key not in max_dict or max_dict[key] < val:
max_dict[key] = val
Is there a better (or more "pythonic") way of doing this?
Use collection.Counter() objects instead, they support 'merging' counts natively:
from collections import Counter
max_dict = Counter()
for d in original_dict_list:
max_dict |= Counter(d)
or even:
from collections import Counter
from operator import or_
max_dict = reduce(or_, map(Counter, original_dict_list))
Counter objects are multi-sets (also called 'bags' sometimes). The | operator performs a union on two counters, storing the maximum count for a given key.
A Counter is also a straight subclass of dict, so you can (mostly) treat it like any other dictionary.
Demo:
>>> from collections import Counter
>>> from operator import or_
>>> original_dict_list = [{'foo': 3, 'bar': 10}, {'foo': 42, 'spam': 20}, {'bar': 5, 'ham': 10}]
>>> reduce(or_, map(Counter, original_dict_list))
Counter({'foo': 42, 'spam': 20, 'bar': 10, 'ham': 10})
Assuming not all dictionaries contain all keys:
keys = set(k for x in original_dict_list for k in x)
max_dict = {k:max([x[k] for x in original_dict_list if k in x]) for k in keys}
max_dict = { k:max([v]+[ _dict[k] for _dict in original_dict_list[1:] ])
for k,v in original_dict_list[0].items() }
In Python,
I have list of dicts:
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
I want one final dict that will contain the sum of all dicts.
I.e the result will be: {'a':5, 'b':7}
N.B: every dict in the list will contain same number of key, value pairs.
You can use the collections.Counter
counter = collections.Counter()
for d in dict1:
counter.update(d)
Or, if you prefer oneliners:
functools.reduce(operator.add, map(collections.Counter, dict1))
A little ugly, but a one-liner:
dictf = reduce(lambda x, y: dict((k, v + y[k]) for k, v in x.iteritems()), dict1)
Leveraging sum() should get better performance when adding more than a few dicts
>>> dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
>>> from operator import itemgetter
>>> {k:sum(map(itemgetter(k), dict1)) for k in dict1[0]} # Python2.7+
{'a': 5, 'b': 7}
>>> dict((k,sum(map(itemgetter(k), dict1))) for k in dict1[0]) # Python2.6
{'a': 5, 'b': 7}
adding Stephan's suggestion
>>> {k: sum(d[k] for d in dict1) for k in dict1[0]} # Python2.7+
{'a': 5, 'b': 7}
>>> dict((k, sum(d[k] for d in dict1)) for k in dict1[0]) # Python2.6
{'a': 5, 'b': 7}
I think Stephan's version of the Python2.7 code reads really nicely
This might help:
def sum_dict(d1, d2):
for key, value in d1.items():
d1[key] = value + d2.get(key, 0)
return d1
>>> dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
>>> reduce(sum_dict, dict1)
{'a': 5, 'b': 7}
The following code shows one way to do it:
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
final = {}
for k in dict1[0].keys(): # Init all elements to zero.
final[k] = 0
for d in dict1:
for k in d.keys():
final[k] = final[k] + d[k] # Update the element.
print final
This outputs:
{'a': 5, 'b': 7}
as you desired.
Or, as inspired by kriss, better but still readable:
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
final = {}
for d in dict1:
for k in d.keys():
final[k] = final.get(k,0) + d[k]
print final
I pine for the days of the original, readable Python :-)
I was interested in the performance of the proposed Counter, reduce and sum methods for large lists. Maybe someone else is interested in this as well.
You can have a look here: https://gist.github.com/torstenrudolf/277e98df296f23ff921c
I tested the three methods for this list of dictionaries:
dictList = [{'a': x, 'b': 2*x, 'c': x**2} for x in xrange(10000)]
the sum method showed the best performance, followed by reduce and Counter was the slowest. The time showed below is in seconds.
In [34]: test(dictList)
Out[34]:
{'counter': 0.01955194902420044,
'reduce': 0.006518083095550537,
'sum': 0.0018319153785705566}
But this is dependent on the number of elements in the dictionaries. the sum method will slow down faster than the reduce.
l = [{y: x*y for y in xrange(100)} for x in xrange(10000)]
In [37]: test(l, num=100)
Out[37]:
{'counter': 0.2401433277130127,
'reduce': 0.11110662937164306,
'sum': 0.2256883692741394}
You can also use the pandas sum function to compute the sum:
import pandas as pd
# create a DataFrame
df = pd.DataFrame(dict1)
# compute the sum and convert to dict.
dict(df.sum())
This results in:
{'a': 5, 'b': 7}
It also works for floating points:
dict2 = [{'a':2, 'b':3.3},{'a':3, 'b':4.5}]
dict(pd.DataFrame(dict2).sum())
Gives the correct results:
{'a': 5.0, 'b': 7.8}
In Python 2.7 you can replace the dict with a collections.Counter object. This supports addition and subtraction of Counters.
Here is a reasonable beatiful one.
final = {}
for k in dict1[0].Keys():
final[k] = sum(x[k] for x in dict1)
return final
Here is another working solution (python3), quite general as it works for dict, lists, arrays. For non-common elements, the original value will be included in the output dict.
def mergsum(a, b):
for k in b:
if k in a:
b[k] = b[k] + a[k]
c = {**a, **b}
return c
dict1 = [{'a':2, 'b':3},{'a':3, 'b':4}]
print(mergsum(dict1[0], dict1[1]))
One further one line solution
dict(
functools.reduce(
lambda x, y: x.update(y) or x, # update, returns None, and we need to chain.
dict1,
collections.Counter())
)
This creates only one counter, uses it as an accumulator and finally converts back to a dict.