Combine two dicts and replace missing values [duplicate] - python

I am looking to combine two dictionaries by grouping elements that share common keys, but I would also like to account for keys that are not shared between the two dictionaries. For instance, given the following two dictionaries:
d1 = {'a':1, 'b':2, 'c': 3, 'e':5}
d2 = {'a':11, 'b':22, 'c': 33, 'd':44}
The intended code would output
df = {'a':[1,11] ,'b':[2,22] ,'c':[3,33] ,'d':[0,44] ,'e':[5,0]}
Or some array like:
df = [['a', 1, 11], ['b', 2, 22], ['c', 3, 33], ['d', 0, 44], ['e', 5, 0]]
The choice of 0 to mark a missing entry is not important in itself; any placeholder that denotes the missing value would do.
I have tried using the following code
from collections import defaultdict

df = defaultdict(list)
for d in (d1, d2):
    for key, value in d.items():
        df[key].append(value)
But I get the following result:
df = {'a':[1,11] ,'b':[2,22] ,'c':[3,33] ,'d':[44] ,'e':[5]}
This does not tell me which dict was missing the entry.
I could go back and look through both of them, but I was looking for a more elegant solution.

You can use a dict comprehension like so:
d1 = {'a':1, 'b':2, 'c': 3, 'e':5}
d2 = {'a':11, 'b':22, 'c': 33, 'd':44}
res = {k: [d1.get(k, 0), d2.get(k, 0)] for k in set(d1).union(d2)}
print(res)

Another solution:
d1 = {"a": 1, "b": 2, "c": 3, "e": 5}
d2 = {"a": 11, "b": 22, "c": 33, "d": 44}
df = [[k, d1.get(k, 0), d2.get(k, 0)] for k in sorted(d1.keys() | d2.keys())]
print(df)
Prints:
[['a', 1, 11], ['b', 2, 22], ['c', 3, 33], ['d', 0, 44], ['e', 5, 0]]
If you do not want sorted results, leave the sorted() out.
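The same idea extends to any number of dictionaries and to a different placeholder. The following is only a sketch: the function name merge_with_fill and its fill parameter are made up for illustration, and it assumes the keys can be sorted. Using None as the default placeholder keeps a missing key distinguishable from a real value of 0.

```python
def merge_with_fill(*dicts, fill=None):
    """Collect values per key across dicts, using `fill` where a key is absent."""
    # Union of all keys; sorted() assumes the keys are mutually comparable.
    keys = sorted(set().union(*dicts))
    return {k: [d.get(k, fill) for d in dicts] for k in keys}


d1 = {'a': 1, 'b': 2, 'c': 3, 'e': 5}
d2 = {'a': 11, 'b': 22, 'c': 33, 'd': 44}

merged = merge_with_fill(d1, d2)
print(merged)
# {'a': [1, 11], 'b': [2, 22], 'c': [3, 33], 'd': [None, 44], 'e': [5, None]}
```

Passing fill=0 reproduces the output asked for in the question.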

Related

How to return a list of all values corresponding to keys greater than x in the dictionary [duplicate]

I need to use a for loop to find and return the list of values in a dictionary that are greater than x.
d= {}
for key in d():
    if key > x:
        return(d(key))
>>> d = dict(a=1, b=10, c=30, d=2)
>>> d
{'a': 1, 'b': 10, 'c': 30, 'd': 2}
>>> d = dict((k, v) for k, v in d.items() if v >= 10)
>>> d
{'b': 10, 'c': 30}
>>> values_list = list(d.values())
>>> values_list
[10, 30]
We keep a greater_than_x list and append each value from the dictionary d that is bigger than the given x.
x = 20
greater_than_x = []
d = {"a": 10, "b": 20, "c": 30}
for value in d.values():
    if value > x:
        greater_than_x.append(value)
print(greater_than_x)  # [30]
One-liner applying the same logic:
x = 20
d = {"a": 10, "b": 20, "c": 30}
greater_than_x = [value for value in d.values() if value > x]
print(greater_than_x)  # [30]
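The question title asks about keys greater than x, while the answers above compare the values. If the keys are what should be compared, a minimal sketch could look like the following. The function name and the sample data are made up for illustration, and it assumes the keys are comparable with x.

```python
def values_for_keys_greater_than(d, x):
    """Return the values whose keys are greater than x, in insertion order."""
    return [value for key, value in d.items() if key > x]


sample = {1: "low", 5: "mid", 10: "high"}
selected = values_for_keys_greater_than(sample, 4)
print(selected)  # ['mid', 'high']
```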

Prefer a key by max-value in dictionary?

You can get the key with the max value in a dictionary this way: max(d, key=d.get).
The question is: when two or more keys share the max value, how can you set a preferred key?
I found a way to do this by prepending a number to the key.
Is there a better way?
In [56]: d = {'1a' : 5, '2b' : 1, '3c' : 5 }
In [57]: max(d, key=d.get)
Out[57]: '1a'
In [58]: d = {'4a' : 5, '2b' : 1, '3c' : 5 }
In [59]: max(d, key=d.get)
Out[59]: '3c'
The function given in the key argument can return a tuple. The second element of the tuple is used to break ties when several keys share the maximum first element. With that, you can use whatever preference rule you want, for example with two dictionaries:
d = {'a' : 5, 'b' : 1, 'c' : 5 }
d_preference = {'a': 1, 'b': 2, 'c': 3}
max(d, key=lambda key: (d[key], d_preference[key]))
# >> 'c'
d_preference = {'a': 3, 'b': 2, 'c': 1}
max(d, key=lambda key: (d[key], d_preference[key]))
# >> 'a'
This is a similar idea to @AxelPuig's solution. But, instead of relying on an auxiliary dictionary each time you wish to retrieve an item with max or min value, you can perform a single sort and utilise collections.OrderedDict:
from collections import OrderedDict
d = {'a' : 5, 'b' : 1, 'c' : 5 }
d_preference1 = {'a': 1, 'b': 2, 'c': 3}
d_preference2 = {'a': 3, 'b': 2, 'c': 1}
d1 = OrderedDict(sorted(d.items(), key=lambda x: -d_preference1[x[0]]))
d2 = OrderedDict(sorted(d.items(), key=lambda x: -d_preference2[x[0]]))
max(d1, key=d.get) # c
max(d2, key=d.get) # a
Since OrderedDict is a subclass of dict, there's generally no need to convert to a regular dict. If you are using Python 3.7+, you can use the regular dict constructor, since dictionaries are insertion ordered.
As noted in the docs for max:
"If multiple items are maximal, the function returns the first one encountered."
A slight variation on @AxelPuig's answer. You fix an order of keys in a priorities list and take the max with key=d.get.
d = {"1a": 5, "2b": 1, "3c": 5}
priorities = list(d.keys())
print(max(priorities, key=d.get))
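To make the tie-breaking visible, here is a small sketch with two explicit priority lists. The data and the list names are made up for illustration. Because max() returns the first maximal item it encounters, whichever tied key comes first in the list wins.

```python
d = {"a": 5, "b": 1, "c": 5}

# Earlier entries win ties, because max() returns the first maximal item.
prefer_c = ["c", "a", "b"]
prefer_a = ["a", "c", "b"]

print(max(prefer_c, key=d.get))  # c
print(max(prefer_a, key=d.get))  # a
```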

Python 3.6: create new dict using values from another as indices

In Python 3.6.3, I have the following dict D1:
D1 = {0: array([1, 2, 3], dtype=int64), 1: array([0,4], dtype=int64)}
Each value inside the array is the index of the key of another dict D2:
D2 = {'Jack': 1, 'Mike': 2, 'Tim': 3, 'Paul': 4, 'Tommy': 5}
I am trying to create a third dict, D3, with the same keys as D1, and as values the keys of D2 corresponding to the indices of D1.values().
The result I am aiming for is:
D3 = {0: ['Mike','Tim','Paul'], 1: ['Jack','Tommy']}
My approach is partial in that I struggle to figure out how to tell D3 to get the keys from D1 and the values from D2. I am not too sure about the "and" in the comprehension below. Any ideas?
D3 = {key:list(D1.values())[v] for key in D1.keys() and v in D2[v]}
You could use a dict-comprehension like so:
from numpy import array
D1 = {0: array([1, 2, 3]), 1: array([0,4])}
D2 = {'Jack': 1, 'Mike': 2, 'Tim': 3, 'Paul': 4, 'Tommy': 5}
temp = dict(zip(D2.values(), D2.keys())) # inverting key-value pairs
D3 = {k: [temp.get(i+1, 'N/A') for i in v] for k, v in D1.items()}
which results in:
{0: ['Mike', 'Tim', 'Paul'], 1: ['Jack', 'Tommy']}
If you're using Python 3.6+ (dict insertion order is a CPython implementation detail in 3.6 and guaranteed from 3.7), you can use enumerate to create a dict that looks up the names in D2 by position, and then map the indices in D1 to it:
r = dict(enumerate(D2))
D3 = {k: list(map(r.get, v)) for k, v in D1.items()}
D3 would become:
{0: ['Mike', 'Tim', 'Paul'], 1: ['Jack', 'Tommy']}
This is untested, but I believe this should get you headed in the right direction. I find it helpful sometimes to break out a complicated one-liner into multiple lines
names = list(D2)  # D2's keys in insertion order, so names[i] is the i-th name
D3 = {}
for d1k, d1v in D1.items():
    D3[d1k] = []
    for idx in d1v:
        D3[d1k].append(names[idx])
Might not be the best solution, but it works:
D3 = {}
for key in D1.keys():
    value_list = D1.get(key)
    value_list = [x + 1 for x in value_list]  # shift 0-based indices to D2's 1-based values
    temp = []
    for d2_key, value in D2.items():
        if value in value_list:
            temp.append(d2_key)
    D3[key] = temp
Output:
{0: ['Mike', 'Tim', 'Paul'], 1: ['Jack', 'Tommy']}
Here you go!
D1 = {0: [1, 2, 3], 1: [0, 4]}
D2 = {'Jack': 1, 'Mike': 2, 'Tim': 3, 'Paul': 4, 'Tommy': 5}
D2_inverted = {v: k for k, v in D2.items()}  # map each value back to its name
D3 = {}
for key in D1:
    temp = []
    for value in D1[key]:
        temp.append(D2_inverted[value + 1])
    D3[key] = temp
print(D3)
Iterate the keys from D1;
Create a temporary list to store the values you wish to assign to the new dict, and fill it with the desired values from D2. (inverted its keys and values for simplicity);
Assign to D3.

Sum values of similar keys inside two nested dictionary in python

I have nested dictionary like this:
data = {
    "2010": {'A': 2, 'B': 3, 'C': 5, 'D': -18},
    "2011": {'A': 1, 'B': 2, 'C': 3, 'D': 1},
    "2012": {'A': 1, 'B': 2, 'C': 4, 'D': 2},
}
In my case, I need to sum the values that share the same key across every year, from 2010 to 2012.
So the result I expect should look like this:
data = {'A':4,'B':7, 'C':12, 'D':-15}
You can use collections.Counter(), but adding Counters works only for positive values (note the total for 'D' below, which is 3 instead of the expected -15):
In [17]: from collections import Counter
In [18]: sum((Counter(d) for d in data.values()), Counter())
Out[18]: Counter({'C': 12, 'B': 7, 'A': 4, 'D': 3})
Note that, according to the Python documentation, Counter's multiset methods are designed only for use cases with positive values:
The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.
The elements() method requires integer counts. It ignores zero and negative counts.
So if you want a complete result, you can do the summation manually. collections.defaultdict is a good way to get around this problem:
In [28]: from collections import defaultdict
In [29]: d = defaultdict(int)
In [30]: for sub in data.values():
   ....:     for i, j in sub.items():
   ....:         d[i] += j
   ....:
In [31]: d
Out[31]: defaultdict(<class 'int'>, {'D': -15, 'A': 4, 'C': 12, 'B': 7})
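Try this (in Python 3, reduce lives in functools; this assumes every inner dict has the same keys):
from functools import reduce
reduce(lambda x, y: dict((k, v + y[k]) for k, v in x.items()), data.values())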
Result
{'A': 4, 'B': 7, 'C': 12, 'D': -15}
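As a further option, Counter can still be used if the totals are accumulated with its update() method rather than with +. Unlike the multiset operators, update() adds counts in place and keeps zero and negative totals. This is a sketch using the data from the question; the names totals and result are chosen here for illustration.

```python
from collections import Counter

data = {
    "2010": {'A': 2, 'B': 3, 'C': 5, 'D': -18},
    "2011": {'A': 1, 'B': 2, 'C': 3, 'D': 1},
    "2012": {'A': 1, 'B': 2, 'C': 4, 'D': 2},
}

totals = Counter()
for yearly in data.values():
    # update() adds the counts to the running totals; negatives are kept.
    totals.update(yearly)

result = dict(totals)
print(result)  # {'A': 4, 'B': 7, 'C': 12, 'D': -15}
```

This also works when some years are missing a key, since absent keys simply contribute nothing.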

Dictionary Containing list data, filter based on value in list

I have test data which is gathered based on multiple inputs, and results in a single output. I'm currently storing this data in a dictionary whose keys are my parameter/ results labels, and whose values are the test conditions and results. I would like to be able to filter the data so I can generate plots based on isolated conditions.
In my example below, my test conditions would be 'a' and 'b', and the result of the experiment would be 'c'. I want to filter my data so I get a dictionary with the same key, value structure and only my filtered results. However my current dictionary comprehension returns an empty dictionary. Any advice to get the desired result?
Current Code:
data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filtered_data = {k:v for k,v in data.iteritems() if v in data['b'] >= 20}
Desired Result:
{'a': [0, 1, 2], 'b': [20, 20, 20], 'c': [2.3, 2.9, 3.4]}
Current Result:
{}
Also, is this dictionary of lists a good schema to store data of this type, given that I'm going to want to filter the results, or is there a better way to accomplish this?
Use this:
filtered_data = {k: [v[i] for i, x in enumerate(v) if data['b'][i] >= 20] for k, v in data.items()}
Result:
{'a': [0, 1, 2], 'c': [2.3, 2.9, 3.4], 'b': [20, 20, 20]}
Consider using the pandas module for this type of work.
import pandas as pd
df = pd.DataFrame(data)
df = df[df["b"] >= 20]
print(df)
It appears like this will give you what you want. You are using the dictionary key to represent the column name and the values are just rows in a given column, so it is amenable to using a dataframe.
Result:
   a   b    c
3  0  20  2.3
4  1  20  2.9
5  2  20  3.4
Are all of the dictionary value lists in matching orders? If so, you could just look at whichever list you want to filter by, say 'b' in this case, find the values you want, and then either use those indices or the same slice on the other values in the dictionary.
For example:
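matching_indices = []
for i, b_value in enumerate(data['b']):  # i is the position, b_value the entry
    if b_value >= 20:
        matching_indices.append(i)
new_dict = {}
for key in data:
    # keep only the entries at the matching positions, for every column
    new_dict[key] = [data[key][i] for i in matching_indices]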
You could probably figure a dictionary comprehension for it if you wanted. Hopefully this is clear.
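The index-matching idea can also be expressed as a boolean mask applied to every list with itertools.compress. This is a sketch that assumes all lists have the same length and order; the names mask and filtered are chosen here for illustration.

```python
from itertools import compress

data = {
    'a': [0, 1, 2, 0, 1, 2],
    'b': [10, 10, 10, 20, 20, 20],
    'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4],
}

# One True/False flag per row, computed from column 'b'.
mask = [b >= 20 for b in data['b']]

# compress() keeps the items whose flag is True.
filtered = {key: list(compress(values, mask)) for key, values in data.items()}
print(filtered)
# {'a': [0, 1, 2], 'b': [20, 20, 20], 'c': [2.3, 2.9, 3.4]}
```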
You can change this into a method, which would give it more flexibility. Your current logic means that datasets a and c are neglected because they contain no values greater than or equal to 20:
data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
filter_vals = ['a', 'b']
new_d = {}
for k, v in data.items():
    if k in filter_vals:
        new_d[k] = [i for i in v if i >= 20]
print(new_d)
Now I'm not a big fan of many if statements, but something like this is straightforward and can be called many times:
def my_filter(operator, condition, filter_vals, my_dict):
    new_d = {}
    for k, v in my_dict.items():
        if k in filter_vals:
            if operator == '>':
                new_d[k] = [i for i in v if i > condition]
            elif operator == '<':
                new_d[k] = [i for i in v if i < condition]
            elif operator == '<=':
                new_d[k] = [i for i in v if i <= condition]
            elif operator == '>=':
                new_d[k] = [i for i in v if i >= condition]
    return new_d
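If the chain of if statements is a concern, one possible alternative is to look the comparison up in a dict of functions from the standard operator module. This is only a sketch: the COMPARISONS table is a name introduced here, and unlike the version above it raises KeyError for an unsupported symbol instead of silently skipping the key.

```python
import operator

# Map each supported comparison symbol to its function in the operator module.
COMPARISONS = {
    '>': operator.gt,
    '<': operator.lt,
    '>=': operator.ge,
    '<=': operator.le,
}


def my_filter(op, condition, filter_vals, my_dict):
    compare = COMPARISONS[op]  # raises KeyError for an unsupported symbol
    return {
        k: [i for i in v if compare(i, condition)]
        for k, v in my_dict.items()
        if k in filter_vals
    }


data = {'a': [0, 1, 2, 0, 1, 2], 'b': [10, 10, 10, 20, 20, 20], 'c': [1.3, 1.9, 2.3, 2.3, 2.9, 3.4]}
result = my_filter('>=', 20, ['a', 'b'], data)
print(result)  # {'a': [], 'b': [20, 20, 20]}
```

Like the function above, this filters each selected list by its own values rather than by the rows where 'b' matches.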
I agree with the pandas approach above.
If for some reason you hate pandas or are an old school computer scientist, tuples are a good way to store relational data. In your example, the a, b, and c lists are columns rather than rows. For tuples, you would want to store the rows as:
data = {'a':(0,10,1.3),'b':(1,10,1.9),'c':(2,10,2.3),'d':(0,20,2.3),'e':(1,20,2.9),'f':(2,20,3.4)}
where the tuples are stored in the (condition1, condition2, outcome) format you described and you can call a single test or filter a set as you describe. From there you can get a filtered set of results as follows:
filtered_data = {k: v for k, v in data.items() if v[1] >= 20}
which returns:
{'d': (0, 20, 2.3), 'e': (1, 20, 2.9), 'f': (2, 20, 3.4)}
