Union of dict objects in Python [duplicate] - python

This question already has answers here:
How do I merge two dictionaries in a single expression in Python?
(43 answers)
Closed 10 years ago.
How do you calculate the union of two dict objects in Python, where a (key, value) pair is present in the result iff key is in either dict (unless there are duplicates)?
For example, the union of {'a' : 0, 'b' : 1} and {'c' : 2} is {'a' : 0, 'b' : 1, 'c' : 2}.
Preferably you can do this without modifying either input dict. Example of where this is useful: Get a dict of all variables currently in scope and their values

This question provides an idiom. You use one of the dicts as keyword arguments to the dict() constructor:
dict(y, **x)
Duplicates are resolved in favor of the value in x; for example
dict({'a' : 'y[a]'}, **{'a' : 'x[a]'}) == {'a' : 'x[a]'}
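As a quick check of the idiom, here is a minimal sketch (the contents of x and y are made up for illustration). It shows that the value from x wins on a duplicate key and that neither input dict is modified:

```python
x = {"a": "x[a]", "c": 2}
y = {"a": "y[a]", "b": 1}

# The positional mapping is copied first, then the keyword arguments are
# applied on top, so keys from x override keys from y.
merged = dict(y, **x)

print(merged)  # {'a': 'x[a]', 'b': 1, 'c': 2}
print(x)       # {'a': 'x[a]', 'c': 2}   (unchanged)
print(y)       # {'a': 'y[a]', 'b': 1}   (unchanged)
```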

You can also use the update method of dict. Note that this modifies a in place:
a = {'a' : 0, 'b' : 1}
b = {'c' : 2}
a.update(b)
print(a)
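Since update changes a itself, one way to leave both inputs untouched is to copy first and update the copy. This is only a sketch; union_copy is a hypothetical helper name, not something from the standard library:

```python
def union_copy(first, second):
    """Return a new dict; values from `second` win on duplicate keys."""
    result = dict(first)    # shallow copy, so `first` is left untouched
    result.update(second)
    return result

a = {"a": 0, "b": 1}
b = {"c": 2}

print(union_copy(a, b))  # {'a': 0, 'b': 1, 'c': 2}
print(a)                 # {'a': 0, 'b': 1}
```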

For a static dictionary, combining snapshots of other dicts:
As of Python 3.9, the binary "or" operator | has been defined to merge dictionaries. (A new, concrete dictionary is eagerly created):
>>> a = {"a":1}
>>> b = {"b":2}
>>> a|b
{'a': 1, 'b': 2}
Conversely, the |= augmented assignment has been implemented to mean the same as calling the update method:
>>> a = {"a":1}
>>> a |= {"b": 2}
>>> a
{'a': 1, 'b': 2}
For details, check PEP-584
Prior to Python 3.9, the simplest way to create a new dictionary is to use "star expansion" (available since Python 3.5) to add the contents of each sub-dictionary in place:
c = {**a, **b}
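To illustrate what "static" means here, a small sketch with made-up contents: the later unpacking wins on duplicate keys, and the result is a snapshot that does not follow later changes to the inputs.

```python
a = {"a": 1, "shared": "from a"}
b = {"b": 2, "shared": "from b"}

c = {**a, **b}   # later unpacking overrides earlier keys
print(c)         # {'a': 1, 'shared': 'from b', 'b': 2}

a["a"] = 100     # c was built eagerly, so this change is not reflected
print(c["a"])    # 1
```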
For dynamic dictionary combination, working as "view" to combined, live dicts:
If you need both dicts to remain independent and updatable, you can create a single object that queries both dictionaries in its __getitem__ method (and implement get, __contains__ and other mapping methods as you need them).
A minimalist example could be like this:
class UDict(object):
    def __init__(self, d1, d2):
        self.d1, self.d2 = d1, d2
    def __getitem__(self, item):
        if item in self.d1:
            return self.d1[item]
        return self.d2[item]
And it works:
>>> a = UDict({1:1}, {2:2})
>>> a[2]
2
>>> a[1]
1
>>> a[3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 7, in __getitem__
KeyError: 3
>>>
NB: If you want to lazily maintain a union "view" of two or more dictionaries, check collections.ChainMap in the standard library, as it has all the dictionary methods and covers corner cases not contemplated in the example above.
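For comparison, a short sketch of the same lookups with collections.ChainMap (the dict contents are illustrative). It shows that the view stays live when one of the underlying dicts changes:

```python
from collections import ChainMap

d1 = {1: 1}
d2 = {2: 2}
view = ChainMap(d1, d2)   # lookups try d1 first, then d2

print(view[2])            # 2

d2[3] = 3                 # the underlying dicts remain independent and live
print(view[3])            # 3
print(sorted(view))       # [1, 2, 3]

snapshot = dict(view)     # materialize a plain dict when a copy is needed
```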

Two dictionaries
def union2(dict1, dict2):
    return dict(list(dict1.items()) + list(dict2.items()))
n dictionaries
import itertools
def union(*dicts):
    return dict(itertools.chain.from_iterable(dct.items() for dct in dicts))

Related

Is it possible for a key to have multiple names in a dictionary?

I'm not sure if this is even possible but it's worth a shot asking.
I want to be able to access the value from indexing one of the values.
The first thing that came to mind was this but of course, it didn't work.
dict = {['name1', 'name2'] : 'value1'}
print(dict.get('name1'))
You can use a tuple (as it's immutable) as a dict key if you need to access it by a pair (or more) of strings (or other immutable values):
>>> d = {}
>>> d[("foo", "bar")] = 6
>>> d[("foo", "baz")] = 8
>>> d
{('foo', 'bar'): 6, ('foo', 'baz'): 8}
>>> d[("foo", "baz")]
8
>>>
This isn't "a key having multiple names", though, it's just a key that happens to be built of multiple strings.
Edit
As discussed in the comments, the end goal is to have multiple keys for each (static) value. That can be succinctly accomplished with an inverted dict first, which is then "flipped" using dict.fromkeys():
def foobar():
    pass
def spameggs():
    pass
func_to_names = {
    foobar: ("foo", "bar", "fb", "foobar"),
    spameggs: ("spam", "eggs", "se", "breakfast"),
}
name_to_func = {}
for func, names in func_to_names.items():
    name_to_func.update(dict.fromkeys(names, func))
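The same flip can also be written as a dict comprehension. The sketch below is an alternative formulation, not the original answer's code, and the return values of the two functions are placeholders added so the lookups can be checked:

```python
def foobar():
    return "foobar called"      # placeholder body for illustration

def spameggs():
    return "spameggs called"    # placeholder body for illustration

func_to_names = {
    foobar: ("foo", "bar", "fb", "foobar"),
    spameggs: ("spam", "eggs", "se", "breakfast"),
}

# One entry per alias, each pointing at the same function object.
name_to_func = {
    name: func
    for func, names in func_to_names.items()
    for name in names
}

print(name_to_func["fb"]())                                # foobar called
print(name_to_func["eggs"] is name_to_func["breakfast"])   # True
print(len(name_to_func))                                   # 8
```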
If we tried it your way using:
# Creating a dictionary
myDict = {[1, 2]: 'Names'}
print(myDict)
We get an output of:
TypeError: unhashable type: 'list'
To get around this, we can use this method:
# Creating an empty dictionary
myDict = {}
# Adding list as value
myDict["key1"] = [1, 2]
myDict["key2"] = ["Jim", "Jeff", "Jack"]
print(myDict)

Is it possible to "unpack" a dict in one call?

I was looking for a way to "unpack" a dictionary in a generic way and found a relevant question (and answers) which explained various techniques (TL;DR: it is not too elegant).
That question, however, addresses the case where the keys of the dict are not known; the OP wanted to have them added to the local namespace automatically.
My problem is possibly simpler: I get a dict from a function and would like to dissect it on the fly, knowing the keys I will need (I may not need all of them every time). Right now I can only do
def myfunc():
    return {'a': 1, 'b': 2, 'c': 3}
x = myfunc()
a = x['a']
my_b_so_that_the_name_differs_from_the_key = x['b']
# I do not need c this time
while I was looking for the equivalent of
def myotherfunc():
    return 1, 2
a, b = myotherfunc()
but for a dict (which is what is returned by my function). I do not want to use the latter solution for several reasons, one of them being that it is not obvious which variable corresponds to which returned element (the first solution has at least the merit of being readable).
Is such an operation available?
If you really must, you can use an operator.itemgetter() object to extract values for multiple keys as a tuple:
from operator import itemgetter
a, b = itemgetter('a', 'b')(myfunc())
This is still not pretty; I'd prefer the explicit and readable separate lines where you first assign the return value, then extract those values.
Demo:
>>> from operator import itemgetter
>>> def myfunc():
...     return {'a': 1, 'b': 2, 'c': 3}
...
>>> itemgetter('a', 'b')(myfunc())
(1, 2)
>>> a, b = itemgetter('a', 'b')(myfunc())
>>> a
1
>>> b
2
You could also use map:
def myfunc():
    return {'a': 1, 'b': 2, 'c': 3}
a, b = map(myfunc().get, ["a", "b"])
print(a, b)
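One practical difference between the itemgetter and map approaches is how they treat a key that is not in the dict. The sketch below uses a made-up missing key 'z' to show it: itemgetter raises KeyError, while dict.get quietly returns None.

```python
from operator import itemgetter

def myfunc():
    return {'a': 1, 'b': 2, 'c': 3}

missing = None
try:
    itemgetter('a', 'z')(myfunc())
except KeyError as exc:
    missing = exc.args[0]
    print("itemgetter raised KeyError for", repr(missing))

# dict.get falls back to None instead of raising.
a, z = map(myfunc().get, ['a', 'z'])
print(a, z)  # 1 None
```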
In addition to the operator.itemgetter() method, you can also write your own myotherfunc(). It takes a list of the required keys as an argument and returns a tuple of their corresponding values.
def myotherfunc(keys_list):
    reference_dict = myfunc()
    return tuple(reference_dict[key] for key in keys_list)
>>> a,b = myotherfunc(['a','b'])
>>> a
1
>>> b
2
>>> a,c = myotherfunc(['a','c'])
>>> a
1
>>> c
3

python SyntaxError with dict(1=...), but {1:...} works

Python seems to have an inconsistency in what kind of keys it will accept for dicts. Or, put another way, it allows certain kinds of keys in one way of defining dicts, but not in others:
>>> d = {1:"one",2:2}
>>> d[1]
'one'
>>> e = dict(1="one",2=2)
File "<stdin>", line 1
SyntaxError: keyword can't be an expression
Is the {...} notation more fundamental, and dict(...) just syntactic sugar? Is it because there is simply no way for Python to parse dict(1="one")?
I'm curious...
This is not a dict issue, but an artifact of Python syntax: keyword arguments must be valid identifiers, and 1 and 2 are not.
When you want to use anything that is not a string following Python identifier rules as a key, use the {} syntax. The constructor keyword argument syntax is just there for convenience in some special cases.
dict is a function call, and function keywords must be identifiers.
As other answers have stated, dict is a function call. It has three syntactic forms.
The form:
dict(**kwargs) -> new dictionary initialized with the name=value pairs
in the keyword argument list. For example: dict(one=1, two=2)
The keys (or name as used in this case) must be valid Python identifiers, and ints are not valid.
The limitation is not specific to the dict function. You can demonstrate it like so:
>>> def f(**kw): pass
...
>>> f(one=1) # this is OK
>>> f(1=one) # this is not
File "<stdin>", line 1
SyntaxError: keyword can't be an expression
However, there are two other syntactic forms you can use.
There is:
dict(iterable) -> new dictionary initialized as if via:
    d = {}
    for k, v in iterable:
        d[k] = v
Example:
>>> dict([(1,'one'),(2,2)])
{1: 'one', 2: 2}
And from a mapping:
dict(mapping) -> new dictionary initialized from a mapping object's
(key, value) pairs
Example:
>>> dict({1:'one',2:2})
{1: 'one', 2: 2}
While that may not seem like much (a dict from a dict literal), keep in mind that Counter and defaultdict are mappings, and this is how you would convert one of those to a dict:
>>> from collections import Counter
>>> Counter('aaaaabbbcdeffff')
Counter({'a': 5, 'f': 4, 'b': 3, 'c': 1, 'e': 1, 'd': 1})
>>> dict(Counter('aaaaabbbcdeffff'))
{'a': 5, 'c': 1, 'b': 3, 'e': 1, 'd': 1, 'f': 4}
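Putting the three forms side by side, here is a small sketch with illustrative keys. The iterable and mapping forms accept any hashable key, while the keyword form is limited to identifier-like strings:

```python
keys = [1, 2.5, ("x", "y"), "not an identifier"]
values = ["one", "two and a half", "tuple key", "string with spaces"]

from_pairs = dict(zip(keys, values))   # dict(iterable): any hashable key works
from_mapping = dict(from_pairs)        # dict(mapping): a shallow copy
from_keywords = dict(one=1, two=2)     # dict(**kwargs): identifier keys only

print(from_pairs[1])                   # one
print(from_pairs[("x", "y")])          # tuple key
print(from_mapping == from_pairs)      # True
print(from_mapping is from_pairs)      # False
print(from_keywords)                   # {'one': 1, 'two': 2}
```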
If you read the documentation, you will learn that the dict(stringA=1, stringB=2) notation is valid when the keys are simple strings:
When the keys are simple strings, it is sometimes easier to specify
pairs using keyword arguments:
>>>
>>> dict(sape=4139, guido=4127, jack=4098)
{'sape': 4139, 'jack': 4098, 'guido': 4127}
Since integers (or other numbers) are not valid keyword argument names, dict(1=2, 3=4) will fail, as would any call to a function if you passed it an argument named with a number:
>>> def test(**kwargs):
...     for arg in kwargs:
...         print(arg, kwargs[arg])
...
>>> test(a=2,b=3)
a 2
b 3
>>> test(1=2, 3=4)
File "<stdin>", line 1
SyntaxError: keyword can't be an expression

Destructuring-bind dictionary contents

I am trying to 'destructure' a dictionary and associate its values with variable names matching its keys. Something like
params = {'a':1,'b':2}
a,b = params.values()
But since dictionaries are not ordered, there is no guarantee that params.values() will return values in the order of (a, b). Is there a nice way to do this?
from operator import itemgetter
params = {'a': 1, 'b': 2}
a, b = itemgetter('a', 'b')(params)
Instead of elaborate lambda functions or dictionary comprehension, may as well use a built in library.
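One detail worth knowing about itemgetter: with two or more keys it returns a tuple, but with a single key it returns the bare value, so tuple unpacking fails in that case. A minimal sketch of both situations:

```python
from operator import itemgetter

params = {'a': 1, 'b': 2}

a, b = itemgetter('a', 'b')(params)   # two or more keys: a tuple comes back
print(a, b)                           # 1 2

only_a = itemgetter('a')(params)      # one key: the value itself, not a 1-tuple
print(only_a)                         # 1

unpack_failed = False
try:
    (a_again,) = itemgetter('a')(params)   # an int cannot be unpacked
except TypeError:
    unpack_failed = True
print(unpack_failed)                  # True
```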
One way to do this with less repetition than Jochen's suggestion is with a helper function. This gives the flexibility to list your variable names in any order and only destructure a subset of what is in the dict:
pluck = lambda dict, *args: (dict[arg] for arg in args)
things = {'blah': 'bleh', 'foo': 'bar'}
foo, blah = pluck(things, 'foo', 'blah')
Also, instead of joaquin's OrderedDict you could sort the keys and get the values. The only catches are you need to specify your variable names in alphabetical order and destructure everything in the dict:
sorted_vals = lambda dict: (t[1] for t in sorted(dict.items()))
things = {'foo': 'bar', 'blah': 'bleh'}
blah, foo = sorted_vals(things)
How come nobody posted the simplest approach?
params = {'a':1,'b':2}
a, b = params['a'], params['b']
Python is only able to "destructure" sequences, not dictionaries. So, to write what you want, you will have to map the needed entries to a proper sequence. As for me, the closest match I could find is the (not very sexy):
a,b = [d[k] for k in ('a','b')]
This works with generators too:
a,b = (d[k] for k in ('a','b'))
Here is a full example:
>>> d = dict(a=1,b=2,c=3)
>>> d
{'a': 1, 'c': 3, 'b': 2}
>>> a, b = [d[k] for k in ('a','b')]
>>> a
1
>>> b
2
>>> a, b = (d[k] for k in ('a','b'))
>>> a
1
>>> b
2
Here's another way to do it similarly to how a destructuring assignment works in JS:
params = {'b': 2, 'a': 1}
a, b, rest = (lambda a, b, **rest: (a, b, rest))(**params)
What we did was unpack the params dictionary into keyword arguments (using **, as in Jochen's answer), then take those values in the lambda signature and assign them according to the key name. As a bonus, we also get a dictionary of whatever is not in the lambda's signature, so if you had:
params = {'b': 2, 'a': 1, 'c': 3}
a, b, rest = (lambda a, b, **rest: (a, b, rest))(**params)
After the lambda has been applied, the rest variable will now contain:
{'c': 3}
Useful for omitting unneeded keys from a dictionary.
Hope this helps.
Maybe you really want to do something like this?
def some_func(a, b):
    print(a, b)
params = {'a':1,'b':2}
some_func(**params) # equiv to some_func(a=1, b=2)
If you are afraid of the issues involved in the use of the locals dictionary and you prefer to follow your original strategy, collections.OrderedDict (available since Python 2.7 and 3.1) lets you recover your dictionary items in the order in which they were first inserted.
(Ab)using the import system
The from ... import statement lets us destructure and bind attribute names of an object. Of course, it only works for objects in the sys.modules dictionary, so one could use a hack like this:
import sys, types
mydict = {'a':1,'b':2}
sys.modules["mydict"] = types.SimpleNamespace(**mydict)
from mydict import a, b
A somewhat more serious hack would be to write a context manager to load and unload the module:
with obj_as_module(mydict, "mydict_module"):
    from mydict_module import a, b
By pointing the __getattr__ method of the module directly to the __getitem__ method of the dict, the context manager can also avoid using SimpleNamespace(**mydict).
See this answer for an implementation and some extensions of the idea.
One can also temporarily replace the entire sys.modules dict with the dict of interest, and do import a, b without from.
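The linked implementation of obj_as_module is not reproduced here. As a rough sketch of what such a context manager could look like, assuming the dict keys are valid identifiers and using the SimpleNamespace variant rather than the __getattr__ variant:

```python
import sys
import types
from contextlib import contextmanager

@contextmanager
def obj_as_module(mapping, name):
    """Temporarily register `mapping` under `name` in sys.modules (sketch)."""
    previous = sys.modules.get(name)
    sys.modules[name] = types.SimpleNamespace(**mapping)
    try:
        yield
    finally:
        # Restore whatever was registered under that name before.
        if previous is None:
            del sys.modules[name]
        else:
            sys.modules[name] = previous

mydict = {'a': 1, 'b': 2}
with obj_as_module(mydict, "mydict_module"):
    from mydict_module import a, b

print(a, b)                            # 1 2
print("mydict_module" in sys.modules)  # False
```

The names stay bound after the with block ends, but the pseudo-module itself is removed again, so it does not leak into later imports.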
Warning 1: as stated in the docs, this is not guaranteed to work on all Python implementations:
CPython implementation detail: This function relies on Python stack frame support
in the interpreter, which isn’t guaranteed to exist in all implementations
of Python. If running in an implementation without Python stack frame support
this function returns None.
Warning 2: this function does make the code shorter, but it probably contradicts the Python philosophy of being as explicit as you can. Moreover, it doesn't address the issues pointed out by John Christopher Jones in the comments, although you could make a similar function that works with attributes instead of keys. This is just a demonstration that you can do that if you really want to!
import inspect
def destructure(dict_):
    if not isinstance(dict_, dict):
        raise TypeError(f"{dict_} is not a dict")
    # the parent frame will contain the information about
    # the current line
    parent_frame = inspect.currentframe().f_back
    # so we extract that line (by default the code context
    # only contains the current line)
    (line,) = inspect.getframeinfo(parent_frame).code_context
    # "hello, key = destructure(my_dict)"
    # -> ("hello, key ", "=", " destructure(my_dict)")
    lvalues, _equals, _rvalue = line.strip().partition("=")
    # -> ["hello", "key"]
    keys = [s.strip() for s in lvalues.split(",") if s.strip()]
    if missing := [key for key in keys if key not in dict_]:
        raise KeyError(*missing)
    for key in keys:
        yield dict_[key]
In [5]: my_dict = {"hello": "world", "123": "456", "key": "value"}
In [6]: hello, key = destructure(my_dict)
In [7]: hello
Out[7]: 'world'
In [8]: key
Out[8]: 'value'
This solution allows you to pick some of the keys, not all, like in JavaScript. It's also safe for user-provided dictionaries.
With Python 3.10, you can do:
d = {"a": 1, "b": 2}
match d:
    case {"a": a, "b": b}:
        print(f"A is {a} and b is {b}")
but it adds two extra levels of indentation, and you still have to repeat the key names.
Look for other answers, as this won't cater to an unexpected order in the dictionary. I will update this with a correct version sometime soon.
Try this:
data = {'a':'Apple', 'b':'Banana','c':'Carrot'}
keys = data.keys()
a,b,c = [data[k] for k in keys]
result:
a == 'Apple'
b == 'Banana'
c == 'Carrot'
Well, if you want these in a class you can always do this:
class AttributeDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttributeDict, self).__init__(*args, **kwargs)
        self.__dict__.update(self)
d = AttributeDict(a=1, b=2)
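A short usage sketch of this class, with one limitation that follows from how it is written: the attributes are copied once in __init__, so keys added afterwards are not mirrored as attributes.

```python
class AttributeDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Copy the items present at construction time into instance attributes.
        self.__dict__.update(self)

d = AttributeDict(a=1, b=2)
print(d.a, d.b)          # 1 2

d['c'] = 3               # added after construction
print(d['c'])            # 3
print(hasattr(d, 'c'))   # False
```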
Based on @ShawnFumo's answer I came up with this:
def destruct(dict): return (t[1] for t in sorted(dict.items()))
d = {'b': 'Banana', 'c': 'Carrot', 'a': 'Apple' }
a, b, c = destruct(d)
(Notice the order of items in dict)
An old topic, but I found this to be a useful method:
data = {'a':'Apple', 'b':'Banana','c':'Carrot'}
for key in data.keys():
    locals()[key] = data[key]
This method loops over every key in your dictionary and sets a variable to that name and then assigns the value from the associated key to this new variable.
Testing:
print(a)
print(b)
print(c)
Output
Apple
Banana
Carrot
An easy and simple way to destruct dict in python:
params = {"a": 1, "b": 2}
a, b = [params[key] for key in ("a", "b")]
print(a, b)
# Output:
# 1 2
I don't know whether it's good style, but
locals().update(params)
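will do the trick at module level, where locals() is the same dict as globals(). You then have a, b and whatever was in your params dict available as corresponding variables. Inside a function this does not work: changes made to the dict returned by locals() do not create or rebind local variables.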
Since dictionaries are guaranteed to keep their insertion order in Python >= 3.7, it's completely safe and idiomatic to just do this nowadays, provided the keys were inserted in the order you unpack them:
params = {'a': 1, 'b': 2}
a, b = params.values()
print(a)
print(b)
Output:
1
2
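Keep in mind that values() unpacks by position, not by name. A small sketch with the same content inserted in a different order shows the difference from a lookup by key (the variable names are chosen for illustration):

```python
params = {'b': 2, 'a': 1}        # same items, but 'b' was inserted first

a_pos, b_pos = params.values()   # positional: follows insertion order
print(a_pos, b_pos)              # 2 1

a_key, b_key = params['a'], params['b']   # by key: independent of order
print(a_key, b_key)              # 1 2
```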

Python "extend" for a dictionary

What is the best way to extend a dictionary with another one while avoiding the use of a for loop? For instance:
>>> a = { "a" : 1, "b" : 2 }
>>> b = { "c" : 3, "d" : 4 }
>>> a
{'a': 1, 'b': 2}
>>> b
{'c': 3, 'd': 4}
Result:
{ "a" : 1, "b" : 2, "c" : 3, "d" : 4 }
Something like:
a.extend(b) # This does not work
a.update(b)
Latest Python Standard Library Documentation
A beautiful gem in this closed question:
The "oneliner way", altering neither of the input dicts, is
basket = dict(basket_one, **basket_two)
Learn what **basket_two (the **) means here.
In case of conflict, the items from basket_two will override the ones from basket_one. As one-liners go, this is pretty readable and transparent, and I have no compunction against using it any time a dict that's a mix of two others comes in handy (any reader who has trouble understanding it will in fact be very well served by the way this prompts him or her towards learning about dict and the ** form;-). So, for example, uses like:
x = mungesomedict(dict(adict, **anotherdict))
are reasonably frequent occurrences in my code.
Originally submitted by Alex Martelli
Note: In Python 3, this will only work if every key in basket_two is a string.
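The string-key restriction can be seen in a short sketch (the basket contents are invented). When basket_two contains a non-string key, the keyword form raises TypeError on Python 3, whereas unpacking inside a dict literal accepts any hashable key:

```python
basket_one = {"apples": 3}
basket_two = {1: "one", "pears": 2}   # contains a non-string key

keyword_form_failed = False
try:
    basket = dict(basket_one, **basket_two)
except TypeError:
    keyword_form_failed = True
    # Unpacking in a dict display has no string-key restriction.
    basket = {**basket_one, **basket_two}

print(keyword_form_failed)  # True
print(basket)               # {'apples': 3, 1: 'one', 'pears': 2}
```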
Have you tried dictionary unpacking:
a = {'a': 1, 'b': 2}
b = {'c': 3, 'd': 4}
c = {**a, **b}
# c = {"a": 1, "b": 2, "c": 3, "d": 4}
Another way of doing it is by using dict(iterable, **kwarg):
c = dict(a, **b)
# c = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
In Python 3.9 and later you can merge two dicts using the union operator |:
# use the merging operator |
c = a | b
# c = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
a.update(b)
Will add keys and values from b to a, overwriting if there's already a value for a key.
As others have mentioned, a.update(b) for some dicts a and b will achieve the result you've asked for in your question. However, I want to point out that in many implementations of an extend method for mapping/set objects, the expectation is that in a.extend(b), a's values should NOT be overwritten by b's values. a.update(b) overwrites a's values, and so isn't a good choice for extend.
Note that some languages call this method defaults or inject, as it can be thought of as a way of injecting b's values (which might be a set of default values) in to a dictionary without overwriting values that might already exist.
Of course, you could simply note that a.extend(b) is nearly the same as b.update(a); a=b. To remove the assignment, you could do it thus:
def extend(a,b):
    """Create a new dictionary with a's properties extended by b,
    without overwriting.
    >>> extend({'a':1,'b':2},{'b':3,'c':4})
    {'a': 1, 'c': 4, 'b': 2}
    """
    return dict(b,**a)
Thanks to Tom Leys for that smart idea using a side-effect-less dict constructor for extend.
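If the non-overwriting behaviour is wanted in place rather than as a new dictionary, one possible sketch is to feed update only the pairs whose keys are not yet present. The name extend_in_place is hypothetical, and unlike dict(b, **a) this version has no string-key restriction:

```python
def extend_in_place(a, b):
    """Add b's items to a, keeping a's value when a key exists in both."""
    a.update((key, value) for key, value in b.items() if key not in a)
    return a

a = {'a': 1, 'b': 2}
b = {'b': 3, 'c': 4}

extend_in_place(a, b)
print(a)  # {'a': 1, 'b': 2, 'c': 4}
print(b)  # {'b': 3, 'c': 4}
```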
Note that Python 3.9 introduced a much simpler syntax (union operators):
d1 = {'a': 1}
d2 = {'b': 2}
extended_dict = d1 | d2
>> {'a':1, 'b': 2}
Pay attention: if the first dict shares keys with the second dict, position matters! The right-hand operand wins:
d1 = {'b': 1}
d2 = {'b': 2}
d1 | d2
>> {'b': 2}
Relevant PEP
You can also use python's collections.ChainMap which was introduced in python 3.3.
from collections import ChainMap
c = ChainMap(a, b)
c['a'] # returns 1
This has a few possible advantages, depending on your use-case. They are explained in more detail here, but I'll give a brief overview:
A chainmap only uses views of the dictionaries, so no data is actually copied. This results in faster chaining (but slower lookup)
No keys are actually overwritten so, if necessary, you know whether the data comes from a or b.
This mainly makes it useful for things like configuration dictionaries.
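A configuration-style sketch of those two points, with made-up setting names: the first mapping has priority on lookup, the maps attribute tells you which dict a value came from, and writes only ever touch the first mapping.

```python
from collections import ChainMap

defaults = {"color": "blue", "debug": False}
overrides = {"debug": True}
config = ChainMap(overrides, defaults)   # first mapping has priority

print(config["debug"])   # True
print(config["color"])   # blue

# Nothing was overwritten, so the origin of a key can still be found.
source = next(m for m in config.maps if "debug" in m)
print(source is overrides)   # True

config["color"] = "red"      # writes go to the first mapping only
print(overrides)             # {'debug': True, 'color': 'red'}
print(defaults["color"])     # blue
```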
In terms of efficiency, it seems faster to use the unpack operation, compared with the update method.
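The original measurement is not reproduced here. A rough way to repeat such a comparison yourself is sketched below with timeit; the dict sizes and repetition count are arbitrary choices, and the numbers will vary by machine and Python version, so no result is claimed.

```python
import timeit

a = {f"a{i}": i for i in range(100)}
b = {f"b{i}": i for i in range(100)}

def with_unpacking():
    return {**a, **b}

def with_update():
    merged = dict(a)     # copy first so a itself is not modified
    merged.update(b)
    return merged

unpack_seconds = timeit.timeit(with_unpacking, number=10_000)
update_seconds = timeit.timeit(with_update, number=10_000)
print(f"unpacking: {unpack_seconds:.4f}s  copy+update: {update_seconds:.4f}s")
```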
