Update dictionary items in list through comprehension - python

I have a dictionary that I want to use as a template for generating multiple dictionaries, each with some items updated. The resulting list should serve as a dataset for unit tests in pytest.
I am using the following construct in my code (checks are excluded):
from pprint import pprint
def _f(template, **kwargs):
    result = [template]
    for key, value in kwargs.items():
        # Copy every dict built so far once per candidate value, overriding `key`.
        result = [dict(template_item, **dict([(key, v)])) for v in value for template_item in result]
    return result
template = {'a': '', 'b': '', 'x': 'asdf'}
r = _f(template, a=[1, 2], b=[11, 22])
pprint(r)
[{'a': 1, 'b': 11, 'x': 'asdf'},
{'a': 2, 'b': 11, 'x': 'asdf'},
{'a': 1, 'b': 22, 'x': 'asdf'},
{'a': 2, 'b': 22, 'x': 'asdf'}]
I would like to ask whether this construct is good enough, or whether it can be written more efficiently.
Is this the correct way to prepare testing data?
EDIT:
In particular, I am unsure about
[dict(template_item,**dict([(key,v)])) for v in value for template_item in result]
and
dict(template_item,**dict([(key,v)]))
Before that, I considered dict.update(), but it is not suitable for a comprehension because it returns None rather than the dictionary.
Then I considered simpler syntax such as
d = {'aa': 11, 'bb': 22}
dict(d,x=33,y=44)
{'aa': 11, 'bb': 22, 'x': 33, 'y': 44}
but I was unable to pass the key through a variable that way. And creating a dict just to unpack it seems counterproductive to me.

In particular, I am unsure about...
Updating Python dicts inside comprehensions is a bit awkward, because dicts are mutable and dict.update() changes the dict in place instead of returning it. In "Why doesn't a python dict.update() return the object?" the best answer suggests your current solution. Personally I'd probably go with a regular for-loop here to keep the code legible.
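As a hedged aside to the dict(template_item, **dict([(key, v)])) construct: on Python 3.5 or newer, a dict display with ** unpacking accepts a key held in a variable, so no throwaway dict is needed. A minimal sketch, where override is a hypothetical helper name that is not part of the original code:

```python
def override(base, key, value):
    # Build a new dict: copy `base`, then set `key` (held in a variable) to `value`.
    # Assumes Python 3.5+ for ** unpacking inside a dict display.
    return {**base, key: value}

template = {'a': '', 'b': '', 'x': 'asdf'}
variants = [override(template, 'a', v) for v in [1, 2]]
print(variants)
print(template)  # the template itself is left untouched
```

Because a new dict is built each time, this can be used inside a comprehension without mutating the template.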
Is this the correct way to prepare testing data?
Usually in unit tests you will test both for edge cases and regular cases (you don't wanna repeat yourself, though). You usually want to split the tests, so that each has its own name explaining why it's there and possibly some other data that could help some outsider understand why it's important to make sure this scenario works correctly. Putting all scenarios in one list and then running the test for each one of them without giving the reader additional context (in form of at least a test case name) makes it harder for the reader to distinguish between the cases and judge whether they are all really needed.
Putting each of the scenarios in a separate test case may seem a bit tedious at times, but if any of the tests fails, you can immediately tell which part of the software is failing. If you feel like you write way too many unit tests, then perhaps some of them cover the same kinds of scenarios.
When dealing with unit tests, performance is rarely the top priority. What usually counts more is keeping the number of tests minimal, yet sufficient to ensure the software works correctly. The other priority is making the tests easy to understand. See below for another take on this (not necessarily more performant, yet hopefully more legible).
Alternative solution
You could use itertools.product in order to simplify your code.
The template parameter can be removed (since you can pass the template variable names and their possible values in **kwargs):
from pprint import pprint
import itertools
def _f(**kwargs):
    keys, values = zip(*kwargs.items())  # 1.
    subsets = list(itertools.product(*values))  # 2.
    return [
        {key: value for key, value in zip(keys, subset)} for subset in subsets
    ]  # 3.
r = _f(a=[1, 2], b=[11, 22], x=['asdf'])
pprint(r)
Now what's happening in each of these steps:
Step 1.
You split the keyword dict into keys and values. This matters because it fixes the order in which you iterate through these arguments every time. The keys and values look like this at this point:
keys = ('a', 'b', 'x')
values = ([1, 2], [11, 22], ['asdf'])
Step 2.
You compute the Cartesian product of the values, which means you get all the possible combinations of taking one value from each of the values lists. The result of this operation is as follows:
subsets = [(1, 11, 'asdf'), (1, 22, 'asdf'), (2, 11, 'asdf'), (2, 22, 'asdf')]
Step 3.
Now you need to map each of the keys to its corresponding value in each of the subsets, hence the list and dict comprehensions. The result contains exactly the same dictionaries you computed using your previous method (only the order of the list differs):
[{'a': 1, 'b': 11, 'x': 'asdf'},
{'a': 1, 'b': 22, 'x': 'asdf'},
{'a': 2, 'b': 11, 'x': 'asdf'},
{'a': 2, 'b': 22, 'x': 'asdf'}]
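If you would rather keep the template argument from the question, the two ideas can be combined. The following is only a sketch under that assumption; expand is a hypothetical name, and the {**a, **b} merge assumes Python 3.5+:

```python
import itertools

def expand(template, **kwargs):
    # Keys and value lists are read from the same dict, so their order matches.
    keys = list(kwargs)
    combos = itertools.product(*kwargs.values())
    # Each result starts as a copy of the template, then gets one combination applied.
    return [{**template, **dict(zip(keys, combo))} for combo in combos]

template = {'a': '', 'b': '', 'x': 'asdf'}
rows = expand(template, a=[1, 2], b=[11, 22])
for row in rows:
    print(row)
```

With no keyword arguments, itertools.product() yields a single empty tuple, so expand(template) returns a list holding one copy of the template.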

Related

Does collections.Counter keep data sorted?

I was reading about Counter in Python's collections module. It shows the following:
>>> from collections import Counter
>>> Counter({'z': 9,'a':4, 'c':2, 'b':8, 'y':2, 'v':2})
Counter({'z': 9, 'b': 8, 'a': 4, 'c': 2, 'y': 2, 'v': 2})
Somehow these printed values are printed in descending order (9 > 8 > 4 > 2). Why is it so? Does Counter store values sorted?
PS: Am on python 3.7.7
In terms of the data stored in a Counter object: The data is insertion-ordered as of Python 3.7, because Counter is a subclass of the built-in dict. Prior to Python 3.7, there was no guaranteed order of the data.
However, the behavior you are seeing is coming from Counter.__repr__. We can see from the source code that it will first try to display using the Counter.most_common method, which sorts by value in descending order. If that fails because the values are not sortable, it will fall back to the dict representation, which, again, is insertion-ordered.
The order depends on the Python version.
For Python < 3.7, there is no guaranteed order; since Python 3.7, the order is that of insertion.
Changed in version 3.7: As a dict subclass, Counter inherited the
capability to remember insertion order. Math operations on Counter
objects also preserve order. Results are ordered according to when an
element is first encountered in the left operand and then by the order
encountered in the right operand.
Example on python 3.8 (3.8.10 [GCC 9.4.0]):
from collections import Counter
Counter({'z': 9,'a':4, 'c':2, 'b':8, 'y':2, 'v':2})
Output:
Counter({'z': 9, 'a': 4, 'c': 2, 'b': 8, 'y': 2, 'v': 2})
How to check that Counter doesn't sort by count
Because Counter.__repr__ (which str() and print() also use) displays the items via most_common(), printing a Counter is not a reliable way to check the stored order.
Convert it to a dict; the dict representation is faithful to the stored order.
c = Counter({'z': 9,'a':4, 'c':2, 'b':8, 'y':2, 'v':2})
print(dict(c))
# {'z': 9, 'a': 4, 'c': 2, 'b': 8, 'y': 2, 'v': 2}
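The difference between the stored order and the displayed order can also be checked directly, without relying on any printed representation. A small sketch, assuming Python 3.7+ (where Counter keeps insertion order):

```python
from collections import Counter

c = Counter({'z': 9, 'a': 4, 'c': 2, 'b': 8, 'y': 2, 'v': 2})

# Stored order: the order in which keys were inserted.
stored = list(c.items())

# most_common() sorts by count, descending; ties keep their insertion order.
by_count = c.most_common()

print(stored)
print(by_count)
```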

Translate information of two dictionaries into a list of dictionaries

I believe this is not so easy to explain with words but with an example it should be pretty clear.
Suppose I have the dictionaries link_vars and y:
link_vars = {'v1':'AAA', 'v2':'BBB', 'v3':'CCC'}
y = {'v1':[1,2,3], 'v2':[4,5,6], 'v3':[7,8,9]}
And I want to build the list desired_output:
desired_output = [{'AAA':1 , 'BBB':4, 'CCC':7},
{'AAA':2 , 'BBB':5, 'CCC':8},
{'AAA':3 , 'BBB':6, 'CCC':9}]
So, basically, I want to 'translate' the keys in y according to the entries of the dictionary link_vars, and then split the lists in y into small dictionaries to build desired_output. The keys of y and link_vars will always be the same, and the length of each list in the values of y will be the same as well (i.e. there will not be a list with 4 elements and another one with 5).
I can't think of a smart way to do this. I hope there is an efficient one, as the output list (whose length equals the length of each list in the values of y) can be pretty large.
Here's a solution which links the two dictionaries:
from operator import itemgetter
keys = itemgetter(*y)(link_vars)
res = [dict(zip(keys, v)) for v in zip(*y.values())]
[{'AAA': 1, 'BBB': 4, 'CCC': 7},
{'AAA': 2, 'BBB': 5, 'CCC': 8},
{'AAA': 3, 'BBB': 6, 'CCC': 9}]
Explanation
First, define keys, which extracts values from link_vars in an order which is consistent with the keys from y.
Second, use a list comprehension and zip the pre-calculated keys with a transposed version of y.values(). We assume y.values() iterates consistently with y.keys(), which is true in Python 2.x and 3.x.
Ok fine then. A list comprehension should do in this case:
output = [dict(zip(link_vars.values(),i)) for i in zip(*y.values())]
print(output)
Returns:
[{'AAA': 1, 'BBB': 4, 'CCC': 7},
{'AAA': 2, 'BBB': 5, 'CCC': 8},
{'AAA': 3, 'BBB': 6, 'CCC': 9}]
Taking jpp's comment into consideration, a more appropriate approach may be to first make sure that we get the right values by merging the dicts together.
temp_d = {v:y.get(k) for k,v in link_vars.items()}
output = [dict(zip(temp_d.keys(),i)) for i in zip(*temp_d.values())]
Or use the pandas library. It might be overkill, but the syntax is easy to follow, as we only need to merge the dicts and let to_dict() handle the rest.
import pandas as pd
output = pd.DataFrame({v: y.get(k) for k, v in link_vars.items()}).to_dict('records')
Explanation
The key idea here is to zip together the values of y, which is done with zip(*y.values()). The list comprehension [i for i in zip(*y.values())] evaluates to [(1, 4, 7), (2, 5, 8), (3, 6, 9)] on Python 3.7+, where dicts keep insertion order (older versions may yield the columns in a different order). The remaining part is to zip each tuple with AAA, BBB, CCC.
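If you prefer not to depend on the two dictionaries iterating in the same order, one option is to walk the keys of link_vars explicitly and look up both the new name and the column by key. This is only a sketch of that idea, using the question's data:

```python
link_vars = {'v1': 'AAA', 'v2': 'BBB', 'v3': 'CCC'}
y = {'v1': [1, 2, 3], 'v2': [4, 5, 6], 'v3': [7, 8, 9]}

keys = list(link_vars)                 # one fixed key order for both lookups
names = [link_vars[k] for k in keys]   # translated key names
columns = [y[k] for k in keys]         # value lists, in the same key order

# zip(*columns) transposes the columns into rows.
output = [dict(zip(names, row)) for row in zip(*columns)]
print(output)
```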

Why is the Python dictionary key order like this?

I have a dictionary:
dict_a = dict(zip(('a','b','c','d','e'),(1,2,3,4,5)))
The output is:
dict_a = {'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4}
I want to know why it is not:
dict_a = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
I do know that dict_a is not a sorted object, but I still want to know why the key order is a, c, b, e, d and not some other order.
Thanks
Dictionaries are not just unsorted, they are unordered (at least before Python 3.7, where insertion order became guaranteed). At a deeper level, a dictionary is a hash table: keys map to slots that point to memory addresses.
Let's tackle this another way. In traditional languages you have arrays. Internally, arrays are contiguous memory, i.e. x[0] and x[1] are next to each other in memory. Dictionaries, meanwhile, are loose collections of pointers: y[a] and y[b] have no physical relationship, so they have no inherent order.
See longer discussions earlier:
Why is python ordering my dictionary like so?
Why is the order in dictionaries and sets arbitrary?
(And this should rather have been a comment, but I don't have the reputation to write one...)
As you said, dictionaries are not ordered objects, so no matter what order you add items in, they may come out jumbled. Dictionaries do not support positional indexing, so there is no reason for them to keep a particular order. I guess it saves memory not to track the position of each item.
In a way, dictionaries are indexed by key rather than by position as lists are. Each distinct key points to exactly one value, just as a list holds exactly one value at position 0.
More info at Python documentation
Because a regular dictionary does not record insertion order; it uses an arbitrary order. If you want an ordered dictionary, use OrderedDict, which does record insertion order. But consider your situation before using it, because it is slower than a regular dictionary when you insert and update items.
If you want to maintain the order of the items in your dictionary, you can use an OrderedDict:
from collections import OrderedDict
dict_a = OrderedDict()
dict_a['a'] = 1
dict_a['b'] = 2
dict_a['c'] = 3
dict_a['d'] = 4
dict_a['e'] = 5
Then the order of the dictionary will be maintained:
dict_a = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)])
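For readers on a newer interpreter, a short sketch of how the question's example behaves, assuming Python 3.7 or later, where a plain dict is guaranteed to keep insertion order. If you need the keys sorted rather than in insertion order, sort them explicitly:

```python
dict_a = dict(zip(('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, 5)))
print(dict_a)  # keys appear in the order they were inserted (Python 3.7+)

# Insertion order is not the same thing as sorted order.
shuffled = {'c': 3, 'a': 1, 'b': 2}
by_key = dict(sorted(shuffled.items()))
print(list(shuffled))  # insertion order
print(list(by_key))    # sorted by key
```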
At first I thought zip was causing it. However, it is not zip, since when I enter
>>zip(('a','b','c','d','e'),(1,2,3,4,5))
the output goes from left to right:
[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]
So I considered whether it has to do with how memory is laid out on our systems (I'm using Windows 10). I checked with the same input and got the same output:
>>dict_a = dict(zip(('a','b','c','d','e'),(1,2,3,4,5)))
{'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4}
Next, I checked the documentation for Python's built-in types. Their example,
>>c = dict(zip(['one', 'two', 'three'], [1, 2, 3]))
produces the output from right to left (my guess is that, after zipping the two lists, the dictionary is filled by popping values and calling update):
{'three': 3, 'two': 2, 'one': 1}
No matter how you feed in the pairs, the output is the same (as the OP states, dictionaries are like sets: order does not matter).
>>d = dict([('two', 2), ('one', 1), ('three', 3)])
{'three': 3, 'two': 2, 'one': 1}
The Python manual states, under built-in types:
If keyword arguments are given, the keyword arguments and their values are added to the dictionary created from the positional argument
Given that the key order doesn't change with or without zip, I tried one last thing: I switched the keys and the values.
>>d = dict([(1,'a'), (2,'b'),(3,'c'),(4,'d'),(5,'e')])
which output what we expected:
{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
Ordered by numbers. The likely explanation is that, in these older Python versions, a dict is laid out by the hash values of its keys rather than by the keys themselves. Small integers hash to themselves, so 1 to 5 come out in numeric order, while the hashes of the strings 'a', 'b', 'c', 'd', 'e' bear no relation to alphabetical order, which changes the order in which those keys are listed.
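The role of hashing can be inspected directly. A minimal sketch, assuming CPython; note that on Python 3 string hashes are randomized per process, so the second list changes from run to run:

```python
# Small integers hash to themselves in CPython.
int_hashes = [hash(n) for n in (1, 2, 3, 4, 5)]
print(int_hashes)

# String hashes have no relation to alphabetical order.
str_hashes = [hash(s) for s in 'abcde']
print(str_hashes)
print(str_hashes == sorted(str_hashes))  # usually False
```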

How to generate the keys of a dictionary using permutations

I need to create a dictionary. The values can be left blank or zero, but the keys must be all the possible combinations of the characters ABCD with length k. For example, for k = 8:
lex = defaultdict(int)
lex = {
'AAAAAAAA':0,
'AAAAAAAB':0,
'AAAAAABB':0,
...}
So far I have tried something like this. I know it's wrong, but I have no idea how to make it work. I'm new to Python, so please bear with me.
mydiction = {}
mylist = []
mylist = itertools.permutations('ACTG', 8)
for keys in mydiction:
    mydiction[keys] = mylist.next()
print(mydiction)
You can do it in one line, but what you are looking for is combinations_with_replacement
from itertools import combinations_with_replacement
mydict = {"".join(key):0 for key in combinations_with_replacement('ACTG', 8)}
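What you're describing isn't permutations. If the order of the letters within a key does not matter, you want combinations with replacement; if every distinct string counts, you want the Cartesian product (itertools.product with repeat=k). There are functions for both in the itertools module.
Note, however, that the full product has 4**8 = 65,536 strings for k = 8 (combinations with replacement yields only 165), and the count grows exponentially with k. Generating them all exhaustively may be more than you need.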
What's your use case? It's possible you just need to recognize combinations, rather than generating them all exhaustively. And each combination is intrinsically associated with a particular 16-bit integer index, so you could instead store and operate on that.
Although the combinations_with_replacement function works perfectly fine, you will be generating a huge list of strings with a relatively high collision rate (around 20%).
What you are looking to do can be done using base-4 integers. Not only are they faster to process and more memory efficient, but they also have zero collisions (each number is its own hash), meaning a guaranteed O(1) look-up time in the worst case.
from pprint import pprint
def num_to_hash(n, k, literals='ABCD'):
    # Read n as a k-digit base-4 number (2 bits per digit), most significant digit first.
    return ''.join(literals[(n >> (k - x) * 2) & 3] for x in range(1, k + 1))
k = 2
d = {num_to_hash(x, k, 'ACTG'): 0 for x in range(4**k)}  # all 4**k keys, including the last one
pprint(d)
output:
{'AA': 0,
 'AC': 0,
 'AG': 0,
 'AT': 0,
 'CA': 0,
 'CC': 0,
 'CG': 0,
 'CT': 0,
 'GA': 0,
 'GC': 0,
 'GG': 0,
 'GT': 0,
 'TA': 0,
 'TC': 0,
 'TG': 0,
 'TT': 0}
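The base-4 idea also works in the other direction: instead of storing string keys at all, a key can be turned into its integer index and used to address a flat list. A hedged sketch; kmer_to_index and the sample data are hypothetical, not part of the answer above:

```python
def kmer_to_index(kmer, literals='ACTG'):
    # Interpret the string as a base-4 number, most significant letter first.
    index = 0
    for letter in kmer:
        index = index * 4 + literals.index(letter)
    return index

k = 2
counts = [0] * (4 ** k)  # one slot per possible key of length k
for kmer in ['AA', 'GG', 'AC', 'GG']:
    counts[kmer_to_index(kmer)] += 1

print(kmer_to_index('AC'), kmer_to_index('GG'))
print(counts)
```

With the same literals string, this mapping is the inverse of the number-to-string conversion shown above.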

How to construct a dictionary which contains keys which are invalid keyword arguments, not using {}

If {} is a mere shortcut for dict() when creating a dict, how does one create a dict containing keys that are invalid keyword arguments, such as 1 or "1foo2bar3", without using curly braces or square brackets (see the code below), sticking to dict() only?
Code below is not an option:
a = dict()
a[1] = 1
You mean using the default dict constructor to create a dictionary?
>>> a = dict((x, x) for x in range(10))
>>> a
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
It takes a collection of tuples, and turns it into a dictionary, usually combined with generator expressions like above - but you could also give it a previously prepared collection of tuples.
>>> dict([(1, 1), ('a', 'a')])
{'a': 'a', 1: 1}
Using your examples:
>>> dict([(1, None), ("1foo", None), ("foo2bar3", None)])
{1: None, '1foo': None, 'foo2bar3': None}
{} is not a "mere" shortcut for dict(). It's a shortcut in terms of performance in addition to the length difference.
http://doughellmann.com/2012/11/the-performance-impact-of-using-dict-instead-of-in-cpython-2-7-2.html
You're better off using dictionary literals unless you need to support a really old version of Python that doesn't include them, or if you're doing some sort of list or other comprehension to build the dict with a version of Python < 2.7 (as versions of Python less than 2.7 don't support dict comprehensions [i.e. {x: y for x, y in some_iterable}], I believe).
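For completeness, a few ways to build such a dict with neither a literal nor item assignment. This is only a sketch; the variable names are illustrative:

```python
keys = (1, "1foo", "foo2bar3")

# A generator of (key, value) pairs passed to dict().
from_pairs = dict((key, None) for key in keys)

# zip() pairs keys with values; any hashable key works this way.
from_zip = dict(zip(keys, (None, None, None)))

# dict.fromkeys() gives every key the same value (None by default).
from_keys = dict.fromkeys(keys)

print(from_pairs == from_zip == from_keys)
```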
