Python construct a dictionary data type? - python

I'm new to Python and just ran into this statement:
data = dict( (k, v) for k, v in data.items() if v != 'null')
I don't really understand what is being done here to construct a dict. Could you explain it a bit? Why is there a for loop inside dict(), and why does the if come after it? I didn't see anything like this in the Python docs.
Thanks guys

The code uses the dict constructor to create a new dictionary. The constructor can take an iterable of key, value pairs to initialise the new dictionary with. As others have pointed out, the example code has a generator expression that creates this iterable of key, value pairs.
The generator expression acts a little bit like a list and could be re-written like this:
mylist = []
for k, v in data.items():
    if v != 'null':
        mylist.append((k, v))
But it never actually creates a list, it just yields each value in turn as it is processed by the dict constructor.
As for why the if comes after the loop, this is the syntax chosen by the python developers, so you'd have to ask them. But notice in my re-written generator expression that the if statement is inside (i.e. after) the for statement.
I've linked already to the section on generator expressions in the python documentation but at unkulunkulu's request, here's a couple more:
Carl Groner's Introduction to List Comprehensions
Fredrik Haard's How to (Effectively) Explain List Comprehensions

The argument to dict() is a generator expression that yields tuples consisting of key, value pairs (i.e., the (k, v)) drawn from data.items(). The dict() built-in function can automatically construct a dictionary object from a list or sequence of such tuples, e.g.:
>>> kvs = [('a', 1), ('b', 2)]
>>> dict(kvs)
{'a': 1, 'b': 2}
The if v != 'null' qualifier instructs the generator to ignore/skip over those elements whose value (that is, the second item in the tuple) equals 'null' (more precisely, it only yields those pairs for which the value is not equal to 'null').
For a much more detailed explanation of generator expressions, see PEP 289.
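As a minimal sketch of the equivalence described above, the snippet below filters a small made-up dictionary (the sample data is an assumption, not taken from the question) in three ways: with the original dict() call, with an explicit loop, and with a dict comprehension, which is available from Python 2.7 onward.

```python
# Sample data invented for illustration.
data = {'a': 1, 'b': 'null', 'c': 3}

# 1. The statement from the question: dict() consumes a generator expression.
via_genexp = dict((k, v) for k, v in data.items() if v != 'null')

# 2. The same logic written as an explicit loop.
via_loop = {}
for k, v in data.items():
    if v != 'null':
        via_loop[k] = v

# 3. A dict comprehension, often the most readable spelling.
via_comprehension = {k: v for k, v in data.items() if v != 'null'}

print(via_genexp)
```

All three produce the same dictionary; which one to prefer is mostly a matter of readability.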

Related

how to remove duplicate list in the values of a dictionary

I have a dictionary:
dictionary = {
1:[[1,2],[3,4],[5,6],[7,8],[1,2]],
2:[[5,6],[7,8],[1,2]],
3:[[3,4],[5,6],[3,4]]
}
How can I remove duplicate lists within each value of the dictionary?
output = {
1:[[3,4],[5,6],[7,8],[1,2]],
2:[[5,6],[7,8],[1,2]],
3:[[3,4],[5,6]]
}
How can I remove all duplicates across the whole dictionary?
output = [[1,2],[3,4],[5,6],[7,8]]
I have tried using for loops, like so:
for i in dictionary.values():
    for j in i:
        for k in i:
            if j == k:
                i.remove(k)
but I'm just a beginner, so I'm not getting any results...
The usual way to do this is to leverage a set, which is like a dictionary that has only keys and no values. Dictionaries (and sets) rely on their keys to be "hashable," which means that you can feed the key through some hash function and get the same result every time. In Python you can call this hash function with hash(some_object), which internally invokes some_object.__hash__().
The problem with this approach is that lists are not hashable. None of Python's built-in mutable containers (things you can change in place with methods like list.append, set.add or dict.update) are. This means you must either check equality by hand, or convert each list into some form that is hashable, use the set, and then convert it back. I think the latter is probably your best bet.
To that end, let's use a tuple. Tuples are just like lists except that they are idiomatically heterogeneous (mixing types is common, not just technically allowed) and their order has semantic meaning. Consider an ordered pair on a plane: it would matter deeply if the order flipped, since (1, 4) is not the same point as (4, 1). Unlike lists, tuples are immutable, and they are hashable as long as their elements are.
d = {1: [[1,2],[3,4],[5,6],[7,8],[1,2]],
     2: [[5,6],[7,8],[1,2]],
     3: [[3,4],[5,6],[3,4]]}
# we'll use a set comprehension here because it's concise
uniques = {tuple(sublst) for lst in d.values() for sublst in lst}
result = [list(tup) for tup in uniques]  # then just change them back to lists
Note that the conversion to set and back does lose all ordering. If ordering is important then you'll have to do something like iterate through every sub list, convert it to tuple, check to see if it's already been seen, and if not add it to the seen set and append it to your final list.
d = {1: [[1,2],[3,4],[5,6],[7,8],[1,2]],
     2: [[5,6],[7,8],[1,2]],
     3: [[3,4],[5,6],[3,4]]}
seen = set()
result = []
for lst in d.values():
    for sublst in lst:
        tup = tuple(sublst)
        if tup not in seen:
            seen.add(tup)
            result.append(sublst)
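The snippets above collect the unique sublists across the whole dictionary. For the first part of the question (removing duplicates within each value separately), the same seen-set idea can be applied per key. This is only a sketch: dedupe_sublists is a hypothetical helper name, and it keeps the first occurrence of each sublist, so the order for key 1 differs slightly from the output shown in the question.

```python
d = {1: [[1, 2], [3, 4], [5, 6], [7, 8], [1, 2]],
     2: [[5, 6], [7, 8], [1, 2]],
     3: [[3, 4], [5, 6], [3, 4]]}


def dedupe_sublists(lst):
    """Return lst without repeated sublists, keeping first occurrences in order."""
    seen = set()
    out = []
    for sublst in lst:
        tup = tuple(sublst)  # lists are unhashable, tuples of ints are hashable
        if tup not in seen:
            seen.add(tup)
            out.append(sublst)
    return out


# Build a new dictionary with each value de-duplicated independently.
per_key = {key: dedupe_sublists(lst) for key, lst in d.items()}
print(per_key)
```

Building a new dictionary avoids removing items from a list while iterating over it, which is what makes the nested-loop attempt in the question misbehave.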

tuple to list conversion within dictionary values (list of lists (and tuples))

I am dealing with a dictionary that is formatted as such:
dic = {'Start': [['Story' , '.']],
'Wonderful': [('thing1',), ["thing1", "and", "thing2"]],
'Amazing': [["The", "thing", "action", "the", "thing"]],
'Fantastic': [['loved'], ['ate'], ['messaged']],
'Example': [['bus'], ['car'], ['truck'], ['pickup']]}
If you notice, under the 'Wonderful' key there is a tuple within a list. I am looking for a way to convert all tuples within the inner lists of each key into lists.
I have tried the following:
for value in dic.values():
    for inner in value:
        inner = list(inner)
but that does not work and I don't see why. I also tried an if type(inner) = tuple statement to convert it only if it's a tuple, but that is not working either... Any help would be greatly appreciated.
edit: I am not allowed to import, and only have really learned a basic level of python. A solution that I could understand with that in mind is preferred.
You need to invest some time learning how assignment in Python works.
inner = list(inner) constructs a list (right hand side), then binds the name inner to that new list and then... you do nothing with it.
Fixing your code:
for k, vs in dic.items():
    dic[k] = [list(x) if isinstance(x, tuple) else x for x in vs]
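To make the point about name binding concrete, here is a small sketch using a made-up list (items is an assumption for illustration, not the asker's data). Rebinding the loop variable leaves the container untouched, while assigning through an index replaces the stored element.

```python
items = [('a',), ['b']]

# Rebinding the loop variable: the name `inner` points at a new list,
# but the slot inside `items` still holds the original object.
for inner in items:
    inner = list(inner)
print(items)  # [('a',), ['b']]

# Assigning through an index replaces the element stored in the list.
for i, inner in enumerate(items):
    items[i] = list(inner)
print(items)  # [['a'], ['b']]
```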
You need to update the element by its index:
for curr in dic.values():
    for i, v in enumerate(curr):
        if isinstance(v, tuple):
            curr[i] = list(v)
print(dic)
Your title, data and code suggest that you only have tuples and lists there and are willing to run list() on all of them, so here's a short way to convert them all to lists and assign them back into the outer lists (which is what you were missing):
for value in dic.values():
    value[:] = map(list, value)
And a fun way:
for value in dic.values():
    for i, [*value[i]] in enumerate(value):
        pass

Are dict comprehensions evaluated incrementally in Python?

I'd have assumed the results of purge and purge2 would be the same in the following code (remove duplicate elements, keeping the first occurrences and their order):
def purge(a):
    l = []
    return (l := [x for x in a if x not in l])
def purge2(a):
    d = {}
    return list(d := {x: None for x in a if x not in d})
t = [2,5,3,7,2,6,2,5,2,1,7]
print(purge(t), purge2(t))
But it looks like with dict comprehensions, unlike with lists, the value of d is built incrementally. Is this what's actually happening? Do I correctly infer the semantics of dict comprehensions from this sample code and their difference from list comprehensions? Does it work only with comprehensions, or also with other right-hand sides referring to the dictionary being assigned to (e.g. comprehensions nested inside other expressions, something involving iterators, comprehensions of types other than dict)? Where is it specified and full semantics can be consulted? Or is it just an undocumented behaviour of the implementation, not to be relied upon?
There's nothing "incremental" going on here. The walrus operator doesn't assign to the variable until the dictionary comprehension completes. if x not in d is referring to the original empty dictionary, not the dictionary that you're building with the comprehension, just as the version with the list comprehension is referring to the original l.
The reason the duplicates are filtered out is simply that dictionary keys are always unique. Assigning to a key that already exists does not add a second entry; it overwrites the value and leaves the key in its original position. It's the same as if you'd written:
return {2: None, 2: None}
you'll just get {2: None}.
So your function can be simplified to
def purge2(a):
    return list({x: None for x in a})
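A quick way to check this explanation is to run both functions from the question and compare the results. The sketch below assumes Python 3.8 or later (for the walrus operator) and also shows dict.fromkeys, a standard-library way to get the same order-preserving de-duplication.

```python
def purge(a):
    l = []
    # `l` stays empty while the comprehension runs, so nothing is filtered.
    return (l := [x for x in a if x not in l])


def purge2(a):
    d = {}
    # `d` also stays empty; duplicates vanish only because keys are unique.
    return list(d := {x: None for x in a if x not in d})


t = [2, 5, 3, 7, 2, 6, 2, 5, 2, 1, 7]
print(purge(t))                # the input comes back unchanged
print(purge2(t))               # first occurrences only, in order
print(list(dict.fromkeys(t)))  # same result without any comprehension
```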

What is this Python magic?

If you do this {k:v for k,v in zip(*[iter(x)]*2)} where x is a list of whatever, you'll get a dictionary with all the odd elements as keys and even ones as their values. woah!
>>> x = [1, "cat", "hat", 35,2.5, True]
>>> d = {k:v for k,v in zip(*[iter(x)]*2)}
>>> d
{1: 'cat', 'hat': 35, 2.5: True}
I have a basic understanding of how dictionary comprehensions work, how zip works, how * extracts arguments, how [iter(x)]*2 concatenates two copies of the list, and so I was expecting a one-to-one correspondence like {1: 1, "cat": "cat" ...}.
What's going on here?
This is an interesting little piece of code for sure! The main thing it utilizes that you might not expect is that objects are, in effect, passed by reference (they're actually passed by assignment, but hey). iter() constructs an object, so "copying" it (using multiplication on a list, in this case) doesn't create a new one, but rather adds another reference to the same one. That means you have a list where l[0] is an iterator, and l[1] is the same iterator - accessing them both accesses the very same object.
Every time the next element of the iterator is accessed, it continues where it last left off. Since elements are accessed alternately between the first and second elements of the tuples that zip() creates, the single iterator's state is advanced across both elements in the tuple.
After that, the dictionary comprehension simply consumes these pair tuples as they expand to k, v - as they would in any other dictionary comprehension.
iter(x) creates an iterator over the iterable (a list or similar) x. [iter(x)]*2 does not copy that iterator; it builds a list containing two references to the same iterator object. This means that if I ask one of them for a value, the other (which is the same object) advances as well.
zip() then receives that one iterator twice, as two arguments, via the zip(* ... ) syntax. It produces pairs from the two arguments it got: it asks the first iterator for a value (and receives x[0]), then asks the second for a value (and receives x[1]), then forms a pair of the two values and emits it. It repeats this until the iterator is exhausted, so it forms a pair of x[2] and x[3], then a pair of x[4] and x[5], etc.
These pairs are then consumed by the dictionary comprehension, which turns each pair into a key and value of the dictionary.
Easier to read might be this:
{ k: v for (k, v) in zip(x[::2], x[1::2]) }
But that might not be as efficient, since each slice builds a new list.
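The shared-iterator behaviour described above can be checked directly. This sketch reuses the list from the question and contrasts one shared iterator with two independent ones; the variable names are chosen for illustration only.

```python
x = [1, "cat", "hat", 35, 2.5, True]

it = iter(x)
shared = [it] * 2              # two references to one iterator object
print(shared[0] is shared[1])  # True

# zip pulls alternately from "both" arguments, i.e. from the same iterator.
pairs = list(zip(*shared))
print(pairs)

# With two independent iterators, each element is paired with itself,
# which is the result the question expected.
independent = list(zip(iter(x), iter(x)))
print(independent)
```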

filter items in a python dictionary where keys contain a specific string

I'm a C coder developing something in python. I know how to do the following in C (and hence in C-like logic applied to python), but I'm wondering what the 'Python' way of doing it is.
I have a dictionary d, and I'd like to operate on a subset of the items, only those whose key (string) contains a specific substring.
i.e. the C logic would be:
for key in d:
    if filter_string in key:
        # do something
    else:
        # do nothing, continue
I'm imagining the python version would be something like
filtered_dict = crazy_python_syntax(d, substring)
for key,value in filtered_dict.iteritems():
    # do something
I've found a lot of posts on here regarding filtering dictionaries, but couldn't find one which involved exactly this.
My dictionary is not nested and i'm using python 2.7
How about a dict comprehension:
filtered_dict = {k:v for k,v in d.iteritems() if filter_string in k}
Once you see it, it should be self-explanatory, as it reads like English pretty well.
This syntax requires Python 2.7 or greater.
In Python 3, there is only dict.items(), not iteritems() so you would use:
filtered_dict = {k:v for (k,v) in d.items() if filter_string in k}
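For a concrete run of the Python 3 form, here is a minimal sketch with made-up data (the dictionary contents and the substring 'foo' are assumptions for illustration).

```python
d = {'foo_a': 1, 'bar_b': 2, 'foo_c': 3}
filter_string = 'foo'

# Keep only the items whose key contains the substring.
filtered_dict = {k: v for k, v in d.items() if filter_string in k}
print(filtered_dict)  # {'foo_a': 1, 'foo_c': 3}
```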
Go for whatever is most readable and easily maintainable. Just because you can write it out in a single line doesn't mean that you should. Your existing solution is close to what I would use, except that I would use iteritems to skip the value lookup, and I hate nested ifs if I can avoid them:
for key, val in d.iteritems():
    if filter_string not in key:
        continue
    # do something
However, if you really want something that lets you iterate through a filtered dict, then I would not do the two-step process of building the filtered dict and then iterating through it, but would instead use a generator, because what is more pythonic (and awesome) than a generator?
First we create our generator, and good design dictates that we make it abstract enough to be reusable:
# The implementation of my generator may look vaguely familiar, no?
def filter_dict(d, filter_string):
    for key, val in d.iteritems():
        if filter_string not in key:
            continue
        yield key, val
And then we can use the generator to solve your problem nice and cleanly with simple, understandable code:
for key, val in filter_dict(d, some_string):
    # do something
In short: generators are awesome.
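The generator above is written for Python 2. A Python 3 sketch of the same idea, with items() in place of iteritems() and made-up sample data, might look like this; it also shows that nothing is filtered until the generator is actually consumed.

```python
def filter_dict(d, filter_string):
    """Yield (key, value) pairs whose key contains filter_string."""
    for key, val in d.items():
        if filter_string in key:
            yield key, val


# Sample data invented for illustration.
d = {'foo_a': 1, 'bar_b': 2, 'foo_c': 3}

gen = filter_dict(d, 'foo')  # no work is done yet
first = next(gen)            # runs the loop up to the first match
rest = list(gen)             # consumes the remaining matches
print(first, rest)
```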
You can use the built-in filter function to filter dictionaries, lists, etc. based on specific conditions.
filtered_dict = dict(filter(lambda item: filter_str in item[0], d.items()))
The advantage is that you can use it for different data structures.
input = {"A":"a", "B":"b", "C":"c"}
output = {k:v for (k,v) in input.items() if key_satisfies_condition(k)}
Jonathon gave you an approach using dict comprehensions in his answer. Here is an approach that deals with your do something part.
If you want to do something with the values of the dictionary, you don't need a dictionary comprehension at all:
I'm using iteritems() since you tagged your question with python-2.7
results = map(some_function, [(k,v) for k,v in a_dict.iteritems() if 'foo' in k])
Now the result will be in a list with some_function applied to each key/value pair of the dictionary, that has foo in its key.
If you just want to deal with the values and ignore the keys, just change the list comprehension:
results = map(some_function, [v for k,v in a_dict.iteritems() if 'foo' in k])
some_function can be any callable, so a lambda would work as well:
results = map(lambda x: x*2, [v for k,v in a_dict.iteritems() if 'foo' in k])
The inner list is actually not required, as you can pass a generator expression to map as well:
>>> map(lambda a: a[0]*a[1], ((k,v) for k,v in {2:2, 3:2}.iteritems() if k == 2))
[4]
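The map examples above rely on Python 2, where map returns a list. In Python 3, map returns a lazy iterator and iteritems() no longer exists, so a rough equivalent (with made-up data) wraps the call in list(); a plain list comprehension is a common alternative.

```python
# Sample data invented for illustration.
a_dict = {'foo1': 1, 'bar': 2, 'foo2': 3}

# map() is lazy in Python 3, so materialise it with list().
doubled = list(map(lambda v: v * 2,
                   (v for k, v in a_dict.items() if 'foo' in k)))
print(doubled)  # [2, 6]

# The same result as a single list comprehension.
doubled_lc = [v * 2 for k, v in a_dict.items() if 'foo' in k]
print(doubled_lc)  # [2, 6]
```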
You can use the built-in function 'filter()':
data = {'aaa':12, 'bbb':23, 'ccc':8, 'ddd':34}
# filter by key
print(dict(filter(lambda e:e[0]=='bbb', data.items() ) ) )
# filter by value
print(dict(filter(lambda e:e[1]>18, data.items() ) ) )
OUTPUT:
{'bbb': 23}
{'bbb': 23, 'ddd': 34}
