This may be a simple question, but I'm having trouble making a unique search for it.
I have a class that defines a static dictionary, then attempts to define a subset of that dictionary, also statically.
So, as a toy example:
class example(object):
    first_d = {1:1,2:2,3:3,4:4}
    second_d = dict((k,first_d[k]) for k in (2,3))
This produces NameError: global name 'first_d' is not defined
How should I be making this reference? It seems this pattern works in other cases, e.g.:
class example2(object):
    first = 1
    second = first + 1
A basic list comprehension has the following syntax
[expression for var in iterable]
When a list comprehension occurs inside a class, the attributes of the class
can be used in iterable. This is true in Python2 and Python3.
However, the attributes of the class can be used (i.e. accessed) in expression in Python2 but not in Python3.
The story is a bit different for generator expressions:
(expression for var in iterable)
While the class attributes can still be accessed from iterable, the class attributes are not accessible from expression. (This is true for Python2 and Python3).
This can all be summarized as follows:
                        Can access class attributes
                        ---------------------------
                           Python2    Python3
list comp.  iterable          Y          Y
list comp.  expression        Y          N
gen expr.   iterable          Y          Y
gen expr.   expression        N          N
dict comp.  iterable          Y          Y
dict comp.  expression        N          N
(Dict comprehensions behave the same as generator expressions in this respect.)
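A minimal sketch illustrating the Python3 column (the class name Demo is just illustrative):

class Demo(object):
    nums = [1, 2, 3]
    # works: the iterable is evaluated in class scope
    ok = [n for n in nums]
    # NameError in Python3 if uncommented: the expression part runs in
    # its own scope and cannot see the class attribute nums
    # bad = [n + nums[0] for n in nums]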
Now how does this relate to your question:
In your example,
second_d = dict((k,first_d[k]) for k in (2,3))
a NameError occurs because first_d is not accessible from the expression part of a generator expression.
A workaround for Python2 would be to change the generator expression to a list comprehension:
second_d = dict([(k,first_d[k]) for k in (2,3)])
However, I don't find this a very comfortable solution since this code will fail in Python3.
You could do as Joel Cornett suggests:
second_d = {k: v for k, v in first_d.items() if k in (2, 3)}
since this uses first_d in the iterable rather than the expression part of the dict comprehension. But this may loop through many more items than necessary if first_d contains many items. Nevertheless, this solution might be just fine if first_d is small.
In general, you can avoid this problem by defining a helper function which can be defined inside or outside the class:
def partial_dict(dct, keys):
    return {k: dct[k] for k in keys}

class Example(object):
    first_d = {1:1,2:2,3:3,4:4}
    second_d = partial_dict(first_d, (2,3))
class Example2(object):
    a = [1,2,3,4,5]
    b = [2,4]
    def myfunc(A, B):
        return [x for x in A if x not in B]
    c = myfunc(a, b)
print(Example().second_d)
# {2: 2, 3: 3}
print(Example2().c)
# [1, 3, 5]
Functions work because they define a local scope and
variables in this local scope can be accessed from within the dict comprehension.
This was explained here, but I am not entirely comfortable with that explanation since it does not account for why the expression part behaves differently from the iterable part of a list comprehension, generator expression, or dict comprehension. Thus I cannot (completely) explain why Python behaves this way, only that this is the way it appears to behave.
It's a bit kludgy, but you could try this:
class test(object):
    pass

test.first = {1:1, 2:2, 3:3, 4:4}
test.second = dict((k, test.first[k]) for k in (2,3))
...and then:
>>> test.first
{1: 1, 2: 2, 3: 3, 4: 4}
>>> test.second
{2: 2, 3: 3}
>>> t = test()
>>> t.first
{1: 1, 2: 2, 3: 3, 4: 4}
>>> t.second
{2: 2, 3: 3}
>>> test.first[5] = 5
>>> t.first
{1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
I don't think the class exists until you get to the end of its definition.
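A quick way to confirm that: inside the class body, the class name is not yet bound anywhere (a minimal sketch; the name Probe is just illustrative):

class Probe(object):
    try:
        Probe
    except NameError:
        print("Probe is not bound during class body execution")  # this prints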
Related
In PyCharm, when I write:
return set([(sy + ady, sx + adx)])
it says "Function call can be replaced with set literal" so it replaces it with:
return {(sy + ady, sx + adx)}
Why is that? Isn't a set() in Python something different from a dictionary {}?
And if it wants to optimize this, why is this more effective?
Python sets and dictionaries can both be constructed using curly braces:
my_dict = {'a': 1, 'b': 2}
my_set = {1, 2, 3}
The interpreter (and human readers) can distinguish between them based on their contents. However, it isn't possible to distinguish between an empty set and an empty dict, so in this case you need to use set() for empty sets to disambiguate.
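For example, checking the types directly (Python 3):

>>> type({})
<class 'dict'>
>>> type(set())
<class 'set'>
>>> type({1})
<class 'set'>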
A very simple test suggests that the literal construction is faster (python3.5):
>>> timeit.timeit('a = set([1, 2, 3])')
0.5449375328607857
>>> timeit.timeit('a = {1, 2, 3}')
0.20525191631168127
This question covers some issues of performance of literal constructions over builtin functions, albeit for lists and dicts. The summary seems to be that literal constructions require less work from the interpreter.
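You can also see the difference with the dis module: on CPython 3, the literal compiles to a single BUILD_SET instruction, while set([1, 2, 3]) first builds a list and then calls the set builtin (a sketch; the exact bytecode varies between CPython versions):

import dis

dis.dis("{1, 2, 3}")       # LOAD_CONST x3, then one BUILD_SET
dis.dis("set([1, 2, 3])")  # builds a list, then calls the set builtin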
It is an alternative syntax for set()
>>> a = {1, 2}
>>> b = set()
>>> b.add(1)
>>> b.add(2)
>>> b
set([1, 2])
>>> a
set([1, 2])
>>> a == b
True
>>> type(a) == type(b)
True
dict syntax is different. It consists of key-value pairs. For example:
my_obj = {1:None, 2:None}
Another example of how set() and {} are not interchangeable (as jonrsharpe mentioned):
In: f = 'FH'
In: set(f)
Out: {'F', 'H'}
In: {f}
Out: {'FH'}
set([iterable]) is the constructor that creates a set from the optional argument iterable, while {} creates a set or dict literal depending on its contents. So what is created depends on how you use it.
In [414]: x = {}
In [415]: type(x)
Out[415]: dict
In [416]: x = {1}
In [417]: type(x)
Out[417]: set
In [418]: x = {1: "hello"}
In [419]: type(x)
Out[419]: dict
input: two functions f and g, represented as dictionaries, such that g ◦ f exists. output: a dictionary that represents the function g ◦ f.
example: given f = {0:'a', 1:'b'} and g = {'a':'apple', 'b':'banana'}, return {0:'apple', 1:'banana'}.
The closest to the correct answer I have is with {i:g[j] for i in f for j in g} which outputs {0: 'apple', 1: 'apple'}. What am I doing wrong?
Other answers are great, but they create an entire dictionary which may be slow. If you only want a read-only composition, then the following class will solve your problems:
import collections.abc

class Composition_mapping(collections.abc.Mapping):
    def __init__(self, f, g):
        self.f = f
        self.g = g

    def __iter__(self):
        return iter(self.g)

    def __len__(self):
        return len(self.g)

    def keys(self):
        return self.g.keys()

    def __getitem__(self, item):
        # compose: look up through g first, then feed the result to f
        return self.f[self.g[item]]
For example:
a = {1: 5}
b = {5: 9, 7: 8}
c = Composition_mapping(b, a)
print(c[1])
>>> 9
If you decide to make it a dict, you can always do:
c = dict(c)
print(c)
>>> {1: 9}
This is safe because Composition_mapping satisfies the mapping protocol, which means it is considered a mapping. A mapping is an interface (protocol) for read-only dictionary-like structures. Note that it is not necessary to inherit from collections.abc.Mapping; you just need to implement the methods __getitem__, __iter__, __len__, __contains__, keys, items, values, get, __eq__, and __ne__ to be a mapping; after all, Python favors duck typing. The reason my code inherits from collections.abc.Mapping is that it implements all the other methods for you once you provide __getitem__, __iter__, and __len__. See the official documentation for the details about protocols.
Creating a composition with this method has constant cost no matter how big the composed functions are. Look-ups can be a little slower due to the double look-up and the extra function call, but you can always use c = dict(c) if you are going to make lots of calls in a performance-critical part. More importantly, if you change a or b anywhere in your code, the changes will be reflected in c.
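To illustrate that live-view behaviour, mutating one of the underlying dicts is immediately visible through the composition:

a = {1: 5}
b = {5: 9, 7: 8}
c = Composition_mapping(b, a)
print(c[1])  # 9
b[5] = 100   # change the underlying dict...
print(c[1])  # 100 -- the composition reflects the change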
You just need to iterate over the one dict f and, in the comprehension, replace each value of f with the corresponding value from g:
f = {0:'a', 1:'b'}
g = {'a':'apple', 'b': 'banana'}
new_dict = {k: g[v] for k, v in f.items()}
# Value of new_dict = {0: 'apple', 1: 'banana'}
The correct dict comprehension would be:
{i:g[f[i]] for i in f}
You did:
{i:g[j] for i in f for j in g}
That mapped every i to every value in g, but then the dict construction discarded duplicate keys, keeping only the last value for each.
To see what is going on, try generating a list instead:
>>> [(i, g[j]) for i in f for j in g]
[(0, 'apple'), (0, 'banana'), (1, 'apple'), (1, 'banana')]
In the correct case:
>>> [(i, g[f[i]]) for i in f]
[(0, 'apple'), (1, 'banana')]
Consider the following code snippet:
class C(object):
    a = 0
    b = 1
    seq = [1, 2, 4, 16, 17]
    list_comp = [a if v%2 else b for v in seq]
    gen_comp = (a if v%2 else b for v in seq)
The code above is interpreted fine. Printing the objects bound to the class variables results in:
print C.list_comp # [0, 1, 1, 1, 0]
print C.gen_comp # <generator object <genexpr> at ...>
The sad part is that an attempt to retrieve a value from the generator results in a NameError:
next(C.gen_comp) # NameError: global name 'a' is not defined
The expected behavior would be similar to the list comprehension: it should yield 5 values and then raise StopIteration on each subsequent next() call.
What makes the difference here? How are names resolved in each case, and why does the discrepancy occur?
The issue is that generator expressions run in their own namespace, hence they do not have access to names in class scope (class variables like a or b).
This is given in PEP 227 -
Names in class scope are not accessible. Names are resolved in the innermost enclosing function scope. If a class definition occurs in a chain of nested scopes, the resolution process skips class definitions.
Hence you get the NameError when trying to access the class variable in the generator expression.
A way to work around this would be to access the required values through the class, like C.a or C.b. Since the expressions inside the generator expression are only executed when next() is called on it, we can be sure that the class C will have been defined by then. Example -
>>> class C(object):
... a = 0
... b = 1
... seq = [1, 2, 4, 16, 17]
... list_comp = [a if v%2 else b for v in seq]
... gen_comp = (C.a if v%2 else C.b for v in seq)
...
>>> next(C.gen_comp)
0
>>> next(C.gen_comp)
1
>>> next(C.gen_comp)
1
>>> next(C.gen_comp)
1
>>> next(C.gen_comp)
0
Note that the same issue occurs for list comprehensions in Python 3.x, since there list comprehensions have their own scope as well. See Accessing class variables from a list comprehension in the class definition for more details.
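For example, under Python 3 the same class body fails at definition time:

>>> class C(object):
...     a = 0
...     b = 1
...     seq = [1, 2, 4, 16, 17]
...     list_comp = [a if v % 2 else b for v in seq]
...
Traceback (most recent call last):
  ...
NameError: name 'a' is not defined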
In trying to use a list comprehension to make a list given a conditional, I see the following:
In [1]: mydicts = [{'foo':'val1'},{'foo':''}]
In [2]: mylist = [d for d in mydicts if d['foo']]
In [3]: mylist
Out[3]: [{'foo': 'val1'}]
In [4]: mydicts[1]['foo'] = 'val2'
In [5]: mydicts
Out[5]: [{'foo': 'val1'}, {'foo': 'val2'}]
In [6]: mylist
Out[6]: [{'foo': 'val1'}]
I've been reading the docs to try and understand this but have come up with nothing so far, so I'll ask my question here: why is it that mylist never includes {'foo': 'val2'}, even though the reference in the list comprehension points to mydicts, which by In [6] contains {'foo': 'val2'}? Is this because Python eagerly evaluates list comprehensions? Or is the lazy/eager dichotomy totally irrelevant to this?
There's no lazy evaluation of lists in Python. List comprehensions simply create a new list. If you want "lazy" evaluation, use a generator expression instead.
my_generator_expression = (d for d in mydicts if d['foo']) # note parentheses
mydicts[1]['foo'] = 'val2'
print(my_generator_expression) # >>> <generator object <genexpr> at 0x00000000>
for d in my_generator_expression:
    print(d) # >>> {'foo': 'val1'}
             # >>> {'foo': 'val2'}
Note that generators differ from lists in several important ways. Perhaps the most notable is that once you iterate over them, they are exhausted, so they're best to use if you only need the data they contain once.
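For example, a second pass over the same generator yields nothing:

gen = (d for d in mydicts if d['foo'])
print(list(gen))  # [{'foo': 'val1'}, {'foo': 'val2'}]
print(list(gen))  # [] -- the generator is already exhausted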
I think you're a bit confused about what list comprehensions do.
When you do this:
[d for d in mydicts if d['foo']]
That evaluates to a new list. So, when you do this:
mylist = [d for d in mydicts if d['foo']]
You're assigning that list as the value of mylist. You can see this very easily:
assert type(mylist) == list
You're not assigning "a list comprehension" that gets reevaluated every time to mylist. There are no magic values in Python that get reevaluated every time. (You can fake them by, e.g., creating a class with a @property, but that's not really an exception; it's the expression myobj.myprop that's being reevaluated, not myprop itself.)
In fact, mylist = [d for d in mydicts if d['foo']] is basically the same as mylist = [1, 2, 3].* In both cases, you're creating a new list and assigning it to mylist. You wouldn't expect the second one to re-evaluate [1, 2, 3] each time (otherwise, doing mylist[0] = 0 wouldn't do much good, because as soon as you tried to view mylist you'd get a new, pristine list!). The same is true here.
* In Python 3.x, they aren't just basically the same; they're both just different types of list displays. In 2.x, it's a bit more murky, and they just happen to both evaluate to new list objects.
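For completeness, here is a hedged sketch of the @property idea mentioned above (the names DictHolder and records are illustrative); the comprehension is re-run on every attribute access, so the result does track later changes:

class DictHolder(object):
    def __init__(self, records):
        self.records = records

    @property
    def mylist(self):
        # re-evaluated on every access
        return [d for d in self.records if d['foo']]

holder = DictHolder([{'foo': 'val1'}, {'foo': ''}])
print(holder.mylist)              # [{'foo': 'val1'}]
holder.records[1]['foo'] = 'val2'
print(holder.mylist)              # [{'foo': 'val1'}, {'foo': 'val2'}]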
mylist contains the result of a previous list comprehension evaluation; it won't magically be updated just because you update a variable that was used in its computation.
One way (the fastest way?) to iterate over a pair of iterables a and b in sorted order is to chain them and sort the chained iterable:
from itertools import chain

for i in sorted(chain(a, b)):
    print i
For instance, if the elements of each iterable are:
a: 4, 6, 1
b: 8, 3
then this construct would produce elements in the order
1, 3, 4, 6, 8
However, if the iterables iterate over objects, this sorts the objects by their memory address. Assuming each iterable iterates over the same type of object,
What is the fastest way to iterate over a particular
attribute of the objects, sorted by this attribute?
What if the attribute to be chosen differs between iterables? If iterables a and b both iterate over objects of type foo, which has attributes foo.x and foo.y of the same type, how could one iterate over elements of a sorted by x and b sorted by y?
For an example of #2, if
a: (x=4,y=3), (x=6,y=2), (x=1,y=7)
b: (x=2,y=8), (x=2,y=3)
then the elements should be produced in the order
1, 3, 4, 6, 8
as before. Note that only the x attributes from a and the y attributes from b enter into the sort and the result.
Tim Pietzcker has already answered for the case where you're using the same attribute for each iterable. If you're using different attributes of the same type, you can do it like this (using complex numbers as a ready-made class that has two attributes of the same type):
In Python 2:
>>> a = [1+4j, 7+0j, 3+6j, 9+2j, 5+8j]
>>> b = [2+5j, 8+1j, 4+7j, 0+3j, 6+9j]
>>> keyed_a = ((n.real, n) for n in a)
>>> keyed_b = ((n.imag, n) for n in b)
>>> from itertools import chain
>>> sorted_ab = zip(*sorted(chain(keyed_a, keyed_b), key=lambda t: t[0]))[1]
>>> sorted_ab
((1+4j), (8+1j), (3+6j), 3j, (5+8j), (2+5j), (7+0j), (4+7j), (9+2j), (6+9j))
Since in Python 3 zip() returns an iterator, we need to coerce it to a list before attempting to subscript it:
>>> # ... as before up to 'from itertools import chain'
>>> sorted_ab = list(zip(*sorted(chain(keyed_a, keyed_b), key=lambda t: t[0])))[1]
>>> sorted_ab
((1+4j), (8+1j), (3+6j), 3j, (5+8j), (2+5j), (7+0j), (4+7j), (9+2j), (6+9j))
Answer to question 1: You can provide a key attribute to sorted(). For example if you want to sort by the object's .name, then use
sorted(chain(a, b), key=lambda x: x.name)
As for question 2: I guess you'd need another attribute for each object (like foo.z, as suggested by Zero Piraeus) that can be accessed by sorted(), since that function has no way of telling where the object it's currently sorting used to come from. After all, it is receiving a new iterator from chain() which doesn't contain any information about whether the current element is from a or b.
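One way to supply that information, mirroring the keyed-tuple approach in the answer above (a sketch; Foo is a stand-in for the question's object type):

from collections import namedtuple
from itertools import chain

Foo = namedtuple('Foo', ['x', 'y'])
a = [Foo(4, 3), Foo(6, 2), Foo(1, 7)]
b = [Foo(2, 8), Foo(2, 3)]

# Tag each element with the attribute its own iterable sorts by,
# then sort on the tag alone.
keyed = chain(((o.x, o) for o in a), ((o.y, o) for o in b))
print([k for k, o in sorted(keyed, key=lambda t: t[0])])
# [1, 3, 4, 6, 8]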