I have the following setup in one module:
class A(object):
    pass  # stuff

class B(object):
    pass  # stuff
Now what I want to do is create an instance of class A by name (I just have the class name as a string) inside of B. How can I do this while avoiding the globals() function?
Why not just use A? Or do you have just the string 'A'? If so, globals()['A'] is the way to go. The alternative would be getattr(sys.modules[__name__], 'A'), but globals() is obviously more appropriate.
>>> dis.dis(lambda: getattr(sys.modules[__name__], 'Foo'))
1 0 LOAD_GLOBAL 0 (getattr)
3 LOAD_GLOBAL 1 (sys)
6 LOAD_ATTR 2 (modules)
9 LOAD_GLOBAL 3 (__name__)
12 BINARY_SUBSCR
13 LOAD_CONST 1 ('Foo')
16 CALL_FUNCTION 2
19 RETURN_VALUE
>>> dis.dis(lambda: globals()['Foo'])
1 0 LOAD_GLOBAL 0 (globals)
3 CALL_FUNCTION 0
6 LOAD_CONST 1 ('Foo')
9 BINARY_SUBSCR
10 RETURN_VALUE
>>> dis.dis(lambda: Foo)
1 0 LOAD_GLOBAL 0 (Foo)
3 RETURN_VALUE
So by just looking at the instructions used for the various ways to access Foo, using globals() is most likely faster than going through sys.modules.
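For example, a minimal sketch of instantiating the class by name via globals() (assuming class A from the question, defined at module level):
class_name = 'A'               # the class name you received as a string
cls = globals()[class_name]    # look the class up in the module's globals
instance = cls()               # instantiate it as usual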
Let me see if I understand you right:
You have a settings file that looks something like
...
connection-type: FooConnection
...
You have a bunch of classes
class FooConnection(Connection): ...
class BarConnection(Connection): ...
class BazConnection(Connection): ...
You want to map "FooConnection" from the settings file to the class FooConnection.
If so, I would do this instead:
Put
connection-type: Foo
in the settings file, or some other human-readable name that doesn't depend on the name of the class.
Write a mapping from human-readable names to implementations:
implementations = {
    "Foo": FooConnection,
    "Bar": BarConnection,
    "Baz": BazConnection
}
You can change this mapping if you want to change e.g. how you implement the classes. This also lets you have synonyms.
Look up the value in the settings file in the implementations dictionary to get the class you want.
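For example, a sketch (here settings is assumed to be a dict parsed from your settings file):
settings = {"connection-type": "Foo"}               # parsed from the settings file
cls = implementations[settings["connection-type"]]  # -> FooConnection
connection = cls()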
In fact, you're doing this already. It's just that instead of explicitly writing down the mapping of strings to classes, you're using the globals dictionary; in other words, you're assuming that the end user knows the class names you want to use. That's not nice.
I'm not clear what you mean by "accessing class A by name", but usually there are three main approaches, depending on what you actually want to do:
You create an instance of class A inside class B's __init__
You inherit from class A in class B
What @ThiefMaster posted.
My apologies for what some may regard as a fundamental question. In the following simple code:
def greet(name):
    def say_hi():
        print('Preparing to greet...')
        print('Hi', name, '!')
        print('Greeting given.')
    return say_hi
What is the sequence of events when greet is called with an argument and the interpreter encounters the say_hi function? I see that a reference to it is returned (forming a closure, I assume), but is the inner function executed, or simply 'read' and not called until the programmer writes code like the following?
f = greet('Caroline')
f()
Since almost everything in Python happens at runtime (except compile-time tasks like the peephole optimizer), Python doesn't call your function unless you call it.
You can see this behavior by using the dis function from the dis module, which disassembles your function into its bytecode:
>>> def greet(name):
...     def say_hi():
...         print('Preparing to greet...')
...         print('Hi', name, '!')
...         print('Greeting given.')
...     return say_hi
...
>>> import dis
>>>
>>> dis.dis(greet)
2 0 LOAD_CLOSURE 0 (name)
3 BUILD_TUPLE 1
6 LOAD_CONST 1 (<code object say_hi at 0x7fdacc12c8b0, file "<stdin>", line 2>)
9 MAKE_CLOSURE 0
12 STORE_FAST 1 (say_hi)
6 15 LOAD_FAST 1 (say_hi)
18 RETURN_VALUE
As you can see at offset 6, Python just loads the code object for say_hi as a constant (LOAD_CONST) and builds a closure from it; the body of say_hi is not executed at this point.
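You can confirm this at the prompt: nothing is printed until the returned closure is actually called:
>>> f = greet('Caroline')   # say_hi is only defined here, not run
>>> f()
Preparing to greet...
Hi Caroline !
Greeting given.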
I'm trying to use types.MethodType to modify the behaviour of some iterator.
import types
import _io

def parse(line):
    return line.upper()

def reader(f):
    f.__next__ = types.MethodType(lambda x: parse(_io.TextIOWrapper.readline(x)), f)
    f.__iter__ = types.MethodType(lambda x: x, f)
    return f
I guess I'm using types.MethodType correctly, because running the following code I get the expected result:
>>> with open("myfile.txt") as f:
...     x = reader(f)
...     print(f.__next__())
NORMAL LINE
However, as soon as I use a for loop, it seems that the parse() function is not called.
>>> with open("myfile.txt") as f:
...     for line in reader(f):
...         print(line)
normal line
It's as if the for loop were using the original __next__() method of my object instead of the overwritten one.
What am I missing here? I know I could achieve the same results in a simpler way, for instance yielding parsed lines in reader(), but I would really prefer to return this 'decorated' file object instead.
Thanks in advance.
There is a huge difference between your two examples. In the first one you are calling the __next__ method explicitly, while in the latter you are letting the iterator protocol call it for you. In fact you can see that even in the first case the behaviour is not what you wanted:
In [5]: with open('myfile.txt') as f:
   ...:     print(next(reader(f)))  # next here calls the original implementation!
normal line

In [6]: with open('myfile.txt') as f:
   ...:     print(reader(f).__next__())
NORMAL LINE
You can see what the interpreter is doing by checking the bytecode using the dis module.
For example:
In [8]: import dis
In [9]: def f():
   ...:     for x in iterable:
   ...:         pass
In [10]: dis.dis(f)
2 0 SETUP_LOOP 14 (to 17)
3 LOAD_GLOBAL 0 (iterable)
6 GET_ITER
>> 7 FOR_ITER 6 (to 16)
10 STORE_FAST 0 (x)
3 13 JUMP_ABSOLUTE 7
>> 16 POP_BLOCK
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
Note how there is a call to GET_ITER, but no call to LOAD_ATTR. If you explicitly mention the attribute however:
In [11]: def f():
    ...:     for x in iterable.__iter__():
    ...:         pass
In [12]: dis.dis(f)
2 0 SETUP_LOOP 20 (to 23)
3 LOAD_GLOBAL 0 (iterable)
6 LOAD_ATTR 1 (__iter__)
9 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 6 (to 22)
16 STORE_FAST 0 (x)
3 19 JUMP_ABSOLUTE 13
>> 22 POP_BLOCK
>> 23 LOAD_CONST 0 (None)
26 RETURN_VALUE
Note that LOAD_ATTR bytecode.
When you see a LOAD_ATTR bytecode it means that the interpreter is going to perform a full-blown attribute lookup on the instance (and thus finds the attribute you just set).
However, bytecodes like GET_ITER perform a special method lookup, which bypasses the instance attribute lookup.
When the interpreter calls a special method as the result of a statement, it does not look the method up on the instance but on the class. This means that it will not check the __iter__ attribute you just created.
This is documented in some places. For example under object.__getattribute__, which is the method used to implement attribute lookups, there is a note:
Note: This method may still be bypassed when looking up special
methods as the result of implicit invocation via language syntax or
built-in functions. See Special method lookup.
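A quick sketch of this bypass (my own illustration, Python 3): setting a special method on an instance has no effect on the syntax and built-ins that invoke it implicitly:
class C:
    pass

c = C()
c.__len__ = lambda: 5    # special method set on the instance
print(c.__len__())       # 5 -- explicit attribute lookup does find it
len(c)                   # TypeError -- len() looks on type(c), not on the instance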
AFAIK, since files are implemented in C, you cannot modify the attributes of their class, so you simply cannot achieve what you wanted.
However it's extremely easy to simply create a new wrapper class:
class Wrapper:
    def __init__(self, fobj):
        self.fobj = fobj

    def __iter__(self):
        return self

    def __next__(self):
        return parse(next(self.fobj))
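Since __iter__ and __next__ are now defined on the class, the for loop finds them (reusing parse from the question):
with open("myfile.txt") as f:
    for line in Wrapper(f):
        print(line)    # NORMAL LINE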
An alternative would be to create a subclass of the file type. In Python 3 this is a bit complex because you have to subclass io.TextIOWrapper, whose constructor takes a buffer instead of a filename, so it's slightly more involved than in Python 2.
However, if you did create a subclass, it would work fine. There may be some problems when you pass its instances to functions that decide to call the original file methods, but the interpreter itself would call the __next__ and __iter__ methods that you defined.
Modifying methods of instances looks tricky to me; I would try to avoid it. If you just need some kind of pre-processing of the text file, you can do it in a separate generator function, like:
def preprocess(f):
    for l in f:
        yield parse(l)

with open("myfile.txt") as f:
    for line in preprocess(f):
        print(line)
First of all, let me say that I read the many threads with similar topics on creating dynamically named variables, but they mostly relate to Python 2 or they assume you are working with classes. And yes, I read Behavior of exec function in Python 2 and Python 3.
I'm also aware that creating dynamically named variables is a bad idea 99% of the time and that dictionaries are the way to go, but I just want to know whether it is still possible and how exactly exec and locals work in Python 3.
I'd like to show a bit of sample code illustrating my question (fibonacci calculates fibonacci numbers, ListOfLetters provides ["A", "B", ...]):
def functionname():
    for index, buchstabe in enumerate(ListOfLetters.create_list("A", "K"), 1):
        exec("{} = {}".format(buchstabe, fibonacci(index)))  # A = 1, B = 1, C = 2, D = 3, E = 5, ...
        print(index, buchstabe, eval(buchstabe))  # works nicely, e.g. prints "4 D 3"
    print(locals())       # prints all locals: {'B': 1, 'A': 1, 'index': 11, 'C': 2, 'H': 21, 'K': 89, ...
    print(locals()['K'])  # prints 89 as it should
    print(eval("K"))      # prints 89 as it should
    print(K)              # NameError: name 'K' is not defined
So at least on my current understanding, there is some inconsistency in the behaviour of locals(), since it contains the variable names added by exec() but the variables are not available in the function.
I would be grateful if someone could explain this and tell me whether it is by design or a real inconsistency in the language. Yes, I know that locals() should not be modified, but I'm not modifying it, I'm calling exec()...
When you're not sure why something works the way it does in Python, it often can help to put the behavior that you're confused by in a function and then disassemble it from the Python bytecode with the dis module.
Let's start with a simpler version of your code:
def foo():
    exec("K = 89")
    print(K)
If you run foo(), you'll get the same exception you're seeing with your more complicated function:
>>> foo()
Traceback (most recent call last):
File "<pyshell#167>", line 1, in <module>
foo()
File "<pyshell#166>", line 3, in foo
print(K)
NameError: name 'K' is not defined
Let's disassemble it and see why:
>>> import dis
>>> dis.dis(foo)
2 0 LOAD_GLOBAL 0 (exec)
3 LOAD_CONST 1 ('K = 89')
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 POP_TOP
3 10 LOAD_GLOBAL 1 (print)
13 LOAD_GLOBAL 2 (K)
16 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
19 POP_TOP
20 LOAD_CONST 0 (None)
23 RETURN_VALUE
The operation that you need to pay attention to is the one labeled "13". This is where the interpreter looks up K in the last line of the function (print(K)). It uses the LOAD_GLOBAL opcode, which fails because "K" is not a global variable name; rather, it's a value in our locals() dict (added by the exec call).
What if we persuaded the compiler to see K as a local variable (by giving it a value before running the exec), so it will know not to look for a global variable that doesn't exist?
def bar():
    K = None
    exec("K = 89")
    print(K)
This function won't give you an error if you run it, but you won't get the expected value printed out:
>>> bar()
None
Let's disassemble it to see why:
>>> dis.dis(bar)
2 0 LOAD_CONST 0 (None)
3 STORE_FAST 0 (K)
3 6 LOAD_GLOBAL 0 (exec)
9 LOAD_CONST 1 ('K = 89')
12 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
15 POP_TOP
4 16 LOAD_GLOBAL 1 (print)
19 LOAD_FAST 0 (K)
22 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
25 POP_TOP
26 LOAD_CONST 0 (None)
29 RETURN_VALUE
Note the opcodes used at "3" and "19". The Python compiler uses STORE_FAST and LOAD_FAST to put the value for the local variable K into slot 0 and later fetch it back out. Using numbered slots is significantly faster than inserting and fetching values from a dictionary like locals(), which is why the Python compiler does it for all local variable access in a function. You can't overwrite a local variable in a slot by modifying the dictionary returned by locals() (as exec does, if you don't pass it a dict to use for its namespace).
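A quick sketch of that limitation, without exec (my own illustration):
def demo():
    K = None
    locals()['K'] = 89   # writes only to the dict snapshot returned here
    print(K)             # still prints None; the fast slot was never touched

demo()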
Indeed, let's try a third version of our function, where we peek into locals() again once we have K defined as a regular local variable:
def baz():
    K = None
    exec("K = 89")
    print(locals())
You won't see 89 in the output this time either!
>>> baz()
{"K": None}
The reason you see the old K value in locals() is explained in the function's documentation:
Update and return a dictionary representing the current local symbol table.
The slot that the local variable K's value is stored in was not changed by the exec call, which only modifies the locals() dict. When you call locals() again, Python "update[s]" the dictionary with the value from the slot, replacing the value stored there by exec.
This is why the docs go on to say:
Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
Your exec call is modifying the locals() dict, and you're discovering that its changes are not always seen by your later code.
On the exec/eval/locals question
At least on the CPython implementation modifications to the locals() dictionary do not actually change the names in the local scope, which is why it's meant to be used read-only. You can change it, and you can see your changes in the dictionary object, but the actual local scope is not changed.
exec() takes two optional dictionary arguments, a global scope and a local scope. It defaults to globals() and locals(), but since changes to locals() aren't "real" outside of the dictionary, exec() only affects the "real" local scope when globals() is locals(), i.e. in a module outside of any function. (So in your case it's failing because it's inside a function scope).
The "better" way to use exec() in this case is to pass in your own dictionary, then operate on the values in that.
def foo():
    exec_scope = {}
    exec("y = 2", exec_scope)
    print(exec_scope['y'])

foo()
In this case, exec_scope is used as the global and local scope for the exec, and after the exec it will contain {'y': 2, '__builtins__': __builtins__} (the builtins are inserted for you if not present)
If you want access to more globals you could do exec_scope = dict(globals()).
Passing in different global and local scope dictionaries can produce "interesting" behavior.
If you pass the same dictionary into successive calls to exec or eval, then they have the same scope, which is why your eval worked (it implicitly used the locals() dictionary).
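For example, a minimal sketch: passing the same dictionary to exec and then to eval makes names assigned by the first visible to the second:
scope = {}
exec("K = 89", scope)
print(eval("K", scope))  # 89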
On dynamic variable names
If you set the name from a string, what's so wrong about getting the value as a string (i.e. what a dictionary does)? In other words, why would you want to set locals()['K'] and then access K? If K is in your source it's not really a dynamically set name... hence, dictionaries.
I defined three functions that should change a global variable x.
def changeXto1():
    global x
    x = 1

def changeXto2():
    from __main__ import x
    x = 2

def changeXto3():
    import __main__
    __main__.x = 3

x = 0
print x
changeXto1()
print x
changeXto2()
print x
changeXto3()
print x
It gives the result:
0
1
1
3
changeXto1 uses the normal global statement. The result is as expected: x == 1. changeXto2 uses from __main__ import x to address x. This doesn't work; afterwards x is still 1. changeXto3 uses import __main__ to address x via __main__.x. The result afterwards is 3, as expected.
Why doesn't from __main__ import work in changeXto2, while import __main__ works in changeXto3? Why do we need the global statement in Python if we can also address global variables via the __main__ module?
This is related to how Python translates your code to bytecode (the compilation step).
When compiling a function, Python treats every variable that is assigned to as a local variable and performs an optimisation to reduce the number of name lookups it has to do. Each local variable is assigned an index, and when the function is called their values are stored in a stack-local array addressed by index. The compiler emits the LOAD_FAST and STORE_FAST opcodes to access these variables.
The global statement instead tells the compiler that even though the variable is assigned a value, it should not be considered local and should not be assigned an index. The compiler will use the LOAD_GLOBAL and STORE_GLOBAL opcodes to access the variable. These opcodes are slower, since they use the name to do a lookup in possibly several dictionaries (globals, builtins).
If a variable is only ever read, the compiler always emits LOAD_GLOBAL for it, since it doesn't know whether it is supposed to be a local or a global variable, and thus assumes it is a global.
So, in your first function, using global x informs the compiler that you want it to treat the write access to x as writing to a global variable instead of a local variable. The opcodes for the function make it clear:
>>> dis.dis(changeXto1)
3 0 LOAD_CONST 1 (1)
3 STORE_GLOBAL 0 (x)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
In your third example, you import the __main__ module into a local variable named __main__ and then assign to its x attribute. Since modules are objects that store all their top-level names as attributes, you are assigning to the variable x in the __main__ module. And as you found, the __main__ module's attributes map directly to the values in the globals() dictionary, because your code is defined in the __main__ module. The opcodes show that you don't access x directly:
>>> dis.dis(changeXto3)
2 0 LOAD_CONST 1 (-1)
3 LOAD_CONST 0 (None)
6 IMPORT_NAME 0 (__main__)
9 STORE_FAST 0 (__main__)
3 12 LOAD_CONST 2 (3)
15 LOAD_FAST 0 (__main__)
18 STORE_ATTR 1 (x)
21 LOAD_CONST 0 (None)
24 RETURN_VALUE
The second example is interesting. Since you assign a value to the x variable, the compiler assumes it is a local variable and applies the optimisation. The from __main__ import x then imports the module __main__ and creates a new binding: the value of x in module __main__ is bound to the local variable named x. This is always the case; from ${module} import ${name} just creates a new binding in the current namespace. When you assign a new value to the variable x you just change that binding, not the binding in module __main__, which is unrelated (though if the value is mutable and you mutate it, the change will be visible through all the bindings). Here are the opcodes:
>>> dis.dis(changeXto2)
2 0 LOAD_CONST 1 (-1)
3 LOAD_CONST 2 (('x',))
6 IMPORT_NAME 0 (__main__)
9 IMPORT_FROM 1 (x)
12 STORE_FAST 0 (x)
15 POP_TOP
3 16 LOAD_CONST 3 (2)
19 STORE_FAST 0 (x)
22 LOAD_CONST 0 (None)
25 RETURN_VALUE
A good way to think about this is that in Python every assignment binds a name to a value in a dictionary, and dereferencing is just a dictionary lookup (this is a rough approximation, but pretty close to the conceptual model). When doing obj.field, you are looking up the "field" key in the hidden dictionary of obj (accessible via obj.__dict__).
When you have a bare variable name, it is looked up in the locals() dictionary, then in the globals() dictionary if that is different (the two are the same when code executes at module level). An assignment always puts the binding in the locals() dictionary, unless you declared that you want global access by writing global ${name} (this syntax also works at top level).
So, translating your functions, this is almost as if you had written:
# NOTE: this is valid Python code, but is less optimal than
# the original code. It is here only for demonstration.
def changeXto1():
    globals()['x'] = 1

def changeXto2():
    locals()['x'] = __import__('__main__').__dict__['x']
    locals()['x'] = 2

def changeXto3():
    locals()['__main__'] = __import__('__main__')
    locals()['__main__'].__dict__['x'] = 3
Why doesn't from __main__ import work in changeXto2, while import __main__ is working in changeXto3?
It works fine, it just doesn't do what you want. It copies the name and value into the local namespace instead of having the code access __main__'s namespace.
Why do we need a global statement in Python if we can address global variables also with the __main__ module?
Because they only do the same thing when your code is running in __main__. If you're running in, say, othermodule after importing it, then __main__ will refer to the main script and not othermodule.
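A hypothetical sketch of that difference (the file othermodule.py and its contents are mine, for illustration):
# othermodule.py
x = 0

def set_via_main():
    import __main__
    __main__.x = 3   # writes x in the main script, not othermodule.x

def set_via_global():
    global x
    x = 3            # writes othermodule.x, no matter where it is imported from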
So AFAIK in CPython, function definitions are compiled into function objects when executed at parse time. But what about inner functions? Do they get compiled into function objects at parse time or do they get compiled (or interpreted) every single time the function is called? Do inner functions incur any performance penalty at all?
To give a general explanation - assuming you have the following code in a module:
def outer(x=1):
    def inner(y=2):
        return x + y
When the file is parsed by python via compile(), the above text is turned into bytecode for how to execute the module. In the module bytecode, there are two "code objects", one for the bytecode of outer() and one for the bytecode inner(). Note that I said code objects, not functions - the code objects contain little more than the bytecode used by the function, and any information that could be known at compile time - such as the bytecode for outer() containing a ref to the bytecode for inner().
When the module actually loads, by evaluating the code object associated with the module, one of the things that happens is that an actual "function object" is created for outer() and stored in the module's outer attribute. The function object acts as a collection of the bytecode and all the context-relevant things needed to call the function (e.g. which globals dict it should pull from) that can't be known at compile time. In a way, a code object is a template for a function, which is in turn a template for execution of the actual bytecode with all variables filled in.
None of this involves inner()-as-a-function yet. Each time you actually get around to calling outer(), a new inner() function object is created for that invocation of outer, binding the already-created inner code object to the current context, including the value of x as passed into that call to outer. As you can imagine, this is pretty fast, since no parsing is needed; it's just filling in a quick struct with pointers to other already-existing objects.
Easy test: the default arguments of a function are evaluated once, when its def statement executes.
>>> def foo():
...     def bar(arg=count()):
...         pass
...     pass
...
>>> def count():
...     print "defined"
...
>>> foo()
defined
>>> foo()
defined
So yes: this is a minor (very very! minor) performance hit.
>>> import dis
>>> def foo():
...     def bar():
...         print "stuff"
...     return bar
...
>>> b = foo()
>>> dis.dis(foo)
2 0 LOAD_CONST 1 (<code object bar at 0x20bf738, file "<stdin>", line 2>)
3 MAKE_FUNCTION 0
6 STORE_FAST 0 (bar)
4 9 LOAD_FAST 0 (bar)
12 RETURN_VALUE
>>> dis.dis(b)
3 0 LOAD_CONST 1 ('stuff')
3 PRINT_ITEM
4 PRINT_NEWLINE
5 LOAD_CONST 0 (None)
8 RETURN_VALUE
I suspect this is heavily implementation dependent, but that was CPython 2.6.6, and the inner function looks like it was compiled. Here's another example:
>>> def foo():
...     def bar():
...         return 1
...     return dis.dis(bar)
...
>>> foo()
3 0 LOAD_CONST 1 (1)
3 RETURN_VALUE
So we can conclude that they are compiled. As for their performance characteristics: use them. If you start having performance issues, profile. I know that's not really an answer, but it almost never matters, and when it does, general answers don't cut it. Function calls incur some overhead, and inner functions behave just like ordinary functions.
To extend nmichaels' answer: inner functions are compiled at compile time, as he guessed, and their bytecode is saved in foo.func_code.co_consts and accessed using the LOAD_CONST opcode, as you can see in the disassembly of the function.
Example:
>>> def foo():
...     def inner():
...         pass
...
>>> print foo.func_code.co_consts
(None, <code object inner at 0x249c6c0, file "<ipython console>", line 2>)
I'm late on this, but as a little experimental complement to these thorough answers: you may use the builtin function id to verify whether a new object is created or not:
In []: # inner version
def foo():
    def bar():
        return id(bar)
    return bar()

foo(), foo()
Out[]: (4352951432, 4352952752)
The actual numbers may differ, but their difference indicates that two distinct instances of bar are indeed created.
In []: # outer version
def bar():
    return id(bar)

def foo():
    return bar()

foo(), foo()
Out[]: (4352950952, 4352950952)
This time, as expected, the two ids are the same.
Now for some timeit measurements. Inner first, outer second:
100000 loops, best of 3: 1.93 µs per loop
1000000 loops, best of 3: 1.25 µs per loop
So, on my machine, it seems that the inner version is 50% slower (Python 2.7, IPython Notebook).
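For reference, a sketch of how such timings might be reproduced with the timeit module (an assumption on my part; the original numbers came from an IPython Notebook, presumably via %timeit):
import timeit

inner = """
def foo():
    def bar():
        return id(bar)
    return bar()
"""

outer = """
def bar():
    return id(bar)

def foo():
    return bar()
"""

print timeit.timeit("foo()", setup=inner)   # inner version
print timeit.timeit("foo()", setup=outer)   # outer version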