Borg pattern unexpected behaviour when used in two different modules - python

I am using the Borg pattern with mutual inclusion of modules. See the example code (not the real code but it shows the problem) below. In this case, I have two different Borgs because the class names (and I guess the class) are seen as different by the interpreter.
Is there a way to use the Borg in that case without reworking the module architecture?
Module borg.py
import borg2
class Borg:
    _we_are_one = {}
    def __init__(self):
        self.__dict__ = Borg._we_are_one
        try:
            self.name
        except AttributeError:
            self.name = "?"
        print self.__class__, id(self.__dict__)
def fct_ab():
    a = Borg()
    a.name = "Bjorn"
    b = Borg()
    print b.name
if __name__ == "__main__":
    fct_ab()
    borg2.fct_c()
Module borg2.py
import borg
def fct_c():
    c = borg.Borg()
    print c.name
The result is
__main__.Borg 40106720
__main__.Borg 40106720
Bjorn
borg.Borg 40106288
?
In order to clarify my problem:
Why does Python consider __main__.Borg and borg.Borg as two different classes?

After a long day of struggling with Singletons and Borg, my conclusion is the following:
It seems that a Python module imported multiple times using different 'import paths' is actually imported multiple times. If that module contains a singleton, you get multiple instances.
Example:
myproject/
    module_A
    some_folder/
        module_B
        module_C
If module_A imports module_C using from myproject.some_folder import module_C and module_B imports the same module_C using import module_C, the module is actually imported twice (at least according to my observations). Usually, this doesn't matter, but for singletons or borg, you actually get 2 instances of what should be unique. (That's 2 sets of borgs sharing 2 different internal states).
Solution: Give yourself an import statement convention and stick to it: I import all modules starting from a common root folder, even if the module file is located parallel to the one I am working on, so in the example above, both module_A and module_B import module_C using from myproject.some_folder import module_C.
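The double import described above can be reproduced with a minimal sketch. It assumes Python 3 and that both the project root and some_folder are on sys.path, which is what makes the short form import module_C resolvable. The temporary layout and the shared_state name are made up for illustration.

```python
import importlib
import os
import sys
import tempfile


def import_same_file_twice():
    """Import one file under two names and report what the two modules share."""
    with tempfile.TemporaryDirectory() as root:
        package = os.path.join(root, "myproject")
        folder = os.path.join(package, "some_folder")
        os.makedirs(folder)
        for directory in (package, folder):
            open(os.path.join(directory, "__init__.py"), "w").close()
        with open(os.path.join(folder, "module_C.py"), "w") as fh:
            fh.write("shared_state = {}\n")

        # Assumption: both directories are import roots, as in the post above.
        sys.path[:0] = [root, folder]
        try:
            importlib.invalidate_caches()
            long_name = importlib.import_module("myproject.some_folder.module_C")
            short_name = importlib.import_module("module_C")
            return (
                long_name is short_name,
                long_name.shared_state is short_name.shared_state,
                os.path.samefile(long_name.__file__, short_name.__file__),
            )
        finally:
            # Undo the changes to the interpreter state.
            sys.path.remove(root)
            sys.path.remove(folder)
            for name in ("module_C", "myproject.some_folder.module_C",
                         "myproject.some_folder", "myproject"):
                sys.modules.pop(name, None)


same_module, same_state, same_file = import_same_file_twice()
print(same_module, same_state, same_file)  # False False True
```

The same file on disk ends up as two entries in sys.modules, each with its own module-level state. That is why a Borg or singleton defined there exists twice.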

The problem only occurs in your main-function. Move that code
to its own file and everything is as you'd expect. This code
import borg
import borg2
if __name__ == "__main__":
    borg.fct_ab()
    borg2.fct_c()
delivers this output:
borg.Borg 10438672
borg.Borg 10438672
Bjorn
borg.Borg 10438672
Bjorn

The class names are not the problem. Python sees __main__.Borg and borg.Borg as different classes because you run borg.py as a script: the file is loaded once under the name __main__, and import borg then loads it again as a separate module named borg. Python does not realize that __main__ and borg are the same file.
The solution is easy. Change fct_ab to:
def fct_ab():
    import borg
    a = borg.Borg()
    a.name = "Bjorn"
    b = borg.Borg()
    print b.name
This solves the problem.

I've fixed the issue in my real application by fixing an error in the import.
In fact, I have two different modules using the same 3rd module.
The 1st one was importing mypackage.mymodule while the 2nd one was importing mymodule. mypackage is installed as a python egg and the code I was working on is on my development folder.
So both codes were importing different modules and I guess that it is normal to have two different classes in this case.
Regarding the example code I used, the problem comes from the module being run receiving __main__ as its name. I tried renaming it with __name__ = 'borg'. That works, but it breaks the if __name__ == "__main__" condition. In conclusion, I would say that mutual imports should be avoided and are in most cases unnecessary.
Thanks all for your help.

The solution --- as has already been mentioned --- is to avoid a recursive import of the main module, but borg.py is not being "imported twice". The problem is that importing it at all while it is already executing causes you to define the Borg class twice, in two different namespaces.
To demonstrate, I added a few lines to the top of both borg.py and borg2.py, and inserted my print_module function before and after most points of interest:
#!/usr/bin/env python2
from __future__ import print_function
def print_module(*args, **kwargs):
    print(__name__ + ': ', end='')
    print(*args, **kwargs)
    return
print_module('Importing module borg2...')
import borg2
print_module('Module borg2 imported.')
print_module('Defining class Borg...')
class Borg:
    ...
# etc.
The output is:
__main__: Importing module borg2...
borg2: Importing module borg...
borg: Importing module borg2...
borg: Module borg2 imported.
borg: Defining class Borg...
borg: id(_we_are_one) = 17350480
borg: Class Borg defined.
borg: id(Borg) = 139879572980464
borg: End of borg.py.
borg2: Module borg imported.
borg2: End of borg2.py.
__main__: Module borg2 imported.
__main__: Defining class Borg...
__main__: id(_we_are_one) = 17351632
__main__: Class Borg defined.
__main__: id(Borg) = 139879572981136
__main__: Borg 17351632
__main__: Borg 17351632
__main__: Bjorn
borg: Borg 17350480
borg2: ?
__main__: End of borg.py.
The first thing borg.py does (not counting the bits I added) is import borg2 into the __main__ namespace. This happens before the Borg class is defined anywhere.
The first thing borg2 does is import borg, which again attempts to import borg2... and Python does not run it a second time, because borg2 is already registered in sys.modules (although only partially initialized). (Note that nothing happens between lines 3 and 4 of the output.) borg finally defines the Borg class and the fct_ab function in the borg namespace, and exits.
borg2 then defines fct_c and exits ("borg2: End of borg2.py."). All the import statements are done.
Now, borg.py finally gets to execute "for real". Yes, it already ran once when it was imported, but this is still the "first" time through the borg.py file. The Borg class gets defined again, this time in the __main__ namespace, and both the class and its dictionary have new IDs.
borg.py was not "imported twice". It was executed once from the command line, and it was executed once when it was imported. Since these happened in two different namespaces, the "second" definition of Borg did not replace the first, and the two functions modified two different classes, which just happened to be created from the same code.
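A minimal sketch of the same effect, assuming Python 3 (the question uses Python 2). The script name selfimport.py is hypothetical. The script defines Borg, imports itself under its file name, and compares what it finds in the two namespaces.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical script: it defines Borg, then imports itself under its file name.
SCRIPT = textwrap.dedent('''
    import sys

    class Borg:
        _we_are_one = {}

    if __name__ == "__main__":
        import selfimport  # executes this file again, as module "selfimport"
        print(sys.modules["__main__"] is selfimport)
        print(Borg is selfimport.Borg)
        print(Borg._we_are_one is selfimport.Borg._we_are_one)
''')


def run_script():
    """Write the script to a temp dir, run it as __main__, return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "selfimport.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        done = subprocess.run([sys.executable, path], capture_output=True,
                              text=True, check=True)
    return done.stdout.split()


output = run_script()
print(output)  # ['False', 'False', 'False']
```

All three comparisons are False: the module object, the class, and the shared dictionary each exist once under __main__ and once under the real module name.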


Excluding modules when importing everything in __init__.py

Problem
Consider the following layout:
package/
    main.py
    math_helpers/
        mymath.py
        __init__.py
mymath.py contains:
import math
def foo():
    pass
In main.py I want to be able to use code from mymath.py like so:
import math_helpers
math_helpers.foo()
In order to do so, __init__.py contains:
from .mymath import *
However, modules imported in mymath.py are now in the math_helpers namespace, e.g. math_helpers.math is accessible.
Current approach
I'm adding the following at the end of mymath.py.
import types
__all__ = [name for name, thing in globals().items()
if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
This seems to work, but is it the correct approach?
On the one hand there are many good reasons not to do star imports, but on the other hand, python is for consenting adults.
__all__ is the recommended approach to determining what shows up in a star import. Your approach is correct, and you can further sanitize the namespace when finished:
import types
__all__ = [name for name, thing in globals().items()
if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
del types
While less recommended, you can also sanitize elements directly out of the module, so that they don't show up at all. This will be a problem if you need to use them in a function defined in the module, since every function object has a __globals__ reference that is bound to its parent module's __dict__. But if you only import math_helpers to call math_helpers.foo(), and don't require a persistent reference to it elsewhere in the module, you can simply unlink it at the end:
del math_helpers
Long Version
A module import runs the code of the module in the namespace of the module's __dict__. Any names that are bound at the top level, whether by class definition, function definition, direct assignment, or other means, live in that dictionary. Sometimes, it is desirable to clean up intermediate variables, as I suggested doing with types.
Let's say your module looks like this:
test_module.py
import math
import numpy as np
def x(n):
    return math.sqrt(n)
class A(np.ndarray):
    pass
import types
__all__ = [name for name, thing in globals().items()
           if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
In this case, __all__ will be ['x', 'A']. However, the module itself will contain the following names: 'math', 'np', 'x', 'A', 'types', '__all__'.
If you run del types at the end, it will remove that name from the namespace. Clearly this is safe because types is not referenced anywhere once __all__ has been constructed.
Similarly, if you wanted to remove np by adding del np, that would be OK. The class A is fully constructed by the end of the module code, so it does not require the global name np to reference its parent class.
Not so with math. If you were to do del math at the end of the module code, the function x would not work. If you import your module, you can see that x.__globals__ is the module's __dict__:
import test_module
test_module.__dict__ is test_module.x.__globals__
If you delete math from the module dictionary and call test_module.x, you will get
NameError: name 'math' is not defined
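A small sketch of that failure, assuming Python 3. The module is built in memory with types.ModuleType and exec rather than loaded from a real test_module.py, purely to keep the example self-contained.

```python
import types

# Hypothetical stand-in for test_module.py, built in memory instead of on disk.
SOURCE = '''
import math

def x(n):
    return math.sqrt(n)
'''

test_module = types.ModuleType("test_module")
exec(SOURCE, test_module.__dict__)

# The function looks up global names in the module's own dictionary.
shares_globals = test_module.x.__globals__ is test_module.__dict__
before = test_module.x(9.0)

del test_module.math  # same effect as `del math` at the end of the module
try:
    test_module.x(9.0)
    error = None
except NameError as exc:
    error = str(exc)

print(shares_globals, before, error)
```

The call works until the name math is removed from the module dictionary. After that the lookup inside x fails at call time.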
So under some very special circumstances you may be able to sanitize the namespace of mymath.py, but that is not the recommended approach, as it only applies to certain cases.
In conclusion, stick to using __all__.
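To see what __all__ changes for a star import, here is a hedged sketch assuming Python 3. The module name mymath_demo and the helper star_import_names are made up for the example, and the modules are built in memory. The list(...) around globals().items() is a defensive addition of mine, not part of the answer's snippet.

```python
import sys
import types

WITHOUT_ALL = '''
import math

def foo():
    return math.sqrt(16)
'''

# Same module, plus the __all__ construction from the answer.
WITH_ALL = WITHOUT_ALL + '''
import types
__all__ = [name for name, thing in list(globals().items())
           if not (name.startswith('_') or isinstance(thing, types.ModuleType))]
del types
'''


def star_import_names(source, module_name="mymath_demo"):
    """Return the public names that `from module_name import *` would bind."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    namespace = {}
    try:
        exec("from %s import *" % module_name, namespace)
    finally:
        del sys.modules[module_name]
    return sorted(name for name in namespace if not name.startswith("__"))


without_all = star_import_names(WITHOUT_ALL)
with_all = star_import_names(WITH_ALL)
print(without_all)  # ['foo', 'math']
print(with_all)     # ['foo']
```

Note that __all__ only filters what the star import copies. The name math is still an attribute of the module itself.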
A Story That's Sort of Relevant
One time, I had two modules that implemented similar functionality, but for different types of end users. There were a couple of functions that I wanted to copy out of module a into module b. The problem was that I wanted the functions to work as if they had been defined in module b. Unfortunately, they depended on a constant that was defined in a. b defined its own version of the constant. For example:
a.py
value = 1
def x():
    return value
b.py
from a import x
value = 2
I wanted b.x to access b.value instead of a.value. I pulled that off by adding the following to b.py (based on https://stackoverflow.com/a/13503277/2988730):
import functools, types
x = functools.update_wrapper(types.FunctionType(x.__code__, globals(), x.__name__, x.__defaults__, x.__closure__), x)
x.__kwdefaults__ = x.__wrapped__.__kwdefaults__
x.__module__ = __name__
del functools, types
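The rebinding trick can be tried in isolation. This is only a sketch, assuming Python 3. The dictionaries a_globals and b_globals are hypothetical stand-ins for the global namespaces of modules a and b.

```python
import functools
import types

# Hypothetical stand-ins for the global namespaces of modules a and b.
a_globals = {"__name__": "a", "value": 1}
b_globals = {"__name__": "b", "value": 2}

# Define x "in module a".
exec("def x():\n    return value\n", a_globals)
x_from_a = a_globals["x"]

# Build a new function from the same code object, but bound to b's globals.
x_for_b = functools.update_wrapper(
    types.FunctionType(x_from_a.__code__, b_globals, x_from_a.__name__,
                       x_from_a.__defaults__, x_from_a.__closure__),
    x_from_a,
)

print(x_from_a(), x_for_b())  # 1 2
```

Both functions share one code object. They differ only in the dictionary used to resolve the global name value.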
Why am I telling you all this? Well, you can make a version of your module that does not have any stray names in its namespace. You won't be able to see changes to global variables in your functions, though. This is just an exercise in pushing Python beyond its normal usage. I strongly recommend against doing this, but here is a sample module that effectively freezes its __dict__ as far as the functions are concerned. This has the same members as test_module above, but with no modules in the global namespace:
import math
import numpy as np
def x(n):
    return math.sqrt(n)
class A(np.ndarray):
    pass
import functools, types, sys
def wrap(obj):
    """ Written this way to be able to handle classes """
    for name in dir(obj):
        if name.startswith('_'):
            continue
        thing = getattr(obj, name)
        if isinstance(thing, types.FunctionType) and thing.__module__ == __name__:
            # Rebuild the function so that it resolves globals in the frozen copy d
            setattr(obj, name,
                    functools.update_wrapper(types.FunctionType(thing.__code__, d, thing.__name__, thing.__defaults__, thing.__closure__), thing))
            getattr(obj, name).__kwdefaults__ = thing.__kwdefaults__
        elif isinstance(thing, type) and thing.__module__ == __name__:
            wrap(thing)
d = globals().copy()
wrap(sys.modules[__name__])
del d, wrap, sys, math, np, functools, types
So yeah, please don't ever do this! But if you do, stick it in a utility class somewhere.

Python Importing with OOP

This question concerns when you should have imports for Python modules and how it all interacts when you are trying to take an OOP approach to what you're making.
Let's say we have the following Modules:
ClassA.py:
class Class_A:
    def doSomething(self):
        pass  # doSomething
ClassB.py
class Class_B:
    def doSomethingElse(self):
        pass  # doSomethingElse
ClassC.py
class Class_C:
    def __init__(self, ClassAobj, ClassBobj):
        self.a = ClassAobj
        self.b = ClassBobj
    def doTheThing(self):
        self.a.doSomething()
        self.b.doSomethingElse()
Main.py:
from ClassA import Class_A
from ClassB import Class_B
from ClassC import Class_C
a = Class_A()
b = Class_B()
c = Class_C(a,b)
In here Class_C uses objects of Class_A and Class_B however it does not have import statements for those classes. Do you see this creating errors down the line, or is this fine? Is it bad practice to do this?
Would having imports for Class_A and Class_B inside of Class_C cause the program as a whole to use more memory since it would be importing them for both Main.py and ClassC.py? Or will the Python compiler see that those modules have already been imported and just skip over them?
I'm just trying to figure out how Python as a language ticks with concerns to importing and using modules. Basically, if at the topmost level of your program (your Main function) if you import everything there, would import statements in other modules be redundant?
You don't use Class_A or Class_B directly in Class_C, so you don't need to import them there.
Extra imports don't really use extra memory, there is only a single instance of each module in memory. Import just creates a name for the module in the current module namespace.
In Python, it's not idiomatic to have a single class per file. It's normal to have closely related classes all in the same file. A module name "ClassA" looks silly, that is the name of a class, not of a module.
You can only use a module inside another one if it's imported there. For instance the sys module is probably already in memory after Python starts, as so many things use it, including import statements.
An import foo statement does two things:
If the foo module is not in memory yet, it is loaded, parsed, executed and then placed in sys.modules['foo'].
A local name foo is created that also refers to the module in sys.modules.
So if you have say a print() in your module (not inside a function), then that is only executed the first time the module is imported.
Then later statements after the import can do things with foo, like foo.somefunc() or print(foo.__name__).
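The two steps can be observed directly. A minimal sketch, assuming Python 3. The module name noisy_demo is made up and the module is written to a temporary directory.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile


def import_twice():
    """Import a module twice and report how often its top-level code ran."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "noisy_demo.py"), "w") as fh:
            fh.write("print('top-level code of noisy_demo runs')\n")
        sys.path.insert(0, tmp)
        captured = io.StringIO()
        try:
            importlib.invalidate_caches()
            with contextlib.redirect_stdout(captured):
                import noisy_demo as first   # loads, executes, caches
                import noisy_demo as second  # only binds a name to the cached module
            cached = sys.modules["noisy_demo"]
            return (captured.getvalue().count("runs"),
                    first is second,
                    first is cached)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("noisy_demo", None)


runs, same_object, from_cache = import_twice()
print(runs, same_object, from_cache)  # 1 True True
```

The print in the module fires once. The second import statement only creates another name for the object already stored in sys.modules.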
C does not need the import statements; all it uses is a pair of object references. Calling methods or reading attributes on those objects also works without an import, because the lookup goes through the object itself. You need the import statements only if ClassC.py refers to the names Class_A or Class_B directly, for example to create new instances.
This will not cause additional memory usage in Main: Python checks (as do most languages) packages already imported, and will not import one multiple times. Note that this sometimes means that you have to be careful of package dependencies and importation order.
Importing a module does two things: it executes the code stored in the module, and it adds name bindings to the module doing the importing. ClassC.py doesn't need to import ClassA or ClassB because it doesn't know or care what types the arguments to ClassC.__init__ have, as long as they behave properly when used. Any references to code needed by either object is stored in the object itself.

Python Importing - explanation

Similar Question: Understanding A Chain of Imports in Python
NB: I'm using Python 3.3
I have setup the following two files in the same directory to explain importing to myself, however I still don't get exactly what it's doing. I understand function and class definitions are statements that need to run.
untitled.py:
import string
class testing:
    def func(self):
        try:
            print(string.ascii_lowercase)
        except:
            print('not imported')
class second:
    x=1
print('print statement in untitled executed')
stuff.py:
from untitled import testing
try:
    t=testing()
    t.func()
except NameError:
    print('testing not imported')
try:
    print(string.ascii_uppercase)
except NameError:
    print('string not imported')
try:
    print(untitled.string.ascii_uppercase)
except NameError:
    print('string not imported in untitled')
try:
    s=second()
    print(s.x)
except NameError:
    print('second not imported')
This is the output I get from running stuff.py:
print statement in untitled executed
abcdefghijklmnopqrstuvwxyz
string not imported
string not imported in untitled
second not imported
The print statement in untitled.py is executed despite the import in stuff.py specifying only the testing class. Moreover what is the string module's relation inside stuff.py, as it can be called from within the testing class yet not from the outside.
Could somebody please explain this behaviour to me: what exactly does a "from import" statement do (what does it run)?
You can think of python modules as namespaces. Keep in mind that imports are not includes:
modules are only imported once
the first time, the top level code is executed
any imports, variable, function or class declarations affects only the module local namespace
Suppose you have a module called foo.py:
import eggs
bar = "Lets drink, it's a bar"
So when you do a from foo import bar in another module, you will make bar available in the current namespace. The module eggs will be available under foo.eggs if you do an import foo. If you do a from foo import *, then eggs, bar and everything else in the module namespace will be also in the current namespace - but never do that, wildcard imports are frowned upon in Python.
If you do a import foo and then import eggs, the top level code at eggs will be executed once and the module namespace will be stored in the module cache: if another module imports it the information will be pulled from this cache. If you are going to use it, then import it - no need to worry about multiple imports executing the top level code multiple times.
Python programmers are very fond of namespaces; I always try to use import foo and then foo.bar instead of from foo import bar if possible - it keeps the namespace clean and prevent name clashes.
That said, the import mechanism is hackable, you can make python import statement work even with files that are not python.
The from statement isn't any different to import with regard to loading behaviour. Always the top level code is executed, when loading the module. from just controls which parts of the loaded module are being added to the current scope (the first point is most important):
The from form uses a slightly more complex process:
find the module specified in the from clause loading and initializing it if necessary;
for each of the identifiers specified in the import clauses:
check if the imported module has an attribute by that name
if not, attempt to import a submodule with that name and then check the imported module again for that attribute
if the attribute is not found, ImportError is raised.
otherwise, a reference to that value is bound in the local namespace, using the name in the as clause if it is present, otherwise using the attribute name
Thus you can access the contents of a module partially imported with from with this inelegant trick:
print(sys.modules['untitled'].string.ascii_uppercase)
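A short sketch of what the from form binds and what it leaves out, assuming Python 3. The module name untitled_demo is made up, and the import statement is run with exec in a fresh dictionary so the bound names can be inspected.

```python
import importlib
import os
import sys
import tempfile

# Cut-down, hypothetical version of untitled.py from the question.
UNTITLED = '''
import string

class testing:
    pass

class second:
    x = 1
'''


def from_import_effects():
    """Run `from untitled_demo import testing` and report what got bound."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "untitled_demo.py"), "w") as fh:
            fh.write(UNTITLED)
        sys.path.insert(0, tmp)
        namespace = {}
        try:
            importlib.invalidate_caches()
            exec("from untitled_demo import testing", namespace)
            module = sys.modules["untitled_demo"]
            return {
                "bound": sorted(n for n in namespace if not n.startswith("__")),
                "module_has_second": hasattr(module, "second"),
                "via_sys_modules": module.string.ascii_uppercase[:3],
            }
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("untitled_demo", None)


effects = from_import_effects()
print(effects)
```

Only testing is bound in the importing namespace, yet the whole module was executed: second exists on the module object, and string is reachable through sys.modules.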
In your first file (untitled.py), when Python imports the module it compiles and executes the whole file: it creates the two class objects and runs the print statement. Note that the print also happens if you run untitled.py from the command line.
In your second file (stuff.py), to add to Paulo's comments: you have imported only the testing class into your namespace, so only that one of the two classes from untitled.py is available.
However if you just say
import untitled
your 3rd "try" statement will work, since it will have untitled in its namespace.
Next thing. try importing untitled.testing :)

Python Etiquette: Importing Modules

Say I have two Python modules:
module1.py:
import module2
def myFunct(): print "called from module1"
module2.py:
def myFunct(): print "called from module2"
def someFunct(): print "also called from module2"
If I import module1, is it better etiquette to re-import module2, or just refer to it as module1.module2?
For example (someotherfile.py):
import module1
module1.myFunct() # prints "called from module1"
module1.module2.myFunct() # prints "called from module2"
I can also do this: module2 = module1.module2. Now, I can directly call module2.myFunct().
However, I can change module1.py to:
from module2 import *
def myFunct(): print "called from module1"
Now, in someotherfile.py, I can do this:
import module1
module1.myFunct() # prints "called from module1"; overrides module2
module1.someFunct() # prints "also called from module2"
Also, by importing *, help('module1') shows all of the functions from module2.
On the other hand, (assuming module1.py uses import module2), I can do:
someotherfile.py:
import module1, module2
module1.myFunct() # prints "called from module1"
module2.myFunct() # prints "called from module2"
Again, which is better etiquette and practice? To import module2 again, or to just refer to module1's importation?
Quoting the PEP 8 style guide:
When importing a class from a class-containing module, it's usually okay to spell this:
from myclass import MyClass
from foo.bar.yourclass import YourClass
If this spelling causes local name clashes, then spell them
import myclass
import foo.bar.yourclass
Emphasis mine.
Don't use module1.module2; you are relying on the internal implementation details of module1, which may later change what imports it is using. You can import module2 directly, so do so unless otherwise documented by the module author.
You can use the __all__ convention to limit what is imported from a module with from modulename import *; the help() command honours that list as well. Listing the names you explicitly export in __all__ helps clean up the help() text presentation:
The public names defined by a module are determined by checking the module’s namespace for a variable named __all__; if defined, it must be a sequence of strings which are names defined or imported by that module. The names given in __all__ are all considered public and are required to exist. If __all__ is not defined, the set of public names includes all names found in the module’s namespace which do not begin with an underscore character ('_'). __all__ should contain the entire public API. It is intended to avoid accidentally exporting items that are not part of the API (such as library modules which were imported and used within the module).
Just import module2. Re-importing is relatively costless, since Python caches module objects in sys.modules.
Moreover, chaining dots as in module1.module2.myFunct is a violation of the Law of Demeter. Perhaps some day you will want to replace module1 with some other module module1a which does not import module2. By using import module2, you will avoid having to rewrite all occurrences of module1.module2.myFunct.
from module2 import * is generally a bad practice since it makes it hard to trace where variables come from. And mixing module namespaces can create variable-name conflicts. For example, from numpy import * is a definite no-no, since doing so would override Python's builtin sum, min, max, any, all, abs and round.

Python import as global name not defined

I have an application that runs on Postgres & Mysql. Each program checks to determine the database and then imports either postgres_db as db_util or mysql_dt as db_util. All works well when code in main references db_util, but if a class is imported, the reference to db_util is not defined.
I created the following two classes and a main script to test the problem, and found another interesting side effect. Classes B & C reference ClassA under different import cases. B & C are identical except that B is in main and C is imported.
ClassX.py
class ClassA(object):
    def print_a(self):
        print "this is class a"
class ClassC(object):
    def ref_a(self):
        print 'from C ref a ==>',
        xa=ClassA()
        xa.print_a()
    def ref_ca(self):
        print 'from C ref ca ==>',
        xa=ca()
        xa.print_a()
test_scope.py
from classes.ClassX import ClassA
from classes.ClassX import ClassA as ca
from classes.ClassX import ClassC as cb
class ClassB(object):
    def ref_a(self):
        print 'from B ref a ==>',
        xa=ClassA()
        xa.print_a()
    def ref_ca(self):
        print 'from B ref ca ==>',
        xa=ca()
        xa.print_a()
print 'globals:',dir()
print 'modules','ca:',ca,'cb:',cb,'CA:',ClassA
print ''
print 'from main'
xb=ClassB()
xb.ref_a()
xb.ref_ca()
print ''
print 'from imports'
xbs=cb()
xbs.ref_a()
xbs.ref_ca()
And the results:
globals: ['ClassA', 'ClassB', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'ca', 'cb']
modules ca: <class 'classes.ClassX.ClassA'> cb: <class 'classes.ClassX.ClassC'> CA: <class 'classes.ClassX.ClassA'>
from main
from B ref a ==> this is class a
from B ref ca ==> this is class a
from imports
from C ref a ==> this is class a
from C ref ca ==>
Traceback (most recent call last):
  File "test_scope.py", line 32, in <module>
    xbs.ref_ca()
  File "R:\python\test_scripts\scope\classes\ClassX.py", line 13, in ref_ca
    xa=ca()
NameError: global name 'ca' is not defined
Press any key to continue . . .
From my test, I see that the object ca (imported with as) is not available to ClassC; however, the class ClassA (imported without as) is available.
Why do import and import as behave differently? I am unclear why main's globals are not available to the classes that main imports.
What is a good approach to dynamically determine appropriate db_util module to import and have it accessible to other imported classes?
Update:
After reading yet another post on Namespaces: "Visibility of global variables from imported modules", I understand that in my example above the reason classA is available to ClassC is that A & C are in the same imported file, thus the same namespace.
So the remaining question is a design question:
if I have code like this:
if db == 'MySQL':
    from mysql_db import db_util
elif db == 'Postgres':
    from postgres_db import db_util
What is a good approach to make db_util available to all imported modules?
UPDATE:
Following the response by Blckknght, I added the code
cb.ca = ca
to the test_scope script. This requires the class call xa=ca() to be changed to xa=self.ca(). I also think that adding objects to a class from outside the class, though Python allows it, is not a good design methodology and will make debugging a nightmare.
However, since I think modules and classes should be standalone or specifically declare their dependencies, I am going to implement the class like this, using the code sample from above.
break out ClassA and ClassC to separate modules and at the top of ClassC, before the class definition, do the imports
from ClassA import ClassA
from ClassA import ClassA as ca
class ClassB(object):
and in my real situation, where I need to import the db_util module into several modules
ci.py #new module to select class for appropriate db
if db == 'MySQL':
    from mysql_db import db_util
elif db == 'Postgres':
    from postgres_db import db_util
in each module needing the db_util class
import ci
db_util=ci.db_util #add db_util to module globals
class Module(object):
One problem with this is it requires each module using the db_util to import it, but it does make the dependencies known.
I will close this question and want to thank Blckknght and Armin Rigo for their responses which help clarify this issue for me. I would appreciate any design related feedback.
In Python, each module has its own global namespace. When you do an import, you're only adding the imported modules to the current module's namespace, not to the namespace of any other module. If you want to put it in another namespace, you need to tell Python this explicitly.
main.py:
if db == "mysql": # or whatever your real logic is
    import mysql_db as db_util
elif db == "postgres":
    import postgres_db as db_util
import some_helper_module
some_helper_module.db_util = db_util # explicitly add to another namespace
#...
Other modules:
import some_helper_module
db = some_helper_module.db_util.connect() # or whatever the real API is
#...
Note that you can't usually use your main module (which is executed as a script) as a shared namespace. That's because Python uses the module's __name__ attribute to determine how to cache the module (so that you always get the same object from multiple imports), but a script is always given a __name__ of "__main__" rather than its real name. If another module imports main, they'll get a separate (duplicate) copy!
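The helper-module approach above can be sketched end to end without real database drivers. Everything here is an assumption for illustration: the backends are fake in-memory modules with a made-up connect() function, and other_module stands in for any module that needs db_util.

```python
import sys
import types


def make_fake_backend(name):
    """Build a hypothetical stand-in for mysql_db / postgres_db."""
    backend = types.ModuleType(name)
    backend.connect = lambda: "connection from " + name
    return backend


# The shared namespace; in a real project this would be an ordinary .py file.
some_helper_module = types.ModuleType("some_helper_module")
sys.modules["some_helper_module"] = some_helper_module

# "main": choose the backend once and publish it in the shared module.
db = "postgres"
backends = {
    "mysql": make_fake_backend("mysql_db"),
    "postgres": make_fake_backend("postgres_db"),
}
some_helper_module.db_util = backends[db]

# "Other module": imports the shared module and looks db_util up at call time.
OTHER_MODULE = '''
import some_helper_module

def open_connection():
    return some_helper_module.db_util.connect()
'''
other_module = types.ModuleType("other_module")
exec(OTHER_MODULE, other_module.__dict__)

result = other_module.open_connection()
del sys.modules["some_helper_module"]  # clean up the demo entry
print(result)  # connection from postgres_db
```

The other module never names a concrete backend. It depends only on the shared module, and the attribute lookup happens when open_connection is called, after main has made its choice.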
You're approaching the problem with the wrong point of view. Every module is a namespace that starts empty and is filled with a name for (typically) each statement it runs:
import foo => defines foo
from foo import bar => defines bar
from foo import bar as baz => defines baz
class kls:
    pass => defines kls
def fun():
    pass => defines fun
var = 6 * 7 => defines var
Looking at ClassX.py we see that the name ca is not defined in this module, but ClassA and ClassC are. So the execution of the line xa=ca() from ClassX.py fails.
In general, the idea is that every single module imports what it needs. You can also patch a name "into" a module from outside, but this is generally considered very bad style (reserved for very special cases).
