How to debug a sys.modules entry that gets overwritten?

I'm trying to debug an issue that causes sys.modules['numpy'] to get overwritten. I've added some print statements to numpy.__init__, and when I try to import numpy, I get this output:
numpy.__init__ running
id(sys.modules) = 89034704
id(sys.modules['numpy']) = 161528304
numpy.__init__ running
id(sys.modules) = 89034704
id(sys.modules['numpy']) = 177135864
Numpy has a number of circular imports, which should work as described in this answer. But in my case, instead of getting the partially initialized numpy module from sys.modules, numpy gets imported again, and numpy.__init__ gets executed a second time, leading to a crash.
How can I instrument sys.modules to get some visibility into who is overwriting sys.modules['numpy'] and when? Normally I would write a dict subclass, but I don't think it's safe to change sys.modules to point to my own object. I tried overriding sys.modules.__setattr__, but that's a read-only attribute.
Context: I'm trying to debug this issue in PyCall, a Julia library. PyCall embeds a Python interpreter in a running Julia process, and delegates the import to PyImport_ImportModule from cpython. The problem above happens inside a single call to PyImport_ImportModule, so I hope this question should be answerable with knowledge of python / cpython, but without knowledge of Julia / PyCall.

You can change sys.modules from a plain dict to one that prints out assignments, e.g.:
import sys
import traceback

class noisydict(dict):
    def __setitem__(self, key, value):
        print('ASSIGNED: key={!r} value={!r} at:'.format(key, value))
        traceback.print_stack()
        return dict.__setitem__(self, key, value)

sys.modules = noisydict(sys.modules)
This may or may not work if the overwriting happens in C code (C code typically calls PyDict_SetItem directly, bypassing your __setitem__ override, rather than going through sys.modules[name] = newmodule as Python code would), but it's worth a try!
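If the overwriting is caused by a repeated import, another way to get a stack trace (a sketch of mine, not part of the original answer) is a passive finder at the front of sys.meta_path; the import machinery consults it even when the import is initiated from C via PyImport_ImportModule:
import sys
import traceback

class ImportLogger:
    # a passive meta path finder: log the request, then return None
    # so the normal finders proceed unchanged
    def find_spec(self, fullname, path=None, target=None):
        if fullname == 'numpy':
            print('import of {!r} requested by:'.format(fullname))
            traceback.print_stack()
        return None

sys.meta_path.insert(0, ImportLogger())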

Thanks to @BrenBarn for pointing me to https://stackoverflow.com/a/14778568/744071. The following worked for my purposes:
importhack.py:
import traceback

old_import = __import__

def my_import(module, *args, **kwargs):
    # Python 2 print statement, matching the original answer
    print "my_import({}) caused by:".format(module)
    traceback.print_stack()
    return old_import(module, *args, **kwargs)

__builtins__['__import__'] = my_import
Usage:
>>> import importhack
>>> import numpy
I believe the original problem in PyCall.jl was caused by calling PyImport_ImportModule before the Python interpreter was fully initialized.

Related

Python global variable in import * [duplicate]

I've run into a bit of a wall importing modules in a Python script. I'll do my best to describe the error, why I run into it, and why I'm trying this particular approach to solve my problem (which I will describe in a second):
Let's suppose I have a module in which I've defined some utility functions/classes, which refer to entities defined in the namespace into which this auxiliary module will be imported (let "a" be such an entity):
module1:
def f():
    print a
And then I have the main program, where "a" is defined, into which I want to import those utilities:
import module1
a=3
module1.f()
Executing the program will trigger the following error:
Traceback (most recent call last):
  File "Z:\Python\main.py", line 10, in <module>
    module1.f()
  File "Z:\Python\module1.py", line 3, in f
    print a
NameError: global name 'a' is not defined
Similar questions have been asked in the past (two days ago, d'uh) and several solutions have been suggested, however I don't really think these fit my requirements. Here's my particular context:
I'm trying to make a Python program which connects to a MySQL database server and displays/modifies data with a GUI. For cleanliness' sake, I've defined the bunch of auxiliary/utility MySQL-related functions in a separate file. However, they all share a common variable, which I had originally defined inside the utilities module: the cursor object from the MySQLdb module.
I later realised that the cursor object (which is used to communicate with the db server) should be defined in the main module, so that both the main module and anything that is imported into it can access that object.
End result would be something like this:
utilities_module.py:
def utility_1(args):
    ...  # code which references a variable named "cur"

def utility_n(args):
    ...  # etcetera
And my main module:
program.py:
import MySQLdb, Tkinter
db=MySQLdb.connect(#blahblah) ; cur=db.cursor() #cur is defined!
from utilities_module import *
And then, as soon as I try to call any of the utilities functions, it triggers the aforementioned "global name not defined" error.
A particular suggestion was to have a "from program import cur" statement in the utilities file, such as this:
utilities_module.py:
from program import cur
#rest of function definitions
program.py:
import Tkinter, MySQLdb
db=MySQLdb.connect(#blahblah) ; cur=db.cursor() #cur is defined!
from utilities_module import *
But that's cyclic import or something like that and, bottom line, it crashes too. So my question is:
How in hell can I make the "cur" object, defined in the main module, visible to those auxiliary functions which are imported into it?
Thanks for your time and my deepest apologies if the solution has been posted elsewhere. I just can't find the answer myself and I've got no more tricks in my book.
Globals in Python are global to a module, not across all modules. (Many people are confused by this, because in, say, C, a global is the same across all implementation files unless you explicitly make it static.)
There are different ways to solve this, depending on your actual use case.
Before even going down this path, ask yourself whether this really needs to be global. Maybe you really want a class, with f as an instance method, rather than just a free function? Then you could do something like this:
import module1
thingy1 = module1.Thingy(a=3)
thingy1.f()
If you really do want a global, but it's just there to be used by module1, set it in that module.
import module1
module1.a=3
module1.f()
On the other hand, if a is shared by a whole lot of modules, put it somewhere else, and have everyone import it:
import shared_stuff
import module1
shared_stuff.a = 3
module1.f()
… and, in module1.py:
import shared_stuff
def f():
print shared_stuff.a
Don't use a from import unless the variable is intended to be a constant. from shared_stuff import a would create a new a variable initialized to whatever shared_stuff.a referred to at the time of the import, and this new a variable would not be affected by assignments to shared_stuff.a.
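As a quick illustration of that binding behavior (a minimal sketch, assuming shared_stuff.py defines a = 3):
import shared_stuff
from shared_stuff import a   # binds a local name to the value at import time

shared_stuff.a = 99
print(a)                # still 3; the local binding is unaffected
print(shared_stuff.a)   # 99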
Or, in the rare case that you really do need it to be truly global everywhere, like a builtin, add it to the builtin module. The exact details differ between Python 2.x and 3.x. In 3.x, it works like this:
import builtins
import module1
builtins.a = 3
module1.f()
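For reference, the Python 2.x spelling of the same trick would be (the module is named __builtin__ there, without the trailing "s"):
import __builtin__   # Python 2 only; renamed to "builtins" in 3.x
import module1

__builtin__.a = 3
module1.f()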
As a workaround, you could consider setting environment variables in the outer layer, like this.
main.py:
import os
os.environ['MYVAL'] = str(myintvariable)
mymodule.py:
import os
myval = None
if 'MYVAL' in os.environ:
    myval = os.environ['MYVAL']
As an extra precaution, handle the case when MYVAL is not defined inside the module.
This post is just an observation about Python behaviour I encountered. Maybe the advice above won't work for you if you did the same thing I describe below.
Namely, I have a module which contains global/shared variables (as suggested above):
#sharedstuff.py
globaltimes_randomnode=[]
globalist_randomnode=[]
Then I had the main module which imports the shared stuff with:
import sharedstuff as shared
and some other modules that actually populate these arrays. These are called by the main module. When exiting those other modules I could clearly see that the arrays were populated, but when reading them back in the main module, they were empty. This was rather strange to me (well, I am new to Python). However, when I changed the way I import sharedstuff.py in the main module to:
from sharedstuff import *
it worked (the arrays were populated).
Just sayin'
A function uses the globals of the module it's defined in. Instead of setting a = 3, for example, you should be setting module1.a = 3. So, if you want cur available as a global in utilities_module, set utilities_module.cur.
A better solution: don't use globals. Pass the variables you need into the functions that need them, or create a class to bundle all the data together, and pass it when initializing the instance.
The easiest solution to this particular problem would have been to add another function within the module that would have stored the cursor in a variable global to the module. Then all the other functions could use it as well.
module1:
cursor = None

def setCursor(cur):
    global cursor
    cursor = cur

def method(some, args):
    # "global" is only needed to assign; reading cursor works without it
    do_stuff(cursor, some, args)
main program:
import module1
cursor = get_a_cursor()
module1.setCursor(cursor)
module1.method("some", "args")
Since globals are module-specific, you can add the following function to all imported modules, and then use it to:
- add singular variables (in dictionary format) as globals for those modules
- transfer your main module's globals to them

addglobals = lambda x: globals().update(x)
Then, to pass on the current globals, all you need is:
import module
module.addglobals(globals())
Since I haven't seen it in the answers above, I thought I would add my simple workaround, which is just to add a global_dict argument to the function requiring the calling module's globals, and then pass the dict into the function when calling; e.g.:
# external_module.py
def imported_function(global_dict=None):
    print(global_dict["a"])

# calling_module.py
a = 12
from external_module import imported_function
imported_function(global_dict=globals())  # prints 12
The OOP way of doing this would be to make your module a class instead of a set of unbound methods. Then you could use __init__ or a setter method to set the variables from the caller for use in the module methods.
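For example, a minimal sketch of that class-based design (the names here are illustrative, not from the original code):
class DbUtilities:
    """Bundles the cursor with the utility functions that need it."""
    def __init__(self, cursor):
        self.cursor = cursor   # was a module-level global before

    def utility_1(self, query):
        self.cursor.execute(query)   # every method shares the same cursor

# in the main program:
# db = MySQLdb.connect(...)
# utils = DbUtilities(db.cursor())
# utils.utility_1("SELECT 1")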
Update
To test the theory, I created a module and put it on pypi. It all worked perfectly.
pip install superglobals
Short answer
This works fine in Python 2 or 3:
import inspect

def superglobals():
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals
save as superglobals.py and employ in another module thusly:
from superglobals import *
superglobals()['var'] = value
Extended Answer
You can add some extra functions to make things more attractive.
def superglobals():
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals

def getglobal(key, default=None):
    """
    getglobal(key[, default]) -> value

    Return the value for key if key is in the global dictionary, else default.
    """
    _globals = dict(inspect.getmembers(
        inspect.stack()[len(inspect.stack()) - 1][0]))["f_globals"]
    return _globals.get(key, default)

def setglobal(key, value):
    _globals = superglobals()
    _globals[key] = value

def defaultglobal(key, value):
    """
    defaultglobal(key, value)

    Set the value of global variable `key` if it is not otherwise set.
    """
    _globals = superglobals()
    if key not in _globals:
        _globals[key] = value
Then use thusly:
from superglobals import *

setglobal('test', 123)
defaultglobal('test', 456)
assert getglobal('test') == 123
Justification
The "python purity league" answers that litter this question are perfectly correct, but in some environments (such as IDAPython) which is basically single threaded with a large globally instantiated API, it just doesn't matter as much.
It's still bad form and a bad practice to encourage, but sometimes it's just easier. Especially when the code you are writing isn't going to have a very long life.

Misunderstanding differences between inside-class and outside-class imports in Python [duplicate]

Context: I'm writing a translator from one Python API to another, both in Python 3.5+. I load the file to be translated with a class named FileLoader, defined in FileLoader.py. This file loader allows me to transfer the file's content to the other classes doing the translation job.
All of the .py files describing each class are in the same folder
I tried two different ways to import my FileLoader module inside the other modules containing the classes doing the translation job. One seems to work, but the other doesn't, and I don't understand why.
Here are two code examples illustrating both ways:
The working way
import FileLoader

class Parser:
    def __init__(self, fileLoader):
        if isinstance(fileLoader, FileLoader.FileLoader):
            self._fileLoader = fileLoader
        else:
            ...  # raise a nice exception
The crashing way
class Parser:
    import FileLoader

    def __init__(self, fileLoader):
        if isinstance(fileLoader, FileLoader.FileLoader):
            self._fileLoader = fileLoader
        else:
            ...  # raise a nice exception
I thought doing the import inside the class's scope (where it's the only scope FileLoader is used) would be enough, since it would know how to relate to the FileLoader module and its content. I'm obviously wrong since it's the first way which worked.
What am I missing about scopes in Python? Or is it about something different?
Two things: this won't work, and there is no benefit to doing it this way.
First, why not?
class Parser:
    # this assigns to the Parser namespace; to refer to it
    # within a method you need to use `self.FileLoader` or
    # `Parser.FileLoader`
    import FileLoader

    # `FileLoader` works fine here, under the Parser indentation
    # (in its namespace, but outside of any method)
    copy_of_FileLoader = FileLoader

    def __init__(self, fileLoader):
        # you need to refer to modules in the Parser namespace
        # with `self`, just like you would with any other
        # class or instance variable 👇
        if isinstance(fileLoader, self.FileLoader.FileLoader):
            self._fileLoader = fileLoader
        else:
            ...  # raise a nice exception

    # works here again, since we are outside of any method,
    # in `Parser` scope/indent
    copy2_of_FileLoader = FileLoader
Second, it's not Pythonic and it doesn't help
Customary for the Python community would be to put import FileLoader at the top of the program. Since it seems to be one of your own modules, it would go after std library imports and after third party module imports. You would not put it under a class declaration.
Unless... you had a good (actually, probably bad) reason to.
My own code, and this doesn't reflect all that well on me, sometimes has stuff like:
class MainManager(batchhelper.BatchManager):
    ....
    def _load(self, *args, **kwargs):
        👉 from pssystem.models import NotificationConfig
So, after stating this wasn't a good thing, why am I doing this?
Well, there are some specific circumstances to my code going here. This is a batch, command-line script, usable within a Django context, and it uses some Django ORM models. In order for those to be used, Django needs to be imported first and then set up. But that often happens too early in the context of these types of batch programs and I get circular import errors, with Django complaining that it hasn't been initialized yet.
The solution? Defer execution until the method is called, when all the other modules have been imported and Django has been setup elsewhere.
NotificationConfig is now available, but only within that method as it is a local variable in it. It works, but... it's really not great practice.
Remember: anything in the global scope gets executed at module load time, anything under a class declaration at module load time, and anything within method/function bodies when the method/function is called.
# happens at module load time; you could get circular import errors
import X1

class DoImportsLater:
    .
    # happens at module load time; you could get circular import errors
    import X2

    def _load(self, *args, **kwargs):
        # only happens when this method is called, if ever,
        # so you shouldn't see circular imports
        import X3
import X1 is standard practice, Pythonic.
import X2, which is what you are doing, is not, and it doesn't help.
import X3, which is what I did, is a hack that covers up circular import references. But it "fixes" the issue.

Detect circular imports in Python [duplicate]

I'm working with a project that contains about 30 unique modules. It wasn't designed too well, so it's common that I create circular imports when adding some new functionality to the project.
Of course, when I add the circular import, I'm unaware of it. Sometimes it's pretty obvious I've made a circular import when I get an error like AttributeError: 'module' object has no attribute 'attribute' where I clearly defined 'attribute'. But other times, the code doesn't throw exceptions because of the way it's used.
So, to my question:
Is it possible to programmatically detect when and where a circular import is occurring?
The only solution I can think of so far is to have a module importTracking that contains a dict importingModules, a function importInProgress(file), which increments importingModules[file] and throws an error if it's greater than 1, and a function importComplete(file), which decrements importingModules[file]. All other modules would look like:
import importTracking
importTracking.importInProgress(__file__)
#module code goes here.
importTracking.importComplete(__file__)
But that looks really nasty, there's got to be a better way to do it, right?
To avoid having to alter every module, you could stick your import-tracking functionality in an import hook, or in a customized __import__ you could stick in the built-ins. The latter, for once, might work better, because __import__ gets called even if the module getting imported is already in sys.modules, which is the case during circular imports.
For the implementation I'd simply use a set of the modules "in the process of being imported", something like (benjaoming edit: Inserting a working snippet derived from original):
beingimported = set()
originalimport = __import__

def newimport(modulename, *args, **kwargs):
    if modulename in beingimported:
        print "Importing in circles", modulename, args, kwargs
        print "  Import stack trace ->", beingimported
        # sys.exit(1)  # Normally exiting is a bad idea.
    beingimported.add(modulename)
    result = originalimport(modulename, *args, **kwargs)
    if modulename in beingimported:
        beingimported.remove(modulename)
    return result

import __builtin__
__builtin__.__import__ = newimport
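For Python 3, a rough equivalent of the same idea would be (a sketch; __builtin__ was renamed to builtins, and print is a function):
import builtins

being_imported = set()
original_import = builtins.__import__

def new_import(name, *args, **kwargs):
    if name in being_imported:
        print("Importing in circles:", name)
        print("  modules currently being imported:", being_imported)
    being_imported.add(name)
    try:
        return original_import(name, *args, **kwargs)
    finally:
        being_imported.discard(name)

builtins.__import__ = new_import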
Not all circular imports are a problem, as you've found when an exception is not thrown.
When they are a problem, you'll get an exception the next time you try to run any of your tests. You can change the code when this happens.
I don't see any change required from this situation.
Example of when it's not a problem:
a.py
import b
a = 42
def f():
    return b.b
b.py
import a
b = 42
def f():
    return a.a
Circular imports in Python are not like PHP includes.
Python modules are loaded the first time they are imported and then cached in sys.modules for the duration of the process. Every subsequent import just binds a name in the local namespace to the already-loaded module object. A module is unique, and a reference to that module name will always point to the same loaded module, regardless of where it was imported.
So if you have a circular module import, the loading of each file will happen once, and then each module will have names relating to the other module created into its namespace.
There could of course be problems when referring to specific names within both modules (when the circular imports occur BEFORE the class/function definitions that are referenced in the imports of the opposite modules), but you'll get an error if that happens.
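The caching behavior is easy to verify directly:
import sys
import math
import math as math2

# one module object, many names: every import returns the cached module
assert math is math2 is sys.modules['math']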
import uses __builtin__.__import__(), so if you monkeypatch that then every import everywhere will pick up the changes. Note that a circular import is not necessarily a problem though.

Import modules that don't exist (yet)

I wish to create my own variation of amoffat's sh module, where it can import pretty much any command from the user's UNIX path, such as:
from sh import hg
However, I am having a hard time finding a way to intercept / override Python's own import [...] and from [...] import [...] statements. At this point I simply need a way to at least get [the name of] the object of the from import, at which point I can simply setattr() and partial() my way from there, I hope. I'm at a complete loss of how to do this at the moment, however, and hence have no code to show for it.
The gist of what I'm going for:
from test import t # Even though "t" doesn't exist in the module (yet)
Any help with the full code would be greatly appreciated!
Final Answer, consolidated:
def __getattr__(name):
    if name == '__path__':
        raise AttributeError
    print(name)
There is actually a straightforward way if you are on Python 3.7+, PEP-562, which allows you to define __getattr__ at the module level:
def __getattr__(name):
    if name == "t":
        return "magic"
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
There is also a function __dir__ that you can define to declare what the builtin dir() will say about names in your module.
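For example, a minimal module-level __dir__ to pair with the __getattr__ above (assuming the lazily provided name is still "t"):
def __dir__():
    # advertise the lazy name alongside the module's real globals
    return sorted(list(globals()) + ["t"])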
What sh does is more sophisticated, as they want to support versions below 3.7: Modifying sys.modules and replacing the module with a special object that pretends to be a module.
As @L3viathan pointed out, this is easy starting with Python 3.7: just define a __getattr__ function in your special module. So, for example, you could create an "echo" module (just returns the name of the object you requested) like this:
echo.py (Python >=3.7)
def __getattr__(name):
    return name
Then you could use it like this:
from echo import x
print(repr(x))
# 'x'
On earlier versions of Python, you have to subclass the module, as hinted in PEP-562. This also works in Python 3.7.
echo.py (Python >=2)
import sys, types

class EchoModule(types.ModuleType):
    def __getattr__(self, name):
        return name

sys.modules[__name__] = EchoModule(__name__)
You would use this the same way as the 3.7 version: from echo import something.
Update
For some reason Python tries to retrieve the attribute twice for each from echo import <x> call. It also calls __getattr__('__path__') when the module is loaded. You can avoid side effects in these cases with the following code:
echo.py (only define attributes once)
import sys, types

class EchoModule(types.ModuleType):
    def __getattr__(self, name):
        # don't define a __path__ attribute
        if name == '__path__':
            raise AttributeError
        print("importing {}".format(name))
        # create the attribute in case it's required again
        setattr(self, name, name)
        # return the new attribute
        return getattr(self, name)

sys.modules[__name__] = EchoModule(__name__)
This code creates an attribute in the echo module each time a previously unused attribute is imported (sort of like collections.defaultdict). Then, if Python tries to import that same attribute again later, it will pull it directly from the module instead of calling __getattr__ (this is normal behavior for object attributes).
There is also some code here to avoid setting a spurious __path__ attribute; this also avoids running your code when __path__ is requested. Note that this may actually be the most important part; when I tested, just raising AttributeError for __path__ was enough to prevent the double-access to the named attribute.

Lazy loading python sub-modules, importlib fails first time

I'm experimenting with the idea of lazy-loading of symbols in a package's __init__.py by subclassing ModuleType and defining properties for each of the submodules. Accessing the symbol in the package namespace would trigger the import. I've got it working, but for some reason, my call to import_module fails on the first attempt and I don't understand why.
I have a minimal example. Assume a package like this:
my_package/
    __init__.py
    m1.py
This is the __init__.py:
import sys
import importlib
from types import ModuleType

class MyModule(ModuleType):
    @property
    def m1(self):
        try:
            _m1 = importlib.import_module('.m1', __package__)
        except AttributeError:
            print('second try ...')
            _m1 = importlib.import_module('.m1', __package__)
        return _m1

old = sys.modules[__name__]
new = MyModule(__name__)
new.__path__ = old.__path__
for k, v in list(old.__dict__.items()):
    new.__dict__[k] = v
sys.modules[__name__] = new
The import_module call always fails with an AttributeError: module 'my_package' has no attribute 'm1'. However, the second call always succeeds. In other words, when I do my_package.m1 I always get m1, but it always prints 'second try ...'.
Note: the behavior is dependent on the Python version. The call to import_module works fine the first time on Python 2.7.
Here is the difference between python2 and python3.
In python3, the importlib.import_module call ultimately ends up here, which is a call to setattr. Since you didn't define a .setter for your property, you get the AttributeError.
In python2, the importlib.import_module call ends up here, which is a call to the builtin __import__, which presumably operates directly on the module __dict__.
The only question is how in the world it ever works in python3. I would have thought it would always have resulted in an AttributeError.
Your code works fine as long as you define a .setter:
@m1.setter
def m1(self, mod):
    self.__dict__['m1'] = mod
It actually turns out that the .setter can do anything at all, including just pass, since you are unconditionally making the call to import_module anyway.
I would consider using the .setter above and changing the getter to:
@property
def m1(self):
    if not self.__dict__.get('m1'):
        self.__dict__['m1'] = importlib.import_module('.m1', __package__)
    return self.__dict__['m1']
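On Python 3.7+, a simpler alternative (a sketch using the PEP 562 module-level __getattr__ discussed earlier, instead of subclassing ModuleType) would be:
# my_package/__init__.py
import importlib

def __getattr__(name):
    if name == 'm1':
        module = importlib.import_module('.m1', __package__)
        globals()['m1'] = module   # cache it so __getattr__ isn't called again
        return module
    raise AttributeError("module {!r} has no attribute {!r}".format(__name__, name))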

Categories

Resources