Does Python cache imported files?

Consider the following:
a.py
foo = 1
b.py
bar = 2
c.py
import a
kik = 3
d.py
import a
import c
def main():
    import b
main()
main()
How many times is a.py loaded?
How many times is b.py loaded?
More generally, I would like to know how is Python handling imported files and functions/variables?

Both a and b are loaded once. When you import a module, the module object is cached in sys.modules, so importing the same module again does not re-run the original script; it is served from that cache. Locating and loading modules is done using a "finder":
https://www.python.org/dev/peps/pep-0451/#finder
https://docs.python.org/3/library/importlib.html#importlib.abc.MetaPathFinder
This works across modules: if yet another module also did import a, it would bind to the same cached module object as the import a inside c.py.
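A minimal sketch of that caching, reusing the question's a.py (the print is a hypothetical addition, just to make the single load visible):
# a.py
print("loading a")
foo = 1

# demo.py
import sys
import a                     # prints "loading a"
import a                     # served from the cache: prints nothing
print('a' in sys.modules)    # True
print(sys.modules['a'].foo)  # 1 -- the very same module object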
Some interesting builtin modules can help understand what happens during an import:
https://docs.python.org/3/reference/import.html#importsystem
When a module is first imported, Python searches for the module and, if found, creates a module object and initializes it.
Note that this search happens only on the first import; every import statement goes through __import__, and subsequent imports of the same module are satisfied from the cache. The finders that perform the search are listed in sys.meta_path.
https://docs.python.org/3/library/functions.html#import
You can leverage the import system to invalidate those caches for example:
https://docs.python.org/3/library/importlib.html#importlib.import_module
If you are dynamically importing a module that was created since the interpreter began execution (e.g., created a Python source file), you may need to call invalidate_caches() in order for the new module to be noticed by the import system.
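A small sketch of that situation (the module name generated.py is hypothetical, and the current directory is assumed to be on sys.path):
import importlib
import pathlib

# Create a brand-new module file after the interpreter has started
pathlib.Path("generated.py").write_text("value = 42\n")

importlib.invalidate_caches()                     # let the finders notice the new file
generated = importlib.import_module("generated")
print(generated.value)                            # 42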
The imp module (deprecated since Python 3.4 in favour of importlib) allows the recompilation of a module after import:
import importlib
import a
importlib.reload(a)
https://docs.python.org/3/library/importlib.html#importlib.reload
Python module’s code is recompiled and the module-level code re-executed, defining a new set of objects which are bound to names in the module’s dictionary by reusing the loader which originally loaded the module.
https://docs.python.org/3/library/imp.html
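For example, reusing the question's a.py (a sketch: the edit in the middle happens on disk, outside the program):
import importlib
import a
print(a.foo)         # 1
# ... a.py is edited on disk, say to foo = 99 ...
importlib.reload(a)  # re-executes a.py inside the existing module object
print(a.foo)         # 99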

Related

Circular imports Python

I've read this post about circular imports in Python. It describes the following scenario and argues that this raises an error when run:
# module1
import module2
def function1():
    module2.function2()
def function3():
    print('Goodbye, World!')

# module2
import module1
def function2():
    print('Hello, World!')
    module1.function3()

# __init__.py
import module1
module1.function1()
But when I run this (Python 3.9.5), it runs perfectly fine. The post is pretty old and it doesn't specify which Python version it used. Maybe there was some change in later Python versions that supports this?
Here's a simplified sequence of events that happen in the code in Python 3:
1. __init__.py starts running
2. An empty __main__ module is added to sys.modules
3. import module1 starts loading module1.py
4. An empty module1 module is added to sys.modules
5. import module2 starts loading module2.py
6. An empty module2 module is added to sys.modules
7. module2.function2 is created and added to module2.__dict__
   (The fact that function2 references names in module1 does not affect the creation of the function object in any way.)
8. module2 is fully loaded and execution returns to module1
9. module1.function1 and module1.function3 are created and added to module1.__dict__
   (Again, it does not matter which names the functions reference, since they are not being called yet; AttributeError or NameError would be raised at call time if necessary.)
10. module1 is fully loaded and execution returns to __main__
11. module1.function1() runs successfully, since all the names it references are now resolvable
As you can see, there are no circular import issues in this particular sequence because module1 and module2 do not attempt to call each other's functions during import. The import system allows both modules to load fully before any of the functions are called.
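For contrast, here is a sketch (hypothetical files main.py, mod1.py and mod2.py) of a circular import that does fail, because one module calls into its partner at import time, while that partner is still only partially initialized:
# main.py
import mod1

# mod1.py
import mod2   # mod2 starts loading before f1 is defined

def f1():
    print('from mod1')

# mod2.py
import mod1   # gets the partial mod1 object from sys.modules
mod1.f1()     # AttributeError: partially initialized module 'mod1'
              # has no attribute 'f1' (most likely due to a circular import)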
The post you mention is from 2017, and must be using a version of Python from before 3.0. A hint is found in the link in the following quote, which points to the python-2.x docs:
This approach doesn't contradict Python syntax, as the Python documentation says: "It is customary but not required to place all import statements at the beginning of a module (or script, for that matter)".
The paragraph after that is a bit misleading by the way:
The Python documentation also says that it is advisable to use import X, instead of other statements, such as from module import *, or from module import a,b,c.
While star imports are certainly discouraged, specific-name imports of the form from module import a,b,c are generally very much encouraged with few exceptions.

Python Importing modules in a package

I currently have a module I created that has a number of functions.
It's getting quite large so I figured I should make it into a package and split the functions up to make it more manageable.
I'm just testing out how this all works before I do this for real so apologies if it seems a bit tenuous.
I've created a folder called pack_test and in it I have:
__init__.py
foo.py
bar.py
__init__.py contains:
__all__ = ['foo', 'bar']
from . import *
import subprocess
from os import environ
In the console I can write import pack_test as pt and this is fine, no errors.
pt. and two tabs shows me that I can see pt.bar, pt.environ, pt.foo and pt.subprocess in there.
All good so far.
If I want to reference subprocess or environ in foo.py or bar.py how do I do it in there?
If in bar.py I have a function which just does return subprocess.call('ls'), it errors saying NameError: name 'subprocess' is not defined. There must be something I'm missing that enables me to reference subprocess from the level above? Presumably, once I get the syntax for that, I can also reference environ in a similar way?
The alternative as I could see it would be to have import subprocess in both foo.py and bar.py but then this seems a bit odd to me to have it appear across multiple files when I could have it the once at a higher level, particularly if I went on to have a large number of files rather than just 2 in this example.
TL;DR:
__init__.py:
from . import foo
from . import bar
__all__ = ["foo", "bar"]
foo.py:
import subprocess
from os import environ
# your code here
bar.py:
import subprocess
from os import environ
# your code here
There must be something I'm missing which enables me to reference subprocess from the level above?
Nope, this is the expected behaviour.
import loads a module (if it isn't already loaded), caches it in sys.modules (ditto), and binds the imported names in the current namespace. Each Python module has (or rather "is") its own namespace (there is no real "global" namespace). In other words, you have to import what you need in each module: if foo.py needs subprocess, it must explicitly import it.
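To illustrate with the question's layout (same names as above):
import pack_test

# __init__.py imported subprocess, so the name is bound in
# pack_test's own namespace...
pack_test.subprocess.call('ls')

# ...but bar.py has its own namespace and never imported it,
# hence the NameError inside bar.py's function.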
This can seem a bit tedious at first, but in the long run it really helps with maintainability: you just have to read the imports at the top of a module (PEP 8: always put all imports at the beginning of the module) to know where a name comes from.
Also, you should not use star imports (aka wildcard imports, from xxx import *) anywhere other than in your Python shell (and even then...): they are a maintenance time bomb. Not only do you not know where each name comes from, it is also a sure way to rebind an already-imported name. Imagine that your foo module defines a function func. Somewhere you have from foo import *; from bar import *, then later in the code a call to func. Now someone edits bar.py and adds a (distinct) func function, and suddenly your call fails, because you are no longer calling the func you expected. Now enjoy debugging this... And real-life examples are usually a bit more complex than this one.
So if you value your mental sanity, don't be lazy, and don't try to be smart either: just do the simple, obvious thing and explicitly import the names you're interested in at the top of your modules.
(been here, done that etc)
You could create modules.py containing
import subprocess
import os
Then in foo.py, or any of your other files, just have:
from modules import *
The import statements in your files stay static; just update modules.py whenever you want to make an additional module accessible to them all.

How can i pass imports in Python higher up the hierarchy?

I am developing a program which runs on two different platforms. Depending on which platform I run it on, the import directories and the names of the libraries change. For this reason I set a variable called RUN_ON_PC to True or False.
I want to implement a helper which sets the paths correctly and imports the libraries under the correct name depending on the platform, giving the main program an interface with the same names regardless of platform. The module myimporthelper is either in the "/mylib" or in the "/sd/mylib" directory. The other module names in these directories differ.
I tried the following, which does not work, since the modules imported inside myimporthelper.py are not visible to main.py:
main.py:
RUN_ON_PC = True
import sys
if RUN_ON_PC:
    sys.path.append("/mylib1")
else:
    sys.path.append("/sd/mylib1")
import myimporthelper
myimporthelper.importall(RUN_ON_PC)
a = moduleA.ClassA()  # -> produces NameError: name 'moduleA' is not defined
myimporthelper.py:
import sys
def importall(run_on_pc):
    if run_on_pc:
        sys.path.append("C:\\Users\\.....\\mylib")
        import module1 as moduleA
    else:
        sys.path.append("/sd/mylib")
        import module_a as moduleA
I want to keep main.py short and outsource the platform-dependent import logic to another module. I was not able to find a solution for this and would appreciate any help.
Thanks a lot in advance.
You just have to qualify the name with the helper module's name:
a = myimporthelper.moduleA.ClassA()
But the moduleA name has to be accessible. If you import it inside a function in the helper, it won't be, because of function scope, unless you declare the name as global in the helper function so that the binding lands in the helper module's namespace.
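A minimal sketch of that fix, reusing the question's names (the elided Windows path is kept exactly as in the question):
# myimporthelper.py
import sys

moduleA = None  # placeholder, rebound by importall()

def importall(run_on_pc):
    global moduleA  # bind in the module's namespace, not the function's
    if run_on_pc:
        sys.path.append("C:\\Users\\.....\\mylib")
        import module1 as moduleA
    else:
        sys.path.append("/sd/mylib")
        import module_a as moduleA

# main.py
import myimporthelper
myimporthelper.importall(True)
a = myimporthelper.moduleA.ClassA()  # now resolvable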

Can I import a built-in module twice in both my script and custom module

Is there a downside to importing the same built-in module in both my script and custom module?
I have a script that imports my custom module and also imports the built-in csv module to open a csv file and append any necessary content to a list.
I then have a method in my custom module to which I pass a path, filename and list, and which writes a csv; but I have to import the csv module again (in my module).
I do not understand what happens when I import the csv module twice, so I wanted to know whether there is a cleaner way of doing what I'm doing or if this is OK.
No, there is no downside. Importing a module does two things:
1. If not yet in memory, load the module, storing the resulting module object in sys.modules.
2. Bind names to either the module object (import modulename) or to attributes of the module object (from modulename import objectname).
Additional imports only execute step 2, as the module is already loaded.
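A quick sketch of the two steps at the interpreter prompt:
import sys
import csv                                   # step 1 + step 2: load (if needed) and bind
print('csv' in sys.modules)                  # True
from csv import reader                       # step 2 only: nothing is re-executed
print(reader is sys.modules['csv'].reader)   # True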
See The import system in the Python reference documentation for the nitty-gritty details:
The import statement combines two operations; it searches for the named module, then it binds the results of that search to a name in the local scope.
The short answer is no, there is no downside.
That being said, it may be helpful to understand what imports mean, particularly for anyone new to programming or coming from a different language background.
I imagine your code looks something like this:
# my_module.py
import os
import csv

def bar(path, filename, rows):
    full_path = os.path.join(path, filename)
    with open(full_path, 'w') as f:
        csv_writer = csv.writer(f)
        csv_writer.writerows(rows)
and
# my_script.py
import csv
import my_module

def foo(path):
    contents = []
    with open(path, 'r') as f:
        csv_reader = csv.reader(f)
        for row in csv_reader:
            contents.append(row)
    return contents
As a high-level overview, when you do an import in this manner, Python determines whether the module has already been imported. If not, then it searches the Python path to determine where the imported module lives on the file system, then it loads the imported module's code into memory and executes it. The interpreter takes all objects that are created during the execution of the imported module and makes them attributes on a new module object that the interpreter creates. Then the interpreter stores this module object into a dictionary-like structure that maps the module name to the module object. Finally, the interpreter brings the imported module's name into the importing module's scope.
This has some interesting consequences. For example, it means that you could simply use my_module.csv to access the csv module within my_script.py. It also means that importing csv in both is trivial and is probably the clearest thing you can do.
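A one-line check to convince yourself (assuming the two files above):
import csv
import my_module
print(my_module.csv is csv)  # True -- both names refer to the one cached module object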
One very interesting consequence is that if any statements that get executed during import have any side effects, those side effects will only happen when the module is first loaded by the interpreter. For example, suppose you had two modules a.py and b.py with the following code:
# a.py
print('hello world')
# b.py
print('goodbye world')
import a
If you run import a followed by import b then you will see
>>> import a
hello world
>>> import b
goodbye world
>>>
However, if you import in the opposite order, you get this:
>>> import b
goodbye world
hello world
>>> import a
>>>
Anyway, I think I've rambled enough and I hope I've adequately answered the question while giving some background. If this is at all interesting, I'd recommend Allison Kaptur's PyCon 2014 talk about import.
You can import the same module in separate files (custom modules) as far as I know. Python keeps track of already imported modules and knows how to resolve a second import.

How to prevent a module from being imported twice?

When writing Python modules, is there a way to prevent them from being imported twice by the client code? Just like c/c++ header files do:
#ifndef XXX
#define XXX
...
#endif
Thanks very much!
Python modules aren't imported multiple times. Just running import twice will not reload the module. If you want it to be reloaded, you have to use the reload function (importlib.reload() in Python 3). Here's a demo:
foo.py is a module with the single line:
print("Hello, I am being imported")
And here is a screen transcript of multiple import attempts.
>>> import foo
Hello, I am being imported
>>> import foo                    # Will not print the statement again
>>> from importlib import reload  # reload() is a builtin in Python 2
>>> reload(foo)                   # Will print it again
Hello, I am being imported
Imports are cached, and only run once. Additional imports only cost the lookup time in sys.modules.
As specified in other answers, Python generally doesn't reload a module when encountering a second import statement for it. Instead, it returns its cached version from sys.modules without executing any of its code.
However there are several pitfalls worth noting:
Importing the main module as an ordinary module effectively creates two instances of the same module under different names.
This occurs because during program startup the main module is set up with the name __main__. Thus, when importing it as an ordinary module, Python doesn't detect it in sys.modules and imports it again, but with its proper name the second time around.
Consider the file /tmp/a.py with the following content:
# /tmp/a.py
import sys
print "%s executing as %s, recognized as %s in sys.modules" % (__file__, __name__, sys.modules[__name__])
import b
Another file /tmp/b.py has a single import statement for a.py (import a).
Executing /tmp/a.py results in the following output:
root#machine:/tmp$ python a.py
a.py executing as __main__, recognized as <module '__main__' from 'a.py'> in sys.modules
/tmp/a.py executing as a, recognized as <module 'a' from '/tmp/a.py'> in sys.modules
Therefore, it is best to keep the main module fairly minimal and move most of its functionality into a separate module, as advised here.
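A common shape for that advice (core.py is a hypothetical name):
# a.py -- thin entry point
import core

if __name__ == '__main__':  # only runs when a.py is executed as a script
    core.run()

# core.py -- importable logic, safe to import from anywhere
def run():
    print("doing the real work")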
This answer specifies two more possible scenarios:
Slightly different import statements utilizing different entries in sys.path leading to the same module.
Attempting another import of a module after a previous one failed halfway through.
