When module A imports B, and B imports A, I get error:
ImportError: cannot import name from partially initialized module (most likely due to a circular import).
However, if I merge contents of both modules into one, no error happens. Is there a reason why python does not treat circularly imported files as one big one?
I understand, that ambiguity arises: "Which to run first?", but that can be easily achieved by slightly adjusting syntax of imports and specifying the priority/order of import. Something like:
from foo import A priority 2
from foo import B priority 1
Please provide counter-arguments, if you see any.
I think the main reason that answers your question lays in that Python is an interpreted language. Your script does not run, it is read by the interpreter line by line subsequently (if a different, specific, pattern is not explicitly coded), and the interpreter executes the code you ask for.
For many reasons, one of which is performance, the interpreter cannot (or, better, is not designed to) inspect your whole script to search for patterns. This is completely different from what happens in compiled languages (e.g. C/C++), where the compiler spends much time looking at your code, building up an idea of what you want to do overall, and then (and only then) generates a byte code optimized to do the job you asked for (the optimisation parameters are up to you).
Coming back to Python, imagine when script A is importing script B, and at a certain line of script B, script A (that is still being imported... Recall? The interpreter proceeds line by line) is requested to be imported. It's easy to understand that I cannot proceed with the importing of script A since it relies on B and B is not yet fully imported.
When asking why something has been designed that way, please, do not thing to the most banal example, but to the most complex (or weirdest) one.
Some technics like TYPE_CHECKING can be used to avoid circular error while declaring types from a partially imported module. That's fine, since it's a specific, narrow case, where you're explicitly stating that you're not going to use any object from module A in module B before the former has been completely imported, but you just need to declare in advance their type being.
Hope I've been clear, though I'm aware my answer is not rigorous. Suggestion from those who are more expert than me in programming theory are warmly adviced.
Your example demonstrates the simplest case of circular imports. Solving the general case is vastly more complicated, possibly intractable.
Now consider import chains with multiple loops and loops on loops.
Related
Question
If I have import statements nested in an if/else block, am I increasing efficiency? I know some languages do "one passes" over code for import and syntax issues. I'm just not sure how in depth Python goes into this.
My Hypothesis
Because Python is interpreted and not compiled, by nesting the import statements within the else block, those libraries will not be imported until that line is reached, thus saving system resources unless otherwise needed.
Scenario
I have written a script that will be used by both the more computer literate and those are lesser so. My department is very comfortable with running scripts from the command line with arguments so I have set it up to take arguments for what it needs and, if it does not find the arguments it was expecting, it will launch a GUI with headings, buttons, and more verbose instructions. However, this means that I am importing libraries that are only being used in the event that the arguments were not provided.
Additional Information
The GUI is very, very basic (A half dozen text fields and possibly fewer buttons) so I am not concerned with just creating and spawning a custom GUI class in which the necessary libraries would be imported. If this gets more complicated, I'll consider it in the future or even push to change to web interface.
My script fully functions as I would expect it to. The question is simply about resource consumption.
import statements are executed as they're encountered in normal execution, so if the conditional prevents that line from being executed, the import doesn't occur, and you'll have avoided unnecessary work.
That said, if the module is going to be imported in some other way (say, unconditionally imported module B depends on A, and you're conditionally importing A), the savings are trivial; after the first import of a module, subsequent imports just get a new reference to the same cached module; the import machinery has to do some complicated stuff to handle import hooks and the like first, but in the common case, it's still fairly cheap (sub-microsecond when importing an already cached module).
The only way this will save you anything is if the module in question would not be imported in any way otherwise, in which case you avoid the work of loading it and the memory used by the loaded module.
In legacy system, We have created init module which load information and used by various module(import statement). It's big module which consume more memory and process longer time and some of information is not needed or not used till now. There is two propose solution.
Can we determine in Python who is using this module.Fox Example
LoadData.py ( init Module)
contain 100 data member
A.py
import LoadData
b = LoadData.name
B.py
import LoadData
b = LoadData.width
In above example, A.py using name and B.py is using width and rest information is not required (98 data member is not required).
is there anyway which help us to determine usage of LoadData module along with usage of data member.
In simple, we need to traverse A.py and B.py and find manually to identify usage of object.
I am trying to implement first solution as I have more than 1000 module and it will be painful to determine by traversing each module. I am open to any tool which can integrate into python
Your question is quite broad, so I can't give you an exact answer. However, what I would generally do here is to run a linter like flake8 over the whole codebase to show you where you have unused imports and if you have references in your files to things that you haven't imported. It won't tell you if a whole file is never imported by anything, but if you remove all unused imports, you can then search your codebase for imports of a particular module and if none are found, you can (relatively) safely delete that module.
You can integrate tools like flake8 with most good text editors, so that they highlight mistakes in real time.
As you're trying to work with legacy code, you'll more than likely have many errors when you run the tool, as it looks out for style issues as well as the kinds of import/usage issues that youre mention. I would recommend fixing these as a matter of principle (as they they are non-functional in nature), and then making sure that you run flake8 as part of your continuous integration to avoid regressions. You can, however, disable particular warnings with command-line arguments, which might help you stage things.
Another thing you can start to do, though it will take a little longer to yield results, is write and run unit tests with code coverage switched on, so you can see areas of your codebase that are never executed. With a large and legacy project, however, this might be tough going! It will, however, help you gain better insight into the attribute usage you mention in point 1. Because Python is very dynamic, static analysis can only go so far in giving you information about atttribute usage.
Also, make sure you are using a version control tool (such as git) so that you can track any changes and revert them if you go wrong.
Python is extremely elegant language. Well, except... except imports. I still can't get it work the way it seems natural to me.
I have a class MyObjectA which is in file mypackage/myobjecta.py. This object uses some utility functions which are in mypackage/utils.py. So in my first lines in myobjecta.py I write:
from mypackage.utils import util_func1, util_func2
But some of the utility functions create and return new instances of MyObjectA. So I need to write in utils.py:
from mypackage.myobjecta import MyObjectA
Well, no I can't. This is a circular import and Python will refuse to do that.
There are many question here regarding this issue, but none seems to give satisfactory answer. From what I can read in all the answers:
Reorganize your modules, you are doing it wrong! But I do not know
how better to organize my modules even in such a simple case as I
presented.
Try just import ... rather than from ... import ...
(personally I hate to write and potentially refactor all the full
name qualifiers; I love to see what exactly I am importing into
module from the outside world). Would that help? I am not sure,
still there are circular imports.
Do hacks like import something in the inner scope of a function body just one line before you use something from other module.
I am still hoping there is solution number 4) which would be Pythonic in the sense of being functional and elegant and simple and working. Or is there not?
Note: I am primarily a C++ programmer, the example above is so much easily solved by including corresponding headers that I can't believe it is not possible in Python.
There is nothing hackish about importing something in a function body, it's an absolutely valid pattern:
def some_function():
import logging
do_some_logging()
Usually ImportErrors are only raised because of the way import() evaluates top level statements of the entire file when called.
In case you do not have a logic circular dependency...
, nothing is impossible in python...
There is a way around it if you positively want your imports on top:
From David Beazleys excellent talk Modules and Packages: Live and Let Die! - PyCon 2015, 1:54:00, here is a way to deal with circular imports in python:
try:
from images.serializers import SimplifiedImageSerializer
except ImportError:
import sys
SimplifiedImageSerializer = sys.modules[__package__ + '.SimplifiedImageSerializer']
This tries to import SimplifiedImageSerializer and if ImportError is raised (due to a circular import error or the it not existing) it will pull it from the importcache.
PS: You have to read this entire post in David Beazley's voice.
Don't import mypackage.utils to your main module, it already exists in mypackage.myobjecta. Once you import mypackage.myobjecta the code from that module is being executed and you don't need to import anything to your current module, because mypackage.myobjecta is already complete.
What you want isn't possible. There's no way for Python to know in which order it needs to execute the top-level code in order to do what you ask.
Assume you import utils first. Python will begin by evaluating the first statement, from mypackage.myobjecta import MyObjectA, which requires executing the top level of the myobjecta module. Python must then execute from mypackage.utils import util_func1, util_func2, but it can't do that until it resolves the myobjecta import.
Instead of recursing infinitely, Python resolves this situation by allowing the innermost import to complete without finishing. Thus, the utils import completes without executing the rest of the file, and your import statement fails because util_func1 doesn't exist yet.
The reason import myobjecta works is that it allows the symbols to be resolved later, after the body of every module has executed. Personally, I've run into a lot of confusion even with this kind of circular import, and so I don't recommend using them at all.
If you really want to use a circular import anyway, and you want them to be "from" imports, I think the only way it can reliably work is this: Define all symbols used by another module before importing from that module. In this case, your definitions for util_func1 and util_func2 must be before your from mypackage.myobjecta import MyObjectA statement in utils, and the definition of MyObjectA must be before from mypackage.utils import util_func1, util_func2 in myobjecta.
Compiled languages like C# can handle situations like this because the top level is a collection of definitions, not instructions. They don't have to create every class and every function in the order given. They can work things out in whatever order is required to avoid any cycles. (C++ does it by duplicating information in prototypes, which I personally feel is a rather hacky solution, but that's also not how Python works.)
The advantage of a system like Python is that it's highly dynamic. Yes you can define a class or a function differently based on something you only know at runtime. Or modify a class after it's been created. Or try to import dependencies and go without them if they're not available. If you don't feel these things are worth the inconvenience of adhering to a strict dependency tree, that's totally reasonable, and maybe you'd be better served by a compiled language.
Pythonistas frown upon importing from a function. Pythonistas usually frown upon global variables. Yet, I saw both and don't think the projects that used them were any worse than others done by some strict Pythhonistas. The feature does exist, not going into a long argument over its utility.
There's an alternative to the problem of importing from a function: when you import from the top of a file (or the bottom, really), this import will take some time (some small time, but some time), but Python will cache the entire file and if another file needs the same import, Python can retrieve the module quickly without importing. Whereas, if you import from a function, things get complicated: Python will have to process the import line each time you call the function, which might, in a tiny way, slow your program down.
A solution to this is to cache the module independently. Okay, this uses imports inside function bodies AND global variables. Wow!
_MODULEA = None
def util1():
if _MODULEA is None:
from mymodule import modulea as _MODULEA
obj = _MODULEA.ClassYouWant
return obj
I saw this strategy adopted with a project using a flat API. Whether you like it or not (and I'm not sure about that myself), it works and is fast, because the import line is executed only once (when the function first executes). Still, I would recommend restructuring: problems with circular imports show a problem in structure, usually, and this is always worth fixing. I do agree, though, it would be nice if Python provided more useful errors when this kind of situation happens.
I'm just wondering, I often have really long python files and imports tend to stack quite quickly.
PEP8 says that the imports should always be written at the beginning of the file.
Do all the imported libraries get imported when calling a function coded in the file? Or do only the necessary libraries get called?
Does it make sense to worry about this? Is there no reason to import libraries within the functions or classes that need them?
Every time Python hits an import statement, it checks to see if that module has already been imported, and if not, imports it. So the imports at the top of your file will happen as soon as your file is run or imported by another module.
There is some overhead to this, so it's generally best to keep imports at the top of your file so that cost gets taken care of up front.
The best place for imports is at the top of your file. That documents the dependencies in one place and makes errors from their absence appear earlier. The import itself actually occurs at the time of the import statement, but this seldom matters much.
It is not typical that you have anything to gain by not importing a library until you are in a function or method that needs it. (There is never anything to gain by doing so inside the body of a class.) It is rare that you want optional dependencies and even rarer that this is the right technique to get them, though. Perhaps you can share a compelling use case?
Does it make sense to worry about
this?
No
There no reason to import libraries within the functions or classes that need them.
It's just slow because the import statement has to check to see if it's been imported once, and realize that it has been imported.
If you put this in a function that's called frequently, you can waste some time with all the import checking.
Imports happen when the module that contains the imports gets executed or imported, not when the functions are called.
Ordinarily, I wouldn't worry about it. If you are encountering slowdowns, you might profile to see if your problem is related to this. If it is, you can check to see if your module can divided up into smaller modules.
But if all the files are getting used by the same program, you'll just end up importing everything anyway.
If a function inside a module is the only one to import a given other module (say you have a function sending tweets, only if some configuration option is on), then it makes sense to import that specific module in the function.
Unless I see some profiling data proving otherwise, my guess is that the overhead of an import statement in a function is completely negligible.
Let from module import function be called the FMIF coding style.
Let import module be called the IM coding style.
Let from package import module be called the FPIM coding style.
Why is IM+FPIM considered a better coding style than FMIF? (See this post for the inspiration for this question.)
Here are some criteria which lead me to prefer FMIF over IM:
Shortness of code: It allows me to use shorter function names and thus help stick to the 80 columns-per-line convention.
Readability: chisquare(...) appears more readable than scipy.stats.stats.chisquare(...). Although this is a subjective criterion, I think most people would agree.
Ease of redirection: If I use FMIF and for some reason at some later time want to redirect python to define function from alt_module instead of module I need to change just one line: from alt_module import function. If I were to use IM, I'd need to change many lines of code.
I realize FPIM goes some way to nullifying the first two issues, but what about the third?
I am interested in all reasons why IM+FPIM may be better than FMIF,
but in particular, I'd be interested in elaboration on the following points mentioned here:
Pros for IM:
ease of mocking/injecting in tests. (I am not very familiar with mocking, though I recently learned what the term means. Can you show code which demonstrates how IM is better than FMIF here?)
ability for a module to change flexibly by redefining some entries. (I must be misunderstanding something, because this seems to be an advantage of FMIF over IM. See my third reason in favor of FMIF above.)
predictable and controllable behavior on serialization and recovery of your data. (I really don't understand how the choice of IM or FMIF affects this issue. Please elaborate.)
I understand that FMIF "pollutes my namespace", but beyond being a negative-sounding phrase, I don't appreciate how this hurts the code in any concrete way.
PS. While writing this question I received a warning that the question appears subjective and is likely to be closed. Please don't close it. I'm not looking for subjective opinion, but rather concrete coding situations where IM+FPIM is demonstrably better than FMIF.
Many thanks.
The negatives you list for IM/FPIM can often be ameliorated by appropriate use of an as clause. from some.package import mymodulewithalongname as mymod can usefully shorten your code and enhance its readability, and if you rename mymodulewithalongname to somethingcompletelydifferent tomorrow, the as clause can be used as a single statement to edit.
Consider your pro-FMIF point 3 (call it R for redirection) vs your pro-FPIM point 2 (call it F for flexibility): R amounts to facilitating the loss of integrity of module boundaries, while F strenghtens it. Multiple functions, classes and variables in a module are often intended to work together: they should not be independently switched to different meanings. For example, consider module random and its functions seed and uniform: if you were to switch the import of just one of them to a different module, then you'd break the normal connection between calls to seed and results of calls to uniform. When a module is well designed, with cohesion and integrity, R's facilitation of breaking down the module's boundaries is actually a negative -- it makes it easier to do something you're better off not doing.
Vice versa, F is what enables coordinated switching of coupled functions, classes, and variables (so, generally, of entities that belong together, by modularity). For example, to make testing repeatable (FPIM pro-point 1), you mock both seed and random in the random module, and if your code follows FPIM, you're all set, coordination guaranteed; but if you have code that has imported the functions directly, you have to hunt down each such module and repeat the mocking over and over and over again. Making tests perfectly repeatable typically also requires "coordinated mocking" of date and time functions -- if you use from datetime import datetime in some modules, you need to find and mock them all (as well as all those doing from time import time, and so forth) to ensure that all the times received when the various parts of the system ask "so what time is it now?" are perfectly consistent (if you use FPIM, you just mock the two relevant modules).
I like FPIM, because there's really not much added value by using a multiply qualified name rather than a singly qualified one (while the difference between barenames and qualified names is huge -- you get so much more control with a qualified name, be it singly or multiply, than you possibly ever can with a barename!).
Ah well, can't devote all of the working day to responding to each and every one of your points -- your question should probably be half a dozen questions;-). I hope this at least addresses "why is F better than R" and some of the mocking/testing issues -- it boils down to preserving and enhancing well-designed modularity (via F) rather than undermining it (via R).
The classic text on this, as so often, is from Fredrik Lundh, the effbot. His advice: always use import - except when you shouldn't.
In other words, be sensible. Personally I find that anything that's several modules deep tends to get imported via from x.y.z import a - the main example being Django models. But as much as anything else it's a matter of style, and you should have a consistent one - especially with modules like datetime, where both the module and the class it contains are called the same thing. Do you need to write datetime.datetime.now() or just datetime.now()? (In my code, always the former.)
Items 1 and 2 in your list of questions seem to be the same issue. Python's dynamic nature means it is fairly simple to replace an item in a module's namespace no matter which of the methods you use. The difficulty comes if one function in a module refers to another, which is the one you want to mock. In this case, importing the module rather than the functions means you can do module.function_to_replace = myreplacementfunc and everything works transparently - but that is as easy to do via FPIM as it is via IM.
I also don't understand how item 3 has anything to do with anything. I think your item 4, however, is based on a bit of a misunderstanding. None of the methods you give will 'pollute your namespace'. What does do that is from module import *, where you have no idea at all what you're importing and so functions can appear in your code with no clue given to the reader where they came from. That's horrible, and should be avoided at all costs.
Great answers here (I upvoted them all), and here are my thoughts on this matter:
First, addressing each of your bullets:
(Allegedly) Pros of FMIF:
Shortness of code: shorter function names help stick to the 80 columns-per-line.
Perhaps, but module names are usually short enough so this is not relevant. Sure, there's datetime, but also os, re, sys, etc. And Python has free line breaks inside { [ (. And for nested modules there's always as in both IM and FPIM
Readability: chisquare(...) appears more readable than scipy.stats.stats.chisquare(...).
Strongly disagree. When reading foreign code (or my own code after a few months) it's hard to know where each function comes from. Qualified names saves me from going back and forth from line 2345 to module declarations header. And it also gives you context: "chisquare? What's that? Oh, it's from scypy? Ok, some math-related stuff then". And, once again, you can always abbreviate scipy.stats.stats as scypyst. scypyst.chisquare(...) is short enough with all benefits of a qualified name.
import os.path as osp is another good example, considering it's very common to chain 3 or more of its functions together in a single call: join(expanduser(),basename(splitext())) etc.
Ease of redirection: one-line redefinition of a function from altmodule instead of module.
How often you want to redefine a single function but not whole module? Module boundaries and function coordination should be preserved, and Alex already explained this in great depth. For most (all?) real-world scenarios, if alt_module.x is a viable replacement for module.x, then probably alt_module itself is a drop in alternative for module, so both IM and FPIM are one-liners just like FMIF, provided you use as.
I realize FPIM goes some way to nullifying the first two issues...
Actually, as is the one that mitigates the first 2 issues (and the 3rd), not FPIM. You can use IM for that too: import some.long.package.path.x as x for the same result as FPIM.
So none of the above are really pros of FMIF. And the reasons I prefer IM/FPIM are:
For the sake of simplicity and consistency, when I import something, either IM or FPIM, I'm always importing a module, not an object from a module. Remember FMIF can be (ab-)used to import functions, classes, variables, or even other modules! Think about the mess of from somemodule import sys, somevar, os, SomeClass, datetime, someFunc.
Also, if you want more than a single object from a module, FMIF will pollute your namespace more than IM or FPIM, which will use a single name no matter how many objects you want to use. And such objects will have a qualified name, which is a pro, not a con: as I've said in issue 2, IMHO a it improves readability.
it all comes down to consistency, simplicity, organization. "Import modules, not objects" is a good, easy mind model to stick with.
Like Alex Martelli, I am fond of using as when importing a function.
One thing I have done is to use some prefix on all the functions that were imported from the same module:
from random import seed as r_seed
from random import random as r_random
r_seed is shorter to type than random.seed but somewhat preserves the module boundaries. Someone casually looking at your code can see r_seed() and r_random() and have a chance to grok that they are related.
Of course, you can always simply do:
import random as r
and then use r.random() and r.seed(), which may be the ideal compromise for this case. I only use the prefix trick when I'm importing one or two functions from a module. When I want to use many functions from the same module, I'll import the module, perhaps with an as to shorten the name.
I agree with MestreLion the most here (and so an upvote).
My perspective: I review code frequently that I am unfamiliar with, and not knowing what module a function is coming from just looking at the function is quite frustrating.
Code is written once and read many times, and so readability and maintainability trumps ease of typing.
In a similar vein, typically code is not being written for the benefit of the coder, but for the benefit of another entity.
Your code should be readable to someone who knows python better than you, but is unfamiliar with the code.
Full path imports can also better help IDE's point you at the correct source of the function or object you're looking at.
For all of these reasons and the reasons MestreLion noted, I conclude that it is best practice to import and use the full path.