My module is all in one big file that is getting hard to maintain. What is the standard way of breaking things up?
I have one module in a file my_module.py, which I import like this:
import my_module
"my_module" will soon be a thousand lines, which is pushing the limits of my ability to keep everything straight. I was thinking of adding files my_module_base.py, my_module_blah.py, etc. And then, replacing my_module.py with
from my_module_base import *
from my_module_blah import *
# etc.
Then, the user code does not need to change:
import my_module # still works...
Is this the standard pattern?
It depends on what your module is actually doing. It is usually a good idea to make your module a directory with an '__init__.py' file inside, so you would first transform your my_module.py into my_module/__init__.py.
After that, you continue according to your business logic. Here are some examples:
If you have utility functions that are not directly used by the module's API, put them in a file called utils.py.
If you have classes dealing with the database or representing your database models, put them in models.py.
If you have some internal configuration, it might make sense to put it into an extra file called settings.py or config.py.
These are just examples (a little bit stolen from the Django approach of reusable apps ^^). As said, it depends a lot on what your module does. If it is still too big afterwards, it also makes sense to create submodules (as subdirectories with their own __init__.py).
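A rough sketch of the resulting layout (the file and symbol names here are just placeholders), with the package's __init__.py re-exporting the public API so that import my_module keeps working for users:

my_module/
    __init__.py
    utils.py
    models.py
    config.py

# my_module/__init__.py
from my_module.models import SomeModel    # hypothetical public class
from my_module.utils import some_util     # hypothetical public function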
I'm sure there are lots of opinions on this, but I'd say you should break it into more well-defined functional units (modules), contained in a package. Then you use:
from mypackage import modulex
Then use the package name to reference the object:
modulex.MyClass()
etc.
You should (almost) never use
from mypackage import *
since that can introduce bugs (duplicate names from different modules will end up clobbering one another).
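A small sketch of the clobbering problem, assuming two hypothetical modules in mypackage that both define a function named process:

from mypackage.a import *    # binds a's process
from mypackage.b import *    # silently rebinds process to b's version

process()                    # calls b's process; a's is shadowed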
No, that is not the standard pattern. from something import * is usually not good practice, as it will import a lot of things you did not intend to. Instead, follow the same approach as you did, but import names explicitly from one module into another, e.g.:
If base.py has def myfunc, then in main.py use from base import myfunc, so that for your users main.myfunc works too. Of course, you need to take care that you don't end up with a circular import.
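A minimal sketch of that re-export pattern (the function body is just a placeholder):

# base.py
def myfunc():
    return "hello"

# main.py
from base import myfunc    # re-exported: users can call main.myfunc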
Also, if you find that from something import * really is required, then control the exported names using the __all__ construct.
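For example (a sketch; the underscore helper is hypothetical), __all__ limits what a star import pulls in:

# base.py
__all__ = ['myfunc']    # `from base import *` exports only myfunc

def myfunc():
    return "hello"

def _internal_helper():    # not listed in __all__, so not star-imported
    pass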
Related
Suppose I were to create a module called, for example, imp_mod.py, and inside it I placed all the (subjectively) relevant modules that I frequently use.
Would importing this module into my main program allow me access to the imports contained inside imp_mod.py?
If so, what disadvantages would this bring?
I guess a major advantage would be a reduction in time spent importing, even though it's only a couple of seconds saved...
Yes, it would allow you to access them. If you place these imports in imp_mod.py:
from os import listdir
from collections import defaultdict
from copy import deepcopy
Then, you could do this in another file, say, myfile.py:
import imp_mod
imp_mod.listdir
imp_mod.defaultdict
imp_mod.deepcopy
You're wrong about the reduction in importing time; what happens is actually the opposite. Python will need to import imp_mod and then import the other modules afterwards, whereas the first import would not be needed if you imported those modules in myfile.py itself. And if you do the same imports in another file, the modules will already be in the import cache, so virtually no time is spent on the repeated import.
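You can see the cache directly: sys.modules is the dictionary where Python stores every module it has already imported, and a repeated import is just a lookup in it:

import sys

import collections    # the first import executes the module body
assert 'collections' in sys.modules
import collections    # a repeated import is only a cache lookup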
The real disadvantage here is less readability. Whoever looks at imp_mod.listdir, for example, will wonder what on earth this function is and why it has the same name as the os module's function. And when they have to open imp_mod.py just to find out that it is the same function, well, they probably won't be happy. I wouldn't be.
As lucasnadalutti mentioned, you can access them by importing your module.
In terms of advantages, it can make your main program care less about where the imports are coming from, since imp_mod handles all of them. However, as your program gets more complex and starts to include more namespaces, this approach can get messy. You can handle some of this by using __init__.py within directories to do a similar thing, but as things get more complex, I personally feel it adds complexity rather than removing it. I'd rather just know where a module came from so I can look it up.
Python is an extremely elegant language. Well, except... except imports. I still can't get them to work the way that seems natural to me.
I have a class MyObjectA which is in file mypackage/myobjecta.py. This object uses some utility functions which are in mypackage/utils.py. So in my first lines in myobjecta.py I write:
from mypackage.utils import util_func1, util_func2
But some of the utility functions create and return new instances of MyObjectA. So I need to write in utils.py:
from mypackage.myobjecta import MyObjectA
Well, no I can't. This is a circular import and Python will refuse to do that.
There are many questions here regarding this issue, but none seems to give a satisfactory answer. From what I can read in all the answers:
1) Reorganize your modules, you are doing it wrong! But I do not know how to organize my modules any better, even in a case as simple as the one I presented.
2) Try just import ... rather than from ... import ... (personally I hate writing, and potentially refactoring, all the fully qualified names; I love to see exactly what I am importing into a module from the outside world). Would that help? I am not sure; there are still circular imports.
3) Do hacks like importing something in the inner scope of a function body, just one line before you use something from the other module.
I am still hoping there is a solution number 4), one that is Pythonic in the sense of being functional, elegant, simple, and working. Or is there not?
Note: I am primarily a C++ programmer; the example above is so easily solved by including the corresponding headers that I can't believe it is not possible in Python.
There is nothing hackish about importing something in a function body; it's an absolutely valid pattern:
def some_function():
    # The import statement runs on every call, but after the first time
    # it is only a cheap lookup in the module cache (sys.modules).
    import logging
    logging.getLogger(__name__).info("doing some logging")
Usually ImportErrors are only raised because the import machinery evaluates all the top-level statements of the entire file when a module is first imported.
In case you do not have a logical circular dependency, nothing is impossible in Python...
There is a way around it if you positively want your imports at the top. From David Beazley's excellent talk Modules and Packages: Live and Let Die! - PyCon 2015 (around 1:54:00), here is a way to deal with circular imports in Python:
try:
    from images.serializers import SimplifiedImageSerializer
except ImportError:
    import sys
    SimplifiedImageSerializer = sys.modules[__package__ + '.SimplifiedImageSerializer']
This tries to import SimplifiedImageSerializer and, if an ImportError is raised (due to a circular import, or to it not existing), pulls it from the import cache (sys.modules) instead.
PS: You have to read this entire post in David Beazley's voice.
Don't import mypackage.utils into your main module; it already exists in mypackage.myobjecta. Once you import mypackage.myobjecta, the code from that module is executed, and you don't need to import anything else into your current module, because mypackage.myobjecta is already complete.
What you want isn't possible. There's no way for Python to know in which order it needs to execute the top-level code in order to do what you ask.
Assume you import utils first. Python will begin by evaluating the first statement, from mypackage.myobjecta import MyObjectA, which requires executing the top level of the myobjecta module. Python must then execute from mypackage.utils import util_func1, util_func2, but it can't do that until it resolves the myobjecta import.
Instead of recursing infinitely, Python resolves this situation by letting the innermost import see the partially executed module. Thus, the import of utils "completes" without the rest of its file having run, and the inner from-import fails because util_func1 doesn't exist yet.
The reason import myobjecta works is that it allows the symbols to be resolved later, after the body of every module has executed. Personally, I've run into a lot of confusion even with this kind of circular import, and so I don't recommend using them at all.
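A minimal sketch of that plain-import form, using the names from the question (the make_another method is a hypothetical illustration). Both modules must use plain imports, so that every dotted name is resolved at call time, after both files have finished executing:

# mypackage/utils.py
import mypackage.myobjecta

def util_func1():
    # Attribute lookup happens at call time, when myobjecta is complete.
    return mypackage.myobjecta.MyObjectA()

# mypackage/myobjecta.py
import mypackage.utils

class MyObjectA:
    def make_another(self):
        return mypackage.utils.util_func1()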
If you really want to use a circular import anyway, and you want them to be "from" imports, I think the only way it can reliably work is this: Define all symbols used by another module before importing from that module. In this case, your definitions for util_func1 and util_func2 must be before your from mypackage.myobjecta import MyObjectA statement in utils, and the definition of MyObjectA must be before from mypackage.utils import util_func1, util_func2 in myobjecta.
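And a minimal sketch of that define-before-import ordering, again with the names from the question (the bodies are placeholders):

# mypackage/utils.py
def util_func1():
    return MyObjectA()    # the name is resolved at call time

def util_func2():
    return MyObjectA()

# Import only after the names above exist.
from mypackage.myobjecta import MyObjectA

# mypackage/myobjecta.py
class MyObjectA:
    pass

# Import only after MyObjectA exists.
from mypackage.utils import util_func1, util_func2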
Compiled languages like C# can handle situations like this because the top level is a collection of definitions, not instructions. They don't have to create every class and every function in the order given. They can work things out in whatever order is required to avoid any cycles. (C++ does it by duplicating information in prototypes, which I personally feel is a rather hacky solution, but that's also not how Python works.)
The advantage of a system like Python is that it's highly dynamic. Yes you can define a class or a function differently based on something you only know at runtime. Or modify a class after it's been created. Or try to import dependencies and go without them if they're not available. If you don't feel these things are worth the inconvenience of adhering to a strict dependency tree, that's totally reasonable, and maybe you'd be better served by a compiled language.
Pythonistas frown upon importing inside a function. Pythonistas usually frown upon global variables. Yet I have seen both, and I don't think the projects that used them were any worse than others done by strict Pythonistas. The feature does exist, so I won't go into a long argument over its utility.
There is a downside to importing inside a function: when you import at the top of a file (or at the bottom, really), the import takes some time (a small amount, but some), and Python then caches the entire module, so if another file needs the same import, Python can retrieve it quickly without re-importing. Whereas if you import inside a function, Python has to process the import line each time you call the function, which might, in a tiny way, slow your program down.
A solution to this is to cache the module independently, in a global. Okay, this uses imports inside function bodies AND global variables. Wow!
_MODULEA = None

def util1():
    # `global` is needed: without it, the import below would bind a
    # function-local name and the cache would never be filled.
    global _MODULEA
    if _MODULEA is None:
        from mymodule import modulea as _MODULEA
    obj = _MODULEA.ClassYouWant
    return obj
I saw this strategy adopted in a project using a flat API. Whether you like it or not (and I'm not sure I do myself), it works and it is fast, because the import line is executed only once (when the function first runs). Still, I would recommend restructuring: problems with circular imports usually indicate a problem in the structure, and that is always worth fixing. I do agree, though, that it would be nice if Python gave more useful errors when this kind of situation happens.
I have a Django project structured like so:
appname/
    models/
        __init__.py
        a.py
        base.py
        c.py
... where appname/models/__init__.py contains only statements like so:
from appname.models.base import Base
from appname.models.a import A
from appname.models.c import C
... and where appname/models/base.py contains:
import django.db.models

class Base(django.db.models.Model):
    ...
and where appname/models/a.py contains:
import appname.models as models

class A(models.Base):
    ...
...and similarly for appname/models/c.py, etc.
I am quite happy with this structure of my code, but of course it does not work, because of circular imports.
When appname/models/__init__.py is run, appname/models/a.py gets run (because it is imported), but that module imports appname.models, which has not finished executing yet. A classic circular import.
So this supposedly indicates that my code is structured poorly and needs to be re-designed in order to avoid circular dependency.
What are the options to do that?
Some solutions I can think of and then why I don't want to use them:
Combine all my model code into a single file: Having 20+ classes in the same file is a far worse style than what I am trying to do (with separate files), in my opinion.
Move the "Base" model class into another package outside of "appname/models": This means that I would end up with package in my project that contains base/parent classes that should ideally be split into the packages in which their child/sub classes are located. Why should I have base/parent classes for models, forms, views, etc. in the same package and not in their own packages (where the child/sub classes would be located), other than to avoid circular imports?
So my question is not just how to avoid circular imports, but how to do so in a way that is just as clean (if not cleaner) than what I tried to implement.
Does anyone have a better way?
Edit
I have researched this more thoroughly and come to the conclusion that this is a bug in either core Python or the Python documentation. More information is available at this question and answer.
Python's PEP 8 indicates a clear preference for absolute over relative imports. This problem has a workaround that involves relative imports, and there is a possible fix in the import machinery.
My original answer below gives examples and workarounds.
Original answer
The problem, as you have correctly deduced, is circular dependencies. In some cases, Python can handle these just fine, but if you get too many nested imports, it has issues.
For example, if you only have one package level, it is actually fairly hard to get it to break (without mutual imports), but as soon as you nest packages, the imports start to behave like mutual imports, and it becomes difficult to make everything work. Here is an example that provokes the error:
level1/__init__.py
from level1.level2 import Base
level1/level2/__init__.py
from level1.level2.base import Base
from level1.level2.a import A
level1/level2/a.py
import level1.level2.base
class A(level1.level2.base.Base): pass
level1/level2/base.py
class Base: pass
The error can be "fixed" (for this small case) in several different ways, but many potential fixes are fragile. For example, if you don't need the import of A in the level2 __init__.py file, removing that import will fix the problem (and your program can later execute import level1.level2.a and use level1.level2.a.A), but if your package gets more complex, you will see the errors creeping in again.
Python sometimes does a good job of making these complex imports work, and the rules for when they will and won't work are not at all intuitive. One general rule is that from xxx.yyy import zzz can be more forgiving than import xxx.yyy followed by xxx.yyy.zzz. In the latter case, the interpreter has to have finished binding yyy into the xxx namespace when it is time to retrieve xxx.yyy.zzz, but in the former case, the interpreter can traverse the modules in the package before the top-level package namespace is completely set up.
So for this example, the real problem is the bare import in a.py. This can easily be fixed:
from level1.level2.base import Base
class A(Base): pass
Consistently using relative imports is a good way to enforce this use of from ... import, for the simple reason that relative imports do not work without the from. To use relative imports with the example above, level1/level2/a.py should contain:
from .base import Base
class A(Base): pass
This breaks the problematic import cycle and everything else works fine. If the imported name (such as Base) is too confusingly generic when not prefixed with the source module name, you can easily rename it on import:
from .base import Base as BaseModel
class A(BaseModel): pass
Although that fixes the current problem, if the package structure gets more complex, you might want to consider using relative imports more generally. For example, level1/level2/__init__.py could be:
from .base import Base
from .a import A
I'm writing a small package for internal use and have come to a design problem. I define a few classes and constants (e.g., a server IP address) in some file; let's call it mathfunc.py. Now, some of these classes and constants will be used by other files in the same package. My current setup is like this:
/mypackage
    __init__.py
    mathfunc.py
    datefunc.py
So, at the moment I think I have to import mathfunc.py in datefunc.py to use the classes defined there (or alternatively import both of them all the time). This sounds wrong to me, because then I'll be in a lot of pain importing lots of files everywhere. Is this a proper design at all, or is there some other way? Maybe I can put all the definitions in some file which will not be a subpackage of its own, but will be used by all the other files?
Nope, that's pretty much how Python works. If you want to use objects declared in another file, you have to import from it.
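For example, a minimal sketch (SERVER_IP and Calculator are hypothetical stand-ins for whatever mathfunc.py actually defines):

# datefunc.py
from mypackage.mathfunc import SERVER_IP, Calculator

def build_report():
    return Calculator().query(SERVER_IP)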
Tips:
You can keep your namespace clean by only importing the things you need, rather than using from foo import *.
If you really need to do a "circular import" (where A needs things in B, and B needs things in A) you can solve that by only importing inside the functions where you need the object, not at the top of a file.
I am relatively new to Python and am trying to learn the "Pythonic" way of doing things, to build a solid foundation in Python development. Perhaps what I want to achieve is not Pythonic at all, but I am nonetheless seeking the "right" way to solve this issue.
I am building an application, for which I am creating modules. I just noticed that one module of mine has 7 different .py files, all importing the same 3 things. So all these files share those imports.
I tried removing them and putting the imports into the empty __init__.py in the folder, but that did not do the trick.
If possible, since these imports are needed by all of the module's files, I would prefer not to import them in each file one by one.
What can I do to share these common imports?
Thank you very much, I really appreciate your kind help.
As the Zen of Python states, "Explicit is better than implicit", and this is a good example.
It's very useful to have the dependencies of a module listed explicitly in the imports and it means that every symbol in a file can be traced to its origin with a simple text search. E.g. if you search for some_identifier in your file, you'll either find a definition in the file, or from some_module import some_identifier. It's even more obvious with direct references to some_module.some_identifier. (This is also one reason why you should not do from module import *.)
One thing you could do, without losing the above property, is to import your three shared modules into a fourth module:
#fourth.py
import first
import second
import third
then...
#another.py
import fourth
fourth.first.some_function()
#etc.
If you can't stomach that (it does make calls more verbose, after all) then the duplication of three imports is fine, really.
I agree with DrewV: it is perfectly Pythonic to do
File1:
import xyz
import abc
...
File2:
import xyz
An almost identical question has also been addressed in the following link:
python multiple imports for a common module
As it explains, Python optimises module loading, so you can write multiple import statements and not worry about performance losses: the module is only loaded once. In fact, listing all the imports in each file makes it explicitly clear what each file depends on.
And for a discussion of how imports interact with namespaces, see:
Python imports across modules and global variables