I'm cleaning up some Python (2.7) code that I inherited, and have come across a circular import scenario that I'd like to get rid of. The code currently runs (by abusing the import function), but it's messy and causes issues when other code doesn't access it in a specific way.
The file structure is essentially this:
/deep/nested/path/__init__.py
/deep/nested/path/objects.py
/deep/nested/path/api.py
objects is a collection of data models.
api exposes a developer interface with functions to get/create instances of objects.
The circular import occurs because some objects need to invoke api functions to create child objects.
This section of code handles analytics and is executed a lot (many objects, deep recursion). The package namespace is fairly nested too, so using the full package path has a tangible effect on performance.
I'm very tempted to just move the factory functions needed by objects into that file, and then import them back into api for general use. That would solve my problems (and eliminate a dot), but lose some of the code organization (which is actually pretty decent). I'm hoping for another set of eyes to give some input.
While there are several questions about circular imports already here, I'm not concerned with getting this to work (which it does). I'm concerned with minimizing the dot notation. api.factory and objects.foo work, but package.api.factory won't.
Perhaps it would be better to move those factory functions into a third module. Then objects can import it to create its objects; api can import it if needed; other modules can import it if they need what it contains.
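A minimal sketch of that layout, assuming a hypothetical third module called factories.py that keeps a small class registry so it needs no imports of its own. The package name demo_pkg, the Node class, and the register/create helpers are all made up for illustration; the files are written to a temporary directory only so the example runs as one script.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical file contents; only objects.py and api.py exist in the question.
FILES = {
    "__init__.py": "",
    "factories.py": """
        # Imports nothing from the package, so it cannot be part of a cycle.
        _registry = {}

        def register(cls):
            _registry[cls.__name__] = cls
            return cls

        def create(kind, *args, **kwargs):
            return _registry[kind](*args, **kwargs)
    """,
    "objects.py": """
        from . import factories

        @factories.register
        class Node(object):
            def __init__(self, name, parent=None):
                self.name = name
                self.parent = parent

            def add_child(self, name):
                # One dot: factories.create, not package.api.create.
                return factories.create("Node", name, parent=self)
    """,
    "api.py": """
        from . import objects            # ensures the classes are registered
        from .factories import create    # re-exported for API users
    """,
}

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
for filename, source in FILES.items():
    with open(os.path.join(pkg_dir, filename), "w") as handle:
        handle.write(textwrap.dedent(source))

sys.path.insert(0, root)
api = importlib.import_module("demo_pkg.api")

parent = api.create("Node", "root")
child = parent.add_child("leaf")
print(child.name, child.parent.name)
```

The dependency arrows now point one way only: api and objects both import factories, and factories imports neither.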
I have an application that dynamically generates a lot of Python modules with class factories to eliminate a lot of redundant boilerplate that makes the code hard to debug across similar implementations and it works well except that the dynamic generation of the classes across the modules (hundreds of them) takes more time to load than simply importing from a file. So I would like to find a way to save the modules to a file after generation (unless reset) then load from those files to cut down on bootstrap time for the platform.
Does anyone know how I can save/export auto-generated Python modules to a file for re-import later? I already know that pickling and exporting as a JSON object won't work because they make use of thread locks and other dynamic state variables and the classes must be defined before they can be pickled. I need to save the actual class definitions, not instances. The classes are defined with the type() function.
If you have ideas or knowledge on how to do this I would really appreciate your input.
You’re basically asking how to write a compiler whose input is a module object and whose output is a .pyc file. (One plausible strategy is of course to generate a .py and then byte-compile that in the usual fashion; the following could even be adapted to do so.) It’s fairly easy to do this for simple cases: the .pyc format is very simple (but note the comments there), and the marshal module does all of the heavy lifting for it. One point of warning that might be obvious: if you’ve already evaluated, say, os.getcwd() when you generate the code, that’s not at all the same as evaluating it when loading it in a new process.
The “only” other task is constructing the code objects for the module and each class: this requires concatenating a large number of boring values from the dis module, and will fail if any object encountered is non-trivial. These might be global/static variables/constants or default argument values: if you can alter your generator to produce modules directly, you can probably wrap all of these (along with anything else you want to defer) in function calls by compiling something like
my_global=(lambda: open(os.devnull,'w'))()
so that you actually emit the function and then a call to it. If you can’t so alter it, you’ll have to have rules to recognize values that need to be constructed in this fashion so that you can replace them with such calls.
Another detail that may be important is closures: if your generator uses local functions/classes, you’ll need to create the cell objects, perhaps via “fake” closures of your own:
def cell(x): return (lambda: x).__closure__[0]
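The simpler strategy mentioned in passing above (generate a .py, then byte-compile it in the usual fashion) could look roughly like this. It is only a sketch under assumptions: CLASS_SPECS, render_class, and the module name generated_models are hypothetical stand-ins for whatever drives your type() calls, and it only works when the generator can be changed to emit source text.

```python
import importlib
import os
import py_compile
import sys
import tempfile

# Hypothetical specs standing in for the data that feeds the type() calls.
CLASS_SPECS = {"User": ["name", "email"], "Order": ["number"]}


def render_class(name, fields):
    """Return source text for one class instead of building it with type()."""
    lines = [
        "class %s(object):" % name,
        "    fields = %r" % (tuple(fields),),
    ]
    return "\n".join(lines) + "\n"


source = "\n".join(
    render_class(name, fields) for name, fields in sorted(CLASS_SPECS.items())
)

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "generated_models.py")
with open(path, "w") as handle:
    handle.write(source)

# Byte-compile ahead of time so later imports can load the cached bytecode.
py_compile.compile(path, doraise=True)

sys.path.insert(0, out_dir)
generated_models = importlib.import_module("generated_models")
print(generated_models.User.fields)
```

Anything that must be evaluated at load time rather than at generation time (locks, open files, the current directory) would have to be emitted as an expression in the source, not as a value.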
Django==2.2.5
The example below contains two custom filters and two auxiliary functions.
It is a made-up example, not real code.
Two problems with this code:
When a project becomes big I forget what aux functions I have already written. Not to mention team programming. What is the solution here? To organize a separate module for functions that can be imported? And sort them alphabetically?
Some functions from here may be reused outside this package, and some may not. Say, the combine function seems to be reusable, while get_salted_str is definitely for this module only. I think that it is better to distinguish between functions that may be imported and those that may not. Is it better to use underline symbol to mark unimported functions? Like this: _get_salted_str. This may ease the first problem a bit.
Does Django style guide or any other pythonic style guide mention solutions to the two above mentioned problems?
My code example:
from django import template

register = template.Library()

def combine(str1, str2):
    return "{}_{}".format(str1, str2)

def get_salted_str(str):
    SALT = "slkdjghslkdjfghsldfghaaasd"
    return combine(str, SALT)

@register.filter
def get_salted_string(str):
    return combine(str, get_salted_str(str))

@register.filter
def get_salted_peppered_string(str):
    salted_str = get_salted_str(str)
    PEPPER = "1234128712908369735619346"
    return "{}_{}".format(PEPPER, salted_str)
When a project becomes big I forget what aux functions I have already written. Not to mention team programming. What is the solution here?
Good documentation and proper modularization.
To organize a separate module for functions that can be imported?
Technically, all functions (except of course nested ones) can be imported. Now I assume you meant: "for functions that are meant to be imported from other modules", but even then, it doesn't mean much - it often happens that a function primarily intended for "internal use" (helper function used within the same module) later becomes useful for other modules.
Also, the proper way to group functions is not based on whether they are for internal or public use (this is handled by prefixing 'internal use only' functions with a single leading underscore), but on how those functions are related.
NB: I use the term "function" because that's how you phrased your question, but this applies to all other names (classes etc).
And sort them alphabetically?
Bad idea IMHO - it doesn't make any sense from a functional POV, and can cause issues when merging diverging branches.
Some functions from here may be reused outside this package, and some may not. Say, the combine function seems to be reusable, while "get_salted_str" is definitely for this module only. I think that it is better to distinguish between functions that may be imported and those that may not. Is it better to use underline symbol to mark unimported functions? Like this: _get_salted_str. This may ease the first problem a bit.
Why would you prevent get_salted_str from being imported by another module, actually?
'protected' (single leading underscore) names are for implementation parts that the module's client code should not mess with nor even be aware of - this is called "encapsulation" -, the goal being to allow for implementation changes that won't break the client code.
In your example, get_salted_str() is a template filter, so it's obviously part of your package's public API.
OTOH, combine really looks like an implementation detail - the fact that some unrelated code in another package may need to combine two strings with the same separator seems mostly accidental, and if you expose combine as part of the module's API you can no longer change its implementation freely. This is typically an implementation function as far as I can tell from your example (and it's also so trivial that it really doesn't warrant being exposed as far as I'm concerned).
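To make the single-leading-underscore convention concrete, here is a small sketch. The module name demo_filters is made up, and the module is built in memory only so the example runs as a single script. Names starting with an underscore are skipped by a star import (when the module defines no `__all__`), but nothing stops client code from reaching them explicitly; it is a convention, not an access control.

```python
import sys
import types

SOURCE = '''
def _combine(str1, str2):
    """Implementation detail: free to change without notice."""
    return "{}_{}".format(str1, str2)

def get_salted_string(value):
    """Public: part of the module's API."""
    return _combine(value, "salt")
'''

# Build a throwaway module object and register it under a hypothetical name.
module = types.ModuleType("demo_filters")
exec(SOURCE, module.__dict__)
sys.modules["demo_filters"] = module

namespace = {}
exec("from demo_filters import *", namespace)

print("get_salted_string" in namespace)  # True
print("_combine" in namespace)           # False
print(module._combine("a", "b"))         # still reachable explicitly: a_b
```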
On a more general level: while avoiding duplication is a very laudable goal, you must be careful not to overdo it. Some duplication is actually "accidental" - at some point in time, two totally unrelated parts of the code have a few lines in common, but for totally different reasons, and the forces that may lead to a change in one part of the code are totally unrelated to the other part. So before factoring out seemingly duplicated code, ask yourself if this code is doing the same thing for the same reasons and whether changing this code in one part should affect the other part too.
Does Django style guide or any other pythonic style guide mention solutions to the two above mentioned problems?
This is nothing specific to Django, nor even to Python. Writing well organized code relies on the same heuristics whatever the language: you want high cohesion (all functions / classes etc in a same module should be related and provide solutions to the same problems) and low coupling (a module should depend on as few other modules as possible).
NB: I'm talking about "modules" here but the same rules hold for packages (a package is kind of a super-module) or classes (a class is a kind of mini-module too - except that you can have multiple instances of it).
Now it must be said that proper modularisation - like proper naming etc - IS hard. It takes time and experience (and a lot of reflection) to develop (no pun intended but...) a "feel" for it, and even then you often find yourself reorganizing things quite a bit during your project's lifetime. And, well, there will almost always be some messy area somewhere, because sometimes finding out where a given feature really belongs is a bit of a wild guess (hint: look for modules or packages named "util" or "utils" or "helpers" - those are usually where the dev grouped stuff that didn't clearly belong anywhere else).
There are a lot of ways to go about this, so here is the way I always handle this:
1. Reusable functions in a project
First and foremost: Documentation. When working in a big team you definitely need to document reusable functions.
Second, packages. When creating a lot of auxiliary/helper functions, that might have a use outside the current module or app, it can be useful to bundle them all together. I often create a 'base' or 'utils' package in my Django project where I bundle all sorts of functions.
The django.contrib package is a pretty good example of all sorts of helper packages bundled into one.
My rule of thumb is, if I find that I reuse some function/piece of code, I move it to my utils package, and if it's related to something else in that package, I bundle them together. That makes it pretty easy to keep track of all the functions there are.
2. Private functions
Python doesn't really have private members, but the generally accepted way to 'mark' a member as private is to add a leading underscore, like _get_salted_str.
3. Style guide
With regard to auxiliary functions, I'm not aware of any style guide.
'Private' members : https://docs.python.org/3/tutorial/classes.html#private-variables
Ok so it is like this.
I'd rather not give away my code but if you really need it I will. I have two modules that need a bit from each other. the modules are called webhandler and datahandler.
In webhandler I have a line:
import datahandler
and in datahandler I have another line:
import webhandler
Now I know this is terrible code and a circular import like this causes the code to run twice (which is what I'm trying to avoid).
However the datahandler module needs to access several functions from the webhandler module, and the webhandler module needs access to several variables that are generated in the datahandler module. I don't see any workaround other than moving functions to different modules but that would ruin the organisation of my program and make no logical sense with the module naming.
Any help?
Circular dependencies are a form of code smell. If you have two modules that depend on each other, then that’s a very bad sign, and you should restructure your code.
There are a few different ways to do this; which one is best depends on what you are doing, and what parts of each module are actually used by another.
A very simple solution would be to just merge both modules, so you only have a single module that only depends on itself, or rather on its own contents. This is simple, but since you had separated modules before, it’s likely that you are introducing new problems that way because you no longer have a separation of concerns.
Another solution would be to make sure that the dependencies are actually required. If there are only a few parts of a module that depend on the other, maybe you could move those bits around in a way that the circular dependency is no longer required, or utilize the way imports work to make the circular dependencies no longer a problem.
The better solution would probably be to move the dependencies into a separate new module. If naming is really the hardest problem about that, then you’re probably doing it right. It might “ruin the organisation of [your] program” but since you have circular dependencies, there is something inherently wrong with your setup anyway.
What others have said about not doing circular imports is the best solution, but if you end up absolutely needing them (possibly for backwards compatibility or clarity of code), it's usually within just one method or function of one of the modules. Thus you can safely do this:
# modA.py
import modB

# modB.py
def functionDependingOnA():
    import modA
    ...
There's a slight overhead to doing the import each time the function is called, but it is rather low (about 400 ns in my testing) unless it's called all the time.
You could also do it like this to avoid even that lookup:
# modA -- same as above.

# modB.py
_imports = {}

def _importA():
    import modA
    _imports['modA'] = modA
    return modA

def functionDependingOnA():
    modA = _imports.get('modA') or _importA()
This version only added 40ns of time on the second and subsequent calls, or about the same amount of time as an empty local function call.
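Here is a runnable sketch of the function-level import, using the modA/modB names from above. The greet function is made up for illustration, and the two files are written to a temporary directory only so that everything fits in one script.

```python
import importlib
import os
import sys
import tempfile

# modA imports modB at module level, exactly as in the snippet above.
MOD_A = (
    "import modB\n"
    "\n"
    "def greet():\n"
    "    return 'hello from modA'\n"
)

# modB defers its import of modA until the function is actually called,
# by which time modA has finished loading.
MOD_B = (
    "def functionDependingOnA():\n"
    "    import modA\n"
    "    return modA.greet()\n"
)

root = tempfile.mkdtemp()
for filename, source in (("modA.py", MOD_A), ("modB.py", MOD_B)):
    with open(os.path.join(root, filename), "w") as handle:
        handle.write(source)

sys.path.insert(0, root)
modA = importlib.import_module("modA")
modB = importlib.import_module("modB")

print(modB.functionDependingOnA())  # hello from modA
```

After the first call, the import statement inside the function is just a lookup in sys.modules, which is where the small per-call cost comes from.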
SQLAlchemy uses the Dependency Injection pattern, where the required module is passed to a function by a decorator:
@util.dependencies("sqlalchemy.orm.util")
def identity_key(cls, orm_util, *args, **kwargs):
    return orm_util.identity_key(*args, **kwargs)
This approach is basically the same as doing an import inside a function, but has slightly better performance.
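To show the general idea, here is a minimal sketch of such a decorator. This is not SQLAlchemy's implementation: the dependencies decorator below is a hypothetical one written for this example, it passes the resolved modules as the first positional arguments, and it uses the standard json module as the injected dependency.

```python
import functools
import importlib


def dependencies(*module_names):
    """Hypothetical decorator: import the named modules on first call
    and pass them to the wrapped function as leading arguments."""
    def decorator(func):
        resolved = []

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not resolved:
                # Deferred until the first call, so no import-time cycle.
                resolved.extend(
                    importlib.import_module(name) for name in module_names
                )
            return func(*(tuple(resolved) + args), **kwargs)

        return wrapper
    return decorator


@dependencies("json")
def to_json(json_mod, value):
    return json_mod.dumps(value, sort_keys=True)


print(to_json({"b": 1, "a": 2}))  # {"a": 2, "b": 1}
```

The import happens once per decorated function; later calls reuse the cached module object.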
the webhandler module needs access to several variables that are generated in the datahandler module
It might make sense to push any "generated" data to a third location. So datahandler functions call config.setvar( name, value ) when appropriate and webhandler functions call config.getvar( name ) when they need to. config would be a third sub-module, containing simple setvar and getvar functions that you write (wrappers around setting/getting elements of a global dictionary would be the simplest approach).
Then the datahandler code would import webhandler, config but webhandler would only need to import config.
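A minimal sketch of what such a config module might contain. The setvar and getvar names come from the description above; the private dictionary name, the default parameter, and the row_count key are assumptions made for the example.

```python
# Contents of a hypothetical config.py: it imports neither handler module.
_values = {}


def setvar(name, value):
    """Store a value under the given name."""
    _values[name] = value


def getvar(name, default=None):
    """Return the stored value, or default if it was never set."""
    return _values.get(name, default)


# In datahandler: publish generated data.
setvar("row_count", 42)

# In webhandler: read it back without importing datahandler.
print(getvar("row_count"))  # 42
```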
I agree with poke however, that the need for such a question betrays the fact that you probably haven't yet got the design finalized as neatly and logically as you thought. If it were me, I would re-think the way modules are divided up.
I am working on my program, GarlicSim, in which a user creates a simulation, then he is able to manipulate it as he desires, and then he can save it to file.
I recently tried implementing the saving feature. The natural thing that occurred to me was to pickle the Project object, which contains the entire simulation.
Problem is, the Project object also includes a module: the "simulation package", a package/module that contains several critical objects, mostly functions, that define the simulation. I need to save them together with the simulation, but it seems that it is impossible to pickle a module, as I witnessed when I tried to pickle the Project object and an exception was raised.
What would be a good way to work around that limitation?
(I should also note that the simulation package gets imported dynamically in the program.)
If the project somehow has a reference to a module with stuff you need, it sounds like you might want to refactor the use of that module into a class within the module. This is often better anyway, because the use of a module for stuff smells of a big fat global. In my experience, such an application structure will only lead to trouble.
(Of course the quick way out is to save the module's dict instead of the module itself.)
If you have the original code for the simulation package modules, which I presume are dynamically generated, then I would suggest serializing that and reconstructing the modules when loaded. You would do this in the Project.__getstate__() and Project.__setstate__() methods.
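A rough sketch of that approach, assuming the source text of the simulation package is available as a string. The attribute names simpack_name, simpack_source, simpack and history, the helper module_from_source, and the one-function example source are all hypothetical; the real Project class will look different.

```python
import pickle
import types


def module_from_source(name, source):
    """Rebuild a module object by executing its source text."""
    module = types.ModuleType(name)
    exec(compile(source, "<%s>" % name, "exec"), module.__dict__)
    return module


class Project(object):
    def __init__(self, simpack_name, simpack_source):
        self.simpack_name = simpack_name
        self.simpack_source = simpack_source
        self.simpack = module_from_source(simpack_name, simpack_source)
        self.history = []

    def __getstate__(self):
        # Drop the module object (not picklable); keep its name and source.
        state = self.__dict__.copy()
        del state["simpack"]
        return state

    def __setstate__(self, state):
        # Restore plain attributes, then reconstruct the module from source.
        self.__dict__.update(state)
        self.simpack = module_from_source(self.simpack_name, self.simpack_source)


SOURCE = "def step(x):\n    return x + 1\n"

project = Project("demo_simpack", SOURCE)
project.history.append(project.simpack.step(1))

restored = pickle.loads(pickle.dumps(project))
print(restored.history, restored.simpack.step(10))
```

If the package is an ordinary importable one, storing only its name and re-importing it in `__setstate__` would be a simpler variant of the same idea.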
Ruby uses require, Python uses import. They're substantially different models, and while I'm more used to the require model, I can see a few places where I think I like import more. I'm curious what things people find particularly easy — or more interestingly, harder than they should be — with each of these models.
In particular, if you were writing a new programming language, how would you design a code-loading mechanism? Which "pros" and "cons" would weigh most heavily on your design choice?
The Python import has a major feature in that it ties two things together -- how to find the import and under what namespace to include it.
This creates very explicit code:
import xml.sax
This specifies where to find the code we want to use, by the rules of the Python search path.
At the same time, all objects that we want to access live under this exact namespace, for example xml.sax.ContentHandler.
I regard this as an advantage over Ruby's require. require 'xml' might in fact make objects inside the namespace XML or any other namespace available in the module, without this being directly evident from the require line.
If xml.sax.ContentHandler is too long, you may specify a different name when importing:
import xml.sax as X
And it is now available under X.ContentHandler.
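A quick check of the aliasing, using only the standard library: the alias is just another name bound to the same module object, so nothing is copied.

```python
import xml.sax
import xml.sax as X

# Both names refer to the same module, hence to the same class object.
print(X is xml.sax)                                # True
print(X.ContentHandler is xml.sax.ContentHandler)  # True
```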
This way Python requires you to explicitly build the namespace of each module. Python namespaces are thus very "physical", and I'll explain what I mean:
By default, only names directly defined in the module are available in its namespace: functions, classes and so on.
To add to a module's namespace, you explicitly import the names you wish to add, placing them (by reference) "physically" in the current module.
For example, if we have the small Python package "process" with internal submodules machine and interface, and we wish to present this as one convenient namespace directly under the package name, this is an example of what we could write in the "package definition" file process/__init__.py:
from process.interface import *
from process.machine import Machine, HelperMachine
Thus we lift up what would normally be accessible as process.machine.Machine up to process.Machine. And we add all names from process.interface to process namespace, in a very explicit fashion.
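The lifting can be tried out with a throwaway copy of that package. In this sketch the `__init__.py` lines are the ones shown above, while the contents of interface.py and machine.py (start, _internal, and the empty classes) are invented for the example; the files go into a temporary directory so it runs as one script.

```python
import importlib
import os
import sys
import tempfile
import textwrap

FILES = {
    "__init__.py": """
        from process.interface import *
        from process.machine import Machine, HelperMachine
    """,
    # Hypothetical submodule contents, just enough to have names to lift.
    "interface.py": """
        def start():
            return "started"

        def _internal():
            return "hidden"
    """,
    "machine.py": """
        class Machine(object):
            pass

        class HelperMachine(Machine):
            pass
    """,
}

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "process")
os.mkdir(pkg_dir)
for filename, source in FILES.items():
    with open(os.path.join(pkg_dir, filename), "w") as handle:
        handle.write(textwrap.dedent(source))

sys.path.insert(0, root)
process = importlib.import_module("process")

print(process.Machine is process.machine.Machine)  # True
print(process.start())                             # started
print(hasattr(process, "_internal"))               # False
```

The star import skips names with a leading underscore, so _internal stays reachable only as process.interface._internal.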
The advantages of Python's import that I wrote about were simply two:
Clear what you include when using import
Explicit how you modify your own module's namespace (for the program or for others to import)
A nice property of require is that it is actually a method defined in Kernel. Thus you can override it and implement your own packaging system for Ruby, which is what e.g. Rubygems does!
PS: I am not selling monkey patching here, but the fact that Ruby's package system can be rewritten by the user (even to work like python's system). When you write a new programming language, you cannot get everything right. Thus if your import mechanism is fully extensible (into totally all directions) from within the language, you do your future users the best service. A language that is not fully extensible from within itself is an evolutionary dead-end. I'd say this is one of the things Matz got right with Ruby.
Python's import provides a very explicit kind of namespace: the namespace is the path, you don't have to look into files to know what namespace they do their definitions in, and your file is not cluttered with namespace definitions. This makes the namespace scheme of an application simple and fast to understand (just look at the source tree), and avoids simple mistakes like mistyping a namespace declaration.
A nice side effect is every file has its own private namespace, so you don't have to worry about conflicts when naming things.
Sometimes namespaces can get annoying too, having things like some.module.far.far.away.TheClass() everywhere can quickly make your code very long and boring to type. In these cases you can import ... from ... and inject bits of another namespace in the current one. If the injection causes a conflict with the module you are importing in, you can simply rename the thing you imported: from some.other.module import Bar as BarFromOtherModule.
Python is still vulnerable to problems like circular imports, but it's the application design more than the language that has to be blamed in these cases.
So python took C++ namespace and #include and largely extended on it. On the other hand I don't see in which way ruby's module and require add anything new to these, and you have the exact same horrible problems like global namespace cluttering.
Disclaimer, I am by no means a Python expert.
The biggest advantage I see to require over import is simply that you don't have to worry about understanding the mapping between namespaces and file paths. It's obvious: it's just a standard file path.
I really like the emphasis on namespacing that import has, but can't help but wonder if this particular approach isn't too inflexible. As far as I can tell, the only means of controlling a module's naming in Python is by altering the filename of the module being imported or using an as rename. Additionally, with explicit namespacing, you have a means by which you can refer to something by its fully-qualified identifier, but with implicit namespacing, you have no means to do this inside the module itself, and that can lead to potential ambiguities that are difficult to resolve without renaming.
i.e., in foo.py:
class Bar:
    def myself(self):
        return foo.Bar
This fails with:
Traceback (most recent call last):
  File "", line 1, in ?
  File "foo.py", line 3, in myself
    return foo.Bar
NameError: global name 'foo' is not defined
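One possible way to get a module-qualified reference from inside the module itself is to go through sys.modules; the NameError above arises because foo.py never binds the name foo in its own namespace. A minimal sketch, assuming the code lives in foo.py or is run directly as a script:

```python
import sys


class Bar(object):
    def myself(self):
        # Look up this module by its own name, then the class inside it.
        return sys.modules[__name__].Bar


print(Bar().myself() is Bar)  # True
```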
Both implementations use a list of locations to search from, which strikes me as a critically important component, regardless of the model you choose.
What if a code-loading mechanism like require was used, but the language simply didn't have a global namespace? i.e., everything, everywhere must be namespaced, but the developer has full control over which namespace the class is defined in, and that namespace declaration occurs explicitly in the code rather than via the filename. Alternatively, defining something in the global namespace generates a warning. Is that a best-of-both-worlds approach, or is there an obvious downside to it that I'm missing?