When I try to design package structures and class hierarchies within those packages in Python 3 projects, I'm constantly affected by circular import issues. This happens whenever I want to implement a class in some module, subclassing a base class from a parent package (i.e. from some __init__.py in some parent directory).
Although I technically understand why that happens, I have not been able to come up with a good solution so far. I found some threads here, but none of them go deeper into that particular scenario, let alone mention solutions.
In particular:
Putting everything in one file is probably not great. It could be quite a large amount of code, maybe 90% of the entire project! Maybe the project defines a common base class (e.g. a widget base class for a UI library) and then lots of subclasses in nicely organized subpackages. There are obvious reasons why we would not write that much into one file.
Implementing your base class outside of __init__.py and just importing it there can work in a few places, but it messes up the hierarchy with lots of aliases for the same thing (e.g. myproject.BaseClass vs. myproject.foo.BaseClass). Just not nice, is it? It also leads to a lot of boilerplate code that has to be maintained.
Implementing it outside, and not even importing it in __init__.py, makes those "fully qualified" names longer everywhere in the code, because they have to contain .foo everywhere I use BaseClass (yes, I usually do import myproject...somemodule, which is not what everybody does, but should not be bad afaik).
Some kinds of rather dirty tricks could help, e.g. defining those subclasses inside some kind of factory methods, so they are not defined at module level. Can work in a few situations, but is just terrible in general.
All of them are maybe okay in a single isolated situation, more as a kind of workaround imho, but on a larger scale they ruin a lot. Also, the dirtier the tricks, the more they break the services an IDE can provide.
All kinds of vague statements like 'then your class structure is probably bad and needs reworking' puzzle me a bit. I'm more or less distantly aware of some kinds of general good coding practices, although I never read "Clean Code" or similar stuff.
So, from that abstract perspective, what is wrong with subclassing a class from a parent module? And if nothing is, what is the secret workaround that people use in Python 3? Is there one, or is everybody basically dealing with one of the mentioned hacks? Sure, there are tradeoffs to be made everywhere. But here I'm really struggling, even after many years of writing Python code.
Okay, thank you! I think I was a bit on the wrong track here in understanding the actual problem. It's essentially what @tdelaney said, so it's more complicated than what I initially wrote. Subclassing from a parent package is allowed, but you must not import the submodule in the parent packages. That is what I do in one place here to offer some convenience things. So it's maybe not perfect, and maybe that's the point where you could argue that my structure is not well designed (but, well, things are always a compromise). At least what I posted initially is admittedly not true. And the workaround of using a few function-level imports is maybe kind of okay in that place.
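A minimal sketch of the layout discussed above, assuming a hypothetical package `myproject` with a base class in `__init__.py` and a subclass in `myproject/widgets.py`; `badpkg` is a hypothetical broken variant. The files are written to a temporary directory so the example runs on its own. When the parent imports the submodule at the top, before `BaseClass` exists, the submodule sees a half-initialized parent and fails with `AttributeError`. The convenience function instead imports the submodule at call time, when `myproject/__init__.py` has already finished executing.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def write_package(files):
    """Write hypothetical package files to a fresh temp dir and add it to sys.path."""
    root = Path(tempfile.mkdtemp())
    for name, body in files.items():
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(textwrap.dedent(body))
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()


# Broken: the parent package imports the submodule before BaseClass exists.
write_package({
    "badpkg/__init__.py": """
        from badpkg.widgets import Button
        class BaseClass:
            pass
    """,
    "badpkg/widgets.py": """
        import badpkg
        class Button(badpkg.BaseClass):  # badpkg is only half-initialized here
            pass
    """,
})
try:
    importlib.import_module("badpkg")
    bad_result = "imported"
except AttributeError:
    bad_result = "AttributeError"

# Working: the submodule is only imported inside the convenience function.
write_package({
    "myproject/__init__.py": """
        class BaseClass:
            def describe(self):
                return type(self).__name__

        def make_button():
            # Function-level import: runs at call time, when myproject is complete.
            from myproject.widgets import Button
            return Button()
    """,
    "myproject/widgets.py": """
        import myproject

        class Button(myproject.BaseClass):
            pass
    """,
})
myproject = importlib.import_module("myproject")
button = myproject.make_button()

print(bad_result)         # AttributeError
print(button.describe())  # Button
```

The subclass in `widgets.py` is unchanged between the two variants; only the moment at which the parent package pulls in the submodule differs.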
Django==2.2.5
The example below contains two custom filters and two auxiliary functions.
It is a made-up example, not real code.
Two problems with this code:
When a project becomes big I forget what aux functions I have already written. Not to mention team programming. What is the solution here? To organize a separate module for functions that can be imported? And sort them alphabetically?
Some functions from here may be reused outside this package, and some may not. Say, the combine function seems to be reusable, while get_salted_str is definitely for this module only. I think that it is better to distinguish between functions that may be imported and those that may not. Is it better to use an underscore to mark functions that should not be imported? Like this: _get_salted_str. This may ease the first problem a bit.
Does the Django style guide or any other Pythonic style guide mention solutions to the two problems mentioned above?
My code example:
from django import template

register = template.Library()

def combine(str1, str2):
    return "{}_{}".format(str1, str2)

def get_salted_str(value):
    SALT = "slkdjghslkdjfghsldfghaaasd"
    return combine(value, SALT)

@register.filter
def get_salted_string(value):
    return combine(value, get_salted_str(value))

@register.filter
def get_salted_peppered_string(value):
    salted_str = get_salted_str(value)
    PEPPER = "1234128712908369735619346"
    return "{}_{}".format(PEPPER, salted_str)
When a project becomes big I forget what aux functions I have already written. Not to mention team programming. What is the solution here?
Good documentation and proper modularization.
To organize a separate module for functions that can be imported?
Technically, all functions (except of course nested ones) can be imported. Now I assume you meant: "for functions that are meant to be imported from other modules", but even then, it doesn't mean much - it often happens that a function primarily intended for "internal use" (helper function used within the same module) later becomes useful for other modules.
Also, the proper way to regroup function is not based on whether those are for internal or public use (this is handled by prefixing 'internal use only' functions with a single leading underscore), but on how those functions are related.
NB: I use the term "function" because that's how you phrased your question, but this applies to all other names (classes etc).
And sort them alphabetically?
Bad idea IMHO - it doesn't make any sense from a functional POV, and can cause issues when merging diverging branches.
Some functions from here may be reused outside this package, and some may not. Say, the combine function seems to be reusable, while "get_salted_str" is definitely for this module only. I think that it is better to distinguish between functions that may be imported and those that may not. Is it better to use an underscore to mark functions that should not be imported? Like this: _get_salted_str. This may ease the first problem a bit.
Why would you actually prevent get_salted_str from being imported by another module?
'protected' (single leading underscore) names are for implementation parts that the module's client code should not mess with nor even be aware of - this is called "encapsulation" -, the goal being to allow for implementation changes that won't break the client code.
In your example, get_salted_str() is a template filter, so it's obviously part of your package's public API.
OTOH, combine really looks like an implementation detail - the fact that some unrelated code in another package may need to combine two strings with the same separator seems mostly accidental, and if you expose combine as part of the module's API you can no longer freely change its implementation. This is typically an implementation function as far as I can tell from your example (and it's also so trivial that it really doesn't warrant being exposed, as far as I'm concerned).
On a more general level: while avoiding duplication is a very laudable goal, you must be careful not to overdo it. Some duplication is actually "accidental" - at some point in time, two totally unrelated parts of the code have a few lines in common, but for totally different reasons, and the forces that may lead to a change in one part of the code are totally unrelated to the other part. So before factoring out seemingly duplicated code, ask yourself whether this code is doing the same thing for the same reasons, and whether changing it in one place should affect the other place too.
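To illustrate accidental duplication, here are two hypothetical helpers (the names and formats are made up for this sketch) whose bodies happen to be identical today. Merging them into one shared `combine()` would tie the cache-key format to the accounting format; kept separate, either one can change, say if the export switches to a `-` separator, without touching the other.

```python
def make_cache_key(namespace, key):
    # Hypothetical: format chosen to fit the cache backend's key conventions.
    return "{}_{}".format(namespace, key)


def make_order_reference(customer_id, order_no):
    # Hypothetical: format chosen to match what an accounting export expects.
    return "{}_{}".format(customer_id, order_no)


# Identical today, but for unrelated reasons:
print(make_cache_key("user", 42))        # user_42
print(make_order_reference("C7", 1001))  # C7_1001
```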
Does Django style guide or any other pythonic style guide mention solutions to the two above mentioned problems?
This is nothing specific to Django, nor even to Python. Writing well-organized code relies on the same heuristics whatever the language: you want high cohesion (all functions / classes etc. in the same module should be related and provide solutions to the same problems) and low coupling (a module should depend on as few other modules as possible).
NB: I'm talking about "modules" here but the same rules hold for packages (a package is kind of a super-module) or classes (a class is a kind of mini-module too - except that you can have multiple instances of it).
Now it must be said that proper modularisation - like proper naming etc - IS hard. It takes time and experience (and a lot of reflection) to develop (no pun intended but...) a "feel" for it, and even then you often find yourself reorganizing things quite a bit during your project's lifetime. And, well, there will almost always be some messy area somewhere, because sometimes finding out where a given feature really belongs is a bit of a wild guess (hint: look for modules or packages named "util" or "utils" or "helpers" - those are usually where the dev regrouped stuff that didn't clearly belong anywhere else).
There are a lot of ways to go about this, so here is the way I always handle this:
1. Reusable functions in a project
First and foremost: documentation. When working in a big team you definitely need to document reusable functions.
Second, packages. When creating a lot of auxiliary/helper functions, that might have a use outside the current module or app, it can be useful to bundle them all together. I often create a 'base' or 'utils' package in my Django project where I bundle all sorts of functions.
The django.contrib package is a pretty good example of all sorts of helper packages bundled into one.
My rule of thumb is, if I find that I reuse some function/piece of code, I move it to my utils package, and if it's related to something else in that package, I bundle them together. That makes it pretty easy to keep track of all the functions there are.
2. Private functions
Python doesn't really have private members, but the generally accepted way to 'mark' a member as private is to prefix it with an underscore, like _get_salted_str.
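A small sketch of what the underscore convention does and does not do, using a hypothetical module `saltutils` built in memory from the question's functions. Without an `__all__` list, `from module import *` skips names that start with an underscore; an explicit attribute access or import still reaches them, so the underscore documents intent rather than enforcing privacy. Defining `__all__` is the way to spell out the public names explicitly.

```python
import sys
import types

# Hypothetical module source following the single-underscore convention.
SOURCE = '''
_SALT = "slkdjghslkdjfghsldfghaaasd"

def combine(str1, str2):
    return "{}_{}".format(str1, str2)

def _get_salted_str(value):
    # Leading underscore: internal helper, not part of the public API.
    return combine(value, _SALT)

def get_salted_string(value):
    return combine(value, _get_salted_str(value))
'''

saltutils = types.ModuleType("saltutils")
exec(SOURCE, saltutils.__dict__)
sys.modules["saltutils"] = saltutils  # make it importable by name

namespace = {}
exec("from saltutils import *", namespace)
public = sorted(name for name in namespace if not name.startswith("__"))

print(public)                           # ['combine', 'get_salted_string']
print(saltutils._get_salted_str("x"))   # x_slkdjghslkdjfghsldfghaaasd
```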
3. Style guide
With regard to auxiliary functions, I'm not aware of any style guide.
'Private' members: https://docs.python.org/3/tutorial/classes.html#private-variables
The following surprised me, although perhaps it shouldn't have. However, I've never seen this done elsewhere:
def bar_method(self):
    print(f"in Foo method bar, baz={self.baz}")
    # and pages more code in the real world

class Foo(object):
    def __init__(self):
        self.baz = "Quux"
    bar = bar_method

>>> foo = Foo()
>>> foo.bar()
in Foo method bar, baz=Quux
>>>
This follows from the semantics of def, which assigns a function object to its name in the current namespace.
The great advantage, as far as I am concerned, is that I can move large method definitions outside the class body and even into a different file, and merely link them into the class with a single assignment. Looking them up on an instance binds them as usual.
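The binding works because plain functions are descriptors: the function object is stored unchanged in the class namespace, and looking it up on an instance produces a bound method whose `__self__` is that instance. A small sketch reusing the names from the example above (returning a string instead of printing, so the result is easy to inspect):

```python
def bar_method(self):
    return f"in Foo method bar, baz={self.baz}"


class Foo:
    def __init__(self):
        self.baz = "Quux"

    bar = bar_method  # the class attribute is the very same function object


foo = Foo()
print(Foo.__dict__["bar"] is bar_method)  # True: stored unchanged on the class
print(foo.bar.__func__ is bar_method)     # True: bound on attribute lookup
print(foo.bar.__self__ is foo)            # True
print(foo.bar())                          # in Foo method bar, baz=Quux
```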
My question is simply, why have I never seen this done before? Is there anything lurking here that might bite me? Or if it's regarded as bad style, why?
(If you are wondering about my context, it's about Django View and Form subclasses. I would dearly love to keep them short, so that the business logic behind them is easy to follow. I'd far rather that methods of only cosmetic significance were moved elsewhere).
The great advantage, as far as I am concerned, is that I can move large method definitions outside the class body and even into a different file
I personally wouldn't consider this "a great advantage".
My question is simply, why have I never seen this done before?
Because there are very few good reasons to do so, and a lot of good reasons to NOT do so.
Or if it's regarded as bad style, why?
Because it makes it much harder to understand what's going on, quite simply. Debugging is difficult enough, and each file you have to navigate to adds to the "mental load".
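Beyond readability, there is at least one concrete trap: zero-argument `super()` does not work in a function defined outside a class body, because it relies on a `__class__` cell that the compiler only creates for functions written inside a class. That matters for Django View and Form subclasses, where overridden methods commonly call `super()`. A minimal sketch with hypothetical class and function names:

```python
class Base:
    def greet(self):
        return "base"


def greet_outside(self):
    # Zero-argument super() needs a __class__ cell, which only exists for
    # functions defined inside a class body, so this raises RuntimeError.
    return "child+" + super().greet()


def greet_explicit(self):
    # Naming the class explicitly works, but re-couples the function to Child.
    return "child+" + super(Child, self).greet()


class Child(Base):
    greet = greet_outside
    greet_ok = greet_explicit


outside_error = None
try:
    Child().greet()
except RuntimeError as exc:
    outside_error = type(exc).__name__

print(outside_error)       # RuntimeError
print(Child().greet_ok())  # child+base
```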
I would dearly love to keep them short, so that the business logic behind them is easy to follow. I'd far rather that methods of only cosmetic significance were moved elsewhere
Then put the important parts at the beginning of the class statement and the cosmetic ones at the end.
Also, ask yourself whether you're doing things in the right place - as far as I'm concerned, "business logic" mainly belongs to the domain layer (models), and "cosmetic" rather suggests the presentation layer (views / templates). Of course some objects (forms and views especially) actually deal with both business rules and presentation, but even then you can often move some of the domain part to models (which the view or form will call) and some of the presentation part to the template layer (using filters / custom tags).
My 2 cents...
NB: sorry if the above sounds a bit patronizing - I don't know what your code looks like, so I can't tell whether you're obviously already aware of this and doing the right thing already and just trying to improve further, or you're a beginner still fighting with how to achieve proper code separation...
I know there are dozens of similar questions about Python imports, so I'll try to phrase it a bit differently. I'll spare you the details of the days of desperation that lie behind me, and instead approach the issue from a more general point of view: What does it take to make imports of the form from package.module.submodule import SomeClass work in Python 3.6 in 2018?
When I look into random modules of large Python projects like Django, Tensorflow or Twisted this seems to be the import pattern they all use, probably because it has the two benefits of a) making clear where an object comes from and b) keeping the actual invocation of that object short and clean.
I thought it can't be wrong to learn from these projects and tried to emulate this pattern in my own package – and as others before me I have run into the hell of circular imports.
Thankfully there are already a lot of posts on this topic, and their recommendations seem to fall into two broad categories:
1) Change the imports
A lot of suggestions go in the direction of using some other form of import syntax. Quite common seems to be the approach to avoid from X.Y import Z statements and instead rely only on import X.Y and then use Y.Z() in the code. Long story short, I have tried this and rewritten my entire package in that manner, and it didn't change anything. It also looks very unpythonic and ugly to me.
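Switching to `import X` only helps when the other module's attributes are looked up later, at call time. If a module needs the other module's name while it is itself still executing (for example to subclass it, as in the __init__.py scenario earlier in this thread), the cycle still fails, just with `AttributeError` instead of `ImportError`; that is one plausible reason why the rewrite changed nothing. A sketch with hypothetical top-level modules written to a temporary directory:

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def write_modules(files):
    """Write hypothetical top-level modules to a temp dir and add it to sys.path."""
    root = Path(tempfile.mkdtemp())
    for name, body in files.items():
        (root / name).write_text(textwrap.dedent(body))
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()


def try_import(name):
    try:
        return importlib.import_module(name)
    except (ImportError, AttributeError) as exc:
        return type(exc).__name__


write_modules({
    # 1) `from X import Z` on both sides: Z must already exist at import time.
    "from_a.py": """
        from from_b import helper_b
        def helper_a():
            return "a"
    """,
    "from_b.py": """
        from from_a import helper_a
        def helper_b():
            return "b"
    """,
    # 2) `import X` on both sides, attributes used only inside functions.
    "lazy_a.py": """
        import lazy_b
        def helper_a():
            return "a"
        def both():
            return helper_a() + lazy_b.helper_b()
    """,
    "lazy_b.py": """
        import lazy_a
        def helper_b():
            return "b"
    """,
    # 3) `import X` on both sides, but an attribute is needed at module level.
    "sub_a.py": """
        import sub_b
        class Base:
            pass
    """,
    "sub_b.py": """
        import sub_a
        class Child(sub_a.Base):
            pass
    """,
})

results = {
    "from_import": try_import("from_a"),
    "lazy_import": try_import("lazy_a").both(),
    "module_level_use": try_import("sub_a"),
}
print(results)
# {'from_import': 'ImportError', 'lazy_import': 'ab', 'module_level_use': 'AttributeError'}
```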
2) Fix the circular dependencies
That is the second family of recommendations, and I agree entirely that it would be preferable to dodge this issue entirely by employing good design from the start. The problem is that I don't really have a coherent idea how to do this.
My project is certainly not the pinnacle of excellent software design, but it's also not spaghetti code. It's ten modules in a single package (no subpackages) with around 1000 lines of code in total. Each module has a single responsibility: one has the main function, one is a Flask instance, another one an API client, another one has all the data models and so on. To me it seems pretty much unavoidable that the imports between these modules (not the classes themselves) sometimes end up being circular. I have of course tried to break these cycles (they are usually four or five modules long), but that always leads to other cycles, if not immediately, then five commits later.
Now, I'm just a second-year CS student and I'm entirely willing to accept that this might just be a point where I still have to learn much, much more. But to start with, it would be really helpful if someone could explain how this problem is dealt with in some of the larger projects mentioned above. Shouldn't they run into circular imports all the time? How do they prevent contributors from accidentally introducing a cycle somewhere? Do they constantly refactor their codebase when they find out that an import isn't possible because it would lead to a cycle? What's the general approach here?
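One mechanical way to keep a package layered, offered as a sketch rather than as what those projects actually do: write down which module imports which, and let the standard library's `graphlib` (Python 3.9+) either produce a bottom-up order (configuration first, entry point last) or report a cycle. The module names are hypothetical, loosely following the ten-module layout described above; in practice the map could be generated by parsing the source with the `ast` module and checked on every commit.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical dependency map for a small Flask-style package:
# each module maps to the set of modules it imports.
deps = {
    "main": {"app", "client"},
    "app": {"models", "client"},
    "client": {"models", "config"},
    "models": {"config"},
    "config": set(),
}


def import_order(graph):
    """Return ("ok", bottom-up order) or ("cycle", the cycle that was found)."""
    try:
        return "ok", list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] lists the cycle; its first and last node are the same.
        return "cycle", exc.args[1]


print(import_order(deps))
# ('ok', ['config', 'models', 'client', 'app', 'main'])

# Someone lets models import app (e.g. to reach a shared object): a cycle appears.
bad = dict(deps, models={"config", "app"})
print(import_order(bad))
```

Each module may only import from modules earlier in the order; a change that breaks this shows up as a cycle before it turns into an import error at runtime.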
We are currently working on a large project.
This project is supposed to create EDIFACT messages. This is not so hard at first, but the catch is that there are a lot of customers that have their own implementation of the standard.
On top of that we are working with several EDIFACT standards (D96A and D01B in our case).
Some customer exceptions might be as small as having a divergent field length, but some have made their own implementation completely different.
At the moment we have listed the customer exceptions in a list (just to keep them consistent), and in the code we use something like:
if NAME_LENGTH_IS_100 in customer_exceptions:
    self.max_length = 100
else:
    self.max_length = 70
For a couple of simple exceptions this works just fine, but at this moment the code starts to get really cluttered and we are thinking about refactoring the code.
I am thinking about some kind of factory pattern, but I am not sure about the implementation.
Another option would be to create a base package and make a separate implementation for every customer that is diverging from the standard.
I hope someone can help me out with some advice.
Thanks in advance.
I think your question is too broad to be answered properly (I was about to click the close button because of this but decided otherwise). The reason for this is the following:
There is nothing wrong with the code snippet you provided. If it is part of some kind of initialization routine, it is just fine the way it is. It also doesn't hurt to have many checks like this.
But how to handle more complex cases depends greatly on the cases themselves.
For lots of situations it might be sufficient to have such variables which represent the customer's special choices.
For other aspects I'd propose having a Customer base class with subclasses thereof, one for each customer (or maybe the customers can even be grouped hierarchically; then a nice inheritance tree could reflect this).
For other cases again I'd propose aspect-oriented programming by use of Python decorators to tweak the behavior of methods, functions, and classes.
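A minimal sketch of the subclass idea combined with a small factory, assuming hypothetical customer codes `ACME` and `GLOBEX`. The registry is filled by a class decorator, which also touches on the decorator suggestion; customers without a subclass fall back to the standard behaviour, so only diverging customers need code of their own.

```python
class Customer:
    """Standard behaviour; subclasses override only what diverges."""

    standard = "D96A"
    name_max_length = 70

    def format_name(self, name):
        return name[: self.name_max_length]


_REGISTRY = {}


def register(code):
    """Class decorator mapping a (hypothetical) customer code to its subclass."""
    def decorator(cls):
        _REGISTRY[code] = cls
        return cls
    return decorator


def customer_for(code):
    """Factory: the registered subclass, or the standard Customer."""
    return _REGISTRY.get(code, Customer)()


@register("ACME")
class AcmeCustomer(Customer):
    name_max_length = 100  # replaces a NAME_LENGTH_IS_100-style flag


@register("GLOBEX")
class GlobexCustomer(Customer):
    standard = "D01B"

    def format_name(self, name):
        return super().format_name(name).upper()


print(customer_for("ACME").name_max_length)       # 100
print(customer_for("OTHER").name_max_length)      # 70
print(customer_for("GLOBEX").format_name("abc"))  # ABC
```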
Since this depends greatly on your concrete use cases, I think this question cannot be answered more concretely than this.
Why not put all this in a resource file, with the standard as the default and each exception handled as an overridden value? Then you just need to read the right key for the right client, and your code stays clean.
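A sketch of that idea, assuming JSON as the resource format and hypothetical customer codes: the standard values act as defaults, and each customer entry lists only what it overrides. In a real project the JSON would be read from a file rather than an inline string.

```python
import json

# Hypothetical resource file contents: defaults plus per-customer overrides.
RESOURCE = json.loads("""
{
  "default": {"standard": "D96A", "name_max_length": 70},
  "customers": {
    "ACME":   {"name_max_length": 100},
    "GLOBEX": {"standard": "D01B"}
  }
}
""")


def settings_for(customer, resource=RESOURCE):
    """Merge a customer's overrides on top of the standard defaults."""
    merged = dict(resource["default"])
    merged.update(resource["customers"].get(customer, {}))
    return merged


print(settings_for("ACME"))   # {'standard': 'D96A', 'name_max_length': 100}
print(settings_for("OTHER"))  # {'standard': 'D96A', 'name_max_length': 70}
```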
I've fully graduated from writing scripts to writing modules. Now that I have a module full of functions, I'm not quite sure if I should order them in some way.
Alphabetical seems to make sense to me, but I wanted to see if there were other schools of thought on how they should be ordered in a module. Maybe try to approximate the flow of the code, or some other method?
I did some searching on this and didn't really find anything, except for that functions need to be defined before calling them, which isn't really relevant to my question.
Thanks for any thoughts people can provide!
Code should be made to be easily readable by a human; Readability counts (from The Zen of Python).
Stick to the conventions of PEP-8, unless you have good reason not to do so.
My suggestion would be to start with the main parts of the module in a sequence that makes sense for this particular module. Helper functions and classes go below that in a top-down fashion.
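A minimal sketch of that top-down layout, using a hypothetical report module: the public entry point comes first and calls helpers defined below it. This also settles the define-before-call worry: a function body only looks up names when it runs, so the helpers just need to exist by the time `build_report()` is called, not when it is defined.

```python
"""Hypothetical report module laid out top-down: public entry point first."""


def build_report(rows):
    # Public entry point. It calls helpers defined further down; that is fine
    # because the names are only looked up when build_report() runs.
    cleaned = _clean(rows)
    return _render(cleaned)


# --- helpers, in the order they are first used above ---

def _clean(rows):
    return [row.strip() for row in rows if row.strip()]


def _render(rows):
    return "\n".join(f"- {row}" for row in rows)


if __name__ == "__main__":
    print(build_report(["  apples ", "", "pears"]))
```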
Modern editors are quite capable of finding function or method definitions in code, so the precise order below the top level doesn't matter as much as it used to.
If your editor supports it consider using folding.