Pythonic way to order functions within a module

I've fully graduated from writing scripts to writing modules. Now that I have a module full of functions, I'm not quite sure if I should order them in some way.
Alphabetical seems to make sense to me, but I wanted to see if there were other schools of thought on how they should be ordered in a module. Maybe try to approximate the flow of the code, or some other method?
I did some searching on this and didn't really find anything, except that functions need to be defined before they are called, which isn't really relevant to my question.
Thanks for any thoughts people can provide!

Code should be written to be easily readable by a human; "Readability counts" (from The Zen of Python).
Stick to the conventions of PEP 8, unless you have good reason not to.
My suggestion would be to start with the main parts of the module, in a sequence that makes sense for this particular module. Helper functions and classes go below that, in a top-down fashion.
Modern editors are quite capable of finding function or method definitions in code, so the precise sequence below the top level doesn't matter as much as it used to.
If your editor supports it, consider using code folding.
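For illustration, a minimal sketch of such a top-down layout (all names here are hypothetical):

"""Hypothetical module, ordered top-down: main entry points first."""

def process_file(path):
    """The main part of the module goes at the top."""
    raw = _read_lines(path)
    return [_normalize(line) for line in raw]

def _read_lines(path):
    """Helper used by process_file() above."""
    with open(path) as f:
        return f.readlines()

def _normalize(line):
    """Lowest-level detail goes last."""
    return line.strip().lower()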

Related

Python: What is wrong with subclassing a class from a parent package?

When I try to design package structures and class hierarchies within those packages in Python 3 projects, I constantly run into circular import issues. This happens whenever I want to implement a class in some module that subclasses a base class from a parent package (i.e. from some __init__.py in a parent directory).
Although I technically understand why that happens, I have not been able to come up with a good solution so far. I found some threads here, but none of them go deeper into this particular scenario, let alone mention solutions.
In particular:
Putting everything in one file is probably not great. It can become quite a mass of things, maybe 90% of the entire project code. A project may want to define a common base class of things (e.g. a widget base class for a UI lib), and then just lots of subclasses in nicely organized subpackages. There are obvious reasons why we would not write masses of stuff in one file.
Implementing the base class outside of __init__.py and just importing it there can work in a few places, but it messes up the hierarchy with lots of aliases for the same thing (e.g. myproject.BaseClass vs. myproject.foo.BaseClass). Just not nice, is it? It also leads to a lot of boilerplate code that has to be maintained.
Implementing it outside, and not importing it in __init__.py at all, makes the "fully qualified" notation longer everywhere in the code, because it has to contain .foo everywhere I use BaseClass (yes, I usually do import myproject...somemodule, which is not what everybody does, but should not be bad afaik).
Some rather dirty tricks could help, e.g. defining those subclasses inside some kind of factory method, so they are not defined at module level. That can work in a few situations, but is just terrible in general.
All of these are maybe okay in a single isolated situation, more as a kind of workaround imho, but at larger scale they ruin a lot. Also, the dirtier the tricks, the more they break the services an IDE can provide.
All kinds of vague statements like "then your class structure is probably bad and needs reworking" puzzle me a bit. I'm more or less distantly aware of general good coding practices, although I have never read "Clean Code" or similar books.
So, what is wrong with subclassing a class from a parent package, from that abstract perspective? And if nothing is, what is the secret workaround that people use in Python 3? Is there one, or is everybody basically dealing with one of the mentioned hacks? Sure, there are always tradeoffs to be made. But here I'm really struggling, even after many years of writing Python code.
Okay, thank you! I think I was a bit on the wrong track here in understanding the actual problem. It's essentially what @tdelaney said, so it's more complicated than what I initially wrote. Subclassing from a parent package is allowed, but the parent package must not in turn import the submodule, which is what I do here in one place to offer some convenience imports. So it's maybe not perfect, maybe the point where you could argue that my structure is not well designed (but, well, things are always a compromise). At least what I posted initially is admittedly not accurate. And the workaround of using a few function-level imports is maybe kind of okay in that place.
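To make the failure mode concrete, a minimal sketch (hypothetical names) of the cycle described above:

# myproject/__init__.py
from myproject.foo import Widget   # convenience re-export at the top of the file

class BaseClass:
    pass

# myproject/foo.py
from myproject import BaseClass    # ImportError: 'myproject' is only partially
                                   # initialized and BaseClass is not defined yet

class Widget(BaseClass):
    pass

Moving the re-export below the definition of BaseClass, or deferring it into the function that needs it, breaks the cycle.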

Style guide: how to not get into a mess in a big project

Django==2.2.5
In the example below there are two custom filters and two auxiliary functions.
It is a fake example, not a real code.
Two problems with this code:
When a project becomes big I forget what aux functions I have already written. Not to mention team programming. What is the solution here? To organize a separate module for functions that can be imported? And sort them alphabetically?
Some functions from here may be reused outside this package, and some may not. Say, the combine function seems to be reusable, while get_salted_str is definitely for this module only. I think that it is better to distinguish between functions that may be imported and those that may not. Is it better to use an underscore to mark functions not meant for import? Like this: _get_salted_str. This may ease the first problem a bit.
Does Django style guide or any other pythonic style guide mention solutions to the two above mentioned problems?
My code example:
from django import template

register = template.Library()

# the two auxiliary functions
def combine(str1, str2):
    return "{}_{}".format(str1, str2)

def get_salted_str(s):
    SALT = "slkdjghslkdjfghsldfghaaasd"
    return combine(s, SALT)

# the two custom filters
@register.filter
def get_salted_string(s):
    return combine(s, get_salted_str(s))

@register.filter
def get_salted_peppered_string(s):
    salted_str = get_salted_str(s)
    PEPPER = "1234128712908369735619346"
    return "{}_{}".format(PEPPER, salted_str)
When a project becomes big I forget what aux functions I have already written. Not to mention team programming. What is the solution here?
Good documentation and proper modularization.
To organize a separate module for functions that can be imported?
Technically, all functions (except of course nested ones) can be imported. Now I assume you meant: "for functions that are meant to be imported from other modules", but even then, it doesn't mean much - it often happens that a function primarily intended for "internal use" (helper function used within the same module) later becomes useful for other modules.
Also, the proper way to group functions is not based on whether they are for internal or public use (that is handled by prefixing 'internal use only' functions with a single leading underscore), but on how those functions are related.
NB: I use the term "function" because that's how you phrased your question, but this applies to all other names (classes etc).
And sort them alphabetically?
Bad idea IMHO - it doesn't make any sense from a functional POV, and it can cause issues when merging diverging branches.
Some functions from here may be reused outside this package, and some may not. Say, the combine function seems to be reusable, while "get_salted_str" is definitely for this module only. I think that it is better to distinguish between functions that may be imported and those that may not. Is it better to use an underscore to mark functions not meant for import? Like this: _get_salted_str. This may ease the first problem a bit.
Why would you actually prevent get_salted_str from being imported by another module?
"Protected" (single leading underscore) names are for implementation parts that the module's client code should not mess with, nor even be aware of (this is called "encapsulation"), the goal being to allow for implementation changes that won't break the client code.
In your example, get_salted_string() is a template filter, so it's obviously part of your package's public API.
OTOH, combine really looks like an implementation detail - the fact that some unrelated code in another package may need to combine two strings with the same separator seems mostly accidental, and if you expose combine as part of the module's API you cannot change its implementation anyway. This is typically an implementation function as far as I can tell from your example (and it's also so trivial that it really doesn't warrant being exposed, as far as I'm concerned).
At a more general level: while avoiding duplication is a very laudable goal, you must be careful not to overdo it. Some duplication is actually "accidental": at some point in time, two totally unrelated parts of the code have a few lines in common, but for totally different reasons, and the forces that may lead to a change in one part of the code are totally unrelated to those acting on the other part. So before factoring out seemingly duplicated code, ask yourself whether this code is doing the same thing for the same reasons, and whether changing it in one place should affect the other place too.
Does Django style guide or any other pythonic style guide mention solutions to the two above mentioned problems?
This is nothing specific to Django, nor even to Python. Writing well-organized code relies on the same heuristics whatever the language: you want high cohesion (all functions / classes etc. in a module should be related and provide solutions to the same problems) and low coupling (a module should depend on as few other modules as possible).
NB: I'm talking about "modules" here but the same rules hold for packages (a package is kind of a super-module) or classes (a class is a kind of mini-module too - except that you can have multiple instances of it).
Now it must be said that proper modularisation - like proper naming etc. - IS hard. It takes time and experience (and a lot of reflection) to develop (no pun intended, but...) a "feel" for it, and even then you often find yourself reorganizing things quite a bit during your project's lifetime. And, well, there will almost always be some messy area somewhere, because sometimes finding out where a given feature really belongs is a bit of a wild guess (hint: look for modules or packages named "util", "utils" or "helpers" - those are usually where the devs grouped stuff that didn't clearly belong anywhere else).
There are a lot of ways to go about this, so here is the way I always handle this:
1. Reusable functions in a project
First and foremost: documentation. When working in a big team you definitely need to document reusable functions.
Second, packages. When creating a lot of auxiliary/helper functions that might have a use outside the current module or app, it can be useful to bundle them all together. I often create a 'base' or 'utils' package in my Django projects where I bundle all sorts of functions.
The django.contrib package is a pretty good example of all sorts of helper packages bundled into one.
My rule of thumb is, if I find that I reuse some function/piece of code, I move it to my utils package, and if it's related to something else in that package, I bundle them together. That makes it pretty easy to keep track of all the functions there are.
2. Private functions
Python doesn't really have private members, but the generally accepted way to 'mark' a member as private is to prefix it with a leading underscore, like _get_salted_str.
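A small sketch of the convention (the module name is hypothetical):

# filters.py
__all__ = ["get_salted_string"]    # optionally, declare the public API explicitly

def _get_salted_str(s):            # leading underscore: internal helper
    return s + "_salted"

def get_salted_string(s):          # no underscore: public
    return _get_salted_str(s)

Note that "from filters import *" will skip _get_salted_str, but a direct import of it still works: the underscore is a convention, not an enforcement.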
3. Style guide
With regards to auxiliary functions, I'm not aware of any style guide.
'Private' members : https://docs.python.org/3/tutorial/classes.html#private-variables

How do the for / while / print *things* work in python?

What I mean is, how is the syntax defined, i.e. how can I make my own constructs like these?
I realise that in a lot of languages, things like this are built into the compiler / spec, and so they're dealt with by the compiler (at least that's how I understand it to work).
But with Python, everything I've come across so far has been accessible to the programmer, and so you more or less have the freedom to do whatever you want.
How would I go about writing my own version of for or while? Is it even possible?
I don't have any actual application for this, so the answer to any WHY?! questions is just "because why not?" or "curiosity".
No, you can't, not from within Python. You can't add new syntax to the language. (You'd have to modify the source code of Python itself to make your own custom version of Python.)
Note that the iterator protocol allows you to define objects that can be used with for in a custom way, which covers a lot of the possible use cases of writing your own iteration syntax.
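For example, a minimal sketch of a class that for can iterate over:

class Countdown:
    """Counts down from start to 1; usable directly in a for loop."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n    # a generator method satisfies the iterator protocol
            n -= 1

for i in Countdown(3):
    print(i)           # prints 3, 2, 1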
Well, you have a couple of options for creating your own syntax:
Write a higher-order function, like map or reduce (see the sketch below this list).
Modify python at the C level. This is, as you might expect, relatively easy as compared with fiddling with many other languages. See this article for an example: http://eli.thegreenplace.net/2010/06/30/python-internals-adding-a-new-statement-to-python/
Fake it using the debug facilities, or the encodings facility. See this code: http://entrian.com/goto/download.html and http://timhatch.com/projects/pybraces/
Use a preprocessor. Here's one project that tries to make this easy: http://www.fiber-space.de/langscape/doc/index.html
Use the facilities built into Python to achieve a similar effect (decorators, metaclasses, and the like).
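As a sketch of the first option, a higher-order function can act as a custom loop construct (the names here are made up):

def repeat_until(condition, body):
    """A while-like construct built out of plain functions."""
    while not condition():
        body()

counter = {"n": 0}
repeat_until(lambda: counter["n"] >= 3,
             lambda: counter.update(n=counter["n"] + 1))
print(counter["n"])    # 3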
Obviously, none of this is quite what you're looking for, but Python, unlike Smalltalk or Lisp, isn't (necessarily) programmed in itself and doesn't guarantee to expose its own underlying execution and parsing mechanisms at runtime.
You can't make equivalent constructs. for, while, if etc. are statements, and they are built into the language with their own specific syntax. There are languages that do allow this sort of thing though (to some degree), such as Scala.
while, print, for etc. are keywords. That means the tokenizer, while reading the code, strips any redundant characters and turns the source into tokens. Afterwards a parser takes those tokens as input and builds a program tree, which is then executed by the interpreter. That said, those constructs are handled entirely by this syntactic machinery, and as such are not visible from inside the code.

Reverse Engineer Auto-Generated C?

How easy is it to reverse engineer an auto-generated C code? I am working on a Python project and as part of my work, am using Cython to compile the code for speedup purposes.
This does help in terms of speed, yet, I am concerned that where I work, some people would try to "peek" into the code and figure out what it does.
Cython code is basically an auto-generated C. Is it very hard to reverse engineer it?
Are there any recommendations that would make the code safer and reverse engineering harder to do? (I assume that with enough effort, everything can be reverse engineered.)
Okay -- to attempt to answer your question more directly: most auto-generated C code is fairly ugly, so somebody would need to be fairly motivated to reverse engineer it. At the same time, I don't believe I've ever looked at what Cython generates, so I'm not sure how it looks.
In addition, a lot of auto-generated code is done in the form of things like state machine tables, which most programmers find fairly difficult to follow even at best. The tendency (in many cases) is to have a generic framework, with tables of data that the framework more or less "interprets" at run-time. This isn't necessarily impossible to follow, but it's different enough from most typical code that most people will give up on it fairly quickly (and if they do much, they'll typically waste a lot of time looking at the framework instead of the data, which is what really matters in cases like this).
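To illustrate the shape of such code, a contrived Python sketch of a table-driven state machine (not actual Cython output):

# The "framework" is a trivial dispatch loop; the behaviour lives in the table.
TABLE = {
    ("start", "a"): ("seen_a", None),
    ("seen_a", "b"): ("start", "got ab"),
}

def run(events):
    state = "start"
    for event in events:
        state, output = TABLE.get((state, event), ("start", None))
        if output:
            print(output)

run(["a", "b", "a"])    # prints "got ab"

Reading the loop tells you almost nothing; all the behaviour is in the data.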
I'll repeat, however, that I'm pretty sure I haven't looked at what Cython produces, so I can't say much about it with any real certainty.
There are (or at least used to be) commercial obfuscators intended to make C source code difficult to understand. I suspect the availability of Perl has taken a lot of the market share from them, but if you look you may still be able to find one and use it.
Absent that, it's not terribly difficult to write an obfuscator of your own, but the degree of effectiveness will probably vary with the amount of effort you're willing to put into it. Just systematically renaming any meaningful variable names into things like _ and __ can do quite a bit (e.g., profit = sales - costs; is a lot more meaningful than _ = _I_ - _i_;). Depending on the machine generated code in question, however, this may not really accomplish much -- obfuscating a generic framework may not make much difference in understanding what your code does -- and if they figure out the procedure you're following, they may be able to simply replicate the correct framework code and transplant the pieces specific to your program into the un-obfuscated framework.
You should really take a look at the code that Cython produces. To help with debugging, for example, it copies the complete Python source code into the generated file, marking each source line before generating C code for it. That makes it very easy to find the code section that you are interested in.
A very nice feature is that you can compile your code with the "-a" (annotate) option, and it will spit out an HTML file next to the C file that contains the annotated Python code. When you click on a line, you'll see the C code for that line. As a bonus, it marks lines that do a lot of Python processing in dark yellow, so that you get a simple indicator where to look for potential optimisations.
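For example (assuming Cython is installed), running cython -a yourmodule.pyx writes yourmodule.c plus a yourmodule.html annotation file next to it.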
There's also special gdb support in Cython now, so you can do Cython source level debugging etc.
Ah, I guess I missed that you were talking about the compiled module, whereas I was only referring to the source code that Cython generates. I agree with Jerry that it will be fairly tricky to extract something useful from the compiled module, as long as you keep the gdb support disabled (the default) and strip the debugging symbols. That is because the C compiler will do lots of inlining of helper functions all over the place and apply various low-level code optimisations, thus making it harder to extract the original macro-level code patterns. However, you will see named C-API calls into CPython, and you will also see function names from your own code. Cython isn't specifically designed for code obfuscation, quite the opposite. But readable assembly has certainly never been a design goal.

Is `import module` better coding style than `from module import function`?

Let from module import function be called the FMIF coding style.
Let import module be called the IM coding style.
Let from package import module be called the FPIM coding style.
Why is IM+FPIM considered a better coding style than FMIF? (See this post for the inspiration for this question.)
Here are some criteria which lead me to prefer FMIF over IM:
Shortness of code: It allows me to use shorter function names and thus help stick to the 80 columns-per-line convention.
Readability: chisquare(...) appears more readable than scipy.stats.stats.chisquare(...). Although this is a subjective criterion, I think most people would agree.
Ease of redirection: If I use FMIF and for some reason at some later time want to redirect Python to take function from alt_module instead of module, I need to change just one line: from alt_module import function. If I were to use IM, I'd need to change many lines of code.
I realize FPIM goes some way to nullifying the first two issues, but what about the third?
I am interested in all reasons why IM+FPIM may be better than FMIF,
but in particular, I'd be interested in elaboration on the following points mentioned here:
Pros for IM:
ease of mocking/injecting in tests. (I am not very familiar with mocking, though I recently learned what the term means. Can you show code which demonstrates how IM is better than FMIF here?)
ability for a module to change flexibly by redefining some entries. (I must be misunderstanding something, because this seems to be an advantage of FMIF over IM. See my third reason in favor of FMIF above.)
predictable and controllable behavior on serialization and recovery of your data. (I really don't understand how the choice of IM or FMIF affects this issue. Please elaborate.)
I understand that FMIF "pollutes my namespace", but beyond being a negative-sounding phrase, I don't appreciate how this hurts the code in any concrete way.
PS. While writing this question I received a warning that the question appears subjective and is likely to be closed. Please don't close it. I'm not looking for subjective opinion, but rather concrete coding situations where IM+FPIM is demonstrably better than FMIF.
Many thanks.
The negatives you list for IM/FPIM can often be ameliorated by appropriate use of an as clause. from some.package import mymodulewithalongname as mymod can usefully shorten your code and enhance its readability, and if you rename mymodulewithalongname to somethingcompletelydifferent tomorrow, the as clause means there is only a single statement to edit.
Consider your pro-FMIF point 3 (call it R for redirection) vs your pro-FPIM point 2 (call it F for flexibility): R amounts to facilitating the loss of integrity of module boundaries, while F strengthens it. Multiple functions, classes and variables in a module are often intended to work together: they should not be independently switched to different meanings. For example, consider module random and its functions seed and uniform: if you were to switch the import of just one of them to a different module, then you'd break the normal connection between calls to seed and results of calls to uniform. When a module is well designed, with cohesion and integrity, R's facilitation of breaking down the module's boundaries is actually a negative -- it makes it easier to do something you're better off not doing.
Vice versa, F is what enables coordinated switching of coupled functions, classes, and variables (so, generally, of entities that belong together, by modularity). For example, to make testing repeatable (FPIM pro-point 1), you mock both seed and random in the random module, and if your code follows FPIM, you're all set, coordination guaranteed; but if you have code that has imported the functions directly, you have to hunt down each such module and repeat the mocking over and over and over again. Making tests perfectly repeatable typically also requires "coordinated mocking" of date and time functions -- if you use from datetime import datetime in some modules, you need to find and mock them all (as well as all those doing from time import time, and so forth) to ensure that all the times received when the various parts of the system ask "so what time is it now?" are perfectly consistent (if you use FPIM, you just mock the two relevant modules).
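To make the mocking point concrete, a small sketch using unittest.mock from the standard library:

import random
from unittest import mock

def roll():
    # IM style: the name is looked up on the module at call time,
    # so patching random.uniform is visible here.
    return random.uniform(1, 6)

with mock.patch("random.uniform", return_value=3.0):
    assert roll() == 3.0

Had roll() been written with "from random import uniform", the patch above would not affect it: roll() would keep calling the original function through its own module-level name, and every module that imported it that way would need to be patched separately.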
I like FPIM, because there's really not much added value by using a multiply qualified name rather than a singly qualified one (while the difference between barenames and qualified names is huge -- you get so much more control with a qualified name, be it singly or multiply, than you possibly ever can with a barename!).
Ah well, can't devote all of the working day to responding to each and every one of your points -- your question should probably be half a dozen questions;-). I hope this at least addresses "why is F better than R" and some of the mocking/testing issues -- it boils down to preserving and enhancing well-designed modularity (via F) rather than undermining it (via R).
The classic text on this, as so often, is from Fredrik Lundh, the effbot. His advice: always use import - except when you shouldn't.
In other words, be sensible. Personally I find that anything that's several modules deep tends to get imported via from x.y.z import a - the main example being Django models. But as much as anything else it's a matter of style, and you should have a consistent one - especially with modules like datetime, where both the module and the class it contains are called the same thing. Do you need to write datetime.datetime.now() or just datetime.now()? (In my code, always the former.)
Items 1 and 2 in your list of questions seem to be the same issue. Python's dynamic nature means it is fairly simple to replace an item in a module's namespace no matter which of the methods you use. The difficulty comes if one function in a module refers to another, which is the one you want to mock. In this case, importing the module rather than the functions means you can do module.function_to_replace = myreplacementfunc and everything works transparently - but that is as easy to do via FPIM as it is via IM.
I also don't understand how item 3 has anything to do with anything. I think your item 4, however, is based on a bit of a misunderstanding. None of the methods you give will 'pollute your namespace'. What does do that is from module import *, where you have no idea at all what you're importing and so functions can appear in your code with no clue given to the reader where they came from. That's horrible, and should be avoided at all costs.
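A tiny runnable illustration of that binding difference, (ab)using the math module for convenience:

import math
from math import sqrt

math.sqrt = lambda x: 42     # replace the function on the module object
print(math.sqrt(4))          # 42 - IM-style callers see the replacement
print(sqrt(4))               # 2.0 - the FMIF binding still holds the original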
Great answers here (I upvoted them all), and here are my thoughts on this matter:
First, addressing each of your bullets:
(Allegedly) Pros of FMIF:
Shortness of code: shorter function names help stick to the 80 columns-per-line.
Perhaps, but module names are usually short enough, so this is not relevant. Sure, there's datetime, but also os, re, sys, etc. And Python has free line breaks inside { [ (. And for nested modules there's always as, in both IM and FPIM.
Readability: chisquare(...) appears more readable than scipy.stats.stats.chisquare(...).
Strongly disagree. When reading foreign code (or my own code after a few months) it's hard to know where each function comes from. Qualified names save me from going back and forth from line 2345 to the module declarations at the top. They also give you context: "chisquare? What's that? Oh, it's from scipy? Ok, some math-related stuff then". And, once again, you can always abbreviate scipy.stats.stats as scipyst. scipyst.chisquare(...) is short enough, with all the benefits of a qualified name.
import os.path as osp is another good example, considering it's very common to chain 3 or more of its functions together in a single call: join(expanduser(),basename(splitext())) etc.
Ease of redirection: one-line redefinition of a function from alt_module instead of module.
How often do you want to redefine a single function but not the whole module? Module boundaries and function coordination should be preserved, and Alex already explained this in great depth. For most (all?) real-world scenarios, if alt_module.x is a viable replacement for module.x, then probably alt_module itself is a drop-in alternative for module, so both IM and FPIM are one-liners just like FMIF, provided you use as.
I realize FPIM goes some way to nullifying the first two issues...
Actually, as is the one that mitigates the first 2 issues (and the 3rd), not FPIM. You can use IM for that too: import some.long.package.path.x as x for the same result as FPIM.
So none of the above are really pros of FMIF. And the reasons I prefer IM/FPIM are:
For the sake of simplicity and consistency, when I import something, either IM or FPIM, I'm always importing a module, not an object from a module. Remember FMIF can be (ab-)used to import functions, classes, variables, or even other modules! Think about the mess of from somemodule import sys, somevar, os, SomeClass, datetime, someFunc.
Also, if you want more than a single object from a module, FMIF will pollute your namespace more than IM or FPIM, which use a single name no matter how many objects you want to use. And such objects will have a qualified name, which is a pro, not a con: as I've said in issue 2, IMHO it improves readability.
It all comes down to consistency, simplicity and organization. "Import modules, not objects" is a good, easy mental model to stick with.
Like Alex Martelli, I am fond of using as when importing a function.
One thing I have done is to use some prefix on all the functions that were imported from the same module:
from random import seed as r_seed
from random import random as r_random
r_seed is shorter to type than random.seed but somewhat preserves the module boundaries. Someone casually looking at your code can see r_seed() and r_random() and have a chance to grok that they are related.
Of course, you can always simply do:
import random as r
and then use r.random() and r.seed(), which may be the ideal compromise for this case. I only use the prefix trick when I'm importing one or two functions from a module. When I want to use many functions from the same module, I'll import the module, perhaps with an as to shorten the name.
I agree with MestreLion the most here (and so an upvote).
My perspective: I review code frequently that I am unfamiliar with, and not knowing what module a function is coming from just looking at the function is quite frustrating.
Code is written once and read many times, so readability and maintainability trump ease of typing.
In a similar vein, typically code is not being written for the benefit of the coder, but for the benefit of another entity.
Your code should be readable to someone who knows python better than you, but is unfamiliar with the code.
Full-path imports can also help IDEs point you at the correct source of the function or object you're looking at.
For all of these reasons and the reasons MestreLion noted, I conclude that it is best practice to import and use the full path.
