When I try to design package structures and class hierarchies within those packages in Python 3 projects, I constantly run into circular import issues. This happens whenever I want to implement a class in some module that subclasses a base class from a parent package (i.e. from the __init__.py of some parent directory).
Although I technically understand why that happens, I have not been able to come up with a good solution so far. I found some threads here, but none of them go deeper into that particular scenario, let alone mention solutions.
In particular:
Putting everything in one file is maybe not great. That file can end up holding quite a mass of things, maybe 90% of the entire project code?! Maybe the package wants to define a common base class (whatever, e.g. a widget base class for a UI lib), and then just lots of subclasses in nicely organized subpackages? There are obvious reasons why we would not write masses of stuff in one file.
Implementing your base class outside of __init__.py and just importing it there can work in a few places, but messes up the hierarchy with lots of aliases for the same thing (e.g. myproject.BaseClass vs. myproject.foo.BaseClass). Just not nice, is it? It also leads to a lot of boilerplate code that then also has to be maintained.
Implementing it outside and not importing it in __init__.py at all makes those "fully qualified" names longer everywhere in the code, because they have to contain .foo everywhere I use BaseClass (yes, I usually do import myproject...somemodule, which is not what everybody does, but should not be bad afaik).
Some kinds of rather dirty tricks could help, e.g. defining those subclasses inside some kind of factory methods, so they are not defined at module level. Can work in a few situations, but is just terrible in general.
All of them are maybe okay in a single isolated situation, more as a kind of workaround imho, but at larger scale they ruin a lot. Also, the dirtier the tricks, the more they break the services an IDE can provide.
All kinds of vague statements like 'then your class structure is probably bad and needs reworking' puzzle me a bit. I'm more or less distantly aware of some kinds of general good coding practices, although I never read "Clean Code" or similar stuff.
So, what is wrong with subclassing a class from a parent module, from that abstract perspective? And if nothing is, what is the secret workaround that people use in Python 3? Is there one, or is everybody basically dealing with one of the mentioned hacks? Sure, there are tradeoffs to be made everywhere. But here I'm really struggling, even after many years of writing Python code.
Okay, thank you! I think I was a bit off track here in understanding the actual problem. It's essentially what @tdelaney said, so it's more complicated than what I initially wrote: subclassing like that is allowed, but then you must not import the submodule in the parent package's __init__.py before the base class exists. That is exactly what I do in one place to offer some convenience things. So it's maybe not perfect, maybe the point where you could argue that my structure is not well designed (but, well, things are always a compromise). At least what I posted initially is admittedly not quite true. And the workaround of using a few function-level imports is maybe kind of okay in that place.
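To make this concrete, here is a minimal sketch of the layout that triggers the error and the reordering that avoids it (all names -- myproject, BaseWidget, foo -- are made up for illustration):

# myproject/__init__.py  (hypothetical layout)
from myproject.foo import FooWidget     # convenience re-export, executed first

class BaseWidget:                       # ...but foo needs this, and it does not
    pass                                # exist yet when foo gets imported

# myproject/foo.py
from myproject import BaseWidget        # fails at import time: 'myproject' is
                                        # only partially initialised here

class FooWidget(BaseWidget):
    pass

# One way out: in __init__.py, define (or import) BaseWidget *before* the
# convenience re-export of the submodule, so foo can find it; alternatively,
# defer the lookup with a function-level import as mentioned above.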
This may be a subjective question so I understand if it gets shut down, but this is something I've been wondering about ever since I started to learn python in a more serious way.
Is there a generally accepted 'best practice' about whether importing an additional module to accomplish a task more cleanly is better than avoiding the call and 'working around it'?
For example, I had some feedback on a script I worked on recently, and the suggestion was that I could have replaced the code below with a glob.glob() call. I avoided this at the time because it meant adding another import that seemed unnecessary to me (and the actual flow of filtering the lines just meshed with my thought process for the task).
headers = []
with open(hhresult_file) as result_fasta:
    for line in result_fasta:
        if line.startswith(">"):
            line = line.split("_")[0]
            headers.append(line.replace(">", ""))
Similarly, I decided to use an os.rename() call later in the script for moving some files rather than import shutil.
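For illustration, the trade-off between those two calls looks roughly like this (the paths are made up):

import os
import shutil

src = "results/hits.fasta"    # made-up paths, for illustration only
dst = "archive/hits.fasta"

# Option 1: no extra import, but os.rename() raises OSError if src and dst
# end up on different filesystems.
# os.rename(src, dst)

# Option 2: one more import, but shutil.move() falls back to copy-and-delete
# when a plain rename is not possible.
shutil.move(src, dst)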
Is there a right answer here? Are there any overheads associated with calling additional modules and creating more dependencies (let's say, for instance, that the module wasn't a built-in Python module) vs. writing slightly 'messier' code using modules that are already in your script?
This is quite a broad question, but I'll try to answer it succinctly.
There is no real best practice; however, it is generally a good idea to recycle code that's already been written by others. If you find a bug in the imported code, that is more beneficial than finding one in your own code, because you can submit a ticket to the author and have it fixed for a potentially large group of people.
There are certainly considerations to be made when making additional imports, mostly when they are not part of the Python standard library.
Sometimes adding in a package that is a little bit too 'magical' makes code harder to understand, because it's another library or file that somebody has to look up to understand what is going on, versus just a few lines that might not be as sophisticated as the third party library, but get the job done regardless.
If you can get away with not making additional imports, you probably should, but if it would save you substantial amounts of time and headache, it's probably worth importing something that has been pre-written to deal with the problem you're facing.
It's a continual consideration that has to be made.
How easy is it to reverse engineer auto-generated C code? I am working on a Python project and, as part of my work, am using Cython to compile the code for speedup purposes.
This does help in terms of speed, yet, I am concerned that where I work, some people would try to "peek" into the code and figure out what it does.
Cython code is basically an auto-generated C. Is it very hard to reverse engineer it?
Are there any recommendations that would make the code safer and reverse engineering harder to do? (I assume that with enough effort, everything can be reverse engineered.)
Okay -- to attempt to answer your question more directly: most auto-generated C code is fairly ugly, so somebody would need to be fairly motivated to reverse engineer it. At the same time, I don't believe I've ever looked at what Cython generates, so I'm not sure how it looks.
In addition, a lot of auto-generated code is done in the form of things like state machine tables that most programmers find fairly difficult to follow even at best. The tendency (in many cases) is to have a generic framework, with tables of data that the framework more or less "interprets" at run-time. This isn't necessarily impossible to follow, but it's different enough from most typical code that most people will give up on it fairly quickly (and if they do much, they'll typically waste a lot of time looking at the framework instead of the data, which is what really matters in cases like this).
I'll repeat, however, that I'm pretty sure I haven't looked at what Cython produces, so I can't say much about it with any real certainty.
There are (or at least used to be) commercial obfuscators intended to make C source code difficult to understand. I suspect the availability of Perl has taken a lot of the market share from them, but if you look you may still be able to find one and use it.
Absent that, it's not terribly difficult to write an obfuscator of your own, but the degree of effectiveness will probably vary with the amount of effort you're willing to put into it. Just systematically renaming any meaningful variable names into things like _ and __ can do quite a bit (e.g., profit = sales - costs; is a lot more meaningful than _ = _I_ - _i_;). Depending on the machine generated code in question, however, this may not really accomplish much -- obfuscating a generic framework may not make much difference in understanding what your code does -- and if they figure out the procedure you're following, they may be able to simply replicate the correct framework code and transplant the pieces specific to your program into the un-obfuscated framework.
You should really take a look at the code that Cython produces. To help with debugging, for example, it copies the complete Python source code into the generated file, marking each source line before generating C code for it. That makes it very easy to find the code section that you are interested in.
A very nice feature is that you can compile your code with the "-a" (annotate) option, and it will spit out an HTML file next to the C file that contains the annotated Python code. When you click on a line, you'll see the C code for that line. As a bonus, it marks lines that do a lot of Python processing in dark yellow, so that you get a simple indicator where to look for potential optimisations.
There's also special gdb support in Cython now, so you can do Cython source level debugging etc.
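If you want to see that annotated output for your own module, a minimal build script might look like this ("mymodule.pyx" is just a placeholder name):

# setup.py -- minimal sketch for building a Cython module with annotation
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        "mymodule.pyx",
        annotate=True,   # writes mymodule.html next to the generated C file
    )
)

Running cython -a mymodule.pyx directly gives the same HTML report without a build script.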
Ah, I guess I missed the bit that you were talking about the compiled module, whereas I was only referring to the source code that Cython generates. I agree with Jerry that it will be fairly tricky to extract something useful from the compiled module, as long as you keep the gdb support disabled (the default) and strip the debugging symbols. That is because the C compiler will do lots of inlining of helper functions all over the place and apply various low-level code optimisations, thus making it harder to extract the original macro level code patterns. However, you will see named C-API calls to CPython, and you will also see function names from your own code. Cython isn't specifically designed for code obfuscation, quite the opposite. But readable assembly has certainly never been a design goal.
I would like to know the basic principles and etiquette of writing well-structured code.
Read Code Complete, it will do wonders for everything. It'll show you where, how, and when things matter. It's pretty much the Bible of software development (IMHO.)
These are the most important two things to keep in mind when you are writing code:
Don't write code that you've already written.
Don't write code that you don't need to write.
MATLAB Programming Style Guidelines by Richard Johnson is a good resource.
Well, if you want it in layman's terms:
I recommend that people write the shortest readable program that works.
There are a lot more rules about how to format code, name variables, design classes, and separate responsibilities. But you should not forget that all of those rules are only there to make sure that your code is easy to check for errors and is maintainable by someone other than the original author. If you keep the above recommendation in mind, your program will be just that.
This list could go on for a long time but some major things are:
Indent.
Descriptive variable names.
Descriptive class / function names.
Don't duplicate code. If it needs duplicating, put it in a class / function.
Use getters / setters (see the sketch after this list).
Only expose what's necessary in your objects.
Single responsibility principle.
Learn how to write good comments, not lots of comments.
Take pride in your code!
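As a rough Python sketch of the getter/setter and "only expose what's necessary" items above (the Account class and its attribute are made up):

class Account:
    def __init__(self, balance=0):
        self._balance = balance       # leading underscore: not part of the public API

    @property
    def balance(self):                # the "getter": read access stays simple
        return self._balance

    @balance.setter
    def balance(self, value):         # the "setter" can enforce invariants
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

In Python the idiomatic form of getters/setters is a property, which keeps plain attribute syntax at the call site while still hiding the internal state.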
Two good places to start:
Clean Code
Code Complete
If you want something to use as a reference or etiquette, I often follow the official Google style conventions for whatever language I'm working in, such as for C++ or for Python.
The Practice of Programming by Rob Pike and Brian W. Kernighan also has a section on style that I found helpful.
First of all, "codes" is not the right word to use. A code is a representation of another thing, usually numeric. The correct words are "source code", and the plural of source code is source code.
--
Writing good source code:
Comment your code.
Use variable names that are more than a few letters long; between 5 and 20 characters is a good rule of thumb.
Shorter lines of code are not better - use whitespace.
Being "clever" with your code is a good way to confuse yourself or another person later on.
Decompose the problem into its components and use hierarchical design to assemble the solution.
Remember that you will need to change your program later on.
Comment your code.
There are many fads in computer programming. Their proponents consider those who are not following the fad unenlightened and not very with-it. The current major fads seem to be "Test Driven Development" and "Agile". The fad in the 1990s was 'Object Oriented Programming'. Learn the useful core parts of the ideas that come around, but don't be dogmatic and remember that the best program is one that is getting the job done that it needs to do.
A very trivial example of over-condensed code, off the top of my head:
for(int i=0,j=i; i<10 && j!=100; i++){
    if(i==j) return i*j;
    else j*=2;
}
while this is more readable:
int j = 0;
for(int i = 0; i < 10; i++)
{
    if(i == j)
    {
        return i * j;
    }
    else
    {
        j *= 2;
        if(j == 100)
        {
            break;
        }
    }
}
The second example has the logic for exiting the loop clearly visible; the first example has the logic entangled with the control flow. Note that these two programs do exactly the same thing. My programming style takes up a lot of lines of code, but I have never once encountered a complaint about it being hard to understand stylistically, while I find the more condensed approaches frustrating.
An experienced programmer can and will read both - the above may make them pause for a moment and consider what is happening. Forcing the reader to sit down and stare at the code is not a good idea. Code needs to be obvious. Each problem has an intrinsic complexity to expressing its solution. Code should not be more complex than the solution complexity, if at all possible.
That is the essence of what the other poster tried to convey - don't make the program longer than need be. Longer has two meanings: more lines of code (ie, putting braces on their own line), and more complex. Making a program more complex than need be is not good. Making it more readable is good.
Have a look at 97 Things Every Programmer Should Know.
It's free and contains a lot of gems like this one:
There is one quote that I think is particularly good for all software developers to know and keep close to their hearts:

Beauty of style and harmony and grace and good rhythm depends on simplicity. — Plato

In one sentence I think this sums up the values that we as software developers should aspire to.

There are a number of things we strive for in our code:

Readability
Maintainability
Speed of development
The elusive quality of beauty

Plato is telling us that the enabling factor for all of these qualities is simplicity.
The Python Style Guide is always a good starting point!
European Standards For Writing and Documenting Exchangeable Fortran 90 Code have been in my bookmarks, like forever. Also, since you are interested in MATLAB, there was a thread here on organising MATLAB code.
Personally, I've found that I learned more about programming style from working through SICP, the MIT intro to computer science text (I'm about a quarter of the way through), than from any other book. That being said, if you're going to be working in Python, the Google style guide is an excellent place to start.
I read somewhere that most programs (scripts anyways) should never be more than a couple of lines long. All the requisite functionality should be abstracted into functions or classes. I tend to agree.
Many good points have been made above. I definitely second all of the above. I would also like to add that spelling and consistency in coding should be something you practice (and also in real life).
I've worked with some offshore teams and though their English is pretty good, their spelling errors caused a lot of confusion. So for instance, if you need to look for some function (e.g., getFeedsFromDatabase) and they spell database wrong or something else, that can be a big or small headache, depending on how many dependencies you have on that particular function. The fact that it gets repeated over and over within the code will first off, drive you nuts, and second, make it difficult to parse.
Also, keep up with consistency in terms of naming variables and functions. There are many protocols to go by but as long as you're consistent in what you do, others you work with will be able to better read your code and be thankful for it.
Pretty much everything has been said here already, and then some. In my opinion, the best site for what you're looking for (especially the Zen of Python parts, which are fun and true) is:
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
It talks about both PEP 20 and PEP 8, some easter eggs (fun stuff), etc.
You can have a look at the Stanford online course Programming Methodology CS106A. The instructor gives a lot of really good advice about writing source code.
Some of them are as follows:
Write programs for people to read, not just for computers to read. Both of them need to be able to read it, but it's far more important that a person reads it and understands it, and that the computer still executes it correctly. But that's the first major software engineering principle to think about.
How to write comments:
Put in comments to clarify things in the program that are not obvious.
How to decompose:
One method solves one problem.
Each method is roughly 1-15 lines of code.
Give methods good names.
Write comments for the code.
Unit Tests
Python and MATLAB are dynamic languages. As your code base grows, you will be forced to refactor your code, and in contrast to statically typed languages, there is no compiler to detect 'broken' parts in your project. Unit test frameworks in the xUnit style not only compensate for the missing compiler checks, they also allow refactoring with continuous verification of all parts of your project.
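As a minimal illustration with Python's built-in unittest (the strip_fasta_header helper is made up):

import unittest

def strip_fasta_header(line):
    """Hypothetical helper: turn '>seq_1' into 'seq'."""
    return line.lstrip(">").split("_")[0]

class StripFastaHeaderTest(unittest.TestCase):
    def test_removes_marker_and_suffix(self):
        self.assertEqual(strip_fasta_header(">seq_1"), "seq")

    def test_plain_name_is_unchanged(self):
        self.assertEqual(strip_fasta_header("seq"), "seq")

if __name__ == "__main__":
    unittest.main()

After a refactoring you rerun the suite and immediately see which parts broke, which compensates for the compile-time checks a static language would give you.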
Source Control
Track your source code with a version control system like svn, git or any other derivative. You'll be able to go back and forth in your code history, make branches or create tags for deployed/released versions.
Bug Tracking
Use a bug tracking system, if possible connected to your source control system, in order to stay on top of your issues. You may not be able, or required, to fix issues right away.
Reduce Entropy
While integrating new features into your existing code base, you will add more lines of code and potentially more complexity. This increases entropy. Try to keep your design clean by introducing an interface or an inheritance hierarchy to reduce entropy again. Not paying attention to code entropy will render your code unmaintainable over time.
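As a rough sketch of what "introducing an interface" can look like in Python (the exporter classes are made up):

import json
from abc import ABC, abstractmethod

class Exporter(ABC):                    # the shared contract ("interface")
    @abstractmethod
    def export(self, data):
        ...

class CsvExporter(Exporter):            # new features become new subclasses
    def export(self, data):             # instead of extra branches in old code
        return ",".join(str(item) for item in data)

class JsonExporter(Exporter):
    def export(self, data):
        return json.dumps(list(data))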
All of the Above
Pure coding-related topics, like using a style guide, not duplicating code, ..., have already been mentioned.
A small addition to the wonderful answers already here regarding Matlab:
Avoid long scripts; instead, write functions (subroutines) in separate files. This will make the code more readable and easier to optimize.
Use MATLAB's built-in functions. That is, learn about the many, many functions that MATLAB offers instead of reinventing the wheel.
Use code sectioning, and whatever other code-structuring features the newest MATLAB version offers.
Learn how to benchmark your code using timeit and profile. You'll discover that sometimes for loops are the better solution.
The best advice I got when I asked this question was as follows:
Never code while drunk.
Make it readable, make it intuitive, make it understandable, and make it commented.
Let from module import function be called the FMIF coding style.
Let import module be called the IM coding style.
Let from package import module be called the FPIM coding style.
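For concreteness, the three styles look like this at the call site (scipy.stats is just an example):

from scipy.stats import chisquare        # FMIF
import scipy.stats                       # IM
from scipy import stats                  # FPIM

chisquare([16, 18, 16, 14, 12, 12])              # FMIF call
scipy.stats.chisquare([16, 18, 16, 14, 12, 12])  # IM call
stats.chisquare([16, 18, 16, 14, 12, 12])        # FPIM call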
Why is IM+FPIM considered a better coding style than FMIF? (See this post for the inspiration for this question.)
Here are some criteria which lead me to prefer FMIF over IM:
Shortness of code: It allows me to use shorter function names and thus helps me stick to the 80 columns-per-line convention.
Readability: chisquare(...) appears more readable than scipy.stats.stats.chisquare(...). Although this is a subjective criterion, I think most people would agree.
Ease of redirection: If I use FMIF and for some reason at some later time want to redirect python to define function from alt_module instead of module I need to change just one line: from alt_module import function. If I were to use IM, I'd need to change many lines of code.
I realize FPIM goes some way to nullifying the first two issues, but what about the third?
I am interested in all reasons why IM+FPIM may be better than FMIF, but in particular I'd be interested in elaboration on the following points mentioned here:
Pros for IM:
ease of mocking/injecting in tests. (I am not very familiar with mocking, though I recently learned what the term means. Can you show code which demonstrates how IM is better than FMIF here?)
ability for a module to change flexibly by redefining some entries. (I must be misunderstanding something, because this seems to be an advantage of FMIF over IM. See my third reason in favor of FMIF above.)
predictable and controllable behavior on serialization and recovery of your data. (I really don't understand how the choice of IM or FMIF affects this issue. Please elaborate.)
I understand that FMIF "pollutes my namespace", but beyond being a negative-sounding phrase, I don't appreciate how this hurts the code in any concrete way.
PS. While writing this question I received a warning that the question appears subjective and is likely to be closed. Please don't close it. I'm not looking for subjective opinion, but rather concrete coding situations where IM+FPIM is demonstrably better than FMIF.
Many thanks.
The negatives you list for IM/FPIM can often be ameliorated by appropriate use of an as clause. from some.package import mymodulewithalongname as mymod can usefully shorten your code and enhance its readability, and if you rename mymodulewithalongname to somethingcompletelydifferent tomorrow, the as clause means there is only a single statement to edit.
Consider your pro-FMIF point 3 (call it R for redirection) vs your pro-FPIM point 2 (call it F for flexibility): R amounts to facilitating the loss of integrity of module boundaries, while F strengthens it. Multiple functions, classes and variables in a module are often intended to work together: they should not be independently switched to different meanings. For example, consider module random and its functions seed and uniform: if you were to switch the import of just one of them to a different module, then you'd break the normal connection between calls to seed and results of calls to uniform. When a module is well designed, with cohesion and integrity, R's facilitation of breaking down the module's boundaries is actually a negative -- it makes it easier to do something you're better off not doing.
Vice versa, F is what enables coordinated switching of coupled functions, classes, and variables (so, generally, of entities that belong together, by modularity). For example, to make testing repeatable (FPIM pro-point 1), you mock both seed and random in the random module, and if your code follows FPIM, you're all set, coordination guaranteed; but if you have code that has imported the functions directly, you have to hunt down each such module and repeat the mocking over and over and over again. Making tests perfectly repeatable typically also requires "coordinated mocking" of date and time functions -- if you use from datetime import datetime in some modules, you need to find and mock them all (as well as all those doing from time import time, and so forth) to ensure that all the times received when the various parts of the system ask "so what time is it now?" are perfectly consistent (if you use FPIM, you just mock the two relevant modules).
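To put the mocking point into code (roller and direct_roller are made-up module names; only random is real): with IM, one patch on the random module covers every caller, while FMIF forces you to patch each importing module separately.

# roller.py -- IM style: keeps only a module reference
import random

def roll():
    return random.randint(1, 6)

# direct_roller.py -- FMIF style: binds the function object at import time
from random import randint

def roll():
    return randint(1, 6)

# test_roller.py
from unittest import mock

import direct_roller
import roller

with mock.patch("random.randint", return_value=6):
    assert roller.roll() == 6        # patched: the name is looked up through
                                     # the random module at call time
    # direct_roller.roll() still uses the original function here; to control
    # it you have to patch its own copy of the name:
with mock.patch("direct_roller.randint", return_value=6):
    assert direct_roller.roll() == 6

With FMIF, every module that did from random import randint becomes a separate patch target, which is exactly the hunting-down described above.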
I like FPIM, because there's really not much added value by using a multiply qualified name rather than a singly qualified one (while the difference between barenames and qualified names is huge -- you get so much more control with a qualified name, be it singly or multiply, than you possibly ever can with a barename!).
Ah well, can't devote all of the working day to responding to each and every one of your points -- your question should probably be half a dozen questions;-). I hope this at least addresses "why is F better than R" and some of the mocking/testing issues -- it boils down to preserving and enhancing well-designed modularity (via F) rather than undermining it (via R).
The classic text on this, as so often, is from Fredrik Lundh, the effbot. His advice: always use import - except when you shouldn't.
In other words, be sensible. Personally I find that anything that's several modules deep tends to get imported via from x.y.z import a - the main example being Django models. But as much as anything else it's a matter of style, and you should have a consistent one - especially with modules like datetime, where both the module and the class it contains are called the same thing. Do you need to write datetime.datetime.now() or just datetime.now()? (In my code, always the former.)
Items 1 and 2 in your list of questions seem to be the same issue. Python's dynamic nature means it is fairly simple to replace an item in a module's namespace no matter which of the methods you use. The difficulty comes if one function in a module refers to another, which is the one you want to mock. In this case, importing the module rather than the functions means you can do module.function_to_replace = myreplacementfunc and everything works transparently - but that is as easy to do via FPIM as it is via IM.
I also don't understand how item 3 has anything to do with anything. I think your item 4, however, is based on a bit of a misunderstanding. None of the methods you give will 'pollute your namespace'. What does do that is from module import *, where you have no idea at all what you're importing and so functions can appear in your code with no clue given to the reader where they came from. That's horrible, and should be avoided at all costs.
Great answers here (I upvoted them all), and here are my thoughts on this matter:
First, addressing each of your bullets:
(Allegedly) Pros of FMIF:
Shortness of code: shorter function names help stick to the 80 columns-per-line convention.
Perhaps, but module names are usually short enough that this is not relevant. Sure, there's datetime, but also os, re, sys, etc. And Python allows free line breaks inside { [ (. And for nested modules there's always as, in both IM and FPIM.
Readability: chisquare(...) appears more readable than scipy.stats.stats.chisquare(...).
Strongly disagree. When reading foreign code (or my own code after a few months) it's hard to know where each function comes from. Qualified names save me from going back and forth from line 2345 to the module declarations at the top. They also give you context: "chisquare? What's that? Oh, it's from scipy? Ok, some math-related stuff then". And, once again, you can always abbreviate scipy.stats.stats as scipyst. scipyst.chisquare(...) is short enough, with all the benefits of a qualified name.
import os.path as osp is another good example, considering it's very common to chain 3 or more of its functions together in a single call: join(expanduser(),basename(splitext())) etc.
Ease of redirection: one-line redefinition of a function from alt_module instead of module.
How often do you want to redefine a single function but not the whole module? Module boundaries and function coordination should be preserved, and Alex already explained this in great depth. For most (all?) real-world scenarios, if alt_module.x is a viable replacement for module.x, then probably alt_module itself is a drop-in alternative for module, so both IM and FPIM are one-liners just like FMIF, provided you use as.
I realize FPIM goes some way to nullifying the first two issues...
Actually, as is the one that mitigates the first 2 issues (and the 3rd), not FPIM. You can use IM for that too: import some.long.package.path.x as x for the same result as FPIM.
So none of the above are really pros of FMIF. And the reasons I prefer IM/FPIM are:
For the sake of simplicity and consistency, when I import something, either IM or FPIM, I'm always importing a module, not an object from a module. Remember FMIF can be (ab-)used to import functions, classes, variables, or even other modules! Think about the mess of from somemodule import sys, somevar, os, SomeClass, datetime, someFunc.
Also, if you want more than a single object from a module, FMIF will pollute your namespace more than IM or FPIM, which use a single name no matter how many objects you want. And such objects will have a qualified name, which is a pro, not a con: as I've said in issue 2, IMHO it improves readability.
It all comes down to consistency, simplicity, and organization. "Import modules, not objects" is a good, easy mental model to stick with.
Like Alex Martelli, I am fond of using as when importing a function.
One thing I have done is to use some prefix on all the functions that were imported from the same module:
from random import seed as r_seed
from random import random as r_random
r_seed is shorter to type than random.seed but somewhat preserves the module boundaries. Someone casually looking at your code can see r_seed() and r_random() and have a chance to grok that they are related.
Of course, you can always simply do:
import random as r
and then use r.random() and r.seed(), which may be the ideal compromise for this case. I only use the prefix trick when I'm importing one or two functions from a module. When I want to use many functions from the same module, I'll import the module, perhaps with an as to shorten the name.
I agree with MestreLion the most here (and so an upvote).
My perspective: I frequently review code that I am unfamiliar with, and not knowing which module a function comes from just by looking at the call is quite frustrating.
Code is written once and read many times, and so readability and maintainability trumps ease of typing.
In a similar vein, typically code is not being written for the benefit of the coder, but for the benefit of another entity.
Your code should be readable to someone who knows python better than you, but is unfamiliar with the code.
Full-path imports can also better help IDEs point you at the correct source of the function or object you're looking at.
For all of these reasons and the reasons MestreLion noted, I conclude that it is best practice to import and use the full path.