At this moment we are working on a large project.
This project is supposed to create EDIFACT messages. This is not so hard at first but the catch is there are a lot of customers that have their own implementation of the standard.
On top of that we are working with several EDIFACT standards (D96A and D01B in our case.)
Some customer exceptions might be as small as having a divergent field length, but some have made their own implementation completely different.
At this moment we have listed the customer exceptions in a list (Just to keep them consistent) and in the code we use something like:
if NAME_LENGTH_IS_100 in customer_exceptions:
    self.max_length = 100
else:
    self.max_length = 70
For a couple of simple exceptions this works just fine, but at this moment the code starts to get really cluttered and we are thinking about refactoring the code.
I am thinking about some kind of factory pattern, but I am not sure about the implementation.
Another option would be to create a base package and make a separate implementation for every customer that is diverging from the standard.
I hope someone can help me out with some advice.
Thanks in advance.
I think your question is too broad to be answered properly (I was about to click the close button because of this but decided otherwise). The reason for this is the following:
There is nothing wrong with the code snippet you provided. If it is part of some kind of initialization routine, it is just fine the way it is. It also doesn't hurt to have a large number of things like this.
But how to handle more complex cases depends greatly on the cases themselves.
For lots of situations it might be sufficient to have such variables which represent the customer's special choices.
For other aspects I'd propose to have a Customer base class with subclasses thereof, for each customer one (or maybe the customers can even be hierarchically grouped, then a nice inheritance tree could reflect this).
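A minimal sketch of what such a hierarchy could look like. The class names and the format_name method are hypothetical; only the limits of 70 and 100 come from the snippet in the question.

```python
class Customer:
    """Default behaviour, following the standard."""

    # Default limit, taken from the snippet in the question.
    max_name_length = 70

    def format_name(self, name):
        # Truncate to whatever limit this customer class declares.
        return name[: self.max_name_length]


class LongNameCustomer(Customer):
    """Hypothetical customer that allows names of up to 100 characters."""

    max_name_length = 100


class UppercaseCustomer(LongNameCustomer):
    """Hypothetical customer that additionally wants names in upper case."""

    def format_name(self, name):
        return super().format_name(name).upper()


name = "x" * 120
print(len(Customer().format_name(name)))          # 70
print(len(LongNameCustomer().format_name(name)))  # 100
print(UppercaseCustomer().format_name("acme"))    # ACME
```

Simple deviations stay as class attributes, while deviations in behaviour become overridden methods, so the message-building code never has to test for individual exceptions.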
For other cases again I'd propose aspect-oriented programming by use of Python decorators to tweak the behavior of methods, functions, and classes.
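As a rough illustration of the decorator idea, assuming the tweak is a length limit applied to the result of a field-building function. Both max_length and build_name_field are made-up names, not part of any EDIFACT library.

```python
import functools


def max_length(limit):
    """Hypothetical decorator: truncate a function's string result to `limit`."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)[:limit]

        return wrapper

    return decorate


def build_name_field(name):
    # Stand-in for the real field-building logic, which the question does not show.
    return name.strip()


# The standard and the customer-specific variant share one implementation.
standard_name_field = max_length(70)(build_name_field)
long_name_field = max_length(100)(build_name_field)

print(len(standard_name_field("x" * 120)))  # 70
print(len(long_name_field("x" * 120)))      # 100
```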
Since this depends greatly on your concrete usecases, I think this question cannot be answered more concretely than this.
Why not put all of this in a resource file, with the standard as the default and each exception handled as an overriding value? Then you just need to read the right key for the right client, and your code stays clean.
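One possible shape for this, sketched with an inline JSON string standing in for the resource file. All keys and customer names are hypothetical; collections.ChainMap from the standard library provides the fallback from customer overrides to defaults.

```python
import json
from collections import ChainMap

# Stand-in for a resource file; in practice this could be read with json.load().
# Every key and customer name here is an assumption made for the example.
RESOURCE = json.loads("""
{
  "default":    {"name_max_length": 70, "date_format": "CCYYMMDD"},
  "customer_a": {"name_max_length": 100},
  "customer_b": {"date_format": "CCYYMMDDHHMM"}
}
""")


def settings_for(customer):
    # Look in the customer's overrides first, then fall back to the defaults.
    return ChainMap(RESOURCE.get(customer, {}), RESOURCE["default"])


print(settings_for("customer_a")["name_max_length"])  # 100
print(settings_for("customer_b")["name_max_length"])  # 70
print(settings_for("unknown")["date_format"])         # CCYYMMDD
```

The code that builds messages then only asks for a setting by key and never mentions a specific customer.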
When I try to design package structures and class hierarchies within those packages in Python 3 projects, I constantly run into circular import issues. This happens whenever I want to implement a class in some module by subclassing a base class from a parent package (i.e. from the __init__.py of some parent directory).
Although I technically understand why that happens, I was not able to come up with a good solution so far. I found some threads here, but none of them goes deeper into that particular scenario, let alone mentions solutions.
In particular:
Putting everything in one file is maybe not great. It potentially can be quite a mass of things. Maybe 90% of the entire project code?! Maybe it wants to define a common subclass of things (whatever, e.g. widget base class for a ui lib), and then just lots of subclasses in nicely organized subpackages? There are obvious reasons why we would not write masses of stuff in one file.
Implementing your base class outside of __init__.py and just importing it there can work for a few places, but messes up the hierarchy with lots of aliases for the same thing (e.g. myproject.BaseClass vs. myproject.foo.BaseClass). Just not nice, is it? It also leads to a lot of boilerplate code that also wants to be maintained.
Implementing it outside, and even not importing it in __init__.py makes those "fully qualified" notations longer everywhere in the code, because it has to contain .foo everywhere I use BaseClass (yes, I usually do import myproject...somemodule, which is not what everybody does, but should not be bad afaik).
Some kinds of rather dirty tricks could help, e.g. defining those subclasses inside some kind of factory methods, so they are not defined at module level. Can work in a few situations, but is just terrible in general.
All of them are maybe okay in a single isolated situation, more as kind of a workaround imho, but in larger scale it ruins a lot. Also, the dirtier the tricks, the more it also breaks the services an IDE can provide.
All kinds of vague statements like 'then your class structure is probably bad and needs reworking' puzzle me a bit. I'm more or less distantly aware of some kinds of general good coding practices, although I never read "Clean Code" or similar stuff.
So, what is wrong with subclassing a class from a parent module from that abstract perspective? And if nothing is wrong with it, what is the secret workaround that people use in Python 3? Is there one, or is everybody basically dealing with one of the mentioned hacks? Sure, there are tradeoffs to be made everywhere. But here I'm really struggling, even after many years of writing Python code.
Okay, thank you! I think I was a bit on the wrong track here in understanding the actual problem. It's essentially what @tdelaney said. So it's more complicated than what I initially wrote. It's allowed to do that, but you must not import the submodule in a parent package. This is what I do here in one place to offer some convenience things. So it's maybe not perfect, and maybe the point where you could argue that my structure is not well designed (but, well, things are always a compromise). At least, what I posted initially is admittedly not true. And the workaround of using a few function-level imports is maybe kind of okay in that place.
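A minimal sketch of that function-level import workaround, assuming a hypothetical package demo_pkg whose __init__.py defines BaseClass and also offers a convenience function that needs a subclass from a submodule. The files are written to a temporary directory only so that the sketch runs as a single script.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout:
#   demo_pkg/__init__.py   defines BaseClass and a convenience function
#   demo_pkg/widgets.py    subclasses BaseClass from the parent package
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()

(pkg / "__init__.py").write_text(
    "class BaseClass:\n"
    "    pass\n"
    "\n"
    "def make_button():\n"
    "    # Function-level import: it runs on the first call, after the\n"
    "    # package has finished initializing, so no cycle is hit.\n"
    "    from demo_pkg.widgets import Button\n"
    "    return Button()\n"
)
(pkg / "widgets.py").write_text(
    "from demo_pkg import BaseClass\n"
    "\n"
    "class Button(BaseClass):\n"
    "    pass\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

button = demo_pkg.make_button()
print(isinstance(button, demo_pkg.BaseClass))  # True
```

If __init__.py instead imported demo_pkg.widgets at module level before BaseClass was defined, widgets.py would see a partially initialized package and the import would fail.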
I've fully graduated from writing scripts to writing modules. Now that I have a module full of functions, I'm not quite sure if I should order them in some way.
Alphabetical seems to make sense to me, but I wanted to see if there were other schools of thought on how they should be ordered in a module. Maybe try to approximate the flow of the code or some other method?
I did some searching on this and didn't really find anything, except for that functions need to be defined before calling them, which isn't really relevant to my question.
Thanks for any thoughts people can provide!
Code should be made to be easily readable by a human; Readability counts (from The Zen of Python).
Stick to the conventions of PEP-8, unless you have good reason not to do so.
My suggestion would be to start with the main parts of the module in a sequence that makes sense for this particular module. Helper functions and classes go below that in a top-down fashion.
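A small sketch of that top-down layout. The function names are invented for the example; the point is only the ordering.

```python
def main():
    # Entry point first: it reads like a table of contents for the module.
    records = load_records()
    return summarize(records)


def load_records():
    # Hypothetical second-level function; returns fixed data for the sketch.
    return [parse_line(line) for line in ("a=1", "b=2")]


def summarize(records):
    return sum(value for _, value in records)


# Low-level helpers go last.
def parse_line(line):
    key, value = line.split("=")
    return key, int(value)


# main() is only called here, after every definition above has been executed,
# which is why the top-down order works.
print(main())  # 3
```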
Modern editors are quite capable of finding function or method definitions in code, so the precise sequence under the top level doesn't matter as much as it used to.
If your editor supports it, consider using folding.
It is time for a question on good design and performance.
Say I have three django models:
class Student(Model):
    classroom = ForeignKey('Classroom')
    # student info
class Classroom(Model):
    teacher = ForeignKey('Teacher')
    # classroom info
class Teacher(Model):
    # teacher info
I want to make sure that a view has a nice way to access all the students that a teacher has. To do this, it might make sense to have a method defined on the Teacher model
def get_students(self):
    # code
Now, there are a few ways to do this. One of my priorities is to keep each model relatively agnostic to overall database schema. As a result, I don't really like the following solution:
def get_students(self):
    return Student.objects.filter(classroom__teacher=self)
This solution relies on students being related to teachers via a classroom; if I change this structure (maybe students will need to be related to teachers directly, rather than through a classroom), I have to now change the get_students method. If I have a bunch of models and they relate to each other through these nested relationships, changing the schema means hunting down all such filter queries. In my particular case, I have a number of models that exist in different applications and my project is getting quite big, so taking this approach means inviting all sorts of opportunities for me to miss something and create a bug. Even if my tests are good, I will have to spend a lot of time looking for queries.
A solution that seems more elegant to me is to have a Student manager that defines a for_teacher method:
class StudentManager(Manager):
    def for_teacher(self, teacher):
        return self.filter(classroom__teacher=teacher)
Now, my get_students method can look like this:
def get_students(self):
    return Student.objects.for_teacher(self)
With this approach, I've abstracted things so that a Teacher doesn't know how it is related to Student (namely, through Classroom). All it knows is it is related to students somehow. Of course, if I change the schema, I will have to change the StudentManager. However, if I again imagine a project with many models related to other models through different models, this method offers a way to concentrate all the schema-dependent calls in one place (the managers). This saves me from having to hunt down queries in all sorts of models (and views, perhaps).
The question is, is this a sane approach? If not, what is a preferred way to handle this?
A corollary:
As mentioned above, my project has a bunch of models in a bunch of apps and they need to know about each other somehow. So now we have an added issue: if Teacher contains a get_students method and Student has a get_teacher method, I now run into cyclic module dependencies. A potential solution to this new problem is this version of get_students (get_teacher):
def get_students(self):
    from student.models import Student
    return Student.objects.for_teacher(self)
I come from a world where imports are put at the top of a program, so this seems a little strange to me. Is this a reasonable approach? Are there performance considerations when doing dynamic imports like this? Will Python cache the Student import in the get_students method so it only happens once?
Thanks in advance!
Certainly it seems like a fine idea to centralize your schema-dependent operations in one place. However, I don't see any advantage to doing that in Managers instead of Models as you propose.
"this method offers a way to concentrate all the schema-dependent calls in one place (the managers)"
But the managers aren't in one place, there will be one for each relevant model, and they are usually put in the exact same place - models.py. In your example there's certainly no advantage, since in both cases you would have to change one method if the schema changed.
Don't get me wrong, Managers are great, and StudentManager.for_teacher() might make more sense than Teacher.get_students() based on your needs and access patterns. But I don't see an advantage from an encapsulation point of view.
As for imports, it's common and accepted to import a module inside a function if that's necessary to avoid a circular import, even though it's less Pythonic. PEP 8 advises against it, but doesn't say why. The most often cited reason (in my experience) is that it makes it harder to track the module's dependencies when the imports are spread around the file. Performance is not an important consideration here since Python will indeed cache the imported module.
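The caching can be checked with a small sketch. It uses the standard-library json module as a stand-in for student.models, since the Django app from the question is not available here.

```python
import sys


def get_decoder():
    # Function-level import, as in get_students() from the question.
    import json
    return json


first = get_decoder()
second = get_decoder()

# Both calls return the same module object, which lives in sys.modules.
print(first is second)               # True
print(sys.modules["json"] is first)  # True
```

The module body is executed only on the first import; later calls amount to a lookup in sys.modules plus a local name binding.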
It's probably also true that circular imports are often a sign that something more serious is going wrong. However, in Django it's not uncommon and not troublesome in and of itself.
I would like to know the basic principles and etiquette of writing well-structured code.
Read Code Complete, it will do wonders for everything. It'll show you where, how, and when things matter. It's pretty much the Bible of software development (IMHO.)
These are the most important two things to keep in mind when you are writing code:
Don't write code that you've already written.
Don't write code that you don't need to write.
MATLAB Programming Style Guidelines by Richard Johnson is a good resource.
Well, if you want it in layman's terms:
I recommend writing the shortest readable program that works.
There are a lot more rules about how to format code, name variables, design classes, and separate responsibilities. But you should not forget that all of those rules are only there to make sure that your code is easy to check for errors, and to ensure it is maintainable by someone other than the original author. If you keep the above recommendation in mind, your program will be just that.
This list could go on for a long time but some major things are:
Indent.
Descriptive variable names.
Descriptive class / function names.
Don't duplicate code. If it needs duplication, put it in a class / function.
Use getters / setters.
Only expose what's necessary in your objects.
Single responsibility principle.
Learn how to write good comments, not lots of comments.
Take pride in your code!
Two good places to start:
Clean-Code Handbook
Code-Complete
If you want something to use as a reference or etiquette, I often follow the official Google style conventions for whatever language I'm working in, such as for C++ or for Python.
The Practice of Programming by Rob Pike and Brian W. Kernighan also has a section on style that I found helpful.
First of all, "codes" is not the right word to use. A code is a representation of another thing, usually numeric. The correct words are "source code", and the plural of source code is source code.
--
Writing good source code:
Comment your code.
Use variable names longer than several letters. Between 5 and 20 is a good rule of thumb.
Shorter lines of code are not better - use whitespace.
Being "clever" with your code is a good way to confuse yourself or another person later on.
Decompose the problem into its components and use hierarchical design to assemble the solution.
Remember that you will need to change your program later on.
Comment your code.
There are many fads in computer programming. Their proponents consider those who are not following the fad unenlightened and not very with-it. The current major fads seem to be "Test Driven Development" and "Agile". The fad in the 1990s was 'Object Oriented Programming'. Learn the useful core parts of the ideas that come around, but don't be dogmatic and remember that the best program is one that is getting the job done that it needs to do.
A very trivial example of over-condensed code, off the top of my head:
for(int i=0,j=i; i<10 && j!=100;i++){
if (i==j) return i*j;
else j*=2;
}
while this is more readable:
int j = 0;
for(int i = 0; i < 10; i++)
{
    if (i == j)
    {
        return i * j;
    }
    else
    {
        j *= 2;
        if(j == 100)
        {
            break;
        }
    }
}
The second example has the logic for exiting the loop clearly visible; the first example has the logic entangled with the control flow. Note that these two programs do exactly the same thing. My programming style takes up a lot of lines of code, but I have never once encountered a complaint about it being hard to understand stylistically, while I find the more condensed approaches frustrating.
An experienced programmer can and will read both - the above may make them pause for a moment and consider what is happening. Forcing the reader to sit down and stare at the code is not a good idea. Code needs to be obvious. Each problem has an intrinsic complexity to expressing its solution. Code should not be more complex than the solution complexity, if at all possible.
That is the essence of what the other poster tried to convey - don't make the program longer than need be. Longer has two meanings: more lines of code (i.e., putting braces on their own line), and more complex. Making a program more complex than need be is not good. Making it more readable is good.
Have a look at 97 Things Every Programmer Should Know.
It's free and contains a lot of gems like this one:
There is one quote that I think is particularly good for all software developers to know and keep close to their hearts:
Beauty of style and harmony and grace and good rhythm depends on simplicity.
— Plato
In one sentence I think this sums up the values that we as software developers should aspire to.
There are a number of things we strive for in our code:
Readability
Maintainability
Speed of development
The elusive quality of beauty
Plato is telling us that the enabling factor for all of these qualities is simplicity.
The Python Style Guide is always a good starting point!
European Standards For Writing and Documenting Exchangeable Fortran 90 Code have been in my bookmarks, like forever. Also, there was a thread in here, since you are interested in MATLAB, on organising MATLAB code.
Personally, I've found that I learned more about programming style from working through SICP, the MIT intro to computer science text (I'm about a quarter of the way through), than from any other book. That being said, if you're going to be working in Python, the Google style guide is an excellent place to start.
I read somewhere that most programs (scripts anyways) should never be more than a couple of lines long. All the requisite functionality should be abstracted into functions or classes. I tend to agree.
Many good points have been made above, and I definitely second all of them. I would also like to add that spelling and consistency in coding should be something you practice (and also in real life).
I've worked with some offshore teams and though their English is pretty good, their spelling errors caused a lot of confusion. So for instance, if you need to look for some function (e.g., getFeedsFromDatabase) and they spell database wrong or something else, that can be a big or small headache, depending on how many dependencies you have on that particular function. The fact that it gets repeated over and over within the code will first off, drive you nuts, and second, make it difficult to parse.
Also, keep up with consistency in terms of naming variables and functions. There are many protocols to go by but as long as you're consistent in what you do, others you work with will be able to better read your code and be thankful for it.
Pretty much everything said here, and something more. In my opinion this is the best site concerning what you're looking for (the Zen of Python parts especially are fun and true):
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
It talks about both PEP 20 and PEP 8, some easter eggs (fun stuff), etc.
You can have a look at the Stanford online course Programming Methodology (CS106A). The instructor gives several really good guidelines for writing source code.
Some of them are as follows:
Write programs for people to read, not just for computers to read. Both of them need to be able to read it, but it's far more important that a person reads it and understands it, and that the computer still executes it correctly. That's the first major software engineering principle to think about.
How to make comments:
put in comments to clarify things in the program, which are not obvious
How to decompose:
One method solves one problem
Each method is approximately 1 to 15 lines of code
Give methods good names
Write comments for the code
Unit Tests
Python and MATLAB are dynamic languages. As your code base grows, you will be forced to refactor your code. In contrast to statically typed languages, the compiler will not detect 'broken' parts in your project. Using a unit test framework from the xUnit family not only compensates for the missing compiler checks, it also allows refactoring with continuous verification of all parts of your project.
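For Python, the standard-library unittest module is an xUnit-style framework. A minimal sketch follows; truncate is a made-up function under test.

```python
import unittest


def truncate(text, limit):
    """Hypothetical function under test."""
    return text[:limit]


class TruncateTests(unittest.TestCase):
    def test_short_text_is_unchanged(self):
        self.assertEqual(truncate("abc", 10), "abc")

    def test_long_text_is_cut(self):
        self.assertEqual(truncate("abcdef", 3), "abc")


# Run the suite programmatically; `python -m unittest` would also discover it.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TruncateTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Rerunning such a suite after each refactoring step catches the kind of breakage a compiler would otherwise have flagged.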
Source Control
Track your source code with a version control system like svn, git or any other derivative. You'll be able to go back and forth in your code history, make branches, and create tags for deployed/released versions.
Bug Tracking
Use a bug tracking system, if possible connected with your source control system, in order to stay on top of your issues. You may not be able, or forced, to fix issues right away.
Reduce Entropy
While integrating new features in your existing code base, you will add more lines of code, and potentially more complexity. This will increase entropy. Try to keep your design clean, by introducing an interface, or inheritance hierarchy in order to reduce entropy again. Not paying attention to code entropy will render your code unmaintainable over time.
All of The Above Mentioned
Purely coding-related topics, like using a style guide, not duplicating code, ..., have already been mentioned.
A small addition to the wonderful answers already here regarding Matlab:
Avoid long scripts, instead write functions (sub routines) in separate files. This will make the code more readable and easier to optimize.
Use Matlab's built-in functions capabilities. That is, learn about the many many functions that Matlab offers instead of reinventing the wheel.
Use code sectioning, and whatever other code structuring features the newest Matlab version offers.
Learn how to benchmark your code using timeit and profile. You'll discover that sometimes for loops are the better solution.
The best advice I got when I asked this question was as follows:
Never code while drunk.
Make it readable, make it intuitive, make it understandable, and make it commented.
There comes a point where, in a relatively large project, one needs to think about splitting the functionality into various functions, and then various modules, and then various packages. Sometimes across different source distributions (e.g. extracting a common utility, such as optparser, into a separate project).
The question - how does one decide the parts to put in the same module, and the parts to put in a separate module? Same question for packages.
There's a classic paper by David Parnas called "On the criteria to be used in decomposing systems into modules". It's a classic (and has a certain age, so can be a little outdated).
Maybe you can start from there; a PDF is available here:
http://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf
Take out a pen and piece of paper. Try to draw how your software interacts on a high level. Draw the different layers of the software etc. Group items by functionality and purpose, maybe even by what sort of technology they use. If your software has multiple abstraction layers, I would say to group them by that. On a high level, the elements of a specific layer all share the same general purpose. Now that you have your software in layers, you can divide these layers into different projects based on specific functionality or specialization.
As for a certain stage that you reach in which you should do this? I'd say when you have multiple people working on the code base or if you want to keep your project as modular as possible. Hopefully your code is modular enough to do this with. If you are unable to break apart your software on a high level, then your software is probably spaghetti code and you should look at refactoring it.
Hopefully that will give you something to work with.
See How many Python classes should I put in one file?
Sketch your overall set of class definitions.
Partition these class definitions into "modules".
Implement and test the modules separately from each other.
Knit the modules together to create your final application.
Note. It's almost impossible to decompose a working application that evolved organically. So don't do that.
Decompose your design early and often. Build separate modules. Integrate to build an application.
IMHO this should probably be one of the things you do earlier in the development process. I have never worked on a large-scale project, but it would make sense that you make a roadmap of what's going to be done and where. (Not trying to rib you for asking about it like you made a mistake :D )
Modules are generally grouped somehow, by purpose or functionality. You could try grouping the implementations of an interface together, or grouping by other connections.
I sympathize with you. You are suffering from self-doubt. Don't worry. If you can speak any language, including your mother tongue, you are qualified to do modularization on your own. For evidence, you may read "The Language Instinct," or "The Math Instinct."
Look around, but not too much. You can learn a lot from them, but you can learn many bad things from them too.
Some projects/frameworks get a lot of hype. Yet some of their groupings of functionality, and even the names given to modules, are misleading. They don't "reveal the intention" of the programmers. They fail the "high cohesiveness" test.
Books are no better. Please apply the 80/20 rule in your book selection. Even a good, very complete, well-researched book like Capers Jones' 2010 "Software Engineering Best Practices" is clueless. It says a 10-man Agile/XP team would take 12 years to do Windows Vista or 25 years to do an ERP package! It says there was no method until 2009 for segmentation, its term for modularization. I don't think it will help you.
My point is: You must pick your model/reference/source of examples very carefully. Don't over-estimate famous names and under-estimate yourself.
Here is my help, proven in my experience.
It is a lot like deciding what attributes go to which DB table, what properties/methods go to which class/object etc? On a deeper level, it is a lot like arranging furniture at home, or books in a shelf. You have done such things already. Software is the same, no big deal!
Worry about "cohesion" first. E.g. Books (Leo Tolstoy, James Joyce, DE Lawrence) is cohesive. (HTML, CSS, John Keats, jQuery, tinymce) is not. And there are many ways to arrange things. Even taxonomists are still in serious feuds over this.
Then worry about "coupling." Be "shy". "Don't talk to strangers." Don't be over-friendly. Try to make your package/DB table/class/object/module/bookshelf as self-contained and as independent as possible. Joel has talked about his admiration for the Excel team, which abhorred all external dependencies and even built its own compiler.
Actually it varies for each project you create but here is an example:
The core package contains modules that your project can't live without. It may contain the main functionality of your application.
The ui package contains modules that deal with the user interface, that is, if you split the UI from your console.
This is just an example, and it would really be you who decides what goes where.