All code in one file [closed] - python

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
After asking organising my Python project and then calling from a parent file in Python it's occurring to me that it'll be so much easier to put all my code in one file (data will be read in externally).
I've always thought that this was bad project organisation but it seems to be the easiest way to deal with the problems I'm thinking I will face. Have I simply gotten the wrong end of the stick with file count or have I not seen some great guide on large (for me) projects?

If you are planning to use any kind of SCM then you are going to be screwed. Having one file is a guaranteed way to have lots of collisions and merges that will be painstaking to deal with over time.
Stick to conventions and break apart your files. If nothing more than to save the guy who will one day have to maintain your code...

If your code is going to work together all the time anyway, and isn't useful separately, there's nothing wrong with keeping everything in one file. I can think of at least popular package (BeautifulSoup) that does this. Sure makes installation easier.
Of course, if it seems, down the road, that you could use part of your code with another project, or if maintainance starts to be an issue, then worry about organizing your project differently.
It seems to me from the questions you've been asking lately that you're worrying about all of this a bit prematurely. Often, for me, these sorts of issues are better tackled a little later on in the solution. Especially for smaller projects, my goal is to get a solution that is correct, and then optimal.

It's always a now verses then argument. If you're under the gun to get it done, do it. Source control will be a problem later, as with many things there's no black and white answer. You need to be responsible to both your deadline and the long term maintenance of the code.

If that's the best way to organise it, you're probably doing something wrong.
If it's more than just a toy program or a simple script, then you should break it up into separate files, etc. It's the only sane way of doing it. When your project gets big enough that you need someone else helping on it, then it will make the SCM a whole bunch easier.
Additionally, sooner or later you are going to need to add a separate utility to your project, that is going to need some common code/structures. It's far easier to do this if you have separate source files than if you have just one big one.

Looking at your earlier questions I would say all code in one file would be a good intermediate state on the way to a complete refactoring of your project. To do this you'll need a regression test suite to make sure you don't break the project while refactoring it.
Once all your code is in one file, I suggest iterating on the following:
Identify a small group of interdependent classes.
Pull those classes into a separate file.
Add unit tests for the new separate file.
Retest the entire project.
Depending on the size of your project, it shouldn't take too many iterations for you to reach something reasonable.

Since Calling from a parent file in Python indicates serious design problems, I'd say that you have two choices.
Don't have a library module try to call back to main. You'll have to rewrite things to fix this.
[An imported component calling the main program is an improper dependency. And Python doesn't support it because it's a poor design.]
Put it all in one file until you figure out a better design with proper one-way dependencies. Then you'll have to rewrite it to fix the dependency problems.
A module (a single file) should be a logical piece of related code. Not everything. Not a single class definition. There's a middle ground of modularity.
Additionally, there should be a proper one-way dependency graph from main program to components (which do NOT depend on the main program) to utility libraries and what-not (that do not know about the components OR the main program.
Circular (or mutual) dependencies often indicate a design problem. Callbacks are one way out of the problem. Another way is to decompose the circular elements to get a proper one-way graph.

Related

Keep files (modules) as "big" as possible [duplicate]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
I'm working on a Python web application in which I have some small modules that serve very specific functions: session.py, logger.py, database.py, etc. And by "small" I really do mean small; each of these files currently includes around 3-5 lines of code, or maybe up to 10 at most. I might have a few imports and a class definition or two in each. I'm wondering, is there any reason I should or shouldn't merge these into one module, something like misc.py?
My thoughts are that having separate modules helps with code clarity, and later on, if by some chance these modules grow to more than 10 lines, I won't feel so bad about having them separated. But on the other hand, it just seems like such a waste to have a bunch of files with only a few lines in each! And is there any significant difference in resource usage between the multi-file vs. single-file approach? (Of course I'm nowhere near the point where I should be worrying about resource usage, but I couldn't resist asking...)
I checked around to see whether this had been asked before and didn't see anything specific to Python, but if it's in fact a duplicate, I'd appreciate being pointed in the right direction.
My thoughts are that having separate
modules helps with code clarity, and
later on, if by some chance these
modules grow to more than 10 lines, I
won't feel so bad about having them
separated.
This. Keep it the way you have it.
As a user of modules, I greatly prefer when I can include the entire module via a single import. Don't make a user of your package do multiple imports unless there's some reason to allow for importing different alternates.
BTW, there's no reason a single modules can't consist of multiple source files. The simplest case is to use an __init__.py file to simply load all the other code into the module's namespace.
Personally I find it easier to keep things like this in a single file, just for the practicality of editing a smaller number of files in my editor.
The important thing to do is treat the different pieces of code as though they were in separate files, so you ensure that you can trivially separate them later, for the reasons you cite. So for instance, don't introduce dependencies between the different pieces that will make it hard to disentangle them later.
For command line scripts there most likely will not be much difference unless each invocation invokes all files in the module, in which case there will be a slight performance cost as n files need to be opened vs one.
For mod_python there most likely will be no difference as byte-compiled modules stay alive for the duration of the apache process.
For google app engine though there will be a performance hit unless the service is constantly used and is "hot" as each cold start would require opening all files.
Off course you can have as many modules as you like.
But now let as think a little, what happens when we put every small code snippet into one single file.
We will end up in hundreds of import statements in any less trivial module. And off course you could also save a little by having all explicit in seperated files. But guess what: Nobody can remember so many module names and you might end up in searching for the right file anyway ...
I try to put things that belong together in one single file (unless it becomes to big!). But when I have small functions or classes that do not belong to other components in my system, I have "util" modules or the like. I also try to group these for example according to my application layering or seperate them by other means. One seperation criteria could be: Utilities that are used for UI and those that are not.
Small.

Is it a good (or bad) habit to make only one class in each python file? [duplicate]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
The community reviewed whether to reopen this question 3 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
I'm used to the Java model where you can have one public class per file. Python doesn't have this restriction, and I'm wondering what's the best practice for organizing classes.
A Python file is called a "module" and it's one way to organize your software so that it makes "sense". Another is a directory, called a "package".
A module is a distinct thing that may have one or two dozen closely-related classes. The trick is that a module is something you'll import, and you need that import to be perfectly sensible to people who will read, maintain and extend your software.
The rule is this: a module is the unit of reuse.
You can't easily reuse a single class. You should be able to reuse a module without any difficulties. Everything in your library (and everything you download and add) is either a module or a package of modules.
For example, you're working on something that reads spreadsheets, does some calculations and loads the results into a database. What do you want your main program to look like?
from ssReader import Reader
from theCalcs import ACalc, AnotherCalc
from theDB import Loader
def main( sourceFileName ):
rdr= Reader( sourceFileName )
c1= ACalc( options )
c2= AnotherCalc( options )
ldr= Loader( parameters )
for myObj in rdr.readAll():
c1.thisOp( myObj )
c2.thatOp( myObj )
ldr.laod( myObj )
Think of the import as the way to organize your code in concepts or chunks. Exactly how many classes are in each import doesn't matter. What matters is the overall organization that you're portraying with your import statements.
Since there is no artificial limit, it really depends on what's comprehensible. If you have a bunch of fairly short, simple classes that are logically grouped together, toss in a bunch of 'em. If you have big, complex classes or classes that don't make sense as a group, go one file per class. Or pick something in between. Refactor as things change.
I happen to like the Java model for the following reason. Placing each class in an individual file promotes reuse by making classes easier to see when browsing the source code. If you have a bunch of classes grouped into a single file, it may not be obvious to other developers that there are classes there that can be reused simply by browsing the project's directory structure. Thus, if you think that your class can possibly be reused, I would put it in its own file.
It entirely depends on how big the project is, how long the classes are, if they will be used from other files and so on.
For example I quite often use a series of classes for data-abstraction - so I may have 4 or 5 classes that may only be 1 line long (class SomeData: pass).
It would be stupid to split each of these into separate files - but since they may be used from different files, putting all these in a separate data_model.py file would make sense, so I can do from mypackage.data_model import SomeData, SomeSubData
If you have a class with lots of code in it, maybe with some functions only it uses, it would be a good idea to split this class and the helper-functions into a separate file.
You should structure them so you do from mypackage.database.schema import MyModel, not from mypackage.email.errors import MyDatabaseModel - if where you are importing things from make sense, and the files aren't tens of thousands of lines long, you have organised it correctly.
The Python Modules documentation has some useful information on organising packages.
I find myself splitting things up when I get annoyed with the bigness of files and when the desirable structure of relatedness starts to emerge naturally. Often these two stages seem to coincide.
It can be very annoying if you split things up too early, because you start to realise that a totally different ordering of structure is required.
On the other hand, when any .java or .py file is getting to more than about 700 lines I start to get annoyed constantly trying to remember where "that particular bit" is.
With Python/Jython circular dependency of import statements also seems to play a role: if you try to split too many cooperating basic building blocks into separate files this "restriction"/"imperfection" of the language seems to force you to group things, perhaps in rather a sensible way.
As to splitting into packages, I don't really know, but I'd say probably the same rule of annoyance and emergence of happy structure works at all levels of modularity.
I would say to put as many classes as can be logically grouped in that file without making it too big and complex.

Abbreviating several functions - guidelines? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
This is probably a really dumb question but I will ask anyway.
There are two ways to present this code:
file = "picture.jpg"
pic = makePicture(file)
show(pic)
or
show(makePicture("picture.jpg"))
This is just an example of how it can be abbreviated and I have seen it with other functions. But it always confuses me when I need to do it. Is there any rule of thumb when combining functions like this? It seems to me you work backwards picking out the functions as you go and ending with either the file or the function that chooses the file (i.e pickAFile()). Does this sound right?
Please keep explanations simple enough that a smart dog could understand.
Chiming in, because I think that style does matter. I would definitely pick show(makePicture("picture.jpg")) if you don't ever reuse "picture.jpg" and makePicture(…). The reason are that:
This is perfectly legible.
This makes the code faster to read (no need to spend more time than needed on it).
If you use variables, you are sending a signal to people reading the code (including you, after some time) that the variables are reused somewhere in the code and that they should better be put in their working (short-term) memory. Our short-term memory is limited (in the 1960s, experiments have shown that one remembers about 7 pieces of information at a time, and some modern experiments came up with lower numbers). So, if the variables are not reused anywhere, they often should be removed so as to not pollute the reader's short-term memory.
I think that your question is very valid and that you should definitely not use intermediate variables here unless they are necessary (because they are reused, or because they help break a complex expression in directly intelligible parts). This practice will make your code more legible and will give you good habits.
PS: As noted by Blender, having many nested function calls can make the code hard to read. If this is the case, I do recommend considering using intermediate variables to hold pieces of information that make sense, so that the function calls do not contain too many levels of nesting.
PPS: As noted by pcurry, nested function calls can also be easily broken down into many lines, if they become too long, which can make the code about as legible as if using intermediate variables, with the benefit of not using any:
print_summary(
energy=solar_panel.energy_produced(time_of_the_day),
losses=solar_panel.loss_ratio(),
output_path="/tmp/out.txt"
)
When you write:
pic = makePicture(file)
You call makePicture with file as its argument and put the output of that function into the variable pic. If all you do with pic is use it as an argument to show, you don't really need to use pic at all. It's just a temporary variable. Your second example does just that and passes the output of makePicture(file) directly as the first argument to show, without using a temporary variable like pic.
Now, if you're using pic somewhere else, there's really no way to get around using it. If you don't reuse the temporary variables, pick whatever way you like. Just make sure it's readable.
It's all at the discretion of the programmer, if you're planning on making a larger program you might want to keep the statements separate so you can refer back to the file.
Readability is always important if you're working with a team of programmers but if this is just something you're doing by yourself, do whatever's most comfortable.
show(makePicture("picture.jpg")) is more readable than the longer version for reasons discussed in other answers. I have also found that trying to eliminate intermediate variables very often results in a better solution. However, there are cases where a descriptive naming of a complex intermediate result will make code more readable.

How to reverse engineer a program which has no documentation [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I have a source of python program which doesn't have any documentation or comments.
I did tried twice to understand it but most of the times I am losing my track, because there are many files.
What should be the steps to understand that program fully and quickly.
Michael Feathers' "Working Effectively with Legacy Code" is a superb starting point for such endeavors -- not particularly language-dependent (his examples are in several non-python languages, but the techniques and mindset DO extend pretty well to Python and just about any other language).
The key focus is, that you want to understand the code for a reason -- modifying it and/or porting it. So, instrumenting the legacy code -- with batteries and scaffolding of tests and tracing/logging -- is the crucial path on the long, hard slog to understanding and modifying safely and responsibly.
Feathers suggests heuristics and techniques for where to focus your efforts and how to get started when the code is a total mess (hence "legacy") - no docs, or misleading docs (describing something quite different, maybe in subtle ways, from what the code actually DOES), no tests, an untestable-without-refactoring tangle of spaghetti dependencies. This may seem an extreme case but anybody who's spent a long-ish career in programming knows it's actually more common than anyone would like;-).
In past I have used 'Python call graph' to understand the source structure
Use a debugger e.g. pdb to wak thru
the code.
Try to read code again after one day
break, that also helps
I would recommend to generate some documentation with epydoc http://epydoc.sourceforge.net/ . For sure, if no docstring exists, the result will be poor but it will give you at least one view of your application and you'lle be able to navigate in the classes more easily.
Then you can try to document by yourself when you understand something new and then regenerate the docs again. It is never too late to start something.
I hope it helps
You are lucky it's in Python which is easy to read. But it is of course possible to write tricky hard to understand code in Python as well.
The steps are:
Run the software and learn to use it, and understand it's features at least a little bit.
Read though the tests, if any.
Read through the code.
When you encounter code you don't understand, put a debug break there, and step through the code, looking at what it does.
If there aren't any tests, or the test coverage is low, write tests to increase the test coverage. It's a good way to learn the system.
Repeat until you feel you have a vague grip on the code. A vague grip is all you need if you are going to manage the code. You'll get a good grip once you start actually working with the code. For a big system that can take years, so don't try to understand it all first.
There are tools that can help you. As Stephen C says, an IDE is a good idea. I'll explain why:
Many editors analyses the code. This typically gives you code completion, but more importantly in this case, it makes it possible to just just ctrl-click on a variable to see where it comes from. This really speeds things up when you want to understand otehr peoples code.
Also, you need to learn a debugger. You will, in tricky parts of the code, have to step through them in a debugger to see what the code actually do. Pythons pdb works, but many IDE's have integrated debuggers, which make debugging easier.
That's it. Good luck.
I have had to do a lot of this in my job. What works for me may be different to what works for you, but I'll share my experience.
I start by trying to identify the data structures being used and draw diagrams showing the relationships between them. Not necessarily something formal like UML, but a sketch on paper which you understand which allows you to see the overall structure of the data being manipulated by the program. Only once I have some view of the data structures being used do I start to try to understand how the data is being manipulated.
Secondly, for a large body of software, sometimes you need to just attack bite sized pieces at first. You won't get an overall understanding straight away, but if you understand small parts in detail and keep chipping away, eventually all the pieces fall together.
I combine these two approaches, switching between them when I am getting overly frustrated or bored. Regular walks around the block are recommended :) I find this gets me good results in the end.
Good luck!
pyreverse from Logilab and PyNSource from Andy Bulka are helpful too for UML diagram generation.
I'd start with a good python IDE. See the answers for this question.
Enterprise Architect by Sparx Systems is very good at processing a source directory and generating class diagrams. It is not free, but very reasonably priced for what you get. (I am not associated with this company in any way, I've just been a satisfied user of their product for several years.)

Learning to write reusable libraries [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
We need to write simple scripts to manipulate the configuration of our load balancers (ie, drain nodes from pools, enabled or disable traffic rules). The load balancers have a SOAP API (defined through a bunch of WSDL files) which is very comprehensive but using it is quite low-level with a lot of manual error checking and list manipulation. It doesn't tend to produce reusable, robust code.
I'd like to write a Python library to handle the nitty-gritty of interacting with the SOAP interface but I don't really know where to start; all of my coding experience is with writing one-off monolithic programs for specific jobs. This is fine for small jobs but it's not helping me or my coworkers -- we're reinventing the wheel with a different number of spokes each time :~)
The API already provides methods like getPoolNames() and getDrainingNodes() but they're a bit awkward to use. Most take a list of nodes and return another list, so (say) working out which virtual servers are enabled involves this sort of thing:
names = conn.getVirtualServerNames()
enabled = conn.getEnabled(names)
for i in range(0, len(names)):
if (enabled[i]):
print names[i]
conn.setEnabled(['www.example.com'], [0])
Whereas something like this:
lb = LoadBalancer('hostname')
for name in [vs.name for vs in lb.virtualServers() if vs.isEnabled()]:
print name
www = lb.virtualServer('www.example.com').disable()
is more Pythonic and (IMHO) easier.
There are a lot of things I'm not sure about: how to handle errors, how to deal with 20-odd WSDL files (a SOAPpy/suds instance for each?) and how much boilerplate translation from the API methods to my methods I'll need to do.
This is more an example of a wider problem (how to learn to write libraries instead of one-off scripts) so I don't want answers to these specific questions -- they're there to demonstrate my thinking and illustrate my problem. I recognise a code smell in the way I do things at the moment (one-off, non-reusable code) but I don't know how to fix it. How does one get into the mindset for tackling problems at a more abstract level? How do you 'learn' software design?
"I don't really know where to start"
Clearly false. You provided an excellent example. Just do more of that. It's that simple.
"There are a lot of things I'm not sure about: how to handle errors, how to deal with 20-odd WSDL files (a SOAPpy/suds instance for each?) and how much boilerplate translation from the API methods to my methods I'll need to do."
Handle errors by raising an exception. That's enough. Remember, you're still going to have high-level scripts using your API library.
20-odd WSDL files? Just pick something for now. Don't overengineer this. Design the API -- as you did with your example -- for the things you want to do. The WSDL's and the number of instances will become clear as you go. One, Ten, Twenty doesn't really matter to users of your API library. It only matters to you, the maintainer. Focus on the users.
Boilerplate translation? As little as possible. Focus on what parts of these interfaces you use with your actual scripts. Translate just what you need and nothing more.
An API is not fixed, cast in concrete, a thing of beauty and a joy forever. It's just a module (in your case a package might be better) that does some useful stuff.
It will undergo constant change and evolution.
Don't overengineer the first release. Build something useful that works for one use case. Then add use cases to it.
"But what if I realize I did something wrong?" That's inevitable, you'll always reach this point. Don't worry about it now.
The most important thing about writing an API library is writing the unit tests that (a) demonstrate how it works and (b) prove that it actually works.
There's an excellent presentation by Joshua Bloch on API design (and thus leading to library design). It's well worth watching. IIRC it's Java-focused, but the principles will apply to any language.
If you are not afraid of C++, there is an excellent book on the subject called "Large-scale C++ Software Design".
This book will guide you through the steps of designing a library by introducing "physical" and "logical" design.
For instance, you'll learn to flatten your components' hierarchy, to restrict dependency between components, to create levels of abstraction.
The is really "the" book on software design IMHO.

Categories

Resources