How to reverse engineer a program which has no documentation [closed]

How to reverse engineer a program which has no documentation [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I have a source of python program which doesn't have any documentation or comments.
I did tried twice to understand it but most of the times I am losing my track, because there are many files.
What should be the steps to understand that program fully and quickly.

Michael Feathers' "Working Effectively with Legacy Code" is a superb starting point for such endeavors -- not particularly language-dependent (his examples are in several non-python languages, but the techniques and mindset DO extend pretty well to Python and just about any other language).
The key focus is, that you want to understand the code for a reason -- modifying it and/or porting it. So, instrumenting the legacy code -- with batteries and scaffolding of tests and tracing/logging -- is the crucial path on the long, hard slog to understanding and modifying safely and responsibly.
Feathers suggests heuristics and techniques for where to focus your efforts and how to get started when the code is a total mess (hence "legacy") - no docs, or misleading docs (describing something quite different, maybe in subtle ways, from what the code actually DOES), no tests, an untestable-without-refactoring tangle of spaghetti dependencies. This may seem an extreme case but anybody who's spent a long-ish career in programming knows it's actually more common than anyone would like;-).

In past I have used 'Python call graph' to understand the source structure
Use a debugger e.g. pdb to wak thru
the code.
Try to read code again after one day
break, that also helps

I would recommend to generate some documentation with epydoc http://epydoc.sourceforge.net/ . For sure, if no docstring exists, the result will be poor but it will give you at least one view of your application and you'lle be able to navigate in the classes more easily.
Then you can try to document by yourself when you understand something new and then regenerate the docs again. It is never too late to start something.
I hope it helps

You are lucky it's in Python which is easy to read. But it is of course possible to write tricky hard to understand code in Python as well.
The steps are:
Run the software and learn to use it, and understand it's features at least a little bit.
Read though the tests, if any.
Read through the code.
When you encounter code you don't understand, put a debug break there, and step through the code, looking at what it does.
If there aren't any tests, or the test coverage is low, write tests to increase the test coverage. It's a good way to learn the system.
Repeat until you feel you have a vague grip on the code. A vague grip is all you need if you are going to manage the code. You'll get a good grip once you start actually working with the code. For a big system that can take years, so don't try to understand it all first.
There are tools that can help you. As Stephen C says, an IDE is a good idea. I'll explain why:
Many editors analyses the code. This typically gives you code completion, but more importantly in this case, it makes it possible to just just ctrl-click on a variable to see where it comes from. This really speeds things up when you want to understand otehr peoples code.
Also, you need to learn a debugger. You will, in tricky parts of the code, have to step through them in a debugger to see what the code actually do. Pythons pdb works, but many IDE's have integrated debuggers, which make debugging easier.
That's it. Good luck.

I have had to do a lot of this in my job. What works for me may be different to what works for you, but I'll share my experience.
I start by trying to identify the data structures being used and draw diagrams showing the relationships between them. Not necessarily something formal like UML, but a sketch on paper which you understand which allows you to see the overall structure of the data being manipulated by the program. Only once I have some view of the data structures being used do I start to try to understand how the data is being manipulated.
Secondly, for a large body of software, sometimes you need to just attack bite sized pieces at first. You won't get an overall understanding straight away, but if you understand small parts in detail and keep chipping away, eventually all the pieces fall together.
I combine these two approaches, switching between them when I am getting overly frustrated or bored. Regular walks around the block are recommended :) I find this gets me good results in the end.
Good luck!

pyreverse from Logilab and PyNSource from Andy Bulka are helpful too for UML diagram generation.

I'd start with a good python IDE. See the answers for this question.

Enterprise Architect by Sparx Systems is very good at processing a source directory and generating class diagrams. It is not free, but very reasonably priced for what you get. (I am not associated with this company in any way, I've just been a satisfied user of their product for several years.)

Related

what is the best way to read a python function? down-up, right-left? or vice versa [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
edit: bad wording on original title "are most programing langs like python from bottom up and right to left"
question - are a lot of programming languages like python that you want to look at your statement from bottom up and right to left?
simple question i quess, yet i'm learning how to program in python and know (intermediate) how to make shell scripts in bash.
when i make bash scripts i start from the top and work my way down, yet with python it seems like i have to think about what i want and then backward engineer how to make the program run.
so does anyone have any ideas on to how to make this simpler for me to understand?
(when i do find something that works for the answer that i'm looking for in python that works, i always put a '''comment''' at the end to try to remember to look from bottom up and right to left, in that statement anyway).
this it seems (comments) are very helpful for me to understand more about what i'm trying to accomplish (especially when it comes to recursion), yet does anyone know of examples, discussions, (short cuts to understand), the backwardness of how to program a program?
one more question,
are programs just a lot of scripts put together in the proper sequences?
edit: thank you for your responses they are helpful.
moderators, please let others reply for a little bit longer, and then you can close it.
to those that have replied, i'm taking the MIT Introduction to Computer Science and Programming using Python to try to actually learn how to program with python and get a better understanding of programming in general.
to somewhat understand what i mean by what i stated above see a bash script that i wrote quite a while back ->
https://drive.google.com/open?id=13WtPvaabM9__hUWNOWzPOai9n34LeiWa
and me trying to turn it into a python program. everything that is in it i just looked up on the web, read the docs, and asked some questions here, before taking the MIT class. python attempt ->
https://drive.google.com/open?id=10NesR4FONR8k1vegJjwKdHloC5r0JrWm

You can write your program in python without using functions.
On the other hand, you can write shell scripts using functions.
So if you 'find something that works' and that code is organized in functions, then it was the choice of the developer to organize that code in functions.
Shell scripts tend to be short and are often used for one specific purpose.
In that case many people just write the code that is necessary without thinking to much about readability or code structure.
For longer scripts or programs people tend to switch to languages like python. At that point code layout and readability get more important and now many people structure the code into functions, because they made the experience that their code is easier to maintain and understand that way.
This is maybe the reason, the shell script examples you find most are 'top down left right', and the python examples are structured in functions and classes

are a lot of programming languages like python that you want to look at your statement from bottom up and right to left?
In many languages and frameworks understanding the execution flow can be pretty complicated! Your code may be grouped and called in many unexpected ways.
Programming languages also allow you to run code in parallel, so that you can have two pieces of code running at the same time. This is done using threads or processes or asynchronous programming.
As you have probably already seen, in Python there are functions, classes and modules. Those are used to "group" code so that it is easy to re-use, and also to define abstract concepts that make programming easy. Those structures make code flow more complicated (because you "jump" from a place to another), yet they are very powerful tools.
when i make bash scripts i start from the top and work my way down, yet with python it seems like i have to think about what i want and then backward engineer how to make the program run.
Bash can get complicated too, in a way similar to Python. Look at the script to install Docker: this is an example of a shell script that is not "top to bottom".
And, similarly, you can write simple Python scripts that are executed from top to bottom too. This is not true for all languages however (e.g. C or Java, where you must use functions).
so does anyone have any ideas on to how to make this simpler for me to understand?
Experience is the key. Even old programmers have to spend quite some time to learn and understand new languages and frameworks they're not familiar with.
Writing comments and documentation (like you already do) can be helpful too: you can document how and when your functions should be called, you can document what functions are going to be called. Example:
def frobnicate():
"""This function transforms foo into baz"""
# insert magic here
def main():
"""This is the program ingress point. It calls frobnicate() three times."""
for i in range(3):
frobnicate()
if __name__ == '__main__':
# Call the ingress point, defined above
main()
With free/open source software, you also have the advantage that you can look at the source. So, for example, if you want to know how a function frobnicate() gets called, you can look at the source for the keyword "frobnicate" and find your answer. It's not always straightforward, but it's a useful approach.

Good practices in writing MATLAB code? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I would like to know the basic principles and etiquette of writing a well structured code.

Read Code Complete, it will do wonders for everything. It'll show you where, how, and when things matter. It's pretty much the Bible of software development (IMHO.)

These are the most important two things to keep in mind when you are writing code:
Don't write code that you've already written.
Don't write code that you don't need to write.

MATLAB Programming Style Guidelines by Richard Johnson is a good resource.

Well, if you want it in layman's terms:
I reccomend people to write the shortest readable program that works.
There are a lot more rules about how to format code, name variables, design classes, separate responsibilities. But you should not forget that all of those rules are only there to make sure that your code is easy to check for errors, and to ensure it is maintainable by someone else than the original author. If keep the above reccomendation in mind, your progam will be just that.

This list could go on for a long time but some major things are:
Indent.
Descriptive variable names.
Descriptive class / function names.
Don't duplicate code. If it needs duplication put in a class / function.
Use gettors / settors.
Only expose what's necessary in your objects.
Single dependency principle.
Learn how to write good comments, not lots of comments.
Take pride in your code!
Two good places to start:
Clean-Code Handbook
Code-Complete

If you want something to use as a reference or etiquette, I often follow the official Google style conventions for whatever language I'm working in, such as for C++ or for Python.
The Practice of Programming by Rob Pike and Brian W. Kernighan also has a section on style that I found helpful.

First of all, "codes" is not the right word to use. A code is a representation of another thing, usually numeric. The correct words are "source code", and the plural of source code is source code.
--
Writing good source code:
Comment your code.
Use variable names longer than several letters. Between 5 and 20 is a good rule of thumb.
Shorter lines of code is not better - use whitespace.
Being "clever" with your code is a good way to confuse yourself or another person later on.
Decompose the problem into its components and use hierarchical design to assemble the solution.
Remember that you will need to change your program later on.
Comment your code.
There are many fads in computer programming. Their proponents consider those who are not following the fad unenlightened and not very with-it. The current major fads seem to be "Test Driven Development" and "Agile". The fad in the 1990s was 'Object Oriented Programming'. Learn the useful core parts of the ideas that come around, but don't be dogmatic and remember that the best program is one that is getting the job done that it needs to do.
very trivial example of over-condensed code off the top of my head
for(int i=0,j=i; i<10 && j!=100;i++){
if i==j return i*j;
else j*=2;
}}
while this is more readable:
int j = 0;
for(int i = 0; i < 10; i++)
{
if i == j
{
return i * j;
}
else
{
j *= 2;
if(j == 100)
{
break;
}
}
}
The second example has the logic for exiting the loop clearly visible; the first example has the logic entangled with the control flow. Note that these two programs do exactly the same thing. My programming style takes up a lot of lines of code, but I have never once encountered a complaint about it being hard to understand stylistically, while I find the more condensed approaches frustrating.
An experienced programmer can and will read both - the above may make them pause for a moment and consider what is happening. Forcing the reader to sit down and stare at the code is not a good idea. Code needs to be obvious. Each problem has an intrinsic complexity to expressing its solution. Code should not be more complex than the solution complexity, if at all possible.
That is the essence of what the other poster tried to convey - don't make the program longer than need be. Longer has two meanings: more lines of code (ie, putting braces on their own line), and more complex. Making a program more complex than need be is not good. Making it more readable is good.

Have a look to
97 Things Every Programmer Should Know.
It's free and contains a lot of gems like this one:
There is one quote that I think is
particularly good for all software
developers to know and keep close to
their hearts:
Beauty of style and harmony and grace
and good rhythm depends on simplicity.
— Plato
In one sentence I think this sums up
the values that we as software
developers should aspire to.
There are a number of things we strive
for in our code:
Readability
Maintainability
Speed of development
The elusive quality of beauty
Plato is telling us that the enabling
factor for all of these qualities is
simplicity.

The Python Style Guide is always a good starting point!

European Standards For Writing and Documenting Exchangeable Fortran 90 Code have been in my bookmarks, like forever. Also, there was a thread in here, since you are interested in MATLAB, on organising MATLAB code.

Personally, I've found that I learned more about programming style from working through SICP which is the MIT Intro to Comp SCI text (I'm about a quarter of the way through.) Than any other book. That being said, If you're going to be working in Python, the Google style guide is an excellent place to start.
I read somewhere that most programs (scripts anyways) should never be more than a couple of lines long. All the requisite functionality should be abstracted into functions or classes. I tend to agree.

Many good points have been made above. I definitely second all of the above. I would also like to add that spelling and consistency in coding be something you practice (and also in real life).
I've worked with some offshore teams and though their English is pretty good, their spelling errors caused a lot of confusion. So for instance, if you need to look for some function (e.g., getFeedsFromDatabase) and they spell database wrong or something else, that can be a big or small headache, depending on how many dependencies you have on that particular function. The fact that it gets repeated over and over within the code will first off, drive you nuts, and second, make it difficult to parse.
Also, keep up with consistency in terms of naming variables and functions. There are many protocols to go by but as long as you're consistent in what you do, others you work with will be able to better read your code and be thankful for it.

Pretty much everything said here, and something more. In my opinion the best site concerning what you're looking for (especially the zen of python parts are fun and true)
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
Talks about both PEP-20 and PEP-8, some easter eggs (fun stuff), etc...

You can have a look at the Stanford online course: Programming Methodology CS106A. The instructor has given several really good instruction for writing source code.
Some of them are as following:
write programs for people to read, not just for computers to read. Both of them need to be able to read it, but it's far more important that a person reads it and understands it, and that the computer still executes it correctly. But that's the first major software engineering principle to
think about.
How to make comments:
 put in comments to clarify things in the program, which are not obvious
How to make decomposition
One method solves one problem
Each method has code approximate 1~15lines
Give methods good names
Write comment for code

Unit Tests
Python and matlab are dynamic languages. As your code base grows, you will be forced to refactor your code. In contrast to statically typed languages, the compiler will not detect 'broken' parts in your project. Using unit test frameworks like xUnit not only compensate missing compiler checks, they allow refactoring with continuous verification for all parts of your project.
Source Control
Track your source code with a version control system like svn, git or any other derivative. You'll be able to back and forth in your code history, making branches or creating tags for deployed/released versions.
Bug Tracking
Use a bug tracking system, if possible connected with your source control system, in order to stay on top of your issues. You may not be able, or forced, to fix issues right away.
Reduce Entropy
While integrating new features in your existing code base, you will add more lines of code, and potentially more complexity. This will increase entropy. Try to keep your design clean, by introducing an interface, or inheritance hierarchy in order to reduce entropy again. Not paying attention to code entropy will render your code unmaintainable over time.
All of The Above Mentioned
Pure coding related topics, like using a style guide, not duplicating code, ...,
has already been mentioned.

A small addition to the wonderful answers already here regarding Matlab:
Avoid long scripts, instead write functions (sub routines) in separate files. This will make the code more readable and easier to optimize.
Use Matlab's built-in functions capabilities. That is, learn about the many many functions that Matlab offers instead of reinventing the wheel.
Use code sectioning, and whatever the other code structure the newest Matlab version offers.
Learn how to benchmark your code using timeit and profile . You'll discover that sometimes for loops are the better solution.

The best advice I got when I asked this question was as follows:
Never code while drunk.

Make it readable, make it intuitive, make it understandable, and make it commented.

Python progression path - From apprentice to guru

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I've been learning, working, and playing with Python for a year and a half now. As a biologist slowly making the turn to bio-informatics, this language has been at the very core of all the major contributions I have made in the lab. I more or less fell in love with the way Python permits me to express beautiful solutions and also with the semantics of the language that allows such a natural flow from thoughts to workable code.
What I would like to know is your answer to a kind of question I have seldom seen in this or other forums. This question seems central to me for anyone on the path to Python improvement but who wonders what his next steps should be.
Let me sum up what I do NOT want to ask first ;)
I don't want to know how to QUICKLY learn Python
Nor do I want to find out the best way to get acquainted with the language
Finally, I don't want to know a 'one trick that does it all' approach.
What I do want to know your opinion about, is:
What are the steps YOU would recommend to a Python journeyman, from apprenticeship to guru status (feel free to stop wherever your expertise dictates it), in order that one IMPROVES CONSTANTLY, becoming a better and better Python coder, one step at a time. Some of the people on SO almost seem worthy of worship for their Python prowess, please enlighten us :)
The kind of answers I would enjoy (but feel free to surprise the readership :P ), is formatted more or less like this:
Read this (eg: python tutorial), pay attention to that kind of details
Code for so manytime/problems/lines of code
Then, read this (eg: this or that book), but this time, pay attention to this
Tackle a few real-life problems
Then, proceed to reading Y.
Be sure to grasp these concepts
Code for X time
Come back to such and such basics or move further to...
(you get the point :)
I really care about knowing your opinion on what exactly one should pay attention to, at various stages, in order to progress CONSTANTLY (with due efforts, of course). If you come from a specific field of expertise, discuss the path you see as appropriate in this field.
EDIT: Thanks to your great input, I'm back on the Python improvement track! I really appreciate!

I thought the process of Python mastery went something like:
Discover list comprehensions
Discover generators
Incorporate map, reduce, filter, iter, range, xrange often into your code
Discover Decorators
Write recursive functions, a lot
Discover itertools and functools
Read Real World Haskell (read free online)
Rewrite all your old Python code with tons of higher order functions, recursion, and whatnot.
Annoy your cubicle mates every time they present you with a Python class. Claim it could be "better" implemented as a dictionary plus some functions. Embrace functional programming.
Rediscover the Strategy pattern and then all those things from imperative code you tried so hard to forget after Haskell.
Find a balance.

One good way to further your Python knowledge is to dig into the source code of the libraries, platforms, and frameworks you use already.
For example if you're building a site on Django, many questions that might stump you can be answered by looking at how Django implements the feature in question.
This way you'll continue to pick up new idioms, coding styles, and Python tricks. (Some will be good and some will be bad.)
And when you see something Pythony that you don't understand in the source, hop over to the #python IRC channel and you'll find plenty of "language lawyers" happy to explain.
An accumulation of these little clarifications over years leads to a much deeper understanding of the language and all of its ins and outs.

Understand (more deeply) Python's data types and their roles with regards to memory mgmt
As some of you in the community are aware, I teach Python courses, the most popular ones being the comprehensive Intro+Intermediate course as well as an "advanced" course which introduces a variety of areas of application development.
Quite often, I get asked a question quite similar to, "Should I take your intro or advanced course? I've already been programming Python for 1-2 years, and I think the intro one is too simple for me so I'd like to jump straight to the advanced... which course would you recommend?"
To answer their question, I probe to see how strong they are in this area -- not that it's really the best way to measure whether they're ready for any advanced course, but to see how well their basic knowledge is of Python's objects and memory model, which is a cause of many Python bugs written by those who are not only beginners but those who have gone beyond that.
To do this, I point them at this simple 2-part quiz question:
Many times, they are able to get the output, but the why is more difficult and much more important of an response... I would weigh the output as 20% of the answer while the "why" gets 80% credit. If they can't get the why, regardless how Python experience they have, I will always steer people to the comprehensive intro+intermediate course because I spend one lecture on objects and memory management to the point where you should be able to answer with the output and the why with sufficient confidence. (Just because you know Python's syntax after 1-2 years doesn't make you ready to move beyond a "beginner" label until you have a much better understanding as far as how Python works under the covers.)
A succeeding inquiry requiring a similar answer is even tougher, e.g.,
Example 3
x = ['foo', [1,2,3], 10.4]
y = list(x) # or x[:]
y[0] = 'fooooooo'
y[1][0] = 4
print x
print y
The next topics I recommend are to understanding reference counting well, learning what "interning" means (but not necessarily using it), learning about shallow and deep copies (as in Example 3 above), and finally, the interrelationships between the various types and constructs in the language, i.e. lists vs. tuples, dicts vs. sets, list comprehensions vs. generator expressions, iterators vs. generators, etc.; however all those other suggestions are another post for another time. Hope this helps in the meantime! :-)
ps. I agree with the other responses for getting more intimate with introspection as well as studying other projects' source code and add a strong "+1" to both suggestions!
pps. Great question BTW. I wish I was smart enough in the beginning to have asked something like this, but that was a long time ago, and now I'm trying to help others with my many years of full-time Python programming!!

Check out Peter Norvig's essay on becoming a master programmer in 10 years: http://norvig.com/21-days.html. I'd wager it holds true for any language.

Understand Introspection
write a dir() equivalent
write a type() equivalent
figure out how to "monkey-patch"
use the dis module to see how various language constructs work
Doing these things will
give you some good theoretical knowledge about how python is implemented
give you some good practical experience in lower-level programming
give you a good intuitive feel for python data structures

def apprentice():
read(diveintopython)
experiment(interpreter)
read(python_tutorial)
experiment(interpreter, modules/files)
watch(pycon)
def master():
refer(python-essential-reference)
refer(PEPs/language reference)
experiment()
read(good_python_code) # Eg. twisted, other libraries
write(basic_library) # reinvent wheel and compare to existing wheels
if have_interesting_ideas:
give_talk(pycon)
def guru():
pass # Not qualified to comment. Fix the GIL perhaps?

I'll give you the simplest and most effective piece of advice I think anybody could give you: code.
You can only be better at using a language (which implies understanding it) by coding. You have to actively enjoy coding, be inspired, ask questions, and find answers by yourself.
Got a an hour to spare? Write code that will reverse a string, and find out the most optimum solution. A free evening? Why not try some web-scraping. Read other peoples code. See how they do things. Ask yourself what you would do.
When I'm bored at my computer, I open my IDE and code-storm. I jot down ideas that sound interesting, and challenging. An URL shortener? Sure, I can do that. Oh, I learnt how to convert numbers from one base to another as a side effect!
This is valid whatever your skill level. You never stop learning. By actively coding in your spare time you will, with little additional effort, come to understand the language, and ultimately, become a guru. You will build up knowledge and reusable code and memorise idioms.

If you're in and using python for science (which it seems you are) part of that will be learning and understanding scientific libraries, for me these would be
numpy
scipy
matplotlib
mayavi/mlab
chaco
Cython
knowing how to use the right libraries and vectorize your code is essential for scientific computing.
I wanted to add that, handling large numeric datasets in common pythonic ways(object oriented approaches, lists, iterators) can be extremely inefficient. In scientific computing, it can be necessary to structure your code in ways that differ drastically from how most conventional python coders approach data.

Google just recently released an online Python class ("class" as in "a course of study").
http://code.google.com/edu/languages/google-python-class/
I know this doesn't answer your full question, but I think it's a great place to start!

Download Twisted and look at the source code. They employ some pretty advanced techniques.

Thoroughly Understand All Data Types and Structures
For every type and structure, write a series of demo programs that exercise every aspect of the type or data structure. If you do this, it might be worthwhile to blog notes on each one... it might be useful to lots of people!

I learned python first by myself over a summer just by doing the tutorial on the python site (sadly, I don't seem to be able to find that anymore, so I can't post a link).
Later, python was taught to me in one of my first year courses at university. In the summer that followed, I practiced with PythonChallenge and with problems from Google Code Jam.
Solving these problems help from an algorithmic perspective as well as from the perspective of learning what Python can do as well as how to manipulate it to get the fullest out of python.
For similar reasons, I have heard that code golf works as well, but i have never tried it for myself.

Learning algorithms/maths/file IO/Pythonic optimisation
This won't get you guru-hood but to start out, try working through the Project Euler problems
The first 50 or so shouldn't tax you if you have decent high-school mathematics and know how to Google. When you solve one you get into the forum where you can look through other people's solutions which will teach you even more. Be decent though and don't post up your solutions as the idea is to encourage people to work it out for themselves.
Forcing yourself to work in Python will be unforgiving if you use brute-force algorithms.
This will teach you how to lay out large datasets in memory and access them efficiently with the fast language features such as dictionaries.
From doing this myself I learnt:
File IO
Algorithms and techniques such as Dynamic Programming
Python data layout
Dictionaries/hashmaps
Lists
Tuples
Various combinations thereof, e.g. dictionaries to lists of tuples
Generators
Recursive functions
Developing Python libraries
Filesystem layout
Reloading them during an interpreter session
And also very importantly
When to give up and use C or C++!
All of this should be relevant to Bioinformatics
Admittedly I didn't learn about the OOP features of Python from that experience.

Have you seen the book "Bioinformatics Programming using Python"? Looks like you're an exact member of its focus group.

You already have a lot of reading material, but if you can handle more, I recommend you
learn about the evolution of python by reading the Python Enhancement Proposals, especially the "Finished" PEPs and the "Deferred, Abandoned, Withdrawn, and Rejected" PEPs.
By seeing how the language has changed, the decisions that were made and their rationales, you will absorb the philosophy of Python and understand how "idiomatic Python" comes about.
http://www.python.org/dev/peps/

Attempt http://challenge.greplin.com/ using Python

Teaching to someone else who is starting to learn Python is always a great way to get your ideas clear and sometimes, I usually get a lot of neat questions from students that have me to re-think conceptual things about Python.

Not precisely what you're asking for, but I think it's good advice.
Learn another language, doesn't matter too much which. Each language has it's own ideas and conventions that you can learn from. Learn about the differences in the languages and more importantly why they're different. Try a purely functional language like Haskell and see some of the benefits (and challenges) of functions free of side-effects. See how you can apply some of the things you learn from other languages to Python.

I recommend starting with something that forces you to explore the expressive power of the syntax. Python allows many different ways of writing the same functionality, but there is often a single most elegant and fastest approach. If you're used to the idioms of other languages, you might never otherwise find or accept these better ways. I spent a weekend trudging through the first 20 or so Project Euler problems and made a simple webapp with Django on Google App Engine. This will only take you from apprentice to novice, maybe, but you can then continue to making somewhat more advanced webapps and solve more advanced Project Euler problems. After a few months I went back and solved the first 20 PE problems from scratch in an hour instead of a weekend.

All code in one file [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
After asking organising my Python project and then calling from a parent file in Python it's occurring to me that it'll be so much easier to put all my code in one file (data will be read in externally).
I've always thought that this was bad project organisation but it seems to be the easiest way to deal with the problems I'm thinking I will face. Have I simply gotten the wrong end of the stick with file count or have I not seen some great guide on large (for me) projects?

If you are planning to use any kind of SCM then you are going to be screwed. Having one file is a guaranteed way to have lots of collisions and merges that will be painstaking to deal with over time.
Stick to conventions and break apart your files. If nothing more than to save the guy who will one day have to maintain your code...

If your code is going to work together all the time anyway, and isn't useful separately, there's nothing wrong with keeping everything in one file. I can think of at least popular package (BeautifulSoup) that does this. Sure makes installation easier.
Of course, if it seems, down the road, that you could use part of your code with another project, or if maintainance starts to be an issue, then worry about organizing your project differently.
It seems to me from the questions you've been asking lately that you're worrying about all of this a bit prematurely. Often, for me, these sorts of issues are better tackled a little later on in the solution. Especially for smaller projects, my goal is to get a solution that is correct, and then optimal.

It's always a now verses then argument. If you're under the gun to get it done, do it. Source control will be a problem later, as with many things there's no black and white answer. You need to be responsible to both your deadline and the long term maintenance of the code.

If that's the best way to organise it, you're probably doing something wrong.
If it's more than just a toy program or a simple script, then you should break it up into separate files, etc. It's the only sane way of doing it. When your project gets big enough that you need someone else helping on it, then it will make the SCM a whole bunch easier.
Additionally, sooner or later you are going to need to add a separate utility to your project, that is going to need some common code/structures. It's far easier to do this if you have separate source files than if you have just one big one.

Looking at your earlier questions I would say all code in one file would be a good intermediate state on the way to a complete refactoring of your project. To do this you'll need a regression test suite to make sure you don't break the project while refactoring it.
Once all your code is in one file, I suggest iterating on the following:
Identify a small group of interdependent classes.
Pull those classes into a separate file.
Add unit tests for the new separate file.
Retest the entire project.
Depending on the size of your project, it shouldn't take too many iterations for you to reach something reasonable.

Since Calling from a parent file in Python indicates serious design problems, I'd say that you have two choices.
Don't have a library module try to call back to main. You'll have to rewrite things to fix this.
[An imported component calling the main program is an improper dependency. And Python doesn't support it because it's a poor design.]
Put it all in one file until you figure out a better design with proper one-way dependencies. Then you'll have to rewrite it to fix the dependency problems.
A module (a single file) should be a logical piece of related code. Not everything. Not a single class definition. There's a middle ground of modularity.
Additionally, there should be a proper one-way dependency graph from main program to components (which do NOT depend on the main program) to utility libraries and what-not (that do not know about the components OR the main program.
Circular (or mutual) dependencies often indicate a design problem. Callbacks are one way out of the problem. Another way is to decompose the circular elements to get a proper one-way graph.

Beginner looking for beautiful and instructional Python code [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Improve this question
As a complete beginner with no programming experience, I am trying to find beautiful Python code to study and play with. Please answer by pointing to a website, a book or some software project.
I have the following criterias:
complete code listings (working, hackable code)
beautiful code (highly readable, simple but effective)
instructional for the beginner (yes, hand-holding is needed)
I've tried learning how to program for too long now, never gotten to the point where the rubber hits the road. My main agenda is best spelled out by Nat Friedman's "How to become a hacker".
I'm aware of O'Reilly's "Beautiful Code", but think of it as too advanced and confusing for a beginner.

Buy Programming Collective Intelligence. Great book of interesting AI algorithms based on mining data and all of the examples are in very easy to read Python.
The other great book is Text Processing in Python

Read the Python libraries themselves. They're working, hackable, elegant, and instructional. Some is simple, some is complex.
Best of all, you got it when you downloaded Python itself. It's in your Python library directory. Nothing more to do except start poking around.

Just do it.
Seriously, you're never going to learn to be a good programmer until you write some programs. First you'll write bad programs, then you'll fix them, then you'll write better ones, etc...
If you aren't insatiably motivated to try coding, then maybe it isn't for you. One way to get motivated is to get a job that requires you to code... for me, there's nothing like having my salary and pride on the line to get me working :)

The Python project itself maintains a nice list of beginner's guides.

Beautiful is so hard to define, there's no real answer to this question. Your best advice to follow what Nat says in the post you linked:
Download the source code to the program you want to change
Untar it on your hard drive
Get it to build and run
Open the source code in an editor
Find the part of the code that you need to change to make the program do what you want it to do
Make the changes you need to make to the code and test it to make sure it works
Run the diff -u command and email the output to the mailing list
There is no point looking for beautiful code. Just look at and fix bugs in projects that you use (Django & Twisted might be good candidates).

I've seen How to Think Like a Computer Scientist recommended in many blogs.

I personally think that reading good code won't work until you have a firm understanding of the language, especially of its idioms. First, I recommend the basic Wikibook "Non-Programmer's Tutorial for Python" to start out. If most of that makes sense, you have a good understanding of the basics already.
After that, I recommend Dive into Python. You'll see a lot of other people recommending this book, because it's comprehensive and free. You'll learn a lot of language specific idioms in Dive into Python, especially in the first few chapters. As you're reading it, try to do basic programs using the techniques Mark Pilgrim shows.
Dive into Python gets into specific modules later in the book. That will probably get a little boring, and when it does, you might want to look at code. I don't feel qualified to rank the code used by these, but Django and Deluge are both bigger projects that will show you the organization of large programs. Though they will probably be overwhelming unless you take the time to really attack them one piece at a time and get a firm understanding.

I've learned quite a bit of beautiful and useful Python from O'Reilly's Python Cookbook. http://oreilly.com/catalog/9780596001674/
I've also learned much from ActiveState's Python Recipe's web page. http://code.activestate.com/recipes/langs/python/

I'd recommend you review Exaile music player for linux. It includes a lot of practically useful things like plugins, lambda, decorators, settings manager, gui (using GTK+) and much more.
Exaile source code is not an ideal but will give you enough helpful information and basic Python coding concepts.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.