This question already has answers here:
Python Compilation/Interpretation Process
(2 answers)
Closed 6 years ago.
When I learned C/C++, I learned not only the syntax and semantics of the language constructs but also how the program was executed by the computer. I learned things like:
All local variables are declared in a stack frame, which is allocated each time the function is called. These frames are laid out on the call stack one right after the other, and they're 'unwound' when the function returns, thus quickly and efficiently 'destroying' the local variables for that function.
This, in turn, helped me figure out why it's a bad idea to take the address of a local variable and return it to the calling function. In other words, understanding C/C++'s memory model and the environment in which the code executes helps develop a deeper understanding of how to write correct programs.
Another example: malloc/new allocate objects on the 'heap' (not the stack), which explains why they live beyond the end of the function that allocated them. Furthermore, knowing that these functions/keywords return the memory address where the object is located helped me understand how things like linked lists work (and how to figure out whether I need another * or -> or not).
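(For comparison, here is a minimal sketch of the same linked-list idea in Python, where plain references take the place of * and ->; the Node class is purely illustrative.)

# A minimal sketch: a linked list in Python, where ordinary attribute
# references replace the * and -> pointer syntax of C/C++.
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node   # a reference to another Node, or None

# Objects created here outlive this function, because their lifetime is
# governed by reference counting / garbage collection, not by the call stack.
def build_list():
    return Node(1, Node(2, Node(3)))

node = build_list()
while node is not None:
    print(node.value)   # prints 1, 2, 3
    node = node.next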
So now I'm learning Python and I'm looking for a concise, clear, yet thorough explanation of how Python programs manage their memory and execution environment.
Searching online hasn't been particularly fruitful - I might be using the wrong search terms, but there appears to be very little out there in general.
I've looked through https://docs.python.org/ and found it to be an excellent resource for syntax and semantics ("if you type X, then Y will happen"), but it doesn't really describe what the computer is doing 'under the hood'.
I've found several posts here on Stack Overflow (such as this, this, and this), but these all seem to focus on specific situations.
Does anyone know of a resource that actually explains what Python is doing 'under the hood'?
EDIT: I'm getting feedback from Stack Overflow that this question may be a duplicate. The other question asks 'how does a .PY file get compiled (to bytecode) and then executed by the VM?' What I'm asking here is 'is there a page that explains how Python lays variables, objects, functions, etc. out in memory AND explains how that's all used to actually run Python programs?' (Sub-question: is this an appropriate way to address the concern about a duplicate question? Would it be better to put this in a comment?)
With Python, what you need to know about what's going on under the hood in order to work efficiently with the language is quite different from C/C++, since it's a very different language environment. What you want to get your head around is not so much the nitty-gritty of what's going on in memory, but Python's Data Model.
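A minimal sketch of what that model means in practice (plain Python, nothing implementation-specific): names are references to objects, and an object's lifetime is tied to references rather than to the stack frame that created it.

# Names are references to objects, not storage locations.
a = [1, 2, 3]
b = a               # b is a second name for the same list object
b.append(4)
print(a)            # [1, 2, 3, 4] -- both names see the mutation
print(a is b)       # True -- same object identity

# Function arguments work the same way: the callee gets a reference.
def extend(seq):
    seq.append(99)  # mutates the caller's object

extend(a)
print(a)            # [1, 2, 3, 4, 99]

# Unlike C/C++, returning a locally created object is always safe,
# because its lifetime is tied to references, not to the stack frame.
def make_list():
    local = ["still", "alive"]
    return local

print(make_list())  # ['still', 'alive']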
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
The first part is what I want to do, together with my questions. Before discussing why I want to do that and proposing counterarguments, please read the motivation in the second part.
In short: I am not a developer. My main use of Python is fast prototyping of mathematical methods; an additional motivation is learning how Python is implemented. This topic is not of crucial importance for me. If it seems lame and off-topic, feel free to remove it, and I apologize for the inconvenience.
This feature does not introduce new functionality but serves as a shortcut for lambda.
The idea is borrowed from wolfram-language.
If the closing parenthesis is preceded by &, then the code inside the parentheses is interpreted as the definition of a function, where `1, `2, ... play the role of its arguments.
Example: (`1 + `2 &)(a, b) means (lambda x, y: x + y)(a, b)
Provided that I learn everything needed about Python, how hard / time-consuming would it be to implement that extension? At the moment, I see two options:
1.a. Preprocessing the text of the script before compiling (I use IPython in Anaconda).
Immediate problem: assigning unique names to the arguments. Possible workaround: reserve names such as __my_lambda_123. (A rough sketch of this preprocessing approach is shown after option 1.b below.)
1.b. Modifying CPython, similar to what is described in https://hackernoon.com/modifying-the-python-language-in-7-minutes-b94b0a99ce14
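For what it's worth, a crude sketch of option 1.a might look like the following. The function name, the __my_lambda_ prefix and the no-nested-parentheses restriction are all assumptions of this illustration, not part of any existing tool.

import itertools
import re

_counter = itertools.count()

def expand_placeholder_lambdas(source):
    # Hypothetical preprocessor sketch: rewrite "(<expr> &)" groups whose body
    # uses `1, `2, ... into ordinary lambdas with uniquely mangled parameter
    # names. Assumes the group contains no nested parentheses or string
    # literals -- a real implementation would need a proper parser.
    def replace(match):
        body = match.group(1)
        uid = next(_counter)
        indices = [int(n) for n in re.findall(r"`(\d+)", body)]
        arity = max(indices, default=0)
        params = ["__my_lambda_{}_{}".format(uid, k) for k in range(1, arity + 1)]
        # Substitute highest indices first so `1 does not clobber `10, `11, ...
        for k in range(arity, 0, -1):
            body = body.replace("`{}".format(k), params[k - 1])
        return "(lambda {}: {})".format(", ".join(params), body)
    return re.sub(r"\(([^()]*?)\s*&\)", replace, source)

print(expand_placeholder_lambdas("(`1 + `2 &)(a, b)"))
# (lambda __my_lambda_0_1, __my_lambda_0_2: __my_lambda_0_1 + __my_lambda_0_2)(a, b)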
Imagine that I implemented that feature correctly. Do you immediately see that it breaks something essential in Python, or IPython, or Anaconda? Assume that I do not use any developer-oriented packages such as unittest, but a lot of "scientific" packages including numpy, as well as "interface" packages such as sqlalchemy.
Motivation. I am gradually studying Python as a programming language and appreciate its depth, consistency and unique philosophy. I understand that my idea is not in line with the latter. However, I use Python for implementing mathematical methods that are barely reusable. A typical life cycle is the following: (1) implement some mathematical method and experiments for a research project; (1.a) maybe save some function/class in my package if it feels reusable; (2) conduct computational experiments; (3) publish a paper; (4) never use the code again. It is much easier to implement an algorithm from scratch than to structure and reuse the code, since large parts of different methods very rarely coincide.
My typical project is one large Python script with long code fragments. Even structuring the code into functions is not time-efficient, since the life cycle of my program does not include "deploy", "maintain", "modify". I keep the amount of structure to a minimum needed for fast implementing and debugging.
I would use Wolfram Mathematica, but in my recent projects it became useless due to the limitations of its standard libraries, the poor performance of code in the Wolfram Language, and the overall closedness of the platform. I switched to Python for its rich selection of libraries, and also with the intent of acquiring some software development skills. However, at the moment, programming in the Wolfram Language style is much more efficient for me. The code of an algorithm feels much more readable when it is more compact (you do not need to scroll) and includes fewer language-specific words such as lambda.
Just as a heads up, the issue you raise in 1a is called macro hygiene.
It's also a bit sketchy to be doing the "lambda replacing" in text, before it's converted to an abstract syntax tree (AST). This is certainly going to be prone to errors since now you have to explicitly deal with various parsing issues and the actual replacement in one go.
If you do go this route (I don't recommend it), take a look at Racket's macro system, which can do what you want.
There are also other potential problems you might run into - you need to think about how you want strings such as ("`1" + `1)(a) to parse, or, for example, strings such as (`2 + `3)(a, b) - is this an error or is it ok (if so, which argument goes where?). These are the kinds of test cases you need to think about if you're sure you want to design an addition to Python's syntax.
There's also a practical consideration - you'll essentially need to support your own fork of Python, so you won't be able to get updates to the language without redeveloping this feature for each release (kind of? I think).
TLDR: I highly recommend you don't do this, just use lambdas.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Edit: bad wording on the original title, "are most programing langs like python from bottom up and right to left".
Question: are a lot of programming languages like Python, in that you have to read a statement from the bottom up and from right to left?
A simple question, I guess. I'm learning how to program in Python and know (at an intermediate level) how to write shell scripts in Bash.
When I write Bash scripts I start from the top and work my way down, yet with Python it seems like I have to think about what I want and then reverse-engineer how to make the program run.
So does anyone have any ideas on how to make this simpler for me to understand?
(When I do find something that works for the answer I'm looking for in Python, I always put a '''comment''' at the end to try to remember to read that statement from the bottom up and right to left.)
Comments, it seems, are very helpful for me to understand more about what I'm trying to accomplish (especially when it comes to recursion). Does anyone know of examples, discussions, or shortcuts for understanding the backwardness of how to program a program?
One more question: are programs just a lot of scripts put together in the proper sequence?
Edit: thank you for your responses; they are helpful.
Moderators, please let others reply for a little bit longer, and then you can close it.
To those that have replied: I'm taking MIT's Introduction to Computer Science and Programming Using Python to try to actually learn how to program with Python and get a better understanding of programming in general.
To somewhat understand what I mean by what I stated above, see a Bash script that I wrote quite a while back:
https://drive.google.com/open?id=13WtPvaabM9__hUWNOWzPOai9n34LeiWa
And here is me trying to turn it into a Python program. Everything in it I just looked up on the web, read the docs, and asked some questions here about, before taking the MIT class. Python attempt:
https://drive.google.com/open?id=10NesR4FONR8k1vegJjwKdHloC5r0JrWm
You can write your program in python without using functions.
On the other hand, you can write shell scripts using functions.
So if you 'find something that works' and that code is organized in functions, then it was the choice of the developer to organize that code in functions.
Shell scripts tend to be short and are often used for one specific purpose.
In that case many people just write the code that is necessary without thinking too much about readability or code structure.
For longer scripts or programs people tend to switch to languages like python. At that point code layout and readability get more important and now many people structure the code into functions, because they made the experience that their code is easier to maintain and understand that way.
This is maybe the reason the shell script examples you find are mostly 'top down, left to right', while the Python examples are structured into functions and classes.
are a lot of programming languages like Python, in that you have to read a statement from the bottom up and from right to left?
In many languages and frameworks understanding the execution flow can be pretty complicated! Your code may be grouped and called in many unexpected ways.
Programming languages also allow you to run code in parallel, so that you can have two pieces of code running at the same time. This is done using threads or processes or asynchronous programming.
As you have probably already seen, in Python there are functions, classes and modules. Those are used to "group" code so that it is easy to re-use, and also to define abstract concepts that make programming easy. Those structures make code flow more complicated (because you "jump" from a place to another), yet they are very powerful tools.
When I write Bash scripts I start from the top and work my way down, yet with Python it seems like I have to think about what I want and then reverse-engineer how to make the program run.
Bash can get complicated too, in a way similar to Python. Look at the script to install Docker: this is an example of a shell script that is not "top to bottom".
And, similarly, you can write simple Python scripts that are executed from top to bottom too. This is not true for all languages however (e.g. C or Java, where you must use functions).
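For instance, a plain script like this (just an illustration) runs strictly top to bottom, the same way a shell script does:

# Statements execute in order, top to bottom, just like a shell script.
names = ["ada", "grace", "alan"]
for name in names:
    print(name.title())
print("done")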
So does anyone have any ideas on how to make this simpler for me to understand?
Experience is the key. Even experienced programmers have to spend quite some time learning and understanding new languages and frameworks they're not familiar with.
Writing comments and documentation (like you already do) can be helpful too: you can document how and when your functions should be called, you can document what functions are going to be called. Example:
def frobnicate():
    """This function transforms foo into baz."""
    # insert magic here

def main():
    """This is the program ingress point. It calls frobnicate() three times."""
    for i in range(3):
        frobnicate()

if __name__ == '__main__':
    # Call the ingress point, defined above
    main()
With free/open source software, you also have the advantage that you can look at the source. So, for example, if you want to know how a function frobnicate() gets called, you can search the source for the keyword "frobnicate" and find your answer. It's not always straightforward, but it's a useful approach.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers.
Closed 1 year ago.
I am supposed to be doing research with this huge Fortran 77 program (which I recently ported to Fortran 90 superficially). It is a very old piece of software used for modeling using finite element methods.
It is a monstrosity. It is roughly 240,000 lines.
Since it began its life in Fortran 77, it uses some really dirty hacks for dynamic memory allocation; basically it uses functions from the C standard library, mixing C and Fortran. I have yet to fully grasp how allocation works. The program is built to be easily extendable by the user, and the user generally needs to allocate some globally accessible arrays for later use. This is done by having an array of memory addresses, which point to the beginning addresses of the dynamically allocatable arrays. Of course, which element of the address array points to which piece of information depends entirely on conventions that have to be learned by the user before one can really start to program. There are two address arrays, one for integers and the other for floating-point numbers.
By dirty hacks, I mean inconsistent ones. For example, an update to the optimization algorithms of the GNU compilers caused the program to exit with random memory leaks.
The program is far from elegant. Global variable names are generally short (3-4 characters) and cryptic. Passing data across routines is of course accomplished by using common blocks, which include all program switches, and the aforementioned arrays.
The usage of the program is roughly like that of an interactive shell, albeit a stupid one. First, an input file is read by the program itself; then, per choice, the user is dropped into a pseudo-shell, in which the user has to type 4-character-wide commands followed by parameters. The parser then parses the command, and the corresponding subroutine is called with the parameters. You would guess that there is a loop structure in this pseudo-parser (a goto bonanza, rather) which wraps the subroutine behavior in a manner more complex than it should be in the 21st century.
The format of the input file is the same (commands, then parameters), since it is the same parser. But the syntax is not really consistent (by that, I mean it lacks control structures, and some commands cause the finite state machine to behave in ways that contradict other commands; it lacks a definite grammar), from time to time causing the end user to discover pitfalls. The user must learn these pitfalls by experience; I did not see them in any documentation of the program. This is a problem that can easily be avoided with Python, where it is not even necessary to implement a parser.
What I want to do:
Port parts of the program into python, namely the parts that don't have anything to do with numerical computation. This includes
cleaning up and abstracting the API with an OOP approach in python,
giving meaningful variable names,
migrating dynamic allocation to either numpy or Fortran 90 and losing the C part,
migrating non-numerical execution to Python and wrapping the numerical objects using f2py, so there is no loss in performance. Have I mentioned that the program is damn fast in its current state? Hopefully porting the calls to numerical subroutines and I/O to Python will not slow it down to an impractical level (or will it?).
Making use of Python's interactive shell as a replacement for the pseudo-shell. This way, there will not be any inconsistencies for the end user. The aforementioned commands will simply be replaced by functions defined in Python. This will allow the user to actually access the data. Plus, the user will be able to extend the program without going too deep.
What I wonder:
Is f2py suitable for and up to this task of wrapping numerous subroutines and common blocks without any confusion? I have only seen single-file examples on the net for f2py; I know that numpy has used it to wrap LAPACK and the like, but I need reassurance that f2py is a tool consistent enough for this task.
Whether there are any suggestions on the general strategy that I should follow, or pitfalls I should avoid.
How can and should I implement a system in this Python-wrapped Fortran 90 environment so that I will be able to modify (allocate and assign) globally accessible arrays and variables inside Fortran routines? This should preferably avoid address arrays, and I should preferably be able to refer to these objects by name in the namespaces. These variables should preferably be accessible from both Python and Fortran.
Notes:
I may have been asking for too much, something beyond the boundaries of the possible realm. In that case, please forgive me, for I am a beginner in this aspect of programming, and don't hesitate to correct me.
The "program" I have been talking about is open source but it is commercial and the license does not allow its distribution, so I decided not to mention its name. However, you could deduce it from the 2nd sentence and the description I gave throughout.
I'm doing something depressingly similar. Instead of dynamic memory allocation via C we have a single global array with integer indices (also at global scope), but otherwise it's much the same. Weird, inconsistent input file and all.
I'd advise against trying to rewrite the majority of the program, whether in Python or anything else. It's time-consuming, unpleasant and largely unnecessary. As an alternative, get the F77 code base to the point where it compiles cleanly enough that you're willing to trust it, then write an interface routine.
I now have a big, ugly F77 code base which sits behind an interface. The program requires input as a text file so a large part of the interface's job is to produce that text file. Beyond that, the legacy code is reduced to a single gateway routine which takes a few arguments (including a means of identifying the text file) and returns the answer. If you use the iso_c_binding of Fortran 2003 you can expose the interface in a format C understands, at which point you can link it to whatever you wish.
As far as the modern code (mostly optimisation routines) is concerned, the legacy code base is the single subroutine behind the C interface. This is much nicer than trying to modify the old code further and probably a valid strategy for your case as well.
For an example of how to generate the f2py interface library using multiple Fortran files, see this post.
f2py might be suitable for your task, but there are some pitfalls that could cause problems. Some pitfalls concerning f2py are listed here and summarized below:
Concerning your specific problem, you might run into trouble with your allocatable arrays, because f2py was written for Fortran 77 and does not support many Fortran 90+ features (such as allocatable arrays).
I also encountered a problem with an undocumented maximum array size (around 400 x 200 x 20 x 20). If I used arrays bigger than that, f2py would not be able to generate the Python library. In particular, the large matrices being passed around in finite element codes might be too big for interfacing. In that case you would not have access to them in the Python part of the program.
Beneficial for you is that f2py should have no problems with COMMON blocks, etc., because it was written specifically for Fortran 77.
After passing the data through the interface to the Fortran routines, there should be no (or only minimal) slowdown if you do it right. The key is to minimize calculations in the Python part of the program per run. This includes the manipulation of the data arrays (shift, rotate, copy, etc.) but not the passing of them (because the interface is pass-by-reference).
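As a rough illustration of what using such a wrapper can look like, here is a sketch; the module name legacy, the subroutine solve_step and the common block ctrl are made-up placeholders, not names from any real code base.

# Build the extension module first, e.g.:
#   f2py -c -m legacy solver.f common_defs.f
import numpy as np
import legacy                      # the f2py-generated extension module (hypothetical)

# Variables declared in a COMMON block are exposed as attributes of an
# object named after the block; assigning to them writes into the Fortran
# storage directly.
legacy.ctrl.maxiter = 200

# Arrays cross the interface as numpy arrays. Using Fortran-ordered,
# correctly typed arrays avoids copies on the way in.
load = np.zeros(1000, dtype=np.float64, order="F")
result = legacy.solve_step(load)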
As an alternative you should have a look at Cython (also see the link above and the linked working example therein). I think this might serve you better in the long run.
Implementation Suggestion
This suggestion is how I would do it incorporating my experiences with having done something similar (see Background below). It should largely be independent of how you interface the Python and Fortran code (f2py, Cython, ...).
Of course you should be very careful not to change the behaviour and therefore possibly the results of the program. Therefore the generation of some tests, their corresponding reference input and output files, and test documentation (including all steps, keystrokes, commands, etc. necessary to reproduce those results) should be your first step.
In your case I would try to change as little of the Fortran program as possible. I would try to separate the "pseudo-shell" from the Fortran code, e.g. making it its own module, and build an interface to that module. That way you can use all of the original Fortran code and the modifications, bug fixes and updates from your peers, even in the future. The key is not to distance your code too far from the original/mainstream, because in scientific communities usually not everybody will agree with major changes to the source code and update their workflow or source code accordingly. Therefore future work from your peers might not be made in your version, but in the original source code, and it would be your own responsibility to merge those changes into your version, which gets easier the less you change.
Using that interface you can work on your Python shell and maybe even build a GUI for it without having to worry about changing anything in the original program. This reduces the risk of introducing bugs or changing the results of the original. Your shell/GUI would therefore work as a wrapper around the original program to simplify the workflow and remove inconsistencies. All the "intelligence" and utilities, like error and cross checking of the user input, help pages, tutorials/howtos, etc. would be implemented in the Python wrapper, which would parse these inputs, translate them to the corresponding commands for your Fortran program, send them and wait for the results.
After you have simplified the usage of the program, I would write some automation for the tests (setup + evaluation) to complete your utilities suite. That way even somebody new to the program would be able to make changes to the code without having to worry about unknowingly changing the results. This should enable your tools to benefit the community, which will attract new users and therefore encourage further development within the community.
Only as the last step would I replace the parts of the code using C with Fortran 90+ methods to simplify the code. This is an extensive change of the codebase and needs a lot of tests to ensure EVERY possible combination of commands is checked and verified before and after the changes.
This method also has the benefit that you could possibly make your interface/GUI open source (you have to check the license of your program, of course) as long as it is separable from the source code of the Fortran program. The Fortran-Python interface would have to be provided, or installed/generated from source files when your interface is loaded, using some simple build script as seen in the first link of this post.
For the manipulation of internal data I would write a separate wrapper routine that only handles the data interface. This should be done in Cython, though, to enable you to use allocatable arrays, etc. Because this interface would work with pass-by-reference, you should be able to use the full collection of Python (numpy) tools to manipulate the arrays and data.
Background
I did something similar using our research code for helicopter rotor dynamics. This is also a very old and large program written in Fortran 77 (goto bonanza included). The newer additions and modifications to the code are usually done in Fortran 90/2003.
Using parts of this code (several subroutines & module files) I generated a python library to connect our GUI (Python & Qt) to the Fortran program; mainly for postprocessing of Fortran binary output files.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion.
Closed 12 years ago.
Python is a great programming language, but certain things about it just annoy the heck out of me.
As such, I:
1) wanted to find out how to remove these annoyances from the language itself, or
2) wanted to find a Python-like language that doesn't have these annoyances.
I love Python for everything except:
self: it just seems stupid to me that I need to include "self" as the first parameter of a function
double-underscores: they just look ugly; a terrible use of a special character.
__name__ has always felt like a hack to me. Try explaining it to a novice programmer, or worse, to someone who programs in Perl or Ruby or Java for a living. Comparing the magic variable __name__ to the magic constant "__main__" feels doubly so.
Blocks give me Ruby envy or Smalltalk envy. I like local functions. Love them. I tolerate lambda. But I really, really would like to see a more Ruby-esque iterator setup where we pass a callable to the list, and the callable can be defined free-form inline. Python doesn't really do that, and so it's less of a language lab than I might like.
Properties are unattractive, partly because of blocks being absent. I don't really want to define a named parameter (with double-underscores, most likely) and then two named functions, and THEN declare a property. That seems like so much work for such a simple situation. It is something I will only do if all other methods fail me, or if the only alternative is overriding setattr and getattr.
I do realize these might be petty annoyances, but for a language I program in daily, these small annoyances can grow to be quite large.
pypy is a complete implementation of Python in Python itself (with all of the things you consider annoyances, nevertheless a very high-level implementation language that makes altering even the core of the language itself for your own purposes easier than ever before). Download it, fork it, and edit it to fix whatever you like (be sure to eventually translate the compiler and runtime to your new non-Python language, too, of course).
If that's just too much work (whiners are rarely interested in working to fix their own complaints, no matter how easy you make such work for them), just switch to Ruby, which appears to match your tastes more closely - or find the Ruby implementation written in Ruby (I don't know what it's called, but surely such a powerful language will have one) and hack that one (to fix whatever your whines are against Ruby).
Meanwhile, at least some of your annoyances leave me quite perplexed. Take, for example, the rant about properties: I don't understand what you mean. The normal way to define a R/W property is:
@property
def thename(self):
    """add the getting-code here"""

@thename.setter
def thename(self, value):
    """add the setting-code here"""
so what the hey do you mean by "define a named parameter (with double-underscores, most likely) and then two named functions, and THEN declare a property"???
I could ask equally puzzled questions about the other whines, but let's wait to see if you clarify this one first (if the clarification is of the kind "oh I didn't know about it", i.e. you're whining against a language without knowing the fundaments thereof, well, I can make a guess about what that does to your credibility, of course;-).
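For reference, a minimal sketch of the decorator-based property above used in context; the Widget class and the _name attribute are just illustrative, not anything from the question.

class Widget:
    def __init__(self, name):
        self._name = name          # one leading underscore, by convention

    @property
    def thename(self):
        """Getter: runs on attribute read."""
        return self._name

    @thename.setter
    def thename(self, value):
        """Setter: runs on attribute assignment."""
        self._name = value

w = Widget("spam")
w.thename = "eggs"                 # calls the setter
print(w.thename)                   # calls the getter -> eggs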
This is a troll, and you know the answers to your own questions:
self: Write a wrapper that does away with it and inherit from that. But it does need some name if you're going to reference the object in question, unless you want it to just magically be present (an ugly thing indeed)...
double-underscores: don't use them. Simple. They're in no way required.
__name__: again, call it something else if you like, and inherit from that base class. I don't see what the problem here is. You still need something to provide that function.
Blocks: It sounds to me like you're trying to program Ruby or Java in Python. You can certainly pass a callable to an iterator (and you should probably go read about generators), but defining it inline leads to serious code ugliness fast. Python makes you do it out of line so that you don't end up with half your program logic in an inline, unnamed function. I don't see what the problem here is. (A small sketch illustrating this follows after this list.)
Properties: I don't understand what you're saying. I certainly don't define multiple functions to use or create properties of an object in most cases.
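To illustrate the point about callables and generators from the Blocks item above (all names here are purely illustrative):

# An inline callable can be passed to map()/filter(), much like a block:
squares = list(map(lambda x: x * x, range(10)))
evens = list(filter(lambda x: x % 2 == 0, range(10)))

# The idiomatic Python spelling uses comprehensions / generator expressions:
squares = [x * x for x in range(10)]
evens = (x for x in range(10) if x % 2 == 0)   # lazy, like a generator

# Non-trivial logic goes into a named function, keeping the call site clean:
def frob(x):
    """Example of logic that would be ugly inline."""
    return x * x + 1

frobbed = [frob(x) for x in range(10)]
print(squares, list(evens), frobbed)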
How about going back to basics with UNIX Bash scripts?
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have the source of a Python program which doesn't have any documentation or comments.
I have tried twice to understand it, but most of the time I lose track because there are many files.
What should the steps be to understand that program fully and quickly?
Michael Feathers' "Working Effectively with Legacy Code" is a superb starting point for such endeavors -- not particularly language-dependent (his examples are in several non-python languages, but the techniques and mindset DO extend pretty well to Python and just about any other language).
The key focus is that you want to understand the code for a reason -- modifying it and/or porting it. So, instrumenting the legacy code -- with batteries and scaffolding of tests and tracing/logging -- is the crucial path on the long, hard slog to understanding and modifying it safely and responsibly.
Feathers suggests heuristics and techniques for where to focus your efforts and how to get started when the code is a total mess (hence "legacy") - no docs, or misleading docs (describing something quite different, maybe in subtle ways, from what the code actually DOES), no tests, an untestable-without-refactoring tangle of spaghetti dependencies. This may seem an extreme case but anybody who's spent a long-ish career in programming knows it's actually more common than anyone would like;-).
In the past I have used 'Python Call Graph' to understand the source structure.
Use a debugger, e.g. pdb, to walk through the code.
Try to read the code again after a one-day break; that also helps.
I would recommend generating some documentation with epydoc http://epydoc.sourceforge.net/ . For sure, if no docstrings exist, the result will be poor, but it will at least give you one view of your application and you'll be able to navigate the classes more easily.
Then you can try to add documentation yourself when you understand something new, and then regenerate the docs again. It is never too late to start.
I hope it helps.
You are lucky it's in Python, which is easy to read. But it is of course possible to write tricky, hard-to-understand code in Python as well.
The steps are:
Run the software and learn to use it, and understand its features at least a little bit.
Read through the tests, if any.
Read through the code.
When you encounter code you don't understand, put a debug break there, and step through the code, looking at what it does.
If there aren't any tests, or the test coverage is low, write tests to increase the test coverage. It's a good way to learn the system.
Repeat until you feel you have a vague grip on the code. A vague grip is all you need if you are going to manage the code. You'll get a good grip once you start actually working with the code. For a big system that can take years, so don't try to understand it all first.
There are tools that can help you. As Stephen C says, an IDE is a good idea. I'll explain why:
Many editors analyze the code. This typically gives you code completion, but more importantly in this case, it makes it possible to just ctrl-click on a variable to see where it comes from. This really speeds things up when you want to understand other people's code.
Also, you need to learn a debugger. In tricky parts of the code, you will have to step through them in a debugger to see what the code actually does. Python's pdb works, but many IDEs have integrated debuggers, which make debugging easier.
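For example, a minimal way to drop into pdb at a puzzling spot looks like this; the function names below are hypothetical stand-ins, not part of any real code base.

import pdb

def transform(item):
    """Stand-in for the code you are trying to understand."""
    return item * 2

def puzzling_function(data):
    pdb.set_trace()          # execution pauses here; an interactive (Pdb) prompt opens
    return [transform(item) for item in data]

# At the (Pdb) prompt: n = next line, s = step into, p <expr> = print a value,
# l = list source, c = continue running.
puzzling_function([1, 2, 3])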
That's it. Good luck.
I have had to do a lot of this in my job. What works for me may be different to what works for you, but I'll share my experience.
I start by trying to identify the data structures being used and draw diagrams showing the relationships between them. Not necessarily something formal like UML, but a sketch on paper that you understand and that allows you to see the overall structure of the data being manipulated by the program. Only once I have some view of the data structures being used do I start to try to understand how the data is being manipulated.
Secondly, for a large body of software, sometimes you need to just attack bite-sized pieces at first. You won't get an overall understanding straight away, but if you understand small parts in detail and keep chipping away, eventually all the pieces fall together.
I combine these two approaches, switching between them when I am getting overly frustrated or bored. Regular walks around the block are recommended :) I find this gets me good results in the end.
Good luck!
pyreverse from Logilab and PyNSource from Andy Bulka are helpful too for UML diagram generation.
I'd start with a good python IDE. See the answers for this question.
Enterprise Architect by Sparx Systems is very good at processing a source directory and generating class diagrams. It is not free, but very reasonably priced for what you get. (I am not associated with this company in any way, I've just been a satisfied user of their product for several years.)