Good practices in writing MATLAB code? [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
Improve this question
I would like to know the basic principles and etiquette of writing a well structured code.

Read Code Complete, it will do wonders for everything. It'll show you where, how, and when things matter. It's pretty much the Bible of software development (IMHO.)

These are the most important two things to keep in mind when you are writing code:
Don't write code that you've already written.
Don't write code that you don't need to write.

MATLAB Programming Style Guidelines by Richard Johnson is a good resource.

Well, if you want it in layman's terms:
I reccomend people to write the shortest readable program that works.
There are a lot more rules about how to format code, name variables, design classes, separate responsibilities. But you should not forget that all of those rules are only there to make sure that your code is easy to check for errors, and to ensure it is maintainable by someone else than the original author. If keep the above reccomendation in mind, your progam will be just that.

This list could go on for a long time but some major things are:
Indent.
Descriptive variable names.
Descriptive class / function names.
Don't duplicate code. If it needs duplication put in a class / function.
Use gettors / settors.
Only expose what's necessary in your objects.
Single dependency principle.
Learn how to write good comments, not lots of comments.
Take pride in your code!
Two good places to start:
Clean-Code Handbook
Code-Complete

If you want something to use as a reference or etiquette, I often follow the official Google style conventions for whatever language I'm working in, such as for C++ or for Python.
The Practice of Programming by Rob Pike and Brian W. Kernighan also has a section on style that I found helpful.

First of all, "codes" is not the right word to use. A code is a representation of another thing, usually numeric. The correct words are "source code", and the plural of source code is source code.
--
Writing good source code:
Comment your code.
Use variable names longer than several letters. Between 5 and 20 is a good rule of thumb.
Shorter lines of code is not better - use whitespace.
Being "clever" with your code is a good way to confuse yourself or another person later on.
Decompose the problem into its components and use hierarchical design to assemble the solution.
Remember that you will need to change your program later on.
Comment your code.
There are many fads in computer programming. Their proponents consider those who are not following the fad unenlightened and not very with-it. The current major fads seem to be "Test Driven Development" and "Agile". The fad in the 1990s was 'Object Oriented Programming'. Learn the useful core parts of the ideas that come around, but don't be dogmatic and remember that the best program is one that is getting the job done that it needs to do.
very trivial example of over-condensed code off the top of my head
for(int i=0,j=i; i<10 && j!=100;i++){
if i==j return i*j;
else j*=2;
}}
while this is more readable:
int j = 0;
for(int i = 0; i < 10; i++)
{
if i == j
{
return i * j;
}
else
{
j *= 2;
if(j == 100)
{
break;
}
}
}
The second example has the logic for exiting the loop clearly visible; the first example has the logic entangled with the control flow. Note that these two programs do exactly the same thing. My programming style takes up a lot of lines of code, but I have never once encountered a complaint about it being hard to understand stylistically, while I find the more condensed approaches frustrating.
An experienced programmer can and will read both - the above may make them pause for a moment and consider what is happening. Forcing the reader to sit down and stare at the code is not a good idea. Code needs to be obvious. Each problem has an intrinsic complexity to expressing its solution. Code should not be more complex than the solution complexity, if at all possible.
That is the essence of what the other poster tried to convey - don't make the program longer than need be. Longer has two meanings: more lines of code (ie, putting braces on their own line), and more complex. Making a program more complex than need be is not good. Making it more readable is good.

Have a look to
97 Things Every Programmer Should Know.
It's free and contains a lot of gems like this one:
There is one quote that I think is
particularly good for all software
developers to know and keep close to
their hearts:
Beauty of style and harmony and grace
and good rhythm depends on simplicity.
— Plato
In one sentence I think this sums up
the values that we as software
developers should aspire to.
There are a number of things we strive
for in our code:
Readability
Maintainability
Speed of development
The elusive quality of beauty
Plato is telling us that the enabling
factor for all of these qualities is
simplicity.

The Python Style Guide is always a good starting point!

European Standards For Writing and Documenting Exchangeable Fortran 90 Code have been in my bookmarks, like forever. Also, there was a thread in here, since you are interested in MATLAB, on organising MATLAB code.

Personally, I've found that I learned more about programming style from working through SICP which is the MIT Intro to Comp SCI text (I'm about a quarter of the way through.) Than any other book. That being said, If you're going to be working in Python, the Google style guide is an excellent place to start.
I read somewhere that most programs (scripts anyways) should never be more than a couple of lines long. All the requisite functionality should be abstracted into functions or classes. I tend to agree.

Many good points have been made above. I definitely second all of the above. I would also like to add that spelling and consistency in coding be something you practice (and also in real life).
I've worked with some offshore teams and though their English is pretty good, their spelling errors caused a lot of confusion. So for instance, if you need to look for some function (e.g., getFeedsFromDatabase) and they spell database wrong or something else, that can be a big or small headache, depending on how many dependencies you have on that particular function. The fact that it gets repeated over and over within the code will first off, drive you nuts, and second, make it difficult to parse.
Also, keep up with consistency in terms of naming variables and functions. There are many protocols to go by but as long as you're consistent in what you do, others you work with will be able to better read your code and be thankful for it.

Pretty much everything said here, and something more. In my opinion the best site concerning what you're looking for (especially the zen of python parts are fun and true)
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html
Talks about both PEP-20 and PEP-8, some easter eggs (fun stuff), etc...

You can have a look at the Stanford online course: Programming Methodology CS106A. The instructor has given several really good instruction for writing source code.
Some of them are as following:
write programs for people to read, not just for computers to read. Both of them need to be able to read it, but it's far more important that a person reads it and understands it, and that the computer still executes it correctly. But that's the first major software engineering principle to
think about.
How to make comments:
 put in comments to clarify things in the program, which are not obvious
How to make decomposition
One method solves one problem
Each method has code approximate 1~15lines
Give methods good names
Write comment for code

Unit Tests
Python and matlab are dynamic languages. As your code base grows, you will be forced to refactor your code. In contrast to statically typed languages, the compiler will not detect 'broken' parts in your project. Using unit test frameworks like xUnit not only compensate missing compiler checks, they allow refactoring with continuous verification for all parts of your project.
Source Control
Track your source code with a version control system like svn, git or any other derivative. You'll be able to back and forth in your code history, making branches or creating tags for deployed/released versions.
Bug Tracking
Use a bug tracking system, if possible connected with your source control system, in order to stay on top of your issues. You may not be able, or forced, to fix issues right away.
Reduce Entropy
While integrating new features in your existing code base, you will add more lines of code, and potentially more complexity. This will increase entropy. Try to keep your design clean, by introducing an interface, or inheritance hierarchy in order to reduce entropy again. Not paying attention to code entropy will render your code unmaintainable over time.
All of The Above Mentioned
Pure coding related topics, like using a style guide, not duplicating code, ...,
has already been mentioned.

A small addition to the wonderful answers already here regarding Matlab:
Avoid long scripts, instead write functions (sub routines) in separate files. This will make the code more readable and easier to optimize.
Use Matlab's built-in functions capabilities. That is, learn about the many many functions that Matlab offers instead of reinventing the wheel.
Use code sectioning, and whatever the other code structure the newest Matlab version offers.
Learn how to benchmark your code using timeit and profile . You'll discover that sometimes for loops are the better solution.

The best advice I got when I asked this question was as follows:
Never code while drunk.

Make it readable, make it intuitive, make it understandable, and make it commented.

Related

How to memorize syntax of minimalist coding? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
Improve this question
So I really like using Edabit to just practice coding, and it always comes at a surprise when I'm done with my answer, and I see other people's solution. They make my 10 line code into just a single line! I've heard of minimalist coding before, and have even tried to implement it, however, I just find it a lot more difficult to read and not what I'm used to seeing.
Everyone else makes it look so seamless, and easy, but when I try using it, I end up having to re-research the syntax and how it works.
An example:
My answer to a challenge:
def wumbo(words):
print(words)
words = list(words)
print(words)
count = -1
for char in words:
count+= 1
if char == "M":
words[count] = "W"
print("".join(words))
return "".join(words)
That's 11 lines of code! Even if you remove the print statements (They're for debugging), it's still a lot compared to other people's solutions.
These are examples of other people's:
def wumbo(words):
return ''.join([i if i != 'M' else 'W' for i in words])
I think that^^ is an example of minimalistic coding?
And another example:
import re
wumbo=lambda w:re.sub('M','W',w)
I don't even know what "lamda" is!
So, are there any good resources in learning how to code really 'cleanly'? Or does it just take a lot of practice and memorization of syntax?
So, are there any good resources in learning how to code really 'cleanly'? Or does it just take a lot of practice and memorization of syntax?
I'd say that clean and concise code comes with practice and experience. You will learn to recognise some patterns that can be minimised using different methods.
For instance:
List comprehensions instead of for loops
Lambda functions instead of fully fletched functions
Regular expressions
High order functions like map
and much more.
I feel you need not worry about this.
Please understand coding for a code challenge and for collaborative project are different things. challenge need not be readable code as far as you are passing given test cases. where as if your code is becoming more readable at expense of 10 lines its acceptable. companies prefer it longer and readable code. as that code is maintained for long term and by many engineers.
this doesn't give you permission to write optimized code.
Many programmers solve the problem by writing code and then try optimizing code. it comes by practice and consistent efforts. there is no blog or book which will teach you this overnight.
I think you are on right track.
you wrote your code and looked for others implementations.
do learn from there implementations.
in given example lambda and regular expression was the things.
lambda is new way of programming called functional programming.
using which not just code become concise but it becomes faster too.
don't get overwhelmed by such things do read try to implement wherever possible and they will become part of your style.
do read documentation pages and try verifying things written in those docs.
keep asking question to your self once you verified your code is working.
"can I optimist this further ?" "is there any better way to achieve this ?"
try reading on internet about various ways of solving problem which you are trying to solve. try picking up whichever suits you.
get involved in opensource project try solving github issues.
the maintainer will review your patch and provide you appropriate feedback.
opensource code is also considered as refined code as many developers keep resolving corner cases and improving code quality. reading such code also will give you better idea about how to code better in given language.
lastly try teaching others whatever you have learnt or explain your code to other programmers and get it reviewed. this helps you improve step by step.
Note that in Python, there is a philosophy of writing code
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
These guides form the concept of Pythonic.
Although you may thing minimalist coding is fun and took less time to do, it is not always the best practice when it comes to code maintain and debug, as you often will spend more time understanding the code rather than actually coding it.
If you must chase this path, I suggest that you really dig deeper into the different syntax and code structure to Python first, such as constructing a list from a loop or list comprehension already make a lot of difference. The core of minimalist coding is how well you understand your own algorithm and how to achieve the same goal by reducing the steps/code.
Reference: Pythonic

Extending Python syntax for own needs: how hard? will it break Anaconda? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
The first part is what I want to do and the questions. Before discussing why I want to to that and proposing counterarguments, please read the motivation in the second part.
In short: I am not a developer. The main use of Python is fast prototyping of mathematical methods; additional motivation is learning how Python is implemented. This topic is not of crucial importance for me. If it seems lame and off-topic, feel free to remove, and I apologize for the inconvenience.
This feature does not introduce new functionality but serves as a shortcut for lambda.
The idea is borrowed from wolfram-language.
If the symbol of closing parenthesis is preceded with &, then the code inside the parentheses is interpreted as the definition of a function, where `1, `2, ... play the role of its arguments.
Example: (`1 + `2 &)(a, b) means (lambda x, y: x + y)(a, b)
Provided that I learn everything needed about Python, how hard / time consuming is to implement that extension? At the moment, I see two options:
1.a. Preprocessing text of the script before compiling (I use iPython in Anaconda).
Immediate problem: assigning unique names to arguments. Possible workaround: reserve names such as __my_lambda_123.
1.b. Modifying CPython similarly as described in https://hackernoon.com/modifying-the-python-language-in-7-minutes-b94b0a99ce14
Imagine that I implemented that feature correctly. Do you immediately see that it breaks something essential in Python, or iPython, or Anaconda? Assume that I do not use any developers' packages such as unittest, but a lot of "scientific" packages including numpy, as well as "interface" packages such as sqlalchemy.
Motivation. I gradually study Python as a programming language and appreciate its deepness, consistency and unique philosophy. I understand that my idea is not in line with the latter. However, I use Python for implementing mathematical methods which are barely reusable. A typical life cycle is following: (1) implement some mathematical method and experiments for a research project; (1.a) maybe save some function/class in my package if it feels reusable; (2) conduct computational experiments; (3) publish a paper; (4) never use the code again. It is much easier to implement an algorithm from scratch than to structure and reuse the code since the coincidence between large parts of different methods is very rare.
My typical project is one large Python script with long code fragments. Even structuring the code into functions is not time-efficient, since the life cycle of my program does not include "deploy", "maintain", "modify". I keep the amount of structure to a minimum needed for fast implementing and debugging.
I would use wolfram-mathematica but in my recent projects, it became useless due to the limitations of its standard libraries, the poor performance of the code in Wolfram language, and overall closeness of the platform. I switch to Python for its rich selection of libraries, and also with the intent of acquiring some software developer's skills. However, at the moment, programming in the Wolfram language style is much more efficient for me. The code of algorithms feels much more readable when it is more compact (you do not need to scroll), and includes less language-specific words such as lambda.
Just as a heads up, the issue you raise in 1a is called macro hygiene.
It's also a bit sketchy to be doing the "lambda replacing" in text, before it's converted to an abstract syntax tree (AST). This is certainly going to be prone to errors since now you have to explicitly deal with various parsing issues and the actual replacement in one go.
If you do go this route (I don't recommend it), I recommend you also look at Racket's macro system, which can do what you want.
There are also other potential problems you might run into - you need to think about how you want strings such as ("`1" + `1)(a) to parse, or, for example, strings such as (`2 + `3)(a, b) - is this an error or is it ok (if so, which argument goes where?). These are the kinds of test cases you need to think about if you're sure you want to design an addition to Python's syntax.
There's also a practical consideration - you'll essentially need to support your own fork of Python, so you won't be able to get updates to the language without redeveloping this feature for each release (kind of? I think).
TLDR: I highly recommend you don't do this, just use lambdas.

Reverse Engineer Auto-Generated C?

How easy is it to reverse engineer an auto-generated C code? I am working on a Python project and as part of my work, am using Cython to compile the code for speedup purposes.
This does help in terms of speed, yet, I am concerned that where I work, some people would try to "peek" into the code and figure out what it does.
Cython code is basically an auto-generated C. Is it very hard to reverse engineer it?
Are there any recommendations that would make the code safer and reverse-engineering harder to do? (I assume that with enough effort, everything can be reversed engineered).
Okay -- to attempt to answer your question more directly: most auto-generated C code is fairly ugly, so somebody would need to be fairly motivated to reverse engineer it. At the same time, I don't believe I've never looked at what Cython generates, so I'm not sure how it looks.
In addition, a lot of auto-generated code is done in the form of things like state machine tables, that most programmers find fairly difficult to follow even at best. The tendency (in many cases) is to have a generic framework, with tables of data that the framework more or less "interprets" at run-time. This isn't necessarily impossible to follow, but it's enough different from most typical code that most people will give up on it fairly quickly (and if they do much, they'll typically waste a lot of time looking at the framework instead of the data, which is what really matters in cases like this).
I'll repeat, however, that I'm pretty sure I haven't looked at what Cython produces, so I can't say much about it with any real certainty.
There are (or at least used to be) commercial obfuscators intended to make C source code difficult to understand. I suspect the availability of Perl has taken a lot of the market share from them, but if you look you may still be able to find one and use it.
Absent that, it's not terribly difficult to write an obfuscator of your own, but the degree of effectiveness will probably vary with the amount of effort you're willing to put into it. Just systematically renaming any meaningful variable names into things like _ and __ can do quite a bit (e.g., profit = sales - costs; is a lot more meaningful than _ = _I_ - _i_;). Depending on the machine generated code in question, however, this may not really accomplish much -- obfuscating a generic framework may not make much difference in understanding what your code does -- and if they figure out the procedure you're following, they may be able to simply replicate the correct framework code and transplant the pieces specific to your program into the un-obfuscated framework.
You should really take a look at the code that Cython produces. To help with debugging, for example, it copies the complete Python source code into the generated file, marking each source line before generating C code for it. That makes it very easy to find the code section that you are interested in.
A very nice feature is that you can compile your code with the "-a" (annotate) option, and it will spit out an HTML file next to the C file that contains the annotated Python code. When you click on a line, you'll see the C code for that line. As a bonus, it marks lines that do a lot of Python processing in dark yellow, so that you get a simple indicator where to look for potential optimisations.
There's also special gdb support in Cython now, so you can do Cython source level debugging etc.
Ah, I guess I missed the bit that you were talking about the compiled module, whereas I was only referring to the source code that Cython generates. I agree with Jerry that it will be fairly tricky to extract something useful from the compiled module, as long as you keep the gdb support disabled (the default) and strip the debugging symbols. That is because the C compiler will do lots of inlining of helper functions all over the place and apply various low-level code optimisations, thus making it harder to extract the original macro level code patterns. However, you will see named C-API calls to CPython, and you will also see function names from your own code. Cython isn't specifically designed for code obfuscation, quite the opposite. But readable assembly has certainly never been a design goal.

Python progression path - From apprentice to guru

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I've been learning, working, and playing with Python for a year and a half now. As a biologist slowly making the turn to bio-informatics, this language has been at the very core of all the major contributions I have made in the lab. I more or less fell in love with the way Python permits me to express beautiful solutions and also with the semantics of the language that allows such a natural flow from thoughts to workable code.
What I would like to know is your answer to a kind of question I have seldom seen in this or other forums. This question seems central to me for anyone on the path to Python improvement but who wonders what his next steps should be.
Let me sum up what I do NOT want to ask first ;)
I don't want to know how to QUICKLY learn Python
Nor do I want to find out the best way to get acquainted with the language
Finally, I don't want to know a 'one trick that does it all' approach.
What I do want to know your opinion about, is:
What are the steps YOU would recommend to a Python journeyman, from apprenticeship to guru status (feel free to stop wherever your expertise dictates it), in order that one IMPROVES CONSTANTLY, becoming a better and better Python coder, one step at a time. Some of the people on SO almost seem worthy of worship for their Python prowess, please enlighten us :)
The kind of answers I would enjoy (but feel free to surprise the readership :P ), is formatted more or less like this:
Read this (eg: python tutorial), pay attention to that kind of details
Code for so manytime/problems/lines of code
Then, read this (eg: this or that book), but this time, pay attention to this
Tackle a few real-life problems
Then, proceed to reading Y.
Be sure to grasp these concepts
Code for X time
Come back to such and such basics or move further to...
(you get the point :)
I really care about knowing your opinion on what exactly one should pay attention to, at various stages, in order to progress CONSTANTLY (with due efforts, of course). If you come from a specific field of expertise, discuss the path you see as appropriate in this field.
EDIT: Thanks to your great input, I'm back on the Python improvement track! I really appreciate!
I thought the process of Python mastery went something like:
Discover list comprehensions
Discover generators
Incorporate map, reduce, filter, iter, range, xrange often into your code
Discover Decorators
Write recursive functions, a lot
Discover itertools and functools
Read Real World Haskell (read free online)
Rewrite all your old Python code with tons of higher order functions, recursion, and whatnot.
Annoy your cubicle mates every time they present you with a Python class. Claim it could be "better" implemented as a dictionary plus some functions. Embrace functional programming.
Rediscover the Strategy pattern and then all those things from imperative code you tried so hard to forget after Haskell.
Find a balance.
One good way to further your Python knowledge is to dig into the source code of the libraries, platforms, and frameworks you use already.
For example if you're building a site on Django, many questions that might stump you can be answered by looking at how Django implements the feature in question.
This way you'll continue to pick up new idioms, coding styles, and Python tricks. (Some will be good and some will be bad.)
And when you see something Pythony that you don't understand in the source, hop over to the #python IRC channel and you'll find plenty of "language lawyers" happy to explain.
An accumulation of these little clarifications over years leads to a much deeper understanding of the language and all of its ins and outs.
Understand (more deeply) Python's data types and their roles with regards to memory mgmt
As some of you in the community are aware, I teach Python courses, the most popular ones being the comprehensive Intro+Intermediate course as well as an "advanced" course which introduces a variety of areas of application development.
Quite often, I get asked a question quite similar to, "Should I take your intro or advanced course? I've already been programming Python for 1-2 years, and I think the intro one is too simple for me so I'd like to jump straight to the advanced... which course would you recommend?"
To answer their question, I probe to see how strong they are in this area -- not that it's really the best way to measure whether they're ready for any advanced course, but to see how well their basic knowledge is of Python's objects and memory model, which is a cause of many Python bugs written by those who are not only beginners but those who have gone beyond that.
To do this, I point them at this simple 2-part quiz question:
Many times, they are able to get the output, but the why is more difficult and much more important of an response... I would weigh the output as 20% of the answer while the "why" gets 80% credit. If they can't get the why, regardless how Python experience they have, I will always steer people to the comprehensive intro+intermediate course because I spend one lecture on objects and memory management to the point where you should be able to answer with the output and the why with sufficient confidence. (Just because you know Python's syntax after 1-2 years doesn't make you ready to move beyond a "beginner" label until you have a much better understanding as far as how Python works under the covers.)
A succeeding inquiry requiring a similar answer is even tougher, e.g.,
Example 3
x = ['foo', [1,2,3], 10.4]
y = list(x) # or x[:]
y[0] = 'fooooooo'
y[1][0] = 4
print x
print y
The next topics I recommend are to understanding reference counting well, learning what "interning" means (but not necessarily using it), learning about shallow and deep copies (as in Example 3 above), and finally, the interrelationships between the various types and constructs in the language, i.e. lists vs. tuples, dicts vs. sets, list comprehensions vs. generator expressions, iterators vs. generators, etc.; however all those other suggestions are another post for another time. Hope this helps in the meantime! :-)
ps. I agree with the other responses for getting more intimate with introspection as well as studying other projects' source code and add a strong "+1" to both suggestions!
pps. Great question BTW. I wish I was smart enough in the beginning to have asked something like this, but that was a long time ago, and now I'm trying to help others with my many years of full-time Python programming!!
Check out Peter Norvig's essay on becoming a master programmer in 10 years: http://norvig.com/21-days.html. I'd wager it holds true for any language.
Understand Introspection
write a dir() equivalent
write a type() equivalent
figure out how to "monkey-patch"
use the dis module to see how various language constructs work
Doing these things will
give you some good theoretical knowledge about how python is implemented
give you some good practical experience in lower-level programming
give you a good intuitive feel for python data structures
def apprentice():
read(diveintopython)
experiment(interpreter)
read(python_tutorial)
experiment(interpreter, modules/files)
watch(pycon)
def master():
refer(python-essential-reference)
refer(PEPs/language reference)
experiment()
read(good_python_code) # Eg. twisted, other libraries
write(basic_library) # reinvent wheel and compare to existing wheels
if have_interesting_ideas:
give_talk(pycon)
def guru():
pass # Not qualified to comment. Fix the GIL perhaps?
I'll give you the simplest and most effective piece of advice I think anybody could give you: code.
You can only be better at using a language (which implies understanding it) by coding. You have to actively enjoy coding, be inspired, ask questions, and find answers by yourself.
Got a an hour to spare? Write code that will reverse a string, and find out the most optimum solution. A free evening? Why not try some web-scraping. Read other peoples code. See how they do things. Ask yourself what you would do.
When I'm bored at my computer, I open my IDE and code-storm. I jot down ideas that sound interesting, and challenging. An URL shortener? Sure, I can do that. Oh, I learnt how to convert numbers from one base to another as a side effect!
This is valid whatever your skill level. You never stop learning. By actively coding in your spare time you will, with little additional effort, come to understand the language, and ultimately, become a guru. You will build up knowledge and reusable code and memorise idioms.
If you're in and using python for science (which it seems you are) part of that will be learning and understanding scientific libraries, for me these would be
numpy
scipy
matplotlib
mayavi/mlab
chaco
Cython
knowing how to use the right libraries and vectorize your code is essential for scientific computing.
I wanted to add that, handling large numeric datasets in common pythonic ways(object oriented approaches, lists, iterators) can be extremely inefficient. In scientific computing, it can be necessary to structure your code in ways that differ drastically from how most conventional python coders approach data.
Google just recently released an online Python class ("class" as in "a course of study").
http://code.google.com/edu/languages/google-python-class/
I know this doesn't answer your full question, but I think it's a great place to start!
Download Twisted and look at the source code. They employ some pretty advanced techniques.
Thoroughly Understand All Data Types and Structures
For every type and structure, write a series of demo programs that exercise every aspect of the type or data structure. If you do this, it might be worthwhile to blog notes on each one... it might be useful to lots of people!
I learned python first by myself over a summer just by doing the tutorial on the python site (sadly, I don't seem to be able to find that anymore, so I can't post a link).
Later, python was taught to me in one of my first year courses at university. In the summer that followed, I practiced with PythonChallenge and with problems from Google Code Jam.
Solving these problems help from an algorithmic perspective as well as from the perspective of learning what Python can do as well as how to manipulate it to get the fullest out of python.
For similar reasons, I have heard that code golf works as well, but i have never tried it for myself.
Learning algorithms/maths/file IO/Pythonic optimisation
This won't get you guru-hood but to start out, try working through the Project Euler problems
The first 50 or so shouldn't tax you if you have decent high-school mathematics and know how to Google. When you solve one you get into the forum where you can look through other people's solutions which will teach you even more. Be decent though and don't post up your solutions as the idea is to encourage people to work it out for themselves.
Forcing yourself to work in Python will be unforgiving if you use brute-force algorithms.
This will teach you how to lay out large datasets in memory and access them efficiently with the fast language features such as dictionaries.
From doing this myself I learnt:
File IO
Algorithms and techniques such as Dynamic Programming
Python data layout
Dictionaries/hashmaps
Lists
Tuples
Various combinations thereof, e.g. dictionaries to lists of tuples
Generators
Recursive functions
Developing Python libraries
Filesystem layout
Reloading them during an interpreter session
And also very importantly
When to give up and use C or C++!
All of this should be relevant to Bioinformatics
Admittedly I didn't learn about the OOP features of Python from that experience.
Have you seen the book "Bioinformatics Programming using Python"? Looks like you're an exact member of its focus group.
You already have a lot of reading material, but if you can handle more, I recommend you
learn about the evolution of python by reading the Python Enhancement Proposals, especially the "Finished" PEPs and the "Deferred, Abandoned, Withdrawn, and Rejected" PEPs.
By seeing how the language has changed, the decisions that were made and their rationales, you will absorb the philosophy of Python and understand how "idiomatic Python" comes about.
http://www.python.org/dev/peps/
Attempt http://challenge.greplin.com/ using Python
Teaching to someone else who is starting to learn Python is always a great way to get your ideas clear and sometimes, I usually get a lot of neat questions from students that have me to re-think conceptual things about Python.
Not precisely what you're asking for, but I think it's good advice.
Learn another language, doesn't matter too much which. Each language has it's own ideas and conventions that you can learn from. Learn about the differences in the languages and more importantly why they're different. Try a purely functional language like Haskell and see some of the benefits (and challenges) of functions free of side-effects. See how you can apply some of the things you learn from other languages to Python.
I recommend starting with something that forces you to explore the expressive power of the syntax. Python allows many different ways of writing the same functionality, but there is often a single most elegant and fastest approach. If you're used to the idioms of other languages, you might never otherwise find or accept these better ways. I spent a weekend trudging through the first 20 or so Project Euler problems and made a simple webapp with Django on Google App Engine. This will only take you from apprentice to novice, maybe, but you can then continue to making somewhat more advanced webapps and solve more advanced Project Euler problems. After a few months I went back and solved the first 20 PE problems from scratch in an hour instead of a weekend.

How to reverse engineer a program which has no documentation [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I have a source of python program which doesn't have any documentation or comments.
I did tried twice to understand it but most of the times I am losing my track, because there are many files.
What should be the steps to understand that program fully and quickly.
Michael Feathers' "Working Effectively with Legacy Code" is a superb starting point for such endeavors -- not particularly language-dependent (his examples are in several non-python languages, but the techniques and mindset DO extend pretty well to Python and just about any other language).
The key focus is, that you want to understand the code for a reason -- modifying it and/or porting it. So, instrumenting the legacy code -- with batteries and scaffolding of tests and tracing/logging -- is the crucial path on the long, hard slog to understanding and modifying safely and responsibly.
Feathers suggests heuristics and techniques for where to focus your efforts and how to get started when the code is a total mess (hence "legacy") - no docs, or misleading docs (describing something quite different, maybe in subtle ways, from what the code actually DOES), no tests, an untestable-without-refactoring tangle of spaghetti dependencies. This may seem an extreme case but anybody who's spent a long-ish career in programming knows it's actually more common than anyone would like;-).
In past I have used 'Python call graph' to understand the source structure
Use a debugger e.g. pdb to wak thru
the code.
Try to read code again after one day
break, that also helps
I would recommend to generate some documentation with epydoc http://epydoc.sourceforge.net/ . For sure, if no docstring exists, the result will be poor but it will give you at least one view of your application and you'lle be able to navigate in the classes more easily.
Then you can try to document by yourself when you understand something new and then regenerate the docs again. It is never too late to start something.
I hope it helps
You are lucky it's in Python which is easy to read. But it is of course possible to write tricky hard to understand code in Python as well.
The steps are:
Run the software and learn to use it, and understand it's features at least a little bit.
Read though the tests, if any.
Read through the code.
When you encounter code you don't understand, put a debug break there, and step through the code, looking at what it does.
If there aren't any tests, or the test coverage is low, write tests to increase the test coverage. It's a good way to learn the system.
Repeat until you feel you have a vague grip on the code. A vague grip is all you need if you are going to manage the code. You'll get a good grip once you start actually working with the code. For a big system that can take years, so don't try to understand it all first.
There are tools that can help you. As Stephen C says, an IDE is a good idea. I'll explain why:
Many editors analyses the code. This typically gives you code completion, but more importantly in this case, it makes it possible to just just ctrl-click on a variable to see where it comes from. This really speeds things up when you want to understand otehr peoples code.
Also, you need to learn a debugger. You will, in tricky parts of the code, have to step through them in a debugger to see what the code actually do. Pythons pdb works, but many IDE's have integrated debuggers, which make debugging easier.
That's it. Good luck.
I have had to do a lot of this in my job. What works for me may be different to what works for you, but I'll share my experience.
I start by trying to identify the data structures being used and draw diagrams showing the relationships between them. Not necessarily something formal like UML, but a sketch on paper which you understand which allows you to see the overall structure of the data being manipulated by the program. Only once I have some view of the data structures being used do I start to try to understand how the data is being manipulated.
Secondly, for a large body of software, sometimes you need to just attack bite sized pieces at first. You won't get an overall understanding straight away, but if you understand small parts in detail and keep chipping away, eventually all the pieces fall together.
I combine these two approaches, switching between them when I am getting overly frustrated or bored. Regular walks around the block are recommended :) I find this gets me good results in the end.
Good luck!
pyreverse from Logilab and PyNSource from Andy Bulka are helpful too for UML diagram generation.
I'd start with a good python IDE. See the answers for this question.
Enterprise Architect by Sparx Systems is very good at processing a source directory and generating class diagrams. It is not free, but very reasonably priced for what you get. (I am not associated with this company in any way, I've just been a satisfied user of their product for several years.)

Categories

Resources