Performance of Python worth the cost?

Performance of Python worth the cost? - python

I'm looking at implementing a fuzzy logic controller based on either PyFuzzy (Python) or FFLL (C++) libraries.
I'd prefer to work with python but am unsure if the performance will be acceptable in the embedded environment it will work in (either ARM or embedded x86 proc both ~64Mbs of RAM).
The main concern is that response times are as fast as possible (an update rate of 5hz+ would be ideal >2Hz is required). The system would be reading from multiple (probably 5) sensors from an RS232 port and provide 2/3 outputs based on the results of the fuzzy evaluation.
Should I be concerned that Python will be too slow for this task?

In general, you shouldn't obsess over performance until you've actually seen it become a problem. Since we don't know the details of your app, we can't say how it'd perform if implemented in Python. And since you haven't implemented it yet, neither can you.
Implement the version you're most comfortable with, and can implement fastest, first. Then benchmark it. And if it is too slow, you have three options which should be done in order:
First, optimize your Python code
If that's not enough, write the most performance-critical functions in C/C++, and call that from your Python code
And finally, if you really need top performance, you might have to rewrite the whole thing in C++. But then at least you'll have a working prototype in Python, and you'll have a much clearer idea of how it should be implemented. You'll know what pitfalls to avoid, and you'll have an already correct implementation to test against and compare results to.

Python is very slow at handling large amounts of non-string data. For some operations, you may see that it is 1000 times slower than C/C++, so yes, you should investigate into this and do necessary benchmarks before you make time-critical algorithms in Python.
However, you can extend python with modules in C/C++ code, so that time-critical things are fast, while still being able to use python for the main code.

Make it work, then make it work fast.

If most of your runtime is spent in C libraries, the language you use to call these libraries isn't important. What language are your time-eating libraries written in ?

From your description, speed should not be much of a concern (and you can use C, cython, whatever you want to make it faster), but memory would be. For environments with 64 Mb max (where the OS and all should fit as well, right ?), I think there is a good chance that python may not be the right tool for target deployment.
If you have non trivial logic to handle, I would still prototype in python, though.

I never really measured the performance of pyfuzzy's examples, but as the new version 0.1.0 can read FCL files as FFLL does. Just describe your fuzzy system in this format, write some wrappers, and check the performance of both variants.
For reading FCL with pyfuzzy you need the antlr python runtime, but after reading you should be able to pickle the read object, so you don't need the antlr overhead on the target.

Related

Are there advantages to use the Python/C interface instead of Cython?

I want to extend python and numpy by writing some modules in C or C++, using BLAS and LAPACK. I also want to be able to distribute the code as standalone C/C++ libraries. I would like this libraries to use both single and double precision float. Some examples of functions I will write are conjugate gradient for solving linear systems or accelerated first order methods. Some functions will need to call a Python function from the C/C++ code.
After playing a little with the Python/C API and the Numpy/C API, I discovered that many people advocate the use of Cython instead (see for example this question or this one). I am not an expert about Cython, but it seems that for some cases, you still need to use the Numpy/C API and know how it works. Given the fact that I already have (some little) knowledge about the Python/C API and none about Cython, I was wondering if it makes sense to keep on using the Python/C API, and if using this API has some advantages over Cython. In the future, I will certainly develop some stuff not involving numerical computing, so this question is not only about numpy. One of the thing I like about the Python/C API is the fact that I learn some stuff about how the Python interpreter is working.
Thanks.

The current "top answer" sounds a bit too much like FUD in my ears. For one, it is not immediately obvious that the Average Developer would write faster code in C than what NumPy+Cython gives you anyway. Quite the contrary, the time it takes to even get the necessary C code to work correctly in a Python environment is usually much better invested in writing a quick prototype in Cython, benchmarking it, optimising it, rewriting it in a faster way, benchmarking it again, and then deciding if there is anything in it that truly requires the 5-10% more performance that you may or may not get from rewriting 2% of the code in hand-tuned C and calling it from your Cython code.
I'm writing a library in Cython that currently has about 18K lines of Cython code, which translate to almost 200K lines of C code. I once managed to get a speed-up of almost 25% for a couple of very important internal base level functions, by injecting some 20 lines of hand-tuned C code in the right places. It took me a couple of hours to rewrite and optimise this tiny part. That's truly nothing compared to the huge amount of time I saved by not writing (and having to maintain) the library in plain C in the first place.
Even if you know C a lot better than Cython, if you know Python and C, you will learn Cython so quickly that it's worth the investment in any case, especially when you are into numerics. 80-95% of the code you write will benefit so much from being written in a high-level language, that you can safely lay back and invest half of the time you saved into making your code just as fast as if you had written it in a low-level language right away.
That being said, your comment that you want "to be able to distribute the code as standalone C/C++ libraries" is a valid reason to stick to plain C/C++. Cython always depends on CPython, which is quite a dependency. However, using plain C/C++ (except for the Python interface) will not allow you to take advantage of NumPy either, as that also depends on CPython. So, as usual when writing something in C, you will have to do a lot of ground work before you get to the actual functionality. You should seriously think about this twice before you start this work.

First, there is one point in your question I don't get:
[...] also want to be able to distribute the code as standalone C/C++ libraries. [...] Some functions will need to call a Python function from the C/C++ code.
How is this supposed to work?
Next, as to your actual question, there are certainly advantages of using the Python/C API directly:
Most likely, you are more familar with writing C code than writing Cython code.
Writing your code in C gives you maximum control. To get the same performance from Cython code as from equivalent C code, you'll have to be very careful. You'll not only need to make sure to declare the types of all variables, you'll also have to set some flags adequately -- just one example is bounds checking. You will need intimate knowledge how Cython is working to get the best performance.
Cython code depends on Python. It does not seem to be a good idea to write code that should also be distributed as standalone C library in Cython

The main disadvantage of the Python/C API is that it can be very slow if it's used in an inner loop. I'm seeing that calling a Python function takes a 80-160x hit over calling an equivalent C++ function.
If that doesn't bother your code then you benefit from being able to write some chunks of code in Python, have access to Python libraries, support callbacks written directly in Python. That also means that you can make some changes without recompiling, making prototyping easier.

Python vs C : Line of Code Comparison vs Dev Time

Hi I'm currently learning Python since the syntax feels so succinct and the idioms match well with my mental model.
However I'm also interested in learning about OS internals and reverse engineering software, which ultimately means knowing C in a rather thorough capacity.
When originally picking a language I did lots of reading and comparisons, and it seems that a number thrown out a lot is that to write short idiomatic statements in Python would require the equivalent of a few hundred lines of C (I'd guess code for memory management, writing the code for dictionaries,lists etc) that we take for granted as built into the Python language.
1) With an average C programmer, is that 100-200 lines of code per Python idiom anywhere near accurate?
Because C doesn't come built-in with Python-like constructs such as dictionaries/lists(with all their nice methods etc):
2) Do C programmers tend to build these constructs from scratch and then re-use them between projects to greatly reduce the actual amount of hand coding for their projects?
I assume re-using libraries like boost:: stuff also again, reduces some of the boilerplate hand coding also...
3) But does using popular libraries and re-using common code one has written before in C for basic constructs/etc, how much does that revise the lines of code written in C compared to the code in Python of a enthusiast sized code base?
I know specific numbers aren't possible, but is it possible with libraries, code reuse etc, to have a development time in C close to that of Python without being a Linus Torvalds style coding machine?
Thanks!

but is it possible with libraries, code reuse etc, to have a development time in C close to that of Python
No.
You've missed the most important point.
Python's interactive. It's not edit-compile-link-execute-break-debug. It's edit-debug.

Boost is C++, not C (emphatically not C -- virtually all of it makes heavy use of templates and such that aren't part of C).
Yes, C programmers tend to build up personal libraries of code for all sorts of "stuff" -- data structures, algorithms, user interfaces, and so on. There are also a fair number of other libraries for everything from basic string manipulation to database connectivity, user interfaces, basic algorithms and data structures, etc.
Comparing productivity between the two can be difficult though -- even if something can be done in one line of code either way, there's a greater chance that the C programmer will end up doing extra work to find and learn to use that particular library. OTOH, if he has used it before, the two might be directly competitive of (in a few cases) C might be more productive.
I'd guess Python ends up more productive more often, but trying to guess how much so is difficult (and lines of code usually won't be a good indication either).

As I did serious c programming I read a book that claimed libraries are worth to write. (Especially in C which considered a low level language)
Libraries are build for reuse.
If you use libraries you write one line like detectFace( faceDesriptor ) or renderPDF( document) is doesn't matter whether an idiom in another language is more concise or not.
Lines of code isn't a proper metric if it is about what would more efficient.

It depends.
Try to write an interrupt handler in python. Someone could probably make it work but it's going to be a dancing bear, the dancing is not good but it is surprising that a bear can do it. Want to write an OS or do some embedded programming you're not going to be able to use python. It's telling that the main python implementation is written in C.
That being said I'm amazed at some of the low-level stuff that you can do with python. The high-level stuff is almost a given if you're measuring lines of code. Python is just a higher-level language.
They are both very useful tools, just for different types of projects. Knowing both would be very useful, particularly when you need to interface to some new functionality in python that doesn't yet have a python binding.
For the types of projects most developers work on python is going to be more consice and quicker to write and debug. You may be able to make a library of reusable C code, but a good python programmer will be doing the same thing with their python code, at a higher level.

I think Python is more productive for small projects (up to a few thousand lines of code).
On the other hand, C is better suited for large projects (even though IMHO there are better languages for that, such as Ada): static type-checking allows to find many errors at compile time that are much more difficult to detect at run-time, especially in a large program.
In a larger C project, the lack of lists and other powerful data structures that are found in Python can be compensated by implementing or using custom libraries. I agree with user stacker that by using well-designed libraries your C code can be pretty concise.

Depends greatly on the task and the size of the project. For many small interesting tasks, I would not be surprised by 100:1 smaller Python code simply because the standard libraries are extremely good. If you find, buy, or build C/C++ libraries that do what you want, I imagine the ratio would be much more like 3:1 on big projects.
However, finding, buying, and building C/C++ libraries does take time and effort, so I believe in the vast majority of cases, Python is going to be much faster to develop in.

what next after pyparsing?

I have a huge grammar developed for pyparsing as part of a large, pure Python application.
I have reached the limit of performance tweaking and I'm at the point where the diminishing returns make me start to look elsewhere. Yes, I think I know most of the tips and tricks and I've profiled my grammar and my application to dust.
What next?
I hope to find a parser that gives me the same readability, usability (I'm using many advanced features of pyparsing such as parse-actions to start the post processing of the input which is being parsed) and python integration but at 10× the performance.
I love the fact the the grammar is pure Python.
All my basic blocks are regular expressions, so reusing them would be nice.
I know I can't have everything so I am willing to give up on some of the features I have today to get to the requested 10× performance.
Where do I go from here?

It looks like the pyparsing folks have anticipated your problem. From https://github.com/pyparsing/pyparsing/blob/master/docs/HowToUsePyparsing.rst :
Performance of pyparsing may be slow for complex grammars and/or large input strings. The psyco package can be used to improve the speed of the pyparsing module with no changes to grammar or program logic - observed improvments have been in the 20-50% range.
However, as Vangel noted in the comments below, psyco is an obsolete project as of March 2012. Its successor is the PyPy project, which starts from the same basic approach to performance: use a JIT native-code compiler instead of a bytecode interpreter. You should be able to achieve similar or greater gains with PyPy if switching Python implementations will work for you.
If you're really a speed demon, but want to keep some of the legibility and declarative syntax, I'd suggest having a look at ANTLR. Probably not the Python-generating backend; I'm skeptical whether that's mature or high-performance enough for your needs. I'm talking about the goods: the C backend that started it all.
Wrap a Python C extension module around the entry point to the parser, and turn it loose.
Having said that, you'll be giving up a lot in this transition: basically any Python you want to do in your parser will have to be done through the C API (not altogether pretty). Also, you'll have to get used to very different ways of doing things. ANTLR has its charms, but it's not based on combinators, so there's not the easy and fluid relationship between your grammar and your language that there is in pyparsing. Plus, it's its own DSL, much like lex/yacc, which can present a learning curve – but, because it's LL based, you'll probably find it easier to adapt to your needs.

Switch to a generated C/C++ parser (using ANTLR, flex/bison, etc.). If you can delay all the action rules until after you are done parsing, you might be able to build an AST with trivial code and then pass that back to your python code via something like SWIG and process on it with your current actions rules. OTOH, for that to give you a speed boost, the parsing has to be the heavy lifting. If your action rules are the big cost, then this will buy you nothing unless you write your action rules in C as well (but you might have to do it anyway to avoid paying for whatever impedance mismatch you get between the python and C code).

If you really want performance for large grammars, look no farther than SimpleParse (which itself relies on mxTextTools, a C extension). However, know now that it comes at the cost of being more cryptic and requiring that you be well-versed in EBNF.
It's definitely not the more Pythonic route, and you're going to have to start all over with an EBNF grammar to use SimpleParse.

A bit late to the party, but PLY (Python Lex-Yacc), has served me very well. PLY gives you a pure Python framework for constructing lex-based tokenizers, and yacc-based LR parsers.
I went this route when I hit performance issues with pyparsing.
Here is a somewhat old but still interesting article on Python parsing which includes benchmarks for ANTLR, PLY and pyparsing. PLY is roughly 4 times faster than pyparsing in this test.

There's no way to know what kind of benefit you'll get without just testing it, but it's within the range of possibility that you could get 10x benefit just from using Unladen Swallow if your process is long-running and repetitive. (Also, if you have many things to parse and you typically start a new interpreter for each one, Unladen Swallow gets faster - to a point - the longer you run your process, so while parsing one input might not show much gain, you might get significant gains on the 2nd and 3rd inputs in the same process).
(Note: pull the latest out of SVN - you'll get far better performance than the latest tarball)

Python on an Real-Time Operation System (RTOS)

I am planning to implement a small-scale data acquisition system on an RTOS platform. (Either on a QNX or an RT-Linux system.)
As far as I know, these jobs are performed using C / C++ to get the most out of the system. However I am curious to know and want to learn some experienced people's opinions before I blindly jump into the coding action whether it would be feasible and wiser to write everything in Python (from low-level instrument interfacing through a shiny graphical user interface). If not, mixing with timing-critical parts of the design with "C", or writing everything in C and not even putting a line of Python code.
Or at least wrapping the C code using Python to provide an easier access to the system.
Which way would you advise me to work on? I would be glad if you point some similar design cases and further readings as well.
Thank you
NOTE1: The reason of emphasizing on QNX is due to we already have a QNX 4.25 based data acquisition system (M300) for our atmospheric measurement experiments. This is a proprietary system and we can't access the internals of it. Looking further on QNX might be advantageous to us since 6.4 has a free academic licensing option, comes with Python 2.5, and a recent GCC version. I have never tested a RT-Linux system, don't know how comparable it to QNX in terms of stability and efficiency, but I know that all the members of Python habitat and non-Python tools (like Google Earth) that the new system could be developed on works most of the time out-of-the-box.

I've built several all-Python soft real-time (RT) systems, with primary cycle times from 1 ms to 1 second. There are some basic strategies and tactics I've learned along the way:
Use threading/multiprocessing only to offload non-RT work from the primary thread, where queues between threads are acceptable and cooperative threading is possible (no preemptive threads!).
Avoid the GIL. Which basically means not only avoiding threading, but also avoiding system calls to the greatest extent possible, especially during time-critical operations, unless they are non-blocking.
Use C modules when practical. Things (usually) go faster with C! But mainly if you don't have to write your own: Stay in Python unless there really is no alternative. Optimizing C module performance is a PITA, especially when translating across the Python-C interface becomes the most expensive part of the code.
Use Python accelerators to speed up your code. My first RT Python project greatly benefited from Psyco (yeah, I've been doing this a while). One reason I'm staying with Python 2.x today is PyPy: Things always go faster with LLVM!
Don't be afraid to busy-wait when critical timing is needed. Use time.sleep() to 'sneak up' on the desired time, then busy-wait during the last 1-10 ms. I've been able to get repeatable performance with self-timing on the order of 10 microseconds. Be sure your Python task is run at max OS priority.
Numpy ROCKS! If you are doing 'live' analytics or tons of statistics, there is NO way to get more work done faster and with less work (less code, fewer bugs) than by using Numpy. Not in any other language I know of, including C/C++. If the majority of your code consists of Numpy calls, you will be very, very fast. I can't wait for the Numpy port to PyPy to be completed!
Be aware of how and when Python does garbage collection. Monitor your memory use, and force GC when needed. Be sure to explicitly disable GC during time-critical operations. All of my RT Python systems run continuously, and Python loves to hog memory. Careful coding can eliminate almost all dynamic memory allocation, in which case you can completely disable GC!
Try to perform processing in batches to the greatest extent possible. Instead of processing data at the input rate, try to process data at the output rate, which is often much slower. Processing in batches also makes it more convenient to gather higher-level statistics.
Did I mention using PyPy? Well, it's worth mentioning twice.
There are many other benefits to using Python for RT development. For example, even if you are fairly certain your target language can't be Python, it can pay huge benefits to develop and debug a prototype in Python, then use it as both a template and test tool for the final system. I had been using Python to create quick prototypes of the "hard parts" of a system for years, and to create quick'n'dirty test GUIs. That's how my first RT Python system came into existence: The prototype (+Psyco) was fast enough, even with the test GUI running!
-BobC
Edit: Forgot to mention the cross-platform benefits: My code runs pretty much everywhere with a) no recompilation (or compiler dependencies, or need for cross-compilers), and b) almost no platform-dependent code (mainly for misc stuff like file handling and serial I/O). I can develop on Win7-x86 and deploy on Linux-ARM (or any other POSIX OS, all of which have Python these days). PyPy is primarily x86 for now, but the ARM port is evolving at an incredible pace.

I can't speak for every data acquisition setup out there, but most of them spend most of their "real-time operations" waiting for data to come in -- at least the ones I've worked on.
Then when the data does come in, you need to immediately record the event or respond to it, and then it's back to the waiting game. That's typically the most time-critical part of a data acquisition system. For that reason, I would generally say stick with C for the I/O parts of the data acquisition, but there aren't any particularly compelling reasons not to use Python on the non-time-critical portions.
If you have fairly loose requirements -- only needs millisecond precision, perhaps -- that adds some more weight to doing everything in Python. As far as development time goes, if you're already comfortable with Python, you would probably have a finished product significantly sooner if you were to do everything in Python and refactor only as bottlenecks appear. Doing the bulk of your work in Python will also make it easier to thoroughly test your code, and as a general rule of thumb, there will be fewer lines of code and thus less room for bugs.
If you need to specifically multi-task (not multi-thread), Stackless Python might be beneficial as well. It's like multi-threading, but the threads (or tasklets, in Stackless lingo) are not OS-level threads, but Python/application-level, so the overhead of switching between tasklets is significantly reduced. You can configure Stackless to multitask cooperatively or preemptively. The biggest downside is that blocking IO will generally block your entire set of tasklets. Anyway, considering that QNX is already a real-time system, it's hard to speculate whether Stackless would be worth using.
My vote would be to take the as-much-Python-as-possible route -- I see it as low cost and high benefit. If and when you do need to rewrite in C, you'll already have working code to start from.

Generally the reason advanced against using a high-level language in a real-time context is uncertainty -- when you run a routine one time it might take 100us; the next time you run the same routine it might decide to extend a hash table, calling malloc, then malloc asks the kernel for more memory, which could do anything from returning instantly to returning milliseconds later to returning seconds later to erroring, none of which is immediately apparent (or controllable) from the code. Whereas theoretically if you write in C (or even lower) you can prove that your critical paths will "always" (barring meteor strike) run in X time.

Our team have done some work combining multiple languages on QNX and had quite a lot of success with the approach. Using python can have a big impact on productivity, and tools like SWIG and ctypes make it really easy to optimize code and combine features from the different languages.
However, if you're writing anything time critical, it should almost certainly be written in c. Doing this means you avoid the implicit costs of an interpreted langauge like the GIL (Global Interpreter Lock), and contention on many small memory allocations. Both of these things can have a big impact on how your application performs.
Also python on QNX tends not to be 100% compatible with other distributions (ie/ there are sometimes modules missing).

One important note: Python for QNX is generally available only for x86.
I'm sure you can compile it for ppc and other archs, but that's not going to work out of the box.

Why is (python|ruby) interpreted?

What are the technical reasons why languages like Python and Ruby are interpreted (out of the box) instead of compiled? It seems to me like it should not be too hard for people knowledgeable in this domain to make these languages not be interpreted like they are today, and we would see significant performance gains. So certainly I am missing something.

Several reasons:
faster development loop, write-test vs write-compile-link-test
easier to arrange for dynamic behavior (reflection, metaprogramming)
makes the whole system portable (just recompile the underlying C code and you are good to go on a new platform)
Think of what would happen if the system was not interpreted. Say you used translation-to-C as the mechanism. The compiled code would periodically have to check if it had been superseded by metaprogramming. A similar situation arises with eval()-type functions. In those cases, it would have to run the compiler again, an outrageously slow process, or it would have to also have the interpreter around at run-time anyway.
The only alternative here is a JIT compiler. These systems are highly complex and sophisticated and have even bigger run-time footprints than all the other alternatives. They start up very slowly, making them impractical for scripting. Ever seen a Java script? I haven't.
So, you have two choices:
all the disadvantages of both a compiler and an interpreter
just the disadvantages of an interpreter
It's not surprising that generally the primary implementation just goes with the second choice. It's quite possible that some day we may see secondary implementations like compilers appearing. Ruby 1.9 and Python have bytecode VM's; those are ½-way there. A compiler might target just non-dynamic code, or it might have various levels of language support declarable as options. But since such a thing can't be the primary implementation, it represents a lot of work for a very marginal benefit. Ruby already has 200,000 lines of C in it...
I suppose I should add that one can always add a compiled C (or, with some effort, any other language) extension. So, say you have a slow numerical operation. If you add, say Array#newOp with a C implementation then you get the speedup, the program stays in Ruby (or whatever) and your environment gets a new instance method. Everybody wins! So this reduces the need for a problematic secondary implementation.

Exactly like (in the typical implementation of) Java or C#, Python gets first compiled into some form of bytecode, depending on the implementation (CPython uses a specialized form of its own, Jython uses JVM just like a typical Java, IronPython uses CLR just like a typical C#, and so forth) -- that bytecode then gets further processed for execution by a virtual machine (AKA interpreter), which may also generate machine code "just in time" -- known as JIT -- if and when warranted (CLR and JVM implementations often do, CPython's own virtual machine typically doesn't but can be made to do so e.g. with psyco or Unladen Swallow).
JIT may pay for itself for sufficiently long-running programs (if memory's way cheaper than CPU cycles), but it may not (due to slower startup times and larger memory footprint), especially when the types also have to be inferred or specialized as part of the code generation. Generating machine code without type inference or specialization is easy if that's what you want, e.g. freeze does it for you, but it really doesn't present the advantages that "machine code fetishists" attribute to it. E.g., you get an executable binary of 1.5 to 2 MB in lieu of a tiny "hello world" .pyc -- not much point!-). That executable is stand-alone and distributable as such, but it will only work on a very specific narrow range of operating systems and CPU architectures, so the tradeoffs are quite iffy in most cases. And, the time it takes to prepare the executable is quite long indeed, so it would be a crazy choice to make that mode of operation the default one.

Merely replacing an interpreter with a compiler won't give you as big a performance boost as you might think for a language like Python. When most time is actually spend doing symbolic lookups of object members in dictionaries, it doesn't really matter if the call to the function performing such lookup is interpreted, or is native machine code - the difference, while not quite negligible, will be dwarfed by lookup overhead.
To really improve performance, you need optimizing compilers. And optimization techniques here are very different from what you have with C++, or even Java JIT - an optimizing compiler for a dynamically typed / duck typed language such as Python needs to do some very creative type inference (including probabilistic - i.e. "90% chance of it being T" and then generating efficient machine code for that case with a check/branch before it) and escape analysis. This is hard.

I think the biggest reason for the languages being interpreted is portability. As a programmer you can write code that will run in an interpreter not a specific OS. So your programs behave more uniformly across platforms (more so than compiled languages). Another advantage I can think of is it's easier to have a dynamic type system in an interpreted language. I think the creators of the language were thinking having a language where programmers can be more productive due to automatic memory management, dynamic type system and meta programming wins over any performance loss due to the language being interpreted. If you are concerned about performance you can always compile the language to native machine code employing a technique like JIT compilation.

Today, there is no longer a strong distinction between "compiled" and "interpreted" languages. Python is in fact compiled just as much as Java is, the only differences are:
The Python compiler is much faster than the Java compiler
Python automatically compiles source code as it is executed, there is no separate "compile" step required
Python bytecode is different from JVM bytecode
Python even has a function called compile() which is an interface to the compiler.
It sounds like the distinction you are making is between "dynamically typed" and "statically typed" languages. In dynamic languages such as Python, you can write code like:
def fn(x, y):
return x.foo(y)
Notice that the types of x and y are not specified. At runtime, this function will look at x to see whether it has a member function named foo, and if so will call it with y. If not, it will throw a runtime error that indicates no such function was found. This sort of runtime lookup is much easier to represent using an intermediate representation like bytecode, where a runtime VM does the lookup instead of having to generate machine code to do the lookup itself (or, call a function to do the lookup which is what the bytecode will do anyway).
Python has projects such as Psyco, PyPy, and Unladen Swallow that take various approaches to compiling Python object code into something closer to native code. There is active research in this area but there is not (as yet) a simple answer.

The effort required to create a good compiler to generate native code for a new language is staggering. Small research groups typically take 5 to 10 years (examples: SML/NJ, Haskell, Clean, Cecil, lcc, Objective Caml, MLton, and many others). And when the language in question requires type checking and other decisions to be made at run time, a compiler writer has to work much harder to get good native-code performance (for an excellent example, see work by Craig Chambers and later Urs Hoelzle on Self). The performance gains you might hope for are harder to realize than you might think. This phenomenon partly explains why so many dynamically typed languages are interpreted.
As noted, a decent interpreter is also instantly portable, while porting compilers to new machine architectures takes substantial effort (and is a problem I personally have been working on for over 20 years, with some time off for good behavior). So an interpreter is a way to reach a wide audience quickly.
Finally, although fast compilers and slow interpreters exist, it's usually easer to make the edit-translate-go cycle faster by using an interpreter. (For some nice examples of fast compilers see the aforementioned lcc as well as Ken Thompson's go compiler. For an example of a relatively slow interpreter see GHCi.

Well, isn't one of the strengths of these languages that they are so easily scriptable? They wouldn't be if they were compiled. And on the other hand, dynamic languages are easier to intereprete than to compile.

In a compiled language, the loop you get into when making software is
Make a change
Compile changes
Test changes
goto 1
Interpreted languages tend to be faster to make stuff in because you get to cut out step two of that process (and when you're dealing with a large system where compile times can be upwards of two minutes, step two can add a significant amount of time).
This isn't necessarily the reason python|ruby designers thought of, but keep in mind that "How efficiently does the machine run this?" is only half the software development problem.
It also seems like it would be easier to compile code in a language that's interpreted naturally than it would be to add an interpreter to a language that's compiled by default.

REPL. Don't knock it 'till you've tried it. :)

By design.
The authors wanted something where they can write scripts into.
Python gets compiled the first time it is executed though

Compiling Ruby at least is notoriously hard. I'm working on one, and as part of that I wrote a blog post enumerating some of the issues here.
Specifically, Ruby is suffering from a very unclear (i.e. non-existent) boundary between the "read" and "execute" phase of the program that makes it hard to compile efficiently. You could just emulate what the interpreter does, but then you're not going to see much speed up, so it wouldn't be worth the effort. If you want to compile it efficiently you then face a lot of additional complications to handle the extreme level of dynamism in Ruby.
The good news is that there are techniques for overcoming this. Self, Smalltalk and Lisp/Scheme's have dealt quite successfully with most of the same issues. But it takes time to sift through it and figure out how to make it work with Ruby. It also doesn't help that Ruby has a very convoluted grammar.

Raw compute performance is probably not a goal of most interpreted languages. Interpreted languages are typically more concerned about programmer productivity than raw speed. In most cases these languages are plenty fast enough for the tasks the languages were designed to tackle.
Given that, and that just about the only advantages of a compiler are type checking (difficult to do in a dynamic language) and speed, there's not much incentive to write compilers for most interpreted languages.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.