Why are there no languages that are both interpreted and (really) compiled? - python

I am an (old) engineer not a programmer so forgive me for asking a naïve question.
My understanding is that to get really fast execution times for a program, it needs to be compiled to native machine code. And there are a relatively small number of languages still in use that do this (e.g. C and C++).
But I much prefer the syntax of Python over that of the C-derived compiled languages. However my understanding is that interpreted Python (and pseudo-compiled Python run on a virtual machine) cannot match the execution speed of a truly compiled language.
Is there some reason that a true native-code Python compiler cannot be developed?
[I am interested specifically in Python but I am not aware of any language that can be interpreted and also compiled to native machine code.]

The key distinction is a clean separation between compile time and run time. In Python, for instance, import happens at runtime, and can happen conditionally. And per the halting problem, that means a compiler cannot determine up front if a given import will happen. Yet, this affects the code that would need to be generated.
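For instance, a minimal sketch of such a conditional, runtime import (using the standard-library winreg module, which exists only on Windows builds):

import sys

if sys.platform == "win32":
    import winreg          # this module only exists on Windows builds of Python
else:
    winreg = None          # on other platforms the import simply never happens

# Whether winreg gets imported is only known while the program runs,
# yet that decision changes what code a compiler would have to generate.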
As Bill the Lizard notes, if the language does have a clean distinction, an interpreter can still choose to ignore it. C's #include is resolved before main ever runs, but that does not mean an interpreter must handle it that way.
Outside the syntax, Python is also virtually uncompilable due to dynamic typing. In C, + has a very limited set of meanings - integer, floating point or pointer - and the compiler will know the static type of the arguments. C++ has far more extensive overloading, but the same basic principle applies. Virtual functions allow some run-time flexibility, but only from a finite set of options, all compiled before main starts.
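To make the contrast concrete, here is a small sketch of how + in Python means whatever the runtime types say it means (the Meters class is invented purely for illustration):

class Meters:
    # A made-up type whose meaning for '+' is defined by ordinary runtime code.
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        return Meters(self.value + other.value)

print(1 + 2)                              # integer addition
print("ab" + "cd")                        # string concatenation
print((Meters(3) + Meters(4)).value)      # user-defined '+', resolved only at runtime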
Another non-syntax issue is the memory model: both C and C++ have a memory model that is an improved derivative of Java's memory model, which makes threading quite efficient. (Unlike Java, not every object can be synchronized; you need special members.) As CPUs gain more and more cores, the advantages only continue to grow. Compilers can see pretty well where memory and CPU registers need to be brought in sync.

Related

Access variable from several threads in Python language

What guarantees does Python provide about the execution of this code when multiple threads are involved:
a=b
where a and b are integer variables (i.e. instances of Python's int class), and both "a" and "b" are global variables.
Q1: I don't think this statement necessarily has to be translated into something like "mov [a], [b]" (in x86). If you write x86 assembly directly, you have the option of specifying "lock mov [a], [b]". Of course, that is far too low level, and a scripting language like Python, I bet, executes a lot of instructions on top of it.
Still, a language like C++ has the volatile keyword, which prevents the compiler from optimizing a variable into a CPU register.
How do you specify that in the Python language? Is it possible at all?
Q2: Even if some form of volatile behavior can be achieved (maybe), there is another problem with this line of code: how do you put a memory fence? For people who work with Python/Bash and less with compiled languages: threads may execute on separate CPU cores, and sometimes you may be unlucky and read a stale value from a CPU core's cache. In a language compiled to x86 or ARM instructions, you are responsible for that at the end of the day, but for Python I have no idea who is responsible.
You cannot make any general statements about the execution of a program that is written in Python, as it depends too much on the implementation details of the interpreter that is being used (CPython, Jython, IronPython, etc.), or the libraries that are used by the program. As far as I know, the Python language itself barely makes any promises at this level, and doesn't expose any relevant mechanics, with the exception of high-level concepts such as threads, processes, mutexes, etc.
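As a sketch of what those high-level mechanisms look like in practice, the assignment from the question can be guarded with a mutex from the threading module (variable names taken from the question, the rest is illustrative only):

import threading

a, b = 0, 1
lock = threading.Lock()

def writer():
    global a
    with lock:              # every thread touching 'a' must take the same lock
        a = b

def reader():
    with lock:
        return a

t = threading.Thread(target=writer)
t.start()
t.join()
print(reader())             # 1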
If you're concerned about these low-level aspects of your program's execution, then don't use just Python. Instead, implement the critical parts of your codebase in C or C++, and use Python bindings to interact with those components. That way, you can manage memory and execution problems at the appropriate level with programming languages that are better suited to it, while still using Python to tie most of the codebase together.
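A minimal sketch of that split using the standard ctypes module (the library name libfastpath.so and the function hot_loop are made up; a real project might instead use Cython, SWIG or pybind11):

import ctypes

# Load a hypothetical C library that implements the performance-critical part.
lib = ctypes.CDLL("./libfastpath.so")
lib.hot_loop.argtypes = [ctypes.c_int]
lib.hot_loop.restype = ctypes.c_long

def run(n):
    return lib.hot_loop(n)   # heavy lifting in C, orchestration in Python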

C++ shared_ptr vs. Python object

AFAIK, the use of shared_ptr is often discouraged because of potential bugs caused by careless usage of them (unless you have a really good explanation for significant benefit and carefully checked design).
On the other hand, Python objects seem to be essentially shared_ptrs (ref_count and garbage collection).
I am wondering what makes them work nicely in Python but potentially dangerous in C++. In other words, what are the differences between Python and C++ in dealing with shared_ptr that makes their usage discouraged in C++ but not causing similar problems in Python?
I know e.g. Python automatically detects cycles between objects which prevents memory leaks that dangling cyclic shared_ptrs can cause in C++.
"I know e.g. Python automatically detects cycles" -- that's what makes them work nicely, at least so far as the "potential bugs" relate to memory leaks.
Besides which, C++ programs are more commonly written under tight performance constraints than Python programs (which IMO is a combination of different genuine requirements with some fairly bogus differences in rules-of-thumb, but that's another story). A fairly high proportion of the Python objects I use don't strictly need reference counting, they have exactly one owner and a unique_ptr would be fine (or for that matter a data member of class type). In C++ it's considered (by the people writing the advice you're reading) worth taking the performance advantage and the explicitly simplified design. In Python it's usually not considered a problem, you pay the performance and you keep the flexibility to decide later that it's shared after all without any code change required (other than to take additional references that outlive the original, I mean).
Btw in any language, shared mutable objects have "potential bugs" associated with them, if you lose track of what objects will or won't change when you're not looking at them. I don't just mean race conditions: even in a single-threaded program you need to be aware that C++ Predicates shouldn't change anything and that you (often) can't mutate a container while iterating over it. I don't see this as a difference between C++ and Python, though. Rather, to some extent you should be slightly wary of shared objects in Python too, and when you proliferate references to an object at least understand why you're doing it.
So, on to the list of issues in the question you link to:
cyclic references -- as mentioned, Python rolls its sleeves up, finds them and frees them. For reasons to do with the design and specific uses of the languages, cycle-breaking garbage collection is rather difficult to implement in C++, although not impossible.
creating multiple unrelated shared_ptrs to the same object -- no analog is possible in Python, since the reference-counter isn't open to the user to mess up.
Constructing an anonymous temporary shared pointer -- doesn't arise in Python; there's no risk of a memory leak that way, since there's no "gap" in which the object exists but is not yet subject to collection if it becomes unreferenced.
Calling the get() function to get the raw pointer and use it after the pointed-to object goes out of scope -- well, you can mess this up if you're writing Python/C, but not in pure Python.
Passing a reference to, or a raw pointer to, a shared_ptr should be dangerous too, since it won't increment the internal count -- there's no means in Python to add a reference without the language taking care of the refcount.
we passed 'this' to some thread workers instead of 'shared_from_this' -- in other words, forgot to create a shared_ptr when needed. Can't do this in Python.
most of the predicates you know and love from <functional> don't play nicely with shared_ptr -- Python refcounting is so built in to the runtime (or I suppose to be precise I should say: garbage collection is so built in to the language design) that there are no libraries that fail to cope with it.
Using shared_ptr for really small objects (like char or short) could be an overhead -- the issue exists in Python, and Python programmers generally don't sweat it. If you need an array of "primitive type" then you can use numpy to reduce overhead. Sometimes Python programs run out of memory and you need to do something about it, that's life ;-)
Giving out a shared_ptr<T> to this inside a class definition is also dangerous. Use enable_shared_from_this instead -- it may not be obvious, but this is "don't create multiple unrelated shared_ptrs to the same object" again.
You need to be careful when you use shared_ptr in multithread code -- it's possible to create race conditions in Python too, this is part of "shared mutable objects are tricksy".
Most of this is to do with the fact that in C++ you have to explicitly do something to get refcounting, and you don't get it if you don't ask for it. This provides several opportunities for error that Python doesn't make available to the programmer because it just does it for you. If you use shared_ptr correctly then apart from the existence of libraries that don't co-operate with it, none of these problems comes up in C++ either. Those who are cautious of using it for these reasons are basically saying they're afraid they'll use it incorrectly, or at any rate more afraid than that they'll misuse some alternative. Much of C++ programming is trading different potential bugs off against each other until you come up with a design that you consider yourself competent to execute. Furthermore it has "don't pay for what you don't need" as a design philosophy. Between these two factors, you don't do anything without a really good explanation, a significant benefit, and a carefully checked design. shared_ptr is no different ;-)
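To make the "it just does it for you" point concrete, here is a minimal sketch of CPython's built-in cycle collector freeing a reference cycle that plain reference counting (or a careless shared_ptr design in C++) would leak:

import gc

class Node:
    pass

a, b = Node(), Node()
a.partner, b.partner = b, a   # a reference cycle: refcounts alone can never free it

del a, b                      # the cycle is now unreachable, but still alive
print(gc.collect() > 0)       # True: the cycle detector found and freed the garbage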
AFAIK, the use of shared_ptr is often discouraged because of potential bugs caused by careless usage of them (unless you have a really good explanation for significant benefit and carefully checked design).
I wouldn't agree. The tendency goes towards generally using these smart pointers unless you have a very good reason not to do so.
shared_ptr that makes their usage discouraged in C++ but not causing similar problems in Python?
Well, I don't know about your favourite largish signal processing framework ecosystem, but GNU Radio uses shared_ptrs for all their blocks, which are the core elements of the GNU Radio architecture. In fact, blocks are classes with private constructors, accessible only through a friend make function that returns a shared_ptr. We haven't had problems with this -- and GNU Radio had good reason to adopt such a model. Users never end up accessing deallocated block objects, and not a single block is leaked. Nice!
Also, we use SWIG and a gateway class for a few C++ types that can't just be represented well as Python types. All this works very well on both sides, C++ and Python. In fact, it works so very well, that we can use Python classes as blocks in the C++ runtime, wrapped in shared_ptr.
Also, we never had performance problems. GNU Radio is a high rate, highly optimized, heavily multithreaded framework.

compiler vs interpreter (on the basis of construction and design)

After viewing lots of posts about the difference between compilers and interpreters, I'm still not able to figure out the difference in their construction and internal mechanism.
The most common difference I read was that a compiler produces a target program which is executable { meaning machine code as its output } which can run on a system and then be fed with input.
Whereas an interpreter simply runs the input line by line { what exactly is happening here? } and produces the output.
My main doubts are :
1) A compiler consists of a lexical analyzer, parser, intermediate code generator and code generator, but what are the parts of an interpreter?
2) Who provides the run-time support for interpreted languages? I mean, who manages the heap and the stack for recursive functions?
3) This is specific to the Python language:
Python has a compiler stage and then an interpreter stage as well:
the compiler produces some bytecode, and then this bytecode is interpreted by its virtual machine.
if I were to design only the compiler for Python (Python -> bytecode)
a) will I have to manage memory { write code to manage stack and heap } for it?
b) how will this compiler differ from the traditional compiler or say interpreter?
I know this is a whole lot to ask here but I really want to understand these minute details.
I'm referring to the compiler book by Alfred V. Aho.
Based on the feedback and some further study I think I should modify my question
A compiler need not produce only machine code as its output
But one question is still bugging me
Let's say I want to design a (Python -> bytecode) compiler, and then the bytecode will be interpreted by the virtual machine (correct me if I'm wrong).
Then I'll have to write a lexical analyzer for Python, and then a parser which will generate some sort of abstract syntax tree. After this, do I have to generate some intermediate code (the 3-address code mentioned in the dragon book) or direct bytecode instructions (which I suppose will be given in the VM's documentation)?
Will I have to write code for handling the stack as well, to provide support for recursion and scope?
First off, "compiler" does not imply "outputs machine code". You can compile from any language to any other, be it a high-level programming language, some intermediate format, code for a virtual machine (bytecode) or code for a physical machine (machine code).
1. Like a compiler, an interpreter needs to read and understand the language it implements. Thus you have the same front-end code (though today's interpreters usually implement a far simpler language - the bytecode only - and therefore need only a very simple front end). Unlike a compiler, an interpreter's back end doesn't generate code, but executes it. Obviously, this is a different problem entirely, and hence an interpreter looks quite different from a compiler. It emulates a computer (often one that's far more high-level than real-life machines) instead of producing a representation of an equivalent program.
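As a toy illustration of that difference, here is a sketch of an interpreter back end for an invented three-instruction bytecode (nothing to do with CPython's real instruction set):

# A toy interpreter for an invented stack machine. A compiler back end
# would *emit* code for these operations; this back end simply *performs* them.
def run(program):
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "PRINT":
            print(stack.pop())

run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PRINT",)])   # prints 5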
2. Assuming today's somewhat-high-level virtual machines, this is the job of the interpreter - there are dedicated instructions for, among other things, calling functions and creating objects, and garbage collection is baked into the VM. When you target lower-level machines (such as the x86 instruction set), many such details need to be baked into the generated code, be it directly (system calls or whatever) or via calls into the C standard library implementation.
3.
a) Probably not, as a VM dedicated to Python won't require it. It would be very easy to screw up, unnecessary, and arguably incompatible with Python semantics, since it amounts to manual memory management. On the other hand, if you were to target something low-level like LLVM, you'd have to take special care - the details depend on the target language. That's part of why nobody does it.
b) It would be a perfectly fine compiler, and obviously not an interpreter. You'd probably have a simpler backend than a compiler targeting machine code, and due to the nature of the input language you wouldn't have quite as much analysis and optimization to do, but there's no fundamental difference.

From interpreted to native code: "dynamic" languages compiler support

First, I am aware that "dynamic languages" is a term used mainly by a vendor; I am using it just as a container word for languages like Perl (a favorite of mine), Python, Tcl, Ruby, PHP and so on. They are interpreted, but what interests me here are languages with a strong focus on programmer efficiency and support for the typical constructs of modern interpreted languages.
My question is: are there dynamic languages that can be compiled efficiently to native executable code - typically for Windows platforms? Which ones? Maybe using some third-party ad-hoc tools? I am not talking about huge executables that carry a full interpreter with them, or similar tricks, nor some smart module able to bundle its own dependencies or required modules, but honest, straight, standard, solid executable code.
If not, is there some technical reason inhibiting the availability of such a best-of-both-worlds feature?
Thanks!
Daniel
I think you're operating under a misunderstanding: These executables aren't huge because they just lump the interpreter in there, they're huge because the whole runtime is in there.
On Windows, most of your runtime is already installed, so you don't have to ship it. You think your program is small, but a quick look at the virtual memory mappings will tell you that even a small "hello-world" type program written in C is a couple megabytes big.
That's just how big useful runtimes are.
If you really want to keep your ship-size small, your only choice is to use the runtimes that are already there, and that means C/C++ and (recently) dot-net.
If you really can't swallow the runtime, Forth is as small as it gets.
The best, most aggressive dynamic languages with the best compilers for Windows are the commercial Lisps. They do a lot of inlining and pruning when producing executables, so you end up shipping only what you use. They are still 1.5x to 5x larger than C/C++ programs.
As far as languages that you know: Perl is as fat as they get. ActiveState has perlapp, which I'm sure you're already aware of but dismissed because of its size. Revisit it if you can.
Now, to answer your question - is there some technical reason inhibiting the availability of such a best-of-both-worlds feature? - yes.
Perl cannot be statically analyzed (proof), which means there's no way for a Perl compiler to tell what can be discarded. That means every part of Perl's runtime needs to be available to your program, because there's no way for your program to indicate which parts can be discarded.
That means that getting a smaller executable is equivalent to getting a smaller runtime, and you should be comfortable accepting that if the perl developers knew how to make the perl runtime smaller without discarding any features, they'd probably do it.
If you are willing to write in a strict subset of Python or PHP, these languages can be analyzed. Shed Skin and HipHop-php are pretty good, but they're still quite large, and they don't support all of Python's and PHP's features, which means that some modules will simply not work. To my knowledge, nobody has implemented pruning for either of these languages (most of the focus in these compilers is on improving their lackluster performance), and it may be another decade or more before anyone bothers; however, these will still be the restrictions you have to accept when doing this sort of thing.
The PyPy project does what you describe for a fairly complete subset of Python.
In the general case, this is a very hard problem to solve, largely due to the very attributes that make these languages "dynamic": late binding, weakly-typed variables, data structures and containers, eval facilities, a fuzzy divide between programming and meta-programming, etc. But a lot of effort is being poured into it, such as the JavaScript JIT-compiler projects listed here.
Shed Skin is an experimental (and restricted) Python-to-C++ compiler that can do what you describe. As Marcelo indicates above with PyPy, there are limitations on what you can compile with Shed Skin, but if you are willing to accept the restrictions, you can achieve large speedups.
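As a rough sketch of what such a "restricted subset" looks like in practice, code in the following style - every variable holding a single, inferable type - is what these compilers are built around (the function is just an example, not taken from any tool's documentation):

def dot(xs, ys):
    # xs and ys stay lists of floats and total stays a float, so a
    # whole-program type inferencer can turn this into plain C++ loops.
    total = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))   # 32.0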

Why is (python|ruby) interpreted?

What are the technical reasons why languages like Python and Ruby are interpreted (out of the box) instead of compiled? It seems to me like it should not be too hard for people knowledgeable in this domain to make these languages not be interpreted like they are today, and we would see significant performance gains. So certainly I am missing something.
Several reasons:
faster development loop, write-test vs write-compile-link-test
easier to arrange for dynamic behavior (reflection, metaprogramming)
makes the whole system portable (just recompile the underlying C code and you are good to go on a new platform)
Think of what would happen if the system was not interpreted. Say you used translation-to-C as the mechanism. The compiled code would periodically have to check if it had been superseded by metaprogramming. A similar situation arises with eval()-type functions. In those cases, it would have to run the compiler again, an outrageously slow process, or it would have to also have the interpreter around at run-time anyway.
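A minimal sketch of the kind of runtime redefinition such compiled code would have to check for (class and method names are invented):

class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
print(g.greet())                        # "hello"

# Metaprogramming: the method is swapped out while the program is running.
Greeter.greet = lambda self: "bonjour"
print(g.greet())                        # "bonjour" - code compiled against the
                                        # old definition would now be stale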
The only alternative here is a JIT compiler. These systems are highly complex and sophisticated and have even bigger run-time footprints than all the other alternatives. They start up very slowly, making them impractical for scripting. Ever seen a Java script? I haven't.
So, you have two choices:
all the disadvantages of both a compiler and an interpreter
just the disadvantages of an interpreter
It's not surprising that generally the primary implementation just goes with the second choice. It's quite possible that some day we may see secondary implementations like compilers appearing. Ruby 1.9 and Python have bytecode VMs; those are ½-way there. A compiler might target just non-dynamic code, or it might have various levels of language support declarable as options. But since such a thing can't be the primary implementation, it represents a lot of work for a very marginal benefit. Ruby already has 200,000 lines of C in it...
I suppose I should add that one can always add a compiled C (or, with some effort, any other language) extension. So, say you have a slow numerical operation. If you add, say Array#newOp with a C implementation then you get the speedup, the program stays in Ruby (or whatever) and your environment gets a new instance method. Everybody wins! So this reduces the need for a problematic secondary implementation.
Exactly like (in the typical implementation of) Java or C#, Python gets first compiled into some form of bytecode, depending on the implementation (CPython uses a specialized form of its own, Jython uses JVM just like a typical Java, IronPython uses CLR just like a typical C#, and so forth) -- that bytecode then gets further processed for execution by a virtual machine (AKA interpreter), which may also generate machine code "just in time" -- known as JIT -- if and when warranted (CLR and JVM implementations often do, CPython's own virtual machine typically doesn't but can be made to do so e.g. with psyco or Unladen Swallow).
JIT may pay for itself for sufficiently long-running programs (if memory's way cheaper than CPU cycles), but it may not (due to slower startup times and larger memory footprint), especially when the types also have to be inferred or specialized as part of the code generation. Generating machine code without type inference or specialization is easy if that's what you want, e.g. freeze does it for you, but it really doesn't present the advantages that "machine code fetishists" attribute to it. E.g., you get an executable binary of 1.5 to 2 MB in lieu of a tiny "hello world" .pyc -- not much point!-). That executable is stand-alone and distributable as such, but it will only work on a very specific narrow range of operating systems and CPU architectures, so the tradeoffs are quite iffy in most cases. And, the time it takes to prepare the executable is quite long indeed, so it would be a crazy choice to make that mode of operation the default one.
Merely replacing an interpreter with a compiler won't give you as big a performance boost as you might think for a language like Python. When most time is actually spent doing symbolic lookups of object members in dictionaries, it doesn't really matter if the call to the function performing such a lookup is interpreted or is native machine code - the difference, while not quite negligible, will be dwarfed by lookup overhead.
To really improve performance, you need optimizing compilers. And optimization techniques here are very different from what you have with C++, or even Java JIT - an optimizing compiler for a dynamically typed / duck typed language such as Python needs to do some very creative type inference (including probabilistic - i.e. "90% chance of it being T" and then generating efficient machine code for that case with a check/branch before it) and escape analysis. This is hard.
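A hand-written caricature of that check-then-specialize idea, in Python itself (a real optimizing compiler would emit machine code for the fast path rather than more Python):

import operator

def add_specialized(x, y):
    # Guard: in the common case ("90% chance of it being int") take a fast
    # path; here it stands in for tight, type-specialized machine code.
    if type(x) is int and type(y) is int:
        return x + y
    # Fallback: the fully generic, slower protocol for everything else.
    return operator.add(x, y)

print(add_specialized(2, 3))        # fast path
print(add_specialized("a", "b"))    # generic fallback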
I think the biggest reason for these languages being interpreted is portability. As a programmer you can write code that will run in an interpreter rather than on a specific OS, so your programs behave more uniformly across platforms (more so than compiled languages). Another advantage I can think of is that it's easier to have a dynamic type system in an interpreted language. I think the creators of these languages decided that having a language where programmers can be more productive - thanks to automatic memory management, a dynamic type system and metaprogramming - wins over any performance loss due to the language being interpreted. If you are concerned about performance you can always compile the language to native machine code employing a technique like JIT compilation.
Today, there is no longer a strong distinction between "compiled" and "interpreted" languages. Python is in fact compiled just as much as Java is, the only differences are:
The Python compiler is much faster than the Java compiler
Python automatically compiles source code as it is executed, there is no separate "compile" step required
Python bytecode is different from JVM bytecode
Python even has a function called compile() which is an interface to the compiler.
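For instance, you can invoke that compiler yourself and inspect the bytecode it produces (a small sketch; the exact instructions printed vary between CPython versions):

import dis

# Compile a tiny piece of source into a code object, then disassemble it.
code = compile("x = 1 + 2", "<example>", "exec")
dis.dis(code)    # prints the bytecode instructions the VM will interpret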
It sounds like the distinction you are making is between "dynamically typed" and "statically typed" languages. In dynamic languages such as Python, you can write code like:
def fn(x, y):
    return x.foo(y)
Notice that the types of x and y are not specified. At runtime, this function will look at x to see whether it has a member function named foo, and if so will call it with y. If not, it will throw a runtime error that indicates no such function was found. This sort of runtime lookup is much easier to represent using an intermediate representation like bytecode, where a runtime VM does the lookup instead of having to generate machine code to do the lookup itself (or, call a function to do the lookup which is what the bytecode will do anyway).
Python has projects such as Psyco, PyPy, and Unladen Swallow that take various approaches to compiling Python object code into something closer to native code. There is active research in this area but there is not (as yet) a simple answer.
The effort required to create a good compiler to generate native code for a new language is staggering. Small research groups typically take 5 to 10 years (examples: SML/NJ, Haskell, Clean, Cecil, lcc, Objective Caml, MLton, and many others). And when the language in question requires type checking and other decisions to be made at run time, a compiler writer has to work much harder to get good native-code performance (for an excellent example, see work by Craig Chambers and later Urs Hoelzle on Self). The performance gains you might hope for are harder to realize than you might think. This phenomenon partly explains why so many dynamically typed languages are interpreted.
As noted, a decent interpreter is also instantly portable, while porting compilers to new machine architectures takes substantial effort (and is a problem I personally have been working on for over 20 years, with some time off for good behavior). So an interpreter is a way to reach a wide audience quickly.
Finally, although fast compilers and slow interpreters exist, it's usually easier to make the edit-translate-go cycle faster by using an interpreter. (For some nice examples of fast compilers see the aforementioned lcc as well as Ken Thompson's go compiler. For an example of a relatively slow interpreter see GHCi.)
Well, isn't one of the strengths of these languages that they are so easily scriptable? They wouldn't be if they were compiled. And on the other hand, dynamic languages are easier to interpret than to compile.
In a compiled language, the loop you get into when making software is
Make a change
Compile changes
Test changes
goto 1
Interpreted languages tend to be faster to make stuff in because you get to cut out step two of that process (and when you're dealing with a large system where compile times can be upwards of two minutes, step two can add a significant amount of time).
This isn't necessarily the reason python|ruby designers thought of, but keep in mind that "How efficiently does the machine run this?" is only half the software development problem.
It also seems like it would be easier to compile code in a language that's interpreted naturally than it would be to add an interpreter to a language that's compiled by default.
REPL. Don't knock it 'till you've tried it. :)
By design.
The authors wanted something they could write scripts in.
Python gets compiled the first time it is executed though
Compiling Ruby at least is notoriously hard. I'm working on one, and as part of that I wrote a blog post enumerating some of the issues here.
Specifically, Ruby is suffering from a very unclear (i.e. non-existent) boundary between the "read" and "execute" phase of the program that makes it hard to compile efficiently. You could just emulate what the interpreter does, but then you're not going to see much speed up, so it wouldn't be worth the effort. If you want to compile it efficiently you then face a lot of additional complications to handle the extreme level of dynamism in Ruby.
The good news is that there are techniques for overcoming this. Self, Smalltalk and Lisp/Scheme have dealt quite successfully with most of the same issues. But it takes time to sift through it and figure out how to make it work with Ruby. It also doesn't help that Ruby has a very convoluted grammar.
Raw compute performance is probably not a goal of most interpreted languages. Interpreted languages are typically more concerned about programmer productivity than raw speed. In most cases these languages are plenty fast enough for the tasks the languages were designed to tackle.
Given that, and that just about the only advantages of a compiler are type checking (difficult to do in a dynamic language) and speed, there's not much incentive to write compilers for most interpreted languages.
