According to Wikipedia
Computer scientists consider a language "type-safe" if it does not allow operations or conversions that violate the rules of the type system.
Since Python's runtime checks ensure that the type system's rules are satisfied, we should consider Python a type-safe language.
The same point is made by Jason Orendorff and Jim Blandy in Programming Rust:
Note that being type safe is independent of whether a language checks types at compile time or at run time: C checks at compile time, and is not type safe; Python checks at runtime, and is type safe.
Both separate the notion of static type checking from that of type safety.
Is that correct?
Many programmers will equate static type checking with type safety:
"language A has static type checking and so it is type-safe"
"language B has dynamic type checking and so it is not type-safe"
Sadly, it's not that simple.
In the Real World
For example, C and C++ are not type-safe because you can undermine the type system via type punning.
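As a rough illustration of what type punning means, here is a Python sketch that uses the struct module to make the reinterpretation explicit; in C the same trick happens via unions or pointer casts, with no runtime checks:

import struct

# Pack a float into its raw 8 bytes, then reinterpret those same bytes
# as an unsigned 64-bit integer: two unrelated types, one bit pattern.
raw = struct.pack("<d", 1.0)
(as_int,) = struct.unpack("<Q", raw)
print(hex(as_int))  # 0x3ff0000000000000, the IEEE-754 encoding of 1.0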
Also, the C/C++ language specifications extensively allow undefined behaviour (UB) rather than explicitly handling errors, and this has become the source of security exploits such as the stack smashing exploit and the format string attack. Such exploits shouldn't be possible in type-safe languages. Early versions of Java had a type bug in its Generics that proved it was not completely type-safe.
Still today, for programming languages like Python, Java, C++, ... it's hard to show that these languages are completely type-safe because it requires a mathematical proof. These languages are massive and compilers/interpreters have bugs that are continually being reported and getting fixed.
[ Wikipedia ] Many languages, on the other hand, are too big for human-generated type safety proofs, as they often require checking thousands of cases. [...] certain errors may occur at run-time due to bugs in the implementation, or in linked libraries written in other languages; such errors could render a given implementation type unsafe in certain circumstances.
In Academia
Type safety and type systems, while applicable to real-world programming, have their roots and definitions in academia, and so a formal definition of exactly what "type safety" is comes with difficulty, especially when talking about real programming languages used in the real world. Academics like to mathematically (formally) define tiny programming languages called toy languages. Only for these languages is it possible to show formally that they are type-safe (and to prove that the operations are logically correct).
[ Wikipedia ] Type safety is usually a requirement for any toy language proposed in academic programming language research
For example, academics struggled to prove Java is type-safe, so they created a smaller version called Featherweight Java and proved in a paper that it is type-safe. Similarly, this Ph.D. paper by Christopher Lyon Anderson took a subset of JavaScript, called it JS0, and proved it was type-safe.
It's practically assumed that proper languages like Python, Java, and C++ are not completely type-safe because they are so large. It's easy for a tiny bug to slip through the cracks and undermine the type system.
Summary
No, Python is probably not completely type-safe – nobody has proved it, and it's too hard to prove. You're more likely to find a tiny bug in the language that demonstrates it is not type-safe.
In fact, most programming languages are probably not completely type-safe - all for the same reasons (only toy academic ones have been proven to be)
You really shouldn't believe statically typed languages are necessarily type-safe. They are usually safer than dynamically typed languages, but to say that they are completely type-safe with certainty is wrong, as there's no proof for this.
References: http://www.pl-enthusiast.net/2014/08/05/type-safety/
and https://en.wikipedia.org/wiki/Type_system
Not in your wildest dreams.
#!/usr/bin/python
counter = 100   # An integer assignment
miles = 1000.0  # A floating point
name = "John"   # A string

print(counter)
print(miles)
print(name)

counter = "Mary had a little lamb"
print(counter)
When you run that you see:
python p1.py
100
1000.0
John
Mary had a little lamb
You cannot consider any language "type safe" by any stretch of the imagination when it allows you to switch a variable's content from integer to string without any significant effort.
In the real world of professional software development what we mean by "type safe" is that the compiler will catch the stupid stuff. Yes, in C/C++ you can take extraordinary measures to circumvent type safety. You can declare something like this
union BAD_UNION
{
    long number;
    char str[4];
} data;
But the programmer has to go the extra mile to do that. We didn't have to go even an extra inch to butcher the counter variable in Python.
A programmer can do nasty things with casting in C/C++ but they have to deliberately do it; not accidentally.
The one place that will really burn you is class casting. When you declare a function/method with a base-class parameter and then pass in a pointer to a derived class, you don't always get the methods and variables you want, because the method/function expects the base type. If you overrode any of that in your derived class, you have to account for it in the method/function.
In the real world a "type safe" language helps protect a programmer from accidentally doing stupid things. It also protects the human species from fatalities.
Consider an insulin or infusion pump. Something that pumps limited amounts of life saving/prolonging chemicals into the human body at a desired rate/interval.
Now consider what happens when there is a logic path that has the pump stepper control logic trying to interpret the string "insulin" as the integer amount to administer. The outcome will not be good. Most likely it will be fatal.
Because nobody has said it yet, it's also worth pointing out that Python is a strongly typed language, which should not be confused with dynamically typed. Python defers type checking until the last possible moment, which usually results in an exception being thrown. This explains the behavior Mureinik mentions. That having been said, Python also often does automatic conversion, meaning that it will attempt to convert an int to a float for an arithmetic operation, for example.
You can enforce type safety in your programs manually by checking the types of inputs. Because everything is an object, you can always create classes that derive from base classes, and use the isinstance function to verify the type (at runtime, of course). Python 3 has added type hints, but they are not enforced. And mypy has added static type checking to the language if you care to use it, but this does not guarantee type safety.
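A minimal sketch of that manual approach (the class names here are invented for illustration):

class Animal(object):
    pass

class Dog(Animal):
    pass

def feed(pet):
    # Manual runtime check, since nothing enforces the type for us.
    if not isinstance(pet, Animal):
        raise TypeError("expected an Animal, got %s" % type(pet).__name__)
    print("feeding a", type(pet).__name__)

feed(Dog())       # OK
feed("a string")  # raises TypeError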
The Wikipedia article associates type safety with memory safety, meaning that the same memory area cannot be accessed as, e.g., both an integer and a string. In this way Python is type-safe. You cannot change the type of an object implicitly.
In Python, you'll get a runtime error if you use a variable from the wrong type in the wrong context. E.g.:
>>> 'a' + 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
Since this check only happens in runtime, and not before you run the program, Python is not a typesafe language (PEP-484 notwithstanding).
in general, large scale / complex systems need type checking, first at compile time (static) and then at run time (dynamic). this is not academia, but rather a simple, common-sense rule of thumb like "compiler is your friend". beyond the runtime performance implication, there are other major implications, as follows:
the 3 axes of scalability are:
build time (ability to design and manufacture safe systems in time and budget)
runtime (obvious)
maintain time (ability to maintain (fix bugs) and extend existing systems in a safe manner, generally by refactoring)
the only way to do safe refactoring is to have everything fully tested (use test-driven development or at least unit testing, with at least decent coverage testing as well; this is not qa, this is development/r&d). what is not covered will break, and systems like that are garbage rather than engineering artifacts.
now let's say that we have a simple function, sum, returning the sum of two numbers. one can imagine doing unit testing on this function, based on the fact that both parameters and returned type are known. we are not talking about function templates, which boil down to the trivial example. please write a simple unit test on the same function called sum where both parameters and return type can literally be of any kind, they can be integers, floats, strings and/or any other kind of user defined types having the plus operator overloaded/implemented. how do you write such a simple test case?!? how complex does the test case need to be in order to cover every possible scenario?
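to make the asymmetry concrete, here is a sketch (names invented): with known int arguments a couple of cases pin down the behavior, while the untyped version needs a case per plausible type combination.

import unittest

def typed_sum(a, b):
    # contract (by convention here): both arguments are ints
    return a + b

class TestTypedSum(unittest.TestCase):
    def test_ints(self):
        self.assertEqual(typed_sum(1, 1), 2)
        self.assertEqual(typed_sum(-1, 1), 0)

# an untyped sum(a, b) also accepts floats, strings, lists, and any
# user-defined type with __add__; each combination is another case:
#   sum(1.0, 1), sum("a", "b"), sum([1], [2]), ...
# and some mixes (e.g. sum("a", 1)) raise TypeError instead of returning.

if __name__ == "__main__":
    unittest.main()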
complexity means cost. without proper unit testing and test coverage, there is no safe way to do any refactoring, so the product is maintenance garbage (if not immediately visible, then clearly in the long term), because performing any refactoring blind would be like driving a car without a driver's license, drunk as a skunk and, of course, without insurance.
go figure! :-)
assume that you have a function sum, taking two arguments
if arguments are un-typed (can be anything) then... well... that is unacceptable for any serious software engineer working on real-life large systems
here is why:
a naive answer would be "compiler is your friend". despite being about 65 years old, this is true, and hey, it is not only about having static types! ide(s) use compiler services for a lot of things which, for the average joe programmer, look like magic... (code completion, design-time (editing) assistance, etc.)
a more realistic reason consists in something completely unknown to developers without a strong background in computer science and, more, in software engineering. there are 3 axes of scalability: a. design/write and deploy, b. runtime & c. maintain time, based on refactoring. which one do you think is the most expensive, being clearly recurring on any real-life serious system? the third one (c). in order to satisfy (c), you need to do it safely. in order to do any safe refactoring, you need to have unit testing AND coverage testing (so you can estimate how much your unit testing suite actually covers). remember, when something is not automatically tested, it will break (at run time, late in the cycle, at the customer's site, you name it). SO, in order to have a decent product, you need decent unit testing and test coverage
now, let's get to our intellectually challenging function (sum). if sum(a,b) does not specify the types of a and b, there is no way to do decent unit testing. tests like assert sum(1,1) == 2 ARE A LIE, because they do not cover anything but the assumed integer arguments. in real life, when a and b can be of any type whatsoever, there is no way to write real unit testing against function sum! various frameworks even pretend to derive test-coverage results from crippled test cases like the one described above. that is (obviously) another LIE.
that's all i had to say! thanks for reading, the only reason i posted this is, perhaps, to make you think of this and, maybe (MAYBE..) one day to do software engineering...
We just had a big error in a piece of code. The error was because we had this:
if sys.errno:
    my_favorite_files.append(sys.errno)
instead of this:
if args.errno:
    my_favorite_files.append(sys.errno)
The aggressive casting of anything to Boolean, because it makes if statements easier, is something that I would not expect to find in a language that is type-safe.
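A stripped-down version of the same trap (variable names invented): any non-empty object sails through an if that was meant to test an integer error code.

errno = "no such file"   # oops: a string ended up where an int was expected

# Python coerces any object to Boolean in an if statement; non-empty
# strings are truthy, so this branch runs despite there being no error code.
if errno:
    print("recording error", errno)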
Before coming to a concrete example, let me mention the problem. As a beginning Python programmer with extensive experience in C++, I'm always missing variable declarations. I could yield to the temptation of documenting the type of every nontrivial identifier, but I have a feeling that that would not be terribly pythonic. For one thing, it would be silly that neither the interpreter nor any tool parses these informal declarations. And if the interpreter did, that would be an entirely different language.
As an alternative to writing mere comments, I am contemplating switching to a mode of creating datatypes whose only purpose is to enforce types/interfaces. They would streamline the code and would let me detect type errors at earlier stages. For this convenience I would be paying with a small loss of efficiency from the indirection.
For example, to avoid writing as a comment "Dictionary of Employee objects indexed by employeeID", I would write a wrapper class called "EmployeeDict", whose interface would limit the operations that can/cannot be performed.
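A minimal sketch of what that wrapper might look like (Employee and EmployeeId are stub classes here, and the interface is deliberately narrow):

class EmployeeId(object):
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return hash(self.value)
    def __eq__(self, other):
        return isinstance(other, EmployeeId) and self.value == other.value

class Employee(object):
    def __init__(self, name):
        self.name = name

class EmployeeDict(object):
    """Dictionary of Employee objects indexed by EmployeeId."""
    def __init__(self):
        self._data = {}

    def add(self, employee_id, employee):
        # The wrapper, not the interpreter, enforces the key/value types.
        if not isinstance(employee_id, EmployeeId):
            raise TypeError("key must be an EmployeeId")
        if not isinstance(employee, Employee):
            raise TypeError("value must be an Employee")
        self._data[employee_id] = employee

    def get(self, employee_id):
        return self._data[employee_id]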
Would such an idea fly in the long term? Does it defeat the spirit of Python in some way? Is it used by experienced Pythonistas?
For those conversant in C++, I would in other words be translating
typedef std::map<EmployeeId, Employee> MyMap;
into a type. (Though I am not actually porting any code across.)
Update
Even if it's unpythonic, as HumphreyTriscuit confirms, I am loath to write comments that get read by humans without also automating the type checking a little. It's nice that this issue is resolved in 3.5, but I'm stuck for the time being with 2.7, and so I'll mark jsbueno's answer correct until someone can suggest a way (à la "assert isinstance(param, dict)", but one that also concisely confirms the type of the key/value, somewhat paralleling C++) to solve this problem in 2.7.
Actually, as of Python 3.5, the language comes bundled with tools for parameter type annotations that are introspectable by third-party tools, of which there might be some out there already.
Anyway, take a look at https://www.python.org/dev/peps/pep-0484/
Even if you don't use any other tools, the way described in PEP 484 above is the "Pythonic way" of declaring types, and it won't conflict with other third-party tools. So, if you want to write a tool chain of your own as you describe, you should start by using function annotations as described in that PEP.
That is good for documenting (and enforcing, if need be) parameters and return values. For class attributes, you can check this answer of mine, based on crafting a special __setitem__ method on a base class of your hierarchy:
Force python class member variable to be specific type
As for local variables, there is no way to enforce/check their type other than code comments.
And one last piece of advice to keep you "on the Python way": remember to be permissive and check for interfaces, rather than specific classes.
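For instance, checking for the mapping interface instead of the concrete dict class keeps duck typing intact (Python 3 shown; on 2.7 the ABC lives directly in the collections module):

from collections.abc import Mapping  # collections.Mapping on Python 2.7

def load_settings(config):
    # Accept any mapping-like object (dict, OrderedDict, a custom proxy...),
    # rather than insisting on dict itself.
    if not isinstance(config, Mapping):
        raise TypeError("expected a mapping, got %s" % type(config).__name__)
    return dict(config)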
I rather like Python's syntactic sugar and standard library functions.
However, the one feature I dislike is implicit typing.
Is there a distribution of Python with explicit typing which is still compatible with, e.g., packages on PyPI?
[I was looking into RPython]
From Python 3, the ability to use type annotations was introduced into the Python standard with PEP 3107.
Fast-forward to Python 3.5, and PEP 484 builds on this to introduce type hinting along with the typing module, which enables one to specify the type of a variable or the return type of a function.
from typing import Iterator

def fib(n: int) -> Iterator[int]:
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b
Above example taken from https://pawelmhm.github.io
According to the PEP 484 notes:
While these annotations are available at runtime through the usual __annotations__ attribute, no type checking happens at runtime. Instead, the proposal assumes the existence of a separate off-line type checker which users can run over their source code voluntarily. Essentially, such a type checker acts as a very powerful linter. (While it would of course be possible for individual users to employ a similar checker at run time for Design By Contract enforcement or JIT optimization, those tools are not yet as mature.)
tl;dr
Although Python provides this form of "static typing", it is not enforced at run time; the Python interpreter simply ignores any type specifications you have provided and will still use duck typing to infer types. Therefore, it is up to you to find a linter which will detect any issues with the types.
Furthermore
The motivation for including typing in the Python standard was mostly influenced by mypy, so it might be worth checking it out. The mypy project also provides examples which may prove useful.
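As a tiny example of what such an off-line checker catches, a checker like mypy flags the call below as an incompatible argument type, while plain CPython runs it without complaint:

def double(n: int) -> int:
    return n * 2

# an off-line checker reports the str argument as incompatible with int;
# CPython just evaluates "ha" * 2 to "haha", quietly violating the hint
print(double("ha"))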
The short answer is no. What you are asking for is deeply built into Python, and can't be changed without changing the language so drastically that it wouldn't be Python.
I'm assuming you don't like variables that are re-typed when re-assigned to? You might consider other ways to check for this if this is a problem with your code.
No. You can not have your cake and eat it too.
Python is great because it's dynamically typed! Period. (That's why it has such a nice standard library, too.)
There are only 2 advantages to statically typed languages: 1) speed (when the algorithms are right to begin with) and 2) compilation errors
As for 1)
Use PyPy,
Profile,
Use ctypes libs for great performance.
It's typical to have only 10% or less of the code be performance-critical. All that other 90%? Enjoy the advantages of dynamic typing.
As for 2)
Use Classes (And contracts)
Use Unit Testing
Use refactoring
Use good code editor
It's typical to have data NOT FITTING into standard data types, which are too strict or too loose in what they allow to be stored in them. Make sure that You validate Your data on Your own.
Unit Testing is a must-have for algorithm testing, which no compiler can do for You, and should catch any problems arising from wrong data types (and unlike a compiler, tests are as fine-grained as you need them to be)
Refactoring solves all those issues when You are not sure if given changes won't break Your code (and again, strongly typed data can not guarantee that either).
And good code editor can solve so many problems... Use Sublime Text for a while. And then You will know what I mean.
(To be sure, I do not give You the answer You want to have. Rather, I question Your needs, especially those that You did not include in Your question.)
Now in 2021, there's a library called Deal that not only provides a robust static type checker, but also allows you to specify pre- and post-conditions, loop invariants, explicitly state expectations regarding exceptions and IO/side-effects, and even formally prove correctness of code (for an albeit small subset of Python).
Here's an example from their GitHub:
import deal
from typing import List

# the result is always non-negative
@deal.post(lambda result: result >= 0)
# the function has no side-effects
@deal.pure
def count(items: List[str], item: str) -> int:
    return items.count(item)

# generate test function
test_count = deal.cases(count)
Now we can:
Run python3 -m deal lint or flake8 to statically check errors.
Run python3 -m deal test or pytest to generate and run tests.
Just use the function in the project and check errors at runtime.
Since comments are limited...
As an interpreted language, Python is by definition weakly typed. This is not so much a bad thing as a control in place for the programmer to preempt potential syntactic bugs, but in truth it won't stop logical bugs from happening, and thus makes the point moot.
Even though the paper on RPython makes its point, it is focused on object-oriented programming. You must bear in mind that Python is more an amalgamation of OOP and functional programming, and likely others too.
I encourage reading of this page, it is very informative.
I'm a long time Python developer and I really love the dynamic nature of the language, but I wonder if Python would benefit from optional static typing.
Would it be beneficial to be able to apply static typing to the API of a library, and what would the disadvantages of this be?
I quickly sketched up a decorator implementing runtime-static type checking on pastebin and it works like this:
# A TypeError will be thrown if the argument "string" is not a "str" and if
# the returned value is not an "int"
@typed(int, string=str)
def getStringLength(string):
    return len(string)
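The pastebin contents aren't reproduced here, but a naive decorator with that call signature might look roughly like this (a sketch, not the asker's actual code):

import functools
import inspect

def typed(return_type, **arg_types):
    # Check declared argument types and the return type at call time.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = inspect.signature(func).bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                expected = arg_types.get(name)
                if expected is not None and not isinstance(value, expected):
                    raise TypeError("%s must be %s, got %s" % (
                        name, expected.__name__, type(value).__name__))
            result = func(*args, **kwargs)
            if not isinstance(result, return_type):
                raise TypeError("return value must be %s, got %s" % (
                    return_type.__name__, type(result).__name__))
            return result
        return wrapper
    return decorator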
Would it be practical to use a decorator like this on the API functions of a library? In my point of view, type checking is not needed in the internal workings of a domain-specific module of a library, but at the connection points between the library and its client, a simple version of design by contract by applying type checking could be useful. Especially as a kind of enforced documentation which clearly states to the client of the library what it expects and returns.
Like this example, where addObjectToQueue() and isObjectProcessed() are exposed for use by the client and processTheQueueAndDoAdvancedStuff() is an internal library function. I think type checking could be useful on the outward-facing functions, but would only bloat and restrict the dynamic nature and usefulness of Python if used on the internal functions.
# some_library_module.py
import random

@typed(int, name=str)
def addObjectToQueue(name):
    return random.randint(0, 2**31)  # Some object id

def processTheQueueAndDoAdvancedStuff(arg_of_library_specific_type):
    # Function body here
    pass

@typed(bool, object_id=int)
def isObjectProcessed(object_id):
    return True
What would the disadvantages of using this technique be?
What would the disadvantages of my naive implementation on pastebin be?
I don't want answers discussing the conversion of Python to a statically typed language, but thoughts about API design-specific pros/cons. (please move this to programmers.stackexchange.com if you consider it not a question)
Personally, I don't find this idea attractive for Python. This is all just my opinion, of course, but for context I'll tell you that Python and Haskell are probably my two favourite programming languages - I like languages at both extreme ends of the static vs dynamic typing spectrum.
I see the main benefits of static typing as follows:
Increased likelihood that your code is correct once the compiler has accepted it; if I know I've threaded my values through all the operations I invoked in such a way that the result type of one always matches the input type of another, and the final result type is the one I wanted, it increases the probability that I've selected the correct operations. This point is of deeply arguable value, since it only really matters if you're not testing very much, which would be bad. But it is true that, when programming in Haskell, when I sit back and say "there, done!" I am actually done a lot of the time, whereas that's almost never true of my Python code.
The compiler automatically points out most of the places that need changing when I make an incompatible change to a data structure or interface (most of the time). Again, tests are still needed to actually be sure you've caught all the implications, but most of the time the compiler's nagging is actually sufficient, in my experience, which deeply simplifies such refactoring; you can go straight from implementing the core of the refactoring to testing that the program still works okay, because the actual work of making all the flow-on changes is almost mechanical.
Efficient implementation. The compiler gets to use all the knowledge it has about types to do optimisation.
Your suggested system doesn't really provide any of these benefits.
Having written a program making use of your library, I still don't know if it contains any type-incorrect uses of your functions until I do extensive testing with full code coverage to see if any execution path contains a bad call.
When I refactor something, I need to go through many many rounds of "run full test suite, look for exception, find where it came from, fix the code" to get anything at all like a static-typing compiler's problem detection.
Python will still be behaving as if those variables could be anything at any time.
And to get even that much, you've sacrificed the flexibility of Python duck-typing; it's not enough that I provide a sufficiently "list-like" object, I have to actually provide a list.
To me, this sort of static typing is the worst of both worlds. The main dynamic typing argument is "you have to test your code anyway, so you may as well use those tests to catch type errors and free yourself from having to work around the type system when it doesn't help you". That may or may not be a good argument with respect to a really good static type system, but it absolutely is a compelling argument with respect to a weak partial static type system that only detects type errors at runtime. I don't think nicer error messages (which is all it really buys you most of the time; a type error not caught at the interface is almost certainly going to throw an exception deeper in the call stack) is worth the loss of flexibility.
What are the technical reasons why languages like Python and Ruby are interpreted (out of the box) instead of compiled? It seems to me like it should not be too hard for people knowledgeable in this domain to make these languages not be interpreted like they are today, and we would see significant performance gains. So certainly I am missing something.
Several reasons:
faster development loop, write-test vs write-compile-link-test
easier to arrange for dynamic behavior (reflection, metaprogramming)
makes the whole system portable (just recompile the underlying C code and you are good to go on a new platform)
Think of what would happen if the system was not interpreted. Say you used translation-to-C as the mechanism. The compiled code would periodically have to check if it had been superseded by metaprogramming. A similar situation arises with eval()-type functions. In those cases, it would have to run the compiler again, an outrageously slow process, or it would have to also have the interpreter around at run-time anyway.
The only alternative here is a JIT compiler. These systems are highly complex and sophisticated and have even bigger run-time footprints than all the other alternatives. They start up very slowly, making them impractical for scripting. Ever seen a Java script? I haven't.
So, you have two choices:
all the disadvantages of both a compiler and an interpreter
just the disadvantages of an interpreter
It's not surprising that generally the primary implementation just goes with the second choice. It's quite possible that some day we may see secondary implementations like compilers appearing. Ruby 1.9 and Python have bytecode VMs; those are ½-way there. A compiler might target just non-dynamic code, or it might have various levels of language support declarable as options. But since such a thing can't be the primary implementation, it represents a lot of work for a very marginal benefit. Ruby already has 200,000 lines of C in it...
I suppose I should add that one can always add a compiled C (or, with some effort, any other language) extension. So, say you have a slow numerical operation. If you add, say Array#newOp with a C implementation then you get the speedup, the program stays in Ruby (or whatever) and your environment gets a new instance method. Everybody wins! So this reduces the need for a problematic secondary implementation.
Exactly like (in the typical implementation of) Java or C#, Python gets first compiled into some form of bytecode, depending on the implementation (CPython uses a specialized form of its own, Jython uses JVM just like a typical Java, IronPython uses CLR just like a typical C#, and so forth) -- that bytecode then gets further processed for execution by a virtual machine (AKA interpreter), which may also generate machine code "just in time" -- known as JIT -- if and when warranted (CLR and JVM implementations often do, CPython's own virtual machine typically doesn't but can be made to do so e.g. with psyco or Unladen Swallow).
JIT may pay for itself for sufficiently long-running programs (if memory's way cheaper than CPU cycles), but it may not (due to slower startup times and larger memory footprint), especially when the types also have to be inferred or specialized as part of the code generation. Generating machine code without type inference or specialization is easy if that's what you want, e.g. freeze does it for you, but it really doesn't present the advantages that "machine code fetishists" attribute to it. E.g., you get an executable binary of 1.5 to 2 MB in lieu of a tiny "hello world" .pyc -- not much point!-). That executable is stand-alone and distributable as such, but it will only work on a very specific narrow range of operating systems and CPU architectures, so the tradeoffs are quite iffy in most cases. And, the time it takes to prepare the executable is quite long indeed, so it would be a crazy choice to make that mode of operation the default one.
Merely replacing an interpreter with a compiler won't give you as big a performance boost as you might think for a language like Python. When most time is actually spent doing symbolic lookups of object members in dictionaries, it doesn't really matter if the call to the function performing such a lookup is interpreted or is native machine code; the difference, while not quite negligible, will be dwarfed by the lookup overhead.
To really improve performance, you need optimizing compilers. And optimization techniques here are very different from what you have with C++, or even Java JIT - an optimizing compiler for a dynamically typed / duck typed language such as Python needs to do some very creative type inference (including probabilistic - i.e. "90% chance of it being T" and then generating efficient machine code for that case with a check/branch before it) and escape analysis. This is hard.
I think the biggest reason for the languages being interpreted is portability. As a programmer you can write code that will run in an interpreter, not on a specific OS, so your programs behave more uniformly across platforms (more so than with compiled languages). Another advantage I can think of is that it's easier to have a dynamic type system in an interpreted language. I think the creators of the language were thinking that having a language where programmers can be more productive, due to automatic memory management, a dynamic type system and metaprogramming, wins over any performance loss due to the language being interpreted. If you are concerned about performance, you can always compile the language to native machine code employing a technique like JIT compilation.
Today, there is no longer a strong distinction between "compiled" and "interpreted" languages. Python is in fact compiled just as much as Java is, the only differences are:
The Python compiler is much faster than the Java compiler
Python automatically compiles source code as it is executed, there is no separate "compile" step required
Python bytecode is different from JVM bytecode
Python even has a function called compile() which is an interface to the compiler.
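For example:

# compile() turns source text into a code object, the same artifact the
# interpreter produces automatically and caches in .pyc files.
code = compile("print('hello')", "<string>", "exec")
print(type(code))  # <class 'code'>
exec(code)         # hello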
It sounds like the distinction you are making is between "dynamically typed" and "statically typed" languages. In dynamic languages such as Python, you can write code like:
def fn(x, y):
    return x.foo(y)
Notice that the types of x and y are not specified. At runtime, this function will look at x to see whether it has a member function named foo, and if so will call it with y. If not, it will throw a runtime error that indicates no such function was found. This sort of runtime lookup is much easier to represent using an intermediate representation like bytecode, where a runtime VM does the lookup instead of having to generate machine code to do the lookup itself (or, call a function to do the lookup which is what the bytecode will do anyway).
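You can see that bytecode with the standard dis module (the exact opcodes vary between Python versions):

import dis

def fn(x, y):
    return x.foo(y)

# The disassembly shows generic "look up attribute 'foo' on x, then call
# it with y" instructions; nothing about the types of x or y is baked in.
dis.dis(fn)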
Python has projects such as Psyco, PyPy, and Unladen Swallow that take various approaches to compiling Python object code into something closer to native code. There is active research in this area but there is not (as yet) a simple answer.
The effort required to create a good compiler to generate native code for a new language is staggering. Small research groups typically take 5 to 10 years (examples: SML/NJ, Haskell, Clean, Cecil, lcc, Objective Caml, MLton, and many others). And when the language in question requires type checking and other decisions to be made at run time, a compiler writer has to work much harder to get good native-code performance (for an excellent example, see work by Craig Chambers and later Urs Hoelzle on Self). The performance gains you might hope for are harder to realize than you might think. This phenomenon partly explains why so many dynamically typed languages are interpreted.
As noted, a decent interpreter is also instantly portable, while porting compilers to new machine architectures takes substantial effort (and is a problem I personally have been working on for over 20 years, with some time off for good behavior). So an interpreter is a way to reach a wide audience quickly.
Finally, although fast compilers and slow interpreters exist, it's usually easier to make the edit-translate-go cycle faster by using an interpreter. (For some nice examples of fast compilers see the aforementioned lcc as well as Ken Thompson's Go compiler. For an example of a relatively slow interpreter, see GHCi.)
Well, isn't one of the strengths of these languages that they are so easily scriptable? They wouldn't be if they were compiled. And on the other hand, dynamic languages are easier to interpret than to compile.
In a compiled language, the loop you get into when making software is
Make a change
Compile changes
Test changes
goto 1
Interpreted languages tend to be faster to make stuff in because you get to cut out step two of that process (and when you're dealing with a large system where compile times can be upwards of two minutes, step two can add a significant amount of time).
This isn't necessarily the reason python|ruby designers thought of, but keep in mind that "How efficiently does the machine run this?" is only half the software development problem.
It also seems like it would be easier to compile code in a language that's interpreted naturally than it would be to add an interpreter to a language that's compiled by default.
REPL. Don't knock it 'till you've tried it. :)
By design.
The authors wanted something they could write scripts in.
Python gets compiled the first time it is executed though
Compiling Ruby at least is notoriously hard. I'm working on one, and as part of that I wrote a blog post enumerating some of the issues here.
Specifically, Ruby is suffering from a very unclear (i.e. non-existent) boundary between the "read" and "execute" phase of the program that makes it hard to compile efficiently. You could just emulate what the interpreter does, but then you're not going to see much speed up, so it wouldn't be worth the effort. If you want to compile it efficiently you then face a lot of additional complications to handle the extreme level of dynamism in Ruby.
The good news is that there are techniques for overcoming this. Self, Smalltalk and Lisp/Scheme have dealt quite successfully with most of the same issues. But it takes time to sift through it all and figure out how to make it work with Ruby. It also doesn't help that Ruby has a very convoluted grammar.
Raw compute performance is probably not a goal of most interpreted languages. Interpreted languages are typically more concerned about programmer productivity than raw speed. In most cases these languages are plenty fast enough for the tasks the languages were designed to tackle.
Given that, and that just about the only advantages of a compiler are type checking (difficult to do in a dynamic language) and speed, there's not much incentive to write compilers for most interpreted languages.
I would be interested to learn about large scale development in Python and especially in how do you maintain a large code base?
When you make incompatible changes to the signature of a method, how do you find all the places where that method is being called? In C++/Java the compiler will find them for you; how do you do it in Python?
When you make changes deep inside the code, how do you find out what operations an instance provides, since you don't have a static type to look up?
How do you handle/prevent typing errors (typos)?
Are unit tests used as a substitute for static type checking?
As you can guess, I have almost only worked with statically typed languages (C++/Java), but I would like to try my hand at Python for larger programs. But I had a very bad experience, a long time ago, with the Clipper (dBase) language, which was also dynamically typed.
Don't use a screwdriver as a hammer
Python is not a statically typed language, so don't try to use it that way.
When you use a specific tool, you use it for what it has been built. For Python, it means:
Duck typing: no type checking. Only behavior matters. Therefore your code must be designed to use this feature. A good design means generic signatures, no dependencies between components, high abstraction levels... So if you change anything, you won't have to change the rest of the code. Python will not complain either; that's what it was built for. Types are not an issue.
Huge standard library. You do not need to change all your calls in the program if you use standard features you haven't coded yourself. And Python comes with batteries included. I keep discovering them every day. I had no idea of the number of modules I could use when I started, and I tried to rewrite existing stuff like everybody does. It's OK, you can't get it all right from the beginning.
You don't write Java, C++, Python, PHP, Erlang, or whatever the same way. There are good reasons why there is room for so many different languages: they do not do the same things.
Unit tests are not a substitute
Unit tests must be performed with any language. The most famous unit test library (JUnit) is from the Java world!
This has nothing to do with types. You check behaviors, again. You avoid trouble with regression. You assure your customer you are on track.
Python for large scale projects
Languages, libraries and frameworks don't scale. Architectures do.
If you design a solid architecture, and if you are able to make it evolve quickly, then it will scale. Unit tests help, automatic code checks as well. But they are just safety nets. And small ones.
Python is especially suitable for large projects because it enforces some good practices and has a lot of common design patterns built in. But again, do not use it for what it is not designed for. E.g., Python is not a technology for CPU-intensive tasks.
In a huge project, you will most likely use several different technologies anyway. Such as a DBMS (SGBD in French) and a templating language, or whatever else. Python is no exception.
You will probably want to use C/C++ for the parts of your code that need to be fast. Or Java to fit in a Tomcat environment. Don't know, don't care. Python can play well with these.
As a conclusion
My answer may feel a bit rude, but don't get me wrong: this is a very good question.
A lot of people come to Python with old habits. I screwed up trying to write Java in Python. You can, but you will never get the best of it.
If you have played / want to play with Python, it's great! It's a wonderful tool. But just a tool, really.
I had some experience with modifying "Frets On Fire", an open source python "Guitar Hero" clone.
as I see it, Python is not really suitable for a really large-scale project.
I found myself spending a large part of the development time debugging issues related to assignment of incompatible types, things that statically typed languages will reveal effortlessly at compile time.
also, since types are determined at run time, trying to understand existing code becomes harder, because you have no idea what the type of the parameter you are currently looking at is.
in addition to that, calling functions using their name string with the getattr built-in function is generally more common in Python than in other programming languages, thus making the call graph to a certain function somewhat hard to get (although you can call functions by their name in some statically typed languages as well).
I think that Python really shines in small scale software, rapid prototype development, and gluing existing programs together, but I would not use it for large scale software projects, since in those types of programs maintainability becomes the real issue, and in my opinion python is relatively weak there.
Since nobody pointed out pychecker, pylint and similar tools, I will: pychecker and pylint are tools that can help you find incorrect assumptions (about function signatures, object attributes, etc.) They won't find everything that a compiler might find in a statically typed language -- but they can find problems that such compilers for such languages can't find, too.
Python (and any dynamically typed language) is fundamentally different in terms of the errors you're likely to cause and how you would detect and fix them. It has definite downsides as well as upsides, but many (including me) would argue that in Python's case, the ease of writing code (and the ease of making it structurally sound) and of modifying code without breaking API compatibility (adding new optional arguments, providing different objects that have the same set of methods and attributes) make it suitable just fine for large codebases.
my 0.10 EUR:
i have several python applications in 'production' state. our company uses java, c++ and python. we develop with the eclipse ide (pydev for python)
unittests are the key-solution for the problem. (also for c++ and java)
the less secure world of "dynamic-typing" will make you less careless about your code quality
BY THE WAY:
large scale development doesn't mean, that you use one single language!
large scale development often uses a handful of languages specific to the problem.
so i agree with the-hammer-problem :-)
PS: static-typing & python
Here are some items that have helped me maintain a fairly large system in python.
Structure your code in layers, i.e. separate business logic, presentation logic and your persistence layers. Invest a bit of time in defining these layers, and make sure everyone on the project is bought in. For large systems, creating a framework that forces you into a certain way of development can be key as well.
Tests are key; without unit tests you will likely end up with an unmanageable code base several times quicker than with other languages. Keep in mind that unit tests are often not sufficient; make sure to have several integration/acceptance tests you can run quickly after any major change.
Use the Fail Fast principle. Add assertions for cases where you feel your code may be vulnerable (see the sketch after this list).
Have standard logging/error handling that will help you quickly navigate to the issue
Use an IDE (PyDev works for me) that provides type-ahead and PyLint/Checker integration to help you detect common typos right away and to promote some coding standards.
Be careful about your imports: never do from x import *, and don't do relative imports without the use of .
Do refactor; a search/replace tool with regular expressions is often all you need to do move-method or class-level refactorings.
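The fail-fast sketch mentioned above (the function and its checks are invented for illustration):

def apply_discount(price, rate):
    # Fail fast: reject bad inputs at the boundary instead of letting a
    # wrong type or range corrupt data three layers deeper.
    assert isinstance(price, (int, float)) and price >= 0, \
        "price must be a non-negative number, got %r" % (price,)
    assert 0.0 <= rate <= 1.0, "rate must be in [0, 1], got %r" % (rate,)
    return price * (1.0 - rate)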
Incompatible changes to the signature of a method. This doesn't happen as much in Python as it does in Java and C++.
Python has optional arguments, default values, and far more flexibility in defining method signatures. Also, duck typing means that -- for example -- you don't have to switch from some class to an interface as part of a significant software change. Things just aren't as complex.
How do you find all the places where that method is being called? grep works for dynamic languages. If you need to know every place a method is used, grep (or equivalent IDE-supported search) works great.
How do you find out what operations an instance provides, since you don't have a static type to look up?
a. Look at the source. You don't have the Java/C++ problem of object libraries and jar files to contend with. You don't need all the elaborate aids and tools that those languages require.
b. An IDE can provide signature information under many common circumstances. You can easily defeat your IDE's reasoning powers. When that happens, you should probably review what you're doing to be sure it makes sense. If your IDE can't reason out your type information, perhaps it's too dynamic.
c. In Python, you often work through the interactive interpreter. Unlike Java and C++, you can explore your instances directly and interactively. You don't need a sophisticated IDE.
Example:
>>> x= SomeClass()
>>> dir(x)
How do you handle/prevent typing errors? The same as in static languages: you don't prevent them. You find and correct them. Java can only find a certain class of typos. If you have two similar class or variable names, you can wind up in deep trouble, even with static type checking.
Example:
class MyClass { }
class MyClassx extends MyClass { }
A typo with these two class names can cause havoc. ["But I wouldn't put myself in that position with Java," folks say. Agreed. I wouldn't put myself in that position with Python, either; you make classes that are profoundly different, and they will fail early if they're misused.]
Are unit tests used as a substitute for static type checking? Here's the other point of view: static type checking is a substitute for clear, simple design.
I've worked with programmers who weren't sure why an application worked. They couldn't figure out why things didn't compile; they didn't know the difference between an abstract superclass and an interface, and they couldn't figure out why a change in one place made a bunch of other modules in a separate JAR file crash. The static type checking gave them false confidence in a flawed design.
Dynamic languages allow programs to be simple. Simplicity is a substitute for static type checking. Clarity is a substitute for static type checking.
My general rule of thumb is to use dynamic languages for small non-mission-critical projects and statically-typed languages for big projects. I find that code written in a dynamic language such as python gets "tangled" more quickly. Partly that is because it is much quicker to write code in a dynamic language and that leads to shortcuts and worse design, at least in my case. Partly it's because I have IntelliJ for quick and easy refactoring when I use Java, which I don't have for python.
The usual answer to that is testing, testing, testing. You're supposed to have an extensive unit test suite and run it often, particularly before a new version goes online.
Proponents of dynamically typed languages make the case that you have to test anyway because even in a statically typed language conformance to the crude rules of the type system covers only a small part of what can potentially go wrong.