Can Python be made statically typed?

I know that Python is generally slower than languages like Fortran and C/C++ because it is interpreted rather than compiled.
Another reason I have read about is that it is slow because it is dynamically typed: you don't have to declare variable types, and the interpreter handles that automatically. This is very nice because it makes the code look much cleaner and you basically don't have to worry too much about variable types.
I know there may not be a very good reason to do this, since you can just wrap e.g. Fortran code with Python, but is it possible to manually override this dynamically typed nature of Python and declare all variable types manually, and thus increase Python's speed?

If I interpret your question as "Is there a statically-typed mode for Python?", then Cython probably comes closest to offering that functionality.
Cython is a superset of Python syntax - almost any valid Python code is also valid Cython code. The Cython compiler translates the quasi-Python source code to not-for-human-eyes C, which can then be compiled into a shared object and loaded as a Python module.
You can basically take your Python code and add as many or as few static type declarations as you like. Wherever types are undeclared, Cython will add in the necessary boilerplate to correctly infer them, at the cost of worse runtime performance. This essentially allows you to choose a point in the continuum between totally dynamically typed Python code and totally statically typed C code, depending on how much runtime performance you need and how much time you are prepared to spend optimizing. It also allows you to call C functions directly, making it a very convenient way to write Python bindings for external libraries.
To get a better idea of how this works in practice, take a look at the official tutorial.
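For a flavor of what this looks like, here is a minimal sketch in Cython's "pure Python" mode (the function and numbers are made up for illustration): the file is valid Python and runs unchanged under CPython (the import requires the Cython package to be installed), but when compiled with Cython the annotations become C type declarations and the loop runs at C speed.
import cython

def integrate_x_squared(a: cython.double, b: cython.double, n: cython.int) -> cython.double:
    # Compiled with Cython, i, dx and total become C variables;
    # under plain CPython the annotations are simply ignored.
    i: cython.int
    dx: cython.double = (b - a) / n
    total: cython.double = 0.0
    for i in range(n):
        total += (a + i * dx) ** 2 * dx
    return total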

Just to be clear, your question is about as odd as asking whether you can turn C into a dynamically typed language. If you want to redefine the language, then sure, you can do whatever you like. I don't think we'd call such a language "Python" anymore, though.
If you're looking for a speedup based on types that are discovered dynamically at runtime (a kind of dynamically inferred static typing), take a look at PyPy. It's also quite fast, if that's what you're looking for. Related to PyPy is RPython, which sort of does what you want.
Also mentioned previously is Cython, which likewise sort of does what you want.

Related

How to view the implementation of python's built-in functions in pycharm?

When I try to view the built-in function all() in PyCharm, all I see is "pass" in the function body. How can I view the actual implementation, so that I can know exactly what the built-in function is doing?
def all(*args, **kwargs): # real signature unknown
    """
    Return True if bool(x) is True for all values x in the iterable.
    If the iterable is empty, return True.
    """
    pass
Assuming you’re using the usual CPython interpreter, all is a builtin function object, which just has a pointer to a compiled function statically linked into the interpreter (or libpython). Showing you the x86_64 machine code at that address probably wouldn’t be very useful to the vast majority of people.
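You can verify this from the REPL: inspect can fetch the source of pure-Python functions, but a builtin has no Python source or bytecode to show (the exact error wording varies by Python version):
>>> import inspect
>>> inspect.getsource(all)
Traceback (most recent call last):
  ...
TypeError: module, class, method, function, traceback, frame, or code object was expected, got builtin_function_or_method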
Try running your code in PyPy instead of CPython. Many things that are builtins in CPython are plain old Python code in PyPy.[1] Of course that isn’t always an option (e.g., PyPy doesn’t support 3.7 features yet, there are a few third-party extension modules that are still way too slow to use, it’s harder to build yourself if you’re on some uncommon platform…), so let’s go back to CPython.
The actual C source for that function isn’t too hard to find online. It’s in bltinmodule.c. But, unlike the source code of the Python modules in your standard library, you probably don’t have these files around. Even if you do have them, the only way to connect the binary to the source is through the debugging information emitted when CPython was compiled from that source, which you probably didn’t do. But if you’re thinking that sounds like a great idea: it is. Build CPython yourself (you may want a Py_DEBUG build), and then you can just run it in your C source debugger/IDE and it can handle all the clumsy bits.
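If you want to try that route, the basic steps on Unix look roughly like this (a sketch only; configure flags and the debugger vary by platform, and builtin_all is the C symbol implementing all in bltinmodule.c):
$ git clone https://github.com/python/cpython
$ cd cpython
$ ./configure --with-pydebug
$ make -j
$ gdb --args ./python -c "all(x > 0 for x in range(10))"
(gdb) break builtin_all
(gdb) run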
But if that sounds more scary than helpful, even though you can read basic C code and would like to find it…
How did I know where to find that code on GitHub? Well, I know where the repo is; I know the basic organization of the source into Python, Objects, Modules, etc.; I know how module names usually map to C source file names; I know that builtins is special in a few ways…
That’s all pretty simple stuff. Couldn’t you just program all that knowledge into a script, which you could then build a PyCharm plugin out of?
You can do the first 50% or so in a quick evening hack, and such things litter the shores of GitHub. But actually doing it right requires handling a ton of special cases, parsing some ugly C code, etc. And for anyone capable of writing such a thing, it’s easier to just lldb Python than to write it.
[1] Also, even the things that are builtins are written in a mix of Python and a Python subset called RPython, which you might find easier to understand than C. Then again, it’s often even harder to find that source, and the multiple levels that all look like Python can be hard to keep straight.

Explicitly typed version of Python?

I rather like Python's syntactic sugar and standard library functions.
However, the one feature I dislike is implicit typing.
Is there a distribution of Python with explicit typing, which is still compatible with e.g. packages on PyPI?
[I was looking into RPython]
In Python 3, the ability to use type annotations was introduced into the Python standard with PEP 3107.
Fast-forward to Python 3.5, and PEP 484 builds on this to introduce type hinting, along with the typing module, which enables one to specify the types of variables or the return type of a function.
from typing import Iterator

def fib(n: int) -> Iterator[int]:
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b
Above example taken from https://pawelmhm.github.io
According to the PEP 484 notes:
While these annotations are available at runtime through the usual __annotations__ attribute, no type checking happens at runtime. Instead, the proposal assumes the existence of a separate off-line type checker which users can run over their source code voluntarily. Essentially, such a type checker acts as a very powerful linter. (While it would of course be possible for individual users to employ a similar checker at run time for Design By Contract enforcement or JIT optimization, those tools are not yet as mature.)
tl;dr
Although Python provides this form of "static typing", it is not enforced at runtime: the Python interpreter simply ignores any type specifications you have provided and will still rely on duck typing. It is therefore up to you to find a type checker or linter that will detect any issues with the types.
Furthermore
The motivation for including typing in the Python standard was largely influenced by mypy, so it might be worth checking it out. The mypy docs also provide examples which may prove useful.
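For example, given a file with a deliberate type error (the file and function names are made up for illustration):
# demo.py
def greet(name: str) -> str:
    return "Hello, " + name

greet(42)
Running mypy over it flags the bad call statically, while plain CPython would only fail at runtime inside the function (output abridged; the exact message depends on the mypy version):
$ mypy demo.py
demo.py:5: error: Argument 1 to "greet" has incompatible type "int"; expected "str"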
The short answer is no. What you are asking for is built deeply into Python, and can't be changed without altering the language so drastically that it wouldn't be Python anymore.
I'm assuming that what you don't like is variables being re-typed when re-assigned to. If that is causing problems in your code, you might consider other ways to check for it.
No, you cannot have your cake and eat it too.
Python is great because it is dynamically typed! Period. (That's why it has such a nice standard library, too.)
There are only two advantages of statically typed languages: 1) speed, when the algorithms are right to begin with, and 2) compilation errors.
As for 1):
Use PyPy,
Profile,
Use ctypes libs for great performance.
It's typical for only 10% or less of your code to be performance critical. All that other 90%? Enjoy the advantages of dynamic typing.
As for 2):
Use classes (and contracts),
Use unit testing,
Use refactoring,
Use a good code editor.
It's typical to have data NOT FITTING into standard data types, which are too strict or too loose in what they allow to be stored in them. Make sure that you validate your data on your own.
Unit testing is a must-have for algorithm testing, which no compiler can do for you, and it should catch any problems arising from wrong data types (and unlike a compiler, tests are as fine-grained as you need them to be).
Refactoring addresses the worry that a given change will break your code (and again, strongly typed data cannot guarantee that either).
And a good code editor can solve so many problems... Use Sublime Text for a while, and then you will know what I mean.
(To be sure, I am not giving you the answer you want to have. Rather, I am questioning your needs, especially those that you did not include in your question.)
Now in 2021, there's a library called Deal that not only provides a robust static type checker, but also allows you to specify pre- and post-conditions, loop invariants, explicitly state expectations regarding exceptions and IO/side-effects, and even formally prove correctness of code (for an albeit small subset of Python).
Here's an example from their GitHub:
import deal
from typing import List

# the result is always non-negative
@deal.post(lambda result: result >= 0)
# the function has no side-effects
@deal.pure
def count(items: List[str], item: str) -> int:
    return items.count(item)

# generate test function
test_count = deal.cases(count)
Now we can:
Run python3 -m deal lint or flake8 to statically check errors.
Run python3 -m deal test or pytest to generate and run tests.
Just use the function in the project and check for errors at runtime.
Since comments are limited...
Python is dynamically typed. Static type checks are not a bad thing; they are a control in place to help the programmer preempt potential type bugs, but in truth they won't stop logical bugs from happening, which makes the point somewhat moot.
Even though the paper on RPython makes its point, it is focused on object-oriented programming. You must bear in mind that Python is more an amalgamation of OOP and functional programming, and likely other paradigms too.
I encourage reading this page; it is very informative.

Are there advantages to use the Python/C interface instead of Cython?

I want to extend Python and NumPy by writing some modules in C or C++, using BLAS and LAPACK. I also want to be able to distribute the code as standalone C/C++ libraries. I would like these libraries to use both single- and double-precision floats. Some examples of functions I will write are conjugate gradient for solving linear systems, or accelerated first-order methods. Some functions will need to call a Python function from the C/C++ code.
After playing a little with the Python/C API and the NumPy/C API, I discovered that many people advocate the use of Cython instead (see for example this question or this one). I am not an expert on Cython, but it seems that for some cases you still need to use the NumPy/C API and know how it works. Given that I already have (some little) knowledge of the Python/C API and none of Cython, I was wondering whether it makes sense to keep on using the Python/C API, and whether using this API has some advantages over Cython. In the future, I will certainly develop some stuff not involving numerical computing, so this question is not only about NumPy. One of the things I like about the Python/C API is that I learn how the Python interpreter works.
Thanks.
The current "top answer" sounds a bit too much like FUD in my ears. For one, it is not immediately obvious that the Average Developer would write faster code in C than what NumPy+Cython gives you anyway. Quite the contrary, the time it takes to even get the necessary C code to work correctly in a Python environment is usually much better invested in writing a quick prototype in Cython, benchmarking it, optimising it, rewriting it in a faster way, benchmarking it again, and then deciding if there is anything in it that truly requires the 5-10% more performance that you may or may not get from rewriting 2% of the code in hand-tuned C and calling it from your Cython code.
I'm writing a library in Cython that currently has about 18K lines of Cython code, which translate to almost 200K lines of C code. I once managed to get a speed-up of almost 25% for a couple of very important internal base level functions, by injecting some 20 lines of hand-tuned C code in the right places. It took me a couple of hours to rewrite and optimise this tiny part. That's truly nothing compared to the huge amount of time I saved by not writing (and having to maintain) the library in plain C in the first place.
Even if you know C a lot better than Cython, if you know Python and C, you will learn Cython so quickly that it's worth the investment in any case, especially when you are into numerics. 80-95% of the code you write will benefit so much from being written in a high-level language, that you can safely lay back and invest half of the time you saved into making your code just as fast as if you had written it in a low-level language right away.
That being said, your comment that you want "to be able to distribute the code as standalone C/C++ libraries" is a valid reason to stick to plain C/C++. Cython always depends on CPython, which is quite a dependency. However, using plain C/C++ (except for the Python interface) will not allow you to take advantage of NumPy either, as that also depends on CPython. So, as usual when writing something in C, you will have to do a lot of ground work before you get to the actual functionality. You should seriously think about this twice before you start this work.
First, there is one point in your question I don't get:
[...] also want to be able to distribute the code as standalone C/C++ libraries. [...] Some functions will need to call a Python function from the C/C++ code.
How is this supposed to work?
Next, as to your actual question, there are certainly advantages of using the Python/C API directly:
Most likely, you are more familiar with writing C code than with writing Cython code.
Writing your code in C gives you maximum control. To get the same performance from Cython code as from equivalent C code, you'll have to be very careful. You'll not only need to make sure to declare the types of all variables, you'll also have to set some flags adequately -- just one example is bounds checking. You will need intimate knowledge how Cython is working to get the best performance.
Cython code depends on Python. It does not seem to be a good idea to write code in Cython that should also be distributed as a standalone C library.
The main disadvantage of the Python/C API is that it can be very slow if it's used in an inner loop. I'm seeing that calling a Python function takes an 80-160x hit over calling an equivalent C++ function.
If that doesn't bother your code, then you benefit from being able to write some chunks of code in Python, have access to Python libraries, and support callbacks written directly in Python. That also means that you can make some changes without recompiling, making prototyping easier.

Wrapping a C library in Python: C, Cython or ctypes?

I want to call a C library from a Python application. I don't want to wrap the whole API, only the functions and datatypes that are relevant to my case. As I see it, I have three choices:
Create an actual extension module in C. Probably overkill, and I'd also like to avoid the overhead of learning extension writing.
Use Cython to expose the relevant parts from the C library to Python.
Do the whole thing in Python, using ctypes to communicate with the external library.
I'm not sure whether 2) or 3) is the better choice. The advantage of 3) is that ctypes is part of the standard library, and the resulting code would be pure Python – although I'm not sure how big that advantage actually is.
Are there more advantages / disadvantages with either choice? Which approach do you recommend?
Edit: Thanks for all your answers, they provide a good resource for anyone looking to do something similar. The decision, of course, is still to be made for the single case—there's no one "This is the right thing" sort of answer. For my own case, I'll probably go with ctypes, but I'm also looking forward to trying out Cython in some other project.
With there being no single true answer, accepting one is somewhat arbitrary; I chose FogleBird's answer as it provides some good insight into ctypes and it currently is also the highest-voted answer. However, I suggest reading all the answers to get a good overview.
Thanks again.
Warning: a Cython core developer's opinion ahead.
I almost always recommend Cython over ctypes. The reason is that it has a much smoother upgrade path. If you use ctypes, many things will be simple at first, and it's certainly cool to write your FFI code in plain Python, without compilation, build dependencies and all that. However, at some point, you will almost certainly find that you have to call into your C library a lot, either in a loop or in a longer series of interdependent calls, and you would like to speed that up. That's the point where you'll notice that you can't do that with ctypes. Or, when you need callback functions and you find that your Python callback code becomes a bottleneck, you'd like to speed it up and/or move it down into C as well. Again, you cannot do that with ctypes. So you have to switch languages at that point and start rewriting parts of your code, potentially reverse engineering your Python/ctypes code into plain C, thus spoiling the whole benefit of writing your code in plain Python in the first place.
With Cython, OTOH, you're completely free to make the wrapping and calling code as thin or thick as you want. You can start with simple calls into your C code from regular Python code, and Cython will translate them into native C calls, without any additional calling overhead, and with an extremely low conversion overhead for Python parameters. When you notice that you need even more performance at some point where you are making too many expensive calls into your C library, you can start annotating your surrounding Python code with static types and let Cython optimise it straight down into C for you. Or, you can start rewriting parts of your C code in Cython in order to avoid calls and to specialise and tighten your loops algorithmically. And if you need a fast callback, just write a function with the appropriate signature and pass it into the C callback registry directly. Again, no overhead, and it gives you plain C calling performance. And in the much less likely case that you really cannot get your code fast enough in Cython, you can still consider rewriting the truly critical parts of it in C (or C++ or Fortran) and call it from your Cython code naturally and natively. But then, this really becomes the last resort instead of the only option.
So, ctypes is nice for doing simple things and quickly getting something running. However, as soon as things start to grow, you'll most likely come to the point where you notice that you'd have been better off using Cython right from the start.
ctypes is your best bet for getting it done quickly, and it's a pleasure to work with as you're still writing Python!
I recently wrapped an FTDI driver for communicating with a USB chip using ctypes and it was great. I had it all done and working in less than one work day. (I only implemented the functions we needed, about 15 functions).
We were previously using a third-party module, PyUSB, for the same purpose. PyUSB is an actual C/Python extension module. But PyUSB wasn't releasing the GIL when doing blocking reads/writes, which was causing problems for us. So I wrote our own module using ctypes, which does release the GIL when calling the native functions.
One thing to note is that ctypes won't know about #define constants and stuff in the library you're using, only the functions, so you'll have to redefine those constants in your own code.
Here's an example of how the code ended up looking (lots snipped out, just trying to show you the gist of it):
from ctypes import *

d2xx = WinDLL('ftd2xx')

OK = 0
INVALID_HANDLE = 1
DEVICE_NOT_FOUND = 2
DEVICE_NOT_OPENED = 3
...

def openEx(serial):
    serial = create_string_buffer(serial)
    handle = c_int()
    if d2xx.FT_OpenEx(serial, OPEN_BY_SERIAL_NUMBER, byref(handle)) == OK:
        return Handle(handle.value)
    raise D2XXException

class Handle(object):
    def __init__(self, handle):
        self.handle = handle
    ...
    def read(self, bytes):
        buffer = create_string_buffer(bytes)
        count = c_int()
        if d2xx.FT_Read(self.handle, buffer, bytes, byref(count)) == OK:
            return buffer.raw[:count.value]
        raise D2XXException

    def write(self, data):
        buffer = create_string_buffer(data)
        count = c_int()
        bytes = len(data)
        if d2xx.FT_Write(self.handle, buffer, bytes, byref(count)) == OK:
            return count.value
        raise D2XXException
Someone did some benchmarks on the various options.
I might be more hesitant if I had to wrap a C++ library with lots of classes/templates/etc. But ctypes works well with structs and can even call back into Python.
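To illustrate those two points, here is a minimal ctypes sketch of a struct declaration and a Python callback handed to C; it is adapted from the kind of qsort example in the ctypes docs and assumes a Linux-style libc.so.6:
import ctypes

# a C struct layout declared from Python
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

p = Point(1, 2)

# a Python callback passed to C: sort an int array with libc's qsort
libc = ctypes.CDLL("libc.so.6")
libc.qsort.restype = None
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def py_cmp(a, b):
    return a[0] - b[0]

arr = (ctypes.c_int * 5)(5, 1, 4, 2, 3)
libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), CMPFUNC(py_cmp))
print(list(arr))  # [1, 2, 3, 4, 5]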
Cython is a pretty cool tool in itself, well worth learning, and is surprisingly close to the Python syntax. If you do any scientific computing with Numpy, then Cython is the way to go because it integrates with Numpy for fast matrix operations.
Cython is a superset of the Python language. You can throw almost any valid Python file at it, and it will spit out a valid C program. In this case, Cython will just map the Python calls to the underlying CPython API. This results in perhaps a 50% speedup because your code is no longer interpreted.
To get some optimizations, you have to start telling Cython additional facts about your code, such as type declarations. If you tell it enough, it can boil the code down to pure C. That is, a for loop in Python becomes a for loop in C. Here you will see massive speed gains. You can also link to external C programs here.
Using Cython is also incredibly easy. I thought the manual made it sound difficult. You literally just do:
$ cython mymodule.pyx
$ gcc [some arguments here] mymodule.c -o mymodule.so
and then you can import mymodule in your Python code and forget entirely that it compiles down to C.
In any case, because Cython is so easy to setup and start using, I suggest trying it to see if it suits your needs. It won't be a waste if it turns out not to be the tool you're looking for.
For calling a C library from a Python application, there is also cffi, which is a newer alternative to ctypes. It brings a fresh look to FFI:
it handles the problem in a fascinating, clean way (as opposed to ctypes)
it doesn't require you to write any non-Python code (as SWIG, Cython and the like do)
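A minimal sketch of cffi's ABI mode, loading the C math library (this assumes a typical Linux; the library name differs on other platforms):
from cffi import FFI

ffi = FFI()
ffi.cdef("double sqrt(double x);")   # declare only what you need
libm = ffi.dlopen("libm.so.6")       # load the shared library
print(libm.sqrt(2.0))                # 1.4142135623730951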
I'll throw another one out there: SWIG
It's easy to learn, does a lot of things right, and supports many more languages so the time spent learning it can be pretty useful.
If you use SWIG, you are creating a new python extension module, but with SWIG doing most of the heavy lifting for you.
Personally, I'd write an extension module in C. Don't be intimidated by Python C extensions -- they're not hard at all to write. The documentation is very clear and helpful. When I first wrote a C extension in Python, I think it took me about an hour to figure out how to write one -- not much time at all.
If you already have a library with a defined API, I think ctypes is the best option, as you only have to do a little initialization and can then more or less call the library the way you're used to.
I think Cython or creating an extension module in C (which is not very difficult) are more useful when you need new code, e.g. calling that library while doing some complex, time-consuming tasks, and then passing the result to Python.
Another approach, for simple programs, is to run a separate process (compiled externally) that writes the result to standard output, and call it with the subprocess module. Sometimes it's the easiest approach.
For example, if you make a console C program that works more or less this way
$ miCcode 10
Result: 12345678
you could call it from Python
>>> import subprocess
>>> p = subprocess.Popen(['miCcode', '10'], stdout=subprocess.PIPE)
>>> std_out, std_err = p.communicate()
>>> print(std_out.decode())
Result: 12345678
With a little string formatting, you can take the result in any way you want. You can also capture the standard error output, so it's quite flexible.
ctypes is great when you've already got a compiled library blob to deal with (such as OS libraries). The calling overhead is severe, however, so if you'll be making a lot of calls into the library, and you're going to be writing the C code anyway (or at least compiling it), I'd say to go for cython. It's not much more work, and it'll be much faster and more pythonic to use the resulting pyd file.
I personally tend to use cython for quick speedups of python code (loops and integer comparisons are two areas where cython particularly shines), and when there is some more involved code/wrapping of other libraries involved, I'll turn to Boost.Python. Boost.Python can be finicky to set up, but once you've got it working, it makes wrapping C/C++ code straightforward.
cython is also great at wrapping numpy (which I learned from the SciPy 2009 proceedings), but I haven't used numpy, so I can't comment on that.
I know this is an old question, but this thread comes up on Google when you search for things like "ctypes vs cython", and most of the answers here are written by those who are already proficient in Cython or C, which might not reflect the actual time you need to invest to learn those and implement your solution. I am a complete beginner in both: I have never touched Cython before and have very little experience with C/C++.
For the last two days, I was looking for a way to delegate a performance-heavy part of my code to something more low-level than Python. I implemented my code in both ctypes and Cython; it consisted basically of two simple functions.
I had a huge list of strings that needed to be processed. Notice list and string.
Neither type corresponds perfectly to a type in C, because Python strings are by default Unicode and C strings are not, and Python lists are simply NOT C arrays.
Here is my verdict: use Cython. It integrates more fluently with Python and is easier to work with in general. When something goes wrong, ctypes just throws you a segfault; Cython will at least give you compile warnings with a stack trace wherever possible, and you can return a valid Python object easily.
Here is a detailed account of how much time I needed to invest in both of them to implement the same function. I did very little C/C++ programming, by the way:
Ctypes:
About 2 hours researching how to transform my list of Unicode strings into a C-compatible type.
About an hour on how to return a string properly from a C function. Here I actually provided my own solution to SO, once I had written the functions.
About half an hour to write the code in C and compile it to a dynamic library.
10 minutes to write test code in Python to check that the C code works.
About an hour of doing some tests and rearranging the C code.
Then I plugged the C code into the actual code base and saw that ctypes does not play well with the multiprocessing module, as its handle is not picklable by default.
About 20 minutes rearranging my code to not use the multiprocessing module, and retrying.
Then the second function in my C code generated segfaults in my code base, although it passed my testing code. Well, this is probably my fault for not checking edge cases well; I was looking for a quick solution.
About 40 minutes trying to determine the possible causes of these segfaults.
I split my functions into two libraries and tried again. Still had segfaults for my second function.
I decided to let go of the second function and use only the first function of the C code, and at the second or third iteration of the Python loop that uses it, I got a UnicodeError about not decoding a byte at some position, even though I had encoded and decoded everything explicitly.
At this point, I decided to search for an alternative and looked into Cython:
Cython
10 min reading the Cython hello world.
15 min checking SO on how to use Cython with setuptools instead of distutils (a minimal sketch of that setup follows this list).
10 min reading about Cython types and Python types. I learnt that I can use most of the built-in Python types for static typing.
15 min reannotating my Python code with Cython types.
10 min modifying my setup.py to use the compiled module in my codebase.
Plugged the module directly into the multiprocessing version of the codebase. It works.
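For reference, the setuptools route mentioned above can be as small as this (hypothetical module name mymodule.pyx; build with python setup.py build_ext --inplace):
# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("mymodule.pyx"))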
For the record, I of course did not measure the exact timings of my investment. It may very well be that my perception of time was somewhat skewed by the mental effort required while I was dealing with ctypes. But it should convey the feel of dealing with Cython versus ctypes.
There is one issue which made me use ctypes and not Cython, and which is not mentioned in the other answers.
Using ctypes, the result does not depend at all on the compiler you are using. You may write a library using more or less any language which can be compiled to a native shared library. It does not matter much which system, which language and which compiler. Cython, however, is limited by its build infrastructure. E.g., if you want to use the Intel compiler on Windows, it is much trickier to make Cython work: you have to "explain" the compiler to Cython, recompile things with that exact compiler, etc. This significantly limits portability.
If you are targeting Windows and choose to wrap some proprietary C++ libraries, then you may soon discover that different versions of msvcrt***.dll (Visual C++ Runtime) are slightly incompatible.
This means that you may not be able to use Cython, since the resulting wrapper.pyd is linked against msvcr90.dll (Python 2.7) or msvcr100.dll (Python 3.x). If the library that you are wrapping is linked against a different version of the runtime, then you're out of luck.
Then, to make things work, you'll need to create C wrappers for the C++ libraries, link that wrapper DLL against the same version of msvcrt***.dll as your C++ library, and then use ctypes to load your hand-rolled wrapper DLL dynamically at runtime.
There are lots of small details here, which are described at length in the following article:
"Beautiful Native Libraries (in Python)": http://lucumr.pocoo.org/2013/8/18/beautiful-native-libraries/
There's also the possibility of using GObject Introspection for libraries that use GLib.

How can I use Python for large scale development?

I would be interested to learn about large-scale development in Python, and especially how you maintain a large code base.
When you make incompatible changes to the signature of a method, how do you find all the places where that method is being called? In C++/Java the compiler will find them for you; how do you do it in Python?
When you make changes deep inside the code, how do you find out what operations an instance provides, since you don't have a static type to look up?
How do you handle/prevent typing errors (typos)?
Are unit tests used as a substitute for static type checking?
As you can guess, I have almost only worked with statically typed languages (C++/Java), but I would like to try my hand at Python for larger programs. However, I had a very bad experience, a long time ago, with Clipper (dBase), which was also dynamically typed.
Don't use a screwdriver as a hammer
Python is not a statically typed language, so don't try to use it that way.
When you use a specific tool, you use it for what it was built for. For Python, that means:
Duck typing: no type checking, only behavior matters. Therefore your code must be designed to use this feature. A good design means generic signatures, no dependencies between components, and high abstraction levels. So if you change anything, you won't have to change the rest of the code. Python will not complain either; that's what it was built for. Types are not an issue.
Huge standard library. You do not need to change all your calls in the program if you use standard features you haven't coded yourself. And Python comes with batteries included. I keep discovering them every day. I had no idea of the number of modules I could use when I started, and, like everybody, I tried to rewrite existing stuff. It's OK, you can't get it all right from the beginning.
You don't write Java, C++, Python, PHP, Erlang, or whatever the same way. There are good reasons why there is room for so many different languages: they do not do the same things.
Unit tests are not a substitute
Unit tests must be written in any language. The most famous unit test library (JUnit) is from the Java world!
This has nothing to do with types. Again, you check behaviors. You avoid trouble with regressions, and you assure your customer that you are on track.
Python for large scale projects
Languages, libraries and frameworks don't scale. Architectures do.
If you design a solid architecture, and you are able to make it evolve quickly, then it will scale. Unit tests help, as does automatic code checking. But they are just safety nets, and small ones at that.
Python is especially suitable for large projects because it enforces some good practices and has a lot of the usual design patterns built in. But again, do not use it for what it is not designed for. E.g., Python is not a technology for CPU-intensive tasks.
In a huge project, you will most likely use several different technologies anyway, such as a DBMS or a templating language. Python is no exception.
You will probably want to use C/C++ for the parts of your code that need to be fast, or Java to fit into a Tomcat environment. Don't know, don't care. Python can play well with these.
As a conclusion
My answer may feel a bit rude, but don't get me wrong: this is a very good question.
A lot of people come to Python with old habits. I screwed myself up trying to write Java in Python. You can, but you will never get the best of it.
If you have played / want to play with Python, it's great! It's a wonderful tool. But just a tool, really.
I had some experience with modifying "Frets On Fire", an open-source Python "Guitar Hero" clone.
As I see it, Python is not really suitable for a truly large-scale project.
I found myself spending a large part of the development time debugging issues related to assignment of incompatible types, things that statically typed languages reveal effortlessly at compile time.
Also, since types are determined at runtime, trying to understand existing code becomes harder, because you have no idea what the type of the parameter you are currently looking at is.
In addition, calling functions by their name string with the getattr built-in function is generally more common in Python than in other programming languages, thus making the call graph to a certain function somewhat hard to obtain (although you can call functions by name in some statically typed languages as well).
I think that Python really shines in small-scale software, rapid prototype development, and gluing existing programs together, but I would not use it for large-scale software projects, since in those types of programs maintainability becomes the real issue, and in my opinion Python is relatively weak there.
Since nobody has pointed out pychecker, pylint and similar tools, I will: pychecker and pylint are tools that can help you find incorrect assumptions (about function signatures, object attributes, etc.). They won't find everything that a compiler might find in a statically typed language, but they can also find problems that compilers for such languages can't find.
Python (and any dynamically typed language) is fundamentally different in terms of the errors you're likely to cause and how you would detect and fix them. It has definite downsides as well as upsides, but many (including me) would argue that in Python's case, the ease of writing code (and the ease of making it structurally sound) and of modifying code without breaking API compatibility (adding new optional arguments, providing different objects that have the same set of methods and attributes) make it suitable just fine for large codebases.
my 0.10 EUR:
i have several python applications in 'production' state. our company uses java, c++ and python. we develop with the eclipse ide (pydev for python).
unittests are the key solution to the problem (also for c++ and java).
the less secure world of "dynamic typing" will make you more careful about your code quality.
BY THE WAY:
large scale development doesn't mean that you use one single language!
large scale development often uses a handful of languages specific to the problem.
so i agree with the-hammer-problem :-)
PS: static-typing & python
Here are some items that have helped me maintain a fairly large system in python.
Structure your code in layers, i.e. separate business logic, presentation logic and your persistence layers. Invest a bit of time in defining these layers and make sure everyone on the project is on board. For large systems, creating a framework that forces you into a certain way of development can be key as well.
Tests are key. Without unit tests, you will likely end up with an unmanageable code base several times quicker than with other languages. Keep in mind that unit tests are often not sufficient; make sure to have several integration/acceptance tests you can run quickly after any major change.
Use the fail-fast principle. Add assertions for cases where you feel your code may be vulnerable (a small sketch follows this list).
Have standard logging/error handling that will help you quickly navigate to the issue.
Use an IDE (PyDev works for me) that provides type-ahead and PyLint/Checker integration, to help you detect common typos right away and promote some coding standards.
Be careful about your imports: never do from x import *, and don't do relative imports without using a leading dot.
Do refactor; a search/replace tool with regular expressions is often all you need to do method- or class-level refactoring.
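As a tiny sketch of the fail-fast point (a made-up function, not from the original answer): assertions make bad data fail loudly where it enters, instead of propagating several layers deeper.
def apply_discount(price, rate):
    # validate assumptions at the boundary
    assert price >= 0, "negative price: %r" % price
    assert 0.0 <= rate <= 1.0, "rate out of range: %r" % rate
    return price * (1.0 - rate)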
Incompatible changes to the signature of a method. This doesn't happen as much in Python as it does in Java and C++.
Python has optional arguments, default values, and far more flexibility in defining method signatures. Also, duck typing means that -- for example -- you don't have to switch from some class to an interface as part of a significant software change. Things just aren't as complex.
How do you find all the places where that method is being called? grep works for dynamic languages. If you need to know every place a method is used, grep (or equivalent IDE-supported search) works great.
How do you find out what operations an instance provides, since you don't have a static type to lookup?
a. Look at the source. You don't have the Java/C++ problem of object libraries and jar files to contend with. You don't need all the elaborate aids and tools that those languages require.
b. An IDE can provide signature information under many common circumstances. You can, easily, defeat your IDE's reasoning powers. When that happens, you should probably review what you're doing to be sure it makes sense. If your IDE can't reason out your type information, perhaps it's too dynamic.
c. In Python, you often work through the interactive interpreter. Unlike Java and C++, you can explore your instances directly and interactively. You don't need a sophisticated IDE.
Example:
>>> x = SomeClass()
>>> dir(x)
How do you handle/prevent typing errors? Same as static languages: you don't prevent them. You find and correct them. Java can only find a certain class of typos. If you have two similar class or variable names, you can wind up in deep trouble, even with static type checking.
Example:
class MyClass { }
class MyClassx extends MyClass { }
A typo with these two class names can cause havoc. ["But I wouldn't put myself in that position with Java," folks say. Agreed. I wouldn't put myself in that position with Python, either; you make classes that are profoundly different, and will fail early if they're misused.]
Are UnitTest's used as a substitute for static type checking? Here's the other Point of view: static type checking is a substitute for clear, simple design.
I've worked with programmers who weren't sure why an application worked. They couldn't figure out why things didn't compile; they didn't know the difference between an abstract superclass and an interface, and they couldn't figure out why a change in one place made a bunch of other modules in a separate JAR file crash. The static type checking gave them false confidence in a flawed design.
Dynamic languages allow programs to be simple. Simplicity is a substitute for static type checking. Clarity is a substitute for static type checking.
My general rule of thumb is to use dynamic languages for small non-mission-critical projects and statically-typed languages for big projects. I find that code written in a dynamic language such as python gets "tangled" more quickly. Partly that is because it is much quicker to write code in a dynamic language and that leads to shortcuts and worse design, at least in my case. Partly it's because I have IntelliJ for quick and easy refactoring when I use Java, which I don't have for python.
The usual answer to that is testing testing testing. You're supposed to have an extensive unit test suite and run it often, particularly before a new version goes online.
Proponents of dynamically typed languages make the case that you have to test anyway because even in a statically typed language conformance to the crude rules of the type system covers only a small part of what can potentially go wrong.
