Can switching PyFrameObjects in and out be a good implementation of continuations? - python

I'm interested in continuations, specifically in Python's C-API. From what I understand, the nature of continuations requires un-abstracting low-level calling conventions in order to manipulate the call stack as needed. I was fortunate enough to come across a few examples of this scattered here and there. In the few examples I've seen, this un-abstraction is done using either clever C (with assumptions about the environment) or custom assembly.
However, what's cool about Python is that it has its own interpreter stack made up of PyFrameObjects. Assuming single-threaded applications for now, shouldn't it be enough to just switch PyFrameObjects in and out to implement continuations in Python's C-API? Why do these authors even bother with the low-level stuff?

Generators work by manipulating the stack (actually a linked list) of frame objects. But that only helps for pure Python code. It won't help you if any C code is running: for example, if you are in C code inside an I/O routine, you can't change the Python frame object to get execution to go somewhere else. You have to be able to change the C stack to do that. That's what packages like greenlet do for you.
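You can see those frame-object mechanics from pure Python without touching the C-API; a minimal sketch (CPython, where a generator exposes its suspended frame as gi_frame):

    # A generator's suspended state lives in an ordinary frame object that
    # the interpreter parks and resumes -- but only for pure-Python code.
    def gen():
        x = 1
        yield x
        yield x + 1

    g = gen()
    print(g.gi_frame)          # the suspended frame object
    print(next(g))             # 1
    print(g.gi_frame.f_lasti)  # bytecode offset where the frame is paused
    print(next(g))             # 2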


Porting an old Fortran program to work with Python+NumPy

I am supposed to be doing research with this huge Fortran 77 program (which I recently ported to Fortran 90, superficially). It is a very old piece of software used for finite element modeling.
It is a monstrosity. It is roughly 240,000 lines.
Since it began its life in Fortran 77, it uses some really dirty hacks for dynamic memory allocation; basically, it uses functions from the C standard library, mixing C and Fortran. I have yet to fully grasp how allocation works. The program is built to be easily extendable by the user, and the user generally needs to allocate some globally accessible arrays for later use. This is done by keeping an array of memory addresses, each pointing to the beginning of a dynamically allocated array. Of course, which element of the address array points to which information depends entirely on conventions that the user has to learn before really starting to program. There are two address arrays, one for integers and the other for floating-point values.
By dirty hacks, I mean inconsistent ones. For example, an update to the optimization algorithm of the GNU compilers caused the program to exit with random memory leaks.
The program is far from elegant. Global variable names are generally short (3-4 characters) and cryptic. Passing data across routines is of course accomplished using common blocks, which include all program switches and the aforementioned arrays.
The usage of the program is roughly like that of an interactive shell, albeit a stupid one. First, an input file is read by the program itself; then, if the user chooses, they are dropped into a pseudo-shell, in which they have to type 4-character commands followed by parameters. The parser then parses the command, and the corresponding subroutine is called with the parameters. You would guess that there is a loop structure in this pseudo-parser (a goto bonanza, rather) which wraps the subroutine behavior in a manner more complex than it should be in the 21st century.
The format of the input file is the same (commands, then parameters), since it is the same parser. But the syntax is not really consistent (by that, I mean it lacks control structures and a definite grammar, and some commands cause the finite state machine to behave in ways that contradict other commands), from time to time causing the end user to discover pitfalls. The user must learn these pitfalls by experience; I did not see them in any documentation of the program. This is a problem that can easily be avoided with Python, where it is not even necessary to implement a parser.
What I want to do:
Port parts of the program to Python, namely the parts that don't have anything to do with numerical computation. This includes:
cleaning up and abstracting the API with an OOP approach in Python,
giving meaningful variable names,
migrating dynamic allocation to either numpy or Fortran 90 and losing the C part,
migrating non-numerical execution to Python, and wrapping the numerical objects using f2py, so there is no loss in performance. Have I mentioned that the program is damn fast in its current state? Hopefully porting the calls to numerical subroutines and I/O to Python will not slow it down to an impractical level (or will it?).
Making use of Python's interactive shell as a replacement for the pseudo-shell. This way, there will not be any inconsistencies for the end user. The aforementioned commands will simply be replaced by functions defined in Python. This will allow the user to actually access the data. Plus, the user will be able to extend the program without going too deep.
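A sketch of what one such replacement command could look like (all names here are hypothetical):

    import flib  # hypothetical f2py-generated extension module

    def mesh(nx, ny):
        """Replacement for an old 4-character 'MESH' card."""
        flib.meshgen(nx, ny)          # hypothetical Fortran subroutine
        return flib.workspace.nodes   # hypothetical globally accessible array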
What I wonder:
Is f2py suitable and up to the task of wrapping numerous subroutines and common blocks without confusion? I have only seen single-file examples for f2py on the net; I know that numpy has used it to wrap LAPACK and the like, but I need reassurance that f2py is a tool consistent enough for this task.
Whether there are any suggestions on the general strategy that I should follow, or pitfalls I should avoid.
How can and should I implement a system in this Python-wrapped Fortran 90 environment so that I can modify (allocate and assign) globally accessible arrays and variables inside Fortran routines? This should preferably do away with the address arrays, and I should be able to refer to these objects by name rather than by address. The variables should preferably be accessible from both Python and Fortran.
Notes:
I may have been asking for too much, something beyond the boundaries of the possible realm. In this case, please forgive me for I am a beginner with this aspect of programming; and don't hesitate to correct me.
The "program" I have been talking about is open source but it is commercial and the license does not allow its distribution, so I decided not to mention its name. However, you could deduce it from the 2nd sentence and the description I gave throughout.
I'm doing something depressingly similar. Instead of dynamic memory allocation via C we have a single global array with integer indices (also at global scope), but otherwise it's much the same. Weird, inconsistent input file and all.
I'd advise against trying to rewrite the majority of the program, whether in Python or anything else. It's time-consuming, unpleasant and largely unnecessary. As an alternative, get the F77 code base to the point where it compiles cleanly enough that you're willing to trust it, then write an interface routine.
I now have a big, ugly F77 code base which sits behind an interface. The program requires input as a text file so a large part of the interface's job is to produce that text file. Beyond that, the legacy code is reduced to a single gateway routine which takes a few arguments (including a means of identifying the text file) and returns the answer. If you use the iso_c_binding of Fortran 2003 you can expose the interface in a format C understands, at which point you can link it to whatever you wish.
As far as the modern code (mostly optimisation routines) is concerned, the legacy code base is the single subroutine behind the C interface. This is much nicer than trying to modify the old code further and probably a valid strategy for your case as well.
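For illustration, a hedged sketch of calling such a gateway from Python with ctypes; the library name, routine name, and signature below are assumptions, not the actual code:

    # Assumed Fortran gateway, exposed via iso_c_binding as:
    #   void run_case(const char *input_path, int path_len, double *answer)
    import ctypes

    legacy = ctypes.CDLL("./liblegacy.so")  # hypothetical library name
    legacy.run_case.argtypes = [ctypes.c_char_p, ctypes.c_int,
                                ctypes.POINTER(ctypes.c_double)]
    legacy.run_case.restype = None

    answer = ctypes.c_double()
    path = b"case01.inp"
    legacy.run_case(path, len(path), ctypes.byref(answer))
    print("answer:", answer.value)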
For an example of how to generate an f2py interface library from multiple Fortran files, see this post.
f2py might be suitable for your task, but there are some pitfalls that can cause problems. They are listed here and summarized below:
Concerning your specific problem, you might run into trouble with your allocatable arrays, because f2py was written for Fortran 77 and does not support many Fortran 90+ features (such as allocatable arrays).
I also encountered a problem with an undocumented maximum array size (around 400 x 200 x 20 x 20). If I used arrays bigger than that, f2py was not able to generate the Python library. In particular, the large matrices passed around in finite element codes might be too big to interface; in that case you would not have access to them in the Python part of the program.
Beneficial for you is that f2py should have no problems with COMMON blocks and the like, because it was written specifically for Fortran 77.
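For instance, f2py exposes COMMON blocks as attributes of the generated module, so a Python-side numpy view aliases the same memory the Fortran routines use (module and block names below are assumptions):

    import numpy as np
    import flib  # hypothetical f2py-generated module

    flib.params.niter = 100          # scalar in COMMON /params/
    rwork = flib.workspace.rwork     # numpy view of an array in COMMON /workspace/
    rwork[:10] = np.linspace(0.0, 1.0, 10)  # writes the Fortran-side memory in place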
After passing the data through the interface to the Fortran routines, there should be no (or only minimal) slowdown if you do it right. The key is to minimize calculations in the Python part of the program per run. This includes manipulation of the data arrays (shift, rotate, copy, etc.) but not passing them, because the interface is pass-by-reference. A minimal sketch of the build-and-call round trip follows (file and module names are assumptions):

    # Assumes a Fortran 77 file "saxpy.f" along the lines of:
    #       SUBROUTINE SAXPY(N, A, X, Y)
    #       INTEGER N
    #       REAL A, X(N), Y(N)
    # Cf2py intent(inout) y
    # built with:  python -m numpy.f2py -c -m flib saxpy.f
    import numpy as np
    import flib  # hypothetical module name

    x = np.ones(5, dtype=np.float32)
    y = np.zeros(5, dtype=np.float32)
    flib.saxpy(2.0, x, y)  # n is inferred; with intent(inout), y is updated in place
    print(y)               # [2. 2. 2. 2. 2.]
Implementation Suggestion
This suggestion reflects how I would do it, incorporating my experience from having done something similar (see Background below). It should be largely independent of how you interface the Python and Fortran code (f2py, Cython, ...).
Of course you should be very careful not to change the behaviour, and therefore possibly the results, of the program. So your first step should be to generate some tests, with corresponding reference input and output files, and test documentation covering all the steps, keystrokes, commands, etc. necessary to reproduce those results.
In your case I would try to change as little of the Fortran program as possible. I would try to split the "pseudo-shell" off from the Fortran code, e.g. by making it its own module, and build an interface to that module. That way you can keep using all of the original Fortran code, along with the modifications, bugfixes and updates from your peers, even in the future. The key is not to distance your code too far from the original/mainstream, because in scientific communities not everybody will agree with major changes to the source code and update their workflow or source code accordingly. Future work from your peers might therefore be done not in your version but in the original source code, and it would be your own responsibility to merge those changes into your version, which gets easier the less you change.
Using that interface you can work on your Python shell, and maybe even build a GUI for it, without having to worry about changing anything in the original program. This reduces the risk of introducing bugs or changing the results of the original. Your shell/GUI would work as a wrapper around the original program, simplifying the workflow and removing inconsistencies. All the "intelligence" and utilities, like error and cross-checking of user input, help pages, tutorials/howtos, etc., would be implemented in the Python wrapper, which would parse these inputs, translate them into the corresponding commands for your Fortran program, send them, and wait for the results.
After you have simplified the usage of the program, I would write some automation for the tests (setup + evaluation) to complete your utility suite. That way even somebody new to the program would be able to make changes to the code without having to worry about unknowingly changing the results. This should enable your tools to benefit the community, which will attract new users and therefore encourage further development within the community.
Only as the last step would I replace the parts of the code that use C with Fortran 90+ constructs to simplify the code. This is an extensive change to the codebase and needs a lot of tests to ensure EVERY possible combination of commands is checked and verified before and after the changes.
This method also has the benefit that you could possibly make your interface/GUI open source (you have to check the license of your program, of course), as long as it is separable from the source code of the Fortran program. The Fortran-Python interface would have to be provided, or installed/generated from source files when your interface is loaded, using some simple build script as seen in the first link of this post.
For the manipulation of internal data I would write a separate wrapper routine that only handles the data interface. This should be done in Cython, though, to enable you to use allocatable arrays, etc. Because this interface works pass-by-reference, you should be able to use the full collection of Python (numpy) tools to manipulate the arrays and data.
Background
I did something similar with our research code for helicopter rotordynamics. This is also a very old and large program written in Fortran 77 (e.g. goto bonanza). The newer additions and modifications to the code are usually done in Fortran 90/2003.
Using parts of this code (several subroutines & module files) I generated a python library to connect our GUI (Python & Qt) to the Fortran program; mainly for postprocessing of Fortran binary output files.

Using an embedded C library in Python emulation

Short Question
Which would be easier for emulating (in Python) a complex (SAE J1939) communication stack from an existing embedded C library:
1) Full port - meaning manually converting all of the C functions to Python modules
2) Wrap the stack in a Python wrapper - meaning calling the real C code from Python
Background Information
I have already written small portions of this stack in Python, but they are very non-trivial to implement with 100% coverage. For this very reason, we have recently purchased an off-the-shelf SAE J1939 stack for our embedded platforms. To clarify, I know that the portions touching the hardware layer will have to be re-created and mapped to the PC's CAN drivers.
I am hoping to find someone here on SO who has ported, or even looked into porting, a 5k LOC C library to Python. If there are any C-to-Python tools that work well, that would be helpful for me to look into as well.
My advice would be to wrap it.
Reasons for that:
if you convert function by function, you'll introduce new bugs (we're only human), and this kind of stuff is pretty hard to test
wrapping for Python is done easily, using SWIG or even ctypes to load a DLL on the fly; you'll find tons of tutorials
if your lib gets updated, you have less impact in the long term.
However, you need to
check that the license you purchase allows you to do that
know that having the same implementation on the embedded and PC sides won't help with tracking bugs
you might have a bit less portability than with a full Python implementation (anyway, not much of a point for you, as your low layer needs to be rewritten per target)
Definitely wrap it. It might be as easy as running ctypesgen.py and then using the result. Check this blog article about using ctypesgen to create a wrapper for libreadline in order to get access to the full API: http://wavetossed.blogspot.com/2011/07/asynchronous-gnu-readline.html
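For a flavor of the ctypes route, a sketch where the library name, function, and signature are invented for illustration:

    import ctypes

    # Hypothetical function from the purchased stack:
    #   int j1939_send(uint32_t pgn, const uint8_t *data, size_t len);
    j1939 = ctypes.CDLL("./libj1939.so")  # assumed library name
    j1939.j1939_send.argtypes = [ctypes.c_uint32,
                                 ctypes.POINTER(ctypes.c_uint8),
                                 ctypes.c_size_t]
    j1939.j1939_send.restype = ctypes.c_int

    payload = (ctypes.c_uint8 * 3)(0x01, 0x02, 0x03)
    status = j1939.j1939_send(0xEE00, payload, len(payload))
    print("send status:", status)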

Are there advantages to use the Python/C interface instead of Cython?

I want to extend Python and numpy by writing some modules in C or C++, using BLAS and LAPACK. I also want to be able to distribute the code as standalone C/C++ libraries. I would like these libraries to use both single- and double-precision floats. Some examples of functions I will write are conjugate gradient for solving linear systems and accelerated first-order methods. Some functions will need to call a Python function from the C/C++ code.
After playing a little with the Python/C API and the NumPy/C API, I discovered that many people advocate the use of Cython instead (see for example this question or this one). I am not an expert on Cython, but it seems that for some cases you still need to use the NumPy/C API and know how it works. Given that I already have (some little) knowledge of the Python/C API and none of Cython, I was wondering whether it makes sense to keep using the Python/C API, and whether using this API has some advantages over Cython. In the future, I will certainly develop some stuff not involving numerical computing, so this question is not only about numpy. One of the things I like about the Python/C API is that I learn something about how the Python interpreter works.
Thanks.
The current "top answer" sounds a bit too much like FUD in my ears. For one, it is not immediately obvious that the Average Developer would write faster code in C than what NumPy+Cython gives you anyway. Quite the contrary, the time it takes to even get the necessary C code to work correctly in a Python environment is usually much better invested in writing a quick prototype in Cython, benchmarking it, optimising it, rewriting it in a faster way, benchmarking it again, and then deciding if there is anything in it that truly requires the 5-10% more performance that you may or may not get from rewriting 2% of the code in hand-tuned C and calling it from your Cython code.
I'm writing a library in Cython that currently has about 18K lines of Cython code, which translate to almost 200K lines of C code. I once managed to get a speed-up of almost 25% for a couple of very important internal base level functions, by injecting some 20 lines of hand-tuned C code in the right places. It took me a couple of hours to rewrite and optimise this tiny part. That's truly nothing compared to the huge amount of time I saved by not writing (and having to maintain) the library in plain C in the first place.
Even if you know C a lot better than Cython, if you know Python and C, you will learn Cython so quickly that it's worth the investment in any case, especially when you are into numerics. 80-95% of the code you write will benefit so much from being written in a high-level language, that you can safely lay back and invest half of the time you saved into making your code just as fast as if you had written it in a low-level language right away.
That being said, your comment that you want "to be able to distribute the code as standalone C/C++ libraries" is a valid reason to stick to plain C/C++. Cython always depends on CPython, which is quite a dependency. However, using plain C/C++ (except for the Python interface) will not allow you to take advantage of NumPy either, as that also depends on CPython. So, as usual when writing something in C, you will have to do a lot of ground work before you get to the actual functionality. You should seriously think about this twice before you start this work.
First, there is one point in your question I don't get:
[...] also want to be able to distribute the code as standalone C/C++ libraries. [...] Some functions will need to call a Python function from the C/C++ code.
How is this supposed to work?
Next, as to your actual question, there are certainly advantages of using the Python/C API directly:
Most likely, you are more familiar with writing C code than writing Cython code.
Writing your code in C gives you maximum control. To get the same performance from Cython code as from equivalent C code, you'll have to be very careful. You'll not only need to declare the types of all variables, you'll also have to set some flags adequately; just one example is bounds checking. You will need intimate knowledge of how Cython works to get the best performance.
Cython code depends on Python. It does not seem to be a good idea to write code in Cython that should also be distributed as a standalone C library.
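For a concrete feel, a minimal sketch in Cython's "pure Python" mode of the kind of flags meant here (assuming Cython is installed; run uncompiled, the decorators are no-ops):

    import cython

    @cython.boundscheck(False)  # drop per-access bounds checks
    @cython.wraparound(False)   # drop negative-index handling
    def dot(a: cython.double[:], b: cython.double[:]) -> cython.double:
        s: cython.double = 0.0
        for i in range(a.shape[0]):
            s += a[i] * b[i]
        return s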
The main disadvantage of the Python/C API is that it can be very slow if it's used in an inner loop. I'm seeing that calling a Python function takes an 80-160x hit over calling an equivalent C++ function.
If that doesn't bother your code, then you benefit from being able to write some chunks of code in Python, having access to Python libraries, and supporting callbacks written directly in Python. That also means you can make some changes without recompiling, making prototyping easier.
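For what it's worth, a rough way to get a feel for the per-call overhead on your own machine (exact numbers will vary):

    import timeit

    def ident(x):
        return x

    # Compare a trivial Python-level call against a C-implemented builtin.
    print(timeit.timeit("ident(1)", globals=globals()))  # Python-level call
    print(timeit.timeit("abs(1)"))                       # C-level call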

Python vs C : Line of Code Comparison vs Dev Time

Hi, I'm currently learning Python, since the syntax feels so succinct and the idioms match well with my mental model.
However I'm also interested in learning about OS internals and reverse engineering software, which ultimately means knowing C in a rather thorough capacity.
When originally picking a language, I did lots of reading and comparisons, and a number thrown around a lot is that a short idiomatic statement in Python can require the equivalent of a few hundred lines of C (I'd guess code for memory management, implementing dictionaries, lists, etc.) that we take for granted as built into the Python language.
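For example, counting word frequencies takes a couple of lines in Python, while C would first need a hash table, memory management, and string handling:

    from collections import Counter

    text = "the quick brown fox jumps over the lazy dog the fox"
    print(Counter(text.split()).most_common(2))  # [('the', 3), ('fox', 2)]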
1) For an average C programmer, is that 100-200 lines of C code per Python idiom anywhere near accurate?
Because C doesn't come built in with Python-like constructs such as dictionaries/lists (with all their nice methods, etc.):
2) Do C programmers tend to build these constructs from scratch and then re-use them between projects to greatly reduce the actual amount of hand coding for their projects?
I assume re-using libraries like boost:: stuff also again, reduces some of the boilerplate hand coding also...
3) Given the use of popular libraries and re-use of common code one has written before for basic constructs, how much does that reduce the lines of code written in C compared to the Python code of an enthusiast-sized code base?
I know specific numbers aren't possible, but is it possible, with libraries, code reuse, etc., to have a development time in C close to that of Python without being a Linus Torvalds-style coding machine?
Thanks!
but is it possible with libraries, code reuse etc, to have a development time in C close to that of Python
No.
You've missed the most important point.
Python's interactive. It's not edit-compile-link-execute-break-debug. It's edit-debug.
Boost is C++, not C (emphatically not C -- virtually all of it makes heavy use of templates and such that aren't part of C).
Yes, C programmers tend to build up personal libraries of code for all sorts of "stuff" -- data structures, algorithms, user interfaces, and so on. There are also a fair number of other libraries for everything from basic string manipulation to database connectivity, user interfaces, basic algorithms and data structures, etc.
Comparing productivity between the two can be difficult though -- even if something can be done in one line of code either way, there's a greater chance that the C programmer will end up doing extra work to find and learn to use that particular library. OTOH, if he has used it before, the two might be directly competitive, or (in a few cases) C might be more productive.
I'd guess Python ends up more productive more often, but trying to guess how much so is difficult (and lines of code usually won't be a good indication either).
Back when I did serious C programming, I read a book that claimed libraries are worth writing (especially in C, which is considered a low-level language).
Libraries are built for reuse.
If you use libraries, you write one line like detectFace(faceDescriptor) or renderPDF(document), and it doesn't matter whether an idiom in another language is more concise or not.
Lines of code isn't a proper metric when the question is which is more efficient.
It depends.
Try to write an interrupt handler in Python. Someone could probably make it work, but it's going to be a dancing bear: the dancing is not good, but it is surprising that a bear can do it at all. If you want to write an OS or do some embedded programming, you're not going to be able to use Python. It's telling that the main Python implementation is written in C.
That being said I'm amazed at some of the low-level stuff that you can do with python. The high-level stuff is almost a given if you're measuring lines of code. Python is just a higher-level language.
They are both very useful tools, just for different types of projects. Knowing both would be very useful, particularly when you need to interface Python to some new functionality that doesn't yet have a Python binding.
For the types of projects most developers work on, Python is going to be more concise and quicker to write and debug. You may be able to build a library of reusable C code, but a good Python programmer will be doing the same thing with their Python code, at a higher level.
I think Python is more productive for small projects (up to a few thousand lines of code).
On the other hand, C is better suited for large projects (even though IMHO there are better languages for that, such as Ada): static type-checking allows you to find many errors at compile time that are much more difficult to detect at run time, especially in a large program.
In a larger C project, the lack of lists and other powerful data structures that are found in Python can be compensated by implementing or using custom libraries. I agree with user stacker that by using well-designed libraries your C code can be pretty concise.
Depends greatly on the task and the size of the project. For many small interesting tasks, I would not be surprised by 100:1 smaller Python code simply because the standard libraries are extremely good. If you find, buy, or build C/C++ libraries that do what you want, I imagine the ratio would be much more like 3:1 on big projects.
However, finding, buying, and building C/C++ libraries does take time and effort, so I believe in the vast majority of cases, Python is going to be much faster to develop in.

Is there any purpose for a python application use C other than performance?

If Python were as fast as C, would the latter still be present in Python apps/libraries?
Example: if Python were as fast as C, would PIL be written completely in Python?
To access "legacy" C libraries and OS facilities.
While you can of course use ctypes to access existing C code, you might not necessarily want to in sufficiently complex cases: when you're coding to an interface designed for (and implemented in) C, skipping the compilation step means that small errors on the caller's side, which would simply refuse to compile in C, can result in crashes of the whole application.
So, using C code (rather than ctypes) for the purpose of reusing good existing C code can make a lot of sense (Cython is fine too, of course, since it does generate C code that, in case of caller-side errors, should fail to compile;-).
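A sketch of that failure mode with a function everyone has, libc's strlen (POSIX assumed):

    import ctypes

    libc = ctypes.CDLL(None)  # POSIX: load the process's C library symbols
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t

    print(libc.strlen(b"hello"))  # 5 -- a correct call
    # libc.strlen(5) would raise ctypes.ArgumentError only at run time;
    # with no argtypes declared it could crash outright. The same mistake
    # in C would simply refuse to compile.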
Recoding everything from scratch, rather than reusing good, existing, solid and finely tuned code, doesn't make much sense either way, of course -- there are so many interesting new problems to conquer, that spending your time just mimicking an existing, just-fine solution to an old and already-conquered problem is likely not going to be the best, most productive, and most satisfactory way to spend your time;-).
It makes sense to use C modules in Python for:
Performance
Libraries that won't be ported to Python (because of performance reasons, for example) or that use OS-specific functions
Scripting. For example, many games use Python, Lua and other languages as scripting languages. Therefore they expose C/C++ functions to Python.
As to your example: yes, but in reality Python is inherently slower than C. If both were equally fast, it would make sense to use Python, because C code is often more prone to attacks (buffer overflows and the like).
To access hardware.
