Chapel-Python integration questions

Chapel-Python integration questions - python

I'm trying to see if I can use Chapel for writing parallel code for use in a Python-based climate model:
https://github.com/CliMT/climt
I don't have any experience with Chapel, but it seems very promising for my use-case. I had a few questions about how to integrate Chapel code into my current workflow:
I know you can build importable .so files, but can the compilation stop when the Cython file is generated? I can then include it into the distribution and use standard setuptools to compile my project on Travis.
Can I pass numpy arrays to a Python extension written in Chapel?
If answer to 2. is yes, and my computation is embarassingly parallel in one dimension of the array, is there an elegant way to express this paralellism in Chapel?
If I write Chapel code that works on multiple nodes and compile it to a Python extension, how do I run it? Can I use mpirun python my_code.py kind of a command?

Unfortunately not currently. However, we do leave the generated .pxd and .py(x) files in the directory with the .so, so you could make use of those in the meanwhile (this wasn't a feature request we've considered, so if you felt motivated, definitely feel free to open an issue on our Github page: https://github.com/chapel-lang/chapel/issues).
For reference, we do this because the Cython compilation command is rather tricky. I had thought we printed the Cython command used with the chpl compilation flag --print-commands, but that doesn't look to be the case (I'll make an issue for that).
You can pass 1 dimensional numpy arrays of known primitive types to Chapel from Python. We're hoping to add support for other numpy arrays soon (hopefully in 1.21, slated for March 2020)
This is definitely doable on arrays in Chapel - I would recommend using a forall loop when traversing this dimension of your array for your computation, which will divide the indices in that dimension into a number of tasks determined by Chapel. (For those not familiar with forall loops, this link gives a good overview of the concept)
For example:
forall x in arr.domain.dim(1) {
// traverses the first dimension of arr's domain in parallel
...
}
If you compile your Chapel library into a Python extension with multilocale settings, you can specify the number of locales (nodes) needed using the numlocales argument to the extension's chpl_setup function. Doing so will take care of distributing the Chapel code for you when you run your Python program.
For example, you could write:
import MyChplLib
MyChplLib.chpl_setup(4)
...
to run your program with 4 locales (nodes).
I should probably mention that as of the 1.20 release, we don't have support for array arguments in multilocale libraries. We're still figuring out priorities for the 1.21 release, so feedback on how fast you want that would be super helpful!

Related

How to track the "calling chain" from numpy to C implementation?

I have read the tutorial and API guide of Numpy, and I learned how to extend Numpy with my own C code or how to use C to call Numpy function from this helpful documentation.
However, what I really want to know is: how could I track the calling chain from python code to C implementation? Or i.e. how could I know which part of its C implementation corresponds to this simple numpy array addition?
x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
print(x + y)
Can I use some tools like gdb to track its stack frame step by step?
Or can I directly recognize the corresponding codes from variable naming policy? (like if I want to know the code about addition, I can search for something like function PyNumpyArrayAdd(...) )
[EDIT] I found a very useful video about how to point out the C implementation of these basic C-implemented function or operator overrides like "+" "-".
https://www.youtube.com/watch?v=mTWpBf1zewc
Got this from Andras Deak via Numpy mailing-list.
[EDIT2] There is another way to track all the functions called in Numpy using gdb. It's very heavy because it will display all the functions in Numpy that are called, including these trivial ones. And it might take some time.
First you need to download/clone the Numpy repository to your own working space and then compile it with -g option, which will attach debug informations for debugging.
Then you open a terminal in the "path/to/numpy-main" directory where the setup.py of Numpy lies, and then run gdb.
If you want to know what functions in Numpy's C implementation are called in this single python statement:
y = np.exp(x)
you can set breakpoints on all the functions implemented by Numpy using this gdb python script provided by the first answer here:
Can gdb set break at every function inside a directory?
Once you load this python script by source somename.py, you can run this command in gdb: rbreak-dir numpy/core/src
And you can set commands for each breakpoint:
commands 1-5004
> silent
> bt 1
> c
> end
(here 1-5004 is the range of the breakpoints that you want to run commands on)
Once a breakpoint is activated, this command will run and print the first layer of backtrace (which is the info of the current function you are in) and then continue. In this way, you can track all the functions in Numpy, and this is a pic from my own working environment (I took a snapshot since there are rules preventing copying any byte from working computer):
Hope my trials can help the future comers.

However, what I really want to know is: how could I track the calling chain from python code to C implementation? Or i.e. how could I know which part of its C implementation corresponds to this simple numpy array addition?
AFAIK, there is two main way to do that: using a debugger or by tracking the function in the code (typically by looking the wrapping part or by searching keywords in numpy/core/src/XXX/). Numpy has different kind of functions. Some are focusing more on the CPython interaction part (eg. type checking, array creation, generic iterators, etc.) and some are focusing on the computing part (doing the computation efficiently). Regarding what you want, different files needs to be inspected. core/src/umath/loops.c.src is the way to go for core computing functions doing basic independent math operations.
Can I use some tools like gdb to track its stack frame step by step?
Using a debugger is the common way to do unless you are familiar with the code of Numpy. You can try to find the Numpy entry point function by looking the wrapper code but I think it is a bit difficult as this part of the code is not very readable (many related parts are generated certainly to ease the development of avoid mistakes). The hard part with GDB is to find the first entry point of the function in Numpy (the CPython interpreter function calls are hard to track as they are many of them (sometime called recursively) and the call stack is quite big far from being clear (ie. there is no clear information about the actual statement/expression being executed). That being said, AFAIR, the entry point is often something like PyArray_XXX or array_XXX. You can also track the first function executing code of the Numpy library.
Or can I directly recognize the corresponding codes from variable naming policy?
Some functions have a standardized name like typically PyArray_XXX. That being said, core computing function generally does not. They have a name generated by a template system that parse comments and annotations and generate code based on that. For adding two array, the main computing function should be for example #TYPE#_add#isa# where #TYPE# is either INT or LONG regarding your target platform. There is a special version (ie. specialization) for floating-point numbers that makes use of an optimized pair-wise summation so sake of accuracy. This kind of naming convention is quite frequent though so you can search _add in the code or a begin repeat section with add as a kind parameter.
Related post: Numpy argmax source

Expokit realization on Python

I am looking for a Pythonic realization of Expokit, which is a software package that provides matrix exponential routines for small dense or very large sparse matrices, real or complex, i.e. it finds
w(t) = exp(t*A)*v
This package had been realized in Fortran and Matlab and can be found here https://www.maths.uq.edu.au/expokit/
I have found a python wrapper expokitpy
https://github.com/weinbe58/expokitpy and a Krylov subspace methods package KryPy https://github.com/andrenarchy/krypy. Both seem to be relevant, however neither of them goes with good enough documentation (for me) to do time-evolution.
Does somebody have a working solution with the packages mentioned above or similar?

In case this is still useful to someone, it looks like there was an effort to incorporate expokit within scipy which has now stalled and is looking for somebody to finish. Though here are some instructions to compile with Fortran and then run via Python, with good results.
It seems also to have been adopted by slepc4py, which is then used by quimb, which seems useful if you need it for quantum information (or just use its expm and expm_multiply methods).

Seasonal-Trend-Loess Method for Time Series in Python

Does anyone know if there is a Python-based procedure to decompose time series utilizing STL (Seasonal-Trend-Loess) method?
I saw references to a wrapper program to call the stl function
in R, but I found that to be unstable and cumbersome from the environment set-up perspective (Python and R together). Also, link was 4 years old.
Can someone point out something more recent (e.g. sklearn, spicy, etc.)?

I haven't tried STLDecompose but I took a peek at it and I believe it uses a general purpose loess smoother. This is hard to do right and tends to be inefficient. See the defunct STL-Java repo.
The pyloess package provides a python wrapper to the same underlying Fortran that is used by the original R version. You definitely don't need to go through a bridge to R to get this same functionality! This package is not actively maintained and I've occasionally had trouble getting it to build on some platforms (thus the fork here). But once built, it does work and is the fastest one you're likely to find. I've been tempted to modify it to include some new features, but just can't bring myself to modify the Fortran (which is pre-processed RATFOR - very assembly-language like Fortran, and I can't find a RATFOR preprocessor anywhere).
I wrote a native Java implementation, stl-decomp-4j, that can be called from python using the pyjnius package. This started as a direct port of the original Fortran, refactored to a more modern programming style. I then extended it to allow quadratic loess interpolation and to support post-decomposition smoothing of the seasonal component, features that are described in the original paper but that were not put into the Fortran/R implementation. (They apparently are in the S-plus implementation, but few of us have access to that.) The key to making this efficient is that the loess smoothing simplifies when the points are equidistant and the point-by-point smoothing is done by simply modifying the weights that one is using to do the interpolation.
The stl-decomp-4j examples include one Jupyter notebook demonstrating how to call this package from python. I should probably formalize that as a python package but haven't had time. Quite willing to accept pull requests. ;-)
I'd love to see a direct port of this approach to python/numpy. Another thing on my "if I had some spare time" list.

Here you can find an example of Seasonal-Trend decomposition using LOESS (STL), from statsmodels.
Basicaly it works this way:
from statsmodels.tsa.seasonal import STL
stl = STL(TimeSeries, seasonal=13)
res = stl.fit()
fig = res.plot()

There is indeed:
https://github.com/jrmontag/STLDecompose
In the repo you will find a jupyter notebook for usage of the package.

RSTL is a Python port of R's STL: https://github.com/ericist/rstl. It works pretty well except it is 3~5 times slower than R's STL according to the author.
If you just want to get lowess trend line, you can just use Statsmodels' lowess function
https://www.statsmodels.org/dev/generated/statsmodels.nonparametric.smoothers_lowess.lowess.html.

Object Tracking: MATLAB vs. Python Numpy

I will soon be starting a final year Engineering project, consisting of the real-time tracking of objects moving on a 2D-surface. The objects will be registered by my algorithm using feature extraction.
I am trying to do some research to decide whether I should use MATLAB or use Python Numpy (Numerical Python). Some of the factors I am taking into account:
1.) Experience
I have reasonable experience in both, but perhaps more experience in image processing using Numpy. However, I have always found MATLAB to be very intuitive and easy to pick up.
2.) Real-Time abilities
It is very important that my choice be able to support the real-time acquisition of video data from an external camera. I found this link for MATLAB showing how to do it. I am sure that the same would be possible for Python, perhaps using the OpenCV library?
3.) Performance
I have heard, although never used, that MATLAB can easily split independent calculations across multiple cores. I should think that this would be very useful, and I am not sure whether the same is equally simple for Numpy?
4.) Price
I know that there is a cost associated with MATLAB, but I will be working at a university and thus will have access to full MATLAB without any cost to myself, so price is not a factor.
I would greatly appreciate any input from anyone who has done something similar, and what your experience was.
Thanks!

Python (with NumPy, SciPy and MatPlotLib) is the new Matlab. So I strongly recommend Python over Matlab.
I made the change over a year ago and I am very happy with the results.
Here it is a short pro/con list for Python and Matlab
Python pros:
Object Oriented
Easy to write large and "real" programs
Open Source (so it's completely free to use)
Fast (most of the heavy computation algorithms have a python wrapper to connect with C libraries e.g. NumPy, SciPy, SciKits, libSVM, libLINEAR)
Comfortable environment, highly configurable (iPython, python module for VIM, ...)
Fast growing community of Python users. Tons of documentation and people willing to help
Python cons:
Could be a pain to install (especially some modules in OS X)
Plot manipulation is not as nice/easy as in Matlab, especially 3D plots or animations
It's still a script language, so only use it for (fast) prototyping
Python is not designed for multicore programming
Matlab pros:
Very easy to install
Powerful Toolboxes (e.g. SignalProcessing, Systems Biology)
Unified documentation, and personalized support as long as you buy the licence
Easy to have plot animations and interactive graphics (that I find really useful for running experiments)
Matlab cons:
Not free (and expensive)
Based on Java + X11, which looks extremely ugly (ok, I accept I'm completely biased here)
Difficult to write large and extensible programs
A lot of Matlab users are switching to Python :)

I would recommend python.
I switched from MATLAB -> python about 1/2 way through my phd, and do not regret it. At the most simplistic, python is a much nicer language, has real objects, etc.
If you expect to be doing any parts of your code in c/c++ I would definitely recommend python. The mex interface works, but if your build gets complicated/big it starts to be a pain and I never sorted out how to effectively debug it. I also had great difficulty with mex+allocating large blocks interacting with matlab's memory management (my inability to fix that issue is what drove me to switch).
As a side note/self promotion, I have Crocker-Grier in c++ (with swig wrappers) and pure python.

If you're experienced with both languages it's not really a decision criterion.
Matlab has problems coping with real time settings especially since most computer vision algorithms are very costly. This is the advantage of using a tried and tested library such as OpenCV where many of the algorithms you'll be using are efficiently implemented. Matlab offers the possibility of compiling code into Mex-files but that is a lot of work.
Matlab has parallel for loops parfor which makes multicore processing easy (or at least easier). But the question is if that will suffice to get real-time speeds.
No comment.
The main advantage of Matlab is that you'll obtain a running program very quickly due to its good documentation. But I found that code reusability is bad with Matlab unless you put a heavy emphasis on it.
I think the final decision has to be if you have to/can run your algorithm real-time which I doubt in Matlab, but that depends on what methods you're planning to use.

Others have made a lot of great comments (I've opined on this topic before in another answer https://stackoverflow.com/a/5065585/392949) , but I just wanted to point out that Python has a number of really excellent tools for parallel computing/splitting up work across multiple cores. Here's a short and by no means comprehensive list:
IPython Parallel toolkit: http://ipython.org/ipython-doc/dev/parallel/index.html
mpi4py: https://code.google.com/p/mpi4py
The multiprocessing module in the standard library: http://docs.python.org/library/multiprocessing.html
pyzmq: http://zeromq.github.com/pyzmq/ (what the IPython parallel toolkit is based on)
parallel python (pp): http://www.parallelpython.com/
Cython's wrapping of openmp: http://docs.cython.org/src/userguide/parallelism.html
You will also probably find cython to be much to be a vastly superior tool compared to what Matlab has to offer if you ever need to interface external C-libraries or write C-extensions, and it has excellent numpy support built right in.
There is a list with a bunch of other options here:
http://wiki.python.org/moin/ParallelProcessing

best way to extend python / numpy performancewise

As there are multitude of ways to write binary modules for python, i was hopping those of you with experience could advice on the best approach if i wish to improve the performance of some segments of the code as much as possible.
As i understand, one can either write an extension using the python/numpy C-api, or wrap some already written pure C/C++/Fortran function to be called from the python code.
Naturally, tools like Cython are the easiest way to go, but i assume that writing the code by hand gives better control and provide better performance.
The question, and it may be to general, is which approach to use. Write a C or C++ extension? wrap external C/C++ functions or use callback to python functions?
I write this question after reading chapter 10 in Langtangen's "Python scripting for computational science" where there is a comparison of several methods to interface between python and C.

I would say it depends on your skills/experience and your project.
If this is very ponctual and you are profficient in C/C++ and you have already written python wrapper, then write your own extension and interface it.
If you are going to work with Numpy on other project, then go for the Numpy C-API, it's extensive and rather well documented but it is also quite a lot of documentation to process.
At least I had a lot of difficulty processing it, but then again I suck at C.
If you're not really sure go Cython, far less time consuming and the performance are in most cases very good. (my choice)
From my point of view you need to be a good C coder to do better than Cython with the 2 previous implementation, and it will be much more complexe and time consuming.
So are you a great C coder ?
Also it might be worth your while to look into pycuda or some other GPGPU stuff if you're looking for performance, depending on your hardware of course.

A good comparison of several different approaches can be found here. I have tried both cython, and wrapping my own fortran code using f2py. I found that f2py was the better way to go for my purposes. This was partly influenced by the fact that I understand fortran, but honestly modern dialects such as fortran 90 will look reasonably similar to python code using numpy and shouldn't be that hard to pick up.
With cython you start with the slow, pure python code, then you have to go through a tedious process of instrumenting your code, finding out where all the calls to the python API are, and entering the relevant cython keywords at the right places to turn it into faster C code. With fortran, you just write normal code and you are already getting full compiled speed without going a messy iterative process.
In addition, certain array operations in cython still result in slow calls to the Python API, particularly those involved in slice operations. In constrast, arrays in fortran are native types which the compiler understand and can optimise. Having said that, cython is advancing pretty rapidly, so this may change in the future.
The biggest downside I have found with f2py is that it doesn't support arrays of derived types (analogous to numpy's recarray). There was some hope that fwrap would be a replacement for f2py which would solve this issues, but it appears to be on the backburner at the moment. Incidentally it is based on cython.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.