Instrumentation for numerical linear algebra in Python - python

I use numpy for numerical linear algebra. I suspect that I can get much better performance if I make small modifications in how I carry out certain computations so that they are more memory efficient, for example.
I was wondering if there is any form of instrumentation available in python to detect cache and TLB misses. There is a very nice api, PAPI, that I learned about in a recent class but it doesn't have a Python interface:
http://icl.cs.utk.edu/papi/overview/index.html
Also, is there a good way in general to profile numpy or other python numerical code? The timeit module is hard to integrate into code. mpi4py has a nice way to profile using the MPE library. A snippet from demo code (demo/mpe-logging/cpilog.py):
communication = MPE.newLogState("Comunicate", "red")
with communication:
comm.Bcast([n, MPI.INT], root=0)
A log file is created that can be displayed graphically. But this is a bit MPI specific.
Thanks.

Robert Kern (one of the NumPy devs) wrote line_profiler for exactly this scenario. It is more suited to profiling NumPy-heavy code than hotspot/cProfile.

Maybe one of the provided profilers might help you find the hotspots?
see profiling python
These will probably not give enough detail to trigger direct action, but should indicate where to look for improvement and help to determine the point of diminishing returns.

Related

Seasonal-Trend-Loess Method for Time Series in Python

Does anyone know if there is a Python-based procedure to decompose time series utilizing STL (Seasonal-Trend-Loess) method?
I saw references to a wrapper program to call the stl function
in R, but I found that to be unstable and cumbersome from the environment set-up perspective (Python and R together). Also, link was 4 years old.
Can someone point out something more recent (e.g. sklearn, spicy, etc.)?
I haven't tried STLDecompose but I took a peek at it and I believe it uses a general purpose loess smoother. This is hard to do right and tends to be inefficient. See the defunct STL-Java repo.
The pyloess package provides a python wrapper to the same underlying Fortran that is used by the original R version. You definitely don't need to go through a bridge to R to get this same functionality! This package is not actively maintained and I've occasionally had trouble getting it to build on some platforms (thus the fork here). But once built, it does work and is the fastest one you're likely to find. I've been tempted to modify it to include some new features, but just can't bring myself to modify the Fortran (which is pre-processed RATFOR - very assembly-language like Fortran, and I can't find a RATFOR preprocessor anywhere).
I wrote a native Java implementation, stl-decomp-4j, that can be called from python using the pyjnius package. This started as a direct port of the original Fortran, refactored to a more modern programming style. I then extended it to allow quadratic loess interpolation and to support post-decomposition smoothing of the seasonal component, features that are described in the original paper but that were not put into the Fortran/R implementation. (They apparently are in the S-plus implementation, but few of us have access to that.) The key to making this efficient is that the loess smoothing simplifies when the points are equidistant and the point-by-point smoothing is done by simply modifying the weights that one is using to do the interpolation.
The stl-decomp-4j examples include one Jupyter notebook demonstrating how to call this package from python. I should probably formalize that as a python package but haven't had time. Quite willing to accept pull requests. ;-)
I'd love to see a direct port of this approach to python/numpy. Another thing on my "if I had some spare time" list.
Here you can find an example of Seasonal-Trend decomposition using LOESS (STL), from statsmodels.
Basicaly it works this way:
from statsmodels.tsa.seasonal import STL
stl = STL(TimeSeries, seasonal=13)
res = stl.fit()
fig = res.plot()
There is indeed:
https://github.com/jrmontag/STLDecompose
In the repo you will find a jupyter notebook for usage of the package.
RSTL is a Python port of R's STL: https://github.com/ericist/rstl. It works pretty well except it is 3~5 times slower than R's STL according to the author.
If you just want to get lowess trend line, you can just use Statsmodels' lowess function
https://www.statsmodels.org/dev/generated/statsmodels.nonparametric.smoothers_lowess.lowess.html.

How to use pypy with python existing calculation code?

I am little bit new to python and I have a large code base written in python 3.3.2 (32 bit). It uses numpy 1.7.1 and takes a very long time to run because of computationally intensive calculations.
I need to parallelize the code to increase the performance. I am thinking about using pypy to parallelize but am unsure how to use it with existing code.
I have search Google but couldn't find and appropriate or satisfactory answer. I also read about using cython but I am unsure how to use that as well.
Could anyone provide pointers on increasing the performance of my code?
Since you're new to Python, I highly recommend taking the time to survey all of the possibilities before jumping into something pypy which may not be appropriate for your needs. There are lots of ways to speed up NumPy code, and the best way really depends on exactly what you're doing.
A great starting off point is Ian Oszvald's High Performance Python tutorial. Don't just watch it: follow along and try out the examples!
From there, you should think about whether parallelization will help. There are several options for parallelization, like the stdlib's multiprocessing module, but a lot of people in the scientific space are using IPython's parallelization capabilities. For learning about this, check out Min Ragan-Kelley's IPython Parallel tutorial (pt 1, pt 2, pt 3).
Once you have some sense of what Python is capable of, pick a method for speeding up your code and try it out. When you run into more specific problems, StackOverflow will be able to provide more concrete answers than just some links to tutorials ;)

Object Tracking: MATLAB vs. Python Numpy

I will soon be starting a final year Engineering project, consisting of the real-time tracking of objects moving on a 2D-surface. The objects will be registered by my algorithm using feature extraction.
I am trying to do some research to decide whether I should use MATLAB or use Python Numpy (Numerical Python). Some of the factors I am taking into account:
1.) Experience
I have reasonable experience in both, but perhaps more experience in image processing using Numpy. However, I have always found MATLAB to be very intuitive and easy to pick up.
2.) Real-Time abilities
It is very important that my choice be able to support the real-time acquisition of video data from an external camera. I found this link for MATLAB showing how to do it. I am sure that the same would be possible for Python, perhaps using the OpenCV library?
3.) Performance
I have heard, although never used, that MATLAB can easily split independent calculations across multiple cores. I should think that this would be very useful, and I am not sure whether the same is equally simple for Numpy?
4.) Price
I know that there is a cost associated with MATLAB, but I will be working at a university and thus will have access to full MATLAB without any cost to myself, so price is not a factor.
I would greatly appreciate any input from anyone who has done something similar, and what your experience was.
Thanks!
Python (with NumPy, SciPy and MatPlotLib) is the new Matlab. So I strongly recommend Python over Matlab.
I made the change over a year ago and I am very happy with the results.
Here it is a short pro/con list for Python and Matlab
Python pros:
Object Oriented
Easy to write large and "real" programs
Open Source (so it's completely free to use)
Fast (most of the heavy computation algorithms have a python wrapper to connect with C libraries e.g. NumPy, SciPy, SciKits, libSVM, libLINEAR)
Comfortable environment, highly configurable (iPython, python module for VIM, ...)
Fast growing community of Python users. Tons of documentation and people willing to help
Python cons:
Could be a pain to install (especially some modules in OS X)
Plot manipulation is not as nice/easy as in Matlab, especially 3D plots or animations
It's still a script language, so only use it for (fast) prototyping
Python is not designed for multicore programming
Matlab pros:
Very easy to install
Powerful Toolboxes (e.g. SignalProcessing, Systems Biology)
Unified documentation, and personalized support as long as you buy the licence
Easy to have plot animations and interactive graphics (that I find really useful for running experiments)
Matlab cons:
Not free (and expensive)
Based on Java + X11, which looks extremely ugly (ok, I accept I'm completely biased here)
Difficult to write large and extensible programs
A lot of Matlab users are switching to Python :)
I would recommend python.
I switched from MATLAB -> python about 1/2 way through my phd, and do not regret it. At the most simplistic, python is a much nicer language, has real objects, etc.
If you expect to be doing any parts of your code in c/c++ I would definitely recommend python. The mex interface works, but if your build gets complicated/big it starts to be a pain and I never sorted out how to effectively debug it. I also had great difficulty with mex+allocating large blocks interacting with matlab's memory management (my inability to fix that issue is what drove me to switch).
As a side note/self promotion, I have Crocker-Grier in c++ (with swig wrappers) and pure python.
If you're experienced with both languages it's not really a decision criterion.
Matlab has problems coping with real time settings especially since most computer vision algorithms are very costly. This is the advantage of using a tried and tested library such as OpenCV where many of the algorithms you'll be using are efficiently implemented. Matlab offers the possibility of compiling code into Mex-files but that is a lot of work.
Matlab has parallel for loops parfor which makes multicore processing easy (or at least easier). But the question is if that will suffice to get real-time speeds.
No comment.
The main advantage of Matlab is that you'll obtain a running program very quickly due to its good documentation. But I found that code reusability is bad with Matlab unless you put a heavy emphasis on it.
I think the final decision has to be if you have to/can run your algorithm real-time which I doubt in Matlab, but that depends on what methods you're planning to use.
Others have made a lot of great comments (I've opined on this topic before in another answer https://stackoverflow.com/a/5065585/392949) , but I just wanted to point out that Python has a number of really excellent tools for parallel computing/splitting up work across multiple cores. Here's a short and by no means comprehensive list:
IPython Parallel toolkit: http://ipython.org/ipython-doc/dev/parallel/index.html
mpi4py: https://code.google.com/p/mpi4py
The multiprocessing module in the standard library: http://docs.python.org/library/multiprocessing.html
pyzmq: http://zeromq.github.com/pyzmq/ (what the IPython parallel toolkit is based on)
parallel python (pp): http://www.parallelpython.com/
Cython's wrapping of openmp: http://docs.cython.org/src/userguide/parallelism.html
You will also probably find cython to be much to be a vastly superior tool compared to what Matlab has to offer if you ever need to interface external C-libraries or write C-extensions, and it has excellent numpy support built right in.
There is a list with a bunch of other options here:
http://wiki.python.org/moin/ParallelProcessing

Just Curious about Python+Numpy to Realtime Gesture Recognition

i 'm just finish labs meeting with my advisor, previous code is written in matlab and it run offline mode not realtime mode, so i decide to convert to python+numpy (in offline version) but after labs meeting, my advisor raise issue about speed of realtime recognition, so i have doubt about speed of python+numpy to do this project. or better in c? my project is about using electronic glove (2x sensors) to get realtime data and do data processing, recognition process
NumPy is very fast if you follow some basic rules. You should avoid Python loops, using the operators provided by NumPy instead whenever you can. This and this should be a good starting points.
After reading through that, why don't you write some simple code in both Matlab and NumPy and compare the performance? If it performs well in NumPy, it should be enough to convince your advisor, especially if the code is representative of the actual algorithms you are using in your project.
Note: you should also see that your algorithm really is suited for realtime recognition.
I think the answer depends on three things: how well you code in Matlab, how well you code in Python/Numpy, and your algorithm. Both Matlab and Python can be fast for number crunching if you're diligent about vectorizing everything and using library calls.
If your Matlab code is already very good I would be surprised if you saw much performance benefit moving to Numpy unless there's some specific idiom you can use to your advantage. You might not even see a large benefit moving to C. I this case your effort would likely be better spent tuning your algorithm.
If your Matlab code isn't so good you could 1) write better Matlab code, 2) rewrite in good Numpy code, or 3) rewrite in C.
You might look at OpenCV, which has Python libs
ctypes-opencv
and opencv-cython;
I haven't used these myself.
Ideally you want to combine a fast-running C inner loop
with a flexible Python/Numpy play-with-algorithms.
Bytheway google "opencv gesture recognition" → 6680 hits.

When Does It Make Sense To Rewrite A Python Module in C?

In a game that I am writing, I use a 2D vector class which I have written to handle the speeds of the objects. This is called a large number of times every frame as there are a lot of objects on the screen, so any increase I can make in its speed will be useful.
It is pretty simple, consisting mostly of wrappers to the related math functions. It would be quite trivial to rewrite in C, but I am not sure whether doing so will make any significant difference as all it really does is call the underlying math functions, add, multiply or divide.
So, my question is under what circumstances does it make sense to rewrite in C? Where will you see a significant speed boost, and where can you see a reasonable speed boost without rewriting an extensive amount of the program?
If you're vector-munging, give numpy a try first. Chances are you will get speeds not far from C if you utilize numpy's vector manipulation functions wisely.
Other than that, your question is very heuristic. If your code is too slow:
Profile it - chances are you'll be able to improve it in Python
Use the correct optimized C-based libraries (numpy in your case)
Try psyco
Try rewriting parts with cython
If all else fails, rewrite in C
First measure then optimize
You should never optimize anything, be it in C or any other language, without timing your code before and after your optimization:
your clever optimization could in fact induce a slow down
optimizing something that takes 1% of the total execution time will never give you more than 1% performance
The common approach is:
profile your code
identify a hotspot
time this hotspot
optimize it
time the hotspot again, see if it's faster. If it's not goto 3.
If you can't find hotspots it could mean that your app is already optimized, or that you are not using the good algorithm for your problem. In both cases profiling helps understanding what your code does.
For profiling python code under Linux, you can use pyprof2calltree which works in conjunction with kcachegrind, and is totally awesome.
Common wisdom is "profile", "measure", etc. Well - maybe. Just get in the debugger and take 10 stackshots. If more than one of them terminates in your wrapper code, then it is costing more than 10% roughly, so you should consider re-doing it in C, to save that time. Chances are you will find other things also that are costing more than that.
A nice Profiler I use on Linux is pycallgraph - however, as your program gets bigger it starts to create much larger images which are harder to trace. I'm pretty sure you can exclude modules, though.

Categories

Resources