There is a project called JyNI that allows you to run NumPy in Jython. However, I haven't found anything on how to actually get NumPy into Jython. I've tried 'pip install numpy' (which works for normal Python 3.4.3), but it gives an error about a missing py3k module. Does anybody have a bit more information about this?
JyNI does state NumPy support as its main goal, but it cannot do this yet while it is still in an alpha state.
However, until it is mature enough, you can use NumPy via
JEP (https://github.com/mrj0/jep) or
JPY (https://github.com/bcdev/jpy).
Alternatively you can use a Java numerical library for your computation, e.g. one of these:
https://github.com/mikiobraun/jblas
https://github.com/fommil/matrix-toolkits-java
Both are Java libs that do numerical processing natively, backed by BLAS or LAPACK (i.e. the same backends NumPy uses), so the performance should more or less equal that of NumPy. However, as far as I know, they don't feature as nice a multiarray implementation as NumPy does.
If you need NumPy indirectly to fulfil the dependencies of some other framework, these solutions won't do it out of the box. If the dependencies are only marginal, you can maybe rewrite/substitute the corresponding calls based on one of the named projects. Otherwise you'll have to wait for JyNI...
If you can get some framework running on Jython this way, please consider making your work publicly available, ideally as a fork of the framework.
I have a fairly involved setup.py Cython compilation process where I check several things, such as openMP support and the presence or absence of C headers. Specifically, FFTW is a library that computes the FFT and is faster than numpy's FFT, so if fftw3.h is available I compile my module against it; otherwise I fall back to numpy.
I would like to be able to remember how the package was compiled, i.e. whether the compiler supported openMP and which FFT library was used. All of this information is available while setup.py runs, but not later on, and it can be useful: for example, if the user wants to run a function on multiple cores but openMP was not used during compilation, everything will run on one core, and remembering this would allow me to show a nice error.
I am unsure what the best way to do this would be. There are plenty of options, such as writing a file with the data and reading it back when necessary, but is there any standard way to do this? Basically, I'm trying to emulate numpy's show_config, but I don't know how best to go about it.
I have not attempted this, but my suggestion would be to mimic the config.h behaviour one sees with autotools-based builds: your setup.py generates a set of definitions that you either pass via the command line or use via a generated header file, and then you can use this to feed e.g. a compiled extension function that returns an appropriate data structure. But whatever you do: I have not come across a standardized way of doing this.
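I have not tried the following either, but one possible shape for it in pure Python is a tiny module that setup.py writes at build time and that the package imports later to implement its own show_config. All names here (build_config, _build_config.py, mypackage) are hypothetical placeholders, not an established convention.

# In setup.py, after the compile-time probes have run (hypothetical flag values):
build_config = {
    'have_openmp': True,      # result of your openMP probe
    'fft_backend': 'fftw3',   # 'fftw3' if fftw3.h was found, otherwise 'numpy'
}
with open('mypackage/_build_config.py', 'w') as f:
    f.write('# Generated by setup.py; do not edit.\n')
    f.write('build_config = %r\n' % (build_config,))

# In mypackage/__init__.py, mimicking numpy.show_config():
from mypackage._build_config import build_config

def show_config():
    for key, value in sorted(build_config.items()):
        print('%s: %s' % (key, value))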
I'm implementing a real-time LMS algorithm, and numpy.dot takes more time than my sampling period, so I need numpy to be faster (my arrays are 1-D and 100 elements long).
I've read about building numpy with ATLAS and such, but had never done anything like that and spent all day trying to do it, with zero success...
Can someone explain why there aren't builds with ATLAS included? Can anyone provide me with one? Is there any other way to speed up dot product?
I've tried numba, and scipy.linalg.gemm_dot but none of them seemed to speed things up.
My system is Windows 8.1 with an Intel processor.
If you download the official binaries, they should come linked with ATLAS. If you want to make sure, check the output of np.show_config(). The problem is that ATLAS (Automatically Tuned Linear Algebra Software) tries many different combinations and algorithms and keeps the best ones at compile time. So, when you run a precompiled ATLAS, you are running it optimised for a computer different from yours.
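For example (the section names in the output vary by build; this assumes nothing beyond a standard NumPy install):

import numpy as np
np.show_config()  # look for atlas_info, openblas_info or mkl_info sections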
So, your options to improve dot are:
Compile ATLAS yourself. On Windows it may be a bit challenging, but it is doable. Note: you must use the same compiler used to compile Python. That is, if you decide to go for MinGW, you need to get Python compiled with MinGW, or build it yourself.
Try Christoph Gohlke's NumPy builds. They are linked against MKL, which is much faster than ATLAS (and does all the optimisations at run time).
Try Continuum Analytics' Anaconda with the Accelerate add-on (also linked with MKL). It costs money unless you are an academic. On Linux, Anaconda is slower than the system Python because they have to use an old compiler for compatibility purposes; I don't know whether that is the case on Windows.
Use Linux. Your Python life will be much easier; setting up the system to compile things is very easy. Setting up Cython is simple too, and then you can compile your whole algorithm and probably get a further speed-up.
The note regarding Cython is valid for Windows too, it is just more difficult to get it working. I tried a few years ago (when I used Windows) and failed after a few days; I don't know if the situation has improved.
Alternative:
You are doing the dot product of two vectors, so np.dot is probably not the most efficient approach. I would give a shot to spelling it out as (vec1 * vec2).sum() (which could be very good for Numba, as this is an expression it can actually optimise) or to using numexpr:
ne.evaluate('sum(vec1 * vec2)')
Numexpr will also parallelise the expression automatically.
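A minimal sketch comparing the three options, assuming numexpr is installed; vec1 and vec2 are placeholder arrays of length 100, as in the question:

import numpy as np
import numexpr as ne

vec1 = np.random.rand(100)  # placeholder data
vec2 = np.random.rand(100)

d1 = np.dot(vec1, vec2)               # BLAS-backed dot product
d2 = (vec1 * vec2).sum()              # plain NumPy; an expression Numba can optimise well
d3 = ne.evaluate('sum(vec1 * vec2)')  # numexpr; parallelised automatically

print(d1, d2, d3)  # all three agree up to floating-point rounding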
I really want to know how to utilize multi-core processing for matrix multiplication on numpy/pandas.
Here is what I'm trying:
M = pd.DataFrame(...) # super high dimensional square matrix.
A = M.T.dot(M)
This takes a huge amount of processing time because of the many sums of products, and I think it should be straightforward to use multithreading for a huge matrix multiplication. I googled around carefully, but I can't find out how to do that with numpy/pandas. Do I need to write multithreaded code manually with some Python built-in threading library?
In NumPy, multithreaded matrix multiplication can be achieved with a multithreaded implementation of BLAS, the Basic Linear Algebra Subroutines. You need to:
Have such a BLAS implementation; OpenBLAS, ATLAS and MKL all include multithreaded matrix multiplication.
Have a NumPy compiled to use such an implementation.
Make sure the matrices you're multiplying both have a dtype of float32 or float64 (and meet certain alignment restrictions; I recommend using NumPy 1.7.1 or later where these have been relaxed).
A few caveats apply:
Older versions of OpenBLAS, when compiled with GCC, run into trouble in programs that use multiprocessing, which includes most applications that use joblib. In particular, they will hang. The reason is a bug (or lack of a feature) in GCC. A patch has been submitted but is not yet included in the mainline sources.
The ATLAS packages you find in a typical Linux distro may or may not be compiled to use multithreading.
As for Pandas: I'm not sure how it does dot products. Convert to NumPy arrays and back to be sure.
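A minimal sketch of that convert-and-convert-back approach; M is a placeholder DataFrame, and the multithreading itself comes from whatever BLAS your NumPy is linked against:

import numpy as np
import pandas as pd

M = pd.DataFrame(np.random.rand(2000, 2000))  # placeholder for the high-dimensional square matrix

m = M.values  # plain float64 ndarray, so the product goes straight through BLAS (M.to_numpy() on newer pandas)
A = pd.DataFrame(m.T.dot(m), index=M.columns, columns=M.columns)  # wrap the result back into a DataFrame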
First of all, I would also propose converting to NumPy arrays and using NumPy's dot function. If you have access to an MKL build, which is more or less the fastest implementation at the moment, you should try setting the environment variable OMP_NUM_THREADS. This should activate the other cores of your system; on my Mac it seems to work properly. In addition, I would try np.einsum, which seems to be faster than np.dot.
But pay attention! If you have compiled a multithreaded library that uses OpenMP for parallelisation (like MKL), you have to bear in mind that the "default gcc" on all Apple systems is not GCC, it is Clang/LLVM, and Clang is not able to build with OpenMP support at the moment, unless you use the OpenMP trunk, which is still experimental. So you have to install the Intel compiler or any other compiler that supports OpenMP.
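As a minimal sketch of the two suggestions above; the thread count of 4 is an arbitrary placeholder, and OMP_NUM_THREADS has to be set before NumPy loads its BLAS (e.g. in the shell, or before the import as below):

import os
os.environ.setdefault('OMP_NUM_THREADS', '4')  # placeholder thread count; set before importing numpy

import numpy as np

a = np.random.rand(1000, 500)
b = np.random.rand(500, 800)

c1 = a.dot(b)                      # regular BLAS-backed dot product
c2 = np.einsum('ij,jk->ik', a, b)  # the same product expressed with einsum

print(np.allclose(c1, c2))  # True: both compute the same matrix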
We have some Java code we want to use with new code we plan to write in Python, hence our interest in using Jython. However we also want to use numpy and pandas libraries to do complex statistical analysis in this Python code.
Is it possible to call numpy and pandas from Jython?
Keep an eye on JyNI, which is at version alpha.2 as of March 2014.
Not directly.
One option which I've used in the past is jsonrpclib (which works for both) to communicate between CPython and Jython. There's even a built-in server, which makes things quite simple. You'll just need to figure out whether the gains of using numpy are worth the additional overhead.
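A minimal sketch of such a bridge, assuming jsonrpclib is installed on both interpreters; the host, port and dot_product function are placeholders, not part of any framework:

# CPython side: expose a NumPy function over JSON-RPC.
import numpy as np
from jsonrpclib.SimpleJSONRPCServer import SimpleJSONRPCServer

def dot_product(a, b):
    # Arguments arrive as plain lists over JSON; return a JSON-serialisable float.
    return float(np.dot(np.asarray(a), np.asarray(b)))

server = SimpleJSONRPCServer(('localhost', 8080))
server.register_function(dot_product)
server.serve_forever()

# Jython side: call the CPython server.
import jsonrpclib
proxy = jsonrpclib.Server('http://localhost:8080')
print(proxy.dot_product([1, 2, 3], [4, 5, 6]))  # -> 32.0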
Especially if you don't want to use raw NumPy, but other Python frameworks that depend on it, JyNI will be the way to go once it is mature. However, it is not yet capable of importing NumPy.
Until then you can use NumPy from Java by embedding CPython. See the Numpy4J project for this (I haven't tested it myself, though).
You can't use numpy from Jython at this time. But if you're willing to use CPython instead of Jython, there are some open-source Java projects that work with numpy (and presumably pandas):
Jep
jpy
JyNI
I'm about to reinstall numpy and scipy on my Ubuntu Lucid. As these things carry quite a few dependencies, I'm wondering if there is a comprehensive test suite to check if the new install really works.
Of course, I can just take a bunch of my scripts and run them one by one to see if they keep working, but that won't guard against a situation where, at some point in the future, I try to use something I didn't use before and it breaks (or, worse, silently produces nonsense).
Yes. Both packages have a test method for this.
import numpy
numpy.test('full')
import scipy
scipy.test('full')
You will need to have pytest and hypothesis installed to run numpy.test.
Note that binary packages for the mathematical libraries SciPy and NumPy depend on, shipped by Linux distributions, have in some cases been shown to be subtly broken. Running the NumPy and SciPy test suites with numpy.test() and scipy.test() is recommended as a first step to confirm that your installation functions properly. If it doesn't, you may want to try another set of binaries if available, or buy one of the above-mentioned commercial packages.
from http://www.scipy.org/Download