I tried scipy.interpolate.RegularGridInterpolator but MATLAB and python give me results with tiny different (For example: python: -151736.1266937256 MATLAB: -151736.1266989708). And I do care about those different decimals.
Those two functions are equivalent. However, MATLAB's griddedInterpolant has multiple interpolation methods, whilst RegularGridInterpolator only seems to support linear and nearest. With MATLAB, this gives you more possibilities to choose a proper method given your data.
Your two results seem to be accurate to the 12th digit, which in most cases is a good accuracy. The difference between the results is probably due to different implementations of the interpolation method.
If you want accuracy beyond the 12th digit you should rescale your problem so that you only consider the decimals.
Related
I am working on a project related to gravitational lensing, for which I need to evaluate the confluent hypergeometric function 1F1(a,b,z) for an array z of length ~ 10^8 complex points, a = 1+0.48j and b = 1. I am looking for an efficient way to evaluate this on large array sizes. The scipy implementation is fast but does not accept complex arguments for a and b.
mpmath seems to be the best way to calculate 1F1 for complex parameters but mpmath.hyp1f1 does not accept array values. The best workaround I found for this was to use np.vectorize or np.frompyfunc to allow passing a NumPy array as a parameter. However, this is extremely slow and would take days to execute (even with gmpy2 installed). I assume this is because mpmath functions are always slow on large array sizes.
a nonpython implementation would be fine as well, as long as I can somehow save the result on disk and read it into my python code. I have seen some implementations (for example https://www.math.ucla.edu/~mason/research/pearson_final.pdf) which could possibly work but I'm not sure.
Another possible way would be to interpolate the function
(consecutive points in my input array are extremely close) but I'm not sure what would be the best way to do that.
Thanks!
I was having a very similar problem than you have.
I figured out that the mpmath package has a "hidden" set of function with (only) float precision, which one can access by writing fp. upfront. This does not exist for hyp1f1 but for the more general hyper. Meaning there is a fp.hyper in the mpmath package which is with fp.hyper([a],[b],z) equivalent to hyper1f1(a,b,z), but is a lot faster.
If you vectorize this with np.vectorize this should make your calculation substansially faster.
Disclaimer: I got an error message saying that some complex value is converted to real by dropping the imaginary part when evaluating this, but so far the results i have gotten seem sensible and compatible to the hyper1f1(a,b,z) values.
Added: It seems that fp.hyper does not like getting numpy datatypes even if they are scalars, as in the case of a,b,z beeing numpy scalars (for example one element of an numpy array) it will simply return 1 without giving an error message independent of the actual input. If you use np.vectorize however everything should be fine.
Eitherway: Use at own risc.
I have a relatively complicated function and I have calculated the analytical form of the Jacobian of this function. However, sometimes, I mess up this Jacobian.
MATLAB has a nice way to check for the accuracy of the Jacobian when using some optimization technique as described here.
The problem though is that it looks like MATLAB solves the optimization problem and then returns if the Jacobian was correct or not. This is extremely time consuming, especially considering that some of my optimization problems take hours or even days to compute.
Python has a somewhat similar function in scipy as described here which just compares the analytical gradient with a finite difference approximation of the gradient for some user provided input.
Is there anything I can do to check the accuracy of the Jacobian in MATLAB without having to solve the entire optimization problem?
A laborious but useful method I've used for this sort of thing is to check that the (numerical) integral of the purported derivative is the difference of the function at the end points. I have found this more convenient than comparing fractions like (f(x+h)-f(x))/h with f'(x) because of the difficulty of choosing h so that on the one hand h is not so small that the fraction is not dominated by rounding error and on the other h is small enough that the fraction should be close to f'(x)
In the case of a function F of a single variable, the assumption is that you have code f to evaluate F and fd say to evaluate F'. Then the test is, for various intervals [a,b] to look at the differences, which the fundamental theorem of calculus says should be 0,
Integral{ 0<=x<=b | fd(x)} - (f(b)-f(a))
with the integral being computed numerically. There is no need for the intervals to be small.
Part of the error will, of course, be due to the error in the numerical approximation to the integral. For this reason I tend to use, for example, and order 40 Gausss Legendre integrator.
For functions of several variables, you can test one variable at a time. For several functions, these can be tested one at a time.
I've found that these tests, which are of course by no means exhaustive, show up the kinds of mistakes that occur in computing derivatives quire readily.
Have you considered the usage of Complex step differentiation to check your gradient? See this description
Before I attempt to reinvent any wheels, has someone already made a Python class for piecewise polynomials using numpy.poly1d (or the like)? It should take a list of discontinuities and a list of polynomials, and allow all the operations of poly1d, or at least (for my purpose) add, subtract, multiply and integrate; I need these as I mean to use piecewise quadratics as basis functions for a least squares fit (to piecewise continuous input rather than discrete input).
(numpy.piecewise isn't it: it does not create a persistent object.)
Update: Oops, I obtained piecepoly.py somehow in November 2010. (No sign of authorship. The code is very similar to what I wrote today, but there is a comment "The functions are considered left-continuous," which is not something I'd say.)
I am doing feature scaling on my data and R and Python are giving me different answers in the scaling. R and Python give different answers for the many statistical values:
Median:
Numpy gives 14.948499999999999 with this code:np.percentile(X[:, 0], 50, interpolation = 'midpoint').
The built in Statistics package in Python gives the same answer with the following code: statistics.median(X[:, 0]).
On the other hand, R gives this results 14.9632 with this code: median(X[, 1]). Interestingly, the summary() function in R gives 14.960 as the median.
A similar difference occurs when computing the mean of this same data. R gives 13.10936 using the built-in mean() function and both Numpy and the Python Statistics package give 13.097945407088607.
Again, the same thing happens when computing the Standard Deviation. R gives 7.390328 and Numpy (with DDOF = 1) gives 7.3927612774052083. With DDOF = 0, Numpy gives 7.3927565984408936.
The IQR also gives different results. Using the built-in IQR() function in R, the given results is 12.3468. Using Numpy with this code: np.percentile(X[:, 0], 75) - np.percentile(X[:, 0], 25) the results is 12.358700000000002.
What is going on here? Why are Python and R always giving different results? It may help to know that my data has 795066 rows and is being treated as an np.array() in Python. The same data is being treated as a matrix in R.
tl;dr there are a few potential differences in algorithms even for such simple summary statistics, but given that you're seeing differences across the board and even in relatively simple computations such as the median, I think the problem is more likely that the values are getting truncated/modified/losing precision somehow in the transfer between platforms.
(This is more of an extended comment than an answer, but it was getting awkwardly long.)
you're unlikely to get much farther without a reproducible example; there are various ways to create examples to test hypotheses for the differences, but it's better if you do so yourself rather than making answerers do it.
how are you transferring data to/from Python/R? Is there some rounding in the representation used in the transfer? (What do you get for max/min, which should be based on a single number with no floating-point computations? How about if you drop one value to get an odd-length vector and take the median?)
medians: I was originally going to say that this could be a function of different ways to define quantile interpolation for an even-length vector, but the definition of the median is somewhat simpler than general quantiles, so I'm not sure. The differences you're reporting above seem way too big to be driven by floating-point computation in this case (since the computation is just an average of two values of similar magnitude).
IQRs: similarly, there are different possible definitions of percentiles/quantiles: see ?quantile in R.
median() vs summary(): R's summary() reports values at reduced precision (often useful for a quick overview); this is a common source of confusion.
mean/sd: there are some possible subtleties in the algorithm here -- for example, R sorts the vector before summing uses extended precision internally to reduce instability, I don't know if Python does or not. However, this shouldn't make as big a difference as you're seeing unless the data are a bit weird:
x <- rnorm(1000000,mean=0,sd=1)
> mean(x)
[1] 0.001386724
> sum(x)/length(x)
[1] 0.001386724
> mean(x)-sum(x)/length(x)
[1] -1.734723e-18
Similarly, there are more- and less-stable ways to compute a variance/standard deviation.
I was trying to port one code from python to matlab, but I encounter one inconsistence between numpy fft2 and matlab fft2:
peak =
4.377491037053e-223 3.029446976068e-216 ...
1.271610790463e-209 3.237410810582e-203 ...
(Large data can't be list directly, it can be accessed here:https://drive.google.com/file/d/0Bz1-hopez9CGTFdzU0t3RDAyaHc/edit?usp=sharing)
Matlab:
fft2(peak) --(sample result)
12.5663706143590 -12.4458341615690
-12.4458341615690 12.3264538927637
Python:
np.fft.fft2(peak) --(sample result)
12.56637061 +0.00000000e+00j -12.44583416 +3.42948517e-15j
-12.44583416 +3.35525358e-15j 12.32645389 -6.78073635e-15j
Please help me to explain why, and give suggestion on how to fix it.
The Fourier transform of a real, even function is real and even (ref). Therefore, it appears that your FFT should be real? Numpy is probably just struggling with the numerics while MATLAB may outright check for symmetry and force the solution to be real.
MATLAB uses FFTW3 while my research indicates Numpy uses a library called FFTPack. FFTW is one of the standards for FFT performance and uses a number of tricks to work quickly and perform calculations to the best precision possible. You can incredibly tiny numbers and this offers a number of numerical challenges that any library will be hard pressed to resolve.
You might consider executing the Python code against an FFTW3 wrapper like pyFFTW3 and see if you get similar results.
It appears that your input data is gaussian real and even, in which case we do expect the FFT2 of the signal to be real and even. If all your inputs are this way you could just take the real part. Or round to a certain precision. I would trust MATLAB's FFTW code over the Python code.
Or you could just ignore it. The differences are quite small and a value of 3e-15i is effectively zero for most applications. If you have automated the comparison, consider calling them equivalent if the mean square error of all the entries is less than some threshold (say 1e-8 or 1e-15 or 1e-20).