I am working with Python 3.6, and I am really confused: why does this happen?
In [1]: import numpy as np
In [2]: a = np.array(-1)
In [3]: a
Out[3]: array(-1)
In [4]: a ** (1/3)
/Users/wonderful/anaconda/bin/ipython:1: RuntimeWarning: invalid value encountered in power
#!/Users/wonderful/anaconda/bin/python
Out[4]: nan
Numpy does not seem to allow fractional powers of negative numbers, even when the power would not result in a complex number. (I actually ran into this same problem earlier today, unrelatedly.) One workaround is to use
np.sign(a) * (np.abs(a)) ** (1 / 3)
Alternatively, change the dtype to complex numbers:
a = np.array(-1, dtype=complex)
The problem arises when you are working with roots of negative numbers.
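A quick sketch contrasting the two workarounds (note that the complex-dtype route returns the principal cube root, not the real one):

import numpy as np

a = np.array(-1.0)

# real-valued workaround: root of the magnitude, sign restored afterwards
print(np.sign(a) * np.abs(a) ** (1 / 3))       # -1.0

# complex workaround: gives the principal root, not the real root
print(np.array(-1, dtype=complex) ** (1 / 3))  # (0.5+0.8660254037844386j)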
I am solving cumulative probability functions (or equations in general, if you want to think of it that way) with sympy's solveset. So far so good. However, the results come back as "sets", and I am having trouble converting those to, or saving those as, standard Python variable types; in my case I would like a float.
My code is as follows:
import sympy as sp
from sympy import Symbol
from sympy import erf
from sympy import log
from sympy import sqrt
x = Symbol('x')
p = 0.1
# mu and sigma hold numeric values defined earlier (not shown in the post)
sp.solveset((0.5 + 0.5*erf((log(x) - mu)/(sqrt(2)*sigma))) - p)
Out[91]:
FiniteSet(7335.64225447845*exp(-1.77553477605362*sqrt(2)))
Is there a way to convert this to a float? Simply using float() does not work (I have tried it). I have gotten as far as storing the result as a list and then extracting the number again, but that way seems very cumbersome and ill-suited to my purpose. In the end I will solve this equation, say, a thousand times, and I would like to have all the results as a neat array of floating-point numbers.
If you store the above result as follows:
q = sp.solveset((0.5 + 0.5*erf((log(x) - mu)/(sqrt(2)*sigma)))-p)
then Python says the type is sympy.sets.sets.FiniteSet, and if you try to inspect the variable q it gives you an error (working in Spyder, btw):
"Spyder was unable to retrieve the value of this variable from the console - The error message was: 'tuple' object has no attribute 'raise_error'".
I have no idea what that means. Thanks a lot.
The FiniteSet works like a Python set. You can convert it to a list and extract the element by indexing, e.g.:
In [3]: S = FiniteSet(7335.64225447845*exp(-1.77553477605362*sqrt(2)))
In [4]: S
Out[4]: FiniteSet(7335.64225447845*exp(-1.77553477605362*sqrt(2)))
In [5]: list(S)
Out[5]: [7335.64225447845*exp(-1.77553477605362*sqrt(2))]
In [6]: list(S)[0]
Out[6]: 7335.64225447845*exp(-1.77553477605362*sqrt(2))
In [7]: list(S)[0].n()
Out[7]: 595.567591563886
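For the "solve it a thousand times" part, a rough sketch along these lines should work (the mu, sigma and probability values here are made up, since the originals weren't posted):

import numpy as np
import sympy as sp
from sympy import Symbol, erf, log, sqrt

x = Symbol('x')
mu, sigma = 2.0, 0.5                        # hypothetical parameters
probabilities = np.linspace(0.05, 0.95, 10)

# solveset returns a FiniteSet; index into it via list() and convert
# the single element with float()
results = np.array([
    float(list(sp.solveset((0.5 + 0.5*erf((log(x) - mu)/(sqrt(2)*sigma))) - p))[0])
    for p in probabilities
])
print(results)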
I am doing data analysis in Python (Numpy) and R. My data is a 795067 x 3 array, and computing the mean, median, standard deviation, and IQR on it yields different results depending on whether I use Numpy or R. I cross-checked the values, and it looks like R gives the "correct" value.
Median:
Numpy: 14.948499999999999
R: 14.9632
Mean:
Numpy: 13.097945407088607
R: 13.10936
Standard Deviation:
Numpy: 7.3927612774052083
R: 7.390328
IQR:
Numpy: 12.358700000000002
R: 12.3468
Max and min of the data are the same on both platforms. I ran a quick test to better understand what is going on here.
Multiplying 1.2*1.2 in Numpy gives 1.44 (same with R).
Multiplying 1.22*1.22 gives 1.4884 in Numpy and the same with R.
However, multiplying 1.222*1.222 in Numpy gives 1.4932839999999998 which is clearly wrong! Doing the multiplication in R gives the correct answer of 1.49324.
Multiplying 1.2222*1.2222 in Numpy gives 1.4937728399999999 and 1.493773 in R. Once more, R is correct.
In Numpy, the numbers are float64 datatype and they are double in R. What is going on here? Why are Numpy and R giving different results? I know R uses IEEE754 double-precision but I don't know what precision Numpy uses. How can I change Numpy to give me the "correct" answer?
Python
What gets printed is only a decimal rendering of a value; the calculation itself is carried out in the precision specified. Python/numpy uses double-precision floats by default (at least on my 64-bit machine):
import numpy
single = numpy.float32(1.222) * numpy.float32(1.222)
double = numpy.float64(1.222) * numpy.float64(1.222)
pyfloat = 1.222 * 1.222
print(single, double, pyfloat)
# 1.493284 1.4932839999999998 1.4932839999999998
print("%.16f, %.16f, %.16f" % (single, double, pyfloat))
# 1.4932839870452881, 1.4932839999999998, 1.4932839999999998
In an interactive Python/IPython shell, the shell prints the full repr of results:
>>> 1.222 * 1.222
1.4932839999999998
In [1]: 1.222 * 1.222
Out[1]: 1.4932839999999998
R
It looks like R stores the same double-precision value; print rounds it, while sprintf reveals the full result:
print(1.222 * 1.222)
# 1.493284
sprintf("%.16f", 1.222 * 1.222)
# "1.4932839999999998"
In contrast to interactive Python shells, the interactive R shell rounds (to 7 significant digits by default) when printing the results of statements:
> 1.222 * 1.222
[1] 1.493284
Differences between Python and R
The differences in your results could stem from using single-precision values in numpy. Calculations with a lot of additions/subtractions will ultimately make the problem surface:
In [1]: import numpy
In [2]: a = numpy.float32(1.222)
In [3]: a*6
Out[3]: 7.3320000171661377
In [4]: a+a+a+a+a+a
Out[4]: 7.3320003
As suggested in the comments to your actual question, make sure to use double-precision floats in your numpy calculations.
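For instance, a quick check on synthetic data (made up here, since the original data isn't available) shows how the dtype alone shifts the statistics:

import numpy as np

rng = np.random.default_rng(0)
data64 = rng.normal(13.1, 7.4, size=795067)  # float64 by default
data32 = data64.astype(np.float32)

# the float32 results drift in the trailing digits
print(np.mean(data64), np.mean(data32))
print(np.std(data64), np.std(data32))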
Can someone please explain why np.array([1e5])**2 is not equivalent to np.array([100000])**2? Coming from Matlab, I find it confusing!
>>> np.array([1e5])**2
array([ 1.00000000e+10]) # correct
>>> np.array([100000])**2
array([1410065408]) # Why??
I found that this behaviour starts at 1e5, as the code below still gives the right result:
>>> np.array([1e4])**2
array([ 1.00000000e+08]) # correct
>>> np.array([10000])**2
array([100000000]) # and still correct
1e5 is a floating point number, but 10000 is an integer:
In [1]: import numpy as np
In [2]: np.array([1e5]).dtype
Out[2]: dtype('float64')
In [3]: np.array([10000]).dtype
Out[3]: dtype('int64')
But in numpy, integers have a fixed width (as opposed to Python itself, in which they are arbitrary-precision numbers), so they "roll over" when they exceed the maximum allowed value.
(Note that in your case you are using a 32-bit build, so in fact the latter would give you dtype('int32'), which has a maximum value of 2**31-1 = 2,147,483,647, roughly 2e9, which is less than 1e10.)
Your system is defaulting to np.int32, which can't hold 100000**2. If you use a 64-bit integer dtype, you'll be fine:
In [6]: np.array([100000], dtype=np.int32)**2
Out[6]: array([1410065408], dtype=int32)
In [7]: np.array([100000], dtype=np.int64)**2
Out[7]: array([10000000000])
What the default is (32 vs 64) depends on your numpy build.
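To see the wraparound explicitly, here is a small demonstration of the modular arithmetic involved:

import numpy as np

# 100000**2 = 10_000_000_000 needs 34 bits; int32 keeps only the low
# 32 bits of the result, reinterpreted as a signed value
print(10**10 % 2**32)                         # 1410065408
print(np.array([100000], dtype=np.int32)**2)  # [1410065408]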
Why are the math.log10(x) and math.log(x,10) results different?
In [1]: from math import *
In [2]: log10(1000)
Out[2]: 3.0
In [3]: log(1000,10)
Out[3]: 2.9999999999999996
It's a known bug: http://bugs.python.org/issue3724. It seems logX(y) is always more precise than the equivalent log(y, X).
math.log10 and math.log(x, 10) use different algorithms, and the former is usually more accurate. This is actually a known issue (Issue 6765): math.log, log10 inconsistency.
One way to think about it: log10(x) has a fixed base, so it can be computed directly by a mathematical approximation formula (e.g. a Taylor series), while log(x, 10) comes from a more general two-variable formula, which may be calculated indirectly as log(x) / log(10) (so at the very least the precision of log(10) affects the precision of the quotient). It is therefore natural that the former is both faster and more accurate, since it takes advantage of a logarithmic base known in advance (i.e. 10).
As others have pointed out, log(1000, 10) is computed internally as log(1000) / log(10). This can be verified empirically:
In [3]: math.log(1000, 10) == math.log(1000) / math.log(10)
Out[3]: True
In [4]: math.log10(1000) == math.log(1000) / math.log(10)
Out[4]: False
Neither log(1000) nor log(10) can be represented exactly as a float, so the final result is also inexact.
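You can see the inexact intermediates directly:

import math

print(math.log(1000))                 # 6.907755278982137 (not exact)
print(math.log(10))                   # 2.302585092994046 (not exact)
print(math.log(1000) / math.log(10))  # 2.9999999999999996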
I'm trying to compute this:
from scipy import *
3600**3400 * (exp(-3600)) / factorial(3400)
the error: unsupported long and float
Try using logarithms instead of working with the numbers directly. Since none of your operations are addition or subtraction, you could do the whole thing in logarithm form and convert back at the end.
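A minimal sketch of that idea, using math.lgamma for log(3400!) (lgamma is my choice here, not something from the question; any log-factorial would do):

from math import exp, lgamma, log

# log of 3600**3400 * exp(-3600) / 3400!
log_value = 3400 * log(3600) - 3600 - lgamma(3400 + 1)
print(exp(log_value))  # ≈ 2.3793e-05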
Computing with numbers of such magnitude, you just can't use ordinary 64-bit-or-so floats, which is what Python's core runtime supports. Consider gmpy (do not get the sourceforge version, it's aeons out of date) -- with that, math, and some care...:
>>> import math, gmpy
>>> e = gmpy.mpf(math.exp(1))
>>> gmpy.mpz(3600)**3400 * (e**(-3600)) / gmpy.fac(3400)
mpf('2.37929475533825366213e-5')
(I'm biased about gmpy, of course, since I originated and still participate in that project, but I'd never make strong claims about its floating point abilities... I've been using it mostly for integer stuff... still, it does make this computation possible!-).
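If you try this today, the maintained successor is gmpy2; an equivalent sketch, assuming current gmpy2 names:

import gmpy2

gmpy2.get_context().precision = 100  # working precision in bits
value = gmpy2.mpz(3600) ** 3400 * gmpy2.exp(-3600) / gmpy2.fac(3400)
print(value)  # ≈ 2.379294755338e-05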
You could try using the Decimal object. Calculations will be slower but you won't have trouble with really small numbers.
from decimal import Decimal
I don't know how Decimal interacts with the scipy module, however.
This numpy discussion might be relevant.
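A sketch of what the Decimal route could look like (dec_factorial is a hand-rolled helper, since decimal has no factorial built in):

from decimal import Decimal, getcontext

getcontext().prec = 30  # plenty of working digits

def dec_factorial(n):
    # naive Decimal factorial; fast enough for n in the low thousands
    total = Decimal(1)
    for k in range(2, n + 1):
        total *= k
    return total

value = Decimal(3600) ** 3400 * Decimal(-3600).exp() / dec_factorial(3400)
print(value)  # ≈ 2.37929475533825e-5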
Well, the error is coming about because you are trying to multiply
3600**3400
which is a long, with
exp(-3600)
which is a float.
But regardless, the error you are receiving is disguising the true problem: exp(-3600) is too small to be represented as a nonzero float (it underflows to 0.0). The Python math library is, at best, fickle with numbers of extreme magnitude.
exp(-3600) is too small, factorial(3400) is too large:
In [1]: from scipy import exp
In [2]: exp(-3600)
Out[2]: 0.0
In [3]: from scipy import factorial
In [4]: factorial(3400)
Out[4]: array(1.#INF)
What about calculating it step by step as a workaround (it also makes sense to check the smallest and biggest intermediate results):
from math import exp
from itertools import izip

output = 1
smallest = 1e100
biggest = 0
# interleave the large factors (3600) with the small ones (the exp term,
# 1/i, 1/j) so the running product never overflows or underflows
for i, j in izip(xrange(1, 1701), xrange(3400, 1699, -1)):
    output = output * 3600 * exp(-3600.0 / 3400) / i
    output = output * 3600 * exp(-3600.0 / 3400) / j
    smallest = min(smallest, output)
    biggest = max(biggest, output)
print "output: ", output
print "smallest: ", smallest
print "biggest: ", biggest
output is:
output: 2.37929475534e-005
smallest: 2.37929475534e-005
biggest: 1.28724174494e+214