python geometric mean calculation

How can I calculate the geometric mean of a list of numbers in Python, without numpy, in a safe way that avoids the RuntimeWarning the following expression sometimes produces:
from functools import reduce
from operator import mul
data = [1,2,3,4,5]
result = reduce(mul, data) ** (1 / len(data))
I found out that I can use logarithms to get the same result, but the log function does not accept negative values.
result = 10 ** ((1 / len(data)) * sum(map(math.log10, data)))
Can I map the data with the abs function before mapping it to log10?
Is there a better way?

Generally, the n-th roots of negative numbers are complex numbers.
The code works with cmath's base-e log and exponentiation:
from functools import reduce
import operator
from cmath import log, e
data = [1,2,3,4,5]
rmul = reduce(operator.mul, data) ** (1 / len(data))
rln = e**((1 / len(data)) * sum(list(map(log, data))))
rmul, rln
Out[95]: (2.605171084697352, (2.6051710846973517+0j))
data = [1,2,3,-4,5]
rmul = reduce(operator.mul, data) ** (1 / len(data))
rln = e**((1 / len(data)) * sum(list(map(log, data))))
rmul, rln
Out[96]:
((2.1076276807743737+1.531281143283889j),
(2.1076276807743732+1.5312811432838889j))
some checks:
abs(rln)
Out[97]: 2.6051710846973517
rln**5
Out[98]: (-120.00000000000003-1.4210854715202004e-14j)
for more fun and argument:
'the' square root of a positive value a isn't a single positive number; it is both signed values: +/- sqrt(a)
and 'the' square root of a negative a is similarly both of the values +/- 1j * sqrt(abs(a))

Geometric means with negative numbers are not well-defined. There are several workarounds available, depending on your application. Please see this and also this paper. The main points are:
When all the numbers are negative, you may be able to define a geometric mean by temporarily dropping the signs, taking the geometric mean, and adding the sign back.
If you have a mix of positive and negative numbers and an odd number of them are negative, the geometric mean is undefined. In any case, because you are ignoring the signs, the result is not meaningful.
It may be possible to evaluate the positive and negative parts separately, calculate their means, and then combine them with some weights, as the paper does, but the accuracy will depend on various factors (also described there).
In terms of the code, I do not get a runtime error (see code below). If you can show an example of your code, I can try to reproduce that and update my answer. And yes, you cannot pass negative values to log, so you have to take the absolute values where appropriate (as described above). Note that with Python 2 you have to either import division from the __future__ module or use a floating-point number when taking the fractional power; otherwise you'll get the wrong result.
>>> data = [1,2,3,4,5]
>>> import operator
>>> result = reduce(operator.mul, data) ** (1 / len(data))
>>> result
1
>>> result = reduce(operator.mul, data) ** (1.0 / len(data))
>>> result
2.605171084697352
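For strictly positive data, one way to sidestep both the huge intermediate product and the negative-log problem is to average the logarithms and reject non-positive inputs up front. This is a minimal sketch, assuming that raising an error on non-positive values is acceptable for your application; the function name geometric_mean is only illustrative.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed via logarithms."""
    values = list(values)
    if not values:
        raise ValueError("geometric mean requires at least one value")
    if any(v <= 0 for v in values):
        # Assumption: non-positive inputs are rejected rather than passed through abs().
        raise ValueError("geometric mean is only defined here for positive values")
    # Averaging logs avoids building one very large product.
    # math.fsum keeps the rounding error of the summation small.
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))

print(geometric_mean([1, 2, 3, 4, 5]))
```

On Python 3.8 and later, statistics.geometric_mean in the standard library is a ready-made alternative for positive inputs.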

Related

Modulo with fractions.Fraction class

My aim is to find np.mod(np.array[int], some_number) for a numpy array containing very large integers. Some_number is rational, but in general not an exact decimal fraction. I want to make sure that the modulos are as accurate as possible since I need to bin the results for a histogram in a later step, so any errors due to floating-point precision might mean that values will end up in the wrong bin.
I am aware that the modulo function with floats is limited by floating-point precision, so I am hesitating to use np.mod(array[int], float).
I then came across the fractions module of the python library. Can someone give advice as to whether the results obtained via np.mod(np.array[int], Fraction(int1, int2)) would be more accurate than using a float? If not, what is the best approach for such a problem?

So you have a fraction some_number=n/d
Computing the modulo is like performing this division:
a = q*(n/d) + (r/d)
the remainder is a fraction with numerator r.
It can be written like this:
a*d = q * n + r
The problem you have is that a*d could overflow.
But the problem can be written like this:
a = q1 * n + r1
d = q2 * n + r2
a*d = (q1*q2*n+q1*r2+q2*r1) * n + (r1*r2)
Given that n/d is between 10 and 100, we have n > d, so q2 = 0 and r2 = d, and the algorithm is:
1. compute a modulo n => r1
2. compute (r1*d) modulo n => r
3. divide r by d => a modulo n/d
If it's for putting in bins, you don't need step 3.
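The three steps above can be sketched in plain Python with the fractions module. This is an illustrative sketch, not the answerer's code: the function name mod_rational is hypothetical, and it assumes a is a non-negative integer and n and d are positive integers.

```python
from fractions import Fraction

def mod_rational(a, n, d):
    """Return a mod (n/d) exactly, for integer a and positive integers n, d."""
    r1 = a % n              # step 1: reduce a modulo n
    r = (r1 * d) % n        # step 2: numerator of the remainder, r < n
    return Fraction(r, d)   # step 3: the remainder itself, r/d

# Cross-check against Fraction's own % operator
print(mod_rational(7, 5, 2))
print(Fraction(7) % Fraction(5, 2))
```

Python integers do not overflow, so the two-step reduction matters mainly for fixed-width types such as numpy's int64, where the intermediate product r1*d stays below n*d.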

Stop Approximation in Complex Division in Python

I've been writing some code to list the Gaussian integer divisors of rational integers in Python. (Relating to Project Euler problem 153)
I seem to have reached some trouble with certain numbers and I believe it's to do with Python approximating the division of complex numbers.
Here is my code for the function:
def IsGaussian(z):
    #returns True if the complex number is a Gaussian integer
    return complex(int(z.real), int(z.imag)) == z
def Divisors(n):
    divisors = []
    #Firstly, append the rational integer divisors
    for x in range(1, int(n / 2 + 1)):
        if n % x == 0:
            divisors.append(x)
    #Secondly, two for loops are used to append the complex Gaussian integer divisors
    for x in range(1, int(n / 2 + 1)):
        for y in range(1, int(n / 2 + 1)):
            if IsGaussian(n / complex(x, y)) == n:
                divisors.append(complex(x, y))
                divisors.append(complex(x, -y))
    divisors.append(n)
    return divisors
When I run Divisors(29) I get [1, 29], but this is missing out four other divisors, one of which being (5 + 2j), which can clearly be seen to divide into 29.
On running 29 / complex(5, 2), Python gives (5 - 2.0000000000000004j)
This result is incorrect, as it should be (5 - 2j). Is there any way to bypass Python's approximation? And why has this problem not arisen for many other rational integers under 100?
Thanks in advance for your help.

Internally, CPython uses a pair of double-precision floats for complex numbers. The behavior of numerical solutions in general is too complicated to summarize here, but some error is unavoidable in numerical calculations.
For example:
>>> print(.3/3)
0.09999999999999999
As such, it is often correct to use approximate equality rather than exact equality when testing solutions of this kind.
The isclose function in the cmath module is available for this exact reason.
>>> from cmath import isclose
>>> print(.3/3 == .1)
False
>>> print(isclose(.3/3, .1))
True
This kind of question is the domain of Numerical Analysis; this may be a useful tag for further questions on this subject.
Note that it is considered 'pythonic' for function identifiers to be in snake_case.
from cmath import isclose
def is_gaussian(z):
    #returns True if the complex number is a Gaussian integer
    rounded = complex(round(z.real), round(z.imag))
    return isclose(rounded, z)

You could define an epsilon, by using round to round to the desired number of decimal places/precision (e.g. 10):
def IsGaussian(z, prec=10):
    # returns True if the complex number is a Gaussian integer
    # rounds the input number to the `prec` number of digits
    z = complex(round(z.real,prec), round(z.imag,prec))
    return complex(int(z.real), int(z.imag)) == z
Your code has another issue though:
if IsGaussian(n / complex(x, y)) == n:
This will only give results for n = 0 or n = 1. You probably want to remove the check for equality.
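Floating-point division can be avoided altogether here. Since n / (x + yi) = n(x - yi) / (x^2 + y^2), the quotient is a Gaussian integer exactly when x^2 + y^2 divides both n*x and n*y. The sketch below uses only integer arithmetic. The helper names are hypothetical, and listing only divisors with positive real part is an assumption taken from the Project Euler context.

```python
def divides_exactly(n, x, y):
    """True if the Gaussian integer x + yi divides the rational integer n."""
    norm = x * x + y * y
    # n / (x + yi) = n * (x - yi) / norm, so both parts must be whole numbers
    return norm != 0 and (n * x) % norm == 0 and (n * y) % norm == 0

def gaussian_divisors(n):
    """Gaussian divisors of n with positive real part, as (real, imag) pairs."""
    divisors = []
    for x in range(1, n + 1):
        for y in range(-n, n + 1):
            if divides_exactly(n, x, y):
                divisors.append((x, y))
    return divisors

print(gaussian_divisors(29))
```

Because no floats are involved, no tolerance or rounding needs to be chosen.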

numpy mean of complex numbers with infinities

numpy seems to not be a good friend of complex infinities
While we can evaluate:
In[2]: import numpy as np
In[3]: np.mean([1, 2, np.inf])
Out[3]: inf
The following result is more cumbersome:
In[4]: np.mean([1 + 0j, 2 + 0j, np.inf + 0j])
Out[4]: (inf+nan*j)
...\_methods.py:80: RuntimeWarning: invalid value encountered in cdouble_scalars
ret = ret.dtype.type(ret / rcount)
I'm not sure the imaginary part makes sense to me. But please do comment if I'm wrong.
Any insight into interacting with complex infinities in numpy?

Solution
To compute the mean we divide the sum by a real number. This division causes problems because of type promotion (see below). To avoid type promotion we can manually perform this division separately for the real and imaginary part of the sum:
n = 3
s = np.sum([1 + 0j, 2 + 0j, np.inf + 0j])
mean = np.real(s) / n + 1j * np.imag(s) / n
print(mean) # (inf+0j)
Rationale
The issue is not related to numpy but to the way complex division is performed. Observe that ((1 + 0j) + (2 + 0j) + (np.inf + 0j)) / (3+0j) also results in (inf+nanj).
The result needs to be split into a real and an imaginary part. For division both operands are promoted to complex, even if you divide by a real number. So basically the division is:
a + bj
--------
c + dj
The division operation does not know that d=0. So to split the result into real and imaginary it has to get rid of the j in the denominator. This is done by multiplying numerator and denominator with the complex conjugate:
(a + bj) / (c + dj) = ((a + bj) * (c - dj)) / ((c + dj) * (c - dj)) = (ac + bd + bcj - adj) / (c**2 + d**2)
Now, if a=inf and d=0 the term a * d * j = inf * 0 * j = nan * j.
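The same idea, dividing the real and imaginary parts separately so that no complex division takes place, can be written without numpy. This is a minimal sketch under that assumption; complex_mean is a hypothetical helper name, not a numpy or standard-library function.

```python
import math

def complex_mean(values):
    """Mean of complex numbers, dividing real and imaginary parts separately."""
    values = list(values)
    n = len(values)
    # Plain float sums and float divisions: inf * 0 never appears
    real_sum = sum(v.real for v in values)
    imag_sum = sum(v.imag for v in values)
    return complex(real_sum / n, imag_sum / n)

m = complex_mean([1 + 0j, 2 + 0j, complex(math.inf, 0)])
print(m.real, m.imag)
```

The imaginary part stays 0.0 because the infinite real part is never multiplied by the zero imaginary part of the divisor.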

When you run the function with np.inf in your array, the result will be infinity for np.mean or other functions such as np.max(). But when calculating mean() here, since you have complex numbers, and a complex infinity is defined as an infinite number in the complex plane whose complex argument is unknown or undefined, you get nan*j as the imaginary part.
To get around this problem, you can ignore the infinite items in such mathematical operations. You can use the isfinite() function to detect them and apply the function to the finite items only:
In [16]: arr = np.array([1 + 0j, 2 + 0j, np.inf + 0j])
In [17]: arr[np.isfinite(arr)]
Out[17]: array([ 1.+0.j, 2.+0.j])
In [18]: np.mean(arr[np.isfinite(arr)])
Out[18]: (1.5+0j)

Because of type promotion.
When you do the division of a complex by a real, like (inf + 0j) / 2, the (real) divisor gets promoted to 2 + 0j.
And by complex division, the imaginary part is equal to (0 * 2 - inf * 0) / 4. Note the inf * 0 here which is an indeterminate form, and it evaluates to NaN. This makes the imaginary part NaN.
And back to the topic. When numpy calculates the mean of a complex array, it really doesn't try to do anything clever. First it reduces the array with the "addition" operation, obtaining the sum. After that, the sum is divided by the count. This sum contains an inf in the real part, which causes the trouble described above when the divisor (count) gets promoted from integral type to complex floating point.
Edit: a word about solution
The IEEE floating point "infinity" is really a very primitive construct that represents indeterminate forms like 1 / 0. These forms are not constant numbers, but possible limits. The special inf or NaN "floating point numbers" are placeholders that notify you about the presence of indeterminate forms. They say nothing about the existence or type of the limit, which you must determine from the mathematical context.
Even for real numbers, the underlying limit can depend on how you approach the limit. A superficial 1 / 0 form can go to positive or negative infinity. On the complex plane, things are even more complex (well). For example, you may run into branch cuts and (different kinds of) singularities. There's no universal solution that fits all.
Tl;dr: Fix the underlying problem in the face of ambiguous/incomplete/corrupted data, or prove that the end computational result can withstand such corruption (which can happen).

get the program to recognize if its an integers or a real numbers (python)

Thank you for your help.
I am very new to programming, but have decided to learn Python. I am writing a program that checks whether a number is prime. Mathematically, if every coefficient of (x-1)^p - (x^p-1) is divisible by p (that is, it can be divided with no remainder), then p is prime.
However, I have run into trouble. This is my code so far:
from sympy import *
x=symbols('x')
p=11
f=(pow(x - 1, p)) - (pow(x, p) - 1) # (x-1)^p -(x^p-1)
f1=expand(f)
>>> -11*x**10 + 55*x**9 - 165*x**8 + 330*x**7 - 462*x**6 + 462*x**5 - 330*x**4 + 165*x**3 - 55*x**2 + 11*x
f2= f1/p
>>> -x**10 + 5*x**9 - 15*x**8 + 30*x**7 - 42*x**6 + 42*x**5 - 30*x**4 + 15*x**3 - 5*x**2 + x
To tell whether the number p is prime, I need to check whether the coefficients of the polynomial are divisible by p, so I have to check whether the coefficients of f2 are whole numbers or non-integer real numbers.
This is what I would like my program to check: https://www.youtube.com/watch?v=HvMSRWTE2mI
I have tried converting it to int, but it still shows fractions like 1/2 and 3/7. I want it to show only whole numbers.
How do I do that?

What the method effectively does is expand the polynomial and drop the first (x^p) and last (x^0) coefficients. Then you have to iterate through the rest and check for divisibility. Since a polynomial expansion of power p produces p+1 terms (from 0 to p), we want to collect p-1 terms (from 1 to p-1). This is all summed up in the following code.
from sympy.abc import x
def is_prime_sympy(p):
    poly = pow((x - 1), p).expand()
    # range works on both Python 2 and Python 3
    return not any(poly.coeff(x, i) % p for i in range(1, p))
This works, but the higher the number you input, e.g. 1013, the longer you'll notice it takes. Sympy is slow because internally it stores all expressions as some classes and all multiplications and additions take a long time. We can simply generate the coefficients using Pascal's triangle. For the polynomial (x - 1)^p, the coefficients are supposed to change sign, but we don't care about that. We just want the raw numbers. Credits to Copperfield for pointing out you only need half of the coefficients because of symmetry.
import math
def combination(n, r):
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
def pascals_triangle(row):
    # only generate half of the coefficients because of symmetry;
    # the upper bound row//2 + 1 keeps the middle term for even rows (e.g. row = 4)
    return (combination(row, term) for term in range(1, row // 2 + 1))
def is_prime_math(p):
    return not any(c % p for c in pascals_triangle(p))
We can time both methods now to see which one is faster.
import time
def benchmark(p):
    t0 = time.time()
    is_prime_math(p)
    t1 = time.time()
    is_prime_sympy(p)
    t2 = time.time()
    print('Math: %.3f, Sympy: %.3f' % (t1-t0, t2-t1))
And some tests.
>>> benchmark(512)
Math: 0.001, Sympy: 0.241
>>> benchmark(2003)
Math: 3.852, Sympy: 41.695
We know that 512 is not a prime. The very second term we have to check for divisibility fails the test, so most of the time is actually spent generating the coefficients. Python lazily computes them, while sympy must expand the whole polynomial before we can start collecting them. This shows that a generator approach is preferable.
2003 is prime, and here we notice sympy is about 10 times slower. In fact, all of the time is spent generating the coefficients, as iterating over 2000 elements for a modulo operation takes no time. So if there are any further optimisations, that's where one should focus.
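Since generating the coefficients dominates the running time, one possible optimisation is to build each binomial coefficient from the previous one instead of recomputing three factorials per term. The sketch below relies on the identity C(p, k) = C(p, k-1) * (p - k + 1) / k. The function name is hypothetical, and returning False for p < 2 is an added assumption.

```python
def is_prime_binomial(p):
    """Primality test: p is prime iff p divides C(p, k) for all 0 < k < p."""
    if p < 2:
        return False  # assumption: 0 and 1 are reported as not prime
    c = 1  # C(p, 0)
    # By symmetry only the first half of the row needs checking
    for k in range(1, p // 2 + 1):
        c = c * (p - k + 1) // k  # exact: the product is always divisible by k
        if c % p:
            return False  # stop at the first coefficient not divisible by p
    return True

print([n for n in range(30) if is_prime_binomial(n)])
```

This keeps the early exit of the generator approach and replaces the factorial calls with one multiplication and one division per term.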
numpy.poly1d()
Numpy has a class that can manipulate polynomial coefficients and it's exactly what we want. It even works relatively fast for powers up to 50k. However, in its original implementation it's useless to us. That is because the coefficients are stored as signed int32, which means very quickly they will overflow and our modulo operations will be thrown off. In fact, it'll fail for even 37.
But it's fast, though, right? Maybe if we can hack it so it accepts infinite precision integers... Maybe it's possible, maybe it isn't. But even if it is, we have to consider that maybe the reason why it is so fast is exactly because it uses a fixed precision type under the hood.
For the sake of curiosity, this is what the implementation would look like if it were of any use.
import numpy as np
def is_prime_numpy(p):
    poly = pow(np.poly1d([1, -1]), p)
    return not any(c % p for c in poly.coeffs[1:-1])
And for the curious ones, the source code is located in ...\numpy\lib\polynomial.py.

I am not sure if I understood what you mean, but for checking if a number is an integer or float you can use isinstance:
>>> isinstance(1/2.0, float)
True
>>> isinstance(1//2, float)
False

Checking if float is equivalent to an integer value in python

In Python 3, I am checking whether a given value is triangular, that is, it can be represented as n * (n + 1) / 2 for some positive integer n.
Can I just write:
import math
def is_triangular1(x):
    num = (1 / 2) * (math.sqrt(8 * x + 1) - 1)
    return int(num) == num
Or do I need to do check within a tolerance instead?
epsilon = 0.000000000001
def is_triangular2(x):
    num = (1 / 2) * (math.sqrt(8 * x + 1) - 1)
    return abs(int(num) - num) < epsilon
I checked that both of the functions return same results for x up to 1,000,000. But I am not sure if generally speaking int(x) == x will always correctly determine whether a number is integer, because of the cases when for example 5 is represented as 4.99999999999997 etc.
As far as I know, the second way is the correct one if I do it in C, but I am not sure about Python 3.

There is an is_integer method on Python's float type:
>>> float(1.0).is_integer()
True
>>> float(1.001).is_integer()
False

Both your implementations have problems. It actually can happen that you end up with something like 4.999999999999997, so using int() is not an option.
I'd go for a completely different approach: First assume that your number is triangular, and compute what n would be in that case. In that first step, you can round generously, since it's only necessary to get the result right if the number actually is triangular. Next, compute n * (n + 1) / 2 for this n, and compare the result to x. Now, you are comparing two integers, so there are no inaccuracies left.
The computation of n can be simplified by expanding
(1/2) * (math.sqrt(8*x+1)-1) = math.sqrt(2 * x + 0.25) - 0.5
and utilizing that
round(y - 0.5) = int(y)
for positive y.
def is_triangular(x):
    n = int(math.sqrt(2 * x))
    # use // so that the comparison stays in integer arithmetic on Python 3
    return x == n * (n + 1) // 2

You'll want to do the latter. In Programming in Python 3 the following example is given as the most accurate way to compare
def equal_float(a, b):
    #return abs(a - b) <= sys.float_info.epsilon
    return abs(a - b) <= chosen_value #see edit below for more info
Also, since epsilon is the "smallest difference the machine can distinguish between two floating-point numbers", you'll want to use <= in your function.
Edit: After reading the comments below I have looked back at the book and it specifically says "Here is a simple function for comparing floats for equality to the limit of the machines accuracy". I believe this was just an example for comparing floats to extreme precision but the fact that error is introduced with many float calculations this should rarely if ever be used. I characterized it as the "most accurate" way to compare in my answer, which in some sense is true, but rarely what is intended when comparing floats or integers to floats. Choosing a value (ex: 0.00000000001) based on the "problem domain" of the function instead of using sys.float_info.epsilon is the correct approach.
Thanks to S.Lott and Sven Marnach for their corrections, and I apologize if I led anyone down the wrong path.

Python does have a Decimal class (in the decimal module), which you could use to avoid the imprecision of floats.

Floats can exactly represent every integer up to 2**53 in magnitude; floating-point equality is only tricky if you care about the bits after the point. So, as long as all of the calculations in your formula return whole numbers within that range for the cases you're interested in, int(num) == num is perfectly safe.
So, we need to prove that for any triangular number, every piece of maths you do can be done with integer arithmetic (and anything coming out as a non-integer must imply that x is not triangular):
To start with, we can assume that x must be an integer - this is required in the definition of 'triangular number'.
This being the case, 8*x + 1 will also be an integer, since the integers are closed under + and * .
math.sqrt() returns float; but if x is triangular, then the square root will be a whole number - ie, again exactly represented.
So, for all x that should return true in your functions, int(num) == num will be true, and so your is_triangular1 will always work. The only sticking point, as mentioned in the comments to the question, is that Python 2 by default does integer division in the same way as C - int/int => int, truncating if the result can't be represented exactly as an int. So, 1/2 == 0. This is fixed in Python 3, or by having the line
from __future__ import division
near the top of your code.

I think the module decimal is what you need

You can round your number to e.g. 14 decimal places or less:
>>> round(4.999999999999997, 14)
5.0
PS: double precision is about 15 decimal places

It is hard to argue with standards.
In C99 and POSIX, the standard for rounding a float to an int is defined by nearbyint(). The important concepts are the direction of rounding and the locale-specific rounding convention.
Assuming the convention is common rounding, this is the same as the C99 convention in Python:
#!/usr/bin/python
import math
infinity = math.ldexp(1.0, 1023) * 2
def nearbyint(x):
    """returns the nearest int as the C99 standard would"""
    # handle NaN
    if x!=x:
        return x
    if x >= infinity:
        return infinity
    if x <= -infinity:
        return -infinity
    if x==0.0:
        return x
    return math.floor(x + 0.5)
If you want more control over rounding, consider using the Decimal module and choose the rounding convention you wish to employ. You may want to use Banker's Rounding for example.
Once you have decided on the convention, round to an int and compare to the other int.
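As an illustration of choosing the convention explicitly, the decimal module lets you pass the rounding mode to quantize. This is a small sketch; round_to_int is a hypothetical helper name, and values are passed as strings so that no binary float error enters before rounding.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_to_int(value, rounding):
    """Round a decimal string (or Decimal) to an int using an explicit rule."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=rounding))

print(round_to_int("2.5", ROUND_HALF_UP))    # common rounding: 3
print(round_to_int("2.5", ROUND_HALF_EVEN))  # banker's rounding: 2
```

The same input rounds differently under the two conventions, which is why the convention has to be decided before comparing against an integer.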

Consider using NumPy, which takes care of everything under the hood.
import numpy as np
result_bool = np.isclose(float1, float2)

Python has unlimited integer precision, but only 53 bits of float precision. When you square a number, you double the number of bits it requires. This means that the ULP of the original number is (approximately) twice the ULP of the square root.
You start running into issues with numbers around 50 bits or so, because the difference between the fractional representation of an irrational root and the nearest integer can be smaller than the ULP. Even in this case, checking if you are within tolerance will do more harm than good (by increasing the number of false positives).
For example:
>>> x = (1 << 26) - 1
>>> (math.sqrt(x**2)).is_integer()
True
>>> (math.sqrt(x**2 + 1)).is_integer()
False
>>> (math.sqrt(x**2 - 1)).is_integer()
False
>>> y = (1 << 27) - 1
>>> (math.sqrt(y**2)).is_integer()
True
>>> (math.sqrt(y**2 + 1)).is_integer()
True
>>> (math.sqrt(y**2 - 1)).is_integer()
True
>>> (math.sqrt(y**2 + 2)).is_integer()
False
>>> (math.sqrt(y**2 - 2)).is_integer()
True
>>> (math.sqrt(y**2 - 3)).is_integer()
False
You can therefore rework the formulation of your problem slightly. If an integer x is a triangular number, there exists an integer n such that x = n * (n + 1) // 2. The resulting quadratic is n**2 + n - 2 * x = 0. All you need to know is if the discriminant 1 + 8 * x is a perfect square. You can compute the integer square root of an integer using math.isqrt starting with python 3.8. Prior to that, you could use one of the algorithms from Wikipedia, implemented on SO here.
You can therefore stay entirely in python's infinite-precision integer domain with the following one-liner:
def is_triangular(x):
    return math.isqrt(k := 8 * x + 1)**2 == k
Now you can do something like this:
>>> x = 58686775177009424410876674976531835606028390913650409380075
>>> math.isqrt(k := 8 * x + 1)**2 == k
True
>>> math.isqrt(k := 8 * (x + 1) + 1)**2 == k
False
>>> math.sqrt(k := 8 * x + 1)**2 == k
False
The first result is correct: x in this example is a triangular number computed with n = 342598234604352345342958762349.
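On Python versions before 3.8, where math.isqrt is not available, an integer square root can be computed with Newton's method using integer arithmetic only. The following is a sketch of that standard technique, not the specific implementation the answer links to. The names isqrt_newton and is_triangular_int are hypothetical, and treating negative inputs as non-triangular is an assumption.

```python
def isqrt_newton(n):
    """Largest integer r with r*r <= n, using Newton's method on integers."""
    if n < 0:
        raise ValueError("isqrt_newton() argument must be non-negative")
    if n == 0:
        return 0
    # Start from a power of two that is guaranteed to be >= sqrt(n)
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x  # the sequence stopped decreasing: x is the floor root
        x = y

def is_triangular_int(x):
    """True if x = n*(n+1)//2 for some integer n >= 0."""
    if x < 0:
        return False  # assumption: negative numbers are not triangular
    k = 8 * x + 1
    r = isqrt_newton(k)
    return r * r == k

n = 342598234604352345342958762349
print(is_triangular_int(n * (n + 1) // 2))
```

No floats are involved at any point, so the result does not depend on the 53-bit precision limit discussed above.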

Python still uses the same floating point representation and operations C does, so the second one is the correct way.

Under the hood, Python's float type is a C double.
The most robust way would be to get the nearest integer to num, then test if that integers satisfies the property you're after:
import math
def is_triangular1(x):
    num = (1/2) * (math.sqrt(8*x+1)-1 )
    inum = int(round(num))
    return inum*(inum+1) == 2*x # This line uses only integer arithmetic
