Harmonic mean in Python

The Harmonic Mean function in Python (scipy.stats.hmean) requires that the input be positive numbers.
For example:
from scipy import stats
print(stats.hmean([-50.2, 100.5]))
results in:
ValueError: Harmonic mean only defined if all elements greater than zero
I don't mathematically see why this should be the case, except for the rare instance where you would end up dividing by zero. Instead of checking for a division by zero, hmean() throws an error on any non-positive input, whether or not a harmonic mean can be found.
Am I missing something here in the maths? Or is this really a limitation in SciPy?
How would you go about finding the harmonic mean of a set of numbers which might be positive or negative in Python?

The harmonic mean is only defined for sets of positive real numbers. If you try to compute it for sets containing negatives, you get all kinds of strange and useless results even when you don't hit a division by zero. For example, applying the formula to the set (3, -3, 4) gives a "mean" of 12!
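To see this, here is a quick sketch applying the naive formula n / sum(1/x) to that set:

```python
# Naive harmonic mean: n divided by the sum of reciprocals.
def naive_hmean(xs):
    return len(xs) / sum(1.0 / x for x in xs)

# The 1/3 and -1/3 reciprocals cancel, leaving 3 / (1/4) = 12 --
# larger than every element, which no sensible "mean" should be.
print(naive_hmean([3, -3, 4]))  # 12.0
```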

You can just use the defining equation of the harmonic mean (where a is a NumPy array):
len(a) / np.sum(1.0 / a)
But Wikipedia says the harmonic mean is defined only for positive real numbers:
http://en.wikipedia.org/wiki/Harmonic_mean
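A minimal sketch of that one-liner wrapped in a function (assuming `a` contains no zeros, since each element is inverted):

```python
import numpy as np

def hmean_naive(a):
    """Harmonic mean via the defining equation; assumes no element is zero."""
    a = np.asarray(a, dtype=float)
    return len(a) / np.sum(1.0 / a)

print(hmean_naive([1.0, 2.0, 4.0]))  # 3 / (1 + 0.5 + 0.25) ≈ 1.714
```

Unlike scipy.stats.hmean, this sketch performs no sign check, so it will happily return the misleading values discussed above for mixed-sign input.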

There is a statistics library if you are using Python >= 3.6:
https://docs.python.org/3/library/statistics.html
You may use its harmonic_mean function like this. Say you have a list of numbers whose harmonic mean you want:
import statistics as s
nums = [11, 13, 12, 15, 17]
s.harmonic_mean(nums)
It has other functions too, such as stdev, variance, mode, mean and median, which are also useful.

The mathematical definition of the harmonic mean does not itself forbid negative numbers (although you may not want to calculate the harmonic mean of +1 and -1). However, it is designed to average quantities like rates and ratios, giving equal weight to each data point, whereas under an arithmetic mean the extreme data points would acquire much higher weight, which is undesirable for such data.
So you can either hard-code the definition yourself as @HYRY suggested, or you may have applied the harmonic mean in the wrong context.

Related

Python won't show me np.exp marketshare values

I'm trying to estimate marketshares with the following formula:
c = np.exp(-Mu*a)/(np.exp(-Mu*a)+np.exp(-Mu*b))
in which a and b are 9x9 matrices with cell values that can be larger than 1000. Because the resulting exponentials are so small, Python returns NaN values. In order to enhance the precision of the estimation I have already tried np.float128, but all this does is raise an error saying that numpy doesn't have an attribute called float128. I have also tried longdouble, again without success. Are there other ways to make Python show the actual values of the cells instead of NaN?
You have:
c = np.exp(-Mu*a)/(np.exp(-Mu*a)+np.exp(-Mu*b))
Multiplying the numerator and denominator by e^(Mu*a), you get:
c = 1/(1+np.exp(Mu*(a-b)))
This is just a reformulation of the same formula.
Now, if the exp term is still too small and you do not need a more precise result, then your c is simply very close to 1. And if you still need to control precision, you can take the log of both sides and use the Taylor expansion of log(1+x).
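A scalar sketch of why the reformulation helps, with illustrative values for Mu, a and b (not taken from the question): in the naive form both exponentials underflow to 0, while the reformulated form only exponentiates the bounded difference a - b.

```python
import numpy as np

Mu, a, b = 1.0, 1000.0, 1001.0  # illustrative values

# Naive form: exp(-1000) and exp(-1001) both underflow to 0.0,
# so the quotient is 0/0 = NaN.
with np.errstate(invalid="ignore"):
    naive = np.exp(-Mu * a) / (np.exp(-Mu * a) + np.exp(-Mu * b))

# Reformulated: only a - b = -1 enters the exponential.
stable = 1.0 / (1.0 + np.exp(Mu * (a - b)))

print(naive)   # nan
print(stable)  # ≈ 0.731, i.e. 1 / (1 + e^-1)
```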

Why does numpy.std() use abs()?

I checked the numpy library and found the following definition for the standard deviation in numpy:
std = sqrt(mean(abs(x - x.mean())**2))
Why is the abs() function used? After all, mathematically the square of a number is nonnegative by definition.
So I thought:
abs(x - x.mean())**2 == (x - x.mean())**2
The square of a real number is always nonnegative, but this is not true for complex numbers.
A very simple example: 1j**2 = -1
A more complex (pun intended) example: (3-2j)**2 = (5-12j)
From documentation:
Note that, for complex numbers, std takes the absolute value before squaring, so that the result is always real and nonnegative.
Note:
Python uses j for the imaginary unit, while mathematicians use i.
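A small sketch of the difference for a complex array (the values are chosen for illustration):

```python
import numpy as np

x = np.array([1 + 1j, 1 - 1j])
dev = x - x.mean()                # deviations are +1j and -1j

# Without abs(): the squares are -1 each, so the "variance" is -1
# and its square root comes out imaginary -- useless as a spread.
naive = np.sqrt(np.mean(dev**2))  # 1j

# With abs(): |±1j|**2 = 1, giving a real, nonnegative result.
print(np.std(x))                  # 1.0
```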

How to generate exponential variate without negative numbers in Python?

How to generate exponential variate without negative numbers in Python?
I tried this code, but it generates negative numbers:
>>> import random
>>> int(random.expovariate(0.28))
5
I thought about using if statement, but it'll affect my randomness and my final result.
From the documentation of random.expovariate:
Exponential distribution. lambd is 1.0 divided by the desired mean. It should be nonzero. (The parameter would be called “lambda”, but that is a reserved word in Python.) Returned values range from 0 to positive infinity if lambd is positive, and from negative infinity to 0 if lambd is negative.
If you want non-negative results, use non-negative arguments.
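A quick sketch confirming that with a positive rate the draws are never negative (seed chosen only to make the demo reproducible):

```python
import random

random.seed(0)  # reproducible for the demo
draws = [random.expovariate(0.28) for _ in range(10_000)]

# With lambd > 0, every variate lies in [0, +inf).
print(min(draws) >= 0)  # True
```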

How to prevent division by zero or replace infinite values in Theano?

I'm using a cost function in Theano which involves a regularizer term that requires me to compute this term:
T.sum(c / self.squared_euclidean_distances)
As some values of self.squared_euclidean_distances might be zero, this produces NaN values. How can I work around this problem? I tried to use T.isinf but was not successful. One solution would be to replace the zeros in self.squared_euclidean_distances with a small number, or to replace the infinite values in T.sum(c / self.squared_euclidean_distances) with zero. I just don't know how to replace those values in Theano.
Take a look at T.switch. You could do for example
T.switch(T.eq(self.squared_euclidean_distances, 0), 0, c / self.squared_euclidean_distances)
(Or, upstream, you make sure that you never compare a vector with itself using squared euclidean distance.)
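For readers without a Theano setup, here is a hedged NumPy analogue of the same guard, using np.divide's where/out arguments so the zero entries are never divided at all (the values of c and the distances are illustrative):

```python
import numpy as np

c = 2.0
sq_dists = np.array([0.0, 4.0, 9.0])  # one zero entry, as in the question

# Divide only where the denominator is nonzero; leave 0.0 elsewhere.
safe = np.divide(c, sq_dists, out=np.zeros_like(sq_dists),
                 where=sq_dists != 0)

print(safe)        # [0.  0.5  0.222...]
print(safe.sum())
```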

python value becomes zero, how to prevent

I have a numerical problem while doing likelihood ratio tests in python. I'll not go into too much detail about what the statistics mean, my problems comes down to calculating this:
LR = LR_H0 / LR_H1
where LR is the number of interest and LR_H0 and LR_H1 are numbers that can be VERY close to zero. This leads to a few numerical issues; if LR_H1 is too small then Python will treat this as a division by zero:
ZeroDivisionError: float division by zero
Also, although this is not the main issue, if LR_H1 is small enough to allow the division, then the fraction LR_H0 / LR_H1 might become too big (I'm assuming that Python also has an upper limit on what a float can be).
Any tips on what the best way is to circumvent this problem? I'm considering doing something like:
def small_enough(num):
    if num == 0.0:
        return *other small number*
    else:
        return num
But this is not ideal because it would approximate the LR value and I would like to guarantee some precision.
Work with logarithms. Take the log of all your likelihoods, and add or subtract logarithms instead of multiplying and dividing. You'll be able to work with much greater ranges of values without losing precision.
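A minimal sketch of the log-space approach, with made-up log-likelihoods whose raw probabilities are far below what a float can represent:

```python
import math

# Hypothetical log-likelihoods: the raw values e^-1000.3 and e^-1002.7
# both underflow to 0.0 as floats, so the direct ratio is impossible.
log_LR_H0 = -1000.3
log_LR_H1 = -1002.7

# Division becomes subtraction in log space.
log_LR = log_LR_H0 - log_LR_H1
print(log_LR)            # ≈ 2.4
print(math.exp(log_LR))  # the ratio itself, ≈ 11.02
```

Only exponentiate back at the very end, and only if the final ratio itself fits in a float; otherwise report the log-ratio directly.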
