How to generate exponential variate without negative numbers in Python?
I tried to use this code, but it generates negative numbers:
>>> import random
>>> int(random.expovariate(0.28))
5
I thought about using an if statement, but that would affect the randomness and my final result.
From the documentation of random.expovariate:
Exponential distribution. lambd is 1.0 divided by the desired mean. It should be nonzero. (The parameter would be called “lambda”, but that is a reserved word in Python.) Returned values range from 0 to positive infinity if lambd is positive, and from negative infinity to 0 if lambd is negative.
If you want non-negative results, use non-negative arguments. Note that a positive lambd, like your 0.28, already guarantees results in the range from 0 to positive infinity.
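A quick empirical check (the seed is only there to make the illustration reproducible):
import random

random.seed(1)  # only for a reproducible illustration

# With a positive lambd, expovariate() only returns values >= 0.
samples = [random.expovariate(0.28) for _ in range(100_000)]
print(min(samples) >= 0)            # True
print(sum(samples) / len(samples))  # close to the mean, 1 / 0.28 ≈ 3.57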
I checked the numpy library and found the following definition for the standard deviation in numpy:
std = sqrt(mean(abs(x - x.mean())**2))
Why is the abs() function used? Because, mathematically, the square of a number is positive by definition.
So I thought:
abs(x - x.mean())**2 == (x - x.mean())**2
The square of a real number is always non-negative, but this is not true for complex numbers.
A very simple example: 1j**2 == -1
A more complex (pun intended) example: (3-2j)**2 == (5-12j)
From documentation:
Note that, for complex numbers, std takes the absolute value before squaring, so that the result is always real and nonnegative.
Note:
Python uses j for the imaginary unit (written with a numeric prefix, e.g. 1j), while mathematicians use i.
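A small sketch of the difference, using a made-up complex array:
import numpy as np

x = np.array([3 - 2j, 1 + 1j, -2 + 0j])

# Without abs(): squaring the complex deviations gives a complex "std".
wrong = np.sqrt(np.mean((x - x.mean())**2))

# With abs(): |z|**2 is real and non-negative, so the result is real.
right = np.sqrt(np.mean(np.abs(x - x.mean())**2))

print(wrong)                         # a complex number, not a useful spread
print(right)                         # real and non-negative
print(np.isclose(right, np.std(x)))  # True: matches numpy's own std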
For an introduction to Python course, I'm looking at generating a random floating point number in Python, and I have seen a standard recommended code of
import random
lower = 5
upper = 10
range_width = upper - lower
x = random.random() * range_width + lower
for a random floating-point number from 5 up to but not including 10.
It seems to me that the same effect could be achieved by:
import random
x = random.randrange(5, 10) + random.random()
Since that would give an integer of 5, 6, 7, 8, or 9, and then tack a fractional part onto it.
The question I have is would this second code still give a fully even probability distribution, or would it not keep the full randomness of the first version?
According to the documentation, yes, random() is indeed a uniform distribution.
random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator.
So both code examples should be fine. To shorten your code, you can equally do:
random.uniform(5, 10)
Note that uniform(a, b) is simply a + (b - a) * random() so the same as your first example.
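Since uniform(a, b) is computed as a + (b - a) * random(), seeding the generator identically shows the two expressions agree (a quick sketch):
import random

random.seed(42)
manual = 5 + (10 - 5) * random.random()

random.seed(42)
builtin = random.uniform(5, 10)

print(manual == builtin)  # True: both are 5 + 5 * random()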
The second example depends on the version of Python you're using: prior to 3.2, randrange() could produce a slightly uneven distribution.
There is a difference. Your second method is theoretically superior, although in practice it only matters for large ranges. Indeed, both methods will give you a uniform distribution. But only the second method can return all values in the range that are representable as a floating point number.
Since your range is so small, there is no appreciable difference. But still there is a difference, which you can see by considering a larger range. If you take a random real number between 0 and 1, you get a floating-point representation with a given number of bits. Now suppose your range is, say, in the order of 2**32. By multiplying the original random number by this range, you lose 32 bits of precision in the result. Put differently, there will be gaps between the values that this method can return. The gaps are still there when you multiply by 4: You have lost the two least significant bits of the original random number.
The two methods can give different results, but you'll only notice the difference in fairly extreme situations (with very wide ranges). For instance, if you generate random numbers between 0 and 2/sys.float_info.epsilon (9007199254740992.0, or a little more than 9 quadrillion), you'll notice that the version using multiplication will never give you any floats with fractional values. If you increase the maximum bound to 4/sys.float_info.epsilon, you won't get any odd integers, only even ones. That's because the 64-bit floating point type Python uses doesn't have enough precision to represent all integers at the upper end of that range, and it's trying to maintain a uniform distribution (so it omits small odd integers and fractional values even though those can be represented in parts of the range).
The second version of the calculation will give extra precision to the smaller random numbers generated. For instance, if you're generating numbers between 0 and 2/sys.float_info.epsilon and the randrange call returned 0, you can use the full precision of the random call to add a fractional part to the number. On the other hand if the randrange returned the largest number in the range (2/sys.float_info.epsilon - 1), very little of the precision of the fraction would be used (the number will round to the nearest integer without any fractional part remaining).
Adding a fractional value also can't help you deal with ranges that are too large for every integer to be represented. If randrange returns only even numbers, adding a fraction usually won't make odd numbers appear (it can in some parts of the range, but not for others, and the distribution may be very uneven). Even for ranges where all integers can be represented, the odds of a specific floating point number appearing will not be entirely uniform, since the smaller numbers can be more precisely represented. Large but imprecise numbers will be more common than smaller but more precisely represented ones.
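Here's a sketch of that effect, using the same 2/sys.float_info.epsilon bound mentioned above:
import random
import sys

BIG = 2 / sys.float_info.epsilon  # 9007199254740992.0, i.e. 2**53

# Method 1: scaling a single random() call. At this magnitude every
# result lands exactly on a whole float; no fractional part survives.
scaled = [random.random() * BIG for _ in range(10_000)]
print(any(v % 1 != 0 for v in scaled))  # False

# Method 2: integer part from randrange(), fraction from random().
# Smaller results keep (some of) their fractional part.
mixed = [random.randrange(int(BIG)) + random.random() for _ in range(10_000)]
print(any(v % 1 != 0 for v in mixed))   # almost certainly True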
I am currently translating a MATLAB program into Python. I successfully ported all the previous vector operations using numpy. However, I am stuck on the following bit of code, which is a cosine similarity measure.
% W and ind are different sized matrices
dist = full(W * (W(ind2(range),:)' - W(ind1(range),:)' + W(ind3(range),:)'));
for i=1:length(range)
    dist(ind1(range(i)),i) = -Inf;
    dist(ind2(range(i)),i) = -Inf;
    dist(ind3(range(i)),i) = -Inf;
end
disp(dist)
[~, mx(range)] = max(dist);
I did not understand the following part.
dist(indx(range(i)),i) = -Inf;
What is actually happening when you use
= -Inf;
on the right side?
In Matlab (see: Inf):
Inf returns the IEEE® arithmetic representation for positive infinity.
So Inf produces a value that is greater than all other numeric values, and -Inf produces a value that is guaranteed to be less than any other numeric value. It's generally used when you want to iteratively find a maximum and need an initial value that is guaranteed to be less than your first candidate.
According to Wikipedia (see: IEEE 754 Inf):
Positive and negative infinity are represented thus:
sign = 0 for positive infinity, 1 for negative infinity.
biased exponent = all 1 bits.
fraction = all 0 bits.
Python has the same concept using '-inf' (see Note 6 here):
float also accepts the strings “nan” and “inf” with an optional prefix “+” or “-” for Not a Number (NaN) and positive or negative infinity.
>>> a=float('-inf')
>>> a
-inf
>>> b=-27983.444
>>> min(a,b)
-inf
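You can inspect that IEEE 754 bit pattern from Python with the struct module:
import struct

# Pack -inf as a big-endian IEEE 754 double and print its 64 bits:
# sign bit 1, exponent all ones, fraction all zeros.
bits = struct.unpack('>Q', struct.pack('>d', float('-inf')))[0]
print(f'{bits:064b}')
# 1111111111110000000000000000000000000000000000000000000000000000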
It just assigns a minus infinity value to the left-hand side.
It may appear weird to assign that value, particularly because a distance cannot be negative. But it looks like it's used for effectively removing those entries from the max computation in the last line.
If Python doesn't have "infinity" (I don't know Python) and if dist is really a distance (hence non-negative), you could use any negative value instead of -Inf to achieve the same effect, namely removing those entries from the max computation.
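In numpy, that masking trick looks like this (a small made-up dist matrix for illustration):
import numpy as np

dist = np.array([[0.9, 0.1],
                 [0.4, 0.8],
                 [0.7, 0.3]])

# Knock out row 0 in column 0 and row 1 in column 1, as the MATLAB
# loop does, so those entries can never win the column-wise max.
dist[0, 0] = -np.inf
dist[1, 1] = -np.inf

print(dist.argmax(axis=0))  # [2 2]: masked entries are excluded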
The -Inf is typically used to initialize a variable so that you can later use it in a comparison inside a loop.
For instance, if I wanted to find the maximum value of a function (and had forgotten the command max), I would write something like:
function maxF = findMax(f,a,b)
maxF = -Inf;
x = a:0.001:b;
for i = 1:length(x)
    if f(x(i)) > maxF
        maxF = f(x(i));
    end
end
It is a method in MATLAB to make sure that any other value is larger than the current one. The Python equivalent of that starting value is float('-inf'); the old -sys.maxint - 1 idiom only gave the most negative plain int in Python 2, and sys.maxint no longer exists in Python 3.
See for instance:
Maximum and Minimum values for ints
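A direct Python translation of that idea, using float('-inf') as the starting value (find_max is an illustrative name, not a library function):
def find_max(f, a, b, step=0.001):
    max_f = float('-inf')  # below every other value, so the first comparison wins
    x = a
    while x <= b:
        if f(x) > max_f:
            max_f = f(x)
        x += step
    return max_f

print(find_max(lambda x: -(x - 1)**2, 0, 2))  # close to 0.0, the true maximum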
What is Python's threshold of representable negative numbers? What's the lowest number below which Python will call any value negative infinity?
There is no most negative integer, as Python integers have arbitrary precision. For floats, negative infinity is written float('-inf'); the most negative finite float can be found via sys.float_info.
>>> import sys
>>> sys.float_info.max
1.7976931348623157e+308
The actual values depend on the implementation, but CPython typically uses your C library's double type. Since floating-point values use a sign bit, the most negative finite value is simply the negation of the largest positive value, i.e. -sys.float_info.max. Also, because of how floating-point values are stored (separate mantissa and exponent), you can't simply subtract a small value from the "minimum" value and get negative infinity. Subtracting 1, for example, simply returns the same value due to limited precision.
(In other words, the possible float values are a small subset of the real numbers, and an operation on two float values is not necessarily equivalent to the same operation on the "equivalent" reals.)
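A quick demonstration of those limits:
import sys

lowest = -sys.float_info.max
print(lowest)                       # -1.7976931348623157e+308

# Subtracting 1 is lost to rounding: the nearest representable
# float to (lowest - 1) is lowest itself.
print(lowest - 1 == lowest)         # True

# Going meaningfully past the limit overflows to negative infinity.
print(lowest * 2)                   # -inf
print(lowest * 2 == float('-inf'))  # True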
The Harmonic Mean function in Python (scipy.stats.hmean) requires that the input be positive numbers.
For example:
from scipy import stats
print(stats.hmean([-50.2, 100.5]))
results in:
ValueError: Harmonic mean only defined if all elements greater than zero
I don't mathematically see why this should be the case, except for the rare instance where you would end up dividing by zero. Instead of checking for a division by zero, hmean() throws an error as soon as the input contains any non-positive number, whether a harmonic mean can be found or not.
Am I missing something here in the maths? Or is this really a limitation in SciPy?
How would you go about finding the harmonic mean of a set of numbers which might be positive or negative in python?
The harmonic mean is only defined for sets of positive real numbers. If you try to compute it for sets with negatives, you get all kinds of strange and useless results even when you don't hit division by zero. For example, applying the formula to the set (3, -3, 4) gives a "mean" of 12!
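You can verify that with the defining formula n / sum(1/x):
vals = [3, -3, 4]
print(len(vals) / sum(1 / v for v in vals))  # 12.0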
You can just use the defining equation of the harmonic mean (here a is a numpy array):
len(a) / np.sum(1.0 / a)
But Wikipedia says that the harmonic mean is defined for positive real numbers:
http://en.wikipedia.org/wiki/Harmonic_mean
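For example, applied to the numbers from the question (note the result's sign, which is one reason mixed-sign inputs are dubious):
import numpy as np

a = np.array([-50.2, 100.5])
print(len(a) / np.sum(1.0 / a))  # about -200.6, of questionable meaning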
There is a statistics library if you are using Python >= 3.6:
https://docs.python.org/3/library/statistics.html
You can use its harmonic_mean function like this. Say you have a list of numbers whose harmonic mean you want:
import statistics as s
nums = [11, 13, 12, 15, 17]
s.harmonic_mean(nums)
It has other useful functions too, such as stdev, variance, mode, mean, and median.
The mathematical definition of the harmonic mean itself does not forbid applying it to negative numbers (although you may not want to calculate the harmonic mean of +1 and -1). However, it is designed to average quantities like rates and ratios, giving equal weight to each data point, whereas an arithmetic mean would give the extreme data points much higher weight, which is usually undesired.
So you can either hardcode the definition yourself, as @HYRY suggested, or you may have been applying the harmonic mean in the wrong context.