What is the difference between uniform() and triangular()? - python

As far as I've understood, the random module has a set of methods, among which both uniform() and triangular() return a random float between two given bounds. Is there no difference between them? Is there any specific case in which we should use one over the other?

Documentation
Documentation is your friend
random.uniform(a, b)
Return a random floating point number N such that a <= N <= b for a <= b and b <= N <= a for b < a.
The end-point value b may or may not be included in the range depending on floating-point rounding in the equation a + (b-a) * random().
random.triangular(low, high, mode)
Return a random floating point number N such that low <= N <= high and with the specified mode between those bounds. The low and high bounds default to zero and one. The mode argument defaults to the midpoint between the bounds, giving a symmetric distribution.
Difference
So random.triangular models a triangular distribution, which has strong similarities to a uniform distribution.
The main difference in the usage of the methods is the mode argument of random.triangular, which provides a more granular level of control over the distribution.
For a regular Uni(a, b) distribution, you should probably use random.uniform. According to Wikipedia, even when the mode is set to 0, a triangular distribution is not the same as a uniform distribution:
This distribution for a = 0, b = 1 and c = 0 is the distribution of X = |X1 − X2|, where X1, X2 are two independent random variables with standard uniform distribution.
So random.triangular should be used precisely when you want a triangular distribution.
Example
Here is a simple example highlighting the differences/similarities of both methods
import random
# Set Seed for reproducibility
random.seed(123)
# Parameters
lo: int = 0
hi: int = 100
mode: int = 10
sample_size: int = int(1e+6)
# Samples
uni = (random.uniform(lo, hi) for _ in range(sample_size))
tri1 = (random.triangular(lo, hi, mode) for _ in range(sample_size))
tri2 = (random.triangular(lo, hi) for _ in range(sample_size))
# Printing averages
print(round(sum(uni) / sample_size, 2))
# 50.01
print(round(sum(tri1) / sample_size, 2))
# 36.68
print(round(sum(tri2) / sample_size, 2))
# 50.0
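As a quick empirical check of the quoted characterisation, here is a minimal sketch showing that triangular(0, 1, 0) behaves like |X1 - X2| for two independent standard uniforms (both means should be close to 1/3):
import random

random.seed(123)
n = 1_000_000

tri = (random.triangular(0, 1, 0) for _ in range(n))
diff = (abs(random.random() - random.random()) for _ in range(n))

print(round(sum(tri) / n, 3))   # ~0.333, mean of triangular(0, 1, mode=0)
print(round(sum(diff) / n, 3))  # ~0.333, mean of |X1 - X2|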

Related

Is there a fast running rolling standard deviation algorithm?

I have a Python script in which, for every new sample, I have to update the standard deviation of the sample array using a rolling window of length N. Using the simple formula for the standard deviation, the code is really slow. I found many different solutions for online calculation, but none of them consider a rolling window for the update. Some alternative ways of computing the variance are explained here (Welford's algorithm, parallel, ...)
https://en.m.wikipedia.org/wiki/Algorithms_for_calculating_variance
but none of them actually use a rolling window through the data set.
What I'm looking for is a fast algorithm which won't be prone to the catastrophic cancellation phenomenon.
Formulas will be appreciated.
Thanks for your help guys.
Here's an adaptation of the code at the link I put in a comment. It takes O(1) (constant) time for each element added, regardless of window size. Note that if you configure for a window of N elements, the first N-1 results are more or less gibberish: it initializes the data to N zeroes. The class also saves the most recent N entries in a collections.deque of maximum size N. Note that this computes the "sample" standard deviation, not "population". Season to taste ;-)
from collections import deque
from math import sqrt

class Running:
    def __init__(self, n):
        self.n = n
        self.data = deque([0.0] * n, maxlen=n)
        self.mean = self.variance = self.sdev = 0.0

    def add(self, x):
        n = self.n
        oldmean = self.mean
        goingaway = self.data[0]
        self.mean = newmean = oldmean + (x - goingaway) / n
        self.data.append(x)
        self.variance += (x - goingaway) * (
            (x - newmean) + (goingaway - oldmean)) / (n - 1)
        self.sdev = sqrt(self.variance)
Here's a by-eyeball sanity check. Seems fine. But note that the statistics module makes heroic (and slow!) efforts to maximize floating-point accuracy. The code above just accepts accumulating half a dozen fresh rounding errors per element added.
import statistics
from random import random
from math import ulp

r = Running(50)
for i in range(1000000):
    r.add(random() * 100)
    assert len(r.data) == 50
    if i % 1000 == 0:
        a, b = r.mean, statistics.mean(r.data)
        print(i, "mean", a, b, (a - b) / ulp(b))
        a, b = r.sdev, statistics.stdev(r.data)
        print(i, "sdev", a, b, (a - b) / ulp(b))
Sample output (will vary across runs):
0 mean 1.4656985567210468 1.4656985567210468 0.0
0 sdev 10.364053886327875 10.364053886327877 -1.0
1000 mean 50.73313401192864 50.73313401192878 -20.0
1000 sdev 31.06576415649153 31.06576415649151 5.0
2000 mean 50.4175663202043 50.41756632020437 -10.0
2000 sdev 27.692406266774878 27.69240626677488 -1.0
3000 mean 53.054435599235525 53.0544355992356 -11.0
3000 sdev 32.439246859431556 32.439246859431606 -7.0
4000 mean 51.66216784517698 51.662167845177 -3.0
4000 sdev 31.026902004950404 31.02690200495047 -18.0
5000 mean 54.08949367166644 54.089493671666425 2.0
5000 sdev 29.405357061221196 29.40535706122128 -24.0
...
I have a function that I use for standard deviation. I modified it from a PHP function that calculated standard deviation. You can input an array (or a slice of an array) into it, and it will calculate the standard deviation for that array or slice.
import math

def calc_std_dev(lst, precision=0, sample=True):
    """
    :param: lst A Python list containing values
    :param: precision The number of decimal places desired.
    :param: sample Is the data a sample or a population? Set to True by
            default.
    """
    length = len(lst)
    if length == 0:
        print("The array has zero elements.")
        return False
    elif (length == 1) and (sample == True):
        print("The array has only 1 element.")
        return False
    else:
        total = math.fsum(lst)
        # Calculate the arithmetic mean
        mean = total / length
        carry = 0.0
        for i in lst:
            dev = i - mean
            carry += dev * dev
        if sample == True:
            length = length - 1
        variance = carry / length
        std_dev = math.sqrt(variance)
        std_dev = round(std_dev, precision)
        return std_dev
When I need a rolling standard deviation, I pass in a slice of the total list to calculate the value.
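For example, a minimal sketch of that rolling usage (the data and window length here are hypothetical):
data = [3.1, 2.7, 5.0, 4.2, 6.3, 5.8, 4.9]   # hypothetical samples
N = 4                                        # rolling window length

rolling = [calc_std_dev(data[i - N:i], precision=4)
           for i in range(N, len(data) + 1)]
print(rolling)  # one sample standard deviation per full window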

How can I make this python code run faster?

I have two questions:
1- This code takes too long to execute. Any idea how I can make it faster?
With the code below I want to generate 100 random discrete values between 700 and 1200.
I chose the Weibull distribution because I wanted to generate failure-rate data; please see the histogram below.
import random

nums = []
alpha = 0.6
beta = 0.4
while len(nums) != 100:
    temp = int(random.weibullvariate(alpha, beta))
    if 700 <= temp < 1200:
        nums.append(temp)
print(nums)

# plotting a graph
#plt.hist(nums, bins = 200)
#plt.show()
print(nums)
I wanted to generate a histogram like this one:
Histogram
2- I have this function for the discrete Weibull distribution:
def DiscreteWeibull(q, b, x):
    return q**(x**b) - q**((x + 1)**b)
How can I generate random values that follow this distribution?
Since the Weibull distribution with shape parameter K and scale parameter lambda can be characterized as W = lambda * (-ln(U))**(1/K) applied to a standard Uniform(0,1) variable U, we can 'cut' the distribution to a desired minimum and maximum value. We do this by inverting the equation, setting W to 700 or 1200, and finding the values between 0 and 1 that correspond. Here's some sample code.
import math
import random
import matplotlib.pyplot as plt

def weibull_from_uniform(shape, scale, x):
    assert 0 <= x <= 1
    return scale * pow(-1 * math.log(x), 1.0 / shape)

scale_param = 0.6
shape_param = 0.4
min_value = 700.0
max_value = 1200.0

lower_bound = math.exp(-1 * pow(min_value / scale_param, shape_param))
upper_bound = math.exp(-1 * pow(max_value / scale_param, shape_param))
if lower_bound > upper_bound:
    lower_bound, upper_bound = upper_bound, lower_bound

nums = []
while len(nums) < 100:
    nums.append(weibull_from_uniform(shape_param, scale_param, random.uniform(lower_bound, upper_bound)))
print(nums)

plt.hist(nums, bins=8)
plt.show()
This code gives a histogram very similar to the one you provided; the method will give values from the same distribution as your original method, just faster. Note that this direct approach only works when our shape parameter K <= 1, so that the density function is strictly decreasing. When K > 1, the Weibull density function increases to a mode, then decreases, so you may need to draw from two uniform intervals for particular min and max values (since inverting for W and U may give two answers).
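For the second part of the question, the pmf q**(x**b) - q**((x+1)**b) telescopes to the closed-form CDF 1 - q**((x+1)**b), so inverse-transform sampling works directly. A minimal sketch, assuming 0 < q < 1 and b > 0:
import math
import random

def discrete_weibull_sample(q, b):
    # Invert the CDF 1 - q**((x + 1)**b): find the smallest integer x >= 0
    # with 1 - q**((x + 1)**b) >= u.
    u = random.random()
    x = math.ceil((math.log(1 - u) / math.log(q)) ** (1.0 / b)) - 1
    return max(x, 0)

samples = [discrete_weibull_sample(q=0.9, b=0.8) for _ in range(100)]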
Your question is not very clear on why you thought using this Weibull distribution was a good idea, nor what distribution you are looking to achieve.
Discrete uniform distribution
Here are two ways to achieve the discrete uniform distribution on [700, 1200).
1) With random
import random
nums = [random.randrange(700, 1200) for _ in range(100)]
2) With numpy
import numpy
nums = numpy.random.randint(700, 1200, 100)
Geometric distribution
You have edited your question with an example histogram, and the mention "I wanted to generate a histogram like this one". The histogram vaguely looks like a geometric distribution.
We can use numpy.random.geometric:
import numpy
n_samples = 100
p = 0.5
a, b = 50, 650
cap = 1200
nums = numpy.random.geometric(p, size = 2 * n_samples) * a + b
nums = nums[numpy.where(nums < cap)][:n_samples]

Calculating inverse trigonometric functions with formulas

I have been trying to create a custom calculator for calculating trigonometric functions. Aside from Chebyshev polynomials and/or the CORDIC algorithm, I have used Taylor series, which have been accurate to a few decimal places.
This is what I have created to calculate simple trigonometric functions without any modules:
from __future__ import division

def sqrt(n):
    ans = n ** 0.5
    return ans

def factorial(n):
    k = 1
    for i in range(1, n+1):
        k = i * k
    return k

def sin(d):
    pi = 3.14159265359
    n = 180 / int(d)  # 180 degrees = pi radians
    x = pi / n        # Converting degrees to radians
    ans = x - ( x ** 3 / factorial(3) ) + ( x ** 5 / factorial(5) ) - ( x ** 7 / factorial(7) ) + ( x ** 9 / factorial(9) )
    return ans

def cos(d):
    pi = 3.14159265359
    n = 180 / int(d)
    x = pi / n
    ans = 1 - ( x ** 2 / factorial(2) ) + ( x ** 4 / factorial(4) ) - ( x ** 6 / factorial(6) ) + ( x ** 8 / factorial(8) )
    return ans

def tan(d):
    ans = sin(d) / sqrt(1 - sin(d) ** 2)
    return ans
Unfortunately I could not find any sources that would help me interpret inverse trigonometric function formulas for Python. I have also tried putting sin(x) to the power of -1 (sin(x) ** -1), which didn't work as expected.
What could be the best solution to do this in Python (by best, I mean simplest with accuracy similar to the Taylor series)? Is this possible with a power series, or do I need to use the CORDIC algorithm?
The question is broad in scope, but here are some simple ideas (and code!) that might serve as a starting point for computing arctan. First, the good old Taylor series. For simplicity, we use a fixed number of terms; in practice, you might want to decide the number of terms to use dynamically based on the size of x, or introduce some kind of convergence criterion. With a fixed number of terms, we can evaluate efficiently using something akin to Horner's scheme.
def arctan_taylor(x, terms=9):
    """
    Compute arctan for small x via Taylor polynomials.

    Uses a fixed number of terms. The default of 9 should give good results for
    abs(x) < 0.1. Results will become poorer as abs(x) increases, becoming
    unusable as abs(x) approaches 1.0 (the radius of convergence of the
    series).
    """
    # Uses Horner's method for evaluation.
    t = 0.0
    for n in range(2*terms-1, 0, -2):
        t = 1.0/n - x*x*t
    return x * t
The above code gives good results for small x (say smaller than 0.1 in absolute value), but the accuracy drops off as x becomes larger, and for abs(x) > 1.0, the series never converges, no matter how many terms (or how much extra precision) we throw at it. So we need a better way to compute for larger x. One solution is to use argument reduction, via the identity arctan(x) = 2 * arctan(x / (1 + sqrt(1 + x^2))). This gives the following code, which builds on arctan_taylor to give reasonable results for a wide range of x (but beware possible overflow and underflow when computing x*x).
import math

def arctan_taylor_with_reduction(x, terms=9, threshold=0.1):
    """
    Compute arctan via argument reduction and Taylor series.

    Applies reduction steps until x is below `threshold`,
    then uses Taylor series.
    """
    reductions = 0
    while abs(x) > threshold:
        x = x / (1 + math.sqrt(1 + x*x))
        reductions += 1
    return arctan_taylor(x, terms=terms) * 2**reductions
Alternatively, given an existing implementation for tan, you could simply find a solution y to the equation tan(y) = x using traditional root-finding methods. Since arctan is already naturally bounded to lie in the interval (-pi/2, pi/2), bisection search works well:
def arctan_from_tan(x, tolerance=1e-15):
    """
    Compute arctan as the inverse of tan, via bisection search. This assumes
    that you already have a high quality tan function.
    """
    low, high = -0.5 * math.pi, 0.5 * math.pi
    while high - low > tolerance:
        mid = 0.5 * (low + high)
        if math.tan(mid) < x:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)
Finally, just for fun, here's a CORDIC-like implementation, which is really more appropriate for a low-level implementation than for Python. The idea here is that you precompute, once and for all, a table of arctan values for 1, 1/2, 1/4, etc., and then use those to compute general arctan values, essentially by computing successive approximations to the true angle. The remarkable part is that, after the precomputation step, the arctan computation involves only additions, subtractions, and multiplications by powers of 2. (Of course, those multiplications aren't any more efficient than any other multiplication at the level of Python, but closer to the hardware, this could potentially make a big difference.)
cordic_table_size = 60
cordic_table = [(2**-i, math.atan(2**-i))
                for i in range(cordic_table_size)]

def arctan_cordic(y, x=1.0):
    """
    Compute arctan(y/x), assuming x positive, via CORDIC-like method.
    """
    r = 0.0
    for t, a in cordic_table:
        if y < 0:
            r, x, y = r - a, x - t*y, y + t*x
        else:
            r, x, y = r + a, x + t*y, y - t*x
    return r
Each of the above methods has its strengths and weaknesses, and all of the above code can be improved in a myriad of ways. I encourage you to experiment and explore.
To wrap it all up, here are the results of calling the above functions on a small number of not-very-carefully-chosen test values, comparing with the output of the standard library math.atan function:
test_values = [2.314, 0.0123, -0.56, 168.9]
for value in test_values:
    print("{:20.15g} {:20.15g} {:20.15g} {:20.15g}".format(
        math.atan(value),
        arctan_taylor_with_reduction(value),
        arctan_from_tan(value),
        arctan_cordic(value),
    ))
Output on my machine:
1.16288340166519 1.16288340166519 1.16288340166519 1.16288340166519
0.0122993797673 0.0122993797673 0.0122993797673002 0.0122993797672999
-0.510488321916776 -0.510488321916776 -0.510488321916776 -0.510488321916776
1.56487573286064 1.56487573286064 1.56487573286064 1.56487573286064
The simplest way to compute any inverse function is to use binary search.
definitions
Let's assume a function
x = g(y)
and we want to code its inverse:
y = f(x) = f(g(y))
x = <x0,x1>
y = <y0,y1>
bin search on floats
You can do it with integer math by accessing the mantissa bits, as in:
Any Faster RMS Value Calculation in C?
But if you do not know the exponent of the result prior to the computation, then you need to use floats for the binary search too.
So the idea behind the binary search is to change the mantissa of y from y1 to y0 bit by bit, from MSB to LSB. Then call the direct function g(y), and if the result crosses x, revert the last bit change.
When using floats, you can use a variable that holds the approximate value of the targeted mantissa bit instead of accessing integer bits. That eliminates the unknown-exponent problem. So at the beginning set y = y0 and the current bit to the MSB value, i.e. b = (y1-y0)/2. After each iteration halve it, and do as many iterations as there are mantissa bits n... This way you obtain the result in n iterations, to within (y1-y0)/2^n accuracy.
If your function is not monotonic, break it into monotonic intervals and handle each as a separate binary search.
Whether the function is increasing or decreasing just determines the direction of the crossing condition (use of < or >).
C++ acos example
So y = acos(x) is defined on x = <-1,+1>, y = <0,M_PI> and is decreasing, so:
double f64_acos(double x)
{
    const int n=52;                 // mantissa bits
    double y,y0,b;
    int i;
    // handle domain error
    if (x<-1.0) return 0;
    if (x>+1.0) return 0;
    // x = <-1,+1> , y = <0,M_PI> , decreasing
    for (y=0.0,b=0.5*M_PI,i=0;i<n;i++,b*=0.5)   // y is min, b is half of max, halved each iteration
    {
        y0=y;                       // remember original y
        y+=b;                       // try to set the "bit"
        if (cos(y)<x) y=y0;         // if the result crosses x, return to the original y (decreasing uses <, increasing uses >)
    }
    return y;
}
I tested it like this:
double x0,x1,y;
for (x0=0.0;x0<M_PI;x0+=M_PI*0.01) // cycle all angle range <0,M_PI>
{
y=cos(x0); // direct function (from math.h)
x1=f64_acos(y); // my inverse function
if (fabs(x1-x0)>1e-9) // check result and output to log if error
Form1->mm_log->Lines->Add(AnsiString().sprintf("acos(%8.3lf) = %8.3lf != %8.3lf",y,x0,x1));
}
No difference was found... so the implementation works correctly. Of course, a binary search over a 52-bit mantissa is usually slower than a polynomial approximation... on the other hand, the implementation is very simple...
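For reference, a minimal Python sketch of the same halving idea, assuming a decreasing target such as cos on <0, M_PI>:
import math

def inverse_by_bisection(g, x, y0, y1, n=52):
    # Invert a decreasing g on <y0, y1>: start at y0 and add successively
    # halved increments, keeping each one only while g stays above x.
    y = y0
    b = 0.5 * (y1 - y0)
    for _ in range(n):
        if g(y + b) >= x:
            y += b
        b *= 0.5
    return y

print(inverse_by_bisection(math.cos, 0.5, 0.0, math.pi))  # ~1.0472 == math.acos(0.5)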
[Notes]
If you do not want to take care of the monotonic intervals, you can try
approximation search
As you are dealing with goniometric (trigonometric) functions, you need to handle singularities to avoid NaN or division by zero, etc.
If you're interested, here are more binary search examples (mostly on integers):
Power by squaring for negative exponents

How to speed up calculation to find closest representation of the form N/2**M

I want to find the closest representation of a floating point number in the form N/2**M in Python, where N and M are integers. I attempted to use the minimization function from scipy.optimize, but it cannot be confined to the case where N and M are integers.
I ended up using a simple implementation that iterates through values of M and N and finds the minimum, but this is computationally expensive and time consuming for arrays of many numbers. What might be a better way of doing this?
My simple implementation is shown below:
import numpy as np

def ValueRepresentation(X):
    M, Dp = X
    return M/(2**Dp)

def Diff(X, value):
    return abs(ValueRepresentation(X) - value)

def BestApprox(value):
    mindiff = 1000000000
    for i in np.arange(0, 1000, 1):
        for j in np.arange(0, 60, 1):
            diff = Diff([i, j], value)
            if diff < mindiff:
                mindiff = diff
                M = i
                Dp = j
    return M, Dp
Just use the built-in functionality:
In [10]: 2.5.as_integer_ratio() # get representation as fraction
Out[10]: (5, 2)
In [11]: (2).bit_length() - 1 # convert 2**M to M
Out[11]: 1
Note that all non-infinite, non-NaN floats are dyadic rationals, so we can rely on the denominator being an exact power of 2.
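Putting those two pieces together, a small sketch (the helper name here is our own):
def n_over_2m(x):
    # as_integer_ratio() gives an exact N/denominator for any finite float,
    # and the denominator is always a power of two, so M is its bit length - 1.
    n, denom = x.as_integer_ratio()
    return n, denom.bit_length() - 1

print(n_over_2m(2.5))  # (5, 1)
print(n_over_2m(0.1))  # (3602879701896397, 55)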
Thanks to jasonharper I realised my implementation is ridiculously inefficient and could be much simpler.
The implementation of his method is shown below:
def BestApprox_fast(value):
    mindiff = 1000000000
    for Dp in np.arange(0, 32, 1):
        M = round(value*2**Dp)
        if abs(M) < 1000:
            diff = Diff([M, Dp], value)
            if diff < mindiff:
                mindiff = diff
                M_best = M
                Dp_best = Dp
    return M_best, Dp_best
It is approximately 200 times quicker.
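A quick sanity check, reusing Diff and the two functions above (a sketch; the exact pair depends on the value chosen):
value = 0.3141
print(BestApprox(value))       # brute-force scan
print(BestApprox_fast(value))  # same (M, Dp) for typical inputs, found much faster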
With the limits on M and N given, the range of N/2**M is a well defined discrete number scale:
[0-1000/2^26, 501-1000/2^25, 501-1000/2^24, ... 501-1000/2^1, 501-1000/2^0].
In this given discrete set, different subsets have different accuracy/resolution. The first subset [0-1000/2^26] has an accuracy of 2^-26, i.e. 26 binary bits of resolution. So whenever the given number falls in the corresponding continuous domain [0, 1000/2^26], the best accuracy achievable is 2^-26. Successively, the best accuracy is 2^-25 when the given number is beyond the first domain but falls in the domain [500/2^25, 1000/2^25], which corresponds to the second subset [501-1000/2^25]. (Note the difference between a discrete set and a continuous domain.)
With the above logic, we know the best accuracy, defined by M, depends on where the given number falls on the scale. Thus we can implement it as the following Python code:
import numpy as np

limits = 1000.0/2**np.arange(0, 61)

a = 103.23  # test value
for i in range(60, -1, -1):
    if a <= limits[i]:
        N = i
        M = round(a * 2**N)
        r = [M, N]
        break
if a > 1000:
    r = [round(a), 0]
This solution has O(1) (constant) execution time, so it is ideal for multiple invocations.

Make a number more probable to result from random

I'm using x = numpy.random.rand(1) to generate a random number between 0 and 1. How do I make it so that x > .5 is 2 times more probable than x < .5?
Just do a little manipulation of the inputs. First set x to be in the range from 0 to 1.5.
x = numpy.random.uniform(0, 1.5)
x has a 2/3 chance of being greater than 0.5 and a 1/3 chance of being smaller. Then, if x is greater than or equal to 1.0, subtract 0.5 from it:
if x >= 1.0:
    x = x - 0.5
This is overkill for you, but it's good to know an actual method for generating a random number with any probability density function (pdf).
You can do that by subclassing scipy.stats.rv_continuous, provided you do it correctly. You will have to supply a normalized pdf (so that its integral is 1); scipy will not normalize it for you. In this case, your pdf has a value of 2/3 for x < 0.5, and 4/3 for x > 0.5, with a support of [0, 1) (the support is the interval over which it's nonzero):
import scipy.stats as spst
import numpy as np
import matplotlib.pyplot as plt

def pdf_shape(x, k):
    if x < 0.5:
        return 2/3.
    elif 0.5 <= x and x < 1:
        return 4/3.
    else:
        return 0.

class custom_pdf(spst.rv_continuous):
    def _pdf(self, x, k):
        return pdf_shape(x, k)

instance = custom_pdf(a=0, b=1)
samps = instance.rvs(k=1, size=10000)

plt.hist(samps, bins=20)
plt.show()
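A quick check that the pdf above integrates to 1 over its support, as a sketch using scipy's quad:
from scipy.integrate import quad

area, _ = quad(lambda x: pdf_shape(x, 1), 0, 1)
print(area)  # ~1.0, so the pdf is properly normalized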
tmp = random()
if tmp < 0.5: tmp = random()
is a pretty easy way to do it.
Ehh, I guess this is 3x as likely... that's what I get for sleeping through that class, I guess.
from random import random, uniform

def rand1():
    tmp = random()
    if tmp < 0.5:
        tmp = random()
    return tmp

def rand2():
    tmp = uniform(0, 1.5)
    return tmp if tmp <= 1.0 else tmp - 0.5

sample1 = []
sample2 = []
for i in range(10000):
    sample1.append(rand1() >= 0.5)
    sample2.append(rand2() >= 0.5)

print(sample1.count(True))  # ~7500, i.e. ~75%
print(sample2.count(True))  # ~6666, i.e. ~66%  <- desired, I believe :)
First off, numpy.random.rand(1) doesn't return a value in the [0,1) range (half-open, includes zero but not one), it returns an array of size one, containing values in that range, with the upper end of the range having nothing to do with the argument passed in.
The function you're probably after is the uniform distribution one, numpy.random.uniform() since this will allow an arbitrary upper range.
And, to make the upper half twice as likely is a relatively simple matter.
Take, for example, a random number generator r(n) which returns a uniformly distributed integer in the range [0,n). All you need to do is adjust the values to change the distribution:
x = r(3)  # 0, 1 or 2, 1/3 probability each
if x == 2:
    x = 1  # Now either 0 (1/3) or 1 (2/3)
Now the chances of getting zero are 1/3 while the chances of getting one are 2/3, basically what you're trying to achieve with your floating point values.
So I would simply get a random number in the range [0,1.5), then subtract 0.5 if it's greater than or equal to one.
x = numpy.random.uniform(high=1.5)
if x >= 1: x -= 0.5
Since the original distribution should be even across the [0,1.5) range, the subtraction should make [0.5,1.0) twice as likely (and [1.0,1.5) impossible), while keeping the distribution even within each section ([0,0.5) and [0.5,1)):
[0.0,0.5) [0.5,1.0) [1.0,1.5) before
<---------><---------><--------->
[0.0,0.5) [0.5,1.0) [0.5,1.0) after
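The same fold, vectorised with numpy as a quick sketch of the resulting proportions:
import numpy

x = numpy.random.uniform(high=1.5, size=100000)
x = numpy.where(x >= 1.0, x - 0.5, x)
print((x >= 0.5).mean())  # ~0.667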
You could take a "mixture model" approach where you split the process into two steps: first, decide whether to take option A or B, where B is twice as likely as A; then, if you chose A, return a random number between 0.0 and 0.5, else if you chose B, return one between 0.5 and 1.0.
In the example, the randint randomly returns 0, 1, or 2, so the else case is twice as likely as the if case.
m = numpy.random.randint(3)
if m == 0:
    x = numpy.random.uniform(0.0, 0.5)
else:
    x = numpy.random.uniform(0.5, 1.0)
This is a little more expensive (two random draws instead of one) but it can generalize to more complicated distributions in a fairly straightforward way.
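One way that generalisation might look, as a sketch: pick a piece of the support with arbitrary weights via numpy.random.choice, then draw uniformly inside it.
import numpy

intervals = [(0.0, 0.5), (0.5, 1.0)]   # pieces of the support
weights = [1/3, 2/3]                   # probability of picking each piece

idx = numpy.random.choice(len(intervals), p=weights)
low, high = intervals[idx]
x = numpy.random.uniform(low, high)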
If you want a more fluid randomness, you can just square the output of the random function
(and subtract it from 1 to make x > 0.5 more probable instead of x < 0.5):
x = 1 - numpy.random.rand(1) ** 2
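Note that this is not exactly the 2:1 split the question asks for; a quick sketch of the actual proportion:
import numpy

x = 1 - numpy.random.rand(100000) ** 2
print((x > 0.5).mean())  # ~0.707, i.e. P(x > 0.5) = sqrt(0.5), not 2/3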
