Let's say I want to pull random values from a linear distribution function; I'm not sure how I would do that.
Say I have the function y = 3x; I want to be able to pull a random value from that line.
this is what I've tried:
x, y = [], []
for i in range(10):
    a = random.uniform(0, 3)
    x.append(a)
    b = 3 * a
    y.append(b)
This gives me y values that are taken from this linear function (distribution, per se). Now if this is correct, how would I do the same for a distribution that looks like a horizontal line?
That is, what if I had the horizontal line function y = 3, how can I get random values pulled from there?
Just define your function, using a lambda or explicit definition, and then call it to get the y-value:
def func(x):
    return 3

points = []
for i in range(10):
    x = random.uniform(0, 3)
    points.append((x, func(x)))
A linear function with a slope of 0 in this case is fairly trivial.
EDIT: I think I understand the question a little more clearly now. You are looking to randomly generate a point that lies under the curve? That is quite tricky to calculate directly for an arbitrary function, and you will probably want bounds on your function (i.e. a < x < b). Given such bounds, one simple method is to generate a random point in a box containing the curve, and simply discard it if it isn't under the curve. This will be perfectly random.
def linearFunc(x):
    return 3 * x

def getRandom(func, maxi, a, b):
    while True:
        x = random.uniform(a, b)
        y = random.uniform(0, maxi)
        if y < func(x):
            return (x, y)

points = [getRandom(linearFunc, 9, 0, 3) for i in range(10)]
This method requires knowing an upper bound (maxi) for the function on the specified interval, and the tighter the upper bound, the fewer sampling misses will occur.
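For a density this simple there is also an exact alternative (my addition, not from the answer above): inverse transform sampling. The density proportional to 3x on [0, 3] has CDF F(x) = x**2 / 9, so x = 3*sqrt(u) with u uniform on [0, 1] follows the same distribution, with no rejected samples:

import math
import random

def sample_linear():
    # inverse CDF of the density proportional to 3*x on [0, 3]
    return 3 * math.sqrt(random.uniform(0, 1))

samples = [sample_linear() for _ in range(10)]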
Suppose I have a function that returns a 3 by 3 2D array with random entries within given bounds:
def random3by3Matrix(smallest_num, largest_num):
    matrix = [[0 for x in range(3)] for y in range(3)]
    for i in range(3):
        for j in range(3):
            matrix[i][j] = random.randrange(int(smallest_num),
                int(largest_num + 1)) if smallest_num != largest_num else smallest_num
    return matrix

print(random3by3Matrix(-10, 10))
The code above returns something like this:
[[-6, 10, -4], [-10, -9, 8], [10, 1, 1]]
How would I write a unittest for a function like this? I thought of using a helper function:
def isEveryEntryGreaterEqual(list1, list2):
    for i in range(len(list1)):
        for j in range(len(list1[0])):
            if not (list1[i][j] <= list2[i][j]):
                return False
    return True
class TestFunction(unittest.TestCase):
    def test_random3by3Matrix(self):
        lower_bound = [[-10 for x in range(3)] for y in range(3)]
        upper_bound = [[10 for x in range(3)] for y in range(3)]
        self.assertEqual(True, isEveryEntryGreaterEqual(lower_bound, random3by3Matrix(-10, 10)))
        self.assertEqual(True, isEveryEntryGreaterEqual(random3by3Matrix(-10, 10), upper_bound))
But is there a cleaner way to do this?
Furthermore, how would you test that all of your values are not only between the boundaries, but also distributed randomly?
Test matrix bounds
It looks like you want to test whether every single element in the matrix is within some bounds, independently of where in the matrix the element is. You can make this code shorter and more readable by extracting all the elements from the matrix and checking them in one go, instead of using the double for loop. You can flatten any array nesting to a 1D array with numpy.ravel(), and then test the resulting 1D array in one go with Python's built-in all() function. This way, you avoid looping over all the elements yourself:
import numpy as np

def is_matrix_in_bounds(matrix, low, high):
    flat_list = np.ravel(matrix)  # create a 1D array
    # Each element is a boolean that is True if the entry is within bounds
    in_bounds = [low <= e <= high for e in flat_list]
    # all() returns True if every element in in_bounds is True;
    # it returns False as soon as a single element in in_bounds is False
    return all(in_bounds)
class TestFunction(unittest.TestCase):
    def test_random3by3Matrix(self):
        lower_bound = -10
        upper_bound = 10
        matrix = random3by3Matrix(-10, 10)
        self.assertEqual(True, is_matrix_in_bounds(matrix, lower_bound, upper_bound))
If you will be using things like the matrix and the bounds in multiple tests, it may be beneficial to make them class attributes, so you don't have to define them in each test function.
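A minimal sketch of that idea, reusing random3by3Matrix and is_matrix_in_bounds from above: setUp runs before every test method, so each test gets a fresh matrix and shares the bound definitions.

import unittest

class TestFunction(unittest.TestCase):
    def setUp(self):
        self.lower_bound = -10
        self.upper_bound = 10
        self.matrix = random3by3Matrix(self.lower_bound, self.upper_bound)

    def test_random3by3Matrix(self):
        self.assertTrue(is_matrix_in_bounds(self.matrix, self.lower_bound, self.upper_bound))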
Test matrix randomness
Testing whether some matrix is truly randomly distributed is a bit harder, since it involves a statistical test of whether the variables are randomly distributed or not. The best you can do here is calculate the odds that they are indeed randomly distributed, and put a threshold on how low those odds are allowed to be. Since the matrix is random and the values in it do not depend on each other, you're in luck: you can again test them as if they were a 1D distribution.
To test this, you can create a second random uniform distribution, and test the goodness of fit between your matrix and the new distribution with a Kolmogorov-Smirnov test. This treats the two distributions as random samples, and tests how likely it is that they were drawn from the same underlying distribution; in your case, a random uniform distribution. If the distributions are vastly different, the test will report a very low p-value (i.e. the odds of these distributions being drawn from the same underlying distribution are low). If they are similar, the p-value will be high. You want a random matrix, so you want a high p-value. The usual cutoff for this is 0.05 (which means that 1 in 20 truly random distributions will be flagged as non-random, because they look kinda non-random by happenstance).
Python provides such a test in the scipy module. You can either pass two samples (a two-sample KS test), or pass the name of some distribution and specify its parameters (a one-sample KS test). For the latter case, the distribution name should be the name of a distribution in scipy.stats, and you can pass the arguments to create such a distribution via the keyword args=().
import numpy as np
from scipy import stats

class TestFunction(unittest.TestCase):
    def test_matrix_randomness(self):
        lower_bound = -10
        upper_bound = 10
        matrix = random3by3Matrix(lower_bound, upper_bound)
        flat = np.ravel(matrix)

        # two-sample test: compare against a freshly drawn uniform sample
        random_dist = np.random.randint(low=lower_bound, high=upper_bound + 1, size=3 * 3)
        statistic, p_value = stats.kstest(flat, random_dist)

        # one-sample test, equivalent but neater:
        # doesn't require creating a second distribution
        # (scipy's randint excludes the upper bound, hence the 11)
        statistic, p_value = stats.kstest(flat, "randint", args=(-10, 11))
        self.assertEqual(True, p_value > 0.05)
Note that unittests with a random aspect will sometimes fail. Such is the nature of randomness.
see:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html#scipy.stats.kstest
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.randint.html#scipy.stats.randint
Initially, I have two arrays that correspond to the values of x and y of a function, but I don't know that function; I just know that the values of y depend on x. Then, I calculate a function that depends on both arrays.
I need to calculate in Python the integral of that last function, to obtain the total area under the curve between the first value of x and the last. Any idea how to do that?
x = [array]
y(x) = [array]
a = 2.839*10**25
b = 4*math.pi
alpha = 0.5
z = 0.003642
def L(x, y, a, b, alpha, z):
    return x*((y*b*a)/(1+z)**(1+alpha))
Your function is a function of x (given a value of x, it spits out a value), so first you should repackage it as such (introduce a function yy which, given x, produces the requisite y), then define LL(x) = L(x, yy(x)), and finally use scipy.integrate to integrate it.
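Since y is only known at sample points, the simplest route is to integrate the sampled curve directly. A minimal sketch, assuming x and y are equal-length 1D arrays sorted by ascending x (the placeholder arrays below stand in for your data):

import math
import numpy as np
from scipy import integrate

a = 2.839e25
b = 4 * math.pi
alpha = 0.5
z = 0.003642

# placeholder sample arrays; substitute your measured x and y(x) here
x_arr = np.linspace(1.0, 10.0, 50)
y_arr = np.sqrt(x_arr)            # stand-in for the unknown y(x)

# evaluate L at every sample point
L_values = x_arr * ((y_arr * b * a) / (1 + z) ** (1 + alpha))

# total area under L between the first and last x
area = np.trapz(L_values, x_arr)               # trapezoidal rule
# area = integrate.simpson(L_values, x=x_arr)  # Simpson's rule, often more accurate
print(area)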
I've defined the following function as a method of approximating an integral using Boole's Rule:
def integrate_boole(f, l, r, N):
    h = ((r-l)/N)
    xN = np.linspace(l, r, N+1)
    fN = f(xN)
    return ((2*h)/45)*(7*fN[0]+32*(np.sum(fN[1:-2:2]))+12*(np.sum(fN[2:-3:4]))+14*(np.sum(fN[4:-5]))+7*fN[-1])
I used the function to get the value of the integral for sin(x)dx between 0 and pi (where N=8) and assigned it to a variable sine_int.
The answer given was 1.3938101893248442
After doing the original equation (see here) out by hand I realised this answer was quite inaccurate.
The sums of fN are giving incorrect values, but I'm not sure why. For example, np.sum(fN[4:-5]) is going to 0.
Is there a better way of coding the sums involved, or is there an error in my parameters that's causing the calculations to be inaccurate?
Thanks in advance.
EDIT
I should have made it clearer that this is supposed to be a composite version of the rule, i.e. approximating over N points where N is divisible by 4. So the typical 5 points with 4 intervals isn't going to cut it here, unfortunately. I would copy the equation I'm using in here, but I don't have an image of it and LaTeX isn't an option. It should be clear from the code I have after return.
From a quick inspection, it looks like the term multiplying f(x_4) should be 32, not 14:
def integrate_boole(f, l, r, N):
    h = ((r-l)/N)
    xN = np.linspace(l, r, N+1)
    fN = f(xN)
    return ((2*h)/45)*(7*fN[0] + 32*(np.sum(fN[1:-2:2])) +
                       12*(np.sum(fN[2:-3:4])) + 32*(np.sum(fN[4:-5])) + 7*fN[-1])
First, one of your coefficients was wrong, as pointed out by @nixon. Then, I think you do not really understand how Boole's rule works: it approximates the integral of a function using only 5 points of the function. Hence, terms like np.sum(fN[1:-2:2]) make no sense. You only need five points, which you can obtain with xN = np.linspace(l,r,5). Your h is simply the distance between 2 contiguous points, h = xN[1] - xN[0]. And then, easy peasy:
import numpy as np

def integrate_boole(f, l, r):
    xN = np.linspace(l, r, 5)
    h = xN[1] - xN[0]
    fN = f(xN)
    return ((2*h)/45)*(7*fN[0] + 32*fN[1] + 12*fN[2] + 32*fN[3] + 7*fN[4])

def f(x):
    return np.sin(x)

I = integrate_boole(f, 0, np.pi)
print(I)  # Outputs 1.99857...
I'm not sure what you're hoping your code does w.r.t. Boole's rule. Why are you summing over samples of the function (i.e. np.sum(fN[2:-3:4]))? I think your N parameter is also not well defined and I'm not sure what it's supposed to represent. Maybe you're using another rule I'm not familiar with: I'll let you decide.
Regardless, here's an implementation of Boole's rule as Wikipedia defines it. Variables map to the Wikipedia version you linked:
def integ_boole(func, left, right):
    h = (right - left) / 4
    x1 = left
    x2 = left + h
    x3 = left + 2*h
    x4 = left + 3*h
    x5 = right  # or left + 4*h
    result = (2*h / 45) * (7*func(x1) + 32*func(x2) + 12*func(x3) + 32*func(x4) + 7*func(x5))
    return result
then, to test:
import numpy as np
print(integ_boole(np.sin, 0, np.pi))
outputs 1.9985707318238357, which is extremely close to the correct answer of 2.
HTH.
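For the composite version the question's EDIT asks about (N subintervals, N divisible by 4), a minimal sketch (my addition): apply the 5-point rule panel by panel, so each interior panel boundary picks up a combined coefficient of 7 + 7 = 14.

import numpy as np

def integrate_boole_composite(f, l, r, N):
    # Composite Boole's rule: one 5-point panel per block of 4 subintervals.
    if N % 4 != 0:
        raise ValueError("N must be divisible by 4")
    xN = np.linspace(l, r, N + 1)
    h = (r - l) / N
    fN = f(xN)
    total = 0.0
    for i in range(0, N, 4):
        total += (2*h/45) * (7*fN[i] + 32*fN[i+1] + 12*fN[i+2]
                             + 32*fN[i+3] + 7*fN[i+4])
    return total

print(integrate_boole_composite(np.sin, 0, np.pi, 8))  # noticeably closer to 2 than the single-panel result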
Perlin noise is explained in pseudo-code here: http://freespace.virgin.net/hugo.elias/models/m_perlin.htm
The tutorial gives me a random number generator function written in pseudo-code. It returns a floating-point number in the range (-1, 1).
function IntNoise(32-bit integer: x)
    x = (x<<13) ^ x;
    return ( 1.0 - ( (x * (x * x * 15731 + 789221) + 1376312589) & 7fffffff) / 1073741824.0);
end IntNoise function
So if this function returns a number in the range (-1, 1), can't I just use random.uniform(-1, 1)?
But then I run into this problem:
function Noise(x)
.
.
end function
function SmoothNoise_1D(x)
    return Noise(x)/2 + Noise(x-1)/4 + Noise(x+1)/4
end function
I guess the Noise(x) function generates random numbers for 1D noise, but I can't seem to understand what the x parameter is.
Is it a seed? And can't I just use random.uniform(-1, 1)?
The noise function used in Perlin noise is a seeded random number generator. That is, it must return the same value every time it is called with the same value of its parameter x. You can think of x as some position in space in a given dimension, between the bounds of the region you're computing Perlin noise over.
You can use the Python random module if you reset the state of the RNG based upon your given parameter, so that it always returns the same value for a given x.
import random

rand_state = random.Random()

def Noise(x):
    rand_state.seed(x)
    return rand_state.random()
>>> Noise(1)
0.13436424411240122
>>> Noise(2)
0.9560342718892494
>>> Noise(1)
0.13436424411240122
Note that Noise returned the same value when passed 1 the first time and the second time, and a different value when something other than 1 was input. The parameter to seed can be any hashable type in Python; for your purposes, any numeric type works.
Typically when creating Perlin noise, many calls are made to this Noise function, so you'll want it to be fast. On my machine, it takes about 14 microseconds to execute the function above. That's only ~70000 calls per second. Implementing the pseudocode for IntNoise may result in better performance. In fact, the following method:
MAX_INT = (1 << 31) - 1

def IntNoise(x):
    x = int(x)
    x = ((x << 13) & MAX_INT) ^ x
    x = (x * (x * x * 15731 + 789221) + 1376312589) & MAX_INT
    return 1.0 - x / 1073741824.0
Seems to take on average about 1.6 microseconds per invocation, or about 10 times faster than the Noise above. Its range of return values is (-1, 1), but that can be changed by modifying the last line. I can't speak to the uniformity of its distribution; however, a picture is worth a thousand words. Blue dots below are from IntNoise, and red dots are from Python's random.uniform function.
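The original comparison plot isn't reproduced here; a quick sketch (my addition) to generate a similar picture from the two functions above:

import random
import matplotlib.pyplot as plt

xs = list(range(1000))
plt.scatter(xs, [IntNoise(i) for i in xs], s=2, color="blue", label="IntNoise")
plt.scatter(xs, [random.uniform(-1, 1) for _ in xs], s=2, color="red", label="random.uniform")
plt.legend()
plt.show()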
The Noise function above can be used by the smooth noise algorithm in your question. The URL you linked in the question describes what the smoothing functions are for better than I could. After reading the paragraph, study the pictures of 1D and 2D smoothing next to it to better understand their purpose.
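For completeness, a direct Python translation of the question's SmoothNoise_1D, using the Noise function defined above (IntNoise works the same way):

def SmoothNoise_1D(x):
    # weighted average of a sample and its two neighbours, as in the tutorial
    return Noise(x) / 2 + Noise(x - 1) / 4 + Noise(x + 1) / 4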
Suppose I have a function f(x) defined between a and b. This function can have many zeros, but also many asymptotes. I need to retrieve all the zeros of this function. What is the best way to do it?
Actually, my strategy is the following:
I evaluate my function on a given number of points
I detect whether there is a change of sign
I find the zero between the points that are changing sign
I verify if the zero found is really a zero, or if this is an asymptote
U = numpy.linspace(a, b, 100)  # evaluate function at 100 different points
c = f(U)
s = numpy.sign(c)
for i in range(100 - 1):
    if s[i] + s[i+1] == 0:  # opposite signs
        u = scipy.optimize.brentq(f, U[i], U[i+1])
        z = f(u)
        if numpy.isnan(z) or abs(z) > 1e-3:
            continue
        print('found zero at {}'.format(u))
This algorithm seems to work, except I see two potential problems:
It will not detect a zero that doesn't cross the x axis (for example, in a function like f(x) = x**2). However, I don't think that can occur with the function I'm evaluating.
If the discretization points are too far apart, there could be more than one zero between them, and the algorithm could fail to find them.
Do you have a better strategy (still efficient) to find all the zeros of a function?
I don't think it's important for the question, but for those who are curious, I'm dealing with characteristic equations of wave propagation in optical fiber. The function looks like this (where V and ell are previously defined, and ell is a positive integer):
def f(u):
    w = numpy.sqrt(V**2 - u**2)
    jl = scipy.special.jn(ell, u)
    jl1 = scipy.special.jnjn(ell-1, u)
    kl = scipy.special.jnkn(ell, w)
    kl1 = scipy.special.jnkn(ell-1, w)
    return jl / (u*jl1) + kl / (w*kl1)
Why are you limited to numpy? Scipy has a package that does exactly what you want:
http://docs.scipy.org/doc/scipy/reference/optimize.nonlin.html
One lesson I've learned: numerical programming is hard, so don't do it :)
Anyway, if you're dead set on building the algorithm yourself, the scipy doc page I linked (takes forever to load, btw) gives you a list of algorithms to start with. One method that I've used before is to discretize the function to the degree necessary for your problem. (That is, tune \delta x so that it is much smaller than the characteristic size of your problem.) This lets you look for features of the function (like changes in sign). And you can compute the derivative of a line segment (probably since kindergarten) pretty easily, so your discretized function has a well-defined first derivative. Because you've tuned the dx to be smaller than the characteristic size, you're guaranteed not to miss any features of the function that are important for your problem.
If you want to know what "characteristic size" means, look for some parameter of your function with units of length or 1/length. That is, for some function f(x), assume x has units of length and f has no units. Then look for the things that multiply x. For example, if you want to discretize cos(\pi x), the parameter that multiplies x (if x has units of length) must have units of 1/length. So the characteristic size of cos(\pi x) is 1/\pi. If you make your discretization much smaller than this, you won't have any issues. To be sure, this trick won't always work, so you may need to do some tinkering.
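A minimal sketch of that discretization idea (my addition), using cos(\pi x) with dx tuned well below its characteristic size of 1/\pi:

import numpy as np

dx = 0.01                      # much smaller than 1/pi
x = np.arange(0.0, 5.0, dx)
f = np.cos(np.pi * x)
# candidate roots: indices where the sign flips between adjacent samples
sign_changes = np.where(np.diff(np.sign(f)) != 0)[0]
print(x[sign_changes])         # ~0.5, 1.5, 2.5, 3.5, 4.5
# the discretized function also has a well-defined first derivative
dfdx = np.gradient(f, dx)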
I found it's relatively easy to implement your own root finder using scipy.optimize.fsolve.
Idea: find any zeros in the interval (start, stop) with step size step by calling fsolve repeatedly with a changing x0. Use a relatively small step size to find all the roots.
It can only search for zeros in one dimension (other dimensions must be fixed). If you have other needs, I would recommend using sympy to calculate the analytical solution.
Note: it may not always find all the zeros, but I saw it give relatively good results. I also put the code in a gist, which I will update if needed.
import numpy as np
import scipy
from scipy.optimize import fsolve
from matplotlib import pyplot as plt
# Defined below
r = RootFinder(1, 20, 0.01)
args = (90, 5)
roots = r.find(f, *args)
print("Roots: ", roots)
# plot results
u = np.linspace(1, 20, num=600)
fig, ax = plt.subplots()
ax.plot(u, f(u, *args))
ax.scatter(roots, f(np.array(roots), *args), color="r", s=10)
ax.grid(color="grey", ls="--", lw=0.5)
plt.show()
Example output:
Roots: [ 2.84599497 8.82720551 12.38857782 15.74736542 19.02545276]
RootFinder definition
import numpy as np
import scipy
from scipy.optimize import fsolve
from matplotlib import pyplot as plt
class RootFinder:
    def __init__(self, start, stop, step=0.01, root_dtype="float64", xtol=1e-9):
        self.start = start
        self.stop = stop
        self.step = step
        self.xtol = xtol
        self.roots = np.array([], dtype=root_dtype)

    def add_to_roots(self, x):
        if (x < self.start) or (x > self.stop):
            return  # outside range
        if any(abs(self.roots - x) < self.xtol):
            return  # root already found
        self.roots = np.append(self.roots, x)

    def find(self, f, *args):
        current = self.start
        for x0 in np.arange(self.start, self.stop + self.step, self.step):
            if x0 < current:
                continue
            x = self.find_root(f, x0, *args)
            if x is None:  # no root found
                continue
            current = x
            self.add_to_roots(x)
        return self.roots

    def find_root(self, f, x0, *args):
        x, _, ier, _ = fsolve(f, x0=x0, args=args, full_output=True, xtol=self.xtol)
        if ier == 1:
            return x[0]
        return None
Test function
The scipy.special.jnjn does not exist anymore, but I created a similar test function for this case.
def f(u, V=90, ell=5):
    w = np.sqrt(V**2 - u**2)
    jl = scipy.special.jn(ell, u)
    jl1 = scipy.special.yn(ell - 1, u)
    kl = scipy.special.kn(ell, w)
    kl1 = scipy.special.kn(ell - 1, w)
    return jl / (u * jl1) + kl / (w * kl1)
The main problem I see with this is whether you can actually find all the roots; as has already been mentioned in the comments, this is not always possible. If you are sure that your function is not completely pathological (sin(1/x) was already mentioned), the next question is your tolerance for missing a root or several of them. Put differently, it's about what lengths you are prepared to go to to make sure you did not miss any. To the best of my knowledge, there is no general method to isolate all the roots for you, so you'll have to do it yourself. What you show is a reasonable first step already. A couple of comments:
Brent's method is indeed a good choice here.
First of all, deal with the divergences. Since your function has Bessel functions in the denominators, you can first solve for their roots; better yet, look them up in, e.g., Abramowitz and Stegun (MathWorld link). This will be better than the ad hoc grid you're using.
What you can do, once you've found two roots or divergences, x_1 and x_2, is run the search again in the interval [x_1+epsilon, x_2-epsilon], as in the sketch after this list. Continue until no more roots are found (Brent's method is guaranteed to converge to a root, provided there is one).
If you cannot enumerate all the divergences, you might want to be a little more careful in verifying that a candidate is indeed a divergence: given x, don't just check that f(x) is large; check that, e.g., |f(x-epsilon/2)| > |f(x-epsilon)| for several values of epsilon (1e-8, 1e-9, 1e-10, something like that).
If you want to make sure you don't have roots which simply touch zero, look for the extrema of the function, and for each extremum, x_e, check the value of f(x_e).
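A minimal sketch of that bracketed search (my addition; singularities is assumed to be a sorted list of the known divergences on [a, b], and f is assumed to accept numpy arrays):

import numpy as np
from scipy.optimize import brentq

def roots_between(f, a, b, singularities, eps=1e-9, n=200):
    # scan each subinterval between consecutive divergences for sign
    # changes, then polish each bracket with Brent's method
    pts = np.sort(np.concatenate(([a], np.asarray(singularities), [b])))
    roots = []
    for left, right in zip(pts[:-1], pts[1:]):
        grid = np.linspace(left + eps, right - eps, n)
        vals = f(grid)
        for i in np.where(np.diff(np.sign(vals)) != 0)[0]:
            roots.append(brentq(f, grid[i], grid[i + 1]))
    return roots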
I've also encountered this problem, solving equations like f(z) = 0 where f was a holomorphic function. I wanted to be sure not to miss any zeros, and finally developed an algorithm based on the argument principle.
It helps to find the exact number of zeros lying in a complex domain. Once you know the number of zeros, it is easier to find them. There are, however, two concerns which must be taken into account (a sketch of the counting step follows them):
Take care with multiplicity: when solving (z-1)^2 = 0, you'll get two zeros, as z = 1 is counted twice.
If the function is meromorphic (and thus contains poles), each pole reduces the number of zeros counted and breaks a naive attempt to count them.
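A rough numerical sketch of the counting step (my addition, not the original algorithm): the argument principle gives N - P = (1/(2*pi*i)) times the contour integral of f'(z)/f(z), so integrating numerically around a circle counts zeros minus poles inside it.

import numpy as np

def count_zeros_minus_poles(f, c=0.0, R=1.0, n=2000, h=1e-6):
    # parametrize the circle z = c + R*exp(i*t) and sum f'/f * dz
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    z = c + R * np.exp(1j * t)
    dz = 1j * R * np.exp(1j * t) * (2 * np.pi / n)
    fprime = (f(z + h) - f(z - h)) / (2 * h)   # central-difference derivative
    integral = np.sum(fprime / f(z) * dz)
    return int(round((integral / (2j * np.pi)).real))

print(count_zeros_minus_poles(lambda z: (z - 1) ** 2, c=1.0, R=0.5))  # 2: z=1 counted twice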