I am using this function that I found on the web, to add speckle noise to images for research purposes:
def add_speckle(k,theta,img):
    gauss = np.random.gamma(k, theta, img.size)
    gauss = gauss.reshape(img.shape[0], img.shape[1], img.shape[2]).astype('uint8')
    noise = img + img * gauss
    return noise
My issue is that I want to specify the speckle noise I add through a standard deviation (sigma) parameter, but this function relies on the gamma distribution via np.random.gamma(), which is parameterised by k and theta (shape and scale), as in the gamma pdf:
f(x; k, theta) = x^(k-1) * exp(-x/theta) / (Gamma(k) * theta^k)
As far as I know, the variance of a gamma distribution is:
Var = k * theta^2
so the standard deviation, sigma, is:
sigma = sqrt(k) * theta
I want to add speckle noise as a function of sigma, so there should be a way to get from sigma to the k, theta (shape, scale) inputs, so that the function would look something like this:
def add_speckle(sigma,img):
Edited, following the answer in the comments:
def add_speckle(sigma,mean,img):
    theta = sigma ** 2 / mean
    k = mean / theta
    gauss = np.random.gamma(k,theta,img.size)
    gauss = gauss.reshape(img.shape[0],img.shape[1],img.shape[2]).astype('uint8')
    noise = img + img * gauss
    print("k=",k)
    print("theta=",theta)
    return noise

img = cv2.imread('/content/Hand.jpeg')
cv2.imwrite('speckle12.jpg',add_speckle(8,3,img))
Thanks for your help, but I don't really understand why the k and theta values change each time I change the mean while sigma stays constant. I thought they should not change?
As you have noticed, sigma = k ** 0.5 * theta, so if only sigma is given there are infinitely many possible parameter pairs for the gamma distribution (e.g. if sigma is 1, (k, theta) can be (1, 1) or (4, 0.5), and so on).
If you really want to generate the speckle from statistical quantities, I suggest adding the mean as a second input so that the required (k, theta) can be calculated.
The first moment (i.e. the mean) of a gamma distribution is simply k * theta.
Example:
def add_speckle(mean, sigma, img):
    # find theta
    theta = sigma ** 2 / mean
    k = mean / theta
    # your code proceeds...
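As a hedged sketch of how the snippet above might be completed: the helper name gamma_params, the rng argument and the float/clip handling are assumptions of mine, not part of the original answer. It keeps the question's img + img * noise form but leaves the noise as floating point, because casting gamma draws to uint8 floors every value below 1 to 0. The loop at the end also illustrates why k and theta change with the mean at fixed sigma: both k * theta = mean and sqrt(k) * theta = sigma have to hold at once.

```python
import numpy as np

def gamma_params(mean, sigma):
    """Solve k * theta = mean and sqrt(k) * theta = sigma for (k, theta)."""
    theta = sigma ** 2 / mean
    k = mean / theta
    return k, theta

def add_speckle(mean, sigma, img, rng=None):
    """Add gamma-distributed multiplicative noise to a uint8 image (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    k, theta = gamma_params(mean, sigma)
    # Keep the noise as float; casting it to uint8 would floor values below 1 to 0.
    noise = rng.gamma(k, theta, size=img.shape)
    noisy = img.astype(np.float64) * (1.0 + noise)  # same form as img + img * noise
    return np.clip(noisy, 0, 255).astype(np.uint8)

# Same sigma, different means: (k, theta) must differ to satisfy both constraints.
for mean in (0.5, 1.0, 2.0):
    k, theta = gamma_params(mean, sigma=0.5)
    print(mean, k, theta, k * theta, np.sqrt(k) * theta)
```

Whether the result should be clipped back to uint8 depends on how the noisy image is used afterwards; that step is only one possible choice.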
I'm presently trying to run a vectorised batch multivariate sampling operation via Numpy. I have k mean vectors of shape [N,] corresponding to k covariance matrices of dimensions [N, N], and I'm trying to return k draws of shape [N,] from the multivariate normal distributions.
I presently have a loop that does the above,
for batch in range(batch_size):
    c[batch, :] = np.random.multivariate_normal(mean = a[batch, :], cov = b[batch, :, :])
but would like to consolidate the above into a vectorised operation. The issue is that np.random.multivariate_normal can only take a 1-D array as the mean and a 2-D array as the covariance.
I can do batch-sampling via PyTorch's multivariate normal class, but I'm trying to integrate with some pre-existing Numpy code, and I'd prefer to limit the number of conversions happening.
Googling pulled up this question, which could be resolved by melting the mean, but in my case, I'm not using the same covariance matrix and can't go about things exactly the same way.
Thank you very much for your help. I figure there's a good chance that I won't be able to handle batches using the Numpy distribution because of the argument constraints, but wanted to make sure I wasn't missing anything.
I couldn't find a built-in function in NumPy, but you can implement it yourself by computing the Cholesky decomposition of the covariance matrix, Σ = LLᵀ, and then using the fact that, given a vector X of i.i.d. standard normal variables, the transformation LX + µ has covariance Σ and mean µ.
This can be implemented using e.g. np.linalg.cholesky() (note that this function supports batch mode!) and np.random.standard_normal():
# cov: (*B, D, D)
# mean: (*B, D)
# result: (*S, *B, D)
L = np.linalg.cholesky(cov)
X = np.random.standard_normal((*S, *B, D, 1))
Y = (L @ X).reshape(*S, *B, D) + mean
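The claim that LX + µ has mean µ and covariance Σ can be checked numerically. The following is a minimal sketch for a single 2D distribution; the particular values of mu and cov and the sample count are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.6],
                [0.6, 1.0]])

L = np.linalg.cholesky(cov)                # lower-triangular, cov == L @ L.T
X = rng.standard_normal((2, 200_000))      # i.i.d. standard normal entries
Y = L @ X + mu[:, None]                    # each column is one draw of LX + mu

emp_mean = Y.mean(axis=1)
emp_cov = np.cov(Y)                        # rows are variables, columns are draws
print(emp_mean)
print(emp_cov)
```

With this many draws, the empirical mean and covariance should agree with mu and cov to roughly two decimal places.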
Here, packed in a function for easier use:
import numpy as np

def sample_batch_mvn(
        mean: np.ndarray,
        cov: np.ndarray,
        size: "tuple | int" = (),
) -> np.ndarray:
    """
    Batch sample multivariate normal distribution.

    Arguments:
        mean: expected values of shape (…M, D)
        cov: covariance matrices of shape (…M, D, D)
        size: additional batch shape (…B)

    Returns: samples from the multivariate normal distributions
        shape: (…B, …M, D)

    It is not required that ``mean`` and ``cov`` have the same shape
    prefix, only that they are broadcastable against each other.
    """
    mean = np.asarray(mean)
    cov = np.asarray(cov)
    size = (size, ) if isinstance(size, int) else tuple(size)
    shape = size + np.broadcast_shapes(mean.shape, cov.shape[:-1])
    X = np.random.standard_normal((*shape, 1))
    L = np.linalg.cholesky(cov)
    return (L @ X).reshape(shape) + mean
Now in order to test this function, we first need a good batch of covariance matrices. We'll generate a couple to test the sampling performance a bit:
# Generate a batch of N D-dimensional covariance matrices C:
N = 5000
D = 2
L = np.zeros((N, D, D))
L[(..., *np.tril_indices(D))] = \
    np.random.normal(size=(N, D * (D + 1) // 2))
cov = L @ np.swapaxes(L, -1, -2)
The method used to generate the covariance matrices here in fact works by sampling the Cholesky factors L. With prior knowledge of these factors, we of course wouldn't need to compute the Cholesky decomposition in the sampling function. However, to test the general applicability of the function, we will forget about them and just pass the covariance matrices C:
mean = np.zeros(2)
samples = sample_batch_mvn(mean, cov, 1000)
print(samples.shape) # (1000, 5000, 2)
Sampling these 5 million 2D vectors takes about 0.4s on my PC.
And, as almost always, a considerable amount of effort goes into plotting (here showing some samples for the first 9 of the 5000 covariance matrices):
import scipy.stats as stats
import matplotlib.pyplot as plt

fig, axs = plt.subplots(3, 3, figsize=(9, 9))
for ax, i in zip(axs.ravel(), range(5000)):
    cc = cov[i]
    xsamples = samples[:100, i, 0]
    ysamples = samples[:100, i, 1]
    xmin = xsamples.min()
    xmax = xsamples.max()
    ymin = ysamples.min()
    ymax = ysamples.max()
    xpad = (xmax - xmin) * 0.05
    ypad = (ymax - ymin) * 0.05
    xlim = (xmin - xpad, xmax + xpad)
    ylim = (ymin - ypad, ymax + ypad)
    xs = np.linspace(*xlim, num=51)
    ys = np.linspace(*ylim, num=51)
    xy = np.dstack(np.meshgrid(xs, ys))
    pdf = stats.multivariate_normal.pdf(xy, mean, cc)
    ax.contourf(xs, ys, pdf, 33, cmap='YlGnBu')
    ax.plot(xsamples, ysamples, 'r.', alpha=.6,
            markeredgecolor='k', markeredgewidth=0.5)
    ax.set_xlim(*xlim)
    ax.set_ylim(*ylim)
plt.show()
Some inspiration for this:
Some notes on sampling from a multivariate normal
Pinheiro and Bates, 1996, Unconstrained Parameterizations for Variance-Covariance Matrices
I tried generating and combining two unimodal distributions but think there's something wrong in my code.
N=400
mu, sigma = 100, 5
mu2, sigma2 = 10, 40
X1 = np.random.normal(mu, sigma, N)
X2 = np.random.normal(mu2, sigma2, N)
w = np.random.normal(0.5, 1, N)
X = w*X1 + (1-w)*X2
X = X.reshape(-1,2)
When I plot X I don't get a bimodal distribution
It's unclear where your problem is; it's also unclear what the purpose of the variable w is, and it's unclear how you judge that you get an incorrect result, since we don't see the plot code, or any other code to confirm or reject a bimodal distribution.
That is, your example is too incomplete to exactly answer your question. But I can make an educated guess.
If I do the following:
import numpy as np
import matplotlib.pyplot as plt

N=400
mu, sigma = 100, 5
mu2, sigma2 = 10, 40
X1 = np.random.normal(mu, sigma, N)
X2 = np.random.normal(mu2, sigma2, N)
X = np.concatenate([X1, X2])
plt.hist(X)
plt.show()
the histogram shows two modes, one for each component (figure not reproduced here).
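If w in the question was meant as a mixing weight, one possible reading is a two-component mixture: each sample comes from exactly one of the two normals, chosen at random. The sketch below assumes that interpretation; the name weight and its value of 0.5 are mine. A weighted sum w*X1 + (1-w)*X2 instead blends the two draws in every sample, which smears the values around a single centre rather than producing two modes.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 800
mu, sigma = 100, 5
mu2, sigma2 = 10, 40
weight = 0.5  # assumed probability of drawing from the first component

# Pick a component per sample, then keep the draw from that component only.
from_first = rng.random(N) < weight
X = np.where(from_first,
             rng.normal(mu, sigma, N),
             rng.normal(mu2, sigma2, N))

# For contrast: a weighted sum blends both draws in every sample.
blended = weight * rng.normal(mu, sigma, N) + (1 - weight) * rng.normal(mu2, sigma2, N)

print(X[from_first].mean(), X[~from_first].mean(), blended.mean())
```

Plotting plt.hist(X) and plt.hist(blended) side by side is a quick way to see the difference.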
I have a Gaussian function that I have fitted to data from a data file. I now need to integrate it to get the area under the curve.
This is my Gaussian function:
def I(theta,max_x,max_y,sigma):
    return (max_y/(sigma*(math.sqrt(2*pi))))*np.exp(-((theta-max_x)**2)/(2*sigma**2))
Comparing with the general formula:
N(x | mu, sigma, n) := (n/(sigma*sqrt(2*pi))) * exp((-(x-mu)^2)/(2*sigma^2))
i.e. n = max_y, mu = max_x, x = theta.
This is what is given on another page:
If Phi(z) = integral(N(x | 0, 1, 1), -inf, z), that is, Phi(z) is the integral of the standard normal distribution from minus infinity up to z, then it's true by the definition of the error function that
Phi(z) = 0.5 + 0.5 * erf(z / sqrt(2)).
Likewise, if Phi(z | mu, sigma, n) = integral(N(x | mu, sigma, n), -inf, z), that is, Phi(z | mu, sigma, n) is the integral of the normal distribution given parameters mu, sigma, and n from minus infinity up to z, then it's true by the definition of the error function that
Phi(z | mu, sigma, n) = (n/2) * (1 + erf((z - mu) / (sigma * sqrt(2)))).
I am unsure how this helps. I just want to integrate my function over the plotted values under the curve. Is it saying that this is the integral?
Phi(z | mu, sigma, n) = (n/2) * (1 + erf((z - mu) / (sigma * sqrt(2))))
The formula you have there is an antiderivative of your function (the cumulative integral from minus infinity up to z). If you would like a numerical answer between two x limits, you can evaluate that function at the two points and take the difference.
Your Gaussian function is defined over all real numbers (−∞, +∞), but in practice you are only interested in the middle part, as the tails are very close to 0. To obtain a numerical estimate of the total area you can do as you say: evaluate Phi at two points, one on each side of the Gaussian's peak and far enough out that the Gaussian is close to 0 there, and take the difference.
If Phi(z | mu, sigma, n) returns a function you could do:
integral = Phi(z | mu, sigma, n)
area = integral(X_HIGH) - integral(X_LOW)
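A runnable version of this idea might look as follows. It is only a sketch: the function names gaussian and phi, the fit parameters and the integration limits are made up for illustration. The closed-form area from the error function is cross-checked against a plain trapezoidal sum over the same interval.

```python
import math
import numpy as np

def gaussian(x, mu, sigma, n):
    """N(x | mu, sigma, n): a normal pdf scaled to total area n."""
    return n / (sigma * math.sqrt(2 * math.pi)) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

def phi(z, mu, sigma, n):
    """Integral of gaussian() from minus infinity up to z."""
    return (n / 2) * (1 + math.erf((z - mu) / (sigma * math.sqrt(2))))

mu, sigma, n = 30.0, 2.0, 5.0           # made-up fit parameters
x_low, x_high = mu - 3 * sigma, mu + 3 * sigma

area = phi(x_high, mu, sigma, n) - phi(x_low, mu, sigma, n)

# Cross-check with a trapezoidal sum over the same interval.
xs = np.linspace(x_low, x_high, 2001)
ys = gaussian(xs, mu, sigma, n)
area_numeric = np.sum((ys[1:] + ys[:-1]) / 2 * np.diff(xs))

print(area, area_numeric)
```

Widening the limits (for example to mu ± 6 * sigma) brings the area closer to n, the total area under the curve.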
I can generate Gaussian data with the random.gauss(mu, sigma) function, but how can I generate a 2D Gaussian? Is there a function for that?
If you can use numpy, there is numpy.random.multivariate_normal(mean, cov[, size]).
For example, to get 10,000 2D samples:
np.random.multivariate_normal(mean, cov, 10000)
where mean.shape==(2,) and cov.shape==(2,2).
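A self-contained version, with a made-up mean and covariance, could look like this. It uses the newer Generator interface (np.random.default_rng) rather than the legacy np.random.multivariate_normal call shown above; that is a choice of mine, and both accept the same mean, cov and size arguments.

```python
import numpy as np

rng = np.random.default_rng(0)

mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])   # symmetric, positive definite

samples = rng.multivariate_normal(mean, cov, size=10_000)
print(samples.shape)                    # (10000, 2)
print(np.cov(samples, rowvar=False))    # should be close to cov
```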
I'd like to add an approximation using exponential functions. This directly generates a 2d matrix which contains a movable, symmetric 2d gaussian.
I should note that I found this code on the scipy mailing list archives and modified it a little.
import numpy as np

def makeGaussian(size, fwhm = 3, center=None):
    """ Make a square gaussian kernel.
    size is the length of a side of the square
    fwhm is full-width-half-maximum, which
    can be thought of as an effective radius.
    """
    x = np.arange(0, size, 1, float)
    y = x[:,np.newaxis]
    if center is None:
        x0 = y0 = size // 2
    else:
        x0 = center[0]
        y0 = center[1]
    return np.exp(-4*np.log(2) * ((x-x0)**2 + (y-y0)**2) / fwhm**2)
For reference and enhancements, it is hosted as a gist here. Pull requests welcome!
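For readers who think in terms of a standard deviation rather than a full width at half maximum: the exponent -4*ln(2)*r²/fwhm² is the same as -r²/(2*sigma²) when sigma = fwhm / (2*sqrt(2*ln 2)). The short check below is a sketch of that relation only; the variable names are mine and it does not call makeGaussian.

```python
import numpy as np

fwhm = 3.0
sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))   # equivalent standard deviation

r = np.linspace(0, 10, 101)                   # distance from the centre
by_fwhm = np.exp(-4 * np.log(2) * r**2 / fwhm**2)
by_sigma = np.exp(-r**2 / (2 * sigma**2))

# At r = fwhm / 2 the profile has dropped to half of its peak value.
half = np.exp(-4 * np.log(2) * (fwhm / 2)**2 / fwhm**2)
print(sigma, half)
```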
Since the standard 2D Gaussian distribution is just the product of two 1D Gaussian distributions, if there is no correlation between the two axes (i.e. the covariance matrix is diagonal), just call random.gauss twice.
import random

def gauss_2d(mu, sigma):
    x = random.gauss(mu, sigma)
    y = random.gauss(mu, sigma)
    return (x, y)
import numpy as np

# define normalized 2D gaussian
def gaus2d(x=0, y=0, mx=0, my=0, sx=1, sy=1):
    return 1. / (2. * np.pi * sx * sy) * np.exp(-((x - mx)**2. / (2. * sx**2.) + (y - my)**2. / (2. * sy**2.)))

x = np.linspace(-5, 5)
y = np.linspace(-5, 5)
x, y = np.meshgrid(x, y) # get 2D variables instead of 1D
z = gaus2d(x, y)
Straightforward implementation and example of the 2D Gaussian function. Here sx and sy are the spreads in x and y direction, mx and my are the center coordinates.
NumPy has a function to do this. It is documented here. In addition to the method proposed above, it allows you to draw samples with arbitrary covariance.
Here is a small example, assuming ipython -pylab is started:
samples = multivariate_normal([-0.5, -0.5], [[1, 0],[0, 1]], 1000)
plot(samples[:, 0], samples[:, 1], '.')
samples = multivariate_normal([0.5, 0.5], [[0.1, 0.5],[0.5, 0.6]], 1000)
plot(samples[:, 0], samples[:, 1], '.')
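One caveat when choosing an arbitrary covariance: the matrix has to be symmetric and positive semidefinite. The second matrix in the example above, [[0.1, 0.5], [0.5, 0.6]], has determinant 0.1*0.6 - 0.5*0.5 = -0.19, so it is not a valid covariance matrix; as far as I know NumPy only emits a warning in that case by default. The helper below (is_valid_covariance is a name I made up) is a minimal sketch of a check via the eigenvalues, and the third matrix is just a hypothetical valid alternative.

```python
import numpy as np

def is_valid_covariance(cov, tol=1e-8):
    """True if cov is symmetric and has no eigenvalue below -tol."""
    cov = np.asarray(cov, dtype=float)
    if not np.allclose(cov, cov.T):
        return False
    return bool(np.linalg.eigvalsh(cov).min() >= -tol)

print(is_valid_covariance([[1, 0], [0, 1]]))          # True
print(is_valid_covariance([[0.1, 0.5], [0.5, 0.6]]))  # False
print(is_valid_covariance([[0.5, 0.2], [0.2, 0.6]]))  # True
```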
In case someone finds this thread and is looking for something a little more versatile (like I did), I have modified the code from @giessel. The code below allows for asymmetry and rotation.
import numpy as np

def makeGaussian2(x_center=0, y_center=0, theta=0, sigma_x = 10, sigma_y=10, x_size=640, y_size=480):
    # x_center and y_center will be the center of the gaussian, theta will be the rotation angle
    # sigma_x and sigma_y will be the stdevs in the x and y axis before rotation
    # x_size and y_size give the size of the frame

    theta = 2*np.pi*theta/360
    x = np.arange(0,x_size, 1, float)
    y = np.arange(0,y_size, 1, float)
    y = y[:,np.newaxis]
    sx = sigma_x
    sy = sigma_y
    x0 = x_center
    y0 = y_center

    # rotation
    a=np.cos(theta)*x -np.sin(theta)*y
    b=np.sin(theta)*x +np.cos(theta)*y
    a0=np.cos(theta)*x0 -np.sin(theta)*y0
    b0=np.sin(theta)*x0 +np.cos(theta)*y0

    return np.exp(-(((a-a0)**2)/(2*(sx**2)) + ((b-b0)**2) /(2*(sy**2))))
We can also just use the NumPy method np.random.normal to generate a 2D Gaussian distribution whose two coordinates are independent and share the same mean and sigma.
The sample code is np.random.normal(mean, sigma, (num_samples, 2)).
A sample run with mean = 0 and sigma = 20 is shown below:
np.random.normal(0, 20, (10,2))
>>array([[ 11.62158316, 3.30702215],
[-18.49936277, -11.23592946],
[ -7.54555371, 14.42238838],
[-14.61531423, -9.2881661 ],
[-30.36890026, -6.2562164 ],
[-27.77763286, -23.56723819],
[-18.18876597, 41.83504042],
[-23.62068377, 21.10615509],
[ 15.48830184, -15.42140269],
[ 19.91510876, 26.88563983]])
Hence we get 10 samples in a 2D array, with mean = 0 and sigma = 20 in each coordinate.
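If the two coordinates need different means or standard deviations, the same call still works because the loc and scale arguments broadcast against the requested shape. The values below are made up, and the sketch uses the Generator interface instead of the legacy np.random.normal; the coordinates remain uncorrelated, so this does not cover a general covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

means = np.array([0.0, 50.0])     # one mean per coordinate
sigmas = np.array([20.0, 2.0])    # one standard deviation per coordinate

samples = rng.normal(means, sigmas, size=(100_000, 2))
print(samples.mean(axis=0))                       # close to means
print(samples.std(axis=0))                        # close to sigmas
print(np.corrcoef(samples, rowvar=False)[0, 1])   # close to 0
```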