Good day to you, fellow programmers!
Today I would like to do something that I believe is tricky. I have a very large 2D array called tac that contains time curve values, and a file containing a tuple of coordinates called coor, which says where to place these curves in a 3D array. Together these variables represent a 4D array: the first three dimensions are spatial and the fourth is time. The data is stored this way to avoid storing an immense number of zeros.
I would like to apply a Gaussian kernel to this data set for each time (in other words, for each value along the 4th dimension). I was able to generate this kernel and perform the convolution quite easily for a fixed standard deviation over the whole array using scipy.ndimage.convolve. The kernel was created using scipy.signal.gaussian (available as scipy.signal.windows.gaussian in current SciPy). Here is a brief example of the principle, where tac_4d contains the 4D array (it stores a lot of data, I know... but one problem at a time):
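import numpy as np
from scipy import signal
from scipy import ndimage as nd

def gaussian_kernel_3d(radius, sigma):
    num = 2 * radius + 1
    # the Gaussian window lives in scipy.signal.windows in current SciPy
    kernel_1d = signal.windows.gaussian(num, std=sigma).reshape(num, 1)
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    kernel_3d = np.outer(kernel_1d, kernel_2d).reshape(num, num, num)
    # trailing axis of length 1 so the time axis is left untouched
    kernel_3d = np.expand_dims(kernel_3d, -1)
    return kernel_3d

g = gaussian_kernel_3d(1, .5)
cag = nd.convolve(tac_4d, g, mode='constant', cval=0.0)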
The trick is now to convolve the array with a kernel whose standard deviation is different for each SPACE coordinate. In other words, I would have a 3D array std containing the standard deviation for each coordinate of the array.
It seems https://github.com/sheliak/varconvolve is the code needed to take care of this problem. However I don't really understand how to use it and quite frankly, I would prefer to come up with a genuine solution. Do you guys see a way to solve this problem?
Thanks in advance !
EDIT
Here is what I hope can be considered an MCVE:
import numpy as np
from scipy import signal
from scipy import ndimage as nd
def gaussian_kernel_2d(radius, sigma):
    num = 2 * radius + 1
    kernel_1d = signal.windows.gaussian(num, std=sigma).reshape(num, 1)
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    return kernel_2d

def gaussian_kernel_3d(radius, sigma):
    num = 2 * radius + 1
    kernel_1d = signal.windows.gaussian(num, std=sigma).reshape(num, 1)
    kernel_2d = np.outer(kernel_1d, kernel_1d)
    kernel_3d = np.outer(kernel_1d, kernel_2d).reshape(num, num, num)
    kernel_3d = np.expand_dims(kernel_3d, -1)
    return kernel_3d

np.random.seed(0)
number_of_tac = 150
time_samples = 915
z, y, x = 100, 150, 100
voxel_number = x * y * z
# TACs in the right order
tac = np.random.uniform(0, 4, time_samples * number_of_tac).reshape(number_of_tac, time_samples)
arr = np.array([0] * (voxel_number - number_of_tac) + [1] * number_of_tac)
np.random.shuffle(arr)
arr = arr.reshape(z, y, x)
coor = np.where(arr != 0) # non-empty voxel
# Algorithm to replace TAC in 3D space
nnz = np.zeros(arr.shape)
nnz[coor] = 1
tac_4d = np.zeros((x, y, z, time_samples))
tac_4d[np.where(nnz == 1)] = tac
# 3D convolution for all time
# TODO: find a way to make standard deviation change for each voxel
g = gaussian_kernel_3d(1, 1) # 3D kernel of std = 1
v = np.random.uniform(0, 1, x * y * z).reshape(z, y, x) # 3D array of std
cag = nd.convolve(tac_4d, g, mode='constant', cval=0.0) # convolution
Essentially, you have a 4D dataset, shape (nx, ny, nz, nt) that is sparse in (nx, ny, nz) and dense in the nt axis. If (i, j, k) are coordinates of nonzero points in the sparse dimensions, you want to convolve with a Gaussian 3D kernel that has a sigma that depends on (i, j, k).
For example, if there are nonzero points at [1, 2, 5] and [1, 4, 5] with corresponding sigmas 0.1 and 1.0, then the output at coordinates [1, 3, 5] is affected mostly by the [1, 4, 5] point because that one has the largest point spread.
Your question is ambiguous; it could also mean that point [1, 3, 5] has its own associated sigma, for example 0.5, and pulls data from the two adjacent points with equal weight. I will assume the first definition (sigma values associated with input points, not with output points).
Because the operation is not a true convolution, there is no fast FFT-based method that does the whole thing in one pass. Instead, you have to loop over the sigma values. Fortunately, your example has only 150 nonzero points, so the loop is not too expensive.
Here is an implementation. I keep the data in sparse representation as long as possible.
import scipy.signal
import numpy as np
def kernel3d(mm, sigma):
    """Return (mm, mm, mm) shaped, normalized kernel."""
    g1 = scipy.signal.windows.gaussian(mm, std=sigma)
    g3 = g1.reshape(mm, 1, 1) * g1.reshape(1, mm, 1) * g1.reshape(1, 1, mm)
    return g3 * (1/g3.sum())

np.random.seed(1)
s = 2 # scaling factor (original problem: s=10)
nx, ny, nz, nt, nnz = 10*s, 11*s, 12*s, 91*s, 15*s
# select nnz random voxels to fill with time series data
randint = np.random.randint
tseries = {} # key: (i, j, k) tuple; value: time series data, shape (nt,)
for _ in range(nnz):
    while True:
        ijk = (randint(nx), randint(ny), randint(nz))
        if ijk not in tseries:
            tseries[ijk] = np.random.uniform(0, 1, size=nt)
            break
ijks = np.array(list(tseries.keys())) # shape (nnz, 3)
# sigmas: key: (i, j, k) tuple; value: standard deviation
sigmas = { k: np.random.uniform(0, 2) for k in tseries.keys() }
# output will be stored as dense array, padded to avoid edge issues
# with convolution.
m = 5 # padding size
cag_4dp = np.zeros((nx+2*m, ny+2*m, nz+2*m, nt))
mm = 2*m + 1 # kernel width
for (i, j, k), tdata in tseries.items():
    kernel = kernel3d(mm, sigmas[(i, j, k)]).reshape(mm, mm, mm, 1)
    # convolution of one voxel by kernel is trivial.
    # slice4d_c has shape (mm, mm, mm, nt).
    slice4d_c = kernel * tdata
    cag_4dp[i:i+mm, j:j+mm, k:k+mm, :] += slice4d_c
cag_4d = cag_4dp[m:-m, m:-m, m:-m, :]
#%%
import matplotlib.pyplot as plt
plt.close('all')
fig, axs = plt.subplots(2, 2, tight_layout=True)
# find a few planes
#ks = np.where(np.any(cag_4d != 0, axis=(0, 1,3)))[0]
ks = ijks[:4, 2]
for ax, k in zip(axs.ravel(), ks):
    ax.imshow(cag_4d[:, :, k, nt//2].T)
    ax.set_title(f'Voxel [:, :, {k}] at time {nt//2}')
fig.show()
for ijk, sigma in sigmas.items():
    print(f'{ijk}: sigma={sigma:.2f}')
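The two readings of "sigma depends on position" discussed above can be contrasted in a small 1D sketch. This is only an illustration under assumed values: the array length, the point positions 2 and 4, and the helper name `gauss` are hypothetical. "Scatter" attaches sigma to each input point (the definition assumed in this answer), while "gather" attaches sigma to each output point.

```python
import numpy as np

def gauss(offsets, sigma):
    """Normalized 1D Gaussian weights evaluated at the given integer offsets."""
    w = np.exp(-0.5 * (offsets / sigma) ** 2)
    return w / w.sum()

n = 9
idx = np.arange(n)
signal = np.zeros(n)
signal[2], signal[4] = 1.0, 1.0          # two nonzero input points

# Scatter: each INPUT point spreads its value using its own sigma.
sigma_in = {2: 0.1, 4: 1.0}              # hypothetical per-input sigmas
scatter = np.zeros(n)
for i, s in sigma_in.items():
    scatter += signal[i] * gauss(idx - i, s)

# Gather: each OUTPUT point collects its neighbours using its own sigma.
sigma_out = np.full(n, 0.5)              # hypothetical per-output sigmas
gather = np.array([gauss(idx - j, sigma_out[j]) @ signal for j in range(n)])

# Contribution of each input point to output position 3
from_2 = signal[2] * gauss(idx - 2, sigma_in[2])[3]
from_4 = signal[4] * gauss(idx - 4, sigma_in[4])[3]
print("scatter[3] contributions:", from_2, from_4)
print("gather[3]:", gather[3])
```

In the scatter version, position 3 is dominated by the wide point at 4; in the gather version both neighbours contribute equally because only the output point's sigma matters.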
Related
I'm presently trying to run a vectorised batch multivariate sampling operation via Numpy. I have k mean vectors of shape [N,] corresponding to k covariance matrices of dimensions [N, N], and I'm trying to return k draws of shape [N,] from the multivariate normal distributions.
I presently have a loop that does the above,
for batch in range(batch_size):
    c[batch, :] = np.random.multivariate_normal(mean = a[batch, :], cov = b[batch, :, :])
but would like to consolidate the above into a vectorised operation. The issue is that np.random.multivariate_normal can only take a 1-D array as the mean and a 2-D array as the covariance.
I can do batch-sampling via PyTorch's multivariate normal class, but I'm trying to integrate with some pre-existing Numpy code, and I'd prefer to limit the number of conversions happening.
Googling pulled up this question, which could be resolved by melting the mean, but in my case, I'm not using the same covariance matrix and can't go about things exactly the same way.
Thank you very much for your help. I figure there's a good chance that I won't be able to handle batches using the Numpy distribution because of the argument constraints, but wanted to make sure I wasn't missing anything.
I couldn't find a built-in function in numpy, but it can be implemented by hand by performing a Cholesky decomposition of the covariance matrix, Σ = LLᵀ, and then making use of the fact that, given a vector X of i.i.d. standard normal variables, the transformation LX + µ has covariance Σ and mean µ.
This can be implemented using e.g. np.linalg.cholesky() (note that this function supports batch mode!) and np.random.standard_normal():
# cov: (*B, D, D)
# mean: (*B, D)
# result: (*S, *B, D)
L = np.linalg.cholesky(cov)
X = np.random.standard_normal((*S, *B, D, 1))
Y = (L @ X).reshape(*S, *B, D) + mean
Here, packed in a function for easier use:
import numpy as np
def sample_batch_mvn(
        mean: np.ndarray,
        cov: np.ndarray,
        size: "tuple | int" = (),
) -> np.ndarray:
    """
    Batch sample multivariate normal distribution.

    Arguments:
        mean: expected values of shape (…M, D)
        cov: covariance matrices of shape (…M, D, D)
        size: additional batch shape (…B)

    Returns: samples from the multivariate normal distributions
             shape: (…B, …M, D)

    It is not required that ``mean`` and ``cov`` have the same shape
    prefix, only that they are broadcastable against each other.
    """
    mean = np.asarray(mean)
    cov = np.asarray(cov)
    size = (size, ) if isinstance(size, int) else tuple(size)
    shape = size + np.broadcast_shapes(mean.shape, cov.shape[:-1])
    X = np.random.standard_normal((*shape, 1))
    L = np.linalg.cholesky(cov)
    return (L @ X).reshape(shape) + mean

Now in order to test this function, we first need a good batch of covariance matrices. We'll generate a couple to test the sampling performance a bit:
# Generate a batch of N D-dimensional covariance matrices C:
N = 5000
D = 2
L = np.zeros((N, D, D))
L[(..., *np.tril_indices(D))] = \
    np.random.normal(size=(N, D * (D + 1) // 2))
cov = L @ np.swapaxes(L, -1, -2)
The method used to generate the covariance matrices here in fact works by sampling the Cholesky factors L. With prior knowledge of these factors, we of course wouldn't need to compute the Cholesky decomposition in the sampling function. However, to test the general applicability of the function, we will forget about them and just pass the covariance matrices C:
mean = np.zeros(2)
samples = sample_batch_mvn(mean, cov, 1000)
print(samples.shape) # (1000, 5000, 2)
Sampling these 5 million 2D vectors takes about 0.4s on my PC.
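As a sanity check on the Cholesky approach, the empirical mean and covariance of the samples can be compared with the inputs. The following is a minimal sketch, assuming a hypothetical batch of two hand-picked 2D covariance matrices and means; it inlines the batched `L @ X + mean` step rather than relying on the function above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of two covariance matrices and two mean vectors
cov = np.array([[[2.0, 0.6], [0.6, 1.0]],
                [[1.0, -0.3], [-0.3, 0.5]]])      # shape (B, D, D)
mean = np.array([[0.0, 1.0], [3.0, -2.0]])        # shape (B, D)

L = np.linalg.cholesky(cov)                       # batched: shape (B, D, D)
X = rng.standard_normal((100_000, 2, 2, 1))       # shape (S, B, D, 1)
Y = (L @ X)[..., 0] + mean                        # shape (S, B, D)

for b in range(cov.shape[0]):
    emp_mean = Y[:, b].mean(axis=0)
    emp_cov = np.cov(Y[:, b], rowvar=False)
    print(f"batch {b}: mean ~ {np.round(emp_mean, 2)}")
    print(np.round(emp_cov, 2))
```

With this many samples, the printed estimates should agree with `mean` and `cov` to about two decimal places.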
And, as almost always, a considerable amount of effort goes into plotting (here showing some samples for the first 9 of the 5000 covariance matrices):
import scipy.stats as stats
import matplotlib.pyplot as plt
fig, axs = plt.subplots(3, 3, figsize=(9, 9))
for ax, i in zip(axs.ravel(), range(5000)):
    cc = cov[i]
    xsamples = samples[:100, i, 0]
    ysamples = samples[:100, i, 1]
    xmin = xsamples.min()
    xmax = xsamples.max()
    ymin = ysamples.min()
    ymax = ysamples.max()
    xpad = (xmax - xmin) * 0.05
    ypad = (ymax - ymin) * 0.05
    xlim = (xmin - xpad, xmax + xpad)
    ylim = (ymin - ypad, ymax + ypad)
    xs = np.linspace(*xlim, num=51)
    ys = np.linspace(*ylim, num=51)
    xy = np.dstack(np.meshgrid(xs, ys))
    pdf = stats.multivariate_normal.pdf(xy, mean, cc)
    ax.contourf(xs, ys, pdf, 33, cmap='YlGnBu')
    ax.plot(xsamples, ysamples, 'r.', alpha=.6,
            markeredgecolor='k', markeredgewidth=0.5)
    ax.set_xlim(*xlim)
    ax.set_ylim(*ylim)
plt.show()
Some inspiration for this:
Some notes on sampling from a multivariate normal
Pinheiro and Bates, 1996, Unconstrained Parameterizations for Variance-Covariance Matrices
I have a specific python issue, that desperately needs to be sped up by avoiding the use of a loop, yet, I am at a loss as to how to do this. I need to read in a fits image, convert this to a numpy array (roughly, 2000 x 2000 elements in size), then for each element compute the statistics of a ring of elements around it.
As I have my code now, the statistics of the ring around the element is computed with a function using masks. This is fast but, of course, I call this function 2000x2000 times (the slow part).
I am relatively new to python. I think that using the mask function is clever, but I cannot find a way around individually addressing each element. Best of thanks for any help you can provide.
import numpy as np
from astropy.io import fits

# First, the function computing the statistics within a ring
# around the central pixel:
#   flux = image intensity at pixel (i, j)
#   rad1, rad2 = inner and outer radii
#   array = image array
def snr(flux, i, j, rad1, rad2, array):
    a, b = i, j
    nx, ny = array.shape
    y, x = np.ogrid[-a:nx-a, -b:ny-b]
    mask = (x*x + y*y >= rad1*rad1) & (x*x + y*y <= rad2*rad2)
    Nmask = np.count_nonzero(mask)
    noise = 0.6052697 * abs(Nmask * flux - sum(array[mask]))
    return noise

# Now, the call to snr for each pixel in the array data1:
frame1 = fits.open(in_frame, mode='readonly')  # read in fits file
data1 = frame1[ext].data                       # convert to np array
ny, nx = data1.shape                           # array dimensions
noise1 = np.zeros((ny, nx), float)             # empty array
r1 = 5                                         # inner radius (pixels)
r2 = 7                                         # outer radius (pixels)
# The function is fast, but calling it 2k x 2k times is not:
for j in range(ny):        # row index (first array axis)
    for i in range(nx):    # column index (second array axis)
        noise1[j, i] = snr(data1[j, i], j, i, r1, r2, data1)
The operation that you are trying to do can be expressed as an image convolution. Try something like this:
import numpy as np
import scipy.ndimage
from astropy.io import fits
def make_kernel(inner_radius, outer_radius):
    if inner_radius > outer_radius:
        raise ValueError
    x, y = np.ogrid[-outer_radius:outer_radius + 1, -outer_radius:outer_radius + 1]
    r2 = x * x + y * y
    kernel = (r2 >= inner_radius * inner_radius) & (r2 <= outer_radius * outer_radius)
    return kernel

in_frame = '<file path>'
ext = '...'
frame1 = fits.open(in_frame, mode='readonly')
data1 = frame1[ext].data
inner_radius = 5
outer_radius = 7
kernel = make_kernel(inner_radius, outer_radius)
n_kernel = np.count_nonzero(kernel)
conv = scipy.ndimage.convolve(data1, kernel, mode='constant')
noise1 = 0.6052697 * np.abs(n_kernel * data1 - conv)
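To check that the convolution really reproduces the per-pixel mask sum, the two can be compared on a small random image. This is a sketch under assumptions: the image size, the radii 2 and 3, and the helper names `ring_kernel` and `brute_force_ring_sum` are made up for the test. It also shows one difference worth knowing: the masked version counts only ring pixels that fall inside the image, whereas a constant `n_kernel` does not, so results differ near the borders unless the count is also obtained by convolution.

```python
import numpy as np
import scipy.ndimage

def ring_kernel(r_in, r_out):
    """Annulus of ones with inner radius r_in and outer radius r_out (pixels)."""
    y, x = np.ogrid[-r_out:r_out + 1, -r_out:r_out + 1]
    r2 = x * x + y * y
    return ((r2 >= r_in * r_in) & (r2 <= r_out * r_out)).astype(float)

def brute_force_ring_sum(image, i, j, r_in, r_out):
    """Sum of the in-image ring pixels around (i, j), using a full-size mask."""
    ny, nx = image.shape
    y, x = np.ogrid[-i:ny - i, -j:nx - j]
    r2 = x * x + y * y
    mask = (r2 >= r_in * r_in) & (r2 <= r_out * r_out)
    return image[mask].sum()

rng = np.random.default_rng(0)
image = rng.uniform(size=(20, 25))        # hypothetical stand-in for the FITS data
kernel = ring_kernel(2, 3)

# Ring sum at every pixel; pixels outside the image count as zero.
ring_sum = scipy.ndimage.convolve(image, kernel, mode='constant', cval=0.0)
# Number of ring pixels that actually lie inside the image, per pixel.
counts = scipy.ndimage.convolve(np.ones_like(image), kernel,
                                mode='constant', cval=0.0)

noise = 0.6052697 * np.abs(counts * image - ring_sum)
print(ring_sum[10, 12], brute_force_ring_sum(image, 10, 12, 2, 3))
```

Because the ring kernel is symmetric, convolution and correlation give the same result here.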
I'm attempting to implement the frequency domain ReLu as detailed in: http://cs231n.stanford.edu/reports/2015/pdfs/tema8_final.pdf
The formula that is confusing me is on the bottom left of page 4. I am not confident that I am computing the sum of the FFT of the dirac function properly. Am I interpreting this formula incorrectly?
import numpy as np
import matplotlib.pyplot as plt
img = np.array([[-1.0,2.3],[5,7.8]])
# Dirac is essentially the shifting matrix
# So create the 2d shifting matrix values
N = 2
x = np.arange(0, N, 1)
y = np.arange(0, N, 1)
xm, ym = np.meshgrid(x, y)
shiftMat = np.exp(1j * ((2.0 * np.pi)) * (xm + ym))
# Set equal to shift mat
# In this trivial example I know that [0,0] is only negative position
# So set to 0 and compute sum of all positions in which f(x) > 0 as detailed in paper
freqRelu = shiftMat
freqRelu[0,0] = 0
freqRelu = np.sum(freqRelu)
# Fourier Convolution and IFFT
imgFFT = np.fft.fft2(img)
freqR = np.multiply(imgFFT,freqRelu)
reluedFreq = np.real(np.fft.ifft2(freqR))
# Spatial ReLU for comparison
reluedImg = img
reluedImg[0,0] = 0
plt.subplot(121)
plt.imshow(reluedImg)
plt.subplot(122)
plt.imshow(reluedFreq)
plt.show()
print(np.allclose(reluedFreq,reluedImg))
print(reluedFreq)
print(reluedImg)
For reference this question was answered on signal stack exchange here: https://dsp.stackexchange.com/questions/49023/sum-of-diracs-in-frequency-domain
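Independently of the formula in the paper, one general DFT identity may help when debugging this: multiplying two arrays element-wise in the spatial domain corresponds to a circular convolution of their FFTs, divided by the number of elements. Since a ReLU is an element-wise product with the mask `img > 0`, it can be checked as below. This is only a sketch of that identity, not necessarily the construction used in the paper or in the linked answer; `circular_convolve2d` is a hypothetical brute-force helper.

```python
import numpy as np

def circular_convolve2d(a, b):
    """Brute-force 2D circular convolution (fine for tiny arrays)."""
    n0, n1 = a.shape
    out = np.zeros((n0, n1), dtype=complex)
    for k0 in range(n0):
        for k1 in range(n1):
            for m0 in range(n0):
                for m1 in range(n1):
                    out[k0, k1] += a[m0, m1] * b[(k0 - m0) % n0, (k1 - m1) % n1]
    return out

img = np.array([[-1.0, 2.3], [5.0, 7.8]])
mask = (img > 0).astype(float)            # 1 where the ReLU passes the value

relu_spatial = img * mask                 # ordinary ReLU

# Same operation carried out on the spectra:
# FFT(img * mask) == circular_conv(FFT(img), FFT(mask)) / img.size
relu_spectrum = circular_convolve2d(np.fft.fft2(img), np.fft.fft2(mask)) / img.size
relu_from_freq = np.real(np.fft.ifft2(relu_spectrum))

print(np.allclose(relu_from_freq, relu_spatial))
```

Note that a single scalar multiplier applied to `imgFFT`, as in the code above, can only rescale the image; it cannot zero out individual pixels.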
I wonder whether it is possible to specify the shift, expressed by the variable k, for the cross-correlation of two 1D arrays. With the numpy.correlate function and its mode parameter set to 'full', I get cross-correlation coefficients for every shift k over the whole length of the arrays (assuming both arrays are the same size). Let me show what I mean with the example below:
import numpy as np
# Define signal 1.
signal_1 = np.array([1, 2 ,3])
# Define signal 2.
signal_2 = np.array([1, 2, 3])
# Other definitions.
Xi = signal_1
Yi = signal_2
N = np.size(Xi)
k = 3
Xs = np.average(Xi)
Ys = np.average(Yi)
# Cross-covariance coefficient function.
def crossCovariance(Xi, Yi, N, k, Xs, Ys, forCorrelation = False):
    autoCov = 0
    for i in np.arange(0, N-k):
        autoCov += ((Xi[i+k])-Xs)*(Yi[i]-Ys)
    if forCorrelation == True:
        return autoCov/N
    else:
        return (1/(N-1))*autoCov

# Expected value function.
def E(X, P):
    expectedValue = 0
    for i in np.arange(0, np.size(X)):
        expectedValue += X[i] * (P[i] / np.size(X))
    return expectedValue

# Cross-correlation coefficient function.
def crossCorrelation(Xi, Yi, k):
    # Calculate the covariance coefficient.
    cov = crossCovariance(Xi, Yi, N, k, Xs, Ys, forCorrelation = True)
    # Calculate standard deviations.
    EX = E(Xi, np.ones(np.size(Xi)))
    SDX = (E((Xi - EX) ** 2, np.ones(np.size(Xi)))) ** (1/2)
    EY = E(Yi, np.ones(np.size(Yi)))
    SDY = (E((Yi - EY) ** 2, np.ones(np.size(Yi)))) ** (1/2)
    # Calculate correlation coefficient.
    return cov / (SDX * SDY)

# Express cross-covariance or cross-correlation function in a form of a 1D vector.
def array(k, norm = True):
    # If norm = True, return array of autocorrelation coefficients.
    # If norm = False, return array of autocovariance coefficients.
    vector = np.array([])
    shifts = np.abs(np.arange(-k, k+1, 1))
    for i in shifts:
        if norm == True:
            vector = np.append(crossCorrelation(Xi, Yi, i), vector)
        else:
            vector = np.append(crossCovariance(Xi, Yi, N, i, Xs, Ys), vector)
    return vector

In my example, calling array(k, norm = True) for different values of k gives the results shown below:
k = 3, [ 0. -0.5 0. 1. 0. -0.5 0. ]
k = 2, [-0.5 0. 1. 0. -0.5]
k = 1, [ 0. 1. 0.]
k = 0, [ 1.]
My approach is fine for learning purposes, but I need to move to native numpy functions to speed up my analysis. How can one specify the shift k when using the native numpy.correlate function? PS: the k parameter specifies the "time" shift between the two arrays. Thank you in advance.
Whilst I'm not aware of any built-in function for computing the cross-correlation for a particular range of signal lags, you can speed your version up a lot by vectorization, i.e. performing operations on arrays rather than single elements in an array.
This version uses only a single Python loop over the lags:
import numpy as np
def xcorr(x, y, k, normalize=True):
    n = x.shape[0]
    # initialize the output array
    out = np.empty((2 * k) + 1, dtype=np.double)
    lags = np.arange(-k, k + 1)
    # pre-compute E(x), E(y)
    mu_x = x.mean()
    mu_y = y.mean()
    # loop over lags
    for ii, lag in enumerate(lags):
        # use slice indexing to get 'shifted' views of the two input signals
        if lag < 0:
            xi = x[:lag]
            yi = y[-lag:]
        elif lag > 0:
            xi = x[:-lag]
            yi = y[lag:]
        else:
            xi = x
            yi = y
        # x - mu_x; y - mu_y
        xdiff = xi - mu_x
        ydiff = yi - mu_y
        # E[(x - mu_x) * (y - mu_y)]
        out[ii] = xdiff.dot(ydiff) / n
        # NB: xdiff.dot(ydiff) == (xdiff * ydiff).sum()
    if normalize:
        # E[(x - mu_x) * (y - mu_y)] / (sigma_x * sigma_y)
        out /= np.std(x) * np.std(y)
    return lags, out

Some more general points of advice:
As I mentioned in the comments, you should try to give your functions names that are informative, and that aren't likely to conflict with other things in your namespace (e.g. array vs np.array).
It's much better to make your functions self-contained. In your version, N, k, Xs and Ys are defined outside the main function. In this situation you might accidentally modify or overwrite one of these variables, and it can get tricky to debug errors caused by this sort of thing.
Appending to numpy arrays (e.g. using np.append or np.concatenate) is slow, so avoid it whenever you can. If, as in this case, you know the size of the output ahead of time, it's much faster to pre-allocate the output array (e.g. using np.empty or np.zeros), then fill in the elements. If you absolutely have to do concatenation, it's often faster to append to a normal Python list, then convert it to a numpy array at the end.
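If computing every lag is affordable, another option is to call np.correlate once in 'full' mode and slice out the lags from -k to k. The sketch below assumes equal-length inputs and 0 <= k < n; the function name `xcorr_full_slice` is made up here, and the normalization (divide by n, then by the two standard deviations) mirrors the loop version above rather than any built-in convention.

```python
import numpy as np

def xcorr_full_slice(x, y, k, normalize=True):
    """Cross-correlation for lags -k..k via one np.correlate call."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    if y.shape[0] != n:
        raise ValueError("x and y must have the same length")
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < len(x)")
    xd = x - x.mean()
    yd = y - y.mean()
    # In 'full' mode the output has 2n - 1 entries; index n - 1 is lag 0.
    full = np.correlate(yd, xd, mode='full')
    out = full[n - 1 - k : n + k] / n
    if normalize:
        out = out / (x.std() * y.std())
    return np.arange(-k, k + 1), out

lags, coeffs = xcorr_full_slice([1, 2, 3], [1, 2, 3], k=2)
print(lags)
print(coeffs)
```

This still computes all 2n - 1 lags internally, so for very long signals and small k the explicit loop over lags may be cheaper.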
It's available by specifying maxlags:
import matplotlib.pyplot as plt
xcorr = plt.xcorr(signal_1, signal_2, maxlags=1)
Documentation can be found here. This implementation is based on np.correlate.
I can generate Gaussian data with random.gauss(mu, sigma) function, but how can I generate 2D gaussian? Is there any function like that?
If you can use numpy, there is numpy.random.multivariate_normal(mean, cov[, size]).
For example, to get 10,000 2D samples:
np.random.multivariate_normal(mean, cov, 10000)
where mean.shape==(2,) and cov.shape==(2,2).
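A self-contained version of that call might look as follows. The mean and covariance values are arbitrary placeholders, and the sketch uses the newer `np.random.default_rng` generator interface instead of the legacy module-level function.

```python
import numpy as np

rng = np.random.default_rng(42)

mean = np.array([1.0, -2.0])              # shape (2,)
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])              # shape (2, 2), symmetric positive definite

samples = rng.multivariate_normal(mean, cov, size=10_000)

print(samples.shape)                      # one row per sample, one column per dimension
print(samples.mean(axis=0))               # should be close to mean
print(np.cov(samples, rowvar=False))      # should be close to cov
```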
I'd like to add an approximation using exponential functions. This directly generates a 2d matrix which contains a movable, symmetric 2d gaussian.
I should note that I found this code on the scipy mailing list archives and modified it a little.
import numpy as np
def makeGaussian(size, fwhm = 3, center=None):
    """ Make a square gaussian kernel.

    size is the length of a side of the square
    fwhm is full-width-half-maximum, which
    can be thought of as an effective radius.
    """
    x = np.arange(0, size, 1, float)
    y = x[:,np.newaxis]
    if center is None:
        x0 = y0 = size // 2
    else:
        x0 = center[0]
        y0 = center[1]
    return np.exp(-4*np.log(2) * ((x-x0)**2 + (y-y0)**2) / fwhm**2)
For reference and enhancements, it is hosted as a gist here. Pull requests welcome!
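If you think in terms of a standard deviation rather than a full width at half maximum, the two are related by fwhm = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma). The short check below, with a made-up helper `fwhm_to_sigma`, confirms that the exponent used above matches the usual exp(-r^2 / (2 sigma^2)) form.

```python
import numpy as np

def fwhm_to_sigma(fwhm):
    """Convert full width at half maximum to standard deviation."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

fwhm = 3.0
sigma = fwhm_to_sigma(fwhm)

r = np.linspace(0.0, 5.0, 11)             # radial distance from the center
g_fwhm = np.exp(-4.0 * np.log(2.0) * r**2 / fwhm**2)
g_sigma = np.exp(-r**2 / (2.0 * sigma**2))

half_max = np.exp(-4.0 * np.log(2.0) * (fwhm / 2.0)**2 / fwhm**2)
print(np.allclose(g_fwhm, g_sigma), half_max)
```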
Since the standard 2D Gaussian distribution is just the product of two 1D Gaussian distributions, if there is no correlation between the two axes (i.e. the covariance matrix is diagonal), just call random.gauss twice.
import random

def gauss_2d(mu, sigma):
    x = random.gauss(mu, sigma)
    y = random.gauss(mu, sigma)
    return (x, y)
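When the two axes are correlated, the same standard-library approach still works if the second coordinate mixes in the first draw. The sketch below is an assumption-laden extension, not part of the original answer: the function name `gauss_2d_correlated` and its parameters are hypothetical, and `rho` is the desired correlation coefficient.

```python
import math
import random

def gauss_2d_correlated(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Draw one (x, y) pair from a 2D Gaussian with correlation rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    z1 = random.gauss(0.0, 1.0)           # independent standard normals
    z2 = random.gauss(0.0, 1.0)
    x = mu_x + sigma_x * z1
    # Mixing z1 into y produces the requested correlation with x.
    y = mu_y + sigma_y * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return (x, y)

random.seed(0)
points = [gauss_2d_correlated(0.0, 0.0, 2.0, 1.0, 0.7) for _ in range(20_000)]
```

This is the 2x2 special case of multiplying standard normal draws by a Cholesky factor of the covariance matrix.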
import numpy as np
# define normalized 2D gaussian
def gaus2d(x=0, y=0, mx=0, my=0, sx=1, sy=1):
    return 1. / (2. * np.pi * sx * sy) * np.exp(-((x - mx)**2. / (2. * sx**2.) + (y - my)**2. / (2. * sy**2.)))

x = np.linspace(-5, 5)
y = np.linspace(-5, 5)
x, y = np.meshgrid(x, y) # get 2D variables instead of 1D
z = gaus2d(x, y)
Straightforward implementation and example of the 2D Gaussian function. Here sx and sy are the spreads in x and y direction, mx and my are the center coordinates.
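One way to confirm that the prefactor 1 / (2 pi sx sy) really normalizes the function is to integrate it numerically over a grid that comfortably covers the bulk of the distribution. The sketch below restates the function so it runs on its own; the grid extent, resolution, and the spreads 1.5 and 0.7 are arbitrary choices.

```python
import numpy as np

def gaus2d(x, y, mx=0.0, my=0.0, sx=1.0, sy=1.0):
    """Normalized, axis-aligned 2D Gaussian density."""
    return 1.0 / (2.0 * np.pi * sx * sy) * np.exp(
        -((x - mx) ** 2 / (2.0 * sx ** 2) + (y - my) ** 2 / (2.0 * sy ** 2)))

xs = np.linspace(-8.0, 8.0, 401)
ys = np.linspace(-8.0, 8.0, 401)
dx = xs[1] - xs[0]
dy = ys[1] - ys[0]
X, Y = np.meshgrid(xs, ys)

Z = gaus2d(X, Y, sx=1.5, sy=0.7)
integral = Z.sum() * dx * dy              # simple Riemann sum over the grid
print(integral)                           # should be very close to 1
```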
Numpy has a function to do this. It is documented here. In addition to the method proposed above, it allows drawing samples with an arbitrary covariance.
Here is a small example, assuming ipython -pylab is started:
samples = multivariate_normal([-0.5, -0.5], [[1, 0],[0, 1]], 1000)
plot(samples[:, 0], samples[:, 1], '.')
samples = multivariate_normal([0.5, 0.5], [[0.1, 0.5],[0.5, 0.6]], 1000)
plot(samples[:, 0], samples[:, 1], '.')
In case someone finds this thread and is looking for something a little more versatile (like I did), I have modified the code from @giessel. The code below allows for asymmetry and rotation.
import numpy as np
def makeGaussian2(x_center=0, y_center=0, theta=0, sigma_x = 10, sigma_y=10, x_size=640, y_size=480):
    # x_center and y_center will be the center of the gaussian, theta will be the rotation angle
    # sigma_x and sigma_y will be the stdevs in the x and y axis before rotation
    # x_size and y_size give the size of the frame
    theta = 2*np.pi*theta/360
    x = np.arange(0,x_size, 1, float)
    y = np.arange(0,y_size, 1, float)
    y = y[:,np.newaxis]
    sx = sigma_x
    sy = sigma_y
    x0 = x_center
    y0 = y_center
    # rotation
    a=np.cos(theta)*x -np.sin(theta)*y
    b=np.sin(theta)*x +np.cos(theta)*y
    a0=np.cos(theta)*x0 -np.sin(theta)*y0
    b0=np.sin(theta)*x0 +np.cos(theta)*y0
    return np.exp(-(((a-a0)**2)/(2*(sx**2)) + ((b-b0)**2) /(2*(sy**2))))
We can try just using the numpy method np.random.normal to generate a 2D gaussian distribution.
The sample code is np.random.normal(mean, sigma, (num_samples, 2)).
A sample run by taking mean = 0 and sigma 20 is shown below :
np.random.normal(0, 20, (10,2))
>>array([[ 11.62158316, 3.30702215],
[-18.49936277, -11.23592946],
[ -7.54555371, 14.42238838],
[-14.61531423, -9.2881661 ],
[-30.36890026, -6.2562164 ],
[-27.77763286, -23.56723819],
[-18.18876597, 41.83504042],
[-23.62068377, 21.10615509],
[ 15.48830184, -15.42140269],
[ 19.91510876, 26.88563983]])
Hence we got 10 samples in a 2d array with mean = 0 and sigma = 20
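Note that this draws the two columns independently, so it only covers the uncorrelated case. Within that limit, the mean and sigma can still differ per axis, because `loc` and `scale` broadcast against the requested shape. A small sketch with placeholder values, using the `np.random.default_rng` generator interface:

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-axis mean and sigma; both broadcast across the (num_samples, 2) shape.
samples = rng.normal(loc=[0.0, 5.0], scale=[20.0, 1.0], size=(10_000, 2))

print(samples.shape)
print(samples.mean(axis=0), samples.std(axis=0))
print(np.corrcoef(samples, rowvar=False)[0, 1])   # near 0: the axes are independent
```

For correlated axes, a full covariance matrix and np.random.multivariate_normal are needed instead.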