I have two lists (of different lengths) of numbers.
Using Python, I want to calculate histograms with, say, 10 bins.
Then I want to smooth these two histograms with a standard kernel (a Gaussian kernel with mean = 0, sigma = 1).
Then I want to calculate the KL distance between these 2 smoothed histograms.
I found some code for histogram calculation, but I am not sure how to apply the standard kernel for smoothing and then how to calculate the KL distance.
Please help.
For calculating histograms you can use numpy.histogram(), and for Gaussian smoothing scipy.ndimage.gaussian_filter() (scipy.ndimage.filters is a deprecated alias for the same function). Kullback-Leibler divergence code can be found here.
A method to do the required calculation would look something like this:
import numpy as np
from scipy.ndimage import gaussian_filter

def kl(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # sum p * log(p / q) over the bins where p is non-zero
    return np.sum(np.where(p != 0, p * np.log(p / q), 0))

def smoothed_hist_kl_distance(a, b, nbins=10, sigma=1):
    ahist, bhist = (np.histogram(a, bins=nbins)[0],
                    np.histogram(b, bins=nbins)[0])
    asmooth, bsmooth = (gaussian_filter(ahist, sigma),
                        gaussian_filter(bhist, sigma))
    return kl(asmooth, bsmooth)
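A minimal usage sketch, assuming two arbitrary samples of different lengths:
a = np.random.normal(0.0, 1.0, 400)
b = np.random.normal(0.5, 1.5, 700)
print(smoothed_hist_kl_distance(a, b, nbins=10, sigma=1))
Note that np.histogram returns raw counts here; if you want a KL divergence between probability distributions, normalize the smoothed histograms so each sums to 1 before calling kl().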
I would like to smooth time series data. For this I would like to use Python.
Now I have already found the function scipy.ndimage.gaussian_filter1d.
For this, the array and a sigma value must be passed.
Now to my question:
Is the sigma value equal to the filter length?
I would like to run a filter of length 365 over the data.
Would it then be the correct procedure to set this sigma value to 365 or am I confusing things?
sigma defines how your Gaussian filter is spread around its mean. You can create a Gaussian filter with a specific size like below.
import numpy as np
import matplotlib.pyplot as plt

sigma1 = 3
sigma2 = 50

def gaussian_filter1d(size, sigma):
    filter_range = np.linspace(-int(size/2), int(size/2), size)
    gaussian_filter = [1 / (sigma * np.sqrt(2*np.pi)) * np.exp(-x**2/(2*sigma**2)) for x in filter_range]
    return gaussian_filter

fig, ax = plt.subplots(1, 2)
ax[0].plot(gaussian_filter1d(size=365, sigma=sigma1))
ax[0].set_title(f'sigma= {sigma1}')
ax[1].plot(gaussian_filter1d(size=365, sigma=sigma2))
ax[1].set_title(f'sigma= {sigma2}')
plt.show()
Here is the effect of sigma on the Gaussian filter.
Later, you might convolve your signal with your Gaussian filter.
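A minimal sketch of that step, assuming your time series is a 1-D array named data (a placeholder name) and reusing the gaussian_filter1d defined above:
kernel = np.array(gaussian_filter1d(size=365, sigma=50))
kernel = kernel / kernel.sum()                  # normalize so the smoothed series keeps the original scale
smoothed = np.convolve(data, kernel, mode='same')
Alternatively, scipy.ndimage.gaussian_filter1d(data, sigma) does the smoothing in one call; its truncate parameter (default 4.0) controls the kernel length in units of sigma, which is exactly the sigma-versus-length distinction asked about.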
I would like to compute the RMS amplitude of a Gaussian white noise signal.
import matplotlib.pyplot as plt
import numpy as np

mean = 0
std = 1.0
t = 100

def zv(t):
    return np.random.normal(mean, std, size = t)

def rms(x):
    return np.sqrt(np.mean(zv(x)**2))

plt.plot(zv(t))
plt.plot(rms(t))
The plot of zv(t) works - but I don't know why the plot of rms(t) is just empty.
Do you have some comments?
Best Regards
zv(t) returns a one-dimensional array of size t. As a result, when you take the mean, it is a single value. You can verify this by printing out the value of rms(t). If you want to create a plot along t for rms, you will need to generate multiple Monte Carlo samples. For example,
def zv(t):
    n = 1000
    return np.random.normal(mean, std, size = (n, t))

def rms(x):
    return np.sqrt(np.mean(zv(x)**2, axis = 0))
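With that change, each element of rms(t) is an RMS estimate averaged over the 1000 samples, so the plot is a curve rather than a single point. A sketch, reusing mean, std and t from the question:
plt.plot(rms(t))                               # one RMS estimate per time step
plt.axhline(std, color='k', linestyle='--')    # for zero-mean noise the expected RMS equals std
plt.show()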
I am trying to solve a physics equation using a Monte Carlo simulation, which I know is a very long way of doing it (I just need to use it to learn about it).
I have around 5 values, one of which is time, and I have the random uncertainties (errors) for each of these values. So, for example, the mass is (10 ± 0.1) kg, where the error is 0.1 kg.
How do I actually find the distribution of measurements if I performed this experiment 5,000 times, for example?
I know I could make 2 arrays of errors, and maybe put them in a function. But what am I supposed to do then? Do I put the errors in the equation, add the answer to the arrays, then put the changed array values in the equation and repeat this a thousand times? Or do I actually calculate the real value and add it to the array?
Please can you help me understand this.
Edit:
The problem I have is basically a sphere of density ds falling a distance l in time t through a liquid of density dl; this goes into an equation for viscosity, and I need to find the distribution of viscosity measurements.
The equation shouldn't matter at all; whatever equation I have, I should be able to use a method like this to find the distribution of measurements, whether I'm dropping a ball out of a window or whatever.
Basic Monte Carlo is very straightforward. The following might get you started:
import random, statistics, math

# The following function generates a
# random observation of f(x) where
# x is a vector of independent normal variables
# whose means are given by the vector mus
# and whose standard deviations are given by sigmas
def sample(f, mus, sigmas):
    x = (random.gauss(m, s) for m, s in zip(mus, sigmas))
    return f(*x)

# do n times, returning the sample mean and standard deviation:
def monte_carlo(f, mus, sigmas, n):
    samples = [sample(f, mus, sigmas) for _ in range(n)]
    return (statistics.mean(samples), statistics.stdev(samples))

# for testing purposes:
def V(r, h):
    return math.pi*r**2*h

print(monte_carlo(V, (2, 4), (0.02, 0.01), 1000))
With output:
(50.2497301631037, 1.0215188736786902)
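For the falling-sphere problem in the edit, you would pass your own function of the measured quantities to monte_carlo. Here is a sketch assuming the standard Stokes falling-ball relation eta = 2*g*r**2*(ds - dl)*t / (9*l) applies to your setup; the variable names and the nominal values/errors below are made up purely for illustration:

# viscosity from sphere radius r, densities ds and dl, fall distance l and time t
# (Stokes' law assumed; replace with whatever equation your experiment actually uses)
def viscosity(r, ds, dl, l, t):
    g = 9.81
    return 2 * g * r**2 * (ds - dl) * t / (9 * l)

# (mean, error) for each of r, ds, dl, l, t -- illustrative numbers only
mus    = (0.01, 7800.0, 900.0, 0.50, 2.0)
sigmas = (0.0001, 10.0, 5.0, 0.005, 0.1)
print(monte_carlo(viscosity, mus, sigmas, 5000))

If you want the full distribution rather than just its mean and standard deviation, keep the samples list inside monte_carlo and histogram it, as the next answer does.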
OK, let's try a simple example - you have an air gun which shoots balls with mass m and velocity v, and you have to measure the kinetic energy
E = m*v**2 / 2
There is a distribution of velocity - Gaussian with a mean value of 10 and a std deviation of 1.
There is a distribution of masses as well - but we cannot use a Gaussian there; let's assume it is a truncated normal with a low limit of 1, so that there are no negative values, with loc equal to 5 and scale equal to 3.
So what we will do is: sample a velocity, sample a mass, use them to find the kinetic energy, do it multiple times, build the energy distribution, get its mean value and std deviation, draw graphs, etc.
Some simple Python code
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import truncnorm

def sampleMass(n, low, high, mu, sigma):
    """
    Sample n mass values from truncated normal
    """
    tn = truncnorm(low, high, loc=mu, scale=sigma)
    return tn.rvs(n)

def sampleVelocity(n, mu, sigma):
    return np.random.normal(loc = mu, scale = sigma, size=n)

mass_low = 1.
mass_high = 1000.
mass_mu = 5.
mass_sigma = 3.0

vel_mu = 10.0
vel_sigma = 1.0

nof_trials = 100000

mass = sampleMass(nof_trials, mass_low, mass_high, mass_mu, mass_sigma) # get samples of mass
vel = sampleVelocity(nof_trials, vel_mu, vel_sigma) # get samples of velocity

kinenergy = 0.5 * mass * vel*vel # distribution of kinetic energy

print("Mean value and stddev of the final distribution")
print(np.mean(kinenergy))
print(np.std(kinenergy))

print("Min/max values of the final distribution")
print(np.min(kinenergy))
print(np.max(kinenergy))

# print histogram of the distribution
n, bins, patches = plt.hist(kinenergy, 100, density=True, facecolor='green', alpha=0.75)

plt.xlabel('Energy')
plt.ylabel('Probability')
plt.title('Kinetic energy distribution')
plt.grid(True)
plt.show()
with output like
Mean value and stddev of the final distribution
483.8162951263243
118.34049421853899
Min/max values of the final distribution
128.86671038372
1391.400187563612
Are there any functions or libraries implemented in Python for computing the image skeleton (skeletonization) using the chamfer distance transform?
the following link is an example of chamfer distance transform:
http://www.inf.u-szeged.hu/~palagyi/skel/chamfer34.gif
Thank you
Your question was not clearly phrased. Chamfer distance is the distance between two curves or two binary images.
Say you have two curves, Curve A and Curve B.
The simplest way to calculate the chamfer distance is to convert curve A into a distance transform image, then look up those distances at the points of curve B; that gives, for each point of B, the distance to its nearest point on A.
In other words, the sum of closest point distances between both curves or binary images.
Sample Code
import numpy as np
import cv2

def chamfer(p_a, p_b, image_shape):
    """
    One-sided chamfer distance between two curves.

    p_a         - n x 2 numpy array of (x, y) points
    p_b         - n x 2 numpy array of (x, y) points
    image_shape - (h, w) tuple
    """
    # distance transform of an image that is zero exactly on curve A
    mask = np.ones(image_shape[:2], dtype=np.uint8) * 255
    mask[p_a[:, 1].astype(int), p_a[:, 0].astype(int)] = 0
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 3, dstType=cv2.CV_32F)
    # sum of distances from each point of curve B to its nearest point on curve A
    return dist[p_b[:, 1].astype(int), p_b[:, 0].astype(int)].sum()

# symmetric chamfer distance between your two point sets
chamfer_dist = 0.5 * (chamfer(p_a, p_b, image_shape) + chamfer(p_b, p_a, image_shape))
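A quick way to sanity-check it, with two made-up point sets (points sampled on a circle and the same circle shifted a few pixels):
theta = np.linspace(0, 2 * np.pi, 200)
p_a = np.stack([100 + 50 * np.cos(theta), 100 + 50 * np.sin(theta)], axis=1)
p_b = p_a + np.array([5.0, 0.0])   # the same curve shifted 5 pixels to the right
image_shape = (256, 256)
print(0.5 * (chamfer(p_a, p_b, image_shape) + chamfer(p_b, p_a, image_shape)) / len(theta))
The printed value is the average symmetric point-to-curve distance (a few pixels for this shifted circle).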
Another option is to use the Hausdorff distance, which is considered in some respects to be better.
Not sure if this is what you're looking for, but I have an efficient implementation of the TEASAR skeletonization algorithm using the exact euclidean distance transform here: https://github.com/seung-lab/kimimaro
I've been working on this for the last few days and I still can't see where the problem is.
I'm trying to weight a function of 2 variables f(q,r) with a Gaussian distribution g(r) with a specific mean value (R0) and deviation (sigma). This is needed because the theoretical function f(q) has a certain dispersity in its r variable when analyzed experimentally. Therefore, we use a probability density function to weight our function in the r variable.
I include the code, which runs, but doesn't give the expected result (the weighted curve should get smoother as the polydispersity (sigma) grows), as shown below. As you can see, I integrate the product of the 2 functions, f(r,q)*g(r), from r = 0 to r = +inf.
The result is plotted to compare the weighted result with the simple function:
from scipy.integrate import quad
import numpy as np
import math as m
import matplotlib.pyplot as plt

#function weighted with a probability density function (gaussian)
def integrand(r, q):
    #gaussian function normalized
    def gauss_nor(r):
        #gaussian function
        def gauss(r):
            return m.exp(-((r-R0)**2)/(2*sigma**2))
        return (m.exp(-((r-R0)**2)/(2*sigma**2)))/(quad(gauss, 0, np.inf)[0])
    #function f(r,q)
    def f(r, q):
        return 3*(np.sin(q*r)-q*r*np.cos(q*r))/((r*q)**3)
    return gauss_nor(r)*f(r, q)

#quadratic integration of the integrand (from 0 to +inf)
#integrand is function*density_function (gauss)
def function(q):
    return quad(integrand, 0, np.inf, args=(q))[0]

#parameters used in the function
R0 = 20
sigma = 5

#range to plot q
q = np.arange(0.001, 2.0, 0.005)

#vectorized version of the integral
function_vec = np.vectorize(function)

#squared power of the integral (weighted result)
I = (function_vec(q))**2

#function without density function
I0 = (3*(np.sin(q*R0)-q*R0*np.cos(q*R0))/((R0*q)**3))**2

#plot of weighted and non-weighted functions
p1, = plt.plot(q, I, 'b')
p3, = plt.plot(q, I0, 'r')
plt.legend([p1, p3], ('Weighted', 'No weighted'))
plt.yscale('log')
plt.xscale('log')
plt.show()
Thank you very much. I've been stuck on this problem for some days already and I haven't found the mistake.
Maybe somebody knows how to weight a function with a PDF in an easier way.
I simplified your code; the output is the same as yours. I think it's already very smooth. There are some very sharp peaks in the log-log graph just because the curve has zeros, so it doesn't look smooth in a log-log graph, but it is smooth in a normal X-Y graph.
import numpy as np
import matplotlib.pyplot as plt

def gauss(r):
    return np.exp(-((r-R0)**2)/(2*sigma**2))

def f(r, q):
    return 3*(np.sin(q*r)-q*r*np.cos(q*r))/((r*q)**3)

R0 = 20
sigma = 5

# grids of q values (column vector) and r values (row vector) for vectorized evaluation
qm, rm = np.ogrid[0.001:2.0:0.005, 0.001:40:1000j]

# Gaussian weights over r, normalized to sum to 1
gr = gauss(rm)
gr /= np.sum(gr)

# weight f(r, q) by the Gaussian and integrate over r by summing
fm = f(rm, qm)
fm *= gr

plt.plot(qm.ravel(), fm.sum(axis=1)**2)
plt.yscale('log')
plt.xscale('log')
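If you also want the unweighted curve from the original script on the same axes for comparison (using the same R0 as above), you can add:
q = qm.ravel()
I0 = (3*(np.sin(q*R0) - q*R0*np.cos(q*R0))/((R0*q)**3))**2
plt.plot(q, I0, 'r')
plt.show()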