how to calculate signal to noise ratio using python - python

Dear experts i have a data set.i just want to calculate signal to noise ratio of the data. data is loaded here https://i.fluffy.cc/jwg9d7nRNDFqdzvg1Qthc0J7CNtKd5CV.html
my code is given below:
import numpy as np
from scipy import signaltonoise
import scipy.io
dat=scipy.io.loadmat('./data.mat')
arr=dat['dn']
snr=scipy.stats.signaltonoise(arr, axis=0, ddof=0)
but i am getting error like importError: cannot import name 'signaltonoise' from 'scipy'
if it doesn't exist how to calculate snr,please suggest some other way to do it with this data set using python.

scipy.stats.signaltonoise was removed in scipy 1.0.0. You can either downgrade your scipy version or create the function yourself:
def signaltonoise(a, axis=0, ddof=0):
a = np.asanyarray(a)
m = a.mean(axis)
sd = a.std(axis=axis, ddof=ddof)
return np.where(sd == 0, 0, m/sd)
Source: https://github.com/scipy/scipy/blob/v0.16.0/scipy/stats/stats.py#L1963
See the github link for the docstring.
Edit: the full script would then look as follows
import numpy as np
import scipy.io
def signaltonoise(a, axis=0, ddof=0):
a = np.asanyarray(a)
m = a.mean(axis)
sd = a.std(axis=axis, ddof=ddof)
return np.where(sd == 0, 0, m/sd)
dat = scipy.io.loadmat('./data.mat')
arr = dat['dn']
snr = signaltonoise(arr)

More generally speaking, it depends on the application. For many applications, the relation between mean and standard deviation might be sufficient.
As JuliettVictor pointed out, the old scipy implementation's source code can be found online easily and is the most common one. In order to convert this to decibels, one would add
. Before that one should calculate the absolute value, in case the signal's mean is negative:
def signaltonoise_dB(a, axis=0, ddof=0):
a = np.asanyarray(a)
m = a.mean(axis)
sd = a.std(axis=axis, ddof=ddof)
return 20*np.log10(abs(np.where(sd == 0, 0, m/sd)))
This runs into problems though, when the signal of interest contains higher frequencies (e.g. in audio applications, here, DC is often filtered out even).
In matlab's snr() function a kaiser windowed periodogram is used, the peak of the fundamental is detected and its power is computed. The same happens for the harmonics. The rest of the signal is assumed to be noise and their corresponding power levels are calculated.
Other approaches involve low-pass filtering of the signal (similar to calculating its mean).
Yet another python based example can be found here.
An incomplete overview of methods (including a matlab like periodogram-based one) in python can be found here

You can use Sox for that:
def add_noise(exp,sig, n_directory, snrdb):
""" This function calculate the mixture of an audio
and an audio file as a noise with a desired SNR in db
Input:
sig: signal, the output vector of wav.read
rate: the signal's rate
n_directory: directory of noise
snrdb: Desired signal-to-noise ratio in db
Output:
d = s + n such that SNR = Ps/Pn """
direc_st = exp + "mixed.wav"
write(exp + 'sig_t.wav', 16000, sig)
s = np.expand_dims(sig, -1)
s = s.astype('float64')
rate, n = wav.read(n_directory)
n = np.expand_dims(n, -1)
n = n.astype('float64')
snr = 10 ** (snrdb * 0.1)
Es = np.sum(s ** 2)
En = np.sum(n ** 2)
snr = 10 ** (snrdb * 0.1)
alpha = np.sqrt(Es / (snr * En))
cmd = "sox -m -v" + str(
alpha) + " " + n_directory + ' ' + exp + 'sig_t.wav ' + direc_st
os.system(cmd)
rate, d = wav.read(direc_st)
return d

Related

fft normalization vs the norm-option

i have to often normalize the result of circular-convolution, so I 'borrowed'-and-modfied the following routine :
def fft_normalize(x):# Normalize a vector x in complex domain.
c = np.fft.rfft(x)
# Look at real and image as if they were real
ri = np.vstack([c.real, c.imag])
# Normalize magnitude of each complex/real pair
norm = np.linalg.norm(ri, axis=0)
if np.any(norm==0): norm[norm == 0] = np.float64(1e-308) #!fixme
ri= np.divide(ri,norm)
c_proj = ri[0,:] + 1j * ri[1,:]
rv = np.fft.irfft(c_proj, n=x.shape[-1])
return rv
def fft_convolution(a, b):
return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b))
so that I do this :
fft_normalize(fft_convolution(a,b))
i see in the numpy docs there is a 'norm' option. Is this equivalent to what i'm doing ?
And if yes, which option should I use ?
def fft_convolution2(a, b):
return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), norm='ortho')
When I test it it behaves better when I do fft_normalize()
Second, i had to add this line, but it does not seem, right. any ideas ?
if np.any(norm==0): norm[norm == 0] = np.float64(1e-308) #!fixme
As a side note, if you know !! numpy docs mentions that they promote float32 to float64 and that scipy.fftpack does not !!
Would fftpack be faster ! because scipy says fftpack is obsolete and there is no info on the new scipy ?
#cris are you sugesting i do it this way :
def fft_normalize(x):# Normalize a vector x in complex domain.
c = np.fft.rfft(x)
ri = np.vstack([c.real, c.imag])
norm = np.abs(c)
if np.any(norm==0): norm[norm == 0] = MIN #!fixme
ri= np.divide(ri,norm)
c_proj = ri[0,:] + 1j * ri[1,:]
rv = np.fft.irfft(c_proj, n=x.shape[-1])
return rv
The norm argument to the FFT functions in NumPy determine whether the transform result is multiplied by 1, 1/N or 1/sqrt(N), with N the number of samples in the array. Normally, the inverse transform is normalized by dividing by N, and the forward transform is not. Specifying “ortho” here causes both transforms to be normalized by 1/sqrt(2). Specifying “forward” causes only the forward transform to be normalized by 1/N.
These normalizations are very different from the one you apply, where you normalize each element in the frequency domain.

Checking if Frequentist approach is correct? Bayesian approach using MCMC for AB test. How to calculate Bayes Factors in Python?

I've been trying to get my head around Frequentist and Bayesian approaches for a toy data AB test problem.
The results don't really make sense to me. I am struggling to understand the results, or whether I have computed them (in)correctly (which is probably likely). Furthermore, after much research, I am still somewhat lost as to how to compute Bayes Factors. I've seen packages in R that make this look somewhat easy. Alas, I am not familiar with R and would prefer to be able to solve this problem in Python.
I would greatly appreciate any help and guidance regarding this!
Here is the data:
# imports
import pingouin as pg
import pymc3 as pm
import pandas as pd
import numpy as np
import scipy.stats as scs
import statsmodels.stats.api as sms
import math
import matplotlib.pyplot as plt
# A = control -- B = treatment
a_success = 10730
a_failure = 61988
a_total = a_success + a_failure
a_cr = a_success / a_total
b_success = 10966
b_failure = 60738
b_total = b_success + b_failure
b_cr = b_success / b_total
I started by doing some power analysis, to determine the number of required samples with a power of 0.8, alpha of 0.05 and a practical significance of 2%. I'm not sure whether expected conversion rates should be supplied, or the baseline + some proportion. Depending on the effect size, the required number of samples increases significantly.
# determine required sample size
baseline_rate = a_cr
practical_significance = 0.02
alpha = 0.05
power = 0.8
nobs1 = None
# is this how to calculate effect size?
effect_size = sms.proportion_effectsize(baseline_rate, baseline_rate + practical_significance) # 5204
# # or this?
# effect_size = sms.proportion_effectsize(baseline_rate, baseline_rate + baseline_rate * practical_significance) # 228583
sample_size = sms.NormalIndPower().solve_power(effect_size = effect_size,
power = power,
alpha = alpha,
nobs1 = nobs1,
ratio = 1)
I continued trying to determine if the null hypothesis could be rejected:
# calculate pooled probability
pooled_probability = (a_success + b_success) / (a_total + b_total)
# calculate pooled standard error and margin of error
se_pooled = math.sqrt(pooled_probability * (1 - pooled_probability) * (1 / b_total + 1 / a_total))
z_score = scs.norm.ppf(1 - alpha / 2)
margin_of_error = se_pooled * z_score
# the estimated difference between probability of conversions of both groups
d_hat = (test_b_success / test_b_total) - (test_a_success / test_a_total)
# test if null hypothesis can be rejected
lower_bound = d_hat - margin_of_error
upper_bound = d_hat + margin_of_error
if practical_significance < lower_bound:
print("reject null hypothesis -- groups do not have the same conversion rates")
else:
print("do not reject the null hypothesis -- groups have the same conversion rates")
which evaluates to 'do not reject the null ...' despite group B (treatment) showing a 3.65% relative improvement with regards to conversion rate over group A (control) which seems... odd?
I tried a slightly different approach (I guess a slightly different hypothesis?):
successes = [a_success, b_success]
nobs = [a_total, b_total]
z_stat, p_value = sms.proportions_ztest(successes, nobs=nobs)
(lower_a, lower_b), (upper_a, upper_b) = sms.proportion_confint(successes, nobs=nobs, alpha=alpha)
if p_value < alpha:
print("reject null hypothesis -- groups do not have the same conversion rates")
else:
print("do not reject the null hypothesis -- groups have the same conversion rates")
Which evaluates to 'reject null hypothesis ... ' with p-value: 0.004236. This seems highly contradictory, especially since the p-value is < 0.01.
On to Bayes... I created some arrays of success and failures (and only tested on 100 observations) due to how long this thing takes, and ran the following:
# generate lists of 1, 0
obs_a = np.repeat([1, 0], [a_success, a_failure])
obs_v = np.repeat([1, 0], [b_success, b_failure])
for _ in range(10):
np.random.shuffle(observations_A)
np.random.shuffle(observations_B)
with pm.Model() as model:
p_A = pm.Beta("p_A", 1, 1)
p_B = pm.Beta("p_B", 1, 1)
delta = pm.Deterministic("delta", p_A - p_B)
obs_A = pm.Bernoulli("obs_A", p_A, observed = obs_a[:1000])
obs_B = pm.Bernoulli("obs_B", p_B, observed = obs_b[:1000])
step = pm.NUTS()
trace = pm.sample(1000, step = step, chains = 2)
Firstly, I understand that you are supposed to burn some proportion of the trace -- how do you determine an appropriate number of indices to burn?
In trying to evaluate the posterior probabilities, is the following code the correct way to do this?
b_lift = (trace['p_B'].mean() - trace['p_A'].mean()) / trace['p_A'].mean() * 100
b_prob = np.mean(trace["delta"] > 0)
a_lift = (trace['p_A'].mean() - trace['p_B'].mean()) / trace['p_B'].mean() * 100
a_prob = np.mean(trace["delta"] < 0)
# is the Bayes Factor just the ratio of the posterior probabilities for these two models?
BF = (trace['p_B'] / trace['p_A']).mean()
print(f'There is {b_prob} probability B outperforms A by a magnitude of {round(b_lift, 2)}%')
print(f'There is {a_prob} probability A outperforms B by a magnitude of {round(a_lift, 2)}%')
print('BF:', BF)
-- output:
There is 0.666 probability B outperforms A by a magnitude of 1.29%
There is 0.334 probability A outperforms B by a magnitude of -1.28%
BF: 1.013357654428127
I suspect that this is not the correct way to calculate Bayes Factors. How can the Bayes Factor be calculated?
I really hope you can help me understand all of the above... I realize it's an exceptionally long post. But I've tried every resource I can find and am still stuck!
Kind regards.

MCMC Sampling a Maxwellian Curve Using Python's emcee

I am trying to introduce myself to MCMC sampling with emcee. I want to simply take a sample from a Maxwell Boltzmann distribution using a set of example code on github, https://github.com/dfm/emcee/blob/master/examples/quickstart.py.
The example code is really excellent, but when I change the distribution from a Gaussian to a Maxwellian, I receive the error, TypeError: lnprob() takes exactly 2 arguments (3 given)
However it is not called anywhere where it is not given the appropriate parameters? In need of some guidance as to how to define a Maxwellian Curve and have it fit into this example code.
Here is what I have;
from __future__ import print_function
import numpy as np
import emcee
try:
xrange
except NameError:
xrange = range
def lnprob(x, a, icov):
pi = np.pi
return np.sqrt(2/pi)*x**2*np.exp(-x**2/(2.*a**2))/a**3
ndim = 2
means = np.random.rand(ndim)
cov = 0.5-np.random.rand(ndim**2).reshape((ndim, ndim))
cov = np.triu(cov)
cov += cov.T - np.diag(cov.diagonal())
cov = np.dot(cov,cov)
icov = np.linalg.inv(cov)
nwalkers = 50
p0 = [np.random.rand(ndim) for i in xrange(nwalkers)]
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=[means, icov])
pos, prob, state = sampler.run_mcmc(p0, 5000)
sampler.reset()
sampler.run_mcmc(pos, 100000, rstate0=state)
Thanks
I think there are a couple of problems that I see. The main one is that emcee wants you to give it the natural logarithm of the probability distribution function that you want to sample. So, rather than having:
def lnprob(x, a, icov):
pi = np.pi
return np.sqrt(2/pi)*x**2*np.exp(-x**2/(2.*a**2))/a**3
you would instead want, e.g.
def lnprob(x, a):
pi = np.pi
if x < 0:
return -np.inf
else:
return 0.5*np.log(2./pi) + 2.*np.log(x) - (x**2/(2.*a**2)) - 3.*np.log(a)
where the if...else... statement is to explicitly say that negative values of x have zero probability (or -infinity in log-space).
You also shouldn't have to calculate icov and pass it to lnprob as that's only needed for the Gaussian case in the example you link to.
When you call:
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=[means, icov])
the args value should just be any additional arguments that your lnprob function requires, so in your case this would be the value of a that you want to set your Maxwell-Boltxmann distribution up with. This should be a single value rather than the two randomly initialised values you have set when creating mean.
Overall, the following should work for you:
from __future__ import print_function
import emcee
import numpy as np
from numpy import pi as pi
# define the natural log of the Maxwell-Boltzmann distribution
def lnprob(x, a):
if x < 0:
return -np.inf
else:
return 0.5*np.log(2./pi) + 2.*np.log(x) - (x**2/(2.*a**2)) - 3.*np.log(a)
# choose a value of 'a' for the distributions
a = 5. # I'm choosing 5!
# choose the number of walkers
nwalkers = 50
# set some initial points at which to calculate the lnprob
p0 = [np.random.rand(1) for i in xrange(nwalkers)]
# initialise the sampler
sampler = emcee.EnsembleSampler(nwalkers, 1, lnprob, args=[a])
# Run 5000 steps as a burn-in.
pos, prob, state = sampler.run_mcmc(p0, 5000)
# Reset the chain to remove the burn-in samples.
sampler.reset()
# Starting from the final position in the burn-in chain, sample for 100000 steps.
sampler.run_mcmc(pos, 100000, rstate0=state)
# lets check the samples look right
mbmean = 2.*a*np.sqrt(2./pi) # mean of Maxwell-Boltzmann distribution
print("Sample mean = {}, analytical mean = {}".format(np.mean(sampler.flatchain[:,0]), mbmean))
mbstd = np.sqrt(a**2*(3*np.pi-8.)/np.pi) # std. dev. of M-B distribution
print("Sample standard deviation = {}, analytical = {}".format(np.std(sampler.flatchain[:,0]), mbstd))

MINRES implementation in Python

Is there any python implementation of MINRES pseudoinversion algorithm that can deal with Hermitian matrices?
I have found a few sources, but all of them are only capable of working with real matrices and do not seem to be easily generalizable onto the complex case:
https://searchcode.com/codesearch/view/89958680/
https://github.com/pascanur/theano_optimize
(there are a couple of other links, but my reputation does not allow me to post them)
A Hermitian system of size $n$
$$\mathbf y = \mathbf H^{-1}\mathbf v$$
can be embedded in a real, symmetric system of size $2n$:
\begin{equation}
\begin{bmatrix}
\Re(\mathbf y)\\Im(\mathbf y)
\end{bmatrix}=
\begin{bmatrix}
\Re(\mathbf H)&-\Im(\mathbf H)\\Im(\mathbf H)&\Re(\mathbf H)
\end{bmatrix}^{-1}
\begin{bmatrix}
\Re(\mathbf v)\\Im(\mathbf v)
\end{bmatrix}.
\end{equation}
Minimum-residual methods are often used for large problems, where constructing $H$ is impractical. In which case we may have an operation which computes a matrix-vector product, $f: \mathbb C^n \to \mathbb C^n; ,, f(\mathbf v) = \mathbf H\mathbf v.$ This function can be wrapped to operate on $\mathbf x \in \mathbb R^{2n}$ by converting $\mathbf x$ back to a complex vector, applying $f$, and then embedding the result back in $\mathbb R^{2n}$.
Here is an example in python / numpy / scipy:
from scipy.sparse.linalg import minres, LinearOperator
from pylab import *
# Problem size
N = 100
# error helper
er = lambda t,a,b:print('%s error:'%t,mean(abs(a-b)))
# random Hermitian matrix
Q = randn(N,N) + 1j*randn(N,N)
H = Q#conj(Q.T)
# random complex vector
v = randn(N) + 1j*randn(N)
# ground-truth solution
x0 = inv(H)#v
# Pack/unpack complex vector as stacked real vector
c2r = lambda v:block([real(v),imag(v)])
r2c = lambda v:kron([1,1j],eye(N))#v
# Verify that we can embed C^n in R^(2N)
Hr = real(H)
Hi = imag(H)
Hs = block([[Hr,-Hi],[Hi,Hr]])
vs = c2r(v)
xs = inv(Hs)#vs
x1 = r2c(xs)
er('Embed',x0,x1)
# Verify that minres works as expected in R-embed
x2 = r2c(minres(Hs,vs,tol=1e-12)[0])
er('Minres 1',x0,x2)
# Demonstrate using operators
Av = lambda u:c2r( H # r2c(u) )
A = LinearOperator((N*2,)*2,Av,Av)
# Minres, converting input/output to/from complex/real
x3 = r2c(minres(Hs,vs,tol=1e-12)[0])
er('Minres 2',x0,x3)
>>> Embed error: 5.317184726020268e-12
>>> Minres 1 error: 6.641342200989796e-11
>>> Minres 2 error: 6.641342200989796e-11

Efficient way to implement simple filter with varying coeffients in Python/Numpy

I am looking for an efficient way to implement a simple filter with one coefficient that is time-varying and specified by a vector with the same length as the input signal.
The following is a simple implementation of the desired behavior:
def myfilter(signal, weights):
output = np.empty_like(weights)
val = signal[0]
for i in range(len(signal)):
val += weights[i]*(signal[i] - val)
output[i] = val
return output
weights = np.random.uniform(0, 0.1, (100,))
signal = np.linspace(1, 3, 100)
output = myfilter(signal, weights)
Is there a way to do this more efficiently with numpy or scipy?
You can trade in the overhead of the loop for a couple of additional ops:
import numpy as np
def myfilter(signal, weights):
output = np.empty_like(weights)
val = signal[0]
for i in range(len(signal)):
val += weights[i]*(signal[i] - val)
output[i] = val
return output
def vectorised(signal, weights):
wp = np.r_[1, np.multiply.accumulate(1 - weights[1:])]
sw = weights * signal
sw[0] = signal[0]
sws = np.add.accumulate(sw / wp)
return wp * sws
weights = np.random.uniform(0, 0.1, (100,))
signal = np.linspace(1, 3, 100)
print(np.allclose(myfilter(signal, weights), vectorised(signal, weights)))
On my machine the vectorised version is several times faster. It uses a "closed form" solution of your recurrence equation.
Edit: For very long signal / weight (100,000 samples, say) this method doesn't work because of overflow. In that regime you can still save a bit (more than 50% on my machine) using the following trick, which has the added bonus that you needn't solve the recurrence formula, only invert it.
from scipy import linalg
def solver(signal, weights):
rw = 1 / weights[1:]
v = np.r_[1, rw, 1-rw, 0]
v.shape = 2, -1
return linalg.solve_banded((1, 0), v, signal)
This trick uses the fact that your recurrence is formally similar to a Gauss elimination on a matrix with only one nonvanishing subdiagonal. It piggybacks on a library function that specialises in doing precisely that.
Actually, quite proud of this one.

Categories

Resources