I'm following a tutorial on the use of Python in bioinformatics. In the tutorial, a Mann-Whitney U test was performed via the function below.
numpy.random.seed was used in the first line after the package imports but nowhere else. I was wondering what the purpose of this call is, since it seemingly doesn't affect the results.
def mannwhitney(descriptor, verbose=False):
    from numpy.random import seed
    from numpy.random import randn  # imported but never used below
    from scipy.stats import mannwhitneyu

    seed(1)  # seeds NumPy's global RNG; nothing below draws random numbers

    # select the descriptor column for the active and inactive classes
    selection = [descriptor, "Bioactivity_Class"]
    df = df_2class[selection]
    active = df[df.Bioactivity_Class == "active"][descriptor]
    inactive = df[df.Bioactivity_Class == "inactive"][descriptor]

    stat, p = mannwhitneyu(active, inactive)

    # creating a result dataframe for easier interpretation
    alpha = 0.05
    if p > alpha:
        interpretation = "Same distribution (fail to reject H0)"
    else:
        interpretation = "Different distribution (reject H0)"

    results = pd.DataFrame({"Descriptor": descriptor, "Statistics": stat, "p": p,
                            "alpha": alpha, "Interpretation": interpretation},
                           index=[0])
    return results
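For context, numpy.random.seed only fixes the sequence produced by subsequent random draws (such as randn, which the function imports but never calls), so it cannot change the test's outcome here. A minimal demonstration:
from numpy.random import seed, randn

seed(1)
print(randn(3))  # three pseudo-random numbers
seed(1)
print(randn(3))  # the exact same three numbers, because the seed was reset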
I have a data set whose samples are discrete values (in particular, the size of a queue over time). Now I'd like to find which distribution they belong to. To achieve this goal I'd proceed the same way I did for the other quantities, i.e. plotting a qqplot by running
import statsmodels.api as sm
sm.qqplot(df, dist = 'geom', sparams = (.5,), line ='s', alpha = 0.3, marker ='.')
This works if dist is not a discrete random variable (e.g. 'exp' or 'norm'), and indeed I used to get some results, but when the distribution is discrete (say, 'geom'), I get
AttributeError: 'geom_gen' object has no attribute 'fit'
I searched the Internet for how to make a qqplot (or something similar) to spot which distribution my samples belong to, but I found nothing. Here is my current attempt:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def discreteQQ(x_sample):
    # probabilities at which the quantiles are compared
    p_test = np.linspace(0, 1, 1001)
    x_sample = np.sort(x_sample)
    # theoretical geometric quantiles evaluated at the sample's ECDF positions
    ecdf_sample = np.arange(1, len(x_sample) + 1)/(len(x_sample) + 1)
    x_theor = stats.geom.ppf(ecdf_sample, p=0.5)
    for p in p_test:
        plt.scatter(np.quantile(x_theor, p), np.quantile(x_sample, p), c='blue')
    plt.xlabel('Theoretical quantiles')
    plt.ylabel('Sample quantiles')
    plt.show()
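For example, calling it with a synthetic geometric sample (parameter 0.5 chosen to match the ppf call above):
x_sample = stats.geom.rvs(0.5, size=500)
discreteQQ(x_sample)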
Generate a theoretical geometric distribution using scipy.stats.geom, convert the sample and theoretical data using statsmodels' ProbPlot and pass these to statsmodels' qqplot_2samples.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from statsmodels.graphics.gofplots import ProbPlot
from statsmodels.graphics.gofplots import qqplot_2samples
p_theor = 1/4 # The probability we check for
p_sample = 1/5 # The true probability of the sample distribution
# The experimental data
x_sample = stats.geom.rvs(p_sample, size=50)
# The model data
x_theor = stats.geom.rvs(p_theor, size=100)
qqplot_2samples(ProbPlot(x_sample), ProbPlot(x_theor), line='45')
plt.show()
I have an integral equation to calculate a key rate and need to convert it into Python.
The equation to calculate the key rate is given by:
R_rate = integral from 0 to 1 of R(n) * p(n) dn
where R(n) is:
R(n) = max(0, (Y0 + nsys) * (1 - 2*H2(min(0.5, (e0*Y0 + ed*nsys)/(Y0 + nsys)))))  with  nsys = nd*n
and p(n)dn is the log-normal distribution:
p(n) dn = 1/(n*sigma*sqrt(2*pi)) * exp(-(ln(n/n0) + sigma^2/2)^2/(2*sigma^2)) dn
The key rate should be plotted like this: [figure from the paper, not included]
I have successfully plotted the static model of the graph using the following code:
import numpy as np
import matplotlib.pyplot as plt

n1 = np.arange(10, 55, 1)   # loss in dB
n = 10**(-n1/10)            # transmittance corresponding to the loss
Y0 = 1*(10**-5)
nd = 0.25
ed = 0.03
nsys = nd*n
QBER = ((1/2*Y0) + (ed*nsys))/(Y0 + nsys)
H2 = -QBER*np.log2(QBER) - (1 - QBER)*np.log2(1 - QBER)
Rsp = np.log10((Y0 + nsys)*(1 - (2*H2)))
print(Rsp)
plt.plot(n1, Rsp)
plt.xlabel('Loss (dB)')
plt.ylabel('log10(Rate)')
plt.show()
However, I failed to plot the R^ratewise model. This is my code:
import numpy as np
import matplotlib.pyplot as plt

def h2(x):
    return -x*np.log2(x) - (1 - x)*np.log2(1 - x)

e0 = 0.5
ed = 0.03
Y0 = 1e-5
nd = 0.25
eta_0 = 0.0015
sigma = 0.9
nt = np.linspace(0.1, 0.00001, 1000)
y = np.zeros(np.size(nt))
Rate = np.zeros(np.size(nt))
for (i, eta) in enumerate(nt):
    nsys = eta*nd
    y[i] = 1/(eta*sigma*np.sqrt(2*np.pi))*np.exp(-(np.log(eta/eta_0) + (1/2*sigma*sigma))**2/(2*sigma*sigma))
    Rate[i] = (max(0.0, (Y0 + nsys)*(1 - 2*h2(min(0.5, (e0*Y0 + ed*nsys)/(Y0 + nsys))))))*y[i]
plt.plot(nt, np.log10(Rate))
plt.xlabel('eta')
plt.ylabel('Rate')
plt.show()
I hope that someone can help me code the key rate with the integration of p(n)dn as stated above. This is the paper for reference:
key rate
Thank you.
I copied & ran your second code block as-is, and it generated a plot. Is that what you wanted?
Using y as the p(n) in the equation and Rsp as the R(n), you should be able to use
NumPy's trapz function
to approximate the integral from the sampled p(n) and R(n):
n = np.linspace(0, 1, no_of_samples)
# ...generate y & Rsp from n...
R_rate = np.trapz(y * Rsp, n)
However, you'll have to change your code to sample y & Rsp using the same n, spanning from 0 to 1.
P.S. there's no need for the loop in your second code block; it can be condensed by removing the i's, swapping eta for nt, and using NumPy's minimum and maximum functions, like so:
nsys = nt*nd
sigma = 0.9
y = 1/(nt*sigma*np.sqrt(2*np.pi))*np.exp(-(np.log(nt/eta_0) + (1/2*sigma*sigma))**2/(2*sigma*sigma))
Rate = (np.maximum(0.0, (Y0 + nsys)*(1 - 2*h2(np.minimum(0.5, (e0*Y0 + ed*nsys)/(Y0 + nsys)))))) * y
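Putting both pieces together, here is a minimal runnable sketch of the integration, reusing the constants from your second code block (the grid starts slightly above 0 because p(n) contains a 1/n factor; on NumPy >= 2.0 use np.trapezoid instead of np.trapz):
import numpy as np

def h2(x):
    return -x*np.log2(x) - (1 - x)*np.log2(1 - x)

# constants from the question's second code block
e0 = 0.5
ed = 0.03
Y0 = 1e-5
nd = 0.25
eta_0 = 0.0015
sigma = 0.9

# sample n on a fine grid spanning (0, 1]
n = np.linspace(1e-6, 1.0, 100000)
nsys = nd*n

# p(n): the log-normal transmittance distribution
y = 1/(n*sigma*np.sqrt(2*np.pi))*np.exp(-(np.log(n/eta_0) + sigma*sigma/2)**2/(2*sigma*sigma))

# R(n): the key rate at fixed transmittance n (without the p(n) factor)
Rsp = np.maximum(0.0, (Y0 + nsys)*(1 - 2*h2(np.minimum(0.5, (e0*Y0 + ed*nsys)/(Y0 + nsys)))))

# trapezoidal approximation of the integral of R(n) p(n) dn
R_rate = np.trapz(y * Rsp, n)
print(R_rate)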
I am writing code for a mono-energetic gamma beam whose dominant interaction is photoelectric absorption, with mu = 2 cm^-1, and I need to generate 50000 random numbers and sample the interaction depth (which I am not sure I have actually done).
I know that the mean free path = mu^-1, but I need to find the mean free path from the simulation as well as from mu and compare them. Is what I did in the code right or not?
import numpy as np
import matplotlib.pyplot as plt

mu = 2  # attenuation coefficient, cm^-1
np.random.seed()
data = np.random.randn(50000)*10  # 50000 Gaussian samples scaled by 10
bins = np.arange(data.min(), data.max() + 1e-8, 0.1)
meanfreepath = 1/mu  # mean free path from mu
print(meanfreepath)
plt.hist(data, bins=bins)
plt.show()
Well, the interaction depth distribution is an exponential one, not a Gaussian.
So the code would be
import numpy as np

lmbda = 2  # cm^-1
beta = 1.0/lmbda
data = np.random.exponential(scale=beta, size=50000)
mfp = np.mean(data)
print(mfp)
# build histogram
More details at https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.exponential.html
The code above produced
0.4977168417102998
which looks like 2^-1 to me.
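To finish the histogram step, here is a small sketch that overlays the sampled depths against the theoretical exponential density mu*exp(-mu*x) and prints both estimates of the mean free path (the bin count of 100 is arbitrary):
import numpy as np
import matplotlib.pyplot as plt

mu = 2.0  # attenuation coefficient, cm^-1
data = np.random.exponential(scale=1.0/mu, size=50000)

print("simulated mean free path :", np.mean(data))
print("theoretical mean free path:", 1.0/mu)

# histogram of sampled interaction depths with the exponential pdf overlaid
x = np.linspace(0, data.max(), 200)
plt.hist(data, bins=100, density=True, alpha=0.5, label='sampled depths')
plt.plot(x, mu*np.exp(-mu*x), 'r-', label='mu * exp(-mu*x)')
plt.xlabel('depth (cm)')
plt.ylabel('probability density')
plt.legend()
plt.show()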
Firstly, there are a few topics on this, but they involve deprecated packages with pandas etc. Suppose I'm trying to predict a variable w using variables x, y and z. I want to run a multiple linear regression to try and predict w. There are quite a few solutions that will produce the coefficients, but I'm not sure how to use them. So, in pseudocode:
import numpy as np
from scipy import stats
w = np.array((1,2,3,4,5,6,7,8,9,10)) # Time series I'm trying to predict
x = np.array((1,3,6,1,4,6,8,9,2,2)) # The three variables to predict w
y = np.array((2,7,6,1,5,6,3,9,5,7))
z = np.array((1,3,4,7,4,8,5,1,8,2))
def model(w, x, y, z):
    # do something!
    return guess  # where guess is some 10-element array formed
                  # using multiple linear regression of x, y, z
guess = model(w,x,y,z)
r = stats.pearsonr(w,guess) # To see how good guess is
Hopefully this makes sense, as I'm new to MLR. There is probably a package in scipy that does all this, so any help is welcome!
You can use the normal equation method.
Let your equation be of the form: ax + by + cz + d = w
Then
import numpy as np

x = np.asarray([[1,3,6,1,4,6,8,9,2,2],
                [2,7,6,1,5,6,3,9,5,7],
                [1,3,4,7,4,8,5,1,8,2],
                [1,1,1,1,1,1,1,1,1,1]]).T  # the row of ones yields the intercept d
y = np.asarray([1,2,3,4,5,6,7,8,9,10]).T
a, b, c, d = np.linalg.pinv((x.T).dot(x)).dot(x.T.dot(y))
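The fitted coefficients can then be used to build the prediction; for example, a quick sketch reusing the arrays above:
from scipy import stats

guess = x.dot(np.array([a, b, c, d]))  # predicted values of w
print(stats.pearsonr(y, guess))        # how well the fit reproduces the target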
I think I've worked it out now. If anyone could confirm that this produces the correct results, that'd be great!
import numpy as np
from scipy import stats
# What I'm trying to predict
y = [-6,-5,-10,-5,-8,-3,-6,-8,-8]
# Array that stores two predictors in columns
x = np.array([[-4.95,-4.55],[-10.96,-1.08],[-6.52,-0.81],[-7.01,-4.46],[-11.54,-5.87],[-4.52,-11.64],[-3.36,-7.45],[-2.36,-7.33],[-7.65,-10.03]])
# Fit linear least squares and get regression coefficients
beta_hat = np.linalg.lstsq(x, y, rcond=None)[0]
print(beta_hat)
# To store my best guess
estimate = np.zeros(9)
for i in range(0, 9):
    # y = x1*b1 + x2*b2
    estimate[i] = beta_hat[0]*x[i, 0] + beta_hat[1]*x[i, 1]
# Correlation between best guess and real values
print(stats.pearsonr(estimate, y))
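As a side note, the loop above is equivalent to a single matrix-vector product; also, this model fits no intercept because x has no column of ones:
estimate = x.dot(beta_hat)  # same result as the loop above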