According to the Python docs, random.paretovariate(alpha) simulates from the Pareto distribution, where alpha is the shape parameter. But the Pareto distribution takes both a shape and a scale parameter.
How can I simulate from this distribution specifying both parameters?
You can use NumPy instead:
from numpy import random
pareto = random.pareto(a=4, size=(4, 8))
print(pareto)
[[0.32803729 0.03626127 0.73736579 0.53301595 0.33443536 0.12561402
0.00816275 0.0134468 ]
[0.21536643 0.15798882 0.52957712 0.06631794 0.03728101 0.80383849
0.01727098 0.03910042]
[0.24481661 0.13497905 0.00665971 0.41875676 0.20252262 0.13701287
0.06929994 0.05350275]
[0.93898544 0.02621125 0.0873763 0.15660287 0.31329102 3.95332518
0.09149938 0.08415795]]
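Note that numpy.random.pareto draws from the Lomax (Pareto II) distribution, so it only takes the shape parameter a. Per the NumPy docs, samples from the classical Pareto with scale m can be obtained by adding 1 and multiplying by m; and since random.paretovariate draws with a scale of 1, you can likewise multiply its result by your scale. A small sketch (the shape and scale values here are only illustrative):
import random
import numpy as np

alpha, m = 4, 2.5   # shape and scale (illustrative values)

# NumPy: shift the Lomax sample by 1 and multiply by the scale
np_samples = m * (np.random.pareto(alpha, size=1000) + 1)

# Standard library: paretovariate has scale 1, so just multiply
std_sample = m * random.paretovariate(alpha)

print(np_samples.min(), std_sample)   # all values are >= m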
You can also nicely graph the data using matplotlib and seaborn:
from numpy import random
import matplotlib.pyplot as plt
import seaborn
seaborn.distplot(random.pareto(a=4, size=1000), kde=False)
plt.show()
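Note that seaborn.distplot is deprecated in newer seaborn releases (0.11 and later). Assuming you are on a recent version, histplot is the closest replacement for this kind of plot:
from numpy import random
import matplotlib.pyplot as plt
import seaborn

seaborn.histplot(random.pareto(a=4, size=1000))
plt.show()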
I have created a list of Shannon entropy values for a pair of multiple-sequence-aligned sequences. When I plot the values I get a simple plot, and I want to draw a smooth curve over the lines. Can anyone suggest the right way to do this? Basically, I want a smooth curve that touches the tip of every bar and drops to zero wherever the y-axis value is zero.
Link to image: https://i.stack.imgur.com/SY3jH.png
#importing the relevant packages
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.interpolate import make_interp_spline
from Bio import AlignIO
import warnings
warnings.filterwarnings("ignore")
#function to calculate the Shannon Entropy of a MSA
# H = -sum[p(x).log2(px)]
def shannon_entropy(list_input):
    unique_aa = set(list_input)
    M = len(list_input)  # number of residues in the column
    entropy_list = []
    for aa in unique_aa:
        n_i = list_input.count(aa)
        P_i = n_i / float(M)
        entropy_i = P_i * (math.log(P_i, 2))
        entropy_list.append(entropy_i)
    sh_entropy = -(sum(entropy_list))
    #print(sh_entropy)
    return sh_entropy
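# Quick sanity check (illustrative example, not part of the original script):
# a column with three 'A' and one 'C' gives p(A) = 0.75 and p(C) = 0.25,
# so H = -(0.75*log2(0.75) + 0.25*log2(0.25)), which is roughly 0.811
print(shannon_entropy(list("AAAC")))  # ~0.811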
#importing the MSA file
#importing the clustal file
align_clustal1 = AlignIO.read("/home/clustal.aln", "clustal")

def shannon_entropy_list_msa(alignment_file):
    # compute the entropy of every column in the alignment
    shannon_entropy_list = []
    for col_no in range(len(list(alignment_file[0]))):
        list_input = list(alignment_file[:, col_no])
        shannon_entropy_list.append(shannon_entropy(list_input))
    return shannon_entropy_list
clustal_omega1 = shannon_entropy_list_msa(align_clustal1)
# Plotting the data
plt.figure(figsize=(18,10))
plt.plot(clustal_omega1, 'r')
plt.xlabel('Residue', fontsize=16)
plt.ylabel("Shannon's entropy", fontsize=16)
plt.show()
Edit 1:
Here is what my graph looks like after implementing the "pchip" method (monotonic spline output): https://i.stack.imgur.com/hA3KW.png
One approach would be to use PCHIP interpolation, which gives you a monotone curve with the required behaviour for zero values on the y-axis.
We can't run your exact code example because it points to a local Clustal file in your home directory.
Here's a simple working example, with link to output image:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import pchip
mylist = [10, 0, 0, 0, 0, 9, 9, 0, 0, 0, 11, 11, 11, 0, 0]
mylist_np = np.array(mylist)
samples = np.array(range(len(mylist)))  # x positions of the original points

# evaluate the monotone PCHIP interpolant on a denser grid
xnew = np.linspace(samples.min(), samples.max(), 100)
plt.plot(xnew, pchip(samples, mylist_np)(xnew))
plt.show()
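Applied to your own data, the same idea would look roughly like this (a sketch that assumes clustal_omega1 is the per-column entropy list computed in your script; I can't verify it against your alignment file):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import pchip

entropy = np.array(clustal_omega1)        # entropy per residue column
positions = np.arange(len(entropy))       # residue positions
xnew = np.linspace(positions.min(), positions.max(), 10 * len(entropy))

plt.figure(figsize=(18, 10))
plt.bar(positions, entropy, color='lightgrey')         # original values
plt.plot(xnew, pchip(positions, entropy)(xnew), 'r')   # smooth monotone curve
plt.xlabel('Residue', fontsize=16)
plt.ylabel("Shannon's entropy", fontsize=16)
plt.show()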
I have a distribution that changes over time, and I would like to plot a violin plot for each time step side by side using seaborn. My initial attempt failed because violinplot cannot handle an np.ndarray for the y argument:
import numpy as np
import seaborn as sns
time = np.arange(0, 10)
samples = np.random.randn(10, 200)
ax = sns.violinplot(x=time, y=samples) # Exception: Data must be 1-dimensional
The seaborn documentation has an example for a vertical violinplot grouped by a categorical variable. However, it uses a DataFrame in long format.
Do I need to convert my time series into a DataFrame as well? If so, how do I achieve this?
A closer look at the documentation made me realize that omitting the x and y arguments altogether causes the data argument to be interpreted as wide-form data:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
samples = np.random.randn(20, 10)
ax = sns.violinplot(data=samples)
plt.show()
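If you want the x-axis labelled with your actual time values rather than the default column positions 0..9, one option (my own addition, not from the seaborn docs example) is to wrap the array in a DataFrame whose columns are the time values:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

time = np.arange(0, 10)
samples = np.random.randn(200, 10)   # one column per time step
ax = sns.violinplot(data=pd.DataFrame(samples, columns=time))
plt.show()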
The violin plot documentation says that the x and y inputs do not have to come from a DataFrame, but they must have matching dimensions, and y must be one-dimensional. The samples array you created has 10 rows and 200 columns, which is what causes the dimension error when plotting.
I tested it, and the following code runs without problems:
import numpy as np
import seaborn as sns
import pandas as pd

time = np.arange(0, 200)
samples = np.random.randn(10, 200)
for sample in samples:
    ax = sns.violinplot(x=time, y=sample)
You can then group the resulting graphs; this link shows one approach to styling and combining matplotlib figures:
https://python-graph-gallery.com/199-matplotlib-style-sheets/
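For example, one way to place the per-time-step violins side by side in a single figure (a sketch of my own using matplotlib subplots, not taken from the linked page):
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

samples = np.random.randn(10, 200)   # 10 time steps, 200 draws each
fig, axes = plt.subplots(1, 10, figsize=(20, 4), sharey=True)
for ax, sample in zip(axes, samples):
    sns.violinplot(y=sample, ax=ax)  # one violin per time step
plt.show()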
Converting your data into a DataFrame is also possible; you just need pandas. For example:
import pandas as pd
x = [1,2,3,4]
df = pd.DataFrame(x)
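To answer the original question more directly: here is a sketch (my own, assuming each row of samples is one time step) that converts the array into a long-format DataFrame and passes it to violinplot:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

time = np.arange(0, 10)
samples = np.random.randn(10, 200)   # one row per time step

# long format: one row per (time, value) pair
df = pd.DataFrame(samples.T, columns=time).melt(var_name='time', value_name='value')
ax = sns.violinplot(x='time', y='value', data=df)
plt.show()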
I want to fit a power law to my data points because I need to calculate the value of v. The curve seems to pass through all the data points, but the errors on the fitted parameters are too large. How can I reduce this error?
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import math
import scipy
from scipy import optimize

x_data = np.array([30, 45, 60, 75])
y_data = np.array([0.42597867, 0.26249343, 0.19167837, 0.08116507])

fig = plt.figure()
ax = fig.add_subplot(111)

# power-law model with exponent -1/v and offset c
def ff(L, v, c):
    return (L**(-1/v) + c)

ax.scatter(x_data, y_data, marker='s', s=4**2)
pfit, pcov = optimize.curve_fit(ff, x_data, y_data)
print("pfit: ", pfit)
print("pcov: ", pcov.shape)
#print(pcov)
perr = np.sqrt(np.diag(pcov))  # one-sigma errors on the fitted parameters
x = np.linspace(20, 85, 1000)
ax.plot(x, ff(x, *pfit), color='red')
I'm trying to plot a CDF of random samples to compare against a target within a dataset that follows a Tweedie distribution. I know the following code will pull random samples from a Poisson distribution:
import numpy as np
import matplotlib.pyplot as plt
x_r = np.sort(np.random.poisson(lam=coll_df['pure_premium'].mean(), size=len(coll_df['pure_premium'])))
y_r = np.arange(1, len(x_r) + 1) / len(x_r)
_ = plt.plot(x_r, y_r, color='red')
_ = plt.xlabel('Percent of Pure Premium')
_ = plt.ylabel('ECDF')
However, there is no tweedie distribution option on the random sampling. Anyone know how to hack this together?
PyPI has a tweedie package. A minimal example drawing a sample would be:
import tweedie, seaborn as sns, matplotlib.pyplot as plt
tvs = tweedie.tweedie(mu=10, p=1.5, phi=20).rvs(100000)
sns.distplot(tvs)
plt.show()
The package's GitHub page has a fancier example. The package implements rv_continuous, so you get a lot of other functionality besides rvs(). Also, while there don't seem to be any online docs, help(tweedie.tweedie) gives plenty of detail.
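Since the original goal was a CDF plot, the samples can be turned into an empirical CDF the same way as in the Poisson snippet above (a sketch; the mu, p and phi values are only illustrative, you would use values estimated from your own data):
import numpy as np
import tweedie
import matplotlib.pyplot as plt

tvs = tweedie.tweedie(mu=10, p=1.5, phi=20).rvs(100000)

x_r = np.sort(tvs)                            # sorted samples
y_r = np.arange(1, len(x_r) + 1) / len(x_r)   # cumulative fraction

plt.plot(x_r, y_r, color='red')
plt.xlabel('Pure Premium')
plt.ylabel('ECDF')
plt.show()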
I am trying to do a Kernel Density Estimation (KDE) plot with seaborn and locate the median. The code looks something like this:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
sns.set_palette("hls", 1)
data = np.random.randn(30)
sns.kdeplot(data, shade=True)
# x_median, y_median = magic_function()
# plt.vlines(x_median, 0, y_median)
plt.show()
As you can see I need a magic_function() to fetch the median x and y values from the kdeplot. Then I would like to plot them with e.g. vlines. However, I can't figure out how to do that. The result should look something like this (obviously the black median bar is wrong here):
I guess my question is not strictly related to seaborn and also applies to other kinds of matplotlib plots. Any ideas are greatly appreciated.
You need to:
1. Extract the data of the KDE line.
2. Integrate it to calculate the cumulative distribution function (CDF).
3. Find the value where the CDF equals 1/2; that is the median.
import numpy as np
from scipy import integrate
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_palette("hls", 1)
data = np.random.randn(30)
p = sns.kdeplot(data, shade=True)

# extract the x and y values of the KDE curve
x, y = p.get_lines()[0].get_data()

# cumulative integral of the density; initial=0 keeps the result the same length as x
# (in newer SciPy this function is named integrate.cumulative_trapezoid)
cdf = integrate.cumtrapz(y, x, initial=0)

# index of the point where the CDF is closest to 0.5
nearest_05 = np.abs(cdf - 0.5).argmin()
x_median = x[nearest_05]
y_median = y[nearest_05]

plt.vlines(x_median, 0, y_median)
plt.show()