I would like to sample from a particular probability distribution that I define, for example (1 + k*cos^2(theta)). I would like to do it in Python, but ideas in other languages are welcome. I thought it might be possible to create a function and do random sampling from it. Maybe that is too naive of me. Could you give me any tips or suggestions?
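One standard way to sample from a custom, non-normalized density like this is rejection sampling: draw uniform proposals and keep only those that fall under the density curve. A minimal sketch, assuming theta lives on [0, pi) and k >= 0 (so the density maximum is 1 + k); the function name is mine, not from any library:

```python
import numpy as np

def sample_theta(k, n, rng=None):
    """Draw n samples of theta in [0, pi) with density proportional to 1 + k*cos(theta)**2."""
    rng = np.random.default_rng() if rng is None else rng
    pmax = 1.0 + max(k, 0.0)  # maximum of the (unnormalized) density for k >= 0
    out = np.empty(0)
    while out.size < n:
        theta = rng.uniform(0.0, np.pi, size=n)  # uniform proposals
        u = rng.uniform(0.0, pmax, size=n)       # uniform heights under the envelope
        # keep proposals that fall under the density curve
        out = np.concatenate([out, theta[u < 1.0 + k * np.cos(theta) ** 2]])
    return out[:n]
```

The acceptance rate is the area under the density divided by the envelope area, so for large k a tighter envelope (or inverse-transform sampling via the CDF) would be more efficient.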
I want to cluster some stars based on their positions (X, Y, Z) using DBSCAN, but I do not know how to tune the parameters to get the right number of clusters to plot afterward.
This is how the data looks:
What are the right parameters for these data?
The number of rows is 1.202672e+06.
import pandas as pd
from sklearn.cluster import DBSCAN

data = pd.read_csv('datasets/full_dataset.csv')
clusters = DBSCAN(eps=0.5, min_samples=40, metric="euclidean", algorithm="auto")
min_samples is arguably one of the tougher parameters to choose, but you can settle it by just looking at the results and deciding how much noise you are okay with.
Choosing eps can be aided by running k-NN to understand the density distribution of your data. I believe the DBSCAN paper discusses this in more detail. There may even be a way to plot this in Python (in R it is kNNdistplot).
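The k-distance idea can be sketched with scikit-learn's NearestNeighbors; here synthetic points stand in for the real (X, Y, Z) star table, and k is set to match min_samples:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # hypothetical stand-in for the (X, Y, Z) star positions

k = 40  # use the same value you plan to pass as min_samples
nn = NearestNeighbors(n_neighbors=k).fit(X)
dists, _ = nn.kneighbors(X)    # note: each point counts as its own nearest neighbor here
kdist = np.sort(dists[:, -1])  # distance to the k-th neighbor, sorted ascending
# Plot kdist (e.g. plt.plot(kdist)) and read off the "elbow" of the curve;
# that distance is a reasonable candidate for eps.
```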
I would prefer to use OPTICS, which essentially tries all eps values simultaneously. However, I haven't found a decent implementation of it in either Python or R. In fact, there is an incorrect implementation in Python which doesn't follow the original OPTICS paper at all.
If you really want to use OPTICS, I recommend the Java implementation available in ELKI.
If anyone else has heard of a proper python implementation, I'd love to hear it.
If you want to go the trial-and-error route, start with a much smaller eps and work your way up from there.
Suppose you have a record of a distribution for each day over some period, for example a distribution depending on a parameter which evolves over time, and suppose we have dozens or hundreds of days. How would you visualize the change in this distribution? I want the plot to contain as much information as possible.
One way I can think of: approximate the density function and plot the evolution of these curves in 2D. It will be some form of homotopy: the initial distribution converging to the final one in smooth steps. Of course, here I assume smoothness.
Thanks for your help.
The question is theoretical in nature, but I am aiming for a Python realization, so implementations or library suggestions are also welcome.
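The curve-evolution idea described above can be sketched in matplotlib by overplotting one density per day and encoding time as color; the drifting Gaussian here is a hypothetical stand-in for the real daily distributions:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Hypothetical stand-in: a Gaussian density whose mean drifts over the period
x = np.linspace(-5.0, 5.0, 400)
n_days = 60
cmap = plt.get_cmap("viridis")

fig, ax = plt.subplots()
for day in range(n_days):
    mu = -2.0 + 4.0 * day / (n_days - 1)  # the parameter evolving in time
    pdf = np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)
    ax.plot(x, pdf, color=cmap(day / (n_days - 1)), alpha=0.5, linewidth=1)

ax.set_xlabel("value")
ax.set_ylabel("density")
sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(0, 1))
sm.set_array([])
fig.colorbar(sm, ax=ax, label="day (normalized)")
fig.savefig("density_evolution.png")
```

Ridgeline ("joyplot") layouts or a 2D heatmap with time on one axis are common alternatives when the curves overlap too much.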
I have sorted data with pandas so that I have this dataframe (I work with Anaconda, Jupyter Notebook):
I plotted a histogram with "écart G-D" on the abscissa and "probabilité" on the ordinate.
I found a topic on Stack Overflow that deals with exactly what I want to do, except that it is 7 years old and the code is obsolete! I tried it anyway while correcting some things, but it does not work (besides, I do not even understand the code) ...
Here is the link of the topic:
Fitting empirical distribution to theoretical ones with Scipy (Python)?
I would like to graphically test which probability density function best follows the shape of my histogram.
If anyone could enlighten me, it would be great, because I'm really in a bind ...
Thank you.
You can fit your data manually by calculating the parameters of a distribution (mean, lambda, etc.) and use scipy to generate that distribution. Also, if your main objective is just to fit the data to a distribution and then use that distribution later, you can use other software (Stat::Fit) to find the best fit automatically and plot it on the histogram.
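Within scipy this can be done directly: every continuous distribution in scipy.stats has a .fit method (maximum likelihood), so you can fit a few candidates and compare their log-likelihoods. A sketch with synthetic data standing in for the histogrammed column; the candidate set is an arbitrary choice for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=1.5, size=2000)  # stand-in for your histogrammed column

# Fit several candidate distributions by maximum likelihood and compare log-likelihoods
candidates = {"norm": stats.norm, "laplace": stats.laplace, "cauchy": stats.cauchy}
results = {}
for name, dist in candidates.items():
    params = dist.fit(data)                       # MLE estimate of the parameters
    loglik = np.sum(dist.logpdf(data, *params))   # goodness of fit on the data
    results[name] = (loglik, params)

best = max(results, key=lambda name: results[name][0])
# Overlay the winner on the histogram, e.g.:
# x = np.linspace(data.min(), data.max(), 200)
# plt.hist(data, bins=50, density=True); plt.plot(x, candidates[best].pdf(x, *results[best][1]))
```

Comparing raw log-likelihoods is fine when the candidates have the same number of parameters; otherwise an information criterion such as AIC is the fairer comparison.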
You can use the distfit library in Python. It will determine the best theoretical distribution for your data.
I want to generate a continuous distribution (Maxwell-Boltzmann type) with Python. That is, I want to create the distribution in order to generate random values from it.
This link kinda helps:
Create a continuous distribution in python
I have no idea where to start. I know the analytical function that defines the distribution, but I don't know how to implement it. Can anyone help me?
Thanks in advance
scipy has a Maxwell-Boltzmann distribution built in, scipy.stats.maxwell; its pdf method gives the probability density function, and its rvs method draws random values.
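A short sketch of both methods; the scale value is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.stats import maxwell

a = 2.0  # scale parameter (assumed value for illustration)
rng = np.random.default_rng(0)
samples = maxwell.rvs(scale=a, size=10_000, random_state=rng)  # random draws

x = np.linspace(0.0, 10.0, 200)
density = maxwell.pdf(x, scale=a)  # probability density function on a grid
# The analytical mean of a Maxwell distribution with scale a is 2*a*sqrt(2/pi),
# which the sample mean should approach for large samples.
```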
I am using Python (the SimPy package mostly, but I think it is irrelevant to the question), modeling some systems and running simulations. For this purpose I need to produce random numbers that follow certain distributions. I have done alright so far with some distributions, like exponential and normal, by importing the random module (e.g. from random import *) and using the expovariate or normalvariate methods. However, I cannot find any method in random that produces numbers following the Erlang distribution. So:
Is there some method that I overlooked?
Do I have to import some other library?
Can I make some workaround? (I think I can use the exponential distribution to produce random "Erlang" numbers, but I am not sure how. A piece of code might help me.)
Thank you in advance!
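The workaround guessed in the question does work: an Erlang(k, lam) variate is the sum of k independent Exponential(lam) variates. A sketch using only the stdlib random module (the function name is mine):

```python
import random

def erlang_sample(k, lam):
    # Erlang(k, lam) is the sum of k independent Exponential(lam) variates
    return sum(random.expovariate(lam) for _ in range(k))

# e.g. draws with shape k=3 and rate lam=2.0 have mean k/lam = 1.5
values = [erlang_sample(3, 2.0) for _ in range(1000)]
```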
The Erlang distribution is a special case of the gamma distribution, which exists as numpy.random.gamma (reference). Just use an integer value for the k ("shape") argument. See also scipy.stats.gamma for functions giving the PDF, CDF, etc.
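A sketch using numpy's newer Generator API (rng.gamma is the Generator counterpart of numpy.random.gamma); note that numpy parameterizes gamma by shape and scale, so a rate lam becomes scale = 1/lam:

```python
import numpy as np

k, lam = 3, 2.0  # Erlang shape (integer) and rate, chosen for illustration
rng = np.random.default_rng(0)
samples = rng.gamma(shape=k, scale=1.0 / lam, size=100_000)
# the sample mean should be close to the Erlang mean k/lam = 1.5
```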
As the previous answer stated, the Erlang distribution is a special case of the gamma distribution. As far as I know, however, you do not need the numpy package: random numbers from a gamma distribution can be generated in pure Python using random.gammavariate(alpha, beta).
Usage:
import random
print(random.gammavariate(3, 1))