Draw a Bell Curve on my Distribution Sample [duplicate] - python

This question already has answers here:
Fitting a Normal distribution to 1D data
(4 answers)
Python: Visualize a normal curve on data's histogram
(1 answer)
How do I draw a histogram for a normal distribution using python matplotlib?
(3 answers)
Fit a curve to a histogram in Python
(2 answers)
How to draw a matching Bell curve over a histogram?
(1 answer)
Closed 1 year ago.
I have the following piece of code:
from pyspark.sql import DataFrame
import plotly.express as px
import matplotlib.pyplot as plt
dfPy = sqlContext.table("df")
pd = dfPy.toPandas()
pd[['col4']].plot(kind='hist', bins=[0,10,20,30,40,50,60,70,80,90,100], rwidth=0.8)
plt.show()
Running it in an Apache Zeppelin notebook produces the following result:
As can be seen, I have two issues:
How can I draw a bell curve? The distribution does not seem to be normal (Gaussian), so I suppose I should apply some data transformation first. Is that correct?
How can I now draw a bell curve on the resulting histogram?
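One possible approach, sketched here rather than taken from the question: fit a normal distribution with scipy.stats.norm.fit and plot its density over a histogram drawn with density=True, so that bars and curve share the same scale. The helper name hist_with_normal_curve and the synthetic sample standing in for col4 are assumptions made for illustration. The overlay also shows how far the data departs from a normal shape, which helps decide whether a transformation is worth trying.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats


def hist_with_normal_curve(values, bins):
    """Plot a density histogram of `values` and overlay a fitted normal pdf."""
    values = np.asarray(values, dtype=float)
    mu, sigma = stats.norm.fit(values)  # maximum-likelihood mean and std
    fig, ax = plt.subplots()
    # density=True rescales the bars so their total area is 1, like the pdf
    ax.hist(values, bins=bins, density=True, rwidth=0.8, alpha=0.6)
    xs = np.linspace(bins[0], bins[-1], 200)
    ax.plot(xs, stats.norm.pdf(xs, mu, sigma),
            label=f"normal fit (mu={mu:.1f}, sigma={sigma:.1f})")
    ax.legend()
    return fig, mu, sigma


# Hypothetical stand-in for the 'col4' column (assumption, not the real data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=15, size=1000).clip(0, 100)
fig, mu, sigma = hist_with_normal_curve(sample, bins=list(range(0, 101, 10)))
```

With the real data, the call would presumably be hist_with_normal_curve(pd['col4'], bins=[0, 10, ..., 100]) followed by plt.show().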

Related

How to add a plot to a histogram in seaborn.jointplot? [duplicate]

This question already has answers here:
How to overlay a Seaborn jointplot with a "marginal" (distribution histogram) from a different dataset
(4 answers)
How to add manually customised seaborn plots to JointGrid/jointplot
(1 answer)
Closed 3 days ago.
I have a seaborn jointplot:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
def plot():
    random_matrix = np.random.standard_normal((1000, 2))
    df = pd.DataFrame(random_matrix, columns=["x", "y"])
    print(df)
    b = 10
    plot = sns.jointplot(data=df, x="x", y="y", ratio=1, s=1, marginal_ticks=True, marginal_kws=dict(bins=b))
    plot.figure.suptitle("title")
    plot.figure.savefig("./plot.png")
That produces the following plot:
Now let's say that I have already fit a function to one of these histograms (let's say to the upper one). How could I plot this function on that histogram?
I have just started using the seaborn package, so I have no intuition for how to do that. Previously I plotted the histograms and the scatter plot separately in matplotlib and combined them into such a composition, but I want to use a more automatic tool, and seaborn seems to be the one.

Creating a horizontal chart in python [duplicate]

This question already has answers here:
Python Pylab scatter plot error bars (the error on each point is unique)
(2 answers)
Error Bars with Seaborn and Stripplot
(1 answer)
Closed 3 months ago.
I am trying to generate a horizontal chart for the dataset I have. I am trying to get a chart that looks like this using Python.
This is what my dataset looks like:
How do I get this dataset to look like a straight horizontal line with a dot representing the average?
When I use plt.plot, I get something like this:
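Since the dataset is not shown, the following is only a sketch under assumed data: one row per category with a minimum, an average and a maximum (the labels and numbers are made up). Matplotlib's errorbar with xerr draws a horizontal line per row with a dot at the average.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: one row per category with its minimum, average and maximum
labels = ["A", "B", "C"]
lows = np.array([2.0, 4.0, 1.0])
means = np.array([5.0, 6.0, 3.0])
highs = np.array([9.0, 7.0, 8.0])

fig, ax = plt.subplots()
y = np.arange(len(labels))
# xerr takes the distances from the dot to the left and right ends of the line
ax.errorbar(means, y, xerr=[means - lows, highs - means], fmt="o", capsize=4)
ax.set_yticks(y)
ax.set_yticklabels(labels)
ax.set_xlabel("value")
```

Using fmt="o" draws only the markers, so the rows are not joined by a connecting line as they would be with plt.plot.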

show grid by default [duplicate]

This question already has answers here:
How do I draw a grid onto a plot in Python?
(4 answers)
Change matplotlib grid color with rcParams
(2 answers)
Closed 1 year ago.
I am using the matplotlib.pyplot package in a Jupyter Notebook and for each separate plot I am turning on the grid:
import matplotlib.pyplot as plt
plt.plot(x, y)
plt.grid()
Is there a way to show the grid by default, throughout the whole Notebook?
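One way that should work is to set the axes.grid entry of matplotlib's rcParams once, for example in the first cell of the notebook. A short sketch, with made-up data for the plot:

```python
import matplotlib.pyplot as plt

# Set once, e.g. in the first notebook cell; applies to every Axes created afterwards
plt.rcParams["axes.grid"] = True

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # the grid is drawn without calling plt.grid()
```

The setting lasts for the current session only; plt.rcdefaults() restores matplotlib's built-in defaults.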

How can I get the "shape" of some data so I can generate similar random numbers in numpy/scipy [duplicate]

This question already has answers here:
Fitting empirical distribution to theoretical ones with Scipy (Python)?
(13 answers)
Python: Generate random values from empirical distribution
(1 answer)
Closed 2 years ago.
Apologies. I know what I want to do, but am not sure what it is called and so haven't been able to search for it.
I am chasing down some anomalies in data (two reports which should add to the same total based on about 50K readings differ slightly). I therefore want to generate some random data which is the same "shape" as the data in question in order to determine whether this might be down to rounding error.
Is there a way of analysing the existing 50K or so numbers and then generating random numbers which would look pretty much the same shape on a histogram? My presumption is that numpy is probably the best tool for this, but I am open to advice.
You can use scipy's stats package to do this, if I'm interpreting your question correctly:
First, we generate a histogram of the data and build a distribution from it using scipy.stats.rv_histogram:
import scipy.stats
import numpy as np
import matplotlib.pyplot as plt
data = scipy.stats.norm.rvs(size=50000, loc=0)
hist = np.histogram(data, bins=100)
dist = scipy.stats.rv_histogram(hist)
To generate new data from this histogram, we simply call the rvs() method on the dist variable:
fake_data = dist.rvs(size=50000)
Then, we show the two distributions to prove we are getting what we expect:
plt.figure()
plt.hist(data,bins=100, alpha=0.5, label='real data')
plt.hist(fake_data,bins=100, alpha=0.5, label='fake data')
plt.legend(loc='upper right')
plt.show()
Hopefully this is what you're looking to do.
The magic words are "inverse transform sampling" (you can generate the CDF from your histogram distribution). See this nice tutorial: https://usmanwardag.github.io/python/astronomy/2016/07/10/inverse-transform-sampling-with-python.html
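As a rough illustration of inverse transform sampling with NumPy alone: build the empirical CDF at the histogram's bin edges, draw uniform numbers, and map them back through the inverted CDF with np.interp. The function name sample_from_histogram and the synthetic normal data are assumptions for the example, not part of the answers above.

```python
import numpy as np


def sample_from_histogram(data, bins, size, rng):
    """Draw `size` values whose histogram follows that of `data`."""
    counts, edges = np.histogram(data, bins=bins)
    # Empirical CDF evaluated at the bin edges: starts at 0, ends at 1
    cdf = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))
    u = rng.random(size)  # uniform draws on [0, 1)
    # Invert the piecewise-linear CDF: map each u back to a data value
    return np.interp(u, cdf, edges)


rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=50_000)  # stand-in for the real readings
fake_data = sample_from_histogram(data, bins=100, size=50_000, rng=rng)
```

Values are spread uniformly within each bin, so detail finer than the bin width is lost; more bins keep more of the original shape.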

Is it possible to plot multiple histograms in a notebook the same way TensorBoard does? [duplicate]

This question already has answers here:
frequency trail in matplotlib
(2 answers)
Demo of Joypy (joyplots in python) not working?
(1 answer)
How do I visually stack multiple line graphs above each other in python?
(1 answer)
Closed 4 years ago.
Given some data frame we can easily plot the histogram for each column in a notebook like so:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.hist()
But instead of having one plot per column, is it possible to draw them all in one plot, the same way TensorBoard does?
Is a 3D histogram the best we can do? Maybe using something other than matplotlib?
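One option that stays within matplotlib is a stacked (ridgeline-style) plot, where each column's histogram outline is shifted upwards by a fixed offset. This is only a sketch of that idea and not TensorBoard's own rendering; the shared bin edges and the scaling of each outline to unit height are choices made for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))

edges = np.linspace(0, 100, 21)  # shared bins so the rows are comparable
centers = (edges[:-1] + edges[1:]) / 2

fig, ax = plt.subplots()
for i, col in enumerate(df.columns):
    counts, _ = np.histogram(df[col], bins=edges)
    height = counts / counts.max()  # scale each outline to unit height
    # Shift each column up by i; lower rows get a higher zorder so they stay in front
    ax.fill_between(centers, i, i + height, alpha=0.7, zorder=len(df.columns) - i)
ax.set_yticks(range(len(df.columns)))
ax.set_yticklabels(df.columns)
```

Reducing the vertical offset below 1 makes the rows overlap more, which is closer to the layered look of TensorBoard's histogram view.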
