Plot a density histogram with Plotly - python

I'm looking for a way to plot a density histogram with Plotly, like density=True in a numpy histogram. My variable is continuous, ranging from 0 to 20. I already have counts on the y-axis with bins, so I'm looking to replace these counts with percentages (or densities).

Try the layout option, which formats the y-axis tick labels as percentages:
layout = go.Layout(yaxis=dict(tickformat=".2%"))
This question looks like a near-duplicate of an existing one.

Try this:
go.Histogram(x=some_vec, histnorm="probability density")
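To see what histnorm="probability density" computes, the same normalization can be reproduced with numpy. This is only a minimal sketch under assumptions: the sample some_vec is synthetic (uniform on 0 to 20) and the 40 bins are an arbitrary choice. Plotly picks its own bins unless told otherwise, so bar heights only agree when the bin edges agree.

```python
import numpy as np

rng = np.random.default_rng(0)
some_vec = rng.uniform(0, 20, size=1000)  # hypothetical data in [0, 20)

counts, edges = np.histogram(some_vec, bins=40, range=(0, 20))
widths = np.diff(edges)

# Probability per bin: fraction of samples in each bin (heights sum to 1).
probability = counts / counts.sum()

# Probability density: fraction divided by bin width (total area is 1).
density = counts / (counts.sum() * widths)

# numpy's density=True yields the same values as the manual formula.
np_density, _ = np.histogram(some_vec, bins=40, range=(0, 20), density=True)
```

In Plotly terms, histnorm="probability" (or "percent") corresponds to the per-bin fraction, while histnorm="probability density" also divides by the bin width.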

Related

Seaborn distplot() won't display frequency in the y-axis

I am trying to display the weighted frequency on the y-axis of a seaborn.distplot() graph, but it keeps displaying the density (which is the default in distplot()).
I read the documentation and also many similar questions here on Stack Overflow.
The common answer is to set norm_hist=False and to pass the weights as a numpy array, as in a standard histogram. However, it keeps showing the density and not the probability/frequency of each bin.
My code is
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

plt.figure(figsize=(10, 4))
plt.xlim(-0.145, 0.145)
plt.axvline(0, color='grey')
data = df['col1']
# Reference Gaussian with the same mean and standard deviation as the data
x = np.random.normal(data.mean(), scale=data.std(), size=100000)
normal_dist = sns.distplot(x, hist=False, color="red", label="Gaussian")
data_viz = sns.distplot(data, color="blue", bins=31, label="data", norm_hist=False)
# I also tried adding the weights inside the argument
# hist_kws={'weights': np.ones(len(data))/len(data)})
plt.legend(bbox_to_anchor=(1, 1), loc=1)
And I keep receiving this output:
Does anyone have an idea of what could be the problem here?
Thanks!
[EDIT]: The problem is that the y-axis is showing the KDE values and not those from the weighted histogram. If I set kde=False, then the frequency is displayed on the y-axis. However, I still want to keep the KDE, so I am not considering that option.
Keeping the KDE and the frequency/count on a single y-axis in one plot will not work because they have different scales. It is better to create a plot with two y-axes, one showing the KDE and the other the histogram.
From the documentation for norm_hist: "If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted."
versusnja in https://github.com/mwaskom/seaborn/issues/479 has a workaround:
# Plot hist without kde.
# Create another Y axis.
# Plot kde without hist on the second Y axis.
# Remove Y ticks from the second axis.
first_ax = sns.distplot(data, kde=False)
second_ax = first_ax.twinx()
sns.distplot(data, ax=second_ax, kde=True, hist=False)
second_ax.set_yticks([])
If you need this just for visualization it should be good enough.
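The same two-axis idea can be sketched with matplotlib alone, which avoids relying on distplot (deprecated in recent seaborn releases). Everything below is an assumption for illustration: the data is synthetic rather than df['col1'], and the KDE is a hand-rolled Gaussian kernel estimate with a normal-reference rule-of-thumb bandwidth, not seaborn's own estimator.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(0.0, 0.03, size=500)  # hypothetical stand-in for df['col1']

fig, first_ax = plt.subplots(figsize=(10, 4))

# Left axis: each sample gets weight 1/N, so bar heights are the
# fraction of samples falling in each bin.
weights = np.ones(len(data)) / len(data)
heights, edges, _ = first_ax.hist(data, bins=31, weights=weights,
                                  color="blue", alpha=0.5, label="data")
first_ax.set_ylabel("fraction of samples")

# Right axis: Gaussian KDE evaluated on a grid (rule-of-thumb bandwidth).
second_ax = first_ax.twinx()
grid = np.linspace(data.min(), data.max(), 200)
h = 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)
z = (grid[:, None] - data[None, :]) / h
kde = np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))
second_ax.plot(grid, kde, color="red", label="KDE")
second_ax.set_yticks([])  # hide the density scale, as in the workaround
```

The left axis then reads as per-bin frequency, while the curve on the right axis only conveys shape.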

How do I use matplotlib's hist with x and y values using bins=40 in Python 3?

I have a large list of data points with x and y values that I need to put into a histogram with 40 bins, but matplotlib's hist only lets me enter one variable with bins. I've tried hist2d as well, but the result is not very clean. Any help would be appreciated!
Since you have data points x and y, you can pass both to hist as a list to plot them in one histogram.
The following code creates the histogram:
plt.hist([x,y],bins=40, histtype='step',fill=True)
plt.show()
The histogram will look like the following:
You can also change the style or add a title and labels. Here is another histogram with unfilled bars.
If you still run into problems, let me know.
Maybe you can use the matplotlib library for this:
The result is like superimposing two histograms on top of each other.
The code below plots histograms of y_train and regressor.predict(X_train) on the same axes.
Modify the variables to suit your data.
import matplotlib.pyplot as plt
plt.hist(y_train, stacked=True,bins=40, label='Actual', alpha=0.5)
plt.hist(regressor.predict(X_train),bins=40, stacked=True, label='Predicted', alpha=0.5)
plt.legend(loc='best')
plt.show()
Hope this helps!
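One caveat with two separate hist calls that each use bins=40: every call derives its own bin edges from its own data range, so the bars of the two histograms may not line up. A minimal sketch of one way around this, assuming synthetic x and y, is to compute a single set of edges from the combined data and pass it to both calls.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, size=2000)  # hypothetical data
y = rng.normal(1, 2, size=2000)  # hypothetical data

# Shared edges: 40 bins spanning the combined range of x and y.
edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=40)

fig, ax = plt.subplots()
nx, _, _ = ax.hist(x, bins=edges, alpha=0.5, label="x")
ny, _, _ = ax.hist(y, bins=edges, alpha=0.5, label="y")
ax.legend(loc="best")
```

Because both histograms share the same edges, their bar heights can be compared bin by bin.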

Matplotlib - Plot histogram truncate bar

I am plotting a histogram of observed values from a population against a normal distribution (derived from the mean and std of the sample). The sample has an unusually large number of observations with value 0 (not to be confused with "NAN"). As a result, the graph of the two does not show clearly.
How can I best truncate the one bar in the histogram to allow the rest of the plot to fill the frame?
Why don't you set the upper y-limit to 0.00004? The tall bar is then clipped and the rest of the plot is easier to read.
axes = plt.gca()
axes.set_xlim([xmin,xmax])
axes.set_ylim([ymin,ymax])
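Rather than hard-coding the limit, it could also be derived from the bar heights. The sketch below is an illustration under assumptions: the sample is synthetic (a normal population plus a spike of exact zeros), and capping the axis at 1.1 times the second-tallest bar is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical sample: a normal population plus a large spike of exact zeros.
sample = np.concatenate([rng.normal(50_000, 15_000, size=1000), np.zeros(3000)])

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(sample, bins=50, density=True)

# Cap the y-axis slightly above the second-tallest bar; the tallest bar
# (the zero spike) is clipped at the top of the frame.
second_tallest = np.sort(heights)[-2]
ax.set_ylim(0, 1.1 * second_tallest)
```

The clipped bar no longer shows its true height, so it is worth noting the real value in a caption or annotation.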

Seaborn : How to get the count in y axis for distplot using PairGrid

I'm using PairGrid, but I don't understand what the y-axis means for distplot. I thought it represented a count, but it starts from negative values in the PairGrid. If I make only the distplot, I get the count.
In case that isn't clear, here are some plots:
My PairGrid:
My distplot:
The distplot is the same as the plot in the top left corner of the PairGrid.
The code corresponding to this is:
sns.distplot(pd.DataFrame(mySerie), kde=False)
and for the PairGrid:
g = sns.PairGrid(myDataFrame)
g = g.map_diag(sns.distplot, kde=False)
g = g.map_offdiag(plt.scatter)
Thank you in advance
You can use both methods to see different trends in the data with respect to the range of values and the total count.
See the plots below for what I was working on when I came across your question (sorry, I can't share the data itself because it is too big).
With kde=False, I can see that there are twice as many Yes as No in the total count.
With kde=True, I can instead see that No is predominant at lower ranges of values, and even higher in percentage terms than the Yes category.
Hope this helps.
[Figures: two plots with kde=False, followed by two plots with kde=True]
It was my understanding (although I could be mistaken) that the y-axis in your histograms is the fraction of the total counts. For example, in my distplot, roughly 0.08 or 8% of rows are in the 0-5 bin.
My Distplot

Make seaborn swarmplot width the same as the violin

I am trying to plot a swarmplot on top of a violin plot.
Is there any way to shorten the swarm width, like the width option of the violin plot?
Would it be easier to use matplotlib's scatter instead of seaborn.swarmplot?
import pandas as pd
import seaborn as sns
data = pd.read_csv('allparticles.csv')
b = sns.swarmplot(x="capsid", y="dT",hue="media",data=dataT,dodge=True,size=8)
c = sns.violinplot(x="capsid", y="dT",hue="media",inner="box",data=data ,width=0.3)
This results in something like this:
I would like to make the swarmplot slimmer to match the violins.
My only other idea is to get the x min and max from the violin and plot it using matplotlib.
Thank you.
The point of the swarm plot is to displace the points so that they don't overlap. You can see for example in the WT swarm plot that the width of the swarm is determined by the number of points which are close together, plus the width of each point. If you want the plots to be slimmer, you will have to make the points smaller. You can do this using the size parameter of sns.swarmplot.
The swarm can be made narrower by reducing the size of the points so that it matches the violin plot.
Try to vary the value of the size parameter:
b=sns.swarmplot(x="capsid",y="dT",hue="media",data=dataT,dodge=True,size=3,color="0.25")
