Relatively new Python user here.
I'm trying to build a histogram of stock index annual returns.
My goal is a graph that looks like this:
Can someone help me work out how to get this sort of stacked bars in the histogram, with labels for the years?
So far I have the following:
import matplotlib.pyplot as plt
import numpy as np

# `returns` is the sequence of annual returns
plt.figure()
plt.hist(returns, bins=[-0.6,-0.5,-0.4,-0.3,-0.2,-0.1,0,0.1,0.2,0.3,0.4,0.5,0.6,0.7])
plt.xticks(np.arange(-0.6, 0.7, 0.1))
plt.xlabel("Returns DJIA in %")
plt.ylabel("Number of observations (Probability in %)")
Thanks in advance!
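One possible approach, as a rough sketch rather than a definitive answer: instead of plt.hist, work out which bin each year falls into and draw one unit-height box per year, stacked within its bin, with the year written inside. The `returns_by_year` dictionary and the `stack_years` helper below are hypothetical names with made-up numbers (not real DJIA figures), and returns outside the bin range are not handled.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data: year -> annual return as a fraction (made up).
returns_by_year = {2001: -0.07, 2002: -0.17, 2003: 0.25, 2004: 0.03,
                   2005: -0.01, 2006: 0.16, 2007: 0.06, 2008: -0.34}

# Bin edges -0.6, -0.5, ..., 0.7, rounded to avoid floating-point drift.
edges = np.round(np.arange(-6, 8) * 0.1, 1)


def stack_years(returns_by_year, edges):
    """Group years by the index of the bin their return falls into."""
    stacks = {}
    for year, ret in sorted(returns_by_year.items()):
        # Index i of the bin [edges[i], edges[i + 1]) that contains ret.
        i = int(np.searchsorted(edges, ret, side="right")) - 1
        stacks.setdefault(i, []).append(year)
    return stacks


stacks = stack_years(returns_by_year, edges)

fig, ax = plt.subplots()
for i, years in stacks.items():
    for level, year in enumerate(years):
        # One unit-height box per year, stacked upward within the bin.
        ax.bar(edges[i], 1, width=0.1, bottom=level, align="edge",
               edgecolor="black", color="lightsteelblue")
        ax.text(edges[i] + 0.05, level + 0.5, str(year),
                ha="center", va="center", fontsize=8)
ax.set_xticks(edges)
ax.set_xlabel("Annual return")
ax.set_ylabel("Number of observations")
```

The height of each stack is still the count for that bin, so the outline matches what plt.hist would draw with the same bin edges.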
I am trying to use the sunburst plot from Plotly to display the average value of each group and subgroup rather than the sum, because my data points are rates.
Is there a way to get the plot to show the averages?
This is the plot I got:
I checked whether it is displaying sums using groupby:
But I want the values to be these:
I have a dataset with 17 features and 14k observations.
I would like to plot the price distribution to get a better understanding of it. The price feature has a float64 data type.
Plotting the price distribution gives me the following:
The distribution looks like this:
Why does this plot look like this? Is something wrong with my data? What's the proper way to fix this?
Code:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(9, 5))
data['sale_price'].hist(bins=50, ax=ax)
ax.set_xlabel('Price')
ax.set_title('Distribution of prices')
ax.set_ylabel('Number of houses')
It seems your histogram is heavily long-tailed: you have prices up to 3*1e7, while the majority of your data are much smaller, on the order of 1e6. With bins=50 the bins are equal-width across the full range, so the first bin ends up containing almost all of the data. Possible treatments:
Use logarithmic bins (see this post)
Choose bins according to the 0-75% quantiles
Note, however, that the second option creates an ugly accumulation of counts at the right tail of the histogram, which may not be desired. Still, it depends on the data. I'd use a logarithmic histogram for house prices; I think it makes more sense for visualization.
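To illustrate the logarithmic-bins suggestion, here is a minimal sketch on synthetic data. The `prices` array is made up (log-normal, chosen only because it has a long right tail) and stands in for the sale_price column; the bin count of 50 is taken from the question.

```python
import numpy as np

# Synthetic, made-up prices: most values are small, with a long right tail.
rng = np.random.default_rng(0)
prices = rng.lognormal(mean=13.0, sigma=1.0, size=14_000)

# 50 equal-width bins: the tail stretches the range, so a single bin
# near the left edge collects a large share of the observations.
linear_counts, linear_edges = np.histogram(prices, bins=50)

# 50 logarithmically spaced bins between the smallest and largest price.
log_edges = np.geomspace(prices.min(), prices.max(), 51)
log_counts, _ = np.histogram(prices, bins=log_edges)

print("share in the fullest linear bin:", linear_counts.max() / prices.size)
print("share in the fullest log bin:   ", log_counts.max() / prices.size)
```

For the plot itself, the same edges can be passed as bins=log_edges to hist, followed by ax.set_xscale("log") so the bars appear with equal width. For the quantile option, np.quantile(prices, 0.75) would give the upper cut-off.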
I am plotting a histogram of observed values from a population against a normal distribution (derived from the mean and std of the sample). The sample has an unusually large number of observations with value 0 (not to be confused with "NaN"). As a result, the graph of the two does not show clearly.
How can I best truncate the one bar in the histogram to allow the rest of the plot to fill the frame?
Why not set the upper y-limit to 0.00004? Then the rest of the plot is easier to read.
axes = plt.gca()
axes.set_xlim([xmin, xmax])  # xmin, xmax: the x-range you want to show
axes.set_ylim([ymin, ymax])  # e.g. ymin=0, ymax=0.00004
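If you would rather not hard-code the limit, one option is to derive it from the bar heights. This is only a sketch on made-up data (a normal sample with extra zeros mixed in); the 1.2 padding factor is an arbitrary choice, and it assumes the spike is the single tallest bar.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample: normally distributed values plus an excess of exact zeros.
rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(50_000, 10_000, 2_000), np.zeros(1_500)])

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(sample, bins=40, density=True)

# Cap the y-axis just above the second-tallest bar so the spike at 0
# is clipped and the remaining bars fill the frame.
second_tallest = np.sort(heights)[-2]
ax.set_ylim(0, 1.2 * second_tallest)
```

The clipped bar still runs off the top of the frame, so it is worth mentioning the truncation in the caption or annotating the bar with its true height.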
I am trying to visualize some data using seaborn. I am using a catplot set to be a bar plot, with the error bars showing the standard deviation. I want to know which values it is using for the mean and standard deviation in the visualization, but I do not know how to retrieve that information from the plot. How would I go about getting it?
bar_graph = seaborn.catplot(x="x", y="y", hue="z", data=data, ci="sd", capsize=0.1, kind="bar")
Getting that data from the plot generated by seaborn would not be impossible, but it would be very cumbersome, as seaborn does not return the artists that it creates and catplot() can generate a number of subplots, etc.
However, I expect you don't need to get the data from the plot; you can get it directly from the dataframe. This simple demonstration shows that the plot and the calculated values match:
import seaborn as sns

titanic = sns.load_dataset("titanic")
sns.catplot(x='sex', y='age', hue="class", data=titanic, ci="sd", capsize=0.1, kind="bar")
titanic.groupby(['sex', 'class'])['age'].describe()[['mean', 'std']]
                    mean        std
sex    class
female First   34.611765  13.612052
       Second  28.722973  12.872702
       Third   21.750000  12.729964
male   First   41.281386  15.139570
       Second  30.740707  14.793894
       Third   26.507589  12.159514
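Applied to the column names from the question (x, y, z), the same idea might look like the sketch below. The `data` frame here is a small made-up stand-in. Note that pandas computes the sample standard deviation (ddof=1) by default; if the error bars differ slightly from these numbers, a different ddof convention in the plotting library is one possible explanation.

```python
import pandas as pd

# Small made-up dataset with the same column roles as in the question:
# x = category on the axis, z = hue, y = plotted value.
data = pd.DataFrame({
    "x": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "z": ["u", "u", "v", "v", "u", "u", "v", "v"],
    "y": [1.0, 3.0, 2.0, 6.0, 10.0, 14.0, 5.0, 5.0],
})

# One row per bar: mean is the bar height, std is the sample
# standard deviation of the values behind that bar.
stats = data.groupby(["x", "z"])["y"].agg(["mean", "std"]).reset_index()
print(stats)
```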
I'm looking for a way to plot a density histogram with Plotly, like density=True in a NumPy histogram. My variable is continuous, from 0 to 20. I already have counts on the y-axis with bins, so I'm looking to replace these counts with percentages (or density).
Try the layout option below, which formats the y-axis tick labels as percentages:
import plotly.graph_objects as go
layout = go.Layout(yaxis=dict(tickformat=".2%"))
There is a near-duplicate of this question here
Try this:
go.Histogram(x=some_vec, histnorm="probability density")
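For reference, this is what the two normalisations compute, sketched with NumPy only so the numbers can be checked by hand. The `values` array is made up to match the 0 to 20 range from the question. As far as I know, histnorm="probability" corresponds to the first quantity and histnorm="probability density" to the second.

```python
import numpy as np

# Made-up continuous variable between 0 and 20, as described in the question.
rng = np.random.default_rng(2)
values = rng.uniform(0, 20, size=1_000)

counts, edges = np.histogram(values, bins=10, range=(0, 20))
widths = np.diff(edges)

# Probability: each bar is the fraction of observations, heights sum to 1.
probability = counts / counts.sum()

# Probability density: fraction divided by bin width, bar areas sum to 1.
density = counts / (counts.sum() * widths)

# The same numbers NumPy returns with density=True.
numpy_density, _ = np.histogram(values, bins=10, range=(0, 20), density=True)
```

If you want percentages on the axis, the probability version combined with a percent tick format is probably the closest match; the density version only sums to 1 after multiplying by the bin widths.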