pandas - draw a distribution of a column - python

The DataFrame is as follows:
I'd like to plot the population of each geo_name, but when I use the following command:
df.hist(column='population')
The histogram does not show one bar for each geo_name's population:
For example, the two tallest bars should be Ont. and Que., but instead there is only one bar, which is much higher than the others.
What's wrong, and how can I fix it?

I think you're looking for a bar chart of populations, one bar per province, with provinces arranged along the horizontal axis. If so, try this:
df['population'].plot(kind='bar')
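The difference between the two calls can be seen with a small sketch. The data below is made up for illustration, and it assumes geo_name is an ordinary column that we move into the index so the bars get labelled with province names. df.hist bins the population *values* and counts how many rows fall into each range, which is why several small provinces pile up into one tall bar; a bar chart instead draws one bar per row.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: illustrative populations (in millions), not real figures.
df = pd.DataFrame({
    "geo_name": ["Ont.", "Que.", "B.C.", "Alta.", "Man.", "Sask."],
    "population": [14.0, 8.5, 5.0, 4.3, 1.4, 1.2],
})

# What df.hist does: split the value range into bins and count rows per bin.
counts, edges = np.histogram(df["population"], bins=10)
print(counts)

# What you want: one bar per province, labelled by the index.
ax = df.set_index("geo_name")["population"].plot(kind="bar", rot=0)
ax.set_ylabel("population")
plt.tight_layout()
```

If geo_name is already the index, the set_index step can be dropped and the call reduces to the one in the answer.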

Related

Seaborn stacked displot with three variables

I am wondering whether there is a way to visualise three variables using a stacked displot. Two are relatively straightforward with the x and hue arguments. For example:
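import pandas as pd
import seaborn as sns

sns.set(font_scale=2)  # set the theme before plotting so it applies to this figure
tt = pd.read_csv("titanic-data.csv")
tt['Pclass'] = tt['Pclass'].astype(str)  # treat passenger class as categorical
sns.displot(x="Pclass", hue="Embarked", multiple="stack", data=tt, height=5, aspect=1)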
This results in the following:
Now I would like to add whether a passenger survived, for example by splitting each Pclass bar in two as in the following sketch, where the bottom-left rectangle of each category would be class 0 (did not survive) and the top-right one class 1 (survived).
Can anyone advise how to implement this, or suggest another sensible way of visualising three variables?
Many thanks.
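One possible approach, sketched here with pandas rather than seaborn, is to count passengers for every (Pclass, Survived) pair split by Embarked, then draw a stacked bar per pair so each class gets two adjacent bars. The small DataFrame below is a hypothetical stand-in for the Titanic columns; with the real file you would use tt from read_csv instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the three Titanic columns used in the question.
tt = pd.DataFrame({
    "Pclass":   ["1", "1", "2", "2", "3", "3", "3", "1"],
    "Survived": [0, 1, 1, 0, 0, 0, 1, 1],
    "Embarked": ["S", "C", "S", "Q", "S", "Q", "C", "S"],
})

# Rows: (Pclass, Survived) pairs; columns: Embarked ports; cells: passenger counts.
counts = pd.crosstab([tt["Pclass"], tt["Survived"]], tt["Embarked"])

# One stacked bar per (Pclass, Survived) pair; the stack shows the Embarked split.
ax = counts.plot(kind="bar", stacked=True, rot=0)
ax.set_xlabel("(Pclass, Survived)")
ax.set_ylabel("count")
plt.tight_layout()
```

Alternatively, staying in seaborn, displot accepts a col argument, so adding col="Survived" to the original call draws one stacked panel per survival outcome instead of splitting the bars.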

Python plotly order facet_wrap by specific facet

I have a simple facet_wrap bar plot generated with Python Plotly that looks like the attached image.
Is it possible to order the x-axis by a facet other than the last one? The pandas DataFrame is sorted by the y-axis value (which is what I want), but I would like the order to follow the second-to-last facet specifically (so that it looks like the last facet does in the current plot), while keeping the current order of the facets.
Sample code is below. It automatically sorts the x-axis according to the bottom facet, which is "DEN_Tumour_WD" in this case.
import pandas as pd
import plotly.express as px

toPlot = pd.DataFrame(allModel)  # allModel: the asker's data (not shown)
toPlot = toPlot.sort_values(by=['Flux Ratio (log-scaled)'])
fig = px.bar(toPlot,
             x='Reaction',
             y='Flux Ratio (log-scaled)',
             template='none',
             facet_row="Model",
             color='Subsystem',
             category_orders={"Model": ["nonDEN_Liver_CD",
                                        "nonDEN_Liver_WD",
                                        "DEN_Liver_CD",
                                        "DEN_AdjLiver_WD",
                                        "DEN_Tumour_WD"]})
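One way to do this, sketched under the assumption that all facet rows share the x-axis as in the question, is to compute the Reaction order from the rows of the one facet you care about and pass it through category_orders, which px.bar already accepts for "Model". The helper names reaction_order and make_figure and the toy data are hypothetical; plotly is imported inside make_figure only so the ordering logic can be tried on its own.

```python
import pandas as pd

MODEL_ORDER = ["nonDEN_Liver_CD", "nonDEN_Liver_WD", "DEN_Liver_CD",
               "DEN_AdjLiver_WD", "DEN_Tumour_WD"]

def reaction_order(df, model):
    """Return the Reaction values sorted by flux within a single facet (hypothetical helper)."""
    subset = df[df["Model"] == model]
    return subset.sort_values("Flux Ratio (log-scaled)")["Reaction"].tolist()

def make_figure(df, sort_by_model="DEN_AdjLiver_WD"):
    import plotly.express as px  # imported here so reaction_order works without plotly
    return px.bar(
        df,
        x="Reaction",
        y="Flux Ratio (log-scaled)",
        template="none",
        facet_row="Model",
        color="Subsystem",
        # category_orders fixes the facet order and, via "Reaction", the x-axis order
        category_orders={"Model": MODEL_ORDER,
                         "Reaction": reaction_order(df, sort_by_model)},
    )

# Hypothetical toy data: two facets, three reactions.
toy = pd.DataFrame({
    "Model": ["DEN_AdjLiver_WD"] * 3 + ["DEN_Tumour_WD"] * 3,
    "Reaction": ["R1", "R2", "R3"] * 2,
    "Flux Ratio (log-scaled)": [0.5, -1.0, 2.0, 3.0, 1.0, -2.0],
    "Subsystem": ["A", "B", "A"] * 2,
})
print(reaction_order(toy, "DEN_AdjLiver_WD"))
```

With the asker's data this would be called as make_figure(toPlot); the global sort_values step is then no longer what determines the x-axis order.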

How to use column values for x axis labels in matplotlib

I have a basic DataFrame in pandas and am using matplotlib to create a chart.
I have followed advice found on SO and in the docs for labelling the values on the x-axis, but the labels won't change from the index values.
I have this:
Presc_df_asc = Presc_df.sort_values('Total Items',ascending=True)
Presc_df_asc['Total Items'].plot.bar(x="Practice", ylim=[Presc_df_asc['Total Items'].min(), Presc_df_asc['Total Items'].max()])
plt.xlabel('Practice')
plt.ylabel('Total Items')
plt.title('practice total items')
plt.legend(('Items',),loc='upper center')
From what I have found, plot.bar(x="Practice") should set the x-axis to show the values in the Practice column under each bar.
But no matter what I try, the x-axis ticks show the index values, and only the axis title says 'Practice'.
For the plotting command to be able to access the "Practice" column, you need to call the plot function on the entire DataFrame (or a sub-DataFrame that contains at least these two columns). The code below puts the corresponding Practice label under each bar. The rot=0 argument prevents the labels from being rotated by 90°.
Presc_df_asc.plot.bar(x="Practice", y ="Total Items",
ylim=[Presc_df_asc['Total Items'].min(),
Presc_df_asc['Total Items'].max()], rot=0)
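A self-contained version of the fix, using a small made-up Presc_df in place of the asker's data, shows the Practice values ending up as tick labels once the plot is called on the DataFrame rather than on the 'Total Items' Series. The ylim argument is left out here only to keep the sketch short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for Presc_df.
Presc_df = pd.DataFrame({
    "Practice": ["P1", "P2", "P3"],
    "Total Items": [120, 300, 80],
})
Presc_df_asc = Presc_df.sort_values("Total Items", ascending=True)

# Called on the DataFrame, plot.bar can look up the "Practice" column for the x-axis.
# Called on Presc_df_asc['Total Items'] (a Series), there is no such column to use.
ax = Presc_df_asc.plot.bar(x="Practice", y="Total Items", rot=0, legend=False)
ax.set_ylabel("Total Items")
ax.set_title("practice total items")

ax.figure.canvas.draw()  # render once so the tick label text is populated
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```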

Plot a density histogram with Plotly

I'm looking for a way to plot a density histogram with Plotly, equivalent to density=True in a NumPy histogram. My variable is continuous, ranging from 0 to 20. I already have counts per bin on the y-axis, so I'm looking to replace these counts with percentages (or density).
Try using the layout option:
import plotly.graph_objects as go
layout = go.Layout(yaxis=dict(tickformat=".2%"))  # formats y tick labels as percentages; the values must already be fractions
A similar question has been asked here.
Try this:
import plotly.graph_objects as go
go.Histogram(x=some_vec, histnorm="probability density")  # some_vec: your data array
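To see what these normalisations compute, here is a NumPy-only sketch on made-up uniform data in [0, 20]. Per Plotly's histnorm documentation, "percent" makes the bar heights sum to 100 and "probability density" makes the bar areas sum to 1, which is the same normalisation as NumPy's density=True. Note that Plotly's plain "density" option divides by bin width but not by the sample count, so it is not the NumPy equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)
some_vec = rng.uniform(0, 20, size=1000)  # hypothetical continuous data in [0, 20]

counts, edges = np.histogram(some_vec, bins=20, range=(0, 20))
widths = np.diff(edges)

# Like histnorm="percent": share of samples per bin, heights sum to 100.
percent = 100 * counts / counts.sum()

# Like histnorm="probability density": count / (N * bin width), areas sum to 1.
density = counts / (counts.sum() * widths)

# NumPy's density=True should give the same values.
np_density, _ = np.histogram(some_vec, bins=20, range=(0, 20), density=True)
print(np.allclose(density, np_density), percent.sum(), (density * widths).sum())
```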

Extra set of bars on plot in Pandas?

I want to create a plot using Pandas to show the standard deviations of item prices on specific week days (in my case there are 6 relevant days of the week, each shown as 0-5 on the x axis).
It seems to work; however, next to each standard deviation bar there is another, smaller bar, and these smaller bars are literally valued 0-5.
I think this means I'm accidentally also plotting the day of the week.
How can I get rid of these smaller bars and only show the standard deviation bars?
sales_std=sales_std[['WeekDay','price']].groupby(['WeekDay']).std().reset_index()
Here is where I try to plot the graph:
p = sales_std.plot(figsize=(15,5), legend=False, kind="bar", rot=45, color="orange",
                   fontsize=16, yerr=sales_std);
p.set_title("Standard Deviation", fontsize=18);
p.set_xlabel("WeekDay", fontsize=18);
p.set_ylabel("Price", fontsize=18);
p.set_ylim(0,100);
Resulting Bar Plot:
You are plotting both WeekDay and price at the same time (i.e. plotting an entire DataFrame). To show bars for price only, you need to plot a Series with WeekDay as its index (so no reset_index() is required after groupby()).
# you don't need `reset_index()` in your code
sales_std=sales_std[['WeekDay','price']].groupby(['WeekDay']).std()
sales_std['price'].plot(kind='bar')
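A runnable version of the answer, on a small hypothetical sales table with WeekDay coded 0-5, shows both cases side by side: plotting the price Series gives one bar per weekday, while plotting the reset_index() DataFrame adds a second bar per day whose height is the WeekDay value itself, matching the extra 0-5 bars in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales data: WeekDay coded 0-5, one price per sale.
sales = pd.DataFrame({
    "WeekDay": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "price":   [10, 30, 20, 24, 5, 15, 40, 60, 12, 18, 7, 9],
})

# groupby keeps WeekDay as the index, so only the price column remains.
sales_std = sales.groupby("WeekDay")[["price"]].std()

# Plotting the Series draws exactly one bar per WeekDay.
ax = sales_std["price"].plot(kind="bar", color="orange", rot=0)
ax.set_xlabel("WeekDay")
ax.set_ylabel("Price")

# For contrast: after reset_index(), WeekDay is a column and gets its own bars too.
fig, ax_wrong = plt.subplots()
sales_std.reset_index().plot(kind="bar", ax=ax_wrong)
```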
Note: I intentionally omitted graph-styling parts of your code to focus on fixing the issue.
