I have the following dataframe:
   Color  Level  Proportion
0   Blue      1         0.1
1   Blue      2         0.3
2   Blue      3         0.6
3    Red      1         0.2
4    Red      2         0.5
5    Red      3         0.3
Here I have 2 color categories, each with 3 levels. Each entry has a proportion, and the proportions sum to 1 within each color category. I want to make a stacked bar chart from this dataframe with 2 stacked bars, one per color category. Each bar is divided into segments by the proportion of each level, so while the bars are stacked differently, every complete bar has the same total length of 1.
I have tried this:
df.plot(kind='bar', stacked=True)
I then get this stacked bar chart, which is not what I want:
I want 2 stacked bars, one for "Blue" and one for "Red", each stacked by the proportions, with the color of each segment corresponding to a level. Both bars would then have length 1 along the x-axis, which would be labelled "proportion". How can I fix my code to create this stacked bar chart?
Make a pivot and then plot the pivoted frame (pivot returns a new dataframe, so plotting df itself would ignore it):
pivoted = df.pivot(index='Color', columns='Level', values='Proportion')
pivoted.plot(kind='bar', stacked=True)
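Since the question asks for bars that run along the x-axis, a horizontal variant of the pivot approach might look like the sketch below. The name wide, the axis limits and the legend placement are my own choices, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Color': ['Blue'] * 3 + ['Red'] * 3,
                   'Level': [1, 2, 3] * 2,
                   'Proportion': [0.1, 0.3, 0.6, 0.2, 0.5, 0.3]})

# One row per color, one column per level
wide = df.pivot(index='Color', columns='Level', values='Proportion')

# barh draws horizontal bars; stacked=True lays the levels end to end
ax = wide.plot(kind='barh', stacked=True)
ax.set_xlabel('Proportion')
ax.set_xlim(0, 1)
ax.legend(title='Level', bbox_to_anchor=(1, 1), loc='upper left')
plt.tight_layout()
```

Each row of wide sums to 1, so both bars end at the same point on the x-axis.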
Edit: Cleaner legend
You could create a Seaborn sns.histplot using the proportion as weights and the level as hue:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'Color': ['Blue'] * 3 + ['Red'] * 3,
'Level': [1, 2, 3] * 2,
'Proportion': [.1, .3, .6, .2, .5, .3]})
sns.set_style('white')
ax = sns.histplot(data=df, x='Color', weights='Proportion', hue='Level', multiple='stack', palette='flare', shrink=0.75)
ax.set_ylabel('Proportion')
for bars in ax.containers:
    ax.bar_label(bars, label_type='center', fmt='%.2f')
sns.move_legend(ax, loc='upper left', bbox_to_anchor=(1, 0.97))
sns.despine()
plt.tight_layout()
plt.show()
d = {'X':[1,2,3,4],'A': [50,40,20,60], '% of Total in A':[29.4,23.5,11.8,35.3] , 'B': [25,10,5,15], '% in A' :[50,25,25,25]}
df = pd.DataFrame(d)
ax = df.plot(x='X',y="A", kind="bar")
df.plot(x='X', y="B", kind="bar", ax=ax,color='C2')
   X   A  % of Total in A   B  % in A
0  1  50             29.4  25      50
1  2  40             23.5  10      25
2  3  20             11.8   5      25
3  4  60             35.3  15      25
I have the above dataframe and I know how to draw a stacked bar plot based on two columns A and B.
How do I add value labels on top of the bars? For example, for the first row (X=1), I want to label 50 (29.4% of the total) above the blue bar, and 25 (50% in group) above the green bar within the blue bar.
Any help is appreciated.
The first bars are stored in ax.containers[0], the second in ax.containers[1]. You can call ax.bar_label(...) using these containers together with a list of the corresponding labels.
By the way, you are missing x= in the second bar plot.
from matplotlib import pyplot as plt
import pandas as pd
d = {'X': [1, 2, 3, 4], 'A': [50, 40, 20, 60], '% of Total in A': [29.4, 23.5, 11.8, 35.3], 'B': [25, 10, 5, 15], '% in A': [50, 25, 25, 25]}
df = pd.DataFrame(d)
ax = df.plot(x='X', y="A", kind="bar")
df.plot(x='X', y="B", kind="bar", color='C2', ax=ax)
ax.bar_label(ax.containers[0], labels=df['% of Total in A'])
ax.bar_label(ax.containers[1], labels=df['% in A'], color='white')
plt.show()
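If you want each label to show both the value and its percentage, as in "50 (29.4%)", you could build the label strings yourself before passing them to bar_label. This is only a sketch: the label format and the variable names are my own, and bar_label needs matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4],
                   'A': [50, 40, 20, 60],
                   '% of Total in A': [29.4, 23.5, 11.8, 35.3],
                   'B': [25, 10, 5, 15],
                   '% in A': [50, 25, 25, 25]})

ax = df.plot(x='X', y='A', kind='bar', rot=0)
df.plot(x='X', y='B', kind='bar', color='C2', rot=0, ax=ax)

# One string per bar: the raw value followed by its percentage
labels_a = [f'{v} ({p}%)' for v, p in zip(df['A'], df['% of Total in A'])]
labels_b = [f'{v} ({p}%)' for v, p in zip(df['B'], df['% in A'])]

# bar_label returns the text objects it created
texts_a = ax.bar_label(ax.containers[0], labels=labels_a)
texts_b = ax.bar_label(ax.containers[1], labels=labels_b, color='white')
ax.margins(y=0.1)  # headroom so the top labels are not clipped
```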
To further accentuate that B is a part of A, you could give them the same color, and hatch B. For example:
ax = df.plot(x='X', y="A", kind="bar", color='dodgerblue')
df.plot(x='X', y="B", kind="bar", facecolor='dodgerblue', hatch='xx', rot=0, ax=ax)
ax.bar_label(ax.containers[0], labels=[f'{p} %' for p in df['% of Total in A']])
ax.bar_label(ax.containers[1], labels=[f'{p} %' for p in df['% in A']], color='white')
for spine in ['top', 'right']:
    ax.spines[spine].set_visible(False)
The bars are not correctly stacked: the patches are stacked in z-order, not vertically (y-order). Also, the x-axis is incorrect because x='X' is missing from the second plot.
Use zip to combine the containers and cols, and then pass the custom labels to the labels= parameter.
Also see Stacked Bar Chart with Centered Labels, and Adding value labels on a matplotlib bar chart for a thorough explanation about .bar_label.
ax = df.plot(kind='bar', x='X', y=['A', 'B'], stacked=True, rot=0, color=['tab:blue', 'tab:green'])
ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')
# specify the columns to use for alternate labels, in order based on the order of y=
cols = ['% of Total in A', '% in A']
for c, col in zip(ax.containers, cols):
    labels = df[col]
    # Use the alternate column for the labels instead of the bar height (or width of horizontal bars)
    labels = [f'{v}%' for v in labels]
    # remove the labels parameter if it's not needed for customized labels
    ax.bar_label(c, labels=labels, label_type='edge')
ax.margins(y=0.1)
I am trying to plot the following data as a horizontal stacked bar plot. I would like to show Week 1 and Week 2 as bars, with the largest bar ('Total') at the top and the rest in descending order. The actual data is 100 lines, so I arrived at using Seaborn catplots with kind='bar'. I'm not sure whether stacking is possible (as in Matplotlib), so I opted to create two charts and overlay 'Week 1' on top of 'Total' for the same stacked effect.
However, when I run the code below I get two separate plots, and the chart title and axis appear on only one of them. Can I combine these into one stacked horizontal chart? If there is an easier way, I'd appreciate hearing about it.
Company            Week 1  Week 2  Total
Stanley Atherton        0       1      1
Dennis Auton            1       1      2
David Bailey            3       8     11
Alan Ball               5       2      7
Philip Barker           3       0      3
Mark Beirne             0       1      1
Phyllis Blitz           3       0      3
Simon Blower            4       2      6
Steven Branton          5       7     12
Rebecca Brown           0       4      4
(Names created from random name generator)
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv('Sample1.csv', delimiter="\t", error_bad_lines=False)
data_rank = data.sort_values(["Attending", "Company"], ascending=[False,True])
sns.set(style="ticks")
g = sns.catplot(y='Company', x='Total', data=data_rank, kind='bar', height=4, color='red', aspect=0.8, ax=ax)
ax2 =ax.twinx()
g = sns.catplot(y='Company', x='Week 1', data=data_rank, kind='bar', height=4, color='blue', aspect=0.8, ax=ax2)
for ax in g.axes[0]:
    ax.xaxis.tick_top()
    ax.xaxis.set_label_position('top')
    ax.spines['bottom'].set_visible(True)
    ax.spines['top'].set_visible(True)
plt.title("Company by week ", size=7)
[Figures: catplot 1, catplot 2]
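catplot is a figure-level function, so every call creates its own figure and it cannot draw onto an existing ax. The axes-level sns.barplot draws onto the current axes, so I think something like this works: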
g = sns.barplot(y='Company', x='Total', data=data_rank, color='red', label='Total')
g = sns.barplot(y='Company', x='Week 1', data=data_rank, color='blue', label='Week 1')
plt.title("Company by week ", size=12)
plt.xlabel('Frequency')
plt.legend()
plt.show()
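If you would rather have a real stacked chart than two overlaid ones, pandas can stack the two week columns directly. Below is a rough sketch using the sample rows from the question typed in by hand (instead of reading Sample1.csv); the name ranked and the figure size are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({
    'Company': ['Stanley Atherton', 'Dennis Auton', 'David Bailey', 'Alan Ball',
                'Philip Barker', 'Mark Beirne', 'Phyllis Blitz', 'Simon Blower',
                'Steven Branton', 'Rebecca Brown'],
    'Week 1': [0, 1, 3, 5, 3, 0, 3, 4, 5, 0],
    'Week 2': [1, 1, 8, 2, 0, 1, 0, 2, 7, 4],
})
data['Total'] = data['Week 1'] + data['Week 2']

# barh draws the first row at the bottom, so sort ascending
# to get the largest total at the top of the chart
ranked = data.sort_values('Total', ascending=True).set_index('Company')

ax = ranked[['Week 1', 'Week 2']].plot(kind='barh', stacked=True,
                                       color=['blue', 'red'], figsize=(6, 4))
ax.set_xlabel('Frequency')
ax.set_title('Company by week')
plt.tight_layout()
```

The full length of each bar equals Total, so the Total column is only needed for sorting.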
import pandas as pd
income_analysis = pd.DataFrame({'Household Income': ['0-24,999', '25,000-49,999', '50,000'], 'rank1': [3,2,1], 'rank2': [1,2,3]})
  Household Income  rank1  rank2
0         0-24,999      3      1
1    25,000-49,999      2      2
2           50,000      1      3
sns.barplot(data = income_analysis, x = 'Household Income', y = 'rank1')
I am trying to make a bar chart where each set of bars is a different rank, and within each set the bars are divided by household income. So altogether 6 bars: 2 sets of bars with 3 bars in each set. My barplot above plots one of them, but how do I do it for both?
Try this: transpose, then use the pandas plot method:
income_analysis.set_index('Household Income', inplace=True)
income_analysis.T.plot.bar()
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
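To see why the transpose helps: pandas draws one group of bars per row and one bar per column, so after transposing the ranks become the groups and the income brackets become the bars inside each group. Here is a small self-contained sketch of that; the name wide is mine.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

income_analysis = pd.DataFrame({'Household Income': ['0-24,999', '25,000-49,999', '50,000'],
                                'rank1': [3, 2, 1],
                                'rank2': [1, 2, 3]})

# After transposing, each row is a rank and each column an income bracket
wide = income_analysis.set_index('Household Income').T

# One group per row (rank), one bar per column (income bracket)
ax = wide.plot.bar(rot=0)
ax.legend(title='Household Income', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
```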
I've created a dummy dataframe which is similar to the one I'm using.
The dataframe consists of Fare prices, Cabin type, and Survival (1 = alive, 0 = dead).
The first plot creates one graph per Cabin type via factorplot. The x-axis is the Fare price and the y-axis is a count of the number of occurrences at that Fare price.
I then created another series by grouping on [Cabin, Fare] and taking the mean of Survived to get the survival rate at each Cabin and Fare price.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(dict(
Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40 ,30, 20, 30, 30],
Cabin=list('AAABCDBDCDDDC'),
Survived=[1, 0, 0, 0 ,0 ,1 ,1 ,0 ,1 ,1 , 0, 1, 1]
))
g =sns.factorplot(x='Fare', col='Cabin', kind='count', data=df,
col_wrap=3, size=3, aspect=1.3, palette='muted')
plt.show()
x =df.groupby(['Cabin','Fare']).Survived.mean()
What I would like to do is plot a line plot on top of the count graphs above (so the x-axis is the same, and each graph still represents a Cabin type), but with the y-axis being the survival mean calculated in the groupby series x above, which when printed is the third column below.
Cabin  Fare
A      10      0.000000
       20      1.000000
       30      0.000000
B      20      1.000000
       40      0.000000
C      30      1.000000
       40      0.500000
D      10      1.000000
       20      0.000000
       30      0.666667
The y-axis for the line plot should be on the right side, and the range I would like is [0, .20, .40, .60, .80, 1.0, 1.2]
I looked through the seaborn docs for a while, but I couldn't figure out how to properly do this.
My desired output looks something like this image. I'm sorry my writing looks horrible, I don't know how to use paint well. So the ticks and numbers are on the right side of each graph. The line plot will be connected via dots at each x,y point. So for Cabin A, the first x,y point is (10,0) with 0 corresponding to the right y-axis. The second point is (20,1) and so on.
Data operations:
Compute frequency counts:
df_counts = pd.crosstab(df['Fare'], df['Cabin'])
Compute means across the groups and unstack to get back a DataFrame. The NaNs are left as they are rather than replaced by zeros, so that the line plot shows a break at missing values; otherwise the line would be continuous, which wouldn't make much sense here.
df_means = df.groupby(['Cabin','Fare']).Survived.mean().unstack().T
Prepare the x-axis labels as strings:
df_counts.index = df_counts.index.astype(str)
df_means.index = df_means.index.astype(str)
Plotting:
fig, ax = plt.subplots(1, 4, figsize=(10,4))
df_counts.plot.bar(ax=ax, ylim=(0,5), cmap=plt.cm.Spectral, subplots=True,
legend=None, rot=0)
# Use secondary y-axis(right side)
df_means.plot(ax=ax, secondary_y=True, marker='o', color='r', subplots=True,
legend=None, xlim=(0,4))
# Adjust spacing between subplots
plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.show()
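If passing the whole array of axes to the pandas plot calls gives you trouble, the same idea can be written as an explicit loop with one twinx() per subplot. This is just a sketch under my own choices of names (counts, means, right_axes), bar color and figure size; the right-hand ticks follow the range asked for in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(dict(
    Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
    Cabin=list('AAABCDBDCDDDC'),
    Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1],
))

counts = pd.crosstab(df['Fare'], df['Cabin'])  # rows: fare, columns: cabin
means = df.groupby(['Cabin', 'Fare'])['Survived'].mean().unstack('Cabin')

fares = [str(f) for f in counts.index]  # categorical x positions
fig, axes = plt.subplots(1, counts.shape[1], figsize=(12, 3))
right_axes = []
for ax, cabin in zip(axes, counts.columns):
    ax.bar(fares, counts[cabin], color='lightsteelblue')
    ax.set_title(f'Cabin = {cabin}')
    ax.set_xlabel('Fare')
    ax.set_ylabel('count')

    ax2 = ax.twinx()  # second y-axis on the right, sharing the x-axis
    ax2.plot(fares, means[cabin], marker='o', color='r')  # NaN leaves a gap
    ax2.set_yticks([0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2])
    ax2.set_ylim(0, 1.2)
    ax2.set_ylabel('survival rate')
    right_axes.append(ax2)
fig.tight_layout()
```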
This has been troubling me for the past 30 minutes. What I'd like to do is to scatter plot by category. I took a look at the documentation, but I haven't been able to find the answer there. I looked here, but when I ran that in iPython Notebook, I don't get anything.
Here's my data frame:
time  cpu  wait  category
   8    1   0.5         a
   9    2   0.2         a
   2    3   0.1         b
  10    4   0.7         c
   3    5   0.2         c
   5    6   0.8         b
Ideally, I'd like to have a scatter plot that shows CPU on the x axis, wait on the y axis, and each point on the graph is distinguished by category. So for example, if a=red, b=blue, and c=green then point (1, 0.5) and (2, 0.2) should be red, (3, 0.1) and (6, 0.8) should be blue, etc.
How would I do this with pandas? or matplotlib? whichever does the job.
This is essentially the same answer as @JoeCondron's, but a two-liner:
cmap = {'a': 'red', 'b': 'blue', 'c': 'yellow'}
df.plot(x='cpu', y='wait', kind='scatter',
colors=[cmap.get(c, 'black') for c in df.category])
If no color is mapped for the category, it defaults to black.
EDIT:
The above works for Pandas 0.14.1. For 0.16.2, 'colors' needs to be changed to 'c':
df.plot(x='cpu', y='wait', kind='scatter',
c=[cmap.get(c, 'black') for c in df.category])
I'd create a column with your colors based on category, then do the following, where ax is a matplotlib ax and df is your dataframe:
ax.scatter(df['cpu'], df['wait'], marker = '.', c = df['colors'], s = 100)
You could do
color_map = {'a': 'r', 'b': 'b', 'c': 'y'}
ax = plt.subplot()
x, y = df.cpu, df.wait
colors = df.category.map(color_map)
ax.scatter(x, y, color=colors)
This will give you red for category a, blue for b, and yellow for c.
So you can pass a list of color aliases of the same length as the arrays.
You can check out the myriad available colours here: http://matplotlib.org/api/colors_api.html.
I don't think the plot method is very useful for scatter plots.
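If you also want a legend entry per category, one option is to loop over the groups and call scatter once per category. A minimal sketch, assuming the dataframe from the question; the color choices follow the question (a=red, b=blue, c=green), with black as a fallback for unmapped categories.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'time': [8, 9, 2, 10, 3, 5],
                   'cpu': [1, 2, 3, 4, 5, 6],
                   'wait': [0.5, 0.2, 0.1, 0.7, 0.2, 0.8],
                   'category': ['a', 'a', 'b', 'c', 'c', 'b']})

color_map = {'a': 'red', 'b': 'blue', 'c': 'green'}

fig, ax = plt.subplots()
# One scatter call per category, so each gets its own color and legend entry
for name, group in df.groupby('category'):
    ax.scatter(group['cpu'], group['wait'],
               color=color_map.get(name, 'black'), label=name)
ax.set_xlabel('cpu')
ax.set_ylabel('wait')
ax.legend(title='category')
```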