Plotting two Seaborn catplots in one figure - python

I am trying to plot the following data as a horizontal stacked bar plot. I would like to show Week 1 and Week 2 as bars, with the largest bar ('Total') at the top and the rest in descending order. The actual data has 100 rows, so I arrived at using Seaborn catplots with kind='bar'. I'm not sure whether Seaborn can stack bars (as Matplotlib can), so I opted to create two charts and overlay 'Week 1' on top of 'Total' for the same stacked effect.
However, when I run the code below I get two separate plots, and the chart title and axis settings appear on only one of them. Can I combine these into one stacked horizontal chart? If there is an easier way, I would appreciate hearing about it.
Company           Week 1  Week 2  Total
Stanley Atherton       0       1      1
Dennis Auton           1       1      2
David Bailey           3       8     11
Alan Ball              5       2      7
Philip Barker          3       0      3
Mark Beirne            0       1      1
Phyllis Blitz          3       0      3
Simon Blower           4       2      6
Steven Branton         5       7     12
Rebecca Brown          0       4      4
(Names created from random name generator)
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv('Sample1.csv', delimiter="\t", error_bad_lines=False)
data_rank = data.sort_values(["Attending", "Company"], ascending=[False,True])
sns.set(style="ticks")
g = sns.catplot(y='Company', x='Total', data=data_rank, kind='bar', height=4, color='red', aspect=0.8, ax=ax)
ax2 =ax.twinx()
g = sns.catplot(y='Company', x='Week 1', data=data_rank, kind='bar', height=4, color='blue', aspect=0.8, ax=ax2)
for ax in g.axes[0]:
    ax.xaxis.tick_top()
    ax.xaxis.set_label_position('top')
    ax.spines['bottom'].set_visible(True)
    ax.spines['top'].set_visible(True)
plt.title("Company by week ", size=7)
[Images: catplot 1, catplot 2]

I think something like this works.
g = sns.barplot(y='Company', x='Total', data=data_rank, color='red', label='Total')
g = sns.barplot(y='Company', x='Week 1', data=data_rank, color='blue', label='Week 1')
plt.title("Company by week ", size=12)
plt.xlabel('Frequency')
plt.legend()
plt.show()
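If a true stack is preferred over an overlay, pandas can stack 'Week 1' and 'Week 2' directly, since the two columns add up to 'Total'. The following is only a sketch under a few assumptions: a handful of rows from the question's table are typed in by hand instead of being read from 'Sample1.csv', the bar colors are arbitrary, and the Agg backend line is there only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A few rows of the sample table from the question, entered by hand.
data = pd.DataFrame({
    "Company": ["Stanley Atherton", "Dennis Auton", "David Bailey",
                "Alan Ball", "Steven Branton"],
    "Week 1": [0, 1, 3, 5, 5],
    "Week 2": [1, 1, 8, 2, 7],
})
data["Total"] = data["Week 1"] + data["Week 2"]

# barh draws the first row at the bottom, so sort ascending to put
# the largest total at the top of the chart.
data_rank = data.sort_values("Total", ascending=True)

# Stack the two weekly columns; each full bar then has length 'Total'.
ax = data_rank.plot.barh(x="Company", y=["Week 1", "Week 2"],
                         stacked=True, color=["blue", "red"])
ax.set_xlabel("Frequency")
ax.set_title("Company by week")
plt.tight_layout()
plt.show()
```

Because everything is drawn on a single Axes, the title, labels and legend all end up on the same chart.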

Related

How make stacked bar chart from dataframe in python

I have the following dataframe:
Color Level Proportion
-------------------------------------
0 Blue 1 0.1
1 Blue 2 0.3
2 Blue 3 0.6
3 Red 1 0.2
4 Red 2 0.5
5 Red 3 0.3
Here I have 2 color categories, where each color category has 3 levels, and each entry has a proportion; the proportions sum to 1 within each color category. I want to make a stacked bar chart from this dataframe that has 2 stacked bars, one for each color category. Within each stacked bar will be the proportion for each level, all summing to 1. So while the bars will be "stacked" differently, the complete bars will all have the same length of 1.
I have tried this:
df.plot(kind='bar', stacked=True)
I then get this stacked bar chart, which is not what I want:
I want 2 stacked bars, and so a stacked bar for "Blue" and a stacked bar for "Red", where these bars are "stacked" by the proportions, with the colors of these stacks corresponding to each level. And so both of these bars would be of length 1 along the x-axis, which would be labelled "proportion". How can I fix my code to create this stacked bar chart?
Make a pivot and then plot it:
df_wide = df.pivot(index='Color', columns='Level', values='Proportion')
df_wide.plot(kind='bar', stacked=True)
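Note that pivot returns a new DataFrame rather than modifying df in place, so the pivoted result has to be kept and plotted. A self-contained sketch of the same idea, using the data from the question; the name wide is just illustrative, and kind='barh' is an assumption made here so that the proportion runs along the x-axis as the question asks:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Color": ["Blue"] * 3 + ["Red"] * 3,
                   "Level": [1, 2, 3] * 2,
                   "Proportion": [0.1, 0.3, 0.6, 0.2, 0.5, 0.3]})

# One row per color, one column per level.
wide = df.pivot(index="Color", columns="Level", values="Proportion")

# Horizontal stacked bars: one bar per color, one segment per level.
ax = wide.plot(kind="barh", stacked=True)
ax.set_xlabel("Proportion")
ax.legend(title="Level", bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
plt.show()
```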
Edit: Cleaner legend
You could create a Seaborn sns.histplot using the proportion as weights and the level as hue:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'Color': ['Blue'] * 3 + ['Red'] * 3,
                   'Level': [1, 2, 3] * 2,
                   'Proportion': [.1, .3, .6, .2, .5, .3]})
sns.set_style('white')
ax = sns.histplot(data=df, x='Color', weights='Proportion', hue='Level', multiple='stack', palette='flare', shrink=0.75)
ax.set_ylabel('Proportion')
for bars in ax.containers:
    ax.bar_label(bars, label_type='center', fmt='%.2f')
sns.move_legend(ax, loc='upper left', bbox_to_anchor=(1, 0.97))
sns.despine()
plt.tight_layout()
plt.show()

Create a seaborn histogram with two columns of a dataframe

I am trying to display a histogram from this dataframe.
gr_age weighted_cost
0 1 2272.985462
1 2 2027.919360
2 3 1417.617779
3 4 946.568598
4 5 715.731002
5 6 641.716770
I want to use gr_age column as the X axis and weighted_cost as the Y axis. Here is an example of what I am looking for with Excel:
I tried with the following code, and with discrete=True, but it gives another result, and I didn't do better with displot.
sns.histplot(data=df, x="gr_age", y="weighted_cost")
plt.show()
Thanking you for your ideas!
You want a barplot (x vs. y values), not a histplot, which plots the distribution of a dataset:
import seaborn as sns
ax = sns.barplot(data=df, x='gr_age', y='weighted_cost', color='#4473C5')
ax.set_title('Values by age group')
output: (image not included)
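For reference, here is a self-contained version of the same call with the dataframe from the question typed in by hand. It is only a sketch; the heights list is added here purely to show that each bar's height is the weighted_cost value itself rather than a count.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "gr_age": [1, 2, 3, 4, 5, 6],
    "weighted_cost": [2272.985462, 2027.919360, 1417.617779,
                      946.568598, 715.731002, 641.716770],
})

# One bar per gr_age value, with weighted_cost on the y-axis.
ax = sns.barplot(data=df, x="gr_age", y="weighted_cost", color="#4473C5")
ax.set_title("Values by age group")

# With a single row per category, the bar height equals that row's value.
heights = [bar.get_height() for bar in ax.patches]
plt.show()
```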

Pandas - plotting user RFM

Given the following DF of user RFM activity:
uid R F M
0 1 10 1 5
1 1 2 2 10
2 1 4 3 1
3 1 5 4 10
4 2 10 1 3
5 2 1 2 10
6 2 1 3 4
Recency: The time between the last purchase and today, represented by
the distance between the rightmost circle and the vertical dotted line
that's labeled Now.
Frequency: The time between purchases, represented by the distance
between the circles on a single line.
Monetary: The amount of money spent on each purchase, represented by
the size of the circle. This amount could be the average order value
or the quantity of products that the customer ordered.
I would like to plot something like the figure below:
Where the size of the circle is the M value and the distance is the R. Any help would be appreciated.
Update
As suggested by Diziet Asahi I've tried the following:
import matplotlib.pyplot as plt
import pandas as pd
def plot_users(df):
    fig, ax = plt.subplots()
    ax.axis('off')
    ax.scatter(x=df['M'],y=df['uid'],s=30*df['R'], marker='o', color='grey')
    ax.invert_xaxis()
    ax.axvline(0, ls='--', color='black', zorder=-1)
    for y in df['uid'].unique():
        ax.axhline(y, color='grey', zorder=-1)
tmp = pd.DataFrame({'uid':[1,1,1,1,2,2,2],'R':[10,2,4,5,10,1,1],'F':[1,2,3,4,1,3,4],'M':[5,10,1,10,3,10,4]})
plot_users(tmp)
And I get the following:
So I think there is a bug, since the first user has 4 records and the sizes also don't match.
You can use matplotlib's scatter() with the s= argument to draw markers with an area proportional to the value in M. The rest is just tweaking the appearance of the plot.
c = 'xkcd:dark grey'
fig, ax = plt.subplots()
ax.axis('off')
ax.scatter(x=df['R'],y=df['uid'],s=60*df['M'], marker='o', color=c)
ax.invert_xaxis()
ax.axvline(0, ls='--', color=c, zorder=-1)
for y in df['uid'].unique():
    ax.axhline(y, color=c, zorder=-1)
ax.set_ymargin(1)
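To make the mapping explicit, the sketch below wraps the same calls in a function with R on the x-axis and the marker area set from M. The data is the table from the question; the scale parameter and the returned values are illustrative additions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"uid": [1, 1, 1, 1, 2, 2, 2],
                   "R": [10, 2, 4, 5, 10, 1, 1],
                   "F": [1, 2, 3, 4, 1, 2, 3],
                   "M": [5, 10, 1, 10, 3, 10, 4]})

def plot_users(df, scale=60):
    """Draw one line per user: x position = R, marker area = scale * M."""
    fig, ax = plt.subplots()
    ax.axis("off")
    points = ax.scatter(x=df["R"], y=df["uid"], s=scale * df["M"],
                        marker="o", color="grey")
    ax.invert_xaxis()  # small R (recent purchases) ends up on the right
    ax.axvline(0, ls="--", color="black", zorder=-1)  # the "Now" line
    for y in df["uid"].unique():
        ax.axhline(y, color="grey", zorder=-1)
    return ax, points

ax, points = plot_users(df)
plt.show()
```

Markers that share the same uid and x value are drawn on top of each other. With x=df['M'], as in the attempt in the question, user 1 has two rows at M = 10, so only three circles are distinguishable on that line.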

How to plot a side-by-side grouped bar plot

import pandas as pd
income_analysis = pd.DataFrame({'Household Income': ['0-24,999', '25,000-49,999', '50,000'], 'rank1': [3,2,1], 'rank2': [1,2,3]})
Household Income rank1 rank2
0 0-24,999 3 1
1 25,000-49,999 2 2
2 50,000 1 3
sns.barplot(data = income_analysis, x = 'Household Income', y = 'rank1')
I am trying to make a bar chart where each set of bars is a different rank, and within each set the bars are divided by household income. So altogether there are 6 bars: 2 sets of bars, with 3 bars in each set. My barplot above plots one of them, but how do I do it for both?
Try this: transpose and use the pandas plot:
income_analysis.set_index('Household Income', inplace=True)
income_analysis.T.plot.bar()
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
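A self-contained sketch of the transpose approach, including the imports it needs. The name by_rank is illustrative, and set_index is used without inplace=True here so that the original income_analysis frame is left unchanged:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

income_analysis = pd.DataFrame({
    "Household Income": ["0-24,999", "25,000-49,999", "50,000"],
    "rank1": [3, 2, 1],
    "rank2": [1, 2, 3],
})

# After transposing, rows are rank1/rank2 and columns are income brackets,
# so pandas draws one group of bars per rank and one bar per bracket.
by_rank = income_analysis.set_index("Household Income").T

ax = by_rank.plot.bar(rot=0)
ax.legend(title="Household Income", bbox_to_anchor=(1.05, 1), loc="upper left")
plt.tight_layout()
plt.show()
```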

Python Plotting: Heatmap from dataframe with fixed colors in case of strings

I'm trying to visualise a large (pandas) dataframe in Python as a heatmap. This dataframe has two types of variables: strings ("Absent" or "Unknown") and floats.
I want the heatmap to show cells with "Absent" in black and "Unknown" in red, and the rest of the dataframe as a normal heatmap, with the floats in a scale of greens.
I can do this easily in Excel with conditional formatting of cells, but I can't find any help online to do this with Python either with matplotlib, seaborn, ggplot. What am I missing?
Thank you for your time.
You could use cmap_custom.set_under('red') and cmap_custom.set_over('black') to apply custom colors to values below and above vmin and vmax (See 1, 2):
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.axes_grid1 as axes_grid1
import pandas as pd
# make a random DataFrame
np.random.seed(1)
arr = np.random.choice(['Absent', 'Unknown']+list(range(10)), size=(5,7))
df = pd.DataFrame(arr)
# find the largest and smallest finite values
finite_values = pd.to_numeric(list(set(np.unique(df.values))
                                   .difference(['Absent', 'Unknown'])))
vmin, vmax = finite_values.min(), finite_values.max()
# change Absent and Unknown to numeric values
df2 = df.replace({'Absent': vmax+1, 'Unknown': vmin-1})
# make sure the values are numeric
for col in df2:
    df2[col] = pd.to_numeric(df2[col])
fig, ax = plt.subplots()
cmap_custom = plt.get_cmap('Greens')
cmap_custom.set_under('red')
cmap_custom.set_over('black')
im = plt.imshow(df2, interpolation='nearest', cmap = cmap_custom,
                vmin=vmin, vmax=vmax)
# add a colorbar (https://stackoverflow.com/a/18195921/190597)
divider = axes_grid1.make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.05)
plt.colorbar(im, cax=cax, extend='both')
plt.show()
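The under/over mechanism can be checked in isolation, without drawing anything. In this small sketch the range 0 to 9 and the probe values -1, 5 and 10 are arbitrary choices for illustration; copying the colormap first is an optional precaution so that the shared 'Greens' map is not modified.

```python
import copy

import matplotlib.pyplot as plt
from matplotlib.colors import Normalize, to_rgba

# Copy the colormap so the registered 'Greens' map is left untouched.
cmap = copy.copy(plt.get_cmap("Greens"))
cmap.set_under("red")    # used for values below vmin
cmap.set_over("black")   # used for values above vmax

# Map data values in [0, 9] onto the colormap's [0, 1] range.
norm = Normalize(vmin=0, vmax=9)

under_color = cmap(norm(-1))   # below vmin, e.g. the 'Unknown' placeholder
inside_color = cmap(norm(5))   # an ordinary value, some shade of green
over_color = cmap(norm(10))    # above vmax, e.g. the 'Absent' placeholder
print(under_color, inside_color, over_color)
```

Any value below vmin gets the "under" color and any value above vmax gets the "over" color, which is why the answer replaces 'Unknown' with vmin-1 and 'Absent' with vmax+1.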
The DataFrame
In [117]: df
Out[117]:
0 1 2 3 4 5 6
0 3 9 6 7 9 3 Absent
1 Absent Unknown 5 4 7 0 2
2 3 0 2 9 8 0 2
3 5 5 7 Unknown 5 Absent 4
4 7 7 5 4 7 Unknown Absent
becomes
