Pandas: seaborn countplot from several columns - python

I have a dataframe with several categorical columns. I know how to make a countplot, which routinely plots ONE column.
Q: How can I plot the maximum count from ALL columns in one plot?
Here is an example dataframe to clarify the question:
import pandas as pd
import numpy as np
import seaborn as sns
testdf=pd.DataFrame(({ 'Ahome' : pd.Categorical(["home"]*10),
'Bsearch' : pd.Categorical(["search"]*8 + ["NO"]*2),
'Cbuy' : pd.Categorical(["buy"]*5 + ["NO"]*5),
'Dcheck' : pd.Categorical(["check"]*3 + ["NO"]*7),
} ))
testdf.head(10)
sns.countplot(data=testdf,x='Bsearch');
The last line just uses a normal countplot for one column. I'd like to have the column categories (home, search, buy and check) on the x-axis and their frequencies on the y-axis.

You need to use countplot as below:
df = pd.melt(testdf)
sns.countplot(data=df.loc[df['value']!="NO"], x='variable', hue='value')

As @HarvIpan points out, melt creates a long-form dataframe with the column names as entries. Calling countplot on this dataframe produces the correct plot.
Unlike the existing solution, I would recommend not using the hue argument at all.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df=pd.DataFrame(({ 'Ahome' : pd.Categorical(["home"]*10),
'Bsearch' : pd.Categorical(["search"]*8 + ["NO"]*2),
'Cbuy' : pd.Categorical(["buy"]*5 + ["NO"]*5),
'Dcheck' : pd.Categorical(["check"]*3 + ["NO"]*7),
} ))
df2 = df.melt(value_vars=df.columns)
df2 = df2[df2["value"] != "NO"]
sns.countplot(data=df2, x="variable")
plt.show()
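
To check what countplot will draw, the counts can also be computed directly from the melted dataframe. This is only a minimal sketch using the example data from the question; the names long_df and counts are illustrative and not part of the original answers.

```python
import pandas as pd

testdf = pd.DataFrame({
    "Ahome": pd.Categorical(["home"] * 10),
    "Bsearch": pd.Categorical(["search"] * 8 + ["NO"] * 2),
    "Cbuy": pd.Categorical(["buy"] * 5 + ["NO"] * 5),
    "Dcheck": pd.Categorical(["check"] * 3 + ["NO"] * 7),
})

# Long form: one row per (column name, cell value) pair.
long_df = testdf.melt()  # default column names: 'variable' and 'value'

# Drop the "NO" entries, then count the remaining rows per original column.
long_df = long_df[long_df["value"] != "NO"]
counts = long_df["variable"].value_counts().reindex(testdf.columns)
print(counts)
```

The resulting Series holds one count per original column, which is the bar height that sns.countplot(data=long_df, x="variable") should show.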

Related

Pandas, Seaborn, Plot boxplot with 2 columns and a 3º as hue

In a pandas DataFrame with 3 variables, I want to plot 2 columns as 2 different boxes and use the 3rd column as hue with seaborn.
I can reach the first step with pd.melt, but I can't add the hue and make it work.
This is what I have:
df=pd.DataFrame({'A':['a','a','b','a','b'],'B':[1,3,5,4,7],'C':[2,3,4,1,3]})
df2=df[['B','C']].copy()
sb.boxplot(data=pd.melt(df2), x="variable", y="value",palette= 'Blues')
I want to do this in the first DF, setting variable 'A' as hue
Can you help me?
Thank you
IIUC, you can achieve this as follows:
Apply df.melt, using column A for id_vars, and ['B','C'] for value_vars.
Next, inside sns.boxplot, feed the melted df to the data parameter, and add hue='A'.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['a','a','b','a','b'], 'B':[1,3,5,4,7], 'C':[2,3,4,1,3]})
sns.boxplot(data=df.melt(id_vars='A', value_vars=['B','C']),
x='variable', y='value', hue='A', palette='Blues')
plt.show()
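
To see why hue='A' becomes available, it may help to look at the melted frame on its own. A minimal sketch with the same data; long_df is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b", "a", "b"],
                   "B": [1, 3, 5, 4, 7],
                   "C": [2, 3, 4, 1, 3]})

# id_vars keeps column A on every row; B and C are stacked into
# a 'variable' column (the old column name) and a 'value' column.
long_df = df.melt(id_vars="A", value_vars=["B", "C"])
print(long_df)
```

Each original row now appears twice, once for B and once for C, and both copies carry their A label, so seaborn can split every box by A.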

Bar plot and coloured categorical variable

I have a dataframe with 3 variables:
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Approved"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
I want a barplot grouped by the Period column, showing all the values contained in the Observations column and colored with the Result column.
How can I do this?
I tried sns.barplot, but it merged the values in the Observations column into a single bar (the mean of the values).
sns.barplot(x='Period',y='Observations',hue='Result',data=df,ci=None)
Assuming that you want one bar for each row, you can do as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
result_cat = df["Result"].astype("category")
result_codes = result_cat.cat.codes.values
cmap = plt.cm.Dark2(range(df["Result"].unique().shape[0]))
patches = []
for code in result_cat.cat.codes.unique():
    cat = result_cat.cat.categories[code]
    patches.append(mpatches.Patch(color=cmap[code], label=cat))
df.plot.bar(x='Period',
y='Observations',
color=cmap[result_codes],
legend=False)
plt.ylabel("Observations")
plt.legend(handles=patches)
If you would like it grouped by month and then stacked, you can use the following. Note that I changed your data so that one month has more than one status. I am not sure I understood your question correctly.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Under evaluation"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result').plot(kind='bar', stacked=True)
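
The stacked plot is drawn from an intermediate table, which can be inspected before plotting. This is a small sketch using the modified data above; the name table is illustrative.

```python
import pandas as pd

data = [["2019/oct", 10, "Approved"], ["2019/oct", 20, "Approved"],
        ["2019/oct", 30, "Approved"], ["2019/oct", 40, "Under evaluation"],
        ["2019/nov", 20, "Under evaluation"], ["2019/dec", 30, "Aproved"]]
df = pd.DataFrame(data, columns=["Period", "Observations", "Result"])

# Sum the observations per (Period, Result) pair, then move Result
# into the columns: one row per period, one column per status.
table = df.groupby(["Period", "Result"])["Observations"].sum().unstack("Result")
print(table)
```

Combinations that do not occur in the data show up as NaN in this table. Note also that "Aproved" is spelled differently from "Approved" in the example data, so it ends up in its own column and would get its own colour in the plot.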

Python Change axis on Multi Histogram plot

I have a pandas dataframe df for which I plot a multi-histogram as follows:
df.hist(bins=20)
This gives me a result that looks like this (yes, this example is ugly since there is only one data point per histogram, sorry):
I have a subplot for each numerical column of my dataframe.
Now I want all my histograms to have an x-axis between 0 and 1. I saw that the hist() function takes an ax parameter, but I cannot manage to make it work.
How can I do that?
EDIT :
Here is a minimal example:
import pandas as pd
import matplotlib.pyplot as plt
myArray = [(0,0,0,0,0.5,0,0,0,1),(0,0,0,0,0.5,0,0,0,1)]
myColumns = ['col1','col2','col3','co4','col5','col6','col7','col8','col9']
df = pd.DataFrame(myArray,columns=myColumns)
print(df)
df.hist(bins=20)
plt.show()
Here is a solution that works, though it is certainly not ideal:
import pandas as pd
import matplotlib.pyplot as plt
myArray = [(0,0,0,0,0.5,0,0,0,1),(0,0,0,0,0.5,0,0,0,1)]
myColumns = ['col1','col2','col3','co4','col5','col6','col7','col8','col9']
df = pd.DataFrame(myArray,columns=myColumns)
print(df)
ax = df.hist(bins=20)
for x in ax:
    for y in x:
        y.set_xlim(0,1)
plt.show()
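
The nested loop can be flattened, since df.hist returns a 2-D NumPy array of Axes objects here. A minimal sketch of that variant; the Agg backend line is only an assumption so the snippet runs without a display and can be dropped in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

myArray = [(0, 0, 0, 0, 0.5, 0, 0, 0, 1), (0, 0, 0, 0, 0.5, 0, 0, 0, 1)]
myColumns = ['col1', 'col2', 'col3', 'co4', 'col5', 'col6', 'col7', 'col8', 'col9']
df = pd.DataFrame(myArray, columns=myColumns)

# df.hist returns an array of Axes (3 x 3 for nine columns);
# ravel() flattens it so a single loop visits every subplot.
axes = df.hist(bins=20)
for ax in axes.ravel():
    ax.set_xlim(0, 1)
```

With an interactive backend, plt.show() displays the figure as in the original code.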

Seaborn boxplot showing number on x-axis, not the name of pd.Series object

Problem: I want my seaborn boxplot to show the names of the pd.Series objects (Group A, Group B) on the x-axis, but it only shows numbers: 0 for the first pd.Series and 1 for the next.
My code is as follows.
import pandas as pd
import seaborn as sns
Group_A=pd.Series([26,21,22,26,19,22,26,25,24,21,23,23,18,29,22])
Group_B=pd.Series([18,23,21,20,20,29,20,16,20,26,21,25,17,18,19])
sns.set(style="whitegrid")
ax=sns.boxplot(data=[Group_A, Group_B], palette='Set2')
You can concatenate the two series into a dataframe. There are many ways to do so; here is one example that produces nice names:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Group_A=pd.Series([26,21,22,26,19,22,26,25,24,21,23,23,18,29,22])
Group_B=pd.Series([18,23,21,20,20,29,20,16,20,26,21,25,17,18,19])
df = pd.DataFrame({"ColumnA" : Group_A, "ColumnB" : Group_B})
sns.set(style="whitegrid")
ax=sns.boxplot(data=df , palette='Set2')
plt.show()
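
Another of those options is pd.concat with the keys argument, which lets the column names contain spaces. This is a sketch of the dataframe construction only; the labels "Group A" and "Group B" are taken from the question.

```python
import pandas as pd

Group_A = pd.Series([26, 21, 22, 26, 19, 22, 26, 25, 24, 21, 23, 23, 18, 29, 22])
Group_B = pd.Series([18, 23, 21, 20, 20, 29, 20, 16, 20, 26, 21, 25, 17, 18, 19])

# axis=1 places the series side by side; keys become the column names.
df = pd.concat([Group_A, Group_B], axis=1, keys=["Group A", "Group B"])
print(df.head())
```

Passing this df to sns.boxplot(data=df, palette='Set2') should then label the boxes with the column names, as in the answer above.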

Python plot bar chart with group

I have the following dataframe:
I'm trying to plot a bar chart, with x as 'config names', y as 'value', and one bar per month (one bin per month). I'm not sure how to do this, any ideas?
If you have your data in a pandas DataFrame (let's say df), it's rather easy:
import seaborn as sns
sns.barplot(x='config names', y='value', data=df)
I'm not sure what you mean by one bin per month. The bins here are your x axis.
If you mean you want to split different months into different bins then you should just add them to the hue parameter.
import seaborn as sns
sns.barplot(x='config names', y='value', data=df, hue='month')
I may not have understood what you are asking, but I suggest making a pivot table from your dataframe.
Assuming your dataframe variable is named df, you could try this:
import pandas as pd
import numpy as np
pt_df = pd.pivot_table(
df,
values=['value'],
columns=['month'],
aggfunc=np.sum
).plot(kind='bar')
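
Since the dataframe from the question is not shown, here is a sketch on made-up data of how a pivot table could place the config names on the x-axis with one bar per month. The column names follow the question; the rows, the values and the index argument are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: the real dataframe from the question is not available.
df = pd.DataFrame({
    "config names": ["cfg1", "cfg1", "cfg2", "cfg2", "cfg1"],
    "month": ["jan", "feb", "jan", "feb", "jan"],
    "value": [1, 2, 3, 4, 5],
})

# One row per config name, one column per month, values summed.
pt = pd.pivot_table(df, values="value", index="config names",
                    columns="month", aggfunc="sum")
print(pt)
```

Calling pt.plot(kind='bar') on such a table draws one group of bars per config name, with one bar per month inside each group.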
