Python plot bar chart with group - python

I have the following dataframe:
I'm trying to plot a bar chart, with x as 'config names', y as 'value', and one bar per month (one bin per month). I'm not sure how to do this, any ideas?

If you have your data in a pandas DataFrame (let's say df), it's rather easy:
import seaborn as sns
sns.barplot(x='config names', y='value', data='df')
I'm not sure what you mean by one bin per month. The bins here are your x axis.
If you mean you want to split different months into different bins then you should just add them to the hue parameter.
import seaborn as sns
sns.barplot(x='config names', y='value', data='df', hue='month')

I may not understand what you ask but it looks like this
So I suggest you do a pivot table with your dataframe.
Let's say your dataframe variable name is df, can you try this :
import pandas as pd
import numpy as np
pt_df = pd.pivot_table(
df,
values=['value'],
columns=['month'],
aggfunc=np.sum
).plot(kind='bar')

Related

Pandas, Seaborn, Plot boxplot with 2 columns and a 3º as hue

in a Pandas Df with 3 variables i want to plot 2 columns in 2 different boxes and the 3rd column as hue with seaborn
I can reach the first step with pd.melt but I cant insert the hue and make it work
This is what I have:
df=pd.DataFrame({'A':['a','a','b','a','b'],'B':[1,3,5,4,7],'C':[2,3,4,1,3]})
df2=df[['B','C']].copy()
sb.boxplot(data=pd.melt(df2), x="variable", y="value",palette= 'Blues')
I want to do this in the first DF, setting variable 'A' as hue
Can you help me?
Thank you
IIUC, you can achieve this as follows:
Apply df.melt, using column A for id_vars, and ['B','C'] for value_vars.
Next, inside sns.boxplot, feed the melted df to the data parameter, and add hue='A'.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'A':['a','a','b','a','b'], 'B':[1,3,5,4,7], 'C':[2,3,4,1,3]})
sns.boxplot(data=df.melt(id_vars='A', value_vars=['B','C']),
x='variable', y='value', hue='A', palette='Blues')
plt.show()
Result

Multi Index Seaborn Line Plot

I have a multi index dataframe, with the two indices being Sample and Lithology
Sample 20EC-P 20EC-8 20EC-10-1 ... 20EC-43 20EC-45 20EC-54
Lithology Pd Di-Grd Gb ... Hbl Plag Pd Di-Grd Gb
Rb 7.401575 39.055118 6.456693 ... 0.629921 56.535433 11.653543
Ba 24.610102 43.067678 10.716841 ... 1.073115 58.520532 56.946630
Th 3.176471 19.647059 3.647059 ... 0.823529 29.647059 5.294118
I am trying to put it into a seaborn lineplot as such.
spider = sns.lineplot(data = data, hue = data.columns.get_level_values("Lithology"),
style = data.columns.get_level_values("Sample"),
dashes = False, palette = "deep")
The lineplot comes out as
1
I have two issues. First, I want to format hues by lithology and style by sample. Outside of the lineplot function, I can successfully access sample and lithology using data.columns.get_level_values, but in the lineplot they don't seem to do anything and I haven't figured out another way to access these values. Also, the lineplot reorganizes the x-axis by alphabetical order. I want to force it to keep the same order as the dataframe, but I don't see any way to do this in the documentation.
To use hue= and style=, seaborn prefers it's dataframes in long form. pd.melt() will combine all columns and create new columns with the old column names, and a column for the values. The index too needs to be converted to a regular column (with .reset_index()).
Most seaborn functions use order= to set an order on the x-values, but with lineplot the only way is to make the column categorical applying a fixed order.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
column_tuples = [('20EC-P', 'Pd '), ('20EC-8', 'Di-Grd'), ('20EC-10-1 ', 'Gb'),
('20EC-43', 'Hbl Plag Pd'), ('20EC-45', 'Di-Grd'), ('20EC-54', 'Gb')]
col_index = pd.MultiIndex.from_tuples(column_tuples, names=["Sample", "Lithology"])
data = pd.DataFrame(np.random.uniform(0, 50, size=(3, len(col_index))), columns=col_index, index=['Rb', 'Ba', 'Th'])
data_long = data.melt(ignore_index=False).reset_index()
data_long['index'] = pd.Categorical(data_long['index'], data.index) # make categorical, use order of the original dataframe
ax = sns.lineplot(data=data_long, x='index', y='value',
hue="Lithology", style="Sample", dashes=False, markers=True, palette="deep")
ax.set_xlabel('')
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.02))
plt.tight_layout() # fit legend and labels into the figure
plt.show()
The long dataframe looks like:
index Sample Lithology value
0 Rb 20EC-P Pd 6.135005
1 Ba 20EC-P Pd 6.924961
2 Th 20EC-P Pd 44.270570
...

How can I visualise categorical feature vs date column

In my dataset I have a categorical column named 'Type'contain(eg.,INVOICE,IPC,IP) and 'Date' column contain dates(eg,2014-02-01).
how can I plot these two.
On x axis I want date
On y axis a line for (eg.INVOCE) showing its trend
enter image description here
Not very sure what you mean by plot and show trend, one ways is to count like #QuangHoang suggested, and plot with a heatmap, something like below. If it is something different, please expand on your question.
import pandas as pd
import numpy as np
import seaborn as sns
dates = pd.date_range(start='1/1/2018', periods=5, freq='3M')[np.random.randint(0,5,20)]
type = np.random.choice(['INVOICE','IPC','IP'],20)
df = pd.DataFrame({'dates':dates ,'type':type})
tab = pd.crosstab(df['type'],df['dates'].dt.strftime('%d-%m-%Y'))
n = np.unique(tab.values)
cmap = sns.color_palette("BuGn_r",len(n))
sns.heatmap(tab,cmap=cmap)

Bar plot and coloured categorical variable

I have a dataframe with 3 variables:
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Approved"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
I want a barplot grouped by the Period column, showing all the values ​​contained in the Observations column and colored with the Result column.
How can I do this?
I tried the sns.barplot, but it joined the values in Observations column in just one bar(mean of the values).
sns.barplot(x='Period',y='Observations',hue='Result',data=df,ci=None)
Plot output
Assuming that you want one bar for each row, you can do as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
result_cat = df["Result"].astype("category")
result_codes = result_cat.cat.codes.values
cmap = plt.cm.Dark2(range(df["Result"].unique().shape[0]))
patches = []
for code in result_cat.cat.codes.unique():
cat = result_cat.cat.categories[code]
patches.append(mpatches.Patch(color=cmap[code], label=cat))
df.plot.bar(x='Period',
y='Observations',
color=cmap[result_codes],
legend=False)
plt.ylabel("Observations")
plt.legend(handles=patches)
If you would like it grouped by the months, and then stacked, please use the following (note I updated your code to make sure one month had more than one status), but not sure I completely understood your question correctly:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
data= [["2019/oct",10,"Approved"],["2019/oct",20,"Approved"],["2019/oct",30,"Approved"],["2019/oct",40,"Under evaluation"],["2019/nov",20,"Under evaluation"],["2019/dec",30,"Aproved"]]
df = pd.DataFrame(data, columns=['Period', 'Observations', 'Result'])
df.groupby(['Period', 'Result'])['Observations'].sum().unstack('Result').plot(kind='bar', stacked=True)

Pandas: seaborn countplot from several columns

I have a dataframe with several categorical columns. I know how to do countplot which routinly plots ONE column.
Q: how to plot maximum count from ALL columns in one plot?
here is an exemplary dataframe to clarify the question:
import pandas as pd
import numpy as np
import seaborn as sns
testdf=pd.DataFrame(({ 'Ahome' : pd.Categorical(["home"]*10),
'Bsearch' : pd.Categorical(["search"]*8 + ["NO"]*2),
'Cbuy' : pd.Categorical(["buy"]*5 + ["NO"]*5),
'Dcheck' : pd.Categorical(["check"]*3 + ["NO"]*7),
} ))
testdf.head(10)
sns.countplot(data=testdf,x='Bsearch');
The last line is just using normal countplot for one column. I'd like to have the columns category (home,search,buy and check) in x-axis and their frequency in y-axis.
You need to use countplot as below:
df = pd.melt(testdf)
sns.countplot(data=df.loc[df['value']!="NO"], x='variable', hue='value')
Output:
As #HarvIpan points out, using melt you would create a long-form dataframe with the column names as entries. Calling countplot on this dataframe produces the correct plot.
As a difference to the existing solution, I would recommend not to use the hue argument at all.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df=pd.DataFrame(({ 'Ahome' : pd.Categorical(["home"]*10),
'Bsearch' : pd.Categorical(["search"]*8 + ["NO"]*2),
'Cbuy' : pd.Categorical(["buy"]*5 + ["NO"]*5),
'Dcheck' : pd.Categorical(["check"]*3 + ["NO"]*7),
} ))
df2 = df.melt(value_vars=df.columns)
df2 = df2[df2["value"] != "NO"]
sns.countplot(data=df2, x="variable")
plt.show()

Categories

Resources