Question on Matplotlib bar plot with a pandas dataframe? - python

I want to create a bar chart with SALES_COV_TYPE on the x axis and ID_ev on the y axis. I would like the two bars to have different colors, the x-axis values renamed, and the legend entry renamed from ID_ev to 'Visits', as can be seen in the image link.
This is my attempt, and I'm not satisfied with it.
import pandas as pd
import matplotlib.pyplot as plt
data = {'SALES_COV_TYPE':[84.0,88.0], 'ID_ev':[2360869,882791]}
df = pd.DataFrame(data)
df.plot(kind='bar',y='ID_ev',x='SALES_COV_TYPE')
ax=plt.gca()
my_xticks = ['CAM','GAM']
ax.set_xticklabels(my_xticks)
ax.set_xlabel('Sales Coverage Type')
ax.set_ylabel('Number of Customer Visits')
I want to create the bar chart using the fig, ax approach so that it becomes a template when I create subplots in the future, but I haven't been able to get it working. I would prefer not to use the pandas wrapper for matplotlib if possible. What do you suggest?
Thanks!

Do you mean something like this:
import pandas as pd
import matplotlib.pyplot as plt
data = {'SALES_COV_TYPE':[84.0,88.0], 'ID_ev':[2360869,882791], 'SALES_COV_CLASS':['CAM','GAM']}
df = pd.DataFrame(data)
colors = ['green', 'red']
fig, ax = plt.subplots(1, 1, figsize=(4,3))
ax.bar(df['SALES_COV_CLASS'], df['ID_ev'], color = colors)
ax.set_title('Title')
ax.set_xlabel('Sales Coverage Type')
ax.set_ylabel('Number of Customer Visits')
plt.show()
I find it easier to add a column with the name of the group, that way you don't need to reindex the x axis to remove the empty columns.
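Since the question also asks for a reusable fig, ax template and a legend entry called 'Visits', here is one possible sketch. The helper name bar_on_axes and its parameters are made up for illustration; the data is the sample from above.

```python
import matplotlib.pyplot as plt
import pandas as pd


def bar_on_axes(ax, labels, heights, colors, legend_label):
    """Draw one bar per label on the given Axes (hypothetical helper)."""
    bars = ax.bar(labels, heights, color=colors)
    # Use the first bar as the legend handle and give it a custom label.
    ax.legend([bars[0]], [legend_label])
    return bars


df = pd.DataFrame({
    'SALES_COV_CLASS': ['CAM', 'GAM'],
    'ID_ev': [2360869, 882791],
})

fig, ax = plt.subplots(figsize=(4, 3))
bars = bar_on_axes(ax, df['SALES_COV_CLASS'], df['ID_ev'],
                   colors=['green', 'red'], legend_label='Visits')
ax.set_xlabel('Sales Coverage Type')
ax.set_ylabel('Number of Customer Visits')
```

Because the helper only needs an Axes, the same call should work on any element of the array returned by plt.subplots(nrows, ncols). Call plt.show() or fig.savefig(...) afterwards to see the result.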

Related

How to assign a list of colors to histogram index bars in matplotlib?

I have the following dataframe "freqs2" with an index (SD to SD17) and associated values (frequencies):
freqs
SD 101
SD2 128
...
SD17 65
I would like to assign a specific list of colors (in order) to the index entries. I've tried the following code:
colors=['#e5243b','#DDA63A', '#4C9F38','#C5192D','#FF3A21','#26BDE2','#FCC30B','#A21942','#FD6925','#DD1367','#FD9D24','#BF8B2E','#3F7E44','#0A97D9','#56C02B','#00689D','#19486A']
freqs2.plot.bar(freqs2.index, legend=False,rot=45,width=0.85, figsize=(12, 6),fontsize=(14),color=colors )
plt.ylabel('Frequency',fontsize=(17))
As a result, all the bars in my chart are red (the first color in the list).
Based on similar questions, I tried passing "freqs2.index" to indicate that the list of colors applies to the index, but the problem stays the same.
It looks like a bug in pandas. Plotting directly with matplotlib or using seaborn (which I recommend) works:
import matplotlib.pyplot as plt
import seaborn as sns
# df is assumed to be the question's data as a DataFrame with a 'freqs' column
colors=['#e5243b','#dda63a', '#4C9F38','#C5192D','#FF3A21','#26BDE2','#FCC30B','#A21942','#FD6925','#DD1367','#FD9D24','#BF8B2E','#3F7E44','#0A97D9','#56C02B','#00689D','#19486A']
# plotting directly with matplotlib works too:
# fig = plt.figure()
# ax = fig.add_axes([0,0,1,1])
# ax.bar(x=df.index, height=df['freqs'], color=colors)
ax = sns.barplot(data=df, x=df.index, y='freqs', palette=colors)
ax.tick_params(axis='x', labelrotation=45)
plt.ylabel('Frequency', fontsize=17)
plt.show()
Edit: an issue already exists on GitHub.
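For reference, the plain matplotlib route can be sketched as below. This is only an illustration: it uses the three rows shown in the question (SD, SD2, SD17) and three of the colors, not the full 17-row frame.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for freqs2, limited to the three rows visible in the question.
freqs = pd.Series([101, 128, 65], index=['SD', 'SD2', 'SD17'], name='freqs')
colors = ['#e5243b', '#DDA63A', '#19486A']

fig, ax = plt.subplots(figsize=(12, 6))
# Axes.bar accepts one color per bar, applied in the order of the bars.
bars = ax.bar(freqs.index, freqs.values, color=colors, width=0.85)
ax.tick_params(axis='x', labelrotation=45, labelsize=14)
ax.set_ylabel('Frequency', fontsize=17)
```

With the full data, the color list should have one entry per index label. Call plt.show() or fig.savefig(...) afterwards to see the result.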

pandas subplot title size

I recently figured out that I can use the plot function directly from pandas, without Seaborn, for quick visualisations.
I used the following code to generate a series of graphs from a data frame that contains years in the first column and the prices of different products in the remaining columns.
df_annual_price.plot.line(x='Date',
                          subplots=True,
                          layout=(5,5),
                          figsize=(60,60),
                          fontsize=20,
                          sharex=False,
                          title=list_of_products
                          )
It neatly draws a line plot for every column. However, one thing I can't figure out is how to control the font size of the title of each plot. I looked in other threads but couldn't find an answer.
Is there a simple and elegant answer to this?
Pandas's plot() with the subplots=True option returns a NumPy array of Axes (1-D by default, 2-D when a layout such as (2, 2) is given).
We can iterate over the axes and call set_title() on each one with a title and a font size.
This is how you change the title font size of each subplot.
We can also pick any one of the axes and call its get_figure() to obtain the Figure object of the overall plot, then call the Figure's suptitle() with a title and a font size. This is how you change the title font size of the overall figure.
The example below creates a 2 x 2 grid of subplots and illustrates functions that may be useful for people who are new to Matplotlib and pandas's plot() function.
import numpy as np
import pandas as pd
labels = ['y1', 'y2', 'y3', 'y4']
x = 'x'
columns = [x] + labels
matrix = np.random.rand(10, 5)
df = pd.DataFrame(matrix, columns=columns)
df = df.sort_values(by=x)
axes = df.plot(
    x=x,
    y=labels,
    subplots=True,
    layout=(2,2),
    kind='hist',
    figsize=(8,8)
)
# axes is a 2 x 2 array, so loop over rows and then over the Axes in each row
for i, row in enumerate(axes):
    for j, ax in enumerate(row):
        ax.set_title(f'Subplot {i, j}', fontsize=12)
        ax.set_xlabel('Width')
        ax.set_ylabel('Percentage')
fig = axes[0, 0].get_figure()
fig.subplots_adjust(top=0.9, wspace=0.3, hspace=0.3)
_ = fig.suptitle(f'Distribution of Widths', fontsize=16) # suppress printing of title
Pandas's plot() accepts **kwargs, which are passed on to the underlying matplotlib plotting function (matplotlib.pyplot.plot() for line plots). See https://matplotlib.org/devdocs/api/_as_gen/matplotlib.pyplot.plot.html for the available parameters.
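Applied to the line plots in the question, the same idea might look like the sketch below. The frame prices and the product names are made-up stand-ins for df_annual_price and list_of_products, and the grid is 2 x 2 rather than 5 x 5 to keep it short.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_annual_price: a 'Date' column plus one column per product.
rng = np.random.default_rng(0)
products = ['A', 'B', 'C', 'D']
prices = pd.DataFrame(rng.random((10, 4)), columns=products)
prices.insert(0, 'Date', range(2010, 2020))

axes = prices.plot.line(x='Date', subplots=True, layout=(2, 2),
                        figsize=(8, 8), sharex=False)

# axes is a 2-D NumPy array here; ravel() lets one loop cover every subplot.
for ax, name in zip(axes.ravel(), products):
    ax.set_title(name, fontsize=20)
```

Setting the titles after the call, instead of through the title= argument, is what gives control over the font size of each one.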

Show scatter plot title from column value

I'm automating scatter plots with regression lines from my dataframe (example).
I want to plot Column2 against Column3, and Column2 against Column4, in separate scatter plots grouped by Column1. For example, there will be 3 scatter plots of Column2 against Column3, titled A, B, and C.
For the plotting I'm using the pandas scatter plot, for example:
df.groupby('Column1').plot.scatter(x='Column2',y='Column3')
The scatter plot call returns exactly 3 plots, but I want to know how to add a chart title based on the grouping from Column1, and also how to add the regression line. I haven't used seaborn or matplotlib yet because they're quite confusing to me, so I'd appreciate it if you could explain more :))
EDIT 1
Sorry for not being clear before. My code previously ran fine, but with output like this.
It's OK, but it's hard to see which plot belongs to which group. Hence my intention is to create something like this
EDIT 2
Shoutout to everyone who is kindly helping. Caina especially helped me a lot, so a shoutout to him as well.
This is the code he wrote, based on several additions in the comments.
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
fig.tight_layout(pad=3)
# If `fig.tight_layout(pad=3)` does not work:
# plt.subplots_adjust(wspace=0.5)
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i])
    axes[i].set_title(gname)
    axes[i].set_ylim(0,)
And this is my resulting plot
The titles and the axes work beautifully. This thread can be considered closed, as I got the help I needed for the plot.
But as you can see, the bottom plot is displayed oddly: its x axis runs from 0 to 600 and its y axis from 0 to 25, although all the plots should have the same y-axis format. The other charts are also stretched horizontally.
I tried the method here with the parameters square and equal, but without much success.
Can I ask for further help on how to adjust the axes so that each plot is square? Thank you!
You can iterate over your groups, specifying the title:
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i]).set_title(gname)
You can also use seaborn.FacetGrid directly:
g = sns.FacetGrid(df, col='Column1')
g.map(sns.regplot, 'Column2', 'Column3')
Edit (further customization based on new requirements in comments):
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
fig.tight_layout(pad=3)
# If `fig.tight_layout(pad=3)` does not work:
# plt.subplots_adjust(wspace=0.5)
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i])
    axes[i].set_title(gname)
    axes[i].set_ylim(0,)
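For the follow-up about square panels with a common y axis, one option is sharey=True together with Axes.set_box_aspect (available in matplotlib 3.3 and later). The sketch below uses made-up data with the question's column names, and swaps sns.regplot for a plain np.polyfit line so that it runs with matplotlib alone; it is not the code from the thread.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data with the column names from the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Column1': np.repeat(['A', 'B', 'C'], 20),
    'Column2': rng.uniform(0, 600, 60),
    'Column3': rng.uniform(0, 25, 60),
})

groups = df.groupby('Column1')
# sharey=True gives every panel the same y range.
fig, axes = plt.subplots(1, groups.ngroups, figsize=(12, 4), sharey=True)

for ax, (gname, gdata) in zip(axes, groups):
    ax.scatter(gdata['Column2'], gdata['Column3'])
    # Least-squares line as a plain stand-in for sns.regplot.
    slope, intercept = np.polyfit(gdata['Column2'], gdata['Column3'], 1)
    xs = np.linspace(gdata['Column2'].min(), gdata['Column2'].max(), 50)
    ax.plot(xs, slope * xs + intercept, color='red')
    ax.set_title(gname)
    ax.set_ylim(0, None)
    ax.set_box_aspect(1)  # square drawing area regardless of the data limits
```

set_box_aspect(1) fixes the shape of the drawing area itself, whereas set_aspect('equal') ties the scales of the x and y data together, which is probably why the latter looked wrong with x up to 600 and y up to 25.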
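You can also use seaborn's built-in functionality. If you want to see the correlations, you can just use df.corr().
import seaborn as sns
# Column1 holds group labels, so restrict the correlation to the numeric columns
sns.heatmap(df.select_dtypes('number').corr(), annot=True)  # correlation heatmap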
You can also use pairplot.
sns.set_style('whitegrid')
sns.set(rc={'figure.figsize':(13.7,10.27)})
sns.set(font_scale=1.3)
sns.set_palette("cubehelix",8)
sns.pairplot(df)
Here's what I did. You can try something like this.
fig, axes = plt.subplots(ncols=3, figsize=(12,6))
plt.subplots_adjust(wspace=0.5, hspace=0.3)
# Select one group per call; groupby(...).plot with ax= would draw every group on the same axes
df[df['Column1'] == 'A'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[0])
axes[0].set_title("A")
df[df['Column1'] == 'B'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[1])
axes[1].set_title("B")
df[df['Column1'] == 'C'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[2])
axes[2].set_title("C")
plt.show()
The output of this is:
If you want the plots stacked vertically, change ncols to nrows and adjust the figsize.
fig, axes = plt.subplots(nrows=3, figsize=(3,12))
plt.subplots_adjust(wspace=0.2, hspace=.5)
# Select one group per call; groupby(...).plot with ax= would draw every group on the same axes
df[df['Column1'] == 'A'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[0])
axes[0].set_title("A")
df[df['Column1'] == 'B'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[1])
axes[1].set_title("B")
df[df['Column1'] == 'C'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[2])
axes[2].set_title("C")
plt.show()
This will give you:

Personalize pandas boxplot with colors

I've been trying to make a boxplot of some gender data that I divided into two separate dataframes, one for male and one for female.
I managed to make the graph basically how I wanted it, but now I would like to make it look better. I'd like it to look like a seaborn graph, but I wasn't able to find a way to do this with the seaborn library. I tried some ideas I found for coloring the pandas boxplot, but nothing worked.
Is there a way to color these graphs? Or is there a way to make these side-by-side boxplots with seaborn?
dados_generos = dados_sem_zeros[["NU_NOTA_CN","NU_NOTA_CH","NU_NOTA_MT","NU_NOTA_LC","NU_NOTA_REDACAO", "TP_SEXO"]]
sexo_f = dados_generos[dados_generos["TP_SEXO"].str.contains("F")]
sexo_m = dados_generos[dados_generos["TP_SEXO"].str.contains("M")]
labels = ["CN", "CH", "MT", "LC", "REDAÇÃO"]
fig, (ax, ax2) = plt.subplots(figsize = (10,7), ncols=2, sharey=True)
#Setting axis titles
ax.set_xlabel('Provas')
ax2.set_xlabel('Provas')
ax.set_ylabel('Notas')
#Making plots
chart1 = sexo_f[provas].boxplot(ax=ax)
chart2 = sexo_m[provas].boxplot(ax=ax2)
#Setting axis labels
chart1.set_xticklabels(labels,rotation=45)
chart2.set_xticklabels(labels,rotation=45)
plt.show()
This is the result I have:
This is the link to the data I'm using:
https://github.com/KarolDuarte/dados_generos/blob/main/dados_generos.csv
Since seaborn (sns) works best with long-form data, let's melt the data and then use sns.
# melting the data
plot_data = df.melt('TP_SEXO')
fig, axes = plt.subplots(figsize=(10,7), ncols=2, sharey=True)
for ax, (gender, data) in zip(axes, plot_data.groupby('TP_SEXO')):
    sns.boxplot(x='variable', y='value', data=data, ax=ax)
Output:
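To see what melt does to the frame before it reaches sns.boxplot, here is a tiny illustration. The scores are invented; only the column names come from the question.

```python
import pandas as pd

# Tiny made-up sample using three of the column names from the question.
wide_df = pd.DataFrame({
    'NU_NOTA_CN': [500.0, 610.5],
    'NU_NOTA_MT': [450.0, 700.0],
    'TP_SEXO': ['F', 'M'],
})

# One row per (TP_SEXO, exam) pair: exam names go to 'variable', scores to 'value'.
long_df = wide_df.melt('TP_SEXO')
print(long_df)
```

The 'variable' and 'value' column names are pandas defaults, which is why the boxplot call above uses x='variable' and y='value'.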

overruling data frame index when plotting with matplotlib

I want to plot from data frames, but sometimes I want more control over my x-tick labels, and it looks like the data frame index is 'overruling' my code. Here is the code:
test_df = pd.DataFrame({'cycles':[0,'b',3,'d','e','f','g'],'me':[100,80,99,100,75,100,90], 'you':[100,80,99,100,75,100,90], 'us':[100,80,99,100,75,100,90]})
f, ax = plt.subplots()
x = test_df['me']
x.index = ['a','b','c','d','e','f','g']
print(x)
for a in ax.get_xticklabels():
    a.set_text('me')
print(ax.get_xticklabels()[0])
ax.plot(x)
test_df.plot(x = 'cycles', y = 'me')
Any ideas on an easier way to modify the x-tick labels for data frames without changing the index of the data frame, just setting the x-ticks on the fly to whatever I want for any data frame column?
You can specify the xticks within DataFrame.plot. This is basically just a dummy to ensure the number of tick labels is correct.
Then just set the tick labels manually after the plot.
import pandas as pd
import matplotlib.pyplot as plt
test_df = pd.DataFrame({'cycles':[0,'b',3,'d','e','f','g'],
                        'me':[100,80,99,100,75,100,90]})
fig, ax = plt.subplots()
test_df.plot(x='cycles', y='me', ax=ax, xticks=test_df.index)
_ = ax.set_xticklabels(test_df['cycles'])
plt.show()
But you should be a bit wary of the fact that the xticks aren't generated automatically. Line plots make sense when your values are ordinal. It doesn't seem obvious to me that 0 should be connected to 'b' any more than 'e' should be connected to 'f'. In this situation a bar plot makes sense and, not surprisingly, the xticks are generated without issue.
test_df.plot(x='cycles', y='me', kind='bar', legend=False)
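If you would rather stay entirely in matplotlib, a sketch of the "labels on the fly" idea is to plot against integer positions and attach any labels afterwards. The label list here is just the one from the question's x.index line; nothing about the data frame index is changed.

```python
import matplotlib.pyplot as plt
import pandas as pd

test_df = pd.DataFrame({'cycles': [0, 'b', 3, 'd', 'e', 'f', 'g'],
                        'me': [100, 80, 99, 100, 75, 100, 90]})

fig, ax = plt.subplots()
positions = list(range(len(test_df)))
# Plot against plain integer positions so the index plays no part.
ax.plot(positions, test_df['me'])
# Fix the tick positions first, then attach whatever labels are wanted.
ax.set_xticks(positions)
ax.set_xticklabels(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
```

Any list of the right length works as labels, for example test_df['cycles'] or another column. Call plt.show() or fig.savefig(...) afterwards to see the result.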
