Hi to all the experts,
I'm new to Python and data science, and I'm currently learning with a real-world example to get into the field.
I've already loaded a CSV and done some work on the data; that part is fine. I have the following dataframe:
dataframe
Then, I created a Pivot from the dataframe:
pivot = pd.pivot_table(
    data=df,
    index=['Category', 'month', 'year'],
    values='Amount',
    aggfunc='sum',
    margins=True)
Now, I have the following dataframe:
new dataframe
Now, I want to plot the following (line chart or bar chart):
X: Month
Y: Amount
But I want to do this for one specific Category, such as "Business", to see how the amount changed over the periods.
What's the best way to plot a clear, beautiful chart with matplotlib?
Thanks in Advance.
Many Greetings
Leon
You can use the code below to build the graphs. I think stacked bar graphs would be a good way to see the Amount in each month.
Code
## Add AFTER you have created your pivot table
dfg = pivot.reset_index().set_index(['Month', 'Category']).sort_index(level=[0,1])
fig, ax = plt.subplots(figsize=(6,4))
dfg['Amount'].unstack().plot.bar(stacked=True, ax=ax, legend = False)
ax.set_xticklabels(sorted(df.Month.unique()), rotation=0)
ax.set_title('My Graph')
fig.legend(loc="upper right", bbox_to_anchor=(1.1, 0.9))
plt.show()
Stacked Bar graph
Unstacked Bar graph
If you are not a fan of stacked bars, change stacked=True to stacked=False to see the bars next to each other.
Line Graphs
You can also use line graphs, though they are not my personal preference.
Replace the plot.bar line in the code above with:
dfg['Amount'].unstack().plot(kind='line', marker='o', ax=ax, legend = False)
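If you only want one category, such as "Business", you can select it before plotting. Below is a minimal sketch, not the only way to do it. The small dataframe is made up for illustration, and the column names Category, Month and Amount are assumptions based on the question, so adjust them to your data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the real data (values and column names are assumptions)
df = pd.DataFrame({
    "Category": ["Business", "Business", "Business", "Private", "Private", "Business"],
    "Month": [1, 2, 3, 1, 2, 1],
    "Amount": [100.0, 150.0, 120.0, 40.0, 60.0, 50.0],
})

# Sum Amount per (Category, Month), then keep only the "Business" rows
monthly = df.groupby(["Category", "Month"])["Amount"].sum()
business = monthly.loc["Business"]  # Series indexed by Month

fig, ax = plt.subplots(figsize=(6, 4))
business.plot(kind="line", marker="o", ax=ax)
ax.set_xticks(business.index)
ax.set_xlabel("Month")
ax.set_ylabel("Amount")
ax.set_title("Business: Amount per month")
fig.tight_layout()
# In an interactive session, call plt.show() here.
```

Use kind="bar" instead of kind="line" if you prefer bars for a single category.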
Case:
I receive a dataframe with (say 50) columns.
I extract the necessary columns from that dataframe using a condition.
So we have a list of selected columns of our dataframe now. (Say this variable is sel_cols)
I need a bar chart for each of these columns value_counts().
And I need to arrange all these bar charts in 3 columns, and varying number of rows based on number of columns selected in sel_cols.
So, if say 8 columns were selected, I want the figure to have 3 columns and 3 rows, with last subplot empty or just 8 subplots in 3x3 matrix if that is possible.
I could generate each chart separately using following code:
for col in sel_cols:
    df[col].value_counts().plot(kind='bar')
    plt.show()
plt.show() is inside the loop so that each chart is shown and not just the last one.
I also tried appending these charts to a list this way:
charts = []
for col in sel_cols:
    charts.append(df[col].value_counts().plot(kind='bar'))
I could convert this list into a NumPy array through reshape(), but then its length would have to divide evenly into that shape. So 8 chart objects cannot be reshaped into a 3x3 array.
Then I tried creating the subplots first in this way:
row = len(sel_cols)//3
fig, axes = plt.subplots(nrows=row,ncols=3)
This way I would get the subplots, but I get two problems:
I end up with extra subplots in the 3 columns which will go unplotted (8 columns example).
I do not know how to plot under each subplots through a loop.
I tried this:
for row in axes:
    for chart, col in zip(row, sel_cols):
        chart = data[col].value_counts().plot(kind='bar')
But this only plots the last subplot with the last column. All other subplots stay blank.
How to do this with minimal lines of code, possibly without any need for human verification of the final subplots placements?
You may use this sample dataframe:
pd.DataFrame({'A':['Y','N','N','Y','Y','N','N','Y','N'],
'B':['E','E','E','E','F','F','F','F','E'],
'C':[1,1,0,0,1,1,0,0,1],
'D':['P','Q','R','S','P','Q','R','P','Q'],
'E':['E','E','E','E','F','F','G','G','G'],
'F':[1,1,0,0,1,1,0,0,1],
'G':['N','N','N','N','Y','N','N','Y','N'],
'H':['G','G','G','E','F','F','G','F','E'],
'I':[1,1,0,0,1,1,0,0,1],
'J':['Y','N','N','Y','Y','N','N','Y','N'],
'K':['E','E','E','E','F','F','F','F','E'],
'L':[1,1,0,0,1,1,0,0,1],
})
Selected columns are: sel_cols = ['A','B','D','E','G','H','J','K']
Total 8 columns.
Expected output is bar charts for value_counts() of each of these columns arranged in subplots in a figure with 3 columns. Rows to be decided based on number of columns selected, here 8 so 3 rows.
Given OP's sample data:
df = pd.DataFrame({'A':['Y','N','N','Y','Y','N','N','Y','N'],'B':['E','E','E','E','F','F','F','F','E'],'C':[1,1,0,0,1,1,0,0,1],'D':['P','Q','R','S','P','Q','R','P','Q'],'E':['E','E','E','E','F','F','G','G','G'],'F':[1,1,0,0,1,1,0,0,1],'G':['N','N','N','N','Y','N','N','Y','N'],'H':['G','G','G','E','F','F','G','F','E'],'I':[1,1,0,0,1,1,0,0,1],'J':['Y','N','N','Y','Y','N','N','Y','N'],'K':['E','E','E','E','F','F','F','F','E'],'L':[1,1,0,0,1,1,0,0,1]})
sel_cols = list('ABDEGHJK')
data = df[sel_cols].apply(pd.Series.value_counts)  # pd.value_counts is deprecated in recent pandas
We can plot the columns of data in several ways (in order of simplicity):
DataFrame.plot with subplots param
seaborn.catplot
Loop through plt.subplots
1. DataFrame.plot with subplots param
Set subplots=True with the desired layout dimensions. Unused subplots will be auto-disabled:
data.plot.bar(subplots=True, layout=(3, 3), figsize=(8, 6),
sharex=False, sharey=True, legend=False)
plt.tight_layout()
2. seaborn.catplot
melt the data into long-form (i.e., 1 variable per column, 1 observation per row) and pass it to seaborn.catplot:
import seaborn as sns
melted = data.melt(var_name='var', value_name='count', ignore_index=False).reset_index()
sns.catplot(data=melted, kind='bar', x='index', y='count',
col='var', col_wrap=3, sharex=False)
3. Loop through plt.subplots
zip the columns and axes to iterate in pairs. Use the ax param to place each column onto its corresponding subplot.
If the grid size is larger than the number of columns (e.g., 3*3 > 8), disable the leftover axes with set_axis_off:
fig, axes = plt.subplots(3, 3, figsize=(8, 8), constrained_layout=True, sharey=True)
# plot each col onto one ax
for col, ax in zip(data.columns, axes.flat):
    data[col].plot.bar(ax=ax, rot=0)
    ax.set_title(col)
# disable leftover axes
for ax in axes.flat[data.columns.size:]:
    ax.set_axis_off()
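The grid above is hardcoded to 3x3. If the number of selected columns varies, the row count can be derived with a ceiling division. The following is a sketch under that assumption. It uses the sample data from the question (only the selected columns) and calls value_counts() per column inside the loop; the names ncols, nrows and flat_axes are introduced here for illustration.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, restricted to the selected columns
df = pd.DataFrame({
    "A": ["Y", "N", "N", "Y", "Y", "N", "N", "Y", "N"],
    "B": ["E", "E", "E", "E", "F", "F", "F", "F", "E"],
    "D": ["P", "Q", "R", "S", "P", "Q", "R", "P", "Q"],
    "E": ["E", "E", "E", "E", "F", "F", "G", "G", "G"],
    "G": ["N", "N", "N", "N", "Y", "N", "N", "Y", "N"],
    "H": ["G", "G", "G", "E", "F", "F", "G", "F", "E"],
    "J": ["Y", "N", "N", "Y", "Y", "N", "N", "Y", "N"],
    "K": ["E", "E", "E", "E", "F", "F", "F", "F", "E"],
})
sel_cols = list("ABDEGHJK")

ncols = 3
nrows = math.ceil(len(sel_cols) / ncols)  # round up so every column gets a subplot

# squeeze=False keeps axes 2-D even when there is only one row
fig, axes = plt.subplots(nrows, ncols, figsize=(8, 2.5 * nrows), squeeze=False)
flat_axes = axes.ravel()

# draw one bar chart of value counts per selected column
for col, ax in zip(sel_cols, flat_axes):
    df[col].value_counts().plot.bar(ax=ax, rot=0)
    ax.set_title(col)

# hide the axes that were not used
for ax in flat_axes[len(sel_cols):]:
    ax.set_axis_off()

fig.tight_layout()
# In an interactive session, call plt.show() here.
```

Passing ax=ax is the key step: without it, pandas draws on the currently active axes, which is why the loop in the question only filled the last subplot.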
As an alternative to the answer by tdy, I tried to do it without seaborn, using Matplotlib and a for loop.
This may suit those who want specific control over each subplot's formatting and other parameters:
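# here, data is the original dataframe (not the value_counts table from the answer above)
fig = plt.figure(1, figsize=(16, 12))
for i, col in enumerate(sel_cols, 1):
    fig.add_subplot(3, 3, i)  # 3 rows x 3 columns; i is the 1-based position
    data[col].value_counts().plot(kind='bar', ax=plt.gca())
    plt.title(col)
plt.tight_layout()
plt.show()
fig.add_subplot creates a subplot and makes it the active one, while plt.gca() returns the active subplot.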
I'm automating scatter plots with regression lines from my dataframe (example).
I want to plot Column2 against Column3 and Column2 against Column4 in separate scatter plots, grouped by Column1. For example, there will be 3 scatter plots of Column2 against Column3, titled A, B, and C.
For the plotting I'm using the pandas scatter plot, for example:
df.groupby('Column1').plot.scatter(x='Column2',y='Column3')
The scatter plot returns exactly 3 plots, but I want to know how to add a chart title based on the grouping from Column1, and also how to add the regression line. I haven't used seaborn or matplotlib yet because they're quite confusing for me, so I'd appreciate it if you could explain more :))
EDIT 1
Sorry for not being clear before. My code was already running fine, but with output like this.
It's OK, but it's hard to see which plot belongs to which group. Hence my intention is to create something like this.
EDIT 2
Shoutout to everyone who kindly helped. Caina especially helped me a lot.
This is the code he wrote, including several additions from the comments.
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
fig.tight_layout(pad=3)
# If `fig.tight_layout(pad=3)` does not work:
# plt.subplots_adjust(wspace=0.5)
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i])
    axes[i].set_title(gname)
    axes[i].set_ylim(0,)
And this is my result plot
The titles and the axes work beautifully. This thread can be considered closed, as I got the help I needed for the plot.
But as you can see, the bottom plot looks odd: its x axis runs from 0 to 600 and its y axis from 0 to 25, although all plots should have the same y axis format. The other charts are also stretched horizontally.
I'm trying to use the method here, but without much success with the square or equal parameter.
Can I ask for further help on how to adjust the axes so each plot is square? Thank you!
You can iterate over your groups, specifying the title:
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i]).set_title(gname)
You can also use seaborn.FacetGrid directly:
g = sns.FacetGrid(df, col='Column1')
g.map(sns.regplot, 'Column2', 'Column3')
Edit (further customization based on new requirements in comments):
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
fig.tight_layout(pad=3)
# If `fig.tight_layout(pad=3)` does not work:
# plt.subplots_adjust(wspace=0.5)
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i])
    axes[i].set_title(gname)
    axes[i].set_ylim(0,)
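For the follow-up about a common y axis and square plots, one option is to share the y axis across subplots and fix the box aspect of each one. This is only a sketch: the data below is invented, and the regression line is a plain least-squares fit with numpy.polyfit rather than seaborn.regplot, so it draws no confidence band. Axes.set_box_aspect needs Matplotlib 3.3 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data using the column names from the question
df = pd.DataFrame({
    "Column1": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "Column2": [1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400],
    "Column3": [2, 4, 5, 8, 3, 5, 9, 11, 5, 9, 16, 20],
})

groups = df.groupby("Column1")
# sharey=True gives every subplot the same y limits
fig, axes = plt.subplots(1, groups.ngroups, figsize=(12, 4),
                         sharey=True, squeeze=False)

for ax, (gname, gdata) in zip(axes.ravel(), groups):
    ax.scatter(gdata["Column2"], gdata["Column3"])
    # straight line fitted to this group's points by least squares
    slope, intercept = np.polyfit(gdata["Column2"], gdata["Column3"], deg=1)
    xs = np.array([gdata["Column2"].min(), gdata["Column2"].max()])
    ax.plot(xs, slope * xs + intercept, color="red")
    ax.set_title(gname)
    ax.set_xlabel("Column2")
    ax.set_box_aspect(1)  # square plotting area, independent of the data ranges

axes[0, 0].set_ylabel("Column3")
axes[0, 0].set_ylim(bottom=0)  # applies to all subplots because y is shared
fig.tight_layout()
# In an interactive session, call plt.show() here.
```

The same two settings (sharey=True and set_box_aspect(1)) should also work with the sns.regplot loop above, since regplot draws on the axes you pass in.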
You can also use seaborn's built-in functionality. To see the correlations, call df.corr(); pass numeric_only=True so that non-numeric columns such as Column1 are skipped.
import seaborn as sns
sns.heatmap(df.corr(numeric_only=True), annot=True)  # correlation heatmap
You can also use pairplot.
# set everything in one call: each bare sns.set() call would reset the style to its default
sns.set_theme(style='whitegrid', font_scale=1.3, rc={'figure.figsize': (13.7, 10.27)})
sns.set_palette("cubehelix", 8)
sns.pairplot(df)
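Note that df.corr() looks at the whole dataframe at once. If you want the correlation of Column2 with Column3 and Column4 separately for each group in Column1, a grouped version can be sketched as below. The data is made up for illustration and the variable name per_group is my own.

```python
import pandas as pd

# Hypothetical data using the column names from the question
df = pd.DataFrame({
    "Column1": ["A"] * 4 + ["B"] * 4,
    "Column2": [1, 2, 3, 4, 1, 2, 3, 4],
    "Column3": [2, 4, 6, 8, 8, 6, 4, 2],
    "Column4": [1, 3, 2, 5, 2, 2, 3, 1],
})

# Correlation matrix per group, then keep only the Column2 row of each matrix
per_group = (
    df.groupby("Column1")[["Column2", "Column3", "Column4"]]
      .corr()
      .xs("Column2", level=1)[["Column3", "Column4"]]
)
print(per_group)
```

Each row of per_group is one group from Column1, and each column holds the Pearson correlation of Column2 with that column.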
Here's what I did. You can try something like this.
fig, axes = plt.subplots(ncols=3, figsize=(12, 6))
plt.subplots_adjust(wspace=0.5, hspace=0.3)
# filter one group per subplot; df.groupby(...).plot with a single ax would draw every group on that ax
df[df['Column1'] == 'A'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[0])
axes[0].set_title("A")
df[df['Column1'] == 'B'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[1])
axes[1].set_title("B")
df[df['Column1'] == 'C'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[2])
axes[2].set_title("C")
plt.show()
The output of this is:
If you want to stack the plots vertically, change ncols to nrows and adjust the figsize.
fig, axes = plt.subplots(nrows=3, figsize=(3, 12))
plt.subplots_adjust(wspace=0.2, hspace=.5)
# filter one group per subplot; df.groupby(...).plot with a single ax would draw every group on that ax
df[df['Column1'] == 'A'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[0])
axes[0].set_title("A")
df[df['Column1'] == 'B'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[1])
axes[1].set_title("B")
df[df['Column1'] == 'C'].plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=axes[2])
axes[2].set_title("C")
plt.show()
This will give you: