Python & Pandas: Plotting a Pivot with multiple Indexes - python

Hi to all the experts,
I'm new to Python and Data Science, and I'm learning with a real-world example to get into Data Science.
I have already loaded a CSV and done some work on the data. That's ok. I have the following dataframe:
dataframe
Then, I created a Pivot from the dataframe:
pivot = pd.pivot_table(
    data=df,
    index=['Category', 'month', 'year'],
    values='Amount',
    aggfunc='sum',
    margins=True)
Now, I have the following dataframe:
new dataframe
Now, I want to plot the following (line chart or bar chart):
X: Month
Y: Amount
But I want to do that for a specific Category, such as "Business", to see how the amount changed over the periods.
What's the best way to plot a clear, attractive chart with matplotlib?
Thanks in Advance.
Many Greetings
Leon

You can use the below code to build the graphs. I think the stacked bar graphs would be a good way to see the Amount in each month.
Code
## Add AFTER you have created your pivot table
dfg = pivot.reset_index().set_index(['Month', 'Category']).sort_index(level=[0,1])
fig, ax = plt.subplots(figsize=(6,4))
dfg['Amount'].unstack().plot.bar(stacked=True, ax=ax, legend = False)
ax.set_xticklabels(sorted(df.Month.unique()), rotation=0)
ax.set_title('My Graph')
fig.legend(loc="upper right", bbox_to_anchor=(1.1, 0.9))
plt.show()
Stacked Bar graph
Unstacked Bar graph
Change stacked = True to stacked = False to see the bars next to each other, if you are not a fan of stacked bars
Line Graphs
You can also use line graphs, but not my personal preference.
Replace the plot.bar line in the code above with:
dfg['Amount'].unstack().plot(kind='line', marker='o', ax=ax, legend = False)
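If you only want one category such as "Business", as asked in the question, a minimal sketch could look like the following. The sample data is made up, the pivot is built without margins=True so that no "All" row ends up in the chart, and the index order (year before month) is my assumption so the points sort chronologically.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the CSV data in the question.
df = pd.DataFrame({
    "Category": ["Business", "Business", "Business", "Private", "Private", "Private"],
    "year": [2020, 2020, 2020, 2020, 2020, 2020],
    "month": [1, 2, 3, 1, 2, 3],
    "Amount": [100.0, 150.0, 120.0, 40.0, 60.0, 50.0],
})

# Pivot without margins, so there is no "All" total row to filter out later.
pivot = pd.pivot_table(df, index=["Category", "year", "month"],
                       values="Amount", aggfunc="sum")

# Pick a single category from the first index level; what remains is indexed by (year, month).
business = pivot.xs("Business", level="Category")["Amount"]

# Turn the (year, month) tuples into readable labels such as "2020-01".
business.index = [f"{year}-{month:02d}" for year, month in business.index]

fig, ax = plt.subplots(figsize=(6, 4))
business.plot(kind="line", marker="o", ax=ax)
ax.set_xlabel("Month")
ax.set_ylabel("Amount")
ax.set_title("Business: Amount per month")
# plt.show()  # uncomment when running interactively
```

Swapping kind="line" for kind="bar" gives the bar version of the same chart.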

Related

How to create a single series bar graph with legends using Python

How do I visualize the below dataframe in Python. I wish to visualize the data in a bar chart where the Year_of_Release is the X axis, Global_Sales is the bar height & the genre is the legend. The bar has to be colored separately for each Genre. I have shared a sample of what I'm looking for. The sample graph was created on R using GGPLOT.
Below are the column definitions
Year_of_Release - Year of Release
Genre - Game Genre
Global_Sales - Revenue made by a Genre in that given year
Images of the data frame and the desired bar plot are below
Data Frame:
Desired Bar Chart:
You can use the code below to plot what you need:
fig, ax = plt.subplots(figsize=(12,6))
sns.set_theme(style="darkgrid")
ax=sns.barplot(x="Year_of_Release", y="Global_Sales", hue="Genre", dodge=False, palette="rocket", data=df)
plt.xticks(rotation=90)
ax.grid(True)
ax.legend(loc='upper left')
plt.show()
Plot
with dummy data
...

Creating a bar chart with 2 y axes from lists using matplotlib

I need to make the following chart: Number of Companies, Donations vs Year as a bar chart.
The following is my data:
Year = [2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018]
No_Companies = [123558,132335,147606,155790,161211,169784,174599,183888,198727,207317,217357,228996]
Donations=[144932,304607,642328,870509,1205382,1094624,2089240,2325322,2387036,3096069,4204255,3500766]
From what I have seen from other questions, most seem to have either their data in a dataframe or a list like [[x1,y1],[x2,y2]].
How can I get the chart I need from the data I have?
You can check this link out: Plot bar and line in same plot, different y-axes using matplotlib (no pandas)
The implementation can be done as follows:
plt.figure(1, figsize=(10,10))
barchart = plt.bar(Year, No_Companies, color='red')
plt.ylabel('No Companies')
plt.twinx()
barchart1 = plt.bar(Year, Donations, color='blue')
plt.ylabel('Donations')
Graph
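Note that with the code above both sets of bars are drawn at the same x positions, so they overlap. A possible variation, sketched below, shifts each series by half a bar width so they sit side by side. The width value and the variable names ax_left and ax_right are my own choices, not from the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

Year = [2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]
No_Companies = [123558, 132335, 147606, 155790, 161211, 169784,
                174599, 183888, 198727, 207317, 217357, 228996]
Donations = [144932, 304607, 642328, 870509, 1205382, 1094624,
             2089240, 2325322, 2387036, 3096069, 4204255, 3500766]

x = np.arange(len(Year))  # one integer position per year
width = 0.4               # assumed bar width; two bars fill 0.8 of each slot

fig, ax_left = plt.subplots(figsize=(10, 6))
ax_right = ax_left.twinx()  # second y axis sharing the same x axis

# Shift each series by half a bar width so the bars do not cover each other.
bars_left = ax_left.bar(x - width / 2, No_Companies, width,
                        color="red", label="No Companies")
bars_right = ax_right.bar(x + width / 2, Donations, width,
                          color="blue", label="Donations")

ax_left.set_xticks(x)
ax_left.set_xticklabels(Year)
ax_left.set_xlabel("Year")
ax_left.set_ylabel("No Companies")
ax_right.set_ylabel("Donations")
ax_left.legend(handles=[bars_left, bars_right], loc="upper left")
# plt.show()  # uncomment when running interactively
```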

Plot sub-bar charts on a dataframe groupby

Hi I am having some trouble plotting sub-bar charts after a dataframe groupby
Post groupby, the data is as per the below :
I tried the below to create a bar chart.
df_temp[df_temp.index =='ABC'].unstack().plot.bar(figsize=(10,2))
How can I plot a bar charts where the x-axis is the date and y-axis is the count and each row (ABC and EFG) is its own subplot (vertically stacked)
Example below
thanks for your help !
Thanks to @r-beginnners.
#remove the multi-level column
df.columns = df.columns.droplevel()
#plot the sub-plots
# if y-axis scale to be the same, use sharey=True
df.T.plot(subplots=True, layout=(2,1), kind='bar', sharey=True)
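Since the original data was only shown as an image, here is a small self-contained sketch of the same idea with made-up counts. It assumes the frame already has one row per group (ABC, EFG) and one column per date, i.e. the state after dropping the extra column level.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts: one row per group, one column per date.
df = pd.DataFrame(
    [[3, 5, 2], [1, 4, 6]],
    index=["ABC", "EFG"],
    columns=["2020-01-01", "2020-01-02", "2020-01-03"],
)

# Transposing puts the dates on the x axis; each original row becomes its own subplot.
axes = df.T.plot(subplots=True, layout=(2, 1), kind="bar",
                 sharey=True, figsize=(8, 5))

# The result is an array of Axes; flatten it to loop regardless of its shape.
flat_axes = list(axes.ravel())
for ax in flat_axes:
    ax.set_ylabel("count")
# plt.show()  # uncomment when running interactively
```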

Show scatter plot title from column value

I'm making automatic scatterplot charting with regression from my dataframe example
I want to make correlation between Column2 to Column3 and Column2 to Column4 in separate scatter plot group by Column1. For example, there will be 3 scatter plot of Column2 to Column3 each with the title of A, B, and C
For the plotting I'm using pandas scatter plot for example:
df.groupby('Column1').plot.scatter(x='Column2',y='Column3')
The scatter plot returns exactly 3 plots, but I want to know how to add a chart title based on the grouping from Column1 and also how to add the regression line. I haven't used seaborn or matplotlib yet because they are quite confusing for me, so I'd appreciate it if you could explain more :))
EDIT 1
Sorry for not being clear before. My code previously running fine but with output like this.
It's ok but hard to look at which plot belong to which group. Hence my intention is to create something like this
EDIT 2
Shoutout to everyone who kindly helped. Caina especially helped me a lot, so a shoutout to him as well.
This is the code he wrote, based on several additions in the comments.
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
fig.tight_layout(pad=3)
# If `fig.tight_layout(pad=3)` does not work:
# plt.subplots_adjust(wspace=0.5)
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i])
    axes[i].set_title(gname)
    axes[i].set_ylim(0,)
And this is my result plot
The title and the axis works beautifully. This thread can be considered closed as I got the help for the plot.
But as you can see, the bottom plot looks odd: its x axis runs from 0 to 600 and its y axis from 0 to 25, although all the plots should share the same y axis format. The other charts are also stretched horizontally.
I tried the method here, but the square and equal parameters did not work for me.
Can I ask for further help on how to adjust the axes so that each plot is square? Thank you!
You can iterate over your groups, specifying the title:
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i]).set_title(gname)
You can also use seaborn.FacetGrid directly:
g = sns.FacetGrid(df, col='Column1')
g.map(sns.regplot, 'Column2', 'Column3')
Edit (further customization based on new requirements in comments):
fig, axes = plt.subplots(1, df.Column1.nunique(), figsize=(12,8))
groups = df.groupby('Column1')
fig.tight_layout(pad=3)
# If `fig.tight_layout(pad=3)` does not work:
# plt.subplots_adjust(wspace=0.5)
for i, (gname, gdata) in enumerate(groups):
    sns.regplot(x='Column2', y='Column3', data=gdata, ax=axes[i])
    axes[i].set_title(gname)
    axes[i].set_ylim(0,)
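For the follow-up about square plots and a shared y axis, one option that might help is sharey=True together with Axes.set_box_aspect(1), which needs matplotlib 3.3 or newer. The sketch below is only an illustration with made-up data. It uses plain matplotlib and np.polyfit for a straight-line fit instead of sns.regplot, so there is no confidence band.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with the column names used in the question.
df = pd.DataFrame({
    "Column1": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "Column2": [1, 2, 3, 4, 10, 20, 30, 40, 100, 200, 300, 400],
    "Column3": [2, 4, 6, 8, 5, 7, 9, 11, 20, 15, 10, 5],
})

groups = df.groupby("Column1")
# sharey=True gives every panel the same y axis range.
fig, axes = plt.subplots(1, groups.ngroups, figsize=(12, 4), sharey=True)

slopes = {}
for ax, (gname, gdata) in zip(axes, groups):
    x = gdata["Column2"].to_numpy()
    y = gdata["Column3"].to_numpy()
    ax.scatter(x, y)
    # Least-squares straight line through the points of this group.
    slope, intercept = np.polyfit(x, y, deg=1)
    xs = np.array([x.min(), x.max()])
    ax.plot(xs, slope * xs + intercept, color="red")
    ax.set_title(gname)
    ax.set_box_aspect(1)  # square plotting area, independent of the data ranges
    slopes[gname] = slope

fig.tight_layout()
# plt.show()  # uncomment when running interactively
```

set_box_aspect changes the shape of the plotting area only, so the x ranges can still differ between groups.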
You can also use the seaborn in-built functionality. If you want to see the correlation you can just do df.corr().
import seaborn as sns
sns.heatmap(df.corr(),annot=True) # This is HeatMap
You can also use pairplot.
sns.set_style('whitegrid')
sns.set(rc={'figure.figsize':(13.7,10.27)})
sns.set(font_scale=1.3)
sns.set_palette("cubehelix",8)
sns.pairplot(df)
Here's what I did. You can try something like this.
fig, axes = plt.subplots(ncols=3, figsize=(12,6))
plt.subplots_adjust(wspace=0.5, hspace=0.3)
# Draw each Column1 group on its own axes; passing a single ax to
# df.groupby(...).plot.scatter would put every group on that one axes.
for ax, (gname, gdata) in zip(axes, df.groupby('Column1')):
    gdata.plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=ax)
    ax.set_title(gname)
plt.show()
The output of this is:
If you want to see the plots stacked vertically, then change ncols to nrows and adjust the figsize.
fig, axes = plt.subplots(nrows=3, figsize=(3,12))
plt.subplots_adjust(wspace=0.2, hspace=.5)
# Draw each Column1 group on its own axes, titled with the group name.
for ax, (gname, gdata) in zip(axes, df.groupby('Column1')):
    gdata.plot.scatter(x='Column2', y='Column3', color="DarkBlue", ax=ax)
    ax.set_title(gname)
plt.show()
This will give you:

Pandas: using a pivot table to make bar graphs in subplots

I am working on looking at how age and time at a company influences resignation due to dissatisfaction.
I have a dataframe ("combined_updated") that has these columns:
"age_updated",
"service_cat", and
"dissatisfied."
"age_updated" and "service_cat" are strings. "age_updated" includes age ranges, and "service_cat" is a string which described the career stage of the employee (i.e. "New", "Experienced", etc.).
"dissatisfied" is boolean with True working as 1 and False as 0 in a pivot table. The pivot table therefore shows the % dissatisfied in certain groups.
I would like to make four bar graphs within a subplot with one graph looking at each career stage, with the y axis as % dissatisfied and the x axis as age.
So far I have written code that puts it all on one graph:
dis_pt = combined_updated.pivot_table(index="service_cat", columns="age_cleaned", values="dissatisfied")
dis_pt.plot(kind="bar")
Pivot Table
Graph so far
Does anyone know how to break this apart into subplots with appropriate labeling? Thanks!
You can set the axes for each plot individually.
ax = plt.subplot(4, 1, 1)  # rows, columns, index; the index starts at 1
dis_pt.iloc[0,:].plot(kind="bar", ax=ax)
ax = plt.subplot(4, 1, 2)
dis_pt.iloc[1,:].plot(kind="bar", ax=ax)
ax = plt.subplot(4, 1, 3)
dis_pt.iloc[2,:].plot(kind="bar", ax=ax)
ax = plt.subplot(4, 1, 4)
dis_pt.iloc[3,:].plot(kind="bar", ax=ax)
plt.show()
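The same thing can be written as a loop, which also makes it easy to label each subplot with its career stage. This is a rough sketch with invented sample rows; the column name age_updated follows the description in the question, and the cast to float is my addition so the mean is an explicit fraction.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample using the column names described in the question.
combined_updated = pd.DataFrame({
    "service_cat": ["New", "New", "Experienced", "Experienced",
                    "Established", "Established", "Veteran", "Veteran"],
    "age_updated": ["21-30", "31-40"] * 4,
    "dissatisfied": [True, False, False, False, True, True, False, True],
})
# Cast the booleans to float so the default mean aggregation yields a share between 0 and 1.
combined_updated["dissatisfied"] = combined_updated["dissatisfied"].astype(float)

dis_pt = combined_updated.pivot_table(index="service_cat",
                                      columns="age_updated",
                                      values="dissatisfied")

# One subplot per career stage (one per row of the pivot table).
fig, axes = plt.subplots(nrows=len(dis_pt), ncols=1, figsize=(6, 10),
                         sharex=True, sharey=True)
for ax, (stage, row) in zip(axes, dis_pt.iterrows()):
    row.plot(kind="bar", ax=ax)
    ax.set_title(stage)
    ax.set_ylabel("share dissatisfied")

fig.tight_layout()
# plt.show()  # uncomment when running interactively
```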
