How to make a stacked graph directly from a python groupby code? - python

I want to make a chart after applying a groupby.
So I have applied:
Sales_comparison = SalesData[['Region', 'Sales2015', 'Sales2016']].groupby(['Region']).agg(['sum'])
For the graph, I have tried:
ax = Sales_comparison[['Sales2015','Sales2016']].plot(kind='bar', title="Sales by region comparison", figsize=(7.5, 5), legend=True, fontsize=12)
ax.set_xlabel("Region", fontsize=12)
ax.set_ylabel("Sales", fontsize=12)
x = Sales_comparison.Region.index.tolist()
x_pos = [i for i, _ in enumerate(x)]
plt.xticks(x_pos, x)
plt.show()
But this does not produce the chart I want.
Is there an easier and shorter way to achieve this?
The data can be found in
Link for the data

Could you elaborate on what you mean by indexation?
I assume you mean labeling in the graph, which works in much the same way as what you have already done:
Sales_comparison = df[['Region', 'Sales2015', 'Sales2016']].groupby(['Region']).agg(['sum'])
ax = Sales_comparison.plot(kind='bar', stacked=True, legend=False, color=['navy','darkred'])
for i, label in enumerate(list(Sales_comparison.index)):
    # .agg(['sum']) creates two-level columns, so take the single 'sum' value with .iloc[0]
    S16 = int(Sales_comparison.loc[label]['Sales2016'].iloc[0])
    ax.annotate(str(S16), (i-0.2, S16+0.2*S16), color='white')
    S15 = int(Sales_comparison.loc[label]['Sales2015'].iloc[0])
    ax.annotate(str(S15), (i-0.2, S15-0.5*S15), color='white')
This results in the following image:
End Result with Labels
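A shorter route can be sketched as follows. This is only a minimal example, assuming a small made-up DataFrame in place of the linked data (the name sales_data and all values are hypothetical). Calling .sum() directly instead of .agg(['sum']) keeps the column names flat, so neither the plot nor the labels need extra indexing.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the asker's SalesData
sales_data = pd.DataFrame({
    "Region": ["East", "West", "East", "Central", "West", "Central"],
    "Sales2015": [100, 200, 150, 120, 80, 60],
    "Sales2016": [110, 190, 170, 100, 90, 75],
})

# .sum() (rather than .agg(['sum'])) leaves plain, single-level column names
sales_by_region = sales_data.groupby("Region")[["Sales2015", "Sales2016"]].sum()

ax = sales_by_region.plot(kind="bar", stacked=True, figsize=(7.5, 5),
                          title="Sales by region comparison")
ax.set_xlabel("Region")
ax.set_ylabel("Sales")

# Label each bar segment at its own midpoint, read straight from the patch geometry
for rect in ax.patches:
    if rect.get_height() > 0:
        ax.text(rect.get_x() + rect.get_width() / 2,
                rect.get_y() + rect.get_height() / 2,
                f"{rect.get_height():.0f}",
                ha="center", va="center", color="white")
```

Reading positions from the patches avoids hand-tuned offsets such as S16+0.2*S16; in an interactive session, plt.show() would display the figure.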

Related

Double the amount of subplots when using twinx() in matplotlib

Here's my chart:
Unfortunately, this is there too, right below:
This is the code:
fig,ax1 = plt.subplots(6,1, figsize=(20,10),dpi=300)
fig2,ax2 = plt.subplots(6,1, figsize=(20,10),dpi=300)
for index, val in enumerate(datedf.columns):
    g = ax1[index].plot(datedf.index, datedf[val], color=colors[index])
    ax1[index].set(ylim=[-100,6500])
    ax2[index] = ax1[index].twinx()
    a = ax2[index].plot(qtydf.index, qtydf[val], color=colors[index], alpha=0.5)
    ax2[index].set(ylim=[200,257000])
I tried this answer but I got an error on the first line (too many values to unpack).
Can anyone explain why?
You generate 2 figures, so you end up with 2 figures.
Instead you should do something like:
fig, axes = plt.subplots(6,1, figsize=(20,10),dpi=300)
for index, val in enumerate(datedf.columns):
    ax1 = axes[index]
    g = ax1.plot(datedf.index, datedf[val], color=colors[index])
    ax1.set(ylim=[-100,6500])
    ax2 = ax1.twinx()
    ax2.plot(qtydf.index, qtydf[val], color=colors[index], alpha=0.5)
    ax2.set(ylim=[200,257000])
NB. The code is untested as I don't have the original dataset.
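Since the original dataset is not available, here is a self-contained sketch of the same idea with synthetic data. The contents of datedf, qtydf and colors below are assumptions made up for illustration. The point it shows: plt.subplots(6, 1) returns one figure and an array of six axes, and each twinx() call adds its twin to that same figure, so no second figure is needed. (Unpacking that array into two names, as in fig, (ax1, ax2) = plt.subplots(6, 1), is one way to get a "too many values to unpack" error, though the question does not show which line was actually used.)

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for datedf and qtydf (same columns, same index)
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=30, freq="D")
cols = [f"item{k}" for k in range(6)]
datedf = pd.DataFrame(rng.uniform(0, 6000, (30, 6)), index=idx, columns=cols)
qtydf = pd.DataFrame(rng.uniform(500, 250000, (30, 6)), index=idx, columns=cols)
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"][:6]

# One figure only; the twin axes are created from the existing ones
fig, axes = plt.subplots(6, 1, figsize=(20, 10))
twin_axes = []
for ax1, val, color in zip(axes, datedf.columns, colors):
    ax1.plot(datedf.index, datedf[val], color=color)
    ax1.set(ylim=[-100, 6500])
    ax2 = ax1.twinx()  # shares the x-axis with ax1, lives in the same figure
    ax2.plot(qtydf.index, qtydf[val], color=color, alpha=0.5)
    ax2.set(ylim=[200, 257000])
    twin_axes.append(ax2)
```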

How to generate labelled barplots using seaborn?

I am a bit new to Python. And I am playing with a dummy dataset to get some Python data manipulation practice. Below is the code for generating the dummy data:
d = {
'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0] ,
'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1] ,
'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1] ,
'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1] ,
'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2] ,
'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0] }
CarWash = pd.DataFrame(data = d)
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')
I am basically struggling with a couple of things:
#1. A stacked barplot with absolute values (like the excel example below)
#2. A stacked barplot with percentage values (like the excel example below)
Below are my target visualizations for # 1 and # 2 using countplot().
#1
#2
For # 1, countplot() only lets me make a clustered barplot (like below) instead of a stacked one, and the annotation snippet feels more like a workaround than elegant Python.
# Loop through each categorical column and view the target variable (ReversedPayment) distribution by value
figure, axes = plt.subplots(2,2,figsize = (10,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    sns.countplot(x= i, hue = 'ReversedPayment', data = CarWash, ax = ax)
    for p in ax.patches:
        height = np.nan_to_num(p.get_height()) # gets the height of each patch/bar
        adjust = np.nan_to_num(p.get_width())/2 # an offset for adjusting the data label later
        label_xy = (np.nan_to_num(p.get_x()) + adjust,np.nan_to_num(p.get_height()) + adjust) # x,y coordinates where we want to put our data label
        ax.annotate(height,label_xy) # final annotation
For # 2, I tried creating a new data frame holding the % values, but that felt tedious and error-prone.
I feel that options like stacked = True, proportion = True, axis = 1, annotate = True would have been very useful for countplot() to have.
Are there any other libraries that would be more straightforward and less code-intensive? Any comments or suggestions are welcome.
In this case, I think plotly.express may be more intuitive for you.
import plotly.express as px
df_temp = CarWash.groupby(['SeniorCitizen', 'ReversedPayment'])['DistancefromBranch'].count().reset_index().rename({'DistancefromBranch':'count'}, axis=1)
fig = px.bar(df_temp, x="SeniorCitizen", y="count", color="ReversedPayment", title="SeniorCitizen", text='count')
fig.update_traces(textposition='inside')
fig.show()
Basically, if you want more flexibility to adjust your charts, it is hard to avoid writing a lot of code.
I also tried using matplotlib and pandas to create a stacked bar chart of percentages. If you are interested, you can try it:
sns.set()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=[12,8], dpi=100)
# Convert the axes matrix to a 1-d array
axes = ax.flatten()
for i, col in enumerate(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']):
    # Count the ReversedPayment values within each category of col
    df_temp = (CarWash.groupby(col)['ReversedPayment']
               .value_counts()
               .unstack(1).fillna(0)
               .rename({0:'No', 1:'Yes'})
               .rename({0:'No', 1:'Yes'}, axis=1))
    # Divide each row by its total so that every stacked bar sums to 100%
    df_temp = df_temp.div(df_temp.sum(axis=1), axis=0)
    df_temp.plot.bar(stacked=True, ax=axes[i])
    axes[i].set_title(col, y=1.03, fontsize=16)
    rects = axes[i].patches
    # Patches are ordered column by column, so flatten the transposed values to match them
    labels = df_temp.values.T.flatten()
    for rect, label in zip(rects, labels):
        if label == 0: continue
        axes[i].text(rect.get_x() + rect.get_width() / 2, rect.get_y() + rect.get_height() / 3, '{:.2%}'.format(label),
                     ha='center', va='bottom', color='white', fontsize=12)
    axes[i].legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left', title_fontsize = 10, fontsize=10)
    axes[i].tick_params(rotation=0)
plt.tight_layout()
plt.show()
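For the percentage chart, a more compact variant is possible with pd.crosstab and Axes.bar_label. The following is a sketch under stated assumptions, not a drop-in replacement: car_wash is a small hypothetical subset shaped like the CarWash data, and bar_label requires matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Small hypothetical subset in the shape of the CarWash data
car_wash = pd.DataFrame({
    "SeniorCitizen":   [0, 1, 0, 0, 1, 0, 1, 0],
    "ReversedPayment": [0, 0, 1, 0, 1, 0, 0, 1],
})

# normalize="index" divides each row by its total, so every bar sums to 1
shares = pd.crosstab(car_wash["SeniorCitizen"], car_wash["ReversedPayment"],
                     normalize="index")

ax = shares.plot.bar(stacked=True, rot=0)
for container in ax.containers:
    # One container per ReversedPayment value; label each segment at its center
    ax.bar_label(container,
                 labels=[f"{v:.0%}" for v in container.datavalues],
                 label_type="center", color="white")
ax.legend(title="ReversedPayment", bbox_to_anchor=(1.05, 1), loc="upper left")
```

Dropping normalize="index" gives the absolute counts for target # 1 with the same plotting code (the label format would then need to change from percentages to plain numbers).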

Change titles in a for loop for plt.plot and create 6x16 subplots

secondHold = np.zeros((96,30))
channel = ['channel' for x in range(96)]
for i in range(96):
    BlankBinsx = bins[blankposition,0:30,i]
    StimBinsx = bins[NonBlankPositions,0:30,i]
    meanx = BlankBinsx.mean(axis=0)
    stimmeanx = StimBinsx.mean(axis=0)
    for j in range(30):
        hold[i][j] = meanx[j]
        secondHold[i][j] = stimmeanx[j]
    plt.subplots(1, 1, sharex='all', sharey='all')
    plt.plot(hold[i], label='stimulus')
    plt.plot(secondHold[i], label='Blank Stimulus')
    plt.title('Channel x')
    plt.xlabel('time (ms)')
    plt.ylabel('Avg Spike Rate')
    plt.legend()
    plt.show()
I am creating 96 different graphs through a for-loop and I want it to also label the graphs (i.e., the first graph would be 'Channel 1', the second 'Channel 2', and so on). I tried ax.set_title but couldn't figure out how to make it work with both the string and the number.
I'd also like the graphs to be laid out as a 6x16 grid of subplots instead of 96 graphs in a column.
You are creating a new figure on every iteration of your for loop, which is why you get 96 figures. I don't have your data, so I can't provide a final figure, but the following should work for you. The idea is:
Define a figure and an array of axes containing 6x16 subplots.
Use enumerate on axes.flatten() to iterate through the subplots row-wise, with ax as the current subplot and i as the index used to access the data.
Use the format specifier %d to number the subplot titles.
Put plt.show() outside the for loop.
secondHold = np.zeros((96,30))
channel = ['channel' for x in range(96)]
fig, axes = plt.subplots(nrows=6, ncols=16, sharex='all', sharey='all')
for i, ax in enumerate(axes.flatten()):
    BlankBinsx = bins[blankposition,0:30,i]
    StimBinsx = bins[NonBlankPositions,0:30,i]
    meanx = BlankBinsx.mean(axis=0)
    stimmeanx = StimBinsx.mean(axis=0)
    for j in range(30):
        hold[i][j] = meanx[j]
        secondHold[i][j] = stimmeanx[j]
    ax.plot(hold[i], label='stimulus')
    ax.plot(secondHold[i], label='Blank Stimulus')
    # i starts at 0, so add 1 to get 'Channel 1' for the first subplot
    ax.set_title('Channel %d' % (i + 1))
    ax.set_xlabel('time (ms)')
    ax.set_ylabel('Avg Spike Rate')
    ax.legend()
plt.show()
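The same layout can be sketched with synthetic data so that it runs on its own. The arrays stim_mean and blank_mean are hypothetical placeholders for the per-channel means, and the shared legend and axis labels are a design choice of this sketch rather than part of the answer above; fig.supxlabel and fig.supylabel need matplotlib 3.4 or newer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 96 channels x 30 time bins
rng = np.random.default_rng(0)
stim_mean = rng.random((96, 30))
blank_mean = rng.random((96, 30))

fig, axes = plt.subplots(nrows=6, ncols=16, sharex="all", sharey="all",
                         figsize=(32, 12))
for i, ax in enumerate(axes.flatten()):
    ax.plot(stim_mean[i], label="stimulus")
    ax.plot(blank_mean[i], label="Blank Stimulus")
    ax.set_title(f"Channel {i + 1}")  # i is 0-based, channel labels are 1-based

# One shared legend and one pair of axis labels instead of 96 copies
handles, labels = axes[0, 0].get_legend_handles_labels()
fig.legend(handles, labels, loc="upper right")
fig.supxlabel("time (ms)")
fig.supylabel("Avg Spike Rate")
```

An f-string such as f"Channel {i + 1}" is equivalent to 'Channel %d' % (i + 1); either form combines the string with the number.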

Creating Matplotlib subplot using a loop that iterates columns from different Pandas Dataframes

Ok, so I've been trying to fix this since yesterday and can't find a solution.
I created 12 pandas DataFrames (named exp_1 - exp_12) for the data of 12 different experiments; the column names are identical in all of the DataFrames. I want to create a figure with a 12x4 grid of subplots: one row per experiment, with 4 plots in each row.
So far, so good. Plotting works just fine, I am currently using this code (I shortened it to 4 plots here):
fig, axs = plt.subplots(nrows = 12, ncols=4, figsize = (15,27))
sns.regplot('MecA_SP', 'MecA_MP', data=exp_3, color ='blue', ax=axs[0,0])
sns.regplot('blaOXA_SP', 'blaOXA_MP', color ='lime', data=exp_3, ax=axs[0,1])
sns.regplot('Aph3_SP', 'Aph3_MP', data=exp_3, color = 'deeppink', ax=axs[0,2])
sns.boxplot(data=exp_3, orient ='h', color ='darkviolet', ax=axs[0,3])
fig.tight_layout()
plt.show()
But I'm trying to create these subplots with a loop so that I don't have to manually type in the sample names for each and every DataFrame. Right now this is what I have:
fig, axs = plt.subplots(nrows = 12, ncols=4, figsize = (14,5))
exps = {0: 'exp_1',1: 'exp_2',2: 'exp_3',3: 'exp_4',4: 'exp_5',5: 'exp_6',
6:'exp_7',7: 'exp_8', 8:'exp_9',9: 'exp_10',10: 'exp_11',11: 'exp_12'}
for x in exps :
    sns.regplot('MecA_SP', 'MecA_MP', data=x, color ='blue', ax=axs[exps[x], 0])
    sns.regplot('blaOXA_SP', 'blaOXA_MP', color ='lime', data=x, ax=axs[exps[x], 1])
    sns.regplot('Aph3_SP', 'Aph3_MP', data=x, color = 'deeppink', ax=axs[exps[x], 2])
    sns.boxplot(data=x, orient ='h', color ='darkviolet', ax=axs[exps[x],3])
fig.tight_layout()
plt.show()
This is what my plot looks like if I don't use a loop and just write the whole thing out by hand:
[Image: the plot produced by the hand-written version]
Does anyone have an idea how I could solve this? I'd be happy about any suggestions, so thanks in advance.
Simply store your DataFrames themselves in a list, not their names in a dictionary, and then iterate over the list to create the subplots. Use enumerate to get a loop counter for the row position of each plot.
exps = [exp_1, exp_2, exp_3, exp_4, exp_5, exp_6,
        exp_7, exp_8, exp_9, exp_10, exp_11, exp_12]
fig, axs = plt.subplots(nrows = 12, ncols=4, figsize = (14,5))
for i, x in enumerate(exps):
    # Recent seaborn versions require x and y to be passed as keyword arguments
    sns.regplot(x='MecA_SP', y='MecA_MP', data=x, color='blue', ax=axs[i, 0])
    sns.regplot(x='blaOXA_SP', y='blaOXA_MP', data=x, color='lime', ax=axs[i, 1])
    sns.regplot(x='Aph3_SP', y='Aph3_MP', data=x, color='deeppink', ax=axs[i, 2])
    sns.boxplot(orient='h', data=x, color='darkviolet', ax=axs[i, 3])
fig.tight_layout()
plt.show()
plt.clf()
plt.close()
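If the experiment names are also needed (for example as subplot titles), a dictionary can still be used, as long as it maps each name to the DataFrame object rather than to a string. The sketch below assumes three small synthetic DataFrames and uses plain matplotlib scatter plots in place of sns.regplot to stay dependency-light; the data and the pairs list are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pairs = [("MecA_SP", "MecA_MP"), ("blaOXA_SP", "blaOXA_MP"), ("Aph3_SP", "Aph3_MP")]
columns = [name for pair in pairs for name in pair]

# Hypothetical experiments: each value is the DataFrame itself, not its name
experiments = {f"exp_{k}": pd.DataFrame(rng.random((20, 6)), columns=columns)
               for k in range(1, 4)}

fig, axs = plt.subplots(nrows=len(experiments), ncols=len(pairs), figsize=(12, 9))
for row, (name, df) in enumerate(experiments.items()):
    for col, (x_col, y_col) in enumerate(pairs):
        ax = axs[row, col]  # row from the experiment, column from the variable pair
        ax.scatter(df[x_col], df[y_col], s=10)
        ax.set_xlabel(x_col)
        ax.set_ylabel(y_col)
        ax.set_title(name)
fig.tight_layout()
```

Looping over the column pairs as well removes the need to repeat one plotting call per column.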

Matplotlib: Automatic coloured legend for all subplots using subplot line labels

The code below achieves what I want to do, but does so in a very roundabout way. I have looked around for a succinct way to produce a single legend for a figure that includes multiple subplots that takes into account their labels, to no avail. plt.figlegend() requires you to pass in labels and lines, and plt.legend() requires only handles (slightly better).
My example below illustrates what I want. I have 9 vectors, each with one of 3 categories. I want to plot each vector on a separate sub plot, label it, and plot a legend which indicates (using colour) what the label means; this is the automatic behaviour on a single plot.
Do you know of a better way of achieving the plot below?
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
nr_lines = 9
nr_cats = 3
np.random.seed(1337)
# Data
X = np.random.randn(nr_lines, 100)
labels = ['Category {}'.format(ii) for ii in range(nr_cats)]
y = np.random.choice(labels, nr_lines)
# Ideally wouldn't have to manually pick colours
clrs = matplotlib.rcParams['axes.prop_cycle'].by_key()['color']
clrs = [clrs[ii] for ii in range(nr_cats)]
lab_clr = {k: v for k, v in zip(labels, clrs)}
fig, ax = plt.subplots(3, 3)
ax = ax.flatten()
for ii in range(nr_lines):
    ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
lines = [a.lines[0] for a in ax]
l_labels = [l.get_label() for l in lines]
# the hack - get a single occurrence of each label
idx_list = [l_labels.index(lab) for lab in labels]
lines_ = [lines[idx] for idx in idx_list]
#l_labels_ = [l_labels[idx] for idx in idx_list]
plt.legend(handles=lines_, bbox_to_anchor=[2, 2.5])
plt.tight_layout()
plt.savefig('/home/james/Downloads/stack_figlegend_example.png', bbox_inches='tight')
You could use a dictionary to collect them using the label as a key. For example:
handles = {}
for ii in range(nr_lines):
    l1, = ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
    if y[ii] not in handles:
        handles[y[ii]] = l1
# Pass a list rather than a dict view to legend()
plt.legend(handles=list(handles.values()), bbox_to_anchor=[2, 2.5])
You only add a handle to the dictionary if the category isn't already present.
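The same de-duplication can also be done after plotting, by collecting handles and labels from every subplot with get_legend_handles_labels() and passing the unique ones to fig.legend. This is a sketch with synthetic data; the label assignment in y and the name colour_of are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1337)
categories = ["Category 0", "Category 1", "Category 2"]
y = [categories[k % 3] for k in range(9)]  # hypothetical label for each line
X = rng.standard_normal((9, 100))
# zip stops after three entries, so only the first three default colours are used
colour_of = dict(zip(categories, plt.rcParams["axes.prop_cycle"].by_key()["color"]))

fig, axes = plt.subplots(3, 3)
for ax, row, label in zip(axes.flatten(), X, y):
    ax.plot(row, label=label, color=colour_of[label])

# Gather handles from every subplot; setdefault keeps the first handle per label
unique = {}
for ax in axes.flatten():
    for handle, label in zip(*ax.get_legend_handles_labels()):
        unique.setdefault(label, handle)

# A figure-level legend is placed relative to the figure, not to the last subplot
legend = fig.legend(list(unique.values()), list(unique.keys()), loc="center right")
```

Using fig.legend avoids the bbox_to_anchor=[2, 2.5] offset, which positions the legend relative to whichever axes happens to be current.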
