Here's my chart:
Unfortunately, a second figure also shows up right below it:
This is the code:
fig,ax1 = plt.subplots(6,1, figsize=(20,10),dpi=300)
fig2,ax2 = plt.subplots(6,1, figsize=(20,10),dpi=300)
for index, val in enumerate(datedf.columns):
    g = ax1[index].plot(datedf.index, datedf[val], color=colors[index])
    ax1[index].set(ylim=[-100,6500])
    ax2[index] = ax1[index].twinx()
    a = ax2[index].plot(qtydf.index, qtydf[val], color=colors[index], alpha=0.5)
    ax2[index].set(ylim=[200,257000])
I tried this answer, but I got an error on the first line ("too many values to unpack").
Can anyone explain why?
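You call plt.subplots() twice, so you create two figures. The second one (fig2) never receives any data, because each ax2[index] is immediately replaced by the twin axes that ax1[index].twinx() creates on the first figure.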
Instead you should do something like:
fig, axes = plt.subplots(6,1, figsize=(20,10),dpi=300)
for index, val in enumerate(datedf.columns):
    ax1 = axes[index]
    g = ax1.plot(datedf.index, datedf[val], color=colors[index])
    ax1.set(ylim=[-100,6500])
    ax2 = ax1.twinx()
    ax2.plot(qtydf.index, qtydf[val], color=colors[index], alpha=0.5)
    ax2.set(ylim=[200,257000])
NB. The code is untested as I don't have the original dataset.
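As a quick check of the single-figure approach, here is a self-contained sketch with synthetic data. The datedf, qtydf and colors objects below are hypothetical stand-ins for the asker's data, so only the structure (one figure, one twin axis per row) carries over.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to show windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's data: 6 columns on a shared date index
idx = pd.date_range("2021-01-01", periods=50, freq="D")
rng = np.random.default_rng(0)
cols = [f"s{k}" for k in range(6)]
datedf = pd.DataFrame(rng.uniform(0, 6000, (50, 6)), index=idx, columns=cols)
qtydf = pd.DataFrame(rng.uniform(200, 250000, (50, 6)), index=idx, columns=cols)
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"][:6]

# Only one call to plt.subplots, hence only one figure
fig, axes = plt.subplots(6, 1, figsize=(20, 10))
for index, val in enumerate(datedf.columns):
    ax1 = axes[index]
    ax1.plot(datedf.index, datedf[val], color=colors[index])
    ax1.set(ylim=[-100, 6500])
    ax2 = ax1.twinx()  # the twin axes is created on the same figure as ax1
    ax2.plot(qtydf.index, qtydf[val], color=colors[index], alpha=0.5)
    ax2.set(ylim=[200, 257000])

# fig now holds 12 axes: 6 primary ones plus 6 twins
```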
I am a bit new to Python, and I am playing with a dummy dataset to practice data manipulation. Below is the code that generates the dummy data:
import pandas as pd
d = {
    'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0] ,
    'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1] ,
    'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1] ,
    'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1] ,
    'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2] ,
    'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0] }
CarWash = pd.DataFrame(data = d)
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')
I am struggling with two things:
#1. A stacked barplot with absolute values (like the Excel example below)
#2. A stacked barplot with percentage values (like the Excel example below)
Below are my target visualizations for #1 and #2, which I would like to reproduce with countplot().
#1: [image: Excel target chart]
#2: [image: Excel target chart]
For #1, countplot() only lets me make a clustered (side-by-side) barplot rather than a stacked one, as below, and the annotation snippet feels more like a workaround than elegant Python.
# Looping through each categorical column and viewing target variable distribution (ReversedPayment) by value
figure, axes = plt.subplots(2,2,figsize = (10,10))
for i,ax in zip(categoricals[:-1],axes.flatten()):
    sns.countplot(x= i, hue = 'ReversedPayment', data = CarWash, ax = ax)
    for p in ax.patches:
        height = np.nan_to_num(p.get_height()) # gets the height of each patch/bar
        adjust = np.nan_to_num(p.get_width())/2 # a calculation for adjusting the data label later
        label_xy = (np.nan_to_num(p.get_x()) + adjust,np.nan_to_num(p.get_height()) + adjust) #x,y coordinates where we want to put our data label
        ax.annotate(height,label_xy) # final annotation
For #2, I tried creating a new DataFrame holding the percentage values, but that felt tedious and error-prone.
I feel that options like stacked=True, proportion=True, axis=1, annotate=True would have been very useful for countplot() to have.
Are there any other libraries that are more straightforward and less code-intensive? Any comments or suggestions are welcome.
In this case, I think plotly.express may be more intuitive for you.
import plotly.express as px
df_temp = CarWash.groupby(['SeniorCitizen', 'ReversedPayment'])['DistancefromBranch'].count().reset_index().rename({'DistancefromBranch':'count'}, axis=1)
fig = px.bar(df_temp, x="SeniorCitizen", y="count", color="ReversedPayment", title="SeniorCitizen", text='count')
fig.update_traces(textposition='inside')
fig.show()
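If you would rather stay with matplotlib, a comparable stacked count chart can be sketched with pd.crosstab plus Axes.bar_label (which needs matplotlib 3.4 or newer). This is a minimal sketch using two columns of the question's dummy data; the crosstab replaces the groupby/count/reset_index chain, and bar_label replaces the manual patch arithmetic.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to show windows
import matplotlib.pyplot as plt
import pandas as pd

# Two columns of the question's dummy data
CarWash = pd.DataFrame({
    "SeniorCitizen":   [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
    "ReversedPayment": [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0],
})

# Rows: SeniorCitizen values; columns: ReversedPayment values; cells: counts
counts = pd.crosstab(CarWash["SeniorCitizen"], CarWash["ReversedPayment"])
ax = counts.plot.bar(stacked=True, rot=0, title="SeniorCitizen")

# One BarContainer per ReversedPayment value; print each count inside its segment
for container in ax.containers:
    ax.bar_label(container,
                 labels=[f"{v:.0f}" if v else "" for v in container.datavalues],
                 label_type="center")
```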
Basically, if you want more flexibility in adjusting your charts, it is hard to avoid writing a lot of code.
I also tried using matplotlib and pandas to create a stacked bar chart of percentages; if you are interested, you can try it:
sns.set()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=[12,8], dpi=100)
# Convert the axes matrix to a 1-d array
axes = ax.flatten()
for i, col in enumerate(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']):
    # Count the ReversedPayment values within each category of col
    df_temp = (CarWash.groupby(col)['ReversedPayment']
               .value_counts()
               .unstack(1).fillna(0)
               .rename({0:'No', 1:'Yes'})
               .rename({0:'No', 1:'Yes'}, axis=1))
    # Normalise each row so that every stacked bar sums to 100%
    df_temp = df_temp.div(df_temp.sum(axis=1), axis=0)
    df_temp.plot.bar(stacked=True, ax=axes[i])
    axes[i].set_title(col, y=1.03, fontsize=16)
    rects = axes[i].patches
    # pandas draws the bars column by column, so flatten the values in
    # column-major ('F') order to pair each label with its own segment
    labels = df_temp.values.flatten(order='F')
    for rect, label in zip(rects, labels):
        if label == 0: continue
        axes[i].text(rect.get_x() + rect.get_width() / 2, rect.get_y() + rect.get_height() / 3, '{:.2%}'.format(label),
                     ha='center', va='bottom', color='white', fontsize=12)
    axes[i].legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left', title_fontsize = 10, fontsize=10)
    axes[i].tick_params(rotation=0)
plt.tight_layout()
plt.show()
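A shorter variant of the percentage chart, sketched under the assumption that matplotlib 3.4 or newer is available: pd.crosstab(..., normalize="index") builds the per-bar shares directly (each row sums to 1, so each bar reaches 100%), and Axes.bar_label places the labels without pairing patches with values by hand.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to show windows
import matplotlib.pyplot as plt
import pandas as pd

# The question's dummy data (DistancefromBranch omitted, it is not plotted here)
CarWash = pd.DataFrame({
    "SeniorCitizen":   [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
    "CollegeDegree":   [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1],
    "Married":         [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
    "FulltimeJob":     [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1],
    "ReversedPayment": [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0],
})

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for col, ax in zip(["SeniorCitizen", "CollegeDegree", "Married", "FulltimeJob"], axes.flat):
    # Share of each ReversedPayment value within every category of col (rows sum to 1)
    pct = pd.crosstab(CarWash[col], CarWash["ReversedPayment"], normalize="index")
    pct = pct.rename(index={0: "No", 1: "Yes"}, columns={0: "No", 1: "Yes"})
    pct.plot.bar(stacked=True, ax=ax, rot=0)
    # One BarContainer per stacked segment; skip empty segments
    for container in ax.containers:
        ax.bar_label(container,
                     labels=[f"{v:.0%}" if v > 0 else "" for v in container.datavalues],
                     label_type="center", color="white")
    ax.set_title(col)
    ax.legend(title="Reversed\nPayment")
fig.tight_layout()
```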
secondHold = np.zeros((96,30))
channel = ['channel' for x in range(96)]
for i in range (96):
    BlankBinsx = bins[blankposition,0:30,i]
    StimBinsx = bins[NonBlankPositions,0:30,i]
    meanx = BlankBinsx.mean(axis=0);
    stimmeanx = StimBinsx.mean(axis=0);
    for j in range(30):
        hold[i][j] = meanx[j];
        secondHold[i][j] = stimmeanx[j];
    plt.subplots(1, 1, sharex='all', sharey='all')
    plt.plot(hold[i], label='stimulus')
    plt.plot(secondHold[i], label='Blank Stimulus')
    plt.title('Channel x')
    plt.xlabel('time (ms)')
    plt.ylabel('Avg Spike Rate')
    plt.legend()
    plt.show()
I am creating 96 different graphs in a for loop, and I want it to label the graphs as well (i.e., the first graph would be 'Channel 1', the second 'Channel 2', and so on). I tried ax.set_title, but couldn't figure out how to make it work with a string and a number.
I'd also like the graphs to be displayed as a 6x16 grid of subplots instead of 96 separate graphs in a column.
You create a new figure on every pass through your for loop, which is why you get 96 figures. I don't have your data, so I can't show the final figure, but the following should work for you. The idea is to:
Define one figure and an array of axes containing 6x16 subplots.
Use enumerate on axes.flatten() to iterate through the subplots row-wise, using i as the index into the data.
Use the %d format specifier to number the subplot titles.
Put plt.show() outside the for loop.
hold = np.zeros((96,30))
secondHold = np.zeros((96,30))
channel = ['channel' for x in range(96)]
# One figure with a 6x16 grid of axes; a large figsize keeps the 96 panels readable
fig, axes = plt.subplots(nrows=6, ncols=16, sharex='all', sharey='all', figsize=(40, 15))
for i, ax in enumerate(axes.flatten()):
    BlankBinsx = bins[blankposition,0:30,i]
    StimBinsx = bins[NonBlankPositions,0:30,i]
    # Average over trials; whole rows can be assigned without an inner copy loop
    hold[i] = BlankBinsx.mean(axis=0)
    secondHold[i] = StimBinsx.mean(axis=0)
    # hold comes from the blank trials, secondHold from the stimulus trials
    ax.plot(secondHold[i], label='stimulus')
    ax.plot(hold[i], label='Blank Stimulus')
    ax.set_title('Channel %d' % (i + 1))  # i starts at 0, so add 1 to get 'Channel 1'
    ax.set_xlabel('time (ms)')
    ax.set_ylabel('Avg Spike Rate')
    ax.legend()
plt.show()
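A self-contained sketch of the same grid with synthetic data. It assumes bins has the shape (trials, time bins, channels) and that blankposition and NonBlankPositions are lists of trial indices; the arrays below are hypothetical stand-ins. Averaging over the trial axis for all channels at once removes both per-channel copy loops, and an f-string is an alternative to the %d specifier.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to show windows
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-ins for the asker's arrays: 10 trials x 40 time bins x 96 channels
rng = np.random.default_rng(0)
bins = rng.poisson(5, size=(10, 40, 96)).astype(float)
blankposition = [0, 1]                  # assumed: a list of trial indices
NonBlankPositions = list(range(2, 10))

# Mean over trials for every channel at once; .T turns (30, 96) into (96, 30)
hold = bins[blankposition, 0:30, :].mean(axis=0).T
secondHold = bins[NonBlankPositions, 0:30, :].mean(axis=0).T

fig, axes = plt.subplots(nrows=6, ncols=16, sharex="all", sharey="all", figsize=(40, 15))
for i, ax in enumerate(axes.flat):
    ax.plot(secondHold[i], label="Stimulus")
    ax.plot(hold[i], label="Blank stimulus")
    ax.set_title(f"Channel {i + 1}")   # same as 'Channel %d' % (i + 1)

# Shared axis labels and a single legend instead of 96 copies (matplotlib 3.4+)
fig.supxlabel("time (ms)")
fig.supylabel("Avg Spike Rate")
axes.flat[0].legend()
```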
Ok, so I've been trying to fix this since yesterday and can't find a solution.
I created 12 pandas DataFrames (named exp_1 to exp_12) holding the data of 12 different experiments; the column names are identical in all of them. I want to create a figure with a 12x4 grid of subplots: one row of 4 plots for each experiment.
So far, so good. Plotting works just fine; I am currently using this code (shortened to 4 plots here):
fig, axs = plt.subplots(nrows = 12, ncols=4, figsize = (15,27))
# x and y passed as keywords, which recent seaborn versions require
sns.regplot(x='MecA_SP', y='MecA_MP', data=exp_3, color='blue', ax=axs[0,0])
sns.regplot(x='blaOXA_SP', y='blaOXA_MP', data=exp_3, color='lime', ax=axs[0,1])
sns.regplot(x='Aph3_SP', y='Aph3_MP', data=exp_3, color='deeppink', ax=axs[0,2])
sns.boxplot(data=exp_3, orient='h', color='darkviolet', ax=axs[0,3])
fig.tight_layout()
plt.show()
But I'm trying to create these subplots with a loop, so that I don't have to type the sample names manually for every single DataFrame. This is what I have right now:
fig, axs = plt.subplots(nrows = 12, ncols=4, figsize = (14,5))
exps = {0: 'exp_1',1: 'exp_2',2: 'exp_3',3: 'exp_4',4: 'exp_5',5: 'exp_6',
        6:'exp_7',7: 'exp_8', 8:'exp_9',9: 'exp_10',10: 'exp_11',11: 'exp_12'}
for x in exps :
    sns.regplot('MecA_SP', 'MecA_MP', data=x, color ='blue', ax=axs[exps[x], 0])
    sns.regplot('blaOXA_SP', 'blaOXA_MP', color ='lime', data=x, ax=axs[exps[x], 1])
    sns.regplot('Aph3_SP', 'Aph3_MP', data=x, color = 'deeppink', ax=axs[exps[x], 2])
    sns.boxplot(data=x, orient ='h', color ='darkviolet', ax=axs[exps[x],3])
fig.tight_layout()
plt.show()
This is what my plot looks like if I don't use a loop and write the whole thing by hand:
[image: plot produced by the hand-written code]
Does anyone have an idea how I could solve this? I'd be happy about any suggestions. Thanks in advance!
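Store the DataFrames themselves in a list, not a dictionary of DataFrame names, and iterate over that list to create the subplots. In your loop, iterating over the dictionary yields its keys (0 to 11), so data=x receives an integer and axs[exps[x], 0] tries to index the axes array with a string such as 'exp_1'; neither refers to your DataFrames. Use enumerate to get a loop counter for the row position in axs.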
exps = [exp_1, exp_2, exp_3, exp_4, exp_5, exp_6,
        exp_7, exp_8, exp_9, exp_10, exp_11, exp_12]
fig, axs = plt.subplots(nrows = 12, ncols=4, figsize = (15,27))
for i, df in enumerate(exps):
    sns.regplot(x='MecA_SP', y='MecA_MP', data=df, color='blue', ax=axs[i, 0])
    sns.regplot(x='blaOXA_SP', y='blaOXA_MP', data=df, color='lime', ax=axs[i, 1])
    sns.regplot(x='Aph3_SP', y='Aph3_MP', data=df, color='deeppink', ax=axs[i, 2])
    sns.boxplot(orient='h', data=df, color='darkviolet', ax=axs[i, 3])
fig.tight_layout()
plt.show()
plt.clf()
plt.close()
The code below achieves what I want, but in a very roundabout way. I have looked for a succinct way to produce a single legend for a figure with multiple subplots that takes their labels into account, to no avail. plt.figlegend() requires you to pass in both labels and lines, and plt.legend() requires only handles (slightly better).
My example below illustrates what I want. I have 9 vectors, each belonging to one of 3 categories. I want to plot each vector on a separate subplot, label it, and plot a legend that indicates (using colour) what each label means; this is the automatic behaviour on a single plot.
Do you know of a better way of achieving the plot below?
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
nr_lines = 9
nr_cats = 3
np.random.seed(1337)
# Data
X = np.random.randn(nr_lines, 100)
labels = ['Category {}'.format(ii) for ii in range(nr_cats)]
y = np.random.choice(labels, nr_lines)
# Ideally wouldn't have to manually pick colours
clrs = matplotlib.rcParams['axes.prop_cycle'].by_key()['color']
clrs = [clrs[ii] for ii in range(nr_cats)]
lab_clr = {k: v for k, v in zip(labels, clrs)}
fig, ax = plt.subplots(3, 3)
ax = ax.flatten()
for ii in range(nr_lines):
    ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
lines = [a.lines[0] for a in ax]
l_labels = [l.get_label() for l in lines]
# the hack - get a single occurrence of each label
idx_list = [l_labels.index(lab) for lab in labels]
lines_ = [lines[idx] for idx in idx_list]
#l_labels_ = [l_labels[idx] for idx in idx_list]
plt.legend(handles=lines_, bbox_to_anchor=[2, 2.5])
plt.tight_layout()
plt.savefig('/home/james/Downloads/stack_figlegend_example.png',
            bbox_inches='tight')
You could use a dictionary to collect the line handles, using the label as the key. For example:
handles = {}
for ii in range(nr_lines):
    l1, = ax[ii].plot(X[ii,:], label=y[ii], color=lab_clr[y[ii]])
    if y[ii] not in handles:
        handles[y[ii]] = l1
plt.legend(handles=handles.values(), bbox_to_anchor=[2, 2.5])
You only add a handle to the dictionary if the category isn't already present.
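A variation on the same dictionary idea, sketched here as one option rather than the only way: gather handles from every subplot with get_legend_handles_labels(), keep the first handle per label, and pass them to fig.legend() so the legend belongs to the figure instead of one subplot. The "C0", "C1", ... colour codes refer to the default colour cycle, which removes the manual colour lookup. The random data uses NumPy's Generator, so it will not match the question's seeded values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it to show windows
import matplotlib.pyplot as plt
import numpy as np

nr_lines, nr_cats = 9, 3
rng = np.random.default_rng(1337)
X = rng.standard_normal((nr_lines, 100))
labels = [f"Category {k}" for k in range(nr_cats)]
y = rng.choice(labels, nr_lines)

fig, axs = plt.subplots(3, 3, figsize=(9, 7))
for a, row, cat in zip(axs.flat, X, y):
    # "Ck" picks the k-th colour of the default cycle, so each category keeps one colour
    a.plot(row, label=cat, color=f"C{labels.index(cat)}")

# Keep the first handle seen for each label across all subplots
by_label = {}
for a in axs.flat:
    for h, lab in zip(*a.get_legend_handles_labels()):
        by_label.setdefault(lab, h)

# List legend entries in category order and leave room on the right for them
order = [lab for lab in labels if lab in by_label]
leg = fig.legend([by_label[lab] for lab in order], order, loc="center right")
fig.subplots_adjust(right=0.78)
```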