Show first and last label in pandas plot - python

I have a DataFrame with 361 columns. I want to plot it but showing only the first and last columns in the legend. For instance:
d = {'col1':[1,2],'col2':[3,4],'col3':[5,6],'col4':[7,8]}
df = pd.DataFrame(data=d)
If I plot through df.plot() all the legends will be displayed, but I only want 'col1' and 'col4' in my legend with the proper color code (I am using a colormap) and legend title.
One way to do this is to plot each column separately through matplotlib without using legends and then plot two more empty plots with only the labels (example below), but I wonder if there is a direct way to do it with pandas.
for columns in df:
    plt.plot(df[columns])
plt.plot([],[],label=df.columns[0])
plt.plot([],[],label=df.columns[-1])
plt.legend()
plt.show()

Let's try extracting the handles and labels from the axes and building a new legend from them:
ax = df.plot()
handlers, labels = ax.get_legend_handles_labels()
new_handlers, new_labels = [], []
for h,l in zip(handlers, labels):
    if l in ['col1','col4']:
        new_handlers.append(h)
        new_labels.append(l)
ax.legend(new_handlers, new_labels)

You can also split your df into two DataFrames, the second containing only the columns of interest, then plot both on the same axes and show the legend only for the second.
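A minimal sketch of the split approach, using the sample df from the question. The 'viridis' colormap and the legend title 'Columns' are assumptions made for illustration; the colors are passed explicitly so that each column keeps the color it would get from the colormap even though the frame is plotted in two parts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6], 'col4': [7, 8]})

# Columns to show in the legend, and the remaining ones.
keep = [df.columns[0], df.columns[-1]]
rest = df.drop(columns=keep)

# One color per column sampled from a colormap ('viridis' is an arbitrary choice).
palette = plt.get_cmap('viridis')(np.linspace(0, 1, df.shape[1]))
colors = {col: to_hex(rgba) for col, rgba in zip(df.columns, palette)}

# First frame: drawn without a legend.
ax = rest.plot(color=[colors[c] for c in rest.columns], legend=False)
# Second frame: drawn on the same axes with the default legend.
df[keep].plot(ax=ax, color=[colors[c] for c in keep])
ax.get_legend().set_title('Columns')  # hypothetical legend title
```

Because the first call is made with legend=False, the legend created by the second call lists only the columns in keep.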

Related

Creating a single tidy seaborn plot in a 'for' loop

I'm trying to generate a plot in seaborn using a for loop to plot the contents of each dataframe column on its own row.
The number of columns that need plotting can vary between 1 and 30. However, the loop creates multiple individual plots, each with their own x-axis, which are not aligned and with a lot of wasted space between the plots. I'd like to have all the plots together with a shared x-axis without any vertical spacing between each plot that I can then save as a single image.
The code I have been using so far is below.
comp_relflux = measurements.filter(like='rel_flux_C', axis=1) # Extracts relevant columns from larger dataframe
comp_relflux=comp_relflux.reindex(comp_relflux.mean().sort_values().index, axis=1) # Sorts into order based on column mean.
plt.rcParams["figure.figsize"] = [12.00, 1.00]
for column in comp_relflux.columns:
    plt.figure()
    sns.scatterplot((bjd)%1, comp_relflux[column], color='b', marker='.')
This is a screenshot of the resultant plots.
I have also tried using FacetGrid, but this just seems to plot the last column's data.
p = sns.FacetGrid(comp_relflux, height=2, aspect=6, despine=False)
p.map(sns.scatterplot, x=(bjd)%1, y=comp_relflux[column])
To have a single set of x-axis labels instead of one per row, use sharex. Creating all the axes up front with plt.subplots(), one row per column, also puts every subplot in a single figure. Since no data was provided, I used random numbers below to demonstrate. My df has 4 columns of data, but I have kept as much of your code and naming convention as possible. Hope this is what you are looking for...
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
comp_relflux = pd.DataFrame(np.random.rand(100, 4))  # Random data - 4 columns
bjd = np.linspace(0, 1, 100)  # Series of 100 points - 0 to 1
rows = len(comp_relflux.columns)  # Number of columns = number of subplots
# sharex is assigned here; figsize is moved here from your rcParams
fig, ax = plt.subplots(rows, 1, sharex=True, figsize=(12, 6))
for i, column in enumerate(comp_relflux.columns):
    # x and y are passed by keyword, as recent seaborn versions require
    sns.scatterplot(x=bjd % 1, y=comp_relflux[column], color='b', marker='.', ax=ax[i])
1 output plot with 4 subplots
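The question also asks for no vertical spacing between the rows and for a single saved image. A rough sketch of one way to get that, assuming plain matplotlib scatter calls in place of seaborn, random stand-in data, and a hypothetical output filename:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
comp_relflux = pd.DataFrame(rng.random((100, 4)))  # stand-in data, 4 columns
bjd = np.linspace(0, 1, 100)

n_rows = len(comp_relflux.columns)
# hspace=0 removes the vertical gap between rows; squeeze=False keeps
# axes two-dimensional even when there is only one column to plot.
fig, axes = plt.subplots(n_rows, 1, sharex=True, squeeze=False,
                         figsize=(12, n_rows), gridspec_kw={"hspace": 0})
for ax, column in zip(axes[:, 0], comp_relflux.columns):
    ax.scatter(bjd % 1, comp_relflux[column], color="b", marker=".")

# Hypothetical filename; all rows end up in this one image.
fig.savefig("comp_relflux.png", bbox_inches="tight")
```

Scaling the figure height with the number of rows keeps each panel about one inch tall, which matches the figsize used in the question.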

plotting with subplots in a loop python [duplicate]

Case:
I receive a dataframe with (say 50) columns.
I extract the necessary columns from that dataframe using a condition.
So we have a list of selected columns of our dataframe now. (Say this variable is sel_cols)
I need a bar chart for each of these columns value_counts().
And I need to arrange all these bar charts in 3 columns, and varying number of rows based on number of columns selected in sel_cols.
So, if say 8 columns were selected, I want the figure to have 3 columns and 3 rows, with last subplot empty or just 8 subplots in 3x3 matrix if that is possible.
I could generate each chart separately using following code:
for col in sel_cols:
    df[col].value_counts().plot(kind='bar')
    plt.show()
plt.show() is inside the loop so that each chart is shown and not just the last one.
I also tried appending these charts to a list this way:
charts = []
for col in sel_cols:
    charts.append(df[col].value_counts().plot(kind='bar'))
I could convert this list into a NumPy array through reshape(), but then its length would have to fit the shape exactly, so 8 chart objects cannot be reshaped into a 3x3 array.
Then I tried creating the subplots first in this way:
row = len(sel_cols)//3
fig, axes = plt.subplots(nrows=row,ncols=3)
This way I would get the subplots, but I get two problems:
1. I end up with extra subplots in the 3 columns which will go unplotted (8 columns example).
2. I do not know how to plot under each subplot through a loop.
I tried this:
for row in axes:
    for chart, col in zip(row,sel_cols):
        chart = df[col].value_counts().plot(kind='bar')
But this only plots the last subplot with the last column. All other subplots stay blank.
How to do this with minimal lines of code, possibly without any need for human verification of the final subplots placements?
You may use this sample dataframe:
pd.DataFrame({'A':['Y','N','N','Y','Y','N','N','Y','N'],
'B':['E','E','E','E','F','F','F','F','E'],
'C':[1,1,0,0,1,1,0,0,1],
'D':['P','Q','R','S','P','Q','R','P','Q'],
'E':['E','E','E','E','F','F','G','G','G'],
'F':[1,1,0,0,1,1,0,0,1],
'G':['N','N','N','N','Y','N','N','Y','N'],
'H':['G','G','G','E','F','F','G','F','E'],
'I':[1,1,0,0,1,1,0,0,1],
'J':['Y','N','N','Y','Y','N','N','Y','N'],
'K':['E','E','E','E','F','F','F','F','E'],
'L':[1,1,0,0,1,1,0,0,1],
})
Selected columns are: sel_cols = ['A','B','D','E','G','H','J','K']
Total 8 columns.
Expected output is bar charts for value_counts() of each of these columns arranged in subplots in a figure with 3 columns. Rows to be decided based on number of columns selected, here 8 so 3 rows.
Given OP's sample data:
df = pd.DataFrame({'A':['Y','N','N','Y','Y','N','N','Y','N'],'B':['E','E','E','E','F','F','F','F','E'],'C':[1,1,0,0,1,1,0,0,1],'D':['P','Q','R','S','P','Q','R','P','Q'],'E':['E','E','E','E','F','F','G','G','G'],'F':[1,1,0,0,1,1,0,0,1],'G':['N','N','N','N','Y','N','N','Y','N'],'H':['G','G','G','E','F','F','G','F','E'],'I':[1,1,0,0,1,1,0,0,1],'J':['Y','N','N','Y','Y','N','N','Y','N'],'K':['E','E','E','E','F','F','F','F','E'],'L':[1,1,0,0,1,1,0,0,1]})
sel_cols = list('ABDEGHJK')
data = df[sel_cols].apply(pd.Series.value_counts)  # pd.value_counts is deprecated in recent pandas
We can plot the columns of data in several ways (in order of simplicity):
1. DataFrame.plot with subplots param
2. seaborn.catplot
3. Loop through plt.subplots
1. DataFrame.plot with subplots param
Set subplots=True with the desired layout dimensions. Unused subplots will be auto-disabled:
data.plot.bar(subplots=True, layout=(3, 3), figsize=(8, 6),
              sharex=False, sharey=True, legend=False)
plt.tight_layout()
2. seaborn.catplot
melt the data into long-form (i.e., 1 variable per column, 1 observation per row) and pass it to seaborn.catplot:
import seaborn as sns
melted = data.melt(var_name='var', value_name='count', ignore_index=False).reset_index()
sns.catplot(data=melted, kind='bar', x='index', y='count',
            col='var', col_wrap=3, sharex=False)
3. Loop through plt.subplots
zip the columns and axes to iterate in pairs. Use the ax param to place each column onto its corresponding subplot.
If the grid size is larger than the number of columns (e.g., 3*3 > 8), disable the leftover axes with set_axis_off:
fig, axes = plt.subplots(3, 3, figsize=(8, 8), constrained_layout=True, sharey=True)
# plot each col onto one ax
for col, ax in zip(data.columns, axes.flat):
    data[col].plot.bar(ax=ax, rot=0)
    ax.set_title(col)
# disable leftover axes
for ax in axes.flat[data.columns.size:]:
    ax.set_axis_off()
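The 3x3 grid above is hardcoded, while the question asks for the number of rows to follow from the number of selected columns. A small sketch of that step, assuming a fixed column count of 3 and computing the rows with math.ceil (the variable names ncols and nrows are illustrative):

```python
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Only the selected columns of the question's sample data.
df = pd.DataFrame({
    'A': list('YNNYYNNYN'), 'B': list('EEEEFFFFE'), 'D': list('PQRSPQRPQ'),
    'E': list('EEEEFFGGG'), 'G': list('NNNNYNNYN'), 'H': list('GGGEFFGFE'),
    'J': list('YNNYYNNYN'), 'K': list('EEEEFFFFE'),
})
sel_cols = list('ABDEGHJK')

ncols = 3
nrows = math.ceil(len(sel_cols) / ncols)  # round up so every column gets an axes

# squeeze=False keeps axes two-dimensional even for a single row.
fig, axes = plt.subplots(nrows, ncols, figsize=(8, 2.5 * nrows),
                         squeeze=False, constrained_layout=True)
# Plot each selected column's value counts onto its own axes.
for col, ax in zip(sel_cols, axes.flat):
    df[col].value_counts().plot.bar(ax=ax, rot=0, title=col)
# Disable whatever axes are left over in the last row.
for ax in axes.flat[len(sel_cols):]:
    ax.set_axis_off()
```

With 8 selected columns this gives 3 rows, and the ninth axes is switched off.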
As an alternative to the answer by tdy, I tried to do it without seaborn, using Matplotlib and a for loop.
This may suit those who want finer control over each subplot's formatting and other parameters:
fig = plt.figure(1,figsize=(16,12))
for i, col in enumerate(sel_cols,1):
    fig.add_subplot(3,4,i)  # 3x4 grid; position i counts from 1
    df[col].value_counts().plot(kind='bar',ax=plt.gca())
    plt.title(col)
plt.tight_layout()
plt.show()
fig.add_subplot creates a subplot and makes it the current axes, while plt.gca() returns the current axes.

Plot sub-bar charts on a dataframe groupby

Hi I am having some trouble plotting sub-bar charts after a dataframe groupby
Post groupby, the data is as per the below :
I tried the below to create a bar chart.
df_temp[df_temp.index =='ABC'].unstack().plot.bar(figsize=(10,2))
How can I plot a bar chart where the x-axis is the date, the y-axis is the count, and each row (ABC and EFG) is its own subplot (vertically stacked)?
Example below
thanks for your help !
Thanks to @r-beginnners:
# remove the multi-level column
df.columns = df.columns.droplevel()
# plot the subplots
# if the y-axis scale should be the same, use sharey=True
df.T.plot(subplots=True, layout=(2,1), kind='bar', sharey=True)
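Since the original data is not shown, here is a self-contained sketch of the same steps on made-up data. The raw frame, its column names, and its values are all hypothetical; they are only meant to produce a groupby result with rows ABC and EFG and two-level columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical raw data: one row per (name, date) observation.
raw = pd.DataFrame({
    'name': ['ABC', 'ABC', 'ABC', 'EFG', 'EFG', 'EFG'],
    'date': ['2021-01', '2021-02', '2021-03'] * 2,
    'count': [5, 3, 8, 2, 7, 4],
})

# groupby + unstack: one row per name, two-level columns ('count', date).
df = raw.groupby(['name', 'date']).sum().unstack()

# Drop the outer 'count' level so only the dates remain as columns.
df.columns = df.columns.droplevel()

# Transpose so dates run along the x-axis and each name gets its own subplot.
axes = df.T.plot(subplots=True, layout=(2, 1), kind='bar', sharey=True)
plt.tight_layout()
```

Transposing is what turns each row (ABC, EFG) into a column, and subplots=True then draws one vertically stacked bar chart per column.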

How to add legends to a plot i made using a for loop in seaborn?

I made a distribution plot using sns.distplot() for multiple columns in a pd.DataFrame using a for loop:
for i in heatmap_df.columns[1:6]:
    sns.distplot(df[i], hist=False)
So there's a kde line of different colours for each of the columns on the same graph. How do I add a legend to specify which color is for which column? Or is there a special kind of plot that allows me do this without using a for loop at all?
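How about passing a label to each seaborn call and then drawing the legend (shown here with sns.kdeplot, since sns.distplot is deprecated in recent seaborn versions):
df = sns.load_dataset('iris')
for i in df.columns[:4]:
    sns.kdeplot(x=df[i], label=i)
plt.legend()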
And without for loop:
df.iloc[:,:4].plot.kde()

Question on Matplotlib bar plot with a pandas dataframe?

I want to create a bar chart with SALES_COV_TYPE as X and ID_ev as Y. I would like to have both the bars in different colors. Rename X axis values, and also rename the X label in legend to something else as can be seen in the image link. Change Id_ev to 'Visits'.
This is my attempt at it and I'm not satisfied with it.
data = {'SALES_COV_TYPE':[84.0,88.0], 'ID_ev':[2360869,882791]}
df = pd.DataFrame(data)
df.plot(kind='bar',y='ID_ev',x='SALES_COV_TYPE')
ax=plt.gca()
my_xticks = ['CAM','GAM']
ax.set_xticklabels(my_xticks)
ax.set_xlabel('Sales Coverage Type')
ax.set_ylabel('Number of Customer Visits')
I want to create the bar chart using the fig, ax method so that it becomes a template when I create subplots in the future, but I have not been able to get it working. I would prefer not to use the pandas wrapper for matplotlib if possible. What do you all suggest?
Thanks!
Do you mean something like this:
data = {'SALES_COV_TYPE':[84.0,88.0], 'ID_ev':[2360869,882791], 'SALES_COV_CLASS':['CAM','GAM']}
df = pd.DataFrame(data)
colors = ['green', 'red']
fig, ax = plt.subplots(1, 1, figsize=(4,3))
ax.bar(df['SALES_COV_CLASS'], df['ID_ev'], color = colors)
ax.set_title('Title')
ax.set_xlabel('Sales Coverage Type')
ax.set_ylabel('Number of Customer Visits')
plt.show()
I find it easier to add a column with the name of the group. That way you don't need to reindex the x axis to remove the empty columns.
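The question also mentions a renamed legend. A possible sketch, assuming one legend entry per bar and 'Visits' as the legend title (the question does not say exactly how the legend should look, so this layout is a guess):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'SALES_COV_CLASS': ['CAM', 'GAM'], 'ID_ev': [2360869, 882791]})

fig, ax = plt.subplots(figsize=(4, 3))
bars = ax.bar(df['SALES_COV_CLASS'], df['ID_ev'], color=['green', 'red'])

# One legend entry per bar; using 'Visits' as the title is an assumption.
ax.legend(bars.patches, df['SALES_COV_CLASS'].tolist(), title='Visits')
ax.set_xlabel('Sales Coverage Type')
ax.set_ylabel('Number of Customer Visits')
```

ax.bar returns a container whose patches attribute holds one rectangle per bar, so each rectangle can be paired with its own label.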
