Pandas DataFrame plot: specify column from MultiIndex for secondary_y - python

I am plotting a multi-index columns DataFrame.
What is the syntax to specify the column(s) to be plotted on secondary_y using the .plot method of pandas DataFrame?
Setup
import numpy as np
import pandas as pd
mt_idx = pd.MultiIndex.from_product([['A', 'B'], ['first', 'second']])
df = pd.DataFrame(np.random.randint(0, 10, size=(20, len(mt_idx))), columns=mt_idx)
My Attempts
df.plot(secondary_y=('B', 'second'))
df.plot(secondary_y='(B, second)')
None of the above worked, as all the lines were plotted on the principal y-axis.

One possible solution is to plot each top-level column group separately and pass secondary_y=True for the group that should use the right-hand axis. Doing it this way requires you to specify the axes on which the groups will be plotted:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
mt_idx = pd.MultiIndex.from_product([['A', 'B'], ['first', 'second']])
df = pd.DataFrame(np.random.randint(0, 10, size=(20, len(mt_idx))), columns=mt_idx)
df.A.plot(ax=ax)
df.B.plot(ax=ax, secondary_y=True)
plt.show()
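If only a single column of the MultiIndex, say ('B', 'second'), should go on the secondary axis, one possible sketch is to create the twin axis yourself with plain matplotlib and route each column by its full tuple label. The name right_cols is only illustrative, not a pandas option, and the Agg backend line is there just so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

mt_idx = pd.MultiIndex.from_product([['A', 'B'], ['first', 'second']])
df = pd.DataFrame(np.random.randint(0, 10, size=(20, len(mt_idx))), columns=mt_idx)

# Columns, given as full tuples, that should use the right-hand y-axis (illustrative name)
right_cols = [('B', 'second')]

fig, ax = plt.subplots()
ax_right = ax.twinx()  # second y-axis sharing the same x-axis

for i, col in enumerate(df.columns):
    target = ax_right if col in right_cols else ax
    # Take colours from the default cycle explicitly, since each Axes has its own cycle
    target.plot(df.index, df[col], color=f"C{i}", label='_'.join(col))

# Merge the legend entries of both axes into one legend
handles = ax.get_lines() + ax_right.get_lines()
ax.legend(handles, [h.get_label() for h in handles])
```

Here three lines end up on the left axis and one on the right; call plt.show() afterwards when working interactively.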

You might flatten the column MultiIndex into a single level by joining the level values. If you don't want to modify the original dataframe, do this on a copy of it.
df2 = df.copy()
df2.columns = df2.columns.map('_'.join)
df2.plot(secondary_y=('B_second'))
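To make the effect of the flattening step visible, here is a small sketch that prints the column labels before and after the map('_'.join) call. It assumes all level values are strings, which '_'.join requires; the name flat_cols is only for illustration.

```python
import numpy as np
import pandas as pd

mt_idx = pd.MultiIndex.from_product([['A', 'B'], ['first', 'second']])
df = pd.DataFrame(np.random.randint(0, 10, size=(20, len(mt_idx))), columns=mt_idx)

df2 = df.copy()
# Each column label is a tuple such as ('B', 'second'); join it into one string
df2.columns = df2.columns.map('_'.join)

print(list(df.columns))   # tuples; the original frame is untouched
flat_cols = list(df2.columns)
print(flat_cols)          # flat strings that secondary_y can match directly
```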

Related

Create a graph of a pivot_table in Python

I created a pivot table and I want to create a bar graph from it. This is my pivot_table:
I don't know how to extract the values of the 1970 column and use them to make a bar graph.
Thanks!!
Just convert the dataframe's column names to str; then you can select the data for the year 1970 with df['1970'] and use the built-in pandas plot.bar method to make a bar plot. Try this:
import pandas as pd
import matplotlib.pyplot as plt
#converting column names to string
df.columns = df.columns.astype(str)
#plotting a bar plot
df['1970'].plot.bar()
plt.show()
Example based on @AlanDyke's DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[1970,'a',1],
[1970,'b',2],
[1971,'a',2],
[1971,'b',3]],
columns=['year','location', 'value'])
df = pd.pivot_table(df, values='value', index='location', columns='year')
df.columns = df.columns.astype(str)
df['1970'].plot.bar()
plt.show()
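The conversion to strings is needed only because df['1970'] looks up a string label, whereas the pivot table's column labels here are the integer years. A minimal sketch, reusing the sample data above, of selecting the column by its integer label instead (the names pivoted and values_1970 are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1970, 'a', 1],
                   [1970, 'b', 2],
                   [1971, 'a', 2],
                   [1971, 'b', 3]],
                  columns=['year', 'location', 'value'])
pivoted = pd.pivot_table(df, values='value', index='location', columns='year')

# The column labels are integers, so the integer 1970 selects the column directly
values_1970 = pivoted[1970]
print(values_1970)

# values_1970.plot.bar() would then draw one bar per location
```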
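You can use plt.bar and slice the dataframe:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[1970,'a',1],
                   [1970,'b',2],
                   [1971,'a',2],
                   [1971,'b',3]],
                  columns=['year','location', 'value'])
df = pd.pivot_table(df, values='value', index='location', columns='year')
# x positions are the locations (the pivot table's index), heights are the 1970 values
plt.bar(list(df.transpose().columns), height=df[1970])
plt.show()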

How to plot a bar chart without aggregation Seaborn?

How do you plot a bar chart without aggregation? I have two columns, one contains values and the other is categorical, but I want to plot each row individually, without aggregation.
By default, sns.barplot(x = "col1", y = "col2", data = df) will aggregate by taking the mean of the values for each category in col1.
How do I simply just plot a bar for each row in my dataframe with no aggregation?
In case 'col1' only contains unique labels, you immediately get your result with sns.barplot(x='col1', y='col2', data=df). In case there are repeated labels, you can use the index as x and afterwards change the ticks:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'col1': list('ababab'), 'col2': np.random.randint(10, 20, 6)})
ax = sns.barplot(x=df.index, y='col2', data=df)
ax.set_xticklabels(df['col1'])
ax.set_xlabel('col1')
plt.show()
PS: Similarly, a horizontal bar chart could be created as:
df = pd.DataFrame({'col1': list('ababab'), 'col2': np.random.randint(10, 20, 6)})
ax = sns.barplot(x='col2', y=df.index, data=df, orient='h')
ax.set_yticklabels(df['col1'])
ax.set_ylabel('col1')
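As an alternative sketch that sidesteps seaborn's aggregation altogether, the same chart can be drawn with matplotlib's Axes.bar, using the row positions as x and the labels from col1 as tick labels. This swaps seaborn for plain matplotlib, so the styling will differ; the Agg backend line only makes the sketch runnable without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': list('ababab'), 'col2': np.random.randint(10, 20, 6)})

fig, ax = plt.subplots()
positions = np.arange(len(df))     # one x position per row, so nothing is aggregated
bars = ax.bar(positions, df['col2'])
ax.set_xticks(positions)
ax.set_xticklabels(df['col1'])     # repeated labels are fine here
ax.set_xlabel('col1')
ax.set_ylabel('col2')
```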

How do I style only the last row of a pandas dataframe?

I can style a pandas dataframe:
import pandas as pd
import numpy as np
import seaborn as sns
cm = sns.diverging_palette(-5, 5, as_cmap=True)
df = pd.DataFrame(np.random.randn(3, 4))
df.style.background_gradient(cmap=cm)
but I can't figure out how to only apply a style to the last row. There is a subset option in the background_gradient call, and it suggests that I use an index slice but I cannot figure out how to make just the last row have any kind of styling.
Here is my closest to success:
df.style.background_gradient(cmap=cm, subset=[2], axis='index')
Use the last element of your index as your subset.
df.style.background_gradient(cmap=cm, axis=1, subset=df.index[-1])
You could also use pd.IndexSlice which is useful if you want to apply the style to multiple rows, including the last:
import pandas as pd
import numpy as np
import seaborn as sns
cm = sns.diverging_palette(-5, 5, as_cmap=True)
df = pd.DataFrame(np.random.randn(3, 4))
indices = pd.IndexSlice[[0, df.last_valid_index()], :]
df.style.background_gradient(cmap=cm, axis=1, subset=indices)
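An explicit two-dimensional pd.IndexSlice is unambiguous about rows versus columns, and because it is also a valid df.loc indexer, the cells it covers can be previewed with df.loc before it is passed as subset. A small sketch (the names last_row and selected are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 4))

# Rows: only the last index label (kept in a list so the result stays 2-D); columns: all
last_row = pd.IndexSlice[[df.index[-1]], :]

# Preview the cells the slice covers
selected = df.loc[last_row]
print(selected)

# The same object can then be used as subset=last_row in the style call
```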

Pandas.plot(subplots=True) with 3 columns in each subplot

I have a DataFrame with 700 rows and 6 columns:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(700,6))
I can plot all columns in a single plot by calling:
df.plot()
And I can plot each column in a single plot by calling:
df.plot(subplots=True)
How can I have two subplots with three columns each from my DataFrame?!
Here's a general approach to plot a dataframe with n columns in each subplot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(700,6))
col_per_plot = 3
cols = df.columns.tolist()
# Create groups of 3 columns
cols_splits = [cols[i:i+col_per_plot] for i in range(0, len(cols), col_per_plot)]
# Define plot grid.
# Here I assume it is always one row and many columns. You could get fancier...
fig, axarr = plt.subplots(1, len(cols_splits))
# Plot each "slice" of the dataframe in a different subplot
for cc, ax in zip(cols_splits, axarr):
    df.loc[:, cc].plot(ax=ax)
This gives the following picture:
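One caveat with the loop above: plt.subplots returns a single Axes object rather than an array when the grid has only one cell, so zip(cols_splits, axarr) fails if all columns fit in one group. A hedged variant passes squeeze=False, which always yields a 2-D array, and also shows how a column count that is not a multiple of the group size is handled (7 columns is an arbitrary choice for the sketch).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(700, 7))
col_per_plot = 3
cols = df.columns.tolist()

# Groups of up to 3 columns; the last group holds the remainder
cols_splits = [cols[i:i + col_per_plot] for i in range(0, len(cols), col_per_plot)]
print(cols_splits)

# squeeze=False always returns a 2-D array of Axes, even for a 1x1 grid
fig, axarr = plt.subplots(1, len(cols_splits), squeeze=False)
for cc, ax in zip(cols_splits, axarr.ravel()):
    df.loc[:, cc].plot(ax=ax)
```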

Multiple histograms in Pandas

I would like to create the following histogram (see image below) taken from the book "Think Stats". However, I cannot get them on the same plot. Each DataFrame takes its own subplot.
I have the following code:
import nsfg
import matplotlib.pyplot as plt
df = nsfg.ReadFemPreg()
preg = nsfg.ReadFemPreg()
live = preg[preg.outcome == 1]
first = live[live.birthord == 1]
others = live[live.birthord != 1]
#fig = plt.figure()
#ax1 = fig.add_subplot(111)
first.hist(column = 'prglngth', bins = 40, color = 'teal', \
alpha = 0.5)
others.hist(column = 'prglngth', bins = 40, color = 'blue', \
alpha = 0.5)
plt.show()
The above code does not work when I use ax = ax1 as suggested in "pandas multiple plots not working as hists", nor does the example in "Overlaying multiple histograms using pandas" do what I need. When I use the code as it is, it creates two windows with histograms. Any ideas how to combine them?
Here's an example of how I'd like the final figure to look:
As far as I can tell, pandas can't handle this situation. That's ok since all of their plotting methods are for convenience only. You'll need to use matplotlib directly. Here's how I do it:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas
#import seaborn
#seaborn.set(style='ticks')
np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])
fig, ax = plt.subplots()
a_heights, a_bins = np.histogram(df['A'])
b_heights, b_bins = np.histogram(df['B'], bins=a_bins)
width = (a_bins[1] - a_bins[0])/3
ax.bar(a_bins[:-1], a_heights, width=width, facecolor='cornflowerblue')
ax.bar(b_bins[:-1]+width, b_heights, width=width, facecolor='seagreen')
#seaborn.despine(ax=ax, offset=10)
And that gives me:
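A detail worth noting in the snippet above: the second np.histogram call reuses the bin edges of the first, so both sets of bars line up. The sketch below checks this on the same random data; values of B that fall outside A's range are simply not counted.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.normal(size=(37, 2)), columns=['A', 'B'])

# Default is 10 equal-width bins spanning the range of A, which gives 11 edges
a_heights, a_bins = np.histogram(df['A'])
# Reuse A's edges so both histograms share identical bins
b_heights, b_bins = np.histogram(df['B'], bins=a_bins)

print(len(a_bins), a_heights.sum())  # number of edges, number of A values counted
print(b_heights.sum())               # may be below 37 if B has values outside A's range
```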
In case anyone wants to plot one histogram over another (rather than alternating bars) you can simply call .hist() consecutively on the series you want to plot:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas
np.random.seed(0)
df = pandas.DataFrame(np.random.normal(size=(37,2)), columns=['A', 'B'])
df['A'].hist()
df['B'].hist()
This gives you:
Note that the order in which you call .hist() matters (the first one will be at the back).
A quick solution is to use melt() from pandas and then plot with seaborn.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# make dataframe
df = pd.DataFrame(np.random.normal(size=(200,2)), columns=['A', 'B'])
# plot melted dataframe in a single command
sns.histplot(df.melt(), x='value', hue='variable',
multiple='dodge', shrink=.75, bins=20);
Setting multiple='dodge' makes it so the bars are side-by-side, and shrink=.75 makes it so the pair of bars take up 3/4 of the whole bin.
To help understand what melt() did, these are the dataframes df and df.melt():
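A small sketch of the same transformation on a three-row frame (values chosen arbitrarily) shows the shape change: the column names move into a variable column and the numbers into a value column, giving one row per original cell.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})
melted = df.melt()

print(df)      # wide: 3 rows, columns A and B
print(melted)  # long: 6 rows, columns 'variable' and 'value'
```

The variable column is what hue='variable' uses to split the histogram into one series per original column.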
From the pandas website (http://pandas.pydata.org/pandas-docs/stable/visualization.html#visualization-hist):
df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000),
'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])
plt.figure();
df4.plot(kind='hist', alpha=0.5)
Make two dataframes and one matplotlib axis, then pass that axis to both hist calls:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'data1': np.random.randn(10),
'data2': np.random.randn(10)
})
df2 = df1.copy()
fig, ax = plt.subplots()
df1.hist(column=['data1'], ax=ax)
df2.hist(column=['data2'], ax=ax)
Here is the snippet. In my case I explicitly specified bins and range, as I didn't handle outlier removal the way the author of the book does.
fig, ax = plt.subplots()
ax.hist([first.prglngth, others.prglngth], 10, (27, 50), histtype="bar", label=("First", "Other"))
ax.set_title("Histogram")
ax.legend()
Refer to the Matplotlib multihist plot with different sizes example.
This could be done more briefly:
plt.hist([First, Other], bins = 40, color =('teal','blue'), label=("First", "Other"))
plt.legend(loc='best')
Note that as the number of bins increases, it may become a visual burden.
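For reference, a self-contained sketch of the same kind of call with synthetic data standing in for the pregnancy lengths. The two arrays and their parameters are made up for illustration and are not taken from the NSFG data. Passing a list of arrays makes matplotlib draw the groups side by side within each bin, and the arrays may differ in length.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins for first.prglngth and others.prglngth
first_len = rng.normal(39, 2.5, size=400)
other_len = rng.normal(39, 2.5, size=500)

fig, ax = plt.subplots()
# counts holds one row of bin counts per dataset; edges are the shared bin edges
counts, edges, _ = ax.hist([first_len, other_len], bins=10, range=(27, 50),
                           color=('teal', 'blue'), label=('First', 'Other'))
ax.legend(loc='best')
```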
You could also check out the pandas.DataFrame.plot.hist() function, which plots the histogram of each column of the dataframe in the same figure.
Visibility is limited, though; see whether it helps!
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.hist.html
