Pandas.plot(subplots=True) with 3 columns in each subplot - python

I have a DataFrame with 700 rows and 6 columns:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(700,6))
I can plot all columns in a single plot by calling:
df.plot()
And I can plot each column in a single plot by calling:
df.plot(subplots=True)
How can I have two subplots with three columns each from my DataFrame?!

Here's a general approach to plot a dataframe with n columns in each subplot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(700,6))
col_per_plot = 3
cols = df.columns.tolist()
# Create groups of 3 columns
cols_splits = [cols[i:i+col_per_plot] for i in range(0, len(cols), col_per_plot)]
# Define plot grid.
# Here I assume it is always one row and many columns. You could fancier...
fig, axarr = plt.subplots(1, len(cols_splits))
# Plot each "slice" of the dataframe in a different subplot
for cc, ax in zip(cols_splits, axarr):
df.loc[:, cc].plot(ax = ax)
This gives the following picture:

Related

Rearranging the columns of my heatmap using python's seaborn

I'm trying to visualize the following .csv data:
Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,Q11,Q12,Q13,Q14,Q15,Q16,Q17,Q18,Q19,Q20
4,4,2,2,4,2,3,5,3,4,2,5,2,1,4,4,2,1,5,2
2,2,4,4,4,2,2,2,4,4,2,4,2,2,3,2,2,4,5,2
4,5,4,1,4,2,2,4,4,3,2,2,2,1,2,4,4,2,5,4
3,4,2,4,4,2,2,2,4,3,2,4,4,3,3,4,2,4,5,1
4,4,3,2,4,3,4,5,4,3,1,5,3,2,4,2,2,3,4,2
4,5,2,3,5,1,3,4,3,3,1,2,4,4,5,4,1,4,5,4
5,5,5,2,4,3,2,4,4,2,2,4,4,2,4,2,2,4,4,5
4,4,3,1,5,3,2,4,2,2,1,4,4,2,4,1,2,5,5,3
1,3,5,2,4,4,3,1,4,4,2,3,1,4,3,4,3,3,4,1
3,3,5,2,4,2,4,4,3,4,1,5,4,2,1,2,2,4,5,2
Here's my code:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
map = sns.clustermap(df, annot=True, linewidths=2, linecolor='yellow', metric="correlation", method="single")
plt.show()
Which returns:
I want to rearrange my heatmap and order it column-wise by the frequency of each response. For example, The column Q5 has the value 4 repeated 8 times (more than any other column), so it should be the first column. Columns 17 and 19 have a value that is repeated 7 times, so they should come in second and third (exact order doesn't matter). How can I do this?
You can compute the order and reindex before using the data in clustermap:
order = (df.apply(pd.Series.value_counts)
.max()
.sort_values(ascending=False)
.index
)
import seaborn as sns
cm = sns.clustermap(df[order], col_cluster=False, annot=True, linewidths=2, linecolor='yellow', metric="correlation", method="single")
Output:

Bar plot for multidimensional columns using pandas

I want to plot my dataframe (df) as a bar plot based on the time columns, where each bar represents the value counts() for each letter that appears in the column.
Expected output
.
date,00:00:00,01:00:00,02:00:00,03:00:00,04:00:00
2002-02-01,Y,Y,U,N,N
2002-02-02,U,N,N,N,N
2002-02-03,N,N,N,N,N
2002-02-04,N,N,N,N,N
2002-02-05,N,N,N,N,N
When I select individual time columns, I can do as below
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
df = pd.read_csv('df.csv')
df = df['04:00:00'].value_counts()
df.plot(kind='bar')
plt.show()
How can I plot all the columns on the same bar plot as shown on the expected output.
One possible solution is:
pd.DataFrame({t: df[t].value_counts() for t in df.columns if t != "date"}).T.plot.bar()
Here is an approach via seaborn's catplot:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from io import StringIO
df_str = '''date,00:00:00,01:00:00,02:00:00,03:00:00,04:00:00
2002-02-01,Y,Y,U,N,N
2002-02-02,U,N,N,N,N
2002-02-03,N,N,N,N,N
2002-02-04,N,N,N,N,N
2002-02-05,N,N,N,N,N'''
df = pd.read_csv(StringIO(df_str))
df_long = df.set_index('date').melt(var_name='hour', value_name='kind')
g = sns.catplot(kind='count', data=df_long, x='kind', palette='mako',
col='hour', col_wrap=5, height=3, aspect=0.5)
for ax in g.axes.flat:
ax.set_xlabel(ax.get_title()) # use the title as xlabel
ax.grid(True, axis='y')
ax.set_title('')
if len(ax.get_ylabel()) == 0:
sns.despine(ax=ax, left=True) # remove left axis for interior subplots
ax.tick_params(axis='y', size=0)
plt.tight_layout()
plt.show()

How to make a distplot for each column in a pandas dataframe

I 'm using Seaborn in a Jupyter notebook to plot histograms like this:
import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('CTG.csv', sep=',')
sns.distplot(df['LBE'])
I have an array of columns with values that I want to plot histogram for and I tried plotting a histogram for each of them:
continous = ['b', 'e', 'LBE', 'LB', 'AC']
for column in continous:
sns.distplot(df[column])
And I get this result - only one plot with (presumably) all histograms:
My desired result is multiple histograms that looks like this (one for each variable):
How can I do this?
Insert plt.figure() before each call to sns.distplot() .
Here's an example with plt.figure():
Here's an example without plt.figure():
Complete code:
# imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [6, 2]
%matplotlib inline
# sample time series data
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10,12,size=(300, 4)), columns=list('ABCD'))
datelist = pd.date_range(pd.datetime(2014, 7, 1).strftime('%Y-%m-%d'), periods=300).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0]=0
df=df.cumsum()
# create distplots
for column in df.columns:
plt.figure() # <==================== here!
sns.distplot(df[column])
Distplot has since been deprecated in seaborn versions >= 0.14.0. You can, however, use sns.histplot() to plot histogram distributions of the entire dataframe (numerical features only) in the following way:
fig, axes = plt.subplots(2,5, figsize=(15, 5))
ax = axes.flatten()
for i, col in enumerate(df.columns):
sns.histplot(df[col], ax=ax[i]) # histogram call
ax[i].set_title(col)
# remove scientific notation for both axes
ax[i].ticklabel_format(style='plain', axis='both')
fig.tight_layout(w_pad=6, h_pad=4) # change padding
plt.show()
If, you specifically want a way to estimate the probability density function of a continuous random variable using the Kernel Density Function (mimicing the default behavior of sns.distplot()), then inside the sns.histplot() function call, add kde=True, and you will have curves overlaying the histograms.
Also works when looping with plt.show() inside:
for column in df.columns:
sns.distplot(df[column])
plt.show()

How can I plot slice of certain DataFrame for each row with different color?

I would like to plot certain slices of my Pandas Dataframe for each rows (based on row indexes) with different colors.
My data look like the following:
I already tried with the help of this tutorial to find a way but I couldn't - probably due to a lack of skills.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv("D:\SOF10.csv" , header=None)
df.head()
#Slice interested data
C = df.iloc[:, 2::3]
#Plot Temp base on row index colorfully
C.apply(lambda x: plt.scatter(x.index, x, c='g'))
plt.show()
Following is my expected plot:
I was also wondering if I could displace the mean of each row of the sliced data which contains 480 values somewhere in the plot or in the legend beside of plot! Is it feasible (like the following picture) to calculate the mean and displaced somewhere in the legend or by using small font size displace next to its own data in graph ?
Data sample: data
This gives the plot without legend
C = df.iloc[:,2::3].stack().reset_index()
C.columns = ['level_0', 'level_1', 'Temperature']
fig, ax = plt.subplots(1,1)
C.plot('level_0', 'Temperature',
ax=ax, kind='scatter',
c='level_0', colormap='tab20',
colorbar=False, legend=True)
ax.set_xlabel('Cycles')
plt.show()
Edit to reflect modified question:
stack() transform your (sliced) dataframe to a series with index (row, col)
reset_index() reset the double-level index above to level_0 (row), level_1 (col).
set_xlabel sets the label of x-axis to what you want.
Edit 2: The following produces scatter with legend:
CC = df.iloc[:,2::3]
fig, ax = plt.subplots(1,1, figsize=(16,9))
labels = CC.mean(axis=1)
for i in CC.index:
ax.scatter([i]*len(CC.columns[1:]), CC.iloc[i,1:], label=labels[i])
ax.legend()
ax.set_xlabel('Cycles')
ax.set_ylabel('Temperature')
plt.show()
This may be an approximate answer. scatter(c=, cmap= can be used for desired coloring.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import itertools
df = pd.DataFrame({'a':[34,22,1,34]})
fig, subplot_axes = plt.subplots(1, 1, figsize=(20, 10)) # width, height
colors = ['red','green','blue','purple']
cmap=matplotlib.colors.ListedColormap(colors)
for col in df.columns:
subplot_axes.scatter(df.index, df[col].values, c=df.index, cmap=cmap, alpha=.9)

Python plotting by different dataframe columns (using Seaborn?)

I'm trying to create a scatterplot of a dataset with point coloring based on different categorical columns. Seaborn works well here for one plot:
fg = sns.FacetGrid(data=plot_data, hue='col_1')
fg.map(plt.scatter, 'x_data', 'y_data', **kws).add_legend()
plt.show()
I then want to display the same data, but with hue='col_2' and hue='col_3'. It works fine if I just make 3 plots, but I'm really hoping to find a way to have them all appear as subplots in one figure. Unfortunately, I haven't found any way to change the hue from one plot to the next. I know there are plotting APIs that allow for an axis keyword, thereby letting you pop it into a matplotlib figure, but I haven't found one that simultaneously allows you to set 'ax=' and 'hue='. Any ideas?
Thanks in advance!
Edit:
Here's some sample code to illustrate the idea
xx = np.random.rand(10,2)
cat1 = np.array(['cat','dog','dog','dog','cat','hamster','cat','cat','hamster','dog'])
cat2 = np.array(['blond','brown','brown','black','black','blond','blond','blond','brown','blond'])
d = {'x':xx[:,0], 'y':xx[:,1], 'pet':cat1, 'hair':cat2}
df = pd.DataFrame(data=d)
sns.set(style='ticks')
fg = sns.FacetGrid(data=df, hue='pet', size=5)
fg.map(plt.scatter, 'x', 'y').add_legend()
fg = sns.FacetGrid(data=df, hue='hair', size=5)
fg.map(plt.scatter, 'x', 'y').add_legend()
plt.show()
This plots what I want, but in two windows. The color scheme is set in the first plot by grouping by 'pet', and in the second plot by 'hair'. Is there any way to do this on one plot?
In order to plot 3 scatterplots with different colors for each, you may create 3 axes in matplotlib and plot a scatter to each axes.
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(10,5),
columns=["x", "y", "col1", "col2", "col3"])
fig, axes = plt.subplots(nrows=3)
for ax, col in zip(axes, df.columns[2:]):
ax.scatter(df.x, df.y, c=df[col])
plt.show()
For categorical data it is often easier to plot several scatter plots, one per category.
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns
xx = np.random.rand(10,2)
cat1 = np.array(['cat','dog','dog','dog','cat','hamster','cat','cat','hamster','dog'])
cat2 = np.array(['blond','brown','brown','black','black','blond','blond','blond','brown','blond'])
d = {'x':xx[:,0], 'y':xx[:,1], 'pet':cat1, 'hair':cat2}
df = pd.DataFrame(data=d)
cols = ['pet',"hair"]
fig, axes = plt.subplots(nrows=len(cols ))
for ax,col in zip(axes,cols):
for n, group in df.groupby(col):
ax.scatter(group.x,group.y, label=n)
ax.legend()
plt.show()
You may surely use a FacetGrid, if you really want, but that requires a different data format of the DataFrame.
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns
xx = np.random.rand(10,2)
cat1 = np.array(['cat','dog','dog','dog','cat','hamster','cat','cat','hamster','dog'])
cat2 = np.array(['blond','brown','brown','black','black','blond','blond','blond','brown','blond'])
d = {'x':xx[:,0], 'y':xx[:,1], 'pet':cat1, 'hair':cat2}
df = pd.DataFrame(data=d)
df2 = pd.melt(df, id_vars=['x','y'], value_name='category', var_name="kind")
fg = sns.FacetGrid(data=df2, row="kind",hue='category', size=3)
fg.map(plt.scatter, 'x', 'y').add_legend()

Categories

Resources