Show median and quantiles on Seaborn pairplot (Python)

I am making a corner plot using Seaborn. I would like to display lines on each diagonal histogram showing the median value and quantiles. Example shown below.
I usually do this using the Python package 'corner', which is straightforward. I want to use Seaborn just because it has better aesthetics.
The seaborn plot was made using this code:
import pandas as pd
import seaborn as sns
df = pd.DataFrame(samples_new, columns = ['r1', 'r2', 'r3'])
cornerplot = sns.pairplot(df, corner=True, kind='kde',diag_kind="hist", diag_kws={'color':'darkslateblue', 'alpha':1, 'bins':10}, plot_kws={'color':'darkslateblue', 's':10, 'alpha':0.8, 'fill':False})

Seaborn provides test datasets that come in handy for demonstrating a change to the default behavior. That way, you don't need to generate your own test data or supply your own data, which can be complicated and/or sensitive.
To update the subplots in the diagonal, there is g.map_diag(...) which will call a given function for each individual column. It gets 3 parameters: the data used for the x-axis, a label and a color.
Here is an example to add vertical lines for the main quantiles, and change the title. You can add more calculations for further customizations.
import matplotlib.pyplot as plt
import seaborn as sns
def update_diag_func(data, label, color):
    for val in data.quantile([.25, .5, .75]):
        plt.axvline(val, ls=':', color=color)
    plt.title(data.name, color=color)
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, corner=True, diag_kws={'kde': True})
g.map_diag(update_diag_func)
g.fig.subplots_adjust(top=0.97) # provide some space for the titles
plt.show()

Seaborn is built on top of matplotlib, so you can try this:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
df = pd.DataFrame(samples_new, columns = ['r1', 'r2', 'r3'])
cornerplot = sns.pairplot(df, corner=True, kind='kde',diag_kind="hist", diag_kws={'color':'darkslateblue', 'alpha':1, 'bins':10}, plot_kws={'color':'darkslateblue', 's':10, 'alpha':0.8, 'fill':False})
plt.text(300, 250, "An annotation")
plt.show()

Related

Overlay kde plot using Seaborn displot

I'm trying to recreate a plot that I made with seaborn distplot but using displot, since distplot is being deprecated.
How do I make the displot overlay the two columns?
Here is the original code to create using distplot:
import pandas as pd
import numpy as np
import seaborn as sns
df1 = pd.DataFrame({'num1':np.random.normal(loc=0.0, scale=1.0, size=100),'num2':np.random.normal(loc=0.0, scale=1.0, size=100)})
sns.distplot(df1['num1'],hist=False,color='orange',)
sns.distplot(df1['num2'],hist=False,color='blue')
Here is the code for the plot using displot
sns.displot(data = df1, x = 'num1',color='orange', kind = 'kde')
sns.displot(data = df1, x = 'num2',color='blue', kind = 'kde')
I think you are looking for kdeplot.
sns.kdeplot(data=df1, palette=['orange', 'blue'])
Without any special layout I get this result for your example.
I set the palette argument to define the colors as you did in your example, but this is optional.

Removing outliers from dataset identified in Matplotlib/Seaborn boxplot

I have produced a Boxplot/Swarmplot graph using Matplotlib/Seaborn in Pandas. Some outliers can be seen in the graph (as dots outside the "whiskers"/"fence" area). I am looking for a way to trim the dataset directly after they have been identified in the graph and without removing them from the original dataset. I do not want to simply hide the outlier dots.
Some methods have been recommended and pandas quantile looks promising but I am not sure how to implement these with the code I have been using.
My graph with the outliers.
The code I used to produce this graph. The data has been organized into the tidy format.
# Import libraries and modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Set seaborn style
sns.set(style="whitegrid", palette="colorblind")
# load length tidy data
length_tidy = pd.read_csv('results/tidy/length_tidy.csv')
score_tidy = pd.read_csv('results/tidy/score_tidy.csv')
# Define and save boxplot and swarmplot for length data
fig, ax = plt.subplots(figsize=(10,6))
ax = sns.boxplot(x='Metric', y='Length', data=length_tidy, ax=ax)
ax = sns.swarmplot(x="Metric", y="Length", data=length_tidy, color=".25")
ax.set_xlabel('Condition')
ax.set_ylabel('Length in micrometers')
plt.savefig('statistics/boxplot/length_boxplot.png', dpi=300)
fig, ax = plt.subplots(figsize=(10,6))
ax = sns.boxplot(x='Metric', y='Score', data=score_tidy, ax=ax)
ax = sns.swarmplot(x="Metric", y="Score", data=score_tidy, color=".25")
ax.set_xlabel('Condition')
ax.set_ylabel('Score')
plt.savefig('statistics/boxplot/score_boxplot.png', dpi=300)
An example of some of the data I am working with in the CSV format.
Object,Metric,Length
M11,B2A10,1.807782
MT1,B2A10,3.2207116666666664
MT1,B2A1,3.57675
MT1,B2A2,2.9474600000000004
MT1,B2A3,2.247772857142857
MT1,B2A4,3.754455
MT1,B2A5,2.716282
MT1,B2A6,2.91325
MT1,B2A7,1.24806
MT1,B2A8,2.00371875
MT1,B2A9,1.5435599999999998
MT1,B2B1,2.2051515384615388
MT1,B2B2,1.5278873333333332
MT1,B2B3,1.7283750000000002
MT1,B2B4,1.4547385714285714
MT1,B2B5,3.237578333333333
MT1,B2B6,2.47016
MT1,B2B7,2.1185947777777776
MT1,B2B8,1.8502877777777773
MT10,B2A10,3.07143
MT10,B2A1,3.34361
MT10,B2A2,2.889958333333333
MT10,B2A3,2.22087
MT10,B2A4,2.87669
MT10,B2A5,1.6745005555555557
MT10,B2A7,2.09018
MT10,B2A8,2.4947450000000004
MT10,B2B1,1.849095882352941
MT10,B2B2,1.5291758000000002
MT10,B2B5,1.6423770999999998
MT10,B2B6,1.9680385714285715
MT10,B2B7,1.7207240000000001
MT10,B2B8,2.9618275
MT12,B2A10,1.7243058333333334
MT12,B2A1,3.3938900000000003
MT12,B2A2,2.00601
MT12,B2A3,2.1720200000000003
MT12,B2A4,2.452923333333333
MT12,B2A5,2.986948
MT12,B2A7,2.08466
MT12,B2A8,1.29047
MT12,B2B1,2.528839230769232
MT12,B2B2,1.4011425454545454
MT12,B2B5,1.626078333333333
MT12,B2B6,1.074394454545455
MT12,B2B7,2.0897078571428573
MT12,B2B8,1.4102533333333336
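The question mentions pandas quantile; a minimal sketch of that approach follows. It assumes the outliers are defined by the 1.5 × IQR rule (the default whisker length of a matplotlib/seaborn boxplot), computed separately for each Metric, and it uses a small made-up table rather than the real data. The helper name trim_outliers is hypothetical.

```python
import pandas as pd

def trim_outliers(df, group_col, value_col, whis=1.5):
    """Hypothetical helper: drop rows outside the whisker range of their group.

    Returns a filtered copy; the input DataFrame is left unchanged.
    """
    grouped = df.groupby(group_col)[value_col]
    # Per-row quartiles of the group each row belongs to.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    inside = df[value_col].between(q1 - whis * iqr, q3 + whis * iqr)
    return df[inside].copy()

# Made-up tidy data: one obvious outlier (10.0) in group 'A'.
toy = pd.DataFrame({
    'Metric': ['A'] * 5 + ['B'] * 5,
    'Length': [1.0, 1.1, 1.2, 1.3, 10.0, 2.0, 2.1, 2.2, 2.3, 2.4],
})
trimmed = trim_outliers(toy, 'Metric', 'Length')
print(len(toy), len(trimmed))
```

The trimmed frame could then be passed to sns.boxplot and sns.swarmplot in place of length_tidy. Note that re-plotting trimmed data recomputes the quartiles, so a few points may again fall outside the new whiskers.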

Different point size based on hue argument in seaborn

I am trying to have different point sizes on a seaborn scatterplot depending on the value in the "hue" column of my dataframe.
sns.scatterplot(x="X", y="Y", data=df, hue='value',style='value')
value can take 3 different values (0, 1 and 2) and I would like points whose value is 2 to be bigger on the graph.
I tried the sizes argument :
sizes=(1,1,4)
But could not get it done this way.
Let's use the s parameter and pass a list of sizes using a function of df['value'] to scale the point sizes:
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'X':[1,2,3],'Y':[1,4,9],'value':[1,0,2]})
_ = sns.scatterplot(x='X',y='Y', data=df, s=df['value']*50+10)
Output:
Using seaborn scatterplots arguments:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'X':[1,2,3,4,5],'Y':[1,2,3,4,5],
'value':[1,1,0,2,2]})
df["size"] = np.where(df["value"] == 2, "Big", "Small")
sns.scatterplot(x="X", y="Y", hue='value', size="size",
data=df, size_order=("Small", "Big"), sizes=(160, 40))
plt.show()
Note that the order of sizes needs to be reversed compared to size_order. I have no idea why that would make sense, though.
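A possible explanation for the reversed order, as far as I understand seaborn's behavior: a two-element tuple passed to sizes is read as a (min, max) range rather than as one size per level, and for a categorical size variable the first level in size_order receives the max end of that range. Passing a dict avoids relying on that ordering. A minimal sketch with the same toy data:

```python
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'X': [1, 2, 3, 4, 5], 'Y': [1, 2, 3, 4, 5],
                   'value': [1, 1, 0, 2, 2]})
df['size'] = np.where(df['value'] == 2, 'Big', 'Small')

# A dict maps each level of the size variable to an explicit marker area.
ax = sns.scatterplot(x='X', y='Y', hue='value', size='size', data=df,
                     sizes={'Small': 40, 'Big': 160})
```

With the dict, size_order is only needed if the order of the legend entries matters.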

How to plot a Python Dataframe with category values like this picture?

How can I achieve that using matplotlib?
Here is my code with the data you provided. As there are no classes (the values are all different, even though the first example in your question does have classes), I assigned colors based on the numbers. You can take it from here toward whatever result you want. You just need pandas, seaborn and matplotlib:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import xls
df=pd.read_excel('data.xlsx')
# exclude Ranking values
df1 = df.iloc[:,1:-1]
# for each element it takes the value of the xls cell
df2=df1.applymap(lambda x: float(x.split('\n')[1]))
# now plot it
df_heatmap = df2
fig, ax = plt.subplots(figsize=(15,15))
sns.heatmap(df_heatmap, square=True, ax=ax, annot=True, fmt="1.3f")
plt.yticks(rotation=0,fontsize=16);
plt.xticks(fontsize=12);
plt.tight_layout()
plt.savefig('dfcolorgraph.png')
Which produces the following picture.

Python Pandas Matplotlib Plot Colored by type value defined in single column

I have data of the following format:
import pandas as ps
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],\
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],\
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=ps.DataFrame(table,columns=['time','data','type'])
I would like to plot data as a function of time connected as a line, but I would like each line to be a separate color for unique types. In this example, the result would be three lines: a data(time) line for each type a, b, and c. Any guidance is appreciated.
I have been unable to produce a line with this data: pandas.scatter will produce a plot, while pandas.plot will not. I have been messing with loops to produce a plot for each type, but I have not found a straightforward way to do this. My data typically has an unknown number of unique 'type's. Do pandas and/or matplotlib have a way to create this type of plot?
Pandas plotting capabilities will allow you to do this if everything is indexed properly. However, sometimes it's easier to just use matplotlib directly:
import pandas as pd
import matplotlib.pyplot as plt
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=pd.DataFrame(table, columns=['time','data','type'])
groups = df.groupby('type')
fig, ax = plt.subplots()
for name, group in groups:
    ax.plot(group['time'], group['data'], label=name)
ax.legend(loc='best')
plt.show()
If you'd prefer to use the pandas plotting wrapper, you'll need to override the legend labels:
import pandas as pd
import matplotlib.pyplot as plt
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=pd.DataFrame(table, columns=['time','data','type'])
df.index = df['time']
groups = df[['data', 'type']].groupby('type')
fig, ax = plt.subplots()
groups.plot(ax=ax, legend=False)
names = [item[0] for item in groups]
ax.legend(ax.lines, names, loc='best')
plt.show()
Just to throw in the seaborn solution.
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.FacetGrid(df, hue="type", height=5)  # `size` was renamed to `height` in seaborn 0.9
g.map(plt.plot, "time", "data")
g.add_legend()
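For the pandas-only route mentioned above ("if everything is indexed properly"), one way to read that is to pivot the table so each type becomes its own column; DataFrame.plot then draws one labelled line per column. A minimal sketch, assuming each (time, type) pair occurs only once:

```python
import pandas as pd
import matplotlib.pyplot as plt

table = {'time': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
         'data': [1, 1, 2, 2, 2, 1, 2, 3, 4, 5, 1, 2, 2, 2, 3],
         'type': ['a'] * 5 + ['b'] * 5 + ['c'] * 5}
df = pd.DataFrame(table, columns=['time', 'data', 'type'])

# Wide form: index = time, one column per type.
wide = df.pivot(index='time', columns='type', values='data')

fig, ax = plt.subplots()
wide.plot(ax=ax)  # one line per column, labelled with the type name
ax.set_ylabel('data')
```

This works for any number of unique types, since the columns are derived from the data. If a (time, type) pair can repeat, pivot raises an error and pivot_table with an aggregation would be needed instead.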
