Different point size based on hue argument in seaborn - python

I am trying to have different point sizes on a seaborn scatterplot depending on the value in the "hue" column of my dataframe.
sns.scatterplot(x="X", y="Y", data=df, hue='value',style='value')
value can take three different values (0, 1 and 2), and I would like the points whose value is 2 to be bigger on the graph.
I tried the sizes argument :
sizes=(1,1,4)
But could not get it done this way.

Let's use the s parameter and pass a list of sizes, computed from df['value'], to scale the point sizes:
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'X':[1,2,3],'Y':[1,4,9],'value':[1,0,2]})
_ = sns.scatterplot(x='X',y='Y', data=df, s=df['value']*50+10)
Output:

Using seaborn scatterplot's own arguments:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'X':[1,2,3,4,5],'Y':[1,2,3,4,5],
'value':[1,1,0,2,2]})
df["size"] = np.where(df["value"] == 2, "Big", "Small")
sns.scatterplot(x="X", y="Y", hue='value', size="size",
data=df, size_order=("Small", "Big"), sizes=(160, 40))
plt.show()
Note that the order of sizes needs to be reversed compared to size_order. I have no idea why that would make sense, though.
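As a possible alternative, here is a minimal sketch that passes a dict to sizes, mapping each level of the size variable to an explicit marker size. A dict sidesteps the ordering question entirely. The name size_map and the sizes 40 and 160 are illustrative choices, not part of the original answers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"X": [1, 2, 3, 4, 5],
                   "Y": [1, 2, 3, 4, 5],
                   "value": [1, 1, 0, 2, 2]})

# Hypothetical mapping: points with value == 2 get a larger marker area
size_map = {0: 40, 1: 40, 2: 160}

ax = sns.scatterplot(x="X", y="Y", data=df,
                     hue="value", style="value",
                     size="value", sizes=size_map)
# Call plt.show() (from matplotlib.pyplot) to display the figure interactively.
```

Because the same column drives hue, style and size, the legend stays compact and only the points with value 2 are enlarged.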

Related

Show median and quantiles on Seaborn pairplot (Python)

I am making a corner plot using Seaborn. I would like to display lines on each diagonal histogram showing the median value and quantiles. Example shown below.
I usually do this using the Python package 'corner', which is straightforward. I want to use Seaborn just because it has better aesthetics.
The seaborn plot was made using this code:
import pandas as pd
import seaborn as sns
df = pd.DataFrame(samples_new, columns = ['r1', 'r2', 'r3'])
cornerplot = sns.pairplot(df, corner=True, kind='kde',diag_kind="hist", diag_kws={'color':'darkslateblue', 'alpha':1, 'bins':10}, plot_kws={'color':'darkslateblue', 's':10, 'alpha':0.8, 'fill':False})
Seaborn provides test data sets that come in handy when explaining a change you want to make to the default behavior. That way, you don't need to generate your own test data or supply your own data, which can be complicated and/or sensitive.
To update the subplots in the diagonal, there is g.map_diag(...) which will call a given function for each individual column. It gets 3 parameters: the data used for the x-axis, a label and a color.
Here is an example to add vertical lines for the main quantiles, and change the title. You can add more calculations for further customizations.
import matplotlib.pyplot as plt
import seaborn as sns
def update_diag_func(data, label, color):
    for val in data.quantile([.25, .5, .75]):
        plt.axvline(val, ls=':', color=color)
    plt.title(data.name, color=color)
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, corner=True, diag_kws={'kde': True})
g.map_diag(update_diag_func)
g.fig.subplots_adjust(top=0.97) # provide some space for the titles
plt.show()
Seaborn is built on top of matplotlib, so you can try this:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
df = pd.DataFrame(samples_new, columns = ['r1', 'r2', 'r3'])
cornerplot = sns.pairplot(df, corner=True, kind='kde',diag_kind="hist", diag_kws={'color':'darkslateblue', 'alpha':1, 'bins':10}, plot_kws={'color':'darkslateblue', 's':10, 'alpha':0.8, 'fill':False})
plt.text(300, 250, "An annotation")
plt.show()

Overlay kde plot using Seaborn displot

I'm trying to recreate a plot that I made with seaborn distplot but using displot, since distplot is being deprecated.
How do I make the displot overlay the two columns?
Here is the original code to create using distplot:
import pandas as pd
import numpy as np
import seaborn as sns
df1 = pd.DataFrame({'num1':np.random.normal(loc=0.0, scale=1.0, size=100),'num2':np.random.normal(loc=0.0, scale=1.0, size=100)})
sns.distplot(df1['num1'],hist=False,color='orange',)
sns.distplot(df1['num2'],hist=False,color='blue')
Here is the code for the plot using displot
sns.displot(data = df1, x = 'num1',color='orange', kind = 'kde')
sns.displot(data = df1, x = 'num2',color='blue', kind = 'kde')
I think you are looking for kdeplot.
sns.kdeplot(data=df1, palette=['orange', 'blue'])
Without any special layout I get this result for your example.
I set the palette argument to define the colors as you did in your example, but this is optional.

How to make a distplot for each column in a pandas dataframe

I'm using Seaborn in a Jupyter notebook to plot histograms like this:
import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('CTG.csv', sep=',')
sns.distplot(df['LBE'])
I have an array of columns with values that I want to plot histogram for and I tried plotting a histogram for each of them:
continous = ['b', 'e', 'LBE', 'LB', 'AC']
for column in continous:
    sns.distplot(df[column])
And I get this result - only one plot with (presumably) all histograms:
My desired result is multiple histograms that look like this (one for each variable):
How can I do this?
Insert plt.figure() before each call to sns.distplot() .
Here's an example with plt.figure():
Here's an example without plt.figure():
Complete code:
# imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [6, 2]
%matplotlib inline
# sample time series data
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10,12,size=(300, 4)), columns=list('ABCD'))
datelist = pd.date_range('2014-07-01', periods=300).tolist()  # pd.datetime is no longer available in recent pandas
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0]=0
df=df.cumsum()
# create distplots
for column in df.columns:
    plt.figure() # <==================== here!
    sns.distplot(df[column])
sns.distplot() has been deprecated since seaborn 0.11.0 and is slated for removal in a later release. You can, however, use sns.histplot() to plot a histogram for every column of the dataframe (numerical features only) in the following way:
fig, axes = plt.subplots(2,5, figsize=(15, 5))
ax = axes.flatten()
for i, col in enumerate(df.columns):
    sns.histplot(df[col], ax=ax[i]) # histogram call
    ax[i].set_title(col)
    # remove scientific notation for both axes
    ax[i].ticklabel_format(style='plain', axis='both')
fig.tight_layout(w_pad=6, h_pad=4) # change padding
plt.show()
If you specifically want an estimate of the probability density function of a continuous random variable via kernel density estimation (mimicking the default behavior of sns.distplot()), add kde=True inside the sns.histplot() call, and you will get curves overlaying the histograms.
Also works when looping with plt.show() inside:
for column in df.columns:
    sns.distplot(df[column])
    plt.show()

How to show label names in pandas groupby histogram plot

I can plot multiple histograms in a single plot using pandas, but a few things are missing:
How to give each histogram a label.
I can only plot one figure; how can I change it to layout=(3,1) or something else?
Also, in figure 1, all the bins are filled with solid colors, and it's kind of difficult to know which is which. How can I fill them with different patterns (e.g. crosses, slashes, etc.)?
Here is the MWE:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('iris')
df.groupby('species')['sepal_length'].hist(alpha=0.7,label='species')
plt.legend()
Output:
To change layout I can use by keyword, but can't give them colors
HOW TO GIVE DIFFERENT COLORS?
df.hist('sepal_length',by='species',layout=(3,1))
plt.tight_layout()
Gives:
You can resort to groupby:
fig,ax = plt.subplots()
hatches = ('\\', '//', '..') # fill pattern
for (i, d),hatch in zip(df.groupby('species'), hatches):
    d['sepal_length'].hist(alpha=0.7, ax=ax, label=i, hatch=hatch)
ax.legend()
Output:
As of pandas 1.1.0, you can simply set the legend keyword to True.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('iris')
df.groupby('species')['sepal_length'].hist(alpha=0.7, legend = True)
It's more code, but using pure matplotlib will always give you more control over the plots. For your second case:
import matplotlib.pyplot as plt
import numpy as np
from itertools import zip_longest
# Dictionary of color for each species
color_d = dict(zip_longest(df.species.unique(),
plt.rcParams['axes.prop_cycle'].by_key()['color']))
# Use the same bins for each
xmin = df.sepal_length.min()
xmax = df.sepal_length.max()
bins = np.linspace(xmin, xmax, 20)
# Set up correct number of subplots, space them out.
fig, ax = plt.subplots(nrows=df.species.nunique(), figsize=(4,8))
plt.subplots_adjust(hspace=0.4)
for i, (lab, gp) in enumerate(df.groupby('species')):
    ax[i].hist(gp.sepal_length, ec='k', bins=bins, color=color_d[lab])
    ax[i].set_title(lab)
    # same xlim for each so we can see differences
    ax[i].set_xlim(xmin, xmax)

Python Pandas Matplotlib Plot Colored by type value defined in single column

I have data of the following format:
import pandas as ps
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],\
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],\
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=ps.DataFrame(table,columns=['time','data','type'])
I would like to plot data as a function of time connected as a line, but I would like each line to be a separate color for unique types. In this example, the result would be three lines: a data(time) line for each type a, b, and, c. Any guidance is appreciated.
I have been unable to produce a line with this data: pandas.scatter will produce a plot, while pandas.plot will not. I have been experimenting with loops to produce a plot for each type, but I have not found a straightforward way to do this. My data typically has an unknown number of unique 'type's. Does pandas and/or matplotlib have a way to create this type of plot?
Pandas plotting capabilities will allow you to do this if everything is indexed properly. However, sometimes it's easier to just use matplotlib directly:
import pandas as pd
import matplotlib.pyplot as plt
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=pd.DataFrame(table, columns=['time','data','type'])
groups = df.groupby('type')
fig, ax = plt.subplots()
for name, group in groups:
    ax.plot(group['time'], group['data'], label=name)
ax.legend(loc='best')
plt.show()
If you'd prefer to use the pandas plotting wrapper, you'll need to override the legend labels:
import pandas as pd
import matplotlib.pyplot as plt
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=pd.DataFrame(table, columns=['time','data','type'])
df.index = df['time']
groups = df[['data', 'type']].groupby('type')
fig, ax = plt.subplots()
groups.plot(ax=ax, legend=False)
names = [item[0] for item in groups]
ax.legend(ax.lines, names, loc='best')
plt.show()
Just to throw in the seaborn solution.
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.FacetGrid(df, hue="type", height=5)  # 'size' was renamed to 'height' in seaborn 0.9
g.map(plt.plot, "time", "data")
g.add_legend()
