I have a heatmap in Seaborn via sns.heatmap. I now want to white out the bottom row and right column but keep the values.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
np.random.seed(2021)
df = pd.DataFrame(np.random.normal(0, 1, (6, 4)))
df = df.rename(columns = {0:"a", 1:"b", 2:"c", 3:"d"})
df.index = ["a", "b", "c", "d", "e", "f"]
sns.heatmap(df, annot = True)
plt.show()
I thought I had to pass a mask argument to sns.heatmap, but I haven't managed to build a proper mask, and masking also removes the annotations. I also need to preserve the text index labels of my data frame df. How can I white out those cells while keeping both the annotations and the index labels?
Here is an approach:
use the original data for annotation (annot=data)
create a "norm" using the original data, to be used for coloring
create a copy of the colormap and assign an "over" color as "white"
create a copy of the data and fill the right column and bottom row with a value higher than the maximum of the original data (np.inf can't be used, because then no annotation is placed); use this copy for the coloring; seaborn automatically picks a dark or light text color for each annotation based on the cell color
to use the dataframe's column and index names in the heatmap, just use sns.heatmap(..., xticklabels=df.columns, yticklabels=df.index)
if you don't have a recent seaborn version installed (one whose color_palette() doesn't accept as_cmap), you can use one of matplotlib's standard colormaps, or create one via matplotlib.colors.ListedColormap(), e.g. cmap = ListedColormap(sns.color_palette('rocket', 256))
In example code:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from copy import copy
np.random.seed(2021)
df = pd.DataFrame(np.random.normal(0, 1, (6, 4)), columns=[*"abcd"], index=[*"abcdef"])
data = df.to_numpy()
data_for_colors = data.copy()
data_for_colors[:, -1] = data.max() + 10
data_for_colors[-1, :] = data.max() + 10
norm = plt.Normalize(data[:-1, :-1].min(), data[:-1, :-1].max())
# cmap = sns.color_palette('rocket', as_cmap=True).copy()
cmap = copy(plt.get_cmap('RdYlGn'))
cmap.set_over('white')
sns.set_style('white')
sns.heatmap(data=data_for_colors, xticklabels=df.columns, yticklabels=df.index,
annot=data, cmap=cmap, norm=norm)
plt.show()
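If you want to check the "over" mechanism on its own, here is a minimal matplotlib-only sketch (same seed and shape as above, no seaborn needed): a norm built from the interior cells maps the sentinel value above 1, and the copied colormap turns anything above 1 into its over color, white. The names sentinel, over_rgba and inner_rgba are just illustrative.

```python
from copy import copy

import matplotlib
matplotlib.use("Agg")  # render off-screen; no window is needed for this check
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(2021)
data = np.random.normal(0, 1, (6, 4))

# Color scale built only from the interior cells (all but the last row and column).
norm = plt.Normalize(data[:-1, :-1].min(), data[:-1, :-1].max())

# Work on a copy so set_over() does not touch the shared registered colormap.
cmap = copy(plt.get_cmap("RdYlGn"))
cmap.set_over("white")

# Any finite value above norm.vmax normalizes to > 1 and gets the "over" color.
sentinel = data.max() + 10
over_rgba = cmap(norm(sentinel))
inner_rgba = cmap(norm(data[0, 0]))

print("whited-out cell:", over_rgba)
print("interior cell:  ", inner_rgba)
```

This is also why a finite sentinel works while np.inf does not: the cell still holds a number that seaborn can annotate.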
I have two pandas series of numbers (not necessarily in the same size).
Can I create one side by side box plot for both of the series?
I only found a way to create a boxplot from a single series, not from two.
For testing, I generated two Series of different sizes:
import numpy as np
import pandas as pd
np.random.seed(0)
s1 = pd.Series(np.random.randn(10))
s2 = pd.Series(np.random.randn(14))
The first processing step is to concatenate them into a single DataFrame and set meaningful column names (they will be shown in the plot):
df = pd.concat([s1, s2], axis=1)
df.columns = ['A', 'B']
And to create the picture, along with a title, you can run:
ax = df.boxplot()
ax.get_figure().suptitle(t='My Boxplot', fontsize=16);
For my source data, the result is:
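A quick sketch to confirm what the concatenation does (same seed and sizes as above): the shorter series is padded with NaN, and pandas' boxplot removes NaN per column before computing the box statistics, so the padding does not change the boxes.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
s1 = pd.Series(np.random.randn(10))
s2 = pd.Series(np.random.randn(14))

# Column-wise concat aligns on the index, so the shorter series is padded with NaN.
df = pd.concat([s1, s2], axis=1)
df.columns = ["A", "B"]

print(df.shape)         # (14, 2)
print(df.isna().sum())  # 4 NaN in column A, none in column B

# Skipping the NaN padding gives back the statistics of the original series.
print(df["A"].median(), s1.median())
```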
Let's try an example dataset: two series of unequal length, with defined colors.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(100)
S1 = pd.Series(np.random.normal(0,1,10))
S2 = pd.Series(np.random.normal(0,1,14))
colors = ['#aacfcf', '#d291bc']
One option is to build a DataFrame with the values of both series stacked in one column, plus a second column labelling which series each value comes from:
import seaborn as sns
fig, ax = plt.subplots(1, 1, figsize=(6, 4))
sns.boxplot(x='series', y='values',
            data=pd.DataFrame({'values': pd.concat([S1, S2], axis=0, ignore_index=True),
                               'series': np.repeat(["S1", "S2"], [len(S1), len(S2)])}),
            ax=ax, palette=colors, width=0.5
            )
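If the np.repeat bookkeeping feels fragile, pd.concat can build the same long-format frame from a dict: the dict keys become an outer index level that records which series each value came from. A sketch, assuming the S1/S2 setup above (recreated here so it runs on its own); the column names series/row/values are my choice, and the resulting long_df can be passed as data= to the sns.boxplot call above.

```python
import numpy as np
import pandas as pd

np.random.seed(100)
S1 = pd.Series(np.random.normal(0, 1, 10))
S2 = pd.Series(np.random.normal(0, 1, 14))

# A dict passed to pd.concat turns its keys into an outer index level,
# which records the source series of every value.
long_df = (
    pd.concat({"S1": S1, "S2": S2}, names=["series", "row"])
    .reset_index(name="values")
)

print(long_df.head())
print(long_df["series"].value_counts())
```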
The other option is to use matplotlib directly, as other answers have suggested. However, there is no need to concatenate the series column-wise and introduce NaN padding: you can pass a list of arrays of different lengths directly to matplotlib's boxplot. The downside is that adjusting the colors etc. takes a bit of effort, as I show below:
fig, ax = plt.subplots(1, 1, figsize=(6, 4))
bplot = ax.boxplot([S1, S2], patch_artist=True, widths=0.5,
                   medianprops=dict(color="black"), labels=['S1', 'S2'])
plt.setp(bplot['boxes'], color='black')
for patch, color in zip(bplot['boxes'], colors):
    patch.set_facecolor(color)
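For reference, a self-contained version of the matplotlib route, with the tick labels set after the call (matplotlib 3.9 renamed boxplot's labels= parameter to tick_labels=, so this sidesteps the version difference). The final lines read back each median line to show that every box is computed from its own, unpadded series.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(100)
S1 = pd.Series(np.random.normal(0, 1, 10))
S2 = pd.Series(np.random.normal(0, 1, 14))
colors = ["#aacfcf", "#d291bc"]

fig, ax = plt.subplots(figsize=(6, 4))
# A list of arrays of different lengths is fine: one box per element, no NaN padding.
bplot = ax.boxplot([S1.to_numpy(), S2.to_numpy()], patch_artist=True, widths=0.5,
                   medianprops=dict(color="black"))
ax.set_xticks([1, 2])
ax.set_xticklabels(["S1", "S2"])

for patch, color in zip(bplot["boxes"], colors):
    patch.set_facecolor(color)
    patch.set_edgecolor("black")

# Each median line is horizontal, drawn at y = median of its series.
medians = [line.get_ydata()[0] for line in bplot["medians"]]
print(medians)
```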
Try this:
import numpy as np
import pandas as pd
ser1 = pd.Series(np.random.randn(10))
ser2 = pd.Series(np.random.randn(10))
## solution
pd.concat([ser1, ser2], axis=1).plot.box()
I am plotting a seaborn heatmap and would like to annotate only specific cells with custom text.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO
data = StringIO(u'''75,83,41,47,19
51,24,100,0,58
12,94,63,91,7
34,13,86,41,77''')
labels = StringIO(u'''7,8,4,,1
5,2,,2,8
1,,6,,7
3,1,,4,7''')
data = pd.read_csv(data, header=None)
data = data.apply(pd.to_numeric)
labels = pd.read_csv(labels, header=None)
#labels = np.ma.masked_invalid(labels)
fig, ax = plt.subplots()
sns.heatmap(data, annot=labels, ax=ax, vmin=0, vmax=100)
plt.show()
The above code generates the following heatmap:
and the commented line generates the following heatmap:
I would like to show only the non-nan (or non-zero) text on the cells. How can that be achieved?
Use a string array for annot instead of a masked array:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from io import StringIO
data = StringIO(u'''75,83,41,47,19
51,24,100,0,58
12,94,63,91,7
34,13,86,41,77''')
labels = StringIO(u'''7,8,4,,1
5,2,,2,8
1,,6,,7
3,1,,4,7''')
data = pd.read_csv(data, header=None)
data = data.apply(pd.to_numeric)
labels = pd.read_csv(labels, header=None)
#labels = np.ma.masked_invalid(labels)
# Convert everything to strings:
annotations = labels.astype(str)
annotations[np.isnan(labels)] = ""
fig, ax = plt.subplots()
sns.heatmap(data, annot=annotations, fmt="s", ax=ax, vmin=0, vmax=100)
plt.show()
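One caveat: a column that contains NaN is read as float, so labels.astype(str) shows "4.0" rather than "4". A sketch that builds the annotation array cell by cell, leaving missing cells empty and formatting numbers with the g format (which drops a trailing .0); pass it as sns.heatmap(data, annot=annotations, fmt="", ...).

```python
from io import StringIO

import numpy as np
import pandas as pd

labels = pd.read_csv(StringIO("""7,8,4,,1
5,2,,2,8
1,,6,,7
3,1,,4,7"""), header=None)

values = labels.to_numpy(dtype=float)
# Empty string for missing cells; "g" formatting prints 4.0 as "4".
annotations = np.array(
    [["" if np.isnan(v) else f"{v:g}" for v in row] for row in values]
)
print(annotations)
```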
To complement the answer by @mrzo, you can pass na_filter=False to read_csv() so that missing values are read as empty strings, and use pandas.DataFrame.astype(str) to convert everything to strings:
# ...
labels = pd.read_csv(labels, header=None, na_filter=False).astype(str)
sns.heatmap(data, annot=labels, fmt='s', ax=ax, vmin=0, vmax=100)
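To see what na_filter=False changes, here is a small sketch on the same label text: the empty fields come back as "" instead of NaN, so the columns containing blanks are kept as strings and no "4.0"-style floats appear.

```python
from io import StringIO

import pandas as pd

csv_text = """7,8,4,,1
5,2,,2,8
1,,6,,7
3,1,,4,7"""

# With na_filter=False, empty fields stay as "" instead of becoming NaN,
# so columns containing blanks are read as strings rather than floats.
labels = pd.read_csv(StringIO(csv_text), header=None, na_filter=False).astype(str)
print(labels)
```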
Just going to add this as it has taken me some time to work out how to do something similar programmatically for a slightly different application: I wanted to suppress 0-values from the annotation, but because the values were arising as the result of a crosstab operation I couldn't use William Miller's nice approach without writing the crosstab out and then reading it back in which seemed... inelegant.
There may be a yet more elegant way to do this, but for me running it through numpy was ridiculously fast and quite easy.
import numpy as np
import pandas as pd
import seaborn as sns
from io import StringIO
data = StringIO(u'''75,83,41,47,19
51,24,100,0,58
12,94,63,91,7
34,13,86,41,77''')
data = pd.read_csv(data, header=None)
data = data.apply(pd.to_numeric)
# For more complex functions you could write a def instead
# of using this simple lambda function
an = np.vectorize(lambda x: '' if x<50 else str(round(x,-1)))(data.to_numpy())
sns.heatmap(
data=data.to_numpy(), # Note this is now numpy too
cmap='BuPu',
annot=an, # The matching ndarray of annotations
fmt = '', # Formats annotations as strings (i.e. no formatting)
cbar=False, # Seems overkill if you've got annotations
vmin=0,
vmax=data.max().max()
)
This can make labelling the axes a little more difficult, though it's straightforward enough: sns.heatmap returns the Axes, so you can write ax = sns.heatmap(...) and then ax.set_xticklabels(data.columns.values). And if you had axis labels in, say, the first column, you'd need to slice with iloc (data.iloc[:, 1:]) before the to_numpy call. Combined with a custom colormap (e.g. 0 == white), this lets you create heatmaps that are a lot easier to look at.
Obviously the crude rounding is confusing (why does 80 have different shades?) but you get the point:
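Since a crosstab is already a DataFrame, you can also skip np.vectorize and pass the crosstab straight to sns.heatmap, which keeps the row and column names as tick labels. A sketch with a hypothetical two-column frame, using np.where to blank out the zero counts; the result would be used as sns.heatmap(ct, annot=annot, fmt="").

```python
import numpy as np
import pandas as pd

# Hypothetical categorical data; any crosstab works the same way.
df = pd.DataFrame({"x": list("aabbc"), "y": list("uvuuv")})
ct = pd.crosstab(df["x"], df["y"])

counts = ct.to_numpy()
# Vectorised alternative to np.vectorize: blank out zeros, stringify the rest.
annot = np.where(counts == 0, "", counts.astype(str))

print(ct)
print(annot)
```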
I'm using Seaborn in a Jupyter notebook to plot histograms like this:
import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('CTG.csv', sep=',')
sns.distplot(df['LBE'])
I have a list of columns whose values I want to plot as histograms, and I tried plotting a histogram for each of them:
continuous = ['b', 'e', 'LBE', 'LB', 'AC']
for column in continuous:
    sns.distplot(df[column])
And I get this result - only one plot with (presumably) all histograms:
My desired result is multiple histograms that look like this (one for each variable):
How can I do this?
Insert plt.figure() before each call to sns.distplot(), so that each histogram is drawn on a new figure instead of all of them piling onto the current one.
Here's an example with plt.figure():
Here's an example without plt.figure():
Complete code:
# imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [6, 2]
%matplotlib inline
# sample time series data
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10,12,size=(300, 4)), columns=list('ABCD'))
datelist = pd.date_range('2014-07-01', periods=300).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0]=0
df=df.cumsum()
# create distplots
for column in df.columns:
    plt.figure() # <==================== here!
    sns.distplot(df[column])
sns.distplot has been deprecated since seaborn 0.11.0. You can, however, use sns.histplot() to plot histograms of all (numerical) columns of the dataframe in the following way:
fig, axes = plt.subplots(2,5, figsize=(15, 5))
ax = axes.flatten()
for i, col in enumerate(df.columns):
    sns.histplot(df[col], ax=ax[i]) # histogram call
    ax[i].set_title(col)
    # remove scientific notation for both axes
    ax[i].ticklabel_format(style='plain', axis='both')
fig.tight_layout(w_pad=6, h_pad=4) # change padding
plt.show()
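The 2x5 grid assumes exactly ten columns: with more, ax[i] raises an IndexError, and with fewer, empty panels remain. A sketch that sizes the grid from the number of numeric columns and deletes the unused axes; it uses matplotlib's Axes.hist so it runs without seaborn, but sns.histplot(numeric[col], ax=axis) can be dropped into the same spot. The 7-column random frame and ncols = 3 are arbitrary choices.

```python
import math

import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10, 12, size=(300, 7)), columns=list("ABCDEFG"))

numeric = df.select_dtypes("number")
ncols = 3
nrows = math.ceil(numeric.shape[1] / ncols)

fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
flat = axes.flatten()

for axis, col in zip(flat, numeric.columns):
    axis.hist(numeric[col], bins=20)  # or: sns.histplot(numeric[col], ax=axis)
    axis.set_title(col)

# Remove the leftover, empty axes at the end of the grid.
for axis in flat[numeric.shape[1]:]:
    fig.delaxes(axis)

fig.tight_layout()
```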
If you specifically want a kernel density estimate of the probability density function of a continuous variable (mimicking the default behavior of sns.distplot()), add kde=True to the sns.histplot() call, and a KDE curve will be overlaid on each histogram.
Calling plt.show() inside the loop also works:
for column in df.columns:
    sns.distplot(df[column])
    plt.show()
I would like to plot certain slices of my Pandas DataFrame for each row (based on the row index), with a different color for each row.
My data look like the following:
I tried to find a way with the help of this tutorial, but I couldn't, probably due to a lack of skills.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(r"D:\SOF10.csv", header=None)
df.head()
#Slice interested data
C = df.iloc[:, 2::3]
#Plot Temp base on row index colorfully
C.apply(lambda x: plt.scatter(x.index, x, c='g'))
plt.show()
Following is my expected plot:
I was also wondering whether I could display the mean of each row of the sliced data (each row contains 480 values) somewhere in the plot or in the legend beside the plot. Is it feasible (as in the following picture) to calculate the mean and show it in the legend, or in a small font next to its own data in the graph?
Data sample: data
This gives the plot without a legend:
C = df.iloc[:,2::3].stack().reset_index()
C.columns = ['level_0', 'level_1', 'Temperature']
fig, ax = plt.subplots(1,1)
C.plot('level_0', 'Temperature',
ax=ax, kind='scatter',
c='level_0', colormap='tab20',
colorbar=False, legend=True)
ax.set_xlabel('Cycles')
plt.show()
Edit to reflect modified question:
stack() transforms the (sliced) dataframe into a series indexed by (row, col).
reset_index() turns that two-level index into the columns level_0 (row) and level_1 (col).
set_xlabel sets the x-axis label to whatever you want.
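A tiny example of what stack() plus reset_index() produce, using a hypothetical 2x3 frame in place of the sliced data:

```python
import pandas as pd

# Hypothetical stand-in for the sliced frame: 2 rows (cycles) x 3 columns.
C = pd.DataFrame([[20.1, 20.5, 21.0],
                  [22.3, 22.0, 21.8]])

long = C.stack().reset_index()
long.columns = ["level_0", "level_1", "Temperature"]
print(long)
```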
Edit 2: The following produces scatter with legend:
CC = df.iloc[:,2::3]
fig, ax = plt.subplots(1,1, figsize=(16,9))
labels = CC.mean(axis=1)
for i in CC.index:
    ax.scatter([i]*len(CC.columns[1:]), CC.iloc[i,1:], label=labels[i])
ax.legend()
ax.set_xlabel('Cycles')
ax.set_ylabel('Temperature')
plt.show()
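To put the mean both in the legend and next to each row's points, as asked in the edit, one option is to format the label yourself and add a small ax.annotate per row. A sketch with a hypothetical 3x5 frame standing in for df.iloc[:, 2::3]:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.iloc[:, 2::3]: 3 cycles (rows) x 5 readings each.
np.random.seed(0)
CC = pd.DataFrame(20 + np.random.rand(3, 5))
means = CC.mean(axis=1)

fig, ax = plt.subplots(figsize=(8, 5))
for i, row in CC.iterrows():
    # One scatter call per row, so each row takes the next color of the cycle.
    ax.scatter([i] * len(row), row, label=f"cycle {i}: mean = {means[i]:.2f}")
    # Small text just above the highest point of the row.
    ax.annotate(f"{means[i]:.1f}", xy=(i, row.max()), xytext=(0, 5),
                textcoords="offset points", ha="center", fontsize=7)

ax.legend()
ax.set_xlabel("Cycles")
ax.set_ylabel("Temperature")

legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_texts)
```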
This may be only an approximate answer: the c= and cmap= arguments of scatter() can be used for the desired coloring.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import itertools
df = pd.DataFrame({'a':[34,22,1,34]})
fig, subplot_axes = plt.subplots(1, 1, figsize=(20, 10)) # width, height
colors = ['red','green','blue','purple']
cmap=matplotlib.colors.ListedColormap(colors)
for col in df.columns:
    subplot_axes.scatter(df.index, df[col].values, c=df.index, cmap=cmap, alpha=.9)
tmpdf.boxplot(['original','new'], by = 'by column', ax = ax, sym = '')
gives me a plot like this:
I want to compare "original" with "new". How can I put the two "0" boxes in one panel and the two "1" boxes in another panel, with the labelling swapped accordingly?
Here is a sample dataset to demonstrate.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# simulate some artificial data
# ==========================================
np.random.seed(0)
df = pd.DataFrame(np.random.rand(10,2), columns=['original', 'new'] )
df['by column'] = pd.Series([0,0,0,0,1,1,1,1,1,1])
# your original plot
ax = df.boxplot(['original', 'new'], by='by column', figsize=(12,6))
To get the desired output, call groupby explicitly outside of boxplot, so that pandas iterates over the subgroups and draws one panel per group, each containing an "original" and a "new" box:
ax = df[['original', 'new']].groupby(df['by column']).boxplot(figsize=(12,6))
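If you'd rather not rely on the pandas wrapper, the same layout can be drawn with matplotlib directly: one axes per value of "by column", each with an "original" and a "new" box. A sketch using the sample data above:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(10, 2), columns=["original", "new"])
df["by column"] = pd.Series([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

groups = list(df.groupby("by column"))
fig, axes = plt.subplots(1, len(groups), figsize=(12, 6), sharey=True)

for ax, (key, group) in zip(axes, groups):
    # One panel per group value, with "original" and "new" side by side.
    ax.boxplot([group["original"].to_numpy(), group["new"].to_numpy()])
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["original", "new"])
    ax.set_title(f"by column = {key}")

print([ax.get_title() for ax in axes])
```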