I am trying to create a clustered heatmap (with a dendrogram) using plotly in Python. The example on the plotly website does not scale well. I have come across various solutions, but most of them are in R or JavaScript. I want a heatmap with a dendrogram on the left side only, showing the clusters along the y axis (from the hierarchical clustering). A really good-looking example is this one: https://chart-studio.plotly.com/~jackp/6748. My goal is to create something like this, but with only the left-side dendrogram. If someone can implement something like this in Python, I will be really grateful!
Let the data be X = np.random.randint(0, 10, size=(120, 10))
The following suggestion draws on elements from both Dendrograms in Python and chart-studio.plotly.com/~jackp. This particular plot uses your data X = np.random.randint(0, 10, size=(120, 10)). One thing the linked approaches had in common was, in my opinion, that the datasets and data munging procedures were a bit messy. So I decided to build the following figure on a pandas dataframe, df = pd.DataFrame(X), to hopefully make everything a bit clearer.
Plot
Complete code
import plotly.graph_objects as go
import plotly.figure_factory as ff
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform
import random
import string
X = np.random.randint(0, 10, size=(120, 10))
df = pd.DataFrame(X)
# Initialize figure by creating upper dendrogram
fig = ff.create_dendrogram(df.values, orientation='bottom')
fig.for_each_trace(lambda trace: trace.update(visible=False))
for i in range(len(fig['data'])):
    fig['data'][i]['yaxis'] = 'y2'
# Create Side Dendrogram
# dendro_side = ff.create_dendrogram(X, orientation='right', labels = labels)
dendro_side = ff.create_dendrogram(X, orientation='right')
for i in range(len(dendro_side['data'])):
    dendro_side['data'][i]['xaxis'] = 'x2'
# Add Side Dendrogram Data to Figure
for data in dendro_side['data']:
    fig.add_trace(data)
# Create Heatmap
dendro_leaves = dendro_side['layout']['yaxis']['ticktext']
dendro_leaves = list(map(int, dendro_leaves))
data_dist = pdist(df.values)
heat_data = squareform(data_dist)
heat_data = heat_data[dendro_leaves,:]
heat_data = heat_data[:,dendro_leaves]
heatmap = [
    go.Heatmap(
        x = dendro_leaves,
        y = dendro_leaves,
        z = heat_data,
        colorscale = 'Blues'
    )
]
heatmap[0]['x'] = fig['layout']['xaxis']['tickvals']
heatmap[0]['y'] = dendro_side['layout']['yaxis']['tickvals']
# Add Heatmap Data to Figure
for data in heatmap:
    fig.add_trace(data)
# Edit Layout
fig.update_layout({'width':800, 'height':800,
'showlegend':False, 'hovermode': 'closest',
})
# Edit xaxis
fig.update_layout(xaxis={'domain': [.15, 1],
'mirror': False,
'showgrid': False,
'showline': False,
'zeroline': False,
'ticks':""})
# Edit xaxis2
fig.update_layout(xaxis2={'domain': [0, .15],
'mirror': False,
'showgrid': False,
'showline': False,
'zeroline': False,
'showticklabels': False,
'ticks':""})
# Edit yaxis
fig.update_layout(yaxis={'domain': [0, 1],
'mirror': False,
'showgrid': False,
'showline': False,
'zeroline': False,
'showticklabels': False,
'ticks': ""
})
# # Edit yaxis2
fig.update_layout(yaxis2={'domain':[.825, .975],
'mirror': False,
'showgrid': False,
'showline': False,
'zeroline': False,
'showticklabels': False,
'ticks':""})
fig.update_layout(paper_bgcolor="rgba(0,0,0,0)",
plot_bgcolor="rgba(0,0,0,0)",
xaxis_tickfont = dict(color = 'rgba(0,0,0,0)'))
fig.show()
The simplest solution to this problem is to use the Clustergram function from the dash_bio package.
import numpy as np
import dash_bio as dashbio
X = np.random.randint(0, 10, size=(120, 10))
dashbio.Clustergram(
data=X,
# row_labels=rows,
# column_labels=columns,
cluster='row',
color_threshold={
'row': 250,
'col': 700
},
height=800,
width=700,
color_map= [
[0.0, '#636EFA'],
[0.25, '#AB63FA'],
[0.5, '#FFFFFF'],
[0.75, '#E763FA'],
[1.0, '#EF553B']
]
)
A more laborious solution is to combine plotly.figure_factory.create_dendrogram with plotly.graph_objects.Heatmap, as in the plotly documentation.
Note that the documentation example is not a dendrogram heatmap of the data but a pairwise-distance heatmap; you can still use the two functions to create a dendrogram heatmap, though.
You can also use seaborn's clustermap:
https://seaborn.pydata.org/generated/seaborn.clustermap.html
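A minimal sketch of the clustermap route, assuming a static matplotlib figure is acceptable instead of an interactive plotly one. The seeded data, colormap and figure size are illustrative choices; col_cluster=False is what restricts the clustering (and the dendrogram) to the rows.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import numpy as np
import seaborn as sns

# Assumption: seeded random data with the same shape as in the question.
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(120, 10))

# Cluster the rows only, so just the left-side dendrogram is drawn.
grid = sns.clustermap(X, col_cluster=False, cmap="Blues", figsize=(6, 10))

# Row order after clustering, as indices into X.
row_order = grid.dendrogram_row.reordered_ind
```

The returned grid object also exposes the underlying matplotlib figure if further styling is needed.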
My Python code is:
import matplotlib.pyplot as plt
values = [234, 64, 54, 10, 0, 1, 0, 9, 2, 1, 7, 7]
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
colors = ['yellowgreen', 'red', 'gold', 'lightskyblue',
          'white', 'lightcoral', 'blue', 'pink', 'darkgreen',
          'yellow', 'grey', 'violet', 'magenta', 'cyan']
# label each wedge with its month
plt.pie(values, labels=months, autopct='%1.1f%%', shadow=True,
        colors=colors, startangle=90, radius=1.2)
plt.show()
Is it possible to show the labels "Jan", "Feb", "Mar", etc. and the percentages, either:
without overlapping, or
using an arrow mark?
Alternatively you can put the legends beside the pie graph:
import matplotlib.pyplot as plt
import numpy as np
x = np.char.array(['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct', 'Nov','Dec'])
y = np.array([234, 64, 54,10, 0, 1, 0, 9, 2, 1, 7, 7])
colors = ['yellowgreen','red','gold','lightskyblue','white','lightcoral','blue','pink', 'darkgreen','yellow','grey','violet','magenta','cyan']
porcent = 100.*y/y.sum()
patches, texts = plt.pie(y, colors=colors, startangle=90, radius=1.2)
labels = ['{0} - {1:1.2f} %'.format(i,j) for i,j in zip(x, porcent)]
sort_legend = True
if sort_legend:
    patches, labels, dummy = zip(*sorted(zip(patches, labels, y),
                                         key=lambda x: x[2],
                                         reverse=True))
# 'center left' is the valid location string ('left center' is not accepted)
plt.legend(patches, labels, loc='center left', bbox_to_anchor=(-0.1, 1.),
           fontsize=8)
plt.savefig('piechart.png', bbox_inches='tight')
EDIT: if you want to keep the legend in the original order, as you mentioned in the comments, you can set sort_legend=False in the code above, giving:
If anyone just wants to offset the labels automatically, and not use a legend, I wrote this function that does it (yup I'm a real try-hard). It uses numpy but could easily be re-written in pure python.
import numpy as np
def fix_labels(mylabels, tooclose=0.1, sepfactor=2):
    vecs = np.zeros((len(mylabels), len(mylabels), 2))
    dists = np.zeros((len(mylabels), len(mylabels)))
    for i in range(0, len(mylabels)-1):
        for j in range(i+1, len(mylabels)):
            a = np.array(mylabels[i].get_position())
            b = np.array(mylabels[j].get_position())
            dists[i,j] = np.linalg.norm(a-b)
            vecs[i,j,:] = a-b
            if dists[i,j] < tooclose:
                mylabels[i].set_x(a[0] + sepfactor*vecs[i,j,0])
                mylabels[i].set_y(a[1] + sepfactor*vecs[i,j,1])
                mylabels[j].set_x(b[0] - sepfactor*vecs[i,j,0])
                mylabels[j].set_y(b[1] - sepfactor*vecs[i,j,1])
So use it like:
wedges, labels, autopct = ax1.pie(sizes, labels=groups, autopct='%1.1f%%',
shadow=False, startangle=90)
fix_labels(autopct, sepfactor=3)
fix_labels(labels, sepfactor=2)
This works well as-written if you only have a few labels overlapping. If you have a whole bunch like OP, you might want to add a random direction vector to the vecs[i,j,:] = a-b line. That would probably work well.
Try tight_layout. Add
plt.tight_layout()
at the end of your code. It may prevent the overlap a little bit.
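For the arrow option raised in the question, one possibility is to annotate each wedge with ax.annotate and a connecting line. The sketch below uses the question's values; the text offsets (1.35 and 1.4) are arbitrary assumptions, and labels of very thin neighbouring wedges may still end up close together, so treat it as a starting point rather than a complete fix.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

values = [234, 64, 54, 10, 0, 1, 0, 9, 2, 1, 7, 7]
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
total = sum(values)

fig, ax = plt.subplots(figsize=(7, 7))
wedges, _ = ax.pie(values, startangle=90, radius=1.0)

annotations = []
for wedge, month, value in zip(wedges, months, values):
    # Mid-angle of the wedge, converted to a point on the pie's edge.
    angle = np.deg2rad((wedge.theta1 + wedge.theta2) / 2)
    x, y = np.cos(angle), np.sin(angle)
    side = 1 if x >= 0 else -1
    annotations.append(ax.annotate(
        f"{month} {100 * value / total:.1f}%",
        xy=(x, y),                     # arrow tip on the wedge edge
        xytext=(1.35 * side, 1.4 * y), # label pushed outward (assumed offsets)
        ha="left" if side > 0 else "right",
        va="center",
        arrowprops={"arrowstyle": "-"},
    ))
```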
First of all, avoid pie charts whenever you can!
Secondly, think about how objects work in Python. I believe this example is self-explanatory; note, however, that you don't have to move labels manually.
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
ax.axis('equal')
patches, texts, autotexts = ax.pie([12,6,2,3],
labels=['A', 'B', 'C', 'no data'],
autopct='%1.1f%%',
pctdistance=0.5,
labeldistance=1.1)
# Move a label
texts[1]._x =-0.5
texts[1]._y =+0.5
# E.g. change some formatting
texts[-1]._color = 'blue'
There are some options to modify the labels:
# Check all options
print(texts[0].__dict__)
returns
{'_stale': False,
'stale_callback': <function matplotlib.artist._stale_axes_callback(self, val)>,
'_axes': <AxesSubplot:>,
'figure': <Figure size 432x288 with 1 Axes>,
'_transform': <matplotlib.transforms.CompositeGenericTransform at 0x7fe09bedf210>,
'_transformSet': True,
'_visible': True,
'_animated': False,
'_alpha': None,
'clipbox': <matplotlib.transforms.TransformedBbox at 0x7fe065d3dd50>,
'_clippath': None,
'_clipon': False,
'_label': '',
'_picker': None,
'_contains': None,
'_rasterized': None,
'_agg_filter': None,
'_mouseover': False,
'eventson': False,
'_oid': 0,
'_propobservers': {},
'_remove_method': <function list.remove(value, /)>,
'_url': None,
'_gid': None,
'_snap': None,
'_sketch': None,
'_path_effects': [],
'_sticky_edges': _XYPair(x=[], y=[]),
'_in_layout': True,
'_x': -0.07506663683168735,
'_y': 1.097435647331897,
'_text': 'A',
'_color': 'black',
'_fontproperties': <matplotlib.font_manager.FontProperties at 0x7fe065d3db90>,
'_usetex': False,
'_wrap': False,
'_verticalalignment': 'center',
'_horizontalalignment': 'right',
'_multialignment': None,
'_rotation': 'horizontal',
'_bbox_patch': None,
'_renderer': <matplotlib.backends.backend_agg.RendererAgg at 0x7fe08b01fd90>,
'_linespacing': 1.2,
'_rotation_mode': None}
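The same changes can also be made through the public setters of matplotlib's Text objects instead of the underscore-prefixed attributes. A minimal sketch, reusing the example values from above:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axis('equal')
patches, texts, autotexts = ax.pie([12, 6, 2, 3],
                                   labels=['A', 'B', 'C', 'no data'],
                                   autopct='%1.1f%%',
                                   pctdistance=0.5,
                                   labeldistance=1.1)

# Move a label with the public setter rather than writing to _x / _y.
texts[1].set_position((-0.5, 0.5))

# Change formatting with the public setter rather than writing to _color.
texts[-1].set_color('blue')
```

Going through the setters also marks the artist as needing a redraw, which direct attribute assignment does not do.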
I am trying to add an image and a price label, and to add more space along the time axis. It seems like ylim= takes care of that, but when I add it my whole graph disappears.
market_colors = mpf.make_marketcolors(
base_mpf_style="charles"
)
rc = {
"axes.labelcolor": "none",
"axes.spines.bottom": True,
"axes.spines.left": False,
"axes.spines.right": False,
"axes.spines.top": False,
"font.size": 10,
}
styles = mpf.make_mpf_style(
base_mpf_style="nightclouds",
marketcolors=market_colors,
gridstyle="",
rc=rc
)
filledShape = {
"y1": df['Close'].values,
"facecolor": "#2279e4"
}
(mpf.plot(df, type='line',
title='Test',
linecolor='white',
style=styles,
volume=True,
figsize=(8, 6),
figscale=0.5,
fill_between=filledShape, tight_layout=True,
scale_padding={'left': 1, 'top': 5, 'right': 1, 'bottom': 2}
))
There are three techniques that I know of to display an image on a matplotlib plot:
Axes.imshow()
Figure.figimage()
Putting the image in an AnnotationBbox
In terms of working with mplfinance, I would say that technique one, calling Axes.imshow() is probably simplest:
Step 1:
For all three of the above techniques, when you call mpf.plot() set kwarg returnfig=True:
fig, axlist = mpf.plot(df,...,returnfig=True)
This will give you access to the mplfinance Figure and Axes objects.
Step 2:
Now create a new Axes object where you want the image/logo:
# Note: [left,bottom,width,height] are in terms of fraction of the Figure.
# For example [0.05,0.08,0.10,0.06] means:
# the lower/left corner of the Axes will be located:
# 5% of the way in from the left
# 8% of the way up from the bottom,
# and the Axes will be
# 10% of the Figure wide and
# 6% of the Figure high.
logo_axes = fig.add_axes([left,bottom,width,height])
Step 3:
Read in the image:
from PIL import Image
im = Image.open('image_file_name.png')
Step 4:
Call imshow() on the newly created Axes, and turn off the axis lines:
logo_axes.imshow(im)
logo_axes.axis('off')
Step 5:
Since returnfig=True causes mplfinance to not show the Figure, call mpf.show()
mpf.show()
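Steps 2 to 4 can be tried out without mplfinance. In the sketch below a plain matplotlib Figure stands in for the one returned by mpf.plot(..., returnfig=True), and a small generated array stands in for the logo file; both are assumptions made so the example is self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for the Figure that mpf.plot(..., returnfig=True) would return.
fig, ax = plt.subplots(figsize=(8, 6))
rng = np.random.default_rng(0)
ax.plot(np.arange(50), np.cumsum(rng.normal(size=50)))

# Stand-in for the image file: a 20x20 RGB array with an orange square.
logo = np.zeros((20, 20, 3))
logo[5:15, 5:15] = [1.0, 0.5, 0.0]

# [left, bottom, width, height] as fractions of the Figure.
logo_axes = fig.add_axes([0.05, 0.08, 0.10, 0.06])
logo_axes.imshow(logo)
logo_axes.axis('off')
```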
I'm not sure whether this answer will help, since I'm not sure what kind of image you want to add. I assume you want to add a corporate logo or something like that, so I did some research and found an answer on whether you can add a watermark to an mplfinance plot. I used that answer as a guide and added the icon used on stackoverflow.com to the graph. However, I could not add it to the axes, so I added it to the figure instead. I have changed the style to add the image.
import matplotlib.pyplot as plt
import mplfinance as mpf
img = plt.imread('./data/se-icon.png')
market_colors = mpf.make_marketcolors(
base_mpf_style="charles"
)
rc = {
"axes.labelcolor": "none",
"axes.spines.bottom": True,
"axes.spines.left": False,
"axes.spines.right": False,
"axes.spines.top": False,
"font.size": 10,
}
styles = mpf.make_mpf_style(
base_mpf_style="yahoo",# nightclouds
marketcolors=market_colors,
gridstyle="",
rc=rc
)
filledShape = {
"y1": df['Close'].values,
"facecolor": "#2279e4"
}
fig, axes = mpf.plot(df, type='line',
title='Test',
linecolor='white',
style=styles,
volume=True,
figsize=(8, 6),
figscale=0.5,
fill_between=filledShape,
tight_layout=True,
scale_padding={'left': 1, 'top': 5, 'right': 1, 'bottom': 2},
returnfig=True
)
#axes[0].imshow(img)
height = img.shape[0]  # image height in pixels (number of rows)
fig.figimage(img, 0, fig.bbox.ymax - height*1.5)
plt.show()
I have performed outlier detection on some entrance sensor data for a shopping mall. I want to create one plot for each entrance and highlight the observations that are outliers (which are marked by True in the outlier column in the dataframe).
Here is a small snippet of the data for two entrances and a time span of six days:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({"date": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
"mall": ["Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1"],
"entrance": ["West", "West","West","West","West", "West", "East", "East", "East", "East", "East", "East"],
"in": [132, 140, 163, 142, 133, 150, 240, 250, 233, 234, 2000, 222],
"outlier": [False, False, False, False, False, False, False, False, False, False, True, False]})
In order to create several plots (there are twenty entrances in the full data), I have come across lmplot in seaborn.
sns.set_theme(style="darkgrid")
for i, group in df.groupby('entrance'):
    sns.lmplot(x="date", y="in", data=group, fit_reg=False, hue = "entrance")
    #pseudo code
    #for the rows that have an outlier (outlier = True) create a red dot for that observation
plt.show()
There are two things I would like to accomplish here:
Lineplot instead of scatterplot. I have not been successful in using sns.lineplot to create separate plots for each entrance, as it seems lmplot is a better fit for this.
For each entrance plot, I would like to show which observations are outliers, preferably as red dots. I have written some pseudo code for this in my plotting attempt.
seaborn.lmplot returns a FacetGrid, which I think is more difficult to use in this case.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
for i, group in df.groupby('entrance'):
    # plot all the values as a lineplot
    sns.lineplot(x="date", y="in", data=group)
    # select the data when outlier is True and plot it
    data_t = group[group.outlier]
    sns.scatterplot(x="date", y="in", data=data_t, color='r')
    # add a title using the value from the groupby
    plt.title(f'Entrance: {i}')
    # show the plot here, not outside the loop
    plt.show()
Alternate option
This option will allow for setting the number of columns and rows of a figure
import math
# specify the number of columns to plot
ncols = 2
# determine the number of rows, even if there's an odd number of unique entrances
nrows = math.ceil(len(df.entrance.unique()) / ncols)
fig, axes = plt.subplots(ncols=ncols, nrows=nrows, figsize=(16, 16))
# extract the axes into an nx1 array, which is easier to index with idx.
axes = axes.ravel()
for idx, (i, group) in enumerate(df.groupby('entrance')):
    # plot all the values as a lineplot
    sns.lineplot(x="date", y="in", data=group, ax=axes[idx])
    # select the data when outlier is True and plot it
    data_t = group[group.outlier]
    sns.scatterplot(x="date", y="in", data=data_t, color='r', ax=axes[idx])
    axes[idx].set_title(f'Entrance: {i}')
I have recently been exploring Plotly, and I wonder if there is a way to share a plot and let the viewer switch between a logarithmic and a linear axis.
Any suggestions?
Plotly has a dropdown feature which allows the user to dynamically update the plot styling and/or the traces being displayed. Below is a minimal working example of a plot where the user can switch between a logarithmic and linear scale.
import plotly
import plotly.graph_objs as go
x = [1, 2, 3]
y = [1000, 10000, 100000]
y2 = [5000, 10000, 90000]
trace1 = go.Bar(x=x, y=y, name='trace1')
trace2 = go.Bar(x=x, y=y2, name='trace2', visible=False)
data = [trace1, trace2]
updatemenus = list([
dict(active=1,
buttons=list([
dict(label='Log Scale',
method='update',
args=[{'visible': [True, True]},
{'title': 'Log scale',
'yaxis': {'type': 'log'}}]),
dict(label='Linear Scale',
method='update',
args=[{'visible': [True, False]},
{'title': 'Linear scale',
'yaxis': {'type': 'linear'}}])
]),
)
])
layout = dict(updatemenus=updatemenus, title='Linear scale')
fig = go.Figure(data=data, layout=layout)
plotly.offline.iplot(fig)
I added two traces to the data list to show how traces can also be added or removed from a plot. This can be controlled by the visible list in updatemenus for each button.
I need to generate a scatter plot, where multiple categorical variables can be represented by color and shape of the scatters. For example,
df = pd.DataFrame({'animals':pd.Series(['tiger','cheetah','lion','giraffe','elephant','gorilla']),
'weight': pd.Series([200,120,240,240,400,300]),
'meal': pd.Series([20,10,40,15,40,30]),
'region': pd.Categorical(["Asian","American","African","African",'Asian','American']),
'gender': pd.Categorical(["female","female","male","female",'male','male']),
'group': pd.Series([True, False, False, True, True, True])})
sns.lmplot(data = df, x = 'weight', y = 'meal', hue = 'region', palette = 'Dark2',
           fit_reg = False, scatter_kws={"s": 20}, height = 4)
This will create a plot, where the region is represented by different colors.
But I also need to represent the gender, in different shapes. So the color shows the region and the shape shows the gender.
Is there a way to achieve this? thanks for any suggestion!
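You simply need to pass a list of markers in your call to sns.lmplot (the markers are assigned to the levels of the hue variable, here region):
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns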
df = pd.DataFrame({'animals':pd.Series(['tiger','cheetah','lion','giraffe','elephant','gorilla']),
'weight': pd.Series([200,120,240,240,400,300]),
'meal': pd.Series([20,10,40,15,40,30]),
'region': pd.Categorical(["Asian","American","African","African",'Asian','American']),
'gender': pd.Categorical(["female","female","male","female",'male','male']),
'group': pd.Series([True, False, False, True, True, True])})
sns.lmplot(data = df, x = 'weight', y = 'meal', hue = 'region', palette = 'Dark2',
           fit_reg = False, scatter_kws={"s": 20}, height = 4, markers=["o","+","x"])
plt.show()
Which gives:
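Since lmplot ties the markers to the hue levels, encoding region by color and gender by shape at the same time needs a different call. One option, sketched here as an alternative to the lmplot approach, is sns.scatterplot with both hue and style; the marker size s=60 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'animals': ['tiger', 'cheetah', 'lion', 'giraffe', 'elephant', 'gorilla'],
    'weight': [200, 120, 240, 240, 400, 300],
    'meal': [20, 10, 40, 15, 40, 30],
    'region': pd.Categorical(['Asian', 'American', 'African',
                              'African', 'Asian', 'American']),
    'gender': pd.Categorical(['female', 'female', 'male',
                              'female', 'male', 'male']),
})

# hue controls the color (region), style controls the marker shape (gender).
ax = sns.scatterplot(data=df, x='weight', y='meal',
                     hue='region', style='gender',
                     palette='Dark2', s=60)

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

The legend then lists the region colors and the gender markers as two separate groups.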