I'm looking into outlier detection. Brendan Gregg has a really nice article, and I'm especially intrigued by his visualizations. One of the methods he uses is frequency trails.
I'm trying to reproduce this in matplotlib using this example, which looks like this:
And the plot is based on this answer: https://stackoverflow.com/a/4152016/948369
Now my issue is that, as described by Brendan, I have a continuous line that masks the outliers (I simplified the input values so you can still see them):
Any help on making the line "non-continuous" where there are no values?
Seaborn also provides a very neat example, although they call it a joy/ridge plot:
https://seaborn.pydata.org/examples/kde_ridgeplot.html
#!/usr/bin/python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Create the data
rs = np.random.RandomState(1979)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)
# Draw the densities in a few steps
g.map(sns.kdeplot, "x", clip_on=False, shade=True, alpha=1, lw=1.5, bw=.2)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw=.2)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
I would stick with a flat 2D plot and displace each level by a set vertical amount. You'll have to play with the displacement between levels (called displace in the code below) to see the outliers properly, but this does a pretty good job of replicating your target image. The key, I think, is to set the "zero" values to None (which NumPy stores as NaN in a float array) so that pylab does not draw them.
import numpy as np
import pylab as plt
import itertools
k = 20
X = np.linspace(0, 20, 500)
Y = np.zeros((k,X.size))
# Add some fake data
MU = np.random.random(k)
for n in range(k):
    Y[n] += np.exp(-(X-MU[n]*n)**2 / (1+n/3))
Y *= 50
# Add some outliers for show
Y += 2*np.random.random(Y.shape)
displace = Y.max()/4
# Add a cutoff
Y[Y<1.0] = None
face_colors = itertools.cycle(["#D3D820", "#C9CC54",
"#D7DA66", "#FDFE42"])
fig = plt.figure()
ax = fig.add_subplot(111, facecolor='black')
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
for n, y in enumerate(Y):
    # Vertically displace each plot
    y0 = np.ones(y.shape) * n * displace
    y1 = y + n * displace
    plt.fill_between(X, y0, y1, lw=1,
                     facecolor=next(face_colors),
                     zorder=len(Y) - n)
plt.show()
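To isolate the masking idea from the rest of the plot, here is a minimal sketch. The helper name mask_below and the toy data are made up for illustration; the only behavior relied on is that matplotlib skips NaN samples, so both plot and fill_between break into separate pieces wherever the data was masked.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def mask_below(y, cutoff):
    """Return a float copy of y with values below cutoff replaced by NaN.

    Hypothetical helper, not part of matplotlib or NumPy.
    """
    y = np.asarray(y, dtype=float)
    return np.where(y < cutoff, np.nan, y)


# Toy data: one main bump and a small two-sample "outlier" further right
x = np.arange(11, dtype=float)
y = np.array([0, 0, 5, 8, 5, 0, 0, 0, 3, 3, 0], dtype=float)
y_masked = mask_below(y, cutoff=1.0)

fig, ax = plt.subplots()
ax.plot(x, y_masked, color="k")          # the line is broken at NaN samples
poly = ax.fill_between(x, 0, y_masked)   # the fill is split the same way
```

One thing to check with real data: a single finite sample surrounded by NaN has no width, so it may not be visible unless it is drawn with a marker or the cutoff is lowered.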
When I create a plot with many curves, it would be convenient to label each curve at its right end, where the line stops.
plt.legend produces too many similar colors, and the legend overlaps the plot.
As the example below shows, plt.legend is not very effective here:
import numpy as np
from matplotlib import pyplot as plt
n = 10
x = np.linspace(0, 1, n)
for i in range(n):
    y = np.linspace(x[i], x[i], n)
    plt.plot(x, y, label=str(i))
plt.legend(loc='upper right')
plt.show()
If possible I would like to have something similar to this plot:
or this:
I would recommend the answer suggested in the comments, but another method that gives something similar to your first option (albeit without the exact placement of the legend markers matching the positions of the associated lines) is:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
n=10
x = np.linspace(0, 1, n)
labels = [str(i) for i in range(len(x))]
for i in range(n):
    y = np.linspace(x[i], x[i], n)
    ax.plot(x, y, label=labels[i])
h, _ = ax.get_legend_handles_labels()
# sort the legend handles/labels so they are in the same order as the data
hls = sorted(zip(x, h, labels), reverse=True)
ax.legend(
    [ha[1] for ha in hls],  # get handles
    [la[2] for la in hls],  # get labels
    bbox_to_anchor=(1.04, 0, 0.1, 1),  # set box outside of axes
    loc="lower left",
    labelspacing=1.6,  # add space between labels
)
leg = ax.get_legend()
# expand the border of the legend
fontsize = fig.canvas.get_renderer().points_to_pixels(leg._fontsize)
pad = 2 * (leg.borderaxespad + leg.borderpad) * fontsize
leg._legend_box.set_height(leg.get_bbox_to_anchor().height - pad)
This is heavily reliant on the answers here and here.
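For comparison, a different technique from the legend-based one above is to skip the legend and annotate each curve at its last point. This is only a sketch on the question's toy data; the 5-point offset and the variable name texts are arbitrary choices, and labels will overlap if curves end close together.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
n = 10
x = np.linspace(0, 1, n)
texts = []
for i in range(n):
    y = np.full(n, x[i])
    (line,) = ax.plot(x, y)
    # Put the label just right of the curve's last point, in the curve's color
    t = ax.annotate(
        str(i),
        xy=(x[-1], y[-1]),
        xytext=(5, 0),              # shift 5 points to the right
        textcoords="offset points",
        color=line.get_color(),
        va="center",
    )
    texts.append(t)
```

Because the labels are positioned in data coordinates, they stay attached to the line ends when the axes limits change.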
Please forgive the crude explanation, but I'm unsure how to describe the issue, and as they say, a picture is worth a thousand words. What I am trying to achieve is a graph in matplotlib that looks like the one below:
where the color scale is the same across all bars and spans the limits of the x-axis.
The closest I have got so far is this (please ignore the fact that it's not horizontal; I was planning to change that once I had figured out the coloring):
fig, ax = plt.subplots()
mpl.pyplot.viridis()
bars = ax.bar(df['Profile'], df['noise_result'])
grad = np.atleast_2d(np.linspace(0,1,256)).T
ax = bars[0].axes
lim = ax.get_xlim()+ax.get_ylim()
for bar in bars:
    bar.set_zorder(1)
    bar.set_facecolor('none')
    x, y = bar.get_xy()
    w, h = bar.get_width(), bar.get_height()
    ax.imshow(grad, extent=[x,x+w,y,y+h], aspect='auto', zorder=1,interpolation='nearest')
ax.axis(lim)
which only results in a graph like below:
Many thanks
I'm going along with your approach. The idea is to:
1. choose an appropriate colormap;
2. create a normalizer for the bar values;
3. create a mappable that maps the normalized values to the colormap, so that a colorbar can be created from it.
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from matplotlib.colors import Normalize
import pandas as pd
import numpy as np
df = pd.DataFrame({'key':['A', 'B', 'C', 'D', 'E'], 'val':[100, 20, 70, 40, 100]})
# create a normalizer
norm = Normalize(vmin=df['val'].min(), vmax=df['val'].max())
# choose a colormap
cmap = cm.plasma
# map values to a colorbar
mappable = cm.ScalarMappable(norm=norm, cmap=cmap)
mappable.set_array(df['val'])
fig, ax = plt.subplots()
bars = ax.bar(df['key'], df['val'])
ax = bars[0].axes
lim = ax.get_xlim()+ax.get_ylim()
for bar, val in zip(bars, df['val']):
    grad = np.atleast_2d(np.linspace(0, val, 256)).T
    bar.set_zorder(1)
    bar.set_facecolor('none')
    x, y = bar.get_xy()
    w, h = bar.get_width(), bar.get_height()
    ax.imshow(np.flip(grad), extent=[x, x+w, y, y+h], aspect='auto', zorder=1,
              interpolation='nearest', cmap=cmap, norm=norm)
ax.axis(lim)
cb = fig.colorbar(mappable)
cb.set_label("Values")
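To see what the normalizer and colormap do to individual values, the short sketch below applies them by hand to the same numbers as in the DataFrame above. The variable names scaled and colors are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

vals = [100, 20, 70, 40, 100]
norm = Normalize(vmin=min(vals), vmax=max(vals))
cmap = plt.get_cmap("plasma")

# Normalize maps vmin to 0 and vmax to 1, linearly in between
scaled = [float(norm(v)) for v in vals]
# A colormap takes a number in [0, 1] and returns an (r, g, b, a) tuple
colors = [cmap(s) for s in scaled]
```

Passing the same norm and cmap to every imshow call is what makes equal values get equal colors across bars.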
Using what you have, you could change line 12 to:
ax.imshow(grad, extent=[x,x+w,y,y+h], aspect='auto', zorder=1, cmap = plt.get_cmap('gist_heat_r'))
or some other color map from:
https://matplotlib.org/stable/tutorials/colors/colormaps.html
You could also change line 3 to start as:
bars = ax.barh
for horizontal bars.
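Putting the horizontal bars and a shared color scale together might look like the sketch below. The data is made up, and starting the scale at 0 (vmin=0) is an assumption about the desired picture rather than something stated in the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

keys = ["A", "B", "C"]      # made-up categories
vals = [100, 20, 70]        # made-up bar lengths
norm = Normalize(vmin=0, vmax=max(vals))  # one scale shared by all bars
cmap = plt.get_cmap("gist_heat_r")

fig, ax = plt.subplots()
bars = ax.barh(keys, vals)
lim = ax.get_xlim() + ax.get_ylim()  # remember limits; imshow changes them
images = []
for bar, val in zip(bars, vals):
    bar.set_facecolor("none")
    x, y = bar.get_xy()
    w, h = bar.get_width(), bar.get_height()
    # A single row of values, so the gradient runs left to right from 0 to val
    grad = np.atleast_2d(np.linspace(0, val, 256))
    im = ax.imshow(grad, extent=[x, x + w, y, y + h], aspect="auto",
                   cmap=cmap, norm=norm, zorder=0)
    images.append(im)
ax.axis(lim)  # restore the limits chosen by barh
```

Each bar only shows the part of the colormap up to its own value, so a short bar ends in the same color that a long bar has at that x position.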
When I run the following lines, I get a plot with a large space at the top and the bottom with no bars.
How can I remove this extra space?
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
from matplotlib.transforms import Affine2D
random.seed(1)
df = pd.DataFrame(np.random.randn(50, 1), columns=["parameter"])
df["standard_error"]= ((df.parameter**2)**0.5)/2
name = "plot"
x = ["A"+str(x) for x in df.index.tolist()]
y1 = df.parameter
yerr1 = df.standard_error
fig, ax = plt.subplots()
fig.set_figheight(len(x))
plt.rc('axes', labelsize=22)
plt.grid(True, which='major', color='#666666', linestyle='-', alpha=0.2)
trans1 = Affine2D().translate(-0.1, 0.0) + ax.transData
trans2 = Affine2D().translate(+0.1, 0.0) + ax.transData
er1 = ax.errorbar(y1, x, xerr=yerr1, marker="o", linestyle="none", transform=trans1)
ax.axvline(x=0, color="black")
plt.savefig(name + '.png', bbox_inches='tight')
If you mean the extra space below and above your smallest and largest data points along the y-axis, then you can simply use plt.ylim, e.g.:
plt.ylim(0, 50)
This changes the extent of the y-axis to the range 0 to 50. Similarly, there is plt.xlim for the x-axis.
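Since the y-axis in the question holds string categories, matplotlib places them at the integer positions 0, 1, ..., N-1, so limits half a step beyond the first and last category fit the data tightly. The sketch below is a cut-down version of the question's plot under that assumption; the variable names and the figure size are arbitrary.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

rng = np.random.RandomState(1)
values = rng.randn(50)
errors = np.abs(values) / 2
names = ["A" + str(i) for i in range(len(values))]

fig, ax = plt.subplots(figsize=(6.4, 12))
ax.errorbar(values, names, xerr=errors, marker="o", linestyle="none")
# Categories sit at 0..N-1, so this leaves half a step of padding at each end
ax.set_ylim(-0.5, len(names) - 0.5)
```

A possible alternative is ax.margins(y=0.01), which shrinks the default padding instead of setting explicit limits.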
I want to add a y-axis label to a density ridgeline plot using seaborn in python. To make the ridgeline plot, I am following the code from the seaborn gallery. For convenience, I have copied their code snippet below. How should I modify this to label the y-axis in a manner that does not overlap the density curves?
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Create the data
rs = np.random.RandomState(1979)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)
# Draw the densities in a few steps
g.map(sns.kdeplot, "x", clip_on=False, shade=True, alpha=1, lw=1.5, bw=.2)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw=.2)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
To be clear, I'd like the plot to look something like:
You can add a general label as figure-level text with the following line: g.fig.text(0.04, 0.5, 'Y axis label', va='center', rotation='vertical'). You will obtain the following:
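Since g.fig is an ordinary matplotlib Figure, the same call can be tried without seaborn. In the sketch below, a plain stack of subplots stands in for the FacetGrid, and the left margin value of 0.15 is an arbitrary choice to leave room for the label.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Stand-in for the FacetGrid: three stacked rows sharing the x-axis
fig, axes = plt.subplots(nrows=3, sharex=True)
fig.subplots_adjust(left=0.15)  # make room on the left for the label

# Coordinates are figure fractions: 4% from the left, vertically centered
label = fig.text(0.04, 0.5, "Y axis label", va="center", rotation="vertical")
```

Because the text is placed in figure coordinates rather than on any one axes, it does not move when the rows are made to overlap.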
I think replacing g.fig.subplots_adjust(hspace=-.25) with g.fig.subplots_adjust(hspace=.1) should do the trick.
I would like to add a sample y-axis tick on the right side of the ridge plot, to show the range of values shared by all the plots. Preferably, I would like to add it to only one of the subplots, not to all of them.
My plot is based on the seaborn 'ridge plot' example at: https://seaborn.pydata.org/examples/kde_ridgeplot.html
I've tried the following code with no luck:
g.set(yticks=[0,200])
g.set_y_label_position("right")
g.set_ylabels('[Range]',fontsize=9,fontweight="normal")
If you want to modify one particular Axes in a FacetGrid, you can get a reference to it from the array g.axes.
Here is how I would go about it:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Create the data
rs = np.random.RandomState(1979)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)
# Draw the densities in a few steps
g.map(sns.kdeplot, "x", clip_on=False, shade=True, alpha=1, lw=1.5, bw=.2)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw=.2)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
#
# Changes from seaborn example below this point
#
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25, right=0.9)
# Remove axes details that don't play well with overlap
g.set_titles("")
#g.set(yticks=[])
g.despine(bottom=True, left=True, right=False, top=True, offset=5)
for ax in g.axes.ravel():
    # can use .is_last_row() to show the spine on the bottom plot instead
    # (older matplotlib versions also offered ax.is_first_row() directly)
    if ax.get_subplotspec().is_first_row():
        ax.yaxis.tick_right()
        ax.yaxis.set_label_position("right")
        ax.set_ylabel("MW")
    else:
        ax.spines['right'].set_visible(False)
        # necessary because y-axes are shared
        for l in ax.get_yticklabels():
            l.set_visible(False)
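The per-axes logic can be tried without seaborn. The sketch below uses a plain column of shared-axis subplots as a stand-in for g.axes, with made-up data, and hides the ticks on the other rows with tick_params rather than by hiding the tick labels one by one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Stand-in for g.axes: four stacked rows with shared x and y axes
fig, axes = plt.subplots(nrows=4, sharex=True, sharey=True)
for ax in axes:
    ax.plot([0, 1, 2], [0, 0.4, 0])  # made-up curve
    if ax.get_subplotspec().is_first_row():
        # Only the top row shows a y scale, moved to the right-hand side
        ax.yaxis.tick_right()
        ax.yaxis.set_label_position("right")
        ax.set_ylabel("MW")
    else:
        # Tick visibility is set per axes even though the y-axes are shared
        ax.tick_params(axis="y", left=False, right=False,
                       labelleft=False, labelright=False)
```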