Formatting style for matplotlib: scatterplot histogram hybrid - python

In an old standalone plotting package (sm) there was a style available for scatter plots which I found more appealing than the default style. Each point is drawn almost like a histogram bar that stretches to the next point.
An example of a scatter plot using this style:
Matplotlib does have this style for histograms, but I'm wondering if there's a way to cheat the system to allow the style to work for scatter plots.

I think some of the confusion comes from the fact that the desired plot is not a scatter plot. It's a line plot whose line has the form of a step function.
You may plot step functions with pyplot.step(x,y) or plot(x,y, drawstyle="steps").
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
x = np.linspace(0,1)
y = np.random.rand(len(x))
fig, ax = plt.subplots()
ax.step(x,y)
# or
# ax.plot(x,y, drawstyle="steps")
plt.show()
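If the goal is for each step to be centred on its data point, as in a histogram, the where argument of step may be closer to the sm look. A minimal sketch, assuming that centred steps are what the original style drew (the marker overlay is only there to show where the points sit):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 1, 20)
y = rng.random(len(x))

fig, ax = plt.subplots()
# where="mid" places each horizontal segment so that it is centred on its point
(line,) = ax.step(x, y, where="mid")
# Overlay the raw points to make the centring visible
ax.plot(x, y, "o", color="k", markersize=3)
```

The other accepted values are where="pre" (the default) and where="post", which put the jump before or after each point.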

Related

Set axis limits across faceted plot

How can I fix the x-axis on each of the plots in the following situation? Using xlim only affects the second plot axis, not both.
import pandas as pd
import matplotlib.pyplot as plt
sample = pd.DataFrame({'mean':[1,2,3,4,5], 'median':[10,20,30,40,50]})
sample.hist()
plt.xlim(0, 100)
Bonus, what is the correct pandas terminology for the two plots here? Subplots? Facets?
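The correct terminology would be subplots or axes, since hist returns the matplotlib Axes instances. plt.xlim only acts on the current axes (the last one created), so set the limits on each of them:
axes = sample.hist()
for ax in axes.ravel():
    ax.set_xlim(0, 100)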
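An alternative to looping is to let the subplots share their x-axis, so that setting the limits on one of them applies to all. The sketch below uses plt.subplots directly with the same numbers as the question; DataFrame.hist also accepts a sharex keyword, which should give the same effect there.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

# Two subplots that share one x-axis
fig, axes = plt.subplots(1, 2, sharex=True)
axes[0].hist([1, 2, 3, 4, 5])
axes[1].hist([10, 20, 30, 40, 50])

# Setting the limits on one shared axes propagates to the other
axes[0].set_xlim(0, 100)
limits = [ax.get_xlim() for ax in axes]
```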

How to use a colored shape as yticks in matplotlib or seaborn?

I am working on a task called knowledge tracing which estimates the student mastery level over time. I would like to plot a similar figure as below using the Matplotlib or Seaborn.
It uses different colors, instead of text, to represent each knowledge concept. However, I have searched and found no article that explains how to do this.
I tried the following
import matplotlib.markers
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# simulate a record of student mastery level
student_mastery = np.random.rand(5, 30)
df = pd.DataFrame(student_mastery)
# plot the heatmap using seaborn
marker = matplotlib.markers.MarkerStyle(marker='o', fillstyle='full')
sns_plot = sns.heatmap(df, cmap="RdYlGn", vmin=0.0, vmax=1.0)
y_limit = 5
y_labels = [marker for i in range(y_limit)]
plt.yticks(range(y_limit), y_labels)
Yet it simply returns the __repr__ of the marker, e.g., <matplotlib.markers.MarkerStyle at 0x1c5bb07860> on the yticks.
Thanks in advance!
While "How can I make the xtick labels of a plot be simple drawings using matplotlib?" gives you a general solution for arbitrary shapes, for the shapes shown here it may make sense to use unicode symbols as text and colorize them according to your needs.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
fig, ax = plt.subplots()
ax.imshow(np.random.rand(3,10), cmap="Greys")
symbolsx = ["⚪", "⚪", "⚫", "⚫", "⚪", "⚫","⚪", "⚫", "⚫","⚪"]
colorsx = np.random.choice(["#3ba1ab", "#b43232", "#8ecc3a", "#893bab"], 10)
ax.set_xticks(range(len(symbolsx)))
ax.set_xticklabels(symbolsx, size=40)
for tick, color in zip(ax.get_xticklabels(), colorsx):
    tick.set_color(color)
symbolsy = ["◾", "◾", "◾"]
ax.set_yticks(range(len(symbolsy)))
ax.set_yticklabels(symbolsy, size=40)
for tick, color in zip(ax.get_yticklabels(), ["crimson", "gold", "indigo"]):
    tick.set_color(color)
plt.show()
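Applied to the 5 x 30 mastery matrix from the question, the same idea might look like the sketch below. It uses imshow rather than seaborn to stay dependency-free, and the colour chosen for each concept is a made-up placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
student_mastery = rng.random((5, 30))  # 5 concepts x 30 time steps, as in the question

# Hypothetical colour per knowledge concept (any valid matplotlib colours work)
concept_colors = ["crimson", "gold", "indigo", "teal", "darkorange"]

fig, ax = plt.subplots(figsize=(10, 3))
im = ax.imshow(student_mastery, cmap="RdYlGn", vmin=0.0, vmax=1.0, aspect="auto")
fig.colorbar(im, ax=ax)

# One filled-circle glyph per row, each coloured for its concept
ax.set_yticks(range(len(concept_colors)))
ax.set_yticklabels(["●"] * len(concept_colors), size=20)
for tick, color in zip(ax.get_yticklabels(), concept_colors):
    tick.set_color(color)
```

Whether the glyph renders depends on the font in use; the default DejaVu Sans includes the basic geometric shapes.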

How to insert a triangle/contour plot generated with GetDist into a Matplotlib subplot?

I'm doing some analysis on MCMC samples and I'm using the GetDist python package to create my contour plots. However the contour plots are only a part of the whole analysis, and I would like to show some other plots along with the contour plot in the same figure.
I'm using matplotlib to generate all my other plots, so my question is: is there any way to have a GetDist plot in a matplotlib subplot, so that I have a matplotlib figure with multiple plots and a GetDist plot in it?
I'm using GridSpec to split the figure in subplots (and also to split subplots in subsubplots).
I tried to set a particular subplot as the current axis before creating the triangle plot, and I also tried to look at the GetDist source code to find a way to pass the wanted subplot as an argument to GetDist, but with no luck.
Right now, my code looks something like this
import matplotlib.pyplot as plt
import getdist.plots
from matplotlib import gridspec
import numpy as np
import numpy.random
N = 4
# Generate mock random data
chain = np.array([numpy.random.normal(loc=(i+0.5), scale=(i+0.5), size=100000) for i in range(N)])
Names = [chr(i+97) for i in range(N)]
Labels = [chr(i+97) for i in range(N)]
Sample = getdist.MCSamples(samples=chain.T, names=Names, labels=Labels)
#Set up plot layout
fig = plt.figure(figsize=(10.5,12.5))
gs = gridspec.GridSpec(3, 2, width_ratios=[2,3], height_ratios=[10,0.5,4], wspace=0.2, hspace=.05)
Lplot = gridspec.GridSpecFromSubplotSpec(N, 1, subplot_spec=gs[0,0], hspace=0.)
Rplot = gridspec.GridSpecFromSubplotSpec(2,1,subplot_spec=gs[0,1], height_ratios=[4,1.265])
Dplot = plt.subplot(gs[2,0:2])
axL = [plt.subplot(Lplot[i]) for i in range(N)]
axR = plt.subplot(Rplot[1])
GD = plt.subplot(Rplot[0])
for i in axL+[axR]+[GD]+[Dplot]: i.set_xticks([]); i.set_yticks([])
plt.sca(GD)
# Generate triangle plot
g = getdist.plots.getSubplotPlotter()
g.triangle_plot(Sample, filled=True)
plt.savefig("outfile.pdf", bbox_inches='tight')
I would like to have my contour plot in the "GD" subplot.
Any help?

Cumulative probability plots in Matplotlib

How would I make a plot of this style in python with matplotlib? (Cumulative probability plot) I don't need complete code, mostly just need a place to start and a general idea of what I need to do for it.
A cumulative probability plot is really easy to make:
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
fig,ax = plt.subplots()
ax.plot(np.sort(data),np.linspace(0.0,1.0,len(data)))
plt.xlabel(r'$x$')
plt.ylabel(r'$P(X \leq x)$')
plt.show()
Note that it can have a strong advantage over a probability density plot as it does not require binning of your data. (Should you be looking for the latter you can check this code).
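For reference, the usual empirical CDF assigns the i-th smallest of n values the probability i/n, which differs slightly from the linspace(0, 1, n) spacing used above (that one starts at 0). A small sketch of that variant; the helper name ecdf is made up for this example:

```python
import numpy as np

def ecdf(data):
    """Return sorted values and their empirical cumulative probabilities i/n."""
    xs = np.sort(np.asarray(data, dtype=float))
    n = len(xs)
    ps = np.arange(1, n + 1) / n  # the i-th smallest value gets probability i/n
    return xs, ps

xs, ps = ecdf([3, 1, 2, 4])
```

The result can be drawn with ax.step(xs, ps, where="post") to get the staircase shape of an empirical CDF.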

matplotlib: plotting histogram plot just above scatter plot

I would like to make beautiful scatter plots with histograms above and right of the scatter plot, as it is possible in seaborn with jointplot:
I am looking for suggestions on how to achieve this. In fact I am having some trouble installing pandas, and I also do not need the entire seaborn module.
I encountered the same problem today. Additionally I wanted a CDF for the marginals.
Code:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np
x = np.random.beta(2,5,size=int(1e4))
y = np.random.randn(int(1e4))
fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(3, 3)
ax_main = plt.subplot(gs[1:3, :2])
ax_xDist = plt.subplot(gs[0, :2],sharex=ax_main)
ax_yDist = plt.subplot(gs[1:3, 2],sharey=ax_main)
ax_main.scatter(x,y,marker='.')
ax_main.set(xlabel="x data", ylabel="y data")
ax_xDist.hist(x,bins=100,align='mid')
ax_xDist.set(ylabel='count')
ax_xCumDist = ax_xDist.twinx()
ax_xCumDist.hist(x,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid')
ax_xCumDist.tick_params('y', colors='r')
ax_xCumDist.set_ylabel('cumulative',color='r')
ax_yDist.hist(y,bins=100,orientation='horizontal',align='mid')
ax_yDist.set(xlabel='count')
ax_yCumDist = ax_yDist.twiny()
ax_yCumDist.hist(y,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid',orientation='horizontal')
ax_yCumDist.tick_params('x', colors='r')
ax_yCumDist.set_xlabel('cumulative',color='r')
plt.show()
Hope it helps the next person searching for a scatter plot with marginal distributions.
Here's an example of how to do it, using gridspec.GridSpec:
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
x = np.random.rand(50)
y = np.random.rand(50)
fig = plt.figure()
gs = GridSpec(4,4)
ax_joint = fig.add_subplot(gs[1:4,0:3])
ax_marg_x = fig.add_subplot(gs[0,0:3])
ax_marg_y = fig.add_subplot(gs[1:4,3])
ax_joint.scatter(x,y)
ax_marg_x.hist(x)
ax_marg_y.hist(y,orientation="horizontal")
# Turn off tick labels on marginals
plt.setp(ax_marg_x.get_xticklabels(), visible=False)
plt.setp(ax_marg_y.get_yticklabels(), visible=False)
# Set labels on joint
ax_joint.set_xlabel('Joint x label')
ax_joint.set_ylabel('Joint y label')
# Set labels on marginals
ax_marg_y.set_xlabel('Marginal x label')
ax_marg_x.set_ylabel('Marginal y label')
plt.show()
I strongly recommend flipping the right histogram by adding these three lines of code to the current best answer, before plt.show():
ax_yDist.invert_xaxis()
ax_yDist.yaxis.tick_right()
ax_yCumDist.invert_xaxis()
The advantage is that anyone looking at the figure can easily compare the two histograms by mentally moving the right histogram and rotating it clockwise.
In contrast, in the plot in the question and in all other answers, if you want to compare the two histograms, your first reaction is to rotate the right histogram counterclockwise, which leads to wrong conclusions because the y axis gets inverted. Indeed, the right CDF of the current best answer looks decreasing at first sight.
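A self-contained sketch of the flipped layout, for anyone who wants to try it without assembling the pieces from the answers above. It follows the GridSpec arrangement of the second answer with shared axes added; the variable names and the random data are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np

rng = np.random.default_rng(1)
x = rng.beta(2, 5, size=2000)
y = rng.standard_normal(2000)

fig = plt.figure(figsize=(6, 6))
gs = GridSpec(4, 4)
ax_joint = fig.add_subplot(gs[1:4, 0:3])
ax_marg_x = fig.add_subplot(gs[0, 0:3], sharex=ax_joint)
ax_marg_y = fig.add_subplot(gs[1:4, 3], sharey=ax_joint)

ax_joint.scatter(x, y, marker=".")
ax_marg_x.hist(x, bins=50)
ax_marg_y.hist(y, bins=50, orientation="horizontal")

# Flip the right-hand marginal: zero is now at the right edge and the
# bars grow leftward, toward the joint plot
ax_marg_y.invert_xaxis()
ax_marg_y.yaxis.tick_right()
```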
