Getting legend in seaborn jointplot - python

I'm interested in using the seaborn joint plot for visualizing correlation between two numpy arrays. I like the visual distinction that the kind='hex' parameter gives, but I would also like to know the actual count that different shades correspond to. Does anyone know how to put this legend on the side or even on the plot? I tried looking at the documentation and couldn't find it.
Thanks!

EDIT: updated to work with new Seaborn ver.
You need to do it manually: make a new axes with add_axes, then pass it to plt.colorbar() through the cax argument.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
x = np.random.normal(0.0, 1.0, 1000)
y = np.random.normal(0.0, 1.0, 1000)
hexplot = sns.jointplot(x=x, y=y, kind="hex")
plt.subplots_adjust(left=0.2, right=0.8, top=0.8, bottom=0.2) # shrink fig so cbar is visible
# make new ax object for the cbar
cbar_ax = hexplot.fig.add_axes([.85, .25, .05, .4]) # x, y, width, height
plt.colorbar(cax=cbar_ax)
plt.show()
Sources: I almost gave up after I read a dev say that the
"work/benefit ratio [to implement colorbars] is too high"
but then I eventually found this solution in another issue.
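For reference, the colorbar can also be handed the hexbin artist explicitly instead of relying on pyplot's notion of the current image. The sketch below uses plain matplotlib's hexbin as a stand-in for the joint axes (an assumption, so that it runs without seaborn); with a jointplot, the corresponding artist should be the PolyCollection found in hexplot.ax_joint.collections.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
y = rng.normal(0.0, 1.0, 1000)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=20)  # PolyCollection; its array holds the per-hexagon counts
fig.subplots_adjust(right=0.8)     # leave room on the right for the colorbar
cbar_ax = fig.add_axes([0.85, 0.25, 0.05, 0.4])  # x, y, width, height (figure fractions)
cbar = fig.colorbar(hb, cax=cbar_ax)  # pass the mappable explicitly
cbar.set_label("count")
counts = hb.get_array()
```

Passing the mappable explicitly avoids surprises when several axes in the figure hold images or collections.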

The following has worked for me:
t1 = sns.jointplot(data=df, x="originalestimate_hours", y="working_hours_per_day_created_target", hue="status")
t1.ax_joint.legend_.set_visible(False)
t1.fig.legend(bbox_to_anchor=(1, 1), loc=2)

Related

How to draw a 3D grid using matplotlib based on three columns of data?

I'm facing a problem with making a 3D plot. I want to build a 3D surface plot like below from three columns of data.
Expected graphic case
I have implemented a few currently, as shown below.
Current picture case
But I still don't know how to make it "grid" like the first picture? Does anyone know how to achieve this? Part of the code and full data are as follows.
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import os
import warnings
from mpl_toolkits.mplot3d import Axes3D
warnings.filterwarnings('ignore')
os.chdir(r"E:\SoftwareFile\stataFile")
matplotlib.use('TkAgg')
plt.figure(figsize=(10,6))
data = pd.read_stata(r"E:\SoftwareFile\stataFile\demo.dta")
ax = plt.axes(projection="3d")
ax.plot_trisurf(data["age"], data["weight"], data["pr_highbp"],
                cmap=plt.cm.Spectral_r)
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))
ax.set_title("Probability of Hypertension by Age and Weight")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.zaxis.set_rotate_label(False)
ax.set_zlabel("Probability of Hypertension", rotation=90)
ax.view_init(elev=30, azim=240)
plt.savefig("demo.png", dpi=1200)
Sincerely appreciate your help
Remove the colormap and make the surface fully transparent in the plot_trisurf call, like so:
ax.plot_trisurf(
    data["age"],
    data["weight"],
    data["pr_highbp"],
    color=None,
    linewidth=1,
    antialiased=True,
    edgecolor="Black",
    alpha=0,
)
That should result in:
You could also take a look at plot_wireframe(). For that I think you have to start with
x = data["age"].to_list()
y = data["weight"].to_list()
X, Y = np.meshgrid(x, y)
But I'm not sure how to create the z coordinate. It seems you may need interpolation from what I read.
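One way the missing z coordinate might be built is by interpolating the scattered points onto a regular grid. The sketch below uses matplotlib's own LinearTriInterpolator; the data are synthetic stand-ins for the age, weight and pr_highbp columns (an assumption, since the real file is not available), and the grid size is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.tri as mtri
import numpy as np

rng = np.random.default_rng(0)
# Synthetic scattered data standing in for data["age"], data["weight"], data["pr_highbp"]
corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
pts = np.vstack([corners, rng.random((200, 2))])
x, y = pts[:, 0], pts[:, 1]
z = x * y

# Triangulate the scattered points and interpolate linearly within each triangle
triang = mtri.Triangulation(x, y)
interp = mtri.LinearTriInterpolator(triang, z)

# Regular grid kept inside the convex hull of the data
gx = np.linspace(0.05, 0.95, 20)
gy = np.linspace(0.05, 0.95, 20)
X, Y = np.meshgrid(gx, gy)
Z = np.ma.filled(interp(X, Y), np.nan)  # points outside the hull would become NaN

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_wireframe(X, Y, Z, color="black", linewidth=0.5)
```

Note that the meshgrid is built from evenly spaced values over the data range, not from the raw column values, so the grid stays small and regular.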

Creating a histogram of Yearly Returns

I am trying to finish a task for a project: creating a histogram of the yearly returns of the Dow Jones from historical data. I have uploaded a picture of the task and my progress below. The problem I have at this point is that I can't find a way to separate the years in the histogram as shown in the task, and I don't know how to modify the y-axis and the legend to show the information that is shown in the first picture.
Any help is appreciated
What I am trying to make and My progress so far
Here is my code:
# Importing packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
#setting the order
order=[-60,-50,-40,-30,-20,-10,
0,10,20,30,40,50,60,70]
#getting the data
dow_jones_returns = pd.read_csv('data/dow-jones-by-year-historical-annual-returns (2).csv')
dow_jones=pd.DataFrame(data=dow_jones_returns)
dow_jones['date']=pd.to_datetime(dow_jones['date'])
dow_jones['date']=pd.DatetimeIndex(dow_jones['date']).year
dow_jones['value'] = pd.to_numeric(dow_jones['value'])  # keep the converted column
up_to_2019=dow_jones.iloc[0:99]
lastyear= dow_jones.iloc[-1]
#plotting the histogram
fig = plt.figure()
up_to_2019['value'].plot.hist(bins = order)
plt.show()
Here are some further directions.
Regarding the text box:
The text box looks like it contains the summary statistics from DataFrame.describe() plus a few additional ones. You can create a text box by combining plt.subplots() and Axes.text().
I found this guide to be very useful for creating a text box in a plot.
Since we don't have the data, here is some pseudocode:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
textstr = str(up_to_2019['value'].describe())
ax.hist(up_to_2019['value'], bins = order)
# these are matplotlib.patch.Patch properties
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
# place a text box in upper left in axes coords
ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=10,
verticalalignment='top', bbox=props)
plt.show()
Regarding the y-axis:
1) Here is how you set the right label: plt.ylabel("Number of Observations\n(Probability in %)")
2) Then add the ticks: plt.yticks(np.arange(1,27))
Regarding the labels inside the bins:
That's rather tricky. One option, though definitely not advised, would be to also add the labels via the .text() method. I don't know if it helps, but here is how you do this in R.
These two links might also be helpful:
how-to-add-a-text-into-a-rectangle
Change color for the patches in a hist
Apparently plt.hist() has three return values, one of which is called patches. You can iterate over patches and, for example, change their color (see the link above); however, I couldn't figure out how to attach text to them.
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
x = [21,22,23,4,5,6,77,8,9,10,31,32,33,34,35,36,37,18,49,50,100]
num_bins = 5
n, bins, patches = plt.hist(x, num_bins, facecolor='blue', alpha=0.5)
for i, pat in enumerate(patches):
    pat.set_test("Test")  # this doesn't work, sadly
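The patches are Rectangle objects, so one possible workaround is to read each rectangle's position and size and place a text label with Axes.text at its center. This is only a sketch on the toy data above; the label content (here the bin count) is an assumption, and for the task in the question it would be replaced by the years falling in each bin.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

x = [21, 22, 23, 4, 5, 6, 77, 8, 9, 10, 31, 32, 33, 34, 35, 36, 37, 18, 49, 50, 100]

fig, ax = plt.subplots()
n, bins, patches = ax.hist(x, bins=5, facecolor="blue", alpha=0.5)

labels = []
for count, pat in zip(n, patches):
    # Center of the bar in data coordinates
    cx = pat.get_x() + pat.get_width() / 2
    cy = pat.get_height() / 2
    labels.append(ax.text(cx, cy, f"{int(count)}", ha="center", va="center"))
```

Because the text is positioned in data coordinates, it moves with the bars if the axis limits change.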

How to use a colored shape as yticks in matplotlib or seaborn?

I am working on a task called knowledge tracing which estimates the student mastery level over time. I would like to plot a similar figure as below using the Matplotlib or Seaborn.
It uses different colors, instead of text, to represent each knowledge concept. However, I have searched and found no article that explains how to do this.
I tried the following
import numpy as np
import pandas as pd
import matplotlib.markers
import matplotlib.pyplot as plt
import seaborn as sns
# simulate a record of student mastery level
student_mastery = np.random.rand(5, 30)
df = pd.DataFrame(student_mastery)
# plot the heatmap using seaborn
marker = matplotlib.markers.MarkerStyle(marker='o', fillstyle='full')
sns_plot = sns.heatmap(df, cmap="RdYlGn", vmin=0.0, vmax=1.0)
y_limit = 5
y_labels = [marker for i in range(y_limit)]
plt.yticks(range(y_limit), y_labels)
Yet it simply returns the __repr__ of the marker, e.g., <matplotlib.markers.MarkerStyle at 0x1c5bb07860> on the yticks.
Thanks in advance!
While "How can I make the xtick labels of a plot be simple drawings using matplotlib?" gives you a general solution for arbitrary shapes, for the shapes shown here it may make sense to use Unicode symbols as text and colorize them according to your needs.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
fig, ax = plt.subplots()
ax.imshow(np.random.rand(3,10), cmap="Greys")
symbolsx = ["⚪", "⚪", "⚫", "⚫", "⚪", "⚫","⚪", "⚫", "⚫","⚪"]
colorsx = np.random.choice(["#3ba1ab", "#b43232", "#8ecc3a", "#893bab"], 10)
ax.set_xticks(range(len(symbolsx)))
ax.set_xticklabels(symbolsx, size=40)
for tick, color in zip(ax.get_xticklabels(), colorsx):
    tick.set_color(color)
symbolsy = ["◾", "◾", "◾"]
ax.set_yticks(range(len(symbolsy)))
ax.set_yticklabels(symbolsy, size=40)
for tick, color in zip(ax.get_yticklabels(), ["crimson", "gold", "indigo"]):
    tick.set_color(color)
plt.show()
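Applied to the 5 x 30 mastery matrix from the question, the same idea might look like the sketch below. It draws the heatmap with plain imshow so that it runs without seaborn (an assumption), and the concept colors are hypothetical. With sns.heatmap the row centers sit at i + 0.5 rather than i, so the tick positions would need that offset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
student_mastery = rng.random((5, 30))  # simulated mastery levels, one row per concept
concept_colors = ["crimson", "gold", "indigo", "teal", "gray"]  # hypothetical colors

fig, ax = plt.subplots(figsize=(10, 2.5))
im = ax.imshow(student_mastery, cmap="RdYlGn", vmin=0.0, vmax=1.0, aspect="auto")
fig.colorbar(im, ax=ax)

# One filled circle per row, colored per concept
ax.set_yticks(range(len(concept_colors)))
ax.set_yticklabels(["●"] * len(concept_colors), size=20)
for tick, color in zip(ax.get_yticklabels(), concept_colors):
    tick.set_color(color)
```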

matplotlib: plotting histogram plot just above scatter plot

I would like to make beautiful scatter plots with histograms above and right of the scatter plot, as it is possible in seaborn with jointplot:
I am looking for suggestions on how to achieve this. In fact, I am having some trouble installing pandas, and I do not need the entire seaborn module.
I encountered the same problem today. Additionally I wanted a CDF for the marginals.
Code:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np
x = np.random.beta(2,5,size=int(1e4))
y = np.random.randn(int(1e4))
fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(3, 3)
ax_main = plt.subplot(gs[1:3, :2])
ax_xDist = plt.subplot(gs[0, :2],sharex=ax_main)
ax_yDist = plt.subplot(gs[1:3, 2],sharey=ax_main)
ax_main.scatter(x,y,marker='.')
ax_main.set(xlabel="x data", ylabel="y data")
ax_xDist.hist(x,bins=100,align='mid')
ax_xDist.set(ylabel='count')
ax_xCumDist = ax_xDist.twinx()
ax_xCumDist.hist(x,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid')
ax_xCumDist.tick_params('y', colors='r')
ax_xCumDist.set_ylabel('cumulative',color='r')
ax_yDist.hist(y,bins=100,orientation='horizontal',align='mid')
ax_yDist.set(xlabel='count')
ax_yCumDist = ax_yDist.twiny()
ax_yCumDist.hist(y,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid',orientation='horizontal')
ax_yCumDist.tick_params('x', colors='r')
ax_yCumDist.set_xlabel('cumulative',color='r')
plt.show()
Hope it helps the next person searching for scatter-plot with marginal distribution.
Here's an example of how to do it, using gridspec.GridSpec:
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
x = np.random.rand(50)
y = np.random.rand(50)
fig = plt.figure()
gs = GridSpec(4,4)
ax_joint = fig.add_subplot(gs[1:4,0:3])
ax_marg_x = fig.add_subplot(gs[0,0:3])
ax_marg_y = fig.add_subplot(gs[1:4,3])
ax_joint.scatter(x,y)
ax_marg_x.hist(x)
ax_marg_y.hist(y,orientation="horizontal")
# Turn off tick labels on marginals
plt.setp(ax_marg_x.get_xticklabels(), visible=False)
plt.setp(ax_marg_y.get_yticklabels(), visible=False)
# Set labels on joint
ax_joint.set_xlabel('Joint x label')
ax_joint.set_ylabel('Joint y label')
# Set labels on marginals
ax_marg_y.set_xlabel('Marginal x label')
ax_marg_x.set_ylabel('Marginal y label')
plt.show()
I strongly recommend flipping the right histogram by adding these 3 lines of code to the current best answer before plt.show():
ax_yDist.invert_xaxis()
ax_yDist.yaxis.tick_right()
ax_yCumDist.invert_xaxis()
The advantage is that anyone viewing the plot can easily compare the two histograms just by mentally moving the right histogram and rotating it clockwise.
In contrast, in the plot in the question and in all other answers, if you want to compare the two histograms, your first reaction is to rotate the right histogram counterclockwise, which leads to wrong conclusions because the y axis gets inverted. Indeed, the right CDF of the current best answer looks decreasing at first sight.
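Since the three lines refer to names from another answer (ax_yDist, ax_yCumDist), here is a reduced, self-contained sketch of just the main and right-hand axes with the flip applied. The sample sizes and bin counts are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np

rng = np.random.default_rng(0)
x = rng.beta(2, 5, size=1000)
y = rng.standard_normal(1000)

fig = plt.figure(figsize=(6, 6))
gs = gridspec.GridSpec(3, 3)
ax_main = fig.add_subplot(gs[1:3, :2])
ax_yDist = fig.add_subplot(gs[1:3, 2], sharey=ax_main)

ax_main.scatter(x, y, marker=".")
ax_yDist.hist(y, bins=50, orientation="horizontal")
ax_yCumDist = ax_yDist.twiny()
ax_yCumDist.hist(y, bins=50, cumulative=True, histtype="step",
                 density=True, color="r", orientation="horizontal")

# Flip the right-hand marginal so the bars grow toward the scatter plot
ax_yDist.invert_xaxis()
ax_yDist.yaxis.tick_right()
ax_yCumDist.invert_xaxis()
```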

How to zoom a portion of an image and insert it in the same plot in matplotlib

I would like to zoom in on a portion of the data/image and plot it inside the same figure. It looks something like this figure.
Is it possible to insert a zoomed portion of an image inside the same plot? I think it is possible to draw another figure with subplot, but that draws two different figures. I also read about adding a patch to insert a rectangle/circle, but I am not sure whether that is useful for inserting a portion of the image into the figure. I basically load data from a text file and plot it using the simple plot commands shown below.
I found one related example in the matplotlib image gallery here, but I am not sure how it works. Your help is much appreciated.
from numpy import *
import os
import matplotlib.pyplot as plt
data = loadtxt(os.getcwd()+txtfl[0], skiprows=1)
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
ax1.semilogx(data[:,1],data[:,2])
plt.show()
Playing with runnable code is one of the
fastest ways to learn Python.
So let's start with the code from the matplotlib example gallery.
Given the comments in the code, it appears the code is broken up into 4 main stanzas.
The first stanza generates some data, the second stanza generates the main plot,
the third and fourth stanzas create the inset axes.
We know how to generate data and plot the main plot, so let's focus on the third stanza:
a = axes([.65, .6, .2, .2], facecolor='y')
n, bins, patches = hist(s, 400, density=True)
title('Probability')
setp(a, xticks=[], yticks=[])
Copy the example code into a new file, called, say, test.py.
What happens if we change the .65 to .35?
a = axes([.35, .6, .2, .2], facecolor='y')
Run the script:
python test.py
You'll find the "Probability" inset moved to the left.
So the axes function controls the placement of the inset.
If you play some more with the numbers you'll figure out that (.35, .6) is the
location of the lower left corner of the inset, and (.2, .2) is the width and
height of the inset. The numbers go from 0 to 1, and (0,0) is located at the
lower left corner of the figure.
Okay, now we're cooking. On to the next line we have:
n, bins, patches = hist(s, 400, density=True)
You might recognize this as the matplotlib command for drawing a histogram, but
if not, changing the number 400 to, say, 10, will produce an image with a much
chunkier histogram, so again by playing with the numbers you'll soon figure out
that this line has something to do with the image inside the inset.
You'll want to call semilogx(data[3:8,1],data[3:8,2]) here.
The line title('Probability')
obviously generates the text above the inset.
Finally we come to setp(a, xticks=[], yticks=[]). There are no numbers to play with,
so what happens if we just comment out the whole line by placing a # at the beginning of the line:
# setp(a, xticks=[], yticks=[])
Rerun the script. Oh! now there are lots of tick marks and tick labels on the inset axes.
Fine. So now we know that setp(a, xticks=[], yticks=[]) removes the tick marks and labels from the axes a.
Now, in theory you have enough information to apply this code to your problem.
But there is one more potential stumbling block: The matplotlib example uses
from pylab import *
whereas you use import matplotlib.pyplot as plt.
The matplotlib FAQ says import matplotlib.pyplot as plt
is the recommended way to use matplotlib when writing scripts, while
from pylab import * is for use in interactive sessions. So you are doing it the right way, (though I would recommend using import numpy as np instead of from numpy import * too).
So how do we convert the matplotlib example to run with import matplotlib.pyplot as plt?
Doing the conversion takes some experience with matplotlib. Generally, you just
add plt. in front of bare names like axes and setp, but sometimes the
function comes from numpy, and sometimes the call should come from an axes
object, not from the module plt. It takes experience to know where all these
functions come from. Googling the names of functions along with "matplotlib" can help.
Reading example code builds experience, but there is no easy shortcut.
So, the converted code becomes
ax2 = plt.axes([.65, .6, .2, .2], facecolor='y')
ax2.semilogx(t[3:8],s[3:8])
plt.setp(ax2, xticks=[], yticks=[])
And you could use it in your code like this:
from numpy import *
import os
import matplotlib.pyplot as plt
data = loadtxt(os.getcwd()+txtfl[0], skiprows=1)
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
ax1.semilogx(data[:,1],data[:,2])
ax2 = plt.axes([.65, .6, .2, .2], facecolor='y')
ax2.semilogx(data[3:8,1],data[3:8,2])
plt.setp(ax2, xticks=[], yticks=[])
plt.show()
The simplest way is to combine "zoomed_inset_axes" and "mark_inset", whose description and
related examples can be found here:
Overview of AxesGrid toolkit
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes
from mpl_toolkits.axes_grid1.inset_locator import mark_inset
import numpy as np
def get_demo_image():
    from matplotlib.cbook import get_sample_data
    import numpy as np
    f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False)
    z = np.load(f)
    # z is a numpy array of 15x15
    return z, (-3,4,-4,3)
fig, ax = plt.subplots(figsize=[5,4])
# prepare the demo image
Z, extent = get_demo_image()
Z2 = np.zeros([150, 150], dtype="d")
ny, nx = Z.shape
Z2[30:30+ny, 30:30+nx] = Z
# extent = [-3, 4, -4, 3]
ax.imshow(Z2, extent=extent, interpolation="nearest",
origin="lower")
axins = zoomed_inset_axes(ax, 6, loc=1) # zoom = 6
axins.imshow(Z2, extent=extent, interpolation="nearest",
origin="lower")
# sub region of the original image
x1, x2, y1, y2 = -1.5, -0.9, -2.5, -1.9
axins.set_xlim(x1, x2)
axins.set_ylim(y1, y2)
plt.xticks(visible=False)
plt.yticks(visible=False)
# draw a bbox of the region of the inset axes in the parent axes and
# connecting lines between the bbox and the inset axes area
mark_inset(ax, axins, loc1=2, loc2=4, fc="none", ec="0.5")
plt.draw()
plt.show()
The nicest way I know of to do this is to use mpl_toolkits.axes_grid1.inset_locator (part of matplotlib).
There is a great example with source code here: https://github.com/NelleV/jhepc/tree/master/2013/entry10
The basic steps to zoom up a portion of a figure with matplotlib
import numpy as np
from matplotlib import pyplot as plt
# Generate the main data
X = np.linspace(-6, 6, 1024)
Y = np.sinc(X)
# Generate data for the zoomed portion
X_detail = np.linspace(-3, 3, 1024)
Y_detail = np.sinc(X_detail)
# plot the main figure
plt.plot(X, Y, c = 'k')
# location for the zoomed portion
sub_axes = plt.axes([.6, .6, .25, .25])
# plot the zoomed portion
sub_axes.plot(X_detail, Y_detail, c = 'k')
# insert the zoomed figure
# plt.setp(sub_axes)
plt.show()
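In newer matplotlib versions (3.0 and later, as far as I know) the same effect can be sketched without the toolkit import, using Axes.inset_axes and Axes.indicate_inset_zoom. The inset position and the zoomed limits below are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-6, 6, 1024)
Y = np.sinc(X)

fig, ax = plt.subplots()
ax.plot(X, Y, c="k")

# Inset placed in axes-fraction coordinates: [x0, y0, width, height]
axins = ax.inset_axes([0.6, 0.6, 0.35, 0.35])
axins.plot(X, Y, c="k")

# The zoom is simply a narrower view of the same data
axins.set_xlim(-1, 1)
axins.set_ylim(0.0, 1.05)

# Draw a rectangle around the zoomed region plus connector lines to the inset
ax.indicate_inset_zoom(axins, edgecolor="gray")
```

Here the inset replots the full data and only narrows its limits, so there is no need to slice out a separate detail array.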
