I would like to make beautiful scatter plots with histograms above and right of the scatter plot, as it is possible in seaborn with jointplot:
I am looking for suggestions on how to achieve this. In fact I am having some troubles in installing pandas, and also I do not need the entire seaborn module
I encountered the same problem today. Additionally I wanted a CDF for the marginals.
Code:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np
x = np.random.beta(2,5,size=int(1e4))
y = np.random.randn(int(1e4))
fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(3, 3)
ax_main = plt.subplot(gs[1:3, :2])
ax_xDist = plt.subplot(gs[0, :2],sharex=ax_main)
ax_yDist = plt.subplot(gs[1:3, 2],sharey=ax_main)
ax_main.scatter(x,y,marker='.')
ax_main.set(xlabel="x data", ylabel="y data")
ax_xDist.hist(x,bins=100,align='mid')
ax_xDist.set(ylabel='count')
ax_xCumDist = ax_xDist.twinx()
ax_xCumDist.hist(x,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid')
ax_xCumDist.tick_params('y', colors='r')
ax_xCumDist.set_ylabel('cumulative',color='r')
ax_yDist.hist(y,bins=100,orientation='horizontal',align='mid')
ax_yDist.set(xlabel='count')
ax_yCumDist = ax_yDist.twiny()
ax_yCumDist.hist(y,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid',orientation='horizontal')
ax_yCumDist.tick_params('x', colors='r')
ax_yCumDist.set_xlabel('cumulative',color='r')
plt.show()
Hope it helps the next person searching for scatter-plot with marginal distribution.
Here's an example of how to do it, using gridspec.GridSpec:
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
x = np.random.rand(50)
y = np.random.rand(50)
fig = plt.figure()
gs = GridSpec(4,4)
ax_joint = fig.add_subplot(gs[1:4,0:3])
ax_marg_x = fig.add_subplot(gs[0,0:3])
ax_marg_y = fig.add_subplot(gs[1:4,3])
ax_joint.scatter(x,y)
ax_marg_x.hist(x)
ax_marg_y.hist(y,orientation="horizontal")
# Turn off tick labels on marginals
plt.setp(ax_marg_x.get_xticklabels(), visible=False)
plt.setp(ax_marg_y.get_yticklabels(), visible=False)
# Set labels on joint
ax_joint.set_xlabel('Joint x label')
ax_joint.set_ylabel('Joint y label')
# Set labels on marginals
ax_marg_y.set_xlabel('Marginal x label')
ax_marg_x.set_ylabel('Marginal y label')
plt.show()
I strongly recommend to flip the right histogram by adding these 3 lines of code to the current best answer before plt.show() :
ax_yDist.invert_xaxis()
ax_yDist.yaxis.tick_right()
ax_yCumDist.invert_xaxis()
The advantage is that any person who is visualizing it can compare easily the two histograms just by moving and rotating clockwise the right histogram on their mind.
On contrast, in the plot of the question and in all other answers, if you want to compare the two histograms, your first reaction is to rotate the right histogram counterclockwise, which leads to wrong conclusions because the y axis gets inverted. Indeed, the right CDF of the current best answer looks decreasing at first sight:
Related
I do have a question with matplotlib in python. I create different figures, where every figure should have the same height to print them in a publication/poster next to each other.
If the y-axis has a label on the very top, this shrinks the height of the box with the plot. So I use MaxNLocator to remove the upper and lower y-tick. In some plots, I want to have the 1.0 as a number on the y-axis, because I have normalized data. So I need a solution, which expands in these cases the y-axis and ensures 1.0 is a y-Tick, but does not corrupt the size of the figure using tight_layout().
Here is a minimal example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
x = np.linspace(0,1,num=11)
y = np.linspace(1,.42,num=11)
fig,axs = plt.subplots(1,1)
axs.plot(x,y)
locator=MaxNLocator(prune='both',nbins=5)
axs.yaxis.set_major_locator(locator)
plt.tight_layout()
fig.show()
Here is a link to a example-pdf, which shows the problems with height of upper boxline.
I tried to work with adjust_subplots() but this is of no use for me, because I vary the size of the figures and want to have same the font size all the time, which changes the margins.
Question is:
How can I use MaxNLocator and specify a number which has to be in the y-axis?
Hopefully someone of you has some advice.
Greetings,
Laenan
Assuming that you know in advance how many plots there will be in 1 row on a page one way to solve this would be to put all those plots into one figure - matplotlib will make sure they are alinged on axes:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
x = np.linspace(0, 1, num=11)
y = np.linspace(1, .42, num=11)
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(8,3), gridspec_kw={'wspace':.2})
ax1.plot(x,y)
ax2.plot(x,y)
locator=MaxNLocator(prune='both', nbins=5)
ax1.yaxis.set_major_locator(locator)
# You don't need to use tight_layout and using it might give an error
# plt.tight_layout()
fig.show()
The example here
What is the difference between 'log' and 'symlog'?
nicely shows how a linear scale at the origin can be used with a log scale elsewhere. I want to go the other way around. I want to have a a log scale from 1-100 and then a linear! scale from 100-1000. What are my options? Like the figure above
This attempt did not work
import matplotlib.pyplot as plt
plt.figure()
plt.errorbar(x, y, yerr=yerrors)
plt.xscale('symlog', linthreshx= (100,1000))
The problem seems to be that linthreshx is defined to take the range (-x,x). So if x if 5 we would get a linear scale on (-5,5). One is confined to the origin. I thought simply choosing a different range should work but it does not. Any ideas?
From the response of user1318806 to cphlewis:
Thank you. Actually I wanted a combination of log+linear on the x axis not y. But I assume your code should be easily adaptable.
Hello! If you wanted a combination of log+linear on the x-axis (patterned from the code of Duncan Watts and CubeJockey):
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np
# Numbers from -50 to 50, with 0.1 as step
xdomain = np.arange(-50,50, 0.1)
axMain = plt.subplot(111)
axMain.plot(np.sin(xdomain), xdomain)
axMain.set_xscale('linear')
axMain.set_xlim((0.5, 1.5))
axMain.spines['left'].set_visible(False)
axMain.yaxis.set_ticks_position('right')
axMain.yaxis.set_visible(False)
divider = make_axes_locatable(axMain)
axLin = divider.append_axes("left", size=2.0, pad=0, sharey=axMain)
axLin.set_xscale('log')
axLin.set_xlim((0.01, 0.5))
axLin.plot(np.sin(xdomain), xdomain)
axLin.spines['right'].set_visible(False)
axLin.yaxis.set_ticks_position('left')
plt.setp(axLin.get_xticklabels(), visible=True)
plt.title('Linear right, log left')
The code above yields:
(MISCELLANEOUS) Here's a very minor fix for the title and the absence of tick marks on the right side:
# Fix for: title + no tick marks on the right side of the plot
ax2 = axLin.twinx()
ax2.spines['left'].set_visible(False)
ax2.tick_params(axis='y',which='both',labelright='off')
Adding these lines will give you this:
pythonmatplotlib
I assume you want linear near the origin, log farther -- since `symlog' does it the other way around -- I couldn't come up with data that looked good like this, but you can put it together with the axes_grid:
# linear and log axes for the same plot?
# starting with the histogram example from
# http://matplotlib.org/mpl_toolkits/axes_grid/users/overview.html
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np
# Numbers from -50 to 50, with 0.1 as step
xdomain = np.arange(-50,50, 0.1)
axMain = plt.subplot(111)
axMain.plot(xdomain, np.sin(xdomain))
axMain.set_yscale('log')
axMain.set_ylim((0.01, 0.5))
divider = make_axes_locatable(axMain)
axLin = divider.append_axes("top", size=2.0, pad=0.02, sharex=axMain)
axLin.plot(xdomain, np.sin(xdomain))
axLin.set_xscale('linear')
axLin.set_ylim((0.5, 1.5))
plt.title('Linear above, log below')
plt.show()
This solution makes an addition to cphlewis's answer so that there is a smooth transition, and the plot appears to have consistent tick markers. My change adds these three lines:
axLin.spines['bottom'].set_visible(False)
axLin.xaxis.set_ticks_position('top')
plt.setp(axLin.get_xticklabels(), visible=False)
In total, the code is
# linear and log axes for the same plot?
# starting with the histogram example from
# http://matplotlib.org/mpl_toolkits/axes_grid/users/overview.html
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np
# Numbers from -50 to 50, with 0.1 as step
xdomain = np.arange(-50,50, 0.1)
axMain = plt.subplot(111)
axMain.plot(xdomain, np.sin(xdomain))
axMain.set_yscale('log')
axMain.set_ylim((0.01, 0.5))
axMain.spines['top'].set_visible(False)
axMain.xaxis.set_ticks_position('bottom')
divider = make_axes_locatable(axMain)
axLin = divider.append_axes("top", size=2.0, pad=0, sharex=axMain)
axLin.plot(xdomain, np.sin(xdomain))
axLin.set_xscale('linear')
axLin.set_ylim((0.5, 1.5))
# Removes bottom axis line
axLin.spines['bottom'].set_visible(False)
axLin.xaxis.set_ticks_position('top')
plt.setp(axLin.get_xticklabels(), visible=False)
plt.title('Linear above, log below')
plt.show()
I have three dataframes and I plot the KDE using seaborn module in python. The issue is that these plots try to make the area under the curve 1 (which is how they are intended to perform), so the height in the plots are normalized ones. But is there any way to show the actual values instead of the normalized ones. Also is there any way I can find out the point of intersection for the curves?
Note: I do not want to use the curve_fit method of scipy as I am not sure about the distribution I will get for each dataframe, it can be multimodal also.
import seaborn as sns
plt.figure()
sns.distplot(data_1['gap'],kde=True,hist=False,label='1')
sns.distplot(data_2['gap'],kde=True,hist=False,label='2')
sns.distplot(data_3['gap'],kde=True,hist=False,label='3')
plt.legend(loc='best')
plt.show()
Output for the code is attached in the link as I can't post images.plot_link
You can just grab the line and rescale its y-values with set_data:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# create some data
n = 1000
x = np.random.rand(n)
# plot stuff
fig, ax = plt.subplots(1,1)
ax = sns.distplot(x, kde=True, hist=False, ax=ax)
# find the line and rescale y-values
children = ax.get_children()
for child in children:
if isinstance(child, matplotlib.lines.Line2D):
x, y = child.get_data()
y *= n
child.set_data(x,y)
# update y-limits (not done automatically)
ax.set_ylim(y.min(), y.max())
fig.canvas.draw()
I do have a question with matplotlib in python. I create different figures, where every figure should have the same height to print them in a publication/poster next to each other.
If the y-axis has a label on the very top, this shrinks the height of the box with the plot. So I use MaxNLocator to remove the upper and lower y-tick. In some plots, I want to have the 1.0 as a number on the y-axis, because I have normalized data. So I need a solution, which expands in these cases the y-axis and ensures 1.0 is a y-Tick, but does not corrupt the size of the figure using tight_layout().
Here is a minimal example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
x = np.linspace(0,1,num=11)
y = np.linspace(1,.42,num=11)
fig,axs = plt.subplots(1,1)
axs.plot(x,y)
locator=MaxNLocator(prune='both',nbins=5)
axs.yaxis.set_major_locator(locator)
plt.tight_layout()
fig.show()
Here is a link to a example-pdf, which shows the problems with height of upper boxline.
I tried to work with adjust_subplots() but this is of no use for me, because I vary the size of the figures and want to have same the font size all the time, which changes the margins.
Question is:
How can I use MaxNLocator and specify a number which has to be in the y-axis?
Hopefully someone of you has some advice.
Greetings,
Laenan
Assuming that you know in advance how many plots there will be in 1 row on a page one way to solve this would be to put all those plots into one figure - matplotlib will make sure they are alinged on axes:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
x = np.linspace(0, 1, num=11)
y = np.linspace(1, .42, num=11)
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(8,3), gridspec_kw={'wspace':.2})
ax1.plot(x,y)
ax2.plot(x,y)
locator=MaxNLocator(prune='both', nbins=5)
ax1.yaxis.set_major_locator(locator)
# You don't need to use tight_layout and using it might give an error
# plt.tight_layout()
fig.show()
When matplotlib makes figures, I find that it "pads" the space around axes too much for my taste (and in an asymmetrical way). For example with
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
x, y = 12*np.random.rand(2, 1000)
ax.set(xlim=[2,10])
ax.plot(x, y, 'go')
I get something that looks like
(here for example in Adobe Illustrator).
I'd like the bounds of the figure to be closer to the axes on all sides, especially on the left and right.
How can I adjust these bounds programmatically in matplotlib, relative to each axis?
try:
plt.tight_layout()
the default parameter set is:
plt.tight_layout(pad=1.08, h_pad=None, w_pad=None, rect=None)