I can't figure out how to get the axis labels in my Seaborn Joint plot to not display with absolute values (rather than with an offset). I know I can do this in matplotlib with
plt.ticklabel_format(useOffset=False)
But how do I get it to work with this example
import numpy as np
import pandas as pd
import seaborn as sns
sns.set(style="white")
# Generate a random correlated bivariate dataset
rs = np.random.RandomState(5)
mean = [0, 0]
cov = [(1, .5), (.5, 1)]
x1, x2 = rs.multivariate_normal(mean, cov, 500).T
x1 = pd.Series(x1, name="$X_1$")
x2 = pd.Series(x2, name="$X_2$")
# Show the joint distribution using kernel density estimation
g = sns.jointplot(x1, x2, kind="kde", size=7, space=0, xlim=(0.995, 1.005))
Any suggestions would be greatly appreaciated.
Thanks for your suggestion, #ImportanceofbeingErnest; however that still hasn't solved the problem. Here is a screenshot of the plot, I want the x-axis to look like the y-axis in terms of axis labelling. The offsets disappear if I make the x range larger, but for my dataset that doesn't really work.
My suggestion would be to add import matplotlib.pyplot as plt at the beginning of the script and plt.ticklabel_format(useOffset=False) at the end.
Due to the jointplot creating several axes, plt.ticklabel_format(useOffset=False) will only affect the last of them.
An easy solution is to use
plt.rcParams['axes.formatter.useoffset'] = False
just after the imports. This will turn the offset use off for the complete script.
Related
plt.hist's density argument does not work.
I tried to use the density argument in the plt.hist function to normalize stock returns in my plot, but it didn't work.
The following code worked fine for me and give me the probability density function which I desired.
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(19680801)
# example data
mu = 100 # mean of distribution
sigma = 15 # standard deviation of distribution
x = mu + sigma * np.random.randn(437)
num_bins = 50
plt.hist(x, num_bins, density=1)
plt.show()
But when I tried it with stock data, it simply didn't work. The result gave the unnormalized data. I didn't find any abnormal data in my data array.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
plt.hist(returns, 50,density = True)
plt.show()
# "returns" is a np array consisting of 360 days of stock returns
This is a known issue in Matplotlib.
As stated in Bug Report: The density flag in pyplot.hist() does not work correctly
When density = False, the histogram plot would have counts on the Y-axis. But when density = True, the Y-axis does not mean anything useful. I think a better implementation would plot the PDF as the histogram when density = True.
The developers view this as a feature not a bug since it maintains compatibility with numpy. They have closed several the bug reports about it already with since it is working as intended. Creating even more confusion the example on the matplotlib site appears to show this feature working with the y-axis being assigned a meaningful value.
What you want to do with matplotlib is reasonable but matplotlib will not let you do it that way.
It is not a bug.
Area of the bars equal to 1.
Numbers only seem strange because your bin sizes are small
Since this isn't resolved; based on #user14518925's response which is actually correct, this is treating bin width as an actual valid number whereas from my understanding you want each bin to have a width of 1 such that the sum of frequencies is 1. More succinctly, what you're seeing is:
\sum_{i}y_{i}\times\text{bin size} =1
Whereas what you want is:
\sum_{i}y_{i} =1
therefore, all you really need to change is the tick labels on the y-axis. One way to this is to disable the density option :
density = false
and instead divide by the total sample size as such (shown in your example):
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(19680801)
# example data
mu = 0 # mean of distribution
sigma = 0.0000625 # standard deviation of distribution
x = mu + sigma * np.random.randn(437)
fig = plt.figure()
plt.hist(x, 50, density=False)
locs, _ = plt.yticks()
print(locs)
plt.yticks(locs,np.round(locs/len(x),3))
plt.show()
Another approach, besides that of tvbc, is to change the yticks on the plot.
import matplotlib.pyplot as plt
import numpy as np
steps = 10
bins = np.arange(0, 101, steps)
data = np.random.random(100000) * 100
plt.hist(data, bins=bins, density=True)
yticks = plt.gca().get_yticks()
plt.yticks(yticks, np.round(yticks * steps, 2))
plt.show()
How would I make a plot of this style in python with matplotlib? (Cumulative probability plot) I don't need complete code, mostly just need a place to start and a general idea of what I need to do for it.
A cumulative probability plot is really easy to make:
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
fig,ax = plt.subplots()
ax.plot(np.sort(data),np.linspace(0.0,1.0,len(data)))
plt.xlabel(r'$x$')
plt.ylabel(r'$P(X \leq x)$')
plt.show()
Note that it can have a strong advantage over a probability density plot as it does not require binning of your data. (Should you be looking for the latter you can check this code).
I read a waveform from an oscilloscope. The waveform is divided into 10 segments as a function of time. I want to plot the complete waveform, one segment above (or under) another, 'with a vertical offset', so to speak. Additionally, a color map is necessary to show the signal intensity. I've only been able to get the following plot:
As you can see, all the curves are superimposed, which is unacceptable. One could add an offset to the y data but this is not how I would like to do it. Surely there is a much neater way of plotting my data? I've tried a few things to solve this issue using pylab but I am not even sure how to proceed and if this is the right way to go.
Any help will be appreciated.
import readTrc #helps read binary data from an oscilloscope
import matplotlib.pyplot as plt
fName = r"...trc"
datX, datY, m = readTrc.readTrc(fName)
segments = m['SUBARRAY_COUNT'] #number of segments
x, y = [], []
for i in range(segments+1):
x.append(datX[segments*i:segments*(i+1)])
y.append(datY[segments*i:segments*(i+1)])
plt.plot(x,y)
plt.show()
A plot with a vertical offset sounds like a frequency trail.
Here's one approach that does just adjust the y value.
Frequency Trail in MatPlotLib
The same plot has also been coined a joyplot/ridgeline plot. Seaborn has an implementation that creates a series of plots (FacetGrid), and then adjusts the offset between them for a similar effect.
https://seaborn.pydata.org/examples/kde_joyplot.html
An example using a line plot might look like:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
segments = 10
points_per_segment = 100
#your data preparation will vary
x = np.tile(np.arange(points_per_segment), segments)
z = np.floor(np.arange(points_per_segment * segments)/points_per_segment)
y = np.sin(x * (1 + z))
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
pal = sns.color_palette()
g = sns.FacetGrid(df, row="z", hue="z", aspect=15, height=.5, palette=pal)
g.map(plt.plot, 'x', 'y')
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.00)
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.show()
Out:
I would like to make beautiful scatter plots with histograms above and right of the scatter plot, as it is possible in seaborn with jointplot:
I am looking for suggestions on how to achieve this. In fact I am having some troubles in installing pandas, and also I do not need the entire seaborn module
I encountered the same problem today. Additionally I wanted a CDF for the marginals.
Code:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np
x = np.random.beta(2,5,size=int(1e4))
y = np.random.randn(int(1e4))
fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(3, 3)
ax_main = plt.subplot(gs[1:3, :2])
ax_xDist = plt.subplot(gs[0, :2],sharex=ax_main)
ax_yDist = plt.subplot(gs[1:3, 2],sharey=ax_main)
ax_main.scatter(x,y,marker='.')
ax_main.set(xlabel="x data", ylabel="y data")
ax_xDist.hist(x,bins=100,align='mid')
ax_xDist.set(ylabel='count')
ax_xCumDist = ax_xDist.twinx()
ax_xCumDist.hist(x,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid')
ax_xCumDist.tick_params('y', colors='r')
ax_xCumDist.set_ylabel('cumulative',color='r')
ax_yDist.hist(y,bins=100,orientation='horizontal',align='mid')
ax_yDist.set(xlabel='count')
ax_yCumDist = ax_yDist.twiny()
ax_yCumDist.hist(y,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid',orientation='horizontal')
ax_yCumDist.tick_params('x', colors='r')
ax_yCumDist.set_xlabel('cumulative',color='r')
plt.show()
Hope it helps the next person searching for scatter-plot with marginal distribution.
Here's an example of how to do it, using gridspec.GridSpec:
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
x = np.random.rand(50)
y = np.random.rand(50)
fig = plt.figure()
gs = GridSpec(4,4)
ax_joint = fig.add_subplot(gs[1:4,0:3])
ax_marg_x = fig.add_subplot(gs[0,0:3])
ax_marg_y = fig.add_subplot(gs[1:4,3])
ax_joint.scatter(x,y)
ax_marg_x.hist(x)
ax_marg_y.hist(y,orientation="horizontal")
# Turn off tick labels on marginals
plt.setp(ax_marg_x.get_xticklabels(), visible=False)
plt.setp(ax_marg_y.get_yticklabels(), visible=False)
# Set labels on joint
ax_joint.set_xlabel('Joint x label')
ax_joint.set_ylabel('Joint y label')
# Set labels on marginals
ax_marg_y.set_xlabel('Marginal x label')
ax_marg_x.set_ylabel('Marginal y label')
plt.show()
I strongly recommend to flip the right histogram by adding these 3 lines of code to the current best answer before plt.show() :
ax_yDist.invert_xaxis()
ax_yDist.yaxis.tick_right()
ax_yCumDist.invert_xaxis()
The advantage is that any person who is visualizing it can compare easily the two histograms just by moving and rotating clockwise the right histogram on their mind.
On contrast, in the plot of the question and in all other answers, if you want to compare the two histograms, your first reaction is to rotate the right histogram counterclockwise, which leads to wrong conclusions because the y axis gets inverted. Indeed, the right CDF of the current best answer looks decreasing at first sight:
I would like to use kernel density estimate of seaborn.
First I would like to add a colorbor for the main plot.
Second I would like to add horizontal line to the joint probability distribution to show the 68%, 98% confidence levels and another line which shows the true value
Third I also would like to remove the legend in the plot, considering the following example:
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_context("paper")
# Generate a random correlated bivariate dataset
rs = np.random.RandomState(5)
mean = [0, 0]
cov = [(1, .5), (.5, 1)]
x1, x2 = rs.multivariate_normal(mean, cov, 500).T
x1 = pd.Series(x1, name="$X_1$")
x2 = pd.Series(x2, name="$X_2$")
# Show the joint distribution using kernel density estimation
g = sns.jointplot(x1, x2, kind="kde", size=7, space=0, color="r")
How should I do it?
Not easily possible (although the density values are not particularly interpretable anyway).
These are matplotlib objects, you can add any additional plot elements you want to them.
stat_func=None, as is shown here.
AFAIK you can't do any of those per doc.
I would like to add a colorbor for the main plot.
That's no an option with jointplot. Colorbars are only available with heatmap, clustermap, and interactplot
I would like to add horizontal line to the joint probability distribution to show the 68%, 98% confidence levels and another line
which shows the true value
Not an option as well, the closest you can come to that is overlay two graphs
I also would like to remove the legend in the plot
I'm assuming you're talking about the pearsonr and p values. Those aren't legends and no documentation to show a way to remove them.