plot log-scale and linear scale functions and histograms on same canvas - python

I have a probability density function for which I can only evaluate the logarithm without running into numerical issues. I also have a histogram that I would like to plot on the same canvas. For the histogram, I need the option log=True to have it plotted on a log scale, whereas for the function, I only have the logarithms of the values directly. How can I plot both on the same canvas?
Please look at this MWE for illustration of the problem:
import matplotlib.pyplot as plt
import random
import math
import numpy as np
sqrt2pi = math.sqrt(2*math.pi)
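def gauss(l):
    # standard normal density: exp(-x^2/2) / sqrt(2*pi)
    return [ 1/sqrt2pi * math.exp(-x*x/2) for x in l]
def loggauss(l):
    # logarithm of the same density, evaluated without exponentiating
    return [ -math.log(sqrt2pi) - x*x/2 for x in l ]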
# just fill a histogram
h = [ random.gauss(0,1) for x in range(0,1000) ]
plt.hist(h,bins=21,density=True,log=True)
# this works nicely
xvals = np.arange(-4,4,0.1)
plt.plot(xvals,gauss(xvals),"-k")
# but I would like to plot this on the same canvas:
# plt.plot(xvals,loggauss(xvals),"-r")
plt.show()
Any suggestions?

If I understand correctly, you want to plot two data sets in the same figure, on the same x-axis, but one on a log y-scale and one on a linear y-scale. You can do this using twinx:
fig, ax = plt.subplots()
ax.hist(h,bins=21,density=True,log=True)
ax2 = ax.twinx()
ax2.plot(xvals, loggauss(xvals), '-r')
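With twinx alone, the two y-axes are scaled independently, so the red curve will not necessarily line up with the histogram. A minimal sketch of one way to align them, assuming loggauss returns natural logarithms: a log-scaled axis places values linearly in log(y), so setting the twin axis limits to the logarithm of the histogram axis limits puts both on the same visual scale. The sample data and the NumPy version of loggauss below are stand-ins for illustration, not the asker's actual data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(0.0, 1.0, size=1000)  # stand-in histogram data

def loggauss(x):
    """Natural log of the standard normal density."""
    return -0.5 * np.log(2 * np.pi) - 0.5 * x**2

xvals = np.arange(-4, 4, 0.1)

fig, ax = plt.subplots()
ax.hist(h, bins=21, density=True, log=True)  # log-scaled y-axis

ax2 = ax.twinx()  # second y-axis (linear) sharing the same x-axis
ax2.plot(xvals, loggauss(xvals), "-r")

# Positions on the log axis are linear in log(y), so matching the
# linear twin axis to ln(ylim) overlays the two in the same place.
ymin, ymax = ax.get_ylim()
ax2.set_ylim(np.log(ymin), np.log(ymax))
ax2.set_ylabel("log density")

fig.canvas.draw()  # replace with plt.show() or fig.savefig(...) as needed
```

The limits are copied once, so if the histogram axis is rescaled later (for example by plotting more data on ax), the set_ylim call has to be repeated.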

Related

How to have third variable control the color gradient on a log-scaled plot in python?

Is there a way to have a third variable control the color gradient on a log-scaled plot? Also: how would I make a color legend for it? I want it to look something like the image linked below.
(https://i.stack.imgur.com/iNkHw.png)
#creating arrays
sulfate = np.array(master['SO4-2_(input)'])
chloride = np.array(master['Cl-_(input)'])
pH = np.array(master['pH'])
#create plot
fig, ax = plt.subplots()
plt.figure(1)
ax.loglog(chloride,sulfate,'.',c=pH,cmap='hsv')
#add 1:1 ratio line
plt.plot( [0,1],[0,1] )
#x and y axes lims
plt.xlim(10.0E-7,10.0E-1)
plt.ylim(10.0E-7,10.0E-1)
plt.show()
When I try to use the technique for a typical scatter plot, it says that the variable is not a valid value for color.
As suggested in JohanC's comment, use the scatter function and then set the axis scales to logarithmic separately. To get a colorbar, use colorbar. If you also want the colorbar to have logarithmic scaling (I am not sure if that is what you want), use the norm argument of scatter and provide a matplotlib.colors.LogNorm.
from matplotlib.colors import LogNorm
import matplotlib.pyplot as plt
import numpy as np
# Create some mock data
sulfate = np.random.rand(20)
chloride = np.random.rand(20)
pH = np.arange(20) + 1
# Create the plot
plt.scatter(sulfate, chloride, c=pH, norm=LogNorm(), cmap="cividis")
plt.xscale("log")
plt.yscale("log")
plt.colorbar()
Depending on what data format your original variable master is in, there might be easier ways to produce this plot. For example, with xarray:
import xarray as xr
ds = xr.Dataset(
    data_vars={"sulfate": ("x", sulfate), "chloride": ("x", chloride), "pH": ("x", pH)}
)
ds.plot.scatter(
    x="sulfate",
    y="chloride",
    hue="pH",
    xscale="log",
    yscale="log",
    norm=LogNorm(),
    cmap="cividis",
)
Or with pandas:
df = ds.to_dataframe()
ax = df.plot.scatter(
    x="sulfate",
    y="chloride",
    c="pH",
    loglog=True,
    colorbar=True,
    norm=LogNorm(),
    cmap="cividis",
)

How to plot several curves with an offset on the same graph

I read a waveform from an oscilloscope. The waveform is divided into 10 segments as a function of time. I want to plot the complete waveform, one segment above (or under) another, 'with a vertical offset', so to speak. Additionally, a color map is necessary to show the signal intensity. I've only been able to get the following plot:
As you can see, all the curves are superimposed, which is unacceptable. One could add an offset to the y data but this is not how I would like to do it. Surely there is a much neater way of plotting my data? I've tried a few things to solve this issue using pylab but I am not even sure how to proceed and if this is the right way to go.
Any help will be appreciated.
import readTrc #helps read binary data from an oscilloscope
import matplotlib.pyplot as plt
fName = r"...trc"
datX, datY, m = readTrc.readTrc(fName)
segments = m['SUBARRAY_COUNT'] #number of segments
x, y = [], []
for i in range(segments+1):
    x.append(datX[segments*i:segments*(i+1)])
    y.append(datY[segments*i:segments*(i+1)])
plt.plot(x,y)
plt.show()
A plot with a vertical offset sounds like a frequency trail.
Here's one approach that does just adjust the y value.
Frequency Trail in MatPlotLib
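For reference, the offset approach can be sketched in plain matplotlib roughly as follows. This is only an illustration with synthetic sine data standing in for the oscilloscope segments; the array shape (segments, points_per_segment) and the spacing factor of 1.1 are assumptions, not something taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

segments = 10
points_per_segment = 100

# Synthetic stand-in for the waveform: one row per segment
t = np.linspace(0.0, 1.0, points_per_segment)
y = np.array([np.sin(2 * np.pi * (i + 1) * t) for i in range(segments)])

# Space the baselines slightly wider than the largest peak-to-peak
# amplitude so neighbouring curves cannot overlap
spacing = 1.1 * np.ptp(y, axis=1).max()
offsets = spacing * np.arange(segments)

fig, ax = plt.subplots()
for i in range(segments):
    ax.plot(t, y[i] + offsets[i], lw=1)

# Label each baseline with its segment index instead of raw y-values
ax.set_yticks(offsets)
ax.set_yticklabels([f"segment {i}" for i in range(segments)])
ax.set_xlabel("time")

fig.canvas.draw()  # replace with plt.show() or fig.savefig(...) as needed
```

The offset is applied only at plot time, so the underlying data stays untouched.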
The same plot has also been coined a joyplot/ridgeline plot. Seaborn has an implementation that creates a series of plots (FacetGrid), and then adjusts the offset between them for a similar effect.
https://seaborn.pydata.org/examples/kde_joyplot.html
An example using a line plot might look like:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
segments = 10
points_per_segment = 100
#your data preparation will vary
x = np.tile(np.arange(points_per_segment), segments)
z = np.floor(np.arange(points_per_segment * segments)/points_per_segment)
y = np.sin(x * (1 + z))
df = pd.DataFrame({'x': x, 'y': y, 'z': z})
pal = sns.color_palette()
g = sns.FacetGrid(df, row="z", hue="z", aspect=15, height=.5, palette=pal)
g.map(plt.plot, 'x', 'y')
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.00)
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.show()

Get actual numbers instead of normalized value in seaborn KDE plots

I have three dataframes and I plot their KDEs using the seaborn module in Python. The issue is that these plots make the area under each curve equal to 1 (which is how they are intended to behave), so the heights in the plots are normalized. Is there any way to show the actual values instead of the normalized ones? Also, is there any way to find the points of intersection of the curves?
Note: I do not want to use the curve_fit method of scipy, as I am not sure about the distribution I will get for each dataframe; it can also be multimodal.
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure()
sns.distplot(data_1['gap'],kde=True,hist=False,label='1')
sns.distplot(data_2['gap'],kde=True,hist=False,label='2')
sns.distplot(data_3['gap'],kde=True,hist=False,label='3')
plt.legend(loc='best')
plt.show()
Output for the code is attached in the link, as I can't post images: plot_link
You can just grab the line and rescale its y-values with set_data:
import numpy as np
import matplotlib.lines  # needed so that matplotlib.lines.Line2D resolves
import matplotlib.pyplot as plt
import seaborn as sns
# create some data
n = 1000
x = np.random.rand(n)
# plot stuff
fig, ax = plt.subplots(1,1)
ax = sns.distplot(x, kde=True, hist=False, ax=ax)
# find the line and rescale y-values
children = ax.get_children()
for child in children:
    if isinstance(child, matplotlib.lines.Line2D):
        x, y = child.get_data()
        y *= n
        child.set_data(x,y)
# update y-limits (not done automatically)
ax.set_ylim(y.min(), y.max())
fig.canvas.draw()
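The intersection part of the question can be approached without going through the plotted line at all. The following is a rough sketch, assuming scipy is available: it evaluates scipy.stats.gaussian_kde on a common grid, scales each density by its sample size, and locates sign changes of the difference. The arrays gap_1 and gap_2 are hypothetical stand-ins for data_1['gap'] and data_2['gap'], and the bandwidth is scipy's default, which may differ from what seaborn draws.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical stand-ins for data_1['gap'] and data_2['gap']
gap_1 = rng.normal(0.0, 1.0, size=800)
gap_2 = rng.normal(2.0, 1.0, size=1200)

# Common grid covering both samples
grid = np.linspace(min(gap_1.min(), gap_2.min()),
                   max(gap_1.max(), gap_2.max()), 2000)

# A density integrates to 1; multiplying by the sample size gives a
# curve that integrates to (approximately) the number of samples
curve_1 = gaussian_kde(gap_1)(grid) * gap_1.size
curve_2 = gaussian_kde(gap_2)(grid) * gap_2.size

# The curves cross wherever their difference changes sign
diff = curve_1 - curve_2
idx = np.nonzero(np.diff(np.sign(diff)))[0]

# Linear interpolation between the two grid points around each crossing
x_cross = grid[idx] - diff[idx] * (grid[idx + 1] - grid[idx]) / (diff[idx + 1] - diff[idx])
print(x_cross)
```

Crossings far out in the tails, where both curves are close to zero, are mostly noise and can be filtered by requiring a minimum curve height at the crossing.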

matplotlib: plotting histogram plot just above scatter plot

I would like to make beautiful scatter plots with histograms above and right of the scatter plot, as it is possible in seaborn with jointplot:
I am looking for suggestions on how to achieve this. I am having some trouble installing pandas, and I do not need the entire seaborn module.
I encountered the same problem today. Additionally I wanted a CDF for the marginals.
Code:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np
x = np.random.beta(2,5,size=int(1e4))
y = np.random.randn(int(1e4))
fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(3, 3)
ax_main = plt.subplot(gs[1:3, :2])
ax_xDist = plt.subplot(gs[0, :2],sharex=ax_main)
ax_yDist = plt.subplot(gs[1:3, 2],sharey=ax_main)
ax_main.scatter(x,y,marker='.')
ax_main.set(xlabel="x data", ylabel="y data")
ax_xDist.hist(x,bins=100,align='mid')
ax_xDist.set(ylabel='count')
ax_xCumDist = ax_xDist.twinx()
ax_xCumDist.hist(x,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid')
ax_xCumDist.tick_params('y', colors='r')
ax_xCumDist.set_ylabel('cumulative',color='r')
ax_yDist.hist(y,bins=100,orientation='horizontal',align='mid')
ax_yDist.set(xlabel='count')
ax_yCumDist = ax_yDist.twiny()
ax_yCumDist.hist(y,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid',orientation='horizontal')
ax_yCumDist.tick_params('x', colors='r')
ax_yCumDist.set_xlabel('cumulative',color='r')
plt.show()
Hope it helps the next person searching for scatter-plot with marginal distribution.
Here's an example of how to do it, using gridspec.GridSpec:
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np
x = np.random.rand(50)
y = np.random.rand(50)
fig = plt.figure()
gs = GridSpec(4,4)
ax_joint = fig.add_subplot(gs[1:4,0:3])
ax_marg_x = fig.add_subplot(gs[0,0:3])
ax_marg_y = fig.add_subplot(gs[1:4,3])
ax_joint.scatter(x,y)
ax_marg_x.hist(x)
ax_marg_y.hist(y,orientation="horizontal")
# Turn off tick labels on marginals
plt.setp(ax_marg_x.get_xticklabels(), visible=False)
plt.setp(ax_marg_y.get_yticklabels(), visible=False)
# Set labels on joint
ax_joint.set_xlabel('Joint x label')
ax_joint.set_ylabel('Joint y label')
# Set labels on marginals
ax_marg_y.set_xlabel('Marginal x label')
ax_marg_x.set_ylabel('Marginal y label')
plt.show()
I strongly recommend flipping the right histogram by adding these three lines of code to the current best answer, before plt.show():
ax_yDist.invert_xaxis()
ax_yDist.yaxis.tick_right()
ax_yCumDist.invert_xaxis()
The advantage is that a viewer can easily compare the two histograms by mentally moving the right histogram and rotating it clockwise.
In contrast, in the plot in the question and in all the other answers, comparing the two histograms tempts you to rotate the right histogram counterclockwise, which leads to wrong conclusions because the y-axis gets inverted. Indeed, the right CDF of the current best answer looks decreasing at first sight.

Colorbar does not show values when using LogNorm()

I am trying to make a contour plot with the contour levels scaled by the log of the values. However, the colorbar does not show enough values next to the colors. Here is a simple example.
import numpy as N
import matplotlib as M
import matplotlib.pyplot as PLT
# Set up a simple function to plot
values = N.empty((10,10))
for xi in range(10):
    for yi in range(10):
        values[xi,yi] = N.exp(xi*yi/10. - 1)
levels = N.logspace(-1, 4, 10)
log_norm = M.colors.LogNorm()
# Currently not used - linear scaling
linear_norm = M.colors.Normalize()
# Plot the function using the indices as the x and y axes
PLT.contourf(values, norm=log_norm, levels=levels)
PLT.colorbar()
If you switch log_norm for linear_norm in the contourf call, you'll see that the colorbar does have values. Of course, using linear_norm means the colors are scaled linearly and the contours are not well distributed for this function.
I'm using python 2.7.2, enthought edition which comes with matplotlib, on Mac OS 10.7.
Add a format to the call to PLT.colorbar:
import numpy as N
import matplotlib as M
import matplotlib.pyplot as PLT
# Set up a simple function to plot
x,y = N.meshgrid(range(10),range(10))
values = N.exp(x*y/10. - 1)
levels = N.logspace(-1, 4, 10)
log_norm = M.colors.LogNorm()
# Currently not used - linear scaling
linear_norm = M.colors.Normalize()
# Plot the function using the indices as the x and y axes
PLT.contourf(values, norm=log_norm, levels=levels)
PLT.colorbar(format='%.2f')
PLT.show()
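A fixed '%.2f' format prints the large levels with many digits (for example 10000.00). As a possible variation, here is a sketch that requests one tick per contour level through the ticks argument of colorbar and uses '%.3g' for roughly three significant digits. The object-oriented calls and the names fig, ax, cs and cbar are choices made for this illustration; whether the default labels are missing at all may depend on the matplotlib version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

# Same test function as above
x, y = np.meshgrid(range(10), range(10))
values = np.exp(x * y / 10.0 - 1)
levels = np.logspace(-1, 4, 10)

fig, ax = plt.subplots()
cs = ax.contourf(values, levels=levels, norm=LogNorm())

# One tick per contour level, labelled with about three significant digits
cbar = fig.colorbar(cs, ticks=levels, format="%.3g")

fig.canvas.draw()  # replace with plt.show() or fig.savefig(...) as needed
```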
