convert a scatter plot into a contour plot in matplotlib [duplicate] - python

I'd like to make a scatter plot where each point is colored by the spatial density of nearby points.
I've come across a very similar question, which shows an example of this using R:
R Scatter Plot: symbol color represents number of overlapping points
What's the best way to accomplish something similar in python using matplotlib?

In addition to hist2d or hexbin as @askewchan suggested, you can use the same method that the accepted answer in the question you linked to uses.
If you want to do that:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=100)
plt.show()
If you'd like the points to be plotted in order of density so that the densest points are always on top (similar to the linked example), just sort them by the z-values. I'm also going to use a smaller marker size here as it looks a bit better:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=50)
plt.show()
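The hexbin alternative mentioned at the start of this answer can be sketched as follows. This is only a minimal illustration on the same kind of fake data; the gridsize of 40 and mincnt=1 (which leaves empty hexagons blank) are illustrative choices, not values from the original answers.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Fake data, same shape as in the answer above
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x * 3 + rng.normal(size=1000)

fig, ax = plt.subplots()
# Each hexagon is colored by the number of points that fall inside it
hb = ax.hexbin(x, y, gridsize=40, mincnt=1, cmap="viridis")
fig.colorbar(hb, ax=ax, label="Points per hexagon")

# Counts of the hexagons that are actually drawn
counts = hb.get_array()
```

Unlike the KDE approach, this colors bins rather than individual points, so it stays fast for large datasets. Call plt.show() or fig.savefig(...) to view the result.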

Plotting >100k data points?
The accepted answer, using gaussian_kde(), will take a lot of time. On my machine, 100k rows took about 11 minutes. Here I add two alternative methods (mpl-scatter-density and datashader) and compare the given answers on the same dataset.
In the following, I used a test data set of 100k rows:
import matplotlib.pyplot as plt
import numpy as np
# Fake data for testing
x = np.random.normal(size=100000)
y = x * 3 + np.random.normal(size=100000)
Output & computation time comparison
Below is a comparison of different methods.
1: mpl-scatter-density
Installation
pip install mpl-scatter-density
Example code
import mpl_scatter_density # adds projection='scatter_density'
from matplotlib.colors import LinearSegmentedColormap
# "Viridis-like" colormap with white background
white_viridis = LinearSegmentedColormap.from_list('white_viridis', [
    (0, '#ffffff'),
    (1e-20, '#440053'),
    (0.2, '#404388'),
    (0.4, '#2a788e'),
    (0.6, '#21a784'),
    (0.8, '#78d151'),
    (1, '#fde624'),
], N=256)
def using_mpl_scatter_density(fig, x, y):
    ax = fig.add_subplot(1, 1, 1, projection='scatter_density')
    density = ax.scatter_density(x, y, cmap=white_viridis)
    fig.colorbar(density, label='Number of points per pixel')
fig = plt.figure()
using_mpl_scatter_density(fig, x, y)
plt.show()
Drawing this took 0.05 seconds:
And the zoom-in looks quite nice:
2: datashader
Datashader is an interesting project. It has added support for matplotlib in datashader 0.12.
Installation
pip install datashader
Code (source & parameter listing for dsshow):
import datashader as ds
from datashader.mpl_ext import dsshow
import pandas as pd
def using_datashader(ax, x, y):
    df = pd.DataFrame(dict(x=x, y=y))
    dsartist = dsshow(
        df,
        ds.Point("x", "y"),
        ds.count(),
        vmin=0,
        vmax=35,
        norm="linear",
        aspect="auto",
        ax=ax,
    )
    plt.colorbar(dsartist)
fig, ax = plt.subplots()
using_datashader(ax, x, y)
plt.show()
It took 0.83 s to draw this:
It is also possible to colorize by a third variable. The third parameter of dsshow controls the coloring. See more examples here and the source for dsshow here.
3: scatter_with_gaussian_kde
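from scipy.stats import gaussian_kde
def scatter_with_gaussian_kde(ax, x, y):
    # https://stackoverflow.com/a/20107592/3015186
    # Answer by Joel Kington
    xy = np.vstack([x, y])
    z = gaussian_kde(xy)(xy)
    # 'none' is the documented way to disable marker edges (an empty string may be rejected)
    ax.scatter(x, y, c=z, s=100, edgecolor='none')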
It took 11 minutes to draw this:
4: using_hist2d
import matplotlib.pyplot as plt
def using_hist2d(ax, x, y, bins=(50, 50)):
    # https://stackoverflow.com/a/20105673/3015186
    # Answer by askewchan
    ax.hist2d(x, y, bins, cmap=plt.cm.jet)
It took 0.021 s to draw this with bins=(50,50):
It took 0.173 s to draw this with bins=(1000,1000):
Cons: The zoomed-in data does not look as good as with mpl-scatter-density or datashader. Also, you have to determine the number of bins yourself.
5: density_scatter
The code is as in the answer by Guillaume.
It took 0.073 s to draw this with bins=(50,50):
It took 0.368 s to draw this with bins=(1000,1000):
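The comparison does not say how the timings were measured. One way to collect comparable numbers is sketched below; the helper name time_plot and the decision to include the canvas draw in the measurement are assumptions, not the method used for the figures quoted above.

```python
import time

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def time_plot(plot_func, x, y, **kwargs):
    """Hypothetical helper: time plot_func(ax, x, y, **kwargs) including rendering."""
    fig, ax = plt.subplots()
    start = time.perf_counter()
    plot_func(ax, x, y, **kwargs)
    fig.canvas.draw()  # force rendering so the draw time is part of the measurement
    elapsed = time.perf_counter() - start
    return elapsed, fig


def using_hist2d(ax, x, y, bins=(50, 50)):
    # Same function as in method 4 above
    ax.hist2d(x, y, bins, cmap=plt.cm.jet)


# Same kind of 100k-row test data as above
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x * 3 + rng.normal(size=100_000)

elapsed, fig = time_plot(using_hist2d, x, y, bins=(50, 50))
print(f"hist2d with 50x50 bins: {elapsed:.3f} s")
```

Any of the functions above that take (ax, x, y) can be passed to the helper in the same way; absolute numbers will differ between machines.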

Also, if the number of points makes the KDE calculation too slow, the color can be interpolated from np.histogram2d [Update in response to comments: if you wish to show the colorbar, use plt.scatter() instead of ax.scatter(), followed by plt.colorbar()]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import Normalize
from scipy.interpolate import interpn
def density_scatter(x, y, ax=None, sort=True, bins=20, **kwargs):
    """
    Scatter plot colored by 2d histogram
    """
    if ax is None:
        fig, ax = plt.subplots()
    else:
        # Reuse the figure that owns the supplied axes (fig was undefined in this case)
        fig = ax.figure
    data, x_e, y_e = np.histogram2d(x, y, bins=bins, density=True)
    # Interpolate the histogram, defined at the bin centers, onto every data point
    z = interpn((0.5*(x_e[1:] + x_e[:-1]), 0.5*(y_e[1:] + y_e[:-1])), data, np.vstack([x, y]).T, method="splinef2d", bounds_error=False)
    # To be sure to plot all data
    z[np.where(np.isnan(z))] = 0.0
    # Sort the points by density, so that the densest points are plotted last
    if sort:
        idx = z.argsort()
        x, y, z = x[idx], y[idx], z[idx]
    ax.scatter(x, y, c=z, **kwargs)
    norm = Normalize(vmin=np.min(z), vmax=np.max(z))
    cbar = fig.colorbar(cm.ScalarMappable(norm=norm), ax=ax)
    cbar.ax.set_ylabel('Density')
    return ax
if "__main__" == __name__:
    x = np.random.normal(size=100000)
    y = x * 3 + np.random.normal(size=100000)
    density_scatter(x, y, bins=[30, 30])
    plt.show()
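A simpler variant of the same idea, shown here as a sketch rather than as this answer's method, skips the spline interpolation and colors each point by the density of the histogram bin it falls into. The helper name bin_density is hypothetical.

```python
import numpy as np


def bin_density(x, y, bins=30):
    """Hypothetical helper: density of the 2D-histogram bin containing each point."""
    counts, x_e, y_e = np.histogram2d(x, y, bins=bins, density=True)
    # np.digitize returns 1-based bin indices; clipping puts points that sit
    # exactly on the last edge into the final bin, as np.histogram2d does
    ix = np.clip(np.digitize(x, x_e) - 1, 0, counts.shape[0] - 1)
    iy = np.clip(np.digitize(y, y_e) - 1, 0, counts.shape[1] - 1)
    return counts[ix, iy]


rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = x * 3 + rng.normal(size=10_000)
z = bin_density(x, y, bins=30)
```

The resulting z can be passed to ax.scatter(x, y, c=z) as before. The colors change in visible steps at bin boundaries, which is the trade-off for dropping the interpolation.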

You could make a histogram:
import numpy as np
import matplotlib.pyplot as plt
# fake data:
a = np.random.normal(size=1000)
b = a*3 + np.random.normal(size=1000)
plt.hist2d(a, b, (50, 50), cmap=plt.cm.jet)
plt.colorbar()

Related

Choosing a specific contour in oscillatory data to plot with matplotlib

I have oscillatory data to which I would like to add a specific contour line. For example, the data pass through a value several times, and I would like to pick a specific instance of that value to highlight with a contour. As an example, consider a Bessel function. Below, I plot the contours with a single level, 0.2. I would like to choose to show only the outer contour, however, and not the other interior ones.
from scipy.special import jv
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-20,20,num=500)
y = np.linspace(-20,20,num=500)
[X,Y] = np.meshgrid(x,y)
Z = jv(1,np.sqrt(X**2.+Y**2.))
fig = plt.figure()
ax = fig.add_subplot(111)
cb = ax.pcolormesh(X,Y,Z)
ax.contour(X,Y,Z,[.2],linestyles='dashed')
cbar = fig.colorbar(cb)
plt.show()
If helpful, this is a plot of my actual data (the code used to create is far too long to include here). I would only like to plot the outermost purple contour:
Thank you
Let's see how you like this ;) ... I plot all contour lines invisibly, but extract the contour object and replot the first one (that I just figured out by trial and error, and might be different in your case).
from scipy.special import jv
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-20, 20, num=500)
y = np.linspace(-20, 20, num=500)
[X, Y] = np.meshgrid(x, y)
Z = jv(1, np.sqrt(X**2. + Y**2.))
fig = plt.figure()
ax = fig.add_subplot(111)
cb = ax.pcolormesh(X, Y, Z)
cont = ax.contour(X, Y, Z, [.2], alpha=0) # alpha = 0 -> invisible
the_interesting_one = cont.allsegs[0][0]
plt.plot(the_interesting_one[:, 0], the_interesting_one[:, 1], "k--")
cbar = fig.colorbar(cb)
plt.show()
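Instead of finding the right index by trial and error, the segment can also be chosen programmatically. The sketch below assumes that "outermost" means the segment whose vertices are, on average, farthest from the origin, which fits this radially symmetric example but may not fit other data; mean_radius is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from scipy.special import jv

x = np.linspace(-20, 20, num=500)
y = np.linspace(-20, 20, num=500)
X, Y = np.meshgrid(x, y)
Z = jv(1, np.sqrt(X**2 + Y**2))

fig, ax = plt.subplots()
cb = ax.pcolormesh(X, Y, Z)
# Invisible contour set, used only to extract the line segments
cont = ax.contour(X, Y, Z, [0.2], alpha=0)


def mean_radius(seg):
    """Average distance of a segment's vertices from the origin."""
    return np.hypot(seg[:, 0], seg[:, 1]).mean()


segments = cont.allsegs[0]  # all segments belonging to the single level 0.2
outermost = max(segments, key=mean_radius)
ax.plot(outermost[:, 0], outermost[:, 1], "k--")
fig.colorbar(cb)
```

For data that is not centered on the origin, a different criterion (for example the longest segment, or the largest enclosed area) would be needed.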

How to plot property distribution with interpolation?

I have a dataframe like this:
import random
import matplotlib.pyplot as plt
plt.style.use('ggplot')
fig = plt.figure(figsize=(16,8))
import pandas as pd
data = pd.DataFrame({"X":random.sample(range(530000, 560000), 60),
"Y":random.sample(range(8580000, 8620000), 60),
"PROPERTY":random.choices(range(0, 30), k=60)})
I saw an example where I could plot my PROPERTY along X and Y coordinates as a triangle spatial distribution:
x = data["X"]
y = data["Y"]
z = data["PROPERTY"]
# Plot Triangular Color Filled Contour
plt.tricontourf(x, y, z, cmap="rainbow")
plt.colorbar()
plt.tricontour(x, y, z)
# Set well shapes
plt.scatter(x, y, color='black')
plt.xlabel("X")
plt.ylabel("Y")
However, I would like to plot it as a different map type, without these abrupt data transitions, for example with kriging or a smooth interpolation like in this example:
Could anyone show me an example?
I used the pykrige package to interpolate the point data into a grid field.
The code and output figure are here.
import random
import matplotlib.pyplot as plt
plt.style.use('ggplot')
fig = plt.figure(figsize=(6,4))
import pandas as pd
from pykrige import OrdinaryKriging
import numpy as np
random.seed(100)
data = pd.DataFrame({"X":random.sample(range(530000, 560000), 60),
"Y":random.sample(range(8580000, 8620000), 60),
"PROPERTY":random.choices(range(0, 30), k=60)})
x = data["X"]
y = data["Y"]
z = data["PROPERTY"]
x1 = np.linspace(530000.,560000,700)
y1 = np.linspace(8580000,8620000,400)
dict1= {'sill': 1, 'range': 6500.0, 'nugget': .1}
OK = OrdinaryKriging(x,y,z,variogram_model='gaussian',
variogram_parameters=dict1,nlags=6)
zgrid,ss = OK.execute('grid',x1,y1)
xgrid,ygrid = np.meshgrid(x1,y1)
# Plot Triangular Color Filled Contour
# plt.tricontourf(x, y, z, cmap="rainbow")
plt.contourf(xgrid, ygrid, zgrid, cmap="rainbow")
plt.colorbar()
# Set well shapes
plt.scatter(x, y, color='black')
plt.xlabel("X")
plt.ylabel("Y")
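If pykrige is not available, a smooth surface can also be obtained with scipy.interpolate.griddata. To be clear, this is plain cubic interpolation and not kriging, so it is a different technique from the answer above. The sketch uses random stand-in data in the same coordinate ranges; the grid resolution is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import griddata

# Stand-in data in the same coordinate ranges as the question
rng = np.random.default_rng(100)
x = rng.uniform(530000, 560000, size=60)
y = rng.uniform(8580000, 8620000, size=60)
z = rng.integers(0, 30, size=60).astype(float)

# Regular grid on which the property is interpolated
x1 = np.linspace(530000, 560000, 70)
y1 = np.linspace(8580000, 8620000, 40)
xgrid, ygrid = np.meshgrid(x1, y1)

# Cubic interpolation; rescale=True normalizes the very different X and Y ranges.
# Grid nodes outside the convex hull of the data are returned as NaN.
zgrid = griddata((x, y), z, (xgrid, ygrid), method="cubic", rescale=True)

fig, ax = plt.subplots(figsize=(6, 4))
cf = ax.contourf(xgrid, ygrid, zgrid, levels=15, cmap="rainbow")
fig.colorbar(cf, ax=ax)
ax.scatter(x, y, color="black")
ax.set_xlabel("X")
ax.set_ylabel("Y")
```

Unlike kriging, this gives no variance estimate and does not extrapolate beyond the outermost data points.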

Using drawstyle "steps-mid" together with x-log-scale causes step points to be non-centered

Matplotlib offers various options for the drawstyle. steps-mid does the following:
The steps variants connect the points with step-like lines, i.e. horizontal lines with vertical steps. [...]
'steps-mid': The step is halfway between the points.
This works fine when the x-scale is linear. With a log scale, however, it still seems to compute the step points by averaging in data space rather than log space, so the data points are not centered between the steps.
import matplotlib.pyplot as plt
import numpy as np
x = np.logspace(0, 10, num=10)
y = np.arange(x.size) % 2
fig, ax = plt.subplots()
ax.set_xscale('log')
ax.plot(x, y, drawstyle='steps-mid', marker='s')
Is there a way to use step-like plotting together with x-log-scale such that the steps are centered between data points in log-space?
I don't know of a way other than building the steps correctly in log space yourself:
import matplotlib.pyplot as plt
import numpy as np
x = np.logspace(0, 10, num=10)
y = np.arange(x.size) % 2
def log_steps_mid(x, y, **kwargs):
    x_log = np.log10(x)
    x_log_mid = x_log[:-1] + np.diff(x_log)/2
    x_mid = 10 ** x_log_mid
    x_mid = np.hstack([x[0],
                       np.repeat(x_mid, 2),
                       x[-1]])
    y_mid = np.repeat(y, 2)
    ax.plot(x_mid, y_mid, **kwargs)
fig, ax = plt.subplots()
ax.set_xscale('log')
ax.plot(x, y, ls='', marker='s', color='b')
log_steps_mid(x, y, color='b')
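The midpoint in log space used above is simply the geometric mean of neighboring x values, whereas steps-mid uses the arithmetic mean. A small numerical check, written as an illustrative sketch with made-up variable names:

```python
import numpy as np

x = np.logspace(0, 10, num=10)

# Midpoints computed in log space, as in log_steps_mid above
mid_log = 10 ** (0.5 * (np.log10(x[:-1]) + np.log10(x[1:])))
# Equivalent closed form: geometric mean of neighboring points
mid_geo = np.sqrt(x[:-1] * x[1:])
# What averaging in data space gives instead
mid_lin = 0.5 * (x[:-1] + x[1:])

print(np.allclose(mid_log, mid_geo))  # the two log-space forms agree
print(mid_lin / mid_geo)              # ratio shows how far the linear midpoints are shifted
```

Because the arithmetic mean is always larger than the geometric mean for distinct positive values, the default step positions are shifted toward the right-hand point on a log axis.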

Partial shade of distribution plot using Seaborn

The following simple code:
import numpy as np
import seaborn as sns
dist = np.random.normal(loc=0, scale=1, size=1000)
ax = sns.kdeplot(dist, shade=True);
Yields the following image:
I would like to shade only the area to the right (or left) of some x value. What is the simplest way? I am ready to use something other than Seaborn.
After calling ax = sns.kdeplot(dist, shade=True), the last line in ax.get_lines() corresponds to the kde density curve:
ax = sns.kdeplot(dist, shade=True)
line = ax.get_lines()[-1]
You can extract the data corresponding to that curve using line.get_data:
x, y = line.get_data()
Once you have the data, you can, for instance, shade the region corresponding to x > 0 by selecting those points and calling ax.fill_between:
mask = x > 0
x, y = x[mask], y[mask]
ax.fill_between(x, y1=y, alpha=0.5, facecolor='red')
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
dist = np.random.normal(loc=0, scale=1, size=1000)
ax = sns.kdeplot(dist, shade=True)
line = ax.get_lines()[-1]
x, y = line.get_data()
mask = x > 0
x, y = x[mask], y[mask]
ax.fill_between(x, y1=y, alpha=0.5, facecolor='red')
plt.show()
Using seaborn is often fine for standard plots, but when some customized requirements come into play, falling back to matplotlib is often easier.
So one may first calculate the kernel density estimate and then plot it in the region of interest.
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
# On matplotlib >= 3.6 this style is named "seaborn-v0_8-darkgrid"
plt.style.use("seaborn-darkgrid")
dist = np.random.normal(loc=0, scale=1, size=1000)
kde = stats.gaussian_kde(dist)
# plot complete kde curve as line
pos = np.linspace(dist.min(), dist.max(), 101)
plt.plot(pos, kde(pos))
# plot shaded kde only right of x=0.5
shade = np.linspace(0.5,dist.max(), 101)
plt.fill_between(shade,kde(shade), alpha=0.5)
plt.ylim(0,None)
plt.show()
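A variant of the same approach evaluates the KDE once on a single grid and lets the where argument of fill_between select the shaded part. This is a sketch under the assumption that a threshold of 0.5 is wanted, as in the example above; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
dist = rng.normal(loc=0, scale=1, size=1000)
kde = gaussian_kde(dist)

# Evaluate the density once on a common grid
pos = np.linspace(dist.min(), dist.max(), 201)
density = kde(pos)
threshold = 0.5

fig, ax = plt.subplots()
ax.plot(pos, density)
# Shade only where the condition holds; everything else is left unfilled
ax.fill_between(pos, density, where=pos > threshold, alpha=0.5)
ax.set_ylim(0, None)
```

The shaded edge falls on the nearest grid point, so a finer pos grid gives a sharper boundary at the threshold.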

Plotting a 2D heatmap

Using Matplotlib, I want to plot a 2D heat map. My data is an n-by-n Numpy array, each with a value between 0 and 1. So for the (i, j) element of this array, I want to plot a square at the (i, j) coordinate in my heat map, whose color is proportional to the element's value in the array.
How can I do this?
The imshow() function with parameters interpolation='nearest' and cmap='hot' should do what you want.
Please review the interpolation parameter details, and see Interpolations for imshow and Image antialiasing.
import matplotlib.pyplot as plt
import numpy as np
a = np.random.random((16, 16))
plt.imshow(a, cmap='hot', interpolation='nearest')
plt.show()
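By default imshow scales the colormap to the minimum and maximum of the data. Since the question states that all values lie between 0 and 1, fixing vmin and vmax makes the color proportional to the value itself, and a colorbar makes the mapping readable. A minimal sketch of that adjustment:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((16, 16))  # n-by-n array with values in [0, 1)

fig, ax = plt.subplots()
# vmin/vmax pin the color scale to the full 0..1 range instead of the data range
im = ax.imshow(a, cmap="hot", interpolation="nearest", vmin=0, vmax=1)
fig.colorbar(im, ax=ax, label="Value")
```

Note that imshow draws element (i, j) at row i, column j with the origin in the upper-left corner; pass origin='lower' if the first row should be at the bottom.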
Seaborn is a high-level API for matplotlib, which takes care of a lot of the manual work.
seaborn.heatmap automatically plots a gradient at the side of the chart etc.
import numpy as np
import seaborn as sns
import matplotlib.pylab as plt
uniform_data = np.random.rand(10, 12)
ax = sns.heatmap(uniform_data, linewidth=0.5)
plt.show()
You can even plot only the upper or lower triangle of a square matrix. This is useful, for example, for a correlation matrix, which is square and symmetric, so plotting all values would be redundant.
corr = np.corrcoef(np.random.randn(10, 200))
mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
    ax = sns.heatmap(corr, mask=mask, vmax=.3, square=True, cmap="YlGnBu")
plt.show()
I would use matplotlib's pcolor/pcolormesh function since it allows nonuniform spacing of the data.
Example taken from matplotlib:
import matplotlib.pyplot as plt
import numpy as np
# generate 2 2d grids for the x & y bounds
y, x = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
z = (1 - x / 2. + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)
# x and y are bounds, so z should be the value *inside* those bounds.
# Therefore, remove the last value from the z array.
z = z[:-1, :-1]
z_min, z_max = -np.abs(z).max(), np.abs(z).max()
fig, ax = plt.subplots()
c = ax.pcolormesh(x, y, z, cmap='RdBu', vmin=z_min, vmax=z_max)
ax.set_title('pcolormesh')
# set the limits of the plot to the limits of the data
ax.axis([x.min(), x.max(), y.min(), y.max()])
fig.colorbar(c, ax=ax)
plt.show()
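The example above uses an evenly spaced grid, so it does not actually show the nonuniform spacing mentioned in the answer. The sketch below uses made-up, unevenly spaced cell edges to illustrate it; the edge values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Unevenly spaced cell boundaries (illustrative values)
x_edges = np.array([0.0, 1.0, 3.0, 7.0, 15.0])
y_edges = np.array([0.0, 0.5, 2.0, 4.0])

# One value per cell: (number of y edges - 1) rows, (number of x edges - 1) columns
rng = np.random.default_rng(0)
z = rng.random((len(y_edges) - 1, len(x_edges) - 1))

fig, ax = plt.subplots()
# shading='flat' treats x_edges and y_edges as the corners of each colored cell
mesh = ax.pcolormesh(x_edges, y_edges, z, cmap="RdBu", shading="flat")
fig.colorbar(mesh, ax=ax)
```

With imshow every cell would be drawn at the same size; here each cell's width and height follow the supplied edges.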
For a 2D NumPy array, simply using imshow() may help:
import matplotlib.pyplot as plt
import numpy as np
def heatmap2d(arr: np.ndarray):
    plt.imshow(arr, cmap='viridis')
    plt.colorbar()
    plt.show()
test_array = np.arange(100 * 100).reshape(100, 100)
heatmap2d(test_array)
This code produces a continuous heatmap.
You can choose another built-in colormap from here.
Here's how to do it from a csv:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata
# Load data from CSV
dat = np.genfromtxt('dat.xyz', delimiter=' ',skip_header=0)
X_dat = dat[:,0]
Y_dat = dat[:,1]
Z_dat = dat[:,2]
# np.genfromtxt already returns NumPy arrays, so the columns can be used directly
X, Y, Z = X_dat, Y_dat, Z_dat
# create x-y points to be used in heatmap
xi = np.linspace(X.min(), X.max(), 1000)
yi = np.linspace(Y.min(), Y.max(), 1000)
# Interpolate for plotting
zi = griddata((X, Y), Z, (xi[None,:], yi[:,None]), method='cubic')
# I control the range of my colorbar by removing data
# outside of my range of interest
zmin = 3
zmax = 12
zi[(zi<zmin) | (zi>zmax)] = np.nan
# Create the contour plot
CS = plt.contourf(xi, yi, zi, 15, cmap=plt.cm.rainbow,
                  vmax=zmax, vmin=zmin)
plt.colorbar()
plt.show()
where dat.xyz is in the form
x1 y1 z1
x2 y2 z2
...
Use matshow() which is a wrapper around imshow to set useful defaults for displaying a matrix.
a = np.diag(range(15))
plt.matshow(a)
https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.matshow.html
This is just a convenience function wrapping imshow to set useful defaults for displaying a matrix. In particular:
Set origin='upper'.
Set interpolation='nearest'.
Set aspect='equal'.
Ticks are placed to the left and above.
Ticks are formatted to show integer indices.
Here is a new python package to plot complex heatmaps with different kinds of row/columns annotations in Python: https://github.com/DingWB/PyComplexHeatmap
