I have data in 3 dimensions. I would like to plot the first two dimensions and colorize by the third. I want it to show up as an image like hist2d would produce, except that instead of being colorized by the occupancy of each bin in the first two dimensions, it should be colorized by the third dimension. I think this will require binning everything. How can this be achieved?
Example data:
x = np.random.normal(loc=10, scale=2, size=100)
y = np.random.normal(loc=25, scale=5, size=100)
z = np.cos(x)+np.sin(y)
I want to plot x vs y and colorize by the intensity z. But, not just a scatterplot, I want it to come out as an image like this.
The easy solution, since the data is not structured on a grid, is to use tripcolor from matplotlib (there is also tricontourf):
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
x = np.random.normal(loc=10, scale=2, size=100)
y = np.random.normal(loc=25, scale=5, size=100)
z = np.cos(x)+np.sin(y)
plt.tripcolor(x, y, z);
plt.plot(x, y, '.k');
The other solution is to interpolate the data onto a regular grid prior to the visualization, using, for instance, griddata from SciPy:
from scipy.interpolate import griddata
# define the grid
x_fine = np.linspace(min(x), max(x), 200)
y_fine = np.linspace(min(y), max(y), 200)
x_grid, y_grid = np.meshgrid(x_fine, y_fine)
# interpolate the data:
z_grid = griddata((x, y), z, (x_grid.ravel(), y_grid.ravel()), method='cubic').reshape(x_grid.shape)
plt.pcolor(x_fine, y_fine, z_grid);
plt.plot(x, y, '.k');
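If you literally want binned output, like hist2d but colored by the mean of z in each bin, one option is scipy.stats.binned_statistic_2d. This is a minimal sketch, assuming 20 bins per axis and the mean as the statistic (both are arbitrary choices of mine). Empty bins come back as NaN and are masked so that they are left blank:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic_2d

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = rng.normal(loc=10, scale=2, size=100)
y = rng.normal(loc=25, scale=5, size=100)
z = np.cos(x) + np.sin(y)

# Mean of z inside each (x, y) bin; bins without any points become NaN.
z_mean, x_edges, y_edges, _ = binned_statistic_2d(
    x, y, z, statistic='mean', bins=20)

# Points per bin on the same edges, i.e. what hist2d would have shown.
counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])

fig, ax = plt.subplots()
# The statistic is indexed [x_bin, y_bin]; pcolormesh wants rows = y, so transpose.
mesh = ax.pcolormesh(x_edges, y_edges, np.ma.masked_invalid(z_mean.T),
                     cmap='viridis')
fig.colorbar(mesh, ax=ax, label='mean z per bin')
```

Call plt.show() afterwards to display the figure. With only 100 points most bins are empty, so fewer bins (or more data) will give a more image-like result.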
I use ggplot for R, not so much for python, but, here's a sample:
import pandas as pd
import numpy as np
# is this the best implementation of ggplot?
from plotnine import *
x = np.random.normal(loc=10, scale=2, size=100)
y = np.random.normal(loc=25, scale=5, size=100)
z = np.cos(x)+np.sin(y)
df = pd.DataFrame({'x':x, 'y':y, 'z':z})
p = ggplot(df, aes(x='x', y='y', colour='z')) + geom_point()
p = p + scale_color_distiller(type='div', palette='RdYlBu')
p
My data points are:
x =[5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03]
y= [494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715]
The x axis on my plot must be logarithmic!
I want to make a regression line such as the image added, in an S shape. How do I do this (in matlab or python)?
IMG
UPDATE: I tried:
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
import numpy as np
#create data
x = np.array([5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03])
y= np.array([494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715])
#define x as 100 equally spaced values between the min and max of original x
xnew = np.linspace(x.min(), x.max(), 100)
#define spline
spl = make_interp_spline(x, y, k=2)
y_smooth = spl(xnew)
#create smooth line chart
plt.plot(x,y, 'o', xnew, y_smooth)
plt.xscale("log")
plt.show()
My results are: results
How can I make it even smoother? Varying k doesn't make it better.
Note that the higher the degree you use for the k argument, the more “wiggly” the curve will be.
Depending on how curved you want the line to be, you can modify the value of k.
Try this:
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
import numpy as np
#create data
x = np.array([5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05, 3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03])
y= np.array([494.55, 333.4666667, 333.3333333, 333.1, 303.4966667, 197.7533333, 66.43333333, 67.715])
#define x as 200 equally spaced values between the min and max of original x
xnew = np.linspace(x.min(), x.max(), 200)
#define spline
spl = make_interp_spline(x, y, k=3)
y_smooth = spl(xnew)
#create smooth line chart
plt.plot(xnew, y_smooth)
plt.show()
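Because the x axis is logarithmic, it may help to interpolate against log10(x) rather than x. np.linspace puts about 90% of its samples in the last decade, so the curve is drawn with only a few straight segments across the earlier ones. The sketch below makes that change and also swaps in PchipInterpolator, a shape-preserving cubic that does not overshoot between data points. That swap is my substitution, not part of the answer above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import PchipInterpolator

x = np.array([5.00E-07, 1.40E-06, 4.10E-06, 1.25E-05,
              3.70E-05, 1.11E-04, 3.33E-04, 1.00E-03])
y = np.array([494.55, 333.4666667, 333.3333333, 333.1,
              303.4966667, 197.7533333, 66.43333333, 67.715])

# Interpolate in log10(x) so the curve is smooth on the log axis.
log_x = np.log10(x)
pchip = PchipInterpolator(log_x, y)

# 200 samples evenly spaced on a log scale between min and max of x.
xnew = np.geomspace(x.min(), x.max(), 200)
y_smooth = pchip(np.log10(xnew))

fig, ax = plt.subplots()
ax.plot(x, y, 'o', label='data')
ax.plot(xnew, y_smooth, label='PCHIP in log10(x)')
ax.set_xscale('log')
ax.legend()
```

Call plt.show() afterwards to display the figure. This is still an interpolation that passes through every point; if a true regression curve is wanted, fitting a sigmoid model would be a different approach.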
I have a spreadsheet file that I would like to input to create a 3D surface graph using Matplotlib in Python.
I used plot_trisurf and it worked, but I need the projections of the contour profiles onto the graph that I can get with the surface function, like this example.
I'm struggling to arrange my Z data in a 2D array that I can use to input in the plot_surface method. I tried a lot of things, but none seems to work.
Here is what I have working, using plot_trisurf:
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import pandas as pd
df=pd.read_excel ("/Users/carolethais/Desktop/Dissertação Carol/Códigos/Resultados/res_02_0.5.xlsx")
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in newer Matplotlib
# I got the graph using trisurf
graf=ax.plot_trisurf(df["Diametro"],df["Comprimento"], df["temp_out"], cmap=matplotlib.cm.coolwarm)
ax.set_xlim(0, 0.5)
ax.set_ylim(0, 100)
ax.set_zlim(25,40)
fig.colorbar(graf, shrink=0.5, aspect=15)
ax.set_xlabel('Diâmetro (m)')
ax.set_ylabel('Comprimento (m)')
ax.set_zlabel('Temperatura de Saída (ºC)')
plt.show()
This is a part of my df, dataframe:
       Diametro  Comprimento   temp_out
0      0.334294     0.787092  34.801994
1      0.334294     8.187065  32.465551
2      0.334294    26.155976  29.206090
3      0.334294    43.648591  27.792126
4      0.334294    60.768219  27.163233
...         ...          ...        ...
59995  0.437266    14.113660  31.947302
59996  0.437266    25.208851  30.317583
59997  0.437266    33.823035  29.405461
59998  0.437266    57.724209  27.891616
59999  0.437266    62.455890  27.709298
I tried this approach to use the imported data with plot_surface. It did produce a graph, but not a correct one; this is how the graph looked:
Thank you so much
A different approach, based on re-gridding the data, that doesn't require that the original data is specified on a regular grid [deeply inspired by this example;-].
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.tri as tri
from mpl_toolkits.mplot3d import Axes3D
np.random.seed(19880808)
# compute the sombrero over a cloud of random points
npts = 10000
x, y = np.random.uniform(-5, 5, npts), np.random.uniform(-5, 5, npts)
z = np.cos(1.5*np.sqrt(x*x + y*y))/(1+0.33*(x*x+y*y))
# prepare the interpolator
triang = tri.Triangulation(x, y)
interpolator = tri.LinearTriInterpolator(triang, z)
# do the interpolation
xi = yi = np.linspace(-5, 5, 101)
Xi, Yi = np.meshgrid(xi, yi)
Zi = interpolator(Xi, Yi)
# plotting
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in newer Matplotlib
norm = plt.Normalize(-1, 1)
ax.plot_surface(Xi, Yi, Zi,
                cmap='inferno',
                norm=norm)
plt.show()
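Once the data is on a regular grid, the contour projections asked for in the question can be added with ax.contour and its zdir/offset arguments. This is a sketch on the same synthetic sombrero data, assuming 2000 points, a grid kept slightly inside the data range, and offsets placed on the axes limits (all of these are my choices):

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri

rng = np.random.default_rng(19880808)
npts = 2000
x, y = rng.uniform(-5, 5, npts), rng.uniform(-5, 5, npts)
z = np.cos(1.5*np.sqrt(x*x + y*y)) / (1 + 0.33*(x*x + y*y))

# Re-grid the scattered points; nodes outside the triangulation are masked,
# so convert any such nodes to NaN.
interpolator = tri.LinearTriInterpolator(tri.Triangulation(x, y), z)
xi = yi = np.linspace(-4.5, 4.5, 61)
Xi, Yi = np.meshgrid(xi, yi)
Zi = np.ma.filled(interpolator(Xi, Yi), np.nan)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(Xi, Yi, Zi, cmap='inferno', alpha=0.7)

# Project the isolines onto the three coordinate planes.
ax.contour(Xi, Yi, Zi, zdir='z', offset=-1.0, cmap='inferno')
ax.contour(Xi, Yi, Zi, zdir='x', offset=-5.0, cmap='inferno')
ax.contour(Xi, Yi, Zi, zdir='y', offset=5.0, cmap='inferno')
ax.set(xlim=(-5, 5), ylim=(-5, 5), zlim=(-1, 1))
```

Call plt.show() afterwards to display the figure. For the spreadsheet data, x, y and z would be the Diametro, Comprimento and temp_out columns, with the grid limits adjusted to match.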
plot_trisurf expects x, y, z as 1D arrays, while plot_surface expects X, Y, Z as 2D arrays, or x, y, Z with x, y being 1D arrays and Z a 2D array.
Your data consists of 3 1D arrays, so plotting them with plot_trisurf is immediate but you need to use plot_surface to be able to project the isolines on the coordinate planes... You need to reshape your data.
It seems that you have 60000 data points. In the following I assume that you have a regular grid of 300 points in the x direction and 200 points in y, but what matters is the idea of a regular grid.
The code below shows
1. the use of plot_trisurf (with a coarser mesh), similar to your code;
2. the correct use of reshaping and its application in plot_surface; note that the number of rows in reshaping corresponds to the number of points in y and the number of columns to the number of points in x;
3. and 4. incorrect use of reshaping; the resulting subplots are somewhat similar to the plot you showed, so maybe you just need to fix the number of rows and columns.
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
x, y = np.arange(30)/3.-5, np.arange(20)/2.-5
x, y = (arr.flatten() for arr in np.meshgrid(x, y))
z = np.cos(1.5*np.sqrt(x*x + y*y))/(1+0.1*(x*x+y*y))
fig, axes = plt.subplots(2, 2, subplot_kw={"projection" : "3d"})
axes = iter(axes.flatten())
ax = next(axes)
ax.plot_trisurf(x,y,z, cmap='Reds')
ax.set_title('Trisurf')
X, Y, Z = (arr.reshape(20,30) for arr in (x,y,z))
ax = next(axes)
ax.plot_surface(X,Y,Z, cmap='Reds')
ax.set_title('Surface 20×30')
X, Y, Z = (arr.reshape(30,20) for arr in (x,y,z))
ax = next(axes)
ax.plot_surface(X,Y,Z, cmap='Reds')
ax.set_title('Surface 30×20')
X, Y, Z = (arr.reshape(40,15) for arr in (x,y,z))
ax = next(axes)
ax.plot_surface(X,Y,Z, cmap='Reds')
ax.set_title('Surface 40×15')
plt.tight_layout()
plt.show()
I have data in a NumPy array and would like to view it as a 3D image. Here is an example in which I can view a 2D image of size (100, 100); it is the slice in the xy-plane at z-index 0 (which, for this grid, is z = -10).
import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
X, Y, Z = np.mgrid[-10:10:100j, -10:10:100j, -10:10:100j]
T = np.sin(X*Y*Z)/(X*Y*Z)
T=T[:,:,0]
im = plt.imshow(T, cmap='hot')
plt.colorbar(im, orientation='vertical')
plt.show()
How can I view a 3D image of the data T of shape (100, 100, 100)?
I think the main problem is that you have four pieces of information for each point, so you are actually interested in a 4-dimensional object. Plotting this is always difficult (maybe even impossible). I suggest one of the following solutions:
1. Change the question to: I'm not interested in all combinations of x, y, z, but only in the ones where z = f(x,y).
2. Reduce the resolution of your plot a bit: say that you don't need 100 levels of z but only maybe 5, and then simply make 5 of the plots you already have.
If you want to use the first method, there are several submethods:
A. Plot the 2-dim surface f(x,y)=z and color it with T.
B. Use any technique that is used to plot complex functions; for more info see here.
The plot given by method 1.A (which I think is the best solution) with z = (x^2+y^2)/10 yields:
I used this program:
import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib as mpl
X, Y = np.mgrid[-10:10:100j, -10:10:100j]
Z = (X**2+Y**2)/10 #definition of f
T = np.sin(X*Y*Z)
norm = mpl.colors.Normalize(vmin=np.amin(T), vmax=np.amax(T))
T = mpl.cm.hot(norm(T)) #change T to colors, rescaled to [0, 1] first
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in newer Matplotlib
surf = ax.plot_surface(X, Y, Z, facecolors=T, linewidth=0,
                       cstride=1, rstride=1)
plt.show()
The second method gives something like:
With the code:
norm = mpl.colors.Normalize(vmin=-1, vmax=1)
X, Y = np.mgrid[-10:10:101j, -10:10:101j]
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) was removed in newer Matplotlib
for i in np.linspace(-1, 1, 5):
    Z = np.zeros(X.shape) + i
    T = np.sin(X*Y*Z)
    T = mpl.cm.hot(norm(T))  # rescale [-1, 1] to [0, 1] before mapping to colors
    ax.plot_surface(X, Y, Z, facecolors=T, linewidth=0, alpha=0.5,
                    cstride=10, rstride=10)
plt.show()
Note: I changed the function to T = sin(X*Y*Z) because dividing by X*Y*Z makes the function badly behaved, as you divide two numbers very close to 0.
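If what you are after is an overall impression of the full (100, 100, 100) volume in Matplotlib alone, a subsampled 3D scatter is another option. This is a rough sketch, assuming that keeping every 5th sample and hiding low-magnitude points is acceptable (the 0.3 threshold is arbitrary). np.sinc is used so that sin(u)/u stays finite at u = 0:

```python
import numpy as np
import matplotlib.pyplot as plt

X, Y, Z = np.mgrid[-10:10:100j, -10:10:100j, -10:10:100j]
# np.sinc(t) is sin(pi*t)/(pi*t), so this is sin(XYZ)/(XYZ) with the limit 1 at 0.
T = np.sinc(X*Y*Z/np.pi)

# Keep every 5th sample along each axis (20 x 20 x 20 points).
step = 5
sub = (slice(None, None, step),) * 3
xs, ys, zs, ts = (a[sub].ravel() for a in (X, Y, Z, T))

# Hide points with small |T| so the interior structure stays visible.
keep = np.abs(ts) > 0.3

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
sc = ax.scatter(xs[keep], ys[keep], zs[keep], c=ts[keep],
                cmap='hot', s=8, alpha=0.6)
fig.colorbar(sc, ax=ax, shrink=0.6, label='T')
```

Call plt.show() afterwards to display the figure. Matplotlib is not a volume renderer, so this only gives a coarse picture.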
I have found a solution to my question. If we have the NumPy data, we can convert it into TVTK ImageData, and visualization is then possible with the help of mlab from Mayavi. The code and its 3D visualization are the following:
from tvtk.api import tvtk
import numpy as np
from mayavi import mlab
X, Y, Z = np.mgrid[-10:10:100j, -10:10:100j, -10:10:100j]
data = np.sin(X*Y*Z)/(X*Y*Z)
i = tvtk.ImageData(spacing=(1, 1, 1), origin=(0, 0, 0))
i.point_data.scalars = data.ravel()
i.point_data.scalars.name = 'scalars'
i.dimensions = data.shape
mlab.pipeline.surface(i)
mlab.colorbar(orientation='vertical')
mlab.show()
For another randomly generated data
from numpy import random
data = random.random((20, 20, 20))
The visualization will be
Using Matplotlib, I want to plot a 2D heat map. My data is an n-by-n Numpy array, each with a value between 0 and 1. So for the (i, j) element of this array, I want to plot a square at the (i, j) coordinate in my heat map, whose color is proportional to the element's value in the array.
How can I do this?
The imshow() function with parameters interpolation='nearest' and cmap='hot' should do what you want.
Please review the interpolation parameter details, and see Interpolations for imshow and Image antialiasing.
import matplotlib.pyplot as plt
import numpy as np
a = np.random.random((16, 16))
plt.imshow(a, cmap='hot', interpolation='nearest')
plt.show()
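One detail worth checking against the question: imshow draws a[i, j] with the row index i running down the vertical axis. If element (i, j) should instead sit at x = i, y = j with the origin in the lower-left corner, one way is to transpose the array and pass origin='lower'. This is a sketch, assuming that is the orientation wanted; vmin and vmax pin the color scale to the stated 0 to 1 range:

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.random.random((16, 16))

fig, ax = plt.subplots()
# Transposed so that the first index runs along x; origin='lower' puts (0, 0)
# in the bottom-left corner.
im = ax.imshow(a.T, cmap='hot', interpolation='nearest',
               origin='lower', vmin=0, vmax=1)
ax.set_xlabel('i')
ax.set_ylabel('j')
fig.colorbar(im, ax=ax)
```

Call plt.show() afterwards to display the figure.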
Seaborn is a high-level API for matplotlib, which takes care of a lot of the manual work.
seaborn.heatmap automatically plots a gradient at the side of the chart etc.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
uniform_data = np.random.rand(10, 12)
ax = sns.heatmap(uniform_data, linewidth=0.5)
plt.show()
You can even plot upper / lower left / right triangles of square matrices. For example, a correlation matrix, which is square and is symmetric, so plotting all values would be redundant.
corr = np.corrcoef(np.random.randn(10, 200))
mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
    ax = sns.heatmap(corr, mask=mask, vmax=.3, square=True, cmap="YlGnBu")
plt.show()
I would use matplotlib's pcolor/pcolormesh function since it allows nonuniform spacing of the data.
Example taken from matplotlib:
import matplotlib.pyplot as plt
import numpy as np
# generate 2 2d grids for the x & y bounds
y, x = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
z = (1 - x / 2. + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)
# x and y are bounds, so z should be the value *inside* those bounds.
# Therefore, remove the last value from the z array.
z = z[:-1, :-1]
z_min, z_max = -np.abs(z).max(), np.abs(z).max()
fig, ax = plt.subplots()
c = ax.pcolormesh(x, y, z, cmap='RdBu', vmin=z_min, vmax=z_max)
ax.set_title('pcolormesh')
# set the limits of the plot to the limits of the data
ax.axis([x.min(), x.max(), y.min(), y.max()])
fig.colorbar(c, ax=ax)
plt.show()
For a 2D NumPy array, simply using imshow() may help:
import matplotlib.pyplot as plt
import numpy as np
def heatmap2d(arr: np.ndarray):
    plt.imshow(arr, cmap='viridis')
    plt.colorbar()
    plt.show()
test_array = np.arange(100 * 100).reshape(100, 100)
heatmap2d(test_array)
This code produces a continuous heatmap.
You can choose another built-in colormap from here.
Here's how to do it from a csv:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata
# Load data from CSV
dat = np.genfromtxt('dat.xyz', delimiter=' ',skip_header=0)
# genfromtxt already returns a NumPy array, so the columns can be used directly
X = dat[:,0]
Y = dat[:,1]
Z = dat[:,2]
# create x-y points to be used in heatmap
xi = np.linspace(X.min(), X.max(), 1000)
yi = np.linspace(Y.min(), Y.max(), 1000)
# Interpolate for plotting
zi = griddata((X, Y), Z, (xi[None,:], yi[:,None]), method='cubic')
# I control the range of my colorbar by removing data
# outside of my range of interest
zmin = 3
zmax = 12
zi[(zi<zmin) | (zi>zmax)] = np.nan
# Create the contour plot
CS = plt.contourf(xi, yi, zi, 15, cmap=plt.cm.rainbow,
                  vmax=zmax, vmin=zmin)
plt.colorbar()
plt.show()
where dat.xyz is in the form
x1 y1 z1
x2 y2 z2
...
Use matshow() which is a wrapper around imshow to set useful defaults for displaying a matrix.
import numpy as np
import matplotlib.pyplot as plt
a = np.diag(range(15))
plt.matshow(a)
https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.matshow.html
This is just a convenience function wrapping imshow to set useful defaults for displaying a matrix. In particular:
Set origin='upper'.
Set interpolation='nearest'.
Set aspect='equal'.
Ticks are placed to the left and above.
Ticks are formatted to show integer indices.
Here is a new python package to plot complex heatmaps with different kinds of row/columns annotations in Python: https://github.com/DingWB/PyComplexHeatmap
I'd like to make a scatter plot where each point is colored by the spatial density of nearby points.
I've come across a very similar question, which shows an example of this using R:
R Scatter Plot: symbol color represents number of overlapping points
What's the best way to accomplish something similar in python using matplotlib?
In addition to hist2d or hexbin as @askewchan suggested, you can use the same method that the accepted answer in the question you linked to uses.
If you want to do that:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=100)
plt.show()
If you'd like the points to be plotted in order of density so that the densest points are always on top (similar to the linked example), just sort them by the z-values. I'm also going to use a smaller marker size here as it looks a bit better:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
# Generate fake data
x = np.random.normal(size=1000)
y = x * 3 + np.random.normal(size=1000)
# Calculate the point density
xy = np.vstack([x,y])
z = gaussian_kde(xy)(xy)
# Sort the points by density, so that the densest points are plotted last
idx = z.argsort()
x, y, z = x[idx], y[idx], z[idx]
fig, ax = plt.subplots()
ax.scatter(x, y, c=z, s=50)
plt.show()
Plotting >100k data points?
The accepted answer, using gaussian_kde() will take a lot of time. On my machine, 100k rows took about 11 minutes. Here I will add two alternative methods (mpl-scatter-density and datashader) and compare the given answers with same dataset.
In the following, I used a test data set of 100k rows:
import matplotlib.pyplot as plt
import numpy as np
# Fake data for testing
x = np.random.normal(size=100000)
y = x * 3 + np.random.normal(size=100000)
Output & computation time comparison
Below is a comparison of different methods.
1: mpl-scatter-density
Installation
pip install mpl-scatter-density
Example code
import mpl_scatter_density # adds projection='scatter_density'
from matplotlib.colors import LinearSegmentedColormap
# "Viridis-like" colormap with white background
white_viridis = LinearSegmentedColormap.from_list('white_viridis', [
(0, '#ffffff'),
(1e-20, '#440053'),
(0.2, '#404388'),
(0.4, '#2a788e'),
(0.6, '#21a784'),
(0.8, '#78d151'),
(1, '#fde624'),
], N=256)
def using_mpl_scatter_density(fig, x, y):
    ax = fig.add_subplot(1, 1, 1, projection='scatter_density')
    density = ax.scatter_density(x, y, cmap=white_viridis)
    fig.colorbar(density, label='Number of points per pixel')
fig = plt.figure()
using_mpl_scatter_density(fig, x, y)
plt.show()
Drawing this took 0.05 seconds:
And the zoom-in looks quite nice:
2: datashader
Datashader is an interesting project. It has added support for matplotlib in datashader 0.12.
Installation
pip install datashader
Code (source & parameter listing for dsshow):
import datashader as ds
from datashader.mpl_ext import dsshow
import pandas as pd
def using_datashader(ax, x, y):
    df = pd.DataFrame(dict(x=x, y=y))
    dsartist = dsshow(
        df,
        ds.Point("x", "y"),
        ds.count(),
        vmin=0,
        vmax=35,
        norm="linear",
        aspect="auto",
        ax=ax,
    )
    plt.colorbar(dsartist)
fig, ax = plt.subplots()
using_datashader(ax, x, y)
plt.show()
It took 0.83 s to draw this:
It is also possible to colorize by a third variable. The third parameter of dsshow controls the coloring. See more examples here and the source for dsshow here.
3: scatter_with_gaussian_kde
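from scipy.stats import gaussian_kde
def scatter_with_gaussian_kde(ax, x, y):
    # https://stackoverflow.com/a/20107592/3015186
    # Answer by Joel Kington
    xy = np.vstack([x, y])
    z = gaussian_kde(xy)(xy)
    # edgecolor='none' (an empty string is rejected by newer Matplotlib)
    ax.scatter(x, y, c=z, s=100, edgecolor='none')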
It took 11 minutes to draw this:
4: using_hist2d
import matplotlib.pyplot as plt
def using_hist2d(ax, x, y, bins=(50, 50)):
    # https://stackoverflow.com/a/20105673/3015186
    # Answer by askewchan
    ax.hist2d(x, y, bins, cmap=plt.cm.jet)
It took 0.021 s to draw this with bins=(50,50):
It took 0.173 s to draw this with bins=(1000,1000):
Cons: The zoomed-in data does not look as good as with mpl-scatter-density or datashader. Also, you have to determine the number of bins yourself.
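On the last point: if you would rather not pick the bin counts by hand, NumPy's automatic bin-width rules can supply the edges. This is a sketch, assuming the Freedman-Diaconis rule ('fd') applied to each axis separately is a reasonable starting point for your data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = rng.normal(size=100000)
y = x * 3 + rng.normal(size=100000)

# Let NumPy choose the bin edges for each axis independently.
x_edges = np.histogram_bin_edges(x, bins='fd')
y_edges = np.histogram_bin_edges(y, bins='fd')

fig, ax = plt.subplots()
counts, _, _, img = ax.hist2d(x, y, bins=[x_edges, y_edges], cmap='viridis')
fig.colorbar(img, ax=ax, label='points per bin')
```

Call plt.show() afterwards to display the figure. The rule is designed for 1D histograms, so the resulting 2D bins may still need coarsening by hand.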
5: density_scatter
The code is as in the answer by Guillaume.
It took 0.073 s to draw this with bins=(50,50):
It took 0.368 s to draw this with bins=(1000,1000):
Also, if the number of points makes the KDE calculation too slow, the color can be interpolated from np.histogram2d [Update in response to comments: if you wish to show the colorbar, use plt.scatter() instead of ax.scatter(), followed by plt.colorbar()]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.colors import Normalize
from scipy.interpolate import interpn
def density_scatter( x , y, ax = None, sort = True, bins = 20, **kwargs ) :
    """
    Scatter plot colored by 2d histogram
    """
    if ax is None :
        fig , ax = plt.subplots()
    else :
        fig = ax.get_figure()  # needed for the colorbar when an axes is passed in
    data , x_e, y_e = np.histogram2d( x, y, bins = bins, density = True )
    z = interpn( ( 0.5*(x_e[1:] + x_e[:-1]) , 0.5*(y_e[1:]+y_e[:-1]) ) , data , np.vstack([x,y]).T , method = "splinef2d", bounds_error = False)
    #To be sure to plot all data
    z[np.where(np.isnan(z))] = 0.0
    # Sort the points by density, so that the densest points are plotted last
    if sort :
        idx = z.argsort()
        x, y, z = x[idx], y[idx], z[idx]
    ax.scatter( x, y, c=z, **kwargs )
    norm = Normalize(vmin = np.min(z), vmax = np.max(z))
    cbar = fig.colorbar(cm.ScalarMappable(norm = norm), ax=ax)
    cbar.ax.set_ylabel('Density')
    return ax
if "__main__" == __name__ :
    x = np.random.normal(size=100000)
    y = x * 3 + np.random.normal(size=100000)
    density_scatter( x, y, bins = [30,30] )
You could make a histogram:
import numpy as np
import matplotlib.pyplot as plt
# fake data:
a = np.random.normal(size=1000)
b = a*3 + np.random.normal(size=1000)
plt.hist2d(a, b, (50, 50), cmap=plt.cm.jet)
plt.colorbar()