heatmap with variable datapoint width - python

I want to plot the coefficients of a linear model over time.
On the y-axis you have the i-th feature of my model, on the x-axis is time and the value of the i-th coefficient is color coded.
In my example, the coefficients are constant from 0 to t1, t1 to t2 and so on. The intervals are not equally sized. Currently I circumvent this by creating many points spaced by delta t:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
xi1 = [0, 1, 2]
t1 = range(4)
xi2 = [1, 1, 2]
t2 = range(5, 8)
data= np.vstack([xi1]*len(t1) + [xi2]*len(t2)).T
sns.heatmap(data)
Is there a way to do this more efficiently (without creating the redundant information)? I am also looking to have the right x-axis labels according to my t values.

You can use a matplotlib pcolormesh.
import matplotlib.pyplot as plt
import numpy as np
a = [[0,1],[1,1],[2,2]]
y = [0,1,2,3]
x = [0,5,8]
X,Y = np.meshgrid(x,y)
Z = np.array(a)
cmap = plt.get_cmap("RdPu", 3)
plt.pcolormesh(X,Y,Z, cmap=cmap)
plt.gca().invert_yaxis()
plt.colorbar(boundaries=np.arange(-0.5,3), ticks=np.unique(Z))
plt.show()
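The question also asked for x-axis ticks that match the t values. A minimal sketch of one way to do that, assuming the same interval boundaries (0, 5, 8) as the answer above; the names t_edges, y_edges and coefs are chosen here for illustration and do not appear in the original code:

```python
import numpy as np
import matplotlib.pyplot as plt

# Interval boundaries on the time axis: one more edge than there are intervals
t_edges = np.array([0, 5, 8])
# One row per feature, one column per time interval
coefs = np.array([[0, 1],
                  [1, 1],
                  [2, 2]])
# Feature i occupies [i - 0.5, i + 0.5], so its tick sits at the row centre
y_edges = np.arange(coefs.shape[0] + 1) - 0.5

fig, ax = plt.subplots()
mesh = ax.pcolormesh(t_edges, y_edges, coefs, cmap="RdPu")
ax.set_xticks(t_edges)                    # ticks exactly at the interval boundaries
ax.set_yticks(np.arange(coefs.shape[0]))  # one tick per feature
ax.invert_yaxis()
ax.set_xlabel("time")
ax.set_ylabel("feature index")
fig.colorbar(mesh, ax=ax)
```

Only one value per interval is stored, so no redundant columns are needed; call plt.show() to display the figure.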

Related

Create a heat map out of three 1D arrays

I want to create a heatmap out of 3 1dimensional arrays. Something that looks like this:
Up to this point, I was only able to create a scatter plot where the markers have a different color and marker size depending on the third value:
My code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
xf = np.random.rand(1000)
yf = np.random.rand(1000)
zf = 1e5*np.random.rand(1000)
ms1 = zf.astype('int')
# Remove the middle 20% (0.4 to 0.6) of the RdBu_r colormap
interval = np.hstack([np.linspace(0, 0.4), np.linspace(0.6, 1)])
colors = plt.cm.RdBu_r(interval)
cmap = LinearSegmentedColormap.from_list('name', colors)
col = cmap(np.linspace(0,1,len(ms1)))
#for i in range(len(ms1)):
plt.scatter(xf, yf, c=zf, s=5*ms1/1e4, cmap=cmap,alpha=0.8)#, norm =matplotlib.colors.LogNorm())
ax1 =plt.colorbar(pad=0.01)
is giving me this result:
Any idea how I could make it look like the first figure?
Essentially what I want to do is find the average of the z value for groups of the x and y arrays
I think the functionality you are looking for is provided by scipy.stats.binned_statistic_2d. You can use it to organize values of xf and yf arrays into 2-dimensional bins, and compute the mean of zf values in each bin:
import numpy as np
from scipy import stats
np.random.seed(0)
xf = np.random.rand(1000)
yf = np.random.rand(1000)
zf = 1e5 * np.random.rand(1000)
means = stats.binned_statistic_2d(xf,
                                  yf,
                                  values=zf,
                                  statistic='mean',
                                  bins=(5, 5))[0]
Then you can use e.g. seaborn to plot a heatmap of the array of mean values:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(10, 8))
sns.heatmap(means,
            cmap="Reds_r",
            annot=True,
            annot_kws={"fontsize": 16},
            cbar=True,
            linewidth=2,
            square=True)
plt.show()
This gives a 5×5 heatmap annotated with the mean of zf in each bin.
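One detail worth keeping in mind is orientation: binned_statistic_2d puts the x bins along the first axis of the result, while a heatmap draws the first axis vertically. The sketch below is one way to keep xf on the horizontal axis and label the axes in data units, using the bin edges that the function also returns. The use of pcolormesh here is a swapped-in alternative to the seaborn call, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
xf = rng.random(1000)
yf = rng.random(1000)
zf = 1e5 * rng.random(1000)

result = stats.binned_statistic_2d(xf, yf, values=zf,
                                   statistic="mean", bins=(5, 5))
means = result.statistic      # shape (n_x_bins, n_y_bins)
x_edges = result.x_edge       # n_x_bins + 1 edges
y_edges = result.y_edge       # n_y_bins + 1 edges

fig, ax = plt.subplots()
# means[i, j] belongs to x-bin i and y-bin j; pcolormesh expects rows to run
# along y, hence the transpose
mesh = ax.pcolormesh(x_edges, y_edges, means.T, cmap="Reds_r")
fig.colorbar(mesh, ax=ax, label="mean of zf")
ax.set_xlabel("xf")
ax.set_ylabel("yf")
```

Bins that receive no points come back as NaN when the statistic is the mean, so sparse data may leave blank cells.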

How to plot the typical bowl shape when illustrating gradient descent with matplotlib?

When illustrating gradient descent, we usually see the bowl-shaped graph below. It is also said that with log loss instead of squared error the minimum of the loss is easier to find, because using squared error as the loss function may result in multiple local minima.
Therefore, I want to plot the bowl shape graph like below.
However, I only managed to plot the following
Here is my code, could anyone help me fix it? thanks
from mpl_toolkits.mplot3d.axes3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
import math
fig, ax1 = plt.subplots(1, 1, figsize=(8, 5), subplot_kw={'projection': '3d'})
# Get the test data
x1 = 1
x2 = 1
y = 0.8
w = np.linspace(-10,10,100)
# w = np.random.random(100)
wl = np.linspace(-10,10,100)
# wl = np.random.random(100)
w1 = np.ones((100,100))
w2 = np.ones((100,100))
for idx in range(100):
    w1[idx] = w1[idx]*w
    w2[:,idx] = w2[:,idx]*wl
L = []
for i in range(w1.shape[0]):
    for j in range(w1.shape[1]):
        a = w1[i,j]*x1 + w2[i,j]*x2
        f = 1/(1+math.exp(-a))
        l = -(y*math.log(f)+(1-y)*math.log(1-f))
        # l = (1/2)*(f-y)**2
        L.append(l)
l = np.array(L).reshape(w1.shape)
ax1.plot_wireframe(w1,w2,l)
ax1.set_title("plot backpropogation")
plt.tight_layout()
plt.show()
The following ignores the formula from the question and is probably unrelated to the actual problem; it just shows how to plot a bowl.
A way to plot a bowl is to use a function that is rotationally symmetric about the z axis.
For example:
from mpl_toolkits.mplot3d.axes3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
fig, ax1 = plt.subplots(figsize=(8, 5),
subplot_kw={'projection': '3d'})
alpha = 0.8
r = np.linspace(-alpha,alpha,100)
X,Y= np.meshgrid(r,r)
l = 1./(1+np.exp(-(X**2+Y**2)))
ax1.plot_wireframe(X,Y,l)
ax1.set_title("plot")
plt.show()
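A side note on why the question's surface is not a bowl: with x1 = x2 = 1 the activation is a = w1 + w2, so the loss is constant along every line w1 + w2 = const and the surface is a trough rather than a bowl. The sketch below is a vectorized version of the question's log-loss computation under the same x1, x2 and y. The function name log_loss_surface is made up for this illustration, and np.logaddexp is used only to avoid overflow in the exponential.

```python
import numpy as np
import matplotlib.pyplot as plt

def log_loss_surface(w1, w2, x1=1.0, x2=1.0, y=0.8):
    """Log loss of a single sigmoid unit for weights w1, w2 (arrays or scalars)."""
    a = w1 * x1 + w2 * x2
    log_f = -np.logaddexp(0.0, -a)          # log(sigmoid(a))
    log_one_minus_f = -np.logaddexp(0.0, a)  # log(1 - sigmoid(a))
    return -(y * log_f + (1 - y) * log_one_minus_f)

w = np.linspace(-10, 10, 100)
W1, W2 = np.meshgrid(w, w)
loss = log_loss_surface(W1, W2)

# Along the anti-diagonal w1 + w2 = 0, so the loss takes a single value there
anti_diagonal = np.fliplr(loss).diagonal()

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
ax.plot_wireframe(W1, W2, loss)
ax.set_xlabel("w1")
ax.set_ylabel("w2")
```

With a single training point the loss cannot curve upward in every direction; a bowl-like surface needs inputs that make the two weights act independently.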

Matplotlib Plot Lines with Colors Through Colormap

I am plotting multiple lines on a single plot and I want them to run through the spectrum of a colormap, not just the same 6 or 7 colors. The code is akin to this:
for i in range(20):
    for k in range(100):
        y[k] = i*x[i]
    plt.plot(x,y)
plt.show()
Both with colormap "jet" and another that I imported from seaborn, I get the same 7 colors repeated in the same order. I would like to be able to plot up to ~60 different lines, all with different colors.
The Matplotlib colormaps accept an argument (0..1, scalar or array) which you use to get colors from a colormap. For example:
col = pl.cm.jet([0.25,0.75])
Gives you an array with (two) RGBA colors:
array([[ 0. , 0.50392157, 1. , 1. ],
[ 1. , 0.58169935, 0. , 1. ]])
You can use that to create N different colors:
import numpy as np
import matplotlib.pylab as pl
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
pl.figure()
pl.plot(x,y)
n = 20
colors = pl.cm.jet(np.linspace(0,1,n))
for i in range(n):
    pl.plot(x, i*y, color=colors[i])
Bart's solution is nice and simple but has two shortcomings:
1. plt.colorbar() won't work in a nice way because the line plots aren't mappable (compared to, e.g., an image).
2. It can be slow for large numbers of lines due to the for loop (though this is maybe not a problem for most applications?).
These issues can be addressed by using LineCollection. However, this isn't too user-friendly in my (humble) opinion. There is an open suggestion on GitHub for adding a multicolor line plot function, similar to the plt.scatter(...) function.
Here is a working example I was able to hack together
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
def multiline(xs, ys, c, ax=None, **kwargs):
    """Plot lines with different colorings

    Parameters
    ----------
    xs : iterable container of x coordinates
    ys : iterable container of y coordinates
    c : iterable container of numbers mapped to colormap
    ax (optional): Axes to plot on.
    kwargs (optional): passed to LineCollection

    Notes:
        len(xs) == len(ys) == len(c) is the number of line segments
        len(xs[i]) == len(ys[i]) is the number of points for each line (indexed by i)

    Returns
    -------
    lc : LineCollection instance.
    """
    # find axes
    ax = plt.gca() if ax is None else ax
    # create LineCollection
    segments = [np.column_stack([x, y]) for x, y in zip(xs, ys)]
    lc = LineCollection(segments, **kwargs)
    # set coloring of line segments
    # Note: I get an error if I pass c as a list here... not sure why.
    lc.set_array(np.asarray(c))
    # add lines to axes and rescale
    # Note: adding a collection doesn't autoscale xlim/ylim
    ax.add_collection(lc)
    ax.autoscale()
    return lc
Here is a very simple example:
xs = [[0, 1],
[0, 1, 2]]
ys = [[0, 0],
[1, 2, 1]]
c = [0, 1]
lc = multiline(xs, ys, c, cmap='bwr', lw=2)
Produces:
And something a little more sophisticated:
n_lines = 30
x = np.arange(100)
yint = np.arange(0, n_lines*10, 10)
ys = np.array([x + b for b in yint])
xs = np.array([x for i in range(n_lines)]) # could also use np.tile
colors = np.arange(n_lines)
fig, ax = plt.subplots()
lc = multiline(xs, ys, yint, cmap='bwr', lw=2)
axcb = fig.colorbar(lc)
axcb.set_label('Y-intercept')
ax.set_title('Line Collection with mapped colors')
Produces:
Hope this helps!
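If the simple loop is fast enough and only the colorbar is missing, a colorbar can also be attached by building the mappable by hand. This is a sketch of that idea, assuming a reasonably recent Matplotlib in which fig.colorbar accepts a bare ScalarMappable; the mapping of the line index i to a color is an illustrative choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

x = np.linspace(0, 2 * np.pi, 64)
y = np.cos(x)
n = 20

cmap = plt.get_cmap("jet")
norm = Normalize(vmin=0, vmax=n - 1)   # maps the line index i onto [0, 1]

fig, ax = plt.subplots()
for i in range(n):
    ax.plot(x, i * y, color=cmap(norm(i)))

# The mappable carries only the norm and the colormap; it is not tied to any line
sm = ScalarMappable(norm=norm, cmap=cmap)
cbar = fig.colorbar(sm, ax=ax)
cbar.set_label("line index i")
```

This keeps the plotting loop unchanged, so it addresses the colorbar issue but not the speed issue; for many lines the LineCollection approach is still the better fit.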
An alternative to Bart's answer, in which you do not specify the color in each call to plt.plot, is to define a new color cycle with set_prop_cycle. His example can be translated into the following code (I've also changed the import of matplotlib to the recommended style):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
n = 20
ax = plt.axes()
ax.set_prop_cycle('color',[plt.cm.jet(i) for i in np.linspace(0, 1, n)])
for i in range(n):
    plt.plot(x, i*y)
If you are using a continuous colormap such as brg, hsv, jet or the default one, you can get a color like this:
color = plt.cm.hsv(r) # r is 0 to 1 inclusive
You can then pass this color value to any API that accepts a color, for example:
line = matplotlib.lines.Line2D(xdata, ydata, color=color)
To me this approach seems the most concise and user-friendly: it does not require a loop and does not rely on user-made functions.
import numpy as np
import matplotlib.pyplot as plt
# make 5 lines
n_lines = 5
x = np.arange(0, 2).reshape(-1, 1)
A = np.linspace(0, 2, n_lines).reshape(1, -1)
Y = x @ A  # shape (2, n_lines): one column per line
# create colormap
cm = plt.cm.bwr(np.linspace(0, 1, n_lines))
# plot
ax = plt.subplot(111)
ax.set_prop_cycle('color', list(cm))
ax.plot(x, Y)
plt.show()

Python - Randomly subsample a range of points to plot

I have two lists, x and y, that I wish to plot together in a scatter plot.
The lists contain too many data points. I would like a graph with far fewer points. I cannot crop or trim these lists; I need to randomly subsample a set number of points from both of these lists. What would be the best way to approach this?
You could subsample the lists using
idx = np.random.choice(np.arange(len(x)), num_samples)
plt.scatter(x[idx], y[idx])
However, this leaves the result a bit up to random luck. We can do better by making a heatmap. plt.hexbin makes this particularly easy:
plt.hexbin(x, y)
Here is an example, comparing the two methods:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
np.random.seed(2015)
N = 10**5
val1 = np.random.normal(loc=10, scale=2,size=N)
val2 = np.random.normal(loc=0, scale=1, size=N)
fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True)
cmap = plt.get_cmap('jet')
norm = mcolors.LogNorm()
num_samples = 10**4
idx = np.random.choice(np.arange(len(val1)), num_samples)
ax[0].scatter(val1[idx], val2[idx])
ax[0].set_title('subsample')
im = ax[1].hexbin(val1, val2, gridsize=50, cmap=cmap, norm=norm)
ax[1].set_title('hexbin heatmap')
plt.tight_layout()
fig.colorbar(im, ax=ax.ravel().tolist())
plt.show()
You can pick randomly from x and y using a random index mask
import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
# Pick 10 random samples: the boolean mask must have the same length as x and y
subsample = np.zeros(N, dtype=bool)
subsample[np.random.choice(N, 10, replace=False)] = True
plt.scatter(x[subsample], y[subsample])
plt.show()
Alternatively you can use hist2d to plot a 2D histogram, which uses densities instead of data points
plt.hist2d(x, y) # No need to subsample
You can use random.sample():
import random
max_points = len(x)
# Assuming you only want 50 points.
random_indexes = random.sample(range(max_points), 50)
new_x = [x[i] for i in random_indexes]
new_y = [y[i] for i in random_indexes]
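Two details of the NumPy-based approaches above are easy to miss: np.random.choice draws with replacement by default, so the same point can be picked twice, and plain Python lists cannot be indexed with an index array. A minimal sketch that handles both, assuming x and y are equal-length lists; the helper name subsample_pairs and the seed parameter are made up for this illustration:

```python
import numpy as np

def subsample_pairs(x, y, num_samples, seed=None):
    """Return num_samples randomly chosen (x, y) pairs, each point at most once."""
    x = np.asarray(x)   # lists do not support index-array indexing
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = np.random.default_rng(seed)
    # replace=False guarantees that no index is drawn twice
    idx = rng.choice(len(x), size=num_samples, replace=False)
    return x[idx], y[idx]

# Example data: y is a simple function of x so the pairing is easy to check
x = list(range(1000))
y = [2 * v for v in x]
new_x, new_y = subsample_pairs(x, y, 50, seed=0)
```

The same index array is applied to both lists, so each sampled x stays paired with its own y; the result can be passed straight to plt.scatter(new_x, new_y).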

Matplotlib matrix/image explicitly state axis values

I would use imshow for this, so I will use it to describe my problem.
I have several matrices which I would like to plot on the same axis. Something like this:
import matplotlib.pyplot as plt
import numpy as np
a = np.array([[0,1,2],[0,1,2]])
x = np.array([0,1,2])
y = np.array([0,1])
a2 = np.array([[10,11,12],[10,11,12]])
x2 = np.array([10,11,12])
y2 = np.array([0,1])
plt.imshow(a,extent=[x.min(),x.max(),y.min(),y.max()])
plt.imshow(a2,extent=[x2.min(),x2.max(),y2.min(),y2.max()])
plt.show()
(With this code the first imshow is overwritten by the second)
The reason why I can't combine them into a single matrix with one set of x and y axes (by filling the gaps with zeros) is that the combined matrix would be huge and there are large spaces in between the strips.
It's not overwritten; the axes limits are just reset to the extent of the last image each time.
Just call plt.autoscale().
As a quick example of what you're seeing:
import numpy as np
import matplotlib.pyplot as plt
data1, data2 = np.random.random((2,10,10))
fig, ax = plt.subplots()
ax.imshow(data1, extent=[-10, 0, -10, 0])
ax.imshow(data2, extent=[10, 20, 10, 20])
plt.show()
Now, if we just call autoscale:
import numpy as np
import matplotlib.pyplot as plt
data1, data2 = np.random.random((2,10,10))
fig, ax = plt.subplots()
ax.imshow(data1, extent=[-10, 0, -10, 0])
ax.imshow(data2, extent=[10, 20, 10, 20])
ax.autoscale()
plt.show()
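Applied to the arrays from the question, the same idea might look like the sketch below. It assumes that x and y hold evenly spaced pixel-centre coordinates, so the extent is padded by half a pixel on each side. The helper extent_from_centers is hypothetical, and the shared vmin/vmax and aspect="auto" are choices made here so that both strips use one color scale and remain visible.

```python
import numpy as np
import matplotlib.pyplot as plt

def extent_from_centers(x, y):
    """Return (left, right, bottom, top) so that evenly spaced x, y values
    (at least two each) sit at the pixel centres."""
    dx = (x[-1] - x[0]) / (len(x) - 1)
    dy = (y[-1] - y[0]) / (len(y) - 1)
    return (x[0] - dx / 2, x[-1] + dx / 2, y[0] - dy / 2, y[-1] + dy / 2)

a = np.array([[0, 1, 2], [0, 1, 2]])
x = np.array([0, 1, 2])
y = np.array([0, 1])
a2 = np.array([[10, 11, 12], [10, 11, 12]])
x2 = np.array([10, 11, 12])
y2 = np.array([0, 1])

fig, ax = plt.subplots()
# origin="lower" puts row 0 at y[0]; a common vmin/vmax keeps colors comparable
common = dict(origin="lower", aspect="auto", vmin=0, vmax=12)
ax.imshow(a, extent=extent_from_centers(x, y), **common)
ax.imshow(a2, extent=extent_from_centers(x2, y2), **common)
ax.autoscale()   # expand the limits to cover both images
```

No combined matrix is built, so the gap between the strips costs no memory.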
