I have three Python lists: x = [x0, x1, x2, ..., xn], y = [y0, y1, y2, ..., yn] and v = [v0, v1, v2, ..., vn]. I need to visualize the data as a heatmap in which the value v[k] is drawn at coordinate (x[k], y[k]); the result could look something like the one in GnuPlot Heatmap XYZ. Due to system constraints I cannot use any third-party tools other than numpy and matplotlib.
I've found some related topics (Heatmap in matplotlib with pcolor?, Generate a heatmap in MatPlotLib using a scatter data set), but they don't seem to address the same issue.
If the encoded matrix is not too large for memory, it can easily be converted to a dense numpy array using integer array indexing (this assumes x and y hold non-negative integer indices):
import numpy as np
import matplotlib.pyplot as plt
x = [1, 0]
y = [0, 1]
v = [2, 3]
M = np.zeros((max(x) + 1, max(y) + 1))
M[x, y] = v
fig, ax = plt.subplots()
ax.matshow(M)
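The indexing trick above needs x and y to be non-negative integers. For arbitrary coordinate values, one possible variant is to map each distinct coordinate to a grid index with np.unique. This is only a sketch and not part of the original answer; the helper name to_dense_grid and the NaN fill value are assumptions made here for illustration.

```python
import numpy as np

def to_dense_grid(x, y, v, fill=np.nan):
    """Hypothetical helper: scatter v[k] into a dense grid at (x[k], y[k])."""
    # Sorted distinct coordinates, plus each point's index into them.
    xs, xi = np.unique(np.ravel(x), return_inverse=True)
    ys, yi = np.unique(np.ravel(y), return_inverse=True)
    # Rows follow y and columns follow x; cells without data keep `fill`.
    grid = np.full((len(ys), len(xs)), fill, dtype=float)
    grid[yi, xi] = v
    return xs, ys, grid

xs, ys, grid = to_dense_grid([0.5, 1.5, 0.5], [10, 10, 20], [1, 2, 3])
```

If the coordinates are evenly spaced, the resulting grid can be passed to ax.matshow(grid) as in the answer; xs and ys hold the values to use for tick labels.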
I can't figure out how pyplot's trisurf works. All the examples I have seen on the Internet use numpy, pandas and other stuff that gets in the way of understanding this tool.
Pyplot docs say it requires X, Y and Z as 1D arrays. But if I try to provide them, it raises a RuntimeError: Error in qhull Delaunay triangulation calculation: singular input data (exitcode=2); use python verbose option (-v) to see original qhull error. I tried using a Python list and numpy arange.
What are exactly those 1D arrays the tool wants me to provide?
plot_trisurf, when no explicit triangles are given, connects nearby 3D points with triangles to form some kind of surface. X is a 1D array (or a list) of the x-coordinates of these points (similar for Y and Z).
It doesn't work too well when all points lie on the same 3D line. The n'th point uses the n'th x, the n'th y and the n'th z, so setting all of X, Y and Z to [1, 2, 3] gives the points P1=(1,1,1), P2=(2,2,2), P3=(3,3,3), which form a line, not a triangle. A simple working example would be `ax.plot_trisurf([0, 1, 1], [0, 0, 1], [1, 2, 3])`.
Here is an example:
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
from math import sin, cos, pi
fig = plt.figure(figsize=(14, 9))
ax1 = fig.add_subplot(1, 2, 1, projection='3d')
ax1.plot_trisurf([0, 1, 1], [0, 0, 1], [1, 2, 3],
                 facecolor='cornflowerblue', edgecolor='crimson', alpha=0.4, linewidth=4, antialiased=True)
ax2 = fig.add_subplot(1, 2, 2, projection='3d')
N = 12
X = [0] + [sin(a * 2 * pi / N) for a in range(N)]
Y = [0] + [cos(a * 2 * pi / N) for a in range(N)]
Z = [1] + [0 for a in range(N)]
ax2.plot_trisurf(X, Y, Z,
                 facecolor='cornflowerblue', edgecolor='crimson', alpha=0.4, linewidth=4, antialiased=True)
plt.show()
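The "singular input data" error from the question can be anticipated before plotting. As far as I understand, when no triangles are given the triangulation is built from the x and y coordinates only, so it fails when those 2D points all lie on one line. The check below is a sketch under that assumption; the function name xy_is_degenerate is made up for illustration.

```python
import numpy as np

def xy_is_degenerate(x, y):
    """Hypothetical check: True if the (x, y) points cannot form a triangle."""
    pts = np.column_stack([x, y]).astype(float)
    if len(pts) < 3:
        return True
    # Collinear (or identical) points span fewer than two directions
    # once the centroid has been subtracted.
    centered = pts - pts.mean(axis=0)
    return np.linalg.matrix_rank(centered) < 2

bad = xy_is_degenerate([1, 2, 3], [1, 2, 3])    # all points on one line
good = xy_is_degenerate([0, 1, 1], [0, 0, 1])   # a proper triangle
```

With the inputs from the question, the first call flags the data as degenerate, while the triangle used in the example passes.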
I added three normal distributions to obtain a new distribution, as shown below. How can I sample from this distribution in Python?
import matplotlib.pyplot as plt
import scipy.stats as ss
import numpy as np
x = np.linspace(0, 10, 1000)
y1 = [ss.norm.pdf(v, loc=5, scale=1) for v in x]
y2 = [ss.norm.pdf(v, loc=1, scale=1.3) for v in x]
y3 = [ss.norm.pdf(v, loc=9, scale=1.3) for v in x]
y = np.sum([y1, y2, y3], axis=0)/3
plt.plot(x, y, '-')
plt.xlabel('$x$')
plt.ylabel('$P(x)$')
BTW, is there a better way to plot such a probability distribution?
It seems that you're asking two questions: how do I sample from a distribution and how do I plot the PDF?
Assuming you're trying to sample from a mixture of the 3 normal distributions shown in your code, the following code snippet performs this kind of sampling in a naïve, straightforward way as a proof of concept.
Basically, the idea is to
1. Choose an index i among the component indices, i.e. 0, 1, 2, ..., according to their probability weights.
2. Having chosen i, select the corresponding distribution and obtain a sample point from it.
3. Continue from 1 until enough sample points are collected.
However, to plot the PDF, you don't really need a sample in this case, because the theoretical solution is quite easy. In the more general case, the PDF can be approximated by a histogram from the sample.
The code below performs both sampling and PDF-plotting using the theoretical PDF.
import numpy as np
import numpy.random
import scipy.stats as ss
import matplotlib.pyplot as plt
# Set-up.
n = 10000
numpy.random.seed(0x5eed)
# Parameters of the mixture components
norm_params = np.array([[5, 1],
[1, 1.3],
[9, 1.3]])
n_components = norm_params.shape[0]
# Weight of each component, in this case all of them are 1/3
weights = np.ones(n_components, dtype=np.float64) / 3.0
# A stream of indices from which to choose the component
mixture_idx = numpy.random.choice(len(weights), size=n, replace=True, p=weights)
# y is the mixture sample
y = numpy.fromiter((ss.norm.rvs(*(norm_params[i])) for i in mixture_idx),
dtype=np.float64)
# Theoretical PDF plotting -- generate the x and y plotting positions
xs = np.linspace(y.min(), y.max(), 200)
ys = np.zeros_like(xs)
for (l, s), w in zip(norm_params, weights):
    ys += ss.norm.pdf(xs, loc=l, scale=s) * w
plt.plot(xs, ys)
# density=True replaces the normed=True keyword, which newer matplotlib no longer accepts
plt.hist(y, density=True, bins="fd")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.show()
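The three sampling steps above can also be done without a Python-level loop, by drawing all component indices first and then passing per-sample means and standard deviations to the normal sampler. This is a sketch using only numpy; the function name sample_normal_mixture and the use of np.random.default_rng are choices made here, not part of the original answer.

```python
import numpy as np

def sample_normal_mixture(params, weights, n, seed=None):
    """Hypothetical helper: draw n samples from a mixture of normals.

    params  : rows of (mean, standard deviation), one per component
    weights : component probabilities, must sum to 1
    """
    rng = np.random.default_rng(seed)
    params = np.asarray(params, dtype=float)
    # Step 1: pick a component index for every sample at once.
    idx = rng.choice(len(weights), size=n, p=weights)
    # Step 2: draw each sample from its chosen component.
    return rng.normal(loc=params[idx, 0], scale=params[idx, 1])

params = [[5, 1], [1, 1.3], [9, 1.3]]
sample = sample_normal_mixture(params, [1/3, 1/3, 1/3], 10000, seed=0)
```

The result is a 1D array that can be passed to plt.hist in the same way as y in the code above.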
To make Cong Ma's answer more general, I slightly modified his code. The weights now work for any number of mixture components.
import numpy as np
import numpy.random
import scipy.stats as ss
import matplotlib.pyplot as plt
# Set-up.
n = 10000
numpy.random.seed(0x5eed)
# Parameters of the mixture components
norm_params = np.array([[5, 1],
[1, 1.3],
[9, 1.3]])
n_components = norm_params.shape[0]
# Weight of each component, in this case all of them are 1/3
weights = np.ones(n_components, dtype=np.float64) / float(n_components)
# A stream of indices from which to choose the component
mixture_idx = numpy.random.choice(n_components, size=n, replace=True, p=weights)
# y is the mixture sample
y = numpy.fromiter((ss.norm.rvs(*(norm_params[i])) for i in mixture_idx),
dtype=np.float64)
# Theoretical PDF plotting -- generate the x and y plotting positions
xs = np.linspace(y.min(), y.max(), 200)
ys = np.zeros_like(xs)
for (l, s), w in zip(norm_params, weights):
    ys += ss.norm.pdf(xs, loc=l, scale=s) * w
plt.plot(xs, ys)
# density=True replaces the normed=True keyword, which newer matplotlib no longer accepts
plt.hist(y, density=True, bins="fd")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.show()
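The code above still gives every component the same weight. If unequal weights are wanted, one option is to normalize whatever relative weights are supplied before summing the component densities. The sketch below writes the normal density out with numpy instead of scipy; the name mixture_pdf and the example weights [2, 1, 1] are assumptions for illustration only.

```python
import numpy as np

def mixture_pdf(xs, params, weights):
    """Hypothetical helper: weighted sum of normal densities at xs."""
    xs = np.asarray(xs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # relative weights -> probabilities
    total = np.zeros_like(xs)
    for (mu, sigma), w in zip(params, weights):
        z = (xs - mu) / sigma
        total += w * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2 * np.pi))
    return total

params = [[5, 1], [1, 1.3], [9, 1.3]]
xs = np.linspace(-20, 30, 5001)
ys = mixture_pdf(xs, params, [2, 1, 1])
```

Because the weights are normalized, the resulting curve still integrates to approximately 1 over a range that covers all components.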
I want to plot the coefficients of a linear model over time.
On the y-axis you have the i-th feature of my model, on the x-axis is time and the value of the i-th coefficient is color coded.
In my example, the coefficients are constant from 0 to t1, t1 to t2 and so on. The intervals are not equally sized. Currently I circumvent this by creating many points spaced by delta t:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
xi1 = [0, 1, 2]
t1 = range(4)
xi2 = [1, 1, 2]
t2 = range(5, 8)
data= np.vstack([xi1]*len(t1) + [xi2]*len(t2)).T
sns.heatmap(data)
Is there a way to do this more efficiently (without creating the redundant information)? I am also looking to have the right x-axis labels according to my t values.
You can use a matplotlib pcolormesh.
import matplotlib.pyplot as plt
import numpy as np
a = [[0,1],[1,1],[2,2]]
y = [0,1,2,3]
x = [0,5,8]
X,Y = np.meshgrid(x,y)
Z = np.array(a)
cmap = plt.get_cmap("RdPu", 3)
plt.pcolormesh(X,Y,Z, cmap=cmap)
plt.gca().invert_yaxis()
plt.colorbar(boundaries=np.arange(-0.5,3), ticks=np.unique(Z))
plt.show()
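To connect this to the data layout in the question, the cell edges and the value array can be built from the interval breakpoints and one coefficient vector per interval. This is a sketch; the helper name build_mesh_inputs is hypothetical, and the breakpoints [0, 5, 8] are the ones used in the answer above.

```python
import numpy as np

def build_mesh_inputs(breakpoints, coefs_per_interval):
    """Hypothetical helper: edges and values for a piecewise-constant heatmap.

    breakpoints        : time values t0 < t1 < ... < tm (m intervals)
    coefs_per_interval : m coefficient vectors, one per interval
    """
    x_edges = np.asarray(breakpoints, dtype=float)
    # One column per time interval, one row per feature.
    Z = np.column_stack(coefs_per_interval)
    if Z.shape[1] != len(x_edges) - 1:
        raise ValueError("need exactly one coefficient vector per interval")
    y_edges = np.arange(Z.shape[0] + 1)
    return x_edges, y_edges, Z

x_edges, y_edges, Z = build_mesh_inputs([0, 5, 8], [[0, 1, 2], [1, 1, 2]])
```

The three arrays can then be passed as plt.pcolormesh(x_edges, y_edges, Z), which accepts 1D edge arrays, so the x-axis shows the actual t values without any repeated columns.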
I am plotting multiple lines on a single plot and I want them to run through the spectrum of a colormap, not just the same 6 or 7 colors. The code is akin to this:
for i in range(20):
    for k in range(100):
        y[k] = i*x[i]
    plt.plot(x,y)
plt.show()
Both with colormap "jet" and another that I imported from seaborn, I get the same 7 colors repeated in the same order. I would like to be able to plot up to ~60 different lines, all with different colors.
The Matplotlib colormaps accept an argument (0..1, scalar or array) which you use to get colors from a colormap. For example (with matplotlib.pylab imported as pl):
col = pl.cm.jet([0.25,0.75])
Gives you an array with (two) RGBA colors:
array([[ 0. , 0.50392157, 1. , 1. ],
[ 1. , 0.58169935, 0. , 1. ]])
You can use that to create N different colors:
import numpy as np
import matplotlib.pylab as pl
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
pl.figure()
pl.plot(x,y)
n = 20
colors = pl.cm.jet(np.linspace(0,1,n))
for i in range(n):
    pl.plot(x, i*y, color=colors[i])
Bart's solution is nice and simple but has two shortcomings:
1. plt.colorbar() won't work in a nice way because the line plots aren't mappable (compared to, e.g., an image).
2. It can be slow for large numbers of lines due to the for loop (though this is maybe not a problem for most applications?).
These issues can be addressed by using LineCollection. However, this isn't too user-friendly in my (humble) opinion. There is an open suggestion on GitHub for adding a multicolor line plot function, similar to the plt.scatter(...) function.
Here is a working example I was able to hack together
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
def multiline(xs, ys, c, ax=None, **kwargs):
    """Plot lines with different colorings

    Parameters
    ----------
    xs : iterable container of x coordinates
    ys : iterable container of y coordinates
    c : iterable container of numbers mapped to colormap
    ax (optional): Axes to plot on.
    kwargs (optional): passed to LineCollection

    Notes:
        len(xs) == len(ys) == len(c) is the number of line segments
        len(xs[i]) == len(ys[i]) is the number of points for each line (indexed by i)

    Returns
    -------
    lc : LineCollection instance.
    """
    # find axes
    ax = plt.gca() if ax is None else ax
    # create LineCollection
    segments = [np.column_stack([x, y]) for x, y in zip(xs, ys)]
    lc = LineCollection(segments, **kwargs)
    # set coloring of line segments
    # Note: I get an error if I pass c as a list here... not sure why.
    lc.set_array(np.asarray(c))
    # add lines to axes and rescale
    # Note: adding a collection doesn't autoscale xlim/ylim
    ax.add_collection(lc)
    ax.autoscale()
    return lc
Here is a very simple example:
xs = [[0, 1],
[0, 1, 2]]
ys = [[0, 0],
[1, 2, 1]]
c = [0, 1]
lc = multiline(xs, ys, c, cmap='bwr', lw=2)
And something a little more sophisticated:
n_lines = 30
x = np.arange(100)
yint = np.arange(0, n_lines*10, 10)
ys = np.array([x + b for b in yint])
xs = np.array([x for i in range(n_lines)]) # could also use np.tile
colors = np.arange(n_lines)
fig, ax = plt.subplots()
lc = multiline(xs, ys, yint, cmap='bwr', lw=2)
axcb = fig.colorbar(lc)
axcb.set_label('Y-intercept')
ax.set_title('Line Collection with mapped colors')
Hope this helps!
An alternative to Bart's answer, in which you do not specify the color in each call to plt.plot, is to define a new color cycle with set_prop_cycle. His example can be translated into the following code (I've also changed the import of matplotlib to the recommended style):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
n = 20
ax = plt.axes()
ax.set_prop_cycle('color',[plt.cm.jet(i) for i in np.linspace(0, 1, n)])
for i in range(n):
    plt.plot(x, i*y)
If you are using a continuous color palette like brg, hsv, jet or the default one, you can get a color like this:
color = plt.cm.hsv(r) # r is 0 to 1 inclusive
Now you can pass this color value to any API you want like this:
line = matplotlib.lines.Line2D(xdata, ydata, color=color)
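Since the colormap expects r between 0 and 1, data values usually have to be rescaled first. A minimal sketch of a linear rescaling with numpy follows; the helper name to_unit_interval is made up here, and mapping a constant input to 0 is an arbitrary choice.

```python
import numpy as np

def to_unit_interval(values):
    """Hypothetical helper: linearly rescale values to the range 0..1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All values equal: avoid dividing by zero.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

r = to_unit_interval([10, 15, 20])
```

The array r can be passed to a colormap in one call, e.g. plt.cm.hsv(r), which returns one RGBA row per value. Note that for a cyclic map such as hsv the two ends of the range look almost the same, so the smallest and largest values may be hard to tell apart.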
This approach seems to me the most concise and user-friendly: it requires neither a loop nor any user-made functions.
import numpy as np
import matplotlib.pyplot as plt
# make 5 lines
n_lines = 5
x = np.arange(0, 2).reshape(-1, 1)
A = np.linspace(0, 2, n_lines).reshape(1, -1)
Y = x @ A  # matrix product, shape (2, n_lines): one column per line
# create colormap
cm = plt.cm.bwr(np.linspace(0, 1, n_lines))
# plot
ax = plt.subplot(111)
ax.set_prop_cycle('color', list(cm))
ax.plot(x, Y)
plt.show()
I want to draw a scatter plot using pylab, however, some of my data are NaN, like this:
a = [1, 2, 3]
b = [1, 2, None]
pylab.scatter(a,b) doesn't work.
Is there some way to draw the points that have real values without displaying the NaN values?
Things will work perfectly if you use NaNs. None is not the same thing. A NaN is a float.
As an example:
import numpy as np
import matplotlib.pyplot as plt
plt.scatter([1, 2, 3], [1, 2, np.nan])
plt.show()
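If the data already contain None, as in the question, one way to get NaNs is to convert the list to a float array, since numpy turns None into NaN when asked for a float dtype. A short sketch (the variable name b_float is just for illustration):

```python
import numpy as np

b = [1, 2, None]
# Requesting a float dtype makes numpy store None as NaN.
b_float = np.array(b, dtype=float)
# Boolean mask marking the entries that hold real values.
valid = ~np.isnan(b_float)
```

The converted array can be given to plt.scatter directly, and the mask is handy if the missing points need to be counted or reported.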
Have a look at pandas or numpy masked arrays (and numpy.genfromtxt to load your data) if you want to handle missing data. Masked arrays are built into numpy, but pandas is an extremely useful library, and has very nice missing value functionality.
As an example:
import matplotlib.pyplot as plt
import pandas
x = pandas.Series([1, 2, 3])
y = pandas.Series([1, 2, None])
plt.scatter(x, y)
plt.show()
pandas uses NaNs to represent masked data, while masked arrays use a separate mask array. This means that masked arrays can potentially preserve the original data, while temporarily flagging it as "missing" or "bad". However, they use more memory, and have some hidden gotchas that can be avoided by using NaNs to represent missing data.
As another example, using both masked arrays and NaNs, this time with a line plot:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 6 * np.pi, 300)
y = np.cos(x)
y1 = np.ma.masked_where(y > 0.7, y)
y2 = y.copy()
y2[y > 0.7] = np.nan
fig, axes = plt.subplots(nrows=3, sharex=True, sharey=True)
for ax, ydata in zip(axes, [y, y1, y2]):
    ax.plot(x, ydata)
    ax.axhline(0.7, color='red')
axes[0].set_title('Original')
axes[1].set_title('Masked Arrays')
axes[2].set_title("Using NaN's")
fig.tight_layout()
plt.show()
Because you are drawing in 2D space, your points need to be defined by both an X and a Y value. If one of the values is None, that point cannot exist in 2D space, so it cannot be plotted; hence you should remove both the None and its corresponding value from the other list.
There are many ways to accomplish this. Here is one:
a = [1, 2, 3]
b = [1, None, 2]
i = 0
while i < len(a):
    # drop the pair if either coordinate is missing
    if a[i] is None or b[i] is None:
        a = a[:i] + a[i+1:]
        b = b[:i] + b[i+1:]
    else:
        i += 1
# Now a = [1, 3] and b = [1, 2]
pylab.scatter(a,b)
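The same filtering can be written more compactly by zipping the two lists and keeping only complete pairs. This is just one of the "many ways" mentioned above, sketched in plain Python; the function name drop_none_pairs is hypothetical.

```python
def drop_none_pairs(a, b):
    """Hypothetical helper: keep only positions where both values are present."""
    kept = [(p, q) for p, q in zip(a, b) if p is not None and q is not None]
    if not kept:
        return [], []
    # Unzip the surviving pairs back into two lists.
    a_clean, b_clean = zip(*kept)
    return list(a_clean), list(b_clean)

a, b = drop_none_pairs([1, 2, 3], [1, None, 2])
```

Unlike the while loop, this builds the new lists in a single pass instead of re-slicing both lists every time a pair is dropped.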