Draw / Create Scatterplots of datasets with NaN

I want to draw a scatter plot using pylab. However, some of my data are NaN, like this:
a = [1, 2, 3]
b = [1, 2, None]
pylab.scatter(a, b) doesn't work.
Is there some way to draw the points that have real values while not displaying the NaN values?

Things will work perfectly if you use NaNs. None is not the same thing. A NaN is a float.
As an example:
import numpy as np
import matplotlib.pyplot as plt
plt.scatter([1, 2, 3], [1, 2, np.nan])
plt.show()
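If the data already sits in plain Python lists that use None for missing entries, one possible way to get NaNs is to convert the list to a float array. This is a minimal sketch, assuming the list holds only numbers and None; the names b_arr and valid are illustrative and not part of the original answer:

```python
import numpy as np

a = [1, 2, 3]
b = [1, 2, None]

# Requesting a float dtype makes NumPy store None as NaN
# (assumes the list contains only numbers and None).
b_arr = np.array(b, dtype=float)

# Boolean mask of the points that have a real y value.
valid = ~np.isnan(b_arr)
```

Passing a and b_arr to plt.scatter should then draw only the first two points.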
Have a look at pandas or numpy masked arrays (and numpy.genfromtxt to load your data) if you want to handle missing data. Masked arrays are built into numpy, but pandas is an extremely useful library, and has very nice missing value functionality.
As an example:
import matplotlib.pyplot as plt
import pandas
x = pandas.Series([1, 2, 3])
y = pandas.Series([1, 2, None])
plt.scatter(x, y)
plt.show()
pandas uses NaNs to represent masked data, while masked arrays use a separate mask array. This means that masked arrays can potentially preserve the original data, while temporarily flagging it as "missing" or "bad". However, they use more memory and have some hidden gotchas that can be avoided by using NaNs to represent missing data.
As another example, using both masked arrays and NaNs, this time with a line plot:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 6 * np.pi, 300)
y = np.cos(x)
y1 = np.ma.masked_where(y > 0.7, y)
y2 = y.copy()
y2[y > 0.7] = np.nan
fig, axes = plt.subplots(nrows=3, sharex=True, sharey=True)
for ax, ydata in zip(axes, [y, y1, y2]):
    ax.plot(x, ydata)
    ax.axhline(0.7, color='red')
axes[0].set_title('Original')
axes[1].set_title('Masked Arrays')
axes[2].set_title("Using NaN's")
fig.tight_layout()
plt.show()
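The difference between the two representations can also be checked without plotting. The following is only a sketch reusing the same cosine data; the names y_masked, y_nan and the summary variables are made up for illustration:

```python
import numpy as np

x = np.linspace(0, 6 * np.pi, 300)
y = np.cos(x)

# Masked array: values above 0.7 are flagged, not overwritten.
y_masked = np.ma.masked_where(y > 0.7, y)

# NaN version: the same values are replaced and cannot be recovered.
y_nan = y.copy()
y_nan[y > 0.7] = np.nan

# The data underneath the mask is still the original array.
original_preserved = np.array_equal(y_masked.data, y)
n_hidden_masked = int(y_masked.mask.sum())
n_hidden_nan = int(np.isnan(y_nan).sum())
```

Both versions hide the same number of samples, but only the masked array still carries the original values in its .data attribute.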

Because you are drawing in 2D space, your points need to be defined by both an X and a Y value. If one of the values is None, that point cannot exist in 2D space, so it cannot be plotted. Hence you should remove both the None and its corresponding value from the other list.
There are many ways to accomplish this. Here is one:
import pylab
a = [1, 2, 3]
b = [1, None, 2]
i = 0
while i < len(a):
    if a[i] is None or b[i] is None:
        # drop the pair at index i from both lists
        a = a[:i] + a[i+1:]
        b = b[:i] + b[i+1:]
    else:
        i += 1
# Now a = [1, 3] and b = [1, 2]
pylab.scatter(a, b)
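Another of the "many ways" is to filter the pairs in a single pass. This is a sketch of that idea with illustrative names (pairs, a_clean, b_clean); it leaves the original lists untouched:

```python
a = [1, 2, 3]
b = [1, None, 2]

# Keep only the (x, y) pairs in which neither value is None.
pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]

# Split the surviving pairs back into two lists.
a_clean = [x for x, _ in pairs]
b_clean = [y for _, y in pairs]
```

The cleaned lists can then be passed to pylab.scatter in place of a and b.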

Related

How to make a scatter plot with different variables

Let's say I have an equation y=a*x, where a=[1,2,3,4] and x,y each have a set of values.
I get that for a simple x vs y plot plt.scatter(x,y) is enough, but how can I make a scatter plot of x vs y for each a?

This creates an array of Axes objects and then draws a scatter plot on each of them.
I imported numpy in order to multiply the lists, which are now numpy arrays.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 4])
y = np.array([4, 5, 2])
a = np.array([1, 5, 2])
# create the Axes objects, one per value in a
fig, axis = plt.subplots(1, len(a))
for i in range(len(a)):
    axis[i].scatter(x, y * a[i])
plt.show()
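The per-subplot products can also be computed in one step with NumPy broadcasting. This is only a sketch on the same arrays; the name scaled is an assumption made for illustration:

```python
import numpy as np

x = np.array([1, 2, 4])
y = np.array([4, 5, 2])
a = np.array([1, 5, 2])

# Turn a into a column so that row i of the result holds y * a[i].
scaled = a[:, np.newaxis] * y
```

Inside the loop, scaled[i] could then replace the expression y * a[i].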

matplotlib imshow -- use any vector as axis

I would like to use any vector as an axis in plt.imshow().
A = np.random.rand(4, 4)
x = np.array([1, 2, 3, 8])
y = np.array([-1, 0, 2, 3])
I imagine something like this:
plt.imshow(A, x_ax=x, y_ax=y)
I know there is an extent parameter available, but sadly it does not allow for non-equally spaced vectors.
Can anyone please help? Thanks in advance.

Imshow plots are always equally spaced. The question is whether you want to have
(a) an equally spaced plot with unequally spaced labels, or
(b) an unequally spaced plot with labels to scale.
(a) equally spaced plot
import numpy as np
import matplotlib.pyplot as plt
a = np.random.rand(4, 4)
x = np.array([1, 2, 3, 8])
y = np.array([-1, 0, 2, 3])
plt.imshow(a)
plt.xticks(range(len(x)), x)
plt.yticks(range(len(y)), y)
plt.show()
(b) unequally spaced plot
import numpy as np
import matplotlib.pyplot as plt
a = np.random.rand(3, 3)
x = np.array([1, 2, 3, 8])
y = np.array([-1, 0, 2, 3])
X,Y = np.meshgrid(x,y)
plt.pcolormesh(X,Y,a)
plt.xticks(x)
plt.yticks(y)
plt.show()
Note that in this case the vectors specify the edges of the grid, so four values per axis only allow for a 3x3 array.
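If the vectors are meant as cell centers rather than edges, the edges have to be derived first so that a 4x4 array fits. The helper below is hypothetical (centers_to_edges is not a Matplotlib or NumPy function); it places each inner edge halfway between neighbouring centers and mirrors the outermost half-widths:

```python
import numpy as np

def centers_to_edges(centers):
    """Return len(centers) + 1 edges around the given cell centers."""
    centers = np.asarray(centers, dtype=float)
    # Inner edges sit halfway between neighbouring centers.
    mids = (centers[:-1] + centers[1:]) / 2
    # Outer edges mirror the distance to the nearest inner edge.
    first = centers[0] - (mids[0] - centers[0])
    last = centers[-1] + (centers[-1] - mids[-1])
    return np.concatenate([[first], mids, [last]])

x_edges = centers_to_edges([1, 2, 3, 8])
y_edges = centers_to_edges([-1, 0, 2, 3])
```

The resulting five edges per axis could be passed through np.meshgrid to plt.pcolormesh together with a 4x4 array.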

heatmap with variable datapoint width

I want to plot the coefficients of a linear model over time.
On the y-axis you have the i-th feature of my model, on the x-axis is time and the value of the i-th coefficient is color coded.
In my example, the coefficients are constant from 0 to t1, t1 to t2 and so on. The intervals are not equally sized. Currently I circumvent this by creating many points spaced by delta t:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
xi1 = [0, 1, 2]
t1 = range(4)
xi2 = [1, 1, 2]
t2 = range(5, 8)
data = np.vstack([xi1]*len(t1) + [xi2]*len(t2)).T
sns.heatmap(data)
Is there a way to do this more efficiently (without creating the redundant information)? I am also looking to have the right x-axis labels according to my t values.

You can use a matplotlib pcolormesh.
import matplotlib.pyplot as plt
import numpy as np
a = [[0,1],[1,1],[2,2]]
y = [0,1,2,3]
x = [0,5,8]
X,Y = np.meshgrid(x,y)
Z = np.array(a)
cmap = plt.get_cmap("RdPu", 3)
plt.pcolormesh(X,Y,Z, cmap=cmap)
plt.gca().invert_yaxis()
plt.colorbar(boundaries=np.arange(-0.5,3), ticks=np.unique(Z))
plt.show()
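The array a in this answer appears to correspond to the question's two coefficient lists, one column per time interval. This sketch shows that construction and the shape rule pcolormesh expects for cell edges; the names Z, x_edges and y_edges are illustrative, and the edge values are taken from the answer:

```python
import numpy as np

xi1 = [0, 1, 2]   # coefficients on the first interval
xi2 = [1, 1, 2]   # coefficients on the second interval

# One column per interval, one row per feature.
Z = np.column_stack([xi1, xi2])

# Interval boundaries in time and feature boundaries, as in the answer.
x_edges = np.array([0, 5, 8])
y_edges = np.arange(len(xi1) + 1)

# With edge coordinates, Z needs one row/column fewer than the edges.
shapes_match = Z.shape == (len(y_edges) - 1, len(x_edges) - 1)
```

Each coefficient is stored once per interval, which avoids the repeated columns of the heatmap approach.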

Matplotlib Plot Lines with Colors Through Colormap

I am plotting multiple lines on a single plot and I want them to run through the spectrum of a colormap, not just the same 6 or 7 colors. The code is akin to this:
for i in range(20):
    for k in range(100):
        y[k] = i*x[i]
    plt.plot(x,y)
plt.show()
Both with colormap "jet" and another that I imported from seaborn, I get the same 7 colors repeated in the same order. I would like to be able to plot up to ~60 different lines, all with different colors.

The Matplotlib colormaps accept an argument (0..1, scalar or array) which you use to get colors from a colormap. For example:
col = pl.cm.jet([0.25,0.75])
Gives you an array with (two) RGBA colors:
array([[ 0. , 0.50392157, 1. , 1. ],
[ 1. , 0.58169935, 0. , 1. ]])
You can use that to create N different colors:
import numpy as np
import matplotlib.pylab as pl
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
pl.figure()
pl.plot(x,y)
n = 20
colors = pl.cm.jet(np.linspace(0,1,n))
for i in range(n):
    pl.plot(x, i*y, color=colors[i])
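The same idea should scale to the roughly 60 lines mentioned in the question. The sketch below uses the pyplot interface and an explicit Axes; selecting the Agg backend is only an assumption so that it runs without a display and can be dropped for interactive use:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive sessions
import matplotlib.pyplot as plt

n = 60
cmap = plt.get_cmap("jet")

# Sample n evenly spaced positions of the colormap: an (n, 4) RGBA array.
colors = cmap(np.linspace(0, 1, n))

x = np.linspace(0, 2 * np.pi, 64)
fig, ax = plt.subplots()
for i in range(n):
    # Every line receives its own color from the sampled array.
    ax.plot(x, i * np.cos(x), color=colors[i])
```

Any other registered colormap name can be passed to plt.get_cmap in place of "jet".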

Bart's solution is nice and simple but has two shortcomings.
1. plt.colorbar() won't work in a nice way because the line plots aren't mappable (compared to, e.g., an image).
2. It can be slow for large numbers of lines due to the for loop (though this is maybe not a problem for most applications?).
These issues can be addressed by using LineCollection. However, this isn't too user-friendly in my (humble) opinion. There is an open suggestion on GitHub for adding a multicolor line plot function, similar to the plt.scatter(...) function.
Here is a working example I was able to hack together:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
def multiline(xs, ys, c, ax=None, **kwargs):
    """Plot lines with different colorings

    Parameters
    ----------
    xs : iterable container of x coordinates
    ys : iterable container of y coordinates
    c : iterable container of numbers mapped to colormap
    ax (optional): Axes to plot on.
    kwargs (optional): passed to LineCollection

    Notes:
        len(xs) == len(ys) == len(c) is the number of line segments
        len(xs[i]) == len(ys[i]) is the number of points for each line (indexed by i)

    Returns
    -------
    lc : LineCollection instance.
    """
    # find axes
    ax = plt.gca() if ax is None else ax
    # create LineCollection
    segments = [np.column_stack([x, y]) for x, y in zip(xs, ys)]
    lc = LineCollection(segments, **kwargs)
    # set coloring of line segments
    # Note: I get an error if I pass c as a list here... not sure why.
    lc.set_array(np.asarray(c))
    # add lines to axes and rescale
    # Note: adding a collection doesn't autoscale xlim/ylim
    ax.add_collection(lc)
    ax.autoscale()
    return lc
Here is a very simple example:
xs = [[0, 1],
[0, 1, 2]]
ys = [[0, 0],
[1, 2, 1]]
c = [0, 1]
lc = multiline(xs, ys, c, cmap='bwr', lw=2)
And something a little more sophisticated:
n_lines = 30
x = np.arange(100)
yint = np.arange(0, n_lines*10, 10)
ys = np.array([x + b for b in yint])
xs = np.array([x for i in range(n_lines)]) # could also use np.tile
colors = np.arange(n_lines)
fig, ax = plt.subplots()
lc = multiline(xs, ys, yint, cmap='bwr', lw=2)
axcb = fig.colorbar(lc)
axcb.set_label('Y-intercept')
ax.set_title('Line Collection with mapped colors')
Hope this helps!
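For the colorbar shortcoming alone, a different route than LineCollection is to keep the plotting loop and hand the colorbar a standalone ScalarMappable. This is a sketch of that alternative, not the method of the answer above; the Agg backend and the label text are assumptions:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive sessions
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

n = 20
x = np.linspace(0, 2 * np.pi, 64)
cmap = plt.get_cmap("jet")
norm = Normalize(vmin=0, vmax=n - 1)  # maps the line index to 0..1

fig, ax = plt.subplots()
for i in range(n):
    ax.plot(x, i * np.cos(x), color=cmap(norm(i)))

# The mappable carries only the norm and colormap, which is all the
# colorbar needs; the lines themselves stay ordinary Line2D objects.
sm = ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])  # some older Matplotlib versions require an array here
cbar = fig.colorbar(sm, ax=ax)
cbar.set_label("line index")
```

This does not address the speed concern, since the lines are still drawn one call at a time.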

An alternative to Bart's answer, in which you do not specify the color in each call to plt.plot, is to define a new color cycle with set_prop_cycle. His example can be translated into the following code (I've also changed the import of matplotlib to the recommended style):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
n = 20
ax = plt.axes()
ax.set_prop_cycle('color',[plt.cm.jet(i) for i in np.linspace(0, 1, n)])
for i in range(n):
    plt.plot(x, i*y)

If you are using a continuous color palette like brg, hsv, jet or the default one, then you can do this:
color = plt.cm.hsv(r) # r is 0 to 1 inclusive
Now you can pass this color value to any API you want, like this:
line = matplotlib.lines.Line2D(xdata, ydata, color=color)

This approach seems to me the most concise and user-friendly: it requires neither an explicit loop nor any user-made functions.
import numpy as np
import matplotlib.pyplot as plt
# make 5 lines
n_lines = 5
x = np.arange(0, 2).reshape(-1, 1)
A = np.linspace(0, 2, n_lines).reshape(1, -1)
Y = x @ A
# create colormap
cm = plt.cm.bwr(np.linspace(0, 1, n_lines))
# plot
ax = plt.subplot(111)
ax.set_prop_cycle('color', list(cm))
ax.plot(x, Y)
plt.show()
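The single ax.plot call works because a 2-D Y is drawn as one line per column. Here is a sketch of the shapes involved, assuming Y is meant to be the matrix product of x and A:

```python
import numpy as np

n_lines = 5
x = np.arange(0, 2).reshape(-1, 1)             # shape (2, 1): two x positions
A = np.linspace(0, 2, n_lines).reshape(1, -1)  # shape (1, 5): one slope per line

# (2, 1) @ (1, 5) gives (2, 5): column j holds the y values of line j.
Y = x @ A
```

Each column starts at 0 for x = 0 and ends at its slope for x = 1, so the five lines fan out from the origin.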

create heatmap of 3 list using matplotlib and numpy

I have 3 Python lists: x = [x0, x1, x2, ..., xn], y = [y0, y1, y2, ..., yn] and v = [v0, v1, v2, ..., vn]. I need to visualize the data by creating a heatmap in which the value v[k] is shown at coordinate (x[k], y[k]); the result could look something like the one in GnuPlot Heatmap XYZ. Due to system constraints I cannot use third-party tools other than numpy and matplotlib.
I've found some related topics (Heatmap in matplotlib with pcolor?, Generate a heatmap in MatPlotLib using a scatter data set), but they don't seem to address the same issue.

If the encoded matrix is not too large for memory, it can easily be converted to a dense numpy array using integer array indexing:
import numpy as np
import matplotlib.pyplot as plt
x = [1, 0]
y = [0, 1]
v = [2, 3]
M = np.zeros((max(x) + 1, max(y) + 1))
M[x, y] = v
fig, ax = plt.subplots()
ax.matshow(M)
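The indexing above requires x and y to be small non-negative integers. If the coordinates are arbitrary values, one possible variation is to map them to indices first. This is a sketch with made-up sample data, in which cells without a value are left as NaN:

```python
import numpy as np

x = [1.5, 0.5, 1.5]
y = [0.0, 2.0, 2.0]
v = [2.0, 3.0, 5.0]

# Sorted unique coordinates, plus the index of each point among them.
x_vals, x_idx = np.unique(x, return_inverse=True)
y_vals, y_idx = np.unique(y, return_inverse=True)

# Rows follow y, columns follow x; cells without data stay NaN.
grid = np.full((len(y_vals), len(x_vals)), np.nan)
grid[y_idx, x_idx] = v
```

The grid can be passed to ax.matshow in the same way as M, with x_vals and y_vals available for tick labels.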
