I'm working with data that has 3 plotting parameters: x, y, c. How do you create a custom color value for a scatter plot?
Extending this example I'm trying to do:
import matplotlib
import matplotlib.pyplot as plt
cm = matplotlib.cm.get_cmap('RdYlBu')
colors=[cm(1.*i/20) for i in range(20)]
xy = range(20)
plt.subplot(111)
colorlist = [colors[x // 2] for x in xy]  # actually some other non-linear relationship
plt.scatter(xy, xy, c=colorlist, s=35, vmin=0, vmax=20)
plt.colorbar()
plt.show()
but the result is TypeError: You must first set_array for mappable
From the matplotlib docs on scatter:
cmap is only used if c is an array of floats
So colorlist needs to be a list of floats rather than a list of tuples as you have it now.
plt.colorbar() wants a mappable object, like the collection that plt.scatter() returns (a PathCollection in current matplotlib; older versions returned a CircleCollection).
vmin and vmax can then control the limits of your colorbar. Things outside vmin/vmax get the colors of the endpoints.
How does this work for you?
import matplotlib.pyplot as plt
cm = plt.get_cmap('RdYlBu')  # plt.cm.get_cmap is no longer available in recent matplotlib
xy = range(20)
z = xy
sc = plt.scatter(xy, xy, c=z, vmin=0, vmax=20, s=35, cmap=cm)
plt.colorbar(sc)
plt.show()
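To see the vmin/vmax behavior described above in isolation, here is a minimal sketch. The values in z are made up for illustration, and the Agg backend is selected only so the snippet runs without a display. It checks that values outside [vmin, vmax] are drawn with the endpoint colors.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical color values: two of them fall outside [vmin, vmax] = [0, 20]
z = np.array([-5.0, 0.0, 10.0, 20.0, 30.0])
xy = np.arange(len(z))

fig, ax = plt.subplots()
sc = ax.scatter(xy, xy, c=z, cmap="RdYlBu", vmin=0, vmax=20, s=35)
fig.colorbar(sc, ax=ax)

# RGBA colors the mappable assigns to each value
rgba = sc.to_rgba(z)
below_matches_vmin = np.allclose(rgba[0], rgba[1])  # -5 is drawn like 0
above_matches_vmax = np.allclose(rgba[4], rgba[3])  # 30 is drawn like 20
```

If the out-of-range points should stand out instead, the colormap's under/over colors can be customized, but that is beyond this sketch.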
Here is the OOP way of adding a colorbar:
fig, ax = plt.subplots()
im = ax.scatter(x, y, c=c)
fig.colorbar(im, ax=ax)
If you're looking to scatter by two variables and color by the third, Altair can be a great choice.
Creating the dataset
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(40*np.random.randn(10, 3), columns=['A', 'B','C'])
Altair plot
import altair as alt
# configure_cell comes from early Altair releases; newer versions set the size with .properties()
alt.Chart(df).mark_circle().encode(x='A', y='B', color='C').properties(width=200, height=150)
Suppose I draw a plot using the code below. How can I draw the rug part along the top edge of the plot instead of on the x-axis?
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(np.random.normal(0, 0.1, 100), rug=True, hist=False)
plt.show()
seaborn.rugplot creates a LineCollection whose line lengths are defined in axes coordinates. Those are always the same, so the plot does not change if you invert the axes.
You can create your own LineCollection from the data though. The advantage compared to using bars is that the linewidth is in points, and therefore no lines will be lost independent of the data range.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns
def upper_rugplot(data, height=.05, ax=None, **kwargs):
    from matplotlib.collections import LineCollection
    ax = ax or plt.gca()
    kwargs.setdefault("linewidth", 1)
    segs = np.stack((np.c_[data, data],
                     np.c_[np.ones_like(data), np.ones_like(data)-height]),
                    axis=-1)
    lc = LineCollection(segs, transform=ax.get_xaxis_transform(), **kwargs)
    ax.add_collection(lc)
fig, ax = plt.subplots()
data = np.random.normal(0, 0.1, 100)
sns.distplot(data, rug=False, hist=False, ax=ax)
upper_rugplot(data, ax=ax)
plt.show()
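The reason the rug stays at the top is the blended transform returned by ax.get_xaxis_transform(): x is interpreted in data coordinates and y in axes coordinates (0 is the bottom, 1 is the top). The following sketch, with arbitrary axis limits chosen for illustration, checks this and shows that inverting the y-axis does not move a point placed at axes y = 1.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.set_xlim(-1, 1)
ax.set_ylim(0, 5)

blend = ax.get_xaxis_transform()  # x in data coordinates, y in axes coordinates

# Pixel position of the point at data x = 0.5, axes y = 1.0 (top edge)
px, py = blend.transform((0.5, 1.0))

# The same pixel coordinates computed from the two pure transforms
px_data = ax.transData.transform((0.5, 0.0))[0]
py_axes = ax.transAxes.transform((0.0, 1.0))[1]
x_matches = np.isclose(px, px_data)
y_matches = np.isclose(py, py_axes)

# Inverting the y-axis changes data coordinates but not axes coordinates
ax.set_ylim(5, 0)
py_after = blend.transform((0.5, 1.0))[1]
top_unchanged = np.isclose(py, py_after)
```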
Rugs are just thin lines at the data points, so you can think of them as thin bars. With that in mind, here is a workaround: plot the distplot without rugs, then create a twin axes sharing the x-axis and plot a bar chart with thin bars on it. The following is a working example:
import numpy as np; np.random.seed(21)
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots()
data = np.random.normal(0, 0.1, 100)
sns.distplot(data, rug=False, hist=False, ax=ax)
ax1 = ax.twinx()
ax1.bar(data, height=ax.get_ylim()[1]/10, width=0.001)
ax1.set_ylim(ax.get_ylim())
ax1.invert_yaxis()
ax1.set_yticks([])
plt.show()
Stupid way to plot a scatter plot
Suppose I have data with 3 classes. The following code gives me a perfect graph with a correct legend, in which I plot the data class by class.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs()
X0 = X[y==0]
X1 = X[y==1]
X2 = X[y==2]
ax = plt.subplot(1,1,1)
ax.scatter(X0[:,0],X0[:,1], lw=0, s=40)
ax.scatter(X1[:,0],X1[:,1], lw=0, s=40)
ax.scatter(X2[:,0],X2[:,1], lw=0, s=40)
ax.legend(['0','1','2'])
Better way to plot a scatter plot
However, if I have a dataset with 3000 classes, the above method doesn't work anymore. (You wouldn't expect me to write 3000 lines, one for each class, right?)
So I came up with the following plotting code.
num_classes = len(set(y))
palette = np.array(sns.color_palette("hls", num_classes))
ax = plt.subplot(1,1,1)
ax.scatter(X[:,0], X[:,1], lw=0, s=40, c=palette[y.astype(int)])  # np.int was removed from NumPy; the builtin int works
ax.legend(['0','1','2'])
This code is neat: we can plot all the classes with only one line. However, the legend is not shown correctly this time.
Question
How can I keep a correct legend when plotting with the following?
ax.scatter(X[:,0], X[:,1], lw=0, s=40, c=palette[y.astype(int)])
plt.legend() works best when you have multiple "artists" on the plot. That is the case in your first example which is why calling plt.legend(labels) works effortlessly.
If you are worried about writing lots of lines of code then you can take advantage of for loops.
As we can see with this example using 5 classes:
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs(centers=5)
ax = plt.subplot(1,1,1)
for c in np.unique(y):
    ax.scatter(X[y==c,0],X[y==c,1],label=c)
ax.legend()
np.unique() returns a sorted array of the unique elements of y. By looping through these and plotting each class with its own artist, plt.legend() can easily provide a legend.
Edit:
You can also assign labels to the plots as you make them which is probably safer.
plt.scatter(..., label=c) followed by plt.legend()
Why not simply do the following?
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs()
ngroups = 3
ax = plt.subplot(1, 1, 1)
for i in range(ngroups):
    ax.scatter(X[y==i][:,0], X[y==i][:,1], lw=0, s=40, label=i)
ax.legend()
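If you would rather keep the single scatter call, one possible alternative (assuming matplotlib 3.1 or newer) is PathCollection.legend_elements(), which builds one proxy handle per distinct value. This is only a sketch: the data below is a hypothetical NumPy stand-in for make_blobs, and the approach relies on passing the class labels as numbers through c together with a cmap, not as explicit RGB colors from a palette.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for make_blobs: 3 classes, 30 points each
y = np.repeat(np.arange(3), 30)
X = rng.normal(size=(y.size, 2)) + 4 * y[:, None]

fig, ax = plt.subplots()
# Pass the numeric labels as c so the collection keeps them as mappable data
sc = ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis", lw=0, s=40)

# One proxy handle per distinct value in c
handles, labels = sc.legend_elements()
ax.legend(handles, labels, title="class")
```

With thousands of classes a legend stops being readable either way, so a colorbar is probably the better fit there.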
I've tried the other threads, but can't work out how to solve. I'm attempting to create a discrete colorbar. Much of the code appears to be working, a discrete bar does appear, but the labels are wrong and it throws the error: "No mappable was found to use for colorbar creation. First define a mappable such as an image (with imshow) or a contour set (with contourf)."
Pretty sure the error is because I'm missing an argument in plt.colorbar, but not sure what it's asking for or how to define it.
Below is what I have. Any thoughts gratefully received:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
norm = mpl.colors.BoundaryNorm(np.arange(-0.5,4), cmap.N)
ex2 = sample_data.plot.scatter(x='order_count', y='total_value',c='cluster', marker='+', ax=ax, cmap='plasma', norm=norm, s=100, edgecolor ='none', alpha=0.70)
plt.colorbar(ticks=np.linspace(0,3,4))
plt.show()
Indeed, the first argument to colorbar should be a ScalarMappable, which would be the scatter plot PathCollection itself.
Setup
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"x" : np.linspace(0,1,20),
"y" : np.linspace(0,1,20),
"cluster" : np.tile(np.arange(4),5)})
cmap = mpl.colors.ListedColormap(["navy", "crimson", "limegreen", "gold"])
norm = mpl.colors.BoundaryNorm(np.arange(-0.5,4), cmap.N)
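As a quick check of what this setup does, the sketch below repeats the cmap and norm definitions and inspects how cluster ids map to colors. The bin edges -0.5, 0.5, ..., 3.5 put each integer id in the middle of its own bin, so id k picks the k-th listed color.

```python
import numpy as np
import matplotlib as mpl

cmap = mpl.colors.ListedColormap(["navy", "crimson", "limegreen", "gold"])
# Bin edges -0.5, 0.5, 1.5, 2.5, 3.5: one bin centered on each cluster id
norm = mpl.colors.BoundaryNorm(np.arange(-0.5, 4), cmap.N)

clusters = np.array([0, 1, 2, 3])
# Color index that the norm assigns to each cluster id
color_index = np.asarray(norm(clusters)).tolist()

# Color assigned to cluster 2
rgba_cluster2 = cmap(norm(2))
is_limegreen = np.allclose(rgba_cluster2, mpl.colors.to_rgba("limegreen"))
```

This is also why ticks=np.linspace(0,3,4) lands on the center of each color block in the colorbar.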
Pandas plotting
The problem is that pandas does not provide you access to this ScalarMappable directly. So one can catch it from the list of collections in the axes, which is easy if there is only one single collection present: ax.collections[0].
fig, ax = plt.subplots()
df.plot.scatter(x='x', y='y', c='cluster', marker='+', ax=ax,
cmap=cmap, norm=norm, s=100, edgecolor ='none', alpha=0.70, colorbar=False)
fig.colorbar(ax.collections[0], ticks=np.linspace(0,3,4))
plt.show()
Matplotlib plotting
One could consider using matplotlib directly to plot the scatter in which case you would directly use the return of the scatter function as argument to colorbar.
fig, ax = plt.subplots()
scatter = ax.scatter(x='x', y='y', c='cluster', marker='+', data=df,
cmap=cmap, norm=norm, s=100, edgecolor ='none', alpha=0.70)
fig.colorbar(scatter, ticks=np.linspace(0,3,4))
plt.show()
Output in both cases is identical.
When I draw a pcolormesh plot using a colormap from matplotlib.cm (like "jet", "Set2", etc.), I can use:
cMap = plt.cm.get_cmap("jet",lut=6)
The colorbar shows like this:
But if I want to use a colormap from the Basemap package (like GMT_drywet, GMT_no_green, etc.), I can't use plt.cm.get_cmap to get these colormaps and divide them into discrete levels.
Does mpl_toolkits.basemap.cm have something similar to the lut argument?
Expanding on @tacaswell's comment above, you can achieve the same functionality using the _resample method. This produces segmented colormaps for pcolor/pcolormesh plots, which, unlike contourf, do not generate discrete-stepped colorbars on their own. To achieve the same effect as you did with jet in your question:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import cm
plt.figure()
cmap = cm.GMT_drywet._resample(6)
pm = plt.pcolormesh(np.random.rand(10,8), cmap=cmap)
plt.colorbar(pm, orientation='horizontal')
plt.show()
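Note that _resample is a private method. In newer matplotlib (assuming version 3.6 or later) colormap objects expose a public resampled() method that does the same job. The sketch below uses "jet" as a stand-in so it runs without Basemap installed; it is an assumption here that a Basemap colormap such as cm.GMT_drywet, being an ordinary matplotlib colormap object, can be resampled the same way.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in colormap; swap in a Basemap colormap if Basemap is available
base = matplotlib.colormaps["jet"]
cmap6 = base.resampled(6)  # public counterpart of _resample(6)

fig, ax = plt.subplots()
pm = ax.pcolormesh(np.random.rand(10, 8), cmap=cmap6)
fig.colorbar(pm, ax=ax, orientation="horizontal")

n_levels = cmap6.N  # number of discrete colors in the resampled map
```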
As long as the plot you are making has discrete color values (e.g. contour or contourf), then colorbar should automatically generate a colorbar with discrete steps. Here's a plot based on the first example from the basemap documentation:
from mpl_toolkits.basemap import Basemap, cm
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1, 1)
# ax.hold(True) is no longer needed: hold was removed in matplotlib 3.0 and axes always keep existing artists
map = Basemap(projection='ortho',lat_0=45,lon_0=-100,resolution='l')
map.drawcoastlines(linewidth=0.25)
map.drawcountries(linewidth=0.25)
map.fillcontinents(color='coral',lake_color='aqua')
map.drawmapboundary(fill_color='aqua')
map.drawmeridians(np.arange(0,360,30))
map.drawparallels(np.arange(-90,90,30))
nlats = 73; nlons = 145; delta = 2.*np.pi/(nlons-1)
lats = (0.5*np.pi-delta*np.indices((nlats,nlons))[0,:,:])
lons = (delta*np.indices((nlats,nlons))[1,:,:])
wave = 0.75*(np.sin(2.*lats)**8*np.cos(4.*lons))
mean = 0.5*np.cos(2.*lats)*((np.sin(2.*lats))**2 + 2.)
x, y = map(lons*180./np.pi, lats*180./np.pi)
map.contourf(x,y,wave+mean,15, alpha=0.5, cmap=cm.GMT_drywet)
cb = map.colorbar()
plt.show()
I have a range of points x and y stored in numpy arrays.
Those represent x(t) and y(t) where t=0...T-1
I am plotting a scatter plot using
import matplotlib.pyplot as plt
plt.scatter(x,y)
plt.show()
I would like to have a colormap representing the time (therefore coloring the points depending on the index in the numpy arrays)
What is the easiest way to do so?
Here is an example
import numpy as np
import matplotlib.pyplot as plt
x = np.random.rand(100)
y = np.random.rand(100)
t = np.arange(100)
plt.scatter(x, y, c=t)
plt.show()
Here you are setting the color based on the index, t, which is just an array of [0, 1, ..., 99].
Perhaps an easier-to-understand example is the slightly simpler
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(100)
y = x
t = x
plt.scatter(x, y, c=t)
plt.show()
Note that the array you pass as c doesn't need to have any particular order or type, i.e. it doesn't need to be sorted or integers as in these examples. The plotting routine will scale the colormap such that the minimum/maximum values in c correspond to the bottom/top of the colormap.
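The automatic scaling described above can be checked directly on the object that scatter returns. This is a small sketch with made-up, unsorted, non-integer values; the Agg backend is chosen only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Unsorted, non-integer color values (hypothetical data)
t = np.array([7.5, 3.0, 42.0, 10.0])
x = np.arange(len(t))

fig, ax = plt.subplots()
sc = ax.scatter(x, x, c=t, cmap="viridis")

# Color limits picked automatically from the data
vmin, vmax = sc.get_clim()

# The smallest value gets the bottom of the colormap, the largest the top
cmap = sc.get_cmap()
rgba = sc.to_rgba(t)
lowest_is_bottom = np.allclose(rgba[1], cmap(0.0))
highest_is_top = np.allclose(rgba[2], cmap(1.0))
```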
Colormaps
You can change the colormap by adding
import matplotlib.cm as cm
plt.scatter(x, y, c=t, cmap=cm.cmap_name)
Importing matplotlib.cm is optional, as you can just as well pass the colormap by name as cmap="cmap_name". There is a reference page of colormaps showing what each looks like. Also note that you can reverse a colormap by appending _r to its name, i.e. cmap_name_r. So either
plt.scatter(x, y, c=t, cmap=cm.cmap_name_r)
# or
plt.scatter(x, y, c=t, cmap="cmap_name_r")
will work. Examples are "jet_r" or cm.plasma_r. Here's an example with the viridis colormap (new in matplotlib 1.5):
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(100)
y = x
t = x
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(x, y, c=t, cmap='viridis')
ax2.scatter(x, y, c=t, cmap='viridis_r')
plt.show()
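To make the effect of the _r suffix concrete, this short sketch compares the endpoints of viridis and viridis_r. It also uses Colormap.reversed(), which builds the reversed map from a colormap object when you do not have its name at hand.

```python
import matplotlib.pyplot as plt
import numpy as np

viridis = plt.get_cmap("viridis")
viridis_r = plt.get_cmap("viridis_r")

# The bottom of the reversed map is the top of the original, and vice versa
bottom_matches_top = np.allclose(viridis_r(0.0), viridis(1.0))
top_matches_bottom = np.allclose(viridis_r(1.0), viridis(0.0))

# reversed() gives the same colors as the registered "_r" variant
same_as_method = np.allclose(viridis.reversed()(0.25), viridis_r(0.25))
```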
Colorbars
You can add a colorbar by using
plt.scatter(x, y, c=t, cmap='viridis')
plt.colorbar()
plt.show()
Note that if you are using figures and subplots explicitly (e.g. fig, ax = plt.subplots() or ax = fig.add_subplot(111)), adding a colorbar can be a bit more involved. Good examples can be found here for a single subplot colorbar and here for 2 subplots 1 colorbar.
To add to wflynny's answer above, you can find the available colormaps here
Example:
import matplotlib.cm as cm
plt.scatter(x, y, c=t, cmap=cm.jet)
or alternatively,
plt.scatter(x, y, c=t, cmap='jet')
Subplot Colorbar
For subplots with scatter, you can trick a colorbar onto your axes by building the "mappable" with the help of a secondary figure and then adding it to your original plot.
As a continuation of the above example:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(10)
y = x
t = x
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(x, y, c=t, cmap='viridis')
ax2.scatter(x, y, c=t, cmap='viridis_r')
# Build your secondary mirror axes:
fig2, (ax3, ax4) = plt.subplots(1, 2)
# Build maps that parallel the color-coded data
# NOTE 1: imshow requires a 2-D array as input
# NOTE 2: You must use the same cmap tag as above for it to match
map1 = ax3.imshow(np.stack([t, t]),cmap='viridis')
map2 = ax4.imshow(np.stack([t, t]),cmap='viridis_r')
# Add your maps onto your original figure/axes
fig.colorbar(map1, ax=ax1)
fig.colorbar(map2, ax=ax2)
plt.show()
Note that you will also output a secondary figure that you can ignore.
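As a possible alternative that avoids the secondary figure, the object returned by ax.scatter is itself a mappable and can be passed straight to fig.colorbar. A minimal sketch using the same data as the example above (the Agg backend is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
y = x
t = x

fig, (ax1, ax2) = plt.subplots(1, 2)
# ax.scatter returns a PathCollection, which is itself a mappable
sc1 = ax1.scatter(x, y, c=t, cmap='viridis')
sc2 = ax2.scatter(x, y, c=t, cmap='viridis_r')

# Attach one colorbar per subplot directly
cb1 = fig.colorbar(sc1, ax=ax1)
cb2 = fig.colorbar(sc2, ax=ax2)

n_axes = len(fig.axes)  # two plot axes plus two colorbar axes
```

Because each colorbar is tied to the real scatter collection, later changes to its color limits or colormap are reflected in the colorbar as well.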
Single colorbar for multiple subplots
Sometimes it is preferable to have a single colorbar to indicate data values visualised on multiple subplots.
In this case, a Normalize() object needs to be created using the minimum and maximum data values across both plots.
Then a colorbar object can be created from a ScalarMappable() object, which maps between scalar values and colors.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(10)
y = x
t1 = x # Colour data for first plot
t2 = 2*x # Color data for second plot
all_data = np.concatenate([t1, t2])
# Create a custom Normalize object using the min and max data values across both subplots to ensure colors are consistent on both plots
norm = plt.Normalize(np.min(all_data), np.max(all_data))
fig, axs = plt.subplots(1, 2)
axs[0].scatter(x, y, c=t1, cmap='viridis', norm=norm)
axs[1].scatter(x**2, y, c=t2, cmap='viridis', norm=norm)
# Create the colorbar
smap = plt.cm.ScalarMappable(cmap='viridis', norm=norm)
cbar = fig.colorbar(smap, ax=axs, fraction=0.1, shrink = 0.8)
cbar.ax.tick_params(labelsize=11)
cbar.ax.set_ylabel('T', rotation=0, labelpad = 15, fontdict = {"size":14})
plt.show()