Pandas scatter plot TypeError - python

I am trying to plot points in a shapefile using geopandas and I keep encountering
TypeError: You must first set_array for mappable
whenever I run the code below. This error disappears when I remove the colormap attribute. But I want to change the color of my points and I think colormap is helpful for that.
Here's my code:
import matplotlib.pyplot as plt
import geopandas
shapefile = geopandas.GeoDataFrame.from_file('file.shp')
fig, ax = plt.subplots(1)
base = shapefile.plot(ax=ax)
df.plot.scatter('Long', 'Lat', c=df['colC'], s=df['colD'], alpha=0.7, ax=base, colormap='viridis')

You might try direct calls to matplotlib. I don't have your dataset to try this with, but try the following:
from operator import itemgetter
import geopandas
import matplotlib.pyplot as plt
shapefile = geopandas.GeoDataFrame.from_file('file.shp')
fig, ax = plt.subplots()
shapefile.plot(ax=ax)
x, y, c, s = itemgetter('Long', 'Lat', 'colC', 'colD')(df)
ax.scatter(x, y, c=c, s=s, cmap='viridis')

Related

Python Scatter plot not working with "None" points

Say I create three lists:
x=[1,2,3]
y=[4,5,6]
z=[1,None,4]
How can I scatter these lists and include only the points that have numbers (i.e. exclude the "None" point)? My code won't produce a scatter plot when I include these lists (however, when I use a number instead of "None" it works):
from mpl_toolkits import mplot3d
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
%matplotlib notebook
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c='r', marker='o')
plt.show()
You can do
import numpy as np
and replace your None with np.nan. Points containing np.nan will not be plotted in your scatter plot. See this matplotlib doc for more information.
If you have long lists containing None, you can perform the conversion via
array_containing_nans = np.array(list_containing_nones, dtype=float)
You can use numpy.nan instead of None:
import numpy as np
z=[1,None,4]
z_numpy = np.asarray(z, dtype=np.float32)
....
ax.scatter(x, y, z_numpy, c='r', marker='o')
You should use NaN instead of None; the two are not the same thing. A NaN is a float.
Minimal example
import numpy as np
import matplotlib.pyplot as plt
x=[1,2,3]
y=[4,5,6]
z=[1,np.nan,4]
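fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')  # 3D axes, so z is a coordinate and not the marker size
ax.scatter(x, y, z)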
plt.show()
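If replacing the values is not desirable, another option is to drop the incomplete points before plotting. This is only a minimal sketch, assuming the three lists from the question; the helper name `drop_missing` is made up for illustration.

```python
import numpy as np

def drop_missing(x, y, z):
    """Return x, y, z as float arrays without the points that have a missing value."""
    # dtype=float turns None into np.nan
    xa, ya, za = (np.asarray(v, dtype=float) for v in (x, y, z))
    # keep a point only if all three of its coordinates are finite
    keep = np.isfinite(xa) & np.isfinite(ya) & np.isfinite(za)
    return xa[keep], ya[keep], za[keep]

x = [1, 2, 3]
y = [4, 5, 6]
z = [1, None, 4]
xs, ys, zs = drop_missing(x, y, z)
print(xs, ys, zs)
```

The filtered arrays can then be passed to the 3D axes from the question, e.g. ax.scatter(xs, ys, zs, c='r', marker='o').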

Showing a correct legend when doing scatter plot with palette

Stupid way to plot a scatter plot
Suppose I have data with 3 classes. The following code gives me a perfect graph with a correct legend, in which I plot the data class by class.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs()
X0 = X[y==0]
X1 = X[y==1]
X2 = X[y==2]
ax = plt.subplot(1,1,1)
ax.scatter(X0[:,0],X0[:,1], lw=0, s=40)
ax.scatter(X1[:,0],X1[:,1], lw=0, s=40)
ax.scatter(X2[:,0],X2[:,1], lw=0, s=40)
ax.legend(['0','1','2'])
Better way to plot a scatter plot
However, if I have a dataset with 3000 classes, the above method doesn't work anymore. (You wouldn't expect me to write 3000 lines, one for each class, right?)
So I came up with the following plotting code.
num_classes = len(set(y))
palette = np.array(sns.color_palette("hls", num_classes))
ax = plt.subplot(1,1,1)
ax.scatter(X[:,0], X[:,1], lw=0, s=40, c=palette[y.astype(int)])  # np.int was removed from NumPy; the builtin int works
ax.legend(['0','1','2'])
This code is perfect: we can plot all the classes with only 1 line. However, the legend is not shown correctly this time.
Question
How can I maintain a correct legend when plotting with the following?
ax.scatter(X[:,0], X[:,1], lw=0, s=40, c=palette[y.astype(int)])
plt.legend() works best when you have multiple "artists" on the plot. That is the case in your first example which is why calling plt.legend(labels) works effortlessly.
If you are worried about writing lots of lines of code then you can take advantage of for loops.
As we can see with this example using 5 classes:
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs(centers=5)
ax = plt.subplot(1,1,1)
for c in np.unique(y):
    ax.scatter(X[y==c,0],X[y==c,1],label=c)
ax.legend()
np.unique() returns a sorted array of the unique elements of y. By looping through these and plotting each class with its own artist, plt.legend() can easily provide a legend.
Edit:
You can also assign labels to the plots as you make them, which is probably safer:
plt.scatter(..., label=c) followed by plt.legend()
Why not simply do the following?
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs()
ngroups = 3
ax = plt.subplot(1, 1, 1)
for i in range(ngroups):
    ax.scatter(X[y==i][:,0], X[y==i][:,1], lw=0, s=40, label=i)
ax.legend()
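If a single scatter call is still preferred, the collection returned by scatter can build the legend entries itself through legend_elements(), which should be available in matplotlib 3.1 and later. The following is a rough sketch under assumptions: the data is synthetic (NumPy instead of make_blobs) and the "tab10" colormap is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)                        # three synthetic class labels
X = rng.normal(size=(y.size, 2)) + 3 * y[:, None]   # shift each class so the blobs separate

fig, ax = plt.subplots()
# one call for all classes; the numeric labels are mapped to colours through cmap
sc = ax.scatter(X[:, 0], X[:, 1], c=y, cmap="tab10", lw=0, s=40)

# legend_elements() returns one handle and one label per distinct value in c
handles, labels = sc.legend_elements()
ax.legend(handles, labels, title="class")
```

Call plt.show() afterwards to display the figure. With thousands of classes a legend is unlikely to be readable either way, so a colorbar may be the more practical choice there.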

Create a discrete colorbar in matplotlib

I've tried the other threads, but can't work out how to solve. I'm attempting to create a discrete colorbar. Much of the code appears to be working, a discrete bar does appear, but the labels are wrong and it throws the error: "No mappable was found to use for colorbar creation. First define a mappable such as an image (with imshow) or a contour set (with contourf)."
Pretty sure the error is because I'm missing an argument in plt.colorbar, but not sure what it's asking for or how to define it.
Below is what I have. Any thoughts gratefully received:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
norm = mpl.colors.BoundaryNorm(np.arange(-0.5,4), cmap.N)
ex2 = sample_data.plot.scatter(x='order_count', y='total_value',c='cluster', marker='+', ax=ax, cmap='plasma', norm=norm, s=100, edgecolor ='none', alpha=0.70)
plt.colorbar(ticks=np.linspace(0,3,4))
plt.show()
Indeed, the first argument to colorbar should be a ScalarMappable, which would be the scatter plot PathCollection itself.
Setup
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"x" : np.linspace(0,1,20),
"y" : np.linspace(0,1,20),
"cluster" : np.tile(np.arange(4),5)})
cmap = mpl.colors.ListedColormap(["navy", "crimson", "limegreen", "gold"])
norm = mpl.colors.BoundaryNorm(np.arange(-0.5,4), cmap.N)
Pandas plotting
The problem is that pandas does not provide you access to this ScalarMappable directly. So one can catch it from the list of collections in the axes, which is easy if there is only one single collection present: ax.collections[0].
fig, ax = plt.subplots()
df.plot.scatter(x='x', y='y', c='cluster', marker='+', ax=ax,
cmap=cmap, norm=norm, s=100, edgecolor ='none', alpha=0.70, colorbar=False)
fig.colorbar(ax.collections[0], ticks=np.linspace(0,3,4))
plt.show()
Matplotlib plotting
One could consider using matplotlib directly to plot the scatter in which case you would directly use the return of the scatter function as argument to colorbar.
fig, ax = plt.subplots()
scatter = ax.scatter(x='x', y='y', c='cluster', marker='+', data=df,
cmap=cmap, norm=norm, s=100, edgecolor ='none', alpha=0.70)
fig.colorbar(scatter, ticks=np.linspace(0,3,4))
plt.show()
Output in both cases is identical.
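To see why the boundaries in the setup are np.arange(-0.5, 4), it can help to look at what the norm does to the cluster ids on its own. This is a small sketch reusing the colormap from the setup above; the names cluster_ids and color_index are just for illustration.

```python
import numpy as np
import matplotlib.colors as mcolors

cmap = mcolors.ListedColormap(["navy", "crimson", "limegreen", "gold"])
# bin edges halfway between the integers, so each cluster id sits in the middle of its own bin
boundaries = np.arange(-0.5, 4)
norm = mcolors.BoundaryNorm(boundaries, cmap.N)

cluster_ids = np.array([0, 1, 2, 3])
# calling the norm gives the index of the colormap entry used for each id
color_index = norm(cluster_ids)
print(boundaries.tolist())
print([int(i) for i in color_index])
```

Because every integer lies at the centre of a bin, ticks placed at np.linspace(0,3,4) end up centred on the colour blocks of the colorbar.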

mpldatacursor Scatter Plot point colour information

I have a scatter plot with a colour scaling where each plotted point is associated with another value. This is a lazy workaround to make a "contour plot" style image without having to regularise data points. To make analysis easier I am using mpldatacursor to generate interactive annotations on the plot, and I have a custom formatter which is displaying co-ordinates just fine:
datacursor(scatter,
formatter='$T=${x:.2f}$^\circ$C\n$I=${y:.2f}$\,$mA\n$\Delta F=$$\,$THz'.format,
draggable=True)
but what I really want is for that third line, $\Delta F=$$\,$THz, to include a statement that returns the value associated with the colour map at that point. Does anyone know what kwargs I should use to achieve this?
EDIT: MWE
from mpldatacursor import datacursor
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
scatter = ax.scatter(np.random.random(100),
np.random.random(100),
c=np.random.random(100),
s=0.5)
cb = plt.colorbar(scatter, label="Colour")
datacursor(scatter,
formatter='$T=${x:.2f}$^\circ$C\n$I=${y:.2f}$\,$mA\n$\Delta F=$$\,$THz'.format,
draggable=True)
You will need to convert the index of the picked point to the value to be shown. Therefore the scatter's colors should be publicly available, such that the ind of the pick_event can index it and return the value at the picked point.
from mpldatacursor import datacursor
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
x = np.random.random(100)
y = np.random.random(100)
c = np.random.random(100)
scatter = ax.scatter(x, y, c=c, s=1)
cb = plt.colorbar(scatter, label="Colour")
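def fmt(**dic):
    # backslashes are doubled so Python does not read \c, \, and \D as escape sequences
    tx = '$T=${x:.2f}$^\\circ$C\n$I=${y:.2f}$\\,$mA\n$\\Delta F=${z:.2f}$\\,$THz'
    # look up the colour value of the picked point by its index
    dic.update({"z" : c[dic["ind"][0]]})
    return tx.format(**dic)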
datacursor(scatter, formatter=fmt, draggable=True)
plt.show()
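If keeping a separate c array around is inconvenient, the values can also be read back from the scatter itself, since the collection stores the data passed as c. Here is a minimal sketch without mpldatacursor; value_at is a hypothetical helper, and the index would come from the picked point (the ind entry used above).

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y, c = rng.random(100), rng.random(100), rng.random(100)

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, c=c, s=1)

def value_at(collection, ind):
    """Return the colour-mapped value of point number `ind` in a scatter collection."""
    # get_array() returns the data that was passed as c=
    return float(collection.get_array()[ind])

print(value_at(scatter, 5))
```

Inside a formatter this would replace the lookup in c, for example value_at(scatter, dic["ind"][0]).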

matplotlib colorbar for scatter

I'm working with data that has 3 plotting parameters: x, y, c. How do you create a custom color value for a scatter plot?
Extending this example I'm trying to do:
import matplotlib
import matplotlib.pyplot as plt
cm = matplotlib.cm.get_cmap('RdYlBu')
colors=[cm(1.*i/20) for i in range(20)]
xy = range(20)
plt.subplot(111)
colorlist=[colors[x//2] for x in xy] #actually some other non-linear relationship
plt.scatter(xy, xy, c=colorlist, s=35, vmin=0, vmax=20)
plt.colorbar()
plt.show()
but the result is TypeError: You must first set_array for mappable
From the matplotlib docs on scatter:
cmap is only used if c is an array of floats
So colorlist needs to be a list of floats rather than a list of tuples as you have it now.
plt.colorbar() wants a mappable object, like the PathCollection that plt.scatter() returns.
vmin and vmax can then control the limits of your colorbar. Things outside vmin/vmax get the colors of the endpoints.
How does this work for you?
import matplotlib.pyplot as plt
cm = plt.get_cmap('RdYlBu')  # plt.cm.get_cmap is no longer available in recent matplotlib releases
xy = range(20)
z = xy
sc = plt.scatter(xy, xy, c=z, vmin=0, vmax=20, s=35, cmap=cm)
plt.colorbar(sc)
plt.show()
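The difference between the two kinds of c can be checked on the returned collections. The sketch below assumes a reasonably recent matplotlib and only compares the two cases side by side: with explicit RGBA tuples no data array is attached to the collection, which is what the colorbar complains about, while plain numbers are stored and mapped through the colormap.

```python
import matplotlib.pyplot as plt

xy = list(range(20))
cmap = plt.get_cmap("RdYlBu")

fig, (ax1, ax2) = plt.subplots(1, 2)

# explicit RGBA tuples: the colours are fixed, so there is nothing for a colorbar to map
rgba = [cmap(i / 20) for i in xy]
sc_rgba = ax1.scatter(xy, xy, c=rgba, s=35)

# plain numbers: matplotlib stores them and maps them through cmap
sc_num = ax2.scatter(xy, xy, c=xy, cmap=cmap, vmin=0, vmax=20, s=35)
fig.colorbar(sc_num, ax=ax2)

print(sc_rgba.get_array() is None)
print(sc_num.get_array() is None)
```

For a non-linear relationship like the one mentioned in the question, the numeric values passed as c can be transformed before plotting, while the colormap and colorbar handling stay the same.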
Here is the OOP way of adding a colorbar:
fig, ax = plt.subplots()
im = ax.scatter(x, y, c=c)
fig.colorbar(im, ax=ax)
If you're looking to scatter by two variables and color by the third, Altair can be a great choice.
Creating the dataset
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(40*np.random.randn(10, 3), columns=['A', 'B','C'])
Altair plot
import altair as alt
alt.Chart(df).mark_circle().encode(x='A', y='B', color='C').properties(width=200, height=150)  # configure_cell only exists in old Altair 1.x
