scatter plots with string arrays in matplotlib - python

This seems like it should be an easy one, but I can't figure it out. I have a pandas data frame and would like to do a 3D scatter plot with 3 of the columns. The X and Y columns are not numeric, they are strings, but I don't see why this should be a problem.
X= myDataFrame.columnX.values #string
Y= myDataFrame.columnY.values #string
Z= myDataFrame.columnZ.values #float
fig = pl.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
Isn't there an easy way to do this? Thanks.

You could use np.unique(..., return_inverse=True) to get representative ints for each string. For example,
In [117]: uniques, X = np.unique(['foo', 'baz', 'bar', 'foo', 'baz', 'bar'], return_inverse=True)
In [118]: X
Out[118]: array([2, 1, 0, 2, 1, 0])
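Note that X is an array of integer codes: each entry is the index into uniques of the corresponding string. The exact integer dtype (for example int32 or int64) depends on the platform and NumPy version.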
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.axes3d as axes3d
N = 12
arr = np.arange(N*2).reshape(N,2)
words = np.array(['foo', 'bar', 'baz', 'quux', 'corge'])
df = pd.DataFrame(words[arr % 5], columns=list('XY'))
df['Z'] = np.linspace(1, 1000, N)
Z = np.log10(df['Z'])
Xuniques, X = np.unique(df['X'], return_inverse=True)
Yuniques, Y = np.unique(df['Y'], return_inverse=True)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.scatter(X, Y, Z, s=20, c='b')
ax.set(xticks=range(len(Xuniques)), xticklabels=Xuniques,
yticks=range(len(Yuniques)), yticklabels=Yuniques)

Scatter does this automatically now (from at least matplotlib 2.1.0):
plt.scatter(['A', 'B', 'B', 'C'], [0, 1, 2, 1])

Try converting the characters to numbers for the plotting and then use the characters again for the axis labels.
Using hash
You could use the hash function for the conversion;
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
xlab = myDataFrame.columnX.values
ylab = myDataFrame.columnY.values
X = [hash(l) for l in xlab]
Y = [hash(l) for l in ylab]
Z = myDataFrame.columnZ.values  # float
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
As M4rtini has pointed out in the comments, it's not clear what the spacing/scaling of string coordinates should be; the hash function could give unexpected spacings.
Nondegenerate uniform spacing
If you wanted to have the points uniformly spaced then you would have to use a different conversion.
For example you could use
X =[i for i in range(len(xlab))]
though that would cause each point to have a unique x-position even if the label is the same, and the x and y points would be correlated if you used the same approach for Y.
Degenerate uniform spacing
A third alternative is to first get the unique members of xlab (using e.g. set) and then map each xlab to a position using the unique set for the mapping; e.g.
xmap = dict((sn, i)for i,sn in enumerate(set(xlab)))
X = [xmap[l] for l in xlab]
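One caveat with the set-based mapping is that the iteration order of a set of strings is arbitrary, so the axis positions can change from run to run. A minimal sketch of a deterministic variant, assuming sorted label order is acceptable, follows. The arrays xlab, ylab and Z here are made-up stand-ins for the data frame columns, and make_codes is a hypothetical helper name.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the two string columns and the float column.
xlab = np.array(['foo', 'bar', 'baz', 'foo', 'bar'])
ylab = np.array(['u', 'v', 'u', 'w', 'v'])
Z = np.array([1.0, 10.0, 100.0, 1000.0, 10.0])

def make_codes(labels):
    """Map each label to its index in the sorted list of unique labels."""
    uniques = sorted(set(labels))
    mapping = {name: i for i, name in enumerate(uniques)}
    return uniques, [mapping[name] for name in labels]

x_names, X = make_codes(xlab)
y_names, Y = make_codes(ylab)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X, Y, np.log10(Z), s=20, c='b')
# Put the original strings back on the axes as tick labels.
ax.set_xticks(range(len(x_names)))
ax.set_xticklabels(x_names)
ax.set_yticks(range(len(y_names)))
ax.set_yticklabels(y_names)
```

Call plt.show() afterwards to display the figure. Identical labels share one position, and the positions are evenly spaced.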


Animate points with matplotlib

I want to have 10 moving points. I used the code below. I'm experimenting with matplotlib which I don't know very well.
from matplotlib import pyplot as plt
import numpy as np
from matplotlib import animation
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# second option - move the point position at every frame
def update_point(n, x, y, z, point):
    point.set_data(np.array([x[n], y[n]]))
    point.set_3d_properties(z[n], 'z')
    return point
def x(i):
    return np.cos(t*i)
for i in range(10):
    t=np.arange(0, 2*np.pi, 2*np.pi/100)
    point, = ax.plot([x(i)[0]], [y[0]], [z[0]], 'o')
    ani=animation.FuncAnimation(fig, update_point, 99, fargs=(x(i), y, z, point))
ax.set_xlim([-1.5, 1.5])
ax.set_ylim([-1.5, 1.5])
ax.set_zlim([-1.5, 1.5])
I hoped that if I turn x to a function of i, then I will have 10 points in the for loop, but nothing happened. Only one point is moving. What am I doing wrong?
For a start, you create your animation object ani inside the loop, so not only the point data but also the animation object is overwritten on every iteration, and only the last one survives. For ease of use, let's put the data points into numpy arrays, where rows represent the time and columns the different points you want to animate. Then, we calculate the x, y, and z arrays based on the t array (for aesthetics, a seamless loop along the columns with length 2*pi, with each column shifted so that the points are equally distributed) and simply update the x, y, and z data row-wise in each animation step. Closely related to your script, this would look like:
from matplotlib import pyplot as plt
import numpy as np
from matplotlib import animation
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
num_of_points = 7
num_of_frames = 50
t=np.linspace(0, 2*np.pi, num_of_frames, endpoint=False)[:, None] + np.linspace(0, 2*np.pi, num_of_points, endpoint=False)[None, :]
# trajectories derived from t (assumed here; their definitions are missing from this snippet)
x = np.cos(t)
y = np.sin(t)
z = np.sin(t)*np.cos(t)
points, = ax.plot([], [], [], 'o')
def update_points(n):
    points.set_data(np.array([x[n, :], y[n, :]]))
    points.set_3d_properties(z[n, :], 'z')
    return points,
ax.set_xlim([-1.5, 1.5])
ax.set_ylim([-1.5, 1.5])
ax.set_zlim([-1.5, 1.5])
ani=animation.FuncAnimation(fig, update_points, num_of_frames, interval=10, blit=True, repeat=True)
plt.show()
As you chose to animate line plots (these are animated markers without visible lines, scatter plots are different in structure), you cannot use different colors unless you plot each point separately. On the plus side, you can use blitting to make the animation faster.
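If different colors are needed, plotting each point separately could look roughly like the sketch below. It keeps the rows-are-time, columns-are-points layout; the trajectories x, y, z and the viridis colormap are assumptions chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

num_of_points = 4
num_of_frames = 50

# Rows are time steps, columns are points.
t = (np.linspace(0, 2*np.pi, num_of_frames, endpoint=False)[:, None]
     + np.linspace(0, 2*np.pi, num_of_points, endpoint=False)[None, :])
# Assumed trajectories, for illustration only.
x, y, z = np.cos(t), np.sin(t), np.sin(2*t)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.set_xlim([-1.5, 1.5])
ax.set_ylim([-1.5, 1.5])
ax.set_zlim([-1.5, 1.5])

# One line artist per point, each with its own color taken from a colormap.
colors = plt.cm.viridis(np.linspace(0, 1, num_of_points))
artists = [ax.plot([x[0, k]], [y[0, k]], [z[0, k]], 'o', color=colors[k])[0]
           for k in range(num_of_points)]

def update_points(n):
    # Move every artist to its position in frame n.
    for k, artist in enumerate(artists):
        artist.set_data([x[n, k]], [y[n, k]])
        artist.set_3d_properties([z[n, k]], 'z')
    return artists

ani = animation.FuncAnimation(fig, update_points, num_of_frames,
                              interval=20, blit=True, repeat=True)
```

Call plt.show() to run the animation. The price is one artist per point, so this scales less well than the single-artist version when there are many points.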
And another point regarding your code - I suggest not using np.arange() with a float step, because floating-point rounding can change whether the last value near the endpoint is included. Use np.linspace() instead. By default, the endpoint is included, but in this script we set endpoint=False, so that time point [0] is the next step in the 2*pi cycle after time point [-1].
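A small sketch of the difference, assuming 100 frames per cycle: np.linspace takes the number of samples explicitly, whereas np.arange derives it from a floating-point division.

```python
import numpy as np

num_of_frames = 100
step = 2 * np.pi / num_of_frames

# linspace: the number of samples is given explicitly, so rounding cannot change it.
t_lin = np.linspace(0, 2 * np.pi, num_of_frames, endpoint=False)

# arange: the number of samples comes from (stop - start) / step,
# a floating-point division whose result may round up or down.
t_ara = np.arange(0, 2 * np.pi, step)

print(len(t_lin), len(t_ara))

# Seamless loop: one more step after the last sample lands on 2*pi,
# which is equivalent to starting again at t_lin[0].
print(np.isclose(t_lin[-1] + step, 2 * np.pi))
```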
For different point characteristics, you just have to fill your arrays differently. As I said, each consists of columns for each point and rows for the different time points:
from matplotlib import pyplot as plt
import numpy as np
from matplotlib import animation
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
num_of_points = 4
num_of_frames = 100
#different rotation frequencies
t = np.linspace(0, 2*np.pi, num_of_frames, endpoint=False)[:, None] * np.arange(1, num_of_points+1)
#different x-y centers
x = np.cos(t) + np.asarray([0, 4, 0, 3])
y = np.sin(t) + np.asarray([0, 0, 5, 2])
#different heights
z = np.zeros(num_of_frames)[:, None] + np.arange(num_of_points)
#point 4 gets random altitude fluctuations
z[:, 3] += np.random.random(num_of_frames)/5
points, = ax.plot([], [], [], 'o')
def update_points(n):
    points.set_data(np.array([x[n, :], y[n, :]]))
    points.set_3d_properties(z[n, :], 'z')
    return points,
ax.set_xlim([x.min()-0.5, x.max()+0.5])
ax.set_ylim([y.min()-0.5, y.max()+0.5])
ax.set_zlim([z.min()-0.5, z.max()+0.5])
ani=animation.FuncAnimation(fig, update_points, num_of_frames, interval=20, blit=True, repeat=True)
As the time information is derived from the row number, you could also forget the t helper array and fill directly the x, y, and z arrays with the desired or random data as the following example shows. However, for an animation, you have to ensure smooth transitions between states, so incremental changes along axis 0 are essential.
num_of_points = 4
num_of_frames = 100
#random walk
x = np.random.random((num_of_frames, num_of_points))-0.4
y = np.random.random((num_of_frames, num_of_points))-0.3
z = np.random.random((num_of_frames, num_of_points))-0.5
x[:] = x.cumsum(axis=0)
y[:] = y.cumsum(axis=0)
z[:] = z.cumsum(axis=0)
points, = ax.plot([], [], [], 'o')

3d plot from two vectors and an array

I have two vectors that store my X, Y values, of lengths 81 and 105, and then a (81,105) array (actually a list of lists) that stores my Z values for those X, Y. What would be the best way to plot this in 3d? This is what I've tried:
Z = np.load('Z.npy')
X = np.load('X.npy')
Y = np.linspace(0, 5, 105)
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap= 'viridis')
I get the following error : ValueError: shape mismatch: objects cannot be broadcast to a single shape
OK, I got it running. There are some tricks here; I mention them in the code comments.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Produce some data.
x = np.linspace(0,1,81)
y = np.linspace(0,1,105)
z = [[i for i in range(81)] for _ in range(105)]
array_z = np.array(z)  # shape (105, 81): rows follow y, columns follow x
# Match each (x, y) pair with its z value.
data = []
for i in range(len(x)):
    for j in range(len(y)):
        # Be careful how your data is stored in your Z array.
        data.append([x[i], y[j], array_z[j][i]])
# Store in a DataFrame.
results = pd.DataFrame(data, columns = ['x','y','z'])
# Plot the data.
fig = plt.figure(figsize=(6,6))
ax = fig.add_subplot(111, projection='3d')
# cmap only has an effect when c is given, so color the points by z.
ax.scatter(results.x, results.y, results.z, c=results.z, cmap= 'viridis')
The picture looks weird because I produced some data. Hope it helps.
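If a surface rather than a scatter plot is wanted, the shape mismatch from the question can probably be resolved by expanding the two vectors into 2D grids with np.meshgrid, since plot_surface needs X, Y and Z to have the same 2D shape. A minimal sketch, with made-up data standing in for the contents of X.npy and Z.npy:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the loaded data: X has length 81, Y has length 105.
X = np.linspace(0, 1, 81)
Y = np.linspace(0, 5, 105)
Z = np.outer(np.sin(2 * np.pi * X), np.cos(Y))  # shape (81, 105)

# indexing='ij' gives grids of shape (len(X), len(Y)), matching Z's layout.
XX, YY = np.meshgrid(X, Y, indexing='ij')

fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(XX, YY, Z, cmap='viridis')
```

With the default indexing='xy' the grids would have shape (105, 81) instead, and Z would have to be transposed to match.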

Matplotlib Plot Lines with Colors Through Colormap

I am plotting multiple lines on a single plot and I want them to run through the spectrum of a colormap, not just the same 6 or 7 colors. The code is akin to this:
for i in range(20):
    for k in range(100):
        y[k] = i*x[i]
Both with colormap "jet" and another that I imported from seaborn, I get the same 7 colors repeated in the same order. I would like to be able to plot up to ~60 different lines, all with different colors.
The Matplotlib colormaps accept an argument (0..1, scalar or array) which you use to get colors from a colormap. For example:
col = pl.cm.jet([0.25,0.75])
Gives you an array with (two) RGBA colors:
array([[ 0. , 0.50392157, 1. , 1. ],
[ 1. , 0.58169935, 0. , 1. ]])
You can use that to create N different colors:
import numpy as np
import matplotlib.pylab as pl
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
n = 20
colors = pl.cm.jet(np.linspace(0,1,n))
for i in range(n):
pl.plot(x, i*y, color=colors[i])
Bart's solution is nice and simple but has two shortcomings.
plt.colorbar() won't work in a nice way because the line plots aren't mappable (compared to, e.g., an image)
It can be slow for large numbers of lines due to the for loop (though this is maybe not a problem for most applications?)
These issues can be addressed by using LineCollection. However, this isn't too user-friendly in my (humble) opinion. There is an open suggestion on GitHub for adding a multicolor line plot function, similar to the plt.scatter(...) function.
Here is a working example I was able to hack together
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
def multiline(xs, ys, c, ax=None, **kwargs):
    """Plot lines with different colorings

    Parameters
    ----------
    xs : iterable container of x coordinates
    ys : iterable container of y coordinates
    c : iterable container of numbers mapped to colormap
    ax (optional): Axes to plot on.
    kwargs (optional): passed to LineCollection

    Notes:
        len(xs) == len(ys) == len(c) is the number of line segments
        len(xs[i]) == len(ys[i]) is the number of points for each line (indexed by i)

    Returns
    -------
    lc : LineCollection instance.
    """
    # find axes
    ax = plt.gca() if ax is None else ax
    # create LineCollection
    segments = [np.column_stack([x, y]) for x, y in zip(xs, ys)]
    lc = LineCollection(segments, **kwargs)
    # set coloring of line segments
    # Note: I get an error if I pass c as a list here... not sure why.
    lc.set_array(np.asarray(c))
    # add lines to axes and rescale
    # Note: adding a collection doesn't autoscale xlim/ylim
    ax.add_collection(lc)
    ax.autoscale()
    return lc
Here is a very simple example:
xs = [[0, 1],
[0, 1, 2]]
ys = [[0, 0],
[1, 2, 1]]
c = [0, 1]
lc = multiline(xs, ys, c, cmap='bwr', lw=2)
And something a little more sophisticated:
n_lines = 30
x = np.arange(100)
yint = np.arange(0, n_lines*10, 10)
ys = np.array([x + b for b in yint])
xs = np.array([x for i in range(n_lines)]) # could also use np.tile
colors = np.arange(n_lines)
fig, ax = plt.subplots()
lc = multiline(xs, ys, yint, cmap='bwr', lw=2)
axcb = fig.colorbar(lc)
ax.set_title('Line Collection with mapped colors')
Hope this helps!
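If only the colorbar is missing and the plotting loop from Bart's answer is acceptable, a lighter option may be to hand the colorbar a standalone ScalarMappable that carries the same norm and colormap as the lines. This is a sketch under that assumption; the 'line index' label is just illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
n = 20

cmap = plt.cm.jet
norm = Normalize(vmin=0, vmax=n - 1)  # maps the line index 0..n-1 onto 0..1

fig, ax = plt.subplots()
for i in range(n):
    ax.plot(x, i*y, color=cmap(norm(i)))

# The ScalarMappable holds the norm and colormap, so colorbar() can use it
# even though the lines themselves are not mappable.
sm = ScalarMappable(norm=norm, cmap=cmap)
cbar = fig.colorbar(sm, ax=ax)
cbar.set_label('line index')
```

This does not address the speed of the loop; for many lines the LineCollection approach above remains the better fit.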
An alternative to Bart's answer, in which you do not specify the color in each call to plt.plot, is to define a new color cycle with set_prop_cycle. His example can be translated into the following code (I've also changed the import of matplotlib to the recommended style):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
n = 20
ax = plt.axes()
ax.set_prop_cycle('color',[plt.cm.jet(i) for i in np.linspace(0, 1, n)])
for i in range(n):
plt.plot(x, i*y)
If you are using continuous color palettes like brg, hsv, jet or the default one, then you can do this:
color = plt.cm.hsv(r) # r is 0 to 1 inclusive; hsv is one example, any of the colormaps above works the same way
Now you can pass this color value to any API you want like this:
line = matplotlib.lines.Line2D(xdata, ydata, color=color)
This approach seems to me like the most concise, user-friendly and does not require a loop to be used. It does not rely on user-made functions either.
import numpy as np
import matplotlib.pyplot as plt
# make 5 lines
n_lines = 5
x = np.arange(0, 2).reshape(-1, 1)
A = np.linspace(0, 2, n_lines).reshape(1, -1)
Y = x @ A
# create colormap (the colormap name is missing from this snippet; jet is assumed here)
cm = plt.cm.jet(np.linspace(0, 1, n_lines))
# plot
ax = plt.subplot(111)
ax.set_prop_cycle('color', list(cm))
ax.plot(x, Y)

how to plot legend for scatter points without reorganising array?

Points with labels are usually presented in X, y form:
X is a multi-dimensional array, and y is the label/class of each point in X.
What I want to do:
import matplotlib.pyplot as plt
import numpy as np
X = [[0,1],[1,2],[2,3],[3,4]]
X = np.array(X)
y = np.array([0,0,1,2])
myCmap = np.array(['r', 'g', 'b'])
myLabelMap = np.array(['car', 'bicycle', 'plane'])
plt.scatter(X[:, 0], X[:, 1], color=myCmap[y], label=myLabelMap[y])
plt.legend(loc='upper right')
However, this messes up the legend: as you can see in the legend, it shows the labels of all points together instead of one entry per class.
Is there a way to solve this without splitting X into different arrays?
First you find out the unique labels, and the points they refer to. You then plot those points with the labels, and the others without labels:
import matplotlib.pyplot as plt
import numpy as np
X = [[0,1],[1,2],[2,3],[3,4]]
X = np.array(X)
y = np.array([0,0,1,2])
myCmap = np.array(['r', 'g', 'b'])
myLabelMap = np.array(['car', 'bicycle', 'plane'])
y_unique, id_unique = np.unique(y, return_index=True)
X_unique = X[id_unique]
X = np.asarray(X, dtype=float)
# plot one representative point per class, with its label
for j, yj in enumerate(y_unique):
    plt.scatter(X_unique[j, 0], X_unique[j, 1], color=myCmap[yj], label=myLabelMap[yj])
# blank out the points already plotted, then plot the rest without labels
X[id_unique] = np.nan
plt.scatter(X[:, 0], X[:, 1], color=myCmap[y])
plt.legend(loc='upper right')
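Another way to get one legend entry per class, without touching X at all, is to plot all points in a single call and build the legend from proxy artists. This is a sketch, not the only option; the Line2D proxies hold no data and exist only to describe each class in the legend.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

X = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
y = np.array([0, 0, 1, 2])
myCmap = np.array(['r', 'g', 'b'])
myLabelMap = np.array(['car', 'bicycle', 'plane'])

fig, ax = plt.subplots()
# All points in one call, colored by class, with no labels.
ax.scatter(X[:, 0], X[:, 1], color=myCmap[y])

# One empty proxy artist per class that actually occurs in y.
handles = [Line2D([], [], marker='o', linestyle='',
                  color=myCmap[k], label=myLabelMap[k])
           for k in np.unique(y)]
legend = ax.legend(handles=handles, loc='upper right')
```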

Interactively changing the alpha value of matplotlib plots

I've looked at the documentation, but I can't seem to figure out if this is possible -
I have a dataset, with x and y values and discrete z values. Multiple pairs of (x,y) share the same z value. What I want to do is when I mouseover one point with a particular z value, the alpha of all the points with the same z values goes to 1 - i.e., If all the alpha values are initially 0.5, I'd like only the points with the same z value to go to 1.
Here's a minimal working example to illustrate what I'm talking about :
#! /usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt
x = np.random.randn(100)
y = np.random.randn(100)
z = np.arange(0, 10, 1)
z = np.repeat(z, 10)
im = plt.scatter(x, y, c=z, alpha = 0.5)
You can probably fake what you want to achieve using a second plot:
import numpy as np
import matplotlib.pyplot as plt
Z = np.zeros(1000, dtype = [("Z", int), ("P", float, 2)])
Z["P"] = np.random.uniform(0.0,1.0,(len(Z),2))
Z["Z"] = np.random.randint(0,50,len(Z))
def on_pick(event):
    z = Z[event.ind[0]]['Z']
    P = Z[np.where(Z["Z"] == z)]["P"]
    # show all points sharing this z value in the overlay plot
    selection_plot.set_data(P[:,0], P[:,1])
    plt.draw()
fig = plt.figure(figsize=(10,10), facecolor='white')
fig.canvas.mpl_connect('pick_event', on_pick)
ax = plt.subplot(111, aspect=1)
ax.plot(Z['P'][:,0], Z['P'][:,1], 'o', color='k', alpha=0.1, picker=True, pickradius=5)
selection_plot, = ax.plot([],[], 'o', color='black', alpha=1.0, zorder=10)
plt.show()
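The answer above reacts to clicks. For the mouseover behavior asked for in the question, one possible sketch is to listen for motion_notify_event and edit the alpha channel of precomputed RGBA colors. The viridis colormap and the helper name highlight are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = rng.standard_normal(100)
z = np.repeat(np.arange(10), 10)

# Precompute RGBA colors so the alpha channel can be edited per point.
rgba = plt.cm.viridis(z / z.max())
rgba[:, 3] = 0.5

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=rgba)

def highlight(index):
    """Make every point sharing z[index] opaque; reset the others to 0.5."""
    rgba[:, 3] = np.where(z == z[index], 1.0, 0.5)
    sc.set_facecolors(rgba)
    fig.canvas.draw_idle()

def on_move(event):
    # contains() reports which scatter points lie under the cursor.
    hit, info = sc.contains(event)
    if hit:
        highlight(info['ind'][0])

fig.canvas.mpl_connect('motion_notify_event', on_move)
```

Call plt.show() to try it interactively. The highlight stays until another point is hovered; resetting it when the cursor leaves all points would need an extra branch in on_move.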

