Python Scatter plot not working with "None" points - python

Say I create three lists:
x=[1,2,3]
y=[4,5,6]
z=[1,None,4]
How can I scatter these and include only the points that have numbers (i.e. exclude the "None" point)? My code won't produce a scatter plot when I use these lists (it does work when I put a number in place of "None"):
from mpl_toolkits import mplot3d
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
%matplotlib notebook
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z, c='r', marker='o')
plt.show()

You can do
import numpy as np
and replace your None with np.nan. The points containing np.nan will not be plotted in your scatter plot. See this matplotlib doc for more information.
If you have long lists containing None, you can perform the conversion via
array_containing_nans = np.array(list_containing_nones, dtype=float)
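As a quick check of the conversion above, here is a minimal sketch. The name z_float is illustrative and not part of the question's code.

```python
import numpy as np

z = [1, None, 4]

# dtype=float makes NumPy turn each None into nan;
# without it the result would be an object array that still holds None
z_float = np.array(z, dtype=float)

print(z_float)
print(np.isnan(z_float))  # boolean array marking the former None entries
```

The resulting array can be passed to ax.scatter in place of the original list.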

You can use numpy.nan instead of None:
import numpy as np
z=[1,None,4]
z_numpy = np.asarray(z, dtype=np.float32)
....
ax.scatter(x, y, z_numpy, c='r', marker='o')

You should use NaN instead of None; they are not the same thing. NaN is a float.
Minimal example
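import numpy as np
import matplotlib.pyplot as plt
x = [1, 2, 3]
y = [4, 5, 6]
z = [1, np.nan, 4]
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# the point whose z value is NaN is not drawn
# (a plain plt.scatter(x, y, z) would treat z as marker sizes, not as a third axis)
ax.scatter(x, y, z)
plt.show()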
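An alternative to relying on NaN handling is to drop the incomplete points before plotting. This is a different technique from the answers above, sketched here under the assumption that all three lists have the same length; the names valid, x_kept, y_kept and z_kept are made up for illustration.

```python
import numpy as np

x = [1, 2, 3]
y = [4, 5, 6]
z = [1, None, 4]

x_arr = np.asarray(x, dtype=float)
y_arr = np.asarray(y, dtype=float)
z_arr = np.asarray(z, dtype=float)  # None becomes nan

# True where z holds a real number
valid = ~np.isnan(z_arr)

# keep only the complete (x, y, z) triples
x_kept, y_kept, z_kept = x_arr[valid], y_arr[valid], z_arr[valid]
# these can then be passed to ax.scatter(x_kept, y_kept, z_kept)
```

Filtering explicitly can be handy when the same cleaned data is reused for something other than plotting.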

Related

Showing a correct legend when doing scatter plot with palette

Stupid way to plot a scatter plot
Suppose I have a data with 3 classes, the following code can give me a perfect graph with a correct legend, in which I plot out data class by class.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs()
X0 = X[y==0]
X1 = X[y==1]
X2 = X[y==2]
ax = plt.subplot(1,1,1)
ax.scatter(X0[:,0],X0[:,1], lw=0, s=40)
ax.scatter(X1[:,0],X1[:,1], lw=0, s=40)
ax.scatter(X2[:,0],X2[:,1], lw=0, s=40)
ax.legend(['0','1','2'])
Better way to plot a scatter plot
However, if I have a dataset with 3000 classes, the above method doesn't work anymore. (You wouldn't expect me to write 3000 lines, one for each class, right?)
So I come up with the following plotting code.
num_classes = len(set(y))
palette = np.array(sns.color_palette("hls", num_classes))
ax = plt.subplot(1,1,1)
ax.scatter(X[:,0], X[:,1], lw=0, s=40, c=palette[y.astype(int)])
ax.legend(['0','1','2'])
This code is perfect: we can plot all the classes with only one line. However, the legend is not shown correctly this time.
Question
How to maintain a correct legend when we plot graphs by using the following?
ax.scatter(X[:,0], X[:,1], lw=0, s=40, c=palette[y.astype(int)])
plt.legend() works best when you have multiple "artists" on the plot. That is the case in your first example which is why calling plt.legend(labels) works effortlessly.
If you are worried about writing lots of lines of code then you can take advantage of for loops.
As we can see with this example using 5 classes:
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs(centers=5)
ax = plt.subplot(1,1,1)
for c in np.unique(y):
    ax.scatter(X[y==c,0],X[y==c,1],label=c)
ax.legend()
np.unique() returns a sorted array of the unique elements of y. By looping through these and plotting each class with its own artist, plt.legend() can easily provide a legend.
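To make the role of np.unique() and the y==c mask concrete, here is a small sketch with made-up labels (the array below is an assumption, not the make_blobs output):

```python
import numpy as np

# made-up class labels standing in for the y returned by make_blobs
y = np.array([2, 0, 1, 2, 0, 1, 1])

classes = np.unique(y)  # sorted unique labels
for c in classes:
    mask = (y == c)  # boolean mask selecting the samples of class c
    print(c, mask.sum())  # label and number of samples in that class
```

Each mask picks out the rows of X that belong to one class, which is what gives every class its own scatter artist in the loop above.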
Edit:
You can also assign labels to the plots as you make them which is probably safer.
plt.scatter(..., label=c) followed by plt.legend()
Why not simply do the following?
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs
import numpy as np
X, y = make_blobs()
ngroups = 3
ax = plt.subplot(1, 1, 1)
for i in range(ngroups):
    ax.scatter(X[y==i][:,0], X[y==i][:,1], lw=0, s=40, label=i)
ax.legend()
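If you want to keep a single scatter call, one possible approach (assuming matplotlib 3.1 or newer) is the legend_elements() method of the collection that scatter returns. The sketch below uses synthetic data in place of make_blobs and passes the numeric labels to c with a colormap; as far as I know this does not work when c is a list of explicit colours such as palette[y], because the collection then has no data array to build the legend from.

```python
import numpy as np
import matplotlib.pyplot as plt

# synthetic stand-in for make_blobs: 3 classes, 20 points each (assumed data)
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 20)
X = rng.normal(size=(60, 2)) + y[:, None] * 3.0

fig, ax = plt.subplots()
sc = ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis", lw=0, s=40)

# legend_elements() builds handle/label pairs from the values passed as c
handles, labels = sc.legend_elements()
ax.legend(handles, labels, title="class")
# call plt.show() to display the figure
```

With a very large number of classes the legend itself becomes unreadable, so this is mostly useful for a handful of classes.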

Use Seaborn to plot 1D time series as a line with marginal histogram along y-axis

I'm trying to recreate the broad features of the following figure:
(from E.M. Ozbudak, M. Thattai, I. Kurtser, A.D. Grossman, and A. van Oudenaarden, Nat Genet 31, 69 (2002))
seaborn.jointplot does most of what I need, but it seemingly can't use a line plot, and there's no obvious way to hide the histogram along the x-axis. Is there a way to get jointplot to do what I need? Barring that, is there some other reasonably simple way to create this kind of plot using Seaborn?
Here is a way to create roughly the same plot as shown in the question. You can share the axes between the two subplots and make the width-ratio asymmetric.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
x = np.linspace(0,8, 300)
y = np.tanh(x)+np.random.randn(len(x))*0.08
fig, (ax, axhist) = plt.subplots(ncols=2, sharey=True,
gridspec_kw={"width_ratios" : [3,1], "wspace" : 0})
ax.plot(x,y, color="k")
ax.plot(x,np.tanh(x), color="k")
axhist.hist(y, bins=32, ec="k", fc="none", orientation="horizontal")
axhist.tick_params(axis="y", left=False)
plt.show()
It turns out that you can produce a modified jointplot with the needed characteristics by working directly with the underlying JointGrid object:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
x = np.linspace(0,8, 300)
y = (1 - np.exp(-x*5))*.5
ynoise= y + np.random.randn(len(x))*0.08
grid = sns.JointGrid(x=x, y=ynoise, ratio=3)  # keyword arguments are required by recent seaborn versions
grid.plot_joint(plt.plot)
grid.ax_joint.plot(x, y, c='C0')
plt.sca(grid.ax_marg_y)
sns.distplot(grid.y, kde=False, vertical=True)
# override a bunch of the default JointGrid style options
grid.fig.set_size_inches(10,6)
grid.ax_marg_x.remove()
grid.ax_joint.spines['top'].set_visible(True)
Output:
You can use ax_marg_x.patches to affect the outcome.
Here, I use it to turn the x-axis plot white so that it cannot be seen (although the margin for it remains):
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="white", color_codes=True)
x, y = np.random.multivariate_normal([2, 3], [[0.3, 0], [0, 0.5]], 1000).T
# stat_func was removed from newer seaborn releases, so it is not passed here
g = sns.jointplot(x=x, y=y, kind="hex", marginal_kws={'color': 'green'})
plt.setp(g.ax_marg_x.patches, color="w")
plt.show()
Output:

mpldatacursor Scatter Plot point colour information

I have a scatter plot with a colour scaling where each plotted point is associated with another value. This is a lazy workaround to make a "contour plot" style image without having to regularise data points. To make analysis easier I am using mpldatacursor to generate interactive annotations on the plot, and I have a custom formatter which is displaying co-ordinates just fine:
datacursor(scatter,
formatter='$T=${x:.2f}$^\\circ$C\n$I=${y:.2f}$\\,$mA\n$\\Delta F=$$\\,$THz'.format,
draggable=True)
but what I really want is for that third line, $\Delta F=$$\,$THz, to include a statement that returns the value associated with the colour map at that point. Does anyone know what kwargs I should use to achieve this?
EDIT: MWE
from mpldatacursor import datacursor
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
scatter = ax.scatter(np.random.random(100),
np.random.random(100),
c=np.random.random(100),
s=0.5)
cb = plt.colorbar(scatter, label="Colour")
datacursor(scatter,
formatter='$T=${x:.2f}$^\\circ$C\n$I=${y:.2f}$\\,$mA\n$\\Delta F=$$\\,$THz'.format,
draggable=True)
You will need to convert the index of the picked point to the value to be shown. Therefore the scatter's colors should be publicly available, such that the ind of the pick_event can index it and return the value at the picked point.
from mpldatacursor import datacursor
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111)
x = np.random.random(100)
y = np.random.random(100)
c = np.random.random(100)
scatter = ax.scatter(x, y, c=c, s=1)
cb = plt.colorbar(scatter, label="Colour")
def fmt(**dic):
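    # LaTeX backslashes are doubled so Python does not treat them as escape sequences
    tx = '$T=${x:.2f}$^\\circ$C\n$I=${y:.2f}$\\,$mA\n$\\Delta F=${z:.2f}$\\,$THz'
    # look up the colour value of the first picked point
    dic.update({"z" : c[dic["ind"][0]]})
    return tx.format(**dic)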
datacursor(scatter, formatter=fmt, draggable=True)
plt.show()
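The same index-to-value lookup can be sketched with matplotlib's own pick_event, without mpldatacursor. This only prints the text instead of drawing an annotation; describe_point and on_pick are hypothetical helper names, and the data is random as in the MWE.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y, c = rng.random(100), rng.random(100), rng.random(100)

def describe_point(ind, x, y, c):
    """Return a text label for the first picked index (hypothetical helper)."""
    i = ind[0]
    return f"T={x[i]:.2f} C, I={y[i]:.2f} mA, dF={c[i]:.2f} THz"

def on_pick(event):
    # event.ind lists the indices of the scatter points under the cursor
    print(describe_point(event.ind, x, y, c))

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, c=c, s=10, picker=True)  # picker=True enables pick events
fig.colorbar(scatter, label="Colour")
fig.canvas.mpl_connect("pick_event", on_pick)
# call plt.show() to interact with the figure
```

The key point is the same as in the answer: keep the colour array c around so that the picked index can be used to look up the value.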

matplotlib colorbar for scatter

I'm working with data that has 3 plotting parameters: x, y, c. How do you create a custom color value for a scatter plot?
Extending this example I'm trying to do:
import matplotlib
import matplotlib.pyplot as plt
cm = plt.get_cmap('RdYlBu')  # matplotlib.cm.get_cmap was removed in recent matplotlib versions
colors=[cm(1.*i/20) for i in range(20)]
xy = range(20)
plt.subplot(111)
colorlist=[colors[x//2] for x in xy] #actually some other non-linear relationship
plt.scatter(xy, xy, c=colorlist, s=35, vmin=0, vmax=20)
plt.colorbar()
plt.show()
but the result is TypeError: You must first set_array for mappable
From the matplotlib docs on scatter:
cmap is only used if c is an array of floats
So colorlist needs to be a list of floats rather than a list of tuples as you have it now.
plt.colorbar() wants a mappable object, like the PathCollection that plt.scatter() returns.
vmin and vmax can then control the limits of your colorbar. Things outside vmin/vmax get the colors of the endpoints.
How does this work for you?
import matplotlib.pyplot as plt
cm = plt.get_cmap('RdYlBu')
xy = range(20)
z = xy
sc = plt.scatter(xy, xy, c=z, vmin=0, vmax=20, s=35, cmap=cm)
plt.colorbar(sc)
plt.show()
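The remark about vmin and vmax in this answer can be checked directly on the colormap. The sketch below assumes the default behaviour, where no separate under/over colours have been set on the colormap; the names inside, below and above are illustrative.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap("RdYlBu")
norm = Normalize(vmin=0, vmax=20)  # same limits as in the scatter call

inside = cmap(norm(10))  # value within the limits
below = cmap(norm(-5))   # value under vmin
above = cmap(norm(25))   # value over vmax

# out-of-range values are compared with the endpoint colours
print(below == cmap(0.0))
print(above == cmap(1.0))
```

Each call returns an RGBA tuple, so the comparisons show whether out-of-range values share the colour of the nearest end of the colorbar.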
Here is the OOP way of adding a colorbar:
fig, ax = plt.subplots()
im = ax.scatter(x, y, c=c)
fig.colorbar(im, ax=ax)
If you're looking to scatter by two variables and color by the third, Altair can be a great choice.
Creating the dataset
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(40*np.random.randn(10, 3), columns=['A', 'B','C'])
Altair plot
import altair as alt
# properties(width=..., height=...) replaces configure_cell from early Altair versions
alt.Chart(df).mark_circle().encode(x='A', y='B', color='C').properties(width=200, height=150)
Plot

Python : How to plot 3d graphs using Python?

I am using matplotlib for doing this
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')  # Axes3D(fig) no longer adds itself to the figure in recent matplotlib
x = [6,3,6,9,12,24]
y = [3,5,78,12,23,56]
ax.plot(x, y, zs=0, zdir='z', label='zs=0, zdir=z')
plt.show()
Now this builds a graph that is horizontal in the 3d space. How do I make the graph vertical so that it faces the user?
What I want to do is build multiple such vertical graphs that are separated by some distance and are facing the user.
bp's answer might work fine, but there's a much simpler way.
Your current graph is 'flat' on the z-axis, which is why it's horizontal. You want it to be vertical, which means that you want it to be 'flat' on the y-axis. This involves the tiniest modification to your code:
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')  # Axes3D(fig) no longer adds itself to the figure in recent matplotlib
x = [6,3,6,9,12,24]
y = [3,5,78,12,23,56]
# put 0s on the y-axis, and put the y axis on the z-axis
ax.plot(xs=x, ys=[0]*len(x), zs=y, zdir='z', label='ys=0, zdir=z')
plt.show()
Then you can easily have multiple such graphs by using different values for the ys parameter (for example, ys=[2]*len(x) instead would put the graph slightly behind).
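As a sketch of that idea, the loop below draws the same curve at three depths. The offsets 0, 2 and 4 are arbitrary values chosen for illustration; in practice each graph would have its own data.

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection on older matplotlib)

x = [6, 3, 6, 9, 12, 24]
y = [3, 5, 78, 12, 23, 56]

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

# each offset places a copy of the curve further back along the y-axis
for offset in (0, 2, 4):
    ax.plot(x, [offset] * len(x), y, label=f"ys={offset}")

ax.set_xlabel("x")
ax.set_ylabel("depth")
ax.set_zlabel("value")
ax.legend()
# call plt.show() to display the figure
```

The data values go on the z-axis, so every curve stands upright, and the constant ys value controls how far back it sits.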
Mayavi, in particular the mlab module, provides powerful 3D plotting that will work on large and/or complex data, and should be easy to use on numpy arrays.
You can set the view angle of the 3d plot with the view_init() function. The example below is for version 1.1 of matplotlib.
from mpl_toolkits.mplot3d import axes3d
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = [6,3,6,9,12,24]
y = [3,5,78,12,23,56]
ax.plot(x, y, zs=0, zdir='z', label='zs=0, zdir=z')
ax.view_init(90, -90)
plt.show()
According to the documentation you want to use the ax.plot_surface(x,y,z) method. More information and chart types here.
The following should work:
import numpy as np
import matplotlib.pyplot as plt
x = [1, 2, 3]
y = [4, 5, 6]
z = [7, 8, 9]
# map the data onto a regular integer grid that includes max(x) and max(y)
X, Y = np.meshgrid(np.arange(0, max(x) + 1), np.arange(0, max(y) + 1))
Z = np.zeros(X.shape, dtype=np.float32)
for x_, y_, z_ in zip(x, y, z):
    # rows index y and columns index x; this works only because x and y are
    # integers and the grid starts at 0 with a step of 1
    Z[y_, x_] = z_
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)
plt.show()
