How to make a scatter plot with different variables - python

Let's say I have an equation y=a*x, where a=[1,2,3,4] and x,y each have a set of values.
I get that for a simple x vs y plot plt.scatter(x,y) is enough, but how can I make a scatter plot of x vs y for each a?

This creates one axes object per value of a and then draws a scatter plot on each of them.
I imported numpy so that the lists, which are now numpy arrays, can be multiplied element-wise.
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 4])
y = np.array([4, 5, 2])
a = np.array([1, 5, 2])
# create one axes object per value of a
fig, axis = plt.subplots(1, len(a))
for i in range(len(a)):
    axis[i].scatter(x, y * a[i])
plt.show()
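If a single figure is preferred over a row of subplots, a minimal sketch along the following lines may help. It assumes the relation y = a*x from the question and draws every value of a on the same axes, one color per value. The helper name scatter_for_each_a is made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np


def scatter_for_each_a(x, a_values):
    """Scatter x against y = a * x on shared axes, one color per a."""
    fig, ax = plt.subplots()
    for a in a_values:
        # each call adds a new point collection with the next default color
        ax.scatter(x, a * x, label=f"a = {a}")
    ax.set_xlabel("x")
    ax.set_ylabel("y = a * x")
    ax.legend()
    return fig, ax


x = np.array([1, 2, 4])
fig, ax = scatter_for_each_a(x, [1, 2, 3, 4])
```

Calling plt.show() afterwards displays the figure. The legend is what tells the four point sets apart here, so it is worth keeping even in a quick plot.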


What are the required data for pyplot trisurf?

I cannot figure out how pyplot trisurf works. All the examples I have seen on the Internet use numpy, pandas and other libraries, which makes the tool itself harder to understand.
Pyplot docs say it requires X, Y and Z as 1D arrays. But if I try to provide them, it issues a RuntimeError: Error in qhull Delaunay triangulation calculation: singular input data (exitcode=2); use python verbose option (-v) to see original qhull error. I tried using a python list and numpy arange.
What exactly are those 1D arrays the tool wants me to provide?
plot_trisurf, when no explicit triangles are given, connects nearby 3D points with triangles to form some kind of surface. X is a 1D array (or a list) of the x-coordinates of these points (similar for Y and Z).
It doesn't work too well when all points lie on the same 3D line. For example, setting all X, Y and Z to [1, 2, 3] will result in a line, not a triangle: P1=(1,1,1), P2=(2,2,2), P3=(3,3,3). The n'th point will use the n'th x, the n'th y and the n'th z. A simple example would be `ax.plot_trisurf([0, 1, 1], [0, 0, 1], [1, 2, 3])`.
Here is an example:
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
from math import sin, cos, pi
fig = plt.figure(figsize=(14, 9))
ax1 = fig.add_subplot(1, 2, 1, projection='3d')
ax1.plot_trisurf([0, 1, 1], [0, 0, 1], [1, 2, 3],
                 facecolor='cornflowerblue', edgecolor='crimson', alpha=0.4, linewidth=4, antialiased=True)
ax2 = fig.add_subplot(1, 2, 2, projection='3d')
N = 12
X = [0] + [sin(a * 2 * pi / N) for a in range(N)]
Y = [0] + [cos(a * 2 * pi / N) for a in range(N)]
Z = [1] + [0 for a in range(N)]
ax2.plot_trisurf(X, Y, Z,
                 facecolor='cornflowerblue', edgecolor='crimson', alpha=0.4, linewidth=4, antialiased=True)
plt.show()
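The qhull error from the question can be anticipated before plotting. As far as I can tell, plot_trisurf triangulates the points in the x-y plane, so the triangulation fails when the (x, y) pairs are collinear or there are fewer than three of them. The sketch below checks for that with numpy. The function name xy_is_degenerate is hypothetical and not part of matplotlib.

```python
import numpy as np


def xy_is_degenerate(xs, ys):
    """Return True if the (x, y) points cannot span a triangle."""
    pts = np.column_stack([xs, ys]).astype(float)
    if len(pts) < 3:
        return True
    # after centering, collinear points give a matrix of rank 1 (or 0)
    centered = pts - pts.mean(axis=0)
    return bool(np.linalg.matrix_rank(centered) < 2)


print(xy_is_degenerate([1, 2, 3], [1, 2, 3]))  # True: all points on one line
print(xy_is_degenerate([0, 1, 1], [0, 0, 1]))  # False: a proper triangle
```

The rank test relies on numpy's default tolerance, so nearly collinear points may still pass the check and give a poor triangulation.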

matplotlib imshow -- use any vector as axis

I would like to use any vector as an axis in plt.imshow().
a = np.random.rand(4, 4)
x = np.array([1, 2, 3, 8])
y = np.array([-1, 0, 2, 3])
I imagine something like this:
plt.imshow(a, x_ax=x, y_ax=y)
I know there is an extent parameter available, but sadly it does not allow for non-equally spaced vectors.
Can anyone please help? Thanks in advance.
Plots made with imshow are always equally spaced. The question is whether you want
(a) an equally spaced plot with unequally spaced labels, or
(b) an unequally spaced plot with labels to scale.
(a) equally spaced plot
import numpy as np
import matplotlib.pyplot as plt
a = np.random.rand(4, 4)
x = np.array([1, 2, 3, 8])
y = np.array([-1, 0, 2, 3])
plt.imshow(a)
plt.xticks(range(len(x)), x)
plt.yticks(range(len(y)), y)
plt.show()
(b) unequally spaced plot
import numpy as np
import matplotlib.pyplot as plt
a = np.random.rand(3, 3)
x = np.array([1, 2, 3, 8])
y = np.array([-1, 0, 2, 3])
X,Y = np.meshgrid(x,y)
plt.pcolormesh(X,Y,a)
plt.xticks(x)
plt.yticks(y)
plt.show()
Note that in this case the vectors x and y specify the edges of the grid cells, not their centers, so four values per axis only allow for a 3x3 array.
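If the vectors are meant as cell centers for the full 4x4 array, one possible workaround is to derive the edges from the centers. The sketch below places each inner edge halfway between neighboring centers and extends the outer edges by the same half step. That placement rule and the helper name centers_to_edges are assumptions for illustration, not something pcolormesh prescribes.

```python
import numpy as np


def centers_to_edges(centers):
    """Turn n cell centers into n + 1 cell edges using midpoints."""
    centers = np.asarray(centers, dtype=float)
    mid = (centers[:-1] + centers[1:]) / 2
    # mirror the first and last half step outwards for the outer edges
    first = centers[0] - (mid[0] - centers[0])
    last = centers[-1] + (centers[-1] - mid[-1])
    return np.concatenate([[first], mid, [last]])


x_edges = centers_to_edges([1, 2, 3, 8])   # 0.5, 1.5, 2.5, 5.5, 10.5
y_edges = centers_to_edges([-1, 0, 2, 3])  # -1.5, -0.5, 1.0, 2.5, 3.5
```

With five edges per axis, plt.pcolormesh(x_edges, y_edges, a) accepts a 4x4 array a, and plt.xticks(x) and plt.yticks(y) then mark the original values.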

heatmap with variable datapoint width

I want to plot the coefficients of a linear model over time.
On the y-axis you have the i-th feature of my model, on the x-axis is time and the value of the i-th coefficient is color coded.
In my example, the coefficients are constant from 0 to t1, t1 to t2 and so on. The intervals are not equally sized. Currently I circumvent this by creating many points spaced by delta t:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
xi1 = [0, 1, 2]
t1 = range(4)
xi2 = [1, 1, 2]
t2 = range(5, 8)
data= np.vstack([xi1]*len(t1) + [xi2]*len(t2)).T
sns.heatmap(data)
Is there a way to do this more efficiently (without creating the redundant information)? I am also looking to have the right x-axis labels according to my t values.
You can use a matplotlib pcolormesh.
import matplotlib.pyplot as plt
import numpy as np
a = [[0,1],[1,1],[2,2]]
y = [0,1,2,3]
x = [0,5,8]
X,Y = np.meshgrid(x,y)
Z = np.array(a)
cmap = plt.get_cmap("RdPu", 3)
plt.pcolormesh(X,Y,Z, cmap=cmap)
plt.gca().invert_yaxis()
plt.colorbar(boundaries=np.arange(-0.5,3), ticks=np.unique(Z))
plt.show()
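The arrays passed to pcolormesh above can also be assembled from the intervals themselves. The following sketch assumes that each interval is given as a (t_start, t_end, coefficients) tuple and that consecutive intervals touch. The tuple layout and the name segments_to_mesh are hypothetical.

```python
import numpy as np


def segments_to_mesh(segments):
    """Build pcolormesh inputs from (t_start, t_end, coefficients) tuples."""
    # time edges: start of the first segment, then the end of every segment
    x_edges = np.array([segments[0][0]] + [t_end for _, t_end, _ in segments])
    # one column per segment, one row per feature
    z = np.column_stack([coeffs for _, _, coeffs in segments])
    # feature i occupies the band between y = i and y = i + 1
    y_edges = np.arange(z.shape[0] + 1)
    return x_edges, y_edges, z


x_edges, y_edges, z = segments_to_mesh([(0, 5, [0, 1, 2]), (5, 8, [1, 1, 2])])
```

For these two segments the result matches the x, y and a used in the answer, so plt.pcolormesh(x_edges, y_edges, z) should draw the same picture without any repeated columns.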

Plotting a 3D surface using an array

Z=np.array([[10.,12.,12.,5.],
[10.,0.,0.,5.],
[10.,0.,0.,5.],
[10.,20.,20.,20.]])
X = np.arange(0, 4, 1)
Y = np.arange(0, 4, 1)
I have a 2D 4x4 array. I want to make a 3D plot with x and y axes having discrete integer values from 0 to 4. Can someone help me with that?
You first need to make 2D arrays from your X, Y vectors:
import numpy as np
X2D, Y2D = np.meshgrid(X, Y)
Then you can use a surface plot (or wireframe):
from matplotlib import pyplot as plt
fig = plt.figure()
# Axes3D(fig) no longer attaches the axes to the figure in recent Matplotlib versions
ax = fig.add_subplot(projection='3d')
ax.plot_surface(X2D, Y2D, Z)
plt.show()
The axes can only run from 0 to 3 if you only have 4 points (you need 5 to go from 0 to 4).
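To see what np.meshgrid hands to plot_surface, it can help to look at the shapes and a few entries. This is a small sketch using the X and Y from the question:

```python
import numpy as np

X = np.arange(0, 4, 1)
Y = np.arange(0, 4, 1)
X2D, Y2D = np.meshgrid(X, Y)

# both grids have the same 4x4 shape as Z
print(X2D.shape, Y2D.shape)

# with the default indexing, X varies along columns and Y along rows,
# so Z[i, j] is drawn at the point (X[j], Y[i])
print(X2D[2, 3], Y2D[2, 3])
```

The largest coordinate in either grid is 3, which is why the plotted surface stops at 3 rather than 4.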

Draw / Create Scatterplots of datasets with NaN

I want to draw a scatter plot using pylab, however, some of my data are NaN, like this:
a = [1, 2, 3]
b = [1, 2, None]
pylab.scatter(a,b) doesn't work.
Is there some way that I could draw the points with real values while not displaying these NaN values?
Things will work perfectly if you use NaNs. None is not the same thing. A NaN is a float.
As an example:
import numpy as np
import matplotlib.pyplot as plt
plt.scatter([1, 2, 3], [1, 2, np.nan])
plt.show()
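If the data already arrives as a list containing None, one way to get NaNs is to let numpy do the conversion. This is a minimal sketch, assuming all other entries are plain numbers:

```python
import numpy as np

b = [1, 2, None]

# requesting a float dtype turns the None entry into nan
b_float = np.array(b, dtype=float)
print(b_float)
print(np.isnan(b_float))
```

The resulting array can be passed to plt.scatter directly, and the point with the nan value is simply not drawn.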
Have a look at pandas or numpy masked arrays (and numpy.genfromtxt to load your data) if you want to handle missing data. Masked arrays are built into numpy, but pandas is an extremely useful library, and has very nice missing value functionality.
As an example:
import matplotlib.pyplot as plt
import pandas
x = pandas.Series([1, 2, 3])
y = pandas.Series([1, 2, None])
plt.scatter(x, y)
plt.show()
pandas uses NaNs to represent masked data, while masked arrays use a separate mask array. This means that masked arrays can potentially preserve the original data, while temporarily flagging it as "missing" or "bad". However, they use more memory, and have some hidden gotchas that can be avoided by using NaNs to represent missing data.
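The difference can be seen on a tiny array. In the sketch below (values chosen arbitrarily for illustration), the masked array still holds the original numbers next to its mask, while the NaN version has overwritten them:

```python
import numpy as np

y = np.array([0.2, 0.9, 0.5, 1.0])

# masked array: values above 0.7 are flagged, not replaced
masked = np.ma.masked_where(y > 0.7, y)

# NaN approach: values above 0.7 are overwritten in a copy
with_nan = y.copy()
with_nan[y > 0.7] = np.nan

print(masked.data)  # the underlying values are still there
print(masked.mask)  # True where a value is flagged
print(with_nan)     # flagged positions now hold nan
```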
As another example, using both masked arrays and NaNs, this time with a line plot:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 6 * np.pi, 300)
y = np.cos(x)
y1 = np.ma.masked_where(y > 0.7, y)
y2 = y.copy()
y2[y > 0.7] = np.nan
fig, axes = plt.subplots(nrows=3, sharex=True, sharey=True)
for ax, ydata in zip(axes, [y, y1, y2]):
    ax.plot(x, ydata)
    ax.axhline(0.7, color='red')
axes[0].set_title('Original')
axes[1].set_title('Masked Arrays')
axes[2].set_title("Using NaN's")
fig.tight_layout()
plt.show()
Because you are drawing in 2D space, your points need to be defined by both an X and a Y value. If one of the values is None, that point cannot exist in 2D space so it cannot be plotted, hence you should remove both the None and its corresponding value from the other list.
There are many ways to accomplish this. Here is one:
import pylab
a = [1, 2, 3]
b = [1, None, 2]
# walk both lists in step and drop every pair in which either value is None
i = 0
while i < len(a):
    if a[i] is None or b[i] is None:
        a = a[:i] + a[i+1:]
        b = b[:i] + b[i+1:]
    else:
        i += 1
# Now a = [1, 3] and b = [1, 2]
pylab.scatter(a, b)
pylab.show()
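The same filtering can be written more compactly by pairing the two lists with zip. This is one possible sketch and leaves the original lists untouched; the names a_clean and b_clean are chosen here for illustration.

```python
a = [1, 2, 3]
b = [1, None, 2]

# keep only the pairs in which both values are present
pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
a_clean = [x for x, _ in pairs]
b_clean = [y for _, y in pairs]

print(a_clean)  # [1, 3]
print(b_clean)  # [1, 2]
```

The cleaned lists can then be passed to pylab.scatter(a_clean, b_clean).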
