I want to create subplots within a loop, but the outcome is not exactly what I imagined. I want several scatter plots in one big figure. The data comes from the matching columns of two dataframes, DF and dfr, which have the same number of rows, the same columns, and the same index. The first two columns of both dataframes should be excluded.
This is my approach, but I get i plots with one subplot each. What am I missing?
measurements = 9
for i in range(2,measurements+1):
    try:
        x = DF.iloc[1:,i]
        y = dfr.iloc[1:,i]
        inds = ~np.logical_or(np.isnan(x), np.isnan(y))
        x = x[inds]
        y = y[inds]
        xy = np.vstack([x,y])
        z = gaussian_kde(xy)(xy)
        b, m = polyfit(x, y, 1)
        fig, ax = plt.subplots(measurements+1,facecolor='w', edgecolor='k')
        ax[i].scatter(x, y, c=z, s=50, cmap='jet', edgecolor='', label=None, picker=True, zorder= 2)
        ax[i].plot(x, b + m * x, '-')
    except KeyError:
        continue
plt.show()
Currently I get several plots, but I would like to have one with multiple subplots.
Indeed, you have to put fig, ax = plt.subplots() out of the loop.
A few other things :
Setting edgecolor='' that way might raise an error. Remove it, or add a specific color.
I am not sure whether using try and except KeyError is relevant in your code. Python raises a KeyError when a key is looked up in a dict (as in a = adict[key]) and the key is not in the dictionary. Maybe it is meant for x = x[inds]? If so, I would suggest doing this check earlier in your process.
Try this :
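import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# DF and dfr are the two dataframes from the question
measurements = 9
# Create the figure and all axes once, before the loop
fig, ax = plt.subplots(measurements+1, facecolor='w', edgecolor='k')
for i in range(2, measurements+1):
    try:
        x = DF.iloc[1:,i]
        y = dfr.iloc[1:,i]
        # Keep only the rows where both values are present
        inds = ~np.logical_or(np.isnan(x), np.isnan(y))
        x = x[inds]
        y = y[inds]
        xy = np.vstack([x,y])
        z = stats.gaussian_kde(xy)(xy)
        # np.polyfit returns the slope first, then the intercept
        m, b = np.polyfit(x, y, 1)
        ax[i].scatter(x, y, c=z, s=50, cmap='jet', label=None, picker=True, zorder= 2)
        ax[i].plot(x, b + m * x, '-')
    except KeyError:
        # Temporarily pass but ideally, do something
        pass
plt.show()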
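Note that the code above still creates measurements+1 axes while the loop only fills eight of them, so the first two stay empty. As a minimal sketch of one way around that, the example below creates exactly one axis per plotted column and pairs them up with zip. The random arrays data_a and data_b are hypothetical stand-ins for DF and dfr, and the density colouring is left out to keep it short.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 50 rows, 10 columns, with a few NaNs.
data_a = rng.normal(size=(50, 10))
data_b = data_a + rng.normal(scale=0.3, size=(50, 10))
data_b[::7, 3] = np.nan

columns = range(2, 10)  # skip the first two columns
# One figure, one axis per plotted column; squeeze=False keeps a 2-D array.
fig, axes = plt.subplots(len(columns), 1,
                         figsize=(5, 2 * len(columns)), squeeze=False)

for ax, i in zip(axes.flat, columns):
    x, y = data_a[:, i], data_b[:, i]
    keep = ~(np.isnan(x) | np.isnan(y))  # drop rows with a NaN in either column
    x, y = x[keep], y[keep]
    m, b = np.polyfit(x, y, 1)           # slope first, then intercept
    ax.scatter(x, y, s=20)
    ax.plot(x, m * x + b, '-')
    ax.set_title('column %d' % i)

fig.tight_layout()
```

Calling plt.show() afterwards displays the single figure with all eight subplots.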
I am writing a script that generates 3D plots from an initial set of x/y values that are not a mesh grid. The script runs well and converts the data into a mesh grid and plots it fine but when I specify a cmap color the plot disappears. Why would this happen?
code:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = []
y = []
rlim = -4
llim = 4
increment = np.absolute((llim-rlim)*5)
linespace = np.array(np.linspace(rlim,llim,increment, endpoint = False))
for val in linespace:
    for val1 in linespace:
        x.append(val)
        y.append(val1)
x = np.array(x)
y = np.array(y)
z = np.array(np.sin(np.sqrt(np.power(x,2)+np.power(y,2)))/np.sqrt(np.power(x,2)+np.power(y,2)))
rows = len(np.unique(x[~pd.isnull(x)]))
array_size = len(x)
columns = int(array_size/rows)
X = np.reshape(x, (rows, columns))
Y = np.reshape(y, (rows, columns))
Z = np.reshape(z, (rows, columns))
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                cmap = 'Blues', edgecolor='none')
plt.show()
This yields
However, if all I do is delete the cmap entry in ax.plot_surface, I get the following:
Why does simply adding a cmap make the plot disappear?
Matplotlib is having a hard time scaling your colormap with NaNs in the Z matrix. I didn't look too close at your function, but it seems like you will get a NaN at the origin (0,0), and it looks like 1 is a reasonable replacement. Therefore, right after Z = np.reshape(z, (rows, columns)) I added Z[np.isnan(Z)] = 1., resulting in the following pretty graph:
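To make the NaN replacement concrete, here is a small NumPy-only sketch. It rebuilds the same sin(r)/r surface with np.meshgrid instead of the nested loops (a simplification, not the asker's code), counts the NaNs, and patches them with 1, which is the limit of sin(r)/r as r goes to 0. The grid is chosen so that 0.0 lies exactly on it.

```python
import numpy as np

# Grid from -4 to 4 in steps of 0.2; integer division keeps 0.0 exact.
axis = np.arange(-20, 21) / 5.0
X, Y = np.meshgrid(axis, axis)
R = np.sqrt(X**2 + Y**2)

# At the origin this evaluates 0/0, which produces NaN (warning silenced).
with np.errstate(divide='ignore', invalid='ignore'):
    Z = np.sin(R) / R

n_bad = int(np.isnan(Z).sum())  # number of NaN entries before the fix
Z[np.isnan(Z)] = 1.0            # replace them with the limit value
```

After this, X, Y and Z can be passed to ax.plot_surface together with a cmap. Another option that avoids the NaN in the first place is np.sinc, which is defined as sin(pi*x)/(pi*x) and returns 1 at 0, so np.sinc(R / np.pi) should give the same surface.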
I am new to python and trying to create a plot with one y variable and two x variables. I want the two lines to show up in the same plot with different labels and colors/markers. Here is my code to attempt:
y = lambda x: x**(-3)
z = lambda x: x**(-10)
x_grid = np.linspace(1,10, 10)
v_y = []
v_z = []
for i in x_grid:
    vy=y(i)
    v_y.append(vy)
v_y_array = np.array(v_y)
for j in x_grid:
    vz=y(j)
    v_z.append(vz)
v_z_array = np.array(v_z)
fig, ax = plt.subplots()
line1, = ax.plot(x_grid, v_y_array, 'b--', label='function 1')
line2, = ax.plot(x_grid, v_z_array, 'r--', label='function 1')
ax.legend()
plt.show()
However, the figure only shows the second line and ignores the first.
But if I try to do the following, it works out fine.
y = lambda x: x**(-3)
z = lambda x: x**(-10)
x_grid = np.linspace(1,10, 10)
v_y = []
v_z = []
for i in x_grid:
    vy=y(i)
    v_y.append(vy)
v_y_array = np.array(v_y)
for j in x_grid:
    vz=y(j)
    v_z.append(vz)
v_z_array = np.array(v_z)
fig,ax=plt.subplots()
ax.plot(x_grid,v_y_array,'r--', v_z_array, 'b--', label='x**(-3) function')
ax.set_title('Two Functions')
ax.legend(['x**(-3) function','x**(-10) function'])
plt.show()
I wonder what the problem was with my first code snippet, which doesn't produce the figure that I want?
The reason why the red and blue lines don't overlap in the second plot lies in the official documentation.
ax.plot(x_grid, v_y_array,'r--', v_z_array, 'b--', label='x**(-3) function')
The first set of three arguments, x_grid, v_y_array, 'r--', follows this pattern:
plot(x, y, 'bo') # plot x and y using blue circle markers
The second set has only two arguments, v_z_array, 'b--', and follows this pattern:
plot(y) # plot y using x as index array 0..N-1
plot(y, 'r+') # ditto, but with red plusses
Thus, the second set infers a sequence of x values equal to range(0, 10) (values from 0 to 9 inclusive), while the first set of arguments uses x_grid, which equals range(1, 11) (values from 1 to 10 inclusive).
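One more detail in the question's code seems worth pointing out: both loops call y(...), so v_z_array is filled with x**(-3) values as well and the two arrays are identical. In the first snippet the second line is therefore drawn exactly on top of the first. A minimal sketch of what was probably intended, assuming the second curve should be x**(-10), with vectorised NumPy expressions in place of the loops and an explicit x_grid for both lines:

```python
import numpy as np
import matplotlib.pyplot as plt

x_grid = np.linspace(1, 10, 10)
v_y = x_grid ** -3.0    # first function
v_z = x_grid ** -10.0   # second function (the question reused y here)

fig, ax = plt.subplots()
# Both lines get the same explicit x values and their own label.
line1, = ax.plot(x_grid, v_y, 'b--', label='x**(-3)')
line2, = ax.plot(x_grid, v_z, 'r--', label='x**(-10)')
ax.legend()
```

With separate plot calls each line keeps its own label, so ax.legend() needs no arguments.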
I have a dataset:
a b c d
10-Apr-86 Jimmy 1 Silly.doc
11-Apr-86 Minnie 2 Lala.doc
12-Apr-86 Jimmy 3 Goofy.doc
13-Apr-86 Minnie 4 Hilarious.doc
14-Apr-86 Jimmy 5 Joyous.doc
15-Apr-86 Eliot 6 Crackingup.doc
16-Apr-86 Jimmy 7 Funny.doc
17-Apr-86 Eliot 8 Happy.doc
18-Apr-86 Minnie 9 Mirthful.doc
Using the following code in python 2.7.12..
df = (pd.read_csv('python.csv'))
df_wanted = pd.pivot_table(
    df,
    index='a',
    columns='b',
    values='c')
df_wanted.index = pd.to_datetime(df_wanted.index)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(df_wanted.index, df_wanted['Jimmy'], s=50, c='b', marker="s")
ax1.scatter(df_wanted.index,df_wanted['Minnie'], s=50, c='r', marker="o")
ax1.scatter(df_wanted.index,df_wanted['Eliot'], s=50, c='g', marker="8")
plt.legend(loc='upper left');
for k, v in df.set_index('a').iterrows():
    plt.text(k, v['c'], v['d'])
plt.show()
.. I can create the following visualization in matplotlib:
The problem: this is only a toy dataset. When I apply this code to my real dataset, which has more than 3000 points, all the data labels blend together in a black illegible block.
I would like to avoid this problem by using the code here to make the data labels appear when they are clicked.
The issue I'm having is with this part of the above-mentioned code,
x=[1,2,3,4,5]
y=[6,7,8,9,10]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
scat = ax.scatter(x, y)
DataCursor(scat, x, y)
plt.show()
Obviously, I need to replace the "x" and "y" with my pivot table columns, but I don't know how to make scat = ax.scatter(x, y) or DataCursor(scat, x, y) work with my data.
I tried the following
scat = ax1.scatter(df_wanted.index, df_wanted['Minnie'], s=50, c='b', marker="s")
scat1 = ax1.scatter(df_wanted.index,df_wanted['Jimmy'], s=50, c='r', marker="o")
scat2 = ax1.scatter(df_wanted.index,df_wanted['Eliot'], s=50, c='g', marker="8")
DataCursor(scat,df_wanted.index,df_wanted['Minnie'])
DataCursor(scat1,df_wanted.index,df_wanted['Jimmy'])
DataCursor(scat2,df_wanted.index,df_wanted['Eliot'])
plt.show()
But I get this error: TypeError: Invalid Type Promotion
UPDATE: I used the code from here to get the doc name in the console:
from matplotlib.pyplot import figure, show
import numpy as npy
from numpy.random import rand
import pandas as pd
df = (pd.read_csv('python.csv'))
df_wanted = pd.pivot_table(
    df,
    index='a',
    columns='b',
    values='c')
df_wanted.index = pd.to_datetime(df_wanted.index)
if 1: # picking on a scatter plot (matplotlib.collections.RegularPolyCollection)
    c = 'r'
    c1 = 'b'
    c2 = 'g'
    s = 85
    y = df_wanted['Minnie']
    z = df_wanted['Jimmy']
    f = df_wanted['Eliot']
    x = df_wanted.index
    def onpick3(event):
        ind = event.ind
        print npy.take(df['d'], ind)
    fig = figure()
    ax1 = fig.add_subplot(111)
    col = ax1.scatter(x, y, s, c, picker=True)
    ax2 = fig.add_subplot(111)
    col = ax1.scatter(x, z, s, c1, picker=True)
    ax3 = fig.add_subplot(111)
    col = ax1.scatter(x, f, s, c2, picker=True)
    plt.legend(loc='upper left')
    #fig.savefig('pscoll.eps')
    fig.canvas.mpl_connect('pick_event', onpick3)
show()
The problem now is that the document name being returned is not accurate. I think the problem is that the ind number is for each individual series. I need a way to combine all the series, and assign an ind number to their total.
I found a solution. I realized I wanted to follow this example (Matplotlib scatterplot; colour as a function of a third variable), but needed to first make a single list of x values and a single list of y values, rather than individual lists of x and y values for each series.
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure, show
import numpy as npy
from numpy.random import rand
import pandas as pd
df = (pd.read_csv('python.csv')) #upload dataset
df['a'] = pd.to_datetime(df['a']) #convert date column to useable format
x = list(df['a'].values.flatten()) #get dataframe column data in list format
y= list(df['c'].values.flatten()) #get dataframe column data in list format
var_names = list(df['b'].values.flatten()) #get dataframe column data in list format
var_names1 = list(set(var_names)) #get unique values from column b (names)
d = {var_names1[n]:n for n in range(len(var_names1))} #generate dictionary that assigns number to each unique name in col B
namesAsNumbers = [d[z] for z in var_names] #replace names with numbers in column B
c= namesAsNumbers
if 1: # picking on a scatter plot (matplotlib.collections.RegularPolyCollection) # user picks point on scatter
    def onpick3(event):
        ind = event.ind
        print npy.take(df['d'], ind) #print the document name associated with the point that's been picked
    fig = figure()
    ax1 = fig.add_subplot(111)
    col = ax1.scatter(x, y, s= 100, c=c, picker=True)
    #fig.savefig('pscoll.eps')
    fig.canvas.mpl_connect('pick_event', onpick3)
    plt.legend()
show()
Only problem I still have: can't seem to get a legend to appear.
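The legend probably stays empty because the single scatter call has no label, so plt.legend() finds nothing to list. One possible workaround, sketched below under the assumption that each name should get one legend entry, is to build proxy handles by hand with the same colormap and normalisation the scatter uses. The names, x and y lists here are made-up stand-ins for the columns of python.csv, and the picking callback is left out.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Hypothetical stand-in for columns b and c of the dataset.
names = ['Jimmy', 'Minnie', 'Jimmy', 'Minnie', 'Jimmy',
         'Eliot', 'Jimmy', 'Eliot', 'Minnie']
x = np.arange(len(names))
y = np.arange(1, len(names) + 1)

# Map each unique name to an integer code, in a fixed (sorted) order.
unique_names = sorted(set(names))
codes = [unique_names.index(n) for n in names]

fig, ax = plt.subplots()
col = ax.scatter(x, y, s=100, c=codes, picker=True)

# One marker-only proxy artist per name, coloured like its points.
handles = [
    Line2D([], [], linestyle='none', marker='o', markersize=10,
           color=col.cmap(col.norm(code)))
    for code in range(len(unique_names))
]
ax.legend(handles, unique_names, loc='upper left')
```

Because the points still live in one collection, event.ind in the pick callback keeps indexing into the full dataframe. On Matplotlib 3.1 or newer, col.legend_elements() should be able to generate similar handles automatically.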
It is possible to refer to different subplots using two indices to index their axes as in the following example
def plotFunction(xdata, ydata, ax, i, j):
    ax[i,j].plot(xdata, ydata, marker='o', label='quadratic')

rows = 2
cols = 2
f, ax = plt.subplots(rows, cols)
x = np.arange(12)
y = x**2
plotFunction(x,y,ax,0,1)
However, if either rows or cols is 1, pyplot does not permit the use of two indices, because plt.subplots then returns a 1-D array of axes. This precludes the generic use of my plotting function, which relies on double-index plotting. So the following won't work:
rows = 1
cols = 2
f, ax = plt.subplots(rows, cols)
x = np.arange(12)
y = x**2
plotFunction(x,y,ax,0,1)
One way is to use the 'squeeze=False' option when calling subplots.
def plotFunction(xdata, ydata, ax, i, j, label):
    ax[i,j].plot(xdata, ydata, marker='o', label=label)

rows = 1
cols = 2
# squeeze=False always returns a 2-D array of axes
f, ax = plt.subplots(rows, cols, squeeze=False)
x = np.arange(12)
y = x**2
plotFunction(x,y,ax,0,1,label='quadratic')
This permits [row,col] indexing in all cases.
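As a quick check of that claim, the sketch below records the shape of the axes array for several grid sizes and calls a two-index plotting helper on the last cell each time. The helper name plot_function is only an illustrative variant of the function above.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_function(xdata, ydata, ax, i, j, label):
    # Always index with [row, col]; works because ax is kept 2-D.
    ax[i, j].plot(xdata, ydata, marker='o', label=label)

x = np.arange(12)
y = x ** 2

shapes = {}
for rows, cols in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    fig, ax = plt.subplots(rows, cols, squeeze=False)
    shapes[(rows, cols)] = ax.shape   # always (rows, cols)
    plot_function(x, y, ax, rows - 1, cols - 1, 'quadratic')
    plt.close(fig)                    # free the figure; only the shapes matter here
```

Without squeeze=False, the same loop would hand back a single Axes object for the 1x1 case and a 1-D array for the 1x2 and 2x1 cases, so ax[i, j] would fail.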
I have a file in.txt, which has many rows and between 1 and 20 columns (the number is not fixed), and contains numbers.
I draw a graphic with this code
y=np.loadtxt('in.txt')
t=np.arange(len(y))*1
plt.subplot(211)
plt.title(r'in')
plt.grid(1)
plt.plot(t,y, label = 'in')
plt.legend(borderpad = 0.1, labelspacing = 0.1)
plt.show()
It is what I have now (in this example I have 10 columns in file in.txt)
But instead of every name in the legend being "in", I want names like "1", "2", "3", etc. (from 1 to n, where n is the number of columns in my in.txt file).
One way you could do this is by plotting each line in an iteration of a for-loop. For example:
y = np.random.random((3,5)) # create fake data
t = np.arange(len(y))
plt.subplot(211)
plt.title(r'in')
plt.grid(1)
for col_indx in range(y.shape[1]):
    plt.plot(t, y[:,col_indx], label = col_indx)
plt.legend(borderpad = 0.1, labelspacing = 0.1)
plt.show()
Alternatively, and this is the solution I'd recommend in your case, you can use the optional arguments of the call to plt.legend. Like this:
plt.plot(t, y)
plt.legend(range(y.shape[1]))  # one label per column, i.e. per plotted line
Check out the doc-string of plt.legend when you want to go a bit more advanced.
If you wanted to start labelling using a 1-based index, rather than zero-based, don't forget to add +1 in the label and the range ;-)
You are taking advantage of the broadcasting in plot for the x/y, but the kwargs do not also get broadcast. Either
x = np.arange(25)
y = np.random.rand(25, 6)
fig, ax = plt.subplots()
for j, _y in enumerate(y.T, start=1):
    ax.plot(x, _y, label=str(j))
ax.legend(borderpad=0.1, labelspacing=0.1)
or
fig, ax = plt.subplots()
lns = ax.plot(x, y)
labels = [str(j) for j in range(1, y.shape[1] + 1)]
ax.legend(handles=lns, labels=labels, borderpad=0.1, labelspacing=0.1)