I've been learning how to make scatter plots with groups, and I'm using this tutorial as a reference. I'm currently trying to create this scatter plot; however, I've noticed that line 19 of the code is not indented:
import numpy as np
import matplotlib.pyplot as plt
# Create data
N = 60
g1 = (0.6 + 0.6 * np.random.rand(N), np.random.rand(N))
g2 = (0.4+0.3 * np.random.rand(N), 0.5*np.random.rand(N))
g3 = (0.3*np.random.rand(N),0.3*np.random.rand(N))
data = (g1, g2, g3)
colors = ("red", "green", "blue")
groups = ("coffee", "tea", "water")
# Create plot
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, axisbg="1.0")
for data, color, group in zip(data, colors, groups):
x, y = data
ax.scatter(x, y, alpha=0.8, c=color, edgecolors='none', s=30, label=group)
plt.title('Matplot scatter plot')
plt.legend(loc=2)
plt.show()
and I receive the following error message:
  File "<ipython-input-2-58c0796c9f65>", line 19
    x,y=data
    ^
IndentationError: expected an indented block
Therefore, I indented line 19, as such:
for data, color, group in zip(data, colors, groups):
    x, y = data
ax.scatter(x, y, alpha=0.8, c=color, edgecolors='none', s=30, label=group)
but I get AttributeErrors after running the code.
As a reference, I've been using the first batch of code in Joe Kington's superb answer in this question Scatter plots in Pandas/Pyplot: How to plot by category, and followed the indentation pattern, but I'm lost with my scatter plot. Where did I go wrong here?
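You must also indent the next line, ax.scatter(x, y, alpha=0.8, c=color, edgecolors='none', s=30, label=group), so that it runs inside the for loop. Note that the code below also leaves out the axisbg argument of add_subplot. The following code should work: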
import numpy as np
import matplotlib.pyplot as plt
# Create data
N = 60
g1 = (0.6 + 0.6 * np.random.rand(N), np.random.rand(N))
g2 = (0.4+0.3 * np.random.rand(N), 0.5*np.random.rand(N))
g3 = (0.3*np.random.rand(N),0.3*np.random.rand(N))
data = (g1, g2, g3)
colors = ("red", "green", "blue")
groups = ("coffee", "tea", "water")
# Create plot
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
for data, color, group in zip(data, colors, groups):
    x, y = data
    ax.scatter(x, y, alpha=0.8, c=color, edgecolors='none', s=30, label=group)
plt.title('Matplot scatter plot')
plt.legend(loc=2)
plt.show()
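One possible source of the AttributeErrors mentioned in the question is the axisbg keyword, which recent Matplotlib releases no longer accept. The sketch below assumes that this is the cause and that Matplotlib 2.0 or later is installed, where facecolor is the replacement keyword. The seeded random generator, the Agg backend and the output filename are my own choices for a reproducible, headless run, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Create data (seeded generator so the sketch is reproducible)
rng = np.random.default_rng(0)
N = 60
g1 = (0.6 + 0.6 * rng.random(N), rng.random(N))
g2 = (0.4 + 0.3 * rng.random(N), 0.5 * rng.random(N))
g3 = (0.3 * rng.random(N), 0.3 * rng.random(N))
data = (g1, g2, g3)
colors = ("red", "green", "blue")
groups = ("coffee", "tea", "water")

# Create plot; facecolor is used here in place of the old axisbg keyword
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, facecolor="1.0")

# Unpack (x, y) directly in the loop header, so `data` is not overwritten
for (x, y), color, group in zip(data, colors, groups):
    ax.scatter(x, y, alpha=0.8, c=color, edgecolors="none", s=30, label=group)

ax.set_title("Matplot scatter plot")
ax.legend(loc=2)
fig.savefig("scatter_groups.png")  # hypothetical output filename
```

Unpacking the tuple in the loop header also avoids rebinding the name data inside the loop, which the original code does.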
Related
I am trying to generate a scatter plot from the dataframe series x and y, with the size of each data point taken from the dataframe series z.
I should mention that I iterate through a set of x, y, and z arrays and add the colorbar outside the loop.
I see that the scatter sizes and colors are determined at each iteration, so they are not consistent across all data points in the plot, nor with the colorbar at the end. How do I solve this?
fig, ax = plt.subplots()
for x, y, z in arrays_of_xyz:
    splot = ax.scatter(x.to_numpy(), y.to_numpy(), marker= 'o', s = z.to_numpy(), cmap ='viridis_r', c = z.to_numpy())
fig.tight_layout()
plt.colorbar(splot)
plt.show()
Gautham
I can't see in what way the sizes in the plot are inconsistent.
The colorbar can be inconsistent if you do not enforce the same vmin and vmax in every call to scatter.
Can you please try the following code and describe the inconsistencies you get:
import numpy as np
import matplotlib.pyplot as plt
num_sets = 3
colors = ("red", "green", "blue")
num_pts_per_set = 20
xs = np.random.randn(num_sets, num_pts_per_set)
ys = np.random.randn(num_sets, num_pts_per_set)
zs = (
    np.random.rand(num_sets, num_pts_per_set)
    * np.arange(1, num_sets + 1).reshape(-1, 1)
    * 30
)
zmin = zs.min()
zmax = zs.max()
fig, (ax1, ax2) = plt.subplots(ncols=2)
ax1.set_title("Sizes according to z\nColors according to set #")
for i, (x, y, z, clr) in enumerate(zip(xs, ys, zs, colors)):
    ax1.scatter(x, y, marker="o", s=z, c=clr, label=f"Set #{i}")
ax1.legend()
ax2.set_title("Facecolors according to z\nSizes according to set #")
for i, (x, y, z, clr) in enumerate(zip(xs, ys, zs, colors)):
    splot = ax2.scatter(x, y, marker="o", c=z, edgecolors=clr, s=(i+1)*30, vmin=zmin, vmax=zmax, label=f"Set #{i}")
ax2.legend()
fig.colorbar(splot)
plt.show()
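As a variation on the vmin/vmax idea, here is a minimal sketch that builds one Normalize object over all sets and hands the same object to every scatter call and to the colorbar. The arrays_of_xyz list is hypothetical stand-in data (plain NumPy arrays rather than dataframe series), and the Agg backend and filename are only there so the sketch runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

rng = np.random.default_rng(1)

# Hypothetical stand-in for the question's (x, y, z) sets, each with a different z range
arrays_of_xyz = [
    (rng.normal(size=20), rng.normal(size=20), rng.random(20) * 30 * (k + 1))
    for k in range(3)
]

# One colour normalisation computed over ALL sets, not per iteration
zmin = min(z.min() for _, _, z in arrays_of_xyz)
zmax = max(z.max() for _, _, z in arrays_of_xyz)
norm = Normalize(vmin=zmin, vmax=zmax)

fig, ax = plt.subplots()
for x, y, z in arrays_of_xyz:
    # Marker area (s) and colour (c) both come from z; the norm is shared
    ax.scatter(x, y, marker="o", s=z, c=z, cmap="viridis_r", norm=norm)

# The colorbar is built from the shared norm, so it describes every set
fig.colorbar(ScalarMappable(norm=norm, cmap="viridis_r"), ax=ax, label="z")
fig.tight_layout()
fig.savefig("scatter_shared_norm.png")  # hypothetical output filename
```

Because the colorbar is created from the shared norm rather than from the last scatter call, it does not matter which set happens to be plotted last.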
I want to know how to make a plot with two y-axes, so that my plot, which currently looks like this:
becomes something more like this, with another y-axis added:
I'm only using this line of code for my plot, in order to get the top 10 EngineVersions from my data frame:
sns.countplot(x='EngineVersion', data=train, order=train.EngineVersion.value_counts().iloc[:10].index);
I think you are looking for something like:
import matplotlib.pyplot as plt
x = [1,2,3,4,5]
y = [1000,2000,500,8000,3000]
y1 = [1050,3000,2000,4000,6000]
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.bar(x, y)
ax2.plot(x, y1, 'o-', color="red" )
ax1.set_xlabel('X data')
ax1.set_ylabel('Counts', color='g')
ax2.set_ylabel('Detection Rates', color='b')
plt.show()
Output:
@gdubs If you want to do this with the Seaborn library, this code setup worked for me. Instead of calling the plot method on the axes object, as in matplotlib, you pass the axes to the Seaborn plot function through its ax argument.
import matplotlib.pyplot as plt # needed for plt.subplots and plt.show
import seaborn as sns # Calls in seaborn
# These lines generate the data to be plotted
x = [1,2,3,4,5]
y = [1000,2000,500,8000,3000]
y1 = [1050,3000,2000,4000,6000]
fig, ax1 = plt.subplots() # initializes figure and plots
ax2 = ax1.twinx() # applies twinx to ax2, which is the second y axis.
sns.barplot(x = x, y = y, ax = ax1, color = 'blue') # plots the first set of data, and sets it to ax1.
sns.lineplot(x = list(range(len(x))), y = y1, marker = 'o', color = 'red', ax = ax2) # plots the second set on ax2, at the positions 0..n-1 where barplot draws its categorical bars.
# these lines add the annotations for the plot.
ax1.set_xlabel('X data')
ax1.set_ylabel('Counts', color='g')
ax2.set_ylabel('Detection Rates', color='b')
plt.show(); # shows the plot.
Output:
Seaborn output example
You could try this code to obtain a very similar image to what you originally wanted.
import seaborn as sb
from matplotlib.lines import Line2D
from matplotlib.patches import Rectangle
x = ['1.1','1.2','1.2.1','2.0','2.1(beta)']
y = [1000,2000,500,8000,3000]
y1 = [3,4,1,8,5]
g = sb.barplot(x=x, y=y, color='blue')
g2 = sb.lineplot(x=range(len(x)), y=y1, color='orange', marker='o', ax=g.axes.twinx())
g.set_xticklabels(g.get_xticklabels(), rotation=-30)
g.set_xlabel('EngineVersion')
g.set_ylabel('Counts')
g2.set_ylabel('Detections rate')
g.legend(handles=[Rectangle((0,0), 0, 0, color='blue', label='Nontouch device counts'), Line2D([], [], marker='o', color='orange', label='Detections rate for nontouch devices')], loc=(1.1,0.8))
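For comparison, here is a rough sketch of how the top-10 counts and a per-category rate could be computed and drawn on twin axes with NumPy and matplotlib only. The versions and detected arrays are made-up stand-ins for the question's train data frame (the detection column is an assumption, since the question only shows EngineVersion), and the Agg backend and filename are just for a headless run.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in data: one engine version and one 0/1 flag per row
versions = rng.choice([f"1.1.{k}" for k in range(15)], size=2000)
detected = rng.integers(0, 2, size=2000)

# Count each version and keep the 10 most frequent, largest first
labels, counts = np.unique(versions, return_counts=True)
top = np.argsort(counts)[::-1][:10]
top_labels, top_counts = labels[top], counts[top]

# Mean of the 0/1 flag per version gives a rate between 0 and 1
rates = np.array([detected[versions == v].mean() for v in top_labels])

# Bars and line share the same integer x positions, so they line up
pos = np.arange(len(top_labels))
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.bar(pos, top_counts, color="tab:blue")
ax2.plot(pos, rates, "o-", color="tab:orange")

ax1.set_xticks(pos)
ax1.set_xticklabels(top_labels, rotation=-30)
ax1.set_xlabel("EngineVersion")
ax1.set_ylabel("Counts")
ax2.set_ylabel("Detection rate")
fig.tight_layout()
fig.savefig("twin_axes.png")  # hypothetical output filename
```

Plotting both series against explicit integer positions is the same trick as the x=range(len(x)) argument in the Seaborn code above.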
What is wrong with my residual plot that is causing it not to be aligned with my main graph? My code is below.
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
x = np.array([0.030956,0.032956,0.034956,0.036956,0.038956,0.040956])
y = np.array([10.57821088,11.90701212,12.55570876,13.97542486,16.05403248,16.36634177])
yerr = [0.101614114,0.363255259,0.057234211,0.09289917,0.093288198,0.420165796]
xerr = [0.00021]*len(x)
fig1 = plt.figure(1)
frame1=fig1.add_axes((.1,.3,.8,.6))
m, b = np.polyfit(x, y, 1)
print('gradient', m, 'intercept', b)
plt.plot(x, m*x + b, '-', color='grey', alpha=0.5)
plt.plot(x,y,'.',color='black',markersize=6)
plt.errorbar(x,y,xerr=0,yerr=yerr,linestyle="None",color='black')
plt.ylabel(r'$1/\sqrt{F}$ $(N)$',fontsize=20)
plt.autoscale(enable=True, axis=u'both', tight=True)
plt.grid(False)
frame2=fig1.add_axes((.1,.1,.8,.2))
s = m*x+b #(np.sqrt(4*np.pi*8.85E-12)/2.23E-8)*x
difference = y-s
plt.plot(x, difference, 'ro')
frame2.set_ylabel('$Residual$',fontsize=20)
plt.xlabel('$2s+d_0$ $(m)$',fontsize=20)
You can specify the axis limits. The problem is that autoscale scales your two plots differently. If you insert two lines of code, each specifying the axis limits, it will fix it.
plt.axis([.030,.0415, 10, 17]) #line 17
plt.axis([.030,.0415, -.6, .8]) #line 26
I believe this is what you're looking for.
Try using GridSpec.
from matplotlib import gridspec
fig = plt.figure()
gs = gridspec.GridSpec(2, 1, height_ratios=[3, 1])
ax0 = plt.subplot(gs[0])
ax1 = plt.subplot(gs[1])
ax0.plot(x, m*x + b, '-', color='grey', alpha=0.5)
ax0.plot(x,y,'.',color='black',markersize=6)
ax1.plot(x, difference, 'ro')
Also, when working with axes objects, use their set_ylabel method instead of ylabel (which is the plt function).
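Another way to keep the two panels aligned, instead of typing the limits by hand, is to let them share the x-axis. This is only a sketch under the assumption that the same x-range is wanted in both panels; it uses the sharex and gridspec_kw arguments of plt.subplots, and the names ax_main and ax_res, the zero line, the Agg backend and the filename are my own additions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Data from the question
x = np.array([0.030956, 0.032956, 0.034956, 0.036956, 0.038956, 0.040956])
y = np.array([10.57821088, 11.90701212, 12.55570876,
              13.97542486, 16.05403248, 16.36634177])
yerr = [0.101614114, 0.363255259, 0.057234211,
        0.09289917, 0.093288198, 0.420165796]

# Straight-line fit and its residuals
m, b = np.polyfit(x, y, 1)
residuals = y - (m * x + b)

# Two stacked axes with a 3:1 height ratio and one shared x-axis
fig, (ax_main, ax_res) = plt.subplots(
    2, 1, sharex=True, gridspec_kw={"height_ratios": [3, 1], "hspace": 0}
)

ax_main.plot(x, m * x + b, "-", color="grey", alpha=0.5)
ax_main.errorbar(x, y, yerr=yerr, fmt=".", color="black", markersize=6)
ax_main.set_ylabel(r"$1/\sqrt{F}$ $(N)$", fontsize=20)

ax_res.plot(x, residuals, "ro")
ax_res.axhline(0, color="grey", linewidth=0.8)  # reference line at zero residual
ax_res.set_ylabel("Residual", fontsize=20)
ax_res.set_xlabel(r"$2s+d_0$ $(m)$", fontsize=20)

fig.savefig("fit_with_residuals.png")  # hypothetical output filename
```

With sharex=True, any later change to the x-limits of one panel (zooming, autoscale, set_xlim) is applied to the other as well.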
I'm trying to create a plot with the following components:
Scatter plot
Line of best fit with error bars
Y scaled to be log.
So this is a standard log-linear plot saved to a PNG. While I can get the scatter plot working, I cannot get the fitted line to plot on the diagram; I just get one blob. Here is the code that I am using:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111, xlim=(-2,2), ylim=(1,10E11))
ax.scatter(x, y, s=1, c='black')
line, = ax.semilogy([-0.5, 1], [-0.5*m+c, 1.0*m + c], color='red', linestyle='-', linewidth=2)
ax.errorbar(-0.5, -0.5*m+c, yerr=ser, marker='o', color='red')
ax.errorbar(1, m * 1.0 + c, yerr=ser, marker='o', color='green')
ax.set_yscale('log')
fig.savefig('log.png')
I get the scatter plot and the log scale, but not the fitted line or the error bars.
x = np.array(x)
y = np.array(y)
~50,000 points
m = gradient = 2.38329162e+09
c = 1.24269722e+09
I've tried lots of variations, but I cannot seem to get the line plotted correctly. I cannot find a single example of an error bar plot with a log scale.
As an update, I finally got the line working. The problem was the y values going below zero. However, I still cannot get the error bars plotted: I only get one whisker line (not four) and no horizontal joining lines.
matplotlib version: 1.2.0
Since you did not provide any numbers, I had to guess.
But this works, so your data might be unusual (have you zoomed in to check whether ser is just really small?)
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np
x = np.random.rand(500,1)*2 -1
y = np.random.rand(500,1)*1e10
m = gradient = 2.38329162e+09
c = 1.24269722e+09
ser = 1e10
fig = plt.figure()
ax = fig.add_subplot(111, xlim=(-2,2), ylim=(1,10E11))
ax.scatter(x, y, s=1, c='black')
ax.plot([-1, 1], [m * -1.0 + c, 1.0*m + c], color='red', linestyle='-', linewidth=2)
ax.errorbar(-1, m * -1.0 + c, yerr=(ser), marker='o', color='green')
ax.errorbar(1, m * 1.0 + c, yerr=(ser), marker='o', color='green')
ax.set_yscale('log')
fig.savefig('log.png')
Result:
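One thing that may be worth checking, given the update in the question about y going below zero: on a log axis, a whisker whose lower end is at or below zero cannot be drawn. The sketch below is not the code from the answer above; it is a swapped-in variation that passes asymmetric errors to errorbar and caps the lower one. The value of ser, the x positions and the floor are assumptions chosen for illustration, since the question gives no value for ser.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, as in the question
import matplotlib.pyplot as plt
import numpy as np

m = 2.38329162e+09
c = 1.24269722e+09
ser = 1e9  # hypothetical standard error; the question gives no value

# End points of the fitted line, chosen so that m*x + c stays positive
xs = np.array([-0.4, 1.0])
ys = m * xs + c

# Cap the lower error so that ys - lower never drops below the axis floor
floor = 1.0  # bottom of the y-range used in the question
lower = np.minimum(ser, ys - floor)
upper = np.full_like(ys, ser)

fig, ax = plt.subplots()
ax.set_yscale("log")
ax.plot(xs, ys, color="red", linestyle="-", linewidth=2)
# yerr as [lower, upper] gives asymmetric bars; capsize draws the end caps
ax.errorbar(xs, ys, yerr=[lower, upper], fmt="o", color="green", capsize=4)
ax.set_xlim(-2, 2)
ax.set_ylim(1, 10e11)
fig.savefig("log_errorbar.png")  # hypothetical output filename
```

Whether capping the lower whisker is acceptable depends on what the error bar is meant to convey; an alternative is to compute the errors in log space.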
I am currently plotting the same data but visualizing it differently in two subplots (see figure):
Code snippet used for producing the above figure:
# Figure
plt.figure(figsize=(14,8), dpi=72)
plt.gcf().suptitle(r'Difference between TI and $\lambda$D', size=16)
# Subplot 1
ax1 = plt.subplot2grid((1,3),(0,0),colspan=2)
# Plot scattered data in first subplot
plt.scatter(LE_x, LE_y, s=40, lw=0, color='gold', marker='o', label=r'$\lambda$D')
plt.scatter(MD_x, MD_y, s=40, lw=0, color='blue', marker='^', label=r'TI')
# Subplot 2
ax2 = plt.subplot2grid((1,3),(0,2))
plt.barh(vpos1, LE_hist, height=4, color='gold', label=r'$\lambda$D')
plt.barh(vpos2, MD_hist, height=4, color='blue', label=r'TI')
# Legend
legend = plt.legend()
Is there any way to make the legend show both the scatter dots and the bars? Would this also be done via a dummy, as described here? Could somebody please post a minimal working example, since I can't wrap my head around this?
This worked for me: you essentially capture the handles returned by each plotting call and manually create a legend at the end.
import pylab as plt
import numpy as NP
plt.figure(figsize=(14,8), dpi=72)
plt.gcf().suptitle(r'Difference between TI and $\lambda$D', size=16)
# Subplot 1
ax1 = plt.subplot2grid((1,3),(0,0),colspan=2)
N = 100
LE_x = NP.random.rand(N)
LE_y = NP.random.rand(N)
MD_x = NP.random.rand(N)
MD_y = NP.random.rand(N)
# Plot scattered data in first subplot
s1 = plt.scatter(LE_x, LE_y, s=40, lw=0, color='gold', marker='o', label=r'$\lambda$D')
s2 = plt.scatter(MD_x, MD_y, s=40, lw=0, color='blue', marker='^', label=r'TI')
data = NP.random.randn(1000)
LE_hist, bins2 = NP.histogram(data, 50)
data = NP.random.randn(1000)
MD_hist, bins2 = NP.histogram(data, 50)
# Subplot 2
ax2 = plt.subplot2grid((1,3),(0,2))
vpos1 = NP.arange(0, len(LE_hist))
vpos2 = NP.arange(0, len(MD_hist)) + 0.5
h1 = plt.barh(vpos1, LE_hist, height=0.5, color='gold', label=r'$\lambda$D')
h2 = plt.barh(vpos2, MD_hist, height=0.5, color='blue', label=r'TI')
# Legend
#legend = plt.legend()
lgd = plt.legend((s1, s2, h1, h2), (r'$\lambda$D', r'TI', r'$\lambda$D', r'TI'), loc='upper center')
plt.show()
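A variation on the same idea, in case you would rather not store every return value: each axes can report its labelled artists through get_legend_handles_labels, and the two lists can be concatenated. This is a sketch with random stand-in data; the "(points)" and "(histogram)" label suffixes, the Agg backend and the filename are my own additions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)

fig = plt.figure(figsize=(14, 8), dpi=72)
ax1 = plt.subplot2grid((1, 3), (0, 0), colspan=2)
ax2 = plt.subplot2grid((1, 3), (0, 2))

# Subplot 1: scatter data (random stand-in values)
ax1.scatter(rng.random(100), rng.random(100), s=40, lw=0,
            color="gold", marker="o", label=r"$\lambda$D (points)")
ax1.scatter(rng.random(100), rng.random(100), s=40, lw=0,
            color="blue", marker="^", label="TI (points)")

# Subplot 2: horizontal bars built from two histograms
le_hist, _ = np.histogram(rng.normal(size=1000), 50)
md_hist, _ = np.histogram(rng.normal(size=1000), 50)
vpos = np.arange(len(le_hist))
ax2.barh(vpos, le_hist, height=0.5, color="gold", label=r"$\lambda$D (histogram)")
ax2.barh(vpos + 0.5, md_hist, height=0.5, color="blue", label="TI (histogram)")

# Collect the labelled artists from both axes and build one legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
legend = ax2.legend(handles1 + handles2, labels1 + labels2, loc="upper right")

fig.savefig("combined_legend.png")  # hypothetical output filename
```

The legend is attached to ax2 here; fig.legend with the same two lists would place it relative to the whole figure instead.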