Pyplot scatterplot legend not working with smaller sample sizes


I'm using the code below to generate a scatter plot in pyplot where I'd like to have each of the 9 classes plotted in a different color. There are multiple points within each class.
I cannot figure out why the legend does not work with smaller sample sizes.
def plot_scatter_test(x, y, c, title):
    data = pd.DataFrame({'x': x, 'y': y, 'c': c})
    classes = len(np.unique(c))
    colors = cm.rainbow(np.linspace(0, 1, classes))
    ax = plt.subplot(111)
    for s in range(0,classes):
        ss = data[data['c']==s]
        plt.scatter(x=ss['x'], y=ss['y'],c=colors[s], label=s)
    ax.legend(loc='lower left',scatterpoints=1, ncol=3, fontsize=8, bbox_to_anchor=(0, -.4), title='Legend')
    plt.show()
My data looks like this
When I plot this by calling
plot_scatter_test(test['x'], test['y'],test['group'])
I get varying colors in the chart, but the legend is a single color
So to make sure my data was ok, I created a random dataframe using the same type of data. Now I get different colors, but something is still wrong as they aren't sequential.
test2 = pd.DataFrame({
'y': np.random.uniform(0,1400,36),
'x': np.random.uniform(-250,-220,36),
'group': np.random.randint(0,9,36)
})
plot_scatter_test(test2['x'], test2['y'],test2['group'])
Finally, I create a larger plot of 360 data points, and everything looks the way I would expect it to. What am I doing wrong?
test3 = pd.DataFrame({
'y': np.random.uniform(0,1400,360),
'x': np.random.uniform(-250,-220,360),
'group': np.random.randint(0,9,360)
})
plot_scatter_test(test3['x'], test3['y'],test3['group'])

You need to make sure not to confuse the class itself with the number you use for indexing.
To better observe what I mean, use the following dataset with your function:
np.random.seed(22)
X,Y= np.meshgrid(np.arange(3,7), np.arange(4,8))
test2 = pd.DataFrame({
'y': Y.flatten(),
'x': X.flatten(),
'group': np.random.randint(0,9,len(X.flatten()))
})
plot_scatter_test(test2['x'], test2['y'],test2['group'])
which results in the following plot, where points are missing.
So, make a clear distinction between the index and the class, e.g. as follows
import numpy as np; np.random.seed(22)
import matplotlib.pyplot as plt
import pandas as pd
def plot_scatter_test(x, y, c, title="title"):
    data = pd.DataFrame({'x': x, 'y': y, 'c': c})
    # The class labels themselves, not just how many there are
    classes = np.unique(c)
    print(classes)
    colors = plt.cm.rainbow(np.linspace(0, 1, len(classes)))
    print(colors)
    ax = plt.subplot(111)
    # i indexes the color array, clas is the actual class label
    for i, clas in enumerate(classes):
        ss = data[data['c']==clas]
        plt.scatter(ss["x"],ss["y"],c=[colors[i]]*len(ss), label=clas)
    ax.legend(loc='lower left',scatterpoints=1, ncol=3, fontsize=8, title='Legend')
    plt.show()
X,Y= np.meshgrid(np.arange(3,7), np.arange(4,8))
test2 = pd.DataFrame({
'y': Y.flatten(),
'x': X.flatten(),
'group': np.random.randint(0,9,len(X.flatten()))
})
plot_scatter_test(test2['x'], test2['y'],test2['group'])
Apart from that, it is indeed necessary not to pass a color 4-tuple directly to c: a flat sequence of four numbers can be interpreted as four separate values, one per point, rather than as a single RGBA color.
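As a hedged sketch of another way around the ambiguity: the Matplotlib documentation for scatter suggests passing a 2D array with a single row when every point should get the same RGBA color. The helper name scatter_by_class and the random test data below are made up for illustration, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_by_class(x, y, c):
    """Scatter plot with one color and one legend entry per class label."""
    data = pd.DataFrame({"x": np.asarray(x), "y": np.asarray(y), "c": np.asarray(c)})
    classes = np.unique(data["c"])
    colors = plt.cm.rainbow(np.linspace(0, 1, len(classes)))
    fig, ax = plt.subplots()
    for color, label in zip(colors, classes):
        subset = data[data["c"] == label]
        # A 2D array with a single row is read as one RGBA color for all points.
        ax.scatter(subset["x"], subset["y"], c=color.reshape(1, -1), label=str(label))
    ax.legend(loc="lower left", scatterpoints=1, ncol=3, fontsize=8, title="Legend")
    return fig, ax


# Hypothetical small sample, similar in shape to the 36-point test in the question
rng = np.random.default_rng(22)
fig, ax = scatter_by_class(rng.uniform(-250, -220, 36),
                           rng.uniform(0, 1400, 36),
                           rng.integers(0, 9, 36))
```

Because the loop iterates over the class labels and their colors together, a missing label (say, no points in class 3) does not shift the colors or drop points.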

I feel silly now after staring at this for a while. The error was in the color being passed. I was passing a single color to the .scatter function. However since there are multiple points, you need to pass an equal number of colors. Therefore
plt.scatter(x=ss['x'], y=ss['y'],c=colors[s], label=s)
Can be something like
plt.scatter(x=ss['x'], y=ss['y'],c=[colors[s]]*len(ss), label=s)

Related

Plotly: Choose a different intersection of X and Y axes

In Plotly, in order to create scatter plots, I usually do the following:
fig = px.scatter(df, x=x, y=y)
fig.update_xaxes(range=[2, 10])
fig.update_yaxes(range=[2, 10])
I want the y-axis to intersect the x-axis at x=6. So, instead of the part of the x-axis to the left of the y-axis representing negative numbers, I want it to run from 2 to 6, and the part to the right of the intersection to run from 6 to 10.
Likewise, the y-axis should run from 2 to 6 below the x-axis and from 6 to 10 above it.
How can I do this in Plotly?

Following on from my comment, as far as I am aware, what you're after is not currently available.
However, here is an example of a work-around which uses a shapes dictionary to add horizontal and vertical lines - acting as intersecting axes - placed at your required x/y intersection of 6.
Sample dataset:
import numpy as np
x = (np.random.randn(100)*2)+6
y1 = (np.random.randn(100)*2)+6
y2 = (np.random.randn(100)*2)+6
Example plotting code:
import plotly.io as pio
layout = {'title': 'Intersection of X/Y Axes Demonstration'}
shapes = []
traces = []
traces.append({'x': x, 'y': y1, 'mode': 'markers'})
traces.append({'x': x, 'y': y2, 'mode': 'markers'})
shapes.append({'type': 'line',
'x0': 2, 'x1': 10,
'y0': 6, 'y1': 6})
shapes.append({'type': 'line',
'x0': 6, 'x1': 6,
'y0': 2, 'y1': 10})
layout['shapes'] = shapes
layout['xaxis'] = {'range': [2, 10]}
layout['yaxis'] = {'range': [2, 10]}
pio.show({'data': traces, 'layout': layout})
Output:
Comments (TL;DR):
The example code shown here uses the low-level Plotly API (plotly.io), rather than a convenience wrapper such as graph_objects or express. The reason is that I (personally) feel it's helpful to users to show what is occurring 'under the hood', rather than masking the underlying code logic with a convenience wrapper.
This way, when the user needs to modify a finer detail of the graph, they will have a better understanding of the lists and dicts which Plotly is constructing for the underlying rendering engine (plotly.js).
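To make the dict structure described above easier to reuse, here is a rough sketch that wraps it in a plain function. The function name crossing_axes_figure and its parameters are invented for this example; it only builds the dictionary and does not import Plotly itself.

```python
def crossing_axes_figure(traces, cross_x, cross_y, lo, hi, title=""):
    """Build a Plotly-style figure dict with two lines crossing at (cross_x, cross_y)."""
    def axis_line(x0, x1, y0, y1):
        return {'type': 'line', 'x0': x0, 'x1': x1, 'y0': y0, 'y1': y1}

    layout = {
        'title': title,
        'xaxis': {'range': [lo, hi]},
        'yaxis': {'range': [lo, hi]},
        'shapes': [
            axis_line(lo, hi, cross_y, cross_y),  # horizontal line acting as the x-axis
            axis_line(cross_x, cross_x, lo, hi),  # vertical line acting as the y-axis
        ],
    }
    return {'data': list(traces), 'layout': layout}


# Hypothetical trace data, just to exercise the function
fig_dict = crossing_axes_figure(
    [{'x': [3, 5, 8], 'y': [4, 7, 9], 'mode': 'markers'}],
    cross_x=6, cross_y=6, lo=2, hi=10,
    title='Intersection of X/Y Axes Demonstration',
)
# With Plotly installed, the dict could then be rendered with:
#     import plotly.io as pio; pio.show(fig_dict)
```

Keeping the crossing point and the axis range as parameters means the same helper can be reused if the intersection has to move.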

I think fig.add_hline() and fig.add_vline() are the functions you need.
Example code
import plotly.express as px
import pandas as pd
df = pd.DataFrame({'x':[6,7,3], 'y':[4,5,6]})
fig = px.scatter(df, x='x', y='y')
fig.update_xaxes(range=[2, 10])
fig.update_yaxes(range=[2, 10])
fig.add_hline(y=4)
fig.add_vline(x=6)
fig.show()
Output

Plotting multiple wav files in python [duplicate]

I would like to plot one or more signals into one plot.
For each signal, an individual color, linewidth and linestyle may be specified.
If multiple signals have to be plotted, a legend should be provided as well.
So far, I use the following code which allows me to plot up to three signals.
import matplotlib
fig = matplotlib.figure.Figure(figsize=(8,6))
subplot = fig.add_axes([0.1, 0.2, 0.8, 0.75])
Signal2, Signal3, legend, t = None, None, None, None
Signal1, = subplot.plot(xDataSignal1, yDataSignal1, color=LineColor[0], linewidth=LineWidth[0],linestyle=LineStyle[0])
if (yDataSignal2 != [] and yDataSignal3 != []):
    Signal2, = subplot.plot(xDataSignal2, yDataSignal2, color=LineColor[1], linewidth=LineWidth[1],linestyle=LineStyle[1])
    Signal3, = subplot.plot(xDataSignal3, yDataSignal3, color=LineColor[2], linewidth=LineWidth[2],linestyle=LineStyle[2])
    legend = subplot.legend([Signal1, Signal2, Signal3], [yLabel[0], yLabel[1], yLabel[2]],LegendPosition,labelspacing=0.1, borderpad=0.1)
    legend.get_frame().set_linewidth(0.5)
    for t in legend.get_texts():
        t.set_fontsize(10)
elif (yDataSignal2 != []):
    Signal2, = subplot.plot(xDataSignal2, yDataSignal2, color=LineColor[1], linewidth=LineWidth[1],linestyle=LineStyle[1])
    legend = subplot.legend([Signal1, Signal2], [yLabel[0], yLabel[1]], LegendPosition,labelspacing=0.1, borderpad=0.1)
    legend.get_frame().set_linewidth(0.5)
    for t in legend.get_texts():
        t.set_fontsize(10)
Is it possible to generalize that code such that it is more Pythonic and supports up to n signals by still making use of matplotlib and subplot?
Any suggestions are highly appreciated.

A list of dicts might be a good solution for this. You could even use a defaultdict to supply a default color and linewidth in case you don't want to specify them.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
mysignals = [{'name': 'Signal1', 'x': np.arange(10,20,1),
'y': np.random.rand(10), 'color':'r', 'linewidth':1},
{'name': 'Signal2', 'x': np.arange(10,20,1),
'y': np.random.rand(10), 'color':'b', 'linewidth':3},
{'name': 'Signal3', 'x': np.arange(10,20,1),
'y': np.random.rand(10), 'color':'k', 'linewidth':2}]
fig, ax = plt.subplots()
for signal in mysignals:
    ax.plot(signal['x'], signal['y'],
            color=signal['color'],
            linewidth=signal['linewidth'],
            label=signal['name'])
# Enable legend
ax.legend()
ax.set_title("My graph")
plt.show()
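As a sketch of the "defaults" idea, here is one possible variant that merges a dict of default styles with whatever each signal specifies, instead of using a defaultdict. The names DEFAULT_STYLE and plot_signals, the 'style' key and the default values are assumptions made for this example. The legend is only drawn when there is more than one signal, as the question asks.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Assumed defaults, used for any style key a signal does not set itself
DEFAULT_STYLE = {'color': 'k', 'linewidth': 1, 'linestyle': '-'}


def plot_signals(ax, signals):
    """Plot any number of signals; add a legend only if there are several."""
    for signal in signals:
        style = {**DEFAULT_STYLE, **signal.get('style', {})}
        ax.plot(signal['x'], signal['y'], label=signal['name'], **style)
    if len(signals) > 1:
        legend = ax.legend(labelspacing=0.1, borderpad=0.1, fontsize=10)
        legend.get_frame().set_linewidth(0.5)
    return ax


rng = np.random.default_rng(0)
x = np.arange(10, 20)
signals = [
    {'name': 'Signal1', 'x': x, 'y': rng.random(10)},
    {'name': 'Signal2', 'x': x, 'y': rng.random(10),
     'style': {'color': 'b', 'linewidth': 3}},
    {'name': 'Signal3', 'x': x, 'y': rng.random(10),
     'style': {'color': 'r', 'linestyle': '--'}},
]
fig, ax = plt.subplots()
plot_signals(ax, signals)
```

Later keys win in the {**a, **b} merge, so a signal only has to list the settings that differ from the defaults.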

How to plot a lineplot with dots on specific points with specific colors and linetypes for each line using seaborn?

I have the following dataframe
import pandas as pd
data_tmp = pd.DataFrame({'x': [0,14,28,42,56, 0,14,28,42,56],
'y': [0, 0.003, 0.006, 0.008, 0.001, 0*2, 0.003*2, 0.006*2, 0.008*2, 0.001*2],
'cat': ['A','A','A','A','A','B','B','B','B','B'],
'color': ['#B5D8F0','#B5D8F0','#B5D8F0','#B5D8F0','#B5D8F0','#247AB2','#247AB2','#247AB2','#247AB2','#247AB2'],
'point': [14,14,14,14,14,28,28,28,28,28],
'linestyles':['-','-','-','-','-','--','--','--','--','--']})
I would like to produce a lineplot with different color and linestyles per cat. But I would like to give the specific color and linestyles per cat as they are defined in the dataframe. Lastly I would like to mark the points on each line with the same color.
I have only tried:
sns.lineplot(x="x", y="y", hue="cat", data=data_tmp)
sns.scatterplot(x="point",y="y",hue="cat", data=data_tmp[data_tmp.point==data_tmp.x])
plt.show()
Any ideas ?

Maybe you want to use matplotlib directly, like
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'x': [0,14,28,42,56, 0,14,28,42,56],
'y': [0, 0.003, 0.006, 0.008, 0.001, 0*2, 0.003*2, 0.006*2, 0.008*2, 0.001*2],
'cat': ['A','A','A','A','A','B','B','B','B','B'],})
d = {"A" : {"color": '#B5D8F0', "markersize": 5, "linestyle": "-"},
"B" : {"color": '#247AB2', "markersize": 10, "linestyle": "--"}}
for n, grp in df.groupby("cat"):
    plt.plot(grp.x, grp.y, marker="o", label=n, **d[n])
plt.legend()
plt.show()
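If the color, linestyle and highlighted point should come from the dataframe columns themselves, as in the question, the same matplotlib approach might be sketched as follows. This assumes each cat has a single color, linestyle and point value, and it uses the markevery argument of plot to draw the marker only where x equals point. The variable names first and marked are just for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data_tmp = pd.DataFrame({
    'x': [0, 14, 28, 42, 56] * 2,
    'y': [0, 0.003, 0.006, 0.008, 0.001, 0, 0.006, 0.012, 0.016, 0.002],
    'cat': ['A'] * 5 + ['B'] * 5,
    'color': ['#B5D8F0'] * 5 + ['#247AB2'] * 5,
    'point': [14] * 5 + [28] * 5,
    'linestyles': ['-'] * 5 + ['--'] * 5,
})

fig, ax = plt.subplots()
for name, grp in data_tmp.groupby('cat'):
    first = grp.iloc[0]  # style values are assumed constant within a category
    # Positions within this group where x equals the 'point' value to highlight
    marked = np.flatnonzero(grp['x'].to_numpy() == first['point']).tolist()
    ax.plot(grp['x'], grp['y'], color=first['color'], linestyle=first['linestyles'],
            marker='o', markevery=marked, label=name)
ax.legend()
```

Since the marker belongs to the same line artist, the legend entry shows the line style, color and marker together.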

This is how I could do this. You need to use the cat column to control the different plot parameters (color, style, marker size), and then create mapping objects (here dicts) that tell which parameter value to use for each category. The color is easy. The linestyle is harder, because Seaborn only offers dashes as a configurable parameter, which needs to be given in the advanced Matplotlib format of (segment, gap). The function matplotlib.lines._get_dash_pattern translates the string value (e.g. --) to this format, although the returned value needs to be handled with care. For the marker size, unfortunately lineplot does not offer the possibility to change the marker size with the category (even though you can change the marker style), so you need to use a scatterplot on top. The last bit is the legend, you probably want to disable it for the second plot, to avoid repeating it, but the problem is that the first legend will not have the markers in it. If that bothers you, you can still edit the legend manually. All in all, it could look like this:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
# Converts a line style to a format acceptable by Seaborn
def get_dash_pattern(style):
    _, dash = mpl.lines._get_dash_pattern(style)
    return dash if dash else (None, None)
data_tmp = pd.DataFrame({
'x': [0,14,28,42,56, 0,14,28,42,56],
'y': [0, 0.003, 0.006, 0.008, 0.001, 0*2, 0.003*2, 0.006*2, 0.008*2, 0.001*2],
'cat': ['A','A','A','A','A','B','B','B','B','B'],
'color': ['#B5D8F0','#B5D8F0','#B5D8F0','#B5D8F0','#B5D8F0',
'#247AB2','#247AB2','#247AB2','#247AB2','#247AB2'],
'point': [14,14,14,14,14,28,28,28,28,28],
'linestyles':['-','-','-','-','-','--','--','--','--','--']})
# Extract plot features as dicts
feats = (data_tmp[['cat', 'color', 'linestyles', 'point']]
.set_index('cat').drop_duplicates().to_dict())
palette, dashes, sizes = feats['color'], feats['linestyles'], feats['point']
# Convert line styles to dashes
dashes = {k: get_dash_pattern(v) for k, v in dashes.items()}
# Lines
lines = sns.lineplot(x="x", y="y", hue="cat", style="cat", data=data_tmp,
palette=palette, dashes=dashes)
# Points
sns.scatterplot(x="x", y="y", hue="cat", size="cat", data=data_tmp,
palette=palette, sizes=sizes, legend=False)
# Fix legend
for t, l in zip(lines.legend().get_texts(), lines.legend().get_lines()):
    l.set_marker('o')
    l.set_markersize(sizes.get(l.get_label(), 0) / t.get_fontsize())
plt.show()
Output:

Here is my solution, with the help of @jdehesa.
I also put the legend outside of the plot here and polished the labels a bit.
def get_dash_pattern(style):
    _, dash = mpl.lines._get_dash_pattern(style)
    return dash if dash else (None, None)
palette = dict(zip(data_tmp.cat, data_tmp.color))
dashes = dict(zip(data_tmp.cat, data_tmp.linestyles))
dashes = {k: get_dash_pattern(v) for k, v in dashes.items()}
ax = sns.lineplot(x="x", y="y", hue="cat", data=data_tmp, palette=palette, style='cat', dashes=dashes)
ax = sns.scatterplot(x="point", y="y", hue="cat", data=data_tmp[data_tmp.point == data_tmp.x], palette=palette,
legend=False)
ax.set_title('title')
ax.set_ylabel('y label')
ax.set_xlabel('x label')
ax.legend(loc=(1.04, 0))
plt.show()

Add Legend to Seaborn point plot

I am plotting multiple dataframes as point plot using seaborn. Also I am plotting all the dataframes on the same axis.
How would I add a legend to the plot?
My code takes each of the dataframe and plots it one after another on the same figure.
Each dataframe has same columns
date count
2017-01-01 35
2017-01-02 43
2017-01-03 12
2017-01-04 27
My code :
f, ax = plt.subplots(1, 1, figsize=figsize)
x_col='date'
y_col = 'count'
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_3,color='red')
This plots 3 lines on the same plot. However, the legend is missing. According to the documentation, pointplot does not accept a label argument.
One workaround that worked was creating a new dataframe and using hue argument.
df_1['region'] = 'A'
df_2['region'] = 'B'
df_3['region'] = 'C'
df = pd.concat([df_1,df_2,df_3])
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df,hue='region')
But I would like to know if there is a way to create a legend for the code that first adds sequentially point plot to the figure and then add a legend.
Sample output :

I would suggest not using seaborn pointplot for plotting. This makes things unnecessarily complicated.
Instead, use matplotlib plot_date. This lets you set labels on the plots and have them automatically put into a legend with ax.legend().
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
date = pd.date_range("2017-03", freq="M", periods=15)
count = np.random.rand(15,4)
df1 = pd.DataFrame({"date":date, "count" : count[:,0]})
df2 = pd.DataFrame({"date":date, "count" : count[:,1]+0.7})
df3 = pd.DataFrame({"date":date, "count" : count[:,2]+2})
f, ax = plt.subplots(1, 1)
x_col='date'
y_col = 'count'
ax.plot_date(df1.date, df1["count"], color="blue", label="A", linestyle="-")
ax.plot_date(df2.date, df2["count"], color="red", label="B", linestyle="-")
ax.plot_date(df3.date, df3["count"], color="green", label="C", linestyle="-")
ax.legend()
plt.gcf().autofmt_xdate()
plt.show()
In case one is still interested in obtaining the legend for pointplots, here is a way to go:
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df3,color='red')
ax.legend(handles=ax.lines[::len(df1)+1], labels=["A","B","C"])
ax.set_xticklabels([t.get_text().split("T")[0] for t in ax.get_xticklabels()])
plt.gcf().autofmt_xdate()
plt.show()
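For what it's worth, the labelled-lines idea from the first snippet can also be sketched with plain ax.plot, which accepts datetime values directly. The data below is random and the region names A, B and C are only placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2017-03-01", periods=15, freq="MS")  # month starts

# One dataframe per region, all with the same 'date' and 'count' columns
frames = {
    "A": pd.DataFrame({"date": dates, "count": rng.random(15)}),
    "B": pd.DataFrame({"date": dates, "count": rng.random(15) + 0.7}),
    "C": pd.DataFrame({"date": dates, "count": rng.random(15) + 2}),
}
colors = {"A": "blue", "B": "red", "C": "green"}

fig, ax = plt.subplots()
for label, df in frames.items():
    # label= is what ax.legend() picks up automatically
    ax.plot(df["date"], df["count"], color=colors[label],
            marker="o", linestyle="-", label=label)
ax.legend()
fig.autofmt_xdate()
```

Each plot call adds exactly one labelled line, so the frames can still be added one after another and the legend stays in sync.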

Old question, but there's an easier way.
sns.pointplot(x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(x=x_col,y=y_col,data=df_3,color='red')
plt.legend(labels=['legendEntry1', 'legendEntry2', 'legendEntry3'])
This lets you add the plots sequentially, and not have to worry about any of the matplotlib crap besides defining the legend items.

I tried using Adam B's answer, however, it didn't work for me. Instead, I found the following workaround for adding legends to pointplots.
import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='#bb3f3f', label='Label1')
black_patch = mpatches.Patch(color='#000000', label='Label2')
In the pointplots, the color can be specified as mentioned in previous answers. Once these patches corresponding to the different plots are set up,
plt.legend(handles=[red_patch, black_patch])
And the legend ought to appear in the pointplot.
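A possible variation on the patch idea, sketched under the assumption that you would rather have legend entries that look like lines with markers than like colored rectangles: use empty Line2D objects as the legend handles. The ax.plot calls below are only stand-ins for the pointplot calls, and the labels and colors are the ones from the snippet above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

series_colors = {'Label1': '#bb3f3f', 'Label2': '#000000'}

fig, ax = plt.subplots()
# Stand-ins for the pointplot calls; the legend below does not depend on them
ax.plot([0, 1, 2], [3, 1, 2], color=series_colors['Label1'], marker='o')
ax.plot([0, 1, 2], [1, 2, 4], color=series_colors['Label2'], marker='o')

# Empty "proxy" lines that exist only to describe the legend entries
handles = [Line2D([], [], color=color, marker='o', label=label)
           for label, color in series_colors.items()]
ax.legend(handles=handles)
```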

This goes a bit beyond the original question, but also builds on @PSub's response towards something more general. I do know some of this is easier in Matplotlib directly, but many of the default styling options for Seaborn are quite nice, so I wanted to work out how you could have more than one legend for a point plot (or other Seaborn plot) without dropping into Matplotlib right at the start.
Here's one solution:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# We will need to access some of these matplotlib classes directly
from matplotlib.lines import Line2D # For points and lines
from matplotlib.patches import Patch # For KDE and other plots
from matplotlib.legend import Legend
from matplotlib import cm
# Initialise random number generator
rng = np.random.default_rng(seed=42)
# Generate sample of 25 numbers
n = 25
clusters = []
for c in range(0,3):
    # Crude way to get different distributions
    # for each cluster
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Cluster {c+1}"
    })
    clusters.append(df)
# Flatten to a single data frame
clusters = pd.concat(clusters)
# Now do the same for data to feed into
# the second (scatter) plot...
n = 8
points = []
for c in range(0,2):
    p = rng.integers(low=1, high=6, size=4)
    df = pd.DataFrame({
        'x': rng.normal(p[0], p[1], n),
        'y': rng.normal(p[2], p[3], n),
        'name': f"Group {c+1}"
    })
    points.append(df)
points = pd.concat(points)
# And create the figure
f, ax = plt.subplots(figsize=(8,8))
# The KDE-plot generates a Legend 'as usual'
k = sns.kdeplot(
    data=clusters,
    x='x', y='y',
    hue='name',
    fill=True,  # 'shade' is deprecated in recent Seaborn releases
    thresh=0.05,
    n_levels=2,
    alpha=0.2,
    ax=ax,
)
# Notice that we access this legend via the
# axis to turn off the frame, set the title,
# and adjust the patch alpha level so that
# it closely matches the alpha of the KDE-plot
ax.get_legend().set_frame_on(False)
ax.get_legend().set_title("Clusters")
for lh in ax.get_legend().get_patches():
    lh.set_alpha(0.2)
# You would probably want to sort your data
# frame or set the hue and style order in order
# to ensure consistency for your own application
# but this works for demonstration purposes
groups = points.name.unique()
markers = ['o', 'v', 's', 'X', 'D', '<', '>']
# plt.get_cmap works across versions; cm.get_cmap was removed in Matplotlib 3.9
colors = plt.get_cmap('Dark2').colors
# Generate the scatterplot: notice that Legend is
# off (otherwise this legend would overwrite the
# first one) and that we're setting the hue, style,
# markers, and palette using the 'name' parameter
# from the data frame and the number of groups in
# the data.
p = sns.scatterplot(
data=points,
x="x",
y="y",
hue='name',
style='name',
markers=markers[:len(groups)],
palette=colors[:len(groups)],
legend=False,
s=30,
alpha=1.0
)
# Here's the 'magic' -- we use zip to link together
# the group name, the color, and the marker style. You
# *cannot* retrieve the marker style from the scatterplot
# since that information is lost when rendered as a
# PathCollection (as far as I can tell). Anyway, this allows
# us to loop over each group in the second data frame and
# generate a 'fake' Line2D plot (with zero elements and no
# line-width in our case) that we can add to the legend. If
# you were overlaying a line plot or a second plot that uses
# patches you'd have to tweak this accordingly.
patches = []
for x in zip(groups, colors[:len(groups)], markers[:len(groups)]):
    patches.append(Line2D([0],[0], linewidth=0.0, linestyle='',
                          color=x[1], markerfacecolor=x[1],
                          marker=x[2], label=x[0], alpha=1.0))
# And add these patches (with their group labels) to the new
# legend item and place it on the plot.
leg = Legend(ax, patches, labels=groups,
loc='upper left', frameon=False, title='Groups')
ax.add_artist(leg);
# Done
plt.show();
Here's the output:

using a series as markersize in python plt.plot

Is it possible to use a column in a dataframe to scale the marker size in matplotlib? I keep getting an error about using a series when I do the following.
import pandas as pd
import matplotlib.pyplot as plt
my_dict = {'Vx': [16,25,85,45], 'r': [1315,5135,8444,1542], 'ms': [10,50,100, 25]}
df= pd.DataFrame(my_dict)
fig, ax = plt.subplots(1, 1, figsize=(20, 10))
ax.plot(df.Vx, df.r, '.', markersize= df.ms)
When I run it, I get:
ValueError: setting an array element with a sequence.
I'm guessing it does not like the fact that I'm feeding a Series to markersize, but there must be a way to make it work...

Use plt.scatter instead of plt.plot. Scatter lets you specify the size s as well as the color c of the points using a tuple or list.
import pandas as pd
import matplotlib.pyplot as plt
my_dict = {'Vx': [16,25,85,45], 'r': [1315,5135,8444,1542], 'ms': [10,50,100, 25]}
df= pd.DataFrame(my_dict)
fig, ax = plt.subplots(1, 1, figsize=(20, 10))
ax.scatter(df.Vx, df.r, s= df.ms)
plt.show()
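One detail worth keeping in mind when switching: markersize in plot is given in points, while s in scatter is an area in points squared. So, as a rough sketch, squaring the column should give markers of about the size that markersize would have produced for the same marker shape. The variable name points is just for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Vx': [16, 25, 85, 45],
                   'r': [1315, 5135, 8444, 1542],
                   'ms': [10, 50, 100, 25]})

fig, ax = plt.subplots()
# s is an area (points**2), so square the intended marker size in points
points = ax.scatter(df['Vx'], df['r'], s=df['ms'] ** 2)
```

If the column already holds areas rather than diameters, pass it to s unchanged, as in the snippet above.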

It may be simpler to use the built-in scatter plot function in pandas, where you can pass a whole Series as the size parameter s to vary the bubble size:
df.plot.scatter(x='Vx', y='r', s=df['ms'], c='g') # use df['ms']*5 for more prominent bubbles
Or, if you want to go the matplotlib route with plot, you need to pass one scalar value from the Series to the markersize argument for each point:
fig, ax = plt.subplots()
for idx, row in df.iterrows():
    ax.plot(row['Vx'], row['r'], '.', markersize=row['ms'])
plt.show()
