Related
I am trying to plot two columns of a pandas dataframe against each other, grouped by a values in a third column. The color of each line should be determined by that third column, i.e. one color per group.
For example:
import pandas as pd
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({'x': [0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3],'y':[1,2,3,2,3,4,4,3,2], 'colors':[0.3,0.3,0.3,0.7,0.7,0.7,1.3,1.3,1.3]})
df.groupby('colors').plot('x','y',ax=ax)
If I do it this way, I end up with three different lines plotting x against y, with each line a different color. I now want to determine the color by the values in 'colors'. How do I do this using a gradient colormap?
Looks like seaborn is applying the color intensity automatically based on the value in hue..
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'x': [0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3],'y':[1,2,3,2,3,4,4,3,2,3,4,2], 'colors':[0.3,0.3,0.3,0.7,0.7,0.7,1.3,1.3,1.3,1.5,1.5,1.5]})
import seaborn as sns
sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors')
Gives:
you can change the colors by adding palette argument as below:
import seaborn as sns
sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors', palette = 'mako')
#more combinations : viridis, mako, flare, etc.
gives:
Edit (for colormap):
based on answers at Make seaborn show a colorbar instead of a legend when using hue in a bar plot?
import seaborn as sns
fig = sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors', palette = 'mako')
norm = plt.Normalize(vmin = df['colors'].min(), vmax = df['colors'].max())
sm = plt.cm.ScalarMappable(cmap="mako", norm = norm)
fig.figure.colorbar(sm)
fig.get_legend().remove()
plt.show()
gives..
Hope that helps..
Complementing to Prateek's very good answer, once you have assigned the colors based on the intensity of the palette you choose (for example Mako):
plots = sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors',palette='mako')
You can add a colorbar with matplotlib's function plt.colorbar() and assign the palette you used:
sm = plt.cm.ScalarMappable(cmap='mako')
plt.colorbar(sm)
After plt.show(), we get the combined output:
I have the following dataframe
import pandas as pd
data_tmp = pd.DataFrame({'x': [0,14,28,42,56, 0,14,28,42,56],
'y': [0, 0.003, 0.006, 0.008, 0.001, 0*2, 0.003*2, 0.006*2, 0.008*2, 0.001*2],
'cat': ['A','A','A','A','A','B','B','B','B','B'],
'color': ['#B5D8F0','#B5D8F0','#B5D8F0','#B5D8F0','#B5D8F0','#247AB2','#247AB2','#247AB2','#247AB2','#247AB2'],
'point': [14,14,14,14,14,28,28,28,28,28],
'linestyles':['-','-','-','-','-','--','--','--','--','--']})
I would like to produce a lineplot with different color and linestyles per cat. But I would like to give the specific color and linestyles per cat as they are defined in the dataframe. Lastly I would like to mark the points on each line with the same color.
I have only tried:
sns.lineplot(x="x", y="y", hue="cat", data=data_tmp)
sns.scatterplot(x="point",y="y",hue="cat", data=data_tmp[data_tmp.point==data_tmp.x])
plt.show()
Any ideas ?
Maybe you want to use matplotlib directly, like
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'x': [0,14,28,42,56, 0,14,28,42,56],
'y': [0, 0.003, 0.006, 0.008, 0.001, 0*2, 0.003*2, 0.006*2, 0.008*2, 0.001*2],
'cat': ['A','A','A','A','A','B','B','B','B','B'],})
d = {"A" : {"color": '#B5D8F0', "markersize": 5, "linestyle": "-"},
"B" : {"color": '#247AB2', "markersize": 10, "linestyle": "--"}}
for n, grp in df.groupby("cat"):
plt.plot(grp.x, grp.y, marker="o", label=n, **d[n])
plt.legend()
plt.show()
This is how I could do this. You need to use the cat column to control the different plot parameters (color, style, marker size), and then create mapping objects (here dicts) that tell which parameter value to use for each category. The color is easy. The linestyle is harder, because Seaborn only offers dashes as a configurable parameter, which needs to be given in the advanced Matplotlib format of (segment, gap). The function matplotlib.lines._get_dash_pattern translates the string value (e.g. --) to this format, although the returned value needs to be handled with care. For the marker size, unfortunately lineplot does not offer the possibility to change the marker size with the category (even though you can change the marker style), so you need to use a scatterplot on top. The last bit is the legend, you probably want to disable it for the second plot, to avoid repeating it, but the problem is that the first legend will not have the markers in it. If that bothers you, you can still edit the legend manually. All in all, it could look like this:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
# Converts a line style to a format acceptable by Seaborn
def get_dash_pattern(style):
_, dash = mpl.lines._get_dash_pattern(style)
return dash if dash else (None, None)
data_tmp = pd.DataFrame({
'x': [0,14,28,42,56, 0,14,28,42,56],
'y': [0, 0.003, 0.006, 0.008, 0.001, 0*2, 0.003*2, 0.006*2, 0.008*2, 0.001*2],
'cat': ['A','A','A','A','A','B','B','B','B','B'],
'color': ['#B5D8F0','#B5D8F0','#B5D8F0','#B5D8F0','#B5D8F0',
'#247AB2','#247AB2','#247AB2','#247AB2','#247AB2'],
'point': [14,14,14,14,14,28,28,28,28,28],
'linestyles':['-','-','-','-','-','--','--','--','--','--']})
# Extract plot features as dicts
feats = (data_tmp[['cat', 'color', 'linestyles', 'point']]
.set_index('cat').drop_duplicates().to_dict())
palette, dashes, sizes = feats['color'], feats['linestyles'], feats['point']
# Convert line styles to dashes
dashes = {k: get_dash_pattern(v) for k, v in dashes.items()}
# Lines
lines = sns.lineplot(x="x", y="y", hue="cat", style="cat", data=data_tmp,
palette=palette, dashes=dashes)
# Points
sns.scatterplot(x="x", y="y", hue="cat", size="cat", data=data_tmp,
palette=palette, sizes=sizes, legend=False)
# Fix legend
for t, l in zip(lines.legend().get_texts(), lines.legend().get_lines()):
l.set_marker('o')
l.set_markersize(sizes.get(l.get_label(), 0) / t.get_fontsize())
plt.show()
Output:
Here is my solution with the help of #jdehesa
I also put the legend outside of the plot here and some polishing to the labels
def get_dash_pattern(style):
_, dash = mpl.lines._get_dash_pattern(style)
return dash if dash else (None, None)
palette = dict(zip(data_tmp.cat, data_tmp.color))
dashes = dict(zip(data_tmp.cat, data_tmp.linestyles))
dashes = {k: get_dash_pattern(v) for k, v in dashes.items()}
ax = sns.lineplot(x="x", y="y", hue="cat", data=data_tmp, palette=palette, style='cat', dashes=dashes)
ax = sns.scatterplot(x="point", y="y", hue="cat", data=data_tmp[data_tmp.point == data_tmp.x], palette=palette,
legend=False)
ax.set_title('title')
ax.set_ylabel('y label')
ax.set_xlabel('x label')
ax.legend(loc=(1.04, 0))
plt.show()
The below code helps in obtaining subplots with unique colored boxes. But all subplots share a common set of x and y axis. I was looking forward to having independent axis for each sub-plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
df = pd.DataFrame(np.random.rand(140, 4), columns=['A', 'B', 'C', 'D'])
df['models'] = pd.Series(np.repeat(['model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'], 20))
bp_dict = df.boxplot(
by="models",layout=(2,2),figsize=(6,4),
return_type='both',
patch_artist = True,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
for row_key, (ax,row) in bp_dict.iteritems():
ax.set_xlabel('')
for i,box in enumerate(row['boxes']):
box.set_facecolor(colors[i])
plt.show()
Here is an output of the above code:
I am trying to have separate x and y axis for each subplot...
You need to create the figure and subplots before hand and pass this in as an argument to df.boxplot(). This also means you can remove the argument layout=(2,2):
fig, axes = plt.subplots(2,2,sharex=False,sharey=False)
Then use:
bp_dict = df.boxplot(
by="models", ax=axes, figsize=(6,4),
return_type='both',
patch_artist = True,
)
You may set the ticklabels visible again, e.g. via
plt.setp(ax.get_xticklabels(), visible=True)
This does not make the axes independent though, they are still bound to each other, but it seems like you are asking about the visibilty, rather than the shared behaviour here.
If you really think it is necessary to un-share the axes after the creation of the boxplot array, you can do this, but you have to do everything 'by hand'. Searching a while through stackoverflow and looking at the matplotlib documentation pages I came up with the following solution to un-share the yaxes of the Axes instances, for the xaxes, you would have to go analogously:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
from matplotlib.ticker import AutoLocator, AutoMinorLocator
##using differently scaled data for the different random series:
df = pd.DataFrame(
np.asarray([
np.random.rand(140),
2*np.random.rand(140),
4*np.random.rand(140),
8*np.random.rand(140),
]).T,
columns=['A', 'B', 'C', 'D']
)
df['models'] = pd.Series(np.repeat([
'model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'
], 20))
##creating the boxplot array:
bp_dict = df.boxplot(
by="models",layout = (2,2),figsize=(6,8),
return_type='both',
patch_artist = True,
rot = 45,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
##adjusting the Axes instances to your needs
for row_key, (ax,row) in bp_dict.items():
ax.set_xlabel('')
##removing shared axes:
grouper = ax.get_shared_y_axes()
shared_ys = [a for a in grouper]
for ax_list in shared_ys:
for ax2 in ax_list:
grouper.remove(ax2)
##setting limits:
ax.axis('auto')
ax.relim() #<-- maybe not necessary
##adjusting tick positions:
ax.yaxis.set_major_locator(AutoLocator())
ax.yaxis.set_minor_locator(AutoMinorLocator())
##making tick labels visible:
plt.setp(ax.get_yticklabels(), visible=True)
for i,box in enumerate(row['boxes']):
box.set_facecolor(colors[i])
plt.show()
The resulting plot looks like this:
Explanation:
You first need to tell each Axes instance that it shouldn't share its yaxis with any other Axis instance. This post got me into the direction of how to do this -- Axes.get_shared_y_axes() returns a Grouper object, that holds references to all other Axes instances with which the current Axes should share its xaxis. Looping through those instances and calling Grouper.remove does the actual un-sharing.
Once the yaxis is un-shared, the y limits and the y ticks need to be adjusted. The former can be achieved with ax.axis('auto') and ax.relim() (not sure if the second command is necessary). The ticks can be adjusted by using ax.yaxis.set_major_locator() and ax.yaxis.set_minor_locator() with the appropriate Locators. Finally, the tick labels can be made visible using plt.setp(ax.get_yticklabels(), visible=True) (see here).
Considering all this, #DavidG's answer is in my opinion the better approach.
I have the following plot build with seaborn using factorplot() method.
Is it possible to use the line style as a legend to replace the legend based on line color on the right?
graycolors = sns.mpl_palette('Greys_r', 4)
g = sns.factorplot(x="k", y="value", hue="class", palette=graycolors,
data=df, linestyles=["-", "--"])
Furthermore I'm trying to get both lines in black color using the color="black" parameter in my factorplot method but this results in an exception "factorplot() got an unexpected keyword argument 'color'". How can I paint both lines in the same color and separate them by the linestyle only?
I have been looking for a solution trying to put the linestyle in the legend like matplotlib, but I have not yet found how to do this in seaborn. However, to make the data clear in the legend I have used different markers:
import seaborn as sns
import numpy as np
import pandas as pd
# creating some data
n = 11
x = np.linspace(0,2, n)
y = np.sin(2*np.pi*x)
y2 = np.cos(2*np.pi*x)
data = {'x': np.append(x, x), 'y': np.append(y, y2),
'class': np.append(np.repeat('sin', n), np.repeat('cos', n))}
df = pd.DataFrame(data)
# plot the data with the markers
# note that I put the legend=False to move it up (otherwise it was blocking the graph)
g=sns.factorplot(x="x", y="y", hue="class", palette=graycolors,
data=df, linestyles=["-", "--"], markers=['o','v'], legend=False)
# placing the legend up
g.axes[0][0].legend(loc=1)
# showing graph
plt.show()
you can try the following:
h = plt.gca().get_lines()
lg = plt.legend(handles=h, labels=['YOUR Labels List'], loc='best')
It worked fine with me.
I have many data frames that I am plotting for a presentation. These all have different columns, but all contain the same additional column foobar. At the moment, I am plotting these different data frames using
df.plot(secondary_y='foobar')
Unfortunately, since these data frames all have different additional columns with different ordering, the color of foobar is always different. This makes the presentation slides unnecessary complicated. I would like, throughout the different plots, assign that foobar is plotted bold and black.
Looking at the docs, the only thing coming close appears to be the parameter colormap - I would need to ensure that the xth color in the color map is always black, where x is the order of foobar in the data frame. Seems to be more complicated than it should be, also this wouldn't make it bold.
Is there a (better) approach?
I would suggest using matplotlib directly rather than the dataframe plotting methods. If df.plot returned the artists it added instead of an Axes object it wouldn't be too bad to change the color of the line after it was plotted.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def pandas_plot(ax, df, callout_key):
"""
Parameters
----------
ax : mpl.Axes
The axes to draw to
df : DataFrame
Data to plot
callout_key : str
key to highlight
"""
artists = {}
x = df.index.values
for k, v in df.iteritems():
style_kwargs = {}
if k == callout_key:
style_kwargs['c'] = 'k'
style_kwargs['lw'] = 2
ln, = ax.plot(x, v.values, **style_kwargs)
artists[k] = ln
ax.legend()
ax.set_xlim(np.min(x), np.max(x))
return artists
Usage:
fig, ax = plt.subplots()
ax2 = ax.twinx()
th = np.linspace(0, 2*np.pi, 1024)
df = pd.DataFrame({'cos': np.cos(th), 'sin': np.sin(th),
'foo': np.sin(th + 1), 'bar': np.cos(th +1)}, index=th)
df2 = pd.DataFrame({'cos': -np.cos(th), 'sin': -np.sin(th)}, index=th)
pandas_plot(ax, df, 'sin')
pandas_plot(ax2, df2, 'sin')
Perhaps you could define a function which handles the special column in a separate plot call:
def emphasize_plot(ax, df, col, **emphargs):
columns = [c for c in df.columns if c != col]
df[columns].plot(ax=ax)
df[col].plot(ax=ax, **emphargs)
Using code from tcaswell's example,
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def emphasize_plot(ax, df, col, **emphargs):
columns = [c for c in df.columns if c != col]
df[columns].plot(ax=ax)
df[col].plot(ax=ax, **emphargs)
fig, ax = plt.subplots()
th = np.linspace(0, 2*np.pi, 1024)
df = pd.DataFrame({'cos': np.cos(th), 'foobar': np.sin(th),
'foo': np.sin(th + 1), 'bar': np.cos(th +1)}, index=th)
df2 = pd.DataFrame({'cos': -np.cos(th), 'foobar': -np.sin(th)}, index=th)
emphasize_plot(ax, df, 'foobar', lw=2, c='k')
emphasize_plot(ax, df2, 'foobar', lw=2, c='k')
plt.show()
yields
I used #unutbut's answer and extended it to allow for a secondary y axis and correct legends:
def emphasize_plot(ax, df, col, **emphargs):
columns = [c for c in df.columns if c != col]
ax2 = ax.twinx()
df[columns].plot(ax=ax)
df[col].plot(ax=ax2, **emphargs)
lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(lines + lines2, labels + labels2, loc=0)