Colorbar based legend in python matplotlib - python

In the graphic below, I want to put in a legend for the calendar plot. The calendar plot was made using ax.plot(...,label='a') and drawing rectangles in a 52x7 grid (52 weeks, 7 days per week).
The legend is currently made using:
plt.gca().legend(loc="upper right")
How do I correct this legend to something more like a colorbar? Also, the colorbar should be placed at the bottom of the plot.
EDIT:
Uploaded code and data for reproducing this here:
https://www.dropbox.com/sh/8xgyxybev3441go/AACKDiNFBqpsP1ZttsZLqIC4a?dl=0

Aside - existing bugs
The code you put on the dropbox doesn't work "out of the box". In particular - you're trying to divide a datetime.timedelta by a numpy.timedelta64 in two places and that fails.
You do your own normalisation and colour mapping (calling into color_list based on an int() conversion of your normalised value). You subtract 1 from this and you don't need to - you already floor the value by using int(). The result of doing this is that you can get an index of -1 which means your very smallest values are incorrectly mapped to the colour for the maximum value. This is most obvious if you plot column 'BIOM'.
I've hacked this by adding a tiny value (0.00001) to the total range of the values that you divide by. It's a hack - I'm not sure that this method of mapping is at all the best use of matplotlib, but that's a different question entirely.
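To make the off-by-one problem concrete, here is a minimal sketch of the two mappings. The function names and the 9-colour list are hypothetical stand-ins, not the code from the dropbox; they only mirror the behaviour described above (normalise, multiply by the number of colours, convert with int()).

```python
# Hypothetical stand-in for the 9-member colour list used in the question.
color_list = ["c{}".format(i) for i in range(9)]


def buggy_index(value, min_val, max_val, n_colors):
    """Mapping as described above: int() already floors, then 1 is subtracted."""
    frac = (value - min_val) / (max_val - min_val)
    return int(frac * n_colors) - 1


def fixed_index(value, min_val, max_val, n_colors, eps=0.00001):
    """Same mapping without the subtraction; eps keeps the maximum in range."""
    frac = (value - min_val) / (max_val - min_val + eps)
    return int(frac * n_colors)


# The smallest value gets index -1, which Python resolves to the LAST colour.
print(buggy_index(0, 0, 10, len(color_list)))   # -1
print(color_list[buggy_index(0, 0, 10, len(color_list))])  # c8

# With the fix, the minimum maps to 0 and the maximum to the last valid index.
print(fixed_index(0, 0, 10, len(color_list)))   # 0
print(fixed_index(10, 0, 10, len(color_list)))  # 8
```

Without the small eps the maximum value would produce index 9, which is out of range for a 9-member list; that is the reason for the hack.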
Solution adapting your code
With those bugs fixed, and after adding a last subplot below all the existing ones (i.e. replacing 3 with 4 in all your calls to subplot2grid()), you can do the following:
Replace your
plt.gca().legend(loc="upper right")
with
# plot an overall colorbar type legend
# Grab the new axes object to plot the colorbar on
ax_colorbar = plt.subplot2grid((4,num_yrs), (3,0),rowspan=1,colspan=num_yrs)
mappableObject = matplotlib.cm.ScalarMappable(cmap = palettable.colorbrewer.sequential.BuPu_9.mpl_colormap)
mappableObject.set_array(numpy.array(df[col_name]))
col_bar = fig.colorbar(mappableObject, cax = ax_colorbar, orientation = 'horizontal', boundaries = numpy.arange(min_val,max_val,(max_val-min_val)/10))
# You can change the boundaries kwarg to either make the scale look less boxy (increase 10)
# or to get different values on the tick marks, or even omit it altogether to let
# matplotlib choose the boundaries itself.
col_bar.set_label(col_name)
ax_colorbar.set_title(col_name + ' color mapping')
I tested this with two of your columns ('NMN' and 'BIOM') and on Python 2.7 (I assume you're using Python 2.x given the print statement syntax)
The finalised code that works directly with your data file is in a gist here
You get
How does it work?
It creates a ScalarMappable object that matplotlib can use to map values to colors. It sets the array on which this map is based to all the values in the column you are dealing with. It then uses Figure.colorbar() to add the colorbar, passing in the mappable object so that the labels are correct. I've added boundaries so that the minimum value is shown explicitly; you can omit that if you want matplotlib to sort that out for itself.
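As a self-contained illustration of those steps, the sketch below builds a horizontal colorbar from a ScalarMappable without needing the original data file. The values array, the axes layout and the built-in "BuPu" colormap are assumptions standing in for df[col_name], the calendar axes and the palettable colormap.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

# Stand-in for the column being plotted (assumption, not the real data).
values = np.array([3.0, 7.5, 12.0, 20.0])

# One tall axes for the main plot and one short axes for the colorbar.
fig, (ax_main, ax_colorbar) = plt.subplots(
    2, 1, gridspec_kw={"height_ratios": [8, 1]}
)

# The mappable carries the value range and the colormap.
norm = Normalize(vmin=values.min(), vmax=values.max())
mappable = ScalarMappable(norm=norm, cmap="BuPu")
mappable.set_array(values)

# Draw the colorbar into the dedicated axes, below the main plot.
col_bar = fig.colorbar(mappable, cax=ax_colorbar, orientation="horizontal")
col_bar.set_label("value")

fig.savefig("colorbar_legend.png")
```

Passing an explicit Normalize keeps the colorbar limits tied to the data range even if the array is changed later.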
P.S. I've set the colormap to palettable.colorbrewer.sequential.BuPu_9.mpl_colormap, matching your get_colors() function which gets these colours as a 9 member list. I strongly recommend importing the colormap you want to use under a nice name to make the use of mpl_colors and mpl_colormap easier to understand, e.g.
from palettable.colorbrewer.sequential import BuPu_9 as color_scale
Then access it as
color_scale.mpl_colormap
That way, you can keep your code DRY and change the colors with only one change.
Layout (in response to comments)
The colorbar may be a little big (certainly too tall) to look ideal. There are a few possible ways to fix that. I'll point you to two:
The "right" way to do it is probably to use a Gridspec
You could use your existing approach, but increase the number of rows and have the colorbar still in one row, while the other elements span more rows than they do currently.
I've implemented that with 9 rows, an extra column (so that the month labels don't get lost) and the colorbar on the bottom row, spanning 2 fewer columns than the main figure. I've also used tight_layout with w_pad=0.0 to avoid label clashes. You can play with this to get your exact preferred size. New code here.
This gives:
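For the GridSpec route mentioned above, a minimal sketch could look like the following. It is not the linked code: the figure size, the 12:1 height ratio, the value range and the "BuPu" colormap are all assumptions chosen only to show how a thin colorbar row can be reserved.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 4))

# Two rows: the main plot gets 12 units of height, the colorbar only 1.
gs = GridSpec(2, 1, figure=fig, height_ratios=[12, 1], hspace=0.4)
ax_main = fig.add_subplot(gs[0])
ax_colorbar = fig.add_subplot(gs[1])

# Placeholder value range (assumption); use the real min/max of your column.
mappable = ScalarMappable(norm=Normalize(vmin=0, vmax=100), cmap="BuPu")
mappable.set_array([])  # some older matplotlib versions require an array

fig.colorbar(mappable, cax=ax_colorbar, orientation="horizontal")
fig.savefig("gridspec_colorbar.png")
```

Changing the height_ratios values is then the single knob for how tall the colorbar appears relative to the calendar.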

There are functions to do this in matplotlib.colorbar. With some specific code from your example, I could give you a better answer, but you'll use something like:
myColorbar = matplotlib.colorbar.ColorbarBase(myAxes, cmap=myColorMap,
norm=myNorm,
orientation='vertical')

Related

How to plot unfilled markers in sns.scatterplot with 'hue' set?

I have two sets of x-y data, that I'd like to plot as a scatterplot, using sns.scatterplot. I want to highlight two different things:
the difference between different types of data
the difference between the first and the second set of x-y data
For the first, I'm using the inbuilt hue and style, for the second, I'd like to have filled vs. unfilled markers, but I'm wondering how to do so, without doing it all by hand with plt.scatter, where I would have to implement all the magic of sns.scatterplot by hand.
long version, with MWE:
I have X and Y data, and also have some type info for each point of data. I.e. I have a sample 1 which is of type A and yields X=11, Y=21 at the first sampling and X=10, Y=21 at the second sampling. And the same deal for sample 2 of type A, sample 3 of type B and so on (see example file at the end).
So I want to visualize the differences between two samplings, like so:
data = pd.read_csv('testdata.csv', sep=';', index_col=0, header=0)
# data for the csv at the end of the question
sns.scatterplot(x=data['x1'], y=data['y1'])
sns.scatterplot(x=data['x2'], y=data['y2'])
Nice, I can easily see that the first sampling seems to show a linear relationship between X and Y, whereas the second one shows some differences. Now what interests me, is which type of data is affected the most by these differences and that's why I'm using seaborn, instead of pure matplotlib: sns.scatterplot has a lot of nice stuff built in, e.g. hue (and style, to get symbols for printing in b&w):
sizes = (200, 200) # to make stuff more visible
sns.scatterplot(x=data['x1'], y=data['y1'], hue=data['type'], style=data['type'],
size=data['type'], sizes=sizes)
sns.scatterplot(x=data['x2'], y=data['y2'], hue=data['type'], style=data['type'],
size=data['type'], sizes=sizes)
OK, so I can easily distinguish my data types, but I lost all information about which sample is what. The obvious solution to me seems to be to use filled markers for one, and unfilled ones for the other.
However, I can't seem to do that.
I'm aware of this question/answer, using fc='none' which is not documented in the sns.scatterplot documentation but this fails, when also using hue:
sns.scatterplot(x=data['x1'], y=data['y1'], hue=data['type'], style=data['type'],
size=data['type'], sizes=sizes)
sns.scatterplot(x=data['x2'], y=data['y2'], hue=data['type'], style=data['type'],
size=data['type'], sizes=sizes, fc='none')
As you can see, the second set of markers simply vanishes (there are some artifacts in the B data, where hints of a white cross are visible).
I can kinda fix that by setting ec=...:
sns.scatterplot(x=data['x1'], y=data['y1'], hue=data['type'], style=data['type'],
size=data['type'], sizes=sizes)
sns.scatterplot(x=data['x2'], y=data['y2'], hue=data['type'], style=data['type'],
size=data['type'], sizes=sizes, fc='none',
ec=('b','b','y','y','y', 'y', 'g', 'g', 'g','r'))
# I would have to define the proper colors, but for this example, they're close enough
but that obviously has a few issues:
the markers in the legend aren't fitting anymore, neither color nor fill
and I'm already halfway in doing-it-all-by-hand territory anyways, e.g. my ec= would fail when I want to plot a new dataset with sample_no 11.
How can I do that with seaborn? Filled vs. unfilled seems quite an obvious flag for scatterplots, but I can't seem to find it.
data for testdata.csv:
sample_no;type;x1;y1;x2;y2
1;A;11;21;10;21
2;A;12;22;12;21
3;B;13;23;13.2;22.8
4;B;14;24;13.8;24
5;B;15;25;14.8;25.2
6;B;16;26;16.3;25.9
7;C;17;27;18;28
8;C;18;28;20;26
9;C;19;29;20;30
10;D;20;30;19;28

Better scale scatterplot points by size in plotly, some of the points are too small to see?

When I build a scatterplot of this data, you can see that the one large value (462) is completely swamping the plot, so that some of the other points can't even be seen.
Does anyone know of a specific way to normalize this data, so that the small dots can still be seen, while maintaining a link between the size of the dot and the size of the value? I'm thinking either of these might make sense:
(1) Set a minimum value for the size a dot can be
(2) Do some normalization of the data somehow, but I guess the large data point will always be 462 compared to some of the other points with a value of 1.
Just wondering how other people get around this, so they don't actually miss seeing some points on the plot that are actually there? Or I guess is the most obvious answer just don't scale the points by size, and then add a label to each point somehow with the size.
You can use clip() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.clip.html) on the values used for the size param.
full solution below
import pandas as pd
import numpy as np
import plotly.express as px
df = pd.DataFrame(
{"Class": np.linspace(-8, 4, 25), "Values": np.random.randint(1, 40, 25)}
).assign(Class=lambda d: "class_" + d["Class"].astype(str))
df.iloc[7, 1] = 462
px.scatter(df, x="Class", y="Values", size=df["Values"].clip(0, 50))
This isn't really a question linking to Python directly, but more to plotting styles. There are several ways to solve the issue in your case:
Split the data into equally sized categories and assign colorlabels. Your legend would look something like this in this case:
0 - 1: color 1
2 - 20: color 2
...
The way to implement this is to split your data into the sets you want and plot separate scatter plots, each with a new color. See here or here for examples
The second option that is frequently used is to use the log of the value for the bubble size. You would just have to point that out quite clearly in your legend.
The third option is to limit marker size to an arbitrary value. I personally am not a big fan of this method since it changes the information shown to a degree that the other alternatives don't, but if you add a data callout, this would still be legitimate.
These options should be fairly easy to implement in code. If you are having difficulties, feel free to post runnable sample code and we could implement an example as well.
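As a rough sketch of the second and third options, the snippet below computes log-scaled and clipped marker sizes with NumPy. The sample values and the cap of 50 are assumptions (the cap mirrors the clip(0, 50) call in the other answer); the resulting arrays could be passed as the size argument of the scatter call.

```python
import numpy as np

# Sample values with one outlier, similar in spirit to the question's data.
values = np.array([1, 3, 12, 40, 462])

# Option 2: log scaling. log1p(x) = log(1 + x) stays positive for x >= 1,
# so even the smallest value keeps a visible, non-zero size.
log_sizes = np.log1p(values)

# Option 3: cap the size at an arbitrary maximum (here 50).
clipped_sizes = np.clip(values, 0, 50)

print(values.max() / values.min())         # raw size ratio: 462.0
print(log_sizes.max() / log_sizes.min())   # much smaller ratio after log scaling
print(clipped_sizes)                       # [ 1  3 12 40 50]
```

With log scaling the ordering of the points is preserved and the outlier is still the largest dot, whereas clipping makes every value above the cap look identical, which is the information loss mentioned above.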

How to easily place/accommodate text annotations at the edge of a plot?

I am plotting some points on a line in python using matplotlib, and whenever the point is at/near the boundaries of the plot the annotated text is hard to read due to overlapping axes labels and such (see screenshot below):
I'm currently using code like this to place my point annotations manually:
# add value text to x, y point
jt = x_points_to_plot # a single x-value, in this case
f = ys_func(x_points_to_plot) # a single y-value, in this case
ax.annotate(
'({}C, {:0.0f}%)'.format(jt, f), # the string text to add
xy=(jt + 1, f + 5), # offset the text from the point manually
ha='center')
Usually my points are in the middle and look acceptable, like this:
But I don't want to manually adjust the text for every point, because I have a lot of changing data and it's not where I want to spend my time; instead, I'd love to find a way to accommodate the text so it is easily readable on the plot. Maybe I could expand the plot to contain the new text, or I could move the text to a different place depending on a set of conditions about what might be near the text? I'm not sure...
I think the best answer will be one I can reuse for other projects, robust to points anywhere on the plot, and relatively easy to implement (least amount of custom functions or "hacks" that I would have to recreate for every project). Thanks a ton in advance!
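One possible direction for the "move the text depending on a set of conditions" idea is sketched below. It is only an illustration, not an established recipe: the helper name edge_aware_offset, the 15% margin and the 8-point padding are all assumptions. It uses annotate's textcoords="offset points" so the offset is in points rather than data units, and flips the text to the inside of the plot when the point is close to the right or top edge.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def edge_aware_offset(x, y, xlim, ylim, pad=8, margin=0.15):
    """Return ((dx, dy), ha, va) so the label points away from nearby edges."""
    # Position of the point as a fraction of the axis range (0 = low, 1 = high).
    fx = (x - xlim[0]) / (xlim[1] - xlim[0])
    fy = (y - ylim[0]) / (ylim[1] - ylim[0])
    # Near the right edge: put the text to the left; otherwise to the right.
    dx, ha = (-pad, "right") if fx > 1 - margin else (pad, "left")
    # Near the top edge: put the text below; otherwise above.
    dy, va = (-pad, "top") if fy > 1 - margin else (pad, "bottom")
    return (dx, dy), ha, va


fig, ax = plt.subplots()
ax.set_xlim(0, 100)
ax.set_ylim(0, 100)

for x, y in [(50, 50), (98, 97)]:
    ax.plot(x, y, "o")
    offset, ha, va = edge_aware_offset(x, y, ax.get_xlim(), ax.get_ylim())
    ax.annotate(
        "({}C, {:0.0f}%)".format(x, y),
        xy=(x, y),                    # the point being labelled
        xytext=offset,                # offset from the point, in points
        textcoords="offset points",
        ha=ha,
        va=va,
    )

fig.savefig("annotations.png")
```

The helper only looks at the axes limits, so it would need extending if labels also have to avoid each other or other artists.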

How to change formatting of colorbar in combination with shrink

I am creating a colorbar for my figure like this:
def fmt(x, pos):
a='{:10.1f}'.format(x)
return a
fig.colorbar(CS, ax=ax,shrink=0.35,label=r'Electric Field/(V/$\mathrm{\AA}$)',format=ticker.FuncFormatter(fmt))
Creating the colorbar without the format command works just fine. However, I would like to control the number of decimal points in the colorbar labels. Adding the format command seems to not work in combination with shrink, since the labels are now shifted from the colorbar:
You are shifting the labels away from the colorbar yourself. So if you don't want that, don't do it.
I.e. by using '{:10.1f}'.format(1) you tell the formatter to use a minimum field width of 10 characters, so short numbers are padded with leading spaces. You may leave out the 10 to get it to only use as many characters as it needs,
'{:.1f}'.format(1)
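The difference is easy to see by printing both results with repr(). The two functions below follow the fmt(x, pos) signature from the question; their names are just for illustration.

```python
def fmt_padded(x, pos):
    # Width 10: the result is right-aligned and padded with leading spaces.
    return '{:10.1f}'.format(x)


def fmt_compact(x, pos):
    # No width: only as many characters as the number needs.
    return '{:.1f}'.format(x)


print(repr(fmt_padded(1, None)))   # '       1.0'
print(repr(fmt_compact(1, None)))  # '1.0'
print(len(fmt_padded(1, None)))    # 10
```

Those leading spaces are part of the tick label text, which is why the labels appear shifted away from the colorbar.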

Changing the marker on the same set of data

I have a set of data that comes from two different sources, and I have multiple sets graphed together. So essentially 6 scatterplots with error bars (all different colors), and each scatterplot has two sources.
Basically I want the blue scatterplot to have two different markers, 'o' and 's'. I currently have done this by plotting each point individually with a loop and checking to see if the source is 1 or 2. If it is 1 it plots an 's'; if the source is 2 then it plots an 'o'.
However this method does not really allow for having a legend. (Data1, Data2,...Data6)
Is there a better way of doing this?
EDIT:
I want a cleaner method for this, something along the lines of
x=[1,2,3]
y=[4,5,6]
m=['o','s','^']
plt.scatter(x,y,marker=m)
But this returns an error Unrecognized marker style
A more pythonic way (but still a loop) might be something like
x=[1,2,3]
y=[4,5,6]
l=['data1','data2','data3']
m=['ob','sb','^b']
f,a = plt.subplots(1,1)
[a.plot(*data, label=lab) for data,lab in zip(zip(x,y,m),l)]
plt.legend(loc='lower right')
plt.xlim(0,4)
plt.ylim(3,7);
But I guess this is not the most efficient way if you have lots of datapoints.
If you want to use scatter try something like
m=['o','s','^']
f,a = plt.subplots(1,1)
[a.scatter(*data, marker=m1, label=l1) for data,m1,l1 in zip(zip(x,y),m,l)]
I'm pretty sure, there is also a possibility to apply ** and dicts here.
UPDATE:
Instead of looping over the plot command, you can use the ability of matplotlib's plot function to read an arbitrary number of x,y,fmt groups, see docs.
x=np.random.random((3,6))
y=np.random.random((3,6))
l=['data1','data2','data3']
m=['ob','sb','^b']
plt.plot(*[i[j] for i in zip(x,y,m) for j in range(3)])
plt.legend(l,loc='lower right')
Calling plot in a loop is fine. You just need to keep the list of lines returned by plot and use fig.legend to create a legend for the whole figure. See http://matplotlib.org/examples/pylab_examples/figlegend_demo.html
Seconding @tcaswell's comments: .scatter() returns a collections.PathCollection, which provides a fast way of plotting a large number of identically shaped objects. You can use a loop to plot the data as many scatter plots (and many different datasets), but in my opinion it loses all the speed benefit provided by .scatter().
That being said, it is not true that the dots have to be identical in a scatter plot. You can have different linewidth, edgecolor and many other things. But the dots have to be the same shape. See this example, assigning different colors (and only plotting one dataset):
>>> sc=plt.scatter(x, y, label='test')
>>> sc.set_color(['r','g','b'])
>>> plt.legend()
See details in http://matplotlib.org/api/collections_api.html.
These were all alright, but not really what I was looking for. The problem was how I parsed through my data and how I could add a legend in a way that wouldn't mess that up. Since I used a for-loop and plotted each point individually based on whether it was measured at observation location 1 or 2, whenever I made a legend it would show over 50 legend entries. So I plotted my data as full sets (invisibly and with no change in symbols), then again in color with the varying symbols. This worked better. Thanks though
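A related way to keep one legend entry per dataset, sketched here with made-up data, is to call scatter once per (dataset, source) pair using a boolean mask and to label only the first call for each dataset. The dataset names, colours and arrays are assumptions; the "_nolegend_" label is matplotlib's convention for hiding an artist from the legend. This is a variation on the approach described above, not the exact code that was used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: each dataset has points from two sources (1 or 2).
datasets = {
    "Data1": {"color": "b", "x": np.array([1, 2, 3, 4]),
              "y": np.array([4, 5, 6, 7]), "source": np.array([1, 2, 1, 2])},
    "Data2": {"color": "r", "x": np.array([1, 2, 3, 4]),
              "y": np.array([5, 6, 7, 8]), "source": np.array([2, 2, 1, 1])},
}
markers = {1: "s", 2: "o"}  # source 1 -> square, source 2 -> circle

fig, ax = plt.subplots()
for name, d in datasets.items():
    for i, (src, marker) in enumerate(markers.items()):
        mask = d["source"] == src  # select only the points from this source
        ax.scatter(
            d["x"][mask], d["y"][mask],
            marker=marker, color=d["color"],
            # Only the first call per dataset gets a visible legend entry.
            label=name if i == 0 else "_nolegend_",
        )

ax.legend(loc="lower right")
handles, labels = ax.get_legend_handles_labels()
print(labels)  # ['Data1', 'Data2']
fig.savefig("markers_by_source.png")
```

The legend then shows each dataset once, using the marker of the first source; the marker meaning itself would need a separate note or a second legend.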
