Change colorbar in Geopandas - python

The Problem
How can I access the colorbar instance created when plotting a GeoDataFrame? In this example, I've plotted troop movements and sizes during the French invasion of Russia, with army sizes less than 10000 plotted in red. How do I get the colorbar to show that red means under 10000?
MCVE:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from shapely.geometry import Point, LineString
import geopandas as gpd
# ingest troop movement data
PATH = ("https://vincentarelbundock.github.io/Rdatasets/csv/HistData/"
        "Minard.troops.csv")
troops = pd.read_csv(PATH)
troops['geometry'] = troops.apply(lambda row: Point(row.long, row.lat), axis=1)
troops = gpd.GeoDataFrame(troops)
# get army group paths
grouped = troops.groupby('group')
groups = [[LineString(group[['long', 'lat']].values), name]
          for name, group in grouped]
groups = pd.DataFrame(groups, columns=['geometry', 'army group'])
groups = gpd.GeoDataFrame(groups)
groups.plot(color=['red', 'green', 'blue'], lw=0.5)
# plot troop sizes
cmap = mpl.cm.get_cmap('cool')
cmap.set_under('red')
troops.plot(column='survivors', ax=plt.gca(),
            cmap=cmap, vmin=10000, legend=True, markersize=50)
Output:
Here, legend=True in the last line adds the colorbar.
Attempts at a solution
I know that if I made the colorbar myself, I could just pass the argument extend='min' to add a red triangle to the bottom of the colorbar.
And I know I can get the colorbar axes through (as suggested in this answer):
cax = plt.gcf().axes[1]
but I don't know how that helps me edit the colorbar. I can't even add a label with cax.set_label('troop size'). (i.e. I can't see that label anywhere although cax.get_label() does return 'troop size')
These axes appear to consist of two polygons:
In[315]: cax.artists
Out[315]:
[<matplotlib.patches.Polygon at 0x1e0e73c8>,
<matplotlib.patches.Polygon at 0x190c8f28>]
No idea what to make of that. And even if I could find the actual colorbar instance, I wouldn't know how to extend it as the docs for the Colorbar class don't mention anything like that.
Alternatives
Is there a way to pass the extend keyword through the GeoDataFrame.plot function?
Can I somehow access the colorbar instance by either saving it when plotting or finding it in the figure?
How would I go about constructing the colorbar directly with Matplotlib? And how do I avoid that it deviates from the plot if I change parameters?

There is an issue (and PR) about making it possible to pass keywords to the colorbar construction: https://github.com/geopandas/geopandas/issues/697.
But for now, the best work-around is to create the colorbar yourself I think:
Create the same figure but with legend=False:
cmap = mpl.cm.get_cmap('cool')
cmap.set_under('red')
ax = troops.plot(column='survivors', cmap=cmap, vmin=10000, legend=False, markersize=50)
Now, we get the collection created for the points (from a scatter plot) as the first element of the ax.collections, so we can instruct matplotlib to create a colorbar based on this mapping (and now we can pass additional keywords):
scatter = ax.collections[0]
plt.colorbar(scatter, ax=ax, extend='min')
this gives me
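The same workaround can be sketched with plain Matplotlib, which also shows how to label the colorbar once you hold the Colorbar object itself rather than its axes. This is only a minimal sketch under assumptions: the random points and the names rng, x, y and survivors are made up for illustration and merely stand in for the troops data.

```python
from copy import copy

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for the troop data: random positions and army sizes.
rng = np.random.default_rng(0)
x, y = rng.uniform(size=(2, 50))
survivors = rng.integers(4000, 340000, size=50)

# Copy the colormap before modifying it, then mark values below vmin in red.
cmap = copy(plt.get_cmap("cool"))
cmap.set_under("red")

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, c=survivors, cmap=cmap, vmin=10000, s=50)

# Building the colorbar ourselves lets us pass extend='min' and keep the handle.
cbar = fig.colorbar(scatter, ax=ax, extend="min")
cbar.set_label("troop size")
```

Because the colorbar is built from the scatter collection, it should follow the plot if vmin or the colormap change.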

Related

Change colour scheme label to log scale without changing the axis in matplotlib

I am quite new to python programming. I have a script that plots a heat map using matplotlib. The range of X-axis values is (-180 to +180) and of Y-axis values is (0 to 180). The 2D heatmap colours areas in Rainbow according to the number of points occurring in a specified area of the x-y graph (defined by the 'bin' (see below)).
In this case, x = values_Rot and y = values_Tilt (see below for code).
As of now, this script colours the 2D-heatmap in the linear scale. How do I change this script such that it colours the heatmap in the log scale? Please note that I only want to change the heatmap colouring scheme to log-scale, i.e. only the number of points in a specified area. The x and y-axis stay the same in linear scale (not in logscale).
A portion of the code is here.
rot_number = get_header_number(headers, AngleRot)
tilt_number = get_header_number(headers, AngleTilt)
psi_number = get_header_number(headers, AnglePsi)
values_Rot = []
values_Tilt = []
values_Psi = []
for line in data:
    try:
        values_Rot.append(float(line.split()[rot_number]))
        values_Tilt.append(float(line.split()[tilt_number]))
        values_Psi.append(float(line.split()[psi_number]))
    except:
        print ('This line didnt work, it may just be a blank space. The line is:' + line)
# Change the values here if you want to plot something else, such as psi.
# You can also change how the data is binned here.
plt.hist2d(values_Rot, values_Tilt, bins=25,)
plt.colorbar()
plt.show()
plt.savefig('name_of_output.png')
You can use a LogNorm for the colors, using plt.hist2d(...., norm=LogNorm()). Here is a comparison.
To have the ticks in base 2, the developers suggest adding the base to the LogLocator and the LogFormatter. As in this case the LogFormatter seems to write the numbers with one decimal (.0), a StrMethodFormatter can be used to show the number without decimals. Depending on the range of numbers, sometimes the minor ticks (shorter marker lines) also get a string, which can be suppressed assigning a NullFormatter for the minor colorbar ticks.
Note that base 2 and base 10 define exactly the same color transformation. The position and the labels of the ticks are different. The example below creates two colorbars to demonstrate the different look.
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter, StrMethodFormatter, LogLocator
from matplotlib.colors import LogNorm
import numpy as np
from copy import copy
# create some toy data for a standalone example
values_Rot = np.random.randn(100, 10).cumsum(axis=1).ravel()
values_Tilt = np.random.randn(100, 10).cumsum(axis=1).ravel()
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 4))
cmap = copy(plt.get_cmap('hot'))
cmap.set_bad(cmap(0))
_, _, _, img1 = ax1.hist2d(values_Rot, values_Tilt, bins=40, cmap='hot')
ax1.set_title('Linear norm for the colors')
fig.colorbar(img1, ax=ax1)
_, _, _, img2 = ax2.hist2d(values_Rot, values_Tilt, bins=40, cmap=cmap, norm=LogNorm())
ax2.set_title('Logarithmic norm for the colors')
fig.colorbar(img2, ax=ax2) # default log 10 colorbar
cbar2 = fig.colorbar(img2, ax=ax2) # log 2 colorbar
cbar2.ax.yaxis.set_major_locator(LogLocator(base=2))
cbar2.ax.yaxis.set_major_formatter(StrMethodFormatter('{x:.0f}'))
cbar2.ax.yaxis.set_minor_formatter(NullFormatter())
plt.show()
Note that log(0) is minus infinity. Therefore, the zero values in the left plot (darkest color) are left empty (white background) on the plot with the logarithmic color values. If you just want to use the lowest color for these zeros, you need to set a 'bad' color. In order not to change a standard colormap, recent matplotlib versions want you to first make a copy of the colormap.
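To see the masking directly, the norm can be applied to a few counts by hand. This is a small sketch; the vmin and vmax values are chosen arbitrarily for illustration and are not taken from the data above.

```python
import numpy as np
from matplotlib.colors import LogNorm

# Illustrative limits only; hist2d would normally pick them from the data.
norm = LogNorm(vmin=1, vmax=100)
normed = norm(np.array([0.0, 1.0, 10.0, 100.0]))

# The zero count is masked because log(0) is undefined; masked entries are
# the ones drawn in the colormap's 'bad' color.
print(normed.mask)

# The remaining counts are spaced evenly in log space, so 10 lands halfway
# between 1 and 100. No tick base enters this mapping.
print(normed.compressed())
```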
PS: When calling plt.savefig() it is important to call it before plt.show() because plt.show() clears the plot.
Also, try to avoid the 'jet' colormap, as it has a bright yellow region which is not at the extreme. It may look nice, but can be very misleading. This blog article contains a thorough explanation. The matplotlib documentation contains an overview of available colormaps.
Note that to compare two plots, plt.subplots() needs to be used, and instead of plt.hist2d, ax.hist2d is needed (see this post). Also, with two colorbars, the elements on which the colorbars are based need to be given as parameter. A minimal change to your code would look like:
from matplotlib.ticker import NullFormatter, StrMethodFormatter, LogLocator
from matplotlib.colors import LogNorm
from matplotlib import pyplot as plt
from copy import copy
# ...
# reading the data as before
cmap = copy(plt.get_cmap('magma'))
cmap.set_bad(cmap(0))
plt.hist2d(values_Rot, values_Tilt, bins=25, cmap=cmap, norm=LogNorm())
cbar = plt.colorbar()
cbar.ax.yaxis.set_major_locator(LogLocator(base=2))
cbar.ax.yaxis.set_major_formatter(StrMethodFormatter('{x:.0f}'))
cbar.ax.yaxis.set_minor_formatter(NullFormatter())
plt.savefig('name_of_output.png') # needs to be called prior to plt.show()
plt.show()

Make x-axes of all subplots same length on the page

I am new to matplotlib and trying to create and save plots from pandas dataframes via a loop. Each plot should have an identical x-axis, but different y-axis lengths and labels. I have no problem creating and saving the plots with different y-axis lengths and labels, but when I create the plots, matplotlib rescales the x-axis depending on how much space is needed for the y-axis labels on the left side of the figure.
These figures are for a technical report. I plan to place one on each page of the report and I would like to have all of the x-axes take up the same amount of space on the page.
Here is an MSPaint version of what I'm getting and what I'd like to get.
Hopefully this is enough code to help. I'm sure there are lots of non-optimal parts of this.
import pandas as pd
import matplotlib.pyplot as plt
import pylab as pl
from matplotlib import collections as mc
from matplotlib.lines import Line2D
import seaborn as sns
# elements for x-axis
start = -1600
end = 2001
interval = 200 # x-axis tick interval
xticks = [x for x in range(start, end, interval)] # create x ticks
# items needed for legend construction
lw_bins = [0,10,25,50,75,90,100] # bins for line width
lw_labels = [3,6,9,12,15,18] # line widths
def make_proxy(zvalue, scalar_mappable, **kwargs):
    color = 'black'
    return Line2D([0, 1], [0, 1], color=color, solid_capstyle='butt', **kwargs)
# generic image ID
img_path = r'C:\\Users\\user\\chart'
img_ID = 0
for line_subset in data:
    # create line collection for this run through loop
    lc = mc.LineCollection(line_subset)
    # create plot and set properties
    sns.set(style="ticks")
    sns.set_context("notebook")
    # I want the height of the figure to change based on number of labels on y-axis
    # Figure width should stay the same
    fig, ax = pl.subplots(figsize=(16, len(line_subset)*0.5))
    ax.add_collection(lc)
    ax.set_xlim(left=start, right=end)
    ax.set_xticks(xticks)
    ax.set_ylim(0, len(line_subset)+1)
    ax.margins(0.05)
    sns.despine(left=True)
    ax.xaxis.set_ticks_position('bottom')
    ax.set_yticks(line_subset['order'])
    ax.set_yticklabels(line_subset['ylabel'])
    ax.tick_params(axis='y', length=0)
    # legend
    proxies = [make_proxy(item, lc, linewidth=item) for item in lw_labels]
    ax.legend(proxies, ['0-10%', '10-25%', '25-50%', '50-75%', '75-90%', '90-100%'], bbox_to_anchor=(1.05, 1.0),
              loc=2, ncol=2, labelspacing=1.25, handlelength=4.0, handletextpad=0.5, markerfirst=False,
              columnspacing=1.0)
    # title
    ax.text(0, len(line_subset)+2, s=str(img_ID), fontsize=20)
    # save as .png images
    plt.savefig(r'C:\\Users\\user\\Desktop\\chart' + str(img_ID) + '.png', dpi=300, bbox_inches='tight')
Unless you use an axes of specifically defined aspect ratio (like in an imshow plot or by calling .set_aspect("equal")), the space taken by the axes should only depend on the figure size along that direction and the spacings set to the figure.
You are therefore pretty much asking for the default behaviour and the only thing that prevents you from obtaining that is that you use bbox_inches='tight' in the savefig command.
bbox_inches='tight' will change the figure size! So don't use it and the axes will remain constant in size.
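The effect is easy to check by saving the same figure twice and comparing the pixel dimensions. This is a rough sketch: the helper saved_shape and the 6x4 inch figure are made up for illustration, and the PNG is written to memory rather than to disk.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([0, 1], [0, 1])

def saved_shape(**savefig_kwargs):
    """Save the figure to an in-memory PNG and return (height, width) in pixels."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=100, **savefig_kwargs)
    buf.seek(0)
    return mpimg.imread(buf, format="png").shape[:2]

plain = saved_shape()                     # figsize times dpi, i.e. 400 x 600 pixels
tight = saved_shape(bbox_inches="tight")  # cropped to the drawn content plus padding

print(plain, tight)
```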
Your figure size, defined like figsize=(16, len(line_subset)*0.5) seems to make sense according to what I understand from the question. So what remains is to make sure the axes inside the figure are the size you want them to be. You can do that by manually placing it using fig.add_axes
fig.add_axes([left, bottom, width, height])
where left, bottom, width, height are in figure coordinates ranging from 0 to 1. Or, you can adjust the spacings outside the subplot using subplots_adjust
plt.subplots_adjust(left, bottom, right, top)
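As a sketch of the subplots_adjust route: with a fixed figure width and fixed left and right margins, the axes come out the same width no matter how tall the figure is or how long the tick labels are. The margin values 0.25 and 0.80 and the label text are arbitrary assumptions, not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

axes_widths_in_inches = []
for n_labels in (4, 12, 30):
    # Same figure width every time; only the height follows the number of labels.
    fig, ax = plt.subplots(figsize=(16, n_labels * 0.5))
    # Fixed margins in figure coordinates, leaving room for long y tick labels.
    fig.subplots_adjust(left=0.25, right=0.80)
    ax.set_yticks(range(n_labels))
    ax.set_yticklabels([f"a fairly long label {i}" for i in range(n_labels)])
    axes_widths_in_inches.append(ax.get_position().width * fig.get_figwidth())
    plt.close(fig)

print(axes_widths_in_inches)
```

Saving these figures without bbox_inches='tight' should then give x-axes of identical length on the page.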
To get a matching x axis for the subplots (the same x axis length for each subplot), you need to share the x axis between subplots.
See the example here https://matplotlib.org/examples/pylab_examples/shared_axis_demo.html

matplotlib mark_inset with different data in inset plot

This is a slightly tricky one to explain. Basically, I want to make an inset plot and then utilize the convenience of mpl_toolkits.axes_grid1.inset_locator.mark_inset, but I want the data in the inset plot to be completely independent of the data in the parent axes.
Example code with the functions I'd like to use:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from mpl_toolkits.axes_grid1.inset_locator import mark_inset
from mpl_toolkits.axes_grid1.inset_locator import InsetPosition
data = np.random.normal(size=(2000,2000))
plt.imshow(data, origin='lower')
parent_axes = plt.gca()
ax2 = inset_axes(parent_axes, 1, 1)
ax2.plot([900,1100],[900,1100])
# I need more control over the position of the inset axes than is given by the inset_axes function
ip = InsetPosition(parent_axes,[0.7,0.7,0.3,0.3])
ax2.set_axes_locator(ip)
# I want to be able to control where the mark is connected to, independently of the data in the ax2.plot call
mark_inset(parent_axes, ax2, 2,4)
# plt.savefig('./inset_example.png')
plt.show()
The example code produces the following image:
So to sum up: The location of the blue box is entirely controlled by the input data to ax2.plot(). I would like to manually place the blue box and enter whatever I want into ax2. Is this possible?
quick edit: to be clear, I understand why inset plots would have the data linked, as that's the most likely usage. So if there's a completely different way in matplotlib to accomplish this, do feel free to reply with that. However, I am trying to avoid manually placing boxes and lines to all of the axes I would place, as I need quite a few insets into a large image.
If I understand correctly, you want an arbitrarily scaled axis at a given position that looks like a zoomed inset, but has no connection to the inset marker's position.
Following your approach you can simply add another axes to the plot and position it at the same spot of the true inset, using the set_axes_locator(ip) function. Since this axis is drawn after the original inset, it will be on top of it and you'll only need to hide the tickmarks of the original plot to let it disappear completely (set_visible(False) does not work here, as it would hide the lines between the inset and the marker position).
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import inset_axes, mark_inset, InsetPosition
data = np.random.normal(size=(200,200))
plt.imshow(data, origin='lower')
parent_axes = plt.gca()
ax2 = inset_axes(parent_axes, 1, 1)
ax2.plot([60,75],[90,110])
# hide the ticks of the linked axes
ax2.set_xticks([])
ax2.set_yticks([])
#add a new axes to the plot and plot whatever you like
ax3 = plt.gcf().add_axes([0,0,1,1])
ax3.plot([0,3,4], [2,3,1], marker='$\u266B$', markersize=30, linestyle="")
ax3.set_xlim([-1,5])
ax3.set_ylim([-1,5])
ip = InsetPosition(parent_axes,[0.7,0.7,0.3,0.3])
ax2.set_axes_locator(ip)
# set the new axes (ax3) to the position of the linked axes
ax3.set_axes_locator(ip)
# I want to be able to control where the mark is connected to, independently of the data in the ax2.plot call
mark_inset(parent_axes, ax2, 2,4)
plt.show()
FWIW, I came up with a hack that works.
In the source code for inset_locator, I added a version of mark_inset that takes another set of axes used to define the TransformedBbox:
def mark_inset_hack(parent_axes, inset_axes, hack_axes, loc1, loc2, **kwargs):
    rect = TransformedBbox(hack_axes.viewLim, parent_axes.transData)
    pp = BboxPatch(rect, **kwargs)
    parent_axes.add_patch(pp)
    p1 = BboxConnector(inset_axes.bbox, rect, loc1=loc1, **kwargs)
    inset_axes.add_patch(p1)
    p1.set_clip_on(False)
    p2 = BboxConnector(inset_axes.bbox, rect, loc1=loc2, **kwargs)
    inset_axes.add_patch(p2)
    p2.set_clip_on(False)
    return pp, p1, p2
Then in my original-post code I make an inset axis where I want the box to be, pass it to my hacked function, and make it invisible:
# location of desired axes
axdesire = inset_axes(parent_axes,1,1)
axdesire.plot([100,200],[100,200])
mark_inset_hack(parent_axes, ax2, axdesire, 2,4)
axdesire.set_visible(False)
Now I have a marked box at a different location in data units than the inset that I'm marking:
It is certainly a total hack, and at this point I'm not sure it's cleaner than simply drawing lines manually, but I think for a lot of insets this will keep things conceptually cleaner.
Other ideas are still welcome.
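One more idea, which is a different technique from both answers above: newer Matplotlib versions (3.0 and later, as far as I know) have Axes.inset_axes and Axes.indicate_inset, and indicate_inset takes the marked rectangle as explicit bounds instead of reading it from the inset's data limits. A minimal sketch, assuming such a version is installed; the bounds and the plotted values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(size=(200, 200))

fig, ax = plt.subplots()
ax.imshow(data, origin="lower")

# Inset placed in axes-fraction coordinates: [x0, y0, width, height].
axins = ax.inset_axes([0.7, 0.7, 0.3, 0.3])
# The inset can hold anything; its limits do not move the marked box.
axins.plot([0, 3, 4], [2, 3, 1], "o")

# Marked rectangle given in data coordinates of the parent axes.
box_bounds = [60, 90, 15, 20]
ax.indicate_inset(box_bounds, inset_ax=axins, edgecolor="blue")
```

The connector styling differs somewhat from mark_inset, so this may or may not be a drop-in replacement for many insets on a large image.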

Remove grid lines, but keep frame (ggplot2 style in matplotlib)

Using Matplotlib I'd like to remove the grid lines inside the plot, while keeping the frame (i.e. the axes lines). I've tried the code below and other options as well, but I can't get it to work. How do I simply keep the frame while removing the grid lines?
I'm doing this to reproduce a ggplot2 plot in matplotlib. I've created a MWE below. Be aware that you need a relatively new version of matplotlib to use the ggplot2 style.
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pylab as P
import numpy as np
if __name__ == '__main__':
    values = np.random.uniform(size=20)
    plt.style.use('ggplot')
    fig = plt.figure()
    _, ax1 = P.subplots()
    weights = np.ones_like(values)/len(values)
    plt.hist(values, bins=20, weights=weights)
    ax1.set_xlabel('Value')
    ax1.set_ylabel('Probability')
    ax1.grid(b=False)
    #ax1.yaxis.grid(False)
    #ax1.xaxis.grid(False)
    ax1.set_axis_bgcolor('white')
    ax1.set_xlim([0,1])
    P.savefig('hist.pdf', bbox_inches='tight')
OK, I think this is what you are asking (but correct me if I misunderstood):
You need to change the colour of the spines. You need to do this for each spine individually, using the set_color method:
for spine in ['left','right','top','bottom']:
    ax1.spines[spine].set_color('k')
You can see this example and this example for more about using spines.
However, if you have removed the grey background and the grid lines, and added the spines, this is not really in the ggplot style any more; is that really the style you want to use?
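Put together, the pieces could look roughly like this. It is a sketch rather than a tested recipe for every version: it uses ax.grid(False) and ax.set_facecolor, which is what current Matplotlib releases provide, and the random values are only placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = np.random.uniform(size=20)
weights = np.ones_like(values) / len(values)

# Apply the ggplot style only inside this block.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.hist(values, bins=20, weights=weights)
    ax.set_xlabel("Value")
    ax.set_ylabel("Probability")

    ax.grid(False)             # drop the grid lines
    ax.set_facecolor("white")  # drop the grey background
    # The ggplot style draws the spines in white, so recolor each one.
    for spine in ["left", "right", "top", "bottom"]:
        ax.spines[spine].set_color("k")
```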
EDIT
To make the edge of the histogram bars touch the frame, you need to either:
Change your binning, so the bin edges go to 0 and 1
n,bins,patches = plt.hist(values, bins=np.linspace(0,1,21), weights=weights)
# Check, by printing bins:
print(bins[0], bins[-1])
# 0.0 1.0
If you really want to keep the bins to go between values.min() and values.max(), you would need to change your plot limits to no longer be 0 and 1:
n,bins,patches = plt.hist(values, bins=20, weights=weights)
ax1.set_xlim(bins[0],bins[-1])
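The difference between the two binning options can be checked without plotting, since plt.hist computes its bins with np.histogram. A small sketch with made-up random values:

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(size=20)

# Explicit edges: 20 bins spanning exactly 0 to 1.
_, fixed_edges = np.histogram(values, bins=np.linspace(0, 1, 21))
# Integer bin count: the edges span only values.min() to values.max().
_, auto_edges = np.histogram(values, bins=20)

print(fixed_edges[0], fixed_edges[-1])
print(auto_edges[0], auto_edges[-1], values.min(), values.max())
```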

Add padding between bars and Y-Axis

I am building a bar chart using matplotlib using the code below. When my first or last column of data is 0, my first column is wedged against the Y-axis.
An example of this. Note that the first column is ON the x=0 point.
If I have data in this column, I get a huge padding between the Y-Axis and the first column as seen here. Note the additional bar, now at X=0. This effect is repeated if I have data in my last column as well.
My code is as follows:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator
binVals = [0,5531608,6475325,1311915,223000,609638,291151,449434,1398731,2516755,3035532,2976924,2695079,1822865,1347155,304911,3562,157,5,0,0,0,0,0,0,0,0]
binTot = sum(binVals)
binNorm = []
for v in range(len(binVals)):
    binNorm.append(float(binVals[v])/binTot)
fig = plt.figure(figsize=(6,4))
ax1 = fig.add_subplot(1,1,1)
ax1.bar(range(len(binNorm)),binNorm,align='center', label='Values')
plt.legend(loc=1)
plt.title("Demo Histogram")
plt.xlabel("Value")
plt.xticks(range(len(binLabels)),binLabels,rotation='vertical')
plt.grid(b=True, which='major', color='grey', linestyle='--', alpha=0.35)
ax1.xaxis.grid(False)
plt.ylabel("% of Count")
plt.subplots_adjust(bottom=0.15)
plt.tight_layout()
plt.show()
How can I set a constant margin between the Y-axis and my first/last bar?
Additionally, I realize it's labeled "Demo Histogram"; that is because I missed it when correcting problems discussed here.
I can't run the code snippet you gave, and even with some modification I couldn't replicate the big space. Aside from that, if you need to enforce a border in matplotlib, you can do something like this:
ax.set_xlim( min(your_data) - 10, None )
The first argument tells the axis to put the border 10 units below the minimum of your data; the None argument tells it to keep the present value.
To put it into context:
from collections import Counter
from pylab import *
data = randint(20,size=1000)
res = Counter(data)
vals = arange(20)
ax = gca()
ax.bar(vals-0.4, [ res[i] for i in vals ], width=0.8)
ax.set_xlim( min(data)-1, None )
show()
Searching around Stack Overflow, I just learned a new trick: you can call
ax.margins( margin_you_desire )
to have matplotlib automatically put that amount of space around your plot. It can also be configured differently for x and y.
In your case the best solution would be something like
ax.margins(0.01, None)
The little catch is that the margin is expressed as a fraction of the data range, so a margin of 1 adds padding on each side that is as large as the data range itself.
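A quick way to convince yourself of how the margin is measured is to compare the resulting limits with the data limits. This is a sketch with made-up bar heights (note the zeros at both ends) and only the x margin set:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

heights = [0, 5, 6, 1, 2, 3, 0]  # hypothetical data, first and last bars empty

fig, ax = plt.subplots()
ax.bar(range(len(heights)), heights, align="center")

# Pad the x axis by 1% of the data span on each side; y is left alone.
ax.margins(x=0.01)

span = ax.dataLim.width
left, right = ax.get_xlim()
print(left, right, span)
```

The padding on each side should equal 0.01 times the x extent of the bars, independent of whether the outer bars have any height.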
The problem is align='center'. Remove it.
