I'm trying to compare two sets of London Airbnb data. I want an elegant way to plot the London shapefile on two subplots, and then overlay the different data as points on each map. My shapefile is from here:
londonshp = gpd.read_file(r"statistical-gis-boundaries london\ESRI\London_Borough_Excluding_MHW.shp")
londonshp = londonshp.to_crs(4326)
This is the code to plot the maps:
fig, axes = plt.subplots(ncols=2, figsize = (12,16))
#entire home/apt on left
axes[0].set_aspect('equal')
londonshp.plot(ax = axes[0],
color = '#e0e1dd',
edgecolor = '#1c1c1c')
axes[0].scatter(entirehomedf.longitude,
entirehomedf.latitude,
s = 1,
c = '#2ec4b6',
marker = '.')
axes[0].set_yticklabels([])
axes[0].set_xticklabels([])
axes[0].set_title("Entire Homes/Apts")
#private room on right
axes[1].set_aspect('equal')
londonshp.plot(ax = axes[1],
color = '#e0e1dd',
edgecolor = '#1c1c1c')
axes[1].scatter(privateroomdf.longitude,
privateroomdf.latitude,
s = 1,
c = '#ff9f1c')
axes[1].set_yticklabels([])
axes[1].set_xticklabels([])
axes[1].set_title("Private Rooms")
Result:
The code I have works fine, but it seems inelegant.
Manually plotting the shapefile on each subplot is ok for just two subplots, but not ideal for larger numbers of subplots. I imagine there's a quicker way to do it automatically (e.g. a loop?)
Some scatterplot features (like marker shape/size) are the same on each subplot. I'm sure there's a better way to set these features for the whole figure, and then edit features which are individual to each subplot (like colour) separately.
I won't code it out, but I can give you some tips:
Yes, you can use loops to plot multiple subplots. All you have to do is iterate through lists of the things you want to change (e.g. colour and data) and use them in the loop.
When you use the loop, you can easily access all the different variables needed, including all your features for your graphs, e.g.:
c = ["blue","red","yellow"]
for x in range(3):
    plt.plot(...,color=c[x])
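Applied to the two-map question above, the loop idea could look roughly like the sketch below. It is only an illustration under assumptions: the `panels` list, the `shared_scatter` dict and the random coordinates are hypothetical stand-ins for `entirehomedf` and `privateroomdf`, and the geopandas base-map call is left as a comment so the snippet runs with matplotlib and numpy alone.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for entirehomedf / privateroomdf.
# Everything that differs between subplots goes into one entry per panel.
panels = [
    {"title": "Entire Homes/Apts", "colour": "#2ec4b6",
     "lon": rng.uniform(-0.5, 0.3, 200), "lat": rng.uniform(51.3, 51.7, 200)},
    {"title": "Private Rooms", "colour": "#ff9f1c",
     "lon": rng.uniform(-0.5, 0.3, 200), "lat": rng.uniform(51.3, 51.7, 200)},
]

# Scatter settings that are identical on every subplot live in one place.
shared_scatter = {"s": 1, "marker": "."}

fig, axes = plt.subplots(ncols=len(panels), figsize=(12, 16))
for ax, panel in zip(axes, panels):
    ax.set_aspect("equal")
    # With geopandas loaded, the base map would be drawn here, e.g.
    # londonshp.plot(ax=ax, color="#e0e1dd", edgecolor="#1c1c1c")
    ax.scatter(panel["lon"], panel["lat"], c=panel["colour"], **shared_scatter)
    # Hide the tick labels on both axes
    ax.tick_params(labelbottom=False, labelleft=False)
    ax.set_title(panel["title"])
```

Adding a third map then only means adding a third entry to `panels`; the loop body stays unchanged.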
I am trying to create a figure with three bar plots side by side. These bar plots have different yscales, but the data is fundamentally similar so I'd like all the bars to have the same width.
The only way I was able to get the bars to have the exact same width was by using sharex when creating the subplots, in order to keep the same x scale.
import matplotlib.pyplot as plt
BigData = [[100,300],[400,200]]
MediumData = [[40, 30],[50,20],[60,50],[30,30]]
SmallData = [[3,2],[11,3],[7,5]]
data = [BigData, MediumData, SmallData]
colors = ['#FC766A','#5B84B1']
fig, axs = plt.subplots(1, 3, figsize=(30,5), sharex=True)
subplot = 0
for scale in data:
    for type in range(2):
        bar_x = [x + type*0.2 for x in range(len(scale))]
        bar_y = [d[type] for d in scale]
        axs[subplot].bar(bar_x,bar_y, width = 0.2, color = colors[type])
    subplot += 1
plt.show()
This creates this figure:
The problem with this is that the x-limits of the plot are also shared, leading to unwanted whitespace. I've tried setting the x-bounds after the fact, but it doesn't seem to override sharex. Is there a way to make the bars have the same width, without each subplot also being the same width?
Additionally, is there a way to create such a plot (one with different y scales depending on the size of the data) without having to sort the data manually beforehand, as shown in my code?
Thanks!
Thanks to Jody Klymak for help finding this solution! I thought I should document it for future users.
We can make use of the 'width_ratios' GridSpec parameter. Unfortunately there's no way to specify these ratios after we've already drawn a graph, so the best way I found to implement this is to write a function that creates a dummy graph, and measures the x-limits from that graph:
def getXRatios(data, size):
    phig, aks = plt.subplots(1, 3, figsize=size)
    subplot = 0
    for scale in data:
        for type in range(2):
            bar_x = [x + type*0.2 for x in range(len(scale))]
            bar_y = [d[type] for d in scale]
            aks[subplot].bar(bar_x,bar_y, width = 0.2)
        subplot += 1
    # each subplot's width should be proportional to its full x-range (upper minus lower limit)
    ratios = [aks[i].get_xlim()[1] - aks[i].get_xlim()[0] for i in range(3)]
    plt.close(phig)
    return ratios
This is essentially identical to the code that creates the actual figure, with the cosmetic aspects removed, as all we want from this dummy figure is the x-limits of the graph (something we can't get from our actual figure as we need to define those limits before we start in order to solve the problem).
Now all you need to do is call this function when you're creating your subplots:
fig, axs = plt.subplots(1, 3, figsize=(40,5), gridspec_kw = {'width_ratios':getXRatios(data,(40,5))})
As long as your getXRatios function creates your graph in the same way your actual graph does, everything should work! Here's my output using this solution.
To save space you could re-purpose the getXRatios function to also construct your final graph, by calling itself in the arguments and giving an option to return either the ratios or the final figure. I couldn't be bothered.
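For anyone who does want to avoid the duplicated plotting code, one possible arrangement is sketched below. It is not the code from the answer: `draw_bars`, `make_figure` and `BAR_WIDTH` are hypothetical names, and the ratios here are taken as the full x-range of each dummy subplot (upper limit minus lower limit). The same drawing helper is called once on a throwaway figure and once on the real one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

BigData = [[100, 300], [400, 200]]
MediumData = [[40, 30], [50, 20], [60, 50], [30, 30]]
SmallData = [[3, 2], [11, 3], [7, 5]]
data = [BigData, MediumData, SmallData]
colors = ['#FC766A', '#5B84B1']
BAR_WIDTH = 0.2


def draw_bars(axs, data):
    """Draw one grouped bar chart per dataset (used for dummy and final figure)."""
    for ax, scale in zip(axs, data):
        for series in range(2):
            bar_x = [x + series * BAR_WIDTH for x in range(len(scale))]
            bar_y = [d[series] for d in scale]
            ax.bar(bar_x, bar_y, width=BAR_WIDTH, color=colors[series])


def make_figure(data, size):
    # Pass 1: throwaway figure, only used to read the autoscaled x-ranges.
    dummy, dummy_axs = plt.subplots(1, len(data), figsize=size)
    draw_bars(dummy_axs, data)
    ratios = [ax.get_xlim()[1] - ax.get_xlim()[0] for ax in dummy_axs]
    plt.close(dummy)
    # Pass 2: real figure, each subplot as wide as its x-range requires.
    fig, axs = plt.subplots(1, len(data), figsize=size,
                            gridspec_kw={'width_ratios': ratios})
    draw_bars(axs, data)
    return fig, axs


fig, axs = make_figure(data, (30, 5))
fig.canvas.draw()  # make sure autoscaling has been applied

# On-screen width (in pixels) of a single bar in each subplot
bar_px = [ax.transData.transform((BAR_WIDTH, 0))[0]
          - ax.transData.transform((0, 0))[0] for ax in axs]
```

The `bar_px` list at the end is just a sanity check: if the ratios are right, the three values should agree.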
optional context feel free to skip: I'm currently using cartopy and matplotlib to read in and plot weather model data on a map. I have three different fields I'm plotting: temperature, wind, and surface pressure. I'm using contourf, barbs, and contour respectively to plot each field. I want one image for each field, and then I'd like one image that contains all three fields overlaid on a single map. Currently I'm doing this by plotting each field individually, saving each of the individual images, then replotting all three fields on a single ax and a new fig, and saving that fig. Since the data takes a while to plot, I would like to be able to plot each of the single fields, then combine the axes into one final image.
I'd like to be able to combine multiple matplotlib axes without replotting the data on the axes. I'm not sure if this is possible, but doing so would be a pretty major time and performance saver. An example of what I'm talking about:
from matplotlib import pyplot as plt
import numpy as np
x1 = np.linspace(0, 2*np.pi, 100)
x2 = x1 + 5
y = np.sin(x1)
firstFig = plt.figure()
firstAx = firstFig.gca()
firstAx.scatter(x1, y, 1, "red")
firstAx.set_xlim([0, 12])
secondFig = plt.figure()
secondAx = secondFig.gca()
secondAx.scatter(x2, y, 1, "blue")
secondAx.set_xlim([0, 12])
firstFig.savefig("1.png")
secondFig.savefig("2.png")
This generates two images, 1.png and 2.png.
Is it possible to save a third file, 3.png that would look something like the following, but without calling scatter again, because for my dataset, the actual plotting takes a long time?
If you just want to save images of your plots and you don't intend to further use the Figure objects, you can use the following after saving "2.png".
# get the scatter object from the first figure
scatter = firstAx.get_children()[0]
# remove it from this collection so you can assign it to a new axis
# the axis reassignment will raise an error if it already belongs to another axis
scatter.remove()
scatter.axes = secondAx
# now you can add it to your new axis
secondAx.add_artist(scatter)
secondFig.savefig("3.png")
This modifies both figures, as it removes a scatter from one and adds it to another. If for some reason you want to preserve them, you can copy the contents of secondFig to a new one and then add the scatter to that. However, this will still modify the first plot as you have to remove the scatter from there.
Let's look at a swarmplot, made with Python 3.5 and Seaborn on some data (which is stored in a pandas dataframe df, with column labels stored in another class; this does not matter for now, just look at the plot):
ax = sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df)
Now the data is more readable if plotted in log scale on the y-axis because it goes over some decades.
So let's change the scaling to logarithmic:
ax.set_yscale("log")
ax.set_ylim(bottom = 5*10**-10)
Well, I have a problem with the gaps in the swarms. I guess they are there because the positions were calculated when the plot was created with a linear axis in mind, where the dots must not overlap. But now they look kind of strange, and there is enough space to form 4 equal-looking swarms.
My question is: How can I force seaborn to recalculate the position of the dots to create better looking swarms?
mwaskom hinted to me in the comments how to solve this.
It is even stated in the swarmplot documentation:
Note that arranging the points properly requires an accurate transformation between data and point coordinates. This means that non-default axis limits should be set before drawing the swarm plot.
So set an existing axis to log scale first, and then use it for the plot:
fig = plt.figure() # create figure
rect = 0,0,1,1 # create an rectangle for the new axis
log_ax = fig.add_axes(rect) # create a new axis (or use an existing one)
log_ax.set_yscale("log") # log first
sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df, ax = log_ax)
This yields the correct and desired plotting behaviour:
I'm writing a pythonic script for a coastal engineering application which should output, amongst other things, a figure with two subplots.
The problem is that I would like to shade a section of both subplots using plt.axvspan() but for some reason it only shades one of them.
Please find below an excerpt of the section of the code where I set up the plots as well as the figure that it's currently outputting (link after code).
Thanks for your help, and sorry if this is a rookie question (but it just happens that I am indeed a rookie in Python... and programming in general) but I couldn't find an answer for this anywhere else.
Feel free to add any comments to the code.
# PLOTTING
# now we generate a figure with the bathymetry vs required m50 and another figure with bathy vs Hs
#1. Generate plots
fig = plt.figure() # Generate Figure
ax = fig.add_subplot(211) # add the first plot to the figure.
depth = ax.plot(results[:,0],results[:,1]*-1,label="Depth [mDMD]") #plot the first set of data onto the first set of axis.
ax2 = ax.twinx() # generate a secondary vertical axis with the same horizontal axis as the first
m50 = ax2.plot(results[:,0],results[:,6],"r",label="M50 [kg]") # plot the second set of data onto the second vertical axis
ax3 = fig.add_subplot(212) # generate the second subplot
hs = ax3.plot(results[:,0],results[:,2],"g",label="Hs(m)")
#Now we want to find where breaking starts to occur so we shade it on the plot.
xBreakingDistance = results[numpy.argmax(breakingIndex),0]
# and now we plot a box from the origin to the depth of breaking.
plt.axvspan(0,xBreakingDistance,facecolor="b",alpha=0.1) # this box is called a span in matplotlib (also works for axhspan)
# and then we write BREAKING ZONE in the box we just created
yLimits = ax.get_ylim() # first we get the range of y being plotted
yMiddle = (float(yLimits[1])-float(yLimits[0])) / 2 + yLimits[0] # then we calculate the middle value in y (to center the text)
xMiddle = xBreakingDistance / 2 # and then the middle value in x (to center the text)
#now we write BREAKING ZONE in the center of the box.
ax.text(xMiddle,yMiddle,"BREAKING ZONE",fontweight="bold",rotation=90,verticalalignment="center",horizontalalignment="center")
#FIGURE FORMATTING
ax.set_xlabel("Distance [m]") # define x label
ax.set_ylabel("Depth [mDMD]") # define y label on the first vertical axis (ax)
ax2.set_ylabel("M50 [kg]") # define y label on the second vertical axis (ax2)
ax.grid() # show grid
ax3.set_xlabel("Distance[m]") #define x label
ax3.set_ylabel("Hs[m]") # define y label
ax3.grid()
plt.tight_layout() # minimize subplot labels overlapping
# generating a legend on a plot with 2 vertical axes is not very intuitive. Normally we would just write ax.legend(loc=0)
combined_plots = depth+m50 #first we need to combine the plots in a vector
combined_labels = [i.get_label() for i in combined_plots] # and then we combine the labels
ax.legend(combined_plots,combined_labels,loc=0) # and finally we plot the combined_labels of the combined_plots
plt.savefig("Required M50(kg) along the trench.png",dpi=1000)
plt.close(fig)
Output Figure:
By just calling plt.axvspan, you are telling matplotlib to create the axvspan on the currently active axes (i.e. in this case, the last one you created, ax3)
You need to plot the axvspan on both of the axes you would like for it to appear on. In this case, ax and ax3.
So, you could do:
ax.axvspan(0,xBreakingDistance,facecolor="b",alpha=0.1)
ax3.axvspan(0,xBreakingDistance,facecolor="b",alpha=0.1)
or in a loop:
for this_ax in [ax,ax3]:
    this_ax.axvspan(0,xBreakingDistance,facecolor="b",alpha=0.1)
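Since the question's code cannot be run without the `results` array, here is a small self-contained sketch of the same idea. The curves and the value of `breaking_distance` are made up for illustration; only the per-axes `axvspan` calls mirror the fix described above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up data standing in for the bathymetry and wave-height results
x = np.linspace(0, 100, 50)
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)
ax_top.plot(x, -np.sqrt(x), label="Depth")
ax_bottom.plot(x, np.cos(x / 10), "g", label="Hs")

breaking_distance = 30.0  # hypothetical value for the end of the breaking zone

# Call axvspan on each Axes object instead of the pyplot-level function,
# which would only act on the currently active axes.
for this_ax in (ax_top, ax_bottom):
    this_ax.axvspan(0, breaking_distance, facecolor="b", alpha=0.1)

fig.savefig("breaking_zone_sketch.png", dpi=100)
```

Each subplot ends up with its own shaded patch, which is why the span has to be requested once per axes.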
It's difficult to analyze your code without being able to reproduce it; I advise you to build a minimal example. In any case, notice that you are calling plt.axvspan(...), which is a general call on the library and so only acts on the current axes.
You need to state explicitly that you want the span on both subplots, i.e. on "ax" and "ax3" (I think).
Also, if you need more control, consider using Patches (I don't know axvspan):
import matplotlib.pyplot as plt
import matplotlib.patches as patches
fig1 = plt.figure()
ax1 = fig1.add_subplot(111, aspect='equal')
ax1.add_patch(
    patches.Rectangle(
        (0.1, 0.1), # (x,y)
        0.5, # width
        0.5, # height
    )
)
fig1.savefig('rect1.png', dpi=90, bbox_inches='tight')
See that call to "ax1" in the example? Just make something similar to yours. Or just add axvspan to each of your plots.
I am looping through a bunch of CSV files containing various measurements.
Each file might be from one of 4 different data sources.
In each file, I merge the data into monthly datasets, that I then plot in a 3x4 grid. After this plot has been saved, the loop moves on and does the same to the next file.
This part I got figured out, however I would like to add a visual clue to the plots, as to what data it is. As far as I understand it (and tried it)
plt.subplot(4,3,1)
plt.hist(Jan_Data,facecolor='Red')
plt.ylabel('value count')
plt.title('January')
does work; however, this way I would have to add facecolor='Red' by hand to each of the 12 subplots. Looping through the plots won't work for this situation, since I want the ylabel only for the leftmost plots, and xlabels only for the bottom row.
Setting facecolor at the beginning in
fig = plt.figure(figsize=(20,15),facecolor='Red')
does not work, since it only changes the background color of the 20 by 15 figure now, which subsequently gets ignored when I save it to a PNG, since it only gets set for screen output.
So is there just a simple setthecolorofallbars='Red' command for plt.hist(… or plt.savefig(… I am missing, or should I just copy n' paste it to all twelve months?
You can use mpl.rc("axes", prop_cycle=cycler(color=["red"])) to set the default color cycle for all your axes. (Older matplotlib versions spelled this mpl.rc("axes", color_cycle="red"); that option was deprecated in matplotlib 1.5 and has since been removed.)
In this little toy example, I use the with mpl.rc_context block to limit the effects of mpl.rc to just the block. This way you don't spoil the default parameters for your whole session.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler
np.random.seed(42)
# create some toy data
n, m = 2, 2
data = []
for i in range(n*m):
    data.append(np.random.rand(30))
# and do the plotting
with mpl.rc_context():
    mpl.rc("axes", prop_cycle=cycler(color=["red"]))
    fig, axes = plt.subplots(n, m, figsize=(8,8))
    for ax, d in zip(axes.flat, data):
        ax.hist(d)
The problem with the x- and y-labels (when you use loops) can be solved by using plt.subplots, as you can access every axis separately.
import matplotlib.pyplot as plt
import numpy.random
# creating figure with 4 plots
fig,ax = plt.subplots(2,2)
# some data
data = numpy.random.randn(4,1000)
# some titles
title = ['Jan','Feb','Mar','April']
xlabel = ['xlabel1','xlabel2']
ylabel = ['ylabel1','ylabel2']
for i in range(ax.size):
    a = ax[i//2,i%2] # integer division, so this also works on Python 3
    a.hist(data[i],facecolor='r',bins=50)
    a.set_title(title[i])
# write the ylabels on all axes on the left hand side
for j in range(ax.shape[0]):
    ax[j,0].set_ylabel(ylabel[j])
# write the xlabels on all axes on the bottom
for j in range(ax.shape[1]):
    ax[-1,j].set_xlabel(xlabel[j])
fig.tight_layout()
All features (like titles) which are not constant can be put into arrays and placed at the appropriate axis.
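Scaled up to the 4x3 monthly grid from the question, the same pattern might look like the sketch below. The random `monthly` data, the `source_colour` variable and the "value" x-label are placeholders rather than anything from the original files; the point is that the colour is set once per file, and the labels are added only where the row or column index says so.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Placeholder for the twelve monthly datasets of one CSV file
monthly = [rng.normal(size=200) for _ in range(12)]
# Chosen once per file, e.g. looked up from the data source
source_colour = "red"

fig, axes = plt.subplots(4, 3, figsize=(20, 15))
n_rows, n_cols = axes.shape
for (row, col), ax in np.ndenumerate(axes):
    month = row * n_cols + col  # 0 = January ... 11 = December
    ax.hist(monthly[month], facecolor=source_colour)
    ax.set_title(calendar.month_name[month + 1])
    if col == 0:  # leftmost column only
        ax.set_ylabel("value count")
    if row == n_rows - 1:  # bottom row only
        ax.set_xlabel("value")
fig.tight_layout()
fig.savefig("monthly_hist_sketch.png")
```

Because the bars are coloured through `facecolor` on each histogram rather than through the figure background, the colour is part of the drawn data and shows up in the saved PNG.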