Stacking multiple barchart subplots into one PDF (python, matplotlib) - python

I'm struggling hard with subplot madness. I've made a bunch of bar charts, each of which summarizes a binary variable (usually stacked, but unstacked is OK if it's simpler), and I want to save them to one PDF in sequence. The charts are fine, but when I try fitting them into a grid of subplots I muck it up!
My problems are 1) I'm not iterating through the data properly, and 2) I can't seem to stack the charts in a single column; it only works with 2+ columns.
Sorry for such a lame question, but this is the closest I've gotten! Any suggestions?
df = pd.DataFrame(np.random.randint(0,2,size=(100, 12)), columns=list('ABCDEFGHIJKL')) #load data
key_vars = list('ABCDEFGH') #variables to plot
num_plots = len(key_vars) #number of subplots
fig, ax = plt.subplots(num_plots, 2, sharex='col', sharey='row') #create figure
for i in range(num_plots):
    for j in range(2):
        ax[i,j].barh(df[key_vars[i]].value_counts(),10) #create subplots
fig.savefig('binary_barcharts.pdf') #save to .pdf

Are you looking for something like this:
(df[key_vars].apply(pd.Series.value_counts)
   .T.plot.bar(stacked=True)
)
Output: one stacked bar per variable, with the counts of 0 and 1 stacked on top of each other.
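If one chart per variable in a single column is what is needed, a minimal sketch could look like the following. It is not part of the answer above; it assumes the same random binary data as the question and uses squeeze=False so that plt.subplots returns a 2-D array of axes even when there is only one column (a likely reason the single-column attempt failed with ax[i,j] indexing).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of random binary data as in the question
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(100, 12)), columns=list("ABCDEFGHIJKL"))
key_vars = list("ABCDEFGH")

# squeeze=False always returns a 2-D array of axes, even for a single column
fig, axes = plt.subplots(len(key_vars), 1, sharex=True, squeeze=False,
                         figsize=(6, 1.0 * len(key_vars)))

for ax, var in zip(axes[:, 0], key_vars):
    # Counts of 0 and 1, in a fixed order, with 0 filled in if a value is absent
    counts = df[var].value_counts().reindex([0, 1], fill_value=0)
    ax.barh([var], [counts.loc[0]], label="0")
    ax.barh([var], [counts.loc[1]], left=[counts.loc[0]], label="1")  # stacked on the first bar

axes[0, 0].legend(title="value", loc="upper right")
fig.tight_layout()
fig.savefig("binary_barcharts.pdf")  # all subplots end up on one PDF page
```

Iterating over zip(axes[:, 0], key_vars) avoids index bookkeeping, and the same loop works unchanged for any number of variables.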

Related

Setting the same x-scale but different x-limits for adjacent subplots matplotlib

I am trying to create a figure with three bar plots side by side. These bar plots have different yscales, but the data is fundamentally similar so I'd like all the bars to have the same width.
The only way I was able to get the bars to have the exact same width was by using sharex when creating the subplots, in order to keep the same x scale.
import matplotlib.pyplot as plt
BigData = [[100,300],[400,200]]
MediumData = [[40, 30],[50,20],[60,50],[30,30]]
SmallData = [[3,2],[11,3],[7,5]]
data = [BigData, MediumData, SmallData]
colors = ['#FC766A','#5B84B1']
fig, axs = plt.subplots(1, 3, figsize=(30,5), sharex=True)
subplot = 0
for scale in data:
    for type in range(2):
        bar_x = [x + type*0.2 for x in range(len(scale))]
        bar_y = [d[type] for d in scale]
        axs[subplot].bar(bar_x,bar_y, width = 0.2, color = colors[type])
    subplot += 1
plt.show()
This creates this figure:
The problem with this is that the x-limits of the plot are also shared, leading to unwanted whitespace. I've tried setting the x-bounds after the fact, but it doesn't seem to override sharex. Is there a way to make the bars have the same width, without each subplot also being the same width?
Additionally, is there a way to create such a plot (one with different y-scales depending on the size of the data) without having to sort the data manually beforehand, as shown in my code?
Thanks!
Thanks to Jody Klymak for help finding this solution! I thought I should document it for future users.
We can make use of the 'width_ratios' GridSpec parameter. Unfortunately there's no way to specify these ratios after we've already drawn a graph, so the best way I found to implement this is to write a function that creates a dummy graph, and measures the x-limits from that graph:
def getXRatios(data, size):
    phig, aks = plt.subplots(1, 3, figsize=size)
    subplot = 0
    for scale in data:
        for type in range(2):
            bar_x = [x + type*0.2 for x in range(len(scale))]
            bar_y = [d[type] for d in scale]
            aks[subplot].bar(bar_x,bar_y, width = 0.2)
        subplot += 1
    ratios = [aks[i].get_xlim()[1] for i in range(3)]
    plt.close(phig)
    return ratios
This is essentially identical to the code that creates the actual figure, with the cosmetic aspects removed, as all we want from this dummy figure is the x-limits of the graph (something we can't get from our actual figure as we need to define those limits before we start in order to solve the problem).
Now all you need to do is call this function when you're creating your subplots:
fig, axs = plt.subplots(1, 3, figsize=(40,5), gridspec_kw = {'width_ratios':getXRatios(data,(40,5))})
As long as your getXRatios function creates its graph in the same way your actual graph is created, everything should work! Here's my output using this solution.
To save space you could re-purpose the getXRatios function to also construct your final graph, by calling itself in the arguments and giving an option to return either the ratios or the final figure. I couldn't be bothered.
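As an alternative to the dummy figure, the ratios can probably be derived from the data directly. The sketch below is a different technique from the getXRatios function above, not a rewrite of it. It assumes the bars sit at integer group positions as in the question, so each group occupies one unit of x; the number of groups is then used as the width ratio, and the x-limits of each subplot are fixed to span exactly that many units.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

BigData = [[100, 300], [400, 200]]
MediumData = [[40, 30], [50, 20], [60, 50], [30, 30]]
SmallData = [[3, 2], [11, 3], [7, 5]]
data = [BigData, MediumData, SmallData]
colors = ['#FC766A', '#5B84B1']
bar_width = 0.2

# One unit of x per group, so the number of groups serves as the width ratio
ratios = [len(scale) for scale in data]

fig, axs = plt.subplots(1, len(data), figsize=(12, 4),
                        gridspec_kw={'width_ratios': ratios})

for ax, scale in zip(axs, data):
    for series in range(2):
        bar_x = [x + series * bar_width for x in range(len(scale))]
        bar_y = [d[series] for d in scale]
        ax.bar(bar_x, bar_y, width=bar_width, color=colors[series])
    # Fix the x-limits so this subplot spans exactly len(scale) data units
    ax.set_xlim(-0.4, len(scale) - 0.4)

fig.savefig("equal_width_bars.png")
```

Because every subplot's width and x-range are both proportional to its number of groups, a bar of width 0.2 takes up the same physical width in each subplot, and sharex is not needed.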

Creating a single tidy seaborn plot in a 'for' loop

I'm trying to generate a plot in seaborn using a for loop to plot the contents of each dataframe column on its own row.
The number of columns that need plotting can vary between 1 and 30. However, the loop creates multiple individual plots, each with their own x-axis, which are not aligned and with a lot of wasted space between the plots. I'd like to have all the plots together with a shared x-axis without any vertical spacing between each plot that I can then save as a single image.
The code I have been using so far is below.
comp_relflux = measurements.filter(like='rel_flux_C', axis=1) # Extracts relevant columns from larger dataframe
comp_relflux=comp_relflux.reindex(comp_relflux.mean().sort_values().index, axis=1) # Sorts into order based on column mean.
plt.rcParams["figure.figsize"] = [12.00, 1.00]
for column in comp_relflux.columns:
    plt.figure()
    sns.scatterplot((bjd)%1, comp_relflux[column], color='b', marker='.')
This is a screenshot of the resultant plots.
I have also tried using FacetGrid, but this just seems to plot the last column's data.
p = sns.FacetGrid(comp_relflux, height=2, aspect=6, despine=False)
p.map(sns.scatterplot, x=(bjd)%1, y=comp_relflux[column])
To combine the x-axis labels and have just one instead of one per row, you can use sharex. Also, by passing the number of columns you have to plt.subplots(), you get a single figure with all the subplots in it. As there is no data available, I used random numbers below to demonstrate. There are 4 columns of data in my df, but I have kept as much of your code and naming convention as possible. Hope this is what you are looking for...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

comp_relflux = pd.DataFrame(np.random.rand(100, 4)) # Random data - 4 columns
bjd = np.linspace(0, 1, 100) # Series of 100 points - 0 to 1
rows = len(comp_relflux.columns) # Number of columns = number of subplots
fig, ax = plt.subplots(rows, 1, sharex=True, figsize=(12,6)) # sharex is set here; the figure size moves here from your rcParams
for i, column in enumerate(comp_relflux.columns):
    sns.scatterplot(x=(bjd)%1, y=comp_relflux[column], color='b', marker='.', ax=ax[i]) # x/y passed as keywords, as recent seaborn requires
1 output plot with 4 subplots
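The question also asks for no vertical spacing between rows, a single saved image, and anywhere from 1 to 30 columns. A minimal sketch of those three points follows. It uses plain matplotlib ax.scatter in place of sns.scatterplot to keep the example dependency-free, and the data, the one-inch row height and the output file name are all stand-ins rather than anything taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
comp_relflux = pd.DataFrame(rng.random((100, 4)))  # stand-in data, 4 columns
bjd = np.linspace(0, 2, 100)                       # stand-in time stamps

n_rows = len(comp_relflux.columns)
# squeeze=False keeps `axes` two-dimensional even when there is only one column to plot
fig, axes = plt.subplots(n_rows, 1, sharex=True, squeeze=False,
                         figsize=(12, 1.0 * n_rows))
fig.subplots_adjust(hspace=0)  # remove the vertical gap between rows

for ax, column in zip(axes[:, 0], comp_relflux.columns):
    ax.scatter(bjd % 1, comp_relflux[column], color='b', marker='.')

fig.savefig("comp_relflux.png", bbox_inches="tight")  # one image containing every row
```

Scaling the figure height with the number of rows keeps each row the same size whether there is 1 column or 30.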

Automatically plot same shapefile on multiple subplots

I'm trying to compare two sets of London Airbnb data. I want an elegant way to plot the London shapefile on two subplots, and then overlay the different data as points on each map. My shapefile is from here:
londonshp = gpd.read_file(r"statistical-gis-boundaries london\ESRI\London_Borough_Excluding_MHW.shp")  # raw string so the backslashes are not treated as escapes
londonshp = londonshp.to_crs(4326)
This is the code to plot the maps:
fig, axes = plt.subplots(ncols=2, figsize = (12,16))
#entire home/apt on left
axes[0].set_aspect('equal')
londonshp.plot(ax = axes[0],
               color = '#e0e1dd',
               edgecolor = '#1c1c1c')
axes[0].scatter(entirehomedf.longitude,
                entirehomedf.latitude,
                s = 1,
                c = '#2ec4b6',
                marker = '.')
axes[0].set_yticklabels([])
axes[0].set_xticklabels([])
axes[0].set_title("Entire Homes/Apts")
#private room on right
axes[1].set_aspect('equal')
londonshp.plot(ax = axes[1],
               color = '#e0e1dd',
               edgecolor = '#1c1c1c')
axes[1].scatter(privateroomdf.longitude,
                privateroomdf.latitude,
                s = 1,
                c = '#ff9f1c')
axes[1].set_yticklabels([])
axes[1].set_xticklabels([])
axes[1].set_title("Private Rooms")
Result:
The code I have works fine, but it seems inelegant.
Manually plotting the shapefile on each subplot is ok for just two subplots, but not ideal for larger numbers of subplots. I imagine there's a quicker way to do it automatically (e.g. a loop?)
Some scatterplot features (like marker shape/size) are the same on each subplot. I'm sure there's a better way to set these features for the whole figure, and then edit features which are individual to each subplot (like colour) separately.
I won't code it out, but I can give you some tips:
Yes, you can use a loop to draw multiple subplots. All you have to do is put the things that change from subplot to subplot (e.g. colour and data) into lists and iterate through them in the loop.
When you use the loop, you can easily access all the different variables needed, including all your features for your graphs, e.g.:
c= ["blue","red","yellow"]
for x in range(3):
    plt.plot(...,color=c[x])
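Spelled out for the two-map case, the idea might look like the sketch below. It is only an illustration under stated assumptions: draw_base is a hypothetical placeholder that draws a rectangle where the real code would call londonshp.plot(ax=ax, ...), and fake_points generates random coordinates standing in for the longitude and latitude columns of the real dataframes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

def draw_base(ax):
    """Hypothetical placeholder for londonshp.plot(ax=ax, color='#e0e1dd', edgecolor='#1c1c1c')."""
    ax.fill([-0.5, 0.3, 0.3, -0.5], [51.3, 51.3, 51.7, 51.7],
            facecolor='#e0e1dd', edgecolor='#1c1c1c')

rng = np.random.default_rng(0)

def fake_points(n):
    """Hypothetical stand-in for a dataframe's longitude and latitude columns."""
    return rng.uniform(-0.5, 0.3, n), rng.uniform(51.3, 51.7, n)

# Settings that differ per subplot: (title, (longitude, latitude), colour)
panels = [
    ("Entire Homes/Apts", fake_points(200), '#2ec4b6'),
    ("Private Rooms", fake_points(200), '#ff9f1c'),
]
# Settings shared by every scatter call
common = dict(s=1, marker='.')

fig, axes = plt.subplots(ncols=len(panels), figsize=(12, 16), squeeze=False)
for ax, (title, (lon, lat), colour) in zip(axes[0], panels):
    ax.set_aspect('equal')
    draw_base(ax)                              # same base map on every subplot
    ax.scatter(lon, lat, c=colour, **common)   # shared style plus per-panel colour
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    ax.set_title(title)
```

Adding a third map then only means appending one more tuple to panels; the shared marker settings live in one place.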

Python - Multiple Plots in a Single Figure - Loop in Different columns

I'm trying to plot multiple columns of a table in a single image.
The idea is to optimize the process with a loop.
It is important to note that all the columns share the same y-axis, and that the x-scale varies for each column.
The final result should look something like this:
I've already tried a few things without success. My code creates several figures and only plots in the first graph:
def facies_plot_all(logs):
    logs = sort_values(by='y')
    ztop=logs.Y.min(); zbot=logs.Y.max()
    for col in logs.columns:
        numcol = (logs.shape[1])
        f, ax = plt.subplots (nrows=1, ncols=numcol, figsize (20,25))
        ax[x+1].plot(logs[col],logs.Y,'-')
I'm relatively new to programming and still searching for a way to solve this issue.
Any help will be welcome!
Create the subplots once, outside the for loop:
logs = logs.sort_values(by='Y')  # assumes the depth column is named 'Y', matching logs.Y below
ztop=logs.Y.min(); zbot=logs.Y.max()
numcol = logs.shape[1]
f, axes = plt.subplots(nrows=1, ncols=numcol,
                       sharey=True,
                       figsize=(20,25))
for (ax, col) in zip(axes, logs.columns):
    ax.plot(logs[col], logs.Y, '-')
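A self-contained version of the same idea, wrapped back into the function from the question, might look like this. The table contents and curve names (GR, RHOB, NPHI) are invented stand-ins, and two choices are assumptions rather than something the answer states: the Y column is left out of the plotted curves, and the y-axis is inverted so that Y increases downwards.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `logs` table: a Y column plus three curves
rng = np.random.default_rng(0)
logs = pd.DataFrame({
    "Y": np.linspace(1000, 1100, 50),
    "GR": rng.uniform(0, 150, 50),
    "RHOB": rng.uniform(2.0, 2.8, 50),
    "NPHI": rng.uniform(0.0, 0.4, 50),
})

def facies_plot_all(logs, depth_col="Y"):
    logs = logs.sort_values(by=depth_col)
    curves = [c for c in logs.columns if c != depth_col]  # do not plot Y against itself
    ztop, zbot = logs[depth_col].min(), logs[depth_col].max()

    # One figure, one axes per curve; sharey=True gives the common y-axis
    fig, axes = plt.subplots(nrows=1, ncols=len(curves), sharey=True,
                             squeeze=False, figsize=(3 * len(curves), 8))
    for ax, col in zip(axes[0], curves):
        ax.plot(logs[col], logs[depth_col], '-')
        ax.set_xlabel(col)  # each column keeps its own x-scale
    axes[0, 0].set_ylim(zbot, ztop)  # assumption: largest Y at the bottom
    axes[0, 0].set_ylabel(depth_col)
    return fig, axes

fig, axes = facies_plot_all(logs)
```

Because the y-axis is shared, setting the limits on the first axes applies to all of them, while each subplot still autoscales its own x-axis.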

Visual output of Matplotlib bar chart - Python

I am working on getting some graphs generated for 4 columns, with the COLUMN_NM being the main index.
The issue I am facing is that the column names are showing along the bottom. This is problematic for two reasons: first, there could be dozens of these columns, so the graph would look messy and could stretch too far to the right; second, the names are getting cut off (though I am sure that can be fixed).
I would prefer to have the column names listed vertically in the box where 'MAX_COL_LENGTH' currently resides, and have the bars in a different color per column instead.
Any ideas how I would adjust this or suggestions to make this better?
for col in ['DISTINCT_COUNT', 'MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT']:
    grid[['COLUMN_NM', col]].set_index('COLUMN_NM').plot.bar(title=col)
plt.show()
In this case you can plot the bars one by one and set a label for each:
import matplotlib.pyplot as plt
from matplotlib import gridspec

gs = gridspec.GridSpec(1,1)
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(gs[:, :])
data = [1,2,3,4,5]
label = ['l1','l2','l3','l4','l5']
for n,(p,l) in enumerate(zip(data,label)):
    ax.bar(n,p,label=l) # each bar gets its own colour and legend entry
ax.set_xticklabels([]) # hide the tick labels along the bottom
ax.legend()
This is the output for the code above:
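Applied to a table shaped like the one in the question, the same approach could be sketched as follows. The contents of grid are invented for illustration; only the column headers come from the question. One subplot is drawn per metric, each bar is drawn separately so it picks up the next colour in the default cycle, and a single legend lists the column names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the `grid` table in the question
grid = pd.DataFrame({
    "COLUMN_NM": ["id", "name", "email"],
    "DISTINCT_COUNT": [100, 87, 95],
    "MAX_COL_LENGTH": [6, 40, 64],
    "MIN_COL_LENGTH": [1, 2, 7],
    "NULL_COUNT": [0, 3, 12],
})
metrics = ["DISTINCT_COUNT", "MAX_COL_LENGTH", "MIN_COL_LENGTH", "NULL_COUNT"]

fig, axes = plt.subplots(1, len(metrics), figsize=(4 * len(metrics), 4), squeeze=False)
for ax, metric in zip(axes[0], metrics):
    for n, (name, value) in enumerate(zip(grid["COLUMN_NM"], grid[metric])):
        ax.bar(n, value, label=name)  # each call takes the next colour in the cycle
    ax.set_xticks([])                 # no names along the bottom
    ax.set_title(metric)

# One legend for the whole row, listing the column names vertically
axes[0, -1].legend(title="COLUMN_NM", bbox_to_anchor=(1.02, 1), loc="upper left")
fig.tight_layout()
```

The default colour cycle has ten entries, so with dozens of columns the colours would repeat unless a longer colour list is supplied.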
