I am plotting several data types that share the x axis, so I am using the matplotlib.pyplot subplots command.
The shared x axis is time (in years AD). The last subplot I have is the number of independent observations as a function of time. I have the following code:
import numpy as np
import matplotlib.pyplot as plt
#
# There's a bunch of data analysis here
#
f, ax = plt.subplots(4, sharex=True)
# Here I plot the first 3 subplots with no issue
x = np.arange(900, 2000, 1)  # x array in steps of 1 year
ax[3].plot(x[0:28], np.ones(len(x[0:28])), 'k')  # one observation from 900-927 AD
ax[3].plot(x[28:62], 2*np.ones(len(x[28:62])), 'k')  # two observations from 928-961 AD
When I run this code, the last subplot only shows the second ax[3] plot and not the first. How can I fix this?
OK, I think I found an answer. The first plot was being drawn, but I couldn't see it against the axes, so I changed the y limits:
ax[3].set_ylim([0, 7])
That seemed to work, although is there a way to connect these horizontal lines, perhaps with dashed lines?
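One possible way to connect the levels, as a minimal sketch: draw the solid horizontal segments with hlines and join them with dashed vlines at the years where the count changes. The helper name plot_observation_counts and the edges/counts values below are illustrative assumptions based on the ranges in the question, not part of the original code.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_observation_counts(ax, edges, counts):
    """Draw solid horizontal levels joined by dashed vertical connectors.

    edges  : year where each level starts, plus the final end year (length n+1)
    counts : number of observations on each interval (length n)
    """
    edges = np.asarray(edges)
    counts = np.asarray(counts)
    # Solid horizontal segment for each interval
    ax.hlines(counts, edges[:-1], edges[1:], colors='k')
    # Dashed vertical connector at each interior boundary
    ax.vlines(edges[1:-1], counts[:-1], counts[1:],
              colors='k', linestyles='dashed')
    # Pad the y limits so the lowest and highest levels stay visible
    ax.set_ylim(0, counts.max() + 1)

# Illustrative values: 1 observation from 900 AD, 2 observations from 928 to 961 AD
fig, ax = plt.subplots()
plot_observation_counts(ax, edges=[900, 928, 961], counts=[1, 2])
```

In the original script you would pass ax[3] instead of a fresh axes, and call plt.show() (or fig.savefig) once all subplots are drawn.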
I'm trying to make a subplot of histograms for each of the features in the dataset.
The following code is what I have tried so far. Consider a dataset train, which has 9 columns that I want plotted in a 3x3 grid of subplots.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=3, ncols=3)
i = 0
for row in ax:
    for col in row:
        train.iloc[:, i].hist()
        i = i + 1
I'm getting all histograms in the last subplot.
Here is my suggestion:
import matplotlib.pyplot as plt
import random
for i in range(1, 10):
    # Cut the figure into 3 rows and 3 columns
    # and create the plot in the i-th subplot.
    plt.subplot(3, 3, i)
    # placeholder data: 100 random integers per subplot
    plt.hist([random.randrange(0, 10) for _ in range(100)])
plt.show()
You can find more ideas at this website: The Python Graph Gallery.
pandas.DataFrame.hist can take an ax parameter which is the Matplotlib axes to use.
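A minimal sketch of that idea, assuming a DataFrame with 9 numeric columns. The train frame here is made-up random data standing in for the one in the question, and the column names are hypothetical. It uses Series.hist, which accepts the same ax argument, so each column is drawn on its own subplot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's `train` DataFrame: 9 numeric columns
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.normal(size=(200, 9)),
                     columns=[f"feature_{k}" for k in range(9)])

fig, axes = plt.subplots(nrows=3, ncols=3, figsize=(9, 9))

# axes.flat walks the 3x3 grid row by row, pairing each column with one subplot
for ax, name in zip(axes.flat, train.columns):
    train[name].hist(ax=ax)  # draw on this specific axes, not the "current" one
    ax.set_title(name)

fig.tight_layout()
```

Without the ax argument, every hist call draws on the current axes, which after plt.subplots is the last one created; that matches the behaviour described in the question.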
I'm using matplotlib in python to create heatmaps for different clusters I've created using k-means clustering. Right now I'm able to produce this figure:
But I want the number of rows in each cluster reflected in the size of the heatmap, instead of them all being scaled to the same size. Is GridSpec the right way to do this? It's the only thing I can find trying to Google the solution, but it seems more suited to situations where you have subplots on a grid and you want a certain subplot to span more than one row or column on the grid. In this situation, I would be creating a grid with thousands of rows and telling each subplot to span hundreds of them. Is this still the best way to do it?
Edit: In case my question isn't clear, I'm ultimately trying to create a figure like this one. Notice how it's easy to see in the left figure that cluster E is larger than cluster F:
GridSpec has an argument height_ratios. You can set it to a list containing the number of rows in each heatmap, so that each subplot's height is proportional to its cluster size.
import numpy as np
import matplotlib.pyplot as plt
data = [np.random.rand(n,8) for n in [3,7,10,4]]
fig, axes = plt.subplots(nrows=len(data),
                         gridspec_kw=dict(height_ratios=[d.shape[0] for d in data]))
for ax, d in zip(axes, data):
    ax.imshow(d)
    ax.tick_params(labelbottom=False)
plt.show()
After reviewing this question (How do I tell Matplotlib to create a second (new) plot, then later plot on the old one?) I thought I had figured this out, but I think I'm running into an issue with my for loops. Here is a pared-down version of what I'm doing.
import math
import matplotlib.pyplot as plt
import numpy as np
for m in range(2):
    x = np.arange(5)
    y = np.exp(m*x)
    plt.figure(1)
    plt.plot(x, y)
    plt.show()
    ...
    z = np.sin(x + (m*math.pi))
    plt.figure(2)
    plt.plot(x, z)
    ...
plt.figure(2)
plt.show()
My hope was that this would display three plots: a plot for e^(0) vs x the first time through, a plot of e^x vs x the second time through, and then one plot with both sin(x) and sin(x+pi) vs x.
But instead I get the first two plots, a plot with just sin(x), and a plot with just sin(x+pi).
How do I get all the data I want on to figure 2? It seems to be some sort of issue with the set figure resetting when I return to the beginning of the loop.
This minimal change will probably do what you want (although it is not the best code).
Replace plt.figure(1) with plt.figure(). Remove any plt.show() from inside the loop.
The loop will end and then all 3 figures will be shown. The e^x curves will be in figures #1 and #3.
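A somewhat cleaner variant, offered as a sketch rather than the definitive fix, keeps explicit figure and axes handles instead of relying on figure numbers. The variable names (fig_sin, ax_sin, exp_figs) are illustrative and do not come from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(5)

# One shared figure for the sine curves, created once before the loop
fig_sin, ax_sin = plt.subplots()

exp_figs = []
for m in range(2):
    # A fresh figure for each exponential curve
    fig_exp, ax_exp = plt.subplots()
    ax_exp.plot(x, np.exp(m * x))
    ax_exp.set_title(f"exp({m}*x)")
    exp_figs.append(fig_exp)

    # Add this iteration's sine curve to the shared axes
    ax_sin.plot(x, np.sin(x + m * np.pi), label=f"sin(x + {m}*pi)")

ax_sin.legend()
```

Calling plt.show() once after the loop then displays all three figures, with both sine curves on the same axes.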
I am looping through a bunch of CSV files containing various measurements.
Each file might be from one of 4 different data sources.
In each file, I merge the data into monthly datasets, that I then plot in a 3x4 grid. After this plot has been saved, the loop moves on and does the same to the next file.
This part I have figured out; however, I would like to add a visual cue to the plots indicating which data source they come from. As far as I understand it (and have tried it),
plt.subplot(4,3,1)
plt.hist(Jan_Data,facecolor='Red')
plt.ylabel('value count')
plt.title('January')
does work; however, this way I would have to add facecolor='Red' by hand to each of the 12 subplots. Looping through the plots won't work in this situation, since I want the ylabel only on the leftmost plots and the xlabel only on the bottom row.
Setting facecolor at the beginning in
fig = plt.figure(figsize=(20,15),facecolor='Red')
does not work, since it only changes the background color of the 20 by 15 figure now, which subsequently gets ignored when I save it to a PNG, since it only gets set for screen output.
So is there just a simple setthecolorofallbars='Red' command for plt.hist(… or plt.savefig(… I am missing, or should I just copy n' paste it to all twelve months?
You can use mpl.rc("axes", prop_cycle=cycler(color=["red"])) to set the default color cycle for all your axes. (The axes.color_cycle option from older Matplotlib releases has since been replaced by axes.prop_cycle.)
In this little toy example, I use the with mpl.rc_context() block to limit the effects of mpl.rc to just the block. This way you don't spoil the default parameters for your whole session.
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler
np.random.seed(42)
# create some toy data
n, m = 2, 2
data = []
for i in range(n*m):
    data.append(np.random.rand(30))
# and do the plotting
with mpl.rc_context():
    mpl.rc("axes", prop_cycle=cycler(color=["red"]))
    fig, axes = plt.subplots(n, m, figsize=(8, 8))
    for ax, d in zip(axes.flat, data):
        ax.hist(d)
The problem with the x- and y-labels (when you use loops) can be solved by using plt.subplots, as you can access every axis separately.
import matplotlib.pyplot as plt
import numpy.random
# creating figure with 4 plots
fig,ax = plt.subplots(2,2)
# some data
data = numpy.random.randn(4,1000)
# some titles
title = ['Jan','Feb','Mar','April']
xlabel = ['xlabel1','xlabel2']
ylabel = ['ylabel1','ylabel2']
for i in range(ax.size):
    a = ax[i//2, i%2]
    a.hist(data[i], facecolor='r', bins=50)
    a.set_title(title[i])
# write the ylabels on all axes on the left-hand side
for j in range(ax.shape[0]):
    ax[j, 0].set_ylabel(ylabel[j])
# write the xlabels on all axes in the bottom row
for j in range(ax.shape[1]):
    ax[-1, j].set_xlabel(xlabel[j])
fig.tight_layout()
All features (like titles) which are not constant can be put into arrays and placed at the appropriate axis.
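Applied to the 12-month grid from the question, the same idea might look like the sketch below. The function name plot_monthly_histograms, the 'value' x label, and the random data are assumptions made for illustration. The bar color is passed in once, so switching data source means changing a single argument.

```python
import calendar
import numpy as np
import matplotlib.pyplot as plt

def plot_monthly_histograms(monthly_data, color, nrows=4, ncols=3):
    """Plot one histogram per month; label only the left column and bottom row."""
    fig, axes = plt.subplots(nrows, ncols, figsize=(12, 12))
    for i, (ax, values) in enumerate(zip(axes.flat, monthly_data)):
        row, col = divmod(i, ncols)   # grid position of the i-th subplot
        ax.hist(values, facecolor=color)
        ax.set_title(calendar.month_name[i + 1])
        if col == 0:                  # leftmost column
            ax.set_ylabel('value count')
        if row == nrows - 1:          # bottom row
            ax.set_xlabel('value')
    fig.tight_layout()
    return fig, axes

# Made-up data standing in for the twelve monthly datasets
rng = np.random.default_rng(42)
monthly_data = [rng.normal(size=100) for _ in range(12)]
fig, axes = plot_monthly_histograms(monthly_data, color='red')
```

In the per-file loop you could pick the color from a dictionary keyed by data source and then call fig.savefig for each file.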
I am trying to make a matplotlib figure that will have multiple horizontal boxplots stacked on one another. The documentation shows both how to make a single horizontal boxplot and how to make multiple vertically oriented plots in this section.
I tried using subplots as in the following code:
import numpy as np
import pylab as plt
totfigs = 5
plt.figure()
plt.hold = True
for i in np.arange(totfigs):
    x = np.random.random(50)
    plt.subplot(totfigs, 1, i+1)
    plt.boxplot(x, vert=0)
plt.show()
My output results in just a single horizontal boxplot though.
Any suggestions anyone?
Edit: Thanks to @joaquin, I fixed the plt.subplot call line. Now the subplot version works, but I would still like the boxplots all in one figure...
If I'm understanding you correctly, you just need to pass boxplot a list (or a 2d array) containing each array you want to plot.
import numpy as np
import pylab as plt
totfigs = 5
plt.figure()
boxes = []
for i in np.arange(totfigs):
    x = np.random.random(50)
    boxes.append(x)
# a list of arrays yields one box per array, all in the same axes
plt.boxplot(boxes, vert=0)
plt.show()
Try:
plt.subplot(totfigs, 1, i+1)  # n rows, 1 column
or
plt.subplot(1, totfigs, i+1)  # 1 row, n columns
from the docstring:
subplot(*args, **kwargs)
Create a subplot command, creating axes with::
subplot(numRows, numCols, plotNum)
where plotNum = 1 is the first plot number and increasing plotNums
fill rows first. max(plotNum) == numRows * numCols
If you want them all together, shift them conveniently. As an example with a constant shift:
for i in np.arange(totfigs):
    x = np.random.random(50)
    plt.boxplot(x + (i*2), vert=0)