I have a set of three variables for which I would like to draw boxplots. The three variables share the same title but have different x labels, and I want all three plotted in the same figure.
Consider the following example (with fake data):
import numpy as np
import matplotlib.pyplot as plt
data = dict(var1=np.random.normal(0, 1, 1000), var2=np.random.normal(0, 2, 1000), var3=np.random.normal(1, 2, 1000))
var_title = 'Really really long overlapping title'
fig = plt.figure()
for i in range(len(data)):
    plt.subplot(1, 3, i + 1)
    # list(...) is needed on Python 3, where dict.keys() is not indexable
    plt.boxplot(data[list(data.keys())[i]])
    plt.title(var_title)
plt.show()
This code generates the following figure:
Now, what I want is either to set a single title over the three subplots (since the title is the same) or to have matplotlib automatically resize the figure so that the title fits and can be read.
I am asking because these figures are created in a batch process: they are saved automatically and used to generate PDF documents, so I cannot adjust the dimensions of each figure by hand.
To create a single title for multiple subplots, you can use Figure.suptitle instead of plt.title. It accepts the usual text options, such as fontsize.
So your code will look like:
import numpy as np
import matplotlib.pyplot as plt
data = dict(var1=np.random.normal(0, 1, 1000),
var2=np.random.normal(0, 2, 1000), var3=np.random.normal(1, 2, 1000))
var_title = 'Really really long overlapping title'
fig = plt.figure()
fig.suptitle(var_title)
for i in range(len(data)):
    plt.subplot(1, 3, i + 1)
    # list(...) is needed on Python 3, where dict.keys() is not indexable
    plt.boxplot(data[list(data.keys())[i]])
plt.show()
Question is also answered here by orbeckst.
Based on the comment by @ImportanceOfBeingErnest, you can do this:
import numpy as np
import matplotlib.pyplot as plt
data = dict(var1=np.random.normal(0, 1, 1000), var2=np.random.normal(0, 2, 1000), var3=np.random.normal(1, 2, 1000))
var_title = 'Really really long overlapping title'
f, ax = plt.subplots(1, 3)
for i in range(len(data)):
    ax[i].boxplot(data[list(data.keys())[i]])
f.suptitle(var_title)
plt.show()
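For the batch use case described in the question, one possible variation is to let matplotlib reserve space for the shared title itself. The sketch below is only an illustration: it assumes a matplotlib version that supports constrained_layout, uses the Agg backend, and saves to an in-memory buffer instead of a real file. The variable names are taken from the question; everything else is an assumption.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for batch jobs
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = {
    "var1": rng.normal(0, 1, 1000),
    "var2": rng.normal(0, 2, 1000),
    "var3": rng.normal(1, 2, 1000),
}
var_title = "Really really long overlapping title"

# constrained_layout asks matplotlib to make room for the suptitle and
# the axis labels, so nothing has to be tuned by hand per figure.
fig, axes = plt.subplots(1, len(data), constrained_layout=True)
for ax, (name, values) in zip(axes, data.items()):
    ax.boxplot(values)
    ax.set_xlabel(name)  # each subplot keeps its own x label

title_artist = fig.suptitle(var_title)

# Saved to an in-memory buffer here; a batch job would pass a file name.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Iterating over data.items() also avoids indexing the dictionary keys by position.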
Related
I am following the statsmodels documentation here:
https://www.statsmodels.org/stable/vector_ar.html
I get to the part at the middle of the page that says:
irf.plot(orth=False)
which produces the following graph for my data:
I need to modify the elements of the graph. E.g., I need to apply tight_layout and also decrease the y-tick sizes so that they don't get into the graphs to their left.
The documentation talks about passing parameters to the "subplot plotting functions" through the subplot_params argument of irf.plot(). But when I try something like:
irf.plot(subplot_params = {'fontsize': 8, 'figsize' : (100, 100), 'tight_layout': True})
only the fontsize parameter works. I also tried passing these parameters to the plot_params argument, but to no avail.
So, my question is how can I access other parameters of this irf.plot, especially the figsize and ytick sizes? I also need to force it to print a grid, as well as all values on the x axis (1, 2, 3, 4, ..., 10)
Is there any way I can create a blank plot using the fig, ax = plt.subplots() way and then create the irf.plot on that figure?
It looks like the function returns a matplotlib Figure, so you can adjust it after the call.
Try doing this:
fig = irf.plot(orth=False,..)
fig.tight_layout()
fig.set_figheight(100)
fig.set_figwidth(100)
If I run it with this example, it works:
import numpy as np
import pandas
import statsmodels.api as sm
from statsmodels.tsa.api import VAR
mdata = sm.datasets.macrodata.load_pandas().data
dates = mdata[['year', 'quarter']].astype(int).astype(str)
quarterly = dates["year"] + "Q" + dates["quarter"]
from statsmodels.tsa.base.datetools import dates_from_str
quarterly = dates_from_str(quarterly)
mdata = mdata[['realgdp','realcons','realinv']]
mdata.index = pandas.DatetimeIndex(quarterly)
data = np.log(mdata).diff().dropna()
model = VAR(data)
results = model.fit(maxlags=15, ic='aic')
irf = results.irf(10)
fig = irf.plot(orth=False)
fig.tight_layout()
fig.set_figheight(30)
fig.set_figwidth(30)
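The question also asks about y-tick label sizes, a grid, and showing every x value. Since the plot comes back as a regular matplotlib Figure, one way to approach this is to loop over fig.axes. The sketch below is hedged: style_figure is a hypothetical helper, not part of statsmodels, and a plain 3x3 grid stands in for the figure returned by irf.plot(orth=False) so that the example runs without statsmodels.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def style_figure(fig, size=(12, 12), y_label_size=6, x_ticks=range(0, 11)):
    """Resize a figure and restyle every axes in it (hypothetical helper)."""
    fig.set_size_inches(*size)
    for ax in fig.axes:
        ax.tick_params(axis="y", labelsize=y_label_size)  # smaller y tick labels
        ax.set_xticks(list(x_ticks))                      # one tick per period
        ax.grid(True)
    fig.tight_layout()  # called last so it sees the final figure size
    return fig


# Stand-in for the figure returned by irf.plot(orth=False): a 3x3 grid.
demo_fig, demo_axes = plt.subplots(3, 3)
periods = np.arange(0, 11)
for ax in demo_axes.flat:
    ax.plot(periods, np.exp(-periods / 3.0))

style_figure(demo_fig)
```

With real data, the call would presumably be style_figure(irf.plot(orth=False)), with the tick range matching the number of periods passed to results.irf.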
I simply want to change the labels on the y-axis to show more numbers. For example, with a range of 0 to 40, it shows the numbers 0, 10, 20, 30, 40.
I want to see 0, 1, 2, 3, 4, ... 38, 39, 40.
I also want a grid (gridlines, or whatever they are called) to be shown.
My code looks like this, where I have a dataframe with train dataset names, classifier names and times.
I am creating a boxplot for each classifier showing times spent on all datasets by that classifier.
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
## agg backend is used to create plot as a .png file
mpl.use('agg')
# read dataset
data = pd.read_csv("classifier_times_sml.csv", sep=";")
# extract data
g = data.sort_values("time", ascending=False)[["classifier", "train", "time"]].groupby("classifier")
# Create a figure instance
fig = plt.figure(1, figsize=(20, 30))
# Create an axes instance
ax = fig.add_subplot(111)
labels = []
times = []
for group, group_df in g:
    # Collect the times and the label for this classifier
    times.append(np.asarray(group_df["time"]))
    labels.append(group)
# Create the boxplot
bp = ax.boxplot(times, showfliers=False)
ax.set_xticklabels(labels, rotation=90)
# Save the figure
fig.savefig('times_sml.png', bbox_inches='tight')
I have been searching thoroughly and didn't find any useful option for the boxplot. A grid option is not accepted by ax.boxplot(...). What am I doing wrong?
Use ax.set_yticks(np.arange(min, max + step, step)) or plt.yticks(np.arange(min, max + step, step)) to set the tick positions (np.arange excludes the stop value, hence max + step),
and ax.grid(True) to turn on the grid.
Are you looking for something like this?
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
data = np.random.randint(0, 40, size=40)
fig = plt.figure(1, figsize=(20, 30))
ax = fig.add_subplot(111)
ax.boxplot(data)
# np.arange excludes the stop value, so use 41 to include the tick at 40
ax.set_yticks(np.arange(0, 41, 1))
ax.grid(True)
plt.show()
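When the data range is not known in advance, an alternative to computing the tick positions by hand is a tick locator. The following is a minimal sketch, assuming made-up classifier names and timings in place of the CSV file from the question; MultipleLocator(1) places a major tick at every integer within the current axis limits.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator

rng = np.random.default_rng(1)
# Hypothetical timings per classifier (made-up names and values).
times_by_classifier = {
    "clf_a": rng.uniform(0, 40, 25),
    "clf_b": rng.uniform(5, 30, 25),
    "clf_c": rng.uniform(10, 20, 25),
}

fig, ax = plt.subplots(figsize=(6, 10))
ax.boxplot(list(times_by_classifier.values()), showfliers=False)
ax.set_xticklabels(list(times_by_classifier.keys()), rotation=90)

# One major tick per unit on the y axis, whatever the data range is.
ax.yaxis.set_major_locator(MultipleLocator(1))
# Horizontal gridlines only, drawn at the major ticks.
ax.grid(True, axis="y")

y_ticks = ax.get_yticks()
```

A tall figure helps here, since one tick per unit over a range of 40 needs vertical room for the labels.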
I'm trying to create a 2x2 grid of graphs in Python and am struggling with the axes. This is what I have so far; the axes on each subplot are messed up.
This is my code:
def plotCarBar(df):
    fig = plt.figure()
    j = 1
    for i in pandaDF.columns[15:18]:
        cat_count = df.groupby(i)[i].count().sort_values().plot(figsize=(12, 12), kind='line')
        ax = fig.add_subplot(2, 2, j)
        j += 1
    return ax.plot(lw=1.3)
plotCarBar(pandaDF)
Can someone please help? Thanks in advance!
I am not sure you need two counters (i and j). If you post some sample data, we may be able to make better sense of what your cat_count line is doing.
Generally, I would also recommend using matplotlib directly, unless you're really just doing some quick and dirty plotting in pandas.
So, something like this might work:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
randoms = np.random.rand(10, 4)  # generate some data: 10 rows, 4 columns
print(randoms)
fig = plt.figure()
for i in range(randoms.shape[1]):  # one subplot per column
    ax = fig.add_subplot(2, 2, i + 1)
    ax.plot(randoms[:, i])
plt.show()
Output:
[[0.78436298 0.85009767 0.28524816 0.28137471]
[0.58936976 0.00614068 0.25312449 0.58549765]
[0.24216048 0.13100618 0.76956316 0.66210005]
[0.95156085 0.86171181 0.40940887 0.47077143]
[0.91523306 0.33833055 0.74360696 0.2322519 ]
[0.68563804 0.69825892 0.5836696 0.97711073]
[0.62709986 0.44308186 0.24582971 0.97697002]
[0.04356271 0.01488111 0.73322443 0.04890864]
[0.9090653 0.25895051 0.73163902 0.83620635]
[0.51622846 0.6735348 0.20570992 0.13803589]]
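If you prefer to keep the pandas plotting call, the same layout can be sketched by creating the axes first and passing each one to pandas through the ax argument, so that every plot lands on its own subplot. The DataFrame below is a hypothetical stand-in for three categorical columns of the asker's data; the column names and values are made up.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical stand-in for three categorical columns of the real DataFrame.
cars = pd.DataFrame({
    "fuel": rng.choice(["petrol", "diesel", "electric"], 200),
    "body": rng.choice(["sedan", "suv", "hatchback", "coupe"], 200),
    "gearbox": rng.choice(["manual", "automatic"], 200),
})

# Create the 2x2 grid up front, then hand one axes to each pandas plot.
fig, axes = plt.subplots(2, 2, figsize=(12, 12))
for ax, column in zip(axes.flat, cars.columns):
    counts = cars[column].value_counts().sort_values()
    counts.plot(kind="line", ax=ax, lw=1.3)
    ax.set_title(column)

# Only three columns are plotted, so hide the unused fourth panel.
axes.flat[3].set_visible(False)
fig.tight_layout()
```

Zipping the axes with the column names removes the need for a separate subplot counter.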
Is it possible to create space between my axis labels? They are overlapping (30 labels crunched together). I am using Python with pandas:
genreplot.columns =['genres','pct']
genreplot = genreplot.set_index(['genres'])
genreplot.plot(kind='barh',width = 1)
I would post a picture, but I don't have 10 reputation.
I tried recreating your problem but not knowing what exactly your labels are, I can only give you general comments on this problem. There are a few things you can do to reduce the overlapping of labels, including their number, their font size, and their rotation.
Here is an example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# np.random.random_integers is deprecated; randint excludes the upper bound
genreplot = pd.DataFrame({
    'genres': np.random.randint(1, 11, 20),
    'pct': np.random.randint(1, 101, 20),
})
genreplot = genreplot.set_index(['genres'])
ax = genreplot.plot(kind='barh', width=1)
Now you can set where the tick labels go, for example one every 5 units, and rotate them:
pct_labels = np.arange(0, 100, 5)
ax.set_xticks(pct_labels)
ax.set_xticklabels(pct_labels, rotation=45)
For further reference, see the matplotlib documentation on xticks and yticks.
If your labels are quite long and you are specifying them from, e.g., a list, you could also consider adding line breaks:
labels = ['longgggggg_labelllllll_1',
'longgggggg_labelllllll_2']
new_labels = [label.replace('_', '\n') for label in labels]
new_labels
['longgggggg\nlabelllllll\n1', 'longgggggg\nlabelllllll\n2']
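For a horizontal bar chart the category labels sit on the y axis, so another option is to give each bar more vertical room by scaling the figure height with the number of bars and to shrink the label font. This is only a sketch under assumptions: the 30 genre names and percentages are made up, and inches_per_bar and the bar width of 0.8 are arbitrary starting values rather than anything from the question.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Hypothetical data: 30 genres with made-up percentages.
genreplot = pd.DataFrame({
    "genres": [f"genre_{i:02d}" for i in range(30)],
    "pct": rng.integers(1, 101, 30),
}).set_index("genres")

inches_per_bar = 0.3  # assumed spacing; tune to taste
fig_height = max(4.0, inches_per_bar * len(genreplot))

# A taller figure gives every bar (and its label) its own vertical space.
ax = genreplot.plot(kind="barh", width=0.8, figsize=(8, fig_height), legend=False)
ax.tick_params(axis="y", labelsize=8)  # smaller category labels
ax.figure.tight_layout()
```

Because the height is derived from the number of rows, the same code should keep labels apart whether there are 10 categories or 60.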