I am trying to plot multiple figures on a single pane using matplotlib.pyplot's subplot. Here is my current code.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({"col1": [1,2], "col2": [3,4], "col3": [5,6], "col4": [7,8], "target": [9,10]})
f, axs = plt.subplots(nrows = 2, ncols = 2, sharey = True)
# for ax in axs.flat:
# ax.label_outer()
for k, col in enumerate(df.columns):
    if col != "target":
        idx = np.unravel_index(k, (2,2))
        axs[idx].scatter(df[col], df.target)
        axs[idx].set_xlabel(col)
As it stands, with the two lines commented out, this prints all the xticks but only the xlabels for the bottom two plots.
If I uncomment those two lines, then all the xlabels appear, but the xticks on the top row disappear. I think this is because the space has been 'freed up' by the label_outer function.
I don't see how I can have both on the top row. If one prints out all the xlabels, then they are indeed all there.
Any help would be most appreciated!
You just need to call plt.tight_layout() after your loop. Refer to the tight layout guide in the matplotlib documentation to learn more about its options and capabilities.
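As a minimal sketch of that suggestion, reusing the DataFrame from the question: the only functional change is the final tight_layout() call, which asks matplotlib to re-space the subplots so that tick labels and axis labels do not overlap. The Agg backend line and the zip-based loop are assumptions made so the sketch runs without a display; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6],
                   "col4": [7, 8], "target": [9, 10]})

fig, axs = plt.subplots(nrows=2, ncols=2, sharey=True)

# Plot every feature column against the target, one subplot per column.
feature_cols = [col for col in df.columns if col != "target"]
for ax, col in zip(axs.flat, feature_cols):
    ax.scatter(df[col], df["target"])
    ax.set_xlabel(col)

# Re-space the subplots so the top-row xlabels are not covered
# by the bottom-row axes.
fig.tight_layout()
```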
Context: I'd like to plot multiple subplots (separated by legend) based on patterns in the column names of a dataframe. However, I'm not able to split each subplot into a further set of subplots.
This is what I have:
import matplotlib.pyplot as plt
col_patterns = ['pattern1','pattern2']
# define subplot grid
fig, axs = plt.subplots(nrows=len(col_patterns), ncols=1, figsize=(30, 80))
plt.subplots_adjust()
fig.suptitle("Title", fontsize=18, y=0.95)
for col_pat,ax in zip(col_patterns,axs.ravel()):
    col_pat_columns = [col for col in df.columns if col_pat in col]
    df[col_pat_columns].plot(x='Week',ax=ax)
    # chart formatting
    ax.set_title(col_pat.upper())
    ax.set_xlabel("")
Which results in something like this:
How could I make each of those subplots turn into another 6 subplots, all laid out horizontally? (i.e. each figure legend entry would be its own subplot)
Thank you!
In your example, you're defining a 2x1 subplot and only looping through two axes objects that get created. In each of the two loops, when you call df[col_pat_columns].plot(x='Week',ax=ax), since col_pat_columns is a list and you're passing it to df, you're just plotting multiple columns from your dataframe. That's why it's multiple series on a single plot.
@fdireito is correct: you just need to set the ncols argument of plt.subplots() to the number you need, but you'd also have to adjust your loops to accommodate it.
If you want to stay in matplotlib, then here's a basic example. I had to take some guesses as to how your dataframe was structured and so on.
# import matplotlib and pandas
import matplotlib.pyplot as plt
import pandas as pd
# create some fake data
x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a':[1, 1, 1, 1, 1], # horizontal line
    'b':[3, 6, 9, 6, 3], # pyramid
    'c':[4, 8, 12, 16, 20], # steep line
    'd':[1, 10, 3, 13, 5] # zig-zag
})
# a list of lists, where each inner list is a set of
# columns we want in the same row of subplots
col_patterns = [['a', 'b', 'c'], ['b', 'c', 'd']]
The following is a simplified example of what your code ends up doing.
fig, axes = plt.subplots(len(col_patterns), 1)
for pat, ax in zip(col_patterns, axes):
    ax.plot(x, df[pat])
2x1 subplot (what you have right now)
I use enumerate() with col_patterns to iterate through the subplot rows, and then use enumerate() with each column name in a given pattern to iterate through the subplot columns.
# the following will size your subplots according to
# - number of different column patterns you want matched (rows)
# - largest number of columns in a given column pattern (columns)
subplot_rows = len(col_patterns)
subplot_cols = max([len(x) for x in col_patterns])
fig, axes = plt.subplots(subplot_rows, subplot_cols)
for nrow, pat in enumerate(col_patterns):
    for ncol, col in enumerate(pat):
        axes[nrow][ncol].plot(x, df[col])
Correctly sized subplot
Here's all the code, with a couple additions I omitted from the code above for simplicity's sake.
import matplotlib.pyplot as plt
import pandas as pd
x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a':[1, 1, 1, 1, 1], # horizontal line
    'b':[3, 6, 9, 6, 3], # pyramid
    'c':[4, 8, 12, 16, 20], # steep line
    'd':[1, 10, 3, 13, 5] # zig-zag
})
col_patterns = [['a', 'b', 'c'], ['b', 'c', 'd']]
# what you have now
fig, axes = plt.subplots(len(col_patterns), 1, figsize=(12, 8))
for pat, ax in zip(col_patterns, axes):
    ax.plot(x, df[pat])
    ax.legend(pat, loc='upper left')
# what I think you want
subplot_rows = len(col_patterns)
subplot_cols = max([len(x) for x in col_patterns])
fig, axes = plt.subplots(subplot_rows, subplot_cols, figsize=(16, 8), sharex=True, sharey=True, tight_layout=True)
for nrow, pat in enumerate(col_patterns):
    for ncol, col in enumerate(pat):
        axes[nrow][ncol].plot(x, df[col], label=col)
        axes[nrow][ncol].legend(loc='upper left')
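One case the grid above does not cover: if the patterns have different lengths, the shorter rows are left with empty axes. A possible way to handle that, sketched here with a hypothetical uneven col_patterns list (not from the question), is to pass squeeze=False so that axes is always two-dimensional, and to hide the axes that have nothing to plot. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

x = [1, 2, 3, 4, 5]
df = pd.DataFrame({
    'a': [1, 1, 1, 1, 1],
    'b': [3, 6, 9, 6, 3],
    'c': [4, 8, 12, 16, 20],
    'd': [1, 10, 3, 13, 5],
})
# Hypothetical patterns of unequal length, for illustration only.
col_patterns = [['a', 'b', 'c'], ['d']]

n_rows = len(col_patterns)
n_cols = max(len(pat) for pat in col_patterns)

# squeeze=False keeps `axes` a 2-D array even for a single row or column.
fig, axes = plt.subplots(n_rows, n_cols, squeeze=False,
                         sharex=True, sharey=True)

for nrow, pat in enumerate(col_patterns):
    for ncol in range(n_cols):
        ax = axes[nrow][ncol]
        if ncol < len(pat):
            ax.plot(x, df[pat[ncol]], label=pat[ncol])
            ax.legend(loc='upper left')
        else:
            # No column for this grid cell: hide the empty axes.
            ax.set_visible(False)
```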
Another option you can consider is ditching matplotlib and using Seaborn relplots. There are several examples on that page that should help. If you have your dataframe set up correctly (long or "tidy" format), then to achieve the same as above, your one-liner would look something like this:
# import seaborn as sns
sns.relplot(data=df, kind='line', x=x_vals, y=y_vals, row=col_pattern, col=num_weeks_rolling)
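For reference, here is a rough sketch of how a wide dataframe might be reshaped into the long ("tidy") format that relplot expects. The column names (Week, pattern1_a, and so on) and the underscore naming convention are assumptions made for illustration, since the real dataframe is not shown in the question. The relplot call itself is left as a comment so that the sketch only needs pandas.

```python
import pandas as pd

# Hypothetical wide dataframe: one column per series, names are assumptions.
wide = pd.DataFrame({
    'Week': [1, 2, 3],
    'pattern1_a': [1, 2, 3],
    'pattern1_b': [3, 2, 1],
    'pattern2_a': [5, 5, 5],
})

# One row per (Week, series) pair instead of one column per series.
long_df = wide.melt(id_vars='Week', var_name='series', value_name='value')

# Recover the pattern from the column name (assumes a "<pattern>_<suffix>" scheme).
long_df['pattern'] = long_df['series'].str.split('_').str[0]

# With seaborn installed, the long frame could then be passed on, e.g.:
# sns.relplot(data=long_df, kind='line', x='Week', y='value',
#             row='pattern', col='series')
```

Each row/col combination then becomes its own facet, which is what the nested matplotlib loops above build by hand.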
I have some dataframes whose information I'd like to plot in the same area. The first dataframe uses hue and plots some bars, and all subsequent plots on the same axis should map to those xticks (they might not be in the same order). See this example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df1 = pd.DataFrame({ "col" : ["col_a", "col_a", "col_a", "col_c", "col_c", "col_b", "col_b"], "cluster": ["A", "B", "C", "A", "B", "A", "C"], "value_x":[2,4,1,5,6,2,1]})
df2 = pd.DataFrame({ "col" : ["col_a", "col_b", "col_c"], "value_y": [11,13,9]})
f, ax = plt.subplots(1, figsize=(15, 5))
# This will write the "master order" of the xticks
sns.barplot(x="col", y="value_x", hue="cluster", data=df1, ax=ax)
# Follow plots in the same plot should map to those xticks
ax = sns.lineplot(
    data=df2,
    x="col",
    y="value_y",
    ax=ax,
)
The second plot will not map correctly to the xticks. I was thinking of getting all the labels from the initial plot using "get_xticklabels" and using that as the master order to join all subsequent frames so that the order matches when I plot them, but I was hoping there might be a better solution.
Thank you!
What is happening is that sns.barplot plots the values of df1 in the order it encounters them: first "col_a", then "col_c", and finally "col_b". Then you plot the line, where it finds "col_a", "col_b" and "col_c".
All you need to do is to sort the df1 before plotting:
sns.barplot(x="col", y="value_x", hue="cluster", data=df1.sort_values(by=['col']), ax=ax)
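The effect of the sort can be checked without plotting, using only pandas and the two frames from the question. This is a small sketch of the category order before and after sorting. As an alternative that is not part of the answer above, sns.barplot also accepts an order argument, so the list computed here could be passed as order=order_sorted instead of sorting the frame.

```python
import pandas as pd

df1 = pd.DataFrame({
    "col": ["col_a", "col_a", "col_a", "col_c", "col_c", "col_b", "col_b"],
    "cluster": ["A", "B", "C", "A", "B", "A", "C"],
    "value_x": [2, 4, 1, 5, 6, 2, 1],
})
df2 = pd.DataFrame({"col": ["col_a", "col_b", "col_c"], "value_y": [11, 13, 9]})

# Order in which the categories first appear in df1.
order_unsorted = list(df1["col"].unique())

# After sorting, the categories appear in the same order as in df2.
df1_sorted = df1.sort_values(by=["col"])
order_sorted = list(df1_sorted["col"].unique())

print(order_unsorted)  # ['col_a', 'col_c', 'col_b']
print(order_sorted)    # ['col_a', 'col_b', 'col_c']
```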
I want to create efficient code in which I can pass a set of dataframe columns to a for-loop or list comprehension and it will return a set of subplots of the same type (one for each variable) depending on the type of matplotlib or seaborn plot I want to use. I'm looking for an approach that is relatively agnostic to the type of graph.
I've only tried to create code using matplotlib. Below, I provide a simple dataframe and the latest code I tried.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({"A": [1, 2,8,3,4,3], "B": [0, 2,4,8,3,2], "C": [0, 0,7,8,2,1]},
index =[1995,1996,1997,1998,1999,2000] )
df.index.name='Year'
fig, axs = plt.subplots(ncols=3,figsize=(8,4))
for yvar in df:
    ts = pd.Series(yvar, index = df.index)
    ts.plot(kind = 'line',ax=axs[i])
    plt.show()
I expect to see a subplot for each variable that is passed to the loop.
Is this what you are looking for?
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A": [1, 2,8,3,4,3], "B": [0, 2,4,8,3,2], "C": [0, 0,7,8,2,1]},
index =[1995,1996,1997,1998,1999,2000] )
plt.figure(figsize=(10,10))
for i, col in enumerate(df.columns):
    plt.subplot(1,3,i+1)
    plt.plot(df.index, df[col], label=col)
    plt.xticks(df.index)
    plt.legend(loc='upper left')
plt.show()
Use plt.subplot(no_of_rows, no_of_cols, current_subplot_number) to make a given subplot the current one. Any plotting done afterwards goes to the subplot numbered current_subplot_number.
Loop over both the columns and the axes simultaneously, and show the plot outside the loop.
fig, axs = plt.subplots(ncols=len(df.columns), figsize=(8,4))
for ax, yvar in zip(axs.flat, df):
    df[yvar].plot(ax=ax)
plt.show()
Alternatively, you can also directly plot the complete dataframe
fig, axs = plt.subplots(ncols=len(df.columns), figsize=(8,4))
df.plot(subplots=True, ax=axs)
plt.show()
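For context on why the loop in the question does not work, here is a short pandas-only check using the same dataframe. Iterating over a DataFrame yields its column labels rather than the column data, so pd.Series(yvar, index=df.index) builds a series that repeats the label string. In addition, the index i used in axs[i] is never defined in the question's code. The variable names below are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 8, 3, 4, 3], "B": [0, 2, 4, 8, 3, 2], "C": [0, 0, 7, 8, 2, 1]},
    index=[1995, 1996, 1997, 1998, 1999, 2000],
)

# Iterating over a DataFrame gives the column labels, not the values.
column_names = list(df)

# Passing a label to pd.Series just repeats that string for every index entry.
repeated_label = pd.Series("A", index=df.index)

# Indexing the frame with the label gives the actual data to plot.
actual_values = df["A"]

print(column_names)             # ['A', 'B', 'C']
print(repeated_label.tolist())  # ['A', 'A', 'A', 'A', 'A', 'A']
print(actual_values.tolist())   # [1, 2, 8, 3, 4, 3]
```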
I'm creating a lineplot from a dataframe with seaborn and I want to add a horizontal line to the plot. That works fine, but I am having trouble adding the horizontal line to the legend.
Here is a minimal, verifiable example:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x = np.array([2, 2, 4, 4])
y = np.array([5, 10, 10, 15])
isBool = np.array([True, False, True, False])
data = pd.DataFrame(np.column_stack((x, y, isBool)), columns=["x", "y", "someBoolean"])
print(data)
ax = sns.lineplot(x="x", y="y", hue="someBoolean", data=data)
plt.axhline(y=7, c='red', linestyle='dashed', label="horizontal")
plt.legend(("some name", "some other name", "horizontal"))
plt.show()
This results in the following plot:
The legends for "some name" and "some other name" show up correctly, but the "horizontal" legend is just blank. I tried simply using plt.legend() but then the legend consists of seemingly random values from the dataset.
Any ideas?
Simply using plt.legend() tells you what data is being plotted:
You are using someBoolean as the hue. So you are essentially creating two lines by applying a Boolean mask to your data. One line is for values that are False (shown as 0 on the legend above), the other for values that are True (shown as 1 on the legend above).
In order to get the legend you want you need to set the handles and the labels. You can get a list of them using ax.get_legend_handles_labels(). Then make sure to omit the first handle which, as shown above, has no artist:
ax = sns.lineplot(x="x", y="y", hue="someBoolean", data=data)
plt.axhline(y=7, c='red', linestyle='dashed', label="horizontal")
labels = ["some name", "some other name", "horizontal"]
handles, _ = ax.get_legend_handles_labels()
# Slice list to remove first handle
plt.legend(handles = handles[1:], labels = labels)
This gives a legend with entries for all three lines.
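Whether the first handle is an empty title entry may depend on the seaborn version, so slicing by position can be fragile. A variation that avoids relying on the position, sketched here in plain matplotlib as a stand-in for the seaborn plot, is to look up handles by their auto-generated labels and rename them. The labels "1" and "0" and the rename mapping are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Stand-ins for the two hue lines seaborn would draw (labels are assumptions).
ax.plot([2, 4], [5, 10], label="1")
ax.plot([2, 4], [10, 15], label="0")
ax.axhline(y=7, c="red", linestyle="dashed", label="horizontal")

handles, labels = ax.get_legend_handles_labels()

# Map auto-generated labels to display names; entries that are not in
# the mapping (for example a title-like entry) are dropped.
rename = {"1": "some name", "0": "some other name", "horizontal": "horizontal"}
kept = [(h, rename[l]) for h, l in zip(handles, labels) if l in rename]

legend = ax.legend(handles=[h for h, _ in kept],
                   labels=[name for _, name in kept])
legend_texts = [text.get_text() for text in legend.get_texts()]
```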
How do I bring the other line to the front or show both the graphs together?
plot_yield_df.plot(figsize=(20,20))
If the plotted data overlaps, one way to see both series is to increase the linewidth of one of them and make it partially transparent, as shown:
import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Original', linewidth=5, alpha=0.5)
plt.plot(np.arange(5), [5, 8, 6, 9, 4], label='Predicted')
plt.legend()
Subplots are another good option.
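A minimal sketch of the subplot approach, using the same toy values as above: each series gets its own axes, and sharey=True keeps the two panels on the same vertical scale so they stay comparable. The variable names and figure size are illustrative choices, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(5)
original = [5, 8, 6, 9, 4]
predicted = [5, 8, 6, 9, 4]

# One panel per series; a shared y-axis keeps the scales identical.
fig, (ax_left, ax_right) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))

ax_left.plot(x, original, label='Original')
ax_left.legend()

ax_right.plot(x, predicted, label='Predicted', color='tab:orange')
ax_right.legend()

fig.tight_layout()
```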
Problem
The lines are plotted in the order their columns appear in the dataframe. For example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
a = np.random.rand(400)*0.9
b = np.random.rand(400)+1
a = np.c_[a,-a].flatten()
b = np.c_[b,-b].flatten()
df = pd.DataFrame({"A" : a, "B" : b})
df.plot()
plt.show()
Here the values of "B" hide those from "A".
Solution 1: Reverse column order
A solution is to reverse their order
df[df.columns[::-1]].plot()
That has also changed the order in the legend and the color coding.
Solution 2: Reverse z-order
So if that is not desired, you can instead play with the zorder.
ax = df.plot()
lines = ax.get_lines()
for line, j in zip(lines, list(range(len(lines)))[::-1]):
    line.set_zorder(j)
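The same idea can be checked on a small deterministic frame. This sketch is a variation on the loop above, not the answer's exact code: it offsets the new values from 2, which is matplotlib's default zorder for lines, so the lines keep their position relative to other artists. The toy data and the zorders dictionary are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Small toy frame: "B" has the larger amplitude and would cover "A".
df = pd.DataFrame({"A": [0.5, -0.5, 0.5, -0.5],
                   "B": [1.5, -1.5, 1.5, -1.5]})

ax = df.plot()
lines = ax.get_lines()

# Give earlier columns a higher zorder so they are drawn on top,
# while legend order and colors stay unchanged.
base_zorder = 2  # default zorder of matplotlib lines
for offset, line in enumerate(reversed(lines)):
    line.set_zorder(base_zorder + offset)

zorders = {line.get_label(): line.get_zorder() for line in lines}
```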