I have several dataframes whose data I'd like to plot on the same axes. The first dataframe uses hue and is drawn as bars; every subsequent plot on the same axis should map to those xticks (the rows might not be in the same order). See this example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df1 = pd.DataFrame({ "col" : ["col_a", "col_a", "col_a", "col_c", "col_c", "col_b", "col_b"], "cluster": ["A", "B", "C", "A", "B", "A", "C"], "value_x":[2,4,1,5,6,2,1]})
df2 = pd.DataFrame({ "col" : ["col_a", "col_b", "col_c"], "value_y": [11,13,9]})
f, ax = plt.subplots(1, figsize=(15, 5))
# This will write the "master order" of the xticks
sns.barplot(x="col", y="value_x", hue="cluster", data=df1, ax=ax)
# Subsequent plots on the same axis should map to those xticks
ax = sns.lineplot(
    data=df2,
    x="col",
    y="value_y",
    ax=ax,
)
The line plot does not map correctly to the xticks. I was thinking of getting all the labels from the initial plot with "get_xticklabels" and using them as the master order to which all subsequent frames are joined, so that the order matches when I plot them, but I was hoping there might be a better solution.
Thank you!
What is happening is that sns.barplot places the categories in the order in which they appear in df1: first "col_a", then "col_c" and finally "col_b". The line is then plotted from df2, where the order is "col_a", "col_b", "col_c".
All you need to do is sort df1 before plotting:
sns.barplot(x="col", y="value_x", hue="cluster", data=df1.sort_values(by=['col']), ax=ax)
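Sorting works here because df2 already happens to be in sorted order. As a rough sketch of a more explicit alternative (not the only way to do it), you can pick one "master order", pass it to barplot through its order parameter, reorder the other frame to match, and draw the line against the integer bar positions. The names master_order and df2_aligned are made up for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df1 = pd.DataFrame({
    "col": ["col_a", "col_a", "col_a", "col_c", "col_c", "col_b", "col_b"],
    "cluster": ["A", "B", "C", "A", "B", "A", "C"],
    "value_x": [2, 4, 1, 5, 6, 2, 1],
})
df2 = pd.DataFrame({"col": ["col_a", "col_b", "col_c"], "value_y": [11, 13, 9]})

# Hypothetical "master order": the order in which the categories appear in df1.
master_order = df1["col"].drop_duplicates().tolist()

# Reorder df2 so that its rows follow the master order.
df2_aligned = df2.set_index("col").loc[master_order].reset_index()

fig, ax = plt.subplots(figsize=(15, 5))
# order= fixes which category sits at x position 0, 1, 2, ...
sns.barplot(x="col", y="value_x", hue="cluster", data=df1, order=master_order, ax=ax)
# Plot the line against the same integer positions rather than the labels.
ax.plot(range(len(master_order)), df2_aligned["value_y"], color="black", marker="o")
```

Any further frame can be reordered with the same .loc[master_order] step before plotting; call plt.show() to display the figure.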
I know there are other entries similar to this, but nothing exactly like this.
Suppose I have this dataframe:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({"percentage": [0.3, 0.4, 0.5, 0.2],
"xaxis": ["set1", "set1", "set2", "set2"],
"hues": ["a", "b", "c", "d"],
"number": [1,2,3,4]
})
and I create a grouped barplot in Seaborn:
sns.set(style="whitegrid")
fig, ax = plt.subplots(figsize=(10,10))
ax = sns.barplot(data=df,
                 x="xaxis",
                 y="percentage",
                 hue="hues")
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)
for container in ax.containers:
    ax.bar_label(container)
This nicely adds labels from the "percentage" column.
BUT
How do I label the barplots using the entries from the "number" column? For clarification, I chose the numbers 1,2,3,4 as a toy example. They are not consecutive in my real data.
For reference, I am using Python 3.9.X, Seaborn 0.11.2, and Matplotlib 3.5.0.
I suspect the answer lies somewhere in the container but do not know.
I have also seen potential answers that use this code:
for index, row in df.iterrows():
    ax.text(insert_codehere)
but that did not seem to work for me either.
Thanks in advance.
for container, number in zip(ax.containers, df.number):
    ax.bar_label(container, labels=[number, number])
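The loop above relies on the toy data having exactly one row per hue. A slightly more general sketch, assuming (as that loop does) that seaborn creates one container per hue level in hue order, is to look up the number for each bar from its hue and its x category. The names x_order, hue_order and lookup are made up for this example, and the x category is inferred from the bar's center position, which is an assumption about how the bars are laid out.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"percentage": [0.3, 0.4, 0.5, 0.2],
                   "xaxis": ["set1", "set1", "set2", "set2"],
                   "hues": ["a", "b", "c", "d"],
                   "number": [1, 2, 3, 4]})

# Fix both orders explicitly so bars can be mapped back to rows.
x_order = df["xaxis"].drop_duplicates().tolist()
hue_order = df["hues"].drop_duplicates().tolist()
# (hue, x category) -> value from the "number" column
lookup = {(h, x): n for h, x, n in zip(df["hues"], df["xaxis"], df["number"])}

fig, ax = plt.subplots(figsize=(10, 10))
sns.barplot(data=df, x="xaxis", y="percentage", hue="hues",
            order=x_order, hue_order=hue_order, ax=ax)

annotations = []
# Assumption: one container per hue level, in hue_order.
for container, hue in zip(ax.containers, hue_order):
    labels = []
    for bar in container:
        # The integer nearest to the bar center is the index of its x category.
        idx = int(round(bar.get_x() + bar.get_width() / 2))
        key = (hue, x_order[idx])
        # Bars without a matching row get an empty label.
        labels.append(str(lookup[key]) if key in lookup else "")
    annotations.extend(ax.bar_label(container, labels=labels))
```

Because the labels are looked up per bar, this should also cope with a hue that occurs in more than one x category.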
I am trying to plot multiple figures on a single pane using matplotlib.pyplot's subplot. Here is my current code.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({"col1": [1,2], "col2": [3,4], "col3": [5,6], "col4": [7,8], "target": [9,10]})
f, axs = plt.subplots(nrows = 2, ncols = 2, sharey = True)
# for ax in axs.flat:
#     ax.label_outer()
for k, col in enumerate(df.columns):
    if col != "target":
        idx = np.unravel_index(k, (2,2))
        axs[idx].scatter(df[col], df.target)
        axs[idx].set_xlabel(col)
As it stands, with the two lines commented out, this shows all the xticks but only the xlabels for the bottom two plots.
If I uncomment those two lines, then all the xlabels appear, but the xticks on the top row disappear. I think this is because the space has been 'freed up' by the label_outer function.
I don't see how I can have both on the top row. If I print out all the xlabels, they are indeed all there.
Any help would be most appreciated!
You just need to call plt.tight_layout() after your loop. Refer to the tight layout guide to learn more about its options and capabilities.
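For illustration, here is a minimal sketch of the same figure with the layout call added. It assumes the top-row xlabels are only hidden because the default spacing leaves no room for them; the features list and the zip over axs.flat are my own simplification of the indexing in the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6],
                   "col4": [7, 8], "target": [9, 10]})
# Every column except the target gets its own subplot.
features = [c for c in df.columns if c != "target"]

fig, axs = plt.subplots(nrows=2, ncols=2, sharey=True)
for ax, col in zip(axs.flat, features):
    ax.scatter(df[col], df["target"])
    ax.set_xlabel(col)

# Recompute the subplot spacing so tick labels and xlabels on the top row
# get room above the bottom row.
fig.tight_layout()
```

fig.tight_layout() is the figure-level equivalent of plt.tight_layout(); either one has to be called after the labels are set.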
I want to create efficient code in which I can pass a set of dataframe columns to a for-loop or list comprehension and it will return a set of subplots of the same type (one for each variable) depending on the type of matplotlib or seaborn plot I want to use. I'm looking for an approach that is relatively agnostic to the type of graph.
I've only tried to create code using matplotlib. Below, I provide a simple dataframe and the latest code I tried.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.DataFrame({"A": [1, 2,8,3,4,3], "B": [0, 2,4,8,3,2], "C": [0, 0,7,8,2,1]},
index =[1995,1996,1997,1998,1999,2000] )
df.index.name='Year'
fig, axs = plt.subplots(ncols=3,figsize=(8,4))
for yvar in df:
    ts = pd.Series(yvar, index = df.index)
    ts.plot(kind = 'line',ax=axs[i])
plt.show()
I expect to see a subplot for each variable that is passed to the loop.
Is this what you are looking for?
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A": [1, 2,8,3,4,3], "B": [0, 2,4,8,3,2], "C": [0, 0,7,8,2,1]},
index =[1995,1996,1997,1998,1999,2000] )
plt.figure(figsize=(10,10))
for i, col in enumerate(df.columns):
    plt.subplot(1,3,i+1)
    plt.plot(df.index, df[col], label=col)
    plt.xticks(df.index)
    plt.legend(loc='upper left')
plt.show()
Use plt.subplot(no_of_rows, no_of_cols, current_subplot_number) to make a subplot the current one. Any plotting done afterwards goes to the subplot numbered current_subplot_number.
Loop over both the columns and the axes simultaneously. Show the plot outside the loop.
fig, axs = plt.subplots(ncols=len(df.columns), figsize=(8,4))
for ax, yvar in zip(axs.flat, df):
    df[yvar].plot(ax=ax)
plt.show()
Alternatively, you can plot the complete dataframe directly:
fig, axs = plt.subplots(ncols=len(df.columns), figsize=(8,4))
df.plot(subplots=True, ax=axs)
plt.show()
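To stay relatively agnostic to the plot type, the loop can be wrapped in a small helper that forwards a kind argument to pandas. This is only a sketch: plot_columns is a made-up name, and it covers only the kinds that Series.plot supports for a single column (for example "line", "bar", "hist", "area").

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_columns(df, kind="line", figsize=(8, 4)):
    """Draw one subplot per column of df, all with the same plot kind."""
    # squeeze=False keeps axs two-dimensional even for a single column.
    fig, axs = plt.subplots(ncols=len(df.columns), figsize=figsize, squeeze=False)
    for ax, col in zip(axs.flat, df.columns):
        df[col].plot(kind=kind, ax=ax, title=col)
    fig.tight_layout()
    return fig, axs

df = pd.DataFrame({"A": [1, 2, 8, 3, 4, 3], "B": [0, 2, 4, 8, 3, 2], "C": [0, 0, 7, 8, 2, 1]},
                  index=[1995, 1996, 1997, 1998, 1999, 2000])
df.index.name = "Year"

# Same call for any supported kind, e.g. kind="line" or kind="area".
fig, axs = plot_columns(df, kind="bar")
```

For seaborn plots the same pattern should apply, as long as the seaborn function accepts an ax argument.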
I have a barplot that plots Rates by State and by Category (there are 5 categories) but the problem is that some States have more categories than other states.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"state" : ["AL","AL","AL","AK", ],
"status" : ["Booked", "Rejected","Cancelled","Rejected"],
"0" : [1.5,2.5,3.5,1.0]})
df2 = df.groupby(['state','status']).size()/df.groupby(['state']).size()
fig, ax = plt.subplots()
plt.xlabel('State')
plt.ylabel('Bookings')
my_colors = 'gyr'
df2.plot(kind='bar', color=my_colors, orientation='vertical')
plt.tight_layout()
plt.show()
This does most of what I need. However, because some states do not have all values for status, those values do not appear in the plot and the color coding becomes incorrect: the colors simply repeat in a fixed cycle (every 5 colors in my real data) rather than following the status, so they shift whenever a value is missing. What can I do about this?
Possibly you want to show the data in a grouped fashion, namely to have 3 categories per group, such that each category has its own color.
In this case it seems this can easily be achieved by unstacking the multi-index dataframe,
df2.unstack().plot(...)
Complete example:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"state" : ["AL","AL","AL","AK", ],
"status" : ["Booked", "Rejected","Cancelled","Rejected"],
"0" : [1.5,2.5,3.5,1.0]})
df2 = df.groupby(['state','status']).size()/df.groupby(['state']).size()
fig, ax = plt.subplots()
plt.xlabel('State')
plt.ylabel('Bookings')
my_colors = 'gyr'
df2.unstack().plot(kind='bar', color=my_colors, orientation='vertical', ax=ax)
plt.tight_layout()
plt.show()
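To see why unstacking fixes the colors, it helps to look at the table that gets plotted. The sketch below rebuilds the same ratios (using an explicit div on the state level, which I am assuming is equivalent to the division above) and prints the unstacked result.

```python
import pandas as pd

df = pd.DataFrame({"state": ["AL", "AL", "AL", "AK"],
                   "status": ["Booked", "Rejected", "Cancelled", "Rejected"]})

counts = df.groupby(["state", "status"]).size()   # rows per (state, status)
totals = df.groupby("state").size()               # rows per state
share = counts.div(totals, level="state")         # fraction of each status within its state

# One row per state, one column per status; missing combinations become NaN.
wide = share.unstack()
print(wide)
```

Every status is now its own column, so pandas assigns one color per column and a missing status just leaves a gap in that state's group instead of shifting the colors.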
I'm trying to create an interactive plotly graph from pandas dataframes.
However, I can't get the legends displayed correctly.
Here is a working example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.plotly as py  # plotly < 4; later versions moved this module to the chart_studio package
# sign into the plotly api
py.sign_in("***********", "***********")
# create some random dataframes
dates = pd.date_range('1/1/2000', periods=8)
df1 = pd.DataFrame(np.random.randn(8, 1), index=dates, columns=['A'])
df2 = pd.DataFrame(np.random.randn(8, 1), index=dates, columns=['B'])
df1.index.name = 'date'
df2.index.name = 'date'
Now I attempt to plot the dataframes using plotly.
fig, ax = plt.subplots(1,1)
df1.plot(y='A', ax=ax)
df2.plot(y='B', ax=ax)
py.iplot_mpl(fig, filename='random')
Notice that there is no legend.
Edit:
Based on suggestions below I have added an update dict. Although this does display the legend, it messes up the plot itself:
fig, ax = plt.subplots(1,1)
df1.plot(y='A', ax=ax)
df2.plot(y='B', ax=ax)
update = dict(
    layout=dict(
        annotations=[dict(text=' ')], # rm erroneous 'A', 'B', ... annotations
        showlegend=True # show legend
    )
)
py.iplot_mpl(fig, update=update, filename='random')
Edit 2:
Removing the annotations entry from the layout dict results in the plot being displayed correctly, but the legend text is not the y column name; it is the x column name, i.e. the index name of the dataframe.
fig, ax = plt.subplots(1,1)
df1.plot(y='A', ax=ax)
df2.plot(y='B', ax=ax)
update = dict(
    layout=dict(
        showlegend=True # show legend
    )
)
py.iplot_mpl(fig, update=update, filename='random')
This results in the following plot:
Edit 3:
I have found a way to override the legend text, but it seems a bit clunky. Given that I've specified the dataframe column I want to plot:
df1.plot(y='A', ax=ax)
I would have expected that y='A' would result in 'A' being used as the legend label.
It seems this is not the case, and while it is possible to override using the index label, as seen below, it just feels wrong.
Is there a better way to achieve this result?
update = dict(
    layout=dict(
        showlegend=True,
    ),
    data=[
        dict(name='A'),
        dict(name='B'),
    ]
)
py.iplot_mpl(fig, update=update, filename='random')
Legends don't convert well from matplotlib to plotly.
Fortunately, adding a plotly legend to a matplotlib plot is straightforward:
update = dict(
    layout=dict(
        showlegend=True # show legend
    )
)
py.iplot_mpl(fig, update=update)
See the full working ipython notebook here.
For more information, refer to the plotly user guide.