I'm plotting a CSV file from my simulation results. The figure has three graphs side by side, created with fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(24, 6)).
However, for comparison purposes I want the y-axis in all graphs to start at zero and end at a specific value. I tried the solution mentioned here from the Seaborn author. I don't get any errors, but the solution does not work for me either.
Here's my script:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
fname = 'results/filename.csv'
def plot_file():
    fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(24, 6))
    df = pd.read_csv(fname, sep='\t')
    profits = \
        df.groupby(['providerId', 'periods'], as_index=False)['profits'].sum()
    # y-axis needs to start at zero and end at 10
    g = sns.lineplot(x='periods',
                     y='profits',
                     data=profits,
                     hue='providerId',
                     legend='full',
                     ax=axes[0])
    # y-axis need to start at zero and end at one
    g = sns.scatterplot(x='periods',
                        y='price',
                        hue='providerId',
                        style='providerId',
                        data=df,
                        legend=False,
                        ax=axes[1])
    # y-axis need to start at zero and end at one
    g = sns.scatterplot(x='periods',
                        y='quality',
                        hue='providerId',
                        style='providerId',
                        data=df,
                        legend=False,
                        ax=axes[2])
    g.set(ylim=(0, None))
    plt.show()
    print(g) # -> AxesSubplot(0.672059,0.11;0.227941x0.77)
The resulting figure is as follows:
How can I adjust each individual plot?
Based on the way you've written your code, you can refer to each subplot's axes with g.axes and use g.axes.set_ylim(low, high). (A difference compared to the linked answer is that your graphs are not being plotted on a seaborn FacetGrid.)
An example using dummy data and different axis ranges to illustrate:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.uniform(0,10,(100,2)), columns=['a','b'])
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(8,4))
g = sns.lineplot(x='a',
y='b',
data=df.sample(10),
ax=axes[0])
g.axes.set_ylim(0,25)
g = sns.scatterplot(x='a',
y='b',
data=df.sample(10),
ax=axes[1])
g.axes.set_ylim(0,3.5)
g = sns.scatterplot(x='a',
y='b',
data=df.sample(10),
ax=axes[2])
g.axes.set_ylim(0,0.3)
plt.tight_layout()
plt.show()
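In the question's script, g is reassigned by each seaborn call, so the final g.set(ylim=(0, None)) only touches the last subplot. Below is a minimal sketch of setting the limits on every subplot directly through the axes array. It uses plain matplotlib and made-up data; the upper limits (10, 1, 1) are taken from the comments in the question, and the plotted values are placeholders, not the real profits, price and quality columns.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 4))

# Placeholder data standing in for profits, price and quality
upper_limits = [10, 1, 1]
for ax, top in zip(axes, upper_limits):
    ax.plot(np.arange(20), rng.uniform(0, top, 20))
    ax.set_ylim(0, top)  # fixes the y-range of this subplot only

# Read the limits back to confirm each subplot got its own range
limits = [tuple(float(v) for v in ax.get_ylim()) for ax in axes]
print(limits)
```

Call plt.show() afterwards to display the figure. The same ax.set_ylim call works when the subplot was drawn by seaborn with ax=axes[i].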
Let's use the classic example of weekly precipitation:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from random import randint
data = {
'Week': [i for i in range(1,9)],
'Weekly Precipitation': [randint(1,10) for i in range(1,9)]
}
df = pd.DataFrame(data)
Let's also add a column with the cumulative precipitation:
df['Cumulative'] = df['Weekly Precipitation'].expanding(min_periods=2).sum()
Now, let's say I want a single chart with a barplot for the weekly precipitation, and a lineplot with the cumulative precipitation. So I do this:
fig, ax1 = plt.subplots(figsize=(10,5))
sns.barplot(x='Week', y='Weekly Precipitation', data=df, ax=ax1)
ax2 = ax1.twinx()
sns.lineplot(x='Week', y='Cumulative', data=df, ax=ax2)
Which yields this plot:
And you can see the problem: while both series are commensurate, the two y-axes use different scales, which distorts the visualization, as the line should always be higher than the bars.
So, instead of twin axes, I'm trying to put both plots on the same axes:
fig, ax1 = plt.subplots(figsize=(10,5))
ax1.set_facecolor('white')
sns.barplot(x='Week', y='Weekly Precipitation', data=df, ax=ax1)
sns.lineplot(x='Week', y='Cumulative', data=df, ax=ax1)
ax1.set_ylabel('Precipitation')
Now, of course, the scale is right (although I have to make do with a single y label), but... the second plot is shifted to the right by one tick!
How does that even make sense?!
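A likely explanation, offered as a sketch rather than a confirmed diagnosis: sns.barplot treats Week as categorical and draws the bars at positions 0 to 7 (relabelling the ticks 1 to 8), whereas sns.lineplot places the line at the numeric values 1 to 8, hence the one-tick shift. Assuming that is the cause, drawing the line against the bar positions lines the two up. The example uses fixed illustrative values and cumsum() in place of the random data and expanding() from the question.

```python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

df = pd.DataFrame({
    'Week': list(range(1, 9)),
    'Weekly Precipitation': [3, 7, 1, 9, 4, 6, 2, 8],  # fixed illustrative values
})
df['Cumulative'] = df['Weekly Precipitation'].cumsum()

fig, ax1 = plt.subplots(figsize=(10, 5))
sns.barplot(x='Week', y='Weekly Precipitation', data=df, ax=ax1)

# Bars sit at categorical positions 0..n-1, so plot the line at those positions
positions = list(range(len(df)))
line, = ax1.plot(positions, df['Cumulative'], marker='o', color='black')
ax1.set_ylabel('Precipitation')

# Centres of the bars, to compare with the x-positions of the line
bar_centers = sorted(round(p.get_x() + p.get_width() / 2) for p in ax1.patches)
```

Call plt.show() to display the result; the tick labels still read 1 to 8 because they come from the bar plot.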
This one used to work fine, but somehow it stopped working (I must have changed something by mistake, but I can't find the issue).
I'm plotting a set of 3 bars per date, plus a line that shows the accumulated value of one of them. But only one or the other (either the bars or the line) is plotted properly. If I put the code for the bars last, only the bars are plotted. If I put the code for the line last, only the line is plotted.
fig, ax = plt.subplots(figsize = (15,8))
df.groupby("date")["result"].sum().cumsum().plot(
ax=ax,
marker='D',
lw=2,
color="purple")
df.groupby("date")[selected_columns].sum().plot(
ax=ax,
kind="bar",
color=["blue", "red", "gold"])
ax.legend(["LINE", "X", "Y", "Z"])
Appreciate the help!
Pandas draws bar plots with a categorical x-axis: the bars are internally positioned at 0, 1, 2, ... and the tick labels are set afterwards. The line plot uses the dates as its x-axis. To combine them, both need to be categorical. The easiest way is to drop the index from the line plot. Make sure that the line plot is drawn first, so that the bar plot can set the labels correctly.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'date': pd.date_range('20210101', periods=10),
'earnings': np.random.randint(100, 600, 10),
'costs': np.random.randint(0, 200, 10)})
df['result'] = df['earnings'] - df['costs']
fig, ax = plt.subplots(figsize=(15, 8))
df.groupby("date")["result"].sum().cumsum().reset_index(drop=True).plot(
ax=ax,
marker='D',
lw=2,
color="purple")
df.groupby("date")[['earnings', 'costs', 'result']].sum().plot(
ax=ax,
kind="bar",
rot=0,
width=0.8,
color=["blue", "red", "gold"])
ax.legend(['Cumul.result', 'earnings', 'costs', 'result'])
# shorten the tick labels to only the date
ax.set_xticklabels([tick.get_text()[:10] for tick in ax.get_xticklabels()])
ax.set_ylim(ymin=0) # bar plots are nicer when bars start at zero
plt.tight_layout()
plt.show()
Here I post the solution:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
a=[11.3,222,22, 63.8,9]
b=[0.12,-1.0,1.82,16.67,6.67]
l=[i for i in range(5)]
plt.rcParams['font.sans-serif']=['SimHei']
fmt='%.1f%%'
yticks = mtick.FormatStrFormatter(fmt)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.plot(l, b,'og-',label=u'A')
ax1.yaxis.set_major_formatter(yticks)
for i,(_x,_y) in enumerate(zip(l,b)):
    plt.text(_x,_y,b[i],color='black',fontsize=8,)
ax1.legend(loc=1)
ax1.set_ylim([-20, 30])
ax1.set_ylabel('ylabel')
plt.legend(prop={'family':'SimHei','size':8})
ax2 = ax1.twinx()
plt.bar(l,a,alpha=0.1,color='blue',label=u'label')
ax2.legend(loc=2)
plt.legend(prop={'family':'SimHei','size':8},loc="upper left")
plt.show()
The key to this is the command
ax2 = ax1.twinx()
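To see what twinx() does in isolation, here is a minimal sketch that reuses the values from the code above with simplified labels (the label strings are placeholders): the second axes shares the x-axis with the first but keeps its own, independent y-scale on the right-hand side.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
small = [0.12, -1.0, 1.82, 16.67, 6.67]   # plotted against the left y-axis
large = [11.3, 222, 22, 63.8, 9]          # plotted against the right y-axis

fig, ax1 = plt.subplots()
ax1.plot(x, small, 'og-', label='line (left axis)')
ax1.set_ylim(-20, 30)

ax2 = ax1.twinx()  # new axes on the same x-axis, y-axis drawn on the right
ax2.bar(x, large, alpha=0.1, color='blue', label='bars (right axis)')

ax1.legend(loc='upper right')
ax2.legend(loc='upper left')
```

Calling the plotting methods on ax1 and ax2 directly, instead of through plt, makes it explicit which y-axis each series belongs to.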
This is a follow-up to a question I asked here [1].
The code is as follows:
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import matplotlib.dates as md
fig, (ax1, ax2) = plt.subplots(2, 1)
df = web.DataReader('F', 'yahoo')
df2 = web.DataReader('Fb', 'yahoo')
ax = df.plot(figsize=(35,15), ax=ax1)
df2.plot(y = 'Close', figsize=(35,15), ax=ax2)
plt.xticks(fontsize = 25)
for ax in (ax1, ax2):
    ax.xaxis.set_major_locator(md.MonthLocator(bymonth = range(1, 13, 6)))
    ax.xaxis.set_major_formatter(md.DateFormatter('%b\n%Y'))
    ax.xaxis.set_minor_locator(md.MonthLocator())
    plt.setp(ax.xaxis.get_majorticklabels(), rotation = 0 )
plt.show()
This produces this plot:
How can I increase the size of the xticks in both subplots? As you can see, the size was increased for the bottom one only.
[1]: https://stackoverflow.com/questions/62358966/adding-minor-ticks-to-pandas-plot
You can use the tick_params function on the ax instance to control the size of the tick-labels on the x-axis. If you want to control the size of both x and y axis, use axis='both'. You can additionally specify which='major' or which='minor' or which='both' depending on if you want to change major, minor or both tick labels.
for ax in (ax1, ax2):
    # Rest of the code
    ax.tick_params(axis='x', which='both', labelsize=25)
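For context, plt.xticks(fontsize=25) acts on the current axes only, which here is presumably the last subplot created, and that would explain why only the bottom plot changed. Below is a self-contained sketch of the per-axes approach. It uses synthetic daily data in place of the Yahoo download, so the random-walk series and the Close column values are assumptions for illustration only.

```python
import matplotlib.dates as md
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded price data
dates = pd.date_range('2019-01-01', periods=500, freq='D')
rng = np.random.default_rng(0)
df = pd.DataFrame({'Close': 10 + rng.normal(size=500).cumsum()}, index=dates)
df2 = pd.DataFrame({'Close': 150 + rng.normal(size=500).cumsum()}, index=dates)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
ax1.plot(df.index, df['Close'])
ax2.plot(df2.index, df2['Close'])

for ax in (ax1, ax2):
    ax.xaxis.set_major_locator(md.MonthLocator(bymonth=range(1, 13, 6)))
    ax.xaxis.set_major_formatter(md.DateFormatter('%b\n%Y'))
    ax.xaxis.set_minor_locator(md.MonthLocator())
    # Applies to this axes regardless of which one is "current"
    ax.tick_params(axis='x', which='both', labelsize=25)

# Font size of the first major x tick label on each subplot
sizes = [ax.xaxis.get_majorticklabels()[0].get_fontsize() for ax in (ax1, ax2)]
```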
I have a log which describes my home ADSL speeds.
Log entries are in the following format, where the fields are datetime;level;downspeed;upspeed;loss;testhost:
2020-01-06 18:09:45;INFO;211.5;29.1;0;host:spd-pub-rm-01-01.fastwebnet.it
2020-01-06 18:14:39;WARNING;209.9;28.1;0;host:spd-pub-rm-01-01.fastwebnet.it
2020-01-08 10:51:27;INFO;211.6;29.4;0;host:spd-pub-rm-01-01.fastwebnet.it
(A full sample file can be downloaded from https://www.dropbox.com/s/tfmj9ozxe5millx/test.log?dl=0 for use with the code below.)
I wish to plot a matplotlib figure with the download speeds on the left axis, the upload speeds (which are on a smaller and lower range of values) on the right axis, and the shortened datetimes under the x tick marks, possibly at a 45 degree angle.
"""Plots the adsl-log generated log."""
import matplotlib.pyplot as plt
# import matplotlib.dates as mdates
import pandas as pd
# set field delimiter and set column names which will also cause reading from row 1
data = pd.read_csv("test.log", sep=';', names=[
'datetime', 'severity', 'down', 'up', 'loss', 'server'])
# we need to filter out ERROR records (with 0 speeds)
indexNames = data[data['severity'] == 'ERROR'].index
data.drop(indexNames, inplace=True)
# convert datetime pandas object to datetime64
data['datetime'] = pd.to_datetime(data['datetime'])
# use a dataframe with just the data I need; cleaner
speeds_df = data[['datetime', 'down', 'up']]
speeds_df.info() # this shows datetime column is really a datetime64 value now
# now let's plot
fig, ax = plt.subplots()
y1 = speeds_df.plot(ax=ax, x='datetime', y='down', grid=True, label="DL", legend=True, linewidth=2,ylim=(100,225))
y2 = speeds_df.plot(ax=ax, x='datetime', y='up', secondary_y=True, label="UL", legend=True, linewidth=2, ylim=(100,225))
plt.show()
I am now obtaining the plot I need but would appreciate some clarification about the roles of the ax, y1 and y2 axes in the above code.
First, assigning y1 and y2 objects is unnecessary as you will never use them later on. Also, legend=True is the default.
Per matplotlib.pyplot.subplots docs, the return of ax is:
ax : axes.Axes object or array of Axes objects
Per pandas.DataFrame.plot, the ax argument:
ax : matplotlib axes object, default None
Therefore, you first create the figure and its axes object (a single Axes here, since nrows and ncols both default to 1), and then pass that object to each pandas plot call with ax=ax. Both calls draw on the same axes; because the second one uses secondary_y=True, pandas adds a second y-axis on the right, so the two lines overlay in one plot:
# INITIALIZE FIG DIMENSION AND AXES OBJECTS
fig, axs = plt.subplots(figsize=(8,4))
# ASSIGN AXES OBJECTS ACCORDINGLY
speeds_df.plot(ax=axs, x='datetime', y='down', grid=True, label="DL", linewidth=2, ylim=(100,225))
speeds_df.plot(ax=axs, x='datetime', y='up', secondary_y=True, label="UL", linewidth=2, ylim=(100,225))
plt.show()
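To make the roles of ax, y1 and y2 concrete, here is a small sketch using the three sample log lines from the question. It relies on pandas behaviour as I understand it: a plain plot call returns the axes it was given, while a call with secondary_y=True returns a new right-hand axes that pandas links to the original through the left_ax and right_ax attributes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Values taken from the three sample log lines in the question
speeds_df = pd.DataFrame({
    'datetime': pd.to_datetime(['2020-01-06 18:09:45',
                                '2020-01-06 18:14:39',
                                '2020-01-08 10:51:27']),
    'down': [211.5, 209.9, 211.6],
    'up': [29.1, 28.1, 29.4],
})

fig, ax = plt.subplots()
y1 = speeds_df.plot(ax=ax, x='datetime', y='down', label='DL')
y2 = speeds_df.plot(ax=ax, x='datetime', y='up', secondary_y=True, label='UL')

print(y1 is ax)            # the first call hands back the axes it was given
print(y2 is ax)            # the second call returns a different axes object
print(y2 is ax.right_ax)   # namely the secondary axes stored on the primary one
```

So y1 is only needed if you did not keep ax, and y2 is worth keeping when you want to style the right-hand axis, for example with y2.set_ylabel or y2.set_ylim.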
To illustrate how axes objects can be extended, see below with multiple (non-overlaid) plots.
Example of multiple subplots using nrows=2:
# INITIALIZE FIG DIMENSION AND AXES OBJECTS
fig, axs = plt.subplots(nrows=2, figsize=(8,4))
# ASSIGN AXES OBJECTS WITH INDEXING AND NO Y LIMITS
speeds_df.plot(ax=axs[0], x='datetime', y='down', grid=True, label="DL", linewidth=2)
plt.subplots_adjust(hspace = 1)
speeds_df.plot(ax=axs[1], x='datetime', y='up', label="UL", linewidth=2)
plt.show()
Example of multiple plots using ncols=2:
# INITIALIZE FIG DIMENSION AND AXES OBJECTS
fig, axs = plt.subplots(ncols=2, figsize=(12,4))
# ASSIGN AXES OBJECTS WITH INDEXING AND NO Y LIMITS
speeds_df.plot(ax=axs[0], x='datetime', y='down', grid=True, label="DL", linewidth=2)
speeds_df.plot(ax=axs[1], x='datetime', y='up', label="UL", linewidth=2)
plt.show()
You can even use subplots=True after setting date/time field as index:
# INITIALIZE FIG DIMENSION AND AXES OBJECTS
fig, axs = plt.subplots(figsize=(8,4))
# ASSIGN AXES OBJECT PLOTTING ALL COLUMNS
speeds_df.set_index('datetime').plot(ax=axs, subplots=True, grid=True, label="DL", linewidth=2)
plt.show()
So thanks to @Parfait I hope I understood things correctly. Here is the working code:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
###### Prepare the data to plot
# set field delimiter and set column names which will also cause reading from row 1
data = pd.read_csv('test.log', sep=';', names=[
'datetime', 'severity', 'down', 'up', 'loss', 'server'])
# we need to filter out ERROR records (with 0 speeds)
indexNames = data[data['severity'] == 'ERROR'].index
data.drop(indexNames, inplace=True)
# convert datetime pandas object to datetime64
data['datetime'] = pd.to_datetime(data['datetime'])
# use a dataframe with just the data I need; cleaner
speeds_df = data[['datetime', 'down', 'up']]
# now plot the graph
fig, ax = plt.subplots()
color = 'tab:green'
ax.set_xlabel('thislabeldoesnotworkbutcolordoes', color=color)
ax.tick_params(axis='x', labelcolor=color)
color = 'tab:red'
speeds_df.plot(ax=ax, x='datetime', y='down', label="DL", legend=True, linewidth=2, color=color)
ax.set_ylabel('DL', color=color)
ax.tick_params(axis='y', labelcolor=color)
color = 'tab:blue'
ax2 = speeds_df.plot(ax=ax, x='datetime', y='up', secondary_y=True, label="UL", legend=True, linewidth=2, color=color)
ax2.set_ylabel('UL', color=color)
ax2.tick_params(axis='y', labelcolor=color)
# using ylim in the plot command params does not work the same
# cannot show a grid since the two scales are different
ax.set_ylim(10, 225)
ax2.set_ylim(15, 50)
plt.show()
Which gives:
What I still don't get is:
a) why the x-axis label only seems to honour the color but not the string value :(
b) why the ylim=(n,m) parameters in the df plot does not work well and I have to use the ax.set_ylim constructs instead
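On point a), a plausible explanation (an assumption based on how pandas labels the x-axis, not something confirmed in this thread): when plotting with x='datetime', pandas sets the x-axis label to the column name after drawing. That replaces any text set beforehand but leaves the colour of the label untouched. Setting the label after the plot call avoids this. A sketch with made-up data and a hypothetical label text:

```python
import matplotlib.pyplot as plt
import pandas as pd

speeds_df = pd.DataFrame({
    'datetime': pd.to_datetime(['2020-01-06', '2020-01-07', '2020-01-08']),
    'down': [211.5, 209.9, 211.6],
})

fig, ax = plt.subplots()
ax.set_xlabel('my label', color='tab:green')   # set before plotting
speeds_df.plot(ax=ax, x='datetime', y='down', label='DL')
label_after_plot = ax.get_xlabel()             # text as left by the pandas call

ax.set_xlabel('my label', color='tab:green')   # set again after plotting
label_final = ax.get_xlabel()

print(label_after_plot, '->', label_final)
```

Passing xlabel='my label' to the plot call is another option in recent pandas versions.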
I created a plot with two axes (via twinx), so one currently overlays the other. I would like a single legend that contains both labels, stacked. How can I do this?
d = data.groupby('atemp_rounded').sum().reset_index()
fig = plt.figure()
ax1 = fig.add_subplot(111) # don't know what 111 stands for...
ax2 = ax1.twinx()
d.plot(ax=ax1, y='casual')
d.plot(ax=ax2, y='registered', color='g')
plt.show()
You may set the legend of the individual plots off and instead create a figure legend. To have this placed within the axes boundaries the position needs to be specified in axes coordinates.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({"A" : [3,2,1], "B" : [2,2,1]})
fig = plt.figure()
ax1 = fig.add_subplot(111) # 111 = 1 row, 1 column, first subplot
ax2 = ax1.twinx()
df.plot(ax=ax1, y='A', legend=False)
df.plot(ax=ax2, y='B', color='g', legend=False)
fig.legend(loc="upper right", bbox_to_anchor=(0,0,1,1), bbox_transform=ax1.transAxes)
plt.show()
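An alternative to fig.legend, shown here as a sketch rather than as part of the answer above, is to collect the handles and labels from both axes and pass them to a single legend call on the axes that is drawn on top:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"A": [3, 2, 1], "B": [2, 2, 1]})

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
df.plot(ax=ax1, y='A', legend=False)
df.plot(ax=ax2, y='B', color='g', legend=False)

# Gather the legend entries of both axes and draw them in one stacked legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
legend = ax2.legend(handles1 + handles2, labels1 + labels2, loc='upper right')

legend_texts = [t.get_text() for t in legend.get_texts()]
```

This keeps the legend inside the axes without needing bbox_to_anchor, at the cost of a few extra lines.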