I have a dataframe (first few rows):
I can plot it with matplotlib.pyplot:
fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Price')
df1[['Close']].plot(ax=ax1)
To get:
What I would like to do is add a marker (a down triangle) to the plot at the index 2018-09-10 04:00:00, which is indicated by the value -1 in the positions column of the dataframe.
I tried to do this:
fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Price')
df1[['Close']].plot(ax=ax1)
ax1.plot(
    df1.loc[df1.positions == -1.0].index,
    df1.Close[df1.positions == -1.0],
    'v', markersize=5, color='k'
)
I get the plot like this:
So two things. One is that the index gets converted to something that shoots to year 2055, I don't understand why. Plus is there a way to add a marker at the specific position using just the first plot call? I tried to use markevery but with no success.
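If you want to combine pandas plots and matplotlib datetime plots, the pandas plot needs to be plotted in compatibility mode. Otherwise pandas may use its own date units for the x axis of a time-series plot, so the dates passed to ax1.plot() land in the wrong place: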
df1['Close'].plot(ax=ax1, x_compat=True)
That might give you the desired plot already.
If you don't want to use matplotlib, you can plot the filtered dataframe
df1['Close'].plot(ax=ax1)
df1['Close'][df1.positions == -1.0].plot(ax=ax1, marker="v", markersize=5, color='k')
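To make the two suggestions above easier to try, here is a minimal self-contained sketch. The dataframe is made up (hourly index, a Close column and a positions column, with -1.0 at 2018-09-10 04:00:00), so treat the numbers as placeholders. The second part shows one possible way to get the marker from a single plot call via matplotlib's markevery, which expects integer positions rather than index labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: 8 hourly rows, one "sell" signal at 04:00
idx = pd.date_range("2018-09-10 00:00", periods=8, freq=pd.Timedelta(hours=1))
df1 = pd.DataFrame({"Close": np.linspace(100.0, 107.0, 8), "positions": 0.0}, index=idx)
df1.loc[pd.Timestamp("2018-09-10 04:00:00"), "positions"] = -1.0

# Variant 1: pandas line in compatibility mode, marker added with matplotlib
fig, ax1 = plt.subplots()
ax1.set_ylabel("Price")
df1["Close"].plot(ax=ax1, x_compat=True)
sell = df1[df1["positions"] == -1.0]
(marker_line,) = ax1.plot(sell.index, sell["Close"], "v", markersize=5, color="k")

# Variant 2: a single plot call, marking only the rows where positions == -1
marked_positions = np.flatnonzero(df1["positions"].to_numpy() == -1.0).tolist()
fig2, ax2 = plt.subplots()
df1["Close"].plot(ax=ax2, marker="v", markevery=marked_positions, color="k")
```

In variant 2 the line and the marker share one color, because both belong to the same line object.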
I have an issue when plotting a categorical grouped boxplot by seaborn in Python, especially using 'hue'.
My raw data is as shown in the figure below. And I wanted to plot values in column 8 after categorized by column 1 and 4.
I used seaborn and my code is shown below:
ax = sns.boxplot(x=output[:,1], y=output[:,8], hue=output[:,4])
ax.set_xticklabels(ax.get_xticklabels(), rotation=90)
plt.legend([],[])
However, the generated plot always contains large blank area, as shown in the upper figure below. I tried to add 'dodge=False' in sns.boxplot according to a post here (https://stackoverflow.com/questions/53641287/off-center-x-axis-in-seaborn), but it gives the lower figure below.
Actually, what I want Python to plot is a boxplot like what I generated using JMP below.
It seems that if one of the 2nd categories is empty, seaborn will still leave the space on the generated figure for each 1st category, thus causes the observed off-set/blank area.
So I wonder if there is any way to solve this issue, like using other package in python?
Seaborn reserves a spot for each individual hue value, even when some of these values are missing. When many hue values are missing, this leads to annoying open spots. (When there would be only one box per x-value, dodge=False would solve the problem.)
A workaround is to generate a separate subplot for each individual x-label.
Reproducible example for default boxplot with missing hue values
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(20230206)
df = pd.DataFrame({'label': np.repeat(['label1', 'label2', 'label3', 'label4'], 250),
                   'cat': np.repeat(np.random.choice([*'abcdefghijklmnopqrst'], 40), 25),
                   'value': np.random.randn(1000).cumsum()})
df['cat'] = pd.Categorical(df['cat'], [*'abcdefghijklmnopqrst'])
sns.set_style('white')
plt.figure(figsize=(15, 5))
ax = sns.boxplot(df, x='label', y='value', hue='cat', palette='turbo')
sns.move_legend(ax, loc='upper left', bbox_to_anchor=(1, 1), ncol=2)
sns.despine()
plt.tight_layout()
plt.show()
Individual subplots per x value
A FacetGrid is generated with a subplot ("facet") for each x value
The original hue will be used as x-value for each subplot. To avoid empty spots, the hue should be of string type. When the hue would be pd.Categorical, seaborn would still reserve a spot for each of the categories.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value')
for label, ax in g.axes_dict.items():
    ax.set_title('')  # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label)  # use the label from the dataframe as xlabel
plt.tight_layout()
plt.show()
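The point about string type versus pd.Categorical can be checked with pandas alone. The sketch below uses a small made-up dataframe (the column names just mirror the example above): a categorical column keeps every declared category even when a subset contains only a few of them, while the string version only knows the values that are actually present.

```python
import pandas as pd

# Hypothetical data: five declared categories, but each label uses only some
all_cats = list("abcde")
df = pd.DataFrame({
    "label": ["label1"] * 3 + ["label2"] * 3,
    "cat": ["a", "b", "b", "d", "d", "e"],
})
df["cat"] = pd.Categorical(df["cat"], categories=all_cats)

subset = df[df["label"] == "label1"]

# The categorical dtype still carries all declared categories
declared = list(subset["cat"].cat.categories)

# After converting to strings, only the values that occur remain
present_as_str = sorted(subset["cat"].astype(str).unique())

# Dropping unused categories on the subset gives the same reduced set
trimmed = list(subset["cat"].cat.remove_unused_categories().cat.categories)
```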
Adding consistent coloring
A dictionary palette can color the boxes such that corresponding boxes in different subplots have the same color. hue= with the same column as the x= will do the coloring, and dodge=False will remove the empty spots.
df['cat'] = df['cat'].astype(str) # the column should be of string type, not pd.Categorical
cats = np.sort(df['cat'].unique())
palette_dict = {cat: color for cat, color in zip(cats, sns.color_palette('turbo', len(cats)))}
g = sns.FacetGrid(df, col='label', sharex=False)
g.map_dataframe(sns.boxplot, x='cat', y='value',
                hue='cat', dodge=False, palette=palette_dict)
for label, ax in g.axes_dict.items():
    ax.set_title('')  # remove the title generated by sns.FacetGrid
    ax.set_xlabel(label)  # use the label from the dataframe as xlabel
    # ax.tick_params(axis='x', labelrotation=90)  # optionally rotate the tick labels
plt.tight_layout()
plt.show()
I would like to plot two dfs with two different colors. For each df, I would need to add two markers. Here is what I have tried:
for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line')
    plt.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    plt.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()
Using this piece of code, I get the servers_df plotted with markers, but on separate graphs.
How can I have both graphs in a single one to compare them better?
Thanks.
TL;DR
Your call to data.servers_df.plot() always creates a new plot, and plt.plot() plots on the latest plot that was created. The solution is to create a dedicated axes object for everything to plot onto.
Preface
I assumed your variables are the following
data.servers_df: Dataframe with two float columns "time" and "percentage"
data.first_measurement: A dictionary with keys "time" and "percentage", each of which is a list of floats
data.second_measurement: A dictionary with keys "time" and "percentage", each of which is a list of floats
I skipped generating stat_files as you did not show what Graph() does, but just created a list of dummy data.
If data.first_measurement and data.second_measurement are also dataframes, let me know and there is an even nicer solution.
Theory - Behind the curtains
Each matplotlib plot (line, bar, etc.) lives on a matplotlib.axes.Axes element. These are like regular axes of a coordinate system. Now two things happen here:
When you use plt.plot(), there are no axes specified and thus matplotlib looks up the current axes element (in the background). If there is none, it will create an empty one, use it, and set it as default. The second call to plt.plot() then finds these axes and uses them.
DataFrame.plot(), on the other hand, always creates a new axes element if none is given to it (possible through the ax argument).
So in your code, data.servers_df.plot() first creates an axes element behind the curtains (which is then the default), and the two following plt.plot() calls get the default axes and plot onto it - which is why you get two plots instead of one.
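The behaviour described above can be observed directly. This is a minimal sketch with a made-up three-row dataframe: two DataFrame.plot() calls without ax open two figures, and a following plt.plot() lands on the axes created last.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

plt.close("all")  # start from a clean state
df = pd.DataFrame({"time": [0, 1, 2], "percentage": [10, 20, 30]})

ax_a = df.plot(x="time", y="percentage")   # no ax given: pandas opens a new figure
ax_b = df.plot(x="time", y="percentage")   # and another one
(marker_line,) = plt.plot([1], [20], "o")  # pyplot draws on the current (latest) axes

n_figures = len(plt.get_fignums())
```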
Solution
The following solution first creates a dedicated matplotlib.axes.Axes using plt.subplots(). This axes element is then used to draw all lines onto. Note especially the ax=ax in data.servers_df.plot(). Note also that I changed the display of your markers from o- to o (as we don't want to display a line (-) but only markers (o)).
Mock data can be found below
fig, ax = plt.subplots() # Here we create the axes that all data will plot onto
for i, data in enumerate(stat_files):
    y_column = f'percentage_{i}'  # Make the columns identifiable
    data.servers_df \
        .rename(columns={'percentage': y_column}) \
        .plot(x='time', y=y_column, linewidth=1, kind='line', ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o', color='green')
plt.show()
Mock data
import random
import pandas as pd
import matplotlib.pyplot as plt
# Generation of dummy data
random.seed(1)
NUMBER_OF_DATA_FILES = 2
X_LENGTH = 10
class Data:
    def __init__(self):
        self.servers_df = pd.DataFrame(
            {
                'time': range(X_LENGTH),
                'percentage': [random.randint(0, 10) for _ in range(X_LENGTH)]
            }
        )
        self.first_measurement = {
            'time': self.servers_df['time'].values[:X_LENGTH // 2],
            'percentage': self.servers_df['percentage'].values[:X_LENGTH // 2]
        }
        self.second_measurement = {
            'time': self.servers_df['time'].values[X_LENGTH // 2:],
            'percentage': self.servers_df['percentage'].values[X_LENGTH // 2:]
        }
stat_files = [Data() for _ in range(NUMBER_OF_DATA_FILES)]
DataFrame.plot() by default returns a matplotlib.axes.Axes object. You should then plot the other two plots on this object:
for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    ax = data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line')
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()
If you want to plot them one on top of the others with different colors you can do something like this:
colors = ['C0', 'C1', 'C2'] # matplotlib default color palette
# assuming that len(stats_files) = 3
# if not you need to specify as many colors as necessary
ax = plt.subplot(111)
for stats_file, c in zip(stats_files, colors):
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line', color=c, ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()
This changes only the color of the servers_df line. To change the color of the other two plots, apply the same logic: create a list of the colors they should take at each iteration, iterate over that list, and pass each value to the color parameter.
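As a rough illustration of that logic, here is a self-contained sketch with two made-up dataframes standing in for servers_df. The color lists and the choice of first and last row as the two markers are assumptions for the example, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the servers_df of each stats file
frames = [
    pd.DataFrame({"time": [0, 1, 2, 3], "percentage": [10, 20, 15, 30]}),
    pd.DataFrame({"time": [0, 1, 2, 3], "percentage": [5, 25, 20, 10]}),
]
line_colors = ["C0", "C1"]            # one line color per dataframe
marker_colors = ["orange", "green"]   # one marker color per dataframe

fig, ax = plt.subplots()
for i, (frame, line_color, marker_color) in enumerate(zip(frames, line_colors, marker_colors)):
    # The line for this dataframe, drawn on the shared axes
    frame.plot(x="time", y="percentage", linewidth=1, color=line_color,
               label=f"servers {i}", ax=ax)
    # Two markers per dataframe (here simply the first and last row)
    ax.plot(frame["time"].iloc[[0, -1]], frame["percentage"].iloc[[0, -1]],
            "o", color=marker_color)
```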
You can create an Axes object for plotting in the first place, for example
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df_one = pd.DataFrame({'a':np.linspace(1,10,10),'b':np.linspace(1,10,10)})
df_two = pd.DataFrame({'a':np.random.randint(0,20,10),'b':np.random.randint(0,5,10)})
dfs = [df_one,df_two]
fig,ax = plt.subplots(figsize=(8,6))
colors = ['navy','darkviolet']
markers = ['x','o']
for ind,item in enumerate(dfs):
    ax.plot(item['a'],item['b'],c=colors[ind],marker=markers[ind])
As you can see, the two dataframes are plotted on the same ax with different colors and markers.
You need to create the axes first. Afterwards, you can explicitly refer to them while plotting the graphs, using df.plot(..., ax=ax) or ax.plot(x, y):
import matplotlib.pyplot as plt
(fig, ax) = plt.subplots(figsize=(20,5))
for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line', ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()
I'm struggling to wrap my head around matplotlib with dataframes today. I see lots of solutions but I'm struggling to relate them to my needs. I think I may need to start over. Let's see what you think.
I have a dataframe (ephem) with 4 columns - Time, Date, Altitude & Azimuth.
I produce a scatter for alt & az using:
chart = plt.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
What's the most efficient way to set the values in the Time column as the labels/ticks on the x axis?
So:
the scale/gridlines etc all remain the same
the chart still plots alt and az
the y axis ticks/labels remain as is
only the x axis ticks/labels are changed to the Time column.
Thanks
This isn't by any means the cleanest piece of code but the following works for me:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
labels = list(ephem.Time)
ax.set_xticklabels(labels)
plt.show()
This explicitly sets the x tick labels to the values of the Time column of your dataframe.
In other words, you want to change the x-axis tick labels using a list of values.
labels = ephem.Time.tolist()
# make your plot and before calling plt.show()
# insert the following two lines
ax = plt.gca()
ax.set_xticklabels(labels = labels)
plt.show()
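One caveat with both snippets: set_xticklabels() only renames whatever ticks already exist, so the labels are not tied to the data points unless the tick positions are fixed first. A possible variant, sketched here with a made-up ephem dataframe, places one tick at each Azimuth value and then labels it with the matching Time. Note that this does move the ticks (and any gridlines) to the data positions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the ephem dataframe
ephem = pd.DataFrame({
    "Time": ["20:00", "21:00", "22:00", "23:00"],
    "Azimuth": [90.0, 120.0, 150.0, 180.0],
    "Altitude": [10.0, 25.0, 35.0, 40.0],
})

fig, ax = plt.subplots()
ax.scatter(ephem["Azimuth"], ephem["Altitude"], marker="x", color="black", s=8)

# Fix the tick positions at the data points, then label them with the times
ax.set_xticks(ephem["Azimuth"].tolist())
ax.set_xticklabels(ephem["Time"].tolist())

tick_positions = [float(t) for t in ax.get_xticks()]
tick_text = [t.get_text() for t in ax.get_xticklabels()]
```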
I am using df.plot.area() and am very confused by the result. The dataframe has integers as index. The values to plot are in different columns. One column contains zeros from a specific integer onwards, however I can still see a thin line in the plot which isn't right.
After data processing this is the code I am using to actually plot:
# Start plotting
df.plot(kind='area', stacked=True, color=colors)
plt.legend(loc='best')
plt.xlabel('Year', fontsize=12)
plt.ylabel(mylabel, fontsize=12)
# Reverse Legend
ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1])
plt.title(filename[:-4])
plt.tight_layout()
plt.autoscale(enable=True, axis='x', tight=True)
And this is a snapshot of the result. The thin orange line shouldn't be visible because the value in the dataframe is zero.
Thanks for your support!
I can clear the text of the xlabel in a Pandas plot with:
plt.xlabel("")
Instead, is it possible to hide the label?
Maybe something like .xaxis.label.set_visible(False).
From the Pandas docs -
The plot method on Series and DataFrame is just a simple wrapper around plt.plot():
This means that anything you can do with matplotlib, you can do with a Pandas DataFrame plot.
pyplot has an axis() function that lets you set axis properties. Calling plt.axis('off') before calling plt.show() will turn off both axes.
df.plot()
plt.axis('off')
plt.show()
plt.close()
To control a single axis, you need to set its properties via the plot's Axes. For the x axis - (pyplot.gca().get_xaxis().....)
df.plot()
ax1 = plt.gca()  # the current Axes; in recent matplotlib, plt.axes() would add a new one
x_axis = ax1.get_xaxis()
x_axis.set_visible(False)
plt.show()
plt.close()
Similarly to control an axis label, get the label and turn it off.
df.plot()
ax1 = plt.gca()  # the current Axes; in recent matplotlib, plt.axes() would add a new one
x_axis = ax1.get_xaxis()
x_axis.set_label_text('foo')
x_label = x_axis.get_label()
# print(isinstance(x_label, matplotlib.artist.Artist))
x_label.set_visible(False)
plt.show()
plt.close()
You can also get to the x axis like this
ax1 = plt.gca()
x_axis = ax1.xaxis
x_axis.set_label_text('foo')
x_axis.label.set_visible(False)
Or this
ax1 = plt.gca()
ax1.xaxis.set_label_text('foo')
ax1.xaxis.label.set_visible(False)
DataFrame.plot returns a matplotlib.axes.Axes (or a numpy.ndarray of them), so you can get it/them when you call it.
axs = df.plot()
.set_visible() is an Artist method. The axes and their labels are Artists, so they have Artist methods/attributes as well as their own. There are many ways to customize your plots. Sometimes you can find the feature you want by browsing the Gallery and Examples.
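Putting those pieces together, a compact version could look like the sketch below. The dataframe is made up; the idea is simply to keep the Axes returned by df.plot() and hide the x label through it, which leaves the label text in place but stops it from being drawn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"value": [1, 3, 2]}, index=[2019, 2020, 2021])

ax = df.plot()                       # keep the Axes that pandas returns
ax.set_xlabel("year")                # the label text still exists...
ax.xaxis.label.set_visible(False)    # ...but it is no longer drawn

label_text = ax.xaxis.label.get_text()
label_visible = ax.xaxis.label.get_visible()
```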
You can remove axis labels using the xlabel= or ylabel= arguments in the plot() call. For example, to remove the xlabel, use xlabel='':
df.plot(xlabel='');
To remove the x-axis ticks, use xticks=[] (for y-axis ticks, use yticks=[]):
df.plot(xticks=[]);
To remove both:
df.plot(xticks=[], xlabel='');