Non overlapping error bars in line plot - python

I am using pandas and Matplotlib to create some plots. I want line plots with error bars on them. The code I am currently using looks like this:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
df_yerr = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
fig, ax = plt.subplots()
df.plot(yerr=df_yerr, ax=ax, fmt="o-", capsize=5)
ax.set_xscale("log")
plt.show()
With this code, I get 6 lines on a single plot (which is what I want). However, the error bars completely overlap, making the plot difficult to read.
Is there a way I could slightly shift the position of each point on the x-axis so that the error bars no longer overlap?
Here is a screenshot:

One way to achieve what you want is to plot the error bars 'by hand', but it is neither straightforward nor much better looking than your original. Basically, you let pandas produce the line plot, then iterate through the data frame columns and draw a pyplot errorbar plot for each of them such that the index is slightly shifted sideways (in your case, with the logarithmic scale on the x axis, the shift is a multiplication by a factor). In the error bar plots, the marker size is set to zero:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
colors = ['red','blue','green','yellow','purple','black']
df = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
df_yerr = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
fig, ax = plt.subplots()
df.plot(ax=ax, marker="o",color=colors)
index = df.index
rows = len(index)
columns = len(df.columns)
factor = 0.95
for column,color in zip(range(columns),colors):
    y = df.values[:,column]
    yerr = df_yerr.values[:,column]
    ax.errorbar(
        df.index*factor, y, yerr=yerr, markersize=0, capsize=5,color=color,
        zorder = 10,
    )
    factor *= 1.02
ax.set_xscale("log")
plt.show()
As I said, the result is not pretty:
UPDATE
In my opinion a bar plot would be much more informative:
fig2,ax2 = plt.subplots()
df.plot(kind='bar',yerr=df_yerr, ax=ax2)
plt.show()
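Coming back to the shifted error bars: on a logarithmic axis the shift has to be multiplicative, and it can help to centre the factors around 1 so that each group of points stays anchored on its true x value. The following is only a minimal sketch of that idea, not the code from the answer. The helper `log_offsets` and its `spread` parameter are hypothetical names introduced here, the data are random placeholders, and unlike the snippet above it shifts the markers together with their error bars.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def log_offsets(n_series, spread=0.1):
    """Return n_series multiplicative factors, evenly spaced in log10 and centred on 1.

    spread is the total width of the group in decades (hypothetical parameter).
    """
    return 10 ** np.linspace(-spread / 2, spread / 2, n_series)


# Placeholder data with the same shape as in the question
rng = np.random.default_rng(0)
index = [10, 100, 1000, 10000]
cols = list("ABCDEF")
df = pd.DataFrame(rng.random((4, 6)), index=index, columns=cols)
df_yerr = pd.DataFrame(0.2 * rng.random((4, 6)), index=index, columns=cols)

fig, ax = plt.subplots()
factors = log_offsets(len(cols))
for col, factor in zip(cols, factors):
    # Multiply the x positions so the shift looks constant on the log axis
    ax.errorbar(df.index.to_numpy() * factor, df[col], yerr=df_yerr[col],
                fmt="o-", capsize=5, label=col)
ax.set_xscale("log")
ax.legend()
```

Call `plt.show()` afterwards to display the figure. A larger `spread` separates the series more, at the cost of moving the outermost points further from their nominal x value.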

You can also make the overlap easier to read with alpha, for example:
df.plot(yerr=df_yerr, ax=ax, fmt="o-", capsize=5,alpha=0.5)
You can also check this link for reference

Related

How to label these points on the scatter plot

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_excel("path to the file")
fig, ax = plt.subplots()
fig.set_size_inches(7,3)
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
df.plot.scatter(x='Age',
                y='Pos',
                c='DarkBlue', xticks=([15,20,25,30,35,40]))
plt.show()
I got the plot, but I am not able to label these points.
Provided you'd like to label each point, you can loop over each coordinate plotted, assigning it a label using plt.text() at the plotted point's position, like so:
from matplotlib import pyplot as plt
y_points = [i for i in range(0, 20)]
x_points = [(i*3) for i in y_points]
offset = 5
plt.figure()
plt.grid(True)
plt.scatter(x_points, y_points)
for i in range(0, len(x_points)):
    plt.text(x_points[i] - offset, y_points[i], f'{x_points[i]}')
plt.show()
In the above example it will give the following:
The offset is just to make the labels more readable so that they're not right on top of the scattered points.
Obviously we don't have access to your spreadsheet, but the same basic concept would apply.
EDIT
For non numerical values, you can simply define the string as the coordinate. This can be done like so:
from matplotlib import pyplot as plt
y_strings = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
x_values = [i for i, string in enumerate(y_strings)]
# Plot coordinates:
plt.scatter(x_values, y_strings)
for i, string in enumerate(y_strings):
    plt.text(x_values[i], string, f'{x_values[i]}:{string}')
plt.grid(True)
plt.show()
Which will provide the following output:
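Putting both ideas together for a DataFrame shaped like the one in the question might look like the sketch below. The rows are made-up placeholders standing in for the spreadsheet, and `ax.annotate` with an offset in points is used here instead of `plt.text` so the nudge does not depend on the data scale; that swap is a choice made for this sketch, not part of the answer above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder rows standing in for the spreadsheet (assumed, not the asker's data)
df = pd.DataFrame({
    "Player": ["Ann", "Bob", "Cid", "Dee"],
    "Pos": ["GK", "DF", "MF", "FW"],
    "Age": [31, 24, 27, 19],
})

fig, ax = plt.subplots(figsize=(7, 3))
ax.scatter(df["Age"], df["Pos"], c="DarkBlue")
ax.set_xticks([15, 20, 25, 30, 35, 40])

# One label per row, nudged 5 points right and up so it does not cover the marker
for player, age, pos in zip(df["Player"], df["Age"], df["Pos"]):
    ax.annotate(player, xy=(age, pos), xytext=(5, 5), textcoords="offset points")

# Collect the label texts that were attached to the axes
labels = [t.get_text() for t in ax.texts]
```

Call `plt.show()` to display the result.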

Independent axis for each subplot in pandas boxplot

The code below produces subplots with uniquely colored boxes, but all subplots share a common set of x and y axes. I would like each subplot to have independent axes:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
df = pd.DataFrame(np.random.rand(140, 4), columns=['A', 'B', 'C', 'D'])
df['models'] = pd.Series(np.repeat(['model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'], 20))
bp_dict = df.boxplot(
    by="models",layout=(2,2),figsize=(6,4),
    return_type='both',
    patch_artist = True,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
for row_key, (ax,row) in bp_dict.items():  # .iteritems() on older pandas
    ax.set_xlabel('')
    for i,box in enumerate(row['boxes']):
        box.set_facecolor(colors[i])
plt.show()
Here is an output of the above code:
I am trying to have separate x and y axis for each subplot...
You need to create the figure and subplots beforehand and pass them in as an argument to df.boxplot(). This also means you can remove the argument layout=(2,2):
fig, axes = plt.subplots(2,2,sharex=False,sharey=False)
Then use:
bp_dict = df.boxplot(
    by="models", ax=axes, figsize=(6,4),
    return_type='both',
    patch_artist = True,
)
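A variation on the same idea, shown here only as a sketch, is to draw one column per pre-created Axes so that each subplot keeps its own limits. The differently scaled random columns are placeholder data chosen to make the independent y ranges visible.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data: four columns on deliberately different scales
rng = np.random.default_rng(0)
scales = {"A": 1, "B": 2, "C": 4, "D": 8}
df = pd.DataFrame({name: s * rng.random(140) for name, s in scales.items()})
df["models"] = np.repeat([f"model{i}" for i in range(1, 8)], 20)

# Axes created up front with sharing switched off
fig, axes = plt.subplots(2, 2, sharex=False, sharey=False, figsize=(6, 6))
for ax, column in zip(axes.flat, scales):
    # One grouped boxplot per Axes; each Axes autoscales on its own data
    df.boxplot(column=column, by="models", ax=ax, rot=45)
    ax.set_xlabel("")
fig.suptitle("")  # drop the automatic "Boxplot grouped by ..." title
fig.tight_layout()

ylims = [ax.get_ylim() for ax in axes.flat]
```

Because nothing is shared, the y range of the last panel is several times that of the first one; `plt.show()` displays the figure.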
You may set the ticklabels visible again, e.g. via
plt.setp(ax.get_xticklabels(), visible=True)
This does not make the axes independent, though; they are still bound to each other. It seems, however, that you are asking about the visibility rather than the shared behaviour here.
If you really think it is necessary to un-share the axes after the creation of the boxplot array, you can do this, but you have to do everything 'by hand'. After searching Stack Overflow for a while and looking at the matplotlib documentation pages, I came up with the following solution to un-share the yaxes of the Axes instances. For the xaxes, you would proceed analogously:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
from matplotlib.ticker import AutoLocator, AutoMinorLocator
##using differently scaled data for the different random series:
df = pd.DataFrame(
    np.asarray([
        np.random.rand(140),
        2*np.random.rand(140),
        4*np.random.rand(140),
        8*np.random.rand(140),
    ]).T,
    columns=['A', 'B', 'C', 'D']
)
df['models'] = pd.Series(np.repeat([
    'model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'
], 20))
##creating the boxplot array:
bp_dict = df.boxplot(
    by="models",layout = (2,2),figsize=(6,8),
    return_type='both',
    patch_artist = True,
    rot = 45,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
##adjusting the Axes instances to your needs
for row_key, (ax,row) in bp_dict.items():
    ax.set_xlabel('')
    ##removing shared axes:
    ##(worked on the Matplotlib versions of the time; newer releases
    ## return a read-only view that no longer offers remove())
    grouper = ax.get_shared_y_axes()
    shared_ys = [a for a in grouper]
    for ax_list in shared_ys:
        for ax2 in ax_list:
            grouper.remove(ax2)
    ##setting limits:
    ax.axis('auto')
    ax.relim() #<-- maybe not necessary
    ##adjusting tick positions:
    ax.yaxis.set_major_locator(AutoLocator())
    ax.yaxis.set_minor_locator(AutoMinorLocator())
    ##making tick labels visible:
    plt.setp(ax.get_yticklabels(), visible=True)
    for i,box in enumerate(row['boxes']):
        box.set_facecolor(colors[i])
plt.show()
The resulting plot looks like this:
Explanation:
You first need to tell each Axes instance that it shouldn't share its yaxis with any other Axes instance. This post pointed me in the right direction: Axes.get_shared_y_axes() returns a Grouper object that holds references to all other Axes instances with which the current Axes shares its yaxis. Looping through those instances and calling Grouper.remove does the actual un-sharing.
Once the yaxis is un-shared, the y limits and the y ticks need to be adjusted. The former can be achieved with ax.axis('auto') and ax.relim() (not sure if the second command is necessary). The ticks can be adjusted by using ax.yaxis.set_major_locator() and ax.yaxis.set_minor_locator() with the appropriate Locators. Finally, the tick labels can be made visible using plt.setp(ax.get_yticklabels(), visible=True) (see here).
Considering all this, @DavidG's answer is in my opinion the better approach.

Bar Chart with Line Chart - Using non numeric index

I'd like to show on the same graph a bar chart of a dataframe, and a line chart that represents the sum.
I can do that for a frame for which the index is numeric or text. But it doesn't work for a datetime index.
Here is the code I use:
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(1234)
data = np.random.randn(10, 2)
date = dt.datetime.today()
index_nums = range(10)
index_text = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k']
index_date = pd.date_range(date + dt.timedelta(days=-9), date)
a_nums = pd.DataFrame(columns=['a', 'b'], index=index_nums, data=data)
a_text = pd.DataFrame(columns=['a', 'b'], index=index_text, data=data)
a_date = pd.DataFrame(columns=['a', 'b'], index=index_date, data=data)
fig, ax = plt.subplots(3, 1)
ax = ax.ravel()
for i, a in enumerate([a_nums, a_text, a_date]):
    a.plot.bar(stacked=True, ax=ax[i])
    (a.sum(axis=1)).plot(c='k', ax=ax[i])
As you can see the last chart comes only as the line with the bar chart legend. And the dates are missing.
Also if I replace the last line with
ax[i].plot(a.sum(axis=1), c='k')
Then:
The chart with index_nums is the same
The chart with index_text raises an error
The chart with index_date shows the bar chart but not the line chart.
I'm using Python 3.6.2, pandas 0.20.3 and matplotlib 2.0.2.
Plotting a bar plot and a line plot to the same axes may often be problematic, because a bar plot puts the bars at integer positions (0,1,2,...N-1) while a line plot uses the actual index values to determine the x positions.
In the case from the question, using range(10) as index for both bar and line plot works fine, since those are exactly the numbers a bar plot would use anyways. Using text also works fine, since this needs to be replaced by numbers in order to show it and of course the first N integers are used for that.
The bar plot for a datetime index also uses the first N integers, while the line plot will plot on the dates. Hence depending on which one comes first, you only see the line or bar plot (you would actually see the other by changing the xlimits accordingly).
An easy solution is to plot the bar plot first and reset the index to a numeric one on the dataframe for the line plot.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(1234)
import datetime as dt
data = np.random.randn(10, 2)
date = dt.datetime.today()
index_date = pd.date_range(date + dt.timedelta(days=-9), date)
df = pd.DataFrame(columns=['a', 'b'], index=index_date, data=data)
fig, ax = plt.subplots(1, 1)
df.plot.bar(stacked=True, ax=ax)
# drop=True keeps only the summed values, on a 0..N-1 index
df.sum(axis=1).reset_index(drop=True).plot(ax=ax)
fig.autofmt_xdate()
plt.show()
Alternatively you can plot the lineplot as usual and use a matplotlib bar plot, which accepts numeric positions. See this answer: Python making combined bar and line plot with secondary y-axis
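A minimal sketch of that alternative is shown below: both the bars and the line are drawn with matplotlib on the real dates, so they share the same x positions. The fixed start date and the non-negative random data are assumptions made to keep the stacking simple (stacking via `bottom` as done here does not separate positive and negative values the way pandas does).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data: non-negative values on a daily datetime index
rng = np.random.default_rng(1234)
index_date = pd.date_range("2017-09-01", periods=10, freq="D")
df = pd.DataFrame(rng.random((10, 2)), columns=["a", "b"], index=index_date)

fig, ax = plt.subplots()
bottom = np.zeros(len(df))
for name in df.columns:
    # Bars are placed at the dates themselves; width is given in days
    ax.bar(df.index, df[name].to_numpy(), width=0.8, bottom=bottom, label=name)
    bottom += df[name].to_numpy()

# The line uses the same datetime x values, so it sits on top of the bars
total = df.sum(axis=1)
(line,) = ax.plot(df.index, total.to_numpy(), c="k", label="sum")
ax.legend()
fig.autofmt_xdate()
```

Call `plt.show()` to display the figure. After the loop, `bottom` holds the height of each stack, which is why the line passes through the top of every bar.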

Remove anti-aliasing for pandas plot.area

I want to plot stacked areas with Python, and found this pandas function:
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.area();
However, the result is weirdly antialiased, mixing together the colors, as shown on those 2 plots:
The same problem occurs in the example provided in the documentation.
Do you know how to remove this anti-aliasing? (Or another way to get a neat output for a stacked representation of line plots.)
Using a matplotlib stack plot works fine
fig, ax = plt.subplots()
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
ax.stackplot(df.index, df.values.T)
Since the area plot is a stackplot, the only difference would be the linewidth of the areas, which you can set to zero.
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.area(linewidth=0)
The remaining grayish lines are then indeed due to antialiasing. You may turn that off in the matplotlib plot
fig, ax = plt.subplots()
ax.stackplot(df.index, df.values.T, antialiased=False)
The result however, may not be visually appealing:
It looks like there are two boundaries.
Try a zero line width:
df.plot.area(lw=0);

Settings for timedata in seaborn FacetGrid plots

I want to plot monthly data and show the year label once per year.
Here is the data:
timedates = ['2013-01-01', '2013-02-01', '2013-03-01', '2013-04-01', '2013-05-01', '2013-06-01', '2013-07-01',
             '2013-08-01', '2013-09-01', '2013-10-01', '2013-11-01', '2013-12-01', '2014-01-01', '2014-02-01',
             '2014-03-01', '2014-04-01', '2014-05-01', '2014-06-01', '2014-07-01', '2014-08-01', '2014-09-01',
             '2014-10-01', '2014-11-01', '2014-12-01']
timedates = pd.to_datetime(timedates)
amount = [38870, 42501, 44855, 44504, 41194, 42087, 43687, 42347, 45098, 43783, 47275, 49767,
          39502, 35951, 47059, 47639, 44236, 40826, 46087, 41462, 38384, 41452, 36811, 37943]
types = ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C',
         'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
df_x = pd.DataFrame({'timedates': timedates, 'amount': amount, 'types': types})
I found out how to do that with matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as md
plt.style.use('ggplot')
fig, ax = plt.subplots()
ax.plot_date(df_x.timedates, df_x.amount, 'v-')
ax.xaxis.set_minor_locator(md.MonthLocator())
ax.xaxis.set_minor_formatter(md.DateFormatter('%m'))
ax.xaxis.grid(True, which="minor")
ax.yaxis.grid()
ax.xaxis.set_major_locator(md.YearLocator())
ax.xaxis.set_major_formatter(md.DateFormatter('\n\n%Y'))
plt.show()
Now I move to seaborn to take into account different types of data. Is it possible to have the same style of ticks using seaborn FacetGrid?
g = sns.FacetGrid(df_x, hue='types', size=8, aspect=1.5)
g.map(sns.pointplot, 'timedates', 'amount')
plt.show()
When I try to apply ticks formatting - they just disappear.
You could format the xticks to just include the month and year of the datetime object and get a pointplot with xticks corresponding to the position of scatter plot points.
df = df_x.copy()  # assuming df refers to the question's df_x
df['timedates'] = df['timedates'].map(lambda x: x.strftime('%Y-%m'))
def plot(x, y, data=None, label=None, **kwargs):
    sns.pointplot(x=x, y=y, data=data, label=label, **kwargs)
g = sns.FacetGrid(df, hue='types', height=8, aspect=1.5)  # 'size' in seaborn < 0.9
g.map_dataframe(plot, 'timedates', 'amount')
plt.show()
So far, I have done it manually: I separated the lines by type and plotted them together.
I changed this line:
ax.plot_date(df_x.timedates, df_x.amount, 'v-')
Into three plot-lines:
types_levels = df_x.types.unique()
for i in types_levels:
    ax.plot_date(df_x[df_x.types==i].timedates, df_x[df_x.types==i].amount, 'v-')
plt.legend(types_levels)
This is not really an answer, though, since I can't use the other advantages of seaborn's FacetGrid this way.
You can just use the same code you used for matplotlib!
for ax in g.axes.flat:
    # Paste in your own code!
    pass
