Remove anti-aliasing for pandas plot.area - python

I want to plot stacked areas with Python, and found this pandas function:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.area();
However, the result is weirdly antialiased, with the colors blending into each other, as shown in these two plots:
The same problem occurs in the example provided in the documentation.
Do you know how to remove this anti-aliasing? (Or another way to get a neat output for a stacked representation of line plots.)

Using a matplotlib stack plot works fine
fig, ax = plt.subplots()
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
ax.stackplot(df.index, df.values.T)
Since the area plot is a stackplot, the only difference would be the linewidth of the areas, which you can set to zero.
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.area(linewidth=0)
The remaining grayish lines are then indeed due to antialiasing. You can turn that off in the matplotlib plot:
fig, ax = plt.subplots()
ax.stackplot(df.index, df.values.T, antialiased=False)
The result, however, may not be visually appealing:

It looks like there are two boundaries.
Try a zero line width:
df.plot.area(lw=0);

Related

Hide non observed categories in a seaborn boxplot

I am currently working on a data analysis, and want to show some data distributions through seaborn boxplots.
I have a categorical variable, 'seg1', which can take 3 values in my dataset ('Z1', 'Z3', 'Z4'). However, the data in group 'Z4' is too exotic for me to report, and I would like to produce boxplots showing only categories 'Z1' and 'Z3'.
Filtering the data source of the plot did not work, as category 'Z4' is still shown, with no data points.
Is there any other solution than having to create a new CategoricalDtype with only ('Z1', 'Z3') and cast/project my data back on this new category?
I would simply like to hide 'Z4' category.
I am using seaborn 0.10.1 and matplotlib 3.3.1.
Thanks in advance for your answers.
My tries are below, and some data to reproduce.
Dummy data
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2]})
df.col1 = df.col1.astype(dummy_cat)
sns.boxplot(data=df, x='col1', y='col2')
Apply no filter
fig, axs = plt.subplots(figsize=(8, 25), nrows=len(indicators2), squeeze=False)
for j, indicator in enumerate(indicators2):
    sns.boxplot(data=orders, y=indicator, x='seg1', hue='origin2', ax=axs[j, 0], showfliers=False)
Which produces:
Filter data source
mask_filter = orders.seg1.isin(['Z1', 'Z3'])
fig, axs = plt.subplots(figsize=(8, 25), nrows=len(indicators2), squeeze=False)
for j, indicator in enumerate(indicators2):
    sns.boxplot(data=orders.loc[mask_filter], y=indicator, x='seg1', hue='origin2', ax=axs[j, 0], showfliers=False)
Which produces:
To cut off the last (or first) x-value, set_xlim() can be used, e.g. ax.set_xlim(-0.5, 1.5).
Another option is to work with seaborn's order= parameter and only add the desired values in that list. Optionally that can be created programmatically:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
dummy_cat = pd.CategoricalDtype(['a', 'b', 'c'])
df = pd.DataFrame({'col1': ['a', 'b', 'a', 'b'], 'col2': [12., 5., 3., 2]})
df.col1 = df.col1.astype(dummy_cat)
# keep only categories that actually occur (exact match rather than substring match)
order = [cat for cat in dummy_cat.categories if (df['col1'] == cat).any()]
sns.boxplot(data=df, x='col1', y='col2', order=order)
plt.show()
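As a variation on building the order list programmatically, pandas can report the observed categories directly through the `.cat` accessor. This is a minimal sketch using the dummy data from the question; it leaves the column's dtype untouched and only derives a list that could then be passed as `order=` to `sns.boxplot`.

```python
import pandas as pd

dummy_cat = pd.CategoricalDtype(["a", "b", "c"])
df = pd.DataFrame({"col1": ["a", "b", "a", "b"], "col2": [12.0, 5.0, 3.0, 2.0]})
df["col1"] = df["col1"].astype(dummy_cat)

# remove_unused_categories() returns a new Series; df["col1"] keeps all three categories
observed = df["col1"].cat.remove_unused_categories()
order = list(observed.cat.categories)

print(order)                           # ['a', 'b']
print(list(df["col1"].cat.categories)) # ['a', 'b', 'c'] (original dtype unchanged)
```

The category order defined in the original dtype is preserved in the resulting list.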

Modify plot axis so the order of its tick labels and their respective points change accordingly - without modifying the data itself

I want to reorder x-axis tick labels such that the data also changes appropriately.
Example
y = [5,8,9,10]
x = ['a', 'b', 'c', 'd']
plt.plot(y, x)
What I want the plot to look like by modifying the location of axis ticks.
Please note that I don't want to achieve this by modifying the order of my data
My Try
# attempt 1
fig, ax =plt.subplots()
plt.plot(y,x)
ax.set_xticklabels(['b', 'c', 'a', 'd'])
# this just overwrites the labels, not what we intended
# attempt2
fig, ax =plt.subplots()
plt.plot(y,x)
locs, labels = plt.xticks()
plt.xticks((1,2,0,3)); # This only sets which tick locations are displayed,
# irrespective of the order of the tuple.
Edit:
Based on comments here are some further clarifications.
Take the first point (a, 5) in fig. 1. If I change my x-axis definition so that a is now defined at the third position, then this should be reflected in the plot as well, meaning that 5 on the y-axis moves with a, as shown in fig. 2. One way to achieve this would be to re-order the data. However, I would like to see whether it is possible to achieve it by changing the axis locations. To summarize, the data should be plotted according to how we define our custom axis, without re-ordering the original data.
Edit2:
Based on the discussion in the comments it's not possible to do it by just modifying axis labels. Any approach would involve modifying the data. This was an oversimplification of the original problem I was facing. Finally, using dictionary-based labels in a pandas data frame helped me to sort the axis values in a specific order while also making sure that their respective values change accordingly.
Toggling between two different orders of the x-axis categories could look as follows:
import numpy as np
import matplotlib.pyplot as plt
x = ['a', 'b', 'c', 'd']
y = [5,8,9,10]
order1 = ['a', 'b', 'c', 'd']
order2 = ['b', 'c', 'a', 'd']
fig, ax = plt.subplots()
line, = ax.plot(x, y, marker="o")
def toggle(order):
    # index of each label's first occurrence in x (labels in sorted order)
    _, ind1 = np.unique(x, return_index=True)
    # position of each entry of `order` within the sorted labels
    _, inv2 = np.unique(order, return_inverse=True)
    y_new = np.array(y)[ind1][inv2]
    line.set_ydata(y_new)
    line.axes.set_xticks(range(len(order)))
    line.axes.set_xticklabels(order)
    fig.canvas.draw_idle()
curr = [0]
orders = [order1, order2]
def onclick(evt):
    # switch to the other order on every mouse click
    curr[0] = (curr[0] + 1) % 2
    toggle(orders[curr[0]])
fig.canvas.mpl_connect("button_press_event", onclick)
plt.show()
Click anywhere on the plot to toggle between order1 and order2.
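The reordering step itself can also be written without `np.unique`, along the lines of the dictionary-based approach mentioned in the second edit of the question. This is a minimal sketch; the helper name `reorder_values` is hypothetical, and it assumes that the labels are unique.

```python
def reorder_values(labels, values, order):
    """Return the values rearranged to follow `order`; the inputs are not modified."""
    value_of = dict(zip(labels, values))  # label -> value lookup
    return [value_of[label] for label in order]

x = ['a', 'b', 'c', 'd']
y = [5, 8, 9, 10]
order2 = ['b', 'c', 'a', 'd']

y_reordered = reorder_values(x, y, order2)
print(y_reordered)  # [8, 9, 5, 10]
```

The result matches what `toggle(order2)` computes above and could be plotted with `ax.plot(order2, y_reordered)`, while `x` and `y` stay in their original order.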

Independent axis for each subplot in pandas boxplot

The below code helps in obtaining subplots with unique colored boxes. But all subplots share a common set of x and y axis. I was looking forward to having independent axis for each sub-plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
df = pd.DataFrame(np.random.rand(140, 4), columns=['A', 'B', 'C', 'D'])
df['models'] = pd.Series(np.repeat(['model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'], 20))
bp_dict = df.boxplot(
    by="models", layout=(2, 2), figsize=(6, 4),
    return_type='both',
    patch_artist=True,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k']
# .items() replaces .iteritems(), which recent pandas versions no longer provide
for row_key, (ax, row) in bp_dict.items():
    ax.set_xlabel('')
    for i, box in enumerate(row['boxes']):
        box.set_facecolor(colors[i])
plt.show()
Here is an output of the above code:
I am trying to have separate x and y axis for each subplot...
You need to create the figure and subplots beforehand and pass them in as an argument to df.boxplot(). This also means you can remove the argument layout=(2,2):
fig, axes = plt.subplots(2,2,sharex=False,sharey=False)
Then use:
bp_dict = df.boxplot(
    by="models", ax=axes, figsize=(6, 4),
    return_type='both',
    patch_artist=True,
)
You may set the ticklabels visible again, e.g. via
plt.setp(ax.get_xticklabels(), visible=True)
This does not make the axes independent though; they are still bound to each other. But it seems like you are asking about the visibility rather than the shared behaviour here.
If you really need to un-share the axes after the boxplot array has been created, you can, but you have to do everything 'by hand'. After searching Stack Overflow and the matplotlib documentation for a while, I came up with the following solution to un-share the y-axes of the Axes instances; the x-axes would be handled analogously:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
from matplotlib.ticker import AutoLocator, AutoMinorLocator
##using differently scaled data for the different random series:
df = pd.DataFrame(
    np.asarray([
        np.random.rand(140),
        2*np.random.rand(140),
        4*np.random.rand(140),
        8*np.random.rand(140),
    ]).T,
    columns=['A', 'B', 'C', 'D']
)
df['models'] = pd.Series(np.repeat([
    'model1', 'model2', 'model3', 'model4', 'model5', 'model6', 'model7'
], 20))
##creating the boxplot array:
bp_dict = df.boxplot(
    by="models", layout=(2, 2), figsize=(6, 8),
    return_type='both',
    patch_artist=True,
    rot=45,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
##adjusting the Axes instances to your needs
for row_key, (ax, row) in bp_dict.items():
    ax.set_xlabel('')
    ##removing shared axes:
    ##(relies on Grouper.remove; newer matplotlib releases may only expose a read-only view here)
    grouper = ax.get_shared_y_axes()
    shared_ys = [a for a in grouper]
    for ax_list in shared_ys:
        for ax2 in ax_list:
            grouper.remove(ax2)
    ##setting limits:
    ax.axis('auto')
    ax.relim() #<-- maybe not necessary
    ##adjusting tick positions:
    ax.yaxis.set_major_locator(AutoLocator())
    ax.yaxis.set_minor_locator(AutoMinorLocator())
    ##making tick labels visible:
    plt.setp(ax.get_yticklabels(), visible=True)
    for i, box in enumerate(row['boxes']):
        box.set_facecolor(colors[i])
plt.show()
The resulting plot looks like this:
Explanation:
You first need to tell each Axes instance that it shouldn't share its yaxis with any other Axes instance. This post pointed me in the right direction -- Axes.get_shared_y_axes() returns a Grouper object that holds references to all other Axes instances with which the current Axes should share its yaxis. Looping through those instances and calling Grouper.remove does the actual un-sharing.
Once the yaxis is un-shared, the y limits and the y ticks need to be adjusted. The former can be achieved with ax.axis('auto') and ax.relim() (not sure if the second command is necessary). The ticks can be adjusted by using ax.yaxis.set_major_locator() and ax.yaxis.set_minor_locator() with the appropriate Locators. Finally, the tick labels can be made visible using plt.setp(ax.get_yticklabels(), visible=True) (see here).
Considering all this, @DavidG's answer is in my opinion the better approach.
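A variation on the "create the subplots beforehand" idea is to draw one grouped boxplot per column on its own, unshared Axes. This is a minimal sketch under stated assumptions: the seeded, differently scaled data and the Agg backend are illustrative, and the per-column loop is a swapped-in technique rather than the code from either answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Four columns on different scales (1x, 2x, 4x, 8x), 7 models with 20 rows each
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((140, 4)) * [1, 2, 4, 8], columns=["A", "B", "C", "D"])
df["models"] = np.repeat([f"model{i}" for i in range(1, 8)], 20)

# Axes that never share limits or ticks
fig, axes = plt.subplots(2, 2, sharex=False, sharey=False, figsize=(6, 8))

# One grouped boxplot per column, each drawn on its own Axes
for ax, column in zip(axes.ravel(), ["A", "B", "C", "D"]):
    df.boxplot(column=column, by="models", ax=ax, rot=45)
    ax.set_xlabel("")

fig.suptitle("")  # drop the automatic "Boxplot grouped by ..." title
fig.tight_layout()
```

Because each Axes is filled separately, every panel autoscales to its own data and keeps its own tick labels.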

Non overlapping error bars in line plot

I am using Pandas and Matplotlib to create some plots. I want line plots with error bars on them. The code I am using currently looks like this
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
df_yerr = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
fig, ax = plt.subplots()
df.plot(yerr=df_yerr, ax=ax, fmt="o-", capsize=5)
ax.set_xscale("log")
plt.show()
With this code, I get 6 lines on a single plot (which is what I want). However, the error bars completely overlap, making the plot difficult to read.
Is there a way I could slightly shift the position of each point on the x-axis so that the error bars no longer overlap?
Here is a screenshot:
One way to achieve what you want is to plot the error bars 'by hand', but it is neither straightforward nor much better looking than your original. Basically, you let pandas produce the line plot and then iterate through the data frame columns, drawing a pyplot errorbar plot for each of them such that the index is slightly shifted sideways (in your case, with the logarithmic scale on the x-axis, the shift is a multiplicative factor). In the error bar plots, the marker size is set to zero:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
colors = ['red','blue','green','yellow','purple','black']
df = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
df_yerr = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
fig, ax = plt.subplots()
df.plot(ax=ax, marker="o",color=colors)
index = df.index
rows = len(index)
columns = len(df.columns)
factor = 0.95
for column, color in zip(range(columns), colors):
    y = df.values[:, column]
    yerr = df_yerr.values[:, column]
    ax.errorbar(
        df.index*factor, y, yerr=yerr, markersize=0, capsize=5, color=color,
        zorder=10,
    )
    factor *= 1.02
ax.set_xscale("log")
plt.show()
As I said, the result is not pretty:
UPDATE
In my opinion a bar plot would be much more informative:
fig2,ax2 = plt.subplots()
df.plot(kind='bar',yerr=df_yerr, ax=ax2)
plt.show()
You can also reduce the clutter by making the lines and error bars semi-transparent with alpha, for example:
df.plot(yerr=df_yerr, ax=ax, fmt="o-", capsize=5,alpha=0.5)
You can also check this link for reference
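The multiplicative shift described in the first answer can also be centred around the true x positions, so that the series fan out symmetrically on the log axis. This is a minimal sketch; the `step` value, the seeded data and the Agg backend are assumptions made for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = [10, 100, 1000, 10000]
columns = list("ABCDEF")
df = pd.DataFrame(rng.random((4, 6)), index=index, columns=columns)
df_yerr = pd.DataFrame(0.2 * rng.random((4, 6)), index=index, columns=columns)

# On a log axis a constant visual offset is a constant factor.
# `step` is a hypothetical spacing; the exponents are centred on zero.
step = 1.03
n = len(columns)
factors = step ** (np.arange(n) - (n - 1) / 2)

fig, ax = plt.subplots()
x = df.index.to_numpy(dtype=float)
for column, factor in zip(columns, factors):
    # Line, markers and error bars are all drawn at the shifted positions
    ax.errorbar(x * factor, df[column], yerr=df_yerr[column],
                fmt="o-", capsize=5, label=column)
ax.set_xscale("log")
ax.legend()
```

Unlike the version above, the markers move together with their error bars here, which avoids bars that appear detached from their points.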

Bar Chart with Line Chart - Using non numeric index

I'd like to show on the same graph a bar chart of a dataframe, and a line chart that represents the sum.
I can do that for a frame for which the index is numeric or text. But it doesn't work for a datetime index.
Here is the code I use:
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(1234)
data = np.random.randn(10, 2)
date = dt.datetime.today()
index_nums = range(10)
index_text = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k']
index_date = pd.date_range(date + dt.timedelta(days=-9), date)
a_nums = pd.DataFrame(columns=['a', 'b'], index=index_nums, data=data)
a_text = pd.DataFrame(columns=['a', 'b'], index=index_text, data=data)
a_date = pd.DataFrame(columns=['a', 'b'], index=index_date, data=data)
fig, ax = plt.subplots(3, 1)
ax = ax.ravel()
for i, a in enumerate([a_nums, a_text, a_date]):
    a.plot.bar(stacked=True, ax=ax[i])
    (a.sum(axis=1)).plot(c='k', ax=ax[i])
As you can see the last chart comes only as the line with the bar chart legend. And the dates are missing.
Also if I replace the last line with
ax[i].plot(a.sum(axis=1), c='k')
Then:
The chart with index_nums is the same.
The chart with index_text raises an error.
The chart with index_date shows the bar chart but not the line chart.
I'm using Python 3.6.2, pandas 0.20.3 and matplotlib 2.0.2.
Plotting a bar plot and a line plot to the same axes may often be problematic, because a bar plot puts the bars at integer positions (0,1,2,...N-1) while a line plot uses the numeric data to determine the ordinates.
In the case from the question, using range(10) as index for both bar and line plot works fine, since those are exactly the numbers a bar plot would use anyways. Using text also works fine, since this needs to be replaced by numbers in order to show it and of course the first N integers are used for that.
The bar plot for a datetime index also uses the first N integers, while the line plot will plot on the dates. Hence depending on which one comes first, you only see the line or bar plot (you would actually see the other by changing the xlimits accordingly).
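The integer positions can be checked directly on the Axes after pandas has drawn the bars. This is a minimal sketch; the fixed start date, the seeded data and the Agg backend are assumptions made for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index_date = pd.date_range("2024-01-01", periods=10)
data = np.random.default_rng(1234).standard_normal((10, 2))
df = pd.DataFrame(data, index=index_date, columns=["a", "b"])

fig, ax = plt.subplots()
df.plot.bar(stacked=True, ax=ax)

# The tick locations are plain integers; the dates only appear as tick labels
bar_ticks = ax.get_xticks()
bar_xlim = ax.get_xlim()
print(bar_ticks)
print(bar_xlim)
```

A line plotted against the datetime index would sit at date values far outside these limits, which is why only one of the two plots is visible at a time.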
An easy solution is to plot the bar plot first and reset the index to a numeric one on the dataframe for the line plot.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(1234)
import datetime as dt
data = np.random.randn(10, 2)
date = dt.datetime.today()
index_date = pd.date_range(date + dt.timedelta(days=-9), date)
df = pd.DataFrame(columns=['a', 'b'], index=index_date, data=data)
fig, ax = plt.subplots(1, 1)
df.plot.bar(stacked=True, ax=ax)
# drop the datetime index so the line is drawn at integer positions 0..N-1, like the bars
df.sum(axis=1).reset_index(drop=True).plot(ax=ax)
fig.autofmt_xdate()
plt.show()
Alternatively you can plot the lineplot as usual and use a matplotlib bar plot, which accepts numeric positions. See this answer: Python making combined bar and line plot with secondary y-axis
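The matplotlib-only alternative could look roughly like the following minimal sketch, in which bars and line share explicit integer positions and the dates are applied as tick labels afterwards. The positive random data (which keeps the stacking logic to a running `bottom`), the fixed start date and the Agg backend are assumptions made for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index_date = pd.date_range("2024-01-01", periods=10)
data = np.random.default_rng(1234).random((10, 2))  # positive values only
df = pd.DataFrame(data, index=index_date, columns=["a", "b"])

fig, ax = plt.subplots()
positions = np.arange(len(df))  # shared integer x positions

# Stacked bars: each column starts where the previous one ended
bottom = np.zeros(len(df))
for column in df.columns:
    heights = df[column].to_numpy()
    ax.bar(positions, heights, bottom=bottom, label=column)
    bottom += heights

# Line with the row sums at the same positions
(line,) = ax.plot(positions, df.sum(axis=1).to_numpy(), color="k", marker="o", label="sum")

# Show the dates as tick labels
ax.set_xticks(positions)
ax.set_xticklabels(df.index.strftime("%Y-%m-%d"), rotation=45, ha="right")
ax.legend()
fig.tight_layout()
```

With negative values the bars would need separate running totals for the positive and negative parts, which this sketch leaves out.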
