How to prevent outliers from being plotted in a pandas boxplot

I have a DataFrame (called result_df) and want to plot one column with a boxplot.
But certain outliers spoil the visualization. How can I prevent the outliers from being plotted?
Code I used:
import matplotlib.pyplot as pl  # assumed: pl is the pyplot alias used in the question
fig, ax = pl.subplots()
fig.set_size_inches(18.5, 10.5)
result_df.boxplot(ax=ax)
pl.show()

Important: I didn't pay enough attention at first (apparently that happens a lot) and missed that the question is pandas-specific. However, from other questions I have seen, pandas uses matplotlib for plotting in the background, so this should still work. Sorry I wasn't more careful.
Luckily for you, there is such an option. In the manual, under the "results: dict" heading towards the bottom of the page, it states:
fliers: points representing data that extend beyond the whiskers
(outliers).
Setting showfliers=False should help you.
I do have to mention, though, that I find it really strange that they shortened outliers to fliers. If that doesn't help, the manual offers a second solution:
sym : str or None, default = None
The default symbol for flier points. Enter an empty string (‘’) if you don’t want to show fliers. If None, then the fliers default to
‘b+’. If you want more control use the flierprops kwarg.
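A minimal sketch of the showfliers=False suggestion applied through pandas. The data here is made up as a stand-in for result_df, and the Agg backend is chosen only so the snippet runs without a display; return_type="dict" is used just to make the drawn artists easy to inspect.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for result_df: one column with a few extreme values.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(0, 1, 200), [25.0, 30.0, -40.0]])
result_df = pd.DataFrame({"value": values})

fig, ax = plt.subplots()
fig.set_size_inches(18.5, 10.5)

# pandas forwards extra keyword arguments to matplotlib's boxplot,
# so showfliers=False suppresses the points beyond the whiskers.
artists = result_df.boxplot(ax=ax, showfliers=False, return_type="dict")
```

With the outliers hidden, the y-axis is scaled to the whiskers rather than to the extreme values, which is usually what makes the box readable again.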

Related

Is there a way to set the vlines below/under the candlesticks on plot? (mplfinance)

I am currently using vlines to shadow in afterhours/premarket trading. However, the candlesticks are under the vlines and as such they become harder to read if I increase the alpha kwarg. Is there a way to put the vlines under the candlesticks as opposed to above them?
Using the mplfinance lib.
PS: the vlines are also ever so slightly off the top of the plot. Is there a way for them to go all the way up to the top edge?
fig, axlist = mpf.plot(
    df,
    type='candle',
    volume=True,
    ylabel='',
    ylabel_lower='\n<thousands>',
    returnfig=True,
    style=style,
    figratio=(21, 9),
    warn_too_much_data=970,
    addplot=vwap,
    hlines=dict(hlines=previousCloseLine(), colors='white', linestyle='dashed'),
    vlines=dict(vlines=df.between_time('16:00', '09:30').index.tolist(), colors='#323538', linewidths=1, alpha=0.5),
    datetime_format='%H:%M',
    tight_layout=True
)

Regarding your first question (vlines below candlesticks), it sounds like you are asking about zorder. If so, mplfinance presently does not give you direct access to zorder. There is an enhancement request for it that unfortunately got pushed onto the back burner in favor of some other enhancements.
There may be a work-around using returnfig=True but I'm not sure yet if that can work (would have to run some experiments to see if zorder can be modified this way).
In the meantime, given what you are trying to do (shade an area of the plot) you might try mplfinance's fill_between feature. Don't know if that will have the same zorder issue or not, but it is worth a try.
Regarding your second question (about vlines not quite reaching the top of the plot), not sure what's causing that, or whether it is inherent to mplfinance and/or to matplotlib, possibly only in response to certain data.
One possible workaround, in the meantime, may be to set the ylim=(ymin,ymax) kwarg to set ymin and ymax manually (instead of letting mplfinance determine them automatically). It may be worth adding an issue to the mplfinance page, including all the data and code necessary to reproduce the issue. It doesn't seem to me that vlines should stop short of the top like that.

Seaborn's pairplot seems to have a scaling issue on diagonal plots

Here is the code I tested
import seaborn as sns
tips = sns.load_dataset('tips')
sns.pairplot(tips)
By default, the diagonal plots are all histograms, and everything seems right (see the picture below).
However, when I change the pairplot call to something like the one below, the scale of the vertical axis of the histograms shrinks while the shape and number of bins stay the same (see the picture below). Does anyone know what happened here? I checked the documentation of pairplot (https://seaborn.pydata.org/generated/seaborn.pairplot.html): by default, diag_kind is set to 'auto'. When the kind parameter equals scatter (also the default), diag_kind, even though it is set to auto, is reset to hist behind the scenes (https://github.com/mwaskom/seaborn/blob/master/seaborn/axisgrid.py#L1822). So technically the two scripts I presented here should produce the same histograms. Totally lost here ...
sns.pairplot(tips, diag_kind='hist')

The reason you're seeing this behaviour is that the diagonal plots will only share the Y with the rest of the row if diag_kind == 'hist'. When diag_kind == 'auto', the diag_sharey parameter to PairGrid is set to False.
I see you've already opened an issue about it on Seaborn's GitHub. I guess a clarification of this behaviour (principle of least astonishment, etc.) in the docstring for diag_kind would be helpful.

Limiting Number of ticks in Matplotlib

Apologies for the really long set of questions.
I am trying to plot a graph in matplotlib. I was faced with this issue of limiting the number of ticks on both of the axes. Looking into pyplot I could not find any solution.
The only solution I came across was by creating a subplot in the following manner.
ax = plt.subplot(111)
ax.xaxis.set_major_locator(plt.MaxNLocator(4))
Although the above works, I am left with a few unsolved questions most of them in relation to how the matplotlib library is structured.
Is there no feature whereby an object of pyplot.plot() can have the number of ticks limited? Do I always have to depend on subplotting?
When I create an object ax = plt.subplot(111), I find that it creates an instance as below:
type(ax)
Out[228]: matplotlib.axes._subplots.AxesSubplot
Why does the documentation say that the subplot method returns a class ---> axes.SubplotBase?
Also, I see that we need to use the xaxis attribute of ax (or is it a method?), which helps set the properties related to the ticks.
type(ax.xaxis)
Out[233]: matplotlib.axis.XAxis
When ax is an object of some subclass of matplotlib.axes (not sure if it is SubplotBase or AxesSubplot), how come we can refer to ax.xaxis? The xaxis (or axis.XAxis) attribute is not mentioned in the documentation of matplotlib.axes.
I am pretty confused about the hierarchy and structure of matplotlib. It would be helpful if someone could point me to an article or blog which details the structure of these features.
Looking through the documentation, I could not figure out a suitable attribute of the subplot class which could help solve this problem related to the number of ticks. I am not sure how I am going to solve the next problem if I can't go through the documentation and figure it out.
Thanks,
Sree
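On the first question above, a minimal sketch (not taken from the thread) of limiting ticks without calling plt.subplot explicitly. It relies on plt.gca(), which returns the Axes that plt.plot() drew on, and on plt.locator_params, which adjusts the existing default locator. The data is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# Made-up data, plotted through the pyplot interface only.
x = list(range(100))
y = [v ** 0.5 for v in x]
plt.plot(x, y)

# plt.plot() creates an Axes implicitly; gca() hands it back.
ax = plt.gca()

# Same locator as in the question, applied to the implicit Axes.
ax.xaxis.set_major_locator(MaxNLocator(4))

# Alternative that stays in pyplot: tune the default locator's bin count.
plt.locator_params(axis="y", nbins=4)
```

Either way the tick limit ends up on an Axes object; pyplot just hides the step that creates it.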

marker style by third variable

Might seem like a repeat question, but the solution in this post doesn't seem to work for me.
I have a bunch of data I want to plot as lines/curves, and another dataset linked to the curves consisting of XYZ data, where Z represents a labeling variable for the curves.
I've got some example code here with some XY data, and labels for anyone wanting to replicate what I'm doing:
plt.plot(xdata, ydata)
plt.scatter(xlab, ylab, c=lab) # needs a marker function adding
plt.show()
Ideally I want to add some kind of unique marker based on the label values; 0.1,0.5,1,2,3,4,6,8,10,20. The labels are the same for each curve.
I have over 100 curves to plot, so something quick and effective is needed. Any help would be great!
My current solution would be to just split the data by labelling values, and then plot separately for each one (long and messy in my opinion). Figured someone might have a more elegant solution here.
I'm guessing you could do this with a dictionary... but I might need some help doing that!
Cheers, KB

Matplotlib does not accept different markers within a single plot call.
However, a less verbose and more robust solution for large datasets is to use the pandas and seaborn libraries:
Additionally, you can use the pandas.cut function to plot bins (it's something I regularly need when producing graphs where I use a third continuous value as a parameter). The way to use it is:
import pandas as pd
import seaborn as sns
url = 'https://pastebin.com/raw/dwGBLqSb' # url of paste
df = pd.read_csv(url)
sns.scatterplot(data = df, x='labx', y='laby', style='lab')
and it produces a scatter plot in which each value of lab gets its own marker style (figure not shown).
If you need more advanced labelling, you could also look at scikit-learn's LabelEncoder.
Hopefully, I've edited this answer enough not to fall foul of the "don't post identical answers to multiple questions" rule. For what it's worth, I am not affiliated with the seaborn library in any way, nor am I trying to promote anything. The only thing I am trying to do is help someone with a problem similar to one I've come across, for which I couldn't easily find a clear answer on SE.
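For comparison, the dictionary idea from the question can be sketched in plain matplotlib with one scatter call per label value. The marker choices and the data below are made up for illustration; only the label values come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Label values from the question, each mapped to an arbitrary marker.
label_values = [0.1, 0.5, 1, 2, 3, 4, 6, 8, 10, 20]
marker_shapes = ["o", "s", "^", "v", "D", "P", "X", "*", "<", ">"]
marker_for = dict(zip(label_values, marker_shapes))

# Made-up XYZ data: every label value appears five times.
lab = np.repeat(label_values, 5)
xlab = np.tile(np.linspace(0, 4, 5), len(label_values))
ylab = np.log10(lab) + 0.1 * xlab

fig, ax = plt.subplots()
for value, marker in marker_for.items():
    mask = lab == value  # select the points carrying this label
    ax.scatter(xlab[mask], ylab[mask], marker=marker, label=str(value))
ax.legend(title="lab")
```

This is ten scatter calls no matter how many curves there are, since the points are grouped by label rather than by curve.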

matplot and seaborn figure parameters/customizations

I'm so confused between the two. Every time I make a chart in either pyplot or seaborn, I have to guess what syntax to use. For example, seaborn doesn't have a title setter, so I have to remember to use plt.title. Or, for seaborn charts, plt.xlabel doesn't work, so I have to use sns.axlabel(x,y).
Also, I randomly run into the following problem. I'm simply trying to make my seaborn jointplot bigger, but I have had no success with either the plt or the seaborn methods. (Any tips on good documentation showing all the chart parameters? I find them scattered around the web, and it seems like each solution on Stack Overflow is unique, which adds to the overall confusion.)
Here's my code:
a = plt.figure(figsize=(30,30))
a.set_size_inches(30,30)
sns.jointplot(x='COAST',y='NORTH',data = data_df, kind = 'kde')
Notice I used both the plt.figure(figsize=...) approach and the set_size_inches method. Both gave me a small chart.
So frustrated with the random overlaps of the two libraries. Any pro tips to lessen the confusion will be greatly appreciated!
edit: This is also true for seaborn's pairplot. I have no success in changing the pairplot's size.

sns.jointplot creates its own figure instance (as @tcaswell suspected). It doesn't appear that you can tell jointplot to use an existing figure. I think you have two options:
You can give sns.jointplot the size option (renamed to height in seaborn 0.9 and later), e.g.:
sns.jointplot(x='COAST', y='NORTH', data=data_df, kind='kde', size=30)
You can alter the JointGrid figure size after creating it, using:
g = sns.jointplot(x='COAST', y='NORTH', data=data_df, kind='kde')
g.fig.set_size_inches(30,30)
I presume option 1 is the better option, as it is a built-in seaborn option.
