I am drawing a boxplot in Python for data with a skewed distribution, and it shows a lot of outliers. I want to show only the maximum and minimum outliers.
How can I write the code for this?
I don't want to remove anything from my data. I just want to show two outliers (max and min) in the plot.
Use the showfliers=False argument:
plt.boxplot([data], showfliers=False)
This hides all outliers, so the maximum and minimum ones have to be drawn separately.
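A minimal sketch of showing only the two extreme outliers: hide all fliers, then draw the lowest and highest outlier yourself. It assumes matplotlib's default 1.5 × IQR whisker rule; the helper name extreme_fliers, the output file name, and the lognormal sample are made up for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt


def extreme_fliers(values, whis=1.5):
    """Return (lowest outlier, highest outlier) under the whis * IQR rule.

    Either entry is None when there is no outlier on that side.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low = values[values < q1 - whis * iqr]
    high = values[values > q3 + whis * iqr]
    return (low.min() if low.size else None,
            high.max() if high.size else None)


rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # made-up skewed sample

fig, ax = plt.subplots()
ax.boxplot([data], showfliers=False)        # draw the box without any outliers
lo, hi = extreme_fliers(data)
points = [v for v in (lo, hi) if v is not None]
ax.plot([1] * len(points), points, "o", color="red")  # the single box sits at x = 1
fig.savefig("boxplot_extremes.png")         # hypothetical output file name
```

The data itself is untouched; only the markers drawn on the plot change.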
One of my projects needs order analysis of vibration signals in Python instead of Matlab. They want to visualize the data as a colormap, which usually has frequency on the horizontal axis and rotational speed on the vertical axis, just like this picture:
How can I do this?
I am trying to plot time series data in a kind of "climate stripes plot" using the package Altair.
The problem is that I do not know how to change the range in the legend to standardise all my plots with the same colour range and numbers in the legend. At the moment, each time I plot something the legend adapts to the range of the data.
I think the problem is with the "domain" property; maybe it is not in the correct place?
Thank you for your help :)
This is the code for the plot :
chart = alt.Chart(source).mark_rect().encode(
    x='day:O',
    y='subasins:N',
    color=alt.Color('90%:Q', legend=alt.Legend(title='CH4'), bin=alt.Bin(maxbins=20),
                    scale=alt.Scale(scheme='blueorange'), domain=[1830, 2000])
).properties(width=100).facet(column=alt.Column('month'))
chart.show()
Plots that I get now with different scales in the legend
You're using the right approach with domain, it just needs to be put inside alt.Scale:
scale=alt.Scale(scheme='blueorange', domain=[1830, 2000])
When you're using a bin transform, one way to ensure the scale is consistent is to specify the bin extent:
bin=alt.Bin(maxbins=20, extent=[1830, 2000])
The dataset consists of 4000+ records. I am trying to identify anomalies in the 'duration' attribute. However, when the box plot is drawn, the data turns out to be highly skewed. I tried transforming the data, but that did not give usable results. The boxplot is attached below. How should we proceed in such cases?
Boxplot
What you could do is create a histogram of your data and try to fit a distribution to it. Suppose you were able to fit, say, a normal distribution to your data; then you could read off anomalies by checking the probability of each sample under that distribution. If this probability is smaller than a threshold probability p, you could mark the sample as an anomaly.
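A rough sketch of that idea, using only the standard library. It assumes the durations are positive and that their logarithms are roughly normal, which is an assumption about skewed data in general and not something known about this dataset. The function name flag_anomalies, the threshold value, and the generated sample are all made up for illustration.

```python
import math
import random
from statistics import NormalDist


def flag_anomalies(durations, p_threshold=0.001):
    """Return the values whose two-sided tail probability is below p_threshold.

    A normal distribution is fitted to the log-durations (assumes all values > 0).
    """
    logs = [math.log(d) for d in durations]
    dist = NormalDist.from_samples(logs)     # estimate mean and std from the logs
    flagged = []
    for d, x in zip(durations, logs):
        cdf = dist.cdf(x)
        tail_p = 2 * min(cdf, 1 - cdf)       # probability of a value at least this extreme
        if tail_p < p_threshold:
            flagged.append(d)
    return flagged


random.seed(0)
durations = [random.lognormvariate(3.0, 0.5) for _ in range(4000)]  # made-up data
durations.append(5000.0)                                            # planted anomaly
print(flag_anomalies(durations))
```

With a threshold p, roughly a fraction p of perfectly ordinary samples will also be flagged, so the threshold is a trade-off and the fit should be checked against the histogram first.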
I posted this question about 3D plots of data frames:
3D plot of 2d Pandas data frame
and a user very helpfully referred me to this:
Plotting Pandas Crosstab Dataframe into 3D bar chart
It was useful and the code worked in principle, but the result looks like a mess (see image below) for several reasons:
I have a huge number of values to plot (470 or so, along the y-axis), so perhaps a bar chart is not the best way (I am going for a histogram kind of look, so I assumed very narrow bars would be suitable)
my counts (z axis) give almost no information, because the differences I need to see are between 100 and the max value
how can I make the 3D plot that shows up interactive (being able to rotate it, etc.)? I have seen it done in blogs/videos, but I'm not sure if it's something in Tools -> Preferences that I can't find
So, regarding the second issue, I thought it would be simple enough: I tried to change the limits of the z axis as I would for a 2D plot, by adding:
ax.set_zlim([110,150])
just before the axis labels, but obviously this is the wrong way:
So do I have to limit the values in the original data set (i.e. filter out values < 110), or is there a way to do this from the plot?
I am trying to use the matplotlib boxplot to show the boxplot of a number of distributions. One group of distributions is on a very different scale than the other group. I have been trying to use twinx() to plot the second group but it overlaps with the other boxplots.
Is there a better method to add a different scale for specific data?
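One approach that may help, sketched here with made-up data: give each boxplot explicit positions so the two groups sit side by side instead of on top of each other, and let the twinx() axis carry only the second group. The group sizes, labels, and output file name below are invented for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
small = [rng.normal(0, 1, 200), rng.normal(1, 1, 200)]  # made-up group on a small scale
large = [rng.normal(1000, 200, 200)]                    # made-up group on a large scale

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()          # second y axis sharing the same x axis

left_pos = [1, 2]                   # x positions for the small-scale boxes
right_pos = [3]                     # x position for the large-scale box

# manage_ticks=False stops each call from resetting the shared x ticks and limits
ax_left.boxplot(small, positions=left_pos, manage_ticks=False)
ax_right.boxplot(large, positions=right_pos, manage_ticks=False)

ax_left.set_xlim(0.5, 3.5)
ax_left.set_xticks(left_pos + right_pos)
ax_left.set_xticklabels(["A", "B", "C (right axis)"])
ax_left.set_ylabel("small scale")
ax_right.set_ylabel("large scale")
fig.savefig("twin_boxplots.png")    # hypothetical output file name
```

Because the two axes share x but not y, each group gets its own scale while the boxes no longer overlap.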