Set axis limits in matplotlib but autoscale within them - python

Is it possible to set the max and min values of an axis in matplotlib, but then autoscale when the values are much smaller than these limits?
For example, I want a graph of percentage change to be limited between -100 and 100, but many of my plots will be between, say, -5 and 5. When I use ax.set_ylim(-100, 100), this graph is very unclear.
I suppose I could use something like ax.set_ylim(max((-100, data-n)), min((100, data+n))), but is there a more built in way to achieve this?

If you want to drop extreme outliers, you could use NumPy's quantile function to find, say, the 0.01% and 99.99% quantiles of the data and use those as the limits.
near_min = np.quantile(in_data, 0.0001)
near_max = np.quantile(in_data, 0.9999)
ax.set_ylim(near_min, near_max)
You will need to adjust the quantiles depending on how much data you are willing to drop. You might also want to test whether the difference between near_min and the true minimum is significant before clipping.
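One way to sketch that significance test: only fall back to the quantile limits when the full data range is much wider than the trimmed range. The helper name `robust_limits` and the `ratio` threshold are hypothetical choices for illustration, not part of NumPy or matplotlib.

```python
import numpy as np

def robust_limits(values, q=0.0001, ratio=2.0):
    """Return quantile-based limits only when outliers stretch the range.

    `ratio` is an assumed threshold: the full span must be more than
    `ratio` times the trimmed span before the outliers are clipped.
    """
    values = np.asarray(values, dtype=float)
    true_min, true_max = values.min(), values.max()
    near_min, near_max = np.quantile(values, [q, 1 - q])
    full_span = true_max - true_min
    trimmed_span = near_max - near_min
    if trimmed_span > 0 and full_span > ratio * trimmed_span:
        return near_min, near_max   # outliers dominate: clip them
    return true_min, true_max       # range is fine: keep everything

clean = np.linspace(-5, 5, 1000)
with_outlier = np.append(clean, 1000.0)
limits_clean = robust_limits(clean, q=0.01)
limits_outlier = robust_limits(with_outlier, q=0.01)
```

The result can be passed straight to ax.set_ylim(*limits_outlier).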

As ImportanceOfBeingErnest pointed out, there is no support for this feature. In the end I just used my original idea, but scaled it by the value of the max and min to give the impression of autoscale.
ax.set_ylim(max((-100, data_min+data_min*0.1)), min((100, data_max+data_max*0.1)))
Where for my case it is true that
data_min <= 0, data_max >= 0
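A minimal sketch of the same clamping idea as a reusable helper. The name `clamped_limits` and the `pad` parameter are made up for this example, and the margin here is a fraction of the data span rather than of the min and max values themselves, so it also works when the data does not straddle zero.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def clamped_limits(data, lo=-100.0, hi=100.0, pad=0.1):
    """Return (ymin, ymax) padded around the data but never outside [lo, hi]."""
    data = np.asarray(data, dtype=float)
    dmin, dmax = np.nanmin(data), np.nanmax(data)
    span = dmax - dmin
    margin = pad * span if span > 0 else 1.0  # assumed fallback for constant data
    return max(lo, dmin - margin), min(hi, dmax + margin)

changes = np.array([-4.0, 1.5, 3.0, 5.0, -2.5])  # made-up percentage changes
fig, ax = plt.subplots()
ax.plot(changes)
ax.set_ylim(*clamped_limits(changes))
```

Data that exceeds the hard limits is simply cut off at -100 and 100, while small data gets a tight view.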

Why not just set the axis limits based on the range of the data each time the plot is updated?
ax.set_ylim(min(data), max(data))
Or check whether the range of the data is below some threshold, and only then set the axis limits (this assumes data is a NumPy array).
if max(abs(data)) < thresh:
    ax.set_ylim(min(data), max(data))

Related

bin value of histograms from grouped data

I am a beginner in Python and I am making separate histograms of travel distance per departure hour. Data I'm using, about 2500 rows of this. Distance is float64, the Departuretime is str. However, for making further calculations I'd like to have the value of each bin in a histogram, for all histograms.
Up until now, I have the following:
df['Distance'].hist(by=df['Departuretime'], color = 'red',
edgecolor = 'black',figsize=(15,15),sharex=True,density=True)
This creates in my case a figure with 21 small histograms. Histogram output I'm receiving.
Of all these histograms I want to know the y-axis value of each bar, preferably in a dataframe with the distance binning as rows and the hours as columns.
With single histograms, I'd paste counts, bins, bars = in front of the entire line and the variable counts would contain the data I was looking for, however, in this case it does not work.
Ideally I'd like a dataframe or list of some sort for each histogram, containing the density values of the bins. I hope someone can help me out! Big thanks in advance!
First of all, note that the bins used in the different histograms you are generating don't have the same edges (you can see this because you are using sharex=True and the resulting bars don't have the same width). In all cases you are getting 10 bins (the default), but they are not the same 10 bins.
This makes it impossible to combine them all in a single table in any meaningful way. You could provide a fixed list of bin edges as the bins parameter to standardize this.
Alternatively, I suggest you calculate a new column that describes which bin each row belongs to; this also unifies the bin calculation.
You can do this with the cut function, which also gives you the same freedom to choose the number of bins or the specific bin edges the same way as with hist.
df['DistanceBin'] = pd.cut(df['Distance'], bins=10)
Then, you can use pivot_table to obtain a table with the counts for each combination of DistanceBin and Departuretime as rows and columns respectively as you asked.
df.pivot_table(index='DistanceBin', columns='Departuretime', values='Distance', aggfunc='count')
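Since the question asks for density values rather than counts, here is a rough sketch of the fixed-bin-edges route. The sample data, the `edges` array and the use of `np.histogram` per group are assumptions for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data (about 2500 rows in the question).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Distance": rng.gamma(2.0, 5.0, size=500),
    "Departuretime": rng.choice(["07", "08", "09"], size=500),
})

# One shared set of 10 bins so every hour is measured on the same grid.
edges = np.linspace(0, df["Distance"].max(), 11)

# Density per bin for each departure hour: bins as rows, hours as columns.
density = pd.DataFrame(
    {hour: np.histogram(group["Distance"], bins=edges, density=True)[0]
     for hour, group in df.groupby("Departuretime")},
    index=pd.IntervalIndex.from_breaks(edges),
)
```

Each column multiplied by the bin widths sums to 1, which matches what density=True does in hist.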

Adding upper / lower limits and changing x-axes value format in Dash Plotly

I’ve been working on a mini project using Dash Plotly to visualize some factory data I found online and I have a couple of questions that I could not find answers to.
How to change the format of x-axes values? My values are in the thousands and Plotly defaults it to 20k, 22.5k, 25k etc. I actually want it as 20000, 22500, 25000.
While I am able to plot x and y values from the data frame easily, my data has upper and lower limits (same scale as y values) to determine pass fail criteria. These limits are in separate columns in the data frame. How do I plot these limits for every corresponding x value?
Thanks for your help!
Question 1 - How to format axis labels
From the official documentation:
https://plotly.com/python/tick-formatting/#using-tickformat-attribute
Using Tickformat Attribute
For more formatting types, see: https://github.com/d3/d3-format/blob/master/README.md#locale_format
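fig.update_layout(yaxis_tickformat = '%')
That example from the documentation formats the y-axis as percentages. To show 20000 instead of 20k on the x-axis, the d3 format 'd' (plain integer) should do it:
fig.update_layout(xaxis_tickformat = 'd')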
Question 2
I would recommend reading the upper and lower limits from the dataframe directly and then using the range property in Plotly:
https://plotly.com/python/axes/#setting-the-range-of-axes-manually
fig.update_xaxes(range=[1.5, 4.5])
fig.update_yaxes(range=[3, 9])

plot mean and confidence interval - matplotlib

I want to make a plot that splits a dataset and shows the amount of observations per category on the left axis and a confidence interval (e.g. 90%) including the mean for a certain observed value on the right axis.
It should look like this:
I know how to use ax.hist() or ax.bar() for the first job. A second axis is easily made using ax.twinx(). However, after trying both ax.boxplot() and ax.violinplot(), I believe neither could do the job (plotting the confidence interval + mean) correctly. Any suggestions?
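One possible approach, sketched under assumptions: draw the counts with ax.bar() and put the mean with its interval on the twin axis using ax.errorbar(). The groups below are made-up data, and the interval is a normal-approximation 90% confidence interval for the mean (z = 1.645), which may not be what you want for small or skewed samples.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up observations per category.
rng = np.random.default_rng(1)
groups = {
    "A": rng.normal(10, 2, 40),
    "B": rng.normal(12, 3, 25),
    "C": rng.normal(9, 1, 60),
}
labels = list(groups)
counts = [len(v) for v in groups.values()]
means = np.array([v.mean() for v in groups.values()])
sems = np.array([v.std(ddof=1) / np.sqrt(len(v)) for v in groups.values()])
half_width = 1.645 * sems  # assumed normal approximation for a 90% interval

fig, ax = plt.subplots()
x = np.arange(len(labels))
ax.bar(x, counts, color="lightgray")   # left axis: number of observations
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel("observations")

ax2 = ax.twinx()                        # right axis: mean with interval
ax2.errorbar(x, means, yerr=half_width, fmt="o", color="k", capsize=4)
ax2.set_ylabel("mean with 90% CI")
```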

Seaborn distplot: y axis problems with multiple kdeplots

I am currently plotting 3 kernel density estimations together on the same graph. I assume that kdeplots use relative frequency as the y value, however for some of my data the kdeplot has frequencies way above 1.
code I'm using:
sns.distplot(data1, kde_kws={"color": "b", "lw": 1.5, "shade": False, "kernel": "gau", "label": "t"}, hist=False)
Does anyone know how I can make sure that the kdeplot either makes y value relative frequency, or allow me to adjust the ymax axis limit automatically to the maximum frequency calculated?
Okay, so I figured out that I just needed to set the autoscaling to Tight; that way it didn't give negative values on the scale.
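For what it's worth, a density curve is normalized so that the area under it is 1, not so that its height stays below 1, which is why narrow data can give y values far above 1. A small sketch with made-up data, using a density-normalized histogram in place of the KDE since the same normalization applies:

```python
import numpy as np

# Made-up data concentrated in a narrow range around 0.
rng = np.random.default_rng(0)
narrow = rng.normal(loc=0.0, scale=0.05, size=5000)

# density=True scales bar heights so that sum(height * width) == 1.
heights, edges = np.histogram(narrow, bins=50, density=True)
area = np.sum(heights * np.diff(edges))
peak = heights.max()  # well above 1 because the bins are very narrow
```

To cap the axis at the tallest curve, something like ax.set_ylim(0, peak * 1.05) should work.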

Heatmap with varying y axis

I would like to create a visualization like the upper part of this image. Essentially, a heatmap where each point in time has a fixed number of components but these components are anchored to the y axis by means of labels (that I can supply) rather than by their first index in the heatmap's matrix.
I am aware of pcolormesh, but that does not seem to give me the y-axis functionality I seek.
Lastly, I am also open to solutions in R, although a Python option would be much preferable.
I am not completely sure if I understand your meaning correctly, but by looking at the picture you have linked, you might be best off with a roll-your-own solution.
First, you need to create an array with the heatmap values so that you have one row for each label and one column for each time slot. You fill the array with NaNs and then write whatever heatmap values you have to the correct positions.
Then you need to trick imshow a bit to scale and show the image in the correct way.
For example:
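import numpy as np
import matplotlib.pyplot as plt
# create some masked data: one row per label, one column per time slot
a = np.cumsum(np.random.random((20, 200)), axis=0)
X, Y = np.meshgrid(np.arange(a.shape[1]), np.arange(a.shape[0]))
a[Y < 15 * np.sin(X / 50.)] = np.nan
a[Y > 10 + 15 * np.sin(X / 50.)] = np.nan
# draw the image along with some curves
plt.imshow(a, interpolation='nearest', origin='lower', extent=[-2, 2, 0, 3])
xd = np.linspace(-2, 2, 200)
yd = 1 + .1 * np.cumsum(np.random.random(200) - .5)
plt.plot(xd, yd, 'w', linewidth=3)
plt.plot(xd, yd, 'k', linewidth=1)
# older matplotlib accepted 'normal' here; current versions use 'auto'
plt.axis('auto')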
Gives:
