plot number of samples on a boxplot with matplotlib [duplicate] - python

This question already has answers here:
No. of Observation inside BoxPlot, Matplotlib
(2 answers)
Closed 3 years ago.
I am using matplotlib to plot a boxplot as follows:
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.randint(0, 101, size=10).tolist()
x2 = np.random.randint(0, 101, size=20).tolist()
labels = ('Data10', 'Data20')
plt.boxplot((x1, x2))
plt.xticks(range(1, len(labels) + 1), labels, rotation='vertical')
Now, what I want to do is also be able to print the number of observations on top of each of these boxplots. Is there any easy way to do that using matplotlib?

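Matplotlib has no built-in option for this; you have to use annotations. The example below places a fixed text label with ax.annotate; to show sample counts, substitute the count (e.g. len(x1)) for the label text and add one annotation per box.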
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.randint(0, 101, size=10).tolist()
x2 = np.random.randint(0, 101, size=20).tolist()
labels = ('Data10', 'Data20')
plt.boxplot((x1, x2))
plt.xticks(range(1, len(labels) + 1), labels, rotation='vertical')
ax = plt.gca()  # get the current axes object
ax.annotate('local max', xy=(.5, .5), xycoords='axes fraction',
            xytext=(0.2, 0.95), textcoords='axes fraction')
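Applied to the question's data, the annotation approach might look like the sketch below. The helper name boxplot_with_counts and the "n = ..." label format are illustrative choices, not part of the original answer; the x position of each label is given in data coordinates (boxes sit at 1, 2, ...) and the y position in axes-fraction coordinates.

```python
import numpy as np
import matplotlib.pyplot as plt


def boxplot_with_counts(datasets, labels):
    """Draw one box per dataset and write 'n = <count>' above each box.

    Hypothetical helper for illustration only.
    """
    fig, ax = plt.subplots()
    ax.boxplot(datasets)
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels, rotation='vertical')
    texts = []
    for pos, data in enumerate(datasets, start=1):
        # x is in data coordinates (box positions are 1, 2, ...),
        # y is in axes-fraction coordinates (1.0 is the top of the axes)
        text = ax.annotate('n = {}'.format(len(data)), xy=(pos, 1.0),
                           xycoords=('data', 'axes fraction'),
                           xytext=(0, 4), textcoords='offset points',
                           ha='center', va='bottom')
        texts.append(text)
    return fig, ax, texts


rng = np.random.default_rng(0)
x1 = rng.integers(0, 101, size=10).tolist()
x2 = rng.integers(0, 101, size=20).tolist()
fig, ax, texts = boxplot_with_counts((x1, x2), ('Data10', 'Data20'))
```

Call plt.show() afterwards to display the figure.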

Related

Set y range on Matplotlib boxplot [duplicate]

This question already has answers here:
Changing the tick frequency on the x or y axis
(13 answers)
Closed 1 year ago.
How do I set the y-axis tick interval on a Matplotlib boxplot? I checked the docs and searched Google, but perhaps I am not using the right keywords. I want to set the y-axis interval to 5000 instead of the 20000 shown in this graph.
You can use MultipleLocator and set y axis:
Example:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator
# Sample
t = np.arange(0.0, 100.0, 0.1)
s = np.abs(100000 * np.sin(0.1 * np.pi * t) * np.exp(-t * 0.01))
# Plot function
fig, ax = plt.subplots()
ax.plot(t, s)
# Define the yaxis
ax.yaxis.set_major_locator(MultipleLocator(5000))
plt.show()
[Images: the plot with MultipleLocator and without MultipleLocator]
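The question asked about a boxplot, and the same locator should work there unchanged. A minimal sketch, assuming made-up uniform data in the 0 to 100000 range (the data and variable names are illustrative only):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

# Made-up sample data, only for illustration
rng = np.random.default_rng(0)
data = [rng.uniform(0, 100000, size=50) for _ in range(3)]

fig, ax = plt.subplots()
ax.boxplot(data)
# Place a major tick at every multiple of 5000 on the y-axis
ax.yaxis.set_major_locator(MultipleLocator(5000))
ticks = ax.get_yticks()
```

Call plt.show() afterwards to display the figure.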

Python (matplotlib): Arrange multiple subplots (histograms) in grid [duplicate]

This question already has answers here:
How to plot in multiple subplots
(12 answers)
Closed 1 year ago.
I want to arrange 5 histograms in a grid. Here is my code and the result:
I was able to create the graphs, but I am having difficulty arranging them in a grid. I used GridSpec to define the grid, but I still need to place each graph in its respective cell.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
# df is the asker's DataFrame with columns 'O', 'C', 'E', 'A', 'N' (not shown)
Openness = df['O']
Conscientiousness = df['C']
Extraversion = df['E']
Agreeableness = df['A']
Neuroticism = df['N']
grid = plt.GridSpec(2, 3, wspace=0.4, hspace=0.3)
# Plot 1
plt.hist(df['O'], bins = 100)
plt.title("Openness to experience")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 2
plt.hist(df['C'], bins = 100)
plt.title("Conscientiousness")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 3
plt.hist(df['E'], bins = 100)
plt.title("Extraversion")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 4
plt.hist(df['A'], bins = 100)
plt.title("Agreeableness")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Plot 5
plt.hist(df['N'], bins = 100)
plt.title("Neuroticism")
plt.xlabel("Value")
plt.ylabel("Frequency")
The result merges everything into one chart
But it should look like this
Could you guys please help me out?
You can use plt.subplots:
fig, axes = plt.subplots(nrows=2, ncols=2)
This creates a 2x2 grid, which holds four plots; for five histograms you need a larger grid, for example nrows=2, ncols=3. You can access individual positions by indexing the axes object:
top left:
ax = axes[0,0]
ax.hist(df['C'], bins = 100)
ax.set_title("Conscientiousness")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
and so on.
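Spelled out for all five histograms, the plt.subplots approach might look like the following sketch. The DataFrame here is random stand-in data with the column names from the question, and the 2x3 grid with one hidden cell is an assumption about the desired layout.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data: the real df from the question is not shown
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 5)), columns=list('OCEAN'))

titles = {
    'O': "Openness to experience",
    'C': "Conscientiousness",
    'E': "Extraversion",
    'A': "Agreeableness",
    'N': "Neuroticism",
}

# A 2x3 grid has six cells; five are used and the last one is hidden
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 6))
flat = axes.flatten()
for ax, (col, title) in zip(flat, titles.items()):
    ax.hist(df[col], bins=100)
    ax.set_title(title)
    ax.set_xlabel("Value")
    ax.set_ylabel("Frequency")
for ax in flat[len(titles):]:
    ax.set_visible(False)
fig.tight_layout()
```

Flattening the axes array lets one loop fill the grid row by row; call plt.show() to display the result.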
You can also continue to use GridSpec; see https://matplotlib.org/stable/tutorials/intermediate/gridspec.html
For example:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
fig2 = plt.figure(constrained_layout=True)
spec2 = gridspec.GridSpec(ncols=2, nrows=3, figure=fig2)
f2_ax1 = fig2.add_subplot(spec2[0, 0])
f2_ax2 = fig2.add_subplot(spec2[0, 1])
f2_ax3 = fig2.add_subplot(spec2[1, 0])
f2_ax4 = fig2.add_subplot(spec2[1, 1])
f2_ax5 = fig2.add_subplot(spec2[2, 1])
# Plot 1 (repeat for f2_ax2 to f2_ax5 with the remaining columns)
f2_ax1.hist(df['O'])
f2_ax1.set_title("Openness to experience")
f2_ax1.set_xlabel("Value")
f2_ax1.set_ylabel("Frequency")
plt.show()

Frequency Distribution Plot: change x-axis to interval

Dear People of the Internet
I have calculated a frequency distribution and would now like to plot it in a particular way. So far I have calculated and plotted the distribution, but I couldn't find a solution for the end product I am looking for. My code, with an example dataset, is:
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
import pandas as pd
# example data
rng = np.random.RandomState(seed=12345)
a1 = stats.norm.rvs(size=1000, random_state=rng)
res = stats.relfreq(a1, numbins=34)
x = res.lowerlimit + np.linspace(0, res.binsize*res.frequency.size, res.frequency.size)
# plotting
fig = plt.figure(figsize=(6, 3))
ax = fig.add_subplot(1, 1, 1)
ax.bar(x, res.frequency, width=res.binsize)
ax.set_title('Frequency Distribution of 1D Vix Returns')
ax.set_xlim([x.min(), x.max()])
ax.set_xticks(ax.get_xticks()[::1])
plt.show()
As a last step, I would like to label the x-axis as in the attached picture: instead of a single number, each tick should show an interval. I couldn't find a source that resolves this. Has anyone encountered the same problem, or does anyone know a source with a solution? Thanks in advance.
Have a look at this nice answer:
https://stackoverflow.com/a/6353051/10372616.
I added the code to your current plot.
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
import pandas as pd  # not used in this example
# example data
rng = np.random.RandomState(seed=12345)
a1 = stats.norm.rvs(size=1000, random_state=rng)
res = stats.relfreq(a1, numbins=34)
x = res.lowerlimit + np.linspace(0, res.binsize*res.frequency.size, res.frequency.size)
# plotting
fig = plt.figure(figsize=(6, 3))
ax = fig.add_subplot(1, 1, 1)
ax.bar(x, res.frequency, width=res.binsize)
ax.set_title('Frequency Distribution of 1D Vix Returns')
ax.set_xlim([x.min(), x.max()])
ax.set_xticks(ax.get_xticks()[::1])
# Change traditional tick labels to range labels
# ----------------------------------------------------------------
ax.set_xticklabels([])  # hide your previous x tick labels
bins = ax.get_xticks()[::1]
bin_centers = 0.5 * np.diff(bins) + bins[:-1]
# loop variable named "center" so it does not overwrite the array x above
for a, b, center in zip(bins, bins[1:], bin_centers):
    label = '{:0.0f} to {:0.0f}'.format(a, b)
    ax.annotate(label, xy=(center, 0), xycoords=('data', 'axes fraction'),
                xytext=(0, -10), textcoords='offset points', va='top', ha='center', rotation=90)
plt.show()
[Images: the plot before and after the change]
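An alternative sketch, not taken from the answer above: if the bin edges are computed directly with np.histogram, the interval labels can be set as ordinary tick labels at the bin centers. The choice of 10 bins and the "lo to hi" label format are assumptions made for readability.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(12345)
a1 = rng.normal(size=1000)

# Bin the data and convert counts to relative frequencies
counts, edges = np.histogram(a1, bins=10)
rel_freq = counts / counts.sum()
centers = 0.5 * (edges[:-1] + edges[1:])

# One label per bin, built from its lower and upper edge
labels = ['{:.1f} to {:.1f}'.format(lo, hi)
          for lo, hi in zip(edges[:-1], edges[1:])]

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(centers, rel_freq, width=np.diff(edges))
ax.set_xticks(centers)
ax.set_xticklabels(labels, rotation=90)
fig.tight_layout()
```

Because the ticks sit at the bin centers, each label lines up with exactly one bar; call plt.show() to display the figure.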

Line plot that continuously varies transparency - Matplotlib

I wish to produce a single line plot in Matplotlib with variable transparency, i.e. the line fades from a solid color to fully transparent.
I tried this but it didn't work.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2 * np.pi, 500)
y = np.sin(x)
alphas = np.linspace(1, 0, 500)
fig, ax = plt.subplots(1, 1)
ax.plot(x, y, alpha=alphas)
Matplotlib's "LineCollection" allows you to split the line to be plotted into individual line segments and you can assign a color to each segment. The code example below shows how each horizontal "x" value can be assigned an alpha (transparency) value that indexes into a sequential colormap that runs from transparent to a given color. A suitable colormap "myred" was created using Matplotlib's "colors" module.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
import matplotlib.colors as colors
redfade = colors.to_rgb("red") + (0.0,)
myred = colors.LinearSegmentedColormap.from_list('my',[redfade, "red"])
x = np.linspace(0,1, 1000)
y = np.sin(x * 4 * np.pi)
alphas = x * 4 % 1
points = np.vstack((x, y)).T.reshape(-1, 1, 2)
segments = np.hstack((points[:-1], points[1:]))
fig, ax = plt.subplots()
lc = LineCollection(segments, array=alphas, cmap=myred, lw=3)
line = ax.add_collection(lc)
ax.autoscale()
plt.show()
If you are using the standard white background, you can save a few lines by using one of Matplotlib's built-in sequential colormaps that runs from white to a given color. If you remove the lines that create the colormap above and pass the argument cmap="Reds" to the LineCollection constructor instead, the result is visually similar.
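For the linear fade in the original question (alpha going from 1 to 0 along the line), a variant of the same idea is to skip the colormap and give each segment its own RGBA color. This is a sketch under that assumption, using the colors argument of LineCollection; the choice of red is arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.colors import to_rgb

x = np.linspace(0, 2 * np.pi, 500)
y = np.sin(x)

# Build one two-point segment per pair of neighbouring samples
points = np.column_stack((x, y)).reshape(-1, 1, 2)
segments = np.concatenate((points[:-1], points[1:]), axis=1)

# One RGBA color per segment: fixed RGB, alpha falling linearly from 1 to 0
rgba = np.zeros((len(segments), 4))
rgba[:, :3] = to_rgb("red")
rgba[:, 3] = np.linspace(1, 0, len(segments))

fig, ax = plt.subplots()
lc = LineCollection(segments, colors=rgba, linewidths=3)
ax.add_collection(lc)
ax.autoscale()
```

Call plt.show() afterwards to display the figure.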
The only solution I found was to plot each segment independently with varying transparency
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2 * np.pi, 500)
y = np.sin(x)
alphas = np.linspace(1, 0, 499)
fig, ax = plt.subplots(1, 1)
for i in range(499):
    ax.plot(x[i:i+2], y[i:i+2], 'k', alpha=alphas[i])
But I don't like it... Maybe this is enough for someone
I don't know how to do this in matplotlib, but it's possible in Altair:
import numpy as np
import pandas as pd
import altair as alt
x = np.linspace(0, 2 * np.pi, 500)
y = np.sin(x)
alt.Chart(
    pd.DataFrame({"x": x, "y": y, "o": np.linspace(0, 1, len(x))}),
).mark_point(
).encode(
    alt.X("x"),
    alt.Y("y"),
    alt.Opacity(field="x", type="quantitative", scale=alt.Scale(range=[1, 0]), legend=None),
)

Reorient Histogram and Scatterplot with Trend Line

I have a dataset that looks similar to the one simulated in the code below. There are two sets of observations, one for those at X=0 and another for those at X>0.
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
X1 = np.random.normal(0, 1, 100)
X1 = X1 - np.min(X1)
Y1 = X1 + np.random.normal(0, 1, 100)
X0 = np.zeros(100)
Y0 = np.random.normal(0, 1.2, 100) + 2
X = np.concatenate((X1, X0))
Y = np.concatenate((Y1, Y0))
sns.distplot(Y0, color="orange")
plt.show()
sns.scatterplot(x=X, y=Y, hue = (X == 0), legend=False)
plt.show()
There are two plots: a histogram with KDE and a scatterplot.
I want to take the histogram with KDE, rotate it, and orient it appropriately with respect to the scatter plot. I would also like to add a trend line for each respective set of observations.
The ideal result would look something like this:
How do you do this in python, either using seaborn or matplotlib?
This can be done by combining plt.subplots with a shared y-axis, which keeps the scale consistent, and the seaborn plots. The trend line needs some additional computation, but you can use NumPy for a quick fit. Here is an example of how to achieve your goal.
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
# Prepare some data
np.random.seed(2020)
mean_Y1 = 0
std_Y1 = 1
size_Y1 = 100
X1 = np.random.normal(mean_Y1, std_Y1, size_Y1)
X1 = X1 - np.min(X1)
Y1 = X1 + np.random.normal(mean_Y1, std_Y1, size_Y1)
# this for computing trend line
Z = np.polyfit(X1, Y1, 1)
Y_ = np.poly1d(Z)(X1)
mean_Y0 = 2
std_Y0 = 1.2
size_Y0 = 100
X0 = np.zeros(100)
Y0 = np.random.normal(mean_Y0, std_Y0, size_Y0)
X = np.concatenate((X1, X0))
Y = np.concatenate((Y1, Y0))
# Now time for plotting
fig, axs = plt.subplots(1, 2,
                        sharey=True,
                        figsize=(10, 5),
                        gridspec_kw={'width_ratios': (1, 2)}
                        )
# control space between plots
fig.subplots_adjust(wspace=0.1)
# set the ticks for y-axis:
axs[0].yaxis.set_tick_params(left=False, labelleft=False, labelright=True)
# if you wish you can rotate xticks on the histogram with:
axs[0].xaxis.set_tick_params(rotation=90)
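# plot histogram (note: sns.distplot is deprecated in newer seaborn versions,
# which offer sns.histplot and sns.displot instead)
dist = sns.distplot(Y0, color="orange", vertical=True, ax=axs[0])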
# now we need to get the coordinate of the peak, we need this for mean line
line_data = dist.get_lines()[0].get_data()
max_Y0 = np.max(line_data[0])
# plotting the mean line
axs[0].plot([0, max_Y0], [mean_Y0, mean_Y0], '--', c='orange')
# inverting xaxis
axs[0].invert_xaxis()
# Plotting scatterplot (x and y passed as keywords, which recent seaborn versions require)
sns.scatterplot(x=X, y=Y, hue = (X == 0), legend=False, ax=axs[1])
# Plotting trend line
sns.lineplot(x=X1, y=Y_, ax=axs[1])
# Plotting mean again
axs[1].plot([0, max(X1)], [mean_Y0, mean_Y0], '--', c='orange')
plt.show()
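The trend-line step on its own can be sketched as below: np.polyfit with degree 1 returns the slope and intercept of a least-squares line, and np.poly1d turns them into a callable. The synthetic data (true slope 2, intercept 1) is made up so the fit can be checked.

```python
import numpy as np

# Synthetic data with a known linear relationship plus small noise
rng = np.random.default_rng(2020)
x = rng.uniform(0, 5, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, size=100)

# Degree-1 least-squares fit: coefficients come back highest power first
slope, intercept = np.polyfit(x, y, 1)
trend = np.poly1d((slope, intercept))

# Two points are enough to draw a straight trend line
x_line = np.array([x.min(), x.max()])
y_line = trend(x_line)
```

The pair (x_line, y_line) can then be drawn on any axes, for example with ax.plot(x_line, y_line).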
