Matplotlib Boxplot: Showing Number of Occurrences of Integer Outliers - python

I have a plot like the following (using plt.boxplot()):
Now, what I want is to plot a number showing how often each of those outliers occurred (preferably to the top right of each outlier).
Is that somehow achievable?

ax.boxplot returns a dictionary of all the elements in the boxplot. The key you need here from that dict is 'fliers'.
In boxdict['fliers'], there are the Line2D instances that are used to plot the fliers. We can grab their x and y locations using .get_xdata() and .get_ydata().
You can find all the unique y locations using a set, and then find the number of fliers plotted at that location using .count().
Then it's just a case of using matplotlib's ax.text to add a text label to the plot.
Consider the following example:
import matplotlib.pyplot as plt
import numpy as np
# Some fake data
data = np.zeros((10000, 2))
data[0:4, 0] = 1
data[4:6, 0] = 2
data[6:10, 0] = 3
data[0:9, 1] = 1
data[9:14, 1] = 2
data[14:20, 1] = 3
# create figure and axes
fig, ax = plt.subplots(1)
# plot boxplot, grab dict
boxdict = ax.boxplot(data)
# the fliers from the dictionary
fliers = boxdict['fliers']
# loop over boxes in x direction
for j in range(len(fliers)):
    # the y and x positions of the fliers
    yfliers = boxdict['fliers'][j].get_ydata()
    xfliers = boxdict['fliers'][j].get_xdata()
    # the unique locations of fliers in y
    ufliers = set(yfliers)
    # loop over unique fliers
    for i, uf in enumerate(ufliers):
        # print number of fliers
        ax.text(xfliers[i] + 0.03, uf + 0.03, list(yfliers).count(uf))
plt.show()
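A slightly more compact variant of the same idea, as a sketch only: np.unique with return_counts=True gives each distinct flier value and how often it occurs in one call. It assumes one Line2D per box with all fliers of that box at the same x position, so the x coordinate is taken from the first flier. The labels dictionary is a hypothetical helper that is not part of the answer above; it is only there so the counts can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Same fake data as above: mostly zeros, so every non-zero value is an outlier
data = np.zeros((10000, 2))
data[0:4, 0] = 1
data[4:6, 0] = 2
data[6:10, 0] = 3
data[0:9, 1] = 1
data[9:14, 1] = 2
data[14:20, 1] = 3

fig, ax = plt.subplots()
boxdict = ax.boxplot(data)

labels = {}  # hypothetical helper: (box index, flier value) -> count
for box_index, flier_line in enumerate(boxdict['fliers']):
    xfliers = flier_line.get_xdata()
    yfliers = flier_line.get_ydata()
    # distinct flier values and the number of times each one occurs
    values, counts = np.unique(yfliers, return_counts=True)
    for value, count in zip(values, counts):
        # label slightly to the top right of the flier marker
        ax.text(xfliers[0] + 0.03, value + 0.03, str(count))
        labels[(box_index, float(value))] = int(count)
```

Remove the matplotlib.use("Agg") line and call plt.show() at the end to see the figure interactively.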

Related

How to plot lines between points, and change their color based on specific values in Python?

Context:
3x35 values array that associates 1 value per segment
4x35x2 matpos array that gathers the coordinates of 4x35 points (hence 3x35 segments).
Question:
How can I define each segment's color based on their values from the values array ?
Code attempt:
# Array of values for each point
values = np.random.rand(3,35)
# Generate array of positions
x = np.arange(0,35)
y = np.arange(0,4)
matpos = np.array([[(y[i], x[j]) for j in range(0,len(x))] for i in range(0,len(y))])
# plot the figure
plt.figure()
for i in range(len(y)-1):
    for j in range(len(x)):
        # plot each segment
        plt.plot(matpos[i:i+2,j,0],matpos[i:i+2,j,1]) #color = values[i,j]
If your values are just along a grid, you might as well just use plt.imshow(values).
Updated code for desired result:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
# Array of values for each point
values = np.random.rand(3,35)
# Transform value to colors depending on colormap
color_norm = mpl.colors.Normalize(np.min(values), np.max(values))
color_map = mpl.colormaps['viridis']  # mpl.cm.get_cmap is deprecated since Matplotlib 3.7
colors = color_map(color_norm(values))
plt.close('all')
plt.figure()
for (y, x), value in np.ndenumerate(values):
    plt.plot([x, x+1], [y, y], c = colors[y,x], linewidth = 10)
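If you want to keep the segment geometry from the question (each segment joins matpos[i, j] to matpos[i+1, j]) instead of drawing horizontal bars, a LineCollection is one possible approach. This is a sketch under that assumption, not the answerer's method: it builds the 3x35 segments from the 4x35x2 point array and lets the collection map values to colors through the colormap. The line width and the colorbar are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

# One value per segment, as in the question
values = np.random.rand(3, 35)

x = np.arange(0, 35)
y = np.arange(0, 4)
# matpos[i, j] = (y[i], x[j]), shape (4, 35, 2), same layout as in the question
matpos = np.stack(np.meshgrid(y, x, indexing='ij'), axis=-1)

# Pair every point with the point in the next row: shape (3 * 35, 2, 2)
segments = np.stack([matpos[:-1], matpos[1:]], axis=2).reshape(-1, 2, 2)

fig, ax = plt.subplots()
lc = LineCollection(segments, cmap='viridis', linewidths=3)
lc.set_array(values.ravel())  # one scalar per segment, same ordering as segments
ax.add_collection(lc)
ax.autoscale()                # collections do not rescale the view on their own
fig.colorbar(lc, ax=ax)
```

Compared with calling plt.plot once per segment, this creates a single artist, and the colorbar comes for free because the collection carries the value-to-color mapping.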

Matplotlib -- how to retrieve polygon colors from a choropleth map

I made a choropleth map using GeoPandas and Matplotlib. I want to add a value label to each polygon of the map so that the label color contrasts with the polygon fill color (white on darker colors, black on lighter ones).
Thus, I need to know every polygon's fill color. I found a solution (see the minimal working example below).
But I suppose that a simpler and clearer solution exists, so I am posting this question in the hope of finding it with the community's help.
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from itertools import islice, pairwise
from matplotlib.collections import PatchCollection
def contrast_color(color):
    d = 0
    r, g, b = (round(x*255, 0) for x in color[:3])
    luminance = 1 - (0.299 * r + 0.587 * g + 0.114 * b) / 255
    d = 0 if luminance < 0.5 else 255
    return (d, d, d)
def get_colors(ax):
    # get the children of the axes
    # to obtain a PatchCollection
    _ = ax.axes.get_children()
    collection = _[0] # suppose it is the first element
    if not isinstance(collection, PatchCollection):
        raise TypeError("This is not Collection")
    # get information about polygon fill colors
    # .get_facecolors() returns ALL colors for ALL polygons
    # that belong to one multipolygon
    # e. g. if we have two multipolygons,
    # and the first consists of two polygons
    # and the second consists of one polygon
    # we obtain THREE colors
    poly_colors = collection.get_facecolors()
    return poly_colors.tolist()
dfm = gpd.read_file("https://gist.githubusercontent.com/ap-Codkelden/72f988e2bcc90ea3c6c9d6d989d8eb3b/raw/c91927bdb6b199c4dd6df6759200a5a1e4b820f0/obl_sample.geojson")
dfm['coords'] = [x[0] for x in dfm['geometry'].apply(lambda x: x.representative_point().coords[:])]
fig, ax = plt.subplots(1, figsize=(10, 6))
ax.axis('off')
ax.set_title('Title', fontdict={'fontsize': '12', 'fontweight' : '3'})
dfm.plot(
    ax=ax,
    column='Average', cmap='Blues_r',
    linewidth=0.5, edgecolor='k',
    scheme='FisherJenks', k=2,
    legend=True
)
out = [] # empty container for colors
# count polygons for every multipolygon
# since each one can contain more than one
poly_count = dfm.geometry.apply(lambda x: len(x.geoms)).to_list()
poly_colors = get_colors(ax)
# we need to split the list of polygon colors into sublists,
# where every sublist contains all colors for
# the polygons that belong to one multipolygon
slices = [(0, poly_count[0])] + [x for x in pairwise(np.cumsum(poly_count))]
# splitting
for s in slices:
    out.append(
        set(tuple(x) for x in islice(poly_colors, *s)),)
# remove transparency info
out = [next(iter(x))[:3] for x in out]
dfm['color'] = [tuple([y/255 for y in x]) for x in map(contrast_color, out)]
for idx, row in dfm.iterrows():
    plt.annotate(
        f"{row['reg_en']}\n{row['Average']:.2f}",
        xy=row['coords'], horizontalalignment='center',
        color=row['color'], size=9)
Desired labels are:
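The luminance rule inside contrast_color can also be written directly on Matplotlib's 0..1 RGBA tuples, which avoids the round trip through 0..255. The sketch below is only an illustration of that rule: the function name contrast_text_color is hypothetical, the weights are the ones from the code above, and mpl.colormaps assumes Matplotlib 3.5 or newer. It does not address how to read the fill colors back from the map.

```python
import matplotlib as mpl

def contrast_text_color(rgba):
    """Return 'black' or 'white', whichever contrasts with the fill color."""
    r, g, b = rgba[:3]  # components in the 0..1 range, alpha is ignored
    brightness = 0.299 * r + 0.587 * g + 0.114 * b
    # bright fill -> black text, dark fill -> white text
    return 'black' if brightness > 0.5 else 'white'

# The two ends of the colormap used in the question
cmap = mpl.colormaps['Blues_r']
dark_fill = cmap(0.0)   # darkest blue of Blues_r
light_fill = cmap(1.0)  # lightest, almost white
print(contrast_text_color(dark_fill), contrast_text_color(light_fill))
```

The returned names can be passed straight to the color argument of plt.annotate.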

How do I specify the number of axis points in matplotlib and how do I extract these points?

I have a small script that creates a matplotlib graph with 2000 random points following a random walk.
I'm wondering if there is a simple way to change the number of points on the y-axis as well as how I can extract these values?
When I run the code below, I get 5 points on the Y-axis but I'm looking for a way to expand this to 20 points as well as creating an array or series with these values. Many thanks in advance.
import matplotlib.pyplot as plt
import numpy as np
dims = 1
step_n = 2000
step_set = [-1, 0, 1]
origin = np.zeros((1,dims))
np.random.seed(30)  # seed NumPy's generator, since np.random.choice is used below
step_shape = (step_n,dims)
steps = np.random.choice(a=step_set, size=step_shape)
path = np.concatenate([origin, steps]).cumsum(0)
plt.plot(path)
import matplotlib.pyplot as plt
import numpy as np
import random
dims = 1
step_n = 2000
step_set = [-1, 0, 1]
origin = np.zeros((1,dims))
np.random.seed(30)  # random.seed would not affect np.random.choice
step_shape = (step_n,dims)
steps = np.random.choice(a=step_set, size=step_shape)
path = np.concatenate([origin, steps]).cumsum(0)
#first variant
plt.plot(path)
plt.locator_params(axis='x', nbins=20)
plt.locator_params(axis='y', nbins=20)
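You can use locator_params to control the number of ticks. Note that nbins is the maximum number of intervals, so Matplotlib picks at most nbins + 1 "nice" tick positions rather than exactly 20. You can also retrieve these positions: create the plot through an Axes object (ax) and call ax.get_yticks().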
#second variant
# create subplot
fig, ax = plt.subplots(1,1, figsize=(20, 11))
img = ax.plot(path)
plt.locator_params(axis='y', nbins=20)
y_values = ax.get_yticks() # y_values is a numpy array with your y values
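The same thing can be done by setting the locator on the y-axis explicitly. This is a sketch, not a different result: it assumes a MaxNLocator with nbins=20 is acceptable, uses NumPy's default_rng instead of the seeding above, and additionally filters the tick array to the current view limits, because the array returned by get_yticks may include locations just outside the visible range.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MaxNLocator

# Random walk similar to the one in the question
rng = np.random.default_rng(30)
steps = rng.choice([-1, 0, 1], size=(2000, 1))
path = np.concatenate([np.zeros((1, 1)), steps]).cumsum(0)

fig, ax = plt.subplots()
ax.plot(path)
# At most 20 intervals on the y-axis, i.e. at most 21 visible ticks
ax.yaxis.set_major_locator(MaxNLocator(nbins=20))

ticks = ax.get_yticks()               # numpy array of tick locations
low, high = ax.get_ylim()
visible_ticks = ticks[(ticks >= low) & (ticks <= high)]
```

visible_ticks is a plain NumPy array, so it can be turned into a pandas Series or a list if needed.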

plot a sequence of numbers with different colors

I have a random list of 0 and 1 with a length > 300. I would like to plot the list with 1 as green and 0 as red as shown in the below pic. What is the best way to do this in matplotlib?
You can use a matplotlib table:
import matplotlib.pyplot as plt
data = [0,1,0,1,1,0] # Setup data list
fig, ax = plt.subplots(figsize=(len(data)*0.5, 0.5)) # Setup figure
ax.axis("off") # Just want table, no actual plot
# Create table, with our data array as the single row, and consuming the whole figure space
t = ax.table(cellText=[data], loc="center", cellLoc="center", bbox=[0,0,1,1])
# Iterate over cells to colour them based on value
for idx, cell in t.get_celld().items():
    if data[idx[1]] == 1:
        c = 'g'
    else:
        c = 'r'
    cell.set_edgecolor(c)
    cell.set_facecolor(c)
fig.show()
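For a list with several hundred entries, a table creates one cell per value. A possible alternative, sketched here under the assumption that a strip of colored cells without text is enough, is to draw the list as a one-row image with a two-color ListedColormap. The random data is only a stand-in for the real list.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

# Stand-in for the real list of 0s and 1s
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=300)

fig, ax = plt.subplots(figsize=(12, 0.6))
cmap = ListedColormap(['red', 'green'])  # 0 -> red, 1 -> green
# One row, 300 columns; vmin/vmax pin the mapping even if a value is missing
im = ax.imshow(data[np.newaxis, :], cmap=cmap, vmin=0, vmax=1, aspect='auto')
ax.set_yticks([])  # the single row needs no y ticks
```

The x-axis then shows the index of each element, which can help with locating runs of 0s or 1s.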

How do I separate a data set on a scatter plot

I'm very new to python but am interested in learning a new technique whereby I can identify different data points in a scatter plot with different markers according to where they fall in the scatter plot.
My specific example is much like this: http://www.astroml.org/examples/datasets/plot_sdss_line_ratios.html
I have a BPT plot and want to split the data along the demarcation line.
I have a data set in this format:
data = [[a,b,c],
[a,b,c],
[a,b,c]
]
And I also have the following for the demarcation line:
NII = np.linspace(-3.0, 0.35)
def log_OIII_Hb_NII(log_NII_Ha, eps=0):
    return 1.19 + eps + 0.61 / (log_NII_Ha - eps - 0.47)
Any help would be great!
There was not enough room in the comments section. Not too dissimilar to what @DrV wrote, but maybe more astronomically inclined:
import random
import numpy as np
import matplotlib.pyplot as plt
def log_OIII_Hb_NII(log_NII_Ha, eps=0):
    return 1.19 + eps + 0.61 / (log_NII_Ha - eps - 0.47)
# Make some fake measured NII_Ha data
iternum = 100
# Ranged -2.1 to 0.4:
Measured_NII_Ha = np.array([random.random()*2.5-2.1 for i in range(iternum)])
# Ranged -1.5 to 1.5:
Measured_OIII_Hb = np.array([random.random()*3-1.5 for i in range(iternum)])
# For our measured x-value, what is our cut-off value
Measured_Predicted_OIII_Hb = log_OIII_Hb_NII(Measured_NII_Ha)
# Now compare the cut-off line to the measured emission line fluxes
# by using numpy True/False arrays
#
# i.e., x = numpy.array([1,2,3,4])
# >> index = x >= 3
# >> print(index)
# >> numpy.array([False, False, True, True])
# >> print(x[index])
# >> numpy.array([3,4])
Above_Predicted_Red_Index = Measured_OIII_Hb > Measured_Predicted_OIII_Hb
Below_Predicted_Blue_Index = Measured_OIII_Hb < Measured_Predicted_OIII_Hb
# Alternatively, you can invert Above_Predicted_Red_Index
# Make the cut-off line for a range of values for plotting it as
# a continuous line
Predicted_NII_Ha = np.linspace(-3.0, 0.35)
Predicted_log_OIII_Hb_NII = log_OIII_Hb_NII(Predicted_NII_Ha)
fig = plt.figure(0)
ax = fig.add_subplot(111)
# Plot the modelled cut-off line
ax.plot(Predicted_NII_Ha, Predicted_log_OIII_Hb_NII, color="black", lw=2)
# Plot the data for a given colour
ax.errorbar(Measured_NII_Ha[Above_Predicted_Red_Index], Measured_OIII_Hb[Above_Predicted_Red_Index], fmt="o", color="red")
ax.errorbar(Measured_NII_Ha[Below_Predicted_Blue_Index], Measured_OIII_Hb[Below_Predicted_Blue_Index], fmt="o", color="blue")
# Make it aesthetically pleasing
ax.set_ylabel(r"$\rm \log([OIII]/H\beta)$")
ax.set_xlabel(r"$\rm \log([NII]/H\alpha)$")
plt.show()
I assume you have the pixel coordinates as a, b in your example. The column with cs is then something that is used to calculate whether a point belongs to one of the two groups.
Make your data first an ndarray:
import numpy as np
data = np.array(data)
Now you may create two arrays by checking which part of the data belongs to which area:
dataselector = log_OIII_Hb_NII(data[:,2]) > 0
This creates a vector of Trues and Falses which has a True whenever the data in the third column (column 2) gives a positive value from the function. The length of the vector equals the number of rows in data.
Then you can plot the two data sets:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
# the plotting part
ax.plot(data[dataselector,0], data[dataselector,1], 'ro')
ax.plot(data[~dataselector,0], data[~dataselector,1], 'bo')
I.e.:
create a list of True/False values which tells which rows of data belong to which group
plot the two groups (~dataselector means "all the rows where there is a False in dataselector"; NumPy does not support unary minus on boolean arrays)
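Putting the two answers together for the [a, b, c] row format, here is a minimal sketch. It assumes that column 0 holds log([NII]/Ha) and column 1 holds log([OIII]/Hb), which the question does not state, and the four rows are made-up values for illustration only.

```python
import numpy as np

def log_OIII_Hb_NII(log_NII_Ha, eps=0):
    # demarcation line from the question
    return 1.19 + eps + 0.61 / (log_NII_Ha - eps - 0.47)

# Hypothetical rows: [log([NII]/Ha), log([OIII]/Hb), some third quantity]
data = np.array([
    [-1.0,  1.5, 0.1],
    [-1.0,  0.0, 0.2],
    [-0.5,  0.9, 0.3],
    [-0.2, -0.5, 0.4],
])

# True where the measured y value lies above the demarcation line at that x
above = data[:, 1] > log_OIII_Hb_NII(data[:, 0])

points_above = data[above]    # rows above the line
points_below = data[~above]   # ~ inverts the boolean mask
```

Each of the two arrays can then be passed to ax.plot or ax.scatter with its own marker.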
