graphing scatter for ranges in python's matplotlib - python

I have this graph so far, it's kind of ugly. Each type of marker is an accuracy for a type of algorithm.
There are two problems with this:
I'd like there to be space between the content and the axises, but only show ticks for y [0, 1] and have the x axis show no negative values (no such thing as negative time).
I'd like to display the x values as their log transform, but keep the original values on the ticks so you can see the actual values.
For #1 I tried playing with xticks as you can see below but without much success.
For #2, it's simple to throw the data into an np.log10() but then the axis ticks are also log transformed. I feel like there should be a simple way to do this log display (seems like a pretty normal thing to do)?
Here's my code so far:
import matplotlib as mpl
mpl.use('Agg')
import matplotlib.pyplot as plt
import numpy as np
import random
# create fake data
data = {}
data['A'] = []
data['B'] = []
data['C'] = []
n = 5
data['A'] = zip(np.random.uniform(0, 10000, size=n), np.random.uniform(0, 0.6, size=n))
data['B'] = zip(np.random.uniform(0, 200, size=n), np.random.uniform(0, 0.6, size=n))
data['C'] = zip(np.random.uniform(0, 5000, size=n), np.random.uniform(0, 0.6, size=n))
# make graph
markers = ['+', '*', 'x']
colors = ['b', 'r', 'g']
fig = plt.figure()
ax1 = fig.add_subplot(111)
plots = []
labels = []
# extract data
i = 0
for algorithm in ['A', 'B', 'C']:
results = data[algorithm]
testing = np.array([float(x[1]) for x in results if x > 0.0])
ts = np.array([int(x[0]) for x in results if x > 0.0])
color = colors[i]
marker = markers[i]
plot = ax1.scatter(ts, testing, color=color, marker=marker, s=10)
plots.append(plot)
labels.append(algorithm)
i += 1
# set axis and title
ax1.legend(plots, labels, loc='lower right')
ax1.set_xlabel("Time (sec)")
ax1.set_ylabel("Testing Accuracy")
ax1.set_title("Time versus testing accuracy")
# set axis limits
xticks, xticklabels = plt.xticks()
xmin = (3*xticks[0] - xticks[1])/2.
xmax = (3*xticks[-1] - xticks[-2])/2.
plt.xlim(xmin, xmax)
plt.xticks(xticks)
plt.ylim(0.0, 1.0)
# save to disk
plt.savefig("scatter.eps")

why you are not doing this?
ax1.set_ylim( -.1, 1.1 )
ax1.set_yticks( np.linspace(0, 1, 10) )
ax1.set_xscale('log')

Related

Matplotlib Draw a Constant y Axis

I want to use matpoltlib to make a plot that with a constant y axis(always from 0 to 14 and the gap is 1), since I want to make labels for them and my dot values will be(x, y) where y is from 0 to 14 gap 1, and a changing x axis. I already tried to play with y ticks. And here is my code for that:
fig, ax = plt.subplots()
fig.canvas.draw()
plt.yticks(np.arange(0, 14, 1))
labels = [item.get_text() for item in ax.get_yticklabels()]
labels[1] = 'Not Detected'
labels[2] = 'A/G'
labels[3] = 'G/G'
labels[4] = 'C/T'
labels[5] = 'C/C'
labels[6] = 'A/A'
labels[7] = '-1'
labels[8] = 'ε3/ε3'
labels[9] = 'A/C'
labels[10] = 'T/T'
labels[11] = 'C/G'
labels[12] = 'ε2/ε3'
labels[13] = 'G/T'
ax.set_yticklabels(labels)
what I'm thinking about is to use some values or lines with white color so those y axis will appear. But I'm looking for a more efficient way of doing it. And here is the diagram I generated with the current code. It only shows C/C right now and I want all labels to appear in the diagram.
I tried draw white points with:
x1 = np.arange(n)
y1 = np.arange(1,15,1)
plt.scatter(x1,y1,color = 'white')
Which did give me what I want: But I was wondering whether there is a lib setting that can do this.
I would recommend just using a fixed locator and fixed formatter for your y axis. The function, ax.set_yticklabels() is simply a convenience wrapper for these tick methods.
I would also recommend having your y_labels in a list or using a loop structure as this is a more generalizable and modifiable implementation.
If I'm understanding the goals of your plot correctly, something like this may work well for you.
import matplotlib.pyplot as plt
import numpy as np
import matplotlib as mpl
#make some data
x = np.arange(25)
y = np.random.randint(1, 14, size=25)
#convert y labels to a list
y_labels = [
'Not Detected','A/G','G/G','C/T','C/C','A/A',
'-1','ε3/ε3', 'A/C','T/T','C/G','ε2/ε3','G/T'
]
#define figure/ax and set figsize
fig, ax = plt.subplots(figsize=(12,8))
#plot data, s is marker size, it's points squared
ax.scatter(x, y, marker='x', s=10**2, color='#5d2287', linewidth=2)
#set major locator and formatter to fixed, add grid, hide top/right spines
locator = ax.yaxis.set_major_locator(mpl.ticker.FixedLocator(np.arange(1, 14)))
formatter = ax.yaxis.set_major_formatter(mpl.ticker.FixedFormatter(y_labels))
grid = ax.grid(axis='y', dashes=(8,3), alpha=0.3, color='gray')
spines = [ax.spines[x].set_visible(False) for x in ['top','right']]
params = ax.tick_params(labelsize=12) #increase label font size

Create a plot in pyplot with several different y-axis scale [duplicate]

Two and three dimensional data can be viewed relatively straight-forwardly using traditional plot types. Even with four dimensional data, we can often find a way to display the data. Dimensions above four, though, become increasingly difficult to display. Fortunately, parallel coordinates plots provide a mechanism for viewing results with higher dimensions.
Several plotting packages provide parallel coordinates plots, such as Matlab, R, VTK type 1 and VTK type 2, but I don't see how to create one using Matplotlib.
Is there a built-in parallel coordinates plot in Matplotlib? I certainly don't see one in the gallery.
If there is no built-in-type, is it possible to build a parallel coordinates plot using standard features of Matplotlib?
Edit:
Based on the answer provided by Zhenya below, I developed the following generalization that supports an arbitrary number of axes. Following the plot style of the example I posted in the original question above, each axis gets its own scale. I accomplished this by normalizing the data at each axis point and making the axes have a range of 0 to 1. I then go back and apply labels to each tick-mark that give the correct value at that intercept.
The function works by accepting an iterable of data sets. Each data set is considered a set of points where each point lies on a different axis. The example in __main__ grabs random numbers for each axis in two sets of 30 lines. The lines are random within ranges that cause clustering of lines; a behavior I wanted to verify.
This solution isn't as good as a built-in solution since you have odd mouse behavior and I'm faking the data ranges through labels, but until Matplotlib adds a built-in solution, it's acceptable.
#!/usr/bin/python
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
def parallel_coordinates(data_sets, style=None):
dims = len(data_sets[0])
x = range(dims)
fig, axes = plt.subplots(1, dims-1, sharey=False)
if style is None:
style = ['r-']*len(data_sets)
# Calculate the limits on the data
min_max_range = list()
for m in zip(*data_sets):
mn = min(m)
mx = max(m)
if mn == mx:
mn -= 0.5
mx = mn + 1.
r = float(mx - mn)
min_max_range.append((mn, mx, r))
# Normalize the data sets
norm_data_sets = list()
for ds in data_sets:
nds = [(value - min_max_range[dimension][0]) /
min_max_range[dimension][2]
for dimension,value in enumerate(ds)]
norm_data_sets.append(nds)
data_sets = norm_data_sets
# Plot the datasets on all the subplots
for i, ax in enumerate(axes):
for dsi, d in enumerate(data_sets):
ax.plot(x, d, style[dsi])
ax.set_xlim([x[i], x[i+1]])
# Set the x axis ticks
for dimension, (axx,xx) in enumerate(zip(axes, x[:-1])):
axx.xaxis.set_major_locator(ticker.FixedLocator([xx]))
ticks = len(axx.get_yticklabels())
labels = list()
step = min_max_range[dimension][2] / (ticks - 1)
mn = min_max_range[dimension][0]
for i in xrange(ticks):
v = mn + i*step
labels.append('%4.2f' % v)
axx.set_yticklabels(labels)
# Move the final axis' ticks to the right-hand side
axx = plt.twinx(axes[-1])
dimension += 1
axx.xaxis.set_major_locator(ticker.FixedLocator([x[-2], x[-1]]))
ticks = len(axx.get_yticklabels())
step = min_max_range[dimension][2] / (ticks - 1)
mn = min_max_range[dimension][0]
labels = ['%4.2f' % (mn + i*step) for i in xrange(ticks)]
axx.set_yticklabels(labels)
# Stack the subplots
plt.subplots_adjust(wspace=0)
return plt
if __name__ == '__main__':
import random
base = [0, 0, 5, 5, 0]
scale = [1.5, 2., 1.0, 2., 2.]
data = [[base[x] + random.uniform(0., 1.)*scale[x]
for x in xrange(5)] for y in xrange(30)]
colors = ['r'] * 30
base = [3, 6, 0, 1, 3]
scale = [1.5, 2., 2.5, 2., 2.]
data.extend([[base[x] + random.uniform(0., 1.)*scale[x]
for x in xrange(5)] for y in xrange(30)])
colors.extend(['b'] * 30)
parallel_coordinates(data, style=colors).show()
Edit 2:
Here is an example of what comes out of the above code when plotting Fisher's Iris data. It isn't quite as nice as the reference image from Wikipedia, but it is passable if all you have is Matplotlib and you need multi-dimensional plots.
pandas has a parallel coordinates wrapper:
import pandas
import matplotlib.pyplot as plt
from pandas.tools.plotting import parallel_coordinates
data = pandas.read_csv(r'C:\Python27\Lib\site-packages\pandas\tests\data\iris.csv', sep=',')
parallel_coordinates(data, 'Name')
plt.show()
Source code, how they made it: plotting.py#L494
When answering a related question, I worked out a version using only one subplot (so it can be easily fit together with other plots) and optionally using cubic bezier curves to connect the points. The plot adjusts itself to the desired number of axes.
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches
import numpy as np
fig, host = plt.subplots()
# create some dummy data
ynames = ['P1', 'P2', 'P3', 'P4', 'P5']
N1, N2, N3 = 10, 5, 8
N = N1 + N2 + N3
category = np.concatenate([np.full(N1, 1), np.full(N2, 2), np.full(N3, 3)])
y1 = np.random.uniform(0, 10, N) + 7 * category
y2 = np.sin(np.random.uniform(0, np.pi, N)) ** category
y3 = np.random.binomial(300, 1 - category / 10, N)
y4 = np.random.binomial(200, (category / 6) ** 1/3, N)
y5 = np.random.uniform(0, 800, N)
# organize the data
ys = np.dstack([y1, y2, y3, y4, y5])[0]
ymins = ys.min(axis=0)
ymaxs = ys.max(axis=0)
dys = ymaxs - ymins
ymins -= dys * 0.05 # add 5% padding below and above
ymaxs += dys * 0.05
dys = ymaxs - ymins
# transform all data to be compatible with the main axis
zs = np.zeros_like(ys)
zs[:, 0] = ys[:, 0]
zs[:, 1:] = (ys[:, 1:] - ymins[1:]) / dys[1:] * dys[0] + ymins[0]
axes = [host] + [host.twinx() for i in range(ys.shape[1] - 1)]
for i, ax in enumerate(axes):
ax.set_ylim(ymins[i], ymaxs[i])
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
if ax != host:
ax.spines['left'].set_visible(False)
ax.yaxis.set_ticks_position('right')
ax.spines["right"].set_position(("axes", i / (ys.shape[1] - 1)))
host.set_xlim(0, ys.shape[1] - 1)
host.set_xticks(range(ys.shape[1]))
host.set_xticklabels(ynames, fontsize=14)
host.tick_params(axis='x', which='major', pad=7)
host.spines['right'].set_visible(False)
host.xaxis.tick_top()
host.set_title('Parallel Coordinates Plot', fontsize=18)
colors = plt.cm.tab10.colors
for j in range(N):
# to just draw straight lines between the axes:
# host.plot(range(ys.shape[1]), zs[j,:], c=colors[(category[j] - 1) % len(colors) ])
# create bezier curves
# for each axis, there will a control vertex at the point itself, one at 1/3rd towards the previous and one
# at one third towards the next axis; the first and last axis have one less control vertex
# x-coordinate of the control vertices: at each integer (for the axes) and two inbetween
# y-coordinate: repeat every point three times, except the first and last only twice
verts = list(zip([x for x in np.linspace(0, len(ys) - 1, len(ys) * 3 - 2, endpoint=True)],
np.repeat(zs[j, :], 3)[1:-1]))
# for x,y in verts: host.plot(x, y, 'go') # to show the control points of the beziers
codes = [Path.MOVETO] + [Path.CURVE4 for _ in range(len(verts) - 1)]
path = Path(verts, codes)
patch = patches.PathPatch(path, facecolor='none', lw=1, edgecolor=colors[category[j] - 1])
host.add_patch(patch)
plt.tight_layout()
plt.show()
Here's similar code for the iris data set. The second axis is reversed to avoid some crossing lines.
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
ynames = iris.feature_names
ys = iris.data
ymins = ys.min(axis=0)
ymaxs = ys.max(axis=0)
dys = ymaxs - ymins
ymins -= dys * 0.05 # add 5% padding below and above
ymaxs += dys * 0.05
ymaxs[1], ymins[1] = ymins[1], ymaxs[1] # reverse axis 1 to have less crossings
dys = ymaxs - ymins
# transform all data to be compatible with the main axis
zs = np.zeros_like(ys)
zs[:, 0] = ys[:, 0]
zs[:, 1:] = (ys[:, 1:] - ymins[1:]) / dys[1:] * dys[0] + ymins[0]
fig, host = plt.subplots(figsize=(10,4))
axes = [host] + [host.twinx() for i in range(ys.shape[1] - 1)]
for i, ax in enumerate(axes):
ax.set_ylim(ymins[i], ymaxs[i])
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
if ax != host:
ax.spines['left'].set_visible(False)
ax.yaxis.set_ticks_position('right')
ax.spines["right"].set_position(("axes", i / (ys.shape[1] - 1)))
host.set_xlim(0, ys.shape[1] - 1)
host.set_xticks(range(ys.shape[1]))
host.set_xticklabels(ynames, fontsize=14)
host.tick_params(axis='x', which='major', pad=7)
host.spines['right'].set_visible(False)
host.xaxis.tick_top()
host.set_title('Parallel Coordinates Plot — Iris', fontsize=18, pad=12)
colors = plt.cm.Set2.colors
legend_handles = [None for _ in iris.target_names]
for j in range(ys.shape[0]):
# create bezier curves
verts = list(zip([x for x in np.linspace(0, len(ys) - 1, len(ys) * 3 - 2, endpoint=True)],
np.repeat(zs[j, :], 3)[1:-1]))
codes = [Path.MOVETO] + [Path.CURVE4 for _ in range(len(verts) - 1)]
path = Path(verts, codes)
patch = patches.PathPatch(path, facecolor='none', lw=2, alpha=0.7, edgecolor=colors[iris.target[j]])
legend_handles[iris.target[j]] = patch
host.add_patch(patch)
host.legend(legend_handles, iris.target_names,
loc='lower center', bbox_to_anchor=(0.5, -0.18),
ncol=len(iris.target_names), fancybox=True, shadow=True)
plt.tight_layout()
plt.show()
I'm sure there is a better way of doing it, but here's a quick-and-dirty one (a really dirty one):
#!/usr/bin/python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
#vectors to plot: 4D for this example
y1=[1,2.3,8.0,2.5]
y2=[1.5,1.7,2.2,2.9]
x=[1,2,3,8] # spines
fig,(ax,ax2,ax3) = plt.subplots(1, 3, sharey=False)
# plot the same on all the subplots
ax.plot(x,y1,'r-', x,y2,'b-')
ax2.plot(x,y1,'r-', x,y2,'b-')
ax3.plot(x,y1,'r-', x,y2,'b-')
# now zoom in each of the subplots
ax.set_xlim([ x[0],x[1]])
ax2.set_xlim([ x[1],x[2]])
ax3.set_xlim([ x[2],x[3]])
# set the x axis ticks
for axx,xx in zip([ax,ax2,ax3],x[:-1]):
axx.xaxis.set_major_locator(ticker.FixedLocator([xx]))
ax3.xaxis.set_major_locator(ticker.FixedLocator([x[-2],x[-1]])) # the last one
# EDIT: add the labels to the rightmost spine
for tick in ax3.yaxis.get_major_ticks():
tick.label2On=True
# stack the subplots together
plt.subplots_adjust(wspace=0)
plt.show()
This is essentially based on a (much nicer) one by Joe Kingon, Python/Matplotlib - Is there a way to make a discontinuous axis?. You might also want to have a look at the other answer to the same question.
In this example I don't even attempt at scaling the vertical scales, since it depends on what exactly you are trying to achieve.
EDIT: Here is the result
When using pandas (like suggested by theta), there is no way to scale the axes independently.
The reason you can't find the different vertical axes is because there aren't any. Our parallel coordinates is "faking" the other two axes by just drawing a vertical line and some labels.
https://github.com/pydata/pandas/issues/7083#issuecomment-74253671
I've adapted the #JohanC code to a pandas dataframe and expanded it to also work with categorical variables. The code needs more improving, like being able to put also a numerical variable as the first one in the dataframe, but I think it is nice for now.
# Paths:
path_data = "data/"
# Packages:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from matplotlib.path import Path
import matplotlib.patches as patches
from functools import reduce
# Display options:
pd.set_option("display.width", 1200)
pd.set_option("display.max_columns", 300)
pd.set_option("display.max_rows", 300)
# Dataset:
df = pd.read_csv(path_data + "nasa_exoplanets.csv")
df_varnames = pd.read_csv(path_data + "nasa_exoplanets_var_names.csv")
# Variables (the first variable must be categoric):
my_vars = ["discoverymethod", "pl_orbper", "st_teff", "disc_locale", "sy_gaiamag"]
my_vars_names = reduce(pd.DataFrame.append,
map(lambda i: df_varnames[df_varnames["var"] == i], my_vars))
my_vars_names = my_vars_names["var_name"].values.tolist()
# Adapt the data:
df = df.loc[df["pl_letter"] == "d"]
df_plot = df[my_vars]
df_plot = df_plot.dropna()
df_plot = df_plot.reset_index(drop = True)
# Convert to numeric matrix:
ym = []
dics_vars = []
for v, var in enumerate(my_vars):
if df_plot[var].dtype.kind not in ["i", "u", "f"]:
dic_var = dict([(val, c) for c, val in enumerate(df_plot[var].unique())])
dics_vars += [dic_var]
ym += [[dic_var[i] for i in df_plot[var].tolist()]]
else:
ym += [df_plot[var].tolist()]
ym = np.array(ym).T
# Padding:
ymins = ym.min(axis = 0)
ymaxs = ym.max(axis = 0)
dys = ymaxs - ymins
ymins -= dys*0.05
ymaxs += dys*0.05
# Reverse some axes for better visual:
axes_to_reverse = [0, 1]
for a in axes_to_reverse:
ymaxs[a], ymins[a] = ymins[a], ymaxs[a]
dys = ymaxs - ymins
# Adjust to the main axis:
zs = np.zeros_like(ym)
zs[:, 0] = ym[:, 0]
zs[:, 1:] = (ym[:, 1:] - ymins[1:])/dys[1:]*dys[0] + ymins[0]
# Colors:
n_levels = len(dics_vars[0])
my_colors = ["#F41E1E", "#F4951E", "#F4F01E", "#4EF41E", "#1EF4DC", "#1E3CF4", "#F41EF3"]
cmap = LinearSegmentedColormap.from_list("my_palette", my_colors)
my_palette = [cmap(i/n_levels) for i in np.array(range(n_levels))]
# Plot:
fig, host_ax = plt.subplots(
figsize = (20, 10),
tight_layout = True
)
# Make the axes:
axes = [host_ax] + [host_ax.twinx() for i in range(ym.shape[1] - 1)]
dic_count = 0
for i, ax in enumerate(axes):
ax.set_ylim(
bottom = ymins[i],
top = ymaxs[i]
)
ax.spines.top.set_visible(False)
ax.spines.bottom.set_visible(False)
ax.ticklabel_format(style = 'plain')
if ax != host_ax:
ax.spines.left.set_visible(False)
ax.yaxis.set_ticks_position("right")
ax.spines.right.set_position(
(
"axes",
i/(ym.shape[1] - 1)
)
)
if df_plot.iloc[:, i].dtype.kind not in ["i", "u", "f"]:
dic_var_i = dics_vars[dic_count]
ax.set_yticks(
range(len(dic_var_i))
)
ax.set_yticklabels(
[key_val for key_val in dics_vars[dic_count].keys()]
)
dic_count += 1
host_ax.set_xlim(
left = 0,
right = ym.shape[1] - 1
)
host_ax.set_xticks(
range(ym.shape[1])
)
host_ax.set_xticklabels(
my_vars_names,
fontsize = 14
)
host_ax.tick_params(
axis = "x",
which = "major",
pad = 7
)
# Make the curves:
host_ax.spines.right.set_visible(False)
host_ax.xaxis.tick_top()
for j in range(ym.shape[0]):
verts = list(zip([x for x in np.linspace(0, len(ym) - 1, len(ym)*3 - 2,
endpoint = True)],
np.repeat(zs[j, :], 3)[1: -1]))
codes = [Path.MOVETO] + [Path.CURVE4 for _ in range(len(verts) - 1)]
path = Path(verts, codes)
color_first_cat_var = my_palette[dics_vars[0][df_plot.iloc[j, 0]]]
patch = patches.PathPatch(
path,
facecolor = "none",
lw = 2,
alpha = 0.7,
edgecolor = color_first_cat_var
)
host_ax.add_patch(patch)
plotly has a nice interactive solution called parallel_coordinates which works just fine:
import plotly.express as px
df = px.data.iris()
fig = px.parallel_coordinates(df, color="species_id", labels={"species_id": "Species",
"sepal_width": "Sepal Width", "sepal_length": "Sepal Length",
"petal_width": "Petal Width", "petal_length": "Petal Length", },
color_continuous_scale=px.colors.diverging.Tealrose,
color_continuous_midpoint=2)
fig.show()
I want to plug a beta-released parallel coordinate plotting package called Paxplot which is based on Matplotlib. It uses similar underlying logic to the other answers and extends functionality while maintaining clean usage.
The documentation provides examples of basic usage, advanced usage, and usage with Pandas. As per the figure provided in the original question, I have provided a solution that plots the iris dataset:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
import paxplot
# Import data
iris = load_iris(as_frame=True)
df = pd.DataFrame(
data=np.c_[iris['data'], iris['target']],
columns=iris['feature_names'] + ['target']
)
cols = df.columns
# Create figure
paxfig = paxplot.pax_parallel(n_axes=len(cols))
paxfig.plot(df.to_numpy())
# Add labels
paxfig.set_labels(cols)
# Set ticks
paxfig.set_ticks(
ax_idx=-1,
ticks=[0, 1, 2],
labels=iris.target_names
)
# Add colorbar
color_col = 0
paxfig.add_colorbar(
ax_idx=color_col,
cmap='viridis',
colorbar_kwargs={'label': cols[color_col]}
)
plt.show()
For full disclosure, I created Paxplot and have been developing and maintaining it with some friends. Definitely feel free to reach out if you are interested in contributing!
Best example I've seen thus far is this one
https://python.g-node.org/python-summerschool-2013/_media/wiki/datavis/olympics_vis.py
See the normalised_coordinates function. Not super fast, but works from what I've tried.
normalised_coordinates(['VAL_1', 'VAL_2', 'VAL_3'], np.array([[1230.23, 1500000, 12453.03], [930.23, 140000, 12453.03], [130.23, 120000, 1243.03]]), [1, 2, 1])
Still far from perfect but it works and is relatively short:
import numpy as np
import matplotlib.pyplot as plt
def plot_parallel(data,labels):
data=np.array(data)
x=list(range(len(data[0])))
fig, axis = plt.subplots(1, len(data[0])-1, sharey=False)
for d in data:
for i, a in enumerate(axis):
temp=d[i:i+2].copy()
temp[1]=(temp[1]-np.min(data[:,i+1]))*(np.max(data[:,i])-np.min(data[:,i]))/(np.max(data[:,i+1])-np.min(data[:,i+1]))+np.min(data[:,i])
a.plot(x[i:i+2], temp)
for i, a in enumerate(axis):
a.set_xlim([x[i], x[i+1]])
a.set_xticks([x[i], x[i+1]])
a.set_xticklabels([labels[i], labels[i+1]], minor=False, rotation=45)
a.set_ylim([np.min(data[:,i]),np.max(data[:,i])])
plt.subplots_adjust(wspace=0)
plt.show()
This is a version using TensorBoard, if not strictly need matplotlib figure.
I'm looking around for something works like Visualize the results in TensorBoard's HParams plugin result. Here is a wrapped function just plotting ignoring training in that tutorial, using TensorBoard. The logic is using metrics_name specified key as metrics, using other columns as HParams. For any other detail, refer original tutorial.
import os
import json
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp
def tensorboard_parallel_coordinates_plot(dataframe, metrics_name, metrics_display_name=None, skip_columns=[], log_dir='logs/hparam_tuning'):
skip_columns = skip_columns + [metrics_name]
to_hp_discrete = lambda column: hp.HParam(column, hp.Discrete(np.unique(dataframe[column].values).tolist()))
hp_params_dict = {column: to_hp_discrete(column) for column in dataframe.columns if column not in skip_columns}
if dataframe[metrics_name].values.dtype == 'object': # Not numeric
metrics_map = {ii: id for id, ii in enumerate(np.unique(dataframe[metrics_name]))}
description = json.dumps(metrics_map)
else:
metrics_map, description = None, None
METRICS = metrics_name if metrics_display_name is None else metrics_display_name
with tf.summary.create_file_writer(log_dir).as_default():
metrics = [hp.Metric(METRICS, display_name=METRICS, description=description)]
hp.hparams_config(hparams=list(hp_params_dict.values()), metrics=metrics)
for id in dataframe.index:
log = dataframe.iloc[id]
hparams = {hp_unit: log[column] for column, hp_unit in hp_params_dict.items()}
print({hp_unit.name: hparams[hp_unit] for hp_unit in hparams})
run_dir = os.path.join(log_dir, 'run-%d' % id)
with tf.summary.create_file_writer(run_dir).as_default():
hp.hparams(hparams) # record the values used in this trial
metric_item = log[metrics_name] if metrics_map is None else metrics_map[log[metrics_name]]
tf.summary.scalar(METRICS, metric_item, step=1)
print()
if metrics_map is not None:
print("metrics_map:", metrics_map)
print("Start tensorboard by: tensorboard --logdir {}".format(log_dir))
Plotting test:
aa = pd.read_csv('https://raw.github.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/iris.csv')
tensorboard_parallel_coordinates_plot(aa, metrics_name="Name", log_dir="logs/iris")
# metrics_map: {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
# Start tensorboard by: tensorboard --logdir logs/iris
!tensorboard --logdir logs/iris
# TensorBoard 2.8.0 at http://localhost:6006/ (Press CTRL+C to quit)
Open tesnorboard link, default http://localhost:6006/, go to HPARAMS -> PARALLEL COORDINATES VIEW will show the result:
TensorBoard result is interactive. But this is designed for plotting model hyper parameters tuning results, so I think it's not friendly for plotting large dataset.
You have to clean saved data manually if plotting new data in same log_dir directory.
It seems the final metrics item has to be numeric, while other axes don't have to.
fake_data = {
"optimizer": ["sgd", "adam", "adam", "lamb", "lamb", "lamb", "lamb"],
"weight_decay": [0.1, 0.1, 0.2, 0.1, 0.2, 0.2, 0.3],
"rescale_mode": ["tf", "tf", "tf", "tf", "tf", "torch", "torch"],
"accuracy": [78.5, 78.2, 78.8, 79.2, 79.3, 79.5, 79.6],
}
aa = pd.DataFrame(fake_data)
tensorboard_parallel_coordinates_plot(aa, "accuracy", log_dir="logs/fake")
# Start tensorboard by: tensorboard --logdir logs/fake
!tensorboard --logdir logs/fake
# TensorBoard 2.8.0 at http://localhost:6006/ (Press CTRL+C to quit)

Gradient fill underneath each histogram curve - Python

I am trying to incorporate a gradient fill with multiple histograms using seaborn facet grid where the gradient is determined by the spread of values under each curve, not just by a sequence of row or col using hue. There are some links below that partly perform somewhat similar functions in python:
How to fill histogram with gradient color fills a diverging gradient but each histogram is independent of the others so comparison between histograms is somewhat void. Using the figure below each histogram should be relative to the others. Furthermore, it does not use the seaborn facet grid, which is the central question here.
How to generate series of histograms doesn't plot histograms. It just fills the area under a curve.
I've found a few images displaying what I'm hoping to execute but they all seem to be generated in R with nothing in python. My assumption is the functionality doesn't exist as yet using seaborn and I'll have to use R but I think this will be applicable for many users.
Using the code below, we can change adjust the gradient using hue to either row or col but this doesn't consider the area under the curve.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Create the data
rs = np.random.RandomState(1979)
x = rs.randn(120)
g = np.tile(list("ABCD"), 30)
h = np.tile(list("XYZ"), 40)
# Generate df
df = pd.DataFrame(dict(x = x, g = g, h = h))
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(4, rot = -0.25, light = 0.7)
g = sns.FacetGrid(df, col = 'h', hue = 'h', row = 'g', aspect = 3, height= 1, palette = pal)
# Draw the densities
g = g.map(sns.kdeplot, 'x', shade = True, alpha = 0.8, lw = 1, bw = 0.8)
g = g.map(sns.kdeplot, 'x', color= 'w', lw = 1, bw = 0.8)
g = g.map(plt.axhline, y = 0, lw = 1)
# Adjust title and axis labels directly
g.axes[0,0].set_ylabel('L 1')
g.axes[1,0].set_ylabel('L 2')
g.axes[2,0].set_ylabel('L 3')
g.axes[3,0].set_ylabel('L 4')
g.axes[0,0].set_title('Top 1')
g.axes[0,1].set_title('Top 2')
g.axes[0,2].set_title('Top 3')
g.axes[1,0].set_title('')
g.axes[1,1].set_title('')
g.axes[1,2].set_title('')
g.axes[2,0].set_title('')
g.axes[2,1].set_title('')
g.axes[2,2].set_title('')
g.axes[3,0].set_title('')
g.axes[3,1].set_title('')
g.axes[3,2].set_title('')
g.set_axis_labels(x_var = 'Total Amount')
g.set(yticks = [])
Out:
There is a gradient that can be adjusted for row or col but I'm hoping to pass this gradient to the area underneath each histogram curve. Similar to the figure above. So the area underneath each curve would be lighter when lower than zero and darker when higher than zero.
Even adjusting the area under the curve to the median value may suffice.
You can create an image gradient, and use the histogram itself as a clipping path for the image, so that the only visible part is the part under the curve.
As such, you can play around with any cmaps and normalization that are available when creating images.
Here is a quick example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Create the data
rs = np.random.RandomState(1979)
x = rs.randn(120)
g = np.tile(list("ABCD"), 30)
h = np.tile(list("XYZ"), 40)
# Generate df
df = pd.DataFrame(dict(x = x, g = g, h = h))
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(4, rot = -0.25, light = 0.7)
g = sns.FacetGrid(df, col = 'h', hue = 'h', row = 'g', aspect = 3, height= 1, palette = pal)
# Draw the densities
g = g.map(sns.kdeplot, 'x', shade = True, alpha = 0.8, lw = 1, bw = 0.8)
g = g.map(sns.kdeplot, 'x', color= 'w', lw = 1, bw = 0.8)
g = g.map(plt.axhline, y = 0, lw = 1)
for ax in g.axes.flat:
ax.set_title("")
# Adjust title and axis labels directly
for i in range(4):
g.axes[i,0].set_ylabel('L {:d}'.format(i))
for i in range(3):
g.axes[0,i].set_title('Top {:d}'.format(i))
# generate a gradient
cmap = 'coolwarm'
x = np.linspace(0,1,100)
for ax in g.axes.flat:
im = ax.imshow(np.vstack([x,x]), aspect='auto', extent=[*ax.get_xlim(), *ax.get_ylim()], cmap=cmap, zorder=10)
path = ax.collections[0].get_paths()[0]
patch = matplotlib.patches.PathPatch(path, transform=ax.transData)
im.set_clip_path(patch)
g.set_axis_labels(x_var = 'Total Amount')
g.set(yticks = [])

How to use dates in this code for y axis?

The person who made this had used dates in the second graph. I was wondering how would dates be used with the scipy.signal.argrelextrema function.
With this code it doesn't do anything it prints out an empty array for peak_x and peak_y:
data_y = np.array('2015-07-04', dtype=np.datetime64) + np.arange(25)
Here's the link for the original code:
https://openwritings.net/pg/python/python-find-peaks-and-valleys-chart-using-scipysignalargrelextrema
import matplotlib
matplotlib.use('Agg') # Bypass the need to install Tkinter GUI framework
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
# Generate random data.
data_x = np.arange(start = 0, stop = 25, step = 1, dtype='int')
data_y = np.array('2015-07-04', dtype=np.datetime64) + np.arange(25) #edited part
# Find peaks(max).
peak_indexes = signal.argrelextrema(data_y, np.greater)
peak_indexes = peak_indexes[0]
# Find valleys(min).
valley_indexes = signal.argrelextrema(data_y, np.less)
valley_indexes = valley_indexes[0]
# Plot main graph.
(fig, ax) = plt.subplots()
ax.plot(data_x, data_y)
# Plot peaks.
peak_x = peak_indexes
peak_y = data_y[peak_indexes]
ax.plot(peak_x, peak_y, marker='o', linestyle='dashed', color='green', label="Peaks")
print(peak_x,peak_y)
# Plot valleys.
valley_x = valley_indexes
valley_y = data_y[valley_indexes]
ax.plot(valley_x, valley_y, marker='o', linestyle='dashed', color='red', label="Valleys")
# Save graph to file.
plt.title('Find peaks and valleys using argrelextrema()')
plt.legend(loc='best')
plt.savefig('argrelextrema.png')
Here's the example how it would work:
You're going to want to use the xticks method. See below:
import matplotlib.pyplot as plt
names = [str(i) for i in range(20)]
x_data = [x for x in range(20)]
y_data = [x for x in range(20)]
plt.plot(x_data, y_data)
plt.xticks(x_data, label=names)
plt.show()
What this does is use an integer between 1-19 cast as a string as the label for the axis X.
Except in your case you want to swap out the names for datatime objects cast to strings. For the xticks, the x_data element prescribes where the ticks will be. You may use any interval of points so long as they are within the bounds of the xdata.
In your case, replace:
data_y = np.array('2015-07-04', dtype=np.datetime64) + np.arange(25)
with
data_y_ticks = np.array('2015-07-04', dtype=np.datetime64) + np.arange(25)
data_y = [i for i, _ in enumerate(data_y_ticks.tolist())]
then plot as follows:
plt.plot(data_y, x_data)
plt.xticks(data_y, label=data_y_ticks)
plt.show()
Just a heads-up, your X and Y axis names are flipped in your code. I did not correct this in my example, however did interchange their locations in the plot to make the plot make sense.

python: scatter plot with median and CI

I am looking for a python plot on the lines of http://www.r-bloggers.com/visually-weighted-watercolor-plots-new-variants-please-vote/
This gives the equivalent of the standard deviation bands:
# generate random variables
x,y = generate_random()
# bin the values and determine the envelopes
df = bin_by(x, y, nbins=25, bins = None)
###
# Plot 1
###
# determine the colors
cols = ['#EE7550', '#F19463', '#F6B176']
with plt.style.context('fivethirtyeight'):
# plot the 3rd stdv
plt.fill_between(df.x, df['5th'], df['95th'], alpha=0.7,color = cols[2])
plt.fill_between(df.x, df['10th'], df['90th'], alpha=0.7,color = cols[1])
plt.fill_between(df.x, df['25th'], df['75th'], alpha=0.7,color = cols[0])
# plt the line
plt.plot(df.x, df['median'], color = '1', alpha = 0.7, linewidth = 1)
# plot the points
plt.scatter(x, y, facecolors='white', edgecolors='0', s = 5, lw = 0.7)
plt.savefig('fig1.png', facecolor='white', edgecolor='none')
plt.show()
def bin_by(x, y, nbins=30, bins = None):
"""
Divide the x axis into sections and return groups of y based on its x value
"""
if bins is None:
bins = np.linspace(x.min(), x.max(), nbins)
bin_space = (bins[-1] - bins[0])/(len(bins)-1)/2
indicies = np.digitize(x, bins + bin_space)
Bit of a discussion and link to my Github from my blog
cut-paste from my larger piece of code. It does not give what I want. I am posting per Evert's suggestion
fig = plt.figure(figsize=(8, 8))
plt.plot(xlist, ylist, 'b,')
plt.plot([0.0,0.8],[0.0,0.8],'y-')
data2d=zip(xlist,ylist)
bins = np.linspace(0.0, 0.2, 21)
medianlist=binpercentile(data2d,bins)
c10list=binpercentile(data2d,bins,0.1)
c90list=binpercentile(data2d,bins,0.9)
centerbins=[(x+y)/2.0 for x,y in zip(bins[:-1],bins[1:])]
centerbins.insert(0,0)
medianlist.insert(0,0)
c10list.insert(0,0)
c90list.insert(0,0)
plt.plot(centerbins,c10list,'r--')
plt.plot(centerbins,c90list,'r--')
plt.plot(centerbins,medianlist,'r-')
imagefilename='%s.%s'%('.'.join(infile.split('.')[0:-1]),'diffmed.pdf')
plt.savefig(imagefilename)

Categories

Resources