Plotting discrete X/Y values over Seaborn heatmap - python

I'm trying to mark a specific X/Y coordinate on a Seaborn heatmap. The output is plotting the correct X coordinate but is plotting the Y as zero/axis minimum. The code and its heatmap output are shown below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

speeds = pd.read_csv('pytest20140730.csv')
speedspiv = pd.pivot_table(speeds, values='speed', index='road_order',
                           columns='tod_value', aggfunc=np.mean)
speedspiv = speedspiv.sort_index(axis=0, ascending=False)
plt.subplots(figsize=(20, 15))
ax = sns.heatmap(speedspiv)
ax.scatter(414, 0.244444444, marker='*', s=100, color='yellow')
See yellow star on x-axis of the following image
What's up with the Y axis value not plotting correctly?

A seaborn heatmap produces a plot whose x- and y-axes are index based, not data based. That means that even though the y-axis is labeled 382...453, it actually runs from 0 to 71.
To test this, just print the result of ax.get_xlim() and ax.get_ylim().
To put a marker at a meaningful place in the heatmap, you therefore have to compute the correct (fractional) indices of the marker's x- and y-values and use these in the scatter plot:
x_vals = np.linspace(0, 1, 1500) # I don't know the real range of your x-data
x_idx = np.interp(.24, x_vals, range(len(x_vals)))
y_vals = np.arange(382, 453)
y_idx = np.interp(414, y_vals, range(len(y_vals)))
In: x_idx, y_idx
Out: (359.76, 32.0)
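As a rough sketch, the same idea can be wrapped in a small helper. The function name heatmap_position is hypothetical and the label ranges are placeholders like the ones above. The + 0.5 assumes that the heatmap draws cell i between i and i + 1, so that its centre sits at i + 0.5. The helper also accepts labels in descending order, which is how the rows end up after the sort_index(ascending=False) call in the question.

```python
import numpy as np

def heatmap_position(value, labels):
    """Map a data value to a fractional heatmap coordinate (hypothetical helper).

    labels: the row or column labels in the order in which they are drawn.
    """
    labels = np.asarray(labels, dtype=float)
    idx = np.arange(len(labels))
    if labels[0] > labels[-1]:
        # np.interp needs increasing sample points, so flip both arrays.
        labels, idx = labels[::-1], idx[::-1]
    # Assumption: cell i spans [i, i + 1], so its centre is at i + 0.5.
    return np.interp(value, labels, idx) + 0.5

rows = np.arange(452, 381, -1)    # placeholder row labels, descending
cols = np.linspace(0, 1, 1500)    # placeholder column labels
y_pos = heatmap_position(414, rows)
x_pos = heatmap_position(0.24, cols)

# With a heatmap axis `ax` at hand, the marker could then be drawn with:
# ax.scatter(x_pos, y_pos, marker='*', s=100, color='yellow')
```

Note that x goes first in the scatter call: the column (tod_value) position is the x coordinate and the row (road_order) position is the y coordinate.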

Related

Python: scatter plot with non-linear x axis

I have data with lots of x values around zero and only a few as you go up to around 950.
I want to create a plot with a non-linear x axis so that the relationship can be seen in a 'straight line' form, as in this example.
I have tried using plt.xscale('log') but it does not achieve what I want.
I have not been able to use the log scale function with a scatter plot as it then only shows 3 values rather than the thousands that exist.
I have tried to work around it using
plt.plot(retper, aep_NW[y], marker='o', linewidth=0)
to replicate the scatter function which plots but does not show what I want.
plt.figure(1)
plt.scatter(rp,aep,label="SSI sum")
plt.show()
Image 3:
plt.figure(3)
plt.scatter(rp, aep)
plt.xscale('log')
plt.show()
Image 4:
plt.figure(4)
plt.plot(rp, aep, marker='o', linewidth=0)
plt.xscale('log')
plt.show()
ADDITION:
Hi thank you for the response.
I think you are right that my x axis is truncated but I'm not sure why or how...
I'm not really sure what to post code wise as the data is all large and coming from a server so can't really give you the data to see it with.
Basically aep_NW is a one dimensional array with 951 elements, values from 0-~140, with most values being small and only a few larger values. The data represents a storm severity index for 951 years.
Then I want the x axis to be the return period for these values, so basically I made an rp array of the same size, which is given values from 951 down, decreasing by a half each time.
I then sort the aep_NW values from lowest to highest, with the highest value being associated with the largest return period (951), then the second highest aep_NW value associated with the second largest return period value (475.5), etc.
So then when I plot it I need the x axis scale to be similar to the example you showed above or the first image I attached originally.
rp = [0]*numseas.shape[0]
i = numseas.shape[0] - 1
rp[i] = numseas.shape[0]
i = i - 1
while i != 0:
    rp[i] = rp[i+1]/2
    i = i - 1
y = np.argsort(aep_NW)
fig, ax = plt.subplots()
ax.scatter(rp,aep_NW[y],label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period")
ax.set_ylabel("SSI score")
plt.title("AEP for NW Europe: total loss per entire extended winter season")
plt.show()
It looks like in your "Image 3" the x axis is truncated, so that you don't see the data you are interested in. It appears this is due to there being 0's in your 'rp' array. I updated the examples to show the error you are seeing, one way to exclude the zeros, and one way to clip them and show them on a different scale.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
n = 100
numseas = np.logspace(-5, 3, n)
aep_NW = np.linspace(0, 140, n)
rp = [0]*numseas.shape[0]
i = numseas.shape[0] - 1
rp[i] = numseas.shape[0]
i = i - 1
while i != 0:
    rp[i] = rp[i+1] / 2
    i = i - 1
y = np.argsort(aep_NW)
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
ax = axes[0]
ax.scatter(rp, aep_NW[y], label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period")
ax.set_ylabel("SSI score")
ax = axes[1]
rp = np.array(rp)  # already in ascending order, so it pairs with aep_NW[y] as is
mask = rp > 0
ax.scatter(rp[mask], aep_NW[y][mask], label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period (0 values excluded)")
ax = axes[2]
log2_clipped_rp = np.log2(rp.clip(2**-100, None))
ax.scatter(log2_clipped_rp, aep_NW[y], label="SSI sum")
xticks = list(range(-110, 11, 20))
xticklabels = [f'$2^{{{i}}}$' for i in xticks]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
ax.set_xlabel("log$_2$ Return period (values clipped to 2$^{-100}$)")
plt.show()
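The zero comes from the while loop stopping before index 0, so rp[0] keeps its initial value. A minimal sketch of one way to avoid it, assuming the intent is that every element is half of the next one: build the array in one vectorized step. Whether halving is the right definition of a return period is a separate question that this sketch does not address.

```python
import numpy as np

n = 951  # number of seasons, as described in the question

# rp[-1] == n, and each earlier element is half of the one after it.
# Every element is strictly positive, so a log axis can show all of them.
rp = n * 0.5 ** np.arange(n - 1, -1, -1)
```

The smallest values are extremely close to zero, so excluding or clipping them as shown above may still be useful for readability.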

Python pcolor plot with two x-axis descriptions

The dataset I'm working with consists of wind velocity observations from a satellite. I have 2D arrays of dimension (192,24), where 24 is the number of vertical range bins: wind_velocity(192,24), latitude(192,24), longitude(192,24), altitude(192,24),...
To get an overview I would like to plot the data. I think the right tool for plotting 2D arrays with Python is pcolor. The aim is a figure with latitude and longitude on the x-axis, the height on the y-axis, and the wind data shown as color (the wind_velocity variable is named hlos here).
wind_min, wind_max = hlos.min(), hlos.max()
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
c1 = ax1.pcolor(lats, z, hlos, cmap='RdBu' , vmin = wind_min, vmax = wind_max)
ax1.set_title('Rayleigh HLOS')
ax1.axis([lats.min(), lats.max(), z.min(), z.max()])
fig.colorbar(c1, ax=ax1)
How can I add the longitude values so that it looks like the subplot down left on this example: https://www.ecmwf.int/sites/default/files/Aeolus-Blog-image1-1000pix.jpg ?

Plotting two y axes on the same scale on an implot (matplotlib)

I have an image plot, representing a matrix, with two axes. The y axis on the left of my image plot represents the rows and the x axis represents the columns, while each grid cell represents the value as a function of x and y.
I'd like to plot my y-axis in another form on the right side of my image plot, which takes on much smaller values, but should still be in the same positions as the y-axis on the left, as the values are just different forms of one another. The problem is that when I use fig5ax.twinx() and go to plot the y axis, it doesn't even show up! Does anyone know what's going on? Thanks.
Current code:
# Setup the figure
fig5 = pyplot.figure(5, figsize=(10,9), facecolor='white')
pyplot.gcf().clear()
# plt.rc('xtick', labelsize=20)
# plt.rc('ytick', labelsize=20)
plt.rcParams.update({'font.size': 18})
fig5ax = pyplot.axes()
# Code to calculate extent based on min/max range and my x values
implot = pyplot.imshow(valgrid, extent=MyExtent , aspect='auto', vmin = myVmin, vmax = myVmax)
fig5ax.yaxis.set_major_formatter(plt.ticker.FixedFormatter([str(x) for x in ranges]))
fig5ax.yaxis.set_major_locator(plt.ticker.FixedLocator(ranges))
fig5Ax2 = fig5ax.twinx()
fig5Ax2.yaxis.set_major_formatter(plt.ticker.FixedFormatter([str(x) for x in time]))
# Setting locater the same as ranges, because I want them to line up
# as they are different forms of the same y value
fig5Ax2.yaxis.set_major_locator(plt.ticker.FixedLocator(ranges))
pyplot.show()
The answer was:
fig5Ax2.yaxis.set_view_interval(minRange, maxRange)
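The reason this helps is that twinx() shares only the x axis, so the new axes start with their own default y range and the fixed tick positions fall outside it. A minimal sketch of the same fix using set_ylim, which should have a similar effect; the data, the extent and the names ranges and times are made-up placeholders, not the asker's values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data: a small grid plus two labelings of the same y positions.
valgrid = np.random.rand(4, 6)
ranges = [10, 20, 30, 40]        # left-hand tick positions (assumed)
times = [0.1, 0.2, 0.3, 0.4]     # right-hand labels for the same positions (assumed)

fig, ax = plt.subplots()
ax.imshow(valgrid, extent=(0, 6, 5, 45), aspect='auto')
ax.set_yticks(ranges)

ax2 = ax.twinx()
# Copy the left axis limits; without this the twin keeps its default range
# and the ticks placed at `ranges` are not visible.
ax2.set_ylim(ax.get_ylim())
ax2.set_yticks(ranges)
ax2.set_yticklabels([str(t) for t in times])
```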

Python Matplotlib plot with x-axis labels correctly aligned and matching colors for series and errors

I am looking to plot some data from Pandas Dataframes using Matplotlib. I need to have control over the various properties of the plot. I am having difficulty with 2 properties:
a. Correct spacing for custom x-axis labels
b. How to plot a data series and its error bars with the same color
I have the following Dataframes in Python Pandas:
x = pd.DataFrame(np.random.rand(4,5), columns = list('ABCDE'))
y = pd.DataFrame(np.random.rand(4,5), columns = list('ABCDE'))
x_err = pd.DataFrame(np.random.rand(4,5), columns = list('ABCDE'))
y_err = pd.DataFrame(np.random.rand(4,5), columns = list('ABCDE'))
x.insert(0,'Name',['Temp_C','Pressure_Rear','Barometric_High','Facility_depletion_rate'])
y.insert(0,'Name',['Temp_C','Pressure_Rear','Barometric_High','Facility_depletion_rate'])
x_err.insert(0,'Name',['Temp_C','Pressure_Rear','Barometric_High','Facility_depletion_rate'])
y_err.insert(0,'Name',['Temp_C','Pressure_Rear','Barometric_High','Facility_depletion_rate'])
In the dataframe x, each column gives the x co-ordinates. In the dataframe y, each column gives the corresponding y co-ordinates.
I am looking to plot y vs x in a scatter plot, where the points are connected by lines like this plot: http://40.media.tumblr.com/2bf0909609003f549e0d03406dc5a2dd/tumblr_mik00mS7rv1s6xcwuo1_1280.png
I also need to put error bars (x_err is the x-error and y_err is the y_error)
In each of these dataframes, there are 5 columns. The column headers must be in the legend and the x-axis needs to have labels.
The x-axis labels should be 'Temp_C','Pressure_Rear','Barometric_High' and 'Facility_depletion_rate'.
Here is the code that I have and a sample output is shown at the end of this post:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pylab as pl
fig = plt.figure(1)
ax = fig.add_subplot(111)
fig.set_facecolor('white')
for i in range(1,len(x.columns.tolist())):
    ax.errorbar(x.iloc[:,i], y.iloc[:,i], yerr=y_err.iloc[:,i], xerr=x_err.iloc[:,i], fmt='o') #generate plot for each set of errors (x and y) in the x-variable column list
    plt.plot(x.iloc[:,i], y.iloc[:,i], linestyle='-', marker='o', linewidth=1.5, label = x.columns.tolist()[i])
ttl = 'Room conditions - tracking monitor'
xtitle = 'Type of reading'
ytitle = 'Reading value (units)'
title_font = {'fontname':'Times New Roman', 'size':'28', 'color':'black', 'weight':'bold','verticalalignment':'bottom'} #Bottom vertical alignment for more space
axis_font = {'fontname':'Constantia', 'size':'26'}
axis_tick_font = {'fontname':'Times New Roman', 'size':'20'}
#plt.legend(loc='upper left')
ax.set_xticklabels(x.Name.tolist())
ax.tick_params(axis='x', pad=10)
plt.title(ttl, **title_font)
plt.xlabel(xtitle,**axis_font)
plt.ylabel(ytitle,**axis_font)
plt.xticks(**axis_tick_font)
plt.yticks(**axis_tick_font)
params = {'legend.fontsize': 20} #set legend properties
pl.rcParams.update(params)
plt.legend(loc = 1, prop={'family':title_font['fontname']}, numpoints = 1)
plt.show()
Here are the two problems I am having and questions about these:
1. The x-axis labels are completely misaligned. Is there a way to print the labels such that:
a. they are aligned with the data points. By this I mean that the x-axis labels should be aligned with the x co-ordinates which are given in each column of the dataframe x, i.e. x.iloc[:,1], y.iloc[:,1] are the x and y co-ordinates for the dataset A, and x.iloc[:,2], y.iloc[:,2] are the x and y co-ordinates for the dataset B. I need the x-axis labels to ONLY be aligned with x.iloc[:,1], y.iloc[:,1].
b. the labels are equally spaced from each other and from the left and right borders
2. The color of the error bars is not the same as the color of the data series. For example, if data series 'A' is blue, is there a way to force errorbars for 'A' to also be blue?
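No answer is recorded here for this question. As a hedged sketch for the color part only: a single ax.errorbar call with fmt='-o' draws the connecting line, the markers and the error bars together, and the bars take the color of the line unless ecolor is given. Plain NumPy arrays with made-up values stand in for the DataFrames; the x-axis label question is not addressed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
import numpy as np

rng = np.random.default_rng(0)
names = list('ABCDE')
# Made-up stand-ins for the x, y, x_err and y_err DataFrames (4 rows, 5 series).
x = rng.random((4, 5))
y = rng.random((4, 5))
x_err = 0.1 * rng.random((4, 5))
y_err = 0.1 * rng.random((4, 5))

fig, ax = plt.subplots()
containers = {}
for i, name in enumerate(names):
    # One call per series: line, markers and error bars share one color.
    containers[name] = ax.errorbar(x[:, i], y[:, i],
                                   xerr=x_err[:, i], yerr=y_err[:, i],
                                   fmt='-o', linewidth=1.5, label=name)
ax.legend(numpoints=1)
```

If the series must keep separate errorbar and plot calls, passing the same explicit color= to both is another option.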

Plot histogram normalized by fixed parameter

I need to plot a normalized histogram (by normalized I mean divided by a fixed value) using the histtype='step' style.
The issue is that plt.bar() doesn't seem to support that style, and if I instead use plt.hist(), which does, I can't (or at least don't know how to) plot the normalized histogram.
Here's a MWE of what I mean:
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
    return np.random.uniform(low=10., high=20., size=(200,))
# Generate data.
x1 = rand_data()
# Define histogram params.
binwidth = 0.25
x_min, x_max = x1.min(), x1.max()
bin_n = np.arange(int(x_min), int(x_max + binwidth), binwidth)
# Obtain histogram.
hist1, edges1 = np.histogram(x1, bins=bin_n)
# Normalization parameter.
param = 5.
# Plot histogram normalized by the parameter defined above.
plt.ylim(0, 3)
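# align='edge' places each bar at its left bin edge (newer matplotlib centers bars by default).
plt.bar(edges1[:-1], hist1 / param, width=binwidth, align='edge', color='none', edgecolor='r')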
plt.show()
(notice the normalization: hist1 / param) which produces this:
I can generate a histtype='step' histogram using:
plt.hist(x1, bins=bin_n, histtype='step', color='r')
and get:
but then it wouldn't be normalized by the param value.
The step plot will generate the appearance that you want from a set of bins and the count (or normalized count) in those bins. Here I've used plt.hist to get the counts, then plotted them normalized. It's necessary to duplicate the first entry in order to actually get a line there.
(a, b, c) = plt.hist(x1, bins=bin_n, histtype='step', color='r')
a = np.append(a[0], a[:])
plt.close()
plt.step(b, a / param, color='r')
This is not quite right, because it doesn't finish the plot correctly: the end of the line is hanging in free space rather than dropping down to the x axis.
You can fix that by adding a 0 to the end of 'a' and one more bin point to 'b':
a = np.append(a[:], 0)
b = np.append(b, (2 * b[-1] - b[-2]))
plt.step(b, a / param, color='r')
Lastly, the ax.step mentioned would be used if you had created the figure with
fig, ax = plt.subplots()
to give you access to the figure and axis directly. For examples, see http://matplotlib.org/examples/ticks_and_spines/spines_demo_bounds.html
Based on tcaswell's comment (use step) I've developed my own answer. Notice that I need to add elements to both the x (one zero element at the beginning of the array) and y arrays (one zero element at the beginning and another at the end of the array) so that step will plot the vertical lines at the beginning and the end of the bars.
Here's the code:
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
    return np.random.uniform(low=10., high=20., size=(5000,))
# Generate data.
x1 = rand_data()
# Define histogram params.
binwidth = 0.25
x_min, x_max = x1.min(), x1.max()
bin_n = np.arange(int(x_min), int(x_max + binwidth), binwidth)
# Obtain histogram.
hist1, edges1 = np.histogram(x1, bins=bin_n)
# Normalization parameter.
param = 5.
# Create arrays with extra elements so plt.step will draw the first and last
# vertical lines.
x2 = np.concatenate((np.array([0.]), edges1))
y2 = np.concatenate((np.array([0.]), (hist1 / param), np.array([0.])))
# Plot histogram normalized by the parameter defined above.
plt.xlim(min(edges1) - (min(edges1) / 10.), max(edges1) + (min(edges1) / 10.))
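# align='edge' places each bar at its left bin edge (newer matplotlib centers bars by default).
plt.bar(x2, y2, width=binwidth, align='edge', color='none', edgecolor='b')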
plt.step(x2, y2, where='post', color='r', ls='--')
plt.show()
and here's the result:
The red lines generated by step are equal to those blue lines generated by bar as can be seen.
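Another route, not taken in either answer, is to scale the counts through the weights argument that both np.histogram and plt.hist accept: giving every sample a weight of 1 / param divides each bin count by param. A minimal sketch with made-up data and fixed bin edges (both are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(10., 20., size=200)               # made-up sample data
binwidth = 0.25
bin_n = np.arange(10., 20. + binwidth, binwidth)   # fixed bin edges (assumed)
param = 5.

# Each sample counts as 1 / param instead of 1.
weights = np.full(x1.shape, 1. / param)

counts, edges = np.histogram(x1, bins=bin_n)
scaled, _ = np.histogram(x1, bins=bin_n, weights=weights)

# The same weights can be passed straight to plt.hist to get the step style:
# plt.hist(x1, bins=bin_n, weights=weights, histtype='step', color='r')
```

With this approach no padding of the arrays is needed, since hist closes the step outline itself.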
