Is there a better way of reproducing matplotlib's scatter_matrix (plot every column against every other column) in Bokeh than the code below?
defaults.width = 100
defaults.height = 100
scatter_plots = []
y_max = len(dataset.columns)-1
for i, y_col in enumerate(dataset):
    for j, x_col in enumerate(dataset):
        df = pd.DataFrame({x_col: dataset[x_col].tolist(), y_col: dataset[y_col].tolist()})
        p = Scatter(df, x=x_col, y=y_col)
        if j > 0:
            p.yaxis.axis_label = ""
            p.yaxis.visible = False
        if i < y_max:
            p.xaxis.axis_label = ""
            p.xaxis.visible = False
        scatter_plots.append(p)
grid = gridplot(scatter_plots, ncols = len(dataset.columns))
show(grid)
In particular I would like to be able to zoom and pan the entire grid of plots as a single entity rather than zoom/pan the subplot the mouse is hovering over.
In general, to get linked panning/zooming, you share the ranges that you want linked between plots. This is described here in the User Guide:
https://docs.bokeh.org/en/latest/docs/user_guide/interaction/linking.html
You can also check out this linked SPLOM example:
https://github.com/bokeh/bokeh/blob/master/examples/models/iris_splom.py
That example is longer/more verbose because it uses the low-level bokeh.models API. The important part is where it re-uses the ranges xdr and ydr on every plot that gets created.
In your particular case, since high-level charts don't accept range parameters up front (IIRC), I think you'll have to fix up the charts "after the fact", so maybe something like:
xr = scatter_plots[0].x_range
yr = scatter_plots[0].y_range
for p in scatter_plots:
    p.x_range = xr
    p.y_range = yr
In case it is useful, I faced the same problem. In actual fact you don't want all the axes linked, but rather each row's y-axis linked and each column's x-axis linked. I'm surprised that this isn't a built-in Bokeh feature. Even the iris example gets this wrong:
http://docs.bokeh.org/en/latest/docs/gallery/iris_splom.html
Here's a code snippet I used:
def scatter_matrix(dataset):
    dataset_source = ColumnDataSource(data=dataset)
    scatter_plots = []
    y_max = len(dataset.columns)-1
    for i, y_col in enumerate(dataset.columns):
        for j, x_col in enumerate(dataset.columns):
            p = figure(plot_width=100, plot_height=100, x_axis_label=x_col, y_axis_label=y_col)
            p.circle(source=dataset_source,x=x_col, y=y_col, fill_alpha=0.3, line_alpha=0.3, size=3)
            if j > 0:
                p.yaxis.axis_label = ""
                p.yaxis.visible = False
                p.y_range = linked_y_range
            else:
                linked_y_range = p.y_range
                p.plot_width=160
            if i < y_max:
                p.xaxis.axis_label = ""
                p.xaxis.visible = False
            else:
                p.plot_height=140
            if i > 0:
                p.x_range = scatter_plots[j].x_range
            scatter_plots.append(p)
    grid = gridplot(scatter_plots, ncols = len(dataset.columns))
    show(grid)
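The linking rule behind this snippet boils down to one x range per grid column and one y range per grid row. Here is a minimal sketch of just that bookkeeping in plain Python. Note that `Range` and `build_shared_ranges` are hypothetical stand-ins I made up for illustration (they are not Bokeh classes), so this only shows which cells end up sharing the same object:

```python
class Range:
    """Hypothetical stand-in for a plot range object (NOT a Bokeh class)."""

    def __init__(self, name):
        self.name = name


def build_shared_ranges(columns):
    """Map each grid cell (row, col) to the (x_range, y_range) pair it should use."""
    x_ranges = [Range("x:" + c) for c in columns]  # one x range per grid column
    y_ranges = [Range("y:" + c) for c in columns]  # one y range per grid row
    grid = {}
    for i, _y_col in enumerate(columns):
        for j, _x_col in enumerate(columns):
            # Cells in the same column share x_ranges[j];
            # cells in the same row share y_ranges[i].
            grid[(i, j)] = (x_ranges[j], y_ranges[i])
    return grid


grid = build_shared_ranges(["a", "b", "c"])
```

In Bokeh itself the shared objects would be real ranges handed to each plot, for example through the x_range and y_range arguments of figure(...), instead of being patched in afterwards.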
I have plotted a box and whiskers plot for my data using the following code:
def make_labels(ax, boxplot):
    iqr = boxplot['boxes'][0]
    caps = boxplot['caps']
    med = boxplot['medians'][0]
    fly = boxplot['fliers'][0]
    xpos = med.get_xdata()
    xoff = 0.1 * (xpos[1] - xpos[0])
    xlabel = xpos[1] + xoff
    median = med.get_ydata()[1]
    pc25 = iqr.get_ydata().min()
    pc75 = iqr.get_ydata().max()
    capbottom = caps[0].get_ydata()[0]
    captop = caps[1].get_ydata()[0]
    ax.text(xlabel, median, 'Median = {:6.3g}'.format(median), va='center')
    ax.text(xlabel, pc25, '25th percentile = {:6.3g}'.format(pc25), va='center')
    ax.text(xlabel, pc75, '75th percentile = {:6.3g}'.format(pc75), va='center')
    ax.text(xlabel, capbottom, 'Bottom cap = {:6.3g}'.format(capbottom), va='center')
    ax.text(xlabel, captop, 'Top cap = {:6.3g}'.format(captop), va='center')
    for flier in fly.get_ydata():
        ax.text(1 + xoff, flier, 'Flier = {:6.3g}'.format(flier), va='center')
and this gives me the following graph:
Now, what I want to do is to grab all the 'Flier' points that we can see in the graph and make it into a list and for that I did the following:
fliers_data = []
def boxplots(boxplot):
    iqr = boxplot['boxes'][0]
    fly = boxplot['fliers'][0]
    pc25 = iqr.get_ydata().min()
    pc75 = iqr.get_ydata().max()
    inter_quart_range = pc75 - pc25
    max_q3 = pc75 + 1.5*inter_quart_range
    min_q1 = pc25 - 1.5*inter_quart_range
    for flier in fly.get_ydata():
        if (flier > max_q3):
            fliers_data.append(flier)
        elif (flier < min_q1):
            fliers_data.append(flier)
Now, I have 2 queries:
1. In both functions, there are a few lines that are similar. Is there a way I can define them once and use them in both the functions?
2. Can the second function be edited or neatened in a more efficient way?
I think mostly it's quite neat. The only thing I can suggest is blank lines between the different parts of the functions, and maybe some comments to tell someone reading what each part does.
Something like this, for example:
def myfunction(x):
    # checking if x equals 10
    if x == 10:
        return True
    # if equals 0 return string
    elif x == 0:
        return "equals zero"
    # else return false
    else:
        return False
Also, I think you can define any variables that are the same outside and before both functions (say, at the very start of your code); they should still be accessible inside the functions.
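Another option for the shared lines is to move them into small helper functions that both of your functions call. This is only a sketch: the names `box_quartiles`, `whisker_fences` and `find_fliers` are made up here, and it assumes the boxplot argument is the dict returned by ax.boxplot, as in your code. It also returns a new list instead of appending to a global one:

```python
def box_quartiles(boxplot):
    """Read the 25th and 75th percentiles from the first box of a boxplot dict."""
    ydata = boxplot['boxes'][0].get_ydata()
    return min(ydata), max(ydata)


def whisker_fences(pc25, pc75, factor=1.5):
    """Return the (lower, upper) limits beyond which a point counts as a flier."""
    inter_quart_range = pc75 - pc25
    return pc25 - factor * inter_quart_range, pc75 + factor * inter_quart_range


def find_fliers(values, pc25, pc75):
    """Return the values that fall outside the whisker fences."""
    lower, upper = whisker_fences(pc25, pc75)
    return [v for v in values if v < lower or v > upper]


# Made-up numbers, only to show the call
fliers = find_fliers([0.5, 4.0, 5.0, 6.0, 20.0], pc25=4.0, pc75=6.0)
```

With a real plot you would call something like find_fliers(boxplot['fliers'][0].get_ydata(), *box_quartiles(boxplot)), and make_labels could reuse box_quartiles for its pc25 and pc75 labels.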
EDIT: I figured out that the problem always occurs if one tries to plot to two different lists of figures. Does that mean that one cannot plot to different figure lists in the same loop? See the latest code for a much simpler sample of the problem.
I am trying to analyze a complex set of data which basically consists of measurements of electric devices under different conditions. Hence, the code is a bit more complex, but I tried to strip it down to a working example; however, it is still pretty long. So let me explain what you see: there are 3 classes, with Transistor representing an electronic device. Its attribute Y represents the measurement data, consisting of 2 sets of measurements. Each Transistor belongs to a group (2 in this example). And some groups belong to the same series (one series which includes both groups in this example).
The aim is now to plot all measurement data for each Transistor (not shown), then to also plot all data belonging to the same group in one plot each, and all data of the same series in one plot. In order to program it in an efficient way without having a lot of loops, my idea was to use the object-oriented nature of matplotlib: I will have figures and subplots for each level of plotting (initialized in initGrpPlt and initSeriesPlt), which are then filled with only one loop over all Transistors (in MainPlt: toGPlt and toSPlt). In the end they should only be printed / saved to a file / whatever (PltGrp and PltSeries).
The problem: even though I specify where to plot, Python plots the series plots into the group plots. You can check this yourself by running the code with and without the line 'toSPlt(trans,j)'. I have no clue why Python does this, because in the function toSPlt I explicitly say that Python should use the subplots from the series subplot list. Would anyone have an idea why this happens and how to solve this problem in an elegant way?
Read the code from the bottom to the top; that should help with understanding.
Kind regards
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

maxNrVdrain = 2
X = np.linspace(-np.pi, np.pi, 256,endpoint=True)
A = [[1*np.cos(X),2*np.cos(X),3*np.cos(X),4*np.cos(X)],[1*np.tan(X),2*np.tan(X),3*np.tan(X),4*np.tan(X)]]
B = [[2* np.sin(X),4* np.sin(X),6* np.sin(X),8* np.sin(X)],[2*np.cos(X),4*np.cos(X),6*np.cos(X),8*np.cos(X)]]

class Transistor(object):
    _TransRegistry = []
    def __init__(self,y1,y2):
        self._TransRegistry.append(self)
        self.X = X
        self.Y = [y1,y2]
        self.group = ''

class Groups():
    _GroupRegistry = []
    def __init__(self,trans):
        self._GroupRegistry.append(self)
        self.transistors = [trans]
        self.figlist = []
        self.axlist = []

class Series():
    _SeriesRegistry = []
    def __init__(self,group):
        self._SeriesRegistry.append(self)
        self.groups = [group]
        self.figlist = []
        self.axlist = []

def initGrpPlt():
    for group in Groups._GroupRegistry:
        for j in range(maxNrVdrain):
            group.figlist.append(plt.figure(j))
            group.axlist.append(group.figlist[j].add_subplot(111))
    return

def initSeriesPlt():
    for series in Series._SeriesRegistry:
        for j in range(maxNrVdrain):
            series.figlist.append(plt.figure(j))
            series.axlist.append(series.figlist[j].add_subplot(111))
    return

def toGPlt(trans,j):
    colour = cm.rainbow(np.linspace(0, 1, 4))
    group = trans.group
    group.axlist[j].plot(trans.X,trans.Y[j], color=colour[group.transistors.index(trans)], linewidth=1.5, linestyle="-")
    return

def toSPlt(trans,j):
    colour = cm.rainbow(np.linspace(0, 1, 2))
    series = Series._SeriesRegistry[0]
    group = trans.group
    if group.transistors.index(trans) == 0:
        series.axlist[j].plot(trans.X,trans.Y[j],color=colour[series.groups.index(group)], linewidth=1.5, linestyle="-", label = 'T = nan, RH = nan' )
    else:
        series.axlist[j].plot(trans.X,trans.Y[j],color=colour[series.groups.index(group)], linewidth=1.5, linestyle="-")
    return

def PltGrp(group,j):
    ax = group.axlist[j]
    ax.set_title('Test Grp')
    return

def PltSeries(series,j):
    ax = series.axlist[j]
    ax.legend(loc='upper right', frameon=False)
    ax.set_title('Test Series')
    return

def MainPlt():
    initGrpPlt()
    initSeriesPlt()
    for trans in Transistor._TransRegistry:
        for j in range(maxNrVdrain):
            toGPlt(trans,j)
            toSPlt(trans,j)#plots to group plot for some reason
    for j in range(maxNrVdrain):
        for group in Groups._GroupRegistry:
            PltGrp(group,j)
    plt.show()
    return

def Init():
    for j in range(4):
        trans = Transistor(A[0][j],A[1][j])
        if j == 0:
            Groups(trans)
        else:
            Groups._GroupRegistry[0].transistors.append(trans)
        trans.group = Groups._GroupRegistry[0]
    Series(Groups._GroupRegistry[0])
    for j in range(4):
        trans = Transistor(B[0][j],B[1][j])
        if j == 0:
            Groups(trans)
        else:
            Groups._GroupRegistry[1].transistors.append(trans)
        trans.group = Groups._GroupRegistry[1]
    Series._SeriesRegistry[0].groups.append(Groups._GroupRegistry[1])
    return

def main():
    Init()
    MainPlt()
    return

main()
Latest example that does not work:
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
X = np.linspace(-np.pi, np.pi, 256,endpoint=True)
Y1 = np.cos(X)
Y2 = np.sin(X)
figlist1 = []
figlist2 = []
axlist1 = []
axlist2 = []
for j in range(4):
    figlist1.append(plt.figure(j))
    axlist1.append(figlist1[j].add_subplot(111))
    figlist2.append(plt.figure(j))#this should be a new set of figures!
    axlist2.append(figlist2[j].add_subplot(111))
    colour = cm.rainbow(np.linspace(0, 1, 4))
    axlist1[j].plot(X,j*Y1, color=colour[j], linewidth=1.5, linestyle="-")
    axlist1[j].set_title('Test Grp 1')
    colour = cm.rainbow(np.linspace(0, 1, 4))
    axlist2[j].plot(X,j*Y2, color=colour[int(j/2)], linewidth=1.5, linestyle="-")
    axlist2[j].set_title('Test Grp 2')
plt.show()
OK, stupid mistake if one thinks about the background, but maybe someone has a similar problem and is unable to see the cause, as I was at first. So here is the solution:
The problem is that the names of the list entries like figlist1[j] do not define the figure; they are just references to the actual figure object. If such an object is created by plt.figure(j), one has to make sure that j is different for each figure. Hence, in a loop where multiple figures are initialized, one needs to change the figure number for each new figure, otherwise plt.figure(j) hands back the figure that already exists under that number instead of creating a new one. Hope that helps! Cheers.
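In case it helps someone, here is a small sketch of that behaviour and two ways around it. It assumes the non-interactive Agg backend so it runs without a display, and the offset of n_plots is just one possible numbering scheme, not the only one:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

n_plots = 4

# Asking for a figure number that already exists returns the SAME figure object.
first = plt.figure(0)
again = plt.figure(0)
same_figure = first is again

# Option 1: offset the numbers so the second list gets its own figures.
figlist1 = [plt.figure(j) for j in range(n_plots)]
figlist2 = [plt.figure(j + n_plots) for j in range(n_plots)]

# Option 2: leave the number out; plt.figure() then always creates a new figure.
figlist3 = [plt.figure() for _ in range(n_plots)]

# Count how many distinct figure objects the three lists hold in total.
all_figures = figlist1 + figlist2 + figlist3
distinct_count = len({id(fig) for fig in all_figures})

plt.close("all")
```

Here same_figure ends up True, and the three lists hold twelve different figures between them.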
I really like this Python example: https://plot.ly/python/distplot/ (scroll to Plot Multiple Datasets). I would expect the exact same thing to be available for R, but it's not documented. Does this mean it's not possible? I came across this example https://community.plot.ly/t/r-plotly-overlay-density-histogram/640/4 which I find far less nice.
This doesn't work but would give an idea about the data I use.
# Add histogram data
x1 = data.table(a=rnorm(n = 200,mean = 0,sd = .1), by='Group1')
x2 = data.table(a=rnorm(n = 200,mean = 1,sd = .15), by='Group2')
x3 = data.table(a=rnorm(n = 200,mean = 2,sd = .2), by='Group3')
x4 = data.table(a=rnorm(n = 200,mean = 3,sd = .25), by='Group4')
agg <- rbind(x1,x2,x3,x4)
plot_ly(data = agg, type = "histogram",histnorm, name = "Histogram",group_by='by')
plot_ly(data = agg, type = "density",histnorm, name = "Density",group_by='by')
I'm not entirely sure which critical element you are missing in R, but here is a plotly-based density plus rug plot example based on your sample data.
This is the static ggplot version.
require(ggplot2);
gg <- ggplot(agg, aes(x = a, colour = by)) + geom_density() + geom_rug();
And the interactive ggplotlyed version including screenshot.
require(plotly);
ggplotly(gg);
You can also add a histogram with e.g.
gg + geom_histogram(aes(y = ..density.., fill = by), alpha = 0.2, bins = 50)
I’m trying to plot data, and in order to check my code I’m comparing the resulting plots with what has already been generated with Matlab. However, I am encountering several issues with this:
Generally, the parsing of RINEX files works, and the general pattern of the data looks similar to what the Matlab scripts plotted. However, there are small deviations in the data that should become apparent when zooming in, i.e. when using a shorter time series, for example plotting a specific 2 hour period rather than 24 hours. In Matlab, this small discrepancy can be seen and a polynomial fit applied. However, in the Python plots (the first plot shown below), the curve for this two hour period appears “smooth” and does not deviate at all, unlike in the Matlab plot (the second plot shows the data as the blue line against the polyfit as the red line; the blue line shows a slight discrepancy at x=9.4). The Matlab script is assumed correct, as this deviation is caused by seismic activity that temporarily disrupts the ionosphere. Please refer to the plots below:
The third plot is from Matlab and is simply the polyfit minus the live data.
Therefore, it is not clear how this data is being plotted on the axes by the Python script, because the data appears too smooth. Nor is it clear whether my code is wrong (see below) and somehow “smooths” out the data:
#Calculating by looping through
for sv in range(32):
    sat = self.obs_data_chunks_dataframe[sv, :]
    #print "sat.index_{0}: {1}".format(sv+1, sat.index)
    phi1 = sat['L1'] * LAMBDA_1 #Change units of L1 to meters
    phi2 = sat['L2'] * LAMBDA_2 #Change units of L2 to meters
    pr1 = sat['P1']
    pr2 = sat['P2']
    #CALCULATION: teqc Calculation
    iono_teqc = COEFF * (pr2 - pr1) / 1000000 #divide to make values smaller (tbc)
    print "iono_teqc_{0}: {1}".format(sv+1, iono_teqc)
    #PLOTTING
    #Plotting of the data
    plt.plot(sat.index, iono_teqc, label='teqc')
    plt.xlabel('Time (UTC)')
    plt.ylabel('Ionosphere Delay (meters)')
    plt.title("Ionosphere Delay on {0} for Satellite {1}.".format(self.date, sv+1))
    plt.legend()
    ax = plt.gca()
    ax.ticklabel_format(useOffset=False)
    plt.grid()
    if sys.platform.startswith('win'):
        plt.savefig(winpath + '\Figure_SV{0}'.format(sv+1))
    elif sys.platform.startswith('darwin'):
        plt.savefig(macpath + 'Figure_SV{0}'.format(sv+1))
    plt.close()
Following on from point 1, the polynomial fitting code below does not run the way I’d like, so I’m overlooking something here. I assume this has to do with the data used on the x- and y-axes, but I can’t pinpoint exactly what. Would anyone know where I am going wrong here?
#Zoomed in plots
if sv == 19:
    #Plotting of the data
    plt.plot(sat.index, iono_teqc, label='teqc') #sat.index to plot for time in UTC
    plt.xlim(8, 10)
    plt.xlabel('Time (UTC)')
    plt.ylabel('Ionosphere Delay (meters)')
    plt.title("Ionosphere Delay on {0} for Satellite {1}.".format(self.date, sv+1))
    plt.legend()
    ax = plt.gca()
    ax.ticklabel_format(useOffset=False)
    plt.grid()
    #Polynomial fitting
    coefficients = np.polyfit(sat.index, iono_teqc, 2)
    plt.plot(coefficients)
    if sys.platform.startswith('win'):
        #os.path.join(winpath, 'Figure_SV{0}'.format(sv+1))
        plt.savefig(winpath + '\Zoom_SV{0}'.format(sv+1))
    elif sys.platform.startswith('darwin'):
        plt.savefig(macpath + 'Zoom_SV{0}'.format(sv+1))
    plt.close()
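One thing that may be relevant to the fitting code above: np.polyfit returns only the polynomial coefficients, so plt.plot(coefficients) draws those three numbers rather than the fitted curve. Below is a minimal sketch of evaluating the fit with np.polyval and taking the residuals. The arrays hours and delay are synthetic stand-ins for sat.index and iono_teqc (made-up values, not real data), and restricting the fit to the 8 to 10 hour window is an assumption about what is wanted:

```python
import numpy as np

# Synthetic stand-ins for sat.index (hours, UTC) and iono_teqc (meters); NOT real data.
hours = np.linspace(7.0, 11.0, 100)
delay = 0.5 * hours**2 - 3.0 * hours + 2.0

# Restrict to the zoomed window so the polynomial describes only that period.
window = (hours >= 8.0) & (hours <= 10.0)
x = hours[window]
y = delay[window].copy()
y[35] += 0.2  # small artificial bump standing in for a disturbance

# polyfit gives the coefficients (highest power first) ...
coefficients = np.polyfit(x, y, 2)
# ... and polyval evaluates them at the x values; this is the curve to plot.
fitted = np.polyval(coefficients, x)

# Residuals (data minus fit) make small deviations visible.
residuals = y - fitted
largest = int(np.argmax(np.abs(residuals)))

# With matplotlib one would then plot, for example:
#   plt.plot(x, y, label='teqc')
#   plt.plot(x, fitted, label='polyfit')
```

Plotting residuals against x gives something comparable to the third Matlab plot, up to the sign convention (that one is described as polyfit minus data).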
My RINEX file comprises 32 satellites. However when trying to generate the plots for all 32, I receive:
IndexError: index 31 is out of bounds for axis 0 with size 31
Changing the code below to 31 solves this partly, only excluding the 32nd satellite. I’d like to also plot for satellite 32. The functions for the parsing, and formatting of the data are given below:
def read_obs(self, RINEXfile, n_sat, sat_map):
    obs = np.empty((TOTAL_SATS, len(self.obs_types)), dtype=np.float64) * np.NaN
    lli = np.zeros((TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    signal_strength = np.zeros((TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    for i in range(n_sat):
        # Join together observations for a single satellite if split across lines.
        obs_line = ''.join(padline(RINEXfile.readline()[:-1], 16) for _ in range((len(self.obs_types) + 4) / 5))
        #obs_line = ''.join(padline(RINEXfile.readline()[:-1], 16) for _ in range(2))
        #while obs_line
        for j in range(len(self.obs_types)):
            obs_record = obs_line[16*j:16*(j+1)]
            obs[sat_map[i], j] = floatornan(obs_record[0:14])
            lli[sat_map[i], j] = digitorzero(obs_record[14:15])
            signal_strength[sat_map[i], j] = digitorzero(obs_record[15:16])
    return obs, lli, signal_strength

def read_data_chunk(self, RINEXfile, CHUNK_SIZE = 10000):
    obss = np.empty((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.float64) * np.NaN
    llis = np.zeros((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    signal_strengths = np.zeros((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    epochs = np.zeros(CHUNK_SIZE, dtype='datetime64[us]')
    flags = np.zeros(CHUNK_SIZE, dtype=np.uint8)
    i = 0 #ggfrfg
    while True:
        hdr = self.read_epoch_header(RINEXfile)
        if hdr is None:
            break
        epoch_time, flags[i], sats = hdr
        #epochs[i] = np.datetime64(epoch_time)
        epochs[i] = epoch_time
        sat_map = np.ones(len(sats)) * -1
        for n, sat in enumerate(sats):
            if sat[0] == 'G':
                sat_map[n] = int(sat[1:]) - 1
        obss[i], llis[i], signal_strengths[i] = self.read_obs(RINEXfile, len(sats), sat_map)
        i += 1
        if i >= CHUNK_SIZE:
            break
    return obss[:i], llis[:i], signal_strengths[:i], epochs[:i], flags[:i]

def read_data(self, RINEXfile):
    obs_data_chunks = []
    while True:
        obss, _, _, epochs, _ = self.read_data_chunk(RINEXfile)
        epochs = epochs.astype(np.int64)
        epochs = np.divide(epochs, float(3600.000))
        if obss.shape[0] == 0:
            break
        obs_data_chunks.append(pd.Panel(
            np.rollaxis(obss, 1, 0),
            items=['G%02d' % d for d in range(1, 33)],
            major_axis=epochs,
            minor_axis=self.obs_types
        ).dropna(axis=0, how='all').dropna(axis=2, how='all'))
    self.obs_data_chunks_dataframe = obs_data_chunks[0]
Any suggestions?
Cheers, pymat.
I managed to solve Qu1, as it was a conversion issue in my calculation that I had overlooked. The other two points are still open, however...
Working with Matplotlib in Python (2.7.9). I have to plot a table in a subplot (in this case subplot name is tab) but I can't seem to find a way to change the font size of the table (http://imgur.com/0Ttvzee - bottom left). Antman is happy about the results, I am not.
This is the code I've been using.
EDIT: Added full code
def stat_chart(self):
    DN = self.diff
    ij = self.ij_list
    mcont = self.mcont
    ocont = self.ocont
    ucont = self.ucont
    dist = self.widths
    clon = '%1.2f' %self.mclon
    clat = '%1.2f' %self.mclat
    clonlat = "{0}/{1}".format(clon,clat)
    area = self.area
    perim = self.perimeter
    mdist = np.array(self.widths)
    mdist = mdist[:,0]*10
    mdist = np.mean(mdist)
    pstat = self.polygon_status
    if pstat == 1:
        status = "Overestimation"
    else:
        status = "Underestimation"
    # Setting up the plot (2x2) and subplots
    fig = plt.figure()
    gs = gridspec.GridSpec(2,2,width_ratios=[2,1],height_ratios=[4,1])
    main = plt.subplot(gs[0,0])
    polyf = plt.subplot(gs[0,1])
    tab = plt.subplot(gs[1,0])
    leg = plt.subplot(gs[1,1])
    tab.set_xticks([])
    leg.set_xticks([])
    tab.set_yticks([])
    leg.set_yticks([])
    tab.set_frame_on(False)
    leg.set_frame_on(False)
    # Main image on the top left
    main.imshow(DN[::-1],cmap='winter')
    x1,x2,y1,y2 = np.min(ij[:,1])-15,np.max(ij[:,1])+15,np.min(ij[:,0])-15,np.max(ij[:,0])+15
    main.axvspan(x1,x2,ymin=1-((y1-320)/float(len(DN)-320)),ymax=1-((y2-320)/float(len(DN)-320)),color='red',alpha=0.3)
    main.axis([0,760,0,800])
    # Polygon image on the top right
    polyf.imshow(DN,cmap='winter')
    polyf.axis([x1,x2,y2,y1])
    polyf.plot(mcont[:,1],mcont[:,0],'ro',markersize=4)
    polyf.plot(ocont[:,1],ocont[:,0],'yo',markersize=4)
    polyf.plot(ucont[:,1],ucont[:,0],'go',markersize=4)
    for n,en in enumerate(dist):
        polyf.plot([en[2],en[4]],[en[1],en[3]],color='grey',alpha=0.3)
    # Legend on the bottom right
    mc = mlines.Line2D([],[],color='red',marker='o')
    oc = mlines.Line2D([],[],color='yellow',marker='o')
    uc = mlines.Line2D([],[],color='green',marker='o')
    ed = mlines.Line2D([],[],color='black',alpha=0.5)
    pos_p = mpatches.Patch(color='lightgreen')
    neg_p = mpatches.Patch(color='royalblue')
    leg.legend([mc,oc,uc,ed,pos_p,neg_p],("Model Cont.","Osisaf Cont.","Unknown Cont.","Dist. Mdl to Osi", \
               'Model Overestimate','Model Underestimate'),loc='center')
    # Statistics table on the bottom left
    stats = [[clonlat+' degrees' ,'%1.4E km^2' %area,'%1.4E km' %perim,'%1.4f km' %mdist,status]]
    columns = ('Center Lon/Lat','Area','Perimeter','Mean Width','Status')
    rows = ['TODOpolyname']
    cwid = [0.1,0.1,0.1,0.1,0.1,0.1]
    the_table = tab.table(cellText=stats,colWidths=cwid,rowLabels=rows,colLabels=columns,loc='center')
    table_props = the_table.properties()
    table_cells = table_props['child_artists']
    for cell in table_cells: cell.set_height(0.5)
    plt.show()
    return
EDIT2: Eventually (un)solved by plotting text instead of a table. Good enough.
I had a similar issue in changing the fontsize. Try the following
the_table.auto_set_font_size(False)
the_table.set_fontsize(5.5)
Worked for me.
According to the docs, table has a kwarg called fontsize, a float value for the size in points.
In your example from above, for a fontsize of 5 points you would use:
the_table =tab.table(cellText=stats,colWidths=cwid,rowLabels=rows,colLabels=columns,loc='center',fontsize=5)
If you require greater control, you can pass a FontManager instance to the cell.set_text_props() method as described in this example. That would enable you to set the family, spacing, style etc, in addition to the size.
EDIT: Playing around with Matplotlib's example, it seems that just passing fontsize to the table has no effect. However, importing
from matplotlib.font_manager import FontProperties
and then looping through the cells and running
cell.set_text_props(fontproperties=FontProperties(size = 5))
does have the desired effect. It is unclear why the documented kwarg fontsize does not work in this (or apparently in your) case.
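For reference, here is a small self-contained sketch that puts the two approaches from this thread side by side and reads the resulting sizes back from the cells. The table contents are made up and the Agg backend is assumed so it runs without a display; it is not the code from the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib.font_manager import FontProperties

fig, ax = plt.subplots()
ax.axis("off")

# Made-up cell contents, only to have something to size.
the_table = ax.table(
    cellText=[["1.0", "2.0"], ["3.0", "4.0"]],
    colLabels=["Area", "Perimeter"],
    loc="center",
)

# Approach 1: turn off automatic sizing, then set one size for the whole table.
the_table.auto_set_font_size(False)
the_table.set_fontsize(5.5)
sizes_after_table_call = {cell.get_fontsize() for cell in the_table.get_celld().values()}

# Approach 2: set the text properties cell by cell.
for cell in the_table.get_celld().values():
    cell.set_text_props(fontproperties=FontProperties(size=5))
sizes_after_cell_loop = {cell.get_fontsize() for cell in the_table.get_celld().values()}

plt.close(fig)
```

Reading the size back with cell.get_fontsize() is a quick way to check whether a setting actually reached the cells before looking at the rendered figure.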