Plotting Hierarchical Quantitative Data - python

I am trying to visualize some quantitative data that can be summarized hierarchically. The data is arranged such that each node either:
has children, with this node's value the mean of the value of its children
has no children (is terminal) with a set value
For example:
A = mean(AA, AB) = mean(0.75, 0.85) = 0.8
AA = mean(AAA, AAB) = mean(0.6, 0.9) = 0.75
AAA = 0.6
AAB = 0.9
AB = 0.85
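The rule above can be checked with a short sketch. The nested-dict layout and the name `fill_values` are assumptions made for illustration; they are not part of the original data format.

```python
from statistics import mean

# Hypothetical representation: a terminal node is a number,
# a node with children is a dict mapping child names to nodes.
tree = {"AA": {"AAA": 0.6, "AAB": 0.9}, "AB": 0.85}


def fill_values(name, node, out):
    """Store the value of `name` and of all its descendants in `out`.

    Returns the value of `name`: the set value for a terminal node,
    the mean of the children's values otherwise.
    """
    if isinstance(node, dict):
        value = mean(fill_values(child, sub, out) for child, sub in node.items())
    else:
        value = float(node)
    out[name] = value
    return value


values = {}
fill_values("A", tree, values)
for name in sorted(values):
    print(name, round(values[name], 3))
```

The resulting `values` dict holds one entry per node, which is the shape of data a plotting routine would need.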
My current plan is a figure like the following:
Where node A is centered above the figure that shows its children, AD is centered above the subplot showing its children, etc...
My current solution is to approach it recursively using matplotlib and gridspec (traverse down the hierarchy, splitting the plot area using gridspec into a top row and the rest, then recursing on the columns in the row below). It "works", but there are a lot of little cases I have to handle, such as:
* making sure the leftmost figure has y-axis labels, which is awkward in cases like the bottom row, where the second column needs to know it is the leftmost real figure
* making sure figures like the CA-CB-CC figure, which terminate before the bottom row, don't extend beyond their row
* handling the vertical padding between plots so that long labels can be rotated 45 degrees without overlap
* when using gridspec, the subplots are not always vertically aligned (some are a bit taller or shorter than others)
To catch these I have to pass a lot of state around and fix edge cases. Are there any other plotting libraries or a different approach with matplotlib that might work better?
EDIT:
Adding a colored figure with the pattern of the recursion in my current attempt.
* first (top level) call partitions everything by the green boxes (2 rows, with the bottom row split into 3 col) then recursively plots the three columns independently
* second call (yellow) splits the bottom left into two rows, recursing on its child columns
* third call (orange) ...
* fourth call (red) ...
The downside to this is I don't think I can easily do a shared y-axis across a given row because they're isolated from their neighbors. The only caller that has that context is the top level, so I'd have to reach down into all the plots afterwards and link their y-axes.
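One possible alternative to nesting gridspecs, offered as a sketch under assumptions rather than a tested solution: compute every panel's row and column span up front, then place all panels on a single grid so that one loop has the context needed for shared y-axes and for deciding which panel is leftmost in its row. The nested-dict tree, the hard-coded `values`, and the names `build_panels` and `draw` are all hypothetical, and the sketch assumes one bar panel per non-terminal node.

```python
# Hypothetical tree: a terminal node is a number, others are dicts of children.
tree = {"AA": {"AAA": 0.6, "AAB": 0.9}, "AB": 0.85}
values = {"AA": 0.75, "AB": 0.85, "AAA": 0.6, "AAB": 0.9}


def build_panels(name, node, depth=0, start=0, panels=None):
    """Return (n_leaf_columns, panels) for the subtree rooted at `name`.

    panels maps each non-terminal node to (row, first_col, n_cols, children);
    children is a list of (child_name, bar_centre), with the centre given in
    column units measured from the panel's left edge.
    """
    if panels is None:
        panels = {}
    if not isinstance(node, dict):
        return 1, panels  # a terminal node takes one column and gets no panel
    width, children = 0, []
    for child, sub in node.items():
        child_width, _ = build_panels(child, sub, depth + 1, start + width, panels)
        children.append((child, width + child_width / 2))
        width += child_width
    panels[name] = (depth, start, width, children)
    return width, panels


def draw(panels, values):
    """Place every panel on one grid; all panels share a single y-axis."""
    import matplotlib.pyplot as plt  # imported here so the layout code stands alone

    n_rows = 1 + max(row for row, *_ in panels.values())
    n_cols = max(start + span for _, start, span, _ in panels.values())
    leftmost = {}  # row -> smallest first column used in that row
    for row, start, _, _ in panels.values():
        leftmost[row] = min(start, leftmost.get(row, start))

    fig = plt.figure()
    grid = fig.add_gridspec(n_rows, n_cols)
    shared = None
    for name, (row, start, span, children) in panels.items():
        ax = fig.add_subplot(grid[row, start:start + span], sharey=shared)
        if shared is None:
            shared = ax
        centres = [x for _, x in children]
        ax.bar(centres, [values[child] for child, _ in children], width=0.8)
        ax.set_xlim(0, span)  # one x unit per leaf column, so bars sit above their subtrees
        ax.set_xticks(centres)
        ax.set_xticklabels([child for child, _ in children])
        ax.set_title(name)
        if start != leftmost[row]:
            ax.tick_params(labelleft=False)  # y labels only on the leftmost panel of a row
    return fig


n_cols, panels = build_panels("A", tree)
print(n_cols, panels)
```

Calling `draw(panels, values)` returns the figure. Because every panel lives on the same grid, rows line up by construction, and a panel that terminates early simply occupies its own row and nothing below it.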

Related

Python - overlaying two series using matplot showing percentage change of the Series

I'm new to Python and would like to plot two series, S1 and S2, on top of each other so that they both start at the same point on the left-hand side and are then graphed as a percentage change and visually scaled (so they overlay on top of each other). So if S1's low/high points are 1000 and 2000 and S2's are 2 and 7, they can still be visualized together. Without this scaling you'd see S1 graphed normally and S2 would be a flat line at the bottom of the chart.
To recreate what I'm after on TradingView:
Go to www.tradingview.com
Choose BTCUSD, then click on > advanced graphs
Click "Compare or Add Symbol" (the "+" button at the top left-hand side)
Choose another series (e.g. ETHUSD)
Now you will see two series on one graph showing percentage change on the y axis
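A minimal sketch of the rescaling described above, assuming each series is a plain list of numbers with a non-zero first value. The sample data and the names `percent_change_from_start` and `plot_overlay` are made up for illustration.

```python
def percent_change_from_start(series):
    """Express every value as a percentage change relative to the first value."""
    first = series[0]
    return [(value - first) / first * 100 for value in series]


# Hypothetical data on very different scales.
s1 = [1000, 1500, 2000, 1200]
s2 = [2, 7, 4, 3]


def plot_overlay(named_series):
    """Plot each rebased series on one set of axes; all of them start at 0%."""
    import matplotlib.pyplot as plt  # imported here so the rescaling stands alone

    fig, ax = plt.subplots()
    for label, series in named_series.items():
        ax.plot(percent_change_from_start(series), label=label)
    ax.set_ylabel("% change from first value")
    ax.legend()
    return fig


print(percent_change_from_start(s1))
print(percent_change_from_start(s2))
```

`plot_overlay({"S1": s1, "S2": s2})` would then draw both lines from a common starting point, so neither one is flattened by the other's scale.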

Make a 2×2 grid with 3 plots. The 3rd plot should occupy both the lower spots

Aim - To make a 2×2 grid with lower row being occupied entirely by one axis.
Code so far:
import matplotlib.pyplot as plt
# xlist and ylist are the data lists, defined elsewhere
fig=plt.figure(figsize=[10,10])
ax0=fig.add_subplot(221)
ax0.scatter(xlist[:10],ylist[:10],color='red')
ax1=fig.add_subplot(222)
ax1.scatter(xlist[10:],ylist[10:],color='green')
ax3=fig.add_subplot(223)
ax3.scatter(xlist,ylist)
I want ax3 to occupy both lower spots in the 2nd row. I have a solution involving GridSpec, but I want to avoid GridSpec for now. This answer has answered it already, but I can't understand how plt.subplot(212) makes it work the way I want. How can it put a (212) in a (2×2) grid? I was told that (223) indicates the 3rd position in a 2×2 grid, so how can (212) be valid there?
Any help?
Note: The GridSpec method uses ax2=fig.add_subplot(gs[1,:]) to put ax2 at both the [1,0] and [1,1] positions.
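A sketch of why (212) is valid: the three digits are read as (rows, columns, index), and each add_subplot call describes its own grid, so the grids used by different calls on one figure do not have to match. `subplot_box` below is a hypothetical helper, not a matplotlib function, and it ignores margins and spacing between subplots.

```python
def subplot_box(code):
    """Return (left, bottom, width, height) in figure fractions for a 3-digit code."""
    nrows, ncols, index = (int(digit) for digit in str(code))
    row, col = divmod(index - 1, ncols)  # index counts from 1, row by row
    width, height = 1 / ncols, 1 / nrows
    return (col * width, 1 - (row + 1) * height, width, height)


for code in (221, 222, 223, 212):
    print(code, subplot_box(code))


def make_figure(xlist, ylist):
    """Two plots on top (2x2 grid) and one full-width plot below (2x1 grid)."""
    import matplotlib.pyplot as plt  # imported here so subplot_box stands alone

    fig = plt.figure(figsize=[10, 10])
    ax0 = fig.add_subplot(221)  # 2x2 grid, 1st cell: upper left
    ax0.scatter(xlist[:10], ylist[:10], color="red")
    ax1 = fig.add_subplot(222)  # 2x2 grid, 2nd cell: upper right
    ax1.scatter(xlist[10:], ylist[10:], color="green")
    ax2 = fig.add_subplot(212)  # 2x1 grid, 2nd cell: the whole lower half
    ax2.scatter(xlist, ylist)
    return fig
```

The box for 212 covers the same area as the boxes for 223 and 224 combined, which is why it fills the lower row. In reasonably recent matplotlib versions, `fig.add_subplot(2, 2, (3, 4))` should be another way to span positions 3 and 4.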

matplotlib 3D line plot, color coding throughout graph

I was wondering if there was an option in matplotlib to have different colors along one graph.
So far I managed to have a graph in a specific color as well as having multiple graphs in different colors.
However, all graphs I created so far have a singular color. I was wondering if I could use column c (see below) to color different parts of a graph.
In the example, I want to use the value "0.1" in column c with index 1 to color the graph from the first to the second data point, the value "0.2" in column c with index 2 to color the graph from the second to the third data point and so on.
data for one graph:
index x y z c
1 1 2 1 0.1
2 1 2 2 0.2
3 1 3 1 0.1
I found that I could color data points dependent on a fourth column in a 3D scatter plot and was wondering if that somehow works with line plots as well.
The only "workaround" I can think of is splitting my graph data into x sub-graphs (each subgraph data has only two data points - the start and end point) and color them according to the column c of the first data point. This would result in n-1 separate graphs for n data points however.
The solution, for anyone still looking, was to split my graph data into sub-graphs (each sub-graph has only two data points: the start and end point, i.e. the nth and (n+1)th point).
Then I colored each one according to the column c value of its first data point. This results in n-1 separate graphs for n data points.
Further explanation is in "Line colour of 3D parametric curve in python's matplotlib.pyplot".
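The splitting approach could look roughly like this. It is a sketch, assuming the rows are (x, y, z, c) tuples and a matplotlib version that accepts `projection="3d"` directly; the names `colored_segments` and `plot_colored_line` and the choice of the viridis colormap are illustrative.

```python
# Rows of (x, y, z, c), taken from the table in the question.
rows = [(1, 2, 1, 0.1), (1, 2, 2, 0.2), (1, 3, 1, 0.1)]


def colored_segments(rows):
    """Pair consecutive points; each segment takes the c value of its first point."""
    return [
        ((x0, y0, z0), (x1, y1, z1), c0)
        for (x0, y0, z0, c0), (x1, y1, z1, _c1) in zip(rows, rows[1:])
    ]


def plot_colored_line(rows, cmap_name="viridis"):
    """Draw one two-point line per segment, colored by that segment's c value."""
    import matplotlib.pyplot as plt  # imported here so the splitting stands alone
    from matplotlib.colors import Normalize

    segments = colored_segments(rows)
    colors = [c for _, _, c in segments]
    norm = Normalize(vmin=min(colors), vmax=max(colors))
    cmap = plt.get_cmap(cmap_name)
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    for start, end, c in segments:
        xs, ys, zs = zip(start, end)
        ax.plot(xs, ys, zs, color=cmap(norm(c)))
    return fig


segments = colored_segments(rows)
print(segments)
```

For the three sample rows this yields two segments, one colored by 0.1 and one by 0.2, matching the n-1 count described above.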

Reshape data into 'closest square'

I'm fairly new to Python. Currently, using matplotlib, I have a script that returns a variable number of subplots to make, which I pass to another script to do the plotting. I want to arrange these subplots into a nice arrangement, i.e., 'the closest thing to a square.' So that the answer is unique, let's say I weight the number of columns higher.
Examples: Let's say I have 6 plots to make; the grid I would need is 2x3. If I have 9, it's 3x3. If I have 12, it's 3x4. If I have 17, it's 4x5, but only two plots in the last row are created.
Attempt at a solution: I can easily find the closest square that's large enough:
num_plots = 6
square_size = ceil(sqrt(num_plots))**2
But this will leave empty plots. Is there a way to make the correct grid size?
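One way to get the grid shapes listed in the examples, as a sketch: take the number of columns as the ceiling of the square root, then use only as many rows as are needed. The function name `grid_shape` is an assumption, and the tie-breaking (more columns than rows) follows the stated preference.

```python
from math import isqrt


def grid_shape(num_plots):
    """Return (rows, cols) of the near-square grid that holds num_plots plots."""
    if num_plots < 1:
        raise ValueError("num_plots must be at least 1")
    cols = isqrt(num_plots - 1) + 1   # integer ceiling of sqrt(num_plots)
    rows = -(-num_plots // cols)      # integer ceiling of num_plots / cols
    return rows, cols


for n in (6, 9, 12, 17):
    print(n, grid_shape(n))
```

Using integer arithmetic avoids floating-point rounding for large counts. The grid may still contain unused cells in its last row, which can be hidden after the subplots are created.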
This is what I have done in the past:
from matplotlib import pyplot
num_plots = 6
nr = int(num_plots**0.5)
nc = num_plots//nr  # integer division; subplots() needs whole numbers
if nr*nc < num_plots:
    nr+=1
fig,axs = pyplot.subplots(nr,nc,sharex=True,sharey=True)
If you have a prime number of plots like 5 or 7, there's no way to do it unless you go one row or one column. If there are 9 or 15 plots, it should work.
The example below shows how to:
* Blank the extra empty plots
* Force the axis pointer to be a 2D array so you can index it generally even if there's only one plot or one row of plots
* Find the correct row and column for each plot as you loop through
Here it is:
from math import ceil
from numpy import array, mod
from matplotlib import pyplot

nplots=13
#find number of columns, rows, and empty plots
nc=int(nplots**0.5)
nr=int(ceil(nplots/float(nc)))
empty=nr*nc-nplots
#make the plot grid
f,ax=pyplot.subplots(nr,nc,sharex=True)
#force ax to be a 2D array so we can index it properly
if nplots==1:
    ax=array([ax])
if nc==1:
    ax=ax.reshape(nr,1)
if nr==1:
    ax=ax.reshape(1,nc)
#hide the unused subplots
for i in range(empty): ax[-(1+i),-1].axis('off')
#loop through subplots and make output
for i in range(nplots):
    ic=i//nr #find which column we're on (integer division). If the definitions of ir and ic are switched, the indices for empty (above) should be switched, too.
    ir=mod(i,nr) #find which row we're on
    axx=ax[ir,ic] #get a pointer to the subplot we're working with
    axx.set_title(i)
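The index arithmetic in that loop can be checked without drawing anything. The following is a sketch with hypothetical helper names (`column_major_cell`, `unused_cells`); it reproduces the fill order used above, where plots go down each column before moving to the next.

```python
from math import ceil


def column_major_cell(i, nr):
    """Return (row, column) of plot i when each column is filled top to bottom."""
    return i % nr, i // nr


def unused_cells(nplots, nr, nc):
    """List the grid cells that no plot lands in."""
    used = {column_major_cell(i, nr) for i in range(nplots)}
    return [(r, c) for c in range(nc) for r in range(nr) if (r, c) not in used]


nplots = 13
nc = int(nplots ** 0.5)
nr = int(ceil(nplots / nc))
print(nr, nc, unused_cells(nplots, nr, nc))
```

With this fill order the unused cells are always at the bottom of the last column, which is what the `ax[-(1+i),-1]` indexing relies on. As a side note, `pyplot.subplots` also accepts `squeeze=False`, which returns a 2D array of axes in every case and would make the three reshape branches unnecessary.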

Heatmap with varying y axis

I would like to create a visualization like the upper part of this image. Essentially, a heatmap where each point in time has a fixed number of components but these components are anchored to the y axis by means of labels (that I can supply) rather than by their first index in the heatmap's matrix.
I am aware of pcolormesh, but that does not seem to give me the y-axis functionality I seek.
Lastly, I am also open to solutions in R, although a Python option would be much preferable.
I am not completely sure if I understand your meaning correctly, but by looking at the picture you have linked, you might be best off with a roll-your-own solution.
First, you need to create an array with the heatmap values so that you have one row for each label and one column for each time slot. You fill the array with nans and then write whatever heatmap values you have to the correct positions.
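That first step might look like the sketch below. The input format (one dict of label-to-value per time slot), the sample data, and the names `build_heatmap_rows` and `show_heatmap` are assumptions for illustration; plain nested lists are used so the array-building part needs no extra libraries.

```python
import math


def build_heatmap_rows(labels, columns):
    """Return a nan-filled grid with one row per label and one column per time slot.

    `columns` is a list with one dict per time slot, mapping label -> value.
    """
    nan = float("nan")
    row_of = {label: i for i, label in enumerate(labels)}
    grid = [[nan] * len(columns) for _ in labels]
    for t, column in enumerate(columns):
        for label, value in column.items():
            grid[row_of[label]][t] = value
    return grid


# Hypothetical data: each time slot has two components, anchored to different labels.
labels = ["a", "b", "c", "d"]
columns = [{"a": 1.0, "b": 2.0}, {"b": 3.0, "c": 4.0}, {"c": 5.0, "d": 6.0}]
grid = build_heatmap_rows(labels, columns)


def show_heatmap(labels, grid):
    """Draw the grid with the labels on the y axis; nan cells are left blank."""
    import matplotlib.pyplot as plt  # imported here so the grid code stands alone

    fig, ax = plt.subplots()
    ax.imshow(grid, interpolation="nearest", origin="lower", aspect="auto")
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    return fig
```

Cells that were never written stay nan, so only the components present at each time slot are colored.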
Then you need to trick imshow a bit to scale and show the image in the correct way.
For example:
from numpy import arange, cumsum, linspace, meshgrid, nan, random, sin
from matplotlib.pyplot import axis, imshow, plot, show

# create some masked data
a=cumsum(random.random((20,200)), axis=0)
X,Y=meshgrid(arange(a.shape[1]),arange(a.shape[0]))
a[Y<15*sin(X/50.)]=nan
a[Y>10+15*sin(X/50.)]=nan
# draw the image along with some curves
imshow(a,interpolation='nearest',origin='lower',extent=[-2,2,0,3])
xd = linspace(-2, 2, 200)
yd = 1 + .1 * cumsum(random.random(200)-.5)
plot(xd, yd,'w',linewidth=3)
plot(xd, yd,'k',linewidth=1)
axis('auto') # was axis('normal'), which newer matplotlib versions no longer accept
show()
Gives:
