Reshape data into 'closest square' - python

I'm fairly new to Python. Using matplotlib, I have a script that returns a variable number of subplots to make, which I pass to another script that does the plotting. I want to arrange these subplots into a nice arrangement, i.e., 'the closest thing to a square.' So that the answer is unique, let's say I weight the number of columns higher.
Examples: let's say I have 6 plots to make; the grid I would need is 2x3. If I have 9, it's 3x3. If I have 12, it's 3x4. If I have 17, it's 4x5, with the last row only partially filled.
Attempt at a solution: I can easily find the closest square that's large enough:
from math import ceil, sqrt
num_plots = 6
square_size = ceil(sqrt(num_plots))**2
But this will leave empty plots. Is there a way to make the correct grid size?
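Concretely, the mapping I'm after can be sketched like this (the helper name grid_shape is mine): take the floor of the square root for the row count and let the columns absorb the remainder, which weights columns higher.

```python
import math

def grid_shape(num_plots):
    # rows = floor of the square root; columns absorb the remainder,
    # so ncols >= nrows when the count doesn't divide evenly
    nrows = int(math.sqrt(num_plots))
    ncols = math.ceil(num_plots / nrows)
    return nrows, ncols

print(grid_shape(6))   # (2, 3)
print(grid_shape(9))   # (3, 3)
print(grid_shape(12))  # (3, 4)
print(grid_shape(17))  # (4, 5)
```

For a prime count like 5 this gives 2x3, so one slot in the grid stays empty.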

This is what I have done in the past:
from matplotlib import pyplot

num_plots = 6
nr = int(num_plots**0.5)
nc = num_plots // nr  # integer division; plain / gives a float under Python 3
if nr*nc < num_plots:
    nr += 1
fig, axs = pyplot.subplots(nr, nc, sharex=True, sharey=True)

If you have a prime number of plots, like 5 or 7, there's no exact rectangle unless you go one row or one column; anything else leaves blank slots. Counts that factor nicely, like 9 or 15, work out evenly.
The example below shows how to:
* blank the extra empty plots
* force the axis pointer to be a 2D array so you can index it generally, even if there's only one plot or one row of plots
* find the correct row and column for each plot as you loop through
Here it is:
import numpy as np
from math import ceil
from matplotlib import pyplot

nplots = 13
# find number of columns, rows, and empty plots
nc = int(nplots**0.5)
nr = ceil(nplots / nc)
empty = nr*nc - nplots
# make the plot grid
f, ax = pyplot.subplots(nr, nc, sharex=True)
# force ax to have two axes so we can index it properly
if nplots == 1:
    ax = np.array([ax])
if nc == 1:
    ax = ax.reshape(nr, 1)
if nr == 1:
    ax = ax.reshape(1, nc)
# hide the unused subplots
for i in range(empty):
    ax[-(1+i), -1].axis('off')
# loop through subplots and make output
for i in range(nplots):
    ic = i // nr  # column index (plots fill column-major). If the definitions of
                  # ir and ic are switched, the indices for empty (above) should be switched, too.
    ir = i % nr   # row index
    axx = ax[ir, ic]  # the subplot we're working with
    axx.set_title(i)

Related

Figure with multiple traces in subplots

I'm trying to create a plot containing 3 subplots, each subplot containing a number of lines plus 2 threshold lines. So far I'm able to create the subplots and plot a couple of lines, but when I want to add more than 2 lines, it won't display them.
Here is the code I'm using:
from plotly import subplots
import plotly.graph_objects as go

# Make many subplots
for p_i in range(poses_values_array.shape[1] - 6):
    if p_i % 3 == 0:
        main_fig = subplots.make_subplots(rows=3, cols=1,
                                          subplot_titles=("lLeg", "rLeg", "Hip"))
    fig = go.Figure()
    # Threshold lines
    fig.add_trace(go.Scatter(x=list(range(poses_values_array.shape[2])),
                             y=[pose_max[p_i]] * poses_values_array.shape[2],
                             name=f'Max Pose {pose_motion[p_i%3]} {pose_names[int(p_i/3)]} Threshold'))
    fig.add_trace(go.Scatter(x=list(range(poses_values_array.shape[2])),
                             y=[pose_min[p_i]] * poses_values_array.shape[2],
                             name=f'Min Pose {pose_motion[p_i%3]} {pose_names[int(p_i/3)]} Threshold'))
    # Data
    for t_i in range(poses_values_array.shape[0]):
        fig.add_trace(go.Scatter(x=list(range(len(poses_values_array[t_i, p_i, :]))),
                                 y=poses_values_array[t_i, p_i, :],
                                 name=f'Target {t_i+1} - Pose {pose_motion[p_i%3]} {pose_names[int(p_i/3)]}'))
    fig.update_layout(title=f'Pose {p_i}',
                      xaxis_title='Dataset',
                      yaxis_title='Pose Value')
    fig.update_yaxes(autorange=False, zeroline=True, zerolinewidth=2, zerolinecolor='LightPink')
    # Update the subplots
    for i in range(poses_values_array.shape[0]):
        main_fig.append_trace(fig.data[i], row=(p_i % 3) + 1, col=1)
    main_fig.update_layout(title=f'Aggregated {pose_names[int(p_i/3)]} Pose {p_i}-{p_i+3}')
    # Update subplots individual subtitles
    main_fig.layout.annotations[p_i % 3].update(text=f"{pose_names[int(p_i/3)]} {pose_motion[p_i%3]} Pose")
I also tried placing the threshold lines after the for loop that plots the data; my current 2 lines of data (there will be more eventually) then show up, but the threshold lines still don't. I also tried fig.add_hline(), with the same result.
This is what results from the code. Ideally I would like to see the t_i lines of data in between the thresholds lines:
Hope I can get a hint of what I'm doing wrong.
Thanks!
Oh, wow, soon after posting this question, giving another read to my code, I found the error.
I was not counting the threshold lines as part of fig.data, so the loop on the # Update the subplots line was only copying the first 2 traces that were added. I just had to change for i in range(poses_values_array.shape[0]) to for i in range(poses_values_array.shape[0] + 2).

Plotting Hierarchical Quantitative Data

I am trying to visualize some quantitative data that can be summarized hierarchically. The data is arranged such that each node either:
has children, with this node's value the mean of the value of its children
has no children (is terminal) with a set value
For example:
A = mean(AA, AB) = mean(0.75, 0.85) = 0.8
AA = mean(AAA, AAB) = mean(0.6, 0.9) = 0.75
AAA = 0.6
AAB = 0.9
AB = 0.85
My current plan is a figure like the following:
Where node A is centered above the figure that shows its children, AD is centered above the subplot showing its children, etc...
My current solution is to approach it recursively using matplotlib and gridspec: traverse down the hierarchy, split the plot area using gridspec into a top row and the rest, then recurse on the columns in the row below. It "works", but there are a lot of little cases I have to handle, such as:
* making sure the leftmost figure has y-axis labels, which is awkward in cases like the bottom row, where the second column needs to know it is the leftmost real figure
* making sure figures like the CA-CB-CC figure, which terminate before the bottom row, don't extend beyond their row
* handling the vertical padding between plots so that long labels can be rotated 45 degrees without overlap
* when using gridspec, the subplots are not always vertically aligned (some are a bit taller or shorter than others)
To catch these I have to pass a lot of state around and fix edge cases. Are there any other plotting libraries or a different approach with matplotlib that might work better?
EDIT:
Adding a colored figure with the pattern of the recursion in my current attempt.
* first (top level) call partitions everything by the green boxes (2 rows, with the bottom row split into 3 col) then recursively plots the three columns independently
* second call (yellow) splits the bottom left into two rows, recursing on its child columns
* third call (orange) ...
* fourth call (red) ...
The downside to this is I don't think I can easily do a shared y-axis across a given row because they're isolated from their neighbors. The only caller that has that context is the top level, so I'd have to reach down into all the plots afterwards and link their y-axes.
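The recursive split described above can be sketched with nested SubplotSpecs (the tree layout and the plot_node helper are my own, not the poster's code); each dict node takes the top row of its region and its children recursively partition the row below:

```python
import matplotlib.pyplot as plt
from matplotlib import gridspec

# Toy hierarchy matching the question's example values
tree = {'A': {'AA': {'AAA': 0.6, 'AAB': 0.9}, 'AB': 0.85}}

def plot_node(fig, spec, name, node):
    if not isinstance(node, dict):
        # terminal node: draw a single bar with its set value
        ax = fig.add_subplot(spec)
        ax.bar([name], [node])
        return
    # split this region: top row for the node itself, bottom row for children
    outer = gridspec.GridSpecFromSubplotSpec(2, 1, subplot_spec=spec)
    ax = fig.add_subplot(outer[0])
    ax.set_title(name)
    inner = gridspec.GridSpecFromSubplotSpec(1, len(node), subplot_spec=outer[1])
    for col, (child, sub) in enumerate(node.items()):
        plot_node(fig, inner[col], child, sub)

fig = plt.figure()
plot_node(fig, gridspec.GridSpec(1, 1, figure=fig)[0], 'A', tree['A'])
```

This sidesteps the manual coordinate bookkeeping, but it has the same shared-y-axis limitation: axes in the same visual row are created by different recursive calls, so they would still have to be linked afterwards.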

Scatter plot for a matrix of a given form

Suppose I have a matrix where the first column holds the x points, the second column holds the y points, and the third and fourth columns are indicator variables (either 1 or 0) telling whether the point belongs to a particular 'cluster'. So if the third row has a 1 in column 3, the point in the third row belongs to, say, cluster 1, which is represented by column 3.
My question is: how do I create a figure, scatter-plot all the points belonging to cluster 1, and then on the same plot scatter the remaining points in another colour? In MATLAB, I would just say figure, then hold on, and write out my commands. I am new to plotting in Python and not sure how this would be performed.
EDIT:
I think I made it work. How would I, however, change the marker size depending on which cluster the point belongs to?
Let's start with how we'd do this in MATLAB.
Supposing you have N unique clusters, you can simply loop through as many clusters as you have and plot the points in a different colour. Also, we can change the marker size at each iteration. You'll need to use logical indexing to extract out the points that belong to each cluster. Given that your matrix is stored in M, something like this comes to mind:
rng(123); %// Set random seed
%// Total number of clusters
N = max(M(:,3));
%// Create a colour map
cmap = rand(N,3);
%// Store point sizes per cluster
sizes = [10 14 18];
figure; hold on; %// Create a blank figure and hold for changes
for ii = 1 : N
    %// Determine those points belonging to the ith cluster
    ind = M(:,3) == ii;
    %// Get the x and y coordinates
    x = M(ind,1);
    y = M(ind,2);
    %// Plot the points in a different colour
    plot(x, y, '.', 'Color', cmap(ii,:), 'MarkerSize', sizes(ii));
end
%// Create labels
labels = sprintfc('Label %d', 1:N);
%// Make our legend
legend(labels{:});
The code is pretty self-explanatory. You need to define your matrix M, and we determine the total number of clusters by taking the max of the third column. Next we create a random colour map with as many rows as there are clusters and three columns, giving a unique RGB colour per cluster. Each row defines the colour we'll use when plotting that cluster.
Next we create an array of sizes storing the marker size for each cluster. We create a blank figure, hold it for the changes we'll make, then iterate over each cluster of points. For each cluster, we pick out the right points in M through logical indexing, extract the x and y coordinates, and plot those points, manually specifying the colour as an RGB tuple along with the desired marker size.
We then create a cell array of labels denoting which set of points each cluster belongs to, then show a legend illustrating which points belong to which cluster given this array of labels.
Generating random data with random labels: 20 points uniformly distributed between [0,1] for both x and y, each given a random label from 1 to 3:
rng(123);
M = [rand(20,2) randi(3,20,1)];
I get this plot when I run the above code:
To get the equivalent in Python, well that's pretty easy. It's just a transcription from MATLAB to Python and the plotting mechanisms are exactly the same. You're using matplotlib and so I'm assuming numpy can be used as it's a dependency.
As such, the equivalent code would look something like this:
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)
# Total number of clusters
N = int(np.max(M[:,2]))
# Create a colour map
cmap = np.random.rand(N, 3)
# Store point sizes per cluster
sizes = np.array([10, 14, 18])
plt.figure()  # Create blank figure. No need to hold on
for ii in range(N):
    # Determine those points belonging to the ith cluster
    ind = M[:,2] == (ii + 1)
    # Get the x and y coordinates
    x = M[ind,0]
    y = M[ind,1]
    # Plot the points in a different colour
    # Also add in labels for the legend
    plt.plot(x, y, '.', color=tuple(cmap[ii]), markersize=sizes[ii],
             label='Cluster #' + str(ii + 1))
# Make our legend
plt.legend()
# Show the image
plt.show()
I won't bother explaining this one because it's pretty much the same as what you see in the MATLAB code. There are some nuances, such as the way hold on works in matplotlib: you don't need hold on, because any changes you make to the figure are remembered until you decide to show it. There is also the nuance that numpy and Python start indexing at 0 instead of 1.
Using the same data-generation code as in MATLAB:
M = np.column_stack([np.random.rand(20,2), np.random.randint(1,4,size=(20,1))])
I get this figure:

Heatmap with varying y axis

I would like to create a visualization like the upper part of this image. Essentially, a heatmap where each point in time has a fixed number of components but these components are anchored to the y axis by means of labels (that I can supply) rather than by their first index in the heatmap's matrix.
I am aware of pcolormesh, but that does not seem to give me the y-axis functionality I seek.
Lastly, I am also open to solutions in R, although a Python option would be much preferable.
I am not completely sure if I understand your meaning correctly, but by looking at the picture you have linked, you might be best off with a roll-your-own solution.
First, you need to create an array with the heatmap values so that you have one row for each label and one column for each time slot. You fill the array with NaNs and then write whatever heatmap values you have to the correct positions.
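That first step might look like this (the labels and observations here are toy data of my own):

```python
import numpy as np

labels = ['a', 'b', 'c', 'd']
ntime = 5
# one row per label, one column per time slot, NaN everywhere to start
grid = np.full((len(labels), ntime), np.nan)

# write each (label, time, value) observation into its cell
observations = [('a', 0, 1.0), ('c', 2, 2.5), ('d', 4, 0.3)]
row_of = {lab: i for i, lab in enumerate(labels)}
for lab, t, val in observations:
    grid[row_of[lab], t] = val
```

Cells left as NaN render as blanks with imshow, which is what anchors each component to its label row rather than to its matrix index.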
Then you need to trick imshow a bit to scale and show the image in the correct way.
For example:
import numpy as np
import matplotlib.pyplot as plt

# create some masked data
a = np.cumsum(np.random.random((20, 200)), axis=0)
X, Y = np.meshgrid(np.arange(a.shape[1]), np.arange(a.shape[0]))
a[Y < 15*np.sin(X/50.)] = np.nan
a[Y > 10 + 15*np.sin(X/50.)] = np.nan
# draw the image along with some curves
plt.imshow(a, interpolation='nearest', origin='lower', extent=[-2, 2, 0, 3])
xd = np.linspace(-2, 2, 200)
yd = 1 + .1 * np.cumsum(np.random.random(200) - .5)
plt.plot(xd, yd, 'w', linewidth=3)
plt.plot(xd, yd, 'k', linewidth=1)
plt.axis('auto')  # 'normal' was removed in newer matplotlib
Gives:

Avoid for-loops in assignment of data values

So this is a little follow-up question to my earlier question, Generate coordinates inside Polygon, and my answer: https://stackoverflow.com/a/15243767/1740928
In fact, I want to bin polygon data to a regular grid. Therefore, I calculate a couple of coordinates within the polygon and translate their lat/lon combination to their respective column/row combo of the grid.
Currently, the row/column information is stored in a numpy array with its number of rows corresponding to the number of data polygons and its number of columns corresponding to the coordinates in the polygon.
The rest of the code takes less than a second, but this part is the bottleneck at the moment (~7 seconds):
for ii in np.arange(len(data)):
    for cc in np.arange(data_lats.shape[1]):
        final_grid[row[ii,cc], col[ii,cc]] += data[ii]
        final_grid_counts[row[ii,cc], col[ii,cc]] += 1
The array "data" simply contains the data values for each polygon (80000,). The arrays "row" and "col" contain the row and column number of a coordinate in the polygon (shape: (80000,16)).
As you can see, I am summing up all data values within each grid cell and count the number of matches. Thus, I know the average for each grid cell in case different polygons intersect it.
Still, how can these two for loops take around 7 seconds? Can you think of a faster way?
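To be explicit about the averaging step I mean, here is a toy sketch (the arrays are made up): divide the summed grid by the count grid, leaving cells with no hits as NaN.

```python
import numpy as np

final_grid = np.array([[6.0, 0.0], [2.0, 9.0]])   # summed data values per cell
final_grid_counts = np.array([[3, 0], [1, 2]])    # number of polygons per cell
# divide only where the count is nonzero; untouched cells stay NaN
avg = np.full_like(final_grid, np.nan)
np.divide(final_grid, final_grid_counts, out=avg, where=final_grid_counts > 0)
```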
I think numpy should add an nd-bincount function. I had one lying around from a project I was working on some time ago:
import numpy as np

def two_d_bincount(row, col, weights=None, shape=None):
    if shape is None:
        shape = (row.max() + 1, col.max() + 1)
    row = np.asarray(row, 'int')
    col = np.asarray(col, 'int')
    x = np.ravel_multi_index([row, col], shape)
    out = np.bincount(x, weights, minlength=np.prod(shape))
    return out.reshape(shape)

weights = np.column_stack([data] * row.shape[1])
final_grid = two_d_bincount(row.ravel(), col.ravel(), weights.ravel())
final_grid_counts = two_d_bincount(row.ravel(), col.ravel())
I hope this helps.
I might not fully understand the shapes of your different grids, but you can maybe eliminate the cc loop using something like this:
final_grid = np.zeros((nrows, ncols))
for ii in range(len(data)):
    final_grid[row[ii,:], col[ii,:]] += data[ii]
This of course assumes that final_grid starts at zero (that the count you're incrementing starts at zero). Beware that fancy-indexed += only counts a repeated (row, col) pair once per statement, and I'm not sure how to test whether it works without understanding how your row and col arrays work.
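A variant that does handle repeated indices is np.add.at, which performs unbuffered in-place accumulation, so duplicate (row, col) pairs are all counted. A sketch with toy stand-ins for the arrays (3 "polygons" with 2 coordinates each instead of 80000 with 16):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])          # one value per polygon
row = np.array([[0, 0], [1, 1], [0, 1]])  # grid row per polygon coordinate
col = np.array([[0, 0], [1, 0], [0, 1]])  # grid column per polygon coordinate

final_grid = np.zeros((2, 2))
final_grid_counts = np.zeros((2, 2))
# repeat each data value once per coordinate, then accumulate:
# np.add.at adds every occurrence, even when (row, col) pairs repeat
weights = np.repeat(data, row.shape[1])
np.add.at(final_grid, (row.ravel(), col.ravel()), weights)
np.add.at(final_grid_counts, (row.ravel(), col.ravel()), 1)

print(final_grid)         # [[5. 0.] [2. 5.]]
print(final_grid_counts)  # [[3. 0.] [1. 2.]]
```

Note that polygon 0 hits cell (0, 0) twice and both hits are recorded, which is exactly what plain fancy-indexed += would miss.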
