Why are matplotlib subplots so far away from each other? - python

So, in general, when I make subplots with multiple rows, the result looks ugly: the distance between rows is much bigger than the distance between columns.
Here is an example in which I show the same image, contained in Y[...,0], on a 2*4 subplot grid:
import matplotlib.pyplot as plt
plt.figure(figsize=(10,10))
for i in range(8):
    plt.subplot(2,4,i+1)
    plt.imshow(Y[...,0])
plt.show()
Screenshot of the result (image not included): there is a big white space between the two rows.
Is there a way to fix that?
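A likely cause, assuming the images in Y are roughly square: imshow keeps square pixels by default (aspect="equal"), so in a square 10x10 figure each of the two rows gets far more height than a square image can use, and the leftover height shows up as space between the rows. The sketch below uses a random array as a hypothetical stand-in for Y, gives the figure the same 4:2 width-to-height ratio as the grid, and shrinks the padding with fig.subplots_adjust.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's Y: one square 64x64 image.
Y = np.random.rand(64, 64, 1)

nrows, ncols = 2, 4
# imshow keeps a 1:1 pixel aspect, so match the figure's width:height
# to the grid (4 columns : 2 rows) instead of using a square 10x10 figure.
fig, axes = plt.subplots(nrows, ncols, figsize=(10, 5))
for ax in axes.flat:
    ax.imshow(Y[..., 0])
    ax.axis("off")  # hide ticks so they do not take up space

# Reduce the horizontal and vertical padding between panels.
fig.subplots_adjust(wspace=0.05, hspace=0.05)
plt.show()
```

If the images are not square, scale the figure height by the image's height/width ratio instead of using a fixed 10x5.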

Related

How to plot a zero-one 2D matrix that will look like a scatter?

This might be a strange question, but I am wondering if it's possible to replace a 2D matrix made up of ones and zeros with a scatter plot of, say, black dots wherever there is a one and nothing where there is a zero.
Unfortunately I don't have a good reproducible example, but I have a 2D array made up of zeros and ones (shape 275 x 357).
I am hoping to cover the areas made up of ones with small black dots (presumably as a scatter plot, which will later be overlaid on another contour plot).
In my figure (not included here), the original contour plot is on the left and the idea I'm going for is on the right (imagine more black dots, only on the areas made up of ones).
I tried making a reproducible array here:
import numpy as np
import matplotlib.pyplot as plt
# array of ones and zeros
array = np.array(([0,0,0,0,0,0,1,1,0,0,0,1,1,1], [0,1,0,0,0,1,0,0,0,0,0,1,0,1]))
plt.pcolormesh(array)
plt.show()
I tried using the following example and applying it to the 2D array, but I am getting some errors:
import numpy as np
import matplotlib.pyplot as plt
# as an example, borrowed from: https://stackoverflow.com/questions/41133419/how-to-do-the-scatter-plot-for-the-lists-or-2d-array-or-matrix-python
X = [[0,3,4,0,1,1],
     [0,0,0,5,1,1],
     [6,7,0,8,1,1],
     [3,6,1,5,6,1]]
Y = [12,15,11,10]
x_arr = np.array(X)
y = np.array(Y)
fig, ax = plt.subplots()
#colors=list('bgrcmykw')
# each column of X (a row of X.T) is scattered against the same y values
for i, x in enumerate(x_arr.T):
    ax.scatter(x, y, c='k', s=5)
plt.show()
My goal is to convert this 2D matrix of ones and zeros into a scatter plot (or some other kind of graph) in which the ones are black dots and the zeros are left empty. This will later be overlaid on another contour plot. How might I go about turning the ones into a scatter plot of black dots?
Here's what I would do. To reduce the cost of drawing the figure, I plotted only every other point (the [::2] slices); you might want to do the same if you have a lot of points. Either way, you can change that to suit your needs.
import numpy as np
from matplotlib import pyplot as plt
np.random.seed(0)
mask = np.random.randint(0, 2, (20, 20))
ys, xs = np.where(mask.astype(bool))
plt.imshow(mask)
plt.scatter(xs[::2], ys[::2])
plt.show()
Output: (figure not included)
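For the overlay the question asks about, the dots have to be in the same coordinate system as the contour plot. A minimal sketch, assuming the contour plot is drawn from coordinate grids X and Y with the same shape as the zero-one array: boolean indexing with the mask picks out the coordinates of every one. The grids and the background field below are hypothetical placeholders for the real contour data.

```python
import numpy as np
import matplotlib.pyplot as plt

# The question's small zero-one array (the real one is 275 x 357).
array = np.array([[0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1],
                  [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1]])

# Hypothetical coordinate grids and background field for the contour plot;
# replace them with the grids used for the real contour plot.
ny, nx = array.shape
X, Y = np.meshgrid(np.arange(nx), np.arange(ny))
field = np.sin(X / 3.0) + Y

mask = array.astype(bool)

fig, ax = plt.subplots()
ax.contourf(X, Y, field, cmap="viridis")
# X[mask] and Y[mask] are the coordinates of every cell equal to 1,
# so the black dots land on the same grid as the contours.
ax.scatter(X[mask], Y[mask], c="k", s=10)
plt.show()
```

Zeros simply produce no point, so nothing needs to be drawn for them.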

Use color intensities instead of bubble size for weighted scatter plot

Is there any way to use color intensity instead of bubble size for a weighted scatter plot? I have been searching online for solutions for hours, but I still have not found one. I use the following penguins data for illustration.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
penguins_data="https://raw.githubusercontent.com/datavizpyr/data/master/palmer_penguin_species.tsv"
penguins_df = pd.read_csv(penguins_data, sep="\t")
sns.set_context("talk", font_scale=1.1)
plt.figure(figsize=(10,6))
sns.scatterplot(x="culmen_length_mm",
                y="culmen_depth_mm",
                size="body_mass_g",
                sizes=(20,500),
                alpha=0.5,
                data=penguins_df)
# Put the legend out of the figure
plt.legend(bbox_to_anchor=(1.01, 1), borderaxespad=0)
# Alternative position, also outside the figure
#plt.legend(bbox_to_anchor=(1.01, 0.54), borderaxespad=0.)
plt.xlabel("Culmen Length (mm)")
plt.ylabel("Culmen Depth (mm)")
plt.title("Bubble plot in Seaborn")
plt.tight_layout()
plt.savefig("Bubble_plot_size_range_Seaborn_scatterplot.png",
            format='png', dpi=150)
In the resulting bubble plot, the smallest bubble corresponds to the smallest body mass and the biggest bubble to the largest body mass. However, I need color intensity for the weighted scatter plot instead: for example, a darker color should indicate that a value occurs more frequently, and a lighter color that it occurs less frequently. Any suggestion using Stata (preferred), Python, or R is highly appreciated.
I found something in Stata like this one, but my data structure is completely different, so it does not work out.
Have you considered creating a new column in your dataframe for the color, in which you set the alpha channel yourself?
After that, you can probably follow this question to use that column as the hue for the markers.
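Two Python variants of "colour instead of size", sketched with plain matplotlib and synthetic data standing in for the penguins columns (the values are hypothetical). The left panel maps each point's body mass through a colormap via the c= argument of scatter, so heavier penguins are darker. The right panel is for the "darker means more frequent" reading: hexbin counts the points in each hexagon and colours by that count. In seaborn, passing a numeric column as hue (for example hue="body_mass_g" with palette="Blues") gives a similar continuous colour mapping to the left panel.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the penguins columns (hypothetical values).
rng = np.random.default_rng(0)
length = rng.normal(44, 5, 300)
depth = rng.normal(17, 2, 300)
mass = rng.normal(4200, 800, 300)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Weight shown by colour: c= maps body mass through the "Blues" colormap,
# so larger masses are drawn darker instead of bigger.
sc = ax1.scatter(length, depth, c=mass, cmap="Blues",
                 edgecolors="k", linewidths=0.3)
fig.colorbar(sc, ax=ax1, label="Body mass (g)")
ax1.set(xlabel="Culmen Length (mm)", ylabel="Culmen Depth (mm)")

# Frequency shown by colour: hexbin counts points per hexagon,
# so darker cells are where observations occur more often.
hb = ax2.hexbin(length, depth, gridsize=15, cmap="Blues", mincnt=1)
fig.colorbar(hb, ax=ax2, label="Count")
ax2.set(xlabel="Culmen Length (mm)")
plt.show()
```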

Turning matplotlib grid of shaded values into a series of bar charts, one per row?

Using matplotlib, I can create figures that look like this (figure not included).
Here, each row consists of a series of numbers from 0 to 0.6. The left-hand axis text indicates the maximum value in each row. The bottom axis text represents the column indices.
The code for the actual grid essentially involves this line:
im = ax[r,c].imshow(info_to_use, vmin=0, vmax=0.6, cmap='gray')
where ax[r,c] is the current subplot axes at row r and column c, and info_to_use is a numpy array of shape (num_rows, num_cols) and has values between 0 and 0.6.
I am wondering if there is a way to convert the code above so that it instead displays bar charts, one per row? Something like this hand-drawn figure:
(The number of columns in my hand-drawn figure is not the same as in the earlier one.) I know this would result in a very hard-to-read plot if it had as many rows as the first one, so I would use it only for plots with fewer rows, which would make the bars easier to read.
The references that helped me make the first plot above were mostly from:
Python - Plotting colored grid based on values
custom matplotlib plot : chess board like table with colored cells
https://matplotlib.org/3.1.1/gallery/subplots_axes_and_figures/colorbar_placement.html#sphx-glr-gallery-subplots-axes-and-figures-colorbar-placement-py
https://matplotlib.org/3.1.1/gallery/images_contours_and_fields/image_annotated_heatmap.html#sphx-glr-gallery-images-contours-and-fields-image-annotated-heatmap-py
But I'm not sure how to make the jump from these to a bar chart in each row. Or at least something that could mimic one, e.g., instead of shading the full cell gray, shading only a fraction of it proportional to the value relative to vmax?
import numpy as np
from matplotlib import pyplot as plt
a = np.random.rand(10,20)*.6
In a loop, call plt.subplot then plt.bar for each row in the 2-d array.
for i, thing in enumerate(a,1):
    plt.subplot(a.shape[0],1,i)
    plt.bar(range(a.shape[1]),thing)
plt.show()
plt.close()
Or, create all the subplots; then in a loop make a bar plot with each Axes.
fig, axes = plt.subplots(a.shape[0],1,sharex=True)
for ax, data in zip(axes, a):
    ax.bar(range(a.shape[1]), data)
plt.show()
plt.close()
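The question's alternative idea, shading only part of each cell in proportion to value / vmax, can also be drawn on a single Axes: give each row a horizontal band of height 1 and draw its bars from the bottom of that band. A sketch, with random data as a hypothetical stand-in for info_to_use and vmax = 0.6 as in the question:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
info_to_use = rng.random((5, 12)) * 0.6  # hypothetical stand-in data
vmax = 0.6
num_rows, num_cols = info_to_use.shape

fig, ax = plt.subplots(figsize=(8, 4))
for r in range(num_rows):
    # Row 0 gets the top band; every band is 1 unit tall.
    base = num_rows - 1 - r
    # Each bar fills the fraction value / vmax of its band.
    heights = info_to_use[r] / vmax
    ax.bar(np.arange(num_cols), heights, width=0.9, bottom=base, color="gray")
    ax.axhline(base, color="k", linewidth=0.5)  # band separator

ax.set_ylim(0, num_rows)
ax.set_xticks(np.arange(num_cols))
# Label each band with its row maximum, as in the original heatmap
# (reversed because band 0 at the bottom holds the last row).
ax.set_yticks(np.arange(num_rows) + 0.5)
ax.set_yticklabels([f"{m:.2f}" for m in info_to_use.max(axis=1)[::-1]])
plt.show()
```

Because every band has the same height, a full-height bar means the value equals vmax, which mirrors the fixed vmin/vmax scaling of the imshow version.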

What do the numbers on the colorbar of a 2D histogram mean?

I have plotted a 2D histogram with the following Python code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x,y,z,a = np.loadtxt('bca_16_t1.txt', unpack=True, delimiter=',')
plt.hist2d(a, z, bins=(200, 200), cmap=plt.cm.jet)
plt.ylim([2.0, 4.6])
plt.xlim([700, 1300])
# use the Axes hist2d drew on; in recent matplotlib, plt.axes() adds a new, empty Axes
ax = plt.gca()
ax.xaxis.set_major_locator(ticker.MultipleLocator(50))
ax.yaxis.set_major_locator(ticker.MultipleLocator(0.2))
plt.grid()
plt.colorbar(fraction=0.15, shrink=1.0, aspect=20)
plt.show()
I have the following questions about it:
How do I remove the white space at the leftmost end of the plot?
What is the unit of the numbers on the colour bar, and what do those numbers mean?
Any help regarding this will be much appreciated.
Thank you
You can set the lower x limit to the minimum of your x-axis data. You set 700, but it seems you could go slightly higher than that.
The numbers should be your z values.
First of all, my answers are only guesses. I have no experience with that package.
To remove the white space at the leftmost end of the plot, set plt.xlim to something bigger than 700. Try the lowest value of your data on that axis.
I bet the unit of the colour bar is related to your data. What is the unit of your data, and what does it mean?
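You can limit the left side using plt.xlim(left=a.min()), since a is the variable plotted on the x-axis.
The values on the colorbar are the frequency of your data in the corresponding bin: for example, if you have 5 elements in the range 700<a<703 and 2<z<2.013, the colorbar would show 5 for that bin. If you want them normalized, use hist2d(..., density=True) (older matplotlib versions called this parameter normed); the values then form a probability density that integrates to 1 over the plot, so individual bins are not necessarily in the range zero to one.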
I figured it out.
How do I remove the white space at the leftmost end of the plot?
A. The white space is due to an improper x-axis limit. Remove the lower limit of the x-axis to get rid of the white space. To set only the upper limit of the x-axis, add the following line:
plt.xlim(xmax=1300)
What is the unit of the numbers against the colour bar and what do those numbers mean?
A. The colour bar shows counts: the number of binned 'events' or 'data points' that fall into each bin. Changing the 'bins' parameter in plt.hist2d changes the bin size and therefore the counts, which changes the scale.
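The answers above can be checked directly, because hist2d returns the array it colours along with the bin edges and the image. A sketch with random data as hypothetical stand-ins for the columns a and z from the text file: the raw counts sum to the number of points, and with density=True the values times the bin areas sum to 1.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the columns a and z read with np.loadtxt.
a = rng.normal(1000, 80, 10_000)
z = rng.normal(3.3, 0.3, 10_000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Raw counts: the colorbar shows how many (a, z) points fall in each bin.
counts, xedges, yedges, im1 = ax1.hist2d(a, z, bins=(200, 200), cmap="jet")
fig.colorbar(im1, ax=ax1, label="Points per bin")

# density=True divides by (total count * bin area), so the values
# integrate to 1 over the plot rather than being capped at 1.
dens, dx_edges, dy_edges, im2 = ax2.hist2d(a, z, bins=(200, 200),
                                           cmap="jet", density=True)
fig.colorbar(im2, ax=ax2, label="Probability density")

bin_area = np.outer(np.diff(dx_edges), np.diff(dy_edges))
print(counts.sum())             # total number of points
print((dens * bin_area).sum())  # approximately 1
plt.show()
```

By default the bins span the data's own minimum and maximum, so hist2d leaves no empty strip at the edges unless xlim or the range parameter widens the axis.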

How can I have the size of matplotlib heatmap subplots reflect the number of rows in the data for each subplot?

I'm using matplotlib in python to create heatmaps for different clusters I've created using k-means clustering. Right now I'm able to produce this figure:
But I want the number of rows in each cluster reflected in the size of the heatmap, instead of them all being scaled to the same size. Is GridSpec the right way to do this? It's the only thing I can find trying to Google the solution, but it seems more suited to situations where you have subplots on a grid and you want a certain subplot to span more than one row or column on the grid. In this situation, I would be creating a grid with thousands of rows and telling each subplot to span hundreds of them. Is this still the best way to do it?
Edit: In case my question isn't clear, I'm ultimately trying to create a figure like this one. Notice how it's easy to see in the left figure that cluster E is larger than cluster F:
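GridSpec has a height_ratios argument. You can set it to a list of the number of rows in each heatmap, so that each subplot's height is proportional to its cluster size; plt.subplots passes it through gridspec_kw.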
import numpy as np
import matplotlib.pyplot as plt
data = [np.random.rand(n,8) for n in [3,7,10,4]]
fig, axes = plt.subplots(nrows=len(data),
                         gridspec_kw=dict(height_ratios=[d.shape[0] for d in data]))
for ax, d in zip(axes, data):
    ax.imshow(d)
    ax.tick_params(labelbottom=False)
plt.show()
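With real k-means clusters of hundreds or thousands of rows and only a few columns, imshow's default square pixels can make each panel a very narrow strip. A variant of the answer's approach, assuming random clusters of hypothetical sizes: aspect="auto" lets each image fill its GridSpec cell, so the panel heights follow height_ratios exactly, and each panel is labelled with its cluster size.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical clusters with very different sizes, 8 features each.
sizes = [300, 1200, 2000, 500]
clusters = [rng.random((n, 8)) for n in sizes]

fig, axes = plt.subplots(nrows=len(clusters), figsize=(4, 8), sharex=True,
                         gridspec_kw=dict(height_ratios=sizes, hspace=0.05))
for ax, d, label in zip(axes, clusters, "ABCD"):
    # aspect="auto" stretches the image to fill its axes; with the default
    # square pixels, thousands of rows by 8 columns would be a thin sliver.
    ax.imshow(d, aspect="auto", interpolation="nearest")
    ax.set_yticks([])
    ax.set_ylabel(f"{label}\n(n={d.shape[0]})", rotation=0,
                  ha="right", va="center")
plt.show()
```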
