Make two plots in a single row using imshow in Python

I want to plot the same data for two regions, so the plots should be placed side by side in a single row for compactness.
Below is my code. As expected, it creates the plots one after another in a column, since the two plots are not linked to each other.
I tried looking for side-by-side plotting methods, but they don't seem to work with imshow(), and I need to use imshow() in my code. I've attached my output below as well.
Any suggestions?
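import numpy as np
import matplotlib.pyplot as plt

# fanibl is the already-opened dataset that holds the 'IMG_WV' variable
wvcount3D = fanibl['IMG_WV'][:]
wvcount3D.shape
wvcount = np.squeeze(wvcount3D)  # drop length-1 dimensions to get a 2-D array
wvcount.shape
# First figure: the full image
image_wv_global = plt.imshow(wvcount, vmin=800, vmax=975)
plt.title("Global 2D plot for Water Vapor Count for Fani before landfall")
plt.colorbar(image_wv_global)
plt.grid()
plt.show()
# Second figure: the same data, zoomed to the Asia region
image_wv_asia = plt.imshow(wvcount, vmin=800, vmax=975)
plt.title("Asia-specific plot for Water Vapor Count for Fani before landfall")
plt.axis([300, 1000, 200, 700])
plt.colorbar(image_wv_asia)
plt.grid()
plt.show()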
This is my current output, plots in a single column:
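One way to get both panels into a single row, sketched here rather than taken from the original thread: create the two Axes up front with plt.subplots(1, 2) and call imshow, colorbar, grid and the limit setters on each Axes object instead of on the implicit "current" plot. The random array is a hypothetical stand-in for wvcount; the region limits are the ones from the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop this line to get an interactive window
import matplotlib.pyplot as plt

# Hypothetical stand-in for wvcount; in the question it comes from fanibl['IMG_WV']
rng = np.random.default_rng(0)
wvcount = rng.uniform(800, 975, size=(1000, 1200))

# One figure with one row of two Axes
fig, (ax_global, ax_asia) = plt.subplots(1, 2, figsize=(14, 5))

# Left panel: the full image
im_global = ax_global.imshow(wvcount, vmin=800, vmax=975)
ax_global.set_title("Global 2D plot for Water Vapor Count for Fani before landfall")
fig.colorbar(im_global, ax=ax_global)
ax_global.grid()

# Right panel: the same data, zoomed to the region used in the question
im_asia = ax_asia.imshow(wvcount, vmin=800, vmax=975)
ax_asia.set_title("Asia-specific plot for Water Vapor Count for Fani before landfall")
ax_asia.set_xlim(300, 1000)
ax_asia.set_ylim(700, 200)  # larger value first keeps imshow's top-down row order
fig.colorbar(im_asia, ax=ax_asia)
ax_asia.grid()

fig.tight_layout()
plt.show()
```

A note on the limits: plt.axis([300, 1000, 200, 700]) sets the y-limits to (200, 700), which turns imshow's downward-pointing y-axis upward, so the zoomed panel is drawn upside down compared with the global one. Passing the larger y-value first, as above, keeps both panels in the same orientation.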

Related

Python stacked barchart where y-axis scale is linear but the bar fill is logarithmic in the order of 10s

As the title explains, I am trying to reproduce a stacked bar chart where the y-axis scale is linear but the inside fill of the plot (i.e. the stacked bars) is logarithmic and grouped by powers of 10.
I have made this plot before in RStudio with an in-house package, but I am trying to reproduce it with other tools (Python) to validate and confirm my analysis.
A quick, more detailed description of the data:
I have thousands of entries of clonal cell information. They have multiple identifiers, such as "Strain", "Sample", "cloneID", as well as a frequency value ("cloneFraction") for each clone.
This is the .head() of the dataset I am working with, to give you an idea of my data:
I am trying to reproduce the following plot, which I made in RStudio:
In this plot the dataset is divided into groups based on frequency: the top 10 most frequent clones are grouped in red, followed by the next 100, the next 1000, and so on. The y-axis uses a 0.00-1.00 scale, but a 0-100% scale would be equivalent here.
This is just to visualize whether I have big clones (the top 10) and how much of the overall frequency they account for: the bigger the red stack, the larger my clones, which would signify a significant clonal expansion of a few selected cells in my sample.
What I have done so far:
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
%matplotlib inline
(MYDATAFRAME.groupby(['Sample', 'cloneFraction']).size()
    .groupby(level=0).apply(lambda x: 100 * x / x.sum())
    .unstack()
    .plot(kind='bar', stacked=True, legend=None))
plt.yscale('log')
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter())
plt.show()
This produces the following plot:
Now, I realize there is no order in the stacked plot, so the most frequent clones aren't on top; it just stacks in the order of the entries in my dataset (which I assume I can fix by sorting my DataFrame by the column of interest).
Apart from the axis breaking and not showing % when I use a log scale (a secondary issue), I can't figure out how to group the data entries by frequency as described above.
I have tried things such as:
temp = X.SOME_IDENTIFIER.value_counts()
temp2 = temp.head(10)
if len(temp) > 10:
    temp2['remaining {0} items'.format(len(temp) - 10)] = sum(temp[10:])
temp2.plot(kind='pie')
Just to see whether I could separate them correctly, but this does not achieve what I want (apart from producing a pie chart, which I changed in my code).
I have also tried using iloc[n:n] to select specific entries, but I can't get that working either: I get errors when I add it to the plotting code above, and if I use it without the extra formatting (% scale, etc.), the stacked bar plot just shows the top 10 out of all 4 samples in my data, rather than the top 10 per sample. I also don't know how to get the next 100, 1000, etc.
If you have any suggestions and can help in any way, that would be much appreciated!
Thanks
I solved this as follows:
I created a new column holding the category each clone falls into, based on its value (i.e. whether it is among the top 10 most frequent, the next 100, and so on).
# Default: every clone starts in the widest bucket
df['category'] = '10001+'
for sampleref in df.sample_ref.unique().tolist():
    print(f'Setting sample {sampleref}')
    rows = df[df.sample_ref == sampleref]
    # Overwrite from the widest top-N set down to the narrowest, so each clone
    # ends up labelled with the smallest bucket it belongs to
    df.loc[rows.nlargest(10000, 'cloneCount').index, 'category'] = '1001-10000'
    df.loc[rows.nlargest(1000, 'cloneCount').index, 'category'] = '101-1000'
    df.loc[rows.nlargest(100, 'cloneCount').index, 'category'] = '11-100'
    df.loc[rows.nlargest(10, 'cloneCount').index, 'category'] = 'top10'
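The code starts with the widest group (every clone defaults to 10001+) and then assigns progressively narrower top-N sets. Because each top-N set contains the smaller ones, a clone that falls into several sets is overwritten by the later, narrower assignment and ends up in the smallest group it belongs to.
I then plotted the samples with this code: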
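import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

fig, ax = plt.subplots(figsize=(15, 7))
# Sum the clone fractions in each (Sample, category) pair; one column per category
df.groupby(['Sample', 'category'])['cloneFraction'].sum().unstack().plot(ax=ax, kind="bar", stacked=True)
plt.xticks(rotation=0)
ax.yaxis.set_major_formatter(mtick.PercentFormatter(1))  # fractions in [0, 1] shown as %
# Reverse the legend so its top-to-bottom order matches the stacking order
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1], title='Clonotype', bbox_to_anchor=(1.04, 0), loc="lower left", borderaxespad=0)
plt.show()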
And here are the results:
I hope this helps anyone struggling with the same issue!
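A vectorised alternative to the loop above, sketched on made-up data rather than taken from the thread: rank clones within each sample with groupby().rank(), then bin the ranks with pd.cut. The toy DataFrame and the helper column clone_rank are hypothetical, and a single "Sample" column is assumed for both ranking and plotting (the answer above ranks by sample_ref and plots by Sample).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two samples with 150 clones each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Sample": np.repeat(["S1", "S2"], 150),
    "cloneCount": rng.integers(1, 1000, size=300),
})
# Per-sample frequency of each clone (sums to 1 within a sample)
df["cloneFraction"] = df["cloneCount"] / df.groupby("Sample")["cloneCount"].transform("sum")

# Rank clones within each sample: 1 = most frequent; method="first" breaks ties by row order
df["clone_rank"] = df.groupby("Sample")["cloneCount"].rank(method="first", ascending=False)

# Right-closed bins: (0, 10] -> top10, (10, 100] -> 11-100, and so on
bins = [0, 10, 100, 1000, 10000, np.inf]
labels = ["top10", "11-100", "101-1000", "1001-10000", "10001+"]
df["category"] = pd.cut(df["clone_rank"], bins=bins, labels=labels)

# Share of each sample's total frequency held by each bucket (one row per sample)
shares = (
    df.groupby(["Sample", "category"], observed=False)["cloneFraction"]
    .sum()
    .unstack()
)
print(shares)
```

The shares table has one row per sample and one column per bucket, so it can go straight into shares.plot(kind="bar", stacked=True) with the same percent formatter and reversed legend as above.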

Turning matplotlib grid of shaded values into a series of bar charts, one per row?

Using matplotlib, I can create figures that look like this:
Here, each row consists of a series of numbers from 0 to 0.6. The left hand axis text indicates the maximum value in each row. The bottom axis text represents the column indices.
The code for the actual grid essentially involves this line:
im = ax[r,c].imshow(info_to_use, vmin=0, vmax=0.6, cmap='gray')
where ax[r,c] is the current subplot axes at row r and column c, and info_to_use is a numpy array of shape (num_rows, num_cols) and has values between 0 and 0.6.
I am wondering if there is a way to convert the code above so that it instead displays bar charts, one per row? Something like this hand-drawn figure:
(The number of columns in my hand-drawn figure differs from the earlier one.) I know this would be very hard to read if it were embedded in a plot like the first one here; I would use it for a plot with fewer rows, which would make the bars easier to read.
The references that helped me make the first plot above were mostly from:
Python - Plotting colored grid based on values
custom matplotlib plot : chess board like table with colored cells
https://matplotlib.org/3.1.1/gallery/subplots_axes_and_figures/colorbar_placement.html#sphx-glr-gallery-subplots-axes-and-figures-colorbar-placement-py
https://matplotlib.org/3.1.1/gallery/images_contours_and_fields/image_annotated_heatmap.html#sphx-glr-gallery-images-contours-and-fields-image-annotated-heatmap-py
But I'm not sure how to make the jump from these to a bar chart in each row. Or, failing that, something that could mimic it: for example, instead of shading the full cell gray, shading only a fraction of it proportional to the value relative to vmax?
import numpy as np
from matplotlib import pyplot as plt
a = np.random.rand(10,20)*.6
In a loop, call plt.subplot then plt.bar for each row in the 2-d array.
for i, thing in enumerate(a, 1):
    plt.subplot(a.shape[0], 1, i)
    plt.bar(range(a.shape[1]), thing)
plt.show()
plt.close()
Or, create all the subplots; then in a loop make a bar plot with each Axes.
fig, axes = plt.subplots(a.shape[0], 1, sharex=True)
for ax, data in zip(axes, a):
    ax.bar(range(a.shape[1]), data)
plt.show()
plt.close()
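The question's second idea (shade only part of each cell, in proportion to value/vmax) can also be drawn in a single Axes by stacking one band per row and giving each band's bars a bottom offset. This is a sketch on hypothetical random data; the 0.9 fill factor and 0.8 bar width are arbitrary choices that leave gaps between bands and bars.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

vmax = 0.6
rng = np.random.default_rng(0)
a = rng.random((5, 20)) * vmax  # hypothetical data: 5 rows, values in [0, 0.6)
num_rows, num_cols = a.shape
cols = np.arange(num_cols)

fig, ax = plt.subplots(figsize=(10, 4))
for r, row in enumerate(a):
    base = num_rows - 1 - r     # row 0 at the top, as in the imshow version
    heights = 0.9 * row / vmax  # fill a fraction value/vmax of each cell (0.9 leaves a gap)
    ax.bar(cols, heights, width=0.8, bottom=base, color="gray")

# Label each band with its row maximum, like the left axis of the question's figure;
# ticks run bottom to top, so the labels are the row maxima in reverse row order
ax.set_yticks(np.arange(num_rows) + 0.45)
ax.set_yticklabels([f"{m:.2f}" for m in a.max(axis=1)[::-1]])
ax.set_xticks(cols)
ax.set_xlim(-0.5, num_cols - 0.5)
ax.set_ylim(0, num_rows)
plt.show()
```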

How to overlay scatter plot on top of a line plot using matplotlib?

So I have a line plot, and I want to add markers on only some of the points along the plot (I have detected the peaks in the plot and want to mark them). When I plot without the peaks labelled it works as it should, and when I plot the peaks alone it seems to plot them properly, but when I try to plot them on the same plot, the line plot disappears over most of the graph and seems to maybe have become compressed to the side of the plot, if that makes any sense?
Here is my code without the peaks plotted and the resulting graph:
def plotPeaks(file):
    indices, powerSums, times = detectPeaks(file)
    plt.figure(figsize=(100, 10))
    plt.plot(times, powerSums)
Plot without peaks marked
Then when I add the code that should show the peaks, which occur at x-values corresponding to the values stored in the indices, I get this:
def plotPeaks(file):
    indices, powerSums, times = detectPeaks(file)
    plt.figure(figsize=(100, 10))
    plt.plot(times, powerSums)
    for i in indices:
        plt.scatter(i, powerSums[i], marker='o')
Plot with peaks marked
Am I missing something obvious, or is this a glitch that someone has a solution for?
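Assuming indices stores positions into times, the scatter call has to use the x-value times[i] rather than the index i itself. As written, the markers land at x = i (a sample number, not a time), so the x-axis expands to cover both ranges and the line gets squeezed to one side. The last line should be: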
plt.scatter(times[i], powerSums[i], marker='o')
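The same fix can be written as one vectorised scatter call instead of a loop. In this sketch, times, powerSums and the simple local-maximum test are hypothetical stand-ins for the output of detectPeaks(file), which is not shown in the question.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the output of detectPeaks(file)
times = np.linspace(0.0, 10.0, 1000)  # x-values, e.g. seconds
powerSums = np.sin(2 * np.pi * times) ** 2 + 0.1 * times
# Simple local-maximum test; indices are positions into times/powerSums
inner = powerSums[1:-1]
indices = np.where((inner > powerSums[:-2]) & (inner > powerSums[2:]))[0] + 1

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(times, powerSums)
# Convert every index to its x-value with times[indices], then scatter all peaks at once
peaks = ax.scatter(times[indices], powerSums[indices], marker="o", color="red")
plt.show()
```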

Seaborn Pairgrid: How to share all axes for all off-diagonal plots (i.e each plot shares axes with its mirror)?

I am trying to plot a 3x3 sns.PairGrid of plots. Currently, the axes are shared for the bottom triangle, and the upper triangle separately. Put another way, the x axes and y axes are only shared with their respective columns/row. So the x-axis of plot (1,0) is shared with (0,0) and (2,0).
However, I would like all the off-diagonal plots to share their axes. For example, I want (1,0) to share its x-axis with (0,0) and (2,0) as before, but also with (0,1).
Also, I would prefer it if the y-axes aren't shared with the plots on the diagonal, as those are 1-D kernel density plots, and so if I share their y-axes, some of them will be invisible as the size of the probability density functions isn't the same.
Here's my current code if it helps:
The 3 parameters I am plotting against each other are called 'A', 'C', and 'logsw', and are contained in the pandas.DataFrame called hyperparams
g = sns.PairGrid(hyperparams, diag_sharey=False)
g.map_lower(sns.kdeplot)
g.map_upper(plt.scatter, marker='+')
g.map_diag(sns.kdeplot)
And here's a trivial example of the output plot:
The images on the bottom left are scaled differently to the images on the upper right, which is what I'm trying to avoid.
At a high level, you could set the x and y limits and tick marks manually. You could also store the values you want to share in variables and reuse each variable in the three subplots that should match.
That way, if you need to make an adjustment, you just update the variable and all three plots that use it update at once.
In the past, I wrote code for a PairGrid that set the limits and ticks this way on all subplots along the y-axis and all subplots along the x-axis.
There is currently no way to do this automatically in Seaborn. The workaround suggested in the comments, which seems to have solved the problem, is to set the axes limits manually for the diagonal subplots. Using variables for the x and y limits ensures that they only need to be changed in one place when updating the axes ranges.
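A minimal sketch of the "one variable per axis range" idea, assuming a grid laid out like PairGrid's g.axes (row i shows the i-th variable on y, column j the j-th variable on x). The limits below are hypothetical; plain plt.subplots(3, 3) stands in for the PairGrid so the sketch runs without seaborn, and with seaborn you would pass g.axes instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt


def apply_shared_limits(axes, variables, limits):
    """Give every subplot of a pair-plot grid per-variable axis limits.

    axes: 2-D array of Axes; axes[i, j] plots variables[j] (x) against variables[i] (y)
    variables: variable names in grid order
    limits: dict mapping each variable name to a (lower, upper) tuple
    """
    n = len(variables)
    for i in range(n):
        for j in range(n):
            ax = axes[i, j]
            ax.set_xlim(limits[variables[j]])  # column j always shows variables[j] on x
            if i != j:  # leave the diagonal density y-axes alone
                ax.set_ylim(limits[variables[i]])


# Hypothetical limits for the three parameters from the question
variables = ["A", "C", "logsw"]
limits = {"A": (0.0, 10.0), "C": (-1.0, 1.0), "logsw": (-5.0, 0.0)}
fig, axes = plt.subplots(3, 3, figsize=(8, 8))
apply_shared_limits(axes, variables, limits)
```

Because each range is defined once in limits, a mirrored pair such as (1,0) and (0,1) always shows the same range for "A", on x in one plot and on y in the other.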

Too many subplots for pyplot?

I am working on a program that plots a large number of graphs (the current version outputs 250-300 graphs). I am using subplots to condense them into one output window. However, the output is extremely cramped.
I was hoping the output would keep the dimensions of a "regular" plot and produce a scrollable window, rather than keeping the dimensions of the output window and scaling the subplots down. Is there a way to fix this, other than outputting every n plots?
Here is the code that outputs the plots:
f, axarr = plt.subplots(len(events) - 1, sharex=True)
x_axis = np.arange(120)
for i in range(len(events) - 1):
    axarr[i].plot(x_axis, events[i])
plt.show()
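Matplotlib's standard figure windows generally do not provide scrollbars, so one common workaround, sketched here under that assumption, is to let the figure height grow with the number of subplots and browse the result as a saved image or in a notebook. The events list is a hypothetical stand-in, and 1.5 inches per subplot is an arbitrary choice.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

# Hypothetical stand-in for events: 31 series of 120 samples each
rng = np.random.default_rng(0)
events = [rng.normal(size=120) for _ in range(31)]

n_plots = len(events) - 1  # same count as the loop in the question
height_per_plot = 1.5      # inches per subplot; tune to taste
fig, axarr = plt.subplots(
    n_plots, 1, sharex=True, squeeze=False,   # squeeze=False: always a 2-D array of Axes
    figsize=(8, height_per_plot * n_plots),   # figure height grows with the number of plots
)
x_axis = np.arange(120)
for i in range(n_plots):
    axarr[i, 0].plot(x_axis, events[i])

fig.tight_layout()
# A tall figure is easiest to browse as a file, e.g. fig.savefig("events.png"),
# or as inline output in a Jupyter notebook
```

With 250-300 subplots the figure gets very tall, so it may be worth lowering the dpi passed to savefig to keep the image size manageable.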
