Python plotly order facet_wrap by specific facet - python

I have a simple facet_wrap barplot generated in python plotly that looks like the attached image.
Is it possible to order the x-axis by a facet other than the last one? The pandas dataframe is sorted by the y-axis value (which is what I want), but I would like that ordering applied to the second-to-last facet (so that it looks like the last facet in the current plot) while keeping the current order of the facets.
Sample code below. This will automatically sort the x-axis according to the bottom facet - which is "DEN_Tumour_WD" in this case.
import pandas as pd
import plotly.express as px

# allModel holds the asker's data and is not shown in the post
toPlot = pd.DataFrame(allModel)
toPlot = toPlot.sort_values(by=['Flux Ratio (log-scaled)'])
fig = px.bar(toPlot,
             x='Reaction',
             y='Flux Ratio (log-scaled)',
             template='none',
             facet_row="Model",
             color='Subsystem',
             category_orders={"Model": ["nonDEN_Liver_CD",
                                        "nonDEN_Liver_WD",
                                        "DEN_Liver_CD",
                                        "DEN_AdjLiver_WD",
                                        "DEN_Tumour_WD"]})

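The post above has no answer. One possible approach, offered only as a sketch, is to compute the x-axis order from the rows of the chosen facet and pass it explicitly instead of relying on the row order of the dataframe. The helper name x_order_from_facet and the demo data are hypothetical; the column names are taken from the sample code.

```python
import pandas as pd

def x_order_from_facet(df, facet_value, facet_col="Model",
                       x_col="Reaction", y_col="Flux Ratio (log-scaled)"):
    """Hypothetical helper: x categories sorted by y within one facet.

    Categories that do not occur in the chosen facet are appended at the end.
    """
    ref = df[df[facet_col] == facet_value].sort_values(y_col)
    order = ref[x_col].drop_duplicates().tolist()
    seen = set(order)
    rest = [x for x in df[x_col].drop_duplicates() if x not in seen]
    return order + rest

# Made-up demo data, only to show the shape of the input
demo = pd.DataFrame({
    "Model": ["DEN_AdjLiver_WD"] * 3 + ["DEN_Tumour_WD"] * 3,
    "Reaction": ["R1", "R2", "R3"] * 2,
    "Flux Ratio (log-scaled)": [0.5, -1.0, 2.0, 3.0, 1.0, -2.0],
})
reaction_order = x_order_from_facet(demo, "DEN_AdjLiver_WD")
print(reaction_order)
```

The resulting list could then be added to the existing category_orders argument of px.bar under the "Reaction" key, next to the "Model" entry, so that the x-axis follows the chosen facet while the facet order stays unchanged.
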
Related

Python stacked barchart where y-axis scale is linear but the bar fill is logarithmic in the order of 10s

As the title explains, I am trying to reproduce a stacked barchart where the y-axis scale is linear but the fill of the plot (i.e. the stacked bars) is grouped logarithmically, in orders of 10.
I have made this plot before on R-Studio with an in-house package, however I am trying to reproduce the plot with other programs (python) to validate and confirm my analysis.
Quick description of the data w/ more detail:
I have thousands of entries of clonal cell information. They have multiple identifiers, such as "Strain", "Sample", "cloneID", as well as a frequency value ("cloneFraction") for each clone.
This is the .head() of the dataset I am working with to give you an idea of my data
I am trying to reproduce a plot I made with R-Studio.
This plot divides the dataset into groups based on frequency: the 10 most frequent clones are grouped in red, followed by the next 100, the next 1000, and so on. The y-axis has a 0.00-1.00 scale, but a 0-100% scale would work equally well; they mean the same thing in this context.
This is just to get an idea and visualize if I have big clones (the top 10) and how much of the overall dataset they occupy in frequency - i.e. the bigger the red stack the larger clones I have, signifying there has been a significant clonal expansion in my sample of a few selected cells.
What I have done so far:
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
%matplotlib inline
MYDATAFRAME.groupby(['Sample','cloneFraction']).size().groupby(level=0).apply(lambda x: 100 * x / x.sum()).unstack().plot(kind='bar',stacked=True, legend=None)
plt.yscale('log')
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter())
plt.show()
And I get this plot here
Now, I realize there is no order in the stacked plot, so the most frequent aren't on top - it's just stacking in the order of the entries in my dataset (which I assume I can just fix by sorting my dataframe by the column of interest).
Other than the axis no longer showing a % when I use the log scale (which is a secondary issue), I don't know how to group the data entries by frequency as described above.
I have tried things such as:
temp = X.SOME_IDENTIFIER.value_counts()
temp2 = temp.head(10)
if len(temp) > 10:
    temp2['remaining {0} items'.format(len(temp) - 10)] = sum(temp[10:])
temp2.plot(kind='pie')
Just to see if I could separate them in a correct way but this does not achieve what I would like (other than being a pie chart, but I changed that in my code).
I have also tried using iloc[n:n] to select specific entries, but I can't seem to get that working either, as I get errors when I try adding it to the code I've used above to plot my graph - and if I use it without the other fancy stuff in the code (% scale, etc) it gets confused in the stacked barplot and just plots the top 10 out of all the 4 samples in my data, rather than the top 10 per sample. I also wouldn't know how to get the next 100, 1000, etc.
If you have any suggestions and can help in any way, that would be much appreciated!
Thanks
I fixed what I wanted to do with the following:
I created a new column with the category my samples fall in, based on their value (i.e. whether they are in the top 10 most frequent, the next 100, and so on).
df['category']='10001+'
for sampleref in df.sample_ref.unique().tolist():
    print(f'Setting sample {sampleref}')
    df.loc[df[df.sample_ref == sampleref].nlargest(10000, 'cloneCount')['category'].index,'category']='1001-10000'
    df.loc[df[df.sample_ref == sampleref].nlargest(1000, 'cloneCount')['category'].index,'category']='101-1000'
    df.loc[df[df.sample_ref == sampleref].nlargest(100, 'cloneCount')['category'].index,'category']='11-100'
    df.loc[df[df.sample_ref == sampleref].nlargest(10, 'cloneCount')['category'].index,'category']='top10'
This code starts from the largest group (10001+) and works down to the smallest, so that each narrower assignment overwrites the broader one for the clones that fall into both.
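The same binning can probably be expressed without the loop by ranking clones within each sample and cutting the rank into bins. This is a sketch rather than the author's method: the function name rank_category and the demo data are made up, and the column names sample_ref and cloneCount are taken from the code above. Ties are broken by order of appearance, which should match nlargest with its default settings.

```python
import numpy as np
import pandas as pd

def rank_category(df, sample_col="sample_ref", value_col="cloneCount"):
    """Hypothetical helper: label each clone by its rank within its sample."""
    # Rank 1 is the largest clone in each sample; ties keep their row order
    rank = df.groupby(sample_col)[value_col].rank(method="first", ascending=False)
    bins = [0, 10, 100, 1000, 10000, np.inf]  # right-inclusive edges
    labels = ["top10", "11-100", "101-1000", "1001-10000", "10001+"]
    return pd.cut(rank, bins=bins, labels=labels)

# Made-up demo data: sample A has 12 clones, sample B has 3
demo = pd.DataFrame({
    "sample_ref": ["A"] * 12 + ["B"] * 3,
    "cloneCount": list(range(12, 0, -1)) + [5, 9, 1],
})
demo["category"] = rank_category(demo)
print(demo.groupby("sample_ref")["category"].value_counts())
```

Because the labels are an ordered categorical, the stacked bars produced from this column would follow the label order rather than alphabetical order.
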
Following this, I plotted the samples with the following code:
fig, ax = plt.subplots(figsize=(15,7))
# select cloneFraction before summing so that non-numeric columns are not aggregated
df.groupby(['Sample','category'])['cloneFraction'].sum().unstack().plot(ax=ax, kind="bar", stacked=True)
plt.xticks(rotation=0)
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1], title='Clonotype',bbox_to_anchor=(1.04,0), loc="lower left", borderaxespad=0)
And here are the results:
I hope this helps anyone struggling with the same issue!

My Seaborn heatmap is showing multiple scales

My seaborn heatmap is showing multiple scales (for each column I presume)
Attached an image showing the code, data & chart.
Wondering how I can remove the multiple scales on the right and show only 1.
clustered_heatmap = clustered_points.groupby("Predicted Clusters").sum()
clustered_heatmap = clustered_heatmap.drop(clustered_heatmap.columns[0], axis = 1)
clustered_heatmap
You can try this:
# Create heatmap
plt.figure(figsize=(16,9))
sns.heatmap(clustered_heatmap)

How to use column values for x axis labels in matplotlib

I have a basic DataFrame in pandas and am using matplotlib to create a chart.
I have followed advice found on SO and also on the docs for labelling the values on the x axis but they won't change from the indices.
I have this,
Presc_df_asc = Presc_df.sort_values('Total Items',ascending=True)
Presc_df_asc['Total Items'].plot.bar(x="Practice", ylim=[Presc_df_asc['Total Items'].min(), Presc_df_asc['Total Items'].max()])
plt.xlabel('Practice')
plt.ylabel('Total Items')
plt.title('practice total items')
plt.legend(('Items',),loc='upper center')
From what I have found, plot.bar(x="Practice") should set the x-axis to show the values in the Practice column under each bar.
But no matter what I try I get the x-axis labelled as indices with just the main label saying Practices.
In order for the plotting command to be able to access the "Practice" column, you need to apply the plot function to the entire dataframe (or a sub_dataframe that contains at least these two columns). The code below uses the corresponding labels below each bar. The rot=0 argument prevents the labels from being rotated by 90°.
Presc_df_asc.plot.bar(x="Practice", y="Total Items",
                      ylim=[Presc_df_asc['Total Items'].min(),
                            Presc_df_asc['Total Items'].max()], rot=0)

Groupby Plot - Include Subgroup Name

I'm looking to plot two columns of a time series based on a groupby of a third column. It works as intended more or less, but I can't tell which subgroup is being plotted in the output as it is not included in the legend or anywhere else in the graphs outputted.
Is there a way to include the subgroup name in the graphs outputted?
This is what I've attempted on the dataframe as follows:
dataframe
awareness.groupby('campaign_name')['sum_purchases_value','sum_ad_spend'].plot(figsize=(20,8), legend=True);
Try this:
grouped = awareness.groupby('campaign_name')
titles = [name for name, data in grouped]
# select the two columns with a list; pandas 2.0+ rejects the tuple form
plots = grouped[['sum_purchases_value',
                 'sum_ad_spend']].plot(figsize=(20,8), legend=True)
for plot, label in zip(plots, titles):
    plot.set(title=label)
When called on a groupby, the pandas plot function returns a Series of matplotlib Axes objects (one per group), so using the for loop you can customize whatever you like (x labels, y labels, font size, etc.)

Python, Seaborn: Logarithmic Swarmplot has unexpected gaps in the swarm

Let's look at a swarmplot made with Python 3.5 and Seaborn on some data (stored in a pandas dataframe df, with the column labels stored in another class; this does not matter for now, just look at the plot):
ax = sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df)
Now the data is more readable if plotted in log scale on the y-axis because it goes over some decades.
So let's change the scaling to logarithmic:
ax.set_yscale("log")
ax.set_ylim(bottom = 5*10**-10)
Well, I have a problem with the gaps in the swarms. I guess they are there because the plot was created with a linear axis in mind, where the dots must not overlap. But now they look rather strange, and there is enough space to form 4 equal-looking swarms.
My question is: How can I force seaborn to recalculate the position of the dots to create better looking swarms?
mwaskom hinted to me in the comments how to solve this.
It is even stated in the swarmplot documentation:
Note that arranging the points properly requires an accurate transformation between data and point coordinates. This means that non-default axis limits should be set before drawing the swarm plot.
Set an axis to log scale first, then use it for the plot:
fig = plt.figure() # create figure
rect = 0,0,1,1 # create a rectangle for the new axis
log_ax = fig.add_axes(rect) # create a new axis (or use an existing one)
log_ax.set_yscale("log") # log first
sns.swarmplot(x=self.dte.label_temperature, y=self.dte.label_current, hue=self.dte.label_voltage, data = df, ax = log_ax)
This yields the correct and desired plotting behaviour:
