Seaborn catplot results in error by changing hue - python

I have a dataset that looks like this:
feature_1 feature_2 feature_3 feature_4 feature_5 feature_6 feature_7 feature_8
0 -0.0020185900105266514 -0.004525512052716703 0.004290147446159787 0.008121342033951665 0.019995812082180105 0.02034942055088337 -0.02236798581774497 -0.018665971326321824
1 0.008327938744324304 0.0057161731520134415 0.015149000101932132 0.014244686228342962 0.031266799783999905 0.02556201262830425 0.00491191281881069 0.002627771331087464
2 0.0056570911367399175 0.006780099460379361 -0.0038521559525533412 -0.0042372049750104175 0.025755417055772233 0.029050369619095566 -0.0016924684746490136 0.001915807620861465
3 -0.0066361424845156666 -0.006829267976941566 0.008195242107994306 0.00993842145208005 0.02794638215808405 0.025168342480038512 -0.013222987355723491 -0.011178407242310215
4 0.005111817323414786 0.002367954071875622 -0.0013140356150100757 -0.0027816139194379794 0.025028881734832177 0.029704777330334546 0.0073461329985677545 0.008414726948742138
I have been able to create a catplot that is almost perfect, like this:
sns.catplot(data=test_df, palette="dark", orient="h")
Resulting in:
However, I want the colors to change depending on the results of a list (which I could append to test_df). The list is as follows:
classifications = ["class_1", "class_2", "class_1", "class_1", "class_2"]. Ideally, I'd like for the colors of the points to be different depending on the class.
Adding the hue parameter raises ValueError: Cannot use 'hue' without 'x' and 'y'.
How can I change the colors of the points based on the values of the classifications list?

You can add the class column and melt() into seaborn's preferred long form:
test_df["class"] = classifications
melted = test_df.melt("class", value_name="value", var_name="feature")
sns.catplot(data=melted, x="value", y="feature", hue="class", palette="dark", orient="h")
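To see what the long form looks like before plotting, here is a minimal sketch of the reshaping step on its own. The two-row frame is made-up sample data, and to_long_form is a hypothetical helper name, not part of pandas or seaborn.

```python
import pandas as pd


def to_long_form(wide_df, classifications):
    """Attach one class label per row, then melt to one row per (row, feature) pair."""
    # assign() returns a copy, so the caller's frame is left untouched
    labelled = wide_df.assign(**{"class": classifications})
    return labelled.melt(id_vars="class", var_name="feature", value_name="value")


# Made-up sample data with the same layout as test_df (assumption for illustration)
wide = pd.DataFrame({"feature_1": [-0.002, 0.008], "feature_2": [-0.005, 0.006]})
melted = to_long_form(wide, ["class_1", "class_2"])
print(melted)

# Plotting is then the same call as in the answer (requires seaborn):
# import seaborn as sns
# sns.catplot(data=melted, x="value", y="feature", hue="class", palette="dark", orient="h")
```

Each original row contributes one record per feature, and its class label is repeated on every one of those records, which is what lets hue pick a color per point.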

Related

How to show Pandas Dataframe as a table in a pretty way with adjustable colspan?

I have a dataframe that I get from a SQL Select with an Order by condition, so that it looks like:
Class Subclass Value
0 A X 0.000000
1 A Y 0.184650
2 A Z 1.000000
3 B X 1.381629
4 B Y -0.031118
Then I transposed it to be like:
Class A B
Subclass X Y Z X Y Z
Value 0.0 0.18465 1.0 1.381629 -0.031118 0.636372
Now, I want to show the dataframe as a Plotly table and with the following code, directly from the Plotly doc page
import plotly.graph_objects as go
fig = go.Figure(data=[go.Table(
header=dict(values=list(df.columns),
fill_color='paleturquoise',
align='left'),
cells=dict(values=[df[col] for col in df.columns],
fill_color='lavender',
align='left'))
])
fig.show()
Here the resulting table
But what I want instead is the Class to have a multiple column span, like
this edited picture
I already know that this can be done with df.to_html(), but I'd prefer a Plotly solution, because I also have to plot other types of graphs from the same dataset and can then put them all together.
Edit:
If there's no solution using Plotly, other plotting libraries are welcome too, as long as I can also plot scatter and line graphs in the same subplot.
I didn't quite understand your question, but see if it helps you, it's a library that allows you to visualize the data in a better way.

How to adjust scale ranges in altair?

I'm having trouble getting all of the axes onto the same scale when using altair to make a group of plots like so:
class_list = ['c-CS-m','c-CS-s','c-SC-m','c-SC-s','t-CS-m','t-CS-s','t-SC-m','t-SC-s']
list_of_plots = []
for class_name in class_list:
    list_of_plots.append(alt.Chart(data[data['class'] == class_name]).mark_bar().encode(
        x = alt.X('DYRK1A', bin = True, scale=alt.Scale()),
        y = 'count()').resolve_scale(
        y='independent'
    ))
list_of_plots[0] & list_of_plots[1] | list_of_plots[2] & list_of_plots[3] | list_of_plots[4] & list_of_plots[5] | list_of_plots[6] & list_of_plots[7]
I'd like to have the x axis run from 0.0 to 1.4 and the y axis run from 0 to 120 so that all eight plots I'm producing are on the same scale! I've tried to use domain, inside the currently empty Scale() call but it seems to result in the visualisations that have x axis data from say 0.0 to 0.3 being super squished up and I can't understand why?
For context, I'm trying to plot continuous values for protein expression levels. The 8 plots are for different classes of mice that have been exposed to different conditions. The data is available at this link if that helps: https://archive.ics.uci.edu/ml/datasets/Mice+Protein+Expression
Please let me know if I need to provide some more info in order for you to help me!
First of all, it looks like you're trying to create a wrapped facet chart. Rather than doing that manually with concatenation, it's better to use a wrapped facet encoding.
Second, when you specify resolve_scale(y='independent'), you're specifying that the y-scales should not match between subcharts. If instead you want all scales to be shared, you can use resolve_scale(y='shared'), or equivalently just leave that out, as it is the default.
To specify explicit axis domains, use alt.Scale(domain=[min, max]). Put together, it might look something like this:
alt.Chart(data).mark_bar().encode(
    x = alt.X('DYRK1A', bin = True, scale=alt.Scale(domain=[0, 1.4])),
    y = alt.Y('count()', scale=alt.Scale(domain=[0, 120])),
    facet = alt.Facet('class:N', columns=4),
)

How to plot errorbar in line chart from dataframes

I have two dataframes, df_avg and df_sem, which contain mean values and standard errors of the means, respectively. For example:
KPCmb1 KPCmb1IA KPCmb2 KPCmb3 KPCmb4 KPCmb5 KPCmb6
temp
19.99 15.185905 24.954296 22.610052 29.249107 26.151815 34.374257 36.589218
20.08 15.198452 24.998227 22.615342 29.229325 26.187794 34.343738 36.596730
20.23 15.208917 25.055061 22.647499 29.234424 26.193382 34.363549 36.580033
20.47 15.244485 25.092773 22.691421 29.206816 26.202425 34.337385 36.640839
20.62 15.270921 25.145798 22.720752 29.217821 26.235101 34.364162 36.600030
and
KPCmb1 KPCmb1IA KPCmb2 KPCmb3 KPCmb4 KPCmb5 KPCmb6
temp
19.99 0.342735 0.983424 0.131502 0.893494 1.223318 0.536450 0.988185
20.08 0.347366 0.983732 0.136239 0.898661 1.230763 0.534779 0.993970
20.23 0.348641 0.981614 0.134729 0.898790 1.227567 0.529240 1.005609
20.47 0.350937 0.993973 0.138411 0.881142 1.237749 0.526841 0.991591
20.62 0.345863 0.983064 0.132934 0.883863 1.234746 0.533048 0.987520
I want to plot a line chart using temp as the x-axis and the dataframe columns as the y-axes. I also want to use the df_sem dataframe to provide error bars for each line (note the column names are the same between the two dataframes).
I can achieve this with the following code:
df_avg.plot(yerr=df_sem), but this does not allow me to change many aspects of the plot, like DPI, labels, and things like that.
So I've tried to make the plot using the following code as an alternative:
plt.figure()
x = df_avg.index
y = df_avg
plt.errorbar(x,y,yerr=df_sem)
plt.show()
But this gives me the error: ValueError: shape mismatch: objects cannot be broadcast to a single shape
How do I go about making the same chart that I am able to using pandas plotting with matplotlib plotting?
Thanks!
You can do just a simple for loop:
for col in df_avg.columns:
    plt.errorbar(df_avg.index, df_avg[col], yerr=df_sem[col], label=col)
plt.legend()
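Since the question also asks about controlling DPI and labels, the same loop can be written against an explicit figure and axes. This is only a sketch: plot_with_errorbars is a hypothetical helper name, the dpi default and the y-axis label are arbitrary choices, and the frames reuse two columns and three rows from the tables in the question.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_with_errorbars(df_avg, df_sem, dpi=150):
    """Draw one error-bar line per column; df_sem must share df_avg's index and columns."""
    fig, ax = plt.subplots(dpi=dpi)
    for col in df_avg.columns:
        # errorbar() draws a single series per call, hence one call per column
        ax.errorbar(df_avg.index, df_avg[col], yerr=df_sem[col], label=col, capsize=2)
    ax.set_xlabel(df_avg.index.name or "x")
    ax.set_ylabel("mean ± SEM")
    ax.legend()
    return fig, ax


# Two columns and three rows taken from the tables in the question
temps = pd.Index([19.99, 20.08, 20.23], name="temp")
df_avg = pd.DataFrame({"KPCmb1": [15.185905, 15.198452, 15.208917],
                       "KPCmb2": [22.610052, 22.615342, 22.647499]}, index=temps)
df_sem = pd.DataFrame({"KPCmb1": [0.342735, 0.347366, 0.348641],
                       "KPCmb2": [0.131502, 0.136239, 0.134729]}, index=temps)
fig, ax = plot_with_errorbars(df_avg, df_sem)
```

Call plt.show() or fig.savefig(...) afterwards to display or save the result. Working through ax keeps titles, labels, and tick settings available as ordinary method calls.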

Python sort_values plot is inverted

new Python learner here. This seems like a very simple task but I can't do it to save my life.
All I want to do is to grab 1 column from my DataFrame, sort it, and then plot it. THAT'S IT. But when I plot it, the graph is inverted. Upon examination, I find that the values are sorted, but the index is not...
Here is my simple 3 liner code:
testData = pd.DataFrame([5,2,4,2,5,7,9,7,8,5,4,6],[9,4,3,1,5,6,7,5,4,3,7,8])
x = testData[0].sort_values()
plt.plot(x)
edit:
Using matplotlib
If you want the sorted values laid out sequentially along the x-axis (0, 1, 2, 3, 4, ...), you need to re-index them:
x = testData[0].sort_values()
x.index = range(len(x))
plt.plot(x)
Otherwise, if you want the values sorted in the data frame but displayed at their original index positions, you want a scatter plot rather than a line plot:
plt.scatter(x.index, x.values)
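As an alternative to assigning a new index by hand, pandas can renumber the rows itself. A small sketch using the data from the question; the variable names are illustrative.

```python
import pandas as pd

test_data = pd.DataFrame([5, 2, 4, 2, 5, 7, 9, 7, 8, 5, 4, 6],
                         index=[9, 4, 3, 1, 5, 6, 7, 5, 4, 3, 7, 8])

# sort_values() reorders the rows but keeps each value's original index label
sorted_col = test_data[0].sort_values()

# reset_index(drop=True) discards those labels and numbers the rows 0..n-1
renumbered = sorted_col.reset_index(drop=True)
print(renumbered.tolist())
```

Passing renumbered to plt.plot then draws the values in ascending order, because the line plot uses the index as the x coordinate.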

Plot rolling mean together with data

I have a DataFrame that looks something like this:
####delays:
Worst case Avg case
2014-10-27 2.861433 0.953108
2014-10-28 2.899174 0.981917
2014-10-29 3.080738 1.030154
2014-10-30 2.298898 0.711107
2014-10-31 2.856278 0.998959
2014-11-01 3.118587 1.147104
...
I would like to plot the data of this DataFrame together with the rolling mean of the data. The data itself should be a dotted line and the rolling mean a solid line. The worst case column should be in red, while the average case column should be in blue.
I've tried the following code:
import pandas as pd
import matplotlib.pyplot as plt
rolling = pd.rolling_mean(delays, 7)
delays.plot(x_compat=True, style='r--')
rolling.plot(style='r')
plt.title('Delays per day on entire network')
plt.xlabel('Date')
plt.ylabel('Minutes')
plt.show()
Unfortunately, this gives me 2 different plots. One with the data and one with the rolling mean. Also, the worst case column and average case column are both in red.
How can I get this to work?
You need to tell pandas where to plot. By default, pandas creates a new figure for each plot call.
Replace these two lines:
delays.plot(x_compat=True, style='r--')
rolling.plot(style='r')
with:
ax_delays = delays.plot(x_compat=True, style='--', color=["r","b"])
rolling.plot(color=["r","b"], ax=ax_delays, legend=0)
The second line now tells pandas to plot on ax_delays and not to show the legend again.
To get a different color for each of the two lines, pass one color per column in the color argument (see above).
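Note that pd.rolling_mean was removed from later pandas releases; the method form DataFrame.rolling(...).mean() computes the same thing. The sketch below combines that with the shared-axes approach. It uses the six rows shown in the question and a window of 3 instead of 7, an assumption made only so the short sample produces values.

```python
import matplotlib.pyplot as plt
import pandas as pd

delays = pd.DataFrame(
    {"Worst case": [2.861433, 2.899174, 3.080738, 2.298898, 2.856278, 3.118587],
     "Avg case": [0.953108, 0.981917, 1.030154, 0.711107, 0.998959, 1.147104]},
    index=pd.date_range("2014-10-27", periods=6, freq="D"),
)

# Method form of the rolling mean; window of 3 only because the sample is short
rolling = delays.rolling(3).mean()

# Raw data dashed, rolling mean solid, both drawn on the same axes
ax = delays.plot(style="--", color=["r", "b"], x_compat=True)
rolling.plot(ax=ax, color=["r", "b"], legend=False)
ax.set_title("Delays per day on entire network")
ax.set_xlabel("Date")
ax.set_ylabel("Minutes")
```

The first window - 1 rows of the rolling mean are NaN, so the solid lines start a little later than the dashed ones. Call plt.show() to display the figure.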
