I'm trying to visualize the following .csv data:
Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,Q11,Q12,Q13,Q14,Q15,Q16,Q17,Q18,Q19,Q20
4,4,2,2,4,2,3,5,3,4,2,5,2,1,4,4,2,1,5,2
2,2,4,4,4,2,2,2,4,4,2,4,2,2,3,2,2,4,5,2
4,5,4,1,4,2,2,4,4,3,2,2,2,1,2,4,4,2,5,4
3,4,2,4,4,2,2,2,4,3,2,4,4,3,3,4,2,4,5,1
4,4,3,2,4,3,4,5,4,3,1,5,3,2,4,2,2,3,4,2
4,5,2,3,5,1,3,4,3,3,1,2,4,4,5,4,1,4,5,4
5,5,5,2,4,3,2,4,4,2,2,4,4,2,4,2,2,4,4,5
4,4,3,1,5,3,2,4,2,2,1,4,4,2,4,1,2,5,5,3
1,3,5,2,4,4,3,1,4,4,2,3,1,4,3,4,3,3,4,1
3,3,5,2,4,2,4,4,3,4,1,5,4,2,1,2,2,4,5,2
Here's my code:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
cm = sns.clustermap(df, annot=True, linewidths=2, linecolor='yellow', metric="correlation", method="single")  # avoid shadowing the built-in map()
plt.show()
Which returns:
I want to rearrange my heatmap and order it column-wise by the frequency of the most common response in each column. For example, column Q5 has the value 4 repeated 8 times (more than any value in any other column), so it should be the first column. Columns Q17 and Q19 each have a value that is repeated 7 times, so they should come second and third (exact order doesn't matter). How can I do this?
You can compute the order and reindex before using the data in clustermap:
order = (df.apply(pd.Series.value_counts)
.max()
.sort_values(ascending=False)
.index
)
import seaborn as sns
cm = sns.clustermap(df[order], col_cluster=False, annot=True, linewidths=2, linecolor='yellow', metric="correlation", method="single")
Output:
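To see what the ordering step computes, here is a minimal sketch on a small made-up dataframe (the toy values and the intermediate names `counts` and `top` are illustrative assumptions, not part of the original data). `value_counts` is applied per column, `.max()` picks the count of the most frequent response in each column, and sorting that in descending order gives the column order.

```python
import pandas as pd

# Toy data (hypothetical): Q2 has one value repeated 4 times,
# Q3 has a value repeated 3 times, Q1 has a value repeated 2 times.
df = pd.DataFrame({
    "Q1": [4, 2, 4, 3],
    "Q2": [5, 5, 5, 5],
    "Q3": [1, 1, 1, 2],
})

# One row per distinct response, one column per question; NaN where a
# response never occurs in that column.
counts = df.apply(pd.Series.value_counts)

# Count of the most frequent response per column (NaN is skipped by max()).
top = counts.max()

# Column labels sorted by that count, most frequent first.
order = top.sort_values(ascending=False).index
print(list(order))
```

Passing `df[order]` to `clustermap` with `col_cluster=False` then keeps this column order, since column clustering would otherwise reorder the columns again.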
I have a multi index dataframe, with the two indices being Sample and Lithology
Sample 20EC-P 20EC-8 20EC-10-1 ... 20EC-43 20EC-45 20EC-54
Lithology Pd Di-Grd Gb ... Hbl Plag Pd Di-Grd Gb
Rb 7.401575 39.055118 6.456693 ... 0.629921 56.535433 11.653543
Ba 24.610102 43.067678 10.716841 ... 1.073115 58.520532 56.946630
Th 3.176471 19.647059 3.647059 ... 0.823529 29.647059 5.294118
I am trying to put it into a seaborn lineplot as such.
spider = sns.lineplot(data = data, hue = data.columns.get_level_values("Lithology"),
style = data.columns.get_level_values("Sample"),
dashes = False, palette = "deep")
The lineplot comes out as
I have two issues. First, I want to format hues by lithology and style by sample. Outside of the lineplot function, I can successfully access sample and lithology using data.columns.get_level_values, but in the lineplot they don't seem to do anything and I haven't figured out another way to access these values. Also, the lineplot reorganizes the x-axis by alphabetical order. I want to force it to keep the same order as the dataframe, but I don't see any way to do this in the documentation.
To use hue= and style=, seaborn prefers its dataframes in long form. pd.melt() stacks all value columns into a single value column and adds one new column per column-index level (here Sample and Lithology) holding the old column labels. The index also needs to be converted to a regular column (with .reset_index()).
Most seaborn categorical functions accept order= to fix the order of the x-values, but lineplot does not. Instead, make the x column categorical with a fixed category order.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
column_tuples = [('20EC-P', 'Pd'), ('20EC-8', 'Di-Grd'), ('20EC-10-1', 'Gb'),
                 ('20EC-43', 'Hbl Plag Pd'), ('20EC-45', 'Di-Grd'), ('20EC-54', 'Gb')]
col_index = pd.MultiIndex.from_tuples(column_tuples, names=["Sample", "Lithology"])
data = pd.DataFrame(np.random.uniform(0, 50, size=(3, len(col_index))), columns=col_index, index=['Rb', 'Ba', 'Th'])
data_long = data.melt(ignore_index=False).reset_index()
data_long['index'] = pd.Categorical(data_long['index'], data.index) # make categorical, use order of the original dataframe
ax = sns.lineplot(data=data_long, x='index', y='value',
hue="Lithology", style="Sample", dashes=False, markers=True, palette="deep")
ax.set_xlabel('')
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.02))
plt.tight_layout() # fit legend and labels into the figure
plt.show()
The long dataframe looks like:
index Sample Lithology value
0 Rb 20EC-P Pd 6.135005
1 Ba 20EC-P Pd 6.924961
2 Th 20EC-P Pd 44.270570
...
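The reshaping and ordering steps can be checked without plotting. The following is a small sketch with fixed numbers; the sample names `S1` and `S2` and the values are placeholders chosen for illustration, not taken from the original data. It shows the columns that melt produces from a two-level column index and how the categorical keeps the dataframe's row order instead of alphabetical order.

```python
import pandas as pd

# Hypothetical two-sample frame with a (Sample, Lithology) column MultiIndex.
cols = pd.MultiIndex.from_tuples([("S1", "Pd"), ("S2", "Gb")],
                                 names=["Sample", "Lithology"])
data = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                    columns=cols, index=["Rb", "Ba", "Th"])

# Long form: one row per (element, sample) pair; the old index becomes
# a regular column named 'index'.
data_long = data.melt(ignore_index=False).reset_index()

# Fix the category order to the row order of the original dataframe.
data_long["index"] = pd.Categorical(data_long["index"], categories=data.index)

print(list(data_long.columns))
print(list(data_long["index"].cat.categories))  # dataframe order
print(sorted(data.index))                       # alphabetical order, for comparison
```

If the index had a name (for example via `data.rename_axis("Element")`), the new column would carry that name instead of `index`.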
I have a dataframe that looks like this:
num_column is_train
30.75 1
12.05 1
.. ..
43.79 0
15.35 0
I want to see the distribution of num_column using a violin plot, with each side (or split) of the violin showing the data for one of the two categories in the is_train column.
From the examples in documentation, here's what I could come up with:
import seaborn as sns
sns.violinplot(x=merged_data.loc[:,'num_column'], hue=merged_data.loc[:,'is_train'], split=True)
The result showed that the hue and split arguments had no effect at all: the sides of the violin weren't split and there was no legend.
I am trying to compare distributions of a column from my train and test data.
The split= argument is to be used with hue-nesting, which can only be used if you already have an x= argument. Therefore you need to provide columns for both x (should be the same value for both datasets) and hue (coded depending on the dataset):
merged_data['dummy'] = 0
sns.violinplot(data=merged_data, y='num_column', split=True, hue='is_train', x='dummy')
You can use the x= parameter to create multiple violins. The hue and split parameters are used when a differentiation via a third column is needed.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
merged_data = pd.DataFrame({'num_column': 20 + np.random.randn(1000).cumsum(),
'is_train': np.repeat([0, 1], 500)})
sns.violinplot(data=merged_data, x='is_train', y='num_column')
plt.show()
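For comparison, here is a self-contained sketch of the split-violin variant with the dummy x column. The synthetic data, the random seed, and the choice of the `Agg` backend are assumptions made so the example runs anywhere; `merged_data`, `num_column`, `is_train` and `dummy` follow the names used above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption); drop this line for interactive use
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the real train/test data.
rng = np.random.default_rng(0)
merged_data = pd.DataFrame({
    "num_column": np.concatenate([rng.normal(20, 5, 500), rng.normal(25, 8, 500)]),
    "is_train": np.repeat([1, 0], 500),
})

# A constant x column gives a single violin; hue + split then puts
# one is_train category on each side of it.
merged_data["dummy"] = 0
ax = sns.violinplot(data=merged_data, x="dummy", y="num_column",
                    hue="is_train", split=True)

# The dummy axis carries no information, so hide its label and ticks.
ax.set_xlabel("")
ax.set_xticks([])
```

In an interactive session, call `plt.show()` from `matplotlib.pyplot` afterwards (or `ax.figure.savefig(...)` to write the figure to a file).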
In my dataset I have a categorical column named 'Type' (containing e.g. INVOICE, IPC, IP) and a 'Date' column containing dates (e.g. 2014-02-01).
How can I plot these two?
On the x axis I want the date.
On the y axis I want a line for each type (e.g. INVOICE) showing its trend.
I'm not sure what you mean by plot and show trend. One way is to count, as @QuangHoang suggested, and plot the counts as a heatmap, something like below. If you want something different, please expand on your question.
import pandas as pd
import numpy as np
import seaborn as sns
dates = pd.date_range(start='1/1/2018', periods=5, freq='3M')[np.random.randint(0,5,20)]
types = np.random.choice(['INVOICE','IPC','IP'],20)  # named 'types' to avoid shadowing the built-in type()
df = pd.DataFrame({'dates':dates ,'type':types})
tab = pd.crosstab(df['type'],df['dates'].dt.strftime('%d-%m-%Y'))
n = np.unique(tab.values)
cmap = sns.color_palette("BuGn_r",len(n))
sns.heatmap(tab,cmap=cmap)
I have this dataframe: Gas Price Brazil (Data Frame).
I took only the gasoline values from this DF and want to plot the average price (PREÇO MEDIO) over time (years, ANO) for each region (REGIAO).
I used Seaborn with hue and got this:
But when I try to plot the same thing with Plotly the result is:
How can I get the same plot with Plotly?
I searched and found this: Seaborn Hue on Plotly
But it didn't work for me.
The answer:
You will achieve the same thing using plotly express and the color attribute:
fig = px.line(dfm, x="dates", y="value", color='variable')
The details:
You haven't described the structure of your data in detail, but assigning hue like this is normally meant to be applied to a data structure such as...
Date Variable Value
01.01.2020 A 100
01.01.2020 B 90
01.02.2020 A 110
01.02.2020 B 120
... where a unique hue or color is assigned to different variable names that are associated with a timestamp column where each timestamp occurs as many times as there are variables.
And that seems to be the case for seaborn too:
hue : name of variables in data or vector data, optional
Grouping variable that will produce points with different colors. Can
be either categorical or numeric, although color mapping will behave
differently in latter case.
You can achieve the same thing with plotly using the color attribute in go.Scatter(), but it seems that you could make good use of plotly.express too. Until you've provided a proper data sample, I'll show you how to do it using some sampled data in a dataframe using numpy and pandas.
Plot:
Code:
# imports
import numpy as np
import pandas as pd
import plotly.express as px
# sample time series data
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10,12,size=(50, 4)), columns=list('ABCD'))
datelist = pd.date_range('2020-01-01', periods=50).tolist()  # pd.datetime is no longer available in current pandas
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0]=0
df=df.cumsum().reset_index()
# melt data to provide the data structure mentioned earlier
dfm=pd.melt(df, id_vars=['dates'], value_vars=df.columns[1:])
print(dfm.head())  # long format: one row per (date, variable) pair
# plotly
fig = px.line(dfm, x="dates", y="value", color='variable')
fig.show()