Plot using seaborn with FacetGrid where values are ndarray in dataframe - python

I want to plot a dataframe in which the y values are stored as ndarrays within a single column, i.e.:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(index=np.arange(0,4), columns=('sample','class','values'))
for iloc in [0,2]:
    df.loc[iloc] = {'sample':iloc,
                    'class':'raw',
                    'values':np.random.random(5)}
    df.loc[iloc+1] = {'sample':iloc,
                      'class':'predict',
                      'values':np.random.random(5)}
grid = sns.FacetGrid(df, col="class", row="sample")
grid.map(plt.plot, np.arange(0,5), "value")
This raises:
TypeError: unhashable type: 'numpy.ndarray'
Do I need to break out the ndarrays into separate rows? Is there a simple way to do this?
Thanks

This is quite an unusual way of storing data in a dataframe. Two options (I'd recommend option B):
A. Custom mapping in seaborn
Seaborn does not support this format natively, but you can write your own function that plots onto the grid.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(index=np.arange(0,4), columns=('sample','class','values'))
for iloc in [0,2]:
    df.loc[iloc] = {'sample':iloc,
                    'class':'raw',
                    'values':np.random.random(5)}
    df.loc[iloc+1] = {'sample':iloc,
                      'class':'predict',
                      'values':np.random.random(5)}
grid = sns.FacetGrid(df, col="class", row="sample")
def plot(*args, **kwargs):
    # args[0] is the "values" column for one facet: a Series holding a single ndarray
    plt.plot(args[0].iloc[0], **kwargs)

grid.map(plot, "values")
B. Unnesting
However, I would advise "unnesting" the dataframe first to get rid of the numpy arrays inside the cells.
The question "pandas: When cell contents are lists, create a row for each element in the list" shows a way to do that.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(index=np.arange(0,4), columns=('sample','class','values'))
for iloc in [0,2]:
    df.loc[iloc] = {'sample':iloc,
                    'class':'raw',
                    'values':np.random.random(5)}
    df.loc[iloc+1] = {'sample':iloc,
                      'class':'predict',
                      'values':np.random.random(5)}
res = df.set_index(["sample", "class"])["values"].apply(pd.Series).stack().reset_index()
res.columns = ["sample", "class", "original_index", "values"]
Then use the FacetGrid in the usual way.
grid = sns.FacetGrid(res, col="class", row="sample")
grid.map(plt.plot, "original_index", "values")
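As an alternative sketch of the unnesting step, newer pandas versions (0.25 and later) provide DataFrame.explode, which turns each array element into its own row. The example below assumes that method is available; the names rng, long_df and the way the frame is built are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative data in the same shape as the question: one ndarray per cell.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sample": [0, 0, 2, 2],
    "class": ["raw", "predict", "raw", "predict"],
    "values": [rng.random(5) for _ in range(4)],
})

# One row per array element; the other columns are repeated.
long_df = df.explode("values").reset_index(drop=True)

# Position of each element inside its original array (0..4 per group).
long_df["original_index"] = long_df.groupby(["sample", "class"]).cumcount()

# explode leaves an object column, so convert back to floats for plotting.
long_df["values"] = long_df["values"].astype(float)
```

The resulting long_df has the same columns as res above, so it can be passed to sns.FacetGrid and mapped with plt.plot on "original_index" and "values" in the same way.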

Related

Plot specific column values in Seaborn instead of every column value

I can't figure out how to filter a column and then plot it successfully on Seaborn.
The below code works perfectly and plots a line graph with all of the unique columns values separated.
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
sns.set(style='darkgrid')
data = wcsales1.loc[wcsales1.Sales_Year > 2016]
sales_year = data['Sales_Year']
ppa = data['Price_Per_Acre']
dates = data['LATEST_LAND_SALE_DATE']
juris = data['PLANNING_JURISDICTION']
sns.relplot(x = sales_year, y = ppa, ci=None, kind='line', hue=juris)
plt.show()
However, I only want to plot the values listed in the variable 'egs' below, which are two of the many unique values in 'juris'.
I tried the code below but got a ValueError, which is also included below.
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
sns.set(style='darkgrid')
data = wcsales1.loc[wcsales1.Sales_Year > 2016]
data = data.reset_index()
sales_year = data['Sales_Year']
ppa = data['Price_Per_Acre']
dates = data['LATEST_LAND_SALE_DATE']
juris = data['PLANNING_JURISDICTION']
egs = ['HS', 'FV']
south = data.loc[data.PLANNING_JURISDICTION.isin(egs)]
print(type(south))
sns.relplot(x = sales_year, y = ppa, ci=None, kind='line', hue=south)
plt.show()
Error below
Shape of passed values is (19, 3), indices imply (1685, 3)
Thanks for your help!
With seaborn, pass the (filtered) dataframe through the data parameter and give x, y and hue as column names in that dataframe:
sns.relplot(x='Sales_Year', y='Price_Per_Acre',
            hue='PLANNING_JURISDICTION',
            data=data.loc[data.PLANNING_JURISDICTION.isin(egs)],
            kind='line', ci=None
            )
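A small sketch of why filtering first helps, using a made-up six-row frame (the values and the 'RA' jurisdiction are hypothetical). In the failing attempt, x and y presumably still came from the unfiltered data while hue came from the filtered subset, so their lengths disagreed. Filtering the whole frame keeps every column the same length.

```python
import pandas as pd

# Hypothetical stand-in for the sales data in the question.
data = pd.DataFrame({
    "Sales_Year": [2017, 2017, 2018, 2018, 2017, 2018],
    "Price_Per_Acre": [10.0, 12.0, 11.0, 14.0, 9.0, 9.5],
    "PLANNING_JURISDICTION": ["HS", "FV", "HS", "FV", "RA", "RA"],
})

egs = ["HS", "FV"]

# Keep only the rows whose jurisdiction is in egs; all columns shrink together.
south = data.loc[data["PLANNING_JURISDICTION"].isin(egs)]
```

Passing south as data= together with the three column names then gives seaborn x, y and hue vectors of matching length.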

Format numerous floats in a data frame

I need help, I am unable to display the seaborn plot well.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
dataset = pd.read_csv('sales.csv', header=0,sep =',',
usecols = [1,2,3,4])
#remove NaN
dataset.dropna(inplace = True)
df = pd.DataFrame(data=dataset)
sns.regplot(data=df, x='TV', y='sales')
plt.show()
Example contents of sales.csv:
id,TV,radio,newspaper,sales
1,230.10000000,37.8,69.2,22.1
2,1e12,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9
5,180.8,10.8,58.4,12.9
6,8.7,48.9,75,7.2
7,57.5,32.8,23.5,11.8
8,120.2,19.6,11.6,13.2
9,8.6,2.1,1,4.8
10,199.8,2.6,21.2,10.6
11,66.1,5.8,24.2,8.6
12,214.7,24,4,17.4
13,23.8,35.1,65.9,9.2
14,97.5,7.6,7.2,9.7
15,1,32.9,46,19
16,195.4,47.7,52.9,22.4
17,67.8,36.6,114,12.5
18,281.4,39.6,55.8,24.4
19,69.2,20.5,18.3,11.3
20,147.3,23.9,19.1,14.6
21,218.4,27.7,53.4,18
22,237.4,5.1,23.5,12.5
23,13.2,15.9,49.6,5.6
24,228.3,16.9,26.2,15.5
25,62.3,12.6,18.3,9.7
26,262.9,3.5,19.5,12
27,142.9,29.3,12.6,15
28,240.1,16.7,22.9,15.9
29,248.8,27.1,22.9,18.9
30,70.6,16,40.8,10.5
31,292.9,28.3,43.2,21.4
32,112.9,17.4,38.6,11.9
33,97.2,1.5,30,9.6
34,1e12,20,0.3,17.4
The main problem is that the dataset contains values of 1e12 used to represent NA. These values should be replaced or dropped. The easiest way to convert '1e12' to NA is via the na_values='1e12' parameter to pd.read_csv().
Alternatively, dataset.replace(1e12, pd.NA, inplace=True) can be used to convert them later.
Note that dataset already is a dataframe, so the call df = pd.DataFrame(data=dataset) is unnecessary.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
dataset = pd.read_csv('sales.csv', header=0, sep=',', na_values='1e12',
usecols=[1, 2, 3, 4])
# remove NaN
dataset.dropna(inplace=True)
sns.regplot(data=dataset, x='TV', y='sales')
plt.show()
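To see the effect of na_values in isolation, here is a minimal sketch that reads a four-row excerpt of the data from an in-memory string instead of sales.csv (the io.StringIO wrapper and the variable names are illustrative assumptions).

```python
import io
import pandas as pd

# A short excerpt of the CSV, including two rows with the 1e12 sentinel.
csv_text = """id,TV,radio,newspaper,sales
1,230.1,37.8,69.2,22.1
2,1e12,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
34,1e12,20,0.3,17.4
"""

# Without na_values the sentinel is read as an ordinary, huge number.
raw = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3, 4])

# With na_values the sentinel becomes NaN and can be dropped.
clean = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3, 4],
                    na_values='1e12')
n_missing = int(clean['TV'].isna().sum())
clean = clean.dropna()
```

In the raw frame the two 1e12 values stretch the x-axis so far that all other points collapse into one spot, which is why the plot looks wrong.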

How do I style only the last row of a pandas dataframe?

I can style a pandas dataframe:
import pandas as pd
import numpy as np
import seaborn as sns
cm = sns.diverging_palette(-5, 5, as_cmap=True)
df = pd.DataFrame(np.random.randn(3, 4))
df.style.background_gradient(cmap=cm)
but I can't figure out how to only apply a style to the last row. There is a subset option in the background_gradient call, and it suggests that I use an index slice but I cannot figure out how to make just the last row have any kind of styling.
Here is my closest to success:
df.style.background_gradient(cmap=cm, subset=[2], axis='index')
Use the last element of your index as your subset.
df.style.background_gradient(cmap=cm, axis=1, subset=df.index[-1])
You could also use pd.IndexSlice which is useful if you want to apply the style to multiple rows, including the last:
import pandas as pd
import numpy as np
import seaborn as sns
cm = sns.diverging_palette(-5, 5, as_cmap=True)
df = pd.DataFrame(np.random.randn(3, 4))
indices = pd.IndexSlice[[0, df.last_valid_index()], :]
df.style.background_gradient(cmap=cm, axis=1, subset=indices)
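One way to check what a given subset will cover before styling is to pass the same pd.IndexSlice object to df.loc. The sketch below does only that selection step, on a small frame with made-up integer values; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4))

# Last row only, all columns.
last_row = pd.IndexSlice[df.index[-1:], :]

# First and last row, all columns.
first_and_last = pd.IndexSlice[[df.index[0], df.index[-1]], :]

# The same objects can be used with .loc to preview the selection.
selected_last = df.loc[last_row]
selected_both = df.loc[first_and_last]
```

Either slice can then be handed to background_gradient through subset=, as in the answer above, and the gradient should be limited to the previewed rows.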

Seaborn boxplot showing number on x-axis, not the name of pd.Series object

Problem: I want my seaborn boxplot to show the names of the pd.Series objects (Group A, Group B) on the x-axis, but it only shows numbers: 0 for the first pd.Series and 1 for the second.
My code is as follows.
import pandas as pd
import seaborn as sns
Group_A=pd.Series([26,21,22,26,19,22,26,25,24,21,23,23,18,29,22])
Group_B=pd.Series([18,23,21,20,20,29,20,16,20,26,21,25,17,18,19])
sns.set(style="whitegrid")
ax=sns.boxplot(data=[Group_A, Group_B], palette='Set2')
Result :
You can concatenate the two series into a dataframe. There are a lot of options to do so, here is one example which will produce nice names:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Group_A=pd.Series([26,21,22,26,19,22,26,25,24,21,23,23,18,29,22])
Group_B=pd.Series([18,23,21,20,20,29,20,16,20,26,21,25,17,18,19])
df = pd.DataFrame({"ColumnA" : Group_A, "ColumnB" : Group_B})
sns.set(style="whitegrid")
ax=sns.boxplot(data=df , palette='Set2')
plt.show()
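Another of the many options is to name the series themselves and concatenate them column-wise; the series names then become the column labels. This is a sketch of that variant, and the label strings "Group A" and "Group B" are chosen here for illustration.

```python
import pandas as pd

# Give each series a name; pd.concat uses it as the column label.
Group_A = pd.Series([26, 21, 22, 26, 19, 22, 26, 25, 24, 21, 23, 23, 18, 29, 22],
                    name="Group A")
Group_B = pd.Series([18, 23, 21, 20, 20, 29, 20, 16, 20, 26, 21, 25, 17, 18, 19],
                    name="Group B")

# axis=1 places the series side by side as columns of one dataframe.
df = pd.concat([Group_A, Group_B], axis=1)
```

Passing this df as data= to sns.boxplot should label the boxes with the column names instead of 0 and 1.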

Changing the size of the heatmap specifically in a seaborn clustermap?

I'm making a clustered heatmap in seaborn as follows
import numpy as np
import seaborn as sns
np.random.seed(2)
data = np.random.randn(100, 10)
sns.clustermap(data)
but the rows are squished.
If I pass a size to the clustermap function, the result looks terrible.
Is there a way to increase the size of only the heatmap part, so that the row names can be read without stretching out the dendrogram portions?
As @mwaskom commented, I was able to use ax_heatmap.set_position along with the get_position function to achieve the correct result.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(2)
data = np.random.randn(100, 10)
cm = sns.clustermap(data)
hm = cm.ax_heatmap.get_position()
plt.setp(cm.ax_heatmap.yaxis.get_majorticklabels(), fontsize=6)
cm.ax_heatmap.set_position([hm.x0, hm.y0, hm.width*0.25, hm.height])
col = cm.ax_col_dendrogram.get_position()
cm.ax_col_dendrogram.set_position([col.x0, col.y0, col.width*0.25, col.height*0.5])
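The get_position / set_position pattern is plain matplotlib and works on any axes. The following sketch shows it on an ordinary, empty axes rather than a clustermap, assuming a non-interactive backend; the names before and after are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Bounding box of the axes in figure coordinates (fractions of the figure size).
before = ax.get_position()

# Keep the lower-left corner and the height, shrink the width to a quarter.
ax.set_position([before.x0, before.y0, before.width * 0.25, before.height])

after = ax.get_position()
```

The same four numbers (left, bottom, width, height) are what the answer above passes for cm.ax_heatmap and cm.ax_col_dendrogram.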
Alternatively, this can be done by passing the dendrogram_ratio keyword argument to clustermap:
import numpy as np
import seaborn as sns
np.random.seed(2)
data = np.random.randn(100, 10)
sns.clustermap(data,figsize=(12,30),dendrogram_ratio=0.02,cmap='RdBu')
