I can't figure out how to filter a column and then plot the result with Seaborn.
The code below works and plots a line graph with one line for each unique value of the column.
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
sns.set(style='darkgrid')
data = wcsales1.loc[wcsales1.Sales_Year > 2016]
sales_year = data['Sales_Year']
ppa = data['Price_Per_Acre']
dates = data['LATEST_LAND_SALE_DATE']
juris = data['PLANNING_JURISDICTION']
sns.relplot(x = sales_year, y = ppa, ci=None, kind='line', hue=juris)
plt.show()
However, I only want to plot the values listed in the variable 'egs' below, which are two of the many unique values in 'juris'.
I tried the code below but am getting a ValueError, also included below.
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
sns.set(style='darkgrid')
data = wcsales1.loc[wcsales1.Sales_Year > 2016]
data = data.reset_index()
sales_year = data['Sales_Year']
ppa = data['Price_Per_Acre']
dates = data['LATEST_LAND_SALE_DATE']
juris = data['PLANNING_JURISDICTION']
egs = ['HS', 'FV']
south = data.loc[data.PLANNING_JURISDICTION.isin(egs)]
print(type(south))
sns.relplot(x = sales_year, y = ppa, ci=None, kind='line', hue=south)
plt.show()
Error below
Shape of passed values is (19, 3), indices imply (1685, 3)
Thanks for your help!
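With seaborn, pass the filtered DataFrame through the data parameter and give x, y and hue as column names in that DataFrame. In your attempt, hue=south is an entire filtered DataFrame while x and y still come from the unfiltered data, so the lengths no longer match: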
sns.relplot(x='Sales_Year', y='Price_Per_Acre',
hue='PLANNING_JURISDICTION',
data=data.loc[data.PLANNING_JURISDICTION.isin(egs)],
kind='line', ci=None
)
I need help: my seaborn plot does not display properly.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
dataset = pd.read_csv('sales.csv', header=0,sep =',',
usecols = [1,2,3,4])
#remove NaN
dataset.dropna(inplace = True)
df = pd.DataFrame(data=dataset)
sns.regplot(data=df, x='TV', y='sales')
plt.show()
An example of sales.csv:
id,TV,radio,newspaper,sales
1,230.10000000,37.8,69.2,22.1
2,1e12,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
4,151.5,41.3,58.5,18.5
5,180.8,10.8,58.4,12.9
5,180.8,10.8,58.4,12.9
6,8.7,48.9,75,7.2
7,57.5,32.8,23.5,11.8
8,120.2,19.6,11.6,13.2
9,8.6,2.1,1,4.8
10,199.8,2.6,21.2,10.6
11,66.1,5.8,24.2,8.6
12,214.7,24,4,17.4
13,23.8,35.1,65.9,9.2
14,97.5,7.6,7.2,9.7
15,1,32.9,46,19
16,195.4,47.7,52.9,22.4
17,67.8,36.6,114,12.5
18,281.4,39.6,55.8,24.4
19,69.2,20.5,18.3,11.3
20,147.3,23.9,19.1,14.6
21,218.4,27.7,53.4,18
22,237.4,5.1,23.5,12.5
23,13.2,15.9,49.6,5.6
24,228.3,16.9,26.2,15.5
25,62.3,12.6,18.3,9.7
26,262.9,3.5,19.5,12
27,142.9,29.3,12.6,15
28,240.1,16.7,22.9,15.9
29,248.8,27.1,22.9,18.9
30,70.6,16,40.8,10.5
31,292.9,28.3,43.2,21.4
32,112.9,17.4,38.6,11.9
33,97.2,1.5,30,9.6
34,1e12,20,0.3,17.4
The main problem is that the dataset uses the value 1e12 to represent NA. These values should be replaced or dropped. The easiest way to convert '1e12' to NA is the na_values='1e12' parameter of pd.read_csv().
Alternatively, dataset.replace(1e12, np.nan, inplace=True) (with numpy imported as np) converts them after loading.
Note that dataset is already a DataFrame, so the call df = pd.DataFrame(data=dataset) is unnecessary.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
dataset = pd.read_csv('sales.csv', header=0, sep=',', na_values='1e12',
usecols=[1, 2, 3, 4])
# remove NaN
dataset.dropna(inplace=True)
sns.regplot(data=dataset, x='TV', y='sales')
plt.show()
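To check the sentinel handling without the real file, a few of the sample rows can be read from a string. This is only a sketch: io.StringIO stands in for sales.csv, and the three rows are copied from the sample in the question. It compares the na_values approach with the replace approach.

```python
import io

import numpy as np
import pandas as pd

# Three rows from the sample file; row 2 uses 1e12 as a missing-value marker.
csv_text = """id,TV,radio,newspaper,sales
1,230.10000000,37.8,69.2,22.1
2,1e12,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
"""

# Option 1: treat the token as missing while parsing, then drop those rows.
parsed = pd.read_csv(io.StringIO(csv_text), na_values="1e12", usecols=[1, 2, 3, 4])
parsed = parsed.dropna()

# Option 2: read as-is, then swap the sentinel for NaN afterwards.
raw = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3, 4])
replaced = raw.replace(1e12, np.nan).dropna()

print(parsed)
print(replaced.equals(parsed))
```

Both routes should leave the same two rows, so either can be used before calling sns.regplot.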
I can style a pandas dataframe:
import pandas as pd
import numpy as np
import seaborn as sns
cm = sns.diverging_palette(-5, 5, as_cmap=True)
df = pd.DataFrame(np.random.randn(3, 4))
df.style.background_gradient(cmap=cm)
but I can't figure out how to apply a style to the last row only. background_gradient has a subset option, and the documentation suggests using an index slice, but I cannot work out how to style just the last row.
Here is my closest attempt:
df.style.background_gradient(cmap=cm, subset=[2], axis='index')
Use the last element of your index as the subset. A list such as [2] is interpreted as column labels, while a single label is looked up as a row:
df.style.background_gradient(cmap=cm, axis=1, subset=df.index[-1])
You could also use pd.IndexSlice, which is useful if you want to apply the style to multiple rows, including the last:
import pandas as pd
import numpy as np
import seaborn as sns
cm = sns.diverging_palette(-5, 5, as_cmap=True)
df = pd.DataFrame(np.random.randn(3, 4))
indices = pd.IndexSlice[[0, df.last_valid_index()], :]
df.style.background_gradient(cmap=cm, axis=1, subset=indices)
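Because subset is handed to DataFrame.loc, a selection can be previewed with plain .loc before any styling is applied. A small sketch of that check, using a fixed frame instead of random data so the selected rows are easy to read:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4))

# The last row only, expressed as a 2-D (rows, columns) slice.
last_row = pd.IndexSlice[df.index[-1:], :]

# First and last row together, as in the answer above.
first_and_last = pd.IndexSlice[[df.index[0], df.index[-1]], :]

# .loc accepts the same objects, so it previews what the Styler will touch.
print(df.loc[last_row])
print(df.loc[first_and_last])
```

Either object can then be passed as subset= to background_gradient.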
Problem: I want my seaborn boxplot to show the names of the two pd.Series (Group A, Group B)
on the x-axis, but it only shows numbers: 0 for the first pd.Series and 1 for the second.
My code is as follows.
import pandas as pd
import seaborn as sns
Group_A=pd.Series([26,21,22,26,19,22,26,25,24,21,23,23,18,29,22])
Group_B=pd.Series([18,23,21,20,20,29,20,16,20,26,21,25,17,18,19])
sns.set(style="whitegrid")
ax=sns.boxplot(data=[Group_A, Group_B], palette='Set2')
Result: the x-axis ticks are labelled 0 and 1 instead of the group names.
You can combine the two series into a DataFrame. There are many ways to do so; here is one example that produces readable names:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Group_A=pd.Series([26,21,22,26,19,22,26,25,24,21,23,23,18,29,22])
Group_B=pd.Series([18,23,21,20,20,29,20,16,20,26,21,25,17,18,19])
df = pd.DataFrame({"ColumnA" : Group_A, "ColumnB" : Group_B})
sns.set(style="whitegrid")
ax=sns.boxplot(data=df , palette='Set2')
plt.show()
I'm making a clustered heatmap in seaborn as follows:
import numpy as np
import seaborn as sns
np.random.seed(2)
data = np.random.randn(100, 10)
sns.clustermap(data)
but the rows are squished. If I pass a size to the clustermap function, it looks terrible.
Is there a way to increase the size of the heatmap part only, so that the row names can be read without stretching out the cluster portions?
As @mwaskom commented, I was able to use ax_heatmap.set_position together with get_position to achieve the desired result.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
np.random.seed(2)
data = np.random.randn(100, 10)
cm = sns.clustermap(data)
hm = cm.ax_heatmap.get_position()
plt.setp(cm.ax_heatmap.yaxis.get_majorticklabels(), fontsize=6)
cm.ax_heatmap.set_position([hm.x0, hm.y0, hm.width*0.25, hm.height])
col = cm.ax_col_dendrogram.get_position()
cm.ax_col_dendrogram.set_position([col.x0, col.y0, col.width*0.25, col.height*0.5])
This can also be done by passing the dendrogram_ratio keyword argument to clustermap:
import numpy as np
import seaborn as sns
np.random.seed(2)
data = np.random.randn(100, 10)
sns.clustermap(data, figsize=(12, 30), dendrogram_ratio=0.02, cmap='RdBu')
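dendrogram_ratio also accepts a (row, column) pair, which may help when the two dendrograms need different amounts of room. A hedged sketch of that variant; the figsize and ratio values below are arbitrary choices for illustration, not recommendations:

```python
import numpy as np
import seaborn as sns

np.random.seed(2)
data = np.random.randn(100, 10)

# (row dendrogram width, column dendrogram height) as fractions of the figure.
g = sns.clustermap(
    data,
    figsize=(6, 20),
    dendrogram_ratio=(0.1, 0.05),
    cmap="RdBu",
)

# Smaller tick labels so the row names are less likely to overlap.
g.ax_heatmap.tick_params(axis="y", labelsize=6)
```

The returned object exposes ax_heatmap, ax_row_dendrogram and ax_col_dendrogram, so the resulting layout can still be fine-tuned with set_position as in the previous answer.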