Creating a stacked bar chart using categorical data - Python

I want to create a stacked bar chart from the categorical data in this data frame:
gender distress
female high
male low
female high
male high
male medium
female high
male medium
male medium
female low
I know that I can filter the data by gender, count the distress levels, and then draw the stacked chart. Is there a faster way to do it?

First build the counts with crosstab, then plot them:
pd.crosstab(df.gender,df.distress).plot(kind='bar',stacked=True)
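A self-contained version of the crosstab approach, rebuilding the example data frame from the question. The `reindex` call is my own addition: `pd.crosstab` sorts the distress levels alphabetically (high, low, medium), so reordering the columns makes the stack read low to high. The Agg backend line is only there so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# The data frame from the question
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "male",
               "female", "male", "male", "female"],
    "distress": ["high", "low", "high", "high", "medium",
                 "high", "medium", "medium", "low"],
})

# One row per gender, one column per distress level, cells are counts
counts = pd.crosstab(df.gender, df.distress)

# crosstab sorts columns alphabetically; put them in a meaningful order
counts = counts.reindex(columns=["low", "medium", "high"], fill_value=0)

# Each gender becomes one bar, each distress level one stacked segment
ax = counts.plot(kind="bar", stacked=True)
ax.set_ylabel("count")
plt.tight_layout()
```

Compared with filtering per gender, crosstab does the grouping and counting in one call, and `DataFrame.plot(kind="bar", stacked=True)` stacks the columns of whatever table it is given.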

Related

Calculating the Entropy of Data in a Table or Matrix

Color Height Sex
----------------------
Red Short Male
Red Tall Male
Blue Medium Female
Green Medium Female
Green Tall Female
Green Short Male
How can I compute the entropy of the table as a whole in Python?
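"Entropy of the table as a whole" can mean different things, so this is a sketch of two common interpretations rather than a definitive answer: the Shannon entropy of each column on its own, and the joint entropy of the rows, where each full (Color, Height, Sex) combination counts as one outcome. Both use base-2 logarithms (bits); `shannon_entropy` is a helper name I made up.

```python
import numpy as np
import pandas as pd

# The table from the question
table = pd.DataFrame({
    "Color":  ["Red", "Red", "Blue", "Green", "Green", "Green"],
    "Height": ["Short", "Tall", "Medium", "Medium", "Tall", "Short"],
    "Sex":    ["Male", "Male", "Female", "Female", "Female", "Male"],
})

def shannon_entropy(counts):
    """Shannon entropy in bits of a collection of (positive) outcome counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log2(p)).sum())

# Per-column entropy: how mixed each attribute is on its own
column_entropy = {col: shannon_entropy(table[col].value_counts())
                  for col in table.columns}

# Joint entropy: count identical full rows and treat each as one outcome
joint_entropy = shannon_entropy(table.value_counts())

print(column_entropy)
print(joint_entropy)
```

Here all six rows are distinct, so the joint entropy is log2(6), about 2.585 bits, the maximum for six rows. `DataFrame.value_counts` needs pandas 1.1 or newer.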

How can I plot the two different values (X) of one column in the dataset? (Python)

I want to plot gender against cholesterol in one scatter plot:
x = [0, 1], where 0 refers to male and 1 refers to female
y = cholesterol
The problem is that I don't want any values on the x-axis except male and female. Here is my code:
import matplotlib.pyplot as plt
plt.scatter(heart_data["Sex"], heart_data["Cholesterol"], s=20, c='pink')
plt.xlabel("Sex")
plt.ylabel("Cholesterol")
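Since Sex is coded as 0/1, one way to show only "Male" and "Female" on the x-axis is to fix the tick positions and their labels with `plt.xticks(ticks, labels)`. The `heart_data` frame below is a small made-up stand-in with the same two columns; its values are purely illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for heart_data: 0 = male, 1 = female (values are made up)
heart_data = pd.DataFrame({
    "Sex": [0, 0, 0, 1, 1, 1],
    "Cholesterol": [210, 245, 190, 230, 205, 260],
})

plt.scatter(heart_data["Sex"], heart_data["Cholesterol"], s=20, c="pink")

# Ticks only at the two codes, labelled with the category names
plt.xticks([0, 1], ["Male", "Female"])
plt.xlim(-0.5, 1.5)  # some room on both sides of the two point columns

plt.xlabel("Sex")
plt.ylabel("Cholesterol")
```

If many points overlap at x = 0 and x = 1, a strip or box plot of Cholesterol grouped by Sex may be easier to read than a plain scatter.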

Seaborn countplot show wrong results on Titanic dataset

I'm working on the Titanic dataset, which I got from this website:
https://public.opendatasoft.com/explore/dataset/titanic-passengers/table/?flg=fr
I want to show the number of male and female passengers for each Survived class (Yes or No).
First, I got the total number of male and female passengers using:
bysex=data1['Sex'].value_counts()
print(bysex)
This gave me these results:
male 577
female 314
Name: Sex, dtype: int64
The results show that there are more male passengers than female passengers.
But when I use seaborn to show the number of male and female passengers for each Survived class with this code:
plot1 = sns.FacetGrid(data1, col='Survived')
plot1.map(sns.countplot,'Sex')
I get these results:
The plot shows more female than male passengers, and in the "No" panel the number of females (around 450) is even greater than the total number of female passengers (314).
How is this possible?
I think there is something wrong with the mapping.
In the left plot, the Sex labels are swapped. The actual counts for the "No" class are:
data1.loc[data1["Survived"] == "No", 'Sex'].value_counts()
male 468
female 81
Name: Sex, dtype: int64
The second ("Yes") plot is correct:
data1.loc[data1["Survived"] == "Yes", 'Sex'].value_counts()
female 233
male 109
Name: Sex, dtype: int64
On the other hand, when you use
ax = sns.countplot(x="Survived", hue="Sex", data=data1)
you get the right results.
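The swapped labels come from drawing a categorical plot through `FacetGrid.map` without a fixed category order; seaborn itself warns that calling countplot this way without specifying `order` is likely to produce an incorrect plot. Passing `order=['male', 'female']` to `plot1.map(...)` should be the minimal change to the original code. The sketch below instead uses `sns.catplot(kind="count")`, the figure-level function meant for faceted categorical plots. Since the CSV isn't loaded here, `data1` is a stand-in rebuilt from the per-class counts quoted above (891 passengers in total).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import pandas as pd
import seaborn as sns

# Stand-in for data1, rebuilt from the value_counts() output above
data1 = pd.DataFrame({
    "Survived": ["No"] * (468 + 81) + ["Yes"] * (233 + 109),
    "Sex": ["male"] * 468 + ["female"] * 81 + ["female"] * 233 + ["male"] * 109,
})

# One count panel per Survived value, same explicit category order in every panel
g = sns.catplot(
    data=data1, x="Sex", col="Survived", kind="count",
    order=["male", "female"], col_order=["No", "Yes"],
)

# Read the bar heights back from each panel to compare with value_counts()
bar_heights = {
    survived: [p.get_height() for p in ax.patches]
    for survived, ax in g.axes_dict.items()
}
print(bar_heights)
```

With an explicit `order`, every panel places "male" at the first position and "female" at the second, so the bars and the shared tick labels stay aligned. `g.axes_dict` requires seaborn 0.11 or newer.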

Seaborn lmplot - Changing Marker Style and Color of single Datapoint

I was trying to answer Harvard's CS109 (2013), Homework 1, Part 1c using seaborn, which the course itself doesn't use.
"Choose a plot to show this relationship and specifically annotate the Oakland baseball team on the plot. Show this plot across multiple years. In which years can you detect a competitive advantage from the Oakland baseball team of using data science? When did this end?"
So we have salaries as well as wins for multiple years and multiple teams. I want to build a seaborn facet for each year, regressing wins on salaries, AND call out the data point for Oakland. Building the facet with one regression per year works fine. But how can I change the marker style and color of the Oakland data point?
This is what my data looks like (the first 5 entries):
teamID yearID salary W
0 ANA 1997 31135472 84
1 ANA 1998 41281000 85
2 ANA 1999 55388166 70
3 ANA 2000 51464167 82
4 ANA 2001 47535167 75
...
This is how I am plotting the data:
facetplots = sns.lmplot(x="salary", y="W", col="yearID", data=df_data, col_wrap=4, height=3)
Any help would be much appreciated.
Regards
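One way to call out Oakland is to let `lmplot` draw the per-year regressions and then, in each facet, overplot Oakland's point with its own marker and color and annotate it. This is a sketch under assumptions: Oakland's `teamID` is taken to be "OAK" (the Lahman database code), `df_data` is randomly generated with the question's columns, and `height=` is used because newer seaborn versions renamed lmplot's `size=` parameter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for df_data with the question's columns
rng = np.random.default_rng(0)
teams = ["ANA", "BOS", "NYA", "OAK", "TEX"]
years = [2001, 2002, 2003, 2004]
df_data = pd.DataFrame(
    [(t, y, rng.integers(30_000_000, 120_000_000), rng.integers(60, 100))
     for y in years for t in teams],
    columns=["teamID", "yearID", "salary", "W"],
)

facetplots = sns.lmplot(x="salary", y="W", col="yearID", data=df_data,
                        col_wrap=4, height=3, ci=None)

# Assumption: "OAK" is Oakland's teamID in this data
highlight_team = "OAK"
for year, ax in facetplots.axes_dict.items():
    row = df_data[(df_data["yearID"] == year) & (df_data["teamID"] == highlight_team)]
    # Redraw the team's point on top of the regular scatter with a distinct style
    ax.scatter(row["salary"], row["W"], color="red", marker="*", s=150, zorder=3)
    # Label it with the team code, offset slightly from the marker
    for _, r in row.iterrows():
        ax.annotate(highlight_team, (r["salary"], r["W"]),
                    textcoords="offset points", xytext=(5, 5), color="red")
```

`axes_dict` maps each `yearID` value to its facet, which avoids relying on the order of `facetplots.axes`; it needs seaborn 0.11 or newer.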

Stacked Bar Plot with Two Key DataFrame

I have a dataframe indexed by two keys. I'm looking to make a stacked bar plot of the number of items within key2 (i.e., the count values from a fully populated column of data).
A small portion of the dataframe I have is:
Sector            industry
Basic Industries  Agricultural Chemicals          17
                  Aluminum                         3
                  Containers/Packaging             1
                  Electric Utilities: Central      2
                  Engineering & Construction      12
Name: Symbol, dtype: int64
Key1 is Sector and Key2 is industry. I want the counts in Symbol to appear as stacked industry segments within a single bar for Basic Industries.
I know that if I call df.reset_index() I'll get columns with the (non-unique) Sectors and industries plus the integer count. Is there a way to simply pass those three columns to pandas plot or matplotlib to make a stacked bar chart?
Alternatively, is there a way to easily specify using both keys in the aforementioned dataframe?
I'm looking for both guidance on approach from more experienced people as well as help with the actual syntax.
I just added a new Sector to improve the example.
                                                 Symbol
Sector             industry
Basic Industries   Agricultural Chemicals            17
                   Aluminum                           3
                   Containers/Packaging               1
                   Electric Utilities: Central        2
                   Engineering & Construction        22
Basic Industries2  Agricultural Chemicals             7
                   Aluminum                           8
                   Containers/Packaging              11
                   Electric Utilities: Central        7
                   Engineering & Construction         4
Assuming your dataframe is indexed by ["Sector", "industry"], you first need to reset_index, then pivot the dataframe, and finally make the stacked plot:
df.reset_index().pivot_table(index="industry", columns="Sector", values="Symbol").T.plot(kind='bar', stacked=True, figsize=(14, 6))
Alternatively, instead of reset_index and pivot_table, you can use unstack:
df.unstack().Symbol.plot(kind='bar', stacked=True)
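A runnable sketch of the unstack approach, rebuilding the two-sector example as a DataFrame with a (Sector, industry) MultiIndex and a Symbol column; the numbers are copied from the second table. `unstack()` moves the inner index level (industry) into the columns, so each Sector becomes one bar and each industry one stacked segment. `df["Symbol"].unstack()` here is equivalent to `df.unstack().Symbol`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import pandas as pd

industries = ["Agricultural Chemicals", "Aluminum", "Containers/Packaging",
              "Electric Utilities: Central", "Engineering & Construction"]
index = pd.MultiIndex.from_product(
    [["Basic Industries", "Basic Industries2"], industries],
    names=["Sector", "industry"],
)
df = pd.DataFrame({"Symbol": [17, 3, 1, 2, 22, 7, 8, 11, 7, 4]}, index=index)

# One row per Sector, one column per industry
wide = df["Symbol"].unstack()

# Rows become bars, columns become the stacked segments
ax = wide.plot(kind="bar", stacked=True, figsize=(14, 6))
ax.set_ylabel("Symbol count")
```

If some sector lacks an industry, `unstack()` fills that cell with NaN; `unstack(fill_value=0)` keeps the counts integer.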
