Python Pandas Seaborn - bar chart / histogram with two columns - python

I have the following dataframe in pandas:
Perpetrator Age  Perpetrator Sex  Gender
1                2                Female
2                2                Female
3                3                Female
4                5                Female
5                7                Female
6                7                Female
7                7                Female
...
where:
Perpetrator Age is the age of the perpetrator,
Gender is the perpetrator's gender, and
Perpetrator Sex is the number of perpetrators of that age and gender.
For example, there are 5 female perpetrators who are 4 years old.
I am trying to make a seaborn bar chart with two panels (columns), one for female and one for male, showing the count for each age.
I tried using:
g = sns.catplot(x="Perpetrator Age", y="Perpetrator Sex",col="Gender",
data=final_df5, saturation=.5,
kind="bar")
and
sns.displot(penguins, x="flipper_length_mm", col="sex", multiple="dodge")
(from here)
but nothing seems to work.
I keep getting this error -
ValueError: Could not interpret input 'Perpetrator Age'
Thank you

What do you get when you try:
print(df.columns)
You want it to look like:
Index(['Perpetrator Age', 'Perpetrator Sex', 'Gender'], dtype='object')
But it looks like you may have hierarchically indexed data. If you don't, and the output looks like the above, you can try this seaborn plotting code:
import seaborn as sns
g = sns.catplot(x='Perpetrator Age', y="Perpetrator Sex", hue="Gender",
data=df,saturation=.5, dodge=True, ci=None,kind="bar")
You need to change the col= to hue= in your code, and set dodge=True.
EDIT
It looks like your dataframe's index is the Perpetrator Age. To solve your issue, reset the index and then plot (this time the code plots the genders in two separate plots):
final_df5.reset_index(inplace=True)
import seaborn as sns
g = sns.catplot(x='Perpetrator Age', y="Perpetrator Sex",
col='Gender', color='blue',
data=final_df5, dodge=True,
ci=None, kind="bar")
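A minimal sketch of the diagnosis above, using pandas only. The small frame is hypothetical data shaped like the one in the question, with the age stored in the index; it is meant to show why seaborn cannot find 'Perpetrator Age' as a column and how reset_index changes that.

```python
import pandas as pd

# Hypothetical data shaped like the question's frame: the age lives in the index.
final_df5 = pd.DataFrame(
    {"Perpetrator Sex": [2, 2, 3, 5, 7], "Gender": ["Female"] * 5},
    index=pd.Index([1, 2, 3, 4, 5], name="Perpetrator Age"),
)

# The age is an index level, not a column, so x="Perpetrator Age" cannot be resolved.
print("Perpetrator Age" in final_df5.columns)
print(final_df5.index.names)

# reset_index() moves the index into an ordinary column and returns a new frame.
plot_df = final_df5.reset_index()
print(list(plot_df.columns))
```

Assigning the result of reset_index() to a new name, rather than using inplace=True, keeps the original frame unchanged, which can make it easier to re-run the cell without hitting an error on the second call.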

Related

plot a bar chart using groupby function and plotly and streamlit

I am trying to plot a bar chart based on a groupby result, but when I try, it crashes and displays the error below.
The error appears when the user selects 3 items from the multiselect widget:
ValueError: All arguments should have the same length. The length of
argument color is 3, whereas the length of previously-processed
arguments ['gender', 'count'] is 95
code:
some_columns_df = df.loc[:,['gender','country','city','hoby','company','status']]
some_collumns = some_columns_df.columns.tolist()
select_box_var= st.selectbox("Choose X Column",some_collumns)
multiselect_var= st.multiselect("Select Columns To GroupBy",some_collumns)
test_g3 = df.groupby([select_box_var] + multiselect_var).size().reset_index(name='count')
fig = px.histogram(test_g3,x=select_box_var, y='count',color=multiselect_var ,barmode = 'group',text_auto = True)
I know the error is in the color parameter in the px.histogram
The reason is that color accepts only a single column name (or a single array-like), not a list of column names.
color=['column_a','column_b']
Would cause
ValueError: All arguments should have the same length. The length of argument color is 2, whereas the length of previously-processed arguments ['total_bill'] is 244
2 is the length of the list ['column_a','column_b'], while 244 is the number of rows in the dataframe.
According to the documentation:
color (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign color to marks.
Therefore, either we use a column_name, or we use a series.
Here's my approach:
import plotly.express as px
df = px.data.tips() # a data set from plotly
df.head()
Output
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
The columns used here:
sex, with unique values Female and Male
time, with unique values Dinner and Lunch
I chose these two columns because it is easy to see that there are only 4 combinations.
We create a series that concatenates the sex and time columns:
categories = df[['sex','time']].agg(', '.join, axis=1)
print(categories)
Output
0 Female, Dinner
1 Male, Dinner
2 Male, Dinner
3 Male, Dinner
4 Female, Dinner
...
239 Male, Dinner
240 Female, Dinner
241 Male, Dinner
242 Male, Dinner
243 Female, Dinner
Length: 244, dtype: object
Use this series as the color reference:
fig = px.histogram(df, x="total_bill", color =categories)
fig.show()
If the ', '.join approach gives you trouble:
categories = df[['sex','time']].agg(', '.join, axis=1)
then try concatenating the columns directly:
categories = df['sex'] + df['time']
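The same idea can be applied to the grouped frame from the question, where the list of grouping columns comes from a multiselect. The sketch below is pandas only; the sample data, the helper name combined_label, and the variables selected_x and selected_groups are hypothetical stand-ins for the question's dataframe and Streamlit widgets.

```python
import pandas as pd

def combined_label(frame, columns, sep=", "):
    """Join several columns row-wise into one string label per row.

    Hypothetical helper: returns None when no columns are selected, so the
    result can be passed straight to a `color` argument.
    """
    if not columns:
        return None
    return frame[columns].astype(str).agg(sep.join, axis=1)

# Hypothetical data standing in for the question's dataframe.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F"],
    "country": ["US", "US", "IN", "IN", "US"],
    "status": ["single", "married", "single", "single", "single"],
})
selected_x = "gender"                   # stands in for the selectbox value
selected_groups = ["country", "status"]  # stands in for the multiselect value

counts = (
    df.groupby([selected_x] + selected_groups)
      .size()
      .reset_index(name="count")
)
# One label column with the same length as the grouped frame.
counts["group"] = combined_label(counts, selected_groups)
print(counts)
```

With the label stored as a column, the plotting call can refer to it by name, for example px.histogram(counts, x=selected_x, y="count", color="group", barmode="group"), so color receives a single column rather than a list.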

How to create a Pandas dataframe from another column in a dataframe by splitting it?

I have the following source dataframe:
Person  Country  Is Rich?
0       US       Yes
1       India    No
2       India    Yes
3       US       Yes
4       US       Yes
5       India    No
6       US       No
7       India    No
I need to convert it into another dataframe for plotting a bar graph like the one below, so that the data is easy to access:
Bar chart of economic status per country
The dataframe to be created looks like this:
Country  Rich  Poor
US       3     1
India    1     3
I am new to pandas and exploratory data analysis. Please help.
You can try pivot_table
df['Is Rich?'] = df['Is Rich?'].replace({'Yes': 'Rich', 'No': 'Poor'})
out = df.pivot_table(index='Country', columns='Is Rich?', values='Person', aggfunc='count')
print(out)
Is Rich? Poor Rich
Country
India 3 1
US 1 3
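As an alternative that is not part of the answer above, pd.crosstab can produce the same counts without needing a values column. This is a sketch that rebuilds the question's data by hand; the variable name status is an assumption made for the example.

```python
import pandas as pd

# The question's source data, rebuilt by hand.
df = pd.DataFrame({
    "Person": range(8),
    "Country": ["US", "India", "India", "US", "US", "India", "US", "India"],
    "Is Rich?": ["Yes", "No", "Yes", "Yes", "Yes", "No", "No", "No"],
})

# Map the Yes/No answers to the desired column labels without touching df.
status = df["Is Rich?"].map({"Yes": "Rich", "No": "Poor"})

# Count rows for every (Country, status) pair.
out = pd.crosstab(df["Country"], status)

# Put the columns in the order shown in the question.
out = out[["Rich", "Poor"]]
print(out)
```

Calling out.plot(kind="bar") on this frame should then draw one group of bars per country, assuming matplotlib is installed.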
You could do:
converted = df.assign(Rich=df['Is Rich?'].eq('Yes')).eval('Poor = ~Rich').groupby('Country').agg({'Rich': 'sum', 'Poor': 'sum'})
print(converted)
Rich Poor
Country
India 1 3
US 3 1
However, if you want to plot it as a barplot, the following format might work best with a plotting library like seaborn:
plot_df = converted.reset_index().melt(id_vars='Country', value_name='No. of people', var_name='Status')
print(plot_df)
Country Status No. of people
0 India Rich 1
1 US Rich 3
2 India Poor 3
3 US Poor 1
Then, with seaborn:
import seaborn as sns
sns.barplot(x='Country', hue='Status', y='No. of people', data=plot_df)

How could I plot 2 different values (X) in one column in the dataset, python

I want to plot gender against cholesterol in one scatter plot.
x = [0, 1], where 0 refers to male and 1 refers to female.
y = cholesterol. Here is my code. The problem is that I don't want any values on the x-axis other than male and female.
plt.scatter(heart_data["Sex"],heart_data["Cholesterol"] ,s=20,c='pink')
plt.xlabel("Sex")
plt.ylabel("Cholesterol")
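One possible approach, sketched under the assumption that Sex really contains only the codes 0 and 1, is to fix the tick positions at those two values and relabel them. The small heart_data frame below is hypothetical sample data, not the real dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample standing in for heart_data (0 = male, 1 = female).
heart_data = pd.DataFrame({
    "Sex": [0, 1, 0, 1, 1],
    "Cholesterol": [210, 180, 250, 199, 230],
})

fig, ax = plt.subplots()
ax.scatter(heart_data["Sex"], heart_data["Cholesterol"], s=20, c="pink")

# Leave some room on both sides, then show ticks only at 0 and 1 with text labels.
ax.set_xlim(-0.5, 1.5)
ax.set_xticks([0, 1])
ax.set_xticklabels(["Male", "Female"])

ax.set_xlabel("Sex")
ax.set_ylabel("Cholesterol")
```

Call plt.show() afterwards to display the figure in an interactive session.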

Seaborn countplot show wrong results on Titanic dataset

I'm working on the Titanic dataset, which I got from this website:
https://public.opendatasoft.com/explore/dataset/titanic-passengers/table/?flg=fr
I want to show the number of male and female persons for each survived class (yes or no).
First of all I got the whole number of male and female persons using:
bysex=data1['Sex'].value_counts()
print(bysex)
This gave me these results:
male 577
female 314
Name: Sex, dtype: int64
The results show that the number of male persons is greater than female persons.
But when I use seaborn to show the number of male and female persons for each survived class using this code:
plot1 = sns.FacetGrid(data1, col='Survived')
plot1.map(sns.countplot,'Sex')
Then I get these results:
Here it shows that the number of females is greater than the number of males, and for the not-survived class the number of females (around 450) is even greater than the total number of females (314).
How is this possible?
I think there is something wrong with the mapping.
In the left plot the Sex labels are interchanged.
data1.loc[data1["Survived"] == "No", 'Sex'].value_counts()
male 468
female 81
Name: Sex, dtype: int64
and the second plot is right.
data1.loc[data1["Survived"] == "Yes", 'Sex'].value_counts()
female 233
male 109
Name: Sex, dtype: int64
On the other hand when you use
ax = sns.countplot(x="Survived", hue="Sex", data=data1)
you get the right results.
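A likely explanation, stated here as an assumption rather than something confirmed in the thread, is that FacetGrid.map calls the plotting function separately on each subset, so each panel can end up with its own category order while the panels share one set of tick labels. The sketch below uses a hypothetical miniature of the Titanic frame; it cross-checks the counts with pd.crosstab and then draws the faceted count plot with sns.catplot, passing order so that both panels use the same category order.

```python
import pandas as pd
import seaborn as sns

# Hypothetical miniature of the Titanic frame (column names follow the question).
data1 = pd.DataFrame({
    "Sex": ["male", "male", "male", "female", "female", "male", "female", "female"],
    "Survived": ["No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"],
})

# Numeric cross-check: rows are Survived, columns are Sex.
counts = pd.crosstab(data1["Survived"], data1["Sex"])
print(counts)

# One panel per Survived value; `order` pins the x positions in every panel.
g = sns.catplot(
    data=data1, x="Sex", col="Survived", kind="count",
    order=["male", "female"],
)
```

Comparing the bar heights in each panel against the crosstab output is a quick way to confirm that the labels and the bars line up.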

Plotting a grouped pandas data in plotly

I have a pandas dataframe which looks like this:
A B
1 USA Y
3 USA Y
4 USA N
5 India Y
8 India N
12 USA N
14 USA Y
19 USA Y
I want to make a countplot for this dataframe. That is, the plot will have country names on X-axis and the counts for each category on Y-axis. I know I can do this in seaborn like this:
sns.countplot(x='A', data=df, hue='B')
But this will not be an interactive plot. I want to achieve the same thing in plotly but I am having a hard time figuring it out. Can anyone please help me out?
Using plotly 3 you can do something like this:
from plotly import graph_objs as go
fig = go.Figure()
for name, group in df.groupby('B'):
    trace = go.Histogram()
    trace.name = name
    trace.x = group['A']
    fig.add_trace(trace)
You can also change other properties, such as the colors, by setting the trace.marker.color attribute.
