Python pandas bar graph with titles from column - python

I have the following data frame:
   year  tradevalueus    partner
0  1989         26065    Algeria
1  1989         12345    Albania
2  1991        178144  Argentina
3  1991         44384     Bhutan
4  1990       1756844   Bulgaria
5  1990      57088556    Myanmar
I want a bar graph with year on the x-axis and one bar per trade partner. With the data above, that means 3 years on the x-axis and 2 bars for each year showing the tradevalueus value, with each bar named after the partner column. I have checked df.plot.bar() and other Stack Overflow posts about bar graphs, but they don't give the output I want. Any pointers would be greatly appreciated.
Thanks!

You can either pivot the table and plot:
df.pivot(index='year',columns='partner',values='tradevalueus').plot.bar()
Or use seaborn:
import seaborn as sns
sns.barplot(x='year', y='tradevalueus', hue='partner', data=df, dodge=True)
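To make the pivot approach concrete, here is a minimal, self-contained sketch that rebuilds the question's data frame and prints the intermediate wide table. The variable name wide is only an illustration, and rot=0 is an optional choice to keep the year labels horizontal.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    "year": [1989, 1989, 1991, 1991, 1990, 1990],
    "tradevalueus": [26065, 12345, 178144, 44384, 1756844, 57088556],
    "partner": ["Algeria", "Albania", "Argentina", "Bhutan", "Bulgaria", "Myanmar"],
})

# One row per year, one column per partner; combinations with no data become NaN
wide = df.pivot(index="year", columns="partner", values="tradevalueus")
print(wide)

# Each column becomes one legend entry; NaN cells simply draw no bar
ax = wide.plot.bar(rot=0)
ax.set_ylabel("tradevalueus")
```

Because the values span several orders of magnitude, ax.set_yscale("log") may make the small bars easier to see. Call plt.show() from matplotlib.pyplot to display the figure in a script.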

Related

How to create a Pandas dataframe from another column in a dataframe by splitting it?

I have the following source dataframe
Person  Country  Is Rich?
0       US       Yes
1       India    No
2       India    Yes
3       US       Yes
4       US       Yes
5       India    No
6       US       No
7       India    No
I need to convert it to another dataframe so that I can easily plot a bar graph like the one below:
[Image: bar chart of economic status per country]
The data frame to be created looks like this:
Country  Rich  Poor
US       3     1
India    1     3
I am new to pandas and exploratory data science. Please help.
You can try pivot_table
df['Is Rich?'] = df['Is Rich?'].replace({'Yes': 'Rich', 'No': 'Poor'})
out = df.pivot_table(index='Country', columns='Is Rich?', values='Person', aggfunc='count')
print(out)
Is Rich?  Poor  Rich
Country
India        3     1
US           1     3
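As a compact alternative to pivot_table, pd.crosstab counts the combinations directly. This is only a sketch: the frame is rebuilt from the question's table, and the helper name status is made up for illustration. Unlike the replace call above, it leaves the original 'Is Rich?' column untouched.

```python
import pandas as pd

# Source frame rebuilt from the question
df = pd.DataFrame({
    "Person": range(8),
    "Country": ["US", "India", "India", "US", "US", "India", "US", "India"],
    "Is Rich?": ["Yes", "No", "Yes", "Yes", "Yes", "No", "No", "No"],
})

# Map Yes/No to the desired column labels without modifying df
status = df["Is Rich?"].map({"Yes": "Rich", "No": "Poor"})

# Count how many people fall into each (Country, status) cell
out = pd.crosstab(df["Country"], status)
print(out)
```

Calling out.plot.bar(rot=0) on the result should then draw one group of bars per country.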
You could do:
converted = df.assign(Rich=df['Is Rich?'].eq('Yes')).eval('Poor = ~Rich').groupby('Country').agg({'Rich': 'sum', 'Poor': 'sum'})
print(converted)
         Rich  Poor
Country
India       1     3
US          3     1
However, if you want to plot it as a barplot, the following format might work best with a plotting library like seaborn:
plot_df = converted.reset_index().melt(id_vars='Country', value_name='No. of people', var_name='Status')
print(plot_df)
  Country Status  No. of people
0   India   Rich              1
1      US   Rich              3
2   India   Poor              3
3      US   Poor              1
Then, with seaborn:
import seaborn as sns
sns.barplot(x='Country', hue='Status', y='No. of people', data=plot_df)

Histogram using plot in Pandas - set x label

Dataframe:
Horror films released in 2019
Title            Director            Country        Year
3 from Hell      Rob Zombie          United States  2019
Bliss            Joe Begos           United States  2019
Bedeviled        The Vang Brothers   United States  2016
Creep 2          Patrick Brice       United States  2017
Brightburn       David Yarovesky     United States  2019
Delirium         Dennis Iliadis      Ireland        2018
Child's Play     Lars Klevberg       United States  2019
The Conjuring 2  James Wan           United States  2016
Bloodlands       Steven Kastrissios  Albania        2017
Bird Box         Susanne Bier        United States  2017
I need to plot a histogram showing the number of titles released per year using the pandas plot function.
Code:
df = pd.read_csv(filename)
grouped = df.groupby('Year').count()[['Title']]
new_df = grouped.reset_index()
xtick = new_df['Year'].tolist()
width = new_df.Year[1] - new_df.Year[0]
new_df.iloc[:, 1:2].plot(kind='bar', width=width)
I cannot figure out a way to label the x-axis with values from the Year column, and I am also unsure whether my approach is correct.
Thanks in advance :)
It sounds like you want a bar chart, not a histogram, because you have discrete/categorical variables (years). And you say "kind=bar" in your plot statement, so you are on the right track. Try this to see if it works for you. I forced the y-axis to be integers since you are looking for counts, but that is optional.
import pandas as pd
import matplotlib.pyplot as plt
title = ['Movie1', 'Movie2', 'Movie3',
         'Movie4', 'Movie5', 'Movie6',
         'Movie7', 'Movie8', 'Movie9']
year = [2019, 2019, 2018,
        2017, 2019, 2018,
        2019, 2017, 2018]
df = pd.DataFrame(list(zip(title, year)),
                  columns=['Title', 'Year'])
print(df)
group = (df.groupby('Year').count()[['Title']]
         .rename(columns={'Title': 'No. of Movies'})
         .reset_index())
print(group)
ax = group.plot.bar(x='Year', rot=0)
ax.yaxis.get_major_locator().set_params(integer=True)
plt.show()
Title Year
0 Movie1 2019
1 Movie2 2019
2 Movie3 2018
3 Movie4 2017
4 Movie5 2019
5 Movie6 2018
6 Movie7 2019
7 Movie8 2017
8 Movie9 2018
Year No. of Movies
0 2017 2
1 2018 3
2 2019 4
The API offers a few different ways to do this (not a great thing, in my opinion). Here is one way to get what you want:
df = pd.read_csv(filename)
group = df.groupby('Year').count()[['Title']]
df2 = group.reset_index()
df2.plot(kind='bar', x="Year", y="Title")
Or, even more concisely:
df.value_counts("Year").plot(kind="bar")
Note that in the second case, you're creating a bar plot from a Series object.
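One caveat with value_counts: it sorts by count (largest first), so the bars will not necessarily appear in chronological order. A minimal sketch of one way around that, using made-up movie titles and the Series form of value_counts followed by sort_index:

```python
import pandas as pd

# Placeholder data: the titles are invented, only the Year column matters
df = pd.DataFrame({
    "Title": [f"Movie{i}" for i in range(1, 10)],
    "Year": [2019, 2019, 2018, 2017, 2019, 2018, 2019, 2017, 2018],
})

# value_counts orders by frequency; sort_index restores chronological order
counts = df["Year"].value_counts().sort_index()
print(counts)

# The Series index (the years) becomes the x-axis tick labels
ax = counts.plot(kind="bar", rot=0)
ax.set_xlabel("Year")
ax.set_ylabel("No. of Movies")
```

The groupby-based versions already return the years in sorted order, so they do not need the extra sort_index call.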
You can simply do
df.groupby('Year').Title.count().plot(kind='bar')

how to automate labeling of data in matplotlib?

I would like to find a shortcut to labeling data since I am working with a large data set.
Here's the data I'm charting from the large data set:
Nationality
Afghanistan 4
Albania 40
Algeria 60
Andorra 1
Angola 15
...
Uzbekistan 2
Venezuela 67
Wales 129
Zambia 9
Zimbabwe 13
Name: count, Length: 164, dtype: int64
And so far this is my code:
import pandas as pd
import matplotlib.pyplot as plt
the_data = pd.read_csv('fifa_data.csv')
plt.title('Percentage of Players from Each Country')
the_data['count'] = 1
Nations = the_data.groupby(['Nationality']).count()['count']
plt.pie(Nations)
plt.show()
Creating the pie chart is easy and quick this way, but I haven't figured out how to label each country in the pie chart automatically, without having to label each data point one by one.
The pandas plot function will label the data for you automatically:
# count:
Nations = the_data.groupby('Nationality').size()
# plot data
Nations.plot.pie()
plt.title('Percentage of Players from Each Country')
plt.show()
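If you prefer to stay with plt.pie, it accepts a labels argument, so the index of the counts can be passed directly. With 164 countries the labels will overlap, so the sketch below also folds small countries into a single "Other" slice. The sample data and the threshold value are made up for illustration; they are not taken from fifa_data.csv.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for fifa_data.csv; only the Nationality column is used
the_data = pd.DataFrame({
    "Nationality": ["Wales"] * 5 + ["Venezuela"] * 3 + ["Albania"] * 2
                   + ["Andorra", "Zambia"],
})
nations = the_data.groupby("Nationality").size()

# Assumed cutoff: countries with fewer players than this go into "Other"
threshold = 2
small = nations[nations < threshold]
grouped = pd.concat([nations[nations >= threshold],
                     pd.Series({"Other": small.sum()})])

fig, ax = plt.subplots()
# labels takes one string per wedge, here the country names from the index
pie = ax.pie(grouped, labels=grouped.index)
drawn_labels = [t.get_text() for t in pie[1]]
ax.set_title("Percentage of Players from Each Country")
```

Adding autopct='%1.1f%%' to the pie call would also print the percentage on each wedge.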

Seaborn lmplot - Changing Marker Style and Color of single Datapoint

I am trying to solve Harvard's CS109 (2013), Homework 1, Part 1c using seaborn, which the course solution does not use.
"Choose a plot to show this relationship and specifically annotate the Oakland baseball team on the on the plot. Show this plot across multiple years. In which years can you detect a competitive advantage from the Oakland baseball team of using data science? When did this end?"
We have salaries and wins for multiple teams over multiple years. I want to build a seaborn facet for each year, regressing salaries against wins, and call out the data point for Oakland. Building the facets with one regression per year works fine. But how do I change the marker for the Oakland data point?
This is what my data looks like (the first 5 entries):
teamID yearID salary W
0 ANA 1997 31135472 84
1 ANA 1998 41281000 85
2 ANA 1999 55388166 70
3 ANA 2000 51464167 82
4 ANA 2001 47535167 75
...
This is how I am plotting the data:
facetplots = sns.lmplot(x="salary", y="W", col="yearID", data=df_data, col_wrap=4, height=3)
Any help would be much appreciated.
Regards
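One possible approach, sketched under assumptions: draw the facets with lmplot as before, then loop over the grid's axes and overplot the Oakland row with a different marker and color. The teamID "OAK" is an assumption (only ANA rows are shown above), and every non-ANA figure in the sample frame is a made-up placeholder. axes_dict requires a reasonably recent seaborn (0.11 or later).

```python
import pandas as pd
import seaborn as sns

# Placeholder data: the ANA rows come from the question, the rest is invented
df_data = pd.DataFrame({
    "teamID": ["ANA", "OAK", "NYA", "BOS"] * 2,
    "yearID": [2000] * 4 + [2001] * 4,
    "salary": [51464167, 32000000, 92000000, 77000000,
               47535167, 33000000, 112000000, 110000000],
    "W": [82, 91, 87, 85, 75, 102, 95, 82],
})

g = sns.lmplot(x="salary", y="W", col="yearID", data=df_data,
               col_wrap=4, height=3)

# Assumed team code for Oakland
oak = df_data[df_data["teamID"] == "OAK"]

# axes_dict maps each yearID value to the Axes of its facet
for year, ax in g.axes_dict.items():
    row = oak[oak["yearID"] == year]
    # Draw Oakland on top of the regular points with its own marker and color
    ax.scatter(row["salary"], row["W"], color="red", marker="*", s=150, zorder=3)
    for x, y in zip(row["salary"], row["W"]):
        ax.annotate("OAK", (x, y), xytext=(5, 5), textcoords="offset points")
```

The regression line is still fitted on all teams, Oakland included; only the extra scatter call changes how that one point is drawn.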

How to boxplot data after different column values in pandas

I have a dataframe like this:
Country Year Column1 Column2
1 Guatemala 1999 5 1
4 Mexico 2000 1 3
5 Mexico 2000 2 2
6 Mexico 2000 2 1
8 Guatemala 2000 3 2
11 Guatemala 2003 4 3
12 Guatemala 2003 6 4
13 Guatemala 2003 5 5
What I want to make is a boxplot for each group in Country, displaying a number of boxes corresponding to the number of unique values in Year. These boxes should represent the values in Column2.
I group the data and get boxplots like this:
df1=df.groupby('Country').boxplot(column='Column2', subplots=True)
That gives me a boxplot for each Country, but with just one plot in it, representing all the values from that group, not separated by years. How can I get a box for each unique value in year, representing the values in Column2 in my code?
I would use the seaborn package, in particular combining the FacetGrid with boxplot.
For your situation, the code might look like this:
import seaborn as sns
g = sns.FacetGrid(df, col="Country", sharex=False)
g.map(sns.boxplot, 'Year', 'Column2')
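If you would rather stay with pandas and matplotlib, a rough equivalent is to loop over the country groups and call DataFrame.boxplot with by='Year' on each one. This is a sketch using the data from the question; the figure size and layout are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question, including its original index
df = pd.DataFrame({
    "Country": ["Guatemala", "Mexico", "Mexico", "Mexico",
                "Guatemala", "Guatemala", "Guatemala", "Guatemala"],
    "Year": [1999, 2000, 2000, 2000, 2000, 2003, 2003, 2003],
    "Column1": [5, 1, 2, 2, 3, 4, 6, 5],
    "Column2": [1, 3, 2, 1, 2, 3, 4, 5],
}, index=[1, 4, 5, 6, 8, 11, 12, 13])

groups = df.groupby("Country")

# One subplot per country; squeeze=False keeps axes two-dimensional
fig, axes = plt.subplots(1, groups.ngroups, figsize=(8, 3), squeeze=False)
for ax, (country, sub) in zip(axes[0], groups):
    # by="Year" draws one box per unique year within this country
    sub.boxplot(column="Column2", by="Year", ax=ax)
    ax.set_title(country)

# Drop the automatic "Boxplot grouped by Year" figure title
fig.suptitle("")
```

Each subplot only shows the years that actually occur for that country, which matches what sharex=False achieves in the FacetGrid version.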