Histogram using plot in Pandas - set x label - python

Dataframe:
Horror films released in 2019
Title            Director            Country        Year
3 from Hell      Rob Zombie          United States  2019
Bliss            Joe Begos           United States  2019
Bedeviled        The Vang Brothers   United States  2016
Creep 2          Patrick Brice       United States  2017
Brightburn       David Yarovesky     United States  2019
Delirium         Dennis Iliadis      Ireland        2018
Child's Play     Lars Klevberg       United States  2019
The Conjuring 2  James Wan           United States  2016
Bloodlands       Steven Kastrissios  Albania        2017
Bird Box         Susanne Bier        United States  2017
I need to plot a histogram showing the number of titles released each year, using the pandas plot function.
code:
import pandas as pd

df = pd.read_csv(filename)
# Count titles per year
grouped = df.groupby('Year').count()[['Title']]
newdf = grouped.reset_index()
xtick = newdf['Year'].tolist()
width = newdf.Year[1] - newdf.Year[0]
newdf.iloc[:, 1:2].plot(kind='bar', width=width)
I cannot figure out how to label the x-axis with the values from the Year column, and I am also unsure whether my approach is correct.
Thanks in advance :)

It sounds like you want a bar chart, not a histogram, because your variable (year) is discrete/categorical. You already use kind='bar' in your plot statement, so you are on the right track. Try this and see if it works for you. I forced the y-axis ticks to be integers since you are plotting counts, but that is optional.
import pandas as pd
import matplotlib.pyplot as plt
title = ['Movie1', 'Movie2', 'Movie3',
         'Movie4', 'Movie5', 'Movie6',
         'Movie7', 'Movie8', 'Movie9',
         ]
year = [2019, 2019, 2018,
        2017, 2019, 2018,
        2019, 2017, 2018,
        ]
df = pd.DataFrame(list(zip(title, year)),
                  columns=['Title', 'Year'],
                  )
print(df)
group = df.groupby('Year').count()[['Title']]\
    .rename(columns={'Title': 'No. of Movies'})\
    .reset_index()
print(group)
ax = group.plot.bar(x='Year', rot=0)
ax.yaxis.get_major_locator().set_params(integer=True)
plt.show()
Output:
    Title  Year
0  Movie1  2019
1  Movie2  2019
2  Movie3  2018
3  Movie4  2017
4  Movie5  2019
5  Movie6  2018
6  Movie7  2019
7  Movie8  2017
8  Movie9  2018
   Year  No. of Movies
0  2017              2
1  2018              3
2  2019              4

The API offers a few different ways to do this (not a great thing, in my opinion). Here is one way to get what you want:
df = pd.read_csv(filename)
group = df.groupby('Year').count()[['Title']]
df2 = group.reset_index()
df2.plot(kind='bar', x="Year", y="Title")
Or, even more concisely:
df.value_counts("Year").plot(kind="bar")
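Note that in the second case, you're creating a bar plot from a Series object. Also, value_counts sorts by frequency (most common first), so the bars will not be in chronological order unless you chain .sort_index() before .plot().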
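As a quick check of that ordering note, here is a minimal sketch on a small made-up DataFrame (the titles and years are hypothetical, not the question's CSV) comparing the groupby count with value_counts before and after sort_index():

```python
import pandas as pd

# Small made-up sample (hypothetical data, not the question's CSV)
df = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "Year": [2019, 2017, 2019, 2016, 2019],
})

# groupby + count: the result is indexed by Year in ascending order
by_groupby = df.groupby("Year")["Title"].count()

# value_counts sorts by frequency (most common year first) ...
by_counts = df["Year"].value_counts()

# ... so sort_index() is needed to put the years back in chronological order
by_counts_sorted = by_counts.sort_index()

print(by_counts.index.tolist())         # frequency order
print(by_counts_sorted.index.tolist())  # chronological order
```

After sort_index(), both approaches give the same counts in the same year order, so either can feed a bar plot.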

You can simply do
df.groupby('Year').Title.count().plot(kind='bar')
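Since the question is specifically about labelling the x-axis, here is a minimal sketch of the same one-liner with explicit axis labels. It assumes a small hypothetical sample in place of pd.read_csv(filename) and uses matplotlib's non-interactive Agg backend so it runs without a display; set_xlabel, set_ylabel and set_title are standard matplotlib Axes methods.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for pd.read_csv(filename)
df = pd.DataFrame({
    "Title": ["3 from Hell", "Bliss", "Creep 2", "Delirium", "Brightburn"],
    "Year": [2019, 2019, 2017, 2018, 2019],
})

# Count titles per year; the Year values become the index, which
# pandas uses as the x-axis tick labels of the bar chart
counts = df.groupby("Year").Title.count()

ax = counts.plot(kind="bar", rot=0)
ax.set_xlabel("Year")              # x-axis title
ax.set_ylabel("Number of titles")  # y-axis title
ax.set_title("Titles released per year")

# Read back the tick labels pandas assigned from the index
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
plt.close(ax.figure)
```

The year tick labels come for free from the Series index; only the axis titles need to be set by hand.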

Related

How to calculate the percentage variation between two values in the same column of a pandas DataFrame? [duplicate]

This question already has an answer here: python pandas groupby calculate change (1 answer). Closed 9 months ago.
I have this dataframe with the total population number by year.
import pandas as pd
cases_df = pd.DataFrame(data=cases_list, columns=['Year', 'Population', 'Nation'])
cases_df.head(7)
   Year  Population         Nation
0  2019   328239523  United States
1  2018   327167439  United States
2  2017   325719178  United States
3  2016   323127515  United States
4  2015   321418821  United States
5  2014   318857056  United States
6  2013   316128839  United States
I want to calculate how much the population increased from 2013 to 2019 by computing the percentage change between those two values:
((328239523 - 316128839) / 316128839) × 100
How can I do this? Thank you very much!
P.S. Any advice on how to remove the index (0, 1, 2, 3, 4, 5, 6)?
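This can be done with the pandas percentage-change method, pct_change().
Syntax:
df.pct_change()
In your case, Population is a column rather than an index level, so groupby(level='Population') fails. Sort by year and call pct_change() on the column instead:
df1 = cases_df.sort_values('Year')['Population'].pct_change() * 100
print(df1)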
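pct_change() compares each row with the one before it, so on the question's data it yields year-over-year changes. A minimal sketch, rebuilding the question's seven rows by hand (cases_list itself is not shown), of how the periods argument gets the 2013 to 2019 change in one step:

```python
import pandas as pd

# Population figures copied from the question (2013-2019, newest first)
cases_df = pd.DataFrame({
    "Year": [2019, 2018, 2017, 2016, 2015, 2014, 2013],
    "Population": [328239523, 327167439, 325719178, 323127515,
                   321418821, 318857056, 316128839],
    "Nation": ["United States"] * 7,
})

# pct_change compares each row with an earlier row, so sort oldest-first
pop = cases_df.sort_values("Year").set_index("Year")["Population"]

# Year-over-year change in percent (2013 is NaN: nothing earlier to compare with)
yoy = pop.pct_change() * 100

# periods=6 compares each row with the row six positions earlier,
# i.e. 2019 against 2013 for this seven-row series
total = pop.pct_change(periods=6) * 100
print(total.loc[2019])  # about 3.83
```

The periods value depends on how many rows separate the two years, so it only works this way when every year is present exactly once.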

How to calculate the percentage variation between two values?

I have this dataframe with the total population number by year.
import pandas as pd
cases_df = pd.DataFrame(data=cases_list, columns=['Year', 'Population', 'Nation'])
cases_df.head(7)
   Year  Population         Nation
0  2019   328239523  United States
1  2018   327167439  United States
2  2017   325719178  United States
3  2016   323127515  United States
4  2015   321418821  United States
5  2014   318857056  United States
6  2013   316128839  United States
I want to calculate how much the population increased from 2013 to 2019 by computing the percentage change between those two values:
((328239523 - 316128839) / 316128839) × 100
How can I do this? Thank you very much!
P.S. Any advice on how to remove the index (0 1 2 3 4 5 6)?
I tried this:
df1 = df.groupby(level='Population').pct_change()
print(df1)
but I get an error saying that "Population" is not the name of an index level.
I would do it the following way:
import pandas as pd
df = pd.DataFrame({"year":[2015,2014,2013],"population":[321418821,318857056,316128839],"nation":["United States","United States","United States"]})
df = df.set_index("year")
df["percentage"] = df["population"] * 100 / df["population"][2013]
print(df)
Output:
      population         nation  percentage
year
2015   321418821  United States  101.673363
2014   318857056  United States  100.863008
2013   316128839  United States  100.000000
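Note that I used a subset of the data for brevity. Using year as the index allows easy access to the 2013 population value; the percentage is computed as population * 100 / (population in 2013), so subtracting 100 gives the percentage change relative to 2013 (for example, 1.673363% for 2015).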
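To remove the numbered index mentioned in the question:
cases_df.set_index('Year', inplace=True)
Year now replaces the default numbered index.
Then you can use cases_df.describe(), or cases_df.attribute_name.describe() (where attribute_name is a column name) for a single column, to get summary statistics.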
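A small runnable sketch of two ways to deal with the 0 to 6 row labels from the question, on a three-row subset of its data: make Year the index with set_index (as this answer does), or keep the default index and simply leave it out when printing with DataFrame.to_string(index=False), which is a swapped-in alternative that only affects display.

```python
import pandas as pd

# Subset of the question's data (newest first, default 0..n-1 index)
cases_df = pd.DataFrame({
    "Year": [2015, 2014, 2013],
    "Population": [321418821, 318857056, 316128839],
    "Nation": ["United States"] * 3,
})

# Option 1: make Year the index, so it replaces the 0, 1, 2 row labels
by_year = cases_df.set_index("Year")
pop_2013 = by_year.loc[2013, "Population"]  # look up a value by year

# Option 2: keep the default index but leave it out when printing
text = cases_df.to_string(index=False)
print(text)
```

Option 1 changes the DataFrame itself and makes year-based lookups easy; option 2 leaves the data untouched.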
This is more of a math question than a programming question.
Let's call this the percentage difference between two values, since population can change in either direction (increase or decrease over time).
Now, let's say that in 2013 we had 316128839 people and in 2019 we had 328239523 people:
a = 316128839
b = 328239523
Before calculating the percentage, we need the difference between b and a:
diff = b - a
Next, we express diff as a percentage of a:
perc = (diff / a) * 100
And there is your percentage variation between a and b.
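The same steps wrapped in a small function, as a sketch; the name percentage_difference is hypothetical, chosen here for illustration. The sign of the result tells you whether the value went up or down.

```python
def percentage_difference(a, b):
    """Change from a to b, expressed as a percentage of a (hypothetical helper).

    Positive when b > a (increase), negative when b < a (decrease).
    """
    diff = b - a
    return diff / a * 100


a = 316128839  # 2013 population
b = 328239523  # 2019 population
perc = percentage_difference(a, b)
print(round(perc, 2))  # 3.83
```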

How to create a Pandas dataframe from another column in a dataframe by splitting it?

I have the following source DataFrame:
Person  Country  Is Rich?
0       US       Yes
1       India    No
2       India    Yes
3       US       Yes
4       US       Yes
5       India    No
6       US       No
7       India    No
I need to convert it into another DataFrame so that I can easily plot a bar graph like the one below.
Bar chart of economic status per country
The DataFrame to be created should look like this:
Country  Rich  Poor
US       3     1
India    1     3
I am new to pandas and exploratory data science. Please help.
You can try pivot_table:
df['Is Rich?'] = df['Is Rich?'].replace({'Yes': 'Rich', 'No': 'Poor'})
out = df.pivot_table(index='Country', columns='Is Rich?', values='Person', aggfunc='count')
print(out)
Is Rich? Poor Rich
Country
India 3 1
US 1 3
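A swapped-in alternative, not part of the answer above: pd.crosstab counts every (Country, Is Rich?) combination in one call. A minimal sketch on the question's eight rows, renaming the Yes/No columns to the Rich/Poor labels the question asks for:

```python
import pandas as pd

# The question's source data
df = pd.DataFrame({
    "Person": range(8),
    "Country": ["US", "India", "India", "US", "US", "India", "US", "India"],
    "Is Rich?": ["Yes", "No", "Yes", "Yes", "Yes", "No", "No", "No"],
})

# crosstab counts each (Country, Is Rich?) combination directly
out = pd.crosstab(df["Country"], df["Is Rich?"])

# Rename the Yes/No columns and put Rich first, as in the desired table
out = out.rename(columns={"Yes": "Rich", "No": "Poor"})[["Rich", "Poor"]]
print(out)
```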
You could do:
converted = df.assign(Rich=df['Is Rich?'].eq('Yes')).eval('Poor = ~Rich').groupby('Country').agg({'Rich': 'sum', 'Poor': 'sum'})
print(converted)
Rich Poor
Country
India 1 3
US 3 1
However, if you want to plot it as a bar plot, the following long format works best with a plotting library such as seaborn:
plot_df = converted.reset_index().melt(id_vars='Country', value_name='No. of people', var_name='Status')
print(plot_df)
Country Status No. of people
0 India Rich 1
1 US Rich 3
2 India Poor 3
3 US Poor 1
Then, with seaborn:
import seaborn as sns
sns.barplot(x='Country', hue='Status', y='No. of people', data=plot_df)
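If seaborn isn't available, the wide converted table (Country by Rich/Poor) can also be plotted directly with pandas, which draws one bar group per index value and one bar per column, so the melt step is not needed. A minimal sketch, rebuilding converted by hand from the printed output and using the non-interactive Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The wide table printed above (Country x Rich/Poor counts), rebuilt by hand
converted = pd.DataFrame(
    {"Rich": [1, 3], "Poor": [3, 1]},
    index=pd.Index(["India", "US"], name="Country"),
)

# Each row (country) becomes a group on the x-axis; each column a bar in the group
ax = converted.plot.bar(rot=0)
ax.set_ylabel("No. of people")

# The legend entries come from the column names
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
plt.close(ax.figure)
```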

Python pandas bar graph with titles from column

I have the following data frame:
year tradevalueus partner
0 1989 26065 Algeria
1 1989 12345 Albania
2 1991 178144 Argentina
3 1991 44384 Bhutan
4 1990 1756844 Bulgaria
5 1990 57088556 Myanmar
I want a bar graph with year on the x-axis and a bar for each trade partner showing its value. With the data above, that means 3 years on the x-axis, each with 2 bars for the tradevalueus variable, labeled by the partner column. I have checked df.plot.bar() and other Stack Overflow posts about bar graphs, but they don't give the output I want. Any pointers would be greatly appreciated.
Thanks!
You can either pivot the table and plot:
df.pivot(index='year',columns='partner',values='tradevalueus').plot.bar()
Or use seaborn:
import seaborn as sns
sns.barplot(x='year', y='tradevalueus', hue='partner', data=df, dodge=True)
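It may help to look at what the pivot produces before plotting. Because each partner appears in only one year, the pivoted table has one column per partner and is mostly NaN, which is why each year's two bars sit inside a slot sized for all six partners. A minimal sketch on the question's data:

```python
import pandas as pd

# The question's data
df = pd.DataFrame({
    "year": [1989, 1989, 1991, 1991, 1990, 1990],
    "tradevalueus": [26065, 12345, 178144, 44384, 1756844, 57088556],
    "partner": ["Algeria", "Albania", "Argentina", "Bhutan", "Bulgaria", "Myanmar"],
})

# One row per year, one column per partner; combinations that do not occur are NaN
wide = df.pivot(index="year", columns="partner", values="tradevalueus")
print(wide.shape)  # (3, 6)

# Number of partners with a value in each year
print(wide.notna().sum(axis=1))
```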

How do I groupby two columns and create a loop to subplots?

I have a large DataFrame (df) with this structure:
year person purchase
2016 Peter 0
2016 Peter 223820
2016 Peter 0
2017 Peter 261740
2017 Peter 339987
2018 Peter 200000
2016 Carol 256400
2017 Carol 33083820
2017 Carol 154711
2018 Carol 3401000
2016 Frank 824043
2017 Frank 300000
2018 Frank 214416259
2018 Frank 4268825
2018 Frank 463080
2016 Rita 0
To see how much each person spent per year, I group by year and person, which gives me what I want.
code:
df1 = df.groupby(['person','year']).sum().reset_index()
How do I write a loop that creates a subplot for each person showing what he or she spent on purchases each year?
So a subplot for each person where x = year and y = purchase.
I've tried a lot of different things explained here but none seems to work.
Thanks!
You can use either pivot_table or groupby().sum().unstack('person'), and then plot:
(df.pivot_table(index='year',
                columns='person',
                values='purchase',
                aggfunc='sum')
   .plot(subplots=True)
);
Or
(df.groupby(['person', 'year'])['purchase']
   .sum()
   .unstack('person')
   .plot(subplots=True)
);
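Since the question asks for an explicit loop, here is a sketch that swaps in matplotlib's plt.subplots with a for loop over people instead of plot(subplots=True). It uses only the Peter and Carol rows from the question's table and the non-interactive Agg backend; the layout choices (one column, shared x-axis, markers) are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Peter's and Carol's rows from the question's data
df = pd.DataFrame({
    "year": [2016, 2016, 2016, 2017, 2017, 2018, 2016, 2017, 2017, 2018],
    "person": ["Peter"] * 6 + ["Carol"] * 4,
    "purchase": [0, 223820, 0, 261740, 339987, 200000,
                 256400, 33083820, 154711, 3401000],
})

# Total purchase per person and year (MultiIndex: person, year)
totals = df.groupby(["person", "year"])["purchase"].sum()
people = totals.index.get_level_values("person").unique()

# One subplot per person, stacked vertically and sharing the year axis
fig, axes = plt.subplots(nrows=len(people), ncols=1, sharex=True, squeeze=False)
for ax, person in zip(axes[:, 0], people):
    per_year = totals.loc[person]  # Series indexed by year
    ax.plot(per_year.index, per_year.values, marker="o")
    ax.set_title(person)
    ax.set_ylabel("purchase")
axes[-1, 0].set_xlabel("year")
fig.tight_layout()
plt.close(fig)
```

squeeze=False keeps axes two-dimensional even when there is only one person, so the loop works unchanged for any number of people.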