Python - Plot multiple dataframe columns - python

I have a dataframe with 4 columns and I want to do a groupby and plot the data. But I am not sure how to go about this.
Cont Coun X3 Y1
Africa nigeria A 10
Africa nigeria B 93
Africa nigeria C 124
Africa nigeria D 24
-------------------------------
Africa kenya A 123
Africa kenya B 540
Africa kenya C 1000
Africa kenya D 183
--------------------------------
Asia Japan A 1234
Asia Japan B 820
Asia Japan C 2130
Asia Japan D 912
For every distinct continent(cont) and country(coun) pair, plot 4 different bars corresponding to the column X3. The Y1 column is the Y-axis

I'd recommend seaborn for this kind of plot:
import seaborn as sns
# Recent seaborn versions require x and y to be passed as keyword arguments.
sns.barplot(x=df.Cont + '\n' + df.Coun, y='Y1', hue='X3', data=df)
To adjust the figure size, create a figure with a subplot first, then draw the seaborn plot on it via the ax keyword argument:
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(16, 8))
sns.barplot(x=df.Cont + '\n' + df.Coun, y='Y1', hue='X3', data=df, ax=ax)
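If you would rather stay within pandas, a pivot-based variant may also work. The sketch below rebuilds the sample data from the question; the name wide is only illustrative, and aggfunc="sum" is an assumption that matters only if a (Cont, Coun, X3) combination occurs more than once.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Cont": ["Africa"] * 8 + ["Asia"] * 4,
    "Coun": ["nigeria"] * 4 + ["kenya"] * 4 + ["Japan"] * 4,
    "X3": list("ABCD") * 3,
    "Y1": [10, 93, 124, 24, 123, 540, 1000, 183, 1234, 820, 2130, 912],
})

# One row per (continent, country) pair, one column per X3 category.
wide = df.pivot_table(index=["Cont", "Coun"], columns="X3", values="Y1", aggfunc="sum")

# Each row becomes a group of four bars (A, B, C, D) on the x-axis.
ax = wide.plot.bar(rot=0, figsize=(16, 8))
ax.set_ylabel("Y1")
```

Note that pivot_table sorts the index, so the groups appear in alphabetical order rather than in the order of the original rows.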

Related

How do I groupby, count or sum and then plot two lines in Pandas?

Say I have the following dataframes:
Earthquakes:
latitude longitude place year
0 36.087000 -106.168000 New Mexico 1973
1 33.917000 -90.775000 Mississippi 1973
2 37.160000 -104.594000 Colorado 1973
3 37.148000 -104.571000 Colorado 1973
4 36.500000 -100.693000 Oklahoma 1974
… … … … …
13941 36.373500 -96.818700 Oklahoma 2016
13942 36.412200 -96.882400 Oklahoma 2016
13943 37.277167 -98.072667 Kansas 2016
13944 36.939300 -97.896000 Oklahoma 2016
13945 36.940500 -97.906300 Oklahoma 2016
and Wells:
LAT LONG BBLS Year
0 36.900324 -98.218260 300.0 1977
1 36.896636 -98.177720 1000.0 2002
2 36.806113 -98.325840 1000.0 1988
3 36.888589 -98.318530 1000.0 1985
4 36.892128 -98.194620 2400.0 2002
… … … … …
11117 36.263285 -99.557631 1000.0 2007
11118 36.263220 -99.548647 1000.0 2007
11119 36.520160 -99.334183 19999.0 2016
11120 36.276728 -99.298563 19999.0 2016
11121 36.436857 -99.137391 60000.0 2012
How can I make a line graph showing the number of BBLS per year (from Wells) and the number of earthquakes that occurred per year (from Earthquakes), where the x-axis shows the years since 1980, the y1-axis shows the sum of BBLS per year, and the y2-axis shows the number of earthquakes?
I believe I need a groupby with count (for earthquakes) and sum (for BBLS) to make the plot, but I have tried many approaches and I just don't get how to do it.
The only one that kinda worked was the line graph for earthquakes as follows:
Earthquakes.pivot_table(index=['year'],columns='type',aggfunc='size').plot(kind='line')
Still, for the line graph for BBLS nothing has worked
Wells.pivot_table(index=['Year'],columns='BBLS',aggfunc='count').plot(kind='line')
This one either:
plt.plot(Wells['Year'].values, Wells['BBL'].values, label='Barrels Produced')
plt.legend() # Plot legends (the two labels)
plt.xlabel('Year') # Set x-axis text
plt.ylabel('Earthquakes') # Set y-axis text
plt.show() # Display plot
This one from another thread either:
fig, ax = plt.subplots(figsize=(10,8))
Earthquakes.plot(ax = ax, marker='v')
ax.title.set_text('Earthquakes and Injection Wells')
ax.set_ylabel('Earthquakes')
ax.set_xlabel('Year')
ax.set_xticks(Earthquakes['year'])
ax2=ax.twinx()
ax2.plot(Wells.Year, Wells.BBL, color='c',
linewidth=2.0, label='Number of Barrels', marker='o')
ax2.set_ylabel('Annual Number of Barrels')
lines_1, labels_1 = ax.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
lines = lines_1 + lines_2
labels = labels_1 + labels_2
ax.legend(lines, labels, loc='upper center')
Input data:
>>> df2 # Earthquakes
year
0 2007
1 1974
2 1979
3 1992
4 2006
.. ...
495 2002
496 2011
497 1971
498 1977
499 1985
[500 rows x 1 columns]
>>> df1 # Wells
BBLS year
0 16655 1997
1 7740 1998
2 37277 2000
3 20195 2014
4 11882 2018
.. ... ...
495 30832 1981
496 24770 2018
497 14949 1980
498 24743 1975
499 46933 2019
[500 rows x 2 columns]
Prepare data to plot:
# df2 holds the earthquakes: count the rows per year.
data1 = df2.value_counts("year").sort_index().rename("Earthquakes")
# df1 holds the wells: sum the barrels per year.
data2 = df1.groupby("year")["BBLS"].sum()
Simple plot:
ax1 = data1.plot(legend=True, color="blue")
ax2 = data2.plot(legend=True, color="red", ax=ax1.twinx())
Now you can customize the two axes however you like.
A more controlled chart:
import matplotlib.pyplot as plt
# Figure and axis
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
# Data
line1, = ax1.plot(data1.index, data1.values, label="Earthquakes", color="b")
line2, = ax2.plot(data2.index, data2.values / 10**6, label="Barrels", color="r")
# Legend
lines = [line1, line2]
ax1.legend(lines, [line.get_label() for line in lines])
# Titles
ax1.set_title("")
ax1.set_xlabel("Year")
ax1.set_ylabel("Earthquakes")
ax2.set_ylabel("Barrels Produced (MMbbl)")
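Putting the steps together, here is a minimal self-contained sketch on made-up data. The tiny earthquakes and wells frames and all their values are invented for illustration; the year >= 1980 filter reflects the question's request to start the x-axis at 1980.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-ins for the real Earthquakes and Wells frames.
earthquakes = pd.DataFrame({"year": [1979, 1980, 1980, 1981, 1981, 1981]})
wells = pd.DataFrame({
    "BBLS": [500.0, 300.0, 1000.0, 2400.0],
    "Year": [1977, 1980, 1980, 1981],
})

# Count earthquakes per year and sum barrels per year, from 1980 onwards.
quakes_per_year = earthquakes[earthquakes["year"] >= 1980].groupby("year").size()
bbls_per_year = wells[wells["Year"] >= 1980].groupby("Year")["BBLS"].sum()

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

line1, = ax1.plot(bbls_per_year.index, bbls_per_year.values, color="r", label="BBLS")
line2, = ax2.plot(quakes_per_year.index, quakes_per_year.values, color="b", label="Earthquakes")

ax1.set_xlabel("Year")
ax1.set_ylabel("Sum of BBLS")
ax2.set_ylabel("Number of earthquakes")
# One combined legend for the lines drawn on both axes.
ax1.legend([line1, line2], [line1.get_label(), line2.get_label()])
```

Call plt.show() afterwards to display the figure when running this as a script.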

Python pandas bar graph with titles from column

I have the following data frame:
year tradevalueus partner
0 1989 26065 Algeria
1 1989 12345 Albania
2 1991 178144 Argentina
3 1991 44384 Bhutan
4 1990 1756844 Bulgaria
5 1990 57088556 Myanmar
I want a bar graph by year on the x-axis for each trade partner with values. By this, with the above data, I want to have 3 years on the x-axis with 2 bar-graphs for each year with the tradevalueus variable and I want to name each of these by the partner column. I have checked df.plot.bar() and other stackoverflow posts about bar graphs but they don't give the output I desire. Any pointers would be greatly appreciated.
Thanks!
You can either pivot the table and plot:
df.pivot(index='year',columns='partner',values='tradevalueus').plot.bar()
Or use seaborn:
import seaborn as sns
sns.barplot(x='year', y='tradevalueus', hue='partner', data=df, dodge=True)
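To see what the pivot produces, here is a small sketch using the sample rows from the question (the name wide is only illustrative). Because every partner in the sample appears in just one year, most cells of the pivoted table are NaN and are drawn as missing bars.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year": [1989, 1989, 1991, 1991, 1990, 1990],
    "tradevalueus": [26065, 12345, 178144, 44384, 1756844, 57088556],
    "partner": ["Algeria", "Albania", "Argentina", "Bhutan", "Bulgaria", "Myanmar"],
})

# Rows become years, columns become partners. pivot() requires each
# (year, partner) pair to be unique, otherwise it raises a ValueError.
wide = df.pivot(index="year", columns="partner", values="tradevalueus")

# One group of bars per year, one legend entry per partner.
ax = wide.plot.bar(rot=0)
ax.set_ylabel("tradevalueus")
```

If the real data contains repeated (year, partner) pairs, pivot_table with an explicit aggfunc might be the better fit.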

how to automate labeling of data in matplotlib?

I would like to find a shortcut to labeling data since I am working with a large data set.
here's the data I'm charting from the large data set:
Nationality
Afghanistan 4
Albania 40
Algeria 60
Andorra 1
Angola 15
...
Uzbekistan 2
Venezuela 67
Wales 129
Zambia 9
Zimbabwe 13
Name: count, Length: 164, dtype: int64
And so far this is my code:
import pandas as pd
import matplotlib.pyplot as plt
the_data = pd.read_csv('fifa_data.csv')
plt.title('Percentage of Players from Each Country')
the_data['count'] = 1
Nations = the_data.groupby(['Nationality']).count()['count']
plt.pie(Nations)
plt.show()
Creating the pie chart is easy and quick this way, but I haven't figured out how to automatically label each country in the pie chart without labeling each data point one by one.
The pandas plot function will label the data for you automatically:
# count:
Nations = the_data.groupby('Nationality').size()
# plot data
Nations.plot.pie()
plt.title('Percentage of Players from Each Country')
plt.show()
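If you prefer to stay with plt.pie, the same labels can be passed explicitly from the index of the counts. This is a minimal sketch on a made-up Nationality column; the six rows below are invented and stand in for the real file.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the_data read from fifa_data.csv.
the_data = pd.DataFrame(
    {"Nationality": ["Wales", "Wales", "Wales", "Albania", "Albania", "Zambia"]}
)

# Number of players per country; the index holds the country names.
nations = the_data.groupby("Nationality").size()

fig, ax = plt.subplots()
# labels= takes one label per wedge, so the index supplies them all at once.
wedges, texts = ax.pie(nations.values, labels=list(nations.index))
ax.set_title("Percentage of Players from Each Country")

# The text objects returned by pie() carry the labels that were drawn.
label_texts = [t.get_text() for t in texts]
```

With 164 countries the labels are likely to overlap, so grouping the smallest counts into a single slice first may be worth considering.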

trying to plot 3 different arrays corresponding to x,y,z

I am trying to plot the happiness degree in 30 different countries from 2012 to 2018; some years are missing a happiness value.
The arrays are Happiness, Year and Country.
I want the y-axis to be the happiness degree, the x-axis to be the year, and the z value to be the country (each country is marked by a number from 1-30), so that a color connects all the different degrees through the years for each country.
The shape of each array is (210,). Here is my code:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.tri as tri
sns.set(style="white")
x=Year
y=Hapiness
z=country
fig = plt.figure(figsize=(50, 50))
ax = fig.add_subplot(111)
nptsx, nptsy = 100, 100
xg, yg = np.meshgrid(np.linspace(x.min(), x.max(), nptsx),
np.linspace(y.min(), y.max(), nptsy))
triangles = tri.Triangulation(x, y)
tri_interp = tri.CubicTriInterpolator(triangles, z)
zg = tri_interp(xg, yg)
# change levels here according to your data
levels = np.linspace(1, 210, 30)
colormap = ax.contourf(xg, yg, zg, levels,
cmap=plt.cm.Blues,
norm=plt.Normalize(vmax=z.max(), vmin=z.min()))
# plot data points
ax.plot(x, y, color="#444444", marker="o", linestyle="", markersize=15)
# add a colorbar
fig.colorbar(colormap,
orientation='vertical', # vertical colour bar
shrink=0.85)
# graph extras: look at xlim and ylim
ax.set_xlim((2012, 2018))
ax.set_ylim((0, 10))
ax.set_aspect("equal", "box")
plt.show()
Here is the error I get when I run the code:
RuntimeError Traceback (most recent call last)
<ipython-input-65-2779759126bf> in <module>
16 np.linspace(y.min(), y.max(), nptsy))
17
---> 18 triangles = tri.Triangulation(x, y)
19 tri_interp = tri.CubicTriInterpolator(triangles, z)
20 zg = tri_interp(xg, yg)
C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\tri\triangulation.py in __init__(self, x, y, triangles, mask)
52 # No triangulation specified, so use matplotlib._qhull to obtain
53 # Delaunay triangulation.
---> 54 self.triangles, self._neighbors =_qhull.delaunay(x,y)
55 self.is_delaunay = True
56 else:
RuntimeError: Error in qhull Delaunay triangulation calculation: input inconsistency (exitcode=1); use python verbose option (-v) to see original qhull error.
A sample of my data (.CSV file):
Entity Code Year World Happiness Report(Cantril Ladder(0=worst; 10=best))
Argentina 1 2012 6.4
Argentina 1 2013 6.5
Argentina 1 2014 6.6
Argentina 1 2015 6.6
Argentina 1 2016 6.4
Argentina 1 2017 6.0
Argentina 1 2018 5.7
Australia 2 2012 7.1
Australia 2 2013 7.3
Australia 2 2014 7.2
Australia 2 2015 7.3
Australia 2 2016 7.2
Australia 2 2017 7.2
Australia 2 2018 7.1
Brazil 3 2012 6.6
Brazil 3 2013 7.1
Brazil 3 2014 6.9
Brazil 3 2015 6.5
Brazil 3 2016 6.3
Brazil 3 2017 6.3
Brazil 3 2018 6.1
The error comes from the way you read your CSV file: without a type assigned, your data is treated as strings.
Try this:
import pandas as pd
import numpy as np
df = pd.read_csv("untitled.csv", delimiter=',')
x=np.array(df['Year'].values, dtype='float')
y=np.array(df['Happiness'].values, dtype='float')
z=np.array(df['Code'].values, dtype='int')
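As a hedged aside, if the goal is one colored line per country over the years, a pivot followed by a line plot may be simpler than a triangulated contour. The sketch below uses a few rows from the sample data and assumes the long happiness column has been renamed to Happiness; pd.to_numeric is used here as one way to force numeric types and is not part of the original answer.

```python
import io
import pandas as pd

# A few rows from the sample data; "Happiness" is an assumed short column name.
csv_text = """Entity,Code,Year,Happiness
Argentina,1,2012,6.4
Argentina,1,2013,6.5
Australia,2,2012,7.1
Australia,2,2013,7.3
Brazil,3,2012,6.6
Brazil,3,2013,7.1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Force numeric types; values that cannot be parsed become NaN.
for col in ["Code", "Year", "Happiness"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Rows become years, columns become countries.
wide = df.pivot(index="Year", columns="Entity", values="Happiness")

# One line (and one color) per country; missing years show up as gaps.
ax = wide.plot(marker="o")
ax.set_ylabel("Happiness")
```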

bar chart over pandas DataFrame

I have dataframe like this:
Continent Surplus1980 Surplus1985 ... Surplus2005 Surplus2010
Africa -711.186834 -894.362995 ... -1001.189049 -960.203280
Asia -1464.995609 -1528.688190 ... -1511.834129 -1529.459409
Europe 716.832130 580.341819 ... 574.808741 590.688746
North America 1586.628358 2559.054466 ... 2851.819722 2867.880633
Oceania 4163.456825 3899.532718 ... 3807.652781 3796.396563
South America 1455.955084 1196.506188 ... 1086.940969 1093.484142
Now I want to plot a bar-chart that shows each continent value from 1980 to 2010 in the x-axis. I am using this function:
df.plot(kind="bar", rot=0, ax=ax, width=0.5)
My result looks like this:
(bar chart image)
So how can I change it so that the continent names appear in the legend and each year shows the value for each continent?
Use:
df.set_index('Continent').T.plot(kind='bar', rot=0, width=0.5, figsize=(10,8))
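To illustrate why the transpose helps, here is a small sketch on three rows and two columns taken from the sample above. Stripping the Surplus prefix from the tick labels is an optional extra, not part of the original answer, and by_year is just an illustrative name.

```python
import pandas as pd

# A subset of the sample data from the question.
df = pd.DataFrame({
    "Continent": ["Africa", "Asia", "Europe"],
    "Surplus1980": [-711.186834, -1464.995609, 716.832130],
    "Surplus2010": [-960.203280, -1529.459409, 590.688746],
})

# After set_index + transpose, rows are years and columns are continents,
# so each continent becomes one legend entry and each year one bar group.
by_year = df.set_index("Continent").T

# Optional: keep only the year in the x-axis tick labels.
by_year.index = by_year.index.str.replace("Surplus", "", regex=False)

ax = by_year.plot(kind="bar", rot=0, width=0.5, figsize=(10, 8))
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```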
