I would like to find a shortcut for labeling data, since I am working with a large data set.
Here's the data I'm charting from the large data set:
Nationality
Afghanistan 4
Albania 40
Algeria 60
Andorra 1
Angola 15
...
Uzbekistan 2
Venezuela 67
Wales 129
Zambia 9
Zimbabwe 13
Name: count, Length: 164, dtype: int64
And so far this is my code:
import pandas as pd
import matplotlib.pyplot as plt
the_data = pd.read_csv('fifa_data.csv')
plt.title('Percentage of Players from Each Country')
the_data['count'] = 1
Nations = the_data.groupby(['Nationality']).count()['count']
plt.pie(Nations)
plt.show()
Creating the pie chart this way is quick and easy, but I haven't figured out how to label each country in the pie chart automatically, without having to label each data point one by one.
The pandas plot function labels the data for you automatically, using the index of the Series:
# count:
Nations = the_data.groupby('Nationality').size()
# plot data
Nations.plot.pie()
plt.title('Percentage of Players from Each Country')
plt.show()
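With 164 countries the labels on a single pie will overlap heavily. One possible way to keep the chart readable, sketched below, is to keep the largest few slices and fold the rest into one "Other" slice before plotting. The helper name `top_n_with_other`, the cutoff of 3 and the small sample of counts (copied from the output in the question) are illustrative assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd


def top_n_with_other(counts, n):
    """Keep the n largest entries and sum the remainder into 'Other' (hypothetical helper)."""
    ordered = counts.sort_values(ascending=False)
    top = ordered.iloc[:n].copy()
    rest = ordered.iloc[n:].sum()
    if rest > 0:
        top["Other"] = rest
    return top


# A few of the counts printed in the question, standing in for the full series
nations = pd.Series(
    {"Afghanistan": 4, "Albania": 40, "Algeria": 60, "Venezuela": 67, "Wales": 129},
    name="count",
)

slices = top_n_with_other(nations, 3)

fig, ax = plt.subplots()
# labels= takes the index directly, so no country has to be labelled by hand
ax.pie(slices, labels=slices.index, autopct="%1.1f%%")
ax.set_title("Percentage of Players from Each Country")
```

In an interactive session, `plt.show()` after the last line displays the figure. The same `labels=Nations.index` argument also works with the plain `plt.pie(Nations)` call from the question.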
I have the following data frame:
year tradevalueus partner
0 1989 26065 Algeria
1 1989 12345 Albania
2 1991 178144 Argentina
3 1991 44384 Bhutan
4 1990 1756844 Bulgaria
5 1990 57088556 Myanmar
I want a bar graph with year on the x-axis and one bar per trade partner. With the data above, that means 3 years on the x-axis with 2 bars for each year, where the bar height is the tradevalueus variable and each bar is named by the partner column. I have checked df.plot.bar() and other Stack Overflow posts about bar graphs, but they don't give the output I want. Any pointers would be greatly appreciated.
Thanks!
You can either pivot the table and plot:
df.pivot(index='year',columns='partner',values='tradevalueus').plot.bar()
Or use seaborn:
import seaborn as sns
sns.barplot(x='year', y='tradevalueus', hue='partner', data=df, dodge=True)
Output:
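To see why the pivot gives two bars per year, it can help to look at the pivoted table itself. Here is a minimal sketch that rebuilds the six rows from the question by hand; the variable name `wide` is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1989, 1989, 1991, 1991, 1990, 1990],
    "tradevalueus": [26065, 12345, 178144, 44384, 1756844, 57088556],
    "partner": ["Algeria", "Albania", "Argentina", "Bhutan", "Bulgaria", "Myanmar"],
})

# One row per year, one column per partner; combinations that never occur become NaN
wide = df.pivot(index="year", columns="partner", values="tradevalueus")
print(wide)

# Each column becomes one bar series, so every year shows only its non-NaN partners
# wide.plot.bar()  # uncomment in a plotting session
```

Note that `pivot` raises an error if the same (year, partner) pair appears more than once; in that case `pivot_table` with an aggregation function is the usual substitute.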
I have a dataframe with 4 columns and I want to do a groupby and plot the data. But I am not sure how to go about this.
Cont Coun X3 Y1
Africa nigeria A 10
Africa nigeria B 93
Africa nigeria C 124
Africa nigeria D 24
-------------------------------
Africa kenya A 123
Africa kenya B 540
Africa kenya C 1000
Africa kenya D 183
--------------------------------
Asia Japan A 1234
Asia Japan B 820
Asia Japan C 2130
Asia Japan D 912
For every distinct continent (Cont) and country (Coun) pair, I want to plot 4 different bars, one for each value in column X3. The Y1 column gives the bar height on the y-axis.
Result:-
I'd recommend seaborn for this kind of plot:
import seaborn as sns
# recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=df.Cont + '\n' + df.Coun, y='Y1', hue='X3', data=df)
For adjusting figure size you can create a figure with a subplot first and then put the seaborn plot into the desired destination with the ax kwarg:
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(16, 8))
# recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=df.Cont + '\n' + df.Coun, y='Y1', hue='X3', data=df, ax=ax)
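If seaborn is not available, a similar grouped bar chart can probably be produced with pandas alone by pivoting on a combined continent/country label. This is only a sketch under that assumption; the column name `group` and the variable `wide` are made up for the example, and the data is typed in from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Cont": ["Africa"] * 8 + ["Asia"] * 4,
    "Coun": ["nigeria"] * 4 + ["kenya"] * 4 + ["Japan"] * 4,
    "X3": list("ABCD") * 3,
    "Y1": [10, 93, 124, 24, 123, 540, 1000, 183, 1234, 820, 2130, 912],
})

# Combine continent and country into one label per (Cont, Coun) pair
labelled = df.assign(group=df["Cont"] + "\n" + df["Coun"])

# One row per pair, one column per X3 value; each column becomes one bar series
wide = labelled.pivot(index="group", columns="X3", values="Y1")
print(wide)

# wide.plot.bar(rot=0, figsize=(16, 8))  # uncomment in a plotting session
```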
I want to create multiple (two in this case) boxplots based on the data in a dataframe.
I have the following dataframe:
Country Fund R^2 Style
0 Austria BG EMCore Convertibles Global CHF R T 0.739131 Allocation
1 Austria BG EMCore Convertibles Global R T 0.740917 Allocation
2 Austria BG Trend A T 0.738376 Fixed Income
3 Austria Banken Euro Bond-Mix A 0.71161 Fixed Income
4 Austria Banken KMU-Fonds T 0.778276 Allocation
5 Brazil Banken Nachhaltigkeitsfonds T 0.912808 Allocation
6 Brazil Banken Portfolio-Mix A 0.857019 Allocation
7 Brazil Banken Portfolio-Mix T 0.868856 Fixed Income
8 Brazil Banken Sachwerte-Fonds T 0.730626 Fixed Income
9 Brazil Banken Strategie Wachstum T 0.918684 Fixed Income
I want to create a boxplot chart for each country summarized by Style and showing the distribution of R^2.
I was thinking of groupby operation but somehow I don't manage to make two charts for each country.
Thanks in advance
Here you go. The description is in the code comments.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from io import StringIO
data = StringIO("""
Country R^2 Style
Austria 0.739131 Allocation
Austria 0.740917 Allocation
Austria 0.738376 Fixed_Income
Austria 0.71161 Fixed_Income
Austria 0.778276 Allocation
Brazil 0.912808 Allocation
Brazil 0.857019 Allocation
Brazil 0.868856 New_Style
Brazil 0.730626 Fixed_Income
Brazil 0.918684 Fixed_Income
Brazil 0.618684 New_Style
""")
# load data into data frame
df = pd.read_csv(data, sep=' ')
# group data by Country
grouped_data = df.groupby('Country')
# plot one figure of boxes for each Country
for country, country_df in grouped_data:
    country_df = country_df[['Style', 'R^2']]
    columns_names = sorted(country_df['Style'].unique())
    # pivot rows into columns: one column per Style
    # (pivot needs keyword arguments in recent pandas versions)
    wide = (country_df
            .assign(g=country_df.groupby('Style').cumcount())
            .pivot(index='g', columns='Style', values='R^2'))
    # plot box
    wide.boxplot(column=columns_names)
    plt.title(country)
    plt.show()
Output:
I came up with a solution myself.
# df is the table from the original question
uniquenames = df.Country.unique()
# create dictionary of the data with countries set as keys
diction = {elem: pd.DataFrame() for elem in uniquenames}
# fill dictionary with values
for key in diction.keys():
    diction[key] = df[df.Country == key]
# plot the data
for i in diction.keys():
    diction[i].boxplot(column="R^2", by="Style",
                       figsize=(15, 6), patch_artist=True, fontsize=12)
    plt.xticks(rotation=90)
    plt.title(i, fontsize=12)
Use seaborn for this kind of task. Here are a couple of options.
Use seaborn's boxplot:
import seaborn as sns
sns.set()
# Note - the data is stored in a data frame df
sns.boxplot(x='Country', y='R^2', hue='Style', data=df)
Alternatively, you can use seaborn's FacetGrid.
g = sns.FacetGrid(df, col="Country", row="Style")
# pass R^2 as y so the boxes are drawn vertically
g.map_dataframe(sns.boxplot, y='R^2')
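For completeness, the "one chart per country" layout can also be sketched with plain matplotlib subplots, which avoids both the pivot and the seaborn dependency. The data below is typed in from the question (Fund column dropped), and the side-by-side layout with a shared y-axis is an assumption about what is wanted.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Country": ["Austria"] * 5 + ["Brazil"] * 5,
    "R^2": [0.739131, 0.740917, 0.738376, 0.71161, 0.778276,
            0.912808, 0.857019, 0.868856, 0.730626, 0.918684],
    "Style": ["Allocation", "Allocation", "Fixed Income", "Fixed Income", "Allocation",
              "Allocation", "Allocation", "Fixed Income", "Fixed Income", "Fixed Income"],
})

countries = sorted(df["Country"].unique())
# squeeze=False keeps axes two-dimensional even if there is only one country
fig, axes = plt.subplots(1, len(countries), figsize=(10, 4), sharey=True, squeeze=False)

for ax, country in zip(axes[0], countries):
    sub = df[df["Country"] == country]
    styles = sorted(sub["Style"].unique())
    # One box per Style, built from that Style's R^2 values
    ax.boxplot([sub.loc[sub["Style"] == s, "R^2"].to_numpy() for s in styles])
    ax.set_xticks(range(1, len(styles) + 1))
    ax.set_xticklabels(styles)
    ax.set_title(country)

axes[0][0].set_ylabel("R^2")
```

Calling `plt.show()` afterwards displays both panels in one window.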
I have a pandas dataframe which looks like this:
A B
1 USA Y
3 USA Y
4 USA N
5 India Y
8 India N
12 USA N
14 USA Y
19 USA Y
I want to make a countplot for this dataframe. That is, the plot will have country names on X-axis and the counts for each category on Y-axis. I know I can do this in seaborn like this:
sns.countplot(x='A', data=df, hue='B')
But this will not be an interactive plot. I want to achieve the same thing in plotly but I am having a hard time figuring it out. Can anyone please help me out?
Using plotly 3 you can do something like this:
from plotly import graph_objs as go
fig = go.Figure()
for name, group in df.groupby('B'):
    trace = go.Histogram()
    trace.name = name
    trace.x = group['A']
    fig.add_trace(trace)
You can also change other properties, such as the colors, by setting the trace.marker.color attribute.
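Each histogram trace counts the rows per country within one group of B. To check the bar heights without plotting anything, the same counts can be computed with pandas alone. A small sketch using the eight rows from the question (the variable name `counts` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["USA", "USA", "USA", "India", "India", "USA", "USA", "USA"],
        "B": ["Y", "Y", "N", "Y", "N", "N", "Y", "Y"],
    },
    index=[1, 3, 4, 5, 8, 12, 14, 19],
)

# Rows are countries, columns are the categories of B, cells are row counts
counts = pd.crosstab(df["A"], df["B"])
print(counts)
```

If you prefer explicit bars over histogram traces, each column of `counts` holds the heights for one category.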
I am using a basic python editor (Wing 101). When printing out tables (for analysis), I get rows and columns all messed up. Please see below:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import os
import matplotlib.pyplot as plt
#STEP 2: Access the Data files
data = pd.read_csv('data.csv', index_col=0)
data.sort_values(['Year', "Happiness Score"], ascending=[True, False], inplace=True)
#display first 10 rows
data.head(10)
And I get this garbage:
[evaluate World Happness Data Analysis.py]
data.head(10)
Country Region ... Dystopia Residual Year
141 Switzerland Western Europe ... 2.51738 2015
60 Iceland Western Europe ... 2.70201 2015
38 Denmark Western Europe ... 2.49204 2015
108 Norway Western Europe ... 2.46531 2015
25 Canada North America ... 2.45176 2015
46 Finland Western Europe ... 2.61955 2015
102 Netherlands Western Europe ... 2.46570 2015
140 Sweden Western Europe ... 2.37119 2015
103 New Zealand Australia and New Zealand ... 2.26425 2015
6 Australia Australia and New Zealand ... 2.26646 2015
[10 rows x 12 columns]
Actually, it looks reasonable above, but in the Python shell it is totally messed up.
Here's the actual output:
My question: How can I get the table printout with cell borders?
Change the font settings of your Python shell to a monospace font. The most common choices are Courier New and Courier.
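On the "cell borders" part of the question: pandas prints plain aligned text, so borders have to be drawn by hand or by another library. Below is a rough sketch of a pure-Python helper that does this. The function name `bordered` and the two-row sample frame are made up for the example, and the borders will still only line up in a monospace font. The two `pd.set_option` calls are real pandas display options that show all columns instead of the "..." placeholder.

```python
import pandas as pd


def bordered(df):
    """Render a DataFrame as text with +---+ style borders (hypothetical helper)."""
    # Work on strings so every cell has a known width
    cells = df.reset_index().astype(str)
    header = [str(c) for c in cells.columns]
    rows = cells.values.tolist()
    widths = [max(len(r[i]) for r in [header] + rows) for i in range(len(header))]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(values):
        # Left-align each value inside its column, padded by one space on each side
        return "|" + "|".join(f" {v:<{w}} " for v, w in zip(values, widths)) + "|"

    out = [rule, line(header), rule]
    out += [line(r) for r in rows]
    out.append(rule)
    return "\n".join(out)


# Show every column and avoid wrapping wide tables onto several lines
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 200)

# Small stand-in for the happiness data in the question
data = pd.DataFrame(
    {"Country": ["Switzerland", "Iceland"], "Year": [2015, 2015]},
    index=[141, 60],
)
print(bordered(data))
```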