Complex slicing - Python

I am trying to perform a slice with multiple conditions, without success.
Here is what my dataframe looks like:
I have many countries, whose names are stored as the index. For each of those countries I have 7 different indicators, for two distinct years.
My goal is to select all the countries (and their indicators) whose 'GDP per capita (constant 2005 US$)' is greater than or equal to a previously defined threshold (gdp_min), OR that are named 'China', 'India', or 'Brazil'.
To do so, I have tried many different things but still cannot find a way to do it.
Here is my last try, with the error:
gdp_set = final_set[final_set['Indicator Name'] == 'GDP per capita (constant 2005 US$)']['2013'] >= gdp_min | final_set.loc[['China', 'India', 'Brazil']]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_logical_op(x, y, op)
    301         #  (xint or xbool) and (yint or bool)
--> 302         result = op(x, y)
    303     except TypeError:

TypeError: ufunc 'bitwise_or' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_logical_op(x, y, op)
    315     try:
--> 316         result = libops.scalar_binop(x, y, op)
    317     except (

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_16016/3232205269.py in <module>
----> 1 gdp_set = final_set[final_set['Indicator Name'] == 'GDP per capita (constant 2005 US$)']['2013'] >= gdp_min | final_set.loc[['China', 'India', 'Brazil']]

TypeError: Cannot perform 'ror_' with a dtyped [float64] array and scalar of type [bool]
The error is very long, but from what I understand the problem comes from the second condition, which is not compatible with an OR (|).
Do you have any idea how I could do what I intend? The only thing I can think of is to create a new column containing the current index names, so that filtering might work with the OR condition.

IIUC, use:
m1 = final_set['Indicator Name'].eq('GDP per capita (constant 2005 US$)')
m2 = final_set['2013'] >= gdp_min
countries = list(final_set.index[m1 & m2])+['China', 'India', 'Brazil']
gdp_set = final_set[final_set.index.isin(countries)]
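For context, the original one-liner fails for two reasons: | binds more tightly than >=, so each comparison needs its own parentheses, and a boolean Series cannot be OR'ed with the DataFrame returned by .loc[['China', 'India', 'Brazil']]. A minimal sketch of the same selection written as a single boolean mask (assuming the column names and country index described above):
m_gdp = (final_set['Indicator Name'] == 'GDP per capita (constant 2005 US$)') & (final_set['2013'] >= gdp_min)
m_named = final_set.index.isin(['China', 'India', 'Brazil'])
# keep every row of any country that satisfies the GDP condition, plus all rows of the three named countries
gdp_set = final_set[final_set.index.isin(final_set.index[m_gdp]) | m_named]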

UPDATED:
This should do what you're asking:
gdp_set = final_set.loc[list(
    {'China', 'India', 'Brazil'} |
    set(final_set[((final_set['Indicator Name'] == 'GDP per capita (constant 2005 US$)') &
                   (final_set['2013'] >= gdp_min))].index)
)]
Explanation:
Create a set containing the union of {'China', 'India', 'Brazil'} with the set of index values (i.e., Country Name values) for rows where the Indicator Name matches the target and the value of the 2013 column is at least as large as gdp_min.
Filter final_set on the countries in this set, converted to a list, and put the resulting dataframe in gdp_set.
Full test code:
import pandas as pd
final_set = pd.DataFrame({
    'Country Name': ['Andorra']*6 + ['Argentina']*4 + ['China']*2 + ['India']*2 + ['Brazil']*2,
    'Indicator Name': [f'Indicator {i}' for i in range(1, 6)] + ['GDP per capita (constant 2005 US$)'] +
                      [f'Indicator {i}' for i in range(1, 4)] + ['GDP per capita (constant 2005 US$)'] +
                      [f'Indicator {i}' if i % 2 else 'GDP per capita (constant 2005 US$)' for i in range(1, 7)],
    '2002': [10000.0/2]*6 + [15000.0/2]*4 + [8000.0/2]*6,
    '2013': [10000.0]*6 + [15000.0]*4 + [8000.0]*6,
    'Currency Unit': ['Euro']*6 + ['Argentine peso']*4 + ['RMB']*2 + ['INR']*2 + ['Brazilian real']*2,
    'Region': ['Europe & Central Asia']*6 + ['Latin America & Caribbean']*4 + ['Asia']*2 + ['South Asia']*2 + ['Latin America & Caribbean']*2,
    'GDP per capita (constant 2005 US$)': [10000.0]*6 + [15000.0]*4 + [8000.0]*6
}).set_index('Country Name')
print(final_set)
gdp_min = 14000.0
gdp_set = final_set.loc[list(
    {'China', 'India', 'Brazil'} |
    set(final_set[((final_set['Indicator Name'] == 'GDP per capita (constant 2005 US$)') &
                   (final_set['2013'] >= gdp_min))].index)
)]
print(gdp_set)
Input:
Indicator Name 2002 2013 Currency Unit Region GDP per capita (constant 2005 US$)
Country Name
Andorra Indicator 1 5000.0 10000.0 Euro Europe & Central Asia 10000.0
Andorra Indicator 2 5000.0 10000.0 Euro Europe & Central Asia 10000.0
Andorra Indicator 3 5000.0 10000.0 Euro Europe & Central Asia 10000.0
Andorra Indicator 4 5000.0 10000.0 Euro Europe & Central Asia 10000.0
Andorra Indicator 5 5000.0 10000.0 Euro Europe & Central Asia 10000.0
Andorra GDP per capita (constant 2005 US$) 5000.0 10000.0 Euro Europe & Central Asia 10000.0
Argentina Indicator 1 7500.0 15000.0 Argentine peso Latin America & Caribbean 15000.0
Argentina Indicator 2 7500.0 15000.0 Argentine peso Latin America & Caribbean 15000.0
Argentina Indicator 3 7500.0 15000.0 Argentine peso Latin America & Caribbean 15000.0
Argentina GDP per capita (constant 2005 US$) 7500.0 15000.0 Argentine peso Latin America & Caribbean 15000.0
China Indicator 1 4000.0 8000.0 RMB Asia 8000.0
China GDP per capita (constant 2005 US$) 4000.0 8000.0 RMB Asia 8000.0
India Indicator 3 4000.0 8000.0 INR South Asia 8000.0
India GDP per capita (constant 2005 US$) 4000.0 8000.0 INR South Asia 8000.0
Brazil Indicator 5 4000.0 8000.0 Brazilian real Latin America & Caribbean 8000.0
Brazil GDP per capita (constant 2005 US$) 4000.0 8000.0 Brazilian real Latin America & Caribbean 8000.0
Output:
Indicator Name 2002 2013 Currency Unit Region GDP per capita (constant 2005 US$)
Country Name
Brazil Indicator 5 4000.0 8000.0 Brazilian real Latin America & Caribbean 8000.0
Brazil GDP per capita (constant 2005 US$) 4000.0 8000.0 Brazilian real Latin America & Caribbean 8000.0
China Indicator 1 4000.0 8000.0 RMB Asia 8000.0
China GDP per capita (constant 2005 US$) 4000.0 8000.0 RMB Asia 8000.0
India Indicator 3 4000.0 8000.0 INR South Asia 8000.0
India GDP per capita (constant 2005 US$) 4000.0 8000.0 INR South Asia 8000.0
Argentina Indicator 1 7500.0 15000.0 Argentine peso Latin America & Caribbean 15000.0
Argentina Indicator 2 7500.0 15000.0 Argentine peso Latin America & Caribbean 15000.0
Argentina Indicator 3 7500.0 15000.0 Argentine peso Latin America & Caribbean 15000.0
Argentina GDP per capita (constant 2005 US$) 7500.0 15000.0 Argentine peso Latin America & Caribbean 15000.0

How about using a query?
# min GDP (I used an example number)
gdp_min = 3000.0
# Country name set.
countries = {"China", "India", "Brazil"}
# Create string expression to evaluate on DataFrame.
# Note: Backticks should be used for non-standard pandas field names
# (including names that begin with a numerical value).
expression = f"(`Indicator Name` == 'GDP per capita (constant 2005 US$)' & `2013` >= {gdp_min})"
# Add each country name as 'or' clause for second part of expression.
expression += " or (" + " or ".join([f"`Country Name` == '{n}'" for n in countries]) + ")"
# Collect resulting DataFrame to new variable.
gdp_set = final_set.query(expression)
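A slightly shorter variant of the same idea (a sketch, assuming `Country Name` resolves in the query exactly as in the answer above) is to let query reference local variables with @ instead of interpolating them into the string:
countries = ['China', 'India', 'Brazil']
gdp_min = 3000.0
gdp_set = final_set.query(
    "(`Indicator Name` == 'GDP per capita (constant 2005 US$)' and `2013` >= @gdp_min)"
    " or `Country Name` in @countries"
)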

Related

How to search for a string pattern and print the total values in a column using pandas

Market Region No_of_Orders Profit Sales
Africa Western Africa 251 -12,901.51 78,476.06
Afr3ica Southern Africa 85 11,768.58 51,319.50
Africa North Africa 182 21,643.08 86,698.89
Afr2ica Eastern Africa 110 8,013.04 44,182.60
Africa Central Africa 103 15,606.30 61,689.99
Eur1ope Western Europe 964 82,091.27 656,637.14
EurYope Southern Europe 338 18,911.49 215,703.93
How can I filter and count the Market values which have typos? I tried this:
result['Market'].str.contains(u'\|123\|').count()
masked = df['Market'].str.contains("1|2|3")
to filter:
print(df[masked])
to count:
print(masked.sum())
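A minimal runnable sketch on the Market column of the sample above (the other columns are not needed for the mask):
import pandas as pd

df = pd.DataFrame({'Market': ['Africa', 'Afr3ica', 'Africa', 'Afr2ica', 'Africa', 'Eur1ope', 'EurYope']})
masked = df['Market'].str.contains('1|2|3')  # True where the name contains the digit 1, 2 or 3
print(df[masked])    # rows matched by the pattern: Afr3ica, Afr2ica, Eur1ope
print(masked.sum())  # 3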

I want to calculate the max of the GDP column in a dataframe using pandas

I have this dataframe:
Unnamed: 0 COUNTRY GDP (BILLIONS) CODE
0 0 Afghanistan 21.71 AFG
1 1 Albania 13.40 ALB
2 2 Algeria 227.80 DZA
3 3 American Samoa 0.75 ASM
4 4 Andorra 4.80 AND
... ... ... ... ...
217 217 Virgin Islands 5.08 VGB
218 218 West Bank 6.64 WBG
219 219 Yemen 45.45 YEM
220 220 Zambia 25.61 ZMB
221 221 Zimbabwe 13.74 ZWE
I would like to know how I can output the max and min GDP from this dataframe.
I tried
df.loc[df['GDP(BILLIONS)'].idxmax()]
but got an error message
Thank you in advance
Using idxmax:
Return index of first occurrence of maximum over requested axis. NA/null values are excluded.
Series.idxmax:
Return the row label of the maximum value.
You can use idxmax if you want to return the corresponding row of the max value, and then the max value itself. Note that the column name in your dataframe contains a space, 'GDP (BILLIONS)':
row_of_max_index = df.loc[df['GDP (BILLIONS)'].idxmax()]  # series for the row of the max value
print(row_of_max_index)     # 2  Algeria  227.80  DZA
print(row_of_max_index[2])  # position of the GDP column, to get 227.8
The same thing for idxmin:
row_of_min_index = df.loc[df['GDP (BILLIONS)'].idxmin()]
You can use
max_val = df['GDP (BILLIONS)'].max()
for the maximum value and
min_val = df['GDP (BILLIONS)'].min()
for the minimum value.
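A minimal runnable sketch on a few rows of the table above; note that the column name contains a space, 'GDP (BILLIONS)', so 'GDP(BILLIONS)' raises a KeyError:
import pandas as pd

df = pd.DataFrame({
    'COUNTRY': ['Afghanistan', 'Albania', 'Algeria'],
    'GDP (BILLIONS)': [21.71, 13.40, 227.80],
    'CODE': ['AFG', 'ALB', 'DZA'],
})

print(df.loc[df['GDP (BILLIONS)'].idxmax()])  # full row for Algeria
print(df['GDP (BILLIONS)'].max())             # 227.8
print(df['GDP (BILLIONS)'].min())             # 13.4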

Correlation between two Pandas dataframe columns: why does it not work? [duplicate]

This question already has answers here:
python - cannot make corr work
(2 answers)
Closed 5 years ago.
I ran into a problem calculating the cross-correlation. For this assignment we are supposed to use the pandas .corr method.
I searched around but could not find a suitable solution.
Below is the code. Top15 is a pandas DataFrame.
Top15 = answer_one()
# for testing purposes - works fine :-(
df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})
print(df['A'].corr(df['B']))
Top15['Population']=Top15['Energy Supply']/Top15['Energy Supply per capita']
Top15['Citable docs per Capita']=Top15['Citable documents']/Top15['Population']
# check my data
print(Top15['Energy Supply per capita'])
print(Top15['Citable docs per Capita'])
correlation=Top15['Citable docs per Capita'].corr(Top15['Energy Supply per capita'])
print(correlation)
return correlation
After all this, it should work. But no, it does not :-(
This is the output I get (the 1.0 is from the test with df['A'] etc.):
1.0
Country
China 93
United States 286
Japan 149
United Kingdom 124
Russian Federation 214
Canada 296
Germany 165
India 26
France 166
South Korea 221
Italy 109
Spain 106
Iran 119
Australia 231
Brazil 59
Name: Energy Supply per capita, dtype: object
Country
China 9.269e-05
United States 0.000298307
Japan 0.000237714
United Kingdom 0.000318721
Russian Federation 0.000127533
Canada 0.000500002
Germany 0.00020942
India 1.16242e-05
France 0.00020322
South Korea 0.000239392
Italy 0.000180175
Spain 0.00020089
Iran 0.00011442
Australia 0.000374206
Brazil 4.17453e-05
Name: Citable docs per Capita, dtype: object
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-124-942c0cf8a688> in <module>()
22 return correlation
23
---> 24 answer_nine()
<ipython-input-124-942c0cf8a688> in answer_nine()
15 Top15['Citable docs per Capita']=np.float64(Top15['Citable docs per Capita'])
16
---> 17 correlation=Top15['Citable docs per Capita'].corr(Top15['Energy Supply per capita'])
18
19
/opt/conda/lib/python3.5/site-packages/pandas/core/series.py in corr(self, other, method, min_periods)
1392 return np.nan
1393 return nanops.nancorr(this.values, other.values, method=method,
-> 1394 min_periods=min_periods)
1395
1396 def cov(self, other, min_periods=None):
/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in _f(*args, **kwargs)
42 f.__name__.replace('nan', '')))
43 try:
---> 44 return f(*args, **kwargs)
45 except ValueError as e:
46 # we want to transform an object array
/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in nancorr(a, b, method, min_periods)
676
677 f = get_corr_func(method)
--> 678 return f(a, b)
679
680
/opt/conda/lib/python3.5/site-packages/pandas/core/nanops.py in _pearson(a, b)
684
685 def _pearson(a, b):
--> 686 return np.corrcoef(a, b)[0, 1]
687
688 def _kendall(a, b):
/opt/conda/lib/python3.5/site-packages/numpy/lib/function_base.py in corrcoef(x, y, rowvar, bias, ddof)
2149 # nan if incorrect value (nan, inf, 0), 1 otherwise
2150 return c / c
-> 2151 return c / sqrt(multiply.outer(d, d))
2152
2153
AttributeError: 'float' object has no attribute 'sqrt'
I am sorry, but by now I have no clue what goes wrong and why it doesn't work.
Could anyone point me to the solution?
Thanks.
edit:
the basic dataframe looks like this (first few lines + header):
Rank Documents Citable documents Citations Self-citations Citations per document H index 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Energy Supply Energy Supply per capita % Renewable
Country
China 1 127050 126767 597237 411683 4.70 138 3.992331e+12 4.559041e+12 4.997775e+12 5.459247e+12 6.039659e+12 6.612490e+12 7.124978e+12 7.672448e+12 8.230121e+12 8.797999e+12 1.271910e+11 93 19.754910
United States 2 96661 94747 792274 265436 8.20 230 1.479230e+13 1.505540e+13 1.501149e+13 1.459484e+13 1.496437e+13 1.520402e+13 1.554216e+13 1.577367e+13 1.615662e+13 1.654857e+13 9.083800e+10 286 11.570980
Japan 3 30504 30287 223024 61554 7.31 134 5.496542e+12 5.617036e+12 5.558527e+12 5.251308e+12 5.498718e+12 5.473738e+12 5.569102e+12 5.644659e+12 5.642884e+12 5.669563e+12 1.898400e+10 149 10.232820
This did it:
correlation = Top15['Citable docs per Capita']\
    .astype('float64').corr(Top15['Energy Supply per capita']\
    .astype('float64'))
Thanks @Shpionus for pointing out the other post.
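A variation on the same fix (a sketch, not part of the accepted code): both columns have object dtype, which is what makes np.corrcoef fail, so converting them with pd.to_numeric also works and turns any non-numeric cells into NaN, which .corr then skips:
Top15['Energy Supply per capita'] = pd.to_numeric(Top15['Energy Supply per capita'], errors='coerce')
Top15['Citable docs per Capita'] = pd.to_numeric(Top15['Citable docs per Capita'], errors='coerce')
correlation = Top15['Citable docs per Capita'].corr(Top15['Energy Supply per capita'])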

Importing Excel into Panda Dataframe

The following is only the beginning of a Coursera assignment on Data Science. I hope this is not too trivial, but I am lost on this and could not find an answer.
I am asked to import an Excel file into a pandas dataframe and to manipulate it afterwards. The file can be found here: http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls
What makes it difficult for me is
a) there is an 'overhead' of 17 lines and a footer
b) the first two columns are empty
c) the index column has no header name
After hours of searching and reading I came up with this useless line:
energy=pd.read_excel('Energy Indicators.xls',
sheetname='Energy',
header=16,
skiprows=[17],
skipfooter=38,
skipcolumns=2
)
This seems to produce a multi-index dataframe, and the command energy.head() returns nothing.
I have two questions:
What did I do wrong? Up to this exercise I thought I understood dataframes, but now I am totally clueless and lost :-((
How do I have to tackle this? What do I have to do to get this Excel data into a dataframe with the index consisting of the countries?
Thanks.
I think you need to add these parameters:
index_col - to convert a column to the index
usecols - to parse columns by position
and change the header position to 15:
energy=pd.read_excel('Energy Indicators.xls',
sheet_name='Energy',
skiprows=[17],
skipfooter=38,
header=15,
index_col=[0],
usecols=[2,3,4,5]
)
print (energy.head())
Energy Supply Energy Supply per capita \
Afghanistan 321 10
Albania 102 35
Algeria 1959 51
American Samoa ... ...
Andorra 9 121
Renewable Electricity Production
Afghanistan 78.669280
Albania 100.000000
Algeria 0.551010
American Samoa 0.641026
Andorra 88.695650
I installed xlrd package, with pip install xlrd and then loaded the file successfully as follows:
In [17]: df = pd.read_excel(r"http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls",
...: sheetname='Energy',
...: header=16,
...: skiprows=[17],
...: skipfooter=38,
...: skipcolumns=2)
In [18]: df.shape
Out[18]: (227, 3)
In [19]: df.head()
Out[19]:
Energy Supply Energy Supply per capita \
NaN Afghanistan Afghanistan 321 10
Albania Albania 102 35
Algeria Algeria 1959 51
American Samoa American Samoa ... ...
Andorra Andorra 9 121
Renewable Electricity Production
NaN Afghanistan Afghanistan 78.669280
Albania Albania 100.000000
Algeria Algeria 0.551010
American Samoa American Samoa 0.641026
Andorra Andorra 88.695650
In [20]: pd.__version__
Out[20]: u'0.20.3'
In [21]: df.columns
Out[21]:
Index([u'Energy Supply', u'Energy Supply per capita',
u'Renewable Electricity Production'],
dtype='object')
Notice that I am using the latest version of pandas, 0.20.3; make sure you have the latest version on your system.
I modified your code and was able to get the data into the dataframe. Instead of skipcolumns (which did not work), I used the argument usecols as follows:
energy=pd.read_excel('Energy_Indicators.xls',
sheetname='Energy',
header=16,
skiprows=[16],
skipfooter=38,
usecols=[2,3,4,5]
)
Unnamed: 2 Petajoules Gigajoules %
0 Afghanistan 321 10 78.669280
1 Albania 102 35 100.000000
2 Algeria 1959 51 0.551010
3 American Samoa ... ... 0.641026
4 Andorra 9 121 88.695650
In order to make the countries as the index, you can do the following
# Rename the column Unnamed: 2 to Country
energy = energy.rename(columns={'Unnamed: 2':'Country'})
# Change the index to country column
energy.index = energy['Country']
# Drop the extra country column
energy = energy.drop('Country', axis=1)
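Equivalently, the three steps can be chained in one expression (a small sketch assuming the same 'Unnamed: 2' column name shown above):
energy = (energy
          .rename(columns={'Unnamed: 2': 'Country'})
          .set_index('Country'))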

Python Pandas Groupby and Aggregation

Hi, I am trying to aggregate some data in a dataframe using agg, but my initial statement raised a warning: "FutureWarning: using a dict on a Series for aggregation is deprecated and will be removed in a future version". I rewrote it based on the pandas documentation, but instead of getting the right column labels I am getting function labels, for example "<function size at 0x...>". How can I correct the output so that the labels match the deprecated output below, with column names std, mean, size, sum?
Deprecated Syntax Command:
Top15.set_index('Continent').groupby(level=0)['Pop Est']\
.agg({'size': np.size, 'sum': np.sum, 'mean': np.mean, 'std': np.std})
Deprecated Syntax Output:
std mean size sum
Continent
Asia 6.790979e+08 5.797333e+08 5.0 2.898666e+09
Australia NaN 2.331602e+07 1.0 2.331602e+07
Europe 3.464767e+07 7.632161e+07 6.0 4.579297e+08
North America 1.996696e+08 1.764276e+08 2.0 3.528552e+08
South America NaN 2.059153e+08 1.0 2.059153e+08
New Syntax Command:
Top15.set_index('Continent').groupby(level=0)['Pop Est']\
.agg(['size', 'sum', 'mean', 'std'])\
.rename(columns={'size': np.size, 'sum': np.sum, 'mean': np.mean, 'std': np.std})
New Syntax Output:
<function size at 0x0000000002DE9950> <function sum at 0x0000000002DE90D0> <function mean at 0x0000000002DE9AE8> <function std at 0x0000000002DE9B70>
Continent
Asia 5 2.898666e+09 5.797333e+08 6.790979e+08
Australia 1 2.331602e+07 2.331602e+07 NaN
Europe 6 4.579297e+08 7.632161e+07 3.464767e+07
North America 2 3.528552e+08 1.764276e+08 1.996696e+08
South America 1 2.059153e+08 2.059153e+08 NaN
Dataframe:
Rank Documents Citable documents Citations Self-citations Citations per document H index Energy Supply Energy Supply per Capita % Renewable 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Pop Est Continent
Country
China 1 127050 126767 597237 411683 4.70 138 1.271910e+11 93.0 19.754910 3.992331e+12 4.559041e+12 4.997775e+12 5.459247e+12 6.039659e+12 6.612490e+12 7.124978e+12 7.672448e+12 8.230121e+12 8.797999e+12 1.367645e+09 Asia
United States 2 96661 94747 792274 265436 8.20 230 9.083800e+10 286.0 11.570980 1.479230e+13 1.505540e+13 1.501149e+13 1.459484e+13 1.496437e+13 1.520402e+13 1.554216e+13 1.577367e+13 1.615662e+13 1.654857e+13 3.176154e+08 North America
Japan 3 30504 30287 223024 61554 7.31 134 1.898400e+10 149.0 10.232820 5.496542e+12 5.617036e+12 5.558527e+12 5.251308e+12 5.498718e+12 5.473738e+12 5.569102e+12 5.644659e+12 5.642884e+12 5.669563e+12 1.274094e+08 Asia
United Kingdom 4 20944 20357 206091 37874 9.84 139 7.920000e+09 124.0 10.600470 2.419631e+12 2.482203e+12 2.470614e+12 2.367048e+12 2.403504e+12 2.450911e+12 2.479809e+12 2.533370e+12 2.605643e+12 2.666333e+12 6.387097e+07 Europe
Russian Federation 5 18534 18301 34266 12422 1.85 57 3.070900e+10 214.0 17.288680 1.385793e+12 1.504071e+12 1.583004e+12 1.459199e+12 1.524917e+12 1.589943e+12 1.645876e+12 1.666934e+12 1.678709e+12 1.616149e+12 1.435000e+08 Europe
Canada 6 17899 17620 215003 40930 12.01 149 1.043100e+10 296.0 61.945430 1.564469e+12 1.596740e+12 1.612713e+12 1.565145e+12 1.613406e+12 1.664087e+12 1.693133e+12 1.730688e+12 1.773486e+12 1.792609e+12 3.523986e+07 North America
Germany 7 17027 16831 140566 27426 8.26 126 1.326100e+10 165.0 17.901530 3.332891e+12 3.441561e+12 3.478809e+12 3.283340e+12 3.417298e+12 3.542371e+12 3.556724e+12 3.567317e+12 3.624386e+12 3.685556e+12 8.036970e+07 Europe
India 8 15005 14841 128763 37209 8.58 115 3.319500e+10 26.0 14.969080 1.265894e+12 1.374865e+12 1.428361e+12 1.549483e+12 1.708459e+12 1.821872e+12 1.924235e+12 2.051982e+12 2.200617e+12 2.367206e+12 1.276731e+09 Asia
France 9 13153 12973 130632 28601 9.93 114 1.059700e+10 166.0 17.020280 2.607840e+12 2.669424e+12 2.674637e+12 2.595967e+12 2.646995e+12 2.702032e+12 2.706968e+12 2.722567e+12 2.729632e+12 2.761185e+12 6.383735e+07 Europe
South Korea 10 11983 11923 114675 22595 9.57 104 1.100700e+10 221.0 2.279353 9.410199e+11 9.924316e+11 1.020510e+12 1.027730e+12 1.094499e+12 1.134796e+12 1.160809e+12 1.194429e+12 1.234340e+12 1.266580e+12 4.980543e+07 Asia
Italy 11 10964 10794 111850 26661 10.20 106 6.530000e+09 109.0 33.667230 2.202170e+12 2.234627e+12 2.211154e+12 2.089938e+12 2.125185e+12 2.137439e+12 2.077184e+12 2.040871e+12 2.033868e+12 2.049316e+12 5.990826e+07 Europe
Spain 12 9428 9330 123336 23964 13.08 115 4.923000e+09 106.0 37.968590 1.414823e+12 1.468146e+12 1.484530e+12 1.431475e+12 1.431673e+12 1.417355e+12 1.380216e+12 1.357139e+12 1.375605e+12 1.419821e+12 4.644340e+07 Europe
Iran 13 8896 8819 57470 19125 6.46 72 9.172000e+09 119.0 5.707721 3.895523e+11 4.250646e+11 4.289909e+11 4.389208e+11 4.677902e+11 4.853309e+11 4.532569e+11 4.445926e+11 4.639027e+11 NaN 7.707563e+07 Asia
Australia 14 8831 8725 90765 15606 10.28 107 5.386000e+09 231.0 11.810810 1.021939e+12 1.060340e+12 1.099644e+12 1.119654e+12 1.142251e+12 1.169431e+12 1.211913e+12 1.241484e+12 1.272520e+12 1.301251e+12 2.331602e+07 Australia
Brazil 15 8668 8596 60702 14396 7.00 86 1.214900e+10 59.0 69.648030 1.845080e+12 1.957118e+12 2.056809e+12 2.054215e+12 2.208872e+12 2.295245e+12 2.339209e+12 2.409740e+12 2.412231e+12 2.319423e+12 2.059153e+08 South America
Try using just this:
Top15.set_index('Continent').groupby(level=0)['Pop Est'].agg(['size', 'sum', 'mean', 'std'])
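If the goal is simply explicit control over the output column names, a hedged alternative for pandas >= 0.25 is named aggregation, which maps each output column name to an aggregation function:
Top15.set_index('Continent').groupby(level=0)['Pop Est'].agg(size='size', sum='sum', mean='mean', std='std')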
