Groupby or Transpose? - python

I have this data:
country report_date market_cap_usd
0 Australia 6/3/2020 90758154576
1 Australia 6/4/2020 91897977251
2 Australia 6/5/2020 94558861975
3 Canada 6/3/2020 42899754234
4 Canada 6/4/2020 43597908706
5 Canada 6/5/2020 45287016456
6 United States of America 6/3/2020 1.16679E+12
7 United States of America 6/4/2020 1.15709E+12
8 United States of America 6/5/2020 1.19652E+12
and want to turn it into:
report_date Australia Canada ....
6/3/2020 90758154576 42899754234 ...
6/4/2020 91897977251 43597908706 ...
How can I do this?

Use pivot_table:
# set up a minimal example
import pandas
data = pandas.DataFrame({'country': ['Australia', 'Australia', 'Canada', 'Canada'],
                         'report_date': ['6/3/2020', '6/4/2020', '6/3/2020', '6/4/2020'],
                         'market_cap_usd': [923740927, 92797294, 20387334, 392738092]
                         })
# pivot the table: one row per date, one column per country
data = data.pivot_table(index='report_date', columns='country')
# drop the outer level of the multi-index columns, keeping only the country names
data.columns = [col[1] for col in data.columns]
Output:
             Australia     Canada
report_date
6/3/2020     923740927   20387334
6/4/2020      92797294  392738092
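If each (report_date, country) pair occurs only once, no aggregation is needed, and passing values= avoids the multi-index columns altogether. The following is a minimal sketch under that assumption, using the column names from the question; the variable names df and wide are illustrative only.

```python
import pandas as pd

# Small sample in the same long format as the question
df = pd.DataFrame({
    'country': ['Australia', 'Australia', 'Canada', 'Canada'],
    'report_date': ['6/3/2020', '6/4/2020', '6/3/2020', '6/4/2020'],
    'market_cap_usd': [90758154576, 91897977251, 42899754234, 43597908706],
})

# One row per date, one column per country.
# pivot raises an error if a (report_date, country) pair is duplicated.
wide = df.pivot(index='report_date', columns='country', values='market_cap_usd')

# Turn report_date back into an ordinary column and drop the 'country' axis label
wide = wide.reset_index()
wide.columns.name = None
print(wide)
```

If duplicates are possible, pivot_table with an explicit aggfunc is the safer choice, since pivot refuses to guess how to combine them.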

Related

How to maintain the same index after sorting a Pandas series?

I have the following Pandas series from the dataframe 'Reducedset':
Reducedset = Top15.iloc[:,10:20].mean(axis=1).sort_values(ascending=False)
Which gives me:
Country
United States 1.536434e+13
China 6.348609e+12
Japan 5.542208e+12
Germany 3.493025e+12
France 2.681725e+12
United Kingdom 2.487907e+12
Brazil 2.189794e+12
Italy 2.120175e+12
India 1.769297e+12
Canada 1.660647e+12
Russian Federation 1.565459e+12
Spain 1.418078e+12
Australia 1.164043e+12
South Korea 1.106715e+12
Iran 4.441558e+11
dtype: float64
I want to update the index, so that index of the dataframe Reducedset is in the same order as the series above.
How can I do this?
In other words, when I then look at the entire dataframe, the index order should be the same as in the series above and not like that below:
Reducedset
Rank Documents Citable documents Citations \
Country
China 1 127050 126767 597237
United States 2 96661 94747 792274
Japan 3 30504 30287 223024
United Kingdom 4 20944 20357 206091
Russian Federation 5 18534 18301 34266
Canada 6 17899 17620 215003
Germany 7 17027 16831 140566
India 8 15005 14841 128763
France 9 13153 12973 130632
South Korea 10 11983 11923 114675
Italy 11 10964 10794 111850
Spain 12 9428 9330 123336
Iran 13 8896 8819 57470
Australia 14 8831 8725 90765
Brazil 15 8668 8596 60702
The answer:
Reducedset = Top15.iloc[:,10:20].mean(axis=1).sort_values(ascending=False)
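This first step takes the mean of the columns at positions 10 through 19 for each row (axis=1) and sorts the result in descending order (ascending=False).
Top15 = Top15.reindex(Reducedset.index)
Here we reorder the rows of the dataframe Top15 so that they follow the index of the sorted series Reducedset. Calling Reducedset.reindex(Reducedset.index) would change nothing, because it reorders the series by its own index. Note that reindex returns a new object, so the result has to be assigned.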
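To make the idea concrete, here is a small self-contained sketch of reordering a dataframe by the index of a sorted series. The frame top15 and its columns gdp_a and gdp_b are hypothetical stand-ins for the asker's data, not the real Top15.

```python
import pandas as pd

# Hypothetical stand-in for Top15: a rank plus two numeric columns, indexed by country
top15 = pd.DataFrame(
    {'Rank': [1, 2, 3], 'gdp_a': [6.0, 15.0, 5.0], 'gdp_b': [7.0, 16.0, 6.0]},
    index=pd.Index(['China', 'United States', 'Japan'], name='Country'),
)

# Row-wise mean of the two numeric columns, largest first
avg = top15[['gdp_a', 'gdp_b']].mean(axis=1).sort_values(ascending=False)

# Reorder the dataframe rows to follow the sorted series
reordered = top15.reindex(avg.index)
print(reordered)
```

top15.loc[avg.index] gives the same ordering here; reindex differs only in that it inserts NaN rows for labels missing from the dataframe instead of raising.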

How do I get the name of the highest value in a group in Pandas?

I have the following dataframe:
Country Continent Population
--- ------- ------------- ------------
0 United States North America 329,451,665
1 Canada North America 37,602,103
2 Brazil South America 210,147,125
3 Argentina South America 43,847,430
I want to group by the continent, and get the name of the country with the highest population in that continent, so basically I want my result to look as follows:
Continent Country
---------- -------------
North America United States
South America Brazil
How can I do this?
Use idxmax to get the index label of the row with the largest population in each group:
df['Population'] = pd.to_numeric(df['Population'].str.replace(',', ''))
idx = df.groupby('Continent')['Population'].idxmax()
df.loc[idx]
Result:
Country Continent Population
0 United States North America 329451665
2 Brazil South America 210147125
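The result above still carries the Population column and the original row labels. A possible variant that returns only the two columns asked for, sketched with the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', 'Canada', 'Brazil', 'Argentina'],
    'Continent': ['North America', 'North America', 'South America', 'South America'],
    'Population': ['329,451,665', '37,602,103', '210,147,125', '43,847,430'],
})

# Strip the thousands separators so the column can be compared numerically
df['Population'] = pd.to_numeric(df['Population'].str.replace(',', '', regex=False))

# Row label of the most populous country in each continent
idx = df.groupby('Continent')['Population'].idxmax()

# Keep only the two requested columns and renumber the rows
result = df.loc[idx, ['Continent', 'Country']].reset_index(drop=True)
print(result)
```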

How to merge multiple rows into one row and name it in Pandas?

I have a dataframe:
age sex country
25 m USA
30 f Canada
65 f china
42 m Indonesia
32 f mexico
I want to convert the country to 2 categories and then I want to generate 2 columns of dummy variables:
North America=(USA, Canada, Mexico).
Asia= (China, Indonesia)
You can make a single column named continent and get your result:
import pandas as pd
df = pd.DataFrame(data={'age': [25, 23, 26], 'sex': ['m', 'f', 'f'],
                        'country': ['mexico', 'china', 'usa']})
north_america = ['usa', 'mexico', 'canada']
asia = ['china', 'indonesia']
def change(country):
    # map a lower-case country name to its continent
    if country in north_america:
        return "North America"
    elif country in asia:
        return "Asia"
df['continent'] = df['country'].apply(change)
df
Output
age sex country continent
0 25 m mexico North America
1 23 f china Asia
2 26 f usa North America
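The question also asks for two dummy-variable columns, and its sample data mixes upper and lower case. One way to cover both, sketched here with a lookup dictionary (continent_of is an illustrative name) and pd.get_dummies:

```python
import pandas as pd

df = pd.DataFrame({
    'age': [25, 30, 65, 42, 32],
    'sex': ['m', 'f', 'f', 'm', 'f'],
    'country': ['USA', 'Canada', 'china', 'Indonesia', 'mexico'],
})

# Lookup table keyed on lower-cased country names
continent_of = {
    'usa': 'North America', 'canada': 'North America', 'mexico': 'North America',
    'china': 'Asia', 'indonesia': 'Asia',
}

# Normalise case before the lookup; countries missing from the table become NaN
df['continent'] = df['country'].str.lower().map(continent_of)

# One 0/1 indicator column per continent, appended to the original frame
dummies = pd.get_dummies(df['continent'], dtype=int)
df = pd.concat([df, dummies], axis=1)
print(df)
```

A dictionary keeps the mapping in one place, which tends to scale better than a chain of if/elif branches as more continents are added.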

How to loop over values and pass them to a function using Python?

I want to pass values to a function one by one in a loop using Python. The values are stored in a dataframe.
def eam(A, B):
    y = A + " " + B
    return y
Suppose I pass the values of country as A and capital as B.
Dataframe df is
country capital
India New Delhi
Indonesia Jakarta
Islamic Republic of Iran Tehran
Iraq Baghdad
Ireland Dublin
How can I get the following values using a loop?
0 India New Delhi
1 Indonesia Jakarta
2 Islamic Republic of Iran Tehran
3 Iraq Baghdad
4 Ireland Dublin
Here you go: use the following syntax to get a new column in the dataframe. There is no need to write code that loops over the rows. However, if you must loop, df.iterrows() and df.itertuples() both let you walk through the rows one at a time.
>>> df = pd.read_clipboard(sep='\t')
>>> df.head()
country capital
0 India New Delhi
1 Indonesia Jakarta
2 Islamic Republic of Iran Tehran
3 Iraq Baghdad
4 Ireland Dublin
>>> df.columns
Index(['country', 'capital'], dtype='object')
>>> df['both'] = df['country'] + " " + df['capital']
>>> df.head()
country capital both
0 India New Delhi India New Delhi
1 Indonesia Jakarta Indonesia Jakarta
2 Islamic Republic of Iran Tehran Islamic Republic of Iran Tehran
3 Iraq Baghdad Iraq Baghdad
4 Ireland Dublin Ireland Dublin
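For cases where an explicit loop really is needed, for example because the function cannot be vectorised, a minimal sketch with itertuples and the eam function from the question could look like this. The list name results is illustrative, and only three of the rows are reproduced.

```python
import pandas as pd

def eam(A, B):
    # Join the two strings with a single space, as in the question
    return A + " " + B

df = pd.DataFrame({
    'country': ['India', 'Indonesia', 'Ireland'],
    'capital': ['New Delhi', 'Jakarta', 'Dublin'],
})

# itertuples yields one named tuple per row; fields are accessed by column name
results = []
for row in df.itertuples(index=False):
    results.append(eam(row.country, row.capital))

df['both'] = results
print(df)
```

itertuples is generally faster than iterrows, but the vectorised column addition shown above remains the simplest option when the function is plain string concatenation.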

How to update/create column in pandas based on values in a list

So, here is my dataframe
import pandas as pd
cols = ['Name','Country','Income']
vals = [['Steve','USA',40000],['Matt','UK',40000],['John','USA',40000],['Martin','France',40000],]
x = pd.DataFrame(vals,columns=cols)
I have another list:
europe = ['UK','France']
I want to create a new column 'Continent' if x.Country is in europe
You need numpy.where with a condition built from isin:
import numpy as np
x['Continent'] = np.where(x['Country'].isin(europe), 'Europe', 'Not Europe')
print(x)
Name Country Income Continent
0 Steve USA 40000 Not Europe
1 Matt UK 40000 Europe
2 John USA 40000 Not Europe
3 Martin France 40000 Europe
Or you can use isin directly with .loc:
x['New Column'] = 'Not Europe'
x.loc[x.Country.isin(europe), 'New Column'] = 'Europe'
print(x)
Name Country Income New Column
0 Steve USA 40000 Not Europe
1 Matt UK 40000 Europe
2 John USA 40000 Not Europe
3 Martin France 40000 Europe
