My current dataframe (a Year column followed by one column per city, CityA, CityB, ...):
Year Abilene, TX Akron, OH Albany, GA Albany, OR
0 2012 141.997500 92.033333 105.662500 116.250833
1 2013 150.175000 95.971667 109.942500 125.361667
2 2014 157.588333 98.930833 109.628333 132.511667
3 2015 161.584167 102.416667 109.717500 142.058333
4 2016 168.106667 107.449167 110.175833 157.204167
I want to reshape it, preferably in place, like this:
Year City Value
2012 Abilene, TX somevalue
2013 Abilene, TX somevalue
... and so on for every city.
How can I do this efficiently?
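I figured it out. melt returns a new DataFrame rather than reshaping in place, so assign the result:
df = pd.melt(df, id_vars="Year", value_vars=df.columns[1:], var_name="City", value_name="Value")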
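A self-contained sketch using the first two years and two cities from the question. var_name and value_name set the City and Value column names; when value_vars is omitted, every non-id column is melted, which here is the same as df.columns[1:].

```python
import pandas as pd

# Wide table: one column per city (values taken from the question)
df = pd.DataFrame({
    "Year": [2012, 2013],
    "Abilene, TX": [141.9975, 150.175],
    "Akron, OH": [92.033333, 95.971667],
})

# melt is not in-place: it returns a new long-format DataFrame,
# stacking the city columns one after another
long_df = df.melt(id_vars="Year", var_name="City", value_name="Value")
print(long_df)
```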
I'm using the pandas where function to find each row's percentage of its state's total:
filter1 = df['state']=='California'
filter2 = df['state']=='Texas'
filter3 = df['state']=='Florida'
df['percentage']= df['total'].where(filter1)/df['total'].where(filter1).sum()
The output is
Year state total percentage
2014 California 914198.0 0.134925
2014 Florida 766441.0 NaN
2014 Texas 1045274.0 NaN
2015 California 874642.0 0.129087
2015 Florida 878760.0 NaN
How do I apply the other two filters as well?
Don't use where; use groupby.transform instead, which computes every state's share in one step:
df['percentage'] = df['total'].div(df.groupby('state')['total'].transform('sum'))
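A runnable version of the groupby.transform answer, using the rows shown in the question's output. transform('sum') returns each row's state total aligned to the original index, so the division is element-wise. The "percentage" is a fraction of the state's total across all years; Texas gets 1.0 only because the sample contains a single Texas row.

```python
import pandas as pd

# Rows from the question's output
df = pd.DataFrame({
    "Year": [2014, 2014, 2014, 2015, 2015],
    "state": ["California", "Florida", "Texas", "California", "Florida"],
    "total": [914198.0, 766441.0, 1045274.0, 874642.0, 878760.0],
})

# Each row is divided by the sum of 'total' over all rows of the same state
df["percentage"] = df["total"].div(df.groupby("state")["total"].transform("sum"))
print(df)
```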
Output:
Year state total percentage
0 2014 California 914198.0 0.511056
1 2014 Florida 766441.0 0.465865
2 2014 Texas 1045274.0 1.000000
3 2015 California 874642.0 0.488944
4 2015 Florida 878760.0 0.534135
To apply several filters together, combine them with | (or), as in df.loc[filter1 | filter2 | filter3]. Combining them with & (and) selects nothing here, because a row cannot be in two states at once.
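A small check of the point above on hypothetical rows: with one state per row, & of the three equality filters is always False, while | keeps every matching row. Series.isin is an equivalent, shorter way to write the | version.

```python
import pandas as pd

# Hypothetical rows, only for illustration
df = pd.DataFrame({
    "state": ["California", "Texas", "Florida", "Nevada"],
    "total": [10.0, 20.0, 30.0, 40.0],
})
filter1 = df["state"] == "California"
filter2 = df["state"] == "Texas"
filter3 = df["state"] == "Florida"

both = df.loc[filter1 & filter2 & filter3]    # empty: no row matches all three
any_of = df.loc[filter1 | filter2 | filter3]  # California, Texas and Florida rows
same = df.loc[df["state"].isin(["California", "Texas", "Florida"])]
print(len(both), len(any_of))
```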
I have a dataframe that currently looks like this:
Year Country Subject Descriptor GDP
0 2015 Austria r 344.2
1 2015 Austria n 344.2
2 2015 Austria d 100
3 2015 Austria u 5.742
4 2015 Belgium r 416.7
5 2015 Belgium n 416.7
6 2015 Belgium d 100
7 2015 Belgium u 8.483
I want to transform it to look something along these lines:
Year Country GDP_R GDP_N GDP_D GDP_U
2015 Austria 344.2 344.2 100 5.742
2015 Belgium 416.7 416.7 100 8.483
So far I have tried melt and stack, but I feel like I'm missing something. Any help would be much appreciated.
Thank you!
You can first use groupby.agg() to collect all GDP values of each group in a list. Then convert those lists to a new DataFrame whose columns are the prefix 'GDP_' plus each unique value of the Subject Descriptor column, in order of appearance (a set would not keep that order and would mislabel the columns). This assumes every Year/Country group lists its descriptors in the same order.
Finally, combining the two with pd.concat() gives the final output.
For example:
one = df.groupby(['Year', 'Country'])['GDP'].agg(list).reset_index()
two = pd.DataFrame(one['GDP'].to_list(), columns=['GDP_' + s.upper() for s in df['Subject Descriptor'].unique()])
new = pd.concat([one, two], axis=1).drop('GDP', axis=1)
new prints back:
Year Country GDP_R GDP_N GDP_D GDP_U
0 2015 Austria 344.2 344.2 100.0 5.742
1 2015 Belgium 416.7 416.7 100.0 8.483
First group by ['Year', 'Country']; then, for each group, build a Series of the GDP values indexed by Subject Descriptor, which groupby.apply turns into one column per descriptor. The last steps rename the columns, reset the index and remove the column axis name.
(
df.groupby(['Year', 'Country'])
.apply(lambda x: pd.Series(x.GDP.tolist(), index=x['Subject Descriptor']))
.rename(columns = lambda x: f'GDP_{x.upper()}')
.reset_index()
.rename_axis('', axis=1)
)
You can use a pivot in this case:
(df.pivot(index=['Year', 'Country'], columns='Subject Descriptor', values='GDP')
.rename(columns = lambda col: f"GDP_{col.upper()}")
.rename_axis(columns=None).reset_index()
)
Year Country GDP_D GDP_N GDP_R GDP_U
0 2015 Austria 100.0 344.2 344.2 5.742
1 2015 Belgium 100.0 416.7 416.7 8.483
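A self-contained version of the pivot answer, assuming the column is literally named 'Subject Descriptor' (with a space), as in the question's header. Keyword arguments are used because pandas 2.0 made the arguments of DataFrame.pivot keyword-only; pivot also sorts the new columns, hence the D, N, R, U order.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2015] * 8,
    "Country": ["Austria"] * 4 + ["Belgium"] * 4,
    "Subject Descriptor": ["r", "n", "d", "u"] * 2,
    "GDP": [344.2, 344.2, 100, 5.742, 416.7, 416.7, 100, 8.483],
})

wide = (
    df.pivot(index=["Year", "Country"], columns="Subject Descriptor", values="GDP")
    .rename(columns=lambda col: f"GDP_{col.upper()}")  # d -> GDP_D, etc.
    .rename_axis(columns=None)                          # drop the 'Subject Descriptor' axis name
    .reset_index()                                      # Year and Country back to columns
)
print(wide)
```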
I've got the following dataset:
Date Country Specie Monthly Average
Apr 2015 BR co 5.840000
Apr 2015 BR no2 7.553704
Apr 2015 BR o3 15.561667
Apr 2015 BR pm10 16.283333
Apr 2015 BR pm25 51.633333
... ... ... ... ...
The data covers 10 countries and several emission species (Specie), with monthly values from 2015 to 2021. I want to convert it into quarterly averages (the mean of the months in each quarter), in the following form, for example:
Date Country Specie Quarterly Average
2015 Q1 BR co 6.840000
2015 Q1 BR no2 9.553704
2015 Q1 BR o3 17.561667
2015 Q1 BR pm10 18.283333
2015 Q1 BR pm25 55.633333
... ... ... ... ...
How can I do this in pandas?
I also have a second question: if I want to split Specie into separate columns holding the corresponding values, how can I obtain the following structure?
Date Country co Average no2 Average o3 Average ...
2015 Q1 BR 6.840000 9.553704 17.561667
2015 Q2 BR 8.840000 10.553704 18.561667
First, create a Quarter column from the original Date column:
df['Quarter'] = pd.to_datetime(df['Date']).dt.to_period('Q')
Then group by the Country, Specie and Quarter columns, compute the mean of Monthly Average in each group, and rename the result column to Quarterly Average:
df_ = df.groupby(['Country', 'Specie', 'Quarter'], as_index=False)['Monthly Average'].mean().rename(columns={'Monthly Average': 'Quarterly Average'})
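A runnable sketch of these two steps on made-up monthly values (hypothetical, for illustration only), assuming Date holds strings like "Apr 2015". Passing format="%b %Y" to pd.to_datetime is an added assumption that makes the parsing explicit.

```python
import pandas as pd

# Hypothetical monthly values for one country and two species
df = pd.DataFrame({
    "Date": ["Apr 2015", "May 2015", "Jun 2015"] * 2,
    "Country": ["BR"] * 6,
    "Specie": ["co"] * 3 + ["no2"] * 3,
    "Monthly Average": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# "Apr 2015" -> 2015-04-01 -> quarterly period 2015Q2
df["Quarter"] = pd.to_datetime(df["Date"], format="%b %Y").dt.to_period("Q")

# Mean of the months that fall in each (Country, Specie, Quarter) group
df_ = (
    df.groupby(["Country", "Specie", "Quarter"], as_index=False)["Monthly Average"]
    .mean()
    .rename(columns={"Monthly Average": "Quarterly Average"})
)
print(df_)
```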
FYI, pandas.pivot_table() can produce that wide structure: use Quarter and Country as the index, Specie as the columns and Quarterly Average as the values.
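A minimal pivot_table sketch on hypothetical quarterly averages in the long format of df_ above. Quarter is kept as plain strings here for simplicity; the " Average" suffix on the new column names is an assumption chosen to match the layout asked for in the question.

```python
import pandas as pd

# Hypothetical long-format quarterly averages
df_ = pd.DataFrame({
    "Country": ["BR"] * 4,
    "Specie": ["co", "no2", "co", "no2"],
    "Quarter": ["2015Q1", "2015Q1", "2015Q2", "2015Q2"],
    "Quarterly Average": [6.0, 9.0, 8.0, 10.0],
})

# One row per (Quarter, Country), one column per Specie
wide = df_.pivot_table(index=["Quarter", "Country"], columns="Specie",
                       values="Quarterly Average", aggfunc="mean")
wide.columns = [f"{c} Average" for c in wide.columns]  # co -> "co Average"
wide = wide.reset_index()
print(wide)
```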
I'm a beginner at Python, and I have a dataframe with the columns Country, avgTemp and year.
For each country, I want to add new rows in which 20 is added to the year and avgTemp is multiplied by a variable called tempChange. I don't want to remove the existing values; I just want to append the new ones.
Ideally, I would also like a loop that runs this a given number of times.
Super grateful for any help!
If you need sample values to copy, here they are:
Country avgTemp year
0 Afghanistan 14.481583 2012
1 Africa 24.725917 2012
2 Albania 13.768250 2012
3 Algeria 23.954833 2012
4 American Samoa 27.201417 2012
243 rows × 3 columns
If you want to repeat the rows, I'd create a new dataframe, perform the operations on it (add 20 years, multiply the temperature by a constant or an array, etc.) and then use concat() to append it to the original dataframe:
import pandas as pd
tempChange=1.15
data = {'Country':['Afghanistan','Africa','Albania','Algeria','American Samoa'],'avgTemp':[14,24,13,23,27],'Year':[2012,2012,2012,2012,2012]}
df = pd.DataFrame(data)
df_2 = df.copy()
df_2['avgTemp'] = df['avgTemp']*tempChange
df_2['Year'] = df['Year']+20
df = pd.concat([df,df_2]) #ignore_index=True if you wish to not repeat the index value
print(df)
Output:
Country avgTemp Year
0 Afghanistan 14.00 2012
1 Africa 24.00 2012
2 Albania 13.00 2012
3 Algeria 23.00 2012
4 American Samoa 27.00 2012
0 Afghanistan 16.10 2032
1 Africa 27.60 2032
2 Albania 14.95 2032
3 Algeria 26.45 2032
4 American Samoa 31.05 2032
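The question also asks for a loop. A minimal sketch, assuming each new block is derived from the most recently appended one (so the temperature compounds by tempChange every 20 years); n_steps is a hypothetical parameter name. Collecting the blocks in a list and calling concat once avoids re-copying the growing DataFrame on every iteration.

```python
import pandas as pd

tempChange = 1.15
n_steps = 3  # hypothetical: how many 20-year projections to append

df = pd.DataFrame({
    "Country": ["Afghanistan", "Africa"],
    "avgTemp": [14.0, 24.0],
    "Year": [2012, 2012],
})

blocks = [df]
for _ in range(n_steps):
    last = blocks[-1]
    nxt = last.copy()
    nxt["avgTemp"] = last["avgTemp"] * tempChange  # compound the change
    nxt["Year"] = last["Year"] + 20
    blocks.append(nxt)

# Concatenate once at the end; ignore_index gives a fresh 0..n-1 index
result = pd.concat(blocks, ignore_index=True)
print(result)
```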
Assuming df is your DataFrame:
df['tempChange'] = df['year']+ 20 * df['avgTemp']
This adds a new column to df using the logic above. I'm not sure I understood your logic correctly, so the math may need some work.
I believe that what you're looking for is
dfName['newYear'] = dfName.apply(lambda x: x['year'] + 20,axis=1)
dfName['tempDiff'] = dfName.apply(lambda x: x['avgTemp']*tempChange,axis=1)
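This applies the function row by row. The vectorized forms dfName['year'] + 20 and dfName['avgTemp'] * tempChange give the same result and are usually much faster.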
I have 3 dataframes each with the same columns (years) and same indexes (countries).
Now I want to merge these 3 dataframes, but since they all have the same columns, the rows just get appended.
So I'd like to keep the country index and add a sub-index for each dataframe, because each one holds a different quantity for each year.
#dataframe 1
#CO2:
2005 2010 2015 2020
country
Afghanistan 169405 210161 259855 319447
Albania 762 940 1154 1408
Algeria 158336 215865 294768 400126
#dataframe 2
#Arrivals + Departures:
2005 2010 2015 2020
country
Afghanistan 977896 1326120 1794547 2414943
Albania 103132 154219 224308 319440
Algeria 3775374 5307448 7389427 10159656
#data frame 3
#Travel distance in km:
2005 2010 2015 2020
country
Afghanistan 9330447004 12529259781 16776152792 22337458954
Albania 63159063 82810491 107799357 139543748
Algeria 12254674181 17776784271 25782632480 37150057977
The result should be something like:
2005 2010 2015 2020
country
Afghanistan co2 169405 210161 259855 319447
flights 977896 1326120 1794547 2414943
traveldistance 9330447004 12529259781 16776152792 22337458954
Albania ....
How can I do this?
NOTE: The years are an input, so they are not fixed. They could be just 2005 and 2010, for example.
Thanks in advance.
I tried to solve this with concat and groupby on your dataset.
First, concatenate the 3 DataFrames, using keys to label each one:
l=[df,df2,df3]
f = pd.concat(l, keys=['CO2', 'Flights', 'traveldistance'], axis=0).reset_index().rename(columns={'level_0': 'Category'})
Then use groupby to get the values:
result_df=f.groupby(['country', 'Category'])[f.columns[2:]].first()
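The same layout can also be built without groupby. A sketch, assuming each frame's index is named country and using only the first two years and countries from the question: concat with keys adds the category as the outer index level, swaplevel moves it inside the country level, and sort_index groups the rows by country. Because no year label is referenced, this works for whatever year columns the inputs have.

```python
import pandas as pd

years = [2005, 2010]  # any set of year columns works
idx = pd.Index(["Afghanistan", "Albania"], name="country")
co2 = pd.DataFrame([[169405, 210161], [762, 940]], index=idx, columns=years)
flights = pd.DataFrame([[977896, 1326120], [103132, 154219]], index=idx, columns=years)
dist = pd.DataFrame([[9330447004, 12529259781], [63159063, 82810491]],
                    index=idx, columns=years)

combined = (
    pd.concat([co2, flights, dist], keys=["co2", "flights", "traveldistance"])
    .swaplevel(0, 1)  # (category, country) -> (country, category)
    .sort_index()     # group rows by country; these keys also sort alphabetically
)
print(combined)
```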
Hope this solves your problem.