I have the following dataframe in pandas:
3/2/20 3/3/20 Measure State City
5 6 Deaths WA King County
0 0 Deaths CA Orange
14 21 Confirmed WA King County
1 1 Confirmed CA Orange
There are several additional date columns range :
1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20
How do I pivot/reshape so I end up with something like the below, but including all the date columns?
State City Measure Date Value
WA King Deaths 3/2/20 5
WA King Deaths 3/3/20 6
WA King Deaths 3/2/20 14
WA King Deaths 3/3/20 21
CA Orange Deaths 3/2/20 0
CA Orange Deaths 3/3/20 0
CA Orange Confirmed 3/2/20 1
CA Orange Confirmed 3/3/20 1
Per comment above, melt solved my problem:
df.melt(id_vars = ['Measure', 'State', 'City'], var_name = 'Date' , value_vars=['1/22/20', '1/23/20',
'1/24/20', '1/25/20', '1/26/20', '1/27/20', '1/28/20', '1/29/20',
'1/30/20', '1/31/20', '2/1/20', '2/2/20', '2/3/20', '2/4/20', '2/5/20',
'2/6/20', '2/7/20', '2/8/20', '2/9/20', '2/10/20', '2/11/20', '2/12/20',
'2/13/20', '2/14/20', '2/15/20', '2/16/20', '2/17/20', '2/18/20',
'2/19/20', '2/20/20', '2/21/20', '2/22/20', '2/23/20', '2/24/20',
'2/25/20', '2/26/20', '2/27/20', '2/28/20', '2/29/20', '3/1/20',
'3/2/20', '3/3/20'])
Related
I have two dataframes:
Table1:
Table2:
How to find:
The country-city combinations that are present only in Table2 but not Table1.
Here [India-Mumbai] is the output.
For each country-city combination, that's present in both the tables, find the "Initiatives" that are present in Table2 but not Table1.
Here {"India-Bangalore": [Textile, Irrigation], "USA-Texas": [Irrigation]}
To answer the first question, we can use the merge method and keep only the NaN rows :
>>> df_merged = pd.merge(df_1, df_2, on=['Country', 'City'], how='left', suffixes = ['_1', '_2'])
>>> df_merged[df_merged['Initiative_2'].isnull()][['Country', 'City']]
Country City
13 India Mumbai
For the next question, we first need to remove the NaN rows from the previously merged DataFrame :
>>> df_both_table = df_merged[~df_merged['Initiative_2'].isnull()]
>>> df_both_table
Country City Initiative_1 Initiative_2
0 India Bangalore Plants Plants
1 India Bangalore Plants Textile
2 India Bangalore Plants Irrigtion
3 India Bangalore Industries Plants
4 India Bangalore Industries Textile
5 India Bangalore Industries Irrigtion
6 India Bangalore Roads Plants
7 India Bangalore Roads Textile
8 India Bangalore Roads Irrigtion
9 USA Texas Plants Plants
10 USA Texas Plants Irrigation
11 USA Texas Roads Plants
12 USA Texas Roads Irrigation
Then, we can filter on the rows that are strictly different on columns Initiative_1 and Initiative_2 and use a groupby to get the list of Innitiative_2 :
>>> df_unique_initiative_2 = df_both_table[~(df_both_table['Initiative_1'] == df_both_table['Initiative_2'])]
>>> df_list_initiative_2 = df_unique_initiative_2.groupby(['Country', 'City'])['Initiative_2'].unique().reset_index()
>>> df_list_initiative_2
Country City Initiative_2
0 India Bangalore [Textile, Irrigation, Plants]
1 USA Texas [Irrigation, Plants]
We do the same but this time on Initiative_1 to get the list as well :
>>> df_list_initiative_1 = df_unique_initiative_2.groupby(['Country', 'City'])['Initiative_1'].unique().reset_index()
>>> df_list_initiative_1
Country City Initiative_1
0 India Bangalore [Plants, Industries, Roads]
1 USA Texas [Plants, Roads]
To finish, we use the set to remove the last redondant Initiative_1 elements to get the expected result :
>>> df_list_initiative_2['Initiative'] = (df_list_initiative_2['Initiative_2'].map(set)-df_list_initiative_1['Initiative_1'].map(set)).map(list)
>>> df_list_initiative_2[['Country', 'City', 'Initiative']]
Country City Initiative
0 India Bangalore [Textile, Irrigation]
1 USA Texas [Irrigation]
Alternative approach (df1 your Table1, df2 your Table2):
combos_1, combos_2 = set(zip(df1.Country, df1.City)), set(zip(df2.Country, df2.City))
in_2_but_not_in_1 = [f"{country}-{city}" for country, city in combos_2 - combos_1]
initiatives = {
f"{country}-{city}": (
set(df2.Initiative[df2.Country.eq(country) & df2.City.eq(city)])
- set(df1.Initiative[df1.Country.eq(country) & df1.City.eq(city)])
)
for country, city in combos_1 & combos_2
}
Results:
['India-Delhi']
{'India-Bangalore': {'Irrigation', 'Textile'}, 'USA-Texas': {'Irrigation'}}
I think you got this "The country-city combinations that are present only in Table2 but not Table1. Here [India-Mumbai] is the output" wrong: The combinations India-Mumbai is not present in Table2?
I have two df's
df1
date League teams
0 201902272215 brazil cup foz do iguacu fcceara ce
1 201902272300 colombia primera a deportes tolimaatletico bucaramanga
2 201902272300 brazil campeonato gaucho 2nd division ypiranga rsuniao frederiquense
3 201902272300 brazil campeonato gaucho 2nd division esportivo rstupi rs
4 201902272300 brazil campeonato gaucho 2nd division sao paulo rsgremio esportivo bage
14 201902280000 four nations women tournament (in usa) usa (w)japan (w)
25 201902280030 bolivia professional football league real potosibolivar
df2
date league teams
0 201902280000 womens international usa womenjapan women
1 201902280000 brazil amazonense sul america ecrio negro am
2 201902280030 bolivia apertura real potosibolivar
3 201902280030 brazil campeonato paulista palmeirasituano
4 201902280030 copa sudamericana racing clubcorinthians
The result I would want is all the rows from df2 that are near matches with df1
date league teams near_match
0 201902280000 womens international usa womenjapan women 1
1 201902280000 brazil amazonense sul america ecrio negro am 0
2 201902280030 bolivia apertura real potosibolivar 1
3 201902280030 brazil campeonato paulista palmeirasituano 0
4 201902280030 copa sudamericana racing clubcorinthians 0
I have tried to use a variation of a for loop using SequenceMatcher and setting the threshold to a match of above 0.8, but haven't had any luck.
df_1['merge_teams'] = df_1['teams'] # we will use these as the merge keys
df_1['merge_date'] = df_1['date']
# df_1['merge_league'] = df_1['league']
for teams_1, date_1, league_1 in df_1[['teams','date']].values:
for ixb, (teams_1, teams_2) in enumerate(df_2[['teams','date']].values):
if difflib.SequenceMatcher(None,teams_1,teams_2).ratio() > .8:
df_2.ix[ixb,'merge_teams'] = teams_1 # creates a merge key in df_2
if difflib.SequenceMatcher(None,date_1, date_2).ratio() > .8:
df_2.ix[ixb,'merge_date'] = date_1 # creates a merge key in df_2
# This should rturn all rows where teams,date and league all match by over 80%
# This is just for teams and date, I want to include league as well
Any advice or guidance would be greatly appreciated.
I would like to add a dictionary to a list, which contains several other dictionaries.
I have a list of ten top travel cities:
City Country Population Area
0 Buenos Aires Argentina 2891000 4758
1 Toronto Canada 2800000 2731571
2 Pyeongchang South Korea 2581000 3194
3 Marakesh Morocco 928850 200
4 Albuquerque New Mexico 559277 491
5 Los Cabos Mexico 287651 3750
6 Greenville USA 84554 68
7 Archipelago Sea Finland 60000 8300
8 Walla Walla Valley USA 32237 33
9 Salina Island Italy 4000 27
10 Solta Croatia 1700 59
11 Iguazu Falls Argentina 0 672
I imported the excel with pandas:
import pandas as pd
travel_df = pd.read_excel('./cities.xlsx')
print(travel_df)
cities = travel_df.to_dict('records')
print(cities)
variables = list(cities[0].keys())
I would like to add a 12th element to the end of the list but don't know how to do so:
beijing = {"City" : "Beijing", "Country" : "China", "Population" : "24000000", "Ares" : "6490" }
print(beijing)
Try appending the new row to the DataFrame you read.
travel_df.append(beijing, ignore_index=True)
I have the following dict which I converted to dataframe
players_info = {'Afghanistan': {'Asghar Stanikzai': 809.0,
'Mohammad Nabi': 851.0,
'Mohammad Shahzad': 1713.0,
'Najibullah Zadran': 643.0,
'Samiullah Shenwari': 774.0},
'Australia': {'AJ Finch': 1082.0,
'CL White': 988.0,
'DA Warner': 1691.0,
'GJ Maxwell': 822.0,
'SR Watson': 1465.0},
'England': {'AD Hales': 1340.0,
'EJG Morgan': 1577.0,
'JC Buttler': 985.0,
'KP Pietersen': 1176.0,
'LJ Wright': 759.0}}
pd.DataFrame(players_info)
The resulting output is
But I want the columns to be mapped with rows like the following
Player Team Score
Mohammad Nabi Afghanistan 851.0
Mohammad Shahzad Afghanistan 1713.0
Najibullah Zadran Afghanistan 643.0
JC Buttler England 985.0
KP Pietersen England 1176.0
LJ Wright England 759.0
I tried reset_index but it is not working as I want. How can I do that ?
You need:
df = df.stack().reset_index()
df.columns=['Player', 'Team', 'Score']
Output of df.head(5):
Player Team Score
0 AD Hales Score 1340.0
1 AJ Finch Team 1082.0
2 Asghar Stanikzai Player 809.0
3 CL White Team 988.0
4 DA Warner Team 1691.0
Let's take a stab at this using melt. Should be pretty fast.
df.rename_axis('Player').reset_index().melt('Player').dropna()
Player variable value
2 Asghar Stanikzai Afghanistan 809.0
10 Mohammad Nabi Afghanistan 851.0
11 Mohammad Shahzad Afghanistan 1713.0
12 Najibullah Zadran Afghanistan 643.0
14 Samiullah Shenwari Afghanistan 774.0
16 AJ Finch Australia 1082.0
18 CL White Australia 988.0
19 DA Warner Australia 1691.0
21 GJ Maxwell Australia 822.0
28 SR Watson Australia 1465.0
30 AD Hales England 1340.0
35 EJG Morgan England 1577.0
37 JC Buttler England 985.0
38 KP Pietersen England 1176.0
39 LJ Wright England 759.0
I have two csv files. The first file contains names of all countries with their capital cities,
CSV 1:
Capital Country Country Code
Budapest Hungary HUN
Rome Italy ITA
Dublin Ireland IRL
Paris France FRA
Berlin Germany DEU
.
.
.
CSV 2:
Second CSV file contains trip details of a bus,
Trip City Trip Country No. of pax
Budapest HUN 24
Paris FRA 36
Munich DEU 9
Florence ITA 5
Milan ITA 25
Rome ITA 2
Rome ITA 45
I would like to add a new column df["Touism visit"] with the values of no of pax, if the Trip City (from CSV 2) is a capital of a country (from CSV 1) and if the number of pax is more than 10.
Thank you.
Try this:
df2['tourism'] = 0
df2.loc[df2['Trip City'].isin(df1['Capital']) & (df2['No. of pax'] > 10), 'tourism'] = df2.loc[df2['Trip City'].isin(df1['Capital'])& (df2['No. of pax'] > 10), 'No. of pax']
I get :
Trip_City Trip_Country No._of_pax tourism
0 Budapest HUN 24 24
1 Paris FRA 36 36
2 Munich DEU 9 0
3 Florence ITA 5 0
4 Milan ITA 25 0
5 Rome ITA 2 0
6 Rome ITA 45 45
(I had to add _s to get pd.read_clipboard() to work properly)
this might also help,
import the dfs
df1 = pd.read_csv("CSV1.csv")
df2 = pd.read_csv("CSV2.csv")
make a dictionary out of the pandas Series
my_dict=dict(zip((df1["Country_Code"]),(df1["Capital"])))
define a function that test your conditions (note i used np.logical_and() to combine the conditions. A normal and
def isTourism(country_code,trip_city,No_of_pax):
if np.logical_and((my_dict[country_code]==trip_city),(No_of_pax >= 10)):
return "Yes"
else:
return "No
call function with map
df2["Tourism"] = list(map(isTourism,df2["Trip Country"],df2["Trip City"], df2["No. Of pax"]))
print(df2)
Trip City Trip Country No. Of pax Tourism
0 Budapest HUN 24 Yes
1 Paris FRA 36 Yes
2 Munich DEU 9 No
3 Florence ITA 5 No
4 Milan ITA 25 No
5 Rome ITA 2 No
6 Rome ITA 45 Yes
If you filter your second dataframe to only the values > 10, you could merge and sum as follows:
import pandas as pd
df1 = pd.DataFrame({'Capital': ['Budapest', 'Rome', 'Dublin', 'Paris',
'Berlin'],
'Country': ['Hungary', 'Italy', 'Ireland', 'France',
'Germany'],
'Country Code': ['HUN', 'ITA', 'IRL', 'FRA', 'DEU']
})
df2 = pd.DataFrame({'Trip City': ['Budapest', 'Paris', 'Munich', 'Florence',
'Milan', 'Rome', 'Rome'],
'Trip Country': ['HUN', 'FRA', 'DEU', 'ITA', 'ITA',
'ITA', 'ITA'],
'No. of pax': [24, 36, 9, 5, 25, 2, 45]
})
df2 = df2[df2['No. of pax'] > 10]
combined = df1.merge(df2,
left_on=['Capital', 'Country Code'],
right_on=['Trip City', 'Trip Country'],
how='left').groupby(['Capital', 'Country Code'],
sort=False,
as_index=False)['No. of pax'].sum()
print combined
This prints:
Capital Country Code No. of pax
0 Budapest HUN 24.0
1 Rome ITA 45.0
2 Dublin IRL NaN
3 Paris FRA 36.0
4 Berlin DEU NaN