I have a table like this:

Province          Country    Date     infected
New South Wales   Australia  1/22/20  12
Victoria          Australia  1/22/20  10
British Columbia  Canada     1/22/20  5
                  USA        1/22/20  7
New South Wales   Australia  1/23/20  6
Victoria          Australia  1/23/20  2
British Columbia  Canada     1/23/20  1
                  USA        1/23/20  10
Now I want to convert that table into this:

Province          Country    Date     infected
New South Wales   Australia  1/22/20  12
                             1/23/20  6
Victoria          Australia  1/22/20  10
                             1/23/20  2
British Columbia  Canada     1/22/20  5
                             1/23/20  1
                  USA        1/22/20  7
                             1/23/20  10
I have tried df.sort_values('Date'), but no luck. How can I build this kind of table?
I'm a Python rookie, but let me think along (I'm sure this can be done neater).
df = df.ffill()  # fill each NaN from the value above it (older pandas: df.fillna(method='ffill'))
df = df.groupby(['Province', 'Country', 'Date']).sum()
This gave me:
                                     infected
Province         Country   Date
British Columbia Canada    1/22/20         5
                           1/23/20         1
                 USA       1/22/20         7
                           1/23/20        10
New South Wales  Australia 1/22/20        12
                           1/23/20         6
Victoria         Australia 1/22/20        10
                           1/23/20         2
I anticipated that you have NaN values in the empty places (at least that's what I had when importing the dataframe), so I replaced each NaN with the value above it. A groupby then gave me the result above. Not sure if this is what you were after, but maybe it sparks some ideas =)
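For reference, a self-contained version of the same idea; the exact variable names and the assumption that the blank Province cells import as NaN are mine:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "Province": ["New South Wales", "Victoria", "British Columbia", np.nan,
                 "New South Wales", "Victoria", "British Columbia", np.nan],
    "Country": ["Australia", "Australia", "Canada", "USA",
                "Australia", "Australia", "Canada", "USA"],
    "Date": ["1/22/20"] * 4 + ["1/23/20"] * 4,
    "infected": [12, 10, 5, 7, 6, 2, 1, 10],
})

df = df.ffill()  # every NaN takes the value above it
out = df.groupby(['Province', 'Country', 'Date']).sum()
print(out)

One caveat visible in the output above: ffill also copies "British Columbia" into the Province of the USA rows, which is why USA ends up nested under it. If USA genuinely has no province, filling only that column with an empty string, e.g. df.fillna({'Province': ''}), avoids this.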
import pandas as pd

data = {"Province": ["New South Wales", "Victoria", "British Columbia", "", "New South Wales", "Victoria", "British Columbia", ""],
        "Country": ["Australia", "Australia", "Canada", "USA", "Australia", "Australia", "Canada", "USA"],
        "Date": ["1/22/20", "1/22/20", "1/22/20", "1/22/20", "1/23/20", "1/23/20", "1/23/20", "1/23/20"],
        "infected": [12, 10, 6, 5, 2, 3, 4, 5]}

brics = pd.DataFrame(data)  # `data` rather than `dict`, to avoid shadowing the built-in
print(brics)

# make the three key columns a MultiIndex and sort by it
df = brics.set_index(['Country', 'Province', 'Date']).sort_index()
print(df)
Output:
           Province    Country     Date  infected
0   New South Wales  Australia  1/22/20        12
1          Victoria  Australia  1/22/20        10
2  British Columbia     Canada  1/22/20         6
3                          USA  1/22/20         5
4   New South Wales  Australia  1/23/20         2
5          Victoria  Australia  1/23/20         3
6  British Columbia     Canada  1/23/20         4
7                          USA  1/23/20         5
                                     infected
Country   Province         Date
Australia New South Wales  1/22/20        12
                           1/23/20         2
          Victoria         1/22/20        10
                           1/23/20         3
Canada    British Columbia 1/22/20         6
                           1/23/20         4
USA                        1/22/20         5
                           1/23/20         5
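Both answers above sort the keys alphabetically, while the desired output keeps the original row order (New South Wales first). If that order matters, groupby accepts sort=False, something like:

out = df.ffill().groupby(['Province', 'Country', 'Date'], sort=False).sum()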
I have a dataframe like this:

origin     destination
germany    germany
germany    italy
germany    spain
USA        USA
USA        spain
Argentina  Argentina
Argentina  Brazil
and I want to filter out the routes that stay within the same country, that is, I want to obtain the following dataframe:

origin     destination
germany    italy
germany    spain
USA        spain
Argentina  Brazil

How can I do this with pandas? I have tried dropping duplicates, but it does not give me the result I want.
Use a simple filter:
df = df[df['origin'] != df['destination']]
Output:
>>> df
origin destination
1 germany italy
2 germany spain
4 USA spain
6 Argentina Brazil
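Note the comparison is exact and case-sensitive. If the data mixes cases (say Germany in one column and germany in the other), a case-insensitive sketch, assuming both columns are strings, would be:

df = df[df['origin'].str.casefold() != df['destination'].str.casefold()]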
Alternatively, use DataFrame.query:
out = df.query('origin!=destination')
Output:
origin destination
1 germany italy
2 germany spain
4 USA spain
6 Argentina Brazil
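query works smoothly here because both column names are plain identifiers; a column name containing spaces would need backticks inside the expression, e.g. df.query('`origin airport` != destination') for a hypothetical column called origin airport.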
I have one dataframe, shown below:
0
____________________________________
0 Country| India
60 Delhi
62 Mumbai
68 Chennai
75 Country| Italy
78 Rome
80 Venice
85 Milan
88 Country| Australia
100 Sydney
103 Melbourne
107 Perth
I want to split the data into 2 columns so that one column contains the country and the other contains the city. I have no idea where to start. I want it like below:
0 1
____________________________________
0 Country| India Delhi
1 Country| India Mumbai
2 Country| India Chennai
3 Country| Italy Rome
4 Country| Italy Venice
5 Country| Italy Milan
6 Country| Australia Sydney
7 Country| Australia Melbourne
8 Country| Australia Perth
Any idea how to do this?
Look for rows where | is present, pull them into another column, and fill down on the newly created column:
(
    df.rename(columns={"0": "city"})  # if the column label is the integer 0, use {0: "city"}
    # look for rows that contain '|' and copy them into a new column
    # called Country; rows that do not match will be null in the new column
    .assign(Country=lambda x: x.loc[x.city.str.contains(r"\|"), "city"])
    # fill down on the Country column, which also links each city
    # with its Country
    .ffill()
    # drop the duplicated country rows, so that only country entries
    # remain in the Country column and cities in the city column
    .query("city != Country")
    # reverse the column order to match the expected output
    .iloc[:, ::-1]
)
                Country       city
60       Country| India      Delhi
62       Country| India     Mumbai
68       Country| India    Chennai
78       Country| Italy       Rome
80       Country| Italy     Venice
85       Country| Italy      Milan
100  Country| Australia     Sydney
103  Country| Australia  Melbourne
107  Country| Australia      Perth
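If you would rather keep only the country name without the literal Country| prefix, one extra step on the result (assuming the chain above was assigned to out) could be:

out['Country'] = out['Country'].str.replace(r'^Country\|\s*', '', regex=True)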
Use DataFrame.insert with Series.where and Series.str.startswith to replace non-matching values with missing values, forward-fill them with ffill, and then remove the rows where both columns hold the same value, using Series.ne (not equal) in boolean indexing:
df.insert(0, 'country', df[0].where(df[0].str.startswith('Country')).ffill())
df = df[df['country'].ne(df[0])].reset_index(drop=True).rename(columns={0:'city'})
print(df)
country city
0 Country|India Delhi
1 Country|India Mumbai
2 Country|India Chennai
3 Country|Italy Rome
4 Country|Italy Venice
5 Country|Italy Milan
6 Country|Australia Sydney
7 Country|Australia Melbourne
8 Country|Australia Perth
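Here too you can strip the Country| prefix afterwards if you only want the country name, for instance by splitting on the pipe:

df['country'] = df['country'].str.split('|').str[-1].str.strip()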
Say I have a transposed df like so
id 0 1 2 3
0 1361 Spain Russia South Africa China
1 1741 Portugal Cuba UK Ukraine
2 1783 Germany USA France Egypt
3 1353 Brazil Russia Japan Kenya
4 1458 India Romania Holland Nigeria
How could I get all rows where 'er' appears in some cell, so that it returns this:
id 0 1 2 3
2 1783 Germany USA France Egypt
4 1458 India Romania Holland Nigeria
because 'er' is contained in Germany and Nigeria.
Thanks!
Using str.contains:
df[df.apply(lambda x: x.str.contains(pat='er')).any(axis=1)]
Out[96]:
     id        0        1        2        3
2  1783  Germany      USA   France    Egypt
4  1458    India  Romania  Holland  Nigeria
Use apply + str.contains across rows:
df = df[df.apply(lambda x: x.str.contains('er').any(), axis=1)]
print(df)
     id        0        1        2        3
2  1783  Germany      USA   France    Egypt
4  1458    India  Romania  Holland  Nigeria
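Both answers assume every cell is a string. If some columns are numeric (an id column read as integers, say), .str.contains can produce NaN or raise; a defensive sketch casts everything to string first:

mask = df.astype(str).apply(lambda col: col.str.contains('er')).any(axis=1)
df = df[mask]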
I want the country codes stored as alpha_3_code in the df dataframe to fill the Nationality_Codes field of the df2 dataframe. For every row in df2, I want to match Reviewer_Nationality against en_short_name in df and, on a match, assign the country code to Nationality_Codes in df2.
df2.head()
Nationality_Codes Reviewer_Nationality Reviewer_Score
NaN Russia 2.9
NaN United Kingdom 7.5
NaN Australia 7.1
NaN United Kingdom 3.8
NaN Russia 6.7
df.head()
alpha_3_code en_short_name nationality
RUS Russia Russian
ALA Åland Islands Åland Island
ALB Albania Albanian
AUS Australia Australian
UK United Kingdom British, UK
Final Result should be:
df2.head()
Nationality_Codes Reviewer_Nationality Reviewer_Score
RUS Russia 2.9
UK United Kingdom 7.5
AUS Australia 7.1
UK United Kingdom 3.8
RUS Russia 6.7
I tried this code, but it didn't work.
for index, row in df.iterrows():
    for index2, row2 in df2.iterrows():
        if row2['Reviewer_Nationality'] == row['en_short_name']:
            df2['Nationality_Codes'][row2] = df['alpha_3_code'][row2]
Can anyone help me?
Many Thanks!
One way would be to create a Series mapping from your English names to codes, and use .map:
# equivalently: my_map = pd.Series(df.alpha_3_code.values, index=df.en_short_name)
my_map = df.set_index('en_short_name')['alpha_3_code']
df2['Nationality_Codes'] = df2['Reviewer_Nationality'].map(my_map)
Output:
>>> df2
Nationality_Codes Reviewer_Nationality Reviewer_Score
0 RUS Russia 2.9
1 UK United Kingdom 7.5
2 AUS Australia 7.1
3 UK United Kingdom 3.8
4 RUS Russia 6.7
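Any Reviewer_Nationality without a counterpart in en_short_name simply maps to NaN, so the misses are easy to inspect afterwards:

df2.loc[df2['Nationality_Codes'].isna(), 'Reviewer_Nationality'].unique()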
Try this:
merged = (df[['alpha_3_code', 'en_short_name']]
          .merge(df2[['Reviewer_Nationality', 'Reviewer_Score']],
                 left_on='en_short_name', right_on='Reviewer_Nationality',
                 how='left')
          .rename(columns={'alpha_3_code': 'Nationality_Codes'})
          .drop('en_short_name', axis=1))
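Note that with how='left' the result keeps one row per row of df, including countries nobody reviewed. If you want the output shaped like df2, as in the expected result, merging the other way around may be closer; a sketch:

merged = (df2.drop(columns='Nationality_Codes')
             .merge(df[['alpha_3_code', 'en_short_name']],
                    left_on='Reviewer_Nationality', right_on='en_short_name',
                    how='left')
             .rename(columns={'alpha_3_code': 'Nationality_Codes'})
             .drop(columns='en_short_name'))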
I have the following function, which prints the states with their associated counties:

def answer():
    census_df.set_index(['STNAME', 'CTYNAME'])
    for name, state, cname in zip(census_df['STNAME'], census_df['STATE'], census_df['CTYNAME']):
        print(name, state, cname)
Alabama 1 Tallapoosa County
Alabama 1 Tuscaloosa County
Alabama 1 Walker County
Alabama 1 Washington County
Alabama 1 Wilcox County
Alabama 1 Winston County
Alaska 2 Alaska
Alaska 2 Aleutians East Borough
Alaska 2 Aleutians West Census Area
Alaska 2 Anchorage Municipality
Alaska 2 Bethel Census Area
Alaska 2 Bristol Bay Borough
Alaska 2 Denali Borough
Alaska 2 Dillingham Census Area
Alaska 2 Fairbanks North Star Borough
I would like to know the state with the most counties in it. I can iterate through each state like this:
counter = 0
counter2 = 0
for name, state, cname in zip(census_df['STNAME'], census_df['STATE'], census_df['CTYNAME']):
    if state == 1:
        counter += 1
    if state == 2:
        counter2 += 1
print(counter)
print(counter2)
and so on. I can range over the state numbers (rng = range(1, 56)) and iterate through it, but creating 56 counters is a nightmare. Is there an easier way of doing this?
Pandas allows us to do such operations without loops/iterating:
In [21]: df.STNAME.value_counts()
Out[21]:
Alaska 9
Alabama 6
Name: STNAME, dtype: int64
In [24]: df.STNAME.value_counts().head(1)
Out[24]:
Alaska 9
Name: STNAME, dtype: int64
or
In [18]: df.groupby('STNAME')['CTYNAME'].count()
Out[18]:
STNAME
Alabama 6
Alaska 9
Name: CTYNAME, dtype: int64
In [19]: df.groupby('STNAME')['CTYNAME'].count().idxmax()
Out[19]: 'Alaska'
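Putting it together, the state with the most counties comes out of a single expression:

census_df['STNAME'].value_counts().idxmax()  # 'Alaska' for the sample above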