count total number of comma separated countries in pandas column - python

I have a list of countries I would like to get a count of in a data frame column.
ship_to_countries
Albania, Algeria, Azerbaijan, Bahrain, France, Georgia
Ireland, England, France, Germany
France, Germany,
Ireland
How can I create a column to the right which has the count of countries in pandas?
I've tried the solution below, but I get a count of how many times a single country is listed.
So if Israel is in my column 16 times, I get 16. I'd like to get only how many countries are in each row.
(df['ship_to_countries'].str.count(',')
.add(1)
.groupby(df.ship_to_countries)
.sum())

Use str.split() and len:
df["count"] = df["ship_to_countries"].apply(lambda x: len(x.split(",")))
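The split/len approach counts fields, not names. A row with a trailing comma, such as "France, Germany,", therefore produces an extra empty field and a count of 3. Here is a minimal sketch that counts only non-empty entries. It assumes the column holds plain strings and treats missing values (NaN) as 0 countries, which is an assumption rather than something the question specifies.

```python
import pandas as pd

df = pd.DataFrame({"ship_to_countries": [
    "Albania, Algeria, Azerbaijan, Bahrain, France, Georgia",
    "Ireland, England, France, Germany",
    "France, Germany,",
    "Ireland",
]})

# Split each row on commas and count only the pieces that are non-empty
# after stripping spaces, so a trailing comma does not add a phantom country.
# fillna("") makes missing values count as 0 (an assumption for this sketch).
df["count"] = (
    df["ship_to_countries"]
    .fillna("")
    .str.split(",")
    .apply(lambda parts: sum(1 for p in parts if p.strip()))
)

print(df["count"].tolist())  # [6, 4, 2, 1]
```

Unlike the groupby attempt in the question, this gives one value per row, aligned with the original index.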

Related

Compare the content of a dataframe with a cell value in other and replace the value matched in Pandas

I have two dataframes,
df1 =
Countries description                    Continents  values
C0001 also called America,               America     21tr
C0004 and C0003 are neighbhors           Europe      504 bn
on advancing C0005 with C0001.security   Europe      600bn
C0002, the smallest continent            Australi    1.7tr
df2 =
Countries  Id
US         C0001
Australia  C0002
Finland    C0003
Norway     C0004
Japan      C0005
df1 has a "Countries description" column in which the countries are given as codes instead of their actual names.
df2 has the countries with their codes.
I want to replace the country codes (like C0001, C0002) with their names in df1, like this:
df1 =
Countries description                    Continents  values
US also called America, some..           America     21tr
Norway and Finland are neighbhors        Europe      504 bn
on advancing Japan with US.security      Europe      600bn
Australia, the smallest continent        Australi    1.7tr
I tried the pandas merge method, but that didn't work:
df3 = df1.merge(df2, on=['Countries'], how='left')
Thanks :)
Here is one way to approach it with replace:
d = dict(zip(df2["Id"], df2["Countries"]))
df1["Countries description"] = df1["Countries description"].replace(d, regex=True)
Output:
print(df1)
Countries description Continents values
0 US also called America, America 21tr
1 Norway and Finland are neighbhors Europe 504 bn
2 on advancing Japan with US.security Europe 600bn
3 Australia, the smallest continent Australi 1.7tr
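With a dict and regex=True, each key is used as an unanchored pattern. A code like C0001 would therefore also match inside a longer code such as C00010, which is a hypothetical code not present in this data. Here is a sketch that matches whole codes only. It builds one alternation wrapped in `\b` word boundaries and passes a callable replacement to `Series.str.replace`.

```python
import re

import pandas as pd

df1 = pd.DataFrame({
    "Countries description": [
        "C0001 also called America,",
        "C0004 and C0003 are neighbhors",
        "on advancing C0005 with C0001.security",
        "C0002, the smallest continent",
    ],
    "Continents": ["America", "Europe", "Europe", "Australi"],
    "values": ["21tr", "504 bn", "600bn", "1.7tr"],
})
df2 = pd.DataFrame({
    "Countries": ["US", "Australia", "Finland", "Norway", "Japan"],
    "Id": ["C0001", "C0002", "C0003", "C0004", "C0005"],
})

# Map each code to its country name.
d = dict(zip(df2["Id"], df2["Countries"]))

# One pattern for all codes; \b makes each code match only as a whole token.
# Longer codes are listed first so that, with overlapping prefixes,
# the longest code wins.
codes = sorted(d, key=len, reverse=True)
pattern = r"\b(" + "|".join(map(re.escape, codes)) + r")\b"

# A callable replacement looks up the matched code in the dict.
df1["Countries description"] = df1["Countries description"].str.replace(
    pattern, lambda m: d[m.group(0)], regex=True
)
print(df1)
```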

Replace limited values from a column in Pandas

How would I replace a specific number of values from a specified column in a DataFrame?
# babies born in countries
Date Country
1992-02-15 USA
1995-05-04 USA
1996-02-12 Canada
2003-12-17 France
2005-01-11 USA
Suppose I have the above data, and it turns out that the birth country for the first two births is wrong: instead of USA, they should be Spain and France respectively.
I tried the replace method, but it changes every USA value, not just the first two.
Desired result:
# babies born in countries
Date Country
1992-02-15 Spain
1995-05-04 France
1996-02-12 Canada
2003-12-17 France
2005-01-11 USA
Thank you!
To access the first two rows via indexing:
df.iloc[0:2, 1] = ['Spain', 'France']
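Or, with label-based indexing (.loc slices include the end label, so :1 selects rows 0 and 1; :2 would select three rows and fail with two values):
df.loc[:1, 'Country'] = ['Spain', 'France']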
Also if you need to access specific rows:
df.loc[[0,1], 'Country'] = ['Spain', 'France']
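Sometimes the rows to fix are known by value rather than by position, for example "the first two USA entries". In that case one option is to look up the index labels of the first n matching rows and assign through `.loc`. This is a sketch that assumes the DataFrame has the two columns shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["1992-02-15", "1995-05-04", "1996-02-12", "2003-12-17", "2005-01-11"],
    "Country": ["USA", "USA", "Canada", "France", "USA"],
})

# Index labels of the first two rows whose Country is "USA".
first_two_usa = df[df["Country"] == "USA"].index[:2]

# One new value per selected row; later "USA" rows are left untouched.
df.loc[first_two_usa, "Country"] = ["Spain", "France"]
print(df)
```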

Delete value Pandas Column

I started learning pandas and am trying to analyze some data.
In my data there is a column, country, that contains several countries per row. I only want to take the first value and put it in a new column.
For example, the first row has Colombia, Mexico, United States, and I only want to take the first one (Colombia, [0]) and delete the other countries ([1:x]). Is this possible?
I tried a few things like loc, iloc, and drop(), but I hit a dead end, so I'm asking here.
You can use Series.str.split:
df['country'] = df['country'].str.split(',').str[0]
Consider the df below, for example:
In [1520]: df = pd.DataFrame({'country':['Colombia, Mexico, US', 'Croatia, Slovenia, Serbia', 'Denmark', 'Denmark, Brazil']})
In [1521]: df
Out[1521]:
country
0 Colombia, Mexico, US
1 Croatia, Slovenia, Serbia
2 Denmark
3 Denmark, Brazil
In [1523]: df['country'] = df['country'].str.split(',').str[0]
In [1524]: df
Out[1524]:
country
0 Colombia
1 Croatia
2 Denmark
3 Denmark
Use .str.split():
df['country'] = df['country'].str.split(',',expand=True)[0]
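The question asks to put the first value in a new column. Here is a sketch that keeps the original column and writes to a new one. The name `first_country` is a hypothetical choice. Passing `n=1` splits only at the first comma, and `.str.strip()` removes any stray spaces around the name.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Colombia, Mexico, US", "Croatia, Slovenia, Serbia",
                               "Denmark", "Denmark, Brazil"]})

# Keep the original column; store only the first entry in a new column.
df["first_country"] = df["country"].str.split(",", n=1).str[0].str.strip()
print(df)
```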

Python dataframe remove substring before specific character if true

I am trying to remove the numbers before "-" in the name column. But not all rows have numbers before the name. How do I remove the numbers in rows that have numbers and keep the rows that don't have numbers in front untouched?
Sample df:
country Name
UK 5413-Marcus
Russia 5841-Natasha
Hong Kong Keith
China 7777-Wang
Desired df
country Name
UK Marcus
Russia Natasha
Hong Kong Keith
China Wang
I appreciate any assistance! Thanks in advance!
Pandas has string accessors for Series. If you split on '-' and take the last element of the resulting list, rows without the delimiter '-' still work, because the last element of a one-element list is the whole string.
df.Name = df.Name.str.split('-').str.get(-1)
You might use str.lstrip for that task in the following way:
import pandas as pd
df = pd.DataFrame({'country':['UK','Russia','Hong Kong','China'],'Name':['5413-Marcus','5841-Natasha','Keith','7777-Wang']})
df['Name'] = df['Name'].str.lstrip('-0123456789')
print(df)
Output:
country Name
0 UK Marcus
1 Russia Natasha
2 Hong Kong Keith
3 China Wang
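.lstrip removes leading characters, .rstrip trailing characters, and .strip both. Note that the argument is a set of characters, not a prefix, so a name that itself starts with a digit or '-' would also lose those characters.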
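Both answers above work on the sample data, but they behave differently on names that contain a hyphen. `split('-').str.get(-1)` would turn "1234-Jean-Luc" into "Luc". A regex anchored at the start of the string removes only a leading run of digits followed by a single hyphen. This is a sketch; the "France" row with "1234-Jean-Luc" is a made-up example, not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["UK", "Russia", "Hong Kong", "China", "France"],
    # "1234-Jean-Luc" is a hypothetical row with a hyphen inside the name.
    "Name": ["5413-Marcus", "5841-Natasha", "Keith", "7777-Wang", "1234-Jean-Luc"],
})

# ^\d+- matches digits at the very start followed by one hyphen;
# rows without that prefix are returned unchanged.
df["Name"] = df["Name"].str.replace(r"^\d+-", "", regex=True)
print(df)
```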

Grouping and adding calculated columns to my dataframe

I have a dataframe that looks like this; I have made Continent my index. I want it to look a little different: I would like the dataframe to have just the 3 continents, with the countries that fall under each continent shown as a count.
Continent Country
Oceania Australia 53 154.3 203.6 209.9
Europe Austria 28.2 49.3 59.7 59.9
Europe Belgium 33.2 70.3 83.4 82.8
Europe Denmark 18.6 26.0 38.9 36.1
Asia Japan 382.9 835.5 1028.1 1049.0
So my output would look like this, showing just the number of countries under each continent. When it rolls everything into num_Countries, I would also like it to give the mean of the values for those countries, so that everything is rolled into one row per continent.
Continent num_Countries mean
Oceania 1 209.9
Europe 3 328.2
Asia 1 382.9
I have tried to create these columns, but I can't get the new columns to be created, and when I do, they come up as NaN values. For the continents, I can't get the groupby() function to work the way I want: it doesn't roll all of the countries into just the continents, but displays the full list of continents and countries.
You can use a pivot table for this. (I labeled the unlabeled columns with 1 to 4)
df.pivot_table(index="Continent", values=["Country", "1"],
               aggfunc={"Country": "count", "1": "mean"})
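The same result can be written with groupby named aggregation, which also lets you name the output columns directly. This is a sketch under two assumptions. First, the four unlabeled columns are named "1" to "4", as in the pivot table answer. Second, "mean" is taken over column "1" only; swap in another column if a different one is intended.

```python
import pandas as pd

# Sample data from the question; the unlabeled numeric columns are named "1".."4".
df = pd.DataFrame(
    {
        "Continent": ["Oceania", "Europe", "Europe", "Europe", "Asia"],
        "Country": ["Australia", "Austria", "Belgium", "Denmark", "Japan"],
        "1": [53, 28.2, 33.2, 18.6, 382.9],
        "2": [154.3, 49.3, 70.3, 26.0, 835.5],
        "3": [203.6, 59.7, 83.4, 38.9, 1028.1],
        "4": [209.9, 59.9, 82.8, 36.1, 1049.0],
    }
).set_index("Continent")  # Continent as the index, as in the question

# Named aggregation: output_name=(input_column, aggregation).
result = df.groupby(level="Continent").agg(
    num_Countries=("Country", "count"),
    mean=("1", "mean"),
)
print(result)
```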
The following groups by 'Continent' and applies a function that counts the number of countries and finds the mean of means (I assumed this is what you wanted since you have 4 columns of numeric data for a number of countries per continent).
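import pandas as pd

def f(group):
    # numeric_only=True skips the string Country column (required on pandas 2.x)
    return pd.DataFrame([{'num_Countries': group.Country.count(),
                          'mean': group.mean(numeric_only=True).mean()}])

grouped = df.groupby('Continent')
result = grouped.apply(f).reset_index(level=1, drop=True)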
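The same "mean of means" can be computed without a custom function. Take the per-column means within each continent, then average across those columns. This is a sketch that assumes the column names "1" to "4" from the pivot table answer and the question's sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Continent": ["Oceania", "Europe", "Europe", "Europe", "Asia"],
        "Country": ["Australia", "Austria", "Belgium", "Denmark", "Japan"],
        "1": [53, 28.2, 33.2, 18.6, 382.9],
        "2": [154.3, 49.3, 70.3, 26.0, 835.5],
        "3": [203.6, 59.7, 83.4, 38.9, 1028.1],
        "4": [209.9, 59.9, 82.8, 36.1, 1049.0],
    }
).set_index("Continent")

num_cols = ["1", "2", "3", "4"]
grouped = df.groupby(level="Continent")

result = pd.DataFrame({
    "num_Countries": grouped["Country"].count(),
    # Mean of each numeric column per continent, then the mean of those means.
    "mean": grouped[num_cols].mean().mean(axis=1),
})
print(result)
```

Avoiding `apply` also sidesteps the `reset_index(level=1, ...)` step, because each aggregation already returns one row per continent.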
