I am trying to remove the numbers before "-" in the name column. But not all rows have numbers before the name. How do I remove the numbers in rows that have numbers and keep the rows that don't have numbers in front untouched?
Sample df:
country Name
UK 5413-Marcus
Russia 5841-Natasha
Hong Kong Keith
China 7777-Wang
Desired df
country Name
UK Marcus
Russia Natasha
Hong Kong Keith
China Wang
I appreciate any assistance! Thanks in advance!
Pandas has string accessors for Series. Split on '-' and take the last element of the resulting list. If a row does not contain the delimiter '-', the list has a single element, so its last element is the original string and the row is left untouched.
df.Name = df.Name.str.split('-').str.get(-1)
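If the prefix should only be removed when it really is a run of digits followed by a hyphen, a regular expression is a stricter alternative. This is a minimal sketch, assuming the sample data from the question; the name `cleaned` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["UK", "Russia", "Hong Kong", "China"],
    "Name": ["5413-Marcus", "5841-Natasha", "Keith", "7777-Wang"],
})

# Remove one leading run of digits followed by "-"; rows without such a
# prefix do not match the pattern and are returned unchanged.
cleaned = df["Name"].str.replace(r"^\d+-", "", regex=True)
print(cleaned.tolist())
```

Unlike splitting on '-', this would leave a hyphenated name such as "Anne-Marie" intact, because the pattern only matches digits at the start of the string.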
You might use str.lstrip for that task, in the following way:
import pandas as pd
df = pd.DataFrame({'country':['UK','Russia','Hong Kong','China'],'Name':['5413-Marcus','5841-Natasha','Keith','7777-Wang']})
df['Name'] = df['Name'].str.lstrip('-0123456789')
print(df)
Output:
country Name
0 UK Marcus
1 Russia Natasha
2 Hong Kong Keith
3 China Wang
.lstrip removes leading characters, .rstrip removes trailing characters, and .strip removes both.
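One caveat worth checking before relying on this: the argument to lstrip is a set of characters, not a prefix. The sketch below illustrates that with two hypothetical names ("2Pac" and "-Dash") that are not in the question's data.

```python
import pandas as pd

# The last two values are hypothetical, added only to show the edge case.
names = pd.Series(["5413-Marcus", "Keith", "2Pac", "-Dash"])

# Every leading character found in the set is removed, whether or not
# the value has the "digits, then hyphen" shape.
stripped = names.str.lstrip("-0123456789")
print(stripped.tolist())
```

If names can legitimately start with a digit or a hyphen, a pattern-based approach is probably the safer choice.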
I want to subtract column Country from DataFrame dfA with column Country in DataFrame dfB.
I'm trying the following code:
A_minus_B = dfA['Country'] - dfB['Country']
TypeError: - with str & str
What I'm expecting is:
dfA Country
1. United States
2. Puerto Rico
3. Colombia
dfB Country
1. Puerto Rico
2. Argentina
3. Canada
A_minusB Country
1. United States
2. Colombia
3. Argentina
You can use the following, which keeps the rows of dfA whose Country does not appear in dfB:
A_minus_B = dfA.loc[~dfA['Country'].isin(dfB['Country']), 'Country']
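To see what this returns, here is a small sketch using the countries listed in the question. The construction of dfA and dfB is an assumption, since the question does not show how the frames are built.

```python
import pandas as pd

# Assumed reconstruction of the question's frames.
dfA = pd.DataFrame({"Country": ["United States", "Puerto Rico", "Colombia"]})
dfB = pd.DataFrame({"Country": ["Puerto Rico", "Argentina", "Canada"]})

# isin builds a boolean mask; ~ inverts it, so only the rows of dfA
# whose Country is absent from dfB are selected.
A_minus_B = dfA.loc[~dfA["Country"].isin(dfB["Country"]), "Country"]
print(A_minus_B.tolist())
```

This is a one-directional difference: countries that appear only in dfB are not part of the result.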
Use pd.concat:
>>> pd.concat([dfA, dfB]).drop_duplicates(keep=False, ignore_index=True)
Country
0 United States
1 Colombia
2 Argentina
3 Canada
You want to create dfC = dfA.merge(dfB, on="Country"). The column has to be passed as on, because the second positional argument of merge is how.
With everything in a single dataframe, you will then be in a good position to perform the subtraction.
You neglected to put a reproducible example in your question.
Once you work out the technical details, describe the solution and the results that you obtain.
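One way to put both frames into a single DataFrame and still tell the rows apart is an outer merge with indicator=True. This is only a sketch under the assumption that dfA and dfB look like the lists in the question; the names `only_A` and `only_B` are illustrative.

```python
import pandas as pd

# Assumed reconstruction of the question's frames.
dfA = pd.DataFrame({"Country": ["United States", "Puerto Rico", "Colombia"]})
dfB = pd.DataFrame({"Country": ["Puerto Rico", "Argentina", "Canada"]})

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only" or "both" for each row.
merged = dfA.merge(dfB, on="Country", how="outer", indicator=True)

only_A = merged.loc[merged["_merge"] == "left_only", "Country"]
only_B = merged.loc[merged["_merge"] == "right_only", "Country"]
print(sorted(only_A), sorted(only_B))
```

The row order of an outer merge can differ between pandas versions, so the sketch sorts the results before printing them.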
I have two dataframes,
df1 =
Countries description                     Continents  values
C0001 also called America,                America     21tr
C0004 and C0003 are neighbhors            Europe      504 bn
on advancing C0005 with C0001.security    Europe      600bn
C0002, the smallest continent             Australi    1.7tr
df2 =
Countries  Id
US         C0001
Australia  C0002
Finland    C0003
Norway     C0004
Japan      C0005
df1 has a Countries description column, but codes are given instead of the actual country names.
df2 lists the countries with their codes.
I want to replace the country codes (like C0001, C0002) in df1 with their names, like this:
df1 =
Countries description                  Continents  values
US also called America, some..         America     21tr
Norway and Finland are neighbhors      Europe      504 bn
on advancing Japan with US.security    Europe      600bn
Australia, the smallest continent      Austral     1.7tr
I tried the pandas merge method, but that didn't work:
df3 = df1.merge(df2, on=['Countries'], how='left')
Thanks :)
Here is one way to approach it with replace. With regex=True, each Id is treated as a pattern and replaced wherever it occurs inside the text:
d = dict(zip(df2["Id"], df2["Countries"]))
df1["Countries description"] = df1["Countries description"].replace(d, regex=True)
Output :
print(df1)
Countries description Continents values
0 US also called America, America 21tr
1 Norway and Finland are neighbhors Europe 504 bn
2 on advancing Japan with US.security Europe 600bn
3 Australia, the smallest continent Australi 1.7tr
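Because the Ids are used as regular expressions, it may be worth escaping them and anchoring them on word boundaries if real codes could contain special characters or be prefixes of one another. The following is a hedged sketch of that variant, using a single pattern and a lookup function; `mapping` and `pattern` are illustrative names, and the data is retyped from the question.

```python
import re
import pandas as pd

df1 = pd.DataFrame({"Countries description": [
    "C0001 also called America,",
    "C0004 and C0003 are neighbhors",
    "on advancing C0005 with C0001.security",
    "C0002, the smallest continent",
]})
df2 = pd.DataFrame({
    "Countries": ["US", "Australia", "Finland", "Norway", "Japan"],
    "Id": ["C0001", "C0002", "C0003", "C0004", "C0005"],
})

mapping = dict(zip(df2["Id"], df2["Countries"]))

# One alternation of all escaped Ids, wrapped in \b so that only
# whole codes match (a hypothetical "C00011" would be left alone).
pattern = r"\b(?:" + "|".join(map(re.escape, mapping)) + r")\b"

# The callable looks up each matched code in the mapping.
df1["Countries description"] = df1["Countries description"].str.replace(
    pattern, lambda m: mapping[m.group(0)], regex=True
)
print(df1)
```

Doing the replacement in one pass also avoids a replaced name being rewritten again by a later pattern.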
I have a column in a df that I want to split into two columns on a comma delimiter. If the value in that column does not have a comma, I want to put it into the second column instead of the first.
Origin
New York, USA
England
Russia
London, England
California, USA
USA
I want the result to be:
Location    Country
New York    USA
NaN         England
NaN         Russia
London      England
California  USA
NaN         USA
I used this code
df['Location'], df['Country'] = df['Origin'].str.split(',', 1)
We can try using str.extract here:
df["Location"] = df["Origin"].str.extract(r'(.*),')
df["Country"] = df["Origin"].str.extract(r'(\w+(?: \w+)*)$')
Here is a way using str.extract() with named groups:
df['Origin'].str.extract(r'(?P<Location>[A-Za-z ]+(?=,))?(?:, )?(?P<Country>\w+)')
Output:
Location Country
0 New York USA
1 NaN England
2 NaN Russia
3 London England
4 California USA
5 NaN USA
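For comparison, the same result can probably be reached without regular expressions by splitting from the right. This is a sketch assuming the Origin values from the question; `parts` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({"Origin": ["New York, USA", "England", "Russia",
                              "London, England", "California, USA", "USA"]})

# Split from the right at most once, so the country is always the last piece.
parts = df["Origin"].str.rsplit(",", n=1)
df["Country"] = parts.str[-1].str.strip()

# A location exists only when the split produced two pieces;
# where() leaves NaN for the other rows.
df["Location"] = parts.str[0].where(parts.str.len() == 2)
print(df[["Location", "Country"]])
```

Note that n is passed by keyword here, which is the form current pandas versions expect for str.split and str.rsplit.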
I have started to learn pandas and am trying to analyze some data.
My data has a country column in which some rows contain several countries. I only want to take the first value and put it in a new column.
For example, the first row has Colombia,Mexico,United States, and I only want to keep the first one, Colombia ([0]), and drop the other countries ([1:x]). Is this possible?
I tried a few things like loc, iloc and drop(), but I hit a dead end, so I am asking here.
You can use Series.str.split:
df['country'] = df['country'].str.split(',').str[0]
Consider the df below, for example:
In [1520]: df = pd.DataFrame({'country':['Colombia, Mexico, US', 'Croatia, Slovenia, Serbia', 'Denmark', 'Denmark, Brazil']})
In [1521]: df
Out[1521]:
country
0 Colombia, Mexico, US
1 Croatia, Slovenia, Serbia
2 Denmark
3 Denmark, Brazil
In [1523]: df['country'] = df['country'].str.split(',').str[0]
In [1524]: df
Out[1524]:
country
0 Colombia
1 Croatia
2 Denmark
3 Denmark
Use .str.split():
df['country'] = df['country'].str.split(',',expand=True)[0]
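Since only the first value is needed, the split can be limited to one cut, and the result can go into a new column as the question asks. This is a small sketch; the column name `first_country` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Colombia, Mexico, US",
                               "Croatia, Slovenia, Serbia",
                               "Denmark",
                               "Denmark, Brazil"]})

# n=1 stops after the first comma, so each row is cut into at most
# two pieces; strip() removes any stray whitespace around the name.
df["first_country"] = df["country"].str.split(",", n=1).str[0].str.strip()
print(df)
```

Writing to a separate column keeps the original country values available for later checks.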
I have a list of countries I would like to get a count of in a data frame column.
ship_to_countries
Albania, Algeria, Azerbaijan, Bahrain, France, Georgia
Ireland, England, France, Germany
France, Germany,
Ireland
How can I create a column to the right which has the count of countries in pandas?
I've tried this solution, but I get a count of how many times a single country is listed.
So if Israel is in my column 16 times, I get 16. I'd only like to get how many countries are in each row.
(df['ship_to_countries'].str.count(',')
.add(1)
.groupby(df.ship_to_countries)
.sum())
Use str.split() and len:
df["count"] = df["ship_to_countries"].apply(lambda x: len(x.split(",")))
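One detail in the sample data is a row ending with a trailing comma ("France, Germany,"), which a plain split would count as three pieces. The sketch below is one possible way to ignore empty pieces, assuming the rows shown in the question.

```python
import pandas as pd

df = pd.DataFrame({"ship_to_countries": [
    "Albania, Algeria, Azerbaijan, Bahrain, France, Georgia",
    "Ireland, England, France, Germany",
    "France, Germany,",
    "Ireland",
]})

# Count only the pieces that still contain text after stripping
# whitespace, so a trailing comma does not add a phantom country.
df["count"] = df["ship_to_countries"].apply(
    lambda s: sum(1 for part in s.split(",") if part.strip())
)
print(df["count"].tolist())
```

If the column can contain missing values, they would need to be filled or skipped first, since the lambda assumes every entry is a string.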