I have a Series containing a column with names and their nationalities in parentheses.
I want this column to contain just the individual's nationality, without the parentheses, and with the same index.
0 LOMBARDI Domingo (URU)
1 MACIAS Jose (ARG)
2 TEJADA Anibal (URU)
3 WARNKEN Alberto (CHI)
4 REGO Gilberto (BRA)
5 CRISTOPHE Henry (BEL)
6 MATEUCCI Francisco (URU)
7 MACIAS Jose (ARG)
8 LANGENUS Jean (BEL)
9 TEJADA Anibal (URU)
10 SAUCEDO Ulises (BOL)
I have tried applying .split(' ')[2] to the Series,
but got the error "'Series' object has no attribute 'split'."
You need to use the str accessor on the Series.
df.name.str.split('(').str[1].str[:-1]
Output:
0 URU
1 ARG
2 URU
3 CHI
4 BRA
5 BEL
6 URU
7 ARG
8 BEL
9 URU
10 BOL
Name: name, dtype: object
Using extract
s.str.extract(r'.*\((.*)\).*', expand=True)[0]
Out[463]:
0 URU
1 ARG
2 URU
3 CHI
Name: 0, dtype: object
Using slice. This may not be optimal, as it assumes the right-hand side of the string has a fixed width, but it's another possible solution.
df.name.str.slice(start = -4).str[:-1]
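For reference, a minimal self-contained sketch that rebuilds a small piece of the sample Series and compares the three approaches (the variable name s and the three sample rows are just for illustration):

import pandas as pd

# A few sample rows, reconstructed by hand for illustration
s = pd.Series(['LOMBARDI Domingo (URU)', 'MACIAS Jose (ARG)', 'TEJADA Anibal (URU)'])

# 1) split on '(' and drop the trailing ')'
print(s.str.split('(').str[1].str[:-1])

# 2) extract the text between the parentheses with a regex
print(s.str.extract(r'.*\((.*)\).*', expand=True)[0])

# 3) slice the last four characters and drop the ')'
print(s.str.slice(start=-4).str[:-1])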
Related
Hello, I have this Pandas code (see below), but it turns out it gives me this error: TypeError: can only concatenate str (not "int") to str
import pandas as pd
import numpy as np
import os
_data0 = pd.read_excel("C:\\Users\\HP\\Documents\\DataScience task\\Gender_Age.xlsx")
_data0['Age' + 1]  # this line raises the TypeError
I want to change the values in the 'Age' column. Imagine I wanted to increase the values in 'Age'; how do I do that? (With 'Number of Children' as well.)
The output I wanted:
First Name Last Name Age Number of Children
0 Kimberly Watson 36 2
1 Victor Wilson 35 6
2 Adrian Elliott 35 2
3 Richard Bailey 36 5
4 Blake Roberts 35 6
Original output:
First Name Last Name Age Number of Children
0 Kimberly Watson 24 1
1 Victor Wilson 23 5
2 Adrian Elliott 23 1
3 Richard Bailey 24 4
4 Blake Roberts 23 5
The error comes from 'Age' + 1, which tries to concatenate the integer 1 onto the string column label instead of adding it to the column's values. Operate on the column itself. To get from the original output to the one you want, 'Age' goes up by 12 and 'Number of Children' by 1:
df['Age'] = df['Age'] + 12
df['Number of Children'] = df['Number of Children'] + 1
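As a sanity check, here is a small sketch with a made-up frame standing in for the Excel file (the real data lives at a local path, so the values below are invented):

import pandas as pd

# Stand-in for the data read from Gender_Age.xlsx
df = pd.DataFrame({'Age': [24, 23, 23], 'Number of Children': [1, 5, 1]})

# 'Age' + 1 adds an int to the string column label and raises the TypeError;
# df['Age'] + 1 adds 1 to every value in the column instead.
df['Age'] = df['Age'] + 12
df['Number of Children'] = df['Number of Children'] + 1
print(df)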
The below is my dataframe :
Sno Name Region Num
0 1 Rubin Indore 79744001550
1 2 Rahul Delhi 89824304549
2 3 Rohit Noida 91611611478
3 4 Chirag Delhi 85879761557
4 5 Shan Bharat 95604535786
5 6 Jordi Russia 80777784005
6 7 El Russia 70008700104
7 8 Nino Spain 87707101233
8 9 Mark USA 98271377772
9 10 Pattinson Hawk Eye 87888888889
Retrieve the numbers from the given CSV file and store them region-wise.
delhi_list = []
for i in range(len(data)):
    if data.loc[i]['Region'] == 'Delhi':
        delhi_list.append(data.loc[i]['Num'])
I am getting the results, but I want to collect the data in a dictionary in Python. Can I do that?
IIUC, you can use groupby, apply the list aggregation, then use to_dict:
data.groupby('Region')['Num'].apply(list).to_dict()
[out]
{'Bharat': [95604535786],
'Delhi': [89824304549, 85879761557],
'Hawk Eye': [87888888889],
'Indore': [79744001550],
'Noida': [91611611478],
'Russia': [80777784005, 70008700104],
'Spain': [87707101233],
'USA': [98271377772]}
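If you would rather keep an explicit loop close to what you wrote, a sketch with collections.defaultdict builds the same region-to-numbers dictionary (it assumes the DataFrame is still called data):

from collections import defaultdict

region_nums = defaultdict(list)
# itertuples avoids the repeated .loc lookups of the original loop
for row in data.itertuples(index=False):
    region_nums[row.Region].append(row.Num)

print(dict(region_nums))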
I have the following dataframe with names of people and their abbreviations. The aim is to perform name disambiguation:
Names Abb
0 Michaele Frendu [Mic, Fre]
1 Lucam Zamit [Luc, Zam]
2 magistro Johanne Luckys [Joh, Luc]
3 Albano Fava [Alb, Fav]
4 Augustino Bagliu [Aug, Bag]
5 Lucas Zamit [Luc, Zam]
6 Jngabellavit [Jng]
7 Micheli Frendu [Mic, Fre]
8 Luce [Luc]
9 Far [Far]
Can I group by the list column, i.e. rows 0 and 7, and rows 1 and 5? Later on I was going to do something similar with just the first names.
If you want to group by a list column, you need to convert the column to tuples first:
def func(x):
    print(x)
    # some code
    return x

df1 = df.groupby(df['Abb'].apply(tuple)).apply(func)
Names Abb
3 Albano Fava [Alb, Fav]
Names Abb
3 Albano Fava [Alb, Fav]
Names Abb
4 Augustino Bagliu [Aug, Bag]
Names Abb
9 Far [Far]
Names Abb
6 Jngabellavit [Jng]
Names Abb
2 magistro Johanne Luckys [Joh, Luc]
Names Abb
8 Luce [Luc]
Names Abb
1 Lucam Zamit [Luc, Zam]
5 Lucas Zamit [Luc, Zam]
Names Abb
0 Michaele Frendu [Mic, Fre]
7 Micheli Frendu [Mic, Fre]
Or map:
df.groupby(df['Abb'].map(tuple)).do_something
This is necessary because lists aren't hashable objects.
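A minimal sketch of the tuple-key grouping, using only a few of the rows above and collecting the matching names per key (the aggregation into lists is just one possible follow-up):

import pandas as pd

df = pd.DataFrame({
    'Names': ['Michaele Frendu', 'Lucam Zamit', 'Lucas Zamit', 'Micheli Frendu'],
    'Abb': [['Mic', 'Fre'], ['Luc', 'Zam'], ['Luc', 'Zam'], ['Mic', 'Fre']],
})

# tuples are hashable, so they can act as group keys where lists cannot
grouped = df.groupby(df['Abb'].apply(tuple))['Names'].apply(list)
print(grouped)
# keys ('Luc', 'Zam') and ('Mic', 'Fre') each map to their two matching names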
I was going through this question, where Ted Petrou explains the difference between .transform and .apply.
This is the DataFrame used
df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})
State a b
0 Texas 4 6
1 Texas 5 10
2 Florida 1 3
3 Florida 3 11
The function inspect is defined as:
def inspect(x):
    print(x)
When I call the inspect function using apply, I get 3 DataFrames printed instead of 2:
df.groupby('State').apply(lambda x:inspect(x))
State a b
2 Florida 1 3
3 Florida 3 11
State a b
2 Florida 1 3
3 Florida 3 11
State a b
0 Texas 4 6
1 Texas 5 10
Why am I getting 3 DataFrames instead of 2 when printing? I really want to know how the apply function works.
Thanks in advance.
From the docs:
In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
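A small sketch that makes the extra call visible by counting invocations; the exact count depends on your pandas version, since newer releases no longer evaluate the first group twice:

import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})

calls = {'n': 0}

def counted(x):
    calls['n'] += 1
    return x['a'].sum()

df.groupby('State').apply(counted)
print(calls['n'])  # 3 on older pandas (first group evaluated twice), 2 on newer versions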
I have a dataframe with two columns: one is Date and the other is Location (object dtype). Below is the format of the Location column with values:
Date Location
1 07/12/1912 AtlantiCity, New Jersey
2 08/06/1913 Victoria, British Columbia, Canada
3 09/09/1913 Over the North Sea
4 10/17/1913 Near Johannisthal, Germany
5 03/05/1915 Tienen, Belgium
6 09/03/1915 Off Cuxhaven, Germany
7 07/28/1916 Near Jambol, Bulgeria
8 09/24/1916 Billericay, England
9 10/01/1916 Potters Bar, England
10 11/21/1916 Mainz, Germany
My requirement is to split Location on the "," separator and keep only the part after the last comma (e.g. New Jersey, Canada, Germany, England, etc.) in the Location column. I also have to check whether a value has only a single element (values with a single element contain no ",").
Is there a way I can do it with a built-in method, without looping over each and every row?
Sorry if the question is below standard; I am new to Python and still learning.
A straightforward way is to apply the split method to each element of the column and pick the last piece:
df.Location.apply(lambda x: x.split(",")[-1])
1 New Jersey
2 Canada
3 Over the North Sea
4 Germany
5 Belgium
6 Germany
7 Bulgeria
8 England
9 England
10 Germany
Name: Location, dtype: object
To check whether each cell has only one element, we can use the str.contains method on the column:
df.Location.str.contains(",")
1 True
2 True
3 False
4 True
5 True
6 True
7 True
8 True
9 True
10 True
Name: Location, dtype: bool
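One caveat: split(",")[-1] keeps the space that follows the comma. If that matters, a vectorized variant of the same idea (an addition here, not part of the original answer) strips it:

df['Location'] = df['Location'].str.split(',').str[-1].str.strip()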
We could try str.extract:
print(df['Location'].str.extract(r'([^,]+$)'))
#0 New Jersey
#1 Canada
#2 Over the North Sea
#3 Germany
#4 Belgium
#5 Germany
#6 Bulgeria
#7 England
#8 England
#9 Germany
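The same whitespace caveat applies here, since [^,]+ also matches the space after the comma; stripping afterwards is one simple fix (my addition, not from the answer above):

print(df['Location'].str.extract(r'([^,]+$)', expand=False).str.strip())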