I have a Series containing a column with names and their nationalities in parentheses.
I want this column to contain just the individual's nationality, without the parentheses, and with the same index.
0 LOMBARDI Domingo (URU)
1 MACIAS Jose (ARG)
2 TEJADA Anibal (URU)
3 WARNKEN Alberto (CHI)
4 REGO Gilberto (BRA)
5 CRISTOPHE Henry (BEL)
6 MATEUCCI Francisco (URU)
7 MACIAS Jose (ARG)
8 LANGENUS Jean (BEL)
9 TEJADA Anibal (URU)
10 SAUCEDO Ulises (BOL)
I have tried applying .split(' ')[2] to the Series,
but got the error "'Series' object has no attribute 'split'."
You need to use the str accessor on the Series.
df.name.str.split('(').str[1].str[:-1]
Output:
0 URU
1 ARG
2 URU
3 CHI
4 BRA
5 BEL
6 URU
7 ARG
8 BEL
9 URU
10 BOL
Name: name, dtype: object
Using extract
s.str.extract(r'.*\((.*)\).*', expand=True)[0]
Out[463]:
0 URU
1 ARG
2 URU
3 CHI
Name: 0, dtype: object
Using slice. This may not be optimal, as it assumes the right-hand side of the string has a fixed width, but it's another possible solution.
df.name.str.slice(start = -4).str[:-1]
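For reference, a minimal self-contained sketch that rebuilds a small piece of the sample Series and compares the three approaches (the variable name s and the three sample rows are just for illustration):

import pandas as pd

# A few sample rows, reconstructed by hand for illustration
s = pd.Series(['LOMBARDI Domingo (URU)', 'MACIAS Jose (ARG)', 'TEJADA Anibal (URU)'])

# 1) split on '(' and drop the trailing ')'
print(s.str.split('(').str[1].str[:-1])

# 2) extract the text between the parentheses with a regex
print(s.str.extract(r'.*\((.*)\).*', expand=True)[0])

# 3) slice the last four characters and drop the ')'
print(s.str.slice(start=-4).str[:-1])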
Related
Hello, I have this Pandas code (see below), but it turns out it gives me this error: TypeError: can only concatenate str (not "int") to str
import pandas as pd
import numpy as np
import os
_data0 = pd.read_excel("C:\\Users\\HP\\Documents\\DataScience task\\Gender_Age.xlsx")
_data0['Age' + 1]  # this line raises the TypeError
I want to change the values in the 'Age' column. Imagine I wanted to increase the values in 'Age'; how do I do that? (With 'Number of Children' as well.)
The output I wanted:
First Name Last Name Age Number of Children
0 Kimberly Watson 36 2
1 Victor Wilson 35 6
2 Adrian Elliott 35 2
3 Richard Bailey 36 5
4 Blake Roberts 35 6
Original output:
First Name Last Name Age Number of Children
0 Kimberly Watson 24 1
1 Victor Wilson 23 5
2 Adrian Elliott 23 1
3 Richard Bailey 24 4
4 Blake Roberts 23 5
The error comes from 'Age' + 1, which tries to concatenate the integer 1 onto the string column label instead of adding it to the column's values. Operate on the column itself. To get from the original output to the one you want, 'Age' goes up by 12 and 'Number of Children' by 1:
df['Age'] = df['Age'] + 12
df['Number of Children'] = df['Number of Children'] + 1
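As a sanity check, here is a small sketch with a made-up frame standing in for the Excel file (the real data lives at a local path, so the values below are invented):

import pandas as pd

# Stand-in for the data read from Gender_Age.xlsx
df = pd.DataFrame({'Age': [24, 23, 23], 'Number of Children': [1, 5, 1]})

# 'Age' + 1 adds an int to the string column label and raises the TypeError;
# df['Age'] + 1 adds 1 to every value in the column instead.
df['Age'] = df['Age'] + 12
df['Number of Children'] = df['Number of Children'] + 1
print(df)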
The below is my dataframe :
Sno Name Region Num
0 1 Rubin Indore 79744001550
1 2 Rahul Delhi 89824304549
2 3 Rohit Noida 91611611478
3 4 Chirag Delhi 85879761557
4 5 Shan Bharat 95604535786
5 6 Jordi Russia 80777784005
6 7 El Russia 70008700104
7 8 Nino Spain 87707101233
8 9 Mark USA 98271377772
9 10 Pattinson Hawk Eye 87888888889
Retrieve the numbers from the given CSV file and store them region-wise.
delhi_list = []
for i in range(len(data)):
    if data.loc[i]['Region'] == 'Delhi':
        delhi_list.append(data.loc[i]['Num'])
I am getting the results, but I want to collect the data in a dictionary in Python. Can I do that?
IIUC, you can use groupby, apply the list aggregation, then use to_dict:
data.groupby('Region')['Num'].apply(list).to_dict()
[out]
{'Bharat': [95604535786],
'Delhi': [89824304549, 85879761557],
'Hawk Eye': [87888888889],
'Indore': [79744001550],
'Noida': [91611611478],
'Russia': [80777784005, 70008700104],
'Spain': [87707101233],
'USA': [98271377772]}
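If you would rather keep an explicit loop close to what you wrote, a sketch with collections.defaultdict builds the same region-to-numbers dictionary (it assumes the DataFrame is still called data):

from collections import defaultdict

region_nums = defaultdict(list)
# itertuples avoids the repeated .loc lookups of the original loop
for row in data.itertuples(index=False):
    region_nums[row.Region].append(row.Num)

print(dict(region_nums))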
I have the following dataframe with names of people and their abbreviations. The aim is to perform name disambiguation:
Names Abb
0 Michaele Frendu [Mic, Fre]
1 Lucam Zamit [Luc, Zam]
2 magistro Johanne Luckys [Joh, Luc]
3 Albano Fava [Alb, Fav]
4 Augustino Bagliu [Aug, Bag]
5 Lucas Zamit [Luc, Zam]
6 Jngabellavit [Jng]
7 Micheli Frendu [Mic, Fre]
8 Luce [Luc]
9 Far [Far]
Can I group by the list column, i.e. rows 0 and 7, and rows 1 and 5? Later on I was going to do something similar with just the first names.
If you want to group by a list column, you need to convert the column to tuples first:
def func(x):
    print(x)
    # some code
    return x

df1 = df.groupby(df['Abb'].apply(tuple)).apply(func)
Names Abb
3 Albano Fava [Alb, Fav]
Names Abb
3 Albano Fava [Alb, Fav]
Names Abb
4 Augustino Bagliu [Aug, Bag]
Names Abb
9 Far [Far]
Names Abb
6 Jngabellavit [Jng]
Names Abb
2 magistro Johanne Luckys [Joh, Luc]
Names Abb
8 Luce [Luc]
Names Abb
1 Lucam Zamit [Luc, Zam]
5 Lucas Zamit [Luc, Zam]
Names Abb
0 Michaele Frendu [Mic, Fre]
7 Micheli Frendu [Mic, Fre]
Or map:
df.groupby(df['Abb'].map(tuple)).do_something
This is necessary because lists aren't hashable objects.
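A minimal sketch of the tuple-key grouping, using only a few of the rows above and collecting the matching names per key (the aggregation into lists is just one possible follow-up):

import pandas as pd

df = pd.DataFrame({
    'Names': ['Michaele Frendu', 'Lucam Zamit', 'Lucas Zamit', 'Micheli Frendu'],
    'Abb': [['Mic', 'Fre'], ['Luc', 'Zam'], ['Luc', 'Zam'], ['Mic', 'Fre']],
})

# tuples are hashable, so they can act as group keys where lists cannot
grouped = df.groupby(df['Abb'].apply(tuple))['Names'].apply(list)
print(grouped)
# keys ('Luc', 'Zam') and ('Mic', 'Fre') each map to their two matching names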
I was going through this question, where Ted Petrou explains the difference between .transform and .apply.
This is the DataFrame used
df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})
State a b
0 Texas 4 6
1 Texas 5 10
2 Florida 1 3
3 Florida 3 11
The function inspect is defined as:
def inspect(x):
    print(x)
When I call the inspect function using apply, I get 3 DataFrames printed instead of 2:
df.groupby('State').apply(lambda x:inspect(x))
State a b
2 Florida 1 3
3 Florida 3 11
State a b
2 Florida 1 3
3 Florida 3 11
State a b
0 Texas 4 6
1 Texas 5 10
Why am I getting 3 DataFrames instead of 2 when printing? I really want to know how the apply function works.
Thanks in advance.
From the docs:
In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
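A small sketch that makes the extra call visible by counting invocations; the exact count depends on your pandas version, since newer releases no longer evaluate the first group twice:

import pandas as pd

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3], 'b': [6, 10, 3, 11]})

calls = {'n': 0}

def counted(x):
    calls['n'] += 1
    return x['a'].sum()

df.groupby('State').apply(counted)
print(calls['n'])  # 3 on older pandas (first group evaluated twice), 2 on newer versions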
I have a dataframe with two columns: one is Date and the other is Location (object dtype). Below is the format of the Location column with values:
Date Location
1 07/12/1912 AtlantiCity, New Jersey
2 08/06/1913 Victoria, British Columbia, Canada
3 09/09/1913 Over the North Sea
4 10/17/1913 Near Johannisthal, Germany
5 03/05/1915 Tienen, Belgium
6 09/03/1915 Off Cuxhaven, Germany
7 07/28/1916 Near Jambol, Bulgeria
8 09/24/1916 Billericay, England
9 10/01/1916 Potters Bar, England
10 11/21/1916 Mainz, Germany
My requirement is to split Location on the "," separator and keep only the part after the last comma (e.g. New Jersey, Canada, Germany, England, etc.) in the Location column. I also have to check whether a value has only a single element (values with a single element contain no ",").
Is there a way I can do it with a built-in method, without looping over each and every row?
Sorry if the question is below standard; I am new to Python and still learning.
A straightforward way is to apply the split method to each element of the column and pick the last piece:
df.Location.apply(lambda x: x.split(",")[-1])
1 New Jersey
2 Canada
3 Over the North Sea
4 Germany
5 Belgium
6 Germany
7 Bulgeria
8 England
9 England
10 Germany
Name: Location, dtype: object
To check whether each cell has only one element, we can use the str.contains method on the column:
df.Location.str.contains(",")
1 True
2 True
3 False
4 True
5 True
6 True
7 True
8 True
9 True
10 True
Name: Location, dtype: bool
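One caveat: split(",")[-1] keeps the space that follows the comma. If that matters, a vectorized variant of the same idea (an addition here, not part of the original answer) strips it:

df['Location'] = df['Location'].str.split(',').str[-1].str.strip()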
We could try str.extract:
print(df['Location'].str.extract(r'([^,]+$)'))
#0 New Jersey
#1 Canada
#2 Over the North Sea
#3 Germany
#4 Belgium
#5 Germany
#6 Bulgeria
#7 England
#8 England
#9 Germany
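The same whitespace caveat applies here, since [^,]+ also matches the space after the comma; stripping afterwards is one simple fix (my addition, not from the answer above):

print(df['Location'].str.extract(r'([^,]+$)', expand=False).str.strip())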