Can I use python 're' to parse complex human names?


So one of my major pain points is name comprehension and piecing together household names & titles. I have an 80% solution with a pretty massive regex I put together this morning that I probably shouldn't be proud of (but am anyway, in a kind of sick way) that matches the following examples correctly:
John Jeffries
John Jeffries, M.D.
John Jeffries, MD
John Jeffries and Jim Smith
John and Jim Jeffries
John Jeffries & Jennifer Wilkes-Smith, DDS, MD
John Jeffries, CPA & Jennifer Wilkes-Smith, DDS, MD
John Jeffries, C.P.A & Jennifer Wilkes-Smith, DDS, MD
John Jeffries, C.P.A., MD & Jennifer Wilkes-Smith, DDS, MD
John Jeffries M.D. and Jennifer Holmes CPA
John Jeffries M.D. & Jennifer Holmes CPA
The regex matcher looks like this:
(?P<first_name>\S*\s*)?(?!and\s|&\s)(?P<last_name>[\w-]*\s*)(?P<titles1>,?\s*(?!and\s|&\s)[\w\.]*,*\s*(?!and\s|&\s)[\w\.]*)?(?P<connector>\sand\s|\s*&*\s*)?(?!and\s|&\s)(?P<first_name2>\S*\s*)(?P<last_name2>[\w-]*\s*)?(?P<titles2>,?\s*[\w\.]*,*\s*[\w\.]*)?
(wtf right?)
For convenience: http://www.pyregex.com/
So, for the example:
'John Jeffries, C.P.A., MD & Jennifer Wilkes-Smith, DDS, MD'
the regex results in a group dict that looks like:
connector: &
first_name: John
first_name2: Jennifer
last_name: Jeffries
last_name2: Wilkes-Smith
titles1: , C.P.A., MD
titles2: , DDS, MD
I need help with the final step that has been tripping me up, comprehending possible middle names.
Examples include:
'John Jimmy Jeffries, C.P.A., MD & Jennifer Wilkes-Smith, DDS, MD'
'John Jeffries, C.P.A., MD & Jennifer Jenny Wilkes-Smith, DDS, MD'
Is this possible and is there a better way to do this without machine learning? Maybe I can use nameparser (discovered after I went down the regex rabbit hole) instead with some way to determine whether or not there are multiple names? The above matches 99.9% of my cases so I feel like it's worth finishing.
TLDR: I can't figure out if I can use some sort of lookahead or lookbehind to make sure that the possible middle name only matches if there is a last name after it.
Note: I don't need to parse titles like Mr. Mrs. Ms., etc., but I suppose that can be added in the same manner as middle names.
Solution Notes: First, follow Richard's advice and don't do this. Second, investigate NLTK or use/contribute to nameparser for a more robust solution if necessary.
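For what it's worth, the lookahead asked about in the TLDR can be sketched for a single person as below. This is only an illustration, not the pattern from the question: `NAME_RE` and the `middle_name` group are names made up here, and it assumes titles are always introduced by a comma. Because the last name is mandatory, ordinary backtracking would give the same result without the lookahead; it mainly makes the intent explicit.

```python
import re

# Hypothetical single-person pattern (not the regex from the question).
# Assumption: titles, if present, are always introduced by a comma.
NAME_RE = re.compile(r"""
    ^(?P<first_name>[\w-]+)          # first name
    (?:\s+(?P<middle_name>[\w-]+)    # optional middle name ...
       (?=\s+[\w-]+)                 # ... only if another name word follows it
    )?
    \s+(?P<last_name>[\w-]+)         # last name (mandatory)
    (?P<titles>(?:,\s*[\w.]+)*)      # zero or more comma-separated titles
    \s*$
""", re.VERBOSE)

for text in ("John Jimmy Jeffries, C.P.A., MD",
             "Jennifer Wilkes-Smith, DDS, MD",
             "John Jeffries"):
    match = NAME_RE.match(text)
    print(text, "->", match.groupdict() if match else None)
```

Names whose titles are not preceded by a comma (for example "John Jeffries M.D.") do not match this sketch at all, which is one more reason to split the string into pieces first.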

Regular expressions like this are the work of the Dark One.
Who, looking at your code later, will be able to understand what is going on? Will you even?
How will you test all of the possible edge cases?
Why have you chosen to use a regular expression at all? If the tool you are using is so difficult to work with, it suggests that maybe another tool would be better.
Try this:
import re
examples = [
    "John Jeffries",
    "John Jeffries, M.D.",
    "John Jeffries, MD",
    "John Jeffries and Jim Smith",
    "John and Jim Jeffries",
    "John Jeffries & Jennifer Wilkes-Smith, DDS, MD",
    "John Jeffries, CPA & Jennifer Wilkes-Smith, DDS, MD",
    "John Jeffries, C.P.A & Jennifer Wilkes-Smith, DDS, MD",
    "John Jeffries, C.P.A., MD & Jennifer Wilkes-Smith, DDS, MD",
    "John Jeffries M.D. and Jennifer Holmes CPA",
    "John Jeffries M.D. & Jennifer Holmes CPA",
    'John Jimmy Jeffries, C.P.A., MD & Jennifer Wilkes-Smith, DDS, MD',
    'John Jeffries, C.P.A., MD & Jennifer Jenny Wilkes-Smith, DDS, MD'
]
def IsTitle(inp):
    # A title is a run of capital letters, each optionally followed by "."
    return re.match(r'^([A-Z]\.?)+$', inp.strip())
def ParseName(name):
    # Titles are separated from each other and from names with ","
    # We don't need these, so we remove them
    name = name.replace(',', ' ')
    # Split name and titles on spaces, combining adjacent spaces
    name = name.split()
    # Build an output object
    ret_name = {"first": None, "middle": None, "last": None, "titles": []}
    # First string is always a first name
    ret_name['first'] = name[0]
    if len(name) > 2:  # John Johnson Smith/MD
        if IsTitle(name[2]):  # John Smith MD
            ret_name['last'] = name[1]
            ret_name['titles'] = name[2:]
        else:  # John Johnson Smith, CPA, MD
            ret_name['middle'] = name[1]
            ret_name['last'] = name[2]
            ret_name['titles'] = name[3:]
    elif len(name) == 2:  # John Johnson
        ret_name['last'] = name[1]
    return ret_name
def CombineNames(inp):
    # "John and Jim Jeffries": the first person takes the second person's last name
    if len(inp) > 1 and not inp[0]['last']:
        inp[0]['last'] = inp[1]['last']
def ParseString(inp):
    inp = inp.replace("&", "and")  # Names are combined with "&" or "and"
    inp = re.split(r"\s+and\s+", inp)  # Split names apart
    inp = [ParseName(n) for n in inp]  # A list, not a lazy map, so it can be indexed
    CombineNames(inp)
    return inp
for e in examples:
    print(e)
    print(ParseString(e))
Output (as printed by Python 2; newer versions print the dictionary keys in insertion order):
John Jeffries
[{'middle': None, 'titles': [], 'last': 'Jeffries', 'first': 'John'}]
John Jeffries, M.D.
[{'middle': None, 'titles': ['M.D.'], 'last': 'Jeffries', 'first': 'John'}]
John Jeffries, MD
[{'middle': None, 'titles': ['MD'], 'last': 'Jeffries', 'first': 'John'}]
John Jeffries and Jim Smith
[{'middle': None, 'titles': [], 'last': 'Jeffries', 'first': 'John'}, {'middle': None, 'titles': [], 'last': 'Smith', 'first': 'Jim'}]
John and Jim Jeffries
[{'middle': None, 'titles': [], 'last': 'Jeffries', 'first': 'John'}, {'middle': None, 'titles': [], 'last': 'Jeffries', 'first': 'Jim'}]
John Jeffries & Jennifer Wilkes-Smith, DDS, MD
[{'middle': None, 'titles': [], 'last': 'Jeffries', 'first': 'John'}, {'middle': None, 'titles': ['DDS', 'MD'], 'last': 'Wilkes-Smith', 'first': 'Jennifer'}]
John Jeffries, CPA & Jennifer Wilkes-Smith, DDS, MD
[{'middle': None, 'titles': ['CPA'], 'last': 'Jeffries', 'first': 'John'}, {'middle': None, 'titles': ['DDS', 'MD'], 'last': 'Wilkes-Smith', 'first': 'Jennifer'}]
John Jeffries, C.P.A & Jennifer Wilkes-Smith, DDS, MD
[{'middle': None, 'titles': ['C.P.A'], 'last': 'Jeffries', 'first': 'John'}, {'middle': None, 'titles': ['DDS', 'MD'], 'last': 'Wilkes-Smith', 'first': 'Jennifer'}]
John Jeffries, C.P.A., MD & Jennifer Wilkes-Smith, DDS, MD
[{'middle': None, 'titles': ['C.P.A.', 'MD'], 'last': 'Jeffries', 'first': 'John'}, {'middle': None, 'titles': ['DDS', 'MD'], 'last': 'Wilkes-Smith', 'first': 'Jennifer'}]
John Jeffries M.D. and Jennifer Holmes CPA
[{'middle': None, 'titles': ['M.D.'], 'last': 'Jeffries', 'first': 'John'}, {'middle': None, 'titles': ['CPA'], 'last': 'Holmes', 'first': 'Jennifer'}]
John Jeffries M.D. & Jennifer Holmes CPA
[{'middle': None, 'titles': ['M.D.'], 'last': 'Jeffries', 'first': 'John'}, {'middle': None, 'titles': ['CPA'], 'last': 'Holmes', 'first': 'Jennifer'}]
John Jimmy Jeffries, C.P.A., MD & Jennifer Wilkes-Smith, DDS, MD
[{'middle': 'Jimmy', 'titles': ['C.P.A.', 'MD'], 'last': 'Jeffries', 'first': 'John'}, {'middle': None, 'titles': ['DDS', 'MD'], 'last': 'Wilkes-Smith', 'first': 'Jennifer'}]
John Jeffries, C.P.A., MD & Jennifer Jenny Wilkes-Smith, DDS, MD
[{'middle': None, 'titles': ['C.P.A.', 'MD'], 'last': 'Jeffries', 'first': 'John'}, {'middle': 'Jenny', 'titles': ['DDS', 'MD'], 'last': 'Wilkes-Smith', 'first': 'Jennifer'}]
This took less than fifteen minutes and, at each stage, the logic is clear and the program can be debugged in pieces. While one-liners are cute, clarity and testability should take precedence.

Related

Find out the most popular male/female name from dataframe

The solution that came to my mind is:
dataset['Name'].loc[dataset['Sex'] == 'female'].value_counts().idxmax()
But it is not that simple, because for married women the husband's name comes after "Mrs." and I need to split it out somehow.
Input data:
df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris', 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'Heikkinen, Miss. Laina', 'Futrelle, Mrs. Jacques Heath (Lily May Peel)', 'Allen, Mr. William Henry', 'Moran, Mr. James', 'McCarthy, Mr. Timothy J', 'Palsson, Master. Gosta Leonard', 'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)', 'Nasser, Mrs. Nicholas (Adele Achem)'],
'Sex': ['male', 'female', 'female', 'female', 'male', 'male', 'male', 'male', 'female', 'female'],
})
Task 4: Name the most popular female name on the ship.
'some code'
Output: Anna #The most popular female name
Task 5: Name the most popular male name on the ship.
'some code'
Output: Wilhelm #The most popular male name

Quick and dirty would be something like:
from collections import Counter
# Random list of names
your_lst = ["Mrs Braun", "Allen, Mr. Timothy J", "Allen, Mr. Henry William"]
# Split names by space, and flatten the list.
your_lst_flat = [item for sublist in [x.split(" ") for x in your_lst ] for item in sublist]
# Count occurrences. With this you will get a count of all the values, including Mr and Mrs. But you can just ignore these.
Counter(your_lst_flat).most_common()

IIUC, you can use a regex to extract either the first name or, for "Mrs.", the first name inside the parentheses:
s = df['Name'].str.extract(r'((?:(?<=Mr. )|(?<=Miss. )|(?<=Master. ))\w+|(?<=\()\w+)',
expand=False)
s.groupby(df['Sex']).value_counts()
output:
Sex Name
female Adele 1
Elisabeth 1
Florence 1
Laina 1
Lily 1
male Gosta 1
James 1
Owen 1
Timothy 1
William 1
Name: Name, dtype: int64
once you have s, to get the most frequent female name(s):
s[df['Sex'].eq('female')].mode()
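To see what the pattern does without pandas, here is a minimal sketch using only `re` and `collections.Counter` on the sample rows from the question. The helper name `first_name` is made up for this example, and the dots are escaped here so that "Mr." only matches a literal period; a "Mrs." entry without parentheses would yield `None`.

```python
import re
from collections import Counter

names = ['Braund, Mr. Owen Harris',
         'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
         'Heikkinen, Miss. Laina',
         'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
         'Allen, Mr. William Henry',
         'Moran, Mr. James',
         'McCarthy, Mr. Timothy J',
         'Palsson, Master. Gosta Leonard',
         'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)',
         'Nasser, Mrs. Nicholas (Adele Achem)']
sexes = ['male', 'female', 'female', 'female', 'male',
         'male', 'male', 'male', 'female', 'female']

# First word after "Mr. ", "Miss. " or "Master. ", else first word after "(".
FIRST_NAME_RE = re.compile(
    r'(?:(?<=Mr\. )|(?<=Miss\. )|(?<=Master\. ))\w+|(?<=\()\w+')

def first_name(full_name):
    """Return the extracted first name, or None if nothing matches."""
    match = FIRST_NAME_RE.search(full_name)
    return match.group(0) if match else None

# One Counter of first names per sex.
counts = {}
for name, sex in zip(names, sexes):
    counts.setdefault(sex, Counter())[first_name(name)] += 1

for sex, counter in counts.items():
    print(sex, counter.most_common())
```

In this small sample every name occurs once, so there is no single most popular name; on the full data set, `most_common(1)` would give the top entry per sex.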

Remove unwanted parts from strings in Dataframe

I am looking for an efficient way to remove unwanted parts from strings in a DataFrame column.
My dataframe:
Passengers
1 Sally Muller, President, Mark Smith, Vicepresident, John Doe, Chief of Staff
2 Sally Muller, President, Mark Smith, Vicepresident
3 Sally Muller, President, Mark Smith, Vicepresident, John Doe, Chief of Staff
4 Mark Smith, Vicepresident, John Doe, Chief of Staff, Peter Parker, Special Effects
5 Sally Muller, President, John Doe, Chief of Staff, Peter Parker, Special Effects, Lydia Johnson, Vice Chief of Staff
...
desired form of df:
Passengers
1 Sally Muller, Mark Smith, John Doe
2 Sally Muller, Mark Smith
3 Sally Muller, Mark Smith, John Doe
4 Mark Smith, John Doe, Peter Parker
5 Sally Muller, John Doe, Peter Parker, Lydia Johnson
...
Up to now I did it with endless handmade copy/paste regex list:
df = df.replace(r'President,','', regex=True)
df = df.replace(r'Vicepresident,','', regex=True)
df = df.replace(r'Chief of Staff,','', regex=True)
df = df.replace(r'Special Effects,','', regex=True)
df = df.replace(r'Vice Chief of Staff,','', regex=True)
...
Is there a more comfortable way to do this?
Edit
More accurate example of original df:
Passengers
1 Sally Muller, President, EL Mark Smith, John Doe, Chief of Staff, Peter Gordon, Director of Central Command
2 Sally Muller, President, EL Mark Smith, Vicepresident
3 Sally Muller, President, EL Mark Smith, Vicepresident, John Doe, Chief of Staff, Peter Gordon, Dir CC
4 Mark Smith, Vicepresident, John Doe, Chief of Staff, Peter Parker, Special Effects
5 President Sally Muller, John Doe Chief of Staff, Peter Parker, Special Effects, Lydia Johnson , Vice Chief of Staff
...
desired form of df:
Passengers
1 Sally Muller, Mark Smith, John Doe, Peter Gordon
2 Sally Muller, Mark Smith
3 Sally Muller, Mark Smith, John Doe, Peter Gordon
4 Mark Smith, John Doe, Peter Parker
5 Sally Muller, John Doe, Peter Parker, Lydia Johnson
...
Up to now I did it with endless handmade copy/paste regex list:
df = df.replace(r'President','', regex=True)
df = df.replace(r'Director of Central Command,','', regex=True)
df = df.replace(r'Dir CC','', regex=True)
df = df.replace(r'Vicepresident','', regex=True)
df = df.replace(r'Chief of Staff','', regex=True)
df = df.replace(r'Special Effects','', regex=True)
df = df.replace(r'Vice Chief of Staff','', regex=True)
...
messy output is like:
Passengers
1 Sally Muller, , Mark Smith, John Doe, , Peter Gordon,
2 Sally Muller, Mark Smith,
3 Sally Muller, Mark Smith,, John Doe, Peter Gordon
4 Mark Smith, John Doe, Peter Parker
5 Sally Muller,, John Doe, Peter Parker , Lydia Johnson,
...

If every passenger has a title, you can use str.split + explode, select every second item starting from the first, then group by the index and join back (splitting on ', ' rather than ',' avoids stray leading spaces):
out = df['Passengers'].str.split(', ').explode()[::2].groupby(level=0).agg(', '.join)
or use str.split and apply a lambda that does the selection and the join:
out = df['Passengers'].str.split(', ').apply(lambda x: ', '.join(x[::2]))
Output:
0 Sally Muller, Mark Smith, John Doe
1 Sally Muller, Mark Smith
2 Sally Muller, Mark Smith, John Doe
3 Mark Smith, John Doe, Peter Parker
4 Sally Muller, John Doe, Peter Parker, Lydia...
Edit:
If not everyone has a title, you can create a set of titles, split each row, and filter out the titles. If the order of the names in each row doesn't matter, you can use set difference and cast each set to a list in a list comprehension:
titles = {'President', 'Vicepresident', 'Chief of Staff', 'Special Effects', 'Vice Chief of Staff'}
out = pd.Series([list(set(x.split(', ')) - titles) for x in df['Passengers']])
If order matters, then you can use a nested list comprehension:
out = pd.Series([[i for i in x.split(', ') if i not in titles] for x in df['Passengers']])

This is one case where apply is actually faster than explode:
df2 = df['Passengers'].apply(lambda x: ', '.join(x.split(', ')[::2])) #.to_frame() # if dataframe needed
output:
Passengers
0 Sally Muller, Mark Smith, John Doe
1 Sally Muller, Mark Smith
2 Sally Muller, Mark Smith, John Doe
3 Mark Smith, John Doe, Peter Parker
4 Sally Muller, John Doe, Peter Parker, Lydia Jo...

We can create a single regex pattern that matches every string you need to remove, and replace the matches with an empty string.
This also handles situations where a passenger does not have a title.
df2 = df['Passengers'].str.replace("(President)|(Vicepresident)|(Chief of Staff)|(Special Effects)|(Vice Chief of Staff)", "",regex=True).replace("( ,)", "", regex=True).str.strip().str.rstrip(",")
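As a rough sketch of the same idea, the alternation can be built from a list of titles instead of being written by hand, and the leftover empty fields can be dropped by splitting on commas afterwards. The names `TITLES`, `TITLE_RE` and `strip_titles` are made up for this example, and treating "EL" as a removable prefix is an assumption based on the desired output in the question.

```python
import re

# Titles from the question; "EL" is assumed to be a removable prefix.
TITLES = ["President", "Vicepresident", "Chief of Staff", "Special Effects",
          "Vice Chief of Staff", "Director of Central Command", "Dir CC", "EL"]

# Longest first, so a title that is a prefix of a longer one cannot shadow it.
TITLE_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(t) for t in sorted(TITLES, key=len, reverse=True))
    + r")\b"
)

def strip_titles(row):
    """Remove known titles, then drop the empty fields left behind."""
    without_titles = TITLE_RE.sub("", row)
    parts = [part.strip() for part in without_titles.split(",")]
    return ", ".join(part for part in parts if part)

row = ("President Sally Muller, John Doe Chief of Staff, Peter Parker, "
       "Special Effects, Lydia Johnson , Vice Chief of Staff")
print(strip_titles(row))
```

On a DataFrame this could be applied with `df['Passengers'].apply(strip_titles)`. Any title that is not in the list is left in place, so the list still has to be maintained.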

json_normalize with multiple record paths

I'm using the example from the json_normalize documentation (pandas.json_normalize, pandas 1.0.3 documentation). Unfortunately I can't paste my actual JSON, but this example works. Pasted from the documentation:
data = [{'state': 'Florida',
'shortname': 'FL',
'info': {'governor': 'Rick Scott'},
'counties': [{'name': 'Dade', 'population': 12345},
{'name': 'Broward', 'population': 40000},
{'name': 'Palm Beach', 'population': 60000}]},
{'state': 'Ohio',
'shortname': 'OH',
'info': {'governor': 'John Kasich'},
'counties': [{'name': 'Summit', 'population': 1234},
{'name': 'Cuyahoga', 'population': 1337}]}]
result = json_normalize(data, 'counties', ['state', 'shortname',
['info', 'governor']])
result
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Broward 40000 Florida FL Rick Scott
2 Palm Beach 60000 Florida FL Rick Scott
3 Summit 1234 Ohio OH John Kasich
4 Cuyahoga 1337 Ohio OH John Kasich
What if the JSON was the one below instead where info is an array instead of a dict:
data = [{'state': 'Florida',
'shortname': 'FL',
'info': [{'governor': 'Rick Scott'},
{'governor': 'Rick Scott 2'}],
'counties': [{'name': 'Dade', 'population': 12345},
{'name': 'Broward', 'population': 40000},
{'name': 'Palm Beach', 'population': 60000}]},
{'state': 'Ohio',
'shortname': 'OH',
'info': [{'governor': 'John Kasich'},
{'governor': 'John Kasich 2'}],
'counties': [{'name': 'Summit', 'population': 1234},
{'name': 'Cuyahoga', 'population': 1337}]}]
How would you get the following output using json_normalize:
name population state shortname info.governor
0 Dade 12345 Florida FL Rick Scott
1 Dade 12345 Florida FL Rick Scott 2
2 Broward 40000 Florida FL Rick Scott
3 Broward 40000 Florida FL Rick Scott 2
4 Palm Beach 60000 Florida FL Rick Scott
5 Palm Beach 60000 Florida FL Rick Scott 2
6 Summit 1234 Ohio OH John Kasich
7 Summit 1234 Ohio OH John Kasich 2
8 Cuyahoga 1337 Ohio OH John Kasich
9 Cuyahoga 1337 Ohio OH John Kasich 2
Or if there is another way to do it, please do let me know.

json_normalize is designed for convenience rather than flexibility. It can't handle all forms of JSON out there (and JSON is just too flexible to write a universal parser for).
How about calling json_normalize twice and then merging the results? This assumes each state appears only once in your JSON:
counties = json_normalize(data, 'counties', ['state', 'shortname'])
governors = json_normalize(data, 'info', ['state'])
result = counties.merge(governors, on='state')
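If pandas is not required for the flattening step itself, another way is to build the rows in plain Python. The sketch below pairs every county with every `info` entry of the same state using `itertools.product`; the function name `cross_rows` is made up for this example, and the column names are copied from the desired output in the question.

```python
from itertools import product

data = [{'state': 'Florida',
         'shortname': 'FL',
         'info': [{'governor': 'Rick Scott'},
                  {'governor': 'Rick Scott 2'}],
         'counties': [{'name': 'Dade', 'population': 12345},
                      {'name': 'Broward', 'population': 40000},
                      {'name': 'Palm Beach', 'population': 60000}]},
        {'state': 'Ohio',
         'shortname': 'OH',
         'info': [{'governor': 'John Kasich'},
                  {'governor': 'John Kasich 2'}],
         'counties': [{'name': 'Summit', 'population': 1234},
                      {'name': 'Cuyahoga', 'population': 1337}]}]

def cross_rows(records):
    """Return one flat dict per (county, info) pair within each state."""
    rows = []
    for rec in records:
        for county, info in product(rec['counties'], rec['info']):
            rows.append({'name': county['name'],
                         'population': county['population'],
                         'state': rec['state'],
                         'shortname': rec['shortname'],
                         'info.governor': info['governor']})
    return rows

rows = cross_rows(data)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` if a DataFrame is needed. Unlike the merge on `state`, this does not rely on each state appearing only once.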

How to merge the list of dictionary having keys and values [duplicate]

I have the list of dictionaries below and want to merge all of them into one dictionary of lists:
r = [{'Name': 'Dr. Tajwar Aamir MD',
'Specialised and Location': 'Pediatrics, Princeton, NJ'},
{'Name': 'Dr. Bernard Aaron', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Joseph Aaron MD',
'Specialised and Location': 'Internal Medicine, Short Hills, NJ'},
{'Name': 'Dr. Michael Aaron DO',
'Specialised and Location': 'Cardiology, Neptune, NJ'}]
result = {"Name": [], "Specialised and Location": [] for i in r}
#result["Name"].append(Name)
#result["Specialised and Location"].append(Specialised and Location)
Desired Output
{"Name":['Dr. Tajwar Aamir MD','Dr. Bernard Aaron','Dr. Joseph Aaron MD','Dr. Michael Aaron DO'],
"Specialised and Location":['Pediatrics, Princeton, NJ','Health','Internal Medicine, Short Hills, NJ','Cardiology, Neptune, NJ']}

Use a simple for-loop
Ex:
r = [{'Name': 'Dr. Tajwar Aamir MD',
'Specialised and Location': 'Pediatrics, Princeton, NJ'},
{'Name': 'Dr. Bernard Aaron', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Joseph Aaron MD',
'Specialised and Location': 'Internal Medicine, Short Hills, NJ'},
{'Name': 'Dr. Michael Aaron DO',
'Specialised and Location': 'Cardiology, Neptune, NJ'}]
result = {"Name": [], "Specialised and Location": []}
for i in r:
    result["Name"].append(i["Name"])
    result["Specialised and Location"].append(i["Specialised and Location"])
print(result)
or
result = {"Name": list(map(lambda d: d['Name'], r)), "Specialised and Location": list(map(lambda d: d['Specialised and Location'], r))}
Output:
{'Name': ['Dr. Tajwar Aamir MD',
'Dr. Bernard Aaron',
'Dr. Joseph Aaron MD',
'Dr. Michael Aaron DO'],
'Specialised and Location': ['Pediatrics, Princeton, NJ',
'Health',
'Internal Medicine, Short Hills, NJ',
'Cardiology, Neptune, NJ']}

r = [{'Name': 'Dr. Tajwar Aamir MD',
'Specialised and Location': 'Pediatrics, Princeton, NJ'},
{'Name': 'Dr. Bernard Aaron', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Joseph Aaron MD',
'Specialised and Location': 'Internal Medicine, Short Hills, NJ'},
{'Name': 'Dr. Michael Aaron DO',
'Specialised and Location': 'Cardiology, Neptune, NJ'}]
result={'Name':[x['Name'] for x in r], 'Specialised and Location':[x['Specialised and Location'] for x in r]}
print(result)
Output
{'Name': ['Dr. Tajwar Aamir MD', 'Dr. Bernard Aaron', 'Dr. Joseph Aaron MD', 'Dr. Michael Aaron DO'], 'Specialised and Location': ['Pediatrics, Princeton, NJ', 'Health', 'Internal Medicine, Short Hills, NJ', 'Cardiology, Neptune, NJ']}

you can try:
from collections import defaultdict
my_dict = defaultdict(list)
for k, v in (item for e in r for item in e.items()):
    my_dict[k].append(v)
print(dict(my_dict))
# output:
# {'Name': ['Dr. Tajwar Aamir MD', 'Dr. Bernard Aaron', 'Dr. Joseph Aaron MD', 'Dr. Michael Aaron DO'], 'Specialised and Location': ['Pediatrics, Princeton, NJ', 'Health', 'Internal Medicine, Short Hills, NJ', 'Cardiology, Neptune, NJ']}

pretty nested dictionary as a table

Is there any way to pretty print in a table format a nested dictionary? My data structure looks like this;
data = {'01/09/16': {'In': ['Jack'], 'Out': ['Lisa', 'Tom', 'Roger', 'Max', 'Harry', 'Same', 'Joseph', 'Luke', 'Mohammad', 'Sammy']},
'02/09/16': {'In': ['Jack', 'Lisa', 'Rache', 'Allan'], 'Out': ['Lisa', 'Tom']},
'03/09/16': {'In': ['James', 'Jack', 'Nowel', 'Harry', 'Timmy'], 'Out': ['Lisa', 'Tom
And I'm trying to print it out something like this (the names are kept in one line). Note that the names are listed below one another:
+----------------------------------+-------------+-------------+-------------+
| Status | 01/09/16 | 02/09/16 | 03/09/16 |
+----------------------------------+-------------+-------------+-------------+
| In | Jack Tom Tom
| Lisa | Jack |
+----------------------------------+-------------+-------------+-------------+
| Out | Lisa
Tom | Jack | Lisa |
+----------------------------------+-------------+-------------+-------------+
I've tried using pandas with this code;
pd.set_option('display.max_colwidth', -1)
df = pd.DataFrame(role_assignment)
df.fillna('None', inplace=True)
print df
But the problem above is that pandas prints it like this (The names are printed in a single line and it doesn't look good, especially if there's a lot of names);
01/09/16 \
In [Jack]
Out [Lisa, Tom, Roger, Max, Harry, Same, Joseph, Luke, Mohammad, Sammy]
02/09/16 03/09/16
In [Jack, Lisa, Rache, Allan] [James, Jack, Nowel, Harry, Timmy]
Out [Lisa, Tom] [Lisa, Tom]
I prefer this but names listed below one another;
01/09/16 02/09/16 03/09/16
In [Jack] [Jack] [James]
Out [Lisa] [Lisa] [Lisa]
Is there a way to print it neater using pandas or another tool?

This is hackery, for display purposes only.
data = {
'01/09/16': {
'In': ['Jack'],
'Out': ['Lisa', 'Tom', 'Roger',
'Max', 'Harry', 'Same',
'Joseph', 'Luke', 'Mohammad', 'Sammy']
},
'02/09/16': {
'In': ['Jack', 'Lisa', 'Rache', 'Allan'],
'Out': ['Lisa', 'Tom']
},
'03/09/16': {
'In': ['James', 'Jack', 'Nowel', 'Harry', 'Timmy'],
'Out': ['Lisa', 'Tom']
}
}
df = pd.DataFrame(data)
d1 = df.stack().apply(pd.Series).stack().unstack(1).fillna('')
d1.index.set_levels([''] * len(d1.index.levels[1]), level=1, inplace=True)
print(d1)
01/09/16 02/09/16 03/09/16
In Jack Jack James
Lisa Jack
Rache Nowel
Allan Harry
Timmy
Out Lisa Lisa Lisa
Tom Tom Tom
Roger
Max
Harry
Same
Joseph
Luke
Mohammad
Sammy
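A similar layout can also be produced without pandas. The sketch below is one possible approach using `itertools.zip_longest` to line the names up row by row; `format_table` and its `width` parameter are made up for this example, and the data is a shortened version of the dictionary above.

```python
from itertools import zip_longest

# Shortened sample of the data from the question.
data = {
    '01/09/16': {'In': ['Jack'], 'Out': ['Lisa', 'Tom', 'Roger']},
    '02/09/16': {'In': ['Jack', 'Lisa'], 'Out': ['Lisa', 'Tom']},
}

def format_table(data, width=10):
    """Return the nested dict as fixed-width text, one name per line."""
    dates = list(data)
    header = ["Status"] + dates
    lines = ["".join(cell.ljust(width) for cell in header).rstrip()]
    # Collect the statuses ("In", "Out", ...) in first-seen order.
    statuses = []
    for day in data.values():
        for status in day:
            if status not in statuses:
                statuses.append(status)
    for status in statuses:
        columns = [data[date].get(status, []) for date in dates]
        # zip_longest pads the shorter columns with empty strings.
        for i, names in enumerate(zip_longest(*columns, fillvalue="")):
            label = status if i == 0 else ""
            row = "".join(cell.ljust(width) for cell in (label, *names))
            lines.append(row.rstrip())
    return "\n".join(lines)

print(format_table(data))
```

Names longer than `width` would run into the next column, so the width has to be chosen to fit the longest name.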
