Python algorithm to find the full names in a text - python

I am trying to put together a simple algorithm that returns first and last names from a list containing a mix of single names and full names. For instance, from the list
l = ['John May', ' May', 'John', 'John Smith', 'Jack', 'John', 'May Smith', 'Sandra', 'Tim John', 'Simon', 'Tim Sandra', 'Sandra Smith']
I would like to do the following:
If a single first name or last name also appears in a full name, return only the full name(s) containing it.
If no full name contains it, return that single first name or last name as-is.
I wrote the following code to achieve that, but I have only tested it on a few cases. Can someone tell me how to make it more efficient and point out any issues with it?
l = ['John May', ' May', 'John', 'John Smith', 'John','May Smith', 'Sandra', 'Tim John', 'Tim Sandra', 'Sandra Smith', 'Simon', 'jack']
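def get_unique_full_name(l):
    t = list(set(l))  # drop exact duplicates (order is not preserved)
    print(t)
    person = []
    for i in range(0, len(t)):
        for j in range(0, len(t)):
            # one entry appears as a whole word inside the other
            if t[i] in t[j].split() or t[j] in t[i].split():
                if t[i] != t[j]:
                    # keep the longer of the two strings
                    temp = t[i] if len(t[i]) == max(len(t[i]), len(t[j])) else t[j]
                    person.append(temp)
    return person
print(get_unique_full_name(l))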
Returns (expected output):
['John', 'May Smith', ' May', 'Sandra', 'Tim Sandra', 'John May', 'Sandra Smith', 'Simon', 'Tim John', 'jack', 'John Smith']
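One possible way to express the two rules above without the nested loop is sketched below. It assumes that entries can be compared after stripping surrounding whitespace, that matching is case-sensitive, and that any entry with more than one word counts as a full name. The function name get_unique_full_names is made up for this illustration and is not from the original post.

```python
def get_unique_full_names(names):
    # Strip whitespace and drop duplicates while keeping first-seen order.
    cleaned = list(dict.fromkeys(n.strip() for n in names if n.strip()))
    # Assumption: any entry with more than one word counts as a full name.
    full_names = [n for n in cleaned if len(n.split()) > 1]
    # Every individual word that occurs inside some full name.
    covered = {word for full in full_names for word in full.split()}
    # Single names survive only if no full name contains them.
    singles = [n for n in cleaned if len(n.split()) == 1 and n not in covered]
    return full_names + singles


l = ['John May', ' May', 'John', 'John Smith', 'John', 'May Smith', 'Sandra',
     'Tim John', 'Tim Sandra', 'Sandra Smith', 'Simon', 'jack']
result = get_unique_full_names(l)
print(result)
```

Set membership tests are constant time on average, so this makes one pass to collect the words and one pass to filter, instead of comparing every pair of entries.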

Related

List of dictionaries into one dictionary with condition

I have a list of dictionaries:
foo = [{'name':'John Doe', 'customer':'a'},
{'name':'John Doe', 'customer':'b'},
{'name':'Jenny Wang', 'customer':'c'},
{'name':'Mary Rich', 'customer': None}
]
Is there a way to get the value of the first key and set it as the new key and the value of the new dict is the value of the 2nd key.
Expected result:
{'John Doe':['a', 'b'], 'Jenny Wang':['c'], 'Mary Rich':[]}
You could use dict.setdefault. The idea is to initialize an empty list as the value for each key, then append each non-None customer to the list for its name. The explicit "is not None" check is necessary because a plain truthiness test would also drop other falsy values such as False (thanks @Grismar).
out = {}
for d in foo:
    out.setdefault(d['name'], [])
    if d['customer'] is not None:
        out[d['name']].append(d['customer'])
Output:
{'John Doe': ['a', 'b'], 'Jenny Wang': ['c'], 'Mary Rich': []}
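The same idea can also be sketched with collections.defaultdict, which creates the empty list the first time a key is looked up. This is only a variation on the setdefault approach above, not a different algorithm; the names customers and result are introduced here for illustration.

```python
from collections import defaultdict

foo = [{'name': 'John Doe', 'customer': 'a'},
       {'name': 'John Doe', 'customer': 'b'},
       {'name': 'Jenny Wang', 'customer': 'c'},
       {'name': 'Mary Rich', 'customer': None}]

out = defaultdict(list)
for d in foo:
    # Looking the key up is what creates the empty list,
    # so names whose only customer is None still get an entry.
    customers = out[d['name']]
    if d['customer'] is not None:
        customers.append(d['customer'])

result = dict(out)  # convert back to a plain dict
print(result)
```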
@enke's answer is crystal clear, but I am adding mine in case it helps somehow.
A slightly different implementation could be:
foo = [{'name':'John Doe', 'customer':'a'},
{'name':'John Doe', 'customer':'b'},
{'name':'Jenny Wang', 'customer':'c'},
{'name':'Mary Rich', 'customer': None}
]
new_dict = dict()
for fo in foo:
    if fo['name'] not in new_dict:
        if fo['customer'] is None:
            new_dict[fo['name']] = []
        else:
            new_dict[fo['name']] = [fo['customer']]
    else:
        # append only real customers; a None customer adds nothing
        if fo['customer'] is not None:
            new_dict[fo['name']].append(fo['customer'])
print(new_dict)
Output
{'John Doe': ['a', 'b'], 'Jenny Wang': ['c'], 'Mary Rich': []}
There is a function in itertools called groupby. It groups consecutive items of your input list by a key you provide, so the list has to be ordered by that key already (as it is here). It could look like this:
from itertools import groupby
foo = [{'name':'John Doe', 'customer':'a'},
{'name':'John Doe', 'customer':'b'},
{'name':'Jenny Wang', 'customer':'c'},
{'name':'Mary Rich', 'customer': None}
]
def func_group(item):
    return item['name']
def main():
    for key, value in groupby(foo, func_group):
        print(key)
        print(list(value))
main()
That does not produce exactly your expected output, but it comes close:
John Doe
[{'name': 'John Doe', 'customer': 'a'}, {'name': 'John Doe', 'customer': 'b'}]
Jenny Wang
[{'name': 'Jenny Wang', 'customer': 'c'}]
Mary Rich
[{'name': 'Mary Rich', 'customer': None}]
(You could now apply it a second time and get your desired output. I just showed the principle here :-) )
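One way to go from these groups to the expected dictionary is sketched below. It sorts by name first, because groupby only merges adjacent items with equal keys, and it filters out None customers inside each group. The names by_name and result are introduced here for illustration.

```python
from itertools import groupby
from operator import itemgetter

foo = [{'name': 'John Doe', 'customer': 'a'},
       {'name': 'John Doe', 'customer': 'b'},
       {'name': 'Jenny Wang', 'customer': 'c'},
       {'name': 'Mary Rich', 'customer': None}]

by_name = itemgetter('name')

# Sort first so that equal names are adjacent, then build one list per group.
result = {
    name: [d['customer'] for d in group if d['customer'] is not None]
    for name, group in groupby(sorted(foo, key=by_name), key=by_name)
}
print(result)
```

Note that the keys come out in sorted order rather than in the order of the input list.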

How to create a dictionary assigning values from a list and generate the same key for each one

I have a list and I'd like that each item of the list becomes the value of a key that doesn't exist yet but should be created.
This is the list:
['King', 'President', 'VP', ' 2nd VP', '3rd VP']
The desired output must be like this:
[{'title':'King'}, {'title':'President'}, {'title':'VP'}, {'title':'2nd VP'}, {'title':'3rd VP'}]
Thanks for your support
You can do so with list comprehension
titles = ['King', 'President', 'VP', ' 2nd VP', '3rd VP']
print([{'title': title} for title in titles])
# output
[{'title': 'King'}, {'title': 'President'}, {'title': 'VP'}, {'title': ' 2nd VP'}, {'title': '3rd VP'}]
You can do that using List Comprehension
lst = ['King', 'President', 'VP', ' 2nd VP', '3rd VP']
d_lst = [{'title': v} for v in lst]
d_lst = [{'title': 'King'}, {'title': 'President'}, {'title': 'VP'}, {'title': ' 2nd VP'}, {'title': '3rd VP'}]
Like this?
liste = ['King', 'President', 'VP', ' 2nd VP', '3rd VP']
for i in range(len(liste)):
    liste[i] = {'title': liste[i]}
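The desired output in the question shows '2nd VP' without the leading space that the input item ' 2nd VP' has, while the answers above keep that space. Assuming the space is unintended, a small variation strips each item while building the dictionaries:

```python
titles = ['King', 'President', 'VP', ' 2nd VP', '3rd VP']

# str.strip() removes leading and trailing whitespace from each title.
d_lst = [{'title': title.strip()} for title in titles]
print(d_lst)
```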

Join parts of lists before and after & character

I am trying to get a list, check if it contains the '&' character, then join the data before and after that character depending on where it is. The '&' position will not always be the same.
Let's say I have a list
_list = ['John', 'Adams', '&', 'George', 'Washington']
I want to get the values before and after the ampersand and store them as a string to a variable.
name_one = "John Adams"
name_two = "George Washington"
Keep in mind this would have to be dynamic, in that I need to be able to get all of the data before and after the '&' no matter how many items there are:
_list = ['John', 'Adams', 'Jr.', '&', 'George', 'Washington']
Would return
name_one = "John Adams Jr."
name_two = "George Washington"
You can use list.index to find the index of the first occurrence of '&' and then slice before and after that index.
def get_names(lst):
    try:
        index = lst.index('&')
    except ValueError:
        ...  # return some default value if '&' is not in your list
    return ' '.join(lst[:index]), ' '.join(lst[index + 1:])
lst = ['John', 'Adams', 'Jr.', '&', 'George', 'Washington']
name_one, name_two = get_names(lst)
name_one # 'John Adams Jr.'
name_two # 'George Washington'
Since you are going to combine the partial lists into strings, anyway, why not first join all the pieces and then split?
_list = ['John', 'Adams', 'Jr.', '&', 'George', 'Washington']
name_one, name_two = " ".join(_list).split(" & ")
print(name_one, name_two, sep=", ")
#John Adams Jr., George Washington
You can even process more than two parts using the same expression:
_list = ['John', 'Adams', '&', 'George', 'Washington', '&', 'Ben', 'Franklin']
name_one, name_two, *more_names = " ".join(_list).split(" & ")
print(name_one, name_two, more_names, sep=", ")
#John Adams, George Washington, ['Ben Franklin']
Here is a simple solution:
l = ['John', 'Adams', '&', 'George', 'Washington']
ind = l.index('&')
name_one = ' '.join(l[:ind])
name_two = ' '.join(l[ind+1:])
Needless to say, you should be careful about the list not containing the '&' character or containing multiple instances of it.
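To illustrate that caveat, here is a rough sketch of a helper that copes with zero, one, or several '&' items by walking the list once. The name split_names and its sep parameter are made up for this example; with no separator it simply returns a single joined name.

```python
def split_names(tokens, sep='&'):
    names, current = [], []
    for token in tokens:
        if token == sep:
            # A separator closes the name collected so far.
            names.append(' '.join(current))
            current = []
        else:
            current.append(token)
    # Whatever is left after the last separator is the final name.
    names.append(' '.join(current))
    return names


print(split_names(['John', 'Adams', 'Jr.', '&', 'George', 'Washington']))
print(split_names(['John', 'Adams']))
print(split_names(['John', 'Adams', '&', 'George', 'Washington', '&', 'Ben', 'Franklin']))
```

Returning a list rather than two fixed variables leaves it to the caller to decide what to do when the number of names is not exactly two.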
You can use itertools.groupby:
import itertools
l = [['John', 'Adams', '&', 'George', 'Washington'], ['John', 'Adams', 'Jr.', '&', 'George', 'Washington']]
for i in l:
    # groupby yields three runs here: words before '&', the '&' itself, words after
    a, _, b = [' '.join(b) for a, b in itertools.groupby(i, key=lambda x: x == '&')]
    print((a, b))
Output:
('John Adams', 'George Washington')
('John Adams Jr.', 'George Washington')
You can try this method:
data22 = ['John', 'Adams', '&', 'George', 'Washington', 'john', '&', 'paul', '&', 'and', 'hi']
# start index of every chunk: 0, plus the position right after each '&'
track = [0] + [j + 1 for j, i in enumerate(data22) if i == '&']
for i in range(0, len(track)):
    if len(track[i:i+2]) == 2:
        real_data = data22[track[i:i+2][0]:track[i:i+2][1]]
        print(" ".join(real_data[:-1]))  # drop the trailing '&'
    else:
        print(" ".join(data22[track[i:i+2][0]:]))
output:
John Adams
George Washington john
paul
and hi

Change a list to a string with specific formatting

Let's say that I have a list of names:
names = ['john', 'george', 'ringo', 'paul']
And need to get a string output like:
john', 'george', 'ringo', 'paul
(Note that the missing quote at the beginning and at the end is on purpose)
Is there an easier way to do this than
new_string = ''
for x in names:
    new_string = new_string + x + "', '"
I know something like that will work; however, the real names list will be very, very (very) big, and I was wondering if there is a nicer way to do this.
You can simply use str.join:
>>> names = ['john', 'george', 'ringo', 'paul']
>>> print("', '".join(names))
john', 'george', 'ringo', 'paul
>>>
This may be a bad way to do it, but I just want to share it:
>>> names = ['john', 'george', 'ringo', 'paul']
>>> print(str(names)[2:-2])
john', 'george', 'ringo', 'paul
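A quick check of the two answers against each other, as a sketch: for plain names they agree, but the slicing trick relies on how Python prints a list, so it can differ from str.join as soon as a name contains a quote character. The list called tricky is invented here for illustration.

```python
names = ['john', 'george', 'ringo', 'paul']
joined = "', '".join(names)
sliced = str(names)[2:-2]
print(joined == sliced)

# repr() switches to double quotes for strings containing a single quote,
# so the sliced result no longer matches the joined one.
tricky = ['john', "o'brien"]
print("', '".join(tricky))
print(str(tricky)[2:-2])
```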

How to sort a list alphabetically by the second word in a string

If I have a list and I want to keep adding lines to it and sorting them alphabetically by their last name, how could this be done?
Sorted only seems to rearrange them by the first letter of the string.
line = "James Edward" #Example line
linesList.append("".join(line)) #Add it to a list
linesList = sorted(linesList) #Sort alphabetically
linesList.sort(key=lambda s: s.split()[1])
More info: https://wiki.python.org/moin/HowTo/Sorting#Key_Functions
If you want fully correct alphabetization (sorted by first name when the last name is the same), you can take advantage of the fact that Python sorting is stable. If you first sort by the default key:
lst.sort()
and then sort by last name:
lst.sort(key=lambda n: n.split()[1])
then the entries with the same last name will wind up in the same order that the first sort put them in - which will be correct.
You can also do this all at once with sorted:
linesList = sorted(sorted(linesList), key=lambda n: n.split()[1])
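As an alternative sketch, the same ordering can be obtained in a single sort by using a tuple as the key: the last name first, then the full string as a tie-breaker. This example uses split()[-1] rather than split()[1], which is an assumption that the last word is the last name; the sample names are invented for illustration.

```python
names = ['Bob Smith', 'Alice Smith', 'Carl Adams']

# Tuples compare element by element: last name first, full name breaks ties.
one_pass = sorted(names, key=lambda n: (n.split()[-1], n))

# The two-step stable sort from the answer above, for comparison.
two_pass = sorted(sorted(names), key=lambda n: n.split()[-1])

print(one_pass)
print(one_pass == two_pass)
```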
Assuming the names are in the format FirstName<whitespace>LastName, you can use the key parameter of sorted:
>>> lst = ['Bob D', 'Bob A', 'Bob C', 'Bob B']
>>> lst = sorted(lst, key=lambda x: x.split()[1])
>>> lst
['Bob A', 'Bob B', 'Bob C', 'Bob D']
>>>
The same principle applies to list.sort:
>>> lst = ['Bob D', 'Bob A', 'Bob C', 'Bob B']
>>> lst.sort(key=lambda x: x.split()[1])
>>> lst
['Bob A', 'Bob B', 'Bob C', 'Bob D']
>>>
Also, if you want them, here is a reference on lambda and one on str.split.
In case a name has three or more parts and you always want to sort by the last part of the name, you can do:
names = ["John Mc Karter", "John Oliver", "Max Raiden", "Naruto Ho Uzumaki"]
names.sort(key=lambda x: x.split()[-1])  # list.sort() sorts in place and returns None
print(names)
# ['John Mc Karter', 'John Oliver', 'Max Raiden', 'Naruto Ho Uzumaki']
