Get proper list from list of unicode list - python

I have a list containing a single unicode string that is itself formatted like a list.
my_list = [u'[James, Williams, Kevin, Parker, Alex, Emma, Katie\xa0, Annie]']
I want a list that I can iterate over, such as:
name_list = [James, Williams, Kevin, Parker, Alex, Emma, Katie, Annie]
I have tried several possible solutions given here, but none of them worked in my case.
# Tried
name_list = name_list.encode('ascii', 'ignore').decode('utf-8')
#Gives unicode return type
# Tried
ast.literal_eval(name_list)
#Gives me invalid token error

Firstly, a list does not have an encode method; string methods have to be applied to the item inside the list.
Secondly, if you want to normalize the string, you can use the normalize function from Python's unicodedata module. With the NFKD form, the non-breaking space '\xa0' is replaced by an ordinary space, which strip() then removes, and other compatibility characters are normalized as well.
Then, instead of using eval, which is generally unsafe, use a list comprehension to build the list:
import unicodedata
li = [u'[James, Williams, Kevin, Parker, Alex, Emma, Katie\xa0, Annie]']
inner_li = unicodedata.normalize("NFKD", li[0]) #<--- notice the list selection
#get only part of the string you want to convert into a list
new_li = [i.strip() for i in inner_li[1:-1].split(',')]
new_li
>> ['James', 'Williams', 'Kevin', 'Parker', 'Alex', 'Emma', 'Katie', 'Annie']
In your expected output the names are bare identifiers, i.e. variables; unless they were defined beforehand, that line raises a NameError.
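If the outer list can hold more than one such string, the same steps can be wrapped in a small helper. This is only a sketch: the name parse_bracketed is made up for illustration, and it assumes every item has the form '[a, b, c]' with no commas inside the names.

```python
import unicodedata


def parse_bracketed(text):
    # Hypothetical helper: assumes text looks like '[a, b, c]'.
    # NFKD turns the non-breaking space U+00A0 into a plain space.
    normalized = unicodedata.normalize("NFKD", text).strip()
    # Drop the surrounding brackets, split on commas, trim each piece.
    return [part.strip() for part in normalized[1:-1].split(",")]


my_list = [u'[James, Williams, Kevin, Parker, Alex, Emma, Katie\xa0, Annie]']
# Flatten the parsed names from every string in the outer list.
name_list = [name for item in my_list for name in parse_bracketed(item)]
print(name_list)
```

The result is a flat list of plain strings, so iterating over name_list yields one name at a time.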

This is a good application for regular expressions:
import re
body = re.findall(r"\[\s*(.+)\s*]", my_list[0])[0] # extract the stuff in []s
names = re.split(r"\s*,\s*", body) # extract the names (raw string avoids invalid-escape warnings)
#['James', 'Williams', 'Kevin', 'Parker', 'Alex', 'Emma', 'Katie', 'Annie']

import unicodedata
lst = [u'[James, Williams, Kevin, Parker, Alex, Emma, Katie\xa0, Annie]']
lst = unicodedata.normalize("NFKD", lst[0])
lst2 = lst[1:-1].split(", ") # remove open and close brackets
print(lst2)
The output will be:
['James', 'Williams', 'Kevin', 'Parker', 'Alex', 'Emma', 'Katie ', 'Annie']
If you want to remove all leading/trailing whitespace:
lst3 = [i.strip() for i in lst2]
print(lst3)
The output will be:
['James', 'Williams', 'Kevin', 'Parker', 'Alex', 'Emma', 'Katie', 'Annie']

Related

How do i split this list into pairs or single elements. Is there an easier way of doing this?

In Python I have a string of names; some have a surname and some do not. How would I split it into a list that keeps names with surnames together and leaves the others on their own?
I don't really know how to explain it so please look at the code and see if you can understand (sorry if I have worded it really badly in the title)
See code below :D
names = ("Donald Trump James Barack Obama Sammy John Harry Potter")
# the names with surnames are the famous ones
# names without are regular names
list = names.split()
# I want to separate them into a list of separate names so I use split()
# but now surnames like "Trump" are counted as a name
print("Names are:",list)
This outputs
['Donald', 'Trump', 'James', 'Barack', 'Obama', 'Sammy', 'John', 'Harry', 'Potter']
I would like it to output something like ['Donald Trump', 'James', 'Barack Obama', 'Sammy', 'John', 'Harry Potter']
Any help would be appreciated
As said in the comments, you need a list of famous names.
# complete list of famous people
US_PRESIDENTS = (('Donald', 'Trump'), ('Barack', 'Obama'), ('Harry', 'Potter'))
def splitfamous(namestring):
    words = namestring.split()
    # create all tuples of 2 adjacent words and compare them to US_PRESIDENTS
    for i, name in enumerate(zip(words, words[1:])):
        if name in US_PRESIDENTS:
            words[i] = ' '.join(name)
            words[i+1] = None
    # remove empty fields and return the result
    return list(filter(None, words))
names = "Donald Trump James Barack Obama Sammy John Harry Potter"
print(splitfamous(names))
The resulting list:
['Donald Trump', 'James', 'Barack Obama', 'Sammy', 'John', 'Harry Potter']
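The same idea can also be written as a single left-to-right scan that avoids the None placeholders. A rough sketch, assuming the known full names are kept in a set of (first, last) tuples; FAMOUS and split_known_names are illustrative names, not part of the answer above.

```python
# Assumed lookup table of known (first name, surname) pairs.
FAMOUS = {("Donald", "Trump"), ("Barack", "Obama"), ("Harry", "Potter")}


def split_known_names(namestring, known=FAMOUS):
    words = namestring.split()
    result = []
    i = 0
    while i < len(words):
        # Look at the current word together with the one after it.
        pair = tuple(words[i:i + 2])
        if pair in known:
            result.append(" ".join(pair))
            i += 2  # both words were consumed
        else:
            result.append(words[i])
            i += 1
    return result


result = split_known_names("Donald Trump James Barack Obama Sammy John Harry Potter")
print(result)
```

Using a set keeps each lookup cheap, and the scan never revisits a word that has already been merged into a full name.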

How can I extract names from a concatenated string using Python?

Suppose I have a string of concatenated names like so:
names = 'johnwilliamsfrankbrown'
How do I go from here to a list of names and surnames ["john", "williams", "frank", "brown"]?
So far I only found pieces of code to extract words from non concatenated strings.
As timgeb noted in the comments, this is only possible if you already know which names you expect. Assuming that you have this information, you can extract them like this:
>>> import re
>>> names = ['john', 'frank', 'brown', 'williams']
>>> regex = '(' + '|'.join(names) + ')'
>>> separated_names = re.findall(regex, 'johnwilliamsfrankbrown')
>>> separated_names
['john', 'williams', 'frank', 'brown']
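Two details can matter with this kind of pattern: names containing regex metacharacters should be escaped, and because alternation tries alternatives from left to right, a shorter name that is a prefix of a longer one can win by accident. A sketch of one way to guard against both, with the extra name 'johnson' added purely for illustration:

```python
import re

names = ['john', 'johnson', 'frank', 'brown', 'williams']
# Try longer names first so 'johnson' is not cut short to 'john'.
ordered = sorted(names, key=len, reverse=True)
# re.escape keeps characters such as '.' or '+' in a name literal.
regex = '|'.join(re.escape(name) for name in ordered)
separated_names = re.findall(regex, 'johnsonfrankbrown')
print(separated_names)
```

With the names in their original order, the same input would match 'john' first and then skip the leftover 'son'.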

I'd like to disaggregate a list of strings to split on "/" and " and " to a list of unique strings

I have a list like ["Alex Smith", "John Jones/John Jones and Anna Conner", "James O'Brien"]. I'd like to convert it to a list of individual unique individuals: ["Alex Smith", "John Jones", "Anna Conner", "James O'Brien"]. I can use list(set(vector)) to get the uniqueness that I want, but the splitting is giving me a headache.
I looked at Flattening a shallow list in Python and it looked good, but it disaggregated down to the individual letters rather than the combination of first and last names.
Pick a delimiter, join on that delimiter, convert all delimiters to that one, split on that delimiter, then use set() as you were planning on to remove the duplicates:
l = ["Alex Smith", "John Jones/John Jones and Anna Conner", "James O'Brien"]
new_set = set('/'.join(l).replace(' and ', '/').split('/'))
Result:
>>> new_set
{"James O'Brien", 'Alex Smith', 'John Jones', 'Anna Conner'}
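Since a set does not keep the original order, an order-preserving variant may be preferable. One possible sketch splits each entry with re.split and removes duplicates with dict.fromkeys (dictionaries keep insertion order in Python 3.7+); the variable names are illustrative.

```python
import re

l = ["Alex Smith", "John Jones/John Jones and Anna Conner", "James O'Brien"]
# Split every entry on either "/" or " and ", then flatten the pieces.
parts = [name.strip() for entry in l for name in re.split(r"/| and ", entry)]
# dict.fromkeys drops duplicates while keeping first-seen order.
unique_names = list(dict.fromkeys(parts))
print(unique_names)
```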

Python LOB to List

Using:
cur.execute(SQL)
response = cur.fetchall()  # response[0][0] is a LOB object
names = response[0][0].read()
I get the following SQL response as the string names:
'Mike':'Mike'
'John':'John'
'Mike/B':'Mike/B'
As you can see, it comes formatted. It is actually formatted like: \\'Mike\\':\\'Mike\\'\n\\'John\\'... and so on.
I want to check whether, for example, Mike is in the list at least once (I don't care how many times, just at least once).
I would like to have something like this:
l = ['Mike', 'Mike', 'John', 'John', 'Mike/B', 'Mike/B']
so I could simply iterate over the list and ask:
for name in l:
    if 'Mike' == name:
        pass  # do something
Any ideas how I could do that?
Many thanks
Edit:
When I do:
list = names.split()
I get a list that is nearly what I want, but the elements inside still look like this:
list = ["\\'Mike\\':\\'Mike\\'", ...]
names = ["\\'Mike\\':\\'Mike\\'", ...]
for name in names:
    if "Mike" in name:
        print("Mike is here")
The \\' business is caused by MySQL escaping the '.
If you have a list of names, try this:
my_names = ["Tom", "Dick", "Harry"]
names = ["\\'Mike\\':\\'Mike\\'", ...]
for name in names:
    for my_name in my_names:
        if my_name in name:
            print(my_name + " is here")
import re
pattern = re.compile(r"[\n\\:']+")
list_of_names = pattern.split(names)
# ['', 'Mike', 'Mike', 'John', 'John', 'Mike/B', 'Mike/B', '']
# Quick tip: try not to name a list "list", as "list" is a built-in
You can keep the result this way or do a final cleanup to remove the empty strings:
clean_list = list(filter(lambda x: x!='', list_of_names))
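Put together, the regex approach might look like the sketch below. The sample string is reconstructed from the format described in the question (a literal backslash before each quote, one pair per line), so treat it as an assumption rather than real query output.

```python
import re

# Assumed sample of the text returned by the LOB's read() call.
names = "\\'Mike\\':\\'Mike\\'\n\\'John\\':\\'John\\'\n\\'Mike/B\\':\\'Mike/B\\'"

pattern = re.compile(r"[\n\\:']+")
# Split on runs of separators and drop the empty strings at both ends.
list_of_names = [part for part in pattern.split(names) if part]
print(list_of_names)

# A membership test answers "is Mike there at least once?" directly.
if "Mike" in list_of_names:
    print("Mike is here")
```

Because the check is an exact match on list elements, 'Mike' and 'Mike/B' are treated as different names.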

Switch Lastname, Firstname to Firstname Lastname inside List

I have two lists of sports players. One is structured simply:
['Lastname, Firstname', 'Lastname2, Firstname2'..]
The second is a list of lists structured:
[['Firstname Lastname', 'Team', 'Position', 'Ranking']...]
I ultimately want to search the contents of the second list and pull the info if there is a matching name from the first list.
I need to swap 'Lastname, Firstname' to 'Firstname Lastname' to match list 2's formatting for simplification.
Any help would be great. Thanks!
You can swap the order in the list of names with:
[" ".join(n.split(", ")[::-1]) for n in namelist]
An explanation: this is a list comprehension that does something to each item. Here are a few intermediate versions and what they would return:
namelist = ["Robinson, David", "Roberts, Tim"]
# split each item into a list, around the commas:
[n.split(", ") for n in namelist]
# [['Robinson', 'David'], ['Roberts', 'Tim']]
# reverse the split up list:
[n.split(", ")[::-1] for n in namelist]
# [['David', 'Robinson'], ['Tim', 'Roberts']]
# join it back together with a space:
[" ".join(n.split(", ")[::-1]) for n in namelist]
# ['David Robinson', 'Tim Roberts']
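To then pull the matching rows from the second list, a dictionary keyed by the 'Firstname Lastname' string is one option. A sketch with made-up sample data; the team, position and ranking values are placeholders, not real statistics.

```python
namelist = ["Robinson, David", "Roberts, Tim"]
# Hypothetical second list: [name, team, position, ranking]
players = [
    ["David Robinson", "Team A", "Center", "1"],
    ["Some Other", "Team B", "Guard", "2"],
]

# Swap 'Lastname, Firstname' to 'Firstname Lastname'.
swapped = [" ".join(n.split(", ")[::-1]) for n in namelist]
# Index the second list by name for direct lookups.
by_name = {row[0]: row for row in players}
# Keep only the rows whose name appears in the first list.
matches = [by_name[name] for name in swapped if name in by_name]
print(matches)
```

Names from the first list that have no row in the second list (here 'Tim Roberts') are simply skipped.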
