Regex extract element after string - python

If I have a string s = "Name: John, Name: Abby, Name: Kate", how do I extract everything between Name: and ,? I'd like to end up with a list a = ['John', 'Abby', 'Kate'].
Thanks!

No need for a regex:
>>> s = "Name: John, Name: Abby, Name: Kate"
>>> [x[len('Name: '):] for x in s.split(', ')]
['John', 'Abby', 'Kate']
Or even:
>>> prefix = 'Name: '
>>> s[len(prefix):].split(', ' + prefix)
['John', 'Abby', 'Kate']
Now if you still think a regex is more appropriate:
>>> import re
>>> re.findall(r'Name:\s+([^,]*)', s)
['John', 'Abby', 'Kate']
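The findall pattern above can be wrapped in a small reusable function when the label or separator changes. This is only a sketch: extract_after is a hypothetical helper name, and it assumes the separator is a single character, because it is placed inside a [^...] character class.

```python
import re


def extract_after(text, label, sep=","):
    """Return every value that follows `label`, up to the next `sep` character."""
    # re.escape lets the label and separator contain regex metacharacters safely;
    # sep is assumed to be a single character because it goes inside [^...]
    pattern = re.escape(label) + r"\s*([^" + re.escape(sep) + r"]*)"
    # findall returns only the captured group; strip trims spaces before the separator
    return [value.strip() for value in re.findall(pattern, text)]


print(extract_after("Name: John, Name: Abby, Name: Kate", "Name:"))
# ['John', 'Abby', 'Kate']
```

For example, extract_after("Age: 3; Age: 40", "Age:", sep=";") gives ['3', '40'].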

The interesting question is how you would choose among the many ways to do this in Python. The answer using "split" is nice if you're confident that the format will be exact. If you would like some protection from minor format changes, a regular expression might be useful. You should think through what parts of the format are most likely to be stable, and capture those in your regular expression, while leaving flexibility for the others. Here is an example that assumes that the names are alphabetic, and that the word "Name" and the colon are stable:
import re
s = "Name: John, Name: Abby, Name: Kate"
names = [i.group(1) for i in re.finditer(r"Name:\s+([A-Za-z]*)", s)]
print(names)
You might instead want to allow for hyphens or other characters inside a name; you can do so by changing the text inside [A-Za-z].
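For instance, a sketch that also accepts hyphens and apostrophes inside a name (the sample string is hypothetical). The hyphen is placed last in the character class so that it is treated as a literal character rather than a range:

```python
import re

s = "Name: Mary-Jane, Name: O'Neil, Name: Kate"
# The character class is widened to also accept apostrophes and hyphens
names = re.findall(r"Name:\s+([A-Za-z'-]+)", s)
print(names)  # ['Mary-Jane', "O'Neil", 'Kate']
```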
A good page about Python regular expressions with lots of examples is http://docs.python.org/howto/regex.html.

A few more ways to do it:
>>> s
'Name: John, Name: Abby, Name: Kate'
Method 1:
>>> [x.strip(' ,') for x in s.split("Name:")[1:]]
['John', 'Abby', 'Kate']
Method 2:
>>> [x.rsplit(":",1)[-1].strip() for x in s.split(",")]
['John', 'Abby', 'Kate']
Method 3:
>>> [x.strip() for x in re.findall(":([^,]*)",s)]
['John', 'Abby', 'Kate']
Method 4:
>>> [x.strip() for x in s.replace('Name:','').split(',')]
['John', 'Abby', 'Kate']
Also note how I consistently applied strip, which makes sense if there can be multiple spaces between the 'Name:' token and the actual name.
Methods 2 and 3 are more general: they rely only on the colon and the comma, not on the literal word 'Name'.
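As an illustration of that generality, here is a sketch in the spirit of method 3 that keeps each label together with its value. The input string with mixed labels is hypothetical:

```python
import re

s = "Name: John, City: Paris, Job: Doctor"
# Each match captures the label before ":" and the value up to the next ","
pairs = [(label.strip(), value.strip())
         for label, value in re.findall(r"([^,:]+):([^,]*)", s)]
print(pairs)  # [('Name', 'John'), ('City', 'Paris'), ('Job', 'Doctor')]
```

A list of pairs is kept rather than a dict because, in the original string, the same label (Name) repeats, and a dict would keep only the last value.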

Related

How to print a string which is a name with two words in reverse order without commas and unnecessary spaces? [closed]

I have to write a function that takes a string containing a full name and prints it in reverse order. It also removes unnecessary spaces and commas. Some of the expected outputs are as follows:
>>> reverse_name("Techie, Teddy")
'Teddy Techie'
>>> reverse_name("Scumble, Arnold")
'Arnold Scumble'
>>> reverse_name("Fortunato,Frank")
'Frank Fortunato'
>>> reverse_name("von Grünbaumberger, Herbert")
'Herbert von Grünbaumberger'
>>> reverse_name(" Duck, Donald ")
'Donald Duck'
>>> reverse_name("X,")
'X'
>>> reverse_name(",X")
'X'
>>> reverse_name(" , Y ")
'Y'
I wrote the following code.
def main():
    name = input()
    reverse_name(name)
    print(reverse_name(name))
def reverse_name(string1):
    i = 0
    for index in string1:
        if index != ",":
            i += 1
        else:
            last = string1[i + 1:]
            first = string1[0:i]
            result = last + " " + first
    return result
if __name__ == "__main__":
    main()
P.S.: I must implement a function that takes a string as a parameter and returns a string. The input will also contain a comma, which should not appear in the output.
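You could combine split and join after reversing the output of split (splitting on ', ' assumes a space always follows the comma, so an input like "Fortunato,Frank" is returned unchanged):
def reverse_name(s):
    return ' '.join([e.strip() for e in s.split(', ')][::-1])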
>>> reverse_name('Techie, Teddy')
'Teddy Techie'
>>> reverse_name(' Duck, Donald ')
'Donald Duck'
Here is another option using the re module:
import re
def reverse_name(s):
    return re.sub(r'\s*(.+),\s*(.*\S)\s*', r'\2 \1', s)
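The re.sub pattern above needs at least one character before the comma and a non-space character after it, so inputs such as "X," or ",X" do not match and are returned unchanged. A sketch of a different re-based approach, using re.split instead of re.sub, also covers those cases. It assumes every comma-separated piece should be reversed:

```python
import re


def reverse_name(s):
    # Split on the comma together with any whitespace around it,
    # after trimming whitespace at both ends of the input
    parts = re.split(r"\s*,\s*", s.strip())
    # Reverse and drop empty parts, which appear for inputs like "X," or ",X"
    return " ".join(part for part in reversed(parts) if part)


print(reverse_name(" Duck, Donald "))  # Donald Duck
```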
If a comma is guaranteed to always be there, simply using string1.split(",") gives you a list of the separate parts of the string. Filtering out the empty ones and removing the surrounding whitespace with .strip() does the trick:
>>> def reverse_name(text):
...     # strip surrounding whitespace from each part, then filter out empty ones
...     return " ".join(w for w in map(str.strip, reversed(text.split(","))) if w)
...
>>> reverse_name("Techie, Teddy")
'Teddy Techie'
>>> reverse_name("Scumble, Arnold")
'Arnold Scumble'
>>> reverse_name("von Grünbaumberger, Herbert")
'Herbert von Grünbaumberger'
>>> reverse_name("Fortunato,Frank")
'Frank Fortunato'
>>> reverse_name("X,")
'X'
>>> reverse_name(",X")
'X'
>>> reverse_name(" , Y ")
'Y'
>>> reverse_name(" Duck, Donald ")
'Donald Duck'
>>>
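Because the expected outputs in the question are written as interactive sessions, they can be pasted into the function's docstring and checked with the standard doctest module. A sketch using the same one-liner as the answer above:

```python
import doctest


def reverse_name(text):
    """Turn "Last, First" into "First Last".

    >>> reverse_name("Techie, Teddy")
    'Teddy Techie'
    >>> reverse_name("Fortunato,Frank")
    'Frank Fortunato'
    >>> reverse_name("von Grünbaumberger, Herbert")
    'Herbert von Grünbaumberger'
    >>> reverse_name(" Duck, Donald ")
    'Donald Duck'
    >>> reverse_name("X,")
    'X'
    >>> reverse_name(",X")
    'X'
    >>> reverse_name(" , Y ")
    'Y'
    """
    # reverse the comma-separated parts, strip each one, and drop empty parts
    return " ".join(w for w in map(str.strip, reversed(text.split(","))) if w)


if __name__ == "__main__":
    # Runs every >>> example in this module's docstrings; prints nothing when all pass
    doctest.testmod()
```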
Use split() to separate the input at the commas. Strip spaces from each element to remove the extraneous spaces, drop any element that ends up empty, and then reverse the list.
def reverse_name(string1):
    names = string1.split(',')  # split at commas
    names = [name.strip() for name in names if name.strip()]  # remove extra spaces and empty parts
    return " ".join(names[::-1])  # return reversed names as a string
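Since the question's P.S. says the input contains a comma, str.partition is another option. It is a different technique from the split-based answers: it splits only at the first comma and always returns three pieces. A minimal sketch under that assumption:

```python
def reverse_name(string1):
    # partition splits at the first comma only, e.g. " Duck, Donald " -> (" Duck", ",", " Donald ")
    last, _, first = string1.partition(",")
    # strip both halves and drop empty ones, so "X," and " , Y " still work
    parts = [part.strip() for part in (first, last)]
    return " ".join(part for part in parts if part)


print(reverse_name("von Grünbaumberger, Herbert"))  # Herbert von Grünbaumberger
```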
You can use a regular expression to replace all the intermediate non-letter characters with a single space (then remove leading/trailing spaces). Then use a regular rsplit() to separate the first and last name (assuming that only the last name can be composite). Reassemble the inverted split result using join():
import re
def reverse_name(name):
    name = re.sub(r'\W+', ' ', name).strip()
    return " ".join(name.rsplit(' ', 1)[::-1])
print(reverse_name("Techie, Teddy"))
print(reverse_name("Scumble, Arnold"))
print(reverse_name("von Grünbaumberger, Herbert"))
print(reverse_name("Fortunato,Frank"))
print(reverse_name("X,"))
print(reverse_name(",X"))
print(reverse_name(" , Y "))
print(reverse_name(" Duck, Donald "))
Output:
Teddy Techie
Arnold Scumble
Herbert von Grünbaumberger
Frank Fortunato
X
X
Y
Donald Duck
Of course, this leaves the problem of composite first names such as John-Paul Smith, which create an ambiguity about which words belong to the first name and which to the last name. If there is always going to be a comma, the solution is different (but you would have to state that explicitly in your question).
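Solution based on the systematic presence of a comma between the last and first name:
import re
def reverse_name(name):
    # collapse every run of characters other than commas and word characters into a single space
    names = re.sub(r'[^,\w]+', ' ', name).split(',', 1)
    # reverse so the first name comes first, strip each part, then trim the joined result
    return " ".join(map(str.strip, reversed(names))).strip()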

Remove an element from a str

I have a list in which I add elements like this:
listA.append('{:<30s} {:>10s}'.format(element, str(code)))
so listA looks like this:
Paris 75
Amsterdam 120
New York City 444
L.A 845
Now, from this listA, I would like to add elements to a list listB, without the code. I would like to do something like this:
for i in listA:
    listB.append(i - str(code))  # that's what I want to do, but this code doesn't work
and I want the listB to look like this:
Paris
Amsterdam
New York City
L.A
using only listA, without having access to element and code.
Can someone help me?
You can use a regex for that:
import re
listB = []
for i in listA:
    listB.append(re.sub(r"\W+\d+", "", i))
This removes the code (the number) and the spaces before it.
Try this:
import re
listB = [re.sub(r'\d+', '', x).strip() for x in listA]
print(listB)
Output:
['Paris', 'Amsterdam', 'New York City', 'L.A']
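This seems to work for me. The format pads element to a width of 30 characters, so taking the first 30 characters and stripping the padding recovers it (assuming no element is longer than 30 characters):
listB = []
for i in listA:
    listB.append(i[:30].rstrip())
print(listB)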
You can try the following:
listB = [element.rsplit(' ', maxsplit=1)[0].rstrip() for element in listA]
rsplit(' ', maxsplit=1) splits each element of listA once, at the first space from the right. The additional rstrip() then removes the remaining padding spaces on the right.
How about using re.sub? And instead of a for loop, it is better to use map (functional style) or a list comprehension:
import re
listB = list(map(lambda x: re.sub(r"\s+\d+", "", x), listA))
or, even better
import re
listB = [re.sub(r"\s+\d+", "", x) for x in listA]
A little about the regex:
re.sub - replaces every match of the pattern (first argument) with the replacement (second argument) in the string (third argument)
r"\s+\d+" - a 'raw' string literal containing the regex
\s+ - one or more whitespace characters (space, \t, \n, \r, etc.)
\d+ - one or more digits (for ASCII text, \d is equivalent to [0-9])
For more information about regexes, see the documentation of Python's re module.
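The patterns above remove every digit run that follows whitespace, including digits that belong to the name itself. Anchoring the pattern with $ restricts the removal to the trailing code only. A sketch with hypothetical sample data (including a name that contains digits), built the same way as in the question:

```python
import re

# Rebuild a sample listA the same way the question does (hypothetical data,
# including a name that itself contains digits)
listA = ['{:<30s} {:>10s}'.format(element, str(code))
         for element, code in [("Paris", 75), ("Studio 54", 120), ("L.A", 845)]]

# \s+\d+$ only matches the padding plus the number at the very end of the string
listB = [re.sub(r"\s+\d+$", "", x) for x in listA]
print(listB)  # ['Paris', 'Studio 54', 'L.A']
```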
Easiest way without importing anything. The number of spaces between the element and the code is not fixed (the format pads the element to 30 characters and the code to 10), so instead of splitting on a fixed run of spaces, call rsplit() without a separator, which treats any run of whitespace as a single separator:
listB = []
for i in listA:
    listB.append(i.rsplit(None, 1)[0])

How to split list elements to a line separated by space

I have a list in Python:
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("".join(values))
I want the output to be:
Subjects: Maths English Hindi Science Physical_Edu Accounts
I am new to Python. I used the join() method but couldn't get the expected output.
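You could map the str.strip function over every element in the list and join the results afterwards (filter(None, ...) drops the empty string left by the final '\n' element, so there is no trailing space):
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("Subjects:", " ".join(filter(None, map(str.strip, values))))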
Using a regular expression approach:
import re
lst = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
rx = re.compile(r'.*')
print("Subjects: {}".format(" ".join(match.group(0) for item in lst for match in [rx.match(item)])))
# Subjects: Maths English Hindi Science Physical_Edu Accounts
But it is better to use strip() (or, even better, rstrip()) as in the other answers:
string = "Subjects: {}".format(" ".join(map(str.rstrip, lst)))
print(string)
strip() each element of the list and then join() them with a space in between.
a = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("Subjects: " +" ".join(map(lambda x:x.strip(), a)))
Output:
Subjects: Maths English Hindi Science Physical_Edu Accounts
As pointed out by @miindlek, you can also achieve the same thing by using map(str.strip, a) in place of map(lambda x: x.strip(), a).
What you can do is strip the newlines from each element first and then join them:
stripped_array = [value.strip() for value in values if value.strip()]
joined_string = " ".join(stripped_array)
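The trailing '\n' on every element suggests (an assumption, not stated in the question) that the list came from reading a file line by line, for example with readlines(). In that case the newlines and the blank last line can be handled while reading. A sketch using io.StringIO as a stand-in for the real file:

```python
import io

# Hypothetical stand-in for a file opened with open(...)
f = io.StringIO("Maths\nEnglish\nHindi\nScience\nPhysical_Edu\nAccounts\n\n")
# Remove each line's newline and skip lines that are blank
subjects = [line.rstrip("\n") for line in f if line.strip()]
print("Subjects: " + " ".join(subjects))
# Subjects: Maths English Hindi Science Physical_Edu Accounts
```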

How can I extract names from a concatenated string using Python?

Suppose I have a string of concatenated names like so:
names = 'johnwilliamsfrankbrown'
How do I go from here to a list of names and surnames ["john", "williams", "frank", "brown"]?
So far I only found pieces of code to extract words from non concatenated strings.
As timgeb noted in the comments, this is only possible if you already know which names you expect. Assuming that you have this information, you can extract them like this:
>>> import re
>>> names = ['john', 'frank', 'brown', 'williams']
>>> regex = '(' + '|'.join(names) + ')'
>>> separated_names = re.findall(regex, 'johnwilliamsfrankbrown')
>>> separated_names
['john', 'williams', 'frank', 'brown']
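One caveat with this approach is that regex alternation tries the alternatives in order, so a known name that is a prefix of another (for example "ann" and "anna") can match too early. A sketch that sorts the names longest first and escapes them with re.escape. The helper name split_known_names and the sample names are hypothetical:

```python
import re


def split_known_names(text, known):
    # Longest alternatives first so "anna" is tried before its prefix "ann"
    alternatives = sorted(known, key=len, reverse=True)
    # re.escape guards against names containing regex metacharacters
    pattern = "|".join(map(re.escape, alternatives))
    # Without capture groups, findall returns the whole matches
    return re.findall(pattern, text)


print(split_known_names("annasmithannjones", ["ann", "anna", "smith", "jones"]))
# ['anna', 'smith', 'ann', 'jones']
```

Note that findall silently skips any characters that do not belong to a known name.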

Python Multiple Strings to Tuples

Hi everyone, I wonder if you can help with my problem.
I am defining a function that takes a string and converts it into a tuple of 5 items. The function will need to handle a number of strings, and in some of them the items vary in length. How would I go about this, given that using fixed indexes into the string does not work for every string?
As an example -
I want to convert a string like the following:
Doctor E212 40000 Peter David Jones
The tuple items of the string will be:
Job(Doctor), Department(E212), Pay(40000), Other names (Peter David), Surname (Jones)
However, some of the strings have 2 other names, while others have just 1.
How would I go about converting strings like this into tuples when the other names can vary between 1 and 2?
I am a bit of a novice when it comes to python as you can probably tell ;)
With Python 3, you can just split() and use "catch-all" tuple unpacking with *:
>>> string = "Doctor E212 40000 Peter David Jones"
>>> job, dep, sal, *other, names = string.split()
>>> job, dep, sal, " ".join(other), names
('Doctor', 'E212', '40000', 'Peter David', 'Jones')
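Because the question asks for a function, the same unpacking can be wrapped in one. This is a sketch: parse_record is a hypothetical name, and the second record is made-up sample data. A line with fewer than four fields raises ValueError, because *other_names must still leave room for the fixed fields.

```python
def parse_record(line):
    # The first three fields and the last one are fixed;
    # *other_names absorbs however many middle names remain
    job, department, pay, *other_names, surname = line.split()
    return job, department, pay, " ".join(other_names), surname


print(parse_record("Doctor E212 40000 Peter David Jones"))
# ('Doctor', 'E212', '40000', 'Peter David', 'Jones')
print(parse_record("Nurse E101 30000 Anna Smith"))
# ('Nurse', 'E101', '30000', 'Anna', 'Smith')
```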
Alternatively, you can use regular expressions, e.g. something like this:
>>> import re
>>> m = re.match(r"(\w+) (\w+) (\d+) ([\w\s]+) (\w+)", string)
>>> job, dep, sal, other, names = m.groups()
>>> job, dep, sal, other, names
('Doctor', 'E212', '40000', 'Peter David', 'Jones')
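A variant of the regex approach uses named groups, so each field can be retrieved by name. It also uses fullmatch, which returns None when the whole line does not fit the pattern. The group names are my own labels for the fields in the question:

```python
import re

# Same pattern as above, with (?P<name>...) groups so each field is labelled
RECORD = re.compile(
    r"(?P<job>\w+) (?P<department>\w+) (?P<pay>\d+) "
    r"(?P<other_names>[\w\s]+) (?P<surname>\w+)"
)

# fullmatch requires the whole string to fit the pattern, otherwise it returns None
m = RECORD.fullmatch("Doctor E212 40000 Peter David Jones")
if m is not None:
    print(m.groupdict())
```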
