I have a set of two names with spaces in them. I want to do a regex search for "George Bush" or "Barack Obama". Following this example, I tried the code below, which gives the desired output:
import re
p = r"(George\sBush|Barack\sObama)"
s = "recent Presidents George Bush and Barack Obama"
print(re.findall(p, s))  # ['George Bush', 'Barack Obama']
However, now I want to go from a list ["George Bush", "Barack Obama"] to the pattern shown above.
I tried this:
names = ["George Bush", "Barack Obama"]
p = ""
for name in names:
    p = p + "|" + name
p = p.strip("|")
p = ('.{75}(' + p + ').{75}').replace(" ", r"\s")
But it gives:
How can I replace space characters with just "\s" instead of "\\s"?
You already have. The backslash is special, so the string's representation (what repr() returns, and what the interactive shell echoes) shows it escaped as "\\s", but the string itself really does contain "\s". Try printing the string instead.
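A minimal sketch of building the pattern from the list and seeing the repr/print difference for yourself. The helper name names_to_pattern is hypothetical; re.escape is the standard library function that makes any regex metacharacters in a name match literally. Using \s+ (one or more whitespace characters) between words is an assumption; use r"\s" if you want exactly one.

```python
import re

def names_to_pattern(names):
    # Escape each word so regex metacharacters in a name are matched literally,
    # then allow one or more whitespace characters between the words.
    alternatives = [r"\s+".join(re.escape(word) for word in name.split()) for name in names]
    return "(" + "|".join(alternatives) + ")"

pattern = names_to_pattern(["George Bush", "Barack Obama"])
print(repr(pattern))  # '(George\\s+Bush|Barack\\s+Obama)'  (repr doubles each backslash)
print(pattern)        # (George\s+Bush|Barack\s+Obama)     (the actual string)
print(re.findall(pattern, "recent Presidents George Bush and Barack Obama"))
# ['George Bush', 'Barack Obama']
```

Joining with "|" also removes the need for the strip("|") step in the loop version.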
I have to write a function that takes a full name as a string and returns it with the parts in reverse order. It also removes unnecessary spaces and commas. Some of the expected outputs are as follows:
>>> reverse_name("Techie, Teddy")
'Teddy Techie'
>>> reverse_name("Scumble, Arnold")
'Arnold Scumble'
>>> reverse_name("Fortunato,Frank")
'Frank Fortunato'
>>> reverse_name("von Grünbaumberger, Herbert")
'Herbert von Grünbaumberger'
>>> reverse_name(" Duck, Donald ")
'Donald Duck'
>>> reverse_name("X,")
'X'
>>> reverse_name(",X")
'X'
>>> reverse_name(" , Y ")
'Y'
I wrote the following code.
def main():
    name = input()
    reverse_name(name)
    print(reverse_name(name))

def reverse_name(string1):
    i = 0
    for index in string1:
        if index != ",":
            i += 1
        else:
            last = string1[i + 1:]
            first = string1[0:i]
    result = last + " " + first
    return result

if __name__ == "__main__":
    main()
P.S.: I must implement a function that takes a string as a parameter and returns a string. The input will contain a comma, which must not appear in the output.
You could combine split and join, reversing the output of split before joining:
def reverse_name(s):
    return ' '.join([e.strip() for e in s.split(', ')][::-1])
>>> reverse_name('Techie, Teddy')
'Teddy Techie'
>>> reverse_name(' Duck, Donald ')
'Donald Duck'
Here is another option using the re module:
import re

def reverse_name(s):
    return re.sub(r'\s*(.+),\s*(.*\S)\s*', r'\2 \1', s)
If a comma is guaranteed to always be there, simply using string1.split(",") will give you a list of the separate parts of the string; then filter out the empty ones and remove the surrounding whitespace with .strip() to do the trick:
>>> def reverse_name(text):
...     # map(str.strip, ...) removes surrounding whitespace; "if w" filters out empty parts
...     return " ".join(w for w in map(str.strip, reversed(text.split(","))) if w)
...
>>> reverse_name("Techie, Teddy")
'Teddy Techie'
>>> reverse_name("Scumble, Arnold")
'Arnold Scumble'
>>> reverse_name("von Grünbaumberger, Herbert")
'Herbert von Grünbaumberger'
>>> reverse_name("Fortunato,Frank")
'Frank Fortunato'
>>> reverse_name("X,")
'X'
>>> reverse_name(",X")
'X'
>>> reverse_name(" , Y ")
'Y'
>>> reverse_name(" Duck, Donald ")
'Donald Duck'
>>>
Use split() to separate the input at the commas. Strip spaces from each element to remove the extraneous spaces, and then reverse the list.
def reverse_name(string1):
    names = string1.split(',')  # split at commas
    names = [name.strip() for name in names]  # remove extra spaces
    # return reversed names as a string, skipping empty parts such as in "X,"
    return " ".join(name for name in names[::-1] if name)
You can use a regular expression to replace all the intermediate non-word characters with a single space (then remove leading/trailing spaces). Then use rsplit() to separate the first and last names (assuming that only the last name can be composite). Reassemble the inverted split result using join():
import re

def reverse_name(name):
    name = re.sub(r'\W+', ' ', name).strip()
    return " ".join(name.rsplit(' ', 1)[::-1])
print(reverse_name("Techie, Teddy"))
print(reverse_name("Scumble, Arnold"))
print(reverse_name("von Grünbaumberger, Herbert"))
print(reverse_name("Fortunato,Frank"))
print(reverse_name("X,"))
print(reverse_name(",X"))
print(reverse_name(" , Y "))
print(reverse_name(" Duck, Donald "))
Output:
Teddy Techie
Arnold Scumble
Herbert von Grünbaumberger
Frank Fortunato
X
X
Y
Donald Duck
Of course, this leaves the problem of composite first names such as John-Paul Smith, which create an ambiguity about which words belong to the first name and which to the last name. If there is always going to be a comma, the solution would be different (but you would have to state that explicitly in your question).
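A solution that relies on a comma always being present between the last and first names:
import re

def reverse_name(name):
    # keep only word characters and commas, then split into [last, first] at the first comma
    names = re.sub(r'[^,\w]+', ' ', name).split(',', 1)
    # reverse so the first name comes first; strip() tidies leftover spaces
    return " ".join(map(str.strip, names[::-1])).strip()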
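The expected outputs listed in the question can double as a table-driven check for any of these answers. A minimal sketch, assuming the input has the "Last, First" form with at most one comma; the EXAMPLES table is copied from the question, and the reverse_name shown is just one of the split-based variants, so swap in whichever version you want to test.

```python
def reverse_name(s):
    # Split at commas and trim surrounding spaces from each part.
    parts = [part.strip() for part in s.split(",")]
    # Reverse the parts and drop empty ones (e.g. from "X," or " , Y ").
    return " ".join(part for part in reversed(parts) if part)

# Expected results copied from the question.
EXAMPLES = {
    "Techie, Teddy": "Teddy Techie",
    "Scumble, Arnold": "Arnold Scumble",
    "Fortunato,Frank": "Frank Fortunato",
    "von Grünbaumberger, Herbert": "Herbert von Grünbaumberger",
    " Duck, Donald ": "Donald Duck",
    "X,": "X",
    ",X": "X",
    " , Y ": "Y",
}

for given, expected in EXAMPLES.items():
    actual = reverse_name(given)
    assert actual == expected, f"{given!r}: got {actual!r}, expected {expected!r}"
print("all examples pass")
```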
I have a list in Python:
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("".join(values))
I want the output to be:
Subjects: Maths English Hindi Science Physical_Edu Accounts
I am new to Python. I used the join() method but was unable to get the expected output.
You could map the str.strip function to every element in the list and join them afterwards.
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("Subjects:", " ".join(map(str.strip, values)))
Using a regular expression approach:
import re
lst = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
rx = re.compile(r'.*')
print("Subjects: {}".format(" ".join(match.group(0) for item in lst for match in [rx.match(item)])))
# Subjects: Maths English Hindi Science Physical_Edu Accounts
But it is better to use strip() (or, even better, rstrip()) as other answers suggest:
string = "Subjects: {}".format(" ".join(map(str.rstrip, lst)))
print(string)
strip() each element of the list and then join() them with a space in between.
a = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("Subjects: " +" ".join(map(lambda x:x.strip(), a)))
Output:
Subjects: Maths English Hindi Science Physical_Edu Accounts
As pointed out by @miindlek, you can also achieve the same thing by using map(str.strip, a) in place of map(lambda x: x.strip(), a).
What you can do is strip the newline from each item and then join the stripped items:
stripped_array = [value.strip() for value in values]
joined_string = " ".join(stripped_array)
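One detail the answers above gloss over: the last item is just '\n', so after strip() it becomes an empty string and join() leaves a trailing space. A minimal sketch that filters out empty items before joining:

```python
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']

# strip() turns the final '\n' into '', so skip empty items
# to avoid a trailing space in the joined string.
subjects = [value.strip() for value in values if value.strip()]
line = "Subjects: " + " ".join(subjects)
print(line)  # Subjects: Maths English Hindi Science Physical_Edu Accounts
```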
new = ['mary 2jay 3ken +', 'mary 2jay 3ken +', 'steven +john -']
Printing each element gives:
mary 2jay 3ken +
mary 2jay 3ken +
steven +john -
How could I get the sign/number after each person's name? I'm wondering whether a dict would work in this case, as my expected output is:
mary:2
jay:3
ken:+
steven:+
john:-
To get the index of "+" in a string, you can use:
index = a_string.index("+")
To check if "+" exist in a string, use:
if "+" in a_string:
    # ...
To iterate a list of string, you can do:
for text in new:
    # ...
There are fifty ways to do what you want. I suggest you read the Python tutorial.
Edit:
You can use a regex to extract the name/number pairs:
import re

for text in new:
    couples = re.findall(r"(\S+)\s+(\d+|\+|\-|$)", text)
    for name, num in couples:
        print(f"{name}:{num}")
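To answer the dict part of the question: yes, a dict works, with the caveat that a name appearing more than once (mary, in the sample) keeps only one entry. A minimal sketch, assuming the last item ends with " -" as the printed output in the question shows, and that every name is followed by either digits or a single + or - sign:

```python
import re

new = ['mary 2jay 3ken +', 'mary 2jay 3ken +', 'steven +john -']

# A name is a run of non-space characters, followed by whitespace and then
# either one or more digits or a single '+' / '-' sign.
PAIR = re.compile(r"(\S+)\s+(\d+|[+-])")

signs = {}
for text in new:
    for name, value in PAIR.findall(text):
        signs[name] = value  # a repeated name keeps its last value

for name, value in signs.items():
    print(f"{name}:{value}")
```

Because dicts preserve insertion order (Python 3.7+), the names print in the order they first appear.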
I am creating something which takes a tuple, converts it into a string and then reorganises the string using print formatting. 'other' can sometimes contain two names, which is why I have used * and " ".join(other) in this function:
def strFormat(x):
    #Convert to string
    s=' '
    s = s.join(x)
    print(s)
    #Split string into different parts
    payR, dep, sal, *other, surn = s.split()
    payR, dep, sal, " ".join(other), surn
    #Print formatting!
    print (surn , other, payR, dep, sal)
The problem with this is that it prints a list of 'other' within the string like this:
Jones ['David', 'Peter'] 84921 Python 63120
But I want it more like this:
Jones David Peter 84921 Python 63120
So that it is ready for formatting into something like this:
Jones, David Peter 84921 Python £63120
Am I going about this the right way, and how do I stop the list from appearing within the string?
You're close. Change this line, which builds a tuple and immediately throws it away, so it does nothing:
payR, dep, sal, " ".join(other), surn
to
other = " ".join(other)
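Putting the fix together with the target format from the question, a minimal sketch. The field order (pay reference, department, salary, one or more given names, surname) is inferred from the printed output in the question, and returning the string instead of printing it is a design choice so the result can be reused:

```python
def strFormat(x):
    # Assumed field order: pay reference, department, salary,
    # one or more given names, surname.
    payR, dep, sal, *other, surn = " ".join(x).split()
    other = " ".join(other)  # rebind the name; a bare expression is discarded
    return f"{surn}, {other} {payR} {dep} £{sal}"

print(strFormat(("84921", "Python", "63120", "David", "Peter", "Jones")))
# Jones, David Peter 84921 Python £63120
```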
Consider the following text:
"Mr. McCONNELL. yadda yadda jon stewart is mean to me. The PRESIDING OFFICER. Suck it up. Mr. McCONNELL. but noooo. Mr. REID. Really dude?"
And a list of words to split on:
["McCONNELL", "PRESIDING OFFICER", "REID"]
I want to have the output be the dictionary
{"McCONNELL": "yadda yadda jon stewart is mean to me. but noooo.",
 "PRESIDING OFFICER": "Suck it up.",
 "REID": "Really dude?"}
So I need a way to split by elements of a list (on any of those names), and then be aware of which one it split on and be able to map that to the chunk of text in that split. In the case of more than one chunks of text having the same speaker ("McCONNELL", in the example), just concatenate the strings.
Edit: Here is the function I have been using. It works on the example, but it is not robust when I try it on a much larger scale (and it isn't clear why it messes up):
import re

def split_by_speaker(txt, seps):
    '''
    Given raw text and a list of separators (generally possible speaker names), splits based
    on those names and returns a dictionary of text attributable to that name
    '''
    speakers = []
    default_sep = seps[0]
    rv = {}
    for sep in seps:
        if sep in txt:
            all_occurences = [m.start() for m in re.finditer(sep, txt)]
            for occ in all_occurences:
                speakers.append((sep, occ))
            txt = txt.replace(sep, default_sep)
    temp_t = [i.strip() for i in txt.split(default_sep)][1:]
    speakers.sort(key = lambda x: x[1])
    for i in range(len(temp_t)):
        if speakers[i][0] in rv:
            rv[speakers[i][0]] = rv[speakers[i][0]] + " " + temp_t[i]
        else:
            rv[speakers[i][0]] = temp_t[i]
    return rv
Use the re module from the standard library to define the splits. Hint: the split separator is a regular expression and can be of the form (WORD1|WORD2|WORD3); because the group is capturing, re.split keeps the matched names in its result.
Here is an example of what re.split returns:
import re
text = "Mr. McCONNELL. yadda yadda jon stewart is mean to me. The PRESIDING OFFICER. Suck it up. Mr. McCONNELL. but noooo. Mr. REID. Really dude?"
speakers = ["McCONNELL", "PRESIDING OFFICER", "REID"]
speakers_re = re.compile('(' + '|'.join([re.escape(s) for s in speakers]) + ')')
print(speakers_re.split(text))
Result:
['Mr. ', 'McCONNELL',
'. yadda yadda jon stewart is mean to me. The ',
'PRESIDING OFFICER', '. Suck it up. Mr. ',
'McCONNELL', '. but noooo. Mr. ', 'REID', '. Really dude?']
Unnecessary punctuation can also be removed with regexes, or with the simple .rstrip() and .lstrip() string methods.
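Building the dictionary from the re.split output could look like the minimal sketch below. Names and chunks alternate after the first element, so zip(parts[1::2], parts[2::2]) pairs each speaker with the text that follows it. The cleanup step is an assumption tailored to the sample text: it drops the leading ". " after a name and a trailing title ("Mr.", "Mrs.", "Ms.", "The") that belongs to the next speaker; real transcripts will likely need a different list.

```python
import re

def split_by_speaker(text, speakers):
    # Longest names first, so a name that is a prefix of another cannot win the alternation.
    names = sorted(speakers, key=len, reverse=True)
    # The capturing group makes re.split keep each matched name in its output;
    # \b stops a name from matching inside a longer word.
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    parts = pattern.split(text)  # [preamble, name, chunk, name, chunk, ...]

    result = {}
    for name, chunk in zip(parts[1::2], parts[2::2]):
        chunk = re.sub(r"^\.\s*", "", chunk)  # drop the ". " right after a name
        # Drop the next speaker's title (assumed list of titles).
        chunk = re.sub(r"\s*\b(?:Mr\.|Mrs\.|Ms\.|The)\s*$", "", chunk).strip()
        # Concatenate chunks that belong to the same speaker.
        result[name] = f"{result[name]} {chunk}" if name in result else chunk
    return result

text = ("Mr. McCONNELL. yadda yadda jon stewart is mean to me. The PRESIDING OFFICER. "
        "Suck it up. Mr. McCONNELL. but noooo. Mr. REID. Really dude?")
print(split_by_speaker(text, ["McCONNELL", "PRESIDING OFFICER", "REID"]))
```

Unlike the replace-based version in the question, this never rewrites the text before locating speakers, so match positions cannot drift.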