pywikipedia (python) regex to add string if lacking

I have a set of records like:
Name
Name Paul Berry: present
Address George Necky: not present
Name Bob van Basten: present
Name Richard Von Rumpy: not present
Name Daddy Badge: not present
Name Paul Berry: present
Street George Necky: not present
Street Bob van Basten: present
Name Richard Von Rumpy: not present
City Daddy Badge: not present
and I want all the records beginning with Name to be in the form
Name Name Surname: not present
leaving the records beginning with any other word untouched.
That is, I want to add the string "not" to the records beginning with Name where it is missing. I'm working with Python (pywikipediabot).
I tried:
python replace.py -dotall -regex 'Name ((?!not ).*?)present' 'Name \1not present'
but it adds the "not" even where it is already present.
Perhaps I haven't understood the negative lookahead syntax?

Just look for ": present" and replace it with ": not present".
Edit: Improved answer:
import re

for line in lines:
    # Only records that start with Name and end in a bare ": present" are changed.
    m = re.match(r'Name[^:]*: present', line)
    if m:
        print(re.sub(': present', ': not present', line))
    else:
        print(line)

You need a "negative look-behind" expression. This substitution will work:
'Name (.*)(?<!not )present' -> 'Name \1not present'
The .* matches everything between "Name" and "present", but the whole regexp matches only if "present" is not preceded by "not".
And are you sure you need -dotall? It looks like you want .* to match within a line only.
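
A minimal sketch of the look-behind substitution, applied line by line with Python's re module instead of replace.py. The records list and the ^ and $ anchors are assumptions added for illustration; they are not part of the command above.

```python
import re

# Sample records; only lines starting with "Name" should change.
records = [
    "Name Paul Berry: present",
    "Name Richard Von Rumpy: not present",
    "Street Bob van Basten: present",
]

# (?<!not ) rejects a "present" that is directly preceded by "not ".
pattern = re.compile(r"^(Name .*)(?<!not )present$")

fixed = [pattern.sub(r"\1not present", record) for record in records]
for record in fixed:
    print(record)
```

Only the first record is rewritten; the second already contains "not" and the third does not start with Name.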

The following will do it:
re.sub(r'(Name.*?)(not )?present$', r'\1not present', s)
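
Note that the substitution above relies on $, which by default matches only at the very end of the string. If s holds several records separated by newlines, something like the following sketch may be needed; the sample text, the ^ anchor, the non-capturing group and the re.MULTILINE flag are my additions, not part of the answer.

```python
import re

text = """Name Paul Berry: present
Address George Necky: not present
Name Richard Von Rumpy: not present
Street Bob van Basten: present"""

# re.MULTILINE lets ^ and $ match at the start and end of every line,
# so each record is handled on its own.
result = re.sub(
    r"^(Name .*?)(?:not )?present$",
    r"\1not present",
    text,
    flags=re.MULTILINE,
)
print(result)
```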

Related

How to add a Zero-or-more-condition (?) to multiple characters via regex without creating a capturing group?

The function rearrange_name should be given a name in the format:
last name (normal or double-barrelled), followed by a comma and a space, and then the first name (either just one first name, or a first name together with a middle initial or a full middle name).
Then the name should be rearranged to print it out as first name + last name.
This is the start of the code.
import re

def rearrange_name(name):
    result = re.search(r"^(\w*), (\w*)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])

name = rearrange_name("Kennedy, John F.")
print(name)
I know this specific problem has been posted before
(Fix the regular expression used in the rearrange_name function so that it can match middle names, middle initials, as well as double surnames),
but I have a problem with the solution given there, as it also allows nonsense names like "-, John F." or " , John F." to be processed. I would have added a comment, but I don't have any reputation yet. This is my first post on Stack Overflow.
I'd like to change the code so that it is 100% correct.
The original solution given:
import re

def rearrange_name(name):
    result = re.search(r"^([\w -]+), ([\w. ]+)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])

name = rearrange_name("Kennedy, John F.")
print(name)
name = rearrange_name("Kennedy, John Fitzgerald")
print(name)
name = rearrange_name("Kennedy-McJohnson, John Fitzgerald")
print(name)
My approach, which I tested on regex101.com, matches all the valid names correctly, but the groups aren't captured the way they should be.
I am struggling with it because, at least in my opinion, you have to wrap the optional parts in groups, (...)?, and those groups then don't show up in the printed output.
To give some examples:
These should all work and everything else shouldn't (any letters should obviously be allowed in place of the ones shown):
"Kennedy, John" - last name + first name
Output: John Kennedy
"Kennedy, John F." - last name + first name + middle initial
Output: John F. Kennedy
"Kennedy, John Fitzgerald" - last name + first name + middle name
Output: John Fitzgerald Kennedy
"Kennedy-McJohnson, John Fitzgerald" - double-barrelled last name + first name + middle name
Output: John Fitzgerald Kennedy-McJohnson
"Kennedy-McJohnson, John F." - double-barrelled last name + first name + middle initial
Output: John F. Kennedy-McJohnson
Characters that should be allowed: letters, plus the spaces between the names, the "." after an initial, and the "-" in a double-barrelled name.
Invalid input should be returned unchanged:
Input: |||?!**Kennedy, John F#####.
Output: |||?!**Kennedy, John F#####.
So if the name is valid, it is printed in the rearranged order.
If it is not valid, it is printed exactly as it was given.

Try the pattern:
([A-Z][a-zA-Z]+(?:-[A-Z][a-zA-Z]+)?), ([A-Z][a-zA-Z]+\s*(?:[A-Z][a-zA-Z]+|[A-Z]\.)?)
Regex demo.
import re

pat = re.compile(
    r"([A-Z][a-zA-Z]+(?:-[A-Z][a-zA-Z]+)?), ([A-Z][a-zA-Z]+\s*(?:[A-Z][a-zA-Z]+|[A-Z]\.)?)"
)

def rearrange_name(name):
    m = pat.match(name)
    if m:
        return "{} {}".format(m.group(2), m.group(1))
    return name

name = rearrange_name("Kennedy, John F.")
print(name)
name = rearrange_name("Kennedy, John Fitzgerald")
print(name)
name = rearrange_name("Kennedy-McJohnson, John Fitzgerald")
print(name)
Prints:
John F. Kennedy
John Fitzgerald Kennedy
John Fitzgerald Kennedy-McJohnson
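
One thing to watch: pat.match only anchors the start of the string, so an input such as "Kennedy, John F#####." still matches on its valid prefix and the rest is silently dropped. A hedged sketch that uses re.fullmatch instead, with a slightly tightened variant of the pattern; the name NAME_PATTERN and the single-space separator before the middle name are my assumptions, not part of the answer.

```python
import re

# Last name (optionally double-barrelled), ", ", first name,
# then optionally a space plus a middle name or a middle initial.
NAME_PATTERN = re.compile(
    r"([A-Z][a-zA-Z]+(?:-[A-Z][a-zA-Z]+)?), "
    r"([A-Z][a-zA-Z]+(?: (?:[A-Z][a-zA-Z]+|[A-Z]\.))?)"
)

def rearrange_name(name):
    # fullmatch requires the whole string to fit the pattern.
    m = NAME_PATTERN.fullmatch(name)
    if m:
        return "{} {}".format(m.group(2), m.group(1))
    return name

print(rearrange_name("Kennedy-McJohnson, John F."))
print(rearrange_name("Kennedy, John F#####."))
```

With this variant, invalid inputs like "-, John F." or "Kennedy, John F#####." come back unchanged.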

Regular Expressions. cant understand how to get the entire name if it start with Mr / Mrs / Ms

I can't understand how the re module works. I have made many attempts to get the entire name, whether it is followed by one name only or by multiple names (surnames).
This is the re.compile() pattern that I'm using to get the name when the surname is optional:
the_formmat = re.compile(r"Mr?s?\.?\s[A-Z][a-z]+\s[A-Z][a-z]+")
the_string = "this is Mr Samantha Rajapaksa and his wife Mrs. Chalani Rajapaksa. his fathers name is Mr Prabath and his mothers name is Mrs Karunarathnage Dayawathi Bandara Peiris "
print(the_formmat.findall(the_string))
I know what the ? modifier does, but I don't know where to put it to get the surname if there is one or more.
From the above example I get this output:
['Mr Samantha Rajapaksa', 'Mrs. Chalani Rajapaksa', 'Mrs Karunarathnage Dayawathi']
The output that I want is:
['Mr Samantha Rajapaksa', 'Mrs. Chalani Rajapaksa', 'Mr Prabath', 'Mrs Karunarathnage Dayawathi Bandara Peiris']

Try this Regex:
/(?:Mr|Ms|Mrs)\.?(?: [A-Z][a-z]+)+/
Edited thanks to @treuss.
So change your the_formmat variable to:
the_formmat = re.compile(r"(?:Mr|Ms|Mrs)\.?(?: [A-Z][a-z]+)+")
What it does is check for Mr/Ms/Mrs with an optional period, then keep matching a space followed by a word starting with an uppercase letter, one or more times, until the next word no longer fits.
You could check this RegExr link to learn more.
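
As a quick check, here is the suggested pattern run against the string from the question. The variable names title_pattern and names are illustrative only.

```python
import re

title_pattern = re.compile(r"(?:Mr|Ms|Mrs)\.?(?: [A-Z][a-z]+)+")

the_string = (
    "this is Mr Samantha Rajapaksa and his wife Mrs. Chalani Rajapaksa. "
    "his fathers name is Mr Prabath and his mothers name is "
    "Mrs Karunarathnage Dayawathi Bandara Peiris "
)

# The trailing (...)+ group repeats " Word" as many times as it can,
# so one-word and four-word names are both captured in full.
names = title_pattern.findall(the_string)
print(names)
```

Because the groups are non-capturing, findall returns the whole match for each name rather than just the last repeated word.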

New to Python: How to keep the first letter of each word capitalized?

I was practicing with this tiny program with the hopes to capitalize the first letter of each word in: john Smith.
I wanted to capitalize the j in john so I would have an end result of John Smith and this is the code I used:
name = "john Smith"
if name[0].islower():
    name = name.capitalize()
print(name)
Though, capitalizing the first letter caused an output of: John smith where the S was converted to a lowercase. How can I capitalize the letter j without messing with the rest of the name?
I thank you all for your time and future responses!
I appreciate it very much!!!

As @j1-lee pointed out, what you are looking for is the title method, which capitalizes each word (as opposed to capitalize, which uppercases only the first character of the string and lowercases the rest, as if it were a sentence).
So your code becomes
name = "john smith"
name = name.title()
print(name) #> John Smith

Of course you should be using str.title(). However, if you want to reinvent that functionality then you could do this:
name = 'john paul smith'
r = ' '.join(w[0].upper()+w[1:].lower() for w in name.split())
print(r)
Output:
John Paul Smith
Note:
This is not strictly equivalent to str.title(), as split() and join() collapse every run of whitespace in the original string into a single space.
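
Both str.title() and the reimplementation above lowercase the rest of each word, which may not be wanted for names with inner capitals. A small sketch of an alternative that uppercases only the first letter of each word and leaves everything else alone; the helper name capitalize_first_letters and the sample name are hypothetical.

```python
def capitalize_first_letters(name):
    # w[:1] is safe for empty strings, so repeated spaces are preserved.
    return " ".join(w[:1].upper() + w[1:] for w in name.split(" "))

print(capitalize_first_letters("john Smith"))
print(capitalize_first_letters("john McDonald"))
print("john McDonald".title())
```

The last line shows the difference: title() turns "McDonald" into "Mcdonald", while the helper keeps the inner capital.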

Is there a way to combine multiple strings using Regex?

I'm having an issue with regex and don't really understand its usefulness right now.
I'm trying to extract data from a file. The file consists of first name, last name, grade.
File:
Peter Jenkins: A
Robert Right: B
Kim Long: C
Jim Jim: B
Opening file code:
# Regex code
regcode = r'([A-Za-z]+)(: B)'
answer = re.findall(regcode, file)
return answer
The expected result is first name last name. The given result is last name and letter grade. How do I just get the first name and last name for all B grades?

Since you must use regex for this task, here's a simple regex solution that returns the full name:
'(.*): B'
Which works in this case because:
(.*) returns all text up to a match of : B
Click here to see my test and matching output. I recommend this site for your regex testing needs.
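
A sketch of how this pattern behaves with re.findall, next to a stricter anchored variant. The "Ann Lee: B+" line is a made-up record added to show the difference, and the anchors and re.MULTILINE flag are my assumptions rather than part of the answer.

```python
import re

# The last line is hypothetical, added to show a grade that merely starts with B.
students = """Peter Jenkins: A
Robert Right: B
Kim Long: C
Jim Jim: B
Ann Lee: B+"""

# "." does not match a newline, so each line is searched on its own.
loose = re.findall(r"(.*): B", students)

# Anchoring with ^ and $ under re.MULTILINE requires the grade to be exactly B.
strict = re.findall(r"^(.+): B$", students, flags=re.MULTILINE)

print(loose)
print(strict)
```

The unanchored pattern also picks up "Ann Lee", while the anchored one returns only the students whose grade is exactly B.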

You can do it without regex:
students = '''Peter Jenkins: A
Robert Right: B
Kim Long: C
Jim Jim: B'''
for x in students.split('\n'):
    string = x.split(': ')
    if string[1] == 'B':
        print(string[0])
# Robert Right
# Jim Jim
or
[x[0:-3] for x in students.split('\n') if x[-1] == 'B']

If a regex solution is required (personally, I prefer Roman Zhak's solution), put the parts you are interested in, i.e. the first name and the last name, inside groups, followed by the colon and B:
import re
file = """
Peter Jenkins: A
Robert Right: B
Kim Long: C
Jim Jim: B
"""
regcode = r'([A-Za-z]+) ([A-Za-z]+): B'
answer = re.findall(regcode, file)
print(answer) # [('Robert', 'Right'), ('Jim', 'Jim')]

Add a capturing group ('()') to your expression. findall returns only the group's content; anything matched outside the group is left out of the result.
re.findall(r'(\w+\s+\w+):\s+B', file)
#['Robert Right', 'Jim Jim']
'\w' is any alphanumeric character or underscore, '\s' is any whitespace character.
You can add two groups, one for the first name and one for the last name:
re.findall(r'(\w+)\s+(\w+):\s+B', file)
#[('Robert', 'Right'), ('Jim', 'Jim')]
The latter will not work if there are more than two names on one line.

How to replace every second space with a comma the Pythonic way

I have a string with first and last names all separated with a space.
For example:
installers = "Joe Bloggs John Murphy Peter Smith"
I now need to replace every second space with ', ' (comma followed by a space) and output this as string.
The desired output is
print installers
Joe Bloggs, John Murphy, Peter Smith

You should be able to do this with a regex that finds each pair of spaces and replaces the second one:
import re
installers = "Joe Bloggs John Murphy Peter Smith"
re.sub(r'(\s\S*?)\s', r'\1, ',installers)
# 'Joe Bloggs, John Murphy, Peter Smith'
This says, find a space followed by some non-spaces followed by a space and replace it with the found space followed by some non-spaces and ", ". You could add installers.strip() if there's a possibility of trailing spaces on the string.

One way to do this is to split the string into a list of names, get an iterator for the list, then loop over the iterator in a for loop, collecting the first name and then advancing the iterator to get the second name too.
names = installers.split()
it = iter(names)
out = []
for name in it:
    next_name = next(it)
    full_name = '{} {}'.format(name, next_name)
    out.append(full_name)
fixed = ', '.join(out)
print(fixed)
Joe Bloggs, John Murphy, Peter Smith
The one line version of this would be
>>> ', '.join(' '.join(s) for s in zip(*[iter(installers.split())]*2))
'Joe Bloggs, John Murphy, Peter Smith'
This works by creating a list that contains the same iterator twice, so the zip function returns both parts of the name. See also the grouper recipe from the itertools recipes.
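
Both versions above assume an even number of words: next(it) raises StopIteration on a leftover word, and zip silently drops it. A sketch of a slicing-based variant that keeps a trailing unpaired word; the function name join_name_pairs is hypothetical.

```python
def join_name_pairs(text):
    words = text.split()
    # Take the words two at a time; a leftover single word forms its own entry.
    pairs = [" ".join(words[i:i + 2]) for i in range(0, len(words), 2)]
    return ", ".join(pairs)

installers = "Joe Bloggs John Murphy Peter Smith"
print(join_name_pairs(installers))
```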
