Replace subtext of a word - python

I want to replace this string
ramesh#gmail.com
to
rxxxxh#gxxxl.com
this is what I have done so far
print( re.sub(r'([A-Za-z](.*)[A-Za-z]#)','x', i))

One way is to use capturing groups and, in the replacement, emit one x for every character in the groups that should be masked.
For the second and fourth groups, use a negated character class ([^...]), which matches any character except the ones listed.
\b([A-Za-z])([^#\s]*)([A-Za-z]#[A-Za-z])([^#\s.]*)([A-Za-z])\b
For example
import re
i = "ramesh#gmail.com"
res = re.sub(
    r'\b([A-Za-z])([^#\s]*)([A-Za-z]#[A-Za-z])([^#\s.]*)([A-Za-z])\b',
    lambda x: x.group(1) + "x" * len(x.group(2)) + x.group(3) + "x" * len(x.group(4)) + x.group(5),
    i)
print(res)
Output
rxxxxh#gxxxl.com
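The same masking can also be written as a small reusable function. The following is only a sketch under assumptions: the helper names mask_middle and mask_address are hypothetical, and it assumes that everything after the last dot should stay unmasked, which matches the example but is not stated in the question.

```python
import re

def mask_middle(match):
    """Keep the first and last letter of the matched run; mask the rest with 'x'."""
    word = match.group(0)
    if len(word) <= 2:
        return word  # nothing between the first and last letter to mask
    return word[0] + "x" * (len(word) - 2) + word[-1]

def mask_address(address):
    # Assumption: the part after the last dot (e.g. "com") is left as-is.
    head, dot, suffix = address.rpartition(".")
    if not dot:
        # No dot at all: treat the whole string as maskable.
        head, suffix = address, ""
    return re.sub(r"[A-Za-z]+", mask_middle, head) + dot + suffix

print(mask_address("ramesh#gmail.com"))  # rxxxxh#gxxxl.com
```

Unlike the single regex above, this masks every run of letters before the last dot, so it behaves differently on inputs with more than two such runs.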

Related

Extract substring from a python string

I want to extract the string before the 9 digit number below:
tmp = place1_128017000_gw_cl_mask.tif
The output should be place1
I could do this:
tmp.split('_')[0] but I also want the solution to work for:
tmp = place1_place2_128017000_gw_cl_mask.tif where the result would be:
place1_place2
You can assume that the number will always be 9 digits long
Using regular expressions and the lookahead feature of regex, this is a simple solution:
import re
tmp = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'.+(?=_\d{9}_)', tmp)
print(m.group())
Result:
place1_place2
Note that \d{9} matches exactly 9 digits. The part of the regex inside (?= ... ) is a lookahead: it must match immediately after the main match, but it is not included in the matched text.
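To make the lookahead behaviour concrete, here is a minimal sketch that also guards against inputs with no 9-digit part, where re.search returns None. The function name prefix_before_id is hypothetical.

```python
import re

def prefix_before_id(name):
    """Return the text before '_<nine digits>_', or None if there is no such part."""
    m = re.search(r'.+(?=_\d{9}_)', name)
    return m.group() if m else None

m = re.search(r'.+(?=_\d{9}_)', "place1_128017000_gw_cl_mask.tif")
print(m.group())  # place1
print(m.end())    # 6: the match ends before the underscore the lookahead inspected

print(prefix_before_id("place1_place2_128017000_gw_cl_mask.tif"))  # place1_place2
print(prefix_before_id("no_id_here.tif"))                          # None
```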
Assuming the problem can be phrased as wanting the substring up to, but not including, the underscore that is followed by an all-digit component, we can try:
import re
tmp = "place1_place2_128017000_gw_cl_mask.tif"
m = re.search(r'^([^_]+(?:_[^_]+)*)_\d+_', tmp)
print(m.group(1)) # place1_place2
Use a regular expression:
import re
places = (
"place1_128017000_gw_cl_mask.tif",
"place1_place2_128017000_gw_cl_mask.tif",
)
pattern = re.compile(r"(place\d+(?:_place\d+)*)_\d{9}")  # raw string, so \d is not treated as a string escape
for p in places:
    matched = pattern.match(p)
    if matched:
        print(matched.group(1))
prints:
place1
place1_place2
The regex works like this (adjust as needed, e.g., for less than 9 digits or a variable number of digits):
( starts a capture
place\d+ matches "place" followed by one or more digits
(?: starts a group, but does not capture it (no need to capture)
_place\d+ matches more "places"
) closes the group
* means zero or many times the previous group
) closes the capture
\d{9} matches 9 digits
The result is in the first (and only) capture group.
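The breakdown above can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is a sketch of the same pattern, not a different technique; the name first_places is hypothetical.

```python
import re

pattern = re.compile(r"""
    (                  # start capture group 1
      place\d+         # "place" followed by one or more digits
      (?:_place\d+)*   # zero or more further "_placeN" parts (non-capturing)
    )                  # end capture group 1
    _\d{9}             # an underscore followed by nine digits
""", re.VERBOSE)

def first_places(name):
    """Return the leading place names, or None if the name does not match."""
    matched = pattern.match(name)
    return matched.group(1) if matched else None

print(first_places("place1_128017000_gw_cl_mask.tif"))         # place1
print(first_places("place1_place2_128017000_gw_cl_mask.tif"))  # place1_place2
```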
Here's a possible solution without regex (unoptimized!):
def extract(s):
    result = ''
    for x in s.split('_'):
        # Stop at the first piece that consists of exactly nine digits.
        if x.isdigit() and len(x) == 9:
            return result[:-1]  # drop the trailing underscore
        result += x + '_'
tmp = 'place1_128017000_gw_cl_mask.tif'
tmp2 = 'place1_place2_128017000_gw_cl_mask.tif'
print(extract(tmp)) # place1
print(extract(tmp2)) # place1_place2

Python conditional replace

I need conditional replace in string.
input_str = "a111a11b111b22"
condition : ("b" + any number + "b") to ("Z" + any number)
output_str = "a111a11Z11122"
maybe I need to use [0] and [-1] for remove "b"s and "Z"+any number
but I can't find conditional replace for it.
You should use regular expressions. They are really useful:
import re
input_str = "a111a11b111b22"
output_str = re.sub(r'b(\d+)b', r'Z\1', input_str)
# output_str is "a111a11Z11122"
The regex r'b(\d+)b' matches the letter b, followed by one or more digits and another letter b. The parentheses capture the digits so that the replacement (the letter Z followed by \1) can reuse them.
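As a small illustration of how the captured group is reused, the sketch below shows two equivalent ways of writing the replacement: the \g<1> form of the backreference and a callable that receives the match object. The variable names other than input_str are hypothetical.

```python
import re

input_str = "a111a11b111b22"

# \g<1> is the unambiguous spelling of \1 in a replacement string.
with_backref = re.sub(r'b(\d+)b', r'Z\g<1>', input_str)

# A callable receives the match object and returns the replacement text.
with_callable = re.sub(r'b(\d+)b', lambda m: 'Z' + m.group(1), input_str)

print(with_backref)   # a111a11Z11122
print(with_callable)  # a111a11Z11122

# Matches do not overlap: "b1b" and "b3b" are replaced, the "2" in between is kept.
several = re.sub(r'b(\d+)b', r'Z\g<1>', "b1b2b3b")
print(several)        # Z12Z3
```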
Try with regex:
import re
input_str = "a111a11b111b22"
output_str = re.sub(r'b(\d+)b', r'Z\1', input_str)  # match both b's so the closing b is removed too
print(output_str)

The problem of regex strings containing special characters in python

I have a string: "s = string.charAt (0) == 'd'"
I want to retrieve a tuple of ('0', 'd')
I have used: re.search(r "\ ((. ?) \) == '(.?)' && "," string.charAt (0) == 'd' ")
I checked the s variable when printed as "\\ ((.?) \\) == '(.?) '&& "
How do I fix it?
You may try:
\((\d+)\).*?'(\w+)'
Explanation of the above regex:
\( - Matches a ( literally.
(\d+) - Represents the first capturing group matching digits one or more times.
\) - Matches a ) literally.
.*? - Lazily matches everything except a new-line.
'(\w+)' - Represents second capturing group matching ' along with any word character([0-9a-zA-Z_]) one or more times.
import re
regex = r"\((\d+)\).*?'(\w+)'"
test_str = "s = string.charAt (0) == 'd'"
print(re.findall(regex, test_str))
# Output: [('0', 'd')]
Your regular expression should be r".*\((.?)\) .* '(.?)'". It captures the single character inside the parentheses and the single character inside the single quotes.
>>> import re
>>> s = " s = string.charAt (0) == 'd'"
>>> m = re.search(r".*\((.?)\) .* '(.?)'", s)
>>> m.groups()
('0', 'd')
Use
\((.*?)\)\s*==\s*'(.*?)'
The first variable is captured in Group 1 and the second variable in Group 2.
Python code:
import re
string = "s = string.charAt (0) == 'd'"
match_data = re.search(r"\((.*?)\)\s*==\s*'(.*?)'", string)
if match_data:
    print(f"Var#1 = {match_data.group(1)}\nVar#2 = {match_data.group(2)}")
Output:
Var#1 = 0
Var#2 = d
Thanks everyone for the very helpful answer. My problem has been solved ^^

How to use regex to only keep first n repeated words

If I have an input sentence
input = 'ok ok, it is very very very very very hard'
and what I want to do is to only keep the first three replica for any repeated word:
output = 'ok ok, it is very very very hard'
How can I achieve this with re or regex module in python?
One option could be to use a capturing group with a backreference and use that in the replacement.
((\w+)(?: \2){2})(?: \2)*
Explanation
( Capture group 1
(\w+) capture group 2, match 1+ word chars (The example data only uses word characters. To make sure they are no part of a larger word use a word boundary \b)
(?: \2){2} Repeat 2 times matching a space and a backreference to group 2. Instead of a single space you could use [ \t]+ to match 1+ spaces or tabs or use \s+ to match 1+ whitespace chars. (Note that that would also match a newline)
) Close group 1
(?: \2)* Match 0+ times a space and a backreference to group 2 to match the same words that you want to remove
For example
import re
regex = r"((\w+)(?: \2){2})(?: \2)*"
s = "ok ok, it is very very very very very hard"
result = re.sub(regex, r"\1", s)
if result:
    print(result)
Result
ok ok, it is very very very hard
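The count in the pattern can be turned into a parameter. The following is a sketch only: the function name keep_first_n is hypothetical, and it uses \s+ and \b (both mentioned in the explanation above) instead of a single space.

```python
import re

def keep_first_n(text, n=3):
    """Collapse runs of a repeated word so that at most n copies remain."""
    # Group 2 is the word; group 1 is the word plus its next n-1 repetitions.
    # %d is filled with n-1, giving e.g. {2} for n=3.
    pattern = r"\b((\w+)(?:\s+\2){%d})(?:\s+\2)*\b" % (n - 1)
    return re.sub(pattern, r"\1", text)

s = "ok ok, it is very very very very very hard"
print(keep_first_n(s))        # ok ok, it is very very very hard
print(keep_first_n(s, n=2))   # ok ok, it is very very hard
```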
You can group a word and use a backreference to it to ensure that it repeats more than three times:
import re
s = 'ok ok, it is very very very very very hard'
print(re.sub(r'\b((\w+)(?:\s+\2){2})(?:\s+\2)+\b', r'\1', s))
This outputs:
ok ok, it is very very very hard
One solution with re.sub with custom function:
s = 'ok ok, it is very very very very very hard'
def replace(n=3):
    last_word, cnt = '', 0
    current_word = yield
    while True:
        if last_word == current_word:
            cnt += 1
        else:
            cnt = 0
            last_word = current_word
        if cnt >= n:
            current_word = yield ''
        else:
            current_word = yield current_word
import re
replacer = replace()
next(replacer)
print(re.sub(r'\s*[\w]+\s*', lambda g: replacer.send(g.group(0)), s))
Prints:
ok ok, it is very very very hard
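For comparison, here is a sketch that avoids regex altogether by grouping consecutive equal tokens with itertools.groupby. This is a different technique from the answers above, and it rests on assumptions: tokens are split on whitespace (so "ok" and "ok," count as different tokens, as in the expected output), and any run of whitespace is normalised to a single space. The function name keep_first_n_tokens is hypothetical.

```python
from itertools import groupby, islice

def keep_first_n_tokens(text, n=3):
    """Keep at most n consecutive copies of each whitespace-separated token."""
    kept = []
    for _, run in groupby(text.split()):
        # run iterates over one stretch of identical consecutive tokens
        kept.extend(islice(run, n))
    return " ".join(kept)

s = "ok ok, it is very very very very very hard"
print(keep_first_n_tokens(s))  # ok ok, it is very very very hard
```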

How to perform a transform on a matched group in a substitution in python [duplicate]

I am trying to combine these steps into one, but it doesn't work:
w = re.sub('[0-9]', r'9', w)
w = re.sub('[A-Z]', r'X', w)
w = re.sub('[a-z]', r'x', w)
Does anybody know how to turn strings such as XXxxxx999 into Xx9?
You may use a callback method as a replacement argument like this:
import re
rx = r'([0-9]+)|([A-Z]+)|[a-z]+'
w = "XXxxxx999"
def repl(m):
    if m.group(1):       # if ([0-9]+) matched
        return '9'       # replace with 9
    elif m.group(2):     # if ([A-Z]+) matched
        return 'X'       # replace with X
    else:                # if [a-z]+ matched
        return 'x'       # replace with x
print(re.sub(rx, repl, w)) # => Xx9
The regex matches:
([0-9]+) - Group 1: 1+ digits
| - or
([A-Z]+) - Group 2: 1+ uppercase letters
| - or
[a-z]+ - 1+ lowercase letters.
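A variation on the same idea, shown only as a sketch: with named groups, the match object's lastgroup attribute holds the name of the alternative that matched, so the callback can be a dictionary lookup. The names rx, symbols and shape are hypothetical.

```python
import re

# One named group per character class; exactly one of them matches each time.
rx = re.compile(r'(?P<digit>[0-9]+)|(?P<upper>[A-Z]+)|(?P<lower>[a-z]+)')

# Replacement symbol for each named group.
symbols = {'digit': '9', 'upper': 'X', 'lower': 'x'}

def shape(word):
    """Collapse each run of digits, uppercase or lowercase letters to one symbol."""
    return rx.sub(lambda m: symbols[m.lastgroup], word)

print(shape("XXxxxx999"))       # Xx9
print(shape("Hello World 42"))  # Xx Xx 9
```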
