Python Regex replace part of string - python

I am trying to replace a specific part of my string. Every time I have a forward slash followed by a capital letter, I want the slash to be replaced with a tab. Like in this case:
Hello/My daugher/son
The output should look like
Hello My daugher/son
I have tried to use re.sub():
for x in a:
    x = re.sub('\/[A-Z]', '\t[A-Z]', x)
But then my output changes into:
Hello [A-Z]y daugher/son
Which is really not what I want. Is there a better way to tackle this, maybe not in regex?

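You can replace /(?=[A-Z]) with \t. The lookahead (?=[A-Z]) only checks that a capital letter follows and does not consume it, so just the slash is replaced. Notice that in Python you don't need to escape / as \/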
Check this Python code,
import re
s = 'Hello/My daugher/son'
print(re.sub(r'/(?=[A-Z])',r'\t',s))
Prints,
Hello My daugher/son
Alternatively, following the way you were trying to replace, you need to capture the capital letter in a group using the regex /([A-Z]) and then replace it with \t\1 to restore what got captured in group 1. Check this Python code,
import re
s = 'Hello/My daugher/son'
print(re.sub(r'/([A-Z])',r'\t\1',s))
Again prints,
Hello My daugher/son
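The question runs the substitution inside a loop (for x in a), where rebinding x does not change the list itself. A minimal sketch of applying the lookahead version to every string and keeping the results, assuming a is a list of strings (the second sample value is made up for illustration):

```python
import re

# Hypothetical input list; the question only shows the first string.
a = ['Hello/My daugher/son', 'one/Two/three']

# Compile once; the lookahead keeps the capital letter out of the match.
slash_before_capital = re.compile(r'/(?=[A-Z])')

# Build a new list, since rebinding the loop variable would not modify `a`.
result = [slash_before_capital.sub('\t', x) for x in a]

for line in result:
    print(repr(line))
```

Using repr() makes the inserted tab visible as \t when printed.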

Related

Regex Puzzle: Match a pattern only if it is between two $$ without indefinite look behind

I am writing a snippet for the Vim plugin UltiSnips which will trigger on a regex pattern (as supported by Python 3). To avoid conflicts I want to make sure that my snippet only triggers when contained somewhere inside of $$___$$. Note that the trigger pattern might contain an indefinite string in front or behind it. So as an example I might want to match all "a" in "$$ccbbabbcc$$" but not "ccbbabbcc". Obviously this would be trivial if I could simply use indefinite look behind. Alas, I may not as this isn't .NET and vanilla Python will not allow it. Is there a standard way of implementing this kind of expression? Note that I will not be able to use any python functions. The expression must be a self-contained trigger.
If what you are looking for only occurs once between the '$$', then:
\$\$.*?(a)(?=.*?\$\$)
This matches all 3 a characters in the example below. Breakdown:
\$\$ Matches '$$'
.*? Matches 0 or more characters non-greedily
(a) Matches 'a' and captures it in group 1
(?=.*?\$\$) The match must be followed by 0 or more arbitrary characters followed by '$$'
The code:
import re
s = "$$ccbbabbcc$$xxax$$bcaxay$$"
print(re.findall(r'\$\$.*?(a)(?=.*?\$\$)', s))
Prints:
['a', 'a', 'a']
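It may help to probe where this pattern does and does not fire. The sketch below uses three made-up inputs (not from the question). Note the last case: because the lazy .*? is free to run across a closing '$$', an 'a' sitting between two delimited blocks is also reported, so the pattern is only an approximation of "inside $$...$$".

```python
import re

pattern = re.compile(r'\$\$.*?(a)(?=.*?\$\$)')

# 'a' sits between one pair of delimiters.
inside = pattern.findall("$$ccbbabbcc$$")

# No delimiters at all, so nothing should match.
outside = pattern.findall("ccbbabbcc")

# 'a' sits between two delimited blocks, yet it still matches.
between = pattern.findall("$$x$$ a $$y$$")

print(inside, outside, between)
```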
The following should work:
re.findall(r"\${2}.+\${2}", stuff)
Breakdown:
Looks for two '$'
\${2}
Then looks for one or more of any character
.+
Then looks for two '$' again
\${2}
I believe this regex would work to match the a within the $$:
text = '$$ccbbabbcc$$ccbbabbcc'
re.findall(r'\${2}.*(a).*\${2}', text)
# prints
['a']
Alternatively:
A simple approach (requiring two checks instead of one regex) would be to first find all parts enclosed in your delimiters, then check whether your search string is present within them.
Example:
text = '$$ccbbabbcc$$ccbbabbcc'
search_string = 'a'
parts = re.findall(r'\${2}.+\${2}', text)
[p for p in parts if search_string in p]
# prints
['$$ccbbabbcc$$']
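One thing to watch with the two-step approach: the greedy .+ spans from the first '$$' to the last one, so several delimited parts collapse into a single hit. A possible variation, assuming the input can contain more than one delimited part (the sample text is made up), is to make the quantifier lazy:

```python
import re

# Hypothetical input with two delimited parts and plain text in between.
text = '$$ab$$ cd $$ef$$'
search_string = 'a'

# Greedy: one match running from the first '$$' to the last '$$'.
greedy = re.findall(r'\${2}.+\${2}', text)

# Lazy: each delimited part is returned separately.
lazy = re.findall(r'\${2}.+?\${2}', text)

# Second check: keep only the parts containing the search string.
hits = [p for p in lazy if search_string in p]

print(greedy)
print(lazy)
print(hits)
```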

Regex for uppercase and underscores between percentage signs

Regex has never been my strong point. In python I'm attempting to build an expression which matches substrings such as this:
%MATCH%
%MATCH_1%
$THIS_IS_A_MATCH%
It would be extracted by a %MATCH% like this or %LIKE_THIS%
I ended up with this (logically, but does not seem to work): %[A-Z0-9_]*$%
So where am I going wrong on this?
You can use a simple regex like this:
[%$]\w+[%$] <-- Notice I put $ because of your sample
On the other hand, if you only want uppercase you can use:
[%$][A-Z_\d]+[%$]
If you only want to match content within %, you could also use:
%.+?%
Python code
import re
p = re.compile(r'[%$]\w+[%$]')
test_str = "%MATCH%\n\n%MATCH_1%\n\n$THIS_IS_A_MATCH%"
re.findall(p, test_str)
Btw, the problem with your regex is below:
%[A-Z0-9_]*$%
           ^--- Remove this dollar sign ($ anchors the end of the string, so the % after it can never match)
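To see the effect of that stray $, here is a small sketch comparing the asker's pattern with a corrected one on the sentence from the question. Using + instead of * is my own choice here, so that an empty "%%" is not reported; the original answers do not specify this.

```python
import re

text = "It would be extracted by a %MATCH% like this or %LIKE_THIS%"

# Original attempt: '$' demands end-of-string before the closing '%'.
broken = re.findall(r'%[A-Z0-9_]*$%', text)

# Corrected: uppercase letters, digits and underscores between two '%'.
fixed = re.findall(r'%[A-Z0-9_]+%', text)

print(broken)
print(fixed)
```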

Using the .split() function based on conditions?

How would you be able to use the .split() function based on conditions?
Let's say I have the raw data:
Apples,Oranges,Strawberries Green beans,Tomatoes,Broccoli
My intended result is:
['Apples','Oranges','Strawberries','Green beans','Tomatoes','Broccoli']
Would it be possible to split at commas, and also at a space when a capital letter follows it?
The literal interpretation of what you asked for, using re.split:
import re
pat = re.compile(r'\s(?=[A-Z])|,')
pat.split(my_str)
This is more simply done, in your case:
pat = re.compile(r'.(?=[A-Z])')
Basically, split on any character that is followed by a capital letter.
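Both patterns can be tried on the sample data from the question. The sketch below also includes one made-up input ("Apples,McIntosh") to show where the shorter pattern may surprise: it splits on, and discards, any character before a capital letter, including letters inside a word.

```python
import re

my_str = "Apples,Oranges,Strawberries Green beans,Tomatoes,Broccoli"

# Literal reading: split on commas, or on whitespace before a capital.
literal = re.split(r'\s(?=[A-Z])|,', my_str)

# Shortcut: split on any single character followed by a capital.
shortcut = re.split(r'.(?=[A-Z])', my_str)

# Hypothetical input with a capital inside a word: the 'c' is consumed.
risky = re.split(r'.(?=[A-Z])', "Apples,McIntosh")

print(literal)
print(shortcut)
print(risky)
```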
Using regex will make the code simpler than a complicated split statement.
import re
...
re.findall(", [A-Z]",data)
Note you asked for a split on a comma, space, capital sequence, but in your example there are no spaces after commas.

python regex and replace

I am trying to learn python and regex at the same time and I am having some trouble in finding how to match till end of string and make a replacement on the fly.
So, I have a string like so:
ss="this_is_my_awesome_string/mysuperid=687y98jhAlsji"
What I'd want is to first find 687y98jhAlsji (I do not know this content beforehand) and then replace it with myreplacedstuff like so:
ss="this_is_my_awesome_string/mysuperid=myreplacedstuff"
Ideally, I'd want to do a regex and replace by first finding the contents after mysuperid= (till the end of string) and then perform a .replace or .sub if this makes sense.
I would appreciate any guidance on this.
You can try this:
re.sub(r'[^=]+$', 'myreplacedstuff', ss)
The idea is to use a character class that excludes the delimiter (here =) and to anchor the pattern with $.
Explanation:
[^=] is a character class and means all characters that are not =
[^=]+ one or more characters from this class
$ end of the string
Since the regex engine works from the left to the right, only characters that are not an = at the end of the string are matched.
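As a quick check, the substitution can be run on the string from the question. The second pattern is an alternative I am adding for illustration, not part of the answer: it anchors on the key name with a fixed-width lookbehind, which might be preferable if the value itself could contain an = sign.

```python
import re

ss = "this_is_my_awesome_string/mysuperid=687y98jhAlsji"

# Replace everything after the last '=' up to the end of the string.
by_delimiter = re.sub(r'[^=]+$', 'myreplacedstuff', ss)

# Alternative (assumption): replace whatever follows 'mysuperid='.
by_key = re.sub(r'(?<=mysuperid=).+$', 'myreplacedstuff', ss)

print(by_delimiter)
print(by_key)
```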
You can use regular expressions:
>>> import re
>>> mymatch = re.search(r'mysuperid=(.*)', ss)
>>> ss.replace(mymatch.group(1), 'replacing_stuff')
'this_is_my_awesome_string/mysuperid=replacing_stuff'
You should probably use @Casimir's answer though. It looks cleaner, and I'm not that good at regex :p.

Python split by regular expression

In Python, I am extracting emails from a string like so:
split = re.split(" ", string)
emails = []
pattern = re.compile("^[a-zA-Z0-9_\.-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-\.]+$")
for bit in split:
    result = pattern.match(bit)
    if result is not None:
        emails.append(bit)
And this works, as long as there is a space in between the emails. But this might not always be the case. For example:
Hello, foo@foo.com
would return:
foo@foo.com
but, take the following string:
I know my best friend mailto:foo@foo.com!
This would return nothing. So the question is: how can I make it so that a regex is the delimiter to split? I would want to get
foo@foo.com
in all cases, regardless of punctuation next to it. Is this possible in Python?
By "splitting by regex" I mean that if the program encounters the pattern in a string, it will extract that part and put it into a list.
I'd say you're looking for re.findall:
>>> email_reg = re.compile(r'[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+')
>>> email_reg.findall('I know my best friend mailto:foo@foo.com!')
['foo@foo.com']
Notice that findall can handle more than one email address:
>>> email_reg.findall('Text text foo@foo.com, text text, baz@baz.com!')
['foo@foo.com', 'baz@baz.com']
Use re.search or re.findall.
You also need to escape your expression properly (. needs to be escaped outside of character classes, not inside) and remove/replace the anchors ^ and $ (for example with \b), e.g.:
r"\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+\b"
The problem I see in your regex is your use of ^, which matches the start of the string, and $, which matches the end of the string. If you remove them and then run it with your sample test cases, it will work:
>>> re.findall(r"[A-Za-z0-9\._-]+@[A-Za-z0-9-]+.[A-Za-z0-9-\.]+", "I know my best friend mailto:foo@foo.com!")
['foo@foo.com']
>>> re.findall(r"[A-Za-z0-9\._-]+@[A-Za-z0-9-]+.[A-Za-z0-9-\.]+", "Hello, foo@foo.com")
['foo@foo.com']
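Putting the suggestions together, a minimal sketch of an extraction helper might look like this. The function name extract_emails and the third sample sentence are hypothetical, and the pattern is the \b-anchored one suggested above, written with the usual @ separator; it is a loose heuristic, not a full email validator.

```python
import re

# Word boundaries replace the ^ and $ anchors from the question.
EMAIL_RE = re.compile(r'\b[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+\b')


def extract_emails(text):
    """Return every substring of `text` that looks like an email address."""
    return EMAIL_RE.findall(text)


single = extract_emails('I know my best friend mailto:foo@foo.com!')
several = extract_emails('Text text foo@foo.com, text text, baz@baz.com!')
# The trailing \b lets the match back off from a sentence-ending period.
trailing_dot = extract_emails('Reach me at foo@foo.com.')

print(single, several, trailing_dot)
```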
