Python match and replace: what am I doing wrong? - python

I have a regular expression that matches some data, and now I am trying to replace all of the matched data with a single : character.
test_str = u"THERE IS MY DATA"
p = re.compile(r'[a-z]+([\n].*?<\/div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)[a-z]+', re.M|re.I|re.S)
print(re.sub(p, r':/1', test_str))
I have tried a few other ways, but either nothing is replaced, or the whole pattern is replaced instead of only the matched group.

1) It's a backslash issue: a backreference is written \1, not /1.
Use print(re.sub(p, r':\1', test_str)), not print(re.sub(p, r':/1', test_str)).
2) You are replacing the whole match with :\1, which means the entire matched text is replaced with : followed by the first group of the regex.
To replace only that group, capture the text before it and the text after it in two additional groups and put those back in the replacement.
I hope this fixes the issue:
import re
test_str = u"THERE IS MY DATA"
p = re.compile(r'([a-z]+)([\n].*?<\/div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)([a-z]+)', re.M|re.I|re.S)
# keep groups 1 and 3, and put a single : in place of group 2
print(re.sub(p, r'\1:\3', test_str))
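For reference, here is a minimal Python 3 sketch of the same idea that can be run as-is. The sample string is made up (the real data was not posted), so treat it as an assumption. The replacement keeps the text before and after the middle group and puts a single : in its place.

```python
import re

# Hypothetical sample input; the real data was not shown in the question.
sample = 'foo\n</div>\n<div class="large-3 small-3 columns">\nbar'

pattern = re.compile(
    r'([a-z]+)(\n.*?</div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)([a-z]+)',
    re.I | re.S,
)

# Keep groups 1 and 3; the middle group is replaced by a single colon.
result = pattern.sub(r'\1:\3', sample)
print(result)  # foo:bar
```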

Related

Retrieve regex full match

I'm new to regular expressions. I've read the documentation but I still have some questions.
I have the following string:
[('15000042', 19)]
I need to get the key, the comma and the value as a single string,
like this:
15000042,19
I need this so I can insert these values into a database as comma-separated values.
I've tried the following regular expression:
([\w,]+)
but this only splits the string into 3 substrings. Is there a way to get the full match?
https://regex101.com/r/vtYKOG/1
I'm using Python.
You can also match the parts you don't want to keep, use three capture groups instead of one, and assemble your value from those three groups:
\[\('(\d+)'(,) (\d+)\)\]
For example:
import re
test_str = "[('15000042', 19)]"
result = re.sub(r"\[\('(\d+)'(,) (\d+)\)\]", r"\1\2\3", test_str)
if result:
    print(result)
Result
15000042,19
Another option is to negate your character class, [^\w,]+, so that it matches every character that is not listed (anything other than a word character or a comma).
Then replace those characters with an empty string:
import re
test_str = "[('15000042', 19)]"
result = re.sub(r"[^\w,]+", "", test_str)
if result:
    print(result)
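If the string always has this shape, a variation is to capture just the two numbers with re.search and join them yourself. This is only a sketch: the pattern below is an assumption about the input format (a quoted number, a comma, then a bare number), not something taken from the question.

```python
import re

test_str = "[('15000042', 19)]"

# Assumed format: a quoted run of digits, a comma, optional spaces, more digits.
match = re.search(r"'(\d+)',\s*(\d+)", test_str)

# Join the captured groups with a comma; None signals an unexpected format.
result = ",".join(match.groups()) if match else None
print(result)  # 15000042,19
```

Unlike re.sub, this makes it easy to notice input that does not have the expected format, because match is None in that case.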

Python Find entire word in string using regex and user input

I'm trying to find an exact whole word using regex, where the word I'm searching for is a variable coming from user input. I've tried this:
regex = r"\b(?=\w)" + re.escape(user_input) + r"\b"
if re.match(regex, string_to_search[i], re.IGNORECASE):
    <some code>...
but it matches every occurrence of the string. It matches "var" against "var", which is correct, but it also matches "var" against "variable". I only want it to match "var" to "var" or "string" to "string".
Input: "sword"
String_to_search = "There once was a swordsmith that made a sword"
Desired output: Match "sword" to "sword" and not "swordsmith"
It seems you want a pattern that matches the entire string. Note that the \b word boundary is what you need when looking for a whole word inside a longer string. When you need a full-string match, you need anchors. Since re.match anchors the match at the start of the string, all you need is $ (end-of-string position) at the end of the pattern:
regex = '{}$'.format(re.escape(user_input))
and then use
re.match(regex, search_string, re.IGNORECASE)
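In Python 3.4 and later, re.fullmatch does the same job without adding $ by hand. This is a different function from the re.match used above; the small sketch below uses a made-up helper name, is_exact, purely for illustration.

```python
import re

def is_exact(user_input, candidate):
    """Return True if candidate equals user_input, ignoring case (hypothetical helper)."""
    # re.escape makes any regex metacharacters in the user input match literally.
    return re.fullmatch(re.escape(user_input), candidate, re.IGNORECASE) is not None

print(is_exact("var", "Var"))       # True
print(is_exact("var", "variable"))  # False
```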
You can try re.finditer like this:
>>> import re
>>> user_input = "var"
>>> text = "var variable var variable"
>>> regex = r"(?=\b%s\b)" % re.escape(user_input)
>>> [m.start() for m in re.finditer(regex, text)]
[0, 13]
It'll find all matches iteratively.
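Applied to the example from the question, a plain \b...\b pattern with re.finditer could look like the sketch below. It assumes the user input starts and ends with a word character, since \b behaves differently next to punctuation.

```python
import re

user_input = "sword"
string_to_search = "There once was a swordsmith that made a sword"

# \b on both sides rejects "swordsmith", because a word character follows "sword" there.
regex = r"\b" + re.escape(user_input) + r"\b"
matches = [
    (m.start(), m.group())
    for m in re.finditer(regex, string_to_search, re.IGNORECASE)
]
print(matches)  # [(40, 'sword')]
```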

Python how to replace content in the capture group of regex?

-abc1234567-abc.jpg
I wish to remove -abc before .jpg, and get -abc1234567.jpg. I tried re.sub(r'\d(-abc).jpg$', '', string), but it will also replace contents outside of the capture group, and give me -abc123456. Is it possible to only replace the content in the capture group i.e. '-abc'?
One solution is to use positive lookahead as follows.
import re
# the lookahead (?=\.jpg) checks for .jpg without consuming it
p = re.compile(r'(-abc)(?=\.jpg)')
test_str = "-abc1234567-abc.jpg"
subst = ""
result = re.sub(p, subst, test_str)
Alternatively, you can use two capture groups and keep only the second one:
import re
p = re.compile(r'(-abc)(\.jpg)')
test_str = "-abc1234567-abc.jpg"
subst = r"\2"
result = re.sub(p, subst, test_str)
If you want to remove -abc only from .jpg file names, you could use:
re.sub(r"-abc\.jpg$", ".jpg", string)
To stay as close as possible to your code: place the parentheses around the parts you want to keep, not the part you want to remove. Then use \g<NUMBER> to refer to those parts in the replacement. So:
re.sub(r'(.*)-abc(\.jpg)$', r'\g<1>\g<2>', string)
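As a quick check, the lookahead and capture-group approaches can be run side by side on the file name from the question. This is just a sketch; the variable names are made up for illustration.

```python
import re

name = "-abc1234567-abc.jpg"

# Lookahead: match only the -abc that is directly followed by .jpg at the end.
with_lookahead = re.sub(r"-abc(?=\.jpg$)", "", name)

# Capture groups: keep what comes before and after -abc, drop -abc itself.
with_groups = re.sub(r"(.*)-abc(\.jpg)$", r"\g<1>\g<2>", name)

print(with_lookahead)  # -abc1234567.jpg
print(with_groups)     # -abc1234567.jpg
```

Both leave the leading -abc alone, because only the occurrence right before .jpg satisfies the pattern.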

How can I "divide" words with regular expressions?

I have a sentence in which every token has a / in it. I want to just print what I have before the slash.
What I have now is basic:
text = less/RBR.....
return re.findall(r'\b(\S+)\b', text)
This obviously just prints the whole text. How do I keep only the part of each word before the /?
I'm assuming you want all characters before the slash from every word that contains a slash. For example, for the input string match/this but nothing here but another/one you would want the results match and another.
With regex:
import re
my_string = "match/this but nothing here but another/one"
result = re.findall(r"\b(\w*?)/\w*?\b", my_string)
print(result)  # ['match', 'another']
Without regex:
# the "/" in word filter skips words that have no slash
result = [word.split("/")[0] for word in my_string.split() if "/" in word]
print(result)
Simple and straight-forward:
rx = r'^[^/]+'
# anchor it to the beginning
# the class says: match everything not a forward slash as many times as possible
In Python this would be:
import re
text = "less/RBR....."
print(re.match(r'[^/]+', text))
As this is a match object, you'd probably like to print the matched text instead, like so:
print(re.match(r'[^/]+', text).group(0))
# less
This should also work
\b([^\s/]+)(?=/)\b
Python Code
import re
p = re.compile(r'\b([^\s/]+)(?=/)\b')
test_str = "less/RBR/...."
print(re.findall(p, test_str))  # ['less', 'RBR']
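For a whole sentence of word/TAG tokens, a sketch along these lines might be enough. The sample sentence and its tags are invented here, and it assumes tokens are separated by whitespace and that the word part itself contains no slash.

```python
import re

# Made-up sample in the word/TAG format described in the question.
text = "less/RBR than/IN five/CD"

# Capture the run of non-space, non-slash characters in front of each slash.
words = re.findall(r"([^\s/]+)/\S+", text)
print(words)  # ['less', 'than', 'five']
```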

python regex sub without order

I have the following string "3 0ABC, mNone\n" and I want to remove m, None and \n. The catch is that 'm', \n and None can be anywhere in the string, in any order. I would appreciate any help.
I can do re.sub('[\nm,]','',string) or re.sub('None','',string), but I don't know how to combine them, especially when the order doesn't matter.
If you want to remove m, None and \n, you can combine them as alternatives in a single group. So you can use this regex:
(m|\n|None)
Note that \n needs a single backslash inside the raw string: \\n would match a literal backslash followed by n rather than a newline.
If you use the following code:
import re
p = re.compile(r'(m|\n|None)')
test_str = "3 0ABC, mNone\n"
subst = ""
result = re.sub(p, subst, test_str)
print(repr(result))
# Will show:
# '3 0ABC, '
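If the list of things to remove changes, the alternation can be built from a list instead of being written by hand. This is a sketch only: the remove_all name is made up, and re.escape is used so that each item is matched literally.

```python
import re

def remove_all(text, unwanted):
    """Remove every literal occurrence of each item in unwanted (hypothetical helper)."""
    # Longest items first, so a longer item is tried before a shorter one that is its prefix.
    ordered = sorted(unwanted, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(item) for item in ordered))
    return pattern.sub("", text)

result = remove_all("3 0ABC, mNone\n", ["m", "None", "\n"])
print(repr(result))  # '3 0ABC, '
```

Because the items are alternatives in one pattern, their order in the input string does not matter.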
