I have a regular expression that matches some data (it is here), and now I am trying to replace all of the matched data with a single : character
test_str = u"THERE IS MY DATA"
p = re.compile(ur'[a-z]+([\n].*?<\/div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)[a-z]+', re.M|re.I|re.S)
print re.sub(p, r':/1',test_str)
I have tried it a few other ways, but either nothing is replaced, or the whole pattern is replaced rather than only the matched group.
1) It's a backslash issue.
Use print re.sub(p, r':\1', test_str), not print re.sub(p, r':/1', test_str).
2) You are replacing the whole pattern with :\1, which means everything the pattern matched is replaced with : followed by the first group of the regex.
To replace just the first group inside the text, you should add two more groups, one before it and one after it.
I hope this fixes the issue:
test_str = u"THERE IS MY DATA"
p = re.compile(ur'([a-z]+)([\n].*?<\/div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)([a-z]+)', re.M|re.I|re.S)
print re.sub(p, r'\1:\2\3',test_str)
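For reference, here is a minimal Python 3 sketch of the three-group approach. The sample input is hypothetical, since the original test string is not shown. It also assumes the goal is to replace the middle group entirely with a single ':', so the replacement keeps only \1 and \3.

```python
import re

# Hypothetical sample input; the original test string is not shown in the question.
test_str = 'abc\nfoo</div>\n<div class="large-3 small-3 columns">\nxyz'

pattern = re.compile(
    r'([a-z]+)'  # group 1: text before the part to replace (kept)
    r'(\n.*?</div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)'  # group 2: replaced
    r'([a-z]+)',  # group 3: text after the part to replace (kept)
    re.M | re.I | re.S,
)

# Keep groups 1 and 3 and put a single ':' where group 2 was.
result = pattern.sub(r'\1:\3', test_str)
print(result)
```

If the middle part should be kept and only prefixed with ':', the replacement string would include \2 as well.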
Related
I'm new to regular expressions. I've read the documentation but I still have some questions.
I have the following string:
[('15000042', 19)]
And I need to get the key, the comma and the value as a string.
like this:
15000042,19
I need this to enter these values as a comma-separated value in a database.
I've tried the following regular expression:
([\w,]+)
but this only splits the string into 3 substrings. Is there a way to get the full match?
https://regex101.com/r/vtYKOG/1
I'm using python
You could also match what you don't want to keep, use 3 capture groups instead of 1, and assemble your value from these 3 groups:
\[\('(\d+)'(,) (\d+)\)\]
For example:
import re
test_str = "[('15000042', 19)]"
result = re.sub(r"\[\('(\d+)'(,) (\d+)\)\]", r"\1\2\3", test_str)
if result:
    print(result)
Result
15000042,19
Another option is to negate your character class, [^\w,]+, so that it matches everything that is not listed.
Then replace those characters with an empty string:
import re
test_str = "[('15000042', 19)]"
result = re.sub(r"[^\w,]+", "", test_str)
if result:
    print(result)
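A third variant, sketched here only as an illustration, captures just the key and the value with re.search and joins them explicitly. The names key, value and csv_value are made up for this example.

```python
import re

test_str = "[('15000042', 19)]"

# Capture only the key and the value; brackets, quotes and the space are matched but not kept.
match = re.search(r"\[\('(\d+)', (\d+)\)\]", test_str)
if match:
    key, value = match.groups()
    csv_value = ",".join((key, value))
    print(csv_value)
```

This keeps the separator under your control, which may be convenient if the database expects a different delimiter.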
I'm trying to match an entire word exactly using regex, where the word I'm searching for is a variable value coming from user input. I've tried this:
regex = r"\b(?=\w)" + re.escape(user_input) + r"\b"
if re.match(regex, string_to_search[i], re.IGNORECASE):
    <some code>...
but it matches every occurrence of the string. It matches "var"->"var" which is correct but also matches "var"->"var"iable and I only want it to match "var"->"var" or "string"->"string"
Input: "sword"
String_to_search = "There once was a swordsmith that made a sword"
Desired output: Match "sword" to "sword" and not "swordsmith"
It seems you want to use a pattern that matches an entire string. Note that the \b word boundary is needed when you want to find partial matches. When you need a full string match, you need anchors. Since re.match anchors the match at the start of the string, all you need is $ (end of string position) at the end of the pattern:
regex = '{}$'.format(re.escape(user_input))
and then use
re.match(regex, search_string, re.IGNORECASE)
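In Python 3.4 and later, re.fullmatch does the same job without adding the anchor by hand. This is a swapped-in alternative to the re.match plus $ approach above, not part of the original answer, and the helper name is_exact_match is hypothetical.

```python
import re

user_input = "sword"  # illustrative value standing in for real user input


def is_exact_match(candidate, word):
    """Return True only if candidate consists of word and nothing else (case-insensitive)."""
    return re.fullmatch(re.escape(word), candidate, re.IGNORECASE) is not None


print(is_exact_match("Sword", user_input))
print(is_exact_match("swordsmith", user_input))
```

One difference worth knowing: $ also matches just before a trailing newline, while re.fullmatch requires the whole string to match.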
You can try re.finditer like this:
>>> import re
>>> user_input = "var"
>>> text = "var variable var variable"
>>> regex = r"(?=\b%s\b)" % re.escape(user_input)
>>> [m.start() for m in re.finditer(regex, text)]
[0, 13]
It finds all whole-word matches iteratively.
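Applied to the sentence from the question, a sketch could look like this. It assumes the user input starts and ends with a word character, since \b only behaves as a whole-word delimiter in that case.

```python
import re

user_input = "sword"
text = "There once was a swordsmith that made a sword"

# \b on both sides requires a word boundary before and after the escaped input.
regex = r"\b" + re.escape(user_input) + r"\b"

# Collect (start index, matched text) for every whole-word occurrence.
matches = [(m.start(), m.group(0)) for m in re.finditer(regex, text, re.IGNORECASE)]
print(matches)
```

Only the standalone "sword" at the end of the sentence is reported; the "sword" inside "swordsmith" is skipped because there is no word boundary after it.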
-abc1234567-abc.jpg
I wish to remove -abc before .jpg, and get -abc1234567.jpg. I tried re.sub(r'\d(-abc).jpg$', '', string), but it will also replace contents outside of the capture group, and give me -abc123456. Is it possible to only replace the content in the capture group i.e. '-abc'?
One solution is to use positive lookahead as follows.
import re
p = re.compile(ur'(\-abc)(?=\.jpg)')
test_str = u"-abc1234567-abc.jpg"
subst = u""
result = re.sub(p, subst, test_str)
OR
You can use two capture groups as follows.
import re
p = re.compile(ur'(\-abc)(\.jpg)')
test_str = u"-abc1234567-abc.jpg"
subst = r"\2"
result = re.sub(p, subst, test_str)
If you want to remove -abc only in jpg files, you could use:
re.sub(r"-abc\.jpg$", ".jpg", string)
To stay as close as possible to your code: you should place '()' around the parts you want to keep, not the part you want to remove. Then use \g<NUMBER> to refer to those parts in the replacement. So:
re.sub(r'(.*)-abc(\.jpg)$', r'\g<1>\g<2>', string)
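As a quick check of the lookahead idea in Python 3, here is a small sketch. The function name strip_tag_before_extension is hypothetical, and the pattern is anchored with $ on the assumption that .jpg is always the end of the name.

```python
import re


def strip_tag_before_extension(name):
    # Remove "-abc" only when it is immediately followed by ".jpg" at the end of the string.
    # The lookahead checks for ".jpg" without consuming it, so the extension is left in place.
    return re.sub(r"-abc(?=\.jpg$)", "", name)


print(strip_tag_before_extension("-abc1234567-abc.jpg"))
```

The leading -abc is untouched because it is not directly followed by .jpg.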
I have a sentence in which every token has a / in it. I want to just print what I have before the slash.
What I have now is basic:
text = less/RBR.....
return re.findall(r'\b(\S+)\b', text)
This obviously just prints the text. How do I keep only the part of each word before the /?
I'm assuming you want all characters before the slash from every word that contains a slash. For example, for the input string match/this but nothing here but another/one, you would want the results match and another.
With regex:
import re
my_string = "match/this but nothing here but another/one"
result = re.findall(r"\b(\w*?)/\w*?\b", my_string)
print(result)
Without regex:
result = [word.split("/")[0] for word in my_string.split()]
print(result)
Simple and straight-forward:
rx = r'^[^/]+'
# anchor it to the beginning
# the class says: match everything not a forward slash as many times as possible
In Python this would be:
import re
text = "less/RBR....."
print re.match(r'[^/]+', text)
As this is an object, you'd probably like to print it out, like so:
print re.match(r'[^/]+', text).group(0)
# less
This should also work
\b([^\s/]+)(?=/)\b
Python Code
p = re.compile(r'\b([^\s/]+)(?=/)\b')
test_str = "less/RBR/...."
print(re.findall(p, test_str))
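To compare the regex and non-regex approaches side by side in Python 3, here is a small sketch. The tagged sentence is a hypothetical example, since the original text is elided in the question.

```python
import re

text = "less/RBR is/VBZ more/JJR"  # hypothetical tagged sentence

# Regex: capture a run of non-space, non-slash characters that is followed by a slash.
with_regex = re.findall(r"([^\s/]+)/", text)

# No regex: split on whitespace, then keep what precedes the first slash in each token.
with_split = [token.split("/", 1)[0] for token in text.split()]

print(with_regex)
print(with_split)
```

Both give the same words here. They differ on tokens without a slash: the split version keeps such a token whole, while the regex version skips it.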
I have the following string "3 0ABC, mNone\n" and I want to remove m, None and \n. The catch is that 'm', \n and None can be anywhere in the string, in any order. I would appreciate any help.
I can do re.sub('[\nm,]','',string) or re.sub('None','',string), but I don't know how to combine them, especially when the order doesn't matter.
If you want to remove m, None and \n, you can use them together as alternatives in a group. So you can use this regex:
(m|\n|None)
If you use the following code:
import re
# In a raw string, \n is passed to the regex engine, which reads it as a newline.
p = re.compile(ur'(m|\n|None)')
test_str = u"3 0ABC, mNone\n"
subst = u""
result = re.sub(p, subst, test_str)
print result
# Will show:
# '3 0ABC, '
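The single-character cases can also be folded into a character class, which is close to what the question already tried. A minimal Python 3 sketch, assuming the comma should be kept as in the output above:

```python
import re

test_str = "3 0ABC, mNone\n"

# "None" is removed as a whole word; "m" and the newline are removed via the character class.
# The order in which they appear in the input does not matter.
cleaned = re.sub(r"None|[m\n]", "", test_str)
print(repr(cleaned))
```

Note that this removes every m in the string, including any that happen to be part of other words.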