I'm new to regular expressions. I've read the documentation but I still have some questions.
I have the following string:
[('15000042', 19)]
I need to get the key, the comma and the value as a single string, like this:
15000042,19
I need this to insert these values into a database as a comma-separated value.
I've tried the following regular expression:
([\w,]+)
but this only splits the string into 3 substrings. Is there a way to get the full match?
https://regex101.com/r/vtYKOG/1
I'm using Python.
You can match the characters you don't want to keep, capture the parts you do want in 3 groups instead of 1, and assemble the value from those groups:
\[\('(\d+)'(,) (\d+)\)\]
For example:
import re
test_str = "[('15000042', 19)]"
result = re.sub(r"\[\('(\d+)'(,) (\d+)\)\]", r"\1\2\3", test_str)
if result:
    print(result)
Result
15000042,19
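One thing to keep in mind: re.sub returns the input unchanged when nothing matches, so the `if result:` check above does not tell you whether the pattern matched. A minimal sketch of a stricter variant, assuming the input always holds exactly one pair (the name `pair_to_csv` is only for illustration):

```python
import re

# Strict pattern for exactly one ('digits', digits) pair inside a list literal.
PAIR = re.compile(r"\[\('(\d+)', (\d+)\)\]")

def pair_to_csv(text):
    """Return 'key,value' for a well-formed input, or None otherwise."""
    match = PAIR.fullmatch(text)
    if match is None:
        return None
    # Join the two captured groups with a comma.
    return ",".join(match.groups())

print(pair_to_csv("[('15000042', 19)]"))  # 15000042,19
print(pair_to_csv("not a pair"))          # None
```

Returning None makes a malformed input visible before it reaches the database.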
Another option is to negate your character class, [^\w,]+, so that it matches every character that is neither a word character nor a comma.
Then replace those characters with an empty string:
import re
test_str = "[('15000042', 19)]"
result = re.sub(r"[^\w,]+", "", test_str)
if result:
    print(result)
The problem is simple: I'm given a random string and a random pattern, and I'm told to find all the possible combinations of that pattern's characters that occur in the string and mark them with [target] at the beginning and [endtarget] at the end.
For example:
given the following text: "XuyZB8we4"
and the following pattern: "XYZAB"
The expected output would be: "[target]X[endtarget]uy[target]ZB[endtarget]8we4".
I already have the part that identifies all the matches, but I can't find a way to place the [target] and [endtarget] strings before and after each match (called match in the code).
import re
def tagger(text, search):
    place_s = "[target]"
    place_f = "[endtarget]"
    pattern = re.compile(rf"[{search}]+")
    matches = pattern.finditer(text)
    for match in matches:
        print(match)
    return test_string
test_string = "alsikjuyZB8we4 aBBe8XAZ piarBq8 Bq84Z "
pattern = "XYZAB"
print(tagger(test_string, pattern))
I also tried calling re.sub inside the for loop, but I couldn't get it to work.
for match in matches:
    re.sub(match.group(0), place_s + match.group(0) + place_f, text)
return text
Your loop has no effect because re.sub returns a new string and the result is discarded. re.sub also lets you use backreferences to matched groups in the replacement string. Enclose your pattern in parentheses (or create a named group), and it will replace every match in the string at once with your desired replacement:
In [10]: re.sub(r'([XYZAB]+)', r'[target]\1[endtarget]', test_string)
Out[10]: 'alsikjuy[target]ZB[endtarget]8we4 a[target]BB[endtarget]e8[target]XAZ[endtarget] piar[target]B[endtarget]q8 [target]B[endtarget]q84[target]Z[endtarget] '
With this approach, re.finditer is not needed at all.
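Folded back into the tagger function from the question, that could look roughly like the sketch below. The re.escape call and the default arguments are my additions, and it assumes `search` is never empty (an empty character class would raise re.error):

```python
import re

def tagger(text, search, start="[target]", end="[endtarget]"):
    # re.escape keeps characters such as ']', '^' or '-' literal inside the class.
    pattern = re.compile(f"[{re.escape(search)}]+")
    # A function replacement avoids backslash interpretation in start/end.
    return pattern.sub(lambda m: f"{start}{m.group(0)}{end}", text)

print(tagger("XuyZB8we4", "XYZAB"))
# [target]X[endtarget]uy[target]ZB[endtarget]8we4
```

The lambda receives each match object, so the whole match is available as m.group(0) without adding parentheses to the pattern.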
I have a regex pattern with optional characters; however, I want to remove those optional characters from the output. Example:
string = 'a2017a12a'
pattern = re.compile("((20[0-9]{2})(.?)(0[1-9]|1[0-2]))")
result = pattern.search(string)
print(result)
I can get a match like this, but what I want as output is:
desired output = '201712'
Thank you.
You've already captured the intended data in groups, so you can use re.sub to replace the whole match with just the contents of group 1 and group 2.
Try this modified version of your code:
import re
string = 'a2017a12a'
pattern = re.compile(".*(20[0-9]{2}).?(0[1-9]|1[0-2]).*")
result = re.sub(pattern, r'\1\2', string)
print(result)
Notice that I've added .* on both sides of the pattern, so any extra characters around your data are matched and removed. I also removed the extra parentheses that were not needed. This also works with strings that have other digits around that text, such as hello123 a2017a12a some other 99 numbers.
Output:
201712
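If you would rather not rewrite the string at all, a small sketch with search and the two groups does the same job and also tells you when nothing matched (re.sub would hand back the original string in that case). The function name `year_month` is just for illustration:

```python
import re

# Same two groups as above, without the surrounding .*
pattern = re.compile(r"(20[0-9]{2}).?(0[1-9]|1[0-2])")

def year_month(text):
    """Return year and month joined together, or None if there is no match."""
    match = pattern.search(text)
    if match is None:
        return None
    return match.group(1) + match.group(2)

print(year_month("a2017a12a"))     # 201712
print(year_month("no date here"))  # None
```

Note that search stops at the first occurrence in the string, whereas the greedy .* version keeps the last one.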
You can just use re.sub with the pattern \D (any non-digit character):
>>> import re
>>> string = 'a2017a12a'
>>> re.sub(r'\D', '', string)
'201712'
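This is fine as long as the only digits in the string are the ones you want. A quick check on a noisier string (my own example input, not from the question) shows that every digit is kept:

```python
import re

noisy = "hello123 a2017a12a some other 99 numbers"
# \D removes every non-digit, so unrelated digits are kept as well.
digits_only = re.sub(r"\D", "", noisy)
print(digits_only)  # 12320171299
```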
Try this one:
import re
string = 'a2017a12a'
pattern = re.findall(r"(\d+)", string)  # this regex captures each run of digits
print("".join(pattern))  # combine all digits
Output:
201712
If you want to remove all letters from the string, you can do this:
import re
string = 'a2017a12a'
re.sub('[A-Za-z]+','',string)
Output:
'201712'
You can use functions from the re module to get the required output, for example:
import re
# method 1
string = 'a2017a12a'
print(re.sub(r'\D', '', string))
# method 2
pattern = re.findall(r"(\d+)", string)
print("".join(pattern))
You can also refer to the documentation below for more details.
https://docs.python.org/3/library/re.html
I need a Python regular expression to extract all the occurrences of a string from a line.
So for example,
line = 'TokenRange(start_token:5835456583056758754, end_token:5867789857766669245, rack:brikbrik0),EndpointDetails(host:192.168.210.183, datacenter:DC1, rack:brikbrikadfdas), EndpointDetails(host:192.168.210.182, datacenter:DC1, rack:brikbrik1adf)])'
I want to extract all the strings that contain the rack ID. I'm not good with regex, and when I looked at the Python docs I could not find the correct way to use re.findall or a similar function.
Can someone help me with the regular expression?
Here is the output I need: [brikbrik0, brikbrikadfdas, brikbrik1adf]
You can capture the word characters that come after rack: (note that \w also matches underscores):
>>> re.findall(r"rack:(\w+)", line)
['brikbrik0', 'brikbrikadfdas', 'brikbrik1adf']
Add a word boundary before rack so that it is not matched inside a longer word:
\brack:(\w+)
See a demo on regex101.com.
In Python (demo on ideone.com):
import re
string = """TokenRange(start_token:5835456583056758754, end_token:5867789857766669245, rack:brikbrik0),EndpointDetails(host:192.168.210.183, datacenter:DC1, rack:brikbrikadfdas), EndpointDetails(host:192.168.210.182, datacenter:DC1, rack:brikbrik1adf)])"""
rx = re.compile(r'\brack:(\w+)')
matches = [match.group(1) for match in rx.finditer(string)]
print(matches)
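To see what the word boundary buys you, here is a small comparison on a made-up line (the `track:` field is hypothetical, it does not appear in the original data):

```python
import re

sample = "track:abc, rack:r1"

# Without \b, the "rack:" inside "track:" matches too.
loose = re.findall(r"rack:(\w+)", sample)
# With \b, only a standalone "rack:" key matches.
strict = re.findall(r"\brack:(\w+)", sample)

print(loose)   # ['abc', 'r1']
print(strict)  # ['r1']
```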
I need to extract people's names from a really long string.
Their names are in this format: LAST, FIRST.
Some of these people have hyphenated names. Some don't.
My attempt with a smaller string:
Input:
import re
text = 'Smith-Jones, Robert&Epson, Robert'
pattern = r'[A-Za-z]+(-[A-Za-z]+)?,\sRobert'
print(re.findall(pattern, text))
Expected output:
['Smith-Jones, Robert', 'Epson, Robert']
Actual output:
['-Jones', '']
What am I doing wrong?
Use
import re
text = 'Smith-Jones, Robert&Epson, Robert'
pattern = r'[A-Za-z]+(?:-[A-Za-z]+)?,\sRobert'
print(re.findall(pattern, text))
# => ['Smith-Jones, Robert', 'Epson, Robert']
Just make the capturing group non-capturing. findall returns the capture group values whenever the pattern contains capturing groups, so the simplest fix for this pattern is to replace (...)? with (?:...)?.
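A side-by-side sketch of that behaviour, in Python 3 syntax. The finditer variant is an extra option I'm adding for the case where you want to keep the capturing group:

```python
import re

text = "Smith-Jones, Robert&Epson, Robert"

# One capturing group: findall returns only that group's text per match.
with_group = re.findall(r"[A-Za-z]+(-[A-Za-z]+)?,\sRobert", text)
# Non-capturing group: findall returns the whole match.
without_group = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)?,\sRobert", text)
# finditer yields match objects, so group(0) is the whole match either way.
via_finditer = [m.group(0)
                for m in re.finditer(r"[A-Za-z]+(-[A-Za-z]+)?,\sRobert", text)]

print(with_group)     # ['-Jones', '']
print(without_group)  # ['Smith-Jones, Robert', 'Epson, Robert']
print(via_finditer)   # ['Smith-Jones, Robert', 'Epson, Robert']
```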
I have to validate the following string format:
text-text-id-text
The separator is the character '-'. The third column must always be id. I wrote the following regex (in Python) to validate the string:
import re
s = 'col1-col2-col3-id'  # any additional text at the end
                         # is allowed, e.g. -col4-col5
print(re.match('^(.*-){3}id(-.*)?$', s))  # ok
print(re.match('^(.*-){1}id(-.*)?$', s))  # still matches, but it should not
I tried adding non-greedy mode, but the result is still the same:
^(.*?-){1}id(-.*)?$
What am I missing in my regex? I could just validate the string like this:
>>> import re
>>> print(re.split('-', 'col1-col2-col3-id'))
['col1', 'col2', 'col3', 'id']
And then check whether the third element matches id, but I am interested in why the first regex behaves as described above.
Your first regex is incorrect because it asserts that id is present after the first three items.
Your second regex matches the string incorrectly because .* matches hyphens as well.
You should use this regex:
/^(?:[^-]+-){2}id/
Here is a regex demo!
And if you need to anchor the regex to the end of the string, use /^(?:[^-]*-){2}id.*$/!
As mentioned by Tim Pietzcker, consider asserting that id is the whole item, i.e. that it is followed by a hyphen or the end of the string:
/^(?:[^-]+-){2}id(?![^-])/
Here is an UPDATED regex demo!
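In Python the same pattern is written without the surrounding slashes. A minimal sketch, assuming items are never empty (the helper name `third_is_id` is only for illustration):

```python
import re

# Two hyphen-free items, then "id" followed by a hyphen or the end of the string.
ID_THIRD = re.compile(r"^(?:[^-]+-){2}id(?![^-])")

def third_is_id(s):
    return ID_THIRD.match(s) is not None

print(third_is_id("col1-col2-id-col4"))   # True
print(third_is_id("col1-col2-col3-id"))   # False
print(third_is_id("col1-col2-idx-col4"))  # False

# Why the original pattern matched: the greedy .* swallows hyphens too.
greedy = re.match(r"^(.*-){1}id(-.*)?$", "col1-col2-col3-id")
print(greedy.group(1))  # col1-col2-col3-
```

Because [^-]+ cannot cross a hyphen, the {2} quantifier really counts columns, which .* never did.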