This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I have a big text file and I have to find all words that start with '$' and end with ';', like $word;.
import re
text = "$h;BREWERY$h_end;You've built yourself a brewery."
x = re.findall("$..;", text)
print(x)
I want my output to be ['$h;', '$h_end;']. How can I do that?
I would do:
import re
text = "$h;BREWERY$h_end;You've built yourself a brewery."
result = re.findall(r'\$[^;]+;', text)  # raw string so \$ reaches the regex engine unchanged
print(result)
Output:
['$h;', '$h_end;']
Note that $ needs to be escaped (\$) because it is one of the special characters. Then I match one or more occurrences of anything but ; and finally ;.
You may use
\$\w+;
See the regex demo. Details:
\$ - a $ char
\w+ - 1+ letters, digits, _ (=word) chars
; - a semi-colon.
Python demo:
import re
text = "$h;BREWERY$h_end;You've built yourself a brewery."
x = re.findall(r"\$\w+;", text)
print(x) # => ['$h;', '$h_end;']
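The two answers differ in what they allow between $ and ;. Here is a minimal sketch of that difference, together with the reason the original pattern finds nothing. The string `sample` is a made-up input, not from the question:

```python
import re

text = "$h;BREWERY$h_end;You've built yourself a brewery."
sample = "$a b; $c;"  # hypothetical input with a space inside one token

# An unescaped $ is an end-of-string anchor, so nothing can follow it.
unescaped = re.findall(r"$..;", text)

# [^;]+ accepts any characters except ';', including spaces.
broad = re.findall(r"\$[^;]+;", sample)

# \w+ accepts only letters, digits and underscores.
strict = re.findall(r"\$\w+;", sample)

print(unescaped)  # []
print(broad)      # ['$a b;', '$c;']
print(strict)     # ['$c;']
```

Which pattern is preferable depends on whether tokens in the real file may contain anything other than word characters.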
This question already has answers here:
Dollar sign in regular expression and new line character
(2 answers)
Closed 5 months ago.
There is a little confusion between $ and \Z in regex. I understand the underlying concept:
\Z matches the end of the string regardless of multiline mode, whereas
$ matches the end of the string or just before "\n" in multiline mode.
import re
items = ['lovely', '1\dentist', '2 lonely', 'eden', 'fly\n', 'dent']
# res = [e for e in items if re.search(r'\Aden|ly\Z', e)]
t = re.compile(r"^den|ly$")
res = [e for e in items if t.search(e)]
print(res)
res = ['lovely', '2 lonely', 'fly\n', 'dent']
Why am I matching "fly\n"? It ends with "\n", so isn't it supposed to be ignored? Meanwhile, r"^den|ly\Z" gets me the desired result.
Note the Python documentation for the $ special character:
Matches the end of the string or just before the newline at the end of the string [emphasis added] […]
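In "fly\n", the newline is at the end of the string, so '$' can match just before it. If instead the string were "fly\n\n", then the regex would not match, because "ly" would no longer be followed by the end of the string or by a single trailing newline.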
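The behaviour described above can be checked directly. This is a small sketch using only `re.search` and the `re.MULTILINE` flag; the variable names are illustrative:

```python
import re

# Without re.MULTILINE, $ matches at the very end of the string
# or just before a single trailing newline.
dollar_one_newline = bool(re.search(r"ly$", "fly\n"))
dollar_two_newlines = bool(re.search(r"ly$", "fly\n\n"))

# \Z matches only at the very end of the string.
z_one_newline = bool(re.search(r"ly\Z", "fly\n"))
z_no_newline = bool(re.search(r"ly\Z", "fly"))

# With re.MULTILINE, $ also matches just before every newline.
multiline_two_newlines = bool(re.search(r"ly$", "fly\n\n", re.MULTILINE))

print(dollar_one_newline, dollar_two_newlines)  # True False
print(z_one_newline, z_no_newline)              # False True
print(multiline_two_newlines)                   # True
```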
This question already has answers here:
Capture contents inside curly braces
(2 answers)
Closed 3 years ago.
How can I get all the words that are enclosed in {} in a string?
For example:
my_string = "select * from abc where file_id = {some_id} and ghg='0000' and number={some_num} and date={some_dt}"
output should be like:
[some_id,some_num,some_dt]
import re
my_string = "select * from abc where file_id = {some_id} and ghg='0000' and number={some_num} and date={some_dt}"
result = re.findall(r'{(.+?)}', my_string)
print(result)
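Since it's words you are after,
import re
my_string = "select * from abc where file_id = {some_id} and ghg='0000' and number={some_num} and date={some_dt}"
ans = re.findall(r"{([A-Za-z_]+)}", my_string)
print(ans)
The character class [A-Za-z_] matches upper-case and lower-case ASCII letters plus the underscore. (The shorter range [A-z] also lets through the punctuation characters that sit between Z and a in ASCII, so it is best avoided.) [A-Za-z_]+ matches one or more such characters, and the surrounding () captures them.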
Output:
['some_id', 'some_num', 'some_dt']
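A quick sketch of why the choice of character class matters here. The string `probe` is a made-up input chosen to expose the difference, and `\w+` is shown as one further option (it also accepts digits):

```python
import re

my_string = ("select * from abc where file_id = {some_id} and ghg='0000' "
             "and number={some_num} and date={some_dt}")
probe = "a^b_c[d"  # hypothetical input containing punctuation

# [A-z] spans ASCII 65..122, which includes [ \ ] ^ _ and the backtick.
wide = re.findall(r"[A-z]+", probe)

# An explicit class keeps only letters and the underscore.
narrow = re.findall(r"[A-Za-z_]+", probe)

# \w covers letters, digits and the underscore.
words = re.findall(r"\{(\w+)\}", my_string)

print(wide)    # ['a^b_c[d']
print(narrow)  # ['a', 'b_c', 'd']
print(words)   # ['some_id', 'some_num', 'some_dt']
```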
This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 5 years ago.
I am trying to compile a regex in Python but am having limited success. I am doing the following:
import re
pattern = re.compile("([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
m = pattern.match("gb,&^(#)")
if m:
    print(1)
else:
    print(2)
I am expecting the above to print 2, but instead it prints 1. The regex should match strings as follows:
The first letter is alphanumeric or an underscore. All characters after that can be alphanumeric, an underscore, or a dash and there can be 0 or more characters after the first.
I was thinking that this thing should fail as soon as it sees the comma, but it is not.
What am I doing wrong here?
import re
# Anchoring with ^ and $ forces the whole string to match; without $ the
# pattern only has to match a prefix of the string.
pattern = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)$")
m = pattern.match("gb,&^(#)")
if m:
    print(1)
else:
    print(2)  # printed: the comma cannot be matched before the end of the string
# Your original pattern (no $) matches any string that merely starts with
# an alphanumeric character or an underscore.
pat = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
if pat.match("_#"):
    print('match')  # printed: the leading "_" alone satisfies the pattern
else:
    print('no match 1')
This will also help you understand the explanation by @Wiktor with an example.
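As an alternative to adding anchors by hand, the whole-string check can be sketched with `fullmatch`, which is available on compiled patterns in Python 3.4 and later. The sample inputs other than "gb,&^(#)" are made up for illustration:

```python
import re

pattern = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9_-]*")

# match() only requires a match at the start, so the prefix "gb" is enough.
prefix = pattern.match("gb,&^(#)")

# fullmatch() requires the entire string to match the pattern.
whole = pattern.fullmatch("gb,&^(#)")
valid = pattern.fullmatch("gb-1_x")

# $ still tolerates one trailing newline; fullmatch() does not.
dollar_newline = re.match(r"^[a-zA-Z0-9_][a-zA-Z0-9_-]*$", "gb\n")
full_newline = pattern.fullmatch("gb\n")

print(prefix.group())            # gb
print(whole)                     # None
print(valid is not None)         # True
print(dollar_newline is not None, full_newline is not None)  # True False
```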
This question already has answers here:
Non-consuming regular expression split in Python
(2 answers)
Closed 8 years ago.
I would like to split a string like the following
text="one,two;three.four:"
into the list
textOut=["one", ",two", ";three", ".four", ":"]
I have tried with
import re
textOut = re.split(r'(?=[.:,;])', text)
But this does not split anything.
I would use re.findall here instead of re.split:
>>> from re import findall
>>> text = "one,two;three.four:"
>>> findall(r"(?:^|\W)\w*", text)
['one', ',two', ';three', '.four', ':']
>>>
Below is a breakdown of the Regex pattern used above:
(?: # The start of a non-capturing group
^|\W # The start of the string or a non-word character (symbol)
) # The end of the non-capturing group
\w* # Zero or more word characters (characters that are not symbols)
For more information, see here.
I don't know what else can occur in your string, but will this do the trick?
>>> s='one,two;three.four:'
>>> [x for x in re.findall(r'[.,;:]?\w*', s) if x]
['one', ',two', ';three', '.four', ':']
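For readers on newer interpreters: assuming Python 3.7 or later, where `re.split` accepts patterns that can match an empty string, the lookahead from the question should work as written. A minimal sketch:

```python
import re

text = "one,two;three.four:"

# The lookahead matches the empty position just before each punctuation
# character, so the punctuation stays attached to the following piece.
parts = re.split(r"(?=[.:,;])", text)

print(parts)  # ['one', ',two', ';three', '.four', ':']
```

On older versions, splitting on an empty match is not supported, which is why the findall approaches above were needed.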
This question already has answers here:
Split a string by spaces -- preserving quoted substrings -- in Python
(16 answers)
Closed 7 years ago.
Using Python, I want to split the following string:
a=foo, b=bar, c="foo, bar", d=false, e="false"
This should result in the following list:
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"']
When using shlex in posix-mode and splitting with ", ", the argument for c gets treated correctly. However, it removes the quotes. I need them because false is not the same as "false", for instance.
My code so far:
import shlex
mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
splitter = shlex.shlex(mystring, posix=True)
splitter.whitespace += ','
splitter.whitespace_split = True
print(list(splitter)) # ['a=foo', 'b=bar', 'c=foo, bar', 'd=false', 'e=false']
>>> import re
>>> s = r'a=foo, b=bar, c="foo, bar", d=false, e="false", f="foo\", bar"'
>>> re.findall(r'(?:[^\s,"]|"(?:\\.|[^"])*")+', s)
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"', 'f="foo\\", bar"']
The regex pattern "[^"]*" matches a simple quoted string.
"(?:\\.|[^"])*" matches a quoted string and skips over escaped quotes because \\. consumes two characters: a backslash and any character.
[^\s,"] matches a non-delimiter.
Combining patterns 2 and 3 inside (?: | )+ matches a sequence of non-delimiters and quoted strings, which is the desired result.
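The combined pattern can be wrapped in a small helper, sketched below. The function name `split_pairs` and the second input string are assumptions made for illustration:

```python
import re

# A token is a run of non-delimiter characters and/or quoted strings;
# inside quotes, a backslash escapes the next character.
TOKEN = re.compile(r'(?:[^\s,"]|"(?:\\.|[^"])*")+')

def split_pairs(line):
    """Split on commas and whitespace, keeping quoted parts intact."""
    return TOKEN.findall(line)

mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
pairs = split_pairs(mystring)

# Hypothetical input with an escaped quote inside a quoted value.
escaped = split_pairs(r'f="foo\", bar", g=1')

print(pairs)
print(escaped)
```

Unlike the shlex approach in posix mode, this keeps the quotes in each token, so false and "false" stay distinguishable.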