Closed. This question needs debugging details. It is not currently accepting answers.
Closed 3 years ago.
I want to extract all the text that comes after "AAAAAAAAAAAAAAAAAA" in the following:
Give me some text!
AAAAAAAAAAAAAAAAAA
S
p
p
p
Epppp
The following does not work:
import re
m = re.findall(r'AAAAAAAAAAAAAAAAAA(.*)', result)
print(m[0])
Also, can I use a variable in a regular expression instead of the hard-coded string "AAAAAAAAAAAAAAAAAA"?
The reason is that the text "AAAAAAAAAAAAAAAAAA" is stored in a variable and changes, so I would like to search for the variable's current value in the pattern and then extract all the text after it.
Use re.S or re.DOTALL (they are synonyms) so that . also matches newline characters and the match can continue across lines. In your case, re.search is probably more appropriate than findall, since you only want one match. To use a non-hard-coded string, simply use string formatting or string concatenation, and run the string through re.escape so that any regex metacharacters in it are matched literally.
import re
result = """Give me some text!
AAAAAAAAAAAAAAAAAA
S
p
p
p
Epppp"""
s = 'AAAAAAAAAAAAAAAAAA'
# With formatting
m = re.search(r'{}(.*)'.format(re.escape(s)), result, re.S)
# With concatenation
m = re.search(re.escape(s) + r'(.*)', result, re.S)
print(m.group(1))
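A minimal sketch wrapping the same idea in a helper, assuming a hypothetical marker "[id=1.5]" (chosen because [, . and ] are regex metacharacters, so re.escape actually matters). It also shows what happens without re.S: the capture stops at the first newline.

```python
import re

def text_after(marker, text):
    """Return everything after the first occurrence of marker, or None if absent."""
    # re.escape escapes regex metacharacters in the marker ('[', '.', '+', ...).
    # re.S (re.DOTALL) lets '.' match newlines too, so (.*) runs to the end of text.
    m = re.search(re.escape(marker) + r'(.*)', text, re.S)
    return m.group(1) if m else None

sample = "header\n[id=1.5]\nline one\nline two"  # hypothetical input
print(repr(text_after("[id=1.5]", sample)))  # '\nline one\nline two'

# Without re.S, '.' stops at the first newline; here the line ends right
# after the marker, so the captured group is empty.
m = re.search(re.escape("[id=1.5]") + r'(.*)', sample)
print(repr(m.group(1)))  # ''
```

The helper returns None instead of raising when the marker is missing, which is usually easier to handle than an AttributeError from calling .group on None.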
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 months ago.
I have a text containing a URL that needs to be reworked.
text='dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
I need to programmatically replace the id value (in this example 1812, which is not known before execution) with a fixed substring (e.g. 189), so the end result must be
'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}'
As I'm programming in Python, I guess I should use a regular expression (the re module) to replace the value between "id": and }}, but I couldn't find one that works for this use case.
I assume you are always generating the same URL with that pattern, and the value to change is always in {"id":X}. One way to solve this particular problem is with a positive lookbehind and re.sub.
import re
pattern = re.compile(r"(?<=\"id\":)\d+")
string = "dfs:/?url=https://myserver/c12&ofg={\"tes\":{\"id\":1812}}"
print(pattern.sub("desired_value", string))
The output contains desired_value in place of 1812. regex101 gives a detailed explanation of the pattern, but in short:
(?<="id":) is a positive lookbehind: it requires "id": immediately before the current position without consuming any characters, and \d+ then matches one or more digits.
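A short sketch applying the lookbehind to the question's string with the fixed value 189, plus a capture-group variant. The variant is an alternative technique, not part of the answer above: Python's re only supports fixed-width lookbehind, so a pattern such as (?<="id":\s*) is rejected, and capturing the prefix is one way to tolerate optional spaces after the colon (the spaced input below is hypothetical).

```python
import re

text = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
new_id = 189  # fixed replacement value from the question

# Lookbehind: replace only the digits that are directly preceded by "id":
replaced = re.sub(r'(?<="id":)\d+', str(new_id), text)
print(replaced)  # dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}

# Capture-group variant: keep the "id": prefix (plus any spaces) in group 1
# and write it back with \g<1>, which stays unambiguous even before digits.
spaced = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id": 1812}}'
replaced_spaced = re.sub(r'("id":\s*)\d+', rf'\g<1>{new_id}', spaced)
print(replaced_spaced)  # dfs:/?url=https://myserver/c12&ofg={"tes":{"id": 189}}
```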
What about simply splitting the string twice? For example:
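my_string = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
prefix, rest = my_string.split('"id":', 1)   # text before / after "id":
old_id, suffix = rest.split('}}', 1)         # old_id == '1812'
# Rebuild around the new value instead of calling replace(), which would
# also change any other occurrence of the old id elsewhere in the URL.
print(prefix + '"id":' + '189' + '}}' + suffix)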
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
Example strings:
"abcddomain_rgz.png"
"djhajhdomain_rgb1.png"
I want to replace domain*.png in the strings above with "domain.json".
Expected results:
"abcddomain.json"
"djhajhdomain.json"
This is a typical case for a regex, as mentioned in the comments. Since you do not know the exact length of the text between domain and .png, you need a regular expression to perform the replacement.
Python provides the re module, whose sub function performs the replacement:
import re
string = "djhajhdomain_rgb1.png"
result = re.sub(r"domain.*\.png", "domain.json", string)  # \. matches a literal dot
print(result)
This will return:
djhajhdomain.json
Use a Python regex instead (the re module):
re.sub(r'domain.*\.png$', r"domain.json", 'djhajhdomain_rgb1.png')
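A minimal sketch of why the escaped dot and the $ anchor matter, applied to both example names from the question plus a hypothetical name "abcdomain_xpng" that has no real .png extension. The loose pattern with an unescaped dot lets . match the "x" and rewrites that name anyway.

```python
import re

# "domain", then anything, then a literal ".png" at the very end of the name.
PATTERN = re.compile(r'domain.*\.png$')

def to_json_name(name):
    """Replace the 'domain....png' tail with 'domain.json'; other names pass through."""
    return PATTERN.sub('domain.json', name)

names = ["abcddomain_rgz.png", "djhajhdomain_rgb1.png", "abcdomain_xpng"]

strict = [to_json_name(n) for n in names]
print(strict)  # ['abcddomain.json', 'djhajhdomain.json', 'abcdomain_xpng']

# Unescaped '.' matches any character, so "_xpng" is treated like ".png".
loose = [re.sub("domain(.*).png", "domain.json", n) for n in names]
print(loose)   # ['abcddomain.json', 'djhajhdomain.json', 'abcdomain.json']
```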
Your best bet here would be a regex.
import re
x = "djhajhdomain_rgb1.png"
y = "djhajhdomain.json"
pattern = re.compile(r'\w+domain')
ext = '.json'
match = pattern.match(x).group(0)
result = match + ext
assert result == y
Import the re module.
Compile a pattern to search the string. (Note that the pattern only accepts letters, digits, and underscores before the literal string "domain", and requires at least one of them.)
Set a predefined string extension.
Use the compiled pattern to match the start of the string.
Concatenate the matched text and the extension.
Confirm that the result matches the desired output.
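One caveat with these steps: match returns None when the string does not start with the pattern, so calling .group(0) directly raises AttributeError. For example, a hypothetical name "my-domain_x.png" fails because "-" is not a \w character. A guarded sketch of the same steps, assuming unmatched names should simply be returned unchanged:

```python
import re

PATTERN = re.compile(r'\w+domain')
EXT = '.json'

def replace_extension(name):
    """Return '<prefix>domain.json', or name unchanged if the pattern does not match."""
    m = PATTERN.match(name)  # anchored at the start of the string
    if m is None:
        return name
    return m.group(0) + EXT

print(replace_extension("djhajhdomain_rgb1.png"))  # djhajhdomain.json
print(replace_extension("my-domain_x.png"))        # my-domain_x.png
```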
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
text = {
1
title(1)
context(1)
2
title(2)
context(2)
...
n
title(n)
context(n)
}
I would like to get the last value [n] in a text file, either by reading only the numeric lines or by taking the maximum value [n] in that column; any other approach is also welcome, and I would appreciate an explanation. The context can span multiple lines and can contain large numbers, so please exclude the context lines from the calculation.
Because I am a beginner, I would really appreciate it if you explained it with code (functions) rather than only in words.
If the only lines that consist of a single number are the ones you are interested in, you can use the regular expression library re.
Assuming that text contains your full text as a string:
import re
all_numbers = re.findall(r'(?m)^\d+$', text)
last_number = int(all_numbers[-1])
highest_number = max(int(n) for n in all_numbers)
A quick explanation of the regular expression r'(?m)^\d+$':
(?m) sets the re.M (re.MULTILINE) flag, so that ^ and $ apply to each line of text separately
^ normally matches the beginning of the whole string, but with the re.M flag, it matches the beginning of a line
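\d+ matches one or more decimal digits; for ASCII input this is equivalent to [0-9]+ (in Python 3, \d also matches other Unicode decimal digits unless the re.ASCII flag is set)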
$ normally matches the end of the whole string, but with the re.M flag, it matches the end of a line
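A runnable sketch on a hypothetical sample in the question's layout (number line, title, multi-line context containing large numbers). It shows that numbers embedded in context lines are skipped, and that without (?m) nothing matches. As the answer notes, a context line consisting only of digits would still be counted.

```python
import re

# Hypothetical sample: number, title, then one or more context lines.
text = """1
First title
Context that mentions 98765 in the middle of a line.
2
Second title
Context line one
context line two with 12345
3
Third title
Short context"""

# (?m): ^ and $ match at every line boundary, so only lines made up entirely
# of digits are captured; numbers inside context lines are ignored.
all_numbers = re.findall(r'(?m)^\d+$', text)
last_number = int(all_numbers[-1])
highest_number = max(int(n) for n in all_numbers)
print(all_numbers, last_number, highest_number)  # ['1', '2', '3'] 3 3

# Without (?m), ^ and $ only match at the start and end of the whole string.
no_flag = re.findall(r'^\d+$', text)
print(no_flag)  # []
```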
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
I would like to extract some strings from between quotes using a regular expression. The text is shown below:
CCKeyUpDomReady('test.asmx/asdasd', 'QMlPJZTOH09XOPCcbB2jcg==', '0OO6h+G2Tzhr5XWj1Upg0A==', '0OO6h+G2Tzhr5XWj1Upg0A==', '/qqwweq2.asmx/qqq')
Expected result must be:
test.asmx/asdasd
/qqwweq2.asmx/qqq
How can I do it? Here is a page for testing:
https://regexr.com/3n142
The criterion: the string between the quotes must contain the word "asmx". The real text is much longer than shown above; think of it as searching for asmx URLs in a website's source code.
'((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*)'
' Match this literally
((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*) Capture the following into capture group 1
(?:[^'\\]|\\.)* This trick comes from PhiLho's answer to "Regex for quoted string with escaping quotes". It repeatedly matches either any character other than ' or \, or a backslash followed by any character (so an escaped \' does not end the string).
asmx The OP's search string/criterion
(?:[^'\\]|\\.)* Same as above
' Match this literally
The results are in capture group 1:
test.asmx/asdasd
/qqwweq2.asmx/qqq
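A sketch of the same pattern used from Python on the question's sample line (written as a raw string so the backslashes reach the regex engine unchanged). Because the pattern has exactly one capture group, re.findall returns just the group contents.

```python
import re

# Single-quoted string containing "asmx"; backslash-escaped characters
# (such as \') are allowed inside the quotes.
ASMX_IN_QUOTES = re.compile(r"'((?:[^'\\]|\\.)*asmx(?:[^'\\]|\\.)*)'")

source = ("CCKeyUpDomReady('test.asmx/asdasd', 'QMlPJZTOH09XOPCcbB2jcg==', "
          "'0OO6h+G2Tzhr5XWj1Upg0A==', '0OO6h+G2Tzhr5XWj1Upg0A==', '/qqwweq2.asmx/qqq')")

urls = ASMX_IN_QUOTES.findall(source)
print(urls)  # ['test.asmx/asdasd', '/qqwweq2.asmx/qqq']
```

On a hypothetical input such as f('it\'s.asmx/x'), the escaped quote stays inside the match and the captured text is it\'s.asmx/x.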
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I have the following link:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg
How can I keep just this part of the link:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
and remove everything else? I also want to keep the extension.
I want to remove this part:
._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_
and keep this part:
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
How can I do this in python?
You could use:
re.sub(r'\._[\w.,-]*(\.(?:jpg|png|gif))$', r'\1', inputurl)
This makes some assumptions but works on your input. The match starts at the ._ sequence, then takes any run of letters, digits, underscores, dashes, dots, or commas after it, and finally matches the extension. I picked an explicit small group of possible extensions; you could instead use (\.\w+)$ at the end to accept any extension made of word characters.
Demo:
>>> import re
>>> inputurl = 'http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg'
>>> re.sub(r'\._[\w.,-]*(\.(?:jpg|png|gif))$', r'\1', inputurl)
'http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg'
url = "http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg"
l = url.split(".")
print(".".join(l[:-2]) + ".{}".format(l[-1]))
prints
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
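The split approach assumes there is exactly one extra dot-separated segment between the image ID and the extension. A quick check with a hypothetical URL that has no such segment shows what happens when that assumption does not hold: the host name gets truncated.

```python
def strip_by_split(url):
    # Split on every dot, drop the second-to-last piece (assumed to be the
    # size/overlay segment), and reattach the last piece (the extension).
    parts = url.split(".")
    return ".".join(parts[:-2]) + "." + parts[-1]

amazon = ("http://ecx.images-amazon.com/images/I/51JXXb2vpDL."
          "_SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg")
plain = "http://example.com/images/I/51JXXb2vpDL.jpg"  # hypothetical URL

print(strip_by_split(amazon))  # http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
print(strip_by_split(plain))   # http://example.jpg (assumption violated)
```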
The following should work:
import re
url = "http://ecx.images-amazon.com/images/I/51JXXb2vpDL._SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg"
print(re.sub(r"(https?://.+?)\._.+(\.\w+)", r'\1\2', url))
The above code prints
http://ecx.images-amazon.com/images/I/51JXXb2vpDL.jpg
An important caveat: more example links would be needed to determine the correct pattern. I'm currently assuming you want everything up to the first ._ sequence.
url = re.sub(r"(/[^./]+)\.[^/]*?(\.[^.]+)$", r"\1\2", url)
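The last pattern keeps "/name" (a final path segment without dots), drops the following ".anything" as long as it contains no slash, and reattaches the final ".ext". A sketch comparing it with the first answer's pattern, using the question's URL plus a hypothetical URL that has no size segment. Both patterns should leave that second URL unchanged.

```python
import re

def strip_with_underscore_marker(url):
    # First answer: remove "._..." up to a known image extension at the end.
    return re.sub(r'\._[\w.,-]*(\.(?:jpg|png|gif))$', r'\1', url)

def strip_last_segment_suffix(url):
    # Last answer: keep "/name" (no dots), drop ".anything" without slashes,
    # then keep the final ".ext".
    return re.sub(r'(/[^./]+)\.[^/]*?(\.[^.]+)$', r'\1\2', url)

amazon = ("http://ecx.images-amazon.com/images/I/51JXXb2vpDL."
          "_SY344_PJlook-inside-v2,TopRight,1,0_SH20_BO1,204,203,200_.jpg")
plain = "http://example.com/images/I/51JXXb2vpDL.jpg"  # hypothetical URL

for u in (amazon, plain):
    print(strip_with_underscore_marker(u))
    print(strip_last_segment_suffix(u))
```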