Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 months ago.
I have a text containing a URL that needs to be reworked.
text='dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
I need to programmatically replace the id value (in this example 1812, which is unknown before execution) with a fixed substring (e.g. 189), so the end result must be
'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}'
As I'm programming in Python, I guess I should use a regular expression (the re module) to automatically replace the value between "id": and }}, but I couldn't find one that works for this use case.
I assume you always generate the URL with the same pattern, and that the value to change is always in {"id":X}. One way to solve this particular problem is a positive lookbehind combined with re.sub.
import re
pattern = re.compile(r"(?<=\"id\":)\d+")
string = "dfs:/?url=https://myserver/c12&ofg={\"tes\":{\"id\":1812}}"
print(pattern.sub("desired_value", string))
The output will contain desired_value in place of 1812. regex101 gives a detailed explanation of the pattern, but here is a quick recap:
It matches one or more digits only if they are immediately preceded by "id":; the lookbehind checks this without consuming any characters.
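Python only supports fixed-width lookbehinds, so (?<="id":) breaks if the JSON ever contains a space after the colon. A minimal sketch of a variant, assuming that spacing may vary and that only the first "id" should change (replace_id and ID_PATTERN are hypothetical names), captures the prefix in a group and writes it back instead of looking behind:

```python
import re

# Hypothetical pattern: group 1 captures '"id":' plus any whitespace,
# and the digits after it are what gets replaced.
ID_PATTERN = re.compile(r'("id":\s*)\d+')

def replace_id(text: str, new_id: str) -> str:
    # count=1 assumes only the first "id" in the string should change.
    # A callable replacement returns new_id literally, so backslashes or
    # digits in new_id are never treated as group references.
    return ID_PATTERN.sub(lambda m: m.group(1) + new_id, text, count=1)

text = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
print(replace_id(text, "189"))  # dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}
```

The same call also turns {"tes": {"id": 42}} into {"tes": {"id": 189}}, which the fixed-width lookbehind would not match.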
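What about simply splitting the string twice? For example:
my_string = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
# Take everything after the first '"id":' ...
substring = my_string.split('"id":', 1)[1]
# ... and keep only the part before the closing '}}', i.e. the id itself
substring = substring.split('}}')[0]
# Replace the id together with its surrounding context, so the same digits elsewhere in the string are left alone
print(my_string.replace('"id":' + substring + '}}', '"id":189}}'))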
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 27 days ago.
This might be confusing, and I don't know if it's even possible. I have some knowledge of regex in Python, but I couldn't solve this issue.
I have logs in which I want to replace the equal signs in the URL with their URL encoding (%3D) using a regex.
For example, I have logs like these:
request=www.google.com/fgdsg=gfsdg=gfdsd
request_access=https://regex101.com/eawf/?=dasf
I want to match every equal sign after the first one (the one that assigns the value to the variable) and replace it with %3D, like this:
request=www.google.com/fgdsg%3Dgfsdg%3Dgfdsd
request_access=https://regex101.com/eawf/?%3Ddasf
This is the result I want.
The text above is only an example; these are not real logs.
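You can use a negative lookbehind: (?<!request)= matches every = that is not immediately preceded by request. Note that this only protects the first = when the key is exactly request; for a key such as request_access, the first = would be replaced as well. For example: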
import re
sample = "request=www.google.com/fgdsg=gfsdg=gfdsd"
replaced = re.sub(r'(?<!request)=', '%3D', sample)
print(replaced)  # request=www.google.com/fgdsg%3Dgfsdg%3Dgfdsd
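Because Python lookbehinds must be fixed-width, they cannot express "any key followed by =". A minimal sketch that works for any key, assuming each log line has the form key=value and the key itself contains no = (encode_extra_equals is a hypothetical name), keeps everything up to the first = and encodes the rest of the line:

```python
import re

def encode_extra_equals(log_text: str) -> str:
    """Replace every '=' after the first one on each line with '%3D'."""
    # Group 1: the key and its first '=' (kept as-is).
    # Group 2: the rest of the line, whose '=' signs get encoded.
    # re.MULTILINE makes ^ and $ apply to each line of log_text.
    pattern = re.compile(r'^([^=\n]*=)(.*)$', re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + m.group(2).replace('=', '%3D'), log_text)

logs = """request=www.google.com/fgdsg=gfsdg=gfdsd
request_access=https://regex101.com/eawf/?=dasf"""

print(encode_extra_equals(logs))
# request=www.google.com/fgdsg%3Dgfsdg%3Dgfdsd
# request_access=https://regex101.com/eawf/?%3Ddasf
```

Lines that contain no = at all do not match the pattern and are left unchanged.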
First, remove everything up to and including the first =, leaving only the request URL. Then you can simply substitute every remaining = with %3D in this string.
import re
log = "request=www.google.com/fgdsg=gfsdg=gfdsd"
request_url = re.sub(r'^\w+=', '', log, count=1)  # passing count positionally is deprecated since Python 3.13
urlencoded_request_url = re.sub(r'=', '%3D', request_url)
print(urlencoded_request_url) # www.google.com/fgdsg%3Dgfsdg%3Dgfdsd
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I have a string that might have any of the following formats (examples):
1111__1111
1111__1111_11
111_11A_11
I have added the following check:
import re
print(bool(re.match(r"\d__\d", "1111_1111")))
print(bool(re.match(r"\d__\d_\d", "1111_1111_11")))
print(bool(re.match(r"\d_\d[A-Za-z]_\d", "111_11A_11")))
I don't think the regex is correct, because when I introduce a character in the first regex, for example, it always returns True.
Can you please point me to a solution?
Thank you
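It returns True because re.match only anchors the pattern at the start of the string: each \d matches a single digit, and anything after the matched part is ignored. Anchoring the end with $ and using quantifiers such as \d{4} makes the match exact.
The following regular expressions find exact matches for the three scenarios: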
print(bool(re.match(r"(^\d{4}__\d{4}$)", "1111__1111")))
print(bool(re.match(r"(^\d{4}_\d{4}_\d{2}$)", "1111_1111_11")))
print(bool(re.match(r"(^\d{3}_\d{2}[A-Z]_\d{2}$)", "111_11A_11")))
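An alternative sketch uses re.fullmatch, which requires the whole string to match, so no ^ or $ is needed. It assumes the digit counts and underscores are exactly those in the question's three example formats; FORMATS and matches_any_format are hypothetical names:

```python
import re

# One compiled pattern per example format from the question.
FORMATS = [
    re.compile(r"\d{4}__\d{4}"),               # 1111__1111
    re.compile(r"\d{4}__\d{4}_\d{2}"),         # 1111__1111_11
    re.compile(r"\d{3}_\d{2}[A-Za-z]_\d{2}"),  # 111_11A_11
]

def matches_any_format(s: str) -> bool:
    # fullmatch succeeds only if the pattern covers the entire string
    return any(p.fullmatch(s) for p in FORMATS)

for candidate in ["1111__1111", "1111__1111_11", "111_11A_11", "1111__1111x"]:
    print(candidate, matches_any_format(candidate))
```

Keeping one pattern per format makes it easy to report which format matched, if that is ever needed.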
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
text = {
1
title(1)
context(1)
2
title(2)
context(2)
...
n
title(n)
context(n)
}
I want to get the last value [n] in a text file, either by reading only the numeric lines or by getting the maximum value [n] in the whole column; any other way is welcome too, and I would appreciate any explanation. The context can span multiple lines and can contain large numbers, so please exclude the context lines from the calculation.
Because I am a beginner, I would really appreciate it if you could explain it with a function rather than in words.
If the lines consisting of a single number are exactly the ones you are interested in, you can use the regular expression library re.
Assuming that text contains your full text as a string:
import re
all_numbers = re.findall(r'(?m)^\d+$', text)
last_number = int(all_numbers[-1])
highest_number = max(int(n) for n in all_numbers)
A quick explanation of the regular expression r'(?m)^\d+$':
(?m) sets the re.M[ULTILINE] flag, so that lines in text are treated separately
^ normally matches the beginning of the whole string, but with the re.M flag, it matches the beginning of a line
\d+ matches one or more decimal digits, equivalent to [0-9]+ for ASCII input (in str patterns, \d also matches other Unicode decimal digits)
$ normally matches the end of the whole string, but with the re.M flag, it matches the end of a line
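Since a function was requested, here is a minimal sketch wrapping the same regex, assuming (as the answer does) that no context line consists solely of digits; last_and_max_index is a hypothetical name:

```python
import re

def last_and_max_index(text: str) -> tuple[int, int]:
    """Return (last, highest) among the lines that consist only of digits."""
    # (?m)^\d+$ matches whole lines made of digits, one match per line
    numbers = [int(n) for n in re.findall(r'(?m)^\d+$', text)]
    if not numbers:
        raise ValueError("no numeric lines found")
    return numbers[-1], max(numbers)

sample = """1
title(1)
context(1) mentions 2024
2
title(2)
context(2)
spanning 99999 lines
3
title(3)
context(3)"""

print(last_and_max_index(sample))  # (3, 3)
```

To use it on a file, pass the file's contents, e.g. the result of open(path, encoding="utf-8").read(); text mode also normalizes Windows line endings so that $ behaves as expected.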
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a list of strings with filenames. The filenames follow a specific naming format:
string1_YYYYMMDD_HHMMSS_string2
Here YYYYMMDD and HHMMSS are actual date and time values.
I want to delete all characters that appear after 'string1' for each of the entries. I've been trying this with regex but to no avail. Could anyone help me with this?
You don't need a regex; just split on the first underscore:
s = 'string1_YYYYMMDD_HHMMSS_string2'
print(s.split('_', 1)[0])  # string1
[edit]:
If you can only rely on the last parts ('_YYYYMMDD_HHMMSS_string2'), try indexing like this:
s = 's_t_r_i_n_g_1_YYYYMMDD_HHMMSS_string2'
print('_'.join(s.split('_')[:-3]))  # s_t_r_i_n_g_1
Using regex:
import re
s = 'string1_YYYYMMDD_HHMMSS_string2'
newstr = re.sub('_.*', '', s)
print(newstr)
Notes:
_.* matches the first _ and every character after it.
re.sub(p, r, s) searches s for p and replaces all matches with r.
Update #1
string1 may contain additional underscores. I'd like to retain all of string1 and only get rid of the trailing pattern.
In this case you can use the following regex:
_\d{8}_\d{6}_.*
Demo: https://regex101.com/r/jS2gL5/1
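Applied to a list of filenames, the regex from the update could look like the following sketch. The filenames are hypothetical examples, and it assumes the date and time parts are always exactly 8 and 6 digits:

```python
import re

# _YYYYMMDD_HHMMSS_ followed by anything (string2) up to the end
SUFFIX = re.compile(r'_\d{8}_\d{6}_.*')

filenames = ['my_file_20190131_235959_report', 'data_20200101_000000_x']  # hypothetical
prefixes = [SUFFIX.sub('', name) for name in filenames]
print(prefixes)  # ['my_file', 'data']
```

Because the suffix must start with _ followed by eight digits, underscores inside string1 (as in my_file) are kept.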
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I need to write a regex that returns True if there is exactly one '+' between two words and nothing else. I have written code that checks whether there is only one '+' in the string, but how can I check that it is between two words?
The code is below:
import re

inputStr = "ali+ahmedafaw+"
inputStr2 = "hello+world+again"
plus = re.findall(r'[+]', inputStr)
print(plus)
l_plus = len(plus)
print("The length is", l_plus)
if l_plus <= 1:
    print("True")
else:
    print("False")
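Actually, it depends on what you mean by a word. If a word is one or more letters, you can simply put [a-zA-Z]+ on both sides of the + character, or use other character classes such as \w to match word characters. Use re.fullmatch so that the whole string must have this shape; re.search would also accept hello+world+again, because it only needs to find a matching substring.
re.fullmatch(r'[a-zA-Z]+\+[a-zA-Z]+', input_str)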
But if you just want to make sure the + doesn't appear at the start or end of your text, you can use negative lookarounds:
re.search(r'(?<!^)\+(?!$)', input_str)
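To illustrate why matching the whole string matters for the "nothing else" requirement, this sketch runs the question's inputs through both re.search and re.fullmatch with the same pattern. It assumes a word means one or more ASCII letters; WORD_PLUS_WORD is a hypothetical name:

```python
import re

WORD_PLUS_WORD = r'[a-zA-Z]+\+[a-zA-Z]+'  # "word" assumed to be one or more ASCII letters

for s in ["ali+ahmedafaw+", "hello+world+again", "hello+world"]:
    found = re.search(WORD_PLUS_WORD, s) is not None     # a matching substring anywhere
    exact = re.fullmatch(WORD_PLUS_WORD, s) is not None  # the entire string must match
    print(f"{s!r}: search={found}, fullmatch={exact}")
```

Only 'hello+world' passes fullmatch, while search accepts all three inputs because each contains a word+word substring.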