replace wildcard numbers in pattern with additional text + same numbers - python

I need to find all parts of a large text string in this particular pattern:
"\t\t" + number (between 1-999) + "\t\t"
and then replace each occurrence with:
TEXT+"\t\t"+same number+"\t\t"
So, the end result is:
'TEXT\t\t24\t\tblah blah blahTEXT\t\t56\t\t'... and so on...
The various numbers are between 1-999 so it needs some kind of wildcard.
Please can somebody show me how to do it? Thanks!

You'll want to use Python's re library, and in particular the re.sub function:
import re # re is Python's regex library
SAMPLE_TEXT = "\t\t45\t\tbsadfd\t\t839\t\tds532\t\t0\t\t" # Test text to run the regex on
# Run the regex using re.sub (for substitute)
# re.sub takes three arguments: the regex expression,
# a function to return the substituted text,
# and the text you're running the regex on.
# The regex looks for substrings of the form:
# Two tabs ("\t\t"), followed by one to three digits 0-9 ("[0-9]{1,3}"),
# followed by two more tabs.
# The lambda function takes in a match object x,
# and returns the full text of that object (x.group(0))
# with "TEXT" prepended.
output = re.sub("\t\t[0-9]{1,3}\t\t",
                lambda x: "TEXT" + x.group(0),
                SAMPLE_TEXT)
print(output) # Print the resulting string.
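The pattern above also accepts 0 and zero-padded values such as 007. If the numbers really must fall between 1 and 999, a slightly stricter variant could be sketched as follows. The names PATTERN and prefix_numbers and the prefix parameter are made up for this illustration, and the sketch assumes the numbers are written without leading zeros.

```python
import re

SAMPLE_TEXT = "\t\t45\t\tbsadfd\t\t839\t\tds532\t\t0\t\t"

# [1-9][0-9]{0,2}: a non-zero digit followed by up to two more digits (1-999).
# In a raw string, \t is passed to the regex engine, which reads it as a tab.
PATTERN = re.compile(r"\t\t[1-9][0-9]{0,2}\t\t")

def prefix_numbers(text, prefix="TEXT"):
    """Hypothetical helper: put `prefix` in front of every tab-delimited number."""
    # A function replacement avoids backslash-escape processing of `prefix`.
    return PATTERN.sub(lambda m: prefix + m.group(0), text)

result = prefix_numbers(SAMPLE_TEXT)
print(repr(result))
```

Because each match consumes its trailing tabs, two numbers that share a delimiter (for example "\t\t1\t\t2\t\t") would only have the first one prefixed.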

Related

how to find value in (txt) python string

I'm new to Python and I'm trying to extract a value from text. I tried to find the keyword with re.search('keyword'), but I want to get the value after the keyword.
text = "word1:1434, word2:4446, word3:7171"
I just want to get the value of word1.
I tried:
keyword = 'word1'
before_keyword, keyword, after_keyword = text.partition(keyword)
print(after_keyword)
Output:
:1434, word2:4446, word3:7171
I just want to get the value of word1 (1434).
Here is how you can search the text using regular expressions:
import re
keyword_regex = r'word1:(\d+)'
text = "word1:1434, word2:4446, word3:7171"
keyword_value = re.search(keyword_regex, text)
print(keyword_value.group(1))
The RegEx word1:(\d+) searches for the string word1: followed by one or more digits. It stops matching when the next character is not a digit. The parentheses around (\d+) make this part a capturing group which is what enables you to access it later using keyword_value.group(1).
For more on regular expressions, see the documentation for Python's re module.
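If the keyword is not fixed, the same idea can be wrapped in a small function. This is only a sketch: value_after is a hypothetical name, and it assumes the value always consists of digits directly after a colon.

```python
import re

def value_after(keyword, text):
    """Hypothetical helper: return the digits following 'keyword:' or None."""
    # re.escape guards against regex metacharacters in the keyword;
    # \b keeps 'word1' from matching inside e.g. 'myword1'.
    match = re.search(r"\b" + re.escape(keyword) + r":(\d+)", text)
    return match.group(1) if match else None

text = "word1:1434, word2:4446, word3:7171"
print(value_after("word1", text))
print(value_after("word9", text))
```

Checking the result of re.search before calling .group matters here, because re.search returns None when the keyword is absent.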
Assuming the text input is a string rather than a dict:
text = "word1:1434, word2:4446, word3:7171"
keyword = 'word1'
print(text.split(keyword+":")[1].split(",")[0])
Hope this helps...
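If several values are needed, the split approach could be extended to build a dict once and look keys up afterwards. A minimal sketch, assuming every comma-separated item contains exactly one key and a colon (the name pairs is invented here):

```python
text = "word1:1434, word2:4446, word3:7171"

# Split into "key:value" items, then split each item once on the colon.
pairs = dict(item.strip().split(":", 1) for item in text.split(","))

print(pairs["word1"])
```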

How to get all the string after and before two specific words?

I want to replace everything after "my;encoded;image:" (which is the base64 data of the image) and stop before the word "END", but the following code also replaces the two strings "my;encoded;image:" and "END". Any suggestions?
import re
re.sub("my;encoded;image:.*END","random_words",image,flags=re.DOTALL)
NB: a simple way could be plain string replacement, but I want to use a regex in my case. Thanks.
You can use a non-greedy regex to split the string into three groups. Then replace the second group with your string:
import re
x = re.sub(r'(.*my;encoded;image:)(.*?)(END.*)', r"\1my string\3", image, flags=re.DOTALL)
print(x)
You can use f-strings with Python 3.6 and higher:
replacement = "hello"
x = re.sub(r'(.*my;encoded;image:)(.*?)(END.*)', fr'\1{replacement}\3', image, flags=re.DOTALL)
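One thing to watch with a replacement template: if the new text starts with a digit or contains a backslash, it can be misread as part of a group reference. A function replacement sidesteps that. The sketch below is only an illustration; replace_between and the sample image string are invented, not part of the original question.

```python
import re

def replace_between(text, start, end, replacement):
    """Hypothetical helper: swap whatever lies between `start` and `end`."""
    pattern = re.compile(
        "(" + re.escape(start) + ")(.*?)(" + re.escape(end) + ")",
        flags=re.DOTALL,  # let . cross line breaks inside the payload
    )
    # The function form inserts `replacement` literally, so backslashes or
    # leading digits in it cannot be misread as group references.
    return pattern.sub(lambda m: m.group(1) + replacement + m.group(3), text)

image = "header my;encoded;image:QUJD\nREVG=END footer"
print(replace_between(image, "my;encoded;image:", "END", "random_words"))
```

A base64 payload can itself contain the letters END, so the non-greedy match may stop early; a more distinctive end marker would be safer if you control the format.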

Regular expression to retrieve string parts within parentheses separated by commas

I have a string from which I want to take the values within the parentheses, and then get the individual values that are separated by commas.
Example: x(142,1,23ERWA31)
I would like to get:
142
1
23ERWA31
Is it possible to get everything with one regex?
I have found a method to do so, but it is ugly.
This is how I did it in Python:
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search(r"\((.*?)\)", string)
secondResult = re.search(r"(?<=\()(.*?)(?=\))", firstResult.group(0))
finalResult = [x.strip() for x in secondResult.group(0).split(',')]
for i in finalResult:
    print(i)
Output:
142
1
23ERWA31
This works for your example string:
import re
string = "x(142,1,23ERWA31)"
l = re.findall(r'([^(,)]+)(?!.*\()', string)
print(l)
Result: a plain list
['142', '1', '23ERWA31']
The expression matches a sequence of characters other than (, a comma, or ) and, to prevent the leading x from being picked up, requires that no ( appears anywhere later in the string. This also makes it work when the preamble x is more than a single character.
findall rather than search makes sure all items are found, and as a bonus it returns a plain list of the results.
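One limitation worth checking against real data: the lookahead only excludes text that comes before a (, so anything after the closing parenthesis is returned as well. A small sketch, where the second input string is an invented example:

```python
import re

pattern = re.compile(r'([^(,)]+)(?!.*\()')

print(pattern.findall("x(142,1,23ERWA31)"))
print(pattern.findall("x(1,2) y"))  # the trailing " y" is matched as well
```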
You can make this a lot simpler. You are running your first Regex but then not taking the result. You want .group(1) (inside the brackets), not .group(0) (the whole match). Once you have that you can just split it on ,:
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search(r"\((.*?)\)", string)
for e in firstResult.group(1).split(','):
    print(e)
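The same approach could be packaged so that input without parentheses does not raise an AttributeError. This is a sketch only; parenthesized_values is a hypothetical name, and it assumes only the first parenthesized group matters.

```python
import re

def parenthesized_values(s):
    """Hypothetical helper: comma-separated values inside the first (...)."""
    match = re.search(r"\((.*?)\)", s)
    if match is None:
        return []  # no parentheses found
    # group(1) is the text inside the parentheses, without the brackets
    return [part.strip() for part in match.group(1).split(",")]

print(parenthesized_values("x(142,1,23ERWA31)"))
print(parenthesized_values("no parentheses here"))
```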
This looks a little wonky, and it assumes there will always be exactly 3 values in the parentheses, but try this regex:
\((.*?),(.*?),(.*?)\)
To extract all the group matches to a single object - your code would then look like
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search(r"\((.*?),(.*?),(.*?)\)", string).groups()
firstResult is a tuple, so you can index it like a list:
>>> print(firstResult[2])
23ERWA31

How to parse values appear after the same string in python?

I have an input text like this (the actual text file contains tons of garbage characters surrounding these 2 strings too):
(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)
I am trying to parse the text to store something like this:
value1="xxx" and value2="yyy".
I wrote python code as follows:
value1_start = content.find('value')
value1_end = content.find(';', value1_start)
value2_start = content.find('value')
value2_end = content.find(';', value2_start)
print(content[value1_start:value1_end])
print(content[value2_start:value2_end])
But it always returns:
value=xxx
value=xxx
Could anyone tell me how I can parse the text so that the output is:
value=xxx
value=yyy
Use a regex approach:
re.findall(r'\bvalue=[^;]*', s)
Or, if the key can be any sequence of one or more word characters (letters, digits, underscores):
re.findall(r'\b\w+=[^;]*', s)
Details:
\b - word boundary
value= - a literal char sequence value=
[^;]* - zero or more chars other than ;.
In Python:
import re
rx = re.compile(r"\bvalue=[^;]*")
s = "$%$%&^(&value=xxx;$%^$%^$&^%^*value=yyy;%$#^%"
res = rx.findall(s)
print(res)
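Since the question asks for the values themselves (xxx and yyy), a capturing group could return just that part. A minimal sketch reusing the sample string above; the variable names values, value1 and value2 are chosen here for illustration:

```python
import re

s = "$%$%&^(&value=xxx;$%^$%^$&^%^*value=yyy;%$#^%"

# With one capturing group, findall returns only the captured text.
values = re.findall(r"\bvalue=([^;]*)", s)

# Unpacking assumes exactly two matches were found.
value1, value2 = values
print(value1, value2)
```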
Use regex to filter the data you want from the "junk characters":
>>> import re
>>> _input = '#4#5%value=xxx;38u952035983049;3^&^*(^%$3&value=yyy#%$#^&*^%;$#%$#^'
>>> matches = re.findall(r'[a-zA-Z0-9]+=[a-zA-Z0-9]+', _input)
>>> matches
['value=xxx', 'value=yyy']
>>> for match in matches:
...     print(match)
...
value=xxx
value=yyy
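Summary of the regular expression:
[a-zA-Z0-9]+: one or more alphanumeric characters
=: literal equals sign
[a-zA-Z0-9]+: one or more alphanumeric characters
Because both sides are greedy, any alphanumeric junk directly touching a pair is absorbed into the match, so this gives clean results only when the neighbouring junk characters are non-alphanumeric.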
For this input:
content = '(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)'
use a simple regex and manually strip off the first and last two characters:
import re
values = [x[2:-2] for x in re.findall(r'\*\*value=.*?\*\*', content)]
for value in values:
    print(value)
Output:
value=xxx
value=yyy
Here the assumption is that there are always two leading and two trailing * as in **value=xxx**.
You already have good answers based on the re module, which is certainly the simplest way.
If for any reason (performance?) you prefer to use str methods, that is possible too, but you must start the second search past the end of the first match:
value2_start = content.find('value', value1_end)
value2_end = content.find(';', value2_start)
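The same idea extends to any number of occurrences with a loop that always resumes searching after the previous match. A sketch only: find_all_values, its parameters, and the sample content string are hypothetical names and data for this illustration.

```python
def find_all_values(content, key="value", terminator=";"):
    """Hypothetical helper: collect every 'key...' run up to the terminator."""
    results = []
    pos = 0
    while True:
        start = content.find(key, pos)
        if start == -1:
            break  # no further occurrence
        end = content.find(terminator, start)
        if end == -1:
            end = len(content)  # no terminator: take the rest of the string
        results.append(content[start:end])
        # Resume after this match; max() guarantees forward progress.
        pos = max(end, start + 1)
    return results

content = "#$%value=xxx;@!value=yyy;&*"
print(find_all_values(content))
```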

How to replace only elements found using a python re.findall rather than entire string?

How would I replace only the groups found by Python's re.findall, without having to change the rest of the string too?
For example:
import re
repl1='k1'
repl2='k2'
pattern=re.compile(r'CN=Root,Model=.*,Vector=Reactions\[(.*)\],ParameterGroup=Parameters,Parameter=(.*),Reference=Value')
I want to use re.sub to replace ONLY the elements within the (.*) groups with repl1 and repl2, rather than having to change the rest of the string too.
-------edit -----
The output I want should look like this:
output = 'CN=Root,Model=.*,Vector=Reactions[k1],ParameterGroup=Parameters,Parameter=k2,Reference=Value'
But note I have left the '.*' in after Model because this will change every time, i.e. it can be anything.
----------edit 2----------
The input is a simple one-liner which is almost exactly the same as the pattern. For example:
input = 'CN=Root,Model=Model1,Vector=Reactions[k10],ParameterGroup=Parameters,Parameter=k12,Reference=Value'
re.sub's argument repl can be a one-argument function, and in that case it is called with the match object as an argument. So, if you ensure that all parts of the pattern are in a group you should have all the information you need to replace the old string with the new one.
import re
repl1='k1'
repl2='k2'
pattern=re.compile(r'(CN=Root,Model=.*,Vector=Reactions\[)(.*)(\],ParameterGroup=Parameters,Parameter=)(.*)(,Reference=Value)')
target = 'CN=Root,Model=something,Vector=Reactions[somethingelse],ParameterGroup=Parameters,Parameter=1234,Reference=Value'
Now define a function that rebuilds the matched string with groups 2 and 4 (list indices 1 and 3) replaced by your desired values:
def repl(m):
    g = list(m.groups())
    g[1] = repl1
    g[3] = repl2
    return "".join(g)
Passing this function as the first argument to pattern.sub (the repl argument of re.sub) then achieves the desired transformation:
pattern.sub(repl, target)
gives the result
'CN=Root,Model=something,Vector=Reactions[k1],ParameterGroup=Parameters,Parameter=k2,Reference=Value'
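If you would rather keep the original two-group pattern from the question, the match object's span method gives the position of each group, so only those slices need to be swapped. This is a sketch under stated assumptions, not the answer author's method: replace_groups is a hypothetical helper, and it assumes every listed group took part in the match and that groups do not overlap.

```python
import re

def replace_groups(match, replacements):
    """Hypothetical helper: rebuild the match with selected groups swapped.

    `replacements` maps a group number to its new text.
    """
    text = match.group(0)
    offset = match.start(0)
    # Work right to left so earlier spans are not shifted by later edits.
    for group in sorted(replacements, key=match.start, reverse=True):
        start, end = match.span(group)
        text = text[:start - offset] + replacements[group] + text[end - offset:]
    return text

pattern = re.compile(r'CN=Root,Model=.*,Vector=Reactions\[(.*)\],ParameterGroup=Parameters,Parameter=(.*),Reference=Value')
target = 'CN=Root,Model=something,Vector=Reactions[somethingelse],ParameterGroup=Parameters,Parameter=1234,Reference=Value'

result = pattern.sub(lambda m: replace_groups(m, {1: 'k1', 2: 'k2'}), target)
print(result)
```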
