How to exactly match a word and then replace it in Python?

I have a list that contains strings, as shown below:
strpool = ['fruit,apple:3', '', "[1,abcd, ['fruit,apple'], ['1,kdlld', apple,taste]]"]
I want to search for exactly the word 'apple' and replace it with 'apple:3'.
I tried the following code:
import re
print(str(strpool).replace("apple", "apple:3"))
print(re.sub(r"\bapple\b", "apple:3", str(strpool)))
But this also turns the existing apple:3 into apple:3:3, instead of replacing only the bare string apple.
I thought apple:3 would be treated as a single token and left unchanged.
Update:
How can I match a word exactly, with nothing else attached to it, and replace every occurrence inside a list?
I tried re.sub, but that requires converting the list to a string first. Is there another way?

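Use a negative lookahead to match the word unless it's followed by :. The \b boundary alone is not enough, because : is not a word character, so \bapple\b also matches the apple in apple:3.
print(re.sub(r'\bapple\b(?!:)', 'apple:3', str(strpool)))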
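To address the update (avoiding str() on the whole list), one option is to apply the same lookahead pattern to each element with a list comprehension, so the result stays a list of strings. This is a minimal sketch; the sample list is the question's data with the third element written in double quotes so it is a valid Python literal, and the variable names are illustrative.

```python
import re

# Question's data; the third element uses double quotes so that the
# inner single quotes do not end the string literal.
strpool = ['fruit,apple:3', '', "[1,abcd, ['fruit,apple'], ['1,kdlld', apple,taste]]"]

# \bapple\b matches 'apple' as a whole word; the negative lookahead (?!:)
# skips any occurrence already followed by a colon, such as 'apple:3'.
pattern = re.compile(r'\bapple\b(?!:)')

# Substitute inside each element instead of inside str(strpool),
# so the result is still a list.
result = [pattern.sub('apple:3', s) for s in strpool]
print(result)
```

Note that (?!:) skips an apple followed by any colon, so something like apple:5 would also be left alone; if only apple:3 should be protected, (?!:3) is a narrower lookahead.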

Related

RegEx for a delimited string

I have a string like this: '432342:username:full_name:1'. I need to write a regular expression to check whether a string matches this format.
I tried .split(':') and then checked each element (accessed as dict[i]) against a regular expression, but I need to match the whole string.
The fields are: digits only : English letters and digits : English or Russian letters : 1, 2 or 3
I also tried the pattern below, but I don't understand how to add the ':' separator between the fields, as in the example above:
pattern = r'[/b:]|[\d]|[a-zA-Z]|[а-яА-Я]|[1,2,3]'
As per your instructions, try this:
import re
s = '432342:username:full_name:1'
re.findall(r'[0-9]+:[a-zA-Z0-9]+:[a-zA-Zа-яА-Я_]+:[123]', s)
# ['432342:username:full_name:1']
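Since the question asks to match the whole string rather than find a substring, re.fullmatch may fit better than re.findall, because it only succeeds when the pattern covers the entire input. A sketch, assuming the field rules listed in the question; the helper name is_valid is hypothetical, the underscore in the third field is an assumption taken from the sample value full_name, and ё/Ё are added because they fall outside the а-я and А-Я ranges.

```python
import re

# Fields from the question:
#   digits : English letters/digits : English or Russian letters : 1, 2 or 3
# Assumption: '_' is allowed in the third field (the sample 'full_name' has one).
# ё and Ё are listed explicitly because they are outside the а-я / А-Я ranges.
PATTERN = re.compile(r'[0-9]+:[a-zA-Z0-9]+:[a-zA-Zа-яА-ЯёЁ_]+:[123]')

def is_valid(s: str) -> bool:
    # Hypothetical helper: fullmatch anchors at both ends, so any extra
    # characters before or after the four fields make the check fail.
    return PATTERN.fullmatch(s) is not None

print(is_valid('432342:username:full_name:1'))   # True
print(is_valid('432342:username:full_name:4'))   # False, last field must be 1-3
print(is_valid('x432342:username:full_name:1'))  # False, leading junk
```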

Matching regex pattern where there is \n\r between starting and ending pattern

The string underlined in red is the one I want to match.
I would like to match all text (including \n) between the two strings given in the example.
However, in the first example, which contains a newline, I can't get anything to match.
In the second example, the regular expression works: it matches the string highlighted in green because it lies on a single line.
I'm not sure whether I need some notation to make \n\r part of the pattern.
Use this:
output = re.search('This(.*?)\n\n(.*?)match', text)
>>> output.group(1)
'is a multiline expression'
>>> output.group(2)
'I would like to '
Try this one as well:
output = re.search(r"This ([\s\S]+) match", text).group(1).replace('\n', '')
That captures everything between "This " and " match", newlines included, as one group, then removes the newlines.
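The underlying reason the first example fails is that '.' does not match '\n' by default; the re.DOTALL flag changes that. The sketch below is illustrative only: the question's real text appeared only in screenshots, so the sample string and its Windows-style '\r\n' line endings are assumptions.

```python
import re

# Hypothetical sample text with '\r\n' line endings (assumption).
text = 'This is a multiline expression\r\n\r\nI would like to match'

# By default '.' matches anything except '\n', so the lazy group cannot
# cross the line break to reach ' match' and search() returns None.
no_dotall = re.search(r'This (.*?) match', text)

# re.DOTALL lets '.' match '\n' too, so the group spans all the lines.
m = re.search(r'This (.*?) match', text, re.DOTALL)

# Collapse each run of '\r' / '\n' characters into a single space.
joined = re.sub(r'[\r\n]+', ' ', m.group(1))
print(no_dotall)  # None
print(joined)     # is a multiline expression I would like to
```

Replacing line breaks with a space rather than an empty string keeps the words on either side of the break from running together.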

Searching for multiple strings

I have the following Python code:
import re
string = '[subscript=hello] this is some text [subscript=hi again]'
superscripts = re.findall(r'\[subscript=(.+?)\]', string)
print(superscripts)
and it returns ['hello', 'hi again'], which is the text I want. However, instead of returning this, I would prefer it to replace the tags so that the result is <sub>hello</sub> this is some text <sub>hi again</sub>. I know I should use re.sub(), but I'm unsure how to use it to replace the string the way I want.
How would I do this? Thanks.
Use the backreference \1 in the replacement string; it refers to the first capture group in your pattern:
your_new_string = re.sub(r'\[subscript=(.+?)\]', r'<sub>\1</sub>', your_old_string)
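A self-contained version using the question's sample string, for illustration. It also shows that re.sub accepts a function in place of the replacement string, which is an equivalent alternative when the replacement needs more logic than a backreference allows.

```python
import re

text = '[subscript=hello] this is some text [subscript=hi again]'
pattern = r'\[subscript=(.+?)\]'

# \1 in the replacement is filled with capture group 1: the text between
# 'subscript=' and ']'. The non-greedy (.+?) stops at the first ']',
# so each tag is converted separately.
with_backref = re.sub(pattern, r'<sub>\1</sub>', text)

# Equivalent form: re.sub calls the function with each match object.
with_func = re.sub(pattern, lambda m: '<sub>' + m.group(1) + '</sub>', text)

print(with_backref)  # <sub>hello</sub> this is some text <sub>hi again</sub>
```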

Dynamically Removing string with regex python

I am having trouble removing the end of strings. I tried .partition without success and am now trying regex, also without success. All the strings follow the format some random words **X*.* Some more words, where * is a digit and X is a literal X, for example 21X2.5. Everything after this dynamic part should be removed. I am trying to use re.sub('\d\d\X\d.\d', string). Can someone point me in the right direction with the regex and how to split the string?
The expected output should read:
some random words 21X2.5
Thanks!
Use the following regex:
re.search(r"(.*?\d\dX\d\.\d)", "some random words 21X2.5 Some more words").group(1)
Output:
'some random words 21X2.5'
Your regex is not correct. The biggest problem is that you need to escape the period; otherwise, the regex treats it as matching any character. To match just that pattern, you can use something like:
re.findall(r'[\d]{2}X\d\.\d', 'asb12X4.4abc')
[\d]{2} matches a sequence of two digits, X matches the literal X, \d matches a single digit, \. matches the literal ., and \d matches the final digit.
This will match and return only 12X4.4.
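To make the point about escaping the period concrete, here is a small comparison on a made-up input (an assumption for illustration) where one candidate has an 'x' in place of the dot:

```python
import re

sample = '12X4x4 and 34X5.6'  # illustrative input, not from the question

# Unescaped '.' matches any character, so '12X4x4' is wrongly accepted.
unescaped = re.findall(r'\d\dX\d.\d', sample)

# Escaped '\.' matches only a literal period.
escaped = re.findall(r'\d\dX\d\.\d', sample)

print(unescaped)  # ['12X4x4', '34X5.6']
print(escaped)    # ['34X5.6']
```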
It sounds like you instead want to remove everything after the matched expression. To get your desired output, you can do something like:
re.split(r'(.*?[\d]{2}X\d\.\d)', 'some random words 21X2.5 Some more words')[1]
which will return some random words 21X2.5. This expression pulls everything before and including the matched regex and returns it, discarding the end.
Let me know if this works.
To remove everything after the pattern, i.e. do exactly as you say:
s = re.sub(r'(\d\dX\d\.\d).*', r'\1', s)
Of course, if you mean something other than what you said, something different will be needed! E.g. if you also want to remove the pattern itself, not just (as you said) what's after it:
s = re.sub(r'\d\dX\d\.\d.*', r'', s)
and so forth, depending on what, exactly, are your specs!-)
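Another way to express "keep everything up to and including the pattern" is to search once and slice at the match's end position; unlike calling .groups()[0] on a search result, this can also handle strings where the pattern does not occur. A sketch; the helper name truncate_after and the choice to return the input unchanged when nothing matches are assumptions.

```python
import re

# Pattern from the question: two digits, a literal X, a digit,
# a literal '.', a digit.
SIZE = re.compile(r'\d\dX\d\.\d')

def truncate_after(s: str) -> str:
    # Hypothetical helper: keep everything up to and including the first
    # match and drop the rest; return s unchanged if there is no match.
    m = SIZE.search(s)
    return s[:m.end()] if m else s

print(truncate_after('some random words 21X2.5 Some more words'))
# some random words 21X2.5
```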

Regular Expression Quick Query

Quick regular expressions question.
I want an expression that finds the first digit in a line and also the word at the end of that line, excluding any digits inside it.
I.e., if the string is "12345hello", I want the regular expression to find "1hello",
or, if it's "12345hel45667lo", to find the same thing.
I have the first digit working, but the expression I thought would work is:
print(re.findall(r'^\d\D+', string))
This just gives me empty brackets, or the first digit if I take out the \D. What gives?
Edit: If I use | (alternation), I sort of get what I want: it returns the first digit along with the words in the string, but as separate list items. I want it all in one string.
print(re.findall(r'^\d|\D+', string))
Remove every digit that is not at the very start of the string:
print(re.sub(r'(?<!^)\d', '', "12345hel45667lo9a"))  # prints 1helloa
The only thing I can think of is to run a for loop that scans across the string for letters and combines them.
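The first attempt ^\d\D+ returns an empty list because \D+ must match immediately after the first digit, and in "12345hello" the next character is '2', another digit. The alternation from the edit already yields the right pieces, so joining them gives one string without a manual loop. A minimal sketch, assuming (as in the examples) that the wanted digit is the first character of the line:

```python
import re

s = '12345hel45667lo'

# '^\d' matches only the leading digit; '\D+' matches each run of
# non-digits, so findall returns the pieces separately.
pieces = re.findall(r'^\d|\D+', s)
print(pieces)           # ['1', 'hel', 'lo']

# Joining the pieces produces the single combined string.
print(''.join(pieces))  # 1hello
```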
