Problem with Regex "The string is outside the <li></li>" Python

I need to get the strings that are not enclosed in a list item
(that is, not between <li> and </li>).
Example:
i need this string!
<div>i need this string too!</div>
:)<li>I don't need this string!</li>:)
Result:
i need this string!
<div>i need this string too!</div>
:):)
I tried this:
^(?!<li>.*$).*(?<!<\/li>)
But the problem is that if there is text next to my string
<li>text</li>
For example a smiley:
:)<li>I don't need this string!</li>:)
then the pattern does not work and I don’t understand how to fix it.
Can you correct me please?

Use re.sub() to remove the string you don't want, instead of trying to match everything else.
result = re.sub(r'<li>.*?</li>', '', text)
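As a minimal sketch, here is that substitution applied to the sample text from the question. The `re.DOTALL` flag is an addition not present in the answer; it is only needed on the assumption that a list item may span several lines.

```python
import re

text = """i need this string!
<div>i need this string too!</div>
:)<li>I don't need this string!</li>:)"""

# The non-greedy .*? stops at the first closing </li>, so several items on
# one line are removed one by one. re.DOTALL (an assumption, see above)
# lets "." also match newlines inside an item.
result = re.sub(r"<li>.*?</li>", "", text, flags=re.DOTALL)
print(result)
```

The text around each removed item, such as the two smileys, is kept and simply joined together.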

Related

Get the string within brackets and remove useless string in Python

I have a string like this: '0x69313430303239377678(i1400297vx)'. I only want the value i1400297vx and nothing else.
Is there a simple way, for example using the strip method, or am I forced to use regex, which I'm not good at...
Could someone kindly help me?

This works, using split and strip:
'0x69313430303239377678(i1400297vx)'.split('(')[1].strip(')')
but a regex would be more readable!
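For comparison, a regex version might look like the sketch below. The pattern is one possible choice and not part of the answer above; it captures whatever sits between the first pair of parentheses and yields None when there are no parentheses.

```python
import re

s = "0x69313430303239377678(i1400297vx)"

# \( and \) match literal parentheses; ([^)]*) captures everything up to
# the first closing parenthesis.
match = re.search(r"\(([^)]*)\)", s)
value = match.group(1) if match else None
print(value)
```

Unlike the split/strip one-liner, this does not raise an IndexError when the input has no opening parenthesis.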

Replace string with quotes, brackets, braces, and slashes in python

I have a string where I am trying to replace ["{\" with [{" and all \" with ".
I am struggling to find the right syntax in order to do this, does anyone have a solid understanding of how to do this?
I am working with JSON, and I am inserting a string into the JSON properties. This caused it to put single quotes around my inserted data from my variable, and I need those single quotes gone. I tried to do json.dumps() on the data and do a string replace, but it does not work.
Any help is appreciated. Thank you.

You can use the replace method.
See the str.replace() documentation for details and examples.

I would recommend maybe posting more of your code below so we can suggest a better answer. Just based on the information you have provided, I would say that what you are looking for are escape characters. I may be able to help more once you provide us with more info!

Use the target/replacement strings as arguments to replace().
The general format is mystring = mystring.replace("old_text", "new_text")
Since your target strings have backslashes, you also probably want to use raw strings to prevent them from being interpreted as special characters.
mystring = "something"
mystring = mystring.replace(r'["{\"', '[{"')
mystring = mystring.replace(r'\"', '"')
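A small sketch of those two calls on made-up data follows. The sample string and the third replacement, for the closing side, are assumptions added for illustration; the question only mentions the first two.

```python
import json

# Hypothetical input: a JSON object that was embedded as an escaped string.
raw = r'["{\"key\": \"value\"}"]'

step1 = raw.replace(r'["{\"', '[{"')   # opening side, as in the question
step2 = step1.replace(r'\"', '"')      # remaining escaped quotes
step3 = step2.replace('}"]', '}]')     # closing side (assumed to be needed)

print(step3)
print(json.loads(step3))
```

If the escaped text comes from calling json.dumps() on data that was already a JSON string, decoding the inner string with json.loads() before inserting it may avoid the replacements altogether, but that depends on code not shown in the question.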

If you want to replace a sequence of characters by hand, check for the first character and then for the second, which must come immediately after the first, and so on. Whenever the condition is satisfied, shift the rest of the array left to close the gap in the first case, and delete the \ from the array in the second case.
You can also find a particular substring with a built-in function and use the replace() function to insert the string you want in its place.

Python regex doesn't work on string

I have an HTML file that I process using lxml and BeautifulSoup (convert from HTML to text). Somehow, the ill-formed HTML below makes it into the text and I'd like to remove it. I tried matching something like "<.+>" in the text string, but it doesn't work. The string I want to remove is this:
string = """ .trb_m_b:befoe{ctent:'Hide comments'}.trb_c_so{padding-top:10px;min-height:500px}||<div class="trb_c_so" data-role=c_container><div class="s_comments" data-sitename="ffff" data-content-id="jksjkj7878787" data-type=promo-comment data-publisher="ronctt"></div></div>"""
The exact code I tried on it is:
import re
pattern = re.compile(r'<.+>')
if pattern.search(string):
    print("Found")
However, that regex doesn't match the string, although it should.
Why would that be?
Thanks.
EDIT. It looks like the problem is not with the regular expressions, but with something very bizarre. I have this string in a list, it's the last item. When I loop through it the first time, for some reason, the program never hits it. The second time, however, it does. I don't understand the reason for it.
EDIT2. It turns out the problem was that I was removing elements from the list while looping over it (if they matched the regex), which makes the loop skip items. I rewrote the code to use a list comprehension, and now it works fine.

I believe what you want is this:
import re
data = re.findall(r"<(.*?)>", string)

Your HTML is not a complete HTML document. If you really want to match the exact string that you gave, you can use this:
re.findall(r"\.trb_m_b.*?></div></div>", string)
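The behavior described in the question's edits can be reproduced with a short sketch. The list contents are invented for illustration; only the pattern comes from the question.

```python
import re

pattern = re.compile(r"<.+>")
items = ["plain text", "<div>a</div>", "<div>b</div>", "more text"]

# Problematic: removing from the list being iterated shifts later items
# one position left, so the item right after each removed one is skipped.
buggy = list(items)
for item in buggy:
    if pattern.search(item):
        buggy.remove(item)

# Safer: build a new list containing only the items to keep.
cleaned = [item for item in items if not pattern.search(item)]

print(buggy)
print(cleaned)
```

The first loop leaves "<div>b</div>" in place because it is never visited, which matches the symptom of the program only reaching an item on a second pass.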

Replace text between parentheses in python

My string will contain (). What I need to do is change the text between the parentheses.
Example string: "B.TECH(CS,IT)".
I need to change the content inside the parentheses to something like this: B.TECH(ECE,EEE)
This is what I tried:
reg = r'(()([\s\S]*?)())'
a = 'B.TECH(CS,IT)'
re.sub(reg,"(ECE,EEE)",a)
But I got output like this..
'(ECE,EEE)B(ECE,EEE).(ECE,EEE)T(ECE,EEE)E(ECE,EEE)C(ECE,EEE)H(ECE,EEE)((ECE,EEE)C(ECE,EEE)S(ECE,EEE),(ECE,EEE)I(ECE,EEE)T(ECE,EEE))(ECE,EEE)'
Valid output should be like this:
B.TECH(ECE,EEE)
What am I missing, and how do I replace the text correctly?

The problem is that you're using parentheses, which have another meaning in regex. They're used as grouping characters, to capture parts of the match.
You need to escape the () where you want them as literal tokens. You can escape characters using the backslash character: \(.
Here is an example:
import re
reg = r'\([\s\S]*\)'
a = 'B.TECH(CS,IT)'
re.sub(reg, '(ECE,EEE)', a)
# == 'B.TECH(ECE,EEE)'

The reason your regex does not work is because you are trying to match parentheses, which are considered meta characters in regex. () actually captures a null string, and will attempt to replace it. That's why you get the output that you see.
To fix this, you'll need to escape those parens – something along the lines of
\(...\)
For your particular use case, might I suggest a simpler pattern?
In [268]: re.sub(r'\(.*?\)', '(ECE,EEE)', 'B.TECH(CS,IT)')
Out[268]: 'B.TECH(ECE,EEE)'

regex search&replace a variable string including a regex statement

I want to use re.sub to replace a part of a string whose exact content I know. Relevant part of the code:
print("Regex statement: ", foundStatements[iterator])
print("string to replace with : \n", latexPreparedString)
print("string to search&replace in: \n", fileAsString)
processedString = re.sub(foundStatements[iterator], latexPreparedString, fileAsString)
print("processed string: \n", processedString)
In my testing case, foundStatements[iterator] is "%#import script_example.py ( *out =(.|\n)*?return out)" But even though processedString contains foundStatements[iterator], processedString looks exactly like fileAsString, so it hasn't accomplished the re.sub task. What am I doing wrong?
EDIT: Ok, it definitely has something to do with the string I'm searching to replace containing regex code. Is there a way to make it just interpret it foundStatements[iterator] as a raw string to search for? The only solution I can think of is to create a function that replaces any regex symbols in a string with \regexsymbol (e.g. * -> \*), but it'd make sense for there to be a way to solve this with inbuilt functions. It'd also be a bit overkill since I'd have to make sure it works with every single regex symbol, of which there are quite a few :/
EDIT2: Well, just changing it to re.sub(re.escape(foundStatements[iterator]), latexPreparedString, fileAsString) seems to work, except when the regex statement doesn't hit anything in the original file. To explain, latexPreparedString is generated by using the regex part of foundStatements[iterator]. While it's logical that it shouldn't be able to set latexPreparedString to anything when the regex statement doesn't hit anything, I set latexPreparedString = "" by default, so in that case re.sub should replace it with a blank string. Here's how the code looks at the moment: pastebin.com/wUedK3LN

First, to replace an exact match in a string, you should use str.replace():
processedString = fileAsString.replace(foundStatements[iterator], latexPreparedString)
However, this can still fail in your case if foundStatements[iterator] is written as a normal string literal, because \n is then turned into an actual newline character instead of the two characters \ and n that appear in the file. To keep the backslash, use the r prefix when declaring the string.
If you still want to use re.sub, you have to both prefix the string with r and use re.escape(foundStatements[iterator]) instead of foundStatements[iterator]. You can read more about re.escape in the re module documentation.
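Both approaches can be compared in a short sketch. The file contents and the LaTeX replacement are invented for illustration, and the names are hypothetical snake_case stand-ins for the question's variables. Passing the replacement through a function is an extra precaution not mentioned above: re.sub processes backslash escapes in a replacement string, which matters when the replacement is LaTeX.

```python
import re

# Raw string: \n stays as the two characters backslash and n.
statement = r"%#import script_example.py ( *out =(.|\n)*?return out)"
file_as_string = "intro\n" + statement + "\noutro"
replacement = r"\begin{verbatim}out = 1\end{verbatim}"  # made-up LaTeX

# Literal replacement, no regex involved.
via_replace = file_as_string.replace(statement, replacement)

# Regex version: re.escape neutralizes the metacharacters in the search
# text, and the function returns the replacement verbatim so that its
# backslashes are not interpreted as escapes.
via_sub = re.sub(re.escape(statement), lambda m: replacement, file_as_string)

print(via_replace == via_sub)
```

When the statement does not occur in the file, both calls return the input unchanged, so an empty replacement has no visible effect in that case.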
