Regex for newline character search in given string in Python - python

I want to search for newline characters in a string using a regex in Python. I don't want \r or \n to appear in the message.
I have tried a regex that detects \r\n correctly. But when I remove \r\n from the Line variable, it still prints the error.
Line="got less no of bytes than requested\r\n"
if(re.search('\\r|\\n',Line)):
    print("Do not use \\r\\n in MSG");
It should detect \r\n in the Line variable where it appears as literal text, not as the invisible newline character.
It should not print anything when Line is like below:
Line="got less no of bytes than requested"

You are looking for the re.sub function.
Try this:
import re
Line="got less no of bytes than requested\r\n"
replaced = re.sub('\n','',Line)
replaced = re.sub('\r','',replaced)
print(replaced)

Instead of checking for newlines, it would probably be better to just remove them. There is no need for a regex here: strip removes all whitespace, including newlines, from both ends of the string:
line = 'got less no of bytes than requested\r\n'
line = line.strip()
# line = 'got less no of bytes than requested'
If you want to do it with regex you can use:
import re
line = 'got less no of bytes than requested\r\n'
line = re.sub(r'\n|\r', '', line)
# line = 'got less no of bytes than requested'
If you insist on checking for the newlines, you can do it like this:
if '\n' in line or '\r' in line:
    print(r'Do not use \r\n in MSG')
Or the same with regex:
import re
if re.search(r'\n|\r', line):
    print(r'Do not use \r\n in MSG')
Also: it's advisable to have your Python variables named with snake_case.
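To make the difference between the options above concrete, here is a minimal sketch. The helper names remove_line_breaks and has_line_break are made up for illustration, and the sample message is an assumption; the point is only that strip() touches the ends of the string while the regex handles breaks anywhere.

```python
import re

def remove_line_breaks(text):
    """Return text with every CR and LF removed, wherever they occur."""
    return re.sub(r'[\r\n]', '', text)

def has_line_break(text):
    """Return True if text contains a real CR or LF character."""
    return re.search(r'[\r\n]', text) is not None

# Hypothetical message with a break in the middle and one at the end.
message = 'first\r\nsecond\r\n'

print(repr(message.strip()))              # only the trailing break is removed
print(repr(remove_line_breaks(message)))  # every break is removed
print(has_line_break('no breaks here'))
```

If breaks can only ever appear at the end of the message, strip() is probably enough; otherwise the regex (or two str.replace calls) is the safer choice.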

First of all, consider using strip, as many others here have mentioned.
Second, if you want to match a newline at ANY position in the string, use search, not match.
See "What is the difference between re.search and re.match?" for more about search vs match.
import re
newline_regexp = re.compile("\n|\r")
newline_regexp.search(Line) # returns a match object, or None if not found
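A short sketch of the search vs match difference, using the line from the question. The variable names at_start and anywhere are only for illustration.

```python
import re

newline_regexp = re.compile(r'\r|\n')
line = 'got less no of bytes than requested\r\n'

# match() only tries the pattern at position 0, so it misses the trailing break.
at_start = newline_regexp.match(line)

# search() scans the whole string and stops at the first break it finds.
anywhere = newline_regexp.search(line)

print(at_start)            # None
print(anywhere.start())    # index of the '\r'
```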

If you just want to check for line breaks in the message, you can use the string method find(). Note the use of raw strings, as indicated by the r in front of the string literals. This removes the need to escape the backslash, and it means the strings contain a literal backslash followed by r or n rather than real line-break characters.
line = r"got less no of bytes than requested\r\n"
print(line)
if line.find(r'\r\n') != -1:
    print("Do not use line breaks in MSG")
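Two details are easy to trip over here: what a raw string actually contains, and what find() returns. The sketch below uses short made-up strings (raw_line, real_line) to show both.

```python
raw_line = r"text\r\n"    # 8 characters: ends with backslash, r, backslash, n
real_line = "text\r\n"    # 6 characters: ends with a real CR and LF

print(len(raw_line), len(real_line))

# find() returns -1 when nothing is found and 0 for a hit at the very start,
# so the result should be compared against -1.
hit_at_start = r"\r\n".find(r"\r\n")      # 0
literal_in_raw = raw_line.find(r"\r\n")   # 4
literal_in_real = real_line.find(r"\r\n") # -1, no literal backslashes here
crlf_in_real = real_line.find("\r\n")     # 4

print(hit_at_start, literal_in_raw, literal_in_real, crlf_in_real)
```

If all you need is a yes/no answer, the in operator (for example r'\r\n' in line) avoids the return-value question entirely.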

As others have noted, you are probably looking for line.strip(). But, in case you still want to practice regex, you would use the following code:
import re
Line="got less no of bytes than requested\r\n"
# Option 1: \r\n located anywhere in the string
prog = re.compile(r'\r\n')
# Option 2: \r or \n located anywhere in the string
# (this assignment replaces option 1; keep whichever pattern you need)
prog = re.compile(r'(\r|\n)')
if prog.search(Line):
    print('Do not use \\r\\n in MSG')

Related

How to get rid of trailing \ while reading a file in python3

I am reading a file in python and getting the lines from it.
However, after printing out the values I get, I realize that after each line there is a trailing \ at the end.
I have looked at "Python strip with \n" and tried everything in it, but nothing has removed the trailing \.
For example
0048\
0051\
0052\
0054\
0056\
0057\
0058\
0059\
How can I get rid of these slashes?
Here is the code I have so far
for line in f:
    line = line.replace('\\n', "")
    line = line.replace('\\n', "")
    print(line)
I've even tried using regex
strings = re.findall(r"\S+", f.read())
But nothing has worked so far.
You're probably confused about what is in the lines, and as a result you're confusing me too. '\n' is a single newline character, as shown using repr() (which is your friend when you want to know what a value is exactly). A line typically ends with that (the exception being the end of file which might not). That does not contain a backslash; that backslash is part of a string literal escape sequence. Your replace argument of '\\n' contains two characters, a backslash followed by the letter n. This wouldn't match a '\n'; the easiest way to remove the newline specifically is to use str.rstrip('\n'). The line reading itself will guarantee that there's only up to one newline, and it is at the end of the string. Frequently we use strip() with no argument instead as we don't want whitespace either.
If your string really does contain backslash, you can process that as well, whether using replace, strip, re or some other string processing. Just keep in mind that it might be used for escape sequences not only at string literal level but at regular expression level too. For instance, re.sub(r'\\$', '', str) will remove a backslash from the end of a string; the backslash itself is doubled to not mean a special sequence in the regular expression, and the string literal is raw to not need another doubling of the backslashes.
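A small sketch of the points above. The sample values are assumptions: '0048\n' stands in for a line read from the file, and with_backslash is a hypothetical case where the text really does end in a backslash.

```python
import re

line = '0048\n'      # what iterating over a text file typically yields
print(repr(line))    # repr() makes the newline visible as \n

# '\\n' is a backslash followed by 'n', which does not occur in the line,
# so this replace changes nothing.
unchanged = line.replace('\\n', '')

# rstrip('\n') removes the trailing newline character itself.
cleaned = line.rstrip('\n')

# Hypothetical case: the text really ends with a backslash character.
with_backslash = '0048\\'
no_backslash = re.sub(r'\\$', '', with_backslash)

print(repr(unchanged), repr(cleaned), repr(no_backslash))
```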

Why doesn't .rstrip('\n') work?

Let's say doc.txt contains
a
b
c
d
and that my code is
f = open('doc.txt')
doc = f.read()
doc = doc.rstrip('\n')
print doc
why do I get the same values?
str.rstrip() removes the trailing newline, not all the newlines in the middle. You have one long string, after all.
Use str.splitlines() to split your document into lines without newlines; you can rejoin it if you want to:
doclines = doc.splitlines()
doc_rejoined = ''.join(doclines)
but now doc_rejoined will have all lines running together without a delimiter.
Because you read the whole document into one string that looks like:
'a\nb\nc\nd\n'
When you call rstrip('\n') on that string, only the rightmost \n is removed, leaving all the others untouched, so the string looks like:
'a\nb\nc\nd'
The solution would be to split the file into lines and then right strip every line. Or just replace all the newline characters with nothing: s.replace('\n', ''), which gives you 'abcd'.
rstrip strips trailing spaces from the whole string. If you were expecting it to work on individual lines, you'd need to split the string into lines first using something like doc.split('\n').
Try this instead:
with open('doc.txt') as f:
    for line in f:
        print line,
Explanation:
The recommended way to open a file is using with, which takes care of closing the file at the end
You can iterate over each line in the file using for line in f
There's no need to call rstrip() now, because we're reading and printing one line at a time
Consider using replace and replacing each instance of '\n' with ''. This would get rid of all the new line characters in the input text.
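The suggestions in this thread can be compared side by side. In this sketch the string doc stands in for the result of f.read() on the doc.txt shown in the question, so no file is needed to try it.

```python
doc = 'a\nb\nc\nd\n'   # stands in for f.read() on the doc.txt from the question

only_last_removed = doc.rstrip('\n')   # inner newlines are untouched
lines = doc.splitlines()               # one entry per line, no newlines
joined = ''.join(lines)                # lines run together
replaced = doc.replace('\n', '')       # same result in one step

print(repr(only_last_removed))
print(lines)
print(joined, replaced)
```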

Python: match a long string with special characters and white spaces and then prepend two characters to the beginning

I'm not sure how to approach this. I'm trying to match a long string in a text file that has lots of whitespace and special characters, and prepend some characters to it, i.e. "//".
I need to match this line:
$menu_items['gojo_project'] => array('http://www.gojo.net/community/plugin-inventory/ops-gojo/gojo', 'gojo',3),
and turn it into this:
//$menu_items['gojo_project'] => array('http://www.gojo.net/community/plugin-inventory/ops-gojo/gojo', 'gojo',3),
Notice that I just prepended two '/' characters.
I tried using re.escape to format the string, but it's really long and still throws a syntax error. Am I going about this the right way by using re? Or is there a better, more Pythonic way to match a string like this one in a text file and prepend to it?
Edit: I forgot to mention that I need to edit the file in place. In short, it's a long PHP script in which I'm trying to find that line and comment it out (i.e. with //). So I can't really use some of the proposed solutions (I think), since they write the modification to a separate file.
Try fileinput; it will let you read over a file and rewrite lines in place:
import fileinput
for line in fileinput.input("myfile.txt", inplace = 1):
    # each line still carries its line ending, so strip it before comparing
    if line.rstrip('\r\n') == "$menu_items['gojo_project'] => array('http://www.gojo.net/community/plugin-inventory/ops-gojo/gojo', 'gojo',3),":
        line = '//' + line
    print line,
If you're trying to match exactly that string, it would be easier just to use the string equality operator, rather than regular expressions.
longString = "$menu_items['gojo_project'] => array('http://www.gojo.net/community/plugin-inventory/ops-gojo/gojo', 'gojo',3),"
input = open("myTextFile.txt", "r")
output = open("myOutput.txt", "w")
for line in input:
    if line.rstrip() == longString: #rstrip removes the trailing newline/carriage return
        line = "//" + line
    output.write(line)
input.close()
output.close()
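For the in-place requirement, here is a Python 3 sketch of the fileinput approach that can be tried safely. The function name comment_out and the throwaway demo file are assumptions for illustration; the only line taken from the question is the target string.

```python
import fileinput
import os
import tempfile

target = "$menu_items['gojo_project'] => array('http://www.gojo.net/community/plugin-inventory/ops-gojo/gojo', 'gojo',3),"

def comment_out(path, wanted):
    """Prefix '//' to every line of the file whose text equals wanted."""
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            # Compare without the line ending, but write the line back with it.
            if line.rstrip('\r\n') == wanted:
                line = '//' + line
            # With inplace=True, standard output is redirected into the file.
            print(line, end='')

# Demo on a temporary file with hypothetical contents.
fd, demo_path = tempfile.mkstemp(suffix='.php')
with os.fdopen(fd, 'w') as handle:
    handle.write('<?php\n' + target + '\n?>\n')

comment_out(demo_path, target)

with open(demo_path) as handle:
    result = handle.read()
os.remove(demo_path)
print(result)
```

Plain string comparison is used on purpose: since the line is matched exactly, no regex escaping is needed.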

dealing with \n characters at end of multiline string in python

I have been using python with regex to clean up a text file. I have been using the following method and it has generally been working:
mystring = compiledRegex.sub("replacement",mystring)
The string in question is an entire text file that includes many embedded newlines. Some of the compiled regex's cover multiple lines using the re.DOTALL option. If the last character in the compiled regex is a \n the above command will substitute all matches of the regex except the match that ends with the final newline at the end of the string. In fact, I have had several other no doubt related problems dealing with newlines and multiple newlines when they appear at the very end of the string. Can anyone give me a pointer as to what is going on here? Thanks in advance.
If I understood you correctly, all you need is the text without a newline at the end of each line, so that you can iterate over it to find a required word. In that case you can try the following:
data = (line for line in text.split('\n') if line.strip()) # yields all non-empty lines without '\n' at the end
Now you can search or replace any text you need, line by line, using string methods or regex functionality.
Or you can use replace to replace every '\n' with whatever you want:
text = text.replace('\n', '')
My bet is that your file does not end with a newline...
>>> content = open('foo').read()
>>> print content
TOTAL:.?C2
abcTOTAL:AC2
defTOTAL:C2
>>> content
'TOTAL:.?C2\nabcTOTAL:AC2\ndefTOTAL:C2'
...so the last line does not match the regex:
>>> regex = re.compile('TOTAL:.*?C2\n', re.DOTALL)
>>> regex.sub("XXX", content)
'XXXabcXXXdefTOTAL:C2'
If that is the case, the solution is simple: just match either a newline or the end of the file (with $):
>>> regex = re.compile('TOTAL:.*?C2(\n|$)', re.DOTALL)
>>> regex.sub("XXX", content)
'XXXabcXXXdefXXX'
I can't get a good handle on what is going on from your explanation but you may be able to fix it by replacing all multiple newlines with a single newline as you read in the file. Another option might be to just trim() the regex removing the \n at the end unless you need it for something.
Is the question mark there to prevent the regex from matching more than one line at a time? If so, you probably want the MULTILINE flag instead of the DOTALL flag. With MULTILINE, ^ matches just after a newline or at the beginning of the string, and $ matches just before a newline or at the end of the string.
eg.
regex = re.compile('^TOTAL:.*$', re.MULTILINE)
content = regex.sub('', content)
However, this still leaves you with the problem of empty lines. You could run one additional regex at the end that removes blank lines:
regex = re.compile('\n{2,}')
content = regex.sub('\n', content)
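To see what MULTILINE does in practice, here is a sketch that reuses the sample content from the earlier answer. Note that with the ^ anchor only lines that start with TOTAL: are affected, which may or may not be what the original regexes intended.

```python
import re

content = 'TOTAL:.?C2\nabcTOTAL:AC2\ndefTOTAL:C2'

# With MULTILINE, ^ and $ anchor at every line boundary, and without DOTALL
# the dot does not cross a newline, so each match stays within one line.
line_regex = re.compile(r'^TOTAL:.*$', re.MULTILINE)
without_totals = line_regex.sub('', content)

# Collapse runs of two or more newlines into a single newline.
blank_lines = re.compile(r'\n{2,}')
collapsed = blank_lines.sub('\n', 'a\n\n\nb\n')

print(repr(without_totals))
print(repr(collapsed))
```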

regex not matching

I am writing a small Python script to gather some data from a database. The only problem is that when I export data as XML from MySQL, it includes a \b character in the XML file. I wrote code to remove it, but then realized I didn't need to do that processing every time, so I put it in a method and call it only if I find a \b in the XML file. Only now the regex isn't matching, even though I know the \b is there.
Here is what I am doing:
Main program:
'''Program should start here'''
#test the file to see if processing is needed before parsing
for line in xml_file:
    p = re.compile("\b")
    if(p.match(line)):
        print p.match(line)
        processing = True
        break #only one match needed
if(processing):
    print "preprocess"
    preprocess(xml_file)
Preprocessing method:
def preprocess(file):
    #exporting from MySQL query browser adds a weird
    #character to the result set, remove it
    #so the XML parser can read the data
    print "in preprocess"
    lines = []
    for line in xml_file:
        lines.append(re.sub("\b", "", line))
    #go to the beginning of the file
    xml_file.seek(0);
    #overwrite with correct data
    for line in lines:
        xml_file.write(line);
    xml_file.truncate()
Any help would be great,
Thanks
\b is a special sequence for the regular expression engine:
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
So you will need to escape it to find it with a regex.
Escape it with backslash in regex. Since backslash in Python needs to be escaped as well (unless you use raw strings which you don't want to), you need a total of 3 backslashes:
p = re.compile("\\\b")
This will produce a pattern matching the \b character.
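It may also be worth checking which function is doing the test. As far as I can tell, in a normal (non-raw) Python string "\b" is already the backspace character, so both "\b" and "\\\b" end up matching a literal backspace, while the raw string r"\b" is the word boundary. The question's code calls p.match, which only looks at the start of each line, and that alone could explain the missing matches. A sketch with a made-up sample line:

```python
import re

line = 'abc\x08def'   # '\x08' is the backspace character, i.e. '\b' in a normal string

plain = re.search('\b', line)        # pattern is a literal backspace
escaped = re.search('\\\b', line)    # backslash + backspace: escaped literal backspace
in_class = re.search(r'[\b]', line)  # inside a character class, \b means backspace
at_start = re.match('\b', line)      # match() only tries position 0
boundary = re.search(r'\b', 'abc')   # raw \b is the zero-width word boundary

print(plain.start(), escaped.start(), in_class.start())
print(at_start)
print(boundary.span())
```

If that is what is happening, switching the check from match to search (or simply testing "\b" in line) should be enough.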
Correct me if I'm wrong, but there is no need to use a regex to replace '\b'; you can simply use the replace method for this purpose:
def preprocess(file):
    #exporting from MySQL query browser adds a weird
    #character to the result set, remove it
    #so the XML parser can read the data
    print "in preprocess"
    lines = map(lambda line: line.replace("\b", ""), xml_file)
    #go to the beginning of the file
    xml_file.seek(0)
    #overwrite with correct data
    for line in lines:
        xml_file.write(line)
    # OR: xml_file.writelines(lines)
    xml_file.truncate()
Note that there is no need in Python to put ';' at the end of a statement.
