How to detect lines with only a white space in a text? - python

Here, an "empty line" means a line that contains only whitespace.
I am trying to read a text file line by line and I want to ignore whitespace-only lines. Or, more precisely, I want to detect empty lines.
An empty line can contain spaces, newline characters, etc., and it is still considered empty. If you open the file in Notepad, you should not see anything on such a line.
Is there a quick way of doing this in Python? I am new to Python, by the way.

for line in someopenfile:
    if line.isspace():
        empty_line()
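A runnable Python 3 sketch of the loop above. The helper name blank_line_numbers is made up for illustration, and io.StringIO stands in for a real open file:

```python
import io

def blank_line_numbers(lines):
    """Return the 1-based numbers of lines that hold only whitespace."""
    numbers = []
    for number, line in enumerate(lines, start=1):
        # isspace() is True when every character is whitespace;
        # it is False for the empty string
        if line.isspace():
            numbers.append(number)
    return numbers

# io.StringIO stands in for a real open file in this sketch
fake_file = io.StringIO("first\n   \n\t\nsecond\n\nthird")
result = blank_line_numbers(fake_file)
print(result)
```

Lines read from a file keep their trailing newline, so even a visually empty line is a non-empty string here, and isspace() reports it as blank.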

Using strip() on any string returns a string with all leading and trailing whitespace stripped. So calling that on a line with only whitespace gives you an empty string. You can then just filter on strings with non-zero length.
>>> lines=[' ','abc','']
>>> print filter(lambda x:len(x.strip()),lines)
['abc']
>>>

Related

output gets printed but it has a extra empty line

I'm trying this simple code to print a txt file with an if condition.
The code works, but the printed output has an extra empty line. How do I fix that?
with open('test.txt') as file:
    for line in file:
        if 'Efficient AP Image Upgrade ..... Enabled' in line:
            break
        print(line)
Each line read from the file already ends with a newline character. To keep print from adding another newline (its default behaviour), call print(line, end='') so that nothing extra is written.
You are probably using print on strings that already have a final newline -- in fact now that the question has been tidied up we can see that this is the case because you are using an iterator over file, and this will produce a sequence of lines that end with newline characters (except possibly the last line if it does not have a newline in the input file).
Note that after the data items, the print function will write an additional newline by default (more specifically, it will write the value specified by the end parameter, which defaults to a newline).
Possible approaches:
Use sys.stdout.write (does not append newline):
sys.stdout.write(text)
Use print but set it to write empty string instead of newline at the end:
print(text, end='')
Remove any newlines before printing (in principle this may include newlines in the middle of the string but because your strings come from an iterator over file object, there shouldn't be any):
print(text.replace('\n', ''))
Remove any leading or trailing whitespace (including newlines) before printing - note that this may include other spaces:
print(text.strip())
By default, print() writes a newline after its arguments. To avoid that, try using
print(line,end='')
or
print(line,end=' ')
To remove the trailing new line you can strip new lines from the right side of the line:
print(text.rstrip('\n'))
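Of the approaches listed above, rstrip('\n') is the one that removes only the line ending and leaves other whitespace alone. A minimal sketch, with io.StringIO standing in for open('test.txt') and made-up sample text:

```python
import io

# io.StringIO stands in for open('test.txt') in this sketch
source = io.StringIO("alpha\n  beta  \ngamma")

shown = []
for line in source:
    # Drop only the trailing newline; spaces around the text are kept
    text = line.rstrip('\n')
    shown.append(text)
    print(text)
```

The last line has no newline in the input, and rstrip('\n') simply leaves it unchanged.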

Regex for newline character search in given string in Python

I want to search for newline characters in a string using a regex in Python. I don't want \r or \n to be included in the message.
I have tried a regex that detects \r\n correctly. But when I remove \r\n from the Line variable, it still prints the error.
Line="got less no of bytes than requested\r\n"
if(re.search('\\r|\\n',Line)):
    print("Do not use \\r\\n in MSG");
It should detect \r\n in the Line variable where it appears as text, not the invisible \n.
It should not print when Line is like below:
Line="got less no of bytes than requested"
You are looking for the re.sub function.
Try this:
import re
Line = "got less no of bytes than requested\r\n"
replaced = re.sub('\n', '', Line)
replaced = re.sub('\r', '', replaced)  # work on the previous result, not on Line
print(replaced)
Instead of checking for newlines, it would probably be better to just remove them. No need to use regex for it, just use strip, it will remove all whitespace and newlines from the ends of the string:
line = 'got less no of bytes than requested\r\n'
line = line.strip()
# line = 'got less no of bytes than requested'
If you want to do it with regex you can use:
import re
line = 'got less no of bytes than requested\r\n'
line = re.sub(r'\n|\r', '', line)
# line = 'got less no of bytes than requested'
If you insist on checking for the newlines, you can do it like this:
if '\n' in line or '\r' in line:
    print(r'Do not use \r\n in MSG')
Or the same with regex:
import re
if re.search(r'\n|\r', line):
    print(r'Do not use \r\n in MSG')
Also: it's advisable to have your Python variables named with snake_case.
First of all, consider using strip, as many people here have mentioned.
Second, if you want to match a newline at ANY position in the string, use search, not match (see the question "What is the difference between re.search and re.match?" for more about search vs match).
newline_regexp = re.compile("\n|\r")
newline_regexp.search(Line)  # returns a match object, or None if not found
If you just want to check for line breaks written out as text in the message, you can use the string method find(). Note the use of raw strings, indicated by the r in front of the quotes, which removes the need to escape the backslash. find() returns -1 when nothing is found and 0 for a match at the very start, so compare against -1.
line = r"got less no of bytes than requested\r\n"
print(line)
if line.find(r'\r\n') != -1:
    print("Do not use line breaks in MSG")
As others have noted, you are probably looking for line.strip(). But, in case you still want to practice regex, you would use the following code:
Line="got less no of bytes than requested\r\n"
# \r\n located anywhere in the string
prog = re.compile(r'\r\n')
# \r or \n located anywhere in the string
prog = re.compile(r'(\r|\n)')
if prog.search(Line):
    print('Do not use \\r\\n in MSG')
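The answers above check for two different things: real carriage return or line feed characters, and the literal text backslash-r or backslash-n. A small sketch that tells them apart; the names REAL_BREAK, LITERAL_BREAK and describe are made up for illustration:

```python
import re

# Matches a real carriage return or line feed character
REAL_BREAK = re.compile(r'[\r\n]')
# Matches the two-character text: a backslash followed by r or n
LITERAL_BREAK = re.compile(r'\\[rn]')

def describe(msg):
    """Report which kinds of line-break marker, if any, msg contains."""
    found = []
    if REAL_BREAK.search(msg):
        found.append('real')
    if LITERAL_BREAK.search(msg):
        found.append('literal')
    return found

print(describe("requested\r\n"))   # real control characters
print(describe(r"requested\r\n"))  # raw string: backslashes kept as text
print(describe("requested"))       # neither
```

Which of the two patterns is the right one depends on whether the message arrives with actual control characters or with escape sequences typed out as text.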

Why doesn't .rstrip('\n') work?

Let's say doc.txt contains
a
b
c
d
and that my code is
f = open('doc.txt')
doc = f.read()
doc = doc.rstrip('\n')
print doc
why do I get the same values?
str.rstrip() removes the trailing newline, not all the newlines in the middle. You have one long string, after all.
Use str.splitlines() to split your document into lines without newlines; you can rejoin it if you want to:
doclines = doc.splitlines()
doc_rejoined = ''.join(doclines)
but now doc_rejoined will have all lines running together without a delimiter.
Because you read the whole document into one string that looks like:
'a\nb\nc\nd\n'
When you do a rstrip('\n') on that string, only the rightmost \n will be removed, leaving all the other untouched, so the string would look like:
'a\nb\nc\nd'
The solution would be to split the file into lines and then right strip every line. Or just replace all the newline characters with nothing: s.replace('\n', ''), which gives you 'abcd'.
rstrip('\n') strips trailing newlines from the end of the whole string only. If you were expecting it to work on individual lines, you'd need to split the string into lines first using something like doc.split('\n').
Try this instead:
with open('doc.txt') as f:
    for line in f:
        print line,  # Python 2: the trailing comma suppresses print's own newline
Explanation:
The recommended way to open a file is using with, which takes care of closing the file at the end
You can iterate over each line in the file using for line in f
There's no need to call rstrip() now, because we're reading and printing one line at a time
Consider using replace and replacing each instance of '\n' with ''. This would get rid of all the new line characters in the input text.
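To compare the suggestions above side by side, here is a short Python 3 sketch that uses the string from the explanation instead of reading doc.txt:

```python
doc = 'a\nb\nc\nd\n'

# rstrip only touches the end of the whole string
trimmed = doc.rstrip('\n')

# splitlines() drops the line endings and returns a list of lines
lines = doc.splitlines()

# replace() removes every newline, running the lines together
joined = doc.replace('\n', '')

print(repr(trimmed))
print(lines)
print(repr(joined))
```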

dealing with \n characters at end of multiline string in python

I have been using python with regex to clean up a text file. I have been using the following method and it has generally been working:
mystring = compiledRegex.sub("replacement",mystring)
The string in question is an entire text file that includes many embedded newlines. Some of the compiled regex's cover multiple lines using the re.DOTALL option. If the last character in the compiled regex is a \n the above command will substitute all matches of the regex except the match that ends with the final newline at the end of the string. In fact, I have had several other no doubt related problems dealing with newlines and multiple newlines when they appear at the very end of the string. Can anyone give me a pointer as to what is going on here? Thanks in advance.
If I understood you correctly, all you need is the text without a newline at the end of each line, so that you can then iterate over it to find a required word. In that case you can try the following:
data = (line for line in text.split('\n') if line.strip())  # yields all non-empty lines, without '\n' at the end
Now you can search or replace any text you need using list slicing or regex functionality.
Or you can use replace to swap every '\n' for whatever you want:
text.replace('\n', '')
My bet is that your file does not end with a newline...
>>> content = open('foo').read()
>>> print content
TOTAL:.?C2
abcTOTAL:AC2
defTOTAL:C2
>>> content
'TOTAL:.?C2\nabcTOTAL:AC2\ndefTOTAL:C2'
...so the last line does not match the regex:
>>> regex = re.compile('TOTAL:.*?C2\n', re.DOTALL)
>>> regex.sub("XXX", content)
'XXXabcXXXdefTOTAL:C2'
If that is the case, the solution is simple: just match either a newline or the end of the file (with $):
>>> regex = re.compile('TOTAL:.*?C2(\n|$)', re.DOTALL)
>>> regex.sub("XXX", content)
'XXXabcXXXdefXXX'
I can't get a good handle on what is going on from your explanation, but you may be able to fix it by replacing all multiple newlines with a single newline as you read in the file. Another option might be to drop the \n at the end of the regex, unless you need it for something.
Is the question mark there to prevent the regex from matching more than one line at a time? If so, you probably want the MULTILINE flag instead of the DOTALL flag. With MULTILINE, the ^ sign matches at the beginning of the string and just after each newline, and the $ sign matches just before each newline and at the end of the string.
eg.
regex = re.compile('^TOTAL:.*$', re.MULTILINE)
content = regex.sub('', content)
However, this still leaves you with the problem of empty lines. But why not just run one additional regex at the end that removes blank lines?
regex = re.compile('\n{2,}')
content = regex.sub('\n', content)
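The two ideas in this thread (matching line by line with MULTILINE, and accepting either a newline or the end of the string) can be combined so that no empty line is left behind. A sketch in Python 3; the sample content is an assumption modelled on the earlier answer, with a last line that has no trailing newline:

```python
import re

# Sample text whose last line has no trailing newline
content = 'TOTAL:.?C2\nabc\nTOTAL:AC2\ndef\nTOTAL:C2'

# With MULTILINE, ^ anchors at the start of every line. The line ending
# is consumed when present; otherwise \Z matches the end of the string.
pattern = re.compile(r'^TOTAL:.*(?:\n|\Z)', re.MULTILINE)

cleaned = pattern.sub('', content)
print(repr(cleaned))
```

Because the newline is part of the match, each removed line takes its line ending with it and no second pass for blank lines is needed.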

What's a quick one-liner to remove empty lines from a python string?

I have some code in a python string that contains extraneous empty lines. I would like to remove all empty lines from the string. What's the most pythonic way to do this?
Note: I'm not looking for a general code re-formatter, just a quick one or two-liner.
Thanks!
How about:
text = os.linesep.join([s for s in text.splitlines() if s])
where text is the string with the possible extraneous lines?
"\n".join([s for s in code.split("\n") if s])
Edit2:
text = "".join([s for s in code.splitlines(True) if s.strip("\r\n")])
I think that's my final version. It should work well even with code mixing line endings. I don't think that line with spaces should be considered empty, but if so then simple s.strip() will do instead.
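The choice described above (whether a line holding only spaces counts as empty) can be made explicit with a flag. A Python 3 sketch; the function name drop_blank_lines and its keep_whitespace_only parameter are made up for illustration:

```python
def drop_blank_lines(text, keep_whitespace_only=False):
    """Remove empty lines; optionally keep lines that contain only spaces."""
    kept = []
    for line in text.splitlines(True):  # True keeps each line ending
        if keep_whitespace_only:
            content = line.strip('\r\n')  # ignore only the line ending
        else:
            content = line.strip()        # ignore all surrounding whitespace
        if content:
            kept.append(line)
    return ''.join(kept)

sample = 'a\r\n\r\n  \nb\n\nc'
print(repr(drop_blank_lines(sample)))
print(repr(drop_blank_lines(sample, keep_whitespace_only=True)))
```

Since splitlines(True) keeps the original line endings, mixed \r\n and \n endings pass through unchanged.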
LESSON ON REMOVING NEWLINES and EMPTY LINES WITH SPACES
"t" is the variable with the text. You will also see an "s" variable; it is the loop variable of the list comprehension in the square brackets.
First let's set the "t" variable so it has newlines:
>>> t='hi there here is\na big line\n\nof empty\nline\neven some with spaces\n \nlike that\n\n \nokay now what?\n'
Note there is another way to set the variable, using triple quotes:
somevar="""
asdfas
asdf
asdf
asdf
asdf
"""
Here is how it looks when we view it without "print":
>>> t
'hi there here is\na big line\n\nof empty\nline\neven some with spaces\n \nlike that\n\n \nokay now what?\n'
To see with actual newlines, print it.
>>> print t
hi there here is
a big line

of empty
line
even some with spaces
 
like that

 
okay now what?

COMMAND REMOVE ALL BLANK LINES (INCLUDING SPACES):
Some of the blank-looking lines are just newlines, and some contain spaces, so they only look empty.
If you want to get rid of all blank-looking lines (whether they hold just a newline or spaces as well):
>>> print "".join([s for s in t.strip().splitlines(True) if s.strip()])
hi there here is
a big line
of empty
line
even some with spaces
like that
okay now what?
OR:
>>> print "".join([s for s in t.strip().splitlines(True) if s.strip("\r\n").strip()])
hi there here is
a big line
of empty
line
even some with spaces
like that
okay now what?
NOTE: the strip in t.strip().splitlines(True) can be removed so that it's just t.splitlines(True), but then your output can end with an extra newline (that strip removes the final newline). The strip() in the last part, s.strip("\r\n").strip() or s.strip(), is what actually removes the whitespace-only lines and the empty lines.
COMMAND REMOVE ALL BLANK LINES (BUT NOT ONES WITH SPACES):
Technically, lines with spaces should NOT be considered empty, but it all depends on the use case and what you're trying to achieve.
>>> print "".join([s for s in t.strip().splitlines(True) if s.strip("\r\n")])
hi there here is
a big line
of empty
line
even some with spaces
 
like that
 
okay now what?
** NOTE ABOUT THAT MIDDLE strip **
That middle strip, the one attached to the "t" variable, just removes the last newline (as the previous note stated). Here is how the output looks without that strip (notice the extra newline at the end).
With the 1st example (removing empty lines and whitespace-only lines):
>>> print "".join([s for s in t.splitlines(True) if s.strip("\r\n").strip()])
hi there here is
a big line
of empty
line
even some with spaces
like that
okay now what?
(an extra blank line is printed here, because the final newline is no longer stripped)
With the 2nd example (removing empty lines only):
>>> print "".join([s for s in t.splitlines(True) if s.strip("\r\n")])
hi there here is
a big line
of empty
line
even some with spaces
 
like that
 
okay now what?
(an extra blank line is printed here, because the final newline is no longer stripped)
The END!
filter(None, code.splitlines())
filter(str.strip, code.splitlines())
are equivalent to
[s for s in code.splitlines() if s]
[s for s in code.splitlines() if s.strip()]
and might be useful for readability
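In Python 3, filter() returns a lazy iterator rather than a list, so the equivalence above holds once the result is wrapped in list(). A short sketch with made-up sample text:

```python
code = "x = 1\n\n   \ny = 2\n"

# filter(None, ...) keeps truthy items, so only '' is dropped
no_empty = list(filter(None, code.splitlines()))

# filter(str.strip, ...) also drops lines that are only whitespace
no_blank = list(filter(str.strip, code.splitlines()))

print(no_empty)
print(no_blank)
```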
By using re.sub function
re.sub(r'^$\n', '', s, flags=re.MULTILINE)
Here is a one line solution:
print("".join([s for s in mystr.splitlines(True) if s.strip()]))
This code removes empty lines (with or without whitespaces).
import re
re.sub(r'\n\s*\n', '\n', text, flags=re.MULTILINE)
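IMHO the shortest would be to collapse doubled newlines (note that this makes a single pass, so runs of three or more newlines are only shortened, and whitespace-only lines are kept):
str(textWithEmptyLines).replace('\n\n','\n')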
This one will remove lines of spaces too.
re.sub(r'(?imu)^\s*\n', '', code)
using regex
re.sub(r'^$\n', '', somestring, flags=re.MULTILINE)
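The regex answers in this thread differ in what they treat as empty. A small Python 3 sketch comparing two of the patterns on made-up sample text:

```python
import re

text = 'a\n\n  \nb\n'

# ^$\n matches only lines with nothing at all on them
only_empty = re.sub(r'^$\n', '', text, flags=re.MULTILINE)

# ^\s*\n also matches lines that hold nothing but whitespace
blank_too = re.sub(r'^\s*\n', '', text, flags=re.MULTILINE)

print(repr(only_empty))
print(repr(blank_too))
```

Note that \s also matches newlines, so the second pattern swallows a whole run of blank lines in one match.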
And now for something completely different:
Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string, re
>>> tidy = lambda s: string.join(filter(string.strip, re.split(r'[\r\n]+', s)), '\n')
>>> tidy('\r\n \n\ra\n\n b \r\rc\n\n')
'a\012 b \012c'
Episode 2:
This one doesn't work on 1.5 :-(
BUT not only does it handle universal newlines and blank lines, it also removes trailing whitespace (good idea when tidying up code lines IMHO) AND does a repair job if the last meaningful line is not terminated.
import re
tidy = lambda c: re.sub(
    r'(^\s*[\r\n]+|^\s*\Z)|(\s*\Z|\s*[\r\n]+)',
    lambda m: '\n' if m.lastindex == 2 else '',
    c)
expanding on ymv's answer, you can use filter with join to get desired string,
"".join(filter(str.strip, sample_string.splitlines(True)))
I wanted to remove a bunch of empty lines and what worked for me was:
if len(line) > 2:
    myfile.write(output)
I went with 2 since that covered the \r\n.
I did want a few empty rows just to make my formatting look better, so in those cases I had to use:
print(" \n")
