Python - how to read file with NUL delimited lines?

I usually use the following Python code to read lines from a file:
f = open('./my.csv', 'r')
for line in f:
    print(line)
But what if the lines in the file are delimited by "\0" (not "\n")? Is there a Python module that can handle this?
Thanks for any advice.

If your file is small enough that you can read it all into memory, you can use split:
for line in f.read().split('\0'):
    print(line)
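One detail worth illustrating: when the data ends with a trailing NUL (as the output of `find -print0` does), split leaves an empty final item. A minimal Python 3 sketch that drops it, working on bytes; the helper name `split_nul_records` is hypothetical and not part of any library:

```python
def split_nul_records(data):
    """Split a bytes object on NUL, dropping the empty item a trailing NUL leaves behind."""
    records = data.split(b"\0")
    if records and records[-1] == b"":
        records.pop()
    return records


# In-memory sample standing in for file contents read with open(path, "rb").read()
sample = b"./a.txt\0./dir/b.txt\0"
for record in split_nul_records(sample):
    print(record.decode("utf-8"))
```

Empty records in the middle of the data are kept on purpose; only the artifact of the final delimiter is removed.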
Otherwise, you might want to try this recipe from the discussion about this feature request:
def fileLineIter(inputFile,
                 inputNewline="\n",
                 outputNewline=None,
                 readSize=8192):
    """Like the normal file iter but you can set what string indicates newline.
    The newline string can be arbitrarily long; it need not be restricted to a
    single character. You can also set the read size and control whether or not
    the newline string is left on the end of the iterated lines. Setting
    newline to '\0' is particularly good for use with an input file created with
    something like "os.popen('find -print0')".
    """
    if outputNewline is None: outputNewline = inputNewline
    partialLine = ''
    while True:
        charsJustRead = inputFile.read(readSize)
        if not charsJustRead: break
        partialLine += charsJustRead
        lines = partialLine.split(inputNewline)
        partialLine = lines.pop()
        for line in lines: yield line + outputNewline
    if partialLine: yield partialLine
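For comparison, here is a Python 3 variant of the same chunked-read idea that works on a binary stream and yields records without the delimiter. It is a sketch, not the recipe itself: the name `iter_delimited` and its parameters are made up for illustration, and an in-memory stream stands in for a real file.

```python
import io


def iter_delimited(stream, delimiter=b"\0", read_size=8192):
    """Yield delimiter-separated records from a binary stream, without the delimiter."""
    partial = b""
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break  # EOF
        partial += chunk
        records = partial.split(delimiter)
        partial = records.pop()  # the last piece may be an incomplete record
        for record in records:
            yield record
    if partial:
        yield partial  # data after the final delimiter


# A tiny read_size forces records to span several reads.
sample = io.BytesIO(b"first\0second\0third")
print(list(iter_delimited(sample, read_size=4)))
```

With a real file you would pass an object opened in binary mode, for example `open(path, "rb")`.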
I also noticed that your file has a "csv" extension. Python has a built-in CSV module (import csv). Its dialects have an attribute called Dialect.lineterminator, but the reader currently ignores it. From the documentation:
Dialect.lineterminator
The string used to terminate lines produced by the writer. It defaults to '\r\n'.
Note: The reader is hard-coded to recognise either '\r' or '\n' as end-of-line, and ignores lineterminator. This behavior may change in the future.
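Because csv.reader accepts any iterable that yields strings, one possible workaround is to split the NUL-delimited data yourself and hand the pieces to the reader. A minimal Python 3 sketch, assuming no field contains an embedded NUL or newline; the sample data is made up:

```python
import csv

# Made-up sample: three CSV records separated (and terminated) by NUL.
raw = "name,size\0a.txt,10\0b.txt,20\0"

# Drop the empty piece left behind by the trailing NUL.
records = [record for record in raw.split("\0") if record]

rows = list(csv.reader(records))
print(rows)
```

For large inputs, the list of records could be replaced by a generator that yields one record at a time.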

I have modified Mark Byers's suggestion so that we can read a file with NUL-delimited lines in readline style. This approach reads a potentially large file line by line and should be more memory efficient. Here is the Python code (with comments):
import sys
# Variables for "fileReadLine()"
inputFile = sys.stdin  # The input file. "stdin" is used as an example for receiving data from a pipe.
lines = []  # Extracted complete lines (delimited with "inputNewline").
partialLine = ''  # The trailing piece that is not yet a complete line.
inputNewline = "\0"  # Newline character(s) in the input file.
outputNewline = "\n"  # Newline character(s) in output lines.
readSize = 8192  # Size of the read buffer.
# End - Variables for "fileReadLine()"
# This function reads NUL delimited lines sequentially and is memory efficient.
def fileReadLine():
    """Like the normal file readline but you can set what string indicates newline.
    The newline string can be arbitrarily long; it need not be restricted to a
    single character. You can also set the read size and control whether or not
    the newline string is left on the end of the read lines. Setting
    newline to '\0' is particularly good for use with an input file created with
    something like "os.popen('find -print0')".
    """
    # Declare that we want to use these related global variables.
    global inputFile, partialLine, lines, inputNewline, outputNewline, readSize
    if lines:
        # Complete lines were already extracted: pop the first one and return it + outputNewline.
        line = lines.pop(0)
        return line + outputNewline
    # No complete lines are buffered, so try to read more from the input file.
    while True:  # Here "lines" must be an empty list.
        charsJustRead = inputFile.read(readSize)  # The read buffer size, "readSize", can be changed as you like.
        if not charsJustRead:
            # Reached EOF.
            if partialLine:
                # If partialLine is not empty here, treat it as a complete line and return it.
                poppedPartialLine = partialLine
                partialLine = ""  # Reset so that later calls know there is nothing left to return.
                return poppedPartialLine  # This should be the last line of the input file.
            else:
                # EOF reached and partialLine is empty: every line has been read. Return None to indicate this.
                return None
        partialLine += charsJustRead  # The read buffer is not empty, so add it to partialLine.
        lines = partialLine.split(inputNewline)  # Split partialLine to get the complete lines.
        partialLine = lines.pop()  # The last item may not be a complete line; move it back to partialLine.
        if not lines:
            # Empty "lines" means no complete line has been read yet, so keep reading.
            continue
        else:
            # At least one complete line is available: pop the first one and return it + outputNewline.
            line = lines.pop(0)
            return line + outputNewline
# As an example, read NUL delimited lines from "stdin" and print them out (using "\n" to delimit output lines).
while True:
    line = fileReadLine()
    if line is None: break
    sys.stdout.write(line)  # "write" does not append "\n".
    sys.stdout.flush()
Hope it helps.

Related

Remove newline at the end of a file in python

I'm modifying a file with python that may already contain newlines like the following :
#comment
something
#new comment
something else
My code appends some lines to this file. I'm also writing the code that will remove what I added (ideally also working if other modifications occurred in the file).
Currently, I end up with a file that grows each time I apply the code (append/remove), with newline characters accumulating at the end of the file.
I'm looking for a clean way to remove those newlines without too much programmatic complexity. Newlines "inside" the file should remain, newlines at the end of the file should be removed.

Use the str.rstrip() method:
my_file = open("text.txt", "r+")
content = my_file.read()
content = content.rstrip('\n')
my_file.seek(0)
my_file.write(content)
my_file.truncate()
my_file.close()

I needed a way to remove a newline at EOF without having to read the whole file into memory. The code below works for me. I find it memory efficient when dealing with large files.
with open('test.txt', 'r+') as f:  # opens file in read/write text mode
    f.seek(0, 2)  # navigates to the position at end of file
    f.seek(f.tell() - 1)  # navigates to the position of the last char in the file
    last_char = f.read()
    if last_char == '\n':
        f.truncate(f.tell() - 1)
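The same seek-and-truncate idea can be extended to remove every trailing newline, which is what the question asks for. The following is a sketch under a few assumptions: it opens the file in binary mode so that byte offsets are exact, it only treats b"\n" as a newline (not "\r\n"), and the function name `strip_trailing_newlines` is hypothetical.

```python
import os


def strip_trailing_newlines(path):
    """Truncate the file at path so it no longer ends with newline bytes.

    Returns the new file size in bytes.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Walk backwards one byte at a time while the last byte is a newline.
        while size > 0:
            f.seek(size - 1)
            if f.read(1) != b"\n":
                break
            size -= 1
        f.truncate(size)
    return size
```

Only the tail of the file is read, so memory use stays constant regardless of file size. Newlines inside the file are untouched.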

Read a File from redirected stdin with python

I am trying to read the content of a text file that was redirected to stdin via the command line and send it over the Internet, where the receiver has to assemble it back into its original form.
For instance:
$ python test.py < file.txt
I have tried to read the file and to assemble it back with the following code inspired by link:
for line in sys.stdin:
    stripped = line.strip()
    if not stripped: break
    result = result + stripped
print("File is being copied")
file = open("testResult.txt", "w")
file.write(result)
file.close()
print("File copying is complete!")
But this solution only works as long as I DON'T have an empty row (two '\n' one after another) in my file. If I do, my loop breaks and the file reading ends. How can I read from stdin until I reach the end (EOF) of the redirected file?

Why are you even looking at the data:
result = sys.stdin.read()

Instead of breaking, you just want to continue to the next line. The iterator will stop automatically when it reaches the end of the file.
import sys
result = ""
for line in sys.stdin:
    stripped = line.strip()
    if not stripped:
        continue
    result += stripped

line.strip() is removing the trailing newline from the read line.
If you want to keep that newline, you shouldn't need to call strip() at all, I think (does your output file have the input's newlines?).
That if stripped bit is looking for a blank line and was, in the original, the termination characteristic of the loop.
That isn't your termination marker though. You don't want to stop there. So don't.
The loop will finish on its own when sys.stdin reaches the end of the input (EOF).
Drop line.strip(), drop if not stripped: break, and replace result = result + stripped with result = result + line. Then write result to the file to get a simple (though likely expensive) cp script.
There are likely more efficient ways to read all the lines from standard input if you want to do something with them though (depending on your goal).
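One of those more efficient ways is to copy the stream in chunks instead of building one large string. A minimal Python 3 sketch using shutil.copyfileobj from the standard library; the wrapper name `copy_stream` is hypothetical, and in-memory streams stand in for stdin and the output file:

```python
import io
import shutil


def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy everything from src to dst in fixed-size chunks."""
    shutil.copyfileobj(src, dst, chunk_size)


# Demonstration with in-memory streams: the blank line survives the copy.
source = io.BytesIO(b"line one\n\nline three\n")
target = io.BytesIO()
copy_stream(source, target)
print(target.getvalue() == b"line one\n\nline three\n")
```

In a script invoked as `python test.py < file.txt`, the source would be `sys.stdin.buffer` and the destination a file opened with `open("testResult.txt", "wb")`.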

python file seek skips lines

I have a file with content:
0x11111111
0x22222222
0x33333333
0x44444444
And I'm reading it line by line using:
f = open('test1', 'r')
print("Register CIP_REGS_CONTROL value:")
for i in range(4):
    content = f.read(11)
    f.seek(11, 1)
    print(content)
Note that each line is 11 bytes long because of the '\n' char at the end. But the output is:
0x11111111
0x33333333
There's an empty line after the 1st and 3rd lines, and I don't know why. If I delete the '\n' in each line and change the size of reading and seeking to 10, I get:
0x11111111
0x33333333
Two lines are still missing. Can anybody help? Thanks in advance.

Remove your seek call. Each call skips the next 11 bytes. That is, read also moves the current position.

Two things:
You don't need to seek after the read. Your position in the file will already be at the next character after the call to read.
When you call print, it will append a newline (\n) to your output.

The simplest (and safest, since it ensures your file gets closed properly) way would be to use a with construct and readline():
print("Register CIP_REGS_CONTROL value:")
with open('test1', 'r') as f:
    for i in range(4):
        print(f.readline().strip())
strip() when called with no arguments removes all whitespace (which includes \n) from the beginning and end of a string.
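The effect of the extra seek can be reproduced without a file on disk. This Python 3 sketch uses io.BytesIO as a stand-in for the file from the question; the function name `read_records` and its `skip_after_read` flag are made up for the demonstration:

```python
import io

DATA = b"0x11111111\n0x22222222\n0x33333333\n0x44444444\n"


def read_records(stream, count, skip_after_read):
    """Read count 11-byte records, optionally seeking 11 bytes forward after each read."""
    records = []
    for _ in range(count):
        chunk = stream.read(11)  # read() already advances the position by 11
        if skip_after_read:
            stream.seek(11, 1)  # the extra relative seek from the question
        records.append(chunk.strip())
    return records


print(read_records(io.BytesIO(DATA), 4, skip_after_read=False))
print(read_records(io.BytesIO(DATA), 4, skip_after_read=True))
```

Without the seek, all four values come back. With it, every second record is skipped and the last two reads hit the end of the data and return empty results.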

Jython code for deleting spaces from Text file

I am trying to write Jython code for deleting spaces from a text file. I have the following scenario.
I have a text file like:
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
Here all lines are single lines.
But what I want is:
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
So, in short, I want to start checking from the second line until I encounter END and delete all spaces at the end of each line.
Can someone guide me in writing this code?
I tried this:
srcfile=open('d:/BUR001.txt','r')
trgtfile=open('d:/BUR002.txt','w')
readfile=srcfile.readline()
while readfile:
    trgtfile.write(readfile.replace('\s',''))
    readfile=srcfile.readline()
srcfile.close()
trgtfile.close()
Thanks,
Mahesh

You can use the fact that those special lines start with special values:
line = srcfile.readline()
while line:
    line2 = line
    if not line2.startswith('START') and not line2.startswith('END'):
        line2 = line2.replace(' ','')
    trgtfile.write(line2)
    line = srcfile.readline()
Also note that the strings returned by readline() end with \n (or are empty at the end of the input file), and that this code removes all spaces from the line, not only those at the end of the line.
If I understood your example correctly, all you want is to remove empty lines. In that case, instead of reading the file line by line, read it all at once:
content = srcfile.read()
and then remove empty lines from content:
while '\n\n' in content:
    content = content.replace('\n\n', '\n')
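If the goal is literally what the question states (remove only the spaces at the end of each data line and leave the START and END lines alone), str.rstrip() is a closer fit than replace. A sketch under that assumption; the function name `strip_trailing_spaces` is hypothetical, and the sample lines are shortened stand-ins for the real data:

```python
def strip_trailing_spaces(lines):
    """Strip trailing whitespace from every line except START/END marker lines.

    Each line keeps its original line ending.
    """
    cleaned = []
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # the original line ending, possibly empty
        if not (body.startswith("START") or body.startswith("END")):
            body = body.rstrip()
        cleaned.append(body + ending)
    return cleaned


sample = [
    "STARTBUR001 20120416\n",
    "20120416MES201667 2012032000   \n",
    "END 0000000202\n",
]
print(strip_trailing_spaces(sample))
```

Spaces inside a line, such as the one after MES201667, are preserved. The result can be written out with `trgtfile.writelines(...)`.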

blank lines in file after sorting content of a text file in python

I have this small script that sorts the content of a text file
# The built-in function `open` opens a file and returns a file object.
# Read mode opens a file for reading only.
try:
    f = open("tracks.txt", "r")
    try:
        # Read the entire contents of a file at once.
        # string = f.read()
        # OR read one line at a time.
        #line = f.readline()
        # OR read all the lines into a list.
        lines = f.readlines()
        lines.sort()
        f.close()
        f = open('tracks.txt', 'w')
        f.writelines(lines) # Write a sequence of strings to a file
    finally:
        f.close()
except IOError:
    pass
The only problem is that the text is displayed at the bottom of the text file every time it's sorted...
I assume it also sorts the blank lines. Does anybody know why?
Can you also suggest some tips on how to avoid this?
Thanks in advance.

An "empty" line read from a text file is represented in Python by a string containing only a newline ("\n"). You may also want to avoid lines whose "data" consists only of spaces, tabs, etc ("whitespace"). The str.strip() method lets you detect both cases (a newline is whitespace).
f = open("tracks.txt", "r")
# omit empty lines and lines containing only whitespace
lines = [line for line in f if line.strip()]
f.close()
lines.sort()
# now write the output file
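The snippet above stops before the write step. One way to complete it, as a Python 3 sketch: the function name `sort_nonblank_lines` is made up, and the sketch also assumes the last line of the file may lack a trailing newline, in which case it adds one so that lines do not run together after sorting.

```python
def sort_nonblank_lines(path):
    """Sort the lines of the file at path in place, dropping blank lines."""
    with open(path, "r") as f:
        # Omit empty lines and lines containing only whitespace.
        lines = [line for line in f if line.strip()]
    # Ensure every kept line ends with a newline before sorting.
    lines = [line if line.endswith("\n") else line + "\n" for line in lines]
    lines.sort()
    with open(path, "w") as f:
        f.writelines(lines)
    return lines
```

Using with blocks closes the file even if reading or writing fails, so no explicit try/finally is needed.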

This is a perfect opportunity to do some test-based development (see below). Some observations:
In the example below, I omit the aspect of reading from and writing to a file. That's not essential to this question, in my opinion.
I assume you want to strip trailing newlines and omit blank lines. If not, you'll need to adjust. (But you'll have the framework for asserting/confirming the expected behavior.)
I agree with chryss above that you generally don't need to reflexively wrap things in try blocks in Python. That's an anti-pattern that comes from Java (which forces it), I believe.
Anyway, here's the test:
import unittest
def sort_lines(text):
    """Return text sorted by line, remove empty lines and strip trailing whitespace."""
    lines = text.split('\n')
    non_empty = [line.rstrip() for line in lines if line.strip()]
    non_empty.sort()
    return '\n'.join(non_empty)
class SortTest(unittest.TestCase):
    def test(self):
        data_to_sort = """z some stuff
c some other stuff
d more stuff after blank lines
b another line
a the last line"""
        actual = sort_lines(data_to_sort)
        expected = """a the last line
b another line
c some other stuff
d more stuff after blank lines
z some stuff"""
        # assertEqual replaces the deprecated assertEquals alias.
        self.assertEqual(actual, expected, "no match!")
unittest.main()

The reason it sorts the blank lines is that they are there. A blank line is an empty string followed by \n (or \r\n or \r, depending on the OS). Perfectly sortable.
I should like to note that "try:" nested into a "try:... except" block is a bit ugly, and I'd close the file after reading, for style's sake.
