Writing to and reading plain text files with writelines() and readlines() - Python

I understand what these functions do, but what is the practical use of reading/writing something as a list? And what is the use of writing a list to a file, if you could just use write() and later use readlines() to view it as a list?

The practical upside of writing data to disk is that you can drop it from your system memory and come back to it later.

.writelines() accepts any iterable that produces strings. It could be implemented as:

def writelines(self, lines):
    for line in lines:
        self.write(line)
If you call file.writelines(["abc", "def"]) then the file will contain abcdef, and file.readlines() would return ["abcdef"] if you read it back. As you can see, the round trip produces a different result.
If you call file.write(["abc", "def"]) then you get TypeError: must be string or buffer, not list. Again, the behaviour is different.
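A quick way to see this asymmetry for yourself (a minimal sketch; the file name demo.txt is just an illustration):

# writelines() does not add newlines, so two list items become one "line".
with open("demo.txt", "w") as f:
    f.writelines(["abc", "def"])

with open("demo.txt") as f:
    print(f.readlines())  # ['abcdef'] -- one line back, not two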

Related

Reading files in Python with for loop

To read a file in Python, the file must first be opened, and then a read() function is needed. Why is it that when we use a for loop to read the lines of a file, no read() function is necessary?
filename = 'pi_digits.txt'
with open(filename) as file_object:
    for line in file_object:
        print(line)
I'm used to the code below, showing the read requirement.
for line in file_object.read():
This is because the file object class has an __iter__ method built in that states how the file will interact with an iterative statement, like a for loop.
In other words, when you say for line in file_object, the file object's __iter__ method is called, returning an iterator that yields one line of the file at a time.
Python file objects define special behavior when you iterate over them, in this case with the for loop. Every time you hit the top of the loop it implicitly calls readline(). That's all there is to it.
Note that the code you are "used to" will actually iterate character by character, not line by line! That's because you will be iterating over a string (the result of the read()), and when Python iterates over strings, it goes character by character.
The open command in your with statement handles the reading implicitly. It gives you an iterator that yields the file one record at a time (the read is hidden within the iterator). For a text file, each line is one record.
Note that the read command in your second example reads the entire file into a string; this consumes more memory than the line-at-a-time example.
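A small sketch of the two iteration behaviours side by side, assuming the pi_digits.txt file from the question exists:

# Iterating the file object: one line per pass.
with open('pi_digits.txt') as file_object:
    for line in file_object:
        print(repr(line))

# Iterating the result of read(): one character per pass.
with open('pi_digits.txt') as file_object:
    for ch in file_object.read():
        print(repr(ch))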

How to read the first N lines of a text file and write them to another text file?

Here is the code I modified from previous code, but I got this error:
TypeError: must be str, not list in f1.write(head)
This is the part of the code that is producing the error:
from itertools import islice

with open("input.txt") as myfile:
    head = list(islice(myfile, 3))
f1.write(head)
f1.close()
Well, you have it right: using islice(myfile, n) will get you the first n lines of the open file myfile. The problem here is when you try to write these lines to another file.
The error is pretty intuitive (I've added the full error one receives in this case):
TypeError: write() argument must be str, not list
This is because f.write() accepts strings as parameters, not list types.
So, instead of dumping the list as is, write the contents of it in your other file using a for loop:
with open("input.txt", "r") as myfile:
head = list(islice(myfile, 3))
# always remember, use files in a with statement
with open("output.txt", "w") as f2:
for item in head:
f2.write(item)
Provided that the contents of the list are all of type str, this works like a charm; if not, you just need to wrap each item in the for loop in an str() call to make sure it is converted to a string.
If you want an approach that doesn't require a loop, you could always consider using f.writelines() instead of f.write(), as sketched below (and take a look at Jon's comment for another tip on writelines).
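A minimal sketch of that loop-free variant; writelines() takes the whole iterable of strings at once:

from itertools import islice

with open("input.txt") as myfile:
    head = list(islice(myfile, 3))

# head is a list of str lines, which is exactly what writelines() wants.
with open("output.txt", "w") as f2:
    f2.writelines(head)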

Reading a .txt file in python

I have used the following code to read a .txt file:
f = os.open(os.path.join(self.dirname, self.filename), os.O_RDONLY)
And when I want to output the content I use this:
os.read(f, 10)
This reads 10 bytes from the beginning of the file onward. But I need to read the entire content, perhaps using a value such as -1. What should I do?
You have two options:
1. Call os.read() repeatedly.
2. Open the file using the built-in open() (as opposed to os.open()), and just call f.read() with no arguments.
The second approach carries a certain risk: you might run into memory issues if the file is very large. Both are sketched below.
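A minimal sketch of both options (the file name example.txt is hypothetical):

import os

# Option 1: call os.read() in a loop until it returns an empty chunk (EOF).
fd = os.open("example.txt", os.O_RDONLY)
chunks = []
while True:
    chunk = os.read(fd, 4096)  # read up to 4096 bytes per call
    if not chunk:              # an empty result means end of file
        break
    chunks.append(chunk)
os.close(fd)
content = b"".join(chunks)

# Option 2: the built-in open(); read() with no argument returns everything.
with open("example.txt", "rb") as f:
    content = f.read()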

How to get the size of a list, in bytes, when it is written to a file in Python?

I have a file which I read into memory as a list, then split based on some rule into list1, list2, ..., listn. Now I want to get the size of each list, where the size is the file size that would result from writing that list to a file. The following is the code I have; the file name is 'temp', whose size is 744 bytes.
from os import stat
from sys import getsizeof

print(stat('temp').st_size)  # we get exactly 744 here

# Now read the file into a list and use the getsizeof() function:
with open('temp', 'r') as f:
    chunks = f.readlines()
print(getsizeof(chunks))  # here I get 240, which is quite different from 744
Since I can't use getsizeof() to directly get the file size (on disk), once I have a split list I have to write it to a temporary file:

open('tmp', 'w').write("".join(list1))
print(stat('tmp').st_size)  # Here is the value I want.
os.remove('tmp')

This solution is very slow and requires a lot of writing/reading to disk. Is there a better way to do this? Thanks a lot!
Instead of writing a series of bytes to a file and then looking at the file length [1], you could just check the length of the string that you would have written to the file:
print(len("".join(list1)))
Here, I'm assuming that your list contains byte strings. If it doesn't, you can always encode a byte string from your unicode string:
print(len("".join(list1).encode(your_codec)))
which I think you would need for write() to work properly in your original solution anyway.
[1] Your original code could also give flaky (wrong!) results, since you never close the file. Due to buffering, it is not guaranteed that all the contents of the string will have been written to the file by the time you call os.stat on it.
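Putting that together, a minimal sketch that measures each chunk's would-be on-disk size without touching the disk (the sample lists here are hypothetical stand-ins for the split results):

# Hypothetical split lists of text lines.
list1 = ["first line\n", "second line\n"]
list2 = ["third line\n"]

for chunk in (list1, list2):
    # Encode first so the count matches bytes on disk, not characters.
    size = len("".join(chunk).encode("utf-8"))
    print(size)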

How do I read several lines in a file faster using Python?

As of now I use the following Python code:
file = open(filePath, "r")
lines=file.readlines()
file.close()
Say my file has several lines (10,000 or more); then my program becomes slow if I do this for more than one file. Is there a way to speed this up in Python? Reading various links, I understand that readlines() stores the lines of the file in memory, and that is why the code gets slow.
I have tried the following code as well and the time gain I got is 17%.
lines=[line for line in open(filePath,"r")]
Is there any other module in Python 2.4 which I might have missed?
Thanks,
Sandhya
for line in file:
This gives you an iterator that reads the file object one line at a time and then discards the previous line from memory.
A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit. In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer. New in version 2.3.
Short answer: don't assign the lines to a variable, just perform whatever operations you need inside the loop.
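A minimal sketch of that advice, runnable on Python 2.4 (so no with statement); the file name data.txt and the pattern are hypothetical:

# Count matching lines while holding only one line in memory at a time.
count = 0
f = open("data.txt", "r")
for line in f:            # the file object yields one line per pass
    if "pattern" in line:
        count += 1
f.close()
print(count)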
