What is the difference between opening and reading a file in python? - python

In python's OS module there is a method to open a file and a method to read a file.
The docs for the open method say:
Open the file file and set various flags according to flags and
possibly its mode according to mode. The default mode is 0777 (octal),
and the current umask value is first masked out. Return the file
descriptor for the newly opened file.
The docs for the read method say;
Read at most n bytes from file descriptor fd. Return a string
containing the bytes read. If the end of the file referred to by fd
has been reached, an empty string is returned.
I understand what it means to read n bytes from a file. But how does this differ from open?

"Opening" a file doesn't actually bring any of the data from the file into your program. It just prepares the file for reading (or writing), so when your program is ready to read the contents of the file it can do so right away.

Opening a file allows you to read or write to it (depending on the flag you pass as the second argument), whereas reading it actually pulls the data from a file that is typcially saved into a variable for processing or printed as output.
You do not always read from a file once it is opened. Opening also allows you to write to a file, either by overwriting all the contents or appending to the contents.
To read from a file:
>>> myfile = open('foo.txt', 'r')
>>> myfile.read()
First you open the file with read permission (r)
Then you read() from the file
To write to a file:
>>> myfile = open('foo.txt', 'r')
>>> myfile.write('I am writing to foo.txt')
The only thing that is being done in line 1 of each of these examples is opening the file. It is not until we actually read() from the file that anything is changed

open gets you a fd (file descriptor), you can read from that fd later.
One may also open a file for other purpose, say write to a file.

It seems to me you can read lines from the file handle without invoking the read method but I guess read() truly puts the data in the variable location. In my course we seem to be printing lines, counting lines, and adding numbers from lines without using read().
The rstrip() method needs to be used, however, because printing the line from the file handle using a for in statement also prints the invisible line break symbol at the end of the line, as does the print statement.
From Python for Everybody by Charles Severance, this is the starter code.
"""
7.2
Write a program that prompts for a file name,
then opens that file and reads through the file,
looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point
values from each of the lines and compute the
average of those values and produce an output as
shown below. Do not use the sum() function or a
variable named sum in your solution.
You can download the sample data at
http://www.py4e.com/code3/mbox-short.txt when you
are testing below enter mbox-short.txt as the file name.
"""
# Use the file name mbox-short.txt as the file name
fname = input("Enter file name: ")
fh = open(fname)
for line in fh:
if not line.startswith("X-DSPAM-Confidence:") :
continue
print(line)
print("Done")

Related

File handling techniques in Python

How do I read a file by opening that particular file instead of printing it on the console? I've used the following code but it prints the contents of the file on the console.
fw=open("x.txt",'r+')
#fw.write("Hello\n")
#fw.write("Python is crazy af")
n=fw.read()
print(n)
fw.close()
The builtin open function makes the contents of a file available, meaning you can manipulate it with your code. If you don't want to print a line of it, you can do .readlines(). If you don't want to print it you can do anything else you want with it like store it in a variable.
One last note about file context:
with open("filename.txt", "r") as file:
for line in file:
# Do something with line here
This pattern is guaranteed to close, instead of calling open and close separately.
But if you wanted to open a text editor...
https://stackoverflow.com/a/6178200/10553976
How do I read a file by opening that particular file
The first 2 (non comment) lines of your answer do this:
fw=open("x.txt",'r+')
n=fw.read()
You have now read the contents of x.txt into the variable n
instead of printing it on the console?
Don't print it then. Remove the line
print(n)
and the contents of the file won't be printed.

Python:Printing file content

I am a newbie to programming and trying to print contents of a file using the following statements but while trying to print the file contents, the output I get is empty space:-
with open('myfile.txt','a+') as myfile:
myfile.write("hello once again 2")
data=myfile.read()
print(data)
The reason for that is a wrong parameter to the open function. Try to replace a+ with r+, and read with readlines
with open('myfile.txt', 'r+') as myfile:
myfile.write("hello once again 2")
data = myfile.readlines() #please notice readlines
print(data)
Here is a reason for that.
When you open a file with 'a+' flag it is opened for reading and writing but the stream is position in the end the file. That why you read 'empty', because there is nothing.
I would advice you to work with file in two steps. First write to it, and then read it.
What write and read do - they write the content into the file but it is not going to be there immediately unless you close the file or call the flush function explicitly. The flush is going to be called in the end of the 'context manager' which is created by with open('myfile.txt', 'r+') as myfile. You can imagine 'context manager' as a wrapper which makes sure that 'flush' is called after you've done writing your code under with statement.
When you write your content your filepointer is at the end of the file.
To read it from the begining you need to reset your pointer.
do myfile.seek(0) before myfile.read()
for more details see: https://docs.python.org/2/tutorial/inputoutput.html
f.tell() returns an integer giving the file object’s current position
in the file, measured in bytes from the beginning of the file. To
change the file object’s position, use f.seek(offset, from_what). The
position is computed from adding offset to a reference point; the
reference point is selected by the from_what argument. A from_what
value of 0 measures from the beginning of the file, 1 uses the current
file position, and 2 uses the end of the file as the reference point.
from_what can be omitted and defaults to 0, using the beginning of the
file as the reference point.
Since the behavior of a+ can vary among operating systems, it is probably best not to use it is you want your code to be portable.
Unless your files are huge (is in a significant fraction of available RAM) I would do the following.
Read your whole file into a list of lines.
with open('myfile.txt') as myfile:
mylines = myfile.readlines()
You can now manipulate mylines as you like. Append, insert, change or delete lines as you wish.
At the end, write it all back.
with open('myfile.txt', 'w') as myfile:
myfile.writelines(mylines)
To the best of my knowledge, this should behave the same on all Python platforms.

Why does my text file keep overwriting the data on it?

I'm trying to pull some data from Facebook pages for a product and dump it all into a text file, but I find that the file keeps overwriting itself with the data. I'm not sure if it's a pagination issue or if I have to make several files.
Here's my code:
#Modules
import requests
import facebook
import json
def some_action(post):
print posts['data']
print post['created_time']
#Token
access_token = 'INSERT ACCESS TOKEN'
user = 'walkers'
#Posts
graph = facebook.GraphAPI(access_token)
profile = graph.get_object(user)
posts = graph.get_connections(profile['id'], 'posts')
#Write
while True:
posts = requests.get(posts['paging']['next']).json()
#print posts
with open('test121.txt', 'w') as outfile:
json.dump(posts, outfile)
Any idea as to why this is happening?
w overwrites, open with a to append or open the file once outside the loop:
append:
while True:
posts = requests.get(posts['paging']['next']).json()
#print posts
with open('test121.txt', 'a') as outfile:
json.dump(posts, outfile)
Open once outside the loop:
with open('test121.txt', 'w') as outfile:
while True:
posts = requests.get(posts['paging']['next']).json()
#print posts
json.dump(posts, outfile)
It makes more sense to use the second option, if you are going to be running the code multiple times then you can open with a outside the loop also, if the file does not exist it will be created, if it does data will be appended
This is because you are using the file operator with w mode, you are overwriting the content. You can use the a append mode:
It can be done like this
Modification:
with open('test121.txt', 'w') as outfile:
while True:
posts = requests.get(posts['paging']['next']).json()
json.dump(posts, outfile)
w overwrites on the existing file
i.e)
File1.txt:
123
code:
with open("File1.txt","w") as oup1:
oup1.write("2")
File1.txt after python run:
2
Its value is overwritten
a appends to the existing file
i.e)
File1.txt:
123
code:
with open("File1.txt","a") as oup1:
oup1.write("2")
File1.txt after python run:
1232
The written content is appended to the end.
Opening and Closing Files
to use actual data files reading and writing to the standard input and output.
Python provides basic functions and methods necessary to manipulate files by default. You can do your most of the file manipulation using a file object.
The open Function
Before you can read or write a file, you have to open it using Python's built-in open() function. This function creates a file object, which would be utilized to call other support methods associated with it.
Syntax
file object = open(file_name [, access_mode][, buffering])
Here are parameter details:
file_name: The file_name argument is a string value that contains the name of the file that you want to access.
access_mode: The access_mode determines the mode in which the file has to be opened, i.e., read, write, append, etc. A complete list of possible values is given below in the table. This is optional parameter and the default file access mode is read (r).
buffering: If the buffering value is set to 0, no buffering takes place. If the buffering value is 1, line buffering is performed while accessing a file. If you specify the buffering value as an integer greater than 1, then buffering action is performed with the indicated buffer size. If negative, the buffer size is the system default(default behavior).
Here is a list of the different modes of opening a file −
Modes and Description
r= Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode.
rb= Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file. This is the default mode.
r+= Opens a file for both reading and writing. The file pointer placed at the beginning of the file.
rb+= Opens a file for both reading and writing in binary format. The file pointer placed at the beginning of the file.
w= Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.
wb= Opens a file for writing only in binary format. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.
w+= Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.
wb+= Opens a file for both writing and reading in binary format. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.
a= Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
ab= Opens a file for appending in binary format. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
a+= Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.
ab+= Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.
Reading and Writing Files
The file object provides a set of access methods to make our lives easie with the use of read() and write() methods to read and write files.
The write() Method
The write() method writes any string to an open file. It is important to note that Python strings can have binary data and not just text.
The write() method does not add a newline character ('\n') to the end of the string −
Syntax
fileObject.write(string);
Here, passed parameter is the content to be written into the opened file.
Example
# Open a file
fo = open("file.txt", "wb")
fo.write( "Python is a great language");
# Closeopend file
fo.close()
The above method would create foo.txt file and would write given content in that file and finally it would close that file. If you would open this file, it would have following content.
Python is a great language.
The read() Method
The read() method reads a string from an open file. It is important to note that Python strings can have binary data. apart from text data.
Syntax
fileObject.read([count]);
Here, passed parameter is the number of bytes to be read from the opened file. This method starts reading from the beginning of the file and if count is missing, then it tries to read as much as possible, maybe until the end of file.
Example
Let's take a file foo.txt, which we created above.
# Open a file
fo = open("foo.txt", "r+")
str = fo.read(10);
print "Read String is : ", str
# Close opend file
fo.close()
This produces the following result −
Read String is : Python is

Beginner Python: Reading and writing to the same file

Started Python a week ago and I have some questions to ask about reading and writing to the same files. I've gone through some tutorials online but I am still confused about it. I can understand simple read and write files.
openFile = open("filepath", "r")
readFile = openFile.read()
print readFile
openFile = open("filepath", "a")
appendFile = openFile.write("\nTest 123")
openFile.close()
But, if I try the following I get a bunch of unknown text in the text file I am writing to. Can anyone explain why I am getting such errors and why I cannot use the same openFile object the way shown below.
# I get an error when I use the codes below:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
readFile = openFile.read()
print readFile
openFile.close()
I will try to clarify my problems. In the example above, openFile is the object used to open file. I have no problems if I want write to it the first time. If I want to use the same openFile to read files or append something to it. It doesn't happen or an error is given. I have to declare the same/different open file object before I can perform another read/write action to the same file.
#I have no problems if I do this:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
openFile2 = open("filepath", "r+")
readFile = openFile2.read()
print readFile
openFile.close()
I will be grateful if anyone can tell me what I did wrong here or is it just a Pythong thing. I am using Python 2.7. Thanks!
Updated Response:
This seems like a bug specific to Windows - http://bugs.python.org/issue1521491.
Quoting from the workaround explained at http://mail.python.org/pipermail/python-bugs-list/2005-August/029886.html
the effect of mixing reads with writes on a file open for update is
entirely undefined unless a file-positioning operation occurs between
them (for example, a seek()). I can't guess what
you expect to happen, but seems most likely that what you
intend could be obtained reliably by inserting
fp.seek(fp.tell())
between read() and your write().
My original response demonstrates how reading/writing on the same file opened for appending works. It is apparently not true if you are using Windows.
Original Response:
In 'r+' mode, using write method will write the string object to the file based on where the pointer is. In your case, it will append the string "Test abc" to the start of the file. See an example below:
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\n'
>>> f.write("foooooooooooooo")
>>> f.close()
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\nfoooooooooooooo'
The string "foooooooooooooo" got appended at the end of the file since the pointer was already at the end of the file.
Are you on a system that differentiates between binary and text files? You might want to use 'rb+' as a mode in that case.
Append 'b' to the mode to open the file in binary mode, on systems
that differentiate between binary and text files; on systems that
don’t have this distinction, adding the 'b' has no effect.
http://docs.python.org/2/library/functions.html#open
Every open file has an implicit pointer which indicates where data will be read and written. Normally this defaults to the start of the file, but if you use a mode of a (append) then it defaults to the end of the file. It's also worth noting that the w mode will truncate your file (i.e. delete all the contents) even if you add + to the mode.
Whenever you read or write N characters, the read/write pointer will move forward that amount within the file. I find it helps to think of this like an old cassette tape, if you remember those. So, if you executed the following code:
fd = open("testfile.txt", "w+")
fd.write("This is a test file.\n")
fd.close()
fd = open("testfile.txt", "r+")
print fd.read(4)
fd.write(" IS")
fd.close()
... It should end up printing This and then leaving the file content as This IS a test file.. This is because the initial read(4) returns the first 4 characters of the file, because the pointer is at the start of the file. It leaves the pointer at the space character just after This, so the following write(" IS") overwrites the next three characters with a space (the same as is already there) followed by IS, replacing the existing is.
You can use the seek() method of the file to jump to a specific point. After the example above, if you executed the following:
fd = open("testfile.txt", "r+")
fd.seek(10)
fd.write("TEST")
fd.close()
... Then you'll find that the file now contains This IS a TEST file..
All this applies on Unix systems, and you can test those examples to make sure. However, I've had problems mixing read() and write() on Windows systems. For example, when I execute that first example on my Windows machine then it correctly prints This, but when I check the file afterwards the write() has been completely ignored. However, the second example (using seek()) seems to work fine on Windows.
In summary, if you want to read/write from the middle of a file in Windows I'd suggest always using an explicit seek() instead of relying on the position of the read/write pointer. If you're doing only reads or only writes then it's pretty safe.
One final point - if you're specifying paths on Windows as literal strings, remember to escape your backslashes:
fd = open("C:\\Users\\johndoe\\Desktop\\testfile.txt", "r+")
Or you can use raw strings by putting an r at the start:
fd = open(r"C:\Users\johndoe\Desktop\testfile.txt", "r+")
Or the most portable option is to use os.path.join():
fd = open(os.path.join("C:\\", "Users", "johndoe", "Desktop", "testfile.txt"), "r+")
You can find more information about file IO in the official Python docs.
Reading and Writing happens where the current file pointer is and it advances with each read/write.
In your particular case, writing to the openFile, causes the file-pointer to point to the end of file. Trying to read from the end would result EOF.
You need to reset the file pointer, to point to the beginning of the file before through seek(0) before reading from it
You can read, modify and save to the same file in python but you have actually to replace the whole content in file, and to call before updating file content:
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
I needed a function to go through all subdirectories of folder and edit content of the files based on some criteria, if it helps:
new_file_content = ""
for directories, subdirectories, files in os.walk(folder_path):
for file_name in files:
file_path = os.path.join(directories, file_name)
# open file for reading and writing
with io.open(file_path, "r+", encoding="utf-8") as edit_file:
for current_line in edit_file:
if condition in current_line:
# update current line
current_line = current_line.replace('john', 'jack')
new_file_content += current_line
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
# delete actual file content
edit_file.truncate()
# rewrite updated file content
edit_file.write(new_file_content)
# empties new content in order to set for next iteration
new_file_content = ""
edit_file.close()

How to confirm that a file object is empty? [Python]

in a py module, I write:
outFile = open(fileName, mode='w')
if A:
outFile.write(...)
if B:
outFile.write(...)
and in these lines, I didn't use flush or close method.
Then after these lines, I want to check whether this "outFile" object is empty or not. How can I do with it?
There are a few problems with your code.
You can't .write to a file that you opened with 'r'. You need to open(fileName, 'w').
If A or B then you've certainly written to the file, so it's not empty!
Barring those. you can get the length of a file with
os.stat(outFile.fileno())
EDIT: I'll explain what flush does. Python is often used to do quite large amounts of file reads and writes, which can be slow. It is thus tweaked to make them as fast as possible. One way that is does so is to "buffer" such writes and then do them all in one big block: when you write a small string, Python will remember it but won't actually write it to the file until it thinks it should.
This means that if you want to tell whether you have written data to the file by inspecting the file, you have to tell Python to write all the data it's remembering first, or else you might not see it. flush is the command to write all the buffered data.
Of course, if you ask Python whether it's written anything to the file, say by inspecting the position in the file (.tell()), then it will know about the buffering.
If you've already written to the file, you can use .tell() to check if the current file position is nonzero:
>>> handle = open('/tmp/file.txt', 'w')
>>> handle.write('foo')
>>> handle.tell()
3
This won't work if you .seek() back to the beginning of the file.
You can use os.stat to get file info:
import os
fileSize = os.stat(fileName).st_size
with open("filename.txt", "r+") as f:
if f.read():
# file isn't empty
f.write("something")
# uncomment this line if you want to delete everything else in the file
# f.truncate()
else:
# file is empty
f.write("somethingelse")
"r+" mode always you to read & write.
"with" will automatically close file

Categories

Resources