I open several text files (STL) and run several operations on them using two previously defined functions. Specifically, the function "point_stl" extracts the coordinates of the points contained in an STL file, while the function "point_cloud" extracts the points from the STL files without repetitions.
with open(folder+"bone_set1.stl", "r") as f1, open(folder+"bone_set2.stl", "r") as f2:
    var1 = point_stl(f1,f2)
    var2 = point_cloud(f1,f2)
Why does it seem that I can't use the variables f1 and f2 twice? If I use them in the first function, I don't get any results from the second, and vice versa.
You are probably reading the files to the end inside the point_stl call.
The remedy is to seek the files back to position 0 before calling point_cloud:
with open(folder+"bone_set1.stl", "r") as f1, open(folder+"bone_set2.stl", "r") as f2:
    var1 = point_stl(f1,f2)
    f1.seek(0)
    f2.seek(0)
    var2 = point_cloud(f1,f2)
To understand this better: the most common use of text files in Python is to read them line by line, operating on the data in each line, which is probably what your code does inside those functions. The point is that a file, once open, holds an internal "pointer" to the position up to which it has been read, and from which it will resume reading on the next call. Your first function likely reads the files to the end, leaving the pointers at the end of each file. When the second function is called, there is nothing left to read.
Now, operating-system file access provides this "seek" function, which lets you place the file pointer at an arbitrary position. For text files, it mostly makes sense to position it at the beginning, at the end, or at a previously stored position (kept in another variable). By calling it with "0" and omitting the second ("whence") parameter, both files are rewound to the beginning.
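The effect described above can be reproduced with a small throwaway file. This is only a sketch, assuming Python 3; the temporary file and the variable names are made up for the illustration and have nothing to do with the STL files in the question.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line 1\nline 2\n")
    demo_path = tmp.name

with open(demo_path, "r") as f:
    first_pass = f.read()    # consumes the file; the position is now at the end
    second_pass = f.read()   # nothing left to read, so this is an empty string
    f.seek(0)                # rewind to the beginning
    third_pass = f.read()    # the full content is available again

os.remove(demo_path)
print(repr(first_pass), repr(second_pass), repr(third_pass))
```

The second read comes back empty for the same reason the second function in the question gets no data.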
Instead of passing in a file pointer, just use f.read() and pass in the file contents:
with open(folder+"bone_set1.stl", "r") as f1, open(folder+"bone_set2.stl", "r") as f2:
    contentsf1 = f1.read()
    contentsf2 = f2.read()
var1 = point_stl(contentsf1,contentsf2)
var2 = point_cloud(contentsf1,contentsf2)
Since you pass the file handles directly, I assume that each function reads the files' content.
Unfortunately, after the first function has read the files, the read cursors are at the end of the files, and the second function has nothing to read between the cursors and the end of the files.
My advice would be to:
first read the files outside the functions (before calling them),
then pass the files' content to the functions instead of the file handles.
Another solution would be to set the read cursors back to the start of the files after the first function call.
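A rough sketch of the "pass the content, not the handle" approach, assuming Python 3 and ASCII STL input (where each point appears on a line of the form "vertex x y z"). The functions extract_points and unique_points are hypothetical stand-ins for point_stl and point_cloud, whose real code is not shown in the question, and they take a single string rather than two files.

```python
def extract_points(stl_text):
    """Hypothetical stand-in for point_stl: every vertex, in file order."""
    points = []
    for line in stl_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "vertex":
            points.append(tuple(float(p) for p in parts[1:]))
    return points


def unique_points(stl_text):
    """Hypothetical stand-in for point_cloud: vertices without repetitions."""
    return sorted(set(extract_points(stl_text)))


# A tiny two-triangle ASCII STL, used in place of a real file's content.
sample = """solid demo
facet normal 0 0 1
outer loop
vertex 0 0 0
vertex 1 0 0
vertex 0 1 0
endloop
endfacet
facet normal 0 0 1
outer loop
vertex 1 0 0
vertex 1 1 0
vertex 0 1 0
endloop
endfacet
endsolid demo
"""

# The same string can be handed to both functions; a string has no read cursor.
all_points = extract_points(sample)
cloud = unique_points(sample)
print(len(all_points), len(cloud))
```

Because a string is not consumed by reading it, the order in which the two functions are called no longer matters.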
Related
I have a list of JSON objects stored as a text file, one JSON object per line (total size is 30 GB), and what I'm trying to do is extract elements from those objects and store them in a new list. Here is my code to do that
print("Extracting fingerprints...")
start = time.time()
for jsonObj in open('ctl_records_sample.jsonlines'):
    temp_dict = {}
    temp_dict = json.loads(jsonObj)
    finger = temp_dict['data']['leaf_cert']['fingerprint']
    with open("fingerprints.txt", "w") as f:
        f.write(finger+"\n")
    finger = ""
end = time.time()
print("Fingerprint extraction finished in" + str(end-start) +"s")
Basically, I'm trying to go line-by-line of the original file and write that line's "fingerprint" to the new text file. However, after letting the code run for several seconds, I open up fingerprints.txt and see that only one fingerprint has been written to the file. Any idea what could be happening?
Your code here is the issue:
with open("fingerprints.txt", "w") as f:
    f.write(finger+"\n")
The "w" part will truncate the file each time it's opened.
You either want to open the file and keep it open throughout your loop, or check that the file exists and if it does open it with "a" to append.
You're opening the file in each loop iteration, in write mode, as per the w parameter passed to the open function. Therefore it's being overwritten from the beginning every time.
You can solve it, for example, with two different approaches:
You can move your with statement before the for loop and everything will work, since it will be writing sequentially to the same file (using the same descriptor and pointer into the file).
Open the file in append mode each time, which will append your newly written content to the end of the file. To do so, replace your w with an a.
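The first approach might look roughly like the sketch below, assuming Python 3. The two fake records and the temporary directory are invented so that the example runs on its own; only the nesting data / leaf_cert / fingerprint is taken from the question.

```python
import json
import os
import tempfile

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "records.jsonlines")
dst_path = os.path.join(workdir, "fingerprints.txt")

# Two fake records with the same nesting as in the question.
with open(src_path, "w") as src:
    for fp in ("aa:11", "bb:22"):
        src.write(json.dumps({"data": {"leaf_cert": {"fingerprint": fp}}}) + "\n")

# Open the output once, before the loop, so "w" truncates only one time.
with open(src_path) as src, open(dst_path, "w") as dst:
    for line in src:
        record = json.loads(line)
        dst.write(record["data"]["leaf_cert"]["fingerprint"] + "\n")

with open(dst_path) as dst:
    written = dst.read().splitlines()
print(written)
```

Keeping one handle open for the whole loop also avoids reopening the output file once per record, which matters for an input of the size mentioned in the question.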
When calling open() with the "w" mode, all the file contents will be deleted. From the Python documentation for the open() function:
'w': open for writing, truncating the file first
I think you are looking to use the "a" mode, which appends new contents to the end of the file:
'a': open for writing, appending to the end of the file if it exists
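with open("fingerprints.txt", "a", newline="\n") as f:
    f.write(finger + "\n")
(Note that the newline="\n" argument to open() only stops "\n" from being translated to the platform's line separator on write. It does not add a line break after each write, so the +"\n" in the f.write() call is still needed.)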
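The difference between the two modes can be checked side by side. A minimal sketch, assuming Python 3; the file name modes.txt and the three words are arbitrary.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# Reopening with "w" inside the loop: each open throws away the previous content.
for word in ("one", "two", "three"):
    with open(demo_path, "w") as f:
        f.write(word + "\n")
with open(demo_path) as f:
    after_w = f.read()

os.remove(demo_path)

# Reopening with "a" inside the loop: each write lands after the existing content.
for word in ("one", "two", "three"):
    with open(demo_path, "a") as f:
        f.write(word + "\n")
with open(demo_path) as f:
    after_a = f.read()

print(repr(after_w), repr(after_a))
```

With "w" only the last word survives, which matches the single fingerprint seen in the question.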
Started Python a week ago and I have some questions to ask about reading and writing to the same files. I've gone through some tutorials online but I am still confused about it. I can understand simple read and write files.
openFile = open("filepath", "r")
readFile = openFile.read()
print readFile
openFile = open("filepath", "a")
appendFile = openFile.write("\nTest 123")
openFile.close()
But, if I try the following I get a bunch of unknown text in the text file I am writing to. Can anyone explain why I am getting such errors and why I cannot use the same openFile object the way shown below.
# I get an error when I use the codes below:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
readFile = openFile.read()
print readFile
openFile.close()
I will try to clarify my problem. In the example above, openFile is the object used to open the file. I have no problems if I want to write to it the first time. But if I want to use the same openFile to read the file or append something to it, either nothing happens or an error is given. I have to declare the same or a different open file object before I can perform another read/write action on the same file.
#I have no problems if I do this:
openFile = open("filepath", "r+")
writeFile = openFile.write("Test abc")
openFile2 = open("filepath", "r+")
readFile = openFile2.read()
print readFile
openFile.close()
I will be grateful if anyone can tell me what I did wrong here, or whether it is just a Python thing. I am using Python 2.7. Thanks!
Updated Response:
This seems like a bug specific to Windows - http://bugs.python.org/issue1521491.
Quoting from the workaround explained at http://mail.python.org/pipermail/python-bugs-list/2005-August/029886.html
the effect of mixing reads with writes on a file open for update is
entirely undefined unless a file-positioning operation occurs between
them (for example, a seek()). I can't guess what
you expect to happen, but seems most likely that what you
intend could be obtained reliably by inserting
fp.seek(fp.tell())
between read() and your write().
My original response demonstrates how reading/writing on the same file opened for appending works. It is apparently not true if you are using Windows.
Original Response:
In 'r+' mode, the write method writes the string object to the file at the position where the pointer currently is. In your case, it will write the string "Test abc" at the start of the file, overwriting whatever is there. See an example below:
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\n'
>>> f.write("foooooooooooooo")
>>> f.close()
>>> f=open("a","r+")
>>> f.read()
'Test abc\nfasdfafasdfa\nsdfgsd\nfoooooooooooooo'
The string "foooooooooooooo" got appended at the end of the file since the pointer was already at the end of the file.
Are you on a system that differentiates between binary and text files? You might want to use 'rb+' as a mode in that case.
Append 'b' to the mode to open the file in binary mode, on systems
that differentiate between binary and text files; on systems that
don’t have this distinction, adding the 'b' has no effect.
http://docs.python.org/2/library/functions.html#open
Every open file has an implicit pointer which indicates where data will be read and written. Normally this defaults to the start of the file, but if you use a mode of a (append) then it defaults to the end of the file. It's also worth noting that the w mode will truncate your file (i.e. delete all the contents) even if you add + to the mode.
Whenever you read or write N characters, the read/write pointer will move forward that amount within the file. I find it helps to think of this like an old cassette tape, if you remember those. So, if you executed the following code:
fd = open("testfile.txt", "w+")
fd.write("This is a test file.\n")
fd.close()
fd = open("testfile.txt", "r+")
print fd.read(4)
fd.write(" IS")
fd.close()
... It should end up printing This and then leaving the file content as This IS a test file.. This is because the initial read(4) returns the first 4 characters of the file, because the pointer is at the start of the file. It leaves the pointer at the space character just after This, so the following write(" IS") overwrites the next three characters with a space (the same as is already there) followed by IS, replacing the existing is.
You can use the seek() method of the file to jump to a specific point. After the example above, if you executed the following:
fd = open("testfile.txt", "r+")
fd.seek(10)
fd.write("TEST")
fd.close()
... Then you'll find that the file now contains This IS a TEST file..
All this applies on Unix systems, and you can test those examples to make sure. However, I've had problems mixing read() and write() on Windows systems. For example, when I execute that first example on my Windows machine then it correctly prints This, but when I check the file afterwards the write() has been completely ignored. However, the second example (using seek()) seems to work fine on Windows.
In summary, if you want to read/write from the middle of a file in Windows I'd suggest always using an explicit seek() instead of relying on the position of the read/write pointer. If you're doing only reads or only writes then it's pretty safe.
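A minimal sketch of the "always seek explicitly" advice, written for Python 3 (the examples above use Python 2 syntax). It repeats the first example but inserts the fd.seek(fd.tell()) repositioning quoted earlier between the read and the write; the temporary path is made up for the illustration.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "testfile.txt")

with open(demo_path, "w") as fd:
    fd.write("This is a test file.\n")

with open(demo_path, "r+") as fd:
    head = fd.read(4)      # reads "This"; the pointer is now just after it
    fd.seek(fd.tell())     # explicit repositioning between the read and the write
    fd.write(" IS")        # overwrites " is" in place

with open(demo_path) as fd:
    result = fd.read()
print(head, repr(result))
```

The seek does not move the pointer anywhere new; it only makes the switch from reading to writing explicit, which is what the quoted workaround relies on.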
One final point - if you're specifying paths on Windows as literal strings, remember to escape your backslashes:
fd = open("C:\\Users\\johndoe\\Desktop\\testfile.txt", "r+")
Or you can use raw strings by putting an r at the start:
fd = open(r"C:\Users\johndoe\Desktop\testfile.txt", "r+")
Or the most portable option is to use os.path.join():
fd = open(os.path.join("C:\\", "Users", "johndoe", "Desktop", "testfile.txt"), "r+")
You can find more information about file IO in the official Python docs.
Reading and writing happen at the current file pointer, which advances with each read or write.
In your particular case, writing to openFile causes the file pointer to point to the end of the file. Trying to read from there would result in EOF.
You need to reset the file pointer to the beginning of the file with seek(0) before reading from it.
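As a small illustration of that reset, assuming Python 3 and a throwaway file (the "w+" mode is used here only so that the example starts from an empty file):

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "rewind.txt")

with open(demo_path, "w+") as open_file:
    open_file.write("Test abc")   # the pointer now sits after the written text
    open_file.seek(0)             # move it back to the beginning
    read_back = open_file.read()  # reads what was just written

print(read_back)
```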
You can read, modify and save to the same file in Python, but you actually have to replace the whole content of the file, and before updating the file content you have to call:
# set the pointer to the beginning of the file in order to rewrite the content
edit_file.seek(0)
I needed a function to go through all subdirectories of folder and edit content of the files based on some criteria, if it helps:
import io
import os

# folder_path and condition are assumed to be defined by the caller
new_file_content = ""
for directories, subdirectories, files in os.walk(folder_path):
    for file_name in files:
        file_path = os.path.join(directories, file_name)
        # open file for reading and writing
        with io.open(file_path, "r+", encoding="utf-8") as edit_file:
            for current_line in edit_file:
                if condition in current_line:
                    # update current line
                    current_line = current_line.replace('john', 'jack')
                new_file_content += current_line
            # set the pointer to the beginning of the file in order to rewrite the content
            edit_file.seek(0)
            # delete actual file content
            edit_file.truncate()
            # rewrite updated file content
            edit_file.write(new_file_content)
            # empties new content in order to set for next iteration
            new_file_content = ""
        # no explicit close() needed: the with block closes the file
I am a newbie to programming and am trying to print the contents of a file using the following statements, but the output I get is empty:
with open('myfile.txt','a+') as myfile:
    myfile.write("hello once again 2")
    data=myfile.read()
    print(data)
The reason for that is a wrong parameter to the open function. Try to replace a+ with r+, and read with readlines
with open('myfile.txt', 'r+') as myfile:
    myfile.write("hello once again 2")
    data = myfile.readlines() #please notice readlines
    print(data)
Here is the reason for that.
When you open a file with the 'a+' flag, it is opened for reading and writing, but the stream is positioned at the end of the file. That is why you read 'empty': there is nothing after that position.
I would advise you to work with the file in two steps: first write to it, and then read it.
As for what write does: it writes the content to the file, but the content is not going to be there immediately unless you close the file or call the flush function explicitly. The flush is going to be called at the end of the 'context manager', which is created by with open('myfile.txt', 'r+') as myfile. You can think of the 'context manager' as a wrapper which makes sure that 'flush' is called once the code under the with statement has finished.
When you write your content, your file pointer is at the end of the file.
To read it from the beginning, you need to reset your pointer.
Do myfile.seek(0) before myfile.read().
For more details see: https://docs.python.org/2/tutorial/inputoutput.html
f.tell() returns an integer giving the file object’s current position
in the file, measured in bytes from the beginning of the file. To
change the file object’s position, use f.seek(offset, from_what). The
position is computed from adding offset to a reference point; the
reference point is selected by the from_what argument. A from_what
value of 0 measures from the beginning of the file, 1 uses the current
file position, and 2 uses the end of the file as the reference point.
from_what can be omitted and defaults to 0, using the beginning of the
file as the reference point.
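Applied to the a+ case from the question, the quoted seek behaviour could be sketched as follows. This assumes Python 3; the initial "first line" content and the temporary path are invented for the demonstration.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

with open(demo_path, "w") as myfile:
    myfile.write("first line\n")

with open(demo_path, "a+") as myfile:
    myfile.write("hello once again 2\n")  # appended after the existing content
    myfile.seek(0)                        # reference point 0: start of the file
    data = myfile.read()                  # old content plus the appended line
    myfile.seek(0, os.SEEK_END)           # reference point 2: end of the file
    tail = myfile.read()                  # nothing after the end, so empty

print(repr(data), repr(tail))
```

os.SEEK_END is the named constant for the value 2 mentioned in the quote. In text mode, seeking relative to the end is only allowed with an offset of 0, as done here.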
Since the behavior of a+ can vary among operating systems, it is probably best not to use it if you want your code to be portable.
Unless your files are huge (i.e. a significant fraction of available RAM), I would do the following.
Read your whole file into a list of lines.
with open('myfile.txt') as myfile:
    mylines = myfile.readlines()
You can now manipulate mylines as you like. Append, insert, change or delete lines as you wish.
At the end, write it all back.
with open('myfile.txt', 'w') as myfile:
    myfile.writelines(mylines)
To the best of my knowledge, this should behave the same on all Python platforms.
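Put together, the read / modify / write-back cycle might look like this. It is a sketch assuming Python 3, with made-up file content so that it can be run as-is.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(demo_path, "w") as myfile:
    myfile.write("alpha\nbeta\ngamma\n")

# Step 1: read the whole file into a list of lines.
with open(demo_path) as myfile:
    mylines = myfile.readlines()

# Step 2: manipulate the list in memory (each entry keeps its trailing "\n").
mylines[1] = "BETA\n"
mylines.append("delta\n")

# Step 3: write everything back, replacing the old content.
with open(demo_path, "w") as myfile:
    myfile.writelines(mylines)

with open(demo_path) as myfile:
    result = myfile.read()
print(repr(result))
```

Note that writelines() does not add line endings, so any new entries need their own "\n".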
I have used the following code to read a .txt file:
f = os.open(os.path.join(self.dirname, self.filename), os.O_RDONLY)
And when I want to output the content I use this:
os.read(f, 10);
This reads 10 bytes from the beginning of the file. However, I need to read the whole content, however long it is, using some value such as -1. What should I do?
You have two options:
Call os.read() repeatedly.
Open the file using the open() built-in (as opposed to os.open()), and just call f.read() with no arguments.
The second approach carries certain risk, in that you might run into memory issues if the file is very large.
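The first option can be sketched as a loop that keeps calling os.read() until it returns an empty bytes object, which signals end of file. This assumes Python 3; the helper name read_all, the chunk size and the temporary file are illustrative choices, not part of the question.

```python
import os
import tempfile

# Create a throwaway file with 25 bytes of content.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"x" * 25)
os.close(fd)


def read_all(path, chunk_size=10):
    """Read a whole file through os.read(), chunk_size bytes at a time."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:          # an empty bytes object means end of file
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)


content = read_all(demo_path)
os.remove(demo_path)
print(len(content))
```

Processing each chunk inside the loop instead of collecting them all would avoid the memory concern mentioned for the second option.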