Write to a file into an exact position - python

My primary goal is to write to one file (e.g. file.txt) from many parallel flows; each flow should start at a defined offset in the file.
Example:
script 1 - writes 10 chars from position 0
script 2 - writes 10 chars from position 10
script 3 - writes 10 chars from position 20
I didn't even get to parallelism because I got stuck on writing to different offsets of a file.
I have created a simple script to check my idea:
file = open("sample_file.txt", "w")
file.seek(100)
file.write("new line")
file.close()
OK, so the file was created, the offset was moved to 100 and the sentence 'new line' was added. Success.
But then I wanted to open the same file and add something at offset 10:
file = open("sample_file.txt", "w")
file.seek(100)
file.write("new line")
file.close()
file = open("sample_file.txt", "a")
file.seek(10)
file.write("second line")
file.close()
And the sentence 'second line' is added but at the end of the file.
I'm sure it is possible to add chars somewhere in the middle of a file.
Can anyone help with this simple one?
Or maybe someone has an idea how to do it in parallel?
Pawel

As this post suggests, opening a file in 'a' mode will:
Open for writing. The file is created if it does not exist. The
stream is positioned at the end of the file. Subsequent writes
to the file will always end up at the then current end of file,
irrespective of any intervening fseek(3) or similar.
On the other hand, the mode 'r+' will let you:
Open for reading and writing. The stream is positioned at the
beginning of the file.
And though not mentioned explicitly, this will let you seek within the file and write at different positions.
Anyway, if you are going to do this in parallel, you will have to control access to the file. You don't want two processes writing to the file at the same time. Regarding that issue, see this SO question.
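A minimal sketch of the 'r+' approach described above, assuming the file is created and pre-sized once before any writer runs and that the writers' regions never overlap. The helper name write_at and the temporary file path are hypothetical, chosen only for illustration; binary mode is used so that offsets are plain byte counts.

```python
import os
import tempfile

def write_at(path, offset, data):
    """Overwrite len(data) bytes of an existing file, starting at offset."""
    # 'r+b' opens for reading and writing without truncating,
    # so the position set by seek() is honoured by write().
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)

# Hypothetical demo file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample_file.txt")

# Pre-size the file once with 30 placeholder bytes ('wb' creates or truncates it).
with open(path, "wb") as f:
    f.write(b"." * 30)

# Each call stands in for one of the three scripts from the question;
# the order of the calls does not matter because the regions are disjoint.
write_at(path, 20, b"C" * 10)
write_at(path, 0, b"A" * 10)
write_at(path, 10, b"B" * 10)

with open(path, "rb") as f:
    content = f.read()
print(content)
```

This only shows the offset handling; it makes no attempt at locking, so coordinating truly concurrent writers is still up to the caller.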

Related

How to start reading a file from a particular line in the case of a huge text file as I cannot iterate from line one

This is an issue of trying to reach the line to start from, and proceeding from there, in the shortest time possible.
I have a huge text file that I'm reading and performing operations on, line after line. I am currently keeping track of the line number that I have parsed so that in case of a system crash I know how much I have already done.
How do I restart reading the file from that point, without starting over from the beginning?
import os
count = 0
all_parsed = os.listdir("urltextdir/")
with open(filename, "r") as readfile:
    for eachurl in readfile:
        if str(count) + ".txt" not in all_parsed:
            urltext = getURLText(eachurl)
            with open("urltextdir/" + str(count) + ".txt", "w") as writefile:
                writefile.write(urltext)
            result = processUrlText(urltext)
            saveinDB(result)
        count += 1  # assumed: the snippet as posted never advanced count
This is what I'm currently doing, but when it crashes at a million lines, I have to go through all of those lines in the file again to reach the point I want to start from. My other alternative is to use readlines and load the entire file into memory.
Is there an alternative that I can consider?
Unfortunately, a line number isn't really a basic position for file objects, and the special seeking/telling functions are ruined by next, which is called in your for loop. You can't jump to a line, but you can jump to a byte position. So one way would be:
line = readfile.readline()  # Must use `readline`, not iteration!
while line:
    print(line)  # Do with line what you would normally do
    lastell = readfile.tell()  # Location of the imaginary cursor in the file after reading the line
    print(lastell)
    line = readfile.readline()  # Advance loop
Now you can easily jump back with
readfile.seek(lastell)  # You need to keep the last lastell
You would need to keep saving lastell to a file, or printing it, so that on restart you know which byte you're starting at.
Unfortunately you can't use the written file for this, as any change in the number of characters would ruin a count based on it.
Here is one full implementation. Create a file called tell and put 0 inside of it, and then you can run:
with open('tell', 'r+') as tfd:
    with open('abcdefg') as fd:
        fd.seek(int(tfd.readline()))  # Get last position
        line = fd.readline()  # Init loop
        while line:
            print(line.strip(), fd.tell())  # Action on line
            tfd.seek(0)  # Rewind and
            tfd.write(str(fd.tell()))  # write new position only if successful
            line = fd.readline()  # Advance loop
        print(line)  # line is '' here: end of file reached
You can check if such a file exists and create it in the program of course.
As @Edwin pointed out in the comments, you may want to call fd.flush() and os.fsync(fd.fileno()) (import os first) so that after every write your file contents are actually on disk. This applies to both write operations you are doing: the output file and the tell file, the tell file being the quicker of the two, of course. This may slow things down considerably, so if you are satisfied with the synchronicity as is, do not use it, or only flush tfd. You can also specify the buffer size when calling open so that Python flushes sooner, as detailed in https://stackoverflow.com/a/3168436/6881240.
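To make the checkpoint idea concrete, here is a hedged sketch that combines the tell file with flush() and os.fsync(). The function names save_position and process_from_checkpoint, the action callback, and the demo file contents are assumptions made for illustration; they are not part of the original answer.

```python
import os
import tempfile

def save_position(tfd, position):
    """Overwrite the checkpoint file with position and force it to disk."""
    tfd.seek(0)
    tfd.write(str(position))
    tfd.truncate()            # drop leftover digits if the new number is shorter
    tfd.flush()               # push Python's buffer to the operating system
    os.fsync(tfd.fileno())    # ask the OS to write it to the physical disk

def process_from_checkpoint(data_path, tell_path, action):
    """Resume reading data_path from the byte position stored in tell_path."""
    if not os.path.exists(tell_path):
        with open(tell_path, "w") as tfd:
            tfd.write("0")
    with open(tell_path, "r+") as tfd, open(data_path) as fd:
        fd.seek(int(tfd.read() or 0))
        line = fd.readline()  # readline (not iteration) keeps tell() usable
        while line:
            action(line)
            save_position(tfd, fd.tell())  # only reached if action succeeded
            line = fd.readline()

# Demo with a simulated crash on the third line.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "abcdefg")
tell_path = os.path.join(workdir, "tell")
with open(data_path, "w") as f:
    f.write("one\ntwo\nthree\n")

first_run = []
def fail_on_three(line):
    if line.strip() == "three":
        raise RuntimeError("simulated crash")
    first_run.append(line.strip())

try:
    process_from_checkpoint(data_path, tell_path, fail_on_three)
except RuntimeError:
    pass

second_run = []
process_from_checkpoint(data_path, tell_path,
                        lambda line: second_run.append(line.strip()))
print(first_run, second_run)
```

Calling os.fsync() after every line is the slow but safe choice; saving the position every N lines would trade some reprocessing after a crash for speed.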
If I got it right, you could make a simple log file to store the count in.
Still, I would recommend using many files, or storing every line or paragraph in a database such as SQL or MongoDB.
I guess it depends on what system your script is running on, and what resources (such as memory) you have available.
But with the popular saying "memory is cheap", you can simply read the file into memory.
As a test, I created a file with 2 million lines, each line 1024 characters long with the following code:
ms = 'a' * 1024
with open('c:\\test\\2G.txt', 'w') as out:
    for _ in range(0, 2000000):
        out.write(ms+'\n')
This resulted in a 2 GB file on disk.
I then read the file into a list in memory, like so:
my_file_as_list = [a for a in open('c:\\test\\2G.txt', 'r').readlines()]
I checked the Python process, and it used a little over 2 GB of memory (on a 32 GB system).
Access to the data was very fast, and can be done with list slicing.
You need to keep track of the list index; when your system crashes, you can start from that index again.
But more important... if your system is "crashing" then you need to find out why it is crashing... surely a couple of million lines of data is not a reason to crash anymore these days...

writing output for python not functioning

I am attempting to output a new txt file but it comes up blank. I am doing this:
my_file = open("something.txt","w")
#and then
my_file.write("hello")
Right after this line it just says 5, and then no text shows up in the file.
What am I doing wrong?
The write is not flushed to disk until you close the file. If I open an interpreter and then enter:
my_file = open('something.txt', 'w')
my_file.write('hello')
and then open the file in a text program, there is no text.
If I then issue:
my_file.close()
Voila! Text!
If you just want to flush once and keep writing, you can do that too:
my_file.flush()
my_file.write('\nhello again') # file still says 'hello'
my_file.flush() # now it says 'hello again' on the next line
By the way, if you happen to read the beautiful, wonderful documentation for file.write, which is only 2 lines long, you would have your answer (emphasis mine):
Write a string to the file. There is no return value. Due to buffering, the string may not actually show up in the file until the flush() or close() method is called.
If you don't want to worry about closing the file, use with:
with open("something.txt","w") as f:
    f.write('hello')
Then Python will take care of closing the file for you automatically.
As Two-Bit Alchemist pointed out, the file has to be closed. The python file writer uses a buffer (BufferedIOBase I think), meaning it collects a certain number of bytes before writing them to disk in bulk. This is done to save overhead when a lot of write operations are performed on a single file.
Also: When working with files, try using a with-environment to make sure your file is closed after you are done writing/reading:
with open("somefile.txt", "w") as myfile:
    myfile.write("42")
    # when you reach this point, i.e. leave the with-environment,
    # the file is closed automatically.
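The buffering described in these answers can be observed directly. The following is a small sketch, assuming Python 3 with default buffering; the temporary path is made up for the demonstration. It also shows where the 5 in the question comes from: in Python 3, write() returns the number of characters written.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "something.txt")

my_file = open(path, "w")
returned = my_file.write("hello")          # number of characters written
size_before_flush = os.path.getsize(path)  # typically 0: the text is still buffered

my_file.flush()                            # hand the buffered text to the operating system
size_after_flush = os.path.getsize(path)

my_file.close()
print(returned, size_before_flush, size_after_flush)
```

The size before flush() depends on the buffer size and the amount written, so it should be read as an observation rather than a guarantee.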
The python file writer uses a buffer (BufferedIOBase I think), meaning
it collects a certain number of bytes before writing them to disk in
bulk. This is done to save overhead when a lot of write operations are
performed on a single file. Ref @m00am
Your code is also OK. Just add a statement to close the file, and then it will work correctly.
my_file = open("fin.txt","w")
#and then
my_file.write("hello")
my_file.close()

delete or erase a portion of an opened file with python

Could anyone help me find a function that deletes just a portion of an opened file, starting from its beginning? In other words, the program will open a file and read, for example, the first 100 bytes. Is there a built-in function in Python, or some other way, to delete just those first 100 bytes before closing the file (so the remaining content shifts by 100 bytes)? FYI: truncate() does not help, since it deletes the contents of a file starting from the current cursor position; I would like exactly the inverse: delete the content from the beginning up to the current cursor position and leave the rest. Thank you.
Is this something you want to do efficiently for large files, or just something you want to do in general?
It's pretty easy to do by reading in the file, and then writing it out:
import os
with open(filename, 'rb') as f:
    dat = f.read()
with open(filename + '_temp', 'wb') as f:
    f.write(dat[100:])
os.rename(filename + '_temp', filename)
Note that this operates "safely" by first creating the new file, then moving it into place. If there is a failure anywhere, the old file will not be clobbered.
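For large files, the same "write a new file, then move it into place" idea can be done without loading everything into memory. This is a sketch under assumptions: the helper name drop_prefix and the chunk size are illustrative choices, and os.replace (Python 3.3+) is used here in place of os.rename because it also overwrites an existing destination on Windows.

```python
import os
import shutil
import tempfile

def drop_prefix(filename, nbytes, chunk_size=1024 * 1024):
    """Remove the first nbytes of filename by copying the rest to a temp file."""
    temp_name = filename + "_temp"
    with open(filename, "rb") as src, open(temp_name, "wb") as dst:
        src.seek(nbytes)                          # skip the part to delete
        shutil.copyfileobj(src, dst, chunk_size)  # copy the remainder in chunks
    os.replace(temp_name, filename)               # swap the new file into place

# Demo on a hypothetical file: 100 bytes to discard, then the part to keep.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"x" * 100 + b"rest of the file")

drop_prefix(path, 100)

with open(path, "rb") as f:
    remaining = f.read()
print(remaining)
```

The original file is only replaced after the copy has finished, so a failure during copying leaves it untouched (a stray _temp file may remain).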

How to read from file opened in "a+" mode?

By definition, "a+" mode opens the file for both appending and reading. Appending works, but what is the method for reading? I did some searches, but couldn't find it clarified anywhere.
f=open("myfile.txt","a+")
print (f.read())
Tried this, it prints blank.
Use f.seek() to set the file offset to the beginning of the file.
Note: Before Python 2.7, there was a bug that, on some operating systems, left the file position somewhere other than the end of the file. Because of this, your original code may have appeared to work for some users. For example, on CentOS 6 your code would have worked as you wanted, but not as it should.
f = open("myfile.txt","a+")
f.seek(0)
print(f.read())
When you open the file using f=open("myfile.txt","a+"), the file can be both read and written to.
By default the file handle points to the start of the file;
this can be checked with f.tell(), which will return 0L.
In [76]: f=open("myfile.txt","a+")
In [77]: f.tell()
Out[77]: 0L
In [78]: f.read()
Out[78]: '1,2\n3,4\n'
However, f.write will take care of moving the pointer to the last line before writing.
There are still quirks in newer versions of Python that depend on the OS, and they are due to differences in the implementation of the fopen() function in stdio.
Linux's man fopen:
a+ - Open for reading and appending (writing at end of file). The file is created if it does not exist. The initial file position for reading is at the beginning of the file, but output is always appended to the end of the file.
OS X:
``a+'' - Open for reading and writing. The file is created if it does not exist. The stream is positioned at the end of the file. Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.
MSDN doesn't really state where the pointer is initially set, just that it moves to the end on writes.
When a file is opened with the "a" or "a+" access type, all write operations occur at the end of the file. The file pointer can be repositioned using fseek or rewind, but is always moved back to the end of the file before any write operation is carried out. Thus, existing data cannot be overwritten.
Replicating the differences on various systems with both Python 2.7.x and 3k is pretty straightforward using open() and .tell().
When dealing with anything through the OS, it's safer to take precautions like using an explicit .seek(0).
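As a short illustration of the explicit .seek(0) precaution, here is a sketch in Python 3; the temporary path and the sample contents are made up for the demonstration.

```python
import os
import tempfile

# Hypothetical file with some existing content.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    f.write("1,2\n3,4\n")

with open(path, "a+") as f:
    f.seek(0)             # do not rely on the platform's initial position
    existing = f.read()   # read everything that is already there
    f.write("5,6\n")      # append mode: the write lands at the end of the file
    f.seek(0)
    updated = f.read()

print(repr(existing))
print(repr(updated))
```

Seeking before each read keeps the behaviour the same regardless of where a given platform places the initial position.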
MODES
r+  Read and write. Starts at the beginning of the file.
r   Read only. Starts at the beginning of the file.
a+  Read/append. Preserves file content by writing to the end of the file.
Good Luck!
Isabel Ruiz

Simple way to add text at the beginning of a script (file) in Python

I am a Python beginner and my next project is a program in which you enter the details of your program and then select the file (I'm using Tkinter), and then the program will format the details and write them to the start of the file.
I know that you'd have to 'rewrite' it and that a tmp file is probably called for. I just want to know simple ways to add text to the beginning of a file.
Thanks.
To add text to the beginning of a file, you can (1) open the file for reading, (2) read the file, (3) open the file for writing and overwrite it with (your text + the original file text).
formatted_text_to_add = 'Sample text'
with open('userfile', 'r') as infile:
    filetext = infile.read()
newfiletext = formatted_text_to_add + '\n' + filetext
with open('userfile', 'w') as outfile:
    outfile.write(newfiletext)
This requires two I/O operations and I'm tempted to look for a way to do it in one pass. However, prior answers to similar questions suggest that trying to write to the beginning or middle of a file in Python gets complicated quite quickly unless you bite the bullet and overwrite the original file with the new text.
If I understand what you're asking, I believe you're looking for what's called a project skeleton. This link handles it pretty well.
This probably won't solve your exact problem, as you will need to know in advance the exact number of bytes you'll be adding to the beginning of the file.
# Put some text in the file
f = open("tmp.txt", "w")
f.write("123456789\n")
f.close()
# Open the file in read/write mode
f = open("tmp.txt", "r+")
f.seek(0) # reposition the file pointer to the beginning of the file
f.write('abc') # overwrites the first three bytes; write adds no newline
f.close()
When you reposition the file pointer using seek, you can overwrite the bytes that are already stored at that position. You can't, however, "insert" text, pushing existing bytes ahead to make room for new data. When I said you would need to know the exact number of bytes, I meant you would have to "leave room" for the text at the beginning of the file. Something like:
f = open("tmp.txt", "w")
f.write("\0\0\0456789")
f.close()
# Some time later...
f = open("tmp.txt", "r+")
f.seek(0)
f.write('123')
f.close()
For text files, this can work if you leave a "blank" line of, say, 50 spaces at the beginning of the file. Later, you can go back and overwrite up to 50 bytes (the newline being byte 51) without overwriting following lines. Of course, you can leave multiple lines at the beginning. The point is that you can't grow or shrink your reserved block of lines to be overwritten. There's nothing special about the newline in a file, other than that it is treated specially by file methods like read and readline for splitting blocks of data into separate strings.
To add one or more lines of text to the beginning of a file, without overwriting what's already present, you'll have to use the "read the old file, write to a new file" solution outlined in other answers.
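One possible shape for the "read the old file, write to a new file" approach is sketched here. The helper name prepend_text, the ".tmp" suffix, and the UTF-8 encoding are assumptions for illustration, not something taken from the answers above.

```python
import os
import shutil
import tempfile

def prepend_text(path, text):
    """Insert text at the start of path by writing a new file and swapping it in."""
    temp_path = path + ".tmp"
    with open(path, "rb") as src, open(temp_path, "wb") as dst:
        dst.write(text.encode("utf-8"))  # new header first (UTF-8 is assumed)
        shutil.copyfileobj(src, dst)     # then the original contents, in chunks
    os.replace(temp_path, path)          # replace the original only when done

# Demo on a hypothetical script file.
path = os.path.join(tempfile.mkdtemp(), "userfile.py")
with open(path, "w", encoding="utf-8") as f:
    f.write("print('hello')\n")

prepend_text(path, "# Author: (details entered by the user)\n")

with open(path, encoding="utf-8") as f:
    result = f.read()
print(result)
```

Unlike the reserved-space trick, this works for headers of any length, at the cost of rewriting the whole file.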
