I would like to track the changes of a file being appended by another program.
My plan of approach is this. I read the file's contents first. At a later time, the file contents would be appended from another application. I want to read the appended data only rather than re-reading everything from the top. In order to do this, I'm going to check the file's modification time. I would then seek() to the previous size of the file, and start reading from there.
Is this a proper approach? Or is there a known idiom for this?
Well, you have to make quite a few assumptions about both the other program writing to the file and the file system, but in general it should work. Personally, I would rather write the current seek position or line number (if reading simple text files) to another file and check it from there. This will also let you go back in the file if some part is rewritten and the file size stays the same (or even gets smaller).
If you have some very important/unique data, besides making backups you should maybe think about writing the new data to a new file and later rejoining the files (if needed) once your other program has checked that the data is fine. This way you could just read any new file as a whole after a certain time. (Also remember that, in the larger picture, system time and creation/modification times are not 100% trustworthy.)
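As a rough illustration of the seek-to-previous-position idea, here is a minimal sketch. The function name read_appended is made up for this example, and it assumes the other program only ever appends; if the file turns out to be smaller than the remembered offset, it simply starts again from the top.

```python
import os


def read_appended(path, last_offset):
    """Return (new_bytes, new_offset) for data added since last_offset."""
    size = os.path.getsize(path)
    if size < last_offset:
        # The file shrank (truncated or replaced), so start over from the top.
        last_offset = 0
    with open(path, "rb") as f:
        f.seek(last_offset)
        data = f.read()
        # tell() is the position after the read, i.e. the offset to remember.
        return data, f.tell()
```

The returned offset is what you would keep between checks, for example in the separate file suggested above.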
Related
This is mostly a sanity check, but I'm writing a python program to run in the background on boot and control two motors and two heaters. It determines what to do by checking a settings file every second (using Asyncio). A second program can be run by the user to modify the pickled settings file.
If this were to run for a long period of time (12+ hours), is this the best way to do it? I'm well versed in general coding principles, but not specifically Python.
It's okay for the program that only reads the file, but if multiple programs can edit the same file, you might run into problems, and they might corrupt the file.
Say both program_1 and program_2 can edit the same file. The problem is that you wouldn't be editing the file directly as if it were a global variable. You would read it into some variables, make changes to those variables, and then overwrite the file with the new settings.
Now consider the following scenario:
program_1 Reads the file.
program_2 Reads the file.
program_1 makes some changes to some data.
program_2 makes some changes to other data.
program_1 rewrites the file with the new content.
program_2 rewrites the file with the new content.
In the above scenario, the changes made by program_1 were accidentally discarded by program_2, because both programs tried to make changes at the same time.
Simple solution
Make sure each program locks the file before it starts to read and edit, and waits for the file to be unlocked if the other program already holds the lock.
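One simple way to get this kind of locking is a separate lock file that is created exclusively. The sketch below is only an illustration: settings_lock, the timeout, and the polling interval are names and values invented for this example, and a lock file left behind by a crashed program would have to be cleaned up by hand.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def settings_lock(lock_path, timeout=5.0, poll=0.05):
    """Hold an exclusive lock by creating lock_path; wait if it already exists."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes the creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        # Release the lock so the other program can proceed.
        os.close(fd)
        os.remove(lock_path)
```

Both programs would wrap their whole read-modify-write sequence in "with settings_lock(...):" using the same lock path.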
It depends on how the second program is writing the file, but this is pretty risky. The first program could try to read the file while the second program is in the middle of writing, and get a truncated file. Or the second program could modify the file while the first program is in the middle of reading, and the first program will get half old data and half new data.
If you want to update a file atomically on Unix, the second program should write to a temporary file, and then rename the temporary file to the original file. Then, the first program will always see a complete stable file.
If your config file is small, you can probably get away with writing the file directly, at least most of the time, but then you'll hit a weird non-reproducible bug every so often.
See this question for more information on atomically updating files.
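The write-to-a-temporary-file-then-rename approach could look roughly like this in Python. This is a sketch under assumptions: the function name save_settings_atomically is invented, the settings are pickled as in the question, and the temporary file is created in the same directory so that the rename stays on one file system.

```python
import os
import pickle
import tempfile


def save_settings_atomically(path, settings):
    """Write settings to a temporary file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(settings, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        # os.replace swaps the new file in as a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a half-written temporary file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader that opens the settings file at any moment sees either the old complete file or the new complete file.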
two parts:
Using python, I want to write everything from stdout to a log file. I found a good solution for this part, but,
I'm also looking for a way to delete the file when it gets too big and start a new file. Usually the info going to the display is not useful so I don't care about it, but when an error occurs, I'd like to have captured it to a file to see what led up to it.
I could write a function that checks the file size and starts a new file when the current file reaches a particular size, but I'm wondering if something exists to do this.
I would like my test program to run for weeks, constantly outputting to the display and a text file, but creating a new text file when the current one gets too big, so basically a circular buffer that wraps around.
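Something along these lines already exists in the standard library: logging.handlers.RotatingFileHandler starts a new file once the current one reaches a size limit and keeps only a fixed number of old files. A minimal sketch follows; the helper name make_rotating_logger, the logger name, and the default sizes are assumptions for this example. Note that it routes messages through logging rather than capturing stdout directly.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_logger(log_path, max_bytes=1_000_000, backups=3):
    """Log to the console and to a size-limited, rotating set of files."""
    logger = logging.getLogger("test_run")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # When log_path would exceed max_bytes it is renamed to log_path.1,
    # older files shift to .2, .3, ..., and the oldest one is deleted.
    logger.addHandler(
        RotatingFileHandler(log_path, maxBytes=max_bytes, backupCount=backups)
    )
    # StreamHandler writes to stderr by default, so output still shows up
    # on the display.
    logger.addHandler(logging.StreamHandler())
    return logger
```

Call the helper once at startup and use logger.info(...) in place of print; calling it repeatedly would attach duplicate handlers.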
I have achieved writing all the things I needed to the text file, but essentially the program needs to keep going back to the text file and saving only the changes. At the moment it overwrites the entire file, deleting all the previous information.
This is a typical confusion about how text files are organized.
Text files are not organized by lines, but by bytes
When you look at a text file, it looks like a series of lines.
It is natural to expect that it is stored on disk the same way, but this is not true.
Text files are written to disk byte by byte, with one character often being represented by one byte
(but in some cases by more). A line of text is just a sequence of bytes terminated by some kind of
newline ("\n", "\r\n", or whatever the platform uses).
If we want to change the 2nd line out of 3, the change has to fit exactly into the bytes used by the
2nd line, so that it does not disturb line 3. If we write too many bytes for line 2, we overwrite
the first bytes of line 3. If we write too few bytes, some (already obsolete) bytes from the
remainder of the old line 2 are still there.
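A small experiment makes this visible. The helper overwrite_line_2 is made up for this example; it writes a three-line file to a temporary directory, overwrites bytes starting at the beginning of line 2, and returns the resulting file contents.

```python
import os
import tempfile

ORIGINAL = b"line 1\nline 2\nline 3\n"


def overwrite_line_2(replacement):
    """Overwrite bytes starting at line 2 and return the whole file."""
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    with open(path, "wb") as f:
        f.write(ORIGINAL)
    with open(path, "r+b") as f:
        f.seek(len(b"line 1\n"))  # byte offset where line 2 starts
        f.write(replacement)      # overwrites in place, nothing is shifted
    with open(path, "rb") as f:
        return f.read()


print(overwrite_line_2(b"LINE 2\n"))       # same length: fits exactly
print(overwrite_line_2(b"second line\n"))  # too long: eats into line 3
print(overwrite_line_2(b"L2\n"))           # too short: leaves old bytes behind
```

With the too-long replacement, line 3 is reduced to "3"; with the too-short one, the leftover "e 2" from the old line 2 shows up as an extra line.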
Strategies to modify content of text file
Republisher - Read it, modify in memory, write all content completely back
This might at first sound like wasting a lot of effort, but it is by far the most commonly used
approach, and in the vast majority of cases the most effective one.
The beauty is, it is simple.
The fact is, for most file sizes it is fast enough.
Journal - append changes to the end
A rather rare approach is to write the first version of the file to disk and later append notes
about what has changed to the end.
Reading such a file means replaying the whole history of changes from the journal to find out the
final content of the file.
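To make the journal idea concrete, here is a minimal sketch. It assumes the content is a set of key/value pairs and that every change is appended as one JSON line; the function names and the record layout are invented for this example, and the journal starts from an empty state rather than from a full first version.

```python
import json


def append_change(journal_path, key, value):
    """Record one change by appending a line; nothing is rewritten."""
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"key": key, "value": value}) + "\n")


def replay(journal_path):
    """Rebuild the current content by applying every change in order."""
    state = {}
    with open(journal_path, encoding="utf-8") as f:
        for line in f:
            change = json.loads(line)
            state[change["key"]] = change["value"]  # later entries win
    return state
```

Writing is cheap, but every read has to walk the whole history, which is the trade-off described above.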
Surgeon - change only affected lines
If you keep lines at a fixed length (measured in bytes, not in characters!), you can seek to the
modified line and rewrite just that line.
This is hard to get right and is mostly used with binary files. It is definitely not a task for
beginners.
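For illustration only, a fixed-length variant might look like the sketch below. The 16-byte record size, the space padding, and the function names are assumptions made for this example; text that does not fit into a record is rejected instead of being cut off.

```python
RECORD_SIZE = 16  # bytes per line, including the trailing newline (hypothetical)


def encode_record(text):
    """Pad text with spaces to exactly RECORD_SIZE bytes."""
    raw = text.encode("utf-8")
    if len(raw) > RECORD_SIZE - 1:
        raise ValueError("text does not fit into one record")
    return raw.ljust(RECORD_SIZE - 1, b" ") + b"\n"


def write_record(path, index, text):
    """Overwrite record number index in place (the file must already exist)."""
    with open(path, "r+b") as f:
        f.seek(index * RECORD_SIZE)
        f.write(encode_record(text))


def read_record(path, index):
    """Read record number index and strip the padding."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE)
        return f.read(RECORD_SIZE).rstrip(b" \n").decode("utf-8")
```

Because every record occupies the same number of bytes, rewriting one record can never spill into its neighbours.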
Conclusions
Go for "Republisher" pattern.
Use whatever format fits your needs (INI, CSV, JSON, XML, YAML...).
Personally I prefer saving data in JSON format: the json package is part of the Python stdlib and it
supports lists as well as dictionaries, which allows saving tabular as well as tree-like structures.
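A minimal sketch of the "Republisher" pattern with JSON might look like this. The function name update_json_file and the callback style are choices made for this example, not something prescribed above.

```python
import json


def update_json_file(path, modify):
    """Read the whole file, change it in memory, write it all back."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    modify(data)  # the caller changes the data in place
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return data
```

For example, update_json_file("data.json", lambda d: d["rows"].append([1, 2, 3])) would add one row to a list stored under a hypothetical "rows" key and rewrite the complete file.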
Are the changes you are making going to be over several different runs of a program? If not, I suggest making all of your changes to the data while it is still in memory and then writing it out just before program termination.
You can open it in append mode as follows, so that new writes go to the end instead of overwriting the file:
FileOpen = open("test.txt","a")
I'm writing a program that extracts files from and adds files to the Xbox 360's STFS files. The STFS structure is a mini file system: it has hash tables, a file table, etc.
Extracting the files is simple enough. I have the starting block of the file and the amount of blocks in the file, so I just need to find the block offsets, read the block lengths, and send that out as the file. What happens, though, when I need to replace or remove a file? I've read that on Windows and computers in general, files aren't actually deleted, they're just removed from the file table and are overwritten when something else needs the space. When I'm writing a file, then, how do I find an unused sequence of blocks large enough to hold it? The blocks are 0x1000 bytes in length and fill remaining space with empty bytes, so everything evens out nicely, but I can't think of an efficient way to find an unused range of blocks that will fit the file I want to add.
My current plan is to rewrite everything on removing or adding a file so that I don't have large amounts of unused space that I'm unable to figure out how to overwrite. Is there a good introduction to file systems like NTFS or FAT32 that I could read that won't take days to understand and will contain the necessary information to write a basic file manager?
reference to structure: http://free60.org/STFS
edit: On second thought, I would create a list of ranges for each file in the table. That is, the start offset and end offset based on size. When looking for an open range to insert a file, I would start at 0 and check whether the start or end of each existing range falls inside the range needed by the file to be inserted. If either the start or the end is inside the range, I would move on to that file's end offset and try again from there. This is better than my initial idea, but still seems inefficient. I would have to make multiple comparisons for every file in the file table.
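If the used ranges are sorted by start block first, a single pass over them is enough to find the first gap that fits. The sketch below is a generic first-fit search, not anything taken from the STFS format itself: the function name, the half-open (start, end) block ranges, and the total_blocks parameter are assumptions for this example, and it ignores any layout rules specific to STFS.

```python
def find_free_run(used_ranges, blocks_needed, total_blocks):
    """Return the first block of a free run of blocks_needed blocks, or None.

    used_ranges holds (start, end) block numbers, with end exclusive.
    """
    cursor = 0  # first block not yet known to be in use
    for start, end in sorted(used_ranges):
        if start - cursor >= blocks_needed:
            return cursor  # the gap before this file is big enough
        cursor = max(cursor, end)
    if total_blocks - cursor >= blocks_needed:
        return cursor  # free space after the last file
    return None
```

Sorting costs O(n log n) once; after that each file's range is looked at exactly one time.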
Question: How do you write data to the beginning of an already existing file without writing over what's already there and without reading the entire file into memory? (i.e. prepend)
Info:
I'm working on a project right now where the program frequently dumps data into a file. This file will very quickly balloon up to 3-4 GB. I'm running this simulation on a computer with only 768 MB of RAM. Pulling all that data into RAM over and over would be a great pain and a huge waste of time. The simulation already takes long enough to run as it is.
The file is structured such that the number of dumps it contains is listed at the beginning as a simple value, like 6. Each time the program makes a new dump I want that value to be incremented, so now it's 7. The problem lies with the 10th, 100th, 1000th, and so on dump. The program will enter the 10 just fine, but remove the first character of the next line:
"9\n580,2995,2083,028\n..."
"10\n80,2995,2083,028\n..."
Obviously, the difference between 580 and 80 in this case is significant. I can't lose these values. So I need a way to add a little space in there so that I can add this new data without losing my existing data or having to pull the entire file up and then rewrite it.
Basically what I'm looking for is a kind of prepend function: something to add data to the beginning of a file instead of the end.
Programmed in Python
~n
See the answers to this question:
How do I modify a text file in Python?
Summary: you can't do it without reading the file in (this is due to how the operating system works, rather than a Python limitation)
It's not addressing your original question, but here are some possible workarounds:
Use SQLite (it's bundled with your Python)
Use a fancier database, either RDBMS or NoSQL
Just track the number of dumps in a different text file
The first couple of options are a little more work up front, but provide more flexibility. The last option is the easiest solution to your current problem.
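The last option, keeping the dump count in its own small file, could be sketched like this. The function name increment_dump_count and the one-number-per-file layout are assumptions for this example.

```python
def increment_dump_count(counter_path):
    """Add one to the number stored in counter_path and return the new count."""
    try:
        with open(counter_path, encoding="ascii") as f:
            count = int(f.read().strip() or 0)
    except FileNotFoundError:
        count = 0  # first dump: the counter file does not exist yet
    count += 1
    # The counter file is tiny, so rewriting it completely is cheap.
    with open(counter_path, "w", encoding="ascii") as f:
        f.write(f"{count}\n")
    return count
```

The large data file is then only ever appended to, and going from 9 to 10 no longer touches it.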
You could quite easily create a new file, write the data you wish to prepend to that file, then copy the content of the existing file and append it to the new one, and finally rename the new file over the old one.
The whole file still gets copied on disk, but this avoids having to read it into memory all at once, if that is the primary issue.
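A sketch of that approach, copying the old content in chunks so that only a small buffer is in memory at any time. The function name prepend and the 1 MB chunk size are choices made for this example.

```python
import os
import shutil
import tempfile


def prepend(path, header_bytes):
    """Put header_bytes in front of the existing content of path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the new file next to the old one so the final rename is cheap.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
        out.write(header_bytes)
        # Copy the old content 1 MB at a time instead of loading it all.
        shutil.copyfileobj(src, out, 1024 * 1024)
    os.replace(tmp_path, path)
```

Keep in mind that this needs enough free disk space for a second copy of the file while it runs.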