I would like to show the progress of processing a csv file.
I've searched and found this:
Tracking file load progress in Python
But this will make my life a bit harder, because I'll need to process the bytes read.
Another approach is to count the lines, but I wouldn't like to count the number of lines before starting to process.
My idea is to get the file size from the OS, and as I'm processing the file, track the bytes processed (this should be the fastest approach).
Any other solution to show the progress?
I found file.tell() but I haven't used it. It should give the position in the file.
You could ball-park it, right? The csv is just a text file, and you can grab the file size from the os module. Then, from the first line you read in, you can calculate the size of each line, and estimate the total lines in the file.
Clicking through your link, though, it appears that this is exactly the same suggestion :)
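A sketch of the byte-counting idea (the function name is made up). One caveat: f.tell() raises OSError while a text-mode file is being iterated line by line, so this opens the file in binary and counts the bytes itself:

```python
import os

def iter_with_progress(path, encoding="utf-8"):
    """Yield (percent_done, line) pairs. Byte counts come straight from
    the raw lines, so the total matches os.path.getsize()."""
    total = os.path.getsize(path)
    done = 0
    with open(path, "rb") as f:        # binary: line lengths are real byte counts
        for raw in f:
            done += len(raw)
            pct = 100.0 * done / total if total else 100.0
            yield pct, raw.decode(encoding).rstrip("\r\n")
```

For actual CSV parsing you can still hand the lines to the csv module, e.g. csv.reader(line for _, line in iter_with_progress(path)).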
I am trying to gather data from my load cell that is hooked up to a raspberry pi 3B. I have the script reading data but am trying to figure out how to export that data into a .txt file.
I have seen some info on how to create .txt files with text but not much for integers. I need something more fit to my application.
The script takes x (not sure on the exact value) samples of data per second so the amount of data can vary depending on how long I run the script. I want a text file to be created once I stop the script and record the data points on separate lines like in the attached image. I found a way to do it with words/letters but integers wouldn't work.
Let me know if there is anything I can share to help find a solution easier. I appreciate all the input.
In Python, you can use "open" with the "w" mode to create a new file:
file = open("load_cell_data.txt", "w")
Pass the data in with file.write() and then close the file with file.close().
Docs: https://docs.python.org/3/library/functions.html#open
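write() only accepts strings, so each integer sample has to be converted first. A minimal sketch (the readings list and file name are placeholders for your load-cell samples):

```python
# write() only accepts strings, so convert each integer reading first
readings = [512, 498, 505, 521]          # placeholder for load-cell samples

with open("load_cell_data.txt", "w") as f:
    for value in readings:
        f.write(f"{value}\n")            # one sample per line
```

Using a with-statement also closes the file for you, even if the script stops with an error.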
I'm trying to iterate over the lines of a csv, for each line, I want to do a bunch of work, save that line in a destination csv and remove it from the original csv, saving both origin and destination csv files at every line (save state in case of a crash). Is there an elegant way of doing this that doesn't involve opening and closing the file at every point?
To write to a file immediately, reduce the buffering when you open it. Note that buffering=0 is only allowed in binary mode in Python 3; for a text file, request line buffering with buffering=1 (or call flush() after each write):
with open("test.csv", "w", buffering=1) as my_file:
    ...
This makes sense for the output file; repeatedly deleting the first line of the input is another matter. The only way to do that is to write out the entire remainder of the file, over and over (google "quadratic complexity"). That will definitely have a performance impact, and it increases rather than reduces the chance that something will go wrong.
I strongly recommend leaving the input file alone, and finding another way to keep track of how much has been processed. (E.g. write out somewhere else the number of lines that have been processed, and adapt your code to skip this many lines.)
PS. If you wanted to get cute you could process the input file from the end (last row first), and use truncate to delete each processed line without rewriting what comes before. But that's tricky to get right, and really it's not a good fit for your goal of simply tracking how far you have gotten with processing.
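A sketch of that bookkeeping: a sidecar file holds the number of rows already processed, and on restart that many rows are skipped (all file names here are hypothetical):

```python
import csv
import os

def process_resumable(src, dst, state_file):
    """Process src row by row; on restart, skip rows already handled.
    state_file holds a single integer: rows completed so far."""
    done = 0
    if os.path.exists(state_file):
        with open(state_file) as f:
            done = int(f.read() or 0)

    with open(src, newline="") as fin, open(dst, "a", newline="") as fout:
        writer = csv.writer(fout)
        for i, row in enumerate(csv.reader(fin)):
            if i < done:
                continue                      # already processed before a crash
            writer.writerow(row)              # the real "work" would happen here
            fout.flush()                      # push the row to the OS now
            with open(state_file, "w") as f:  # record progress last
                f.write(str(i + 1))
    return done                               # rows that were skipped this run
```

The input file is never modified, so a crash can at worst duplicate the one row that was in flight.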
two parts:
Using python, I want to write everything from stdout to a log file. I found a good solution for this part, but,
I'm also looking for a way to delete the file when it gets too big and start a new file. Usually the info going to the display is not useful so I don't care about it, but when an error occurs, I'd like to have captured it to a file to see what led up to it.
I could write a function that checks the file size and starts a new file when the current file reaches a particular size, but I'm wondering if something exists to do this.
I would like my test program to run for weeks, constantly outputting to the display and a text file, but creating a new text file when the current one gets too big, so basically a circular buffer that wraps around.
I would like to track the changes of a file being appended by another program.
My plan of approach is this. I read the file's contents first. At a later time, the file contents would be appended from another application. I want to read the appended data only rather than re-reading everything from the top. In order to do this, I'm going to check the file's modification time. I would then seek() to the previous size of the file, and start reading from there.
Is this a proper approach? Or there is a known idiom for this?
Well, you have to make quite some assumptions about both the other program writing to the file and the file system, but in general it should work. Personally I would rather write the current seek position or line number (if reading simple text files) to another file and check it from there. This will also allow you to revert back in the file if some part is rewritten and the file size stays the same (or even gets smaller).
If you have some very important/unique data, besides making backups you should maybe think about appending the new data to a new file and later rejoining the files (if needed) once you have checked in your other program that the data is fine. This way you could just read any new file as a whole after a certain time. (Also remember that in the larger picture, system time and creation/modification times are not 100% trustworthy.)
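A minimal sketch of the seek()-based idea, with the truncation check suggested above (the function name is made up):

```python
import os

def read_new_data(path, last_pos):
    """Return (new_text, new_pos). If the file shrank, start over from 0."""
    size = os.path.getsize(path)
    if size < last_pos:              # file was truncated/rewritten: re-read it all
        last_pos = 0
    with open(path, "rb") as f:      # binary mode: positions are plain byte offsets
        f.seek(last_pos)
        data = f.read()
        new_pos = f.tell()
    return data.decode("utf-8"), new_pos
```

The caller stores new_pos (in memory or in a sidecar file, as suggested) and passes it back on the next poll.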
I have achieved writing all the things I needed to the text file, but essentially the program needs to keep going back to the text file and saving only the changes. At the moment it overwrites the entire file, deleting all the previous information.
There is a typical confusion about how text files are organized.
Text files are not organized by lines, but by bytes.
When one looks at a text file, it looks like lines.
It is natural to expect that on disk it goes the same way, but this is not true.
Text files are written to disk byte by byte, often one character being represented by one byte (but
in some cases more bytes). A line of text happens to be just a sequence of bytes terminated
by some sort of newline ("\n", "\r\n" or whatever is used for a new line).
If we want to change the 2nd line out of 3, we have to fit the change exactly into the bytes used for
the 2nd line, so as not to mess up line 3. If we wrote too many bytes for line 2, we would
overwrite bytes of line 3. If we wrote too few bytes, some (already obsolete) bytes from the
remainder of line 2 would still be present.
Strategies to modify content of text file
Republisher - Read it, modify in memory, write all content completely back
This might at first sound like wasting a lot of effort, but it is by far the most often used approach
and in 99% of cases the most effective one.
The beauty is, it is simple.
The fact is, for most file sizes it is fast enough.
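The Republisher pattern fits in a few lines. A sketch that changes the 2nd line out of 3, as in the example above:

```python
def replace_line(path, lineno, new_text):
    """Republisher: read everything, change one line in memory, write all back."""
    with open(path) as f:
        lines = f.read().splitlines()
    lines[lineno] = new_text                 # any length change is safe in memory
    with open(path, "w") as f:               # "w" rewrites the whole file
        f.write("\n".join(lines) + "\n")
```

Because the whole file is rewritten, the new line may be longer or shorter than the old one with no risk of overwriting neighbouring lines.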
Journal - append changes to the end
A rather rare approach is to write the first version of the file to disk and later append notes
to the end about what has changed.
Reading such a file means replaying the whole history of changes from the journal to find out the
final content of the file.
Surgeon - change only affected lines
In case you keep lines of fixed length (measured in bytes!! not in characters), you might seek to
the modified line and rewrite just that line.
This is quite difficult to do correctly and is used rather with binary files. This is definitely not
a task for beginners.
Conclusions
Go for "Republisher" pattern.
Use whatever format fits your needs (INI, CSV, JSON, XML, YAML...).
Personally I prefer saving data to JSON format - the json package is part of the Python stdlib and it
supports lists as well as dictionaries, which allows saving tabular as well as tree-like structures.
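A minimal sketch of the JSON round trip, following the Republisher pattern (the data and file name are made up):

```python
import json

data = {"samples": [512, 498, 505], "sensor": "load_cell_1"}   # made-up data

with open("state.json", "w") as f:
    json.dump(data, f, indent=2)            # republish: full rewrite on each save

with open("state.json") as f:
    restored = json.load(f)                 # round-trips lists and dicts intact
```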
Are the changes you are making going to be over several different runs of a program? If not, I suggest making all of your changes to the data while it is still in memory and then writing it out just before program termination.
You can open it in append mode ("a"), which adds new data to the end of the file instead of overwriting it:
with open("test.txt", "a") as my_file:
    my_file.write("new data\n")