I have a script that I run many times; each run generates data and logs progress to JSON files for post-processing. Each run takes days, and I need to run several dozen test cases. I'd like to check in occasionally to see how long a run has left. This is all single-threaded, but I've dealt with multiprocessing enough to be wary of opening the file while it's being written, for fear that viewing it will place a temporary lock on the file.
Is it safe to view the JSON in a Linux terminal using nano log_file.json while my Python scripts are running and could attempt to write to the log at any time?
If it is not safe, are there any alternatives?
I'm worried that if Python tries to record an entry while I'm viewing progress, the entry could be lost or an error could be thrown. Viewing only, no saving, obviously. I'd love to check in on progress so I can switch between test cases faster, but I really don't want to raise an error that loses days of progress because the script can't write to the JSON.
Sorry if this is a duplicate, I tried searching but I'm not sure what to even search for this question.
You can use the tail command in the terminal to view the logs. The full command is:
tail -F <path_to_file>
It will show the last few lines of the file and keep printing new lines as data is written to the file.
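On Linux, viewing the file with nano or tail does not place a lock that would block your Python writer; the realistic risk is reading a half-written file, not breaking the write. If you also control the writing script, one way to avoid ever exposing a partial file is to write to a temporary file and atomically rename it into place. A minimal sketch, assuming Python 3 (write_progress and the sample payload are made up for illustration):

import json
import os
import tempfile

def write_progress(path, progress):
    # Write to a temporary file in the same directory, then atomically
    # swap it into place, so a reader never sees a half-written JSON file.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(progress, tmp)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem

write_progress("log_file.json", {"test_case": 12, "fraction_done": 0.37})

With this in place, nano or tail -F always sees either the old file or the complete new one.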
I'm looking for a solution to rescue data from RAM.
My program terminated with an error and the data should still be in the memory.
Can I access it to save it somehow?
I'm working with Python on a Raspberry Pi 3. My program scrapes data from the web and stores it in a CSV file. All the data was scraped, but the program crashed before writing it. Executing the program again is not an option.
I ran the program from the console; an error appeared and the console is now waiting for my next input:
pi@raspberrypi: python3 program.py
"Error-message"
pi@raspberrypi:
Inside program.py my data was stored in a list 'data_list'.
How can I retrieve this list?
Edit:
Executing the program again is not an option because it took about 12 hours to complete. The scraped data would be used to make an educated guess for the runtime of a second program; by the time the scraping finished again, that guess would be irrelevant.
In theory you could start reading memory addresses until you finally see something that looks like a CSV string, but that data would most likely be fragmented.
You could not do that in Python; you'd need C or C++, and that would take time to write.
In practice, by the time I'm posting this answer there is a very high chance that the pages your program used have been overwritten by something else. Also, due to process isolation you might not even be able to read all of the memory.
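For future runs of this length, though, it is worth checkpointing as you go rather than writing everything once at the end. A minimal sketch (the checkpoint file name and helper functions are made up for illustration):

import pickle

CHECKPOINT = "data_list.pkl"  # hypothetical checkpoint file name

def save_checkpoint(data_list):
    # Persist what has been scraped so far; a crash later in the run
    # then loses at most the records since the last checkpoint.
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(data_list, f)

def load_checkpoint():
    # Resume from the last checkpoint if one exists.
    try:
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return []

data_list = load_checkpoint()
# ... scrape, appending to data_list, and call save_checkpoint(data_list)
# every N records ...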
I have a Python script to process a bunch of log files. It opens each log, parses each line, and stores the information in data structures. After scanning through the file, it collects the statistics and generates an output file.
I'm testing it against various logs, and it has worked fine for 20+ different files. But recently I found that it consistently fails on a particular file with 4381k lines: it somehow cannot complete the scanning and parsing.
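For reference, the per-file loop is structured roughly like the simplified sketch below; the field handling is just a stand-in for the real parsing, and I've added a per-line try/except while narrowing this down so a bad line reports its number instead of failing silently:

import collections

def process_log(path):
    stats = collections.Counter()
    # Stream the file line by line so memory stays flat even at 4M+ lines.
    with open(path, errors="replace") as f:   # Python 3; tolerate stray bytes
        for line_no, line in enumerate(f, 1):
            try:
                fields = line.split()          # stand-in for the real parsing
                if fields:
                    stats[fields[0]] += 1
            except Exception as exc:
                print("line %d failed: %r" % (line_no, exc))
    return stats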
I'm in the process of narrowing down the problem right now; I don't know exactly what is happening there. I'd really appreciate some input to find the right direction. Thanks in advance!
I'm a week into learning Python and am trying to write a piece of code that allows me to run a text-based Perl script in LXTerminal automatically. I have a couple of questions regarding some specifics.
I need my code to start the Perl script with a user-inputted environment file, enter a few specific settings into the Perl script, and then read in many .txt files, one at a time, into the Perl script. It also needs to restart the process for every single .txt file and capture each individual output (it would help if every output could be written to a single .csv file).
To call the Perl script, I'm starting with the following:
alphamelts="/home/melts/Desktop/alphamelts"
pipe=subprocess.Popen(["perl", "/home/Desktop/melts/alphaMELTS", "run_alphamelts.command -f %s"]) % raw_input("Enter an environment file:"), stdout=PIPE
Assuming that's correct, I now need it to read in a .txt file, enter number-based commands, have my code wait for the Perl script to finish its calculations, and then write the output to a .csv file. If it helps, the Perl script I'm running automatically generates a space-delimited file containing the results of its calculations once the program exits, but it would be super helpful if only a few of its outputs were written to a single separate .csv file for each .txt file processed.
No idea where to go from here but I absolutely have to get this working. Sorry for the complexity.
Thank you!
You can do some really cool stuff in IPython. Check out this notebook for some specific examples. As far as waiting for a subprocess to finish, you can call wait() (or communicate()) on the Popen object rather than putting an arbitrary pause in your script. Also, for data handling and export to CSV and Excel, I'd recommend pandas.
Just something to get you started.
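To make that more concrete, here is a rough sketch of the loop you describe: one run of the Perl script per .txt file, waiting for each run to finish, and appending one summary row per file to a single CSV. I'm sticking with Python 2's raw_input to match your snippet, and the flags, the menu commands fed to the script, and the way a result is pulled from its output are all guesses, since I don't know alphaMELTS's actual interface:

import csv
import glob
import subprocess

alphamelts = "/home/melts/Desktop/alphamelts"      # path from your question
env_file = raw_input("Enter an environment file:")

with open("results.csv", "wb") as out:              # one combined output file
    writer = csv.writer(out)
    for txt_file in sorted(glob.glob("*.txt")):
        proc = subprocess.Popen(
            ["perl", alphamelts + "/run_alphamelts.command", "-f", env_file],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        # Feed the number-based menu commands, then wait for the run to finish.
        output, _ = proc.communicate("1\n" + txt_file + "\n0\n")  # placeholder commands
        last_line = output.splitlines()[-1] if output else ""
        writer.writerow([txt_file, last_line])       # keep just a summary line per file

communicate() both feeds stdin and blocks until the process exits, which covers the waiting part.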
I have written a Python script that is designed to run forever. I load the script into a folder that I made on my remote server, which is running Debian Wheezy 7.0. The code runs, but it only runs for 3 to 4 hours and then just stops, and I have no log information about why it stopped. When I come back and check the running processes, it's not there. Is this a problem with where I am running the Python file from?

The script simply has a while loop and writes to an external CSV file. The file runs from /var/pythonscript, a custom folder that I made. There is no error that I receive, and the only way I know how long the code ran is by the timestamp on the CSV file. I run the .py file by SSHing to the server and running sudo python scriptname. I would also like to know the best place in the Debian directory tree to run Python files from, and any limitations concerning that. Any help would be much appreciated.
Basically you're stuffed.
Your problem is:
You have a script, which produces no error messages, no logging, and no other diagnostic information other than a single timestamp, on an output file.
Something has gone wrong.
In this case, you have no means of finding out what the issue was. I suggest any of the following:
Add logging or diagnostic information to the script.
Contact the developer of the script and get them to find a way of determining the issue.
If you can't do either option 1 or 2 above, delete the evidently worthless script and consider an alternative way of doing your task.
Now, if the script does have logging, or other diagnostic data, but you delete or throw them away, then that's your problem and you need to stop discarding this useful information.
EDIT (following comment).
At a basic level, you should print to either stdout, or to stderr, that alone will give you a huge amount of information. Just things like, "Discovered 314 records, we need to save 240 records", "Opened file name X.csv, Open file succeeded (or failed, as the case may be)", "Error: whatever", "Saved 2315 records to CSV". You should be able to determine if those numbers make sense. (There were 314 records, but it determined 240 of them should be saved, yet it saved 2315? What went wrong!? Time for more logging or investigation!)
Ideally, though, you should take a look at the logging module in Python, as that will let you log stack traces effectively and show line numbers, the function you're logging in, and the like. Using the logging module allows you to specify logging levels (e.g., DEBUG, INFO, WARN, ERROR), and to filter them or redirect them to a file or the console, as you may choose, without changing the logging statements themselves.
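A minimal setup along those lines might look like this (the log file name and the messages are placeholders):

import logging

# Send log records to a file with a timestamp, level, and source location.
logging.basicConfig(
    filename="script.log",    # placeholder file name
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(message)s")

log = logging.getLogger(__name__)

log.info("Opened input.csv")                       # placeholder messages
log.warning("Row 42 had no timestamp, skipping")
try:
    result = 1 / 0                                 # deliberately fails for the demo
except ZeroDivisionError:
    # exc_info=True writes the full stack trace to the log file.
    log.error("Calculation failed", exc_info=True)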
When you have a problem (a crash, or whatever), you'll be able to identify roughly where the error occurred, giving you information to either increase the logging in that area or to reason about what must have happened (though you should probably then add enough logging so that the logs tell you what happened clearly and unambiguously).
I need to run a Python program in the background. I give the script one input file, and the code processes that file and creates a new output file. Now, if I change the input file's content, I don't want to run the code again by hand; it should keep running continuously in the background and regenerate the output file. If someone knows the answer to this, please let me know.
Thank you.
Basically, you have to set up a so-called FileWatcher, i.e. some mechanism which looks out for changes in a file.
There are several techniques for watching file/directory changes in Python. Have a look at this question: Monitoring contents of files/directories?. Another link is here; it is about directory changes, but file changes are handled in a similar way. You could also google for "watch file changes python" to get a lot of answers :)
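If you'd rather not pull in a library, even a simple polling loop on the input file's modification time does the job. A rough sketch (the file names and the processing step are placeholders):

import os
import time

INPUT_FILE = "input.txt"      # placeholder paths
OUTPUT_FILE = "output.txt"

def process_file(in_path, out_path):
    # Stand-in for your actual processing step.
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(src.read().upper())

last_mtime = None
while True:
    try:
        mtime = os.path.getmtime(INPUT_FILE)
    except OSError:           # input file not there yet
        mtime = None
    if mtime is not None and mtime != last_mtime:
        process_file(INPUT_FILE, OUTPUT_FILE)      # input changed: regenerate output
        last_mtime = mtime
    time.sleep(1)             # check once per second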
Note: If you're programming on Windows, you should probably implement your program as a Windows service; look here for how to do that.