Terminate Python Program, but Recover Data

I have an inefficient simulation running (it has been running for ~24 hours).
It can be split into 3 independent parts, so I would like to cancel the simulation, and start a more efficient one, but still recover the data that has already been calculated for the first part.
When an error happens in a program, for example, you can still access the data that the script was working with, and examine it to see where things went wrong.
Is there a way to kill the process manually without losing the data?

You could start a debugger such as winpdb (or one of several IDE debuggers) in a separate session and attach it to the running process; attaching halts the process. Set a breakpoint in a section of the code that has access to your data, resume until you reach the breakpoint, and then save your data to a file. Your new process could then load that data as a starting point.
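Once the process is halted at the breakpoint, you can dump whatever object holds the partial results straight from the debugger console, for example with pickle. A minimal sketch, assuming the first part's results live in a variable named results (a placeholder name):

import pickle

# Typed at the debugger prompt once you are stopped at the breakpoint;
# 'results' stands in for whatever object actually holds your partial data.
with open('partial_results.pkl', 'wb') as f:
    pickle.dump(results, f)

The new, more efficient simulation can then load partial_results.pkl with pickle.load and skip the part that is already computed.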

Related

How can I trigger a function in django while reading its output, without impacting that function's execution time?

I've been trying to find a solution to my issue for some time now, but I haven't come across anything that seems intuitive enough that it seems like the "right" solution.
I'm building an electron app that uses django as the backend. The backend is responsible for running some long processes that are time critical. For example, I have a loop that continuously takes data for about 5 seconds. When I run that function standalone, it takes a data point about every 10 ms; however, when I run it through django it takes a data point anywhere from 10 ms to 70 ms or even longer. This makes sense to me intuitively because django is sharing thread time to keep responding to the frontend. However, this is unacceptable for my application.
The delays seem to be related to returning data to the frontend. Basically, there's a static variable in the view class that's a container for the result data, then the measurement is triggered from the front end and the measurement populates that static variable with data. When the measurement is running, the front end queries django once a second for updated data so it can plot it for the user to track progress.
I first tried using threading.Thread to create a new thread to run the measurement. This thread gets triggered by the django view, but that doesn't fix the issue. Ok, maybe this makes sense too because the thread is still sharing processing time with the main thread?
The next step seems to be creating an entirely new subprocess. However, I'd still like to be passing data back to the front end while the script runs, so short of dropping a file, I don't know how to do that.
Is there an officially supported way to run a function through django whose execution time won't be impacted by the fact that it is triggered from django?
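One possible pattern, sketched below with hypothetical names (acquire_point, run_measurement), is to run the measurement in its own process so Django's request handling cannot steal time from it, and to hand points back through a multiprocessing.Queue that the polling view drains:

import multiprocessing as mp
import time

def acquire_point():
    # Placeholder for the real hardware read; here it just sleeps ~10 ms.
    time.sleep(0.01)
    return time.time()

def run_measurement(result_queue):
    # Runs in its own process, so Django's request handling cannot slow it down.
    # Each point goes onto the queue instead of a static variable on the view class.
    for _ in range(500):                  # roughly 5 s of points at ~10 ms each
        result_queue.put(acquire_point())
    result_queue.put(None)                # sentinel: measurement finished

if __name__ == '__main__':
    # What the "start measurement" view would do (module-level state, simplified):
    result_queue = mp.Queue()
    mp.Process(target=run_measurement, args=(result_queue,)).start()

    # What the once-a-second polling view would do: drain whatever has arrived.
    time.sleep(1)
    new_points = []
    while not result_queue.empty():
        point = result_queue.get()
        if point is None:
            break                         # measurement is done
        new_points.append(point)
    print('%d new points' % len(new_points))

For long-running work behind Django, a task queue such as Celery is the more commonly recommended route, but the idea of passing results back through a queue rather than a shared static variable is the same.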

Good practice for parallel tasks in python

I have one python script which is generating data and one which is training a neural network with tensorflow and keras on this data. Both need an instance of the neural network.
Since I haven't set the "allow growth" flag, each process takes the full GPU memory. Therefore I simply give each process its own GPU. (Maybe not a good solution for people with only one GPU... yet another unsolved problem.)
The actual problem is as follows: both instances need access to the network's weights file. I recently had a bunch of crashes because both processes tried to access the weights at the same time. A flag or something similar should stop each process from accessing the file while the other process is accessing it. Hopefully this doesn't create a bottleneck.
I tried to come up with a solution like semaphores in C, but today I found this post in stack-exchange.
The idea with renaming seems quite simple and effective to me. Is this good practice in my case? I'll just create the weight file with my own function
self.model.save_weights(filepath='weights.h5$$$')
in the learning process, rename them after saving with
os.rename('weights.h5$$$', 'weights.h5')
and load them in my data generating process with function
self.model.load_weights(filepath='weights.h5')
?
Will this renaming overwrite the old file? And what happens if the other process is currently loading? I would appreciate other ideas on how I could multithread / multiprocess my script. I just realized that generating data, learning, generating data, ... in a sequential script is not really performant.
EDIT 1: Forgot to mention that the weights are stored in a .h5 file by keras' save function
The multiprocessing module has an RLock class that you can use to regulate access to a shared resource. This also works for files if you remember to acquire the lock before reading and writing and to release it afterwards. Using a lock implies that some of the time one of the processes cannot read or write the file. How much of a problem this is depends on how much both processes have to access the file.
Note that for this to work, one of the scripts has to start the other script as a Process after creating the lock.
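A minimal sketch of that arrangement, with placeholder functions standing in for the keras calls (your real scripts would call self.model.save_weights / load_weights inside the with lock: blocks):

from multiprocessing import Process, RLock

WEIGHTS = 'weights.h5'

def load_weights(path):
    # Placeholder so the sketch runs without keras; your model call goes here.
    print('loading', path)

def save_weights(path):
    # Placeholder for self.model.save_weights(...).
    print('saving', path)

def generate_data(lock):
    # Data-generating process: hold the lock around every read of the weights.
    with lock:
        load_weights(WEIGHTS)
    # ... generate data with the loaded weights ...

if __name__ == '__main__':
    lock = RLock()
    worker = Process(target=generate_data, args=(lock,))
    worker.start()

    # Training process: hold the same lock around every write of the weights.
    with lock:
        save_weights(WEIGHTS)
    worker.join()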
If the weights are a Python data structure, you could put that under control of a multiprocessing.Manager. That will manage access to the objects under its control for you. Note that a Manager is not meant for use with files, just in-memory objects.
Additionally, on UNIX-like operating systems Python has os.lockf to lock (part of) a file. Note that this is an advisory lock only; that is, if another process calls lockf, the return value indicates that the file is already locked. It does not actually prevent you from reading the file.
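A minimal sketch of taking such an advisory lock on the weights file (POSIX only; both processes have to cooperate by calling lockf before touching the file):

import os

# Take a blocking advisory lock on the whole weights file.
fd = os.open('weights.h5', os.O_RDWR | os.O_CREAT)
try:
    os.lockf(fd, os.F_LOCK, 0)       # blocks until no cooperating process holds it
    # ... read or write the weights file here ...
finally:
    os.lockf(fd, os.F_ULOCK, 0)      # release the advisory lock
    os.close(fd)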
Note:
Two processes can share a file without coordination only when both are reading it (read/read). Every other combination (read/write, write/read, write/write) can and eventually will result in undefined behavior and data corruption.
Note 2:
Another possible solution involves inter-process communication.
Process 1 writes a new h5 file (with a random filename), closes it, and then sends a message to Process 2 (using a Pipe or Queue): "I've written a new parameter file \path\to\file".
Process 2 then reads the file and deletes it. This can work both ways but requires that both processes check for and process messages every so often. This prevents file corruption because the writing process only notifies the reading process after it has finished the file.
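A minimal sketch of that handshake, with a dummy file write standing in for model.save_weights and a print standing in for model.load_weights, and assuming one process starts the other so they can share a Queue:

import os
import uuid
from multiprocessing import Process, Queue

def reader(queue):
    # Process 2: wait for a notification, load the new weights, delete the file.
    while True:
        path = queue.get()
        if path is None:                    # sentinel: writer has finished
            break
        print('loading', path)              # stands in for model.load_weights(path)
        os.remove(path)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=reader, args=(queue,))
    p.start()

    # Process 1: write a uniquely named file, close it, then notify Process 2.
    path = 'weights_%s.h5' % uuid.uuid4().hex
    with open(path, 'wb') as f:             # stands in for model.save_weights(path)
        f.write(b'fake weights')
    queue.put(path)

    queue.put(None)
    p.join()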

Python: Threading/multiprocessing with matplotlib and user input

I am currently working on a code that will continuous plot data retrieved via serial communication while also allowing for user input, in the form of raw_input, to control the job, such as starting/stopping/clearing the plot and setting the save file names for the data. Currently, I'm trying to do this by having an extra thread that will just read user input and relay it to the program while it continuously plots and saves the data.
Unfortunately, I have run into some errors where commands entered during the plotting loop freeze the program for two minutes or so, which I believe has to do with matplotlib not being thread safe; a command entered while the loop is not touching the plotting libraries gets a response within 1-2 seconds.
I have attempted switching from threading to the multiprocessing library to try to alleviate the problem, to no avail: the program will not show a plot, leading me to believe the plotting process never starts (the plotting command is the first command in it). I can post the code, or the relevant parts of either program, if necessary.
I wanted to know if there was any way around these issues, or if I should start rethinking how I want to program this. Any suggestions on different ways of incorporating user input are welcome too.
Thanks
If matplotlib is not thread-safe, the right thing to do is to serialize all the inputs to matplotlib through a single event queue. The matplotlib thread can retrieve items off the queue until the queue is empty, and then process all the new parameters.
Your serial communication code and your raw_input should simply put data on this queue, and not attempt direct communication with matplotlib.
Your matplotlib thread will be doing one of three things: (1) waiting for new data; (2) retrieving new data and processing it (e.g. appending it to the arrays to be plotted or changing output file names) and staying in this state as long as the queue is not empty, or moving on to state (3) if it is; or (3) invoking matplotlib to do the plotting, then looping back to state (1).
If you are implementing multiple action commands from your raw_input, you can add some auxiliary state variables. For example, if 'stop' is read from the queue, then you would set a variable that would cause state (3) to skip the plotting and go straight to state (1), and if 'start' is read from the queue, you would reset this variable, and would resume plotting when data is received.
You might think you want to do something fancy like: "if I see data, wait to make sure more is not coming before I start to plot." This is usually a mistake. You would have to tune your wait time very carefully, and then would still find times when your plotting never happened because of the timing of the input data. If you have received data (you are in state 2), and the queue is empty, just plot! In the time taken to do that, if 4 more data points come in, then you'll plot 4 more next time...
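A minimal sketch of that plotting loop, assuming the serial-reader thread and the raw_input thread put either (x, y) tuples or the strings 'start'/'stop' onto a shared queue (names here are placeholders):

import queue                     # use the Queue module on Python 2
import matplotlib.pyplot as plt

command_queue = queue.Queue()    # serial thread and raw_input thread put() onto this

fig, ax = plt.subplots()
line, = ax.plot([], [])
xs, ys = [], []
plotting_enabled = True

while True:
    item = command_queue.get()              # state 1: wait for new data or a command
    while True:                             # state 2: drain everything queued so far
        if item == 'stop':
            plotting_enabled = False
        elif item == 'start':
            plotting_enabled = True
        else:                               # otherwise assume an (x, y) data point
            xs.append(item[0])
            ys.append(item[1])
        try:
            item = command_queue.get_nowait()
        except queue.Empty:
            break
    if plotting_enabled:                    # state 3: all matplotlib calls happen here
        line.set_data(xs, ys)
        ax.relim()
        ax.autoscale_view()
        plt.pause(0.01)                     # redraw, then loop back to state 1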

How to pause a python script running in terminal

I have a web crawling python script running in terminal for several hours, which is continuously populating my database. It has several nested for loops. For some reasons I need to restart my computer and continue my script from exactly the place where I left. Is it possible to preserve the pointer state and resume the previously running script in terminal?
I am looking for a solution which will work without altering the python script. Modifying the code is a lower priority, as that would mean relaunching the program and reinvesting time.
Update:
Thanks for the VM suggestion. I'll take that. For the sake of completeness, what generic modifications should be made to the script to make it pausable and resumable?
Update2:
Porting on VM works fine. I have also modified script to make it failsafe against network failures. Code written below.
You might try suspending your computer, or running the script in a virtual machine which you can subsequently suspend. But since your script works with network connections, chances are it won't resume cleanly from the point you left off once you bring the system back up. Suspending a computer and restoring it, or saving a virtual machine and restoring it, means you need to re-establish the network connection. This is true for any element that is external to your system, and the network is one of them. And if you are on a dynamic network, there is a high chance that the next time you boot you will get a new IP, and the network state you were working with previously will be void.
If you are planning to modify the script, there are a few things you need to keep in mind.
Add serializing and deserializing capabilities. Python has the pickle module and the faster cPickle module to do this.
Add restart points. The best way to do this is to save the state at regular intervals and, when restarting your script, resume from the last saved state after re-establishing all the transient elements like the network (a minimal sketch follows below).
This would not be an easy task, so consider investing a considerable amount of time :-)
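A minimal checkpoint sketch along those lines, with a stubbed process() standing in for the real per-URL crawl logic:

import os
import pickle

CHECKPOINT = 'crawler_state.pkl'

def process(url):
    # Placeholder for your existing per-URL crawl/update logic.
    pass

# On startup, resume from the last saved state if a checkpoint exists.
if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT, 'rb') as f:
        state = pickle.load(f)
else:
    state = {'pending_urls': ['http://example.com'], 'done': 0}  # placeholder seed

while state['pending_urls']:
    url = state['pending_urls'].pop(0)
    process(url)
    state['done'] += 1
    if state['done'] % 100 == 0:                    # save state at regular intervals
        with open(CHECKPOINT + '.tmp', 'wb') as f:
            pickle.dump(state, f)
        os.rename(CHECKPOINT + '.tmp', CHECKPOINT)  # replace the old checkpoint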
Note:
On second thought, there is one alternative to changing your script: you can try using cloud virtualization solutions like Amazon EC2.
I ported my script to a VM and launched it from there. However, there were network connection glitches after resuming from hibernation. Here's how I solved it by tweaking the python script:
import logging
import socket
import time
import urllib2                          # Python 2; the crawler fetches pages with urllib2
from lxml.html import parse             # assumed parser; the original only shows parse()

logger = logging.getLogger(__name__)

socket.setdefaulttimeout(30)            # set timeout in secs
maxretry = 10                           # set max retries
sleeptime_between_retry = 1             # waiting time between retries
erroroccured = 0

while True:
    try:
        # myurl is whatever URL the surrounding crawl loop is currently fetching
        domroot = parse(urllib2.urlopen(myurl)).getroot()
    except Exception:
        erroroccured += 1
        if erroroccured > maxretry:
            logger.info("Maximum retries reached. Quitting this leg.")
            break
        time.sleep(sleeptime_between_retry)
        logger.info("Network error occurred. Retrying %d time..." % erroroccured)
        continue
    finally:
        # common code to execute after try or except block, if any
        pass
    break                               # fetch succeeded, leave the retry loop
This modification made my script resilient to network failures.
As others have commented, unless you are running your script in a virtual machine that can be suspended, you would need to modify your script to track its state.
Since you're populating a database with your data, I suggest using it as a way to track the progress of the script (store the latest URL parsed, keep a list of pending URLs, etc.).
If the script is terminated abruptly, you don't have to worry about saving its state because the database transactions will come to the rescue and only the data that you've committed will be saved.
When the script is restarted, only the data for the URLs that you completely processed will be stored, and it can resume by simply picking up the next URL according to the database.
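A minimal sketch of that idea with sqlite (table and column names are made up; adapt them to your schema):

import sqlite3

conn = sqlite3.connect('crawler.db')
conn.execute('CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY, done INTEGER DEFAULT 0)')

def crawl(url):
    # Placeholder for the existing crawl-and-store logic.
    pass

while True:
    row = conn.execute('SELECT url FROM urls WHERE done = 0 LIMIT 1').fetchone()
    if row is None:
        break                            # nothing left to do
    crawl(row[0])
    # Commit the crawled data and the progress marker in one transaction, so an
    # abrupt kill never leaves a half-processed URL marked as done.
    conn.execute('UPDATE urls SET done = 1 WHERE url = ?', (row[0],))
    conn.commit()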
If this problem is important enough to warrant this kind of financial investment, you could run the script on a virtual machine. When you need to shut down, suspend the virtual machine, and then shut down the computer. When you want to start again, start the computer, and then wake up your virtual machine.
WinPDB is a python debugger that supports remote debugging. I never used it, and don't know if remote debugging a running process requires a modification to the script (which is very likely, otherwise it'd be a security issue); but if remote debugging without modifying the script is possible then you may be able to dump the current state of the script to a file and figure out later how to load it. I don't think it would work though.

Python: Script works, but seems to deadlock after some time

I have the following script, which is working for the most part (Link to PasteBin). The script's job is to start a number of threads, each of which in turn starts a subprocess with Popen. The output from each subprocess is as follows:
1
2
3
.
.
.
n
Done
Basically the subprocess is transferring 10M records from tables in one database to different tables in another db, with a lot of data massaging/manipulation in between because of the different schemas. If the subprocess fails at any time in its execution (bad records, duplicate primary keys, etc.), or if it completes successfully, it will output "Done\n". If there are no more records to select for transfer, it will output "NO DATA\n".
My intent was to create my script "tableTransfer.py" which would spawn a number of these processes, read their output, and in turn output information such as number of updates completed, time remaining, time elapsed, and number of transfers per second.
I started running the process last night and checked in this morning to see that it had deadlocked. There were no subprocesses running, there were still records to be updated, and the script had not exited. It was simply sitting there, no longer outputting the current information, because no subprocesses were running to update the total number complete, which is what controls updates to the output. This is running on OS X.
I am looking for three things:
I would like to get rid of the possibility of this deadlock occurring so I don't need to check in on it as frequently. Is there some issue with locking?
Am I doing this in a bad way (the gThreading variable to control looping over spawning additional threads, etc.)? I would appreciate some suggestions for improving my overall methodology.
How should I handle a ctrl-c exit? Right now I need to kill the process, but I assume I should be able to use the signal module (or similar) to catch the signal and kill the threads; is that right?
I am not sure whether I should be pasting my entire script here, since I usually just paste snippets. Let me know if I should paste it here as well.
You have a few places in your script (lines 97 and 99 of the pasted code) where you return without releasing your locks. This could cause a problem; this is where try/finally blocks can help you a lot, as you can then ensure that the release is always called.
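A minimal illustration of that pattern, with a placeholder condition standing in for the checks around those lines:

import threading

lock = threading.Lock()

def worker(some_error_condition):
    lock.acquire()
    try:
        if some_error_condition:   # placeholder for the checks near lines 97 and 99
            return                 # the finally clause still releases the lock
        # ... do the real work here ...
    finally:
        lock.release()             # always runs, even on early return or exception

# Equivalently, a with-statement handles the acquire/release for you:
def worker_with(some_error_condition):
    with lock:
        if some_error_condition:
            return
        # ... do the real work here ...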
