Continue stopped run in MLflow

Continue stopped run in MLflow - python

We run our experiment on AWS spot instances. Sometimes the experiments are stopped, and we would prefer to continue logging to the same run. How can you set the run-id of the active run?
Something like this pseudocode (not working):
if new:
mlflow.start_run(experiment_id=1, run_name=x)
else:
mlflow.set_run(run_id)

You can pass the run_id directly to start_run:
mlflow.start_run(experiment_id=1,
run_name=x,
run_id=<run_id_of_interrupted_run> # pass None to start a new run
)
Of course, you have to store the run_id for this. You can get it with run.info.run_id

Related

How to continuously run a python script until an output is printed?

I am trying to get a python program to continuously run until a certain aws log is registered and printed. It is supposed to:
Run indefinitely even if no events happen
List whatever events occurs (in my case a log stating that the task is finished)
Stop running
the python command looks like this: python3 watch_logs.py <log-source> --start=15:00:00
The logs are working fine, and the python script can print them out between certain time frames as long as they already exist. The program works by taking a continuously running task which prints events to the log file, and the python script should filter out events I am looking for and print them.
But, when I run the script, it wont print the event even if I can see the log entry appear in the file. If i kill the process and run it again using the same timestamp, it will find the log entry and end the script like it should.
the code is fairly short:
logs = get_log_events(
log_group=log_group,
start_time=start_time,
end_time=end_time
)
while True:
for event in logs:
print(event['message'].rstrip())
sys.exit("Task complete")
Any insight why this is happening would help a lot. I am fairly new to python

The value in the variable logs is old when the file is updated. You need to update this variable. For example if you were to use logs = myfile.read() at start of your script, the value in the logs variable would be a snapshot of that file at that time.

Try storing event['message'].rstrip() in a variable and checking with an if statement if it corresponds to the log you want to find

If you don't want to read the file each time through the loop, you should have a look at pygtail (https://pypi.org/project/pygtail/).

I was overthinking the problem. I put the log variable inside the loop so it was being defined at each cycle
while True:
logs = get_log_events(
log_group=log_group,
start_time=start_time,
end_time=end_time
)
for event in logs:
print(event['message'].rstrip())
sys.exit("Task complete")

How to make a python script stopable from another script?

TL;DR: If you have a program that should run for an undetermined amount of time, how do you code something to stop it when the user decide it is time? (Without KeyboardInterrupt or killing the task)
--
I've recently posted this question: How to make my code stopable? (Not killing/interrupting)
The answers did address my question, but from a termination/interruption point of view, and that's not really what I wanted. (Although, my question didn't made that clear)
So, I'm rephrasing it.
I created a generic script for example purposes. So I have this class, that gathers data from a generic API and write the data into a csv. The code is started by typing python main.py on a terminal window.
import time,csv
import GenericAPI
class GenericDataCollector:
def __init__(self):
self.generic_api = GenericAPI()
self.loop_control = True
def collect_data(self):
while self.loop_control: #Can this var be changed from outside of the class? (Maybe one solution)
data = self.generic_api.fetch_data() #Returns a JSON with some data
self.write_on_csv(data)
time.sleep(1)
def write_on_csv(self, data):
with open('file.csv','wt') as f:
writer = csv.writer(f)
writer.writerow(data)
def run():
obj = GenericDataCollector()
obj.collect_data()
if __name__ == "__main__":
run()
The script is supposed to run forever OR until I command it to stop. I know I can just KeyboardInterrupt (Ctrl+C) or abruptly kill the task. That isn't what I'm looking for. I want a "soft" way to tell the script it's time to stop, not only because interruption can be unpredictable, but it's also a harsh way to stop.
If that script was running on a docker container (for example) you wouldn't be able to Ctrl+C unless you happen to be in the terminal/bash inside the docker.
Or another situation: If that script was made for a customer, I don't think it's ok to tell the customer, just use Ctrl+C/kill the task to stop it. Definitely counterintuitive, especially if it's a non tech person.
I'm looking for way to code another script (assuming that's a possible solution) that would change to False the attribute obj.loop_control, finishing the loop once it's completed. Something that could be run by typing on a (different) terminal python stop_script.py.
It doesn't, necessarily, needs to be this way. Other solutions are also acceptable, as long it doesn't involve KeyboardInterrupt or Killing tasks. If I could use a method inside the class, that would be great, as long I can call it from another terminal/script.
Is there a way to do this?
If you have a program that should run for an undetermined amount of time, how do you code something to stop it when the user decide it is time?

In general, there are two main ways of doing this (as far as I can see). The first one would be to make your script check some condition that can be modified from outside (like the existence or the content of some file/socket). Or as #Green Cloak Guy stated, using pipes which is one form of interprocess communication.
The second one would be to use the built in mechanism for interprocess communication called signals that exists in every OS where python runs. When the user presses Ctrl+C the terminal sends a specific signal to the process in the foreground. But you can send the same (or another) signal programmatically (i.e. from another script).
Reading the answers to your other question I would say that what is missing to address this one is a way to send the appropriate signal to your already running process. Essentially this can be done by using the os.kill() function. Note that although the function is called 'kill' it can send any signal (not only SIGKILL).
In order for this to work you need to have the process id of the running process. A commonly used way of knowing this is making your script save its process id when it launches into a file stored in a common location. To get the current process id you can use the os.getpid() function.
So summarizing I'd say that the steps to achieve what you want would be:
Modify your current script to store its process id (obtainable by using os.getpid()) into a file in a common location, for example /tmp/myscript.pid. Note that if you want your script to be protable you will need to address this in a way that works in non-unix like OSs like Windows.
Choose one signal (typically SIGINT or SIGSTOP or SIGTERM) and modify your script to register a custom handler using signal.signal() that addresses the graceful termination of your script.
Create another (note that it could be the same script with some command line paramater) script that reads the process id from the known file (aka /tmp/myscript.pid) and sends the chosen signal to that process using os.kill().
Note that an advantage of using signals to achieve this instead of an external way (files, pipes, etc.) is that the user can still press Ctrl+C (if you chose SIGINT) and that will produce the same behavior as the 'stop script' would.

What you're really looking for is any way to send a signal from one program to another, independent, program. One way to do this would be to use an inter-process pipe. Python has a module for this (which does, admittedly, seem to require a POSIX-compliant shell, but most major operating systems should provide that).
What you'll have to do is agree on a filepath beforehand between your running-program (let's say main.py) and your stopping-program (let's say stop.sh). Then you might make the main program run until someone inputs something to that pipe:
import pipes
...
t = pipes.Template()
# create a pipe in the first place
t.open("/tmp/pipefile", "w")
# create a lasting pipe to read from that
pipefile = t.open("/tmp/pipefile", "r")
...
And now, inside your program, change your loop condition to "as long as there's no input from this file - unless someone writes something to it, .read() will return an empty string:
while not pipefile.read():
# do stuff
To stop it, you put another file or script or something that will write to that file. This is easiest to do with a shell script:
#!/usr/bin/env sh
echo STOP >> /tmp/pipefile
which, if you're containerizing this, you could put in /usr/bin and name it stop, give it at least 0111 permissions, and tell your user "to stop the program, just do docker exec containername stop".
(using >> instead of > is important because we just want to append to the pipe, not to overwrite it).
Proof of concept on my python console:
>>> import pipes
>>> t = pipes.Template()
>>> t.open("/tmp/file1", "w")
<_io.TextIOWrapper name='/tmp/file1' mode='w' encoding='UTF-8'>
>>> pipefile = t.open("/tmp/file1", "r")
>>> i = 0
>>> while not pipefile.read():
... i += 1
...
At this point I go to a different terminal tab and do
$ echo "Stop" >> /tmp/file1
then I go back to my python tab, and the while loop is no longer executing, so I can check what happened to i while I was gone.
>>> print(i)
1704312

DEBUG only one function (or module), RUN rest of the program in Pycharm?

I will try to explain my question on simple example program (my problem is much more complex because my program is much more complex).
Lets suppose i have a program that has 2 lines, making 2 functions:
data = long_one() #takes 2 hours in DEBUG mode, 15min in RUN mode
short_one(data) #i want to DEBUG this one
Lets also say that it is very difficult to prepare the data variable and the only way to obtain it is by running the function long_one().
Is there a way to RUN long_one() and DEBUG short_one() in Pycharm?
In other words is there a way to perform either:
DEBUG with specification that long_one() should be processed in RUN mode
or RUN with specification that short_one() should be debugged?

As Asagen has proposed:
attached debugger to python console.
started my script in RUN mode
while script was running I did Tools/Attach to Process and have chosen my process.
The debugger has started from the moment i did this and stopped on first breakpoint it encountered.
There was one inconvenience - I had to know when to start debugging (in which moment to attach debugger to process). I propose a workaround:
Add to code an infininte loop in place you want to start debugging (see below):
data = long_one() # takes 2 hours in DEBUG mode, 15min in RUN mode
infinite_loop = True
print "OK man, it is the time to start debugging!"
while infinite_loop:
time.sleep(0.2) # add breakpoint here
short_one(data) #i want to DEBUG this one
Add a breakpoint inside the while loop
While running process, when you see in console the printed text "OK man, it is the time to start debugging!", attach debugger to process.
Next when it stops in the infinite loop, evaluate code fragment infinite_loop = False, so you leave the loop
It is it, you are now in DEBUG mode after running whole code before,
If you want to get back to RUN mode, just stop debugger. It is possible to switch between RUN and DEBUG as many times and in any places you want

import sys
def do_the_thing_for_debug():
print('doing the debug version')
def do_the_thing():
print('doing the release version')
if __name__ == '__main__':
"""
Detecting if you're in the PyCharm debugger or not
WARNING: This is a hacky & there is probably a better
way of doing this by looking at other items in the
global() namespace or environment for things that
begin with PyCharm and stuff.
Also, this could break in future versions of PyCharm!
If you put a breakpoint HERE and look at the callstack you will
see the entry point is in 'pydevd.py'
In debug mode, it copies off sys.argv: "sys.original_argv = sys.argv[:]"
We abuse this knowledge to test for the PyCharm debugger.
"""
if hasattr(sys, 'original_argv'):
do_the_thing = do_the_thing_for_debug
do_the_thing()
Output from PyCharm when I "run" it
doing the release version
Output from PyCharm when I "debug" it
doing the debug version

enable a script to stop itself when a command is issued from terminal

I have a script runReports.py that is executed every night. Suppose for some reason the script takes too long to execute, I want to be able to stop it from terminal by issuing a command like ./runReports.py stop.
I tried to implement this by having the script to create a temporary file when the stop command is issued.
The script checks for existence of this file before running each report.
If the file is there the script stops executing, else it continues.
But I am not able to find a way to make the issuer of the stop command aware that the script has stopped successfully. Something along the following lines:
$ ./runReports.py stop
Stopping runReports...
runReports.py stopped successfully.
How to achieve this?

For example if your script runs in loop, you can catch signal http://en.wikipedia.org/wiki/Unix_signal and terminate process:
import signal
class SimpleReport(BaseReport):
def __init__(self):
...
is_running = True
def _signal_handler(self, signum, frame):
is_running = False
def run(self):
signal.signal(signal.SIGUSR1, self._signal_handler) # set signal handler
...
while is_running:
print("Preparing report")
print("Exiting ...")
To terminate process just call kill -SIGUSR1 procId

You want to achieve inter process communication. You should first explore the different ways to do that : system V IPC (memory, very versatile, possibly baffling API), sockets (including unix domain sockets)(memory, more limited, clean API), file system (persistent on disk, almost architecture independent), and choose yours.
As you are asking about files, there are still two ways to communicate using files : either using file content (feature rich, harder to implement), or simply file presence. But the problem using files, is that is a program terminates because of an error, it may not be able to write its ended status on the disk.
IMHO, you should clearly define what are your requirements before choosing file system based communication (testing the end of a program is not really what it is best at) unless you also need architecture independence.
To directly answer your question, the only reliable way to know if a program has ended if you use file system communication is to browse the list of currently active processes, and the simplest way is IMHO to use ps -e in a subprocess.

Instead of having a temporary file, you could have a permanent file(config.txt) that has some tags in it and check if the tag 'running = True'.
To achieve this is quiet simple, if your code has a loop in it (I imagine it does), just make a function/method that branches a check condition on this file.
def continue_running():
with open("config.txt") as f:
for line in f:
tag, condition = line.split(" = ")
if tag == "running" and condition == "True":
return True
return False
In your script you will do this:
while True: # or your terminal condition
if continue_running():
# your regular code goes here
else:
break
So all you have to do to stop the loop in the script is change the 'running' to anything but "True".

Thread-Switching in PyDbg

I've tried posting this in the reverse-engineering stack-exchange, but I thought I'd cross-post it here for more visibility.
I'm having trouble switching from debugging one thread to another in pydbg. I don't have much experience with multithreading, so I'm hoping that I'm just missing something obvious.
Basically, I want to suspend all threads, then start single stepping in one thread. In my case, there are two threads.
First, I suspend all threads. Then, I set a breakpoint on the location where EIP will be when thread 2 is resumed. (This location is confirmed by using IDA). Then, I enable single-stepping as I would in any other context, and resume Thread 2.
However, pydbg doesn't seem to catch the breakpoint exception! Thread 2 seems to resume and even though it MUST hit that address, there is no indication that pydbg is catching the breakpoint exception. I included a "print "HIT BREAKPOINT" inside pydbg's internal breakpoint handler, and that never seems to be called after resuming Thread 2.
I'm not too sure about where to go next, so any suggestions are appreciated!
dbg.suspend_all_threads()
print dbg.enumerate_threads()[0]
oldcontext = dbg.get_thread_context(thread_id=dbg.enumerate_threads()[0])
if (dbg.disasm(oldcontext.Eip) == "ret"):
print disasm_at(dbg,oldcontext.Eip)
print "Thread EIP at a ret"
addrstr = int("0x"+(dbg.read(oldcontext.Esp + 4,4))[::-1].encode("hex"),16)
print hex(addrstr)
dbg.bp_set(0x7C90D21A,handler=Thread_Start_bp_Handler)
print dbg.read(0x7C90D21A,1).encode("hex")
dbg.bp_set(oldcontext.Eip + dbg.instruction.length,handler=Thread_Start_bp_Handler)
dbg.set_thread_context(oldcontext,thread_id=dbg.enumerate_threads()[0])
dbg.context = oldcontext
dbg.resume_thread(dbg.enumerate_threads()[0])
dbg.single_step(enable=True)
return DBG_CONTINUE
Sorry about the "magic numbers", but they are correct as far as I can tell.

One of your problems is that you try to single step through Thread2 and you only refer to Thread1 in your code:
dbg.enumerate_threads()[0] # <--- Return handle to the first thread.
In addition, the code the you posted is not reflective of the complete structure of your script, which makes it hard to judge wether you have other errors or not. You also try to set breakpoint within the sub-brach that disassembles your instructions, which does not make a lot of sense to me logically. Let me try to explain what I know, and lay it out in an organized manner. That way you might look back at your code, re-think it and correct it.
Let's start with basic framework of debugging an application with pydbg:
Create debugger instance
Attache to the process
Set breakpoints
Run it
Breakpoint gets hit - handle it.
This is how it could look like:
from pydbg import *
from pydbg.defines import *
# This is maximum number of instructions we will log
MAX_INSTRUCTIONS = 20
# Address of the breakpoint
func_address = "0x7C90D21A"
# Create debugger instance
dbg = pydbg()
# PID to attach to
pid = int(raw_input("Enter PID: "))
# Attach to the process with debugger instance created earlier.
# Attaching the debugger will pause the process.
dbg.attach(pid)
# Let's set the breakpoint and handler as thread_step_setter,
# which we will define a little later...
dbg.bp_set(func_address, handler=thread_step_setter)
# Let's set our "personalized" handler for Single Step Exception
# It will get triggered if execution of a thread goes into single step mode.
dbg.set_callback(EXCEPTION_SINGLE_STEP, single_step_handler)
# Setup is done. Let's run it...
dbg.run()
Now having the basic structure, let's define our personalized handlers for breakpoint and single stepping. The code snippet below defines our "custom" handlers. What will happen is when breakpoint hits we will iterate through threads and set them to single step mode. It will in turn trigger single step exception, which we will handle and disassemble MAX_INSTRUCTIONS amount of instructions:
def thread_step_setter(dbg):
dbg.suspend_all_threads()
for thread_id in dbg.enumerate_threads():
print "Single step for thread: 0x%08x" % thread_id
h_thread = dbg.open_thread(thread_id)
dbg.single_step(True, h_thread)
dbg.close_handle(h_thread)
# Resume execution, which will pass control to step handler
dbg.resume_all_threads()
return DBG_CONTINUE
def single_step_handler(dbg):
global total_instructions
if instructions == MAX_INSTRUCTION:
dbg.single_step(False)
return DBG_CONTINUE
else:
# Disassemble the instruction
current_instruction = dbg.disasm(dbg.context,Eip)
print "#%d\t0x%08x : %s" % (total_instructions, dbg.context.Eip, current_instruction)
total_instructions += 1
dbg.single_step(True)
return DBG_CONTINUE
Discloser: I do not guarantee that the code above will work if copied and pasted. I typed it out and haven't tested it. However, if basic understanding is acquired, the small syntactical error could be easily fixed. I apologize in advanced if I have any. I don't currently have means or time to test it.
I really hope it helps you out.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.