How to add more metrics to a finished MLflow run?

Once an MLflow run is finished, external scripts can access its parameters and metrics using the Python MLflow client and the mlflow.get_run(run_id) method, but the Run object returned by get_run appears to be read-only.
Specifically, .log_param, .log_metric, and .log_artifact cannot be called on the object returned by get_run, raising errors like this:
AttributeError: 'Run' object has no attribute 'log_param'
If we attempt to call any of the .log_* methods on the mlflow module itself, they log to a new run with an auto-generated run ID in the Default experiment.
Example:
final_model_mlflow_run = mlflow.get_run(final_model_mlflow_run_id)
with mlflow.ActiveRun(run=final_model_mlflow_run) as myrun:
    # this read operation uses the correct run
    run_id = myrun.info.run_id
    print(run_id)
    # this write operation writes to a new run
    # (with an auto-generated random run ID)
    # in the "Default" experiment (with exp. ID of 0)
    mlflow.log_param("test3", "This is a test")
Note that the above problem exists regardless of the run status (.info.status can be either "FINISHED" or "RUNNING"; it makes no difference).
I wonder whether this read-only behavior is by design (immutable modeling runs do improve experiment reproducibility)? I can appreciate that, but it also works against code modularity if everything has to happen inside a single monolithic with mlflow.start_run() context...

As Hans Bambel pointed out to me, and as documented here, mlflow.start_run (in contrast to mlflow.ActiveRun) accepts the run_id parameter of an existing run.
Here's an example, tested to work in v1.13 through v1.19. As you can see, one can even overwrite an existing metric to correct a mistake:
with mlflow.start_run(run_id=final_model_mlflow_run_id):
    # print(mlflow.active_run().info)
    mlflow.log_param("start_run_test", "This is a test")
    mlflow.log_metric("start_run_test", 1.23)
    mlflow.log_metric("start_run_test", 1.33)
    mlflow.log_artifact("/home/jovyan/_tmp/formula-features-20201103.json", "start_run_test")
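Alternatively, the lower-level MlflowClient takes an explicit run ID for its logging methods, so no run context is needed at all. A minimal sketch (not from the original answer), assuming final_model_mlflow_run_id is defined as above:
from mlflow.tracking import MlflowClient

client = MlflowClient()  # uses the currently configured tracking URI
client.log_param(final_model_mlflow_run_id, "client_test", "This is a test")
client.log_metric(final_model_mlflow_run_id, "client_test", 4.56)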

Related

Keep my test data when running TestCase in Django

I have an initial_data command in my project. When I try to create my test database and run this command ("py manage.py initial_data"), it doesn't work, because the command creates data in the main database rather than the test database; actually, I can't tell which one is the test database.
So I gave up on using this command and found another way to solve my problem:
I decided to create my initial data manually.
The initial_data command creates the data required by my other tables.
And why do I need to create this data?
Because I have more than 500 tests, and these tests depend on other test data.
So I have these tests:
class BankServiceTest(TestCase):
    def setUp(self):
        self.bank_model = models.Bank
        self.country_model = models.Country

    def test_create_bank_object(self):
        self.bank_model.objects.create(name='bank')
        re1 = self.bank_model.objects.get(name='bank')
        self.assertEqual(re1.name, 'bank')

    def test_create_country_object(self):
        self.country_model.objects.create(name='country', code=100)
        bank = self.bank_model.objects.get(name='bank')
        re1 = self.country_model.objects.get(code=100)
        self.assertEqual(re1.name, 'country')
After running test_create_bank_object, I want the data it created to still be available in the second test method; effectively, this does by hand what the initial_data command does.
So what is the right approach to this problem?
I tried unittest.TestCase, but it didn't work for me: it actually creates the test data in the main database, and I don't want that. Or maybe I used it badly.
What I actually need is to keep my data while the tests run.
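For reference, Django's TestCase wraps every test method in a transaction that is rolled back afterwards, which is why rows created in one test never survive into the next. The documented way to create shared fixtures once per class is setUpTestData; a minimal sketch, assuming the models from the question:
from django.test import TestCase
from myapp import models  # hypothetical import; use your actual app

class BankServiceTest(TestCase):
    @classmethod
    def setUpTestData(cls):
        # Runs once for the whole class; every test method sees these rows,
        # while each test still gets its own rolled-back transaction.
        models.Bank.objects.create(name='bank')
        models.Country.objects.create(name='country', code=100)

    def test_bank_object(self):
        self.assertEqual(models.Bank.objects.get(name='bank').name, 'bank')

    def test_country_object(self):
        self.assertEqual(models.Country.objects.get(code=100).name, 'country')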

How do you get the run parameters and runId within Databricks notebook?

When running a Databricks notebook as a job, you can specify job or run parameters that can be used within the code of the notebook. However, it wasn't clear from the documentation how you actually fetch them. I'd like to be able to get all the parameters as well as the job id and run id.
Job/run parameters
When the notebook is run as a job, then any job parameters can be fetched as a dictionary using the dbutils package that Databricks automatically provides and imports. Here's the code:
run_parameters = dbutils.notebook.entry_point.getCurrentBindings()
If the job parameters were {"foo": "bar"}, then the result of the code above gives you the dict {'foo': 'bar'}. Note that Databricks only allows job parameter mappings of str to str, so keys and values will always be strings.
Note that if the notebook is run interactively (not as a job), then the dict will be empty. The getCurrentBindings() method also appears to work for getting any active widget values for the notebook (when run interactively).
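Because everything arrives as a string, you may want to cast values explicitly. A small sketch (the parameter names below are hypothetical):
params = dict(dbutils.notebook.entry_point.getCurrentBindings())
# All keys and values are strings; cast where a typed value is needed.
max_depth = int(params.get("max_depth", "5"))              # hypothetical parameter
learning_rate = float(params.get("learning_rate", "0.1"))  # hypothetical parameter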
Getting the jobId and runId
To get the jobId and runId you can get a context json from dbutils that contains that information. (Adapted from databricks forum):
import json
context_str = dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
context = json.loads(context_str)
run_id_obj = context.get('currentRunId', {})
run_id = run_id_obj.get('id', None) if run_id_obj else None
job_id = context.get('tags', {}).get('jobId', None)
So within the context object, the path of keys for runId is currentRunId > id and the path of keys to jobId is tags > jobId.
Nowadays you can easily get the parameters from a job through the widget API. This is pretty well described in the official documentation from Databricks. Below, I'll elaborate on the steps you have to take to get there; it is fairly easy.
Create or use an existing notebook that has to accept some parameters. We want to know the job_id and run_id, and let's also add two user-defined parameters environment and animal.
# Get parameters from job
job_id = dbutils.widgets.get("job_id")
run_id = dbutils.widgets.get("run_id")
environment = dbutils.widgets.get("environment")
animal = dbutils.widgets.get("animal")
print(job_id)
print(run_id)
print(environment)
print(animal)
Now let's go to Workflows > Jobs to create a parameterised job. Make sure you select the correct notebook and specify the parameters for the job at the bottom. According to the documentation, we need to use curly brackets for the parameter values of job_id and run_id. For the other parameters, we can pick a value ourselves.
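For example, the job's parameter mapping could look like this (the {{job_id}} and {{run_id}} placeholders are substituted by Databricks at run time; the environment and animal values are just the ones used in this example):
{
  "job_id": "{{job_id}}",
  "run_id": "{{run_id}}",
  "environment": "dev",
  "animal": "squirrel"
}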
Note: The reason you are not allowed to read the job_id and run_id directly from the notebook is security (as you can see from the stack trace when you try to access the attributes of the context). Within a notebook you are in a different context; those parameters live at a "higher" context.
Run the job and observe that it outputs something like:
dev
squirrel
137355915119346
7492
Command took 0.09 seconds
You can even set default parameters in the notebook itself; these will be used if you run the notebook directly or if the notebook is triggered from a job without parameters. This makes testing easier and lets you give certain values sensible defaults.
# Adding widgets to a notebook
dbutils.widgets.text("environment", "tst")
dbutils.widgets.text("animal", "turtle")
# Removing widgets from a notebook
dbutils.widgets.remove("environment")
dbutils.widgets.remove("animal")
# Or removing all widgets from a notebook
dbutils.widgets.removeAll()
And last but not least, I tested this on different cluster types and so far have found no limitations. My current settings are:
spark.databricks.cluster.profile serverless
spark.databricks.passthrough.enabled true
spark.databricks.pyspark.enableProcessIsolation true
spark.databricks.repl.allowedLanguages python,sql

CANoe: How to select and start test cases from XML Test Module from Python using CANoe COM interface?

currently I am able to:
start CANoe application
load a CANoe configuration file
load a test setup file
def load_test_setup(self, canoe_test_setup_file: str = None) -> None:
    logger.info(
        f'Loading CANoe test setup file <{canoe_test_setup_file}>.')
    if self.measurement.Running:
        logger.info(
            'Simulation is currently running, so a new test setup '
            'could not be loaded!')
        return
    self.test_setup.TestEnvironments.Add(canoe_test_setup_file)
    test_environment = self.test_setup.TestEnvironments.Item(1)
    logger.info(f'Loaded test environment is <{test_environment.Name}>.')
How can I access the XML Test Module loaded with the test setup (tse) file and select tests to be executed?
The second-to-last line in your snippet is most probably causing the issue.
I have been trying to fix this issue for quite some time now and finally found the solution.
Somehow, when you execute the line self.test_setup.TestEnvironments.Item(1),
win32com creates an object of type TestSetupItem, which doesn't have the necessary properties or methods to access the test cases. Instead, we want objects of the collection types TestSetupFolders or TestModules. win32com creates a TestSetupItem object even though I have a single XML Test Module (called AutomationTestSeq) in the Test Environment.
There are three possible solutions that I found.
Manually clearing the generated cache before each run.
Using win32com.client.DispatchWithEvents or win32com.client.gencache.EnsureDispatch generates a bunch of Python files that describe CANoe's object model.
If you have used either of those before, TestEnvironments.Item(1) will always return a TestSetupItem instead of an object of the more appropriate type.
To remove the cache you need to delete the C:\Users\{username}\AppData\Local\Temp\gen_py\{python version} folder.
Doing this every time is of course not very practical.
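If you do want to script the cleanup, win32com exposes the cache location as win32com.__gen_path__; a small sketch (my own addition, not from the answer):
import shutil
import win32com

# Delete the generated gen_py cache so the next Dispatch starts clean.
shutil.rmtree(win32com.__gen_path__, ignore_errors=True)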
Force win32com to always use dynamic dispatch.
You can do this by using:
canoe = win32com.client.dynamic.Dispatch("CANoe.Application")
Any objects you create using canoe from now on, will be dynamically dispatched.
Forcing dynamic dispatch is easier than manually clearing the cache folder every time, and it has always given me good results. But it will not give you any insight into the objects: you won't be able to see their acceptable properties and methods.
Typecast TestSetupItem to TestSetupFolders or TestModules.
This carries the risk that an incorrect typecast will give unexpected results, but it has worked well for me so far.
In short: win32.CastTo(test_env, "ITestEnvironment2"). This ensures that you are using the recommended object hierarchy as per the CANoe technical reference.
Note that you will also have to typecast TestSequenceItem to TestCase to be able to access test case verdict and enable/disable test cases.
Below is a decent example script.
"""Execute XML Test Cases without a pass verdict"""
import sys
from time import sleep
import win32com.client as win32
CANoe = win32.DispatchWithEvents("CANoe.Application")
CANoe.Open("canoe.cfg")
test_env = CANoe.Configuration.TestSetup.TestEnvironments.Item('Test Environment')
# Cast required since test_env is originally of type <ITestEnvironment>
test_env = win32.CastTo(test_env, "ITestEnvironment2")
# Get the XML TestModule (type <TSTestModule>) in the test setup
test_module = test_env.TestModules.Item('AutomationTestSeq')
# {.Sequence} property returns a collection of <TestCases> or <TestGroup>
# or <TestSequenceItem> which is more generic
seq = test_module.Sequence
for i in range(1, seq.Count+1):
# Cast from <ITestSequenceItem> to <ITestCase> to access {.Verdict}
# and the {.Enabled} property
tc = win32.CastTo(seq.Item(i), "ITestCase")
if tc.Verdict != 1: # Verdict 1 is pass
tc.Enabled = True
print(f"Enabling Test Case {tc.Ident} with verdict {tc.Verdict}")
else:
tc.Enabled = False
print(f"Disabling Test Case {tc.Ident} since it has already passed")
CANoe.Measurement.Start()
sleep(5) # Sleep because measurement start is not instantaneous
test_module.Start()
sleep(1)
Just continue from what you have already done.
The TestEnvironment contains the TestModules. Each TestModule contains a TestSequence, which in turn contains the TestCases.
Keep in mind that you cannot start individual TestCases, only the whole TestModule. But you can enable and disable individual TestCases before execution by using the COM API.
(typing this off the top of my head, might not work 100%)
test_module = test_environment.TestModules.Item(1)  # or 2 or whatever
test_sequence = test_module.Sequence
for i in range(1, test_sequence.Count + 1):
    test_case = test_sequence.Item(i)
    if ...:
        test_case.Enabled = False  # or True
test_module.Start()
You have to keep in mind that a TestSequence can also contain other TestSequences (i.e. a TestGroup), depending on how your TestModule is set up. If so, you have to take care of that in your loop and descend into these TestGroups while searching for your TestCase of interest; a rough recursive sketch follows.
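A sketch of such a descent (my own addition, not from the answer): it assumes, as in the first answer, that a TestGroup exposes the same Count/Item interface as the top-level sequence and that leaf items can be cast to ITestCase with a Name property; the type probing may need adjusting for your CANoe version.
def find_test_case(sequence, name):
    """Depth-first search of a test sequence for a test case by name."""
    for i in range(1, sequence.Count + 1):
        item = sequence.Item(i)
        if hasattr(item, 'Count'):  # a TestGroup: recurse into its items
            found = find_test_case(item, name)
            if found is not None:
                return found
        else:  # a leaf item: treat it as a test case
            tc = win32.CastTo(item, "ITestCase")
            if tc.Name == name:
                return tc
    return None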

Error when changing instance type in a python for loop

I have a Python 2 script which uses the boto3 library.
Basically, I have a list of instance ids and I need to iterate over it, changing the type of each instance from c4.xlarge to t2.micro.
To accomplish that task, I'm calling the modify_instance_attribute method.
I don't know why, but my script dies with the following error message:
EBS-optimized instances are not supported for your requested configuration.
Here is my general scenario:
Say I have a piece of code like this one below:
import boto3

def change_instance_type(instance_id):
    client = boto3.client('ec2')
    response = client.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={
            'Value': 't2.micro'
        }
    )
So, if I execute it like this:
change_instance_type('id-929102')
everything works with no problem at all.
However, strangely enough, if I execute it in a for loop like the following:
instances_list = ['id-929102']
for instance_id in instances_list:
    change_instance_type(instance_id)
I get the error message above (i.e., EBS-optimized instances are not supported for your requested configuration) and my script dies.
Any idea why this happens?
When I look at EBS-optimized instances, I don't see that t2.micro is supported:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSOptimized.html
I think you would need to set EbsOptimized={'Value': False} as well.
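A minimal sketch of that combination, building on the answer above: modify_instance_attribute accepts only one attribute per call, so the EBS flag and the instance type are changed in two separate calls.
import boto3

def downsize_instance(instance_id):
    client = boto3.client('ec2')
    # Disable EBS optimization first; t2.micro does not support it.
    client.modify_instance_attribute(
        InstanceId=instance_id,
        EbsOptimized={'Value': False}
    )
    # Only one attribute can be modified per call, so change the type separately.
    client.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={'Value': 't2.micro'}
    )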

Why can't I close an open ADODB.Recordset?

My Python script uses an ADODB.Recordset object. I use an ADODB.Command object with a collection of ADODB.Parameter objects to update a record in the set. After that, I check the state of the recordset, and it was 1, which is adStateOpen. But when I call MyRecordset.Close(), I get an exception complaining that the operation is invalid in the set's current state. What state could an open recordset be in that would make it invalid to close it, and what can I do to fix it?
Code is scattered between a couple of files. I'll work on getting an illustration together.
Yes, that was the problem. Once I change the value of one of a recordset's ADODB.Field objects, I have to either update the recordset using ADODB.Recordset.Update() or call CancelUpdate() before closing it.
The reason I'm going through all this rigmarole with the ADODB.Command object is that ADODB.Recordset.Update() fails at seemingly random times, complaining that "query-based update failed because row to update cannot be found". I've never been able to predict when that will happen or find a reliable way to keep it from happening. My only choice when it happens is to replace the ADODB.Recordset.Update() call with the construction of a complete update query and execute it using an ADODB.Connection or ADODB.Command object.
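For illustration, a minimal sketch of the fix in a win32com-based script (my own sketch; the connection string, table, and column names are placeholders). The recordset's EditMode property tells you whether an edit is still pending before Close():
import win32com.client as win32

conn = win32.Dispatch('ADODB.Connection')
conn.Open('your-connection-string-here')  # placeholder

rs = win32.Dispatch('ADODB.Recordset')
rs.Open('SELECT * FROM SomeTable', conn, 1, 3)  # adOpenKeyset, adLockOptimistic

rs.Fields('SomeColumn').Value = 'new value'  # leaves a pending edit on the row

if rs.EditMode != 0:  # 0 is adEditNone: no pending edit
    rs.Update()       # or rs.CancelUpdate() to discard the change

rs.Close()  # now valid: no pending edit remains on the recordset
conn.Close()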
