Like the question says, I want to know if/how you can set a Databricks widget input using a variable instead of a hard-coded value. I have two notebooks. One needs to apply a filter to some values. The other needs to run some code, then optionally (as dictated by another widget) apply that same filter.
Here's some example code (modified for simplicity/privacy).
In Notebook2 we have:
start = dbutils.widgets.get("startDate")
filter_condition = None
if start:
    filter_condition = f"GeneratedDate >= '{start}'"
foo = important_function(filter_condition)
%run ./Notebook1 $run_training="True" $num_trials=100 $filter_string=filter_condition
where I want filter_string to receive the value of the variable defined above, not the literal string "filter_condition".
In Notebook1, there's some code like:
if run_training == "True":
    bar = optimize_model(datasets, grid, int(num_trials))
elif run_training == "False":
    baz = apply_filter(filter_string)
else:
    raise ValueError("run_training must be 'True' or 'False'")
You're probably looking for the dbutils.notebook.run function of the Databricks utilities package, rather than the %run command:
dbutils.notebook.run(path='./Notebook1',
                     timeout_seconds=300,
                     arguments={'run_training': 'True',
                                'num_trials': '100',
                                'filter_string': filter_condition})
The notebook will be run as an "ephemeral" job. Note that the notebook runs in a separate notebook environment, so any variables it creates will not be brought back into the notebook you ran it from. Your input arguments come through as widget values, which can be accessed using:
num_trials = dbutils.widgets.get('num_trials')
etc. I think you are already doing that though.
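One caveat, illustrated below with the names from the question (this sketch is mine, not tested against your notebooks): the arguments mapping is str-to-str, so a filter_condition of None has to be converted to a string, e.g. an empty one, and Notebook1 can then treat an empty filter_string as "no filter":
# Caller side: widget arguments must be strings, so map None to ""
start = dbutils.widgets.get("startDate")
filter_condition = f"GeneratedDate >= '{start}'" if start else ""

dbutils.notebook.run(path='./Notebook1',
                     timeout_seconds=300,
                     arguments={'run_training': 'False',
                                'num_trials': '100',
                                'filter_string': filter_condition})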
When running a Databricks notebook as a job, you can specify job or run parameters that can be used within the code of the notebook. However, it wasn't clear from the documentation how you actually fetch them. I'd like to be able to get all the parameters as well as the job id and run id.
Job/run parameters
When the notebook is run as a job, any job parameters can be fetched as a dictionary using the dbutils package that Databricks automatically provides and imports. Here's the code:
run_parameters = dbutils.notebook.entry_point.getCurrentBindings()
If the job parameters were {"foo": "bar"}, then the result of the code above gives you the dict {'foo': 'bar'}. Note that Databricks only allows job parameter mappings of str to str, so keys and values will always be strings.
Note that if the notebook is run interactively (not as a job), then the dict will be empty. The getCurrentBindings() method also appears to work for getting any active widget values for the notebook (when run interactively).
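For example (the parameter name and fallback here are hypothetical), you can read an individual parameter with a default so the same cell also works interactively:
# Hypothetical usage: fall back to a default when the notebook is run
# interactively and the bindings are empty.
run_parameters = dbutils.notebook.entry_point.getCurrentBindings()
foo = run_parameters.get("foo", "fallback-value")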
Getting the jobId and runId
To get the jobId and runId, you can fetch a context JSON from dbutils that contains that information (adapted from the Databricks forum):
import json
context_str = dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
context = json.loads(context_str)
run_id_obj = context.get('currentRunId', {})
run_id = run_id_obj.get('id', None) if run_id_obj else None
job_id = context.get('tags', {}).get('jobId', None)
So within the context object, the path of keys for runId is currentRunId > id and the path of keys to jobId is tags > jobId.
Nowadays you can easily get the parameters from a job through the widget API. This is pretty well described in the official Databricks documentation. Below, I'll elaborate on the steps you have to take to get there; it is fairly easy.
Create or use an existing notebook that has to accept some parameters. We want to know the job_id and run_id, and let's also add two user-defined parameters environment and animal.
# Get parameters from job
job_id = dbutils.widgets.get("job_id")
run_id = dbutils.widgets.get("run_id")
environment = dbutils.widgets.get("environment")
animal = dbutils.widgets.get("animal")
print(job_id)
print(run_id)
print(environment)
print(animal)
Now let's go to Workflows > Jobs to create a parameterised job. Make sure you select the correct notebook and specify the parameters for the job at the bottom. According to the documentation, we need to use curly brackets for the parameter values of job_id and run_id. For the other parameters, we can pick a value ourselves.
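For instance, the job's parameter mapping could look like this (assuming the {{...}} task-parameter-variable syntax from the docs; the last two values are just the ones used in this example):
job_id: {{job_id}}
run_id: {{run_id}}
environment: dev
animal: squirrel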
Note: The reason you are not allowed to get the job_id and run_id directly from the notebook is security (as you can see from the stack trace when you try to access the attributes of the context). Within a notebook you are in a different context; those parameters live at a "higher" context.
Run the job and observe that it outputs something like:
137355915119346
7492
dev
squirrel
Command took 0.09 seconds
You can even set default parameters in the notebook itself; they will be used if you run the notebook directly or if the notebook is triggered from a job without parameters. This makes testing easier and lets you provide defaults for certain values.
# Adding widgets to a notebook
dbutils.widgets.text("environment", "tst")
dbutils.widgets.text("animal", "turtle")
# Removing widgets from a notebook
dbutils.widgets.remove("environment")
dbutils.widgets.remove("animal")
# Or removing all widgets from a notebook
dbutils.widgets.removeAll()
And last but not least, I tested this on different cluster types; so far I have found no limitations. My current settings are:
spark.databricks.cluster.profile serverless
spark.databricks.passthrough.enabled true
spark.databricks.pyspark.enableProcessIsolation true
spark.databricks.repl.allowedLanguages python,sql
I am working on this Python notebook in Google Colab:
https://github.com/AllenDowney/ModSimPy/blob/master/notebooks/chap01.ipynb
I had to change the configuration line because the one stated in the original was erroring out:
# Configure Jupyter to display the assigned value after an assignment
# The line below is commented out because it errors out in Colab
# %config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
# Edited solution given below
%config InteractiveShell.ast_node_interactivity='last_expr'
However, I think the original statement was meant to show the values of assignments (if I'm not mistaken), so that when I run the following cell in the notebook, I should see an output:
meter = UNITS.meter
second = UNITS.second
a = 9.8 * meter / second**2
If so, how can I make the notebook on Google Colab show the output of assignments?
The short answer is: you cannot show the output of assignments in Colab.
Your confusion comes from how Google Colab works. The original script is meant to run in IPython, but Colab is not a regular IPython. When you run an IPython shell, your %config InteractiveShell.ast_node_interactivity options are (citing the documentation):
‘all’, ‘last’, ‘last_expr’ , ‘last_expr_or_assign’ or ‘none’,
specifying which nodes should be run interactively (displaying output
from expressions). ‘last_expr’ will run the last node interactively
only if it is an expression (i.e. expressions in loops or other blocks
are not displayed) ‘last_expr_or_assign’ will run the last expression
or the last assignment. Other values for this parameter will raise a
ValueError.
all will display every bare expression, but not the assignments, for example:
x = 5
x
y = 7
y
Out[]:
5
7
The differences between the options become more significant when you want to display variables in a loop.
In Colab your options are restricted to ['all', 'last', 'last_expr', 'none']. If you select all, the result for the above cell will be
Out[]:
57
Summarizing all that: there is no way of showing the result of an assignment in Colab. Your only option (AFAIK) is to add the variable you want to see at the end of the cell where it is assigned (which is similar to a regular print):
meter = UNITS.meter
second = UNITS.second
a = 9.8 * meter / second**2
a
Google Colab has not yet been upgraded to the latest IPython version; if you explicitly upgrade with
!pip install -U ipython
then last_expr_or_assign will work.
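A minimal sketch of that approach (note: Colab typically needs a runtime restart after the upgrade before the new IPython version is picked up):
!pip install -U ipython
# After restarting the runtime:
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'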
Well, the easy way is to just wrap your values in print statements, like:
print(meter)
print(second)
print(a)
However, if you want to do it the Jupyter way, it looks like the answer is:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
Found the above from this link: https://stackoverflow.com/a/36835741/7086982
As mentioned by others, it doesn't work because last_expr_or_assign was introduced in IPython v6.1, and Colab is using v5.x. Upgrading the IPython version on Colab might cause some instability (warning shown by Colab):
WARNING: Upgrading ipython, ipykernel, tornado, prompt-toolkit or pyzmq can
cause your runtime to repeatedly crash or behave in unexpected ways and is not
recommended
One other solution is to use an extension such as ipydex, which provides magic comments (like ##:, ##:T, ##:S) that cause either the return value or the right-hand side of an assignment on a line to be displayed.
!pip install ipydex
%load_ext ipydex.displaytools
e.g.
a = 4
c = 5 ##:
Output:
c := 5
---
I want to programmatically create several cells in a Jupyter notebook.
With this function I can create one cell
def create_new_cell(contents):
    from IPython.core.getipython import get_ipython
    shell = get_ipython()
    shell.set_next_input(contents, replace=False)
But if I try to call it several times, for instance, from a for loop, like so
for x in ['a', 'b', 'c']:
    create_new_cell(x)
It will only create one cell with the last item in the list. I've tried to find if there's a "flush" function or something similar but did not succeed.
Does anyone know how to properly write several cells programmatically?
I dug a bit more into the code of shell.payload_manager and found out that the current implementation of set_next_input does not pass the single argument to the shell.payload_manager.write_payload function. That prevents the notebook from creating several cells, since they all have the same source (the set_next_input function, in this case).
That being said, the following function works. It's basically the code from the write_payload function, with the single parameter set to False.
def create_new_cell(contents):
    from IPython.core.getipython import get_ipython
    shell = get_ipython()
    payload = dict(
        source='set_next_input',
        text=contents,
        replace=False,
    )
    shell.payload_manager.write_payload(payload, single=False)
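With this version, the loop from the question should create one cell per item:
for x in ['a', 'b', 'c']:
    create_new_cell(x)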
Hope this helps someone out there ;)
from IPython.display import display, Javascript

def add_cell(text, type='code', direct='above'):
    text = text.replace('\n', '\\n').replace('"', '\\"').replace("'", "\\'")
    display(Javascript('''
        var cell = IPython.notebook.insert_cell_{}("{}")
        cell.set_text("{}")
    '''.format(direct, type, text)))
for i in range(3):
    add_cell(f'# heading{i}', 'markdown')
    add_cell(f'code {i}')
The code above will insert three markdown heading cells and three code cells above the current cell.
My question is about widgets used to pass parameters in Databricks. I am using widgets in one notebook to set parameters. Then, I am running this initial notebook from other notebooks. I want the chosen parameters to be pulled in.
For example, in Notebook 1 I have:
dbutils.widgets.dropdown("start_year", "2011", [str(x) for x in range(2008, 2021)], "Earliest year")
start_year = dbutils.widgets.get("start_year")
print("The start_year is " + dbutils.widgets.get("start_year"))
The print statement correctly prints whatever year the user has selected.
In Notebook 2, which runs Notebook 1 using %run, it will only print the default year, in this case 2011, no matter what is selected. What am I doing wrong? Thanks!
From the widget docs (all the way at the bottom):
If you run a notebook that contains widgets, the specified notebook is run with the widget’s default values. You can also pass in values to widgets. For example:
%run /path/to/notebook $X="10" $Y="1"
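Applied to your example, Notebook 2 would pass the selected year explicitly (the path and value here are hypothetical):
%run ./Notebook1 $start_year="2015"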
Have you considered having "notebook 1" store the value, at least temporarily, in DBFS rather than in a variable? Then you can retrieve whatever was set in "notebook 1".
i.e. Assuming the user must visit and set the parameter in Notebook 1
Notebook 1
start_year = dbutils.widgets.get("start_year")
myTmpTxt = "The start_year is " + dbutils.widgets.get("start_year")
print(myTmpTxt)

with open('/dbfs/mnt/tmp/tmpStoreYear.txt', 'w') as savedYear:
    savedYear.write(myTmpTxt)
Notebook 2
openQuery = open('/dbfs/mnt/tmp/tmpStoreYear.txt', 'r')  # written as text, so read as text
openQuery.read()
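A variation (my suggestion, not from the original): store only the raw value, so Notebook 2 can reuse it directly as a parameter instead of parsing it out of a sentence:
# Notebook 1: store just the widget value
with open('/dbfs/mnt/tmp/tmpStoreYear.txt', 'w') as savedYear:
    savedYear.write(start_year)

# Notebook 2: read it back
with open('/dbfs/mnt/tmp/tmpStoreYear.txt') as savedYear:
    start_year = savedYear.read().strip()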
Also, there is an option to specify widget parameters using dbutils.notebook.run (docs).
However, it still runs as a separate job, and the values won't be imported to your "notebook 2"
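A sketch of that call for this case (the notebook path and year are hypothetical):
dbutils.notebook.run('./Notebook 1',
                     timeout_seconds=60,
                     arguments={'start_year': '2015'})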
Hope this helps!
I am working on a mini GUI project. I am currently struggling to figure out how to get the selected value from the list and then return that value to the main function so that I can use it somewhere else. Can someone help me, please?
####
self.device_list_store = gtk.ListStore(str, str, str, str, str)
for device in self.get_dev_list():
    self.device_list_store.append(list(device))
device_list_treeview = gtk.TreeView(self.device_list_store)
selected_row = device_list_treeview.get_selection()
selected_row.connect("changed", self.item_selected)
####
def item_selected(self, selection):
    model, row = selection.get_selected()
    if row is not None:
        selected_device = model[row][0]
At the moment, the item_selected function is not returning anything. I want to return selected_device back to the main function so I can use it in other functions as well.
EDIT: I've edited the code above to remove formatting errors. @jcoppens
As you can see in the documentation, the item_selected function is called with one parameter, tree_selection. But if you define the function inside a class, it requires the self parameter too, which is normally added automatically. In your (confusing) example, there is no class defined, so I suspect the problem is that your program is incomplete.
Also, I suspect you don't want device_list_treeview = gtk.T... in the for loop:
for device in self.get_dev_list():
    self.device_list_store.append(list(device))
    device_list_treeview = gtk.TreeView(self.device_list_store)
And I suspect you want selected_device = mod... indented below the if:
if row is not None:
    selected_device = model[row][0]
Please convert your example into a complete program, formatted correctly.
BTW: item_selected is not a good name for the signal handler. It is also called if the item is unselected (which is why the signal is called 'changed')!
And important: even though you should first read the basic Python tutorials and Gtk tutorials, you should then consider using lazka's excellent reference for all the Python APIs. There's a link on the page to download it completely and have it at hand on your computer.
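To address the "return it to the main function" part directly: a signal handler can't return a value to the rest of your program, but it can store the selection on the instance, where other methods can read it. Below is a minimal, self-contained sketch (hypothetical names, PyGObject/GTK 3) of that pattern:
import gi
gi.require_version('Gtk', '3.0')
from gi.repository import Gtk

class DeviceWindow(Gtk.Window):
    def __init__(self):
        super().__init__(title="Devices")
        self.selected_device = None  # updated by the handler, read elsewhere

        store = Gtk.ListStore(str)
        for name in ("eth0", "wlan0", "lo"):  # stand-in for get_dev_list()
            store.append([name])

        treeview = Gtk.TreeView(model=store)
        treeview.append_column(
            Gtk.TreeViewColumn("Device", Gtk.CellRendererText(), text=0))
        treeview.get_selection().connect("changed", self.on_selection_changed)
        self.add(treeview)

    def on_selection_changed(self, selection):
        model, row = selection.get_selected()
        # Handlers can't return to main; store the value on the instance.
        self.selected_device = model[row][0] if row is not None else None

win = DeviceWindow()
win.connect("destroy", Gtk.main_quit)
win.show_all()
Gtk.main()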