Get the current contents of the entire Jupyter Notebook


I have a Jupyter Notebook running. I want to be able to access the source of the current Jupyter Notebook from within Python. My end goal is to pass it into ast.parse so I can do some analysis on the user's code. Ideally, I'd be able to do something like this:
import ast
ast.parse(get_notebooks_code())
Obviously, if the source were an .ipynb file, there would be an intermediate step of extracting the code from the code cells, but that's a relatively easy problem to solve.
So far, I've found code that uses the list_running_servers function to make a request to the running servers and match up kernel IDs. This gives me the filename of the currently running notebook. This would work, except that the file on disk may not match what the user has in the browser until a new checkpoint is saved.
I've seen some ideas involving extracting the data with JavaScript, but that requires either a separate cell with a magic or calling the display.Javascript function, which fires asynchronously and therefore doesn't let me pass the result to ast.parse.
Does anyone have a clever idea for how to dynamically get the current notebook's source code as a string in Python for immediate processing? I'm perfectly fine with making this an extension or even a kernel wrapper; I just need to get the source code somehow.
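The kernel-ID matching described in the question can be sketched in plain Python. This is a minimal sketch under stated assumptions: the kernel's connection file follows the usual `kernel-<id>.json` naming (in a live kernel, `ipykernel.connect.get_connection_file()` returns its path), and `sessions` is the decoded JSON list from the notebook server's `/api/sessions` endpoint. The helper names are hypothetical. As the question points out, the path found this way refers to the saved file, which can lag behind the browser.

```python
import os
import re
from typing import Optional


def kernel_id_from_connection_file(connection_file: str) -> str:
    """Extract the kernel ID from a path such as .../kernel-<id>.json."""
    name = os.path.basename(connection_file)
    match = re.fullmatch(r"kernel-(.+)\.json", name)
    if match is None:
        raise ValueError(f"unexpected connection file name: {name!r}")
    return match.group(1)


def notebook_path_for_kernel(sessions: list, kernel_id: str) -> Optional[str]:
    """Return the notebook path of the session running the given kernel, or None."""
    for session in sessions:
        if session.get("kernel", {}).get("id") == kernel_id:
            # Assumption: some server versions report "path" at the top level,
            # others nest it under "notebook"; accept either.
            return session.get("path") or session.get("notebook", {}).get("path")
    return None


# Usage with made-up data (a real list would come from GET <server>/api/sessions)
sessions = [
    {"kernel": {"id": "1111"}, "notebook": {"path": "other.ipynb"}},
    {"kernel": {"id": "abc-123"}, "notebook": {"path": "work/analysis.ipynb"}},
]
kid = kernel_id_from_connection_file("/tmp/jupyter/runtime/kernel-abc-123.json")
print(notebook_path_for_kernel(sessions, kid))  # work/analysis.ipynb
```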

Well, this isn't exactly what I wanted, but here's my current strategy. I need to run some Python code based on the user's code, but it doesn't actually have to be connected to the user's code directly. So I'm just going to run the following magic afterwards:
%%javascript
// Get the source code of all code cells, skipping %%javascript cells such as this one
var source_code = Jupyter.notebook.get_cells().filter(function(cell) {
    return cell.cell_type == "code";
}).map(function(cell) {
    return cell.code_mirror.getValue();
}).filter(function(source) {
    return !source.startsWith("%%javascript");
}).join("\n");
// Embed the code as a Python string literal.
source_code = JSON.stringify(source_code);
var instructor_code = "student_code=" + source_code;
instructor_code += "\nimport ast\nprint(ast.dump(ast.parse(student_code)))\nprint('Great')";
// Run the Python code along with the additional code I wanted.
var kernel = Jupyter.notebook.kernel;
kernel.execute(instructor_code, {
    iopub: {
        output: function(x) {
            if (x.msg_type == "error") {
                console.error(x.content);
                element.text(x.content.ename + ": " + x.content.evalue + "\n" + x.content.traceback.join("\n"));
            } else if (x.msg_type == "stream") {
                // print() output arrives as "stream" messages; other message types have no .text
                element.append(x.content.text.replace(/\n/g, "<br>"));
                console.log(x);
            }
        }
    }
});

What about combining https://stackoverflow.com/a/44589075/1825043 and https://stackoverflow.com/a/54350786/1825043? That gives something like this:
%%javascript
IPython.notebook.kernel.execute('nb_name = "' + IPython.notebook.notebook_name + '"')
and
import os
from nbformat import read, NO_CONVERT

nb_full_path = os.path.join(os.getcwd(), nb_name)
with open(nb_full_path) as fp:
    notebook = read(fp, NO_CONVERT)
cells = notebook['cells']
code_cells = [c for c in cells if c['cell_type'] == 'code']
for no_cell, cell in enumerate(code_cells):
    print(f"####### Cell {no_cell} #########")
    print(cell['source'])
    print("")
I get:
####### Cell 0 #########
%%javascript
IPython.notebook.kernel.execute('nb_name = "' + IPython.notebook.notebook_name + '"')
####### Cell 1 #########
import os
from nbformat import read, NO_CONVERT
nb_full_path = os.path.join(os.getcwd(), nb_name)
with open(nb_full_path) as fp:
    notebook = read(fp, NO_CONVERT)
cells = notebook['cells']
code_cells = [c for c in cells if c['cell_type'] == 'code']
for no_cell, cell in enumerate(code_cells):
    print(f"####### Cell {no_cell} #########")
    print(cell['source'])
    print("")
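To get from the notebook's code cells to something `ast.parse` accepts (the intermediate step mentioned in the question), the cells have to be joined and IPython-only syntax removed. The sketch below works on the raw `.ipynb` JSON with only the standard library. Its rule for magics (skip cells that start with `%%`, replace lines starting with `%` or `!` by `pass`) is a simplifying assumption, not IPython's own input transformation, and `notebook_code` and `load_notebook` are hypothetical helper names.

```python
import ast
import json


def load_notebook(path: str) -> dict:
    """Read an .ipynb file (plain JSON) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def notebook_code(nb: dict) -> str:
    """Join the source of all code cells into one string that ast.parse can accept."""
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # On disk "source" is usually a list of lines; nbformat.read joins it into a str.
        if isinstance(source, list):
            source = "".join(source)
        # A cell magic (%%javascript, %%bash, ...) makes the whole cell non-Python: skip it.
        if source.lstrip().startswith("%%"):
            continue
        lines = []
        for line in source.splitlines():
            stripped = line.lstrip()
            if stripped.startswith(("%", "!")):
                # Line magic or shell escape: keep the indentation, replace it with a no-op.
                indent = line[: len(line) - len(stripped)]
                line = f"{indent}pass  # {stripped}"
            lines.append(line)
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks)


# Usage with an in-memory notebook; for a file: notebook_code(load_notebook("x.ipynb"))
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["%%javascript\n", "alert(1)"]},
        {"cell_type": "code", "source": ["import os\n", "%matplotlib inline\n", "x = 1"]},
    ]
}
code = notebook_code(nb)
tree = ast.parse(code)
print([type(node).__name__ for node in tree.body])  # ['Import', 'Pass', 'Assign']
```

Replacing magics with `pass` rather than deleting them keeps indented blocks syntactically valid when a magic is the only statement in the block.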

Related

Using python and suds, data not read by server side because element is not defined as an array

I am a very inexperienced programmer with no formal education. Details will be extremely helpful in any responses.
I have made several basic Python scripts to call SOAP APIs, but I am running into an issue with a specific API function that has an embedded array.
Here is an excerpt from a working XML request that shows the nested data:
<bomData xsi:type="urn:inputBOM" SOAP-ENC:arrayType="urn:bomItem[]">
    <bomItem>
        <item_partnum></item_partnum>
        <item_partrev></item_partrev>
        <item_serial></item_serial>
        <item_lotnum></item_lotnum>
        <item_sublotnum></item_sublotnum>
        <item_qty></item_qty>
    </bomItem>
    <bomItem>
        <item_partnum></item_partnum>
        <item_partrev></item_partrev>
        <item_serial></item_serial>
        <item_lotnum></item_lotnum>
        <item_sublotnum></item_sublotnum>
        <item_qty></item_qty>
    </bomItem>
</bomData>
I have tried three different things to get this to work, to no avail.
I can generate nearly the exact XML from my script, but one key attribute is missing: 'SOAP-ENC:arrayType="urn:bomItem[]"' from the XML example above.
Option 1 was using MessagePlugin, but I get an error because the element I need is something like the third element, and the plugin always injects into the first element. I have tried body[2], but this throws an error.
For option 2, I am trying to create the object(?). I have read a lot of Stack Overflow posts, but I might be missing something.
Option 3 looked simple enough, but also failed. I tried setting the values in the JSON directly; I got these examples by converting an XML sample to JSON.
I have also tried several other minor things to get it working, but they are not worth mentioning. Although, if there is a way to somehow do the following, then I'm all ears:
bomItem[]: bomData = {"bomItem"[{...,...,...}]}
Here is a sample of my script:
# for Python 3
# using pip install suds-py3
from suds.client import Client
from suds.plugin import MessagePlugin

# Config
# option 1: trying to set it as an array using a plugin
class MyPlugin(MessagePlugin):
    def marshalled(self, context):
        body = context.envelope.getChild('Body')
        bomItem = body[0]
        bomItem.set('SOAP-ENC:arrayType', 'urn:bomItem[]')

URL = "http://localhost/application/soap?wsdl"
client = Client(URL, plugins=[MyPlugin()])

transact_info = {
    "username": "",
    "transaction": "",
    "workorder": "",
    "serial": "",
    "trans_qty": "",
    "seqnum": "",
    "opcode": "",
    "warehouseloc": "",
    "warehousebin": "",
    "machine_id": "",
    "comment": "",
    "defect_code": ""
}
# WIP - trying to get bomData below working first
inputData = {
    "dataItem": [
        {
            "fieldname": "",
            "fielddata": ""
        }
    ]
}
# option 2: trying to create the element here and define it as an array
#inputbom = client.factory.create('ns3:inputBOM')
#inputbom._type = "SOAP-ENC:arrayType"
#inputbom.value = "urn:bomItem[]"
bomData = {
    # Option 3: trying to set the type and array type in JSON
    #"#xsi:type":"urn:inputBOM",
    #"#SOAP-ENC:arrayType":"urn:bomItem[]",
    "bomItem": [
        {
            "item_partnum": "",
            "item_partrev": "",
            "item_serial": "",
            "item_lotnum": "",
            "item_sublotnum": "",
            "item_qty": ""
        },
        {
            "item_partnum": "",
            "item_partrev": "",
            "item_serial": "",
            "item_lotnum": "",
            "item_sublotnum": "",
            "item_qty": ""
        }
    ]
}
try:
    response = client.service.transactUnit(transact_info, inputData, bomData)
    print("RESPONSE: ")
    print(response)
    #print(client)
    #print(envelope)
except Exception as e:
    # handle error here
    print(e)
I appreciate any help and hope it is easy to solve.

I have found the answer I was looking for, or at least a working solution.
In any case, option 1 worked out. I read up on it at the following link:
https://suds-py3.readthedocs.io/en/latest/
See the MessagePlugin section.
I found a solution for getting the message plugin working in the following post:
unmarshalling Error: For input string: ""
A user posted an example of how to crawl through the XML structure and modify it.
Here is my modified example that gets my script working:
# Using MessagePlugin to modify elements before sending them to the server
class MyPlugin(MessagePlugin):
    # created a method that can be reused to modify sections with a similar
    # structure/requirements
    def addArrayType(self, dataType, arrayType, transactUnit):
        # this is the code that is key to crawling through the XML - I get
        # the child of each parent element until I am at the right level for
        # modification
        data = transactUnit.getChild(dataType)
        if data is not None:
            data.set('SOAP-ENC:arrayType', arrayType)

    def marshalled(self, context):
        # Alter the envelope so that the xsd namespace is allowed
        context.envelope.nsprefixes['xsd'] = 'http://www.w3.org/2001/XMLSchema'
        body = context.envelope.getChild('Body')
        transactUnit = body.getChild("transactUnit")
        if transactUnit is not None:
            self.addArrayType('inputData', 'urn:dataItem[]', transactUnit)
            self.addArrayType('bomData', 'urn:bomItem[]', transactUnit)
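To check the crawl logic offline, without a WSDL or a server, the same walk (Envelope, then Body, then the operation, then the part) can be reproduced with the standard library's `xml.etree.ElementTree` instead of suds' own element classes. This is a stand-in, not the suds API: the sample envelope, the unqualified `transactUnit`/`bomData` element names, and the helper `add_array_type` are assumptions for illustration. The two namespace URIs are the standard SOAP 1.1 envelope and encoding namespaces.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
# Make the serializer use the familiar prefixes instead of ns0/ns1
ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("SOAP-ENC", SOAP_ENC)


def add_array_type(envelope: ET.Element, operation: str, part: str, array_type: str) -> bool:
    """Walk Envelope -> Body -> operation -> part and set SOAP-ENC:arrayType on part."""
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None:
        return False
    op = body.find(operation)
    if op is None:
        return False
    element = op.find(part)
    if element is None:
        return False
    element.set(f"{{{SOAP_ENC}}}arrayType", array_type)
    return True


# Hypothetical request body, shaped like the one in the question
sample = f"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">
  <SOAP-ENV:Body>
    <transactUnit>
      <inputData><dataItem/></inputData>
      <bomData><bomItem/><bomItem/></bomData>
    </transactUnit>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

envelope = ET.fromstring(sample)
add_array_type(envelope, "transactUnit", "inputData", "urn:dataItem[]")
add_array_type(envelope, "transactUnit", "bomData", "urn:bomItem[]")
print(ET.tostring(envelope, encoding="unicode"))
```

The attribute is set by its namespace URI and the prefix is chosen at serialization time, which is why the `SOAP-ENC` namespace has to be registered (or, in suds, declared in the envelope) for the output to read `SOAP-ENC:arrayType`.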

download an excel file from Bokeh server via "button"

I have a Bokeh app in directory format, laid out like below.
BTSapp.py is my equivalent of 'main.py'.
In the data folder, I have one input (Excel) file and one output (Excel) file. I wrote a script that transforms the data from the input file and writes it to the output file. I would like to create a Bokeh button that lets end users download the output file when they click it.
Can someone please help me? I also found this question on Stack Overflow: send file from server to client on bokeh, but I couldn't make it work in my case. I kept getting a syntax error for JScode_xhr.
Thank you in advance.

I tried it myself, and below is the working code. It also fixes the issue of the double alert and of two Excel files being generated after the JS code is activated.
Note: this is an adjusted version of the code from this post.
JScode_fetch = """
var filename = 'my_BSS_result.xlsx';
fetch('/app/static/output.xlsx', {cache: "no-store"})
    .then(response => response.blob())
    .then(blob => {
        // addresses IE
        if (navigator.msSaveBlob) {
            navigator.msSaveBlob(blob, filename);
        }
        else {
            var url = URL.createObjectURL(blob);
            var link = document.createElement("a");
            link.href = url;
            link.download = filename;
            link.target = "_blank";
            link.style.visibility = 'hidden';
            link.dispatchEvent(new MouseEvent('click'));
            // release the object URL once the click has been handled
            setTimeout(() => URL.revokeObjectURL(url), 100);
        }
    });
"""

Remove Files from Directory after uploading in Databricks using dbutils

A very clever person from StackOverflow assisted me in copying files to a directory from Databricks here:
copyfiles
I am using the same principle to remove the files once they have been copied, as shown in the link:
for i in range(0, len(files)):
    file = files[i].name
    if now in file:
        dbutils.fs.rm(files[i].path, '/mnt/adls2/demo/target/' + file)
        print('copied ' + file)
    else:
        print('not copied ' + file)
However, I'm getting the error:
TypeError: '/mnt/adls2/demo/target/' has the wrong type - class bool is expected.
Can someone let me know how to fix this? I thought it would be a simple matter of removing the file after copying it, using the command dbutils.fs.rm.

If you want to delete all files from the following path: '/mnt/adls2/demo/target/', there is a simple command:
dbutils.fs.rm('/mnt/adls2/demo/target/', True)
Anyway, if you want to use your own code, take a look at the dbutils docs:
rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory
The second argument of the function is expected to be a boolean, but your code passes a string with a path:
dbutils.fs.rm(files[i].path, '/mnt/adls2/demo/target/' + file)
So your new code can be the following:
for i in range(0, len(files)):
    file = files[i].name
    if now in file:
        # files[i].path is already the full path of the file, so pass it as-is
        dbutils.fs.rm(files[i].path, True)
        print('removed ' + file)
    else:
        print('not removed ' + file)

To remove files from DBFS, you can run this in any notebook:
%fs rm -r dbfs:/user/sample_data.parquet

If you have a huge number of files, deleting them this way might take a lot of time. You can use Spark parallelism to delete the files in parallel. The answer I am providing is in Scala but can be adapted to Python.
You can check whether the directory exists using the function below:
import java.io._

def CheckPathExists(path: String): Boolean =
{
  try
  {
    dbutils.fs.ls(path)
    return true
  }
  catch
  {
    case ioe: java.io.FileNotFoundException => return false
  }
}
You can define a function that deletes the files. Create this function inside an object that extends Serializable, as below:
object Helper extends Serializable
{
  def delete(directory: String): Unit = {
    dbutils.fs.ls(directory).map(_.path).toDF.foreach { filePath =>
      println(s"deleting file: $filePath")
      dbutils.fs.rm(filePath(0).toString, true)
    }
  }
}
Now you can first check whether the path exists and, if it does, call the delete function to delete the files within the folder on multiple tasks.
val directoryPath = "<location>"
val directoryExists = CheckPathExists(directoryPath)
if (directoryExists)
{
  Helper.delete(directoryPath)
}
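The answer notes that the Scala version can be changed to Python. One hedged alternative, and a different technique from the Spark-task version above, is to keep the delete calls on the driver and run them concurrently in a thread pool. This is a minimal sketch that assumes `dbutils.fs.rm` can safely be called from several driver threads at once; the `remove` callback keeps the helper testable outside Databricks, and `delete_in_parallel` is a hypothetical name.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def delete_in_parallel(paths: Iterable[str], remove: Callable[[str], object],
                       max_workers: int = 16) -> List[str]:
    """Call remove(path) for every path using a pool of threads; return the paths handled."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consuming the map results re-raises the first exception raised by any worker.
        list(pool.map(remove, paths))
    return paths


# On Databricks (assumption, not run here):
# delete_in_parallel([f.path for f in dbutils.fs.ls("<location>")],
#                    lambda p: dbutils.fs.rm(p, True))

# Local demonstration with os.remove on throwaway files
with tempfile.TemporaryDirectory() as tmp:
    files = []
    for i in range(5):
        path = os.path.join(tmp, f"part-{i}.txt")
        with open(path, "w") as f:
            f.write("x")
        files.append(path)
    deleted = delete_in_parallel(files, os.remove)
    remaining = os.listdir(tmp)
print(len(deleted), remaining)  # 5 []
```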

Disable Plotly in Python from communicating with the network in any form

Is it possible to get Plotly (used from within Python) to be "strictly local"? In other words, is it possible to use it in a way that guarantees it won't contact the network for any reason?
This includes the program trying to contact the Plotly service (since that was the business model), and also ensuring that clicking anywhere in the generated HTML won't follow a link to Plotly or anywhere else.
Of course, I'd like to be able to do this on a production machine connected to the network, so pulling out the network connection is not an option.

Even a simple import plotly already attempts to connect to the network, as this example shows:
import logging
logging.basicConfig(level=logging.INFO)
import plotly
The output is:
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): api.plot.ly
The connection is made when the get_graph_reference() function is called while the graph_reference module is initialized.
One way to avoid connecting to plot.ly servers is to set an invalid plotly_api_domain in ~/.plotly/.config. For me, this is not an option as the software is run on the client’s machine and I do not want to modify their configuration file. Additionally, it is also not yet possible to change the configuration directory through an environment variable.
One work-around is to monkey-patch requests.get before importing plotly:
import requests
import inspect

original_get = requests.get

def plotly_no_get(*args, **kwargs):
    one_frame_up = inspect.stack()[1]
    if one_frame_up[3] == 'get_graph_reference':
        raise requests.exceptions.RequestException
    return original_get(*args, **kwargs)

requests.get = plotly_no_get
import plotly
This is surely not a full solution but, if nothing else, this shows that plot.ly is currently not meant to be run completely offline.

I think I have come up with a solution for this. First, you need to download the open-source Plotly.js file. Then I have a function, written below, that produces the JavaScript for the Python plot and references your local copy of plotly-latest.min.js:
import os
import uuid
import json
from pkg_resources import resource_string
from plotly import session, tools, utils

def get_plotlyjs():
    # Returns the plotly.min.js bundled with the plotly package; it can be
    # saved as the local plotly-latest.min.js referenced below
    path = os.path.join('offline', 'plotly.min.js')
    plotlyjs = resource_string('plotly', path).decode('utf-8')
    return plotlyjs

def js_convert(figure_or_data, outfilename, show_link=False, link_text='Export to plot.ly',
               validate=True):
    figure = tools.return_figure_from_figure_or_data(figure_or_data, validate)

    width = figure.get('layout', {}).get('width', '100%')
    height = figure.get('layout', {}).get('height', 525)
    # Numeric sizes need a 'px' suffix in CSS
    try:
        float(width)
    except (ValueError, TypeError):
        pass
    else:
        width = str(width) + 'px'
    try:
        float(height)
    except (ValueError, TypeError):
        pass
    else:
        height = str(height) + 'px'

    plotdivid = uuid.uuid4()
    jdata = json.dumps(figure.get('data', []), cls=utils.PlotlyJSONEncoder)
    jlayout = json.dumps(figure.get('layout', {}), cls=utils.PlotlyJSONEncoder)

    # Adjust the link text before it is stored in the config
    plotly_platform_url = session.get_session_config().get('plotly_domain',
                                                           'https://plot.ly')
    if (plotly_platform_url != 'https://plot.ly' and
            link_text == 'Export to plot.ly'):
        link_domain = plotly_platform_url\
            .replace('https://', '')\
            .replace('http://', '')
        link_text = link_text.replace('plot.ly', link_domain)

    config = {}
    config['showLink'] = show_link
    config['linkText'] = link_text
    config["displaylogo"] = False
    config["modeBarButtonsToRemove"] = ['sendDataToCloud']
    jconfig = json.dumps(config)

    # Plain DOM calls instead of jQuery, since the standalone page does not load jQuery
    script = '\n'.join([
        'Plotly.plot("{id}", {data}, {layout}, {config}).then(function() {{',
        '    var loading = document.getElementById("{id}-loading");',
        '    if (loading) {{ loading.parentNode.removeChild(loading); }}',
        '}})'
    ]).format(id=plotdivid, data=jdata, layout=jlayout, config=jconfig)

    html = """<div id="{id}-loading" style="color: rgb(50,50,50);">Drawing...</div>
<div id="{id}" style="height: {height}; width: {width};" class="plotly-graph-div">
</div>
<script type="text/javascript">
{script}
</script>
""".format(id=plotdivid, script=script, height=height, width=width)

    # Text mode: html is a str, so 'wb' would raise a TypeError in Python 3
    with open(outfilename, 'w') as out:
        # out.write('<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>')
        out.write('<script src="plotly-latest.min.js"></script>\n')
        out.write(html)
    print('JS Conversion Complete')
The key lines that remove all the links are:
config['showLink'] = show_link #False
....
config["modeBarButtonsToRemove"]= ['sendDataToCloud']
You would call the function like this to get a static HTML file that references your local copy of the open-source Plotly library:
fig = {
    "data": [{
        "x": [1, 2, 3],
        "y": [4, 2, 5]
    }],
    "layout": {
        "title": "hello world"
    }
}
js_convert(fig, 'test.html')

I haven't done any extensive testing, but it looks like Plot.ly offers an "offline" mode:
https://plot.ly/python/offline/
A simple example:
from plotly.offline import plot
from plotly.graph_objs import Scatter
plot([Scatter(x=[1, 2, 3], y=[3, 1, 6])])
You can install Plot.ly via pip and then run the above script to produce a static HTML file:
$ pip install plotly
$ python ./above_script.py
When I run this from Terminal my web browser opens to the following file URL:
file:///some/path/to/temp-plot.html
This renders an interactive graph that is completely local to your file system.
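Whichever approach is used, it is worth checking the generated HTML for anything that would make the browser fetch a remote resource. Below is a minimal sketch using the standard library's `html.parser`. It only inspects URL-bearing attributes such as `src` and `href`, so URLs built inside JavaScript (including any inside the Plotly bundle itself) are not detected; `external_references` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalRefFinder(HTMLParser):
    """Collect (tag, attribute, url) triples that point at remote http(s) resources."""

    URL_ATTRS = {"src", "href", "action", "data", "poster"}

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                # "//host/path" is protocol-relative: it has no scheme but is still remote
                if urlparse(value).scheme in ("http", "https") or value.startswith("//"):
                    self.external.append((tag, name, value))


def external_references(html: str) -> list:
    finder = ExternalRefFinder()
    finder.feed(html)
    finder.close()
    return finder.external


page = ('<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>'
        '<script src="plotly-latest.min.js"></script>'
        '<a href="#top">top</a>')
print(external_references(page))
# [('script', 'src', 'https://cdn.plot.ly/plotly-latest.min.js')]
```

Run on the saved file, for example with `external_references(open("test.html").read())`, an empty list means no tag attribute points off the machine.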

passing a python object into a casperjs script iterating over the object and returning a result object to python

I'm new to programming in languages more suited to the web, but I have programmed in VBA for Excel.
What I would like to do is:
pass a list (in Python) to a casper.js script.
Inside the casperjs script, I would like to iterate over the Python object (a list of search terms).
In the casper script, I would like to query Google for the search terms.
Once queried, I would like to store the results of these queries in an array, which I concatenate together while iterating over the Python object.
Then, once I have searched for all the search terms and found results, I would like to return the RESULTS array to Python, so I can further manipulate the data.
QUESTION --> I'm not sure how to write the Python function to pass an object to casper.
QUESTION --> I'm also not sure how to write the casper function to pass a JavaScript object back to Python.
Here is my Python code:
import os
import subprocess

scriptType = 'casperScript.js'
APP_ROOT = os.path.dirname(os.path.realpath(__file__))
# raw string: in a normal string, '\b' in '\bin' would become a backspace character
PHANTOM = r'\casperjs\bin\casperjs'
SCRIPT = os.path.join(APP_ROOT, 'test.js')
params = [PHANTOM, SCRIPT]
subprocess.check_output(params)
JS code:
var casper = require('casper').create();
casper.start('http://google.com/', function() {
    this.echo(this.getTitle());
});
casper.run();

Could you use JSON to send the data to the script and then decode it when you get it back?
Python:
json_string = json.dumps(stuff)  # Turn the object into a string to pass to JS
Load a JSON file into Python:
with open(location + '/module_status.json') as data_file:
    data = json.load(data_file)
Deserialize a JSON string to an object in Python:
data = json.loads(json_string)
JavaScript:
arr = JSON.parse(json)  // Turn a JSON string into a JS array
json = JSON.stringify(arr)  // Turn an array into a JSON string ready to send to Python

You can use two temporary files, one for the input and the other for the output of the casperjs script. woverton's answer is OK, but lacks a little detail. It is better to explicitly dump your JSON into a file than to parse the console messages from casperjs, as they can be interleaved with debug strings and such.
In Python:
import tempfile
import json
import os
import subprocess

APP_ROOT = os.path.dirname(os.path.realpath(__file__))
PHANTOM = r'\casperjs\bin\casperjs'
SCRIPT = os.path.join(APP_ROOT, 'test.js')

input_file = tempfile.NamedTemporaryFile(mode="w", delete=False)
output_file = tempfile.NamedTemporaryFile(mode="r", delete=False)
yourObj = {"someKey": "someData"}
yourJSON = json.dumps(yourObj)
input_file.write(yourJSON)
# you need to close the temporary input and output files because casperjs does operations on them
input_file.close()
output_file.close()
print("yourJSON", yourJSON)
# pass only the file names
params = [PHANTOM, SCRIPT, input_file.name, output_file.name]
subprocess.check_output(params)
# open the temporary output file again to read the result
with open(output_file.name, "r") as output:
    yourReturnedJSON = json.load(output)
print("returned", yourReturnedJSON)
# the files were created with delete=False, so remove them explicitly
os.remove(input_file.name)
os.remove(output_file.name)
Because the temporary files are created with delete=False, they are not deleted automatically; the os.remove calls at the end clean them up.
In casperjs:
var casper = require('casper').create();
var fs = require("fs");
var input = casper.cli.raw.get(0);
var output = casper.cli.raw.get(1);
input = JSON.parse(fs.read(input));
input.casper = "done"; // add another property
input = JSON.stringify(input);
fs.write(output, input, "w"); // input written to output
casper.exit();
The casperjs script isn't doing anything useful. It just writes the input file to the output file with an added property.
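The same temp-file round trip can be wrapped in a reusable function. This is a sketch, not tied to casperjs: the child command is a parameter, and the input and output paths are appended as the last two arguments, matching what the casperjs script above reads with `casper.cli.raw.get(0)` and `casper.cli.raw.get(1)`. To keep the example runnable anywhere, it uses a small Python child process as a stand-in for casperjs; `run_with_json_files` is a hypothetical name.

```python
import json
import os
import subprocess
import sys
import tempfile


def run_with_json_files(command, payload):
    """Run command + [input_path, output_path], sending payload as JSON in the input
    file and returning the JSON the child wrote to the output file."""
    fd_in, input_path = tempfile.mkstemp(suffix=".json")
    fd_out, output_path = tempfile.mkstemp(suffix=".json")
    os.close(fd_out)  # the child writes this file; we only need its name
    try:
        with os.fdopen(fd_in, "w", encoding="utf-8") as f:
            json.dump(payload, f)
        # Both files are closed before the child runs, which matters on Windows
        subprocess.run(list(command) + [input_path, output_path], check=True)
        with open(output_path, encoding="utf-8") as f:
            return json.load(f)
    finally:
        # mkstemp files are never deleted automatically
        os.remove(input_path)
        os.remove(output_path)


# Stand-in child that does what the casperjs script above does: add a property
STAND_IN_CHILD = (
    "import json, sys\n"
    "with open(sys.argv[1]) as f:\n"
    "    data = json.load(f)\n"
    "data['casper'] = 'done'\n"
    "with open(sys.argv[2], 'w') as f:\n"
    "    json.dump(data, f)\n"
)
result = run_with_json_files([sys.executable, "-c", STAND_IN_CHILD],
                             {"terms": ["python", "casperjs"]})
print(result)  # {'terms': ['python', 'casperjs'], 'casper': 'done'}
# With casperjs (assumption): run_with_json_files([PHANTOM, SCRIPT], {"terms": [...]})
```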
