IPython Notebook: Elegant way of extracting a method? - python

As my notebook gets longer, I want to extract some code out, so the notebook would be easier to read.
For example, this is a cell/function that I want to extract from the notebook
def R_square_of(MSE, kde_result):
    # R square measure:
    # https://en.wikipedia.org/wiki/Coefficient_of_determination
    y_mean = np.mean(kde_result)
    SS_tot = np.power(kde_result - y_mean, 2)
    SS_tot_avg = np.average(SS_tot)
    SS_res_avg = MSE
    R_square = 1 - SS_res_avg / SS_tot_avg
    return R_square
How can I do it effectively?
My thoughts:
It's pretty easy to create a my_helper.py, put the code above there, and then do from my_helper import *.
The problem is that I may use other packages in the method (in this case np, i.e. numpy), so I would need to re-import numpy inside my_helper.py. Can it re-use the environment created in the IPython notebook, so there's no need for re-importing?
If I change the code in my_helper.py, I need to restart the kernel to load the change (otherwise I get NameError: global name 'numpy' is not defined), which makes it difficult to iterate on code in that file.

Instead of importing your other file, you could instead run it with the %run magic command:
In [1]: %run -i my_helper.py
-i: run the file in IPython’s namespace instead of an empty one. This is useful if you are experimenting with code written in a text editor which depends on variables defined interactively.
I'd still take the opportunity to recommend writing the file as a proper python module and importing it. This way you actually develop a codebase usable outside of the notebook environment. You could write tests for it or publish it somewhere.
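For instance, a minimal sketch of what my_helper.py could look like as a self-contained module (the function is the one from the question; the module carries its own numpy import, so it works both when imported normally and when executed with %run -i):
# my_helper.py -- a self-contained module version of the notebook cell
import numpy as np

def R_square_of(MSE, kde_result):
    # R square measure:
    # https://en.wikipedia.org/wiki/Coefficient_of_determination
    y_mean = np.mean(kde_result)
    SS_tot_avg = np.average(np.power(kde_result - y_mean, 2))
    return 1 - MSE / SS_tot_avg
With a proper module you can also avoid kernel restarts after edits by using the autoreload extension (%load_ext autoreload followed by %autoreload 2), assuming a reasonably recent IPython.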

Related

How to change IPython %pdb and %debug debugger?

By default, IPython uses ipdb as the debugger with the %pdb or %debug magics.
However, I much prefer pdb++... Is there a way of changing the debugger called by these magics? (I am aware I can simply use pdb.xpm() on exception with pdb++, but I'd like to make it work with the IPython magic commands so that I don't have to wrap the code each time...)
So at least for limited circumstances and not in a way I'd necessarily recommend, the answer here is yes. I can't promise the below will work outside the confines of what I did, but it might give you enough insight to play around with it yourself. Caution is warranted because it involves changing undocumented attributes of the ipython shell class at runtime. TLDR: I hunted down how ipython calls the debugger when the %pdb magic is on or when you call the %debug magic, and I updated it to use the debugger I wanted. Skip the next two paragraphs if you just want the approach that worked for me and don't care about the hunt.
Long version: when you run ipython it starts an instance of TerminalInteractiveShell, which has a debugger_cls attribute telling you the debugger that ipython will launch. Unfortunately, at the level of TerminalInteractiveShell, debugger_cls is actually a property of the class, and has no setter that lets you modify it. Rather, it either gets set to Pdb (actually a more featureful ipython Pdb than the traditional pdb) or TerminalPdb (even more features).
If you dig deeper, however, you find that debugger_cls gets passed up to InteractiveShell to initialize how tracebacks are handled. There it seems to disappear into the initialization of InteractiveShell's InteractiveTB property, but actually just ends up as the debugger_cls attribute of that (InteractiveTB) class (by setting the inherited attribute from TBTools). Finally, this debugger_cls attribute only gets used to set the pdb attribute (more or less by doing TBToolsInstance.pdb = TBToolsInstance.debugger_cls()) in one of several places. In any case, it turns out that these attributes can be changed! And if you change them correctly they will percolate to the shell itself! Importantly, this relies on the fact that ipython makes use of the Traitlets package to create a Singleton object for the shell, and this allows you to gain access to that object from within the terminal itself. More on that below.
Below I show the code you can run in the ipython shell to achieve your desired result. As an example, I'm replacing the default debugger (TerminalPdb) with a modified version I created that deals more nicely with certain list comprehensions (LcTerminalPdb). The process (which you can run in the ipython shell) is as follows.
# import the TerminalInteractiveShell class
from IPython.terminal.interactiveshell import TerminalInteractiveShell
# grab the specific instance of the already initialized ipython shell
# (instance() is a classmethod on the singleton, so call it on the class)
shl = TerminalInteractiveShell.instance()
# grab the InteractiveTB attribute (which is a class)
tbClass = shl.InteractiveTB
# load the debugger class you want to use; I assume it's accessible on your path
from LcTerminalPdb import LcTerminalPdb
# change tbClass's debugger_cls to the debugger you want (for consistency
# with the next line)
tbClass.debugger_cls = LcTerminalPdb
# more importantly, set the pdb attribute to an instance of the class
tbClass.pdb = tbClass.debugger_cls()
# The above line is necessary if you already have the terminal running
# (and have entered pdb at least once); otherwise, ipython will run it on
# its own
That's it! Note that because you call the instance() method of TerminalInteractiveShell, you are grabbing the object for the currently running shell, which is why the modifications will affect the shell itself and so all following debugs. For a bonus, you can add these lines of code to your ipython_config.py file, so the debugger you want (LcTerminalPdb here) is always loaded with ipython:
c.InteractiveShellApp.exec_lines = [
    '%pdb on',
    'from LcTerminalPdb import LcTerminalPdb',
    'from IPython.terminal.interactiveshell import TerminalInteractiveShell',
    'shl = TerminalInteractiveShell.instance().InteractiveTB',
    'shl.debugger_cls = LcTerminalPdb',
]
Note that above I don't need to write the extra shl.pdb = shl.debugger_cls() line, as ipython will take care of it the first time a debug point is entered. But feel free to, to be sure.
NOTES:
I have only tested this using LcTerminalPdb, and only briefly, but it seems to work appropriately
I suspect as long as other pdb debuggers have the same API as pdb (i.e. if they can be used by the PYTHONBREAKPOINT environment variable) then it should work
It's really unclear to me whether changing such deep attributes will have unexpected effects, so not sure how much I recommend this approach
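As a footnote to the above (my own sketch, not from the original answer): inside a running session, get_ipython() returns the same singleton shell, so the swap can be written more compactly. Here IPython's own TerminalPdb stands in for whatever Pdb-compatible debugger class you want (LcTerminalPdb above is the answerer's custom class):
from IPython import get_ipython
from IPython.terminal.debugger import TerminalPdb  # stand-in debugger class

shl = get_ipython()  # same object TerminalInteractiveShell.instance() returns
shl.InteractiveTB.debugger_cls = TerminalPdb
shl.InteractiveTB.pdb = shl.InteractiveTB.debugger_cls()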

Is there any magic command to not run a specific cell in Python? [duplicate]

I usually have to rerun (most parts of) a notebook when I reopen it, in order to get access to previously defined variables and continue working.
However, sometimes I'd like to skip some of the cells, which have no influence on subsequent cells (e.g., they might comprise a finished branch of analysis) and could take a very long time to run. These cells can be scattered throughout the notebook, so something like "Run All Below" won't help much.
Is there a way to achieve this?
Ideally, those cells could be tagged with some special flag, so that they could be "Run" manually but would be skipped by "Run All".
EDIT
%%cache (from the ipycache extension), as suggested by @Jakob, solves the problem to some extent.
Actually, I don't even need to load any variables (which can be large but unnecessary for the following cells) when re-running; only the stored output matters for analyzing results.
As a workaround, put %%cache folder/unique_identifier at the beginning of the cell. The code will be executed only once, and no variables will be loaded when re-run unless you delete the unique_identifier file.
Unfortunately, all the output results are lost when re-running with %%cache...
EDIT II (Oct 14, 2013)
The master version of ipython+ipycache now pickles (and re-displays) the codecell output as well.
For rich display outputs including LaTeX and HTML (e.g. pandas DataFrame output), remember to use IPython's display() function, e.g., display(Latex(r'$\alpha_1$')).
Though this isn't exactly what you seem to be looking for, if you wish to entirely omit the execution of a cell (where no cached results are loaded), you can add the following hack at the beginning of a cell (assuming you are using a Unix-based OS):
%%script false
or a variant (working as of early 2020 -- see here for explanation):
%%script false --no-raise-error
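As a quick illustration, a skipped cell might look like this (run_expensive_analysis is just a hypothetical stand-in for your own code):
%%script false --no-raise-error
# nothing in this cell executes when run
run_expensive_analysis()  # hypothetical long-running call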
Currently, there is no such feature included in the IPython notebook.
Nevertheless, there are some possibilities to make your life easier, like:
use %store or, maybe better, the %%cache magic (extension) to store the results of these intermediate cells, so they don't have to be recomputed (see https://github.com/rossant/ipycache)
add an if 0: at the top of the cells you don't want to execute, and indent the body (see the sketch after this list)
convert these cells to raw cells (but you will lose the already stored output!)
(see the discussion at https://github.com/ipython/ipython/issues/2125)
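A minimal sketch of the if 0: idea from the list above (the computation is just a stand-in); flip the condition when you do want the cell to run:
if 0:  # change to 1 (or True) when you want this cell to execute
    total = sum(x * x for x in range(10**7))  # stand-in for a slow computation
    print(total)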
Here's a simple and universal solution without the need for workarounds:
Simply type this as the top line of the cell to skip the cell:
%%script echo skipping
It's tested on Windows and Mac with recent Jupyter, and I think it should work on other platforms as well, because they also have an echo command. Some of the other proposed solutions are more platform-specific.
Of course you can put whatever text you like instead of "skipping". When you execute the cell, it will merely print this text instead of executing the code in the cell.
If no cached results are expected to be loaded, I find the Freeze nbextension quite useful for this.
Although unofficial, I strongly recommend giving these notebook extensions a try if you have never used them before.
To install the extension machinery,
$ pip install jupyter_contrib_nbextensions && jupyter contrib nbextension install
To enable the Freeze extension, launch jupyter notebook and open a new notebook, from the menu select Edit > nbextensions config, and then check Freeze.
This question is a bit older, but the most convenient answer seems to be missing. You can use the 'initialization cells' from the NBextensions. Once installed/activated, you can in any notebook mark cells as 'initialization cells' which can then be run with a specific button.
Install NBextensions (the install command is shown in the answer above)
Activate 'initialization cells' when you launch the jupyter dashboard
In your notebook, in the 'view' menu choose 'cell toolbar' then 'initialization cell'
Next to each cell there is now a checkbox. Mark all the cells you want to run for initialization
When reopening a notebook, click the button that looks like a pocket calculator to run all the initialization cells.
The simplest way to keep Python code in a Jupyter notebook cell from running: I temporarily convert those cells to Markdown.
The %%script false solution stopped working some time in 2019.
Here are some other available workarounds. These are based on handing a program a trivial inline script, so that the cell body is effectively ignored. Here are some easy examples:
Perl:
%%perl -e0

for i in range(10): print(i)
Here you're running perl -e '0', which does nothing and ignores the cell contents.
A more memorable version:
%%perl -eat

for i in range(10): print(i)
Here you're running perl -e 'at', which likewise does nothing.
Bash:
%%bash -c :
for i in range(10): print(i)
: is a no-op in bash, so bash -c : exits immediately and the cell contents are ignored.
I haven't looked at the script magic implementation code closely, but the cell contents appear to be handed to the program directly rather than through a shell, so they won't be interpreted by the shell by mistake, say if you were to include ; in them and accidentally inject some bad code. But I can't guarantee you that.
I'm sure you can come up with other creative solutions, by looking at the supported programs here: https://ipython.readthedocs.io/en/stable/interactive/magics.html#cell-magics
For the cells you wish to skip when you press Run All you can use try/except blocks, where you try to display the already calculated information in the try block, and perform the calculation in the except block.
Take this cell for example:
my_results = 2 # Assume this is a bigger calculation
print(my_results)
print('displayed my results.')
To skip the calculation for this cell in subsequent runs, change the contents of this cell to the following:
try:
    print(my_results)
    print('displayed state value')
except NameError:
    my_results = 2  # Assume this is a bigger calculation
    print(my_results)
    print('displayed newly calculated value')
The first time you run this cell, it will attempt to print the value of the state variable my_results. This throws a NameError, so it jumps to the except block and does the actual calculation of my_results (which in this case just sets it to 2). At the end of the first run, the output for this cell would be:
2
displayed newly calculated value
When you run the cell a second time (whether that be manually or via Run All), the try block will first execute but this time since the variable is available in the state, it does not throw an exception. Instead it displays the result and the except block is never run. At the end of the second run the output of this cell would be:
2
displayed state value
This doesn't meet your explicit criteria that the cell should be completely skipped, but effectively the calculation is skipped.
If the displaying of the information is more complex than using a single print or display, it would probably be cleaner if you make the displaying routine into a function like so:
def print_my_results(my_result):
    print(my_result)
    print('displayed my result.')

try:
    print_my_results(my_results)
except NameError:
    my_results = 2  # Assume this is a bigger calculation
    print_my_results(my_results)
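Generalizing a little (my own sketch, not part of the original answer): the compute-once idea can be factored into a tiny helper so each expensive cell stays short; run_once is a hypothetical name:
def run_once(name, compute):
    # run `compute` only if `name` is not yet defined in the notebook's
    # global namespace, and cache the result there
    g = globals()
    if name not in g:
        g[name] = compute()
    return g[name]

my_results = run_once('my_results', lambda: 2)  # stand-in for a bigger calculation
print(my_results)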

Calling python functions without running from the editor

Please excuse what I know is an incredibly basic question that I have nevertheless been unable to resolve on my own.
I'm trying to switch over my data analysis from Matlab to Python, and I'm struggling with something very basic: in Matlab, I write a function in the editor, and to use that function I simply call it from the command line, or within other functions. The function that I compose in the matlab editor is given a name at the function definition line, and it's generally best for the function name to match the .m file name to avoid confusion.
I don't understand how functions differ in Python, because I have not been successful translating the same approach there.
For instance, if I write a function in the Python editor (I'm using Python 2.7 and Spyder), simply saving the .py file and calling it by its name from the Python terminal does not work. I get a "function not defined" error. However, if I execute the function within Spyder's editor (using the "run file" button), not only does the code execute properly, from that point on the function is also call-able directly from the terminal.
So...what am I doing wrong? I fully appreciate that using Python isn't going to be identical to Matlab in every way, but it seems that what I'm trying to do isn't unreasonable. I simply want to be able to write functions and call them from the python command line, without having to run each and every one through the editor first. I'm sure my mistake here must be very simple, yet doing quite a lot of reading online hasn't led me to an answer.
Thanks for any information!
If you want to use functions defined in a particular file in Python you need to "import" that file first. This is similar to running the code in that file. Matlab doesn't require you to do this because it searches for files with a matching name and automagically reads in the code for you.
For example,
myFunction.py is a file containing
def myAdd(a, b):
    return a + b
In order to access this function from the Python command line or another file I would type
from myFunction import myAdd
And then during this session I can type
myAdd(1, 2)
There are a couple of ways of using import, see here.
You need to add a check for __main__ to your Python script:
def myFunction():
    pass

if __name__ == "__main__":
    myFunction()
Then you can run your script from the terminal like this:
python myscript.py
Also, if your function is in another file, you need to import it:
from myFunctions import myFunction
myFunction()
Python doesn't have MATLAB's "one function per file" limitation. You can have as many functions as you want in a given file, and all of them can be accessed from the command line or from other functions.
Python also doesn't follow MATLAB's practice of always automatically making every function it can find usable all the time, which tends to lead to function name collisions (two functions with the same name).
Instead, Python uses the concept of a "module". A module is just a file (your .py file). That file can have zero or more functions, zero or more variables, and zero or more classes. When you want to use something from that file, you just import it.
So say you have a file 'mystuff.py':
X = 1
Y = 2
def myfunc1(a, b):
do_something
def myfunc2(c, d):
do_something
And you want to use it, you can just type import mystuff. You can then access any of the variables or functions in mystuff. To call myfunc2, you can just do mystuff.myfunc2(z, w).
What basically happens is that when you type import mystuff, it just executes the code in the file, and makes all the variables that result available from mystuff.<varname>, where <varname> is the name of the variable. Unlike in MATLAB, Python functions are treated like any other variable, so they can be accessed just like any other variable. The same is true with classes.
There are other ways to import, too, such as from mystuff import myfunc.
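To make that concrete, here is a short sketch of a session using the mystuff.py above:
import mystuff

print(mystuff.X)       # 1 -- module variables become attributes
mystuff.myfunc1(3, 4)  # functions are attributes too

from mystuff import myfunc2  # or bind a single name directly
myfunc2(5, 6)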
You run Python programs from the command line with:
python program.py

Ignore the rest of the python file

My python scripts often contain "executable code" (functions, classes, &c) in the first part of the file and "test code" (interactive experiments) at the end.
I want python, py_compile, pylint &c to completely ignore the experimental stuff at the end.
I am looking for something like #if 0 for cpp.
How can this be done?
Here are some ideas and the reasons they are bad:
sys.exit(0): works for python but not py_compile and pylint
put all experimental code under def test():: I can no longer copy/paste the code into a python REPL because it has non-trivial indent
put all experimental code between lines with """: emacs no longer indents and fontifies the code properly
comment and uncomment the code all the time: I am too lazy (yes, this is a single key press, but I have to remember to do that!)
put the test code into a separate file: I want to keep the related stuff together
PS. My IDE is Emacs and my python interpreter is pyspark.
Use ipython rather than python for your REPL. It has better code completion and introspection, and when you paste indented code it can automatically "de-indent" the pasted code.
Thus you can put your experimental code in a test function and then paste in parts without worrying about having to de-indent your code.
If you are pasting large blocks that can be considered individual blocks then you will need to use the %paste or %cpaste magics.
e.g.
for i in range(3):
    i *= 2

    # with the blank line above, a plain paste treats the loop as a
    # complete block and runs this print separately, afterwards
    print(i)
With a normal paste:
In [1]: for i in range(3):
   ...:     i *= 2
   ...:
In [2]: print(i)
4
Using %paste:
In [3]: %paste
for i in range(3):
    i *= 2

    print(i)
## -- End pasted text --
0
2
4

In [4]:
PySpark and IPython
It is also possible to launch PySpark in IPython, the enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To use IPython, set the IPYTHON variable to 1 when running bin/pyspark:
$ IPYTHON=1 ./bin/pyspark
Unfortunately, there is no widely (or any) standard describing what you are talking about, so getting a bunch of python specific things to work like this will be difficult.
However, you could wrap these commands in such a way that they only read until a signifier. For example (assuming you are on a unix system):
cat $file | sed '/exit(0)/q' | sed '/exit(0)/d'
The command will read until 'exit(0)' is found. You could pipe this into your checkers, or create a temp file that your checkers read. You could create wrapper executable files on your path that may work with your editors.
Windows may be able to use a similar technique.
I might advise a different approach. Separate files might be best. You might explore iPython notebooks as a possible solution, but I'm not sure exactly what your use case is.
Follow something like option 2.
I usually put experimental code in a main method.
def main():
    ...  # experimental code goes here
Then if you want to execute the experimental code just call the main.
main()
With python-mode.el, mark arbitrary chunks as sections - for example via py-sectionize-region.
Then call py-execute-section.
Updated after comment:
python-mode.el is delivered by melpa.
M-x list-packages RET
Look for python-mode - the built-in python.el provides 'python, while python-mode.el provides 'python-mode.
Development just moved here: https://gitlab.com/python-mode-devs/python-mode
I think the standard ('Pythonic') way to deal with this is to do it like so:
class MyClass(object):
    ...

def my_function():
    ...

if __name__ == '__main__':
    ...  # testing code here
Edit after your comment
I don't think what you want is possible using a plain Python interpreter. You could have a look at the IEP Python editor (website, bitbucket): it supports something like Matlab's cell mode, where a cell can be defined with a double comment character (##):
## main code
class MyClass(object):
    ...

def my_function():
    ...

## testing code
do_some_testing_please()
All code from a ##-beginning line until either the next such line or end-of-file constitutes a single cell.
Whenever the cursor is within a particular cell and you strike some hotkey (default Ctrl+Enter), the code within that cell is executed in the currently running interpreter. An additional feature of IEP is that selected code can be executed with F9; a pretty standard feature but the nice thing here is that IEP will smartly deal with whitespace, so just selecting and pasting stuff from inside a method will automatically work.
I suggest you use a proper version control system to keep the "real" and the "experimental" parts separated.
For example, using Git, you could only include the real code without the experimental parts in your commits (using add -p), and then temporarily stash the experimental parts for running your various tools.
You could also keep the experimental parts in their own branch which you then rebase on top of the non-experimental parts when you need them.
Another possibility is to put tests as doctests into the docstrings of your code, which admittedly is only practical for simpler cases.
This way, they are only treated as executable code by the doctest module, but as comments otherwise.
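A minimal sketch of that idea (the function and its tests are hypothetical examples):
def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    """
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # runs the examples above; silent when they all pass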

Python - How to save functions

I'm starting out in Python. I have four functions and they are working OK. What I want to do is save them, so that I can call them whenever I want in Python.
Here's the code for my four functions:
import numpy as ui

def simulate_prizedoor(nsim):
    sims = ui.random.choice(3, nsim)
    return sims

def simulate_guess(nsim):
    guesses = ui.random.choice(3, nsim)
    return guesses

def goat_door(prizedoors, guesses):
    result = ui.random.randint(0, 3, prizedoors.size)
    while True:
        bad = (result == prizedoors) | (result == guesses)
        if not bad.any():
            return result
        result[bad] = ui.random.randint(0, 3, bad.sum())

def switch_guesses(guesses, goatdoors):
    result = ui.random.randint(0, 3, guesses.size)
    while True:
        bad = (result == guesses) | (result == goatdoors)
        if not bad.any():
            return result
        result[bad] = ui.random.randint(0, 3, bad.sum())
What you want to do is to take your Python file, and use it as a module or a library.
There's no way to make those four functions automatically available, no matter what, 100% of the time, but you can do something very close.
For example, at the top of your file, you imported numpy. numpy is a module or library which has been set up so it's available any time you run python, as long as you import it.
You want to do the same thing -- save those 4 functions into a file, and import them whenever you want them.
For example, if you copy and paste those four functions into a file named foobar.py, then you can simply do from foobar import *. However, this will only work if you're running Python in the same folder where you saved your code.
If you want to make your module available system-wide, you have to save it somewhere on the PYTHONPATH. Usually, saving it to C:\Python27\Lib\site-packages will work (assuming you're running Windows).
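If you're not sure where Python looks for modules on your machine, here is a quick check (a generic diagnostic, not specific to this answer):
import sys
print(sys.path)  # any directory on this list can hold your importable modules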
If you decide to put them anywhere in your project folder, don't forget to create a blank __init__.py file so Python can see them. A better answer is provided here: http://docs.python.org/2/tutorial/modules.html
Save them in a file - this makes them a module.
If you put them in a file called mymod.py, in python you can load them as follows
from mymod import *
simulate_prizedoor(23)
Quick solution, without having to explicitly create a file - relies on IPython and its storemagic
IPython 4.0.1 -- An enhanced Interactive Python.

In [1]: def func(a):
   ...:     print a
   ...:

In [2]: func = _i  # gets the previous input (as a string)

In [3]: store func  # store (magic) the input
                    # (automagic enabled; otherwise use '%store')
Stored 'func' (unicode)

In [4]: exit

IPython 4.0.1 -- An enhanced Interactive Python.

In [1]: store -r func  # retrieve the stored string

In [2]: exec func  # execute the string as python code

In [3]: func(10)
10
Once you have stored all your functions once, you can restore them all with store -r, and then exec func once for each function, in each new session.
(Came across this question while looking for a solution for 'quick saving' functions (most convenient way) while in an interactive python session - adding my current best solution for future readers)
