How to make pdb recognize that the source has changed between runs? - python

From what I can tell, pdb does not recognize when the source code has changed between "runs". That is, if I'm debugging, notice a bug, fix that bug, and rerun the program in pdb (i.e. without exiting pdb), pdb will not recompile the code. I'll still be debugging the old version of the code, even if pdb lists the new source code.
So, does pdb not update the compiled code as the source changes? If not, is there a way to make it do so? I'd like to be able to stay in a single pdb session in order to keep my breakpoints and such.
FWIW, gdb will notice when the program it's debugging changes underneath it, though only on a restart of that program. This is the behavior I'm trying to replicate in pdb.

The following mini-module may help. If you import it in your pdb session, then you can use:
pdb> pdbs.r()
at any time to force-reload all non-system modules except __main__. The code skips that module because reloading it raises ImportError('Cannot re-init internal module __main__').
# pdbs.py - PDB support
from __future__ import print_function

def r():
    """Reload all non-system modules, to reload stuff on pdb restart."""
    import importlib
    import sys
    # This is likely to be OS-specific
    SYS_PREFIX = '/usr/lib'
    for k, v in list(sys.modules.items()):
        if (
            k == "__main__"
            or k.startswith("pdb")
            or not getattr(v, "__file__", None)
            or v.__file__.startswith(SYS_PREFIX)
        ):
            continue
        print("reloading %s [%s]" % (k, v.__file__), file=sys.stderr)
        importlib.reload(v)
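For example, a hypothetical session might look like this (myapp.py, myutils, and the paths are illustrative, not from the original answer):
$ python -m pdb myapp.py
(Pdb) import pdbs
(Pdb) b myutils.py:42
...
# fix the bug in myutils.py in your editor, then, in the same session:
(Pdb) pdbs.r()
reloading myutils [/home/user/myutils.py]
(Pdb) restart          # breakpoints are preserved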

Based on @pourhaus's answer (from 2014), this recipe augments the pdb++ debugger with a reload command (expected to work on both Linux & Windows, on any Python installation).
TIP: the new reload command accepts an optional list of module prefixes to reload (and to exclude), so as not to break already-loaded globals when resuming debugging.
Just insert the following Python-3.6 code into your ~/.pdbrc.py file:
## Augment `pdb++` with a `reload` command
#
# See https://stackoverflow.com/questions/724924/how-to-make-pdb-recognize-that-the-source-has-changed-between-runs/64194585#64194585
from pdb import Pdb


def _pdb_reload(pdb, modules):
    """
    Reload all non system/__main__ modules, without restarting debugger.

    SYNTAX:
        reload [<reload-module>, ...] [-x [<exclude-module>, ...]]

    * a dot(`.`) matches current frame's module `__name__`;
    * given modules are matched by prefix;
    * any <exclude-modules> are applied over any <reload-modules>.

    EXAMPLES:
        (Pdb++) reload                 # reload everything (brittle!)
        (Pdb++) reload myapp.utils     # reload just `myapp.utils`
        (Pdb++) reload myapp -x .      # reload `myapp` BUT current module
    """
    import importlib
    import sys

    ## Derive sys-lib path prefix.
    #
    SYS_PREFIX = importlib.__file__
    SYS_PREFIX = SYS_PREFIX[: SYS_PREFIX.index("importlib")]

    ## Parse args to decide prefixes to Include/Exclude.
    #
    has_excludes = False
    to_include = set()
    # Default prefixes to Exclude, or `pdb++` will break.
    to_exclude = {"__main__", "pdb", "fancycompleter", "pygments", "pyrepl"}
    for m in modules.split():
        if m == "-x":
            has_excludes = True
            continue
        if m == ".":
            m = pdb._getval("__name__")
        if has_excludes:
            to_exclude.add(m)
        else:
            to_include.add(m)

    to_reload = [
        (k, v)
        for k, v in sys.modules.items()
        if (not to_include or any(k.startswith(i) for i in to_include))
        and not any(k.startswith(i) for i in to_exclude)
        and getattr(v, "__file__", None)
        and not v.__file__.startswith(SYS_PREFIX)
    ]
    print(
        f"PDB-reloading {len(to_reload)} modules:",
        *[f"  +--{k:28s}:{getattr(v, '__file__', '')}" for k, v in to_reload],
        sep="\n",
        file=sys.stderr,
    )

    for k, v in to_reload:
        try:
            importlib.reload(v)
        except Exception as ex:
            print(
                f"Failed to PDB-reload module: {k} ({v.__file__}) due to: {ex!r}",
                file=sys.stderr,
            )


Pdb.do_reload = _pdb_reload

What do you mean by "rerun the program in pdb"? If you've imported a module, Python won't reread it unless you explicitly ask it to, i.e. with importlib.reload(module) (the reload() builtin in Python 2). However, reload is far from bulletproof (see xreload for another strategy).
There are plenty of pitfalls in Python code reloading. To solve your problem more robustly, you could wrap pdb with a class that records your breakpoint info to a file on disk, for example, and plays it back on command.
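A minimal sketch of that idea (PersistentPdb and the .pdb_breakpoints file name are my own inventions here, not a standard API):
import pdb

class PersistentPdb(pdb.Pdb):
    """Sketch: a Pdb subclass that can save/restore breakpoints to disk."""
    BP_FILE = ".pdb_breakpoints"  # hypothetical file name

    def do_save_bp(self, arg):
        """save_bp -- write current breakpoints (file:line) to disk."""
        with open(self.BP_FILE, "w") as f:
            for filename, lines in self.get_all_breaks().items():
                for lineno in lines:
                    f.write("%s:%d\n" % (filename, lineno))

    def do_load_bp(self, arg):
        """load_bp -- re-apply breakpoints saved by save_bp."""
        with open(self.BP_FILE) as f:
            for row in f:
                filename, _, lineno = row.strip().rpartition(":")
                self.set_break(filename, int(lineno))
You would then debug via PersistentPdb().set_trace(), call save_bp before quitting, and load_bp in the next session.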

I decided to comment out some lines in my input script, and after
(Pdb) run
I got pdb to recognize that change. The bad news: it runs the script from the beginning. The good news is in the help text below.
(Pdb) help run
run [args...]
        Restart the debugged python program. If a string is supplied
        it is split with "shlex", and the result is used as the new
        sys.argv. History, breakpoints, actions and debugger options
        are preserved. "restart" is an alias for "run".

This may not work for more complex programs, but here is a simple example using importlib.reload() under Python 3.5.3:
[user@machine ~] cat test.py
print('Test Message')
#
# start and run with debugger
#
[user@machine ~] python3 -m pdb test.py
> /home/user/test.py(1)<module>()
-> print('Test Message')
(Pdb) c
Test Message
The program finished and will be restarted
> /home/user/test.py(1)<module>()
-> print('Test Message')
#
# in another terminal, change test.py to say "Changed Test Message"
#
#
# back in PDB:
#
(Pdb) import importlib; import test; importlib.reload(test)
Changed Test Message
<module 'test' from '/home/user/test.py'>
(Pdb) c
Test Message
The program finished and will be restarted
> /home/user/test.py(1)<module>()
-> print('Changed Test Message')
(Pdb) c
Changed Test Message
The program finished and will be restarted
> /home/user/test.py(1)<module>()
-> print('Changed Test Message')

ipdb %autoreload extension
The IPython 6.2.0 docs document the autoreload extension at http://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html#module-IPython.extensions.autoreload :
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: from foo import some_function
In [4]: some_function()
Out[4]: 42
In [5]: # open foo.py in an editor and change some_function to return 43
In [6]: some_function()
Out[6]: 43

Related

Is it possible to let the entire R Studio program run from Python? [duplicate]

I searched for this question and found some answers, but none of them seem to work. This is the script I'm using in Python to run my R script:
import subprocess
retcode = subprocess.call("/usr/bin/Rscript --vanilla -e 'source(\"/pathto/MyrScript.r\")'", shell=True)
and I get this error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
  no lines available in input
Calls: source ... withVisible -> eval -> eval -> read.csv -> read.table
Execution halted
and here is the content of my R script (pretty simple!)
data = read.csv('features.csv')
data1 = read.csv("BagofWords.csv")
merged = merge(data, data1)
write.table(merged, "merged.csv", quote=FALSE, sep=",", row.names=FALSE)
for (i in 1:length(merged$fileName))
{
  fileConn <- file(paste("output/", toString(merged$fileName[i]), ".txt", sep=""))
  writeLines((toString(merged$BagofWord[i])), fileConn)
  close(fileConn)
}
The R script works fine when I use source('MyrScript.r') at the R command line. Moreover, when I try the exact command I pass to subprocess.call (i.e., /usr/bin/Rscript --vanilla -e 'source("/pathto/MyrScript.r")') at my command line, it works fine too; I don't really get what the problem is.
I would not put too much trust in calling source() within the Rscript call, as you may not completely understand where your different nested R sessions are running. The process may fail because of simple things such as your working directory not being the one you think.
Rscript lets you run a script directly (see man Rscript if you are using Linux).
Then you can do directly:
subprocess.call ("/usr/bin/Rscript --vanilla /pathto/MyrScript.r", shell=True)
or, better, passing the Rscript command and its parameters as a list:
subprocess.call (["/usr/bin/Rscript", "--vanilla", "/pathto/MyrScript.r"])
Also, to make things easier you could create an executable R script. For this you just need to add this as the first line of the script:
#! /usr/bin/Rscript
and give it execution rights. See here for details.
Then you can just do your Python call as if it were any other shell command or script:
subprocess.call ("/pathto/MyrScript.r")
I think RPy2 is worth looking into. Here is a cool presentation on R-bloggers.com to get you started:
http://www.r-bloggers.com/accessing-r-from-python-using-rpy2/
Essentially, it allows you to access R libraries through R objects, providing both a high-level and a low-level interface.
Here are the docs on the most recent version: https://rpy2.github.io/doc/latest/html/
I like to point Python users to Anaconda, and if you use the package manager, conda, to install rpy2, it will also ensure you install R.
$ conda install rpy2
And here's a vignette based on the documentation's introduction:
>>> from rpy2 import robjects
>>> pi = robjects.r['pi']
>>> pi
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7fde1c00a088 / R:0x562b8fbbe118>
[3.141593]
>>> from rpy2.robjects.packages import importr
>>> base = importr('base')
>>> utils = importr('utils')
>>> import rpy2.robjects.packages as rpackages
>>> utils = rpackages.importr('utils')
>>> packnames = ('ggplot2', 'hexbin')
>>> from rpy2.robjects.vectors import StrVector
>>> names_to_install = [x for x in packnames if not rpackages.isinstalled(x)]
>>> if len(names_to_install) > 0:
...     utils.install_packages(StrVector(names_to_install))
And running an R snippet:
>>> robjects.r('''
...     # create a function `f`
...     f <- function(r, verbose=FALSE) {
...         if (verbose) {
...             cat("I am calling f().\n")
...         }
...         2 * pi * r
...     }
...     # call the function `f` with argument value 3
...     f(3)
... ''')
R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7fde1be0d8c8 / R:0x562b91196b18>
[18.849556]
And a small self-contained graphics demo:
from rpy2.robjects.packages import importr
graphics = importr('graphics')
grdevices = importr('grDevices')
base = importr('base')
stats = importr('stats')
import array
x = array.array('i', range(10))
y = stats.rnorm(10)
grdevices.X11()
graphics.par(mfrow = array.array('i', [2,2]))
graphics.plot(x, y, ylab = "foo/bar", col = "red")
kwargs = {'ylab':"foo/bar", 'type':"b", 'col':"blue", 'log':"x"}
graphics.plot(x, y, **kwargs)
m = base.matrix(stats.rnorm(100), ncol=5)
pca = stats.princomp(m)
graphics.plot(pca, main="Eigen values")
stats.biplot(pca, main="biplot")
The following code should also work:
import rpy2.robjects as robjects
robjects.r.source("/pathto/MyrScript.r", encoding="utf-8")
I would not suggest using a system call, as there are many differences between Python and R, especially when passing data around.
There are many standard libraries for calling R from Python to choose from; see this answer.
If you just want to run a script, you can use os.system("shell command") from the os module (note: system lives in os, not sys). If the command produces useful output, you can redirect it to a file by appending " > outputfilename" to the shell command.
For example:
import os
os.system("ls -al > output.txt")
Try adding a line to the beginning of your R script that says:
setwd("path-to-working-directory")
Except, replace the path with the path to the folder containing the files features.csv and BagofWords.csv.
I think the problem you are having is because when you run this script from R your working directory is already the correct path, but when you run the script from python, it defaults to a working directory somewhere else (likely the top of the user directory).
By adding the extra line at the beginning of your R script, you are explicitly setting the working directory and the code to read in these files will work. Alternatively, you could replace the filenames in read.csv() with the full filepaths of these files.
@dmontaner suggested this possibility in his answer:
The process may fail because of simple things such as your working directory not being the one you think.
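In the same spirit, you can leave the R script untouched and set the working directory from the Python side instead; subprocess accepts a cwd argument (a sketch, with an illustrative path):
import subprocess

# Run the R script with its working directory set to the folder that
# contains features.csv and BagofWords.csv.
subprocess.call(["/usr/bin/Rscript", "--vanilla", "/pathto/MyrScript.r"],
                cwd="/pathto")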

How to compile multiple subprocess python files into single .exe file using pyinstaller

I have a similar question to this one: Similar Question.
I have a GUI where the user can input information, and other scripts use some of that information to run. I have 4 different scripts, one for each button. I run them as subprocesses so that the main GUI doesn't act up or say that it's not responding. This is an example of what I have, since the code is really long (I used PAGE to generate the GUI).
### Main.py ###
import os
import subprocess
import sys

def resource_path(relative_path):
    # I got this from another post to include images, but I'm also using
    # it to include the scripts.
    try:
        # PyInstaller creates a temp folder and stores path in _MEIPASS
        base_path = sys._MEIPASS
    except Exception:
        base_path = os.path.abspath(".")
    return os.path.join(base_path, relative_path)

class aclass:
    def get_info(self):
        global ModelNumber, Serial, SpecFile, dateprint, Oper, outputfolder
        ModelNumber = self.Model.get()
        Serial = self.SerialNumber.get()
        outputfolder = self.TEntry2.get()
        SpecFile = self.Spec_File.get()
        return ModelNumber, Serial, SpecFile, outputfolder

    def First(self):
        aclass.get_info(self)  # where I use the resource_path function
        First_proc = subprocess.Popen(
            [sys.executable, resource_path('first.py'),
             str(ModelNumber), str(Serial), str(path), str(outputfolder)])
        First_proc.wait()

### First.py ###
import sys
import numpy as np
import scipy
from main import aclass

ModelNumber = sys.argv[1]
Serial = sys.argv[2]
path = sys.argv[3]
path_save = sys.argv[4]
and this goes on for my second, third, and fourth scripts.
In my spec file, I added:
a.datas += [('first.py', 'C:\\path\\to\\script\\first.py', 'DATA')]
a.datas += [('main.py', 'C:\\path\\to\\script\\main.py', 'DATA')]
This compiles and works, but when I try to convert it to an .exe, it crashes because it can't import first.py properly, along with its own libraries (numpy, scipy, etc.). I've tried adding it to a.datas, and runtime_hooks=['first.py'] in the spec file, and I can't get it to work. Any ideas? I'm not sure if it's giving me this error because it is a subprocess.
Assuming you can't restructure your app so this isn't necessary (e.g., by using multiprocessing instead of subprocess), there are three solutions:
1. Ensure that the .exe contains the scripts as an (executable) zipfile (or just use pkg_resources) and copy the script out to a temporary directory so you can run it from there.
2. Write a multi-entrypoint wrapper script that can be run as your main program and also run as each script, because, while you can't run a script out of the packed exe, you can import a module out of it.
3. Using pkg_resources again, write a wrapper that runs the script by loading it as a string and running it with exec instead; a rough sketch follows.
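For a feel of the third option (run_packed_script is a hypothetical helper of mine; whether pkg_resources can see the scripts depends on how they were packed):
import sys
import pkg_resources

def run_packed_script(name, argv):
    # Hypothetical helper: read the script's source out of the packed
    # archive and exec it as if it were __main__.
    source = pkg_resources.resource_string(__name__, name).decode("utf-8")
    sys.argv = [name] + list(argv)
    exec(compile(source, name, "exec"), {"__name__": "__main__"})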
The second one is probably the cleanest, but it is a bit of work. And, while we could rely on setuptools entrypoints to do some of the work, trying to explain how to do this is much harder than explaining how to do it manually,[1] so I'm going to do the latter.
Let's say your code looked like this:
# main.py
import subprocess
import sys

spam, eggs = sys.argv[1], sys.argv[2]
subprocess.run([sys.executable, 'vikings.py', spam])
subprocess.run([sys.executable, 'waitress.py', spam, eggs])

# vikings.py
import sys

print(' '.join(['spam'] * int(sys.argv[1])))

# waitress.py
import sys
import time

spam, eggs = int(sys.argv[1]), int(sys.argv[2])
if eggs > spam:
    print("You can't have more eggs than spam!")
    sys.exit(2)
print("Frying...")
time.sleep(2)
raise Exception("This sketch is getting too silly!")
So, you run it like this:
$ python3 main.py 3 4
spam spam spam
You can't have more eggs than spam!
We want to reorganize it so there's a script that looks at the command-line arguments to decide what to import. Here's the smallest change to do that:
# main.py
import subprocess
import sys

if sys.argv[1][:2] == '--':
    script = sys.argv[1][2:]
    if script == 'vikings':
        import vikings
        vikings.run(*sys.argv[2:])
    elif script == 'waitress':
        import waitress
        waitress.run(*sys.argv[2:])
    else:
        raise Exception(f'Unknown script {script}')
else:
    spam, eggs = sys.argv[1], sys.argv[2]
    subprocess.run([sys.executable, __file__, '--vikings', spam])
    subprocess.run([sys.executable, __file__, '--waitress', spam, eggs])

# vikings.py
def run(spam):
    print(' '.join(['spam'] * int(spam)))

# waitress.py
import sys
import time

def run(spam, eggs):
    spam, eggs = int(spam), int(eggs)
    if eggs > spam:
        print("You can't have more eggs than spam!")
        sys.exit(2)
    print("Frying...")
    time.sleep(2)
    raise Exception("This sketch is getting too silly!")
And now:
$ python3 main.py 3 4
spam spam spam
You can't have more eggs than spam!
A few changes you might want to consider in real life:
DRY: We have the same three lines of code copied and pasted for each script, and we have to type each script name three times. You can just use something like __import__(sys.argv[1][2:]).run(*sys.argv[2:]) with appropriate error handling.
Use argparse instead of this hacky special casing for the first argument. If you're already sending non-trivial arguments to the scripts, you're probably already using argparse or an alternative anyway.
Add an if __name__ == '__main__': block to each script that just calls run(sys.argv[1:]), so that during development you can still run the scripts directly to test them.
I didn't do any of these because they'd obscure the idea for this trivial example, but a combined sketch follows.
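For completeness, a sketch combining those three suggestions (untested against a packed exe; the --script flag name is my own):
# main.py -- condensed dispatcher applying the suggestions above
import argparse
import importlib
import subprocess
import sys

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--script', help='internal: run a packed script by module name')
    parser.add_argument('args', nargs='*')
    ns = parser.parse_args()
    if ns.script:
        # DRY: import the requested module out of the packed exe and run it.
        importlib.import_module(ns.script).run(*ns.args)
    else:
        spam, eggs = ns.args
        subprocess.run([sys.executable, __file__, '--script', 'vikings', spam])
        subprocess.run([sys.executable, __file__, '--script', 'waitress', spam, eggs])

if __name__ == '__main__':
    main()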
[1] The documentation is great as a refresher if you've already done it, but as a tutorial and explanatory rationale, not so much. And trying to write the tutorial that the brilliant PyPA guys haven't been able to come up with for years… that's probably beyond the scope of an SO answer.

How to do manual reload of file in iPython shell

I have a file called sub.py, and I want to be able to call functions in it from the IPython shell. The IPython autoreload functionality has not been working very well, though. Sometimes it detects changes, sometimes it doesn't.
Instead of debugging autoreload, I was wondering if there's a way to just manually reload, or unload and reload, modules in IPython. Currently I terminate the shell, start it again, re-import my module, and go from there. It would be great to be able to do a manual reload without killing the IPython shell.
I find my homebrewed %reimport to be very useful in this context:
def makemagic(f):
    name = f.__name__
    if name.startswith('magic_'): name = name[6:]
    def wrapped(throwaway, *pargs, **kwargs): return f(*pargs, **kwargs)
    if hasattr(f, '__doc__'): wrapped.__doc__ = f.__doc__
    get_ipython().define_magic(name, wrapped)
    return f

@makemagic
def magic_reimport(dd):
    """
    The syntax
        %reimport foo, bar.*
    is a shortcut for the following:
        import foo; foo = reload(foo)
        import bar; bar = reload(bar); from bar import *
    """
    ipython = get_ipython().user_ns
    for d in dd.replace(',', ' ').split(' '):
        if len(d):
            bare = d.endswith('.*')
            if bare: d = d[:-2]
            exec('import xx; xx = reload(xx)'.replace('xx', d), ipython)
            if bare: exec('from xx import *'.replace('xx', d), ipython)
One gotcha is that, when sub-modules of packages are involved, you have to reimport the sub-module first, and then the top-level package:
%reimport foo.bar, foo
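If you only need an occasional manual reload rather than a custom magic, plain importlib also works (Python 3; in Python 2, reload is a builtin):
import importlib
import sub            # the module from the question

# ...edit sub.py in your editor, then pull the changes in:
importlib.reload(sub)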

Python Cmd Tab Completion Problems

I've got an application I'm currently working on for our company. It's currently built around Python's Cmd module, and features tab completion for a number of tasks.
For some reason, however, the tab completion currently only works on one machine in the building: running the scripts from other machines doesn't allow the tab completion.
Here are the offending code parts:
def populate_jobs_list():
    global avail_jobs
    avail_jobs = os.walk(rootDir()).next()[1]
    print avail_jobs

...

    def complete_job(self, text, line, start_index, end_index):
        global avail_jobs
        populate_jobs_list()
        if text:
            return [
                jobs for jobs in avail_jobs
                if jobs.startswith(text)
            ]
        else:
            return avail_jobs

    def do_job(self, args):
        split_args = args.rsplit()
        os.environ['JOB'] = args
        job_dir = os.path.join(rootDir(), os.getenv('JOB'))
        os.environ['JOB_PROPS'] = (job_dir + '\\job_format.opm')
        if not os.path.isdir(job_dir):
            print 'Job does not exist. Try again.'
            return
        else:
            print('Jobbed into: ' + os.getenv('JOB'))
            return

populate_jobs_list()
prompt = outPrompt()
prompt.prompt = '\> '
prompt.cmdloop('Loading...')
Am I missing something obvious here? Just to clarify: on machine A, the tab completion works as intended. When it's run on any other machine in the building, it fails to complete.
Check if the environment variable PYTHONSTARTUP is set properly. It should point to a script which in turn needs to do something like this:
try:
    import readline
except ImportError:
    sys.stdout.write("No readline module found, no tab completion available.\n")
else:
    import rlcompleter
    readline.parse_and_bind('tab: complete')
Maybe (some part of) this is only done properly on the one working machine?
Maybe the readline module is available only on the one working machine?
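To narrow down which of these it is, you could run a tiny diagnostic on each machine (a sketch, not part of the original answer):
# completion_diag.py -- compare completion prerequisites across machines
from __future__ import print_function
import os

print("PYTHONSTARTUP =", os.environ.get("PYTHONSTARTUP"))
try:
    import readline
    print("readline available -- tab completion should work")
except ImportError:
    print("no readline module -- cmd.Cmd tab completion will not work")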

Python pdb (debugger) disp equivalent?

Is there a pdb equivalent to disp in gdb?
E.g. when I'm debugging C using gdb I can have variables printed on every 'step' through the code by typing:
disp var
When I'm debugging Python using pdb I would like similar functionality, but disp does not seem to be there, and the Python pdb documentation does not seem to offer an alternative; it seems like an odd omission?
The code below uses Python introspection features to add two new commands to the pdb module.
Just put the given function and its call in a separate module, and import that module before starting to debug; you will then have disp and undisp commands for adding and removing watches on variables.
It works by monkeypatching Python's pdb module, which is written in pure Python.
# -*- coding: utf-8 -*-

def patch_pdb():
    import pdb

    def wrap(func):
        def new_postcmd(self, *args, **kw):
            result = func(self, *args, **kw)
            if hasattr(self, "curframe") and self.curframe and hasattr(self, "watch_list"):
                for arg in self.watch_list:
                    try:
                        print >> self.stdout, "%s: %s" % (arg, self._getval(arg)) + ", ",
                    except:
                        pass
                self.stdout.write("\n")
            return result  # func(self, *args, **kw)
        return new_postcmd
    pdb.Pdb.postcmd = wrap(pdb.Pdb.postcmd)

    def do_disp(self, arg):
        if not hasattr(self, "watch_list"):
            self.watch_list = []
        self.watch_list.append(arg)
    pdb.Pdb.do_disp = do_disp

    def do_undisp(self, arg):
        if hasattr(self, "watch_list"):
            try:
                self.watch_list.remove(arg)
            except:
                pass
    pdb.Pdb.do_undisp = do_undisp

patch_pdb()

if __name__ == "__main__":
    # for testing
    import pdb; pdb.set_trace()
    a = 0
    for i in range(10):
        print i
        a += 2
Unfortunately, I could only make it display the state of the variables as they were prior to the execution of the last command. (I tried a little, but monkeypatching the bdb module, which is the base for pdb, did not seem to work any better.) You can try changing the methods in pdb.Pdb, bdb.Bdb or cmd.Cmd that are decorated by wrap to find one that is called after the debugged frame's state has changed.
You can set up some aliases that will do this for you:
alias n next;; p var
alias s step;; p var
Printing a whole list of variable names is left as an exercise for the reader. Unfortunately, doing it this way means that when you send the debugger an empty line, the "last command" it executes is p var rather than, for example, n. If you want to fix that, then you can use this somewhat hacky set of Pdb commands instead:
!global __stack; from inspect import stack as __stack
!global __Pdb; from pdb import Pdb as __Pdb
!global __pdb; __pdb = [__framerec[0].f_locals.get("pdb") or __framerec[0].f_locals.get("self") for __framerec in __stack() if (__framerec[0].f_locals.get("pdb") or __framerec[0].f_locals.get("self")).__class__ == __Pdb][-1]
alias s step;; p var;; !__pdb.lastcmd = "!__pdb.cmdqueue.append('s')"
alias n next;; p var;; !__pdb.lastcmd = "!__pdb.cmdqueue.append('n')"
During pdb debugging you can type normal Python code beyond the one-letter commands, so just using print var should work for you.
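Also worth noting: since Python 3.2, pdb has this built in as the display and undisplay commands:
(Pdb) display var      # show var each time its value changes when execution stops
(Pdb) undisplay var    # stop displaying var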
