I'm in the process of learning how a large (356-file), convoluted Python program is set up. Besides manually reading through and parsing the code, are there any good methods for following program flow?
There are two methods which I think would be useful:
Something similar to Bash's "set -x"
Something that displays which file outputs each line of output
Are there any methods to do the above, or any other ways that you have found useful?
I don't know if this is actually a good idea, but since I actually wrote a hook to display the file and line before each line of output to stdout, I might as well give it to you…
import inspect, sys

class WrapStdout(object):
    _stdout = sys.stdout
    def write(self, buf):
        frame = sys._getframe(1)
        try:
            f = inspect.getsourcefile(frame)
        except TypeError:
            f = 'unknown'
        l = frame.f_lineno
        self._stdout.write('{}:{}:{}'.format(f, l, buf))
    def flush(self):
        self._stdout.flush()

sys.stdout = WrapStdout()
Just save that as a module, and after you import it, every chunk of stdout will be prefixed with file and line number.
Of course this will get pretty ugly if:
Anyone tries to print partial lines (using sys.stdout.write directly, the magic trailing comma with print in 2.x, or end='' in 3.x).
You mix Unicode and non-Unicode in 2.x.
Any of the source files have long pathnames.
etc.
But all the tricky deep-Python-magic bits are there; you can build on top of it pretty easily.
It could be very tedious, but stepping through the flow of execution with a debugger, line by line, can help you to some extent:
import pdb
pdb.set_trace()
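For something closer to Bash's "set -x", you can also install a trace function with sys.settrace, which is the same machinery debuggers use under the hood. A minimal sketch (demo is just a made-up example function):

```python
import sys

def tracer(frame, event, arg):
    # Print file:line for every line the interpreter is about to execute.
    if event == 'line':
        print('{}:{}'.format(frame.f_code.co_filename, frame.f_lineno))
    return tracer  # keep tracing inside this frame

def demo():
    total = 0
    for i in range(3):
        total += i
    return total

sys.settrace(tracer)   # only frames created after this call are traced
result = demo()
sys.settrace(None)     # turn tracing back off
```

The stdlib trace module wraps the same machinery for whole programs: python -m trace --trace yourscript.py prints each line as it executes.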
You could look for a cross-reference program. There is an old program called pyxr that does this. The aim of a cross-reference is to show you how classes refer to each other. Some of the IDEs also do this sort of thing.
I'd recommend running the program inside an IDE like PyDev or PyCharm. Being able to stop the program and inspect its state can be very helpful.
Related
I would like to have my print output length limited to X characters.
I've been looking for some info and I found the function textwrap.fill, which does just what I was looking for, used like this:
print(textwrap.fill("Hello world", X))
However, I was wondering if there is a way to apply this length limit to every print call without writing it each time (I have plenty of them), by creating or setting up a class or something at the very beginning of the script.
Monkeypatching print is doable, but it's not a good idea. For one thing, what do you want to happen if someone does a print(spam, eggs)? Or print(spam, end='')? Or print(spam, file=outfile)?
A better solution is probably replacing sys.stdout with a wrapper.
The normal sys.stdout is a plain old text file object, a TextIOWrapper just like the ones you get from open, except that when you write to it, it goes to the console instead of to a file on disk.
And you're allowed to replace it with anything else that meets the TextIOBase protocol.
And writing a TextIOBase is really simple. All you really need to implement is write and/or read and readline (depending on whether you're wrapping output, input, or both), and all your wrapper needs to do in write is buffer up lines, fill them, and pass them to the real file object underneath.
Like this:
import io
import sys
import textwrap

class Filler(io.TextIOBase):
    def __init__(self, file, width=70):
        self.file = file
        self.textwrapper = textwrap.TextWrapper(width=width)
        self.buf = ''
    def write(self, buf):
        self.buf += buf
        lines = self.buf.split('\n')
        self.buf = lines.pop()  # keep any trailing partial line buffered
        for line in lines:
            self.file.write(self.textwrapper.fill(line) + '\n')
    def close(self):
        if self.buf:
            self.file.write(self.textwrapper.fill(self.buf))
            self.buf = ''
        self.file.close()

sys.stdout = Filler(sys.stdout, 32)
print('Spam spam spam spammity ' * 10)
print('Spam', 'Eggs')
sys.stdout.textwrapper.width = 72
print('Spam ' + 'spam ' * 50, 'and eggs', sep='... ')
print('Spam', end=' ')
print('Eggs', end=' ')
print('Cheese')
Technically, I think I may be cheating in a few ways here:
The docs say the ABC TextIOBase wants detach, read, and readline, even if they don't make sense here. But the ABC doesn't seem to enforce them as abstract methods, so I didn't bother.
I think it's legal (and it works) to leave encoding and errors set to None, since we're just passing through to another TextIOBase and expecting it to do the encoding, but I can't find anything that says it's legal. And if some code were to test sys.stdout.encoding to see if it's UTF-8 or something, that might be trouble.
Similarly for newlines. And, since I haven't tested on Windows, I can't be as sure that it works.
Also, forwarding other methods to self.file might be a good idea, like fileno() and isatty(). But I'd worry that any app that wants to access stdout as a TTY probably needs to know about the Filler that we stuck in front of it, not just transparently go through it.
This is of course all Python 3-specific. In Python 2:
sys.stdout is a file, not a TextIOWrapper. The API you need to wrap is a bit different, and not nearly as well defined.
Unless you __future__ up the 3.x-style print function, print is a statement, so you can't monkeypatch it. (I mean, you could write an import hook that bytecode-hacks out every PRINT_* bytecode, or maybe even inject a .so that replaces PyObject_Print… but who cares anyway? It's Python 2.)
I'm writing a private online Python interpreter for VK, which would closely simulate the IDLE console. Only me and some people on a whitelist would be able to use this feature, so there's no unsafe code which can harm my server. But I have a little problem. For example, I send the string with the code def foo():, and I don't want to get a SyntaxError, but to continue defining the function line by line without writing long strings with \n. exec() and eval() don't suit me in that case. What should I use to get the desired effect? Sorry if this is a duplicate; I still couldn't work it out from similar questions.
The Python standard library provides the code and codeop modules to help you with this. The code module just straight-up simulates the standard interactive interpreter:
import code
code.interact()
It also provides a few facilities for more detailed control and customization of how it works.
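For instance, code.InteractiveConsole lets you feed source one line at a time; its push() method returns True while the command is still incomplete and executes it once it is complete. A sketch (greet is just a made-up example function):

```python
import code

console = code.InteractiveConsole()

# push() returns True while more input is needed, and False once the
# accumulated source forms a complete command and has been executed.
assert console.push('def greet(name):')          # incomplete block
assert console.push('    return "Hi, " + name')  # block may still continue
assert not console.push('')                      # blank line ends and runs it
console.push('print(greet("VK"))')               # prints: Hi, VK
```

This is essentially the loop your VK bot needs: keep buffering lines for a user while push() returns True, and show output once it returns False.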
If you want to build things up from more basic components, the codeop module provides a command compiler that remembers __future__ statements and recognizes incomplete commands:
import codeop

compiler = codeop.CommandCompiler()

try:
    codeobject = compiler(some_source_string)
    # codeobject is an executable code object if some_source_string was a
    # complete command, or None if the command is incomplete.
except (SyntaxError, OverflowError, ValueError):
    # If some_source_string is invalid, we end up here.
    # OverflowError and ValueError can occur in some cases involving
    # invalid literals.
    pass
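Concretely, the compiler distinguishes incomplete input (result None) from complete input the same way the interactive prompt does, with a blank line ending a block (add here is just a made-up example):

```python
import codeop

compiler = codeop.CommandCompiler()

# An unfinished block compiles to None instead of raising SyntaxError.
assert compiler('def add(a, b):') is None
assert compiler('def add(a, b):\n    return a + b') is None  # may continue
# A trailing blank line ends the block, just like at the >>> prompt.
code_obj = compiler('def add(a, b):\n    return a + b\n')
exec(code_obj)
print(add(2, 3))  # prints: 5
```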
It boils down to reading input, then running

exec <code> in globals, locals

in an infinite loop.
See e.g. IPython.frontend.terminal.console.interactiveshell.TerminalInteractiveShell.mainloop().
Continuation detection is done in inputsplitter.push_accepts_more() by trying ast.parse().
Actually, the IPython project already provides an interactive web console, the Jupyter Notebook, so your best bet may be to reuse it.
In my Python file, I have made a GUI widget that takes some inputs from the user. I have also imported a Python module that takes some input using raw_input(). I have to use this module as it is; I have no right to change it. When I run my Python file, it asks me for the inputs (due to the imported module's raw_input()). I want to use the GUI widget inputs in their place.
How can I pass the user input (taken from the widget) to the imported module's raw_input()?
First, if importing it directly into your script isn't actually a requirement (and it's hard to imagine why it would be), you can just run the module (or a simple script wrapped around it) as a separate process, using subprocess or pexpect.
Let's make this concrete. Say you want to use this silly module foo.py:
def bar():
    x = raw_input("Gimme a string")
    y = raw_input("Gimme another")
    return 'Got two strings: {}, {}'.format(x, y)
First write a trivial foo_wrapper.py:
import foo
print(foo.bar())
Now, instead of calling foo.bar() directly in your real script, run foo_wrapper as a child process.
I'm going to assume that you already have the input you want to send it in a string, because that makes the irrelevant parts of the answer simpler (in fact, it makes them possible—if you wanted to use some GUI code for that, there's really no way I could show you how unless you first tell us which GUI library you're using).
So:
import subprocess
import sys

foo_input = 'String 1\nString 2\n'
with subprocess.Popen([sys.executable, 'foo_wrapper.py'],
                      stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                      universal_newlines=True) as p:  # text mode, so we can pass str
    foo_output, _ = p.communicate(foo_input)
Of course in real life you'll want to use an appropriate path for foo_wrapper.py instead of assuming that it's in the current working directory, but this should be enough to illustrate the idea.
Meanwhile, if "I have no right to change it" just means "I don't (and shouldn't) have checkin rights to the foo project's github site or the relevant subtree on our company's P4 server" or whatever, there's a really easy answer: Fork it, and change the fork.
Even if it's got a weak copyleft license like LGPL: fork it, change the fork, publish your fork under the same license as the original, then use your fork.
If you're depending on the foo package being installed on every target system, and can't depend on your replacement foo being installed instead, that's a bit more of a problem. But if the function or method that actually calls raw_input is just a small fraction of the actual code in foo, you can fix that by monkeypatching foo at runtime.
And that leads to the last-ditch possibility: You can always monkeypatch raw_input itself.
Again, I'm going to assume that you already have the input you need to give it to make things simpler.
So, first you write a replacement function:
foo_input = ['String 1', 'String 2']
def fake_raw_input(prompt=''):
    # raw_input returns the line without its trailing newline; pop from
    # the front so the canned answers come out in order.
    return foo_input.pop(0)
Now, there are two ways you can patch this in. Usually, you want to do this:
import foo
foo.raw_input = fake_raw_input
This means any code in foo that calls raw_input will see the function you crammed into its module globals instead of the normal builtin. Unless it does something really funky (like looking up the builtin directly and copying it to a local variable or something), this is the answer.
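A self-contained sketch of the idea. Since foo here is hypothetical, a stand-in module is built with types.ModuleType; it uses Python 3's input, which plays the role raw_input plays in 2.x:

```python
import types

# Stand-in for the unchangeable third-party module; in real life this
# would just be `import foo`, with foo.bar() calling raw_input()/input().
foo = types.ModuleType('foo')
exec(
    "def bar():\n"
    "    x = input('Gimme a string')\n"
    "    y = input('Gimme another')\n"
    "    return 'Got two strings: {}, {}'.format(x, y)",
    foo.__dict__,
)

# Values collected from the GUI, in the order they should be consumed.
answers = iter(['String 1', 'String 2'])
# Cram a replacement into foo's module globals, shadowing the builtin.
foo.input = lambda prompt='': next(answers)

print(foo.bar())  # prints: Got two strings: String 1, String 2
```

Because name lookup checks module globals before builtins, code inside foo now sees the fake function and never touches the console.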
If you need to handle one of those really funky edge cases, and you don't mind doing something questionable, you can do this:
import __builtin__
__builtin__.raw_input = fake_raw_input
You must do this before the first import foo anywhere in your program. Also, it's not clear whether this is intentionally guaranteed to work, accidentally guaranteed to work (and should be fixed in the future), or not guaranteed to work. But it does work (at least for CPython 2.5-2.7, which is what you're probably using).
I'm working on a large codebase that uses print statements for logging rather than Python logging. I'm wondering if there is a recommended way of converting all these print statements into calls to logging.info. Many of these prints span several lines, so any solution needs to handle those cases and ideally maintain formatting.
I've looked into Python rope, but that doesn't seem to have the facility to convert a statement like print into a function call.
You could use 2to3 and only apply the fix for print statement -> print function.
2to3 --fix=print [yourfiles] # just displays the diff on stdout
2to3 --fix=print [yourfiles] --write # also saves the changes to disk
This should automatically handle all those strange cases, and then converting print functions to logging functions should be a straightforward find-and-replace with, e.g., sed.
If you don't have the shortcut for the 2to3 script for some reason, run lib2to3 as a module instead:
python -m lib2to3 --fix=print .
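The second step (print functions to logging calls) can also be sketched in Python rather than sed. This is a naive regex pass that assumes plain print( calls; it doesn't understand strings or comments containing "print(", so treat it as a starting point:

```python
import re

# Example source after the 2to3 print fixer has run.
source = '''\
print("starting up")
print("value: %s" % x)
'''

# Naive rewrite: every call spelled `print(` becomes `logging.info(`.
converted = re.sub(r'\bprint\(', 'logging.info(', source)
print(converted)
```

Because 2to3 already normalized every print into a function call, the multi-line cases reduce to ordinary call syntax that this kind of textual replacement can handle.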
Just add these few lines before your code starts, and everything printed to stdout will also be logged. I think you are looking for something like this:

import logging
import sys

class Writer(object):
    def __init__(self, *writers):
        self.writers = writers
    def write(self, text):
        if text.strip():  # skip the bare newlines print emits separately
            logging.warning(text)
        for w in self.writers:
            w.write(text)
    def flush(self):
        for w in self.writers:
            w.flush()

saved = sys.stdout
sys.stdout = Writer(saved)

print "There you go."
print "There you go2."
It turns out that "with" is a funny word to search for on the internet.
Does anyone know what the deal is with nesting with statements in Python?
I've been tracking down a very slippery bug in a script I've been writing and I suspect that it's because I'm doing this:
with open(file1) as fsock1:
    with open(file2, 'a') as fsock2:
        fstring1 = fsock1.read()
        fstring2 = fsock2.read()
Python throws up when I try to read() from fsock2. Upon inspection in the debugger, this is because it thinks the file is empty. This wouldn't be worrisome except for the fact that running the exact same code in the debugging interpreter, not in a with statement, shows me that the file is, in fact, quite full of text...
I'm going to proceed on the assumption that for now nesting with statements is a no-no, but if anyone who knows more has a different opinion, I'd love to hear it.
I found the solution in Python's docs: see the with statement documentation for Python 3 or Python 2.
If you are running Python 2.7+, you can use it like this:
with open(file1) as fsock1, open(file2, 'a') as fsock2:
    fstring1 = fsock1.read()
    fstring2 = fsock2.read()
This way you avoid unnecessary indentation.
AFAIK you can't read a file opened in append mode 'a'.
Upon inspection in the debugger, this is because it thinks the file is empty.
I think that happens because it can't actually read anything. Even if it could, when you append to a file, the seek pointer is moved to the end of the file in preparation for writing.
These with statements work just fine for me:
with open(file1) as f:
    with open(file2, 'r') as g:  # Read, not append.
        fstring1 = f.read()
        fstring2 = g.read()
Note that use of contextlib.nested, as another poster suggested, is potentially fraught with peril here. Let's say you do this:
with contextlib.nested(open(file1, "wt"), open(file2)) as (f_out, f_in):
    ...
The context managers here get created one at a time. That means that if the opening of file2 fails (say, because it doesn't exist), then you won't be able to properly finalize file1 and you'll have to leave it up to the garbage collector. That's potentially a Very Bad Thing.
There is no problem with nesting with statements -- rather, you're opening file2 for append, so you can't read from it.
If you do dislike nesting with statements, for whatever reason, you can often avoid that with the contextlib.nested function. However, it won't make broken code (e.g., code that opens a file for append and then tries to read it instead) work, nor will lexically nesting with statements break code that's otherwise good.
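In Python 3, contextlib.ExitStack (3.3+) is the robust replacement for contextlib.nested: it enters the managers one at a time, but guarantees that everything already entered is cleaned up if a later one fails. A self-contained sketch (the temp files exist only to make it runnable):

```python
import contextlib
import os
import tempfile

# Create two throwaway files so the sketch is self-contained.
paths = []
for text in ('first', 'second'):
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    paths.append(path)

# ExitStack enters the context managers one at a time; if a later open()
# fails, every file already opened is still closed on the way out.
with contextlib.ExitStack() as stack:
    files = [stack.enter_context(open(p)) for p in paths]
    contents = [f.read() for f in files]

print(contents)  # prints: ['first', 'second']

for p in paths:
    os.remove(p)
```

This also handles a variable number of files, which neither nesting nor the comma-separated with syntax can do.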
As of Python 3.10 you can do it like this:
with (
    Something() as example1,
    SomethingElse() as example2,
    YetSomethingMore() as example3,
):
    ...
This can be helpful in pytest when you want to apply nested patches in some autouse fixture, like so:
from unittest.mock import MagicMock, patch

import pytest

@pytest.fixture(scope="session", autouse=True)
def setup():
    with (
        patch("something.Slow", MagicMock()) as slow_mock,
        patch("something.Expensive") as expensive_mock,
        patch("other.ThirdParty") as third_party_mock,
    ):
        yield
As for searching for "with", prefixing a word with '+' will prevent Google from ignoring it.