Catching calls to __str__ on memoryview() object


I'm starting to port some code from Python2.x to Python3.x, but before I make the jump I'm trying to modernise it to recent 2.7. I'm making good progress with the various tools (e.g. futurize), but one area they leave alone is the use of buffer(). In Python3.x buffer() has been removed and replaced with memoryview() which in general looks to be cleaner, but it's not a 1-to-1 swap.
One way in which they differ is:
In [1]: a = "abcdef"
In [2]: b = buffer(a)
In [3]: m = memoryview(a)
In [4]: print b, m
abcdef <memory at 0x101b600e8>
That is, str(<buffer object>) returns a byte-string containing the contents of the object, whereas memoryviews return their repr(). I think the new behaviour is better, but it's causing issues.
In particular I've got some code which is throwing an exception because it's receiving a byte-string containing <memory at 0x1016c95a8>. That suggests that there's a piece of code somewhere else that is relying on this behaviour to work, but I'm having real trouble finding it.
Does anybody have a good debugging trick for this type of problem?

One possible trick is to write a subclass of memoryview and temporarily change all your memoryview instances to, let's say, memoryview_debug versions:
class memoryview_debug(memoryview):
    def __init__(self, string):
        memoryview.__init__(self, string)
    def __str__(self):
        # ... place a breakpoint, log the call, print stack trace, etc.
        return memoryview.__str__(self)
EDIT:
As the OP noted, memoryview cannot be subclassed (the class statement above fails with TypeError: type 'memoryview' is not an acceptable base type). Fortunately, thanks to dynamic typing, that's not a big problem in Python; it's just more inconvenient. You can change inheritance to composition:
class memoryview_debug(object):
    def __init__(self, string):
        self.innerMemoryView = memoryview(string)
    def tobytes(self):
        return self.innerMemoryView.tobytes()
    def tolist(self):
        return self.innerMemoryView.tolist()
    # some other methods if used by your code
    # and if overridden in memoryview implementation (e.g. __len__?)
    def __str__(self):
        # ... place a breakpoint, log the call, print stack trace, etc.
        return self.innerMemoryView.__str__()
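A Python 3 sketch of the same composition idea, assuming you would rather not hand-write a forwarding method for every memoryview method. It swaps in __getattr__ (not part of the answer above) to forward ordinary attribute lookups; special methods such as __len__ still need explicit definitions because Python looks them up on the type. MemoryViewDebug is a hypothetical name, and note that in Python 3 memoryview needs a bytes-like object, not a str.

```python
import traceback


class MemoryViewDebug(object):
    """Hypothetical stand-in for memoryview that reports every str() call."""

    def __init__(self, obj):
        self._mv = memoryview(obj)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so tobytes(), tolist(),
        # nbytes, readonly, ... are forwarded to the real memoryview.
        return getattr(self._mv, name)

    # Special methods are looked up on the type, not via __getattr__,
    # so the ones your code relies on must be forwarded explicitly.
    def __len__(self):
        return len(self._mv)

    def __getitem__(self, index):
        return self._mv[index]

    def __str__(self):
        # Print the call stack to stderr so the code relying on str() shows up.
        traceback.print_stack()
        return str(self._mv)


m = MemoryViewDebug(b"abcdef")
print(m.tobytes())  # b'abcdef'
s = str(m)          # prints a stack trace, then returns '<memory at 0x...>'
```

Swapping traceback.print_stack() for a breakpoint or a logging call works the same way; the point is only that the wrapper gets control whenever something converts the view to a string.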

Related

How to stack multiple calls? [duplicate]

I'm trying to create a function that chains results from multiple arguments.
def hi(string):
    print(string)
    return hi
Calling hi("Hello")("World") works and prints Hello and World on separate lines, as expected.
The problem comes when I want to combine the results into a single string: return string + hi produces an error, since hi is a function.
I've tried using __str__ and __repr__ to change how hi behaves when it has no input, but this only creates a different problem elsewhere:
hi("Hello")("World") = "Hello"("World") -> Naturally produces an error.
I understand why the program cannot solve it, but I cannot find a solution to it.

You're running into difficulty here because the result of each call to the function must itself be callable (so you can chain another function call), while at the same time also being a legitimate string (in case you don't chain another function call and just use the return value as-is).
Fortunately Python has you covered: any type can be made to be callable like a function by defining a __call__ method on it. Built-in types like str don't have such a method, but you can define a subclass of str that does.
class hi(str):
    def __call__(self, string):
        return hi(self + '\n' + string)
This isn't very pretty and is sorta fragile (i.e. you will end up with regular str objects when you do almost any operation with your special string, unless you override all methods of str to return hi instances instead) and so isn't considered very Pythonic.
In this particular case it wouldn't much matter if you end up with regular str instances when you start using the result, because at that point you're done chaining function calls, or should be in any sane world. However, this is often an issue in the general case where you're adding functionality to a built-in type via subclassing.
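A quick check of both claims, assuming Python 3: the chained result is a real str, but ordinary str methods hand back plain str objects, which can no longer be called.

```python
class hi(str):
    def __call__(self, string):
        # Each call returns a new hi instance, so the result stays callable.
        return hi(self + '\n' + string)


greeting = hi("Hello")("World")
print(greeting)                   # Hello and World on two lines
print(isinstance(greeting, str))  # True: usable anywhere a str is expected

shouted = greeting.upper()
print(type(shouted).__name__)     # str: the subclass is lost
print(callable(shouted))          # False: the chain cannot continue
```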
To a first approximation, the question in your title can be answered similarly:
class add(int): # could also subclass float
    def __call__(self, value):
        return add(self + value)
To really do add() right, though, you want to be able to return a callable subclass of the result type, whatever type it may be; it could be something besides int or float. Rather than trying to catalog these types and manually write the necessary subclasses, we can dynamically create them based on the result type. Here's a quick-and-dirty version:
class AddMixIn(object):
    def __call__(self, value):
        return add(self + value)

def add(value, _classes={}):
    t = type(value)
    if t not in _classes:
        _classes[t] = type("add_" + t.__name__, (t, AddMixIn), {})
    return _classes[t](value)
Happily, this implementation works fine for strings, since they can be concatenated using +.
Once you've started down this path, you'll probably want to do this for other operations too. It's a drag copying and pasting basically the same code for every operation, so let's write a function that writes the functions for you! Just specify a function that actually does the work, i.e., takes two values and does something to them, and it gives you back a function that does all the class munging for you. You can specify the operation with a lambda (anonymous function) or a predefined function, such as one from the operator module. Since it's a function that takes a function and returns a function (well, a callable object), it can also be used as a decorator!
def chainable(operation):
    class CallMixIn(object):
        def __call__(self, value):
            return do(operation(self, value))
    def do(value, _classes={}):
        t = type(value)
        if t not in _classes:
            _classes[t] = type(t.__name__, (t, CallMixIn), {})
        return _classes[t](value)
    return do
add = chainable(lambda a, b: a + b)

# or...
import operator
add = chainable(operator.add)

# or as a decorator...
@chainable
def add(a, b): return a + b
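A self-contained run of the chainable version applied as a decorator; join_lines is a hypothetical second operation added only for illustration. Because the generated classes reuse the result type's name, type(total).__name__ reports 'int' even though the object is actually a callable subclass of int.

```python
def chainable(operation):
    """Turn a two-argument function into one whose results can be called again."""
    class CallMixIn(object):
        def __call__(self, value):
            # Apply the operation, then wrap the result so it is callable too.
            return do(operation(self, value))

    def do(value, _classes={}):
        # _classes is a deliberate cache: one generated subclass per result type.
        t = type(value)
        if t not in _classes:
            _classes[t] = type(t.__name__, (t, CallMixIn), {})
        return _classes[t](value)

    return do


@chainable
def add(a, b):
    return a + b


@chainable
def join_lines(a, b):
    return a + '\n' + b


total = add(1)(2)(3)
print(total)                       # 6
print(isinstance(total, int))      # True
print(join_lines("Hello")("World"))
```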
In the end it's still not very pretty and is still sorta fragile and still wouldn't be considered very Pythonic.
If you're willing to use an additional (empty) call to signal the end of the chain, things get a lot simpler, because you just need to return functions until you're called with no argument:
def add(x):
    return lambda y=None: x if y is None else add(x+y)
You call it like this:
add(3)(4)(5)() # 12

You are getting into some deep, Haskell-style, type-theoretical issues by having hi return a reference to itself. Instead, just accept multiple arguments and concatenate them in the function.
def hi(*args):
    return "\n".join(args)
Some example usages:
print(hi("Hello", "World"))
print("Hello\n" + hi("World"))

Is it possible to "hack" Python's print function?

Note: This question is for informational purposes only. I am interested to see how deep into Python's internals it is possible to go with this.
Not very long ago, a discussion began inside a certain question regarding whether the strings passed to print statements could be modified after/during the call to print has been made. For example, consider the function:
def print_something():
print('This cat was scared.')
Now, when print_something() is run, the output to the terminal should display:
This dog was scared.
Notice the word "cat" has been replaced by the word "dog". Something somewhere somehow was able to modify those internal buffers to change what was printed. Assume this is done without the original code author's explicit permission (hence, hacking/hijacking).
This comment from the wise @abarnert, in particular, got me thinking:
There are a couple of ways to do that, but they're all very ugly, and
should never be done. The least ugly way is to probably replace the
code object inside the function with one with a different co_consts
list. Next is probably reaching into the C API to access the str's
internal buffer. [...]
So, it looks like this is actually possible.
Here's my naive way of approaching this problem:
>>> import inspect
>>> exec(inspect.getsource(print_something).replace('cat', 'dog'))
>>> print_something()
This dog was scared.
Of course, exec is bad, but that doesn't really answer the question, because it does not actually modify anything during or after the call to print.
How would it be done as @abarnert has explained it?

First, there's actually a much less hacky way. All we want to do is change what print prints, right?
_print = print
def print(*args, **kw):
    args = (arg.replace('cat', 'dog') if isinstance(arg, str) else arg
            for arg in args)
    _print(*args, **kw)
Or, similarly, you can monkeypatch sys.stdout instead of print.
Also, nothing wrong with the exec … getsource … idea. Well, of course there's plenty wrong with it, but less than what follows here…
But if you do want to modify the function object's code constants, we can do that.
If you really want to play around with code objects for real, you should use a library like bytecode (when it's finished) or byteplay (until then, or for older Python versions) instead of doing it manually. Even for something this trivial, the CodeType initializer is a pain; if you actually need to do stuff like fixing up lnotab, only a lunatic would do that manually.
Also, it goes without saying that not all Python implementations use CPython-style code objects. This code will work in CPython 3.7, and probably all versions back to at least 2.2 with a few minor changes (and not the code-hacking stuff, but things like generator expressions), but it won't work with any version of IronPython.
import types

def print_function():
    print ("This cat was scared.")

def main():
    # A function object is a wrapper around a code object, with
    # a bit of extra stuff like default values and closure cells.
    # See inspect module docs for more details.
    co = print_function.__code__
    # A code object is a wrapper around a string of bytecode, with a
    # whole bunch of extra stuff, including a list of constants used
    # by that bytecode. Again see inspect module docs. Anyway, inside
    # the bytecode for print_function (which you can read by typing
    # dis.dis(print_function) in your REPL), there's going to be an
    # instruction like LOAD_CONST 1 to load the string literal onto
    # the stack to pass to the print function, and that works by just
    # reading co.co_consts[1]. So, that's what we want to change.
    consts = tuple(c.replace("cat", "dog") if isinstance(c, str) else c
                   for c in co.co_consts)
    # Unfortunately, code objects are immutable, so we have to create
    # a new one, copying over everything except for co_consts, which
    # we'll replace. And the initializer has a zillion parameters.
    # Try help(types.CodeType) at the REPL to see the whole list.
    co = types.CodeType(
        co.co_argcount, co.co_kwonlyargcount, co.co_nlocals,
        co.co_stacksize, co.co_flags, co.co_code,
        consts, co.co_names, co.co_varnames, co.co_filename,
        co.co_name, co.co_firstlineno, co.co_lnotab,
        co.co_freevars, co.co_cellvars)
    print_function.__code__ = co
    print_function()

main()
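The types.CodeType(...) call above matches the 3.7 constructor. On CPython 3.8+ it breaks, because 3.8 added a posonlyargcount parameter (and later releases changed the signature again). From 3.8 on, code objects also have a replace() method that copies every field except the ones you pass. This sketch assumes CPython 3.8 or later; patch_constants is a hypothetical helper name, not part of any library.

```python
def print_function():
    print("This cat was scared.")


def patch_constants(func, old, new):
    """Swap a substring in every str constant of func's code object."""
    co = func.__code__
    consts = tuple(c.replace(old, new) if isinstance(c, str) else c
                   for c in co.co_consts)
    # code.replace() builds a new code object with only co_consts changed,
    # so there's no need to spell out the constructor's long argument list.
    func.__code__ = co.replace(co_consts=consts)


patch_constants(print_function, "cat", "dog")
print_function()  # This dog was scared.
```

This still edits a code object by hand, so the caveats below about what can go wrong apply just the same; replace() only removes the constructor bookkeeping.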
What could go wrong with hacking up code objects? Mostly just segfaults, RuntimeErrors that eat up the whole stack, more normal RuntimeErrors that can be handled, or garbage values that will probably just raise a TypeError or AttributeError when you try to use them. For example, try creating a code object with just a RETURN_VALUE with nothing on the stack (bytecode b'S\0' for 3.6+, b'S' before), or with an empty tuple for co_consts when there's a LOAD_CONST 0 in the bytecode, or with varnames decremented by 1 so the highest LOAD_FAST actually loads a freevar/cellvar cell. For some real fun, if you get the lnotab wrong enough, your code will only segfault when run in the debugger.
Using bytecode or byteplay won't protect you from all of those problems, but they do have some basic sanity checks, and nice helpers that let you do things like insert a chunk of code and let it worry about updating all offsets and labels so you can't get it wrong, and so on. (Plus, they keep you from having to type in that ridiculous 6-line constructor, and having to debug the silly typos that come from doing so.)
Now on to #2.
I mentioned that code objects are immutable. And of course the consts are a tuple, so we can't change that directly. And the thing in the const tuple is a string, which we also can't change directly. That's why I had to build a new string to build a new tuple to build a new code object.
But what if you could change a string directly?
Well, deep enough under the covers, everything is just a pointer to some C data, right? If you're using CPython, there's a C API to access the objects, and you can use ctypes to access that API from within Python itself, which is such a terrible idea that they put a pythonapi right there in the stdlib's ctypes module. :) The most important trick you need to know is that id(x) is the actual pointer to x in memory (as an int).
Unfortunately, the C API for strings won't let us safely get at the internal storage of an already-frozen string. So screw safely, let's just read the header files and find that storage ourselves.
If you're using CPython 3.4 - 3.7 (it's different for older versions, and who knows for the future), a string literal from a module that's made of pure ASCII is going to be stored using the compact ASCII format, which means the struct ends early and the buffer of ASCII bytes follows immediately in memory. This will break (as in probably segfault) if you put a non-ASCII character in the string, or certain kinds of non-literal strings, but you can read up on the other 4 ways to access the buffer for different kinds of strings.
To make things slightly easier, I'm using the superhackyinternals project off my GitHub. (It's intentionally not pip-installable because you really shouldn't be using this except to experiment with your local build of the interpreter and the like.)
import ctypes
import internals # https://github.com/abarnert/superhackyinternals/blob/master/internals.py

def print_function():
    print ("This cat was scared.")

def main():
    for c in print_function.__code__.co_consts:
        if isinstance(c, str):
            idx = c.find('cat')
            if idx != -1:
                # Too much to explain here; just guess and learn to
                # love the segfaults...
                p = internals.PyUnicodeObject.from_address(id(c))
                assert p.compact and p.ascii
                addr = id(c) + internals.PyUnicodeObject.utf8_length.offset
                buf = (ctypes.c_int8 * 3).from_address(addr + idx)
                buf[:3] = b'dog'
    print_function()

main()
If you want to play with this stuff, int is a whole lot simpler under the covers than str. And it's a lot easier to guess what you can break by changing the value of 2 to 1, right? Actually, forget imagining, let's just do it (using the types from superhackyinternals again):
>>> n = 2
>>> pn = PyLongObject.from_address(id(n))
>>> pn.ob_digit[0]
2
>>> pn.ob_digit[0] = 1
>>> 2
1
>>> n * 3
3
>>> i = 10
>>> while i < 40:
... i *= 2
... print(i)
10
10
10
… pretend that code box has an infinite-length scrollbar.
I tried the same thing in IPython, and the first time I tried to evaluate 2 at the prompt, it went into some kind of uninterruptible infinite loop. Presumably it's using the number 2 for something in its REPL loop, while the stock interpreter isn't?

Monkey-patch print
print is a builtin function so it will use the print function defined in the builtins module (or __builtin__ in Python 2). So whenever you want to modify or change the behavior of a builtin function you can simply reassign the name in that module.
This process is called monkey-patching.
# Store the real print function in another variable otherwise
# it will be inaccessible after being modified.
_print = print

# Actual implementation of the new print
def custom_print(*args, **options):
    _print('custom print called')
    _print(*args, **options)

# Change the print function globally
import builtins
builtins.print = custom_print
After that every print call will go through custom_print, even if the print is in an external module.
However, you don't really want to print additional text; you want to change the text that is printed. One way to go about that is to replace it in the string that would be printed:
_print = print

def custom_print(*args, **options):
    # Get the desired separator or the default whitespace
    sep = options.pop('sep', None)
    if sep is None:
        sep = ' '
    # Create the final string (str() each argument, as print itself does)
    printed_string = sep.join(str(arg) for arg in args)
    # Modify the final string
    printed_string = printed_string.replace('cat', 'dog')
    # Call the default print function
    _print(printed_string, **options)

import builtins
builtins.print = custom_print
And indeed if you run:
>>> def print_something():
... print('This cat was scared.')
>>> print_something()
This dog was scared.
Or if you write that to a file:
test_file.py
def print_something():
print('This cat was scared.')
print_something()
and import it:
>>> import test_file
This dog was scared.
>>> test_file.print_something()
This dog was scared.
So it really works as intended.
However, if you only want to monkey-patch print temporarily, you can wrap this in a context manager:
import builtins

class ChangePrint(object):
    def __init__(self):
        self.old_print = print

    def __enter__(self):
        def custom_print(*args, **options):
            # Get the desired separator or the default whitespace
            sep = options.pop('sep', None)
            if sep is None:
                sep = ' '
            # Create the final string
            printed_string = sep.join(str(arg) for arg in args)
            # Modify the final string
            printed_string = printed_string.replace('cat', 'dog')
            # Call the default print function
            self.old_print(printed_string, **options)
        builtins.print = custom_print
        return self

    def __exit__(self, *args, **kwargs):
        builtins.print = self.old_print
What gets printed now depends on the context:
>>> with ChangePrint() as x:
... test_file.print_something()
...
This dog was scared.
>>> test_file.print_something()
This cat was scared.
So that's how you could "hack" print by monkey-patching.
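A function-based alternative to the ChangePrint class, sketched with contextlib.contextmanager. The try/finally restores the real print even if the body raises; replacing_print is a hypothetical name, and the old/new parameters are an assumption added to make the replacement configurable.

```python
import builtins
import contextlib


@contextlib.contextmanager
def replacing_print(old, new):
    """Temporarily make print() replace `old` with `new` in its output."""
    original_print = builtins.print

    def custom_print(*args, **options):
        sep = options.pop('sep', None)
        if sep is None:
            sep = ' '
        # str() each argument so non-string values don't break join().
        text = sep.join(str(arg) for arg in args)
        original_print(text.replace(old, new), **options)

    builtins.print = custom_print
    try:
        yield
    finally:
        # Restore the real print even if the with-body raises.
        builtins.print = original_print


def print_something():
    print('This cat was scared.')


with replacing_print('cat', 'dog'):
    print_something()  # This dog was scared.
print_something()      # This cat was scared.
```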
Modify the target instead of the print
If you look at the signature of print you'll notice a file argument that defaults to sys.stdout. Note that this default is looked up dynamically (print reads sys.stdout every time you call it), unlike normal default arguments in Python. So if you change sys.stdout, print will actually print to the new target. Even more conveniently, Python also provides a redirect_stdout function in contextlib (from Python 3.4 on, but it's easy to create an equivalent function for earlier Python versions).
The downside is that it won't work for print calls that don't print to sys.stdout, and that creating your own stdout isn't really straightforward.
import sys

class CustomStdout(object):
    def __init__(self, *args, **kwargs):
        self.current_stdout = sys.stdout

    def write(self, string):
        self.current_stdout.write(string.replace('cat', 'dog'))

    def flush(self):
        # Callers such as print(..., flush=True) expect a flush method
        self.current_stdout.flush()
With that class, this works too:
>>> import contextlib
>>> with contextlib.redirect_stdout(CustomStdout()):
... test_file.print_something()
...
This dog was scared.
>>> test_file.print_something()
This cat was scared.
Summary
Some of these points have already been mentioned by @abarnert, but I wanted to explore these options in more detail, especially how to modify print across modules (using builtins/__builtin__) and how to make that change only temporary (using context managers).

A simple way to capture all output from a print function and then process it, is to change the output stream to something else, e.g. a file.
I'll use PHP's output-buffering naming conventions (ob_start, ob_get_contents, ...).
from functools import partial

output_buffer = None
print_orig = print

def ob_start(fname="print.txt"):
    global print
    global output_buffer
    # Open the file first, then make it print's default target
    output_buffer = open(fname, 'w')
    print = partial(print_orig, file=output_buffer)

def ob_end():
    global print
    output_buffer.close()
    print = print_orig

def ob_get_contents(fname="print.txt"):
    with open(fname, 'r') as f:
        return f.read()
Usage:
print ("Hi John")
ob_start()
print ("Hi John")
ob_end()
print (ob_get_contents().replace("Hi", "Bye"))
Would print
Hi John
Bye John
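If the goal is only to capture what print writes and post-process it, the temporary file isn't required. A sketch assuming Python 3.4+ (for contextlib.redirect_stdout); capture_output and greet are hypothetical names. Because it swaps sys.stdout instead of rebinding print in one module, it also catches print calls made from other modules.

```python
import contextlib
import io


def capture_output(func, *args, **kwargs):
    """Run func and return everything it printed to sys.stdout as a string."""
    buffer = io.StringIO()
    # redirect_stdout swaps sys.stdout for the duration of the block.
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def greet():
    print("Hi John")


captured = capture_output(greet)
print(captured.replace("Hi", "Bye"), end="")  # Bye John
```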

Let's combine this with frame introspection!
import sys

_print = print

def print(*args, **kw):
    frame = sys._getframe(1)
    _print(frame.f_code.co_name)
    _print(*args, **kw)

def greetly(name, greeting="Hi"):
    print(f"{greeting}, {name}!")

class Greeter:
    def __init__(self, greeting="Hi"):
        self.greeting = greeting

    def greet(self, name):
        print(f"{self.greeting}, {name}!")
You'll find this trick precedes every greeting with a line naming the calling function or method. This can be very useful for logging or debugging, especially if you install the wrapper as builtins.print (as in the monkey-patching answer above), which lets you "hijack" print calls in third-party code as well.
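A variant, assuming you'd rather have the caller's name on the same line as the message. Like the original, it relies on sys._getframe, which is a CPython implementation detail, and it only affects print calls that resolve to this module's global print.

```python
import sys

_print = print


def print(*args, **kw):
    # sys._getframe(1) is the frame of whoever called print().
    caller = sys._getframe(1).f_code.co_name
    # Prefix the output with the caller's name on the same line.
    _print(f"[{caller}]", *args, **kw)


def greetly(name, greeting="Hi"):
    print(f"{greeting}, {name}!")


class Greeter:
    def __init__(self, greeting="Hi"):
        self.greeting = greeting

    def greet(self, name):
        print(f"{self.greeting}, {name}!")


greetly("Ann")          # [greetly] Hi, Ann!
Greeter().greet("Bob")  # [greet] Hi, Bob!
```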

Why does linearization in Python give this bizarre result?

Consider this simplified situation:
class Decoder:
    def __str__(self):
        return self.__bytes__().decode('ascii')

class Comment(Decoder, bytes):
    def __bytes__(self):
        return b'#' + self
Usage:
Comment(b'foo')
Prints:
b'foo'
Instead of expected:
#foo
Regardless of the order in Comment.mro() (i.e. I can swap Decoder and bytes in the superclass list), Decoder.__str__() is never called.
What gives?

Comment(b'foo') calls Comment.__new__, which isn't defined on Comment or on Decoder, so it resolves to bytes.__new__ whichever order you list the bases in.
With the bases as written (Decoder, bytes), the MRO for Comment is Comment, Decoder, bytes, object. However, the functions actually being called are:
Comment.__new__, to create a new object. Since that function isn't defined on Comment or Decoder, we next call bytes.__new__, which is defined. It builds a bytes object of type Comment from b'foo', giving you your final object.
To display the return value of Comment, the interactive interpreter calls Comment.__repr__, not Comment.__str__. Again, neither Comment nor Decoder defines that method, so it falls back to bytes.__repr__, giving the observed result.

If you use the print function you get the expected result (as long as Decoder is listed before bytes), but if you look at the result in the console you see the output of the __repr__ method. If you need the console to show it this way too, you can call self.__str__() from __repr__.
>>> msg = Comment(b'foo')
>>> msg
b'foo'
>>> print(msg)
#foo
>>> str(msg)
'#foo'
You can read how this works in the docs.
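A sketch of the suggestion above, assuming the bases stay in the order (Decoder, bytes); if bytes came first, bytes.__repr__ and bytes.__str__ would be found before Decoder's methods.

```python
class Decoder:
    def __str__(self):
        return self.__bytes__().decode('ascii')

    def __repr__(self):
        # Reuse __str__ so the console shows the decoded form too.
        return self.__str__()


class Comment(Decoder, bytes):  # Decoder first, so its methods win over bytes'
    def __bytes__(self):
        # bytes + Comment returns a plain bytes object: b'#foo'
        return b'#' + self


msg = Comment(b'foo')
print([c.__name__ for c in Comment.__mro__])  # ['Comment', 'Decoder', 'bytes', 'object']
print(str(msg))   # #foo
print(repr(msg))  # #foo
```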

What kind of exception to raise for unknown enum value?

Assume the following class:
class PersistenceType(enum.Enum):
    keyring = 1
    file = 2

    def __str__(self):
        type2String = {PersistenceType.keyring: "keyring", PersistenceType.file: "file"}
        return type2String[self]

    @staticmethod
    def from_string(type):
        if (type == "keyring" ):
            return PersistenceType.keyring
        if (type == "file"):
            return PersistenceType.file
        raise ???
Being a python noob, I am simply wondering: what specific kind of exception should be raised here?

The short answer is ValueError:
Raised when a built-in operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError.
The longer answer is that almost none of that class should exist. Consider:
class PersistenceType(enum.Enum):
    keyring = 1
    file = 2
This gives you everything your customized enum does:
To get the same result as your customised __str__ method, just use the name property:
>>> PersistenceType.keyring.name
'keyring'
To get a member of the enum using its name, treat the enum as a dict:
>>> PersistenceType['keyring']
<PersistenceType.keyring: 1>
Using the built-in abilities of enum.Enum gives you several advantages:
You're writing much less code.
You aren't repeating the names of the enum members all over the place, so you aren't going to miss anything if you modify it at some point.
Users of your enum, and readers of code that uses it, don't need to remember or look up any customized methods.
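If you still want a from_string-style entry point, for example to keep an existing call site working, a sketch combining both parts of the answer is to look the member up by name and turn the KeyError into a ValueError. persistence_type_from_string is a hypothetical name.

```python
import enum


class PersistenceType(enum.Enum):
    keyring = 1
    file = 2


def persistence_type_from_string(name):
    """Map a member name to a member, or raise ValueError for unknown names."""
    try:
        # Indexing the enum class looks members up by name.
        return PersistenceType[name]
    except KeyError:
        # An unknown name is a str (right type) with an inappropriate value,
        # which is what ValueError describes.
        raise ValueError(f"unknown persistence type: {name!r}") from None


print(persistence_type_from_string("keyring"))     # PersistenceType.keyring
print(persistence_type_from_string("file").name)   # file
```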
If you're coming to Python from Java, it's always worth bearing in mind that:
Python Is Not Java (or, stop writing so much code)
Guido¹ has a time machine (or, stop writing so much code)
¹ … or in this case Ethan Furman, the author of the enum module.

Python: Need the result of functools.partial(function) to be known as something else

My software supports python to automate tasks (Maya). When I undo or redo in this software it prints the last command, unfortunately for Python this is the memory address of the function rather than something actually useful. So the user sees the output Undo: <functools.partial object at 0x000002235DEDDF48> instead of something actually useful like Undo: Set Key on something at frame x
There appears to be no option to make Maya print a useful result from within its own functionality, so now I want to ask if there's some obscure way to cheese it with Python, so that the instance calls itself something useful in a way the software will print, hopefully without interfering with the functionality. I'll try anything at this point!
from functools import partial

def testFunc():
    pass

test = partial(testFunc)
test results in <functools.partial object at 0x000002235DEA95E8>
If anyone can think of a more accurate title please edit / suggest.

Thanks to kindall giving me a lead in the comments I was able to find an answer. Subclassing partial and defining __repr__() is the key.
By grabbing *args in __init__() and storing the last one as self.result, we can return it from __repr__(), so the last argument given to rpartial becomes the text Maya prints when using Undo/Redo.
class rpartial(partial):
    def __init__(self, *args):
        self.result = args[-1]
    def __repr__(self):
        return self.result

rpartial(function, arg1, arg2, undoredo)
The string given to rpartial on the last line for undoredo is what will be printed by Maya when using Undo/Redo.
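Note that rpartial also forwards the label to the wrapped function as its last positional argument. If the function shouldn't receive it, one option is to take the label in __new__ and pass only the remaining arguments on to partial. This is a sketch: LabeledPartial and the label text are hypothetical, and it assumes, as the answer found, that Maya displays the object's repr().

```python
from functools import partial


class LabeledPartial(partial):
    """partial whose repr() is a human-readable label instead of an address."""

    def __new__(cls, label, func, *args, **kwargs):
        # Build a normal partial from everything except the label...
        self = super().__new__(cls, func, *args, **kwargs)
        # ...and keep the label on the instance (partial objects have a __dict__).
        self.label = label
        return self

    def __repr__(self):
        return self.label


undo_entry = LabeledPartial("Set Key on pCube1 at frame 10", max, 3, 7)
print(repr(undo_entry))  # Set Key on pCube1 at frame 10
print(undo_entry())      # 7, i.e. max(3, 7): the label is not passed along
```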
