more-ing or less-ing output in the python interpreter

What is the best alternative to more-ing or less-ing multi-line output while running python in the interpreter mode?
Suppose there exists an object variable foo which has many properties. A dir(foo) would dump them all onto the screen, and you cannot inspect or page this output since you are presented with the interpreter prompt again immediately.
Currently the only way to inspect such data is to store it in a variable and view slices of it. For example:
>>> keys = dir(foo)
>>> len(keys)
120
>>> keys[10:20]  # viewing a sub-slice of keys
...
I am hoping that there is an alternative to this. I know that help() presents a more-like interface, but only for the documentation of the object under consideration.

help's more-like interface is provided by the pydoc module, in particular its undocumented function pager. If you convert your data to a string (perhaps using the pprint module for additional readability), you can send it to pager to get the interactive view you're looking for.
>>> import pydoc
>>> import pprint
>>> def more_vars(obj):
...     pydoc.pager(pprint.pformat(vars(obj)))
...
>>> import math
>>> more_vars(math)
{'__doc__': 'This module provides access to the mathematical functions\n'
            'defined by the C standard.',
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>,
 '__name__': 'math',
 '__package__': '',
[not pictured: about 30 more lines of methods/attributes]
 'frexp': <built-in function frexp>,
 'fsum': <built-in function fsum>,
 'gamma': <built-in function gamma>,
-- More --
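The same pager also works for the original dir() case: join the names into a single string and hand it to pager. A minimal sketch, reusing the math module from above:
>>> import pydoc, math
>>> pydoc.pager("\n".join(dir(math)))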

Why does a newly created variable in Python have a ref-count of four?

I've been working on a presentation for colleagues to explain the basic behavior of and reasoning behind the GIL, and found something I couldn't explain while putting together a quick explanation of reference counting. It appears that newly declared variables have four references, instead of the one I would expect. For example, the following code:
import sys
the_var = 'Hello World!'
print('Var created: {} references'.format(sys.getrefcount(the_var)))
results in this output:
Var created: 4 references
I validated that the output was the same if I used an integer > 100 (integers < 100 are pre-created and have a larger ref-count) or a float, and if I declared the variable within a function scope or in a loop; the outcome was always the same. The behavior also seems to be the same in 2.7.11 and 3.5.1.
I attempted to debug sys.getrefcount to see whether it was creating additional references, but was unable to step into the function (I'm assuming it is a direct thunk down to the C layer).
I know I'm gonna get at least one question on this when I present, and I'm actually pretty puzzled by the output anyway. Can anyone explain this behavior to me?
There are several scenarios that will yield a different reference count. The most straightforward is the REPL console:
>>> import sys
>>> the_var = 'Hello World!'
>>> print(sys.getrefcount(the_var))
2
Understanding this result is pretty straightforward: there is one reference on the local stack and another temporary one local to the sys.getrefcount() function (even the documentation warns about it: "the count returned is generally one higher than you might expect"). But when you run it as a standalone script:
import sys
the_var = 'Hello World!'
print(sys.getrefcount(the_var))
# 4
as you've noticed, you get 4. So what gives? Well, let's investigate... There is a very helpful interface to the garbage collector, the gc module, so let's run it in the REPL console:
>>> import gc
>>> the_var = 'Hello World!'
>>> gc.get_referrers(the_var)
[{'__builtins__': <module '__builtin__' (built-in)>, '__package__': None, 'the_var': 'Hello World!', 'gc': <module 'gc' (built-in)>, '__name__': '__main__', '__doc__': None}]
No surprises there: that's essentially just the current namespace (locals()), since the variable doesn't exist anywhere else. But what happens when we run that as a standalone script:
import gc
import pprint
the_var = 'Hello World!'
pprint.pprint(gc.get_referrers(the_var))
this prints out (YMMV, based on your Python version):
[['gc',
  'pprint',
  'the_var',
  'Hello World!',
  'pprint',
  'pprint',
  'gc',
  'get_referrers',
  'the_var'],
 (-1, None, 'Hello World!'),
 {'__builtins__': <module '__builtin__' (built-in)>,
  '__doc__': None,
  '__file__': 'test.py',
  '__name__': '__main__',
  '__package__': None,
  'gc': <module 'gc' (built-in)>,
  'pprint': <module 'pprint' from 'D:\Dev\Python\Py27-64\lib\pprint.pyc'>,
  'the_var': 'Hello World!'}]
Sure enough, we have two more references in the list, just as sys.getrefcount() told us, but what the hell are those? Well, when the Python interpreter parses your script it first needs to compile it to bytecode, and while it does so, it stores all the strings in a list; since that list mentions your variable as well, it counts as a reference to it.
The second, more cryptic entry ((-1, None, 'Hello World!')) comes from the peephole optimizer and is there just to optimize access (to the string reference, in this case).
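You can see the compiler holding onto the string by inspecting a compiled code object directly; a minimal sketch (constant ordering may differ between versions):
>>> code = compile("the_var = 'Hello World!'", "<string>", "exec")
>>> code.co_consts   # the string literal is kept as a constant
('Hello World!', None)
>>> code.co_names    # ...and the variable name too
('the_var',)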
Both of those are purely temporary compilation artifacts. The REPL console compiles each statement separately, so you don't see these references there; you can get the same separation in a script by 'outsourcing' the compilation from your current context:
import gc
import pprint
exec(compile("the_var = 'Hello World!'", "<string>", "exec"))
pprint.pprint(gc.get_referrers(the_var))
you'd get:
[{'__builtins__': <module '__builtin__' (built-in)>,
'__doc__': None,
'__file__': 'test.py',
'__name__': '__main__',
'__package__': None,
'gc': <module 'gc' (built-in)>,
'pprint': <module 'pprint' from 'D:\Dev\Python\Py27-64\lib\pprint.pyc'>,
'the_var': 'Hello World!'}]
and if you were to go back to the original attempt at getting the reference count via sys.getrefcount():
import sys
exec(compile("the_var = 'Hello World!'", "<string>", "exec"))
print(sys.getrefcount(the_var))
# 2
just like in the REPL console, and just as expected. The extra reference from the peephole optimizer, since it happens in place, can be discarded immediately by forcing a garbage collection (gc.collect()) before counting your references.
However, the string list created during compilation cannot be released until the whole file has been parsed and compiled, which is why, if you were to import your script from another script and then count the references to the_var from there, you'd get 3 instead of 4, just when you thought it couldn't confuse you any more ;)
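To tie that together, a minimal sketch of the collection trick from the paragraph above (exact counts vary between CPython versions):
import gc
import sys

the_var = 'Hello World!'
gc.collect()                     # drop the in-place peephole reference, as described above
print(sys.getrefcount(the_var))  # one lower than the 4 seen earlier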

Access module 'sys' without using import machinery

Sandboxing Python code is notoriously difficult due to the power of the reflection facilities built into the language. At a minimum one has to take away the import mechanism and most of the built-in functions and global variables, and even then there are holes ({}.__class__.__base__.__subclasses__(), for instance).
In both Python 2 and 3, the 'sys' module is built into the interpreter and preloaded before user code begins to execute (even in -S mode). If you can get a handle to the sys module, then you have access to the global list of loaded modules (sys.modules) which enables you to do all sorts of naughty things.
So, the question: starting from an empty module, without using the import machinery at all (no import statement, no __import__, no imp library, etc.), and also without using anything normally found in __builtins__ unless you can get a handle to it some other way, is it possible to acquire a reference to either sys or sys.modules? (Each points to the other.) I am interested in both 2.x and 3.x answers.
__builtins__ can usually be recovered, giving you a path back to __import__ and thus to any module.
For Python 3 this comment from eryksun works, for example:
>>> f = [t for t in ().__class__.__base__.__subclasses__()
...      if t.__name__ == 'Sized'][0].__len__
>>> f.__globals__['__builtins__']['__import__']('sys')
<module 'sys' (built-in)>
In Python 2, you just look for a different object:
>>> f = [t for t in ().__class__.__base__.__subclasses__()
...      if t.__name__ == 'catch_warnings'][0].__exit__.__func__
>>> f.__globals__['__builtins__']['__import__']('sys')
<module 'sys' (built-in)>
Either method looks for subclasses of a built-in type you can create with literal syntax (here a tuple), then references a function object on one of those subclasses. Function objects have a __globals__ dictionary, which gives you the __builtins__ object back.
Note that ruling out __import__ specifically doesn't help much, because it is part of __builtins__ anyway.
However, many of those __globals__ objects are bound to have sys present already. Searching for a sys module on Python 3, for example, gives me access to one in a flash:
>>> next(getattr(c, f).__globals__['sys']
...      for c in ().__class__.__base__.__subclasses__()
...      for f in dir(c)
...      if isinstance(getattr(c, f, None), type(lambda: None)) and
...         'sys' in getattr(c, f).__globals__)
<module 'sys' (built-in)>
The Python 2 version only needs to unwrap the unbound methods found on the classes to get the same result:
>>> next(getattr(c, f).__func__.__globals__['sys']
...      for c in ().__class__.__base__.__subclasses__()
...      for f in dir(c)
...      if isinstance(getattr(c, f, None), type((lambda: 0).__get__(0))) and
...         'sys' in getattr(c, f).__func__.__globals__)
<module 'sys' (built-in)>
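Once you have sys by any of these routes, sys.modules makes every already-loaded module reachable. A minimal sketch continuing from the f found above (it assumes os was already imported in this process, which is normally the case at interpreter startup):
>>> sys_mod = f.__globals__['__builtins__']['__import__']('sys')
>>> os_mod = sys_mod.modules['os']   # assumes os is already in sys.modules
>>> os_mod.system                    # shell access is now one attribute away
<built-in function system>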

Compiling a tkinter game

I understand the general process of how to compile programs using py2exe, Portable Python, and other tools, and also some of the issues that can cause problems, such as matplotlib. However, I'm curious how a compiler would handle a game that uses pickle. Would the game still be able to save and load states once it is compiled, or would it no longer have this option?
Also, if anyone doesn't mind, I'm a bit confused as to how compiling a program actually works, as in the process the compiler goes through to turn your program into an executable; a general explanation of this process would be awesome.
Basically, python interprets the lines of code with the language parser... and then compiles the parsed lines to byte code. This byte code is "compiled python".
Let's build a bit of code:
# file: foo.py
class Bar(object):
    x = 1
    def __init__(self, y):
        self.y = y
Now we import it.
>>> import foo
>>> foo
<module 'foo' from 'foo.py'>
>>> reload(foo)
<module 'foo' from 'foo.pyc'>
What you'll notice is that the first time we import foo, it says it was imported from foo.py. That's because python had to byte compile the code into a module object. Doing so, however, leaves a .pyc file in your directory... that's a compiled python file. Python prefers to use compiled code, as a time-saver as opposed to compiling the code again... so when you reload the module, python picks the compiled code to import. Basically, when you are "installing" python modules, you are just moving compiled code into somewhere python can import it (on your PYTHONPATH).
>>> import numpy
>>> numpy
<module 'numpy' from '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/__init__.pyc'>
The site-packages directory is the default place that compiled 3rd party code gets installed. Indeed a module is just a python object representation of a file. Meaning, a module instance is a compiled file. Once you "compile" the file in to a module, it's no longer going to care what's in the file... python only needs the compiled byte code after that.
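You can also trigger that byte compilation explicitly, without importing the module; a minimal sketch using the standard py_compile module:
import py_compile

# Python 2 writes foo.pyc next to the source;
# Python 3 writes __pycache__/foo.cpython-XY.pyc instead
py_compile.compile('foo.py')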
>>> import types
>>> types.ModuleType.mro()
[<type 'module'>, <type 'object'>]
>>> foo.__class__.mro()
[<type 'module'>, <type 'object'>]
>>> i = object()
>>> object
<type 'object'>
>>> i
<object object at 0x1056f60b0>
Here we see (using types) that foo is an instance of ModuleType... so basically a compiled file. mro() shows that modules inherit from object, the primary base type in Python. (Yes, it's object-oriented.)
Here i is an instance of an object, just as foo is an instance of a ModuleType. Python works with instances of compiled objects, not the underlying code... just like (almost?) every other language. So, when you work with a class that you built in the foo module, you are working with the byte compiled instance of the class. You can dynamically modify the class instance, by adding methods on-the-fly... and it doesn't change the underlying file foo.py... but it does alter the byte-compiled instance of the module foo that's held in memory.
>>> zap = foo.Bar(2)
>>> zap.x, zap.y
(1, 2)
>>> foo.Bar
<class 'foo.Bar'>
>>> foo.Bar.mro()
[<class 'foo.Bar'>, <type 'object'>]
>>>
>>> def wow(self):
... return self.x + self.y
...
>>> wow(zap)
3
>>> foo.Bar.wow = wow
>>> foo.Bar.wow(zap)
3
>>> zap.wow()
3
Again, the file foo.py would be unchanged... however, I added wow to the class Bar, so it's usable as if wow had been in the code in the first place. So working with "compiled" python is not static at all... it just means that you are working with code that was byte compiled to save some time the first time it was imported. Note that since the module foo is an instance, you can also edit it in memory (not just objects that already live in its contents).
>>> foo.square = lambda x:x**2
>>>
>>> from foo import square
>>> square(3)
9
Here I added square to foo -- not to foo.py, but to the byte-compiled copy of foo that lives in memory.
So can you pickle and unpickle objects in compiled code? Absolutely. You are probably doing that already if you've used pickle.
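To make the pickle point concrete, here's a minimal round-trip sketch using the Bar class from above; nothing about it changes once the module is byte compiled or bundled by py2exe, as long as the class's module is importable when you load:
import pickle
import foo

zap = foo.Bar(2)
data = pickle.dumps(zap)       # "save" the state to a byte string
restored = pickle.loads(data)  # "load" it back
assert (restored.x, restored.y) == (1, 2)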
P.S. If you are talking about building C++ extensions to python, and compiling the code to shared libraries... it's still fundamentally no different.
If you are looking for some nitty-gritty details on byte compiling, check out my question and answer here: How is a python function's name reference found inside it's declaration?.

Accessing the print function from globals()

Apologies in advance for conflating functions and methods, I don't have time at the moment to sort out the terminology but I'm aware of the distinction (generally).
I'm trying to control what functions are run by my script via command-line arguments. After a lot of reading here and elsewhere, I'm moving in the direction of the following example.
# After connecting to a database with MySQLdb and defining a cursor...
cursor.execute(some_query_stored_earlier)
for row in cursor:
    for method_name in processing_methods:  # ('method1', 'method2', ...)
        globals()[method_name](row)
(Clarification: processing_methods is a tuple of user-defined strings via command-line argument(s) with nargs='*'.)
However, I'm running into problems with print (no surprise there). I would like print to be:
among the methods that MIGHT be specified from the command line;
the default method when NO methods are specified from the command line;
not performed if ONLY OTHER methods are specified from the command line.
Let me acknowledge that I can make things easier on myself by eliminating the first and third criteria and simply doing:
for row in cursor:
    print row
    for method_name in processing_methods:
        globals()[method_name](row)
But I really don't want to ALWAYS print every row in what will sometimes be a several-million-row result. I did a __future__ import, hoping that would solve my problem; no such luck. So I did a little exploring:
>>> from __future__ import print_function
>>> print
<built-in function print>
>>> globals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, 'print_function': _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 65536), '__package__': None}
>>> a = "Hello, world!"
>>> print(a)
Hello, world!
>>> globals()['print'](a)
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    globals()['print'](a)
KeyError: 'print'  # Okay, no problem, obviously it's...
>>> globals()['print_function'](a)
Traceback (most recent call last):
  File "<pyshell#34>", line 1, in <module>
    globals()['print_function'](a)
AttributeError: _Feature instance has no __call__ method  # ...huh.
So then I did a little more reading, and this Q&A prompted some more exploring:
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__']
>>> __builtins__
<module '__builtin__' (built-in)>
>>> 'print' in dir(__builtins__)
True # now we're getting somewhere!
>>> __builtins__.print something
SyntaxError: invalid syntax # fair enough.
>>> __builtins__.print('something')
SyntaxError: invalid syntax # wait, what?
>>> dir(__builtins__.print)
SyntaxError: invalid syntax # -_-
Something is going on here that I just don't understand, and this other Q&A hasn't made it any clearer. I think the easy solution to my particular issue is going to be a mildly awkward wrapper like:
def printrows(row):
    print row  # assuming no future import, of course
But it's driving me crazy: Why can't I access print via the globals dictionary? Am I doing it wrong, or is it just something you can't do with built-in functions?
Did you forget to repeat from __future__ import print_function when you opened a new shell for your second try (where you got all those syntax errors)? It works for me: https://ideone.com/JOBAAk
If you do an otherwise seemingly useless assignment, it works the way I think you expected. The likely reason: globals() only contains module-level names, while the print function lives in the builtins namespace that name lookup falls back to; the assignment print = print creates a module-level binding to the same function object, which globals()['print'] can then find.
>>> from __future__ import print_function
>>> row="Hello world"
>>> print = print
>>> globals()['print'](row)
Hello world
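For what it's worth, the function can also be fetched straight from the builtins namespace, without the assignment trick (a Python 2 sketch; on Python 3 the module is named builtins):
>>> import __builtin__                        # Python 3: import builtins
>>> print_fn = getattr(__builtin__, 'print')  # the string form sidesteps the keyword
>>> print_fn('Hello world')
Hello world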

Synthetic functions in python

In python I can create a class without class statement:
MyClass = type('X', (object,), dict(a=1))
Is there a way to create a function without 'def'?
That's as far as I got...
d = {}  # func from string
exec '''\
def synthetics(s):
    return s*s+1
''' in d
>>> d.keys()
['__builtins__', 'synthetics']
>>> d['synthetics']
<function synthetics at 0x00D09E70>
>>> foo = d['synthetics']
>>> foo(1)
2
Technically, yes, this is possible. The type of a function is, like all other types, a constructor for instances of that type:
FunctionType = type(lambda: 0)
help(FunctionType)
As you can see from the help, you need at minimum code and globals. The former is a compiled bytecode object; the latter is a dictionary.
To make the code object, you can use the code type's constructor:
CodeType = type((lambda: 0).func_code)
help(CodeType)
The help says this is "not for the faint of heart" and that's true. You need to pass bytecode and a bunch of other stuff to this constructor. So the easiest way to get a code object is from another function, or using the compile() function. But it is technically possible to generate code objects completely synthetically if you understand Python bytecode well enough. (I have done this, on a very limited basis, to construct signature-preserving wrapper functions for use in decorators.)
PS -- FunctionType and CodeType are also available via the types module.
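As a concrete (if mild) example of the constructor route, you can donate the code object from an existing function instead of hand-crafting bytecode. A minimal sketch, assuming Python 2.6+ or 3, where __code__ is available:
>>> import types
>>> donor = lambda s: s*s + 1   # only used as a code-object donor
>>> synth = types.FunctionType(donor.__code__, globals(), 'synth')
>>> synth(3)
10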
There might be a more direct way than the following, but here's a full-blown function without def. First, use a trivial lambda expression to get a function object:
>>> func = lambda: None
Then, compile some source code to get a code object and use that to replace the lambda's code:
>>> func.__code__ = compile("print('Hello, world!')", "<no file>", "exec")
>>> func()
Hello, world!
