I am trying to compare the bytecode of two things with difflib, but dis.dis() always prints it to the console. Any way to get the output in a string?
If you're using Python 3.4 or later, you can get that string by using the method Bytecode.dis():
>>> import dis
>>> s = dis.Bytecode(lambda x: x + 1).dis()
>>> print(s)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 BINARY_ADD
              7 RETURN_VALUE
You also might want to take a look at dis.get_instructions(), which returns an iterator of named tuples, each corresponding to a bytecode instruction.
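Since the goal is to feed the disassembly to difflib, here is a minimal sketch of how the pieces might fit together. The functions f and g and the helper disassembly_lines are made up for illustration, and the exact instructions listed depend on the Python version.

```python
import difflib
import dis


def f(x):
    return x + 1


def g(x):
    return x + 2


def disassembly_lines(func):
    # Bytecode.dis() returns the listing as a string instead of printing it.
    return dis.Bytecode(func).dis().splitlines()


# Compare the two listings line by line.
diff = list(difflib.unified_diff(disassembly_lines(f), disassembly_lines(g),
                                 fromfile="f", tofile="g", lineterm=""))
print("\n".join(diff))

# get_instructions() yields one named tuple per instruction, which is
# handy for comparing only the parts you care about, such as opnames.
opnames = [instruction.opname for instruction in dis.get_instructions(f)]
print(opnames)
```

Comparing only opname (or opname and argrepr) avoids spurious differences caused by source line numbers, which also appear in the text listing.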
This uses StringIO to redirect stdout to a string-like object (a Python 2.7 solution):
import sys
import StringIO
import dis

def a():
    print "Hello World"

stdout = sys.stdout      # Hold onto the stdout handle
f = StringIO.StringIO()
sys.stdout = f           # Assign new stdout
dis.dis(a)               # Run dis.dis()
sys.stdout = stdout      # Reattach stdout
print f.getvalue()       # Print contents
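On Python 3 the same idea can be written without touching sys.stdout by hand. The sketch below assumes Python 3.4 or later, where contextlib.redirect_stdout exists and dis.dis() accepts a file argument; the function a is the same toy example as above.

```python
import contextlib
import dis
import io


def a():
    print("Hello World")


# Option 1: temporarily redirect stdout; it is restored on leaving the block.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    dis.dis(a)
captured = buffer.getvalue()

# Option 2: ask dis.dis() to write straight into a file-like object.
direct = io.StringIO()
dis.dis(a, file=direct)

print(captured)
```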
In C++ you can read one value at a time like this:
//from console
cin >> x;
//from file:
ifstream fin("file name");
fin >> x;
I would like to emulate this behaviour in Python. It seems, however, that the ordinary ways to get input in Python read either whole lines, the whole file, or a set number of characters.
I would like a function, let's call it one_read(), that reads from a file until it encounters either a space or a newline character, then stops. Also, on subsequent calls to one_read() the input should begin where it left off.
Examples of how it should work:
# file input.in is:
# 5 4
# 1 2 3 4 5
n = int(one_read())
k = int(one_read())
a = []
for i in range(n):
    a.append(int(one_read()))
# n = 5 , k = 4 , a = [1,2,3,4,5]
How can I do this?
I think the following should get you close. I admit I haven't tested the code carefully. It sounds like itertools.takewhile should be your friend, and a generator like yield_characters below will be useful.
from itertools import takewhile
import re

# this function yields characters from a file one at a time.
def yield_characters(file):
    with open(file, 'r') as f:
        for line in f:
            for char in line:
                yield char

# True for any character that is not whitespace.
def not_whitespace(char):
    return bool(re.match(r"\S", char))

# this uses takewhile to collect characters until the next whitespace
def read_one(file):
    chars = yield_characters(file)
    for first in chars:
        # skip whitespace between words; a non-whitespace character starts a word
        if not_whitespace(first):
            yield first + ''.join(takewhile(not_whitespace, chars))
The read_one above is a generator, so call next on it to get one word at a time, or call list on it to collect every word.
Normally you would just read a line at a time, then split this and work with each part. However if you can't do this for resource reasons, you can implement your own reader which will read one character at a time, and then yield a word each time it reaches a delimiter (or in this example also a newline or the end of the file).
This implementation uses a context manager to handle opening and closing the file, though this might be overkill:
from functools import partial

class Words():
    def __init__(self, fname, delim):
        self.delims = ['\n', delim]
        self.fname = fname
        self.fh = None

    def __enter__(self):
        self.fh = open(self.fname)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.fh.close()

    def one_read(self):
        chars = []
        # read one character at a time until read() returns '' at end of file
        for char in iter(partial(self.fh.read, 1), ''):
            if char in self.delims:
                # delimiter signifies end of word
                word = ''.join(chars)
                chars = []
                yield word
            else:
                chars.append(char)
        if chars:
            # end of file also ends the last word
            yield ''.join(chars)

# Assuming x.txt contains 12 34 567 8910
with Words('/tmp/x.txt', ' ') as w:
    print(next(w.one_read()))
    # 12
    print(next(w.one_read()))
    # 34
    print(list(w.one_read()))
    # ['567', '8910']
More or less anything that operates on files in Python can operate on the standard input and standard output. The sys standard library module defines stdin and stdout which give you access to those streams as file-like objects.
Reading a line at a time is considered idiomatic in Python because the other way is quite error-prone (just one C++ example question on Stack Overflow). But if you insist: you will have to build it yourself.
As you've found, .read(n) will read at most n text characters (technically, Unicode code points) from a stream opened in text mode. You can't tell where the end of the word is until you read the whitespace, but you can .seek back one spot - though not on the standard input, which isn't seekable.
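If you do build it yourself, a character-at-a-time reader might look like the sketch below. The name read_token is made up for illustration. It works on any text stream with a read method (including sys.stdin) because it never seeks; instead it simply consumes the one whitespace character that ends each token. An io.StringIO stands in for the file from the question.

```python
import io


def read_token(stream):
    """Return the next whitespace-delimited token, or '' at end of input."""
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "":
            # End of input: return whatever has been collected so far.
            break
        if ch.isspace():
            if chars:
                # Whitespace after at least one character ends the token.
                break
            # Leading whitespace before a token is skipped.
            continue
        chars.append(ch)
    return "".join(chars)


stream = io.StringIO("5 4\n1 2 3 4 5\n")
n = int(read_token(stream))
k = int(read_token(stream))
a = [int(read_token(stream)) for _ in range(n)]
print(n, k, a)
```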
You should also be aware that the built-in input will ignore any existing data on the standard input before prompting the user:
>>> sys.stdin.read(1) # blocks
foo
'f'
>>> # the `foo` is our input, the `'f'` is the result
>>> sys.stdin.read(1) # data is available; doesn't block
'o'
>>> input()
bar
'bar'
>>> # the second `o` from the first input was lost
Try creating a class to remember where the operation left off.
The __init__ function takes the filename; you could modify this to take a list or other iterable.
read_one checks whether there is anything left to read and, if there is, removes and returns the item at index 0 in the list, that is, everything up to the first whitespace. Once the list is empty it returns None.
class Reader:
    def __init__(self, filename):
        # Read the whole file once and split it on any whitespace
        with open(filename) as f:
            self.file_contents = f.read().split()

    def read_one(self):
        # Return the next token, or None when nothing is left
        if self.file_contents:
            return self.file_contents.pop(0)
Instantiate the class as follows and adapt to your liking:
reader = Reader(filepath)
reader.read_one()
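If the file is too large to read in one go, a generator can give the same "continue where it left off" behaviour while holding only one line in memory. This is a sketch under that assumption; the name tokens is made up, and an io.StringIO stands in for an open file.

```python
import io


def tokens(stream):
    # Iterate line by line and hand out one whitespace-separated token at a time.
    for line in stream:
        for token in line.split():
            yield token


# io.StringIO stands in for an open file such as open("input.in").
reader = tokens(io.StringIO("5 4\n1 2 3 4 5\n"))
n = int(next(reader))
k = int(next(reader))
a = [int(next(reader)) for _ in range(n)]
print(n, k, a)
```

Calling next(reader) raises StopIteration once the input is exhausted; next(reader, None) returns None instead, matching the class above.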
I'm trying to get output from os.system using the following code:
p = subprocess.Popen([some_directory], stdout=subprocess.PIPE, shell=True)
ls = p.communicate()[0]
When I print the output I get:
>>> print (ls)
file1.txt
file2.txt
The output appears to be two separate strings. However, when I try to print the filenames using a for loop, I get a list of characters instead:
>>> for i in range(len(ls)):
...     print i, ls[i]
Output:
0 f
1 i
2 l
3 e
4 1
5 .
6 t
7 x
8 t
9 f
10 i
11 l
12 e
13 2
14 .
15 t
16 x
17 t
I need help ensuring the os.system() output returns as strings and not a set of characters.
p.communicate() returns a tuple whose first element is a string. It may look like a list of filenames, but it is just a string. You can convert it to a list of filenames by splitting on the newline character:
s = p.communicate()[0]
for line in s.split("\n"):
    print "line:", line
Are you aware that there are built-in functions to get a list of files in a directory?
for i in range(len(...)): is usually a code smell in Python. If you want to iterate over the numbered elements of a collection, the canonical method is for i, element in enumerate(...):.
The code you quote clearly isn't the code you ran, since when you print ls you see two lines separated by a newline, but when you iterate over the characters of the string the newline doesn't appear.
The bottom line is that you are getting a string back from communicate()[0], but you are then iterating over it, giving you the individual characters. I suspect what you would like to do is use the .split() or .splitlines() method on ls to get the individual file names, but you are trying to run before you can walk. First of all, get a clear handle on what the communicate method is returning to you.
Apparently, in Python 3.6, p.communicate()[0] is a bytes object:
In [16]: type(ls)
Out[16]: bytes
The following seems to work better:
In [22]: p = subprocess.Popen([some_directory], stdout=subprocess.PIPE, shell=True)
In [23]: ls = p.communicate()[0].split()
In [25]: for i in range(len(ls)):
    ...:     print(i, ls[i])
    ...:
0 b'file1.txt'
1 b'file2.txt'
But I would rather use os.listdir() instead of subprocess:
import os
for line in os.listdir():
    print(line)
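If you do need the subprocess output on Python 3, decoding the bytes and splitting on line boundaries gives a list of ordinary strings. The sketch below uses a stand-in command (the Python interpreter printing two file names) so that it runs anywhere; substitute your own command.

```python
import subprocess
import sys

# Hypothetical stand-in for the real command: prints two file names, one per line.
command = [sys.executable, "-c", "print('file1.txt'); print('file2.txt')"]

result = subprocess.run(command, stdout=subprocess.PIPE, check=True)

# stdout is bytes; decode it, then split on line boundaries.
names = result.stdout.decode().splitlines()

for i, name in enumerate(names):
    print(i, name)
```

Using enumerate rather than range(len(...)) also gives you the index and the whole file name in one step.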
Very simple code:
import StringIO
import numpy as np
c = StringIO.StringIO()
c.write("1 0")
a = np.loadtxt(c)
print a
I get an empty array + warning that c is an empty file.
I fixed this by adding:
d=StringIO.StringIO(c.getvalue())
a = np.loadtxt(d)
I think such a thing shouldn't happen, what is happening here?
It's because the 'position' of the file object is at the end of the file after the write. So when numpy reads it, it reads from the end of the file to the end, which is nothing.
Seek to the beginning of the file and then it works:
>>> from StringIO import StringIO
>>> s = StringIO()
>>> s.write("1 2")
>>> s.read()
''
>>> s.seek(0)
>>> s.read()
'1 2'
StringIO is a file-like object. As such it has behaviors consistent with a file. There is a notion of a file pointer - the current position within the file. When you write data to a StringIO object the file pointer is adjusted to the end of the data. When you try to read it, the file pointer is already at the end of the buffer, so no data is returned.
To read it back you can do one of two things:
Use StringIO.getvalue(), as you already discovered. This returns the data from the beginning of the buffer, leaving the file pointer unchanged.
Use StringIO.seek(0) to reposition the file pointer to the start of the buffer, and then call StringIO.read() to read the data.
Demo
>>> from StringIO import StringIO
>>> s = StringIO()
>>> s.write('hi there')
>>> s.read()
''
>>> s.tell() # shows the current position of the file pointer
8
>>> s.getvalue()
'hi there'
>>> s.tell()
8
>>> s.read()
''
>>> s.seek(0)
>>> s.tell()
0
>>> s.read()
'hi there'
>>> s.tell()
8
>>> s.read()
''
There is one exception to this. If you provide a value at the time that you create the StringIO, the buffer will be initialised with the value, but the file pointer will be positioned at the start of the buffer:
>>> s = StringIO('hi there')
>>> s.tell()
0
>>> s.read()
'hi there'
>>> s.read()
''
>>> s.tell()
8
And that is why it works when you use
d=StringIO.StringIO(c.getvalue())
because you are initialising the StringIO object at creation time, and the file pointer is positioned at the beginning of the buffer.
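The same behaviour carries over to io.StringIO on Python 3. The following sketch uses only the standard library and repeats the question's steps; any reader that starts from the current position would see the same thing that read() sees here.

```python
import io

c = io.StringIO()
c.write("1 0")

position_after_write = c.tell()   # the pointer sits just past the written data
read_at_end = c.read()            # nothing is left to read from here

c.seek(0)                         # rewind to the start of the buffer
read_after_seek = c.read()        # now the data comes back

# A buffer created with an initial value starts with its pointer at 0.
d = io.StringIO(c.getvalue())
position_of_new_buffer = d.tell()
read_from_new_buffer = d.read()

print(repr(read_at_end), repr(read_after_seek), repr(read_from_new_buffer))
```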
There is something I do not understand about the lineno offset that's being calculated by the ast module. Usually when I get the lineno of some ast object, it gives me the first line the object is encountered.
For example, in the case below:
st='def foo():\n print "hello"'
import ast
print ast.parse(st).body[0].lineno
print ast.parse(st).body[0].body[0].lineno
this prints 1 for the function foo and 2 for the print statement.
However, if I parse a multi-line docstring (ast.Expr) the lineno provided is the last line.
st='def foo():\n    """\n    Test\n    """'
import ast
print ast.parse(st).body[0].lineno
print ast.parse(st).body[0].body[0].lineno
The result would still be line 1 for the function but it would be line 4 for the docstring. I would have expected it to be on line 2 since that is when the docstring begins.
I guess what I am asking is if there is a way to always get the first lineno of all ast objects including ast.Expr .
AST's source locations leave much to be desired, but a lot of that is made available by the ASTTokens library, which annotates AST nodes with more useful location info. In your example:
import asttokens
st='def foo():\n    """\n    Test\n    """'
atok = asttokens.ASTTokens(st, parse=True)
print atok.tree.body[0].first_token.start[0]
print atok.tree.body[0].body[0].first_token.start[0]
Prints 1 and 2, as desired. Perhaps more interestingly,
print atok.get_text_range(atok.tree.body[0])
print atok.get_text_range(atok.tree.body[0].body[0])
Prints the ranges of source text corresponding to the nodes: (0,35) and (15,35) in this case.
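On Python 3.8 or later, the standard ast module alone may be enough for this case: there, lineno of a multi-line string refers to its first line, and nodes also carry an end_lineno attribute. A minimal sketch, assuming such a Python version:

```python
import ast

source = 'def foo():\n    """\n    Test\n    """'

function_node = ast.parse(source).body[0]
docstring_node = function_node.body[0]   # the ast.Expr wrapping the docstring

# Line on which the function starts.
print(function_node.lineno)

# First and last line of the docstring expression.
print(docstring_node.lineno, docstring_node.end_lineno)
```

On older interpreters, where lineno points at the last line of the string, the asttokens approach above remains the way to recover the starting line.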
I'm using a library in which some of the functions print data that I need:
def func():
    print "data"
How can I call this function and get the printed data into a string?
If you can't change those functions, you will need to redirect sys.stdout:
>>> import sys
>>> stdout = sys.stdout
>>> import StringIO
>>> s = StringIO.StringIO()
>>> sys.stdout = s
>>> print "hello"
>>> sys.stdout = stdout
>>> s.seek(0)
>>> s.read()
'hello\n'
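One caveat with swapping sys.stdout by hand is that an exception in the called function would leave it redirected. The sketch below is a Python 3 variant that wraps the swap in try/finally; the helper name call_and_capture is made up for illustration, and func mirrors the function from the question.

```python
import io
import sys


def func():
    print("data")


def call_and_capture(function, *args, **kwargs):
    """Call function and return (its return value, everything it printed)."""
    saved = sys.stdout
    buffer = io.StringIO()
    sys.stdout = buffer
    try:
        result = function(*args, **kwargs)
    finally:
        # Restore stdout even if the function raises.
        sys.stdout = saved
    return result, buffer.getvalue()


result, printed = call_and_capture(func)
print(repr(printed))
```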