I am writing a Python wrapper script (childscript.py) for a command line executable (childprogram). Another executable, parentprogram, spawns childscript.py and pipes output into it. childscript.py spawns childprogram with:
retval = subprocess.Popen(RUNLINE, shell=False, stdout=None, stderr=None, stdin=subprocess.PIPE)
If childscript.py does a series of reads from sys.stdin straight up using readline:
line = sys.stdin.readline()
I am able to get all the output from parentprogram and feed it into childprogram.
However, if I try to use the codecs module by doing:
sys.stdin = codecs.open(sys.stdin.fileno(), encoding='iso-8859-1', mode='rb', buffering=0)
or do a:
sys.stdin = codecs.getreader('iso-8859-1')(sys.stdin.detach())
and attempt to do the read, the read does not get all the output from parentprogram. If I force additional output from parentprogram, the missing bits come out along with part of the additional output that I pushed in. It looks like childscript.py is not reading everything that is being provided to it when I use the codecs module.
Am I doing something totally wrong? Without the codecs, childscript.py triggers an exception when presented with iso-8859-1 encoded stuff from parentprogram.
EDIT:
I discovered that Python v3.x "open" can take the encoding option as well. I changed the line to use "open" instead of "codecs.open":
sys.stdin = open(sys.stdin.fileno(), encoding='iso-8859-1', mode='r')
and it works as expected, without any of the problems that codecs.open produces. I've switched my script to use "open" instead.
If anybody can explain why the codecs module behaves differently, I'd appreciate it.
Flush the output channel in the parent.
Pipes are always buffered; a typical pipe buffer size is 4 KB. Unlike when the output is connected to the console, the standard runtime will not flush the output for you after each line.
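If the parent happens to be a Python program, a minimal sketch of flushing after each line might look like this (produce_lines() is a hypothetical generator standing in for whatever produces the output):

import sys

for line in produce_lines():       # hypothetical source of output lines
    sys.stdout.write(line + "\n")
    sys.stdout.flush()             # push the line through the pipe immediately
    # equivalently: print(line, flush=True)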
Try this:
import sys
import os

fd = sys.stdin.fileno()
text = ''
while True:
    raw_data = os.read(fd, 1024)       # read up to 1024 raw bytes from the pipe
    if not raw_data:                   # os.read() returns b'' at end of file
        break
    text += raw_data.decode('iso-8859-1')
    # now do something with text
This way you avoid readline(), which will raise an error on non-ASCII characters when the stream's encoding doesn't match, while still reading the raw bytes in manageable chunks.
The only problem is that you have to separate the input into lines yourself.
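For example, a minimal sketch of doing the line splitting yourself (handle_line() is a hypothetical per-line callback) might look like this:

import os
import sys

fd = sys.stdin.fileno()
buffer = ''
while True:
    raw_data = os.read(fd, 1024)
    if not raw_data:
        break
    buffer += raw_data.decode('iso-8859-1')
    # split off complete lines; keep any trailing partial line in the buffer
    *lines, buffer = buffer.split('\n')
    for line in lines:
        handle_line(line)              # hypothetical callback for each complete line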
Related
In python 2.x I could do this:
import sys, array
a = array.array('B', range(100))
a.tofile(sys.stdout)
Now however, I get a TypeError: can't write bytes to text stream. Is there some secret encoding that I should use?
A better way:
import sys
sys.stdout.buffer.write(b"some binary data")
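For the array example from the question, a minimal sketch would be to write the array's raw bytes to the underlying binary buffer (tobytes() is the Python 3 replacement for the older tostring()):

import sys
import array

a = array.array('B', range(100))
sys.stdout.buffer.write(a.tobytes())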
An idiomatic way of doing so, which is only available for Python 3, is:
import os
import sys

with os.fdopen(sys.stdout.fileno(), "wb", closefd=False) as stdout:
    stdout.write(b"my bytes object")
    stdout.flush()
The good part is that it uses the normal file object interface, which everybody is used to in Python.
Notice that I'm setting closefd=False to avoid closing sys.stdout when exiting the with block. Otherwise, your program wouldn't be able to print to stdout anymore. However, for other kinds of file descriptors, you may want to skip that part.
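One general caveat (not from the original answer): sys.stdout may still hold buffered text of its own, so it can be worth calling sys.stdout.flush() before writing through the separate binary file object; otherwise the text output and the binary output can appear interleaved out of order.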
import os
os.write(1, a.tobytes())  # tostring() was removed in Python 3.9; tobytes() is the replacement
or, os.write(sys.stdout.fileno(), …) if that's more readable than 1 for you.
In case you would like to specify an encoding in Python 3, you can still use the bytes constructor as below:
import os
os.write(1, bytes('Your string to Stdout', 'UTF-8'))
where 1 is the usual file descriptor number for stdout, i.e. sys.stdout.fileno().
Otherwise, if you don't care about the encoding, just use:
import sys
sys.stdout.write("Your string to Stdout\n")
If you want to use os.write without specifying an encoding, use a bytes literal:
import os
os.write(1, b"Your string to Stdout\n")
I have an algorithm written in C++ that prints cout debug statements to the terminal window, and I would like to figure out how to read those printouts with Python without them being piped/written to a file or returned as a value.
Python organizes how each of the individual C++ algorithms is called, while the data is kept on the heap rather than on disk. Below is an example of a situation with similar output:
+-------------- terminal window-----------------+
(c++)runNewAlgo: Debug printouts on
(c++)runNewAlgo: closing pipes and exiting
(c++)runNewAlgo: There are 5 objects of interest found
( PYTHON LINE READS THE PRINT OUT STATEMENT)
(python)main.py: Starting the next processing node, calling algorithm
(c++)newProcessNode: Node does work
+---------------------------------------------------+
Say the line of interest is "there are 5 objects of interest" and the code will be inserted before the python call. I've tried to use sys.stdout and subprocess.Popen() but I'm struggling here.
Your easiest path would probably be to invoke your C++ program from inside your Python script.
More details here: How to call an external program in python and retrieve the output and return code?
You can use stdout from the returned process and read it line-by-line. The key is to pass stdout=subprocess.PIPE so that the output is sent to a pipe instead of being printed to your terminal (via sys.stdout).
Since you're printing human-readable text from your C++ program, you can also pass encoding='utf-8' as well to automatically decode each line using utf-8 encoding; otherwise, raw bytes will be returned.
import subprocess

proc = subprocess.Popen(['/path/to/your/c++/program'],
                        stdout=subprocess.PIPE, encoding='utf-8')
for line in proc.stdout:
    do_something_with(line)
    print(line, end='')  # if you also want to see each line printed
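If you also care about the program's exit status, a small follow-up (assuming the loop above has run to completion) is to wait for the process:

retcode = proc.wait()  # 0 conventionally means success
if retcode != 0:
    print('program exited with status', retcode)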
I want to write a command-line Python program that can be called in a Windows cmd.exe prompt using the STDIN syntax and to print help text to STDOUT if an input file is not provided.
The STDIN syntax is different from argument syntax, and supporting it is necessary for the program to be a drop-in replacement:
my_program.py < input.txt
Here's what I have so far:
import sys

# Define stdout with \n newline character instead of the \r\n default
stdout = open(sys.__stdout__.fileno(),
              mode=sys.__stdout__.mode,
              buffering=1,
              encoding=sys.__stdout__.encoding,
              errors=sys.__stdout__.errors,
              newline='\n',
              closefd=False)

def main(args):
    lines = ''.join([line for line in sys.stdin.readlines()])
    lines = lines.replace('\r\n', '\n').replace('\t', ' ')
    stdout.write(lines)

if __name__ == '__main__':
    main(sys.argv)
I cannot figure out how to detect if a file was provided to STDIN and prevent prompting for user input if it wasn't. sys.argv doesn't contain STDIN. I could wrap it in a thread with a timer and wait for some file access upper limit time and decide that a file probably wasn't provided, but I wanted to see if there's a better way. I searched in SO for this question, but was unable to find an answer that avoids a timer.
test.py:
import sys

if sys.__stdin__.isatty():
    print("stdin from console")
else:
    print("stdin not from console")
execution:
> test.py
stdin from console
> test.py <input.txt
stdin not from console
The operator you are using will read a file and provide the contents of that file on stdin for your process. This means there is no way for your script to tell whether it is being fed the contents of a file, or whether there is a really fast typist at the keyboard entering the exact same series of keystrokes that matches the file contents.
By the time your script accesses the data, it's just a stream of characters, the fact that it was a file is only known to the command line interface you used to write the redirection.
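Putting that together with the script from the question, a minimal sketch (assuming the desired behaviour is to print help text and exit when nothing is redirected into stdin) could look like this:

import sys

def main(args):
    if sys.stdin.isatty():
        # nothing was redirected in; print help instead of waiting for input
        print("usage: my_program.py < input.txt")
        return 1
    text = sys.stdin.read().replace('\r\n', '\n').replace('\t', ' ')
    sys.stdout.write(text)
    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv))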
I need to parse the output produced by an external program (third party, I have no control over it) which produces large amounts of data. Since the size of the output greatly exceeds the available memory, I would like to parse the output while the process is running and remove from memory the data that has already been processed.
So far I do something like this:
import subprocess

# "preprocessor" is an external bash script that produces the input for the third-party software
p_pre = subprocess.Popen("preprocessor", stdout=subprocess.PIPE)
p_3party = subprocess.Popen("thirdparty", stdin=p_pre.stdout, stdout=subprocess.PIPE)
(data_to_parse, can_be_thrown) = p_3party.communicate()
parsed_data = myparser(data_to_parse)
When "thirdparty" output is small enough, this approach works. But as stated in the Python documentation:
The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
I think a better approach (that could actually save me some time) would be to start processing data_to_parse while it is being produced, and once a chunk has been parsed correctly, "clear" data_to_parse by removing the data that has already been parsed.
I have also tried to use a for loop like:
parsed_data = []
for i in p_3party.stdout:
    parsed_data.append(myparser(i))
but it gets stuck and I can't understand why.
So I would like to know: what is the best approach to accomplish this? What are the issues to be aware of?
You can use subprocess.Popen() to create a stream from which you read lines.
import subprocess

# cmd is a placeholder for whatever command you want to run
stream = subprocess.Popen(cmd, stdout=subprocess.PIPE).stdout
for line in stream:
    # parse lines as you receive them
    print(line)
You could pass the lines to your myparser() method, or append them to a list until you are ready to use them - whatever suits your case.
In your case, using two sub-processes, it would work something like this:
import subprocess

def method(stream, retries=3):
    while retries > 0:
        line = stream.readline()
        if line:
            yield line
        else:
            retries -= 1

pre_stream = subprocess.Popen(cmd, stdout=subprocess.PIPE).stdout
stream = subprocess.Popen(cmd, stdin=pre_stream, stdout=subprocess.PIPE).stdout

for parsed in method(stream):
    # do what you want with the parsed data
    parsed_data.append(parsed)
Iterating over a file as in for i in p_3party.stdout: uses a read-ahead buffer. The readline() method may be more reliable with a pipe -- AFAIK it reads character by character.
while True:
    line = p_3party.stdout.readline()
    if not line:
        break
    parsed_data.append(myparser(line))
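One detail worth noting (my addition, not from the original answer): in Python 3 the pipe yields bytes unless the Popen call was given an encoding, so you may need to decode each line before parsing, e.g.:

parsed_data.append(myparser(line.decode('utf-8')))

or pass encoding='utf-8' (or text=True) to subprocess.Popen() so the stream is already text.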
Is there some way of sending output to the printer instead of the screen in Python? Or is there a service routine that can be called from within python to print a file? Maybe there is a module I can import that allows me to do this?
Most platforms—including Windows—have special file objects that represent the printer, and let you print text by just writing that text to the file.
On Windows, the special file objects have names like LPT1:, LPT2:, COM1:, etc. You will need to know which one your printer is connected to (or ask the user in some way).
It's possible that your printer is not connected to any such special file, in which case you'll need to fire up the Control Panel and configure it properly. (For remote printers, this may even require setting up a "virtual port".)
At any rate, writing to LPT1: or COM1: is exactly the same as writing to any other file. For example:
with open('LPT1:', 'w') as lpt:
    lpt.write(mytext)
Or:
lpt = open('LPT1:', 'w')
print(mytext, file=lpt)
print(moretext, file=lpt)
lpt.close()
And so on.
If you've already got the text to print in a file, you can print it like this:
with open(path, 'r') as f, open('LPT1:', 'w') as lpt:
    while True:
        buf = f.read(8192)  # read in chunks rather than the whole file at once
        if not buf:
            break
        lpt.write(buf)
Or, more simply (untested, because I don't have a Windows box here), this should work:
import shutil

with open(path, 'r') as f, open('LPT1:', 'w') as lpt:
    shutil.copyfileobj(f, lpt)
It's possible that just shutil.copyfile(path, 'LPT1:') would work, but the documentation says "Special files such as character or block devices and pipes cannot be copied with this function", so I think it's safer to use copyfileobj.
Python doesn't (unless you're using graphical libraries) ever send stuff to "The screen". It writes to stdout and stderr, which are, as far as Python is concerned, just things that look like files.
It's simple enough to have python direct those streams to anything else that looks like a file; for instance, see Redirect stdout to a file in Python?
On unix systems, there are file-like devices that happen to be printers (/dev/lp*); on Windows, LPT1: serves a similar purpose.
Regardless of the OS, you'll have to make sure that LPT1 or /dev/lp* are actually hooked up to a printer somehow.
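For example, a minimal sketch of pointing ordinary print() calls at a printer device (assuming /dev/lp0 on Linux, or 'LPT1:' on Windows, really is wired up to a printer) might be:

import contextlib

with open('/dev/lp0', 'w') as printer, contextlib.redirect_stdout(printer):
    print("This line goes to the printer instead of the screen")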
If you are on linux, the following works if you have your printer set up and selected as your default.
from subprocess import Popen, PIPE

# call the system's lpr command and feed it the text on stdin
p = Popen(["lpr"], stdin=PIPE)
p.communicate(output_string.encode())