Question 1:
I have a piece of code like this (Python2.7):
for line in sys.stdin.readlines():
    print line
When I run this code, type a string in the terminal, and press the Enter key, nothing happens; 'print line' is never executed.
So I imagine there is a buffer for sys.stdin.readlines(), but I wonder how it works. Can I flush it so that 'print line' can be executed immediately every time a line is given?
Question 2: What's the difference between these two lines:
for line in sys.stdin:
for line in sys.stdin.readline():
I found their behavior is a little different. If I use Ctrl+D to terminate the input, in the first case I have to press Ctrl+D twice before the loop really terminates, while in the second case a single Ctrl+D is enough.
CTRL-D sends the EOF (end of file) control character to stdin in an interactive shell. Usually, you feed a file to the stdin of a process via redirection (e.g. myprogram < myfile), but if you are interactively typing characters into stdin of a process, you need to tell it when to stop reading the "file" you are actively creating.
sys.stdin.readlines waits for stdin to complete (via an EOF control character), then conveniently splits everything flushed to stdin before the EOF into a list of strings delimited by newline characters. When you hit ENTER, you send a \n character, which is rendered for you as a new line but does NOT tell stdin to stop reading.
Regarding the other two lines, I think this might help:
Think of the sys.stdin object as a file. When you send EOF, you save that file, and then you are not allowed to edit it anymore because it leaves your hands and belongs to stdin. You can perform functions on that file, like readlines, which is a convenient way to say "I want a list, and each element is a line in that file". Or you can just read one line from it with readline, in which case the for loop would only be iterating over the characters in that line.
What's going on behind the scenes?
Internally, reading from sys.stdin blocks execution until EOF is received on sys.stdin. Then it behaves like a file-like object stored in memory, with a read pointer pointing to the beginning.
When you call just readline, the pointer reads until it hits a \n character, returns what it just traversed over, and stays put, waiting for you to move it again. Calling readline again will move the pointer until the next \n if one exists, else to EOF.
readlines really tells the pointer to traverse all the way from its current position (not necessarily the beginning of the file) until it sees EOF; the \n characters along the way just mark where the resulting list elements split.
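As a minimal sketch of that pointer model (using StringIO as a stand-in for the already-completed stdin "file"):

from StringIO import StringIO  # Python 2.7, matching the question

f = StringIO("one\ntwo\nthree\n")  # plays the role of stdin after EOF
print repr(f.readline())  # 'one\n' -- the pointer stops just past the first \n
print f.readlines()       # ['two\n', 'three\n'] -- from the pointer to EOF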
Try it out!
Trying it out is the best way to learn.
To see this behavior in action, try making a file with 10 lines, then redirect it to the stdin of a Python script that prints sys.stdin.readline 3 times, then prints sys.stdin.readlines. You'll see 3 lines printed out, then a list containing the remaining 7 elements :)
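For instance, with a 10-line file lines.txt (the file and script names here are placeholders):

# read_mixed.py -- run as: python read_mixed.py < lines.txt
import sys

for _ in range(3):
    print sys.stdin.readline(),  # one line per call; lines 1-3

print sys.stdin.readlines()      # the remaining 7 lines, as a list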
Related
I am trying to solve a Kattis problem. The full problem is found at this link: https://open.kattis.com/problems/addingwords
The part of the problem that I'm confused with is : "Input is a sequence of up to 2000 commands, one per line, ending at end of file."
What would be the code for this input? I tried doing this:
import sys
for line in sys.stdin.readlines():
    # print('something')
After this, I continued the program as normal within the indentation from above. My question is, how would I test whether the program is working in cmd? I want to test a few cases, but when I input something, the command prompt keeps waiting for more input instead of printing anything out, and when I press Ctrl+C the program ends abruptly. How are we supposed to check whether the program is working while taking in user input until end of file?
The problem here is that readlines() is eager, not lazy. That means it will read the entire file into memory (until EOF) and then split it into lines and return a list of those lines. So when working with interactive stdin, sys.stdin.readlines() will wait until the end of stdin (Ctrl+D on Linux/macOS, Ctrl+Z followed by Enter on Windows).
But there is no need for readlines() here (in fact, you should almost never use it). Iterating over a file object already goes line by line:
for line in sys.stdin:
    print('got line!')
The docs even recommend this approach. If you do need all the lines in a list, then just do list(sys.stdin).
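As a rough sketch of how you might test this interactively (the actual command parsing is elided; only the loop structure is shown):

import sys

for line in sys.stdin:      # one line per iteration, until EOF
    tokens = line.split()
    # ... evaluate the Kattis command here ...
    print('processed: %s' % tokens)

Type a few commands, then press Ctrl+Z followed by Enter in cmd (Ctrl+D on Linux/macOS) to signal end of file; the loop then finishes and the program exits normally.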
I'm trying to have Python copy the contents of a .txt file into the bash terminal on OS X (10.10), but each line does not appear until every single character of the line has been written. Is there any way to have Python print each line character by character instead of line by line? My code is designed to wait between characters, but each line simply takes a long time to appear, all at once:
import sys
import time

# text_file is an open file object for the .txt file
while True:
    character = text_file.read(1)
    if not character:
        break
    sys.stdout.write(character)
    time.sleep(0.050)
When I run this code in IDLE, the characters print one at a time. In Terminal, lines take several seconds to print, and each line prints all at once. Is there any way to reproduce the behavior I'm seeing in IDLE in Terminal?
Add sys.stdout.flush() after sys.stdout.write(character).
The reason is that output to stdout is buffered; the terminal only displays it once the buffer fills up or is flushed.
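Putting that together, a corrected version of the loop might look like this (input.txt is a placeholder filename):

import sys
import time

text_file = open('input.txt')  # placeholder filename

while True:
    character = text_file.read(1)
    if not character:
        break
    sys.stdout.write(character)
    sys.stdout.flush()         # push the character to the terminal immediately
    time.sleep(0.050)

text_file.close()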
If you want to remove the newline at the end of the line, you can simply use
print character,
The trailing comma suppresses the newline (\n) in Python 2.
I found a question on this site which showed me how to call a Perl script from Python. I'm currently using the following lines of code to achieve this:
import subprocess

pipe = subprocess.Popen(["perl", "./Perl_Script.pl", param], stdout=subprocess.PIPE)
result = pipe.stdout.read()
This works perfectly, but the only issue is that the Perl script takes a few minutes to run. At the end of the Perl script, I use a simple print statement to print my values I need to return back to Python, which gets set to the result variable in Python.
Is there a way I can include more print statements in my Perl script every few seconds that can get returned to Python continuously (instead of waiting a few minutes and returning a long list at the end)?
Ultimately, what I'm doing is using the Perl script to obtain data points that I then send back to Python to plot an eye diagram. Instead of waiting for minutes to plot the eye diagram when the Perl script is finished running, I'd like to return segments of the data to Python continuously, allowing my plot to update every few seconds.
The default UNIX stdio buffer is at least 8k. If you're writing less than 8k, you'll end up waiting until the program ends before the buffer is flushed.
Tell the Perl program to stop buffering output, and probably tell Python not to buffer input through the pipe. Use
$| = 1;
to un-buffer STDOUT in your Perl program.
You need two pieces: to read a line at a time in Python space, and to emit a line at a time from Perl. The first can be accomplished with a loop like
while True:
    result = pipe.stdout.readline()
    if not result:
        break
    # do something with result
The readline blocks until a line of text (or EOF) is received from the attached process, then gives you the data it read. So long as each chunk of data is on its own line, that should work.
If you run this code without modifying the Perl script, however, you will not get any output for quite a while, possibly until the Perl script is finished executing. This is because Perl block-buffers output to a pipe by default. You can tell it to flush the buffer more often by changing a global variable in the scope in which you are printing:
use English qw(-no_match_vars);
local $OUTPUT_AUTOFLUSH = 1;
print ...;
See http://perl.plover.com/FAQs/Buffering.html and http://perldoc.perl.org/perlvar.html.
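Putting both pieces together, a minimal end-to-end sketch of the Python side might look like this (Perl_Script.pl and param come from the question; the plotting step is elided):

import subprocess

pipe = subprocess.Popen(["perl", "./Perl_Script.pl", param],
                        stdout=subprocess.PIPE,
                        bufsize=1)   # request line buffering on the Python side

while True:
    result = pipe.stdout.readline()
    if not result:
        break  # EOF: the Perl process closed its stdout
    # ... update the eye-diagram plot with this chunk of data points ...
    print result,

pipe.wait()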
pipe.stdout.read() tries to read the whole stream, so it will block until Perl is finished.
Try this:
line = ' '
while line:
    line = pipe.stdout.readline()
    print line,
I'm using subprocess.Popen to run a command and grab the stdout.
It so happens that the program (mplayer) sort of uses both EOL types, \n and \r. The \r's come from terminal control characters, so the output I end up with is regular lines interspersed with really long lines where the \r's were ignored.
I know if I had opened a file myself, I could set the newline type. However, I'm getting the stdout from popen so I have no control over that.
I had a look at the Python 2.7 source, and I imagine I can somehow use TextIOWrapper to respect both EOL types. However, I'm not too sure what I need to pass to it. I know I need to pass the constructor some sort of buffer, but I don't know how to get the buffer from an already opened file.
All in all, how do I readline() in Python so that it breaks at both \n and \r, given an already open file/stream?
subprocess.Popen (and subprocess.check_output, if the convenience function is enough for you) has a universal_newlines parameter, which is False by default; when set to True it gives you the behaviour you need, converting all newline variants to \n.
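For example (a sketch; the mplayer invocation is illustrative, not a tested command line):

import subprocess

proc = subprocess.Popen(["mplayer", "video.mkv"],
                        stdout=subprocess.PIPE,
                        universal_newlines=True)  # \r, \n, and \r\n all become \n

for line in proc.stdout:
    print line.rstrip()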
I am using tail -F log | python parse.py to monitor and parse a growing log file, but some parsing errors occur that may be caused by reading incomplete lines from the log file.
Is it possible that tail emits incomplete lines?
In the parser, I am reading rows with code like the following:
import csv
import sys
reader = csv.reader(sys.stdin)
for row in reader:
    # process
It is possible that tail can emit 'unparsable lines', but only if invalid lines are written to the file. It's a somewhat circular argument, but here's an example of how it could happen:
You tail -f on /var/log/syslog
syslog-ng dies in the middle of a block-spanning write. (Sectors are 512 bytes; your filesystem block size is most likely larger, though probably not much larger than 4096. So: syslog has 9k of data buffered up to write out, it gets through one 4k page, and before it can write the next 4k+1k, syslog dies. On ext2, at least, that'll end up as a partial write even after fsck. ext3? I've been doing embedded for so long I can't remember, but I'd HOPE not.) And who's to say that the data you're writing is always going to be correct? You might get a non-fatal string formatting error that doesn't include the newline you're expecting.
You'll then have a partial line that isn't terminated with a newline (or even a \0), and the next time syslog starts up and starts appending, it'll just append to the end of the file with no notion of 'valid' records. So the first new record will be garbage, but the next one will be OK.
This is easy to exercise:
In one window
tail -f SOMEFILE
In another window
echo FOO >>SOMEFILE
echo BAR >>SOMEFILE
printf NO_NEWLINE >>SOMEFILE
echo I_WILL_HAVE_THE_LAST_LINE_PREFIXED_TO_ME_CAUSING_NERD_RAGE >>SOMEFILE
Since Linux's tail uses inotify by default, whatever is reading will get the last line without a newline and wait until the next newline comes along, appending NO_NEWLINE to the beginning of what it considers 'the latest line'.
If you want to do this 'the pythonic' way: if you're using Linux, use inotify; if you're using OS X or BSD, use 'knotty'. Drop 'tail' as an input pipe and just watch the file yourself.
Tail might also do weird things if you use the 'resync on truncate' behavior: if the file gets zeroed and restarted in the middle of a read, you might get a weird amount of data on the read, since 'tail' will close the previously opened file handle in exchange for the new one.
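If you do watch the file yourself, the key point is to buffer partial data and only hand complete (newline-terminated) lines to the parser. A minimal polling sketch (no inotify dependency; truncation handling is omitted):

import time

def follow_complete_lines(path):
    # Yield only newline-terminated lines from a growing file.
    buf = ''
    f = open(path)
    while True:
        chunk = f.read(4096)
        if not chunk:
            time.sleep(0.1)      # nothing new yet; poll again
            f.seek(f.tell())     # clear any EOF state before retrying
            continue
        buf += chunk
        while '\n' in buf:
            line, buf = buf.split('\n', 1)
            yield line           # buf keeps any trailing partial line

for line in follow_complete_lines('SOMEFILE'):
    print line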
Since you changed up the question on me.. new answer! :p
import csv
import sys

reader = csv.reader(sys.stdin)
for row in reader:
    try:
        validate_row_data_somehow(row)
        do_things_with_valid_row(row)
    except Exception:
        print "failed to process row", `row`