Last line of stdin not being read when EOF is detected


Probably an easy question, but I am basically trying to replicate the behavior (very sparsely and simplistically) of the cat command but in python. I want the user to be able to enter all the text they wish, then when they enter an EOF (by pressing ctrl+d or cmd+d) the program should print out everything they enter.
import sys
for line in sys.stdin:
    print line
If I enter the input lines and follow the last line with a return character, and then press cmd+d:
never gonna give you up
never gonna let you down
never gonna run around
and desert you
then the output is
never gonna give you up
never gonna let you down
never gonna run around
and desert you
However, if I press cmd+d while I am still on the last line "and desert you" then the output is:
never gonna give you up
never gonna let you down
never gonna run around
How can I modify the program such that if the user presses the EOF on the last line, then it should still be included as part of the output?

The behavior of Control-D is actually a function of the TTY driver of the operating system, not of Python. The TTY driver normally sends data to programs a full line at a time. You'll note that 'cat' and most other programs behave the same way.
To do what you're asking is non-trivial. It usually requires putting the TTY into raw mode, reading and processing data (you would get the Control-D character itself), and then taking it out of raw mode before exiting (you don't want to exit in raw mode!)
Procedures for doing that are O/S dependent, but you might be able to make use of tty.setraw() and tty.setcbreak().
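As a rough illustration of that approach, here is a minimal Python 3 sketch for Unix-like systems. It assumes cbreak mode (via tty.setcbreak()) is enough, so that Ctrl-D arrives as the character '\x04' instead of being interpreted by the TTY driver. The function names collect_until_eot and read_terminal_until_ctrl_d are made up for this example, and echo is off while cbreak mode is active, so the user will not see what they type.

```python
import sys
import termios
import tty

EOT = "\x04"  # the character Ctrl-D produces when the TTY is not in canonical mode


def collect_until_eot(read_char):
    """Accumulate characters from read_char() until EOT or end of input."""
    chars = []
    while True:
        c = read_char()
        if c == "" or c == EOT:
            break
        chars.append(c)
    return "".join(chars)


def read_terminal_until_ctrl_d():
    """Read stdin in cbreak mode and always restore the old TTY settings."""
    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)
        return collect_until_eot(lambda: sys.stdin.read(1))
    finally:
        # Never leave the terminal in cbreak/raw mode on exit.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

Calling print(read_terminal_until_ctrl_d()) from an interactive terminal should then print everything typed before Ctrl-D, including an unterminated last line.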

Okay, that was weird, but it turns out there is an explanation. Ctrl-D is "end of transmission" (EOT), not EOF. What it means is environment-specific, but to a *nix terminal it means: flush the input buffer to the program immediately.
If there are characters in the buffer, python's stdin gets them but since there is no \n, the stdin iterator buffers them and doesn't emit anything yet.
If there are no characters in the buffer, python's stdin gets an empty buffer and python follows the rule that an empty buffer means EOF and it should stop iteration.
If you type some characters and then 2 ctrl-d's in a row, you'll get the buffered characters and iteration will end. And, to keep things as complicated as possible, if you start another for line in sys.stdin it will happily continue taking input.
EDIT
This script will give you any pending characters immediately when you hit ctrl-d with data in the input buffer but it won't know to exit unless you hit a second ctrl-d
import sys
while True:
    c = sys.stdin.read(1)
    if not c:
        break
    sys.stdout.write(c)
    sys.stdout.flush()
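The "empty read means EOF" rule can be observed without a terminal. The Python 3 sketch below is only an analogy: it assumes an os.pipe() can stand in for the TTY, with closing the write end playing the role of Ctrl-D on an empty buffer. The helper name read_all is made up for this example.

```python
import os


def read_all(fd, chunk_size=1024):
    """Collect chunks from fd until os.read() returns an empty result."""
    chunks = []
    while True:
        data = os.read(fd, chunk_size)
        if not data:  # zero bytes: the reader treats this as end of file
            break
        chunks.append(data)
    return chunks


# Stand-in for the terminal: the writer sends an unterminated last line.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"never gonna run around\nand desert you")
os.close(write_fd)  # closing the writer is what produces the zero-byte read

received = read_all(read_fd)
os.close(read_fd)
print(b"".join(received).decode())
```

The unterminated last line is delivered like any other data; only the zero-byte read ends the loop.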

I think this is a python 2.7 problem. I tried your code on Python 3.5 and it works just fine.
Hope this helps!

Related

Why is Pycharm failing to convert string to integer?

num = 0
for i in range(5):
    ask = int(input())
    if abs(ask) > 0:
        num = ask
print(num)
When I run the code, it lets me input the first string. But once I enter the second string, the program crashes and says "pythonProject\main.py", line 3, in
ask = int(input())
ValueError: invalid literal for int() with base 10: ''"
What's going on?
My input: I just type 1, press enter, type 2, press enter, and then it crashes. I am sure I am not pressing enter too quickly or accidentally typing an empty string, because I've run the code multiple times.
What have I tried so far?
Creating a new project and pasting the code -> didn't work
Asking my friend to copy the code onto his PyCharm and run it -> worked fine on his computer
'Edit configurations', uncheck 'Emulate terminal in output console' -> didn't work, it was already unchecked
Checked that I was running the correct file in the project -> didn't work, I was running the right file
EDIT:
FIXED, just needed to check 'Emulate terminal in output console' rather than uncheck it. Not sure why this works though, or how I can keep it checked for all future projects, rather than having to manually check it every time.


The problem is with the input: you are either pressing enter without typing anything (an empty string) or entering a float value. This thread might help you in this case. The code works fine when I enter an integer, and gives the same error when I enter an empty string or a float value.
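One way to make the loop tolerant of such input is sketched below. This is an illustration rather than a fix for the PyCharm setting itself. The helper names parse_int and last_nonzero are made up, and the loop works on a list of already-read lines so that it can be tried without typing anything.

```python
def parse_int(text):
    """Return int(text), or None when text is blank or not an integer literal."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def last_nonzero(lines):
    """Mirror the question's loop over already-read lines, skipping bad input."""
    num = 0
    for text in lines:
        value = parse_int(text)
        if value is None:
            continue  # blank line or a float such as "2.5": ignore it
        if abs(value) > 0:
            num = value
    return num
```

In the interactive version, each call to input() would be passed through parse_int() before being used.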

To get it to work you need to check "Emulate terminal in output console".
I've answered in the comments section and I'm glad it worked out, here is an explanation:
You need to know the concept of a "terminal emulator" to understand why and how this works. When a program is run (at least on UNIX-like operating systems), it has three I/O streams: stdin, stdout and stderr. stdin is used to input data, and the other two are for output.
An input or output stream is just a buffer used to communicate with the program. Once something is written to the buffer, it can be read from there. If the buffer is empty, an attempt to read from it will block until the buffer has something in it. More about stdio: https://en.wikipedia.org/wiki/Standard_streams
When the program is run through a terminal emulator, its I/O streams are connected to this emulator, so whatever you type in the terminal window is written to your stdin by default. Whatever your program writes to stdout and stderr is displayed on the screen. (However, this behavior can be changed using pipes and redirection, so you can pass data from a file to stdin and you can redirect the output to a file.)
Here is the history behind terminal emulators, which explains why it is implemented this way: https://en.wikipedia.org/wiki/Terminal_emulator#Computer_terminals
For example, you have a simple program:
s = input('Enter string: ')
print(f'stdout: {s}')
If you run it from the terminal and type "TEST":
$ python3 test.py
TEST
stdout: TEST
But you also can, for example, pass data directly to stdin, and redirect output to the file:
$ echo "ABCDEF" | python3 test.py > OUTPUT.txt
There will be no text in the terminal, but OUTPUT.txt will appear. It will contain:
stdout: ABCDEF
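The same redirection can be done from Python itself. The sketch below uses subprocess.run() to feed a string to a child process's stdin and capture its stdout. The child program is a hypothetical variant of test.py with no prompt, embedded as a string so the example is self-contained.

```python
import subprocess
import sys

# Hypothetical child program: reads one line from stdin, echoes it to stdout.
CHILD_SOURCE = "s = input()\nprint(f'stdout: {s}')\n"

result = subprocess.run(
    [sys.executable, "-c", CHILD_SOURCE],
    input="ABCDEF\n",     # becomes the child's stdin, like `echo "ABCDEF" |`
    capture_output=True,  # collect stdout/stderr instead of displaying them
    text=True,
    check=True,
)
print(repr(result.stdout))
```

Nothing is typed at a keyboard here: the child's stdin is simply a pipe that already contains one line.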
Now, about PyCharm:
By default, when it runs your script, it does not automatically emulate a terminal in the output window. It simply does not send anything to stdin and it won't react to pressed keys. When your program gets to the line with input(), it starts to read the stdin stream until it gets a \n character from the stream (indicating that the user has pressed the Return key). As nothing gets sent to the stream, it will wait indefinitely.
Useful tip: for testing, instead of just typing something into the terminal every time, you can also check "Redirect input from:" and choose an input file.

How to process lines of standard input interactively in Python?

I'd like to use Python as an external process and communicate with it by standard input/standard output. Specifically, I want to
for line in sys.stdin:
    result = compute_something(line)
    print result
    sys.stdout.flush()
The output flush is to get the result back right away, without buffering. I want to do the same with the input--- to process each line right away, without buffering. However, the above code does not respond to each line individually; it waits until a large amount of data is accumulated in the standard input and then processes everything at once.
This is true even if the calling program flushes its standard output with every line. It's also true even if I'm running the above directly on a console. The buffer is in Python, not the calling program.
Moreover, I found that control-D on the console makes Python flush its standard input buffer. (And then I can continue to send more input afterward!) However, that's not useful to me because the calling program can't send the equivalent of control-D at the end of each line.
One more thing: for line in sys.stdin.xreadlines() appears to be equivalent to for line in sys.stdin: they both buffer.
So my question is, how can I write a Python script that does not buffer its input, so that it processes each line of input right away?

(I solved the problem before posting the question, but I think I should share it anyway--- others might encounter this problem and I'm still interested in any comments on why this is happening and what we should all know about how to control Python's or Linux's implicit buffering.)
Here's one way to avoid input buffering:
import sys
while True:
    line = sys.stdin.readline()
    if not line:  # readline() returns '' at EOF
        break
    result = compute_something(line)
    print result
    sys.stdout.flush()
Apparently, .readline() avoids the input buffer while direct iteration and .xreadlines() do not.
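For reference, a Python 3 rendering of the same readline()-based loop might look like the sketch below. The names process_lines and handle are made up for this example, and handle stands in for compute_something. The two-argument form of iter() keeps calling readline() until it returns the empty string, which is what readline() yields at end of input.

```python
import io
import sys


def process_lines(stream, handle, out=sys.stdout):
    """Call handle() on each line as soon as readline() returns it."""
    # iter(callable, sentinel) stops when stream.readline() returns "".
    for line in iter(stream.readline, ""):
        out.write(handle(line.rstrip("\n")) + "\n")
        out.flush()  # push each result out immediately


# Usage with in-memory streams standing in for stdin and stdout:
captured = io.StringIO()
process_lines(io.StringIO("a\nb\nc"), str.upper, out=captured)
```

In a real script the call would be process_lines(sys.stdin, compute_something).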

Need character-by-character keyboard input that interacts well with paste and ANSI escape sequences

My program (a "TRAC Processor") uses character-by-character input. I am implementing readline-like input features for strings which are terminated with characters other than enter (usually ') and may themselves be multi-line. So I output terminal escape sequences between input characters, including escape sequences which query the terminal emulator (cursor position and screen size). To do cross-platform single-character input, I used http://code.activestate.com/recipes/134892/, which was very helpful.
This worked fine with paste... until I needed to get a terminal response after the first character of the paste. It seemed like the pasted text was getting mingled with the response to the escape sequence. I thought I would fix it by flushing the input buffer before initiating the escape-sequence query: wait 10ms, and if there is no input proceed; if there is input, buffer it and wait again until no input. Based on this post, I tried to poll stdin using select(). Great idea, but it didn't work, and it produced very strange behavior. I posted that strange behavior in the original version of this question, thinking I was misunderstanding select and there was a way to fix it. There doesn't seem to be, but I have found another way to flush (and save) the input stream. I decided to keep this question, and post that method as answer.
The problem with select() is explained here. After the first character of the paste, the other characters are already buffered, and select only returns new input when there is new input beyond what is already buffered. I couldn't bring myself to delete the MWE I'd produced of this behavior, so you can see it below.
Unfortunately, the answers proposed in that post either don't work for me or need a lot more explanation. #slowdog suggests using unbuffered input (os.read(stdin.fileno(), 1) instead of stdin.read(1)). That solves the select problem, but it breaks paste: it seems that all the characters of the paste after the first one are buffered no matter what, so you never see them. It also didn't seem to work well with the escape-sequence responses, which seem to also get buffered. It's also annoying because you need to flush the output buffer, but that's not so terrible. #Omnafarious, in a comment, said "Though, another way to handle the Python buffering issue to to simply do a no-parameter read, which should read everything currently available." That is ultimately what I did, as posted below, but "simply" turns out not to be so simple. There is another solution here, but I figured there must be a way to do this without threading.
Incidentally, there is a relatively simple work-around, because it turns out that the paste is not randomly interspersed with response to the escape sequence. The entire remainder of the paste gets read before the escape sequence response, so that when you are looking for the escape-sequence response (which itself starts with an escape), you can just buffer all the characters you read before the escape, and process them later. This only fails if you might be typing ESC characters in at the terminal. In any case, by this time I was pretty much hooked on solving this problem, and I thought others might find the answer valuable.
Anyway, FWIW here is my MWE for the select problem, which just echoes the text rather than buffering it:
def flush():
    import sys, tty, termios
    from select import select
    tty.setraw(sys.stdin.fileno())
    while True:
        rlist, wlist, xlist = select([sys.stdin], [], [], 1)
        if rlist == []: return
        sys.stdout.write(sys.stdin.read(1))
Paste this into the Python prompt (2.7.9) and put another blank line at the end. If you invoke flush() and type some text more quickly than one letter per second, it types it back to you. For example, I typed "hello" and then paused, and got this result:
>>> flush()
hello>>>
In the OSX Terminal app (at least), if you copy the word text to the clipboard, invoke the function and hit paste within one second, here's what you get:
>>> flush()
t>>>
Odd! Only the first letter. Try it again, typing nothing:
>>> flush()
>>>
It paused for a second and does nothing, like no input waiting, right? Try it again, and hit ?:
>>> flush()
ext?>>>
You get the rest of the paste, saved up, before the ?!! Also, strangely, there is a 1-second pause before it types the ? which I don't understand. If you try again at this point, it behaves like normal.
OK, let's try it again, first pasting text, then pasting WTF, then typing !:
>>> flush()
t>>> flush()
extW>>> flush()
TF!>>>
So again the paste only gives the first letter, and holds the others in the input buffer, and pauses for a second before W and !. Yet another strange thing: the buffered characters are not entered at the Python >>> prompt.
One lingering question: why do you get the additional 1-second pause before the next letter is echoed? Select does not always wait for the whole time period...
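The interaction between select() and a buffered reader can be reproduced without a terminal. The Python 3 sketch below assumes a POSIX system and uses an os.pipe() in place of the TTY. A buffered reader pulls the whole "paste" into its own buffer on the first read(1), after which select() sees nothing left in the kernel. The helper name readable is made up for this example.

```python
import os
import select


def readable(fd):
    """True if select() reports data waiting in the kernel for fd."""
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)


# A pipe stands in for the terminal; the "paste" is four bytes at once.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"text")

buffered = os.fdopen(read_fd, "rb")  # buffered reader, playing the role of sys.stdin
first = buffered.read(1)             # returns b"t" but pulls all four bytes in
kernel_has_data = readable(read_fd)  # nothing left for select() to see
held_back = buffered.peek()          # the rest sits in Python's own buffer

# Unbuffered alternative: os.read() takes only what is asked for.
os.write(write_fd, b"WTF")
one_byte = os.read(read_fd, 1)
still_readable = readable(read_fd)   # the kernel still holds b"TF"

os.close(write_fd)
buffered.close()
```

This matches the behavior in the session above: only the first letter appears, and the rest is released when new input makes select() fire again.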

The "no-parameter read" is often cited as a way to read all available bytes, which sounds perfect for this application. Unfortunately, when you look at the documentation, read() is the same as readall(), which blocks until EOF. So you need to set stdin to non-blocking mode.
Once you do this, you start getting:
IOError: [Errno 35] Resource temporarily unavailable
When you google this, the vast majority of responses say that the solution to this problem is to get rid of non-blocking mode... so that's not helpful here. However, this post explains that this is simply what the non-blocking read() does when there are no characters to return.
Here is my flush() function, much of which is copied from that post:
def flush():
    import sys, tty, termios, fcntl, os
    fd = sys.stdin.fileno()
    old_attr = termios.tcgetattr(fd)
    old_fl = fcntl.fcntl(fd, fcntl.F_GETFL)
    try:
        tty.setraw(fd)
        fcntl.fcntl(fd, fcntl.F_SETFL, old_fl | os.O_NONBLOCK)
        inp = sys.stdin.read()
    except IOError, ex1: #if no chars available generates exception
        try: #need to catch correct exception
            errno = ex1.args[0] #if args not sequence get TypeError
            if errno == 35:
                return '' #No characters available
            else:
                raise #re-raise exception ex1
        except TypeError, ex2: #catch args[0] mismatch above
            raise ex1 #ignore TypeError, re-raise exception ex1
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old_attr)
        fcntl.fcntl(fd, fcntl.F_SETFL, old_fl)
    return inp
Hope it's helpful to someone!
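For readers on Python 3, the non-blocking part of this idea can be sketched at the file-descriptor level, where "no characters available" surfaces as BlockingIOError rather than a bare errno value. This is not a port of the function above. The name drain is made up, the raw-mode handling is left out, and a pipe stands in for the terminal so the sketch can run anywhere POSIX-like.

```python
import os


def drain(fd, chunk_size=4096):
    """Return whatever bytes are available on fd right now, without waiting."""
    was_blocking = os.get_blocking(fd)
    os.set_blocking(fd, False)
    chunks = []
    try:
        while True:
            try:
                data = os.read(fd, chunk_size)
            except BlockingIOError:  # EAGAIN: nothing more to read for now
                break
            if not data:             # zero bytes: the writer closed (EOF)
                break
            chunks.append(data)
    finally:
        os.set_blocking(fd, was_blocking)  # always restore the original mode
    return b"".join(chunks)


# Usage with a pipe standing in for the terminal:
read_fd, write_fd = os.pipe()
nothing_yet = drain(read_fd)
os.write(write_fd, b"pasted text")
pending = drain(read_fd)
os.close(write_fd)
os.close(read_fd)
```

On a real terminal the call would be wrapped in the same tcgetattr()/setraw()/tcsetattr() sequence used above.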

Why doesn't print output show up immediately in the terminal when there is no newline at the end?

I have a python script that performs a simulation. It takes a fairly long, varying time to run through each iteration, so I print a . after each loop as a way to monitor how fast it runs and how far it went through the for statement as the script runs. So the code has this general structure:
for step in steps:
    run_simulation(step)
    # Python 3.x version:
    print('.', end='')
    # for Python 2.x:
    # print '.',
However, when I run the code, the dots do not appear one by one. Instead, all the dots are printed at once when the loop finishes, which makes the whole effort pointless. How can I print the dots inline as the code runs?
This problem can also occur when iterating over data fed from another process and trying to print results, for example to echo input from an Electron app. See Python not printing output.

The issue
By default, output from a Python program is buffered to improve performance. The terminal is a separate program from your code, and it is more efficient to store up text and communicate it all at once, rather than separately asking the terminal program to display each symbol.
Since terminal programs are usually meant to be used interactively, with input and output progressing a line at a time (for example, the user is expected to hit Enter to indicate the end of a single input item), the default is to buffer the output a line at a time.
So, if no newline is printed, the print function (in 3.x; print statement in 2.x) will simply add text to the buffer, and nothing is displayed.
Outputting in other ways
Every now and then, someone will try to output from a Python program by using the standard output stream directly:
import sys
sys.stdout.write('test')
This will have the same problem: if the output does not end with a newline, it will sit in the buffer until it is flushed.
Fixing the issue
For a single print
We can explicitly flush the output after printing.
In 3.x, the print function has a flush keyword argument, which allows for solving the problem directly:
import time
for _ in range(10):
    print('.', end=' ', flush=True)
    time.sleep(.2) # or other time-consuming work
In 2.x, the print statement does not offer this functionality. Instead, flush the stream explicitly, using its .flush method. The standard output stream (where text goes when printed, by default) is made available by the sys standard library module, and is named stdout. Thus, the code will look like:
import sys
import time
for _ in range(10):
    print '.',
    sys.stdout.flush()
    time.sleep(.2) # or other time-consuming work
For multiple prints
Rather than flushing after every print (or deciding which ones need flushing afterwards), it is possible to disable the output line buffering completely. There are many ways to do this, so please refer to the linked question.
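As one possible sketch of a script-wide approach in Python 3, print can be wrapped with functools.partial so that every call flushes. The names print_now and CountingStream are made up for this example. The stream class exists only to make the flush calls observable without a terminal.

```python
import functools
import io


class CountingStream(io.StringIO):
    """In-memory stream that records how often flush() is called."""

    flush_calls = 0

    def flush(self):
        self.flush_calls += 1
        super().flush()


# print_now behaves like print() but flushes after every call.
print_now = functools.partial(print, flush=True)

stream = CountingStream()
for _ in range(3):
    print_now(".", end="", file=stream)
```

Running the interpreter with the -u option, or setting the PYTHONUNBUFFERED environment variable, are other ways to get unbuffered output without touching the code.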

Why do I have to press Ctrl+D twice to close stdin?

I have the following Python script that reads numbers and outputs an error if the input is not a number.
import fileinput
import sys
for line in (txt.strip() for txt in fileinput.input()):
    if not line.isdigit():
        sys.stderr.write("ERROR: not a number: %s\n" % line)
If I get the input from stdin, I have to press Ctrl + D twice to end the program. Why?
I only have to press Ctrl + D once when I run the Python interpreter by itself.
bash $ python test.py
1
2
foo
4
5
<Ctrl+D>
ERROR: not a number: foo
<Ctrl+D>
bash $

In Python 3, this was due to a bug in Python's standard I/O library. The bug was fixed in Python 3.3.
In a Unix terminal, typing Ctrl+D doesn't actually close the process's stdin. But typing either Enter or Ctrl+D does cause the OS read system call to return right away. So:
>>> sys.stdin.read(100)
xyzzy (I press Enter here)
(I press Ctrl+D once)
'xyzzy\n'
>>>
sys.stdin.read(100) is delegated to sys.stdin.buffer.read, which calls the system read() in a loop until either it accumulates the full requested amount of data; or the system read() returns 0 bytes; or an error occurs. (docs) (source)
Pressing Enter after the first line caused the system read() to return 6 bytes. sys.stdin.buffer.read called read() again to try to get more input. Then I pressed Ctrl+D, causing read() to return 0 bytes. At this point, sys.stdin.buffer.read gave up and returned just the 6 bytes it had collected earlier.
Note that the process still has my terminal on stdin, and I can still type stuff.
>>> sys.stdin.read() (note I can still type stuff to python)
xyzzy (I press Enter)
(Press Ctrl+D again)
'xyzzy\n'
OK. This is the part that was busted when this question was originally asked. It works now. But prior to Python 3.3, there was a bug.
The bug was a little complicated --- basically the problem was that two separate layers were doing the same work. BufferedReader.read() was written to call self.raw.read() repeatedly until it returned 0 bytes. However, the raw method, FileIO.read(), performed a loop-until-zero-bytes of its own. So the first time you press Ctrl+D in a Python with this bug, it would cause FileIO.read() to return 6 bytes to BufferedReader.read(), which would then immediately call self.raw.read() again. The second Ctrl+D would cause that to return 0 bytes, and then BufferedReader.read() would finally exit.
This explanation is unfortunately much longer than my previous one, but it has the virtue of being correct. Bugs are like that...
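The read loop described in this answer can be modeled in a few lines. The sketch below is a simplified model, not CPython's actual implementation. buffered_read and fake_terminal are made-up names, and the "terminal" is a scripted list of read results. Stacking the loop on top of itself reproduces the need for a second Ctrl+D.

```python
def buffered_read(raw_read, size):
    """Call raw_read() until size bytes arrive or a read returns nothing."""
    collected = b""
    while len(collected) < size:
        chunk = raw_read(size - len(collected))
        if not chunk:  # zero bytes from the OS: stop and return
            break
        collected += chunk
    return collected


def fake_terminal(events):
    """Return a raw_read() that replays scripted events, one per call."""
    def raw_read(_size):
        return events.pop(0)
    return raw_read


# One loop: Enter delivers 6 bytes, then a single Ctrl+D delivers 0 bytes.
one_layer = [b"xyzzy\n", b""]
result = buffered_read(fake_terminal(one_layer), 100)

# Two stacked loops, as in the old bug: the inner loop swallows the first
# zero-byte read, so a second one is needed before the outer loop stops.
two_layers = [b"xyzzy\n", b"", b""]
inner = fake_terminal(two_layers)
buggy_result = buffered_read(lambda n: buffered_read(inner, n), 100)
```

Both calls return the same six bytes, but the stacked version consumes two zero-byte reads to get there.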

Most likely this has to do with the following Python issues:
5505: sys.stdin.read() doesn't return after first EOF on Windows, and
1633941: for line in sys.stdin: doesn't notice EOF the first time.

I wrote an explanation about this in my answer to this question.
How to capture Control+D signal?
In short, Control-D at the terminal simply causes the terminal to flush the input. This makes the read system call return. The first time it returns with a non-zero value (if you typed something). The second time, it returns with 0, which is code for "end of file".

The first time it considers it to be input, the second time it's for keeps!
This only occurs when the input is from a tty. It is likely because of the terminal settings where characters are buffered until a newline (carriage return) is entered.

Using the "for line in file:" form of reading lines from a file, Python uses a hidden read-ahead buffer (see http://docs.python.org/2.7/library/stdtypes.html#file-objects at the file.next function). First of all, this explains why a program that writes output when each input line is read displays no output until you press CTRL-D. Secondly, in order to give the user some control over the buffering, pressing CTRL-D flushes the input buffer to the application code. Pressing CTRL-D when the input buffer is empty is treated as EOF.
Tying this together answers the original question. After entering some input, the first ctrl-D (on a line by itself) flushes the input to the application code. Now that the buffer is empty, the second ctrl-D acts as End-of-File (EOF).
file.readline() does not exhibit this behavior.
