Why do I have to press Ctrl+D twice to close stdin?


I have the following Python script that reads numbers and outputs an error if the input is not a number.
import fileinput
import sys
for line in (txt.strip() for txt in fileinput.input()):
    if not line.isdigit():
        sys.stderr.write("ERROR: not a number: %s\n" % line)
If I get the input from stdin, I have to press Ctrl + D twice to end the program. Why?
I only have to press Ctrl + D once when I run the Python interpreter by itself.
bash $ python test.py
1
2
foo
4
5
<Ctrl+D>
ERROR: not a number: foo
<Ctrl+D>
bash $

In Python 3, this was due to a bug in Python's standard I/O library. The bug was fixed in Python 3.3.
In a Unix terminal, typing Ctrl+D doesn't actually close the process's stdin. But typing either Enter or Ctrl+D does cause the OS read system call to return right away. So:
>>> sys.stdin.read(100)
xyzzy (I press Enter here)
(I press Ctrl+D once)
'xyzzy\n'
>>>
sys.stdin.read(100) is delegated to sys.stdin.buffer.read, which calls the system read() in a loop until it has accumulated the full requested amount of data, the system read() returns 0 bytes, or an error occurs (see the io module docs and source).
Pressing Enter after the first line caused the system read() to return 6 bytes. sys.stdin.buffer.read called read() again to try to get more input. Then I pressed Ctrl+D, causing read() to return 0 bytes. At this point, sys.stdin.buffer.read gave up and returned just the 6 bytes it had collected earlier.
Note that the process still has my terminal on stdin, and I can still type stuff.
>>> sys.stdin.read() (note I can still type stuff to python)
xyzzy (I press Enter)
(Press Ctrl+D again)
'xyzzy\n'
OK. This is the part that was busted when this question was originally asked. It works now. But prior to Python 3.3, there was a bug.
The bug was a little complicated: basically, two separate layers were doing the same work. BufferedReader.read() was written to call self.raw.read() repeatedly until it returned 0 bytes. However, the raw method, FileIO.read(), performed a loop-until-zero-bytes of its own. So the first time you pressed Ctrl+D in a Python with this bug, FileIO.read() would return 6 bytes to BufferedReader.read(), which would then immediately call self.raw.read() again. The second Ctrl+D would cause that call to return 0 bytes, and only then would BufferedReader.read() finally exit.
This explanation is unfortunately much longer than my previous one, but it has the virtue of being correct. Bugs are like that...
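To make the "two layers both looping until zero bytes" explanation concrete, here is a toy model. It is not CPython's actual code: raw_readall, buggy_buffered_read and count_eofs_consumed are hypothetical names, and each item in the scripted list stands in for the result of one read() system call on a terminal (data when Enter is pressed, b"" when Ctrl+D is pressed on an empty line).

```python
# Toy model of the pre-3.3 double loop (illustrative only, not CPython's code).
# Each item in `os_reads` stands for one read() system call on a terminal.


def raw_readall(os_reads):
    """Inner layer (think FileIO): loop until one read() returns 0 bytes."""
    chunks = []
    for chunk in os_reads:
        if not chunk:  # this consumes one Ctrl+D
            break
        chunks.append(chunk)
    return b"".join(chunks)


def buggy_buffered_read(os_reads):
    """Outer layer (think BufferedReader) looping over an inner layer that already loops."""
    chunks = []
    while True:
        chunk = raw_readall(os_reads)
        if not chunk:  # only an inner call that sees b"" first ends the outer loop
            break
        chunks.append(chunk)
    return b"".join(chunks)


def count_eofs_consumed(reader, script):
    """Run `reader` over a scripted sequence of reads; return (data, EOFs used)."""
    os_reads = iter(script)
    data = reader(os_reads)
    remaining = list(os_reads)
    used = script[: len(script) - len(remaining)]
    return data, used.count(b"")
```

With the script [b"xyzzy\n", b"", b"", b""], the single loop returns b"xyzzy\n" after consuming one b"", while the nested loops return the same data but swallow two, which matches the "press Ctrl+D twice" symptom.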

Most likely this has to do with the following Python issues:
5505: sys.stdin.read() doesn't return after first EOF on Windows, and
1633941: for line in sys.stdin: doesn't notice EOF the first time.

I wrote an explanation of this in my answer to the question "How to capture Control+D signal?".
In short, Control-D at the terminal simply causes the terminal to flush the input, which makes the read system call return. The first time, it returns a non-zero value (if you typed something). The second time, it returns 0, which is the code for "end of file".

The first time it considers it to be input, the second time it's for keeps!
This only occurs when the input is from a tty. It is likely because of the terminal settings where characters are buffered until a newline (carriage return) is entered.

When you read lines with the "for line in file:" form, Python uses a hidden read-ahead buffer (see the file.next() entry at http://docs.python.org/2.7/library/stdtypes.html#file-objects). First of all, this explains why a program that writes output as each input line is read displays no output until you press CTRL-D. Secondly, to give the user some control over the buffering, pressing CTRL-D flushes the input buffer to the application code. Pressing CTRL-D when the input buffer is empty is treated as EOF.
Tying this together answers the original question. After entering some input, the first ctrl-D (on a line by itself) flushes the input to the application code. Now that the buffer is empty, the second ctrl-D acts as End-of-File (EOF).
file.readline() does not exhibit this behavior.

Related

Python and C pipeline: Python does not write from buffer when I flush

This is part of a university assignment but the problem I have does not relate to the assignment itself.
I have a C file that waits for two inputs from the terminal: there is a gets(info1) and later a gets(info2). I have a buffered writer that I use to do the following in Python:
sys.stdout.buffer.write("Hello 1".encode("ascii"))
sys.stdout.flush()
sys.stdout.buffer.write("Hello 2".encode("ascii"))
sys.stdout.flush()
I then have a pipeline that sends these outputs from the Python code to the input of the C program using the terminal. When I run the command, I can see from the prints that both "Hello 1" and "Hello 2" are sent to gets(info1). Why is this happening? Since I flush the buffer, shouldn't "Hello 1" be caught by the first gets(info1) and "Hello 2" by the second gets(info2)? I even added a sleep after the first flush, but the program sleeps and then both outputs still go to the first gets(info1). The pipeline obviously works, since the C program does receive the output of the Python program. But why does only the first call get input, even though I flush the buffer after the first string is written?
When I do
sys.stdout.buffer.write("Hello 1".encode("ascii"))
sys.stdout.buffer.write("\n".encode("ascii"))
sys.stdout.flush()
sys.stdout.buffer.write("Hello 2".encode("ascii"))
sys.stdout.flush()
It sends it properly. However, my output has to follow a very specific format.

I got it working. See stdout buffering. I needed to feed stdout a newline instead of flushing, all good! :)
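Why the newline matters: a pipe is a byte stream with no message boundaries. flush() only moves bytes into the pipe sooner; it does not mark where one message ends, and gets() keeps reading until it sees a newline, which is also why the sleep didn't help. The sketch below reproduces this with a Python reader whose readlines() stands in for the C program's gets() calls. make_writer and lines_seen_by_reader are hypothetical helpers, and the child process simply replays the question's two flushed writes.

```python
import subprocess
import sys


def make_writer(separator):
    """Source for a hypothetical writer that mirrors the question's two flushed writes."""
    return (
        "import sys\n"
        "sys.stdout.buffer.write(b'Hello 1' + %r)\n"
        "sys.stdout.flush()\n"
        "sys.stdout.buffer.write(b'Hello 2')\n"
        "sys.stdout.flush()\n" % (separator,)
    )


def lines_seen_by_reader(separator):
    """Pipe the writer into a line-oriented reader (a stand-in for gets())."""
    proc = subprocess.Popen(
        [sys.executable, "-c", make_writer(separator)],
        stdout=subprocess.PIPE,
    )
    lines = proc.stdout.readlines()  # each line ends at b"\n" (or at EOF)
    proc.stdout.close()
    proc.wait()
    return lines


print(lines_seen_by_reader(b""))    # [b'Hello 1Hello 2']: flushing left no boundary
print(lines_seen_by_reader(b"\n"))  # [b'Hello 1\n', b'Hello 2']: the newline splits them
```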

Why is Pycharm failing to convert string to integer?

num = 0
for i in range(5):
    ask = int(input())
    if abs(ask) > 0:
        num = ask
print(num)
When I run the code, it lets me input the first string. But once I enter the second string, the program crashes with:
"pythonProject\main.py", line 3, in
ask = int(input())
ValueError: invalid literal for int() with base 10: ''
What's going on?
My input: I just type 1, press Enter, type 2, press Enter, and then it crashes. I am sure I'm not pressing Enter too quickly or accidentally typing an empty string, because I've run the code multiple times.
What have I tried so far?
Creating a new project and pasting the code -> didn't work
Asking my friend to copy the code onto his PyCharm and run it -> worked fine on his computer
'Edit Configurations', uncheck 'Emulate terminal in output console' -> didn't work, it was already unchecked
Checked that I was running the correct file in the project -> didn't work, I was running the right file
EDIT:
FIXED: I just needed to check 'Emulate terminal in output console' rather than uncheck it. I'm not sure why this works, though, or how to keep it checked for all future projects rather than having to check it manually every time.


The problem is with the input: you are either pressing Enter without any input (an empty string) or entering a float value. This thread might help you in this case. The code works fine when I input an integer, and gives the same error when I enter an empty string or a float value.
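If empty strings or floats are the cause, a defensive version of the loop can reprompt instead of crashing. This is a minimal sketch: read_int and its input_fn parameter are hypothetical (input_fn exists only so the function can be tested without a keyboard). It does not change any PyCharm setting; if the console never delivers real input, it will simply keep reprompting, and input() still raises EOFError if stdin is closed.

```python
def read_int(prompt="", input_fn=input):
    """Ask until the reply parses as an int; input_fn is swappable for testing."""
    while True:
        reply = input_fn(prompt).strip()
        try:
            return int(reply)
        except ValueError:
            # Catches '' (just pressing Enter) and floats such as '2.5'
            print(f"Not an integer: {reply!r}, please try again")
```

In the question's loop, `ask = read_int()` would replace `ask = int(input())`.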

To get it to work, you need to check "Emulate terminal in output console".
I've answered in the comments section and I'm glad it worked out, here is an explanation:
You need to know the concept of a "terminal emulator" to understand why and how this works. When a program is run (at least on UNIX-like operating systems), it has three I/O streams: stdin, stdout and stderr. stdin is used to input data, and the other two are for output.
An input or output stream is just a buffer used to communicate with the program back and forth. Once something is written to the buffer, it can be read from there. If the buffer is empty, an attempt to read from it will block until the buffer has something in it. More about stdio: https://en.wikipedia.org/wiki/Standard_streams
When the program is run through a terminal emulator, its I/O streams are connected to the emulator, so by default whatever you type in the terminal window is written to your stdin, and whatever your program writes to stdout and stderr is displayed on the screen. (However, this behavior can be changed with pipes and redirection: you can pass data from a file to stdin, and you can redirect the output to a file.)
Here is the history behind terminal emulators, which explains why it is implemented this way: https://en.wikipedia.org/wiki/Terminal_emulator#Computer_terminals
For example, you have a simple program:
s = input('Enter string: ')
print(f'stdout: {s}')
If you run it from the terminal and type "TEST":
$ python3 test.py
TEST
stdout: TEST
But you can also, for example, pass data directly to stdin and redirect the output to a file:
$ echo "ABCDEF" | python3 test.py > OUTPUT.txt
There will be no text in the terminal, but OUTPUT.txt will appear. It will contain:
stdout: ABCDEF
Now, about PyCharm:
By default, when PyCharm runs your script, it does not emulate a terminal in the output window. It simply does not send anything to stdin, and it won't react to pressed keys. When your program gets to the line with input(), it starts reading the stdin stream until it gets a \n character (indicating that the user has pressed the Return key). As nothing gets sent to the stream, it will wait forever.
Useful tip: for testing, instead of just typing something into the terminal every time, you can also check "Redirect input from:" and choose an input file.
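A script can check for itself which of these situations it is in, because a stream connected to a terminal reports isatty() as True. This is a small sketch; stdin_kind is a hypothetical helper, and the claim that an IDE console without terminal emulation reports False is what typically happens (it is usually a pipe), not something the answer above states.

```python
import sys


def stdin_kind(stream=None):
    """Report whether a stream is an interactive terminal or something else."""
    if stream is None:
        stream = sys.stdin
    if stream is None:  # stdin can be missing entirely, e.g. under pythonw on Windows
        return "no stdin"
    return "terminal" if stream.isatty() else "not a terminal (pipe, file or IDE console)"
```

Adding `print(stdin_kind(), file=sys.stderr)` to test.py should report "terminal" for `python3 test.py` and the other value for `echo "ABCDEF" | python3 test.py > OUTPUT.txt`; writing to stderr keeps the message out of OUTPUT.txt.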

Last line of stdin not being read when EOF is detected

Probably an easy question, but I am basically trying to replicate the behavior (very sparsely and simplistically) of the cat command but in python. I want the user to be able to enter all the text they wish, then when they enter an EOF (by pressing ctrl+d or cmd+d) the program should print out everything they enter.
import sys
for line in sys.stdin:
    print line
If I enter the input lines and follow the last line with a return character, and then press cmd+d:
never gonna give you up
never gonna let you down
never gonna run around
and desert you
then the output is
never gonna give you up
never gonna let you down
never gonna run around
and desert you
However, if I press cmd+d while I am still on the last line "and desert you" then the output is:
never gonna give you up
never gonna let you down
never gonna run around
How can I modify the program such that if the user presses the EOF on the last line, then it should still be included as part of the output?

The behavior of Control-D is actually a function of the TTY driver of the operating system, not of Python. The TTY driver normally sends data to programs a full line at a time. You'll note that 'cat' and most other programs behave the same way.
To do what you're asking is non-trivial. It usually requires putting the TTY into raw mode, reading and processing data (you would get the Control-D character itself), and then taking it out of raw mode before exiting (you don't want to exit in raw mode!)
Procedures for doing that are OS-dependent, but you might be able to make use of tty.setraw() and tty.setcbreak().
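A minimal sketch of the tty.setcbreak() route, Unix only, assuming stdin is a terminal. In cbreak mode the TTY driver stops line buffering, so Ctrl+D is no longer interpreted by the driver and instead arrives as the byte 0x04, which this code treats as the end marker, even in the middle of a line. read_until_eot and main are hypothetical names. cbreak mode also turns echo off, so the sketch echoes bytes itself; there is no line editing (Backspace arrives as a byte like any other), and the terminal settings are always restored in a finally block.

```python
import os
import sys
import termios  # Unix only
import tty      # Unix only

EOT = b"\x04"  # what Ctrl+D arrives as once canonical (line) mode is off


def read_until_eot(fd, echo_fd=None):
    """Read bytes one at a time until Ctrl+D (EOT) or a real end of file."""
    data = bytearray()
    while True:
        b = os.read(fd, 1)
        if not b or b == EOT:
            break
        if b == b"\r":  # Enter may arrive as CR, depending on the ICRNL setting
            b = b"\n"
        data += b
        if echo_fd is not None:  # echo is off in cbreak mode, so echo manually
            os.write(echo_fd, b)
    return data.decode(errors="replace")


def main():
    fd = sys.stdin.fileno()
    if not os.isatty(fd):
        # Piped input: the TTY driver is not involved, so just read to EOF
        sys.stdout.write(sys.stdin.read())
        return
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # no line buffering, no echo; Ctrl+C still raises KeyboardInterrupt
        text = read_until_eot(fd, echo_fd=sys.stdout.fileno())
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)  # always restore the terminal
    sys.stdout.write("\n" + text)
```

Calling main() from a terminal and pressing Ctrl+D while still on "and desert you" should include that last line in the output.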

Okay, that was weird, but it turns out there is an explanation. Ctrl-D is "end of transmission" (EOT), not EOF. What it means is environment-specific, but to a *nix terminal, it means "flush the input buffer to the program immediately".
If there are characters in the buffer, python's stdin gets them but since there is no \n, the stdin iterator buffers them and doesn't emit anything yet.
If there are no characters in the buffer, python's stdin gets an empty buffer and python follows the rule that an empty buffer means EOF and it should stop iteration.
If you type some characters and then 2 ctrl-d's in a row, you'll get the buffered characters and iteration will end. And, to keep things as complicated as possible, if you start another for line in sys.stdin it will happily continue taking input.
EDIT
This script will give you any pending characters immediately when you hit ctrl-d with data in the input buffer, but it won't know to exit unless you hit a second ctrl-d.
import sys
while True:
    c = sys.stdin.read(1)
    if not c:
        break
    sys.stdout.write(c)
    sys.stdout.flush()

I think this is a Python 2.7 problem. I tried your code on Python 3.5 and it works just fine.
Hope this helps!

How to process lines of standard input interactively in Python?

I'd like to use Python as an external process and communicate with it by standard input/standard output. Specifically, I want to
for line in sys.stdin:
    result = compute_something(line)
    print result
    sys.stdout.flush()
The output flush is to get the result back right away, without buffering. I want to do the same with the input--- to process each line right away, without buffering. However, the above code does not respond to each line individually; it waits until a large amount of data is accumulated in the standard input and then processes everything at once.
This is true even if the calling program flushes its standard output with every line. It's also true even if I'm running the above directly on a console. The buffer is in Python, not the calling program.
Moreover, I found that control-D on the console makes Python flush its standard input buffer. (And then I can continue to send more input afterward!) However, that's not useful to me because the calling program can't send the equivalent of control-D at the end of each line.
One more thing: for line in sys.stdin.xreadlines() appears to be equivalent to for line in sys.stdin: they both buffer.
So my question is, how can I write a Python script that does not buffer its input, so that it processes each line of input right away?

(I solved the problem before posting the question, but I think I should share it anyway: others might encounter this problem, and I'm still interested in any comments on why this happens and what we should all know about controlling Python's or Linux's implicit buffering.)
Here's one way to avoid input buffering:
while True:
    line = sys.stdin.readline()
    if not line:  # readline() returns '' only at EOF
        break
    result = compute_something(line)
    print result
    sys.stdout.flush()
Apparently, .readline() avoids the input buffer while direct iteration and .xreadlines() do not.
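For comparison, here is a Python 3 sketch of the setup the question describes, with both ends in one script: a parent process sends one line at a time and waits for each reply before sending the next. The worker uses a readline() loop (through the two-argument iter()), and both sides flush after every line. WORKER and talk_to_worker are hypothetical, and upper-casing stands in for compute_something.

```python
import subprocess
import sys

# Hypothetical worker: replies to each line immediately (upper-casing stands in
# for compute_something).
WORKER = '''
import sys
for line in iter(sys.stdin.readline, ""):
    sys.stdout.write(line.upper())
    sys.stdout.flush()
'''


def talk_to_worker(lines):
    """Send lines one at a time and collect the reply to each before sending the next."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    for line in lines:
        proc.stdin.write(line + "\n")
        proc.stdin.flush()  # without this the line can sit in the parent's buffer
        replies.append(proc.stdout.readline().rstrip("\n"))  # waits for this reply only
    proc.stdin.close()  # the worker's readline() now returns "" and its loop ends
    proc.stdout.close()
    proc.wait()
    return replies
```

Because each reply is read before the next line is sent, the exchange would hang (rather than silently batch) if either side held a line in a buffer, which makes this a quick way to check that nothing is buffering.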

stdin should not wait for "CTRL+D"

I have a simple Python script that should read from stdin.
I redirect the stdout of another program to the stdin of my Python script.
But the output my program logs only "reaches" the Python script when the logging program gets killed.
What I actually want is to handle each line my program logs as soon as it is available, not when the program (which should run 24/7) quits.
So how can I make this happen? How can I make stdin stop waiting for CTRL+D or EOF before handling data?
Example
# accept_stdin.py
import sys
import datetime
for line in sys.stdin:
    print datetime.datetime.now().second, line
# print_data.py
import time
print "1 foo"
time.sleep(3)
print "2 bar"
# bash
python print_data.py | python accept_stdin.py

Like all file objects, the sys.stdin iterator reads input in chunks; even if a line of input is ready, the iterator will try to read up to the chunk size or EOF before outputting anything. You can work around this by using the readline method, which doesn't have this behavior:
while True:
    line = sys.stdin.readline()
    if not line:
        # End of input
        break
    do_whatever_with(line)
You can combine this with the 2-argument form of iter to use a for loop:
for line in iter(sys.stdin.readline, ''):
    do_whatever_with(line)
I recommend leaving a comment in your code explaining why you're not using the regular iterator.
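A quick way to see how the two-argument form behaves: iter(callable, sentinel) keeps calling the callable until it returns the sentinel. readline() returns "\n" for a blank line and "" only at end of input, so blank lines don't stop the loop. collect_lines is a hypothetical helper, and io.StringIO stands in for sys.stdin; for a binary stream such as sys.stdin.buffer, the sentinel would have to be b"" instead.

```python
import io


def collect_lines(stream):
    """Read lines with the 2-argument iter(): call stream.readline() until it returns ''."""
    lines = []
    for line in iter(stream.readline, ""):  # a blank line is "\n", so only EOF stops the loop
        lines.append(line.rstrip("\n"))
    return lines


print(collect_lines(io.StringIO("first\n\nlast\n")))  # ['first', '', 'last']
```

In the real script, collect_lines(sys.stdin) would be replaced by processing each line inside the loop, so that work happens as soon as a line arrives.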

It is also an issue with your producer program, i.e. the one whose stdout you pipe to your Python script.
Indeed, as this program only prints and never flushes, the data it prints is kept in the internal program buffers for stdout and not flushed to the system.
Add a sys.stdout.flush() call right after your print statement in print_data.py.
You see the data when you quit the program as it automatically flushes on exit.
See this question for an explanation.
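The effect of the producer-side flush can be measured. This sketch is a Python 3 port of print_data.py (the PRODUCER string and arrival_times are hypothetical names) that runs the producer through a pipe and records when each line reaches the reader. With flush=True the second line arrives about one second after the first; without it, both lines would arrive together when the producer exits.

```python
import subprocess
import sys
import time

# Python 3 port of the question's print_data.py (hypothetical), flushing after each line.
PRODUCER = '''
import time
print("1 foo", flush=True)  # hand the line to the pipe now, not at exit
time.sleep(1)
print("2 bar", flush=True)
'''


def arrival_times():
    """Run the producer and record when each line reaches the consumer."""
    start = time.monotonic()
    proc = subprocess.Popen([sys.executable, "-c", PRODUCER],
                            stdout=subprocess.PIPE, text=True)
    arrivals = []
    for line in iter(proc.stdout.readline, ""):
        arrivals.append((line.rstrip("\n"), round(time.monotonic() - start, 2)))
    proc.stdout.close()
    proc.wait()
    return arrivals
```

When stdout is a pipe rather than a terminal, Python block-buffers it, which is why the flushes (or running the producer with python -u) are needed for the lines to arrive separately.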

As @user2357112 said, you need to use:
for line in iter(sys.stdin.readline, ''):
After that, you need to start Python with the -u flag so that stdin and stdout are unbuffered (that is the Python 2 behavior; in Python 3, -u affects only stdout and stderr, not stdin).
python -u print_data.py | python -u accept_stdin.py
You can also specify the flag in the shebang.
