How to set file buffering parameters?

I am running a long, time-consuming number-crunching process in the shell with a Python script. To indicate progress, I have inserted occasional print calls in the script, like this:
#!/usr/bin/env python3
#encoding:utf-8
print('Stage 1 completed')
I start the script in the shell with:
user#hostname:~/WorkingDirectory$chmod 744 myscript.py && nohup ./myscript.py&
This redirects the output to nohup.out, but I cannot see the output until the entire script is done, probably because of stdout buffering. In this scenario, how do I adjust the buffering parameters so that I can check the progress periodically? Basically, I want zero buffering, so that as soon as a print command is issued in the Python script, its output appears in nohup.out. Is that possible?
I know it is a rookie question and in addition to the exact solution, any easy to follow reference to the relevant material (which will help me master the buffering aspects of shell without getting into deeper Kernel or hardware level) will be greatly appreciated too.
If it is important, I am using #54~16.04.1-Ubuntu on x86_64

Python is optimised for reading in and printing out lots of data, so the standard input and output of the Python interpreter are buffered by default.
We can override this behavior in a few ways:
Use the Python interpreter with the -u option.
From man python:
-u Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in
binary mode. Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in
sys.stdin") which is not influenced by this option. To work around this, you will want to use "sys.stdin.readline()" inside a
"while 1:" loop.
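Run the script in the shell:
nohup python3 -u ./myscript.py&
Or change the shebang line of the script to #!/usr/bin/python3 -u (on Linux the env form, #!/usr/bin/env python3 -u, generally does not work, because everything after env is passed to it as a single argument) and then run:
nohup ./myscript.py&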
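Use the shell command stdbuf to turn off stream buffering.
See man stdbuf. Note that stdbuf adjusts the C stdio buffers of the program it launches; Python 3 manages its own output buffers, so this may have no effect on a Python 3 script, and -u is the more reliable choice there.
Set an unbuffered stream for output:
stdbuf --output=0 nohup ./myscript.py&
Set unbuffered streams for output and errors:
stdbuf -o0 -e0 nohup ./myscript.py&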
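If changing how the script is launched is inconvenient, another option is to flush from inside the script. The following is a minimal sketch rather than part of the answer above: report is a hypothetical helper name, and it relies on the flush keyword that print accepts in Python 3.3 and later.

```python
import sys


def report(message, stream=None):
    """Print a progress message and flush it right away.

    `report` is a hypothetical helper name used only for this sketch.
    """
    target = sys.stdout if stream is None else stream
    # flush=True pushes the text through Python's buffers immediately,
    # so it reaches nohup.out (or a pipe) without waiting for exit.
    print(message, file=target, flush=True)
```

Calling report('Stage 1 completed') in place of the bare print should make each message show up in nohup.out as soon as it is issued, at the cost of one flush per message.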

Related

How to run nohup command from Python script?

I have a simple question. I have tried to search for a solution but there are no answers which would explain what I need.
The question is:
How do I start a nohup command from Python? Basically the idea is, that I have a Python script which prepares my environment and I need it to launch multiple scripts with nohup commands. How do I start a nohup command like nohup python3 my_script.py & from within a running Python script to have that nohup command running even after I log out?
Thank you

You do not need nohup -- not even in shell, and even less so in Python. It does the following things:
Configures the HUP signal to be ignored (rarely relevant: if a process has no handles on a TTY it isn't going to be notified when that TTY exits regardless; the shell only propagates signals to children in interactive mode, not when running scripts).
If stdout is a terminal, redirects it to nohup.out
If stderr is a terminal, redirects it to wherever stdout was already redirected.
Redirects stdin to /dev/null
That's it. There's no reason to use nohup to do any of those things; they're all trivial to do without it:
</dev/null redirects stdin from /dev/null in shell; stdin=subprocess.DEVNULL does so in Python.
>nohup.out redirects stdout to nohup.out in shell; stdout=open('nohup.out', 'w') does so in Python.
2>&1 makes stderr go to the same place as stdout in shell; stderr=subprocess.STDOUT does so in Python.
Because your process isn't attached to the terminal by virtue of the above redirections, it won't implicitly get a HUP when that terminal closes. If you're worried about a signal being sent to the parent's entire process group, however, you can avoid that by splitting off the child into a separate one:
The subprocess.Popen argument start_new_session=True puts the child process into a separate session (and therefore a separate process group) from the parent in Python, so a signal sent to the process group of the parent as a whole will not be received by the child.
Adding a preexec_fn with signal.signal(signal.SIGHUP, signal.SIG_IGN) is even more explicit that the child should by default ignore a SIGHUP even if one is received.
Putting this all together might look like (if you really do want logs to go to a file named nohup.out -- I would suggest picking a better name):
import subprocess, signal
subprocess.Popen(['python3', 'my_script.py'],
                 stdin=subprocess.DEVNULL,
                 stdout=open('nohup.out', 'w'),
                 stderr=subprocess.STDOUT,
                 start_new_session=True,
                 preexec_fn=(lambda: signal.signal(signal.SIGHUP, signal.SIG_IGN)))
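For launching several scripts, the same idea can be wrapped in a small helper. This is only a sketch of one possible variation: launch_detached and the log path are hypothetical names, it appends to the log instead of truncating it, and it leaves out preexec_fn and relies on start_new_session alone (POSIX only).

```python
import subprocess
import sys


def launch_detached(cmd, log_path):
    """Start cmd in its own session, appending its output to log_path.

    `launch_detached` is a hypothetical helper, not a standard API.
    """
    # The child inherits its own copy of the file descriptor, so the
    # parent can close its handle as soon as Popen returns.
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # same role as </dev/null
            stdout=log,                 # same role as >>log_path
            stderr=subprocess.STDOUT,   # same role as 2>&1
            start_new_session=True,     # detach from the parent's process group
        )
```

A call such as launch_detached([sys.executable, '-u', 'my_script.py'], 'my_script.log') would then start one script; the -u flag is optional and only there so the log fills in as the script runs.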

What is the meaning of "python3 -u"?

When running a Python file from the command line, you use python3 <file>, but VSCode Code Runner uses python3 -u <file> (by default), so I was wondering:
What's the difference (since after testing I see no visible difference)?
What is the -u part called?

The -u flag, according to Python's --help output:
force the binary I/O layers of stdout and stderr to be unbuffered; stdin is always buffered; text I/O layer will be line-buffered; also PYTHONUNBUFFERED=x
This is documented here in the Python docs.
These are known as command line options. There are a number of them, which you can read about using python3 --help.
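The difference is easy to miss in a terminal, where stdout is line-buffered anyway. One way to see it is to ask a child interpreter how its stdout is configured when the output goes to a pipe. The sketch below is illustrative only: stdout_write_through is a made-up helper name, and it assumes Python 3.7 or later, where the text layer exposes a write_through attribute.

```python
import os
import subprocess
import sys

# Child program: report whether writes to stdout bypass the text-layer buffer.
PROBE = "import sys; print(sys.stdout.write_through)"


def stdout_write_through(*interpreter_flags):
    """Return True if a child started with the given flags has unbuffered stdout."""
    # Drop PYTHONUNBUFFERED so that only the command-line flags matter.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, *interpreter_flags, "-c", PROBE],
        stdout=subprocess.PIPE,
        env=env,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

With this helper, stdout_write_through('-u') should report True and stdout_write_through() should report False, which is the buffering difference that tools such as Code Runner rely on to show output as it is produced.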

How to remove output buffering when running Python in Sublime Text 3

How can I remove the output buffering from Sublime Text 3 when I build a Python 3 script? I would like real-time output.
I am using Sublime Text 3 with the Anaconda plugin, Python 3.6 and Linux Mint 18. When I run a simple script using control-b:
print('hello')
I get an instant output in a separate window called 'Build output'. When I use a script with a repeated output, such as:
from time import sleep
count = 0
print('starting')
while True:
    print('{} hello'.format(count))
    count += 1
    sleep(0.5)
Initially I get a blank screen in 'Build output'. Some time later it populates with several hundred lines of output. It looks like the output is being buffered. When the buffer is full, it outputs all at once to the 'Build output' screen.
Edit
Sublime Text allows custom build configurations. The default Python build is for Python 2. I created a build configuration for Python 3 and left out the -u flag. The fix is to add the -u flag to the Python 3 build.
File: Python3.sublime-build
{
    "shell_cmd": "/usr/bin/env python3 -u ${file}",
    "selector": "source.python",
    "file_regex": "^(...*?):([0-9]*):?([0-9]*)",
    "working_dir": "${file_path}",
}
Save in sublime_install/Data/Packages/User/Python3.sublime-build

By default the exec command is used to execute the commands in build systems, and the exec command doesn't buffer output at all. There is more information in this answer (which also provides a version of exec that does line buffering) but in short exec launches one thread to handle stdout and one to handle stderr, and both forward whatever data they get to the panel as soon as they get it.
As such, a problem like the one you're describing here is generally caused by the program doing its own buffering. Depending on the language and platform that you're using, buffering may behave differently from what you expect:
For example, see this text in the man page for stdout under Linux:
The stream stderr is unbuffered. The stream stdout is line-buffered when it points to a terminal. Partial lines will not appear until fflush(3) or exit(3) is called, or a newline is printed. This can produce unexpected results, especially with debugging output.
In the general case, the solution to this problem would be to modify the program itself to ensure that it's not buffering, and how you would do that depends on the language you're using and the platform that you're on. It could be something as simple as setting an environment variable or as complex as startup code that ensures that regardless of circumstance buffering is set as you expect it to be.
In the specific case of Python, the -u command line argument to the interpreter tells Python to keep things unbuffered:
-u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
see man page for details on internal buffering relating to '-u'
The Python.sublime-build that ships with Sublime uses this argument to the python command to ensure that the output is unbuffered, and using that build system works as expected for your sample program.
I don't use the Anaconda package so I'm not sure if it provides its own build systems or not, but you may want to check the build command that you're using to ensure that it uses -u.
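As a rough illustration of the environment-variable route, the sketch below starts a child interpreter with PYTHONUNBUFFERED=1 and reads its first line while the child is still running. The helper name and the tiny child program are made up for the example.

```python
import os
import subprocess
import sys

# Hypothetical child program: print one line, then block until it gets input.
CHILD = "import sys; print('ready'); sys.stdin.readline()"


def first_line_while_running():
    """Read the child's first line before it exits, using PYTHONUNBUFFERED."""
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    with subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env=env,
        text=True,
    ) as proc:
        # With buffered output this readline would wait until the child
        # exits, because 'ready' would sit in the child's buffer.
        line = proc.stdout.readline()
        still_running = proc.poll() is None
        # Unblock the child so that it can finish.
        proc.stdin.write("\n")
        proc.stdin.flush()
    return line.strip(), still_running
```

A build system can get the same effect by setting the variable in its environment instead of editing the command line, which may be handy when the command itself is generated by a plugin.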

Python, subprocess.check_call() and pipes redirection

Why am I getting list of files when executing this command?
subprocess.check_call("time ls &>/dev/null", shell=True)
If I paste
time ls &>/dev/null
into the console, I will just get the timings.
OS is Linux Ubuntu.

On Debian-like systems, the default system shell (/bin/sh) is dash, not bash. Dash does not support the &> shortcut. To get only the subprocess return code, try:
subprocess.check_call("time ls >/dev/null 2>&1", shell=True)
To get subprocess return code and the timing information but not the directory listing, use:
subprocess.check_call("time ls >/dev/null", shell=True)
Minus, of course, the subprocess return code, this is the same behavior that you would see on the dash command prompt.

The Python version is running under sh, but the console version is running in whatever your default shell is, which is probably either bash or dash. (Your sh may actually be a different shell running in POSIX-compliant mode, but that doesn't make any difference.)
bash has a builtin time keyword, but sh (dash on Debian-like systems) doesn't, so you get /usr/bin/time, which is a normal program. The most important difference this makes is that the time builtin is not running as a subprocess with its own independent stdout and stderr.
Also, sh, bash, and dash all have different redirection syntax.
But what you're trying to do seems wrong in the first place, and you're just getting lucky on the console because two mistakes are canceling out.
You want to get rid of the stdout of ls but keep the stderr of time, but that's not what you asked for. You're trying to redirect both stdout and stderr: that's what &> means in any shell that actually supports it.
So why are you still getting the time stderr? Either (a) your default shell doesn't support &>, or (b) you're using the builtin instead of the program, and you're not redirecting the stderr of the shell itself, or maybe (c) both of the above.
If you really want to do exactly the same thing in Python, with the exact same bugs canceling out in the exact same way, you can run your default shell manually instead of using shell=True. Depending on which reason it was working, that would be either this:
subprocess.check_call([os.environ['SHELL'], '-c', 'time ls &> /dev/null'])
or this:
subprocess.check_call('{} -c "time ls &> /dev/null"'.format(os.environ['SHELL']), shell=True)
But really, why are you doing this at all? If you want to redirect stdout and not stderr, write that:
subprocess.check_call('time ls > /dev/null', shell=True)
Or, better yet, why are you even using the shell in the first place?
subprocess.check_call(['time', 'ls'], stdout=subprocess.DEVNULL)
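The no-shell form can be taken one step further by capturing stderr in Python instead of letting it reach the terminal. This is a small sketch, with run_discarding_stdout as a hypothetical helper name; with ['time', 'ls'] it would rely on an external time program such as /usr/bin/time being installed, since no shell builtin is involved.

```python
import subprocess
import sys


def run_discarding_stdout(cmd):
    """Run cmd without a shell, dropping stdout and returning stderr as text."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,  # same role as >/dev/null
        stderr=subprocess.PIPE,     # keep stderr, e.g. the timing report
        text=True,
        check=True,                 # raise CalledProcessError on failure
    )
    return result.stderr
```

Because the redirection is expressed through arguments to subprocess, the result no longer depends on which shell sh happens to be.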

linux tee is not working with python?

I made a Python script that communicates with a web server in an infinite loop.
I want to log all of the communication data to a file and also monitor it from the terminal at the same time, so I used the tee command like this:
python client.py | tee logfile
However, I got nothing in either the terminal or the logfile.
The Python script itself is working fine.
What is happening here?
Am I missing something?
Any advice would be appreciated.
Thank you in advance.

From man python:
-u Force stdin, stdout and stderr to be totally unbuffered. On systems
where it matters, also put stdin, stdout and stderr in binary mode. Note
that there is internal buffering in xreadlines(), readlines() and file-
object iterators ("for line in sys.stdin") which is not influenced by
this option. To work around this, you will want to use
"sys.stdin.readline()" inside a "while 1:" loop.
So what you can do is:
/usr/bin/python -u client.py >> logfile 2>&1
Or using tee:
python -u client.py | tee logfile

Instead of making it fully unbuffered, you can make it line-buffered, as it normally is when attached to a terminal, with sys.stdout.reconfigure(line_buffering=True) (after import sys, of course).
This was added in 3.7, docs: https://docs.python.org/3/library/io.html#io.TextIOWrapper.reconfigure
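A minimal sketch of how that call might be wrapped at the top of a script follows. enable_line_buffering is a hypothetical helper name; the hasattr check is there because sys.stdout can be replaced by an object without reconfigure (or the interpreter may be older than 3.7).

```python
import io
import sys


def enable_line_buffering(stream=None):
    """Switch a text stream to line buffering; return True if it was possible."""
    target = sys.stdout if stream is None else stream
    if hasattr(target, "reconfigure"):
        # Available on io.TextIOWrapper since Python 3.7.
        target.reconfigure(line_buffering=True)
        return True
    return False
```

After enable_line_buffering() has run, each print that ends in a newline should reach the pipe or file immediately, so python client.py | tee logfile behaves much as it would in a terminal.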
