Python foo > bar(input file, output file) - python

It's probably a very basic question, but I couldn't find any answer. Right now I have something like:
import sys
inFile = sys.argv[1]
outFile = sys.argv[2]
with open(inFile, 'r+') as input, open(outFile, 'w+') as out:
    # do something
I can run it with
./modulname foo bar
which works. How can I change it so it will also work as ./modulname foo > bar? Right now it gives me the following error:
./pagereport.py today.log > sample.txt
Traceback (most recent call last):
  File "./pagereport.py", line 7, in <module>
    outFile = sys.argv[2]
IndexError: list index out of range

You could skip the second open (out) and write to sys.stdout instead.
If you want to support both ways of calling it, argparse handles that comfortably: in add_argument, combine type=argparse.FileType('w') for the output file with default=sys.stdout.
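A minimal sketch of that approach (the argument names here are placeholders, not from the original script):

import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('infile', type=argparse.FileType('r'))
parser.add_argument('outfile', nargs='?',
                    type=argparse.FileType('w'),
                    default=sys.stdout)
args = parser.parse_args()

# copy input to output, whatever the output turned out to be
for line in args.infile:
    args.outfile.write(line)

Called as ./modulname foo bar it writes to bar directly; called as ./modulname foo > bar it falls back to sys.stdout, which the shell has already redirected to bar.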

When you do:
./modulname foo > bar
> is acted upon by the shell, which redirects the STDOUT stream (FD 1) to the file bar. This happens before the command even runs, so no, you can't pass the command like that and have bar available inside the Python script.
If you insist on using >, a poor man's solution would be to make the arguments a single string, and do some string processing inside, something like:
./modulname 'foo >bar'
And inside your script:
infile, outfile = map(lambda x: x.strip(), sys.argv[1].split('>'))
This assumes no filename contains whitespace; if one does, it needs special treatment, such as passing two separate arguments in that case.
Also, take a look at the argparse module for more flexible argument parsing capabilities.

What error did you get?
import sys
inFile = sys.argv[1]
outFile = sys.argv[2]
with open(inFile, 'r+') as in_put, open(outFile, 'w+') as out:
    buff = in_put.read()
    out.write(buff)
I tried to run your code, but it had no import sys; after fixing that as above, I can run it like a simple cp command:
python p4.py p4.py p4.py-bk


How do I read from stdin? Some code golf challenges require using stdin for input.
Use the fileinput module:
import fileinput
for line in fileinput.input():
    pass
fileinput will loop through all the lines in the input specified as file names given in command-line arguments, or the standard input if no arguments are provided.
Note: line will contain a trailing newline; to remove it use line.rstrip().
There are a few ways to do it.
sys.stdin is a file-like object on which you can call the read or readlines functions, depending on whether you want to read everything at once or read everything and have it split by newlines automatically. (You need to import sys for this to work.)
If you want to prompt the user for input, you can use raw_input in Python 2.X, and just input in Python 3.
If you actually just want to read command-line options, you can access them via the sys.argv list.
You will probably find this Wikibook article on I/O in Python to be a useful reference as well.
import sys
for line in sys.stdin:
    print(line)
Note that this will include a newline character at the end. To remove the newline at the end, use line.rstrip(), as @brittohalloran said.
Python also has built-in functions input() and raw_input(). See the Python documentation under Built-in Functions.
For example,
name = raw_input("Enter your name: ") # Python 2.x
or
name = input("Enter your name: ") # Python 3
Here's from Learning Python:
import sys
data = sys.stdin.readlines()
print "Counted", len(data), "lines."
On Unix, you could test it by doing something like:
% cat countlines.py | python countlines.py
Counted 3 lines.
On Windows or DOS, you'd do:
C:\> type countlines.py | python countlines.py
Counted 3 lines.
How do you read from stdin in Python?
I'm trying to do some of the code golf challenges, but they all require the input to be taken from stdin. How do I get that in Python?
You can use:
sys.stdin - A file-like object - call sys.stdin.read() to read everything.
input(prompt) - pass it an optional prompt to output, it reads from stdin up to the first newline, which it strips. You'd have to do this repeatedly to get more lines, at the end of the input it raises EOFError. (Probably not great for golfing.) In Python 2, this is raw_input(prompt).
open(0).read() - In Python 3, the builtin function open accepts file descriptors (integers representing operating system IO resources), and 0 is the descriptor of stdin. It returns a file-like object like sys.stdin - probably your best bet for golfing. In Python 2, this is io.open.
open('/dev/stdin').read() - similar to open(0), works on Python 2 and 3, but not on Windows (or even Cygwin).
fileinput.input() - returns an iterator over lines in all files listed in sys.argv[1:], or stdin if not given. Use like ''.join(fileinput.input()).
Both sys and fileinput must be imported, respectively, of course.
Quick sys.stdin examples compatible with Python 2 and 3, Windows, Unix
You just need to read from sys.stdin, for example, if you pipe data to stdin:
$ echo foo | python -c "import sys; print(sys.stdin.read())"
foo
We can see that sys.stdin is in default text mode:
>>> import sys
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'>
file example
Say you have a file, inputs.txt, we can accept that file and write it back out:
python -c "import sys; sys.stdout.write(sys.stdin.read())" < inputs.txt
Longer answer
Here's a complete, easily replicable demo, using two methods, the builtin function, input (use raw_input in Python 2), and sys.stdin. The data is unmodified, so the processing is a non-operation.
To begin with, let's create a file for inputs:
$ python -c "print('foo\nbar\nbaz')" > inputs.txt
And using the code we've already seen, we can check that we've created the file:
$ python -c "import sys; sys.stdout.write(sys.stdin.read())" < inputs.txt
foo
bar
baz
Here's the help on sys.stdin.read from Python 3:
read(size=-1, /) method of _io.TextIOWrapper instance
Read at most n characters from stream.
Read from underlying buffer until we have n characters or we hit EOF.
If n is negative or omitted, read until EOF.
Builtin function, input (raw_input in Python 2)
The builtin function input reads from standard input up to a newline, which is stripped (complementing print, which adds a newline by default.) This occurs until it gets EOF (End Of File), at which point it raises EOFError.
Thus, here's how you can use input in Python 3 (or raw_input in Python 2) to read from stdin - so we create a Python module we call stdindemo.py:
$ python -c "print('try:\n while True:\n print(input())\nexcept EOFError:\n pass')" > stdindemo.py
And let's print it back out to ensure it's as we expect:
$ python -c "import sys; sys.stdout.write(sys.stdin.read())" < stdindemo.py
try:
    while True:
        print(input())
except EOFError:
    pass
Again, input reads up until the newline and essentially strips it from the line. print adds a newline. So while they both modify the input, their modifications cancel. (So they are essentially each other's complement.)
And when input gets the end-of-file character, it raises EOFError, which we ignore and then exit from the program.
And on Linux/Unix, we can pipe from cat:
$ cat inputs.txt | python -m stdindemo
foo
bar
baz
Or we can just redirect the file from stdin:
$ python -m stdindemo < inputs.txt
foo
bar
baz
We can also execute the module as a script:
$ python stdindemo.py < inputs.txt
foo
bar
baz
Here's the help on the builtin input from Python 3:
input(prompt=None, /)
Read a string from standard input. The trailing newline is stripped.
The prompt string, if given, is printed to standard output without a
trailing newline before reading input.
If the user hits EOF (*nix: Ctrl-D, Windows: Ctrl-Z+Return), raise EOFError.
On *nix systems, readline is used if available.
sys.stdin
Here we make a demo script using sys.stdin. The efficient way to iterate over a file-like object is to use the file-like object as an iterator. The complementary method to write to stdout from this input is to simply use sys.stdout.write:
$ python -c "print('import sys\nfor line in sys.stdin:\n sys.stdout.write(line)')" > stdindemo2.py
Print it back out to make sure it looks right:
$ python -c "import sys; sys.stdout.write(sys.stdin.read())" < stdindemo2.py
import sys
for line in sys.stdin:
    sys.stdout.write(line)
And redirecting the inputs into the file:
$ python -m stdindemo2 < inputs.txt
foo
bar
baz
Golfed into a command:
$ python -c "import sys; sys.stdout.write(sys.stdin.read())" < inputs.txt
foo
bar
baz
File Descriptors for Golfing
Since the file descriptors for stdin and stdout are 0 and 1 respectively, we can also pass those to open in Python 3 (not 2, and note that we still need the 'w' for writing to stdout).
If this works on your system, it will shave off more characters.
$ python -c "open(1,'w').write(open(0).read())" < inputs.txt
foo
bar
baz
Python 2's io.open does this as well, but the import takes a lot more space:
$ python -c "from io import open; open(1,'w').write(open(0).read())" < inputs.txt
foo
bar
baz
Addressing other comments and answers
One comment suggests ''.join(sys.stdin) for golfing but that's actually longer than sys.stdin.read() - plus Python must create an extra list in memory (that's how str.join works when not given a list) - for contrast:
''.join(sys.stdin)
sys.stdin.read()
The top answer suggests:
import fileinput
for line in fileinput.input():
    pass
But, since sys.stdin implements the file API, including the iterator protocol, that's just the same as this:
import sys
for line in sys.stdin:
    pass
Another answer does suggest this. Just remember that if you do it in an interpreter, you'll need to do Ctrl-d if you're on Linux or Mac, or Ctrl-z on Windows (after Enter) to send the end-of-file character to the process. Also, that answer suggests print(line) - which adds a '\n' to the end - use print(line, end='') instead (if in Python 2, you'll need from __future__ import print_function).
The real use-case for fileinput is for reading in a series of files.
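For instance, a small sketch of that multi-file case (the log file names are made up):

import fileinput

# Reads log1.txt and log2.txt in sequence; with no list given,
# fileinput.input() would fall back to sys.argv[1:] or stdin.
for line in fileinput.input(['log1.txt', 'log2.txt']):
    print(fileinput.filename(), fileinput.filelineno(), line, end='')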
The answer proposed by others:
for line in sys.stdin:
    print line
is very simple and pythonic, but it must be noted that the script will wait until EOF before starting to iterate on the lines of input.
This means that tail -f error_log | myscript.py will not process lines as expected.
The correct script for such a use case would be:
while 1:
    try:
        line = sys.stdin.readline()
    except KeyboardInterrupt:
        break
    if not line:
        break
    print line
UPDATE
From the comments it has been clarified that on Python 2 only there might be buffering involved, so that you end up waiting for the buffer to fill, or for EOF, before the print call is issued.
This will echo standard input to standard output:
import sys
line = sys.stdin.readline()
while line:
    print line,
    line = sys.stdin.readline()
Building on all the answers that use sys.stdin, you can also do something like the following to read from an argument file if at least one argument exists, and fall back to stdin otherwise:
import sys
f = open(sys.argv[1]) if len(sys.argv) > 1 else sys.stdin
for line in f:
    pass  # do your stuff with each line here
and use it as either
$ python do-my-stuff.py infile.txt
or
$ cat infile.txt | python do-my-stuff.py
or even
$ python do-my-stuff.py < infile.txt
That would make your Python script behave like many GNU/Unix programs such as cat, grep and sed.
argparse is an easy solution
Example compatible with both Python versions 2 and 3:
#!/usr/bin/python
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('infile',
                    default=sys.stdin,
                    type=argparse.FileType('r'),
                    nargs='?')
args = parser.parse_args()
data = args.infile.read()
You can run this script in many ways:
1. Using stdin
echo 'foo bar' | ./above-script.py
or, shorter, by replacing echo with a here string:
./above-script.py <<< 'foo bar'
2. Using a filename argument
echo 'foo bar' > my-file.data
./above-script.py my-file.data
3. Using stdin through the special filename -
echo 'foo bar' | ./above-script.py -
The following piece of code will help you (it reads all of stdin, blocking until EOF, into one string):
import sys
input_str = sys.stdin.read()
print input_str.split()
I am pretty amazed no one has mentioned this hack so far:
python -c "import sys; set(map(sys.stdout.write,sys.stdin))"
In Python 2 you can drop the set() call, but it works either way (in Python 3, map is lazy, so something has to consume it).
Try this:
import sys
print sys.stdin.read().upper()
and check it with:
$ echo "Hello World" | python myFile.py
You can read from stdin and then store inputs into "data" as follows:
data = ""
for line in sys.stdin:
data += line
Read from sys.stdin, but to read binary data on Windows, you need to be extra careful, because sys.stdin there is opened in text mode and it will corrupt \r\n replacing them with \n.
The solution is to set mode to binary if Windows + Python 2 is detected, and on Python 3 use sys.stdin.buffer.
import sys

PY3K = sys.version_info >= (3, 0)

if PY3K:
    source = sys.stdin.buffer
else:
    # Python 2 on Windows opens sys.stdin in text mode, and
    # binary data read from it becomes corrupted on \r\n
    if sys.platform == "win32":
        # set sys.stdin to binary mode
        import os, msvcrt
        msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
    source = sys.stdin

b = source.read()
I use the following method; it returns a string from stdin (I use it for JSON parsing).
It works with pipe and prompt on Windows (not tested on Linux yet).
When prompting, two line breaks indicate end of input.
import sys

def get_from_stdin():
    lb = 0
    stdin = ''
    for line in sys.stdin:
        if line == "\n":
            lb += 1
            if lb == 2:
                break
        else:
            lb = 0
        stdin += line
    return stdin
For Python 3 that would be:
# Filename e.g. cat.py
import sys
for line in sys.stdin:
    print(line, end="")
This is basically a simple form of cat(1), since it doesn't add a newline after each line. You can use it (after marking the file executable with chmod +x cat.py) like this:
echo Hello | ./cat.py
Since Python 3.8 you can use assignment expression:
while (line := input()):
    print(line)
The problem I have with the solution
import sys
for line in sys.stdin:
    print(line)
is that if you don't pass any data to stdin, it will block forever. That's why I love this answer: check if there is some data on stdin first, and then read it. This is what I ended up doing:
import sys
import select
# select(files to read from, files to write to, magic, timeout)
# timeout=0.0 is essential b/c we want to know the answer right away
if select.select([sys.stdin], [], [], 0.0)[0]:
    help_file_fragment = sys.stdin.read()
else:
    print("No data passed to stdin", file=sys.stderr)
    sys.exit(2)
When using the -c option, as a tricky alternative to reading stdin (and more flexible in some cases), you can pass the output of a shell command to your Python command by putting the shell command in quotes inside $( ).
e.g.
python3 -c "import sys; print(len(sys.argv[1].split('\n')))" "$(cat ~/.goldendict/history)"
This will count the number of lines from goldendict's history file.
I had some issues getting this to work for reading over sockets piped to it: when the socket got closed, it started returning an empty string in an active loop. This is my solution (which I only tested on Linux, but I hope it works on all other systems):
import sys, os

sep = os.linesep
while sep == os.linesep:
    data = sys.stdin.readline()
    sep = data[-len(os.linesep):]
    print '> "%s"' % data.strip()
So if you start listening on a socket it will work properly (e.g. in bash):
while :; do nc -l 12345 | python test.py ; done
And you can call it with telnet or just point a browser to localhost:12345
Regarding this:
for line in sys.stdin:
I just tried it on python 2.7 (following someone else's suggestion) for a very large file, and I don't recommend it, precisely for the reasons mentioned above (nothing happens for a long time).
I ended up with a slightly more pythonic solution (and it works on bigger files):
import sys

with open(sys.argv[1], 'r') as f:
    for line in f:
        pass  # process each line here
Then I can run the script locally as:
python myscript.py "0 1 2 3 4..." # can be a multi-line string or a filename - any stdin input will work
There is
os.read(0, x)
which reads x bytes from file descriptor 0, which represents stdin. This is an unbuffered read, more low-level than sys.stdin.read().
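A short sketch of reading all of stdin in chunks with os.read (the 1024-byte chunk size is arbitrary):

import os

chunks = []
while True:
    chunk = os.read(0, 1024)  # read up to 1024 bytes from fd 0 (stdin)
    if not chunk:             # b'' signals EOF
        break
    chunks.append(chunk)
data = b''.join(chunks)       # note: os.read returns bytes, not str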
Nonblocking, bytemode, stdin -> stdout:
# pipe.py
import os, sys, time

os.set_blocking(0, False)
sys.stdin = os.fdopen(0, 'rb', 0)
sys.stdout = os.fdopen(1, 'wb', 0)

while 1:
    time.sleep(.1)
    try:
        out = sys.stdin.read()
    except:
        sys.stdout.write(b"E")
        continue
    if out is None:
        sys.stdout.write(b"N")
        continue
    if not out:
        sys.stdout.write(b"_")
        break
    # working..
    out = b"<" + out + b">"
    sys.stdout.write(out)

sys.stdout.write(b".\n")
Usage:
$ for i in 1 2 3; do sleep 1; printf "===$i==="; done | python3 pipe.py
NNNNNNNNN<===1===>NNNNNNNNN<===2===>NNNNNNNNN<===3===>_.
Minimal code:
import os, sys

os.set_blocking(0, False)
fd0 = os.fdopen(0, 'rb', 0)
fd1 = os.fdopen(1, 'wb', 0)

while 1:
    bl = fd0.read()
    if bl is None: continue
    if not bl: break
    fd1.write(bl)
Tested on Linux, Python 3.9.2
It is worth saying that for short command-line chaining, input() is preferable to fileinput and sys.stdin, as it requires no import and is shorter to type.
$ echo hello word | python3 -c "print(input().upper())"
HELLO WORD

Python: subprocess.Popen variable behavior

I want to read all files in a directory and pass them via command line to another program.
The following is part of my code (for one file here) which does not seem to work, and I don't really understand why.
My code (with a bit of debug print):
# -*- coding: iso-8859-15 -*-
# Python 3
avidemux_dir = "C:\\Program Files (x86)\\Avi Demux\\avidemux.exe"
start_dir = "F:\\aaa"  # without ending backslash!
extension = ".mpg"

import os
import subprocess

for dirpath, dirnames, filenames in os.walk(start_dir):
    if filenames:
        first_file = os.path.join(dirpath, filenames[0])
        test2 = "--load " + first_file
        print(dirpath)      # results in: F:\aaa\av01
        print(first_file)   # results in: F:\aaa\av01\av01.mpg
        print(test2)        # results in: --load F:\aaa\av01\av01.mpg
        p1 = subprocess.Popen([avidemux_dir, "--load", first_file])
        p2 = subprocess.Popen([avidemux_dir, test2])
For this example, avidemux will work (load the correct file) only for p1. p2 does not work.
Why is that?
The commandline example that works in .bat:
avidemux.exe --load F:\aaa\av01\av01.mpg
I really would like to have it all in one string, as in p2, because I join a larger list of files together into one big string with the correct variables for avidemux.
shlex is one approach, but from the file paths it's obvious you're running on Windows, and shlex assumes conventions used in Unix-ish shells. They can get you in trouble on Windows.
As the docs say, the underlying Windows API call takes a single string as an argument, so on Windows you're generally much better off passing a single string to Popen().
Oops! I see you've already discovered that. But at least now you know why ;-)
Use:
import shlex
p2 = subprocess.Popen([avidemux_dir] + shlex.split(test2))
See the docs about the args argument of Popen.
Ah, you're passing a string of two arguments there. You need to split it, if necessary using shlex.split:
p2 = subprocess.Popen([avidemux_dir, *shlex.split(test2)])
Or just pass a string:
p2 = subprocess.Popen(avidemux_dir + ' ' + test2, shell=True)
Just stumbled across the solution: do not use a list when doing something like that.
Solution:
test2 = avidemux_dir + " --load " + first_file
and
p2 = subprocess.Popen(test2) # no more list but the pure string.

How to write a python script that operates on a list of input files?

I have a program that takes an input file:
python subprogram.py < input.txt > out.txt
If I have a number of input files, how can I write a single Python program that runs on those inputs and produces a single output? I believe the program should run like:
python program.py < input_1.txt input_2.txt > out.txt
And the program itself should look something like:
from subprogram import MyClass
import sys

if __name__ == '__main__':
    myclass = MyClass()
    myclass.run()
Have a look at the fileinput module
import fileinput
for line in fileinput.input():
    process(line)
This iterates over the lines of all files listed in sys.argv[1:], defaulting to sys.stdin if the list is empty. If a filename is '-', it is also replaced by sys.stdin. To specify an alternative list of filenames, pass it as the first argument to input(). A single file name is also allowed.
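For example, to pass an explicit list of files instead of relying on sys.argv (the file names are placeholders, and process() stands for your own handler, as above):

import fileinput

with fileinput.input(files=['input_1.txt', 'input_2.txt']) as f:
    for line in f:
        process(line)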
Make your program accept command line parameters:
python program.py input_1.txt input_2.txt > out.txt
And you can access them like:
from subprogram import MyClass
import sys

if __name__ == '__main__':
    my_class = MyClass()  # "class" is a reserved word, so use a different name
    my_class.run(sys.argv)
The way you're using it is not about Python, it's about your shell; you are just redirecting standard input/output to files. If you want to achieve that:
cat input1.txt input2.txt | python subprogram.py > out.txt
Let your shell do the work for you:
cat input_1.txt input_2.txt | python program.py > out.text
The cat command will concatenate the two input files together and your python program can just read from stdin and treat them like one big file.
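A minimal sketch of a program.py that works this way (the uppercasing is just a placeholder transformation, not part of the question):

import sys

if __name__ == '__main__':
    # the shell has already concatenated the input files on stdin
    for line in sys.stdin:
        sys.stdout.write(line.upper())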

open file in Python using "python scriptname.py filename"

I am a novice at Python scripting. I have a script that I am hoping to run on all files in a directory. I found very helpful advice in this thread. However, I am having difficulty determining how to format the actual script so that it retrieves the filename of the file that I want to run the script on from the command prompt, i.e. "python script.py filename.*". I've tried my best at looking through the Python documentation and the forums on this site and have come up empty (I probably just don't know what keywords I should be searching for).
I am currently able to run my script on one file at a time, and output it with a new file extension using the following code, but this way I can only do one file at a time. I'd like to be able to iterate over the whole directory using 'GENE.*':
InFileName = 'GENE.303'
InFile = open(InFileName, 'r')  # opens a pipeline to the file to be read line by line
OutFileName = InFileName + '.phy'
OutFile = open(OutFileName, 'w')
What can I do to the code to allow myself to use an iteration through the directory similar to what is done in this case? Thank you!
You are looking for:
import sys
InFileName = sys.argv[1]
See the documentation.
For something more sophisticated, take a look at the optparse and argparse modules (the latter is preferable but is only available in newer versions of Python).
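For example, a hedged argparse sketch that accepts one or more input file names (the argument name files is just illustrative):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('files', nargs='+', help='input files to process')
args = parser.parse_args()

for in_file_name in args.files:
    with open(in_file_name) as in_file:
        pass  # process in_file here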
You have quite a few options to process a list of files using Python:
You can use the shell expansion facilities of your command line to pass more filenames to your script and then iterate the command line arguments:
import sys

def process_file(fname):
    with open(fname) as f:
        for line in f:
            # TODO: implement
            print line

for fname in sys.argv[1:]:
    process_file(fname)
and call it like:
python my_script.py * # expands to all files in the directory
You can also use the glob module to do this expansion:
import glob
for fname in glob.glob('*'):
    process_file(fname)
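Applied to the question's naming scheme, a sketch like this would pick up every GENE.* file and derive the output name from it (assuming a simple line-by-line conversion; adjust to your real processing):

import glob

for in_file_name in glob.glob('GENE.*'):
    out_file_name = in_file_name + '.phy'   # e.g. GENE.303 -> GENE.303.phy
    with open(in_file_name) as in_file, open(out_file_name, 'w') as out_file:
        for line in in_file:
            out_file.write(line)  # replace with the real conversion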

Forcing a python script to take input from STDIN

A python script I need to run takes input only from a file passed as a command line argument, like so:
$ markdown.py input_file
Is there any way to get it to accept input from STDIN instead? I want to be able to do this through Bash, without significantly modifying the python script:
$ echo "Some text here" | markdown.py
If I have to modify the Python script, how would I go about it?
(EDIT: Here is the script that is parsing the command line options.)
I'm not sure how portable it is, but on Unix-y systems you can name /dev/stdin as your file:
$ echo -n hi there | wc /dev/stdin
0 2 8 /dev/stdin
Make sure this is near the top of the file:
import sys
Then look for something like this:
filename = sys.argv[1]
f = open(filename)
and replace it with this:
f = sys.stdin
It's hard to be more specific without seeing the script that you're starting with.
In the code you have a line like this:
if not len(args) == 1:
What you could do there is check whether you don't have a filename and, in that case, use "/dev/stdin" instead (on a system that allows it).
Another solution is to just replace:
if not len(args) == 1:
    parser.print_help()
    return None, None
else:
    input_file = args[0]
with
if not len(args) == 1:
    input_file = sys.stdin
else:
    input_file = open(args[0])
That means of course that the returned "input_file" is no longer a file name but a file object, which means further modifications in the calling function.
The first solution requires fewer modifications but is more platform-specific; the second is more work, but should work on more systems.
I'm guessing from the details of your question that you're asking about Python-Markdown, so I tracked down the relevant line in the source code for you: to do it Daniel's way, in line 443 of markdown/__init__.py, you'd want to replace
input_file = codecs.open(input, mode="r", encoding=encoding)
with
input_file = codecs.EncodedFile(sys.stdin, encoding)
Although then you wouldn't be able to actually process files afterwards, so for a more generally useful hack, you could put in a conditional:
if input:
    input_file = codecs.open(input, mode="r", encoding=encoding)
else:
    input_file = codecs.EncodedFile(sys.stdin, encoding)
and then you'd have to adjust markdown/commandline.py to not quit if it isn't given a filename: change lines 72-73
parser.print_help()
return None, None
to
input_file = None
The point is, it's not really a simple thing to do. At this point I was going to suggest using a special file like Mark Rushakoff did, if he hadn't beaten me to it ;-)
I suggest going here:
http://codaset.com/repo/python-markdown/tickets/new
and submitting a ticket requesting that they add the feature. It should be straightforward for them, so they might be willing to go ahead and do it.
In bash, you can also use process substitution:
markdown.py <(echo "Some text here")
For a single input, /dev/stdin works, but process substitution also applies to several inputs (and even outputs).
