Forcing a Python script to take input from STDIN

A Python script I need to run takes input only from a file passed as a command-line argument, like so:
$ markdown.py input_file
Is there any way to get it to accept input from STDIN instead? I want to be able to do this from Bash, without significantly modifying the Python script:
$ echo "Some text here" | markdown.py
If I have to modify the Python script, how would I go about it?
(EDIT: Here is the script that is parsing the command line options.)

I'm not sure how portable it is, but on Unix-y systems you can name /dev/stdin as your file:
$ echo -n hi there | wc /dev/stdin
0 2 8 /dev/stdin
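Applied to the script in the question, that should work as follows (on systems that provide /dev/stdin):
$ echo "Some text here" | markdown.py /dev/stdin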

Make sure this is near the top of the file:
import sys
Then look for something like this:
filename = sys.argv[1]
f = open(filename)
and replace it with this:
f = sys.stdin
It's hard to be more specific without seeing the script that you're starting with.
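For example, a minimal sketch of that change, falling back to STDIN when no filename is given (the variable names are illustrative):
import sys

# read the named file if one was given, otherwise read STDIN
f = open(sys.argv[1]) if len(sys.argv) > 1 else sys.stdin
text = f.read()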

In the code you have a line like this:
if not len(args) == 1:
What you could do there is check whether a filename was given and, if not, use "/dev/stdin" instead (on a system that allows it).
Another solution is to just replace:
if not len(args) == 1:
    parser.print_help()
    return None, None
else:
    input_file = args[0]
with
if not len(args) == 1:
    input_file = sys.stdin
else:
    input_file = open(args[0])
That means, of course, that the returned "input_file" is no longer a file name but a file object, which requires further modifications in the calling function.
The first solution involves fewer modifications but is more platform-specific; the second is more work but should run on more systems.

I'm guessing from the details of your question that you're asking about Python-Markdown, so I tracked down the relevant line in the source code for you: to do it Daniel's way, in line 443 of markdown/__init__.py, you'd want to replace
input_file = codecs.open(input, mode="r", encoding=encoding)
with
input_file = codecs.EncodedFile(sys.stdin, encoding)
Although then you wouldn't be able to actually process files afterwards, so for a more generally useful hack, you could put in a conditional:
if input:
    input_file = codecs.open(input, mode="r", encoding=encoding)
else:
    input_file = codecs.EncodedFile(sys.stdin, encoding)
and then you'd have to adjust markdown/commandline.py to not quit if it isn't given a filename: change lines 72-73
parser.print_help()
return None, None
to
input_file = None
The point is, it's not really a simple thing to do. At this point I was going to suggest using a special file like Mark Rushakoff did, if he hadn't beaten me to it ;-)

I suggest going here:
http://codaset.com/repo/python-markdown/tickets/new
and submitting a ticket requesting they add the feature. It should be straightforward for them, so they might be willing to go ahead and do it.

In bash, you can also use process substitution:
markdown.py <(echo "Some text here")
/dev/stdin works for a single input, but process substitution also handles several inputs (and even outputs).
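For example, with two inputs (a generic bash illustration, not specific to markdown.py):
$ diff <(sort file1) <(sort file2)
The shell replaces each <(...) with a path (a FIFO or /dev/fd entry) from which that command's output can be read.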

Related

Running .py file with an argument in .bat

The problem: I want to iterate over a folder in search of a certain file type, execute each match with a program (passing name.ext as the argument), and then run a Python script that changes the output name of the first program.
I know there is probably a better way to do the above, but the way I thought of was this:
[BAT]
for /R "C:\..\folder" %%a IN (*.extension) do (
    SET name=%%a
    "C:\...\first_program.exe" "%%a"
    "C:\...\script.py" "%name%"
)
[PY]
import io
import sys

def rename(i):
    name = i
    with open('my_file.txt', 'r') as file:
        data = file.readlines()
    data[40] = '"C:\\\\Users\\\\UserName\\\\Desktop\\\\folder\\\\folder\\\\' + name + '"\n'
    with open('my_file.txt', 'w') as file:
        file.writelines(data)

if __name__ == "__main__":
    rename(sys.argv[1])
Expected result: I wish the Python file changed the name, but after putting it into the console once, it seems to stay with the script. The BAT does not change it, and that bothers me.
PS. If there is a better way, I'll be glad to get to know it.
This is the Linux bash version; I am sure you can change the loop etc. to make it work as a batch file. Instead of your *.exe, I use cat as a generic input/output example.
#!/bin/sh
for f in *.txt
do
    suffix=".txt"
    name=${f%$suffix}
    cat $f > tmp.dat
    awk -v myName=$f '{if(NR==5) print $0 myName; else print $0 }' tmp.dat > $name.dat
done
This produces "unique" output *.dat files named after the input *.txt files. The files are processed by cat (standing in for your *.exe) and the output is put into a temporary file. Finally, awk changes line 5, with the result placed in the unique file mentioned above.
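If you would rather keep the whole loop in Python, here is a rough sketch of the same idea (the *.txt extension and the line number 5 are assumptions carried over from the awk example above):
import glob

# for each *.txt, write a .dat copy with the filename appended to line 5
for f in glob.glob('*.txt'):
    name = f[:-len('.txt')]
    with open(f) as src:
        lines = src.readlines()
    if len(lines) >= 5:
        lines[4] = lines[4].rstrip('\n') + f + '\n'
    with open(name + '.dat', 'w') as dst:
        dst.writelines(lines)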

Python foo > bar (input file, output file)

It's probably a very basic question, but I couldn't find any answer. Right now I have something like:
import sys
inFile = sys.argv[1]
outFile = sys.argv[2]
with open(inFile, 'r+') as input, open(outFile, 'w+') as out:
    # do something
I can run it with ./modulname foo bar (working). How can I change it so it will work with ./modulname foo > bar? (Right now it gives me the following error.)
./pagereport.py today.log > sample.txt
Traceback (most recent call last):
  File "./pagereport.py", line 7, in <module>
    outFile = sys.argv[2]
IndexError: list index out of range
You could skip the second open (out) and write to sys.stdout instead.
If you want to support both ways of calling it, argparse has a comfortable way of doing that: give add_argument a type= that opens a file for writing and make sys.stdout its default.
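A minimal sketch of that argparse approach (the argument names here are illustrative):
import argparse
import sys

parser = argparse.ArgumentParser()
parser.add_argument('infile', type=argparse.FileType('r'))
# optional second argument; falls back to stdout when omitted
parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'),
                    default=sys.stdout)
args = parser.parse_args()

for line in args.infile:
    args.outfile.write(line)
Called as ./modulname foo bar it writes to bar; called as ./modulname foo > bar it writes to stdout and the shell does the redirection.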
When you do:
./modulname foo > bar
> is acted upon by the shell, which duplicates the STDOUT stream (FD 1) to the file bar. This happens before the command even runs, so no, you can't pass the command like that and have bar available inside the Python script.
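You can see this from inside the script with a tiny demo (demo.py is a hypothetical name):
import sys

# print to stderr, since stdout may be redirected to 'bar'
print(sys.argv, file=sys.stderr)
Running ./demo.py foo > bar prints ['./demo.py', 'foo'] to the terminal: bar never reaches the script.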
If you insist on using >, a poor man's solution would be to make the arguments a single string, and do some string processing inside, something like:
./modulname 'foo >bar'
And inside your script:
infile, outfile = map(lambda x: x.strip(), sys.argv[1].split('>'))
This assumes no filename contains whitespace; filenames with whitespace need special treatment, such as passing two separate arguments in that case.
Also, take a look at the argparse module for more flexible argument parsing capabilities.
What error have you got?
import sys

inFile = sys.argv[1]
outFile = sys.argv[2]
with open(inFile, 'r+') as in_put, open(outFile, 'w+') as out:
    buff = in_put.read()
    out.write(buff)
I tried to run your code, but it had no import sys, so I fixed that as above. After that, I can run it as a simple cp command:
python p4.py p4.py p4.py-bk

Issues calling awk from within Python using subprocess.call

Having some issues calling awk from within Python. Normally, I'd do the following to call the command in awk from the command line.
Open up command line, in admin mode or not.
Change my directory to awk.exe, namely cd R\GnuWin32\bin
Call awk -F "," "{ print > (\"split-\" $10 \".csv\") }" large.csv
My command is used to split up the large.csv file based on the 10th column into a number of files named split-[COL VAL HERE].csv. I have no issues running this command. I tried to run the same code in Python using subprocess.call() but I'm having some issues. I run the following code:
import subprocess

def split_ByInputColumn():
    subprocess.call(['C:/R/GnuWin32/bin/awk.exe', '-F', '\",\"',
                     '\"{ print > (\\"split-\\" $10 \\".csv\\") }\"', 'large.csv'],
                    cwd='C:/R/GnuWin32/bin/')
and clearly, something is running when I execute the function (CPU usage, etc) but when I go to check C:/R/GnuWin32/bin/ there are no split files in the directory. Any idea on what's going wrong?
As I stated in my previous answer (which was downvoted), you are overprotecting the arguments, which makes awk's argument parsing fail.
Since there was no comment, I supposed there was a typo, but it worked... So I suppose that's because I should have strongly suggested a full-fledged Python solution, which is the best thing to do here (as stated in my previous answer).
Writing the equivalent in Python is not trivial, as we have to emulate the way awk opens files and appends to them afterwards. But it is more integrated, more Pythonic, and handles quoting properly if quoting occurs in the input file.
I took the time to code & test it:
import csv
import glob
import os

def split_ByInputColumn():
    # get rid of the old data from previous runs
    for f in glob.glob("split-*.csv"):
        os.remove(f)
    open_files = dict()
    with open('large.csv') as f:
        cr = csv.reader(f, delimiter=',')
        for r in cr:
            tenth_row = r[9]
            filename = "split-{}.csv".format(tenth_row)
            if not filename in open_files:
                handle = open(filename, "wb")
                open_files[filename] = (handle, csv.writer(handle, delimiter=','))
            open_files[filename][1].writerow(r)
    for f, _ in open_files.values():
        f.close()

split_ByInputColumn()
in detail:
read the big file as csv (advantage: quoting is handled properly)
compute the destination filename
if filename not in dictionary, open it and create csv.writer object
write the row using the corresponding writer
in the end, close file handles
Aside: My old solution, using awk properly:
import subprocess

def split_ByInputColumn():
    subprocess.call(['awk.exe', '-F', ',',
                     '{ print > ("split-" $10 ".csv") }', 'large.csv'],
                    cwd='some_directory')
Someone else posted an answer (and then subsequently deleted it), but the issue was that I was over-protecting my arguments. The following code works:
def split_ByInputColumn():
    subprocess.call(['C:/R/GnuWin32/bin/awk.exe', '-F', ',',
                     '{ print > (\"split-\" $10 \".csv\") }', 'large.csv'],
                    cwd='C:/R/GnuWin32/bin/')

Running grep through Python - doesn't work

I have some code like this:
import subprocess

f = open("words.txt", "w")
subprocess.call(["grep", p, "/usr/share/dict/words"], stdout=f)
f.close()
I want to grep the MacOs dictionary for a certain pattern and write the results to words.txt. For example, if I want to do something like grep '\<a.\>' /usr/share/dict/words, I'd run the above code with p = "'\<a.\>'". However, the subprocess call doesn't seem to work properly and words.txt remains empty. Any thoughts on why that is? Also, is there a way to apply regex to /usr/share/dict/words without calling a grep-subprocess?
edit:
When I run grep '\<a.\>' /usr/share/dict/words in my terminal, I get words like:
aa
ad
ae
ah
ai
ak
al
am
an
ar
as
at
aw
ax
ay
as results in the terminal (or a file if I redirect them there). This is what I expect words.txt to have after I run the subprocess call.
Like @woockashek already commented, you are not getting any results because there are no hits for '\<a.\>' in your input file. You are probably actually hoping to find hits for \<a.\>, but then obviously you need to omit the single quotes, which are messing you up.
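In other words, pass the pattern without the shell quoting; a sketch of the corrected call:
import subprocess

# no shell is involved, so the pattern needs no extra quoting
p = r'\<a.\>'
with open('words.txt', 'w') as f:
    subprocess.call(['grep', p, '/usr/share/dict/words'], stdout=f)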
Of course, Python knows full well how to look for a regex in a file.
import re

rx = re.compile(r'\ba.\b')
with open('/usr/share/dict/words', 'Ur') as reader, open('words.txt', 'w') as writer:
    for line in reader:
        if rx.search(line):
            print(line, file=writer, end='')
The single quotes here are part of Python's string syntax, just like the single quotes on the command line are part of the shell's syntax. In neither case are they part of the actual regular expression you are searching for.
The subprocess.Popen documentation vaguely alludes to the frequently overlooked fact that the shell's quoting is not necessary or useful when you don't have shell=True (which usually you should avoid anyway, for this and other reasons).
Python unfortunately doesn't support \< and \> as word boundary operators, so we have to use (the functionally equivalent) \b instead.
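A quick illustration of the equivalence:
import re

# \b is a word boundary, like grep's \< and \>
print(re.search(r'\ba.\b', 'the ax fell'))  # matches 'ax'
print(re.search(r'\ba.\b', 'axes'))         # None: 'ax' is not a whole word here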
The standard input and output channels for the process started by call() are bound to the parent's input and output. That means the calling program cannot capture the output of the command. Use check_output() to capture the output for later processing:
import subprocess

f = open("words.txt", "w")
output = subprocess.check_output(['grep', p, '-1'])
f.write(output)
print output
f.close()
PS: I hope it works; I can't check the answer because I don't have macOS to try it.

Calling configuration file ID into Linux Command with Date Time from Python

I'm trying to write a script that produces the following outputs in a folder (YYYYMMDDHHMMSS = current date and time) using a Linux command in Python, with the IDs taken from a configuration file:
1234_YYYYMMDDHHMMSS.txt
12345_YYYYMMDDHHMMSS.txt
12346_YYYYMMDDHHMMSS.txt
I have a config file with the list of IDs:
id1 = 1234
id2 = 12345
id3 = 123456
I want to be able to loop through these in Python and incorporate them into a Linux command.
Currently, my Linux commands are hardcoded in Python like so:
import subprocess
import datetime
now = datetime.datetime.now()
subprocess.call('autorep -J 1234* -q > /home/test/output/1234.txt', shell=True)
subprocess.call('autorep -J 12345* -q > /home/test/output/12345.txt', shell=True)
subprocess.call('autorep -J 123456* -q > /home/test/output/123456.txt', shell=True)
print now.strftime("%Y%m%d%H%M%S")
The datetime is defined but currently does nothing except print to the console, whereas I want to incorporate it into the output txt file. I want to be able to write a loop to do something like this:
subprocess.call('autorep -J id1* -q > /home/test/output/123456._now.strftime("%Y%m%d%H%M%S").txt', shell=True)
subprocess.call('autorep -J id2* -q > /home/test/output/123456._now.strftime("%Y%m%d%H%M%S").txt', shell=True)
subprocess.call('autorep -J id3* -q > /home/test/output/123456._now.strftime("%Y%m%d%H%M%S").txt', shell=True)
I know that I need to use ConfigParser, and currently I have this piece written, which simply prints the IDs from the configuration file to the console:
from ConfigParser import SafeConfigParser
import os

parser = SafeConfigParser()
parser.read("/home/test/input/ReportConfig.txt")

def getSystemID():
    for section_name in parser.sections():
        print
        for key, value in parser.items(section_name):
            print '%s = %s' % (key, value)
        print

getSystemID()
But as mentioned at the beginning of the post, my goal is to loop through the IDs and incorporate them into my Linux command while adding the datetime format to the end of the file. I'm thinking all I need is some kind of loop in the above function to get the kind of output I want; however, I'm not sure how to get the IDs and the datetime into a Linux command.
So far you have most of what you need; you are just missing a few things.
First, I think using ConfigParser is overkill for this, but it's simple enough, so let's continue with it. Let's change getSystemID to a generator returning your IDs instead of printing them; it's just a one-line change.
parser = SafeConfigParser()
parser.read('mycfg.txt')

def getSystemID():
    for section_name in parser.sections():
        for key, value in parser.items(section_name):
            yield key, value
With a generator we can use getSystemID in a loop directly; now we need to pass this on to the subprocess call.
# This is the string of the current time, what we add to the filename
now = datetime.datetime.now().strftime('%Y%m%d%H%M%S')

# Notice we can iterate over ids / idnumbers directly
for name, number in getSystemID():
    print name, number
Now we need to build the subprocess call. The bulk of your problem above was knowing how to format strings; the syntax is described here.
I'm also going to make two notes on how you use subprocess.call. First, pass a list of arguments instead of a long string. This helps python know what arguments to quote so you don't have to worry about it. You can read about it in the subprocess and shlex documentation.
Second, you redirect the output using > in the command and (as you noticed) need shell=True for this to work. Python can redirect for you, and you should use it.
To pick up where I left off above in the for loop:
for name, number in getSystemID():
    # Make the filename to write to
    outfile = '/home/test/output/{0}_{1}.txt'.format(number, now)
    # open the file for writing
    with open(outfile, 'w') as f:
        # notice the arguments are in a list
        # stdout=f redirects output to the file f named outfile
        subprocess.call(['autorep', '-J', name + '*', '-q'], stdout=f)
You can insert the datetime using Python's format instruction.
For example, you could create a new file with the 123456 prefix and the datetime stamp like this:
new_file = open("123456.{0}".format(datetime.datetime.now()), 'w+')
I am not sure if I understood what you are looking for, but I hope this helps.
